13

I have a scenario where I upload .csv files to a specific folder, /tmp/data_upload, every day, and the old files are replaced by the new ones.

I need to run a Python script once the data is uploaded. My idea is to create a cron job and monitor the files for changes. I tried using inotify, but I am not very familiar with Unix. How can I do that?

I need to execute the script test.py whenever the modification date of a file in the upload folder (for example, /tmp/data_upload) changes.

Alex

5 Answers

11

You might need incrond (the inotify cron daemon), which monitors changes to files and then executes scripts.

incrond can monitor file creation, modification, deletion, and many other events. This article shows which events incrond can monitor, with some examples.

For your case, you might create the file /etc/incron.d/data_upload with the following contents:

/tmp/data_upload IN_CREATE,IN_MODIFY /path/to/test.py 
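
If test.py needs to know which file triggered it, incron can pass that along: appending `$@/$#` to the command in the table entry expands to the watched directory and the name of the file the event refers to. The following is only a minimal sketch of how test.py might read that argument. The function names `csv_from_args` and `main` and the `.csv` filter are assumptions for illustration, not part of the setup above.

```python
import sys
from pathlib import Path
from typing import Optional, Sequence


def csv_from_args(argv: Sequence[str]) -> Optional[Path]:
    """Return the path given as the first argument if it names a .csv file."""
    if len(argv) < 2:
        return None  # started without a file argument
    path = Path(argv[1])
    # Ignore events for anything that is not a .csv file (e.g. temp files).
    return path if path.suffix.lower() == ".csv" else None


def main(argv: Sequence[str]) -> None:
    target = csv_from_args(argv)
    if target is not None:
        # Placeholder for the real processing step (hypothetical).
        print(f"processing {target}")


if __name__ == "__main__":
    main(sys.argv)
```

Note that IN_MODIFY can fire several times while a large file is still being written, so the script may be started more than once per upload.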
Jenny D
4

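You could use entr to run the script automatically every time a file changes, by running ls /tmp/data_upload/* | entr -p /path/to/test.py once at startup. The glob makes ls print full paths, so entr can find the files regardless of the current directory.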

Project website: http://eradman.com/entrproject/

Online man page: https://www.systutorials.com/docs/linux/man/1-entr/

jln-ho
1

The watchexec command-line utility (https://watchexec.github.io/) sounds like exactly what you need. I believe installing it requires the Rust build tools on your machine, though, so that may be a dealbreaker.

TeNNoX
0

Take a look at the iwatch (Linux only, because it relies on inotify) or fswatch commands.

You will probably need to install them first. For example, on Debian Linux (bookworm release) it is as easy as

sudo aptitude install iwatch

or

sudo aptitude install fswatch
Zeke Fast
0

My general approach would be to use the classic Unix find utility. For example, the command

find /tmp/data_upload -name '*.csv' -mtime -1 -exec /home/myname/test.py {} \;

will find any .csv files in /tmp/data_upload that were modified less than one day ago and run your test.py once for each of them, passing the file name as an argument. Of course, if your test.py is in some other directory, update the path accordingly.

If you run your cron job more often than once a day, you can use find's -mmin option to specify the maximum time since modification in minutes. For example,

find /tmp/data_upload -name '*.csv' -mmin -60 -exec /home/myname/test.py {} \;

will search for .csv files that were modified less than 60 minutes ago, which is useful if cron runs the job hourly.

Two fair warnings are in order. First, this won't catch .csv files that were deleted entirely; you may want to check for these separately. Second, I did not have time to test any of this, so expect typos in my code that you will have to debug yourself.
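
The same modification-time check can also be done inside the Python script itself, so that cron only has to start test.py. This is a rough sketch using only the standard library; the function name `recently_modified` and its parameters are hypothetical, and the age comparison only approximates what find's -mmin does.

```python
import time
from pathlib import Path
from typing import List, Optional


def recently_modified(folder: str, max_age_seconds: float,
                      now: Optional[float] = None) -> List[Path]:
    """Return the .csv files in `folder` modified within the last `max_age_seconds`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    return sorted(
        p for p in Path(folder).glob("*.csv")
        # st_mtime is the last modification time in seconds since the epoch.
        if p.is_file() and p.stat().st_mtime > cutoff
    )


if __name__ == "__main__":
    # Assumed folder from the question; 3600 seconds matches an hourly cron job.
    for csv_file in recently_modified("/tmp/data_upload", 3600):
        print(csv_file)
```

Like the find commands, this only sees files that still exist, so deleted files need a separate check.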