1

I have a large file system from which I have to delete certain directories from time to time. Currently I have a script which, amongst other things, deletes a folder and then generates an email notification. However, as deleting a directory can take anything from a few seconds to a few days, I would like to do this asynchronously.

I could cook up a solution by, say, generating little snippets like rm -rf /some/directory in the appropriate cron directory, but that might get clogged if a large number of large directories need to be deleted.

Is anyone aware of a better solution?

loris

2 Answers

0

Deleting a single directory entry should be nearly instantaneous. The likely issue is walking the directory tree and deleting the many files and directories beneath it.

that might get clogged

I don't know what you mean by this.

If you are worried that one run may overlap with the next, why is that an issue? If there is a valid reason for ensuring that only one instance runs at a time, use a lock file, or limit the run time with timeout.
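As a rough illustration of combining both suggestions, here is a minimal sketch that assumes the flock utility (util-linux) and timeout (GNU coreutils) are installed. The lock path, the six-hour budget and the function name delete_exclusively are placeholders invented for this example, not anything from the question.

```shell
#!/bin/sh
# Sketch only: delete one directory tree, refuse to overlap with another
# running instance, and stop once a time budget is used up.

LOCK_FILE="${LOCK_FILE:-/tmp/async-delete.lock}"  # assumed lock file path
MAX_RUNTIME="${MAX_RUNTIME:-6h}"                  # assumed time budget

delete_exclusively() {
    # flock -n fails immediately (instead of waiting) if the lock is held.
    # timeout sends SIGTERM to rm when MAX_RUNTIME has elapsed.
    flock -n "$LOCK_FILE" timeout "$MAX_RUNTIME" rm -rf -- "$1"
}

# When run as a script: delete the directory given as the first argument.
if [ "$#" -gt 0 ]; then
    delete_exclusively "$1"
fi
```

If the time limit is hit, timeout exits with status 124 and the tree is left partly deleted, so a later run on the same path simply continues where this one stopped.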

symcbean
0

What is slowing down your deletion is not the file removal itself (such operations are batched in the journal and committed to the main filesystem in large chunks, so they are already async in a sense), but rather the synchronous reads needed to discover what to delete. In other words, it is the metadata traversal needed to list all the inodes to be deleted that takes the biggest hit, by far. There is no real escaping from that, unfortunately.

Some things you can do:

  • use a fast cache device to cache as much metadata as possible
  • use disposable volumes/filesystem, where "delete many files" becomes "simply discard the entire volume or filesystem"
  • schedule partial, progressive deletion via cron or similar tools
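One way the last item could look in practice is sketched below: the existing script renames the directory into a holding area (a rename within one filesystem returns almost immediately), and a cron job empties that area in time-limited passes. This is only an assumption about how to arrange it. The trash path, the budget and the function names are made up for illustration, and -mindepth and -delete are GNU/BSD find extensions rather than POSIX.

```shell
#!/bin/sh
# Sketch only: instant "delete" by rename, then progressive purging from cron.
# Hypothetical crontab entry:  0 * * * * /usr/local/bin/trash.sh purge

TRASH_DIR="${TRASH_DIR:-/data/.trash}"  # assumed; must be on the same filesystem as the data
BUDGET="${BUDGET:-50m}"                 # assumed time limit per cron run

# Called by the existing script in place of rm -rf.
queue_for_deletion() {
    src="${1%/}"                        # strip one trailing slash, if any
    mkdir -p "$TRASH_DIR" || return 1
    # Timestamp and PID keep queued names from colliding.
    mv -- "$src" "$TRASH_DIR/${src##*/}.$(date +%s).$$"
}

# Called from cron: remove queued content until the budget runs out.
# Whatever is left over is picked up by the next run.
purge_trash() {
    [ -d "$TRASH_DIR" ] || return 0
    timeout "$BUDGET" find "$TRASH_DIR" -mindepth 1 -delete
}

case "${1:-}" in
    queue) queue_for_deletion "$2" ;;
    purge) purge_trash ;;
esac
```

The email notification can then be sent right after queue_for_deletion returns, or from the purge step if it should only go out once the space has actually been freed.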

For more info about delete performance and other things which slow down file removal, you can read this answer.

shodanshok