2

As far as I know, deleting a non-empty directory could work the same way as deleting an empty directory: once the pointer to the directory's metadata is removed, there would be no pointers left to the items it contained, effectively deleting all its children recursively.

If that's true, then why must directories be empty before being deleted? Is it just a safeguard to prevent erasing many files at once, or a technical limitation of some (possibly ancient) file system?

2 Answers

4

Many file systems work with a reference count system for files.

~/foo$ ls -l
total 312
-rwxr-xr-x  2 shagie  staff  14884 Jan 24 10:35 a.out*
-rwxr-xr-x  1 shagie  staff    379 Apr 10 13:56 alias.pl*
-rw-r--r--@ 1 shagie  staff  14236 Apr 10 13:50 aliases.csv
-rwxr-xr-x  2 shagie  staff  14884 Jan 24 10:35 b.out*
-rw-r--r--  1 shagie  staff    137 Feb 17 15:30 f.pl
-rwxr-xr-x  1 shagie  staff    616 Mar 24 15:19 file*

The first column shows the permissions, and the second shows the reference (link) count. Note that a.out and b.out are the same file, which is why each shows a count of 2. Once all references to a file are deleted, it can be reclaimed. But files and directories are different.

In POSIX, you've got files, which are removed by unlink - which removes their entry from a directory.

The unlink function deletes the file name filename. If this is a file's sole name, the file itself is also deleted. (Actually, if any process has the file open when this happens, deletion is postponed until all processes have closed the file.)
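A minimal sketch of the link-count behaviour described above, written in Python because its os module wraps the same POSIX calls. The function name link_count_demo and the file names are only illustrative, and it assumes a file system that supports hard links.

```python
import os
import tempfile

def link_count_demo():
    """Return the link counts seen at each step, plus whether the data survived."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.out")
        b = os.path.join(d, "b.out")
        with open(a, "w") as f:
            f.write("payload")
        counts = [os.stat(a).st_nlink]      # a single name so far
        os.link(a, b)                       # add a second name for the same file
        counts.append(os.stat(a).st_nlink)
        os.unlink(a)                        # remove one name; the file itself remains
        counts.append(os.stat(b).st_nlink)
        with open(b) as f:
            survived = f.read() == "payload"
    return counts, survived
```

The count goes up when a second name is added and back down when one name is unlinked; the contents stay reachable through the remaining name.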

And you've got directories which are removed with rmdir.

The rmdir function deletes a directory. The directory must be empty before it can be removed; in other words, it can only contain entries for '.' and '..'.

These are separate operations with separate checks: rmdir can't remove a file, nor can unlink remove a directory. That's just the way it works.
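The separation can be tried out with a short sketch like the one below. It uses Python's os.rmdir and os.unlink, which wrap the POSIX calls; the helper names try_call and separation_demo are made up for this illustration. The exact exception type raised for each mismatch varies by platform, so the sketch only records its name rather than assuming one.

```python
import os
import tempfile

def try_call(func, path):
    """Call func(path); return None on success, else the name of the OSError raised."""
    try:
        func(path)
    except OSError as exc:
        return type(exc).__name__
    return None

def separation_demo():
    results = {}
    with tempfile.TemporaryDirectory() as top:
        d = os.path.join(top, "dir")
        f = os.path.join(d, "file")
        os.mkdir(d)
        open(f, "w").close()
        results["rmdir_nonempty"] = try_call(os.rmdir, d)   # refused: directory not empty
        results["rmdir_file"] = try_call(os.rmdir, f)       # refused: not a directory
        results["unlink_dir"] = try_call(os.unlink, d)      # refused: is a directory
        results["unlink_file"] = try_call(os.unlink, f)     # succeeds
        results["rmdir_empty"] = try_call(os.rmdir, d)      # succeeds now that it is empty
    return results
```

Each call either does its one job or fails with an error about that job alone.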

This makes the code much simpler, with fewer error conditions involved. A function does one and only one thing. From "Is it OK to split long functions and methods into smaller ones even though they won't be called by anything else?":

If, when describing the activity of the code to another programmer you use the word 'and', the method needs to be split into at least one more part.

For rmdir to behave this way the description would be:

Remove the directory entry, and any files if any are contained within the directory.

Note the "and". Making rmdir non-atomic would create many more situations where the operation is only partially completed, with no clear way to report what was done and what was not. If the hypothetical rmalldir hits files it cannot delete, is that a success? A failure? What error message should it give?

And thus, each function call does one and only one thing: remove a file, or remove a directory. Any errors are specifically applicable to that operation.

The code is simpler, and there are fewer possible error situations, both in the library and for the people using it.
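To make the partial-failure problem concrete, here is a rough sketch of what the hypothetical rmalldir might look like if built from unlink and rmdir. It is not a real POSIX or library function; the name and the choice to return a list of failures are assumptions made for illustration. (Python's standard library does ship a real recursive remover, shutil.rmtree, which has to make similar choices about error handling.)

```python
import os
import tempfile

def rmalldir(top):
    """Hypothetical recursive remove; returns (path, error) for every step that failed."""
    failures = []
    # Walk bottom-up so each directory is visited after its contents.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        # Symlinks to directories appear in dirnames but are not walked into,
        # so they are unlinked like files.
        links = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
        for name in filenames + links:
            path = os.path.join(dirpath, name)
            try:
                os.unlink(path)
            except OSError as exc:
                failures.append((path, exc))
        try:
            os.rmdir(dirpath)   # fails if anything above could not be removed
        except OSError as exc:
            failures.append((dirpath, exc))
    return failures

if __name__ == "__main__":
    top = tempfile.mkdtemp()
    os.makedirs(os.path.join(top, "a", "b"))
    open(os.path.join(top, "a", "b", "f.txt"), "w").close()
    print(rmalldir(top))  # an empty list when every step succeeded
```

Even this small version has to return a list of errors rather than a single result, and a non-empty list means the tree is left half-deleted, which is exactly the ambiguity that separate unlink and rmdir calls avoid.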

0

I think it's mostly for your protection, to avoid deleting a lot of files you don't mean to. You could potentially have lots of directories and files under the top-level one and not know it, especially when working from the command line, which is where this convention started, back before there were GUIs. Furthermore, I don't think there is the concept of a Trash or Recycle Bin for files and folders deleted using the rm command.

As someone already pointed out, you can get around this behavior on Linux and Unix by providing a "-r" flag to the remove command, or even mask it by making rm an alias for "rm -r".

On GUI-based systems, it's much easier to see whether you have folders or files underneath the one you are about to delete, and you can recover files more easily.

If you try to delete a folder on Windows, it asks first (like it does for any deletion), but then goes ahead and deletes the folder and all files and folders underneath without any further prompting. If you make a mistake, it's pretty trivial to restore your folders/files from the Recycle bin.

tcrosley
  • 9,621