
From time to time I run into problems when server hard disks (Linux) fill up quickly with lots of small files. When this happens I have to figure out how much space is being used and where the files that are using it are. This can be a surprisingly frustrating task because:

  1. Just doing simple things like running ls in a directory with a lot of files can take a long time.
  2. df is fast, but imprecise: it only reports totals for a whole filesystem.
  3. du is accurate and can tell you where all your space is going, but takes forever to run.
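A minimal Python sketch of what a du-style scan has to do, to illustrate why it is slow here: it must stat() every single entry, so the run time grows with the number of files rather than with the amount of data. `du_by_child` is a hypothetical helper, not part of any tool. Like `du`, it counts allocated blocks and counts hard-linked files only once; unlike `du -x`, it does not stop at filesystem boundaries.

```python
import os
from collections import defaultdict


def du_by_child(root: str) -> dict[str, int]:
    """Sum allocated bytes under each immediate child of `root`, du-style.

    Every entry has to be stat()ed individually, so the run time grows with
    the number of files, not with the amount of data stored.
    """
    totals = defaultdict(int)
    seen_inodes = set()  # (st_dev, st_ino) of hard-linked files already counted
    stack = [(root, None)]  # (directory to scan, top-level child it is counted under)
    while stack:
        path, bucket = stack.pop()
        try:
            with os.scandir(path) as it:
                entries = list(it)
        except OSError:
            continue  # unreadable directory: skip it and keep going
        for entry in entries:
            key = entry.name if bucket is None else bucket
            try:
                st = entry.stat(follow_symlinks=False)
            except OSError:
                continue
            is_dir = entry.is_dir(follow_symlinks=False)
            if st.st_nlink > 1 and not is_dir:
                inode = (st.st_dev, st.st_ino)
                if inode in seen_inodes:
                    continue  # another link to a file we already counted
                seen_inodes.add(inode)
            # st_blocks is in 512-byte units, i.e. space actually allocated on disk
            totals[key] += st.st_blocks * 512
            if is_dir:
                stack.append((entry.path, key))
    return dict(totals)
```

For example, `du_by_child("/srv/data")` (a placeholder path) returns a dict mapping each top-level entry to its allocated bytes; sorting it by value gives roughly what `du -s *` piped through `sort` would.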

I want to know, quickly and accurately, where all my space is going on a hard disk where terabytes of space may be occupied by millions of small files.

It seems that this is impossible with conventional filesystems (if not, I'd like to hear about it).

My question is whether any of the new filesystems available on Linux (btrfs, zfs, reiserfs etc.) have any super-clever features that might help with this problem. For example, I can imagine some kind of log, constantly updated on every write, that records the amount of space occupied at each branch of the filesystem. Then answering my question would just be a matter of reading the log.
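The log imagined above can be modeled as a per-directory ledger: each write adds the file's size change to every ancestor directory, so asking how much is under a directory becomes a single lookup instead of a tree walk. This is a hypothetical Python sketch of the idea, not a description of how btrfs, ZFS or ReiserFS actually work; `UsageLedger` and its methods are invented for illustration, and it ignores renames and hard links.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class UsageLedger:
    """Hypothetical per-directory usage ledger (not a feature of any real filesystem).

    Each size change of a file is added to every ancestor directory, so
    "how much is under X?" is a dictionary lookup instead of a full scan.
    """

    def __init__(self) -> None:
        self._file_sizes = {}               # path -> current size in bytes
        self._dir_totals = defaultdict(int)  # directory -> bytes underneath it

    def record_size(self, path: str, new_size: int) -> None:
        """Call on every create/write/truncate with the file's new size."""
        p = str(PurePosixPath(path))
        delta = new_size - self._file_sizes.get(p, 0)
        self._file_sizes[p] = new_size
        for parent in PurePosixPath(p).parents:  # e.g. /srv/a, /srv, /
            self._dir_totals[str(parent)] += delta

    def record_delete(self, path: str) -> None:
        """Call when a file is removed: its size drops to zero everywhere above it."""
        self.record_size(path, 0)
        del self._file_sizes[str(PurePosixPath(path))]

    def usage(self, directory: str) -> int:
        """Bytes currently recorded under `directory`."""
        return self._dir_totals.get(str(PurePosixPath(directory)), 0)
```

The trade-off is that every write now costs one update per directory level above the file, which is the price paid for making the "where is my space?" query cheap.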

That's just an example of the kind of feature that might help, but I am asking for any feature at all that might help answer the question: tell me, quickly and accurately, exactly where the space is being used on my hard disk.

Thanks, Tom

ewwhite

2 Answers


Of the filesystems you mentioned, I only have experience with ZFS. With ZFS you can create hierarchical datasets, so for example you could create:

  • tank/category
  • tank/category/product
  • tank/category/product/a
  • tank/category/product/b

etc.

With the command "zfs list" you can then get the used, available and referenced space for each of them within seconds. But of course this only works if your application can split its data up the right way.
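If you want to script this, `zfs list -H -p` prints tab-separated, exact byte counts without a header line, which is easy to parse. A minimal Python sketch; `parse_zfs_list`, `zfs_usage` and `DatasetUsage` are hypothetical names, and `tank` is just the pool name from the example above. USED includes all descendant datasets, while REFER is the data referenced by that dataset alone.

```python
import subprocess
from typing import NamedTuple


class DatasetUsage(NamedTuple):
    name: str
    used: int   # bytes used by the dataset and all of its descendants
    avail: int  # bytes available to the dataset
    refer: int  # bytes referenced by this dataset itself


def parse_zfs_list(output: str) -> list[DatasetUsage]:
    """Parse `zfs list -H -p -o name,used,avail,refer` output.

    -H suppresses the header and separates fields with tabs;
    -p prints exact byte counts instead of human-readable sizes.
    """
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        name, used, avail, refer = line.split("\t")
        rows.append(DatasetUsage(name, int(used), int(avail), int(refer)))
    return rows


def zfs_usage(root: str) -> list[DatasetUsage]:
    """Run `zfs list` recursively under `root` (needs the zfs CLI and suitable permissions)."""
    out = subprocess.run(
        ["zfs", "list", "-H", "-p", "-r", "-o", "name,used,avail,refer", root],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_zfs_list(out)
```

For example, `sorted(zfs_usage("tank/category"), key=lambda d: d.refer, reverse=True)` would list the datasets holding the most data of their own first.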

Jeroen

I still use ncdu with my ZFS filesystems. It's even more important now, as it is sparse-file aware and helps make sense of compressed ZFS filesystems.
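The sparse-file issue comes from a file having two different sizes: the apparent size (`st_size`, what `ls -l` shows) and the space actually allocated (`st_blocks * 512`, what GNU `du` reports unless given `--apparent-size`). A small Python sketch that reads both; `sizes` and `make_sparse` are hypothetical helpers. Sparse files, and compressed data on ZFS, can have an allocated size far below their apparent size, which is why a tool that can show either view is useful.

```python
import os


def sizes(path: str) -> tuple[int, int]:
    """Return (apparent_bytes, allocated_bytes) for a file, without following symlinks.

    apparent  = st_size (the logical length of the file)
    allocated = st_blocks * 512 (space actually used on disk)
    """
    st = os.lstat(path)
    return st.st_size, st.st_blocks * 512


def make_sparse(path: str, length: int) -> None:
    """Create a file of `length` bytes that is entirely a hole (no data written)."""
    with open(path, "wb") as f:
        f.truncate(length)
```

On a filesystem that supports holes, `make_sparse("hole.bin", 10 * 1024 * 1024)` followed by `sizes("hole.bin")` should report an apparent size of 10 MiB with very little allocated; the exact allocated figure depends on the filesystem.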

See: How can I determine what is taking up so much space?

ewwhite