
My sysadmin is telling me that we should remove old static files from a server and store them in a database instead because having too many files on a filesystem impacts the general performance of the system. Is the impact significant? We have about 20,000 files in a directory at the moment, and would expect to hit 100,000 sometime in the next few years. This is on a relatively recent Ubuntu LTS system. If 100,000 isn't significant, then what number would be?

Edit: This is different from "Maximum number of files in one ext3 directory while still getting acceptable performance?" because I don't care about directory performance, but rather about total system performance once the number of files on the system reaches some arbitrary number. In my specific case, the sysadmin is arguing that Apache will slow down due to the total number of files on the entire system.

samspot
  • 347

1 Answer


Since ext3, looking up a file in a directory is at least as fast as finding an indexed row in a database. The directory index that makes this possible is called an HTree, a hash-based variant of a B-tree (many database indexes still use a plain B-tree).

http://en.wikipedia.org/wiki/HTree

Older filesystems would start having problems at around 1,000 files in a directory because the lookup was linear: start from the first entry and go through the entire directory until you find the file you are interested in.
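If you would rather measure than argue, here is a minimal sketch that times name lookups in a directory of a given size. The function name `time_lookups` and the file naming scheme are made up for illustration. Note that it measures warm-cache lookups on whatever filesystem holds your temp directory, so treat the numbers as a rough indication only.

```python
import os
import tempfile
import time


def time_lookups(num_files, num_lookups=1000):
    """Create num_files empty files in a fresh temp directory, then return
    the average time in seconds for one os.stat() lookup by name."""
    with tempfile.TemporaryDirectory() as directory:
        for i in range(num_files):
            # Opening in "w" mode creates an empty file.
            with open(os.path.join(directory, f"file{i:07d}.txt"), "w"):
                pass

        start = time.perf_counter()
        for i in range(num_lookups):
            # Spread the lookups across the whole directory.
            name = f"file{(i * 7919) % num_files:07d}.txt"
            os.stat(os.path.join(directory, name))
        elapsed = time.perf_counter() - start

    # The temp directory and its files are removed on leaving the block.
    return elapsed / num_lookups
```

Calling `time_lookups(20_000)` and `time_lookups(100_000)` and comparing the two averages should show whether the per-lookup cost grows with the directory size on your system.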

Why use a database, then?

PRO

You only need to move the database from one computer to another (think of a cloud system...), which helps especially if you want automatic replication between computers.

CON

All the data you send to the database goes through the network! That is a huge bottleneck. If you do not foresee using the replication feature of your database, then that's enough (for me) to avoid using the database: it will have a HUGE impact on your system. Use the file system directly, since the database will be doing the same thing anyway: saving the data to a file!

P.S. Your admin seems to be from the past...

P.P.S. "ext3 HTree indexes are available in ext3 when the dir_index feature is enabled." -- I use ext4 so I don't worry too much about that, although it can be turned off in ext4; hopefully it is turned ON on your server...
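To check whether dir_index is on, you can look at the "Filesystem features" line printed by `tune2fs -l`. A small sketch that does this from Python follows; the helper names are made up, the device path you pass in (something like /dev/sda1) is an assumption you have to replace with your own, and `tune2fs` usually needs root.

```python
import subprocess


def has_dir_index(tune2fs_output):
    """Return True if 'dir_index' appears in the 'Filesystem features' line
    of `tune2fs -l` output."""
    for line in tune2fs_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Filesystem features":
            return "dir_index" in value.split()
    return False


def check_device(device):
    """Run `tune2fs -l` on the given device and report whether dir_index
    is enabled. The device path is supplied by the caller."""
    result = subprocess.run(
        ["tune2fs", "-l", device],
        capture_output=True, text=True, check=True,
    )
    return has_dir_index(result.stdout)
```

The parsing is kept separate from the subprocess call so that you can also paste saved `tune2fs -l` output into `has_dir_index` without running anything as root.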

Alexis Wilke
  • 2,496