12

Let's imagine a web site that is a directory of people. For each person there may be a profile photo and a biography.

I'll admit my SQL queries could be better but in general what would be faster and use less processing power.

To check if a file exists and then open it or

check against MySql to see if a bio exists and display it.

I'm pretty sure in the above case the filesystem will smoke the mysql database.

What if I make the database a read only delimited txt file?

What's faster in this case?

Is there a certain point where if the txt file has too many records it's better to use MySql?

4 Answers4

20

The file system is useful if you are looking for a particular file, as operating systems maintain a sort of index. However, the contents of a txt file won't be indexed, which is one of the main advantages of a database. Another is understanding the relational model, so that data doesn't need to be repeated over and over. Another is understanding types. If you have a txt file, you'll need to parse numbers, dates, etc.

So - the file system might work for you in some cases, but certainly not all.

Rob Farley
  • 16,324
  • 2
  • 39
  • 61
15

It really depends on what you are doing. In general the speed at which you can open a file for reading will be better than the speed at which you can establish a network connection. So for very simple operations, the filesystem is definitely faster. Filesystems will probably beat an RDBMS for raw read throughput too since there is less overhead. In fact, if you think about it, the database can never be faster than the filesystem it sits on in terms of raw throughput.

For very complex operations, the filesystem is likely to be very slow. For example:

Read 10 lines out of this 1 billion line file and then search for matching lines in this other file. I pity you if you have to do this. A good database server however has strategies for doing this fast and well so you aren't reinventing the wheel.

Additionally you really need to figure out what you are doing. What data are you storing? How are you going to transform it? If it is 100k image files your solution will look very different than if it is a directory for 100k people. (LDAP maybe? Or an SQL database? Depends on what you are doing, perhaps.) The key here is to pick the tools that match what you are doing and which give you room to add more uses, rather than whatever seems fastest for some rather abstract use case. Databases are wonderful tools, but you can't get a good answer to a question like this.

Finally premature optimization is the root of all evil. Choose useful tools now and figure out the rest later.

Chris Travers
  • 13,112
  • 51
  • 95
5

The file system might be faster initially, but I doubt it. However, as your data size increases you will likely have to restructure your file system to maintain performance. Besides their obvious ability to index on multiple attributes, databases tend to scale better.

Web caches which work similarly to what you are considering use directory tree to maintain performance. They also tend to be of a relatively fixed scale, so they don't have to deal with a growing scale.

For this kind of application I would start with a database, as it fits better with your needs. It will scale much better in the long run. Compared to most file systems, a database will also be more space efficient.

BillThor
  • 4,518
  • 20
  • 12
0

I always love coming to these forums and reading all the heavy database gurus pitching that the file System can't do it as quick as a database. On quite the contrary, a properly laid out tree, well designed hash tables and saving them as an object to a file will yield the same speeds as a database. And from my tests, a properly designed hash table and directory tree will win every time. Way less overhead.

Recently, I have been moving away from database driven programming and more to the file tree for simplicity and program portability. No DB means easy backup - just zip up your tree and go. It is very nice and a recommendation to program in this fashion for one-time clients with small applications.

Look at the big picture: Do I have the time to design my own or just leverage what is already there, like a DB. I personally like saving my objects to file and using them later - just keep an eye on the size of your tables and look into using a RandomAccessFile in order to be able to seek quickly, lay it out like a database and break it up into hash table objects. Enjoy.

Remember whatever data you store in the file will consume double the memory at times depending on your code, the hash table itself and typically where you consume it to view.

matronator
  • 103
  • 4
JDeCarlo
  • 39
  • 1