50

The question should be clear from its title. For example, Apache saves its access and error logs in files instead of an RDBMS, no matter how large or small the scale at which it is being used.

For an RDBMS we just have to write SQL queries and it will do the work, while for files we must decide on a particular format and then write regexes, or maybe parsers, to manipulate them. And those might even fail in particular circumstances if great care was not taken.
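
To make the contrast concrete, here is a minimal sketch (my own illustration, assuming the standard Apache combined log format and a hypothetical access_log table) of what reading a file-based log takes in Python, next to the one-line SQL it stands in for:

    import re

    # Matches the start of an Apache "combined" format line, e.g.:
    # 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326 ...
    LOG_PATTERN = re.compile(
        r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    )

    def parse_line(line):
        """Return a dict of fields, or None if the line doesn't match."""
        match = LOG_PATTERN.match(line)
        return match.groupdict() if match else None

    # The RDBMS equivalent would simply be:
    #   SELECT * FROM access_log WHERE status = 404;
    with open("access.log") as f:
        not_found = [r for r in map(parse_line, f)
                     if r is not None and r["status"] == "404"]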

Yet everyone seems to prefer the filesystem for maintaining logs. I am not biased against either of these methods, but I would like to know why it is practiced like this. Is it speed, or maintainability, or something else?

Yasir

9 Answers

41
  1. Too many things can fail with the database, and logging these failures is important too.

  2. Unless you have a database system allowing autonomous transactions (or no transactions at all), logging would require a separate connection, so that a rollback or commit in logging doesn't interfere with a rollback or commit in the application.

  3. Many things worth logging happen during startup, i.e. possibly before the database connection has been established.

  4. In what could be a typical setup, a new logfile is created every day, and old log files are compressed and kept for two weeks before eventually being deleted. It's not easy to do the same in an RDBMS (a filesystem-side sketch follows this list).
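
For point 4, a sketch of how cheap this is on the filesystem side, using Python's standard logging module (the compression hook follows the pattern from the logging cookbook; the file names are placeholders):

    import gzip
    import logging
    import logging.handlers
    import os

    # Rotate at midnight, keep 14 days of old logs, gzip each rotated file.
    handler = logging.handlers.TimedRotatingFileHandler(
        "app.log", when="midnight", backupCount=14
    )

    def namer(default_name):
        # Rotated files get a .gz suffix.
        return default_name + ".gz"

    def rotator(source, dest):
        # Compress the just-rotated file, then drop the uncompressed copy.
        # (Pruning of old .gz files depends on the handler's filename
        # matching, which can vary across Python versions.)
        with open(source, "rb") as sf, gzip.open(dest, "wb") as df:
            df.writelines(sf)
        os.remove(source)

    handler.namer = namer
    handler.rotator = rotator

    logging.basicConfig(level=logging.INFO, handlers=[handler])
    logging.info("application started")

Achieving the same retention policy in an RDBMS typically means scheduled DELETE or partition-drop jobs plus a separate archival step.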

user281377
17

I've seen logs written to the DB before (and sometimes you get configurable options for logging, where trace goes to a file, errors to the DB, and fatals to the Windows Event Log).

The main reasons are speed and size: enabling some tracing can produce vast, vast quantities of logging; I've trawled through log files gigabytes in size. The other main reason is that reading the logs only needs to be sequential; there's no real need to query the log, except to find a certain error or entry, and find-in-file works perfectly well for that.

gbjbaanb
16

Speed is one reason; others are:

  • Eliminating points of failure. A filesystem rarely fails under conditions where a DBMS wouldn't, but there are lots and lots of error conditions in databases that simply don't exist in filesystems.
  • Low-tech accessibility. If things go really, really bad, you can boot into a rescue shell, or mount the disk on a different system, and still have adequate tools available to inspect log files. If it's a database, you're nowhere without a running database server.
tdammers
3

First off:

"And those might even fail in particular circumstances if great care was not taken."

Database transactions can't fail when you are not careful?

Writing to a text file has a number of benefits, the most important being:

  • Text is human-readable. Anyone can open up a log file with a basic text editor and see what the messages are. You don't need to understand how the database is organized.
  • Speed. Writing text to disk is much faster than having a database service figure out where the text goes in a database, write it there, and ensure the transaction completed.
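
A rough micro-benchmark illustrating the second point (my own sketch; SQLite is embedded rather than a networked service, so the gap against a real database server would typically be larger, and absolute numbers vary by machine):

    import sqlite3
    import time

    N = 10_000
    MESSAGE = "2024-01-01 12:00:00 INFO something happened\n"

    # Plain file append.
    start = time.perf_counter()
    with open("bench.log", "a") as f:
        for _ in range(N):
            f.write(MESSAGE)
    file_secs = time.perf_counter() - start

    # One durable transaction per log entry, as naive DB logging would do.
    conn = sqlite3.connect("bench.db")
    conn.execute("CREATE TABLE IF NOT EXISTS log (message TEXT)")
    start = time.perf_counter()
    for _ in range(N):
        conn.execute("INSERT INTO log (message) VALUES (?)", (MESSAGE,))
        conn.commit()
    db_secs = time.perf_counter() - start
    conn.close()

    print(f"file append: {file_secs:.3f}s, insert+commit: {db_secs:.3f}s")
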
user
2

You raise Apache specifically, so I will discuss this in detail.

Apache can be configured to log to a database, although it requires an external plugin to do so. Using such a plugin can make log analysis easier, but only if you intend to write your own log-analysis software; standard off-the-shelf log analysers assume your logs are in files, so you won't be able to use them.
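
(One such route, without a dedicated plugin, is Apache's built-in piped-log mechanism: point CustomLog at a program and it receives each access-log line on stdin. The script below is a hypothetical sketch of that idea, with an illustrative one-column schema; note that it also reproduces the hazard described next, since a stalled insert backs up the pipe and blocks Apache.)

    #!/usr/bin/env python3
    # Hypothetical piped logger; wired up in the Apache config with e.g.:
    #   CustomLog "|/usr/local/bin/log_to_db.py" combined
    import sqlite3
    import sys

    conn = sqlite3.connect("/var/log/apache_access.db")  # illustrative path
    conn.execute("CREATE TABLE IF NOT EXISTS access_log (line TEXT)")

    for line in sys.stdin:
        conn.execute(
            "INSERT INTO access_log (line) VALUES (?)", (line.rstrip("\n"),)
        )
        conn.commit()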

When I was doing this, I also experienced reliability issues: if the database server's write buffer fills up (which can happen with MySQL if you use up the filesystem quota for the user it runs under), it starts queuing up queries until they are able to proceed, at which point Apache starts waiting for them to finish, resulting in hung requests to your web site.

(This issue may have been fixed since, of course; it was many years ago that I did this.)

Jules
1

A filesystem is a database. It's a simpler, hierarchical database rather than a relational DBMS, but it's a database nevertheless.

The reason logging to a filesystem is popular is that text logs fit well with the Unix philosophy: "Text is the universal interface."

Unix developed with lots of general-purpose tools that work well with text logs. It doesn't matter whether the text logs are produced by mysql, apache, your custom application, or third-party software that's long out of support; the sysadmin can use standard Unix tools like grep, sed, awk, sort, uniq, cut, tail, etc., to trawl through the logs all the same.
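
The same idea in code: because logs are just lines of text, one generic snippet (my own grep -c-style illustration) works on any application's logs without knowing their format.

    import sys
    from collections import Counter

    def count_matches(paths, needle="ERROR"):
        """Count lines containing `needle` in each file, like grep -c."""
        counts = Counter()
        for path in paths:
            with open(path, errors="replace") as f:  # tolerate odd encodings
                counts[path] = sum(1 for line in f if needle in line)
        return counts

    if __name__ == "__main__":
        for path, n in count_matches(sys.argv[1:]).items():
            print(f"{path}: {n}")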

If every app logs to its own database, one to MySQL, another to Postgres, another to Elasticsearch, another to ELK, and yet another that can only log to MongoDB, then you would have to learn twenty different tools to trawl the logs of each application. Text is a universal medium that everyone can log to.

Even when you manage to make all logs go to a single database, say MySQL, you may find that each application wants to log with a different table schema, so you still have to write a customized tool to query the logs of each application. And if you somehow crammed every application into logging to a single schema, you'll likely find that the generic schema can't really tell you the full story of each application, so you still have to parse the log text anyway.

Logging to a database often doesn't really make things significantly easier in practice.

Logging to a database can be useful when you have a specific analysis in mind, or a specific audit-retention requirement, for which you can design a specific database schema to collect just the data for those purposes. But for forensics and debugging, when you collect logs without a specific objective in mind, text logs are usually good enough that the cost of learning or creating the specialized tools often isn't worth it.

Lie Ryan
0

Let's look at this on a few layers:

  1. Machine layer
  2. Operating system layer
  3. Service layer
  4. Application layer

In brief:

  • On the machine layer, you really cannot do logging other than some sort of dumps.
  • On the OS layer you can do logging but you really only have the file system available.
  • Services can log to the file system, but they cannot trust other services to be running, so they cannot log there.
  • Applications can log to services and the file system.

Then we have the use-case based approach:

Do you want to log node-specific errors to a horizontally scaled RDBMS, where you need to do extra work to find the error of a specific node, when you could just pop open the hood on that one node and see it there? On the other hand, your application possibly should log to an RDBMS to gather application-level errors and notices.

What happens when the RDBMS needs to do logging for itself because the database cannot be written to?

ojrask
-2

Complexity. Adding an RDBMS will increase the complexity of the whole system astronomically. And the ability to manage complexity is the main thing that distinguishes programmers from source-code producers.

noonex
-5

Is it speed or maintainability or something else?

Speed.

S.Lott