Should a frequently-growing log file be stored as a table or a blob in the database?

Question

I'm an undergrad and haven't worked with any actually database yet so my apologies if I misunderstand/misuse any terms. My question comes from an assignment of mine, but this is my attempt at learning and not having my homework done for me.

The assignment, paraphrased:

A university's printing service allows students on the campus to upload their files to the system and send uploaded files to one of the university's printers for printing... The system logs all printings done by all students, recording the student's ID, which printer was used, the print file's name as well as the time of printing. An admin of the system should be able to view the log file filtered to any students, printers and/or time period (to my knowledge this requires indexing).

A question I come across is which data storage approach (database or file system) I should be using for the student's files and the log file, with both being frequently-growing data.

From this SO question I gathered that frequenting updating files are better stored as file system storage, and from this question that files could be stored in a database with blobs.

My questions are:

Should the log file be stored with database or file system? (Or I reckon the determining question would be "Is it possible to filter by multiple values without indexing?")
If student files are stored with a blob related to a student in the database table, is it possible for each student to only have access to their own files?
Why is the file system better fitted for frequently-growing files?
Is it possible (or practical) for a server to use both database and file system?

Ergest Basha · Accepted Answer · 2023-11-05T17:13:20.840

Based on the following which is the main concern

An admin of the system should be able to view the log file filtered to any students, printers and/or time period (to my knowledge this requires indexing)

you should store the data in a database. Surely logs using files are more efficient, however logs stored in the database are easier to read and maintain and it seems that you need reading a lot.

Two tables are enough if you will have not descriptions for printers or keeping info about students (in your question it is mentioned that only studentid, printerid, date and print description is needed).

The following example is based on MySQL (syntax might differ on different DBMS)

printed table

CREATE TABLE `printed` (
  `StudentId` int NOT NULL,
  `PrintedFileId` int NOT NULL,
  `PrinterId` int NOT NULL,
  `PrintedDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`StudentId`,`PrintedFileId`),
  KEY `PrintedFileId` (`PrintedFileId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

printed_details table

CREATE TABLE `printed_details` (
  `PrintedFileId` int NOT NULL,
  `PrintedFilename` varchar(30) DEFAULT NULL,
  `PrintedFileDetails` text,
  PRIMARY KEY (`PrintedFileId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

With proper indexes speed will not be a problem.

score 0 · Answer 2 · answered Nov 06 '23 at 12:30

Should the log file be stored with database or file system? (Or I reckon the determining question would be "Is it possible to filter by multiple values without indexing?")

The "filtering" aspect of viewing the logs strongly indicates a database solution.
All the complexity of "understanding" the stored record format and pulling out the bits you're interested in is all taken care of by your database. Put it into a file and you have to reinvent all that work.

Also, leaving the data in the database gives you far better security controls over it. Once it's "out there" in a file, it's much harder to keep things secure.

If student files are stored with a blob related to a student in the database table, is it possible for each student to only have access to their own files?

Yes. "Row Level Security" is a commonly-used term for this.
However, IMHO, your Students should not have access to the Database at all.
The Application that they use to look at stuff should use a separate (i.e. secret) Service account. End Users should not have their own [database] accounts inside the database.

Why is the file system better fitted for frequently-growing files?

Simply put, it's what file systems are built for.
Writing new records to a table is what DBMSs are built for, but extending the values inside existing rows? Not so much.

Is it possible (or practical) for a server to use both database and file system?

Yes, but unnecessary.

However you store this data, though, remember that it's only ever going to grow unless you put some House-Keeping in place for it.
Will anyone really care that person 'X' printed out document 'Y' seven-and-a-half years ago? It's unlikely, so you need to have processes in place to tear down the data that you genuinely no longer need (or are, [legally], no longer required/allowed to keep - please see "GDPR" if this is any kind of "surprise" to you).

Should a frequently-growing log file be stored as a table or a blob in the database?

2 Answers2