
This topic has already been discussed here: Understanding block sizes. But I have a few more things to add for my use case.

Generally, most database systems use a default block size of 8 KB, though some allow it to be modified. On the other hand, modern operating systems often use a 4 KB block size for file systems. This discrepancy can result in multiple physical I/O requests to fill a single database page.
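To make the discrepancy concrete, here is a minimal sketch that compares the filesystem block size of a volume with a database page size (the 8 KB page size below is an assumption based on common defaults; verify it for your DBMS, e.g. with `SHOW block_size;` in PostgreSQL):

```python
import os

DB_PAGE_SIZE = 8 * 1024  # assumed 8 KB database page; check your DBMS

# Filesystem block size of the volume holding the data directory
# (replace "/" with your actual data directory mount point)
fs_block = os.statvfs("/").f_bsize

# Number of filesystem blocks touched to read one database page
blocks_per_page = -(-DB_PAGE_SIZE // fs_block)  # ceiling division

print(f"filesystem block: {fs_block} B, "
      f"blocks per {DB_PAGE_SIZE // 1024} KB page: {blocks_per_page}")
```

With a 4 KB filesystem block, one 8 KB page spans two filesystem blocks, which is the "multiple physical I/O requests" case described above.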

A smaller file system block size benefits random reads, such as index lookups, while larger block sizes are advantageous for sequential scans and heap fetches. Considering these points, I have a few questions:

  1. Is there a common practice to align the database block size with the file system block size for OLTP?
  2. In a clustered system (e.g., SQL Server Availability Groups or PostgreSQL streaming replication) with a primary and one or more secondaries, is it acceptable to have different file system block sizes, or is this something that should always be avoided?
  3. For analytical databases or columnar tables, is it beneficial to use a larger block size?
goodfella
  • 589
  • 4
  • 14

1 Answer


Is there a common practice to align the database block size with the file system block size for OLTP?

Not really, no. The best filesystem block size depends on too many other factors: stripe size (or the equivalent for your storage technology), where the storage lives (you may not actually know what remote storage is using under the covers), drivers, the filesystem itself, and then the actual workload of the environment, which is the most important part.

It doesn't help, for example, to set an 8 KB filesystem block size to match an 8 KB database allocation unit only to have most or all of the actual IO be done in 256 KB requests. The relationship between block size and IOPS is generally inverse: as block size increases, IOPS decrease while overall throughput increases.
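That trade-off can be sketched with back-of-the-envelope arithmetic. The per-request overhead and transfer cost below are hypothetical numbers chosen for illustration, not measurements of any real device:

```python
# Hypothetical device model: a roughly fixed per-request cost plus a
# per-byte transfer cost. Larger requests mean fewer requests per second
# (lower IOPS) but more bytes moved per second (higher throughput).
SECOND_US = 1_000_000
PER_REQUEST_OVERHEAD_US = 100   # fixed latency per IO (assumed)
TRANSFER_US_PER_KB = 0.25       # transfer time per KB (assumed)

for bs_kb in (4, 8, 64, 256):
    latency_us = PER_REQUEST_OVERHEAD_US + bs_kb * TRANSFER_US_PER_KB
    iops = SECOND_US / latency_us
    throughput_mb_s = iops * bs_kb / 1024
    print(f"{bs_kb:>4} KB: {iops:8.0f} IOPS, {throughput_mb_s:7.1f} MB/s")
```

Under this model the 4 KB case delivers the most IOPS and the least throughput, and 256 KB the reverse, which is the inverse relationship described above. To get real numbers for your hardware, run a benchmark such as fio at several block sizes instead.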

In a clustered system (e.g., SQL Server Availability Groups or PostgreSQL streaming replication) with a primary and one or more secondaries, is it acceptable to have different file system block sizes, or is this something that should always be avoided?

Ideally, all nodes should be carbon copies of each other. You can run with differences, but depending on your OS/DBMS this may cause unforeseen performance issues; for example, a system doing physical replication at the block level where blocks on one node are 512 bytes and blocks on another are 16 KB (note this is at the hardware level). Having different filesystem block sizes generally works, with the same caveats.

For analytical databases or columnar tables, is it beneficial to use a larger block size?

It depends on the overall IO profile created by the features you use; different features work in different ways. You're better off capturing a trace of the IO profile for your DBMS (for example with ETW on Windows, or strace/blktrace on Linux) to understand what IO sizes and types (synchronous/asynchronous) it prefers for that feature/workload.
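As one low-tech illustration of summarizing such a trace, the sketch below counts request sizes from strace-style lines. The sample lines are hypothetical and simplified (real strace output has more fields); on Linux you could capture real data with something like `strace -f -e trace=pread64,pwrite64 -p <pid>`:

```python
from collections import Counter

# Hypothetical, simplified trace lines: syscall(fd, buf, size, offset) = result
trace_lines = [
    'pread64(42, "...", 8192, 0) = 8192',
    'pread64(42, "...", 8192, 8192) = 8192',
    'pwrite64(43, "...", 262144, 0) = 262144',
]

sizes = Counter()
for line in trace_lines:
    syscall, rest = line.split("(", 1)
    size = int(rest.split(",")[2])  # third argument is the request size
    sizes[(syscall, size)] += 1

# Most frequent (syscall, size) pairs first: this is the IO profile
for (syscall, size), count in sizes.most_common():
    print(f"{syscall:>8} {size:>7} B x {count}")
```

A histogram like this quickly shows whether the workload actually issues IO at the block size you tuned for, or at much larger (or smaller) request sizes.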

Sean Gallardy
  • 38,135
  • 3
  • 49
  • 91