I've recently learned that enabling disk write caching can significantly improve system performance. However, I'm concerned about the potential risks of data corruption or loss in case of a sudden power failure.

Here's some context about my setup:

Operating System: Windows Server 2012 R2

Disk Type: SATA 3.0 HDD

Purpose: I'm considering enabling write caching on my disk to boost performance. My understanding is that data corruption can occur if power fails before the data in the write cache has been committed to the disk, if the operating system crashes, or if an application that accesses the data crashes.

During my research, I found the following statement in this Article: "Data corruption occurs without the users awareness when the active disks write cache is enabled and the disk performs a Read Look Ahead (RLA), which is prematurely ended." I could not understand exactly what this statement means.

Are there any cases of data or file corruption occurring after write caching is enabled, even when no write is taking place at the time of the power failure?

Greg Askew
  • 39,132

2 Answers

5

Modern file systems (XFS, ZFS, JFS, ext4, APFS, NTFS, etc.) all use journaling or a comparable mechanism such as copy-on-write, so yes, you’re going to lose some data (the latest commits, plus whatever is still in cache and not yet committed, that’s obvious), but no, you won’t experience file system corruption.

Here’s some good reading with lots of diagrams and detailed explanations about IBM’s JFS, everything within the article is 100% relevant to the other journaling file systems:

https://www.ibm.com/docs/en/aix/7.2?topic=types-journaled-file-system-jfs

Either way… You have to do backups! The so-called “3-2-1 backup rule” is what you should follow.

https://www.starwindsoftware.com/blog/3-2-1-backup-strategy-why-your-data-always-survives

Hope this helped!

-1

Short version: no, with a modern SATA disk and a journaled filesystem it is not possible to corrupt acknowledged (ie: synced) writes, even when the disk cache is enabled. On the other hand, unsynced (buffered) writes can be lost/corrupted in case of power loss. However, the article you linked is about a specific firmware issue and does not describe the generic behavior of disk caching:

While performing extended disk test exercises, a latent firmware issue was discovered.

Long answer: two kinds of writes can be issued:

  • sync writes, which guarantee persistence (and ordering) by leveraging ATA FLUSHes or FUAs;
  • unsynced (buffered) writes, which can be cached, aggregated and reordered by the disk DRAM cache.
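From an application's point of view, the difference between the two kinds of writes can be sketched as below. This is a minimal illustration, not a statement about how any particular program does it: the function names `buffered_write` and `synced_write` are made up for this example, and whether the flush request really reaches the disk platters depends on the OS, the filesystem and the drive honoring it.

```python
import os
import tempfile


def buffered_write(path: str, data: bytes) -> None:
    """Write without requesting a flush (hypothetical helper).

    The data is handed to the OS and may stay in RAM, and later in the
    disk's DRAM cache, for some time before reaching stable storage.
    """
    with open(path, "wb") as f:
        f.write(data)


def synced_write(path: str, data: bytes) -> None:
    """Write and then ask the OS to flush the file (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to commit the file to the device


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    buffered_write(os.path.join(folder, "copy.bin"), b"less important data")
    synced_write(os.path.join(folder, "journal.bin"), b"important data")
```

Both calls return with the same file contents visible to readers; the difference only shows up after a crash or power loss, where just the second one was asked to be durable before returning.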

When dealing with HDDs and consumer SSDs, sync writes are very slow: flushing each individual write means the full per-IO latency is paid on every single write. So, sync writes are generally reserved for the most important IOs: journal commits, databases, email delivery, etc. All other less-important writes (ie: a user file copy) are issued as cached/buffered writes, and their data can be lost if power loss happens at the wrong moment (up until 30-60s after the original write).

Note that ancient PATA and SATA drives lied to the OS, pretending to honor syncs while actually ignoring the required flush. This led to the suggestion of totally disabling the disk DRAM cache (or setting it in read-only mode), so that any written data was really stored on the (durable) disk platters. A disk with its cache disabled effectively treats each write as sync, providing maximum safety guarantees at a great performance cost.

Please note that disabling the disk cache does not mean that buffered writes cannot be lost: if a crash happens before the OS has flushed its own buffers, all unsynced data will be lost. For this reason, and considering that modern (post-2008) disks honor ATA FLUSHes or (post-2015) FUAs, the current common advice is to leave the disk cache enabled and to rely on the OS to flush important writes.
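Relying on flushes for the important writes usually means the application has to ask for them explicitly. One common pattern is to write a new copy of the file, sync it, and only then rename it over the old one. The sketch below assumes that pattern; `durable_replace` is a hypothetical name, the rename is atomic on POSIX systems (on Windows `os.replace` overwrites the target, but treat atomicity there as an assumption), and the directory entry itself is not synced here.

```python
import os
import tempfile


def durable_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data`, syncing the new copy first (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename stays
    # on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # userspace buffer -> OS
            os.fsync(tmp.fileno())  # ask the OS to commit it to the device
        # Only swap the files once the new content has been synced.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "settings.txt")
    durable_replace(target, b"version 1")
    durable_replace(target, b"version 2")
```

With this approach a power loss should leave either the old file or the new one in place, rather than a half-written mix, provided the drive honors the flush.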

SSDs and HW RAID cards with power-loss protection escape this performance/safety tradeoff by having on-board circuitry to safely cache any write (even sync ones). Anyway, when using a HW RAID card, how the disk cache is managed is implementation dependent (ie: PERC controllers disable it for SAS disks, but not for SATA ones).

shodanshok
  • 52,255