Making Writes Durable


[ 2011-March-18 18:27 ]

When a computer tells you your data is saved, you expect it to be there, even if the power fails. It turns out this doesn't always happen because there are all sorts of caches, queues and buffers between your application and the physical platters. However, when a database tells you your data is saved, you (usually) want a strong guarantee that it is going to survive a failure (durability). Since this is hard and I've spent a fair bit of time figuring this stuff out, here are my notes on how to make writes durable on Linux.


The first step is to ensure that the application is telling the operating system to really write the data, and not just to write it to some cache somewhere. There are many ways to do this on Linux, for example calling fsync or fdatasync after writing, or opening the file with O_SYNC or O_DSYNC.
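As a rough illustration of the "really write the data" step, here is a minimal Python sketch that writes a buffer and then calls fsync on both the file and its parent directory. The function name durable_write is made up for this example, and syncing the directory is my assumption about what a newly created file needs. Treat it as one possible approach, not the definitive recipe.

```python
import os

def durable_write(path, data):
    """Write bytes to path, then ask the kernel to flush them to the disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may sit in the page cache indefinitely.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Assumption: the file may be new, so also sync the directory entry.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Note that fsync only covers the operating system's part of the path. Whether the bytes reach the platters still depends on the file system and disk issues described below.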

As a side note, I've found that I get better performance on ext3 and ext4 if I pre-fill the entire file then overwrite it. I suspect this avoids extra block allocation. This is even true on ext4 when calling fallocate, which supposedly pre-allocates space on disk. I suspect this will not be true on filesystems like btrfs or ZFS, which copy-on-write anyway.
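The pre-fill trick can be sketched like this in Python. The helper names prefill and overwrite_at, the 1 MB chunk size and the zero fill are arbitrary choices for the example. Using fdatasync for the overwrite is my assumption: the file size does not change, so only the data should need flushing.

```python
import os

# fdatasync is not available on every platform; fall back to fsync if missing.
_datasync = getattr(os, "fdatasync", os.fsync)

def prefill(path, size, chunk=1 << 20):
    """Create path and fill it with `size` zero bytes so blocks are allocated."""
    zeros = bytes(chunk)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        remaining = size
        while remaining > 0:
            written = os.write(fd, zeros[:min(chunk, remaining)])
            remaining -= written
        os.fsync(fd)
    finally:
        os.close(fd)

def overwrite_at(fd, offset, data):
    """Overwrite part of an already-filled file and flush the data."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written
    _datasync(fd)
```

You would call prefill once when the file is created, then open it with os.O_WRONLY and call overwrite_at for each durable write. Whether this actually beats fallocate will depend on the file system and kernel, so measure it on your own setup.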

File System Issues

For durable writes to actually be durable, the operating system must use a write barrier to inform the disk that the data really should be flushed out of cache and written to disk. For ext3, you must mount the file system with the barrier=1 option to enable write barriers. For ext4, this is enabled by default, so you must make sure you do not specify barrier=0.
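To check the rule above without reading mount output by hand, a small Python sketch can scan text in the format of /proc/mounts. The function name mounts_without_barriers is hypothetical. It assumes the barrier options show up there in the same form they are passed to mount, which may not hold on every kernel. It also treats the nobarrier and bare barrier spellings as equivalents of barrier=0 and barrier=1.

```python
def mounts_without_barriers(mounts_text):
    """Return mount points of ext3/ext4 file systems that look barrier-less.

    ext3: flagged unless barriers are explicitly enabled.
    ext4: flagged only if barriers are explicitly disabled.
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a "device mountpoint fstype options ..." line
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",")
        disabled = "barrier=0" in options or "nobarrier" in options
        enabled = "barrier=1" in options or "barrier" in options
        if fs_type == "ext4" and disabled:
            flagged.append(mount_point)
        elif fs_type == "ext3" and (disabled or not enabled):
            flagged.append(mount_point)
    return flagged
```

To use it, read /proc/mounts into a string and pass it in. An empty list only means nothing obviously wrong was found, not that barriers are proven to work.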

Disk Issues

Finally, you must make sure that your disks are working correctly. I've tested about four "modern" magnetic disks, both SATA and SAS, and I've found that they all work correctly, even with their write caches enabled. However, I've found that cheap flash disks aren't crash safe, so you'll need to buy more expensive "enterprise" class disks. If you have a RAID controller, you must ensure that it is also configured correctly. You'll need to read the manual, since there are too many options for me to describe here. Typically you'll want to read about things like the battery backup refresh cycle, and whether write caches are automatically disabled on the disks attached to the controller.