Intel Consumer SSDs: Not appropriate for databases

about | archive

[ 2010-September-03 10:04 ]

I have been testing one of Intel's "consumer" SSDs (the X25-M G2) to see if it stores data durably, meaning that if the disk claims the data has been written, it actually survives a power failure. This is important because you want your airline ticket to stay purchased after you buy it, even if the system crashes. The conclusion is that this SSD can lose data in power failures, even with the write cache disabled. This means that if you are using it for a database, committed transactions could be lost (meaning lost airline tickets, forgotten bank deposits, etc.). I believe this is a "bug" in this device, as this is not how disks are supposed to work. The good news is that I was able to test Intel's "enterprise" SSD (X25-E), and it seems to work as expected. Unfortunately, other SSDs have similar problems. In this article, I'll describe the bug, the real-world impact, and how I tested it. I'm still contacting some experts to see if this observation agrees with what they have found, so I'll update this if I discover new information.

The Bug and the Real World

Intel's X25-M G2 disk loses data approximately 25% of the time when the disk loses power, even when the disk is instructed to flush its cache (write barriers on the ext4 file system), or with the write cache disabled. While disabling the cache doesn't solve the problem, it does make it rarer; I was only able to trigger it with writes of 16 kB or larger. I tested three magnetic disks and Intel's X25-E, and only the X25-M G2 lost data during this test.

This really only matters if you are using this disk for a database where the data is critical. If your server loses power (either due to the power failing, or your UPS failing), you could lose data. You can reduce the chance of data loss with the following:

Detailed Summary of Results

I tested three devices: Intel's X25-M G2 SSD (80 GB), Intel's X25-E (64 GB) and a Western Digital SATA magnetic disk (WD3200AAKS, 320 GB). Only the Intel X25-M G2 lost data, and it did so in the following conditions:

Testing Methodology

A program runs that writes a sequence number to disk, then reports the number. While it is running you "crash" the system, reboot, and check what data exists on disk. If sequence number x was reported as written, then the last value written should be x, x+1, or some partial version of x+1. If the last complete record is less than x, then data has been lost. I did the following test five times for each configuration:

  1. Start logfilecrashserver on a workstation:

    ./logfilecrashserver 12345
  2. Start minlogcrash on the system under test:

    ./minlogcrash tmp workstation 12345 131072
  3. Once the workstation starts receiving log records, pull the power from the back of the disk.
  4. Power off the system (my system doesn't support hotplug, so pulling the power on the disk makes it unhappy; if your system supports hotplug, this may not be needed).
  5. Reconnect power to the disk.
  6. Turn on the test system.
  7. Observe the output file using hexdump.

The output of hexdump should show that the file has at least the last record reported by logfilecrashserver.
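The core of this test can be sketched in a few lines. The sketch below is a minimal re-creation, not the actual minlogcrash source; the record format (an 8-byte little-endian sequence number padded with zeros) and the function names are my own assumptions. The writer forces each record to disk before the sequence number is reported, and the checker recovers the last complete record after the crash:

```python
import os
import struct

HEADER = struct.Struct("<Q")  # 8-byte sequence number; format is an assumption
fsync = getattr(os, "fdatasync", os.fsync)  # fdatasync where the OS provides it

def append_record(fd, seq, record_size):
    """Write one fixed-size record containing seq, then force it to disk.
    Only after fsync returns is it safe to report seq as durable."""
    record = HEADER.pack(seq) + b"\0" * (record_size - HEADER.size)
    os.write(fd, record)
    fsync(fd)

def last_complete_record(path, record_size):
    """Return the sequence number of the last complete record, or None.
    A trailing partial record is expected after a crash and is ignored."""
    complete = os.path.getsize(path) // record_size
    if complete == 0:
        return None
    with open(path, "rb") as f:
        f.seek((complete - 1) * record_size)
        return HEADER.unpack(f.read(HEADER.size))[0]
```

After a crash, if the server last reported sequence number x, then last_complete_record should return x or x+1; anything smaller means the disk acknowledged a flush it had not actually performed.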

Durable Writes

For a write to actually be durable, all the layers in the stack need to cooperate, so that "save to disk" actually means "I mean it: make sure all the data is on the disk so that I can read it back if something bad happens." On Linux with the ext3 and ext4 file systems, this works when write barriers are enabled (the default on ext4; they must be manually enabled on ext3). This feature causes the operating system to issue a CACHE FLUSH command to the disk when an application calls fsync or fdatasync, which is exactly what databases, and my test program, do. If the disk works correctly, it waits until all the data is actually on the disk before acknowledging that the flush operation has completed.
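The layers matter: a language-level flush only moves data from the application's buffer into the kernel's page cache, while fsync is what triggers the kernel to write the data out and (with barriers enabled) send the CACHE FLUSH to the device. A minimal sketch of an append that crosses all the layers, assuming a POSIX system:

```python
import os

def durable_append(path, data):
    """Append data so it should survive a crash, assuming the disk honors
    cache flushes. os.write puts data in the OS page cache; os.fsync asks
    the kernel to write it out and, with barriers on, flush the disk cache."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # do not report success until this returns
    finally:
        os.close(fd)
```

Note that calling flush() on a buffered file object is not enough: it leaves the data in the page cache, where a power failure can still destroy it. The bug described here is one layer lower: the X25-M G2 acknowledges the flush before the data is truly on stable media.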

Similar Reports

I am not the only person to observe these problems with Intel SSDs. Others have found that the X25-E and the X25-M G2 both lose data with the write cache enabled. However, it appears that I am the first to report this type of problem with the write cache disabled, perhaps because I am the first to test writes larger than 4 kB. I've reported this to the ext4 developers, in an attempt to make sure I'm not making an error. Similar issues have been reported for magnetic disks in the past, although with write barriers enabled (the default for ext4; use the barrier=1 mount option for ext3) this should not be a problem. However, it is still worth testing your configuration. My test program was inspired by Brad's diskchecker.

Additional Boring Test Details

Configurations Tested

Failure Modes Observed