Building Reliable Storage on Virtual Infrastructure (evanjones.ca)

[ 2009-August-07 17:33 ]

I have had a few discussions recently about storing data reliably on so-called "Infrastructure as a Service" (IaaS) platforms, such as Amazon's EC2 or Rackspace Cloud. These services provide a virtual machine hosted in some data center somewhere. You can use the local disk on this virtual machine to store data. The catch is that for some types of failures, the data stored on that disk will be lost. This makes life easier for the infrastructure provider, since they don't need to worry about attempting to recover data from failed disks, or moving disks from failed machines. The problem is that users panic when they hear that they can't trust the data on the local disks. However, from my perspective, this is not a new problem.

Running Directly on Physical Hardware

Traditionally, a service durably stores data by writing it to a hard drive. If this hard drive fails, the data is gone. To work around this limitation, many servers use a RAID array, so that the data survives at least one hard drive failure. If the server itself fails, the disks must be attached to a different system in order to recover the data. If multiple disks fail, then the data can still be lost. Therefore, critical data must still be backed up.

Running on Virtual Hardware

When deploying a service on virtual infrastructure, the same kinds of problems can occur. The difference is that because you no longer have access to the physical hardware, you can't move hard drives from a failed server to a working backup. Rather, if the server dies, your data is gone. This is not really a new type of failure, and services should already be designed to handle it. However, this type of failure might be more common on virtual hardware, since the data center operator may need to bring a physical host server down for many reasons, such as software upgrades. Note that a reboot does not count as a failure here. If the physical host reboots, the virtual machine will still see the data on its hard drive. Amazon's EC2 documentation explains this as follows: "If an instance reboots (intentionally or unintentionally), the data on the instance store will survive. If the underlying drive fails or the instance is terminated, the data will be lost."

Replication on Virtual Infrastructure

It is still possible to reliably store data on virtual infrastructure. The answer is replication. The data must be stored multiple times on systems that fail independently. On Amazon's EC2, you can use their Elastic Block Store service to get RAID-like reliability guarantees with a disk interface. An alternative approach is to replicate the entire service across multiple virtual machines. In this case, if a single system dies, the data is still available on the backups. This approach works well on an infrastructure like Amazon EC2, assuming that the virtual systems fail independently.
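To make the independence assumption concrete, here is a minimal sketch in Python. It assumes you already know which physical host each replica runs on. The replica names, host names, and the `needed` threshold are all hypothetical, and the majority rule is just one common choice for quorum-based services. The function asks whether enough replicas remain after the worst possible single-host failure.

```python
from collections import Counter


def majority(n):
    """Smallest number of replicas that forms a strict majority of n."""
    return n // 2 + 1


def survives_any_single_host_failure(placement, needed):
    """Return True if at least `needed` replicas remain after any one host dies.

    placement: dict mapping replica name -> physical host identifier
               (both are illustrative labels, not real EC2 identifiers).
    needed:    how many replicas must survive for the service to keep working.
    """
    replicas_per_host = Counter(placement.values())
    # The worst case is losing the host that carries the most replicas.
    worst_loss = max(replicas_per_host.values(), default=0)
    return len(placement) - worst_loss >= needed


# Three replicas on three different hosts: the failures are independent.
independent = {"r1": "hostA", "r2": "hostB", "r3": "hostC"}
# Three replicas, but two of them share a physical host.
colocated = {"r1": "hostA", "r2": "hostA", "r3": "hostB"}

print(survives_any_single_host_failure(independent, majority(3)))
print(survives_any_single_host_failure(colocated, majority(3)))
```

With one replica per host, losing any host leaves two of three replicas, which is still a majority. With two replicas on the same host, losing that host leaves only one, so a majority-based service would stop even though only a single physical machine failed.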

This last part is the tricky bit. It turns out that, to the best of my knowledge, Amazon's EC2 cannot guarantee that your instances will run on different physical machines. They attempt to physically spread your instances across machines, but there are no guarantees. This means that if you get unlucky and two instances are on the same physical machine, you could easily violate the independence assumption that a replicated service like ZooKeeper relies on when running on EC2. However, you can detect this situation by running a traceroute from each instance to some IP address. Reports are that the first hop is the Xen dom0 address, which should be the same for virtual machines on the same host. Thus, when bringing up a replicated service in EC2, it is important to verify that your instances are assigned to different physical hosts.
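The traceroute check could be automated along these lines. This is a rough sketch in Python, not a tested tool: it assumes you have collected the text output of `traceroute -n` from each instance yourself, and it relies on the reported (not guaranteed) behavior that the first hop identifies the physical host. The instance names and addresses in the sample data are made up for illustration.

```python
import re
from collections import defaultdict

# A hop line starts with the hop number; the header line ("traceroute to ...") does not.
_HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
_IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def first_hop(traceroute_output):
    """Return the IPv4 address reported for hop 1, or None if it did not answer."""
    for line in traceroute_output.splitlines():
        match = _HOP_LINE.match(line)
        if match and int(match.group(1)) == 1:
            address = _IPV4.search(match.group(2))
            return address.group(1) if address else None
    return None


def shared_hosts(outputs_by_instance):
    """Group instances by first hop and return only the groups with more than one member."""
    groups = defaultdict(list)
    for instance, output in outputs_by_instance.items():
        hop = first_hop(output)
        if hop is not None:
            groups[hop].append(instance)
    return {hop: sorted(names) for hop, names in groups.items() if len(names) > 1}


# Made-up sample outputs; real output would come from running traceroute on each instance.
samples = {
    "replica-1": "traceroute to 10.0.0.9 (10.0.0.9), 30 hops max\n 1  10.254.12.1  0.12 ms\n 2  10.0.0.9  0.40 ms\n",
    "replica-2": "traceroute to 10.0.0.9 (10.0.0.9), 30 hops max\n 1  10.254.12.1  0.15 ms\n 2  10.0.0.9  0.38 ms\n",
    "replica-3": "traceroute to 10.0.0.9 (10.0.0.9), 30 hops max\n 1  10.254.77.1  0.11 ms\n 2  10.0.0.9  0.52 ms\n",
}

print(shared_hosts(samples))
```

In this sample, replica-1 and replica-2 report the same first hop, so under the assumption above they would share a physical host and one of them should be relaunched. An instance whose first hop does not answer is skipped rather than guessed at, so an empty result is only as trustworthy as the traceroute data behind it.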