A Portable Cloud Experiment: SFTP Cloud Storage Sync

about | archive

[ 2018-August-18 12:17 ]

Google recently announced a portable cloud library called Go Cloud. At Bluecore I work with Google Cloud, but at my previous startup (RIP Mitro) we used AWS, and I like to keep an eye on what the various cloud providers are doing, so I found the concept intriguing. To better understand the trade-offs, I used it to build sftpsync, which synchronizes an SFTP server to a cloud storage bucket. (If you work with FTP and cloud storage, it would probably be better to use a service like Conduit FTP; full disclosure: I helped build it.) My conclusion from this experiment is that a library can't hide all the differences between platforms. As a result, cross-cloud programs take longer to build, and the benefits are pretty close to non-existent.

Let's look at two of the issues I ran into as examples of the costs imposed on cross-cloud programs. First, I wanted to list the contents of SFTP and cloud storage and compute the difference between the two file lists. Unfortunately, the Go Cloud library does not (currently) implement bucket listing, since Google and Amazon's products have different consistency models, and the library authors haven't decided how to deal with it yet. This is an example of the inevitable features an application will want that are not implemented in the library. This wasn't a total showstopper. Instead, I walked through the SFTP list and checked the metadata of each file in cloud storage (using NewRangeReader with length 0) to see if it needed to be copied. Unfortunately, this required approximately 1000X more API calls (one per file, instead of one per file list page).
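The walk-and-probe workaround boils down to computing a set difference between the SFTP listing and per-file metadata probes. Here is a minimal sketch of that diff step in plain Go; nothing here is the Go Cloud API, and the `FileInfo` type and `needsCopy` function are hypothetical names for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// FileInfo holds the metadata used to compare files: size and
// modification time (the latter matters for the second issue below).
type FileInfo struct {
	Size    int64
	ModTime time.Time
}

// needsCopy returns the names of files on the SFTP side that are missing
// from the bucket or whose size differs. Probing the bucket one key at a
// time is what costs roughly one API call per file, instead of one per
// list page.
func needsCopy(sftp, bucket map[string]FileInfo) []string {
	var out []string
	for name, src := range sftp {
		dst, ok := bucket[name]
		if !ok || dst.Size != src.Size {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	sftp := map[string]FileInfo{
		"a.csv": {Size: 10},
		"b.csv": {Size: 20},
	}
	bucket := map[string]FileInfo{
		"a.csv": {Size: 10}, // same size: assumed up to date
	}
	fmt.Println(needsCopy(sftp, bucket)) // only b.csv needs copying
}
```

With real bucket listing, `bucket` could be built from one listing call per page; without it, each entry costs a separate metadata request.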

The second issue was that I was checking file modification times (Reader.ModTime) to determine if a file with the same length had been updated. On S3 this worked perfectly, but on Google Cloud Storage it always returned the zero time. I had failed to read the API description, which states: "This is optional and will be time.Time zero value if unknown." I changed my code to not rely on this behaviour (and it was promptly fixed in the Go Cloud library). Unfortunately, it is unavoidable that there will be cases where the API is the same, but behaves differently for different providers. A portable application will need to work around these differences. This reminds me of building applications that support multiple SQL databases. In theory, they all use the same query language, but the implementations are different enough that it requires changes to the application.

So what are the benefits of writing a program that supports multiple clouds? If you are building a service, I think there are two: you might be able to improve your reliability by failing over between providers, and you might be able to negotiate better discounts. For reliability, all the clouds provide their own internal tools (e.g. regions/zones, replicated services), and most applications have more self-imposed downtime than downtime caused by their underlying infrastructure. There are some "global" resources where cross-cloud failover could help (e.g. DNS or global load balancing), and some applications where reliability is so critical it could be worth considering, but I think that describes a small number of applications. For discounts, I don't have personal experience with how much a credible threat of moving providers improves the negotiation. I do know negotiated cloud discounts are a thing, and judging from the fact that Snap has spending commitments with both Google and Amazon, there are some benefits to not being held hostage by a single vendor. However, you have to weigh that against the additional engineering costs imposed by using multiple clouds. Unless your bills are huge, I think the engineering time is likely to cost more.

In my opinion: these benefits do not outweigh the cost. A generic cross-cloud API sounds like a great idea, but probably isn't useful for things that are more complex than my tiny sftpsync experiment. Terry Crowley's discussion about this library suggests a better approach: build a very thin layer around the external APIs you rely on. This is useful for testing your applications in general, even without considering multiple clouds. And if you do find a critical need to use multiple providers, it will make porting easier.