Auto scaling changes performance bugs from an outage into a cost problem


[ 2019-October-26 10:38 ]

Cloud services that scale automatically free us from thinking about capacity. However, auto scaling also changes performance bugs from an outage into a cost problem. This is an improvement, but it means it is easy to make expensive mistakes. I've introduced big cost bugs more than once, costing at least tens of thousands of US dollars. In this article, I'll try to explain why auto scaling makes cost problems easier to introduce, and suggest a few ways to avoid them. (This article was inspired by a Twitter conversation with Alex Rasmussen and Joe Hellerstein; thanks!)

The most straightforward way to set up a service is to run it on a fixed amount of resources (e.g. a number of physical machines, VMs or Kubernetes pods). In this configuration, when the incoming request rate approaches the system's throughput limit, bad things happen. The time to process a request skyrockets, and if you are really unlucky, things start to crash, usually due to trying to process too many tasks at the same time. If the incoming request rate exceeds the throughput limit, you have an outage, where the best a system can do is serve a fraction of the requests, and less robust systems will fall over.
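A rough way to see why latency skyrockets near the throughput limit is the classic M/M/1 queueing formula, where the mean time in the system is 1/(μ − λ) for arrival rate λ and service rate μ. This is a simplified model with made-up numbers, not a measurement of any real system, but it shows the shape of the problem:

```python
def mean_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request spends in an M/M/1 system, in seconds.

    service_rate is the throughput limit of the fixed capacity;
    once arrivals reach it, the queue grows without bound.
    """
    if arrival_rate >= service_rate:
        return float("inf")  # overloaded: this is the outage case
    return 1.0 / (service_rate - arrival_rate)

service_rate = 100.0  # requests/second the fixed capacity can handle
for load in (50, 90, 99, 100, 110):
    print(f"{load:>3} req/s -> {mean_latency(load, service_rate):.3f} s")
```

At half capacity the mean latency is 20 ms; at 99% of capacity it is 1 full second; at or above capacity it is unbounded. The interesting part is how nonlinear the curve is: the system looks fine right up until it doesn't.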

Fixed capacity: system explodes when requests exceed limit

In modern systems, we can use auto scaling to add resources to the system when it starts getting heavily utilized. The figure below shows a hypothetical system that can detect the increasing load, and increase the capacity before it is overloaded.
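One common way such a system decides to add capacity is a target-utilization rule: provision enough instances that each one runs below some utilization threshold. The sketch below is a hypothetical scaling rule with invented parameters, not any particular cloud provider's algorithm:

```python
import math

def desired_instances(current_load: float,
                      per_instance_capacity: float,
                      target_utilization: float = 0.6) -> int:
    """Instances needed so each runs at ~target_utilization.

    Keeping utilization below 100% leaves headroom to absorb
    load increases while new instances are starting up.
    """
    usable = per_instance_capacity * target_utilization
    return max(1, math.ceil(current_load / usable))

# Load doubles -> the autoscaler roughly doubles capacity.
print(desired_instances(300, 100))  # 5 instances
print(desired_instances(600, 100))  # 10 instances
```

Note that load and per-request cost are interchangeable here: the autoscaler reacts the same way whether requests got more numerous or each request got more expensive, which is exactly why it can hide a performance bug.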

Auto scaling: system adds capacity as requests increase

In general, this is an improvement over a fixed-capacity system, since it automatically handles unexpected increases in load. However, the cost of operating this system is much less predictable, since it depends on load. Worse, if you introduce a performance bug, good auto scaling will hide it from you. The only thing you will notice is that your bill goes up a lot. Billing data is typically delayed by a day, so it can take at least a few days to notice, at which point you might have wasted a lot of money. Performance problems are more obvious in fixed-capacity systems. In really bad cases, you deploy a new version and your service explodes, since it can no longer handle the load. Even small performance bugs are often visible on service time charts. As you deploy a new version, the time to process requests goes up, so you see a graph like the following in your monitoring system:

Service time increases as a performance bug is deployed

In either case, the change is obvious after you deploy it, so you can investigate whether it is expected, or whether you are accidentally burning money.

With really good auto scaling, large or small performance changes should be invisible. The system notices that the application is doing more work, so it seamlessly provisions more capacity. In my experience, some real-world cloud services work like this. You can throw 10X more work at the system and the performance stays the same; your bill just goes up.

What to do about it

There are a few things that can help:

Cloud pricing trends: Both fixed and flexible

If my memory is correct, at the beginning of the cloud services era, most services were billed on a per-request basis. For example, AWS S3 charges for the number of API calls you make. In theory, this is better for customers since you only pay for exactly what you use. It also incentivizes providers to build efficient systems since it improves profitability. Today, many services provide both fixed-price, fixed-capacity and per-request pricing models (e.g. AWS DynamoDB and Google BigQuery). I think part of the reason is that fixed-price billing avoids the potential cost surprises of auto scaling.
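The trade-off between the two pricing models comes down to a break-even request volume: below it, per-request billing is cheaper; above it, fixed capacity wins. The prices below are invented for illustration and are not real AWS or Google rates:

```python
# Hypothetical prices, chosen only to illustrate the break-even math.
PER_REQUEST_PRICE = 0.0000004   # dollars per request
FIXED_MONTHLY_PRICE = 100.0     # dollars per month for fixed capacity

break_even = FIXED_MONTHLY_PRICE / PER_REQUEST_PRICE
print(f"Fixed pricing is cheaper above {break_even:,.0f} requests/month")
```

Fixed pricing also has a second benefit relevant to this article: it caps the damage a cost bug can do, since the bill cannot exceed the provisioned capacity's price; the bug shows up as an overload instead.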