Autoscaling PaaSTA Services

If you haven’t heard about PaaSTA before, feel free to check out the blog post introducing it.

One step in creating a service is deciding how many compute resources it needs. Since PaaSTA’s inception, changing a service’s resource allocation required manually editing and pushing new configs, and service authors had to pore over graphs and alerts to determine the right allocation whenever load requirements changed. That changed earlier this month when autoscaling was introduced into PaaSTA.

Why did we do this?

Autoscaling was introduced into PaaSTA to make sure services are allocated an appropriate amount of resources as their load changes. Many services at Yelp need very different amounts of resources at different times of day. Previously we dealt with this by overprovisioning: giving services the resources they need to operate at peak load, even if they use only a fraction of that most of the time. Autoscaling allows PaaSTA to free up resources from services that aren’t using them and either give those resources to services that need them or scale down our clusters and cut our infrastructure bill. Conversely, if usage increases, we can prevent performance degradation by automatically allocating more resources to a service.

How did we do this?

One of our goals when developing autoscaling was to make it as easy as possible for service authors to migrate to autoscaling. When a service author enables autoscaling, they only need to specify the minimum and maximum number of instances for their service.

Here’s an example of a PaaSTA service’s config file with autoscaling enabled:

---
main:
  cpus: 1
  mem: 1024

  # All that's required to enable autoscaling
  min_instances: 3
  max_instances: 12

That’s it!

PaaSTA then chooses sane defaults for the other autoscaling parameters: the metrics_provider and the decision_policy. The metrics_provider tells the autoscaler how much load a service is under. The decision_policy takes that load information and determines how many instances the service should be running. If a service author wants more control, they can override PaaSTA’s defaults by specifying a metrics_provider and decision_policy in their service config. They can also pass additional autoscaling parameters by adding them to the same config.
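To make the split between the two pieces concrete, here is a minimal Python sketch of how a metrics_provider and a threshold-style decision_policy might fit together. The function names, signatures, the tolerance band, and the one-instance step are assumptions made for illustration; they are not PaaSTA's actual code.

```python
def average_utilization(samples):
    """Hypothetical metrics_provider: mean utilization (0.0 to 1.0) across instances."""
    return sum(samples) / len(samples)


def threshold_policy(current_instances, utilization, setpoint=0.5, tolerance=0.1,
                     min_instances=3, max_instances=12):
    """Hypothetical decision_policy: step the instance count by one when
    utilization leaves a band around the setpoint, then clamp to the limits."""
    if utilization > setpoint + tolerance:
        desired = current_instances + 1
    elif utilization < setpoint - tolerance:
        desired = current_instances - 1
    else:
        desired = current_instances
    # Never go outside the min_instances/max_instances the author configured.
    return max(min_instances, min(max_instances, desired))


if __name__ == "__main__":
    load = average_utilization([0.9, 0.8, 0.7])
    print(threshold_policy(6, load))
```

The clamp at the end is the reason min_instances and max_instances are the only values a service author has to supply: whatever the policy computes, the result stays inside that range.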

Here’s an example of a customized service config:

---
main:
  cpus: 1
  mem: 1024
  min_instances: 3
  max_instances: 12

  # Advanced configuration
  autoscaling:
    metrics_provider: http
    endpoint: metrics.json
    decision_policy: threshold
    setpoint: 0.5

If the autoscaler decides to scale a service, the new instance count is written to ZooKeeper and an event is logged describing why the service was scaled. The next time PaaSTA updates Marathon, it reads the updated instance count and scales the corresponding Marathon app. As the app scales, SmartStack handles service discovery by registering or de-registering instances. Our Sensu replication alerts and other tools are also automatically updated to use the new instance counts stored in ZooKeeper.
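The hand-off described above can be sketched in a few lines of Python. This is only an illustration: a plain dict stands in for ZooKeeper, another dict stands in for Marathon's view of running apps, and every function and service name here is hypothetical.

```python
def record_scaling_decision(store, event_log, service, new_count, reason):
    """Autoscaler side: persist the new instance count and log why it changed."""
    old_count = store.get(service)
    store[service] = new_count
    event_log.append(f"{service}: {old_count} -> {new_count} ({reason})")


def sync_scheduler(store, running_apps):
    """Deploy side: make each app's instance count match the stored value.

    Returns the list of services whose instance count was changed.
    """
    changed = []
    for service, wanted in store.items():
        if running_apps.get(service) != wanted:
            running_apps[service] = wanted
            changed.append(service)
    return changed


if __name__ == "__main__":
    store = {"example_service.main": 3}      # stand-in for ZooKeeper
    running = {"example_service.main": 3}    # stand-in for Marathon apps
    events = []
    record_scaling_decision(store, events, "example_service.main", 5,
                            "cpu above setpoint")
    print(sync_scheduler(store, running), running, events)
```

Keeping the decision (a number in a store) separate from the act of scaling is what lets other tools, such as replication alerts, read the same value.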

[Figure: Autoscaling graph]

Above is an example of PaaSTA dealing with a spike in CPU usage for a service. This service uses a metrics_provider that emits the service’s average CPU utilization and a decision_policy that uses a PID controller to control the instance count.
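For readers unfamiliar with PID controllers, the following Python sketch shows the general technique applied to an instance count. It is a textbook PID loop under stated assumptions, not PaaSTA's implementation: the class, the gains, and the choice to treat the controller output as a fractional change in instance count are all hypothetical.

```python
class PIDController:
    """Textbook PID controller tracking a utilization setpoint."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.previous_error = None

    def update(self, measurement, dt):
        # Positive error means the service is busier than the setpoint.
        error = measurement - self.setpoint
        self.integral += error * dt
        if self.previous_error is None:
            derivative = 0.0  # no history yet on the first sample
        else:
            derivative = (error - self.previous_error) / dt
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def next_instance_count(current, controller, utilization, dt,
                        min_instances, max_instances):
    """Assumption: the controller output is a fractional change in instances."""
    adjustment = controller.update(utilization, dt)
    desired = round(current * (1 + adjustment))
    return max(min_instances, min(max_instances, desired))


if __name__ == "__main__":
    pid = PIDController(kp=1.0, ki=0.0, kd=0.0, setpoint=0.5)
    print(next_instance_count(6, pid, utilization=0.8, dt=1.0,
                              min_instances=3, max_instances=12))
```

The integral and derivative terms are what distinguish this from a simple threshold check: the integral reacts to sustained error, and the derivative reacts to how fast the error is changing.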

Bespoke autoscaling methods

But what happens if a service’s utilization isn’t accurately represented by any metrics_provider? For example, a queue worker might want to scale based on the change in its queue length over time. Since the instance count for a service is just a number written in ZooKeeper, PaaSTA can get autoscaling signals from custom external sources. Service authors can specify the 'bespoke' decision_policy, and PaaSTA will skip its internal autoscaling code and instead respond to external autoscaling signals in ZooKeeper.
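As an illustration of what such an external signal could look like for the queue worker example, here is a small Python sketch that turns the change in queue length into a desired instance count. The function, its parameters, and the per-worker drain rate are hypothetical; writing the result to ZooKeeper is left out because the path and format are not described in this post.

```python
import math


def desired_workers(current_workers, previous_length, current_length,
                    interval_seconds, drain_rate_per_worker,
                    min_instances, max_instances):
    """Hypothetical bespoke signal: scale on queue growth over an interval.

    drain_rate_per_worker is the assumed number of items one worker
    processes per second.
    """
    growth_per_second = (current_length - previous_length) / interval_seconds
    # Workers needed to absorb the growth; negative when the queue is shrinking.
    delta = math.ceil(growth_per_second / drain_rate_per_worker)
    return max(min_instances, min(max_instances, current_workers + delta))


if __name__ == "__main__":
    # Queue grew by 600 items in 60 seconds with 4 workers running.
    print(desired_workers(4, 1000, 1600, 60, 5.0, 3, 12))
```

An external process would compute a number like this on its own schedule and publish it as the service's instance count, leaving PaaSTA to act on it.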

Why can’t you just use AWS autoscaling (or something similar)?

Let’s compare PaaSTA autoscaling to Amazon’s ECS autoscaling. Both of these technologies can scale Docker containers, and AWS has CloudWatch as a very powerful monitoring and metrics solution. Why didn’t we just use ECS? The biggest difference between the two is that PaaSTA autoscaling is infrastructure-agnostic (just like PaaSTA itself) while ECS autoscaling operates only on Amazon ASGs. ECS autoscaling also creates and destroys EC2 servers, while PaaSTA autoscaling only changes the number of service instances running on existing servers.

Finally, ECS autoscaling uses fairly simple logic to decide whether to scale up or down: it checks whether a metric stays above or below a static threshold for a certain amount of time. PaaSTA autoscaling offers multiple decision_policy options that provide different methods of control. Some use simple threshold-based logic, while others use more complicated control systems such as PID controllers to achieve better control at the cost of increased complexity and reduced transparency.

Conclusion

PaaSTA autoscaling allows us to ensure services stay performant and cost-efficient even as new code is shipped or their utilization changes. It’s flexible enough to support many of the hundreds of services we have at Yelp; widespread adoption across our services will greatly increase the elasticity of our infrastructure. If you want to learn more about PaaSTA, or see the source for autoscaling, check out our GitHub. Stay tuned for a future blog post where we use the same concepts to autoscale clusters of virtual machines.

Want to help cook PaaSTA?

Like building this sort of thing? At Yelp we love building systems we can be proud of, and we are proud of PaaSTA. Check out the Site Reliability Engineer positions on our careers page if you like building systems you can be proud of too!
