David R. Morrison, Compute Infra Tech Lead
- Feb 19, 2019
Here at Yelp, we host a lot of servers in the cloud. In order to make our website more reliable—yet cost-efficient during periods of low utilization—we need to be able to autoscale clusters based on usage metrics. There are quite a few existing technologies for this purpose, but none of them really meet our needs of autoscaling extremely diverse workloads (microservices, machine learning jobs, etc.) at Yelp’s scale. In this post, we’ll describe our new in-house autoscaler called Clusterman (the “Cluster Manager”) and its magical ability to unify autoscaling resource requests for diverse workloads. We’ll also describe the Clusterman simulator,...