Ilkin Mammadzada, Site Reliability Engineer
- Jan 5, 2023
Manually managing the lifecycle of Kubernetes nodes can become difficult as the cluster scales. Especially if your clusters are multi-tenant and self-managed. You may need to replace nodes for various reasons, such as OS upgrades and security patches. One of the biggest challenges is how to terminate nodes without disturbing tenants. In this post, I’ll describe the problems we encountered administering Yelp’s clusters and the solutions we implemented. Problems At Yelp we use PaaSTA for building, deploying and running services. Initially, PaaSTA just supported stateless services. This meant it was relatively easy to replace nodes since we only needed to...