Zero downtime Upgrade: Yelp’s Cassandra 4.x Upgrade Story
Mark Surnin and Muhammad Junaid Muzammil, Software Engineer
Apr 7, 2026
The Database Reliability Engineering team at Yelp seamlessly upgraded more than a thousand Cassandra nodes with zero downtime. This post takes you behind the scenes of our upgrade strategy, from planning sessions to flawless rollouts.

Background

Motivation

Apache Cassandra is a distributed wide-column NoSQL datastore and is used widely at Yelp for storing both primary and derived data. Yelp orchestrates Cassandra clusters on Kubernetes with the help of operators, as explained in our operator overview post.

Upgrading from Cassandra 3.11 to 4.1 offered several observability and reliability improvements, in addition to performance gains. Based on public benchmarks, we expected to...