Casper is a caching proxy designed to intercept traffic flowing between internal Yelp services. It is built with Nginx and OpenResty at its core, with custom Lua logic to fit into our ecosystem. Today we’re proud to announce that Casper is open source and available on GitHub.

To introduce the context in which Casper was created, this post outlines a few basics about Yelp’s SOA, explains the technical decisions behind Casper’s design, and finally describes concrete problems we encountered while rolling it out and running it in our production environment.

Moving past “Memcached for everything”

Yelp has had a solid Memcached infrastructure in place for years. Our main application (aka “yelp-main”) and a few services are still taking advantage of it today to persist data across restarts or cache database/logic-derived data for performance and cost reasons.

However, as the number of services and the traffic flowing between them grew, some of Yelp’s services began to experience a disproportionate amount of load. These are what we call “data” services, or “core” services: services containing or processing core data like user, business or review-related information.

Our Memcached usage is fairly standard: we have a client library which lets applications interact with a set of Memcached clusters deployed in their AWS region. It’s a very thin wrapper around the Memcached protocol (“GET”, “SET”, “DELETE”, etc). Applications dictate cache keys, values, TTL (Time-To-Live), and invalidation strategies. This flexibility comes at a cost:

  • As Yelp grows the number of services in its infrastructure, it’s extremely hard to understand which pieces of data are cached, how, and where.
  • Forward/backward compatibility between cached data and the code being deployed is tricky and can lead to outages.
  • Updates to caching policies require deployments, because application code drives these decisions.

We also realized that we were limited by a few fundamental Memcached design choices. Namely:

  • Memcached clusters are hard to share fairly because of their global eviction policy. If eviction spikes, it’s hard to tell where it comes from unless custom metrics and instrumentation for all clients are in place.
  • Memcached does not have the concept of key enumeration. This makes cache invalidation hard to perform granularly unless you keep a running tally of keys somewhere.
  • Memcached has no built-in mechanism for data replication. Because Yelp has one cluster per AWS region, this leads to suboptimal hit rates and makes DELETEs hard to perform correctly.

From these challenges emerged a list of guiding principles for our caching layer:

  • Transparency: know when and how data is cached in our infrastructure.
  • Built-in operational metrics.
  • Granular cache invalidation.
  • A fail-safe mechanism for when our cache isn’t available or misbehaving.
  • Isolation from app deploys, allowing us to modify our caching layer independently from applications or clients.

Background: Yelp service calls

Yelp has hundreds of small services running in production today. Our overall service infrastructure, PaaSTA, relies on SmartStack (made of Nerve and Synapse) to perform service discovery and routing.

In practice, each host at Yelp runs a local HAProxy instance to route requests to services. When a process wants to contact a service it connects to HAProxy locally, and requests are forwarded to available backend servers transparently.

The HAProxy instance running on each host is periodically updated and restarted to reflect backend servers going up or down. This is done without downtime!

Yelp’s first proxy-as-a-service

Caching service calls is traditionally done either on the server side or on the client side, through a thin client library.

Instead of deploying Casper as infrastructure with an associated client or server library, we designed, built, and deployed Casper as a separate service.

This brought several key advantages. The first is that changes or updates to Casper are completely independent from clients and underlying Yelp services. More importantly, Casper deployments are not special, because Casper is just another Yelp service.

PaaSTA deploys and runs Docker containers without caring about what’s inside of them, making it possible to choose libraries and technologies that best fit what we wanted to accomplish. This was important when the time came to decide how we’d build our caching proxy.

We decided to use Nginx and the OpenResty platform to build Casper. Nginx is a widely adopted, performant proxy solution, and OpenResty is a platform designed to script Nginx with Lua. We decided to adopt OpenResty for three main reasons:

  1. Its extremely mature documentation. Everything you might want to know is in a single README file, full of complete examples and references.
  2. The OpenResty community is active and has authored dozens of high-quality Lua modules.
  3. We’re confident that Nginx-based solutions can be both performant and reliable because Yelp already uses Nginx at its outermost edge (external load balancers).
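To give a flavor of that model, below is a toy OpenResty content handler that consults a shared-memory cache before issuing a subrequest to the backend. It is purely illustrative and is not Casper’s actual code: the toy_cache shared dict and the /backend internal location are assumptions that would have to be declared in nginx.conf.

```lua
-- Toy OpenResty content handler (e.g. wired in via content_by_lua_file).
-- Assumes nginx.conf declares `lua_shared_dict toy_cache 10m;` and an
-- internal /backend/ location that proxy_passes to the real service.
local cache = ngx.shared.toy_cache
local key = ngx.var.request_method .. ":" .. ngx.var.request_uri

local cached = cache:get(key)
if cached then
    ngx.header["X-Toy-Cache"] = "HIT"
    ngx.print(cached)
    return ngx.exit(ngx.HTTP_OK)
end

-- Cache miss: subrequest to the backend, then cache successful responses.
local res = ngx.location.capture("/backend" .. ngx.var.request_uri)
if res.status == ngx.HTTP_OK then
    cache:set(key, res.body, 60)  -- keep the body for 60 seconds
end

ngx.header["X-Toy-Cache"] = "MISS"
ngx.status = res.status
ngx.print(res.body)
```

Casper itself is considerably more involved (separate datastore, per-service configuration, bulk endpoint handling), but the building blocks are the same.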

We took advantage of SmartStack’s calling mechanism described above to roll out Casper transparently and make caching available for all Yelp services. Rather than routing to backend servers directly, HAProxy instances running on client hosts route requests to Casper first. As an added bonus, HAProxy does not forward requests to Casper if it’s down or unhealthy, allowing for a natural fail-safe mechanism.

How does HAProxy “know” which requests to send to Casper rather than the usual backend server? The answer is configuration, which we cover later on in this post. Read on!

Yelp’s first in-memory Cassandra cluster

The first iteration (proof-of-concept) of Casper bundled Nginx, OpenResty and Redis (a popular key/value store for caches) in a single Docker container. This is ideal when it comes to read/write latency: everything is available on localhost! The downside is poor hit rates, because this yields N independent caches, where N is the number of running Casper containers. Not to mention that caches are cleared when containers restart, which happens on every deploy. Clearly not a sustainable option!

To work around these problems, we knew we needed to host our datastore outside of Casper’s container. Scaling Redis to multiple machines is done through cluster mode. Since Casper is built with Nginx/OpenResty at its core, we needed to find a Redis driver written in Lua that supports cluster mode. Unfortunately, at the time, we weren’t able to find one.

Eventually we explored using Cassandra, hosted outside of Casper’s container. Why? A few reasons:

  • It has a well-maintained Lua driver, lua-cassandra
  • It offers a built-in replication mechanism across AWS regions
  • It preserves the ability to list cache keys (this is important for cache invalidation)
  • It can automatically expire cache entries with a built-in TTL mechanism

Another big factor in this decision was performance. We were initially reluctant to look at Cassandra because it persists data to disk by default. But we realized that reads and writes could be made fast enough if we used Cassandra with data and log directories mounted in-memory (in tmpfs). The following table compares different Cassandra drivers, and shows that lua-cassandra gets decently close to what Redis offers out-of-the-box:

[Table: Cassandra drivers benchmark results]
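For concreteness, mounting Cassandra’s directories in memory mostly boils down to pointing its data and commit log directories at a tmpfs mount. The fragment below is an illustrative sketch rather than our exact configuration; the mount point is made up.

```yaml
# Illustrative cassandra.yaml fragment: keep data files and commit logs on
# a tmpfs mount so reads and writes never touch disk. /mnt/tmpfs is a
# made-up example path; the rest of the file is omitted.
data_file_directories:
    - /mnt/tmpfs/cassandra/data
commitlog_directory: /mnt/tmpfs/cassandra/commitlog
```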

Today we’re still running with this setup (in-memory Cassandra, with lua-cassandra), but we had to tweak it along the way. More specifically, we:

  • Initialized lua-cassandra with local nodes (within the same AWS region) to avoid timeouts: because our cluster spans multiple regions, initialization could otherwise hit cross-region traffic and time out.
  • Decreased the default timeouts and got rid of retries in lua-cassandra to minimize the impact of an unhealthy cluster on traffic flowing through Casper (see cassandra_helper.retry_policy).
  • Adjusted our Cassandra schema to distribute data fairly between nodes: we now use a “bucket” as our partition key (see cassandra_helper.get_bucket and the sketch after this list).
  • Contributed back to lua-cassandra to handle errors that slipped through the default error handling.
  • Split our traffic to Cassandra across two connections: one for reads, the other for writes. That’s because reads (cache lookups) are performed synchronously in the critical path, while writes (for invalidation or cache writes) are asynchronous and performed post-request.
  • Tuned our write consistency level from ALL to LOCAL_QUORUM to avoid blocking clients performing invalidation.
  • Decreased gc_grace_seconds from 10 days to 1 day, which lets compaction purge tombstones much sooner and helps with read timings.
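To make a couple of those tweaks more concrete, here is a rough sketch of what bucketed partitioning and post-request writes can look like in OpenResty. This is not the actual cassandra_helper code: the bucket count is made up and `store` is a stand-in for whatever object wraps the Cassandra “write” connection.

```lua
-- Rough sketch only -- not Casper's cassandra_helper.
local NUM_BUCKETS = 64  -- hypothetical bucket count

-- Derive a bucket from the cache key so rows spread evenly across the
-- cluster, e.g. with a schema whose primary key is (bucket, cache_key).
local function get_bucket(cache_key)
    return ngx.crc32_short(cache_key) % NUM_BUCKETS
end

-- Keep writes out of the critical path: schedule them on a zero-delay
-- timer so they run after the response has been sent to the client.
local function async_write(store, cache_key, value, ttl)
    ngx.timer.at(0, function(premature)
        if premature then
            return  -- the worker is shutting down; skip the write
        end
        store:set(get_bucket(cache_key), cache_key, value, ttl)
    end)
end
```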

Tuning cache keys

Yelp services talk to each other over HTTP/1.1, a text-based protocol.

Picking the whole request as a cache key isn’t acceptable since it includes headers that will vary with every request (e.g. “Date”). This would bring the potential cache hit rate to 0 because all keys would be different.

For our first iteration, we picked the HTTP URI and method as a cache key. This is pretty typical for an HTTP cache, but it triggered an interesting bug in gzip-enabled services: if Casper proxies a gzip-accepting request first, the cached response will be gzipped. This cached gzipped response will then be served to all subsequent clients requesting that URI, regardless of whether they accept gzip!

This made us reconsider our initial decision, and we now let developers configure Casper to optionally include HTTP headers in the cache key. This is configured per service. For example, if a service configures caching with vary_headers: "Accept-Encoding", then the Accept-Encoding header, along with its value, will be included in the cache key.
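As an illustration, cache key construction under this scheme amounts to something like the hypothetical sketch below (not Casper’s actual code); header lookups use OpenResty’s ngx.req.get_headers.

```lua
-- Hypothetical sketch of per-service cache key construction: combine the
-- method, the full URI, and each configured vary header with its value.
local function build_cache_key(vary_headers)
    local parts = { ngx.var.request_method, ngx.var.request_uri }
    local headers = ngx.req.get_headers()
    for _, name in ipairs(vary_headers or {}) do
        local lname = name:lower()
        -- e.g. "accept-encoding=gzip" becomes part of the key
        parts[#parts + 1] = lname .. "=" .. tostring(headers[lname] or "")
    end
    return table.concat(parts, "|")
end
```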

Ultimately it’s up to developers to decide which cache key is right for their service. Keep in mind that adding more headers to the “vary_headers” list is potentially expensive and will bring hit rates down. If a header included in that list has too many possible values (e.g. “User-Agent”), then caching becomes a waste. This Fastly article goes into more detail about this problem.

Caching bulk endpoints

Bulk endpoints: a quick primer

Traditionally, REST endpoints return one resource per response.

Bulk endpoints, on the other hand, typically return multiple resources in one request/response cycle. For example, one of our core services exposes an endpoint to retrieve Yelp user information with /user/v2?ids=<one_or_more_ids>. This endpoint returns one or several user objects per response.
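A response looks roughly like the following; the fields other than id are illustrative, and only the id values matter for the rest of this discussion.

```json
[
    {"id": 1, "name": "Alice", "review_count": 17},
    {"id": 2, "name": "Bob", "review_count": 4},
    {"id": 42, "name": "Carol", "review_count": 108}
]
```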

Bulk endpoints are fairly common inside of Yelp’s infrastructure, especially for core data services. They’re useful to avoid network overhead by letting clients minimize the number of round trips necessary to retrieve the pieces of data that they need. Caching bulk endpoints with Casper was non-trivial and led to interesting challenges that we didn’t foresee.

Accommodating bulk endpoints in Casper

First, let’s talk about why caching bulk endpoints leads to problems with a naive caching strategy. Consider requests being made to /user/v2?ids=1,2,42, /user/v2?ids=1,42,2 and /user/v2?ids=42,1,2. They all return the same data (information about users #1, #2 and #42), yet their URLs are distinct. Assuming we enable caching for this endpoint, Casper stores 3 different objects in its cache. There are several inefficiencies in this scenario:

  • We have 3 objects in the cache instead of one, occupying unnecessary space in Casper’s datastore.
  • If a client comes along and requests /user/v2?ids=42,2,1 (yet another combination!), /user/v2?ids=1,2 (a subset), or /user/v2?ids=1,2,42,43 (a superset), Casper is unable to serve data from its cache and has to forward the entire request (a complete cache miss).
  • If user #42 changes (say, they write a review and we have a new review_count value), it’s impossible to perform granular invalidation. We’d have to invalidate all keys where “42” appears, regardless of which other resources are requested alongside it.

We author our internal endpoints somewhat consistently across the board with Swagger and use JSON/HTTP as our encoding/transport of choice. This makes it possible to assume the following when it comes to bulk endpoints:

  • Bulk endpoint URIs contain a comma-separated list of resource IDs (e.g. /bulk/endpoint?ids=1,2,3). The parameter name can be anything (e.g. reviews=1,4,16).
  • Bulk endpoint responses are JSON-encoded arrays of objects, and each object contains a resource id. For example, the response for resources 1, 2 and 3 has the form [{"id": 1, ...}, {"id": 2, ...}, {"id": 3, ...}].

To cache bulk endpoint responses more efficiently, we parse responses to extract individual resources and cache them separately. At request time, we extract IDs from the request URL and use them to fetch the corresponding resources from Casper’s cache.
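Below is a heavily simplified sketch of that strategy. It is not the real bulk_endpoints.lua: the `cache` object stands in for Casper’s Cassandra-backed store, and error handling is omitted.

```lua
-- Simplified sketch of per-resource caching for bulk endpoints.
local cjson = require "cjson"

-- Split an ids parameter such as "1,2,42" into { "1", "2", "42" }.
local function parse_ids(ids_param)
    local ids = {}
    for id in string.gmatch(ids_param, "[^,]+") do
        ids[#ids + 1] = id
    end
    return ids
end

-- Look up each requested ID individually; return the cached objects plus
-- the IDs that still need to be fetched from the underlying service.
local function lookup(cache, endpoint, ids)
    local hits, missing = {}, {}
    for _, id in ipairs(ids) do
        local entry = cache:get(endpoint .. ":" .. id)
        if entry then
            hits[#hits + 1] = cjson.decode(entry)
        else
            missing[#missing + 1] = id
        end
    end
    return hits, missing
end

-- After a downstream response comes back, store every object under its own
-- ID so any future combination of IDs can be served (and invalidated) individually.
local function store(cache, endpoint, response_body, ttl)
    for _, obj in ipairs(cjson.decode(response_body)) do
        cache:set(endpoint .. ":" .. tostring(obj.id), cjson.encode(obj), ttl)
    end
end
```

On a partial hit, the missing IDs are joined back into a smaller downstream request and the merged result is returned to the client.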

Concrete example and illustration

If a request comes in for /user/v2?ids=1,2,3, Casper forwards the request downstream because nothing is in its cache yet. When the response comes back, it’s parsed and Casper extracts the user objects from it. This lets us populate the cache with as many entries as there are user objects in the response (3 in this case).

For subsequent requests, we extract IDs from request URIs and use them to retrieve users from Casper’s cache. Concretely, once we have user #1, #2 and #3 in our cache:

  • GET /user/v2?ids=1,2,3 is a hit. But so is /user/v2?ids=3,1,2, or any other permutation of a request for users #1, #2 and #3.
  • GET /user/v2?ids=1 is also a hit since we have user #1 in our cache
  • GET /user/v2?ids=1,2,3,4 is a partial hit/miss. In this case, we get users #1, #2 and #3 from cache, but send a downstream request for resource 4

Below is a slide deck summarizing these scenarios (important note: for the sake of clarity, cache writes are depicted as if they happened synchronously, but in reality they happen asynchronously!)

(see this deck on SlideShare)

Note that this strategy also solves our invalidation problem: if user #1 changes, we can invalidate precisely that user without affecting the rest of our cache. Keep reading for more information about how Casper’s cache gets invalidated.

If you’re curious about how the code behind bulk endpoints works, head to bulk_endpoints.lua.


Configuring Casper

We mentioned in previous sections that Yelp services have the ability to “configure” bulk endpoint caching and customize their cache keys. How does this work exactly? How does a service opt into using Casper?

To enable caching, services opt in via PaaSTA’s SOA configs. We’ve baked that into our platform with a new directive, proxied_through. If a service declares proxied_through: casper.main in its configuration (smartstack.yaml file), then Synapse generates HAProxy configuration such that requests are forwarded to Casper first, then to a service backend.
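For example, a service’s smartstack.yaml might look roughly like this; the proxied_through line is the directive described above, while the surrounding fields are an illustrative sketch of a PaaSTA namespace rather than an exact schema.

```yaml
# Illustrative smartstack.yaml fragment for a service opting into Casper.
# Only proxied_through comes from this post; the other values are made up.
main:
    proxy_port: 20001
    proxied_through: casper.main
```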

Curious about how this works in more detail? This directive is defined here, and documented there.

Another system that we rely on is our “srv-configs” (service configs) system. It’s essentially a set of files distributed on all hosts at Yelp and is the canonical place for configuration used by services (as opposed to SOA configs representing configuration about Yelp services).

On each Yelp server, we thus have Casper-specific configuration available through srv-configs.

This is the place where services declare which endpoints they want cached, whether or not these endpoints are bulk endpoints, and which set of headers should be part of the cache key, among other things.
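For illustration, a simple configuration file might look roughly like the sketch below. The field names (other than vary_headers, mentioned earlier) are assumptions based on the description above, not necessarily Casper’s exact schema.

```yaml
# Illustrative sketch of a per-service Casper configuration entry.
get_user_by_ids:
    pattern: ^/user/v2          # which requests are cacheable
    bulk_support: true          # enable the per-resource bulk caching logic
    ttl: 3600                   # cache entries expire after one hour
    vary_headers:
        - Accept-Encoding
```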

This allows for very explicit caching rules, all gathered in a central place, not tied to application code.

Invalidating caches

Cache invalidation is famously shortlisted as one of the hardest problems in computer science, so we anticipated a lot of hurdles around it. Our solution today includes two mechanisms for invalidation: an API that we built into Casper (Casper then talks to its datastore to expire entries, see the code here), and a standard TTL system.

The invalidation API is a standard, human-friendly HTTP API, which allows for manual cache invalidations in case of emergencies (“I need this cache busted right now!”).

We also use this API to reactively invalidate resources instead of letting them expire naturally. A concrete example is user data, which we cache for 1 hour. In the absence of reactive invalidation, any update to a Yelp profile (say, an update to your tagline) could take up to an hour to be reflected in your app. To address this concern we’ve built a system based on our Data Pipeline to trigger invalidation requests when our database commits changes to our “user” table, bringing the potential staleness down from up to an hour to consistently less than a few minutes.


Casper in production

Casper has been running in our production data centers for almost a year now. It currently offloads a big chunk of traffic from our primary internal API service, saving 20-30% of its compute capacity.

Below are two graphs showing the rollout of Casper in front of an endpoint called user.retrieve_users_by_ids_v2. On the top is QPS (queries per second) for each AWS region, and on the bottom is the hit rate (in blue), the raw number of hits (green) and the raw number of misses (yellow/brown).

Observations from the graphs above:

  • QPS went down dramatically. This is good! It means fewer compute resources used overall for that given endpoint (serving responses from our cache is way cheaper)
  • Cache hit rate (blue dotted line on the bottom graph) is above 80%, which means for >80% of requests sent to that endpoint, the response is returned immediately from Casper’s cache instead of being computed in the underlying service. This is a huge win for performance.
  • See the two spikes in traffic from the bottom graph? They were completely “absorbed” by our cache. Most responses delivered during these spikes were served by Casper. In these 2 instances, Casper successfully “shielded” the service from traffic spikes. That’s a win for reliability and performance as well.

Aside from the performance wins, let us also celebrate the fact that we can now measure per-service and per-endpoint QPS, hit rates and timings really easily! This is because Casper was built with some baked-in metrics reporting: see metrics_helper.lua for the nitty-gritty details on how we do this.

A look ahead

While we’re truly happy with the way Casper has been helping our infrastructure, we’re aware of a few challenges we aim to consider and address as we move forward. Some have simple resolutions while others will require some creative problem-solving.

  • Static cache keys are not good enough. What we have now (configurable cache keys on a per-service basis) is better than pure URL-based caching, but we’d eventually like to give services the ability to dynamically set cache keys per request. This could be done by supporting the “Vary” header (see Yelp/casper#18).
  • Lack of a generic mechanism to invalidate subgroups of cache objects. What if we wanted to invalidate all entries in all caches related to user #1? This is typically solved by surrogate keys, and we’re looking into building this into Casper (Yelp/casper#17).
  • Casper’s bulk endpoint logic relies on assumptions specific to Swagger and JSON encoding. This makes our bulk endpoint support less broad than it could be, since we exclude Yelp services with different bulk endpoint implementations/assumptions. This is a hard problem to solve, for which we don’t have a solid lead at the moment.
  • As Casper gets more and more popular, it handles more and more traffic and becomes a critical part of our infrastructure. While we initially built Casper to be “fail-safe”, it’s becoming increasingly unrealistic to completely shut it off without Yelp users suffering the consequences of an infrastructure under pressure (slow responses, timeouts). Yelp/casper#7 contains a potential avenue to solve this problem.


I hope that you’ve had as much fun reading this post as I’ve had writing it. Again, Casper is open source and available on GitHub today. I’d highly encourage you to open issues or ask questions if you are curious. While we don’t expect Casper to be “plug-and-play” (after all, it’s built with Yelp’s infrastructure in mind!), we hope that it’ll inspire others to replicate its design to make their infrastructure more efficient and speed up their user experience!


Casper has had a number of key contributors I’d like to acknowledge for their respective contributions:

  • Bai L. for the initial prototype and rollout
  • Anthony T. for the design and implementation of our bulk endpoint logic
  • Prateek A. for benchmarking different datastore options for Casper
  • Daniele R. and Avadhut P. for their continued help and contributions, especially during our migration to Cassandra
  • Ben P., Stephen A., Daniele R., Tomer E. and Liina P. for suggestions and comments on this blog post

Want to become one of our performance engineers?

Yelp engineering is looking for engineers passionate about solving performance problems. Sounds interesting? Apply below!

View Job
