JR Heard has many big projects under his belt, and this week we'll get to learn about one of the most recent. Yelp pushes new code almost every day, so it's no surprise we get new features every week. But how do we make sure they're working as intended? JR describes one element of our solution below!
Let's talk about features. Building new features is super fun. Improving pre-existing ones is fantastic, too. What would be less fantastic would be if your new feature turned out to crumble under production load, or if your untested, gut-feeling improvement to an old feature ended up causing people to use it less. Here at Yelp, we don’t have to worry about that too often, thanks to a system we use both for rolling out new features and for allocating percentages of traffic to the different branches of our A/B tests. Let me tell you about it!
Over the years, we’ve found that the best way to build a big new feature is to break it into small pieces and push each bit to production as it’s completed. There are about a thousand reasons for this, most of which will be familiar to those who’ve worked on a large, long-lived software project (Facebook and The Guardian know what I’m talking about). Fast iteration cycles mean that we get to see how our feature works in the wild much more quickly; on top of that, no matter how well-tested your code is, there’s just no substitute for the peace of mind you get from seeing it run on live traffic.
Of course, when we’re working on a giant new feature that completely replaces an existing page (e.g. our homepage redesign a year and a half ago, not to mention our recent business page redesign!), we can’t just suddenly replace the old page with a blank “Hello world!” page and ask that our users bear with us for a few months. Instead, for each big feature like this, we used to end up writing a function that looked something like:
def should_show_new_homepage(self, request):
    if self.user_id in config.homepage_rollout_user_ids:
        return True
    if self.device_id in config.homepage_rollout_device_ids:
        return True
    if self.user.is_elite and config.homepage_rollout_is_active_for_elite_users:
        return True
    return False
This function lets us control who gets to see our new feature-in-progress; essentially, it implements the logic that lets us whitelist a request into seeing our new feature. So this is great - the only people who get to see our feature-in-development are the people who are supposed to be seeing it, and our users don’t have to put up with an unfinished feature while we implement a redesign.
The catch here is that we’ve got a lot of people working on lots of features. Writing one of these functions from scratch for each feature was a clear violation of the DRY principle. Worse: even though we code-review every line of code we write before shipping it, none of us was comfortable with the possibility of accidentally launching an incomplete feature due to mistakenly including a `not` in the wrong place the fiftieth time we wrote one of these functions. We decided to build a tool to solve this problem once and for all.
Our ideal tool would be something that took in a string like ‘foo_shiny_feature’ and returned a string like ‘enabled’, ‘disabled’, or possibly some other string(s) depending on the semantics of the feature being gated. Our solution would have to satisfy the following requirements:
- Traffic allocation
  - We should be able to say that, for instance, 5% of traffic gets to see our new feature and 95% of traffic doesn’t.
- Extensible whitelisting
  - We should be able to whitelist users into (or out of!) a particular feature in a number of ways (more on this below), and it should be very simple for maintainers to add new ways to whitelist requests.
- Speed
  - We should be able to quickly ask about the status of hundreds of features/experiments over the course of serving a Web request.
- Safety
  - One of the main motivations behind building this tool was to minimize the chance of accidentally launching an in-progress feature, so it should leave as little room for operator error as possible.
- Generality
  - We want to be able to use this tool for other things besides feature rollouts: for instance, we’d also like to use it to distribute traffic among cohorts in A/B tests.
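The traffic-allocation requirement is usually met with deterministic hashing, so that a given user always lands in the same bucket for a given feature. Here's a minimal sketch of the idea; the hashing scheme and names are my own illustration, not RequestBucketer's actual internals:

```python
import hashlib

def assign_bucket(feature_name, request_id, buckets):
    """Deterministically assign a request to a bucket.

    `buckets` is a list of (bucket_name, percentage) pairs whose
    percentages sum to 100. Hashing the feature name together with a
    stable per-request identifier (a user or device ID, say) means the
    same person always sees the same bucket for a given feature.
    """
    key = ('%s:%s' % (feature_name, request_id)).encode('utf-8')
    point = int(hashlib.md5(key).hexdigest(), 16) % 100  # stable value in [0, 100)

    threshold = 0
    for bucket_name, percentage in buckets:
        threshold += percentage
        if point < threshold:
            return bucket_name
    raise ValueError('bucket percentages must sum to 100')

# 5% of traffic gets the new feature, 95% doesn't:
bucket = assign_bucket('foo_shiny_feature', 'user-12345',
                       [('enabled', 5), ('disabled', 95)])
```

Because the hash mixes in the feature name, a user's assignment for one feature is independent of their assignment for any other.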
We came up with a solution we call RequestBucketer, and we’ve been using it in production for about a year now. You interact with it like this:
# (method and feature names are illustrative)
request_bucketer.get_bucket_name(request, 'sitewide_background_color_experiment')
# => 'bright_red'
request_bucketer.is_in_enabled_bucket(request, 'foo_shiny_feature')
# => True
request_bucketer.is_in_enabled_bucket(request, 'bar_unlaunched_feature')
# => False
RequestBucketer gets its name because it lets you say: “My feature has these four buckets; these two buckets have special whitelisting behavior; and here are all four buckets’ traffic percentages. Here’s an HTTP request: what bucket does it fall into?”
Let’s be more specific about what I mean when I talk about how buckets can have “special whitelisting behavior.” Toward the start of a new feature’s life, we want to make sure that the only people who actually see that new feature are the engineers working on it. We can do this in a couple of ways:
- We can whitelist access to the feature based on the ID of a request’s logged-in user, so that engineers can see the feature from their home computer if they’re logged into yelp.com.
- That doesn’t let our engineers test out how the feature behaves for logged-out users - to cover that case, we can whitelist access to the feature based on a device-specific ID.
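A device-specific ID like the one described above is commonly just a random value minted on a browser's first visit and persisted in a long-lived cookie. A hypothetical sketch (the cookie name and helper are invented for illustration):

```python
import uuid

DEVICE_ID_COOKIE = 'yelp_device_id'  # hypothetical cookie name

def get_or_create_device_id(request_cookies, response_cookies):
    """Return a stable per-browser ID, minting one on first visit.

    `request_cookies` and `response_cookies` are plain dicts standing in
    for whatever cookie abstraction the web framework provides.
    """
    device_id = request_cookies.get(DEVICE_ID_COOKIE)
    if device_id is None:
        device_id = uuid.uuid4().hex
        response_cookies[DEVICE_ID_COOKIE] = device_id
    return device_id
```

Since the ID travels with the browser rather than a user account, it works equally well for logged-out traffic.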
Later on, once the feature’s working well enough that it can be beta-tested by other folks, we have a couple of other whitelisting tools at our disposal:
- We can say that any request that originates from within our internal corporate network gets to see our new feature, but usually we won’t want to do this until the feature is pretty fully-functional, so that other departments don’t have to deal with our feature-in-progress.
- We also like to roll out features to certain types of logged-in users. For instance, when we added the ability for users to write reviews from their mobile devices, our Elites got to play with that feature weeks before anyone else. We also have a team of Community Managers in cities across the globe, and we love to collect early feedback on new features by giving our CMs early access.
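One way to keep whitelisting extensible is to express each of the mechanisms above as a small matcher function registered under a name, so that adding a new kind of whitelist is just one more entry in a table. A sketch under assumed names (none of these are RequestBucketer's real internals):

```python
import ipaddress

# Each whitelister inspects a request and a bucket's whitelist config,
# returning True if the request matches.

def matches_user_id(request, whitelist):
    return request.get('user_id') in whitelist.get('user_ids', ())

def matches_device_id(request, whitelist):
    return request.get('device_id') in whitelist.get('device_ids', ())

def matches_user_attribute(request, whitelist):
    # e.g. whitelisting all Elite users or all Community Managers
    return request.get('user_attribute') in whitelist.get('user_attributes', ())

def matches_ip_subnet(request, whitelist):
    ip = ipaddress.ip_address(request.get('ip', '0.0.0.0'))
    return any(ip in ipaddress.ip_network(subnet)
               for subnet in whitelist.get('ip_subnets', ()))

# Supporting a new way to whitelist requests means adding one function
# and registering it here.
WHITELISTERS = {
    'device_id': matches_device_id,
    'user_id': matches_user_id,
    'user_attribute': matches_user_attribute,
    'ip_subnet': matches_ip_subnet,
}
```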
RequestBucketer is backed by a simple YAML file with a bunch of entries (we call them BucketSets) that look something like this (field names illustrative):

foo_shiny_feature:
  type: *FEATURE_RELEASE # as opposed to, for instance, *EXPERIMENT
  buckets:
    enabled:
      percentage: 10
      whitelist:
        user_ids:
          - *WING_USER_ID
        ip_subnets:
          - *YELP_CORP_NETWORK
    disabled:
      percentage: 90
      whitelist:
        user_ids:
          - *JRHEARD_USER_ID # jrheard is a curmudgeon, doesn't want the new version until it's done
In the example above, when we check what bucket a given request falls into for `foo_shiny_feature`, we’ll first check the buckets’ whitelists. For instance, if my boss Wing is logged in, he’ll be in the ‘enabled’ bucket, guaranteed. If a request isn’t whitelisted into any bucket at all (e.g. it’s made from an IP outside of the Yelp corporate network and doesn’t have a whitelisted user ID or device ID), we’ll fall back to the buckets’ traffic percentages. As you’d expect, 10% of those requests will be assigned to the ‘enabled’ bucket, and the other 90% will be assigned to the ‘disabled’ bucket.
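That two-phase lookup (whitelists first, traffic percentages as the fallback) can be sketched as follows. The names and config shape are illustrative, and this simplified version only whitelists on user IDs:

```python
import hashlib

def get_bucket(request, feature_name, bucket_config):
    """Return the bucket name for `request`.

    `bucket_config` maps bucket names to dicts with a `percentage` and
    an optional `whitelist` of user IDs (a simplification of the richer
    whitelisting described above).
    """
    # Phase 1: whitelists always win.
    for bucket_name, config in bucket_config.items():
        if request.get('user_id') in config.get('whitelist', ()):
            return bucket_name

    # Phase 2: fall back to deterministic traffic percentages.
    key = ('%s:%s' % (feature_name, request.get('user_id'))).encode('utf-8')
    point = int(hashlib.md5(key).hexdigest(), 16) % 100

    threshold = 0
    for bucket_name, config in bucket_config.items():
        threshold += config['percentage']
        if point < threshold:
            return bucket_name
    raise ValueError('bucket percentages must sum to 100')
```

A whitelisted user is guaranteed their bucket no matter what the percentages say, which is exactly the behavior described for Wing and jrheard above.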
“Hold on a second,” astute readers say - “what happens if jrheard is logged in and is making a request from an internal IP?” Great question! To deal with situations like this, RequestBucketer has a simple concept of a “whitelist match specificity.” Simply put: some types of whitelisting are more specific than others - a device ID is more specific than a logged-in user ID, and a logged-in user ID is more specific than an IP range. If a request has a whitelist match in multiple buckets, the bucket with the most specific match wins. This is all easily configurable, and as you teach RequestBucketer about new ways to whitelist requests, it’s super-simple to teach it how specific these new whitelist matches are - it looks a lot like this:
WHITELIST_MATCH_SPECIFICITY_ORDERING = [
    # most specific first (identifiers illustrative)
    'device_id',
    'user_id',
    'ip_subnet',
]
RequestBucketer’s a simple system, and we use it so frequently that I launched a feature with it halfway through writing this blog post. We use it to power our experiments system, too - but that’s a discussion for another post. Have any questions about how we use RequestBucketer in production or comments on its design? Let us know in the HN discussion thread!