Docker in the Real World at Yelp
Matt B., Software Engineer
- Aug 25, 2015
Thousands of businesses use Yelp SeatMe every day to manage their seating and reservations. Having a stable system is incredibly important to us, given how critical this system is to many businesses. This blog post dives into how we use Docker to reliably develop and deploy Yelp SeatMe. Docker is an incredibly powerful productivity booster and has simplified our deployment pipeline; hopefully this post will help you do the same for your team. First, a little background on what Yelp SeatMe is and how it's developed and deployed.
What is Yelp SeatMe?
Restaurants manage and accept reservations via our native iPad app and/or web interface. We keep all devices in sync in real-time and also support offline changes if clients become disconnected.
Our web technology stack is:
- Python is our backend language, with Django as our framework, running under uWSGI
- Celery and RabbitMQ for asynchronous work
- Postgres database, using triggers for data validation and update notifications to support our sync protocol and long polling engines
We host the entire platform in AWS and we use Chef as our primary tool for bootstrapping servers, managing deploys, and organizing our testing, staging, and production environments.
Getting Docker Into Production
A year ago, we started down a path to make our environments (testing, staging, and production) more consistent and to simplify our deployment process using Docker containers.
In our pre-container setup, all the logic for deploying a web app server was kept in a Chef recipe. The process looked something like:
- set directory permissions
- install required Python libraries and other packages
- download a specific git tag to the local git repo
- maintain a cache of the last few releases to ensure we can quickly roll back
These Chef recipes weren't really accessible to new engineers, since they were written in a different language (Ruby) and used a complex framework. This setup made deploys brittle and difficult to change. Containers to the rescue!
By using Docker, we were able to simplify the parts of our deploy that were Chef managed, down to just Docker container manipulation:
- pull a specific Docker image to the server
- bring the server out of service by health-checking it down
- stop the existing container
- tag the new image with a human-readable name to mark it as the current release (www:latest)
- start the image (with all of its filesystem mappings, etc) as a new named container (www)
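The sequence above can be sketched as a small helper that emits the shell commands for one deploy. The registry name, health-check mechanism, and version tag here are illustrative placeholders, not our actual configuration:

```python
def deploy_commands(registry, release, container="www"):
    """Build the ordered shell commands for one deploy, mirroring the
    steps above. Registry, image, and container names are illustrative
    placeholders, not our actual configuration."""
    image = f"{registry}/www:{release}"
    return [
        # pull the specific release image to the server
        f"docker pull {image}",
        # hypothetical mechanism for failing the load-balancer health check
        "touch /var/run/healthcheck.down",
        # stop and remove the running container so its name can be reused
        f"docker stop {container} && docker rm {container}",
        # tag the new image as the current release
        f"docker tag {image} www:latest",
        # start the new container (filesystem mappings omitted for brevity)
        f"docker run -d --name {container} www:latest",
    ]

for cmd in deploy_commands("registry.example.com", "v2015.08.25"):
    print(cmd)
```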
Benefits of the Docker setup:
- increased developer control of the environment via source-control managed edits to the Dockerfile
- elimination of server environment drift, over and above what Chef already provided
- reduced the amount of Chef code required to configure the application from thousands to hundreds of LOC
- centralized repository of deployable images that can always be mapped to an exact git commit
- we continuously build our Docker images, so every completed code review always has a deployable image built when its tests pass
- developers can now modify system level packages, without requiring the operations team to do it
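One way to get that image-to-commit mapping is to embed the git SHA in the image tag when CI builds it. This naming scheme is a hypothetical sketch, not our actual convention:

```python
def image_tag(repo: str, sha: str, short: int = 12) -> str:
    """Name a built image after the exact commit it was built from,
    so any deployed image can always be traced back to source.
    The repo and tag format are hypothetical examples."""
    return f"{repo}/www:git-{sha[:short]}"

# e.g. a CI job building after every passing code review might run:
#   docker build -t $(image_tag) .
print(image_tag("registry.example.com",
                "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678"))
```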
After two months of development and testing, we rolled Docker into production in early October 2014 with no major hiccups.
Things to Keep in Mind as you Transition
Docker is a fairly new technology, so of course there are going to be some ‘gotchas’ as you roll things out. Here are some issues and tips to keep in mind as you take Docker from development to production in your environment.
Make sure you thoroughly test your storage drivers
There are several storage drivers available for Docker, with AUFS typically considered the previous generation and Device Mapper the current one. While testing Device Mapper, we discovered local filesystem corruption as well as full system hangs. While these were very likely specific to our kernel and distribution versions, we found AUFS to be very stable for us in production. Always thoroughly test your storage driver, especially via repeated deployments, before moving out of a staging environment.
Build times for Docker images are slow and disk images can become big quickly
Due to how the Docker file system layers work, if you change something early in the layer order, you must rebuild all successive layers. Figuring out what steps are expensive in terms of time or network access and how to order your build steps to minimize future rebuilding isn’t always straightforward. There is a fine art to combining commands into a single command line to minimize layers and rebuilding time. Consider caching Docker images from successful builds in order to decrease build times.
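For example, putting slow, rarely-changing steps first and combining related commands into a single RUN keeps most rebuilds cheap. This is a generic sketch, not our actual Dockerfile:

```dockerfile
FROM ubuntu:14.04

# Slow, rarely-changing steps first. One combined RUN both limits the
# number of layers and lets the apt cache be cleaned in the same layer.
RUN apt-get update && \
    apt-get install -y python python-pip && \
    rm -rf /var/lib/apt/lists/*

# Python dependencies change less often than application code, so copy
# only requirements.txt before the rest of the source.
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

# Source code changes on every commit; keep it last so most edits only
# rebuild this final layer.
COPY . /app
```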
Always have explicit commands for building and launching containers kept in source control alongside your Dockerfile
We used make to do this. Once you have more than a few arguments to `docker run`, you want those arguments documented and easy to reproduce. If they change over time or with a new version of Docker, you don’t want to have to change a configuration file or shell alias on the server; it should be a code change.
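A minimal make wrapper along these lines, with illustrative target names, ports, and volume mappings, keeps the full `docker run` invocation in source control next to the Dockerfile:

```make
IMAGE ?= www:latest
NAME  ?= www

run:
	docker run -d \
		--name $(NAME) \
		-p 8080:8080 \
		-v /var/log/www:/var/log/www \
		$(IMAGE)

stop:
	docker stop $(NAME) && docker rm $(NAME)
```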
Images and containers don’t automatically clean themselves up
This has been talked about at length, but we discovered it the hard way when all our server hard drives mysteriously filled up: every image we ever deployed stayed on the servers until we explicitly removed it. We have reduced our disk usage with some simple image-management scripts, but are exploring tools like docker-custodian to really solve the issue.
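The core of such an image-management script can be quite simple; this sketch keeps only the N most recent images (a real script should also check that an image isn't in use before passing it to `docker rmi`):

```python
from datetime import datetime

def images_to_remove(images, keep=3):
    """Given (tag, created) pairs, e.g. parsed from `docker images`
    output, return the tags of all but the `keep` most recent images."""
    by_age = sorted(images, key=lambda pair: pair[1], reverse=True)
    return [tag for tag, _ in by_age[keep:]]

# Hypothetical inventory of deployed images on one server:
images = [
    ("www:git-aaa111", datetime(2014, 10, 1)),
    ("www:git-bbb222", datetime(2014, 10, 8)),
    ("www:git-ccc333", datetime(2014, 10, 15)),
    ("www:git-ddd444", datetime(2014, 10, 22)),
    ("www:git-eee555", datetime(2014, 10, 29)),
]
# the two oldest tags would then be passed to `docker rmi`
print(images_to_remove(images))
```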
Consider layering your images into multiple Dockerfiles to speed up build times
Over and above what we discussed earlier with image caching, we’ve implemented a multi-layer Docker image strategy, designing images to inherit from each other.
The base image, which contains things like system packages, is very slow to build but changes infrequently.
The production web image inherits from the base image, and contains a fully built set of Python packages as well as a point-in-time snapshot of the source code. This is fast to build except when requirements.txt changes, which triggers a rebuild of the Python packages.
The developer web image inherits from the production image, adding the tools to run selenium tests (Xvnc, google-chrome), as well as additional developer tools. The developers then use file system mappings to bring their live source code into this container, overlaying the snapshot of source in the production web image.
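Shown together for brevity, the three layers might look roughly like the following; these are three separate Dockerfiles, and all image and package names are illustrative, not our actual setup:

```dockerfile
# --- Dockerfile.base: system packages (slow to build, changes rarely) ---
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y python python-pip libpq-dev

# --- Dockerfile.web: production image (fast unless requirements.txt changes) ---
FROM seatme-base
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
COPY . /app            # point-in-time snapshot of the source

# --- Dockerfile.dev: selenium and developer tooling on top of production ---
FROM seatme-web
RUN apt-get update && apt-get install -y vnc4server chromium-browser
# developers then mount their live checkout over /app at `docker run` time
```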
In an emergency, you can still reach inside the container to debug your code
Initially, when you run your code inside a Docker container, it can feel like there’s a layer between you and the code that prohibits a lot of normal debugging techniques. While it’s not considered best practice for production systems, in testing (and especially development) it’s entirely possible to open a shell with `docker exec -ti <containerid> bash` and then use normal tools like strace on your running code, or interactively examine the file system.
You need a strategy for log management
Though an extremely mundane topic, log management with Docker turns out to be a lot trickier than it sounds. There are simple techniques for mapping host file systems into the container and simply logging to disk, but this quickly fails to scale once you start more than one container on a given host.
In the end, after a lot of experimentation, we’ve had the most success with the application logging directly to the host rsyslog over UDP. Before version 1.7 of Docker, doing this required a technique for finding the correct IP address to syslog to, which we solved by passing it in through environment variables.
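On the application side, this amounts to pointing Python's standard syslog handler at the host over UDP. In this self-contained sketch a throwaway socket stands in for the host's rsyslog daemon (which would really listen on port 514, with its address supplied via environment variables in a pre-1.7 setup); the logger name and format are illustrative:

```python
import logging
import logging.handlers
import socket

# Stand-in for the host's rsyslog UDP listener (normally 127.0.0.1:514);
# binding an ephemeral port keeps this sketch self-contained.
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", 0))
listener.settimeout(5)
syslog_addr = listener.getsockname()

# Application-side setup: records go straight to the host syslog daemon
# over UDP instead of to a file inside the container.
logger = logging.getLogger("www")
handler = logging.handlers.SysLogHandler(address=syslog_addr)
handler.setFormatter(logging.Formatter("www[%(process)d]: %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("request handled in 12ms")

# What rsyslog receives: a syslog priority prefix plus the formatted line.
datagram, _ = listener.recvfrom(4096)
print(datagram)
```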
To the Future
In conclusion, containerization has been a big win. For example, when the Shellshock vulnerability was announced, our incident response was almost completely taken care of by a container rebuild and a Chef run. Despite growing pains, Docker has been a change we are happy with and continue to invest in. With a stable web infrastructure building block, we have found newer and more efficient ways to test, deploy, and manage our infrastructure, as well as iterate faster in development. We hope to share these uses in future articles, so stay tuned!
If you are interested in learning more about Docker, Yelp is hosting a Docker meetup this Thursday at our office. We would love to continue the discussion in person! We are also hiring developers excited to work alongside us with Docker as we continue to grow.
Many thanks to Kris Wehner and Charles Guenther for their experience and help writing this blog post.
Work on Docker at Yelp
Interested in working on our Docker deployments and continuing to help improve our Docker setup? Join our SRE team!