Introducing venv-update

venv-update is an MIT-licensed tool to quickly and exactly synchronize a Python project’s virtualenv with its requirements.

This project ships as two separable components: pip-faster and venv-update. Both are designed for use on large Python projects with hundreds of requirements and are used daily by Yelp engineers.

For complete documentation, please see http://venv-update.rtfd.org

Making large Python projects painless

The majority of yelp.com is implemented in a single Python project, dubbed “yelp-main”. Initially, yelp-main installed all of its dependencies at the system level. We’ve since done the work to move it to virtualenv, and to manage yelp-main’s (Python) requirements via pip and requirements.txt. Immediately we saw a problem with simply running virtualenv && pip install: any requirement removed from requirements.txt would persist in the virtualenv, creating potential mismatches between developers’ environments and production. The simplest solution was to completely remove the virtualenv whenever any requirements changed.

It’s too slow!

This led to a second problem: rebuilding a virtualenv from scratch was slow. Some of our requirements had a long build process (e.g. lxml, numpy) which would repeat each time. The obvious solution was wheels, which are essentially a post-build zip of a Python package. Given our own PyPI server, we could upload Linux wheels for all of our requirements. Then a full rebuild would consist only of some requests to our PyPI server and some unzipping.

It’s still slow!

Even at this point, re-installing yelp-main would take 3-5 minutes. This was true even if we didn’t remove the virtualenv beforehand. Profiling revealed that all of the time was taken by HTTP requests to our PyPI server. Even though we had pinned all our dependencies (with requirements like package-x==1.2.3), pip would still reach out to PyPI before deciding which version to pick. It would do this even when the pinned version was available in the local wheel cache, and even when the pinned version was already installed.

Enter pip-faster

I tried for over a month to untangle the hairball that was pip 1.5, but at last gave up. A workable solution was to invent a wrapper on pip that would insert the necessary monkey-patches to do what we needed: don’t hit PyPI when a requirement is pinned and there’s already a wheel available in the local cache. While monkey-patching is a despicable code smell, it worked quite well. What used to take minutes now takes seconds! We gave pip-faster an extensive test suite and called it Good Enough.
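
The rule described above (skip PyPI when a requirement is pinned and a matching wheel is already cached) can be sketched in a few lines. This is only an illustration of the decision, not pip-faster's actual code: the names parse_pin and needs_index_lookup are hypothetical, and a set of (name, version) pairs stands in for the local wheel cache.

```python
import re

# Hypothetical helpers for illustration; not taken from pip or pip-faster.
_PINNED = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;,*]+)\s*$")


def normalize(name):
    """Normalize a project name as PEP 503 describes (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pin(requirement):
    """Return (name, version) for an exact pin like 'package-x==1.2.3', else None."""
    match = _PINNED.match(requirement)
    if match is None:
        return None
    return normalize(match.group(1)), match.group(2)


def needs_index_lookup(requirement, cached_wheels):
    """Decide whether the package index has to be contacted.

    cached_wheels is an assumed stand-in for the wheel cache: a set of
    (normalized_name, version) pairs.
    """
    pin = parse_pin(requirement)
    if pin is None:
        # Unpinned or ranged requirement: only the index knows the candidates.
        return True
    # Pinned: the index is needed only if the exact wheel is not cached.
    return pin not in cached_wheels
```

With every requirement pinned and every wheel cached, needs_index_lookup returns False for the whole requirements file, which is the case where the network round-trips disappear.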

What’s more, we could add a bit of logic to simply uninstall any packages that were no longer required, eliminating the need to delete the virtualenv when our requirements changed. You can use this via pip-faster install --prune.
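
Conceptually, pruning is a set difference between what is installed and what is required. The sketch below assumes the requirements list is complete (every transitive dependency is listed), and the keep parameter with its default values is a hypothetical safeguard for the packaging tools themselves; the real --prune implementation may decide differently.

```python
import re


def normalize(name):
    """Normalize a project name as PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def packages_to_prune(installed, required, keep=("pip", "setuptools", "wheel")):
    """Return the sorted names that are installed but no longer required.

    installed and required are iterables of project names. keep is an
    assumed list of tools that should never be uninstalled.
    """
    wanted = {normalize(name) for name in required}
    wanted |= {normalize(name) for name in keep}
    return sorted(
        normalize(name) for name in installed if normalize(name) not in wanted
    )
```

For example, with lxml, numpy and a leftover Old_Dep installed but only lxml and numpy required, the function returns ["old-dep"].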

It’s notable that pip has since fixed or improved several of these issues, and I should try again to push these improvements upstream someday.

How much faster is pip-faster?

If we install plone (a large Python application with more than 250 dependencies) we get these numbers:

testcase	pip v8.0.2	pip-faster	improvement
cold	4:39s	4:16s	8%
noop	7.11s	2.40s	196%
warm	44.6s	21.3s	109%

In the “cold” case, all caches are completely empty. In the “noop” case nothing needs to be done in order to update the virtualenv. In the “warm” case caches are fully populated, but the virtualenv has been completely deleted.
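
The post does not say how the improvement column was computed. One reading that is consistent with the timings shown is the speedup ratio minus one, expressed as a percentage and truncated to a whole number. The sketch below is that inference only; to_seconds and improvement_percent are hypothetical helper names.

```python
def to_seconds(timing):
    """Convert a table entry such as '4:39s' (M:SS) or '7.11s' to seconds."""
    text = timing.rstrip("s")
    if ":" in text:
        minutes, seconds = text.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(text)


def improvement_percent(baseline, faster):
    """Speedup of `faster` over `baseline`, as a truncated percentage."""
    ratio = to_seconds(baseline) / to_seconds(faster)
    return int((ratio - 1) * 100)


print(improvement_percent("4:39s", "4:16s"))  # cold: 8
print(improvement_percent("7.11s", "2.40s"))  # noop: 196
print(improvement_percent("44.6s", "21.3s"))  # warm: 109
```

Read this way, "196%" means pip-faster finishes the noop case in roughly a third of the time pip takes.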

The Benchmarks page has more detail.

The bootstrap problem

Now we can smoothly and quickly update our virtualenv, but how exactly does the virtualenv come to exist in the first place? After all, pip-faster needs to be installed somewhere. We initially implemented our virtualenv bootstrap as a bash script, but it quickly got out of hand. There are many edge cases to be handled. For example, we’ve made a couple of major transitions in our virtualenv: we stopped using --system-site-packages, and we moved from python2.6 to python2.7. During these transitions, we needed developers to be able to switch branches across these changes without things exploding in inexplicable ways.

Enter venv-update

venv-update solves all of the above problems, from start to finish. It idempotently gives you a correct virtualenv with the correct packages installed. That includes creating a virtualenv as necessary, or possibly removing a virtualenv that’s become invalid due to the above edge cases (and more). It installs pip-faster to that virtualenv (as necessary), and uses it to quickly and exactly update your installed Python packages to match your checked-in requirements.txt. This is quite useful in a Makefile (a developer can always just git pull && make to be up-to-date), but it’s sufficiently configurable to be a general solution to these problems in your Python projects.
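
To make "idempotently" concrete, here is a minimal sketch of the decision flow described above, under loud assumptions: VenvState and plan_update are hypothetical names, only the two transitions mentioned earlier (interpreter version and --system-site-packages) are modelled, and pip-faster is assumed to be installed only when the virtualenv is created. venv-update's real checks are more thorough.

```python
from collections import namedtuple

# Hypothetical record of how a virtualenv was created.
VenvState = namedtuple("VenvState", ["python", "system_site_packages"])


def plan_update(existing, wanted):
    """Return the ordered steps needed to reach `wanted`.

    existing is the current VenvState, or None if no virtualenv exists yet.
    """
    steps = []
    if existing is not None and existing != wanted:
        # The virtualenv was built with different settings: start over.
        steps.append("remove virtualenv")
        existing = None
    if existing is None:
        steps.append("create virtualenv")
        steps.append("install pip-faster")
    # Always finish by syncing packages; this is cheap when nothing changed.
    steps.append("sync packages with requirements.txt")
    return steps
```

Calling plan_update on a virtualenv that already matches yields only the final sync step, which is why running the tool repeatedly (for example from a Makefile) is safe.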

For the full documentation of venv-update, please visit http://venv-update.rtfd.org
