Buck E., Software Engineer
- Mar 4, 2016
venv-update is an MIT-licensed tool to quickly and exactly synchronize a Python project’s virtualenv with its requirements.
This project ships as two separable components: pip-faster and venv-update. Both are designed for use on large Python projects with hundreds of requirements and are used daily by Yelp engineers.
For complete documentation, please see http://venv-update.rtfd.org
Making large Python projects painless
The majority of yelp.com is implemented in a single Python project, dubbed
“yelp-main”. Initially, yelp-main installed all of its dependencies at the
system level. We have since moved it to a virtualenv, and we now manage
yelp-main’s (Python) requirements via pip and requirements.txt.
We immediately saw a problem with simply running
virtualenv && pip install:
any removed requirements would persist, creating potential mismatches between
developers’ environments and production. The simplest solution was to
completely remove the virtualenv when any requirements changed.
It’s too slow!
This led to a second problem: rebuilding a virtualenv from scratch was slow. Some of our requirements had a long build process (e.g. lxml, numpy) which would repeat each time. The obvious solution was wheels, which are essentially a post-build zip of a Python package. Given our own PyPI server, we could upload Linux wheels for all of our requirements. A full rebuild would then consist only of some hits to PyPI and an unzip.
It’s still slow!
Even at this point, re-installing yelp-main would take 3-5 minutes. This was
true even if we didn’t remove the virtualenv beforehand. Profiling revealed
that all of the time was taken in HTTP requests to our PyPI server. Even though
we had pinned all our dependencies to exact versions, pip would still reach out
to PyPI before deciding which version to pick. It would do this even when the
pinned version was available in the local wheel cache, and even when the pinned
version was already installed.
I tried for over a month to untangle the hairball that was pip 1.5, but at last gave up. A workable solution was to invent a wrapper on pip that would insert the necessary monkey-patches to do what we needed: don’t hit PyPI when a requirement is pinned and there’s already a wheel available in the local cache. While monkey-patching is a despicable code smell, it worked quite well. What used to take minutes now takes seconds! We gave pip-faster an extensive test suite and called it Good Enough.
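The rule described above (skip the index when a requirement is pinned and a matching wheel is already cached) can be sketched roughly as follows. This is an illustration only, not pip-faster's actual code: the function names `pinned_version` and `can_skip_index`, and the assumption that the wheel cache is a flat directory of `.whl` files, are hypothetical. The sketch also ignores whether a cached wheel is compatible with the current platform.

```python
import re
from pathlib import Path

# Matches an exactly pinned requirement such as "name==1.2.3".
PINNED = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([A-Za-z0-9.]+)\s*$"
)


def pinned_version(requirement):
    """Return (name, version) if the requirement is pinned with ==, else None."""
    match = PINNED.match(requirement)
    if match is None:
        return None
    return match.group(1), match.group(2)


def normalize(name):
    """Compare project names case-insensitively, treating '-', '_', '.' alike."""
    return re.sub(r"[-_.]+", "_", name).lower()


def can_skip_index(requirement, wheel_dir):
    """True when a pinned requirement already has a wheel in the local cache."""
    pinned = pinned_version(requirement)
    if pinned is None:
        return False  # unpinned: the index is needed to choose a version
    name, version = pinned
    for path in Path(wheel_dir).glob("*.whl"):
        # Wheel filenames look like name-version-pytag-abitag-platform.whl
        parts = path.stem.split("-")
        if (
            len(parts) >= 5
            and normalize(parts[0]) == normalize(name)
            and parts[1] == version
        ):
            return True
    return False
```

With a check like this, only requirements for which `can_skip_index` returns False need a round trip to the index, which is where the time was going.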
What’s more, we could add a bit of logic to simply uninstall any packages that
were no longer required, eliminating the need to delete the virtualenv when our
requirements changed. You can use this via
pip-faster install --prune.
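Conceptually, pruning is a set difference between what is installed and what is required. The sketch below is a simplified illustration rather than pip-faster's implementation: `packages_to_prune` and its `keep` default are hypothetical, and `required` is assumed to already include transitive dependencies.

```python
import re


def normalize(name):
    """Compare project names case-insensitively, treating '-', '_', '.' alike."""
    return re.sub(r"[-_.]+", "-", name).lower()


def packages_to_prune(installed, required, keep=("pip", "setuptools", "wheel")):
    """Return the installed package names that nothing requires, sorted."""
    wanted = {normalize(name) for name in required}
    # Assumption: never remove the tooling needed to manage the virtualenv.
    wanted.update(normalize(name) for name in keep)
    return sorted(name for name in installed if normalize(name) not in wanted)
```

For example, `packages_to_prune(["lxml", "numpy", "Old_Pkg", "pip"], ["LXML", "numpy"])` returns `["Old_Pkg"]`, the one package that would be uninstalled.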
It’s notable that pip has since fixed or improved several of these issues, and I should try again to push these improvements upstream someday.
How much faster is pip-faster?
We benchmarked pip-faster by installing plone (a large Python application with more than 250 dependencies) in three scenarios.
In the “cold” case, all caches are completely empty. In the “noop” case nothing needs to be done in order to update the virtualenv. In the “warm” case caches are fully populated, but the virtualenv has been completely deleted.
The Benchmarks page has more detail.
The bootstrap problem
Now we can smoothly and quickly update our virtualenv, but how exactly does the
virtualenv come to exist in the first place? After all, pip-faster needs to be
installed somewhere. We initially implemented our virtualenv bootstrap as a
bash script, but it quickly got out of hand. There are many edge cases to be
handled. For example, we’ve made a couple of major transitions in our virtualenv:
we stopped using
--system-site-packages, and we moved from
python2.7. During these transitions, we needed developers to be able to switch
branches across these changes without things exploding in inexplicable ways.
venv-update solves all of the above problems, from start to finish. It
idempotently gives you a correct virtualenv with the correct packages
installed. That includes creating a virtualenv as necessary, or possibly
removing a virtualenv that’s become invalid due to the above edge cases (and
more). It installs pip-faster to that virtualenv (as necessary), and uses it to
quickly and exactly update your installed Python packages to match your
checked-in requirements.txt. This is quite useful in a Makefile (a developer
can always just
git pull && make to be up-to-date), but it’s sufficiently
configurable to be a general solution to these problems in your Python project.
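One way to picture the "create or remove as necessary" step is a marker file that records how the virtualenv was built, compared against the desired settings on every run. The sketch below is hypothetical and is not how venv-update is documented to work: the marker name `venv-state.json` and the functions `desired_state`, `needs_rebuild`, and `record_state` are invented for illustration.

```python
import json
from pathlib import Path


def desired_state(python, system_site_packages):
    """The settings a virtualenv must have been created with to be reusable."""
    return {"python": python, "system_site_packages": system_site_packages}


def needs_rebuild(venv_dir, desired, marker="venv-state.json"):
    """True when the virtualenv is missing or was created with other settings."""
    state_file = Path(venv_dir) / marker
    if not state_file.is_file():
        return True  # no virtualenv yet, or one we know nothing about
    try:
        recorded = json.loads(state_file.read_text())
    except ValueError:
        return True  # unreadable marker: treat the virtualenv as invalid
    return recorded != desired


def record_state(venv_dir, desired, marker="venv-state.json"):
    """Write the marker after the virtualenv has been (re)created."""
    Path(venv_dir).mkdir(parents=True, exist_ok=True)
    (Path(venv_dir) / marker).write_text(json.dumps(desired, sort_keys=True))
```

Because the check depends only on the recorded and desired settings, running it repeatedly is harmless, and switching to a branch with different settings triggers a rebuild instead of leaving a mismatched virtualenv in place.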
For the full documentation of venv-update, please visit http://venv-update.rtfd.org