The Yelp code base has been under development for over six years. We push multiple times a day. We don’t have a dedicated QA team. Yelpers get very angry when their detailed account of how that waitress was totally into them doesn’t get saved.

Effective automated testing is the only way we can stay sane. One of the great benefits of Python is its “batteries included” philosophy, which gives us access to lots of great libraries, including the built-in unittest module. However, as our code base grows in size and complexity, so do our testing needs.

There are plenty of open source libraries that have been developed to help augment Python testing. Notably:

• Nose, which provides a more advanced test runner (and automated test discovery)
• unittest2 (or just unittest in Python 2.7), which provides some enhancements to unittest including class and module level setup/teardown, discovery and decorator-based test skipping.

However, none of these projects really met all our needs and, more importantly, none of them proved to be very easy to extend.

So we wrote our own test framework: Testify.

Testify provides:

• PEP8 naming conventions. No more setUp()
• Fewer Java-like dependencies on class methods for things like assertions. Down with self.assertEqual()
• Enhanced test fixture setup. Multiple setup/teardown methods. Class and module level fixtures.
• Test discovery
• Flexible, decorator-based suite system.
• Fancy color test runner with lots of logging / reporting options (JSON anyone?)
• Split test suites into buckets for easy parallelization
• Built-in utilities for common operations like building mocks, profiling, and measuring code coverage.
• Plugin system for hooking in whatever extra features you like.
• Mostly backwards compatible with unittest test cases.
• A cool name.

A few words on how we actually use Testify

Testify is pretty flexible so it may help to give some examples of how it works for us.

Our test framework is required to meet a few different workflows:

• Test Driven Development
• Verifying your change isn’t going to break the whole site prior to code review and staging
• Verification that a code push isn’t going to break the site.

For TDD, you often need to re-run the same test or tests from a single module over and over again. Selecting exactly what test method to run is important:

testify yelp.tests.search SpellCheck.test_pizza_search

To run a larger set of tests that might be affected by your change, you can organize your tests into one or more suites. For example, tests that impact search are put together in the search suite. If you make a change that might impact search, you can just run:

testify yelp.tests -i search
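To make the idea of decorator-based suites concrete, here is a minimal, stdlib-only sketch of tagging tests with suite names and then selecting by suite. It is not Testify’s implementation: the `suite` decorator, the `select` helper and the test functions below are all hypothetical names invented for illustration.

```python
def suite(*names):
    """Tag a test function with one or more suite names (illustrative only)."""
    def decorate(func):
        existing = getattr(func, "suites", frozenset())
        func.suites = existing | frozenset(names)
        return func
    return decorate


def select(tests, include=None, exclude=("disabled",)):
    """Keep tests that are in `include` (if given) and in none of `exclude`."""
    chosen = []
    for test in tests:
        suites = getattr(test, "suites", frozenset())
        if include is not None and include not in suites:
            continue  # not part of the requested suite
        if suites & set(exclude):
            continue  # explicitly excluded, e.g. disabled tests
        chosen.append(test)
    return chosen


@suite("search")
def test_pizza_search():
    assert "pizza" in "best pizza in town"


@suite("search", "disabled")
def test_flaky_spellcheck():
    # Imagine this one depends on an unreliable external service.
    assert "piza" != "pizza"


def test_review_saved():
    assert len("great waitress") > 0


ALL_TESTS = [test_pizza_search, test_flaky_spellcheck, test_review_saved]

# Roughly the spirit of selecting only the search suite.
selected_names = [t.__name__ for t in select(ALL_TESTS, include="search")]
print(selected_names)  # ['test_pizza_search']
```

Storing the suite names on the function itself keeps the selection logic independent of how tests are discovered, which is one reason a decorator is a convenient place for this kind of metadata.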

For pre-deployment testing we need to run everything. Everything is a lot.

KegMate or no, who wants to wait hours to find out UFC is really FUC’d? Thankfully we can split tests up:

testify yelp.tests --buckets=10 --bucket=0

This allows us to split all our tests into 10 equal-ish sized chunks. We use Buildbot, another open source project, to run them across a cluster of machines.

We don’t use Testify only for unit testing (despite the name, most users of unittest probably don’t either). We have a mix of unit, functional and integration tests, which means we have a variety of environmental requirements. Sometimes we need a snapshot of a real database to test on (scrubbed of those illicit PMs to your favorite reviewer of course). Sometimes we need a full search index to search against. Sometimes we need an empty database where we can create exactly what data a test will see.

This means complex test fixtures with slow startup times.

Since different tests require different resources to run, we can associate these tests with suites to isolate them. So if your test can run in a simulated environment with minimal data, it can go in a sandbox suite. If it requires external services like a search index, you can add it to a search-index-required suite. This also lets us group Buildbot slaves by environment, which reduces the number of tests running in the more expensive environments.

Of course when you have complex tests with complex external dependencies, eventually some of them will become unreliable. To help control for that we use Testify’s reporting capabilities to store a history of our test runs.

When was the last time this test failed? Is it me? Is it you? We can take all the JSON output from prior runs and put it in a database, making such analysis much easier.
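As a rough illustration of that kind of analysis, the sketch below loads JSON result records into an in-memory SQLite database and asks when a test last failed. The record layout (`run_id`, `test`, `success`, `author`) is invented for this example and is not Testify’s actual JSON output format; the helper names are hypothetical too.

```python
import json
import sqlite3

# Hypothetical result records, one JSON object per line; field names are assumptions.
RAW_RESULTS = """
{"run_id": 1, "test": "SpellCheck.test_pizza_search", "success": true, "author": "alice"}
{"run_id": 2, "test": "SpellCheck.test_pizza_search", "success": false, "author": "bob"}
{"run_id": 3, "test": "SpellCheck.test_pizza_search", "success": true, "author": "alice"}
{"run_id": 3, "test": "Reviews.test_save", "success": true, "author": "alice"}
"""


def load_results(conn, raw):
    """Parse JSON lines and store one row per test result."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(run_id INTEGER, test TEXT, success INTEGER, author TEXT)"
    )
    for line in raw.strip().splitlines():
        rec = json.loads(line)
        conn.execute(
            "INSERT INTO results VALUES (?, ?, ?, ?)",
            (rec["run_id"], rec["test"], int(rec["success"]), rec["author"]),
        )
    conn.commit()


def last_failure(conn, test):
    """Return (run_id, author) of the most recent failing run, or None."""
    return conn.execute(
        "SELECT run_id, author FROM results "
        "WHERE test = ? AND success = 0 "
        "ORDER BY run_id DESC LIMIT 1",
        (test,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
load_results(conn, RAW_RESULTS)
print(last_failure(conn, "SpellCheck.test_pizza_search"))  # (2, 'bob')
print(last_failure(conn, "Reviews.test_save"))  # None
```

Once the history is in a table, questions like “which tests fail most often” or “did this start failing after my change” become ordinary SQL queries.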

If the test is just fundamentally flawed, it’s easy to disable it. Just add it to the disabled suite, file a ticket, and move on.

Try it out

Testify has been great for us. We hope others find it useful as well.

May your testing infrastructure be forever pain free.

Testify is released under the Apache 2.0 license and hosted on GitHub.