Engineering Blog

Posts for October 2010

Now Testify!

The Yelp code base has been under development for over six years. We push multiple times a day. We don’t have a dedicated QA team. Yelpers get very angry when their detailed account of how that waitress was totally into them doesn’t get saved. Effective automated testing is the only way we can stay sane. One of the great benefits of Python is its “batteries included” philosophy which gives us access to lots of great libraries including the built-in unittest module. However, as our code base grows in size and complexity, so do our testing needs. There are plenty of...

Continue reading

mrjob: Distributed Computing for Everybody

Ever wonder how we power the People Who Viewed this Also Viewed… feature? How does Yelp know that people viewing Coit Tower might also be interested in the Filbert Steps and the Parrots of Telegraph Hill? It’s pretty much what you’d expect: we look at a few months of access logs, find sessions where people viewed more than one business, and collect statistics about pairs of businesses that were viewed in the same session. Now here’s the kicker: we generate on the order of 100GB of log data every day. How do we deal with terabytes of data in a...

Continue reading