mrjob: Distributed Computing for Everybody
Dave M, Search and Data-Mining Engineer
Oct 29, 2010
Ever wonder how we power the “People Who Viewed this Also Viewed…” feature? How does Yelp know that people viewing Coit Tower might also be interested in the Filbert Steps and the Parrots of Telegraph Hill? It’s pretty much what you’d expect: we look at a few months of access logs, find sessions where people viewed more than one business, and collect statistics about pairs of businesses that were viewed in the same session. Now here’s the kicker: we generate on the order of 100 GB of log data every day. How do we deal with terabytes of data in a...
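
To make the session-pair idea concrete, here is a minimal single-machine sketch in plain Python. It is not Yelp's actual pipeline: the `(session_id, business_id)` record shape, the function name `count_coviewed_pairs`, and the sample data are all assumptions made up for illustration, and a real job at this data volume would need to be distributed rather than run in memory.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_coviewed_pairs(views):
    """Count how often two businesses were viewed in the same session.

    `views` is an iterable of (session_id, business_id) tuples. This
    record shape is an assumption for illustration, not a real log format.
    """
    # Group the distinct businesses seen in each session.
    businesses_by_session = defaultdict(set)
    for session_id, business_id in views:
        businesses_by_session[session_id].add(business_id)

    pair_counts = Counter()
    for businesses in businesses_by_session.values():
        # A session with only one business produces no pairs.
        # Sorting gives every pair a single canonical order.
        for pair in combinations(sorted(businesses), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical sample data, not real log entries.
sample_views = [
    ("s1", "coit-tower"),
    ("s1", "filbert-steps"),
    ("s2", "coit-tower"),
    ("s2", "filbert-steps"),
    ("s2", "parrots-of-telegraph-hill"),
    ("s3", "coit-tower"),
]
pair_counts = count_coviewed_pairs(sample_views)
```

Calling `pair_counts.most_common()` lists the pairs that were viewed together most often. Session `s3` contributes nothing because it contains only one business.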