Engineering Blog

Analyzing the Web For the Price of a Sandwich

I geek out about the Common Crawl. It’s an open source crawl of huge parts of the Internet, accessible for anyone to use. You have full access to the HTML and text of billions of web pages. What’s more, you can scan the entire thing, tens of terabytes, for just a few bucks on Amazon EC2. These days they’re releasing a new dataset every month. It’s awesome. People frequently use mrjob to scan the Common Crawl, so it seems like a fitting tool for us to use. mrjob, if you’re not familiar, is a Python framework written by Yelp to...


March Events @ Yelp

This month we’re ramping up and preparing for an awesome time at PyCon. We’ll be there in full force next month, so look for us at booth 606! Be sure to catch a presentation by our own Soups R. on Friday, April 10 at 12:10, where he’ll be speaking on “Data Science in Advertising: Or a future when we love ads.” In the meantime, hopefully you aren’t too sleepy from daylight saving time to attend some great events this month:

Wednesday, March 11, 2015 - 6:00PM - Tech talks and PyCon Startup Row Pitches (SF Python)
Thursday, March 19,...


Reading Between the Lines: How We Make Sense of Users' Searches

The Problem

People expect a lot out of search queries on Yelp. Understanding exact intent from a relatively vague text input is challenging. A few months ago, the Search Quality team felt like we needed to take a step back and reassess how we were thinking about a user’s search so that we could return better results for a richer set of searches. Our main business search stack takes into account many kinds of features that can each be classified as being related to one of distance, quality and relevance. However, sometimes these signals encode related meaning, making the equivalence...


Yelp Dataset Challenge is Doubling Up!

Two years, four highly competitive rounds, over $35,000 in cash prizes awarded and several hundred peer-reviewed papers later: the Yelp Dataset Challenge is doubling up. We are proud to announce our latest dataset that includes information about local businesses in 10 cities across 4 countries. This dataset contains 1.6M reviews and 500K tips by 366K users for 61K businesses along with rich attributes data (such as hours of operation, ambience, parking availability) for these businesses, social network information about the users, as well as aggregated check-ins over time for all these users. This treasure trove of local business data is...


assert_called_once: Threat or Menace

I remember the first time I laid eyes on the beast: Summer, 2012. The air conditioning at Yelp HQ hummed imperceptibly as I reviewed code for a colleague. This wasn’t my first rodeo, but I was new to Yelp and to working in a large Python codebase. I painstakingly scanned the code for lurking bugs, but couldn’t find any. “Ship it!” I declared electronically, freeing my colleague to deploy his changes. It is chilling to think that on that day, I looked the beast squarely in the eyes and never realized it. Cunning camouflage allowed it to slip past me...


February Events At Yelp

The Northeast may be covered in snow, but here in sunny San Francisco, we’re operating at full steam! We got the year off to a great start with a few meetups and the launch of our new tech talk series (more on that later). This month, on top of the meetups we’re hosting, we’re also getting involved with GirlDevWeek. We’ll be hosting a panel here at Yelp HQ, as well as throwing the official after party in conjunction with Pandora. So if you’re going to be at GirlDevWeek, we hope to see you at both events! Meetups at Yelp HQ:...


Animating the Mobile Web

One of the most engaging features of Yelp is our photos and videos gallery. When you visit a Yelp Business Page inside of the mobile app, there is a photo at the top of the page to provide visual context. It also serves as a compelling entry point to our photo viewer if you pull it down. We wanted to have this same effect on our mobile site, so we set out to develop a nice, smooth animation to pull down this photo and delight mobile web users with the same experience they’re used to on our mobile applications. The...


CTEs and Window Functions: Unleashing the Power of Redshift

At Yelp, we’re very big fans of Amazon’s Redshift data warehouse. We have multiple deployments of Redshift with different data sets in use by product management, sales analytics, ads, SeatMe and many other teams. While calling Redshift a simple fork of Postgres 8.4 understates the work the Redshift team has done, Redshift does share a common code ancestry with PG 8.4. This means that much of the advanced query functionality of Postgres is available, which, when combined with the petabyte scale of Redshift, offers some amazingly powerful analytics tools. Because most of the PG 8.4...
