Engineering Blog

November 7th, 2013

Data Quality: How Yelp stacks up to the competition

Here at Yelp we take pride in providing accurate business listings. This is a harder problem than you might think. There are about 50 million businesses listed on Yelp, businesses come and go at an astonishing rate, and reasonable people can, and frequently do, disagree about whether a piece of information is accurate or not (e.g., does a doctor qualify for her own listing or just to be listed under her medical practice?). We've always thought we did a pretty good job, but we decided it was worth benchmarking ourselves against the competition. Though there is always room to improve, it turns out we are one of the best.

For our study we collected “Best Of” lists from independent local publications like 7×7 in San Francisco, Time Out London, Esquire, and LA Weekly. We assumed that these lists would provide a rough sample of important local businesses as well as correct listing data (backed up by the business’s own website). We pulled together these lists until we had approximately 1,000 businesses distributed across a variety of categories and geographies, from ski resorts in Colorado to pubs in London, giving us a broad range of businesses for our test group. Clearly this approach has flaws. Neither the relative weighting of lists nor our assumptions are perfect (we have 100 gas stations but no French business listings), but it does give us a rough idea of where things stand.

Once we had this list of businesses, our team went through the listings one by one verifying name, address, phone number, website, presence of photos, lack of duplicates, and location accuracy on Yelp as well as some of our competitors. Roughly, we gave each data provider 1 point for each correct datum and 0 points for each incorrect or missing datum (you can see a more detailed rubric at the end of the article). Here is an example of a Yelp listing that misses a few points:

You may be wondering why we’re looking at photos and not at reviews. Not all of the sites in our study have the same concept of reviews, and we needed a measure of content that is comparable across sites. We also like photos as a data point since they validate listings data: folks posting photos usually have physically visited the business.

With all this data collected, we looked to see what percentage of the possible points each listing site received in each category. Without further ado, let’s see the results!

Entries that are significantly better than the other entries in a given row at a 95% confidence level are bolded.
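For the curious, a comparison like this can be checked with a standard two-proportion z-test. The sketch below is purely illustrative: the post doesn’t specify the exact test used, and the counts in the example are made up.

```python
from math import sqrt, erf

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sided two-proportion z-test: is provider A's rate of correct
    data significantly different from provider B's?"""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: 850/1000 points earned vs 780/1000.
z, p = two_proportion_z(850, 1000, 780, 1000)
significant = p < 0.05  # bold the better entry at 95% confidence
```

With these made-up counts the difference clears the 95% bar comfortably; with closer scores or smaller samples (as in the sparser TripAdvisor and Yellowbook cells), it often wouldn’t.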

There are a few things of note. First, TripAdvisor and Yellowbook do not have listings in some categories/geographies, so they have a smaller set of samples. Second, fewer listings had websites than other types of data. This is at least partly because not all businesses have websites, so the maximum is less than 100%. Finally, one downside of our approach to scoring is that a missing listing gives you credit for not having a duplicate. This flaw in scoring, combined with a fairly large number of listings outside of TripAdvisor’s main area of focus (79 Shopping businesses in Dublin, Ireland, for example), likely inflates TripAdvisor’s “No Dupes” score.

The high-level story is that in terms of listing data Yelp and Google are closely matched and ahead of the other competition. Google wins out on finding business websites, which isn’t surprising given that crawling the web is part of its core business. On the other hand, Yelp is well ahead of Google in terms of photo content.

This is just a rough early step toward understanding the quality of our data. Fortunately, we have a Data team to take on this very challenge. They are applying machine learning and data mining techniques to improve our listings both automatically and by better leveraging the Yelp community. If you think this sounds cool, check out our Data Mining or Data Infrastructure jobs. If you want to dive deeper into these differences across the market and publish your results, ping us at dataset@yelp.com and we will be happy to give you a closer look at the raw data.

Scoring Rubric

Business Name – Is the name recognizable?

  • 1 – Correct within bounds of simple rewrites. “st” vs “street” gets a point, but “Fifth Avenue Barbershop” vs “5th Avenue Barber & Salon” is much harder to match and thus gets no point.

  • 0 – Name is very wrong or unmatchable, there is no listing at all, or the business is closed and not marked as such

Address

  • 1 – Correct address

  • 0 – Incorrect or missing address data

Phone

  • 1 – Correct phone number

  • 0 – Incorrect or missing phone number

Has Photos – A rough cross-site measure of content quality

  • 1 – Has any photos

  • 0 – Has no photos

Website

  • 1 – Correct website link

  • 0 – Incorrect or missing link

No Dupes – Are there any duplicate listings of the business?

  • 1 – There is at most one listing of the business

  • 0 – There are multiple listings for the business

Location accuracy – Can you find the business from the map pin?

  • 1 – Location is close enough to easily find the business. Fifteen yards down the street is OK; three blocks away is not.

  • 0 – Location is clearly different from other sites, like not on the same block. Spot-verified by Google Street View.
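The rubric above is simple enough to encode directly. The sketch below is an illustration only: the field names and the listing representation are our assumptions, not Yelp’s actual schema, and it includes the missing-listing quirk noted earlier (a missing listing still “earns” the no-dupes point).

```python
# Illustrative field names, one per rubric item (not Yelp's real schema).
FIELDS = ["name", "address", "phone", "has_photos", "website", "no_dupes", "location"]

def score_listing(listing):
    """Score one listing: 1 point per correct field, 0 for incorrect or
    missing. `listing` maps field name -> bool (was it correct?)."""
    if listing is None:
        # Quirk from the post: a missing listing gets credit for
        # not having a duplicate, and nothing else.
        return 1
    return sum(1 for field in FIELDS if listing.get(field, False))

def category_percentage(listings):
    """Percentage of possible points earned across a set of listings."""
    possible = len(listings) * len(FIELDS)
    earned = sum(score_listing(listing) for listing in listings)
    return 100.0 * earned / possible

# Hypothetical category: one perfect listing, one missing phone and photos.
sample = [
    {f: True for f in FIELDS},
    {f: f not in ("phone", "has_photos") for f in FIELDS},
]
pct = category_percentage(sample)  # (7 + 5) points out of 14
```

Each cell in the results table would then be a `category_percentage` over the sampled businesses for that provider and category.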