6.

An impressive number.

It’s 1 x 2 x 3 AND 1 + 2 + 3.

It’s brilliant AND the number of degrees of freedom a rigid object has to move in three dimensions.

It’s where we are in the history of the Yelp Academic Dataset challenge.

We’ve had 5 rounds and seen hundreds of academic papers written, and we are excited to go at it again.

[Figure: dataset map]

Our dataset for this iteration includes information about local businesses in 10 cities across 4 countries. It contains 1.6M reviews and 500K tips by 366K users for 61K businesses. It also comes with rich attribute data (such as hours of operation, ambience, and parking availability) for these businesses, social network information about the users, and aggregated check-ins over time for all these users.

At Yelp, one of our missions is to engage with the academic community and support its research by providing real-world data. Our dataset should be useful to researchers in data mining, machine learning, economics, and urban planning alike. Whether you’re building a cutting-edge Natural Language Processing (NLP) algorithm that mines sentiments expressed by our reviewers, figuring out what business attributes (service quality, ambience, etc.) make a local business popular, or designing better cities and communities by mining local business data – our dataset has everything you need to put your research together.

New Competition: Deadline is Dec 31, 2015

Download the new dataset and remember to submit your entry by December 31, 2015 in order to be eligible for one of our prizes of $5,000. We expect that many folks will be putting the finishing touches on their projects instead of partying that evening, so it’s important to note that we’ll be using Pacific Standard Time for that deadline. Please note that the contest itself is open only to students, though others can use the dataset for academic purposes if they otherwise follow our standard terms and conditions. Check out this webpage for more details.

Fifth Round

We’re still judging the fifth round entries, and we’ll publish a list of winners in the coming months, but in the meantime here are a few interesting findings to whet your appetite:

Yelpers have written over 80 million reviews about local businesses as diverse as the wave organ in San Francisco and Daiwa Sushi in Tokyo. Many of those reviews have been voted “Useful”, “Funny”, or “Cool” by our users, but R. Cheng et al. from Santa Clara University decided that they couldn’t read them all. Instead they studied reviews with Useful, Funny and Cool votes in the dataset and distilled them into a single “koan-like” phrase:

"like good food, just one great place."

We suspect there is more than one great place, but we can’t disagree with the sentiment.

Overall, Yelp reviews tend to be more positive than negative, and they also tend to contain a lot of useful information about local businesses. D. Song et al. from the University of Washington wondered if there was a correlation between the length and sentiment of reviews, and they produced a nice visualization of review lengths on Yelp as they relate to rating. They also compared them to readability guidelines and useful votes:

[Figure: review length by rating, compared with readability guidelines and useful votes]

It turns out positive reviews tend to be shorter than negative ones, but useful reviews tend towards the magical middle.
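If you’d like to poke at the length-versus-rating question yourself, here is a minimal sketch of one way to start. It is not the method D. Song et al. used; it simply averages word counts per star rating. The field names `stars` and `text`, the helper `mean_length_by_rating`, and the sample records are all assumptions made for illustration, so adjust them to match the files you download.

```python
from collections import defaultdict
from statistics import mean


def mean_length_by_rating(reviews):
    """Return {stars: mean word count} for an iterable of review dicts.

    Assumes each review is a dict with a "stars" rating and a "text" body;
    these field names are an assumption, not taken from the dataset docs.
    """
    lengths = defaultdict(list)
    for review in reviews:
        # Whitespace-separated word count is a crude proxy for review length.
        lengths[review["stars"]].append(len(review["text"].split()))
    # Sort by rating so the result reads from lowest to highest stars.
    return {stars: mean(counts) for stars, counts in sorted(lengths.items())}


# Made-up sample records, for illustration only (not from the dataset).
sample_reviews = [
    {"stars": 5, "text": "Great tacos and fast service."},
    {"stars": 5, "text": "Loved it."},
    {"stars": 1, "text": "We waited an hour and the food arrived cold and bland."},
]

print(mean_length_by_rating(sample_reviews))
```

Word count is only one possible measure of length; character counts or sentence counts would slot into the same loop.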
