The Challenge

The second round of the Yelp Dataset Challenge opened in May 2013, giving students access to our massive Phoenix Academic Dataset, which contains reviews and businesses from the greater Phoenix metro area. The Yelp team is very excited to give the academic community a rich dataset on which to train models and extend research in data analysis and machine learning. Students who submit their research are eligible for cash awards, as well as incentives for publishing and presenting their findings.

The dataset was downloaded by thousands of students around the world. From the completed entries, we have selected David W. Vinson of the University of California, Merced, as the Round 2 winner for his submission, “Valence Constrains the Information Density of Messages.”

Updating and Extending the Dataset

We are excited to announce that we have updated and extended the original Phoenix Academic Dataset! The original dataset, released in March 2013, has been well received by the academic community and has already been cited in papers and included in presentations around the world. For more information on past winners and their papers, please check out the Yelp Dataset Challenge site.

The new dataset builds upon this foundation by not only refreshing it with new content created over the past year but also including new data like business attributes, the social graph and tips.

The new Phoenix Academic Dataset incorporates the following updates and new data types:

  • Businesses - 15,585 (+4,048 new businesses!)
  • Business Attributes - 111,561 (new!)
  • Check-in Sets - 11,434 (+3,152 new check-in sets!)
  • Tips - 113,993 (new!)
  • Users - 70,817 (+26,944 new users!)
  • User Connections - 151,516 (new!)
  • Reviews - 335,022 (+105,115 new reviews!)

This new data is available for immediate download at www.yelp.com/dataset_challenge and replaces the previous Phoenix Academic Dataset. We can't wait to see the projects and research that will be built using this data. We are especially excited to see research related to the new content: from micropost analysis on tips, to inferring business attributes from reviews, to mining the rich social graph for insights. We look forward to what you come up with!
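For anyone comparing results against the earlier release, the totals and "+N new" figures listed above are enough to work out how large each data type was before the update. The short Python sketch below does that arithmetic. The dictionary layout and the function names (`previous_counts`, `growth_percent`) are our own illustrative choices, not part of the dataset or any Yelp tooling, and the sketch assumes each "+N new" figure is the difference between the new and previous totals.

```python
# New totals and "+N new" figures as quoted in the list above.
# Data types marked "(new!)" did not exist in the earlier release,
# so their "added" value is None.
COUNTS = {
    "Businesses": (15_585, 4_048),
    "Business Attributes": (111_561, None),
    "Check-in Sets": (11_434, 3_152),
    "Tips": (113_993, None),
    "Users": (70_817, 26_944),
    "User Connections": (151_516, None),
    "Reviews": (335_022, 105_115),
}


def previous_counts(counts):
    """Return {name: total - added} for data types that existed before."""
    return {
        name: total - added
        for name, (total, added) in counts.items()
        if added is not None
    }


def growth_percent(counts):
    """Return {name: percentage growth relative to the previous total}."""
    previous = previous_counts(counts)
    return {
        name: 100.0 * counts[name][1] / previous[name]
        for name in previous
    }


if __name__ == "__main__":
    previous = previous_counts(COUNTS)
    growth = growth_percent(COUNTS)
    for name in previous:
        print(f"{name}: {previous[name]:,} before, +{growth[name]:.1f}% growth")
```

Brand-new data types are skipped rather than reported as zero, since there is no earlier figure to compare them against.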

Round 3 is Now Live

Along with the updated dataset, we’re also happy to announce the next iteration of the Yelp Dataset Challenge. The challenge is open to students in the US and Canada and runs from February 11, 2014 to July 31, 2014. See the website for the full terms and conditions. This data can be used to train a myriad of models and extend research in many fields, so download the dataset now and start using our data right away!
