What can you learn from a Photo? Show us with the Yelp Dataset Challenge Round 7!

The Challenge

The Yelp Dataset Challenge provides the academic community with a real-world dataset to which they can apply their research. We encourage students to take advantage of this wealth of data to develop and extend their own research in data science and machine learning. Students who submit their research are eligible for cash awards and incentives for publishing and presenting their findings. A new round of the Yelp Dataset Challenge (our seventh already!) opened on January 15, 2016, giving students access to reviews and businesses from 10 cities across 4 countries. The challenge is also open to international students. See the terms and conditions for more information.

New Data: Now Including 200,000 Photos

Deep learning has changed the game for researchers and companies alike over the last couple of years. To keep up with the trend, we are proud to announce that we are updating the previous dataset by adding more businesses, reviews, tips, check-ins, and users across the 10 cities we selected! Moreover, we’re excited to announce a great new feature: a dataset of 200,000 photos taken by our users in the businesses selected for round 7. These photos nicely complement reviews, business attributes, check-ins, and tips, and open the door to even more exciting research. This addition to the dataset is released as an auxiliary 5.9 GB tar file. With this update, the new dataset comprises reviews, businesses, and user information from:

  • Phoenix, AZ
  • Pittsburgh, PA
  • Charlotte, NC
  • Urbana-Champaign, IL
  • Las Vegas, NV
  • Madison, WI
  • Waterloo, Canada
  • Montreal, Canada
  • Karlsruhe, Germany
  • Edinburgh, UK

This new dataset and the photo auxiliary file are available for immediate download. Here is what the dataset now contains, with growth relative to the round 6 dataset in parentheses:

  • Businesses: 77,445 (+27%)
  • Business Attributes: 566,610 (+18%)
  • Check-in Sets: 55,569 (+23%)
  • Tips: 591,864 (+18%)
  • Users: 552,339 (+51%!)
  • User Connections: 3,563,817 (+23%)
  • Reviews: 2,225,213 (+42%)
  • Photos: 200,000 from 41,658 businesses
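For a rough sense of scale, the figures above can be turned into a few back-of-the-envelope numbers. The sketch below is only an illustration: the function names are hypothetical, the round 6 counts are not stated in this post and are merely estimated from the rounded percentages, so they may be off by a fraction of a percent.

```python
# Round 7 counts and growth percentages, copied from the list above.
ROUND_7 = {
    "Businesses": (77_445, 27),
    "Business Attributes": (566_610, 18),
    "Check-in Sets": (55_569, 23),
    "Tips": (591_864, 18),
    "Users": (552_339, 51),
    "User Connections": (3_563_817, 23),
    "Reviews": (2_225_213, 42),
}
PHOTOS = 200_000
BUSINESSES_WITH_PHOTOS = 41_658


def implied_previous_count(current: int, growth_percent: float) -> int:
    """Estimate the earlier count implied by a (rounded) growth percentage."""
    return round(current / (1 + growth_percent / 100))


def photos_per_business(photos: int, businesses: int) -> float:
    """Average number of photos per business that has at least one photo."""
    return photos / businesses


if __name__ == "__main__":
    for name, (count, growth) in ROUND_7.items():
        estimate = implied_previous_count(count, growth)
        print(f"{name}: {count:,} now, roughly {estimate:,} in round 6")

    average = photos_per_business(PHOTOS, BUSINESSES_WITH_PHOTOS)
    share = BUSINESSES_WITH_PHOTOS / ROUND_7["Businesses"][0]
    print(f"About {average:.1f} photos per business with photos")
    print(f"About {share:.0%} of businesses have at least one photo")
```

In other words, a little over half of the businesses in the dataset come with photos, at close to five photos each on average.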

Round 7 is Now Live

The challenge is now open to students around the world and will run from January 15, 2016 to June 30, 2016. See the website for the full terms and conditions. The data can be used to train a myriad of models and extend research in many fields, so download the dataset now and start working with real-world data right away!
