The Yelp Dataset Challenge gives the academic community a real-world dataset on which to apply their research. We encourage students to take advantage of this wealth of data to develop and extend their own research in data science and machine learning. Students who submit their research are eligible for cash awards, as well as incentives for publishing and presenting their findings.
The previous Yelp Dataset Challenge (our third round) opened in February 2014 and gave students access to our Phoenix Academic Dataset, with reviews and businesses from the greater Phoenix metro area. In the fourth round, open now, we are expanding the dataset to include data from four new cities around the world. We are also opening the challenge to international students; see the terms and conditions for more information.
We are proud to announce that we are extending the popular Phoenix Academic Dataset to include four new cities! By adding a diverse set of cities, we hope to encourage students to compare and contrast each city and find new insights into what makes each one unique. The dataset consists of reviews, businesses and user information from:
- Phoenix, AZ
- Las Vegas, NV (new!)
- Madison, WI (new!)
- Waterloo, CAN (new!)
- Edinburgh, UK (new!)
The new dataset, available for immediate download, expands the previous Phoenix Academic Dataset with the following data:
- Businesses - 42,153 (+26,568 new businesses!)
- Business Attributes - 320,002 (+208,441 new attributes!)
- Check-in Sets - 31,617 (+20,183 new check-in sets!)
- Tips - 403,210 (+289,217 new tips!)
- Users - 252,898 (+182,081 new users!)
- User Connections - 955,999 (+804,482 new edges!)
- Reviews - 1,125,458 (+790,436 new reviews!)
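The figures above list each total alongside the number of newly added records, so subtracting one from the other recovers the size of the previous Phoenix-only dataset. The short Python sketch below does that arithmetic. It uses only the numbers in the list; the dictionary name `COUNTS` and the helper `previous_counts` are hypothetical and exist only for this illustration.

```python
# Totals and newly added records, copied from the list above: (total, new).
COUNTS = {
    "Businesses": (42_153, 26_568),
    "Business Attributes": (320_002, 208_441),
    "Check-in Sets": (31_617, 20_183),
    "Tips": (403_210, 289_217),
    "Users": (252_898, 182_081),
    "User Connections": (955_999, 804_482),
    "Reviews": (1_125_458, 790_436),
}


def previous_counts(counts):
    """Hypothetical helper: size of the earlier Phoenix-only dataset = total - new."""
    return {name: total - new for name, (total, new) in counts.items()}


if __name__ == "__main__":
    for name, prev in previous_counts(COUNTS).items():
        total, _ = COUNTS[name]
        # Growth factor: how many times larger the new dataset is than the old one.
        print(f"{name}: {prev:,} -> {total:,} ({total / prev:.1f}x)")
```

For example, the 1,125,458 reviews include 790,436 new ones, which leaves 335,022 reviews from the earlier Phoenix dataset.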
Round 4 is Now Live
Along with the updated dataset, we’re also happy to announce the next iteration of the Yelp Dataset Challenge. The challenge is open to students around the world and runs from August 1, 2014 to December 31, 2014. See the website for the full terms and conditions. The data can be used to train a wide variety of models and to extend research in many fields, so download it now and start exploring this real-world dataset right away!