The inaugural Yelp Dataset Challenge opened in March 2013 with the release of our latest academic dataset featuring reviews and businesses from the greater Phoenix metro area. The goal of the dataset was to encourage development of new techniques in data analysis and machine learning while providing the academic community with a rich dataset over which to train their models. Students who submitted their research related to the dataset were eligible for a cash reward and further incentives for publishing and presenting their findings.
The Winners of the First Yelp Dataset Challenge
The challenge was viewed by many thousands of people, and thousands of qualified applicants participated by downloading the dataset. From the completed entries we received, a team of our data mining engineers has selected four entries as grand prize winners (in alphabetical order by entry name):
- "Clustered Layout Word Cloud for User Generated Review." Ji Wang, Jian Zhao, Sheng Guo, Chris North. Virginia Tech and University of Toronto.
- "Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text." Julian McAuley, Jure Leskovec. Stanford University.
- "Improving Restaurants by Extracting Subtopics from Yelp Reviews." James Huang, Stephanie Rogers, Eunkwang Joo. UC Berkeley.
- "Inferring Future Business Attention." Bryan Hood, Victor Hwang, Jennifer King. Carnegie Mellon University.
We were extremely pleased with the breadth and depth of the entries received. Universities have started incorporating the dataset into their machine learning and natural language processing courses, and we saw many strong entries from class projects. We look forward to more academic integration at the course level and to seeing more entries from such projects.
Most of the entries used some aspect of machine learning, ranging from inferring subtopics (Huang et al. and McAuley et al.) to predicting future reviews (Hood et al.). We also received many entries built on other techniques, including one of the winners, Wang et al., which applied word clouds to increase the utility of a large number of reviews.
Opening the Next Yelp Dataset Challenge
We are happy to announce the next iteration of the Yelp Dataset Challenge. The challenge will be open to students in the US and Canada and will run from September 26, 2013 to February 10, 2014. See the website for the full terms and conditions.
This data can be used to train a myriad of models and extend research in many fields. We can’t wait to see what you come up with!