

Yelp Dataset Challenge Winners & Round Two Now Live

The Challenge

The inaugural Yelp Dataset Challenge opened in March 2013 with the release of our latest academic dataset featuring reviews and businesses from the greater Phoenix metro area. The goal of the dataset was to encourage development of new techniques in data analysis and machine learning while providing the academic community with a rich dataset over which to train their models. Students who submitted their research related to the dataset were eligible for a cash reward and further incentives for publishing and presenting their findings.

The Winners of the First Yelp Dataset Challenge

The challenge was viewed by many thousands of people, and thousands of qualified applicants participated by downloading the dataset. From the completed entries we received, a team of our data mining engineers has selected four entries as grand prize winners (in alphabetical order by entry name):

We were extremely pleased with the breadth and depth of the entries received. Universities have started incorporating the dataset into their machine learning and natural language processing courses, and we saw many strong entries from class projects. We look forward to more academic integration at the course level and to seeing more entries from such projects.

Most of the entries used some aspect of machine learning, from inferring subtopics (Huang et al. and McAuley et al.) to predicting future reviews (Hood et al.), among others. We also received entries that took different approaches, including one of the winners, Wang et al., which applied word clouds to make a large number of reviews more useful.

Opening the Next Yelp Dataset Challenge

We are happy to announce the next iteration of the Yelp Dataset Challenge. The challenge will be open to students in the US and Canada and will run from September 26, 2013 to February 10, 2014. See the website for the full terms and conditions.

This data can be used to train a myriad of models and extend research in many fields. We can’t wait to see what you come up with!