Yelp Dataset Challenge: Round 10 Winners And Round 12 Announcement
Sébastien C., Data Scientist
- Aug 2, 2018
Round 10 Winners
The tenth round of the Yelp Dataset Challenge ran throughout the second half of 2017 and we received many impressive, original, and fascinating submissions. As usual, we were struck by the quality of the entries: keep up the good work, folks!
Today, we are proud to announce the grand prize winner of the $5,000 award: “Understanding Hidden Memories of Recurrent Neural Networks” by Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, and Huamin Qu (from the Hong Kong University of Science and Technology). These authors developed a visual analytics method for understanding and comparing Recurrent Neural Network (RNN) models used in Natural Language Processing (NLP) tasks. Despite their impressive performance and wide use in deep learning problems, RNNs are still “black boxes” that are difficult for humans to understand. Among other things, the authors' method facilitates the fine-tuning of RNNs by making it easier to analyze what the different layers and units are doing.
This entry was selected from numerous submissions for its technical and academic merit by our panel of data scientists, data mining engineers, and software engineers. For a list of all previous winners of the Yelp Dataset Challenge, head over to the challenge site. Thanks to all who participated!
Round 12 Announcement
Round 11 just closed (stay tuned for the winners announcement!), and another one opened: round 12 of the Yelp Dataset Challenge is now live. It will run from August 1, 2018 to December 31, 2018. The Yelp Dataset Challenge gives college students access to reviews and businesses from 10 metropolitan areas across 2 countries. This time around, there are close to 6 million reviews written by about 1.5 million users about 188,500 businesses, as well as 157,075 check-ins and 1.2 million tips left by these users. Moreover, we have added even more photos of these businesses in a separate file, for convenience. With such a trove of data, the sky (or the processing power you have access to) is the limit. Remember, if you are a student, you’ll have the opportunity to win a $5,000 award if your submission is selected as the winner.
Want to try your hand at our dataset? Head to yelp.com/dataset to download and use it for personal, educational, and academic purposes.
And to see what else we’re up to with Yelp data, check out the Yelp blog’s data section.
Dataset Example Code
We maintain a repository of example code to help you get started playing with the dataset. These examples show different ways to interact with the data and how to use our open source Python MapReduce tool mrjob with the data.
The repository includes scripts for:
- Converting the dataset from JSON to CSV
- Predicting likely categories given review text
- Finishing reviews using Markov Chains
- Finding the sentiment of words in the dataset
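As an illustration of the first item in the list above, here is a minimal sketch of a JSON-to-CSV conversion. It is not the script from our repository: it assumes the input holds one JSON object per line, and the function name `json_lines_to_csv` and the sample fields (`business_id`, `stars`, `text`) are hypothetical choices made for this example.

```python
import csv
import io
import json


def json_lines_to_csv(json_file, csv_file, columns):
    """Copy the chosen columns from line-delimited JSON records into CSV rows.

    Assumption: each non-blank input line is one complete JSON object.
    Returns the number of records written.
    """
    # extrasaction="ignore" drops keys that are not listed in `columns`.
    writer = csv.DictWriter(csv_file, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    rows_written = 0
    for line in json_file:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Keys missing from a record become empty cells (DictWriter default).
        writer.writerow(record)
        rows_written += 1
    return rows_written


# Tiny in-memory demo with made-up records (hypothetical field names).
sample = io.StringIO(
    '{"business_id": "b1", "stars": 4, "text": "Great tacos"}\n'
    '{"business_id": "b2", "stars": 2}\n'
)
output = io.StringIO()
count = json_lines_to_csv(sample, output, ["business_id", "stars", "text"])
print(count)
print(output.getvalue())
```

To run this against a real file, pass file objects from `open(...)` instead of the in-memory buffers, and open the CSV file with `newline=""` as the `csv` module recommends. Nested values such as dictionaries or lists would be written as their string representation, so a real conversion would likely need to flatten them first.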
There are many ways to explore the vast data within the Yelp Dataset Challenge dataset. Below are a few of the many cool tools that can be used with our data:
CartoDB is a cloud-based mapping, analysis, and visualization engine that can transform reviews into insightful visualizations. They wrote a blog post demonstrating how to use their tools to gain interesting insights about the Las Vegas part of the dataset.
Statwing is a tool used to clean data, explore relationships, and create charts quickly. They loaded the dataset into their system so that people can play with it and uncover interesting insights.