Yelp Dataset Challenge Round 9 Winner
Sébastien C., Data Scientist
Jan 8, 2018
The ninth round of the Yelp Dataset Challenge ran throughout the first half of 2017 and, as usual, we received a large number of impressive and interesting submissions. We were struck by the quality of the entries: keep up the good work!
Today, we are proud to announce the grand prize winner of the $5,000 award: “CORALS: Who are My Potential New Customers? Tapping into the Wisdom of Customers’ Decisions” by Ruirui Li, Chelsea J-T Ju, Jyunyu Jiang, and Wei Wang (from the Department of Computer Science at the University of California, Los Angeles). The authors developed an elaborate recommendation system that matches businesses with potential customers by taking into account several aspects of the customer decision-making process.
Their model considers the personal preferences of users (based, for example, on their check-ins), along with geographical data and the reputation of local businesses. The authors demonstrated that their model outperforms other recent and widely used methods. This work is directly relevant to what we do at Yelp, and should be of interest to tech companies across the globe.
This entry was selected from numerous submissions for its technical and academic merit by our panel of data scientists, data mining engineers, and software engineers. For a list of all previous winners of the Yelp Dataset Challenge, head over to the challenge site. Thanks to all who participated!
Want to try your hand at our dataset? Head to yelp.com/dataset to download and use it for personal, educational, and academic purposes.
And to see what else we’re up to with Yelp data, check out the Yelp blog’s data section.
Dataset Example Code
We maintain a repository of example code to help you get started with the dataset. These examples show different ways to interact with the data, including how to use mrjob, our open-source Python MapReduce tool, with it.
The repository includes scripts for:
- Converting the dataset from JSON to CSV
- Predicting likely categories given review text
- Finishing reviews using Markov chains
- Finding the sentiment of words in the dataset
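To give a flavor of the first item, here is a minimal sketch of a JSON-to-CSV conversion using only the Python standard library. It is not the script from our repository: the function name `json_lines_to_csv` is hypothetical, the sample records are made up, and it assumes the input holds one JSON object per line with flat fields such as `business_id`, `stars`, and `text`.

```python
import csv
import io
import json


def json_lines_to_csv(json_lines, csv_file, fieldnames):
    """Write one CSV row per JSON object, keeping only `fieldnames`.

    Hypothetical helper for illustration; assumes one JSON object per line.
    Returns the number of rows written.
    """
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    rows_written = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Missing fields become empty cells; extra fields are dropped.
        writer.writerow({name: record.get(name, "") for name in fieldnames})
        rows_written += 1
    return rows_written


# Made-up sample records standing in for a dataset file.
sample = io.StringIO(
    '{"business_id": "b1", "stars": 4, "text": "Great tacos"}\n'
    '{"business_id": "b2", "stars": 2, "text": "Slow service"}\n'
)
output = io.StringIO()
count = json_lines_to_csv(sample, output, ["business_id", "stars", "text"])
print(output.getvalue())
```

With real files, you would pass open file handles instead of the in-memory `io.StringIO` objects, opening the CSV file with `newline=""` as the `csv` module recommends. Nested fields would need flattening first, which this sketch does not attempt.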
Other Tools
There are many ways to explore the vast amount of data in the Yelp Dataset Challenge dataset. Below are a few of the many cool tools that can be used with our data:
CartoDB is a cloud-based mapping, analysis, and visualization engine that can turn reviews into insightful visualizations. They wrote a blog post demonstrating how to use their tools to gain interesting insights about the Las Vegas part of the dataset.
Statwing is a tool used to clean data, explore relationships, and create charts quickly. They loaded the dataset into their system so that people can play with it and uncover interesting insights.