Yelp Dataset Challenge Round 6 Winners

The sixth round of the Yelp Dataset Challenge ran throughout the second half of 2015, and we were really impressed with the projects and ideas that came out of it.

Today, we are proud to announce the grand prize winner of the $5,000 award: “Topic Regularized Matrix Factorization for Review Based Rating Prediction” by Jiachen Li, Yan Wang, Xiangyu Sun, Chengliang Lian, and Ming Yao (from the Language Technologies Institute, School of Computer Science, at Carnegie Mellon University). The authors created a recommender system that tells Yelpers which businesses they might be interested in by predicting the star rating they would give each one.

To achieve this, the authors propose combining topic modeling, based on Latent Dirichlet Allocation (LDA), with matrix factorization in a new model called Topic Regularized Matrix Factorization (TRMF). Their model incorporates the topic model as a constraint that regularizes the learning process of matrix factorization. TRMF is an original method that performs better than previously proposed approaches, and it is a clever way to solve a problem that is both difficult and highly relevant to many tech companies.
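To make the idea of topic regularization more concrete, here is a minimal sketch in plain Python. It is not the winning team's implementation: it assumes one simple form of the idea, in which each business already has a topic vector (for example, produced by LDA over its reviews) and the business's latent factors are pulled toward that vector while the usual matrix factorization is trained with stochastic gradient descent. The function names, the hyperparameters, and the toy data are all made up for illustration.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def train_topic_regularized_mf(ratings, item_topics, n_users, k,
                               epochs=200, lr=0.02, l2=0.02,
                               topic_weight=0.1, seed=0):
    """Factorize ratings while pulling item factors toward topic vectors.

    ratings:     list of (user_index, item_index, stars) tuples
    item_topics: one length-k topic vector per item (assumed precomputed)
    Minimizes, per observed rating, the squared error plus an L2 penalty
    on both factor vectors plus topic_weight * ||q_i - theta_i||^2.
    """
    rng = random.Random(seed)
    user_f = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    # Start each item at its topic vector (an illustrative choice).
    item_f = [list(theta) for theta in item_topics]

    for _ in range(epochs):
        for u, i, stars in ratings:
            p, q, theta = user_f[u], item_f[i], item_topics[i]
            err = stars - dot(p, q)
            for f in range(k):
                pf, qf = p[f], q[f]
                p[f] += lr * (err * qf - l2 * pf)
                # The extra term is the topic regularizer.
                q[f] += lr * (err * pf - l2 * qf - topic_weight * (qf - theta[f]))
    return user_f, item_f


def mean_squared_error(ratings, user_f, item_f):
    """Average squared error of the predicted stars over the given ratings."""
    return sum((r - dot(user_f[u], item_f[i])) ** 2
               for u, i, r in ratings) / len(ratings)


if __name__ == "__main__":
    # Toy data: 3 users, 3 businesses, 2 topics (all values invented).
    toy_ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 2, 2), (2, 1, 5), (2, 2, 4)]
    toy_topics = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
    users, items = train_topic_regularized_mf(toy_ratings, toy_topics, n_users=3, k=2)
    print("training MSE:", mean_squared_error(toy_ratings, users, items))
    print("predicted stars, user 1 on business 1:", dot(users[1], items[1]))
```

Setting `topic_weight` to zero reduces this sketch to ordinary regularized matrix factorization, which is a convenient way to see what the topic term contributes.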

This entry was selected from tons of submissions for its technical and academic merit. For a full list of all previous winners of the Yelp Dataset Challenge, head over to the challenge site. Thanks to all who participated!

Dataset Example Code

We maintain a repository of example code to help you get started playing with the dataset. These examples show different ways to interact with the data and how to use our open source Python MapReduce tool mrjob with the data.
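As a rough illustration of the map-and-reduce style of processing mentioned above, here is a small sketch that computes the average star rating per business. It uses only the Python standard library rather than mrjob itself, and it is not taken from the example repository. It assumes the reviews are stored as one JSON object per line with `business_id` and `stars` fields; the function names and the sample records are hypothetical.

```python
import json
from collections import defaultdict


def map_review(line):
    """Map step: emit a (business_id, stars) pair for one JSON review line."""
    review = json.loads(line)
    yield review["business_id"], review["stars"]


def reduce_stars(stars):
    """Reduce step: turn all star values for one business into (average, count)."""
    stars = list(stars)
    return sum(stars) / len(stars), len(stars)


def average_stars(lines):
    """Run the map step, group by key, then run the reduce step per business."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        for business_id, stars in map_review(line):
            grouped[business_id].append(stars)
    return {business_id: reduce_stars(stars)
            for business_id, stars in grouped.items()}


if __name__ == "__main__":
    # Invented sample records in the assumed one-object-per-line layout.
    sample = [
        '{"business_id": "b1", "stars": 5, "text": "Great tacos"}',
        '{"business_id": "b1", "stars": 3, "text": "Slow service"}',
        '{"business_id": "b2", "stars": 3, "text": "Fine"}',
    ]
    for business_id, (avg, count) in sorted(average_stars(sample).items()):
        print(business_id, avg, count)
```

To run the same logic over a real file, you could pass an open file object to `average_stars`, since it only needs an iterable of lines.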

The repository includes scripts for

Other Tools

There are many ways to explore the Yelp Dataset Challenge data. Below are a few of the many cool tools that can be used with it:

CartoDB is a cloud-based mapping, analysis, and visualization engine that lets you transform reviews into insightful visualizations. They recently wrote a blog post demonstrating how to use their tools to gain interesting insights about the Las Vegas part of the dataset.

Statwing is a tool used to clean data, explore relationships, and create charts quickly. They loaded the dataset into their system so that people can play with it and look for interesting insights.

Next Yelp Dataset Challenge Round

Submissions for Round 7 closed on June 30, 2016, but Round 8 is just around the corner, so stay connected. We are excited to see what you will come up with!