Introducing The Yelp Open Dataset

About five years ago, we announced the Yelp Dataset Challenge: a competition that lets students explore and conduct research using our large corpus of data. Participants can also formally submit their projects for the chance to win prizes. Over the years we’ve seen incredible interest in and usage of our dataset for educational purposes. We’ve had teachers use it to teach their classes about databases, engineers use it to learn graph databases, and students use it to understand machine learning.

We’re very proud of this type of usage, and with the announcement of the Yelp Open Dataset we want to encourage even more of it. The Yelp Open Dataset allows you to use our dataset for personal or educational purposes so that you can learn from a realistic dataset.

The dataset itself is well-structured and highly relational. We’re providing it as both JSON files and a SQL dump for easy use in any scenario. It contains almost 5 million reviews from over 1.1 million users, covering over 150,000 businesses in 12 metropolitan areas. We’re also making 200,000 photos, their captions, and photo classification labels available for people looking to explore deep learning techniques around photo classification or search.
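To give a feel for working with the JSON files, here is a minimal Python sketch of one way to stream through a file and tally a field. It assumes each file stores one JSON object per line, and the helper names (iter_json_lines, count_by_field) are our own illustrations rather than anything shipped with the dataset.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_json_lines(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line of a file.

    Assumption: the file holds one JSON object per line.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_by_field(records: Iterable[dict], field: str) -> Counter:
    """Count how often each value of `field` occurs.

    Records that do not contain `field` are skipped.
    """
    counts = Counter()
    for record in records:
        if field in record:
            counts[record[field]] += 1
    return counts
```

For example, if the reviews file were named review.json and each record carried a stars field (both are assumptions here), count_by_field(iter_json_lines("review.json"), "stars") would return a tally of reviews per star rating without loading the whole file into memory.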

We’re very excited to make this available to everyone and hope that it’s both fun and educational.

Dataset Round 10

Today also marks the start of round 10 of the dataset challenge! Our nine previous rounds have led to a number of very exciting projects and dozens of winners. Round 10 kicks off today (August 30, 2017) and will run through the end of the year (December 31, 2017). The challenge will use the same data as the Yelp Open Dataset.
