Engineering Blog

Yelp Dataset Challenge Round 7 Winner and Announcing Round 9

Yelp Dataset Challenge Round 7 Winners

The seventh round of the Yelp Dataset Challenge ran throughout the first half of 2016 and, as usual, we were impressed with the projects and ideas that came out of the challenge. Today, we are proud to announce the grand prize winner of the $5,000 award: “Semantic Scan: Detecting Subtle, Spatially Localized Events in Text Streams” by Abhinav Maurya, Kenton Murray, Yandong Liu, Chris Dyer, William W. Cohen, and Daniel B. Neill (from Carnegie Mellon University and the University of Notre Dame in Indiana). The authors created a model to detect and characterize emerging topics in...

Continue reading

What it’s Like To Be a First-Time Speaker at Grace Hopper

Alexa H. has been a Product Designer at Yelp for two years, after graduating from California College of the Arts, where she studied Graphic Design. This year at Grace Hopper, she co-presented “Ask questions, lots of questions: A workshop for practicing building beautiful presentations in Google Slides and giving design critique.” It was her first public speaking experience, and after watching her fantastic performance, I wanted to know more!

Jenni: So Alexa, I’ll just dive right in: how did you feel about the prospect of public speaking before you submitted this proposal? Had you considered it in the past?

Alexa: Well,...

Continue reading

First 100 Days of Yelp's Public Bug Bounty Program

One hundred days ago we launched Yelp’s public bug bounty program on HackerOne. Since launching the program, we have received over 564 reports from 512 reporters. The distribution of the reports was as follows:

Resolved: ~7%
Informative: ~36%
Duplicate: ~31%
Not Applicable: ~26%

Looking back on the first 100 days of our program, we fixed 39 vulnerabilities and paid out $13,850 in rewards. We maintained a response time of under 24 hours and a resolution time of under one month. The distribution of bug-bounty payouts over time is shown in Chart 1.

Chart 1: Distribution of bug-bounty payouts over...
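As a rough sanity check on the figures quoted in this excerpt, the sketch below converts the approximate percentage shares into estimated report counts and computes an average reward per fixed vulnerability. It is only an illustration: the function names are made up for this example, the excerpt says "over 564" reports so the total is treated as a lower bound, and the average assumes the $13,850 is spread across the 39 fixed vulnerabilities, which the excerpt does not state.

```python
# Figures quoted in the excerpt. The post says "over 564" reports,
# so 564 is used here as a lower bound (an assumption of this sketch).
TOTAL_REPORTS = 564
VULNERABILITIES_FIXED = 39
TOTAL_PAYOUT_USD = 13_850

# Approximate shares from the excerpt (each is prefixed with "~" there).
SHARES_PERCENT = {
    "Resolved": 7,
    "Informative": 36,
    "Duplicate": 31,
    "Not Applicable": 26,
}


def estimated_counts(total, shares_percent):
    """Turn approximate percentage shares into rounded report counts."""
    return {
        label: round(total * pct / 100)
        for label, pct in shares_percent.items()
    }


def average_payout(total_payout, fixed):
    """Average reward per fixed vulnerability (hypothetical metric:
    assumes all rewards map to the fixed vulnerabilities)."""
    return total_payout / fixed


if __name__ == "__main__":
    for label, count in estimated_counts(TOTAL_REPORTS, SHARES_PERCENT).items():
        print(f"{label}: about {count} reports")
    avg = average_payout(TOTAL_PAYOUT_USD, VULNERABILITIES_FIXED)
    print(f"Average payout per fixed vulnerability: about ${avg:.0f}")
```

Under these assumptions, the ~7% "Resolved" share works out to about 39 reports, which is consistent with the 39 vulnerabilities the excerpt says were fixed, and the average comes to roughly $355 per fixed vulnerability.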

Continue reading

Finding Beautiful Yelp Photos Using Deep Learning

Yelp users upload around 100,000 photos a day to a collection of tens of millions, and that rate continues to grow. In fact, we’re seeing a growth rate for photos that is outpacing the rate of reviews. These photos provide a rich tapestry of information about the content and quality of local businesses. One important aspect of photos is the type of content being displayed. In August of 2015 we introduced a system that classified restaurant photos as food, drink, outside, inside, or menu. Since then, we have trained and put into production similar systems for coffee shops and bars,...

Continue reading

Open-Sourcing Yelp's Data Pipeline

For the past few months we’ve been spreading the word about our shiny new Data Pipeline: a Python-based tool that streams and transforms real-time data to services that need it. We wrote a series of blog posts covering how we replicate messages from our MySQL tables, how we track schemas and compute schema migrations, and finally how we connect our data to different types of data targets like Redshift and Salesforce. With all of this talk about the Data Pipeline, you might think that we here at Yelp are like a kid with a new toy, wanting to keep it...

Continue reading

Comparing searches at the DNC and RNC

Ah - politics. It’s frustrating, it’s funny, it’s serious business. It is perhaps one of the best windows we have into the nature of the human condition. So we think it’s fitting that we combine politics with another window into human nature - search data. In particular, Yelp’s PR team was interested in looking at how the Republican and Democratic National Conventions affected what people looked for on Yelp. Could we confirm certain political stereotypes? Would there be surprises?

Methodology

From a data analysis perspective, we first need to precisely state the question before we can get to tackling it....

Continue reading

Streaming Messages from Kafka into Redshift in near Real-Time

This post is part of a series covering Yelp's real-time streaming data infrastructure. Our series explores in depth how we stream MySQL and Cassandra data in real time, how we automatically track & migrate schemas, how we process and transform streams, and finally how we connect all of this into data stores like Redshift, Salesforce, and Elasticsearch.

Read the posts in the series:

Billions of Messages a Day - Yelp's Real-time Data Pipeline
Streaming MySQL tables in real-time to Kafka
More Than Just a Schema Store
PaaStorm: A Streaming Processor
Data Pipeline: Salesforce Connector
Streaming Messages from Kafka into Redshift in near...

Continue reading

Embedded Reviews at Yelp

Yelp is known for useful, funny, and cool reviews of all kinds of businesses, so it’s no surprise that they often get shared in other places. While we love seeing screenshots of these reviews shared on other websites, they don’t make for a great user experience. Their image format makes them easy to create and share, but they become outdated, load more slowly than text, and make it hard for users to find out more about the business. We wanted to build something that retains screenshots’ ease of sharing while solving some of their biggest shortcomings, so we created embedded reviews. Now,...

Continue reading