Engineering Blog

First 100 Days of Yelp's Public Bug Bounty Program

One hundred days ago we launched Yelp’s public bug bounty program on HackerOne. Since launching the program, we received over 564 reports from 512 reporters. The distribution of the reports was as follows: Resolved: ~ 7% Informative: ~ 36% Duplicate: ~ 31% Not Applicable: ~ 26% Looking back on the first 100 days of our program, we fixed 39 vulnerabilities and paid out $13,850 in rewards. We maintained less than 24 hours response time and less than 1 month resolution time. The distribution of bug-bounty payouts over time is shown in Chart 1. Chart 1: Distribution of bug-bounty payouts over...

Continue reading

Finding Beautiful Yelp Photos Using Deep Learning

Yelp users upload around 100,000 photos a day to a collection of tens of millions, and that rate continues to grow. In fact, we’re seeing a growth rate for photos that is outpacing the rate of reviews. These photos provide a rich tapestry of information about the content and quality of local businesses. One important aspect of photos is the type of content being displayed. In August of 2015 we introduced a system that classified restaurant photos as food, drink, outside, inside, or menu. Since then, we have trained and put into production similar systems for coffee shops and bars,...

Continue reading

Open-Sourcing Yelp's Data Pipeline

For the past few months we’ve been spreading the word about our shiny new Data Pipeline: a Python-based tool that streams and transforms real-time data to services that need it. We wrote a series of blog posts covering how we replicate messages from our MySQL tables, how we track schemas and compute schema migrations, and finally how we connect our data to different types of data targets like Redshift and Salesforce. With all of this talk about the Data Pipeline, you might think that we here at Yelp are like a kid with a new toy, wanting to keep it...

Continue reading

Comparing searches at the DNC and RNC

Ah - politics. It’s frustrating, it’s funny, it’s serious business. It is perhaps one of the best windows we have into the nature of the human condition. So we think it’s fitting that we combine politics with another window into human nature - search data. In particular, Yelp’s PR team was interested in looking at how the Republican and Democratic National Conventions affected what people looked for on Yelp. Could we confirm certain political stereotypes? Would there be surprises? Methodology From a data analysis perspective, we first need to precisely state the question before we can get to tackling it....

Continue reading

Streaming Messages from Kafka into Redshift in near Real-Time

This post is part of a series covering Yelp's real-time streaming data infrastructure. Our series explores in-depth how we stream MySQL and Cassandra data at real-time, how we automatically track & migrate schemas, how we process and transform streams, and finally how we connect all of this into data stores like Redshift, Salesforce, and Elasticsearch. Read the posts in the series: Billions of Messages a Day - Yelp's Real-time Data Pipeline Streaming MySQL tables in real-time to Kafka More Than Just a Schema Store PaaStorm: A Streaming Processor Data Pipeline: Salesforce Connector Streaming Messages from Kafka into Redshift in near...

Continue reading

Embedded Reviews at Yelp

Yelp is known for useful, funny, and cool reviews of all kinds of businesses, so it’s no surprise that they often get shared in other places. While we love seeing screenshots of these reviews shared on other websites, they don’t make for a great user experience. Their image formats make it easy to create and share but they become outdated, load slower than text, and make it hard for users to find out more about the business. We wanted to build something that retains screenshots’ ease of sharing while solving some their biggest shortcomings, so we created embedded reviews. Now,...

Continue reading

Data Pipeline: Salesforce Connector

This post is part of a series covering Yelp's real-time streaming data infrastructure. Our series explores in-depth how we stream MySQL and Cassandra data at real-time, how we automatically track & migrate schemas, how we process and transform streams, and finally how we connect all of this into data stores like Redshift, Salesforce, and Elasticsearch. Read the posts in the series: Billions of Messages a Day - Yelp's Real-time Data Pipeline Streaming MySQL tables in real-time to Kafka More Than Just a Schema Store PaaStorm: A Streaming Processor Data Pipeline: Salesforce Connector Streaming Messages from Kafka into Redshift in near...

Continue reading

The Great HTTPS Migration

Yelp is now entirely on HTTPS! While several pages have been secured for quite some time (we began securing pages with sensitive information like passwords, credit card numbers, and even the reviews you submit, in 2008), we’ve finally made the transition to using TLS across the entirety of our website. To some, this will sound like quite the accomplishment while others may wonder why it took until mid-2016 to complete this migration. Why Now? Brief History of HTTPS Netscape created SSL in 1994 and by 2000, TLS became the default encryption protocol and the modern HTTPS spec was created. But...

Continue reading