Engineering Blog

Finding Beautiful Yelp Photos Using Deep Learning

Yelp users upload around 100,000 photos a day to a collection of tens of millions, and that rate continues to grow. In fact, photo uploads are growing faster than reviews. These photos provide a rich tapestry of information about the content and quality of local businesses. One important aspect of a photo is the type of content it shows. In August 2015 we introduced a system that classified restaurant photos as food, drink, outside, inside, or menu. Since then, we have trained and put into production similar systems for coffee shops and bars,...

Open-Sourcing Yelp's Data Pipeline

For the past few months we’ve been spreading the word about our shiny new Data Pipeline: a Python-based tool that streams and transforms real-time data to services that need it. We wrote a series of blog posts covering how we replicate messages from our MySQL tables, how we track schemas and compute schema migrations, and finally how we connect our data to different types of data targets like Redshift and Salesforce. With all of this talk about the Data Pipeline, you might think that we here at Yelp are like a kid with a new toy, wanting to keep it...

Comparing searches at the DNC and RNC

Ah - politics. It’s frustrating, it’s funny, it’s serious business. It is perhaps one of the best windows we have into the nature of the human condition. So we think it’s fitting that we combine politics with another window into human nature - search data. In particular, Yelp’s PR team was interested in looking at how the Republican and Democratic National Conventions affected what people looked for on Yelp. Could we confirm certain political stereotypes? Would there be surprises?

Methodology

From a data analysis perspective, we first need to precisely state the question before we can get to tackling it....

Streaming Messages from Kafka into Redshift in near Real-Time

This is the sixth post in a series covering Yelp's real-time streaming data infrastructure. Our series explores in depth how we stream MySQL updates in real-time with an exactly-once guarantee, how we automatically track & migrate schemas, how we process and transform streams, and finally how we connect all of this into datastores like Redshift and Salesforce.

Read the posts in the series:

Billions of Messages a Day - Yelp's Real-time Data Pipeline
Streaming MySQL tables in real-time to Kafka
More Than Just a Schema Store
PaaStorm: A Streaming Processor
Data Pipeline: Salesforce Connector
Streaming Messages from Kafka into Redshift in...

Embedded Reviews at Yelp

Yelp is known for useful, funny, and cool reviews of all kinds of businesses, so it’s no surprise that they often get shared in other places. While we love seeing screenshots of these reviews shared on other websites, they don’t make for a great user experience. Their image format makes them easy to create and share, but they become outdated, load more slowly than text, and make it hard for users to find out more about the business. We wanted to build something that retains screenshots’ ease of sharing while solving some of their biggest shortcomings, so we created embedded reviews. Now,...

Data Pipeline: Salesforce Connector

This is the fifth post in a series covering Yelp's real-time streaming data infrastructure. Our series explores in depth how we stream MySQL updates in real-time with an exactly-once guarantee, how we automatically track & migrate schemas, how we process and transform streams, and finally how we connect all of this into datastores like Redshift and Salesforce.

Read the posts in the series:

Billions of Messages a Day - Yelp's Real-time Data Pipeline
Streaming MySQL tables in real-time to Kafka
More Than Just a Schema Store
PaaStorm: A Streaming Processor
Data Pipeline: Salesforce Connector
Streaming Messages from Kafka into Redshift in...

The Great HTTPS Migration

Yelp is now entirely on HTTPS! While several pages have been secured for quite some time (we began securing pages with sensitive information like passwords, credit card numbers, and even the reviews you submit, in 2008), we’ve finally made the transition to using TLS across the entirety of our website. To some, this will sound like quite the accomplishment while others may wonder why it took until mid-2016 to complete this migration.

Why Now?

Brief History of HTTPS

Netscape created SSL in 1994 and by 2000, TLS became the default encryption protocol and the modern HTTPS spec was created. But...

Yelp's Bug-Bounty Map

For the past two years we’ve been running a private bug-bounty program. We worked with academic researchers and bug hunters from all over the world and, as a result, we have fixed over a hundred potential vulnerabilities, and have paid bug bounties to dozens of security experts. Today we’re launching our public bug-bounty program as our next step towards improving the security of Yelp’s systems and services. Our vulnerability reward payouts will go up to $15,000 USD for the most impactful exploits. Since getting familiar with our infrastructure may be a bit intimidating, we’ve put together some information below to...
