Engineering Blog

Data Pipeline: Salesforce Connector

This post is part of a series covering Yelp's real-time streaming data infrastructure. Our series explores in depth how we stream MySQL and Cassandra data in real time, how we automatically track and migrate schemas, how we process and transform streams, and finally how we connect all of this to data stores like Redshift, Salesforce, and Elasticsearch. Read the posts in the series: Billions of Messages a Day - Yelp's Real-time Data Pipeline; Streaming MySQL tables in real-time to Kafka; More Than Just a Schema Store; PaaStorm: A Streaming Processor; Data Pipeline: Salesforce Connector; Streaming Messages from Kafka into Redshift in near...


The Great HTTPS Migration

Yelp is now entirely on HTTPS! While several pages have been secured for quite some time (we began securing pages with sensitive information like passwords, credit card numbers, and even the reviews you submit, in 2008), we’ve finally made the transition to using TLS across the entirety of our website. To some, this will sound like quite the accomplishment, while others may wonder why it took until mid-2016 to complete this migration.

Why Now?

Brief History of HTTPS

Netscape created SSL in 1994 and by 2000, TLS became the default encryption protocol and the modern HTTPS spec was created. But...


Yelp's Bug-Bounty Map

For the past two years we’ve been running a private bug-bounty program. We worked with academic researchers and bug hunters from all over the world and, as a result, we have fixed over a hundred potential vulnerabilities, and have paid bug bounties to dozens of security experts. Today we’re launching our public bug-bounty program as our next step towards improving the security of Yelp’s systems and services. Our vulnerability reward payouts will go up to $15,000 USD for the most impactful exploits. Since getting familiar with our infrastructure may be a bit intimidating, we’ve put together some information below to...


PaaStorm: A Streaming Processor

This post is part of a series covering Yelp's real-time streaming data infrastructure. Our series explores in depth how we stream MySQL and Cassandra data in real time, how we automatically track and migrate schemas, how we process and transform streams, and finally how we connect all of this to data stores like Redshift, Salesforce, and Elasticsearch. Read the posts in the series: Billions of Messages a Day - Yelp's Real-time Data Pipeline; Streaming MySQL tables in real-time to Kafka; More Than Just a Schema Store; PaaStorm: A Streaming Processor; Data Pipeline: Salesforce Connector; Streaming Messages from Kafka into Redshift in near...


Undebt: How We Refactored 3 Million Lines of Code

Peter Seibel wrote that to maximize engineering effectiveness, “Let a thousand flowers bloom. Then rip 999 of them out by the roots.” Flowers, in how the metaphor applies to us, are code patterns — the myriad different functions, classes, styles, and idioms that developers use when writing code. At first, new flowers are welcome — maybe the new pattern seems easier to use, more scalable, more efficient, or more suited to some particular task than the old. As a code base grows, and the flowers proliferate, however, it becomes clear which patterns work and which don’t. Suddenly, code patterns that...


AMIRA: Automated Malware Incident Response and Analysis

Brave malware analysts at Yelp have spent a lot of time looking at the digital forensics from potentially infected macOS systems, gathered using our open source project, OSXCollector. Early on, we automated parts of the analysis process, augmenting the initial set of digital forensics collected from the machines with information gathered from threat intelligence APIs and internal blacklists. This involved identifying potentially suspicious domains, URLs, and file hashes, but our approach to the analysis still required a certain degree of configuration and manual maintenance, which was tedious for the malware response team. In this blog post I will...


More Than Just a Schema Store

This post is part of a series covering Yelp's real-time streaming data infrastructure. Our series explores in depth how we stream MySQL and Cassandra data in real time, how we automatically track and migrate schemas, how we process and transform streams, and finally how we connect all of this to data stores like Redshift, Salesforce, and Elasticsearch. Read the posts in the series: Billions of Messages a Day - Yelp's Real-time Data Pipeline; Streaming MySQL tables in real-time to Kafka; More Than Just a Schema Store; PaaStorm: A Streaming Processor; Data Pipeline: Salesforce Connector; Streaming Messages from Kafka into Redshift in near...


How We Scaled Our Ad Analytics with Apache Cassandra

On the Ad Backend team, we recently moved our ad analytics data from MySQL to Apache Cassandra. Here’s why we thought Cassandra was a good fit for our application, and some lessons we learned that you might find useful if you’re thinking about using Cassandra!

Why Cassandra?

First, a little bit about our application. We have over 100,000 paying advertisers. Every day, we calculate the number of views and clicks each ad campaign received the previous day and the amount of money each campaign spent. With these analytics, we generate bills and many different types of reports. Back in...
