Engineering

Engineering Blog

Modernizing Ads Targeting Machine Learning Pipeline

Yelp’s mission is to connect users with great local businesses. As part of that mission, we provide local businesses with an ads product to help them better reach out to users. This product strives to showcase the most relevant ads to the user without taking away from their overall search experience on Yelp. In this blog post, we’ll walk through the architecture of how this is made possible by using one of the largest machine learning systems at Yelp: Ads Targeting System. The Ads Targeting System is a machine learning (ML) system designed to serve only the most relevant ads...

Continue reading

Streams and Monk – How Yelp is Approaching Kafka in 2020

We launched our very first Kafka cluster at Yelp more than five years ago. It was not monitored, did not expose any metrics, and we definitely did not have anyone on call for it. One year later, Kafka had already become one of the most important distributed systems running at Yelp, and today has become one of the core components of our infrastructure. Kafka has come a long way since the 0.8 version we were running back then, and our tooling (some of it open-source) has also significantly improved, increasing reliability and reducing the amount of operational work required to...

Continue reading

Automated IDOR Discovery through Stateful Swagger Fuzzing

Scaling security coverage in a growing company is hard. The only way to do this effectively is to empower front-line developers to be able to easily discover, triage, and fix vulnerabilities before they make it to production servers. Today, we’re excited to announce that we’ll be open-sourcing fuzz-lightyear: a testing framework we’ve developed to identify Insecure Direct Object Reference (IDOR) vulnerabilities through stateful Swagger fuzzing, tailored to support an enterprise, microservice architecture. This integrates with our Continuous Integration (CI) pipeline to provide consistent, automatic test coverage as web applications evolve. The Problem As a class of vulnerabilities, IDOR is arguably...

Continue reading

Streaming Cassandra into Kafka in (Near) Real-Time: Part 2

The first half of this post covered the requirements and design choices of the Cassandra Source Connector and dove into the details of the CDC Publisher. As described, the CDC Publisher processes Cassandra CDC data and publishes it as loosely ordered PartitionUpdate objects into Kafka as intermediate keyed streams. The intermediate streams then serve as input for the DP Materializer. Data Pipeline Materializer The DP Materializer ingests the serialized PartitionUpdate objects published by the CDC Publisher, transforms them into fully formed Data Pipeline messages, and publishes them into the Data Pipeline. The DP Materializer is built on top of Apache...

Continue reading

Architecting Restaurant Wait Time Predictions

Is there a restaurant you’ve always wanted to check out, but haven’t been able to because they don’t take reservations and the lines are out the door? Here at Yelp, we’re trying to solve problems just like these and delight consumers with streamlined dining experiences. Yelp Waitlist is part of the Yelp Restaurants product suite, and its mission is to take the mystery out of everyday dining experiences, enabling you to get in line at your favorite restaurant through just the tap of a button. For diners, in addition to joining an online waitlist, Yelp Waitlist provides live wait times...

Continue reading

Streaming Cassandra into Kafka in (Near) Real-Time: Part 1

At Yelp, we use Cassandra to power a variety of use cases. As of the date of publication, there are 25 Cassandra clusters running in production, each with varying sizes of deployment. The data stored in these clusters is often required as-is or in a transformed state by other use cases, such as analytics, indexing, etc. (for which Cassandra is not the most appropriate data store). As seen in previous posts from our Data Pipeline series, Yelp has developed a robust connector ecosystem around its data stores to stream data both into and out of the Data Pipeline. This two-part...

Continue reading

Organizing and Securing Third-Party CDN Assets at Yelp

At Yelp, we use a service-oriented architecture to serve our web pages. This consists of a lot of frontend services, each of which is responsible for serving different pages (e.g., the search page or a business listing page). In these frontend services, we use a couple of third-party JavaScript/CSS assets (React, Babel polyfill, etc.) to render our web pages. We chose to serve such assets using a third-party Content Delivery Network (CDN) for better performance. In the past, if a frontend service needed to use a third-party JavaScript/CSS asset, engineers had to hard-code its CDN URL. For example: <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.8.3/jquery.min.js"...

Continue reading

Remember Clusterman? Now It's Open-Source, and Supports Kubernetes Too!

Earlier this year, I wrote a blog post showing off some cool features of our in-house compute cluster autoscaler, Clusterman (our Cluster Manager). This time, I’m back with two announcements that I’m really excited about! Firstly, in the last few months, we’ve added another supported backend to Clusterman; so not only can it scale Mesos clusters, it can also scale Kubernetes clusters. Second, Clusterman is now open-source on GitHub so that you, too, can benefit from advanced autoscaling techniques for your compute clusters. If you prefer to just read the code, you can head there now to find some examples...

Continue reading