Engineering Blog

Rebuilding a Cassandra cluster using Yelp’s Data Pipeline

Robots are widely used in manufacturing for many purposes. One of them is automatically keeping defective products out of the finished goods inventory. The same principles can be adopted to filter malformed data out of datastores. This blog post takes a deep dive into how we rebuilt one of our Cassandra (C*) clusters by removing malformed data using Yelp’s Data Pipeline. Apache Cassandra is a distributed wide-column NoSQL datastore and is used at Yelp for storing both primary and derived data. Many different features on Yelp are powered by Cassandra. Yelp orchestrates Cassandra clusters on Kubernetes...

Recycling Kubernetes Nodes

Manually managing the lifecycle of Kubernetes nodes can become difficult as the cluster scales, especially if your clusters are multi-tenant and self-managed. You may need to replace nodes for various reasons, such as OS upgrades and security patches. One of the biggest challenges is how to terminate nodes without disturbing tenants. In this post, I’ll describe the problems we encountered administering Yelp’s clusters and the solutions we implemented. Problems At Yelp we use PaaSTA for building, deploying and running services. Initially, PaaSTA just supported stateless services. This meant it was relatively easy to replace nodes since we only needed to...

Lessons from A/B Testing on Bandit Subjects

Abstract Compared to full-scale ML, a multi-armed bandit is a lighter-weight solution that can help teams quickly optimize their product features without major commitments. However, bandits need a candidate selection step when they have too many items to choose from. Using A/B testing to optimize the candidate selection step introduces two biases: new bandit bias and convergence selection bias. New bandit bias occurs when we try to compare new bandits with established ones in an experiment; convergence selection bias creeps in when we try to solve the new bandit bias by defining and selecting established bandits. We discuss our...

Spark Data Lineage

In this blog post, we introduce Spark-Lineage, an in-house product to track and visualize how data at Yelp is processed, stored, and transferred among our services. What is Spark-Lineage? Spark and Spark-ETL: At Yelp, Spark is considered a first-class citizen, handling batch jobs in all corners, from crunching reviews to identify similar restaurants in the same area, to performing reporting analytics about optimizing local business search. Spark-ETL is our in-house wrapper around Spark, providing high-level APIs to run Spark batch jobs and abstracting away the complexity of Spark. Spark-ETL is used extensively at Yelp, helping save time that our engineers...

Android in Analytics Infra

At Yelp, we have a reasonably large Android community for a company of Yelp’s size. These talented and skilled Android engineers work on Yelp’s client and business applications. We would like to share some of the unique challenges that we’ve experienced along with our various efforts to overcome those challenges. Analytics Infra is a team at Yelp that works on experimentation and logging platforms and supports them across the entire Yelp ecosystem. Within the Analytics Infra team, we have an Android working group. You may consider our team as an infrastructure team - a team that implements end-user functionality -...

Writing Emails Using React

As part of our effort to connect users with great local businesses, Yelp sends out tens of millions of emails every month. In order to support the scale of those sends, we rely on third-party Email Service Providers (ESPs) as well as our internal email system, Mercury. Delivering the emails is just part of the challenge—we also need to give email developers a way to craft sophisticated templates that conform to our Yelp design guidelines. In the past, Yelp web and full stack engineers would rely on our legacy template language, Cheetah, to write emails. However, as the Yelp design...

Migrating from Styleguidist to Storybook

One of the core tenets for our infrastructure and engineering effectiveness teams at Yelp is ensuring we have a best-in-class developer experience. Our React monorepo codebase has steadily grown as developers create new React components, but our existing React Styleguidist (Styleguidist, for short) development environment has failed to scale in parallel. By transitioning from Styleguidist to Storybook, we were able to offer a faster and more user-friendly development environment for React components along with better alignment to developer and designer workflows. In this post we’ll take a deep dive into how and why we migrated to Storybook. Background Status Quo...

A Simply, Ordinary Reduction

Experimentation has become standard practice for companies, and one of the most important aspects is how to evaluate the results to make ship/no-ship decisions. Have you run into experiments where you don’t have enough data for statistically significant results, or where the performance of your primary metric seemingly disagrees with that of your secondary metrics? If so, leveraging existing features to perform variance reduction may help you reach a conclusion. At Yelp, we have found that using features typically used in ML modeling can help measure treatment effects better than using t-tests alone! Introduction Before deciding...
