Engineering Blog

How Partition Access Visualizations Reduced Our Data Lake S3 Cost by 33%

Introduction

In large analytics environments, data teams often struggle to answer deceptively simple questions, like who their stakeholders are and how their data is being used. At Yelp, we address this by visualizing access patterns, plotting time-based partition key values against access event timestamps. These visualizations reveal distinct usage signatures – ad hoc queries, daily batch jobs, and periodic backfills – allowing data owners to understand their stakeholders and use cases. This deeper insight into data usage has enabled high-impact platform initiatives including migrating thousands of tables to Apache Iceberg format and identifying storage efficiencies which reduced the cost of...
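The partition-versus-access-time idea can be sketched in a few lines. This is a minimal illustration, not Yelp's tooling: the `AccessEvent` shape, the `classify_days` helper, and both thresholds are hypothetical. It assumes that on such a plot a daily batch job traces a diagonal (the same lag between partition date and access date on every run), a backfill shows up as a vertical stripe (many partitions read on one day), and ad hoc queries appear as scattered points.

```python
from collections import Counter, defaultdict
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, NamedTuple


class AccessEvent(NamedTuple):
    """One read of one table partition (hypothetical event shape)."""
    partition_date: date   # value of the time-based partition key
    accessed_at: datetime  # timestamp of the access event


def classify_days(
    events: Iterable[AccessEvent],
    backfill_min_partitions: int = 10,  # illustrative threshold, not tuned
    batch_min_days: int = 5,            # illustrative threshold, not tuned
) -> Dict[date, str]:
    """Label each access day as 'daily batch', 'backfill', or 'ad hoc'."""
    # Distinct partitions read on each access day.
    partitions_by_day = defaultdict(set)
    for e in events:
        partitions_by_day[e.accessed_at.date()].add(e.partition_date)

    # For each lag (access day minus partition date, in days), count the
    # number of access days on which that lag occurs.
    lag_days = Counter(
        (day - p).days for day, parts in partitions_by_day.items() for p in parts
    )
    # A lag that recurs on many days suggests a scheduled job (a diagonal).
    recurring = {lag for lag, n in lag_days.items() if n >= batch_min_days}

    labels = {}
    for day, parts in sorted(partitions_by_day.items()):
        if len(parts) >= backfill_min_partitions:
            labels[day] = "backfill"      # vertical stripe: many partitions at once
        elif all((day - p).days in recurring for p in parts):
            labels[day] = "daily batch"   # every read matches a recurring lag
        else:
            labels[day] = "ad hoc"        # irregular, one-off reads
    return labels


if __name__ == "__main__":
    start = date(2024, 5, 1)
    # A job reads yesterday's partition every morning for a week.
    events = [
        AccessEvent(start + timedelta(days=i - 1), datetime(2024, 5, 1 + i, 3))
        for i in range(7)
    ]
    # A one-off query for an old partition.
    events.append(AccessEvent(start - timedelta(days=100), datetime(2024, 5, 8, 14)))
    # A backfill rereads 30 partitions in one go.
    events += [
        AccessEvent(start - timedelta(days=k), datetime(2024, 5, 10, 9))
        for k in range(1, 31)
    ]
    for day, label in classify_days(events).items():
        print(day, label)
```

Working from the lag rather than from raw dates is what makes a recurring schedule stand out: a job that always reads yesterday's partition has a lag of 1 on every run, no matter which day it runs.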

Optimizing Our Build Times by Migrating from Webpack to Rspack

Over the years, Webpack has remained the bundler of choice for many JS projects, including here at Yelp. While it has served us well, its speed has increasingly become a bottleneck as our monorepo continues to grow. Fortunately, a bunch of new build tools (Vite, Parcel, Rspack, etc.) have emerged in recent years. Each of these tools promises different ways of improving performance and developer experience. In this blog post, we’ll walk through how we migrated our monorepo builds from Webpack to Rspack and achieved an approximately 50% reduction in build time.

Why Rspack

Our team has been closely observing...

How Yelp Keeps Server-Driven UI Consistent Across Four Platforms

If you’ve read our earlier post, you already know about CHAOS, the server-driven UI (SDUI) framework we built at Yelp that powers our dynamic views. So far, we’ve explored its architecture, backend implementation, and component model. In this post, we’ll dive into how we integrated CHAOS with Yelp’s cross-platform design system, Cookbook, and the auto-generated bridge library, Konbini.

Introduction to Cookbook

At Yelp, we support two major applications across our Web, iOS, and Android platforms: Yelp and Yelp for Business. This results in six different app-platform combinations, which makes it challenging to maintain a unified experience. To address this challenge, we created...

Zero-Downtime Upgrade: Yelp’s Cassandra 4.x Upgrade Story

The Database Reliability Engineering team at Yelp seamlessly upgraded more than a thousand Cassandra nodes with zero downtime. This post takes you behind the scenes of our upgrade strategy, from planning sessions to flawless rollouts.

Background

Motivation

Apache Cassandra is a distributed wide-column NoSQL datastore that is used widely at Yelp for storing both primary and derived data. Yelp orchestrates Cassandra clusters on Kubernetes with the help of operators, as explained in our operator overview post. Upgrading from Cassandra 3.11 to 4.1 offered several observability and reliability improvements, in addition to performance gains. Based on public benchmarks, we expected to...

Building Biz Ask Anything: From Prototype to Product

Introduction

Users have access to a wealth of information on Yelp business pages. From reviews and photos to structured information, menus, and the Ask the Community feature, a single business page can be an ocean of content. At the same time, user expectations have evolved: people now expect immediate, direct answers. Sifting through dozens of reviews to find a simple fact can be time-consuming. Fortunately, advances in Large Language Models (LLMs) have given us a new set of tools, allowing us to tackle information retrieval and summarization tasks that were prohibitively complex just a few years...

How Yelp Built a Back-Testing Engine for Safer, Smarter Ad Budget Allocation

Introduction

Modern advertising platforms are fast-paced and interconnected: even small adjustments can have ripple effects on how ads are shown, how budgets are spent, and the value advertisers get from their ad spend. At Yelp, Ad Budget Allocation means splitting each campaign’s spend between on-platform inventory (our website, mobile site, and app) and off-platform inventory (the Yelp Ad Network). We optimize this split to meet advertisers’ performance goals while growing overall revenue. Due to the complexity of the budget allocation system and its feedback loop, even small changes can lead to unexpected system-wide effects. To help us safely evaluate changes,...

S3 Server Access Logs at Scale

Introduction

Yelp heavily relies on Amazon S3 (Simple Storage Service) to store a wide variety of data, including images, logs, and database backups. Since this data is stored in the cloud, we need to carefully manage how it is accessed, secured, and eventually deleted, both to control costs and to uphold high standards of security and compliance. One of the core challenges in managing S3 buckets is gaining visibility into who is accessing your data (known as S3 objects), how frequently, and for what purpose. Without robust logging, it’s difficult to troubleshoot access issues, respond to security incidents, and ensure we...
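To make the visibility problem concrete, here is a minimal parsing sketch, assuming the publicly documented S3 server access log format (space-separated fields, a bracketed timestamp, and quoted request-URI, referer, and user-agent fields). The `parse_line` and `access_counts` helpers are hypothetical and do not describe Yelp's pipeline; the sketch reads only the leading fields, keeps numeric fields as strings, and assumes quoted fields contain no escaped quotes.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Any, Dict, Iterable

# A token is a [bracketed timestamp], a "quoted string", or a run of non-spaces.
# Assumption: quoted fields contain no escaped quotes.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

# Leading fields of the documented S3 server access log record. AWS may
# append fields over time, so any tokens past these are ignored here.
FIELDS = (
    "bucket_owner", "bucket", "time", "remote_ip", "requester",
    "request_id", "operation", "key", "request_uri", "http_status",
    "error_code", "bytes_sent", "object_size", "total_time",
    "turn_around_time", "referer", "user_agent",
)


def parse_line(line: str) -> Dict[str, Any]:
    """Parse one access log line into a dict; raise ValueError if malformed."""
    tokens = TOKEN.findall(line)
    if len(tokens) < len(FIELDS):
        raise ValueError(f"expected >= {len(FIELDS)} fields, got {len(tokens)}")
    # Drop the surrounding [] or "" delimiters; "-" is left as-is (no value).
    values = [raw[1:-1] if raw[0] in '["' else raw for raw in tokens[: len(FIELDS)]]
    record: Dict[str, Any] = dict(zip(FIELDS, values))
    # Timestamps look like 06/Feb/2019:00:00:38 +0000.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def access_counts(lines: Iterable[str]) -> Counter:
    """Count requests per (requester, operation), skipping malformed lines."""
    counts: Counter = Counter()
    for line in lines:
        try:
            rec = parse_line(line)
        except ValueError:  # too few fields or an unparseable timestamp
            continue
        counts[(rec["requester"], rec["operation"])] += 1
    return counts
```

Grouping by requester and operation answers the "who, and how often" question directly; grouping by key prefix or by day would be natural extensions of the same loop.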

Exploring CHAOS: Building a Backend for Server-Driven UI

A little while ago, we published a blog post on CHAOS: Yelp’s Unified Framework for Server-Driven UI. We strongly recommend reading that post first to gain a solid understanding of SDUI and the goals of CHAOS. This post builds on those concepts to delve into the inner workings of the CHAOS backend and how it generates server-driven content. To briefly recap, CHAOS is a server-driven UI framework used at Yelp. When a client wants to display CHAOS-powered content, it sends a GraphQL query to the CHAOS API. The API processes the query, requests the CHAOS backend to construct the configuration,...
