Yelp Engineering and Product Blog

Exploring CHAOS: Building a Backend for Server-Driven UI

Jonathan Baird, Software Engineer; Xin Shen, Software Engineer
Jul 8, 2025

A little while ago, we published a blog post on CHAOS: Yelp’s Unified Framework for Server-Driven UI. We strongly recommend reading that post first to gain a solid understanding of SDUI and the goals of CHAOS. This post builds on those concepts to delve into the inner workings of the CHAOS backend and how it generates server-driven content. To briefly recap, CHAOS is a server-driven UI framework used at Yelp. When a client wants to display CHAOS-powered content, it sends a GraphQL query to the CHAOS API. The API processes the query, requests the CHAOS backend to construct the configuration,...

Revenue Automation Series: Testing an Integration with Third-Party System

Anukriti Mishra, Software Engineer; Chukwuemeka Okobi, Software Engineer
May 27, 2025

As described in the second blog post of the Revenue Automation series, the Revenue Data Pipeline processes a large amount of data through complex transformation logic to recognize revenue. Developing a robust production testing and integration strategy was therefore essential to the success of this project phase. The existing testing process used the Redshift Connector to synchronize data once the report was generated and published to the data warehouse (Redshift). This introduced a latency of approximately 10 hours before the data was available in the data warehouse for verification. This delay impacted our ability to verify whether the changes were...

Nrtsearch 1.0.0: Incremental Backups, Lucene 10, and More

Sarthak Nandi and Andrew Prudhomme
May 8, 2025

It has been over 3 years since we published our Nrtsearch blog post and over 4 years since we started using Nrtsearch, our Lucene-based search engine, in production. We have since migrated over 90% of our Elasticsearch traffic to Nrtsearch. We are excited to announce the release of Nrtsearch 1.0.0, which brings several new features and improvements since the initial release.

Glossary
EBS (Elastic Block Store): Network-attached block storage volumes in AWS.
HNSW (Hierarchical Navigable Small World): A graph-based approximate nearest neighbor search technique.
Lucene: An open-source search library used by Nrtsearch.
S3: Cloud object storage offered in AWS.
Scatter-gather: A pattern...

Journey to Zero Trust Access

Carlos B. Hernandez, Software Engineer; Adam Skalicky, Software Engineer
Apr 15, 2025

Glossary
ZTA: zero trust architecture
SAML: Security Assertion Markup Language (an SSO facilitation protocol)
Devbox: a remote server used to develop software

Yelp is now a fully remote company, which means our employees are increasingly distributed across the world, making secure access to resources from anywhere a critical business function. Yelp historically used Ivanti Pulse Secure as its employee VPN, but because we needed a more reliable solution, it became clear that a change was necessary to ensure secure and consistent access to internal resources. The Corporate Systems and Client Platform...

Revenue Automation Series: Building Revenue Data Pipeline

Yizheng Zhang, Software Engineer; Yirun Zhou, Software Engineer
Feb 19, 2025

As Yelp’s business continues to grow, its revenue streams have become more complex due to the increasing number of transactions and new products and services. Over time, these changes have strained the manual processes involved in revenue recognition. As described in the first post of the Revenue Automation Series, Yelp invested significant resources in modernizing its Billing System, a prerequisite for automating the revenue recognition process. In this blog post, we share how we built the Revenue Data Pipeline that facilitates the third-party integration with a Revenue Recognition SaaS solution, referred to hereafter as the...

Search Query Understanding with LLMs: From Ideation to Production

Loc Trinh, Software Engineer; Ali Rokni, Tech Lead; John Hawksley, Group Tech Lead
Feb 4, 2025

How we bring LLM intelligence to millions of daily searches at Yelp. From the moment a user enters a search query to when we present a list of results, understanding the user’s intent is crucial for meeting their needs. Were they looking for a general category of business for that evening, a particular dish or service, or one specific business nearby? Does the query contain nuanced location or attribute information? Is the query misspelled? Is their phrasing unusual, so that it might not align well with our business data? All of the above questions represent Natural Language Understanding tasks where...

Enhancing Neural Network Training at Yelp: Achieving 1,400x Speedup with WideAndDeep

Yunhui Zhang, Software Engineer
Jan 22, 2025

At Yelp, we encountered challenges that prompted us to reduce the training time of our ad-revenue-generating models, which use a Wide and Deep Neural Network architecture to predict ad click-through rates (pCTR). These models handle large tabular datasets with small parameter spaces, which calls for new approaches to data handling. This blog post describes our work on optimizing training time using TensorFlow and Horovod, along with the development of ArrowStreamServer, our in-house library for low-latency data streaming and serving. Together, these components have allowed us to achieve a 1,400x speedup in training for business-critical models compared to using a single GPU...

Revisiting Compute Scaling

Ilkin Mammadzada and Ankit Tripathi, Site Reliability Engineers
Dec 13, 2024

As mentioned in our earlier blog post, Fine-tuning AWS ASGs with Attribute Based Instance Selection, we recently set out to improve the node autoscaling infrastructure of our Kubernetes clusters. In this blog post, we explain why we transitioned from our internally developed Clusterman autoscaler to AWS Karpenter: the challenges we faced with Clusterman and the opportunities Karpenter offers. At Yelp, we used Clusterman to handle autoscaling of nodes in Kubernetes clusters. It is an open-source tool we initially designed for Mesos clusters and...
