Yelp Engineering and Product Blog

Enhancing Neural Network Training at Yelp: Achieving 1,400x Speedup with WideAndDeep

Yunhui Zhang, Software Engineer
Jan 22, 2025

At Yelp, we encountered challenges that prompted us to reduce the training time of our ad-revenue-generating models, which use a Wide and Deep Neural Network architecture to predict ad click-through rates (pCTR). These models handle large tabular datasets with small parameter spaces, which called for innovative data solutions. This blog post delves into our journey of optimizing training time using TensorFlow and Horovod, along with the development of ArrowStreamServer, our in-house library for low-latency data streaming and serving. Together, these components have allowed us to achieve a 1,400x training speedup for business-critical models compared to using a single GPU...

Revisiting Compute Scaling

Ilkin Mammadzada and Ankit Tripathi, Site Reliability Engineers
Dec 13, 2024

As mentioned in our earlier blog post Fine-tuning AWS ASGs with Attribute Based Instance Selection, we recently embarked on an exciting journey to enhance our Kubernetes cluster’s node autoscaler infrastructure. In this blog post, we’ll delve into the rationale behind transitioning from our internally developed Clusterman autoscaler to AWS Karpenter. Join us as we explore the reasons for our switch, address the challenges with Clusterman, and embrace the opportunities with Karpenter.

Clusterman and its challenges

At Yelp, we used Clusterman to handle autoscaling of nodes in Kubernetes clusters. It is an open-source tool we initially designed for Mesos clusters and...

Revenue Automation Series: Modernizing Yelp's Legacy Billing System

Simon Zeng, Payments Tech Lead; Supriya Lal, Commerce Platform Group Tech Lead
Dec 6, 2024

This blog post focuses on how Yelp successfully implemented a multi-year, cross-organizational initiative to modernize its billing processes. The goal was to automate its revenue recognition system by enhancing integration capabilities with third-party financial systems, all while maintaining the accuracy and reliability our users expect.

Summary

When Yelp first developed its billing system a decade ago, the database design was based on the requirements known at that time. These initial choices laid the foundation for the billing system, upon which multiple Yelp systems and processes were built. However, as the company evolved, it became evident that these design choices were not...

Loading data into Redshift with DBT

Christopher Arnold, Software Engineer
Nov 6, 2024

At Yelp, we embrace innovation and thrive on exploring new possibilities. With our consumers’ ever-growing appetite for data, we recently revisited how we could load data into Redshift more efficiently. In this blog post, we explore how DBT can be used seamlessly with Redshift Spectrum to read data from our Data Lake into Redshift, significantly reducing runtime, resolving data quality issues, and improving developer productivity.

Starting Point

Our method of loading batch data into Redshift had been effective for years, but we continually sought improvements. We primarily used Spark jobs to read S3 data and publish it to our...

How we improved our Android navigation performance by ~30%

Paul Martin, Core Android Tech Lead
Oct 8, 2024

In 2019, Yelp’s Core Android team led an effort to boost navigation performance in Yelp’s Consumer app. We switched from building screens with multiple separate activities to using fragments inside a single activity. In this blog post, we’ll cover our solution and how we approached the migration, and share the learnings and performance wins from along the way.

Where we started circa 2018

Navigating between screens in an Android app is often when the app and device are under the most strain. The new screen and its dependencies are quickly created, which can lead to slow or frozen frames. Prior...

Migrating in-place from PostgreSQL to MySQL

Alex Toumazis, Software Engineer
Oct 7, 2024

The Yelp Reservations service (yelp_res) is the service that powers reservations on Yelp. It was acquired along with Seatme in 2013, and is a Django service and webapp. It powers the reservation backend and logic for Yelp Guest Manager, our iPad app for restaurants, and handles diner and partner flows that create reservations. Along with that, it serves a web UI and backend API for our Yelp Reservations app, which has been superseded by Yelp Guest Manager but is still used by many of our restaurant customers. This service was built using a DB-centric architecture, and uses a “DB sync”...

Boosting ML Pipeline Efficiency: Direct Cassandra Ingestion from Spark

Muhammad Junaid Muzammil, Software Engineer; Arnold Ziesche-Blank, Machine Learning Engineer
Sep 19, 2024

Machine Learning Feature Stores

ML Feature Store at Yelp

Many of Yelp’s core capabilities, such as business search, ads, and reviews, are powered by Machine Learning (ML). To ensure these capabilities are well supported, we have built a dedicated ML platform. One of the pillars of this infrastructure is the Feature Store, a centralized data store for the ML Features that serve as inputs to ML models. Having a centralized, dedicated datastore for ML Features serves a number of purposes: Data Quality and Data Governance; Feature discovery; improved operational efficiency; availability of Features in every required environment...

dbt Generic Tests in Sessions Validation at Yelp

Yukang Tian, Software Engineer
Aug 14, 2024

Sessions, Where Everything Started

For the past few years, Yelp has been using dbt as one of the tools to develop data products that power data marts, which are one-stop shops for high-visibility dashboards pertaining to top-level business metrics. One of the key data products owned by my team, Clickstream Analytics, is the Sessions Data Mart. This product is our in-house solution for understanding what consumers do during their sessions on Yelp products and for providing insights on top of that data. This blog post will walk you through how dbt is used as an important test...
