Engineering Blog

Revenue Automation Series: Building Revenue Data Pipeline

Background As Yelp’s business continues to grow, the revenue streams have become more complex due to the increased number of transactions, new products and services. These changes over time have challenged the manual processes involved in Revenue Recognition. As described in the first post of the Revenue Automation Series, Yelp invested significant resources in modernizing its Billing System to fulfill the pre-requisite of automating the revenue recognition process. In this blog, we would like to share how we built the Revenue Data Pipeline that facilitates the third party integration with a Revenue Recognition SaaS solution, referred to hereafter as the...

Continue reading

Search Query Understanding with LLMs: From Ideation to Production

How we bring LLM intelligence to millions of daily searches at Yelp. From the moment a user enters a search query to when we present a list of results, understanding the user’s intent is crucial for meeting their needs. Were they looking for a general category of business for that evening, a particular dish or service, or one specific business nearby? Does the query contain nuanced location or attribute information? Is the query misspelled? Is their phrasing unusual, so that it might not align well with our business data? All of the above questions represent Natural Language Understanding tasks where...

Continue reading

Enhancing Neural Network Training at Yelp: Achieving 1,400x Speedup with WideAndDeep

At Yelp, we encountered challenges that prompted us to enhance the training time of our ad-revenue generating models, which use a Wide and Deep Neural Network architecture for predicting ad click-through rates (pCTR). These models handle large tabular datasets with small parameter spaces, requiring innovative data solutions. This blog post delves into our journey of optimizing training time using TensorFlow and Horovod, along with the development of ArrowStreamServer, our in-house library for low-latency data streaming and serving. Together, these components have allowed us to achieve a 1400x speedup in training for business critical models compared to using a single GPU...

Continue reading

Revisiting Compute Scaling

As mentioned in our earlier blog post Fine-tuning AWS ASGs with Attribute Based Instance Selection, we recently embarked on an exciting journey to enhance our Kubernetes cluster’s node autoscaler infrastructure. In this blog post, we’ll delve into the rationale behind transitioning from our internally developed Clusterman autoscaler to AWS Karpenter. Join us as we explore the reasons for our switch, address the challenges with Clusterman, and embrace the opportunities with Karpenter. Clusterman and its challenges At Yelp, we used Clusterman to handle autoscaling of nodes in Kubernetes clusters. It is an open-source tool we initially designed for Mesos clusters and...

Continue reading

Revenue Automation Series: Modernizing Yelp's Legacy Billing System

This blog focuses on how Yelp successfully implemented a multi-year, cross-organizational initiative to modernize its billing processes. The goal was to automate its revenue recognition system by enhancing integration capabilities with third-party financial systems, all while maintaining the accuracy and reliability our users expect. Summary When Yelp first developed its billing system a decade ago, the database design was based on the requirements known at that time. These initial choices laid the foundation for the billing system, upon which multiple Yelp systems and processes were built. However, as the company evolved, it became evident that these design choices were not...

Continue reading

Loading data into Redshift with DBT

At Yelp, we embrace innovation and thrive on exploring new possibilities. With our consumers’ ever growing appetite for data, we recently revisited how we could load data into Redshift more efficiently. In this blog post, we explore how DBT can be used seamlessly with Redshift Spectrum to read data from Data Lake into Redshift to significantly reduce runtime, resolve data quality issues, and improve developer productivity. Starting Point Our method of loading batch data into Redshift had been effective for years, but we continually sought improvements. We primarily used Spark jobs to read S3 data and publish it to our...

Continue reading

How we improved our Android navigation performance by ~30%

In 2019, Yelp’s Core Android team led an effort to boost navigation performance in Yelp’s Consumer app. We switched from building screens with multiple separate activities to using fragments inside a single activity. In this blog post, we’ll cover our solution, how we approached the migration and share learnings from along the way as well as performance wins. Where we started circa 2018 Navigating between screens in an Android app is often when the app and device are under the most strain. The new screen and its dependencies are quickly created, which can lead to slow or frozen frames. Prior...

Continue reading

Migrating in-place from PostgreSQL to MySQL

The Yelp Reservations service (yelp_res) is the service that powers reservations on Yelp. It was acquired along with Seatme in 2013, and is a Django service and webapp. It powers the reservation backend and logic for Yelp Guest Manager, our iPad app for restaurants, and handles diner and partner flows that create reservations. Along with that, it serves a web UI and backend API for our Yelp Reservations app, which has been superseded by Yelp Guest Manager but is still used by many of our restaurant customers. This service was built using a DB-centric architecture, and uses a “DB sync”...

Continue reading