Engineering Blog

Migrating in-place from PostgreSQL to MySQL

The Yelp Reservations service (yelp_res) is the service that powers reservations on Yelp. It was acquired along with Seatme in 2013, and is a Django service and webapp. It powers the reservation backend and logic for Yelp Guest Manager, our iPad app for restaurants, and handles diner and partner flows that create reservations. Along with that, it serves a web UI and backend API for our Yelp Reservations app, which has been superseded by Yelp Guest Manager but is still used by many of our restaurant customers. This service was built using a DB-centric architecture, and uses a “DB sync”...

Continue reading

Boosting ML Pipeline Efficiency: Direct Cassandra Ingestion from Spark

Machine Learning Feature Stores ML Feature Store at Yelp Many of Yelp’s core capabilities such as business search, ads, and reviews are powered by Machine Learning (ML). In order to ensure these capabilities are well supported, we have built a dedicated ML platform. One of the pillars of this infrastructure is the Feature Store, which is a centralized data store for ML Features that are the input of ML models. Having a centralized dedicated datastore for ML Features serves a number of purposes: Data Quality and Data Governance Feature discovery Improved operational efficiency Availability of Features in every required environment...

Continue reading

dbt Generic Tests in Sessions Validation at Yelp

Sessions, Where Everything Started For the past few years, Yelp has been using dbt as one of the tools to develop data products that power data marts, which are one stop shops for high visibility dashboards pertaining to top level business metrics. One of the key data products that’s owned by my team, Clickstream Analytics, is the Sessions Data Mart. This product is our in-house solution to understand what consumers do during their session interaction with Yelp products and provide insights on top of it. This blog post will walk you through how dbt is used as an important test...

Continue reading

Implementing multi-metric scaling: making changes to legacy code safely

We’re excited to announce that multi-metric horizontal autoscaling is available for all services at Yelp. This allows us to scale services using multiple metrics, such as the number of in-flight requests and CPU utilization, rather than relying on a single metric. We expect this to provide us with better resilience and faster recovery during outages. This year, PaaSTA (Yelp’s platform-as-a-service, which we use to manage all of the applications running on our infrastructure) turns eleven years old! The first commit was on August 20th, 2013, and the first public commit was on October 22nd, 2015. That’s over half of Yelp’s...

Continue reading

Fine-tuning AWS ASGs with Attribute Based Instance Selection

This is the next installment of our blog series on improving our autoscaling infrastructure. In the previous blog posts (Open-sourcing Clusterman, Recycling kubernetes nodes) we explained the architecture and inner-working of Clusterman. This time we are discussing how attribute based instance selection in the autoscaling group has helped us make our infrastructure more reliable and cost effective, while also decreasing the operation overhead. This will also cover how these changes enabled us to migrate from Clusterman to Karpenter. (Spoiler alert: Karpenter blog post is coming soon!) Motivation At Yelp we run most of our workload on AWS spot instances, and...

Continue reading

Moderating Inappropriate Video Content at Yelp

One of Yelp’s top priorities is the trust and safety of our users. Yelp’s platform is most well-known for its reviews, and its moderation practices have been recognised in academic research for mitigating misinformation and building consumer trust. In addition to reviews, Yelp’s Trust and Safety team takes significant measures when it comes to protecting its users from inappropriate material posted through other content types. This blog post discusses how Yelp protects its users from inappropriate content in videos. Videos at Yelp Recently, Yelp revamped its review experience by giving users the ability to upload videos alongside their review text....

Continue reading

Phone Number Masking for Yelp Services Projects

In this blog post, we highlight how phone number masking helps build consumer trust in the services marketplace at Yelp, decreases the friction in communication with service professionals, and allows for seamless switching between the Yelp app and a user’s phone. We present a high level overview of our in-house phone masking system and dive into the details of the engineering challenge of optimizing the usage of proxy phone number resources at Yelp’s scale. Background Every year, millions of requests for quotes, consultations or other messages are sent to businesses on Yelp. Those users choose to use Yelp to connect...

Continue reading

CHAOS: Yelp's Unified Framework for Server-Driven UI

Yelp develops two major applications, Yelp & Yelp for Business, for Web (Desktop & Mobile), iOS, and Android platforms. That’s eight unique clients! Keeping a fresh, consistent UI on all these clients is a major challenge. Server-driven UI (SDUI) has become a standard industry technique for managing UI on multiple platforms. At Yelp, many product teams created SDUI frameworks for their features. Though successful, these frameworks were expensive to develop and maintain, and no single SDUI framework supported all our clients. In late 2021, we began building a unified SDUI framework called CHAOS or “Content Hosting Architecture with Optimization Strategies”....

Continue reading