Engineering Blog

Revenue Automation Series: Testing an Integration with Third-Party System

Background As described in the second blog post of Revenue Automation series, Revenue Data Pipeline processes a large amount of data via complex logic transformations to recognize revenue. Thus, developing a robust production testing and integration strategy was essential to the success of this project phase. The status quo testing process utilized the Redshift Connector for data synchronization once the report was generated and published to the data warehouse (Redshift). This introduced a latency of approximately 10 hours before the data was available in the data warehouse for verification. This delay impacted our ability to verify whether the changes were...

Continue reading

Nrtsearch 1.0.0: Incremental Backups, Lucene 10, and More

It has been over 3 years since we published our Nrtsearch blog post and over 4 years since we started using Nrtsearch, our Lucene-based search engine, in production. We have since migrated over 90% of Elasticsearch traffic to Nrtsearch. We are excited to announce the release of Nrtsearch 1.0.0 with several new features and improvements from the initial release. Glossary EBS (Elastic Block Store): Network-attached block storage volumes in AWS. HNSW (Hierarchical Navigable Small World): A graph-based approximate nearest neighbor search technique. Lucene: An open-source search library used by Nrtsearch. S3: Cloud object storage offered in AWS. Scatter-gather: A pattern...

Continue reading

Journey to Zero Trust Access

Glossary ZTA: zero trust architecture SAML: security assertion markup language (an SSO facilitation protocol) Devbox: a remote server used to develop software Zero Trust Access Remote Future Yelp is now a fully remote company, which means our employee base has become increasingly distributed across the world, making secure access to resources from anywhere a critical business function. Yelp historically used Ivanti Pulse Secure as the employee VPN, but due to the need for a more reliable solution, it became clear that a change was necessary to ensure secure and consistent access to internal resources. The Corporate Systems and Client Platform...

Continue reading

Revenue Automation Series: Building Revenue Data Pipeline

Background As Yelp’s business continues to grow, the revenue streams have become more complex due to the increased number of transactions, new products and services. These changes over time have challenged the manual processes involved in Revenue Recognition. As described in the first post of the Revenue Automation Series, Yelp invested significant resources in modernizing its Billing System to fulfill the pre-requisite of automating the revenue recognition process. In this blog, we would like to share how we built the Revenue Data Pipeline that facilitates the third party integration with a Revenue Recognition SaaS solution, referred to hereafter as the...

Continue reading

Search Query Understanding with LLMs: From Ideation to Production

How we bring LLM intelligence to millions of daily searches at Yelp. From the moment a user enters a search query to when we present a list of results, understanding the user’s intent is crucial for meeting their needs. Were they looking for a general category of business for that evening, a particular dish or service, or one specific business nearby? Does the query contain nuanced location or attribute information? Is the query misspelled? Is their phrasing unusual, so that it might not align well with our business data? All of the above questions represent Natural Language Understanding tasks where...

Continue reading

Enhancing Neural Network Training at Yelp: Achieving 1,400x Speedup with WideAndDeep

At Yelp, we encountered challenges that prompted us to enhance the training time of our ad-revenue generating models, which use a Wide and Deep Neural Network architecture for predicting ad click-through rates (pCTR). These models handle large tabular datasets with small parameter spaces, requiring innovative data solutions. This blog post delves into our journey of optimizing training time using TensorFlow and Horovod, along with the development of ArrowStreamServer, our in-house library for low-latency data streaming and serving. Together, these components have allowed us to achieve a 1400x speedup in training for business critical models compared to using a single GPU...

Continue reading

Revisiting Compute Scaling

As mentioned in our earlier blog post Fine-tuning AWS ASGs with Attribute Based Instance Selection, we recently embarked on an exciting journey to enhance our Kubernetes cluster’s node autoscaler infrastructure. In this blog post, we’ll delve into the rationale behind transitioning from our internally developed Clusterman autoscaler to AWS Karpenter. Join us as we explore the reasons for our switch, address the challenges with Clusterman, and embrace the opportunities with Karpenter. Clusterman and its challenges At Yelp, we used Clusterman to handle autoscaling of nodes in Kubernetes clusters. It is an open-source tool we initially designed for Mesos clusters and...

Continue reading

Revenue Automation Series: Modernizing Yelp's Legacy Billing System

This blog focuses on how Yelp successfully implemented a multi-year, cross-organizational initiative to modernize its billing processes. The goal was to automate its revenue recognition system by enhancing integration capabilities with third-party financial systems, all while maintaining the accuracy and reliability our users expect. Summary When Yelp first developed its billing system a decade ago, the database design was based on the requirements known at that time. These initial choices laid the foundation for the billing system, upon which multiple Yelp systems and processes were built. However, as the company evolved, it became evident that these design choices were not...

Continue reading