Engineering Blog

Introducing Folium: Enabling Reproducible Notebooks at Yelp

Jupyter notebooks are a key tool for working with data at Yelp. They allow us to do ad hoc development interactively and analyze data with visualization support. As a result, we rely on Jupyter to build models, create features, run Spark jobs for big data analysis, and more. Since notebooks play a crucial role in our business processes, it is important for us to ensure that notebook output is reproducible. In this blog post, we’ll introduce Folium, our notebook archive and sharing service, and its key integrations with our JupyterHub that enable notebook reproducibility and improve ML engineering developer velocity. Folium...

Flink on PaaSTA: Yelp’s new stream processing platform runs on Kubernetes

At Yelp we process terabytes of streaming data a day using Apache Flink to power a wide range of applications: ETL pipelines, push notifications, bot filtering, sessionization, and more. We run hundreds of Flink jobs, so routine operations like deployments, restarts, and savepoints would take thousands of hours of developers’ time without the right degree of automation. The latest addition to our toolshed is a new stream processing platform built on top of PaaSTA, Yelp’s Platform As A Service. At its core, a Kubernetes operator automatically watches over the deployment and the...

The Dream Query: How we scope projects with GraphQL

At Yelp, new web pages and app screens are powered by GraphQL for fetching data. This blog post describes the Dream Query, a pattern our feature teams use when refactoring or creating new pages. (Check out our previous blog post to see how we dynamically codegen DataLoaders to implement the server layer!) Scoping a new feature with GraphQL: Let’s jump in with an example! Imagine your team is tasked with creating the new version of the “Header component” for the website (we’ll use the Yelp.com website in our example). You may receive a design mock that looks like this:...

Improving the performance of the Prometheus JMX Exporter

At Yelp, usage of Prometheus, the open-source monitoring system and time series database, is blossoming. Yelp is initially focusing on onboarding infrastructure services to be monitored via Prometheus, one such service being Apache Kafka. This blog post discusses some of the performance issues we initially encountered while monitoring Kafka with Prometheus, and how we solved them by contributing back to the Prometheus community. Kafka at Yelp primer: Kafka is an integral part of Yelp’s infrastructure; clusters vary in size and often contain several thousand topics. By default, Kafka exposes a lot of metrics that can be collected, most of which...

Introducing Yelp's Machine Learning Platform

Understanding data is a vital part of Yelp’s success. To connect our consumers with great local businesses, we make millions of recommendations every day for a variety of tasks like: finding you immediate quotes for a plumber to fix your leaky sink; helping you discover which restaurants are open for delivery right now; identifying the most popular dishes for you to try at those restaurants; and inferring possible service offerings so business owners can confidently and accurately represent their business on Yelp. In the early days of Yelp circa 2004, engineers painstakingly designed heuristic rules to power recommendations like these, but...

How businesses have reacted to COVID-19 using Yelp features

Yelp periodically releases an open, all-purpose dataset for learning. The dataset is a subset of our businesses, reviews, and user data, intended to inform government policy, academic research, and business strategy, among other uses. It has provided opportunities including teaching students about databases, helping others study natural language processing, sampling production data while learning to create mobile apps, and discovering compelling research findings. Our most recent dataset was published in March 2020. Businesses everywhere are adapting to the effects of the coronavirus and have been using Yelp features to stay connected with their customers. To this end, we’re releasing an addendum...

dataloader-codegen: Autogenerate DataLoaders for your GraphQL Server!

We’re open sourcing dataloader-codegen, an opinionated JavaScript library for automatically generating DataLoaders over a set of resources (e.g. HTTP endpoints). Go check it out on GitHub! This blog post discusses the motivation and some of the lessons we learned along the way. Managing GraphQL DataLoaders at Scale: At Yelp, we use GraphQL to provide data for our React webapps. The GraphQL Server is deployed as a public gateway that wraps hundreds of internal HTTP endpoints that are distributed across hundreds of services. [Figure: GraphQL Request Diagram] DataLoaders: DataLoaders provide an important caching/optimization layer in many GraphQL servers. If you aren’t already familiar...

An Ever Evolving Company Requires an Ever Evolving Communication Plan

It’s 2014 and your teams are divided by platform, something like: Web, Mobile Web, Android, and iOS. In order to launch features, product managers jump from platform to platform and teams move fast. Really fast. Lines of code in each repository increase to the point where you now name them “monoliths.” A few engineers maintain these monoliths when they need to, but no one is solely dedicated to the task. Engineers are distributed by platform, so communication about when to maintain the monoliths is easy, but this presents another problem. Can you continue to ship code efficiently if you depend entirely...
