Engineering Blog

Streaming Cassandra into Kafka in (Near) Real-Time: Part 1

At Yelp, we use Cassandra to power a variety of use cases. As of the date of publication, there are 25 Cassandra clusters running in production, each with a different deployment size. The data stored in these clusters is often required, as-is or in a transformed state, by other use cases such as analytics and indexing (for which Cassandra is not the most appropriate data store). As seen in previous posts from our Data Pipeline series, Yelp has developed a robust connector ecosystem around its data stores to stream data both into and out of the Data Pipeline. This two-part...

Organizing and Securing Third-Party CDN Assets at Yelp

At Yelp, we use a service-oriented architecture to serve our web pages. This consists of a lot of frontend services, each of which is responsible for serving different pages (e.g., the search page or a business listing page). In these frontend services, we use a couple of third-party JavaScript/CSS assets (React, Babel polyfill, etc.) to render our web pages. We chose to serve such assets using a third-party Content Delivery Network (CDN) for better performance. In the past, if a frontend service needed to use a third-party JavaScript/CSS asset, engineers had to hard-code its CDN URL. For example: <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.8.3/jquery.min.js"...

Remember Clusterman? Now It's Open-Source, and Supports Kubernetes Too!

Earlier this year, I wrote a blog post showing off some cool features of our in-house compute cluster autoscaler, Clusterman (our Cluster Manager). This time, I’m back with two announcements that I’m really excited about! First, in the last few months, we’ve added another supported backend to Clusterman, so not only can it scale Mesos clusters, it can also scale Kubernetes clusters. Second, Clusterman is now open-source on GitHub so that you, too, can benefit from advanced autoscaling techniques for your compute clusters. If you prefer to just read the code, you can head there now to find some examples...

Inside TensorFlow

It’s probably not surprising that Yelp utilizes deep neural networks in its quest to connect people with great local businesses. One example is the selection of photos you see in the Yelp app and website, where neural networks try to identify the best quality photos for the business displayed. A crucial component of our deep learning stack is TensorFlow (TF). In the process of deploying TF to production, we’ve learned a few things that may not be commonly known in the Data Science community. TensorFlow’s success stems not only from its popularity within the machine learning domain, but also from...

Winning the Hackathon with Sourcegraph

Visualizing how code is used across the organization is a vital part of our engineers’ day-to-day workflow - and we have a *lot* of code to search through! This blog post details our journey of adopting Sourcegraph at Yelp to help our engineers maintain and dig through the tens of gigabytes of data in our git repos! Here at Yelp, we maintain hundreds of internal services and libraries that power our website and mobile apps. Examples include our mission-critical “emoji service” which helps translate and localize emojis, as well as our “homepage service” which… you guessed it, serves our venerable...

Beyond Labels: Stories of Asian Pacific Islanders at Yelp

During Asian Pacific American Heritage Month, ColorCoded (a Yelp employee resource group) hosted a panel discussion called “Beyond Labels: Stories of Asian Pacific Islanders (API)* at Yelp.” We heard stories from five API Yelpers about their cultural backgrounds, identities, and thoughts on what it means to be an API in today’s world. Their stories helped us understand that identity is both multilayered and contextual, and that individuality goes beyond labels. Read more from their unique perspectives below. Tenzin Kunsal, Events + Partnerships, Engineering Recruiting From a young age, I knew the concept of “home” was complicated. Like many refugees, my...

Open sourcing spark-redshift-community

At Yelp, we are heavy users of both Spark and Redshift. We’re excited to announce spark-redshift-community, a fork of Databricks’ original spark-redshift project. spark-redshift is a Scala package which uses Amazon S3 to efficiently read data from Amazon Redshift into Spark DataFrames and write it back. Since the open source project effort was abandoned in 2017, the community has struggled to keep up with updating dependencies and fixing bugs. Progress came to a complete halt with the release of Spark 2.4, which was sharply incompatible with the latest spark-redshift. Developers looking for a solution turned to online threads on websites like Stack Overflow...

Redesigning Yelp for Apple Watch with SwiftUI

At this year’s WWDC, Apple unveiled SwiftUI, a framework that helps developers build declarative user interfaces. At Yelp, we were immediately excited about it and were looking for a way to start adopting it. We decided that our Apple Watch application was the perfect candidate for modernization using SwiftUI and were excited to explore a redesign with this new framework. At Yelp, one of the things we pride ourselves on is the quality of our content. Yelp users have posted hundreds of millions of reviews and photos. As we set out to re-imagine the user interface for our Apple Watch...
