Engineering

Engineering Blog

PySpark Coding Practices: Lessons Learned

In our previous post, we discussed how we used PySpark to build a large-scale distributed machine learning model. In this post, we will describe our experience and some of the lessons learned while deploying PySpark code in a production environment. Yelp’s systems have robust testing in place. It’s a hallmark of our engineering. It allows us to push code confidently and forces engineers to design code that is testable and modular. Broadly speaking, we found the resources for working with PySpark in a large development environment and efficiently testing PySpark code to be a little sparse. By design, a lot...

Continue reading

Scaling Collaborative Filtering with PySpark

Here at Yelp our core mission is to connect users with great local businesses. On the engineering side, that requires us to tackle big data and machine learning problems. One of the natural ways to apply machine learning to our core mission is to deliver relevant and personalized business recommendations. This helps us deliver great content to users. For example, we can let users know when a hot and new coffee shop we think they will love opens in their neighborhood through push notifications or features like collections. This is an area of active research and development at Yelp, and...

Continue reading

Performance Improvements for Search on The Yelp Android App - Part 1

#perfmatters At Yelp, we’ve been working hard to improve the performance of our search results on mobile devices (iOS and Android). We know that performance matters from a numbers perspective, but another reason why we’ve decided to invest more in performance is due to a recent increase in user studies commissioned by Yelp. We noticed that most users “grunted or made noises when waiting for search to load”. We took this as a sign that something needed to be done! In our quest to make search faster on the Android mobile app client, we broke down performance into two different...

Continue reading

Introducing Salsa: A tool for exporting iOS components into Sketch

What is Salsa? Salsa is an open source library that renders iOS views and exports them into a Sketch file. We built Salsa to help bridge the gap between design and engineering in an effort to create a single source of truth for visual styling of UI. Why use Salsa A few years ago, we started putting together a library of common components that developers and designers could use to build features. Initially, we had to manually maintain consistency between the designs in Sketch and their implementations in code. When we only had a handful components, this wasn’t difficult to...

Continue reading

Active Directory Password Blacklisting

Many enterprise professionals use passwords that are weak and easily compromised. Equipped with this knowledge, as well as the exposure of more and more password leaks, dictionary attacks focused on compromised or popular passwords have become increasingly effective. As such, the National Institute of Standards and Technology recommends password blacklisting as a highly-effective means of preventing such attacks. Unfortunately, use of password blacklisting countermeasures has remained a relatively new innovation that has not yet achieved widespread corporate adoption. At Yelp, however, we strive to add the latest and greatest defense mechanisms to our arsenal, which is why we adopted such...

Continue reading

Black-Box Auditing: Verifying End-to-End Replication Integrity between MySQL and Redshift

Since Yelp introduced its real-time streaming data infrastructure, “Data Pipeline”, it has grown in scope and matured vastly. It now supports some of Yelp’s most critical business requirements in its mission to connect people with great local businesses. Today, it has expanded into a diverse ecosystem of connectors sourcing data from Kafka and MySQL, and sinking data into Cassandra, Elasticsearch, Kafka, MySQL, Redshift, and S3. To ensure that the whole ecosystem is functioning correctly, Yelp’s Data Pipeline infrastructure is continually growing its repertoire of reliability techniques such as write-ahead logging, two-phase commit, fuzz testing, monkey testing, and black-box auditing to...

Continue reading

Caching Internal Service Calls at Yelp

Casper is a caching proxy designed to intercept traffic flowing between internal Yelp services. It is built with Nginx and OpenResty at its core and contains some logic in Lua to fit in our ecosystem. Today we’re proud to announce that Casper is opensource and available on Github. To introduce the context in which Casper was created, this post outlines a few basics about Yelp’s SOA, explains the technical decisions behind Casper’s design, and finally exposes concrete problems that we’ve encountered while rolling it out and running it in our production environment. Moving past “Memcached for everything” Yelp has had...

Continue reading

CSS in the Age of React: How We Traded the Cascade for Consistency

With hundreds of engineers, developers and designers working on Yelp, ensuring visual consistency across Yelp is a challenging task. We’ve been migrating our web components from Yelp Cheetah to React to increase designer and developer productivity while ensuring visual consistency across our web app. Along the way, we built Lemon Reset - a package containing consistent, cross-browser React DOM tags, powered by CSS Modules. Since our design system components are the building blocks of our frontend, we had to port them to React as the first step before our developers could port their features. We made a lot of design...

Continue reading