Engineering

Engineering Blog

Active Directory Password Blacklisting

Many enterprise professionals use passwords that are weak and easily compromised. Equipped with this knowledge, as well as the exposure of more and more password leaks, dictionary attacks focused on compromised or popular passwords have become increasingly effective. As such, the National Institute of Standards and Technology recommends password blacklisting as a highly-effective means of preventing such attacks. Unfortunately, use of password blacklisting countermeasures has remained a relatively new innovation that has not yet achieved widespread corporate adoption. At Yelp, however, we strive to add the latest and greatest defense mechanisms to our arsenal, which is why we adopted such...

Continue reading

Black-Box Auditing: Verifying End-to-End Replication Integrity between MySQL and Redshift

Since Yelp introduced its real-time streaming data infrastructure, “Data Pipeline”, it has grown in scope and matured vastly. It now supports some of Yelp’s most critical business requirements in its mission to connect people with great local businesses. Today, it has expanded into a diverse ecosystem of connectors sourcing data from Kafka and MySQL, and sinking data into Cassandra, Elasticsearch, Kafka, MySQL, Redshift, and S3. To ensure that the whole ecosystem is functioning correctly, Yelp’s Data Pipeline infrastructure is continually growing its repertoire of reliability techniques such as write-ahead logging, two-phase commit, fuzz testing, monkey testing, and black-box auditing to...

Continue reading

Caching Internal Service Calls at Yelp

Casper is a caching proxy designed to intercept traffic flowing between internal Yelp services. It is built with Nginx and OpenResty at its core and contains some logic in Lua to fit in our ecosystem. Today we’re proud to announce that Casper is opensource and available on Github. To introduce the context in which Casper was created, this post outlines a few basics about Yelp’s SOA, explains the technical decisions behind Casper’s design, and finally exposes concrete problems that we’ve encountered while rolling it out and running it in our production environment. Moving past “Memcached for everything” Yelp has had...

Continue reading

CSS in the Age of React: How We Traded the Cascade for Consistency

With hundreds of engineers, developers and designers working on Yelp, ensuring visual consistency across Yelp is a challenging task. We’ve been migrating our web components from Yelp Cheetah to React to increase designer and developer productivity while ensuring visual consistency across our web app. Along the way, we built Lemon Reset - a package containing consistent, cross-browser React DOM tags, powered by CSS Modules. Since our design system components are the building blocks of our frontend, we had to port them to React as the first step before our developers could port their features. We made a lot of design...

Continue reading

Introducing LogFeeder - A log collection system

Introduction The collection and processing of logs is essential to good security. One of the primary functions of a security team is to keep organizations safe by eliminating blind spots in infrastructure. Breach investigations without logs result in a lot of guesswork. Worse, the activities of an attacker can easily remain undiscovered without adequate logging. To ensure we have a robust log storage and visualization platform, we use Elasticsearch, Logstash and Kibana (ELK). These tools form part of the toolset that we use in our Security Incident and Event Monitoring (SIEM) solution. ElastAlert is the primary means by which alerts...

Continue reading

Celebrating the Women of Yelp: AWE the Book

As a recruiter, I talk to a lot of people about what it’s like to work at Yelp. Most often, I find myself answering questions about the work environment and individual growth opportunities. During my four and a half years at Yelp, I would summarize the people here as very sharp and intelligent, while also humble and open minded. This spirit has fostered an environment that encourages individuals to learn by trying things for themselves (new hires get to push code out their first week!) and empowers them to ask questions. This collaborative work culture invites tremendous opportunity and gives...

Continue reading

Making 30x performance improvements on Yelp’s MySQLStreamer

Introduction MySQLStreamer is an important application in Yelp’s Data Pipeline infrastructure. It’s responsible for streaming high-volume, business-critical data from our MySQL clusters into our Kafka-powered Data Pipeline. When we rolled out the first test version of MySQLStreamer, the system operated at under 100 messages/sec. But for it to keep up with our production traffic, the system needed to process upwards of thousands of messages/sec (MySQL databases at Yelp on an average receive over hundreds of millions of data manipulation requests per day, and tens of thousands of queries per second). In order to make that happen, we used a variety...

Continue reading

Yelp Dataset Challenge Round 11 Announcement and Kaggle Weekly Kernel Award

Yelp Dataset Challenge Round 11 Is On! The eleventh round of the Yelp Dataset Challenge has opened. It will run until June 30, 2018. As in the past, the Yelp Dataset Challenge gives college students access to reviews and businesses from 11 metropolitan areas scattered over 4 different countries. This time around, there are a staggering 5.26 million reviews written by 1.3 million users about 175,000 businesses, as well as 146,350 check-ins and 1.1 million tips left by these users. Moreover, we have added photos about these businesses in a separate file, for convenience. With such a trove of data,...

Continue reading