Engineering Blog

Generating Web Pages in Parallel with Pagelets, the Building Blocks of Yelp.com

At Yelp, pagelets are a server-side optimization to parallelize the rendering of web pages across multiple web workers (loosely inspired by Facebook’s BigPipe). We’ve implemented this and have been running it successfully in production for a while now. This blog post is about our journey implementing and rolling out pagelets, including what we’ve learned since the initial rollout.

Pagelets at Yelp: an overview

Main and pagelet workers

Usually a request made to Yelp is fulfilled by a single web worker. This worker is in charge of generating a response (in the form of an HTTP packet, with headers and...
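
The general idea of rendering page fragments in parallel and then assembling them can be sketched as follows. This is a minimal illustration, not Yelp's implementation: threads stand in for separate web workers, and every name here (`render_header`, `render_results`, `render_footer`, `render_page`, the `request` dict) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical pagelet renderers: each one produces a single HTML fragment.
def render_header(request):
    return f"<header>Hello, {request['user']}</header>"


def render_results(request):
    return f"<main>Results for {request['query']}</main>"


def render_footer(request):
    return "<footer>footer</footer>"


# Fragments are listed in the order they should appear on the page.
PAGELETS = [render_header, render_results, render_footer]


def render_page(request, max_workers=3):
    """Render every pagelet concurrently, then join the fragments in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs the renderers concurrently but yields results in input order,
        # so the page layout does not depend on which fragment finishes first.
        fragments = list(pool.map(lambda render: render(request), PAGELETS))
    return "\n".join(fragments)


page = render_page({"user": "Ada", "query": "tacos"})
print(page)
```

In this sketch the assembling step plays the role of the main worker, while each renderer plays the role of a pagelet worker; a real deployment would dispatch to separate processes or hosts rather than threads.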

Moving Yelp's Core Business Search to Elasticsearch

While newer search engines at Yelp typically use Elasticsearch as a backend, Yelp’s core business search used its own custom backend, built directly on top of Lucene. This system was one of the oldest systems at Yelp to still be deployed in production. Some features of this custom search engine were:

- Distributed Lucene instances
- Master-slave architecture
- Custom text analysis support for various languages
- Custom business ranking which relied mostly on using business features (think business attributes like reviews, name, hours_open, service_areas, etc.)
- Derived Yelp analytics data to improve quality of search results; e.g. most popular queries for a business

Problems...

Upcoming Deprecation of Yelp API v2

As we continue to invest in the Yelp Fusion API, we are announcing that Yelp API v2 will be discontinued on June 30, 2018. If you are currently using v2 API endpoints, you have until June 30, 2018 to migrate to the Yelp Fusion API. We hope this gives everyone sufficient time to transition their applications from Yelp API v2 to Yelp Fusion, which lets you tap into awesome new features like GraphQL, which we launched last month. Here is a summary of the Yelp API v2 sunset timeline:

April 1, 2017 - Disabled Yelp API...

Yelp Dataset Challenge Round 8 Winner

The eighth round of the Yelp Dataset Challenge ran throughout the first half of 2017 and, as usual, we received a large number of very impressive and interesting submissions. Today, we are proud to announce the grand prize winner of the $5,000 award: “Clustered Model Adaption for Personalized Sentiment Analysis” by Lin Gong, Benjamin Haines, and Hongning Wang (from the Department of Computer Science of the University of Virginia). The authors built a personalized sentiment classification model at the group level. Their model is based on social theories about group psychology and how human...

Moving the Rest of the Monolith to PaaSTA

This past April (2017) we finally migrated our monolith to PaaSTA (our open source PaaS based on Apache Mesos). Yes, although Yelp does subscribe to the Service-Oriented-Architecture theory and we constantly try to reduce the scope of the monolith, realistically it still looms over us as a large towering codebase that pays the bills. But that doesn’t mean we can’t try to constantly improve it. This blog post is about our latest improvement to the monolith: treating it just like any other service at Yelp and running it on PaaSTA.

Background: What is Yelp’s Monolith Made Of?

Yelp’s monolith is...

Making Photos Smaller Without Quality Loss

Yelp has over 100 million user-generated photos ranging from pictures of dinners or haircuts, to one of our newest features, #yelfies. These images account for a majority of the bandwidth for users of the app and website, and represent a significant cost to store and transfer. In our quest to give our users the best experience, we worked hard to optimize our photos and were able to achieve a 30% average size reduction. This saves our users time and bandwidth and reduces our cost to serve those images. Oh, and we did it all without reducing the quality of these...

Taking Zero-Downtime Load Balancing even Further

Ever since we rolled out our zero-downtime HAProxy reload system a few years ago, we have been disappointed that it required additional investment to work well for external load balancing on our edge. We did build a prototype that used an intermediary qdisc so we could apply the approach, but after evaluating the prototype, and finding that the upstream Linux kernel issue wasn’t going to be fixed, we decided to go another way. Our edge is different from our internal load balancing tier because we have typically terminated TLS with another great proxy: NGINX. NGINX is useful because it does...

Introducing Yelp's Local Graph

We’re continuously adding new features to our API to make it easier for developers to integrate with our data and share great local businesses through their apps. Today, we’re releasing access to query our data via GraphQL, a graph query language. This is available immediately through our developer beta program.

What is GraphQL?

GraphQL is a query language for APIs that places emphasis on being able to query for exactly the data you want. I’m sure most of you at some point have thought, “I really wish this endpoint had more data,” or, “I only need one or two pieces...
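
To make "query for exactly the data you want" concrete, here is a small sketch of field selection over a nested record. It is not a GraphQL implementation and does not use Yelp's schema: `select_fields`, the `business` record, and all field names are hypothetical, chosen only to show the shape of the idea.

```python
def select_fields(data, selection):
    """Return only the keys named in `selection`, recursing into nested dicts.

    `selection` maps a field name to True (take the value as-is) or to a
    nested selection dict (project the nested object the same way).
    """
    result = {}
    for field, sub in selection.items():
        value = data[field]
        result[field] = value if sub is True else select_fields(value, sub)
    return result


# Hypothetical record, standing in for what a fixed REST endpoint might return.
business = {
    "name": "Example Cafe",
    "rating": 4.5,
    "phone": "555-0100",
    "location": {"city": "San Francisco", "zip_code": "94105", "country": "US"},
}

# The caller names only the fields it needs. A GraphQL query with the same
# shape might read: { name location { city } }  (hypothetical field names).
wanted = {"name": True, "location": {"city": True}}

slim = select_fields(business, wanted)
print(slim)
```

The point of the sketch is that the client, not the endpoint, decides the shape of the response, so unneeded fields such as `rating` and `phone` are never sent.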
