Engineering Blog

ElastAlert: Alerting At Scale With Elasticsearch, Part 2

It’s 10:51 PM on a Friday, and someone on the internet has decided to try to break into your network. They are guessing passwords and generating failed login events. Your security team is paged, the attacker is blocked, and everyone can go back to bed. This is one example of the power of ElastAlert. Now we’ll give you background on how it works and how to set it up yourself. In part one of this blog post, we introduced an open source alerting framework for Elasticsearch which allows you to match and take action on a wide variety of patterns....
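The failed-login scenario above comes down to spotting a burst of matching events inside a short time window. Here is a minimal sketch of that idea in plain Python. It is not ElastAlert's code or configuration format: the function name `find_bursts`, the threshold, and the window are all hypothetical, and the events are an in-memory list rather than Elasticsearch query results.

```python
from collections import deque
from datetime import datetime, timedelta


def find_bursts(timestamps, threshold, window):
    """Return the times at which `threshold` events were seen within `window`.

    Hypothetical helper for illustration only; not part of ElastAlert.
    """
    recent = deque()  # events still inside the sliding window
    alerts = []
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop events that have fallen out of the window ending at `ts`.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            alerts.append(ts)
            recent.clear()  # reset so one burst raises one alert
    return alerts


# Arbitrary example data: five failed logins, ten seconds apart.
start = datetime(2016, 1, 1, 22, 51)
failed_logins = [start + timedelta(seconds=10 * i) for i in range(5)]
alerts = find_bursts(failed_logins, threshold=5, window=timedelta(minutes=1))
```

With these made-up inputs the fifth failed login triggers a single alert. The same five events spread five minutes apart would trigger none.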

Introducing venv-update

venv-update is an MIT-licensed tool to quickly and exactly synchronize a Python project’s virtualenv with its requirements. This project ships as two separable components: pip-faster and venv-update. Both are designed for use on large Python projects with hundreds of requirements and are used daily by Yelp engineers. For complete documentation, please see http://venv-update.rtfd.org. Making large Python projects painless: The majority of yelp.com is implemented in a single Python project, dubbed “yelp-main”. Initially, yelp-main installed all of its dependencies at the system level. We’ve done the work to transition this to using virtualenv, and managing yelp-main’s (Python) requirements via pip and...

Leaping into February

February is going to be an exciting month! We’re looking forward to hosting our third Girl Geek Dinner next week which will feature talks from Yelp engineers on topics like learning, system performance metrics, and service monitoring. We’ll be present at the WSDM Conference that focuses on data mining and search, and at the Lesbians Who Tech Summit in San Francisco. If you’re in the area, swing by to say hi! We’ll be the ones giving away the infamous Yelp mints. This month we’ll also be hosting SF Python, Designers + Geeks, and Products that Count in the office. At...

Announcing the Yelp Dataset Challenge Round 7

What can you learn from a photo? Show us with the Yelp Dataset Challenge Round 7! The Challenge: The Yelp Dataset Challenge provides the academic community with a real-world dataset over which to apply their research. We encourage students to take advantage of this wealth of data to develop and extend their own research in data science and machine learning. Students who submit their research are eligible for cash awards and incentives for publishing and presenting their findings. A new round of the Yelp Dataset Challenge (our seventh already!) opened on January 15, 2016, giving students access to reviews and...

Yelp Dataset Challenge Round 5 Winner

The fifth round of the Yelp Dataset Challenge ran throughout the first half of 2015, and we were quite impressed with the projects and concepts that came out of the challenge. Today, we are proud to announce the grand prize winner of the $5,000 award: “From Group to Individual Labels Using Deep Features” by Dimitrios Kotzias, Misha Denil, Nando De Freitas, and Padhraic Smyth (from the University of California, Irvine, the University of Oxford, and the Canadian Institute for Advanced Research). This paper proposes a novel approach to using group-level labels (e.g. the category...

Critical CSS Middleware: Inlining The Important CSS rules On-The-Fly

Website performance can be judged in a lot of ways, but perhaps the most important is user-perceived performance: the time between clicking a link and seeing the desired page rendered on the screen. A big part of keeping things feeling snappy is understanding which bits of content are blocking the “critical rendering path,” and coming up with ways to shorten or unblock them. At Yelp we focused on shortening the process of loading our CSS stylesheets. Before the browser can begin rendering the page, it needs to have its HTML markup and CSS rules. Usually,...

Introducing dumb-init, an init system for Docker containers

At Yelp we use Docker containers everywhere: we run tests in them, build tools around them, and even deploy them into production. In this post we introduce dumb-init, a simple init system written in C which we use inside our containers. Lightweight containers have made it practical to run a single process without a normal init system such as systemd or sysvinit. However, omitting an init system often leads to incorrect handling of processes and signals, and can result in problems such as containers which can’t be gracefully stopped, or leaking containers which should have been destroyed. dumb-init is simple to use and solves...
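To make the signal-handling problem concrete, here is a rough Python sketch of one job a minimal init can do: start a single child process, forward termination signals to it, and exit with the child's status. It is not dumb-init's C implementation and covers none of its other behavior. The function name `run_with_signal_forwarding` and the set of forwarded signals are assumptions made for illustration, and the sketch assumes a POSIX system.

```python
import signal
import subprocess
import sys


def run_with_signal_forwarding(cmd):
    """Run `cmd` as a child process and relay common signals to it.

    Hypothetical illustration only; not how dumb-init itself is written.
    Must be called from the main thread, since it installs signal handlers,
    and the handlers stay installed after the child exits.
    """
    child = subprocess.Popen(cmd)

    def forward(signum, frame):
        # Pass the signal on to the child instead of handling it here.
        child.send_signal(signum)

    # Assumed set of signals worth forwarding for a graceful stop.
    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGHUP):
        signal.signal(sig, forward)

    # Block until the child exits and report its exit status.
    return child.wait()


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_with_signal_forwarding(sys.argv[1:]))
```

Saved under a hypothetical name such as `forward_init.py`, this could be run as `python forward_init.py sleep 60`; sending SIGTERM to the wrapper would then be relayed to `sleep`.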

Introducing the Yelp Restaurant Photo Classification Challenge

We’re excited to release our first image dataset with hundreds of thousands of user-submitted photos as part of a challenge to all data scientists, launching this week on Kaggle! Yelp’s users provide several kinds of “unstructured” data such as reviews, photos, and videos. They can also answer structured questions like, “Is the restaurant romantic?” These structured answers are incredibly useful to users who want a quick summary of important attributes of a business. We want to know: can you extract these attributes from our photos dataset, and what is the right way to approach this problem? If this type of...
