Engineering Blog

September 10th, 2014

Yelp Sponsors Women Who Code (WWCode)

We’re happy to announce that we are one of the first official sponsors of Women Who Code! WWCode is an organization whose goal is to help women excel in technology careers.

WWCode and Yelp started working together three years ago when the meetup group was created. We’ve hosted many of their events, ranging from Ruby workshops to discussion panels with CEOs and CTOs. Since their launch, WWCode has grown to 14,000 members across 14 countries worldwide. By sponsoring their new non-profit (as of this July!), we’re excited to help them achieve their goals of expanding into 50 cities worldwide by 2015 and reaching 1 million members by 2019.

As WWCode expands, they’re looking to reach out to top technical universities around the nation in order to introduce women to engineering at a younger age. We will be partnering with them at their first pilot university, Waterloo, this fall.

“Yelp has supported most of Women Who Code’s major events over the past three years,” said Alaina Percival, WWCode CEO. “Collaborating with Yelp will be key in reaching our goal of being in 50 cities by the end of the year.”

Want to get involved? You can help support WWCode by attending one of their upcoming events listed here:

Grace Hopper Practice Talks
September 23, 2014
Doors open: 6:30PM
Yelp HQ

Women Who Code Fundraiser – Applaud Her
October 23, 2014
Doors open: 6:00PM
Zendesk HQ

September 8th, 2014

Making Sense of CSP Reports @ Scale

CSP is Awesome

Content Security Policy isn’t new, but it is so powerful that it still feels like the new hotness. The ability to add a header to HTTP responses that tightens user-agent security rules and reports on violations is really powerful. Don’t want to load scripts from a third-party domain? Set a CSP and don’t. Trouble with mixed content warnings on your HTTPS domain? Set a CSP and let it warn you when users are seeing mixed content. Realistically, adding new security controls to a website and codebase as large as Yelp’s needs to be a gradual process. If we apply the new controls all at once, we’ll end up breaking our site in unexpected ways, and that’s just not cool. Fortunately, CSP includes a reporting feature – a “lemme know what would happen, but don’t actually do it” mode. By using CSP reporting, Yelp is able to find and fix problems related to new CSP controls before they break our site.
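
As a concrete sketch of what that looks like, here’s roughly how a Flask app (just an example framework, not necessarily our stack) could turn on report-only mode with a policy like the one in the sample report below:

from flask import Flask

app = Flask(__name__)

# Report-only: the browser tells us about violations but doesn't enforce anything.
CSP_REPORT_ONLY = (
    "report-uri https://biz.yelp.com/csp_report; "
    "default-src https:; script-src https:; style-src https:"
)

@app.after_request
def add_csp_header(response):
    response.headers['Content-Security-Policy-Report-Only'] = CSP_REPORT_ONLY
    return response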

Reading a Sample CSP Report

CSP reports are JSON documents POSTed from a user’s browser to Yelp. An example report might look like:

{
 "csp-report": {
   "document_uri": "https://biz.yelp.com/foo",
   "blocked_uri": "http://www.cooladvertisement.bro/hmm?x=asdfnone",
   "referrer": "https://biz.yelp.com",
   "source_file": "https://biz.yelp.com/foo",
   "violated_directive": "script-src https:",
   "original_policy": "report-uri https://biz.yelp.com/csp_report; default-src https:; script-src https:; style-src https:"
 }
}

This report says, “I went to https://biz.yelp.com/foo but it loaded some stuff from cooladvertisement.bro over HTTP, so I showed a mixed content warning.” Looks like www.cooladvertisement.bro needs to get loaded over HTTPS and then all will be good.

Making Sense of CSP Reports @ Scale

It’s easy to read a single CSP report, but what if you’re getting thousands of reports a minute? At that point you need to use some smart tools and work with the data to make sense of everything coming in. We wanted to reduce noise as much as possible, so we took a few steps to do that.

Get rid of malformed or malicious reports

Not all reports are created equal. Some are missing required fields and some aren’t even JSON. If you have an endpoint on your website where users can POST arbitrary data, there will be a lot of noise mixed in with the signal. The first thing we do is discard any reports that aren’t well-formed JSON or don’t contain the necessary keys.
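
A minimal sketch of that first pass might look like this (the required-key set here is illustrative, not our exact list):

import json

REQUIRED_KEYS = {'document_uri', 'blocked_uri', 'violated_directive'}

def parse_report(raw_body):
    """Return the csp-report dict, or None if the POST body is unusable."""
    try:
        report = json.loads(raw_body)
    except (ValueError, TypeError):
        return None  # not JSON at all
    csp_report = report.get('csp-report')
    if not isinstance(csp_report, dict):
        return None  # missing the top-level wrapper
    if not REQUIRED_KEYS.issubset(csp_report):
        return None  # missing fields we need downstream
    return csp_report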

Massage the reports to make them easier to aggregate

It was helpful to group similar reports and apply the Pareto principle to guide our efforts at addressing CSP reports. We take any URI in the report and chop it down to its domain, getting rid of the uniqueness of nonces, query params, and unique IDs, which makes reports much easier to group:

{
 "csp-report": {
   "document_uri": "https://biz.yelp.com/foo",
   "document_uri_domain": "biz.yelp.com"
   "blocked_uri": "http://www.cooladvertisement.bro/hmm?x=asdfnone",
   "blocked_uri_domain": "www.cooladvertisement.bro",
   ...
 }
}
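
In code, that massaging step is little more than a loop over the URI fields (field names follow the sample above):

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

def add_domain_fields(csp_report):
    """Annotate each *_uri field with a *_uri_domain field for easy grouping."""
    for key in ('document_uri', 'blocked_uri', 'referrer', 'source_file'):
        value = csp_report.get(key)
        if value:
            csp_report[key + '_domain'] = urlparse(value).netloc
    return csp_report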

Discard unhelpful reports

Surprisingly, you’ll see a whole lot of stuff that’s not really about your website when you start collecting reports. We found some good rules to discard the unhelpful data.

blocked_uri and source_file must start with http

We see loads of reports with browser-specific URI schemes – stuff related to extensions or the inner workings of a browser, like chromeinvoke:// or safari-extension://. Since we can’t fix these, we ignore them. source_file is an optional field in a CSP report, so we apply this rule to source_file only when it has a value.

document_uri must match the subdomain the report was sent to

If we’re looking at CSP reports that were sent to biz.yelp.com, then we’re only interested in reports about documents on biz.yelp.com. All sorts of strange proxy services and client-side ad injectors will end up serving a modified copy of your page and generate reports for things you can’t fix.
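
Put together, these two rules boil down to a small filter (the function name is ours, and document_uri_domain comes from the massaging step above):

def is_actionable(csp_report, expected_domain):
    """Apply the discard rules above; keep only reports we can act on."""
    blocked_uri = csp_report.get('blocked_uri', '')
    source_file = csp_report.get('source_file')

    # blocked_uri, and source_file when present, must start with http
    if not blocked_uri.startswith('http'):
        return False
    if source_file and not source_file.startswith('http'):
        return False

    # document_uri must be on the subdomain the report was sent to
    return csp_report.get('document_uri_domain') == expected_domain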

Retain some useful data

We don’t want to lose useful data that came in as part of the POST request, so we tack it onto the report. Info like the user-agent can be super helpful in tracking down an “Oh… that’s only happening on the iPhone” issue.

{
 "csp-report": {
   "document_uri": "https://biz.yelp.com/foo",
   "document_uri_domain": "biz.yelp.com"
   ...
 },
 "request_metadata": {
    "server_time": 1408481038,
    "yelp_site": "biz",
    "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53",
    "remote_ip": "127.0.0.1"
  }
}
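
Building that wrapper is straightforward; the request attributes below are whatever your web framework exposes, so the names here are illustrative:

import time

def attach_request_metadata(csp_report, request, yelp_site):
    """Wrap the report with request details that make debugging easier later."""
    return {
        'csp-report': csp_report,
        'request_metadata': {
            'server_time': int(time.time()),
            'yelp_site': yelp_site,
            'user_agent': request.headers.get('User-Agent', ''),
            'remote_ip': request.remote_addr,
        },
    }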

Throw this all in a JSON log

Once we’ve got a nice, well-formed JSON report with some helpful extras, we throw it into a log. Logs are aggregated from our various datacenters and made available as a stream for analysis tools.
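
The logging itself is nothing fancy – one JSON document per line, so downstream consumers can parse the stream. A sketch:

import json
import logging

csp_logger = logging.getLogger('csp_reports')

def log_report(wrapped_report):
    # One JSON document per line; the log-aggregation pipeline ships these
    # from each datacenter to wherever the analysis tools can read them.
    csp_logger.info(json.dumps(wrapped_report, sort_keys=True))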

Visualize, monitor, and alert for the win

The Yelp security team is a huge fan of Elasticsearch/Logstash/Kibana.  Like we do with pretty much any log, we throw these CSP reports into our ELK cluster and visualize the results.

CSP Kibana Dashboard

This image shows a rapid decrease in incoming CSP report volume after fixing a page that caused mixed content warnings.

From there it’s easy for our engineers to view trends, drill into specific reports, and make sense of reports at scale. We’re also adding monitoring and alerting to the reports in our Elasticsearch cluster so it can let us know if report volumes rise or new issues crop up.

Give it a try

We’re making sense of CSP reports at scale and that’s super useful in monitoring and increasing web application security. We’d love to hear from you about how you’re using CSP. Let us know at opensource@yelp.com.

September 4th, 2014

Maker’s Day – Or How Thursdays Became Every Yelp Engineer’s Dream

It’s been a busy past few years here at Yelp Engineering. With our 10th anniversary this year and our recent launch in Chile, we think it’s safe to say we’re on to something. But it would be foolish of us to stop here. At the end of the day, we’re engineers: we live for the fact that there are still so many challenging problems to solve, features to improve, and datasets to explore. With the size of the projects we’re tackling nowadays, our Engineering and Product Management teams need to be in constant contact to coordinate development, testing, and releases. We fully embrace the rapid, iterative process typical of Agile, Scrum, and XP, so you’ll often see a product manager and engineer hashing an idea out at one of our large built-in whiteboards or in the team’s pod. Soon enough, though, we found ourselves with schedules like this:

makers-day-calendar

Coordination is incredibly important, but we also need time to actually build all those cool features we come up with. That’s why, about a year ago, we introduced Maker’s Day here at Yelp.

So what is Maker’s Day? The concept is pretty simple: meetings, interviews, and general interruptions aren’t allowed for engineers on Thursdays. Some teams even cancel standups on those days while others use them as a quick way to unblock folks so that there are fewer disruptions later on. If any questions come up, we use email instead of showing up at a person’s desk or pinging them over IM. Outside of those general guidelines, how engineers use Maker’s Day is really up to them: some make it into a long, uninterrupted coding period, others prefer it for reviewing designs and diving deep into a topic. And by the way, for those engineering managers out there, this applies to us, too.

We’re certainly not the first to come up with this idea. Back in 2009, Paul Graham, in his “Maker’s Schedule, Manager’s Schedule” post, wrote about how the partners at Y Combinator were implementing the idea: “You can’t write or program well in units of an hour. That’s barely enough time to get started.” Craig Kerstiens of Heroku mentioned, as part of his How Heroku Works series, how the value of Maker’s Day had increased exponentially as the company had grown. Intel even jumped into the discussion with hard facts from their “Quiet Time” pilot. Closer to the Python community, Daniel Greenfeld tweeted what everyone was thinking back in 2012.

So how has Maker’s Day done here at Yelp? We don’t have spreadsheets of numbers to prove its success. However, on Thursdays, you’ll notice the engineering floors are a tad quieter, and folks are eager to get to their desks and jump into whatever task they’ve lined up for that day. That’s enough for us to stick with it.

In the end, Maker’s Day was a good step, but we don’t think it’s the be-all end-all solution. Similar to our software development strategy, we’re also constantly iterating with our processes within Engineering. If you love thinking about these kinds of problems, we’re always looking for great Engineering Managers to help grow our talented team of engineers.

August 27th, 2014

Announcing pre-commit: Yelp’s Multi-Language Package Manager For Pre-Commit Hooks

At Yelp we rely heavily on pre-commit hooks to find and fix common issues before changes are submitted for code review. We run our hooks before every commit to automatically point out issues like missing semicolons, whitespace problems, and leftover testing statements in code. Automatically fixing these issues before posting code reviews allows our code reviewers to pay attention to the architecture of a change instead of trivial errors.

As we created more libraries and projects, we recognized that sharing our pre-commit hooks across projects was painful. We copied and pasted bash scripts from project to project and had to manually change the hooks to work with different project structures.

We believe that you should always use the best industry-standard linters. Some of the best linters are written in languages that you do not use in your project or have installed on your machine. For example, scss-lint is a linter for SCSS written in Ruby. If you’re writing a project in Node, you should be able to use scss-lint as a pre-commit hook without adding a Gemfile to your project or understanding how to get scss-lint installed.

We built pre-commit to solve our hook issues. It is a multi-language package manager for pre-commit hooks. You specify a list of hooks you want and pre-commit manages the installation and execution of any hook written in any language before every commit. pre-commit is specifically designed to not require root access. If one of your developers doesn’t have node installed but modifies a JavaScript file, pre-commit automatically handles downloading and building node to run jshint without root.

Read more about pre-commit at: http://pre-commit.com

August 26th, 2014

HACK209 – dockersh

Yelp’s engineering team loves Docker. We’re already using it for a growing number of projects internally, but there are applications that Docker isn’t a great fit for, such as providing independent (VM-like) containers on shared hardware for people to interactively ssh into. You can, of course, run sshd inside a Docker container, but you can only run one sshd per port. If you want multiple users to be able to ssh into the same server without having a custom port allocated per user, you’re out of luck.

To solve this problem we built dockersh, a user shell for isolated, containerized environments in Docker. It’s available as a container on Docker Hub with a one-step install. This is a fully functional and usable implementation that you can play with. I’m already using it on some of my home servers to separate people’s screen/tmux sessions for IRC into separate containers.

Originally I solved this by using the ssh ForceCommand setting so that users would log into the host and then immediately be forced to ssh to a Docker container. This was not ideal, as key management and mapping each user’s ForceCommand setting were complex.

After discussing this problem with a couple of my colleagues, and some inspiration from a recent blog post about sshd in containers, we had a rough idea of what we wanted to experiment with: making a shell which located a user inside a container using nsenter!

In brief, we planned to write a utility which would:

  • Be invoked by the login system as a user shell (when a user sshes into the host)
  • Start up a Docker container for the user (if it wasn’t already running), with the user’s home directory mounted
  • Lastly, exec nsenter to give the user a shell inside this container

Theoretically, this could give us isolated environments, as each user would have their own network stack, process and memory namespaces, etc. If subsequent ssh sessions just enter the pre-existing container, it would look (to the user) like they had their own dedicated machine.
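
In sketch form (in Python here for readability; the real dockersh is written in Go, and the image name and flags are illustrative), the flow looks something like this:

import subprocess

def dockersh(user, image='ubuntu'):
    """Rough sketch of the dockersh flow: one long-lived container per user."""
    name = 'dockersh-%s' % user

    # Start the user's container if it isn't already running,
    # with their home directory mounted inside.
    running = subprocess.check_output(
        ['docker', 'ps', '-q', '--filter', 'name=%s' % name]).strip()
    if not running:
        subprocess.check_call([
            'docker', 'run', '-d', '--name', name,
            '-v', '/home/%s:/home/%s' % (user, user),
            image, 'sleep', 'infinity'])

    # Find the container's init PID and enter its namespaces with a shell.
    pid = subprocess.check_output(
        ['docker', 'inspect', '--format', '{{.State.Pid}}', name]
    ).decode().strip()
    subprocess.check_call([
        'nsenter', '--target', pid,
        '--mount', '--uts', '--ipc', '--net', '--pid',
        '--', '/bin/bash'])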

We thought we could hack together a prototype of this pretty quickly. Turns out we were right; with a couple of patches and a custom nsenter version, we were in business, at least in the ‘terrible but kinda functional prototype’ business, perfect for our upcoming hackathon.

We decided to rewrite the prototype version in Go, which was a first for me. At the time, this was mostly an excuse to play with Go but it quickly proved to be a very sound decision. We were able to make use of some excellent libraries, like libcontainer, which did a lot of the heavy lifting for us.

dockersh can be explicitly invoked for testing, but will more usually be set up as an ssh ForceCommand, or added to /etc/shells and used as the shell for users in /etc/passwd. It reads a global config file and per-user configuration files, and then sets up a container and invokes a real shell process for the user. This user shell process inside their container should be more secure than direct access to the host, as the container’s kernel namespaces and cgroups are applied to the new shell, in addition to dropping additional ‘capabilities’ for the process, including SUID and SGID bits.

The utility we have now is just a start – I’ve already got a list of future improvements and further work (in the issue tracker), and I encourage you to have a play with the project and let us know what you think.