Three times a year the entire engineering team at Yelp gets together and does innovative (sometimes crazy) things like launching a 3D printer into space or flying a quadcopter with human wings…err…arms or figuring out whether Cronuts are more popular than Donuts. We call this tri-annual event…Hackathon! It’s a festival celebrating innovation, creativity, and technical badassery where our smart, talented and witty engineers get 48 hours to work on anything they like. Needless to say, a relentless supply of delicious food also plays a key role in this event.
Our hackers showing off their projects in a science fair style exhibition
The 14th version of our Hackathon, which was held this past month, saw around 80 projects across all of our engineering offices, covering a wide variety of topics ranging from mining our rich dataset to developing visualization tools to building robots.
Sometimes our engineers de-stress by attempting to put together ridiculously hard monochromatic jigsaw puzzles custom created with an inside joke
Shahid C., one of our intern extraordinaires this summer, worked on a project he calls “Yelp Boost,” a nifty visualization tool that tackles the age-old economics question of supply and demand. Shahid echoes what sounds like a fundamental tenet of Yelponomics 101: “If we can figure out where the demand for a product greatly outweighs supply, we could recommend business owners to set up their shops in those locations in order to meet this demand and boost their sales!” To measure supply and demand, Shahid dug deep into our search logs and built real-time visualizations that look like this:
The heat map (from light blue to intense red) shows increasing demand for pizza, while the red dots with green halos mark pizzerias in San Francisco. See those big red blobs with dropped pins inside them? There’s high demand for pizza there, but unfortunately, there aren’t many pizzerias nearby. Hmm…wonder what could be done about that.
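The post doesn’t say how Yelp Boost computes its map, but the core idea (find places where demand outstrips supply) fits in a few lines. This is a hypothetical sketch, not Shahid’s code: it assumes searches and businesses arrive as (latitude, longitude) pairs, buckets them into a square grid, and scores each cell by searches per business. The names `grid_cell` and `demand_supply_gaps` and the 0.01-degree cell size are illustrative choices.

```python
import math
from collections import Counter


def grid_cell(lat, lng, cell_size):
    """Map a coordinate to the (row, col) of a square grid cell."""
    return (math.floor(lat / cell_size), math.floor(lng / cell_size))


def demand_supply_gaps(searches, businesses, cell_size=0.01, top_n=3):
    """Rank grid cells by searches per business (a hypothetical scoring rule)."""
    demand = Counter(grid_cell(lat, lng, cell_size) for lat, lng in searches)
    supply = Counter(grid_cell(lat, lng, cell_size) for lat, lng in businesses)
    # Adding 1 avoids dividing by zero in cells with no businesses at all.
    scores = {cell: n / (supply[cell] + 1) for cell, n in demand.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Cells at the top of the returned list play the role of the big red blobs: lots of pizza searches, few pizzerias.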
Pretty cool, eh?
Have the creative engineering gears in your brain started turning? Check out our exciting product and engineering job openings at www.yelp.com/careers and apply today! Who knows, you may be showing off your killer idea at Yelp Hackathon 15.
We’re happy to announce that we are one of the first official sponsors of Women Who Code! WWCode is an organization whose goal is to help women excel in technology careers.
WWCode and Yelp started working together three years ago when the meetup group was created. We’ve hosted many of their events ranging from Ruby workshops to discussion panels including CEOs and CTOs. Since their launch, WWCode has grown to 14,000 members across 14 countries worldwide. By sponsoring their new non-profit (as of this July!), we’re excited and happy to help them achieve their goals of expanding into 50 cities worldwide by 2015 with 1 million members by 2019.
As WWCode expands, they’re looking to reach out to top technical universities around the nation in order to introduce women to engineering at a younger age. We will be partnering with them at their first pilot university, Waterloo, this fall.
“Yelp has supported most of Women Who Code’s major events over the past three years,” said Alaina Percival, WWCode CEO. “Collaborating with Yelp will be key in reaching our goal of being in 50 cities by the end of the year.”
Want to get involved? You can help support WWCode by attending one of their upcoming events listed here:
Grace Hopper Practice Talks
September 23, 2014
Doors open: 6:30PM
Women Who Code Fundraiser – Applaud Her
October 23, 2014
Doors open: 6:00PM
CSP is Awesome
Content Security Policy isn’t new, but it’s so powerful that it still feels like the new hotness. Adding a single header to HTTP responses lets you tighten the browser’s security rules and get reports on violations. Don’t want to load scripts from third-party domains? Set a CSP and don’t. Trouble with mixed content warnings on your HTTPS domain? Set a CSP and let it warn you when users are seeing mixed content. Realistically, adding new security controls to a website and a codebase as large as Yelp’s needs to be a gradual process. If we apply the new controls all at once, we’ll end up breaking our site in unexpected ways, and that’s just not cool. Fortunately, CSP includes a reporting feature (the Content-Security-Policy-Report-Only header), a “lemme know what would happen, but don’t actually do it” mode. By using CSP reporting, Yelp is able to find and fix problems related to new CSP controls before they break our site.
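To make report-only mode concrete, here is a minimal sketch assuming a Python WSGI app (the post doesn’t describe Yelp’s web stack). It wraps any WSGI application and adds a Content-Security-Policy-Report-Only header carrying the same directives as the policy in the sample report. The class name `CSPReportOnlyMiddleware` is hypothetical; the header name is the real one defined by CSP.

```python
# Same directives as the sample report's original_policy, sent as report-only.
REPORT_ONLY_POLICY = "; ".join([
    "report-uri https://biz.yelp.com/csp_report",
    "default-src https:",
    "script-src https:",
    "style-src https:",
])


class CSPReportOnlyMiddleware:
    """Hypothetical WSGI middleware that attaches a report-only CSP header."""

    def __init__(self, app, policy=REPORT_ONLY_POLICY):
        self.app = app
        self.policy = policy

    def __call__(self, environ, start_response):
        def start_response_with_csp(status, headers, exc_info=None):
            # Browsers report violations of this policy but do not enforce it.
            headers = list(headers) + [
                ("Content-Security-Policy-Report-Only", self.policy),
            ]
            return start_response(status, headers, exc_info)

        return self.app(environ, start_response_with_csp)
```

Renaming the header to Content-Security-Policy turns the same policy from report-only into enforcement, which is the step you would take once the reports go quiet.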
Reading a Sample CSP Report
CSP reports are JSON documents POSTed from a user’s browser to Yelp. A trimmed-down example report might look like:
{
  "violated_directive": "script-src https:",
  "original_policy": "report-uri https://biz.yelp.com/csp_report; default-src https:; script-src https:; style-src https:"
}
This report says, “I went to https://biz.yelp.com/foo but it loaded some stuff from cooladvertisement.bro over HTTP and I showed a mixed content warning.” Looks like www.cooladvertisement.bro needs to be loaded over HTTPS, and then all will be good.
Making Sense of CSP Reports @ Scale
It’s easy to read a single CSP report, but what if you’re getting thousands of reports a minute? At that point you need smart tools and some work on the data to make sense of everything coming in. We wanted to reduce noise as much as possible, so we took a few steps to do that.
Get rid of malformed or malicious reports
Not all reports are created equal. Some are missing required fields and some aren’t even JSON. If you have an endpoint on your website where users can POST arbitrary data, there will be a lot of noise mixed in with the signal. The first thing we do is discard any report that isn’t well-formed JSON or doesn’t contain the necessary keys.
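Here’s a minimal Python sketch of that first filter. It assumes the flat, underscore-keyed format used in the sample report; browsers actually send fields nested under a "csp-report" key with hyphenated names (for example "blocked-uri"), so a real endpoint would unwrap and rename them first. The `REQUIRED_KEYS` tuple is a hypothetical choice, since the post doesn’t list which keys Yelp requires.

```python
import json

# Hypothetical minimum set of keys; the post does not say which keys are required.
REQUIRED_KEYS = ("document_uri", "blocked_uri", "violated_directive")


def parse_report(body):
    """Return the report as a dict, or None if it is malformed."""
    try:
        report = json.loads(body)
    except (TypeError, ValueError):  # not JSON at all (JSONDecodeError is a ValueError)
        return None
    if not isinstance(report, dict):
        return None
    # Every required key must be present and hold a string.
    if not all(isinstance(report.get(key), str) for key in REQUIRED_KEYS):
        return None
    return report
```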
Massage the reports to make them easier to aggregate
It was helpful to group similar reports and apply the Pareto principle to guide our efforts at addressing CSP reports. We take every URI in the report and chop it down to its domain, which strips out unique nonces, query params, and IDs and makes similar reports easier to group.
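One way to do the chopping, sketched with the standard library’s `urllib.parse`. The list of URI fields is an assumption, and unlike the text (which chops down to the domain) this sketch keeps the scheme as well, so HTTP and HTTPS loads of the same domain stay distinguishable when hunting mixed content. Values without a scheme and host are returned unchanged.

```python
from urllib.parse import urlsplit

# Assumed set of report fields that can hold URIs.
URI_FIELDS = ("document_uri", "blocked_uri", "source_file", "referrer")


def chop_uri(uri):
    """Reduce a URI to scheme://host, dropping path, query, fragment, and port."""
    try:
        parts = urlsplit(uri)
    except ValueError:  # e.g. a malformed IPv6 literal
        return uri
    if not parts.scheme or not parts.hostname:
        return uri
    return f"{parts.scheme}://{parts.hostname}"


def normalize_report(report):
    """Return a copy of the report with every URI field chopped down."""
    normalized = dict(report)
    for field in URI_FIELDS:
        value = normalized.get(field)
        if isinstance(value, str) and value:
            normalized[field] = chop_uri(value)
    return normalized
```

After this step, counting reports by (`blocked_uri`, `violated_directive`) is usually enough to see which few sources account for most of the volume.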
Discard unhelpful reports
Surprisingly, you’ll see a whole lot of stuff that’s not really about your website when you start collecting reports. We found some good rules to discard the unhelpful data.
blocked_uri and source_file must start with http
We see loads of reports with browser-specific URI schemes: stuff related to extensions or the inner workings of a browser, like chromeinvoke:// or safari-extension://. Since we can’t fix these, we ignore them. source_file is an optional field in a CSP report, so we apply this rule to source_file only when it has a value.
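This first rule boils down to a prefix check. A Python sketch, assuming reports are dicts keyed like the sample report; the function name `has_fixable_uris` is hypothetical. Note that the "http" prefix also matches "https".

```python
def has_fixable_uris(report):
    """Keep only reports whose URIs use http(s); skip browser-internal schemes."""
    blocked_uri = report.get("blocked_uri") or ""
    if not blocked_uri.startswith("http"):
        return False
    source_file = report.get("source_file")
    # source_file is optional, so only check it when it has a value.
    if source_file and not source_file.startswith("http"):
        return False
    return True
```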
document_uri must match the subdomain the report was sent to
If we’re looking at CSP reports that were sent to biz.yelp.com, then we’re only interested in reports about documents on biz.yelp.com. All sorts of strange proxy services and client-side ad injectors end up serving a modified copy of your page and generating reports for things you can’t fix.
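The second rule compares the host in document_uri with the host the report was POSTed to. In this hypothetical Python sketch, `report_host` is assumed to come from the request’s Host header; any port is stripped before comparing, and `urlsplit` already lowercases the hostname it returns.

```python
from urllib.parse import urlsplit


def document_matches_host(report, report_host):
    """True if the report's document_uri is on the host that received the report."""
    try:
        doc_host = urlsplit(report.get("document_uri") or "").hostname
    except ValueError:
        return False
    # Drop any ":port" from the Host header and compare case-insensitively.
    expected = report_host.split(":", 1)[0].lower()
    return doc_host is not None and doc_host == expected
```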
Retain some useful data
We don’t want to lose useful data that came in as part of the POST request, so we tack it onto the report. Info like the user agent can be super helpful in tracking down an “Oh… that’s only happening on the iPhone” issue.
{
  "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53"
}
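Attaching request context is a small step in most frameworks. A framework-neutral Python sketch, assuming `headers` is a plain dict of request headers (real frameworks usually provide a case-insensitive mapping); `add_request_context` is a hypothetical name, and only the user agent is copied here.

```python
def add_request_context(report, headers):
    """Return a copy of the report with selected request data attached."""
    enriched = dict(report)  # leave the caller's report untouched
    user_agent = headers.get("User-Agent")
    if user_agent:
        enriched["user_agent"] = user_agent
    return enriched
```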
Throw this all in a JSON log
Once we’ve got a nice, well-formed JSON report with some helpful extras, we throw it into a log. Logs from our various datacenters are aggregated and made available as a stream for analysis tools.
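Writing each report as one compact JSON object per line keeps the log easy for tools like Logstash to parse. A Python sketch using the standard `json` and `logging` modules; the logger name is hypothetical, and it assumes logging is configured elsewhere to write INFO records to the file your aggregation picks up (without any configuration, Python drops INFO records).

```python
import json
import logging

logger = logging.getLogger("csp_reports")  # hypothetical logger name


def to_log_line(report):
    """Serialize a report as a single compact JSON line with stable key order."""
    return json.dumps(report, sort_keys=True, separators=(",", ":"))


def log_report(report):
    logger.info(to_log_line(report))
```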
Visualize, monitor, and alert for the win
The Yelp security team is a huge fan of Elasticsearch, Logstash, and Kibana (the ELK stack). Like we do with pretty much any log, we throw these CSP reports into our ELK cluster and visualize the results.
This image shows a rapid decrease in incoming CSP report volume after fixing a page that caused mixed content warnings
From there it’s easy for our engineers to view trends, drill into specific reports, and make sense of reports at scale. We’re also adding monitoring and alerting to the reports in our Elasticsearch cluster so it can let us know if report volumes rise or new issues crop up.
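The post doesn’t say how Yelp’s alerting works. As a hypothetical sketch of the two signals it mentions (rising volume and new issues), here is plain Python operating on data you might pull from Elasticsearch, such as per-minute report counts and recent normalized reports; the query itself isn’t shown. The 60-minute window, the 3x factor, and the function names are illustrative assumptions.

```python
from collections import Counter


def volume_spike(counts_per_minute, window=60, factor=3.0):
    """True if the latest minute exceeds `factor` times the mean of the prior `window` minutes."""
    if len(counts_per_minute) < 2:
        return False
    history = counts_per_minute[-window - 1:-1]
    baseline = sum(history) / len(history)
    # A floor of 1 keeps a near-zero baseline from triggering on a handful of reports.
    return counts_per_minute[-1] > factor * max(baseline, 1.0)


def new_blocked_domains(recent_reports, known_domains):
    """Count reports per blocked_uri that hasn't been seen before."""
    counts = Counter(r["blocked_uri"] for r in recent_reports if r.get("blocked_uri"))
    return {domain: n for domain, n in counts.items() if domain not in known_domains}
```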
Give it a try
We’re making sense of CSP reports at scale and that’s super useful in monitoring and increasing web application security. We’d love to hear from you about how you’re using CSP. Let us know at firstname.lastname@example.org.
It’s been a busy few years here at Yelp Engineering. With our 10th anniversary this year and our recent launch in Chile, we think it’s safe to say we’re on to something. But it would be foolish of us to stop here. At the end of the day, we’re engineers: we live for the fact that there are still so many challenging problems to solve, features to improve, and datasets to explore. With the size of the projects we’re tackling nowadays, our Engineering and Product Management teams need to be in constant contact to coordinate development, testing, and release. We fully embrace the rapid, iterative process customary in Agile, Scrum, and Extreme Programming (XP), so you’ll often see a product manager and an engineer hashing out an idea at one of our large built-in whiteboards or in the team’s pod. Soon enough, though, we found ourselves with schedules like this:
Coordination is incredibly important, but we also need time to actually build all those cool features we come up with. That’s why, about a year ago, we introduced Maker’s Day here at Yelp.
So what is Maker’s Day? The concept is pretty simple: meetings, interviews, and general interruptions aren’t allowed for engineers on Thursdays. Some teams even cancel standups on those days while others use them as a quick way to unblock folks so that there are fewer disruptions later on. If any questions come up, we use email instead of showing up at a person’s desk or pinging them over IM. Outside of those general guidelines, how engineers use Maker’s Day is really up to them: some make it into a long, uninterrupted coding period, others prefer it for reviewing designs and diving deep into a topic. And by the way, for those engineering managers out there, this applies to us, too.
We’re certainly not the first to come up with this idea. Back in 2009, in his “Maker’s Schedule, Manager’s Schedule” post, Paul Graham wrote about how the partners at Y Combinator were implementing the idea: “You can’t write or program well in units of an hour. That’s barely enough time to get started.” Craig Kerstiens of Heroku mentioned, as part of his How Heroku Works series, how the value of Maker’s Day had increased exponentially as the company grew. Intel even jumped into the discussion with hard facts from their “Quiet Time” pilot. Closer to the Python community, Daniel Greenfeld tweeted what everyone was thinking back in 2012:
So how has Maker’s Day done here at Yelp? We don’t have spreadsheets of numbers to prove its success. However, on Thursdays, you’ll notice the engineering floors are a tad quieter, and folks are eager to get to their desks and jump into whatever task they’ve lined up for that day. That’s enough for us to stick with it.
In the end, Maker’s Day was a good step, but we don’t think it’s the be-all and end-all solution. Just as with our software development, we’re constantly iterating on our processes within Engineering. If you love thinking about these kinds of problems, we’re always looking for great Engineering Managers to help grow our talented team of engineers.
At Yelp we rely heavily on pre-commit hooks to find and fix common issues before changes are submitted for code review. We run our hooks before every commit to automatically point out issues like missing semicolons, whitespace problems, and testing statements left in code. Fixing these issues automatically before posting code reviews lets our code reviewers focus on the architecture of a change instead of worrying about trivial errors.
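Individual hooks can be tiny. The pre-commit framework passes the staged filenames to a hook as arguments and fails the commit if the hook exits non-zero. Here is a hypothetical Python hook that flags trailing whitespace; it is meant to be installed as a console-script entry point, whose generated wrapper passes `main()`’s return value to `sys.exit`. The function names are illustrative, not taken from Yelp’s hooks.

```python
import sys


def find_trailing_whitespace(path):
    """Return the 1-based line numbers in `path` that end with whitespace."""
    problems = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            content = line.rstrip("\r\n")
            if content != content.rstrip():
                problems.append(lineno)
    return problems


def main(argv=None):
    """Check each file named on the command line; return 1 if any has problems."""
    paths = sys.argv[1:] if argv is None else argv
    status = 0
    for path in paths:
        for lineno in find_trailing_whitespace(path):
            print(f"{path}:{lineno}: trailing whitespace")
            status = 1
    return status
```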
As we created more libraries and projects, we recognized that sharing our pre-commit hooks across projects was painful. We copied and pasted bash scripts from project to project and had to manually adapt the hooks to each project’s structure.
We believe that you should always use the best industry-standard linters. Some of the best linters are written in languages that you don’t use in your project or don’t have installed on your machine. For example, scss-lint is a linter for SCSS written in Ruby. If you’re writing a project in Node, you should be able to use scss-lint as a pre-commit hook without adding a Gemfile to your project or figuring out how to get scss-lint installed.
Read more about pre-commit at: http://pre-commit.com