One of Yelp’s core values is “play well with others.”  So it’s no surprise that Yelp thrives with open source projects written by others, and gives back by sharing projects of our own.  That’s why I’m excited to share this post by the manager of our Infrastructure team, Oliver N. (or as he’s known around the office, “BigO”), which adds to our library of open source projects.


How do you know if your website slows down as a result of a code push?  How do you keep tabs on the performance of your most important endpoints?  How do you know if your error rates spike, or what their baselines are?  If you’re not actively using it, how do you even know your website is serving traffic?  For Yelp’s Infrastructure team, the answer is an emphatic “GRAPHS”.  Tasked with keeping the site up and running smoothly, we rely heavily on graphing the data from a variety of real-time metric systems.  We keep these graphs open on our work computers as well as splashed across large LCDs in our office, and they communicate to us the heartbeat of a system that serves approximately 78 million uniques per month.  Today we are releasing on GitHub the home-grown tool we use to navigate, explore, annotate and graph these time series metrics.  Meet Firefly:

Firefly is not a metric collection system itself.  It is a pluggable front-end for exploring time series metrics stored anywhere, combining these metrics into graphs, combining those graphs into dashboards, and sharing those dashboards with colleagues.  It’s designed to scale across datacenters and to bring together data from disparate sources.  It supports automatic graph titles and legends, as well as highly configurable graph annotations (vertical bars to record particular events). It has a stateful UI based on HTML5 pushState to allow easy sharing of dashboards.  Its server is built in Python on top of the Tornado framework, and its front-end is built in JavaScript atop the fantastic D3 graphing engine.

Firefly consists of a Data Server and a UI Server.  The Data Server can run in multiple datacenters and provides the primary data interface, responding to requests that essentially translate as “give me the time series data for these particular metrics over this particular time window.”  It is the abstraction between a unified UI and a potential myriad of backends, be they Ganglia RRDs, HyperTable, MySQL, Redis or any other data source.  Each distinct data source is described as a (surprise) DataSource subclass, which exposes the standard interface methods of list_path(path), data(sources, start, end), legend(sources), and title(sources).  The UI Server configures and serves up the HTML and JavaScript base interface, and handles the URL minification/expansion that underlies the stateful pushState features.
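To make that interface a little more concrete, here is a minimal sketch of what a DataSource subclass could look like.  Only the four method names and their arguments come from the description above.  The stand-in base class, the InMemoryDataSource name, the representation of a path as a list of strings, and the return shapes are all assumptions made for illustration, so check the Firefly source for the real contract before building on this.

```python
class DataSource:
    """Stand-in base class for illustration; Firefly's real base class may differ."""

    def list_path(self, path):
        raise NotImplementedError

    def data(self, sources, start, end):
        raise NotImplementedError

    def legend(self, sources):
        raise NotImplementedError

    def title(self, sources):
        raise NotImplementedError


class InMemoryDataSource(DataSource):
    """Hypothetical source backed by a dict of {path tuple: [(timestamp, value), ...]}."""

    def __init__(self, metrics):
        self.metrics = metrics

    def list_path(self, path):
        # Return the distinct child names one level below `path`
        # (assumed here to be a list of path components).
        depth = len(path)
        prefix = tuple(path)
        children = {
            key[depth]
            for key in self.metrics
            if len(key) > depth and key[:depth] == prefix
        }
        return sorted(children)

    def data(self, sources, start, end):
        # One list of (timestamp, value) points per requested source,
        # limited to the window start <= timestamp <= end.
        return [
            [(t, v) for (t, v) in self.metrics.get(tuple(src), []) if start <= t <= end]
            for src in sources
        ]

    def legend(self, sources):
        # Assumed convention: label each series with its last path component.
        return [src[-1] for src in sources]

    def title(self, sources):
        # Assumed convention: join the full paths into a single graph title.
        return ", ".join("/".join(src) for src in sources)
```

In this sketch, a UI would call list_path to browse the metric tree, then pass the chosen sources to data, legend, and title to draw a graph.  A real backend such as Ganglia would replace the dict lookups with reads from its own storage.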

Firefly has a ton of features, and has been super useful to us.  We’re going to keep expanding those features and are also really interested in seeing what uses the community can find for this tool.  We’re only shipping it with a DataSource configured for Ganglia right now, but adding new sources is designed to be easy, and over time we’ll be looking to release some more parts of this system.  Fork Firefly on GitHub, give it a shot and let us know what you think!
