Documentation is something that many of us in software and site reliability engineering struggle with. Even when we recognize its importance, it can still be hard to write it consistently and to write it well. While we in Yelp’s Production Engineering group are no different, over the last few quarters we’ve engaged in a concerted effort to do something about it.

One of the first steps towards changing this process was developing our documentation style guide, something that started out as a Hackathon project late last year. I spoke about it when I was giving my talk on documentation at SRECon EMEA in August, and afterwards, a number of people reached out to ask if they could have a copy.

While what we’re sharing today isn’t our exact style guide – we’ve trimmed out some of the specifics that aren’t really relevant, done a bit of rewording for a more general audience, and added some annotations – it’s essentially the one we’ve been using since the start of this year, with the caveat that it’s a living document and continues to be refined. While this may not be perfect for every team (both at Yelp and elsewhere), it’s helped us raise the bar on our own documentation and provides an example for others to follow.

So, without further ado, here’s the…

# Yelp Production Engineering Documentation Style Guide

To make sure we provide consistently good documentation that’s also easy to find, this guide lays out a number of standards to use when writing your documentation.

One of the most important parts of writing documentation is making sure it’s discoverable. This is not the same as searchable; discoverable means that someone should be able to find the document without knowing exactly what they’re looking for. For this reason, Production Engineering has decided that our Wiki will be our primary repository for documentation. Here are a few methods you can use within this framework to make your documentation more discoverable.

We also provide a more specific doc for the steps to take when migrating docs from elsewhere to our Wiki.

• Create a portal. If you’re working on a large project or collecting projects for a team, you can have a portal, like the Production Engineering Home Page, that gives easy access to the most frequently used or most important documents of that project/team. When you do this, try to make sure that the portal is no more than one page, and that you’ve arranged it so that the most essential docs are towards the top of the page (check out the inverted pyramid style).
• Group similar documents together. Let’s say you have a new service and need to create a topical guide, how-to, and runbook for it. One way of grouping these so that the connection between them is obvious is to make the how-to and runbook child documents of the topical guide. These should also include links so that it’s easy to navigate between them. In general, the best groupings are based on subject, rather than the type of document. If I’m trying to solve a problem with Puppet, for instance, and I’m looking at its runbook, the other documents I’m probably most interested in looking at next are other Puppet documents.
• Keep titles short and descriptive. If you’re writing a runbook for the “Helper” service logs, calling it “Runbook: Helper Logs” is much better than “Runbook: Helper 1” or “Runbook: Logs” (even if it’s connected to the Helper service document). Documentation doesn’t necessarily need to be dry and boring, but it should be functional, first and foremost. So, try to avoid giving it a name like “Help! What do I do with Helper Logs?” or something else where the intent isn’t clear.
• Avoid making titles too generic. If you name your page “Docker,” chances are someone else will choose a similar name, and they’ll all show up in the same search. “Docker for Production Engineering” or “Docker for Service X” is a much narrower title. Also, keep in mind that links will be organized alphabetically in the left side table of contents, so try to make your first word (or the first word after the “How-to:” or “Runbook:” prefix) specific; e.g., “How-to: Load Balancer Configuration” rather than “How-to: Configuring Load Balancers.”
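To see which word a title leads with once the type prefix is set aside, a small shell helper can be handy when reviewing a batch of titles. This is only a sketch: `first_title_word` is a hypothetical helper, not part of any wiki tooling, and it assumes the only prefixes in use are “How-to: ” and “Runbook: ”.

```shell
# Hypothetical helper: print the first word of a title after dropping
# a leading "How-to: " or "Runbook: " prefix, if one is present.
first_title_word() {
    printf '%s\n' "$1" | sed -E 's/^(How-to|Runbook): //' | cut -d' ' -f1
}

first_title_word "How-to: Load Balancer Configuration"   # prints: Load
first_title_word "How-to: Configuring Load Balancers"    # prints: Configuring
```

A title that leads with the subject (“Load”) lands next to other pages about that subject, while one that leads with a verb (“Configuring”) lands next to every other page that starts with the same verb.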

## Types of Documentation

Within Production Engineering, we have a number of different categories for documentation. Each category serves a different purpose and sometimes, a different audience. This determines what should be in the document; regardless of audience and content, all documentation should be in the Production Engineering Wiki space, if at all possible.

Topical guides provide an overview of how a system or service works, as well as why certain technical decisions were made. The goal of a topical guide is to give the reader a thorough understanding of the topic at hand. They should come away feeling confident that they understand how the system or service functions and how to further investigate any issues that may fall outside the scope of a how-to or runbook. Configuration files, important directories, Puppet modules, and other components of the service should be discussed (or at least mentioned), as well as any upstream or downstream dependencies. Any complex behaviors should be demonstrated with concrete examples of how the system would behave under various conditions. Diagrams can be helpful, but keep in mind that adding any sort of graphics adds to the load required to revise the document, so use them sparingly. In general, these should be on the Wiki (though the initial draft can be done in Google Docs, which has a better collaborative model; this should then be deleted after the final version is put into the Wiki to avoid confusion). For open-sourced projects, this documentation may be elsewhere (such as readthedocs.org).

Runbooks describe how to diagnose and remediate issues with a system or service. They should seek to address specific questions: How do I diagnose an issue with this service? What does this alert mean and how do I fix it? When organizing the runbook, follow the inverted pyramid style, placing the most common and/or important questions at the top of the page, and avoid making the runbook too long; it should be ~2-3 pages at the most. If it needs to be longer, find a way to break it up, but try to keep related topics together. Do not include long explanations for each action; a sentence or two will do. For anything longer, link to another, more comprehensive technical document. If you’re addressing specific alerts, make sure to include their names in the document so that they’re easily searchable. Specific command lines and expected outputs are good to include in these; avoid screenshots and other large graphics, which can make the document too long or disjointed. The title for all Runbooks should start with “Runbook:,” as in “Runbook: Fixing Docs.”
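Two of the runbook rules above (the “Runbook:” title prefix and including alert names verbatim) are mechanical enough to check with a few lines of shell. The following is a minimal sketch under stated assumptions: both function names are hypothetical, and it assumes the runbook text has been exported to a plain-text file.

```shell
# Hypothetical check: succeeds when the title starts with "Runbook: "
# followed by at least one more character.
has_runbook_prefix() {
    case "$1" in
        "Runbook: "?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical check: succeeds when the alert name appears verbatim in
# the runbook text. $1 is the path to a plain-text export of the
# runbook (an assumption), $2 is the alert name.
mentions_alert() {
    grep -qF -- "$2" "$1"
}

has_runbook_prefix "Runbook: Fixing Docs" && echo "title ok"
```

`mentions_alert` uses a fixed-string match (`grep -F`) on purpose, since someone searching for an alert will usually paste its exact name.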

Note that How-tos and Runbooks are both more specific versions of the “runbook” type document referred to in the SRECon talk that inspired this post.

### A Note on How-tos and Runbooks

One consideration when writing How-tos and Runbooks is that they should be seen as the first step on the route to automating these processes. Because of this, the more specifics you can include and the more explicit steps you can define, the easier it will be to automate the process.

For more on this, see Tom Limoncelli’s ACM Queue article “Manual Work Is A Bug”.

## Writing Documentation

### General Writing Tips

#### Command Lines

Especially when writing runbooks and how-tos, you should include exact command lines that people can actually use. When writing example command lines, you should be sure of two things: that they actually work and that they are benign. The first is self-explanatory; the second means that if someone copies the command line and pastes it into a terminal, it will not cause unwanted behavior. For instance, if you were writing an example of a command intended to delete a user account, you’d want to make sure your example uses a nonexistent user so that a copied-and-pasted command would not impact any real users.

When adding command lines to documents, put them in {code} blocks, like so:

$ /usr/bin/do_the_thing -o now
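For the nonexistent-user example mentioned earlier, one way to confirm that a placeholder account really is benign before publishing is to check that it does not exist on a representative host. This is a sketch, not an established procedure: the account name `docs_example_user` and the function `is_benign_user` are made up for illustration.

```shell
# Hypothetical placeholder account name used only in documentation examples.
example_user="docs_example_user"

# Succeeds when the account does NOT exist on this host, meaning an
# example command that targets it is safe to copy and paste.
is_benign_user() {
    ! id "$1" >/dev/null 2>&1
}

if is_benign_user "$example_user"; then
    echo "OK: $example_user does not exist; safe to use in examples"
else
    echo "WARNING: $example_user exists; pick a different placeholder"
fi
```

The check relies only on the standard `id` utility, which exits non-zero for an unknown user.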


Linking to other docs is one way you can keep runbooks and how-tos short. However, in order to avoid sending someone down an ever-expanding link hole, try to give a one or two sentence summary of the relevant material in the original doc so that they don’t need to reference another one. In addition, when you link to another document, try to use the title of the document as your link.

#### Using Graphics

In general, you want to use graphics sparingly. They take up a lot of space and can make your document longer than necessary. If you do use them, make sure they clearly show what you’re trying to illustrate, and include alt-text for accessibility. You could also consider providing a thumbnail or link to the graphic, which would provide a way for people to see the resource without impacting the surrounding text as much. If you create a diagram, attach its source to the document that includes it, if possible, so that it can be easily edited later on.

### Spelling, Grammar, and Language

In general, when it comes to basic spelling, grammar, and language, we follow the Yelp Brand Style Guide.

This is an internal style guide written by our Marketing department copywriters that tackles many common language issues, as well as how to use specific Yelp-branded terms. Your organization probably has one too!

#### Acronyms

When using an acronym, be sure that its first appearance is in expanded form, so that new readers understand what it means.

#### Jargon

Be very careful about using company-specific jargon; keep in mind that your documentation may be the first thing a new hire reads on the topic. If you can use more general or industry-wide language, you should. If you do use company-specific jargon, make sure its meaning is clear. (That doesn’t mean just linking to a glossary.)

## Documentation Resources

This section might not seem that important, but people really like having a place to go to learn more about things, and it can be really helpful for gathering momentum – especially having a dedicated chat channel.

Having a specific Slack or company chat channel for documentation discussion is also a helpful tool to ensure everyone understands company best practices for documentation.
