The Road To HSTS

What is HTTP Strict Transport Security?

HTTP Strict Transport Security, commonly referred to as HSTS, is a web standard that aims to ensure all web resources served from a domain are fetched over a secure transport layer. The core objective of HSTS is to protect users against passive and active network attacks. To this end, it prevents protocol downgrade attacks and blocks insecure click-throughs.

From a configuration perspective, HSTS is an easy-to-deploy HTTP response header. Its format is:

Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
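To make the three directives concrete, here is a minimal Python sketch of how a client could split the header value into its parts. It is an illustration only, not code used at Yelp; the function name parse_hsts and the dictionary keys are hypothetical choices.

```python
def parse_hsts(value):
    """Parse a Strict-Transport-Security header value into a small dict."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            # The value may be quoted, e.g. max-age="31536000".
            policy["max_age"] = int(arg.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


print(parse_hsts("max-age=31536000; includeSubDomains; preload"))
```

Here max-age is the policy lifetime in seconds (31536000 seconds is 365 days), while includeSubDomains and preload are flags with no value.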

Unfortunately, many companies that have tried to deploy HSTS have run into challenges, some of which resulted in service outages. We recently deployed HSTS successfully at Yelp and would like to share what we’ve learned so that others can enjoy a quiet deployment.

Before we dive into how we deployed HSTS, let’s first explain the different components that go into making HSTS possible and how they work.

What is Transport Layer Security?

Most of the web today is built on top of HTTPS. The security of HTTPS depends on that of the Transport Layer Security (TLS) protocol. TLS is composed of two parts, the TLS Handshake protocol and the TLS Record protocol. The objective of TLS is to provide a secure channel between two communicating parties. Specifically, it provides confidentiality, integrity and authentication.

For the purpose of this blog post, we will focus only on the authentication part of TLS — the TLS Handshake. Authentication in TLS is done by validating X509 certificates. The process of authenticating a party is twofold: 1) validate that the party’s certificate is properly signed and has a valid chain of trust; 2) ensure that the hostname of the requested server matches one of the entities on the certificate.

Here’s a closer look at these two steps:

Chain of trust validation: Certificate Authorities (CAs) "stamp" all X509 certificates they issue by recording their unique identifier under an issuer field. For security purposes, root CAs do not normally issue leaf certificates (e.g., a certificate for www.yelp.com, api.yelp.com, etc.) directly. Rather, they delegate that responsibility to "intermediate" CAs. As a consequence, leaf certificates often have certificate chains of depth 3 or higher.

TLS clients (e.g., browsers) either ship with a pre-configured certificate store or use the one provided by the underlying operating system. During the TLS handshake, TLS clients try to validate that the certificate the server provided has a valid chain of trust: it was issued by a valid intermediate CA, and is anchored at a trusted root CA. Several other checks are also performed as per RFC 5280.

Hostname validation: Identity checks in TLS are done by performing hostname validation. X509 certificates may contain both fully qualified domain names (FQDNs) and wildcard domains. User Agents follow RFC 2818 when trying to match FQDNs. Wildcard domain names are matched by following the rules outlined in RFC 5280 and RFC 6125.

If either chain of trust validation or hostname validation fails, client-side software is expected to terminate the connection. Browsers often display a warning message to the user notifying her about the specific issue encountered: e.g., expired certificate, hostname mismatch, self-signed certificate, etc. The user is then given an option to “click through”, if the risk is acceptable.
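For readers who want to see the two checks in a non-browser client, the following sketch uses Python's standard ssl module. The helper names make_strict_context and can_authenticate are hypothetical; the ssl calls are standard library. Unlike a browser, this client offers no click-through: a failed validation simply ends the connection.

```python
import socket
import ssl


def make_strict_context():
    """Build a client context that enforces both checks described above."""
    context = ssl.create_default_context()   # loads the system's trusted root CAs
    context.check_hostname = True            # hostname validation
    context.verify_mode = ssl.CERT_REQUIRED  # chain of trust validation
    return context


def can_authenticate(hostname, port=443, timeout=5.0):
    """Return True if the TLS handshake succeeds, False on a certificate error."""
    context = make_strict_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except ssl.SSLCertVerificationError:
        # Raised for an untrusted chain, an expired certificate,
        # or a hostname mismatch.
        return False
```

Calling can_authenticate with a real hostname requires network access, so no sample output is shown here.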

Unfortunately, users often don’t quite understand the technical aspects of “[website’s] certificate is not trusted by your computer’s operating systems.” As a result, users have become accustomed to clicking through security warnings and potentially exposing themselves to malicious third parties sharing the network.

HSTS closes this gap by informing complying User Agents to terminate insecure connections. To this end, security warnings become hard failures.

Deployment Plan

It doesn’t hurt to be cautious when deploying a major change on your platform that can block all of its users or hurt your platform’s SEO. Here’s the roadmap we followed to deploy HSTS at Yelp.

Support HTTPS platform wide

Platforms interested in enforcing HSTS must first support HTTPS throughout their web app. Here at Yelp, we fully migrated our platform from HTTP to HTTPS in 2016. If you haven’t done so yet, complete that migration first.

Towards Strict Transport Security

Step 1: Set but do not enforce HSTS

HSTS policies can have a variable time to live (TTL) period, expressed in seconds via the max-age directive. When complying User Agents observe an HSTS header, they store and enforce the policy until it expires.

We found that deploying HSTS with max-age=0 can help us evaluate the potential impact on the platform without causing an outage. In fact, max-age=0 is used to clear problematic HSTS policies (in case you need to revert a previously set large TTL).
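The effect of max-age on a user agent can be modeled with a toy policy cache. This is a simplified sketch of the behavior described above, not how any particular browser stores policies; the class and method names are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class HstsStore:
    """Toy model of a user agent's HSTS cache, keyed by hostname."""

    def __init__(self):
        self._expiry = {}  # hostname -> absolute expiry time, in seconds

    def observe(self, host, max_age, now):
        """Record a policy seen on a secure response at time `now`."""
        if max_age == 0:
            # max-age=0 tells the user agent to forget the policy.
            self._expiry.pop(host, None)
        else:
            self._expiry[host] = now + max_age

    def is_enforced(self, host, now):
        """Return True while a stored policy for `host` has not expired."""
        expiry = self._expiry.get(host)
        return expiry is not None and now < expiry


store = HstsStore()
store.observe("www.yelp.com", max_age=60, now=0)
print(store.is_enforced("www.yelp.com", now=30))  # True
print(store.is_enforced("www.yelp.com", now=61))  # False
```

A policy with a long TTL disappears from this cache as soon as a max-age=0 header is observed, which is what makes max-age=0 usable as a revert mechanism for clients that reach the site again.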

Once we set the header, we tailed our server logs to make sure we fully understood where it appeared. This information was very valuable: it showed us how the traffic that could be affected by the change was distributed across domains.

Yelp uses a complex routing infrastructure. As a result, setting the HSTS header in our monolith caused the header to pop up in responses to hostnames outside of yelp.<tld>’s range. For instance, we observed several internal apps that were sending single-part hostnames (e.g., “localhost”, “internal-api”, etc.) in the host header and failing to properly process the HSTS header altogether.
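A survey like this boils down to bucketing the Host values of responses that carried the header. The sketch below assumes the Host values have already been extracted from the logs (the log format itself is not described in this post), and the bucket names are hypothetical. It does not handle IPv6 literals.

```python
from collections import Counter


def classify_host(host, base_domains):
    """Bucket a Host header value relative to the platform's base domains."""
    host = host.split(":")[0].lower().rstrip(".")  # drop port and trailing dot
    if "." not in host:
        return "single-label"  # e.g. "localhost" or "internal-api"
    for base in base_domains:
        if host == base or host.endswith("." + base):
            return "platform"
    return "other"


def summarize(hosts, base_domains):
    """Count how many HSTS-bearing responses fall into each bucket."""
    return Counter(classify_host(h, base_domains) for h in hosts)


# Sample values for illustration only.
seen = ["www.yelp.com", "localhost", "internal-api:8080", "fr.yelp.be", "example.org"]
print(summarize(seen, ["yelp.com", "yelp.be"]))
```

Anything landing in the "single-label" or "other" buckets is a candidate for the kind of unexpected exposure described above.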

Step 2: Enforce HSTS using a small TTL

Sending HSTS with max-age=0 does not help identify whether or not any of the apps served off the platform’s subdomains will become inaccessible once the TTL is increased. To this end, it’s important that the first rollout of the header with a non-zero TTL uses a sufficiently small time window. In our case, we tried it out with max-age=60 for a week. No new problems were identified so we were able to move to the next stage.
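One way to make the TTL easy to change between rollout stages is to set the header in a single place. The following WSGI middleware is a minimal sketch under that assumption; it is not Yelp's implementation, and the class name is hypothetical. It only adds the header to responses served over HTTPS, since user agents ignore HSTS headers received over plain HTTP.

```python
class HstsMiddleware:
    """WSGI middleware that adds a Strict-Transport-Security header."""

    def __init__(self, app, max_age=60, include_subdomains=False):
        self.app = app
        value = "max-age=%d" % max_age
        if include_subdomains:
            value += "; includeSubDomains"
        self.header = ("Strict-Transport-Security", value)

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            if environ.get("wsgi.url_scheme") == "https":
                # Replace any existing HSTS header so only one value is sent.
                name = self.header[0].lower()
                headers = [h for h in headers if h[0].lower() != name]
                headers.append(self.header)
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)
```

Wrapping an application as HstsMiddleware(app, max_age=60) corresponds to the small-TTL stage described here; later stages would only change the constructor arguments.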

Step 3: Clear the HSTS TTL (max-age=0)

Clearing the HSTS policy between steps 2 and 4 is necessary to prevent potential outages. Together, these steps help determine whether apps mapped to subdomains of the main domain fail to operate correctly over HTTPS, and they prevent app misconfigurations (e.g., invalid certificates) from turning into a firestorm.

Step 4: Add includeSubDomains

The HSTS includeSubDomains directive informs complying User Agents to apply the policy to all subdomains of the domain where the policy was observed. For instance, if the browser sees the HSTS header with the includeSubDomains directive when visiting yelp.com, it will assume that all subdomains of yelp.com must also be accessed over HTTPS (e.g., app1.yelp.com, app2.yelp.com, etc.).
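The lookup a user agent performs can be sketched as a walk up the parent domains of the requested host. This is a simplified model with hypothetical names; policy expiry is left out to keep the example short.

```python
def is_covered(host, policies):
    """Return True if `host` must be reached over HTTPS under `policies`.

    `policies` maps a hostname to True when its policy carries
    includeSubDomains, and to False when it does not.
    """
    host = host.lower().rstrip(".")
    if host in policies:
        return True  # a policy on the exact host always applies
    labels = host.split(".")
    # Walk up the parent domains: app1.yelp.com -> yelp.com -> com
    for i in range(1, len(labels)):
        parent = ".".join(labels[i:])
        if policies.get(parent):
            return True  # a parent policy with includeSubDomains applies
    return False


policies = {"yelp.com": True}
print(is_covered("app1.yelp.com", policies))  # True
print(is_covered("yelp.be", policies))        # False
```

Note that a policy observed on yelp.com says nothing about yelp.be: each base domain needs its own policy.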

Unfortunately, most web apps nowadays are accessed from hosts such as www.<platform>.<tld>. Users often sign up on signup.<platform>.<tld> and login from login.<platform>.<tld> and never land on the platform’s base domain (<platform>.<tld>) in their daily usage of the web app. Consequently, the includeSubDomains directive cannot live up to its full potential without a little bit of extra help.

The extra help comes in the form of a tracking pixel. Tracking pixels are often used by third parties to track users on the Internet. In the context of HSTS, a tracking pixel can be used to bootstrap the HSTS policy. When strategically placed, a tracking pixel can ensure that all visitors of the platform make a request to the platform’s base domain (on the same TLD as the visited domain) and thus get the HSTS policy for all subdomains of the platform.
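As a sketch of the idea, the helper below builds an image tag that points at the base domain matching the visited host. The pixel path /hsts_pixel.gif and both function names are hypothetical, and the list of base domains is supplied by the caller rather than derived from the hostname.

```python
def base_domain_for(host, base_domains):
    """Return the known base domain that `host` belongs to, or None."""
    host = host.lower().rstrip(".")
    for base in base_domains:
        if host == base or host.endswith("." + base):
            return base
    return None


def hsts_pixel_tag(host, base_domains, path="/hsts_pixel.gif"):
    """Build an <img> tag that fetches a tiny image from the base domain."""
    base = base_domain_for(host, base_domains)
    if base is None:
        return ""  # unknown host: emit nothing rather than guess
    return '<img src="https://%s%s" width="1" height="1" alt="">' % (base, path)


print(hsts_pixel_tag("fr.yelp.be", ["yelp.com", "yelp.be"]))
```

For the pixel to bootstrap the policy, the response from the base domain has to be served over HTTPS and carry the HSTS header with includeSubDomains.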

Step 5: Enforce HSTS with includeSubDomains using a small TTL

Similar to step 2, we deployed HSTS using a small TTL. The difference between this step and step 2 is the presence of includeSubDomains. The objective here is the same: if there is an issue with our HSTS deployment, we can easily revert.

This configuration helped us uncover a few issues we discuss later on in this blog post. Fixing these issues was the most time-consuming part of this project.

Step 6: Enforce HSTS with includeSubDomains using a large TTL

Once we ensured that HSTS with includeSubDomains and a small max-age wouldn’t break our apps, we increased the max-age: first to one week, then to one month. We then followed up with a six-month policy and ultimately set the expiration time to one year.
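The ramp-up translates into the following header values. The day counts used for a month and for six months are assumptions made for this sketch, since the post does not give the exact number of seconds used at each stage; the function name is hypothetical.

```python
from datetime import timedelta


def hsts_value(ttl, include_subdomains=True, preload=False):
    """Render a header value for a TTL given as a timedelta."""
    parts = ["max-age=%d" % int(ttl.total_seconds())]
    if include_subdomains:
        parts.append("includeSubDomains")
    if preload:
        parts.append("preload")
    return "; ".join(parts)


# Assumed conversions: a month as 30 days, six months as 182 days,
# a year as 365 days.
RAMP = [
    ("one week", timedelta(weeks=1)),
    ("one month", timedelta(days=30)),
    ("six months", timedelta(days=182)),
    ("one year", timedelta(days=365)),
]

for label, ttl in RAMP:
    print("%-10s %s" % (label, hsts_value(ttl)))
```

The final stage, 365 days, yields max-age=31536000, the value shown in the header at the top of this post.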

Step 7: Express intention to ‘preload’

Most modern browsers ship with a preconfigured list of known HSTS hosts. Websites request inclusion in the list by sending a preload directive with their HSTS policy.

Setting up the preload directive in the HSTS header took the least amount of time. We appended the directive to our header and confirmed in our logs that all domains we wanted to cover were indeed covered. No breakages were observed.
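A quick self-check of the header before submission can catch missing directives. The sketch below assumes the preload list expects a max-age of at least one year together with includeSubDomains and preload; treat the threshold as an assumption and confirm it against the list's current submission requirements. The function name is hypothetical.

```python
import re

ONE_YEAR = 31536000  # assumed minimum max-age; check the list's current rules


def preload_problems(value, min_max_age=ONE_YEAR):
    """Return a list of reasons the header value would not qualify."""
    directives = [d.strip().lower() for d in value.split(";") if d.strip()]
    problems = []
    max_age = None
    for d in directives:
        m = re.fullmatch(r'max-age\s*=\s*"?(\d+)"?', d)
        if m:
            max_age = int(m.group(1))
    if max_age is None:
        problems.append("missing max-age")
    elif max_age < min_max_age:
        problems.append("max-age too small")
    if "includesubdomains" not in directives:
        problems.append("missing includeSubDomains")
    if "preload" not in directives:
        problems.append("missing preload")
    return problems


print(preload_problems("max-age=31536000; includeSubDomains; preload"))  # []
```

An empty list means the header itself looks ready; it says nothing about the other submission checks, such as redirects.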

Step 8: Submit to the preload list

The last step in fully deploying HSTS platform-wide was to submit our base domains, across all our TLDs, to the HSTS preload list. Unfortunately, we were in for an unpleasant surprise: base domains cannot be submitted to the HSTS preload list if they have more than 3 redirects. A few of our international domains needed a touch-up to qualify for the preload list: we shortened their redirect chains by optimizing our geocoding methodology.
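Redirect chains are easy to measure before submitting. The sketch below counts hops using a caller-supplied fetch function, a hypothetical stand-in for a real HTTP client that returns the status code and Location header without following redirects. The yelp.example URLs are made up for illustration.

```python
from urllib.parse import urljoin


def count_redirects(url, fetch, limit=10):
    """Follow redirects from `url` and return (hops, final_url).

    `fetch` is a caller-supplied function mapping a URL to a
    (status_code, location_or_None) pair.
    """
    hops = 0
    while hops < limit:
        status, location = fetch(url)
        if status not in (301, 302, 303, 307, 308) or not location:
            return hops, url
        url = urljoin(url, location)  # resolve relative Location values
        hops += 1
    raise RuntimeError("too many redirects")


# Hypothetical chain, for illustration only.
chain = {
    "http://yelp.example/": (301, "https://yelp.example/"),
    "https://yelp.example/": (302, "https://www.yelp.example/"),
    "https://www.yelp.example/": (200, None),
}
print(count_redirects("http://yelp.example/", chain.__getitem__))
```

Running a check like this against every base domain highlights the chains that need shortening.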

Major Speed Bumps

Many deployments come with a few gotchas, and the deployment of HSTS is no exception. Just ask the companies that were too quick to deploy it and realized that they had to revert. Some even got as far as being included in the HSTS preload list before they realized that they had a problem.

Next, we’ll discuss a few major problems we had to overcome at different stages of the HSTS deployment process. Some of these issues were due to bugs left unnoticed for years; others required new development.

Vanity subdomains: [<lang>.]vanity.yelp.tld

Yelpers can reserve their very own, custom subdomain on the Yelp platform. These subdomains can then be used as shortcuts to users’ profile pages.

Vanity subdomains follow the pattern: http://<vanity>.yelp.<tld>, where <vanity> is the user-chosen prefix and <tld> is determined by the country Yelp is browsed from.

Some countries have multiple official languages. Yelp provides multi-language support in these countries by prepending the language short code to the hostname. For instance, fr.yelp.be is used to serve our French-speaking users in Belgium, whereas en.yelp.be is used for our English-speaking user base there.

When a user navigates to a vanity subdomain, Yelp determines the user’s language preference based on a preset language cookie. The language short code is then prepended to the vanity subdomain, and a redirect is issued. That is, <vanity>.yelp.<tld> is redirected to <lang>.<vanity>.yelp.<tld>. The latter then redirects to the user’s profile page: [<lang>|www].yelp.<tld>/user_details?userid=<user_id>.

Vanity subdomains were introduced on the Yelp platform in 2012. Back then, all pages, other than the login, signup and password reset ones were served over HTTP. The design of this feature seemed reasonable and was left untouched for years.

Once we deployed HSTS, we saw a slight drop in traffic from international vanity subdomains. Concurrently, we got a report via our bug bounty program notifying us that some international vanity subdomains were non-navigable. We quickly dug into the problem and determined that international vanity subdomains with a language short code prepended to the hostname could not pass certificate validation.

Recall how hostname validation works: hostnames must either exactly match a whitelisted entry or loosely match a wildcard one. For example, if the user is trying to access <vanity>.yelp.com, then either <vanity>.yelp.com or *.yelp.com needs to be on the X509 certificate for the certificate validation to be successful.

Note that neither RFC 2818, RFC 5280, RFC 6125, nor browsers allow double-nested wildcards (e.g., *.*.yelp.com is not supported).
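The matching rule can be illustrated with a simplified checker. It is a sketch that only honors a wildcard as the entire left-most label, standing for exactly one label; real validators apply additional rules, and partial wildcards such as f*.yelp.be are deliberately not supported here. The function name is hypothetical, and "vanity" stands in for a user-chosen prefix.

```python
def matches(hostname, pattern):
    """Simplified check of a hostname against one certificate name.

    A wildcard is honored only as the entire left-most label and stands
    for exactly one label.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pattern_labels):
        return False  # "*" never spans more than one label
    first, rest = pattern_labels[0], pattern_labels[1:]
    if "*" in "".join(rest):
        return False  # wildcards outside the left-most label are rejected
    if first != "*" and first != host_labels[0]:
        return False
    return rest == host_labels[1:]


print(matches("vanity.yelp.be", "*.yelp.be"))       # True
print(matches("fr.vanity.yelp.be", "*.yelp.be"))    # False
print(matches("fr.vanity.yelp.be", "*.*.yelp.be"))  # False
```

The second call is the failing case from the vanity redirect chain: the extra language label puts the hostname one level deeper than the wildcard can reach.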

Consequently, the redirect chains for international vanity subdomains had to change. To this end, we could no longer prepend the language short code to the hostname and redirects had to go from <vanity>.yelp.<tld> straight to [<lang>|www].yelp.<tld>/user_details?userid=<user_id>.

We fixed the redirect chains by reordering the redirects as described above. The change was completely transparent to the users. After the change, all vanity subdomains could be served over HTTPS.

Datacenter hostnames

We have a set of reserved subdomains on which we serve content directly from a given datacenter. These domains aren’t meant for public access but are used for debugging. We also change our datacenters’ hostnames as our infrastructure evolves. Consequently, we do not list our datacenters’ hostnames on our prod TLS certificates.

Prior to enabling HSTS, devs used to either navigate to our datacenters’ hostnames over HTTP or accept the certificate warning and access the hostnames over HTTPS. HSTS removed the former and made the latter impossible.

To solve this problem, we could either add all our datacenters’ hostnames to Yelp’s TLS certificates or serve a completely different certificate on these domains. All of these domains are only used internally and never meant to serve external traffic so we opted for serving a certificate signed by our internal CA.

Dev Playgrounds

Engineers build new features in local playgrounds — these are dev “copies” of Yelp that run on subdomains based off the main yelp domain.

Dev playgrounds are not routable over the Internet, nor do they contain any real user data. As a result, we used to serve dev playground traffic over HTTP. The HSTS deployment made that impossible. We had to make all playground traffic fully HTTPS compatible.

To resolve this issue, we started signing dev playground certificates using our internal CA. Certificates were then issued each time a developer ran the “make” command to prep their playground.

1-Offs

Historically, Yelp supported several custom subdomains such as mail.app.yelp.com and calendar.app.yelp.com. Yelp IT used those to create and preconfigure bookmarks to commonly used apps, e.g., Mail, Calendar, etc. Providers could then be changed transparently. These bookmarks were part of our legacy infrastructure. They were preconfigured to use HTTP.

When users click on any of the custom bookmarks, Yelp issues a 301 redirect to the service provider; the browser follows the redirect and the user lands on the desired page.

HSTS changed this.

The core objective of HSTS is to prevent protocol downgrade attacks. That is, if the HSTS header with the includeSubDomains directive is served on the base domain (e.g., yelp.com), then content from all subdomains must be fetched over the HTTPS protocol. Furthermore, certificate validation must be successful for the connection to go through.

Browsers validate the certificate they receive against the hostname the user tries to access. In the example here, the browser tries to match mail.app.yelp.com against either the common name or one of the subject alternative names listed on the X509 certificate it receives at connection establishment time. Yelp does not have {mail|calendar}.app.yelp.com on the X509 certificate, and thus no match can be found. The browser rejects the connection and blocks the user from accessing the mail/calendar/etc. app.

The possible solutions to the problem here were: (1) add the troubled hostnames to our X509 certificate, (2) serve a certificate signed by our internal CA for all *.app.yelp.com, or (3) remove all legacy bookmarks.

We went with the last option: we removed all legacy bookmarks from employees’ machines. For those who wanted to have a bookmark they can click on and go to common apps, such as Gmail and Google Calendar, we set up direct links to those apps.

Browser Quirks

While testing our HSTS deployment, we observed a quirk in Firefox: if the browser is set to “Never remember history”, its history settings are updated automatically.

Surprisingly, the “Clear history when Firefox closes” option is not checked on the resulting settings screen. Yet, HSTS pins are cleared when the browser is closed.

Here is the sequence of steps we followed (last tested on Firefox 54.0.1):

1. User sets “Never remember history” in Firefox.
2. HSTS is deployed on yelp.com. includeSubDomains is set.
3. User navigates to yelp.com (browser now knows that HSTS should be enforced on any subdomain of yelp.com).
4. User tries to visit http://mail.app.yelp.com.
4a. Browser rewrites the URL to https://mail.app.yelp.com (due to HSTS).
4b. Browser fails to validate the certificate and issues an HSTS error.
4c. User cannot access mail.app.yelp.com.
5. User closes the browser.
5a. Browser clears the HSTS directive.

Later on:

1. User opens the browser and tries to access http://mail.app.yelp.com.
1a. Browser doesn’t remember the HSTS directive; access is successful.

While this behavior is not necessarily a bug, it’s a bit unexpected. So keep this in mind while testing HSTS deployments.

Catch-22: Certificate Exceptions

Browsers issue TLS errors whenever they encounter a certificate that cannot be validated for the website the user is trying to access. Without HSTS, users can bypass these errors temporarily or permanently. Temporary exceptions are honored for the duration of the session. Permanent exceptions are honored until they are manually unset.

TLS error bypasses make users vulnerable to MITM attacks. To prevent such attacks, HSTS removes the TLS error bypass option: certificate errors are treated as hard failures.

HSTS does, however, respect pre-approved certificate exceptions. For instance, if a user navigates to a website that serves a bad certificate and accepts the TLS warning before HSTS is deployed, then the browser will keep accepting that certificate with or without HSTS. In other words, users who are under constant MITM cannot be protected by rolling out HSTS. This is known as the bootstrap MITM vulnerability.

Conclusion

Deploying HSTS at Yelp was a fun exercise. We got to learn a lot about what’s under the hood, we identified and fixed several bugs, and removed code debt in several features along the way. We hope that by sharing our knowledge we will make everyone else’s HSTS deployment go smoothly.

Here’s to a more secure Yelp!
