The Great HTTPS Migration
Andrew E., Technical Lead
- Sep 15, 2016
Yelp is now entirely on HTTPS! While several pages have been secured for quite some time (we began securing pages with sensitive information like passwords, credit card numbers, and even the reviews you submit, in 2008), we’ve finally made the transition to using TLS across the entirety of our website. To some, this will sound like quite the accomplishment while others may wonder why it took until mid-2016 to complete this migration.
Brief History of HTTPS
Netscape created SSL in 1994, and by 2000 TLS had become the default encryption protocol and the modern HTTPS spec had been created. But it wasn’t until around 2010 that HTTPS began to gain traction on more than just login and payment pages. Facebook, for example, implemented HTTPS as an option in 2011, and by mid-2013 it was the default setting. Google added HTTPS to search as early as 2010 and continued adding it to their other services over the following years. In 2014 Google announced that HTTPS would be a minor factor in their search ranking algorithm.
Advantages to Securing all Pages
Yelp has always cared deeply about its users’ security. A core value at Yelp, encompassing more than just web security and borrowed from journalism, is “protect the source.” By serving all content over HTTPS, we can ensure that wherever and whenever action is taken on Yelp, it’s being done in the context of a TLS-secured environment.
Additionally, with all pages on HTTPS, users would be able to log in or submit reviews from any page, not just from dedicated, secured pages for those specific actions. And since HTTPS sites don’t send referrers to HTTP pages by default, we’d actually see more of our referrers by transitioning. Even with these advantages, there were a few things that prevented us from rolling out HTTPS across the entirety of the site.
While we knew the number of requests would increase, we didn’t know to what extent or for how long. Because we wanted to monitor any changes caused by the migration and because of the way our rollout toggle was set up (more on that below), we actually redirected at the server level where we could have more detailed logging, instead of at the load balancer. As a result, even increasing 301s could have a significant impact on operations load.
The impact on traffic from search engines when suddenly issuing 301s for almost every URL in their indexes was also unclear. While Google now has official guidance that supports full site or per-URL transitions, it was unclear at the time what rollout strategies were supported by search engines.
Despite the fact that we already supported TLS on a technical level, rolling it out globally would still take a tremendous amount of developer effort. We had no tests written for the transition and we would need to ensure all XHR requests, internal links, assets, metadata and more were all fully transitioned at the same time as well. Complicating matters further, issuing permanent redirects made it virtually impossible to roll back the changes if something went wrong.
The only real technical blocker, however, was our use of display advertisements. They didn’t universally support HTTPS, so securing all pages would have had a meaningful impact on revenue. In late 2015, however, we ended our display advertising business, removing this barrier entirely, and so it became time to take the leap and make the transition.
What It Took
We decided to roll out HTTPS to logged-in users first, since doing so wouldn’t affect search engine crawling. After ensuring we hadn’t overlooked any mixed content or HTTP endpoints, we’d continue on a per-TLD basis to avoid any potential search engine issues with having a domain sharded across the two protocols. We chose to start in Canada (a good proxy for the US market, but on a smaller scale) and continued from there once we had more information on the impact.
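The rollout order described above (logged-in users first, then one TLD at a time) can be sketched as a small decision function. This is only an illustration under stated assumptions, not our actual toggle: `RolloutConfig`, `https_redirect_target`, and the example hostnames are hypothetical, and taking the last label of the hostname as the TLD is a simplification.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class RolloutConfig:
    """Hypothetical toggle state; names and defaults are illustrative."""
    logged_in_enabled: bool = True                  # stage 1: logged-in users
    enabled_tlds: frozenset = frozenset({"ca"})     # stage 2: flipped TLDs


def https_redirect_target(url: str, logged_in: bool,
                          config: RolloutConfig) -> Optional[str]:
    """Return the HTTPS URL to 301 to, or None to serve the request as-is."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None  # already HTTPS (or not a web URL): nothing to do

    host = parts.hostname or ""
    tld = host.rsplit(".", 1)[-1]  # simplification: last label only

    user_in_rollout = logged_in and config.logged_in_enabled
    tld_in_rollout = tld in config.enabled_tlds
    if not (user_in_rollout or tld_in_rollout):
        return None

    # Rebuild the same URL on HTTPS; the explicit port, if any, is dropped.
    return urlunsplit(("https", host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    cfg = RolloutConfig()
    print(https_redirect_target("http://www.example.ca/biz/x", False, cfg))
    print(https_redirect_target("http://www.example.com/biz/x", False, cfg))
```

Keeping the decision in application code rather than at the load balancer is what makes it possible to log each redirect alongside the user state and TLD that triggered it.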
To be sure we had converted everything correctly, we decided to take a test-driven development approach. We wrote tests to catch, among other things, any mixed content warnings or HTTP links on the page. These tests then allowed us to analyze failures to determine what still needed to be converted. We then migrated page by page, shipping the new code to production behind a toggle that served the HTTPS pages only to logged-in users.
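A test of this kind can be approximated with a small HTML scan that flags any `http://` URL in link, asset, or form attributes. The sketch below is an assumption about what such a check might look like, not the tests we actually wrote; `InsecureUrlFinder`, `find_insecure_urls`, and the attribute list are hypothetical, and a real check would also need to cover CSS, `srcset`, and URLs built in JavaScript.

```python
from html.parser import HTMLParser

# Attributes that commonly carry a URL; illustrative, not exhaustive.
URL_ATTRS = {"href", "src", "action"}


class InsecureUrlFinder(HTMLParser):
    """Collects (tag, attribute, url) for every absolute http:// URL."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags via handle_startendtag.
        for name, value in attrs:
            if name in URL_ATTRS and value:
                if value.strip().lower().startswith("http://"):
                    self.insecure.append((tag, name, value))


def find_insecure_urls(html: str):
    """Return all insecure URL references found in the given markup."""
    finder = InsecureUrlFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure


if __name__ == "__main__":
    page = (
        '<a href="http://example.com/a">link</a>'
        '<img src="https://example.com/i.png">'
        '<form action="http://example.com/post"></form>'
    )
    for tag, attr, url in find_insecure_urls(page):
        print(f"insecure {attr} on <{tag}>: {url}")
```

Run against each rendered page in a test suite, an empty result means the page is ready to be served over HTTPS; a non-empty one is the list of things left to convert.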
It was important to make sure that changes were roll-forward safe. When the changes were deployed, several users still had stale pages (with HTTP forms and links), so the new form post and XHR endpoints needed to support HTTP URLs for a period of time after the flip. Rolling out to logged-in users first allowed us to catch the few mixed content endpoints that our tests missed before beginning the bigger rollout to all our users.
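One way to picture roll-forward safety is as a rule for what to do with a request that still arrives over HTTP after the flip. The sketch below is a guess at such a rule, not our production logic: `action_for_http_request` and the `grace_period_active` flag are hypothetical. The reasoning behind it is that many clients turn a redirected POST into a GET and drop the form body, so during the grace period only plain GET and HEAD navigations are redirected.

```python
# Methods that can be answered with a 301 without losing a request body.
REDIRECTABLE_METHODS = {"GET", "HEAD"}


def action_for_http_request(method: str, is_xhr: bool,
                            grace_period_active: bool) -> str:
    """Decide how to treat a request that arrived over plain HTTP.

    Returns "redirect" (301 to the HTTPS URL) or "serve" (handle over HTTP).
    """
    if not grace_period_active:
        # Assumption: once stale pages have aged out, redirect everything.
        return "redirect"
    if is_xhr or method.upper() not in REDIRECTABLE_METHODS:
        # Form posts and XHR calls from stale pages keep working over HTTP.
        return "serve"
    return "redirect"


if __name__ == "__main__":
    print(action_for_http_request("GET", False, True))   # redirect
    print(action_for_http_request("POST", False, True))  # serve
    print(action_for_http_request("GET", True, True))    # serve
```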
Flipping Canada in March
At Yelp, we have a variety of tools to monitor both user traffic and search engine crawl in real time. Within moments of enabling HTTPS for logged-out users, we saw many of the changes we expected. Overall requests to the site increased 50% (and more than doubled for non-XHR requests). Some of that increase was from our users coming into the site with 301s, and a lot of it was Googlebot and Applebot immediately increasing their crawl rates. Crawl on HTTPS began only a minute after the flip, and for weeks we saw meaningful crawl to both HTTP and HTTPS URLs (graph 2 below shows this for the US, which had similar crawl patterns). But the most interesting thing we learned was how long these changes lasted. It actually took about two weeks for bot traffic to peak. Google also updated its index slowly, with about half of our incoming traffic from Google still being directed to HTTP a month after the flip.
Still, the transition had been a success. We saw no hit to our traffic from search engines, no mixed content warnings, and everything seemed to be pointing in a positive direction. The only real change in our initial plan was the amount of time between country rollouts, as the increased traffic took a little longer to reach equilibrium than we expected. It took us four weeks before we felt confident in migrating the next set of countries.
Flipping the US in August
By June we had completed the rollout to all countries except the US, and the results had been more or less the same as with Canada. Because the US constitutes the majority of our traffic, we wanted to know how long the increased server load would last, since search engines would be crawling every link, resulting in millions of 301 redirects. As a result, we waited a couple of months, until bot crawl stabilized and users from Google were largely being directed to HTTPS pages receiving 200s, before launching.
When we did make the change, we were pleasantly surprised by the results. While Google was crawling us a bit more than they had when we rolled out the other TLDs, they were also updating their index far faster. Within only a few days, the majority of users were being referred straight to HTTPS (graph 1 below), and we had only a fraction of the 301s we’d seen in the previous stages.
Most importantly, we saw almost no change in search engine traffic whatsoever. It’s still early, and we will be monitoring this closely in the future, but search engines did not seem to either penalize or reward us at all for this mass migration. This is obviously good news for any site that receives lots of traffic from search engines and is considering migrating. This also means that if you’re tempted to make the change solely because Google announced that they take it into account in their search algorithm, it would probably be wise to temper your expectations a bit at this point in time.
Armed with this knowledge, Google’s guidance on migrating per-URL, and with overall requests increasing only about 50%, we probably could have skipped the toggle and tempered rollout and instead migrated the entire site either all at once, or as we completed work.
We did have some fallout from the migration as well. Due to the way we had set up our page cache headers, most of our HTTP pages were being cached by browsers whereas our HTTPS pages were not. We overlooked this at first, and saw some discrepancies in some internal metrics as a result. Still, outside of a few alerts being set off, there wasn’t really anything wrong.
Additionally, since we were now HTTPS, we were no longer sending referrers to HTTP pages by default, and needed to implement a new way to send information to some of our businesses and partners to let them know what traffic was referred to them by Yelp (this can be done easily with a meta referrer tag).
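As a rough illustration of the meta referrer tag, the helper below renders one for a page template. Which policy value to use is an assumption here (the text above does not say which one we chose), and `referrer_meta_tag` is a hypothetical name. The three values listed are the Referrer Policy tokens that still send some referrer when navigating from HTTPS to HTTP; `origin` sends only the scheme and host, while `unsafe-url` sends the full URL.

```python
# Referrer Policy tokens that still send a referrer on HTTPS -> HTTP
# navigation. Other tokens (for example "no-referrer-when-downgrade")
# would suppress it, which is the behavior we wanted to avoid.
DOWNGRADE_SENDING_POLICIES = {
    "origin",                   # send only https://host/
    "origin-when-cross-origin", # full URL same-origin, origin otherwise
    "unsafe-url",               # always send the full URL
}


def referrer_meta_tag(policy: str = "origin") -> str:
    """Render a <meta name="referrer"> tag for the page <head>."""
    if policy not in DOWNGRADE_SENDING_POLICIES:
        raise ValueError(f"policy does not send referrers on downgrade: {policy}")
    return f'<meta name="referrer" content="{policy}">'


if __name__ == "__main__":
    print(referrer_meta_tag())
```

With `origin`, a business site still on HTTP would see that a visit came from our domain without learning which page the user was on.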
Nevertheless, we’re happy with the result, and proud to say that we are fully HTTPS at Yelp!