Organizing and Securing Third-Party CDN Assets at Yelp
Rishabh Rao, Software Engineer
Nov 20, 2019
At Yelp, we use a service-oriented architecture to serve our web pages. This consists of many frontend services, each of which is responsible for serving different pages (e.g., the search page or a business listing page).
In these frontend services, we use several third-party JavaScript/CSS assets (React, Babel polyfill, etc.) to render our web pages. We chose to serve such assets using a third-party Content Delivery Network (CDN) for better performance.
In the past, if a frontend service needed to use a third-party JavaScript/CSS asset, engineers had to hard-code its CDN URL. For example:
<script
src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.8.3/jquery.min.js"
></script>
With hundreds of engineers working at Yelp, it was difficult to ensure the following (for each third-party asset):
- <script> or <link> tags had a subresource integrity checksum via the integrity attribute (see the section on Subresource Integrity Checksums below)
- URLs used the HTTPS protocol
- Only public CDN providers (approved by our security team) were used
- Engineers could update to the latest versions easily
Organizing our Third-Party Assets
Here at Yelp, we’ve built our frontend services using a Python service stack, with Pyramid as our web framework and uWSGI as our web server.
We created a shared Python package, cdn_assets, for storing the URLs and subresource integrity checksums of our third-party JavaScript/CSS assets.
For each asset, we simply used a Python dictionary with the asset’s semantic version as the key. For example:
# React (facebook.github.io/react)
CDN_SCRIPT_REACT = {
    '16.8.6': CDNAsset.construct_asset(
        cdn=CDNDomain.CDNJS,
        library='react',
        version='16.8.6',
        filename='umd/react.production.min',
        filename_unminified='umd/react.development',
        extension='js',
        integrity='sha384-qn+ML/QkkJxqn4LLs1zjaKxlTg2Bl/6yU/xBTJAgxkmNGc6kMZyeskAG0a7eJBR1',
        integrity_unminified='sha384-u6DTDagyAFm2JKvgGBO8jWd9YzrDzg6FuBPKWkKIg0/GVA6HM9UkSxH2rzxEJ5GF',
    ),
    '16.8.5': CDNAsset.construct_asset(
        # … similar properties for this version
    ),
    # … more versions…
}

# Babel Polyfill (babeljs.io/docs/usage/polyfill)
CDN_SCRIPT_BABEL_POLYFILL = {
    '6.23.0': CDNAsset.construct_asset(
        cdn=CDNDomain.CDNJS,
        library='babel-polyfill',
        version='6.23.0',
        filename='polyfill.min',
        filename_unminified='polyfill',
        extension='js',
        integrity='sha384-FbHUaR69a828hqWjPw4PFllFj1bvveKOTWORGkyosCw720HXy/56+2hSuQDaogMb',
        integrity_unminified='sha384-4L0QKU4TUZXBNNRtCIbt9G73L2fXYHnzgCjL65qwFxsXPvuAf1aB6D3X+LIflqu3',
    ),
    # … more versions…
}

# … more assets…
Usage
Here’s a Python code snippet which shows how the asset is included in our Yelp-Cheetah templates:
CDN_SCRIPT_REACT['16.8.6'].generate_script_tag(minified=True)
# returns <script src="https://cdnjs.cloudflare.com/ajax/libs/react/16.8.6/umd/react.production.min.js" integrity="sha384-qn+ML/QkkJxqn4LLs1zjaKxlTg2Bl/6yU/xBTJAgxkmNGc6kMZyeskAG0a7eJBR1" crossorigin="anonymous"></script>
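The internals of cdn_assets are not shown in this post. As a rough sketch only, a class exposing the same construct_asset and generate_script_tag interface might look like the following. The CDNDomain base URL, the URL layout, and the dataclass fields are assumptions inferred from the cdnjs URL and the generated tag above; they are not Yelp's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum
from html import escape


class CDNDomain(Enum):
    # Assumed base URL, inferred from the cdnjs URLs that appear in the post.
    CDNJS = 'https://cdnjs.cloudflare.com/ajax/libs'


@dataclass(frozen=True)
class CDNAsset:
    """Hypothetical container for one version of one third-party asset."""
    url: str
    url_unminified: str
    integrity: str
    integrity_unminified: str

    @classmethod
    def construct_asset(cls, cdn, library, version, filename,
                        filename_unminified, extension,
                        integrity, integrity_unminified):
        # Assumed URL layout: <cdn base>/<library>/<version>/<filename>.<extension>
        base = '{}/{}/{}'.format(cdn.value, library, version)
        return cls(
            url='{}/{}.{}'.format(base, filename, extension),
            url_unminified='{}/{}.{}'.format(base, filename_unminified, extension),
            integrity=integrity,
            integrity_unminified=integrity_unminified,
        )

    def generate_script_tag(self, minified=True):
        # Pick the URL and checksum pair that belong together.
        url = self.url if minified else self.url_unminified
        integrity = self.integrity if minified else self.integrity_unminified
        return '<script src="{}" integrity="{}" crossorigin="anonymous"></script>'.format(
            escape(url, quote=True), escape(integrity, quote=True))


react = CDNAsset.construct_asset(
    cdn=CDNDomain.CDNJS,
    library='react',
    version='16.8.6',
    filename='umd/react.production.min',
    filename_unminified='umd/react.development',
    extension='js',
    integrity='sha384-qn+ML/QkkJxqn4LLs1zjaKxlTg2Bl/6yU/xBTJAgxkmNGc6kMZyeskAG0a7eJBR1',
    integrity_unminified='sha384-u6DTDagyAFm2JKvgGBO8jWd9YzrDzg6FuBPKWkKIg0/GVA6HM9UkSxH2rzxEJ5GF',
)
print(react.generate_script_tag(minified=True))
```

Keeping each URL next to its checksum in one immutable object means a consumer cannot accidentally pair the minified URL with the unminified hash.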
Scaffolding Infrastructure
To facilitate ease of use and maintenance, we developed some scaffolding infrastructure to:
- Define public CDN providers (e.g., Cloudflare CDNJS, Google CDN, etc.)
- Render minified scripts & styles in the production environment and unminified scripts & styles in the development environment
- Create a helpful generate_script_tag method, which allows consumers of this package to easily generate an HTML <script> tag with the correct subresource integrity SHA (see the section on Comparing Cryptographic Hash Functions below)
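As an illustration of the first point, restricting assets to approved providers can be as simple as an allowlist of hostnames. The sketch below is hypothetical: the set of hosts and the function name are assumptions, since the providers actually approved at Yelp are not listed in this post.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real list of approved providers is not shown in the post.
APPROVED_CDN_HOSTS = frozenset({
    'cdnjs.cloudflare.com',  # Cloudflare CDNJS
    'ajax.googleapis.com',   # Google Hosted Libraries
})


def is_approved_cdn_url(url):
    """Return True only for HTTPS URLs served from an approved CDN host."""
    parts = urlsplit(url)
    return parts.scheme == 'https' and parts.hostname in APPROVED_CDN_HOSTS


print(is_approved_cdn_url('https://cdnjs.cloudflare.com/ajax/libs/react/16.8.6/umd/react.production.min.js'))
print(is_approved_cdn_url('https://cdn.example.com/react.min.js'))
```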
We made it easy for engineers to add a new version by creating a make target to calculate the integrity checksum, like so:
# Usage: make sri-hash --urls="URL1[ URL2 ... URLn]"
$ make sri-hash --urls="https://cdnjs.cloudflare.com/ajax/libs/react/16.8.6/umd/react.production.min.js"
sha384-qn+ML/QkkJxqn4LLs1zjaKxlTg2Bl/6yU/xBTJAgxkmNGc6kMZyeskAG0a7eJBR1
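The script behind this make target is not shown. A minimal sketch of the calculation it performs, assuming only the Python standard library, could look like this; the function names sri_hash and sri_hash_for_url are hypothetical.

```python
import base64
import hashlib
from urllib.request import urlopen


def sri_hash(content, algorithm='sha384'):
    """Return an integrity value such as 'sha384-<base64 digest>' for raw bytes."""
    digest = hashlib.new(algorithm, content).digest()
    return '{}-{}'.format(algorithm, base64.b64encode(digest).decode('ascii'))


def sri_hash_for_url(url, algorithm='sha384'):
    """Download the asset and hash exactly the bytes the CDN serves."""
    with urlopen(url) as response:
        return sri_hash(response.read(), algorithm)


# Hashing local bytes needs no network access.
print(sri_hash(b'console.log("hello");'))
```

The digest is base64-encoded rather than hex-encoded because that is the form the integrity attribute expects.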
Testing
We wrote tests that iterate over every version of every asset to ensure that:
- URLs point to a valid asset on the CDN
- Integrity SHA checksums are correct
- URLs begin with https:// and end with .js or .css
Here’s a snippet from one of our test files:
import base64
import hashlib

import pytest
import requests

# `all_cdn_scripts` is a Pytest fixture; it's not shown in this snippet.
# The import for `URLValidator` is likewise not shown.


@pytest.mark.parametrize('script', all_cdn_scripts)
def test_integrity_hashes_match(script):
    # Test that the unminified URL doesn't error and has the right integrity hash.
    resp = requests.get(script.url_unminified)
    resp.raise_for_status()
    assert (
        'sha384-{}'.format(base64.b64encode(hashlib.sha384(resp.content).digest()).decode('utf8')) ==
        script.integrity_unminified
    )
    # Test that the minified URL doesn't error and has the right integrity hash.
    resp = requests.get(script.url)
    resp.raise_for_status()
    assert (
        'sha384-{}'.format(base64.b64encode(hashlib.sha384(resp.content).digest()).decode('utf8')) ==
        script.integrity
    )


def test_sha384_for_all_checksums(all_cdn_scripts):
    # A 48-byte SHA-384 digest is 64 characters long once base64-encoded.
    SHA384_CHECKSUM_LENGTH = 64
    for cdn_script in all_cdn_scripts:
        assert cdn_script.integrity.startswith('sha384-')
        assert cdn_script.integrity_unminified.startswith('sha384-')
        checksum = cdn_script.integrity.replace('sha384-', '')
        assert len(checksum) == SHA384_CHECKSUM_LENGTH
        checksum = cdn_script.integrity_unminified.replace('sha384-', '')
        assert len(checksum) == SHA384_CHECKSUM_LENGTH


def test_valid_https_urls(all_cdn_scripts):
    https_url_validator = URLValidator(schemes=['https'], message='HTTPS URL validation failed')
    for cdn_script in all_cdn_scripts:
        https_url_validator(cdn_script.url)


def test_valid_script_files(all_cdn_scripts):
    for cdn_script in all_cdn_scripts:
        assert cdn_script.url.endswith('.js')


def test_minified_and_unminified_urls(all_cdn_scripts):
    for cdn_script in all_cdn_scripts:
        assert cdn_script.url.endswith('.min.js')
        assert not cdn_script.url_unminified.endswith('.min.js')
Securing the Assets That We Use
Yelp serves tens of millions of users every month. Protecting these users if an attacker ever gains control of a CDN we use is of prime importance. That’s where subresource integrity checksums come into the picture.
Subresource Integrity Checksums
The web docs on Mozilla Developer Network define Subresource Integrity as:
A security feature that enables browsers to verify that resources they fetch (for example, from a CDN) are delivered without unexpected manipulation. It works by allowing you to provide a cryptographic hash that a fetched resource must match.
Support for subresource integrity checksum verification is achieved by adding an integrity attribute on the <script> or <link> tags. For example:
<script
src="https://cdnjs.cloudflare.com/ajax/libs/react/16.8.6/umd/react.production.min.js"
integrity="sha384-qn+ML/QkkJxqn4LLs1zjaKxlTg2Bl/6yU/xBTJAgxkmNGc6kMZyeskAG0a7eJBR1"
crossorigin="anonymous"
></script>
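The web browser calculates a hash of the resource it fetches for the <script> or <link> tag and compares it with the integrity attribute’s value. If they don’t match, the browser refuses to execute the script or apply the stylesheet. For assets served from another origin, such as a CDN, the tag must also carry a crossorigin attribute (as in the generated tag shown earlier) so that the browser is allowed to read the response and verify it.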
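To make the comparison concrete, here is a simplified Python model of that check. It is a sketch of the behavior described in the SRI specification, not browser code: it handles only the three required algorithms, prefers the strongest one listed, and treats an attribute with no usable hashes as a pass. The function names are hypothetical.

```python
import base64
import hashlib

# Algorithms every conformant browser must support, ranked weakest to strongest.
_ALGORITHM_STRENGTH = {'sha256': 1, 'sha384': 2, 'sha512': 3}


def compute_sri(content, algorithm):
    """Base64-encoded digest of the fetched bytes."""
    digest = hashlib.new(algorithm, content).digest()
    return base64.b64encode(digest).decode('ascii')


def matches_integrity(content, integrity):
    """Return True if the fetched bytes satisfy the integrity attribute value."""
    candidates = []
    for token in integrity.split():
        algorithm, _, value = token.partition('-')
        value = value.split('?', 1)[0]  # drop an optional "?options" suffix
        if algorithm in _ALGORITHM_STRENGTH and value:
            candidates.append((algorithm, value))
    if not candidates:
        return True  # nothing usable to check against
    strongest = max((alg for alg, _ in candidates), key=_ALGORITHM_STRENGTH.get)
    # Only hashes using the strongest listed algorithm are considered.
    return any(
        value == compute_sri(content, alg)
        for alg, value in candidates
        if alg == strongest
    )


script = b'console.log("hello");'
integrity = 'sha384-' + compute_sri(script, 'sha384')
print(matches_integrity(script, integrity))
print(matches_integrity(script + b'// tampered', integrity))
```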
Comparing Cryptographic Hash Functions
As per the Subresource Integrity (SRI) specification:
Conformant user agents must support the SHA-256, SHA-384 and SHA-512 cryptographic hash functions for use as part of a request’s integrity metadata and may support additional hash functions.
Although SHA-256 and SHA-512 are also supported, we recommend using the SHA-384 cryptographic hash function for the integrity attribute. This is largely because SHA-384, a truncated variant of SHA-512, is resistant to length extension attacks, whereas SHA-256 and SHA-512 are not. (See “SRI: upgrade examples to sha384?” at github.com/w3c/webappsec and “Why SHA384?” at github.com/mozilla/srihash.org for further information.)
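The difference between the three functions is easy to inspect with Python's hashlib. The sketch below lists, for each algorithm, the digest size, the internal block size, and the length of the base64 value that ends up in the integrity attribute. SHA-384 shares SHA-512's 1024-bit block size but emits only 384 bits of output, which is why its output does not expose the full internal state. The HASH_FACTS name is just for illustration.

```python
import base64
import hashlib

# Map algorithm name -> (digest bits, block bits, base64 length in the integrity value).
HASH_FACTS = {}
for name in ('sha256', 'sha384', 'sha512'):
    hasher = hashlib.new(name)
    encoded_length = len(base64.b64encode(bytes(hasher.digest_size)))
    HASH_FACTS[name] = (hasher.digest_size * 8, hasher.block_size * 8, encoded_length)

for name, (digest_bits, block_bits, encoded_length) in HASH_FACTS.items():
    print('{}: {}-bit digest, {}-bit block, {} base64 characters'.format(
        name, digest_bits, block_bits, encoded_length))
```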
Always Using HTTPS for Loading CDN Assets
At Yelp, we’ve migrated web traffic to be served exclusively using HTTPS and HSTS. If you’re interested in learning more, check out these excellent blog posts by my colleagues: The Great HTTPS Migration and The Road To HSTS.
Protocol Relative URLs
We recommend using explicit https:// URLs for CDN assets instead of protocol-relative URLs (those that begin with //). Quoting the article “The Protocol-relative URL” by Paul Irish:
Now that SSL is encouraged for everyone and doesn’t have performance concerns, this technique is now an anti-pattern. If the asset you need is available on SSL, then always use the https:// asset. Allowing the snippet to request over HTTP opens the door for attacks like the recent Github Man-on-the-side attack. It’s always safe to request HTTPS assets even if your site is on HTTP, however the reverse is not true. More guidance and details in Eric Mills’ guide to CDNs & HTTPS and digitalgov.gov’s writeup on secure analytics hosting.
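A check like the following can flag protocol-relative URLs and rewrite them to explicit HTTPS. This is a minimal sketch using the standard library; the function names are hypothetical and it is not part of the package described above.

```python
from urllib.parse import urlsplit


def is_protocol_relative(url):
    """True for URLs such as '//cdn.example.com/lib.js' that inherit the page's scheme."""
    parts = urlsplit(url)
    return parts.scheme == '' and parts.netloc != ''


def to_https(url):
    """Rewrite protocol-relative and plain HTTP URLs to explicit HTTPS."""
    parts = urlsplit(url)
    if not parts.netloc:
        raise ValueError('expected an absolute or protocol-relative URL: {!r}'.format(url))
    if parts.scheme in ('', 'http'):
        return parts._replace(scheme='https').geturl()
    return url


print(to_https('//cdnjs.cloudflare.com/ajax/libs/jquery/1.8.3/jquery.min.js'))
```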
Acknowledgements
The work described in this blog post has been carried out and supported by numerous members of the Engineering Team here at Yelp. Particular credit goes to engineers on our Core Web Infrastructure (Webcore) team.