At Yelp, we use a service-oriented architecture to serve our web pages. It consists of many frontend services, each of which is responsible for serving different pages (e.g., the search page or a business listing page).

In these frontend services, we use a couple of third-party JavaScript/CSS assets (React, Babel polyfill, etc.) to render our web pages. We chose to serve such assets using a third-party Content Delivery Network (CDN) for better performance.

In the past, if a frontend service needed to use a third-party JavaScript/CSS asset, engineers had to hard-code its CDN URL. For example:

<script
  src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.8.3/jquery.min.js"
></script>

With hundreds of engineers working at Yelp, it was difficult to ensure the following (for each third-party asset):

  • <script> or <link> tags had a subresource integrity checksum via the integrity attribute (see the section on Subresource integrity checksums below)
  • URLs used the HTTPS protocol
  • Only public CDN providers (approved by our security team) were used
  • Engineers could update to the latest versions easily

Organizing our Third-Party Assets

Here at Yelp, we’ve built our frontend services using a Python service stack, with Pyramid as our web framework and uWSGI as our web server.

We created a shared Python package, cdn_assets, for storing the URLs and subresource integrity checksums of our third-party JavaScript/CSS assets.

For each asset, we simply used a Python dictionary with the asset’s semantic version as the key. For example:

# React (facebook.github.io/react)
CDN_SCRIPT_REACT = {
    '16.8.6': CDNAsset.construct_asset(
        cdn=CDNDomain.CDNJS,
        library='react',
        version='16.8.6',
        filename='umd/react.production.min',
        filename_unminified='umd/react.development',
        extension='js',
        integrity='sha384-qn+ML/QkkJxqn4LLs1zjaKxlTg2Bl/6yU/xBTJAgxkmNGc6kMZyeskAG0a7eJBR1',
        integrity_unminified='sha384-u6DTDagyAFm2JKvgGBO8jWd9YzrDzg6FuBPKWkKIg0/GVA6HM9UkSxH2rzxEJ5GF',
    ),
    '16.8.5': CDNAsset.construct_asset(
        # … similar properties for this version
    ),
    # … more versions…
}

# Babel Polyfill (babeljs.io/docs/usage/polyfill)
CDN_SCRIPT_BABEL_POLYFILL = {
    '6.23.0': CDNAsset.construct_asset(
        cdn=CDNDomain.CDNJS,
        library='babel-polyfill',
        version='6.23.0',
        filename='polyfill.min',
        filename_unminified='polyfill',
        extension='js',
        integrity='sha384-FbHUaR69a828hqWjPw4PFllFj1bvveKOTWORGkyosCw720HXy/56+2hSuQDaogMb',
        integrity_unminified='sha384-4L0QKU4TUZXBNNRtCIbt9G73L2fXYHnzgCjL65qwFxsXPvuAf1aB6D3X+LIflqu3',
    ),
    # … more versions…
}

# … more assets…

Usage

Here’s a Python code snippet which shows how the asset is included in our Yelp-Cheetah templates:

CDN_SCRIPT_REACT['16.8.6'].generate_script_tag(minified=True)
# returns <script src="https://cdnjs.cloudflare.com/ajax/libs/react/16.8.6/umd/react.production.min.js" integrity="sha384-qn+ML/QkkJxqn4LLs1zjaKxlTg2Bl/6yU/xBTJAgxkmNGc6kMZyeskAG0a7eJBR1" crossorigin="anonymous"></script>
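The internals of CDNAsset are not shown in this post. As a rough illustration only, a class along the following lines could produce the tag above. The field names, the URL template, and the base URL stored in CDNDomain.CDNJS are assumptions inferred from the examples, not the actual cdn_assets implementation.

```python
from dataclasses import dataclass
from enum import Enum
from html import escape


class CDNDomain(Enum):
    # Hypothetical: each approved CDN maps to its base URL.
    CDNJS = 'https://cdnjs.cloudflare.com/ajax/libs'


@dataclass(frozen=True)
class CDNAsset:
    url: str
    url_unminified: str
    integrity: str
    integrity_unminified: str

    @classmethod
    def construct_asset(cls, cdn, library, version, filename,
                        filename_unminified, extension,
                        integrity, integrity_unminified):
        # Assumed URL layout: <base>/<library>/<version>/<filename>.<extension>
        base = '{}/{}/{}'.format(cdn.value, library, version)
        return cls(
            url='{}/{}.{}'.format(base, filename, extension),
            url_unminified='{}/{}.{}'.format(base, filename_unminified, extension),
            integrity=integrity,
            integrity_unminified=integrity_unminified,
        )

    def generate_script_tag(self, minified=True):
        # Pick the URL and checksum that belong together.
        url = self.url if minified else self.url_unminified
        integrity = self.integrity if minified else self.integrity_unminified
        return '<script src="{}" integrity="{}" crossorigin="anonymous"></script>'.format(
            escape(url, quote=True), escape(integrity, quote=True),
        )


react = CDNAsset.construct_asset(
    cdn=CDNDomain.CDNJS,
    library='react',
    version='16.8.6',
    filename='umd/react.production.min',
    filename_unminified='umd/react.development',
    extension='js',
    integrity='sha384-qn+ML/QkkJxqn4LLs1zjaKxlTg2Bl/6yU/xBTJAgxkmNGc6kMZyeskAG0a7eJBR1',
    integrity_unminified='sha384-u6DTDagyAFm2JKvgGBO8jWd9YzrDzg6FuBPKWkKIg0/GVA6HM9UkSxH2rzxEJ5GF',
)
print(react.generate_script_tag(minified=True))
```

Keeping the URL and its checksum on the same object means a caller cannot accidentally pair the minified URL with the unminified checksum.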

Scaffolding Infrastructure

To make the package easy to use and maintain, we developed some scaffolding infrastructure to:

  • Define public CDN providers (e.g., Cloudflare CDNJS, Google CDN, etc.)
  • Render minified scripts & styles in the production environment and unminified scripts & styles in the development environment
  • Create a helpful generate_script_tag method, which allows consumers of this package to easily generate an HTML <script> tag with the correct subresource integrity SHA (see the section on Comparing cryptographic hash functions below)

We made it easy for engineers to add a new version by creating a make target to calculate the integrity checksum, like so:

# Usage: make sri-hash --urls="URL1[ URL2 ... URLn]"
$ make sri-hash --urls="https://cdnjs.cloudflare.com/ajax/libs/react/16.8.6/umd/react.production.min.js"
sha384-qn+ML/QkkJxqn4LLs1zjaKxlTg2Bl/6yU/xBTJAgxkmNGc6kMZyeskAG0a7eJBR1
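The make target itself is not shown here. As a minimal sketch of the underlying calculation, assuming nothing beyond the Python standard library, an integrity value is the hash algorithm's name, a hyphen, and the base64-encoded digest of the file's bytes. The function names below are hypothetical.

```python
import base64
import hashlib
import urllib.request


def sri_hash(content, algorithm='sha384'):
    """Return an integrity value of the form '<algorithm>-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return '{}-{}'.format(algorithm, base64.b64encode(digest).decode('ascii'))


def sri_hash_for_url(url):
    """Download an asset and return its SHA-384 integrity value."""
    with urllib.request.urlopen(url) as resp:
        return sri_hash(resp.read())
```

Calling sri_hash_for_url with an asset's CDN URL would produce a value in the same format as the output shown above.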

Testing

We wrote tests which iterate over all versions of all assets to ensure that:

  • URLs point to a valid asset on the CDN
  • Integrity SHA checksums are correct
  • URLs begin with https:// and end with .js or .css

Here’s a snippet from one of our test files:

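# `all_cdn_scripts` is a Pytest fixture; it’s not shown in this snippet.
# The import for `URLValidator` is likewise omitted here.
import base64
import hashlib

import pytest
import requests
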
@pytest.mark.parametrize('script', all_cdn_scripts)
def test_integrity_hashes_match(script):
    # Test that the unminified URL doesn’t error and has the right integrity hash.
    resp = requests.get(script.url_unminified)
    resp.raise_for_status()
    assert (
        'sha384-{}'.format(base64.b64encode(hashlib.sha384(resp.content).digest()).decode('utf8')) ==
        script.integrity_unminified
    )

    # Test that the minified URL doesn’t error and has the right integrity hash.
    resp = requests.get(script.url)
    resp.raise_for_status()
    assert (
        'sha384-{}'.format(base64.b64encode(hashlib.sha384(resp.content).digest()).decode('utf8')) ==
        script.integrity
    )


def test_sha384_for_all_checksums(all_cdn_scripts):
    SHA384_CHECKSUM_LENGTH = 64

    for cdn_script in all_cdn_scripts:
        assert cdn_script.integrity.startswith('sha384-')
        assert cdn_script.integrity_unminified.startswith('sha384-')

        checksum = cdn_script.integrity.replace('sha384-', '')
        assert len(checksum) == SHA384_CHECKSUM_LENGTH

        checksum = cdn_script.integrity_unminified.replace('sha384-', '')
        assert len(checksum) == SHA384_CHECKSUM_LENGTH


def test_valid_https_urls(all_cdn_scripts):
    https_url_validator = URLValidator(schemes=['https'], message='HTTPS URL validation failed')

    for cdn_script in all_cdn_scripts:
        https_url_validator(cdn_script.url)


def test_valid_script_files(all_cdn_scripts):
    for cdn_script in all_cdn_scripts:
        assert cdn_script.url.endswith('.js')


def test_minified_and_unminified_urls(all_cdn_scripts):
    for cdn_script in all_cdn_scripts:
        assert cdn_script.url.endswith('.min.js')
        assert not cdn_script.url_unminified.endswith('.min.js')

Securing the Assets That We Use

Yelp serves tens of millions of users every month. Protecting these users in the event that an attacker gains control of a CDN we use is a top priority. That’s where subresource integrity checksums come into the picture.

Subresource Integrity Checksums

The web docs on Mozilla Developer Network define Subresource Integrity as:

A security feature that enables browsers to verify that resources they fetch (for example, from a CDN) are delivered without unexpected manipulation. It works by allowing you to provide a cryptographic hash that a fetched resource must match.

Subresource integrity verification is enabled by adding an integrity attribute to the <script> or <link> tag. For assets loaded from another origin, such as a CDN, the tag also needs a crossorigin attribute so that the browser is allowed to inspect and verify the response. For example:

<script
  src="https://cdnjs.cloudflare.com/ajax/libs/react/16.8.6/umd/react.production.min.js"
  integrity="sha384-qn+ML/QkkJxqn4LLs1zjaKxlTg2Bl/6yU/xBTJAgxkmNGc6kMZyeskAG0a7eJBR1"
  crossorigin="anonymous"
></script>

The web browser fetches the resource referenced by the <script> or <link> tag and calculates a hash of its contents. It then compares this hash with the integrity attribute’s value. If they don’t match, the browser refuses to execute the script or apply the stylesheet.
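The comparison can be sketched in Python as follows. This is a simplified illustration, not how any browser is implemented: it handles a single hash value, whereas the specification also allows several space-separated values. The function name is hypothetical.

```python
import base64
import hashlib

SUPPORTED_ALGORITHMS = ('sha256', 'sha384', 'sha512')


def matches_integrity(content, integrity):
    """Return True if `content` hashes to the value in `integrity`."""
    # Split 'sha384-<base64 digest>' into the algorithm and the expected digest.
    algorithm, _, expected = integrity.partition('-')
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError('unsupported hash algorithm: {!r}'.format(algorithm))
    digest = hashlib.new(algorithm, content).digest()
    actual = base64.b64encode(digest).decode('ascii')
    return actual == expected
```

If a single byte of the fetched file differs from the file that was hashed originally, the function returns False, which corresponds to the browser blocking the resource.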

Comparing Cryptographic Hash Functions

As per the Subresource Integrity (SRI) specification:

Conformant user agents must support the SHA-256, SHA-384 and SHA-512 cryptographic hash functions for use as part of a request’s integrity metadata and may support additional hash functions.

Although SHA-256 and SHA-512 are also supported, we recommend using the SHA-384 cryptographic hash function for the integrity attribute. This is largely because SHA-384, a truncated variant of SHA-512, resists the length extension attacks that SHA-256 and SHA-512 are susceptible to. (See “SRI: upgrade examples to sha384?” at github.com/w3c/webappsec and “Why SHA384?” at github.com/mozilla/srihash.org for further information.)
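The three functions also differ in digest size, which determines the length of the base64-encoded integrity value. The short sketch below, using only the standard library, computes both lengths for each function; the helper name is hypothetical.

```python
import base64
import hashlib


def encoded_lengths(content=b'example'):
    """Map each SRI hash function to (digest bytes, base64 characters)."""
    lengths = {}
    for name in ('sha256', 'sha384', 'sha512'):
        digest = hashlib.new(name, content).digest()
        lengths[name] = (len(digest), len(base64.b64encode(digest)))
    return lengths


for name, (raw, encoded) in encoded_lengths().items():
    print(name, raw, encoded)
# sha256 32 44
# sha384 48 64
# sha512 64 88
```

A SHA-384 digest is 48 bytes, so its base64 form is always 64 characters long regardless of the input.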

Always Using HTTPS for Loading CDN Assets

At Yelp, we’ve migrated web traffic to be served exclusively using HTTPS and HSTS. If you’re interested in learning more, check out these excellent blog posts by my colleagues: The Great HTTPS Migration and The Road To HSTS.

Protocol Relative URLs

We recommend using explicit https:// URLs for CDN assets rather than protocol-relative URLs (those beginning with //). Quoting the article “The Protocol-relative URL” by Paul Irish:

Now that SSL is encouraged for everyone and doesn’t have performance concerns, this technique is now an anti-pattern. If the asset you need is available on SSL, then always use the https:// asset. Allowing the snippet to request over HTTP opens the door for attacks like the recent Github Man-on-the-side attack. It’s always safe to request HTTPS assets even if your site is on HTTP, however the reverse is not true. More guidance and details in Eric Mills’ guide to CDNs & HTTPS and digitalgov.gov’s writeup on secure analytics hosting.
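A check for this can be sketched with the standard library’s URL parser. A protocol-relative URL parses with an empty scheme, so requiring the scheme to be exactly https rejects both protocol-relative and plain HTTP URLs. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def is_explicit_https(url):
    """Return True only for URLs that spell out the https scheme."""
    # urlsplit lowercases the scheme; '//host/path' yields an empty scheme.
    return urlsplit(url).scheme == 'https'


print(is_explicit_https('https://cdnjs.cloudflare.com/ajax/libs/react/16.8.6/umd/react.production.min.js'))
print(is_explicit_https('//cdnjs.cloudflare.com/ajax/libs/react/16.8.6/umd/react.production.min.js'))
```

The first call prints True and the second prints False.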

Acknowledgements

The work described in this blog post has been carried out and supported by numerous members of the Engineering Team here at Yelp. Particular credit goes to engineers on our Core Web Infrastructure (Webcore) team.
