serve static files via Cloudfront

How to serve static files via CloudFront & private media files via S3 in Django

This article shows how to serve static files through CloudFront while serving (possibly private) media files directly from S3 when deploying a Django application on AWS.

Sample project

https://github.com/impressai/cloudfront-django-setup

Background

Why is it not recommended to store static files in your database or web server?

There are two types of files that you deal with when creating a web application. The first are your static files, such as JavaScript and CSS, that need to be served to the client’s browser. You can serve these either from your web server itself or from a file storage system like S3. Serving from S3 is generally recommended because it offers much stronger availability and redundancy guarantees than a single web server. Moreover, unlike your “dynamic” content such as HTML pages, these files are “static” and don’t need to change for each user. In Django terminology, these are called “static files”.

The second type of files are those uploaded by users in the course of using your application. These files should be kept in persistent storage, such as a relational database (for example on RDS) or S3, so that you can scale the web server and avoid data loss when servers crash (again, the availability of a web server node is typically much lower). However, relational databases are an expensive and inefficient place to store large files. So the standard strategy is to store these files in S3 and keep only a reference to them in your relational database. In Django terminology, these files are called “media files”.

Ideally, you want static files to be accessible to the general public, but you want media files to be private: only the web server can read them from storage, and the application then controls who gets access to each file.

I would recommend the 12-factor app methodology from Heroku for a more complete set of guidelines on how to structure a SaaS app.

How do I do this in Django?

One of the easiest ways to achieve this is to use the django-storages library. If you want to learn more about getting started with django-storages, I highly recommend this tutorial: https://simpleisbetterthancomplex.com/tutorial/2017/08/01/how-to-setup-amazon-s3-in-a-django-project.html. It describes an elegant way to have both private and public media files by leveraging S3’s features and boto3 (the AWS SDK for Python: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html). The essence of it is that we use a single bucket in which static files are marked public at the object level and media files are marked private at the object level. Private media files are then accessed through a short-lived signed URL that boto3 generates automatically.
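
To make the signed URL idea concrete, here is a minimal sketch of asking boto3 for one. The helper name private_media_url, the example key, and the five-minute expiry are assumptions for illustration; generate_presigned_url is the boto3 S3 client method that produces the URL. In a django-storages setup you would normally not call this yourself, since the storage backend can generate such URLs when it builds a file's URL.

```python
def private_media_url(s3_client, bucket, key, expires_in=300):
    """Return a short-lived signed URL for a private S3 object.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    The helper name and the default expiry are illustrative assumptions.
    """
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime of the URL in seconds
    )
```

With a real client, private_media_url(boto3.client("s3"), "bucket-name", "media/private/report.pdf") returns a URL that carries a signature and an expiry in its query string. Once the expiry has passed, S3 rejects requests made with that URL.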

OK, but what’s CloudFront?

CloudFront is Amazon’s Content Delivery Network (CDN) solution. A CDN is a system of distributed servers that delivers pages and other web content to users based on their geographic location, the origin of the web page, and the content delivery server.

This effectively speeds up content delivery for websites with a global reach, because data is served to each user from a nearby server rather than a central one (as illustrated in the diagram). CDNs also provide other benefits, such as protection from large surges in traffic.

[Figure: how a CDN works]

If, like us, you already use AWS for the rest of your cloud requirements, it makes the most sense to use CloudFront as your CDN.

[Figure: CloudFront with S3]

Signed URLs and the struggle of enabling CloudFront

For the longest time we struggled to move to CloudFront because we couldn’t figure out how to serve our public static files through CloudFront while making sure our private media files weren’t accessible to the general public.

As mentioned previously, S3 supports object-level access control, which lets us keep private media files and public static files in the same S3 bucket.

CloudFront, on the other hand, has to be configured to either sign all URLs or sign none of them. Signing all URLs is not practical because serving our JS, CSS, etc. through signed URLs causes caching issues. Moreover, signing CloudFront URLs is a lot more complex than signing private S3 URLs, as explained in the documentation (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudfront.html#id57). The other option, signing nothing and leaving everything public, would not work either: we want our media files to be private and inaccessible to an unauthorised user who might be able to guess the URL.

The solution

The rest of this post explains the solution we designed for serving static files via CloudFront and private media files via S3 in Django.

Setup

Step 1: Set up the AWS S3 bucket

The S3 setup is similar to the one in the article https://simpleisbetterthancomplex.com/tutorial/2017/08/01/how-to-setup-amazon-s3-in-a-django-project.html: the bucket contains a static folder and a media folder. The static folder is where Django’s collectstatic command puts all the static files. The media folder stores the files from your models’ FileField fields in properly named and structured subfolders.

Step 2: Set up an AWS CloudFront distribution in front of S3

Set the origin so that CloudFront only serves and caches the static folder of the S3 bucket, that is, point the origin path at /static.

As a result, CloudFront ignores the media folder completely. This is exactly what we need.
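
If you script your infrastructure rather than clicking through the console, the origin described above could be expressed roughly as the fragment below. This is only a sketch: the key names follow the origin structure of the CloudFront API, while the Id label and the bucket domain are placeholder assumptions.

```python
# Sketch of the origin entry of a CloudFront distribution config.
# "Id" and "DomainName" hold placeholder values (assumptions).
static_origin = {
    "Id": "s3-static-origin",
    "DomainName": "bucket-name.s3.amazonaws.com",
    # Only the static folder is exposed through CloudFront.
    # The origin path starts with a slash and has no trailing slash.
    "OriginPath": "/static",
    "S3OriginConfig": {"OriginAccessIdentity": ""},
}
```

Because the origin path is /static, nothing under media/ can be reached through this distribution.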

Step 3: Django modifications

The next step is to modify your Django application so that the static files on your HTML pages are fetched from CloudFront rather than S3. The obvious first step is to specify the CDN URL in your settings.py:

AWS_S3_CUSTOM_DOMAIN = 'cdn.mydomain.com'

However, this leaves us with a problem: “static” is still added to every static file URL rendered in an HTML page. Let me explain with an example. Say you write {% static 'my_file' %} in your Django template. Based on settings.py and the conventions of django-storages, the template engine renders it as “cdn.mydomain.com/static/my_file”. CloudFront, since its origin only covers the static folder, helpfully adds another “static” to that and looks in the S3 bucket for “s3-bucket.com/static/static/my_file”.
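
The doubling is easier to see in a small simulation. The sketch below assumes the distribution's origin path is /static and mimics how CloudFront prepends the origin path to the request path before asking S3 for the object; the function name s3_key_for_request is made up for illustration.

```python
def s3_key_for_request(request_path, origin_path="/static"):
    """Return the S3 object key that CloudFront requests from the origin.

    CloudFront prepends the origin path to the path of the incoming request.
    """
    return (origin_path + request_path).lstrip("/")


# URL rendered by the stock {% static %} tag: "static" ends up twice.
doubled = s3_key_for_request("/static/my_file")

# URL without the "/static/" prefix: resolves to the real object key.
correct = s3_key_for_request("/my_file")
```

Here doubled is "static/static/my_file", which does not exist in the bucket, while correct is "static/my_file".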

Step 3a: Add a template tag

The most straightforward fix we found was a custom static template tag that serves static files from CloudFront, while everything else keeps using the S3 bucket URL.

Custom template tags typically live in a templatetags/ folder within the app. We will define a tag called static_cdn.

[Figure: file directory]

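from django import template
from django.conf import settings
from django.templatetags.static import static

register = template.Library()


@register.simple_tag
def static_cdn(url):
    if settings.CDN_ENABLED:
        # Swap the S3 domain for the CloudFront domain and drop the leading
        # "/static/", which CloudFront adds back through its origin path.
        return static(url).replace(settings.AWS_S3_CUSTOM_DOMAIN, settings.AWS_S3_CDN_DOMAIN).replace('/static/', '/', 1)
    return static(url)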

static_cdn template function

This replaces the static URL with the CloudFront URL defined in settings.py. First, the URL is built with the static(url) function; then its S3 domain is swapped for the CDN domain, AWS_S3_CDN_DOMAIN, and the “/static/” prefix is removed. As mentioned previously, CloudFront adds an extra “static” to every request coming its way, so removing the prefix fixes that issue. We also add a CDN_ENABLED flag so that we can test in non-production environments without CloudFront.
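
To show the rewrite in isolation, here is a framework-free sketch of the same idea that can be run without Django. It is a variant, not the tag itself: it parses the URL instead of using string replacement, and the function name to_cdn_url and the example domains are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cdn_url(static_url, cdn_domain, static_prefix="/static/"):
    """Rewrite an S3 static URL so that it points at the CDN.

    The host is replaced by cdn_domain and a leading static_prefix is
    dropped from the path, because the CDN origin path adds it back.
    """
    parts = urlsplit(static_url)
    path = parts.path
    if path.startswith(static_prefix):
        path = "/" + path[len(static_prefix):]
    return urlunsplit((parts.scheme, cdn_domain, path, parts.query, parts.fragment))


cdn_url = to_cdn_url(
    "https://bucket-name.s3.amazonaws.com/static/css/app.css",
    "d111111abcdef8.cloudfront.net",  # placeholder CloudFront domain
)
```

In this example cdn_url is "https://d111111abcdef8.cloudfront.net/css/app.css". Only the leading prefix is removed, so a file stored under a nested folder that happens to be called static keeps its path.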

Step 3b: Modify your templates

The above steps let you set up CloudFront to serve all static files. This in turn allows you to block all public access to your S3 bucket and serve media files only through your web application. Additionally, as CloudFront only caches the static sub-folder, we don’t have to worry about private data being cached outside S3’s protections.

In the HTML template, we can load both static and static_cdn:

{% load static_cdn %}
{% load static %}

{# CDN URL #}
<link rel="apple-touch-icon" href="{% static_cdn 'icon.png' %}">
{# S3 URL #}
<link rel="apple-touch-icon" href="{% static 'icon.png' %}">

Just for reference, our Django AWS setup looks like this:

# aws setup
import os  # belongs at the top of settings.py; used by STATICFILES_DIRS

if not DEBUG:
    CDN_ENABLED = True
    AWS_DEFAULT_ACL = None
    AWS_ACCESS_KEY_ID = '*******************************'
    AWS_SECRET_ACCESS_KEY = '*******************************'
    AWS_STORAGE_BUCKET_NAME = 'bucket-name'
    STATIC_DISTRIBUTION_ID = '***********'
    # CloudFront domain that static_cdn swaps in for the S3 domain
    AWS_S3_CDN_DOMAIN = '{}.cloudfront.net'.format(STATIC_DISTRIBUTION_ID)
    AWS_S3_CUSTOM_DOMAIN = '{}.s3.amazonaws.com'.format(AWS_STORAGE_BUCKET_NAME)
    AWS_S3_OBJECT_PARAMETERS = {
        'CacheControl': 'max-age=86400',  # one day
    }

    STATICFILES_DIRS = [
        os.path.join(BASE_DIR, 'static'),
    ]

    # Static files: collected into the "static" folder of the bucket
    STATICFILES_STORAGE = 'http_project.storage_backends.StaticStorage'
    STATIC_LOCATION = 'static'
    STATIC_URL = 'https://{}/{}/'.format(AWS_S3_CUSTOM_DOMAIN, STATIC_LOCATION)

    # Public media files (default storage for uploads)
    AWS_PUBLIC_MEDIA_LOCATION = 'media/public'
    DEFAULT_FILE_STORAGE = 'http_project.storage_backends.PublicMediaStorage'

    # Private media files, accessed through signed URLs
    AWS_PRIVATE_MEDIA_LOCATION = 'media/private'
    PRIVATE_FILE_STORAGE = 'http_project.storage_backends.PrivateMediaStorage'

    MEDIA_URL = 'https://{}/{}/'.format(AWS_S3_CUSTOM_DOMAIN, AWS_PUBLIC_MEDIA_LOCATION)
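
The settings above point at three classes in http_project/storage_backends.py that are not shown here. A minimal sketch of what such a module might contain, adapted from the tutorial linked earlier, is given below. Treat it as an assumption about the file rather than its actual contents; the class attributes are standard django-storages options.

```python
# http_project/storage_backends.py (sketch, assumed contents)
from django.conf import settings
from storages.backends.s3boto3 import S3Boto3Storage


class StaticStorage(S3Boto3Storage):
    # collectstatic uploads into the "static" folder of the bucket
    location = settings.STATIC_LOCATION


class PublicMediaStorage(S3Boto3Storage):
    location = settings.AWS_PUBLIC_MEDIA_LOCATION
    file_overwrite = False  # keep both files when names collide


class PrivateMediaStorage(S3Boto3Storage):
    location = settings.AWS_PRIVATE_MEDIA_LOCATION
    default_acl = 'private'
    file_overwrite = False
    # No custom domain, so URLs go to S3 directly and are signed
    custom_domain = False
```

A model field that should stay private would then pass the private backend explicitly, for example models.FileField(storage=PrivateMediaStorage()).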

Step 4: Allow cross-origin requests

We should also set up the distribution’s behavior to avoid cross-origin access errors:

  1. Open your distribution from the CloudFront console.
  2. Choose the Behaviors tab.
  3. Choose Create Behavior, or choose an existing behavior, and then choose Edit.
  4. For Allowed HTTP Methods, select GET, HEAD, OPTIONS.
  5. Choose Yes, Edit.
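
For readers who manage the distribution through the API instead of the console, the setting from step 4 corresponds to the AllowedMethods part of a cache behavior. The fragment below is a sketch of that part only. The variable name is made up, and caching just GET and HEAD responses is an assumption, not something the steps above prescribe.

```python
# Sketch of the AllowedMethods part of a CloudFront cache behavior.
allowed_methods = {
    "Quantity": 3,
    "Items": ["GET", "HEAD", "OPTIONS"],
    # Assumption: only GET and HEAD responses are cached.
    "CachedMethods": {"Quantity": 2, "Items": ["GET", "HEAD"]},
}
```

Each Quantity value has to match the number of entries in the corresponding Items list.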

For further information about this cross-origin error refer to this link.

Tips

  1. The static_cdn template tag can be adapted to match however the environment and paths are configured in your Django application.
  2. CloudFront also provides other features, such as gzip compression.
  3. If you need to allow only certain HTTP methods, CloudFront lets you configure that too.
  4. Files cached by CloudFront are served over SSL; if you need a custom SSL certificate, that can also be configured in CloudFront.
  5. You can find the CloudFront URL in the general settings of a distribution.

Summary

You have now learned how to set up CloudFront for your Django project so that only static files are served via CloudFront, while media files are served from the S3 bucket that holds your private and public media storage.

Alternative approach when you don’t have to deal with load balancers and multiple servers: Django – Correctly Wiring to AWS CloudFront for Static and Media Files 

Do you have a better way to structure your app that would solve this problem? Do tell us.

Photo credits – Émile Perron on Unsplash