Lessons Learned Teaching Undergraduate Astronomy with a Video Game - Infrastructure and Deployment

This is the third installment of the series breaking down my talk from DjangoConUS 2022. The first entry covered background information about the project and the second was about using Django Rest Framework.

First, some important context: if you are a devops engineer, or have a lot of experience with AWS/GCP/Azure, this post may not be for you. This post is aimed at folks who would rather write Django code than deal with the intricacies of deployment.

With that being said, this is the section of the talk that made me want to write this series of posts. Specifically, I realized that I focused a lot on the infrastructure setup of this project, which I want to outline here. However, I wish I had spent more time on what I think the goal of any successful deployment strategy should be for a Django project, regardless of the infrastructure:

Repeatability and confidence in deployments.

There are a lot of ways to get to this point. And on day 1 (or even day 100 if it's a side project), you likely won't be there. But starting with good process documentation, and then using it to make your deployments consistent and repeatable, is a massive boost in confidence. It increases the time you can spend working on your application rather than its infrastructure, and it can significantly lower your stress when it comes to deploying. Importantly, this is all independent of what services you use to deploy your application!

At AstroVenture, I chose Amazon Web Services (AWS) for our infrastructure early on in founding the company because I had experience with it, and we received free credits for a limited time. At that time, I manually created an EC2 instance for the server (and RDS for the database) and manually installed all of the packages I needed to run the server. I also installed the app by hand, and 'deployments' were done by pulling from GitHub and restarting the gunicorn workers. The 'backup strategy' was the in-depth document I wrote, with step-by-step instructions for how I did all of that. I tested it by recreating the server a second time for our production environment and using the first as a test/staging server.

In the event of a catastrophic server issue, I was likely to be down for several hours, if not an entire day. But having that document gave me the confidence that it wouldn't be more than that, and that I wouldn't have to stress that I would miss a step during that recovery. So if you aren't sure how you would bring your servers back in the event of everything going down, I'd highly recommend going through this exercise for whatever service you are using. It can at least alleviate some of the stress of a deployment going wrong.
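One small step from a written document toward repeatability is to put the same steps into a script, so they run in the same order every time. Here is a minimal sketch of that idea in Python. The list of commands is an assumption for illustration: only "pull from GitHub" and "restart gunicorn" come from the process described above, while the branch name, the pip/migrate/collectstatic steps, and the systemd service name `gunicorn` are hypothetical and would need to match your own setup.

```python
import shlex
import subprocess
from typing import Callable, Sequence

# Hypothetical deploy steps, in the order a runbook might list them.
# Commands run in the current working directory, so this sketch assumes
# it is started from the project root on the server.
DEPLOY_STEPS = [
    "git pull origin main",                      # assumed branch name
    "pip install -r requirements.txt",           # assumed dependency file
    "python manage.py migrate --noinput",
    "python manage.py collectstatic --noinput",
    "sudo systemctl restart gunicorn",           # assumed systemd service name
]


def run_command(command: str) -> int:
    """Run one shell-style command and return its exit code."""
    return subprocess.run(shlex.split(command), check=False).returncode


def deploy(
    steps: Sequence[str] = DEPLOY_STEPS,
    runner: Callable[[str], int] = run_command,
) -> list:
    """Run each step in order and stop at the first failure.

    Returns the list of completed steps. Raises RuntimeError naming the
    failed step, so you know exactly where to pick up in the runbook.
    """
    completed = []
    for step in steps:
        print(f"-> {step}")
        if runner(step) != 0:
            raise RuntimeError(
                f"Step failed: {step!r} (completed so far: {completed})"
            )
        completed.append(step)
    return completed
```

Calling `deploy()` on the server runs the real commands. Passing a different `runner` (for example, one that only prints) gives you a dry run, which is a low-risk way to check that the script and the document still agree.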

From there, I hired a friend who had more infrastructure experience to write the Packer and Terraform scripts we use now, and to help me make a few architecture decisions that allow us to have ~zero downtime deployments and scaling. I was already using a load balancer, but we added an autoscaling group so that we can spin up new instances when we need them.

The Packer scripts create the server image with the application and all of its dependencies, so if we ever have an issue where we need to redeploy an old stable version, we can do that directly from the image instead of having to recreate it. Luckily, we haven't had to do that yet. We use the Terraform scripts to provision an entirely new server and wait until it is live before swapping it with the previous server (and then terminating that one). There are other tools for automating infrastructure and application builds that others might prefer, but these have worked well for us (in combination with some good old-fashioned bash scripts).
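The "provision, wait until live, swap, terminate" sequence is the core idea here, and it can be sketched independently of any particular tool. The Python below is only an illustration of that ordering, not the Terraform configuration described above. Every function passed in (`provision`, `is_live`, `swap`, `terminate`) is a hypothetical placeholder for whatever your tooling provides.

```python
import time
from typing import Callable


def wait_until_live(
    is_live: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_live() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if is_live():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


def replace_server(
    old_server: str,
    provision: Callable[[], str],
    is_live: Callable[[str], bool],
    swap: Callable[[str, str], None],
    terminate: Callable[[str], None],
    **wait_options,
) -> str:
    """Bring up a new server and only retire the old one once it is live.

    If the new server never becomes live, it is terminated and the old
    server keeps serving traffic, so a bad build does not cause downtime.
    """
    new_server = provision()
    if not wait_until_live(lambda: is_live(new_server), **wait_options):
        terminate(new_server)
        raise RuntimeError(
            f"{new_server} never became live; {old_server} left in place"
        )
    swap(old_server, new_server)   # e.g. point the load balancer at it
    terminate(old_server)
    return new_server
```

The design choice worth noticing is the order: the old server is not touched until the new one has passed its liveness check, which is what makes the deployment safe to repeat.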

We also have end-to-end tests (more on this in a post coming soon), which I run after every deployment to make sure that the most important parts of the site are functioning correctly.
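If you don't have end-to-end tests yet, even a very small post-deployment check can catch the worst failures. The sketch below is much lighter than real end-to-end tests and is not the test suite mentioned above. It simply requests a few pages and reports any that don't return HTTP 200. The paths in `CRITICAL_PATHS` are hypothetical examples, so substitute the pages that matter most on your site.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Sequence

# Hypothetical paths; replace with the most important pages of your site.
CRITICAL_PATHS = ["/", "/accounts/login/", "/api/health/"]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions that carry the code.
        return error.code
    except OSError:
        # Covers connection errors, DNS failures, and timeouts.
        return 0


def failing_paths(
    base_url: str,
    paths: Sequence[str] = CRITICAL_PATHS,
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, int]:
    """Return a {path: status} mapping for every path that is not 200."""
    failures = {}
    for path in paths:
        status = fetch(base_url.rstrip("/") + path)
        if status != 200:
            failures[path] = status
    return failures
```

An empty result from `failing_paths("https://your-site.example")` means every listed page responded with 200. Anything else tells you which page to look at first.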

What if you don't have a friend with devops experience who can help you, and you don't have that experience yourself?

There are a number of Platform as a Service (PaaS) offerings from companies like Render and Fly.io that a lot of folks in the Django community are using. I'm hoping to try these in the near future along with Django Simple Deploy. So while I can't give specific recommendations for these platforms, I can tell you that the goal of repeatability and confidence in deployments is even easier to achieve on them than it is on AWS (or GCP or Azure). They handle a lot of the work that our Packer and Terraform scripts do, so that you can focus on your application code. The tradeoff with these services is that they can be a little bit more expensive (or a lot more, if you scale very large) than equivalent 'bare metal' servers from AWS, GCP, or Azure. But they can also be cheaper starting out, and the added price can be worth it while you get started.

No matter what tools you use for hosting and deploying your code, if you are reluctant to deploy because of something in your process you aren't confident about, I strongly recommend you look into ways to address that issue. I found that it was a big relief to stop worrying about deploying once I was able to address the more manual parts of our process.

Finally, remember, you don't have to do this all at once, and you don't have to be at the point of continuous integration of your code to feel confident with your deployments. Take small steps and work toward the goal of feeling confident deploying. It'll make coding a lot more fun!