Like my previous vs post, this isn't a showdown. Instead, this post is about matching the solution to the problem. The tl;dr is that Celery is great at what it does, but it requires more infrastructure and setup than I needed to solve my problem.
The Problem to Solve
I needed to sync data from my database to Canvas, a learning management system where instructors store grades. If I had a year to do this, I would send the data directly to Canvas as it comes in to my server. Unfortunately, I had a much more limited time frame, and that solution would require a lot more testing and more robust error handling for cases where the data doesn't go through.
Before starting, I knew Celery was a common solution for task-based workflows in Django, but I wanted to see what other options there were and how they stacked up. I did some research and posted this to see if there was anything I had missed, which got some helpful responses.
Based on those responses, I started looking into setting up cron jobs to sync the data. I decided that Celery is a great product that handles a lot of complexity for both long- and short-running tasks, but it was more effort to set up and maintain than I needed.
Management Commands and Cron Jobs
I created management commands to nail down the code while I looked into libraries for setting up cron jobs with Django. Several were no longer supported, and I didn't find any that I felt confident in, so I began looking into setting the cron jobs up myself in the packer scripts I already had for deployments. Fortunately, it was pretty easy.

I created a shell script for each cron job that activates the proper environment on the server (to get the appropriate context to connect to my database and Django application) and then runs a management command. These are copied by packer onto the server instance. I referenced those scripts in cronjob.sh, which packer executes while setting up the server to register the cron jobs. It looks like this:
...
(crontab -l 2>/dev/null; echo '11 * * * * /opt/website/cron_canvas_user_sync.sh >> /opt/website/logs/cron_users.log 2>&1') | crontab -
...
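For reference, one of those wrapper scripts might look something like this. This is a sketch, not my actual script: the virtualenv path and the management command name are placeholders.

```shell
#!/bin/bash
# cron_canvas_user_sync.sh -- invoked by cron via the crontab entry above.
# The paths and command name here are illustrative placeholders.
set -euo pipefail

# Activate the project's virtual environment so the management
# command runs with the right Python and Django settings.
source /opt/website/venv/bin/activate

# Run the management command that does the actual sync.
cd /opt/website
python manage.py canvas_user_sync
```

Because cron runs with a minimal environment, doing the activation inside the script (rather than relying on a login shell) is what gives the management command the context it needs.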
Explaining cron syntax is outside the scope of this post, mostly because I don't entirely understand it. Crontab Guru was incredibly helpful in setting these up.
The only (minor) difficulty was having to split my packer script into two so that the cron jobs wouldn't be set up on my staging server.
This has worked great! Initially I had three separate cron jobs running:
- One to sync account names on our service with those in Canvas.
- One for quiz grades.
- One for game progress grades.
The first has since been scoped down a bit, but I still need it for a specific case. I recently combined the last two because they were happening simultaneously and sending data to the same endpoint. I hadn't realized that when I first created them.
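One way to combine jobs like these is a single wrapper that runs the syncs sequentially, so they no longer hit the same endpoint at the same time. A sketch, with hypothetical script and command names:

```shell
#!/bin/bash
# cron_canvas_grade_sync.sh -- hypothetical combined wrapper; the
# script and command names are placeholders, not the real ones.
set -euo pipefail
source /opt/website/venv/bin/activate
cd /opt/website

# Run the two grade syncs back to back so they can't race each
# other against the same Canvas endpoint.
python manage.py sync_quiz_grades
python manage.py sync_game_progress_grades
```

This also means one crontab entry and one log file instead of two.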
In the future, I'd like to set up our server to send the grades directly to Canvas when they come in, but for now, this is working well.
Another alternative to these solutions is Django Q2, which Jeff Triplett has written about. It wasn't as established when I was doing my research, but it's worth considering now.