Periodically running python programs
Kiran Koduru • Apr 23, 2018 • 8 minutes to read

When I learned to program, one of the things I wanted to do was run tasks, jobs, functions, programs, etc. on a periodic basis. On my journey with Python I learned that there are multiple techniques to accomplish this. I could run a program in the background from the command line or run it as a cron job. There is also a range of Python packages for running background tasks. I will try to walk through some of them in this post. Forewarning: this is a long post, so if you would like to put a pin in it and come back later, I totally understand. I will try to break it down into multiple posts so it's easier to come back to. For beginners to Python and Unix, I would recommend reading the whole series in order to get an idea of working with periodic tasks in Python or Unix environments.
Before we begin, have you ever wondered, once you have finished writing a program, how you would go about running it periodically? In my earlier days of programming, I was excited that I had written a program that worked, but I wanted to run it at regular intervals. One such program scraped data from multiple websites. I wondered how I could scrape information periodically in order to always have the freshest data. In a series of posts I'll try to explain everything from setting up a basic cron job to the more advanced workflows that handle periodic tasks in Python via Celery. I hope this helps you in some way.
I need to start with a program, something that can be run as a background task. My side project needed emails to be sent out periodically, so what better way than automating the schedule for the emails to be sent?
I created a module named send_emails.py. In Python, files are called modules and packages are folders of modules. Weird, I know. My module emails the latest blog post to an email list, and if no new posts have been published, it doesn't send any emails.
I will use the Python requests package to send emails via the Mailgun API, and SQLAlchemy as the ORM for my MySQL database.
My database has 2 tables, namely users and blogpost, that I need to get data from. The users table has fields id and email, whereas the blogpost table has fields id, title, body and published_date.
Just in case you are wondering, here's what the 2 tables look like:
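A minimal sketch of the two tables, assuming only the columns described above. I'm using the standard library's sqlite3 with an in-memory database here so the snippet runs without a MySQL server or SQLAlchemy; the column types and the `create_tables` and `column_names` helpers are my own assumptions for illustration, not the project's actual models.

```python
import sqlite3

# Assumed schema: column names come from the post, column types are guesses.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL
);
CREATE TABLE blogpost (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    published_date TEXT NOT NULL  -- ISO 8601 date string, e.g. 2018-04-23
);
"""


def create_tables(connection):
    """Create the users and blogpost tables on the given connection."""
    connection.executescript(SCHEMA)


def column_names(connection, table):
    """Return the column names of a table, in definition order."""
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in rows]  # index 1 of each row is the column name


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    print(column_names(conn, "users"))
    print(column_names(conn, "blogpost"))
```

With SQLAlchemy the same columns would normally live as model classes in models.py; the SQL above is only meant to show the shape of the data.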
Next, my module send_emails.py, which sends the new blog post to the email list in the users table.
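Roughly, the logic can be sketched as follows. This is a simplified stand-in rather than the real module: posts and addresses are passed in as plain Python values instead of being queried through SQLAlchemy, "new" is assumed to mean "published after the last date I sent anything", and `MAILGUN_DOMAIN`, `MAILGUN_API_KEY` and all function names are placeholders I made up. The Mailgun call follows the shape of its HTTP messages endpoint and needs the requests package, which is imported lazily so the rest of the sketch runs without it.

```python
from datetime import date

MAILGUN_DOMAIN = "mg.example.com"    # placeholder, not a real domain
MAILGUN_API_KEY = "key-placeholder"  # placeholder, not a real key


def send_via_mailgun(recipient, subject, text):
    """Send one plain-text email through Mailgun's HTTP messages endpoint."""
    import requests  # imported lazily so the rest of the sketch runs without it

    response = requests.post(
        f"https://api.mailgun.net/v3/{MAILGUN_DOMAIN}/messages",
        auth=("api", MAILGUN_API_KEY),
        data={
            "from": f"Blog <blog@{MAILGUN_DOMAIN}>",
            "to": [recipient],
            "subject": subject,
            "text": text,
        },
        timeout=10,
    )
    response.raise_for_status()


def latest_new_post(posts, last_sent_date):
    """Return the newest post published after last_sent_date, or None."""
    new_posts = [p for p in posts if p["published_date"] > last_sent_date]
    return max(new_posts, key=lambda p: p["published_date"], default=None)


def send_latest_post(posts, emails, last_sent_date, send=send_via_mailgun):
    """Email the latest new post to every address; return how many were sent."""
    post = latest_new_post(posts, last_sent_date)
    if post is None:
        return 0  # nothing new, so no emails go out
    for email in emails:
        send(email, post["title"], post["body"])
    return len(emails)


if __name__ == "__main__":
    # Demo with made-up data and a fake sender, so nothing is actually emailed.
    demo_posts = [
        {"title": "Hello cron", "body": "Post body", "published_date": date(2018, 4, 23)},
    ]
    count = send_latest_post(
        demo_posts,
        ["reader@example.com"],
        last_sent_date=date(2018, 4, 1),
        send=lambda to, subject, text: print(f"to={to} subject={subject}"),
    )
    print(f"{count} email(s) sent")
```

Passing the sender in as an argument keeps the "is there anything new?" decision separate from the network call, which makes the module easy to try out before pointing it at a real mailing list.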
Now this is what my directory structure looks like
.
└── my_emailer
    ├── __init__.py
    ├── db_connection.py
    ├── models.py
    └── send_emails.py
Running program in the background
The novice way to run the above program periodically, without cron, is to keep it running in the background. You could add a while True loop with a delay using Python's time.sleep() function. I will add a 5 second delay between emails. For larger intervals, you can calculate the delay in seconds yourself.
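A minimal sketch of that loop. The names `run_periodically`, `seconds_until_next` and the `max_runs` parameter are my own, added so the example can stop on its own; leaving `max_runs` as None gives the plain `while True` behaviour. The second helper shows one way to work out a longer delay, assuming you want to wait until the next time the clock reads a given hour.

```python
import time
from datetime import datetime, timedelta


def seconds_until_next(hour, now=None):
    """Seconds from now until the next time the clock reads hour:00."""
    now = now or datetime.now()
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # already passed today, so aim for tomorrow
    return (target - now).total_seconds()


def run_periodically(task, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call task, wait interval_seconds, and repeat.

    max_runs=None loops forever, like a bare `while True`.
    Returns the number of times task was called.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)  # skip the wait after the final run
    return runs


if __name__ == "__main__":
    # Two quick runs one second apart; the post itself uses a 5 second delay.
    run_periodically(lambda: print("sending emails..."), interval_seconds=1, max_runs=2)
```

In the background script the call would look like `run_periodically(send_emails, 5)`, where `send_emails` stands for whatever function does the actual work.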
To run the program above on Unix, I added the & symbol when calling it so that it runs in the background.
python send_emails_bg.py &
To view the list of running programs, I used the jobs command. I can also bring it to the foreground with the fg command and finally quit the program with Ctrl + C.
user@home:~/test_project $ python send_emails.py &
[1] 19173
user@home:~/test_project $ jobs
[1]+ Running python send_emails.py &
user@home:~/test_project $ fg
^C
Or, if I am not interested in coming back to the process, I could use nohup, which keeps it running after I close the terminal.
nohup python send_emails_bg.py &
Running my program like this would probably be okay to test something out but I wouldn’t recommend doing this for larger programs. My process would definitely stop running if my machine was rebooted. Hence the alternative, using cron!
Setting up the cron
Cron gives me the assurance that the program will run daily, hourly, weekly or monthly, and the schedule survives reboots. I could either edit my own crontab with the crontab -e command or edit the /etc/crontab file on my Unix environment.
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday;
# │ │ │ │ │                                   7 is also Sunday on some systems)
# │ │ │ │ │
# │ │ │ │ │
0 7 * * * admin cd /my/path/to/folder;python send_emails.py >> test.log 2>&1
The above entry runs as the user admin at minute 0 of hour 7, i.e. once a day at 7 am. With * in the minute field instead of 0, it would run every minute from 7:00 to 7:59. The user column (admin) only belongs in /etc/crontab; in a crontab edited with crontab -e, leave it out. The command appends both stdout and stderr to the test.log file.
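The minute field is easy to get wrong: `*` there matches every minute, so `* 7 * * *` fires on each minute from 07:00 to 07:59, while `0 7 * * *` fires once at 07:00. The toy matcher below is only a sketch to make that visible. `field_matches` and `cron_matches` are names I made up, and it handles nothing beyond `*` and plain integers, so it is not a replacement for a real cron parser.

```python
from datetime import datetime


def field_matches(field, value):
    """True if one cron field ('*' or a single integer) matches value."""
    return field == "*" or int(field) == value


def cron_matches(expression, moment):
    """Check a five-field cron expression against a datetime.

    Only '*' and plain integers are handled. Ranges, lists, steps and
    7-as-Sunday are not, and unlike real cron the day-of-month and
    day-of-week fields are simply ANDed together.
    """
    minute, hour, day, month, weekday = expression.split()
    # cron counts Sunday as 0, while datetime.weekday() counts Monday as 0
    cron_weekday = (moment.weekday() + 1) % 7
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )


if __name__ == "__main__":
    half_past_seven = datetime(2018, 4, 23, 7, 30)
    print(cron_matches("0 7 * * *", half_past_seven))
    print(cron_matches("* 7 * * *", half_past_seven))
```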
If you are looking for an implementation on Windows then you can probably find some answers here.
Also, cron has companions and alternative implementations that I haven't looked into, like Anacron. From what I have seen, it runs jobs that were missed while the system was turned off once the machine is back up. That might be useful for someone who wants a job to run at least once even if not on schedule.
I hope this tutorial was a useful way to get started with cron and with running code in the background. In my next post, I will talk about how to daemonize your Python code so you can run it as a service.