Use Python & Boto3 to Back Up Files/Logs to AWS S3

A Python script, using Boto3, to back up files in a folder or server logs to AWS S3 daily at a fixed time, with the backup date in the file/log name.


Introduction

Let’s say we have a folder on our server where logs are generated by the various services that keep our application available to users. Now, what if we want to back up those logs to an AWS S3 bucket daily at 00:00? This guide helps us do exactly that. Let’s dive in!


Getting the S3 bucket ready

Create an S3 bucket to hold the backups and note the name you give it. Also keep your AWS access key ID and secret access key handy, as the script needs them later.


Let’s Write Some Code!

1. Create the project directory & python ‘venv’ environment

$ mkdir 'Backup Logs S3'
$ cd 'Backup Logs S3'
$ python3 -m venv env
$ source env/bin/activate

2. Create requirements.txt file

schedule==0.6.0
boto3==1.13.20

3. Install the requirements using pip

(env)$ pip install -r requirements.txt

4. Create a function to upload a file to S3 bucket

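Use your favorite editor to create backup_logs_s3.py and define the upload function, upload_file_to_s3(), in it.

The function accepts 4 parameters: the path of the file to upload (file_name), the name of the target bucket, and the optional object_name and folder_name.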

In this function, we first set object_name to the name of the file, obtained by stripping the path from file_name, if object_name is not given as a parameter.

Then, if folder_name is given, we assign the object_name to be ‘folder_name/object_name’.

In the try block, we create a client by calling the client method of the boto3 package. Make sure to replace ‘YOUR_AWS_ACCESS_KEY_ID’ and ‘YOUR_AWS_SECRET_ACCESS_KEY’ with your actual keys, which I asked you to keep handy earlier.

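This client is then used to call upload_file, which uploads the file to our S3 bucket, and the value it returns is printed. Note that boto3’s upload_file returns None on success and raises an exception on failure, so a printed None means the upload went through.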
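
The steps described above can be sketched roughly as follows. This is a minimal sketch based only on the description, not the original listing: the parameter name bucket, the helper build_object_name(), the True/False return value, and the choice to import boto3 inside the function (so the helper can be tried without boto3 installed) are all assumptions.

```python
import os


def build_object_name(file_name, object_name=None, folder_name=None):
    """Return the S3 key for file_name (hypothetical helper, not from the article)."""
    # Default to the bare file name, dropping any directory part of the path.
    if object_name is None:
        object_name = os.path.basename(file_name)
    # Prefix with the folder so the object appears under 'folder_name/' in S3.
    if folder_name:
        object_name = f"{folder_name}/{object_name}"
    return object_name


def upload_file_to_s3(file_name, bucket, object_name=None, folder_name=None):
    """Upload file_name to bucket; return True on success, False otherwise."""
    key = build_object_name(file_name, object_name, folder_name)
    try:
        # Assumption: boto3 is imported here so the helper above works without it.
        import boto3

        client = boto3.client(
            "s3",
            aws_access_key_id="YOUR_AWS_ACCESS_KEY_ID",
            aws_secret_access_key="YOUR_AWS_SECRET_ACCESS_KEY",
        )
        # upload_file returns None on success and raises on failure.
        response = client.upload_file(file_name, bucket, key)
        print(response)
    except Exception as error:  # broad on purpose: this sketch only logs the failure
        print(error)
        return False
    return True
```

For example, build_object_name('/var/log/app/server.log', folder_name='logs') gives 'logs/server.log', which is the key the file would be stored under.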

5. Create a function to append date to log files (Optional)

This step is optional. If you simply want to upload your files to S3, feel free to skip it. Suppose I have a log file named ‘server.log’ to which the server appends every request it receives. If my server has been running for a week, then all the requests of the whole week have been logged to the same file, which makes checking the logs for a particular day troublesome. To resolve this, each day at 00:00, when we back up the logs to the S3 bucket, we first append the date of the previous day to the file name and then upload the file to S3. This lets us browse through the logs date-wise.

The function append_text_to_file_names() accepts 2 parameters: the list of files to rename and the text to append to each file name.

In this function, we rename the files by appending the given text to their names. After renaming the files, we return the list of files with their new names.
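
A minimal sketch of such a renaming function could look like this. The parameter names files and text, and the choice to place the text before the file extension with an underscore, are assumptions; the description above only says that the text is appended to the name.

```python
import os


def append_text_to_file_names(files, text):
    """Rename each file by appending text to its name; return the new paths."""
    new_names = []
    for file_name in files:
        # Assumption: the text goes before the extension, joined by an underscore,
        # e.g. 'server.log' -> 'server_31-05-2020.log'.
        root, extension = os.path.splitext(file_name)
        new_name = f"{root}_{text}{extension}"
        os.rename(file_name, new_name)
        new_names.append(new_name)
    return new_names
```

Keeping the extension at the end means the renamed files still open as .log files after they are downloaded from S3.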

6. Create a function that will use the above functions

The purpose of this function is to call the two functions above, so that it can be used as the task we schedule later on.

In the function rename_and_backup_logs_s3(), the previous day’s date is calculated and converted to a string in ‘DD-MM-YYYY’ format. The log_files list stores all the files that we want to back up every day. We call append_text_to_file_names(), passing the list of files and the previous day’s date in ‘DD-MM-YYYY’ format, to append the date to the file names. Then upload_file_to_s3() is called for each renamed file in the list to upload it to the S3 bucket. Remember to replace YOUR_BUCKET_NAME with the actual name that you assigned while creating the bucket.
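
The flow described above might be sketched as below. To keep the block runnable on its own, the rename and upload functions are passed in as arguments, and the date logic lives in a small helper, previous_day_stamp(); both choices are assumptions made for illustration, and the function in the actual script most likely takes no arguments and calls the two functions directly.

```python
from datetime import date, timedelta


def previous_day_stamp(today=None):
    """Return yesterday's date as a 'DD-MM-YYYY' string (hypothetical helper)."""
    today = today or date.today()
    return (today - timedelta(days=1)).strftime("%d-%m-%Y")


def rename_and_backup_logs_s3(log_files, rename, upload, bucket="YOUR_BUCKET_NAME"):
    """Append yesterday's date to each log file name, then upload each file."""
    stamp = previous_day_stamp()
    # rename is expected to behave like append_text_to_file_names(files, text).
    renamed_files = rename(log_files, stamp)
    for file_name in renamed_files:
        # upload is expected to behave like upload_file_to_s3(file_name, bucket).
        upload(file_name, bucket)
    return renamed_files
```

Using timedelta(days=1) rather than subtracting 1 from the day number handles month and year boundaries, so a run at 00:00 on 1 June yields ‘31-05-YYYY’.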

7. Final Step, Scheduling the task

We schedule the task ‘rename_and_backup_logs_s3’ to run daily at ‘00:00’.
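
With the schedule package from requirements.txt, the scheduling step could be sketched like this. The function names schedule_daily_backup() and run_scheduler(), the one-second polling interval, and the import of schedule inside the functions are assumptions for this sketch; a real script would more commonly import schedule at the top of the file.

```python
import time


def schedule_daily_backup(task, at_time="00:00"):
    """Register task to run every day at at_time (HH:MM, server local time)."""
    # Assumption: imported here so the block loads even before pip install.
    import schedule

    return schedule.every().day.at(at_time).do(task)


def run_scheduler(poll_seconds=1):
    """Loop forever, running any job whose scheduled time has arrived."""
    import schedule

    while True:
        schedule.run_pending()
        time.sleep(poll_seconds)


# Typical use at the bottom of backup_logs_s3.py:
#     schedule_daily_backup(rename_and_backup_logs_s3)
#     run_scheduler()
```

The schedule package does nothing on its own in the background; jobs only run when run_pending() is called, which is why the script has to stay alive in a loop.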

Run the script:

(env)$ python3 backup_logs_s3.py

In a production environment, use supervisord to start up the script.
