
Backup Docker to Amazon S3

There is a great tutorial in Gist form by David King over at GitHub on how to back up MySQL databases to Amazon S3 through a cron job.

Let’s use that to periodically back up running Docker containers to your Amazon S3 bucket as well.

A word of caution: every backup does a full export of the container’s filesystem, with all its files, at every iteration. To keep those exports small, there are some great tips from the guys at the Intercity Blog on slimming down Docker containers.
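
To get a rough idea of how much data each backup run will export, you can check the size of your running containers first ( a full export is roughly the virtual size shown in brackets ):

# The SIZE column shows each container's writable layer, with the full virtual size in brackets
docker ps --size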

Parallel s3cmd version

For our purposes we use a modified version of s3cmd that allows parallel uploads, so we’re not stuck with just one upload at a time.

git clone https://github.com/pearltrees/s3cmd-modification
cd s3cmd-modification
apt-get install python2.4-setuptools
python setup.py install
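
If the install went through cleanly, the patched build should expose the parallel upload options used further down. A quick sanity check ( assuming the fork keeps these flags in its help output ):

s3cmd --version
s3cmd --help | grep -i parallel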

Configure s3cmd

Have your Amazon S3 Bucket credentials handy, and run the following command to configure s3cmd :

s3cmd --configure
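
Once configured, you can confirm that the credentials work and, if you like, create a dedicated backup bucket ( the bucket name below is just a placeholder ):

# List the buckets your credentials can see
s3cmd ls
# Optionally create a bucket just for these backups
s3cmd mb s3://my-docker-backups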

Patch the .s3cfg : On some installs or bucket regions you might run into problems with uploads. This is easily fixed by changing one line in your ~/.s3cfg, as described in this Serverfault article :

host_bucket = %(bucket)s.s3-external-3.amazonaws.com
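
If you prefer not to edit the file by hand, a one-liner like this should do the same thing ( it simply rewrites the existing host_bucket line in ~/.s3cfg ):

sed -i 's/^host_bucket.*/host_bucket = %(bucket)s.s3-external-3.amazonaws.com/' ~/.s3cfg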

Create your docker backup script

Create a file called s3dockerbackup.sh, either in your home directory or in a subdirectory like ~/bin ( create it with mkdir ~/bin ) :

#!/bin/bash
## 
## Usage  : ./s3dockerbackup.sh
## Author : Markus Stefanko <markus@stefanxo.com>
## From   : http://blog.stefanxo.com/category/docker/
##
## Saves all running docker containers, and syncs with Amazon S3 Bucket
##

# Set the Amazon S3 bucket you want to upload the backups to
# Find out which buckets you have access to with : s3cmd ls
bucket="s3://bucket"

# Delete old backups? Any files older than $daystokeep will be deleted on the bucket
# Don't use this option on buckets which you use for other purposes as well
# Default option     : 0
# Recommended option : 1
purgeoldbackups=0

# How many days should we keep the backups on Amazon before deletion?
daystokeep="7"

# How many worker threads to use ( check how many CPUs you have available )
# This uploads faster due to parallelization
# Make sure you don't use all your CPU workers for the backup, and remember that
# the cronjob has all the time in the world
workers=8

# This directory should have enough space to hold all docker containers at the same time
# Subdirectories will be automatically created and deleted after finish
tmpbackupdir="/tmp"

# Based on S3 MYSQL backup at https://gist.github.com/2206527

echo -e ""
echo -e "\e[1;31mAmazon S3 Backup Docker edition\e[00m"
echo -e "\e[1;33m"$(date)"\e[00m"
echo -e "\e[1;36mMore goodies at http://blog.stefanxo.com/category/docker/\e[00m"
echo -e ""

# We only continue if bucket is configured
if [[ -z "$bucket" || $bucket != *s3* || "$bucket" = "s3://bucket" ]]
then
        echo "Please set \$bucket to your bucket."
        echo -e "The bucket should be in the format : \e[1;36ms3://bucketname\e[00m"
        echo "You can see which buckets you have access to with : \e[1;33ms3cmd ls\e[00m"
        exit 1
fi

# Timestamp (sortable AND readable)
stamp=`date +"%Y_%m_%d"`

# Feedback
echo -e "Dumping to \e[1;32m$bucket/$stamp/\e[00m"

# List all running docker instances
instances=`docker ps -q --no-trunc`

tmpdir="$tmpbackupdir/docker$stamp"
objectdir="$bucket/$stamp/"
mkdir "$tmpdir"

# Loop the instances
for container in $instances; do

    # Get info on each Docker container
    instancename=`docker inspect --format='{{.Name}}' $container | tr '/' '_'`
    imagename=`docker inspect --format='{{.Config.Image}}' $container | tr '/' '_'`

    # Define our filenames
    filename="$stamp-$instancename-$imagename.docker.tar.gz"
    tmpfile="$tmpdir/$filename"

    # Feedback
    echo -e "backing up \e[1;36m$container\e[00m"
    echo -e " container \e[1;36m$instancename\e[00m"
    echo -e " from image \e[1;36m$imagename\e[00m"

    # Dump and gzip
    echo -e " creating \e[0;35m$tmpfile\e[00m"
    docker export "$container" | gzip -c > "$tmpfile"

done;

# Upload all files
echo -e " \e[1;36mSyncing...\e[00m"
s3cmd --parallel --workers $workers sync "$tmpdir/" "$objectdir"

# Clean up
rm -rf "$tmpdir"

# Purge old backups
# Based on http://shout.setfive.com/2011/12/05/deleting-files-older-than-specified-time-with-s3cmd-and-bash/

if [[ "$purgeoldbackups" -eq "1" ]]
then
    echo -e " \e[1;35mRemoving old backups...\e[00m"
    olderThan=`date -d "$daystokeep days ago" +%s`

    s3cmd --recursive ls $bucket | while read -r line;
    do
        createDate=`echo $line | awk '{print $1" "$2}'`
        createDate=`date -d "$createDate" +%s`
        if [[ $createDate -lt $olderThan ]]
        then
            fileName=`echo $line | awk '{print $4}'`
            echo -e " Removing outdated backup \e[1;31m$fileName\e[00m"
            if [[ $fileName != "" ]]
            then
                s3cmd del "$fileName"
            fi
        fi
    done;
fi

# We're done
echo -e "\e[1;32mThank you for flying with Docker\e[00m"

Change the $bucket variable to the bucket where you want to store your backups. It’s best to create a bucket just for this purpose, so you can safely use the purge feature.

Make the file executable with chmod +x s3dockerbackup.sh and then test the script by running ./s3dockerbackup.sh
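
After a test run it’s worth checking that the files actually landed in the bucket. Something like this should list today’s folder ( replace s3://bucket with your own bucket ):

s3cmd ls s3://bucket/$(date +%Y_%m_%d)/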

Run it regularly

Now you’ll need to set up a cronjob. You can do that easily on the server by running crontab -e. Add a line like the following :

0 4 * * * /bin/bash ~/bin/s3dockerbackup.sh >/dev/null 2>&1

This will run the backup at 4 in the morning every night. If you want to run backups more often, you could put */6 in the hour field instead of the 4, which would run the backup every 6 hours:

0 */6 * * * /bin/bash ~/bin/s3dockerbackup.sh >/dev/null 2>&1

But that just might be overkill in most cases.
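
If you’d rather keep a record of what the cronjob did instead of discarding its output, redirect it to a log file you can write to, for example:

0 4 * * * /bin/bash ~/bin/s3dockerbackup.sh >> ~/s3dockerbackup.log 2>&1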

Remember

Amazon also has a nifty feature called Object Expiration, which is worth reading up on. Unfortunately s3cmd does not yet support setting it automatically.

Changes from the original script :

  • We removed hours and minutes from the folder and file names
  • We create the backups locally first and then upload them with s3cmd sync. You will need roughly as much free disk space in your backup directory as your running Docker containers occupy ( see the quick check after this list )
  • Since we sync instead of forcefully re-uploading, you can run this script multiple times a day and it will only overwrite backups for containers that have changed
  • We added a function to purge old backups, so your Amazon account won’t be clogged up with them. Kudos to setfive consulting
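
A quick way to check that the backup directory has enough room before the first run ( the path below is the script’s default tmpbackupdir ):

df -h /tmp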

Another nice to have

We could extend the sync to cover older folders as well: if an identical backup already exists in yesterday’s folder, there’s no need to upload the same version again.

  • The easy way to achieve this is to always upload to the same folder in the bucket. That, however, would mean there are no historic backups
  • Another way is to check the files with s3cmd info before uploading and scan older folders for a matching file size and MD5 hash; only if nothing matches do we upload the file. A rough sketch of that check follows below.
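
For the second option, a minimal sketch of such a check might look like this. The file and object paths are only examples, and note that for large multipart uploads the MD5 reported by s3cmd info may not match a plain md5sum:

#!/bin/bash
# Sketch: skip the upload if an identical backup already exists in an older folder
localfile="/tmp/docker2014_05_20/2014_05_20-_web-nginx.docker.tar.gz"      # example local backup
remoteobject="s3://bucket/2014_05_19/2014_05_19-_web-nginx.docker.tar.gz"  # example older object

localsize=`stat -c %s "$localfile"`
localmd5=`md5sum "$localfile" | awk '{print $1}'`

remotesize=`s3cmd info "$remoteobject" | awk '/File size/ {print $NF}'`
remotemd5=`s3cmd info "$remoteobject" | awk '/MD5 sum/ {print $NF}'`

if [[ "$localsize" = "$remotesize" && "$localmd5" = "$remotemd5" ]]
then
    echo "Identical backup already on S3 - skipping upload"
else
    s3cmd put "$localfile" "s3://bucket/2014_05_20/"
fi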

Enjoy your Docker backups, whether taken regularly or just before updating Docker, and get familiar with docker import in case you actually need to restore one.
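
Restoring one of these backups is roughly a two-step affair ( file and image names below are only examples ):

# Download a backup from the bucket
s3cmd get s3://bucket/2014_05_20/2014_05_20-_web-nginx.docker.tar.gz
# Import it as a new image - docker import accepts gzipped tarballs on stdin
cat 2014_05_20-_web-nginx.docker.tar.gz | docker import - restored/web

Keep in mind that docker export only captures the container’s filesystem, so run options such as the command, ports and environment variables need to be set again when you start a container from the restored image.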

Cheers!

by Markus Stefanko