Abstract
I’ve decided to start managing more of my email myself, on my local
computer, rather than relying on Gmail to keep it archived
forever. This means that I need to back up my email myself. In this
article, I share what my considerations were for this and the script I
wrote to do it automatically.
Moving away from Google
A few months ago, I decided that I was done with letting Google keep
my email archive. Don’t get me wrong, I still think that Gmail is a
wonderful tool, but I wanted to have more control over my own
data. This meant setting up scripts to check Gmail regularly for new
emails and download them.
I’ll skip over the details of the downloading to what I ended up with:
a directory full of thousands of plain text files. I chose to store my
emails using the Maildir standard, which means each email is in its
own file. I use Gnome Evolution to read my emails, and I was
immediately surprised and pleased by how snappy an email client can be
if all of the emails are already on your hard drive.
Having all of these emails locally though, I needed to think about how
I was going to protect them from being lost. I’ve written about
backups before, and since I didn’t want to lose my email archive I
needed to set up regular backups. On the other hand, my emails
sometimes contain personal information. I’d rather not just put them
out into the world as is.
After some reading, I came up with a solution and a simple shell
script to implement it. First, I would put my emails into a Git
repository. If you don’t know what a Git repository is, I’ve written
an introduction to version control. I then use a lesser-known feature
of Git called bundles to make incremental backups of my emails. I use
GPG to encrypt the bundles, and then I upload them to a cloud hosting
site.
Let’s break down each of those pieces.
Git Bundles
The first thing that my daily backup script does is commit any changes
I’ve made to a Git repository. Each commit records the changes that
have been made to files since the previous commit, so if I back up the
commits, I have a backup of the data.
Backing up commits is actually a slightly simpler problem than backing
up files, since my Git repo only ever appends new commits. Even
deleting an email, which would otherwise mean tracking that a file has
been removed, is just one more commit in the Git repository. To
further simplify matters, I’m not using branching at all here, so I
only have one linear path of Git commits in the repository to worry
about.
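A minimal sketch of such a commit step, not the actual backup script, might look like the following. It runs against a throwaway repository in a temporary directory so it is safe to try; MAIL_DIR, the commit_changes helper, the file names, and the commit identity are all hypothetical.

```shell
#!/bin/sh
# Sketch of a daily "commit everything" step on a throwaway repository.
# MAIL_DIR is a hypothetical stand-in for the real Maildir location.
set -eu

MAIL_DIR=$(mktemp -d)
cd "$MAIL_DIR"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from this article

commit_changes() {
    # Stage additions, modifications, and deletions in one go.
    git add --all
    # Commit only when something is staged, so quiet days add no commits.
    if ! git diff --cached --quiet; then
        git -c user.name="Backup" -c user.email="backup@example.invalid" \
            commit -q -m "Mail snapshot $(date +%Y-%m-%d)"
    fi
}

mkdir -p cur new tmp
echo "Subject: hello" > cur/msg1
commit_changes                  # first commit: one new email

rm cur/msg1
echo "Subject: world" > cur/msg2
commit_changes                  # second commit: one deletion, one addition

commit_changes                  # nothing changed, so no commit is created

COMMIT_COUNT=$(git rev-list --count master)
echo "commits on master: $COMMIT_COUNT"
```

The deletion of cur/msg1 shows up as part of the second commit rather than as a rewrite of the first, which is why the history only ever grows.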
Git bundles are files that Git lets you create that contain
commits. For example, if you enter this command:
git bundle create master.bundle master
then it will create a single file, called master.bundle, which
contains all of the commits on the master branch. If you’re creating a
backup of the master branch, then you just need to copy that bundle
somewhere safe.
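To see that such a bundle really is a complete backup, here is a small sketch that creates one from a throwaway repository, checks it with git bundle verify, and restores it with git clone. The WORK directory, file names, and commit identity are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: create a full bundle, verify it, and restore from it.
# Everything lives in a hypothetical temporary directory.
set -eu

WORK=$(mktemp -d)
git init -q "$WORK/mail"
cd "$WORK/mail"
git symbolic-ref HEAD refs/heads/master

echo "Subject: hello" > msg1
git add --all
git -c user.name="Backup" -c user.email="backup@example.invalid" \
    commit -q -m "First snapshot"

# Pack every commit reachable from master into one file.
git bundle create "$WORK/master.bundle" master

# Check that the bundle is well formed and self-contained.
git bundle verify "$WORK/master.bundle"

# A bundle can be cloned much like a remote repository. The -b option names
# the branch to check out, since this bundle carries no HEAD of its own.
git clone -q -b master "$WORK/master.bundle" "$WORK/restored"

RESTORED=$(cat "$WORK/restored/msg1")
echo "restored: $RESTORED"
```

The temporary directory can be deleted once you have looked at the result.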
But I promised you incremental backups. That means I need to create a
bundle that only has the changes since the last backup. This is a very
similar process.
git bundle create start-to-master.bundle start..master
This will create a bundle that has everything in master, except
anything that is also in start. Here start is a ref, such as a tag,
marking the commit that the last backup reached. If start.bundle was
the last backup, then I can recover my repository if I have both
start.bundle and start-to-master.bundle.
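One way to keep track of where the last backup ended is to move a tag after every run. The sketch below assumes a tag named start for that purpose, which may well differ from what the real script does, and then rebuilds the repository from the two bundle files. The snapshot helper, WORK directory, and commit identity are hypothetical.

```shell
#!/bin/sh
# Sketch: a full backup, one incremental backup, and a restore from both.
# The "start" tag marking the last backed-up commit is an assumption.
set -eu

WORK=$(mktemp -d)
git init -q "$WORK/mail"
cd "$WORK/mail"
git symbolic-ref HEAD refs/heads/master

snapshot() {
    git add --all
    git -c user.name="Backup" -c user.email="backup@example.invalid" \
        commit -q -m "$1"
}

# First backup: everything on master, then remember where we got to.
echo "Subject: first" > msg1
snapshot "First snapshot"
git bundle create "$WORK/start.bundle" master
git tag start master

# Next backup: only the commits made since the tag, then move the tag.
echo "Subject: second" > msg2
snapshot "Second snapshot"
git bundle create "$WORK/start-to-master.bundle" start..master
git tag -f start master

# Restore: clone the full bundle, then fast-forward with the increment.
git clone -q -b master "$WORK/start.bundle" "$WORK/restored"
git -C "$WORK/restored" fetch -q "$WORK/start-to-master.bundle" master
git -C "$WORK/restored" merge -q --ff-only FETCH_HEAD

ORIGINAL_HEAD=$(git rev-parse master)
RESTORED_HEAD=$(git -C "$WORK/restored" rev-parse HEAD)
echo "original: $ORIGINAL_HEAD"
echo "restored: $RESTORED_HEAD"
```

Note that git bundle create refuses to write an empty bundle, so a real script would need to skip this step on days when start and master point at the same commit.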
Encryption with GPG
GPG, the GNU Privacy Guard, is a program for encrypting files. If I
get into explaining all of the important parts of GPG, then we won’t
ever get back to the backup script. The short version is that if I
take the Git bundle and push it through GPG, I can get out a version
that I will be able to decrypt and read, but nobody else can without
the key.
Importantly, this means that after I upload my backups to the
internet, if somebody accidentally gets access to them they still
can’t read the contents.
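As a rough illustration of the round trip, the sketch below encrypts a stand-in file and decrypts it again. It assumes GnuPG 2.1 or later and uses symmetric (passphrase) encryption purely so that it runs without any existing keys; a real setup would more likely encrypt to a key pair with --encrypt and --recipient. The passphrase, file names, and temporary GNUPGHOME are hypothetical.

```shell
#!/bin/sh
# Sketch: encrypt a file with GPG and decrypt it again.
# Assumes GnuPG 2.1+; uses a passphrase instead of a key pair for brevity.
set -eu

WORK=$(mktemp -d)
# Use a private, empty GnuPG home so the demo does not touch real keys.
GNUPGHOME="$WORK/gnupg"
export GNUPGHOME
mkdir -m 700 "$GNUPGHOME"

# Stand-in for a real bundle file.
printf 'pretend this is a Git bundle\n' > "$WORK/master.bundle"

# Encrypt. Passing a passphrase on the command line is only acceptable in a
# demo, because other users on the machine can see it in the process list.
gpg --batch --quiet --pinentry-mode loopback --passphrase "demo-passphrase" \
    --symmetric --cipher-algo AES256 \
    --output "$WORK/master.bundle.gpg" "$WORK/master.bundle"

# Decrypt into a new file.
gpg --batch --quiet --pinentry-mode loopback --passphrase "demo-passphrase" \
    --decrypt --output "$WORK/decrypted.bundle" "$WORK/master.bundle.gpg"

# The decrypted copy should be byte-for-byte identical to the original.
cmp "$WORK/master.bundle" "$WORK/decrypted.bundle"
echo "round trip ok"

# Stop the agent that GPG may have started for the temporary home.
gpgconf --kill gpg-agent || true
```

Only the .gpg file would be uploaded; the unencrypted bundle stays on the local machine.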
If you