Creating Disaster Recovery backups

2016.12.25

Every company should have a Business Continuity & Disaster Recovery (BCDR) plan, and that plan includes making regular backups. These backups should be stored in a different environment from the one you normally use: if you use Amazon AWS, you should make backups to Google Cloud Platform or Microsoft Azure. Amazon S3 is designed for 99.999999999% durability, but that doesn’t protect you from every failure. Amazon might someday deploy a bad update that rm -rf’s some S3 buckets. AWS might have a horrible outage and you’ll want to spin up your company elsewhere. Your credentials might be compromised, as happened to Code Spaces, whose attacker got into their AWS account and deleted all their infrastructure, including backups. Or someone on your Ops team might somehow run a bad command that wipes out your backups on AWS.

Disaster Recovery is simply about preparing for, and recovering from, rare, worst-case, cataclysmic disasters. You could send your backups to a separate AWS account, but again, it’s best to prepare for the worst case, where AWS disappears completely.

It is critical that you protect your backups at the same level as the original data. As such, I require the following for my backups:

Backups are encrypted with strong crypto, so even if an attacker compromises the location of the backups, no information is leaked. Backups should be encrypted both in transit and at rest.
Public key crypto is used so I don’t have a key lying around anywhere that can be used for decryption. I don’t need to worry about where the public encryption key ends up; the only way to decrypt the backups is with a private key that I keep under much tighter control.
Credentials for the backup location allow only writes, not overwriting or reading. This is like old tape drives, where you can only append more data and cannot delete or overwrite existing backups. As with the public key crypto, this means the credentials for writing backups don’t need to be tightly controlled, whereas the credentials to read backups or delete old ones do.

Encrypting your backups

Create a key pair

First, generate a 4096-bit RSA key pair. This only needs to be done once:

# Generate RSA key pair
openssl genrsa -out backup_key.priv.pem 4096
openssl rsa -in backup_key.priv.pem -out backup_key.pub.pem -outform PEM -pubout

You now have a private key (backup_key.priv.pem) that you need to store securely and a public key (backup_key.pub.pem) that you can put anywhere.
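Before relying on these keys, you may want to confirm that the public key really matches the private key, and do so before the private key is encrypted and deleted. A minimal sketch, assuming OpenSSL is on your PATH; `check_keypair` is a hypothetical helper name, not part of OpenSSL:

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that PUBKEY matches PRIVKEY by encrypting a
# random probe string with the public key and decrypting it with the private
# key. Returns 0 only if the decrypted text equals the probe.
# Usage: check_keypair PUBKEY PRIVKEY
check_keypair() {
  local pub="$1" priv="$2" probe
  probe="$(openssl rand -hex 16)"
  # Public-key encrypt, private-key decrypt, then compare with the probe.
  [ "$(printf '%s' "$probe" \
        | openssl rsautl -encrypt -pubin -inkey "$pub" \
        | openssl rsautl -decrypt -inkey "$priv" 2>/dev/null)" = "$probe" ]
}
```

For example, `check_keypair backup_key.pub.pem backup_key.priv.pem && echo "key pair OK"`.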

For added security, I recommend immediately encrypting your private key and deleting the unencrypted original:

# Encrypt the private key (you will be prompted for a password)
# -pbkdf2 and -iter (OpenSSL 1.1.1+) derive the AES key from the password with PBKDF2 instead of the weak legacy derivation
openssl enc -aes-256-cbc -pbkdf2 -iter 100000 -in backup_key.priv.pem -out backup_key.priv.pem.enc
rm backup_key.priv.pem

Use a strong password for this encryption. Then burn the encrypted file to a CD, or store it somewhere else you can reach in an emergency. Remember, if you’re preparing for the disaster of your account being compromised or your cloud provider disappearing, you shouldn’t store this anywhere that same disaster would affect. Make multiple copies: for example, burn three CDs, keep one yourself, and give the other two to two other people.

Also include directions for decrypting it. Do NOT include the password in these instructions alongside the encrypted file! Decrypting is done as follows (the options must match those used for encryption):

# Decrypt the private key (you will be prompted for the password)
openssl enc -d -aes-256-cbc -pbkdf2 -iter 100000 -in backup_key.priv.pem.enc -out backup_key.priv.pem
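Deleting the plaintext key is only safe once you know the encrypted copy decrypts correctly. The sketch below is a hypothetical `encrypt_private_key` helper, assuming OpenSSL 1.1.1 or later: it encrypts the key, decrypts it again into a pipe (never to disk), and deletes the original only if the bytes match. It uses `-pbkdf2 -iter 100000`; whatever options you choose, your written decryption instructions must pass the same ones.

```shell
#!/usr/bin/env bash
# Hypothetical helper: encrypt a private key with a password, verify that the
# result decrypts to identical bytes, and only then delete the plaintext key.
# These options must match the ones in your decryption instructions.
KEY_ENC_OPTS=(-aes-256-cbc -pbkdf2 -iter 100000)

# Usage: encrypt_private_key KEYFILE [PASS_SOURCE]
# PASS_SOURCE is an OpenSSL pass-phrase argument such as "env:VAR"; when it is
# omitted, openssl prompts for the password interactively.
encrypt_private_key() {
  local key="$1" passarg=()
  if [ -n "${2:-}" ]; then passarg=(-pass "$2"); fi
  openssl enc "${KEY_ENC_OPTS[@]}" "${passarg[@]}" -in "$key" -out "$key.enc" || return 1
  # Decrypt to a pipe and compare byte-for-byte with the original file.
  if openssl enc -d "${KEY_ENC_OPTS[@]}" "${passarg[@]}" -in "$key.enc" | cmp -s - "$key"; then
    rm "$key"
  else
    echo "verification failed; keeping $key" >&2
    return 1
  fi
}
```

Run interactively (`encrypt_private_key backup_key.priv.pem`), it asks for the password three times: twice to encrypt and once more to verify, which also confirms you remember it.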

Collect the data to back up

I’ll assume you have some way of collecting the data you want to back up (your database, your git repo, etc.) and saving it as a big .tar.gz file. Collect it onto a system with free disk space of at least twice the size of the backup, since you’ll briefly hold both the archive and its encrypted copy. Make sure the system where you collect this data is protected at the same level as the data itself. For example, if you’re backing up production data and you keep separate dev and production environments, don’t run this backup in your dev environment or expose credentials there that don’t belong there.

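Compress the data before you encrypt it: encrypted output is effectively random and won’t compress, so compressing afterwards gains nothing.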
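A minimal sketch of the collection step, assuming the data already sits in local paths (for a database you would first write a dump into one of them). `make_backup_archive` is a hypothetical helper, and sizing the free-space check at twice the uncompressed input is a conservative stand-in for the guidance above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: refuse to run unless the output directory has at least
# twice the size of the input data free, then write a gzip-compressed tarball.
# Usage: make_backup_archive OUTPUT.tar.gz PATH...
make_backup_archive() {
  local out="$1"; shift
  local need_kb avail_kb
  # Sum the input sizes in KiB (du -sk prints "SIZE<TAB>PATH" per argument).
  need_kb=$(du -sk "$@" | awk '{ s += $1 } END { printf "%.0f\n", s * 2 }')
  # Available KiB on the filesystem that will hold the output file.
  avail_kb=$(df -Pk "$(dirname "$out")" | awk 'NR == 2 { print $4 }')
  if [ "$avail_kb" -lt "$need_kb" ]; then
    echo "not enough disk space: need ${need_kb}K, have ${avail_kb}K" >&2
    return 1
  fi
  # Compress before encrypting; ciphertext does not compress.
  tar -czf "$out" "$@"
}
```

For example, `make_backup_archive backup.tar.gz /var/backups/db-dump /srv/git` (hypothetical paths).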

Encrypt the backups

Now that we have a public encryption key (backup_key.pub.pem) and a file to backup (we’ll call it backup.tar.gz), we need to encrypt it before we send it anywhere.

To do this properly, we’ll perform the following:

Generate a random AES-256 key.
Encrypt the backup.tar.gz using the AES key and write it to a timestamped file.
Encrypt the AES key using the RSA public key backup_key.pub.pem and write it to a file.

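RSA can only encrypt a small amount of data directly (at most 501 bytes with a 4096-bit key and the default PKCS#1 v1.5 padding), so we use the RSA key only to encrypt the AES key and let AES do the fast bulk encryption. Strictly speaking, the random value is passed to openssl enc as a passphrase, from which it derives the actual AES-256 key. We’ll generate and use this key in memory so it never touches disk.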

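# Encrypt backup file: a hex key (random binary could contain a newline, where -pass stops reading), held only in a shell variable, never on disk
FILEIN=backup.tar.gz; FILEPREFIX="$(date +"%Y%m%d%H%M")-$HOSTNAME"; AESKEY="$(openssl rand -hex 64)"
printf '%s\n' "$AESKEY" | openssl enc -aes-256-cbc -salt -pass stdin -in "$FILEIN" -out "$FILEPREFIX-$FILEIN.enc"
printf '%s\n' "$AESKEY" | openssl rsautl -encrypt -pubin -inkey backup_key.pub.pem -out "$FILEPREFIX-$FILEIN-aeskey.enc"; unset AESKEY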

You should now have two files:

201612241452-backupmaker.yourcompany.com-backup.tar.gz.enc
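201612241452-backupmaker.yourcompany.com-backup.tar.gz-aeskey.enc

As you can see, the file names include the date and time, the server the backup was made on, the name of the original file, and an extension marking each file as either the encrypted backup or its encrypted key. Both files share the same timestamp prefix. If you’re recovering from a disaster, or want to clean up old backups, you don’t want to have to decrypt all these massive backup files to find the one you want.

To recover a backup, reverse the process using the private RSA key (backup_key.priv.pem), which you’ll first need to decrypt as described above. Note that OpenSSL 1.1.0 changed the default digest openssl enc uses to derive a key from a passphrase (from MD5 to SHA-256), so if a backup was encrypted with OpenSSL 1.0.x and you recover it with 1.1.0 or later, add -md md5 to the decryption command:

# Recover encrypted backups
# First decrypt the AES key with the private RSA key
openssl rsautl -decrypt -inkey backup_key.priv.pem -in 201612241452-backupmaker.yourcompany.com-backup.tar.gz-aeskey.enc -out aeskey.txt
# Now decrypt the backup using that key
openssl enc -d -aes-256-cbc -in 201612241452-backupmaker.yourcompany.com-backup.tar.gz.enc -out backup.tar.gz -pass file:./aeskey.txt
# Delete the plaintext AES key once the backup is decrypted
rm aeskey.txt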
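Because the timestamp prefix is fixed-width, the newest backup for a host sorts last, so you can pick it (and its matching key file) without decrypting anything. A sketch, assuming the backups have been downloaded into one local directory; `latest_backup` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the newest encrypted backup for HOST in DIR and
# its matching encrypted AES key, one per line. Relies on the naming scheme
# YYYYMMDDHHMM-HOST-NAME.enc / YYYYMMDDHHMM-HOST-NAME-aeskey.enc, where both
# files share one prefix because the encryption command computes it once.
# Usage: latest_backup DIR HOST
latest_backup() {
  local dir="$1" host="$2" data key
  # A fixed-width timestamp sorts lexically in chronological order.
  data=$(find "$dir" -maxdepth 1 -type f -name "[0-9]*-$host-*.enc" ! -name '*-aeskey.enc' \
           | sort | tail -n 1)
  [ -n "$data" ] || { echo "no backups for $host in $dir" >&2; return 1; }
  key="${data%.enc}-aeskey.enc"
  [ -f "$key" ] || { echo "missing key file $key" >&2; return 1; }
  printf '%s\n%s\n' "$data" "$key"
}
```

For example, `latest_backup ./restore backupmaker.yourcompany.com` prints the two file names to feed into the recovery commands above.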

Setting up a location for your backups and getting them there

To keep this article from getting too long, I’ve put the details of setting up a storage location for your backups, and of sending your backups there, in separate articles for each cloud provider. So far I’ve written articles for:

Using Google for backups
Using AWS for backups
Using Azure for backups (not yet written)

Automate your backups

Set up a way to back up your files automatically, using a script run from a cron job.

That script should begin with:

#!/bin/bash
set -euo pipefail
logger "Starting backups for X"

and end with:

logger "Backup complete for X"

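The set -euo pipefail line makes the script exit as soon as any command (or any stage of a pipeline) fails or an unset variable is used, so the log message Backup complete for X is written to your syslog only when your backup actually completes.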

Create an alert rule in your SIEM that looks for that message and, if it hasn’t seen it in the past 24 hours, raises an alert that creates a ticket for someone to investigate.
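A variant of the start/end logging described above can also log failures explicitly, which gives you something to alert on immediately instead of waiting for the missing success message. The sketch below assumes `logger` is available; `with_backup_logging`, the script paths, and the schedule are hypothetical. The success message is still logged only when the wrapped command exits with status 0, so the SIEM rule keeps working.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: log a start message, run the backup command, then log
# the completion message only on success, or an error-priority message with
# the exit status on failure. Returns the command's exit status.
# Usage: with_backup_logging NAME COMMAND [ARGS...]
with_backup_logging() {
  local name="$1"; shift
  local status=0
  logger "Starting backups for $name"
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    logger "Backup complete for $name"
  else
    logger -p user.err "Backup FAILED for $name (exit status $status)"
  fi
  return "$status"
}

# Example crontab line (hypothetical paths), running daily at 02:30, where
# backup-db.sh is a bash script using set -euo pipefail that collects,
# compresses, encrypts and uploads the backup:
#   30 2 * * * /bin/bash -c 'source /usr/local/lib/backup-logging.sh; with_backup_logging db /usr/local/bin/backup-db.sh'
```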

Test your backups

The only thing worse than not having backups in a disaster is discovering that your backups are bad. “Bad backups” means your backups can’t be decrypted, or are corrupted, or are stale because your automated backup process stopped working months ago, or don’t include all the data you need to recover, or are inaccessible because the account where they were stored stopped being paid for, etc.
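A cheap automated guard against the first two failure modes (backups that can’t be decrypted or are corrupted) is to push a sample file through the same encrypt/decrypt steps and compare the result. A sketch, assuming OpenSSL on the PATH and the same `openssl enc` and `rsautl` options as the commands in this article; `roundtrip_check` is a hypothetical name. It does not replace a real restore test, since it never touches the storage location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: push FILE through this article's encrypt/decrypt steps
# (random hex AES key wrapped with the RSA public key), then check that the
# decrypted copy is byte-for-byte identical to the original.
# Keep ENC_OPTS identical to the options in your real backup/recovery commands.
ENC_OPTS=(-aes-256-cbc)

# Usage: roundtrip_check PUBKEY PRIVKEY FILE   (returns 0 only on an exact match)
roundtrip_check() {
  local pub="$1" priv="$2" file="$3" work aeskey status=1
  work="$(mktemp -d)" || return 1
  aeskey="$(openssl rand -hex 64)"
  if printf '%s\n' "$aeskey" | openssl enc "${ENC_OPTS[@]}" -salt -pass stdin \
         -in "$file" -out "$work/data.enc" \
     && printf '%s\n' "$aeskey" | openssl rsautl -encrypt -pubin -inkey "$pub" \
         -out "$work/aeskey.enc" \
     && openssl rsautl -decrypt -inkey "$priv" -in "$work/aeskey.enc" \
         -out "$work/aeskey.txt" \
     && openssl enc -d "${ENC_OPTS[@]}" -pass "file:$work/aeskey.txt" \
         -in "$work/data.enc" -out "$work/data.out" \
     && cmp -s "$file" "$work/data.out"; then
    status=0
  fi
  # Remove the temporary ciphertext, decrypted copy and unwrapped AES key.
  rm -rf "$work"
  return "$status"
}
```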

Do the following:

After you set up backups, test recovering the backed-up data. During this process, keep in mind what you would not have in a disaster. Would you have your private decryption key and the credentials for the backup location? Would you have the instructions you need to recover the backups? Do you have backups of everything this recovery process needs? If the answer to any of these questions is “No”, you need to back up more things, print hard copies, or take some other step.
Set up logging and alerting for your automated backups. You don’t want to find out during a disaster that your backups haven’t worked for the past few months because they ran out of disk space during creation or encryption, a firewall rule blocked their upload, the credential your automated backup system used was revoked, etc. Make sure your automation records any errors it encounters, and that those errors trigger alerts. You may also want a separate system, with read-only access to metadata only, that checks the backup location to ensure new files are being written there and are of the expected size.
In your runbook for standing up new resources, include a step for making sure the new resources are backed up, alongside your other “productionizing” steps such as collecting logs and setting up monitoring.
Set a quarterly or annual reminder to test your disaster recovery processes and to ensure any resources stood up in the past quarter or year are backed up.
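The freshness-and-size check from the list above could look like the following sketch. It assumes the backup location’s listing is visible as a local directory (for example a mounted or synced copy); with a cloud bucket you would instead list objects with metadata-only credentials, as covered in the provider articles. `check_backup_freshness` and its thresholds are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if DIR holds at least one encrypted backup
# (*.enc, excluding *-aeskey.enc) modified within the last MAX_AGE_MIN minutes
# and at least MIN_BYTES (>= 1) in size. Alert on a non-zero exit status.
# Usage: check_backup_freshness DIR MAX_AGE_MIN MIN_BYTES
check_backup_freshness() {
  local dir="$1" max_age_min="$2" min_bytes="$3" recent
  # find's -size +Nc means "more than N bytes", hence MIN_BYTES - 1.
  recent=$(find "$dir" -maxdepth 1 -type f -name '*.enc' ! -name '*-aeskey.enc' \
             -mmin "-$max_age_min" -size "+$((min_bytes - 1))c" | head -n 1)
  if [ -z "$recent" ]; then
    echo "ALERT: no backup in $dir newer than $max_age_min min and >= $min_bytes bytes" >&2
    return 1
  fi
  echo "OK: $recent"
}
```

For a daily backup, `check_backup_freshness /mnt/backups 1440 1048576` (hypothetical path and a 1 MiB floor) fails if nothing new and plausibly sized arrived in the last 24 hours.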

You can now sleep more soundly knowing you have securely backed up your critical data so you can recover in the event of a disaster.
