Using AWS for backups

2017.01.29

RSS feed

This article should be read in combination with Creating disaster recovery backups which discusses best practices for backups including how to encrypt them properly. This article describes how to:

  • Setup a location on Amazon Web Services (AWS) for your backups.
  • How to create users with limited permissions to perform the tasks needed in having a backup system.
  • How to get your files there and back out.

Google Cloud

Choosing your storage class

AWS offers three types of storage depending on your access needs: S3 (Simple Storage Service), S3 IA (S3 Infrequent Access), and Glacier. S3 is the main storage class, and Glacier is more specifically for backups. S3 IA is an in-between class with cheaper storage than S3, but with higher retrieval costs.

Glacier can take up 5 hours to retrieve data from it. Glacier offers expedited retrieval as well, for an added cost, that reduces the retrieval time down to 5 minutes. We'll only consider normal Glacier retrievals for this article, which is good enough for most Disaster Recovery plans, because in the event that your entire business get's wiped out, there are a lot of other things you'll be doing in the first 5 hours.

Glacier also offers something called Vault Lock, which can be used to enforce "Write Once Read Many" (WORM) policies, that are immutable, so even if someone compromises your root account at AWS, they can't destroy your backups.

Let's look briefly at the costs and other considerations of Glacier. Given that we're doing nightly backups, we don't need to worry about some things, such as cost per object creation that range from $0.005 to $0.05 per 1000 requests, or per object retrieval, because I assume we're just storing one or handful of large blobs per night and for disaster recovery we'll only be retrieving one of those nightly sets of blobs.

S3 S3 IA Glacier
Cost per GB storage (first 50TB/month) $0.023 $0.0125 $0.004
Minimum storage time N/A 30 days 90 days
Retrieval time Instant Instant Up to 5 hours (or as little as 5 minutes for more money)
Retrieval cost N/A $0.01/GB $0.10/GB (standard, $0.01/GB retrieval + $0.09/GB data)

So let's say you had 100GB that you wanted to do nightly backups of that you keep for one year. After one year with Glacier, this would cost you $1752/yr. This can be compared with Google Compute Engine (GCE) which costs $3066/yr for the same thing due to its $0.007/GB/month storage cost. Additionally, as I'll discuss, you get a lot of additional benefits with AWS, with regard to permissions and policies, that GCE doesn't offer.

Retrieval from Glacier for this 100GB will cost $10. Please refer to the official docs to confirm pricing for your situation. There are a number of articles such as How I ended up paying $150 for a single 60GB download from Amazon Glacier, but as far as I can tell that is based on an older pricing model.

AWS has thought a lot about Disaster Recovery for businesses, so they offer all sorts of additional capabilities, such as being able to ship you actual hard-drives full of your data if you have so much that a truck full of hard-drives is faster than downloading the data over the Internet. See their white paper on Disaster Recovery for more.

Setup location and accounts

Once you have an account with AWS, we'll first create the location where backups will be saved.

Create location

First, we need to setup the location to store our backups. From the AWS console, click the "Services" menu and search for "Glacier".

AWS - Search for Glacier

You should now be at https://console.aws.amazon.com/glacier/home and see a welcome screen, where you'll click "Get started".

Glacier welcome screen

Enter the name of your Vault.

Enter Vault name

Leave the default "Do not enable notifications" Set Event Notifications

Hit Submit on the Review screen. Review

The Glacier Vault has now been created. Created

Click on your Vault and write down your Vault ARN (Amazon Resource Name) for later, which will look like: arn:aws:glacier:us-east-1:1234567890:vaults/summitroute_backup where 1234567890 is your Account Number. If you ever need that number, it can be easily found by looking at the top of https://console.aws.amazon.com/support/ or many other places throughout the AWS console.

Vault ARN

Create a service account to write the backups

We now need an account to write our backups to Glacier. First, we'll create a policy that only allows writing to the Vault, and then we'll create a user with this policy assigned to it.

Create policy with write-only permission

Click on "Services" again and search for IAM, so you're taken to https://console.aws.amazon.com/iam

Select "Policies", then "Create policy".

Click Create Policy

Click "Create your own policy"

Choose Create your own policy

Name the policy "BackupWriter", give it a description, and copy/paste in the policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glacier:UploadArchive",
                "glacier:InitiateMultipartUpload",
                "glacier:AbortMultipartUpload",
                "glacier:CompleteMultipartUpload"
            ],
            "Resource": "arn:aws:glacier:us-east-1:1234567890:vaults/summitroute_backup"
        }
    ]
}

Change the ARN to the one you wrote down previously. "Validate" your policy, and then click "Create Policy".

Policy document

Create the user and apply this policy

Choose Users -> Add user

Add user

Name the user "backup_writer", check the box for "Programmatic access", and click "Next: Permissions".

Name the user

Click "Attach existing policies directly", and select our "BackupWriter" policy.

Attach our policy

Create the user.

Create user

Once created, it will show you an Access key ID and Secret access key, which can be thought of as a username and password for the user. Write these down to use later.

Send backups

Install the command line tools

We'll be writing our backups from a Debian Linux system. First, install the command-line tools:

sudo apt-get install awscli

Configure it using the Access key ID and Secret access key from when you created the backup_writer user. If you need to, you can create a new access key for that user and delete the old one. I also use the region us-east-1 (Virginia) and the output format json.

user@ubuntu:~$ aws configure
AWS Access Key ID [None]: AKXXXXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Default region name [None]: us-east-1
Default output format [None]: json

This stores your credentials as plain-text in ~/.aws/credentials

Send files

If you followed my guide on Creating Disaster Recovery backups you should have a file like 201701291452-backupmaker.yourcompany.com-backup.tar.gz.enc that you want to backup. We send it as follows:

aws glacier upload-archive --body 201701291452-backupmaker.yourcompany.com-backup.tar.gz.enc --account-id - --vault-name summitroute_backup --archive-description 201701291452-backupmaker.yourcompany.com-backup

# Returns:
#{
#    "archiveId": "GjJoudd9hC3onlE1ZLof0T3vFRw7XSVHIW4V07GtwF5sf61BKBNF-sZkNbOPcsUID7qzfvzDRQTSOAqdNI7GiUVi97BGZYu80Y9m4cNRzGnMV6WN82yEdQuoAJ5x_BsRkrWmswTbHA",
#    "location": "/XXXXXXXXXXXX/vaults/summitroute_backup/archives/GjJoudd9hC3onlE1ZLof0T3vFRw7XSVHIW4V07GtwF5sf61BKBNF-sZkNbOPcsUID7qzfvzDRQTSOAqdNI7GiUVi97BGZYu80Y9m4cNRzGnMV6WN82yEdQuoAJ5x_BsRkrWmswTbHA",
#    "checksum": "a948904f2f0f479b8f8197694b30184b0d2ed1c1cd2a1ec0fb85d299a192a447"
#}

Your file has now been uploaded, and you can sleep peacefully.

Note: Your files will NOT immediately appear. Glacier collects an inventory of it's files once a day, which means that your files will NOT show up for a day.

No inventory

It is normal for your files not to appear for up to 24 hours

Items in inventory

Eventually your inventory will include your items (I uploaded two test files)

Create accounts for others to retrieve the backups

Similar to creating our "BackupWriter" policy, create a policy called BackupReader with the following permissions (change 1234567890 to your account number).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "glacier:DescribeJob",
                "glacier:DescribeVault",
                "glacier:GetDataRetrievalPolicy",
                "glacier:GetJobOutput",
                "glacier:GetVaultAccessPolicy",
                "glacier:GetVaultLock",
                "glacier:GetVaultNotifications",
                "glacier:ListJobs",
                "glacier:ListMultipartUploads",
                "glacier:ListParts",
                "glacier:ListTagsForVault",
                "glacier:ListVaults",
                "glacier:InitiateJob"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:glacier:us-east-1:1234567890:vaults/summitroute_backup"
        }
    ]
}

Once you've created a new user, you can configure the AWS tools on a different system to be able to read your backups.

For testing on the same system as your writer, you can create a second profile for the aws command to use by running aws configure --profile backupreader. Then when you need to make aws calls, you'll use aws glacier --profile backupreader ... You do not want to leave both the reader and writer permissions on the same system though.

Download backups

As mentioned previously, Glacier takes up to 5 hours to retrieve your backups, so downloading them requires multiple steps. The first step is to initiate a job request to make them available for download:

aws glacier initiate-job --vault-name summitroute_backup --account-id - --job-parameters '{"Type": "archive-retrieval", "ArchiveId": "GjJoudd9hC3onlE1ZLof0T3vFRw7XSVHIW4V07GtwF5sf61BKBNF-sZkNbOPcsUID7qzfvzDRQTSOAqdNI7GiUVi97BGZYu80Y9m4cNRzGnMV6WN82yEdQuoAJ5x_BsRkrWmswTbHA"}'
# Returns:
# {
#    "jobId": "No-vD4Wb4MOUxRIpmsbDXW5AzMiie0O0DxCDjJdyppF2MkyXXOyjw0hQBeWI7MA0Rl1n5kwQjGRkE-Lv53CiJ9pcZVeA",
#    "location": "/1234567890/vaults/summitroute_backup/jobs/No-vD4Wb4MOUxRIpmsbDXW5AzMiie0O0DxCDjJdyppF2MkyXXOyjw0hQBeWI7MA0Rl1n5kwQjGRkE-Lv53CiJ9pcZVeA"
# }

Now you can periodically check the status of your job:

aws glacier --profile god describe-job --vault-name summitroute_backup --account-id - --job-id No-vD4Wb4MOUxRIpmsbDXW5AzMiie0O0DxCDjJdyppF2MkyXXOyjw0hQBeWI7MA0Rl1n5kwQjGRkE-Lv53CiJ9pcZVeA

# Returns:
# {
#    "JobId": "No-vD4Wb4MOUxRIpmsbDXW5AzMiie0O0DxCDjJdyppF2MkyXXOyjw0hQBeWI7MA0Rl1n5kwQjGRkE-Lv53CiJ9pcZVeA",
#    "StatusCode": "InProgress",
#    "Action": "ArchiveRetrieval",
#    "ArchiveSizeInBytes": 12,
#    "Completed": false,
#    "CreationDate": "2017-01-29T18:57:21.211Z",
#    "VaultARN": "arn:aws:glacier:us-east-1:1234567890:vaults/summitroute_backup",
#    "ArchiveSHA256TreeHash": "a948904f2f0f479b8f8197694b30184b0d2ed1c1cd2a1ec0fb85d299a192a447",
#    "RetrievalByteRange": "0-11",
#    "SHA256TreeHash": "a948904f2f0f479b8f8197694b30184b0d2ed1c1cd2a1ec0fb85d299a192a447",
#    "ArchiveId": "GjJoudd9hC3onlE1ZLof0T3vFRw7XSVHIW4V07GtwF5sf61BKBNF-sZkNbOPcsUID7qzfvzDRQTSOAqdNI7GiUVi97BGZYu80Y9m4cNRzGnMV6WN82yEdQuoAJ5x_BsRkrWmswTbHA"
# }

We need to wait until Completed is true. A successful retrieval will look like this:

{
    "ArchiveId": "GjJoudd9hC3onlE1ZLof0T3vFRw7XSVHIW4V07GtwF5sf61BKBNF-sZkNbOPcsUID7qzfvzDRQTSOAqdNI7GiUVi97BGZYu80Y9m4cNRzGnMV6WN82yEdQuoAJ5x_BsRkrWmswTbHA",
    "StatusMessage": "Succeeded",
    "CreationDate": "2017-01-29T18:57:21.211Z",
    "StatusCode": "Succeeded",
    "CompletionDate": "2017-01-29T22:47:24.136Z",
    "JobId": "No-vD4Wb4MOUxRIpmsbDXW5AzMiie0O0DxCDjJdyppF2MkyXXOyjw0hQBeWI7MA0Rl1n5kwQjGRkE-Lv53CiJ9pcZVeA",
    "RetrievalByteRange": "0-11",
    "Action": "ArchiveRetrieval",
    "Completed": true,
    "SHA256TreeHash": "a948904f2f0f479b8f8197694b30184b0d2ed1c1cd2a1ec0fb85d299a192a447",
    "ArchiveSHA256TreeHash": "a948904f2f0f479b8f8197694b30184b0d2ed1c1cd2a1ec0fb85d299a192a447",
    "ArchiveSizeInBytes": 12,
    "VaultARN": "arn:aws:glacier:us-east-1:1234567890:vaults/summitroute_backup"
}

You can also list all jobs with: bash aws glacier list-jobs --vault-name summitroute_backup --account-id -

Once your ArchiveRetrieval is complete, you can download the file with:

aws glacier get-job-output --vault-name summitroute_backup --account-id - --job-id No-vD4Wb4MOUxRIpmsbDXW5AzMiie0O0DxCDjJdyppF2MkyXXOyjw0hQBeWI7MA0Rl1n5kwQjGRkE-Lv53CiJ9pcZVeA 201701291452-backupmaker.yourcompany.com-backup.tar.gz.enc
# Returns:
# {
#    "acceptRanges": "bytes",
#    "status": 200,
#    "archiveDescription": "201701291452-backupmaker.yourcompany.com-backup",
#    "contentType": "application/octet-stream",
#    "checksum": "a948904f2f0f479b8f8197694b30184b0d2ed1c1cd2a1ec0fb85d299a192a447"
#}

You will then have your backup file locally.

Your file will be available for download for 24 hours before you need to iniate another job to retrieve it.

Other considerations

  • Multipart uploads: In my examples, I upload the file in one step and downloaded it in one step. It is also possible to upload portions of a file at a time. This is mostly useful so that you don't spend hours uploading a file only for it to fail for some reason and you have to start all over again. By uploading files in chunks, you could ensure that you only need to resend the last few minutes worth a file upload.
  • Vault Access Policies and Vault Lock policies: You can create special policies on your Glacier Vaults in addition to IAM restrictions. These can do things like ensure Vaults are not deleted that are less than 365 days old.
  • Provisioned capacity for expedited retrievals: As mentions, retrievals can take up to 5 hours, but can be reduced to as little as 5 minutes. This is accomplished using Provisioned capacity.

Conclusion

AWS has created a very mature platform to use as storage for Disaster Recovery. It provides all sorts of permissions and configurations to suit any need. This can be very complicated if your needs are simple and you just want an alternative cloud provider to use for backup storage, but I highly recommend using AWS for this purpose.