I first considered S3 for backups back in 2010. At the time, uploading to S3 cost money, and the other costs were much higher relative to today. I bought two USB drives, intending to back up regularly and rotate them to a relative’s house.

Guess how often that happened? (Hint: never on the rotation, and count-on-one-hand-ish times for actually backing up to the USB drives.)

I do have some automated backups for my websites’ data, and occasionally I’ll copy that to another drive, but not nearly often enough.

So now I’m in the middle of studying for an AWS cert, and S3 is much cheaper, so here goes: I’m going to extend my automated backup to also upload to S3 with “infrequent access” class storage, since I basically hope to never read these backups.

I had already made a backup bucket in August and manually uploaded the backup files I had. (I keep 30 daily db dumps and 5 weekly site file backups.) So I’ll just upload to the same folders I used. Eventually I’ll set a lifecycle policy to delete older objects, but I want to prove out the automation for a bit first, and I’d also like to figure out how to delete old backups while being sure I have multiple newer ones; e.g., if the automation quits working, I don’t want the lifecycle rule to age out and delete my S3 backups.
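
When I do get around to it, the rule itself should be simple; something like this boto3 sketch is roughly what I have in mind for the database folder (the 45-day window and rule ID are placeholders for illustration, and nothing here is applied yet):

import boto3

# Sketch only: expire database dumps after 45 days.
# The day count and rule ID are placeholders; this is not applied yet.
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="MYBUCKETNAME",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-db-dumps",
                "Filter": {"Prefix": "iis/database/"},
                "Status": "Enabled",
                "Expiration": {"Days": 45},
            }
        ]
    },
)

A plain rule like this has exactly the problem I just described, though: it would happily expire the old dumps even if no new ones ever arrived, which is why I’m holding off.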

  • I’ve been using AWS CLI to upload to my S3-backed websites, so I installed that on my web server
  • I created a new IAM user with API key auth
  • Under the account the scheduled backup task runs as, I configured the AWS CLI to use the API key I just created
  • I created a policy for that user that only allows it to upload to a couple of folders in my bucket. It took a couple of tries: I had to include s3:CreateMultipartUpload, and then I had to add /* to the end of my resource lines before the policy worked. I also listed two resources because my data was already arranged in these two folders and I wanted to restrict where this account can upload.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1485479979000",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:CreateMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::MYBUCKETNAME/iis/database/*",
                "arn:aws:s3:::MYBUCKETNAME/iis/inetpub/*"
            ]
        }
    ]
}
  • I used Windows encryption to encrypt the AWS credentials file so that only one user can decrypt it
  • I added the aws upload command for the backup file to the batch file (I’ve been using this backup script for many years, since before I used PowerShell regularly, so it’s a .bat file)
aws s3 cp --storage-class STANDARD_IA "%backupfldr%FullBackup.%backuptime%.7z" s3://MYBUCKETNAME/iis/database/
  • And it works! I triggered the task to ensure it runs from the scheduler.

  • And modifying the file backup script is now trivial:

aws s3 cp --storage-class STANDARD_IA "%backupfldr%%backuptime%-inetpub-full.7z" s3://MYBUCKETNAME/iis/inetpub/
  • I just kicked off the file backup task. That will take a while, but I expect it to complete successfully.

I don’t have lifecycles defined yet, but all the big files are set to the infrequent access class. As for how to age out and delete backups while programmatically ensuring there are other backups present, I think I might try writing a Lambda function to manage that. Yeah, I was also thinking about using Lambda to periodically move files to a different folder: the restricted user can’t delete, but it could overwrite files by re-uploading them in the unlikely case that someone got hold of the account and wanted to delete or corrupt my backups.
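
A rough sketch of the age-and-delete part (the age cutoff, minimum count, and function name are placeholders I made up; none of this is written yet):

from datetime import datetime, timedelta, timezone
import boto3

BUCKET = "MYBUCKETNAME"
PREFIX = "iis/database/"
MAX_AGE_DAYS = 45  # placeholder
MIN_NEWER = 10     # placeholder: require this many recent backups before deleting anything

def prune_old_backups():
    s3 = boto3.client("s3")
    cutoff = datetime.now(timezone.utc) - timedelta(days=MAX_AGE_DAYS)

    # Gather everything under the backup prefix.
    objects = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
        objects.extend(page.get("Contents", []))

    newer = [o for o in objects if o["LastModified"] >= cutoff]
    old = [o for o in objects if o["LastModified"] < cutoff]

    # Safety check: never delete unless plenty of newer backups are present.
    if len(newer) < MIN_NEWER:
        print(f"Only {len(newer)} recent backups found; skipping deletion.")
        return

    for obj in old:
        s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
        print(f"Deleted {obj['Key']}")

Run on a schedule, that would only ever age out old dumps when a healthy number of newer ones are actually present.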

Come to think of it, I could have Lambda trigger on PUTs to those folders, and:

  • Move the object out of harm’s way (sketched after this list)
  • Remove any aged-out objects while ensuring enough are left
  • Alert me if there has been a gap in upload times … actually, I might need a different trigger and script to alert me if backups quit uploading for a couple of days (also sketched below)
  • Keep some backups for longer periods, e.g. one backup per year for 5 years and one per month for the past 6 months, in addition to the last 30 daily backups
  • Going overboard here, but also mentally playing with Lambda: can I have it message me about files marked for deletion and apply a lifecycle so they’re deleted a week later?
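
For the first bullet, a skeleton of an S3-PUT-triggered handler might look something like this (the “safe/” prefix is a name I just made up, and none of this is deployed):

import urllib.parse
import boto3

s3 = boto3.client("s3")
SAFE_PREFIX = "safe/"  # placeholder prefix the restricted upload user has no access to

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Copy the just-uploaded backup out of harm's way.
        s3.copy_object(
            Bucket=bucket,
            CopySource={"Bucket": bucket, "Key": key},
            Key=SAFE_PREFIX + key,
            StorageClass="STANDARD_IA",
        )
        print(f"Copied {key} to {SAFE_PREFIX + key}")

The aged-out cleanup in the second bullet could reuse the pruning sketch above, either called from this handler or on its own schedule.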

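And for the alert-on-a-gap idea, a separate scheduled check could be as simple as this (the SNS topic ARN and the two-day threshold are placeholders):

from datetime import datetime, timedelta, timezone
import boto3

BUCKET = "MYBUCKETNAME"
PREFIX = "iis/database/"
TOPIC_ARN = "arn:aws:sns:REGION:ACCOUNT_ID:backup-alerts"  # placeholder
MAX_GAP = timedelta(days=2)  # placeholder threshold

def check_backup_gap(event=None, context=None):
    s3 = boto3.client("s3")

    # Find the newest object under the backup prefix.
    newest = None
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            if newest is None or obj["LastModified"] > newest:
                newest = obj["LastModified"]

    # Alert if nothing has been uploaded recently.
    if newest is None or datetime.now(timezone.utc) - newest > MAX_GAP:
        boto3.client("sns").publish(
            TopicArn=TOPIC_ARN,
            Subject="Backup uploads have stopped",
            Message=f"Newest backup under s3://{BUCKET}/{PREFIX} is from {newest}.",
        )
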
The backup data I uploaded a few months ago has been running me under $2 per month, and that was with standard class storage instead of infrequent access. So, offsite automated backups accomplished! Pretty cheaply, too.