From an earlier post:

Somewhere in the pipeline is declarative provisioning and configuration lab work which might go better in the cloud, but I might be able to do it at home, too, with up to four Hyper-V hosts ready to go.

I also have a cloud at work, and some of the things I want to do are relevant to my job.

I keep forgetting how recent a lot of my knowledge is. That was only 15 months ago. Since then I’ve done a fair bit of provisioning and configuration automation at work. I can tear down and rebuild about a dozen servers at a whim to set up an application cluster (and I have).

This weekend I was reviewing my old cloud posts, looking at Amazon S3 storage again, and wondering why I hadn’t done it yet. Seeing how recent that post is makes me realize that my needs are different now than they were the last time I thought about S3 or cloud VMs.

I thought I had poked at a couple of S3 buckets before, but apparently not, or I deleted them immediately. For work and career reasons I feel I need to do more with cloud APIs, and the S3 API is immediately relevant. Ceph is a free product that became production-ready three or four months ago and has an S3-compatible API. While my products at work have current blockers to using the public cloud, the company as a whole wants to be cloud-first. (Although for many applications, an internal cloud is required for various security reasons.) So Ceph/S3 may solve some problems for my product going forward.

So, I decided to put a site on S3. (S3 can be used as a static web host without the need for an EC2 host.) I arbitrarily picked jimnelson.us, which can also be referred to as www.jimnelson.us, but the ‘proper’ name has been without the www. For technical reasons (naked/apex DNS names can’t properly use a CNAME record) I’ll have to either move some of my DNS to Amazon Route 53 or go with the www version and redirect from the naked domain.
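
For reference, here’s roughly what that setup looks like through the API instead of the console. A minimal boto3 sketch, not exactly what I clicked through; note that the bucket name has to match the website hostname for the CNAME trick to work.

import boto3

s3 = boto3.client('s3')

# The bucket name must match the website hostname exactly.
s3.create_bucket(Bucket='www.jimnelson.us')

# Turn on static website hosting with an index and an error page.
s3.put_bucket_website(
    Bucket='www.jimnelson.us',
    WebsiteConfiguration={
        'IndexDocument': {'Suffix': 'index.html'},
        'ErrorDocument': {'Key': 'error.html'},
    },
)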

In addition to enabling static web access, I had to add a policy allowing anonymous users to read the bucket’s objects over the web. (Copied and modified; not sure what the “Version” is about yet.)

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::www.jimnelson.us/*"
        }
    ]
}
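
(It turns out “Version” is the version of the policy language itself; 2012-10-17 is the current revision of the grammar, not a date you pick.) The same policy can be attached through the API, too; something like this boto3 sketch:

import json
import boto3

s3 = boto3.client('s3')

policy = {
    "Version": "2012-10-17",  # policy-language version, not a timestamp you choose
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::www.jimnelson.us/*",
    }],
}

s3.put_bucket_policy(Bucket='www.jimnelson.us', Policy=json.dumps(policy))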

And then I changed www.jimnelson.us to be a CNAME to www.jimnelson.us.s3-website-us-east-1.amazonaws.com after updating the site to prefer www (canonical name in the HTML header).
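
If I do end up moving DNS into Route 53, the equivalent record change would look something like this (the hosted zone ID below is a placeholder):

import boto3

route53 = boto3.client('route53')

route53.change_resource_record_sets(
    HostedZoneId='Z0PLACEHOLDER',  # real value comes from the Route 53 hosted zone
    ChangeBatch={
        'Comment': 'Point www at the S3 website endpoint',
        'Changes': [{
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': 'www.jimnelson.us.',
                'Type': 'CNAME',
                'TTL': 300,
                'ResourceRecords': [
                    {'Value': 'www.jimnelson.us.s3-website-us-east-1.amazonaws.com'},
                ],
            },
        }],
    },
)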

I also enabled logging, which goes into a different bucket.
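
That’s also a one-call change through the API once the target bucket exists; a sketch, with the log bucket name made up:

import boto3

s3 = boto3.client('s3')

# The target bucket must grant write access to S3's log-delivery group.
s3.put_bucket_logging(
    Bucket='www.jimnelson.us',
    BucketLoggingStatus={
        'LoggingEnabled': {
            'TargetBucket': 'jimnelson-logs',    # hypothetical log bucket
            'TargetPrefix': 'www.jimnelson.us/',
        },
    },
)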

S3 isn’t actually a filesystem, and one of my recent questions has been the difference between a filesystem (what everyone is used to) and an object store (like S3). One difference is that you can’t just append bits to a file like you can on a filesystem; you have to re-upload the entire file/object. So log files don’t seem to be a good fit for S3. Curiously, S3’s built-in logging seems to simply create a new file for every log entry…or maybe it stuffs a few into each file if the hits come in close enough together.
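
To make that concrete, here’s what “appending” to an object works out to: pull the whole thing down, concatenate, and push the whole thing back up. (The key name is made up.)

import boto3

s3 = boto3.client('s3')
bucket, key = 'www.jimnelson.us', 'logs/app.log'  # hypothetical object

# There is no append: fetch the entire object...
body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()

# ...then re-upload the whole thing with the new bytes tacked on.
s3.put_object(Bucket=bucket, Key=key, Body=body + b'one more log line\n')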

(Another difference is mostly invisible: each object can live in a different physical place, which has caused at least one person on the Internet large delays when accessing a large number of files with wildcards.)

So basically each file/object is like a book you check out and put back on the shelf in its entirety, whereas on most local and network filesystems you can access bits at a time or continually add to the end of a file. (Actually, now that I say that…I wonder: if you had a big movie file, could you play it from the middle? Need to check that out. But incremental or partial updates are definitely an anti-pattern for object storage.)
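
As it turns out, partial reads are fine: S3 honors the HTTP Range header on GET, so you can pull an arbitrary byte range out of a big object without downloading the whole thing. It’s partial writes that don’t exist. A quick sketch (the key is made up):

import boto3

s3 = boto3.client('s3')

# Grab one mebibyte starting at the 100 MiB mark of a large object.
resp = s3.get_object(
    Bucket='www.jimnelson.us',
    Key='videos/big-movie.mp4',         # hypothetical key for illustration
    Range='bytes=104857600-105906175',  # inclusive byte range, per HTTP
)
chunk = resp['Body'].read()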

Also, prices for S3 storage seem to have fallen since I first considered it for backup in 2010. In that thread I mentioned buying two external USB drives, which I still have but almost never use. I have automatic local backups happening in a couple of places, but a drive failure could lose it all. Well, all of it since the last USB-drive backup. With S3 (or another online service) I could have the script upload as well. And S3 has good security controls and automation, so I could fairly easily arrange for a backup script that has limited access to S3 and can’t delete its uploads if compromised, and I could even have S3 rotate files to cheaper, slower storage. For a price, but in reality the USB backup idea isn’t “working”.
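
Sketching that arrangement out (all names here are placeholders): the backup script gets an IAM user that can only upload, and a lifecycle rule ages uploads into Glacier.

import json
import boto3

iam = boto3.client('iam')
s3 = boto3.client('s3')

# Write-only access: the script can upload, but can't delete or list.
# (PutObject still allows overwrites, so turn on bucket versioning too
# if overwrite protection matters.)
write_only = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::jimnelson-backups/*",  # hypothetical bucket
    }],
}
iam.put_user_policy(
    UserName='backup-script',  # hypothetical IAM user for the script
    PolicyName='backup-write-only',
    PolicyDocument=json.dumps(write_only),
)

# Rotate uploads to cheaper, slower storage after 30 days.
s3.put_bucket_lifecycle_configuration(
    Bucket='jimnelson-backups',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'age-out-backups',
            'Filter': {'Prefix': ''},  # whole bucket
            'Status': 'Enabled',
            'Transitions': [{'Days': 30, 'StorageClass': 'GLACIER'}],
        }],
    },
)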

Google and Microsoft have cloud storage too, as do others, but AWS is the big dog, followed by Google and MS. My immediate interest is getting used to AWS’s S3 API and perhaps Microsoft’s OneDrive for Business.

Observations: After considering online backup in 2010, I bought two 2 TB external drives and haven’t really made use of them. (In fact, now that I’m thinking about it, I’ll do a one-time upload of my 54 GB of backups to S3 and see what that bill is.)

And in the past 15 months I’ve bought two new servers to replace my old lab stuff. Those I’m actually making use of, but now I’m looking for reasons to use the cloud again.