Cost of Transitioning S3 Objects to Glacier


how I was surprised by a large AWS charge and how to calculate the break-even point

Glacier Archival of S3 Objects

Amazon recently introduced a fantastic new feature where S3 objects can be automatically migrated over to Glacier storage based on the S3 bucket, the key prefix, and the number of days after object creation.

This makes it trivially easy to drop files in S3, have fast access to them for a while, then have them automatically saved to long-term storage where they can’t be accessed as quickly, but where the storage charges are around a tenth of the price.

…or so I thought.

S3 Lifecycle Rule

My first use of this feature was on some buckets where I store about 350 GB of data that fits the Glacier use pattern perfectly: I want to save it practically forever, but expect to use it rarely.

It was straightforward to use the S3 Console to add a lifecycle rule to the S3 buckets so that all objects are archived to Glacier after 60 days:

(Screenshot: the lifecycle rule as configured in the S3 Console)

(Long time readers of this blog may be surprised I didn’t list the command lines to accomplish this task, but Amazon has not yet released useful S3 tools that include the required functionality.)
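If you are reading this with newer tooling, the same rule can be expressed programmatically. Here is a minimal sketch using boto3, which postdates the tooling complaint above; the bucket name is a placeholder and the empty prefix applies the rule to every object in the bucket:

    # Sketch: configure a 60-day transition-to-Glacier lifecycle rule with boto3.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="my-archive-bucket",  # hypothetical bucket name
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-to-glacier-after-60-days",
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},  # apply to all objects
                    "Transitions": [
                        {"Days": 60, "StorageClass": "GLACIER"},
                    ],
                }
            ]
        },
    )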

Since all of the objects in the buckets were more than 60 days old, I expected them to be transitioned to Glacier within a day, and true to Amazon’s documentation, this occurred on schedule.

Surprise Charge

What I did not expect was an email alert from my AWS billing alarm monitor on this account letting me know that I had just passed $200 for the month, followed a few hours later by an alert for $300, and then another for the $400 trigger.

This is one of my personal accounts, so a rate of several hundred dollars a day is not sustainable. Fortunately, a quick investigation showed that the increase was due to one-time charges, so I wasn't about to run up a $10k monthly bill.

The line item on the AWS Activity report showed the source of the new charge:

$0.05 per 1,000 Glacier Requests x 5,306,220 Requests = $265.31

It had not occurred to me that there would be much of a charge for transitioning the objects from S3 to Glacier. I should have read the S3 Pricing page, where Amazon states:

Glacier Archive and Restore Requests: $0.05 per 1,000 requests

This is five times as expensive as the initial process of putting objects into S3, which is $0.01 per 1,000 PUT requests.

There is one “archive request” for each S3 object that is transitioned from S3 to Glacier, and I had over five million objects in these buckets, something I didn’t worry about previously because my monthly S3 charges were based on the total GB, not the number of objects.
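The arithmetic behind that line item is easy to check. A quick sketch, using only the request prices quoted above:

    # Reproduce the surprise line item: one Glacier archive request per object.
    objects = 5_306_220
    archive_request_price = 0.05 / 1000   # $0.05 per 1,000 Glacier requests
    put_request_price = 0.01 / 1000       # $0.01 per 1,000 S3 PUT requests, for comparison

    print(f"Transition charge: ${objects * archive_request_price:,.2f}")  # ~$265.31
    print(f"Original S3 PUTs:  ${objects * put_request_price:,.2f}")      # ~$53.06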

Overhead per Glacier Object

josh.monet has pointed out in the comments that Amazon has documented some Glacier storage overhead:

For each S3 object migrated to Glacier, Amazon adds “an additional 32 KB of Glacier data plus an additional 8 KB of S3 standard storage data”.

Storage for this overhead is charged at the standard Glacier and S3 prices, which makes this Glacier archival feature a poor fit for small objects.

Break-even Point

After stopping to think about it, I realized that I was still saving money in the long run by moving the objects in these S3 buckets to Glacier storage. The one-time, up-front cost would slowly be recouped by my monthly savings, because Glacier is cheap even compared to the reasonably cheap S3 storage, at least for larger files.

Here are the results of my calculations:

  • Monthly cost of storing in S3: 350 GB x $0.095/GB = $33.25

  • Monthly cost of storing in Glacier: $8.97

    • 350 GB x $0.01/GB = $3.50
    • Glacier overhead: 5.3 million x 32 KB x $0.01/GB = $1.62
    • S3 overhead: 5.3 million x 8 KB x $0.095/GB = $3.85
  • One time cost to transition 5.3 million objects from S3 to Glacier: $265

  • Months until I start saving money by moving to Glacier: 11

  • Savings per year after first 11 months: $291 (73%)
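Here is a short script that reproduces the numbers above. The prices are hard-coded from the 2012 us-east-1 figures quoted in this post and will not match current pricing:

    # Reproduce the cost comparison above from the published prices.
    GB = 2 ** 30
    objects = 5_306_220
    data_gb = 350

    s3_price = 0.095                      # $/GB-month, S3 standard (first tier)
    glacier_price = 0.01                  # $/GB-month, Glacier
    archive_request_price = 0.05 / 1000   # one archive request per object

    glacier_overhead = 32 * 1024          # bytes billed at Glacier rates per object
    s3_overhead = 8 * 1024                # bytes billed at S3 rates per object

    s3_monthly = data_gb * s3_price
    glacier_monthly = (
        data_gb * glacier_price
        + objects * glacier_overhead / GB * glacier_price
        + objects * s3_overhead / GB * s3_price
    )
    transition_cost = objects * archive_request_price
    monthly_savings = s3_monthly - glacier_monthly

    print(f"S3 monthly:        ${s3_monthly:.2f}")        # $33.25
    print(f"Glacier monthly:   ${glacier_monthly:.2f}")   # ~$8.97
    print(f"One-time cost:     ${transition_cost:.2f}")   # ~$265
    print(f"Break-even months: {transition_cost / monthly_savings:.1f}")  # ~11
    print(f"Yearly savings:    ${monthly_savings * 12:.2f}")              # ~$291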

For this data’s purposes, everything eventually works out to my advantage, so thanks, Amazon! I will, however, think twice before doing this with other buckets, to make sure the objects are large enough and will sit in Glacier long enough to be worth the transition costs.

As it turns out, the primary factor in how long it takes to break even is the average size of the S3 objects. If the average size of my data files were larger, then I would start saving money sooner.

Here’s the formula… The number of months to break even and start saving money when transferring S3 objects to Glacier is:

break-even months = 631,613 / (average S3 object size in bytes - 13,011)

(units apologies to math geeks)

In my case, the average size of the S3 objects was 70,824 bytes (about 70 KB). Applying the above formula:

631,613 / (70,824 - 13,011) = 10.9

or about 11 months until the savings in Glacier over S3 covers the cost of moving my objects from S3 to Glacier.
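For anyone who wants to see where the constants 631,613 and 13,011 come from, here is a sketch of the derivation using the same prices and per-object overhead as above:

    # Derive the constants in the break-even formula from the prices above.
    GB = 2 ** 30
    s3_price = 0.095
    glacier_price = 0.01
    archive_request_price = 0.05 / 1000

    # Monthly overhead cost per object added by the transition (dollars).
    overhead = (32 * 1024 * glacier_price + 8 * 1024 * s3_price) / GB

    # Monthly savings per byte moved out of S3 into Glacier.
    per_byte_savings = (s3_price - glacier_price) / GB

    # Below this size, the overhead eats all of the savings: ~13,011 bytes.
    threshold = overhead / per_byte_savings

    # Months to pay back the $0.00005 archive request, as a function of size.
    def break_even_months(avg_size_bytes):
        return archive_request_price / (per_byte_savings * (avg_size_bytes - threshold))

    print(round(threshold))                            # ~13,011
    print(archive_request_price / per_byte_savings)    # ~631,613, the numerator
    print(round(break_even_months(70_824), 1))         # ~10.9 months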

Looking closely at the above formula, you can see that any object of about 13 KB or smaller never pays back the cost of transitioning to Glacier: with the overhead, it costs more each month than simply leaving it in S3. Objects only a little larger than that save so little per month that the transfer cost takes a very long time to recoup.

The above formula assumes an S3 storage cost of $0.095 per GB per month in us-east-1. If you are storing more than a TB, then you’re into the $0.080 tier or lower, so the savings per GB shrink, it takes longer to reach your break-even point, and you’ll want to redo the calculations with your own numbers.

[Update 2012-12-19: Included additional S3 and Glacier storage overhead per item. Thanks to josh.monet for pointing us to this information buried in the S3 FAQ.]

[Update 2013-03-07: The Amazon S3 documentation now has a section on Glacier Pricing Considerations with some good pointers.]

10 Comments

For small objects, another solution would be a script that downloads the objects, packages them into an archive file, and uploads the archive to Glacier. Data retrieval would also be cost effective as long as each archive file is not too big.
Hopefully Amazon will provide this option one day: a daily, weekly, or monthly auto-archive from S3 to Glacier into a single archive file.

AItOawknYVk9pU8aZcTo0QSv8pcSapVoAakMXAc:

Custom archivers would make sense given the current restrictions. It's just a lot more work than entering a day count and clicking a checkbox. Alternatively, Amazon could change the structure for the archiving costs if they could work it out with their internal processes. They have to make sure they account for all edge cases (huge objects and minuscule objects).

Another reason to archive (and an important factor for calculating the break-even on small objects) is the Glacier archive overhead. Every Glacier archive has 32KB of overhead:

https://aws.amazon.com/glacier/faqs/#How_is_my_storage_charge_calculated

And when stored from S3, it is actually 40KB:

https://aws.amazon.com/s3/faqs/#How_is_my_storage_charge_calculated_for_Amazon_S3_objects_archived_to_Amazon_Glacier

I couldn't find an official reference for the S3 overhead, but it seems like it is closer to 1KB:

https://forums.aws.amazon.com/thread.jspa?threadID=82490

josh.monet:

Thanks, that's some mighty important fine print I missed!

I've updated the article with the new calculations required; it has a dramatic effect.

I'm still happy with my decision to archive these buckets to Glacier, but the formula for calculating this break-even point gets complicated.

Dumping small files into an archive before sending them to Glacier sounds like a good idea.
And if done from EC2 in the same AWS region, I think you would avoid data transfer costs. Then it's just the S3 GET and DELETE requests that you'll pay for, but they're pretty cheap.

Mark Seigle, Tech Lead for Amazon Glacier, talked a lot about file aggregation at re:Invent 2012. For small files, he recommended:

1) archive the small files using zip/7zip and put the one generated file into Glacier
2) store the central directory of that zip file locally
3) retrieve just the part of the zip file containing the file you want, using the central directory

You can see his session here (file aggregation part starts at 13:00)
http://youtu.be/WXLxc2wRCwY

Anyway, thanks for excellent article and update, it helped me a lot.

harupong, jopsen.dk:

Yes, you can certainly develop your own software and processes to:
(1) extract data from S3
(2) aggregate (tar/zip)
(3) upload to Glacier
(4) delete from S3
as well as the reverse for when you need to access files stored in Glacier.
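As a rough illustration only, here is a minimal sketch of steps (1)-(3) using boto3; the bucket, prefix, and vault names are placeholders, there is no error handling, and everything is held in memory, so treat it as a starting point rather than a tool:

    # Illustrative sketch: aggregate small S3 objects into one tar archive and
    # upload it to a Glacier vault. Bucket, prefix, and vault names are made up.
    import io
    import tarfile
    import boto3

    s3 = boto3.client("s3")
    glacier = boto3.client("glacier")

    bucket = "my-archive-bucket"   # hypothetical
    prefix = "logs/2012/"          # hypothetical
    vault = "my-archive-vault"     # hypothetical

    # (1) extract data from S3 and (2) aggregate it into a single tar archive
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
                info = tarfile.TarInfo(name=obj["Key"])
                info.size = len(body)
                tar.addfile(info, io.BytesIO(body))

    # (3) upload the aggregate to Glacier (a single upload; very large archives
    #     would need Glacier multipart uploads instead)
    response = glacier.upload_archive(vaultName=vault, body=buffer.getvalue())
    print("Glacier archiveId:", response["archiveId"])

    # (4) deleting the originals from S3 is intentionally left out of this sketch;
    #     only do that after recording the archiveId somewhere durable.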

However, the automatic archival from S3 to Glacier performed by Amazon is trivially easy to turn on. It also leaves the objects visible in S3 and makes it easy to temporarily restore individual objects to S3 for reading.

As currently structured, you can only get the convenience and savings with larger S3 objects. It's a great start, but hopefully Amazon can work on extending this to smaller S3 objects at some point.

Thank you Eric for a great article. I too got burned transferring my S3 content to Glacier, though not by using Amazon's own tools - instead I used [third party service elided].

Has anyone worked out if it's a whole lot cheaper to simply reupload all of your content to a new Glacier bucket? Does that somehow get around the 'Put' costs?

Hey, great post. We actually built this scenario for some of our larger users in PlanForCloud - it's a simulation engine that lets you design your scenario (in this case, moving x GB of data to Glacier with y expected read requests and z write requests), and then we run it through a simulation to see how much it would cost.

Would love to get your thoughts on the tool.

Cheers,
Hassan
Product Manager at PlanForCloud

hotmoss:

Yes, you can reduce the per-file PUT and overhead costs in Glacier if you combine many S3 files into a single tar/zip/7z file. However, you do lose a lot of the convenience built into the S3-Glacier integration.
