Cost of Transitioning S3 Objects to Glacier


how I was surprised by a large AWS charge and how to calculate the break-even point

Glacier Archival of S3 Objects

Amazon recently introduced a fantastic new feature where S3 objects can be automatically migrated over to Glacier storage based on the S3 bucket, the key prefix, and the number of days after object creation.

This makes it trivially easy to drop files in S3, have fast access to them for a while, then have them automatically saved to long-term storage where they can’t be accessed as quickly, but where the storage charges are around a tenth of the price.

…or so I thought.

S3 Lifecycle Rule

My first use of this feature was on some buckets where I store about 350 GB of data that fits the Glacier use pattern perfectly: I want to save it practically forever, but expect to use it rarely.

It was straightforward to use the S3 Console to add a lifecycle rule to the S3 buckets so that all objects are archived to Glacier after 60 days:

[Screenshot: S3 Lifecycle Rule]

(Long-time readers of this blog may be surprised I didn’t list the command lines to accomplish this task, but Amazon has not yet released useful S3 tools that include the required functionality.)

Since all of the objects in the buckets were more than 60 days old, I expected them to be transitioned to Glacier within a day, and true to Amazon’s documentation, this occurred on schedule.

Surprise Charge

What I did not expect was an email alert from my AWS billing alarm monitor on this account letting me know that I had just passed $200 for the month, followed a few hours later by an alert for $300, followed by an alert for a $400 trigger.

This is one of my personal accounts, so a rate of several hundred dollars a day is not sustainable. Fortunately, a quick investigation showed that this increase was due to one-time charges, so I wasn’t about to run up a $10k monthly bill.

The line item on the AWS Activity report showed the source of the new charge:

$0.05 per 1,000 Glacier Requests x 5,306,220 Requests = $265.31

It had not occurred to me that there would be much of a charge for transitioning the objects from S3 to Glacier. I should have read the S3 Pricing page, where Amazon states:

Glacier Archive and Restore Requests: $0.05 per 1,000 requests

This is five times as expensive as the initial process of putting objects into S3, which is $0.01 per 1,000 PUT requests.

There is one “archive request” for each S3 object that is transitioned from S3 to Glacier, and I had over five million objects in these buckets, something I didn’t worry about previously because my monthly S3 charges were based on the total GB, not the number of objects.
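The arithmetic behind that line item can be checked with a short sketch. It uses only the request price and object count quoted above; the function name is made up for illustration.

```python
# Figures taken from the billing line item quoted above.
PRICE_PER_1000_REQUESTS = 0.05    # USD per 1,000 Glacier archive requests
OBJECTS_TRANSITIONED = 5_306_220  # one archive request per S3 object


def transition_request_cost(num_objects, price_per_1000=PRICE_PER_1000_REQUESTS):
    """Return the one-time request charge in USD for archiving num_objects."""
    return num_objects / 1000 * price_per_1000


cost = transition_request_cost(OBJECTS_TRANSITIONED)
print(f"One-time transition charge: ${cost:.2f}")
```

The charge scales with the number of objects, not with the number of GB, which is why a bucket full of small files is expensive to transition.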

Overhead per Glacier Object

josh.monet has pointed out in the comments that Amazon has documented some Glacier storage overhead:

For each S3 object migrated to Glacier, Amazon adds “an additional 32 KB of Glacier data plus an additional 8 KB of S3 standard storage data”.

Storage for this overhead is charged at standard Glacier and S3 prices. This makes Glacier completely unsuitable for small objects.

Break-even Point

After stopping to think about it, I realized that I was still saving money in the long term by moving objects in these S3 buckets to Glacier storage. This one-time up-front cost was going to be recovered slowly through my monthly savings, because Glacier is cheap, even compared to the reasonably cheap S3 storage costs, at least for larger files.

Here are the results of my calculations:

  • Monthly cost of storing in S3: 350 GB x $0.095/GB = $33.25

  • Monthly cost of storing in Glacier: $8.97

    • 350 GB x $0.01/GB = $3.50
    • Glacier overhead: 5.3 million x 32 KB x $0.01/GB = $1.62
    • S3 overhead: 5.3 million x 8 KB x $0.095/GB = $3.85
  • One time cost to transition 5.3 million objects from S3 to Glacier: $265

  • Months until I start saving money by moving to Glacier: 11

  • Savings per year after first 11 months: $291 (73%)
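For anyone who wants to rerun these numbers with their own bucket, here is a minimal sketch of the same calculation. It assumes 1 GB = 1024^3 bytes and uses the exact object count from the billing line item, so the results can differ from the rounded figures above by a cent or so. The function and key names are made up for illustration.

```python
GB = 1024 ** 3  # bytes per GB; binary units are an assumption
KB = 1024

S3_PRICE = 0.095             # USD per GB-month (us-east-1, first tier)
GLACIER_PRICE = 0.01         # USD per GB-month
REQUEST_PRICE = 0.05 / 1000  # USD per archive request


def glacier_break_even(data_gb, num_objects):
    """Return monthly costs, one-time cost, and break-even months as a dict."""
    s3_monthly = data_gb * S3_PRICE
    # Each archived object adds 32 KB billed at Glacier rates
    # and 8 KB billed at S3 standard rates.
    glacier_overhead = num_objects * 32 * KB / GB * GLACIER_PRICE
    s3_overhead = num_objects * 8 * KB / GB * S3_PRICE
    glacier_monthly = data_gb * GLACIER_PRICE + glacier_overhead + s3_overhead
    transition = num_objects * REQUEST_PRICE
    monthly_saving = s3_monthly - glacier_monthly
    return {
        "s3_monthly": s3_monthly,
        "glacier_monthly": glacier_monthly,
        "transition": transition,
        "monthly_saving": monthly_saving,
        "break_even_months": transition / monthly_saving,
    }


result = glacier_break_even(data_gb=350, num_objects=5_306_220)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Note that the function divides by the monthly saving, so it only makes sense when Glacier is actually cheaper per month than S3 for the bucket in question.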

For this data’s purpose, everything eventually works out to an advantage, so thanks, Amazon! I will, however, think twice before doing this with other types of buckets, just to make sure that the data is large enough and is going to be sitting around long enough in Glacier to be worth the transition costs.

As it turns out, the primary factor in how long it takes to break even is the average size of the S3 objects. If the average size of my data files were larger, then I would start saving money sooner.

Here’s the formula… The number of months to break even and start saving money when transferring S3 objects to Glacier is:

break-even months = 631,613 / (average S3 object size in bytes - 13,011)

(units apologies to math geeks)

In my case, the average size of the S3 objects was 70,824 bytes (about 70 KB). Applying the above formula:

631,613 / (70,824 - 13,011) = 10.9

or about 11 months until the savings in Glacier over S3 covers the cost of moving my objects from S3 to Glacier.

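If you are storing 1 KB data records in S3, transitioning them to Glacier never pays off once the per-object overhead is counted. Even ignoring that overhead, it would take over 50 years to recover the transition cost.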

Looking closely at the above formula, you can see that any object of about 13 KB or smaller is going to cost more to keep in Glacier than to leave in S3. Files only slightly above that size are going to save too little money to justify the transfer costs.

The above formula assumes an S3 storage cost of $0.095 per GB per month in us-east-1. If you are storing more than a TB, then you’re into the $0.08 tier or lower, so your break-even point will take longer and you’ll want to do more calculations to find your savings.
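The two constants in the formula fall out of the prices quoted in this article, so they can be recomputed for a different storage tier. The sketch below is one way to do that, assuming 1 GB = 1024^3 bytes; the function names and the idea of passing the S3 price as a parameter are mine, not anything from Amazon.

```python
GB = 1024 ** 3  # bytes per GB; binary units are an assumption
KB = 1024


def formula_constants(s3_price=0.095, glacier_price=0.01,
                      request_price_per_1000=0.05):
    """Return (numerator, threshold_bytes) for the break-even formula."""
    saving_per_byte = (s3_price - glacier_price) / GB
    # Monthly cost of the 32 KB Glacier + 8 KB S3 overhead for one object.
    overhead_cost = (32 * KB * glacier_price + 8 * KB * s3_price) / GB
    numerator = (request_price_per_1000 / 1000) / saving_per_byte
    threshold_bytes = overhead_cost / saving_per_byte
    return numerator, threshold_bytes


def break_even_months(avg_object_bytes, **prices):
    """Months until savings repay the transition cost, or None if never."""
    numerator, threshold_bytes = formula_constants(**prices)
    if avg_object_bytes <= threshold_bytes:
        return None  # Glacier costs at least as much per month as S3
    return numerator / (avg_object_bytes - threshold_bytes)


numerator, threshold = formula_constants()
print(f"constants: {numerator:,.0f} and {threshold:,.0f}")
print("70,824-byte objects:", break_even_months(70_824))
print("1 KB objects:", break_even_months(1024))
print("70,824-byte objects at $0.08/GB:",
      break_even_months(70_824, s3_price=0.08))
```

With the default prices the constants round to 631,613 and 13,011, matching the formula above. A lower S3 price shrinks the saving per byte, which raises both constants and pushes the break-even point further out.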

[Update 2012-12-19: Included additional S3 and Glacier storage overhead per item. Thanks to josh.monet for pointing us to this information buried in the S3 FAQ.]

10 Comments

For small objects another solution would be to have a script which downloads, packages in an archive file and uploads the archive to Glacier. The data retrieval would also be cost effective as long as each archive file is not too big.
Hopefully Amazon will provide this option one day: a daily, weekly, or monthly auto-archive from S3 to Glacier into a single archive file.

AItOawknYVk9pU8aZcTo0QSv8pcSapVoAakMXAc:

Custom archivers would make sense given the current restrictions. It's just a lot more work than entering a day count and clicking a checkbox. Alternatively, Amazon could change the structure for the archiving costs if they could work it out with their internal processes. They have to make sure they account for all edge cases (huge objects and minuscule objects).

Another reason to archive (and an important factor for calculating the break-even on small objects) is the Glacier archive overhead. Every Glacier archive has 32KB of overhead:

https://aws.amazon.com/glacier/faqs/#How_is_my_storage_charge_calculated

And when stored from S3, it is actually 40KB:

https://aws.amazon.com/s3/faqs/#How_is_my_storage_charge_calculated_for_Amazon_S3_objects_archived_to_Amazon_Glacier

I couldn't find an official reference for the S3 overhead, but it seems like it is closer to 1KB:

https://forums.aws.amazon.com/thread.jspa?threadID=82490

josh.monet:

Thanks, that's some mighty important fine print I missed!

I've updated the article with the new calculations required; it has a dramatic effect.

I'm still happy with my decision to archive these buckets to Glacier, but the formula for calculating this break-even point gets complicated.

Dumping small files into an archive before sending them to Glacier sounds like a good idea.
And if done from EC2 in the same AWS region I think you would get around data transfer cost. Then it's just the S3 GET and DELETE requests that you'll pay for, but they're pretty cheap.

Mark Seigle, Tech Lead for Amazon Glacier, talked a lot about file aggregation at re:Invent 2012. For small files, he recommended that you:

1) archive small files using zip/7zip and put the one generated file into Glacier
2) store the central directory of that zip file locally
3) retrieve just the part of the zip file containing the file you want, using the central directory

You can see his session here (file aggregation part starts at 13:00)
http://youtu.be/WXLxc2wRCwY

Anyway, thanks for the excellent article and update, it helped me a lot.

harupong, jopsen.dk:

Yes, you can certainly develop your own software and processes to:
(1) extract data from S3
(2) aggregate (tar/zip)
(3) upload to Glacier
(4) delete from S3
as well as the reverse for when you need to access files stored in Glacier.

However, the automatic archival from S3 to Glacier performed by Amazon is trivially easy to turn on. It also leaves the objects visible in S3 and makes it easy to temporarily restore individual objects to S3 for reading.

As currently structured, you can only get the convenience and savings with larger S3 objects. It's a great start, but hopefully Amazon can work on extending this to smaller S3 objects at some point.

Thank you Eric for a great article. I too got burned in transferring my S3 content to Glacier, however, not because I used Amazon's own tools - instead I used [third party service elided].

Has anyone worked out if it's a whole lot cheaper to simply re-upload all of your content to a new Glacier bucket? Does that somehow get around the 'Put' costs?

Hey, great post. We actually built this scenario up for some of our larger users in PlanForCloud - it's a simulation engine which lets you design your scenario (so in this case moving xGB of data to Glacier and expected y read requests and z write requests), we then run this through a simulation to see how much it would cost.

Would love to get your thoughts on the tool.

Cheers,
Hassan
Product Manager at PlanForCloud

hotmoss:

Yes, you can reduce the per-file PUT and overhead costs in Glacier if you combine many S3 files into a single tar/zip/7z file. However, you do lose a lot of convenience built in to the S3-Glacier integration.
