how I was surprised by a large AWS charge and how to calculate the break-even point
Glacier Archival of S3 Objects
Amazon recently introduced a fantastic new feature where S3 objects can be automatically migrated over to Glacier storage based on the S3 bucket, the key prefix, and the number of days after object creation.
This makes it trivially easy to drop files in S3, have fast access to them for a while, then have them automatically saved to long-term storage where they can’t be accessed as quickly, but where the storage charges are around a tenth of the price.
…or so I thought.
S3 Lifecycle Rule
My first use of this feature was on some buckets where I store about 350 GB of data that fits the Glacier use pattern perfectly: I want to save it practically forever, but expect to use it rarely.
It was straightforward to use the S3 Console to add a lifecycle rule to the S3 buckets so that all objects are archived to Glacier after 60 days:
(Longtime readers of this blog may be surprised I didn’t list the command lines to accomplish this task, but Amazon has not yet released useful S3 tools that include the required functionality.)
Since all of the objects in the buckets were more than 60 days old, I expected them to be transitioned to Glacier within a day, and true to Amazon’s documentation, this occurred on schedule.
Surprise Charge
What I did not expect was an email alert from my AWS billing alarm monitor on this account letting me know that I had just passed $200 for the month, followed a few hours later by an alert for $300, followed by an alert for a $400 trigger.
This is one of my personal accounts, so a rate of several hundred dollars a day is not sustainable. Fortunately, a quick investigation showed that this increase was due to one-time charges, so I wasn’t about to run up a $10k monthly bill.
The line item on the AWS Activity report showed the source of the new charge:
$0.05 per 1,000 Glacier Requests x 5,306,220 Requests = $265.31
It had not occurred to me that there would be much of a charge for transitioning the objects from S3 to Glacier. I should have read the S3 Pricing page, where Amazon states:
Glacier Archive and Restore Requests: $0.05 per 1,000 requests
This is five times as expensive as the initial process of putting objects into S3, which is $0.01 per 1,000 PUT requests.
There is one “archive request” for each S3 object that is transitioned from S3 to Glacier, and I had over five million objects in these buckets, something I didn’t worry about previously because my monthly S3 charges were based on the total GB, not the number of objects.
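As a quick check on that line item, here is a minimal sketch of the request arithmetic. The two prices are the ones quoted above and should be treated as assumptions for any other date or region; the function name is only illustrative.

```python
# Prices quoted in the article (late 2012); assumptions for any other date.
GLACIER_REQUEST_PRICE_PER_1000 = 0.05  # Glacier archive and restore requests
S3_PUT_PRICE_PER_1000 = 0.01           # S3 PUT requests


def request_cost(num_requests, price_per_1000):
    """Return the one-time charge in dollars for a number of requests."""
    return num_requests / 1000 * price_per_1000


objects = 5_306_220  # one archive request per transitioned object
transition = request_cost(objects, GLACIER_REQUEST_PRICE_PER_1000)
original_puts = request_cost(objects, S3_PUT_PRICE_PER_1000)

print(f"Transition to Glacier: ${transition:.2f}")
print(f"Original S3 PUTs:      ${original_puts:.2f}")
```

The charge scales with the number of objects only; the total size of the data does not appear anywhere in the calculation.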
Overhead per Glacier Object
josh.monet has pointed out in the comments that Amazon has documented some Glacier storage overhead:
For each S3 object migrated to Glacier, Amazon adds “an additional 32 KB of Glacier data plus an additional 8 KB of S3 standard storage data”.
Storage for this overhead is charged at standard Glacier and S3 prices. This makes Glacier completely unsuitable for small objects.
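To see how much the overhead matters at different object sizes, the sketch below compares the monthly storage cost of a single object in S3 and in Glacier. It assumes the article's prices ($0.095 and $0.01 per GB per month) and that a GB is billed as 2^30 bytes, which is an assumption that happens to reproduce the figures in this article.

```python
GIB = 1024 ** 3  # assumption: 1 GB is billed as 2**30 bytes
KIB = 1024

S3_PRICE = 0.095       # $ per GB-month (article's us-east-1 price)
GLACIER_PRICE = 0.01   # $ per GB-month

GLACIER_OVERHEAD = 32 * KIB  # extra Glacier bytes per archived object
S3_OVERHEAD = 8 * KIB        # extra S3 bytes per archived object


def s3_monthly(size_bytes):
    """Monthly cost in dollars of keeping one object in S3."""
    return size_bytes * S3_PRICE / GIB


def glacier_monthly(size_bytes):
    """Monthly cost in dollars of one object archived to Glacier via S3."""
    glacier_part = (size_bytes + GLACIER_OVERHEAD) * GLACIER_PRICE
    s3_part = S3_OVERHEAD * S3_PRICE
    return (glacier_part + s3_part) / GIB


for size in (1 * KIB, 16 * KIB, 1024 * KIB):
    ratio = glacier_monthly(size) / s3_monthly(size)
    print(f"{size // KIB:>5} KB object: Glacier costs {ratio:.2f}x the S3 price")
```

Under these assumptions a 1 KB object costs more than eleven times as much per month in Glacier as in S3, while a 1 MB object costs about an eighth as much.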
Break-even Point
After stopping to think about it, I realized that I was still saving money in the long term by moving objects in these S3 buckets to Glacier storage. This one-time up-front cost was going to be compensated for slowly by my monthly savings, because Glacier is cheap, even compared to the reasonably cheap S3 storage costs, at least for larger files.
Here are the results of my calculations:
Monthly cost of storing in S3: 350 GB x $0.095/GB = $33.25
Monthly cost of storing in Glacier: $8.97
- 350 GB x $0.01/GB = $3.50
- Glacier overhead: 5.3 million * 32 KB * $0.01/GB = $1.62
- S3 overhead: 5.3 million * 8 KB * $0.095/GB = $3.85
One-time cost to transition 5.3 million objects from S3 to Glacier: $265
Months until I start saving money by moving to Glacier: 11
Savings per year after first 11 months: $291 (73%)
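The figures above can be reproduced with a few lines of arithmetic. This is a sketch under the article's stated prices, with the additional assumption that a GB is 2^30 bytes; the variable names are only illustrative.

```python
GIB = 1024 ** 3  # assumption: 1 GB is billed as 2**30 bytes
KIB = 1024

objects = 5_306_220
data_gb = 350
s3_price = 0.095              # $ per GB-month
glacier_price = 0.01          # $ per GB-month
request_price = 0.05 / 1000   # $ per Glacier archive request

# Per-object overhead added by the S3-to-Glacier transition.
glacier_overhead_gb = objects * 32 * KIB / GIB
s3_overhead_gb = objects * 8 * KIB / GIB

s3_monthly = data_gb * s3_price
glacier_monthly = ((data_gb + glacier_overhead_gb) * glacier_price
                   + s3_overhead_gb * s3_price)

transition_cost = objects * request_price
monthly_savings = s3_monthly - glacier_monthly
break_even_months = transition_cost / monthly_savings

print(f"S3 per month:       ${s3_monthly:.2f}")
print(f"Glacier per month:  ${glacier_monthly:.2f}")
print(f"Transition cost:    ${transition_cost:.2f}")
print(f"Break-even months:  {break_even_months:.1f}")
print(f"Savings per year:   ${monthly_savings * 12:.0f} "
      f"({monthly_savings / s3_monthly:.0%})")
```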
For this data’s purpose, everything eventually works out to an advantage, so thanks, Amazon! I will, however, think twice before doing this with other types of buckets, just to make sure that the data is large enough and is going to be sitting around long enough in Glacier to be worth the transition costs.
As it turns out, the primary factor in how long it takes to break even is the average size of the S3 objects. If the average size of my data files were larger, then I would start saving money sooner.
Here’s the formula… The number of months to break even and start saving money when transferring S3 objects to Glacier is:
break-even months = 631,613 / (average S3 object size in bytes - 13,011)
(with apologies to math geeks for the units)
In my case, the average size of the S3 objects was 70,824 bytes (about 70 KB). Applying the above formula:
631,613 / (70,824 - 13,011) = 10.9
or about 11 months until the savings in Glacier over S3 covers the cost of moving my objects from S3 to Glacier.
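For anyone who wants to see where the two constants come from, here is one way to derive them. The symbols are introduced only for this sketch: s is the average object size in bytes, G = 2^30 is the assumed number of bytes per GB, p_S = 0.095 and p_G = 0.01 are the monthly S3 and Glacier prices per GB, o_G = 32768 and o_S = 8192 are the per-object overheads in bytes, and r = 0.05/1000 is the price of one archive request.

```latex
\begin{align}
\Delta(s) &= \frac{p_S\,s - p_G\,(s + o_G) - p_S\,o_S}{G}
           = \frac{p_S - p_G}{G}\,(s - s_0), \\
s_0 &= \frac{p_G\,o_G + p_S\,o_S}{p_S - p_G}
     = \frac{0.01 \cdot 32768 + 0.095 \cdot 8192}{0.085}
     \approx 13{,}011, \\
m(s) &= \frac{r}{\Delta(s)}
      = \frac{r\,G}{p_S - p_G} \cdot \frac{1}{s - s_0}
      = \frac{0.00005 \cdot 2^{30}}{0.085} \cdot \frac{1}{s - s_0}
      \approx \frac{631{,}613}{s - 13{,}011}.
\end{align}
```

Here Δ(s) is the monthly saving per object and m(s) is the number of months needed for that saving to repay the one-time request charge. The saving is zero at s = s_0, which is why objects of about 13 KB or less never break even.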
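If you are storing 1 KB data records in S3, transitioning them to Glacier never pays off once the per-object overhead is counted. Even ignoring that overhead, the request charges alone would take over 50 years to recoup.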
Looking closely at the above formula, you can see that any object of 13 KB or smaller costs more to store in Glacier than it would if left in S3. Files only slightly larger than that save too little money to justify the transfer costs.
The above formula assumes an S3 storage cost of $0.095 per GB per month in us-east-1. If you are storing more than a TB, then you’re into the $0.08 tier or lower, so it will take longer to break even and you’ll want to do more calculations to find your savings.
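To explore other price tiers, the formula can be written as a small function with the prices as parameters. This is a sketch, not an official calculator: the function name and defaults are illustrative, a GB is assumed to be 2^30 bytes, and the S3 price is applied as a single flat rate, which simplifies the real tiered pricing.

```python
GIB = 1024 ** 3  # assumption: 1 GB is billed as 2**30 bytes


def break_even_months(avg_size_bytes, s3_price=0.095, glacier_price=0.01,
                      request_price_per_1000=0.05,
                      glacier_overhead=32 * 1024, s3_overhead=8 * 1024):
    """Months until Glacier savings repay the transition request charge.

    Returns None when the object is too small to ever break even.
    """
    monthly_saving = (avg_size_bytes * s3_price
                      - (avg_size_bytes + glacier_overhead) * glacier_price
                      - s3_overhead * s3_price) / GIB
    if monthly_saving <= 0:
        return None
    return (request_price_per_1000 / 1000) / monthly_saving


print(break_even_months(70_824))                 # the article's buckets
print(break_even_months(70_824, s3_price=0.08))  # flat $0.08 per GB-month
print(break_even_months(1024))                   # 1 KB records
```

With a flat $0.08 S3 price, the same 70,824-byte average object takes roughly 13.5 months to break even instead of about 11, and the minimum worthwhile object size rises to about 14 KB.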
[Update 2012-12-19: Included additional S3 and Glacier storage overhead per item. Thanks to josh.monet for pointing us to this information buried in the S3 FAQ.]



Follow Eric Hammond on Twitter
For small objects another solution would be to have a script which downloads, packages in an archive file and uploads the archive to Glacier. The data retrieval would also be cost effective as long as each archive file is not too big.
Hopefully Amazon will provide this option one day: a daily, weekly, or monthly auto-archive from S3 to Glacier into a single archive file.
AItOawknYVk9pU8aZcTo0QSv8pcSapVoAakMXAc:
Custom archivers would make sense given the current restrictions. It's just a lot more work than entering a day count and clicking a checkbox. Alternatively, Amazon could change the structure for the archiving costs if they could work it out with their internal processes. They have to make sure they account for all edge cases (huge objects and minuscule objects).
Another reason to archive (and an important factor for calculating the break-even on small objects) is the Glacier archive overhead. Every Glacier archive has 32KB of overhead:
https://aws.amazon.com/glacier/faqs/#How_is_my_storage_charge_calculated
And when stored from S3, it is actually 40KB:
https://aws.amazon.com/s3/faqs/#How_is_my_storage_charge_calculated_for_Amazon_S3_objects_archived_to_Amazon_Glacier
I couldn't find an official reference for the S3 overhead, but it seems like it is closer to 1KB:
https://forums.aws.amazon.com/thread.jspa?threadID=82490
josh.monet:
Thanks, that's some mighty important fine print I missed!
I've updated the article with the new calculations required; it has a dramatic effect.
I'm still happy with my decision to archive these buckets to Glacier, but the formula for calculating this break-even point gets complicated.
Dumping small files into an archive before sending them to Glacier sounds like a good idea.
And if done from EC2 in the same AWS region I think you would get around data transfer cost. Then it's just the S3 GET and DELETE requests that you'll pay for, but they're pretty cheap.
Mark Seigle, Tech Lead for Amazon Glacier, talked a lot about file aggregation at re:invent 2012. For small files, he recommended that you:
1) archive the small files using zip/7zip and put the one generated file into Glacier
2) store the central directory of that zip file locally
3) retrieve just the part of the zip file containing the file you want, using the central directory
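The three steps can be sketched with Python's standard library. This is an illustration under assumptions, not the approach shown in the session: the archive is built in memory, a small dictionary of byte ranges stands in for the locally stored central directory, and a byte slice stands in for a ranged Glacier retrieval. The helper names are hypothetical.

```python
import io
import struct
import zipfile
import zlib

# Fixed 30-byte part of a zip local file header.
LOCAL_HEADER = struct.Struct("<4s5H3I2H")


def build_archive(files):
    """Zip {name: bytes} in memory and return (archive_bytes, index).

    index maps each name to (start, end, compress_size): the byte range
    holding that member and the length of its compressed data.
    """
    buf = io.BytesIO()
    index = {}
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
            info = zf.getinfo(name)
            # buf.tell() is the end of the member that was just written.
            index[name] = (info.header_offset, buf.tell(), info.compress_size)
    return buf.getvalue(), index


def extract_member(chunk, compress_size):
    """Decode one member given only the bytes of its own range."""
    fields = LOCAL_HEADER.unpack_from(chunk)
    signature, method, name_len, extra_len = (
        fields[0], fields[3], fields[9], fields[10])
    if signature != b"PK\x03\x04":
        raise ValueError("chunk does not start at a zip local file header")
    start = LOCAL_HEADER.size + name_len + extra_len
    data = chunk[start:start + compress_size]
    if method == zipfile.ZIP_STORED:
        return data
    if method == zipfile.ZIP_DEFLATED:
        return zlib.decompress(data, -15)  # raw deflate, no zlib wrapper
    raise ValueError(f"unsupported compression method {method}")


# 1) aggregate many small records into one archive
files = {f"record-{i}.txt": (f"payload {i}\n" * 50).encode()
         for i in range(1000)}
archive, index = build_archive(files)

# 2) keep `index` locally; 3) fetch only the bytes for one record
start, end, csize = index["record-42.txt"]
chunk = archive[start:end]  # stands in for a ranged retrieval
restored = extract_member(chunk, csize)
print(len(archive), end - start, restored == files["record-42.txt"])
```

In a real setup the index would be persisted somewhere outside Glacier, since without it the whole archive would have to be retrieved to locate a single file.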
You can see his session here (file aggregation part starts at 13:00)
http://youtu.be/WXLxc2wRCwY
Anyway, thanks for the excellent article and update; it helped me a lot.
harupong, jopsen.dk:
Yes, you can certainly develop your own software and processes to:
(1) extract data from S3
(2) aggregate (tar/zip)
(3) upload to Glacier
(4) delete from S3
as well as the reverse for when you need to access files stored in Glacier.
However, the automatic archival from S3 to Glacier performed by Amazon is trivially easy to turn on. It also leaves the objects visible in S3 and makes it easy to temporarily restore individual objects to S3 for reading.
As currently structured, you can only get the convenience and savings with larger S3 objects. It's a great start, but hopefully Amazon can work on extending this to smaller S3 objects at some point.
Thank you Eric for a great article. I too got burned in transferring my S3 content to Glacier, however, not because I used Amazon's own tools - instead I used [third party service elided].
Has anyone worked out if it's a whole lot cheaper to simply reupload all of your content to a new Glacier bucket? Does that somehow get around the 'Put' costs?
Hey, great post. We actually built this scenario for some of our larger users in PlanForCloud, a simulation engine that lets you design your scenario (in this case, moving x GB of data to Glacier with an expected y read requests and z write requests) and then runs it through a simulation to see how much it would cost.
Would love to get your thoughts on the tool.
Cheers,
Hassan
Product Manager at PlanForCloud
hotmoss:
Yes, you can reduce the per-file PUT and overhead costs in Glacier if you combine many S3 files into a single tar/zip/7z file. However, you do lose a lot of convenience built into the S3-Glacier integration.
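A rough sketch of how much bundling reduces those fixed costs follows. It assumes that archives uploaded directly to Glacier are charged the same $0.05 per 1,000 requests and carry only the 32 KB per-archive Glacier overhead mentioned in the comments above; both are assumptions to verify against current pricing. It also ignores the S3 GET and DELETE requests needed to read and remove the originals.

```python
import math

GIB = 1024 ** 3  # assumption: 1 GB is billed as 2**30 bytes


def glacier_fixed_costs(num_objects, objects_per_archive=1,
                        request_price_per_1000=0.05,   # assumed upload price
                        overhead_bytes=32 * 1024,      # per-archive overhead
                        glacier_price=0.01):           # $ per GB-month
    """Return (one_time_request_cost, monthly_overhead_cost) in dollars."""
    archives = math.ceil(num_objects / objects_per_archive)
    request_cost = archives / 1000 * request_price_per_1000
    overhead_cost = archives * overhead_bytes * glacier_price / GIB
    return request_cost, overhead_cost


for bundle in (1, 100, 1000):
    requests, overhead = glacier_fixed_costs(5_306_220, bundle)
    print(f"{bundle:>5} objects per archive: "
          f"${requests:.2f} one-time, ${overhead:.4f}/month overhead")
```

Under these assumptions, bundling 1,000 objects per archive cuts the one-time request charge for 5.3 million objects from about $265 to well under a dollar.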