How I was surprised by a large AWS charge, and how to calculate the break-even point
Glacier Archival of S3 Objects
Amazon recently introduced a fantastic new feature where S3 objects can be automatically migrated over to Glacier storage based on the S3 bucket, the key prefix, and the number of days after object creation.
This makes it trivially easy to drop files in S3, have fast access to them for a while, then have them automatically saved to long-term storage where they can’t be accessed as quickly, but where the storage charges are around a tenth of the price.
…or so I thought.
S3 Lifecycle Rule
My first use of this feature was on some buckets where I store about 350 GB of data that fits the Glacier use pattern perfectly: I want to save it practically forever, but expect to use it rarely.
It was straightforward to use the S3 Console to add a lifecycle rule to the S3 buckets so that all objects are archived to Glacier after 60 days.
(Long time readers of this blog may be surprised I didn’t list the command lines to accomplish this task, but Amazon has not yet released useful S3 tools that include the required functionality.)
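For reference, the same rule can also be written as a lifecycle configuration document. The sketch below only builds a dictionary in the shape accepted by `put_bucket_lifecycle_configuration` in boto3, the AWS SDK for Python that appeared later. The rule ID is a hypothetical name, and the empty prefix is assumed to mean "every key in the bucket".

```python
def glacier_lifecycle_config(days=60, prefix="", rule_id="archive-after-60-days"):
    """Build an S3 lifecycle configuration that transitions objects to Glacier.

    The structure follows the LifecycleConfiguration argument of boto3's
    put_bucket_lifecycle_configuration. rule_id is a hypothetical name.
    """
    return {
        "Rules": [
            {
                "ID": rule_id,
                # An empty prefix applies the rule to every key in the bucket.
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move each object to Glacier this many days after creation.
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


config = glacier_lifecycle_config()
print(config)
```

If boto3 is installed and credentials are configured, the dictionary could then be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="example-bucket", LifecycleConfiguration=config)`, where `example-bucket` is a placeholder.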
Since all of the objects in the buckets were more than 60 days old, I expected them to be transitioned to Glacier within a day, and true to Amazon’s documentation, this occurred on schedule.
What I did not expect was an email alert from my AWS billing alarm monitor on this account letting me know that I had just passed $200 for the month, followed a few hours later by an alert for $300, followed by an alert for a $400 trigger.
This is one of my personal accounts, so a rate of several hundred dollars a day is not sustainable. Fortunately, a quick investigation showed that this increase was due to one-time charges, so I wasn't about to run up a $10k monthly bill.
The line item on the AWS Activity report showed the source of the new charge:
$0.05 per 1,000 Glacier Requests x 5,306,220 Requests = $265.31
It had not occurred to me that there would be much of a charge for transitioning the objects from S3 to Glacier. I should have read the S3 Pricing page, where Amazon states:
Glacier Archive and Restore Requests: $0.05 per 1,000 requests
This is five times as expensive as the initial process of putting objects into S3, which is $0.01 per 1,000 PUT requests.
There is one “archive request” for each S3 object that is transitioned from S3 to Glacier, and I had over five million objects in these buckets, something I didn’t worry about previously because my monthly S3 charges were based on the total GB, not the number of objects.
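The arithmetic behind that line item is easy to check. This sketch uses the per-request prices quoted above ($0.05 per 1,000 Glacier archive requests, $0.01 per 1,000 S3 PUT requests) and assumes exactly one archive request per transitioned object.

```python
GLACIER_ARCHIVE_PER_1000 = 0.05  # USD per 1,000 Glacier archive requests
S3_PUT_PER_1000 = 0.01           # USD per 1,000 S3 PUT requests


def request_cost(requests, price_per_1000):
    """Return the USD cost of a number of requests billed per 1,000."""
    return requests / 1000 * price_per_1000


objects = 5_306_220  # one archive request per object moved to Glacier
transition = request_cost(objects, GLACIER_ARCHIVE_PER_1000)
original_puts = request_cost(objects, S3_PUT_PER_1000)
print(f"transition to Glacier: ${transition:,.2f}")
print(f"original S3 PUTs:      ${original_puts:,.2f}")
```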
Overhead per Glacier Object
josh.monet has pointed out in the comments that Amazon has documented some Glacier storage overhead:
For each S3 object migrated to Glacier, Amazon adds “an additional 32 KB of Glacier data plus an additional 8 KB of S3 standard storage data”.
Storage for this overhead is charged at standard Glacier and S3 prices, respectively. This makes Glacier completely unsuitable for small objects.
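To see why small objects are a problem, compare the monthly cost of keeping a single object in each storage class. This sketch assumes $0.095 per GB-month for S3 and $0.01 per GB-month for Glacier (the us-east-1 prices used in this article) and binary units (1 KB = 1,024 bytes, 1 GB = 2^30 bytes).

```python
KB = 1024
GB = 1024 ** 3
S3_PER_GB_MONTH = 0.095       # assumed S3 standard price, USD
GLACIER_PER_GB_MONTH = 0.01   # assumed Glacier price, USD


def monthly_cost_s3(size_bytes):
    """Monthly USD cost of keeping one object in S3 standard storage."""
    return size_bytes / GB * S3_PER_GB_MONTH


def monthly_cost_glacier(size_bytes):
    """Monthly USD cost of one object after transition to Glacier.

    Includes the documented per-object overhead: 32 KB billed at Glacier
    rates plus 8 KB billed at S3 standard rates.
    """
    glacier_bytes = size_bytes + 32 * KB
    s3_bytes = 8 * KB
    return (glacier_bytes / GB * GLACIER_PER_GB_MONTH
            + s3_bytes / GB * S3_PER_GB_MONTH)


for size in (1 * KB, 10 * KB, 100 * KB, 1024 * KB):
    s3, glacier = monthly_cost_s3(size), monthly_cost_glacier(size)
    print(f"{size // KB:>5} KB  S3 ${s3:.10f}  Glacier ${glacier:.10f}")
```

Under these assumptions the 1 KB and 10 KB objects cost more per month in Glacier than in S3, while the 100 KB and 1 MB objects are cheaper there.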
After stopping to think about it, I realized that I was still saving money in the long term by moving objects in these S3 buckets to Glacier storage. This one-time upfront cost would be recovered gradually through my monthly savings, because Glacier is cheap, even compared to the reasonably cheap S3 storage costs, at least for larger files.
Here are the results of my calculations:
Monthly cost of storing in S3: 350 GB x $0.095/GB = $33.25
Monthly cost of storing in Glacier: $8.97
- 350 GB x $0.01/GB = $3.50
- Glacier overhead: 5.3 million * 32 KB * $0.01/GB = $1.62
- S3 overhead: 5.3 million * 8 KB * $0.095/GB = $3.85
One time cost to transition 5.3 million objects from S3 to Glacier: $265
Months until I start saving money by moving to Glacier: 11
Savings per year after first 11 months: $291 (73%)
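The calculation above can be reproduced with a short script. It assumes binary units (1 KB = 1,024 bytes, 1 GB = 2^30 bytes), which yield the same overhead figures as the list, and uses the exact object count from the activity report.

```python
KB = 1024
GB = 1024 ** 3

S3_PER_GB = 0.095                   # USD per GB-month, S3 standard (us-east-1)
GLACIER_PER_GB = 0.01               # USD per GB-month, Glacier
ARCHIVE_PER_REQUEST = 0.05 / 1000   # USD per Glacier archive request

data_gb = 350
objects = 5_306_220

s3_monthly = data_gb * S3_PER_GB
glacier_monthly = (
    data_gb * GLACIER_PER_GB                     # the data itself
    + objects * 32 * KB / GB * GLACIER_PER_GB    # Glacier overhead per object
    + objects * 8 * KB / GB * S3_PER_GB          # S3 overhead per object
)
transition_cost = objects * ARCHIVE_PER_REQUEST
monthly_savings = s3_monthly - glacier_monthly

print(f"S3 per month:      ${s3_monthly:.2f}")
print(f"Glacier per month: ${glacier_monthly:.2f}")
print(f"transition cost:   ${transition_cost:.2f}")
print(f"break-even months: {transition_cost / monthly_savings:.1f}")
print(f"yearly savings:    ${monthly_savings * 12:.0f} "
      f"({monthly_savings / s3_monthly:.0%})")
```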
For this data’s purpose, everything eventually works out to an advantage, so thanks, Amazon! I will, however, think twice before doing this with other types of buckets, just to make sure that the data is large enough and is going to be sitting around long enough in Glacier to be worth the transition costs.
As it turns out, the primary factor in how long it takes to break even is the average size of the S3 objects. If the average size of my data files were larger, then I would start saving money sooner.
Here’s the formula… The number of months to break even and start saving money when transferring S3 objects to Glacier is:
break-even months = 631,613 / (average S3 object size in bytes - 13,011)
(with apologies to math geeks for the units)
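For the math geeks, here is one way to recover both constants from the prices used in this article, assuming 1 GB = 2^30 bytes. Let s be the average object size in bytes, p_S = $0.095 and p_G = $0.01 the S3 and Glacier prices per GB-month, and r = $0.05/1000 the price of one archive request per object.

```latex
\begin{aligned}
\text{monthly savings per object}
  &= \frac{s\,(p_S - p_G) - 32768\,p_G - 8192\,p_S}{2^{30}}
   = \frac{(p_S - p_G)\,(s - s_0)}{2^{30}},
  \qquad s_0 = \frac{32768\,p_G + 8192\,p_S}{p_S - p_G} \approx 13{,}011 \\
\text{break-even months}
  &= \frac{r}{\text{monthly savings per object}}
   = \frac{r \cdot 2^{30} / (p_S - p_G)}{s - s_0}
   \approx \frac{631{,}613}{s - 13{,}011}
\end{aligned}
```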
In my case, the average size of the S3 objects was 70,824 bytes (about 70 KB). Applying the above formula:
631,613 / (70,824 - 13,011) = 10.9
or about 11 months until the savings in Glacier over S3 covers the cost of moving my objects from S3 to Glacier.
Looking closely at the above formula, you can see that any object smaller than about 13 KB (13,011 bytes) costs more per month in Glacier than in S3, so transitioning it never pays off. Objects only slightly larger than that save so little each month that it takes years to recover the transition cost.
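The same formula can be wrapped in a small function that derives both constants from the prices instead of hard-coding them, and that reports when an object is too small to ever break even. The function name and parameters are illustrative; the defaults are the us-east-1 prices used above, and 1 GB is taken as 2^30 bytes.

```python
import math

KB = 1024
GB = 1024 ** 3


def break_even_months(avg_size_bytes, s3_per_gb=0.095, glacier_per_gb=0.01,
                      archive_per_request=0.05 / 1000):
    """Months until Glacier storage savings repay the one-time archive requests.

    Returns math.inf when the per-object overhead (32 KB at Glacier rates
    plus 8 KB at S3 rates) outweighs any storage savings.
    """
    price_gap = s3_per_gb - glacier_per_gb
    # Size below which Glacier costs more per month than S3 (about 13,011 bytes).
    threshold = (32 * KB * glacier_per_gb + 8 * KB * s3_per_gb) / price_gap
    if avg_size_bytes <= threshold:
        return math.inf
    numerator = archive_per_request * GB / price_gap  # about 631,613
    return numerator / (avg_size_bytes - threshold)


avg_size = 350 * GB / 5_306_220  # average object size in bytes
print(f"average size: {avg_size:,.0f} bytes")
print(f"break-even:   {break_even_months(avg_size):.1f} months")
print(f"12 KB object: {break_even_months(12 * KB)}")
```

With these defaults, a 1 MB average object size pays back the transition cost in well under a month, while a 12 KB object never does.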
The above formula assumes an S3 storage cost of $0.095 per GB per month in us-east-1. If you are storing more than a TB, then you’re into the $0.08 tier or lower, so your monthly savings will be smaller, your break-even point will come later, and you’ll want to do more calculations to find your savings.
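To get a feel for how much a cheaper S3 tier stretches the payback period, the break-even calculation can be rerun with a flat $0.08 per GB-month. This is a simplification: it bills all of the data, and the 8 KB per-object S3 overhead, at that single rate, whereas real S3 tiered pricing is charged band by band.

```python
KB = 1024
GB = 1024 ** 3


def break_even_months(avg_size_bytes, s3_per_gb, glacier_per_gb=0.01,
                      archive_per_request=0.05 / 1000):
    """Months of Glacier savings needed to repay one archive request per object."""
    monthly_savings = (avg_size_bytes * (s3_per_gb - glacier_per_gb)
                       - 32 * KB * glacier_per_gb   # Glacier overhead per object
                       - 8 * KB * s3_per_gb) / GB   # S3 overhead per object
    if monthly_savings <= 0:
        return float("inf")
    return archive_per_request / monthly_savings


for price in (0.095, 0.08):
    months = break_even_months(70_824, price)
    print(f"S3 at ${price}/GB-month: {months:.1f} months to break even")
```

Under these assumptions, the 70,824-byte average object size from this article moves from about 11 months to about 13.5 months before the transition pays off.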
[Update 2012-12-19: Included additional S3 and Glacier storage overhead per item. Thanks to josh.monet for pointing us to this information buried in the S3 FAQ.]
[Update 2013-03-07: Amazon S3 documentation now has a section on Glacier Pricing Considerations that has some good pointers.]