how I was surprised by a large AWS charge and how to calculate the
break-even point
Glacier Archival of S3 Objects
Amazon recently introduced a fantastic new feature where S3 objects
can be automatically migrated over to Glacier storage based on
the S3 bucket, the key prefix, and the number of days after object
creation.
This makes it trivially easy to drop files in S3, have fast access to
them for a while, then have them automatically saved to long-term
storage where they can’t be accessed as quickly, but where the
storage charges are around a tenth of the price.
…or so I thought.
S3 Lifecycle Rule
My first use of this feature was on some buckets where I store about
350 GB of data that fits the Glacier use pattern perfectly: I want to
save it practically forever, but expect to use it rarely.
It was straightforward to use the S3 Console to add a
lifecycle rule to the S3 buckets so that all objects are archived to
Glacier after 60 days.
(Long-time readers of this blog may be surprised I didn’t list the
command lines to accomplish this task, but Amazon has not yet released
useful S3 tools that include the required functionality.)
Since all of the objects in the buckets were more than 60 days old, I
expected them to be transitioned to Glacier within a day, and true to
Amazon’s documentation, this occurred on schedule.
Surprise Charge
What I did not expect was an email alert from my AWS billing alarm
monitor on this account letting me know that I had just passed $200
for the month, followed a few hours later by an alert for $300,
followed by an alert for a $400 trigger.
This is one of my personal accounts, so a rate of several hundred
dollars a day is not sustainable. Fortunately, a quick investigation
showed that this increase was due to one-time charges, so I wasn’t
about to run up a $10k monthly bill.
The line item on the AWS Activity report showed the source
of the new charge:
$0.05 per 1,000 Glacier Requests x 5,306,220 Requests = $265.31
It had not occurred to me that there would be much of a charge for
transitioning the objects from S3 to Glacier. I should have read the
S3 Pricing page, where Amazon states:
Glacier Archive and Restore Requests: $0.05 per 1,000 requests
This is five times as expensive as the initial process of putting
objects into S3, which is $0.01 per 1,000 PUT requests.
There is one “archive request” for each S3 object that is transitioned
from S3 to Glacier, and I had over five million objects in these
buckets, something I didn’t worry about previously because my monthly
S3 charges were based on the total GB, not the number of objects.
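As a rough check, the line item can be reproduced in a few lines of Python. This is only a sketch: the function name `request_cost` is made up for illustration, and the prices are the ones quoted above.

```python
def request_cost(num_requests, price_per_1000):
    """Cost in dollars for a number of requests billed per 1,000."""
    return num_requests / 1000 * price_per_1000

num_objects = 5_306_220  # one archive request per transitioned object

# Glacier archive requests at $0.05 per 1,000
archive_cost = request_cost(num_objects, 0.05)

# For comparison: the original S3 PUT requests at $0.01 per 1,000
put_cost = request_cost(num_objects, 0.01)

print(f"Glacier archive requests: ${archive_cost:.2f}")
print(f"Original S3 PUT requests: ${put_cost:.2f}")
```

The archive figure matches the $265.31 on the Activity report; the charge depends only on the object count, not on how many GB are stored.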
Overhead per Glacier Object
josh.monet has pointed out in the comments that Amazon has documented some Glacier storage overhead:
For each S3 object migrated to Glacier, Amazon adds “an additional 32 KB of Glacier data plus an additional 8 KB of S3 standard storage data”.
Storage for this overhead is charged at standard Glacier and S3 prices. This makes Glacier
completely unsuitable for small objects.
Break-even Point
After stopping to think about it, I realized that I was still saving
money in the long term by moving objects in these S3 buckets to
Glacier storage. This one-time up-front cost was going to be
recovered slowly through my monthly savings, because Glacier is
cheap, even compared to the reasonably cheap S3 storage costs, at least for
larger files.
Here are the results of my calculations:
Monthly cost of storing in S3: 350 GB x $0.095/GB = $33.25
Monthly cost of storing in Glacier: $8.97
- 350 GB x $0.01/GB = $3.50
- Glacier overhead: 5.3 million * 32 KB * $0.01/GB = $1.62
- S3 overhead: 5.3 million * 8 KB * $0.095/GB = $3.85
One time cost to transition 5.3 million objects from S3 to Glacier: $265
Months until I start saving money by moving to Glacier: 11
Savings per year after first 11 months: $291 (73%)
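The figures above can be reproduced with a short Python sketch. It assumes 1 GB = 1024^3 bytes and 1 KB = 1024 bytes (the convention that matches these numbers), and the function name `monthly_costs` is hypothetical, chosen just for this illustration.

```python
GB = 1024 ** 3  # bytes per GB (binary convention, an assumption)
KB = 1024       # bytes per KB

def monthly_costs(total_bytes, num_objects, s3_price=0.095, glacier_price=0.01):
    """Return (monthly S3-only cost, monthly Glacier cost) in dollars."""
    s3_only = total_bytes / GB * s3_price
    glacier = (
        total_bytes / GB * glacier_price            # the data itself in Glacier
        + num_objects * 32 * KB / GB * glacier_price  # 32 KB Glacier overhead per object
        + num_objects * 8 * KB / GB * s3_price        # 8 KB S3 overhead per object
    )
    return s3_only, glacier

num_objects = 5_306_220
s3_only, glacier = monthly_costs(350 * GB, num_objects)

transition = num_objects / 1000 * 0.05   # one-time archive request charge
monthly_saving = s3_only - glacier
break_even_months = transition / monthly_saving

print(f"S3 per month:      ${s3_only:.2f}")
print(f"Glacier per month: ${glacier:.2f}")
print(f"Break-even:        {break_even_months:.1f} months")
print(f"Yearly saving:     ${monthly_saving * 12:.0f} "
      f"({monthly_saving / s3_only:.0%})")
```

Swapping in a different bucket size or object count shows quickly how much the per-object overhead and request charge matter compared to the raw storage price.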
For this data’s purpose, everything eventually works out to an advantage,
so thanks, Amazon! I will, however, think twice before doing this
with other types of buckets, just to make sure that the data is large
enough and is going to be sitting around long enough in Glacier to be
worth the transition costs.
As it turns out, the primary factor in how long it takes to break even
is the average size of the S3 objects. If the average size of my data
files were larger, then I would start saving money sooner.
Here’s the formula… The number of months to break even and start
saving money when transferring S3 objects to Glacier is:
break-even months = 631,613 / (average S3 object size in bytes - 13,011)
(with apologies to math geeks for the units)
In my case, the average size of the S3 objects was 70,824 bytes (about
70 KB). Applying the above formula:
631,613 / (70,824 - 13,011) = 10.9
or about 11 months until the savings in Glacier over S3 cover the cost
of moving my objects from S3 to Glacier.
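For anyone wondering where the two constants come from, here is one way to derive them from the prices quoted in this post. The symbol s (average object size in bytes) is introduced here for illustration, and 1 GB is assumed to be 2^30 bytes; with that assumption the arithmetic lands on the same numbers as the formula.

```latex
% s = average S3 object size in bytes; 1 GB taken as 2^{30} bytes (assumption)
\begin{align*}
\text{transition cost per object} &= \frac{\$0.05}{1000} = \$0.00005 \\
\text{monthly saving per object} &= \frac{(0.095 - 0.01)\,s - 32768 \times 0.01 - 8192 \times 0.095}{2^{30}} \\
&= \frac{0.085\,s - 1105.92}{2^{30}} \\
\text{break-even months} &= \frac{0.00005 \times 2^{30}}{0.085\,s - 1105.92}
 = \frac{53687.09 / 0.085}{s - 1105.92 / 0.085}
 \approx \frac{631{,}613}{s - 13{,}011}
\end{align*}
```

The 13,011 in the denominator is the per-object storage overhead (32 KB in Glacier plus 8 KB in S3) expressed as an equivalent number of bytes of lost savings.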
If you are storing 1 KB data records in S3, it would take over 50 years
to justify transitioning them to Glacier even if there were no per-object
storage overhead. With the overhead included, they never break even.
Looking closely at the above formula, you can see that any object of about 13 KB or smaller costs more per month in Glacier than it would if left in S3, so transitioning it never pays off. Objects only slightly larger than that save too little money to justify the transfer costs.
The above formula assumes an S3 storage cost of $0.095 per GB per
month in us-east-1. If you are storing more than a TB, then you’re
into the $0.08 tier or lower, so it will take longer to break even
and you’ll want to do more calculations to find your savings.
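To redo the calculation for other prices or object sizes, something like the following Python sketch may help. The function name `break_even_months` and its parameters are hypothetical, 1 GB is assumed to be 1024^3 bytes, and applying a single flat S3 price to all of the data is a simplification of tiered pricing.

```python
GB = 1024 ** 3  # bytes per GB (binary convention, an assumption)
KB = 1024       # bytes per KB

def break_even_months(avg_object_bytes, s3_price=0.095, glacier_price=0.01,
                      request_price_per_1000=0.05):
    """Months until Glacier savings cover the transition request charge.

    Returns None when an object of this size never breaks even.
    """
    transition = request_price_per_1000 / 1000  # one archive request per object
    # Monthly saving per object: cheaper storage minus the per-object overhead
    # (32 KB billed at the Glacier price, 8 KB billed at the S3 price).
    saving = (avg_object_bytes * (s3_price - glacier_price)
              - 32 * KB * glacier_price
              - 8 * KB * s3_price) / GB
    if saving <= 0:
        return None
    return transition / saving

print(break_even_months(70_824))                  # about 10.9 months at $0.095/GB
print(break_even_months(70_824, s3_price=0.08))   # about 13.5 months at $0.08/GB
print(break_even_months(1024))                    # None: 1 KB objects never pay off
```

With the same 70,824-byte average object, dropping the S3 price to $0.08 per GB pushes the break-even point out by a couple of months, because the monthly saving per object shrinks while the request charge stays the same.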
[Update 2012-12-19: Included additional S3 and Glacier storage overhead per item. Thanks to josh.monet for pointing us to this information buried in the S3 FAQ.]