Amazon Glacier: Cheap & Mysterious

It's a penny per GB per month, saved to multiple locations, and that's all you need to know, or so Amazon.com believes. I think Glacier sounds like a paradigm-shifting service, and I already wrote about it when I first heard about it.

For those who haven't been following, here's a summary:

  • It's $.01/GB per month of data stored in Glacier
  • There are no upload bandwidth charges at all
  • There are no download bandwidth charges — as long as you don't exceed a daily pro-rated quota of 5% of your total storage.  (I believe this should translate into no download bandwidth charges for most people.) 
  • Amazon says that Glacier was designed to provide an "annual durability of 99.999999999%".  This is where things get interesting and mysterious.
  • If you ask to retrieve an archive, it takes a few hours to assemble that archive for downloading.  Amazon says that "Most jobs will take between 3 to 5 hours to complete."
  • If you delete archives that are less than three months old, there is a charge.
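
To put rough numbers on the pricing bullets above, here is a minimal sketch of the arithmetic. It assumes my reading of the free-retrieval rule, namely that the 5% is a monthly allowance spread evenly across the days of the month; the function names and the 30-day month are illustrative choices of mine, not anything Amazon publishes.

```python
def monthly_storage_cost(stored_gb, price_per_gb=0.01):
    """Monthly storage charge in dollars at the quoted $.01/GB rate."""
    return stored_gb * price_per_gb


def daily_free_retrieval_gb(stored_gb, monthly_free_fraction=0.05, days_in_month=30):
    """Free retrieval allowance per day, in GB.

    ASSUMPTION: the 5% quota is a monthly figure pro-rated evenly per day.
    A 30-day month is used only to keep the example simple.
    """
    return stored_gb * monthly_free_fraction / days_in_month


if __name__ == "__main__":
    stored_gb = 10 * 1024  # a hypothetical 10 TB of archives
    print(f"Storage cost per month: ${monthly_storage_cost(stored_gb):.2f}")
    print(f"Free retrieval per day: {daily_free_retrieval_gb(stored_gb):.2f} GB")
```

If that reading is right, the daily allowance is small compared to the total stored, which is why a large restore in a single day could exceed it even though routine retrievals would not.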

I think the pricing is awesome. I also think the durability sounds awesome.  I'm just not a huge fan of what happens when you ask them what that means.  Before I get into my direct interaction with them, I want to point out a few things from the website.

On the surface, the durability numbers for S3 and Glacier are the same.  What's not the same is how they explain those numbers.  Are the explanations different because the implementations are different?  Or is it just an oversight?  The following are direct quotes from their website (italics added):

Q: How durable is Amazon S3?

Amazon S3 is designed to provide 99.999999999% durability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.000000001% of objects. For example, if you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years. In addition, Amazon S3 is designed to sustain the concurrent loss of data in two facilities.
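
The arithmetic in that S3 answer can be checked with a short sketch. It treats the durability figure as an independent per-object annual survival probability, which is my simplifying assumption and not something the FAQ states; the function names are mine as well. Exact fractions are used to avoid floating-point noise at eleven nines.

```python
from fractions import Fraction

# 99.999999999% expressed exactly as a fraction.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_loss(num_objects, durability=DURABILITY):
    """Expected number of objects lost per year.

    ASSUMPTION: each object is lost independently with probability
    (1 - durability) in any given year.
    """
    return num_objects * (1 - durability)


def years_per_lost_object(num_objects, durability=DURABILITY):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_loss(num_objects, durability)


if __name__ == "__main__":
    loss_percent = (1 - DURABILITY) * 100
    print(f"Annual loss rate: {float(loss_percent):.9f}% of objects")
    print(f"Years per lost object at 10,000 objects: {years_per_lost_object(10_000)}")
```

Under that assumption, the FAQ's example of 10,000 objects and one loss every 10,000,000 years is consistent with the stated 99.999999999% figure.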

Q: How is Amazon S3 designed to achieve 99.999999999% durability?

Amazon S3 redundantly stores your objects on multiple devices across multiple facilities in an Amazon S3 Region. The service is designed to sustain concurrent device failures by quickly detecting and repairing any lost redundancy. When processing a request to store data, the service will redundantly store your object across multiple facilities before returning SUCCESS. Amazon S3 also regularly verifies the integrity of your data using checksums.

Q: How durable is Amazon Glacier?

Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing.

At first glance, these appear to be two different wordings of the same thing.  However, note that it says that "S3 is designed to sustain the concurrent loss of data in two facilities," but it does not say that about Glacier.  Second, notice the addition of the words "average annual" to Glacier's durability guarantee.  Is the data in Glacier less safe than the data in S3?  Or is this wording simply an oversight?  What happened, pray tell, when I started asking questions? First, let's talk about the questions they did answer.

I noted that a retrieval request is only available for 24 hours, and asked what happens if the data set is large enough that it takes longer than 24 hours to download.  Amazon's response was basically, "don't do that."  (They said, "We anticipate that customers will size their archives in a way that allows them to comfortably download an archive within 24 hours once retrieved.")  This is therefore something you're really going to want to discuss with whoever is providing your interface to Glacier.
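
To get a feel for what "comfortably download within 24 hours" implies for archive sizing, here is a rough sketch. The link speed and the efficiency factor (the share of nominal bandwidth you actually get) are hypothetical inputs of mine, and sizes are in decimal GB; none of this comes from Amazon.

```python
DOWNLOAD_WINDOW_HOURS = 24  # availability window mentioned above


def usable_bytes_per_second(link_mbps, efficiency=0.8):
    """Usable throughput in bytes/second for a link rated in megabits/second.

    ASSUMPTION: 'efficiency' is a hypothetical fudge factor for protocol
    overhead and competing traffic.
    """
    return link_mbps * 1_000_000 / 8 * efficiency


def max_archive_gb(link_mbps, efficiency=0.8, window_hours=DOWNLOAD_WINDOW_HOURS):
    """Largest archive (decimal GB) that can be downloaded inside the window."""
    seconds = window_hours * 3600
    return usable_bytes_per_second(link_mbps, efficiency) * seconds / 1_000_000_000


def hours_to_download(archive_gb, link_mbps, efficiency=0.8):
    """Estimated hours needed to download an archive of the given size."""
    seconds = archive_gb * 1_000_000_000 / usable_bytes_per_second(link_mbps, efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for mbps in (10, 100, 1000):
        print(f"{mbps:>5} Mbps link: at most about {max_archive_gb(mbps):.0f} GB per archive")
```

Whatever the real numbers turn out to be for your connection, the point is that archive size is something to decide before uploading, not after requesting a retrieval.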

I als


Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at Sullivan Strickler, which helps companies manage their legacy data