In case you missed it, Amazon just announced a new cloud storage service called Glacier. It's designed as a target for archive and backup data at a cost of $.01/GB/mth. That's right, one penny per month per GB. My first tweet about it sums up how I feel: "Amazon glacier announcement today. 1c/GB per month for backup archive type data. Wow. Seriously."
I think Amazon designed and priced this service very well. The price includes unlimited transfers of data into the service, retrieving/restoring up to 5% of your total storage per month, and unlimited retrievals/restores from Glacier into EC2. If you want to retrieve/restore more than 5% of your data in a given month, the additional retrievals/restores cost $.05/GB-$.12/GB, depending on how much you're restoring. Since most backup and archive systems store, store, store and back up, back up, back up, and never retrieve or restore, I think it's safe to say that most people will pay only $.01/GB/month. (There are other things you can do to drive up costs, so make sure you're aware of them. As long as you take them into account when you design your system, they shouldn't hit you.)
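To make the pricing concrete, here is a minimal sketch of a monthly-bill estimate using only the figures quoted above. It assumes a single flat rate for retrievals beyond the free 5% (the actual rate falls somewhere in the $.05-$.12/GB range), ignores the other cost drivers mentioned in the parenthetical, and leaves out uploads and EC2 retrievals because they're free. The function and parameter names are hypothetical; this is not Amazon's billing formula.

```python
# Prices quoted in the post.
STORAGE_RATE = 0.01             # dollars per GB per month
FREE_RETRIEVAL_FRACTION = 0.05  # up to 5% of stored data may be retrieved free each month


def monthly_cost(stored_gb: float, retrieved_gb: float, excess_rate: float = 0.05) -> float:
    """Rough one-month Glacier bill in dollars.

    Assumption (hypothetical): retrievals beyond the free 5% are billed at one
    flat excess_rate; the post quotes $.05-$.12/GB depending on volume.
    Uploads and retrievals into EC2 are free, so they are not modeled.
    """
    storage = stored_gb * STORAGE_RATE
    free_allowance = stored_gb * FREE_RETRIEVAL_FRACTION
    # Only the portion above the free allowance is billable.
    billable = max(0.0, retrieved_gb - free_allowance)
    return storage + billable * excess_rate
```

With 10,000 GB stored and nothing retrieved, the estimate is $100 for the month. Retrieving 1,000 GB that month puts 500 GB over the free allowance, which adds $25 at the hypothetical $.05/GB rate.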
This low price comes with trade-offs, starting with the fact that retrievals take a while. Each retrieval request initiates a retrieval job, and each job takes 3-5 hours to complete. That's 3-5 hours before you can begin downloading the first byte to your datacenter. Once the job completes, the data stays available for download for 24 hours.
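If you're planning an RTO around Glacier, it helps to lay out the retrieval timeline explicitly. This small sketch computes it from the 3-5 hour job time and the 24-hour download window described above, with the window starting when the job completes. It's a hypothetical helper for planning, not part of any Amazon API.

```python
from datetime import datetime, timedelta

# Figures quoted in the post.
JOB_MIN = timedelta(hours=3)           # fastest a retrieval job completes
JOB_MAX = timedelta(hours=5)           # slowest a retrieval job completes
DOWNLOAD_WINDOW = timedelta(hours=24)  # how long the job output stays downloadable


def retrieval_timeline(requested_at: datetime) -> dict:
    """Earliest/latest time downloading can start, and earliest/latest time
    the download window closes, for a request made at requested_at."""
    ready_earliest = requested_at + JOB_MIN
    ready_latest = requested_at + JOB_MAX
    return {
        "ready_earliest": ready_earliest,
        "ready_latest": ready_latest,
        "expires_earliest": ready_earliest + DOWNLOAD_WINDOW,
        "expires_latest": ready_latest + DOWNLOAD_WINDOW,
    }
```

For a request made at 9:00 a.m., the first byte can't be downloaded before noon (2:00 p.m. in the worst case), and the job output expires between noon and 2:00 p.m. the next day.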
This is obviously not for mission-critical data that needs to be retrieved in minutes. If you need that kind of speed, don't use this service. But I think it is perfectly matched to the way people use archive systems and, to a lesser degree, the way they use backup systems.
It's better suited to archive, which is why Amazon uses that term first to describe it. Amazon also correctly uses the term retrieve instead of restore. (A retrieve is what an archive system does; a restore is what a backup system does.) Good on ya, Amazon! Glacier could be used for backup, as long as your restores are small and an RTO (recovery time objective) of many, many hours is acceptable. But it's perfect for archives.
We need software! (But not from Amazon!)
Right now Glacier is just an API; no backup or archive software writes to it yet. A lot of people on Twitter and on Glacier's forum seem to think this is lame and that Amazon should come out with its own backup software.
First, this is how Amazon has always done things. Here's where you can put some storage (S3), but it's just an API. Here's where you can put some servers (EC2), but what you put in those virtual servers is up to you. Glacier is no different.
Second, I don't want Amazon to come out with backup software. I want every commercial backup app and appliance to be able to use Glacier as a backup target. I'm sure Jungledisk, which currently writes to S3, will add Glacier support posthaste. So will all the other backup products that already know how to write to S3. They'll never do that, though, if they have to compete with Amazon's own backup app. Apps and appliances that write to Glacier will add deduplication and compression, which will significantly lower Glacier's effective price and make archives and backups use far less bandwidth.
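The effect of deduplication and compression on price comes down to simple division: if Glacier bills for the reduced bytes it actually stores, the price per GB of original data shrinks by the combined reduction ratio. A minimal sketch, assuming the ratios are known up front; real ratios vary widely with the data and the product, and the names here are hypothetical.

```python
def stored_gb(source_gb: float, dedup_ratio: float, compression_ratio: float) -> float:
    """GB actually stored (and uploaded) after dedup and compression.

    A 5:1 dedup ratio is passed as 5; ratios below 1 make no sense here.
    """
    if dedup_ratio < 1 or compression_ratio < 1:
        raise ValueError("reduction ratios must be >= 1")
    return source_gb / (dedup_ratio * compression_ratio)


def effective_price_per_gb(list_price: float, dedup_ratio: float, compression_ratio: float) -> float:
    """Price per GB of original (pre-reduction) data, assuming billing on stored bytes."""
    return list_price * stored_gb(1.0, dedup_ratio, compression_ratio)
```

With hypothetical 5:1 deduplication and 2:1 compression, 1,000 GB of source data would occupy 100 GB in Glacier, the effective price would drop from $.01 to $.001 per source GB, and upload bandwidth would shrink by the same factor of 10.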
We all have questions that the Amazon announcement did not answer. I have put the following questions to Amazon and am awaiting answers. I'll let you know what they say.
- Is this on disk, tape, or both? (I've heard unofficially that the official answer is no answer, but I'll wait to see what they say to me directly.)
- The briefing says that it distributes my data across multiple locations. Are they saying that every archive will be in at least two locations, or that they're doing some type of multiple-location redundancy? (Think RAID across locations.)
- It says that downloads are available for 24 hours. What if it takes me longer than 24 hours to download something?
- What about tape-based seeding for large archives, or tape-based retrieval of large archives?
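On the 24-hour download question, a back-of-the-envelope calculation shows when the window starts to bite: the sustained throughput you need is the archive size divided by 24 hours. This sketch assumes decimal units (1 GB = 8,000 megabits) and a link that runs flat out for the whole window with no protocol overhead or contention, so real-world numbers will be worse. The function names are hypothetical.

```python
DOWNLOAD_WINDOW_SECONDS = 24 * 60 * 60  # the 24-hour availability window
MEGABITS_PER_GB = 8000                  # decimal units: 1 GB = 1000 MB = 8000 Mb


def min_throughput_mbps(archive_gb: float, window_s: int = DOWNLOAD_WINDOW_SECONDS) -> float:
    """Sustained megabits/s needed to finish downloading archive_gb before the window closes."""
    return archive_gb * MEGABITS_PER_GB / window_s


def max_gb_in_window(link_mbps: float, window_s: int = DOWNLOAD_WINDOW_SECONDS) -> float:
    """Most GB a link of link_mbps can move within the window at full, constant speed."""
    return link_mbps * window_s / MEGABITS_PER_GB
```

By this estimate, pulling a 1,000 GB archive inside the window takes roughly 92.6 Mbps sustained, and a 100 Mbps link can move at most 1,080 GB in 24 hours.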
ZDNet's Cost Article
Jack Clark of ZDNet wrote an article saying that Glacier's 1c/GB/mth pricing was ten times the cost of tape. Suffice it to say that I believe his numbers are way off. I'm writing a blog post in response, but it will be a long one and a difficult read, with lots of numbers and math. I know you can't wait.
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technologist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.