ZDNet confused about Amazon Glacier pricing

Jack Clark of ZDNet wrote an article entitled "AWS Glacier's dazzling price benefits melt next to the cost of tape," where he compares what he believes is the cost of storing 10 PB on tape for five years with the cost of doing the same with Amazon's Glacier service.  His conclusion is that Amazon's 1c/GB price is ten times the cost of tape.

I mean no disrespect, but I don't believe Jack Clark has ever had anything to do with a total cost of ownership (TCO) study of anything in IT.  Because if he had, he'd know that the acquisition cost of the hardware is only a fraction of the TCO of any given IT system. If only IT systems only cost what they cost when you buy them…. If only.

So what does it really cost to store 10 PB on tape?  Let's take a look at two published TCO studies to find out.  Before looking at these studies, let me say that since both studies were sponsored by tape companies, the point of them was to prove that tape systems are cheaper than disk systems. If these studies are biased in any way, it would be that they might underestimate the price of tape, since the purpose of these two, uh, independent studies is to prove that tape is cheaper.  (In fact, I wrote about one of the reports being significantly biased in favor of tape.)

Clipper Group Report

The first report we'll look at is the Clipper Group report that said that tape was 15 times cheaper than disk.  It's a very different report, but I'm going to use the graph on page 3, as it gives what it believes to be the TCO of storing a TB of data on tape for a year, based on four different three-year "cycles" of a 12-year period. 

[Graph: disk vs. tape cost per TB for each three-year cycle, from page 3 of the Clipper Group report]

As you can see, the cost per TB is much higher in the first three years, because it includes the cost of buying a tape library that is much larger than it needs to be for that period — because you must plan for growth.  (This, of course, is one of the major advantages of the Glacier model — you only pay for what you use.)  But to get close to Mr. Clark's five-year period, I need to use two three-year periods.

The other problem with the report is that they use graphs and don't show the actual numbers, and they use scales that make the tape numbers look really small.  You can see how difficult it is to figure out the actual numbers for tape.  It is easy, however, to figure out the cost numbers for disk and then divide them by the multiplier shown in the graph.

The disk number for the first three-year period looks to be about $2600/TB, which is said to be 9x the price of tape.  I divide that $2600 by 9 and get $288/TB for that three-year period, which matches up with the line for tape on the graph. Divide it by 3 and we get $96/TB per year.  The disk cost of the second period is $1250/TB. Divide it by 15 and you get $83/TB for that three-year period; divide that by 3 to get $27/TB per year.  If I average those two together, I get $61/TB per year.  Since Amazon Glacier stores your data in multiple locations, we'll need two copies, so the cost is $122/TB per year for two copies.  Since Jack Clark used 10 PB for five years, we'll multiply this by 10,000 to get to 10 PB, then by five to get to five years.  This gives us a cost of $6,100,000 to store 10 PB on tape for five years, based on the numbers from the Clipper Group study.
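For anyone who wants to check that arithmetic, here is a minimal sketch in Python. The inputs are the figures read off the graph as quoted above; the variable and function names are mine, and the floor division simply mirrors the way each intermediate result is truncated to whole dollars in the text.

```python
# Figures read off the Clipper Group graph, as quoted in the text.
DISK_COST_PER_TB = [2600, 1250]   # USD per TB for each three-year cycle
DISK_TO_TAPE_RATIO = [9, 15]      # disk is said to be 9x and 15x the price of tape
YEARS_PER_CYCLE = 3


def tape_cost_per_tb_year(disk_cost, ratio):
    """Tape cost per TB per year, truncating to whole dollars at each step."""
    tape_cost_per_cycle = disk_cost // ratio
    return tape_cost_per_cycle // YEARS_PER_CYCLE


yearly = [
    tape_cost_per_tb_year(cost, ratio)
    for cost, ratio in zip(DISK_COST_PER_TB, DISK_TO_TAPE_RATIO)
]
average_per_tb_year = sum(yearly) // len(yearly)

two_copies_per_tb_year = average_per_tb_year * 2      # Glacier keeps data in multiple locations
clipper_total = two_copies_per_tb_year * 10_000 * 5   # 10 PB = 10,000 TB, five years

print(f"Per-cycle yearly cost per TB: {yearly}")
print(f"Two copies, per TB per year: ${two_copies_per_tb_year}")
print(f"10 PB for five years: ${clipper_total:,}")
```

Without the truncation, the per-year figures come out slightly higher, so if anything the rounding favors tape.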

Crossroads Report

Let's look at a more recent report that evaluates a relatively new idea: using a disk front end to LTFS-based tape.  The first fully-baked system of this type is from Crossroads, and they just happen to have created a TCO study that compares the cost of storing 2 PB on their system (a combination of disk and tape) vs storing it on disk for ten years.  Awesome! Their 10-year cost for this is $1.64M.  Dividing by 2,000 (2 PB is 2,000 TB) gives the ten-year cost per TB, and dividing that by 10 gives a cost of roughly $80/TB for one year.  Double it like we did the last number, and we have $160/TB/yr for two copies. Multiply it by 10,000 (10 PB) and then again by five (five years), and we get a cost of $8M for 10 PB for five years based on the Crossroads report.
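The same kind of check works for the Crossroads numbers. This is a sketch using only the figures quoted above; the variable names are mine. Note that the unrounded per-TB figure is $82 per year, which the text rounds down to $80, so the exact total would be a bit higher than $8M.

```python
# Figures from the Crossroads TCO study, as quoted in the text.
TEN_YEAR_COST = 1_640_000   # USD to store 2 PB for ten years
CAPACITY_TB = 2_000         # 2 PB expressed in TB
STUDY_YEARS = 10

exact_per_tb_year = TEN_YEAR_COST // CAPACITY_TB // STUDY_YEARS
rounded_per_tb_year = round(exact_per_tb_year, -1)   # nearest $10, as in the text


def five_year_cost_for_10pb(per_tb_year, copies=2):
    """Scale a per-TB yearly cost to 10 PB (10,000 TB), five years, two copies."""
    return per_tb_year * copies * 10_000 * 5


crossroads_total = five_year_cost_for_10pb(rounded_per_tb_year)
crossroads_total_exact = five_year_cost_for_10pb(exact_per_tb_year)

print(f"Per TB per year: ${exact_per_tb_year} exact, ${rounded_per_tb_year} rounded")
print(f"10 PB for five years: ${crossroads_total:,} rounded, ${crossroads_total_exact:,} exact")
```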

On a side note, the Crossroads Strongbox system has the ability to replicate backups between two locations using their disk front end.  This makes this system a lot more like what Amazon is offering with their Glacier service.  (As opposed to traditional use of tape like the Clipper Group report was based on, where you'd also have to pay for someone like Iron Mountain to move tapes around as well.)

Net net

At 1c/GB per month, 10 PB on Glacier costs $100,000 a month, or $6M over five years.  So according to two TCO studies, storing two copies of 10 PB of data on tape for five years costs the same as or more than storing that same data on Amazon's Glacier.
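Here is a minimal sketch of that comparison. It takes the two tape totals worked out above and sets them against Glacier's storage price alone. It assumes decimal units (1 PB = 1,000,000 GB, consistent with treating 10 PB as 10,000 TB) and leaves out any retrieval or bandwidth charges; the names are mine.

```python
# Glacier list price from the announcement: 1 cent per GB per month.
PRICE_CENTS_PER_GB_MONTH = 1
GB_PER_PB = 1_000_000          # decimal units, consistent with 10 PB = 10,000 TB

capacity_gb = 10 * GB_PER_PB
months = 5 * 12

# Work in integer cents to avoid floating-point noise, then convert to dollars.
glacier_total = PRICE_CENTS_PER_GB_MONTH * capacity_gb * months // 100

# Five-year, two-copy tape totals derived from the two TCO studies.
tape_estimates = {"Clipper Group": 6_100_000, "Crossroads": 8_000_000}

for study, tape_total in tape_estimates.items():
    ratio = tape_total / glacier_total
    print(f"{study}: tape ${tape_total:,} vs Glacier ${glacier_total:,} ({ratio:.2f}x)")
```

Either way, tape comes out at or above the Glacier figure, nowhere near one tenth of it.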

And you don't have to buy everything up front and you only pay for what you use.  You don't have to plan for anything but bandwidth.  Yes, this will only work for data whose usage pattern matches what they offer, but they sure have made it cheap — and you don't have to manage it!

Not bad.

 


Amazon Glacier changes the game

In case you missed it, Amazon just announced a new storage cloud service called Glacier.  It's designed as a target for archive and backup data at a cost of $.01/GB/mth.  That's right, one penny per month per GB.  I think my first tweet on this sums up my feelings on this matter: "Amazon glacier announcement today. 1c/GB per month for backup archive type data. Wow. Seriously."

I think Amazon designed and priced this service very well.  The price includes unlimited transfers of data into the service.  The price also includes retrieving/restoring up to 5% of your total storage per month, and it includes unlimited retrievals/restores from Glacier into EC2.  If you want to retrieve/restore more than 5% of your data in a given month, additional retrievals/restores are priced at $.05/GB-$.12/GB depending on the amount you're restoring. Since most backup and archive systems store, store, store and back up, back up, back up and never retrieve or restore, I'd say it's safe to assume that most people's cost will be only $.01/GB/month.  (There are some other things you can do to drive up costs, so make sure you're aware of them, but I think as long as you take them into consideration in the design of your system, they shouldn't hit you.)
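To get a feel for how that pricing plays out, here is a rough sketch of a monthly bill. It is a simplification, not Amazon's billing formula: it assumes the 5% allowance is measured against the amount stored that month and that everything above it is charged at a single flat per-GB rate. I've used $.12/GB, the top of the quoted range, as a worst case. The function name and parameters are hypothetical.

```python
STORAGE_PRICE_PER_GB_MONTH = 0.01   # USD, from the announcement
FREE_RETRIEVAL_FRACTION = 0.05      # 5% of stored data per month is included


def estimate_monthly_cost(stored_gb, retrieved_gb, excess_rate_per_gb=0.12):
    """Rough monthly estimate: storage plus retrievals beyond the free 5%.

    ASSUMPTION: excess retrievals are billed at one flat rate. The announcement
    only gives a range of $.05-$.12/GB depending on the amount retrieved.
    """
    storage_cost = stored_gb * STORAGE_PRICE_PER_GB_MONTH
    free_gb = stored_gb * FREE_RETRIEVAL_FRACTION
    excess_gb = max(0.0, retrieved_gb - free_gb)
    return round(storage_cost + excess_gb * excess_rate_per_gb, 2)


# 10 TB stored, three retrieval scenarios.
for retrieved in (0, 500, 1_500):
    cost = estimate_monthly_cost(10_000, retrieved)
    print(f"Retrieve {retrieved:>5} GB of 10,000 GB stored: ${cost:,.2f}")
```

The point of the sketch is that as long as retrievals stay within the allowance, the bill is just the storage charge.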

This low price comes at a cost, starting with the fact that retrievals take a while.  Each retrieval request initiates a retrieval job, and each job takes 3-5 hours to complete.  That's 3-5 hours before you can begin downloading the first byte to your datacenter.  Then it's available for download for another 24 hours.  

This is obviously not for mission critical data that needs to be retrieved in minutes.  If that doesn't meet your needs, don't use the service.  But my thinking is that it is perfectly matched to the way people use archive systems, and to a lesser degree how they use backup systems.

It's better suited for archive, which is why Amazon uses that term first to describe this system.  It also properly uses the term retrieve instead of restore.  (A retrieve is what an archive system does; a restore is what a backup system does.)  Good on ya, Amazon!  Glacier could be used for backup, as long as you're going to do small restores, and RTOs of many, many hours are OK.  But it's perfect for archives.

We need software!  (But not from Amazon!)

Right now Glacier is just an API; there is no backup or archive software that writes to that API.  A lot of people on Twitter and on Glacier's forum seem to think this is lame and that Amazon should come out with some backup software.

First, let me say that this is how Amazon has always done things.  Here's where you can put some storage (S3), but it's just an API.  Here's where you can put some servers (EC2), but what you put in those virtual servers is up to you.  This is no different.

Second, let me say that I don't want Amazon to come out with backup software.  I want all commercial backup software apps and appliances to write to Glacier as a backup target.  I'm sure Jungledisk, which currently writes to S3, will add Glacier support posthaste.  So will all the other backup software products that currently know how to write to S3. They'll never do that, though, if they have to compete with Amazon's own backup app.  These apps and appliances writing to Glacier will add deduplication and compression, significantly dropping the effective price of Glacier and making archives and backups use far less bandwidth.

Questions

We all have questions that the Amazon announcement did not answer.  I have asked these questions of Amazon and am awaiting an answer.  I'll let you know what they say.

  1. Is this on disk, tape, or both?  (I've heard unofficially that the official answer is no answer, but I'll wait to see what they say to me directly.)
  2. The briefing says that it distributes my data across multiple locations.  Are they saying that every archive will be in at least two locations, or are they saying they're doing some type of multiple-location redundancy?  (Think RAID across locations.)
  3. It says that downloads are available for 24 hours.  What if it takes me longer than 24 hours to download something?
  4. What about tape-based seeding for large archives, or tape-based retrieval of large archives?

ZDNet's Cost Article

Jack Clark of ZDNet wrote an article that said that Glacier's 1c/GB/mth pricing was ten times that of tape.  Suffice it to say that I believe his numbers are way off.  I'm writing a blog post to respond to his article, but it will be a long one and a difficult read with lots of numbers and math.  I know you can't wait.

 
