I recently attended the Spectra Logic Deep Storage Summit in Boulder, Colorado. (They paid for all travel and meals during the trip, but no other remuneration was offered.) Their big announcement was a product that is aimed solidly at Amazon Glacier: Spectra ArcticBlue.
ArcticBlue is an object-based disk system starting at 300 usable TB and going up to over 5 PB that sits directly in front of a Spectra tape library. It's aimed squarely at Amazon Glacier because its interface is S3. You can do a get or put to it just like you would to a bucket in Amazon, except that the data is stored in the (up to) 5 PB disk cache and on tape in a Spectra tape library, which scales to multiple exabytes. The product is built on top of the BlackPearl architecture that they announced two years ago.
Two products came immediately to mind when I thought about this one: Quantum's Lattus and Amazon's Glacier. It would seem that Spectra is actually aiming solidly at both. Here are a few things that are very interesting about the product.
ArcticBlue uses erasure coding, not RAID, to ensure that data on disk is not corrupted or lost. Disks are grouped into "bands" of 23 drives, each of which forms a 20+3 erasure coding group. This very wide band offers protection from up to three simultaneous disk failures with minimal overhead: only 3 of every 23 drives' worth of capacity (about 13% of raw space) goes to redundancy. If you're not familiar with erasure coding and how it is definitely not RAID, check out this article from ComputerWeekly.
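To put numbers on the overhead of a 20+3 band, here is a minimal arithmetic sketch. The function name and the mirroring comparison are my own illustrations, not anything Spectra showed, and it assumes the usual property that a k+m erasure code survives the loss of any m drives.

```python
from fractions import Fraction


def erasure_overhead(data_shards: int, parity_shards: int) -> dict:
    """Capacity math for a k+m erasure coding group (hypothetical helper).

    Assumes the group tolerates the loss of any `parity_shards` drives.
    """
    total = data_shards + parity_shards
    return {
        "total_drives": total,
        "tolerated_failures": parity_shards,
        # Share of raw capacity that holds user data.
        "usable_fraction": Fraction(data_shards, total),
        # Extra capacity bought per unit of user data.
        "overhead_vs_data": Fraction(parity_shards, data_shards),
    }


band = erasure_overhead(20, 3)    # the 20+3 band described above
mirror = erasure_overhead(1, 1)   # simple two-way mirroring, for comparison

print(f"20+3 usable capacity: {float(band['usable_fraction']):.1%}")
print(f"20+3 overhead on top of data: {float(band['overhead_vs_data']):.1%}")
print(f"mirror usable capacity: {float(mirror['usable_fraction']):.1%}")
```

The wide band keeps roughly 87% of raw capacity usable while tolerating three failures, whereas a two-way mirror keeps 50% usable and tolerates one.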
Power-Down at the Band Level
When an application does a get from or a put to an S3 bucket, only the units that comprise that bucket need to be on. This means that the rest of the system can be powered off, both to save on power and cooling and to extend the life of the unit. This is why they are advertising a 7-year lifespan for this product and not a 3-year lifespan. This was one big difference I saw between the ArcticBlue unit and Lattus. Lattus does not appear to have any power-down features.
An S3 bucket can be configured to span both disk and tape, ensuring that any files put onto disk are also put onto tape. It could even span multiple tape types, since Spectra supports both LTO and IBM TS drives. This means that the system could ensure that every file is always on disk, LTO, and IBM TS tape. Spectra referred to this as increasing genetic dispersion. Genetic dispersion protects against multiple types of failures by putting data on multiple different types of media. The system can also be told to make sure one copy is offline.
Future iterations of the product could have a bucket that spans locations, so that any data is always copied to multiple sites.
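The genetic dispersion idea can be expressed as a simple policy check. This is only a sketch of the concept: the `StoredCopy` type, the media labels, and `satisfies_policy` are names I made up for illustration and are not part of Spectra's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCopy:
    media: str    # hypothetical labels: "disk", "lto", "ts"
    online: bool  # False means the copy is offline (e.g. ejected tape)


def satisfies_policy(copies, required_media, need_offline=False):
    """Return True if the copies cover every required media type and,
    if requested, at least one of them is offline."""
    media_present = {c.media for c in copies}
    if not set(required_media) <= media_present:
        return False
    if need_offline and all(c.online for c in copies):
        return False
    return True


copies = [
    StoredCopy("disk", online=True),
    StoredCopy("lto", online=True),
    StoredCopy("ts", online=False),
]

# A file on disk, LTO, and TS tape with one offline copy passes the policy.
fully_dispersed = satisfies_policy(copies, {"disk", "lto", "ts"}, need_offline=True)

# A file that exists only on disk fails a disk+LTO policy.
disk_only = satisfies_policy([StoredCopy("disk", True)], {"disk", "lto"})

print(fully_dispersed, disk_only)
```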
Shingled Magnetic Recording (SMR) Drives
A new type of media from Seagate is called Shingled Magnetic Recording, and it allows data to be written in overlapping layers, just like shingles on a roof. The upside of this is that it increases the density of the disk by about 25%. The downside is that, like roof shingles, you can't remove a lower layer of shingles without removing an upper layer. Therefore, writing to an SMR drive is a lot like writing to tape. You can append all you want, but once you want to go back and modify things, you have to erase the whole thing and start over. Spectra said this is why they were uniquely suited to leverage these drives. (Their marketing slick says, "It took a tape company to unleash the power of disk.") Using these drives requires advanced planning and logistics that they claim have been built into their system from day one.
Why would you use such drives, you may ask? Cheaper and bigger while being smaller. That is, the drives have bigger capacities than are possible without SMR today, and therefore allow you to put more data in less space and also save money.
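The tape-like write behavior described above can be illustrated with a toy model. This is a deliberate simplification and not how Spectra or any drive firmware actually works: the `AppendOnlyZone` class and its methods are hypothetical, and it treats one shingled region as a list of blocks that can only be appended to or erased as a whole.

```python
class AppendOnlyZone:
    """Toy model of a shingled region: append freely, never update in place."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = []

    def append(self, data: bytes) -> int:
        """Write at the end of the zone and return the block index."""
        if len(self.blocks) >= self.capacity:
            raise IOError("zone is full")
        self.blocks.append(data)
        return len(self.blocks) - 1

    def overwrite(self, index: int, data: bytes) -> None:
        """In-place updates are not possible in this model."""
        raise PermissionError("cannot modify in place; reset and rewrite")

    def reset(self) -> None:
        """Erase the whole zone so writing can start over."""
        self.blocks.clear()

    def rewrite_with_change(self, index: int, data: bytes) -> int:
        """Change one block the expensive way: read everything, erase,
        then append it all again. Returns the number of blocks rewritten."""
        snapshot = list(self.blocks)
        snapshot[index] = data
        self.reset()
        for block in snapshot:
            self.append(block)
        return len(snapshot)


zone = AppendOnlyZone(capacity=4)
for payload in (b"a", b"b", b"c"):
    zone.append(payload)

rewritten = zone.rewrite_with_change(0, b"A")
print(rewritten, zone.blocks)
```

Changing a single block forces all three blocks to be rewritten, which is why software that already thinks in append-only, tape-style terms is a natural fit for these drives.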
The most interesting part for me was when they compared the TCO of having your own S3 cloud onsite using ArcticBlue vs. doing the same thing with Glacier or S3. I have not delved into the TCO model, but according to them it is at least an order of magnitude cheaper than Glacier. So there's that.
I'd be interested in hearing from anyone who actually deploys this product in his or her datacenter.