Customers and audience members often ask about the maturity of deduplication. Is it mature? Are all products mature? Should you buy it now or wait? I thought that would make a nice blog entry.
This blog entry is one in a series on deduplication. The previous entry was about the real odds of hash collisions.

On one hand, immaturity and backups don't go well together. We like our backup systems to be stable. They're the backup, after all. On the other hand, backup is often "pushing the envelope," as the requirements we're given are constantly changing. I would argue that there has been much more innovation in the backup space than in the rest of the storage industry. Technologies that really solve problems tend to get adopted rather quickly. Think about the ease with which many customers started using multiplexing, LAN-free backups, media/device servers, centralized tape libraries, etc.

CDP didn't make as big a splash, and I would argue that this was not because backup people don't innovate. I would argue it was for two reasons: their requirements didn't drive them to it, and it was just too big a pill to swallow. CDP is the only method that can get you an RPO of 0. How many people have apps that have an RPO of 0? Not many. One hour, 15 minutes, maybe. But 0? And I can do 15 minutes with more traditional methods. As to the big pill issue, CDP uses a completely different paradigm to get the data from here to there.

Did I digress? Dedupe is just the latest in the list of innovative backup techniques. So don't be afraid of it just because it's new. Just be aware that it's new and treat it accordingly.

I'll talk about both dedupe software and hardware. Both have companies that have been doing this for a while, and both have companies that are just now adding dedupe to their bag of tricks.

Dedupe software

First let's talk about dedupe software (e.g. Avamar, Puredisk, Asigra, Evault, etc.). To use these products, you uninstall your current backup software on the client you are going to protect and install the dedupe software. That client then backs up to your dedupe backup server instead of your regular backup server.
Asigra is the first mover here, as they've been doing it for several years. Avamar is probably the most recognized product, especially since it was acquired by EMC. Puredisk is Symantec's offering and is the result of an acquisition of Datacenter Technologies. Symantec's been selling it since 2007. Its core technology is much older, but I don't think it was actually available to end users. Evault has been around for several years and also has a big customer base, and they've recently added dedupe to their arsenal.

Most of these products have had many customers using them for quite some time and can give you lots of references. I've personally worked with customers using all of them (and have been a customer of two of them), and can say that they all appear to actually work, but each of them has limitations that you must address in your design. If you don't address those limitations you will be an unhappy camper.

Dedupe hardware

Now let's talk about the products that dedupe inside a disk target (NAS or VTL). This is a bit harder, as a lot of these companies have been talking about having dedupe a lot longer than they've actually had dedupe. Some of them still don't have dedupe, and might not have it until Q408! One vendor has said that they do not plan to do deduped storage at all. I'll only talk about the information that I have that I can share. (Obviously, I have some NDA information that I can't share.) This information comes from (and will also be updated in) the disk target product directory in the Backup Central Wiki.

Dedupe Vendor
There are currently nine providers of deduplicated target storage: Data Domain, Diligent, Exagrid, FalconStor, HP, NEC, NetApp, Quantum & SEPATON. Everybody else either has no dedupe or is reselling/OEMing products from these companies.

The first mover here is, of course, Data Domain. They have been shipping dedupe target devices for several years, and have well over 2000 customers using their products. From a dedupe installed base perspective, everyone else pales in comparison. The first fast follower to ship was Diligent, and they've been shipping since 2005. A few surveys are starting to show that while Data Domain wins in total number of shipped units, Diligent is either trailing close behind or possibly beating Data Domain in total shipped TBs of disk. In order of when a GA dedupe product first shipped to customers, the other fast followers would be Exagrid, NEC, Falconstor, SEPATON, Quantum, then HP. (This is a rough approximation based on multiple sources.)

IBM acquired Diligent in 5/08, so that's obviously their plan. EMC announced in 06/08 that they will be shipping dedupe VTLs/NAS based on Quantum's technology. Sun is shipping Falconstor-based units, and HP is shipping SEPATON-based dedupe. (It's using SEPATON for its larger systems, and has its own dedupe for its smaller systems.) HDS' product is based on Diligent, and they say this will continue, even though IBM purchased Diligent. NetApp's WAFL-based dedupe is their own product. There are several dedupe VTLs to choose from.

Dedupe Domain

If you back up less than 10 TB a night (including weekends), you don't need to worry about this category, as you can back up 10 TB in an eight-hour backup window with all but one of these solutions. (The Overland unit is aimed at a different market and can handle about 4.3 TB in a 12-hour window.) If you are backing up significantly more than 10 TB a night, then you should be aware of this category.
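To put those two data points in perspective, here is the arithmetic behind them as a small sketch. It is only an illustration, not a vendor figure: it assumes decimal units (1 TB = 1,000,000 MB) and a perfectly steady stream for the whole window, and the function name is made up for this example.

```python
def required_throughput_mb_s(terabytes: float, window_hours: float) -> float:
    """Sustained ingest rate in MB/s needed to land `terabytes` within `window_hours`.

    Assumption: decimal units (1 TB = 1,000,000 MB) and a constant data rate.
    """
    megabytes = terabytes * 1_000_000
    seconds = window_hours * 3600
    return megabytes / seconds


# 10 TB in an eight-hour window
print(round(required_throughput_mb_s(10, 8)))    # 347

# 4.3 TB in a 12-hour window (the Overland figure quoted above)
print(round(required_throughput_mb_s(4.3, 12)))  # 100
```

So "10 TB in eight hours" means holding roughly 350 MB/s for the entire window. Real backups are bursty, so treat this as a floor rather than a sizing number.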
If the dedupe domain says "Single head," then data coming into a given head is only compared to other data that came into that head. A multiple head system will compare data that came into any head with data that came into all other heads.

If you use multiple heads of a product with a single-headed dedupe domain, you will need to direct a given set of backups to only one head in order to gain a good dedupe ratio. You should not, for example, point the backup of a given database or filesystem to two heads for performance/load balancing reasons. Doing that will reduce your overall deduplication ratio, as the backup sent to head A will not be compared against the backup sent to head B. A multi head system would allow you to send backups to any head, and have those backups compared against all other backups sent to all other heads. The idea is that this both reduces complexity of design and increases your effective deduplication ratio.

While some single-headed vendors attempt to minimize the importance of this (IMHO very important) feature, you can rest assured that any vendor who doesn't have it is currently working on adding it. Having said that it's important, it's also important to note that the vendors that offer this feature are the latest vendors to join the party (Falconstor, SEPATON, Quantum). So if you think this feature is important, you'll be looking at adopting some of the newer technology out there. (I'm not saying don't do it, of course. I'm just saying to test the heck out of it, just as you would with anything.) If "time-in-service" is more important to you, you might want to figure out how not to need this feature. ;)

Dedupe Replication
If you're doing cross-campus replication where you have full LAN speeds, they can all replicate that. However, if they can replicate the data after it's been deduplicated, they can also replicate your backups across a WAN. Data Domain has had this feature for several years. The Diligent product is a bit different, in that it doesn't do the replication for you; however, you can use any replication product to replicate its deduplicated data. Falconstor's dedupe-based replication shipped with their dedupe software.

Most vendors replicate the same barcodes (VTL) or filenames (NAS) that the backup software writes to the target devices. If you do this, the backup software is not aware of the second copy, as it can't understand how a single file or barcoded tape can be in two places at once. (Think about it: the same tape cannot be in two tape libraries at the same time.) The replicated copy can be used in a DR scenario by using an alternate master, recovering the backup software's catalog/database of backups, and telling it to inventory the NAS system or VTL. However, it can't be used for operational backup & recovery. Each vendor has a different answer as to how they handle this particular issue.

SEPATON is trailing this race, as its replication currently does not support replicating the deduplicated data (they can replicate before dedupe, which requires significantly more bandwidth). They're saying this feature will be coming in Q208.

An added twist here is Symantec's Open Storage API, or OST for short. If a product supports this feature, NetBackup can tell the intelligent storage device to copy an individual backup from one device to another. The device would use deduplicated replication to copy the backup quickly and then tell NBU that the copy is done and where it was copied. This allows NBU to control the replication and to know about the replicated copy.
Data Domain is supporting this today, and Quantum and Falconstor are expected to follow shortly. Other vendors have announced that they will support it as well, but their support is probably farther in the future.

That's my best attempt to summarize the state of things today (June 2008). Here's a table summarizing these features.

| Vendor/Product | Dedupe Vendor | VTL, NAS, or Local | De-dupe | Dedupe Domain | Dedupe Replication |
| --- | --- | --- | --- | --- | --- |
| COPAN | Falconstor | VTL & NAS | Yes | Multiple heads | Yes |
| Data Domain | Data Domain | VTL & NAS | Yes | Single head | Yes (also with Symantec OST) |
| Diligent ProtecTier | Diligent | VTL | Yes | Single head | Can replicate deduped bytes using product of your choice |
| EMC EDL | Quantum | VTL | Yes | Single head | Yes |
| Exagrid | Exagrid | NAS | Yes | Single head | Yes |
| Falconstor | Falconstor | VTL | Yes | Multiple heads | Yes |
| Gresham Clarita VTL | Will not be doing deduped storage | VTL | N/A | N/A | N/A |
| HDS VTL | Diligent | VTL | Yes | Single head | Can replicate deduped bytes using HDS replication products |
| HP VTL | HP (SMB), SEPATON (Enterprise) | VTL | Yes | Multiple heads (Enterprise only) | Does not use dedupe feature yet |
| IBM VTL | Acquired Diligent | VTL | Yes | Single head | Yes |
| NEC Hydrastor | NEC | NAS | Yes | Multiple heads | Yes |
| NetApp Nearstore VTL | Not announced | VTL | No | N/A | N/A |
| NetApp NearStore | NetApp (WAFL based) | NAS | Yes | Single Flex-vol | Yes |
| Overland Reo | VTL is Overland, Dedupe is Diligent | VTL | Yes | Single head | No |
| Quantum DXi | Quantum | VTL | Yes | Multiple heads | Yes |
| SEPATON | SEPATON | VTL | Yes | Multiple heads | Q208 (Non-deduped replication available now) |
| PureDisk Storage Unit (Only works with NBU) | Symantec | LFS | Yes | Single head | No |
| Sun StorageTek VTL | Falconstor | VTL | Yes | Multiple heads | Yes |
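
The "Dedupe Domain" column is the one people tend to underestimate, so here is a toy sketch of why it matters. To be clear about the assumptions: this uses fixed-size chunks and SHA-256 fingerprints purely for illustration, the function names are hypothetical, and no vendor in the table necessarily works this way. It only shows the effect of comparing data per head versus across all heads.

```python
import hashlib


def chunk_hashes(data: bytes, size: int = 4) -> list[str]:
    """Split data into fixed-size chunks and fingerprint each one.

    Assumption: fixed-size chunking, chosen only to keep the toy simple.
    """
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def stored_chunks(backups_per_head: dict[str, list[bytes]], global_domain: bool) -> int:
    """Count the unique chunks that end up on disk.

    global_domain=True  -> one index shared by every head (multiple-head domain)
    global_domain=False -> each head keeps its own index (single-head domain)
    """
    if global_domain:
        index: set[str] = set()
        for backups in backups_per_head.values():
            for backup in backups:
                index.update(chunk_hashes(backup))
        return len(index)

    total = 0
    for backups in backups_per_head.values():
        head_index: set[str] = set()
        for backup in backups:
            head_index.update(chunk_hashes(backup))
        total += len(head_index)
    return total


# The same database, load-balanced across two heads on consecutive nights.
monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBCCCCEEEE"   # only the last chunk changed
heads = {"A": [monday], "B": [tuesday]}

print(stored_chunks(heads, global_domain=False))  # 8
print(stored_chunks(heads, global_domain=True))   # 5
```

With a single-head domain, splitting the two nights across heads stores all 8 chunks, so you get no dedupe at all. With a domain that spans both heads, only 5 unique chunks are stored. Send both nights to the same head and the single-head product would also store 5, which is exactly the design constraint described in the Dedupe Domain section.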