Disclaimer

The opinions contained within this website, its blog(s), forums, and Wikis, are those of the original poster and do not represent the position of my (or any other) employer. This blog is not owned by my employer, nor does it officially represent any company.
Deduplication maturity
Written by W. Curtis Preston   
Monday, 10 December 2007
Customers and audience members often ask about the maturity of deduplication. Is it mature?  Are all products mature?  Should you buy it now or wait?  I thought that would make a nice blog entry.

This blog entry is one in a series on deduplication.  The previous entry was about the real odds of hash collisions.

 

On one hand, immaturity and backups don't go well together. We like our backup systems to be stable; they're the backup, after all. On the other hand, backup is often "pushing the envelope," because the requirements we're given are constantly changing. I would argue that there has been much more innovation in the backup space than in the rest of the storage industry, and that technologies that really solve problems tend to get adopted rather quickly. Think about how readily many customers started using multiplexing, LAN-free backups, media/device servers, centralized tape libraries, and so on.

Continuous data protection (CDP) didn't make as big a splash, but not because backup people don't innovate. I think there were two reasons: their requirements didn't drive them to it, and it was just too big a pill to swallow. CDP is the only method that can give you a recovery point objective (RPO) of 0. How many people have apps that need an RPO of 0? Not many. One hour or 15 minutes, maybe. But 0? And I can meet a 15-minute RPO with more traditional methods. As for the big pill, CDP uses a completely different paradigm to get the data from here to there.

Did I digress? Dedupe is just the latest in a long list of innovative backup techniques. So don't be afraid of it just because it's new; just be aware that it's new and treat it accordingly.

I'll talk about both dedupe software and hardware.  Both have companies that have been doing this for a while, and both have companies that are just now adding dedupe to their bag of tricks.  

Dedupe software 

First let's talk about dedupe software (e.g., Avamar, PureDisk, Asigra, and Evault). To use these products, you uninstall your current backup software on the client you are going to protect and install the dedupe software. That client then backs up to your dedupe backup server instead of your regular backup server.
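To make the idea concrete, here is a minimal Python sketch of one common way source-side (client-side) dedupe can work: the client splits its data into chunks, hashes each chunk, asks the server which hashes it has not seen, and sends only those chunks. This is a generic illustration, not the actual protocol of Avamar, PureDisk, Asigra, or Evault. The fixed 4 KB chunk size, the SHA-256 hash, and the names `DedupeServer` and `client_backup` are hypothetical choices for the example; real products commonly use variable-size chunking and their own index designs.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real products differ


def chunk_with_hashes(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks; return a list of (digest, chunk) pairs."""
    pairs = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        pairs.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return pairs


class DedupeServer:
    """Hypothetical dedupe backup server that stores each unique chunk once."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def missing(self, digests):
        """Return the digests this server has not stored yet."""
        return {d for d in digests if d not in self.chunks}

    def store(self, new_chunks):
        """Add newly received chunks to the chunk store."""
        self.chunks.update(new_chunks)


def client_backup(server: DedupeServer, data: bytes) -> int:
    """Back up data from a client; return the number of chunk bytes sent."""
    pairs = chunk_with_hashes(data)
    # Only hashes go first; the server answers with the ones it lacks.
    needed = server.missing(digest for digest, _ in pairs)
    # Send each needed chunk once, even if it repeats within this backup.
    to_send = {digest: chunk for digest, chunk in pairs if digest in needed}
    server.store(to_send)
    return sum(len(chunk) for chunk in to_send.values())


server = DedupeServer()
data = b"A" * 8192 + b"B" * 4096  # three 4 KB chunks, two of them identical
print(client_backup(server, data))  # first backup
print(client_backup(server, data))  # unchanged second backup
```

In this model the first backup sends 8192 bytes (two of the three chunks are identical), and the unchanged second backup sends nothing; after that, only new or changed chunks cross the network.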

Asigra is the first mover here, as they've been doing it for several years. Avamar is probably the most recognized product, especially since it was acquired by EMC. PureDisk is Symantec's offering and is the result of its acquisition of Datacenter Technologies. Symantec has been selling it for about a year now. Its core technology is much older, but I don't think it was actually available to end users before then. Evault has been around for several years and also has a big customer base, and they've recently added dedupe to their arsenal.

Most of these products have had many customers using them for quite some time and can give you lots of references. I've personally worked with customers using all of them (and have been a customer of two of them), and I can say that they all appear to actually work. But each of them has limitations that you must address in your design; if you don't, you will be an unhappy camper.

Dedupe hardware

Now let's talk about the products that dedupe inside a disk target (NAS or VTL). This is a bit harder, as a lot of these companies have been talking about having dedupe a lot longer than they've actually had it; some of them still don't have dedupe and might not have it until Q408! One vendor has said publicly that it does not plan to offer deduped storage at all. I'll only talk about the information I have that I can share. (Obviously, I have some NDA information that I can't share.) This information comes from (and will also be updated in) the disk target product directory in the Backup Central Wiki.

Dedupe Vendor

There are currently seven providers of deduplicated target storage: Data Domain, Diligent, Exagrid, FalconStor, NEC, NetApp, and SEPATON. Everybody else either has no dedupe or is reselling/OEMing products from these companies. The first mover here is, of course, Data Domain. They have been shipping dedupe target devices for several years and have well over 1000 customers using their products. From a dedupe installed-base perspective, everyone else pales in comparison. The first fast follower to ship was Diligent, which has been shipping for somewhere between one and two years now. In order of when their GA dedupe products first shipped to customers, the other fast followers would be Exagrid, NEC, FalconStor, and SEPATON. (This is a rough approximation based on multiple sources.) EMC, HP, IBM, and Sun are not talking publicly about their dedupe plans, but you can bet your money that they're working on it. Sources suggest that one or more of these vendors is developing its own dedupe product (good luck!), while others are testing the dedupe capabilities of the product that they OEM (i.e., FalconStor or SEPATON). The only major OEMs with a shipping dedupe product are HDS and NetApp. HDS' product is based on Diligent, and NetApp's WAFL-based dedupe is their own. Summary: as long as you don't have to buy something from EMC, IBM, or Sun, there are several dedupe VTLs to choose from.

Dedupe Domain 

If you back up less than 5-6 TB a night, you don't need to worry about this category, as all but one of these solutions can back up 5-6 TB in an eight-hour backup window (or 8-9 TB in a 12-hour window). That's roughly 175-210 MB/s of sustained throughput. (The Overland unit is aimed at a different market and can handle about 4.3 TB in a 12-hour window.) If you are backing up significantly more than 5-9 TB a night, then you should be aware of this category.

If the dedupe domain says "Single head," then data coming into a given head is compared only to other data that came into that head. A multiple-head system compares data that came into any head with data that came into all the other heads.

If you use multiple heads of a product with a single-head dedupe domain, you will need to direct a given set of backups to only one head in order to get a good dedupe ratio. You should not, for example, point the backup of a given database or filesystem to two heads for performance or load-balancing reasons. Doing that will reduce your overall deduplication ratio, as the backup sent to head A will not be compared against the backup sent to head B. A multiple-head system lets you send backups to any head and have them compared against all other backups sent to all other heads. The idea is that this both reduces design complexity and increases your effective deduplication ratio.

While some single-head vendors attempt to minimize the importance of this (IMHO very important) feature, you can rest assured that any vendor who doesn't have it is currently working on adding it. Having said that it's important, it's also important to note that the two vendors that offer this feature are the latest vendors to join the party (FalconStor and SEPATON). So if you think this feature is important, you'll be looking at adopting some of the newer technology out there. (I'm not saying don't do it, of course. I'm just saying to test the heck out of it, just as you would with anything.) If "time-in-service" is more important to you, you might want to figure out how not to need this feature. ;)
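A small simulation shows why the dedupe domain matters. The Python sketch below models nightly full backups of one database that changes by a single chunk per night, sent round-robin to two heads for load balancing, and compares single-head domains (each head keeps its own chunk index) with a multiple-head domain (all heads share one index). The fixed chunk size, SHA-256 hashing, the round-robin policy, and the `dedupe_ratio` function are illustrative assumptions, not how any vendor in the table actually implements its index.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


def chunk_digests(data: bytes):
    """Yield (digest, size) for each fixed-size chunk of data."""
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        yield hashlib.sha256(chunk).digest(), len(chunk)


def dedupe_ratio(backups, num_heads: int, shared_index: bool) -> float:
    """Send backup i to head i % num_heads and return logical/stored bytes.

    shared_index=False models single-head dedupe domains (one index per head);
    shared_index=True models a multiple-head domain (one index for all heads).
    """
    shared = set()
    per_head = [set() for _ in range(num_heads)]
    logical = stored = 0
    for i, data in enumerate(backups):
        index = shared if shared_index else per_head[i % num_heads]
        for digest, size in chunk_digests(data):
            logical += size
            if digest not in index:  # new to this domain, so it must be stored
                index.add(digest)
                stored += size
    return logical / stored


# Four nightly fulls of one database: nine unchanged chunks plus one new chunk.
base = b"".join(bytes([n]) * CHUNK_SIZE for n in range(9))
nights = [base + bytes([100 + k]) * CHUNK_SIZE for k in range(4)]

print("multiple-head domain:", round(dedupe_ratio(nights, 2, shared_index=True), 2))
print("single-head domains: ", round(dedupe_ratio(nights, 2, shared_index=False), 2))
```

With the shared index, the four nights store 13 unique chunks out of 40 logical chunks (about 3.1:1). Splitting the same backups across two single-head domains stores 22 chunks (about 1.8:1), because each head must keep its own copy of the unchanged data.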

Dedupe Replication

If you're doing cross-campus replication where you have full LAN speeds, all of these products can handle that. However, if a product can replicate the data after it's been deduplicated, it can also replicate your backups across a WAN. Data Domain has had this feature for several years. The Diligent product is a bit different, in that it doesn't do the replication for you; however, you can use any replication product to replicate its deduplicated data. FalconStor's dedupe-based replication shipped with their dedupe software. SEPATON is trailing this race, as its replication currently does not support replicating the deduplicated data (they can replicate before dedupe, which requires significantly more bandwidth). They're saying this feature will be coming in Q208.

All current vendors replicate the same barcodes (VTL) or filenames (NAS) that the backup software writes to the target devices. The backup software is therefore not aware of the second copy, as it can't understand how a single file or barcoded tape can be in two places at once. (Think about it: the same tape cannot be in two tape libraries at the same time.) The replicated copy can be used in a DR scenario by using an alternate master, recovering the backup software's catalog/database of backups, and telling it to inventory the NAS system or VTL. However, it can't be used from an operational backup and recovery perspective. Each vendor has a different answer as to how it handles this particular issue.
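The bandwidth difference between replicating after dedupe and replicating before dedupe can be sketched in the same way. In the hypothetical Python model below, deduplicated replication sends a chunk only if the remote site's index lacks its hash, while pre-dedupe replication sends every byte. The chunking scheme, the `replicate` function, and the remote index represented as a Python set are assumptions for illustration; the model also ignores the small cost of exchanging hashes and any compression a real product might apply.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


def replicate(backups, remote_index: set, deduped: bool = True) -> int:
    """Copy backups to a remote site and return the bytes sent over the WAN.

    deduped=True  sends only chunks whose hash the remote index lacks.
    deduped=False sends every byte (replicating before dedupe).
    Hash-exchange overhead is ignored in this model.
    """
    sent = 0
    for data in backups:
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            if not deduped:
                sent += len(chunk)
                continue
            digest = hashlib.sha256(chunk).digest()
            if digest not in remote_index:  # remote site lacks this chunk
                remote_index.add(digest)
                sent += len(chunk)
    return sent


# Four nightly fulls: nine unchanged chunks plus one new chunk per night.
base = b"".join(bytes([n]) * CHUNK_SIZE for n in range(9))
nights = [base + bytes([100 + k]) * CHUNK_SIZE for k in range(4)]

remote = set()
print("after dedupe: ", replicate(nights, remote))
print("again:        ", replicate(nights, remote))
print("before dedupe:", replicate(nights, set(), deduped=False))
```

For these four nights, deduplicated replication sends 13 chunks instead of 40, and re-running it sends nothing because the remote site already holds every chunk. That gap is what makes WAN replication of backups practical.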

 

That's my best attempt to summarize the state of things today (December 2007). Here's a table summarizing these features.

 

| Vendor/Product | Dedupe vendor | VTL, NAS, or local | Dedupe | Dedupe domain | Dedupe replication |
| --- | --- | --- | --- | --- | --- |
| COPAN | FalconStor | VTL & NAS | Yes | Multiple heads | Yes |
| Data Domain | Data Domain | VTL & NAS | Yes | Single head | Yes |
| Diligent ProtecTier | Diligent | VTL | Yes | Single head | Can replicate deduped bytes using product of your choice |
| EMC EDL | Not announced (VTL is FalconStor) | VTL | No | N/A | N/A |
| Exagrid | Exagrid | NAS | Yes | Single head | Yes |
| FalconStor | FalconStor | VTL | Yes | Multiple heads | Yes |
| Gresham Clarita VTL | Will not be doing deduped storage | VTL | N/A | N/A | N/A |
| HDS VTL | Diligent | VTL | Yes | Single head | Can replicate deduped bytes using HDS replication products |
| HP VTL | Not announced (VTL is SEPATON) | VTL | No | N/A | N/A |
| IBM VTL | Not announced (VTL is FalconStor) | VTL | No | N/A | N/A |
| NEC Hydrastor | NEC | NAS | Yes | Multiple heads | Yes |
| NetApp Nearstore VTL | Not announced | VTL | No | N/A | N/A |
| NetApp NearStore | NetApp (WAFL based) | NAS | Yes | Single Flex-vol | Yes |
| Overland Reo | VTL is Overland, dedupe is Diligent | VTL | Yes | Single head | No |
| Quantum DXi | Quantum | VTL | Yes | Single head | Yes |
| SEPATON | SEPATON | VTL | Yes | Multiple heads | Q208 (non-deduped replication available now) |
| PureDisk Storage Unit (only works with NBU) | Symantec | LFS | Yes | Single head | No |
| Sun StorageTek VTL | Not announced | VTL | Yes | Multiple heads | N/A |

 


 

Comments
storagedoctor - Data Domain claims   | 64.252.41.xxx | 2007-12-14 05:53:44
Hi Curtis,

Long time, no talk. Just curious about your claim that DD is a good solution for a 6 TB/night requirement. That would require that they be able to move their data at their rated spec of 220 MB/sec, which we have never seen in the field. I just learned of a customer in Boston who just unplugged their box because it was slower than their previous tape backup!
cpreston - Grain of salt   | Super Administrator | 2007-12-19 23:29:50
Note to readers: Although the comment reads like "storagedoctor" and I know each other, I do not know his/her real identity, nor do I know if he/she works for an end user company or a Data Domain competitor. The verbiage of the comment would suggest the latter, so take his/her comments with a grain of salt.

My response to the actual comment is this: While I haven't seen them do 220, I have seen them do close. My original intent therefore was to say "around 6 TB." (I've changed it now to say 5-6 TB.) My point is that, while the throughput of some of these systems is not 1000s of megabytes per second, it's still enough to meet a lot of people's requirements. I know a lot of customers that back up far fewer than 6 TB a night. As with all things dedupe, your mileage may vary. Therefore you should test anything you buy.
tbiehler - Quantum   | Registered | 2007-12-18 09:44:33
Quantum is mentioned in the table but not in the body of the article. Are they re-marketing someone else's technology?
cpreston - Nope   | Super Administrator | 2008-01-11 12:14:07
DXi is their own technology.