Every once in a while someone talks to a CommVault sales rep who seems to want to classify CommVault as either source dedupe or (at the very least) not target dedupe. As someone who does not like ambiguity (except for the whole near-CDP thing), I will explain why I put them firmly in the target dedupe camp, at least for now.
Simpana uses a hash-based dedupe approach, which requires three steps (among a whole bunch of other things):
- “Chunking” the files to be backed up into segments that are typically much larger than a byte but smaller than a file. Each segment is what we’ll call a “chunk.”
- Creating a hash for that chunk. This is typically done using SHA-1, which creates a 160-bit value that is, for all practical purposes, unique to each chunk.
- Looking up the hash in a hash table to see if it has been seen before. If it has, the chunk is not stored again; if it hasn’t, the chunk is stored and its hash is added to the table.
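The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not Simpana's implementation: the fixed 4 KiB chunk size and the helper names (`chunk`, `fingerprint`, `dedupe`) are hypothetical, and real products often use variable-size chunking and store a chunk's location rather than the chunk itself in the hash table.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real products choose their own


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Step 1: split the backup stream into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(piece: bytes) -> str:
    """Step 2: compute a SHA-1 hash (160 bits, i.e. 40 hex digits) for the chunk."""
    return hashlib.sha1(piece).hexdigest()


def dedupe(data: bytes, hash_table: dict[str, bytes]) -> int:
    """Step 3: look each hash up and store the chunk only if it is new.

    Returns the number of chunks actually stored during this call.
    """
    stored = 0
    for piece in chunk(data):
        h = fingerprint(piece)
        if h not in hash_table:
            hash_table[h] = piece  # new chunk: keep it and remember its hash
            stored += 1
    return stored
```

For example, a stream of two identical 4 KiB chunks of `A` followed by one of `B` stores two chunks the first time and none on a second, identical backup.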
A typical hash-based target dedupe system does all three steps behind the backup server (please note that not all target dedupe systems are hash-based). To be considered source dedupe, you must do all three at the client. If you are not doing all three at the client, you are not deduping at the source; you are sending un-deduped (native) data across the LAN, then deduping it. And the whole point of source dedupe is to reduce LAN traffic.
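To make the LAN-traffic argument concrete, here is a rough model of what crosses the wire in each case. It is a sketch under simplifying assumptions: a fixed 4 KiB chunk size, a client-side set of previously seen digests standing in for "doing the lookup at the client," and a 20-byte SHA-1 digest sent as a reference for every chunk. The function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size
DIGEST_SIZE = 20   # a SHA-1 digest is 160 bits = 20 bytes


def lan_bytes_target(data: bytes) -> int:
    """Target dedupe: all three steps run behind the backup server,
    so the client ships the entire native (un-deduped) stream."""
    return len(data)


def lan_bytes_source(data: bytes, client_seen: set[bytes]) -> int:
    """Source dedupe: chunking, hashing, and lookup all happen at the client,
    so only new chunks (plus one digest per chunk as a reference) cross the LAN."""
    sent = 0
    for i in range(0, len(data), CHUNK_SIZE):
        piece = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha1(piece).digest()
        sent += DIGEST_SIZE          # every chunk is referenced by its digest
        if digest not in client_seen:
            client_seen.add(digest)
            sent += len(piece)       # only never-seen chunks ship their data
    return sent
```

With 16 KiB of data made of four identical chunks, the target model sends all 16,384 bytes, while the source model sends one chunk plus four digests (4,176 bytes) the first time and only the four digests (80 bytes) on a repeat backup.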
CommVault Simpana does steps 1 & 2 at the client. They can then compress the data that has been chunked & fingerprinted and send it to the media agent where the third step will take place. Because they don’t do the third step at the client, they are deduping at the target; they are a target dedupe solution.
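The split described above can be modeled as two functions: one on the client that chunks, fingerprints, and compresses, and one on the media agent that performs the lookup. This is a hypothetical sketch of the division of labor, not CommVault's code; the fixed chunk size, function names, and use of `zlib` for compression are assumptions. Note that the client sends every chunk, duplicates included, because it never consults the hash table.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


def client_side(data: bytes) -> list[tuple[bytes, bytes]]:
    """Steps 1 and 2 run on the client, followed by compression.
    Every chunk is sent, because the lookup (step 3) does not happen here."""
    payload = []
    for i in range(0, len(data), CHUNK_SIZE):
        piece = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha1(piece).digest()  # fingerprint the *uncompressed* chunk
        payload.append((digest, zlib.compress(piece)))
    return payload


def media_agent(payload: list[tuple[bytes, bytes]],
                hash_table: dict[bytes, bytes]) -> int:
    """Step 3 runs on the media agent: keep only chunks whose hash is new.
    Returns the number of chunks stored."""
    stored = 0
    for digest, compressed in payload:
        if digest not in hash_table:
            hash_table[digest] = compressed
            stored += 1
    return stored
```

Four identical chunks all travel to the media agent (each compressed), and only then are three of them discarded as duplicates. That is why this is enhanced target dedupe rather than source dedupe.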
They can (and do) argue that, because of the way they do it, they reduce LAN traffic more than a typical target dedupe system does, since they can compress the data before sending it. If you turn on client compression, for example, with CommVault (or any other backup product) and then send those compressed backups to a typical target dedupe system (e.g., Data Domain, SEPATON), the compression will hurt your dedupe ratio: a small change in the native data can change much of the compressed stream, so chunks that would have matched no longer do. Therefore, it is a recommended practice to NOT compress data on a client before sending it to a target dedupe system, unless you’re using CommVault’s target dedupe, where the chunking and fingerprinting happen at the client before compression.
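The effect of compression order on dedupe can be demonstrated with a small experiment. Everything here is a hypothetical setup: the "backups" are 256 KiB of random words, Tuesday differs from Monday by a single byte, the target uses fixed 4 KiB chunks, and the dedupe ratio is simplified to logical chunks written divided by unique chunks stored. `zlib` stands in for whatever compression a backup product might use.

```python
import hashlib
import random
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size used by the target
N_CHUNKS = 64      # 64 x 4 KiB = 256 KiB of native data per backup


def make_backup(seed: int = 42) -> bytes:
    """Hypothetical, compressible 'file system' data built from random words."""
    rng = random.Random(seed)
    words = ["backup", "restore", "chunk", "hash", "media", "agent",
             "client", "target", "source", "dedupe", "compress", "table"]
    text = " ".join(rng.choice(words) for _ in range(60_000))
    return text.encode()[: CHUNK_SIZE * N_CHUNKS]


def chunk_hashes(stream: bytes) -> list[bytes]:
    """Chunk a stream at fixed offsets and SHA-1 each chunk."""
    return [hashlib.sha1(stream[i:i + CHUNK_SIZE]).digest()
            for i in range(0, len(stream), CHUNK_SIZE)]


def dedupe_ratio(*streams: bytes) -> float:
    """Logical chunks written divided by unique chunks stored."""
    hashes = [h for s in streams for h in chunk_hashes(s)]
    return len(hashes) / len(set(hashes))


monday = make_backup()
tuesday_buf = bytearray(monday)
tuesday_buf[100] = ord("X")  # one-byte change; the original bytes are lowercase letters or spaces
tuesday = bytes(tuesday_buf)

# Typical target: the backup app compresses the stream, then the target chunks it.
compress_first = dedupe_ratio(zlib.compress(monday), zlib.compress(tuesday))
# The ordering the post attributes to CommVault: chunk and hash the native data.
# Compressing each chunk afterward (not shown) does not change these hashes.
chunk_first = dedupe_ratio(monday, tuesday)

print(f"compress, then chunk: {compress_first:.2f}")
print(f"chunk, then compress: {chunk_first:.2f}")
```

Hashing the native data means only one of the 64 chunks changes, giving a ratio of 128/65 (about 1.97). When the whole stream is compressed first, the one-byte change tends to ripple through the rest of the compressed output, so the ratio should land much closer to 1.0.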
Until they do step 3 at the client, they are target dedupe — albeit an enhanced one.
But since all they have to do to become source dedupe is add step three to their client process, I’ve got to believe they’re working on it. That’s why I say they are target dedupe, for now.
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technologist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.