De-dupe targets in TSM environments? (updated 06-08)

When speaking about de-dupe to general audiences, a very common question is, "Can TSM customers benefit from de-dupe?"  The short answer is yes, but not to the same degree as customers of other backup products.

This blog entry is one in a series of entries on deduplication.  The previous post was In-line or Post-process deduplication, and the next one in the series is Deduplication Podcast Available.


While a good de-dupe product can identify redundant files between systems and multiple versions of the same file over time, and gain a lot of de-dupe from that, the bulk of the de-dupe benefit comes from comparing the most recent full backup against the previous full backup.  Most of the blocks in each new full backup are identical to blocks in the previous full backup and are therefore eliminated.
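
To make that "compare this full against the last full" idea concrete, here is a minimal sketch of how a block-level dedupe target might behave.  It is not TSM-specific and not any vendor's actual implementation; the fixed 4 KB chunk size, SHA-256 fingerprints, and in-memory store are all simplifying assumptions.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # hypothetical fixed-size blocks; real products often use variable-size chunking


def ingest(backup: bytes, store: dict) -> tuple[int, int]:
    """Add one backup image to the dedupe store; return (blocks seen, new blocks stored)."""
    seen = new = 0
    for i in range(0, len(backup), CHUNK_SIZE):
        block = backup[i:i + CHUNK_SIZE]
        seen += 1
        fingerprint = hashlib.sha256(block).hexdigest()  # identical blocks hash to the same fingerprint
        if fingerprint not in store:
            store[fingerprint] = block  # only previously unseen blocks consume space
            new += 1
    return seen, new


# Two "full backups" of the same 8 MB filesystem, with roughly 1% of it changed between them.
week1 = os.urandom(8 * 1024 * 1024)
week2 = bytearray(week1)
week2[:80 * 1024] = os.urandom(80 * 1024)  # simulate a small amount of changed data

store: dict[str, bytes] = {}
print(ingest(week1, store))         # first full: every block is new, so all of it gets stored
print(ingest(bytes(week2), store))  # second full: nearly every block already exists, so almost nothing is stored
```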

For filesystems, TSM customers use a progressive incremental backup technology that does not typically include recurring full backups.  Remember, however, that de-dupe systems also find duplicate data between different versions of the same file.  Therefore, while TSM customers will still get some benefit from de-dupe, their ratios will not be as high as other customers', since a lot of the duplicate data is found in repeated full backups.
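
The difference this makes can be shown with a back-of-the-envelope model.  The file counts, change rate, and retention below are invented for illustration, and the model deliberately ignores duplicate blocks within changed files and across clients (which is exactly where TSM's remaining de-dupe benefit comes from), so treat the output as a shape, not a prediction.

```python
# Toy model: 1,000 files of 1 MB each, 2% of the files change each week, 12-week retention.
files, size_mb, weeks, change_rate = 1000, 1, 12, 0.02
changed_per_week = int(files * change_rate)

# Unique file versions the dedupe target actually has to keep, under either strategy.
unique_stored_mb = files * size_mb + changed_per_week * size_mb * (weeks - 1)

# Strategy A: a weekly full backup (typical of most backup products).
full_logical_mb = files * size_mb * weeks
# Strategy B: TSM-style progressive incremental -- one full, then only changed files.
incr_logical_mb = files * size_mb + changed_per_week * size_mb * (weeks - 1)

print(f"weekly fulls:            {full_logical_mb} MB sent, ratio {full_logical_mb / unique_stored_mb:.1f}:1")
print(f"progressive incremental: {incr_logical_mb} MB sent, ratio {incr_logical_mb / unique_stored_mb:.1f}:1")
# NOTE: the 1.0:1 result for progressive incremental is an artifact of ignoring sub-file and
# cross-system duplicates; in practice TSM customers still see some reduction.
```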

Also remember that there's more to life than filesystems.  TSM customers also back up databases and applications that do perform full and incremental backups.

When you combine the two, those who have tested both TSM and non-TSM backup apps seem to think that while other customers may see 20:1 or more, TSM customers who back up both filesystems and databases should probably see about half that.  If you're backing up only filesystems, you may see even less.
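
To put those ratios in capacity terms, here is a quick calculation; the 100 TB of logical backup data is a made-up figure, and the ratios are simply the ones quoted above.

```python
logical_tb = 100.0  # hypothetical amount of backup data written over the retention period

for label, ratio in [("other backup products (20:1)", 20.0), ("TSM, filesystems + databases (10:1)", 10.0)]:
    print(f"{label}: {logical_tb / ratio:.1f} TB of physical disk behind the dedupe target")
```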

Read what other TSM customers said about their experience in this thread on the TSM mailing list.  There's some confusion in that discussion between de-dupe targets (such as VTLs) that will de-dupe your TSM/NBU/NW/CV backups and de-dupe backup software (e.g., Avamar, Puredisk, Asigra) that replaces your backup software.  But it's a good discussion just the same.


Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at Sullivan Strickler, which helps companies manage their legacy data.

1 comment
  • Hi Curtis

    I don’t understand the sentence:

    Therefore, while TSM customers will still get some benefit from de-dupe, their ratios will not be as high as other customers', since a lot of the duplicate data is found in repeated full backups.

    Are you saying that incremental backups will contain a mix of completely new files and new versions of existing files, and that the latter will continue to offer good dedupe opportunities whereas the former less so? Hence a lower compression ratio than for a full backup but more than you might assume for an incremental backup?