Data Deduplication

Posted by Anonymous 
Data Deduplication
August 29, 2007 12:40PM
I'd like to steer this around a bit. Our sales folks are saying they
are losing TSM opportunities to de-dup vendors. What specific business
problem are customers trying to solve with de-dup?

I'm thinking the following:

1. Reduce the amount of disk/tape required to store backups.
Especially important for an all-disk backup solution.
2. Reduce backup times (for source de-dup I would think. No benefit in
target de-dup for this).
3. Replication of backup data across a wide area network. Obviously if
you have less stored you have less to replicate.

Others? Relative importance of these?

Does TSM in and of itself provide similar benefits in its natural state?
From this discussion adding de-dup at the backend does not necessarily
provide much though it does for the other traditional backup products.
Since we don't dup, we don't need to de-dup.

Help me get it because aside from the typical "I gotta have it because
the trade rags tell me I gotta have it", I don't get it!

Thanks, (Once again not afraid to expose my vast pool of ignorance...)

Kelly J. Lipp
VP Manufacturing & CTO
STORServer, Inc.
485-B Elkton Drive
Colorado Springs, CO 80907
719-266-8777
lipp < at > storserver.com

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
Curtis Preston
Sent: Wednesday, August 29, 2007 1:08 PM
To: ADSM-L < at > VM.MARIST.EDU
Subject: Re: [ADSM-L] Data Deduplication

[quote]As de-dup, from what I have read, compares across all files on a
"system" (server, disk storage or whatever), it seems to me that this
will be an enormous resource hog
[/quote]
Exactly. To make sure everyone understands, the "system," is the
intelligent disk target, not a host you're backing up. A de-dupe
IDT/VTL is able to de-dupe anything against anything else that's been
sent to it. This can include, for example, a file in a filesystem and
the same file inside an Exchange Sent Items folder.

[quote]The de-dup technology only compares / looks at the files within its
specific repository. Example: We have 8 ProtecTier nodes in one data
center, which equates to 8 Virtual Tape Libraries and 8 repositories.
[/quote]

There are VTL/IDT vendors that offer a multi-head approach to
de-duplication. As you need more throughput, you buy more heads, and
all heads are part of one large appliance that uses a single global
de-dupe database. That way you don't have to worry about which
backups go to which heads. Diligent's VTL Open is a multi-headed VTL,
but ProtecTier is not -- yet. I would ask them their plans for that.

While this feature is not required for many shops, I think it's a very
important feature for large shops.
Data Deduplication
August 29, 2007 01:31PM
You're correct that there are products that can provide a more global
repository. We used the Diligent VTF Open in a 2-node cluster and achieved
a 1200 MB/s write speed! Impressive, so if you don't need the de-dup, the
VTF Open product really screams.

In one of a few large data centers we see 25TB per night (FS incrementals and
full DB backups). The clustering feature that Diligent is working on is huge
for us, but we will not be the first to bleed on it, as we've already shed
some blood; you have to expect that when you are working with new
technologies. I think the cluster feature for ProtecTier is due first
quarter '08, but it's been a moving target for a year now.

Many of you hear me speak to Diligent's product; we do have some older EMC
CDLs and one Data Domain at a remote site. We did an RFP for virtual tape
solutions 18 months ago and landed on the Diligent ProtecTier because it
was the only de-dupe VTL head that was accessible via Fibre Channel.
Correct me if I'm wrong, but Data Domain uses an IP NFS mount to access the
repository, which just wouldn't scale in our environment; nothing against
any of the other VTL / de-dupers... It will be interesting to see how the
"out of band" de-duping VTLs (FalconStor) pan out. It's either pay the
performance hit of de-duping up front or touch the data twice.

Charles Hart

From: Curtis Preston <cpreston < at > GLASSHOUSE.COM>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L < at > VM.MARIST.EDU>
Sent: 08/29/2007 02:07 PM
To: ADSM-L < at > VM.MARIST.EDU
Subject: Re: [ADSM-L] Data Deduplication

[quote]As de-dup, from what I have read, compares across all files
on a "system" (server, disk storage or whatever), it seems
to me that this will be an enormous resource hog
[/quote]
Exactly. To make sure everyone understands, the "system," is the
intelligent disk target, not a host you're backing up. A de-dupe
IDT/VTL is able to de-dupe anything against anything else that's been
sent to it. This can include, for example, a file in a filesystem and
the same file inside an Exchange Sent Items folder.

[quote]The de-dup technology only compares / looks at the files within its
specific repository. Example: We have 8 ProtecTier nodes in one data
center, which equates to 8 Virtual Tape Libraries and 8 repositories.
[/quote]

There are VTL/IDT vendors that offer a multi-head approach to
de-duplication. As you need more throughput, you buy more heads, and
all heads are part of one large appliance that uses a single global
de-dupe database. That way you don't have to worry about which
backups go to which heads. Diligent's VTL Open is a multi-headed VTL,
but ProtecTier is not -- yet. I would ask them their plans for that.

While this feature is not required for many shops, I think it's a very
important feature for large shops.

Data Deduplication
August 29, 2007 01:39PM
At 03:40 PM 8/29/2007, Kelly Lipp wrote:
[quote]Help me get it because aside from the typical "I gotta have it
because the trade rags tell me I gotta have it", I don't get it!
[/quote]
Kelly,

I think you are correct in that TSM already gives you some of the
benefits that a more traditional backup product would get by using a
dedup VTL. But TSM only does it at the file level. I.e., if a file
doesn't change, TSM won't back it up again, whereas other backup
products might. But a dedup VTL will go further, in that it will
dedup more information. For example, common files that exist across
a bunch of clients (think about emails, attachments, Windows System
Objects), or also things like Oracle database backups.

There is still a benefit to using a dedup VTL in a TSM environment,
but not nearly as great as in a traditional backup environment
(father/son/grandson). Since you will likely pay some sort of
premium for a dedup VTL, the question is: is the premium worth
it? Or would you be better off buying a bunch of cheaper storage
(tape or even SATA disk) and storing those extra copies? The answer,
of course, is "it depends". But I think dedup VTLs will be a harder
sell in a TSM environment than in other environments.

As I've researched this, I'm thinking more about buying a smaller
dedup VTL as an adjunct to our other back-end storage, which would
allow us to target certain types of data that we know will dedup
well, such as Windows System Objects, Exchange server backups, Oracle
backups, etc. One problem with this is that the best way to do this
is via TSM management classes, but they are overloaded with other
things like retention, versions, etc.

It might be nice to see TSM introduce some new capabilities to help
support a dedup VTL, or perhaps do some of what Curtis calls
source-deduping. I know they've been thinking about something along
these lines for awhile.

One other point about dedup VTLs: some do their deduping in-band
whereas others do them out-of-band. The in-band ones will avoid
storing duplicate data, but can be more performance limited. This is
only an issue if you need to move more data than they can
handle. The out-of-band ones will store the data, then dedup it
afterwards. At least one of these that I know of (Sepaton) can scale
their performance by adding engines. I believe that one vendor
now supports either way of doing this.

..Paul

--
Paul Zarnowski Ph: 607-255-4757
Manager, Storage Services Fx: 607-255-8521
719 Rhodes Hall, Ithaca, NY 14853-3801 Em: psz1 < at > cornell.edu
Data Deduplication
August 29, 2007 02:12PM
Kelly,

I have more than 1 customer considering a de-dup VTL product.

It's true that for regular file systems, TSM doesn't redump unchanged
files, so people aren't getting AS LARGE a reduction in data stored (of
that type) as would a user of an old style full dump- incremental -
incremental - full dump product.

OTOH, even with TSM, your DB dumps (Exchange, SQL, most Oracle
implementations) are still for the most part full dumps, followed by
incrementals, then full dumps. The larger the database, in most cases,
the less the contents change. And you can't use subfile-backup on
anything larger than 2 GB.

I have several customers that have a relatively small number of clients
(say 50 or less), but the bulk of their daily backup data is 1 or 2 very
large data bases. And the bulk of the CONTENTS of those data bases
doesn't change all that much. Send that DB full dump to a de-dup VTL that
can identify duplicate "blobs" (I'm using that as a generic term because I
don't mean "block" in the sense of a disk block or sector and different
vendors can identify larger or smaller duplicate blobs), and you get a
very large impact that TSM can't provide. The only thing that gets stored
each day is the delta bits. Even if it's an Exchange/SQL/Oracle full-dump
day, the amount of new data to be stored may be 10% or less of what it
used to be.
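To make the "blob" idea concrete, here is a minimal sketch in plain Python. It
is an illustration only; the chunk size, the SHA-1 hashing, and the names are
made up and are not any vendor's actual algorithm. Two nightly full dumps go
in; only the blobs that changed between them consume new space:

[code]
import hashlib, os

CHUNK_SIZE = 128 * 1024   # illustrative "blob" size; real products vary widely

store = {}                # hash -> blob: the de-dup repository

def ingest(dump_bytes):
    """Store one full dump; return how many bytes were actually new."""
    new_bytes = 0
    for i in range(0, len(dump_bytes), CHUNK_SIZE):
        blob = dump_bytes[i:i + CHUNK_SIZE]
        h = hashlib.sha1(blob).hexdigest()
        if h not in store:        # only never-before-seen blobs consume space
            store[h] = blob
            new_bytes += len(blob)
    return new_bytes

# Night 1: a full dump.  Night 2: the same dump with a tiny change in the middle.
night1 = os.urandom(1024 * 1024)
night2 = night1[:500_000] + b"changed!" + night1[500_008:]
print(ingest(night1))   # the whole 1 MB is new
print(ingest(night2))   # only the one ~128 KB blob containing the delta is new
[/code]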

And I have more than 1 customer looking at a de-dup VTL as a way to make
managing their own DR sites practical, because those VTL's can replicate
to EACH OTHER across the WAN. The huge cost in transmitting your data to
a DR site is the cost of the pipe. If, however, you can get the amount of
data per day down to 10% of what it used to be by having the VTL compress
and dedup, and you have another corporate location where you can put the
other VTL, it starts looking close to cost-effective in $$ terms. In
fact, IBM recovery services is offering Data Domain equipment on the floor
in at least 1 of their recovery sites for that purpose. (The customer
installs a DD box on their site, leases the DD box in the IBM DR site,
replicates between.)

(Insert disclaimer here: I'm not necessarily a fan of replicating backup
data, because the problem my customers always have is doing the DB
recovery. I think the first choice should be replicating the real DB using
something like MIMIX, so that it's always ready to go on the recovery end.
I merely report the bit about replicating backup data because I have
customers considering it.)

Regarding the lost sales opportunities, I think you gotta go back and
consider the features that TSM has that other people don't, dedup or not -
there was a discussion on the list last month about comparing TSM to
Legato & others, and there was remarkably little emphasis on management
classes and the ability of TSM to treat different data differently
according to business needs- I still haven't seen any other product that
has what TSM provides. (Here not afraid to expose MY ignorance - would
like to know if there is anything else out there -)

Wanda

[quote]I'd like to steer this around a bit. Our sales folks are saying they
are losing TSM opportunities to de-dup vendors. What specific business
problem are customers trying to solve with de-dup?

I'm thinking the following:

1. Reduce the amount of disk/tape required to store backups.
Especially important for an all-disk backup solution.
2. Reduce backup times (for source de-dup I would think. No benefit in
target de-dup for this).
3. Replication of backup data across a wide area network. Obviously if
you have less stored you have less to replicate.

Others? Relative importance of these?

Does TSM in and of itself provide similar benefits in its natural state?
From this discussion adding de-dup at the backend does not necessarily
provide much though it does for the other traditional backup products.
Since we don't dup, we don't need to de-dup.

Help me get it because aside from the typical "I gotta have it because
the trade rags tell me I gotta have it", I don't get it!

Thanks, (Once again not afraid to expose my vast pool of ignorance...)

Kelly J. Lipp
VP Manufacturing & CTO
STORServer, Inc.
485-B Elkton Drive
Colorado Springs, CO 80907
719-266-8777
lipp < at > storserver.com

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
Curtis Preston
Sent: Wednesday, August 29, 2007 1:08 PM
To: ADSM-L < at > VM.MARIST.EDU
Subject: Re: [ADSM-L] Data Deduplication

[quote]As de-dup, from what I have read, compares across all files on a
"system" (server, disk storage or whatever), it seems to me that this
will be an enormous resource hog
[/quote]
Exactly. To make sure everyone understands, the "system," is the
intelligent disk target, not a host you're backing up. A de-dupe
IDT/VTL is able to de-dupe anything against anything else that's been
sent to it. This can include, for example, a file in a filesystem and
the same file inside an Exchange Sent Items folder.

[quote]The de-dup technology only compares / looks at the files within its
specific repository. Example: We have 8 ProtecTier nodes in one data
center, which equates to 8 Virtual Tape Libraries and 8 repositories.
[/quote]

There are VTL/IDT vendors that offer a multi-head approach to
de-duplication. As you need more throughput, you buy more heads, and
all heads are part of one large appliance that uses a single global
de-dupe database. That way you don't have to worry about which
backups go to which heads. Diligent's VTL Open is a multi-headed VTL,
but ProtecTier is not -- yet. I would ask them their plans for that.

While this feature is not required for many shops, I think it's a very
important feature for large shops.
[/quote]
Data Deduplication
August 29, 2007 03:45PM
Wanda,

Thanks for your cogent analysis. Always appreciated.

We're trying to decide if we need to offer a Data Domain sort of thing
to our customers. In the very specific case you describe, perhaps.

I am 100% with you on the "why replicate backup" when you can more
easily replicate data?! We're offering Compellent as our active data
repository and they have a very nice replication bit that is very
bandwidth friendly. I think money is better spent there than on
replicating backup data. But try convincing a customer that's had the
Kool-Aid that they don't want de-duplication!

Your comment about management classes is right on! If you limit the
number of versions of a db backup that you keep to something reasonable,
like seven, let's say, and with a 1TB database (which is big!), then you
have 7TB worst case of duplicate data! Let's see: that breaks down to
about 7 LTO4 tapes. Or 10 750GB SATA drives. Or 7 x $100 = $700 for
tape, plus slots of course, so let's say $2000. For disk, depending on
your vendor, that could cost between $3K and $8K (and if you're paying
more than that for SATA drives you perhaps ought to seek counseling!).
So how much would you be willing to spend to reduce this cost? No more
than $8K. Does a DD cost less than that? I'm not thinking so. And
unless my math is way off, you can make a reasonable argument against it
for even more db data!
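For anyone who wants to poke at the numbers, here is the same back-of-the-envelope
math as a few lines of Python. The prices are the figures quoted above, purely
illustrative, not market data:

[code]
# Back-of-the-envelope version of the math above; prices are the figures
# quoted in this post, not market data.
versions     = 7             # db backup versions kept per the management class
db_size_tb   = 1             # one "big" 1TB database
duplicate_tb = versions * db_size_tb           # worst-case duplicate data

tape_cost = 7 * 100 + 1300   # 7 LTO4 tapes at ~$100 each plus slots, call it $2,000
disk_low, disk_high = 3000, 8000               # 10 x 750GB SATA, vendor-dependent

# A de-dup appliance only pays for itself here if it costs less than simply
# buying the cheap capacity to hold the duplicates.
print(f"{duplicate_tb} TB of duplicates: tape ~${tape_cost}, disk ${disk_low}-${disk_high}")
[/code]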

It's all about mind share, isn't it? Today, de-duplication is hot...

Kelly J. Lipp
VP Manufacturing & CTO
STORServer, Inc.
485-B Elkton Drive
Colorado Springs, CO 80907
719-266-8777
lipp < at > storserver.com

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
Wanda Prather
Sent: Wednesday, August 29, 2007 3:12 PM
To: ADSM-L < at > VM.MARIST.EDU
Subject: Re: [ADSM-L] Data Deduplication

Kelly,

I have more than 1 customer considering a de-dup VTL product.

It's true that for regular file systems, TSM doesn't redump unchanged
files, so people aren't getting AS LARGE a reduction in data stored (of
that type) as would a user of an old style full dump- incremental -
incremental - full dump product.

OTOH, even with TSM, your DB dumps (Exchange, SQL, most Oracle
implementations) are still for the most part full dumps, followed by
incrementals, then full dumps. The larger the database, in most cases,
the less the contents change. And you can't use subfile-backup on
anything larger than 2 GB.

I have several customers that have a relatively small number of clients
(say 50 or less), but the bulk of their daily backup data is 1 or 2 very
large data bases. And the bulk of the CONTENTS of those data bases
doesn't change all that much. Send that DB full dump to a de-dup VTL
that can identify duplicate "blobs" (I'm using that as a generic term
because I don't mean "block" in the sense of a disk block or sector and
different vendors can identify larger or smaller duplicate blobs), and
you get a very large impact that TSM can't provide. The only thing that
gets stored each day is the delta bits. Even if it's an
Exchange/SQL/Oracle full-dump day, the amount of new data to be stored
may be 10% or less of what it used to be.

And I have more than 1 customer looking at a de-dup VTL as a way to make
managing their own DR sites practical, because those VTL's can replicate
to EACH OTHER across the WAN. The huge cost in transmitting your data
to a DR site is the cost of the pipe. If, however, you can get the
amount of data per day down to 10% of what it used to be by having the
VTL compress and dedup, and you have another corporate location where
you can put the other VTL, it starts looking close to cost-effective in
$$ terms. In fact, IBM recovery services is offering Data Domain
equipment on the floor in at least 1 of their recovery sites for that
purpose. (The customer installs a DD box on their site, leases the DD
box in the IBM DR site, replicates between.)

(Insert disclaimer here: I'm not necessarily a fan of replicating
backup data, because the problem my customers always have is doing the
DB recovery. I think the first choice should be replicating the real DB
using something like MIMIX, so that it's always ready to go on the
recovery end.
I merely report the bit about replicating backup data because I have
customers considering it.)

Regarding the lost sales opportunities, I think you gotta go back and
consider the features that TSM has that other people don't, dedup or not
- there was a discussion on the list last month about comparing TSM to
Legato & others, and there was remarkably little emphasis on management
classes and the ability of TSM to treat different data differently
according to business needs- I still haven't seen any other product that
has what TSM provides. (Here not afraid to expose MY ignorance - would
like to know if there is anything else out there -)

Wanda

[quote]I'd like to steer this around a bit. Our sales folks are saying they
are losing TSM opportunities to de-dup vendors. What specific
business problem are customers trying to solve with de-dup?

I'm thinking the following:

1. Reduce the amount of disk/tape required to store backups.
Especially important for an all-disk backup solution.
2. Reduce backup times (for source de-dup I would think. No benefit
in target de-dup for this).
3. Replication of backup data across a wide area network. Obviously
if you have less stored you have less to replicate.

Others? Relative importance of these?

Does TSM in and of itself provide similar benefits in its natural state?
From this discussion adding de-dup at the backend does not necessarily
provide much though it does for the other traditional backup products.
Since we don't dup, we don't need to de-dup.

Help me get it because aside from the typical "I gotta have it because
the trade rags tell me I gotta have it", I don't get it!

Thanks, (Once again not afraid to expose my vast pool of ignorance...)

Kelly J. Lipp
VP Manufacturing & CTO
STORServer, Inc.
485-B Elkton Drive
Colorado Springs, CO 80907
719-266-8777
lipp < at > storserver.com

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf
Of Curtis Preston
Sent: Wednesday, August 29, 2007 1:08 PM
To: ADSM-L < at > VM.MARIST.EDU
Subject: Re: [ADSM-L] Data Deduplication

[quote]As de-dup, from what I have read, compares across all files on a
"system" (server, disk storage or whatever), it seems to me that this
will be an enormous resource hog
[/quote]
Exactly. To make sure everyone understands, the "system," is the
intelligent disk target, not a host you're backing up. A de-dupe
IDT/VTL is able to de-dupe anything against anything else that's been
sent to it. This can include, for example, a file in a filesystem and
the same file inside an Exchange Sent Items folder.

[quote]The de-dup technology only compares / looks at the files within its
specific repository. Example: We have 8 ProtecTier nodes in one data
center, which equates to 8 Virtual Tape Libraries and 8 repositories.
[/quote]

There are VTL/IDT vendors that offer a multi-head approach to
de-duplication. As you need more throughput, you buy more heads, and
all heads are part of one large appliance that uses a single global
de-dupe database. That way you don't have to worry about which
backups go to which heads. Diligent's VTL Open is a multi-headed VTL,
but ProtecTier is not -- yet. I would ask them their plans for that.

While this feature is not required for many shops, I think it's a very
important feature for large shops.
[/quote]
Data Deduplication
August 30, 2007 12:09AM
When you say "losing TSM opps to de-dupe vendors," you must be talking
about de-dupe SOFTWARE vendors (Avamar, Puredisk, Asigra). I don't see
how someone buying a de-dupe VTL to go with TSM would be considered a
lost TSM opportunity.

Unlike a de-dupe VTL that can be used with TSM, de-dupe backup software
would replace TSM (or NBU, NW, etc) where it's used. De-dupe backup
software takes TSM's progressive incremental much farther, only backing
up new blocks/fragments/pieces of data that have never been seen by the
backup server. This makes de-dupe backup software really great at
backing up remote offices.

The alternative is to put a complete backup infrastructure (server,
tape, disk, etc) at the remote site and have someone swap tapes out
there. That's been the only answer for years. Now, de-dupe backup
software allows you to back up relatively large remote offices with NO
backup infrastructure at the remote site. That's nothing short of huge.

I know of a major trading firm, for example, that is now backing up
almost 300 remote sites to their central datacenter without putting any
backup infrastructure in any of them. Since a 48 hour RTO was fine for
their remote offices, they do restores locally in the central datacenter
and Fed Ex the restored systems/drives to the remote office.

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
Kelly Lipp
Sent: Wednesday, August 29, 2007 12:41 PM
To: ADSM-L < at > VM.MARIST.EDU
Subject: Re: [ADSM-L] Data Deduplication

I'd like to steer this around a bit. Our sales folks are saying they
are losing TSM opportunities to de-dup vendors. What specific business
problem are customers trying to solve with de-dup?

I'm thinking the following:

1. Reduce the amount of disk/tape required to store backups.
Especially important for an all-disk backup solution.
2. Reduce backup times (for source de-dup I would think. No benefit in
target de-dup for this).
3. Replication of backup data across a wide area network. Obviously if
you have less stored you have less to replicate.

Others? Relative importance of these?

Does TSM in and of itself provide similar benefits in its natural state?
From this discussion adding de-dup at the backend does not necessarily
provide much though it does for the other traditional backup products.
Since we don't dup, we don't need to de-dup.

Help me get it because aside from the typical "I gotta have it because
the trade rags tell me I gotta have it", I don't get it!

Thanks, (Once again not afraid to expose my vast pool of ignorance...)

Kelly J. Lipp
VP Manufacturing & CTO
STORServer, Inc.
485-B Elkton Drive
Colorado Springs, CO 80907
719-266-8777
lipp < at > storserver.com

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
Curtis Preston
Sent: Wednesday, August 29, 2007 1:08 PM
To: ADSM-L < at > VM.MARIST.EDU
Subject: Re: [ADSM-L] Data Deduplication

[quote]As de-dup, from what I have read, compares across all files on a
"system" (server, disk storage or whatever), it seems to me that this
will be an enormous resource hog
[/quote]
Exactly. To make sure everyone understands, the "system," is the
intelligent disk target, not a host you're backing up. A de-dupe
IDT/VTL is able to de-dupe anything against anything else that's been
sent to it. This can include, for example, a file in a filesystem and
the same file inside an Exchange Sent Items folder.

[quote]The de-dup technology only compares / looks at the files within its
specific repository. Example: We have 8 ProtecTier nodes in one data
center, which equates to 8 Virtual Tape Libraries and 8 repositories.
[/quote]

There are VTL/IDT vendors that offer a multi-head approach to
de-duplication. As you need more throughput, you buy more heads, and
all heads are part of one large appliance that uses a single global
de-dupe database. That way you don't have to worry about which
backups go to which heads. Diligent's VTL Open is a multi-headed VTL,
but ProtecTier is not -- yet. I would ask them their plans for that.

While this feature is not required for many shops, I think it's a very
important feature for large shops.
Data Deduplication
August 30, 2007 12:39AM
[quote]I am 100% with you on the "why replicate backup" when you can more
easily replicate data?!
[/quote]
If you've done away with traditional backup, then I'd agree. If you're
still making tapes (or virtual tapes), then you also want those tapes
offsite. You've got two choices: hand them to a dude in a truck or
replicate them. Replicating them may cost a bit more (but it's still a
lot more feasible now due to de-dupe), but you won't lose a tape and
have to go on CNN.

[quote]I think money is better spent there than on
replicating backup data. But try convincing a customer that's had the
Kool-Aid that they don't want de-duplication!
[/quote]
Since I'm passing out the Kool-Aid, I'll answer. ;)

De-dupe really does make sense, even for TSM shops. Storing (and
leaving) backups on disk is definitely the future, and de-dupe makes it
5-10 times cheaper. It just makes sense.

Replication and snapshots (which I call near-CDP) is far superior to
backup in many ways, and the Compellent story is very nice, as they
allow you to have many snapshots without a performance penalty.

But I see replication as the thing to do once you've straightened out
your backup. It's still not a complete replacement. It accomplishes
things backup doesn't, but backup accomplishes things that near-CDP
doesn't.

So..

If you're going to have backup, you're going to want it off-site. Go
back to the beginning of the post.

[quote]Your comment about management classes is right on! If you limit the
number of versions of a db backup that you keep to something reasonable,
like seven, let's say, and with a 1TB database (which is big!), then you
have 7TB worst case of duplicate data! Let's see: that breaks down to
about 7 LTO4 tapes. Or 10 750GB SATA drives. Or 7 x $100 = $700 for
tape, plus slots of course, so let's say $2000. For disk, depending on
your vendor, that could cost between $3K and $8K (and if you're paying
more than that for SATA drives you perhaps ought to seek counseling!).
So how much would you be willing to spend to reduce this cost? No more
than $8K. Does a DD cost less than that? I'm not thinking so. And
unless my math is way off, you can make a reasonable argument against it
for even more db data!
[/quote]
Your math is definitely off. No offense, but you are making the classic
mistake of looking only at the cost of the media. Those tapes are
worthless without a really expensive tape library around them. Having
said that, we really need to start talking about 30-50 TB of backed up
data before the de-dupe boxes come into play.

[quote]It's all about mind share, isn't it? Today, de-duplication is hot...
[/quote]
I've been in this business about 14 years, and I've never seen a
technology get adopted this quickly. It's going to happen. It is
happening. It's not a fad. It just makes sense.
Data Deduplication
August 30, 2007 07:35AM
[quote][quote]On Wed, 29 Aug 2007 13:40:34 -0600, Kelly Lipp <lipp < at > STORSERVER.COM> said:
[/quote][/quote]

[quote]I'd like to steer this around a bit. Our sales folks are saying
they are losing TSM opportunities to de-dup vendors. What specific
business problem are customers trying to solve with de-dup?
[/quote]
[quote]I'm thinking the following:
[/quote]
[quote]1. Reduce the amount of disk/tape required to store backups.
Especially important for an all-disk backup solution.
[/quote]
which I love.

"We don't need tape, because disk is cheap!"
[...hiatus...]
"We have to save disk! Buy (and integrate, and manage) a new product!"

[quote]2. Reduce backup times (for source de-dup I would think. No benefit in
target de-dup for this).
[/quote]
[quote]3. Replication of backup data across a wide area network. Obviously if
you have less stored you have less to replicate.
[/quote]
[quote]Others? Relative importance of these?
[/quote]
[quote]Does TSM in and of itself provide similar benefits in its natural
state? From this discussion adding de-dup at the backend does not
necessarily provide much though it does for the other traditional
backup products. Since we don't dup, we don't need to de-dup.
[/quote]
I think a back-end de-dup (de do da da) would still offer advantages
to TSM: if you've got mumblety-hundred (e.g.) Win2K boxen, then most
of their system and app space would be identical. This could,
concievably, end up as close to one system-images' worth of space on
the back end. In a fantasy. :)

However, the server would need to do an awful lot of work to correlate
all these data.

- Allen S. Rout
Data Deduplication
August 30, 2007 01:52PM
Since this message is pretty pro-de-dupe, I want to mention that I don't
sell any of this stuff. I'm just excited about the technology, have
many customers large and small using it, and want to make sure it's
accurately represented.

[quote]"We don't need tape, because disk is cheap!"
[...hiatus...]
"We have to save disk! Buy (and integrate, and manage) a new product!"
[/quote]
I would put that history slightly differently. I don't know anyone who
knew what they were doing that was saying "we don't need tape!" What
they were saying is:

"Tape drives are now way too fast! We have to stage to disk to backup
to stream the drives. Wouldn't it be cool if we could also do away with
tape onsite, but we still need it for offsite."

[...hiatus...]

"Holy crap! VTLs are expensive! Forget the store all onsite backups on
disk part. Let's just do staging. That requires a much smaller amount
of disk."

[...hiatus...]

"De-dupe is here. Using that, we can take the amount of disk that we
would have bought just for staging and store all our onsite backups on
it. Wow."

[quote]I think a back-end de-dup (de do da da) would still offer advantages
to TSM: if you've got mumblety-hundred (e.g.) Win2K boxen, then most
of their system and app space would be identical. This could,
conceivably, end up as close to one system image's worth of space on
the back end. In a fantasy. :)
[/quote]
This is not a fantasy. There are products that have been GA for 3+
years that are doing just this. These products also notice when a file
has been modified multiple times and just back up the new blocks that
were changed each time. In addition, these products also notice users'
files that are common between the filesystem and sitting inside Exchange
inboxes and Sent Items folders, for example. They notice attachments
that were sent to multiple remote offices that have already been backed
up. All of this is reality, is GA, and is being used by many companies,
many of them very, very large.

[quote]However, the server would need to do an awful lot of work to correlate
all these data.
[/quote]
It's not easy, but it's not as hard as you may think. The main work
comes from two things: computing a SHA-1 hash on each block of data and
looking up that hash in a big hash table. The first is only performed
by each client (speaking of source de-dupe) on new or changed files, so
it's not as bad as you might think. The second can handle quite a few
clients simultaneously without being a bottleneck. At some point, you
may need multiple hash tables and servers to handle the lookup, but the
workload can be distributed. For example, install a second lookup
server and each server handles lookups for half of the total list of
hashes.
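A toy sketch of those two pieces of work: hashing each block at the source, and
a lookup tier that answers "seen before?" and can be split across servers. The
class names and the routing rule (first byte of the hash) are invented here for
illustration; real products differ:

[code]
import hashlib

class LookupServer:
    """Holds part of the global hash table and answers duplicate queries."""
    def __init__(self):
        self.known = set()
    def seen_before(self, digest):
        if digest in self.known:
            return True
        self.known.add(digest)
        return False

lookup_tier = [LookupServer(), LookupServer()]   # e.g. two lookup servers

def server_for(digest):
    # Route each hash to one server, here simply by its first byte.
    return lookup_tier[digest[0] % len(lookup_tier)]

def backup_block(data):
    digest = hashlib.sha1(data).digest()
    if server_for(digest).seen_before(digest):
        return 0                 # duplicate: nothing goes over the wire
    return len(data)             # new block: this is what actually gets sent

sent = backup_block(b"contents of a block") + backup_block(b"contents of a block")
print(sent)   # the second, identical block costs nothing to send
[/code]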

As to how fast de-dupe backup software is, it's definitely fast enough
to keep up with remote offices and medium-sized datacenters. Once we
start getting into many TBs of LOCAL data (i.e. a large datacenter),
there are much more efficient ways to back it up. But if the data is
remote, de-dupe backup software is hard to beat.

(These last few comments were about de-dupe backup software -- not to be
confused with de-dupe VTLs. Those actually go VERY fast and can handle
the largest of environments.)
Data Deduplication
August 31, 2007 01:34PM
On Thu, Aug 30, 2007 at 03:09:09AM -0400, Curtis Preston wrote:
[quote]Unlike a de-dupe VTL that can be used with TSM, de-dupe backup software
would replace TSM (or NBU, NW, etc) where it's used. De-dupe backup
software takes TSM's progressive incremental much farther, only backing
up new blocks/fragments/pieces of data that have never been seen by the
backup server. This makes de-dupe backup software really great at
backing up remote offices.
[/quote]
We had Avamar out a few years ago pitching their solution, and we liked
everything about it except the price. (And now that they're a part of
EMC, I don't expect that price to drop much... *smirk*) But since we're
talking about software, there's an aspect of de-dupe that I don't think
has been explicitly mentioned yet. Avamar said their software got
10-20% reduction on a backup of a stock Windows XP installation. A
single system, say it's the first one you added to your backup group.
That's not two users with the same email attachments saved, or identical
files across two systems - that's hashing files in the OS (I presume
from headers in DLLs and such.) So if you backup two identical stock XP
installs, you get 20% reduction on the first one and 100% on the second
and beyond. Scale that up to hundreds of systems, and that's an
incredible cost savings. Suddenly backing up entire systems doesn't
seem so inefficient anymore.

Dave
Data Deduplication
August 31, 2007 01:55PM
Good point. They mainly get that 10-20% with compression. (They use
compression after they've de-duped.) They're at different levels of
granularity, so it still works.

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: Dave Mussulman [mailto]
Sent: Friday, August 31, 2007 1:34 PM
To: Curtis Preston
Cc: ADSM-L < at > VM.MARIST.EDU
Subject: Re: Data Deduplication

On Thu, Aug 30, 2007 at 03:09:09AM -0400, Curtis Preston wrote:
[quote]Unlike a de-dupe VTL that can be used with TSM, de-dupe backup software
would replace TSM (or NBU, NW, etc) where it's used. De-dupe backup
software takes TSM's progressive incremental much farther, only backing
up new blocks/fragments/pieces of data that have never been seen by the
backup server. This makes de-dupe backup software really great at
backing up remote offices.
[/quote]
We had Avamar out a few years ago pitching their solution, and we liked
everything about it except the price. (And now that they're a part of
EMC, I don't expect that price to drop much... *smirk*) But since we're
talking about software, there's an aspect of de-dupe that I don't think
has been explicitly mentioned yet. Avamar said their software got
10-20% reduction on a backup of a stock Windows XP installation. A
single system, say it's the first one you added to your backup group.
That's not two users with the same email attachments saved, or identical
files across two systems - that's hashing files in the OS (I presume
from headers in DLLs and such.) So if you backup two identical stock XP
installs, you get 20% reduction on the first one and 100% on the second
and beyond. Scale that up to hundreds of systems, and that's an
incredible cost savings. Suddenly backing up entire systems doesn't
seem so inefficient anymore.

Dave
Data Deduplication
August 31, 2007 03:13PM
On Aug 31, 2007, at 4:33 PM, Dave Mussulman wrote:

[quote]... Avamar said their software got
10-20% reduction on a backup of a stock Windows XP installation. A
single system, say it's the first one you added to your backup group.
That's not two users with the same email attachments saved, or
identical
files across two systems - that's hashing files in the OS (I presume
from headers in DLLs and such.) ...
[/quote]
I'm mildly amused that in all these postings on the subject, none has
addressed the corollary of the backups: restoral. There are likely
some implications in the restoral of files backed up this way,
perhaps most particularly in system files; and restoral performance
is also something one would wonder about. And there may be
situations where such a backup/restore regimen is to be avoided,
because of issues. Perhaps those with experience in this area would
post what they've found.

Richard Sims, at Boston University
Data Deduplication
August 31, 2007 05:37PM
I thought we DID address that in one of the posts. (Maybe I'm getting
things confused with another thread I'm having on the same topic.)

A properly designed de-duplication backup system should restore the data
at the same speed as, if not faster than the backup, and the tests that
I've done with a few of them have all worked this way. I believe it's
something you should test, but it appears that the designers thought of
this natural objection and designed around it.

I believe it has to do with the fact that restoring 100 random pieces to
create a single file means you get to read off of a bunch of spindles.

I will say that there are speed differences between the de-dupe
appliances (VTLs) and de-dupe backup software. De-dupe backup software
still restores fast enough for what it was designed for. (You should be
able to fill a GbE pipe with such a restore.) But they're not going to
restore at the 100s of MB/s that you can get out of one of the
appliances.

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
Richard Sims
Sent: Friday, August 31, 2007 3:13 PM
To: ADSM-L < at > VM.MARIST.EDU
Subject: Re: [ADSM-L] Data Deduplication

On Aug 31, 2007, at 4:33 PM, Dave Mussulman wrote:

[quote]... Avamar said their software got
10-20% reduction on a backup of a stock Windows XP installation. A
single system, say it's the first one you added to your backup group.
That's not two users with the same email attachments saved, or
identical
files across two systems - that's hashing files in the OS (I presume
from headers in DLLs and such.) ...
[/quote]
I'm mildly amused that in all these postings on the subject, none has
addressed the corollary of the backups: restoral. There are likely
some implications in the restoral of files backed up this way,
perhaps most particularly in system files; and restoral performance
is also something one would wonder about. And there may be
situations where such a backup/restore regimen is to be avoided,
because of issues. Perhaps those with experience in this area would
post what they've found.

Richard Sims, at Boston University
Data Deduplication
September 01, 2007 05:56AM
"It depends".

Just another thing to think about:

Yes, it sounds cool to reduce the footprint of all those XP files if you
have hundreds of XP systems.

But, at a site where we were backing up about 200 desktops along with
Windoze servers, I sat down and actually spent a bunch of time looking at
what was really getting backed up (there's no quick and easy way to get
this info out of TSM.)

Those OS files, while annoying, are read-only (translation, only 1 copy
per client) and are actually a very small part of today's very large hard
drives. At that particular site where I did the study, I calculated that
the OS files from 200 Windows systems made up less than 10% of the total
data stored in TSM.

Result: Not the place to spend $ or effort in reducing backup footprint.

That's not to say that de-dup won't save you bunches of space somewhere
else; just that you gotta KNOW YOUR DATA to figure out what is worth
doing.

YMWV..

[quote]On Thu, Aug 30, 2007 at 03:09:09AM -0400, Curtis Preston wrote:
[quote]Unlike a de-dupe VTL that can be used with TSM, de-dupe backup software
would replace TSM (or NBU, NW, etc) where it's used. De-dupe backup
software takes TSM's progressive incremental much farther, only backing
up new blocks/fragments/pieces of data that have never been seen by the
backup server. This makes de-dupe backup software really great at
backing up remote offices.
[/quote]
We had Avamar out a few years ago pitching their solution, and we liked
everything about it except the price. (And now that they're a part of
EMC, I don't expect that price to drop much... *smirk*) But since we're
talking about software, there's an aspect of de-dupe that I don't think
has been explicitly mentioned yet. Avamar said their software got
10-20% reduction on a backup of a stock Windows XP installation. A
single system, say it's the first one you added to your backup group.
That's not two users with the same email attachments saved, or identical
files across two systems - that's hashing files in the OS (I presume
from headers in DLLs and such.) So if you backup two identical stock XP
installs, you get 20% reduction on the first one and 100% on the second
and beyond. Scale that up to hundreds of systems, and that's an
incredible cost savings. Suddenly backing up entire systems doesn't
seem so inefficient anymore.

Dave
[/quote]
Data Deduplication
September 04, 2007 07:49AM
Hi Wanda,

I'm thinking that deduplication might be especially useful for all
those copies of Windows System Objects that are backed up
periodically, for sites that have large numbers of Windows client
nodes. TSM/Windows is unable to back them up incrementally, which
means each backup of a System Object is another copy. If you keep
the default 3 copies, and have 500 systems, that's 1500
copies. Granted, not backing up the files that haven't changed in
the first place would be best, but that doesn't seem to be an option
with Windows and TSM. I don't know about front-end dedup such as Avamar.

With Vista, we see the System Object climbing to 7-8GB per copy. In
the above scenario, that would be 12TB without deduplication. Of
course, if you've chosen not to backup System Objects, then this
won't be a factor for you.
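Spelling out that arithmetic (the ratio at the end is an assumed number for
illustration only, not a measured result):

[code]
systems       = 500
versions_kept = 3          # default number of backup versions
gb_per_copy   = 8          # Vista-era System Object size mentioned above

raw_tb = systems * versions_kept * gb_per_copy / 1000
print(f"Without de-dup: ~{raw_tb:.0f} TB of System Object backups")   # ~12 TB

assumed_ratio = 20         # purely hypothetical; real results depend on the data
print(f"At an assumed {assumed_ratio}:1 ratio: ~{raw_tb / assumed_ratio:.1f} TB")
[/code]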

It would be easy to target just these System Object files to a
different storage pool on a dedup VTL. The reduction for these files
should be substantial, I would think.

I would agree with your statement that you have to "know your data",
and think about this some. I'm not convinced that throwing a dedup
VTL behind TSM for *all* of your data makes financial sense with
TSM. I'm on the fence about this, until I see some hard
numbers. But I do think there are some good opportunities for
putting a smaller dedup VTL behind TSM for *some* of your data, if
you know which data will dedup well, and if you have enough of it to
make financial sense in your shop.

..Paul

At 08:55 AM 9/1/2007, Wanda Prather wrote:
[quote]"It depends".

Just another thing to think about:

Yes, it sounds cool to reduce the footprint of all those XP files if you
have hundreds of XP systems.

But, at a site where we were backing up about 200 desktops along with
Windoze servers, I sat down and actually spent a bunch of time looking at
what was really getting backed up (there's no quick and easy way to get
this info out of TSM.)

Those OS files, while annoying, are read-only (translation, only 1 copy
per client) and are actually a very small part of today's very large hard
drives. At that particular site where I did the study, I calculated that
the OS files from 200 Windows systems made up less than 10% of the total
data stored in TSM.

Result: Not the place to spend $ or effort in reducing backup footprint.

That's not to say that de-dup won't save you bunches of space somewhere
else; just that you gotta KNOW YOUR DATA to figure out what is worth
doing.

YMWV..
[/quote]

--
Paul Zarnowski Ph: 607-255-4757
Manager, Storage Services Fx: 607-255-8521
719 Rhodes Hall, Ithaca, NY 14853-3801 Em: psz1 < at > cornell.edu
Data Deduplication
January 23, 2008 12:28AM
Hi,
What would likely be the de-dupe ratio if tsm clients do archive processing daily (file level, no tdps) with encryption enabled?

Thanks.
Data Deduplication
January 23, 2008 07:33AM
As with all questions like this, the answer is "it depends".
It depends on the make-up of your data (# of DB full dumps, % of DB
dumps to filesystem data, % of change on the client, etc)
It depends on the vendor of DeDupe you are using.

FWIW, I am about to replace 100TB of LTO tape with a DataDomain 560
dedupe box starting next week. Once the migration from tape to disk is
complete, I will be reporting what I saw in my environment. The DD folks
are saying that the worst case scenario will be a 7X reduction (i.e.
70TB of data squeezed into a 10TB DataDomain appliance). We shall see.

Ben

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
lamont
Sent: Wednesday, January 23, 2008 1:29 AM
To: ADSM-L < at > VM.MARIST.EDU
Subject: [ADSM-L] Data Deduplication

Hi,
What would likely be the de-dupe ratio if tsm clients do archive
processing daily (file level, no tdps) with encryption enabled?

Thanks.

Data Deduplication
January 23, 2008 07:48AM
Encryption might have a DRAMATIC effect, completely eliminating the
benefits of either deduplication or compression. I predict 1:1. i.e. NO
savings for deduplication, with TSM client encryption.

This is why encryption at the tape drive is a very popular option with
LTO4. You can both encrypt and compress at the same time.
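A quick way to convince yourself of this (toy code, NOT real encryption): the
only point is that a fresh random IV makes identical plaintext produce
different ciphertext, so a hash-based de-dup engine sees every encrypted block
as unique.

[code]
import hashlib, os

def toy_encrypt(plaintext, key):
    iv = os.urandom(16)                       # new random IV for every backup
    keystream = hashlib.sha256(key + iv).digest() * (len(plaintext) // 32 + 1)
    return iv + bytes(p ^ k for p, k in zip(plaintext, keystream))

key   = b"client encryption key"
block = b"the same file block, backed up on two different nights"

night1 = toy_encrypt(block, key)
night2 = toy_encrypt(block, key)

# The de-dup engine compares hashes of what it receives -- and they never match.
print(hashlib.sha1(night1).hexdigest() == hashlib.sha1(night2).hexdigest())  # False
[/code]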

Roger Deschner University of Illinois at Chicago rogerd < at > uic.edu
Academic Computing & Communications Center

On Wed, 23 Jan 2008, Ben Bullock wrote:

[quote]As with all questions like this, the answer is "it depends".
It depends on the make-up of your data (# of DB full dumps, % of DB
dumps to filesystem data, % of change on the client, etc)
It depends on the vendor of DeDupe you are using.

FWIW, I am about to replace a 100TB of LTO tape with a DataDomain 560
dedupe box starting next week. Once the migration from tape to disk is
complete, I will be reporting what I saw in my environment. The DD folks
are saying that the worst case scenario will be a 7X reduction (i.e.
70TB of data squeezed into a 10TB DataDomain appliance). We shall see.

Ben

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
lamont
Sent: Wednesday, January 23, 2008 1:29 AM
To: ADSM-L < at > VM.MARIST.EDU
Subject: [ADSM-L] Data Deduplication

Hi,
What would likely be the de-dupe ratio if tsm clients do archive
processing daily (file level, no tdps) with encryption enabled?

Thanks.

[/quote]
Data Deduplication
January 23, 2008 07:49AM
Oooh, what a great question!
I'd guess if client encryption is on and working, the dedup ratio should be
about 1:1; because the data should never encrypt the same way twice.

On 1/23/08, lamont <tsm-forum < at > backupcentral.com> wrote:
[quote]
Hi,
What would likely be the de-dupe ratio if tsm clients do archive
processing daily (file level, no tdps) with encryption enabled?

Thanks.

[/quote]
Data Deduplication
January 23, 2008 07:57AM
I agree about client encryption wrecking dedup ratios.

FWIW however, if you turn on both COMPRESSION and ENCRYPTION on the client,
the client is also smart enough to compress first, then encrypt, so you get
the compression benefits.

However, that of course takes a lot of cycles on the client, and can really
slow down restores. Outboard compression/encryption in the hardware is
definitely superior.
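A small illustration of why the compress-then-encrypt order matters. The
"encryption" here is a toy counter-mode-style keystream, just to produce
random-looking output, and zlib stands in for client compression; none of this
is the TSM client's actual implementation:

[code]
import hashlib, zlib

def toy_encrypt(data, key=b"k"):
    # Counter-mode-style keystream: output looks random, so it won't compress.
    out = bytearray()
    for i in range(0, len(data), 32):
        ks = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out += bytes(a ^ b for a, b in zip(data[i:i + 32], ks))
    return bytes(out)

text = b"the quick brown fox jumps over the lazy dog " * 250   # ~11 KB, repetitive

compress_then_encrypt = toy_encrypt(zlib.compress(text))
encrypt_then_compress = zlib.compress(toy_encrypt(text))

print(len(compress_then_encrypt))   # small: compression saw the redundancy first
print(len(encrypt_then_compress))   # roughly original size: redundancy was hidden
[/code]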

On 1/23/08, Roger Deschner <rogerd < at > uic.edu> wrote:
[quote]
Encryption might have a DRAMATIC effect, completely eliminating the
benefits of either deduplication or compression. I predict 1:1. i.e. NO
savings for deduplication, with TSM client encryption.

This is why encryption at the tape drive is a very popular option with
LTO4. You can both encrypt and compress at the same time.

Roger Deschner University of Illinois at Chicago rogerd < at > uic.edu
Academic Computing & Communications Center

On Wed, 23 Jan 2008, Ben Bullock wrote:

[quote]As with all questions like this, the answer is "it depends".
It depends on the make-up of your data (# of DB full dumps, % of DB
dumps to filesystem data, % of change on the client, etc)
It depends on the vendor of DeDupe you are using.

FWIW, I am about to replace a 100TB of LTO tape with a DataDomain 560
dedupe box starting next week. Once the migration from tape to disk is
complete, I will be reporting what I saw in my environment. The DD folks
are saying that the worst case scenario will be a 7X reduction (i.e.
70TB of data squeezed into a 10TB DataDomain appliance). We shall see.

Ben

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
lamont
Sent: Wednesday, January 23, 2008 1:29 AM
To: ADSM-L < at > VM.MARIST.EDU
Subject: [ADSM-L] Data Deduplication

Hi,
What would likely be the de-dupe ratio if tsm clients do archive
processing daily (file level, no tdps) with encryption enabled?

Thanks.

[/quote][/quote]
Data Deduplication
January 23, 2008 08:11AM
True, and the same goes for any files that are already compressed. We have
SQL DBs doing flat-file dumps to disk with compression, and we see 1.7:1.
Ick. Also, TDP RMAN backups can use the files-per-set function; if it's set
to more than 1, RMAN will "multiplex" each file set differently, so you see
different data every time. We have our RMAN set to files per set = 1 and
the DBAs run multiple channels, so we see 20:1. Of course, our DBAs do
fulls daily.

We've even forced Compress = No in a server-side client option set,
which only applies to file system backups; the compression statement
does not apply to the TDPs as far as I know.

Also, do what you can to have like data go to the same dedupe devices
(assuming you have more than one). Example: Oracle prod / non-prod with
their associated OSs go to the same dedupe stgpool, Exchange, etc.

Data dedupe can be cool, but if you do not pay attention to your data types
you can ruin a good thing.

I can't wait to see how the newer dedupe engines that are coming out, which
perform the dedupe process "out of band," compare to the in-band dedupe
methodology. Of course, the in-band devices dedupe as data comes in, which
can affect backup performance (just add more widgets), but it will be
interesting to see how the "out of band" dedupe methodology performs if you
"get behind" (i.e., day 1's backup data is still being deduped while you
are taking in day 2's backup data, and then you add in backup stgpool,
reclamation, etc., which force the dedupe engine to re-dupe / re-factor the
data every time it is read).

There have been many dedupe threads on this list; you could almost
write a VTL / dedupe best practice guide.

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
Wanda Prather
Sent: Wednesday, January 23, 2008 9:42 AM
To: ADSM-L < at > VM.MARIST.EDU
Subject: Re: [ADSM-L] Data Deduplication

Oooh, what a great question!
I'd guess if client encryption is on and working, the dedup ratio should
be about 1:1; because the data should never encrypt the same way twice.

On 1/23/08, lamont <tsm-forum < at > backupcentral.com> wrote:
[quote]
Hi,
What would likely be the de-dupe ratio if tsm clients do archive
processing daily (file level, no tdps) with encryption enabled?

Thanks.


[/quote]

Data Deduplication
January 23, 2008 08:13AM
Hmm, I was going to say I'd expect almost none, because the encryption
wouldn't generate the same data each time through.

But maybe it depends on the encryption scheme, on how keys are managed (I
would expect the same data to encrypt the same way if the same keys are used
- although I am no cryptologist), on the level at which the data is
'collated' - changed blocks, whole files, etc. - and on how the de-dupe
algorithm of choice does its thing.

Matt.

From: rogerd < at > UIC.EDU
Sent by: ADSM-L < at > VM.MARIST.EDU
To: ADSM-L
Date: 23/01/2008 15:47
Subject: Re: [ADSM-L] Data Deduplication
Please respond to: ADSM-L < at > VM.MARIST.EDU

[quote]As with all questions like this, the answer is "it depends".
It depends on the make-up of your data (# of DB full dumps, % of DB
dumps to filesystem data, % of change on the client, etc)
It depends on the vendor of DeDupe you are using.

FWIW, I am about to replace 100TB of LTO tape with a DataDomain 560
dedupe box starting next week. Once the migration from tape to disk is
complete, I will report what I see in my environment. The DD folks
are saying that the worst-case scenario will be a 7x reduction (i.e.,
70TB of data squeezed into a 10TB DataDomain appliance). We shall see.

Ben

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
lamont
Sent: Wednesday, January 23, 2008 1:29 AM
To: ADSM-L < at > VM.MARIST.EDU
Subject: [ADSM-L] Data Deduplication

Hi,
What would likely be the de-dupe ratio if tsm clients do archive
processing daily (file level, no tdps) with encryption enabled?

Thanks.

[/quote]

Data Deduplication
January 23, 2008 09:30AM
The other posters are correct. You will get 1:1. Dedupe works by
finding patterns. There are no patterns in encrypted data.

One question would be: why would you do that? Most people are encrypting
data as it leaves their site. The best way to do that is hardware
encryption (tape drive or SAN-based). Do that on the other side of your
dedupe box and before it goes to tape -- not at the client -- and you'll
have no issues with dedupe.
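
If the drives are LTO4 or 3592, one hedged sketch of letting the TSM server manage the drive encryption (the device-class name is made up):

update devclass OFFSITE_LTO4 driveencryption=on

The drive compresses first and encrypts last, so anything upstream of it -- including a dedupe box -- still sees plain data.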

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
lamont
Sent: Wednesday, January 23, 2008 12:29 AM
To: ADSM-L < at > VM.MARIST.EDU
Subject: [ADSM-L] Data Deduplication

Hi,
What would likely be the de-dupe ratio if tsm clients do archive
processing daily (file level, no tdps) with encryption enabled?

Thanks.

Data Deduplication
January 23, 2008 09:18PM
Hi Curtis,
Unfortunately, this was already the case when I came: client encryption is the only option, and the tapes need to be sent offsite.
I think we need to consider this - enabling/disabling client encryption and seeing how it behaves - as a test case in the upcoming POC with a de-dupe vendor.

Thanks.

[quote]The other posters are correct. You will get 1:1. Dedupe works by
finding patterns. There are no patterns in encrypted data.

One question would be why would you do that? Most people are encrypting
data as it leaves their site. The best way to do that is hardware
encryption (tape drive or SAN-based). Do that on the other side of your
dedupe box and before it goes to tape -- not at the client -- and you'll
have no issues with dedupe.

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
lamont
Sent: Wednesday, January 23, 2008 12:29 AM
To: ADSM-L < at > VM.MARIST.EDU
Subject: [ADSM-L] Data Deduplication

Hi,
What would likely be the de-dupe ratio if tsm clients do archive
processing daily (file level, no tdps) with encryption enabled?

Thanks.

[/quote]
Data Deduplication
January 24, 2008 06:40AM
Is it possible to change from client encryption to tape-device encryption
(i.e. LTO4 / 3592, etc.)? Then you're encrypting your offsite copies, but
your onsite data now gets better "factoring" (dedupe) ratios.

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
lamont
Sent: Wednesday, January 23, 2008 11:18 PM
To: ADSM-L < at > VM.MARIST.EDU
Subject: [ADSM-L] Data Deduplication

Hi Curtis,
Unfortunately, this was already the case when I came, client encryption
is the only option and the tapes are needed to be sent to offsite.
I think we need to consider this - enabling/disabling client encryption
and see how - in the test case on the upcoming POC with a de-dupe
vendor.

Thanks.

cpreston wrote:
[quote]The other posters are correct. You will get 1:1. Dedupe works by
finding patterns. There are no patterns in encrypted data.

One question would be why would you do that? Most people are
encrypting data as it leaves their site. The best way to do that is
hardware encryption (tape drive or SAN-based). Do that on the other
side of your dedupe box and before it goes to tape -- not at the
client -- and you'll have no issues with dedupe.

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf
Of lamont
Sent: Wednesday, January 23, 2008 12:29 AM
To: ADSM-L < at > VM.MARIST.EDU
Subject: [ADSM-L] Data Deduplication

Hi,
What would likely be the de-dupe ratio if tsm clients do archive
processing daily (file level, no tdps) with encryption enabled?

Thanks.

[/quote]

Data Deduplication
January 24, 2008 08:29PM
Yes and no.

All the data backed up by a client in encrypted format stays encrypted; it
can only be decrypted by the original client (or a client with the original
encryption key). If you turn on encryption on the drives, that's OK: when
client-encrypted data gets sent there via MOVE DATA, reclamation, or BACKUP
STGPOOL, it will work fine. The drives apply their own encryption algorithm,
but it's transparent to everybody. The drives won't be able to compress the
client-encrypted data, but you're no worse off than you are now.

But if you turn on tape encryption, you can turn off client encryption going
forward. Then the drives will compress first and encrypt second, so you get
good compression ratios for the data. If you send your onsite data to a
de-dup VTL and your TSM copy tapes to encrypting drives, you will get the
benefits of dedup in the VTL and the benefits of compression on the
drives. As the older client-encrypted data expires, your overall compression
ratio will get better over time.
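
A rough sketch of the client-side half of that change for a UNIX client (the pattern is only an example; Windows paths would look different):

* stop encrypting new backups: remove the include.encrypt statements,
* or add an explicit exclude, e.g.
exclude.encrypt /.../*

Combined with DRIVEENCRYPTION=ON on the copy-pool device class, the old client-encrypted objects simply age out while everything new factors properly on the VTL.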

On 1/24/08, Hart, Charles A <charles_hart < at > uhc.com> wrote:
[quote]
Is it possible to change from Client Encrypt to Tape Device Encrypt?
(i.e. LTO4 / 3592 etc) The you're encrypting your offsite but your
onsite is now getting better "Factoring" compression ratios.

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf Of
lamont
Sent: Wednesday, January 23, 2008 11:18 PM
To: ADSM-L < at > VM.MARIST.EDU
Subject: [ADSM-L] Data Deduplication

Hi Curtis,
Unfortunately, this was already the case when I came, client encryption
is the only option and the tapes are needed to be sent to offsite.
I think we need to consider this - enabling/disabling client encryption
and see how - in the test case on the upcoming POC with a de-dupe
vendor.

Thanks.

cpreston wrote:
[quote]The other posters are correct. You will get 1:1. Dedupe works by
finding patterns. There are no patterns in encrypted data.

One question would be why would you do that? Most people are
encrypting data as it leaves their site. The best way to do that is
hardware encryption (tape drive or SAN-based). Do that on the other
side of your dedupe box and before it goes to tape -- not at the
client -- and you'll have no issues with dedupe.

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto] On Behalf
Of lamont
Sent: Wednesday, January 23, 2008 12:29 AM
To: ADSM-L < at > VM.MARIST.EDU
Subject: [ADSM-L] Data Deduplication

Hi,
What would likely be the de-dupe ratio if tsm clients do archive
processing daily (file level, no tdps) with encryption enabled?

Thanks.

[/quote]

[/quote]