| Author |
Message |
Hughes, George Guest
|
Posted: Sun Aug 26, 2007 1:16 am Post subject: Data Deduplication |
|
|
Is TSM planning on adding data deduplication similar to Avamar? I
understand that TSM does not duplicate data now, but minor edits to files
or simple file name changes would result in additional copies of the
entire file using TSM today.
We recently had a pitch from EMC on Avamar. I can think of some reasons
to pass on it (having two separate backup/restore solutions is a big
one, cost, etc.), but some persuasive arguments were made supporting their
solution. If TSM is going to be adding similar functionality soon, it may
be another reason to focus on other efforts.
George Hughes
Senior UNIX Engineer
Children's National Medical Center
12211 Plum Orchard Dr.
Silver Spring, MD 20904
(301) 572-3693 |
|
|
 |
Richard Sims Guest
|
Posted: Sun Aug 26, 2007 4:24 am Post subject: Data Deduplication |
|
|
On Aug 26, 2007, at 4:58 AM, Hughes, George wrote:
| Quote: | Is TSM planning on adding data deduplication similar to avamar? I
understand how TSM does not duplicate data now but minor edits in
files or simple file name changes would result in additional copies of the
entire file using TSM today.
|
Except in Windows, where Adaptive Subfile Backup may be employed.
That's as far as it has gone in the product thus far.
Richard Sims |
|
|
 |
Fred Johanson Guest
|
Posted: Sun Aug 26, 2007 7:25 am Post subject: Data Deduplication |
|
|
But it is being pursued for future release - after the conversion of the DB to DB2. |
|
|
 |
cpreston Site Admin
Joined: 04 May 2007 Posts: 667
|
Posted: Sun Aug 26, 2007 3:06 pm Post subject: Data Deduplication |
|
|
| Quote: | Is TSM planning on adding data deduplication similar to avamar?
|
As mentioned by Richard, the closest thing TSM has to this now is
subfile backup. It is related to de-duplication: once it has a
backup of a given file, it backs up only the changed bytes of that file.
This is also referred to as delta incrementals.
True de-duplication takes this much further, as it would recognize a
file or email that's duplicated on two or three different systems, such
as an attachment/email that's sent to users on several different
Exchange servers. The "compression" ratios it can achieve are therefore
much higher than those of delta incrementals.
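To make the cross-system case concrete, here is a minimal sketch of a content-addressed chunk store. It is only an illustration: the `DedupStore` class, the fixed 4 KiB chunk size, and the use of SHA-256 are all assumptions made for the example, not a description of how Avamar, PureDisk, or any de-dupe target actually works.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products differ


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept only once."""

    def __init__(self):
        self.chunks = {}        # SHA-256 hex digest -> chunk bytes
        self.logical_bytes = 0  # total bytes clients asked us to back up

    def backup(self, data: bytes) -> list:
        """Store data; return the list of chunk digests needed to rebuild it."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op if already stored
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def restore(self, recipe) -> bytes:
        """Rebuild the original data from its chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


# Deterministic 16 KiB sample "attachment" whose chunks are all distinct.
attachment = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(512))

# The same attachment backed up from three hypothetical mail servers.
store = DedupStore()
recipes = [store.backup(attachment) for _ in range(3)]
ratio = store.logical_bytes / store.stored_bytes()
print(f"logical={store.logical_bytes} stored={store.stored_bytes()} ratio={ratio:.1f}:1")
```

A delta incremental of one client would not help here, because each server sees the attachment as a brand-new file; the shared chunk index is what lets the second and third copies cost nothing.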
| Quote: | I understand how TSM does not duplicate data now but minor edits in
files or simple file name changes would result in additional copies of the
entire file using TSM today.
|
Instead of switching from TSM to something like Avamar (EMC) or PureDisk
(Symantec), a TSM user can benefit from de-dupe today by using a
de-duplication backup target, such as a de-dupe VTL or NAS device. Just
make sure you realize that you won't get the same de-dupe ratios as
non-TSM users. (TSM customers who switch to a de-dupe target are seeing
approximately 10:1 de-dupe ratios, where non-TSM customers are seeing 20:1.)
Most TSM users don't do repeated full backups of their filesystems, and
a lot of the duplicated data comes from those full backups. But TSM
users still have duplicated data: multiple versions of the same file and
database backups. You already mentioned edited versions of the same
file. It is also common that a file will be present in multiple places.
In addition, TSM users do perform periodic full backups of their
database data.
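As a rough back-of-the-envelope sketch of why repeated fulls inflate the ratio, the arithmetic below compares a weekly-full schedule with an incremental-forever schedule writing to the same de-dupe target. Every number in it (10 TB of primary data, 2% of data resent per day, 25% of the resent bytes being truly new, 84 days of retention) is an assumption picked for illustration, not a measurement from any site.

```python
# All figures below are illustrative assumptions, not measurements.
PRIMARY_TB = 10.0     # size of the protected data
DAYS = 84             # retention window (12 weeks)
DAILY_CHANGED = 0.02  # fraction of data resent each day (changed files, whole)
TRULY_NEW = 0.25      # fraction of the resent bytes that is actually new


def dedupe_ratio(logical_tb: float, unique_tb: float) -> float:
    """Ratio of data sent to the target versus data the target must keep."""
    return logical_tb / unique_tb


daily_resent_tb = PRIMARY_TB * DAILY_CHANGED

# Unique data is the same for both schedules: one base copy plus the
# truly new bytes from each later day.
unique_tb = PRIMARY_TB + (DAYS - 1) * daily_resent_tb * TRULY_NEW

# Schedule 1: a full every 7th day, incrementals on the other days.
full_days = len(range(0, DAYS, 7))
incr_days = DAYS - full_days
logical_weekly_full = full_days * PRIMARY_TB + incr_days * daily_resent_tb

# Schedule 2: one initial full, then incrementals forever (TSM style).
logical_incr_forever = PRIMARY_TB + (DAYS - 1) * daily_resent_tb

ratio_weekly_full = dedupe_ratio(logical_weekly_full, unique_tb)
ratio_incr_forever = dedupe_ratio(logical_incr_forever, unique_tb)
print(f"weekly fulls:        {ratio_weekly_full:.1f}:1")
print(f"incremental forever: {ratio_incr_forever:.1f}:1")
```

Both schedules end up storing the same unique data; the weekly-full schedule simply sends far more redundant data, so its ratio looks better. The ratio measures how wasteful the input stream was as much as how good the target is.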
| Quote: | We recently had a pitch from EMC on avamar. I can think of some reasons
to pass on it (Having two separate backup/restore solutions is a big
one, cost etc) but some persuasive arguments were made supporting their
solution.
|
If you like the idea of using de-dupe to back up your remote offices
(which is what Avamar and PureDisk are designed for), but want to stay
with TSM, again de-dupe targets can help. Buy a small de-dupe target to
place at your remote site, perform TSM backups to it, then replicate the
new/unique blocks to a central location as your offsite mechanism.
| Quote: | If TSM is going to be adding similar functionality soon it may
be another reason to focus on other efforts.
|
Writing a de-dupe backup product isn't easy. EMC bought Avamar and
Symantec bought Data Center Technologies to get their respective
products. I don't know of any other de-dupe companies for IBM to
acquire, so they'll have to write their own. That may take them a bit
longer. |
|
|
 |
Dirk Kastens Guest
|
Posted: Sun Aug 26, 2007 11:31 pm Post subject: Data Deduplication |
|
|
Hi,
| Quote: | Writing a de-dupe backup product isn't easy. EMC bought Avamar and
Symantec bought Data Center Technologies to get their respective
products. I don't know of any other de-dupe companies for IBM to
acquire, so they'll have to write their own. That may take them a bit
longer.
|
We're just testing a deduplication disk array from DataDomain with TSM.
The compression ratio is much less than promised by the sales people.
During the last 10 days of incremental backups we only achieved a ratio
of 2.6:1. The disk array is very expensive and for the money you can buy
more disks than you need without compression.
--
Regards,
Dirk Kastens
Universitaet Osnabrueck, Rechenzentrum (Computer Center)
Albrechtstr. 28, 49069 Osnabrueck, Germany
Tel.: +49-541-969-2347, FAX: -2470 |
|
|
 |
cpreston Site Admin
Joined: 04 May 2007 Posts: 667
|
Posted: Mon Aug 27, 2007 4:27 am Post subject: Data Deduplication |
|
|
How are you using it? As your disk cache? You have to store backups on
it long term in order to get de-duplication.
---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies |
|
|
 |
Charles A Hart Guest
|
Posted: Mon Aug 27, 2007 5:24 am Post subject: Data Deduplication |
|
|
Because TSM does incremental backups, your de-dupe ratio will be lower than
with other full/incremental backup products. Here are a few lessons learned
with TSM and a Diligent ProtecTIER.
1) Do the best you can to put like data together (i.e., all Oracle
DB backups go to the same de-dupe virtual tape head (repository)).
2) Turn off all compression (client and DBs).
3) Oracle specific:
Do not use RMAN's multiplexing (multiplexing in RMAN will combine 4
channels together, and the backup data will then be unique every time, thus
not allowing for de-duping).
Use the File Seq=1 (then run multiple channels).
4) Do not mix Windows and Unix data; it won't de-dupe well.
We are seeing 10:1 and 15:1 on our Oracle and DB2, and 12:1 on Exchange (the
DBs and Exchange all do daily full backups). In the regular Windows
environment, we see 1.45:1 and 2:1, ick...
Hope this helps.
Regards,
Charles |
|
|
 |
Allen S. Rout Guest
|
Posted: Mon Aug 27, 2007 6:46 am Post subject: Data Deduplication |
|
|
On Sun, 26 Aug 2007 04:58:45 -0400, "Hughes, George" <GHughes < at > CNMC.ORG> said:
| Quote: | Is TSM planning on adding data deduplication similar to avamar? I
understand how TSM does not duplicate data now but minor edits in
files or simple file name changes would result in additional copies
of the entire file using TSM today.
|
The 'Rename a top level directory' problem is the best case I've seen
for something like this in TSM.
Every pitch I've yet seen on de-dupe has glossed over where the
metadata goes, and how it's defended. If you see a data deduplication
solution which doesn't take at -least- as much care over the DB as we
do in TSM land, then my opinion is "Flee at flank speed".
In TSM-land we're very sensitive to the fact that our TSM database is
both the key to our featureset and the most delicate part of our
infrastructure, so we take neurotic degrees of care with it. I think
perhaps the new products have yet to blood themselves. Be careful it
doesn't splash on you when they do.
- Allen S. Rout |
|
|
 |
Paul Zarnowski Guest
|
Posted: Mon Aug 27, 2007 7:04 am Post subject: Data Deduplication |
|
|
At 09:24 AM 8/27/2007, Charles A Hart wrote:
| Quote: | We are seeing 10 and 15:1 on our Oracle and DB2, Exchange 12:1 (The
DB's and Exchange all do Daily Full Backups) In the regular Win
env, we see 1.45 and 2:1 ick...
|
Charles, how many Windows clients was this with? I've been thinking
about this and am thinking that targeting specific data to a smaller
deduping VTL might make more sense than just putting everything
there. Specifically, Windows System Objects might be a good
candidate, as well as e-mail attachments.
Thanks for the Oracle hints.
..Paul
--
Paul Zarnowski Ph: 607-255-4757
Manager, Storage Services Fx: 607-255-8521
719 Rhodes Hall, Ithaca, NY 14853-3801 Em: psz1 < at > cornell.edu |
|
|
 |
Wanda Prather Guest
|
Posted: Mon Aug 27, 2007 7:32 am Post subject: Data Deduplication |
|
|
This is GREAT information, thanks much!
| Quote: | Because TSM does incremental backups, your de-dupe ratio will be lower than
with other full/incremental backup products. Here are a few lessons learned
with TSM and a Diligent ProtecTIER.
1) Do the best you can to put like data together (i.e., all Oracle
DB backups go to the same de-dupe virtual tape head (repository)).
2) Turn off all compression (client and DBs).
3) Oracle specific:
Do not use RMAN's multiplexing (multiplexing in RMAN will combine 4
channels together, and the backup data will then be unique every time, thus
not allowing for de-duping).
Use the File Seq=1 (then run multiple channels).
4) Do not mix Windows and Unix data; it won't de-dupe well.
We are seeing 10:1 and 15:1 on our Oracle and DB2, and 12:1 on Exchange (the
DBs and Exchange all do daily full backups). In the regular Windows
environment, we see 1.45:1 and 2:1, ick...
|
|
|
|
 |
cpreston Site Admin
Joined: 04 May 2007 Posts: 667
|
Posted: Mon Aug 27, 2007 8:40 am Post subject: Data Deduplication |
|
|
| Quote: | 3) Oracle Specific
Do not use RMAN's Multiplexing in RMAN will combine 4
Channels together and the backup data then will be unique every time thus
not allowing for de-duping)
Use the File Seq=1 (Then run multiple channels)
|
I don't see how this would affect de-duplication if your de-dupe product
knows what it's doing. Every block coming into the device should be
compared to every other block ever seen by the device. So combining
multiple files together using Oracle multiplexing shouldn't affect
de-dupe.
Did you test this, or see it in the docs somewhere? Was this true for
multiple de-dupe vendors, or just the one you chose? |
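One way to reason about this question is a small experiment with a toy fixed-size chunker. The sketch below is not any vendor's algorithm: the 4 KiB chunk size, the helper names, and the two-file "multiplexer" are all assumptions for illustration. It interleaves the same two files in a different order on the second backup and counts how many chunks the device has never seen.

```python
import hashlib

CHUNK = 4096  # assumed de-dupe chunk size of the toy device


def make_file(name: str, n_blocks: int) -> bytes:
    """Deterministic pseudo-random content of n_blocks * CHUNK bytes."""
    out = bytearray()
    counter = 0
    while len(out) < n_blocks * CHUNK:
        out += hashlib.sha256(f"{name}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:n_blocks * CHUNK])


def split(data: bytes, unit: int) -> list:
    return [data[i:i + unit] for i in range(0, len(data), unit)]


def multiplex(file_a: bytes, file_b: bytes, unit: int, a_first: bool) -> bytes:
    """Interleave two equal-sized files in pieces of `unit` bytes."""
    stream = bytearray()
    for pa, pb in zip(split(file_a, unit), split(file_b, unit)):
        stream += (pa + pb) if a_first else (pb + pa)
    return bytes(stream)


def new_chunks(stream: bytes, seen: set) -> int:
    """Count chunks the device has not seen before, and remember them."""
    count = 0
    for chunk in split(stream, CHUNK):
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            seen.add(digest)
            count += 1
    return count


def second_backup_new_chunks(unit: int) -> int:
    """Back up the same two files twice with a different interleave order."""
    a, b = make_file("a", 8), make_file("b", 8)
    seen = set()
    new_chunks(multiplex(a, b, unit, a_first=True), seen)          # backup 1
    return new_chunks(multiplex(a, b, unit, a_first=False), seen)  # backup 2


aligned = second_backup_new_chunks(unit=CHUNK)   # interleave unit == chunk size
misaligned = second_backup_new_chunks(unit=1024) # interleave unit < chunk size
print(f"aligned: {aligned} new chunks, misaligned: {misaligned} new chunks")
```

In this toy model, when the interleave unit lines up with the chunk boundaries, the second backup contributes no new chunks even though the order changed. When the interleave unit is smaller than the chunk, every chunk becomes a different mixture and nothing matches. Whether a real device behaves like the first or the second case depends on how it finds chunk boundaries, which is a question for each vendor.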
|
|
 |
cpreston Site Admin
Joined: 04 May 2007 Posts: 667
|
Posted: Mon Aug 27, 2007 8:48 am Post subject: Data Deduplication |
|
|
| Quote: | Every pitch I've yet seen on de-dupe has glossed over where the
metadata goes, and how it's defended. If you see a data deduplication
solution which doesn't take at -least- as much care over the DB as we
do in TSM land, then my opinion is "Flee at flank speed".
|
I have seen that, but my experience is that if you ask the right
questions, you'll get the right answers. If they DON'T give you the
right answers, then go on to the next vendor.  |
|
|
 |
Ben Bullock Guest
|
Posted: Mon Aug 27, 2007 8:56 am Post subject: Data Deduplication |
|
|
Preston, I believe it depends on the de-dupe technology being used. We
have started to play with the NetApp A-SIS (dedupe product), and at least
in their case they don't look at every block coming into the host.
Their documentation is lacking, but from what we have been able
to deduce, it seems to take a hash of the first chunk of all the files,
then compares hashes and tries to de-dupe if the hashes match. We
saw that 400GB of 5GB files took about 3 minutes to try to dedupe and
400GB of 1MB files took over 23 hours. In this case the number of files
seems to dictate how long a de-dupe will take. To me, that doesn't sound
like it is looking at every block, because the number of blocks with
data on the filer is actually the same between the 2 attempts.
Like I said, this is my interpretation of the results of my
testing, not anything I saw documented.
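For anyone who wants to check the arithmetic behind that reasoning, here is a quick sketch. It assumes binary units (1 GB = 1024 MB) and takes "about 3 minutes" and "over 23 hours" at face value as 3 and 1380 minutes, so the results are only approximate.

```python
GB = 1024 ** 3  # assuming binary units
MB = 1024 ** 2

total_bytes = 400 * GB

# File counts for the two test sets (same total data, same number of blocks).
large_files = total_bytes // (5 * GB)
small_files = total_bytes // (1 * MB)
file_ratio = small_files / large_files

# Reported run times, taken at face value.
large_minutes = 3
small_minutes = 23 * 60
time_ratio = small_minutes / large_minutes

print(f"{large_files} files vs {small_files} files ({file_ratio:.0f}x more files)")
print(f"{large_minutes} min vs {small_minutes} min ({time_ratio:.0f}x longer)")
```

The amount of data is identical in both runs, yet the run time grows by a factor of several hundred as the file count grows by a factor of several thousand. That is consistent with per-file work dominating the run time, although it does not prove how the product works internally.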
Ben |
|
|
 |
Charles A Hart Guest
|
Posted: Mon Aug 27, 2007 9:53 am Post subject: Data Deduplication |
|
|
According to Diligent, when RMAN uses multiplexing, it intermingles the
data from each RMAN channel, so the data blocks will be different every
time, similar to multiplexing with NetBackup... I'm not
an RMAN expert, just trusting what the vendor is stating.
The following link seems to match what we are being told:
http://download.oracle.com/docs/cd/B19306_01/backup.102/b14191/rcmconc1002.htm
(Look for the Multiplex section.)
Is there an RMAN expert in the house? Can someone confirm this info?
Charles Hart
UHT - Data Protection
(763)744-2263
Sharepoint:
http://unitedteams.uhc.com/uht/EnterpriseStorage/DataProtection/default.aspx |
|
|
 |
cpreston Site Admin
Joined: 04 May 2007 Posts: 667
|
Posted: Mon Aug 27, 2007 10:23 am Post subject: Data Deduplication |
|
|
I agree that this is what Oracle does. What I'm not sure of is whether
this de-dupe issue applies to de-dupe vendors other than Diligent.
I've fired off a few emails and I'll reply when I hear back.
---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies |
|
|
 |
|