Data Deduplication
Hughes, George
Guest





Posted: Sun Aug 26, 2007 1:16 am    Post subject: Data Deduplication

Is TSM planning on adding data deduplication similar to Avamar? I
understand how TSM avoids duplicating data now, but minor edits in files
or simple file name changes still result in additional copies of the
entire file with TSM today.

We recently had a pitch from EMC on Avamar. I can think of some reasons
to pass on it (having two separate backup/restore solutions is a big
one, plus cost, etc.), but some persuasive arguments were made supporting
their solution. If TSM is going to be adding similar functionality soon, it
may be another reason to focus on other efforts.



George Hughes

Senior UNIX Engineer

Children's National Medical Center

12211 Plum Orchard Dr.

Silver Spring, MD 20904

(301) 572-3693


Richard Sims
Guest





Posted: Sun Aug 26, 2007 4:24 am    Post subject: Data Deduplication

On Aug 26, 2007, at 4:58 AM, Hughes, George wrote:

Quote:
Is TSM planning on adding data deduplication similar to avamar? I
understand how TSM does not duplicate data now but minor edits in files
or simple file name changes would result in additional copies of the
entire file using TSM today.

Except in Windows, where Adaptive Subfile Backup may be employed.
That's as far as it has gone in the product thus far.

Richard Sims
Fred Johanson
Guest





Posted: Sun Aug 26, 2007 7:25 am    Post subject: Data Deduplication

But it is being pursued for future release - after the conversion of the DB to DB2.

cpreston
Site Admin


Joined: 04 May 2007
Posts: 667

Posted: Sun Aug 26, 2007 3:06 pm    Post subject: Data Deduplication

Quote:
Is TSM planning on adding data deduplication similar to avamar?

As mentioned by Richard, the closest thing TSM has to this now is
subfile backup. It is related to de-duplication: once it has a
backup of a given file, it backs up only the changed bytes of that file.
This is also referred to as delta incrementals.

True de-duplication takes this much further, as it would recognize a
file or email that's duplicated on two or three different systems, such
as an attachment/email that's sent to users on several different
Exchange servers. The "compression" ratios it can achieve are therefore
much higher than those of delta incrementals.
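
To make the cross-system case concrete, here is a minimal Python sketch of a content-addressed store. It is only an illustration under stated assumptions: the class name DedupeStore, the fixed 4 KiB chunk size, the host names, and the file path are all hypothetical, and no particular product is claimed to work this way.

```python
import hashlib


class DedupeStore:
    """Toy content-addressed store: each unique chunk is kept once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}        # SHA-256 digest -> chunk bytes
        self.catalog = {}       # (host, path) -> ordered list of digests
        self.logical_bytes = 0  # total bytes the clients asked us to store

    def backup(self, host, path, data):
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if unseen
            digests.append(digest)
        self.catalog[(host, path)] = digests
        self.logical_bytes += len(data)

    def restore(self, host, path):
        return b"".join(self.chunks[d] for d in self.catalog[(host, path)])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


# A 16 KiB "attachment" built from distinct 32-byte pieces, so that its
# four 4 KiB chunks differ from one another.
attachment = b"".join(
    hashlib.sha256(str(i).encode()).digest() for i in range(512)
)

store = DedupeStore()
for host in ("exchange1", "exchange2", "exchange3"):  # hypothetical hosts
    store.backup(host, "/mail/attachment.pdf", attachment)

ratio = store.logical_bytes / store.stored_bytes()
print(f"logical={store.logical_bytes} stored={store.stored_bytes()} "
      f"ratio={ratio:.1f}:1")
```

The same attachment backed up from three servers occupies the space of one copy, which a per-file delta scheme cannot achieve because it only compares a file against its own earlier versions.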

Quote:
I understand how TSM does not duplicate data now but minor edits in
files or simple file name changes would result in additional copies of the
entire file using TSM today.

Instead of switching from TSM to something like Avamar (EMC) or PureDisk
(Symantec), a TSM user can benefit from de-dupe today by using a
de-duplication backup target, such as a de-dupe VTL or NAS device. Just
make sure you realize that you won't get the same de-dupe ratios as non-TSM
users. (TSM customers who switch to a de-dupe target are seeing approximately
10:1 de-dupe ratios, where non-TSM customers are seeing 20:1.)

Most TSM users don't do repeated full backups of their filesystems, and
a lot of the duplicated data comes from those full backups. But TSM
users still have duplicated data: multiple versions of the same file and
database backups. You already mentioned edited versions of the same
file. It is also common that a file will be present in multiple places.
In addition, TSM users do perform periodic full backups of their
database data.
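
A rough back-of-the-envelope calculation shows why the ratio depends so much on the backup scheme. Every number below is a hypothetical input chosen for illustration (primary data size, daily volume of re-sent files, the fraction of that volume that is genuinely new, and the retention period); none of them comes from a measurement.

```python
def dedupe_ratio(logical_gb, stored_gb):
    """Ratio of data sent to the target versus unique data it must keep."""
    return logical_gb / stored_gb


# Hypothetical inputs, for illustration only.
PRIMARY_GB = 1000       # size of the protected data
DAILY_RESENT_GB = 20    # whole files re-sent by each daily incremental
NEW_FRACTION = 0.25     # share of those re-sent bytes that is truly new
WEEKS = 12              # retention period
DAYS = WEEKS * 7

# Unique data the target ends up holding; assumed equal for both schemes.
stored = PRIMARY_GB + DAYS * DAILY_RESENT_GB * NEW_FRACTION

# Scheme 1: a full every week plus six daily incrementals.
weekly_full_logical = WEEKS * (PRIMARY_GB + 6 * DAILY_RESENT_GB)

# Scheme 2: one initial full, then incrementals only.
incremental_forever_logical = PRIMARY_GB + DAYS * DAILY_RESENT_GB

ratio_weekly_full = dedupe_ratio(weekly_full_logical, stored)
ratio_incremental = dedupe_ratio(incremental_forever_logical, stored)

print(f"weekly fulls:        {ratio_weekly_full:.2f}:1")
print(f"incremental forever: {ratio_incremental:.2f}:1")
```

Under these assumptions the target stores the same unique data either way; the scheme with repeated fulls simply sends far more redundant data, so its ratio looks much better.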

Quote:
We recently had a pitch from EMC on avamar. I can think of some reasons
to pass on it (Having two separate backup/restore solutions is a big
one, cost etc) but some persuasive arguments were made supporting their
solution.

If you like the idea of using de-dupe to back up your remote offices
(which is what Avamar and PureDisk are designed for), but want to stay
with TSM, again de-dupe targets can help. Buy a small de-dupe target to
place at your remote site, perform TSM backups to it, then replicate the
new/unique blocks to a central location as your offsite mechanism.
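
The replication step can be sketched as follows. This is a simplified model, not any vendor's protocol: the function names, the 4 KiB chunk size, and the in-memory dictionaries standing in for the remote and central devices are all assumptions made for the example.

```python
import hashlib


def chunk_map(data, chunk_size=4096):
    """Split data into fixed-size chunks keyed by their SHA-256 digest."""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest(): data[i:i + chunk_size]
        for i in range(0, len(data), chunk_size)
    }


def replicate(remote_chunks, central_chunks):
    """Copy only the chunks the central site lacks; return bytes sent."""
    sent = 0
    for digest, chunk in remote_chunks.items():
        if digest not in central_chunks:
            central_chunks[digest] = chunk
            sent += len(chunk)
    return sent


def block(tag):
    """Build a deterministic 4096-byte block from a short tag."""
    return hashlib.sha256(tag).digest() * 128


remote, central = {}, {}

# Night 1: four blocks are backed up at the remote office.
night1 = block(b"a") + block(b"b") + block(b"c") + block(b"d")
remote.update(chunk_map(night1))
sent_night1 = replicate(remote, central)

# Night 2: only the third block has changed.
night2 = block(b"a") + block(b"b") + block(b"X") + block(b"d")
remote.update(chunk_map(night2))
sent_night2 = replicate(remote, central)

print(sent_night1, sent_night2)
```

After the first night seeds the central site, only the one changed block crosses the WAN on the second night.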

Quote:
If TSM is going to be adding similar functionality soon it may
be another reason to focus on other efforts.

Writing a de-dupe backup product isn't easy. EMC bought Avamar and
Symantec bought Data Center Technologies to get their respective
products. I don't know of any other de-dupe companies for IBM to
acquire, so they'll have to write their own. That may take them a bit
longer.
Dirk Kastens
Guest





Posted: Sun Aug 26, 2007 11:31 pm    Post subject: Data Deduplication

Hi,

Quote:
Writing a de-dupe backup product isn't easy. EMC bought Avamar and
Symantec bought Data Center Technologies to get their respective
products. I don't know of any other de-dupe companies for IBM to
acquire, so they'll have to write their own. That may take them a bit
longer.

We're just testing a deduplication disk array from DataDomain with TSM.
The compression ratio is much less than promised by the sales people.
During the last 10 days of incremental backups we only achieved a ratio
of 2.6:1. The disk array is very expensive and for the money you can buy
more disks than you need without compression.

--
Regards,

Dirk Kastens
Universitaet Osnabrueck, Rechenzentrum (Computer Center)
Albrechtstr. 28, 49069 Osnabrueck, Germany
Tel.: +49-541-969-2347, FAX: -2470
cpreston
Site Admin


Joined: 04 May 2007
Posts: 667

Posted: Mon Aug 27, 2007 4:27 am    Post subject: Data Deduplication

How are you using it? As your disk cache? You have to store backups on
it long term in order to get de-duplication.

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

Charles A Hart
Guest





Posted: Mon Aug 27, 2007 5:24 am    Post subject: Data Deduplication

Because TSM does incremental backups, your de-dupe ratio will be lower than
with other full/incremental backup products. Here are a few lessons learned
with TSM and a Diligent ProtecTIER.

1) Do the best you can to put like data together (i.e., all Oracle
DB backups go to the same de-dupe virtual tape head (repository)).

2) Turn off all compression (client and DBs).

3) Oracle specific:
Do not use RMAN's multiplexing (multiplexing in RMAN will combine 4
channels together, and the backup data will then be unique every time, thus
not allowing for de-duping).
Use the File Seq=1 (then run multiple channels).

4) Do not mix Windows and Unix data; it won't de-dupe well.

We are seeing 10:1 and 15:1 on our Oracle and DB2, and 12:1 on Exchange (the
DBs and Exchange all do daily full backups). In the regular Windows
environment, we see 1.45:1 and 2:1, ick...

Hope this helps.

Regards,

Charles





Allen S. Rout
Guest





Posted: Mon Aug 27, 2007 6:46 am    Post subject: Data Deduplication

Quote:
On Sun, 26 Aug 2007 04:58:45 -0400, "Hughes, George" <GHughes < at > CNMC.ORG> said:

Is TSM planning on adding data deduplication similar to avamar? I
understand how TSM does not duplicate data now but minor edits in
files or simple file name changes would result in additional copies
of the entire file using TSM today.

The 'Rename a top level directory' problem is the best case I've seen
for something like this in TSM.
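
The rename case can be illustrated with a small Python comparison. This is a deliberately simplified model and not TSM's actual logic: the two function names are hypothetical, "path-based" here just means a file is re-sent whenever its path is new or its content changed, and the directory names are made up.

```python
import hashlib


def path_based_incremental(previous, current):
    """Bytes re-sent when files are tracked by path name."""
    return sum(
        len(data)
        for path, data in current.items()
        if previous.get(path) != data
    )


def content_based_incremental(previous, current):
    """Bytes re-sent when files are tracked by a hash of their content."""
    seen = {hashlib.sha256(data).digest() for data in previous.values()}
    sent = 0
    for data in current.values():
        digest = hashlib.sha256(data).digest()
        if digest not in seen:
            seen.add(digest)
            sent += len(data)
    return sent


before = {
    "/projects/a.dat": b"a" * 1000,
    "/projects/b.dat": b"b" * 2000,
}
# The top-level directory is renamed; file contents are untouched.
after = {path.replace("/projects", "/archive"): data
         for path, data in before.items()}

print(path_based_incremental(before, after))
print(content_based_incremental(before, after))
```

With path tracking, every file under the renamed directory looks new and is sent again in full; with content tracking, nothing needs to be sent.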

Every pitch I've yet seen on de-dupe has glossed over where the
metadata goes, and how it's defended. If you see a data deduplication
solution which doesn't take at -least- as much care over the DB as we
do in TSM land, then my opinion is "Flee at flank speed".

In TSM-land we're very sensitive to the fact that our TSM database is
both the key to our featureset and the most delicate part of our
infrastructure, so we take neurotic degrees of care with it. I think
perhaps the new products have yet to blood themselves. Be careful it
doesn't splash on you when they do. :-)


- Allen S. Rout
Paul Zarnowski
Guest





Posted: Mon Aug 27, 2007 7:04 am    Post subject: Data Deduplication

At 09:24 AM 8/27/2007, Charles A Hart wrote:
Quote:
We are seeing 10 and 15:1 on our Oracle and DB2, Exchange 12:1 (The
DB's and Exchange all do Daily Full Backups) In the regular Win
env, we see 1.45 and 2:1 ick...

Charles, how many Windows clients was this with? I've been thinking
about this, and I suspect that targeting specific data to a smaller
deduping VTL might make more sense than just putting everything
there. Specifically, Windows System Objects might be a good
candidate, as well as e-mail attachments.

Thanks for the Oracle hints.

..Paul




--
Paul Zarnowski Ph: 607-255-4757
Manager, Storage Services Fx: 607-255-8521
719 Rhodes Hall, Ithaca, NY 14853-3801 Em: psz1 < at > cornell.edu
Wanda Prather
Guest





Posted: Mon Aug 27, 2007 7:32 am    Post subject: Data Deduplication

This is GREAT information, thanks much!


Quote:
Being that TSM does incremental, your de-dupe ratio will be lower than
other Full / Incr backup products. Here's a few lessons learned with TSM
and a Diligent Protectier.

cpreston
Site Admin


Joined: 04 May 2007
Posts: 667

Posted: Mon Aug 27, 2007 8:40 am    Post subject: Data Deduplication

Quote:
3) Oracle Specific
Do not use RMAN's Multiplexing in RMAN will combine 4
Channels together and the backup data then will be unique every time thus
not allowing for de-duping)
Use the File Seq=1 (Then run multiple channels)

I don't see how this would affect de-duplication if your de-dupe product
knows what it's doing. Every block coming into the device should be
compared to every other block ever seen by the device. So combining
multiple files together using Oracle multiplexing shouldn't affect
de-dupe.

Did you test this, or see it in the docs somewhere? Was this true for
multiple de-dupe vendors, or just the one you chose?
cpreston
Site Admin


Joined: 04 May 2007
Posts: 667

Posted: Mon Aug 27, 2007 8:48 am    Post subject: Data Deduplication

Quote:
Every pitch I've yet seen on de-dupe has glossed over where the
metadata goes, and how it's defended. If you see a data deduplication
solution which doesn't take at -least- as much care over the DB as we
do in TSM land, then my opinion is "Flee at flank speed".

I have seen that, but my experience is that if you ask the right
questions, you'll get the right answers. If they DON'T give you the
right answers, then go on to the next vendor. ;-)
Ben Bullock
Guest





Posted: Mon Aug 27, 2007 8:56 am    Post subject: Data Deduplication

Preston, I believe it depends on the de-dupe technology being used. We
have started to play with NetApp A-SIS (their dedupe product), and at least
in their case they don't look at every block coming into the host.

Their documentation is lacking, but from what we have been able
to deduce, it seems to take a hash of the first chunk of each file,
then compare the hashes and try to de-dupe where the hashes match. We
saw that 400GB of 5GB files took about 3 minutes to try to dedupe, and
400GB of 1MB files took over 23 hours. In this case the number of files
seems to dictate how long a de-dupe will take. To me, that doesn't sound
like it is looking at every block, because the number of blocks with
data on the filer is actually the same between the 2 attempts.

Like I said, this is my interpretation of the results of my
testing, not anything I saw documented.
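
The behaviour inferred above might look something like the following Python sketch. It models only the interpretation given in this post, not the vendor's documented algorithm: the function names, the 4 KiB "first chunk" size, and the sample files are all hypothetical.

```python
import hashlib
from collections import defaultdict


def candidate_groups(files, first_chunk=4096):
    """Group file names whose first `first_chunk` bytes hash identically."""
    groups = defaultdict(list)
    for name, data in files.items():
        digest = hashlib.sha256(data[:first_chunk]).hexdigest()
        groups[digest].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def confirmed_duplicates(files, first_chunk=4096):
    """Within each candidate group, keep files whose full contents match."""
    result = []
    for group in candidate_groups(files, first_chunk):
        by_content = defaultdict(list)
        for name in group:
            by_content[files[name]].append(name)
        result.extend(n for n in by_content.values() if len(n) > 1)
    return result


header = b"H" * 4096
files = {
    "a": header + b"body-one",
    "b": header + b"body-one",   # true duplicate of "a"
    "c": header + b"body-two",   # same first chunk, different tail
    "d": b"D" * 5000,            # unrelated file
}

print(candidate_groups(files))
print(confirmed_duplicates(files))
```

In a scheme like this, one hash is computed per file regardless of file size, so the amount of work tracks the number of files rather than the number of blocks, which would be consistent with the timings reported above.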

Ben

Charles A Hart
Guest





Posted: Mon Aug 27, 2007 9:53 am    Post subject: Data Deduplication

According to Diligent, when RMAN uses multiplexing, it intermingles the
data so that the data blocks are different every time, similar to
multiplexing with NetBackup... I'm not
an RMAN expert, just trusting what the vendor is stating.
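
A toy Python model can show when interleaving would and would not hurt de-duplication. It rests on two assumptions that are not confirmed anywhere in this thread: that the device uses fixed-size chunks, and that the interleave order differs from one backup run to the next. It does not claim to reproduce how RMAN or any de-dupe product actually behaves, and all function names and sizes are made up for the example.

```python
import hashlib


def make_file(name, n_blocks, block_size=512):
    """Return a list of distinct blocks standing in for one datafile."""
    reps = block_size // 32
    return [hashlib.sha256(f"{name}:{i}".encode()).digest() * reps
            for i in range(n_blocks)]


def sequential(files):
    """One file after another, as when each backup set holds a single file."""
    return b"".join(block for f in files for block in f)


def multiplexed(files, order):
    """Round-robin one block from each file, visiting files in `order`."""
    n_blocks = len(files[0])
    return b"".join(files[i][b] for b in range(n_blocks) for i in order)


def new_chunks(previous, current, chunk_size):
    """Count fixed-size chunks of `current` that never appear in `previous`."""
    def digests(stream):
        return [hashlib.sha256(stream[i:i + chunk_size]).digest()
                for i in range(0, len(stream), chunk_size)]
    seen = set(digests(previous))
    return sum(1 for d in digests(current) if d not in seen)


files = [make_file(name, 8) for name in "ABCD"]

# The same unchanged database, backed up on two different days.
seq_day1, seq_day2 = sequential(files), sequential(files)
mux_day1 = multiplexed(files, [0, 1, 2, 3])
mux_day2 = multiplexed(files, [1, 0, 3, 2])  # same blocks, new interleave

seq_new = new_chunks(seq_day1, seq_day2, 2048)
mux_new_large = new_chunks(mux_day1, mux_day2, 2048)
mux_new_small = new_chunks(mux_day1, mux_day2, 512)
print(seq_new, mux_new_large, mux_new_small)
```

In this model the non-multiplexed streams de-dupe completely, and the multiplexed streams de-dupe not at all when the chunk spans several interleaved blocks. When the chunk size matches the block size, the multiplexed streams de-dupe completely again, which is in line with the objection raised earlier in the thread that a device comparing at block granularity should be unaffected.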

The following link seems to match with what we are being told

http://download.oracle.com/docs/cd/B19306_01/backup.102/b14191/rcmconc1002.htm
(Look for the Multiplex Section)

Is there an RMAN expert in the house? Can someone confirm this info?

Charles Hart
UHT - Data Protection
(763)744-2263
Sharepoint:
http://unitedteams.uhc.com/uht/EnterpriseStorage/DataProtection/default.aspx




cpreston
Site Admin


Joined: 04 May 2007
Posts: 667

Posted: Mon Aug 27, 2007 10:23 am    Post subject: Data Deduplication

I agree that this is what Oracle does. What I'm not sure of is whether
this de-dupe issue applies to de-dupe vendors other than Diligent.
I've fired off a few emails and I'll reply when they do.

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

All times are GMT - 8 Hours