Stupid question about TSM server-side dedup
Post Stupid question about TSM server-side dedup 
I have a customer who would like to go to all-disk backups using TSM dedup. This would benefit them in several respects, not least the ability to replicate to another TSM server using the features in 6.3.

The customer has a requirement to keep their NDMP dumps 6 months. (I know that's not desirable, but the backup group has no choice in the matter right now, it's imposed by a higher level of management.)

The NDMP dumps come via TCP/IP into a regular TSM sequential filepool. They should dedup like crazy, but client-side dedup is not an option (as there is no client).

So here's the question. NDMP backups come into the filepool and IDENTIFY DUPLICATES is running. But because of those long retention times, all the volumes in the filepool are FULL but 0% reclaimable, and they will stay that way for 6 months, as no dumps will expire until then. Since the dedup occurs as part of reclaim, and the volumes won't reclaim, how do we "prime the pump" and get this data to dedup? Should we do a few MOVE DATAs to get the volumes partially empty?


Wanda Prather | Senior Technical Specialist | wprather < at > icfi.com | www.icf.com
ICF International | 401 E. Pratt St, Suite 2214, Baltimore, MD 21202 | 410.539.1135 (o)

Post Stupid question about TSM server-side dedup 
I'm sure Wanda has thought of this, but since she didn't explain why she ruled it out, I'll throw it out just in case:

BACKUP NODE allows for a management class. I presume the customer is doing a mix of fulls and differentials. Why not do an extra differential with a different management class, specifically to get something to expire early and thus start the dedupe process?

I love my NDMP clients, too. They really do screw up the tape expiration patterns, especially when a weekly full fails and a prior full lasts an extra week.

Nick


Post Stupid question about TSM server-side dedup 
Hi,


I am not sure if I understand the question.


First backup: a full backup, and only a few duplicate chunks are identified by the dedup process.

Second backup: a full as well, but this time the identify process marks a lot of duplicate chunks.

Then the expiration process removes the entries from the DB. Finally, the reclamation process removes the deduplicated chunks.


Regards,

Fran


Post Stupid question about TSM server-side dedup 
On 11/21/2011 11:40 PM, Prather, Wanda wrote:

So here's the question. NDMP backups come into the filepool and
identify duplicates is running. But because of those long retention
times, all the volumes in the filepool are FULL, but 0% reclaimable,
and they will continue to be that way for 6 months, as no dumps will
expire until then. Since the dedup occurs as part of reclaim, and
the volumes won't reclaim -how do we "prime the pump" and get this
data to dedup? Should we do a few MOVE DATAs to get the volumes
partially empty?


Would RECLAIMSTGPOOL help you here?

Original usecase was a disk stgpool to permit those with a single
drive to put the data somewhere whilst reclaiming, and the reclaimstg
would then eventually drain back to the primary stgpool.

But you might have the NDMP data filter into a pool in which you force
reclamation of _everything_, and have it debouche into another stg. I
know that dedupe is per-pool, but I seem to recall that it works to
move from a dedupe pool to another dedupe pool?

- Allen S. Rout

Post Stupid question about TSM server-side dedup 
Wanda,

When id dup finds duplicate chunks in the same storage pool, it will
raise the pct_reclaim value for the volume it is working on. If the
pct_reclaim isn't going up, that means there are no duplicate chunks
being found. Id dup is still chunking the backups up (watch your
database grow!) but all the chunks are unique.

Is it possible that the NDMP agent in the storage appliance is putting
in unique metadata with each file? This would make every backup appear
to be unique in chunk-speak.

I remember from the v6 beta that the standard v6 clients were enhanced
so that the metadata could be better identified by id dup and skipped
over, so that it could just work on the files and get better dedup
ratios. If id dup doesn't know how to skip over the metadata in an NDMP
stream, and the metadata always changes, then you will get very low
dedup ratios.
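
To make the metadata point concrete, here is a toy sketch in Python. It is not TSM's chunking algorithm: the fixed 64-byte chunk size, the `make_dump` helper, and the 8-byte per-backup stamp are all assumptions made up for illustration. It only shows how a value that changes with every backup, mixed into each record, can leave a chunk-hash comparison with nothing to match.

```python
import hashlib

CHUNK = 64  # toy fixed chunk size; an assumption, not what TSM uses


def duplicate_pct(streams):
    """Percent of bytes whose chunk hash was already seen earlier in the pool."""
    seen = set()
    total = dup = 0
    for stream in streams:
        for i in range(0, len(stream), CHUNK):
            chunk = stream[i:i + CHUNK]
            digest = hashlib.sha256(chunk).digest()
            total += len(chunk)
            if digest in seen:
                dup += len(chunk)
            else:
                seen.add(digest)
    return 100.0 * dup / total if total else 0.0


def make_dump(files, stamp=b""):
    """Hypothetical dump format: each file body prefixed by a per-backup stamp."""
    return b"".join(stamp + body for body in files)


# Eight unchanged 56-byte "files", dumped on two consecutive nights.
files = [f"file-{n:03d}".encode().ljust(56, b".") for n in range(8)]

plain = duplicate_pct([make_dump(files), make_dump(files)])
stamped = duplicate_pct([make_dump(files, b"20111121"),
                         make_dump(files, b"20111122")])
print(plain, stamped)  # 50.0 0.0
```

With no stamp the second night's dump is entirely duplicate, so half the pool is reclaimable in this toy measure. With a stamp that differs by a single byte per night, every chunk hashes differently and nothing is found.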

If you do a 'q pr' while the id dup is running, do the processes say
they are finding duplicates?

Bill Colwell
Draper Lab

Post Stupid question about TSM server-side dedup 
Wanda,

Are the identify processes issuing any failure notices in the activity log?

You can check whether the id dup processes have found duplicate chunks
that are yet to be reclaimed by running 'show deduppending <stgpoolname>'.
WARNING: this can take a long time to return if the stgpool is large,
so don't panic!

I am unfamiliar with NDMP backup, but off the top of my head a couple of
other (simple) things to check would be:

Is the server-side SERVERDEDUPETXNLIMIT option set very low and
preventing dedup id?

Have these dumps been backed up to a copypool yet? (Perhaps you've
overlooked the DEDUPREQUIRESBACKUP option at the server?)
- IIRC the identify processes run but find nothing if this option is set
and the data has not yet been backed up to a copypool.
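
One way to run these checks repeatably is to script them through the administrative command-line client. The Python sketch below only builds the command lines and does not execute anything. The helper names, the admin ID, and the pool name are hypothetical, and the exact `query option` and `select` forms are assumptions that should be checked against the documentation for your server level before use.

```python
from typing import List


def dsmadmc_argv(admin_id: str, password: str, command: str) -> List[str]:
    """Build an argv list for one dsmadmc invocation (nothing is executed here)."""
    return ["dsmadmc", f"-id={admin_id}", f"-password={password}",
            "-dataonly=yes", command]


def dedup_checks(pool: str) -> List[str]:
    """Server commands corresponding to the checks discussed in this thread."""
    pool = pool.upper()
    return [
        "query process",                      # are identify processes finding duplicates?
        f"show deduppending {pool}",          # can be slow on a large pool
        "query option SERVERDEDUPETXNLIMIT",  # assumed form; verify on your server
        "query option DEDUPREQUIRESBACKUP",   # assumed form; verify on your server
        "select volume_name, pct_reclaim from volumes "
        f"where stgpool_name='{pool}'",       # is pct_reclaim rising?
    ]


# "admin", "secret" and "ndmppool" are placeholders, not real values.
for cmd in dedup_checks("ndmppool"):
    print(" ".join(dsmadmc_argv("admin", "secret", cmd)))
```

Each list could be handed to `subprocess.run` once the commands have been verified by hand; avoid putting a real password in a script like this.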

Ian Smith


Post Stupid question about TSM server-side dedup 
NDMP data is not dedupable by TSM when using filepools (as opposed to a VTL
like the ProtecTIER, which does a great job at it) because it is stuffed with
date/time stamps and TSM can't parse the files correctly for hashing at the
moment. When TSM sees NDMP data, it doesn't even try to dedupe it.

So NDMP with TSM dedupe is not happening at the moment. I don't know about
the roadmap for this feature.




Post Stupid question about TSM server-side dedup 
Wanda,

TSM 6.3 introduced de-duplication of NetApp (nSeries) data. Look up the
ENABLENASDEDUPE server option.
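
As a quick sanity check after the upgrade, something like the following Python sketch could confirm what a server options file actually says. The `server_option` helper and the sample excerpt are hypothetical; it assumes the usual options-file layout of one `NAME value` pair per line with `*` starting a comment, and it says nothing about what the running server has loaded.

```python
from typing import Optional


def server_option(opt_text: str, name: str,
                  default: Optional[str] = None) -> Optional[str]:
    """Return the last value given for an option in options-file text."""
    value = default
    for raw in opt_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].upper() == name.upper():
            value = parts[1].strip()
    return value


# Hypothetical excerpt of a server options file.
sample = """\
* illustrative excerpt only
COMMMETHOD TCPIP
ENABLENASDEDUPE YES
"""

print(server_option(sample, "ENABLENASDEDUPE"))
```

If the option is missing from the file the helper returns the supplied default, which here just means "not set in the file".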


Ken Bury

Post Stupid question about TSM server-side dedup 
Cool, I missed that one. It only works on ONTAP NAS boxes, but it's nice! :)

Post Stupid question about TSM server-side dedup 
Aha! Thanks Ken...

Post Stupid question about TSM server-side dedup 
Thanks to Ian, Stephan, Ken and everybody who responded. We're downloading 6.3...
