George Sinclair - NOAA Federal
Parallel save streams per save set?
October 04, 2019 01:59AM
Three questions here on the 'parallel save streams per save set' option.
I don't have access to the EMC optimization documentation right now, so
what I could find is minimal.

I never used this feature before, so I was just testing it out. A client
has a single save set, e.g. /data (group=test). The 'parallel save
streams per save set' option is enabled for the client resource.
Client parallelism = 4.
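
For reference, the relevant client-resource attributes looked roughly like
this via nsradmin (client name hypothetical, and the exact attribute
spelling may differ between releases):

  nsradmin> print type: NSR client; name: testclient
                  parallelism: 4;
                  parallel save streams per save set: Enabled;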

I ran a backup as, `savegrp -l full group`, and it backed up /data as
four separate pieces (all running concurrently):

<1>/data
<2>/data
<3>/data
/data

Each of these has a different ssid, maybe no surprise there.

1. How do you recover the data? How do you piece them together?

If indexing is enabled for the pool, do you just run a browsable
recovery (GUI or CLI), just as you normally would if the stream option
was not enabled in the client resource, and it figures it all out?

2. How would you perform a save set recovery of these pieces? Is that
possible if the stream option is enabled?

When I tested this, I had 'save -S' specified for the backup command
value in the client resource, so I'm not creating index entries. The
feature works, but it's unclear how you would piece anything together
if indexing were disabled (pool or client).

3. If /data had four primary subdirectories (1-4), all about the same
size, then couldn't I achieve the same results if I specified four save
sets as:

/data/1
/data/2
/data/3
/data/4

and left the stream option disabled? It seems this option just automates
that for you?

Thanks.

George

--
George Sinclair
Voice: (301) 713-4921


Preston de Guise
Re: Parallel save streams per save set?
October 04, 2019 02:59AM
Hi George,

Answers inline.

> Three questions here on the 'parallel save streams per save set' option. I don't have access to the EMC optimization documentation right now, so what I could find is minimal.
>
> I never used this feature before, so I was just testing it out. A client has a single save set, e.g. /data (group=test). The 'parallel save streams per save set' option is enabled for the client resource.
> Client parallelism = 4.
>
> I ran a backup as, `savegrp -l full group`, and it backed up /data as four separate pieces (all running concurrently):
>
> <1>/data
> <2>/data
> <3>/data
> /data
>
> Each of these has a different ssid, maybe no surprise there.
>
> 1. How do you recover the data? How do you piece them together?

Just run the recovery. NetWorker automatically works out what bits of each saveset it needs if you're doing a recovery of selected files or directories.
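
For example, a plain browsable session (server, client and path names
hypothetical) - nothing PSS-specific about it:

  recover -s server -c client
  recover> cd /data/some/subdir
  recover> add .
  recover> recover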

> If indexing is enabled for the pool, do you just run a browsable recovery (GUI or CLI), just as you normally would if the stream option was not enabled in the client resource, and it figures it all out?

Yes, correct.

> 2. How would you perform a save set recovery of these pieces? Is that possible if the stream option is enabled?

If you were to do a saveset recovery, you would do a saveset recovery of all SSIDs associated with the backup.
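
As a sketch (SSIDs and names hypothetical), you could list the chunks with
mminfo, trim the list to the savetimes of the one backup, and then hand all
of them to recover in one go:

  mminfo -avot -q "client=testclient,name=/data" -r "ssid,savetime,totalsize"
  recover -s server -S ssid1 ssid2 ssid3 ssid4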

> When I tested this, I had 'save -S' specified for the backup command value in the client resource, so I'm not creating index entries. The feature works, but it's unclear how you would piece anything together if indexing were disabled (pool or client).
>
> 3. If /data had four primary subdirectories (1-4), all about the same size, then couldn't I achieve the same results if I specified four save sets as:
>
> /data/1
> /data/2
> /data/3
> /data/4
>
> and left the stream option disabled? It seems this option just automates that for you?

It's a highly optimised process. There are multiple walker/save processes in use, and generally speaking you'll get a fairly balanced outcome, particularly with dynamic PSS, introduced in 9.2 (I think it was 9.2). With that, if a PSS stream finishes early, NetWorker will start up another PSS save stream if possible to continue driving additional reads.

George Sinclair - NOAA Federal
Re: Parallel save streams per save set?
October 04, 2019 02:59AM
On 10/3/19 10:01 PM, Preston de Guise wrote:
> Hi George,
>
> Answers inline.
>
>> Three questions here on the 'parallel save streams per save set' option. I don't have access to the EMC optimization documentation right now, so what I could find is minimal.
>>
>> I never used this feature before, so I was just testing it out. A client has a single save set, e.g. /data (group=test). The 'parallel save streams per save set' option is enabled for the client resource.
>> Client parallelism = 4.
>>
>> I ran a backup as, `savegrp -l full group`, and it backed up /data as four separate pieces (all running concurrently):
>>
>> <1>/data
>> <2>/data
>> <3>/data
>> /data
>>
>> Each of these has a different ssid, maybe no surprise there.
>>
>> 1. How do you recover the data? How do you piece them together?
> Just run the recovery. NetWorker automatically works out what bits of each saveset it needs if you're doing a recovery of selected files or directories.
>
>> If indexing is enabled for the pool, do you just run a browsable recovery (GUI or CLI), just as you normally would if the stream option was not enabled in the client resource, and it figures it all out?
> Yes, correct.
>
>> 2. How would you perform a save set recovery of these pieces? Is that possible if the stream option is enabled?
> If you were to do a saveset recovery, you would do a saveset recovery of all SSIDs associated with the backup.
Thanks much :). So let's say there are four chunks. If I only wanted a
single subdirectory, would I run the recovery like this?:

recover -s server -S ssid1 ssid2 ssid3 ssid4 /path/desired_subdir

How do you collate these pieces? How do you know which ones go with the
same backup? Is it just a matter of common sense, based on media database
fields that would separate it from a different (or subsequent) backup,
such as start and completion times and the group name? Are there any
dangers or concerns here, e.g. media database reporting or indexes?

>
>> When I tested this, I had 'save -S' specified for the backup command value in the client resource, so I'm not creating index entries. The feature works, but it's unclear how you would piece anything together if indexing were disabled (pool or client).
>>
>> 3. If /data had four primary subdirectories (1-4), all about the same size, then couldn't I achieve the same results if I specified four save sets as:
>>
>> /data/1
>> /data/2
>> /data/3
>> /data/4
>>
>> and left the stream option disabled? It seems this option just automates that for you?
> It's a highly optimised process. There are multiple walker/save processes in use, and generally speaking you'll get a fairly balanced outcome, particularly with dynamic PSS, introduced in 9.2 (I think it was 9.2). With that, if a PSS stream finishes early, NetWorker will start up another PSS save stream if possible to continue driving additional reads.


Preston de Guise
Re: Parallel save streams per save set?
October 04, 2019 02:59AM
Hi George,

>>
>>> 2. How would you perform a save set recovery of these pieces? Is that possible if the stream option is enabled?
>> If you were to do a saveset recovery, you would do a saveset recovery of all SSIDs associated with the backup.
> Thanks much :). So let's say there are four chunks. If I only wanted a single subdirectory, would I run the recovery like this?:
>
> recover -s server -S ssid1 ssid2 ssid3 ssid4 /path/desired_subdir

That might work; I've never tried doing a multiple-SSID recovery and specifying a path at the same time - it wouldn't hurt to check.

> How do you collate these pieces? How do you know which ones go with the same backup? Is it just a matter of common sense, based on media database fields that would separate it from a different (or subsequent) backup, such as start and completion times and the group name? Are there any dangers or concerns here, e.g. media database reporting or indexes?

NetWorker maintains extended save set attributes that allow it to track which SSIDs in a PSS backup depend on each other. See here for more details on how to view that yourself:

https://nsrd.info/blog/2019/03/20/basics-determining-pss-saveset-dependencies/
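
As a quick sketch (client name hypothetical), a long-form media database
query along these lines should show the extended attributes, including the
'mbs dependents' entries, for each chunk:

  mminfo -q "client=testclient,name=/data" -S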

Cheers,
Preston.

Re: Parallel save streams per save set?
October 09, 2019 01:40PM
Well - PSS is a brilliant method to speed up backups.
Unfortunately, NW's general behavior is to treat each of the resulting save sets independently.
This leaves room for easy mistakes.

Yes, you get a list of 'mbs dependents' for the 'master save set', but first of all you have to be aware that the save set is a PSS save set at all. Usually you do not see the dependencies, and after a while you might forget about them.

What is the result? You have to take extreme care when you do a save set clone or recovery, as you will most likely not remember the various 'chunks'. The consequences:
- For a save set recovery, you will at least quickly become aware that you are missing some data.
- For a clone process you might even 'lose' data on the clone volume if you do not consider all chunks.
- For a save set deletion process, you might leave 'orphans' behind as each chunk can be deleted individually.

So NW leaves quite a few responsibilities to the user/admin when it could do better dependency checking in general; that would eliminate a number of potential issues. And we know that NW already discovers dependencies automatically elsewhere - I just wonder why that capability has not been applied here.
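
As a sketch (query values and pool name hypothetical; I have not verified
the exact flag combination), the safest habit is to resolve every chunk's
SSID first and then operate on the whole set, e.g. for cloning:

  mminfo -avot -q "client=testclient,name=/data" -r "ssid" > /tmp/ssids.txt
  nsrclone -b "Clone Pool" -S -f /tmp/ssids.txt

The same SSID list would apply to a save set deletion (nsrmm -d -S ssid,
chunk by chunk) so that no orphans are left behind.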