> When running an incremental or numeric backup, I thought NetWorker walked
> the file system (or named path/saveset) to generate a list of what files
> will be backed up.
> 1. Is this true?
Yes.

> 2. If so, does it also do this when running a full?
I believe it does, but the "walk" takes less time because NetWorker doesn't
have to do any time comparisons to determine whether or not to back up each
individual file.
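
To make the difference concrete, here is a rough Python sketch of a "walk" that builds a work list. It is only an illustration of the idea, not NetWorker's actual code: the function name build_work_list and the `since` parameter are made up for this example, and it compares mtime only (the real product may also consider ctime or other attributes).

```python
import os

def build_work_list(root, since=None):
    """Walk `root` and return the paths that would be saved.

    since=None models a full backup: every file is listed, no time check.
    Otherwise only files whose mtime is newer than `since` (epoch seconds)
    are listed, which models an incremental. Both names are hypothetical.
    """
    work_list = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # inode metadata only; file is not opened
            except FileNotFoundError:
                continue  # file vanished between the listing and the stat
            # A full skips the comparison; an incremental does it per file.
            if since is None or st.st_mtime > since:
                work_list.append(path)
    return sorted(work_list)
```

Calling build_work_list("/some/path") lists everything, while build_work_list("/some/path", since=last_backup_time) lists only files modified after that time. Either way the whole tree is still visited.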

> I will sometimes see warning message(s) in the savegroup completion
> notifications for a pathname/file whose size grew or shrunk during save.
> It seems, therefore, that the only way it would know this would be to
> compare the file's size before and after the backup.
Correct. NetWorker compares the file's size and mtime it found during the
"walk" against the same values when it's ready to actually write the backup
to media.

> 3. When it does this comparison does it determine the pre-backup size on
> the fly, just before it backs up the file? Or does it instead already have
> its size stored in the listing that it generated when it did the initial
> walk-through?
I believe it reads the inode information (or Windows equivalent) to get the
mtime and size.

> Sometimes there will be a warning message regarding a file that changed
> during save.
> 4. How does it determine this change? Does it use some kind of fast
> checksum like a CRC before/after or maybe even a security cryptographic hash
> (seems that would add a lot of time to the backups)? Or does it simply infer
> a change if the modtimes/ctimes differ before/after?
It just looks at the mtime and size at the time of the "walk" vs. those
values when it's ready to back up the file.
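
As a hedged illustration of the size-and-mtime comparison described in the answers above, the following Python sketch takes a snapshot from the inode and classifies the difference. The names Snapshot, snapshot and classify_change, and the mapping to "grew", "shrunk" and "changed", are my own assumptions for the example. I don't know exactly how NetWorker decides which warning to print.

```python
import os
from typing import NamedTuple, Optional

class Snapshot(NamedTuple):
    size: int      # st_size in bytes
    mtime_ns: int  # st_mtime_ns, modification time in nanoseconds

def snapshot(path: str) -> Snapshot:
    """Read size and mtime from the inode; the file's data is not read."""
    st = os.stat(path)
    return Snapshot(st.st_size, st.st_mtime_ns)

def classify_change(before: Snapshot, after: Snapshot) -> Optional[str]:
    """Compare the walk-time snapshot with the save-time snapshot.

    Returns "grew", "shrunk", "changed", or None if nothing differs.
    No checksum is involved, so a rewrite that keeps the same size and
    the same mtime would go unnoticed.
    """
    if after.size > before.size:
        return "grew"
    if after.size < before.size:
        return "shrunk"
    if after.mtime_ns != before.mtime_ns:
        return "changed"
    return None
```

Usage would be something like: take snapshot(path) during the walk, take it again just before writing the file to media, and pass both to classify_change.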

> 5. Does it store the pre-backup times and/or CRCs in the walk-through
> listing or does it generate those on the fly before/after?
I'm pretty sure they're generated on the fly. Although the file time and
size are stored in NetWorker's databases, there would be no value in
referencing information about prior backups before beginning a new one. It
only looks at time and size, not CRC.

> Are any of these details documented anywhere?
I think the documents with details of the inner workings of NetWorker are
accessible only to EMC employees. You might find some hints in the Technical
Overview document for the version you're running, particularly if the
process changed since the prior version. Here's the one for 9.2:
https://support.emc.com/docu87149_DELl_EMC_NETWORKER_9.2_-_TECHNICAL_OVERVIEW.pdf?language=en_US&language=en_US

----

Your second message is related so I'll address the questions here.

> If a file is added or modified after a backup starts but before any data
> is sent, say while it's still walking the file system, then it's anyone's
> guess whether it will get backed up on that backup, or instead the next one,
> as there's no way to know how far it's into its walk-through, i.e. whether
> it's already past that point.
> Is that right?
Yes.

> So let's say the file system has a small number of inodes in use, and a
> file is copied there or created just after NetWorker starts its walking,
> then it would be much more likely that the file would not get backed up this
> time as the walk-through might be very fast, most likely completing before
> the file could be created or copied there, maybe?
> Alternatively, if a very large number of inodes are in use then the
> probability of it not reaching that point in its walk-through before the
> file is added/copied there would be higher, albeit maybe detecting that the
> file grew or changed during the save. Something like that?
> I know there are a lot of factors that could come into play, but is that
> the general idea?
Basically, what you're saying here is:
1. The longer it takes NetWorker to "walk" the file system, the more
probable it is that a file created or changed during the "walk" will be
processed in that backup.
2. A backup with few changed files will have a shorter "walk" and therefore
a lower probability of reaching the file after it is created or changed.
3. Conversely, a backup with many changed files will be more likely to reach
the file after it is created or changed.
All this is correct, but you have to be aware that we're dealing in
probabilities. It's not something you can count on.
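
To put rough numbers on the three points above, here is a toy Python model. It rests entirely on my own simplifying assumptions: the walk lasts walk_seconds, it reaches the file's location at a uniformly random moment within that window, and the file appears created_at seconds after the walk starts. Real walks are not uniform, so treat the output as an illustration of the trend only.

```python
def capture_probability(walk_seconds: float, created_at: float) -> float:
    """Chance that the walk reaches the file's spot after the file exists.

    Toy model (an assumption, not NetWorker behavior): the visit time is
    uniform over [0, walk_seconds], so the chance of arriving after
    `created_at` is the fraction of the walk still remaining at that time.
    """
    if walk_seconds <= 0:
        raise ValueError("walk_seconds must be positive")
    if created_at <= 0:
        return 1.0  # file existed before the walk began
    if created_at >= walk_seconds:
        return 0.0  # walk already finished; file waits for the next backup
    return 1.0 - created_at / walk_seconds
```

With a file created 25 seconds in, a 10-second walk gives 0.0, a 100-second walk gives 0.75, and a 1000-second walk gives about 0.975. The longer the walk, the better the odds, but it never becomes a guarantee.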

> I often see files whose ctimes (Linux) are newer than the start time of
> the last incremental, but older than the completion time. These get captured
> on the next incremental. I've always inferred that either they weren't there
> when NetWorker walked the file system on the previous backup, or they were,
> but one or more file attributes (e.g. permissions, owner, group, etc.) was
> changed after the walk-through completed, or after it passed that point
> where the file lived, thus being left off that backup and not captured until
> the next one.
This is true for a file that was created after NetWorker "walked" the file
system and before it did the backup. The file won't be in the work list so
it won't get backed up. But if NetWorker encountered a file during the
"walk" and the file had changed before it got around to backing it up,
NetWorker will back up the modified file and generate a warning message.
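
The two cases can be simulated with a small Python sketch. Again, this is only a model of my understanding, with made-up function names (walk, save, demo) and a temporary directory standing in for the file system: a file created after the walk never enters the work list, while a file that was listed and then modified is still saved, with a warning.

```python
import os
import tempfile

def walk(root):
    """Phase 1: record (size, mtime_ns) for every file found."""
    work = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            work[path] = (st.st_size, st.st_mtime_ns)
    return work

def save(work):
    """Phase 2: 'back up' only what is in the work list; warn on changes."""
    saved, warnings = [], []
    for path, seen in work.items():
        st = os.stat(path)
        if (st.st_size, st.st_mtime_ns) != seen:
            warnings.append(path)  # differs from what the walk recorded
        saved.append(path)         # the current contents are saved anyway
    return saved, warnings

def demo():
    with tempfile.TemporaryDirectory() as root:
        existing = os.path.join(root, "existing.txt")
        late = os.path.join(root, "late.txt")
        with open(existing, "w") as f:
            f.write("original")
        work = walk(root)                 # the walk sees existing.txt only
        with open(existing, "a") as f:    # modified after the walk
            f.write(" plus more")
        with open(late, "w") as f:        # created after the walk
            f.write("created after the walk")
        saved, warnings = save(work)
        return (
            [os.path.basename(p) for p in saved],
            [os.path.basename(p) for p in warnings],
        )
```

Running demo() returns existing.txt in both lists and late.txt in neither, which matches the behavior described above: late.txt would have to wait for the next backup.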


DISCLAIMER: All this is based on my understanding, which may be incomplete,
inaccurate or out of date.



From: EMC Data Protection Q & A
[mailto:EMC-DATAPROTECTION-L@LISTSERV.TEMPLE.EDU] On Behalf Of
EMC-DATAPROTECTION-L automatic digest system
Sent: Saturday, August 11, 2018 12:00 AM
To: EMC-DATAPROTECTION-L@LISTSERV.TEMPLE.EDU
Subject: EMC-DATAPROTECTION-L Digest - 7 Aug 2018 to 10 Aug 2018 (#2018-37)



 

Table of contents:
1. Some questions on shrunk, grew and changed files? (08/10)
   From: George Sinclair - NOAA Federal <george.sinclair@NOAA.GOV>
2. Adding a file after backups start? (08/10)
   From: George Sinclair - NOAA Federal <george.sinclair@NOAA.GOV>




--
This list is hosted as a public service at Temple University by Stan Horwitz
If you wish to sign off this list or adjust your subscription settings, please do so via http://listserv.temple.edu/archives/emc-dataprotection-l.html
If you have any questions regarding management of this list, please send email to owner-emc-dataprotection-l@listserv.temple.edu
This message was imported via the External PhorumMail Module
George Sinclair - NOAA Federal
Re: EMC-DATAPROTECTION-L Digest - 7 Aug 2018 to 10 Aug 2018 (#2018-37)
August 15, 2018 08:00AM
On 2018-08-13 11:44, Conrad L. Macina wrote:
Thanks much for your reply. :) This indeed corroborates things.

George


--
George Sinclair
Voice: (301) 713-4921
- The preceding message is personal and does not reflect any official or unofficial position of the United States Department of Commerce -
- Any opinions expressed in this message are NOT those of the US Govt. -

