Multiple drives for recovery?
Post Multiple drives for recovery? 
Anyone else seen this?

I run a browsable recover (CLI), requiring one full and three
incremental tapes. NW loads all tapes (6 drives in tape library) and
starts reading from each, rather than starting with the full and working
through one tape at a time, to the last incr?

This is on 7.5SP1 on RH Linux. Client is running an older 7.2.2 release.

I think I may have seen this happen every once in a while on older
server releases. Is it possible that NW can read from multiple tapes
during a recover and manage all of it, perhaps only for certain recovers
where it knows that what it needs from one tape won't be jeopardized by
anything it needs from one of the other tapes?

George

--
George Sinclair
Voice: (301) 713-3284 x210
- The preceding message is personal and does not reflect any official or
unofficial position of the United States Department of Commerce -
- Any opinions expressed in this message are NOT those of the US Govt. -


via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER

Post Multiple drives for recovery? 
In regard to: [Networker] Multiple drives for recovery?, George Sinclair...:

> Anyone else seen this?
>
> I run a browsable recover (CLI), requiring one full and three incremental
> tapes. NW loads all tapes (6 drives in tape library) and starts reading from
> each, rather than starting with the full and working through one tape at a
> time, to the last incr?
>
> This is on 7.5SP1 on RH Linux. Client is running an older 7.2.2 release.

It's funny that you should bring up something like this. I've been
meaning to post to the list about a major issue with parallelized
multi-volume recovers. I don't want to hijack your thread so I'll
post more details in a different thread.

Yes, we've seen NetWorker parallelize multi-volume recovers. Most of the
time it works pretty well. IIRC, this is something that was added in the
7.x series (earlier versions would always serialize volume access). It
used to be configurable by creating a file in /nsr/debug (do a substring
search of the mailing list archives for striped_recover for more info).

We have, however, seen a few instances where recover apparently deadlocks
in the striped recovery code. This happened to us a couple of times
under 7.2.x or 7.4.x, but we upgraded to 7.5.2 last week and the first big
recover we had to do triggered a deadlock in recovery. We've had a case
open with EMC about this issue since last Friday.

Tim
--
Tim Mooney Tim.Mooney < at > ndsu.edu
Enterprise Computing & Infrastructure 701-231-1076 (Voice)
Room 242-J6, IACC Building 701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164


Post Multiple drives for recovery? 
Tim Mooney wrote:
> In regard to: [Networker] Multiple drives for recovery?, George Sinclair...:

> It's funny that you should bring up something like this. I've been
> meaning to post to the list about a major issue with parallelized
> multi-volume recovers. I don't want to hijack your thread so I'll
> post more details in a different thread.

Well, this all started because I ran a browsable recover (CLI recover
tool) last night from client A, pointing to an older 7.x server, server1.
The browse time I used was from back in January 2010. The recover
required 4 tapes (one full, three incrementals). It ran the conventional
way, loading each tape one at a time. About 7.5 GB of data was
recovered. Everything looked fine, and the dates and times all looked
consistent. I then created an MD5 hash listing of all the recovered data
(file permissions, owner, group, checksum, file size, etc.).
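The thread does not say how George produced his MD5 listing. Below is a minimal Python sketch of the same idea. It assumes the fields he lists (permissions, owner, group, size, checksum) plus the modification time in whole seconds. `build_manifest` and `compare` are hypothetical helper names, not part of NetWorker. Building a manifest for each recovered tree and diffing the two is enough to surface mismatches like the directory mod times described in the next paragraph.

```python
import hashlib
import os
import stat


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 digest of a regular file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each path under root (relative) to (mode, uid, gid, size, mtime, md5).

    Size and MD5 are recorded only for regular files; directories and
    symlinks get None for both, because directory sizes depend on the
    filesystem rather than on the recovered content.
    """
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: describe symlinks, don't follow them
            regular = stat.S_ISREG(st.st_mode)
            manifest[os.path.relpath(full, root)] = (
                stat.filemode(st.st_mode),
                st.st_uid,
                st.st_gid,
                st.st_size if regular else None,
                int(st.st_mtime),
                md5_of(full) if regular else None,
            )
    return manifest


def compare(before, after):
    """Return (missing, extra, changed) sorted path lists between two manifests."""
    missing = sorted(set(before) - set(after))
    extra = sorted(set(after) - set(before))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return missing, extra, changed
```

For example, `compare(build_manifest("/restore/run1"), build_manifest("/restore/run2"))` uses hypothetical paths. It would list a directory whose only difference is its mtime under `changed`.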

Next, I moved the tapes over to the tape library of the new 7.5SP1
server, server2. I repeated the recovery from the same client A, and this
time it loaded the tapes simultaneously and started reading from all of
them, and I was thinking: "What the heck!!!????". I didn't notice any
error messages or overwrite prompts in the recover window, however. The
recover completed and reported the same number of recovered files.
**BUT**, when I validated it against the MD5 hash listing from above, it
reported a number of directories with new time stamps: new as in the date
of the recover, not their original mod times. Only directories had new
mod times, and not all of them; some were fine. Everything else was
identical. I thought it was odd that the mod times of only *certain*
directories were not preserved during the recovery, while all other
files were perfect, as were a number of other directories.

So, today, I repeated the recovery, but I first disabled all but one of
the drives to force NW to load tapes one at a time. Eureka! It worked
like a champ, and everything validated!
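One plausible mechanism for those directory time stamps is sketched below. Nothing in this thread or from EMC confirms it. On POSIX filesystems, creating an entry inside a directory updates that directory's mtime. Suppose a recover sets a directory's original time while another stream is still creating files in it, for example a second tape being read in parallel. The directory's time then gets bumped to "now". Serial recovery, or restoring directory times only after all of their contents are written, avoids that. The sketch only demonstrates the filesystem behavior. The function names and the 2010 timestamp are made up, and it does not model NetWorker itself.

```python
import os
import tempfile

ORIGINAL_MTIME = 1262304000  # 2010-01-01 00:00:00 UTC, standing in for a backup-era time


def set_dir_time_then_write(root):
    """Hypothetical racy order: restore the directory's mtime first, then a
    later stream creates a file inside it."""
    d = os.path.join(root, "racy")
    os.mkdir(d)
    os.utime(d, (ORIGINAL_MTIME, ORIGINAL_MTIME))  # directory metadata "restored"
    with open(os.path.join(d, "late.txt"), "w") as f:  # new entry updates the dir mtime
        f.write("data")
    return int(os.stat(d).st_mtime)


def write_then_set_dir_time(root):
    """Safer order: write all contents first, restore the directory time last."""
    d = os.path.join(root, "ordered")
    os.mkdir(d)
    with open(os.path.join(d, "late.txt"), "w") as f:
        f.write("data")
    os.utime(d, (ORIGINAL_MTIME, ORIGINAL_MTIME))
    return int(os.stat(d).st_mtime)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print("racy order kept mtime:   ", set_dir_time_then_write(root) == ORIGINAL_MTIME)
        print("ordered kept mtime:      ", write_then_set_dir_time(root) == ORIGINAL_MTIME)
```

On a POSIX filesystem the first check should report False and the second True.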

I should also note that before any of the recovers, I generated a hash
listing of the CFI (client file index) for client A on both servers, and
they were identical except for the directory structure under
/nsr/index/db6. Otherwise, the files were the same, as was the number of
files. Moreover, nsrinfo for the given date/time produces identical
results on both servers.


> Yes, we've seen NetWorker parallelize multi-volume recovers. Most of the
> time it works pretty well. IIRC, this is something that was added in the
> 7.x series (earlier versions would always serialize volume access). It
> used to be configurable by creating a file in /nsr/debug (do a substring
> search of the mailing list archives for striped_recover for more info).

I find it hard to believe that NW can utilize multiple drives. How does
it merge and/or munge everything properly? What if you're instead
recovering from multiple fulls? How can it temporarily store all that
data, when disk space could run short at some point? How does it
organize and/or re-conglomerate all that later?

Granted, this is a feature I've always thought would be nice, as it
could cut restore times by a large factor if it could recover in
parallel, but again, that raises my questions above. I wasn't aware that
this feature had ever been developed and put into use in later versions.
I generally watch the GUI when doing restores, and I've always seen tapes
loaded one by one on 7.2.x releases, but 7.5SP1 is all new to us.


> We have, however, seen a few instances where recover apparently deadlocks
> in the striped recovery code. This happened to us a couple of times
> under 7.2.x or 7.4.x, but we upgraded to 7.5.2 last week and the first big
> recover we had to do triggered a deadlock in recovery. We've had a case
> open with EMC about this issue since last Friday.

What do you mean by 'deadlocks'?

Do you think the parallel recovery would most likely explain the
weirdness that we see?

I was thinking of trying an upgrade of the client software on client A,
but I doubt that has any effect on what the server decides to do on its
end in terms of loading those tapes. It also seems unlikely that the index
itself is somehow the culprit. Granted, it knows which tapes the data is
on, but the server is still going to handle the loading. Moreover, the
jukebox configuration and the tape library seem unlikely suspects, as
they just do what the server tells them.

If I have to disable drives during multi-tape restores that's gonna be a
real pain. sigh ...

George


Post Multiple drives for recovery? 
In regard to: Re: [Networker] Multiple drives for recovery?, George...:

>> Yes, we've seen NetWorker parallelize multi-volume recovers. Most of the
>> time it works pretty well. IIRC, this is something that was added in the
>> 7.x series (earlier versions would always serialize volume access). It
>> used to be configurable by creating a file in /nsr/debug (do a substring
>> search of the mailing list archives for striped_recover for more info).

> I find it hard to believe that NW can utilize multiple drives. How does it
> merge and/or munge everything properly? What if you're instead
> recovering from multiple fulls?

Huh? Recovering from multiple fulls? When (and why?!) would you recover
from multiple fulls simultaneously and restore all of the data into the
original location? I must be misunderstanding something, because I'm not
following.

Think about the browse process. Let's say you need to recover three files
from a directory. fileA never changes, so it only gets backed up when you
do a full backup. fileB changes once a week, so it only gets backed up on
your Saturday backups. fileC changes daily, so it gets backed up every
day.

Now you browse the index, find that directory, do a

add fileA fileB fileC

and then run

volumes

and it shows you that it will require 3 tape volumes.

NetWorker knows that only one copy of fileA is coming back (from the
full), only one copy of fileB is coming back (from the tape from Saturday)
and only one copy of fileC is coming back (from the most recent daily).

It can parallelize the recovery because there are no conflicts.

Now, I'd expect it would make the coding much simpler if NetWorker
always recovered from backups of a lower level first, so your full
would be the first one that got recovered. If fileB and fileC were
both backed up at level 'incr', there's no reason why they couldn't be
recovered in either order or completely in parallel.
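Tim's fileA/fileB/fileC example can be modeled as a toy. For each requested file, take its newest backup at or before the browse time, then group the files by the volume they live on. The index entries, volume names, dates, and function names below are all hypothetical, and this is not NetWorker's code. It only shows why each file coming from exactly one volume leaves nothing for parallel readers to conflict over.

```python
from collections import defaultdict
from datetime import date

# Hypothetical client-file-index entries: (file, level, save date, volume).
# Full on 2 Jan; fileB changes weekly; a few of fileC's daily backups shown.
INDEX = [
    ("fileA", "full", date(2010, 1, 2), "FULL001"),
    ("fileB", "full", date(2010, 1, 2), "FULL001"),
    ("fileC", "full", date(2010, 1, 2), "FULL001"),
    ("fileC", "incr", date(2010, 1, 8), "INCR008"),
    ("fileB", "incr", date(2010, 1, 9), "INCR009"),
    ("fileC", "incr", date(2010, 1, 9), "INCR009"),
    ("fileC", "incr", date(2010, 1, 12), "INCR012"),
]


def select_versions(index, wanted, browse_date):
    """For each wanted file, keep only its newest backup at or before browse_date."""
    chosen = {}
    for name, level, saved, volume in index:
        if name in wanted and saved <= browse_date:
            if name not in chosen or saved > chosen[name][1]:
                chosen[name] = (level, saved, volume)
    return chosen


def plan_by_volume(chosen):
    """Group the selected files by the single volume each must be read from.

    Every file maps to exactly one volume, so the groups never overlap and
    could, in principle, be read in parallel without conflicts.
    """
    plan = defaultdict(list)
    for name, (_level, _saved, volume) in sorted(chosen.items()):
        plan[volume].append(name)
    return dict(plan)


if __name__ == "__main__":
    chosen = select_versions(INDEX, {"fileA", "fileB", "fileC"}, date(2010, 1, 12))
    print(plan_by_volume(chosen))
    # {'FULL001': ['fileA'], 'INCR009': ['fileB'], 'INCR012': ['fileC']}
```

Three volumes come back for three files, matching what the `volumes` command would show in Tim's example.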

> How can it temporarily store all that data as disk space could
> become jeopardized at some point. How does it organize and/or
> re-conglomerate all that later?

I'm totally not following.

>> We have, however, seen a few instances where recover apparently deadlocks
>> in the striped recovery code. This happened to us a couple of times
>> under 7.2.x or 7.4.x, but we upgraded to 7.5.2 last week and the first big
>> recover we had to do triggered a deadlock in recovery. We've had a case
>> open with EMC about this issue since last Friday.

> What do you mean by 'deadlocks'?

I mean it pauses indefinitely and can't be "prodded" into continuing.
The recovery process stops after one or more tapes have been read (so
it's part way through recovering the files that were requested) and
never proceeds with subsequent tapes.

There are reasons why this can happen, like media database corruption
(NetWorker knows the ssids it needs but can't figure out which tapes
they're on) or issues with the jukebox resource (it's partially corrupt
and NetWorker knows the tape is nearline but doesn't know where it is),
but both of those issues were ruled out in the case we had open.

> I was thinking to try upgrading the client software on client A, but I doubt
> that has any effect over what the server decides to do on its end in terms of
> loading those tapes.

That's exactly what I thought, but it turns out to be incorrect, much to
my great surprise. I had assumed that other than its role in selecting
the files to be recovered, the client couldn't possibly have any influence
on how the server goes about the process of actually finding the data and
feeding it back to the client, but it looks like there's more going on
there than I understand.

This morning we got a resolution from EMC for the deadlock issue we
were seeing: upgrading the client software from 7.4.2 to 7.5.2 (to
match the server) fixed it. I still don't understand how the client
could be influencing this, but it appears it was.

Tim

Post Multiple drives for recovery? 
Tim Mooney wrote:
> In regard to: Re: [Networker] Multiple drives for recovery?, George...:

>> I find it hard to believe that NW can utilize multiple drives. How
>> does it merge and/or munge everything properly? What if you're instead
>> recovering from multiple fulls?

> Huh? Recovering from multiple fulls? When (and why?!) would you recover
> from multiple fulls, simultaneously, and restore all of the data into the
> original location? I have to be misunderstanding something, cause I'm not
> following.

What I was getting at was the case where the saveset spans, let's say,
three tapes: if NW could recover the data from volume 1 of 3 while it's
also recovering the data from 2 of 3 and 3 of 3 simultaneously, it would
save time versus reading them one at a time. It would then somehow have
to merge everything together on disk once done. That was my understanding
of parallel recover, but it looks like I was reading more into the term
than it implies.


> Now I anticipate it would make the coding much simpler if NetWorker
> always recovered from backups of a lower level first, so your full
> would be the first one that got recovered. If fileB and fileC were
> both backed up at level 'incr', there's no reason why they couldn't be
> recovered in either order or completely in parallel.

Yes, I follow you.


> We got a resolution from EMC this morning that was able to resolve the
> deadlock issue we were seeing. Upgrading the client software from 7.4.2
> to 7.5.2 (to match the server) fixed the issue. I still don't understand
> how the client could be influencing this, but it appears it was.


I'll try that. I did create the /nsr/debug/no_striped_recover file. That
did the job as far as forcing NW to load the tapes sequentially, BUT
after loading the full first and recovering the data from it, it then
loaded the incr tapes one at a time in what seems the wrong order:
newest to oldest rather than oldest to newest, as I would have
expected!!! After the recover completed, I ran my validation, and this
time there were only 3 mismatches, all of them mod times of recovered
directories, as opposed to hundreds of directories before. Otherwise, no
differences were found. This is clearly much closer than before, but
still 3 mod times away from perfect. Maybe the client upgrade will do
the trick.
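For comparison with what George observed, the serial order he expected can be sketched as a sort on (level, save date). Tim's "lower level first" order gives the same result here. This is an illustration only, not NetWorker's actual tape scheduling. `LEVEL_RANK` is a hypothetical mapping that covers just the full and incr levels discussed in this thread, and the volume names and dates are made up.

```python
from datetime import date

# Hypothetical ranking covering only the two levels discussed in this thread.
LEVEL_RANK = {"full": 0, "incr": 1}


def serial_order(volumes):
    """Return volume names ordered full first, then oldest incremental first.

    `volumes` is a list of (volume, level, save_date) tuples.
    """
    ordered = sorted(volumes, key=lambda v: (LEVEL_RANK[v[1]], v[2]))
    return [name for name, _level, _when in ordered]


if __name__ == "__main__":
    needed = [
        ("INCR012", "incr", date(2010, 1, 12)),
        ("FULL001", "full", date(2010, 1, 2)),
        ("INCR011", "incr", date(2010, 1, 11)),
        ("INCR009", "incr", date(2010, 1, 9)),
    ]
    print(serial_order(needed))
    # ['FULL001', 'INCR009', 'INCR011', 'INCR012']
```

With striped recover disabled, George's server read the full first but then read the incrementals in the reverse of this order.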

George
