Lack of disk space on server: a cautionary tale. And a couple questions ..

Posted by Michael Leone 
So this past Sunday, I woke up to emails that my NW 8.4.2.3 on Win 2008 R2
server wanted more tapes in a specific pool. This was suspicious, because
on Friday, I had more than enough blank tapes for a normal weekend run.
When I connected in, I discovered that my server had run out of disk space
on the drive where it stores all indexes, logs, etc. When I left on
Friday, it had 30G free, which should have been more than enough for a
weekend run. But apparently not. Either that, or there was some weird bug
that caused it to eat all available space ..

Anyway. I managed to gracefully stop all running jobs (both backup jobs
and clone jobs) from my NMC (which thankfully runs on a VM elsewhere).
I then proceeded to clean up space (I still had client indexes around for
clients I had deleted from the NMC, but never removed the index folders
from disk).
So I got it back to like 40G free. I rebooted, for good measure (in case
there were any zombie processes, etc). The server rebooted fine, but the
NW server took longer to come up than usual. When I could connect via NMC
again, I found that several groups, and several clone jobs, had vanished.

Poof! Gone.

Log said "Found invalid resource <filename> in RAP database". Close to 30
of these errors .... I'm guessing those were my now missing groups and
clone jobs.

Since I was in a rush, I just manually recreated the missing groups, and
clone jobs as best I could (mostly because I didn't know how to restore
just the RAP database, and I was doing this remotely from home).

So:

Be sure to keep an eye on your disk space (even though I thought I had
enough). We use SolarWinds, and I do have an alert set up for low disk
space, but I never got a notice ...
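
Since the monitoring alert never fired here, a second, independent check
may be worth having. The following is only a minimal sketch in Python,
not anything NetWorker or SolarWinds provides: the path "." and the
40 GiB threshold are placeholders (assumptions), to be replaced with the
drive that holds the indexes and logs and a limit that suits your runs.

```python
import shutil

GIB = 2 ** 30


def free_gib(path):
    """Free space, in GiB, on the volume that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def is_low(path, min_free_gib):
    """True when the volume holding `path` has less than `min_free_gib` GiB free."""
    return free_gib(path) < min_free_gib


if __name__ == "__main__":
    # Hypothetical values: point `nsr_path` at the drive that holds the
    # indexes and logs, and pick a threshold that suits your weekend runs.
    nsr_path = "."
    threshold_gib = 40
    if is_low(nsr_path, threshold_gib):
        print(f"WARNING: only {free_gib(nsr_path):.1f} GiB free on {nsr_path!r}")
```

Run from a scheduled task, something like this could send its warning
through a channel separate from the main monitoring system.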

Learn to use the "nsrck -YR <client>" command to remove client indexes,
when you delete a client (to free up disk space). I usually don't, because
many times I've had to recreate a deleted client, because the developers
who swore up and down that they had all the files they needed off the
client ... didn't, and so I needed to do a recover.

Find out how to restore just a RAP database (if there is a way), and keep
that handy somewhere.

Also not a bad idea to keep a list of groups and clone jobs documented.

Anyone know how to do that last part? I'm guessing there's a way in
nsradmin to list all groups, and then save that output. Same for clone
jobs. But I know very few nsradmin commands.

Also: I have errors from nsrtask, indicating that there used to be a clone
job scheduled, but apparently the clone job doesn't exist anymore:

Scheduled cloning 'Clone Job - NT_SAN1 Sat++' completed. Total 0 save
sets, 0 Failed.
Unable to process resource: Cannot find NSR clone resource 'Clone Job -
NT_SAN1 Sat++'.

I can't create a new clone job with that same name, but I can create a new
job with a slightly different name ("Clone Job - NT_SAN1 Sat FULL").
But how do I edit the list of nsrtasks, and remove the one that points to
the now unavailable clone resource?

Thanks

--
Michael Leone
Network Administrator, ISM
Philadelphia Housing Authority
1800 South 32nd Street
Phila, PA 19145
Tel: 215-684-4180
Cell: 215-252-0143
<mailto:michael.leone@pha.phila.gov>


--
This list is hosted as a public service at Temple University by Stan Horwitz
If you wish to sign off this list or adjust your subscription settings, please do so via http://listserv.temple.edu/archives/emc-dataprotection-l.html
If you have any questions regarding management of this list, please send email to owner-emc-dataprotection-l@listserv.temple.edu
This message was imported via the External PhorumMail Module
Re: Lack of disk space on server: a cautionary tale. And a couple questions ..
June 05, 2017 10:03AM
Hi Michael

Your disk filled up and corrupted the resource database (RES).

Open a case with EMC.

You should restore it from a backup, or re-create the missing items manually, whichever is easier for you.


To clear the old tasks, you can rename the jobsdb folder while the system is down.

Or try to identify and kill the jobs manually by using jobquery.


Andy

Hi Michael,

The nsradmin command to list all groups is:

    echo print type:nsr group | nsradmin -i - > all_groups.nsr

Clone jobs are more difficult, because AFAIK they can only be managed via
NMC.
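
Once the output is saved, it can be turned into a plain list of names and
compared against a later dump to spot resources that have vanished. The
sketch below is in Python and rests on an assumption about the layout of
the saved file: one "attribute: value;" pair per line, with the resource
name on a line starting with "name:". The sample text and the group names
in it are made up for illustration and are not real nsradmin output, so
check the pattern against your own all_groups.nsr before relying on it.

```python
import re

# Assumed layout: a line such as `   name: Default;` or `   name: "Some Name";`
NAME_LINE = re.compile(r'^[ \t]*name:[ \t]*(.*?);[ \t\r]*$', re.MULTILINE)


def resource_names(dump_text):
    """Return the resource names found in saved print output, in file order."""
    return [m.group(1).strip().strip('"') for m in NAME_LINE.finditer(dump_text)]


def missing_names(documented, current):
    """Names present in the documented list but absent from the current one."""
    current_set = set(current)
    return [name for name in documented if name not in current_set]


# Made-up sample in the assumed layout (hypothetical group names).
SAMPLE = """\
                        type: NSR group;
                        name: Default;
                   autostart: Disabled;

                        type: NSR group;
                        name: "Weekend Full";
                   autostart: Enabled;
"""

if __name__ == "__main__":
    saved = resource_names(SAMPLE)
    print(saved)
    # Pretend only "Default" survived a restart.
    print(missing_names(saved, ["Default"]))
```

Keeping a dated copy of the dump alongside the name list would give
something to recreate from, not just a record of what went missing.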

Conrad




