Probe problems after upgrade to 19.1.1.

Posted by yaron 

Joe Brancaleone
Re: Probe problems after upgrade to 19.1.1.
September 12, 2019 09:59PM
> A couple of days ago I upgraded from 8.2.4 to 19.1.1. On 8.2.4 we had a few groups with probes, and this had worked well for years. After the upgrade things stopped working. One of the clients in the group is a NetApp filer which is backed up using NDMP (nsrndmp_save). The second client in the group is a Solaris server which had its probe resource configured. After the upgrade it seems that NetWorker was trying to run the probe on the NetApp as well, although that client doesn't really have the resource. Later, I configured a probe for the NetApp client, which then failed because NetWorker was trying to access nsrexec on the NetApp so it could run the fake probe. Unsetting "Start backup only after all probes succeed" in the probe configuration didn't help, because it was still trying to run the probe. So there are two problems I see here:

> 1. A probe action tries to run the probe even if there is no probe defined for the client.

> 2. The probe action tries to run the probe for NDMP clients as well, although this doesn't make much sense (no nsrexec).

> Any ideas on how to solve this?

I don't have any ideas yet, but I want to mention that I ran into a very similar issue this week. I upgraded our 8.2.4.x server to, and the probe group that had run for years never detected a successful exit code from the Solaris client script during the nightly interval window, as it had every night before. We updated the Solaris client to but the next night it did not seem to have any effect.

Like you, I also noticed in the probe group's error logs that another client in the group that doesn't have a probe defined (it's supposed to run Oracle RMAN backups once the Solaris client's probe triggers the backup) shows probe errors about not returning an exit code indicating that a backup is needed.

I have an SR open with Dell/EMC support. The really strange thing is that today I had the Solaris admin change the client script to return exit code 0 immediately, and with only that Solaris client enabled in the probe group, a manual test backup worked.
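For anyone reproducing that test, a minimal stand-in probe could look like the sketch below. It assumes the usual NetWorker probe convention that exit code 0 means "backup needed" and 1 means "no backup needed" (verify this against the documentation for your release); the trigger-file path `/var/tmp/nw_backup_ready` and the function name are purely hypothetical.

```python
import os

# Assumed NetWorker probe convention (verify against the docs for your release):
# exit 0 = backup needed, exit 1 = no backup needed, anything else = error.
BACKUP_NEEDED = 0
NO_BACKUP_NEEDED = 1

# Hypothetical trigger file that the application on the Solaris client would
# create once it is ready to be backed up.
TRIGGER_FILE = "/var/tmp/nw_backup_ready"


def probe_exit_code(trigger_file: str = TRIGGER_FILE) -> int:
    """Return the exit code the probe script should report to NetWorker."""
    if os.path.exists(trigger_file):
        return BACKUP_NEEDED
    return NO_BACKUP_NEEDED
```

Ending the probe script with `raise SystemExit(probe_exit_code())` makes it report 0 as soon as the trigger file exists; deleting the check and returning 0 unconditionally mirrors the "always exit 0" test described above.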

I am not sure what exactly you are doing. In all my years with NW I have never had to define a specific probe process.

I also wonder whether the probe does not respond at all, or whether it just takes too long to build the worklist.
That can happen with incremental (INCR) backups of very large filesystems; we have some clients where the probe runs for about 3 hours.
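To tell a probe that never answers from one that is merely slow, one option is to run the probe command by hand on the client with a generous timeout and note the exit code and elapsed time. The sketch below does that; the function name is hypothetical, and the command path and timeout in the usage line are only illustrations.

```python
import subprocess
import time
from typing import Optional, Sequence, Tuple


def time_probe(cmd: Sequence[str], timeout_s: float) -> Tuple[Optional[int], float]:
    """Run a probe command and return (exit_code, elapsed_seconds).

    exit_code is None if the command did not finish within timeout_s;
    subprocess.run kills the child process in that case.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        code: Optional[int] = result.returncode
    except subprocess.TimeoutExpired:
        code = None
    return code, time.monotonic() - start
```

For example, `time_probe(["/usr/local/bin/my_probe.sh"], timeout_s=4 * 3600)` (hypothetical path) leaves room for a probe that normally takes around 3 hours; a `None` exit code after that long points to a hang rather than a slow worklist.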

For the Solaris client, make sure that the OS and NW versions are compatible. Do not just upgrade.

For the NDMP client, make sure that you really run the nsrndmp_save command and not save.
I do not remember exactly, but it could be that NW gets confused by the action commands if the option 'client can override' is set (which is the default). So if you use the new (NW 9.0+ policy/workflow/action) definitions, make sure that only one method is used.
