Thanks for the suggestion, Teresa, but it turns out this was a lot
simpler than that. The host's IP address, listed in /etc/hosts, was not
correct, but the one listed in the main DNS table was. The reason is
that this machine was set up to replace an older host with the same
host name; so, until the work could be completed on the new host and
the old files transferred from the old one, a temporary IP address and
host name were used on the new one. The host name was later changed back
to the old one, BUT the IP address never was. /etc/hosts still listed the
temporary IP! My guess is that when the backup server went to resolve
the client's address, it did not match, since the server was using DNS,
which had the correct entry, while the client's /etc/hosts did not.
The host's /etc/nsswitch.conf file listed 'files nis dns'. After
changing the wrong ip to the correct ip, the problem immediately
resolved itself. No reboot or re-start necessary. I was also then able
to run all the tools like nwadmin, nwrecover, recover, mminfo, etc.
without them hanging. They all hung prior.
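For anyone who wants to check a client for the same kind of mismatch,
here is a minimal Python sketch. It is not part of NetWorker and not
what I actually ran; it is just one way to compare the two sources. It
assumes you already know the address DNS gives for the host (from
nslookup or similar) and pass it in yourself. The function names, the
host name client1.example.com and the 192.0.2.x addresses are all made
up for illustration.

```python
def hosts_file_addresses(hosts_text, name):
    """Return every address that hosts-file text maps to `name`."""
    wanted = name.lower()
    addresses = []
    for line in hosts_text.splitlines():
        # Drop comments and skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address = fields[0]
        names = [n.lower() for n in fields[1:]]
        if wanted in names:
            addresses.append(address)
    return addresses


def stale_entries(hosts_text, name, dns_addresses):
    """Return hosts-file addresses for `name` that DNS does not list."""
    return [a for a in hosts_file_addresses(hosts_text, name)
            if a not in dns_addresses]


# Hypothetical data: the hosts file still carries a temporary address,
# while DNS (looked up separately) has the permanent one.
SAMPLE_HOSTS = """\
127.0.0.1    localhost
192.0.2.50   client1.example.com client1   # temporary address
"""

print(stale_entries(SAMPLE_HOSTS, "client1", ["192.0.2.10"]))
```

To run it against a real machine, read /etc/hosts into a string and pass
that in place of SAMPLE_HOSTS. An empty list means the file and DNS
agree for that name.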
One thing I've noticed in the past when there's a duplex mismatch is
that backups run, but the byte changes in the GUI window are much slower
than you would expect. In our case, though, there was nothing in the
sessions window, not even the message about the tape forwarding.
Nothing. It would take hours before anything would appear in the
sessions window and once something finally appeared, it would then run
normally.
George
Teresa Biehler wrote:
Are the duplex settings for the client and network port the same? This
sounds like a duplex mismatch problem.
-T
-----Original Message-----
From: Legato NetWorker discussion [mailto:NETWORKER < at > LISTMAIL.TEMPLE.EDU]
On Behalf Of George Sinclair
Sent: Friday, December 03, 2004 11:30 AM
To: NETWORKER < at > LISTMAIL.TEMPLE.EDU
Subject: [Networker] No response from client
Hi,
We have this Linux client that is agonizingly slow to back up. It *WILL*
back up, but it takes hours, even to do an incremental, even against
something simple like /tmp. So if I launch a backup against this one
client from the GUI, for example, and I walk away, and I come back an
hour later, I will see nothing, I kid you not. Absolutely no activity
whatsoever. If I check the group control window, the saveset 'All' is
still listed as pending. Unbelievable! If I come back another hour later, same
thing. When I go home at night, though, and I come in the next morning,
it's done!
Its file systems are no bigger than any of our other clients. It's
running the same version of the OS (RedHat 7.3), and as near as I can
tell has the same setup. It's running the same version of the NetWorker
client software, too. Pinging the host produces normal timely responses,
same as other hosts on the network. Also, pinging machines from that
host works normally, too. This machine can access the network with no
problems that I can tell, and the user never complains about being
unable to reach other hosts, the internet, etc.
This problem has existed as long as I can remember, so I don't know when
it first started exhibiting this behavior. Here are its horrible
symptoms and some of the things I've tried to troubleshoot it:
1. Running the following commands on the client produces no output or
error messages to the console. They just hang:
nwadmin -s server
nwrecover -s server
recover -s server
save -s server -b pool -l i /tmp
2. Running a probe against the client from the primary server produces
nothing:
savegrp -pn -c client group
3. Under save sets, changed 'All' to /tmp, placed client in its own
group and ran both full and incremental from GUI. Still nothing! Just
sits there. I can see that if a file system had like 50 million inodes
that maybe doing an incremental might take a while, but /tmp? Come on!
Even a small file system like /var just sits dormant during backup. I've
seen huge RAID on other boxes run circles around this machine on a bad
day.
There are no error messages or strange warnings in the nsr daemon.log or
messages log files on the primary server.
I used rpm -e to remove the NetWorker client and then re-installed and
re-started the software. Still no luck. I moved the network cable (100
Mbit) to another port, still no luck. I even tried another network
cable, no luck. I shut down the host, and rebooted it. Nothing. I even
ran a check against the client index on the primary server as: 'nsrck
-L6 client'. No error messages or warnings, and it completed just dandy.
Didn't take long either. What the heck is this machine's problem?!!!
It has plenty of memory, plenty of swap space, and it's doing NOTHING
when I've been running these tests. It's not like there are 100 users
logged in. No one is logged in other than me. I've tried everything but
running something like ethereal (sp), but maybe I'm gonna have to start
analyzing packets here? Not too good with those kinds of tools.
Any ideas on what to try?
Thanks.
George
--
Note: To sign off this list, send a "signoff networker" command via
email
should be sent to stan < at > temple.edu
