MS Cluster and MSSQL cluster backups - SUMMARY
Originally I was asked to resolve a few issues with failing MSCS and MSSQL backups after the NT and W2K clients were upgraded from NetWorker 5.1 to 5.7 and the SQL modules from 2.0 to 3.0. All of those clients had been backing up "successfully" to Solaris 5.1 and 5.5.3 servers, but we found that the customer's original configuration was not supported by Legato.

Here is why. The customer backed up only physical node1 of the cluster; backup of physical node2 was never implemented. If a backup failed on node1, the customer would check whether the cluster had failed over and, if it had, simply fail it back. Version 5.1 is not cluster aware, so during a backup of node1 the cluster drives Q: and R: were treated as local drives. After the upgrade to 5.7 this configuration "broke", because NetWorker now knew that those drives were cluster drives. For some time backups were still reported as "successful", until someone noticed that the cluster drives were not being backed up at all. NetWorker did not even list those drives; that is what happens when the save set is set to All. Changing the save set to list all drives explicitly resulted in these errors:

savefs: The path R: does not belong to the host client.backup.com.
savefs: The path Q: does not belong to the host client.backup.com.

In addition, SQL7 backups started failing with "connection refused" messages. At that point the customer suspected that the new version was not cluster compatible, and all hell broke loose. They had never before had to set up additional virtual clients to back up their clusters and SQL. Nor had they ever added an extra IP address to the cluster resources (which is needed only if you back up your clients over a separate network, i.e. in a multihomed environment). Instead, on some machines they had "hacked" the configuration by adding the MSDTC cluster resource network name to the aliases field. Of course this did not help much, since they used different names on different machines (MSDTC, MSCDT, MSDTC2, etc.), and no one could explain why it worked in one case but not in another. Legato would not support such a configuration. On top of that, the NetWorker User for SQL GUI crashed on startup with a Dr. Watson error.

It took some time to convince them to implement the supported Legato configuration for backing up MSCS and SQL. This method applies both to NT4 with SQL7 and to Windows 2000 with SQL 2000. Below is how it is set up now, and it works fine: all backups (file systems, cluster drives and SQL) succeeded in every case, whether physical node1 and physical node2 were up or down.

SETUP INFO:
In addition to the clients already configured for physical node1 and node2, we configured two more virtual cluster clients: one for the cluster drives and another for SQL.

1. First, add a new valid IP address, visible to the backup server, to the cluster resources. (This step is for multihomed clients only.)
a) In Cluster Administrator, in the same group as the Cluster Server Name and IP address, create a new resource "Cluster IP Backup" (any name you want).
b) Both physical nodes must be possible owners.
c) Under Resource Dependencies, select the Cluster IP Address resource.
d) Enter your NEW backup IP address and subnet mask, and select your BACKUP network interface.
2. Create two "virtual name" backup clients on the backup server. For testing we used the name "Node1(name)-cluster1.backup.com".
a) Before creating a cluster backup client on the server, adjust the server's HOSTS file so that NetWorker can resolve the client name to a valid IP address.
b) Configure your clients on the backup server. Here are two client configuration examples.
To back up the virtual cluster drives:
type: NSR client;
name: node1-cluster1.backup.com;
server: backup1.backup.com;
...
save set: All;
remote access: backup1.backup.com, node1.backup.com, node2.backup.com,
aliases: node1-cluster1.backup.com, NODE1CLSTR, NODE1SQL, MSDTC;
were not. The easiest method is to use them all. You should never use those aliases on the actual physical backup clients (node1 and node2).
To back up the virtual SQL client:
type: NSR client;
name: node1-cluster1.phx-colo.backup;
server: backup1.backup.com;
Everything else is the same as above, except:
save set: "MSSQL:";
remote user: backupmssql; password: ***;
MSSQL: nsrsqlsv: [Microsoft][ODBC SQL Server Driver][Shared Memory]SQL Server does not exist or access denied.
MSSQL: nsrsqlsv: [Microsoft][ODBC SQL Server Driver][Shared Memory]ConnectionOpen (Connect()).
MSSQL: nsrsqlsv: XBSA no resource.(BSA code 19): Cannot login to local SQL Server
backup command: nsrsqlsv -b Standard -a NODE1SQL;
aliases: node1-cluster1.backup.com, NODE1CLSTR, NODE1SQL, MSDTC;

3. Delete or disable the client definition for physical node1 with the MSSQL: save set (the customer's old settings). That's all for the server-side settings.

4. Back on the cluster machines, finalize the settings by adjusting the hosts files on both nodes:
# Example:
# Virtual cluster node
xxx.xx.xx.xxx node1-cluster1.backup.com
xxx.xxx.xxx.xxx NODE1CLSTR
xxx.xxx.xxx.xxx MSDTC
xxx.xxx.xxx.xxx NODE1SQL

The last line is mandatory for SQL7 and optional for SQL 2000. The reason is that the NetWorker SQL GUI (nwmssql.exe) exits with a Dr. Watson error if it cannot resolve the SQL cluster name to an IP address. Actually, only one such name exists on SQL7 clusters; with SQL 2000 there can be two or more. However, when I commented out the last line on the SQL 2000 machines, nwmssql.exe did not crash on startup, which is why I say that this entry is optional for SQL 2000. That's all for the client settings.

Also make sure that the NetWorker server is 5.5.x or higher. Earlier versions are not cluster aware, and your cluster backups will not work properly.

To restore data on the virtual cluster drives we use directed recovery, because the virtual drives are not visible on either node1 or node2 when you start NetWorker User. This is not necessary for NetWorker User for SQL, because it finds the client by the virtual SQL cluster name.

One more issue: on some Windows 2000 cluster machines with NetWorker 5.7 clients, nsrexecd starts on reboot but, if stopped manually, will not start again until the machine is rebooted. I was able to narrow this problem down to the Cluster Service: if we stop the Cluster Service, we can restart nsrexecd successfully and then restart the Cluster Service. I don't know what causes this behavior, but I am working with Legato Tech Support to resolve it. I don't believe it happens when the NetWorker 6.0.2 client is installed.

No two backup clients on a single backup server can have identical aliases. Therefore the Cluster_Network_Name, SQL_Network_Name and MSDTC_Network_Name resources (parameters) have to be unique for every cluster. You can change them on the fly, without even restarting the cluster services. Something unique to the local machine or cluster, such as Node1_MSDTC, should work fine.
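Step 1 d) asks for a new backup IP address and subnet mask on the backup network. Before typing it into Cluster Administrator, it may help to confirm that the address really sits on the backup subnet of both possible owners and does not collide with an address already in use. The Python sketch below uses only the standard ipaddress module; the addresses are hypothetical documentation addresses, and the checks are just the obvious ones, not anything Cluster Administrator or NetWorker performs itself.

```python
import ipaddress

def check_backup_ip(backup_ip, node_backup_ifaces, cluster_ip):
    """Sanity-check a proposed "Cluster IP Backup" address.

    backup_ip          -- the new address to add as a cluster resource
    node_backup_ifaces -- each physical node's backup NIC as "addr/prefix"
    cluster_ip         -- the existing Cluster IP Address resource
    Returns a list of problems; an empty list means nothing was found.
    """
    problems = []
    ip = ipaddress.ip_address(backup_ip)
    for iface_text in node_backup_ifaces:
        iface = ipaddress.ip_interface(iface_text)
        net = iface.network
        # Both possible owners must be able to bring the address up on
        # their backup NIC, so it has to be inside each node's subnet.
        if ip not in net:
            problems.append(f"{ip} is not on backup network {net} ({iface.ip})")
        elif ip in (net.network_address, net.broadcast_address):
            problems.append(f"{ip} is the network or broadcast address of {net}")
        # It must not reuse a node's own backup NIC address.
        if ip == iface.ip:
            problems.append(f"{ip} is already used by a node's backup NIC")
    if ip == ipaddress.ip_address(cluster_ip):
        problems.append(f"{ip} is already the Cluster IP Address")
    return problems

# Hypothetical backup NICs of node1/node2 and the public Cluster IP Address.
NODES = ["192.0.2.11/24", "192.0.2.12/24"]
print(check_backup_ip("192.0.2.50", NODES, "198.51.100.10"))  # []
print(check_backup_ip("198.51.100.50", NODES, "198.51.100.10"))
```

The second call reports one "not on backup network" problem per node, because the proposed address is on the public subnet rather than the backup one.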
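Since nwmssql.exe crashes when it cannot resolve the SQL cluster name, a quick check that every virtual name from the node hosts-file example has an entry may save a Dr. Watson session. The following Python sketch parses hosts-file text (anything after "#" is ignored, names are compared case-insensitively) and lists the required names that are missing. The IP addresses are hypothetical, and the sketch only reads the text it is given; it does not open the real hosts file or query DNS.

```python
import ipaddress

def parse_hosts(text):
    """Map lower-cased host names to IP addresses from hosts-file text."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue                         # blank or comment-only line
        addr, *names = line.split()
        try:
            ipaddress.ip_address(addr)
        except ValueError:
            continue                         # skip lines without a valid IP
        for name in names:
            table.setdefault(name.lower(), addr)  # keep the first mapping seen
    return table

def missing_names(text, required):
    """Return the required names that have no hosts entry."""
    table = parse_hosts(text)
    return [name for name in required if name.lower() not in table]

# Hypothetical node hosts file; the NODE1SQL line has been left out.
HOSTS = """\
# Virtual cluster node
192.0.2.20  node1-cluster1.backup.com
192.0.2.21  NODE1CLSTR
192.0.2.22  MSDTC
"""
REQUIRED = ["node1-cluster1.backup.com", "NODE1CLSTR", "MSDTC", "NODE1SQL"]
print(missing_names(HOSTS, REQUIRED))  # ['NODE1SQL']
```

On SQL 2000 nodes, where the NODE1SQL entry turned out to be optional, you could drop it from REQUIRED.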
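The nsrexecd workaround described above (stop the Cluster Service, start nsrexecd, then start the Cluster Service again) can be scripted so that it always runs in the same order. This is a minimal Python sketch, not a Legato-supplied tool: the service key names ClusSvc and nsrexecd are assumptions you should confirm in the Services control panel on your own nodes, and by default the script only prints the net commands instead of running them.

```python
import subprocess

# Assumed Windows service key names; confirm them on your nodes first.
CLUSTER_SERVICE = "ClusSvc"
NSREXECD_SERVICE = "nsrexecd"

def workaround_commands():
    """Return the workaround as a list of 'net' commands, in order."""
    return [
        ["net", "stop", CLUSTER_SERVICE],    # 1. stop the Cluster Service
        ["net", "start", NSREXECD_SERVICE],  # 2. per the notes above, nsrexecd can now start
        ["net", "start", CLUSTER_SERVICE],   # 3. bring the Cluster Service back
    ]

def run_workaround(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in workaround_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failure.
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_workaround()  # dry run: prints the three commands only
```

Keeping the command list separate from the runner makes it easy to review the sequence before passing dry_run=False.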
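The alias rule above is easy to break as more clusters are added, for example when every cluster keeps the default MSDTC network name. A small Python sketch can flag clashes before they reach the backup server. It assumes you have already collected each client resource's name and aliases (exporting them from the server is not shown), treats a client's own name as one of its aliases, and compares names case-insensitively. Several resources defined for the same client name, such as one for the cluster drives and one with the MSSQL: save set, are not reported. The second cluster's names are hypothetical.

```python
from collections import defaultdict

def alias_conflicts(resources):
    """Return {alias: [client names]} for aliases claimed by 2+ client names.

    resources -- iterable of (client_name, aliases) pairs, one per NSR client
                 resource; several resources may share one client name.
    """
    owners = defaultdict(set)
    for name, aliases in resources:
        client = name.lower()
        # A client's own name counts as one of its aliases.
        for alias in [name, *aliases]:
            owners[alias.lower()].add(client)
    return {alias: sorted(clients)
            for alias, clients in owners.items() if len(clients) > 1}

# Cluster 1 as configured above, plus a hypothetical second cluster that
# kept the default MSDTC network name.
RESOURCES = [
    ("node1-cluster1.backup.com", ["NODE1CLSTR", "NODE1SQL", "MSDTC"]),  # cluster drives
    ("node1-cluster1.backup.com", ["NODE1CLSTR", "NODE1SQL", "MSDTC"]),  # MSSQL: save set
    ("node3-cluster2.backup.com", ["NODE3CLSTR", "NODE3SQL", "MSDTC"]),
]
print(alias_conflicts(RESOURCES))
# {'msdtc': ['node1-cluster1.backup.com', 'node3-cluster2.backup.com']}
```

Renaming the second cluster's MSDTC network name to something unique, such as Node3_MSDTC, makes the result an empty dictionary.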
This page was last modified 06:30, 28 April 2007.