LifeKeeper(c) 2.03D for Windows NT(c) Patch Readme File

The LifeKeeper 2.03 Patch D includes critical fixes for comm path
problems. These fixes are applied to the LifeKeeper core components.
Patch D also includes all accumulated fixes from Patch C for the
following LifeKeeper application recovery kits, plus new fixes as
indicated: IP, LAN Manager, SQL Server, Oracle, and Exchange.

We recommend installing this patch on each server AFTER you install all
the LifeKeeper products. If you are installing this patch into an
existing LifeKeeper environment, stop LifeKeeper on both servers before
installing the patch.

How to Install the Patch

A. New install of LifeKeeper

After installing all the LifeKeeper products:

1. In the Control Panel, choose Services, and STOP the following two
   entries:
   - LifeKeeper
   - LifeKeeper External Interfaces
2. Insert the diskette or CD in the corresponding drive.
3. Run setup.exe from the diskette/CD.
4. In the Control Panel, choose Services, and START the following two
   entries:
   - LifeKeeper
   - LifeKeeper External Interfaces

B. Upgrading an existing install of LifeKeeper 2.03

1. On each node, take the active hierarchy out of service (the
   hierarchy may be active on only one server).
2. Stop LifeKeeper on both nodes.
3. Apply Patch D.
4. Start LifeKeeper on both nodes.
5. Bring the hierarchies into service on each node.

Problems fixed by this patch are listed below, grouped by core and kit,
with the most recent fix at the top of each list.

Example: MR/CR/SPEC: lkxxxxxxxx - Summary

CORE PATCHES: (All new in LK 2.03 Patch D)

MR lknt000151 - SOCKCA CPU Utilization
Description: This modification corrects a problem that can sometimes
cause the SOCKCA.exe process to go into a tight loop and consume
excessive CPU time.
This problem has also been corrected in LK2K CR# lkwin00000046.
(SOCKCA.exe replaced)

MR lknt000150 - One Socket Comm Path Fails to Reconnect to a Server
Description: This modification corrects a problem that can sometimes
prevent one LifeKeeper SOCKET comm path from reconnecting when a peer
system in the LifeKeeper cluster is shut down and then restarted. When
this happens, the first socket comm path usually fails to reconnect and
the second socket comm path succeeds. This has also been corrected in
LK2K CR# lkwin00000106.
(ntlklcm.dll, LcmLcdInt.exe, lcm.exe, lkinit.config replaced)

MR lknt000149 - All Socket Comm Paths Fail To Reconnect to a Server
Description: This modification corrects a problem that can sometimes
prevent all LifeKeeper SOCKET comm paths from reconnecting when a peer
system in the LifeKeeper cluster is shut down and then restarted. This
problem is also corrected in LK2K CR# lkwin00000053.
(lksvc.exe replaced)

MR lknt000095 - Remove instance info modifications from quickchk and
deepchk
Description: Customers saw a corrupted instance info field for volume
resources. The problem was also observed intermittently elsewhere but
could not be reproduced. The old quickchk and deepchk scripts inserted
" - SCSI" if the volume type could not be retrieved. The scripts now log
an error message if the instance info cannot be retrieved and make no
changes to it, which avoids corrupting the instance info.
(volume quickchk.ksh, deepchk.ksh, and restore.ksh replaced)

MR lknt980271 - Don't Switchover shutdown strategy fails
Description: On single-CPU Windows NT Terminal Server systems, the
Don't Switchover shutdown strategy may fail. The failure was due to
altered TCP/IP behavior on Terminal Server systems. Sometimes the
!comm_down! event was sensed on the backup system just ahead of
receiving the !nofailover! flag from the system being shut down. When
this happened, a failover would be initiated when it was not intended.
This has been resolved by streamlining the LifeKeeper shutdown process
and raising the priority of sending the LifeKeeper !nofailover! flag.
This problem is also corrected in LK2K CR# lkwin00000047.
(lksvc.exe replaced)

IP KIT PATCHES:

(New in LK 2.03 Patch D)
MR lknt000145 - Modifications to IP quickchk.ksh and deepchk.ksh
Description: Intermittent false failovers were occurring with IP
resource hierarchies because the setting of environment variables with
'set --' after 'getopt' did not occur (quickchk.ksh and deepchk.ksh).
In addition, deepchk.ksh was changed at the request of NCR GSC to
increase the number of remote pings: it now requires 8 missed pings
instead of 5 before indicating an IP failure. They felt this would help
to prevent unnecessary LifeKeeper failovers caused by momentary network
or server congestion.
(quickchk.ksh and deepchk.ksh replaced)

(New in LK 2.03 Patch D)
MR lknt000157 - Add non-zero retry count to condition for ping failure.
(deepchk.ksh replaced)

MR lknt990089 - Give the deepchk ping more than one chance to succeed.
This helps avoid false failovers on busy networks.

MR lknt990040 - Kit fails with French NT 4.0.

SQL KIT PATCHES:

(New in LK 2.03 Patch D)
MR lknt000155 - act/act DB still active after failed switchover
Description: LifeKeeper protected databases are active on both
LifeKeeper servers at the same time even though the protected volumes
are active on only one server. This generally occurs after a failed
switchover of the SQL hierarchy. A command was added to aRestore so
that if the restore fails, any databases already activated are
deactivated.
(aRestore.ksh, aRestore6.ksh, aRestore7.ksh replaced)

(New in LK 2.03 Patch D)
MR lknt000094 - SyncGUI.exe, SyncLogin.ksh don't handle NT trusted
domains
Description: There are 2 problems with the utility used to synchronize
SQL Server logins between nodes in a cluster:
1. For trusted NT logins, the procedure sp_grantlogin should be called
   rather than sp_addlogin.
2. For SQL Server users being added via sp_addlogin, the encryption
   status is always set to the SQL Server 7.0 type (2), instead of
   checking the encryption status of the user on the primary server and
   adding the user on the backup server(s) with the same encryption
   status as its matching user on the primary server.
Changes were made to 'SyncLogin.ksh' to handle domain users.
(SyncLogin.ksh replaced)

MR lknt000001 - SyncGUI.exe utility correctly handles a user account
created with no password.

MR lknt990157 - Corrected problem where duplicate entries in config file
tagname.txt would cause the SQL resource to fail.

MR lknt990156 - SyncGUI.exe utility can now handle special characters in
login and password fields.

MR lknt990152 - Mixed-case drive letters used to cause multiple
dependencies to be created to the same volume.

MR lknt990151 - Made change to allow SQL cluster tab to accept changes
after the first apply.

MR lknt990087 - Corrected problem with MSDTC fix below (lknt980330). The
previous fix worked only for manual failovers.

MR lknt980330 - With MSDTC in an n+1 configuration, failovers only work
from one primary to the secondary.

MR lknt980317 - Stored procedures do not work after failover in
active/active configuration.

MR lknt980326 - Changed method of creating volume resource's tag name.

ORACLE KIT PATCHES:

(New in LK 2.03 Patch D)
MR lknt000147 - Can't create hierarchy if using Oracle 8.1.5
Description: The location of the registry key is different between
Oracle 8.1.5 and 8.1.6. To support both versions, the file
'srvfunc.ksh' has been modified in several locations so that it can
find the registry key whether 8.1.5 or 8.1.6 is installed.
(srvfunc.ksh replaced)

(New in LK 2.03 Patch D)
MR lknt000141 - Additional checking for null TAG and ID in quickchk and
deepchk
Description: An intermittent failure could occur with 'getopt' in the
Oracle quickchk or deepchk script when the 'set' command is used.
Undefined parameters were not detected, causing the rest of the script
to run with the information missing. The getopt section of quickchk and
deepchk was modified to work as in other kits (sql, web, etc.) and no
longer uses 'set'. The scripts also check the TAG and ID variable values
and return an error if they are null.
(quickchk.ksh and deepchk.ksh replaced)

(New in LK 2.03 Patch D)
MR lknt0000112 - Listener modified during create but not restarted
Description: In the Oracle 8i RK, the LISTENER entry in the listener.ora
file is modified with the entries for the LifeKeeper protected
database. However, the LISTENER is not restarted on both servers when
the application is brought in service on a given node. As a result,
users get "no listener" errors when attempting to connect to the
database using tnsnames and sqlplus.
(add_lsnr.ksh and remove.ksh replaced)

MR lknt990149 - Eliminate duplicate entries in listener.ora that are
created when two instances depend on the same IP or LM.

MR lknt990000 - Problem with length of address line in listener.ora.

XCHG PATCHES:

MR lknt000000 - Added MCA service.

MR lknt98319 - Problem with LK2 Exchange Recovery kit installer.

MR lknt990053 - xchgshares file incorrectly assumes MSMail connector is
installed.

MR lknt980213 - ShareList file sometimes gets corrupted with multiple
Exchange hierarchies.

MR lknt980191 - Create error is misleading for resource.

MR lknt980209 - Restore fails if system name is reset.

MR lknt980204 - Exchange hierarchy create fails when Intel Server
Monitor Service is running.

LM PATCHES:

MR lknt000023 - Hidden fileshares now appear in resource create window.

MR lknt990116 - Allow & characters to be in share name.

MR lknt990117 - Allow for file shares with large security descriptors.
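
ILLUSTRATION (not part of the patch)

The argument check described under MR lknt000141 above might be
sketched in shell roughly as follows. This is only a sketch under
assumptions: the function name check_args, the option letters -t and
-i, and the return codes are hypothetical and are not taken from the
shipped quickchk.ksh or deepchk.ksh. It parses options with the
standard getopts builtin instead of 'getopt' followed by 'set', and
then returns an error if TAG or ID is null.

```shell
#!/bin/sh
# Hypothetical sketch only. The option letters (-t, -i), the function
# name, and the return codes are assumptions for illustration.

check_args() {
    TAG=""
    ID=""
    OPTIND=1    # reset so the function can be called more than once

    # Parse the function's own arguments; no 'set --' is needed.
    while getopts "t:i:" opt; do
        case "$opt" in
            t) TAG="$OPTARG" ;;
            i) ID="$OPTARG" ;;
            *) return 2 ;;    # unknown option or missing option value
        esac
    done

    # Refuse to continue with missing information.
    if [ -z "$TAG" ] || [ -z "$ID" ]; then
        echo "error: TAG and ID must both be supplied" >&2
        return 1
    fi
    return 0
}
```

With this sketch, "check_args -t vol1 -i vol1id" returns 0, while
"check_args -t vol1" writes an error message and returns 1, so the rest
of a calling script does not run with a null ID.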