LifeKeeper(c) 1.00I for Windows NT(c) Patch Readme File

Problems fixed by this patch are listed below, grouped by kit, with the most recent fix at the top of the list.

Note: This patch must be installed after all other LifeKeeper products to ensure proper installation. See MR ES961944 for a list of restrictions when using LK on a domain controller. You can safely install this patch on any version of LifeKeeper for Windows NT.

Lan Manager Kit users: Microsoft only supports Lan Manager functions over one IP number per network interface card (Microsoft bug SRX#9704116-48). This does not allow us to use Lan Manager functions over LifeKeeper protected IP numbers. Therefore the only way to switch over an alias computer name using the TCP/IP protocol is to allow dynamic IP number to Lan Manager name mapping for your clients. At this time using a WINS server is the recommended solution. You will need to make the LifeKeeper systems and all computers that access the protected Lan Manager name WINS clients of the same WINS server.

NT 4.0 users: If you have difficulty bringing the socket comm path up, your machine may be using the wrong IP number for the local computer name. Look in the %windir%\system32\drivers\etc\hosts file. It should have correct entries for both machines. If it does not, correct them, then delete the comm path and recreate it. If the comm path still will not come up, go to the Network icon in the Control Panel. Run setup for TCP/IP and click on the DNS tab. Enter a name other than the actual computer name in the host name field. This will force your machine to resolve its own IP number through the hosts file.

CORE PATCHES:

MR lknt970718 SPEC - CORE - 10/29/97 - Changed LmShareNameAdd.exe to add shares correctly when the share names already exist.

MR lknt960357 SPEC - CORE - 10/24/97 - Application Status and Administration (ASA) window did not keep persistent window size/position and icon size settings between uses.
Changes were made so that ASA will restore to its last size/position and also restore the size of the icons in the child windows.

MR lknt970868 SPEC - CORE - 10/23/97 - CHKDSK was running during the first manual failover of a volume resource. Now, LK simply ensures that the local resource is OSU. The remote system still must have the comment "remove action has succeeded", meaning it was in service there before. This eliminates the unnecessary chkdsk that occurred on the first manual failover. As before, if any system with an equivalency for the volume is "DEAD" then chkdsk is still run.

MR lknt970018 SPEC - CORE - 10/17/97 - The ps.exe and kill.exe programs used a previously undocumented method of getting a process list under Windows NT which involved loading all performance libraries configured on the system. This would cause problems if the libraries were unavailable, on a shared disk, or not working correctly. In some cases this could slow the system or halt it completely. One of the major symptoms of this is having perflib errors in the event log. The solution is to use a new API call that simply retrieves the process list. Also includes related fixes for MRs lknt970540, lknt970779, and lknt970816.

MR lknt970773 SPEC - CORE - 09/29/97 - A file called "NULL" was being created in the current directory. Under NT 4.0, this directory was the desktop directory, so the file was appearing on the user's desktop. A change made to lkstop.ksh in a previous patch introduced this problem; it has been fixed.

MR lknt970106 SPEC - CORE - 09/08/97 - If LifeKeeper is running under NT 4.0, failover time was significantly longer than under NT 3.51 when all the comm paths were lost. LmIsServerNameExist.exe was changed to allow the SafetyCheckTimeOut feature to work properly and also to speed up finding another server on the network.
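The CHKDSK rule described under MR lknt970868 above can be read as a small decision function. The following Python sketch is only an interpretation of that entry, not LifeKeeper's restore script: the function name, its parameters, and the "ALIVE" state string are hypothetical, and running CHKDSK when the local resource is not OSU is an assumption.

```python
def needs_chkdsk(local_state, remote_comment, equivalent_states):
    """Return True if CHKDSK should run before the volume is brought into service.

    local_state       -- state of the local volume resource, e.g. "OSU"
    remote_comment    -- comment recorded for the resource on the remote system
    equivalent_states -- states of all systems with an equivalency for the volume
    (All three parameters are hypothetical names used for illustration.)
    """
    # "As before": any DEAD system with an equivalency forces CHKDSK.
    if any(state == "DEAD" for state in equivalent_states):
        return True
    # The remote comment shows the volume was in service there and removed cleanly.
    cleanly_removed = remote_comment == "remove action has succeeded"
    # Assumption: CHKDSK is skipped only when both conditions hold.
    return not (local_state == "OSU" and cleanly_removed)


if __name__ == "__main__":
    print(needs_chkdsk("OSU", "remove action has succeeded", ["ALIVE"]))
    print(needs_chkdsk("OSU", "remove action has succeeded", ["DEAD"]))
```

Checking for a DEAD system first mirrors the "as before" wording of the entry: that condition overrides the new shortcut.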
MR lknt970062 SPEC - CORE - 09/03/97 - When the temp file RUNNING is left in the LKROOT/TMP directory, it prevents LK from being shut down the next time it is run. This temp file is now removed when LK starts up.

MR lknt970358 SPEC - CORE - 08/28/97 - When a menu item for a recovery kit is selected, ASA starts a new thread to display the resulting dialog. While waiting for this thread to return, it continuously set the cursor to a pointer/hourglass. The code that changes the cursor in the loop was removed so that the pointer/hourglass no longer appears while the dialog box is displayed for the recovery kit.

MR lknt960039 SPEC - CORE - 08/27/97 - When "sendevent.exe -O" runs, it kicks off "lifekeeper.ksh" and tries to pipe its output to $LKROOT\out\LCM. The shell fails since it can't open that file because sendevent.exe already has it open. The fix changes how the output files are created within lkexec. Output files are now created with FILE_SHARE_WRITE access as well as FILE_SHARE_READ.

MR lknt970417 SPEC - CORE - 08/27/97 - When a volume is brought into service, two files of the form sh00??.aa? were left in the LKROOT directory. This was a side effect of MKS and was resolved by changing some command syntax.

MR lknt970285 SPEC - CORE - 08/27/97 - The file share kit can create volumes with lower case IDs (it uses the case that was used in the net share command). The delete, remove and restore scripts would only see locked volumes with a capital letter. This was made case-insensitive.

MR lknt970356 SPEC - CORE - 08/13/97 - Every time sendarp.exe was used it would call DefineDosDevice with the packet device name even if it was already defined. After about 150 executions this caused sendarp to fail, and client ARP tables would then not be updated during an IP switchover.

MR lknt970537 SPEC - CORE - 08/13/97 - Change for ES971512.
If there were no volumes under LifeKeeper protection, the UpdateBootKey function would write out the key anyway with only one null on the end instead of two. This could cause anything that relied on the value of this key to fail. When you extend a volume, a command that finishes the extend is appended to the BootExecute key. If the string has only one null, the new string is not added correctly and the extend will fail.

MR lknt970428 SPEC - CORE - 08/13/97 - The start_stop_lifekeeper API did not run lkstop with the correct parameters. This call is used when you stop LK from the GUI, so even if LK was set to switch over, no switchover occurred when you stopped LK on one machine. The fix ensures the correct parameters are in effect any time lkstop is executed. The LifeKeeper service also runs lkstop with no parameters. lkstop.ksh now automatically detects the shutdown_switchover flag and does not run inh_failover if the flag is present. If lkstop is run with -n it never runs inh_failover. Currently neither the GUIs nor lksvc.exe call lkstop with parameters, so lkstop now works correctly with no parameters; if parameters are given, they override the default values.

MR lknt970288 SPEC - CORE - 07/21/97 - Resource creation in a 3-way configuration would fail because the gentag and genid scripts could not generate a tag and id that were unique on both the primary and secondary systems on which the resource was being created. These scripts have now been fixed to generate a cluster-wide unique tag and id.

MR ES971210 SPEC - CORE - 07/16/97 - Serial status and line errors would cause the TTY comm path to go red before the number of heartbeat misses was reached.

MR ES971064 SPEC - CORE - 07/16/97 - Modified the LifeKeeper service to rewrite the BootExecute registry key any time the status of the LifeKeeper service changes, not just when it starts.

MR lknt970158 SPEC - CORE - 06/16/97 - During LifeKeeper shutdown, LCD.EXE should be stopped before the other LifeKeeper daemons.
To accomplish this, lkstop.ksh now kills the processes based on the executable name, instead of the Process ID.

MR lknt970059 SPEC - CORE - 06/12/97 - The cursor in ASA changes to an hourglass when the Attributes dialog is brought up and does not change back to the normal cursor until the mouse is moved. Removed redundant WaitCursor logic to fix this problem.

MR ES970920 All three LK graphical windows get hung up occasionally, not allowing input into any of the windows. To avoid this, the NT service control manager is checked to determine the status of LifeKeeper (running, paused or stopped).

MR ES970842 SPEC - CORE - 05/15/97 - If a volume restore failed, the machine performing the restore would leave the volume unlocked. This would create a situation where the volume could be accessed by both machines at the same time. Now if the volume restore fails the volume is relocked.

MR ES970697 SPEC - CORE - 05/15/97 - The ability to automatically stop programs when a volume switchover fails has been added to the volume restore script. If certain programs are running, such as perfmon, they may have a handle open on the volume, causing chkdsk to fail. This in turn will cause the volume restore to fail. To enable this feature, which allows the volume restore script to stop such programs, you will need to add the following registry entries. The example below shows the entries needed to stop and restart perfmon.exe during a volume restore.

    Key: HKEY_LOCAL_MACHINE\Software\AT&T\LifeKeeper\VolumeStopPrograms\Perfmon

    ValueName       Value
    ------------    ---------
    ProgramName     "Perfmon"
    StartCmdLine    ""
    WasRunning      "0"
    Restart         "1"

ProgramName is the path\programname. StartCmdLine holds the optional command line arguments to use when restarting ProgramName.
WasRunning is used to save the number of instances of ProgramName that were running before they were stopped. Set Restart to 1 to automatically restart the program after the volume restore is finished; 0 means do not restart.

MR ES970348 SPEC - CORE - 03/19/97 - LifeKeeper is unable to bring shared disk volumes into service on systems where certain NT services are running. Two such services are SNMP (from Microsoft) and NetFinity (from IBM). These services try to open all disk volumes to monitor disk activity. When a LifeKeeper volume is brought into service, these services can keep the restore script from being able to perform CHKDSK or ResetVolume. To fix this problem, the restore script has been enhanced to stop these services before performing CHKDSK or ResetVolume, then to restart them. The script looks in the registry for a list of services that it should stop. The registry key HKEY_LOCAL_MACHINE\SOFTWARE\AT&T\LifeKeeper\VolumeStopServices contains subkeys whose names are the names of the services to be stopped. Each of these subkeys holds four values (each of type REG_SZ): ServiceName (the name of the service), WasRunning (set to 0 initially), StartWait, and StopWait. StopWait is the number of seconds that the script should wait for the service to reach the STOPPED state after it attempts to stop the service. If the service has not reached the STOPPED state in the configured period, the restore script fails. If StopWait is negative, then the script will wait indefinitely for the service to reach the STOPPED state. StartWait is the number of seconds that the restore script should wait for the service to reach the RUNNING state after it restarts the service. If the service doesn't reach the RUNNING state in the configured period, an error is logged in the Event Log, but the restore script succeeds. If StartWait is negative, then the restore script waits indefinitely for the service to reach the RUNNING state.
If StartWait is set to 0, then the script starts the service, but doesn't wait for it to reach the RUNNING state (and thus doesn't generate an Event Log message if the service can't be restarted).

MR ES970244 SPEC - CORE - 02/12/97 - The safety check can fail incorrectly if a system takes a very long time to shut down. This causes resources not to switch over if LifeKeeper is set to switch over on shutdown. This has been fixed.

MR ES962741 SPEC - CORE - 02/06/97 - Fix to get a stable serial connection in a 3-way LifeKeeper configuration. The polling routine is changed in ntlklcm.dll to better handle multiple serial communication paths on a LifeKeeper node.

MR ES970224 SPEC - CORE - 02/06/97 - GetVolumeInfo.exe only handled volumes whose names were less than 20 characters long. This caused volumes with longer names not to show up in the create volume hierarchy dialog box.

MR ES962703 SPEC - CORE - 02/05/97 - When HSA.EXE encountered an empty sys_list, it produced the error message "Something Wrong." This MR changes the error message to something more descriptive of the problem.

MR ES961440 SPEC - CORE - 01/08/97 - Fixed TTYCA.exe so as not to lose the serial communications path upon restarting one of the LifeKeeper pairs.

MR ES961281 SPEC - CORE - 12/12/96 - Changed lkstop.ksh to handle shutting down GUI processes.

MR LKNT960172 SPEC - CORE - 12/12/96 - Changed restore.ksh in the volume recovery kit to use resetvol.exe instead of chkdsk.exe when recovering an NTFS volume that was manually taken out of service. This makes bringing large volumes into service very fast.

MR LKNT960308 SPEC - CORE - 12/12/96 - Fix to avoid creating corrupt registry entries while closing down the Hardware Admin and Application Status Display windows on NT 4.0.

MR LKNT960331 SPEC - CORE - 12/12/96 - Fix not to invoke multiple instances of the LifeKeeper GUI windows on NT 4.0.
Also, fix to the Hardware Admin and Application Status Display GUI applications to preserve their window position and size across successive invocations on NT 4.0.

MR LKNT960307 SPEC - CORE - 12/11/96 - Fix to make the socket commpath work on NT 4.0.

MR LKNT960296 SPEC - CORE - 11/19/96 - Removed the GetComputerName() call and used the Registry key "ComputerName" to get the computer name of the system LifeKeeper is running on. This fix is to enable the Exchange recovery kit to change the value of the "ActiveComputerName" key in the Registry during the restore process.

MR ES961806 SPEC - CORE - 09/30/96 - Fix to convert the destination system name to uppercase in the Create CommPath dialog to enable proper creation of the LifeKeeper pair.

MR ES961520 SPEC - CORE - 09/27/96 - Modified the method of getting the process id for LifeKeeper services so as to release the lock on the Oracle volume.

MR ES961264 SPEC - CORE - 09/16/96 - Modified the LifeKeeper service so that it can be operated from the Service Control Manager in the Control Panel.

MR ES961447 SPEC - CORE - 09/16/96 - Changed the LifeKeeper service to more efficiently send the do not switchover message during system shutdown.

MR LKNT960000 SPEC - CORE - 01/17/96 09/09/96 - Internal LifeKeeper usage of named pipes causes shared volumes to sometimes be perpetually locked. This patch fixes the problem. Patch was backported to lknt_1a as reported in ES961865.

MR ES961674 SPEC - CORE - 09/06/96 - Fixes to make LifeKeeper work on International versions of Windows NT.

MR ES961277 SPEC - CORE - 06/05/96 - Changed a value in matrix.h, doubling the spacing of computer name icons in ASA to allow for computer names up to 15 characters.

MR ES961278 SPEC - CORE - 06/05/96 - Deleted the portion of the lkstart.ksh and lkstop.ksh scripts which changes the setting for the LifeKeeper service in the control panel to Manual or Automatic. This will allow users to change the setting from the control panel without LifeKeeper changing it back.
IMPORTANT NOTE: LifeKeeper must be stopped when the patch is installed. This will force the startup setting in the Control Panel to Manual. If you wish for LifeKeeper to start up automatically upon login, then you will need to manually change the setting in the Services Control Panel to Automatic startup after installing the patch.

LAN Manager KIT PATCHES:

MR lknt970061 SPEC - LM - 07/28/97 - Under international versions of NT the service titles are different. This was causing problems when checking to see if a service was running. Changed to use the service names (which are always the same) instead.

MR lknt960152 SPEC - LM - 03/03/97 - Changed sharelist.ksh to display Lan Manager shares that were shared using a lower-case drive letter.

IP KIT PATCHES:

MR lknt970591 SPEC - IP - 08/26/97 - Changes made to OEMSETUP.INF to allow the LKPacket driver (NCR Chameleon Protocol) to install from removable media (e.g., floppy disk or CD).

MR ES970173 SPEC - IP - 02/12/97 - Changed the routing information field in the 802.5 frame header from C2, single route broadcast, to 82, all routes broadcast. This allows our ARP packets to exist on a multi-router ring.

MR ES962284 SPEC - IP - 12/12/96 - Changed the IP kit to use the new Chameleon protocol driver to send correct ARP packets onto the network after an IP switchover. A new protocol driver has been added to overcome Microsoft bug SRX#9611086-00908. This bug was the root cause of client ARP caches not being updated after an IP failover. Now all ARP entries are updated after an IP switchover.

MR LKNT960299 SPEC - IP - 11/15/96 - Fix to exclude the IP address of the 0th context of each network card from the list of available IP addresses for use as a placeholder. This fix was necessary to provide compatibility for use under Windows NT 4.0.

MR ES962059 SPEC - IP - 10/24/96 - The IP kit always tells the Lan Manager subsystem about new IP numbers on the system.
This is a problem because the Lan Manager transport only binds to the first IP number of every network interface board. Modified the IP kit to send notification to Lan Manager only when switching the first IP number on a NIC.

MR ES961944 SPEC - IP - 10/24/96 - To use LifeKeeper on an NT domain controller you must follow these restrictions. Every network interface board must have one permanent IP number in the topmost position in the Advanced Settings dialog under the TCP/IP configuration section of the Network Settings icon in the Control Panel. Also execute the line 'net config Server /hidden:yes' on the LK/DC machine. This prevents the browser from occasionally getting confused when switching over LM computer names. Same root problem/solution as ES962059.

MR ES961801 SPEC - IP - 10/24/96 - Same root problem/solution as ES962059.

MR LKNT960160 SPEC - IP - 05/07/96 - LifeKeeper would not be able to clear routing and ARP tables after an IP switchover if not installed to 'C:\LK'. This patch fixes the problem.

ORACLE KIT PATCHES:

MR lknt970060 SPEC - ORAAPP - 08/27/97 - The Oracle TNS Listener would fail to start because the add_lsnr.ksh script would incorrectly determine Oracle's root directory. This occurs when ORAHOME is defined multiple times in the registry. Now, the first path found is used.

MR lknt970448 SPEC - ORAAPP - 08/12/97 - The Oracle TNS Listener would fail to start when the IP tagname of a resource was different from the actual IP address. Changed to use ins_list instead of dep_list in order to pull out the IP address instead of the tagname.

MR LKNT960171 SPEC - ORAAPP - 09/05/96 - If the Oracle software is installed on a different volume than the volume which contains LifeKeeper, users will not be able to connect to Oracle databases which are under LifeKeeper protection, using the protected IP addresses and/or Lan Manager names.
The LISTENER file ($ORAHOME\network\admin\LISTENER.ORA) would never get updated with these addresses, and Oracle would not listen for connections using them.

MICROSOFT SQL KIT PATCHES:

MR ES970059 SPEC - SQLAPP - 02/06/97 - If you boot both systems in an Active-Active SQL configuration at the same time, both SQL hierarchies may come up on only one system. This is because of a safety check done in the SQL recovery kit file aRestore.ksh.

MR ES962304 SPEC - SQLAPP - 02/05/97 - Active-Active SQL 6.5 configurations require two master databases, one for each system in the pair. This causes a user id administration problem because user ids have to be manually added to both systems in the pair. This patch installs triggers on the master.dbo.syslogins table and causes each system to update the other system's syslogins table whenever user id records are changed.

MR LKNT960273 SPEC - SQLAPP - 09/20/96 - This patch is for the SQL 6.5 Recovery Kit. Before installing this patch it is necessary to upgrade the SQL Recovery Kit. This fix is for Active/Standby configurations only. This fix covers multiple issues with saving and restoring SQL configuration information in Active/Standby configurations.

1. If LifeKeeper is located on a FAT file system, it is possible that the configuration file will not be named correctly for restoration.

2. Using the SQL map settings where one of the three mappable characters is set to a backslash "\" would cause configuration corruption.

3. There are still some configurations in which the end user may not wish LifeKeeper to restore configuration parameters from one machine onto the other. Adding a new value into the Registry will keep the restore process from loading the saved configuration parameters onto the system. Add the value "SKIPLOAD" into "\\Registry\\Machine\\SOFTWARE\\AT&T\\LifeKeeper\\SQLapp" with a value of "TRUE" to skip the loading.
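As an illustration of item 3 above, the following Python sketch builds the text of a registry import file that sets SKIPLOAD to "TRUE". It is a sketch under stated assumptions, not part of the patch: the function name is hypothetical, the value is assumed to be a string (REG_SZ) because the readme does not state its type, and the key is written as HKEY_LOCAL_MACHINE, the registry editor's name for \Registry\Machine.

```python
def skipload_reg_text():
    """Return REGEDIT4-format text that sets SKIPLOAD to "TRUE".

    Hypothetical helper for illustration only; the patch ships no such tool.
    """
    # \Registry\Machine in the readme corresponds to HKEY_LOCAL_MACHINE.
    key = r"HKEY_LOCAL_MACHINE\SOFTWARE\AT&T\LifeKeeper\SQLapp"
    lines = [
        "REGEDIT4",                 # header for the ANSI registry file format
        "",                         # blank line required after the header
        "[{}]".format(key),         # key that will hold the value
        '"SKIPLOAD"="TRUE"',        # assumption: string (REG_SZ) value
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Write the text to a .reg file; importing it is left to the administrator.
    with open("skipload.reg", "w", newline="") as handle:
        handle.write(skipload_reg_text())
```

Generating a file instead of writing to the registry directly keeps the change reviewable before it is imported on a LifeKeeper system.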
MICROSOFT EXCHANGE KIT PATCHES:

MR LKNT970704 SPEC - XCHGAPP - 09/11/97 - Allow Exchange resource switchovers without requiring clients to exit and restart applications (a.k.a. "transparent client switchover"). Note that client systems will get an error message stating that a network error was detected but should then reconnect to the server with the next action. After installing this patch, the Exchange resource must be taken out of service and then brought back into service on the same machine before "transparent" switchover of client systems will be supported.