IBM OSD Components for Oracle OPS on Windows NT 4.0
Version 1.0

IBM Netfinity Cluster Enabler README File

This README file contains the latest hints and tips to enhance the reliability and performance of your Netfinity Cluster. Refer to the "IBM Netfinity Cluster Enabler Installation and User's Handbook" for complete installation and configuration instructions.

CONTENTS
________

1.0 Tips and Troubleshooting Hints for Installing and Configuring the Netfinity Cluster Software
2.0 How to Obtain the Oracle Patch Set
3.0 Errata
4.0 Trademarks and Notices


1.0 Tips and Troubleshooting Hints for Installing and Configuring the Netfinity Cluster Software
______________________________________________________________________

o  Before updating the IBM Netfinity Cluster Enabler software, stop the IBMCoreClusterService service on all nodes.

o  Symptom: When the database is started after the IBM Netfinity Cluster Enabler software has been updated, an access violation error may occur in the dbsnmp.exe module.
   Explanation: This occurs only on the first startup of the database after the IBM Netfinity Cluster Enabler software is updated.
   Action: Restart the Oracle Intelligent Agent service.

o  Symptom: A database instance does not start successfully. The shared DASD storage is inaccessible from a cluster node. The Windows NT system event log contains an entry with a Source of LP6NDS35 and an Event ID of 11.
   Explanation: Use the Windows NT Disk Administrator to check whether the shared drives are missing or appear in a different order than usual. Normally, the shared drives are ordered before the non-shared drives. Check the Windows NT system event log for an entry with a Source of LP6NDS35, an Event ID of 11, and an error code of either "e2" or "ed" at offset 0x10 of the Data section. Refer to Chapter 8 of the Emulex LP6000 Fibre Channel PCI Host Adapter User Guide for a detailed description of this error.
   This problem can occur when one node is restarted while another node is booting Windows NT and is still in the Windows NT "blue screen" phase of startup.
   Action: Soft reboot the system.

o  Symptom: OPSCONF does not create a Net8 configuration that supports an OPS cluster with more than one public network card. The user cannot select instances to start or stop from the Oracle Enterprise Manager Console.
   Explanation: Oracle Enterprise Manager Version 1 does not support multiple network cards on the agent machine. This can affect some operations in the Oracle Enterprise Manager Console, the Oracle Intelligent Agent, and the OPSCONF utility. Oracle plans to address this in the next version of these programs. Check with Oracle for details on the availability of the next version.
   Action: Use only one public network card on the agent machine.

o  Symptom: The Net8 Assistant program does not start on the Oracle Enterprise Manager Console.
   Explanation: When the Net8 Assistant program is selected from the Windows NT Start > Programs menu, the program may fail to start.
   Action: Ensure that JRE 1.1.6 is installed. There are specific instructions for installing JRE 1.1.6 with Oracle. Contact Oracle for instructions on acquiring and installing this program for use with Net8.

o  Symptom: The symbolic links for the shared disk partitions are not set up correctly. This may be indicated by the inability to start a service or a database.
   Explanation: After running the SETLINKS program to create symbolic links for the shared disk partitions, follow the instructions in the Oracle Parallel Server "Getting Started" book. If the links have not been set up correctly, the problem may be in the input .tbl file for the SETLINKS program.
   Action: Ensure that there is a carriage-return character after the last line of the .tbl file used with SETLINKS.

o  Symptom: The Oracle Installer program reports an incorrect amount of disk storage on the installation drive.
   Explanation: The actual amount of available disk storage can be checked with Windows NT commands. The reported value causes no functional problem.
   Action: None.

o  Symptom: Nodes are unable to communicate with each other, or clients are unable to connect to a node. PING and/or TNSPING80 report different IP addresses or fail when pinging another node.
   Explanation: PING and/or TNSPING80 against the local node may return a different IP address than a PING or TNSPING80 from a remote node. This is due to how Windows NT resolves host names to IP addresses, and the result is that two or more nodes may be unable to communicate. When a node pings itself, the returned IP address is that of the first network adapter card in the Windows NT list. When a node pings a remote node, the returned IP address is that of the public network. If the public network is not connected to the first network adapter card in the machine, the results of the two pings can differ.
   Action: Ensure that the first network adapter card in the machine is connected to the public network. Connect the private network to a later network adapter card.

o  Symptom: "net stop ibmcoreclusterservice" reports that OracleServiceOPS is not started, and IBMCoreClusterService is not stopped.
   Explanation: When IBMCoreClusterService is installed, it makes itself a dependency of OraclePGMSService, and OraclePGMSService is in turn a dependency of OracleServiceOPS. When IBMCoreClusterService is stopped from a command line, the user is told that the two Oracle services will be stopped first, in this order:
      1. OraclePGMSService
      2. OracleServiceOPS
   Stopping OraclePGMSService in step 1 also stops OracleServiceOPS. When step 2 is then attempted, the message that OracleServiceOPS is not started appears. This terminates the "net stop" command, and IBMCoreClusterService is not stopped. This is normal Windows NT behavior.
   Action: Reissue the "net stop ibmcoreclusterservice" command.
   Alternatively, stop the services one at a time in the following order:
      1. OracleServiceOPS
      2. OraclePGMSService
      3. IBMCoreClusterService
   Alternatively, use the Windows NT Services window to stop IBMCoreClusterService.

o  Symptom: "SELECT * FROM v$active_instances;" returns invalid information. The response may include an incorrect list of instances, a message that no rows were found, or random characters.
   Explanation: This SELECT statement returns valid results only when the database instances are in a stable state. If a database instance is in the process of being shut down, the response may be invalid.
   Action: Reissue the statement after the database instance shutdown has completed and the remaining database instances are stable.

o  Symptom: The OraclePGMSService service terminates when a database instance is being started. The error "ORA-29702: Error occurred in Group Membership Services operation" may be seen.
   Explanation: After the message "The OraclePGMSService service was started successfully" appears, or the service status changes to "Started" in the Windows NT Services panel, Oracle Parallel Server must still perform additional processing to finish the startup of the new node. In the current version of OPS, another node may not be able to join the cluster successfully while this processing is in progress.
   Action: Wait at least 30 seconds after OraclePGMSService has been reported as started before attempting to start OraclePGMSService on another node. While 30 seconds is usually sufficient, the time required can vary with the database load on the nodes that have already joined the cluster.

o  Symptom: Oracle Enterprise Manager can connect to a database but incorrectly reports that all nodes are down.
   Explanation: The TNSNAMES.ORA file is created when the OPSCONF utility is run.
   When Oracle Enterprise Manager Service Discovery is used to discover the agents on the nodes, the TNSNAMES.ORA file may be changed so that only the last discovered node is accessible.
   Action: After the original TNSNAMES.ORA file is created and copied to each node, make a backup copy of the file. After running Service Discovery, restore the TNSNAMES.ORA file on the node where Service Discovery was run.

o  Symptom: The Oracle "shutdown immediate" command does not complete within 15 minutes.
   Explanation: After the "shutdown immediate" command is issued, it is recommended that the OracleServiceOPSn service also be stopped. In some cases, "shutdown immediate" may take several minutes to complete.
   Action: Use the Windows NT Services window to stop OracleServiceOPSn, or enter "net stop OracleServiceOPSn" at a command prompt, where n is the OPS instance number. If "shutdown immediate" reported that the database was closed and dismounted, OracleServiceOPSn can be stopped to free the resources of that database. If no message indicates that the database was closed and dismounted, stopping OracleServiceOPSn may result in the loss of uncommitted changes but does not affect the integrity of committed data.

o  Symptom: After a shared RAID drive is replaced, the automatic rebuild of the drive fails.
   Explanation: After a failed drive is replaced and spins up, it may be necessary to start reconstruction of the drive manually.
   Action: Before concluding that the new drive is defective, rebuild the drive manually using the following steps:
      1. Start the SYMplicity Storage Manager and select the "Recovery" program.
      2. Select "Options" from the Recovery menu.
      3. Select "Manual Recovery -> Drives...".
      4. Select the drive to rebuild.

o  Symptom: A manual startup of a service fails when it is performed immediately after starting a node and logging on. This occurs with one of the following services: IBMCoreClusterService, OraclePGMSService, or OracleServiceOPSn.
   Explanation: The system is still performing startup tasks when the attempt is made to start IBMCoreClusterService. This can slow the startup of the service to the point where it times out and stops. Because the Oracle services depend on IBMCoreClusterService, they also do not start.
   Action: Take any of the following actions:
      - Wait a minute and retry the command to start the service.
      - After logging on to a system that is still starting up, wait a minute before attempting to start these services.
      - Set these services to "automatic" startup. This allows the system startup processes to complete before the services are started. This is the default setting for OraclePGMSService when Oracle is installed.


2.0 How to Obtain the Oracle Patch Set
_______________________________________

   - Go to support.oracle.com
   - Click on Metalink
   - Sign in or register for an ID
   - Click on Download
   - Select product --> Parallel Server Option and platform --> MS Windows NT
   - Download Patch Set: 8.0.4.x.x


3.0 Errata
___________

There is a mistake on p. 60 of the "Netfinity Cluster Enabler Getting Started Handbook for Oracle Parallel Server" that was not discovered before the publication went to press. Item 2 on p. 60 should read:

   2. Connect to the instance and run the OPSALL.SQL script to create the database:

         C:\> CD ORACLE_HOME\OPS
         C:\ORACLE_HOME\OPS> SVRMGR30
         SVRMGR> @OPSALL.SQL

      NOTE: Running the OPSALL.SQL script enables only two nodes of the cluster. If you are setting up a third, fourth, fifth, or sixth node in the cluster, repeat the following for each node:

         C:\> CD ORACLE_HOME\OPS
         C:\ORACLE_HOME\OPS> SVRMGR30
         SVRMGR> CONNECT INTERNAL/ORACLE
         SVRMGR> STARTUP
         SVRMGR> @C_THRx.SQL
         SVRMGR> @C_RBSx.SQL

      where x is the number of the node.
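To illustrate how the node number x in the NOTE is substituted, the following minimal Python sketch lists the Server Manager lines to enter for each node beyond the two that OPSALL.SQL enables. It is not part of the Netfinity Cluster Enabler or Oracle software; the names extra_node_commands and plan are hypothetical, and the sketch assumes that a C_THRx.SQL and a C_RBSx.SQL script exist in ORACLE_HOME\OPS for each node number x from 3 to 6. The CD and SVRMGR30 steps typed at the C:\> prompt are not included.

```python
"""Hypothetical helper (not part of IBM or Oracle software): lists the
Server Manager (SVRMGR30) input lines from the errata for each node that
OPSALL.SQL does not enable."""

from typing import Dict, List

FIRST_EXTRA_NODE = 3  # OPSALL.SQL enables only nodes 1 and 2
MAX_NODES = 6         # the errata covers clusters of up to six nodes


def extra_node_commands(node: int) -> List[str]:
    """Return the SVRMGR> lines for one additional node, with x = node."""
    if not FIRST_EXTRA_NODE <= node <= MAX_NODES:
        raise ValueError(
            f"node must be between {FIRST_EXTRA_NODE} and {MAX_NODES}, got {node}"
        )
    return [
        "CONNECT INTERNAL/ORACLE",
        "STARTUP",
        f"@C_THR{node}.SQL",  # C_THRx.SQL with x replaced by the node number
        f"@C_RBS{node}.SQL",  # C_RBSx.SQL with x replaced by the node number
    ]


def plan(total_nodes: int) -> Dict[int, List[str]]:
    """Map each node number from 3 up to total_nodes to its command list."""
    if not 2 <= total_nodes <= MAX_NODES:
        raise ValueError(
            f"total_nodes must be between 2 and {MAX_NODES}, got {total_nodes}"
        )
    return {n: extra_node_commands(n) for n in range(FIRST_EXTRA_NODE, total_nodes + 1)}


if __name__ == "__main__":
    # Example: print the commands for a four-node cluster.
    for node, lines in plan(4).items():
        print(f"Node {node}:")
        for line in lines:
            print(f"  SVRMGR> {line}")
```

For a four-node cluster, plan(4) returns entries for nodes 3 and 4 only, because OPSALL.SQL already covers nodes 1 and 2; a two-node cluster needs no extra steps.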


4.0 Trademarks and Notices
___________________________

The following terms are trademarks of the IBM Corporation in the United States or other countries or both:

   IBM
   Netfinity

Windows NT is a trademark or registered trademark of Microsoft Corporation.

Oracle and Oracle OPS are trademarks or registered trademarks of Oracle Corporation.

Emulex is a trademark or registered trademark of Emulex Corporation.

Any other company, product, and service names may be trademarks or service marks of others.

THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. IBM DISCLAIMS ALL WARRANTIES, WHETHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF FITNESS FOR PARTICULAR PURPOSE AND MERCHANTABILITY WITH RESPECT TO THE INFORMATION IN THIS DOCUMENT. BY FURNISHING THIS DOCUMENT, IBM GRANTS NO LICENSES TO ANY PATENTS OR COPYRIGHTS.

Copyright (C) 1998 IBM Corporation. All rights reserved.

Note to U.S. Government Users -- Documentation related to restricted rights -- Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.