OMS Does Not Start After Server is on New Subnet

References

Oracle® Database Net Services Reference 11g Release 1 (11.1)

Oracle® Database Reference 11g Release 1 (11.1)

Oracle® Database SQL Language Reference 11g Release 1 (11.1)

Oracle® Enterprise Manager Administration 10g Release 5 (10.2.0.5)

RHEL 5.0 Deployment Guide

Overview

GRIDCTRL could not start the OMS and OMA after I moved the server from the 10.1.1.0/24 subnet to 192.168.1.0/24.

The root cause was an entry in /etc/hosts pointing to the old subnet.

Review of Changes Made

I updated the DNS files on GRIDCTRL to refer to the new subnet:

  • /var/named/chroot/etc/named.conf
  • /var/named/chroot/var/named/yaocm.id.au.rr.zone
  • /var/named/chroot/var/named/yaocm.id.au.zone

The named service was restarted via the Services Configuration Tool.
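
After a DNS change like this, it may be worth confirming that the host name resolves to the same address through DNS and through the system resolver. The following is a minimal sketch, not part of the original procedure. It assumes that dig and getent are installed, and that the resolver consults /etc/hosts before DNS (the usual "files dns" order in /etc/nsswitch.conf). The function names are hypothetical.

```shell
#!/bin/sh
# Report whether two lookups of the same name agree.
# $1 = address returned by DNS, $2 = address returned by the system resolver.
compare_addresses() {
    dns_addr=$1
    nss_addr=$2
    if [ "$dns_addr" = "$nss_addr" ]; then
        echo "consistent: $dns_addr"
    else
        echo "mismatch: DNS=$dns_addr resolver=$nss_addr"
    fi
}

# Look a host name up both ways and compare the answers.
# Assumption: dig queries DNS only, while getent follows /etc/nsswitch.conf
# and so may return an address taken from /etc/hosts.
check_host() {
    host_name=$1
    dns_addr=$(dig +short "$host_name" | head -n 1)
    nss_addr=$(getent hosts "$host_name" | awk '{ print $1; exit }')
    compare_addresses "$dns_addr" "$nss_addr"
}
```

Running check_host gridctrl.yaocm.id.au on the server would then report a mismatch whenever a local file overrides the DNS answer.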

Analysis

Starting OMS

When I tried to start OMS on GRIDCTRL (Using emctl to Start, Stop, and Check the Status of the Oracle Management Service), I got the following messages:

[oracle@gridctrl ~]$ emctl start oms
Oracle Enterprise Manager 10g Release 5 Grid Control
Copyright (c) 1996, 2009 Oracle Corporation.  All rights reserved.
opmnctl: opmn is already running
Starting HTTP Server ...
Starting Oracle Management Server ...
Checking Oracle Management Server Status ...
Oracle Management Server is Down.

Review Log

The following messages appeared in /opt/oracle/app/OracleHomes/oms10g/sysman/log/emoms.trc whenever the OMS tried to start:

2012-02-02 17:21:22,320 [Orion Launcher] WARN  jdbc.ConnectionCache _getConnection.352 - Io exception: The Network Adapter could not establish the connection
java.sql.SQLException: Io exception: The Network Adapter could not establish the connection
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:137)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:174)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:286)
        at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:332)
        at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:429)
        at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:152)
        at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:31)
        at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:608)
        at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:217)
        at oracle.jdbc.pool.OracleConnectionPoolDataSource.getPhysicalConnection(OracleConnectionPoolDataSource.java:113)
        at oracle.jdbc.pool.OracleConnectionPoolDataSource.getPooledConnection(OracleConnectionPoolDataSource.java:76)
        at oracle.jdbc.pool.OracleImplicitConnectionCache.makeCacheConnection(OracleImplicitConnectionCache.java:1361)
        at oracle.jdbc.pool.OracleImplicitConnectionCache.getCacheConnection(OracleImplicitConnectionCache.java:439)
        at oracle.jdbc.pool.OracleImplicitConnectionCache.getConnection(OracleImplicitConnectionCache.java:334)
        at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:285)
        at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:253)
        at oracle.sysman.util.jdbc.ConnectionCache._getConnection(ConnectionCache.java:336)
        at oracle.sysman.util.jdbc.ConnectionCache._getConnection(ConnectionCache.java:322)
        at oracle.sysman.util.jdbc.ConnectionCache.getUnwrappedConnection(ConnectionCache.java:575)
        at oracle.sysman.emSDK.svc.conn.FGAConnectionCache.getFGAConnection(FGAConnectionCache.java:218)
        at oracle.sysman.emSDK.svc.conn.ConnectionService.getPrivateConnection(ConnectionService.java:1162)
        at oracle.sysman.emSDK.svc.conn.ConnectionService.getRepositoryVersionAndMode(ConnectionService.java:762)
        at oracle.sysman.emSDK.svc.conn.ConnectionService.verifyRepositoryEx(ConnectionService.java:840)
        at oracle.sysman.emSDK.svc.conn.ConnectionService.verifyRepository(ConnectionService.java:934)
        at oracle.sysman.eml.app.ContextInitializer.contextInitialized(ContextInitializer.java:301)
        at com.evermind.server.http.HttpApplication.initDynamic(HttpApplication.java:1020)
        at com.evermind.server.http.HttpApplication.<init>(HttpApplication.java:560)
        at com.evermind.server.Application.getHttpApplication(Application.java:915)
        at com.evermind.server.http.HttpServer.getHttpApplication(HttpServer.java:707)
        at com.evermind.server.http.HttpSite.initApplications(HttpSite.java:637)
        at com.evermind.server.http.HttpSite.setConfig(HttpSite.java:278)
        at com.evermind.server.http.HttpServer.setSites(HttpServer.java:278)
        at com.evermind.server.http.HttpServer.setConfig(HttpServer.java:179)
        at com.evermind.server.ApplicationServer.initializeHttp(ApplicationServer.java:2435)
        at com.evermind.server.ApplicationServer.setConfig(ApplicationServer.java:1592)
        at com.evermind.server.ApplicationServerLauncher.run(ApplicationServerLauncher.java:92)
        at java.lang.Thread.run(Thread.java:534)
2012-02-02 17:21:22,709 [Orion Launcher] WARN  jdbc.ConnectionCache _getConnection.353 - Got a fatal exeption when getting a connection; Error code = 17002; Cleaning up cache and retrying
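
To see how often and when the OMS hit this failure, the trace file can be filtered on the error code. This is a small sketch, assuming the trace lines keep the layout shown above (date and time in the first two fields). The function name is hypothetical.

```shell
#!/bin/sh
# Print the date and time of every trace line that reports JDBC error 17002.
# $1 = path to the OMS trace file (for example emoms.trc).
failure_timestamps() {
    awk '/Error code = 17002/ { print $1, $2 }' "$1"
}
```

For example, failure_timestamps /opt/oracle/app/OracleHomes/oms10g/sysman/log/emoms.trc lists one timestamp per failed connection attempt.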

Check the Listener Log

This looks like a problem of trying to connect to the repository database via the listener.

Listener Logs Not in ADR Structure

There are two (2) problems with the listener as shown by the output from the lsnrctl status command:

  1. The logs are not in the right place for adrci;
  2. The repos database instance is not registered with the listener.

[oracle@gridctrl ~]$ lsnrctl status

LSNRCTL for Linux: Version 11.1.0.7.0 - Production on 02-FEB-2012 10:34:01

Copyright (c) 1991, 2008, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.1.0.7.0 - Production
Start Date                02-FEB-2012 06:22:40
Uptime                    0 days 4 hr. 11 min. 21 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/app/OracleHomes/db11g/network/admin/listener.ora
Listener Log File         /opt/oracle/app/OracleHomes/db11g/log/diag/tnslsnr/gridctrl/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=gridctrl.yaocm.id.au)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

Change Location of Listener Log

The problem with the location of the logs was fixed by adding the following line to /opt/oracle/app/OracleHomes/db11g/install/unix/scripts/seedstup:

export ORACLE_BASE=/opt/oracle/app

After GRIDCTRL was restarted, the logs are now in the correct place for ADR as shown by the lsnrctl status command:

[oracle@gridctrl ~]$ lsnrctl status

LSNRCTL for Linux: Version 11.1.0.7.0 - Production on 02-FEB-2012 10:49:49

Copyright (c) 1991, 2008, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.1.0.7.0 - Production
Start Date                02-FEB-2012 10:42:34
Uptime                    0 days 0 hr. 7 min. 15 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/app/OracleHomes/db11g/network/admin/listener.ora
Listener Log File         /opt/oracle/app/diag/tnslsnr/gridctrl/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=gridctrl.yaocm.id.au)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

However, the listener log is not showing any connection errors.
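
One way to make such a check repeatable is to pull any TNS error codes out of the listener log. This is only a sketch: it assumes that listener errors appear in the log with codes of the form TNS-nnnnn, and that grep supports the -o option (GNU and BSD grep do). The function name is hypothetical.

```shell
#!/bin/sh
# List the distinct TNS error codes found in a listener log.
# $1 = path to the listener log (for example the ADR log.xml).
# Prints nothing when the log contains no TNS codes.
listener_error_codes() {
    grep -o 'TNS-[0-9][0-9]*' "$1" | sort -u
}
```

An empty result for /opt/oracle/app/diag/tnslsnr/gridctrl/listener/alert/log.xml would match the observation above that the listener recorded no connection errors.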

Registering Repository Database with Listener

I worked around this by pointing the LOCAL_LISTENER system parameter of the REPOS database instance at the listener's IPC address.

tnsnames.ora contains the following entry:

REPOS =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC ))
    )
    (CONNECT_DATA =
      (SID = repos)
    )
  )

Changed the system parameter as follows and registered the instance with the listener via the ALTER SYSTEM command:

SQL> alter system set local_listener='REPOS' scope=both;

System altered.

SQL> alter system register;

System altered.

System altered.

The database instance is now registered with the listener as shown by the lsnrctl services command:

LSNRCTL for Linux: Version 11.1.0.7.0 - Production on 02-FEB-2012 10:51:12

Copyright (c) 1991, 2008, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0
         LOCAL SERVER
Service "repos.yaocm.id.au" has 1 instance(s).
  Instance "repos", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
Service "repos_XPT.yaocm.id.au" has 1 instance(s).
  Instance "repos", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
The command completed successfully
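
The registration check above can be reduced to a list of service names. This is a minimal sketch that reads lsnrctl output on standard input. It assumes the "Service ... has N instance(s)." line format shown in this output, and the function name is hypothetical.

```shell
#!/bin/sh
# Read "lsnrctl services" or "lsnrctl status" output on stdin and print
# the name of each registered service, one per line.
list_registered_services() {
    sed -n 's/^Service "\([^"]*\)" has [0-9][0-9]* instance(s)\.$/\1/p'
}
```

With this in place, lsnrctl services | list_registered_services should include repos.yaocm.id.au once the instance has registered.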

Attempt to Start OMS Again

Even with the database service registered, the startup of OMS still fails.

Root Cause Identified

I did a Google search for the phrase "[Orion Launcher] WARN jdbc.ConnectionCache _getConnection.353 - Got a fatal exeption when getting a connection; Error code = 17002; Cleaning up cache and retrying" and found a hint about /etc/hosts in Grid Control windows installation fails (Business & Enterprise Application).

Checked /etc/hosts on GRIDCTRL and found the problem in the following line:

10.1.1.252 gridctrl.yaocm.id.au gridctrl

You really should use the Managing Hosts utility instead of editing /etc/hosts directly.

Changed this line to:

192.168.1.252 gridctrl.yaocm.id.au gridctrl

And the startup of OMS was successful (Using emctl to Start, Stop, and Check the Status of the Oracle Management Service):

[oracle@gridctrl ~]$ emctl start oms
Oracle Enterprise Manager 10g Release 5 Grid Control
Copyright (c) 1996, 2009 Oracle Corporation.  All rights reserved.
opmnctl: opmn is already running
Starting HTTP Server ...
Starting Oracle Management Server ...
Checking Oracle Management Server Status ...
Oracle Management Server is Up.
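
A search for entries left over from the old subnet can be scripted rather than done by eye. The following is a sketch only. The function name is hypothetical, and the prefix is matched as plain text against the first field of each line, so the trailing dot in "10.1.1." matters.

```shell
#!/bin/sh
# Print every non-comment line of a hosts-format file whose address
# starts with the given prefix.
# $1 = hosts file to scan, $2 = address prefix of the old subnet.
stale_hosts_entries() {
    hosts_file=$1
    old_prefix=$2
    awk -v prefix="$old_prefix" '
        /^[[:space:]]*#/ { next }                      # skip comment lines
        NF >= 2 && index($1, prefix) == 1 { print }    # address begins with prefix
    ' "$hosts_file"
}
```

Before the fix, stale_hosts_entries /etc/hosts 10.1.1. would have printed the gridctrl line shown above.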

Conclusion

Although I had changed the DNS correctly, I forgot to check the /etc/hosts file for entries that still pointed to the old subnet.

The change to the LOCAL_LISTENER parameter was unnecessary because the listener is using the default port of 1521. I have left it there in order to maintain the IPC backdoor.