References
Oracle® Enterprise Manager Advanced Configuration 10g Release 5 (10.2.0.5)
- Chapter 2, "Starting and Stopping Enterprise Manager Components"
- Chapter 9, "Locating and Configuring Enterprise Manager Log Files"
Overview
After moving
BANKSTOWN
from
10.1.1.0/24
subnet to the
192.168.1.0/24
one, the EM agent failed on startup. Correcting the hosts entry entry fixed the problem.
Analysis
Startup
Using
Starting, Stopping, and Checking the Status of the Management Agent on UNIX
, the startup of the agent is done by
emctl start agent
. The message returned is:
[oracle@bankstown ~]$ emctl start agent
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
Starting agent ........................................................................................................................... failed .
Consult the log files in: /u00/app/oracle/agent10g/sysman/log
Logs
The following logs were checked for messages at the time of agent startup at 21:58 on 2 February 2011:
EMAGENT.LOG
About the Management Agent Log and Trace Files
describes
emagent.log
as:
- The Management Agent log file (
emagent.log
)The Agent saves information to the log file when the Agent performs an action (such as starting, stopping, or connecting to a Management Service) or when the Agent generates an error (for example, when the Agent cannot connect to the Management Service).
/u00/app/oracle/agent10g/sysman/log/emagent.log
shows:
2012-02-02 21:58:49,603 Thread-10050336 Starting Agent 10.2.0.5.0 from /u00/app/oracle/agent10g (00701)
2012-02-02 21:58:54,547 Thread-10050336 [E-Mail IMAP Server] InstanceProperty (imap_host) is marked OPTIONAL but is being used. Filename /u00/app/oracle/agent10g/sysman/admin/metadata/imap.xml (00506)
2012-02-02 21:58:57,291 Thread-10050336 [E-Mail POP Server] InstanceProperty (pop_host) is marked OPTIONAL but is being used. Filename /u00/app/oracle/agent10g/sysman/admin/metadata/pop.xml (00506)
2012-02-02 21:58:57,502 Thread-10050336 [E-Mail SMTP Inbound Server] InstanceProperty (smtp_host) is marked OPTIONAL but is being used. Filename /u00/app/oracle/agent10g/sysman/admin/metadata/smtp_in.xml (00506)
2012-02-02 21:58:57,518 Thread-10050336 [E-Mail SMTP Outbound Server] InstanceProperty (smtp_host) is marked OPTIONAL but is being used. Filename /u00/app/oracle/agent10g/sysman/admin/metadata/smtp_out.xml (00506)
2012-02-02 22:00:11,958 Thread-10050336 EMAgent started successfully (00702)
The highlighted message contradicts the output from the
emctl start agent
.
EMAGENT.TRC
About the Management Agent Log and Trace Files
describes
emagent.trc
as:
- The Management Agent trace file (
emagent.trc
)The Agent saves information to the log file when the Agent performs an action (such as starting, stopping, or connecting to a Management Service) or when the Agent generates an error (for example, when the Agent cannot connect to the Management Service).
/u00/app/oracle/agent10g/sysman/log/emagent.trc
shows:
2012-02-02 21:58:50,555 Thread-10050336 WARN command: Job Subsystem Timeout set at 600 seconds
2012-02-02 21:58:50,642 Thread-10050336 WARN upload: Upload manager has no Failure script: disabled
2012-02-02 21:58:51,315 Thread-10050336 WARN metadata: File=file:/u00/app/oracle/agent10g/sysman/admin/metadata/bea_alsb.xml,line=10: <AssocTargetInstance> is not valid element, will be ignored
2012-02-02 21:58:54,543 Thread-10050336 WARN metadata: TABLE metric umsg_imap_user_count can't define table_name: mail_imap_usercount. Filename:/u00/app/oracle/agent10g/sysman/admin/metadata/imap.xml
2012-02-02 21:58:54,543 Thread-10050336 WARN metadata: Metric umsg_imap_user_count does not have any data columns. Filename /u00/app/oracle/agent10g/sysman/admin/metadata/imap.xml
2012-02-02 21:58:54,543 Thread-10050336 WARN metadata: TABLE metric umsg_imap_db_session_count can't define table_name: umsg_imap_dbconnections. Filename:/u00/app/oracle/agent10g/sysman/admin/metadata/imap.xml
2012-02-02 21:58:54,544 Thread-10050336 WARN metadata: Metric umsg_imap_db_session_count does not have any data columns. Filename /u00/app/oracle/agent10g/sysman/admin/metadata/imap.xml
2012-02-02 21:58:54,544 Thread-10050336 WARN metadata: TABLE metric umsg_imap_socket_count can't define table_name: umsg_imap_socketcount. Filename:/u00/app/oracle/agent10g/sysman/admin/metadata/imap.xml
2012-02-02 21:58:54,544 Thread-10050336 WARN metadata: Metric umsg_imap_socket_count does not have any data columns. Filename /u00/app/oracle/agent10g/sysman/admin/metadata/imap.xml
2012-02-02 21:58:54,544 Thread-10050336 WARN metadata: TABLE metric umsg_imap_connections can't define table_name: mail_imap_connections. Filename:/u00/app/oracle/agent10g/sysman/admin/metadata/imap.xml
2012-02-02 21:58:54,544 Thread-10050336 WARN metadata: TABLE metric umsg_imap_connections_lost can't define table_name: umsg_imap_connections_lost. Filename:/u00/app/oracle/agent10g/sysman/admin/metadata/imap.xml
2012-02-02 21:58:54,544 Thread-10050336 WARN metadata: Metric umsg_imap_connections_lost does not have any data columns. Filename /u00/app/oracle/agent10g/sysman/admin/metadata/imap.xml
This log file does not show any errors regarding the startup problem, if any.
EMAGENT.NOHUP
About the Management Agent Log and Trace Files
describes
emagent.nohup
as:
- The Management Agent startup log file (
emagent.nohup
)The Management Agent saves information to the startup log file when there is a problem starting the agent. This file is updated by the Management Agent Watchdog Process. When the Watchdog Process finds any problems, it logs to this file.
/u00/app/oracle/agent10g/sysman/log/emagent.nohup
shows:
----- Thu Feb 2 21:58:48 2012::Time elapsed between Launch of Watchdog process and execing EMAgent is 1 secs -----
----- Thu Feb 2 21:58:48 2012::Agent Launched with PID 5799 at tim Thu Feb 2 21:58:48 2012 -----
(pid=5799): starting emagent version 10.2.0.5.0
----- Thu Feb 2 21:59:22 2012::Checking status of EMAgent : 5799 - ---
----- Thu Feb 2 21:59:52 2012::Checking status of EMAgent : 5799 - ---
(pid=5799): emagent started successfully
----- Thu Feb 2 22:00:26 2012::Checking status of EMAgent : 5799 - ---
This log also indicates that the startup of the EM agent was successful as indicated by the highligthed message. Then, why does the output from the
emagent start agent
command say otherwise?
EMDCTL.TRC
About the Management Agent Log and Trace Files
describes
emdctl.trc
as:
Agent control utility trace file
/u00/app/oracle/agent10g/sysman/log/emdctl.trc
shows:
2012-02-02 21:58:36,476 Thread-9304864 WARN http: nmehl_connect_internal: connect failed to (bankstown.yaocm.id.au:3872): No route to host (error = 113)
2012-02-02 21:58:36,537 Thread-9304864 ERROR main: nmectla_agentctl: Error connecting to https://bankstown.yaocm.id.au:3872/emd/main/. Returning status code 1
Here we have, in the highlighted message above, the first clue to what the problem is: the agent cannot connect to the host itis monitoring.
EMAGENT_PERL.TRC
About the Management Agent Log and Trace Files
describes
emagent_perl.trc
as:
Trace file for the PERL scripts. This includes the PERL metrics and the discovery
/u00/app/oracle/agent10g/sysman/log/emagent_perl.trc
shows:
dynamic_esa_config.pl: Thu Feb 2 21:59:00 2012: ERROR: dynamic_esa_config.pl STARTED
dynamic_esa_config.pl: Thu Feb 2 21:59:00 2012: ERROR: File /u00/app/oracle/agent10g/sysman/config/esa/bankstown.yaocm.id.au_host.properties was not found at location /u00/app/oracle/agent10g/sysman/config/esa
dynamic_esa_config.pl: Thu Feb 2 21:59:00 2012: ERROR: dynamic_esa_config.pl ENDED
maxRowCount.pl: Thu Feb 2 21:59:00 2012: ERROR: File /u00/app/oracle/agent10g/sysman/config/esa/LISTENER_bankstown.yaocm.id.au_oracle_listener.properties was not found at location /u00/app/oracle/agent10g/sysman/config/esa
has_metrics.pl: Thu Feb 2 21:59:18 2012: WARN: WARN:has::Common:: has::Common::hasGetSQLResults: Could not connect to dbsnmp/(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=bankstown.yaocm.id.au)(Port=1521))(CONNECT_DATA=(SID=ocm11g))): ORA-12560: TNS:protocol adapter error (DBD ERROR: OCIServerAttach) at /u00/app/oracle/agent10g/sysman/admin/scripts/has/SQL.pl line 252, <STDIN> line 2.
has_metrics.pl: Thu Feb 2 21:59:22 2012: WARN: WARN:has::Common:: has::Common::hasGetSQLResults: Could not connect to dbsnmp/(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=bankstown.yaocm.id.au)(Port=1521))(CONNECT_DATA=(SID=ocm11g))): ORA-12560: TNS:protocol adapter error (DBD ERROR: OCIServerAttach) at /u00/app/oracle/agent10g/sysman/admin/scripts/has/SQL.pl line 252, <STDIN> line 2.
The highlighted error message reinforces the idea that a TCP client client cannot connect to the host called
bankstown.yaocm.id.au
.
EMCTL.LOG
The log,
emctl.log
, is not described in
About the Management Agent Log and Trace Files
.
/u00/app/oracle/agent10g/sysman/log/emctl.log
shows:
5200 :: Thu Feb 2 21:58:30 2012::AgentLifeCycle.pm: Processing status agent
5200 :: Thu Feb 2 21:58:30 2012::AgentStatus.pm:Processing status agent
5200 :: Thu Feb 2 21:58:36 2012::AgentStatus.pm:emdctl status returned 1
5763 :: Thu Feb 2 21:58:44 2012::AgentLifeCycle.pm: Processing start agent
5763 :: Thu Feb 2 21:58:44 2012::AgentLifeCycle.pm: EMHOME is /u00/app/oracle/agent10g
5763 :: Thu Feb 2 21:58:44 2012::AgentLifeCycle.pm: service name is
5763 :: Thu Feb 2 21:58:47 2012::AgentLifeCycle.pm:status agent returned with retCode=1
5763 :: Thu Feb 2 22:06:43 2012::AgentLifeCycle.pm: Exited loop with retCode=1
7544 :: Thu Feb 2 22:27:59 2012::AgentLifeCycle.pm: Processing status agent
7544 :: Thu Feb 2 22:27:59 2012::AgentStatus.pm:Processing status agent
7544 :: Thu Feb 2 22:28:02 2012::AgentStatus.pm:emdctl status returned 1
Three (3) processes exited with status of one (1). No reasons are given.
AGABEND.LOG
About the Management Agent Log and Trace Files
describes
agabend.log
as:
This log provided in 10.2.0.3 or higher contains all the Agent startup errors. Errors will be added for each failed startup to this file. The Agent watchdog mines this file, to report on an abnormal end of the Agent.
/u00/app/oracle/agent10g/sysman/log/agabend.log
shows:
Thu Feb 2 21:58:47 2012
XXXXXXXXXXXXXXXX
No messages of interest here at all.
Result of Log Analysis
The following logs indicate that there is a TCP connectivity issue with the host called
bankstown.yaocm.id.au
:
Diagnosis
Since the error messages indicate a TCP connectivity issue, let's try a simple ping test first:
[oracle@bankstown log]$ ping -c 5 bankstown.yaocm.id.au
PING bankstown.yaocm.id.au (10.1.1.244) 56(84) bytes of data.
From bankstown.yaocm.id.au (192.168.1.244) icmp_seq=2 Destination Host Unreachable
From bankstown.yaocm.id.au (192.168.1.244) icmp_seq=3 Destination Host Unreachable
From bankstown.yaocm.id.au (192.168.1.244) icmp_seq=4 Destination Host Unreachable
From bankstown.yaocm.id.au (192.168.1.244) icmp_seq=5 Destination Host Unreachable
--- bankstown.yaocm.id.au ping statistics ---
5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4141ms,
pipe 2
Love how the IP address of the reverse name lookup changes with the second ping message!
Check Hosts Data
Used the procedure in 16.9. Managing Hosts to verify that the following hosts entry existed:
IP | Name | Aliases |
---|---|---|
10.1.1.244 | bankstown.yaocm.id.au | bankstown |
Changed the IP address from
10.1.1.244
to
192.168.1.244
and saved the changes.
Ping is now Successful
Pinging
bankstown.yaocm.id.au
is now successful:
[oracle@bankstown log]$ ping -c 5 bankstown.yaocm.id.au
PING bankstown.yaocm.id.au (192.168.1.244) 56(84) bytes of data.
64 bytes from bankstown.yaocm.id.au (192.168.1.244): icmp_seq=1 ttl=64 time=0.369 ms
64 bytes from bankstown.yaocm.id.au (192.168.1.244): icmp_seq=2 ttl=64 time=0.097 ms
64 bytes from bankstown.yaocm.id.au (192.168.1.244): icmp_seq=3 ttl=64 time=0.032 ms
64 bytes from bankstown.yaocm.id.au (192.168.1.244): icmp_seq=4 ttl=64 time=0.032 ms
64 bytes from bankstown.yaocm.id.au (192.168.1.244): icmp_seq=5 ttl=64 time=0.033 ms
--- bankstown.yaocm.id.au ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.032/0.112/0.369/0.131 ms
Start the Agent Again
Using the
emctl start agent
from
Starting, Stopping, and Checking the Status of the Management Agent on UNIX
, I get the following result:
[oracle@bankstown log]$ emctl start agent
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved. Agent is already running
This confirms the successful startup messages seen in the following logs:
Check the Status of the EM Agent
Using the
emctl start agent
from
Starting, Stopping, and Checking the Status of the Management Agent on UNIX
, I get the following result:
[oracle@bankstown log]$ emctl status agent
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 10.2.0.5.0
OMS Version : 10.2.0.5.0
Protocol Version : 10.2.0.5.0
Agent Home : /u00/app/oracle/agent10g
Agent binaries : /u00/app/oracle/agent10g
Agent Process ID : 5799
Parent Process ID : 5781
Agent URL : https://bankstown.yaocm.id.au:3872/emd/main/
Repository URL : https://gridctrl.yaocm.id.au:1159/em/upload
Started at : 2012-02-02 21:58:48
Started by user : oracle
Last Reload : 2012-02-02 22:09:00
Last successful upload : 2012-02-02 22:55:56
Total Megabytes of XML files uploaded so far : 5.30
Number of XML files pending upload : 0
Size of XML files pending upload(MB) : 0.00
Available disk space on upload filesystem : 52.08%
Last successful heartbeat to OMS : 2012-02-02 22:59:35
---------------------------------------------------------------
Agent is Running and Ready
Conclusion
As with the drama documented in OMS Does Not Start After Server is on New Subnet , the hosts entry was the root cause of the problem.