Monday 21 July, 2008 - 08:34
The padstow cluster had a few problems over the past week:
- The DATA disk group got corrupted again (the same tablespace was affected - SYSAUX).
- The cluster had timing problems - at one point the padstow2 node was nine (9) seconds ahead of padstow1.
- The Grid Control agent was not collecting data or picking up targets on either node of the cluster.
Now I know there are some errors that ASM cannot protect against. I had to do a point-in-time recovery (PITR) because the archive logs were not duplexed across the DATA and FRA disk groups. At least I am getting practice with RMAN backups and restorations.
To overcome the timing problems, I decided to go back to using NTP with gridctrl as the local NTP server. Although the other nodes recognise gridctrl as a peer (according to the ntpq peers command), they still insist on using the local clock as the timing source.
The implementation procedure for NTP I have been using is:
- Edit the NTP configuration to add "server gridctrl":
  vi /etc/ntp.conf
- Get the ntp service to recognise the new NTP server:
  service ntpd restart
  ntptime # to check the time
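To check from a script whether a node is still using its local clock, something like the following might work. It is only a sketch: it relies on the facts that ntpq marks the currently selected source with a leading "*" in its peers listing, and that ntpd's local clock driver appears as 127.127.1.x (or LOCAL(0) when names are resolved). The function names selected_source and using_local_clock are my own and are not part of any NTP tool.

```shell
#!/bin/sh
# Print the source that ntpd has currently selected for synchronisation.
# Reads "ntpq -pn" style output on stdin; the selected source is the line
# whose first character is '*'. Returns non-zero if no source is selected.
selected_source() {
    awk '/^\*/ { print substr($1, 2); found = 1 } END { exit !found }'
}

# Succeed if the given source is ntpd's local clock driver
# (127.127.1.x with -n, LOCAL(n) without it).
using_local_clock() {
    case "$1" in
        127.127.1.*|LOCAL*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a cluster node (not run here):
#   src=$(ntpq -pn | selected_source) || echo "no source selected yet"
#   if using_local_clock "$src"; then echo "still on the local clock"; fi
```

After restarting ntpd it can take several poll intervals before any source is selected, so an empty result just after the restart does not necessarily mean the configuration is wrong.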
The Grid Control agent took several attempts at reinstallation before all the targets were detected, and I am still getting data collection errors. At least I did not have to recreate the cluster from scratch to get this far.