Richmond Cluster (13)

Thursday 08 May, 2008 - 19:40

I think I have found the problem described in Richmond Cluster (12). The TCP/IP parameters were not set correctly. The specific one was net.ipv4.ip_local_port_range, which had the values 32768 61000. The recommended values from Clusterware Installation for Linux (pp. 2-37 to 2-38) are 1024 65000. I checked the other TCP/IP parameters and found that they were wrong as well:
net.core.rmem_default = 65535
net.core.wmem_default = 65535
net.core.rmem_max = 131071
net.core.wmem_max = 131071

My reasoning was that the script was trying to establish communications on port 6200 which requires root privilege at the current settings.
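
As a rough check on that reasoning, the sketch below tests whether port 6200 falls inside the old and the recommended local port range. The function name port_in_range is made up for illustration; it only compares numbers and says nothing about which privileges a port needs.

```shell
#!/bin/sh
# Report whether a port lies inside a local (ephemeral) port range.
# Usage: port_in_range PORT LOW HIGH
# port_in_range is a hypothetical helper, not part of any Oracle or Linux tool.
port_in_range() {
    port=$1 low=$2 high=$3
    if [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]; then
        echo "port $port is inside $low-$high"
    else
        echo "port $port is outside $low-$high"
    fi
}

# Old and recommended ranges from this post, checked against port 6200.
port_in_range 6200 32768 61000   # prints "port 6200 is outside 32768-61000"
port_in_range 6200 1024 65000    # prints "port 6200 is inside 1024-65000"
```

On a live system the current range can be read from /proc/sys/net/ipv4/ip_local_port_range and passed in as the last two arguments.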

Updated the networking parameters on both richmond1 and richmond2:
$ su -
# cat >>/etc/sysctl.conf
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default = 1048576
net.core.wmem_default = 262144
net.core.rmem_max = 1048576
net.core.wmem_max = 262144

# sysctl -p
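
To confirm that the reload took effect on each node, something like the following sketch could compare the live values with the ones written to /etc/sysctl.conf above. The function name check_params is an assumption for illustration; it reads "key = value" lines on standard input, which is the format `sysctl -a` prints.

```shell
#!/bin/sh
# Compare kernel parameters on stdin with the values set in this post.
# check_params is a hypothetical helper; the expected values are the ones
# appended to /etc/sysctl.conf above.
check_params() {
    awk -F'[ \t]*=[ \t]*' '
        BEGIN {
            want["net.ipv4.ip_local_port_range"] = "1024 65000"
            want["net.core.rmem_default"] = "1048576"
            want["net.core.wmem_default"] = "262144"
            want["net.core.rmem_max"] = "1048576"
            want["net.core.wmem_max"] = "262144"
        }
        ($1 in want) {
            have = $2
            gsub(/[ \t]+/, " ", have)   # the two ports may be tab-separated
            status = (have == want[$1]) ? "ok" : "DIFF"
            printf "%s %s = %s (expected %s)\n", status, $1, have, want[$1]
        }'
}
```

Typical use would be `sysctl -a 2>/dev/null | check_params` as root on richmond1 and richmond2; any line starting with DIFF still has a value other than the one configured.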

Looks like I cannot rely on cluvfy for everything.

Clicked retry in OUI. Failed at the same point again.

Stopped crs on both systems:
# cd /u00/crs/oracle/product/10/app/bin
# ./crsctl stop crs
Stopping resources.
Error while stopping resources. Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.

Started crs on both nodes:
# ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

However, when I checked the status of crs, I got the following:
# ./crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
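
Since the same check has to be repeated on both nodes, a small filter for the `crsctl check crs` output may save some reading. This is only a sketch: crs_summary is a made-up name, and it assumes that a working stack reports each daemon with a line containing "appears healthy", treating every other non-empty line as a problem.

```shell
#!/bin/sh
# Summarise `crsctl check crs` output read from stdin.
# Assumption: healthy daemons are reported as "... appears healthy".
# crs_summary is a hypothetical helper, not an Oracle tool.
crs_summary() {
    awk '
        /appears healthy/ { ok++; next }
        NF { bad++; print "problem: " $0 }
        END {
            printf "%d healthy, %d failing\n", ok, bad
            # Non-zero exit status if anything failed or nothing was healthy.
            exit (bad > 0 || ok == 0)
        }'
}
```

It would be used as `./crsctl check crs | crs_summary`, and the exit status can drive a retry loop while waiting for the stack to come up.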

The system log (/var/log/messages) shows:
logger: Oracle CSS daemon failed to start up. Check CRS logs for diagnostics.

On richmond2, /u00/crs/oracle/product/10/app/log/richmond2/alertrichmond2.log shows:
[cssd(19779)]CRS-1604:CSSD voting file is offline: /dev/raw/raw2. Details in /u00/crs/oracle/product/10/app/log/richmond2/cssd/ocssd.log.
[cssd(19779)]CRS-1604:CSSD voting file is offline: /dev/raw/raw17. Details in /u00/crs/oracle/product/10/app/log/richmond2/cssd/ocssd.log.
[cssd(19779)]CRS-1604:CSSD voting file is offline: /dev/raw/raw32. Details in /u00/crs/oracle/product/10/app/log/richmond2/cssd/ocssd.log.
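
The alert log repeats these messages, so a short filter that lists each offline voting device once can help. The sketch below assumes the CRS-1604 message format shown above; offline_voting_files is a made-up name for illustration.

```shell
#!/bin/sh
# List each voting file reported offline by CRS-1604, once, in the order
# it first appears. Reads the named files, or stdin if none are given.
# offline_voting_files is a hypothetical helper, not an Oracle tool.
offline_voting_files() {
    sed -n 's/.*CRS-1604:CSSD voting file is offline: \([^ ]*\)\. Details.*/\1/p' "$@" |
        awk '!seen[$0]++'
}
```

Running `offline_voting_files alertrichmond2.log | xargs ls -l` would then show the ownership and permissions of each raw device, which may be worth comparing between the two nodes.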

On richmond2, /u00/crs/oracle/product/10/app/log/client/css.log shows:
[ CSSCLNT][3076425056]clsssInitNative: connect failed, rc 9

The logs on richmond1 are not that helpful.

I decided to deinstall clusterware and reinstall it.