2017-11-28 Recreate REDFERN Cluster on VICTORIA


Overview

After discovering why ASM and CRS failed to start up, I decided to rebuild the REDFERN cluster from scratch.

References

Procedure

Using Previous Procedures

I only re-initialised the root_disk in both VMs in order to get a clean installation.

This time, I selected the minimal development environment in order to install Perl, and I created douglas as an admin user. I used visudo so that admin users (the wheel group) can run sudo without a password.
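For reference, the visudo change amounts to enabling the passwordless rule for the wheel group; in the stock OL7 sudoers file this is just a matter of uncommenting the line below (shown here as a sketch, the surrounding comments vary):

%wheel  ALL=(ALL)       NOPASSWD: ALL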

Restart the Installation

I did the following:

  1. Installed the pre-installation RPM as described in Install GI 12.1.0.2.
  2. Set the UDEV settings correctly as per Correct UDEV Settings.
  3. Rebooted both systems.
  4. Used NFS for the Oracle software as described in Use NFS for Oracle Software.
  5. Overwrote the old OCR disk header with dd if=/dev/zero of=/dev/xvdh bs=8K count=12800 (verified below).
  6. Started the GI installation as described in Install GI 12.1.0.2.
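Before re-running the installer, the wipe in step 5 can be confirmed by dumping the first bytes of the device on each node; nothing but zeros should come back:

sudo od -An -tx1 -N 64 /dev/xvdh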

Installer Failure

At Step 7: Network Interface Usage, the following message appeared:

Cause - Installer has detected that network interface eth0 does not maintain connectivity on all cluster nodes.

Action - Ensure that the chosen interface has been configured across all cluster nodes.

Additional Information:

Summary of the failed nodes
redfern2 - PRVG-11850 : The system call "connect" failed with error "113" while executing exectask on node "redfern2"
No route to host
- Cause: An attempt to execute exectask on the specified node failed.
- Action: Examine the accompanying error message for details or contact Oracle Support Services.
redfern1 - PRVG-11850 : The system call "connect" failed with error "113" while executing exectask on node "redfern1"
No route to host
- Cause: An attempt to execute exectask on the specified node failed.
- Action: Examine the accompanying error message for details or contact Oracle Support Services.

Investigate Failure

Following the advice in [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes. (Doc ID 1427202.1), I ran the following commands:

cd /opt/share/Software/grid/linuxamd64_12102/grid
./runcluvfy.sh comp nodecon -i eth0 -n redfern1,redfern2 -verbose

The output was:

Verifying node connectivity 

Checking node connectivity...

Checking hosts config file...
  Node Name                             Status                  
  ------------------------------------  ------------------------
  redfern2                              passed                  
  redfern1                              passed                  

Verification of the hosts config file successful


Interface information for node "redfern2"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   192.168.1.141   192.168.1.0     UNKNOWN         UNKNOWN         00:16:3E:00:00:12 1500  


Interface information for node "redfern1"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   192.168.1.140   192.168.1.0     UNKNOWN         UNKNOWN         00:16:3E:00:00:0E 1500  


Check: Node connectivity using interfaces on subnet "192.168.1.0"

Check: Node connectivity of subnet "192.168.1.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  redfern1[192.168.1.140]         redfern2[192.168.1.141]         yes             
Result: Node connectivity passed for subnet "192.168.1.0" with node(s) redfern1,redfern2


Check: TCP connectivity of subnet "192.168.1.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  redfern1 : 192.168.1.140        redfern1 : 192.168.1.140        passed          
  redfern2 : 192.168.1.141        redfern1 : 192.168.1.140        failed          

ERROR: 
PRVG-11850 : The system call "connect" failed with error "113" while executing exectask on node "redfern2"
No route to host
  redfern1 : 192.168.1.140        redfern2 : 192.168.1.141        failed          

ERROR: 
PRVG-11850 : The system call "connect" failed with error "113" while executing exectask on node "redfern1"
No route to host
  redfern2 : 192.168.1.141        redfern2 : 192.168.1.141        passed          
Result: TCP connectivity check failed for subnet "192.168.1.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.1.0".
Subnet mask consistency check passed.

Result: Node connectivity check failed


Verification of node connectivity was unsuccessful on all the specified nodes. 
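Error 113 is EHOSTUNREACH ("No route to host"). Since basic node connectivity on the subnet passes but every cross-node TCP check fails, this looks like a host firewall rejecting the connections rather than a genuine routing problem. On a default OL7 build that would be firewalld, which can be confirmed with something like:

sudo firewall-cmd --state
sudo iptables -nL | grep -i reject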

Solution: Disable Firewall

Following part of Oracle RAC12c on OL 7 using Virtualbox, I ran the following commands on both REDFERN1 and REDFERN2:

sudo systemctl stop firewalld
sudo systemctl disable firewalld

The output was:

Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
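To verify the change took effect, both of the following should report that firewalld is no longer active or enabled on each node:

sudo systemctl is-active firewalld
sudo systemctl is-enabled firewalld

A less drastic alternative would be to leave firewalld running and put the cluster subnet into the trusted zone (sudo firewall-cmd --permanent --zone=trusted --add-source=192.168.1.0/24 followed by sudo firewall-cmd --reload), but for this lab rebuild disabling it entirely is simpler.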

Validate Network Connectivity

Following the advice in [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes. (Doc ID 1427202.1), I ran the following commands:

cd /opt/share/Software/grid/linuxamd64_12102/grid
./runcluvfy.sh comp nodecon -i eth0 -n redfern1,redfern2 -verbose

The output was:

Verifying node connectivity 

Checking node connectivity...

Checking hosts config file...
  Node Name                             Status                  
  ------------------------------------  ------------------------
  redfern2                              passed                  
  redfern1                              passed                  

Verification of the hosts config file successful


Interface information for node "redfern2"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   192.168.1.141   192.168.1.0     UNKNOWN         UNKNOWN         00:16:3E:00:00:12 1500  


Interface information for node "redfern1"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   192.168.1.140   192.168.1.0     UNKNOWN         UNKNOWN         00:16:3E:00:00:0E 1500  


Check: Node connectivity using interfaces on subnet "192.168.1.0"

Check: Node connectivity of subnet "192.168.1.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  redfern1[192.168.1.140]         redfern2[192.168.1.141]         yes             
Result: Node connectivity passed for subnet "192.168.1.0" with node(s) redfern1,redfern2


Check: TCP connectivity of subnet "192.168.1.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  redfern1 : 192.168.1.140        redfern1 : 192.168.1.140        passed          
  redfern2 : 192.168.1.141        redfern1 : 192.168.1.140        passed          
  redfern1 : 192.168.1.140        redfern2 : 192.168.1.141        passed          
  redfern2 : 192.168.1.141        redfern2 : 192.168.1.141        passed          
Result: TCP connectivity check passed for subnet "192.168.1.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.1.0".
Subnet mask consistency check passed.

Result: Node connectivity check passed

Checking multicast communication...

Checking subnet "192.168.1.0" for multicast communication with multicast group "224.0.0.251"...
Check of subnet "192.168.1.0" for multicast communication with multicast group "224.0.0.251" passed.

Check of multicast communication passed.

Verification of node connectivity was successful. 

Restart GI Installation

I followed the procedure in Install GI 12.1.0.2. However, Step 18: Install Product failed with:

Cause - Installer has failed to execute the specified script on one or more nodes. This might be because of exception occurred while executing the script on nodes.
Action - Review the log files '/opt/app/oraInventory/logs/installActions2017-11-28_08-10-33PM.log' and '/opt/app/grid_infra/12.1.0/grid/cfgtoollogs/crsconfig/rootcrs_<nodename>_<timestamp>.log' for further details.
More Details
Execution of GI Install script is failed on nodes : [redfern1]  Exception details 
- PRCZ-2009 : Failed to execute command "/opt/app/grid_infra/12.1.0/grid/root.sh" as root within 3,600 seconds on nodes "redfern1"
- PRCZ-2009 : Failed to execute command "/opt/app/grid_infra/12.1.0/grid/root.sh" as root within 3,600 seconds on nodes "redfern1"
  
Execution status of failed node:redfern1 
 Errors 
 :  Performing root user operation.
The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME= /opt/app/grid_infra/12.1.0/grid
Copying dbhome to /usr/local/bin ... 
Copying oraenv to /usr/local/bin ... 
Copying coraenv to /usr/local/bin ...   
Creating /etc/oratab file... 
Entries will be added to the /etc/oratab file as needed by Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file:
/opt/app/grid_infra/12.1.0/grid/crs/install/crsconfig_params
2017/11/28 20:45:27 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2017/11/28 20:46:24 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2017/11/28 20:46:26 CLSRSC-363: User ignored prerequisites during installation
OLR initialization - successful
root wallet
root wallet cert
root cert export
peer wallet
profile reader wallet
pa wallet
peer wallet keys
pa wallet keys
peer cert request
pa cert request
peer cert
pa cert
peer root cert TP
profile reader root cert TP
pa root cert TP
peer pa cert TP
pa peer cert TP
profile reader pa cert TP
profile reader peer cert TP
peer user cert
pa user cert
2017/11/28 20:47:10 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'redfern1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'redfern1'
CRS-2676: Start of 'ora.mdnsd' on 'redfern1' succeeded
CRS-2676: Start of 'ora.evmd' on 'redfern1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'redfern1'
CRS-2676: Start of 'ora.gpnpd' on 'redfern1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'redfern1'
CRS-2672: Attempting to start 'ora.gipcd' on 'redfern1'
CRS-2676: Start of 'ora.cssdmonitor' on 'redfern1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'redfern1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'redfern1'
CRS-2672: Attempting to start 'ora.diskmon' on 'redfern1'
CRS-2676: Start of 'ora.diskmon' on 'redfern1' succeeded
CRS-2676: Start of 'ora.cssd' on 'redfern1' succeeded
ASM created and started successfully.
Disk Group VOTE created successfully.
2017/11/28 20:53:03 CLSRSC-12: The ASM resource ora.asm did not start
2017/11/28 20:53:03 CLSRSC-258: Failed to configure and start ASM
Died at /opt/app/grid_infra/12.1.0/grid/crs/install/crsinstall.pm line 2017.
The command '/opt/app/grid_infra/12.1.0/grid/perl/bin/perl -I/opt/app/grid_infra/12.1.0/grid/perl/lib -I/opt/app/grid_infra/12.1.0/grid/crs/install /opt/app/grid_infra/12.1.0/grid/crs/install/rootcrs.pl -auto -lang=en_AU.UTF-8' execution failed 
 Standard output 
 :  Performing root user operation.
The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME= /opt/app/grid_infra/12.1.0/grid
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file:
/opt/app/grid_infra/12.1.0/grid/crs/install/crsconfig_params
2017/11/28 20:45:27 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2017/11/28 20:46:24 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2017/11/28 20:46:26 CLSRSC-363: User ignored prerequisites during installation
OLR initialization - successful
root wallet
root wallet cert
root cert export
peer wallet
profile reader wallet
pa wallet
peer wallet keys
pa wallet keys
peer cert request
pa cert request
peer cert
pa cert
peer root cert TP
profile reader root cert TP
pa root cert TP
peer pa cert TP
pa peer cert TP
profile reader pa cert TP
profile reader peer cert TP
peer user cert
pa user cert
2017/11/28 20:47:10 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'redfern1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'redfern1'
CRS-2676: Start of 'ora.mdnsd' on 'redfern1' succeeded
CRS-2676: Start of 'ora.evmd' on 'redfern1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'redfern1'
CRS-2676: Start of 'ora.gpnpd' on 'redfern1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'redfern1'
CRS-2672: Attempting to start 'ora.gipcd' on 'redfern1'
CRS-2676: Start of 'ora.cssdmonitor' on 'redfern1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'redfern1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'redfern1'
CRS-2672: Attempting to start 'ora.diskmon' on 'redfern1'
CRS-2676: Start of 'ora.diskmon' on 'redfern1' succeeded
CRS-2676: Start of 'ora.cssd' on 'redfern1' succeeded
ASM created and started successfully.
Disk Group VOTE created successfully.
2017/11/28 20:53:03 CLSRSC-12: The ASM resource ora.asm did not start
2017/11/28 20:53:03 CLSRSC-258: Failed to configure and start ASM
Died at /opt/app/grid_infra/12.1.0/grid/crs/install/crsinstall.pm line 2017.
The command '/opt/app/grid_infra/12.1.0/grid/perl/bin/perl -I/opt/app/grid_infra/12.1.0/grid/perl/lib -I/opt/app/grid_infra/12.1.0/grid/crs/install /opt/app/grid_infra/12.1.0/grid/crs/install/rootcrs.pl -auto -lang=en_AU.UTF-8' execution failed
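The install therefore stops with ora.asm failing to start on redfern1. A first diagnostic step for next time (a sketch only, using the paths from this install) would be to check the lower-stack resources and scan the rootcrs log named in the installer message:

sudo /opt/app/grid_infra/12.1.0/grid/bin/crsctl stat res -t -init
sudo grep -iE 'CLSRSC|error|fail' /opt/app/grid_infra/12.1.0/grid/cfgtoollogs/crsconfig/rootcrs_redfern1_*.log | tail -50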