Cluster nodes maintain their heartbeat through the private network and the voting disk. If the private network is disrupted and the nodes cannot communicate with each other over it for the period defined by the misscount setting, a split brain occurs. In that case, the voting disk is used to determine which node(s) survive and which node(s) are evicted. The usual voting outcome is:

a. The group with more cluster nodes survives.
b. If each group has the same number of nodes, the group containing the lowest-numbered node survives.
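
The two rules above can be sketched as a small shell function. This is only an illustration of the decision logic, not Oracle Clusterware's actual implementation. The function names pick_survivor and lowest are hypothetical, and representing each group as a space-separated list of node numbers is an assumption made for the example.

```shell
#!/bin/sh
# Illustrative only: this is not Oracle Clusterware code.

# lowest <n1> <n2> ... : print the smallest node number given.
lowest() {
    min=$1
    for n in "$@"; do
        [ "$n" -lt "$min" ] && min=$n
    done
    echo "$min"
}

# pick_survivor "<group A node numbers>" "<group B node numbers>"
# Prints the node numbers of the group that survives under rules a and b.
pick_survivor() {
    group_a=$1
    group_b=$2
    # Word-split each group to count its members.
    set -- $group_a; size_a=$#
    set -- $group_b; size_b=$#

    if [ "$size_a" -gt "$size_b" ]; then
        echo "$group_a"        # rule a: the larger group survives
    elif [ "$size_b" -gt "$size_a" ]; then
        echo "$group_b"        # rule a: the larger group survives
    elif [ "$(lowest $group_a)" -lt "$(lowest $group_b)" ]; then
        echo "$group_a"        # rule b: tie, group with the lowest node number
    else
        echo "$group_b"        # rule b: tie, group with the lowest node number
    fi
}

pick_survivor "1 2 4" "3"      # prints: 1 2 4
pick_survivor "3 4" "1 2"      # prints: 1 2
```

In the first call the three-node group survives because it is larger. In the second call both groups have two nodes, so the group containing node 1 survives.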

Split brain occurs when the instance members in a RAC cluster fail to ping/connect to each other over the private interconnect, even though the servers are all physically up and running and the database instance on each of them is also running.

Because of the lack of communication, each instance assumes that the instance it cannot reach is down and that it needs to do something about the situation. The problem is that if these instances are left running, the same block might be read and updated in each of them independently, and there would be a data integrity issue: blocks changed in one instance would not be locked and could be overwritten by another instance. Oracle has implemented an efficient check for the split brain syndrome.

In RAC, if a node becomes inactive, or if the other nodes are unable to ping/connect to it, the node that first detects that it is not accessible evicts it from the RAC group. For example, suppose there are 4 nodes in a RAC cluster and node 3 becomes unavailable. If node 1 tries to connect to node 3 and finds it not responding, node 1 evicts node 3 from the RAC group, leaving only node 1, node 2 and node 4 in the group to continue functioning.

The split brain concept can become more complicated in large RAC setups. For example, suppose there are 10 RAC nodes in a cluster and 4 of them are not able to communicate with the other 6. Two groups form in this 10-node cluster (one group of 4 nodes and another of 6 nodes). The nodes quickly try to affirm their membership by locking the controlfile; the node that locks the controlfile then checks the votes of the other nodes. The group with the largest number of active nodes gets preference and the others are evicted.

We can use the script below to test interconnect connectivity, setting the private IP addresses as applicable in your environment.


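#!/bin/sh
# Replace every <...> placeholder before running.
TODAY=`date "+%Y%m%d"`
# Loop until the stop date is reached; the stop date format must be YYYYMMDD.
while [ "$TODAY" -lt <the date you want the script to stop running> ]
do
    TODAY=`date "+%Y%m%d"`
    LOGFILE=<log file directory>/interconnect_test_${TODAY}.log
    # Record the hostname and time reported by each node over the private network.
    ssh <private IP address for node 1> "hostname; date" >> $LOGFILE 2>&1
    ssh <private IP address for node 2> "hostname; date" >> $LOGFILE 2>&1

    # Blank lines separate one test round from the next in the log.
    echo "" >> $LOGFILE
    echo "" >> $LOGFILE

    sleep 5
done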
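
An ssh call can wait a long time when the interconnect is actually down, which delays the next test round. As a hedged variant, the probe can be wrapped in a function that uses the standard OpenSSH options BatchMode and ConnectTimeout and writes an explicit failure line to the log. The function name probe_node, the PROBE_CMD override and the 5-second timeout are assumptions made for this sketch, not part of the original script.

```shell
#!/bin/sh
# Sketch only. PROBE_CMD is a hypothetical hook so the probe command can be
# swapped out; by default it is ssh with BatchMode (never prompt for a
# password) and a 5-second connect timeout.
PROBE_CMD=${PROBE_CMD:-"ssh -o BatchMode=yes -o ConnectTimeout=5"}

# probe_node <address> <logfile>
# Appends the remote hostname and date to the log on success, or a
# timestamped FAILED line when the node cannot be reached.
probe_node() {
    addr=$1
    logfile=$2
    if $PROBE_CMD "$addr" "hostname; date" >> "$logfile" 2>&1; then
        return 0
    fi
    echo "`date '+%Y-%m-%d %H:%M:%S'` FAILED to reach $addr" >> "$logfile"
    return 1
}
```

Inside the loop, each ssh line would then become a call such as probe_node <private IP address for node 1> $LOGFILE, and failed rounds can be found later by searching the log for FAILED.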