OBIEE 11g – Horizontal Clustering


Hola … I have been meaning to write this post for a long time (about five months now), as I have been extensively involved in building an OBIEE 11g platform from scratch: a 3-node OBIEE installation, with vertical and horizontal clustering and all.

Now that the game is over, I want to detail every step and the hurdles I faced during the setup.

Honestly, before writing this post I searched across the web and googled several sites for a consolidated set of clustering steps, and I never found one. No disrespect to the other bloggers, who have been posting excellent stuff for ages, but I feel each post on horizontal clustering lacks some of the details.

The idea is to perform 3-node horizontal clustering with one OBIEE instance running on each node. The OBIEE version to be installed is 11.1.1.7.0 with bundle patch p18283508, which makes the final version:

11.1.1.7.140415 (Build 140402.1431 64-bit)

There are a few restrictions in terms of browser support for this version with IE11. See my previous post for the IE11 hack to get this working perfectly.

A little background on our OBIEE 10g environment, just to explain why we decided to move to OBIEE 11g:

  1. OBIEE 10g lacks the features of 11g.
  2. Platform-wise, 11g is more robust, with an enterprise architecture.
  3. In 10g we faced severe memory issues with the 32-bit architecture and with vertical clustering. Vertical clustering there was a crappy design with several architectural flaws, which we wanted to get rid of in 11g.
  4. We had 2 OBIEE 10g nodes with 12 instances running on each node, which was a nightmare to manage and control. Moreover, it doesn't use a shared cache, and with the max 4 GB memory address limitation in each sawserver and nqserver process it is a support pain to replenish instances with memory issues/leaks.

Okay, so let's start with the high-level setup details:

  1. Install OBIEE on Node 1 – Primary Node
  2. Install OBIEE on Node 2 – Secondary Node
  3. Install OBIEE on Node 3 – Third Node
  4. Apply the OBIEE bundle patches on Node 1, then Node 2, then Node 3
  5. Apply customizations ported from 10g, if required
  6. Perform config changes in EM – Failover and Scale-out, to add the additional 2 nodes
  7. Perform config changes in EM – Failover
  8. Do an integration check and deploy the required RPD/Catalog on the shared NAS/NFS mount

I will explain the whole thing step by step. In my case it was a 2-node cluster installation, but for a 3rd node the process is exactly the same; the only difference is the extra host to select in EM.

For the customization steps above, it is NOT required to have the OS-level ulimit settings below (typically set in /etc/security/limits.conf) in place as a prerequisite; you can do this later.

* soft nofile 131072
* hard nofile 131072
* soft nproc 131072
* hard nproc 131072

I assume you have downloaded the right packages from OTN and the right patches before continuing.

In the steps below I install OBIEE through a Unix script using command line parameters, so no GUI is required. I created my own response file to pass to certain command line steps, which installs the OBIEE RCU schemas first, then OBIEE 11g and WLS, then performs some customization ported from 10g, and finally deploys the respective catalog/RPD upgraded from 10g.

The scope of this post is to define the steps required for a horizontal cluster across multiple nodes. To save all logs to a target file, use the command: script obiee_installation_log.txt and then start the command line execution. Finally, once execution is done, type: exit. This saves the entire log from the buffer to the target file so you can review it in detail.
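For example, a minimal wrapped session looks like this (the log file name is arbitrary):

script obiee_installation_log.txt
# ... run the RCU and installer commands from the steps below ...
exit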

STEP 1

Execute the RCU command line utility to install the repository schemas. This is similar to what we have in the GUI. Below is a command line example which drops an existing repository:

rcu -silent -dropRepository -compInfoXMLLocation /u00/media/ofm_rcu_linux_11/rcuHome/rcu/config/ComponentInfo.xml -storageXMLLocation /u00/media/ofm_rcu_linux_11/rcuHome/rcu/config/Storage.xml -databaseType ORACLE -connectString <Replace with database connect string as db:port:instance> -dbuser <SYSDBA Username> -dbrole sysdba -schemaPrefix <SchemaPrefix> -component MDS -component BIPLATFORM -f < /u00/media/ofm_rcu_linux_11/rcuHome/rcuPassword.txt

Below is a command line example which creates the repository:

rcu -silent -createRepository -compInfoXMLLocation /u00/media/ofm_rcu_linux_11/rcuHome/rcu/config/ComponentInfo.xml -storageXMLLocation /u00/media/ofm_rcu_linux_11/rcuHome/rcu/config/Storage.xml -databaseType ORACLE -connectString <Replace with database connect string as db:port:instance> -dbuser <SYSDBA Username> -dbrole sysdba -schemaPrefix <SchemaPrefix> -component MDS -component BIPLATFORM -f < /u00/media/ofm_rcu_linux_11/rcuHome/rcuPassword.txt

Note that rcuPassword.txt can be any file that stores the SYSDBA password (and the schema passwords) and is fed to the command line via stdin.
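For illustration, and assuming silent-mode RCU reads one password per line from stdin (first the SYSDBA password, then one per component), the file would contain nothing but lines like these placeholders:

<SysdbaPassword>
<MdsSchemaPassword>
<BiPlatformSchemaPassword>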

N.B.: If you see an error like the one below, it means the RCU creation ran into trouble, and you have to clean this up before proceeding:

[Screenshot: RCU creation error dialog]

The reason is that the RCU schemas pre-exist and were not cleaned up properly. You have to find the schema name in the table below, which acts as the RCU version history keeper, and then delete those records and commit before proceeding.

select * from System.SCHEMA_VERSION_REGISTRY$
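A hedged cleanup sketch, driven from the shell via sqlplus and assuming OS authentication for sysdba; the DEV prefix is a placeholder for whatever schema prefix was left behind, so verify the rows with the SELECT above before deleting anything:

sqlplus / as sysdba <<'EOF'
DELETE FROM SYSTEM.SCHEMA_VERSION_REGISTRY$ WHERE OWNER LIKE 'DEV\_%' ESCAPE '\';
COMMIT;
EOF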

Next, execute the OBIEE installer with a response file. A sample is below; this is similar to doing an Enterprise installation through the GUI.

cd /u00/media/bishiphome/Disk1
unset ORACLE_HOME                                   # must not be set when the installer runs
echo inventory_loc=$HOME/oraInventory > $HOME/oraInst.loc
echo inst_group=${OS_GROUP} >> $HOME/oraInst.loc    # OS group that owns the inventory
mkdir -p $HOME/oraInventory

./runInstaller -silent -response /u00/response_file -invPtrLoc $HOME/oraInst.loc -waitforcompletion

This unsets ORACLE_HOME (which is mandatory), then creates the inventory location with the OS group, and then continues the installation with the response file called response_file.

If you are using a response file, the important parameters you need to change in response_file for the primary node installation are below. You can find the details of the standard response file for an OBIEE 11g installation in the Oracle docs, or just google it.

  • DOMAIN_HOSTNAME, ADMIN_USER_NAME, ADMIN_PASSWORD, ADMIN_CONFIRM_PASSWORD
  • WLS_SINGLE_SERVER_INSTALL=false
  • MW_HOME, WEBLOGIC_HOME, ORACLE_HOME, INSTANCE_HOME, DOMAIN_HOME_PATH
  • DATABASE_CONNECTION_STRING_BI, DATABASE_SCHEMA_USER_NAME_BI, DATABASE_SCHEMA_PASSWORD_BI, DATABASE_CONNECTION_STRING_MDS, DATABASE_SCHEMA_USER_NAME_MDS, DATABASE_SCHEMA_PASSWORD_MDS
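As a rough illustration only (not the full template), a fragment of the primary node's response_file might look like this; the hostnames, paths and credentials are placeholders:

DOMAIN_HOSTNAME=obiee-node1.example.com
ADMIN_USER_NAME=weblogic
ADMIN_PASSWORD=<AdminPassword>
ADMIN_CONFIRM_PASSWORD=<AdminPassword>
WLS_SINGLE_SERVER_INSTALL=false
MW_HOME=/u00/app/Middleware
DATABASE_CONNECTION_STRING_BI=dbhost:1521:orcl
DATABASE_SCHEMA_USER_NAME_BI=<SchemaPrefix>_BIPLATFORM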

So my OBIEE installation on the primary node completed, but I kept hitting the issue below whenever I ran it, command line or GUI:

[CONFIG] FAILED:Executing: opmnctl start coreapplication_obisch1
[CONFIG]:Modifying BI Configuration Files
[CONFIG] SUCCESS:Modifying BI Configuration Files
Configuration:BI Configuration failed
[CONFIG] Failed.
[ACTION]: BI Configuration

The installation of Oracle AS Common Toplevel Component, Oracle Business Intelligence Shiphome failed.

To avoid this, I used a tweak: intercept and correct opmn.xml once the installation is about 40% through and the file has been created. The file is located in this directory: /u00/app/Middleware/instances/instance1/config/OPMN/opmn.

Alternatively, just keep watching for the lines below during installation and then make the changes to the opmn.xml file:

[CONFIG]:Creating Instance
[CONFIG] SUCCESS:Creating Instance

Just take a backup of the existing opmn.xml and then change its contents as shown below.
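One way to catch the file the moment the installer creates it (this watch loop is my own sketch, not part of the installer):

OPMN_XML=/u00/app/Middleware/instances/instance1/config/OPMN/opmn/opmn.xml
until [ -f "$OPMN_XML" ]; do sleep 5; done    # wait for the installer to create it
cp "$OPMN_XML" "$OPMN_XML.bak"                # back it up before editing
echo "opmn.xml created, edit it now"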

from:

<process-type id="OracleBISchedulerComponent" module-id="CUSTOM">
  <module-data>
    <category id="start-parameters">
      <data id="start-executable" value="$ORACLE_HOME/bifoundation/server/bin/bischeduler.sh" />
      <!-- enable console log to be able to see process startup error -->
      <data id="no-stdio" value="false" />
    </category>
    <category id="stop-parameters">
      <data id="stop-executable" value="integrator" />
    </category>
    <category id="ping-parameters">
      <data id="ping-type" value="integrator" />
    </category>
    <category id="ready-parameters">
      <data id="use-ping-for-ready" value="true" />
    </category>
  </module-data>
  <start timeout="600" retry="1" />
  <stop timeout="120" />
  <restart timeout="720" retry="1" />
</process-type>

to:

<process-type id="OracleBISchedulerComponent" module-id="CUSTOM">
  <module-data>
    <category id="start-parameters">
      <data id="start-executable" value="$ORACLE_HOME/bifoundation/server/bin/bischeduler.sh" />
      <!-- enable console log to be able to see process startup error -->
      <data id="no-stdio" value="false" />
    </category>
    <category id="ping-parameters">
      <data id="ping-url" value="/" />
    </category>
    <category id="restart-parameters">
      <data id="reverseping-timeout" value="345" />
      <data id="no-reverseping-failed-ping-limit" value="3" />
      <data id="reverseping-failed-ping-limit" value="6" />
    </category>
  </module-data>
  <start timeout="300" retry="3" />
  <stop timeout="300" />
  <restart timeout="300" retry="3" />
  <ping timeout="60" interval="600" />
</process-type>

This makes sure the Scheduler starts up fine and the BI Configuration step can proceed without issue. This is just a hack, and I see no reason why Oracle hasn't made this the default in their installation package. You might need to do it across all secondary nodes as well.

You should see the output below, confirming the entire installation completed successfully:

[CONFIG] SUCCESS:Modifying BI Configuration Files
Configuration:BI Configuration completed successfully
The installation of Oracle AS Common Toplevel Component, Oracle Business Intelligence Shiphome completed successfully.

STEP 2

Now we have to scale out to Node 2 (secondary) and then to the 3rd node.

For this, on Node 2 we need to make sure:

  1. We do not re-execute the RCU commands, as the RCU schemas are already in place from the primary node installation.
  2. We make changes in the response file: set DOMAIN_HOSTNAME=<primary node IP/host> and SCALEOUT_BISYSTEM=true (see the sketch after this list).
  3. I prefer to use INSTANCE_NAME=instance2 to identify it as the 2nd instance in the cluster.
  4. All activities continue here as before (installing OBIEE/WebLogic, configuration) except the Admin Server installation.
  5. Each additional node in the cluster acts as a Managed Server plus its system components; only the primary server acts as both Admin Server and Managed Server.
  6. Node 2's installation is much faster than Node 1's. In my experience the primary node takes 30-40 minutes, while each secondary node takes ~15 minutes for a complete install. This varies from system to system based on capacity, but I also applied an OS-level tweak to the Java entropy settings for faster start-up/shutdown and installation.
  7. Performance enhancement for faster start-up/installation, described next.
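As mentioned in item 2, a hypothetical fragment of the secondary node's response_file would look like this (same standard parameter names as before; values are placeholders, and you would use instance3 for the 3rd node):

SCALEOUT_BISYSTEM=true
DOMAIN_HOSTNAME=obiee-node1.example.com
INSTANCE_NAME=instance2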

For the performance enhancement, make the changes below on all OBIEE nodes. Root access is needed; no reboot is required.

  1. Edit or create /etc/sysconfig/rngd to contain:

# Add extra options here

EXTRAOPTIONS="-r /dev/urandom"

  2. Then run "service rngd start".
  3. If that works, run "chkconfig rngd on" (to start it at boot).
  4. Add the following to .bash_profile in Unix:

export CONFIG_JVM_ARGS="-Djava.security.egd=file:/dev/./urandom"
export JAVA_OPTIONS="-Djava.security.egd=file:/dev/./urandom"
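An optional sanity check, my own addition assuming a Linux box with procfs: confirm the available entropy stays healthy once rngd feeds from /dev/urandom.

cat /proc/sys/kernel/random/entropy_avail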

So finally, on Node 2 the output should look like this:

[Screenshot: Node 2 installation output]

STEP 3

Repeat STEP 2 for Node 3. It is exactly the same.

STEP 4

  1. Apply the patches on Node 1 using OPatch
  2. Apply the patches on Node 2 using OPatch
  3. Apply the patches on Node 3 using OPatch
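A hedged sketch of the per-node patching sequence; the media path is a placeholder, the bundle may ship as several sub-patches (check its README for the exact order), and the BI services on the node should be stopped first:

export ORACLE_HOME=/u00/app/Middleware/Oracle_BI1
cd /u00/media/patches/18283508
$ORACLE_HOME/OPatch/opatch apply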

N.B.: You must be wondering why I don't perform the installation and patching completely on one node and then proceed to the next node. That will not work during the scale-out phase of the other nodes. The reason: once the primary node is upgraded with the latest bundle patch and you try to perform a scale-out, a version mismatch occurs for the secondary nodes and they are not able to access the primary node's modules. In this case you will get the error below:

[2015-09-28T17:57:31.608-04:00] [as] [ERROR] [] [oracle.as.install.bi] [tid: 38] [ecid: 0000L0LxUqL3b6G5uzd9iX1M2RR700000T,0] ERROR: Instance creation failed.[[
oracle.as.provisioning.exception.ASProvisioningException
at oracle.as.provisioning.engine.Config.executeConfigWorkflow_WLS(Config.java:872)
at oracle.as.install.bi.biconfig.standard.StandardWorkFlowExecutor.executeHelper(StandardWorkFlowExecutor.java:31)
at oracle.as.install.bi.biconfig.standard.InstanceProvisioningTask.exec(InstanceProvisioningTask.java:76)
at oracle.as.install.bi.biconfig.standard.InstanceProvisioningTask.doExecute(InstanceProvisioningTask.java:99)
at oracle.as.install.bi.biconfig.standard.AbstractProvisioningTask.execute(AbstractProvisioningTask.java:70)
at oracle.as.install.bi.biconfig.standard.StandardProvisionTaskList.execute(StandardProvisionTaskList.java:66)
at oracle.as.install.bi.biconfig.BIConfigMain.doExecute(BIConfigMain.java:113)
at oracle.as.install.engine.modules.configuration.client.ConfigAction.execute(ConfigAction.java:375)
at oracle.as.install.engine.modules.configuration.action.TaskPerformer.run(TaskPerformer.java:88)
at oracle.as.install.engine.modules.configuration.action.TaskPerformer.startConfigAction(TaskPerformer.java:105)
at oracle.as.install.engine.modules.configuration.action.ActionRequest.perform(ActionRequest.java:15)
at oracle.as.install.engine.modules.configuration.action.RequestQueue.perform(RequestQueue.java:96)
at oracle.as.install.engine.modules.configuration.standard.StandardConfigActionManager.start(StandardConfigActionManager.java:186)
at oracle.as.install.engine.modules.configuration.boot.ConfigurationExtension.kickstart(ConfigurationExtension.java:81)
at oracle.as.install.engine.modules.configuration.ConfigurationModule.run(ConfigurationModule.java:86)
at java.lang.Thread.run(Thread.java:662)
Caused by: oracle.as.provisioning.engine.CfgWorkflowException
at oracle.as.provisioning.engine.Engine.processEventResponse(Engine.java:596)
at oracle.as.provisioning.fmwadmin.ASInstanceProv.createInstance(ASInstanceProv.java:178)
at oracle.as.provisioning.fmwadmin.ASInstanceProv.createInstanceAndComponents(ASInstanceProv.java:116)
at oracle.as.provisioning.engine.WorkFlowExecutor._createASInstancesAndComponents(WorkFlowExecutor.java:523)
at oracle.as.provisioning.engine.WorkFlowExecutor.executeWLSWorkFlow(WorkFlowExecutor.java:439)
at oracle.as.provisioning.engine.Config.executeConfigWorkflow_WLS(Config.java:866)

STEP 5

Let's do a quick sanity check, first on Node 1.

Running top -u orabi should show the processes below on Node 1:

[Screenshot: processes running on the primary node]

Running top -u orabi should show the processes below on Node 2:

[Screenshot: processes running on the secondary node]

We can see Node 1 and Node 2 added to the cluster. This means the scale-out was successful and EM recognizes both nodes.
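You can cross-check from the command line as well; opmnctl status lists the OPMN-managed components and their state (use the matching instance path on each node, e.g. instance2 on Node 2):

/u00/app/Middleware/instances/instance1/bin/opmnctl status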

Also, you will see from EM that some of the processes are down on Node 2, which is fine. Now we will see what additional steps we need to do.

[Screenshot: EM showing both nodes in the cluster]

STEP 6

If you are deploying an RPD and Catalog, it is recommended to deploy them now to the shared path (which you must do if you want the 3 nodes to share the same RPD and Catalog).

  • Go to EM -> Deployment -> Repository. Add the new RPD and Catalog after "Lock and Edit".
  • Apply and Activate the changes.
  • In our case we have a common NAS/NFS mount shared and accessible from all 3 OBIEE nodes. It looks like below (the final catalog name is obfuscated for security reasons):

[Screenshot: shared NAS/NFS RPD and Catalog paths in EM]

Now "Lock and Edit" again, go to EM -> Availability -> Failover, set the secondary host as below, then Apply and Activate Changes.

[Screenshot: EM failover host configuration]

Now perform "Restart All". This brings all the processes up and running across all nodes. But note that we haven't yet created the OBIEE Managed Server system components on the secondary nodes.

Go to Capacity Management -> Scalability -> add one component on each secondary node. This is effectively vertical clustering on top of the horizontal cluster: if you add more than one component, it creates more than one instance on a single host.

[Screenshot: EM Scalability settings]

Note that more than one system component means more power: it runs more than one instance of the sawserver, nqserver, Cluster Controller, nqScheduler and Javahost processes. Do this only if you need it; otherwise it is not required, as a single instance on each node with a 64-bit architecture is capable of handling 700-1000 concurrent user requests (assuming the OBIEE performance parameters have been correctly applied).

Hit "Apply" and "Activate Changes". It takes some time to create the additional processes and start them up on the secondary nodes. Once this is done successfully, check the list of running processes on the secondary node by typing top -u orabi in a Unix session. Also observe that the instance2 system components are created under the location below on the 2nd node:

/u00/app/Middleware/instances/instance2/bifoundation/OracleBIPresentationServicesComponent/coreapplication_obips1

After successful connections, the Failover screen in EM will look like below:

[Screenshot: EM Failover screen after scale-out]

I have faced several issues with EM not recognizing the new set of components and processes during restart. In such cases, restart the individual components, or try using opmnctl commands to restart the OPMN-managed components, or else bounce the primary node once and then bounce the secondary nodes once. This should resolve most of the problems; otherwise it is a bigger issue and something went wrong during the cluster setup process.
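For reference, a minimal sketch of the opmnctl route on a secondary node: restart a single component, or bounce everything OPMN manages on that node (the component name is a placeholder; check opmnctl status for the real names):

cd /u00/app/Middleware/instances/instance2/bin
./opmnctl restartproc ias-component=coreapplication_obips1
./opmnctl stopall && ./opmnctl startall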

Now "Restart All" from EM; if you see all green, the horizontal cluster setup completed successfully 😀 😀

[Screenshot: EM with all components green]

STEP 7

If you have any customizations carried forward from the 10g upgrade, this is the right time to apply them on each node, one by one, starting from the primary. Follow the steps: stop the services on the primary, make the customization changes, and start the primary again; then stop the secondary, apply the customizations, and start its services. This is the standard step-by-step process.

My Observations:

  • Apparently the RPD is shared under the shared path, but physically the RPD is located under the repository path below on each node, and when the opmnctl-managed processes start up they load the RPD from this physical path. So the concept of a shared RPD applies when you open the RPD in online mode: changes made online are reflected, the shared location acts as a temporary staging location, and once the changes are complete the RPD is synced to the clustered nodes and the changes are propagated.

For Primary: /u00/app/Middleware/instances/instance1/bifoundation/OracleBIServerComponent/coreapplication_obis1/repository

For Secondary:

/u00/app/Middleware/instances/instance2/bifoundation/OracleBIServerComponent/coreapplication_obis1/repository

  • Even with the horizontal cluster mechanism, the cache is created on individual nodes. The Global Cache is a concept that applies during cache seeding only, and nowhere else.
  • If you want to do vertical clustering on top of horizontal, it is easy from EM.
  • After clustering you can access Node 1 using its hostname. If Host 1 is down, requests are still served, as the WLS internal clustering/load balancing automatically routes them to the 2nd node with the help of the Cluster Controller module. If the 2nd node is also unavailable, it redirects the request to the 3rd node.
  • The idea of WLS horizontal clustering is high availability: even if the Admin Server on Node 1 is down, the clustered managed servers/nodes can still work and serve user queries without any downtime.
  • Don't try to do the STEP 7 customizations before STEP 6 (the EM and Failover changes plus adding instances on the secondary nodes), because you may have customizations that require those instance directories to exist first on the secondary nodes.
  • The entire set of activities above can be done using Unix scripts alone. For example, even the horizontal cluster / failover / scalability changes can be done with WLS scripting, by invoking a Python script from inside a shell script (see the sketch after this list). For me it is easier to use the EM UI for this, but the end-to-end steps are certainly possible through scripts.
  • You don't need a working RPD connection to the database while doing this, as the services will be bounced several times during these steps. You just need a basic placeholder RPD.
  • You don't need tnsnames.ora under the ../MiddleWareHome/Oracle_BI1/network/admin path as long as you use the IP:port syntax in the RPD connection pool. Otherwise, consider updating tnsnames.ora in that location to get RPD connectivity if the RPD uses a DB connect string.
  • Note: if you add a 3rd, 4th, etc. node to the horizontal cluster, you can't have the Scheduler and Cluster Controller processes among the system components of any node beyond the 2nd. This is because the failover and cluster controller processes can only be present on two nodes/hosts/servers, so you have to manage the deployment in such a way that the primary and secondary nodes don't go down at the same time. In short, apart from the primary and one secondary node, high-availability failover will not be available for the Scheduler and Cluster Controller.
  • The Scheduler runs only on Node 1 (primary) and Node 2 (secondary); it does not run on the 3rd node, but scheduler queries can still run on Node 3.
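As an illustration of the scripting route mentioned above, here is a minimal sketch (my own, not from the actual build) of driving WLST from a shell script to check the managed servers' state after the EM changes. The WLS home path assumes the default 11.1.1.7 layout, and the credentials/hostname are placeholders:

/u00/app/Middleware/wlserver_10.3/common/bin/wlst.sh <<'EOF'
# connect to the Admin Server on the primary node
connect('weblogic', '<AdminPassword>', 't3://obiee-node1.example.com:7001')
state('bi_server1', 'Server')
state('bi_server2', 'Server')
disconnect()
EOF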