in OBIEE/Weblogic : Read timed out

I can continue write a long thread about this issue that we have faced in regard of this SocketTimeoutException. However this time I will keep in concise.
Here is the issue : two of the prompts in our Production environment suddenly broken which was working fine for past 3-5 months. In between we have made bundle patch upgrade on our env , we did network/firewall level changes so we were not sure why those two prompts started behaving weirdly. Also we found we had similar kind of prompt exist elsewhere which was not facing the vulnerabilities.

By broken, I mean when user click on the prompt they are keep getting “Please wait” appearing in the drop-down menu and the drop-down value never displayed. And this only happens when you are accessing application through web / outside your corporate network and through external DNS. The prompt works fine in internal VPN network.
Since we have thousands of login happen each day, the Production OBIEE environment crashes after couple of days because of this issue. By crashes we mean OBIEE/EM?Console login hangs , you will not find any process went down in opmnctl status -l , all alive , EM / Console even doesn’t show anything red(if you able to login here) determining its a crash .

Pretty interesting ….. !!! Huh ?

When we inspect the console log of browser we have spotted after 2-3 minutes the browser throws below error :

“Internal Server error” Ref :
500 Internal Server Error
ERROR Codes for Reference #3.5e7f1cb8.1547495087.4c779890

The above was Akamai reference number. And for this issue there is nothing to do with Akamai edge or origin server and its cache issue. We have checked Akamai has no issue with cache content

As soon we receive below error the ADR incident log for obiee , bi_Serverx.log and biserver_diagnostic.log throws below :

This appears in Incident log (change the path to be exact for your environment)

Incident Id: 7997
Incident Source: SYSTEM
Create Time: Tue Jan 15 12:11:11 EST 2019
Problem Key: DFW-99998 [][$][analytics]
ECID: e8398b29c4083075:10e4302f:1685274d016:-8000-0000000000008333
Application Name: analytics
User Name: <WLS Kernel>
Error Message Id: DFW-99998

Context Values
DFW_SERVER_NAME : bi_server1
DFW_DOMAIN_NAME : bifoundation_domain
DFW_APP_NAME : analytics
Incident detected using watch rule “UncheckedException”:
Watch time: Jan 15, 2019 12:11:11 PM EST
Watch ServerName: bi_server1
Watch RuleType: Log
Watch Rule: (SEVERITY = ‘Error’) AND ((MSGID = ‘WL-101020’) OR (MSGID = ‘WL-101017’) OR (MSGID = ‘WL-000802’) OR (MSGID = ‘BEA-101020’) OR (MSGID = ‘BEA-101017’) OR (MSGID = ‘BEA-000802′))
Watch DomainName: bifoundation_domain
Watch Data:
DATE : Jan 15, 2019 12:11:11 PM EST
SERVER : bi_server1
MESSAGE : [ServletContext@756918633[app:analytics module:analytics path:/analytics spec-version:2.5 version:11.1.1]] Root cause of ServletException. read is alrady timed out

[WARNING:7] [WL-320068] [Diagnostics] [host: <hostname>] [nwaddr: [] [tid: [ACTIVE].ExecuteThread: ’71’ for queue: ‘weblogic.kernel.Default (self-tuning)’] [userId: <WLS Kernel>] [LOG_FILE: /u00/app/Middleware/user_projects/domains/bifoundation_domain/servers/bi_server2/logs/bi_server2.log] Watch ‘UncheckedException’ with severity ‘Notice’ on server ‘bi_server2’ has triggered at Jan 23, 2019 9:36:00 AM EST. Notification details: [[
WatchRuleType: Log
WatchRule: (SEVERITY = ‘Error’) AND ((MSGID = ‘WL-101020’) OR (MSGID = ‘WL-101017’) OR (MSGID = ‘WL-000802’) OR (MSGID = ‘BEA-101020’) OR (MSGID = ‘BEA-101017’) OR (MSGID = ‘BEA-000802′))
WatchData: DATE = Jan 23, 2019 9:36:00 AM EST SERVER = bi_server2 MESSAGE = [ServletContext@1881523291[app:analytics module:analytics path:/analytics spec-version:2.5 version:11.1.1]] Root cause of ServletException. Read timed out
at Method)
at weblogic.servlet.internal.FilterChainImpl.doFilter(
at Method)
at weblogic.servlet.internal.FilterChainImpl.doFilter(
at oracle.dms.servlet.DMSServletFilter.doFilter(
at weblogic.servlet.internal.FilterChainImpl.doFilter(
at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(
at weblogic.servlet.internal.WebAppServletContext$
at weblogic.servlet.internal.WebAppServletContext.securedExecute(
at weblogic.servlet.internal.WebAppServletContext.execute(
SUBSYSTEM = HTTP USERID = <WLS Kernel> SEVERITY = Error THREAD = [ACTIVE] ExecuteThread: ’38’ for queue: ‘weblogic.kernel.Default (self-tuning)’ MSGID = WL-101017 MACHINE = <hostname> TXID = CONTEXTID = 0000MXv8etiFKAw_wDCCyW1SGacg003qNu TIMESTAMP = 1548254160834
WatchAlarmType: AutomaticReset
WatchAlarmResetPeriod: 30000


As you are seeing there are two variations of the log :

1) Read timed out
2) read is alrady timed out (note: the typo is not me and it is writing by the OBIEE product itself ! )

Essentially both are same and incident log says this is your problem key : “Problem Key: DFW-99998 [][$][analytics]”

When we started investigating we have seen the prompt code is not special and we have used regular SQL results with SELECT statement and UNION statement.
Most weird part of if we use union all (as lowercase) it works and if we use FETCH ONLY 650001 ROWS clause at the end of the select statement of prompt logical query it works absolutely fine. Again these are the interesting facts which we can contradict later but these are our findings.

Here is the solution :

After going through several layers of network trace , Akamai , IDP Ping SSO , External / Internal DNS check, Checking the packets transfer across firewall , tracing the various logs we are seeing that when you click prompt the request headers even not passing through presentation server session log so its being stuck somewhere in the network and network is not able to process the request . Finally we came to know that we had security layer incorporated by Secureworks (that is a Intrusion Prevention System) which intercept the OBIEE browser POST request header and when it sees the SQL Injections operation (by SELECT or UNION ) via its Intrusion Prevention rule it blocks the inflow of traffic and that stops passing the request to other network layers and cause Weblogic server to go in unknown state. We had to allow those SQL rules across network as white listed traffic to stop this “SocketTimeoutException”.
As soon as we did this those 2 prompts started working fine but the “SocketTimeoutException” didn’t completely goes away . However the volume of this Exception reduced by 100x, 200x and we have seen no sign of crashing the OBIEE platform.

So in Summary :

Root Cause:

The Intrusion Prevention System misidentifying application traffic as malicious.

 Contributing factors:

  • OBIEE uses raw SQL statements for normal application functionality causing the IPS to misidentify the traffic as malicious.
  • OBIEE products do not utilize TLS.
  • Specific filter in the OBIEE, using those filters generate the socket timeout exception which create cascading effect having the product to become unresponsive
  • The server encountered an internal error or mis-configuration and was unable to complete the request.

Unfortunately, because of this Network issue, being in application team, we had to loads of hours to do root cause analysis before we figure this out. Facts of life for developers !!!

Life of OBIEE Catalog object permission – A niche Subject “ACL” cache

Access Control List(ACL) is very common terminology and not confined precisely within OBIEE. Several software use ACL to define security object permissions.An ACL specifies which users or system processes are granted access to objects, as well as what operations are allowed on given objects. We have seen ACL when we edit Objects permission or open Catalog object using Catalog Manager. I have noticed “preserve ACL” in OBIEE 10g Catalog Manager if you want to apply permission across objects. Also noticed ACL again on 11g while we embarked a very strange/weird and complicated issue.

We have OBIEE 11g .7 version 140415 running on production since launch past 5 months. Recently couple of customer raised concern about page visibility for different kind of subscription. We have customer could use multiple subscription which can be switched from a portal and based on that switch specific page should have to be visible. However we have experience a issue when a user logged in and seeing Dashboard A and logout and relogin again with different subscription Dashboard A still visible while Dashboard B should have to be visible.

Initially how you attack this problem ? I know what you are thinking of but we have very complicated system with several layers before you land to dashboard page. We have One portal which redirects to IDP (performing TCping Authentication) ,then control goes on Horizontal Cluster nodes via Load Balancer and then control goes to Analytics. So there are 3 way channels before you are landing to OBIEE page. On top of this we have deprecated 11g security system running initialization block. I know this is damn bad and Oracle recommended but we have to live with the crap because Oracle said this can still be functional. So in summary we have no Application Roles / Groups setup on EM and Console and all object based permission defined under Catalog is legacy Catalog Group permission.

So there are various aspect and point we need to carefully verify before we attack this problem. I started debugging this issue from Catalog Object level permission first , nothing seems weird but did little bit tweak here and there and it didn’t resolve the issue.

We thought it could be SSO Ping Auth issue which is bypassing the RPD security and not getting GROUPS variable populated right way. So we added WEBGROUPS and ROLES system variables in RPD and enabled all level of trace on EM and Console on oracle bi security objects including the most granular level of LOGLevel trace in instanceconfig.xml file.

We have found after logout of initial session and login new session is not picking up the permission correctly and it retains the old permission on object.

There are security masking logic on Read , Write and All permissions in each level of Catalog object while can be traceable in sawlog file which clearly depicts the issue ! But unfortunately we don’t have smoking gun yet to resolve the problem.

A search in OTN and googling around doesn’t seem to be matching to right use case ! this was a frustrating issue which bother for 4 days until actual fix has been found and Finally we have isolated our environment from SSO , Cluster , Load balancer everything to segregate the root cause of the issue. We came to know its OBIEE 11g issue for sure. We have tried several areas of tweak to fix this issue mostly on file system level , cleaning cache , clean volatileuserdata, removing cacheduserinfo, removing that user content and re-instate, GUID refresh and nothing works and we were almost out of ideas what to do next 😦 😦

FINALLY there is an idea triggered on my mind if there is any configuration can be done on refresh level which could potentially refresh dashboard object including the permission could resolve the issue. And I realize only Instanceconfig.xml is a place where you can declare on such parameter. So in search of some parameter I found below:


This will reload catalog object permission on 1 minutes interval . So potentially this could fix my issue.

And it Voila… it fixed my problem … Now right user getting correct permissions on multiple subscription switch. But still its not end of the game ! WHY ? Yes, we have seen catalog object refreshing in 1 minutes interval but my problem reoccur if multiple login/logout happens within 1 minute of interval …. Sad !! So this is not going to be a solution. Also reload catalog object frequently is not an good idea as SAW server does a hard IO work each time this happens to crawl across catalog and refresh object permission. Also there is no instanceconfig.xml parameter exist to lowered it down to seconds level  😦

Adding this parameter is as good as “Reload server metadata trigged” by Admin from analytics Admin link.  So I was in search how to invoke this automatically rather than required on demand.

Investing more energy on that reveals what is this object permission and seems closely related to ACL and we came to know below details how OBIEE handle the users in Catalog:

The catalog is designed to scale to thousands of concurrent users. To achieve this scaling, the catalog adheres to the following guidelines:

  • The average user typically only reads from the catalog and rarely, if ever, writes to it. In Release 11g, each user is constantly and automatically updating his or her Most Recently Used file, but each user’s “read” operations still far outweigh the user’s “writes” operations. Therefore, the read-to-write ratio is typically at least 100 to 1.
  • While a locking mechanism guarantees that only one user can write to an object at a time, it is rare for multiple users to attempt to write simultaneously to the same object. A feature called “lazy locking” allows users to continue reading an object even when another user is updating that object.
  • Modern file systems cache “small” files directly inside the directory record, such that reading any information on a directory simultaneously loads all small files directly into the operating system’s memory cache. Therefore, it is good practice to keep files in the catalog “small,” especially the frequently “read” .atr metadata files. When these metadata files remain small, then all the .atr files in a directory are loaded into memory with one physical hard disk read. Every file that exceeds the “small” threshold adds another physical hard disk read, which can cause a 100% degradation for each large file. In other words, use care when considering storing arbitrary “Properties” in .atr files.
  • Reading an object’s .atr metadata file using NFS is far slower than reading it directly from a local disk. For this reason, Presentation Services additionally caches all .atr files internally. This cache can become briefly “stale” when another node in the cluster writes data to the file that is newer than the data that is cached by the current node. Therefore, all nodes are refreshed according to the MaxAgeMinutes element in the instanceconfig.xml, whose default for a cluster is 5 minutes. This default setting commonly achieves the best trade-off between the possibility of stale data and the known performance impact. (The default for an environment without clusters is 60 minutes.)

Ours are pretty similar configuration of having millions of .atr file (as we have very large user base in production) in NAS storage continuously accessed by Sawserver catalog crawler.

Looking much on ACL on OTN reveals not same but similar kind of issue and I came to know about bunch of very important config parameters listed at the end of this blog.

Now I have full fledged smoking gun which could potentially resolve my issue which is below:



So I applied in instanceconfig.xml to see the results and unfortunately sawserver was not started as this is unrecognized tag ! When I see OTN carefully and I found this can be added only 140715 version on wards and not less . So we have asked Oracle to provide patch for 140415. After following up Oracle with Sev 1 issue with several escalation we finally got the fix . We applied it , add this params in instanceconfig and it resolves the issue…. 🙂 🙂  Oracle provided the fix with 3 files , one xsd and 2 saw binary files (.so files) . So the long running nightmare is over !!! 🙂 I am sooo happy…

So we know NOW what is ACL Cache and how it could potentially behave weird and cause lot of trouble to our life and we know what should be the way to get rid of the problem around it. See below some of the other important reference on Cache under instanceconfig.xml file. Recommend is if you Disable ACL cache which Oracle confirmed has no performance impact.


Use those judiciously if you know what is that for and later don’t burn your hands … 😉







<!– this cache is NOT related to the security stack It is only for the account info in the webcat itself –>

<!– this cache is NOT related to the security stack It is only for the account info in the webcat itself –>

<!– This CatalogAttributes cache contains all webcat objects’ metadata (the owner, permissions, properties, attributes etc).
The internal permissions have a their own cache as to who is allowed to access the webcat object.

So to cleanup the internal permissions cache, this CatalogAttributescache must also be cleaned. –>






OBIEE 11g – Horizontal Clustering

Hola … I was thinking about to post this thread from long time (since last 5 months) as I have been extensively involved in a OBIEE 11g platform build from scratch, playing with 3 nodes OBIEE installation , Vertical / horizontal clustering et all.

Now the game is over and I want to detail out each and every steps in details and the hurdles I have faced during step by step setup execution.

Honestly before writing this blog thread I have searched across web and googling around several sites to get consolidated Clustering steps and I never found it. This is nothing mean to wrong to other blogger who are posting excellent stuff in their blogs since ages but I feel each post on Horizontal Cluster lacks some information on details.

Idea is to perform 3 nodes horizontal clustering with one OBIEE instance running on each of the nodes. OBIEE version to be installed in with bundle patch version of p18283508. which will make final version to : (Build 140402.1431 64-bit)

There are few restrictions in terms of browser support for this version with IE11. See my previous post for the hack on IE11 to get this work done perfectly.

A little background from OBIEE 10g version just to understand why we made decision to go to OBIEE 11g.

  1. OBIEE 10g lacks features of 11g
  2. Platform wise 11g is more robust with enterprise architecture
  3. In 10g we had faced severe issue with memory in 32 bit architecture and in vertical clustering. The idea of vertical clustering is a crappy design with several architectural flaws which we want to get rid of in 11g
  4. We had 2 OBIEE 10g nodes with 12 instances running on each node is a nightmare to manage and control. Moreover it doesn’t use shared cache with max 4 GB memory address limitation in each sawserver and nqserver process which is a support pain to replenish instances with memory issue/leaks.

Okay so lets start a high level setup details:

  1. Install OBIEE on Node 1 – Primary Node
  2. Install OBIEE on Node 2 – Secondary Node
  3. Install OBIEE on Node 3 – Third Node
  4. Apply OBIEE Bundle patches on Node 1 , then Node 2 and then Node 3
  5. Apply customization , if required and ported from 10g
  6. Perform config chances in EM – Failover and Scaleout to add additional 2 nodes
  7. Perform config chances in EM – Failover
  8. Do Integration check and deploy the required RPD/Catalog on Shared NAS/NFS mount

I will explain the entire things in step by step. In my case it is 2 node cluster installation but for 3rd node the process is exactly same and no difference except extra host to be select on EM.

In above customization steps it is NOT required to have the below OS level file changes present as pre-requisite and you can do it later.

* soft nofile 131072
* hard nofile 131072
* soft nproc 131072
* hard nproc 131072

I assumed that you have right packages downloaded from OTN and right patches before continue.

In below steps I am planning to install OBIEE through Unix script using command line parameters so that no GUI is required. I have created my own response file to pass through certain command line steps which will install OBIEE RCU schema first, OBIEE11g and WLS and then perform some 10g ported customization and finally deploy respective catalog/rpd which is updated from 10g.

The scope for this thread is to define the Steps required to Horizontal cluster on multi-nodes. To save all Logs into a target file use command : script obiee_installation_log.txt  and then start command line execution. Finally once execution done type : exit. this will save entire log from buffer to target file and you can review it to see in details.


Execute weblogic command line params to install RCU. This is similar to what we have in GUI. A command line example is below which will drop existing RCU.

rcu -silent -dropRepository -compInfoXMLLocation /u00/media/ofm_rcu_linux_11/rcuHome/rcu/config/ComponentInfo.xml -storageXMLLocation /u00/media/ofm_rcu_linux_11/rcuHome/rcu/config/Storage.xml -databaseType ORACLE -connectString <Replace with database connect string as db:port:instance> -dbuser <SYDBA Username> -dbrole sysdba -schemaPrefix <SchemaPrefix> -component MDS -component BIPLATFORM -f < /u00/media/ofm_rcu_linux_11/rcuHome/rcuPassword.txt

A command line example is below which will create RCU.

rcu -silent -createRepository -compInfoXMLLocation /u00/media/ofm_rcu_linux_11/rcuHome/rcu/config/ComponentInfo.xml -storageXMLLocation /u00/media/ofm_rcu_linux_11/rcuHome/rcu/config/Storage.xml -databaseType ORACLE -connectString <Replace with database connect string as db:port:instance> -dbuser <SYDBA Username> -dbrole sysdba -schemaPrefix <SchemaPrefix> -component MDS -component BIPLATFORM -f < /u00/media/ofm_rcu_linux_11/rcuHome/rcuPassword.txt

Note that rcuPassword.txt could be any file which will act as source file for storing Sysdba User/Password and passed through command line.

N.B:- If you see any error like this that means you have trouble creating RCU and you have to perform clean-up this error before this:

9-28-2015 10-55-58 AM

The reason because RCU schema pre-exist and didn’t clean-up well. You have to find the schema name from below table acts as a RCU version history keeper and then delete those records/commit before proceed.

select * from System.SCHEMA_VERSION_REGISTRY$

2. Execute OBIEE installer with response file. A sample example below: This will be similar like doing Enterprise installation using UI.

cd /u00/media/bishiphome/Disk1
echo inventory_loc=$HOME/oraInventory > $HOME/oraInst.loc
echo inst_group=${OS_GROUP} >> $HOME/oraInst.loc
mkdir $HOME/oraInventory

./runInstaller -silent -response /u00/response_file -invPtrLoc $HOME/oraInst.loc -waitforcompletion

This will unset ORACLE_HOME (which is mandatory) , then creating Inventory Location with OS group and then continue installation with response file called response_file

If you are using response_file then for primary node installation below are important parameters you need to change in response_file. You can find the details of standard response for for OBIEE 11g installation in Oracle doc or just google it





So my OBIEE installation on Primary node completed and I used to get below issue whenever I ran command line / GUI:

[CONFIG] FAILED:Executing: opmnctl start coreapplication_obisch1
[CONFIG]:Modifying BI Configuration Files
[CONFIG] SUCCESS:Modifying BI Configuration Files
Configuration:BI Configuration failed
[CONFIG] Failed.
[ACTION]: BI Configuration

The installation of Oracle AS Common Toplevel Component, Oracle Business Intelligence Shiphome failed.

To avoid this I used to do a tweak and intercept of correcting the opmn.xml  while 40% of the installation is over and once you get this file created. This file located at directory :  /u00/app/Middleware/instances/instance1/config/OPMN/opmn.

Or else you can just keep watching on below during Installation and then make the changes on opmn.xml file:

[CONFIG]:Creating Instance
[CONFIG] SUCCESS:Creating Instance

Just take the backup of existing opmn.xml and then Change its contents…

<process-type id=”OracleBISchedulerComponent” module-id=”CUSTOM”>
– <module-data>
– <category id=”start-parameters”>
<data id=”start-executable” value=”$ORACLE_HOME/bifoundation/server/bin/” />
– <!– enable console log to be able to see process startup error –> f(clean);
<data id=”no-stdio” value=”false” />
– <category id=”stop-parameters”>
<data id=”stop-executable” value=”integrator” />
– <category id=”ping-parameters”>
<data id=”ping-type” value=”integrator” />
– <category id=”ready-parameters”>
<data id=”use-ping-for-ready” value=”true” />
<start timeout=”600″ retry=”1″ />
<stop timeout=”120″ />
<restart timeout=”720″ retry=”1″ />

<process-type id=”OracleBISchedulerComponent” module-id=”CUSTOM”>
<category id=”start-parameters”>
<data id=”start-executable” value=”$ORACLE_HOME/bifoundation/server/bin/” />
<!– enable console log to be able to see process startup error
<data id=”no-stdio” value=”false” />
<category id=”ping-parameters”>
<data id=”ping-url” value=”/”/>
<category id=”restart-parameters”>
<data id=”reverseping-timeout” value=”345″/>
<data id=”no-reverseping-failed-ping-limit” value=”3″/>
<data id=”reverseping-failed-ping-limit” value=”6″/>
<start timeout=”300″ retry=”3″/>
<stop timeout=”300″/>
<restart timeout=”300″ retry=”3″/>
<ping timeout=”60″ interval=”600″/>

This will make sure Scheduler start-up will be fine and no issue to Proceed the BI Configurations. This is just a hack a no reason why Oracle haven’t done this default in there installation package. You might need to do it across all secondary nodes.

The idea is you should be getting below to confirm entire Installation is done successfully.

[CONFIG] SUCCESS:Modifying BI Configuration Files
Configuration:BI Configuration completed successfully
The installation of Oracle AS Common Toplevel Component, Oracle Business Intelligence Shiphome completed successfully.


Now we have to scale-out to Node 2 (secondary) and then 3rd node.

For this in Node 2 we need to make sure:

  1. we will not re-execute RCU commands as we already have RCU in place during primary node installation
  2. we will make changes in response file and make sure we put DOMAIN_HOSTNAME= <primary node Ip/host> and SCALEOUT_BISYSTEM=true
  3. I would prefer to use INSTANCE_NAME=instance2 to identify it is 2nd instance in cluster
  4. All activities will continue here like installing OBIEE / Weblogic , configuration except Admin server installation.
  5. Each additional node in cluster will act like Managed server and its system components and only Primary server will act like Admin and Managed server both
  6. Node 2 installation would be pretty much faster than Node 1. In my experience Primary node takes 30-40 minutes while all secondary nodes will take ~15 minutes to do complete install. Anyway it again vary from system to system based on capacity but I did a tweak on Java memory parameters for faster start-up/shutdown and Installation on OS level.
  7. Performance enhancement for faster start-up/Installation.

Make changes in all OBIEE Node’s . Need root access. No reboot required

  1. Edit or create /etc/sysconfig/rngd to contain:

# Add extra options here

EXTRAOPTIONS=”-r /dev/urandom”

  1. Then “service rngd start”.
  2. If that works, then “chkconfig rngd on” ( to start it at boot ).
  3. Add this on .bash_profile in Unix as below:

export JAVA_OPTIONS=””

So finally in Node 2 the output should be look like this:

9-28-2015 8-17-49 PM


Repeat STEP 2 for Node 3 . It must be similar.


  1. Apply Patches on NODE 1 using Opatch
  2. Apply Patches on NODE 2 using Opatch
  3. Apply Patch on NODE 3

N.B:- You must be thinking why I am not doing performing Installation and patch in one node completely and then proceed to next node. It will not going to work during Scale-out phase of other nodes. The reason because Once Primary node is upgraded with latest bundle patch and you are trying to perform scale-out a version mismatch occurs  for secondary nodes and it is not being able to access the primary nodes module. In this case you will be getting below error:

[2015-09-28T17:57:31.608-04:00] [as] [ERROR] [] [] [tid: 38] [ecid: 0000L0LxUqL3b6G5uzd9iX1M2RR700000T,0] ERROR: Instance creation failed.[[
Caused by:


Lets do a quick sanity check first for Node 1.

top -u orabi should throw below running process on Node 1:

Process-Primary Node

top -u orabi should throw below running process on Node 2.

Process-Secondary Node

We can see Node 1 and 2 added in cluster. this means scale-out is successful and EM recognizes both node.

Also you see some of the processes are down on Node 2 from EM which is fine. Now we will see what additional steps we need to do.



If you are deploying RPD and Catalog it is recommended to deploy it now into share path (which you must have to do if you want 3 nodes to share the same RPD and Catalog).

  • So Go to EM-) Deployment-)Repository. Add new RPD and Catalog after “Lock and Edit”.
  • Apply and Activate changes.
  • For us we have common NAS/NFS mount shared and accessible from 3 OBIEE nodes. It looks like below: (Catalog final name is obfuscated due to security reason)

9-28-2015 4-17-15 PM

Now “Lock and Edit” and Go to EM-) Availability -) Failover -) Make secondary Host like below and Apply, Activate Changes.


Now perform “Restart All” . This will make all the processes up and running across all nodes. But note, we haven’t yet created OBIEE Managed server system components on secondary nodes.

Go to Capacity management -) Scalability -) Add one components on each secondary nodes. This is nothing but vertical clustering on horizontal cluster. If you add more than one components it will create more than one instance on single host.


Note that, more than one system components means more power and it will run more than 1 instance of sawserver, nqserver,Cluster Controller, nqScheduler and Javahost process. Do it if you need it else not required and single instance in each node with 64 bit architecture is enough capable to handle 700-1000 concurrent user request (considering OBIEE performance parameters have been correctly applied)

Hit “Apply” and “Activate Changes” . It will take some time to create additional processes and start them up on secondary nodes. Once this is done successfully then see the list of running process on secondary node by typing top -u orabi in unix session. And observe below that instance2 system components creates under below location on 2nd node.


After successful connections Failover EM screen will be like below:


I have faced several issues with EM not recognizing the new set of components and processes during Restart. In such cases do Individual component restart or try using opmnctl commands to restart opmn managed components else Bounce Primary node at once and then bounce secondary node once. This should resolve most of the problem otherwise it is bigger issue and something wrong during cluster setup process.

Now “Restart All” from EM and see all green means Horizontal Cluster setup process completed successfully 😀 😀

All Green


If you have any customization carried forward from 10g upgrade this is the right time to do that in each nodes one by one starting from Primary. Follow the steps to stop services on Primary and then make customization changes and then start on primary and then stop on secondary , do customization and then start services. This is standard process step by step.

My Observations:

  • Apparently RPD is share under Shared Path but physically RPD located under repository path below under each Nodes and when the opmnctl process starts up it loads the RPD from this physical path. So the concept of Shared RPD applicable when you open RPD in online mode and made some changes the change is reflected online and this location acts as temporary staging location and after changes in RPD become it sync with clustered nodes and propagated the changes.

For Primary: /u00/app/Middleware/instances/instance1/bifoundation/OracleBIServerComponent/coreapplication_obis1/repository

For Secondary:


  • Even in Horizontal cluster mechanism Cache has been create in individual nodes. Global Cache is a concept which is applicable during Cache seeding only and nowhere else.
  • If you want to do Vertical clustering on top of Horizontal its easy from EM.
  • After Clustering you can access Node 1 using its HostName . If Host1 is down you can still access Node 1 as WLS internal clustering/Load balance will automatically route the request to 2nd Node by the help of Clustered Controller module. If 2nd node also not found it will redirect request to 3rd Node.
  • The idea of WLS Horizonal clustering is High availability which tells even if Admin Server is down in Node 1 , clustered managed servers/nodes still can work and serve user query without having any downtime.
  • Don’t try to do Customization on STEP 7 before STEP 6 (EM and Failover changes plus adding instances on secondary nodes) because you might have some customization which require that instances directory to be created first inside Secondary nodes.
  • Entire activities above can be done using Unix scripts only. For e.g, even the Horizontal cluster/ Failover/ scalability can be done using WLS scripting and invoking python script from inside shell script. For me its easy to use EM UI to do this but certainly end to end steps possible through scripts.
  • You don’t need to have a successful RPD connections to DB exist while doing this as in those steps services will be bounced several times. You just need a basic placeholder RPD
  • You don’t need the tnsnames to be in ../MiddleWareHome/Oracle_BI1/network/admin path as long as you are using IP:port syntax in RPD connection pool. Else you might think about updating tnsnames.ora on that location to get connectivity with RPD if RPD use DB connect string
  • Note, if you have 3rd, 4th nodes to be added in horizontal cluster you can’t have Scheduler process and ClusterController process in system components for any additional nodes beyond 2nd nodes. This is because failover and cluster controller process can be only present in two nodes/hosts/servers and you have to manage the deployment such a way that both primary and secondary node shouldn’t go down at same time. So apart from Primary and 1 Secondary node Failover will not be available with high availability for Scheduler and ClusterController
  • Scheduler will run only on Node 1(Primary) and Node 2 (Secondary) but it will not run on 3rd Node (Secondary) but scheduler query can run on Node 3.

Performance Tuning and OBIEE 10/11g

Hello friends, after long time , I am again back on track … I was quite busy with couple of my mandatory schedules for last couple of weeks!

So today my topic is what are the Performance tuning key measures useful while doing report Performance testing. There are lots and I just tried to pull most of them and cover the highlights which would be very helpful while doing the Report Performance testing. Actually this could be separated across several areas like sometime it would be from DB perspective, sometime Environment parameters and sometime it would be the Repository design perspective. However the target objective is same but we need to work well to understand the performance bottleneck from which angles! This will definitely come with the experience and the different challenging environment and every places different rules needs to be applied.

Database Environment Framework:

  • RAC/Cluster / Grid based in NAS and SAN storage – This is recommended in Enterprise Architecture Platform with robust and performance resilient framework.
  • Exadata / Greenplum Architecture – For high performance and high throughput the database Architecture should be robust and especially when there lot of in memory operations intended for better performance especially for transactional analysis.
  • Memory SGA/PGA/Buffer Cache/ Temp, Undo, Redo tablespaces should be optimized using the best practice guided suitable for the underlying hardware.
  • 64 bit Processor (Multicore- Parallel server) and hyper-thread architecture required for better performance on CPU based calculations

Database Tuning Aspects:

  • Bitmap Index / Unique Index (lookup Explain Plan for less Query Cost and Index required suitable for your scenario …typically for all the join keys involved in Dimensional Star modeling)
  • Partitioning is very important and key areas to get most effective performance benefit as most of the analysis involve the time dimension.
  • Gather Statistics or analyze 30% sample of table at-least for Cost Based Optimizer to use, otherwise Rule Based Optimizer will be used . Oracle recommendation to use DBMS_STATS.AUTO_SAMPLE_SIZE for fastest gathering of stats.
  • DB Compression significantly reduces DB block read and enhance performance
  • De-normalization required rather Normalized Snowflakes
  • Parallel query(DB Environment parameter) to get benefit of report accessed on partitioned data leveraging the parallel CPU processing speeds
  • Faster Aggregation – enable Oracle DB Star Transformation (Bitmap join Indexes)

Tuning OBIEE Environment :

  • Multinode Clustering – Horizontal/Vertical scale-out in terms of multiple Admin and Managed Server in Weblogic domain.
  • Leveraging BI Server Caching to get most benefits out of the Cached queries and return the same result without hitting the database multiple times.
  • Switch off Usage Tracking to avoid extra DB operations running against each query.
  • Tuning WLS Param / JVM memory parameters


  • Remove Snowflake across dimensions and create the Foreign Key for each Dim into Fact to create Star model
  • Bitmap Index for all Foreign Key columns (for Summary reports) and Bitmap Join Index (lowest granularity based query for detailed data)
  • NULL and NOT Null column in sync with RPD and Database to avoid full outer join with NVL
  • Number of Elements in OBIEE hierarchy definition to determine efficiency of Aggregated table
  • Fragmentation specification wherever required to logically hit different LTS based on fragmented data value sets
  • Limit the number of Initialization blocks to avoid the login performance degradation. However this feature improved in 11g to allow the database query fired once when that corresponding Values referred by Init block hitted.
  • Setting the query limit for a certain number of records and certain time window helps performance improvement for long running Ad-hoc analysis.
  • Limit the Concurrent user access by setting additional parameters
  • Number of DB connection to open settings in connection pool
  • Turn off higher Query log level to avoid File system IO based operations and performance degradation.
  • Manage optimally some key values like MAX_SESSION_LIMIT (to specify how long DB connections remain open with the accessed session), SERVER_THREAD_RANGE , DB_GATEWAY_THREAD_RANGE (DB and Server execution threaded execution)



  • Charting thread and Queue related parameters amendment (MaxQueue, MaxThreads)
  • Cache related tunable parameters:

<!– Remove from the cache everything older than N minutes –>




<!– AbsoluteMaxEntries is the enforced maximum number of entries. When this maximum is reached –>

<!– subsequent queries will fail until the maximum is no longer exceeded. –>


<!– CruiseEntries is amount of entries the OracleBI Presentation server tries to maintain in its cache. –>


<!– Forces the cache to attempt to remove an old entry when MaxEntries is exceeded. –>


<!– Cleanup this cache every N minutes. –>



  • Aggregated Tables /Mviews – Follow this approach if there are huge volume of data in detail fact table. Multi-level aggregated table based design return best results.
  • Typically the biggest gains for “Detail” reports performance is :

    •Star Transformation

    •Bitmap Join Indexes


  • The biggest gains for “Summary” reports:

    •Gathering Stats 30%


    •Star Transformation


OBIEE 11g – The Starters & the Concerns








Ω Ω Ω Ω Ω ……. Good day everybody and welcome back……… Ω Ω Ω Ω Ω

As promised that I will post a couple of interesting stuff down the line for next couple of days …Here it begins .. 🙂
In my recent execution of a OBIEE 11g development project several unforeseen events have been experienced which is either a product defect or an enhancement issue encompassing couple of 11g new features or the existing feature having enhancement issue. Most of them are from UI perspective and some of them from Repository perspective. I like to jot down those points that could be useful for future reference. All below issues noted are as per my observations that might not necessarily true as those depends on the situation and the environment where I have faced the problem.
I am quite sure there will be tons of problems on the way while upgrading the 10g to 11g especially from the customization perspective and architecture perspective. I will update it as soon as I will be facing such concern down the line …. !

♦ Issue #1

While  added a filter on Top 20 TRx measure value and then edit the same measure value from the report filter as below:
IFNULL(NNI Weekly.Curr Week TRx,0)  is in top  20
Changed the expression and after removing IFNULL I can see “Is in Top” option vanishes. So you can’t choose numbers anymore that in which ranges it belongs to.

♦ Issue #2

While restarting BI Components (Presentation Server, BI Server and Scheduler) from OFMW enterprise manager console it suddenly throws errors at the end of restart process. However the log and other information are not meaningful but the entire things were up and running behind the scene. Not really sure about what is the meaning of the error.
But it seems there is a communication delay between BI Server components and the Windows Process manager OPMN and once that delay is over all the components work perfectly in sync. Until OPMN also up and sync with BI components you will get error “Error 500–Internal Server Error” in bieehome page URL.

♦ Issue #3

Alternate inner coloring in pivot table row format actually overrides the measure column format label color specifications format once that measure bring from column section to row section.

♦ Issue #4

While you edit the page all the hidden page also visible and can also be edited like changing the entire page layout components. I think this is intentional and not an issue rather it gives the flexibility of altering the components even if those are hidden in actual dashboard page.

♦ Issue #5

The dashboard page level slider prompt control took unnecessary vertical whitespace from the end of the prompts till the Apply/Reset button causing the much more vertical section space for prompt. So even if, inside the control it is specified that the Button will be placed to the side of the prompt it doesn’t behave like that. So I think this is a bug to fix.

♦ Issue #6

Editing the compound layout from the dashboard page layout where it is specified to expose the view for say “Compound layout 3” rather default compound layout it should open the Layout 3 in the edit page. But it is not the case currently and always opens the default compound layout. I know, even the components specific editing feature was not there in 10g as well. But as a developer we could expect much more on that which helps our life cool.

♦ Issue #7

I encountered intermittently “Bad XML instance” while adding multiple prompt values in report filters and it doesn’t allow saving that report. Don’t know the reason about it but once I log off and log back in again everything is fine and I can even add those prompt criteria’s without issue.

♦ Issue #8

OFMW Enterprise Manager console weblogic  user is not case-sensitive and due to that reason when you enter using the user ‘WEBLOGIC’ say (all in caps though actual user is ‘weblogic’) you can logged in see all the pages and options and it allow to lock and edit the configuration button press. But once you click that you will see all the fields are still not editable and there will be no button called ‘ Release Configuration’ or ‘Activate Changes’ .So either it should allow all the functionalities or not allow to log into EM console at all .

♦ Issue #9

Changing the prompt text/label color and style will change the default section value of the prompt format as well. For e.g. if we change the prompt text color it will change the default value set in the prompt as well.

♦ Issue #10

Pivot table exclude is possible but not include in the non-designer mode. For e.g. once your report is ready and placed into the dashboard, right-click on some column and click to exclude some column from the pivot table display. You can do it. But you can’t get those excluded column back from the same section. Instead of that you have to click on reset page display or ‘Clear My Customization’ to get those column back into the pivot reports.

♦ Issue #11

Add to briefing book option is not displayed even if it is enabled in the dashboard layout against the displayed components. The problem happens when, instead of displaying the default compound layout or other compound layouts you are displaying the view like ‘View Selector’ etc.

♦ Issue #12

OBIEE 11g User Interface (UI) Performance Is Extremely Slow with Internet Explorer 8 (- IE version 8 – ) .Follow my previous post to get the details. Due to the browser problem several issues like below you can find while developing your reports.

♦ Changing the colors of the measure attribute behaves erratically and it doesn’t reflect on time in inline view and once the reports have been saved it is back on color.

♦ Sometime compound layout views displayed only without any control activated and hence browser needs to be closed after log-off and only solution to log back in.

♦ Issue can be appeared in Style Formatting for Graph. The spikes and the pointers (triangle, round) doesn’t display correctly.

♦ Issue #13

Saving the online 11g BI repository after modification of several components inside it ,the consistency checker failed throwing a weird kind of error message below:
[nQSError: 37005] Transaction update failed.
That is documented under Bug 9884975 (11G ADMIN TOOL RECEIVES NQSERROR: 37005 WHEN SAVING ONLINE RPD CHANGES  – version  Microsoft Windows (32-bit) )
And the resolution Patch 12909840 – GETTING ERROR: [46036] INTERNAL ASSERTION: CONDITION M_LOCK.ISACQUIRED() available on request from Oracle.

♦ Issue #14

11g RPD has thrown the error message while query on object from Physical layer and double-click that object to get the details.BI admin tool has thrown error “Object is orphaned and should be deleted”. The cause is unknown however I search the object and find it using pair of eyes and double clicking it is fine ! So weird no?

♦ Issue #15

I have encountered a strange error appeared while playing with OBIEE 11g repository. The error message as below:
“nQSError: 43113” – Message returned from OBIS
“nQSError: 46036” – Internal Assertion: Condition COLOCK_ISSET (value) , file server \objectmodel\Src\SORpObject.cpp , line 3
I have not found any such error logged in Oracle Support and not sure about the reason. However the problem resolved after closing the Admin tool without saving the changes and re-opening it!

♦ Issue #16

Graph slider (11g new feature) causes some unpredictable issues if you like to play with it more and more!(changing the sort order , randomly moved the column across several places in layout view , changing the measure data/column format). I had  faced some unpredictable issues and in place of Graph slider it displayed like an error with message and my slider control vanishes. However I got it back after 2-3 days after several restart. Might be a kind of rendering issue in IE8 is responsible for that. Now I switched to Firefox hence not having any problem until now.

♦ Issue #17

While adding the Groups (11g new features) inside catalog level, the values doesn’t change automatically until and unless Cancel button has been pressed. Once u click on Cancel button hold your mouse button and come to other section and release the button pressed you can see the value changed.

Ω Ω Ω Ω Ω ………..Until next.. Good Bye!  Stay tuned…………Ω Ω Ω Ω Ω