OBIEE Agents/Alert Subscription using Java Webservice


After a long time I am publishing an interesting post. I could have written it long back, but time did not permit.

So let me first explain the requirement: we had 10g for a couple of years before we transitioned to OBIEE 11g. During the migration we lost the subscribers who had subscriptions to individual Agents/iBots. There are several reasons we lost them: one of our BI product catalogs is huge, with millions of tiny files (it had not been cleaned for a long time and carried 5 years of junk users and folders that were never deleted), so moving those files via archive/unarchive and then the OBIEE 10g to 11g upgrade was never successful. So we had to either redo the subscriptions for those Alerts manually or do it automatically.

Manual was not an option, as there were thousands of Alert subscribers, each subscribed to individual Agents. That is where this solution comes into play. I was never a Java coder, but I tried my best to code it in the phases below:

  1. Install the JDeveloper 11g software, which will be our UI and Java code compiler/execution interface.
  2. Create an Application and then create a Project.
  3. Under the Project, add New -> Web Services -> Web Service Proxy.
  4. Always select JAX-RPC style coding.
  5. Add your WSDL link, which is the OBIEE WSDL URL (e.g., if your OBIEE URL is http://(host):9704/analytics/saw.dll?Dashboard, your WSDL URL will be http://(host):9704/analytics/saw.dll?wsdl).
  6. Select the corresponding web services. In this case we need "SAWSessionService" and "IBotService".

The step-by-step screenshots are in the slideshow:


So finally you can see how the two web services have been added. Now you need the code, AutoAlert.java; what it does, step by step, is:

  1. We had a lookup table with all usernames and passwords in a CSV file, A4.csv here. This CSV holds the list of users we want to subscribe automatically to one specific Agent.
  2. The code parses the CSV and takes the user/password from each row.
  3. It creates a new OBIEE session for each row and then subscribes that user to the specific Agent within that OBIEE session.
  4. Finally, once you edit that Agent in OBIEE, you will see all the users subscribed automatically.

To create this Java file, go to New -> Java -> Java Class under the Project and name it AutoAlert.


Then paste the code as below: Also replace the OBIEE URL with your env’s URL.

++++++++++++++++++++++++++++++++++++++++++++++++++++

package oracle.bi.webservices;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;

import webservices.bi.oracle.v6.Logon;
import webservices.bi.oracle.v6.LogonResult;
import webservices.bi.oracle.v6.SAWSessionServiceSoapPortClient;
import webservices.bi.oracle.v6.IBotServiceSoapPortClient;
import webservices.bi.oracle.v6.ItemInfo;
import webservices.bi.oracle.v6.Subscribe;
import webservices.bi.oracle.v6.SubscribeResult;

public class AutoAlert
{
    public AutoAlert() {
        super();
    }

    public static void main(String[] args) {
        try {
            // A4.csv holds one "username,password" pair per row
            BufferedReader br = new BufferedReader(new FileReader("D:\\A4.csv"));
            String sCurrentLine;
            HashMap<String, String> data = new HashMap<String, String>();
            while ((sCurrentLine = br.readLine()) != null) {
                String[] information = sCurrentLine.split(",");
                String username = information[0];
                String password = information[1];
                System.out.println("Username: " + username + " / " + "Password: " + password);
                data.put(username, password);   // keep a record of processed users

                // Open a new OBIEE session for this user
                Logon my_logon = new Logon();
                my_logon.setName(username);
                my_logon.setPassword(password);
                SAWSessionServiceSoapPortClient new_soap_port_client = new SAWSessionServiceSoapPortClient();
                new_soap_port_client.setEndpoint("http://<obiee_host>:9704/analytics/saw.dll?SoapImpl=nQSessionService");
                LogonResult logResult = new_soap_port_client.logon(my_logon);
                String log_sessid = logResult.getSessionID();
                System.out.println("Logon Session ID: " + log_sessid);

                // Subscribe this user's session to the Agent
                IBotServiceSoapPortClient ibot_soap_access = new IBotServiceSoapPortClient();
                ibot_soap_access.setEndpoint("http://<obiee_host>:9704/analytics/saw.dll?SoapImpl=ibotService");
                Subscribe sub = new Subscribe();
                sub.setPath("/shared/Agents/Agency360 Alerts/Pace – Revenue Penetration Rank Decline");
                sub.setSessionID(log_sessid);
                sub.setCustomizationXml("");
                SubscribeResult subresult = ibot_soap_access.subscribe(sub);
                System.out.println("Subscribing to Alert: " + sub.getPath());
            }
            br.close();
        }
        catch (java.lang.Exception e) {
            e.printStackTrace();
        }
    }
}

++++++++++++++++++++++++++++++++++++++++++++++++++++

Before you execute this, 2 changes are required on the OBIEE side:

  1. Log in as an Admin user and go to "Manage Catalog Groups" -> under Presentation Admin add "Authenticated User". (This is required because I am not using the Impersonate method for my Java web service call.)
  2. Also, under "Manage Privileges", add "Presentation Admin" under the SOAP option.

3. Now, once you run the project, you will see the output below: for each login a session is initiated and that user is subscribed to the Agent. The failure below happened because that user's password had expired.

WebService error 2

4. Also, if steps 1 and 2 are not configured properly, you will see an "Insufficient Privilege" issue.

WebService error 3.jpg

WebService error 4

5. Now, when this starts executing, just monitor the sessions and you will see the web service calls:

Manage Sessions while running Alert Subscription

6. Finally, when everything is done, you will see the users subscribed to the Agent as below:


7. If you want to run this subscription for multiple Agents, you just need to loop in the code and change the Alert path passed in this method call: sub.setPath("/shared/Agents/Agency360 Alerts/Pace – Revenue Penetration Rank Decline"); a sketch of that loop follows below.
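For example, a minimal sketch of that loop (the second path below is hypothetical; log_sessid and ibot_soap_access are the objects already created for the current user in the main code):

String[] agentPaths = {
    "/shared/Agents/Agency360 Alerts/Pace – Revenue Penetration Rank Decline",
    "/shared/Agents/Agency360 Alerts/Some Other Alert"   // hypothetical second Agent
};
for (String path : agentPaths) {
    Subscribe sub = new Subscribe();
    sub.setSessionID(log_sessid);   // session of the user we just logged on
    sub.setCustomizationXml("");
    sub.setPath(path);              // only the Agent path changes per iteration
    ibot_soap_access.subscribe(sub);
}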

8. So you are all set and enjoy writing Java codes to do OBIEE stuff 😉 😀

OBIEE 11g Performance Challenge “Ultimate Tuning Guide”


The idea behind this new thread on OBIEE 11g performance tuning is that it is a real challenge, and very few people, even the experts, can cover all the tuning aspects. This is mostly because of the monster sitting at the front wheel and driving everything, namely WebLogic, and the magnitude of knowledge required has only grown as WebLogic has evolved in 11g.

We are going to explore the nuts and bolts of all possible tuning aspects, which should help a lot of people. The scope of this blog is tuning OBIEE 11g (11.1.1.7) on the Linux platform, which is more stable and reliable. We have a huge number of customers accessing analytics every day with a high number of queries; the customer base is large and the system is accessed 24x7 from every part of the globe. Reports and pages need to render very fast without any performance concern, at least that is the expectation. With this in mind we started tuning the 11g platform and performed regression tests 5 times in a row. Below is the tuning we have done so far in the different layers of the application; overall it was a great testing effort. We did performance testing with 1000 concurrent users ramped up within 1 minute, using a performance testing tool to emulate users logging in, browsing across different dashboard pages and logging out, mimicking real user activity.

Let's see the different layers of performance tuning we have done on our platform:

  • Operating System(OS) / Hardware Level Tuning

    We have 3 virtual servers in a horizontal cluster, each running only 1 instance; no vertical clustering. All are on a highly available platform with failover and load-balancing capability and room for future scalability. Note that these servers are not physical yet. Physical servers give more dedicated power and Oracle recommends them, but virtual servers can give you similar capability if you master them with the proper level of tuning.
  • Assign proper hardware resources: initially we had 3 CPU cores, but with the increased load on CPU we raised that to 8 virtual CPUs to balance out the load. We have Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz and each server has 128 GB of RAM.
  • Change TCP max SYN backlog: when the server is heavily loaded or has many clients with bad, high-latency connections, the number of half-open connections can grow; the queue that holds them is controlled by "tcp_max_syn_backlog". To give the CPU enough room to handle many open files and to avoid OutOfMemory errors while creating new native threads, we also tweaked the parameters below in /etc/security/limits.conf (edit the file as root to include them). You can check the current values with the ulimit -n and ulimit -u commands in Linux.
    * soft nofile 131072
    * hard nofile 131072
    * soft nproc 131072
    * hard nproc 131072
  • Change TCP FIN timeout: cat /proc/sys/net/ipv4/tcp_fin_timeout will give you the existing value. If it is 60, make it 30. By reducing this value, TCP/IP can release closed connections faster, providing more resources for new connections.
  • You might also think of changing /proc/sys/net/ipv4/tcp_keepalive_time to be less than 7200. Follow this guide to make both TCP-level changes persistent on the system (a sketch of the commands follows the link):

http://tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/chap6sec75.html
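As an illustration of the commands involved (run as root; apart from tcp_fin_timeout=30, the values below are examples, not prescriptions):

# check the current values
cat /proc/sys/net/ipv4/tcp_fin_timeout
cat /proc/sys/net/ipv4/tcp_keepalive_time

# apply new values at runtime
sysctl -w net.ipv4.tcp_fin_timeout=30
sysctl -w net.ipv4.tcp_keepalive_time=3600
sysctl -w net.ipv4.tcp_max_syn_backlog=4096

# add the same keys to /etc/sysctl.conf so they survive a reboot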

  • I have faced a very weird scenario where we had to touch Cisco router settings to allow the Urgent flag (disable "Clear Urgent Flag") to get better communication between the DB server and the OBIEE servers. We had an issue where Usage Tracking was causing trouble rendering one prompt. Sounds weird! Yes, it's true: it was a connection issue between the UT table in the DB and the UI that caused one multi-select prompt to never render and hang forever. I have also seen issues where the Scheduler process had severe intermittent connection issues with the DB (RCU schemas). Both were addressed by the Cisco firewall-level changes below:
  • (Cisco router configuration screenshots 1-4)
  • Middle Tier (Weblogic EM/Console) Layer Tuning

1) In EM, increase the connection pool size against the data sources to 150. This applies to all the data sources added there, as per the image below. In our case we haven't touched the default of 50 for some of them; you might want to. Increasing this value can impact the database, as it will try to open that many cursors and processes on the database server.

EMDataSourceConnections EMDataSourceConnections2

2) We have tuned the 64-bit JVMs. In the JRockit JVM (R28.x), the heap grows faster than before. The JVM also ensures that the heap grows up to the maximum Java heap size (-Xmx) before an OutOfMemory error is thrown. So we tweaked some JVM-related parameters in the file below (a hedged example follows the screenshot): [MiddlewareHome]/user_projects/domains/bifoundation_domain/bin/setOBIDomainEnv.sh

JVM
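As a rough sketch only (the heap sizes below are examples, and the exact variable names in setOBIDomainEnv.sh vary by version, so locate the memory-argument section in your own copy of the file):

# example heap settings for the 64-bit managed server JVMs
XMS_SUN_64BIT="4096"
XMX_SUN_64BIT="8192"
XMS_JROCKIT_64BIT="4096"
XMX_JROCKIT_64BIT="8192"
# these end up as -Xms4096m -Xmx8192m on the managed server command line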

3) Tune Analytics (WebLogic Server app plug-in) Connection pool:

There is a connection pool between the WebLogic Server analytics app and OBIPS, and the default value of 128 is inadequate for the large number of concurrent users typically expected in a BIEE system with high user concurrency. When the number of connections reaches the maximum limit, any new requests are kept waiting. Hence, it is recommended to increase this pool (to 512, say) for your BIEE system to support more concurrent users.

I changed the parameter in this file:

[Middleware Home]/ user_projects/domains/bifoundation_domain/config/fmwconfig/biinstances/coreapplication/bridgeconfig.properties

Changed: oracle.bi.presentation.sawconnect.ConnectionPool.MaxConnections=777

4) EM STUCK THREAD PARAMETER CHANGES

This is very important (hence highlighted in red). It is also known as a HOGGING thread. In a high-volume user testing environment, seeing this STUCK thread issue in the log files is pretty common. We needed an exceptional level of tuning to get rid of the STUCK thread issue.

WebLogic Server automatically detects when a thread in an execute queue becomes “stuck.” Because a stuck thread cannot complete its current work or accept new work, the server logs a message each time it diagnoses a stuck thread. A thread might get stuck due to various reasons.

For example: when a large BI report is running and the time it takes to complete is, say, 800 seconds, then, since the default stuck-thread timeout in WebLogic Server is 600 seconds, the thread allocated for that query waits for 600 seconds and goes to the stuck state.

As a best practice we changed the basic thread parameters as below under the Console. This needs to be done on all the managed servers participating in the cluster, including the Admin server. (A sketch of the settings follows the screenshots.)

STUCK1 STUCK2
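As a hedged illustration (example values only; the actual values we used are in the screenshots above), the fields live under Console -> Environment -> Servers -> <managed server> -> Configuration -> Tuning:

Stuck Thread Max Time: 1200        (default is 600 seconds)
Stuck Thread Timer Interval: 600   (default is 60 seconds)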

In our environment we faced this issue multiple times, and we took a couple of thread dumps when STUCK threads occurred and the BI server stopped responding and stopped accepting any new connections:

A typical thread dump has the details below; observe that there are a lot of TIMED_WAITING events:

“[ACTIVE] ExecuteThread: ‘136’ for queue: ‘weblogic.kernel.Default (self-tuning)'” RUNNABLE native

java.net.SocketInputStream.socketRead0(Native Method)

java.net.SocketInputStream.read(SocketInputStream.java:129)

com.siebel.analytics.web.sawconnect.SAWConnection$NotifyInputStream.read(SAWConnection.java:165)

java.io.BufferedInputStream.fill(BufferedInputStream.java:218)

java.io.BufferedInputStream.read(BufferedInputStream.java:237)

com.siebel.analytics.web.sawconnect.sawprotocol.SAWProtocol.readInt(SAWProtocol.java:188)

————-

“Timer-1” waiting for lock java.util.TaskQueue@3b92d7d0 TIMED_WAITING

java.lang.Object.wait(Native Method)

java.util.TimerThread.mainLoop(Timer.java:509)

java.util.TimerThread.run(Timer.java:462)

“Timer-0” waiting for lock java.util.TaskQueue@49d85ab9 WAITING

java.lang.Object.wait(Native Method)

java.lang.Object.wait(Object.java:485)

java.util.TimerThread.mainLoop(Timer.java:483)

java.util.TimerThread.run(Timer.java:462)

“Signal Dispatcher” RUNNABLE

null

“Finalizer” waiting for lock java.lang.ref.ReferenceQueue$Lock@67646de5 WAITING

java.lang.Object.wait(Native Method)

java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)

java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)

java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

“Reference Handler” waiting for lock java.lang.ref.Reference$Lock@5178efd5 WAITING

java.lang.Object.wait(Native Method)

java.lang.Object.wait(Object.java:485)

java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)

“main” waiting for lock weblogic.t3.srvr.T3Srvr@4685b50e WAITING

———————-
“Timer-117” waiting for lock java.util.TaskQueue@46fbadd2 TIMED_WAITING

java.lang.Object.wait(Native Method)

java.util.TimerThread.mainLoop(Timer.java:509)

java.util.TimerThread.run(Timer.java:462)

“Timer-116” waiting for lock java.util.TaskQueue@3550da66 TIMED_WAITING

java.lang.Object.wait(Native Method)

java.util.TimerThread.mainLoop(Timer.java:509)

java.util.TimerThread.run(Timer.java:462)

“Timer-115” waiting for lock java.util.TaskQueue@4f3279e2 TIMED_WAITING

java.lang.Object.wait(Native Method)

java.util.TimerThread.mainLoop(Timer.java:509)

java.util.TimerThread.run(Timer.java:462)

“Timer-114” waiting for lock java.util.TaskQueue@7ae00d0c TIMED_WAITING

java.lang.Object.wait(Native Method)

java.util.TimerThread.mainLoop(Timer.java:509)

java.util.TimerThread.run(Timer.java:462)

“Timer-113” waiting for lock java.util.TaskQueue@b78cdda TIMED_WAITING

java.lang.Object.wait(Native Method)

java.util.TimerThread.mainLoop(Timer.java:509)

java.util.TimerThread.run(Timer.java:462)

“Timer-112” waiting for lock java.util.TaskQueue@2812a918 TIMED_WAITING

java.lang.Object.wait(Native Method)

java.util.TimerThread.mainLoop(Timer.java:509)

java.util.TimerThread.run(Timer.java:462)

“QuartzScheduler_BIPublisherScheduler-NON_CLUSTERED_MisfireHandler” TIMED_WAITING

————

See more on here if you are interested to trace that down:

https://blogs.oracle.com/WebLogicServer/entry/analyzing_a_stuck_weblogic_execute

A very useful way to find the STUCK threads: go to Console -> Environment -> Servers -> click each managed server (bi_serverx) -> Monitoring -> Threads.

Below are 3 important parameters which tell you the server health plus the hogging thread count. If the hogging thread count is high, you are in trouble with STUCK threads. The snapshot below shows very healthy managed server nodes with no hogging/stuck threads at all.

Hogger1

However, this is the most notorious STUCK thread situation. See the count of waiting threads here; you are definitely in trouble in such scenarios:

Console

Memory consumption shows no issue with CPU / RAM. See this for the 3 clustered nodes: each has 128 GB of memory, though we are using hardly 15% of it, and the CPU is about 50% free almost all the time. So it is not a hardware bottleneck for sure. A network bottleneck? Maybe, but on our optical fiber network channel I am assuming it is not an issue.

Hogger2

When the hogging thread count exceeds 20+, you will get the kind of error below on the dashboard page for all sessions already logged in, and no new session will be able to open a connection; logins will throw an error:

Hogger3

Now see how you can get details of which request is hogging. It will not tell you the exact dashboard and report name, but it will tell you at a very high level which application part/URI the request targets. When more and more such requests pile up in the middle tier, the server becomes unresponsive and the application cannot respond, which can eventually lead to a server crash.

Usage Tracking Hogging

You can find more such requests here; all became STUCK or HOGGING threads.

Hogger4

See JDBC Data source performance as well:

JDBC connection usage screenshots. Below are 2 nodes of a 4-node Exadata RAC; the STUCK threads are not a bottleneck caused by the database, as very little usage/load has been observed on the DB nodes.

Exadata

HOW TO TRACE STUCK THREADS AND SEE WHERE THE BOTTLENECK IS:

Always leverage EM to see the performance in sessions plus performance in GET and POST request.

server loads3

Use WLS EM to find out the load of various components:

WLS Load

(EM usage screenshots: Agency Peak Load, Agency Peak Load-2, Metric Palette)

Try using the Performance URL below to get online details during the regression test execution: http://<server:port>/analytics/saw.dll?Perfmon

Load this and analyze it to get the alarming numbers, which will tell you precisely where the problem lies.

Perfmon - Query parameters

Check this link, which shows what a standard Perfmon output looks like: RATE-After Final Perf Test

But at  this point you have to understand which parameter is bottleneck and its not an easy job!

Note that, so far, whatever tuning we did was not enough for an excellently performing environment.

So we did a couple of other levels of tuning in the RPD layer and the BI config layer, so that we could make the best use of the changes we did in the WebLogic layer.

  • OBIEE (RPD/Catalog/Config file) Level Tuning

     1) BI Server Config, i.e. NQSConfig.INI (the blue item below is very important to get rid of the STUCK THREAD issue)

[SERVER]

READ_ONLY_MODE = NO; # This is for both online & offline – This Configuration setting is managed by Oracle Enterprise Manager Fusion Middleware Control
MAX_SESSION_LIMIT = 2000;
MAX_REQUEST_PER_SESSION_LIMIT = 5000;
SERVER_THREAD_RANGE = 100-1000; # 40-100 was default
SERVER_THREAD_STACK_SIZE = 1 MB; # default is 256 KB (32 BIT mode), 1 MB (64 BIT mode), 0 for default
DB_GATEWAY_THREAD_RANGE = 40-1000; #40-200 was default
DB_GATEWAY_THREAD_STACK_SIZE = 1 MB; # default is 256 KB (32 BIT mode), 1 MB (64 BIT mode), 0 for default
HTTP_CLIENT_THREAD_RANGE = 0-100;
HTTP_CLIENT_THREAD_STACK_SIZE = 1 MB; # default is 256 KB (32 BIT mode), 1 MB (64 BIT mode), 0 for default
MAX_EXPANDED_SUBQUERY_PREDICATES = 8192; # default is 8192
MAX_QUERY_PLAN_CACHE_ENTRIES = 5000; # default is 1024
MAX_QUERY_PLAN_CACHE_ENTRY_SIZE = 1 MB; # default is 256 KB,(32 BIT mode), 1 MB (64 BIT mode), 0 for default
MAX_DRILLDOWN_INFO_CACHE_ENTRIES = 1024; # default is 1024
MAX_DRILLDOWN_QUERY_CACHE_ENTRIES = 1024; # default is 1024
INIT_BLOCK_CACHE_ENTRIES = 500; # default is 20

GLOBAL_CACHE_STORAGE_PATH = “/obiagency360/Shared/Cache” 2 GB; # This Configuration setting is managed by Oracle Enterprise Manager Fusion Middleware Control
MAX_GLOBAL_CACHE_ENTRIES = 1000;
CACHE_POLL_SECONDS = 300;
CLUSTER_AWARE_CACHE_LOGGING = NO;

 2) BI Presentation Server Config (instanceconfig.xml)

<ThreadPoolDefaults>
<ChartThreadPool>
<MinThreads>100</MinThreads>
<MaxThreads>400</MaxThreads>
<MaxQueue>2048</MaxQueue>
</ChartThreadPool>
</ThreadPoolDefaults>

<Cache>

<ACLs>
<Enabled>false</Enabled>
</ACLs>

<CatalogXml>
<!-- Remove from the cache everything older than N minutes -->
<MaxAgeMinutes>240</MaxAgeMinutes>
<MaxLastAccessedSeconds>14400</MaxLastAccessedSeconds>
</CatalogXml>
<Query>
<MaxEntries>5000</MaxEntries>
<!-- AbsoluteMaxEntries is the enforced maximum number of entries. When this maximum is reached -->
<!-- subsequent queries will fail until the maximum is no longer exceeded. -->
<AbsoluteMaxEntries>20000</AbsoluteMaxEntries>
<!-- CruiseEntries is the amount of entries the OracleBI Presentation server tries to maintain in its cache. -->
<CruiseEntries>3000</CruiseEntries>
<!-- Forces the cache to attempt to remove an old entry when MaxEntries is exceeded. -->
<ForceLRU>true</ForceLRU>
</Query>
</Cache>

 3) Chart config [INSTANCE_HOME/config/OracleBIJavaHostComponent/coreapplication_obijh1/config.xml]

<JobManager>
<MinThreads>100</MinThreads>
<MaxThreads>200</MaxThreads>
<MaxPendingJobs>200</MaxPendingJobs>
</JobManager>

 4) RPD (Repository) Init block config

  • As we know, the starting point of OBIEE authentication is the Authentication init block. We found minor lag in query performance because it was taking more than the expected time, around 7-8 seconds. Since our init block query is heavy, with almost 20 joins, concurrent user logins could become an issue for us. So we moved the entire query into a database procedure, used TABLE functions, and pass only :USER to the query, populating the variables from the values returned by the function. We have seen a 50% performance improvement from this change, as it leverages the DB cache to a great extent. (A hedged sketch of the query shape follows this list.)
  • During the 10g to 11g upgrade the RPD connection pools were not changed. However, the best practice is to use separate connection pools for init blocks and for data warehouse queries. So we split one connection pool into two and assigned about 30% of the data warehouse connection pool to init blocks, totalling ~760 connections in 1 RPD spread across 5 databases.
  • Removed unnecessary init block queries. Cleaned up errors in init block queries and removed unused variables and connection pools.
  • Applied the server-level patch for the "ADF Warning" issue, which reduces ADF warnings in the logs.
  • Set LOGLEVEL for users to no more than 0.
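For illustration only (the function name and columns below are hypothetical), this is the shape of a row-wise init block query backed by a table function that receives the session user:

SELECT var_name, var_value
FROM   TABLE(get_user_vars(':USER'))

The heavy 20-join logic lives inside the PL/SQL function, so the database can cache and reuse its work instead of re-parsing a large SQL statement at every login.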
  • Web/Front Layer Tuning

    Bandwidth Savings: 1) Enabling HTTP compression can have a dramatic improvement on the latency of responses. By compressing static files and dynamic application responses, it will significantly reduce the remote (high latency) user response time.
    2) Improves request/response latency: Caching makes it possible to suppress the payload of the HTTP reply using the 304 status code. Minimizing round trips over the Web to re-validate cached items can make a huge difference in browser page load times.

  • This uses a web accelerator mechanism to compress HTTP content (static and dynamic) and adds caching to help render the UI faster. With URI and content compression, the number of bytes transmitted over the web is reduced by up to 10x, which improves network utilisation and speeds up the hand-off between client and server requests. See this to learn more about HTTP compression:

https://support.f5.com/kb/en-us/products/wa/manuals/product/wa_implementations_11_0_0/3.htm

The attachment below shows what percentage we gain due to HTTP compression and how much network bandwidth we avoid wasting.

HTTP Compress

It is also advised not to clear the browser cache every time the browser is closed and reopened. Make the changes below in the browser settings:

Increase the cache size to 1024 MB
o Firefox: Enter “about:config” as the url and change:  browser.cache.disk.capacity to 1024000 ,  browser.cache.disk.max_entry_size to -1
o IE: Set “Disk space to use” to 1024 under Internet Options -> Browsing history -> Settings

More details of issues can be found by deciphering the log files below, as an additional measure of performance tuning:

  • Presentation Services Log (saw.log).
  • BIServer Log (nqquery.log, nqserver.log).
  • Scheduler Log (nqscheduler.log).
  • JavaHost Log (jh.log).
  • Cluster Controller Log (nqcluster.log).
  • WLS Managed Servers Log (AdminServer-diagnostic.log, AdminServer.log,
    bi_server1.log, bi_server1-diagnostic.log).

I hope this will help a lot and enjoy performance tuning 🙂 😀

You can refer to Vishal's blog, which has a handful of tuning details, especially on DB config parameters and how to tweak them:

http://obiee-oracledb.blogspot.com/2012/05/obiee-performance-imporvement-tips.html

Another very useful information available here on Performance topic: http://www.clearpeaks.com/blog/oracle-bi-ee-11g/obiee-11g-tuning-and-performance-monitoring

 

Life of OBIEE Catalog object permission – A niche Subject “ACL” cache


Access Control List (ACL) is a very common term and is not confined to OBIEE; several software products use ACLs to define security object permissions. An ACL specifies which users or system processes are granted access to objects, as well as what operations are allowed on given objects. We see ACLs when we edit object permissions or open a catalog object using Catalog Manager. I had noticed "preserve ACL" in the OBIEE 10g Catalog Manager when applying permissions across objects. I noticed ACLs again in 11g when we hit a very strange, weird and complicated issue.

We have had OBIEE 11g 11.1.1.7 version 140415 running in production since launch, 5 months ago. Recently a couple of customers raised concerns about page visibility for different kinds of subscription. A customer can hold multiple subscriptions, which can be switched from a portal, and based on that switch a specific page should be visible. However, we experienced an issue where a user logged in and saw Dashboard A, logged out and logged in again with a different subscription, and Dashboard A was still visible when Dashboard B should have been.

How do you attack this problem initially? I know what you are thinking, but we have a very complicated system with several layers before you land on a dashboard page. We have one portal that redirects to an IdP (performing TCping authentication), then control goes to the horizontally clustered nodes via a load balancer, and then control goes to Analytics. So there are 3 channels before you land on an OBIEE page. On top of this we are running the deprecated 11g security model based on initialization blocks. I know this is bad and not what Oracle recommends, but we have to live with it because Oracle said it can still be functional. So, in summary, we have no Application Roles / Groups set up in EM and Console, and all object permissions defined in the catalog are legacy Catalog Group permissions.

So there are various aspects and points we needed to verify carefully before attacking this problem. I started debugging the issue from the catalog object level permissions first; nothing seemed weird, but I did a little tweaking here and there and it didn't resolve the issue.

We thought it could be an SSO Ping authentication issue bypassing the RPD security and not getting the GROUPS variable populated the right way. So we added the WEBGROUPS and ROLES system variables in the RPD and enabled all levels of tracing in EM and Console on the Oracle BI security objects, including the most granular LOGLEVEL trace in the instanceconfig.xml file.

We found that after logout of the initial session, the new session was not picking up the permissions correctly and retained the old permissions on the object.

There is security masking logic on the Read, Write and All permissions at each level of a catalog object, which is traceable in the sawlog file and clearly depicts the issue. But unfortunately we did not yet have a smoking gun to resolve the problem.

A search on OTN and googling around didn't match the use case. This was a frustrating issue which bothered us for 4 days until the actual fix was found. Finally we isolated our environment from SSO, cluster and load balancer to segregate the root cause, and we came to know it was an OBIEE 11g issue for sure. We tried several areas of tweaks to fix it, mostly at the file system level: cleaning cache, cleaning volatileuserdata, removing cacheduserinfo, removing that user's content and reinstating it, GUID refresh; nothing worked, and we were almost out of ideas for what to do next 😦 😦

FINALLY an idea triggered in my mind: if there is any configuration that could periodically refresh dashboard objects, including their permissions, it could resolve the issue. And I realised instanceconfig.xml is the only place where you can declare such a parameter. In search of such a parameter I found the below:

<Cache>
<CatalogAttributes>
<MaxAgeMinutes>1</MaxAgeMinutes>
<CleanupFrequencyMinutes>1</CleanupFrequencyMinutes>
</CatalogAttributes>
</Cache>

This reloads catalog object permissions at a 1 minute interval, so potentially it could fix my issue.

And voila, it fixed my problem. Now the right user gets the correct permissions on a multiple-subscription switch. But it is still not the end of the game! Why? Yes, we see the catalog objects refreshing at a 1 minute interval, but my problem reoccurs if multiple logins/logouts happen within that 1 minute interval. Sad!! So this is not going to be the solution. Also, reloading catalog objects frequently is not a good idea, as the SAW server does hard I/O work each time this happens, crawling across the catalog to refresh object permissions. And there is no instanceconfig.xml parameter to lower it down to the seconds level 😦

Adding this parameter is as good as "Reload server metadata" triggered by an Admin from the analytics Administration link. So I was searching for how to invoke this automatically rather than on demand.

Investing more energy revealed what this object permission is; it is closely related to the ACL, and we came to know the details below about how OBIEE handles users in the catalog:

The catalog is designed to scale to thousands of concurrent users. To achieve this scaling, the catalog adheres to the following guidelines:

  • The average user typically only reads from the catalog and rarely, if ever, writes to it. In Release 11g, each user is constantly and automatically updating his or her Most Recently Used file, but each user’s “read” operations still far outweigh the user’s “writes” operations. Therefore, the read-to-write ratio is typically at least 100 to 1.
  • While a locking mechanism guarantees that only one user can write to an object at a time, it is rare for multiple users to attempt to write simultaneously to the same object. A feature called “lazy locking” allows users to continue reading an object even when another user is updating that object.
  • Modern file systems cache “small” files directly inside the directory record, such that reading any information on a directory simultaneously loads all small files directly into the operating system’s memory cache. Therefore, it is good practice to keep files in the catalog “small,” especially the frequently “read” .atr metadata files. When these metadata files remain small, then all the .atr files in a directory are loaded into memory with one physical hard disk read. Every file that exceeds the “small” threshold adds another physical hard disk read, which can cause a 100% degradation for each large file. In other words, use care when considering storing arbitrary “Properties” in .atr files.
  • Reading an object’s .atr metadata file using NFS is far slower than reading it directly from a local disk. For this reason, Presentation Services additionally caches all .atr files internally. This cache can become briefly “stale” when another node in the cluster writes data to the file that is newer than the data that is cached by the current node. Therefore, all nodes are refreshed according to the MaxAgeMinutes element in the instanceconfig.xml, whose default for a cluster is 5 minutes. This default setting commonly achieves the best trade-off between the possibility of stale data and the known performance impact. (The default for an environment without clusters is 60 minutes.)

Ours is a pretty similar configuration, with millions of .atr files (as we have a very large user base in production) on NAS storage continuously accessed by the sawserver catalog crawler.

Looking more at ACLs on OTN revealed not the same but a similar kind of issue, and I came to know about a bunch of very important config parameters listed at the end of this blog.

Now I have a full-fledged smoking gun which could potentially resolve my issue, which is below:

<ServerInstance>
 
<Cache>
<ACLs>
   <Enabled>false</Enabled>
</ACLs>
</Cache>

</ServerInstance>

So I applied it in instanceconfig.xml to see the results, and unfortunately sawserver would not start, as this is an unrecognized tag! Looking at OTN carefully, I found this can be added only from version 140715 onwards, not earlier. So we asked Oracle to provide a patch for 140415. After following up with Oracle on a Sev 1 issue with several escalations, we finally got the fix. We applied it, added this parameter in instanceconfig.xml, and it resolved the issue 🙂 🙂 Oracle provided the fix as 3 files: one xsd and 2 saw binary files (.so files). So the long-running nightmare is over!!! 🙂 I am so happy…

So NOW we know what the ACL cache is, how it can behave weirdly and cause a lot of trouble, and what the way is to get rid of the problems around it. See below some of the other important Cache references in the instanceconfig.xml file. The recommendation is to disable the ACL cache, which Oracle confirmed has no performance impact.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Use these judiciously, if you know what they are for, so you don't burn your hands later… 😉

<Cache>
<UserPopulationExternalMembershipsCache>
<CleanupFrequencyMinutes>3</CleanupFrequencyMinutes>
<MaxAgeMinutes>3</MaxAgeMinutes>
</UserPopulationExternalMembershipsCache>

<UserPopulationAccountDetailsCacheByGUID>
<CleanupFrequencyMinutes>3</CleanupFrequencyMinutes>
<MaxAgeMinutes>3</MaxAgeMinutes>
</UserPopulationAccountDetailsCacheByGUID>

<UserPopulationAccountDetailsCacheByName>
<CleanupFrequencyMinutes>3</CleanupFrequencyMinutes>
<MaxAgeMinutes>3</MaxAgeMinutes>
</UserPopulationAccountDetailsCacheByName>

<RoleDirectMembersCache>
<CleanupFrequencyMinutes>3</CleanupFrequencyMinutes>
<MaxAgeMinutes>3</MaxAgeMinutes>
</RoleDirectMembersCache>

<RoleHierarchyCache>
<CleanupFrequencyMinutes>3</CleanupFrequencyMinutes>
<MaxAgeMinutes>3</MaxAgeMinutes>
</RoleHierarchyCache>

<ParentRolesForGrantedRolesCache>
<CleanupFrequencyMinutes>3</CleanupFrequencyMinutes>
<MaxAgeMinutes>3</MaxAgeMinutes>
</ParentRolesForGrantedRolesCache>

<!-- this cache is NOT related to the security stack. It is only for the account info in the webcat itself -->
<Accounts>
<CleanupFrequencyMinutes>3</CleanupFrequencyMinutes>
<MaxAgeMinutes>3</MaxAgeMinutes>
</Accounts>

<!-- this cache is NOT related to the security stack. It is only for the account info in the webcat itself -->
<AccountIndex>
<CleanupFrequencyMinutes>3</CleanupFrequencyMinutes>
<MaxAgeMinutes>3</MaxAgeMinutes>
</AccountIndex>

<!-- This CatalogAttributes cache contains all webcat objects' metadata (the owner, permissions, properties, attributes etc).
The internal permissions have their own cache as to who is allowed to access the webcat object.
So to clean up the internal permissions cache, this CatalogAttributes cache must also be cleaned. -->

<CatalogAttributes>

<CleanupFrequencyMinutes>3</CleanupFrequencyMinutes>

<MaxAgeMinutes>3</MaxAgeMinutes>

</CatalogAttributes>

</Cache> 

OBIEE 11g – Horizontal Clustering


Hola… I have been thinking about posting this thread for a long time (the last 5 months), as I have been extensively involved in building an OBIEE 11g platform from scratch, playing with a 3-node OBIEE installation, vertical/horizontal clustering et al.

Now the game is over and I want to detail each and every step, along with the hurdles I faced during the step-by-step setup.

Honestly, before writing this blog post I searched across the web and googled several sites to find consolidated clustering steps, and I never found them. No offence meant to other bloggers who have been posting excellent stuff for ages, but I feel each post on horizontal clustering lacks some of the details.

The idea is to perform 3-node horizontal clustering with one OBIEE instance running on each of the nodes. The OBIEE version to be installed is 11.1.1.7.0 with bundle patch p18283508, which makes the final version:

11.1.1.7.140415 (Build 140402.1431 64-bit)

There are a few restrictions in terms of browser support for this version with IE11. See my previous post for the hack to get IE11 working perfectly.

A little background from OBIEE 10g version just to understand why we made decision to go to OBIEE 11g.

  1. OBIEE 10g lacks the features of 11g.
  2. Platform-wise, 11g is more robust, with an enterprise architecture.
  3. In 10g we faced severe memory issues with the 32-bit architecture and with vertical clustering. The 10g vertical clustering approach was a crappy design with several architectural flaws which we wanted to get rid of in 11g.
  4. We had 2 OBIEE 10g nodes with 12 instances running on each node, which was a nightmare to manage and control. Moreover, it doesn't use a shared cache, and with the max 4 GB memory address limit in each sawserver and nqserver process it was a support pain to keep replenishing instances with memory issues/leaks.

Okay, so let's start with the high-level setup details:

  1. Install OBIEE on Node 1 – Primary Node
  2. Install OBIEE on Node 2 – Secondary Node
  3. Install OBIEE on Node 3 – Third Node
  4. Apply OBIEE Bundle patches on Node 1 , then Node 2 and then Node 3
  5. Apply customization , if required and ported from 10g
  6. Perform config changes in EM: Failover and Scale-out to add the additional 2 nodes
  7. Perform config changes in EM: Failover
  8. Do Integration check and deploy the required RPD/Catalog on Shared NAS/NFS mount

I will explain everything step by step. In my case it is a 2-node cluster installation, but for the 3rd node the process is exactly the same, with no difference except the extra host to select in EM.

For the above customization steps it is NOT required to have the OS-level file changes below in place as a prerequisite; you can do them later.

* soft nofile 131072
* hard nofile 131072
* soft nproc 131072
* hard nproc 131072

I assume you have the right packages downloaded from OTN and the right patches before continuing.

In the steps below I am planning to install OBIEE through a Unix script using command line parameters, so that no GUI is required. I created my own response file to drive the command line steps, which install the OBIEE RCU schemas first, then OBIEE 11g and WLS, then perform some customizations ported from 10g, and finally deploy the respective catalog/RPD upgraded from 10g.

The scope of this thread is to define the steps required for a horizontal cluster on multiple nodes. To save all logs into a target file, use the command: script obiee_installation_log.txt, and then start the command line execution. Finally, once execution is done, type: exit. This saves the entire log from the buffer to the target file so you can review it in detail.

STEP 1

Execute the RCU command line parameters to install the RCU. This is similar to what we have in the GUI. A command line example which will drop an existing repository is below.

rcu -silent -dropRepository -compInfoXMLLocation /u00/media/ofm_rcu_linux_11/rcuHome/rcu/config/ComponentInfo.xml -storageXMLLocation /u00/media/ofm_rcu_linux_11/rcuHome/rcu/config/Storage.xml -databaseType ORACLE -connectString <Replace with database connect string as db:port:instance> -dbuser <SYSDBA Username> -dbrole sysdba -schemaPrefix <SchemaPrefix> -component MDS -component BIPLATFORM -f < /u00/media/ofm_rcu_linux_11/rcuHome/rcuPassword.txt

A command line example is below which will create RCU.

rcu -silent -createRepository -compInfoXMLLocation /u00/media/ofm_rcu_linux_11/rcuHome/rcu/config/ComponentInfo.xml -storageXMLLocation /u00/media/ofm_rcu_linux_11/rcuHome/rcu/config/Storage.xml -databaseType ORACLE -connectString <Replace with database connect string as db:port:instance> -dbuser <SYSDBA Username> -dbrole sysdba -schemaPrefix <SchemaPrefix> -component MDS -component BIPLATFORM -f < /u00/media/ofm_rcu_linux_11/rcuHome/rcuPassword.txt

Note that rcuPassword.txt can be any file which acts as the source for the SYSDBA user's password, passed through the command line.

N.B.: If you see an error like the one below, it means you had trouble creating the RCU and you have to clean up this error before proceeding:


The reason is that the RCU schemas pre-exist and were not cleaned up well. You have to find the schema name in the table below, which acts as the RCU version history keeper, and then delete those records and commit before you proceed.

select * from System.SCHEMA_VERSION_REGISTRY$
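For example, a hedged sketch of the cleanup (adjust the OWNER predicate to your own schema prefix and take a backup of the table first):

delete from system.schema_version_registry$ where owner in ('<SchemaPrefix>_BIPLATFORM', '<SchemaPrefix>_MDS');
commit;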

2. Execute the OBIEE installer with a response file. A sample example is below; this is similar to doing an Enterprise installation using the UI.

cd /u00/media/bishiphome/Disk1
unset ORACLE_HOME
echo inventory_loc=$HOME/oraInventory > $HOME/oraInst.loc
echo inst_group=${OS_GROUP} >> $HOME/oraInst.loc
mkdir $HOME/oraInventory

./runInstaller -silent -response /u00/response_file -invPtrLoc $HOME/oraInst.loc -waitforcompletion

This will unset ORACLE_HOME (which is mandatory), then create the inventory location with the OS group, and then continue the installation with the response file called response_file.

If you are using a response_file, then for the primary node installation the important parameters you need to change in it are below. You can find the details of the standard response file for an OBIEE 11g installation in the Oracle docs, or just google it. (A hedged sample snippet follows the list.)

DOMAIN_HOSTNAME, ADMIN_USER_NAME, ADMIN_PASSWORD, ADMIN_CONFIRM_PASSWORD,

WLS_SINGLE_SERVER_INSTALL=false ,

MW_HOME, WEBLOGIC_HOME,ORACLE_HOME,INSTANCE_HOME, DOMAIN_HOME_PATH,

DATABASE_CONNECTION_STRING_BI, DATABASE_SCHEMA_USER_NAME_BI, DATABASE_SCHEMA_PASSWORD_BI,DATABASE_CONNECTION_STRING_MDS, DATABASE_SCHEMA_USER_NAME_MDS, DATABASE_SCHEMA_PASSWORD_MDS.
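For illustration, the relevant lines of a primary-node response_file might look like the sketch below (host names, paths and passwords are placeholders; the parameter names are the ones listed above):

DOMAIN_HOSTNAME=obiee-node1.example.com
ADMIN_USER_NAME=weblogic
ADMIN_PASSWORD=<password>
ADMIN_CONFIRM_PASSWORD=<password>
WLS_SINGLE_SERVER_INSTALL=false
MW_HOME=/u00/app/Middleware
WEBLOGIC_HOME=/u00/app/Middleware/wlserver_10.3
ORACLE_HOME=/u00/app/Middleware/Oracle_BI1
INSTANCE_HOME=/u00/app/Middleware/instances/instance1
DOMAIN_HOME_PATH=/u00/app/Middleware/user_projects/domains/bifoundation_domain
DATABASE_CONNECTION_STRING_BI=dbhost:1521:orcl
DATABASE_SCHEMA_USER_NAME_BI=DEV_BIPLATFORM
DATABASE_SCHEMA_PASSWORD_BI=<password>
DATABASE_CONNECTION_STRING_MDS=dbhost:1521:orcl
DATABASE_SCHEMA_USER_NAME_MDS=DEV_MDS
DATABASE_SCHEMA_PASSWORD_MDS=<password>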

So my OBIEE installation on the primary node completed, but I used to get the issue below whenever I ran the command line / GUI:

[CONFIG] FAILED:Executing: opmnctl start coreapplication_obisch1
[CONFIG]:Modifying BI Configuration Files
[CONFIG] SUCCESS:Modifying BI Configuration Files
Configuration:BI Configuration failed
[CONFIG] Failed.
[ACTION]: BI Configuration

The installation of Oracle AS Common Toplevel Component, Oracle Business Intelligence Shiphome failed.

To avoid this I used a tweak: intercept and correct opmn.xml when about 40% of the installation is over and the file has been created. This file is located in the directory /u00/app/Middleware/instances/instance1/config/OPMN/opmn.

Or you can just keep watching for the lines below during installation and then make the changes to the opmn.xml file:

[CONFIG]:Creating Instance
[CONFIG] SUCCESS:Creating Instance

Just take the backup of existing opmn.xml and then Change its contents…

from:
<process-type id="OracleBISchedulerComponent" module-id="CUSTOM">
<module-data>
<category id="start-parameters">
<data id="start-executable" value="$ORACLE_HOME/bifoundation/server/bin/bischeduler.sh" />
<!-- enable console log to be able to see process startup error -->
<data id="no-stdio" value="false" />
</category>
<category id="stop-parameters">
<data id="stop-executable" value="integrator" />
</category>
<category id="ping-parameters">
<data id="ping-type" value="integrator" />
</category>
<category id="ready-parameters">
<data id="use-ping-for-ready" value="true" />
</category>
</module-data>
<start timeout="600" retry="1" />
<stop timeout="120" />
<restart timeout="720" retry="1" />
</process-type>

to:
<process-type id="OracleBISchedulerComponent" module-id="CUSTOM">
<module-data>
<category id="start-parameters">
<data id="start-executable" value="$ORACLE_HOME/bifoundation/server/bin/bischeduler.sh" />
<!-- enable console log to be able to see process startup error -->
<data id="no-stdio" value="false" />
</category>
<category id="ping-parameters">
<data id="ping-url" value="/"/>
</category>
<category id="restart-parameters">
<data id="reverseping-timeout" value="345"/>
<data id="no-reverseping-failed-ping-limit" value="3"/>
<data id="reverseping-failed-ping-limit" value="6"/>
</category>
</module-data>
<start timeout="300" retry="3"/>
<stop timeout="300"/>
<restart timeout="300" retry="3"/>
<ping timeout="60" interval="600"/>
</process-type>

This makes sure the Scheduler start-up is fine and there is no issue proceeding with the BI configuration. It is just a hack, and there is no reason why Oracle hasn't made this the default in their installation package. You might need to do it across all secondary nodes.

The idea is that you should get the output below, confirming the entire installation completed successfully.

[CONFIG] SUCCESS:Modifying BI Configuration Files
Configuration:BI Configuration completed successfully
The installation of Oracle AS Common Toplevel Component, Oracle Business Intelligence Shiphome completed successfully.

 STEP 2

Now we have to scale-out to Node 2 (secondary) and then 3rd node.

For this in Node 2 we need to make sure:

  1. We will not re-execute the RCU commands, as we already created the RCU during the primary node installation.
  2. We will make changes in the response file and make sure we put DOMAIN_HOSTNAME=<primary node IP/host> and SCALEOUT_BISYSTEM=true.
  3. I would prefer to use INSTANCE_NAME=instance2 to identify it as the 2nd instance in the cluster.
  4. All activities proceed here as before (installing OBIEE / WebLogic, configuration) except the Admin Server installation.
  5. Each additional node in the cluster acts as a Managed Server plus its system components; only the primary server acts as both Admin Server and Managed Server.
  6. The Node 2 installation is much faster than Node 1. In my experience the primary node takes 30-40 minutes, while each secondary node takes ~15 minutes to complete. This varies from system to system based on capacity, but I did tweak Java memory parameters at the OS level for faster start-up/shutdown and installation.
  7. Performance enhancement for faster start-up/installation:

Make the changes below on all OBIEE nodes. Root access is needed; no reboot is required.

  1. Edit or create /etc/sysconfig/rngd to contain:

# Add extra options here

EXTRAOPTIONS="-r /dev/urandom"

  2. Then run "service rngd start".
  3. If that works, run "chkconfig rngd on" (to start it at boot).
  4. Add this to .bash_profile in Unix as below:

export CONFIG_JVM_ARGS="-Djava.security.egd=file:/dev/./urandom"
export JAVA_OPTIONS="-Djava.security.egd=file:/dev/./urandom"

So finally, on Node 2 the output should look like this:


STEP 3

Repeat STEP 2 for Node 3 . It must be similar.

STEP 4

  1. Apply Patches on NODE 1 using Opatch
  2. Apply Patches on NODE 2 using Opatch
  3. Apply Patch on NODE 3
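A hedged sketch of the per-node patching commands (the unzip location is hypothetical; follow the README shipped with the bundle patch for the exact order and any pre/post steps):

export ORACLE_HOME=/u00/app/Middleware/Oracle_BI1
export PATH=$ORACLE_HOME/OPatch:$PATH
cd /u00/media/patches/18283508      # directory where the bundle patch was unzipped
opatch apply
opatch lsinventory                  # confirm the patch shows up in the inventory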

N.B.: You must be wondering why I am not performing the installation and patching completely on one node and then proceeding to the next node. That will not work during the scale-out phase of the other nodes. The reason is that once the primary node is upgraded with the latest bundle patch and you then try to perform the scale-out, a version mismatch occurs for the secondary nodes and they are not able to access the primary node's modules. In this case you will get the error below:

[2015-09-28T17:57:31.608-04:00] [as] [ERROR] [] [oracle.as.install.bi] [tid: 38] [ecid: 0000L0LxUqL3b6G5uzd9iX1M2RR700000T,0] ERROR: Instance creation failed.[[
oracle.as.provisioning.exception.ASProvisioningException
at oracle.as.provisioning.engine.Config.executeConfigWorkflow_WLS(Config.java:872)
at oracle.as.install.bi.biconfig.standard.StandardWorkFlowExecutor.executeHelper(StandardWorkFlowExecutor.java:31)
at oracle.as.install.bi.biconfig.standard.InstanceProvisioningTask.exec(InstanceProvisioningTask.java:76)
at oracle.as.install.bi.biconfig.standard.InstanceProvisioningTask.doExecute(InstanceProvisioningTask.java:99)
at oracle.as.install.bi.biconfig.standard.AbstractProvisioningTask.execute(AbstractProvisioningTask.java:70)
at oracle.as.install.bi.biconfig.standard.StandardProvisionTaskList.execute(StandardProvisionTaskList.java:66)
at oracle.as.install.bi.biconfig.BIConfigMain.doExecute(BIConfigMain.java:113)
at oracle.as.install.engine.modules.configuration.client.ConfigAction.execute(ConfigAction.java:375)
at oracle.as.install.engine.modules.configuration.action.TaskPerformer.run(TaskPerformer.java:88)
at oracle.as.install.engine.modules.configuration.action.TaskPerformer.startConfigAction(TaskPerformer.java:105)
at oracle.as.install.engine.modules.configuration.action.ActionRequest.perform(ActionRequest.java:15)
at oracle.as.install.engine.modules.configuration.action.RequestQueue.perform(RequestQueue.java:96)
at oracle.as.install.engine.modules.configuration.standard.StandardConfigActionManager.start(StandardConfigActionManager.java:186)
at oracle.as.install.engine.modules.configuration.boot.ConfigurationExtension.kickstart(ConfigurationExtension.java:81)
at oracle.as.install.engine.modules.configuration.ConfigurationModule.run(ConfigurationModule.java:86)
at java.lang.Thread.run(Thread.java:662)
Caused by: oracle.as.provisioning.engine.CfgWorkflowException
at oracle.as.provisioning.engine.Engine.processEventResponse(Engine.java:596)
at oracle.as.provisioning.fmwadmin.ASInstanceProv.createInstance(ASInstanceProv.java:178)
at oracle.as.provisioning.fmwadmin.ASInstanceProv.createInstanceAndComponents(ASInstanceProv.java:116)
at oracle.as.provisioning.engine.WorkFlowExecutor._createASInstancesAndComponents(WorkFlowExecutor.java:523)
at oracle.as.provisioning.engine.WorkFlowExecutor.executeWLSWorkFlow(WorkFlowExecutor.java:439)
at oracle.as.provisioning.engine.Config.executeConfigWorkflow_WLS(Config.java:866)

STEP 5

Lets do a quick sanity check first for Node 1.

top -u orabi should show the processes below running on Node 1:

Process-Primary Node

top -u orabi should show the processes below running on Node 2.

Process-Secondary Node

We can see Node 1 and Node 2 added to the cluster; this means the scale-out is successful and EM recognizes both nodes.

You will also see in EM that some of the processes are down on Node 2, which is fine. Now we will see what additional steps we need to do.

EM

STEP 6

If you are deploying an RPD and Catalog, it is recommended to deploy them now onto the shared path (which you must do if you want the 3 nodes to share the same RPD and Catalog).

  • So go to EM -> Deployment -> Repository. Add the new RPD and Catalog after "Lock and Edit".
  • Apply and Activate changes.
  • For us, a common NAS/NFS mount is shared and accessible from the 3 OBIEE nodes. It looks like below (the final catalog name is obfuscated for security reasons):


Now "Lock and Edit" again, go to EM -> Availability -> Failover, set the secondary host as below, and Apply and Activate Changes.

cluster

Now perform "Restart All". This will bring all the processes up and running across all nodes. But note, we haven't yet created the OBIEE managed system components on the secondary nodes.

Go to Capacity Management -> Scalability -> add one component on each secondary node. This is nothing but vertical clustering on top of the horizontal cluster: if you add more than one component, it will create more than one instance on a single host.

Scalability

Note that more than one system component means more power; it will run more than 1 instance of the sawserver, nqserver, Cluster Controller, nqScheduler and JavaHost processes. Do it if you need it; otherwise it is not required, and a single instance on each node with the 64-bit architecture is capable enough to handle 700-1000 concurrent user requests (assuming the OBIEE performance parameters have been applied correctly).

Hit "Apply" and "Activate Changes". It will take some time to create the additional processes and start them up on the secondary nodes. Once this is done successfully, check the list of running processes on the secondary node by typing top -u orabi in a Unix session, and observe below that the instance2 system components are created under the location below on the 2nd node.

/u00/app/Middleware/instances/instance2/bifoundation/OracleBIPresentationServicesComponent/coreapplication_obips1

After successful connections the Failover EM screen will look like below:

EM-Failover

I have faced several issues with EM not recognizing the new set of components and processes during the restart. In such cases, restart the individual components, or try using opmnctl commands to restart the OPMN-managed components, or else bounce the primary node once and then bounce the secondary nodes once. This should resolve most of the problems; otherwise it is a bigger issue and something went wrong during the cluster setup process. (A few example opmnctl commands follow.)
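For reference, a few opmnctl commands we lean on (run from the instance's bin directory, e.g. /u00/app/Middleware/instances/instance2/bin; the component name is an example):

./opmnctl status                                              # list all components and their state
./opmnctl restartproc ias-component=coreapplication_obips1    # restart just Presentation Services
./opmnctl stopall
./opmnctl startall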

Now "Restart All" from EM; when you see all green, it means the horizontal cluster setup completed successfully 😀 😀

All Green

STEP 7

If you have any customizations carried forward from the 10g upgrade, this is the right time to apply them on each node, one by one, starting from the primary. Follow these steps: stop the services on the primary, make the customization changes, then start the primary again; then stop the secondary, apply the customizations, and start its services. This is the standard step-by-step process.

My Observations:

  • Apparently the RPD is shared under the shared path, but physically the RPD is located under the repository path below on each node, and when the opmnctl-managed processes start up they load the RPD from this physical path. The concept of the shared RPD applies when you open the RPD in online mode: the change is reflected online, this location acts as a temporary staging location, and after the changes the RPD is synced with the clustered nodes and the changes are propagated.

For Primary: /u00/app/Middleware/instances/instance1/bifoundation/OracleBIServerComponent/coreapplication_obis1/repository

For Secondary:

/u00/app/Middleware/instances/instance2/bifoundation/OracleBIServerComponent/coreapplication_obis1/repository

  • Even with the Horizontal cluster mechanism, the cache is created on individual nodes. Global Cache is a concept that applies during cache seeding only and nowhere else.
  • If you want to do Vertical clustering on top of Horizontal, it's easy from EM.
  • After clustering you can access Node 1 using its hostname. If Host1 is down, requests can still be served, as WLS internal clustering/load balancing, with the help of the Cluster Controller module, will automatically route the request to the 2nd node. If the 2nd node is also not reachable, it will redirect the request to the 3rd node.
  • The idea of WLS Horizontal clustering is high availability: even if the Admin Server is down on Node 1, the clustered managed servers/nodes can still work and serve user queries without any downtime.
  • Don't try to do the customization in STEP 7 before STEP 6 (EM and failover changes plus adding instances on the secondary nodes), because you might have customizations that require the instances directory to already exist on the secondary nodes.
  • All of the activities above can also be done using Unix scripts alone. For example, even the Horizontal cluster / failover / scalability changes can be done using WLS scripting (WLST), invoking a Python script from inside a shell script. For me it's easier to use the EM UI, but the end-to-end steps are certainly possible through scripts.
  • You don't need a working RPD connection to the DB while doing this, as the services will be bounced several times during these steps. You just need a basic placeholder RPD.
  • You don't need tnsnames.ora to be in the ../MiddleWareHome/Oracle_BI1/network/admin path as long as you are using the full host:port connect descriptor in the RPD connection pool. Otherwise, think about updating tnsnames.ora in that location so the RPD can connect if it uses a DB connect string (TNS alias); see the example after this list.
  • Note: if you have 3rd, 4th, etc. nodes to add to the horizontal cluster, you can't have the Scheduler and Cluster Controller processes under system components for any node beyond the 2nd. This is because the failover and cluster controller processes can only be present on two nodes/hosts/servers, so you have to manage the deployment such that the primary and secondary nodes don't go down at the same time. In other words, beyond the Primary and one Secondary node, high-availability failover will not be available for the Scheduler and Cluster Controller.
  • The Scheduler will run only on Node 1 (Primary) and Node 2 (Secondary); it will not run on the 3rd node, although scheduled queries can still run on Node 3.
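To illustrate the tnsnames.ora point above, here is a hedged example with placeholder host/port/service values (not taken from this environment). If the RPD connection pool's Data Source Name holds the full connect descriptor, no tnsnames.ora is needed on the BI nodes:

(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.10)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=BIDWH)))

But if the Data Source Name is just a TNS alias such as BIDWH, then a matching entry has to exist in <MiddleWare_Home>/Oracle_BI1/network/admin/tnsnames.ora on every node, for example:

BIDWH =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.10.10)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = BIDWH))
  )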

OBIEE 11g (11.1.1.7) support with IE11 – XML hack


By the time I am writing this blog, OBIEE 11.1.1.7 is already supported with IE11 (compatibility mode). When I talk about 11.1.1.7 it is very specific to the latest bundle patch version (anything released after 11.1.1.7.140415 (Build 140402.1431 64-bit)).

This could be an older post, but it might be helpful for those who are still on 11.1.1.7 (recognized as the most stable so far) with the older 140415 patch. We are running on 140415 without IE11 support. We are a product company with a global presence, so for us IE11 support is not a luxury but a necessity: we can't tell our customers to use a different browser version, they may have enterprise restrictions, and in our case several customers have BI dashboard pages integrated into their own applications inside an iFrame. So it is an enterprise-wide challenge. We also can't release part of the application in 11g for some users and the rest in 10g; that is again maintenance and communication overhead.

I read that IE11 support is not in the Oracle Certification matrix, and Oracle said they stopped releasing a back-port patch just for IE11 against 11.1.1.7.140415. So the responsibility falls on us (the BI team) to find a system hack that gets this compatibility with minimal changes on the client side.
I took up the challenge of finding some tweaks for this system hack. See below how I approached the solution/fix/hack step by step.

What I see is that if you open the OBIEE .7 version in IE11, you get a Presentation Services error in the browser saying the browser version is not supported. If you add the URL to the compatibility-mode list, IE11 saves the domain name in that list, and immediately IE11 stops complaining about it. But that's not the end of the story: after login you will find several things broken, such as links, navigation, view selectors, and prompts not working properly.

If you want to see the issue, you can put the string below into the IE11 address bar. It will tell you what kind of compatibility your browser is using. For IE11 you can clearly see that no "compatible" token is returned in the User Agent string.

javascript:alert(navigator.userAgent) – below is an example from an IE10 browser.

Browser
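As an illustration only (the exact strings vary by OS and patch level, so treat these as representative rather than what you will see verbatim), an IE10 user agent carries the "compatible"/"MSIE" tokens while IE11 in its default Edge document mode does not:

IE10:  Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)
IE11:  Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko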

The reason behind this is that IE11 runs with the default Document Mode set to (Edge). If you press F12 to open the Developer Tools in IE11, you will see the Emulation tab, and there you will find Document Mode and User Agent.

The compatibility list alone will not tweak the Document Mode. So the challenge is that you have to send, from the server, a string related to browser compatibility that overrides the client browser settings and forces it into the emulation mode you specify.

I started reading about how the browser works with this compatibility setting and what kind of X-UA string is responsible for the browser tweak.

I found a tiny OBIEE config file that passes this emulation string with the application URL, called iecommon.xml.

You can find this under : <MiddleWare_Home>/Oracle_BI1/bifoundation/web/display/featuretables/iecommon.xml

The only thing is that you have to change the line below. See, IE=100 was the default; you have to change it to IE=10.

After making this change you need to bounce BI System components.

<feature name="requiredMetaTags" xsi:type="xsd:string">&lt;meta http-equiv="X-UA-Compatible" content="IE=100" &gt;</feature>
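After the edit (one zero removed), that same line should read:

<feature name="requiredMetaTags" xsi:type="xsd:string">&lt;meta http-equiv="X-UA-Compatible" content="IE=10" &gt;</feature>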

Simple, huh? Just removing one zero (0). Now paste "javascript:alert(navigator.userAgent)" (without the double quotes) into the IE address bar and you can see the string returned as IE10 compatible. That means it did the job!

Now if you press F12 in IE11 and go to the Emulation tab, you will find the Document Mode has been forced to IE10 (default) and it is no longer Edge. So as soon as you hit the OBIEE .7 URL in IE11, this server-side parameter forces the browser to run in Document Mode IE10.

Now try pulling up your reports in IE11 (note that we still need the one-time browser compatibility-mode setting, which should be fine for users) and you can clearly see the difference. No issues with Prompts, Navigation, Action links, Drill down, View selector, Alerts, Answers. Voila…

These steps are not declared officially and Oracle never said this works, but it's a system hack/tweak that works perfectly for us. So please use it at your own risk.

Enjoy …till next time 🙂

OBIEE 11g Services Start-up Guide for Unix Platform


Below is a step-by-step guide to doing a clean start-up and shut-down of OBIEE 11g services. As we all know, 11g is a beast compared to its ancestor, so we need to be careful to do things the right way.

Here I have demonstrated the steps for the Linux/Unix platform. The idea is to kick off the commands below as the user having admin roles (in my case, the dba group); this is the same user I used to do the installation. If you have any other local user, it is better to avoid it for services start-up, as it could cause locks on the Java OPSS security files and create problems starting the WebLogic Admin and Managed Servers during the bootstrap process.

In the steps below I run all the processes with nohup in the background of the Unix session, so they will continue to run even after you exit the current session.

1) Start-up Admin Server:

[orabi@obiappl11g-xxx ~]$ nohup sh /u00/app/MiddlewareHome/user_projects/domains/bifoundation_domain/bin/startWebLogic.sh -Dweblogic.management.username=weblogic -Dweblogic.management.password=xxx > wls_start.log &

Look for the lines below in the log file, which confirm a clean start of the Admin Server.
<Apr 28, 2014 10:05:56 PM CDT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to RUNNING>
<Apr 28, 2014 10:05:56 PM CDT> <Notice> <WebLogicServer> <BEA-000360> <Server started in RUNNING mode>

2)  Start-up Node Manager:

[orabi@obiappl11g-xxx ~]$ nohup sh /u00/app/MiddlewareHome/wlserver_10.3/server/bin/startNodeManager.sh > nodem_start.log &

Find the below lines in log files to confirm start-up:

<Apr 28, 2014 10:07:03 PM> <INFO> <Secure socket listener started on port 9556>
Apr 28, 2014 10:07:03 PM weblogic.nodemanager.server.SSLListener run
INFO: Secure socket listener started on port 9556

3) Start-up Managed Server:

[orabi@obiappl11g-xxx ~]$ nohup sh /u00/app/MiddlewareHome/user_projects/domains/bifoundation_domain/bin/startManagedWebLogic.sh bi_server1 http://obiappl11g-xxx:7001 > start_bi_server1.log &

Look for below lines in log files to confirm Managed Server start-up:

WebLogic Managed Server “bi_server1” for domain “bifoundation_domain” running in Production Mode>
<Apr 28, 2014 10:53:14 PM CDT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to RUNNING>
<Apr 28, 2014 10:53:14 PM CDT> <Notice> <WebLogicServer> <BEA-000360> <Server started in RUNNING mode>

Note: In the above steps http://obiappl11g-xxx:7001 is my Admin Server URL, where obiappl11g-xxx is the server name.

Make sure you have added WLS_USER and WLS_PWD in the file below before start-up, so you don't need to pass them as command-line parameters:

/u00/app/MiddlewareHome/user_projects/domains/bifoundation_domain/bin/startManagedWebLogic.sh
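A minimal sketch of what that could look like (the variable names follow this post; double-check your own script, since some WebLogic releases use WLS_PW rather than WLS_PWD):

# near the top of startManagedWebLogic.sh
WLS_USER=weblogic
WLS_PWD=xxx
export WLS_USER WLS_PWD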

You can start-up this from WLS Console or Fusion Middleware Control also.

4) Start OPMN managed services:

[orabi@obiappl11g-xxx ~]$ /u00/app/MiddlewareHome/instances/instance1/bin/opmnctl startall

Press Enter and wait until the command prompt comes back.

Check the status:

[orabi@obiappl11g-xxx ~]$ /u00/app/MiddlewareHome/instances/instance1/bin/opmnctl status

It should look like this:

Processes in Instance: instance1
———————————+——————–+———+———
ias-component                    | process-type       |     pid | status
———————————+——————–+———+———
coreapplication_obiccs1          | OracleBIClusterCo~ |    5467 | Alive
coreapplication_obisch1          | OracleBIScheduler~ |    5801 | Alive
coreapplication_obijh1           | OracleBIJavaHostC~ |    5465 | Alive
coreapplication_obips1           | OracleBIPresentat~ |    5463 | Alive
coreapplication_obis1            | OracleBIServerCom~ |    5464 | Alive

Now voila… all services are started up and in good health 🙂

11g

 

Note: You can see the background processes with the top -u <userid> command:

[orabi@obiappl11g-xxx ~]$ top -u orabi

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5065 orabi     20   0 5983m 1.1g  42m S  1.0  2.3   2:20.89 java
 7117 orabi     20   0 19452 1352  948 R  0.7  0.0   0:00.19 top
 2643 orabi     20   0 6100m 1.2g  48m S  0.3  2.6   3:01.12 java
 2841 orabi     20   0 2095m 129m  19m S  0.3  0.3   0:05.34 java
 5464 orabi     20   0 4538m 121m  47m S  0.3  0.3   0:04.26 nqsserver
 5465 orabi     20   0 3980m 232m  16m S  0.3  0.5   0:07.90 java
 2448 orabi     20   0  105m 1936 1492 S  0.0  0.0   0:00.24 bash
 2578 orabi     20   0  103m 1432 1120 S  0.0  0.0   0:00.03 sh
 2809 orabi     20   0  103m 1364 1108 S  0.0  0.0   0:00.01 sh
 5011 orabi     20   0  103m 1264 1092 S  0.0  0.0   0:00.00 sh
 5012 orabi     20   0  103m 1436 1120 S  0.0  0.0   0:00.02 startWebLogic.s
 5440 orabi     20   0 68328 8424 5900 S  0.0  0.0   0:00.00 opmn
 5441 orabi     20   0 1757m  17m  10m S  0.0  0.0   0:06.68 opmn
 5463 orabi     20   0 4172m 271m  72m S  0.0  0.6   0:03.49 sawserver
 5467 orabi     20   0 1481m  24m  15m S  0.0  0.1   0:00.70 nqsclustercontr
 5801 orabi     20   0 1909m  78m  51m S  0.0  0.2   0:01.66 nqscheduler

or else you can see them from running job list:

[orabi@obiappl11g-xxx ~]$ jobs
[1]   Running                 nohup sh /u00/app/MiddlewareHome/user_projects/domains/bifoundation_domain/bin/startWebLogic.sh -Dweblogic.management.username=weblogic -Dweblogic.management.password=xxx> wls_start.log &
[2]-  Running                 nohup sh /u00/app/MiddlewareHome/wlserver_10.3/server/bin/startNodeManager.sh > nodem_start.log &
[3]+  Running                 nohup sh /u00/app/MiddlewareHome/user_projects/domains/bifoundation_domain/bin/startManagedWebLogic.sh bi_server1 http://obiappl11g-t1.gain.tcprod.local:7001 > start_bi_server1.log &

Make sure you follow the same rule during shutdown…

Leave it for you guys to explore how to do that 😉 😛

Fixing Pivoted Graph Conditional Formatting – OBIEE 11g


The OBIEE 11g upgrade has a lot of issues, and among the common ones is Conditional Formatting. This shows that even when the Catalog upgrade is successful, some problems still persist in the XML conversion engine of the Upgrade Assistant (UA), leading to several catalog defects caused by malformed XML. Here is another example: my upgraded 11g charts fail to show the conditional formatting that was working perfectly in 10g…

Let's deep dive into what the issue is…

Below is the conditional format definition in 11g which is aligned with 10g:

So ideally it should show the chart bars with the conditional formatting (the chart is generated from a combined-request pivoted graph); see the comparison below:

To understand why the 11g chart was not showing the conditional format, I went through the full XML from the Advanced tab in 11g and found the discrepancies below:

This is malformed XML generated by the UA during the upgrade of the 10g code. If we look at the 10g Advanced XML, this value is actually a decimal, but for some reason the UA failed to put the correct literal type against it.

So the fix is to replace all instances of the string "untypedLiteral" with "decimal", like below:
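As an illustrative fragment only (the element names here are representative of OBIEE criteria XML, not copied from this report; your own Advanced-tab XML will look different), the change is purely on the xsi:type value:

Before: <sawx:expr xsi:type="untypedLiteral">100</sawx:expr>
After:  <sawx:expr xsi:type="decimal">100</sawx:expr>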

Now, after applying the corrected XML, the 11g chart output looks like this:

And voila… the problem is fixed. Another workaround could be to copy your entire 10g XML into 11g, and yes, that should work too 🙂 🙂

Enjoy ..till next time…

D