java.net.SocketTimeoutException in OBIEE/Weblogic : Read timed out


I could write a long thread about the issue we faced with this SocketTimeoutException, but this time I will keep it concise.
Here is the issue: two of the prompts in our Production environment suddenly broke after working fine for the past 3-5 months. In the meantime we had applied a bundle patch upgrade to our 11.1.1.7 environment and made network/firewall-level changes, so we were not sure why those two prompts had started behaving weirdly. We also found similar prompts elsewhere that did not have the problem.

By broken, I mean that when a user clicks on the prompt, “Please wait” keeps appearing in the drop-down menu and the drop-down values are never displayed. This only happens when you access the application from the web, outside the corporate network, through external DNS. The prompt works fine on the internal VPN network.
Since we have thousands of logins each day, the Production OBIEE environment crashes after a couple of days because of this issue. By crash we mean that the OBIEE/EM/Console login hangs. You will not find any process down in opmnctl status -l (all are alive), and EM/Console does not even show anything red (if you are able to log in at all) to indicate a crash.

Pretty interesting ….. !!! Huh ?

When we inspected the browser console log, we spotted that after 2-3 minutes the browser throws the error below:

“Internal Server error” Ref :
500 Internal Server Error
ERROR Codes for Reference #3.5e7f1cb8.1547495087.4c779890

The above is an Akamai reference number. However, this issue has nothing to do with the Akamai edge or origin servers or with Akamai caching; we checked, and Akamai had no issue with cached content.

As soon as we receive this error, the ADR incident log for OBIEE, bi_serverx.log and biserver_diagnostic.log show the following:

This appears in the incident log (change the path to match your environment)
+++++++++++++++++++++++++++++
/u00/app/Middleware/user_projects/domains/bifoundation_domain/servers/bi_server1/adr/diag/ofm/bifoundation_domain/bi_server1/incident/incdir_539
Incident Id: 7997
Incident Source: SYSTEM
Create Time: Tue Jan 15 12:11:11 EST 2019
Problem Key: DFW-99998 [java.net.SocketTimeoutException][oracle.security.jps.ee.http.JpsAbsFilter$1.run][analytics]
ECID: e8398b29c4083075:10e4302f:1685274d016:-8000-0000000000008333
Application Name: analytics
User Name: <WLS Kernel>
Error Message Id: DFW-99998

Context Values
————–
DFW_SERVER_NAME : bi_server1
DFW_DOMAIN_NAME : bifoundation_domain
DFW_USER_NAME : <WLS Kernel>
DFW_APP_NAME : analytics
Description
———–
Incident detected using watch rule “UncheckedException”:
Watch time: Jan 15, 2019 12:11:11 PM EST
Watch ServerName: bi_server1
Watch RuleType: Log
Watch Rule: (SEVERITY = ‘Error’) AND ((MSGID = ‘WL-101020’) OR (MSGID = ‘WL-101017’) OR (MSGID = ‘WL-000802’) OR (MSGID = ‘BEA-101020’) OR (MSGID = ‘BEA-101017’) OR (MSGID = ‘BEA-000802′))
Watch DomainName: bifoundation_domain
Watch Data:
DATE : Jan 15, 2019 12:11:11 PM EST
SERVER : bi_server1
MESSAGE : [ServletContext@756918633[app:analytics module:analytics path:/analytics spec-version:2.5 version:11.1.1]] Root cause of ServletException.
java.net.SocketTimeoutException: read is alrady timed out
——————–

THIS APPEARS IN BI_SERVERx.log
/u00/app/Middleware/user_projects/domains/bifoundation_domain/servers/bi_server2/logs/bi_server2.log
++++++++++++++++++++++++++++++
[WARNING:7] [WL-320068] [Diagnostics] [host: <hostname>] [nwaddr: [10.30.xx.xxx] [tid: [ACTIVE].ExecuteThread: ’71’ for queue: ‘weblogic.kernel.Default (self-tuning)’] [userId: <WLS Kernel>] [LOG_FILE: /u00/app/Middleware/user_projects/domains/bifoundation_domain/servers/bi_server2/logs/bi_server2.log] Watch ‘UncheckedException’ with severity ‘Notice’ on server ‘bi_server2’ has triggered at Jan 23, 2019 9:36:00 AM EST. Notification details: [[
WatchRuleType: Log
WatchRule: (SEVERITY = ‘Error’) AND ((MSGID = ‘WL-101020’) OR (MSGID = ‘WL-101017’) OR (MSGID = ‘WL-000802’) OR (MSGID = ‘BEA-101020’) OR (MSGID = ‘BEA-101017’) OR (MSGID = ‘BEA-000802′))
WatchData: DATE = Jan 23, 2019 9:36:00 AM EST SERVER = bi_server2 MESSAGE = [ServletContext@1881523291[app:analytics module:analytics path:/analytics spec-version:2.5 version:11.1.1]] Root cause of ServletException.
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:128)
at weblogic.servlet.internal.PostInputStream.read(PostInputStream.java:196)
at java.io.InputStream.read(InputStream.java:82)
at weblogic.servlet.internal.ServletInputStreamImpl.read(ServletInputStreamImpl.java:222)
at com.siebel.analytics.utils.InputStreamWithLimit.read(InputStreamWithLimit.java:39)
at com.siebel.analytics.utils.IOUtils.copyStreams(IOUtils.java:66)
at com.siebel.analytics.web.j2eeutils.SAWHttpUtils.getRequestBytes(SAWHttpUtils.java:39)
at com.siebel.analytics.web.integration.LoadBalancerHTTPFilter.determineServerFromRequest(LoadBalancerHTTPFilter.java:293)
at com.siebel.analytics.web.integration.LoadBalancerHTTPFilter.determineServerFromRequest(LoadBalancerHTTPFilter.java:243)
at com.siebel.analytics.web.integration.LoadBalancerHTTPFilter.doFilter(LoadBalancerHTTPFilter.java:175)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
at oracle.security.jps.ee.http.JpsAbsFilter$1.run(JpsAbsFilter.java:119)
at java.security.AccessController.doPrivileged(Native Method)
at oracle.security.jps.util.JpsSubject.doAsPrivileged(JpsSubject.java:324)
at oracle.security.jps.ee.util.JpsPlatformUtil.runJaasMode(JpsPlatformUtil.java:460)
at oracle.security.jps.ee.http.JpsAbsFilter.runJaasMode(JpsAbsFilter.java:103)
at oracle.security.jps.ee.http.JpsAbsFilter.doFilter(JpsAbsFilter.java:171)
at oracle.security.jps.ee.http.JpsFilter.doFilter(JpsFilter.java:71)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
at oracle.dms.servlet.DMSServletFilter.doFilter(DMSServletFilter.java:163)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3748)
at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3714)
at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2283)
at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2182)
at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1499)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:263)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:221)
SUBSYSTEM = HTTP USERID = <WLS Kernel> SEVERITY = Error THREAD = [ACTIVE] ExecuteThread: ’38’ for queue: ‘weblogic.kernel.Default (self-tuning)’ MSGID = WL-101017 MACHINE = <hostname> TXID = CONTEXTID = 0000MXv8etiFKAw_wDCCyW1SGacg003qNu TIMESTAMP = 1548254160834
WatchAlarmType: AutomaticReset
WatchAlarmResetPeriod: 30000
]]

++++++++++++++++++++++++++++

As you can see, there are two variations of the log message:

1) java.net.SocketTimeoutException: Read timed out
2) java.net.SocketTimeoutException: read is alrady timed out (note: the typo is not mine; it is written by the OBIEE product itself!)

Essentially both are the same, and the incident log gives this problem key: “Problem Key: DFW-99998 [java.net.SocketTimeoutException][oracle.security.jps.ee.http.JpsAbsFilter$1.run][analytics]”
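Since both messages are the same exception, it can help to count them together when judging how often the problem fires. The snippet below is only a minimal sketch: the function name is made up for illustration, and it assumes the log has already been read as plain text lines (for example from bi_server2.log).

```python
import re
from collections import Counter

# Matches both variants seen in the logs: "Read timed out" and
# "read is alrady timed out" (the misspelling comes from the product itself).
TIMEOUT_RE = re.compile(
    r"java\.net\.SocketTimeoutException: "
    r"(?:Read timed out|read is alrady timed out)"
)


def count_socket_timeouts(lines):
    """Return a Counter keyed by the exact message variant found.

    `lines` is any iterable of log lines (a list or an open file).
    This helper is hypothetical and not part of OBIEE or WebLogic.
    """
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


# Example usage (path is an assumption, adjust for your environment):
#   with open("bi_server2.log", errors="replace") as fh:
#       print(count_socket_timeouts(fh))
```

Running this before and after a network change gives a rough before/after comparison of the exception volume.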

When we started investigating, we saw that the prompt code is nothing special: it uses regular SQL Results with a SELECT statement and a UNION statement.
The weirdest part is that if we use union all (in lowercase) it works, and if we add a FETCH FIRST 650001 ROWS ONLY clause at the end of the select statement of the prompt's logical query it works absolutely fine. These are interesting observations that may be contradicted later, but they are our findings.

Here is the solution :

After going through several layers of network traces (Akamai, the Ping SSO IdP, external/internal DNS checks, packet transfer across the firewall) and tracing the various logs, we saw that when you click the prompt, the request headers do not even reach the Presentation Server session log, so the request was getting stuck somewhere in the network. Finally we learned that we had a security layer from Secureworks (an Intrusion Prevention System) that intercepts the OBIEE browser POST request. When its intrusion prevention rules see what looks like a SQL injection (SELECT or UNION), it blocks the inbound traffic; the request never reaches the other network layers, and the WebLogic server goes into an unknown state. We had to whitelist those SQL rules across the network to stop this “SocketTimeoutException”.
As soon as we did this, those 2 prompts started working fine, but the “SocketTimeoutException” did not go away completely. However, the volume of this exception dropped by a factor of 100 to 200, and we have seen no sign of the OBIEE platform crashing since.
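To make the false positive easier to picture, here is a toy sketch of how a naive signature rule could flag a perfectly normal prompt request. The rule below is invented for illustration and is not Secureworks' actual signature; the function name and the sample POST bodies are also assumptions. A case-sensitive rule like this would be one possible reason why lowercase union all got through, but whether the real rule behaves that way is not something the logs tell us.

```python
import re

# Hypothetical, deliberately naive signature: a case-sensitive match on
# upper-case SQL keywords. Real IPS rules are far more elaborate.
NAIVE_SQLI_RULE = re.compile(r"\b(SELECT|UNION)\b")


def would_block(post_body: str) -> bool:
    """Return True if the toy rule would treat this POST body as SQL injection."""
    return NAIVE_SQLI_RULE.search(post_body) is not None


# A prompt that legitimately ships logical SQL in the request body
# (sample text, not a captured request).
legit_prompt = 'SELECT "Region" FROM "Sales" UNION SELECT "Country" FROM "Sales"'
lowercase_prompt = 'select "Region" from "Sales" union all select "Country" from "Sales"'

print(would_block(legit_prompt))      # flagged by the toy rule
print(would_block(lowercase_prompt))  # not flagged by the toy rule
```

The point is only that an application which sends SQL text as part of its normal traffic looks, to a keyword-based rule, exactly like an attack.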

So in Summary :

Root Cause:

The Intrusion Prevention System misidentifying application traffic as malicious.

 Contributing factors:

  • OBIEE uses raw SQL statements for normal application functionality causing the IPS to misidentify the traffic as malicious.
  • OBIEE products do not utilize TLS.
  • Specific filters in OBIEE generate the socket timeout exception, which creates a cascading effect that makes the product unresponsive.
  • The server encountered an internal error or mis-configuration and was unable to complete the request.

Unfortunately, because of this network issue, we on the application team had to spend loads of hours on root cause analysis before we figured this out. Facts of life for developers !!!

Authentication error while upgrading to JDK8 on 11g weblogic 10.3.6.0


Upgrading the JDK for an existing 11g OBIEE is very simple, as described in the Oracle documentation here:  https://docs.oracle.com/cd/E28280_01/bi.1111/e10541/patching.htm#BIESG9065

Generally, unless the certification matrix explicitly says otherwise, it is assumed that any JDK higher than 6 will be compatible with OBIEE 11g, which is not so in this case. That is weird!

But it looks like it is not that simple: you have followed the process and all services (Admin / Managed / OBIS / OBIPS / OPMN) are up and running, yet there is no clue (not even in the logs) as to why user authentication fails.

The only clue comes from the WebLogic Console: when you launch it and go to Deployments, you will see several WebLogic components in the “Prepared” state rather than “Active”. So this is your problem.

[Screenshot: WebLogic error under JDK 8]

When you search for this error, you will learn that sun/io/CharacterEncoding is obsolete in JDK 8 and that some of the dependent processes fail as a result.

Now the only solution to this problem is to downgrade to JDK 7. The reason I hit this error is that I wanted to run my bi-migration-tool jar exported from JDK 8 (as in OBIEE 12c) to create an 11g export bundle for use in 12c as an import archive. After making the JDK change I started all the services to see whether my application was working, and found it was broken.
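A small pre-flight check before switching the JDK could have saved the trouble. The sketch below parses a Java version string (as printed by java -version) and flags anything from 8 upwards. The function names are made up for illustration, and the "7 or lower" threshold simply encodes the experience described in this post, not an official certification statement.

```python
import re


def java_major(version: str) -> int:
    """Extract the major Java version from a version string.

    Handles the legacy scheme ("1.7.0_80" means Java 7) and the
    newer scheme ("11.0.2" means Java 11).
    """
    m = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if not m:
        raise ValueError(f"unrecognised Java version: {version!r}")
    first, second = m.group(1), m.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def ok_for_obiee_11g(version: str) -> bool:
    """Assumption taken from this post: JDK 8 and later break 11g on WebLogic 10.3.6."""
    return java_major(version) <= 7


print(ok_for_obiee_11g("1.7.0_80"))   # a JDK 7 build
print(ok_for_obiee_11g("1.8.0_191"))  # a JDK 8 build
```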

 

OBIEE cool effect – Mouse hover on table and row highlight


My requirement was to add a cool hover effect on table rows as you move the mouse across a regular table or pivot table view in OBIEE.

The advantage of this feature is that it is cool without being distracting, and it helps you focus on one row of an important analysis, which you can easily distinguish from the other rows.

Here is how it looks once you hover. It changes the table row background color to light yellow.

Also, here is the look and feel with alternate row colors (cyan) and a hover on one particular row in yellow.

Here is the simple CSS code you need to add to a Narrative view of the report, in the compound layout where your table/pivot table view is placed. The code below works well for both Table and Pivot Table views.

<style>

.PTChildPivotTable tr:hover td {
background-color: #FFF1CD !important;
}

</style>

What it does is find the table class used by OBIEE and override the background color of the table data cells (td) whenever their table row (tr) is hovered.

Simple and sweet ! and isn’t it cool ? 😛

 

OBIEE 12c – “CSF error” OR “OFM security error” OR “no matching Authentication Protocol”


So, nowhere does the OBIEE 12c installation guide say that you need certain prerequisite conditions in the DB to ensure a successful installation, and you end up scratching your head over why you are getting all this nonsense in the OBIEE installation logs when you did everything right!

Yes, unfortunately that is life with Oracle tools! (Sigh!) If you are the one who hit this error, you can try the options below to fix it.

While installing a brand new OBIEE 12c (not an upgrade) we saw the issue below in the log: /home/orabi/oraInventory/logs/config<time>/startallservers.log

OPMN failed and BIServer, BIPServer and BIScheduler were all in SHUTTING DOWN state.

Below is the server status (./status.sh)

Name          Type     Machine   Status
----          ----     -------   ------
AdminServer   Server   dfatobi   RUNNING
bi_server1    Server   dfatobi   RUNNING
obips1        OBIPS    dfatobi   SHUTDOWN
obijh1        OBIJH    dfatobi   RUNNING
obiccs1       OBICCS   dfatobi   RUNNING
obisch1       OBISCH   dfatobi   SHUTDOWN
obis1         OBIS     dfatobi   SHUTDOWN

And you see the errors below in a bunch of log files:

[OBIS] [ERROR:1] [] [] [ecid: ] [sik: ssi] [tid: f1b51720]  [46137] CSF error encountered. Error code: 43131. [[^M

file: server/Utility/Generic/Src/SUGCSF.cpp; line: 133

 [OBIPS] [WARNING:16] [] [saw.csf.cache] [ecid: ] [tid: ] [SI-Name: ] [IDD-Name: ] [IDD-GUID: ] [userId: ] OFM security reported error 43131[[

File:csfwrapperimpl.cpp

Line:213

errCode:28040 errSQLStateBuf:HY000 errMsg:[Oracle DataDirect][ODBC Oracle Wire Protocol driver][Oracle]ORA-28040: No matching authentication protocol

errCode:0 errSQLStateBuf:08003 errMsg:[DataDirect][ODBC lib] Connection not open

Oracle BI Server starting…

errCode:28040 errSQLStateBuf:HY000 errMsg:[Oracle DataDirect][ODBC Oracle Wire Protocol driver][Oracle]ORA-28040: No matching authentication protocol

errCode:0 errSQLStateBuf:08003 errMsg:[DataDirect][ODBC lib] Connection not open

Oracle BI Server startup failed.

nqsserver: Oracle BI Server shutdown.

Does this mean you have hit an Oracle bug? My DB server was Oracle 12c R2 (12.2). In a lot of places on OTN, Oracle suggests trying Oracle 12c R1, but I recommend trying the step below first to see if it works for you.

Try disabling OAS (Oracle Advanced Security). Ask your Oracle DBA to add the line below to the sqlnet.ora file on the DB side, and you are good to go …

SQLNET.ENCRYPTION_SERVER = rejected 

If that doesn’t help, you can try enabling OAS decryption on the OBIEE side by creating a sqlnet.ora file under [DOMAIN_HOME]/config/fmwconfig/bienv/core/. However, I tried this option and it does not work for a new, from-scratch installation, because that location is not available until the installation is done… so the previous step does the trick.

SQLNET.ENCRYPTION_CLIENT=accepted
SQLNET.ENCRYPTION_TYPES_CLIENT=(AES256)

The actual reason for the issue is that you are using an Oracle Database (for RCU or as a data source) with Oracle Advanced Security (OAS) and SQL*Net encryption enabled.
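If your DBA can send you the database's sqlnet.ora, a quick look at the SQLNET.ENCRYPTION_SERVER setting tells you whether this cause is even plausible. The helper below is a rough sketch with made-up function names; it only understands simple NAME = value lines (no multi-line values), and it assumes the documented Oracle default of accepted when the parameter is absent.

```python
def sqlnet_params(text: str) -> dict:
    """Parse simple NAME = value lines from sqlnet.ora text.

    Everything after '#' is treated as a comment. Multi-line
    parenthesised values are not handled in this sketch.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip().upper()] = value.strip()
    return params


def encryption_server_setting(text: str) -> str:
    """Return the effective SQLNET.ENCRYPTION_SERVER value, lower-cased."""
    return sqlnet_params(text).get("SQLNET.ENCRYPTION_SERVER", "accepted").lower()


def server_wants_encryption(text: str) -> bool:
    """True when the server requests or requires SQL*Net encryption."""
    return encryption_server_setting(text) in ("requested", "required")


sample = "SQLNET.ENCRYPTION_SERVER = rejected\n"
print(encryption_server_setting(sample))
print(server_wants_encryption(sample))
```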

 

OBIEE 12c : Component communication ports


As we know, OBIEE 12c is different in a lot of ways; it also differs from its predecessors in the ports used for communication between the internal system and other components. Here is a quick snapshot of all the ports:

ADMIN Server : 9500 (EM, Console , DMS)
BI Server : 9502 (Analytics, XMLPserver, BI Publisher, Mobile, Mapviewer, Visual Analyzer (VA))

OBISCH_SELF_xxx – OBISCH.obisch1 – 9511 – tcp (s)
OBISCH_MONITOR_xxx – OBISCH_MONITOR.obisch1 – 9512 tcp(s)
BIANALYTICS_URLS, ORACLE_BI_SECURITY_SOAP_URLS  – 9505/analytics
OBIPS_HOSTS – 9507
OBIJH_HOSTS – 9510

ORACLE_BI_SECURITY_SOAP_URLS , ORACLE_BI_SEARCH_REST_URLS , ORACLE_MAPVIEWER_URLS  – 9505

ORACLE_BI_JAVADS_SERVER_URL, ORACLE_BICOMPOSER_URLS – 9502

OBICCS_NODES – 9508
OBICCS_MONITOR_NODES – 9509
OBIS_NODES, OBIS_HOSTS – 9514 (This is the communication port to define in System DSN when creating new ODBC data source in OBIEE 12c , Note : default would be 9703 which is fine for 11g and NOT for 12c)
OBIS_MONITOR_NODES – 9515

Also keep in mind that these are the ports of a default installation. They can be changed in WebLogic EM, but these are the standard communication ports in 12c.
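With the port list in hand, a quick reachability check is easy to script. This is only a sketch: the helper names and the short labels in the dictionary are mine, the port numbers are copied from the list above, and a port that accepts a TCP connection is not proof that the component behind it is healthy.

```python
import socket

# Port numbers copied from the list above (default 12c installation).
# The labels are shorthand chosen for this example.
DEFAULT_PORTS = {
    "AdminServer": 9500,
    "bi_server": 9502,
    "OBIPS": 9507,
    "OBICCS": 9508,
    "OBIJH": 9510,
    "OBISCH": 9511,
    "OBIS": 9514,
}


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host: str, ports: dict = DEFAULT_PORTS) -> dict:
    """Map each component label to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in ports.items()}


# Example usage (hostname is an assumption):
#   for name, up in check_ports("localhost").items():
#       print(name, "open" if up else "closed")
```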

Excerpt from Oracle 12c documentation:

NOTE:  The default client/server communication method for the Oracle BI Server has changed from Distributed component object model (DCOM) to TCP/IP. Support for DCOM will be discontinued in a future release. For sites already running the Oracle BI Server that want to continue to use DCOM until support is discontinued, leave this field set to its default value and define a Windows system environment variable named NQUIRE_DCOM to force the usage of DCOM. Set the variable value to 1. (To define a system environment variable, select System from the Control Panel, click the Advanced tab, and then click the Environment Variables button to open the Environment Variables dialog box.)

OBIEE 12c: uploadRpd Failed: Failure in trying to acquire lock. Check bi-lcm-logs or diagnostics.


If you see the error below after uploading the 12c RPD, make sure you have passed the right parameters on the command line.

uploadRpd Failed: Failure in trying to acquire lock. Check bi-lcm-logs or diagnostics. Error Desc Code: DESC_CODE_SERVER_EXCEPTION

This typically happens if the service instance is wrong, that is, the -SI parameter of the uploadrpd utility. Please double-check and correct it.

In my case ssi is the default -SI argument value at the command line, and you should be able to cd to /u00/app/Oracle_Home/user_projects/domains/bi/bidata/service_instances/ssi/metadata/datamodel/customizations to land in the right path, where ssi is my -SI argument.

So double-check: this might be your issue, and once you have the right service instance you should be good, barring other issues.

Also, the not very comprehensive bi-lcm-logs message tells you to look under the managed server log path; see the detailed error description in the file bi-lcm-rest.log.x at : /u00/app/Oracle_Home/user_projects/domains/bi/servers/bi_server1/logs
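The same check can be scripted so you see the valid service instance names before calling uploadrpd. This is a minimal sketch assuming the directory layout shown above; the function names are hypothetical and nothing here talks to the BI server itself.

```python
from pathlib import Path


def customizations_dir(domain_home: str, service_instance: str) -> Path:
    """Build the customizations path for a given -SI value."""
    return (Path(domain_home) / "bidata" / "service_instances"
            / service_instance / "metadata" / "datamodel" / "customizations")


def available_service_instances(domain_home: str) -> list:
    """List the directory names under bidata/service_instances, if any."""
    root = Path(domain_home) / "bidata" / "service_instances"
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def si_looks_valid(domain_home: str, service_instance: str) -> bool:
    """True when the expected customizations directory exists on disk."""
    return customizations_dir(domain_home, service_instance).is_dir()


# Example usage (domain home as in this post):
#   home = "/u00/app/Oracle_Home/user_projects/domains/bi"
#   print(available_service_instances(home), si_looks_valid(home, "ssi"))
```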

De-install OBIEE 12c in <1 minute


Rather than going through the GUI or executing a bunch of Unix commands by hand, I will show you how to do this in 1 minute, and possibly less 🙂

My VM host is Linux and I have SYSDBA access to the DB: 11.2.0.4 (the minimum required for 12c)

Step 1 : Stop all running processes 

Go to BITools location :  /u00/app/Oracle_Home/user_projects/domains/bi/bitools/bin

Then ./stop.sh   # this will stop all processes associated with BI

Step 2:  DROP RCU Schema (needs SYSDBA privilege, or ask your DBA)

SELECT * FROM SYSTEM.SCHEMA_VERSION_REGISTRY WHERE MRC_NAME='<add DB PREFIX>';   -- check the DB prefix under which RCU is installed

DELETE FROM SYSTEM.SCHEMA_VERSION_REGISTRY WHERE MRC_NAME='<add DB PREFIX>';

COMMIT;

Locate all RCU schemas and drop them. Typically there are 9 RCU schemas.

[Screenshot: drop-schema, the schemas dropped in my case]

Step 3: remove the app folder :

in my case :   rm -rf  /u00/app

You can achieve all of the above by writing a Unix script that combines the steps, so you just press one button.

These steps apply to a Linux environment; on Windows you might need to clear extra entries sitting in the Registry.

If you don’t have direct access to the database, then for Step 2 you can use the deinstall utility to drop the repository automatically and let the scripts do that internally.
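As a starting point for such a script, the sketch below only assembles the steps as text so you can review them before running anything; it executes nothing. The function name is made up, the schema names must be supplied by you (the one in the usage comment is a placeholder), and DROP USER ... CASCADE is my assumption for how the "drop schema" operation would be written.

```python
def deinstall_plan(domain_home, rcu_prefix, rcu_schemas, app_root):
    """Return the ordered de-install steps as strings (dry run only).

    domain_home : BI domain directory containing bitools/bin/stop.sh
    rcu_prefix  : prefix registered in SCHEMA_VERSION_REGISTRY
    rcu_schemas : schema names to drop, supplied by the caller
    app_root    : directory to remove in the last step
    """
    steps = [f"{domain_home}/bitools/bin/stop.sh"]  # Step 1: stop all BI processes
    steps.append(
        "DELETE FROM SYSTEM.SCHEMA_VERSION_REGISTRY "
        f"WHERE MRC_NAME = '{rcu_prefix}';"
    )  # Step 2: clear the registry entries
    steps.append("COMMIT;")
    # Assumption: dropping a schema is done with DROP USER ... CASCADE.
    steps += [f"DROP USER {schema} CASCADE;" for schema in rcu_schemas]
    steps.append(f"rm -rf {app_root}")  # Step 3: remove the app folder
    return steps


# Example usage (prefix and schema name are placeholders):
#   for step in deinstall_plan("/u00/app/Oracle_Home/user_projects/domains/bi",
#                              "DEV", ["DEV_BIPLATFORM"], "/u00/app"):
#       print(step)
```

Printing the plan first is a deliberate choice: the last step is destructive, so it is worth reading the generated commands once before pasting them into a shell or SQL session.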