I can continue write a long thread about this issue that we have faced in regard of this SocketTimeoutException. However this time I will keep in concise.
Here is the issue : two of the prompts in our Production environment suddenly broken which was working fine for past 3-5 months. In between we have made bundle patch upgrade on our 11.1.1.7 env , we did network/firewall level changes so we were not sure why those two prompts started behaving weirdly. Also we found we had similar kind of prompt exist elsewhere which was not facing the vulnerabilities.
By broken, I mean when user click on the prompt they are keep getting “Please wait” appearing in the drop-down menu and the drop-down value never displayed. And this only happens when you are accessing application through web / outside your corporate network and through external DNS. The prompt works fine in internal VPN network.
Since we have thousands of login happen each day, the Production OBIEE environment crashes after couple of days because of this issue. By crashes we mean OBIEE/EM?Console login hangs , you will not find any process went down in opmnctl status -l , all alive , EM / Console even doesn’t show anything red(if you able to login here) determining its a crash .
Pretty interesting ….. !!! Huh ?
When we inspect the console log of browser we have spotted after 2-3 minutes the browser throws below error :
“Internal Server error” Ref :
500 Internal Server Error
ERROR Codes for Reference #3.5e7f1cb8.1547495087.4c779890
The above was Akamai reference number. And for this issue there is nothing to do with Akamai edge or origin server and its cache issue. We have checked Akamai has no issue with cache content
As soon we receive below error the ADR incident log for obiee , bi_Serverx.log and biserver_diagnostic.log throws below :
This appears in Incident log (change the path to be exact for your environment)
+++++++++++++++++++++++++++++
/u00/app/Middleware/user_projects/domains/bifoundation_domain/servers/bi_server1/
adr/diag/ofm/bifoundation_domain/bi_server1/incident/incdir_539
Incident Id: 7997
Incident Source: SYSTEM
Create Time: Tue Jan 15 12:11:11 EST 2019
Problem Key: DFW-99998 [java.net.SocketTimeoutException][oracle.security.jps.ee.http.JpsAbsFilter$1.run][analytics]
ECID: e8398b29c4083075:10e4302f:1685274d016:-8000-0000000000008333
Application Name: analytics
User Name: <WLS Kernel>
Error Message Id: DFW-99998
Context Values
————–
DFW_SERVER_NAME : bi_server1
DFW_DOMAIN_NAME : bifoundation_domain
DFW_USER_NAME : <WLS Kernel>
DFW_APP_NAME : analytics
Description
———–
Incident detected using watch rule “UncheckedException”:
Watch time: Jan 15, 2019 12:11:11 PM EST
Watch ServerName: bi_server1
Watch RuleType: Log
Watch Rule: (SEVERITY = ‘Error’) AND ((MSGID = ‘WL-101020’) OR (MSGID = ‘WL-101017’) OR (MSGID = ‘WL-000802’) OR (MSGID = ‘BEA-101020’) OR (MSGID = ‘BEA-101017’) OR (MSGID = ‘BEA-000802′))
Watch DomainName: bifoundation_domain
Watch Data:
DATE : Jan 15, 2019 12:11:11 PM EST
SERVER : bi_server1
MESSAGE : [ServletContext@756918633[app:analytics module:analytics path:/analytics spec-version:2.5 version:11.1.1]] Root cause of ServletException.
java.net.SocketTimeoutException: read is alrady timed out
——————–
THIS APPEARS IN BI_SERVERx.log
/u00/app/Middleware/user_projects/domains/bifoundation_domain/servers/bi_server2/logs/bi_server2.log
++++++++++++++++++++++++++++++
[WARNING:7] [WL-320068] [Diagnostics] [host: <hostname>] [nwaddr: [10.30.xx.xxx] [tid: [ACTIVE].ExecuteThread: ’71’ for queue: ‘weblogic.kernel.Default (self-tuning)’] [userId: <WLS Kernel>] [LOG_FILE: /u00/app/Middleware/user_projects/domains/bifoundation_domain/servers/bi_server2/logs/bi_server2.log] Watch ‘UncheckedException’ with severity ‘Notice’ on server ‘bi_server2’ has triggered at Jan 23, 2019 9:36:00 AM EST. Notification details: [[
WatchRuleType: Log
WatchRule: (SEVERITY = ‘Error’) AND ((MSGID = ‘WL-101020’) OR (MSGID = ‘WL-101017’) OR (MSGID = ‘WL-000802’) OR (MSGID = ‘BEA-101020’) OR (MSGID = ‘BEA-101017’) OR (MSGID = ‘BEA-000802′))
WatchData: DATE = Jan 23, 2019 9:36:00 AM EST SERVER = bi_server2 MESSAGE = [ServletContext@1881523291[app:analytics module:analytics path:/analytics spec-version:2.5 version:11.1.1]] Root cause of ServletException.
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:128)
at weblogic.servlet.internal.PostInputStream.read(PostInputStream.java:196)
at java.io.InputStream.read(InputStream.java:82)
at weblogic.servlet.internal.ServletInputStreamImpl.read(ServletInputStreamImpl.java:222)
at com.siebel.analytics.utils.InputStreamWithLimit.read(InputStreamWithLimit.java:39)
at com.siebel.analytics.utils.IOUtils.copyStreams(IOUtils.java:66)
at com.siebel.analytics.web.j2eeutils.SAWHttpUtils.getRequestBytes(SAWHttpUtils.java:39)
at com.siebel.analytics.web.integration.LoadBalancerHTTPFilter.determineServerFromRequest(LoadBalancerHTTPFilter.java:293)
at com.siebel.analytics.web.integration.LoadBalancerHTTPFilter.determineServerFromRequest(LoadBalancerHTTPFilter.java:243)
at com.siebel.analytics.web.integration.LoadBalancerHTTPFilter.doFilter(LoadBalancerHTTPFilter.java:175)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
at oracle.security.jps.ee.http.JpsAbsFilter$1.run(JpsAbsFilter.java:119)
at java.security.AccessController.doPrivileged(Native Method)
at oracle.security.jps.util.JpsSubject.doAsPrivileged(JpsSubject.java:324)
at oracle.security.jps.ee.util.JpsPlatformUtil.runJaasMode(JpsPlatformUtil.java:460)
at oracle.security.jps.ee.http.JpsAbsFilter.runJaasMode(JpsAbsFilter.java:103)
at oracle.security.jps.ee.http.JpsAbsFilter.doFilter(JpsAbsFilter.java:171)
at oracle.security.jps.ee.http.JpsFilter.doFilter(JpsFilter.java:71)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
at oracle.dms.servlet.DMSServletFilter.doFilter(DMSServletFilter.java:163)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3748)
at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3714)
at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2283)
at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2182)
at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1499)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:263)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:221)
SUBSYSTEM = HTTP USERID = <WLS Kernel> SEVERITY = Error THREAD = [ACTIVE] ExecuteThread: ’38’ for queue: ‘weblogic.kernel.Default (self-tuning)’ MSGID = WL-101017 MACHINE = <hostname> TXID = CONTEXTID = 0000MXv8etiFKAw_wDCCyW1SGacg003qNu TIMESTAMP = 1548254160834
WatchAlarmType: AutomaticReset
WatchAlarmResetPeriod: 30000
]]
++++++++++++++++++++++++++++
As you are seeing there are two variations of the log :
1) java.net.SocketTimeoutException: Read timed out
2) java.net.SocketTimeoutException: read is alrady timed out (note: the typo is not me and it is writing by the OBIEE product itself ! )
Essentially both are same and incident log says this is your problem key : “Problem Key: DFW-99998 [java.net.SocketTimeoutException][oracle.security.jps.ee.http.JpsAbsFilter$1.run][analytics]”
When we started investigating we have seen the prompt code is not special and we have used regular SQL results with SELECT statement and UNION statement.
Most weird part of if we use union all (as lowercase) it works and if we use FETCH ONLY 650001 ROWS clause at the end of the select statement of prompt logical query it works absolutely fine. Again these are the interesting facts which we can contradict later but these are our findings.
Here is the solution :
After going through several layers of network trace , Akamai , IDP Ping SSO , External / Internal DNS check, Checking the packets transfer across firewall , tracing the various logs we are seeing that when you click prompt the request headers even not passing through presentation server session log so its being stuck somewhere in the network and network is not able to process the request . Finally we came to know that we had security layer incorporated by Secureworks (that is a Intrusion Prevention System) which intercept the OBIEE browser POST request header and when it sees the SQL Injections operation (by SELECT or UNION ) via its Intrusion Prevention rule it blocks the inflow of traffic and that stops passing the request to other network layers and cause Weblogic server to go in unknown state. We had to allow those SQL rules across network as white listed traffic to stop this “SocketTimeoutException”.
As soon as we did this those 2 prompts started working fine but the “SocketTimeoutException” didn’t completely goes away . However the volume of this Exception reduced by 100x, 200x and we have seen no sign of crashing the OBIEE platform.
So in Summary :
Root Cause:
The Intrusion Prevention System misidentifying application traffic as malicious.
Contributing factors:
- OBIEE uses raw SQL statements for normal application functionality causing the IPS to misidentify the traffic as malicious.
- OBIEE products do not utilize TLS.
- Specific filter in the OBIEE, using those filters generate the socket timeout exception which create cascading effect having the product to become unresponsive
- The server encountered an internal error or mis-configuration and was unable to complete the request.
Unfortunately, because of this Network issue, being in application team, we had to loads of hours to do root cause analysis before we figure this out. Facts of life for developers !!!
You must be logged in to post a comment.