WebLogic Stuck Threads: Creating, Understanding and Dealing with them

During the Easter bank holidays I used my time off to do some coding and to look into some little-known details of WebLogic stuck thread behavior. (Actually, I started writing this posting because my doctor told me to keep my mouth shut for a few days, but that’s another story…)

My personal goal was to answer some of the most common questions I’ve encountered while consulting and running WebLogic 12c workshops. As often with my postings, this article is not meant to explain the basic concepts of thread pools or workmanagers. I recommend reading the Oracle WebLogic 12c documentation about stuck thread handling first; it explains how you can deal with stuck threads by configuring a workmanager.

Also, there are some excellent details about stuck threads (including WLST scripting and monitoring) on the Middleware Magic site, a site run by a group of really knowledgeable guys.

 

Now, customers typically tell me that they “observe some stuck threads”, “sometimes”, but often they are “not sure what caused them”, they “don’t know what exact state these threads are in”, and in addition nobody seems to know whether “the stuck threads ever clear up again without rebooting”. I am a pragmatic guy. I enjoy having little applications or tools to demonstrate and measure how WebLogic works. Keen to play around with the newest edition of Netbeans (I used to be an Eclipse guy) and EJB 3.1 in WLS12c, I built a small application to easily test WebLogic stuck thread settings and countermeasures.

Here are some more details about the StuckThreadForFree application:

  • The application allows you to create threads which are busy or which are waiting long enough to be detected as “stuck” by WebLogic.
  • This little application will only work with WLS12c. I intentionally avoided JSF, so a plain JSP page is used to set your parameters. The JSP calls a simple servlet which, in a for loop, calls an asynchronous business method of an injected stateless session bean. @Asynchronous methods and no-interface session beans are only available in EJB 3.1, so you have to run the app on WLS12c. Unlike in previous versions, the EJB is packaged directly into the .war file for deployment.
  • Every call goes through the EJB container, which serializes access to each stateless session bean instance; since the method is asynchronous, every call executes in its own thread.
  • Depending on which method is called, the EJB either waits n seconds using Thread.sleep() or calculates some trigonometric function for n seconds. Both methods will cause stuck threads.
  • There is zero configuration in the deployment descriptors for the EJB! Only context-root for the web part is set (which could be avoided as well).
  • Building the StuckThreadForFree app with Netbeans was a smooth ride and a real pleasure.
  • The app is provided as is. You can have it for free, but there is no guarantee for anything, although it shouldn’t cause any problems either. Better not run it on your production system.
  • It’s just a hack. It demonstrates what it should, nothing else.
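Since the app's source is only linked, here is a rough idea of its structure as a plain Java SE sketch. It is not the app's code: an ExecutorService stands in for the EJB container dispatching the @Asynchronous calls, the method names threadSleep() and threadCalc() are borrowed from the stack traces shown further down, and start() as well as the pool size are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical Java SE stand-in for the StuckThreadForFree EJB (not the app's source).
 * An ExecutorService plays the role of the EJB container running @Asynchronous calls.
 */
public class StuckThreadSimulator {

    // Like LongRunningEJB.threadSleep(): blocks the thread (TIMED_WAITING).
    static void threadSleep(long seconds) {
        try {
            Thread.sleep(seconds * 1_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt status
        }
    }

    // Like LongRunningEJB.threadCalc(): keeps a CPU core busy (RUNNABLE).
    static double threadCalc(long seconds) {
        long end = System.nanoTime() + seconds * 1_000_000_000L;
        double acc = 0;
        long i = 0;
        while (System.nanoTime() < end) {
            acc += Math.sin(i) * Math.cos(i);
            i++;
        }
        return acc;
    }

    // Like the servlet's for loop: fire one asynchronous call per requested thread.
    static List<Future<?>> start(ExecutorService pool, int threads, long seconds, boolean sleep) {
        List<Future<?>> calls = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            calls.add(pool.submit(() -> {
                if (sleep) {
                    threadSleep(seconds);
                } else {
                    threadCalc(seconds);
                }
            }));
        }
        return calls;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        // 3 threads / 2 seconds / Thread.sleep(), scaled down from the article's 100 s
        for (Future<?> call : start(pool, 3, 2, true)) {
            call.get();                           // wait until every call has returned
        }
        pool.shutdown();
        System.out.println("all calls finished");
    }
}
```

Calling start(pool, 3, 100, true) would correspond to the 3 threads / 100 seconds / Thread.sleep() run used below; WebLogic's self-tuning thread pool is of course sized quite differently from this fixed pool.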

DOWNLOAD: for your convenience you can download the StuckThreadForFree.war from here and follow the example yourself (here is the link to the whole Netbeans project). After downloading, you can easily deploy it to WebLogic. To follow the example, it’s good enough to run it on the admin server. Then you can start with the following URL:

http://localhost:7001/StuckThreadForFree

Now, let’s use the app to answer some typical questions.

What are hogging threads? When do threads become hogged? After what period of time?

According to the Oracle doc hogging threads “.. will either be declared as stuck after the configured timeout or will return to the pool before that. The self-tuning mechanism will backfill if necessary.”

So how long does it take for threads to become hogging? Nobody (including Google) seemed to know. Trust me, I did some research and asked plenty of colleagues about this. Here is the answer:

If you run the application with 3 threads / 100 seconds / Thread.sleep() and immediately switch to the WebLogic 12c admin console under Admin Server / Monitoring / Threads, you will observe the following:

 

So, interestingly, hogging threads are detected right away! In my case it took about 2 seconds (I had to hit reload once).

 

So WebLogic transitions into FAILED state when a certain number of stuck threads are detected, right? 

That’s a common misconception! The default configuration of WLS 12c (I also checked WLS 11g, i.e. 10.3.3) is Stuck Thread Count = 0, which means the server “never transitions into FAILED server irrespective of the number of stuck threads”. You will see the FAILED state only when you set the value to a positive number of threads!

Once the server transitions into FAILED, you can define whether WLS should be shut down (and restarted by the WLS Node Manager) or suspended.
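The rule can be written down as a tiny decision function. This is a hypothetical model meant only to restate the semantics described here, not WebLogic code; in particular, treating “stuck threads >= StuckThreadCount” as the trigger is an assumption, chosen because it matches the experiment further down (3 stuck threads with StuckThreadCount = 3 lead to FAILED).

```java
/**
 * Hypothetical model of the StuckThreadCount rule described in this article.
 * This is NOT WebLogic code; the ">=" comparison is an assumption.
 */
public class StuckThreadHealthModel {

    enum Health { OK, WARNING, FAILED }

    static Health evaluate(int stuckThreads, int stuckThreadCount) {
        if (stuckThreads == 0) {
            return Health.OK;                 // no stuck threads (any more)
        }
        if (stuckThreadCount > 0 && stuckThreads >= stuckThreadCount) {
            return Health.FAILED;             // then shut down or suspend, as configured
        }
        // below the threshold, or StuckThreadCount = 0 (default): never FAILED
        return Health.WARNING;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(3, 0));   // default setting, 3 stuck threads
        System.out.println(evaluate(3, 3));   // StuckThreadCount = 3, 3 stuck threads
        System.out.println(evaluate(0, 3));   // condition cleared
    }
}
```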

 

Remember: WLS will not transition into FAILED state when StuckThreadCount is set to zero. Only the health runtime value is set to Warning (which is cleared again once the hogging thread condition clears), as shown below:

 

What exactly causes a stuck thread? What state does a thread have to be in to be marked as stuck?

In general, a Java thread can be in one of the following states: NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING and TERMINATED.

But which state does a thread have to be in to be marked as stuck later? If you run the StuckThreadForFree application and dump the thread stacks in the WebLogic admin console under Server / ServerName / Monitoring / Threads, you can observe that the thread state is ACTIVE/TIMED_WAITING when the Thread.sleep() method is used to block it:

 

"[ACTIVE] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'" TIMED_WAITING
            	java.lang.Thread.sleep(Native Method)
            	com.munzandmore.stuckthread.LongRunningEJB.threadSleep(LongRunningEJB.java:26)
            	com.munzandmore.stuckthread.LongRunningEJB_x9v26k_NoIntfViewImpl.__WL_invoke(Unknown Source)

 

 

When the calc() method is used to keep the threads busy, they are in state ACTIVE/RUNNABLE:

"[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" RUNNABLE
            	com.munzandmore.stuckthread.LongRunningEJB.threadCalc(LongRunningEJB.java:40)
            	com.munzandmore.stuckthread.LongRunningEJB_x9v26k_NoIntfViewImpl.__WL_invoke(Unknown Source)
            	weblogic.ejb.container.internal.SessionLocalMethodInvoker.invoke(SessionLocalMethodInvoker.java:31)

So threads in either state can become stuck. I am also pretty sure I could show the BLOCKED state by using a monitor lock for synchronization, but due to time restrictions this is not included in the app.
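The three states can be reproduced outside WebLogic with a few lines of plain Java, including the BLOCKED case that the app does not cover. The sketch below (class and thread names are made up for illustration) starts a sleeping, a calculating and a monitor-blocked thread and reads their states back with Thread.getState().

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain Java SE sketch (independent of WebLogic) showing TIMED_WAITING, RUNNABLE and BLOCKED. */
public class ThreadStateDemo {

    private static final Object LOCK = new Object();

    /** Polls until the thread reaches the expected state or the timeout expires. */
    static Thread.State awaitState(Thread t, Thread.State expected, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (t.getState() != expected && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        return t.getState();
    }

    /** Starts the three threads, records their states, then lets them finish. */
    static Map<String, Thread.State> sampleStates() throws InterruptedException {
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(5_000);                 // like threadSleep(): TIMED_WAITING
            } catch (InterruptedException ignored) {
                // woken up early by the clean-up below
            }
        }, "sleeper");

        Thread calculator = new Thread(() -> {
            double acc = 0;
            long i = 0;
            while (!Thread.currentThread().isInterrupted()) {
                acc += Math.sin(i++);                // like threadCalc(): RUNNABLE
            }
        }, "calculator");

        Thread blocked = new Thread(() -> {
            synchronized (LOCK) {
                // only entered after the main thread releases LOCK
            }
        }, "blocked");

        Map<String, Thread.State> states = new LinkedHashMap<>();
        sleeper.start();
        calculator.start();
        synchronized (LOCK) {                        // hold the monitor ...
            blocked.start();                         // ... so this thread has to wait for it
            states.put("sleeper", awaitState(sleeper, Thread.State.TIMED_WAITING, 2_000));
            states.put("calculator", awaitState(calculator, Thread.State.RUNNABLE, 2_000));
            states.put("blocked", awaitState(blocked, Thread.State.BLOCKED, 2_000));
        }
        sleeper.interrupt();                         // clean up cooperatively
        calculator.interrupt();
        sleeper.join();
        calculator.join();
        blocked.join();
        return states;
    }

    public static void main(String[] args) throws InterruptedException {
        sampleStates().forEach((name, state) -> System.out.println(name + ": " + state));
    }
}
```

On a standard JVM this should report TIMED_WAITING, RUNNABLE and BLOCKED, mirroring what the admin console shows for threadSleep() and threadCalc().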

 

Can a stuck thread still do reasonable work?

Absolutely! Just because a thread is marked as stuck, it doesn’t mean it is frozen or unusable. Imagine you want to calculate PI, you are creating PDFs or distance maps, you are mapping the human genome, or you have deployed a JCA adapter talking to MQ-Series, SAP or PeopleSoft which internally uses a Thread.sleep() call. All of these are reasonable usages likely to occur in the wild.

 

Do stuck threads ever disappear? Can they be cleared somehow? Are they stuck forever?

First of all, you cannot get rid of a stuck thread by simply “killing it”. Java offers no safe way to cancel or kill a thread: Thread.stop() is deprecated as inherently unsafe, and Thread.interrupt() is merely a request that the running code has to honor. However, stuck threads disappear automatically once the condition that caused them to be marked as stuck clears up (e.g. the sleep period is over or the calculation is done).
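The difference between a sleeping and a calculating thread also shows when you try to stop them. In this plain Java sketch (names are illustrative), interrupt() ends the sleeping thread almost immediately because Thread.sleep() throws an InterruptedException, while the calculating loop, which never checks its interrupt flag, keeps running until its own work is done.

```java
/** Plain Java SE sketch: interrupt() is only a request, honored by sleep() but not by a busy loop. */
public class InterruptDemo {

    /** Interrupts t, waits up to waitMs for it, and reports whether it has ended. */
    static boolean interruptAndCheck(Thread t, long waitMs) throws InterruptedException {
        t.interrupt();          // sets the interrupt flag and wakes up blocking calls like sleep()
        t.join(waitMs);
        return !t.isAlive();
    }

    /** Returns {sleeperEnded, calcEnded} right after interrupting both threads. */
    static boolean[] demo() throws InterruptedException {
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // sleep() is aborted immediately; the thread simply returns
            }
        }, "sleeper");

        Thread calc = new Thread(() -> {
            long end = System.currentTimeMillis() + 2_000;
            double acc = 0;
            long i = 0;
            while (System.currentTimeMillis() < end) {   // never checks the interrupt flag
                acc += Math.sqrt(i++);
            }
        }, "calc");

        sleeper.start();
        calc.start();
        Thread.sleep(100);                                // let both threads get going

        boolean sleeperEnded = interruptAndCheck(sleeper, 500);
        boolean calcEnded = interruptAndCheck(calc, 500);
        calc.join();                                      // ends only when its own 2 s loop is done
        return new boolean[] { sleeperEnded, calcEnded };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo();
        System.out.println("sleeper ended after interrupt(): " + r[0]);
        System.out.println("calc ended after interrupt():    " + r[1]);
    }
}
```

This is also why a stuck thread usually has to finish on its own: unless the code it runs reacts to interruption, nothing outside can make it stop early.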

To prove the point, go to the WebLogic admin console and under Server / ServerName / Configuration set StuckThreadCount to 3 and StuckThreadMaxTime to 60 seconds. Then restart the server and run the StuckThreadForFree app to create 3 threads running for 120 seconds using the Thread.sleep() method (the calc method works as well, there is no difference, but keeping 3 threads busy doing math also turns out to be a stress test for your machine’s fan):

 

 

In the WebLogic log file you will find three entries logging the stuck thread state after a while:

<05.04.2012 10:55 Uhr MESZ> <Critical> <WebLogicServer> <BEA-000385> <Server health failed. Reason: health of critical service 'Thread Pool' failed>
<05.04.2012 10:55 Uhr MESZ> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "85" seconds working on the request "Workmanager: default, Version: 1, Scheduled=false, Started=true, Started time: 85443 ms
", which is more than the configured time (StuckThreadMaxTime) of "60" seconds. Stack trace:
 java.lang.Thread.sleep(Native Method)

 

After waiting about one minute, you will observe that WebLogic transitions into FAILED state as configured:

 

Wait another minute, then check the thread states under Server / ServerName / Monitoring / Threads which reveals the following:

 

So once the condition causing the stuck threads clears, the stuck threads disappear again! Stuck threads are not stuck forever. Phew!
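The behavior observed above (flagged while busy for longer than the threshold, cleared once the work completes) can be illustrated with a small watchdog. This is a hypothetical sketch of the general idea behind StuckThreadMaxTime, not how WebLogic implements its check: each request records its start time, a check lists the threads whose request has run longer than the threshold, and the entry is removed when the request finishes.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Hypothetical stuck-thread watchdog; illustrates the idea only, not WebLogic's implementation. */
public class StuckThreadWatchdog {

    private final long maxTimeMs;
    // thread name -> start time (ms) of the request that thread is working on
    private final Map<String, Long> running = new ConcurrentHashMap<>();

    StuckThreadWatchdog(long maxTimeMs) {
        this.maxTimeMs = maxTimeMs;
    }

    /** Wraps a unit of work so that its start and end are recorded. */
    Runnable track(Runnable work) {
        return () -> {
            String name = Thread.currentThread().getName();
            running.put(name, System.currentTimeMillis());
            try {
                work.run();
            } finally {
                running.remove(name);               // request done: condition cleared
            }
        };
    }

    /** The periodic check: threads whose current request exceeds maxTimeMs. */
    List<String> stuckThreads() {
        long now = System.currentTimeMillis();
        return running.entrySet().stream()
                .filter(e -> now - e.getValue() > maxTimeMs)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws InterruptedException {
        StuckThreadWatchdog watchdog = new StuckThreadWatchdog(100);   // a tiny "StuckThreadMaxTime"
        Thread worker = new Thread(watchdog.track(() -> {
            try {
                Thread.sleep(1_000);                                   // the "stuck" request
            } catch (InterruptedException ignored) {
                // not expected in this demo
            }
        }), "ExecuteThread-0");
        worker.start();

        Thread.sleep(500);
        System.out.println("while running: " + watchdog.stuckThreads());
        worker.join();
        System.out.println("after finish:  " + watchdog.stuckThreads());
    }
}
```

Run as is, the first check should list ExecuteThread-0 and the second an empty list, the same “stuck, then not stuck anymore” pattern as in the console.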

 

When should I use StuckThreadCount in the admin console or a Workmanager stuck-thread setting then?

Very good question. Use StuckThreadCount from the WebLogic admin console, or a <work-manager-shutdown-trigger> definition that moves the application into ADMIN mode, if you can react to the FAILED state.

Do not use StuckThreadCount if the threads might be doing something useful and you cannot react to the situation anyway. Obviously, transitioning into FAILED state and restarting WLS with the Node Manager is counterproductive if your threads are doing something useful.

 

 

More?

The following posting shows how simple tools like ps, top and jcmd can track down the exact line of Java code causing a thread to use a high amount of CPU. It uses exactly the same StuckThreadForFree application as this one.
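A rough in-process counterpart to the ps/top/jcmd approach is the standard ThreadMXBean, which reports per-thread CPU time. The following sketch (class and thread names are illustrative) picks the thread, other than the caller, with the most CPU time and prints its top stack frames; it assumes the JVM supports thread CPU time measurement, which the code checks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Plain Java SE sketch: find the thread with the most CPU time via ThreadMXBean. */
public class BusiestThreadFinder {

    /** ThreadInfo (up to 10 frames) of the thread, other than the caller, with the most CPU time. */
    static ThreadInfo busiestThread() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time is not supported by this JVM");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        long self = Thread.currentThread().getId();
        long bestId = -1;
        long bestCpu = -1;
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);     // nanoseconds, -1 if the thread has died
            if (id != self && cpu > bestCpu) {
                bestCpu = cpu;
                bestId = id;
            }
        }
        // getThreadInfo() returns null if that thread has terminated in the meantime
        return bestId < 0 ? null : mx.getThreadInfo(bestId, 10);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread calc = new Thread(() -> {
            long end = System.nanoTime() + 3_000_000_000L;           // ~3 s of trigonometry
            double acc = 0;
            long i = 0;
            while (System.nanoTime() < end) {
                acc += Math.sin(i++);
            }
        }, "busy-calc");
        calc.setDaemon(true);
        calc.start();
        Thread.sleep(1_000);                                         // let it burn some CPU

        ThreadInfo info = busiestThread();
        if (info != null) {
            System.out.println("busiest thread: " + info.getThreadName());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```

Note that getThreadCpuTime() returns cumulative CPU time since the thread started, so this finds the thread that has burned the most CPU overall rather than the one that is hottest right now; sampling twice and comparing the difference would get closer to what top shows.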

 

 

Comments

  1. Pierluigi says:

    in fact it has always been a mystery for me – and still is – under which circumstances a thread is declared as hogging… if I discover something I will let you know.

    • Pierluigi,

      thanks for your comment!

      the example with the app, which is as simple as it gets, shows: TIMED_WAITING or RUNNABLE for a bit more than one second (then it will be detected already).

      Let me know if you have another example – or if you come to other conclusions playing around with the app provided.

      best,

      Frank

  2. Excellent post! Very thorough and up to date for WLS 12c.
    The app is also very useful for studying Stuck Threads.
    Another way to create stuck threads, I think, would be the following: have my app deployed in WLS and connect my debugger. Set a breakpoint somewhere in my app and leave it paused there. This will also create a stuck thread (the thread that is serving my app at the moment). So I can simulate the behaviour of my WLS when my app is deployed and a stuck thread occurs.

    • Hi Spyros,

      I appreciate your comment a lot!
      What you suggest is a great way to verify that the configured measures (such as workmanager settings) for a deployed app are in fact working. Using a debugger is a nice idea.

      cheers,

      Frank

  3. Edinson says:

    Hi Frank, this article is awesome. I’m actually facing stuck thread problems on WebLogic too. One question: is it normal that a Webservice call gets into STUCK state and afterwards I am unable to use the Webservice?

    I ask because when I get one stuck thread for a Webservice call, and it never ends… I’m unable to use the Webservice again.

    This webservice is doing Database Operations that apparently blocks the normal execution.

  4. First, it is an awesome article. Many, many thanks for it.
    Can you please share the code for StuckThreadForFree.war to help us understand how to create stuck threads?
    Really appreciate your feedback on this.

  5. I tried to deploy the war file on WebLogic. It gives me the following error:

    Messages
    Message icon – Error Unable to access the selected application.
    Message icon – Error Exception in AppMerge flows’ progression
    Message icon – Error Exception in AppMerge flows’ progression
    Message icon – Error VALIDATION PROBLEMS WERE FOUND problem: cvc-enumeration-valid: string value '3.0' is not a valid enumeration value for web-app-versionType in namespace http://java.sun.com/xml/ns/javaee:
    Message icon – Error VALIDATION PROBLEMS WERE FOUND problem: cvc-enumeration-valid: string value '3.0' is not a valid enumeration value for web-app-versionType in namespace http://java.sun.com/xml/ns/javaee:

    Please explain how I can solve this issue.

    • Hello Sahil,

      Did you deploy on WebLogic 12c? It will only work with EJB 3.1 support as described in the text.
      Also I added a download link for the whole Netbeans project so you can experiment yourself.

      best,

      Frank

  6. A thread is flagged as hogger when it is running for a longer time than the average response time of all requests.

    regards,
    Dalton

    • Hi Dalton,

      thanks for this piece of information. The definition makes a lot of sense IMHO.
      If you don’t mind, I will update the original posting. Do you have any reference for it?

      cheers,

      Frank

  7. Hi Frank,

    The documentation at http://docs.oracle.com/cd/E25178_01/apirefs.1111/e13952/pagehelp/Corecoreserverservermonitorthreadstitle.html says the hogger status is “True if the execute thread is being hogged by a request for much more than the normal execution time, as automatically observed by the scheduler”.

    I also decompiled some classes to double check how it works… :)

    regards,
    Dalton

    • Dalton,

      cheers for your feedback – I enjoy this blog because of responses like this!
      The last time I was decompiling WLS classes, it was with a BEA consultant sitting next to me because we couldn’t agree on how something was supposed to work based on the documentation. So that was a while ago…

      best,

      Frank

  8. Hi Frank,

    Another interesting thing: when a thread is flagged as hogger, WebLogic lowers its priority, I guess assuming the thread is waiting for some I/O.
    The drawback: if you have a CPU-intensive task (like parsing a huge XML file), WLS makes things worse by lowering the thread priority. I don’t think it makes any difference with today’s processor speeds, but 10 years ago it was probably an issue.

    Dalton

  9. venkat siva says:

    Hi All,

    We created a WorkManager with the option “ignore stuck threads” and assigned it as the dispatch policy to each Proxy Service.

    But the Admin server health was OK for the first 20 minutes; after that it went into warning state due to “threadpool has stuck thread”.

    Please do the needful

    Thanks,
    Venkat

  10. Hi Frank,

    It is a great post! Do you have a version of StuckThreadForFree.war for WebLogic 10.3.3?

    Thanks

    • James,

      I appreciate your feedback, good that it is helpful for you.
      StuckThread was written based on EJB3.1 (which is only provided by WLS12c or Glassfish) and is using async methods to accomplish its functionality.
      At the moment I don’t have another implementation. If you are a developer you could download the Netbeans project and modify it yourself (making blocking calls, less pretty but still working).

      best regards,

      Frank

      ps. there are more discussions about these kinds of issues in my upcoming book. Make sure to stay connected in any way:
      http://www.facebook.com/WebLogicBook
      http://www.munzandmore.com/newsletter
      https://twitter.com/frankmunz

      Also I will keep posting recipes from the book once it is available!

  11. Thanks Frank for the detailed information on a topic which is generally not discussed clearly in any documentation. We were also facing the stuck thread warning, but the intended work of the thread was successful, so we were not sure what the cause was and how concerned we should be. This post solved the puzzle for us.
    Thanks again. Keep posting.

  12. Hi Frank,

    Hope you are able to talk now. Great post. I have to track down a “why is the server slow?” kind of issue, and I would like to use the stuck thread feature to identify slow threads. I would like to set the stuck thread timeout to 10 seconds to find out which operations are slow. Hopefully, I can get this information by using the stuck thread feature. What do you think?

    Thanks!

    -Shanti

  13. Hello,

    Just checking to see if my question came thru’. The window behaved a bit strangely.

    Thanks!
    -Shanti

  14. Hello,
    Is there a new war available for download? I see that the web.xml has a java.sun.com reference, and the deployment failed in WebLogic. Any idea about that?

  15. Frank,

    really awesome article. If the WebLogic log file doesn’t show the error for hogging threads, do you know which debug option would help get them logged?

    • not sure if I understand your question.
      - do you see the output in the terminal window but not in the log file?

      • I guess the question is: is there any way to log the details about hogging threads using some debug mechanism, such as a log4j.properties option?

  16. Awesome explanation. Thanks a lot!! Have bookmarked your site for future reference. Keep posting!!

  17. Kunal Parsewar says:

    hi,
    Thanx for this explanation. This post has answered many of my questions about threads in WebLogic.

  18. FRANK you helped me a lot with this post! Thank you!

  19. rizwan says:

    Hello Frank,

    We have a requirement to get an email alert whenever there is a stuck thread in the environment . My question is

    Do we have any tool in the WebLogic console or EM to do this? I came to know about WLDF. Can I do this with WLDF? What other options are available?

    Regards,
    Rizwan

Trackbacks

  1. [...] Content is in English, original source: http://www.munzandmore.com/2012/ora/weblogic-stuck-threads-howto Category: WebLogic  |  Comments (RSS)  |  Trackback [...]

  2. [...] Content is in English, original source: http://www.munzandmore.com/2012/ora/weblogic-stuck-threads-howto Tags: Stuck, WebLogic Category: WebLogic  |  Comment (RSS)  |  Trackback [...]
