DevOps and Middleware Engineering: WebLogic Troubleshooting Issues

Deployment issues:

Code issues: send the error log to the application team so that they can fix the code. Example:

Caused By: weblogic.utils.ErrorCollectionException:

There are 1 nested errors:

weblogic.j2ee.dd.xml.AnnotationProcessException: Duplicate ejb name 'BDAccountEjbBean' found: annotation 'Stateless' on bean

Deployment failed due to a connection pool issue: fix the connection pool problem, then redeploy the application.

Out of memory issue during deployment:

error: java.lang.OutOfMemoryError: PermGen space

This error occurs when the permanent generation (PermGen) area of the JVM runs out of space.

In setDomainEnv.sh:

-XX:PermSize=64m
-XX:MaxPermSize=128m

We set the initial PermSize equal to MaxPermSize, restarted the servers, and redeployed the application.
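
The setting above can be sketched as a setDomainEnv.sh fragment. This is a minimal sketch, assuming the domain passes JAVA_OPTIONS to the JVM (the GC logging example later in this post uses the same variable) and a JVM of Java 7 or earlier, where PermGen still exists. PERM_SIZE is a hypothetical helper variable and 128m is only an illustrative value.

```shell
# Sketch for setDomainEnv.sh: give PermGen the same initial and maximum size.
# PERM_SIZE is a hypothetical helper variable; choose a value that suits the application.
PERM_SIZE="128m"
JAVA_OPTIONS="${JAVA_OPTIONS} -XX:PermSize=${PERM_SIZE} -XX:MaxPermSize=${PERM_SIZE}"
export JAVA_OPTIONS
```

The servers have to be restarted for the new options to take effect.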

If one or two applications fail when deployment is triggered through a script, fix the issue and deploy those applications from the console.

jdbc issues:
************

1) DB down (raise a ticket to the DB team)
2) Incorrect hostname or port number (raise a ticket to the network team)
3) Database connection lost (test connectivity with: telnet ipaddress port)
4) Database user account locked (raise a ticket to the DB team to unlock the account)
5) Invalid package error (raise a ticket to the DB team)
6) TNS listener error (raise a ticket to the DB team)
7) Schema does not exist (raise a ticket to the DB team)
8) Cannot allocate resource error (the connection pool is exhausted; increase the maximum capacity)
   Initial capacity: 5
   Max capacity: 15, increase max to 25
9) Connection leaks (send the error to the application team)
10) Connection timeout (raise a ticket to the DB team to check for long-running queries)

jms issues:
***********
Stuck message issues:

Check whether the destination queue is available, and check the message format and the queue name.

Rolling message issues (messages are redelivered continuously in a loop):

Delete those messages from the queue.

diskspace issues:
*****************
If disk space usage is at 95%-100%, delete old log files.

[root@localhost ~]# df -kh
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 3.8G 1.9G 1.8G 52% /
/dev/sda1 46M 9.2M 35M 22% /boot
tmpfs 506M 0 506M 0% /dev/shm
/dev/sda3 14G 14G 0G 100% /home
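
The 95% check can be scripted. The function below is a sketch, not part of the original procedure: full_filesystems is a hypothetical name, it expects the single-line-per-filesystem layout that df -P produces, and it assumes mount points without spaces.

```shell
# Print the mount point of every filesystem at or above a usage threshold.
# Reads "df -P" style output on stdin; the threshold defaults to 95 (percent).
full_filesystems() {
    threshold="${1:-95}"
    awk -v limit="$threshold" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)                  # "100%" -> "100"
            if (use + 0 >= limit + 0) print $6 # column 6 is the mount point
        }'
}

# Typical use: df -P | full_filesystems 95
```

With the df output shown above, only /home would be reported.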

du -kh (disk usage)

[root@localhost ~]# du -sh /home
1.8G /home

[root@localhost bea10.3]# du -sh *
181M jdk160_05
211M jrockit_160_05
28K logs
100M modules
24K registry.dat
8.0K registry.xml
19M user_projects
556K utils
429M wlserver_10.3

Delete old (rotated) log files:

/home/bea10.3/user_projects/domains/sherkhan/servers/AdminServer/logs

rm -f AdminServer.log00001 AdminServer.log00002 AdminServer.log00003
rm -f AdminServer.out00001 AdminServer.out00002 AdminServer.out00003
rm -f access.log00001 access.log00002 access.log00003

/home/bea10.3/user_projects/domains/sherkhan/servers/ms1/logs

rm -rf ms1.log00001
rm -rf ms1.out00001
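
Rather than typing each file name, the rotated files can be listed by age first. This is a sketch under assumptions: list_old_rotated_logs is a hypothetical helper, the 7-day default is an illustrative retention period, and the name patterns assume the rotation suffixes shown above (for example AdminServer.log00001).

```shell
# List rotated WebLogic log files older than a given number of days.
# Arguments: log directory, age in days (default 7). Prints paths only; deletes nothing.
list_old_rotated_logs() {
    log_dir="$1"
    days="${2:-7}"
    find "$log_dir" -type f \
        \( -name '*.log[0-9]*' -o -name '*.out[0-9]*' \) \
        -mtime +"$days" -print
}

# Typical use: list_old_rotated_logs /path/to/servers/AdminServer/logs 7
```

Review the list before deleting or compressing anything. The active log files (for example AdminServer.log) have no numeric suffix and are not matched.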

Or compress the log files with gzip instead of deleting them:

/home/bea10.3/user_projects/domains/sherkhan/servers/AdminServer/logs

gzip -r *

/home/bea10.3/user_projects/domains/sherkhan/servers/AdminServer

gzip -r logs

High cpu utilization:
*********************

top (linux)
prstat (solaris)

top - 07:45:22 up 3:03, 3 users, load average: 0.16, 0.33, 0.17
Tasks: 113 total, 2 running, 109 sleeping, 0 stopped, 2 zombie
Cpu(s): 0.0%us, 0.7%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1035400k total, 1020348k used, 15052k free, 77688k buffers
Swap: 2040212k total, 0k used, 2040212k free, 483724k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9523 root 22 0 637m 239m 3660 S 98.7 23.7 0:12.79 java

ps -ef | grep 9523

If the zombie process count is above 50, raise a ticket to the Unix (Solaris) admins.

If any java process is using 95-100% CPU, check the log files for continuously looping messages or JDBC transaction timeouts.

Fix the problem, then kill the managed server with kill -9 <pid> and restart the server instance.
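
To pick out the busy java processes without reading the top screen by eye, the ps output can be filtered. A minimal sketch: hot_java_processes is a hypothetical helper, and it assumes the three-column layout of ps -eo pid,pcpu,comm. On Linux, ps reports CPU averaged over the life of the process, so top remains the better view of a short spike.

```shell
# Print "PID %CPU" for each java process at or above a CPU threshold.
# Reads "ps -eo pid,pcpu,comm" style output on stdin; threshold defaults to 95.
hot_java_processes() {
    threshold="${1:-95}"
    awk -v limit="$threshold" '
        NR > 1 && $3 == "java" && $2 + 0 >= limit + 0 { print $1, $2 }'
}

# Typical use: ps -eo pid,pcpu,comm | hot_java_processes 95
```

Each PID that is printed can then be matched to a managed server with ps -ef | grep <pid>.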

Application logs files not rotating issue:
******************************************

Check the disk space; if the disk is full, delete old logs.

Check whether the log4j properties file is on the classpath.

404 error:
**********

404 error: the page cannot be displayed.

HTTP/1.1 (RFC 2616, section 10.4.5) defines 404 Not Found as:

The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent.

Solution:

1) Check whether users are using the correct URL.
2) Check whether the Apache server is running (ps -ef | grep httpd) (ps -ef | grep -i apache)

Apache2.2/bin

httpd -k start
httpd -k stop

apachectl -k start
apachectl -k stop

3) Check the disk space on the Apache server; if it is full, delete old log files (df -kh)

Go to Apache2.2/logs

Delete old logs

4) Check whether the deployed application is in the Active state.
5) If the deployed application has failed, fix the issue and redeploy the application.

500 error:
**********

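Service unavailable (strictly, HTTP 500 is Internal Server Error and 503 is Service Unavailable; a proxy may return either when the backend server is down)

This error usually means a server is down.

Check the Apache and WebLogic server instances; if a server is down, start it.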

403 error:
**********

Access forbidden

Check whether the proxy mapping is correct.

Check that the grants and synonyms were applied correctly on the database side.

Check whether the user has access to the application.

issue:replicas.prop file corrupted
**********************************

<BEA-000386> Server subsystem failed. Reason: java.lang.NumberFormatException: null
java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:417)
at java.lang.Integer.parseInt(Integer.java:499)
at weblogic.ldap.EmbeddedLDAP.validateVDEDirectories(EmbeddedLDAP.java:1097)
at weblogic.ldap.EmbeddedLDAP.start(EmbeddedLDAP.java:242)
at weblogic.t3.srvr.SubsystemRequest.run(SubsystemRequest.java:64)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:207)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:176)

This mostly happens when the LDAP files under the ../domain-name/servers/AdminServer/data/ldap/ directory are corrupted. A common cause is a full disk: when the volume holding the domain is 100% full, WebLogic Server can corrupt these files.

Solution:

Free up disk space first, then remove the ../domain-name/servers/AdminServer/data/ldap/conf/replicas.prop file and restart the Admin Server. The server should then start normally.

rm -f /home/bea10.3/user_projects/domains/sherkhan/servers/AdminServer/data/ldap/conf/replicas.prop

Error: unable to obtain lock file

<May 13, 2012 8:34:54 PM IST> <Critical> <WebLogicServer> <BEA-000362> <Server failed. Reason:

There are 1 nested errors:

weblogic.management.ManagementException: Unable to obtain lock on C:\bea10.3\user_projects\domains\sherkhan\servers\AdminServer\tmp\AdminServer.lok. Server may already be running
at weblogic.management.internal.ServerLocks.getServerLock(ServerLocks.java:159)
at weblogic.management.internal.ServerLocks.getServerLock(ServerLocks.java:58)
at weblogic.management.internal.DomainDirectoryService.start(DomainDirectoryService.java:73)
at weblogic.t3.srvr.ServerServicesManager.startService(ServerServicesManager.java:459)
at weblogic.t3.srvr.ServerServicesManager.startInStandbyState(ServerServicesManager.java:164)
at weblogic.t3.srvr.T3Srvr.initializeStandby(T3Srvr.java:711)
at weblogic.t3.srvr.T3Srvr.startup(T3Srvr.java:482)
at weblogic.t3.srvr.T3Srvr.run(T3Srvr.java:440)
at weblogic.Server.main(Server.java:67)

>
<May 13, 2012 8:34:54 PM IST> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to FAILED>
<May 13, 2012 8:34:54 PM IST> <Error> <WebLogicServer> <BEA-000383> <A critical service failed. The server will shut itself down>
<May 13, 2012 8:34:54 PM IST> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to FORCE_SHUTTING_DOWN>

If the server is already running, ignore this error.

If the server is not running and cannot be started, delete the stale .lok file and restart the server:

del C:\bea10.3\user_projects\domains\sherkhan\servers\AdminServer\tmp\AdminServer.lok

(On Unix: rm -f <domain-home>/servers/AdminServer/tmp/AdminServer.lok)

Error: Users intermittently get a 404 error, but can access the application at other times.
*****

1) Check whether all managed servers are in the RUNNING state.

If one of the managed servers is shut down, bring it up.

2) Check the HTTP requests in the access.log file of every managed server.

If one managed server's access log shows 404 responses, check that server's log for errors.

In my case the server log showed:

java.net.BindException: Address already in use

netstat -anp | grep 8002

If another process is listening on the port, restart the managed server.

If the issue still persists, raise a request to the network team.
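
To see which process owns the listening socket, the netstat output can be narrowed to LISTEN entries for the port. A sketch only: listeners_on_port is a hypothetical helper and it assumes the Linux netstat -anp column layout, where the last column is PID/program.

```shell
# Print the "PID/program" column for sockets listening on a given port.
# Reads Linux "netstat -anp" style output on stdin.
listeners_on_port() {
    port="$1"
    awk -v port="$port" '
        $6 == "LISTEN" {
            n = split($4, parts, ":")   # local address; the port is the last piece
            if (parts[n] == port) print $7
        }'
}

# Typical use (as root, so that the PID column is filled in):
#   netstat -anp | listeners_on_port 8002
```

Unlike a plain grep 8002, this ignores established connections and remote ports that happen to contain the same digits.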

slow response:
**************
Check the status of all WebLogic servers; bring up any that are down.
Check network handshake requests in the application logs. If you find network-related issues, raise a request to the network team.
Check for stuck threads in WebLogic. If you find any, take a thread dump and analyze it.
Check CPU usage of the java processes.
Check the WebLogic server heap usage in the GC log or in the console.
If heap usage is above 80%, take a heap dump and send it to L3 support.
Check the number of users logged in to the application.
Check for long-running queries on the database side.
Check for latency on the database side.
Check the GC logs for memory leaks.
Check for connection leaks on the WebLogic server side.
Check disk space on the WebLogic Unix machine.
Check disk space on the Apache server.

Application slowness issue

1) Check the Apache service instance: ps -ef | grep apache

   Check CPU usage and memory usage on the Apache host.

2) If Apache looks fine, check the WebLogic server side.

   Check CPU usage and memory usage.

   If system (non-java) processes are consuming the resources, raise a request to the Unix team.
   If java processes are using a lot of CPU (more than 90%), check the WebLogic managed server instances.

Check for stuck threads.

Take thread dumps with the kill -3 <pid> command and analyze them.

We analyze thread dumps with the Samurai tool.

Threads may be stuck at the application level. If a thread is stuck because of an application issue, check the application logs and send them to the application team.

Normally only one or two threads are waiting on the application server to finish processing a request. If many threads are waiting in this state, there could be an issue with long-running SQL sessions or database locks.

If many threads show the same stack, there could be an issue with the proxy server configuration.
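
Before opening a dump in a viewer, a quick count of stuck threads shows whether the dump is worth a closer look. A minimal sketch: count_stuck_threads is a hypothetical helper, and it assumes the thread dump labels stuck execute threads with the "[STUCK] ExecuteThread" prefix in the thread name.

```shell
# Count threads marked as stuck in a thread dump.
# Reads the dump text on stdin (for example the server .out file after kill -3 <pid>).
count_stuck_threads() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the function's status clean.
    grep -c '\[STUCK\] ExecuteThread' || true
}

# Typical use: count_stuck_threads < ms1.out
```

Taking several dumps a few seconds apart and comparing the counts helps separate threads that are really stuck from threads that were only busy at that moment.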


Out of memory issues

The error appears in the log file. In production we get email alerts.

java.lang.OutOfMemoryError

Check the heap size.

How to enable garbage collection logging (Windows syntax, as used in setDomainEnv.cmd; setDomainEnv.sh takes the same JVM flags):

set JAVA_OPTIONS=%JAVA_OPTIONS% -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:C:\bea\user_projects\domains\wlsproxy\servers\AdminServer\logs\gc

Check the garbage collection logs.
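
On Unix the same flags go into setDomainEnv.sh using shell syntax. This is a sketch under assumptions: GC_LOG is a hypothetical variable, the path is only an example, and the flags are the HotSpot ones from the line above (Java 8 and earlier; Java 9 and later use -Xlog:gc instead).

```shell
# Sketch for setDomainEnv.sh (Unix): HotSpot GC logging flags.
# GC_LOG is a hypothetical path; point it at a directory the server can write to.
GC_LOG="${DOMAIN_HOME:-.}/servers/AdminServer/logs/gc.log"
JAVA_OPTIONS="${JAVA_OPTIONS} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:${GC_LOG}"
export JAVA_OPTIONS
```

The server has to be restarted before the GC log file appears.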

There are two types of garbage collection:

1) Minor collection: happens only in the young generation.
2) Major (full) collection: covers both the young and old generations.

As a rough guide, a minor collection takes about 2 milliseconds and a major collection takes 3-5 seconds. If a full GC takes more than 10 seconds, it can affect users, because no requests are processed while the GC is running.

If you see abnormal behavior in the GC log, take a heap dump and analyze it.

If garbage collection is not happening at all, or the JVM crashes, contact the JVM vendor.

If we find memory leaks on the application side, we send the request to L3 support.

Determine whether it is a Java OOM or a native OOM:
If the stdout/stderr message says java.lang.OutOfMemoryError, it is a Java OOM.
If the stdout/stderr message says that the JVM failed to acquire memory, it is a native OOM.
Note that these messages go to stdout or stderr, not to application-specific log files such as weblogic.log.

[memory ] 7.160: GC 131072K->130052K (131072K) in 1057.359 ms

The format of the above output is as follows (the same format is used throughout this section):
[memory ] <start>: GC <before>K-><after>K (<heap>K) in <total> ms
[memory ] <start> - start time of collection (seconds since JVM start)
[memory ] <before> - memory used by objects before collection (KB)
[memory ] <after> - memory used by objects after collection (KB)
[memory ] <heap> - size of heap after collection (KB)
[memory ] <total> - total time of collection (milliseconds)
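
The fields above can be pulled out of each line to show how much a collection reclaimed and how full the heap was afterwards. A sketch only: summarise_gc is a hypothetical helper and it assumes lines shaped exactly like the sample "[memory ] 7.160: GC 131072K->130052K (131072K) in 1057.359 ms".

```shell
# Summarise JRockit-style verbose GC lines read from stdin.
# For each matching line, print: start time, KB reclaimed,
# heap usage after the collection (%), and the pause in ms.
summarise_gc() {
    awk '
        / GC [0-9]+K->[0-9]+K / {
            split($5, kb, "K->")          # "131072K->130052K" -> "131072", "130052K"
            before = kb[1] + 0
            after  = kb[2] + 0            # numeric conversion drops the trailing "K"
            heap = $6
            gsub(/[^0-9]/, "", heap)      # "(131072K)" -> "131072"
            start = $3
            sub(/:$/, "", start)          # "7.160:" -> "7.160"
            printf "%s %d %.1f %s\n", start, before - after, 100 * after / heap, $8
        }'
}

# Typical use: summarise_gc < gc.log
```

For the sample line this gives 1020 KB reclaimed and a heap that is still about 99% full after a pause of more than one second, which is the pattern to look for when the heap is too small or the application is leaking.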

1) If the GC cycle doesn’t happen before java OOM, then it is a JVM bug

Full compaction:

Make sure that the JVM compacts the heap properly and that memory is not fragmented; fragmentation can prevent large objects from being allocated and trigger a Java OOM error.

Java objects need contiguous memory. If the available free memory is fragmented, the JVM cannot allocate a large object, because it may not fit in any of the free chunks. In this case the JVM should do a full compaction so that more contiguous free memory is formed to accommodate large objects.

Compaction involves moving objects (data) from one place to another in the Java heap and updating the references to those objects to point to the new location. JVMs may not compact all objects unless there is a need; this reduces the pause time of the GC cycle.

We can check whether a Java OOM is due to fragmentation by analyzing the verbose GC messages. If you see output similar to the following, where the OOM is thrown even though free Java heap is available, then it is due to fragmentation:
[memory ] 8.162: GC 73043K->72989K (131072K) in 12.938 ms
[memory ] 8.172: GC 72989K->72905K (131072K) in 12.000 ms
[memory ] 8.182: GC 72905K->72580K (131072K) in 13.509 ms

java.lang.OutOfMemoryError

In the above case the max heap specified was 128 MB, and the JVM threw the OOM when actual memory usage was only 72580K, that is, about 55% of the heap. Fragmentation therefore caused an OOM while about 45% of the heap was still free. This is a JVM bug or limitation. You should contact the JVM vendor.

If the JVM does its work properly (all the things mentioned in the steps above), then the Java OOM could be an application issue. The application might be leaking Java memory constantly, or it may simply hold more live objects and need more Java heap.

Increase the Java heap - Try increasing the Java heap, if possible, to see whether that solves the problem.
Workaround - As a temporary workaround, the application can be gracefully restarted when Java heap usage goes above 90%. When following this workaround, set the max heap as high as possible so that the application takes longer to fill the heap. Java heap usage can be monitored by adding the '-verbosegc' flag to the java command line, which sends GC and heap usage information to stdout or stderr.

Use the jmap command (HotSpot JDK) to capture a heap dump, for example: jmap -dump:format=b,file=heap.hprof <pid>

For a native OOM problem

1. Collect the following information:
a. -verbosegc output to monitor Java heap usage. This helps to understand the Java memory requirement of the application.
It should be noted that, independent of the actual Java heap usage by the application, the max heap specified (using the -Xmx flag on the java command line) is reserved at JVM startup, and this reserved memory is not available for any other purpose. With JRockit, use -verbose instead of -verbosegc, as this gives codegen information in addition to GC information.
b. Record the process virtual memory size periodically, from the time the application is started until the JVM runs out of native memory. This helps to understand whether the process really hits the process size limit of the OS.
On Windows, use the following procedure to monitor the virtual process size:
In the Start -> Run... dialog, enter “perfmon” and click OK.
In the “Performance” window that pops up, click the ‘+’ button (above the graph).
Select the following options in the resulting dialog:
Performance object: Process (not the default Processor)
Select counter from list: Virtual Bytes
Select instances from list: select the JVM (java) instance
Click “Add”, then “Close”.

On Unix or Linux, the virtual memory size for a given PID can be found with: ps -p <PID> -o vsz. On older Linux systems, each Java thread within a single JVM instance is shown as a separate process; it is enough to take the PID of the root java process. The root java process can be found using the --forest option of ps. For example, ps -lU <user> --forest prints an ASCII tree of all processes started by the specified user, and the root java process can be read from that tree.
2. Memory availability in the machine
If the machine does not have enough RAM and swap space, the OS cannot give more memory to the process, which can also result in out of memory. Make sure that the sum of RAM and swap space is sufficient for all the processes running on the machine.
3. Tuning the Java heap
If Java heap usage is well within the max heap, reducing the max heap gives more native memory to the JVM. This is a workaround rather than a solution. Since the OS limits the process size, we need to strike a balance between the Java heap and the native heap.
4. Native memory usage by the JVM
Native memory usage by the JVM is expected to flatten out after all classes are loaded and the methods have been called (code generation is over). For most applications this happens within the first few hours. After that, the JVM uses only a little more native memory, for run-time class loading, code generation due to optimization, and so on. To narrow down the problem, try disabling run-time optimizations and check whether that makes any difference.
o With JRockit, the -Xnoopt flag disables run-time optimizations.
o With the Sun HotSpot JVM, the -Xint flag forces the JVM to run in interpreted mode (no code generation).
If native memory usage continues to grow throughout the run, this could be a memory leak in native code.
5. Third-party native modules or JNI code in the application
Check whether you are using any third-party native module, such as a native database driver. These modules can also allocate native memory, and the leak may come from them. To narrow down the problem, try to reproduce it without the third-party modules; for example, use pure Java drivers instead of native database drivers. Also check whether your application uses JNI code. This could be causing a native memory leak as well, so try running the application without the JNI code if possible.
6. If the source of the native memory usage cannot be found after the above steps, work with the JVM vendor to get a special build that can trace native memory allocation calls and give more information about the leak.
HP JVM specific tools/tips
The HP JVM tools/tips page gives some tools and tips specific to OOM situations with the HP JVM.
JRockit specific features
JRockit 8.1 SP1 and above supports JRA (Java Runtime Analyzer) recordings. A recording gathers information at JVM run time, such as the number of GCs, the number of soft/weak/phantom references, and hot methods. If the JVM has performance or hang problems, it is useful to make a recording for a few minutes and analyze the data. More details can be found in the JRockit docs.

servers are running in admin mode
*********************************
Servers start in ADMIN mode when there is a deployment or connection pool issue.

Fix the deployment or JDBC problem, then resume the servers.

Failed_not_restartable mode
***************************
If the disk is full, servers go into the FAILED_NOT_RESTARTABLE state.

stack overflow error:
*********************
If you see a stack overflow error (java.lang.StackOverflowError) in the log file, restart the server or increase the thread stack size with the -Xss flag, for example -Xss1024k (or -Xss2048k).



Tuesday, 24 July 2012
