ObjectDB ObjectDB

Issue #429: Server out of memory exceptions

Type: Bug ReoprtVersion: 2.2.8Priority: NormalStatus: FixedReplies: 22
#1

I've started to get memory exceptions on objectdb instances running as a server. The objectdb log reports this as:

[2011-07-29 23:28:41 #20 server]
java.lang.OutOfMemoryError: Java heap space

[2011-07-29 23:28:41 #21 server]
java.lang.OutOfMemoryError: Java heap space
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
at com.objectdb.o.NTS.z(NTS.java:314)
at com.objectdb.o.NTS.y(NTS.java:274)
at com.objectdb.o.HND.run(HND.java:128)
at java.lang.Thread.run(Thread.java:595)

[2011-07-30 19:53:24 #22 store]
Database 'F:\calculationservice\objectdb\db-files\WorkflowHistory.odb' is closed by 14140@SE101924

[2011-07-30 19:54:36 #23 server]
[ObjectDB 2.2.8] Unexpected exception (Error 990)
  Generated by Java HotSpot(TM) 64-Bit Server VM 1.5.0_21 (on Windows 2003 5.2).
Please report this error on http://www.objectdb.com/database/issue/new
com.objectdb.o.InternalException: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space

and:

[2011-07-31 22:49:01 #18 tool]
java.lang.OutOfMemoryError: Java heap space

The server manages 2 "queue" databases used for communication between processes.

One of the databases has 2 writers and 1 reader. The other has a single writer and single reader. The message size going into the queue is not large - around 4k at most. Each read/write operation is done via a new EntityManager which is closed straight afterwards.

The server is running with settings -Xms512m -Xmx1g. I have scope to increase these limits but would like to get an understanding of where the memory is being used. In each exception case the server has typically been running for 3-4 days without issue.

edit
delete
#2

The stack trace doesn't provide much information.

Could you get heap dump snapshots - after one day, two days, three days - in order to detect memory leakage if any.

ObjectDB Support
edit
delete
#3

Unfortunately the environments showing the problem are production so we have limited access.

I've tried to recreate this in a test environment but have had no luck as yet - I'll update as soon as i have something. I've included the stack trace from the latest incident below just in case it sheds any light on the problem.


[2011-08-02 18:39:59 #18 server]
[ObjectDB 2.2.8] Unexpected exception (Error 990)
  Generated by Java HotSpot(TM) 64-Bit Server VM 1.5.0_21 (on Windows 2003 5.2).
Please report this error on http://www.objectdb.com/database/issue/new
com.objectdb.o.InternalException: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
at com.objectdb.o.PAG.<init>(PAG.java:169)
at com.objectdb.o.PGC.x(PGC.java:274)
at com.objectdb.o.SSS.V(SSS.java:441)
at com.objectdb.o.SSS.W(SSS.java:470)
at com.objectdb.o.PGW.az(PGW.java:527)
at com.objectdb.o.SPW.aM(SPW.java:350)
at com.objectdb.o.SPW.aJ(SPW.java:266)
at com.objectdb.o.SPW.ar(SPW.java:231)
at com.objectdb.o.PGW.am(PGW.java:201)
at com.objectdb.o.UPT.B(UPT.java:134)
at com.objectdb.o.UPT.l(UPT.java:110)
at com.objectdb.o.TSK.i(TSK.java:146)
at com.objectdb.o.TSK.f(TSK.java:95)
at com.objectdb.o.UPT.r(UPT.java:157)
at com.objectdb.o.PGT.q(PGT.java:109)
at com.objectdb.o.UPT.B(UPT.java:121)
at com.objectdb.o.URT.l(URT.java:171)
at com.objectdb.o.TSK.i(TSK.java:146)
at com.objectdb.o.TSK.f(TSK.java:95)
at com.objectdb.o.TSM.e(TSM.java:87)
at com.objectdb.o.UTT.z(UTT.java:362)
at com.objectdb.o.UTT.l(UTT.java:218)
at com.objectdb.o.TSK.i(TSK.java:146)
at com.objectdb.o.TSK.f(TSK.java:95)
at com.objectdb.o.TSM.e(TSM.java:87)
at com.objectdb.o.MST.UY(MST.java:1174)
at com.objectdb.o.WRA.UY(WRA.java:365)
at com.objectdb.o.WSM.UY(WSM.java:174)
at com.objectdb.o.WRA.UY(WRA.java:365)
at com.objectdb.o.WSN.UY(WSN.java:650)
at com.objectdb.o.STC.v(STC.java:485)
at com.objectdb.o.SHN.al(SHN.java:542)
edit
delete
#4

Ok, the attached test (testObjectDbPipe) seems to cause the problem.

Basically I'm creating 2 server based queues, each with 5 writers and 1 reader. The msg size being saved should be around 2mb. The server is running with -Xms256m -Xmx256m, so lower than my real environment, but I think this should still run ok as at most we should have 12 clients loading a 2mb msg - so 24mb raw data size? (Let me know if I've got this wrong though!) The test dies after a few minutes of running.

I've attached the code, the multithreaded unit test library i'm using and the server logfile.

 

edit
delete
#5

Could you please explain how this test uses threads? It seems that every few seconds all the existing threads are ended and then more threads are run (i.e. > 60 threads after about 2 minutes). Each thread manages its own EntityManager, which may consume a lot of memory.

ObjectDB Support
edit
delete
#6

The aim of the test is to replicate application startup & shutdown cycles. Each cycle of the outer loop should create 10 writer threads (5 for each database) and 2 reader threads (1 for each database). Each thread should read or write 10 msgs and then exit. When all threads have exited (i.e. the application shutdown) the loop restarts.

There shouldn't be more than 12 threads accessing the server at anyone time - if you're seeing more then it sounds like the threads are not exiting and this is a problem with the test. I'll check this...

Are there additional cleanup steps that I should be taking to dispose of EntityManagers then? I actually thought it would be creating more than one EntityManager per thread as the createEntityManager call is made as every object is read or saved.

To be honest this test came about by fishing for the problem rather than a structured replication. As the real problem only presents itself after a few days the test is deliberately more extreme to try and force the issue out. The application does similar things with objectdb in embedded mode without any issues so it looks to be related to something which is different in server mode - connection management seemed like a good place to start.

edit
delete
#7

Sorry! - I've double checked the test and it is bad. I'm not clearing the thread list on each iteration so you're right - the number of threads is growing each time

I've changed it to clean up properly and also to open/close the EntityManagerFactory in the loop. It seems to run ok now.

Back to the drawing board...

edit
delete
#8

Consider running Java Visual VM on the production computer, just to take heap snapshots.

Heap dumps after a few days of running the system can provide the missing clue.

ObjectDB Support
edit
delete
#9

I'm working on it but even when there are problems it is tough to get this kind of thing through :(

On a brighter note, I think I might have actually replicated it now...

The attached test (testObjectDbPipe) has 10 threads which write msgs to database "test1" and 5 threads which reads msgs from database "test1" and writes them to database "test2".
The msgs are approx 2mb in size so at one msg per thread there should be around 30mb of data in play at any one time.

I've run the test with -Xms256m -Xmx512m and the server with -Xms256m -Xmx256m

After running for about 15mins I get out of memory exceptions

The test fails faster (in about 2 mins) if I enable another 5 threads to read from test1 and write to test2 - I've left this commented out in the attachment.

If I remove the save into test2 from the reader/writer threads (so it just reads the msg and discards) then the test runs fine.

edit
delete
#10

I ran the new test and took heap dump snapshots using the Java VisualVM. After some activity the test became inactive, maybe because of the VisualVM interference, but the heap dump reveals a general picture.

Memory usage on the client side was stable. Memory usage on the server side became higher mainly due to increasing page cache size. I see that the test uses 4 different databases (2 large and 2 small). The default size of each page cache (per database) is 64MB, so the cache of the two large databases consumes at least 128MB (probably a little more). Other caches (page caches of the 2 other databases, query results caches, etc.) consume additional memory, so server side OutOfMemoryError when heap size is limited to 256MB may be reasonable.

Try to increase the maximum heap size and reduce cache sizes (not just the page cache) in the configuration.

ObjectDB Support
edit
delete
#11

Fair point - I hadn't considered the caches.

I've rerun the test after setting the cache (page & results) size to 0mb. At first it looked ok but when I bumped up the msg size to 5mb and left it running for 25mins I got another out of memory exception on the server. I haven't been able to reproduce it consistently though as it worked ok for a hour on a subsequent rerun.

I'll let you know if I ever pin this down...

edit
delete
#12

I've attached a logfile from one of today's tests that cause an oom exception. The best I can tell is that the problem occurs after I restart the server & test without clearing the databases,

i.e. I let the test run for 20mins; stopped the test; restarted the server; restarted the test; the exception came after running for 10mins.

Is it possible something could be corrupted during the restart? Is some state held over between service runs?

 

edit
delete
#13

I've also started to get oom exceptions in an embedded database:

[ObjectDB 2.2.8] javax.persistence.RollbackException
Failed to commit transaction: Attempt to commit a transaction after failure in writing to database file 'F:\calculationservice\objectdb\dbs\admin\WorkflowHistory.odb' (error 613)
at com.objectdb.jpa.EMImpl.commit(EMImpl.java:277)
at rbccm.felix.gridservice.admin.dao.SimpleWorkflowInstanceDao.saveWorkflow(Unknown Source)
at rbccm.felix.gridservice.admin.dao.SimpleWorkflowInstanceDao.SaveWorkflow(Unknown Source)
at rbccm.felix.gridservice.admin.AdminMessageService.process(Unknown Source)
at rbccm.felix.framework.service.ServiceRunner.run(Unknown Source)
at java.lang.Thread.run(Thread.java:595)
Caused by: javax.persistence.PersistenceException: com.objectdb.o.UserException: Attempt to commit a transaction after failure in writing to database file 'F:\calculationservice\objectdb\dbs\admin\WorkflowHistory.odb'
at com.objectdb.o._PersistenceException.b(_PersistenceException.java:47)
at com.objectdb.o.JPE.g(JPE.java:140)
at com.objectdb.o.JPE.g(JPE.java:78)
... 8 more
Caused by: com.objectdb.o.UserException: Attempt to commit a transaction after failure in writing to database file 'F:\calculationservice\objectdb\dbs\admin\WorkflowHistory.odb'
at com.objectdb.o.MSG.d(MSG.java:74)
at com.objectdb.o.SFL.ad(SFL.java:822)
at com.objectdb.o.MST.ad(MST.java:1249)
at com.objectdb.o.MST.UY(MST.java:1164)
at com.objectdb.o.WRA.UY(WRA.java:365)
at com.objectdb.o.WSM.UY(WSM.java:174)
at com.objectdb.o.OBM.bI(OBM.java:904)
at com.objectdb.o.OBM.bG(OBM.java:809)
at com.objectdb.o.OBM.bE(OBM.java:715)
at com.objectdb.jpa.EMImpl.commit(EMImpl.java:274)
... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at com.objectdb.o.HTB.f(HTB.java:86)
at com.objectdb.o.HTB.q(HTB.java:399)
at com.objectdb.o.PGC.B(PGC.java:402)
at com.objectdb.o.SFL.ah(SFL.java:990)
at com.objectdb.o.SFL.run(SFL.java:898)
... 1 more

In this case it is a normal object database (i.e. no queue behaviour). The app has been running for at least a couple of weeks and the db is approx 1.2gb. It only seems to have impacted the objectdb operations (not sure why or how) - other functionality in the app seems to be working ok.

(Apologies for sending another stack trace without much detail - I'm working on it...)

edit
delete
#14

You have to provide heap dump snapshots.

Most memory problems cannot be diagnosed using stack traces.

Getting heap dumps should not be so complex.

ObjectDB Support
edit
delete
#15

Unfortunately everything at this bank is complex...

Fortunately, I have a dump from the most recent incident (we already have oom dumps switched on for the app itself). Unfortunately the file is 2g so I have no way of getting it to you. Some of the info (from Eclipse Memory Analyser) is below & attached. Please let me know if there is anything specific I should look for.

Problem Suspect 1:

One instance of "com.objectdb.o.MST" loaded by "sun.misc.Launcher$AppClassLoader @ 0x853f0440" occupies 375,301,672 (17.92%) bytes. The memory is accumulated in one instance of "com.objectdb.o.MST" loaded by "sun.misc.Launcher$AppClassLoader @ 0x853f0440".

Problem Suspect 2:

598,700 instances of "com.objectdb.o.PAG", loaded by "sun.misc.Launcher$AppClassLoader @ 0x853f0440" occupy 1,436,097,296 (68.58%) bytes. These instances are referenced from one instance of "com.objectdb.o.PAG[]", loaded by "sun.misc.Launcher$AppClassLoader @ 0x853f0440"

edit
delete
#16

This heap dump indicates a problem with the page caches. With cache size of 64MB (assuming you use the default) and page size of 2KB (the default) - you should only have 32K PAG instances per open database file.

How many open databases do you use concurrently? Following the thread dump I count about 9 open databases.

For 10 open databases you should have about 320K PAG instances. Additional PAG instances are created for operations, and because your objects are large, operations manage many pages.

Still, 598K PAG instances seems suspicious and it might be an ObjectDB bug.

You can upload the full heap dump (in a zip version) to ftp objectdb.com, user=public, password=public.

 

ObjectDB Support
edit
delete
#17

I'm unable to access the ftp site as the firewall here is blocking it - instead I've uploaded a multi-part rar file (14 parts) via a few support tickets - hope thats ok & apologies for the inconvinience.

The app is connecting to quite a few databases so it may be that the problem boils down to the cache/config settings after all. Is there anyway to specify cache sizes on a per-database level? Most of the "queue" databases we use probably don't need any cache but it would be nice to keep it for the proper queryable databases. (If you think this is the root of the problem then I can switch it off across the board).

To answer your questions - yes, we are using the default settings. The databases are as follows with a "queue" database translating into 2 physical databases (one small for queryable metadata, one larger for msg data):

3x standard object databases (embedded)

3x queue databases (embedded)

3x queue databases (server mode)

So in total we have 9 embedded database and 6 server based databases.

Thanks for your help and patience on this as I know its been frustrating with the lack of information!

edit
delete
#18

Currently the cache size setting is global so all the databases per JVM get the same cache size (it makes sense to make the configuration more flexible in the future).

You may try reducing the cache size from the default 64MB to 8MB - it can solve the problem.

The actual cache size might be larger than the size in the configuration in some cases. I will check the heap dump to see if it indicates an unexpected state.

Thank you for this report and for the data - it helps in improving ObjectDB.

 

ObjectDB Support
edit
delete
#19

Ok, I've also attached a heap dump for the local test issue I was seeing yesterday in case its a separate problem. This involved restarting the test readers & writers from #9 above a few times whilst leaving the server running in the background. The server was throttled to 256mb but cache sizes were set to 0mb.

(I'll post the dump via support tickets again as this page complains about the size - only 5 files this time but still 130mb)

edit
delete
#20

The JVM heap dump reveals a memory leakage in the query program cache.

Please try build 2.2.9_02 that should fix this.

If it doesn't - try to disable the query program cache (and maybe also the query result cache).

Thank you for this important report and heap dump data.

ObjectDB Support
edit
delete
#21

Any feedback regarding the new build that may fix the memory leak would be appreciated.

ObjectDB Support
edit
delete
#22

So far so good.

I've haven't been able to recreated the problem locally with the new build (although my local test was never that scientific anyway) .

The new build was deployed to the environments that we've seen the problem last weekend and as yet there has been no re-occurance. The issue took just over a week to occur last time so it is a little early to say that it is resolved completely but it looks promising.

edit
delete
#23

Thank you for the update. This issue will be marked now as fixed.

Reopen it or create a new issue if necessary.

ObjectDB Support
edit
delete

Reply

To post on this website please sign in.