Not releasing locked MST object

#1

Hi,

we have an issue, that upon a call to

EMImpl.commit

while holding several locks the processing thread was trying to join another thread and it got stuck in that state for hours.

"WebSocket-Worker-169": inconsistent?, holding [0x00007f4bb0e6f200, 0x00007f609be186b0, 0x00007f15a057ea58, 0x00007f4bb0e6f928, 0x00007f4bb0e6faa0, 0x00007f4bb0e6fac0]

at java.lang.Object.wait0(java.base@21.0.1/Native Method)

at java.lang.Object.wait(java.base@21.0.1/Object.java:366)

at java.lang.Thread.join(java.base@21.0.1/Thread.java:2078)

at java.lang.Thread.join(java.base@21.0.1/Thread.java:2154)

at com.objectdb.o.IXA.i(IXA.java:175)

at com.objectdb.o.IXM.YN(IXM.java:101)

at com.objectdb.o.MST.X(MST.java:1217)

at com.objectdb.o.MST.ZB(MST.java:1204)

at com.objectdb.o.WRA.ZB(WRA.java:215)
at com.objectdb.o.TYS.f(TYS.java:843)

at com.objectdb.o.TYS.ZS(TYS.java:644)

at com.objectdb.o.TYM.Z(TYM.java:570)

at com.objectdb.o.TYM.j(TYM.java:500)

at com.objectdb.o.TYM.S(TYM.java:868)

at com.objectdb.o.TYM.R(TYM.java:962)

at com.objectdb.o.EPR.YO(EPR.java:85)

at com.objectdb.o.MPT.visitRefs(MPT.java:176)

at com.objectdb.o.TVS.e(TVS.java:170)

at com.objectdb.o.TVS.cascade(TVS.java:157)

at com.objectdb.o.STA.C(STA.java:539)

at com.objectdb.o.STM.a(STM.java:413)

at com.objectdb.o.OBM.aG(OBM.java:970)

at com.objectdb.jdo.PMImpl.aG(PMImpl.java:2277)

at com.objectdb.o.OBM.aU(OBM.java:881)

at com.objectdb.o.OBM.aH(OBM.java:790)

at com.objectdb.jpa.EMImpl.commit(EMImpl.java:292)

 

The lock was on MST object and it was blocking among others these threads:

0x00007f609be186b0
MST

Held by:
WebSocket-Worker-169

10 threads waiting to take lock:
ODB-FileWriter
ODB-IndexActivation

Because of this lock all DB operations were only waiting and we could fix it only via restart of whole application.

Could You describe why is this lock a bottleneck in ObjectDB and how could the thread owning it got stuck for hours ?

We are running ObjectDB 2.8.9.b04 in embedded mode.

Thx

 

 

#2

To understand the cause of this deadlock a full thread dump of the involved threads would be needed. This should also show locks that WebSocket-Worker-169 waits for and which threads hold them. When a circular lock is found, we should be able to fix the deadlock.

If you can verify that the issue exist in version 2.9.3_01 it would help. Although a deadlock is not mentioned in the change log since version 2.8.3, there is always a chance that an issue happens only in specific versions.

ObjectDB Support
#3

I can not publish here the whole thread dump because it reveals our code structure in the stacktraces, however the blocked thread is waiting on an object which is unknown to the thread dump:
 

"WebSocket-Worker-169" #2190192 [257667] daemon prio=5 os_prio=0 cpu=217195.03ms elapsed=57330.49s tid=0x00007f107c06cf10 nid=257667 in Object.wait()  [0x00007f102466a000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait0(java.base@21.0.1/Native Method)
    - waiting on <no object reference available>
    at java.lang.Object.wait(java.base@21.0.1/Object.java:366)
    at java.lang.Thread.join(java.base@21.0.1/Thread.java:2078)
    - locked <0x00007f4bb0e6f200> (a java.lang.Thread)
    at java.lang.Thread.join(java.base@21.0.1/Thread.java:2154)
    at com.objectdb.o.IXA.i(IXA.java:175)
    at com.objectdb.o.IXM.YN(IXM.java:101)
    at com.objectdb.o.MST.X(MST.java:1217)
    - locked <0x00007f609be186b0> (a com.objectdb.o.MST)
    at com.objectdb.o.MST.ZB(MST.java:1204)
    - locked <0x00007f609be186b0> (a com.objectdb.o.MST)
    at com.objectdb.o.WRA.ZB(WRA.java:215)
    at com.objectdb.o.TYS.f(TYS.java:843)
    at com.objectdb.o.TYS.ZS(TYS.java:644)
    at com.objectdb.o.TYM.Z(TYM.java:570)
    - locked <0x00007f15a057ea58> (a com.objectdb.o.TYM)
    at com.objectdb.o.TYM.j(TYM.java:500)
    - locked <0x00007f15a057ea58> (a com.objectdb.o.TYM)
    at com.objectdb.o.TYM.S(TYM.java:868)
    - locked <0x00007f15a057ea58> (a com.objectdb.o.TYM)
    at com.objectdb.o.TYM.R(TYM.java:962)
    - locked <0x00007f15a057ea58> (a com.objectdb.o.TYM)
    at com.objectdb.o.EPR.YO(EPR.java:85)
    at com.objectdb.o.MPT.visitRefs(MPT.java:176)
    at com.objectdb.o.TVS.e(TVS.java:170)
    at com.objectdb.o.TVS.cascade(TVS.java:157)
    at com.objectdb.o.STA.C(STA.java:539)
    at com.objectdb.o.STM.a(STM.java:413)
    at com.objectdb.o.OBM.aG(OBM.java:970)
    at com.objectdb.jdo.PMImpl.aG(PMImpl.java:2277)
    at com.objectdb.o.OBM.aU(OBM.java:881)
    at com.objectdb.o.OBM.aH(OBM.java:790)
    - locked <0x00007f4bb0e6f928> (a com.objectdb.jdo.PMImpl)
    at com.objectdb.jpa.EMImpl.commit(EMImpl.java:292)
#4

The new stack trace indicates a collision between automatic schema update (e.g. a new type is found) and automatic index rebuilding (e.g. a new index is found or indexes require rebuilding due to schema change).

As a workaround, try disabling index rebuilding (update) and see if it helps. Later (after the schema update was processed) you should enable it again. Let us know what you find as it would help in working on eliminating this deadlock.

ObjectDB Support
#5

Thanks for the tip.

I don't have the  
<index-update>

element in my ObjectDB conf file, so I guess the index update is enabled by default, right ?

I don't have yet a separated test case simulating the issue, it happened up to now only once.

I'll try to simulate it and then use the disabling of the index rebuilding.

#6

> so I guess the index update is enabled by default, right ?

Yes.

> I don't have yet a separated test case simulating the issue, it happened up to now only once.

It probably happened after a schema change that included a new index or invalidated an existing index.

ObjectDB Support

Reply