Issue #2794: EM close causes an exception

Type: DiscussionVersion: 2.8.7_03Priority: HighStatus: ActiveReplies: 28
#1

We get following sporadic exception, sometimes the use case works, sometimes it does not work, do you know why?

Caused by: com.objectdb.o._PersistenceException: Failed to read from file '/tmp/objectdb_2306457094224783100/pages0.dat'
    at com.objectdb.o._PersistenceException.a(_PersistenceException.java:47)
    at com.objectdb.o.JPE.d(JPE.java:147)
    at com.objectdb.o.ERR.h(ERR.java:56)
    at com.objectdb.o.OBC.onObjectDBError(OBC.java:1588)
    at com.objectdb.o.OBM.aE(OBM.java:233)
    at com.objectdb.o.OBM.close(OBM.java:188)
    at ep.base.transactions.internal.TransactionManagerImpl.lambda$3(TransactionManagerImpl.java:411)
    at java.base/java.util.Optional.ifPresent(Optional.java:183)
    at ep.base.transactions.internal.TransactionManagerImpl.doClose(TransactionManagerImpl.java:411)
    ... 52 more
Caused by: com.objectdb.o.UserException: Failed to read from file '/tmp/objectdb_2306457094224783100/pages0.dat'
    at com.objectdb.o.MSG.a(MSG.java:77)
    at com.objectdb.o.LFL.Yq(LFL.java:1180)
    at com.objectdb.o.LFL.Yq(LFL.java:1144)
    at com.objectdb.o.PCF.c(PCF.java:176)
    at com.objectdb.o.PAG.d(PAG.java:515)
    at com.objectdb.o.PAG.a(PAG.java:298)
    at com.objectdb.o.SNP.ZJ(SNP.java:146)
    at com.objectdb.o.DFL.ZX(DFL.java:860)
    at com.objectdb.o.MST.ZY(MST.java:1358)
    at com.objectdb.o.WRA.ZY(WRA.java:355)
    at com.objectdb.o.OBM.aN(OBM.java:1160)
    at com.objectdb.o.OBM.aE(OBM.java:229)
    ... 56 more
Caused by: java.io.EOFException
    at java.base/java.io.RandomAccessFile.readFully(RandomAccessFile.java:472)
    at com.objectdb.o.LFL.Yq(LFL.java:1172)
    ... 66 more
#2

This stack trace indicates a missing temporary file that holds information about uncommitted flushed updates when the EntityManager is closed and that transaction is not rollbacked or committed before the close operation.

Some additional information:

  1. It is related to using a special mode, enabled using the "objectdb.temp.page-file" system property, which is probably used only by your company. It was added to support large flush operations. Since this is a hidden feature, no similar reports from other users are expected. This mode is considered temporary. Version 3.x (currently in development) will add better support for large flush operations. However, we cannot provide information regarding the expected date release.
  2. Apparently it doesn't seem to cause any damage - so you might be able to silently ignore it.
  3. We may be able to release a new build now that may possibly fix it - if you can upgrade to the last build.

Please confirm that other stack traces that you may have regarding this issue are similar. i.e. it happens when closing an EntityManager and has no other side effects except the exception itself.

ObjectDB Support
#3

It seems also the issue https://www.objectdb.com/forum/2796 has the same reason as this issue.

All these issues do only occur, if we execute parallel threads on different databases in Linux. In Windows we don't have problems.

If we remove the option System.setProperty("objectdb.temp.page-file", "true");
then the execution of parallel threads works without errors.

Following the other stacktrace:

Caused by: com.objectdb.o.InternalException: Unexpected internal exception
    at com.objectdb.o.JPE.e(JPE.java:170)
    at com.objectdb.o.ERR.h(ERR.java:66)
    at com.objectdb.o.OBC.onObjectDBError(OBC.java:1588)
    at com.objectdb.jpa.JpaQuery.getResultList(JpaQuery.java:738)
    ... 54 more
Caused by: java.lang.NullPointerException: Cannot invoke "com.objectdb.o.PAG.N()" because "this.h" is null
    at com.objectdb.o.UPT.a(UPT.java:180)
    at com.objectdb.o.TSK.f(TSK.java:184)
    at com.objectdb.o.TSK.b(TSK.java:157)
    at com.objectdb.o.TSK.c(TSK.java:96)
    at com.objectdb.o.UPT.n(UPT.java:158)
    at com.objectdb.o.PGT.j(PGT.java:155)
    at com.objectdb.o.UPT.u(UPT.java:122)
    at com.objectdb.o.URT.g(URT.java:173)
    at com.objectdb.o.TSK.b(TSK.java:146)
    at com.objectdb.o.TSK.c(TSK.java:96)
    at com.objectdb.o.TSM.a(TSM.java:88)
    at com.objectdb.o.UTT.r(UTT.java:377)
    at com.objectdb.o.UTT.q(UTT.java:425)
    at com.objectdb.o.UTT.g(UTT.java:282)
    at com.objectdb.o.TSK.b(TSK.java:146)
    at com.objectdb.o.TSK.c(TSK.java:96)
    at com.objectdb.o.MST.ZW(MST.java:1388)
    at com.objectdb.o.WRA.ZW(WRA.java:383)
    at com.objectdb.o.WSM.ZW(WSM.java:155)
    at com.objectdb.o.OBM.aS(OBM.java:1019)
    at com.objectdb.o.OBM.aU(OBM.java:887)
    at com.objectdb.o.OBM.aJ(OBM.java:818)
    at com.objectdb.o.QRR.i(QRR.java:140)
    at com.objectdb.jpa.JpaQuery.getResultList(JpaQuery.java:726)
    ... 69 more
#4

We had also some issues with temporarily files and locking in Linux, if parallel processes access different files in the same directory. We had to change the implemention so that it works also in Linux. Do you need a hint how we solved the access on temporarily files with file locking?

#5

Is it related to the original post or a separate issue?

Of course, any hint on how to improve ObjectDB will be appreciated. If you could provide information about the issue, including a stack trace of an exception, if any, and tips regarding your workaround, we might be able to fix the issue.

ObjectDB Support
#6

Yes, it is related to the original post.

In Linux we try to get a lock on a lock file additionally by opening a file outputstream on the lock file.
This solves problems with the Linux file system which we have not with Windows.

try {
    fos = new FileOutputStream(subLockFile);
    fileLocked = fos.getChannel().tryLock() == null ? true : false;
} catch (Exception e) {
    fileLocked = true;

 

If fileLocked is false then the lock is sucessfully.

#7

Thank you for this information. As mentioned in #2 above, we may also be able to release a new build with a fix. However, if your workaround works well and solves the issue, it might be better to use it.

ObjectDB Support
#8

Yes, we need a new build with a fix.

#9

Build 2.8.7_05 includes an attempt to fix the first reported exception (#1 above).

It is unclear if the issue reported in #3 above is directly related, it might be a different issue.

Please try the new build and provide feedback regarding both exceptions.

 

 

ObjectDB Support
#10

the new version caused an unexpected exception

[ObjectDB 2.8.7_05] Unexpected exception (Error 990)
  Generated by OpenJDK 64-Bit Server VM 11.0.8 (on Windows 10 10.0).
Please report this error on https://www.objectdb.com/issue
com.objectdb.o.InternalException: java.lang.ClassCastException: class java.util.ArrayList cannot be cast to class java.lang.Enum (java.util.ArrayList and java.lang.Enum are in module java.base of loader 'bootstrap')
java.lang.ClassCastException: class java.util.ArrayList cannot be cast to class java.lang.Enum (java.util.ArrayList and java.lang.Enum are in module java.base of loader 'bootstrap')
    at com.objectdb.o.TYW.ac(TYW.java:399)
    at com.objectdb.o.TYW.writeElement(TYW.java:278)
    at ep.expression.bl.internal.dmos.ExpressionImpl.__odbWriteContent(ExpressionImpl.java:1)
    at ep.expression.bl.internal.dmos.IdentifiableExpressionImpl.__odbWriteContent(IdentifiableExpressionImpl.java:1)
    at com.objectdb.o.MMM.M(MMM.java:1215)
    at com.objectdb.o.UTY.Y(UTY.java:1413)
    at com.objectdb.o.UTY.Z(UTY.java:1402)
    at com.objectdb.o.ENH.f(ENH.java:48)
    at com.objectdb.o.STA.d(STA.java:576)
    at com.objectdb.o.STM.a(STM.java:428)
    at com.objectdb.o.OBM.aG(OBM.java:970)
    at com.objectdb.jdo.PMImpl.aG(PMImpl.java:2277)
    at com.objectdb.o.OBM.aU(OBM.java:881)
    at com.objectdb.o.OBM.aJ(OBM.java:826)
    at com.objectdb.o.QRR.i(QRR.java:140)
    at com.objectdb.jpa.JpaQuery.getResultList(JpaQuery.java:726)
#11

the previous version 2.8.7_04 does still work as expected

#12

Build 2.8.7_06 reverses an enum optimization in 2.8.7_05, which might cause the new exception.

Could you please try it?

ObjectDB Support
#13

I checked the update.

But we still get the following exception, if parallel threads access the db:

java.lang.RuntimeException: Failed to read from file '/tmp/objectdb_9120972825587144595/pages0.dat'
'/tmp/objectdb_9120972825587144595/pages0.dat'
...
Caused by: com.objectdb.o._PersistenceException: Failed to read from file '/tmp/objectdb_9120972825587144595/pages0.dat'
    at com.objectdb.o._PersistenceException.a(_PersistenceException.java:47)
    at com.objectdb.o.JPE.d(JPE.java:147)
    at com.objectdb.o.ERR.h(ERR.java:56)
    at com.objectdb.o.OBC.onObjectDBError(OBC.java:1591)
    at com.objectdb.jpa.JpaQuery.getResultList(JpaQuery.java:738)
    at ...
Caused by: com.objectdb.o.UserException: Failed to read from file '/tmp/objectdb_9120972825587144595/pages0.dat'
    at com.objectdb.o.MSG.a(MSG.java:77)
    at com.objectdb.o.LFL.YG(LFL.java:1180)
    at com.objectdb.o.LFL.YG(LFL.java:1144)
    at com.objectdb.o.PCF.c(PCF.java:176)
    at com.objectdb.o.PAG.m(PAG.java:469)
    at com.objectdb.o.PAG.r(PAG.java:715)
    at com.objectdb.o.OBI.<init>(OBI.java:172)
    at com.objectdb.o.PBI.<init>(PBI.java:58)
    at com.objectdb.o.MST.ao(MST.java:851)
    at com.objectdb.o.BQI.YU(BQI.java:130)
    at com.objectdb.o.PRG.h(PRG.java:698)
    at com.objectdb.o.PRG.g(PRG.java:560)
    at com.objectdb.o.QRM.ZG(QRM.java:287)
    at com.objectdb.o.MST.ZG(MST.java:1026)
    at com.objectdb.o.WRA.ZG(WRA.java:313)
    at com.objectdb.o.WSM.ZG(WSM.java:117)
    at com.objectdb.o.QRR.k(QRR.java:260)
    at com.objectdb.o.QRR.i(QRR.java:154)
    at com.objectdb.jpa.JpaQuery.getResultList(JpaQuery.java:726)
    ... 77 more
Caused by: java.io.EOFException
    at java.base/java.io.RandomAccessFile.readFully(RandomAccessFile.java:474)
    at com.objectdb.o.LFL.YG(LFL.java:1172)
    ... 94 more

 

#14

Although the exceptions in #1 and #13 seem similar and both are related to the temporary flush files, only #1 (exception during close) was fixed (hopefully). The exception in #13 is thrown during query execution, where it seems that the temporary file is deleted while the query is executed. The temporary file is deleted only when the database is closed.

Is it possible that in your test you close the  database (i.e. the EntityManagerFactory) while queries against the database are still running?

Another option, do you use multiple databases in that test? If you do, maybe there is a collision where the same path is used for temporary files of different databases. If this is the case than it could be a hint for a possible fix.

Does it happen only on Linux?

Does it happen if you apply the workaround in #6? Regarding #6 please clarify which file exactly you lock and when.

ObjectDB Support
#15

No, the db is not closed during querys.

The issue occurs only in Linux, in Windows it works fine.

The workaround in #6 is only a hint for you how we lock files and directories on our side, it has nothing to do the database issue.

The issue occurs, if two different jvm run in parallel and access their own databases. The databases have their own temp directorys like objectdb_8686310565808007678 and objectdb_2587113241537120946.

What I have found out is that if the use case persists many entities then the use case runs into the issue.
If I reduce the number of persisted entities then the use case works fine.

#16

> The issue occurs, if two different jvm run in parallel and access their own databases. The databases have their own temp directorys like objectdb_8686310565808007678 and objectdb_2587113241537120946.

Are you sure that this happens only when two different JVMs run in parallel? Because different temporary directories are used by the two JVMs, as you wrote, it is difficult to see how one JVM affects the other. Do you also have multithreading access to ObjectDB on each JVM? Maybe this issue is related to multiple threads, possibly using multiple databases, but on the same JVM rather than to multiple JVMs?

ObjectDB Support
#17

I checked it in only one JVM with multiple threads. But in this scenario all works as expected.

But now I have an example in the attachment.

If you execute the example in parallel JVMs in Linux then one of the examples run into the issue.

In Windows the example works fine.

#18

Tried this test case with 2 JVMs, OpenJDK 17.0.4, ObjectDB 2.8.7_07 on Ubuntu 20.04.5 LTS.

They ran without exception in parallel until:

flush: 250000
flush: 260000
flush: 270000
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

At what stage did you get the exception? What heap size were you using?

As the test case runs fine on Windows, maybe it also runs fine on some Linux environments. Can you specify the hardware, Linux edition/version and JVM version that you are using? Can you also try more recent Java / Linux version? It might also be related to specific disk and file system settings.

ObjectDB Support
#19

Please also double check that when the exception is thrown with 2 JVMs running the test case, you have 2 distinct tmp directories for the 2 JVMs (if possible, empty the root tmp directory before starting).

Temporary directories should be distinct and locked per JVM. However, if for some reason (1) java.util.Random starts with the same initial seed, and, (2) file locking does not work (might be the case in some environment), then this exception is possible, as the 2 JVMs will share the same tmp directory.

java.util.Random uses System.nanoTime() for initialization, so if the JVMs have synthetic constant System.nanoTime() then this collision is possible.

ObjectDB Support
#20

Ubuntu 20.04.5 LTS 64-Bit
Ubuntu runs as a virtual machine on vmware

java.vm.version: 17.0.3+7
java.vm.name: OpenJDK 64-Bit Server VM

ObjectDB 2.8.7_07

we used -Xmx2g for the JVM memory configuration

 

Console output:

start
flush: 0
flush: 10000
flush: 20000
flush: 30000
flush: 40000
flush: 50000
flush: 60000
flush: 70000
flush: 80000
Exception in thread "main" [ObjectDB 2.8.7_07] javax.persistence.PersistenceException
Failed to read from file '/tmp/objectdb_5397403947054255678/pages0.dat' (error 117)
    at smallTests.PersistManyEntitiesInOwnDatabase.main(PersistManyEntitiesInOwnDatabase.java:43)
Caused by: com.objectdb.o.UserException: Failed to read from file '/tmp/objectdb_5397403947054255678/pages0.dat'
    at com.objectdb.o.MSG.a(MSG.java:77)
    at com.objectdb.o.LFL.YH(LFL.java:1180)
    at com.objectdb.o.LFL.YH(LFL.java:1144)
    at com.objectdb.o.PCF.c(PCF.java:176)
    at com.objectdb.o.PAG.m(PAG.java:469)
    at com.objectdb.o.PAG.r(PAG.java:715)
    at com.objectdb.o.UPT.u(UPT.java:121)
    at com.objectdb.o.UPT.g(UPT.java:111)
    at com.objectdb.o.TSK.b(TSK.java:146)
    at com.objectdb.o.TSK.c(TSK.java:96)
    at com.objectdb.o.UPT.n(UPT.java:158)
    at com.objectdb.o.PGT.j(PGT.java:155)
    at com.objectdb.o.UPT.u(UPT.java:122)
    at com.objectdb.o.UPT.g(UPT.java:111)
    at com.objectdb.o.TSK.b(TSK.java:146)
    at com.objectdb.o.TSK.c(TSK.java:96)
    at com.objectdb.o.UPT.n(UPT.java:158)
    at com.objectdb.o.PGT.j(PGT.java:155)
    at com.objectdb.o.UPT.u(UPT.java:122)
    at com.objectdb.o.URT.g(URT.java:173)
    at com.objectdb.o.TSK.b(TSK.java:146)
    at com.objectdb.o.TSK.c(TSK.java:96)
    at com.objectdb.o.TSM.a(TSM.java:88)
    at com.objectdb.o.UTT.r(UTT.java:377)
    at com.objectdb.o.UTT.g(UTT.java:210)
    at com.objectdb.o.TSK.b(TSK.java:146)
    at com.objectdb.o.TSK.c(TSK.java:96)
    at com.objectdb.o.MST.Y5(MST.java:1388)
    at com.objectdb.o.WRA.Y5(WRA.java:383)
    at com.objectdb.o.WSM.Y5(WSM.java:155)
    at com.objectdb.o.OBM.aS(OBM.java:1027)
    at com.objectdb.o.OBM.aU(OBM.java:895)
    at com.objectdb.o.OBM.flush(OBM.java:811)
    ... 1 more
Caused by: java.io.EOFException
    at java.base/java.io.RandomAccessFile.readFully(RandomAccessFile.java:471)
    at com.objectdb.o.LFL.YH(LFL.java:1172)
    ... 32 more

 

#21

Could you try running the same test case with either:

  java -Dobjectdb.temp.no-temp-file-purge=true

or

  System.setProperty("objectdb.temp.page-file", "true");
  System.setProperty("objectdb.temp.no-temp-file-purge", "true");
ObjectDB Support
#22

The property

System.setProperty("objectdb.temp.no-temp-file-purge", "true");

does not help.

But I empty the tmp directory.

And during the execution I saw that only one temp directory was created objectdb_4972201337691437071 and the error message was

Exception in thread "main" [ObjectDB 2.8.7_07] javax.persistence.PersistenceException
Failed to read from file '/tmp/objectdb_810017089076508231/pages0.dat' (error 117)
    at smallTests.PersistManyEntitiesInOwnDatabase.main(PersistManyEntitiesInOwnDatabase.java:44)
Caused by: com.objectdb.o.UserException: Failed to read from file '/tmp/objectdb_810017089076508231/pages0.dat'

But the tmp directory objectdb_810017089076508231 did not exist at all, if I monitored the tmp directory during execution. It seems it was not created.

 

#23

Is it possible that the tmp directory has not been created due to no permission?

Can you try running the test with a root user?

Another system property to check (in case the directory was created and deleted):

  System.setProperty("objectdb.temp.no-temp-file-clean", "true");
ObjectDB Support
#24

The property System.setProperty("objectdb.temp.no-temp-file-clean", "true");
works, the issue does not happen anymore.

This means that the temp clean up job of the second started application removes the tmp directory of the first started application.

#25

The hint in #6 could help you to prevent that the clean up job deletes tmp directories which are created by other JVM.

#26

When ObjectDB creates a temporary directory for a new JVM it also tries to apply a lock on the lock file, as suggested in #6, using RandomAccessFile, FileChannel and tryLock. It tries multiple times, but if it fails it eventually proceeds silently without a lock (probably at least a warning has to be produced in that case). If files cannot be locked on a specific system then the solution of #23 might be needed, with a different application managed cleanup method.

Does the code in #6 succeed in locking files on that specific system?

ObjectDB Support
#27

Please try Build 2.8.7_08, which maybe can solve the issue.

ObjectDB Support
#28

The ObjectDB update solves the issues in the example and our integration tests.

I have still to check the whole test suite.

#29

Version 2.8.8 was now released and it includes all the 2.8.7_* snapshot fixes.

ObjectDB Support

Reply