Issue #1977: Unexpected error when loading all entity instance

Type: Bug ReoprtVersion: 2.7.0Priority: CriticalStatus: FixedReplies: 19
#1

Hi,

when going through all entity instance to find orphaned entries we get the exception:

Exception in thread "main" [ObjectDB 2.6.9_05] Unexpected exception (Error 990)
  Generated by Java HotSpot(TM) 64-Bit Server VM 1.8.0_74 (on Windows 10 10.0).
Please report this error on http://www.objectdb.com/database/issue/new
com.objectdb.o.InternalException: null
com.objectdb.o.InternalException
at com.objectdb.o.BYR.o(BYR.java:113)
at com.objectdb.o.BYR.w(BYR.java:206)
at com.objectdb.o.VUT.j(VUT.java:357)
at com.objectdb.o.RST.C(RST.java:194)
at com.objectdb.o.RTT.l(RTT.java:134)
at com.objectdb.o.RST.l(RST.java:24)
at com.objectdb.o.RTT.D(RTT.java:179)
at com.objectdb.o.RST.s(RST.java:121)
at com.objectdb.o.PGT.q(PGT.java:109)
at com.objectdb.o.RST.B(RST.java:93)
at com.objectdb.o.RTT.l(RTT.java:132)
at com.objectdb.o.RST.l(RST.java:24)
at com.objectdb.o.RTT.D(RTT.java:179)
at com.objectdb.o.RST.s(RST.java:121)
at com.objectdb.o.PGT.q(PGT.java:109)
at com.objectdb.o.RST.B(RST.java:93)
at com.objectdb.o.RTT.l(RTT.java:132)
at com.objectdb.o.RST.l(RST.java:24)
at com.objectdb.o.TSK.i(TSK.java:145)
at com.objectdb.o.TSK.f(TSK.java:95)
at com.objectdb.o.TSM.e(TSM.java:86)
at com.objectdb.o.RTT.D(RTT.java:177)
at com.objectdb.o.RST.s(RST.java:121)
at com.objectdb.o.PGT.q(PGT.java:109)
at com.objectdb.o.RST.B(RST.java:93)
at com.objectdb.o.RTT.l(RTT.java:132)
at com.objectdb.o.RST.l(RST.java:24)
at com.objectdb.o.TSK.i(TSK.java:145)
at com.objectdb.o.TSK.f(TSK.java:95)
at com.objectdb.o.TSM.e(TSM.java:86)
at com.objectdb.o.RTT.D(RTT.java:177)
at com.objectdb.o.RST.s(RST.java:121)
at com.objectdb.o.PGT.q(PGT.java:109)
at com.objectdb.o.RST.B(RST.java:93)
at com.objectdb.o.RTT.l(RTT.java:132)
at com.objectdb.o.RST.l(RST.java:24)
at com.objectdb.o.TSK.i(TSK.java:145)
at com.objectdb.o.TSK.f(TSK.java:95)
at com.objectdb.o.TSM.e(TSM.java:86)
at com.objectdb.o.MST.a0(MST.java:627)
at com.objectdb.o.MST.U7(MST.java:565)
at com.objectdb.o.WRA.U7(WRA.java:279)
at com.objectdb.o.LDR.G(LDR.java:583)
at com.objectdb.o.LDR.F(LDR.java:473)
at com.objectdb.o.OBC.U2(OBC.java:1097)
at com.objectdb.o.QRR.f(QRR.java:222)

What we do:
1. Create a transaction and load 10.000 entity instances
2. Check for each entity instance if it is references using queries (select from <type> where field=<instance>)
3. Load the next 10.000 instance and start over

After a while and a lot of queries we get the above exception.

Thanks
Markus Ritter

#2

Unfortunately this stack trace doesn't provide sufficient information to understand the cause of this exception. Could you post a test program that demonstrates it?

ObjectDB Support
#3

Hello,

I managed to get around the upper exception, but now I get the exception again:

[ObjectDB 2.7.0_03] Unexpected exception (Error 990)
  Generated by Java HotSpot(TM) 64-Bit Server VM 1.8.0_74 (on Windows 10 10.0).
Please report this error on http://www.objectdb.com/database/issue/new
com.objectdb.o.InternalException: null
com.objectdb.o.InternalException
at com.objectdb.o.BYR.o(BYR.java:113)
at com.objectdb.o.BYR.w(BYR.java:206)
at com.objectdb.o.VUT.j(VUT.java:357)
at com.objectdb.o.OBI.r(OBI.java:286)
at com.objectdb.o.OBI.VC(OBI.java:240)
at com.objectdb.o.BQI.VM(BQI.java:151)
at com.objectdb.o.SQI.UI(SQI.java:119)
at com.objectdb.o.PRG.ag(PRG.java:687)
at com.objectdb.o.PRG.af(PRG.java:555)
at com.objectdb.o.QRM.U9(QRM.java:286)
at com.objectdb.o.MST.U9(MST.java:988)
at com.objectdb.o.WRA.U9(WRA.java:311)
at com.objectdb.o.WSM.U9(WSM.java:115)
at com.objectdb.o.QRR.g(QRR.java:247)
at com.objectdb.o.QRR.f(QRR.java:153)
at com.objectdb.jpa.JpaQuery.getSingleResult(JpaQuery.java:750)
at com.querydsl.jpa.impl.AbstractJPAQuery.getSingleResult(AbstractJPAQuery.java:183)
at com.querydsl.jpa.impl.AbstractJPAQuery.fetchOne(AbstractJPAQuery.java:253)
at com.querydsl.core.support.FetchableQueryBase.fetchFirst(FetchableQueryBase.java:51)
at com.mvolution.webservice.endpoints.user.LoginEndpoint.processJson(LoginEndpoint.java:54)

When executing the query:

select _changeSet
from ChangeSet _changeSet
where _changeSet.owner = ?1
order by _changeSet.index desc

With the parameter ?1 = "b07f5cd0-de22-11e6-8f0c-da13fcef2539". This query also fails when using the Database Explorer.

I 7zipped our database and attached it to the private ticket http://www.objectdb.com/database/support/624.

Thanks for your support!

#4

Could you please provide the database as a single file (possibly as a link to a resource on the Internet if it is too large), as extracting the parts fails.

ObjectDB Support
#5

Unfortunately the database is corrupted. Checking it with the ObjectDB Doctor produces errors.

After using the Doctor to repair the database the query runs in the Explorer with no errors.

The errors may indicate abnormal database termination followed by opening the database with no recovery (e.g. if the database file was moved or copied to another location without the recovery file).

ObjectDB Support
#6

Hello,

thats what I did already. But after adding new ChangeSet instances it gets corrupted all the time. So I repair, run for like 10min and then I have to repair again. There seems to be a bug which corrupts the database file. I shutdown the application by closing all EntityManager instances.

Please check again if there is anything wrong.

Thanks

#7

If there is a bug that can cause database corruption then obviously this is a critical issue with the highest priority. If you know how to reproduce it then we need your help, as we do not have information about this issue (no other reports) and it is impossible to understand the cause from a database that is already corrupted.

Could you please provide a program that can demonstrate this corruption, i.e. starts with a valid database and then corrupts it?

ObjectDB Support
#8

Hello,

I did not find a way to reproduce the error now. It seems there was insufficient memory on the maschine. I will observe the issue and update if I find a way to reproduce it.

Best wishes

#9

Insufficient memory may cause a crash. However, if recovery is enabled then the database should be repaired automatically when it is opened again (if not moved without the recovery file).

Please verify that recovery is enabled and report again if new information becomes available.

ObjectDB Support
#10

Hello,

today I got the same stacktrace again. The database file is corrupt. I checked the HDD for errors but there are none. The disk space remaining was 5GB so there is also no issue. I checked my logs and found the error:

Entity is not found: com.mvolution.domain.entities.misc.LogEntry#'017a8666-1865-11e7-8035-62dadd13d0df' [LogEntry] [ENT.an():1699) -> OBC.onObjectDBError():1560) -> ERR.f():56) -> JPE.g():94)]"

After this occurrence I get the BYR.o() exception. The issue http://www.objectdb.com/database/issue/294 seems related to this one. 

I cannot reproduce the error on purpose so I´m not able to provide information about how to get the error.

Recovery is enabled <recovery enabled="true" sync="false" path="." max="128mb"/>

Is there another way for you to analyze the issue since we cannot rely on the database if there are corruption issues?

Thanks,
Markus

#11

Did this happen again after already fixing the database with the ObjectDB Doctor?

ObjectDB Support
#12

Hello,

yes. I repaired the database using the Doctor. Then running the application for like 5 hours and now I have a corrupted database file again.

The output of the Doctor:

Global Value Errors
-------------------

[1] Unexpected total object count: 1041631 (expected 1041630)

BTree Value Errors
------------------

[1] com.mvolution.domain.entities.misc.LogEntry
  - Unexpected object count: 773 (actual 772)

Page Relation Errors
--------------------

[1] Page #442580 unexpected exception: null
[2] Page #442580 last key is null, parent page #94848 next key is '01f1e2e0-12cb-11e7-bda5-f6e38e607e99'|1


Page Content Errors
-------------------

[1] Unexpected exception in page #442580 entry #1: null

Large Sections with Errors
--------------------------

Group #1:
Page#326 141:'01f1e2e0-12cb-11e7-bda5-f6e38e607e99' 1/2 1980+1020/3000

Creating a database copy...
.................................................. 1000000 objects
..  1041630 entity objects (total) have been migrated.

New database copy is ready at: db/smartcal.odb.fixed

I enabled Recording so next time I will be able to attach a record file as well. Maybe this will help investigation.

Thanks,
Markus

#13

Yes, please keep a copy of the repaired database that you just created, and next time you get this exception please send this database copy together with the new corrupted database and the recording.

There are currently no other reports of database corruptions from other users, but if this happens repeatedly then it may indicate an ObjectDB bug that unfortunately only you hit maybe due to some specific use.

If this is the case, then maybe indeed we will be able to reproduce it by playing the recording.

ObjectDB Support
#14

Could you please also provide some information about how you use ObjectDB, i.e. client-server or embedded mode, number of databases, hardware, operating system, JVM, ObjectDB build number, etc?

Do you have recovery enabled?

Is there a chance that the database file is accessed concurrently by two different processes directly (i.e. not using one server process)?

Do you move or copy database files in production?

ObjectDB Support
#15

Hello,

additional information: embedded mode (single process multi-threaded access), one database only, running inside a xen vm (Intel Core i7, 8GB RAM for VM, 16GB SSD Harddrive for VM), Debian jessie, Oracle Java 1.8.0_121, latest ObjectDB 2.7.0_03.

Recovery is enabled like stated above but the database gets corrupted while running without interruption. We do not move the database files, develop, staging and production use separate database files.

Thanks,
Markus

#16

Thank you for the information.

Will you be able to try client-server mode instead of embedded mode in production for awhile?

It may eliminate the small chance that the database file is somehow accessed concurrently by two instances of ObjectDB (e.g. two separate class loader contexts in the same process that function as two separate processes)?

ObjectDB locks the database file to avoid concurrent access by different processes, but the lock may not be effective on some systems, or in the same process if there are two different class loader contexts (which is common with web applications).

ObjectDB Support
#17

Hello,

I uploaded the recording data along with the database files. We use Tomcat and hot deployment. Since you mentioned that there might be parallel access we changed to advanced deployment (shutdown Tomcat, deploy, start Tomcat). Using this approach it will be impossible for multiple processes to access the database file.

Thanks,
Markus

#18

OK. When you shutdown Tomcat (and the web application is stopped) do you close the ObjectDB database (EntityManagerFactory's close), e.g. using a contextDestroyed listener?

During the last corruption did you have hot deployments (i.e. in the last day)?

Corruptions started in the last days, but apparently you were using ObjectDB for awhile without reporting errors. Can you think of things that have changed and may be related to the problem (e.g. moving to hot deployment recently)?

Have you tried playing the recording file on the original db, does the result file healthy or corrupted?

ObjectDB Support
#19

Hello,

yes I close all instances of EntityManagerFactory instances on destroy of the servlet. 
I disabled the hot deploy and the database keeps on getting corrupt. I uploaded the new files (see private ticket) which reproduce the error.

Thanks
Markus

#20

Build 2.7.1 should fix this bug.

This is probably the most critical bug in ObjectDB in the recent years. Thank you very much for your report and for your help in reproducing the situation.

ObjectDB Support

Reply