Issue #2964: missing merger

Type: Bug Reoprt

Version: 2.9.2_03

Priority: Critical

Status: Fixed

Replies: 45

hgzpincher

Joined on 2023‑08‑03

At our major aerospace customer we are facing again and again the problem that these errors appear suddenly in the log (we always do a doctor in that case but just after some hours it is there again), the system is completely unusable then:

[2025-04-22 06:36:36 #67 query.compiler]
Large number of query plan combinations (32768) in query select o from ObjectNode o where o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.9.11" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.9.10" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.9" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.7" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.6.7" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.6.6" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.6.5" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.11" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.7.3" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.7.2" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.16.5" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.16" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.15" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055559.14.1" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055559.13.1".

[2025-04-22 12:33:37 #68 store]
SectionClassifier: SectionClassifier{9->merger[2083]-missing:1}

[2025-04-22 12:33:48 #69 store]
SectionClassifier: SectionClassifier{10->merger[2083]-missing:1}

[2025-04-22 12:33:50 #70 store]
SectionClassifier: SectionClassifier{3885->merger[2083]-missing:1}

[2025-04-22 12:33:53 #71 store]
SectionClassifier: SectionClassifier{3577->merger[2083]-missing:1}

[2025-04-22 12:33:59 #72 store]
SectionClassifier: SectionClassifier{3577->merger[2083]-missing:1}

support

Joined on 2010‑05‑03

This error message indicates that parts of objects have not been found during retrieval (in objects with IDs 9, 10, 3855 and 3577), but the cause is unclear.

Please advise whether there are errors when you run the Doctor after this happens (and share the errors if any) and whether just restart helps, at least for some time. In addition, please clarify what do you mean by "system is completely unusable" in the context of ObjectDB, i.e. what is your observation regarding ObjectDB operations, exceptions, log, etc. after this happens.

ObjectDB Support

hgzpincher

Joined on 2023‑08‑03

Find attached the log oft he last doctor, regarding the other questions:

In addition, please clarify what do you mean by "system is completely unusable" in the context of ObjectDB, i.e. what is your observation regarding ObjectDB operations, exceptions, log, etc. after this happens.

>>> our complete system strucks with these errors, normal functionalities that need to save changes to objects are not working any more

We’ll try to download the corrupt database now and send you a download link as soon as we have that

log.txt (41.7 KB)

hgzpincher

Joined on 2023‑08‑03

Sorry but we still cannot post in your tocket system.

We are using Java 17 now but still javax, could that be the problem ?

Here the link tot he backup of this night, no doctor done on that

https://www.dropbox.com/scl/fi/ulsgyq2t7uz4bhqvyjc5b/coreSystemDb.7z?rlkey=o3tpvdnz78b4ujvy2r69pbz97&dl=0

hgzpincher

Joined on 2023‑08‑03

This ist he log file from today, we have now to shutdown the system, our customer ISAR Aerospace changed to a desaster mode regarding our product

Sorry but we still cannot post in your tocket system.

We are using Java 17 now but still javax, could that be the problem ?

here the link to the current database (will take some more minutes until available
https://www.dropbox.com/scl/fi/ype139jhwcvdbctgoivrg/coreSystemDb-Kopie.7z?rlkey=wsedvt77vn8sksb7k95kh5aa5&dl=0

log.txt (41.7 KB)

odb20250417.log (166.9 KB)

support

Joined on 2010‑05‑03

Looking at the Doctor log file (#3 above) - the following lines indicate a corrupted database:

Large Sections with Errors:
Group #1:
Page#5699973 2:'3835471f-1a5e-4c23-82c1-aac8f7b41f14' 1/2 1980+218/2198
Group #2:
Page#5565011 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 1/8 1980+1980/14345
Page#5059295 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 2/8 3960+1980/14345
Page#5565032 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 3/8 5940+1980/14345
Page#5565033 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 4/8 7920+1980/14345
Page#5565034 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 5/8 9900+1980/14345
Page#5565035 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 6/8 11880+1980/14345
Page#5565036 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 7/8 13860+485/14345

So apparently the best quick emergency solution would be to use the Doctor to fix the database.

Maybe the difference between the databases in #4 and #5 can provide a clue regarding the cause of that corruption. If there is any other clue regarding special activity today at the time that these errors start please report.

> We are using Java 17 now but still javax, could that be the problem ?

No. Using Java 17 and javax should be fine.

> Sorry but we still cannot post in your ticket system.

Hopefully Reply should work now.

ObjectDB Support

hgzpincher

Joined on 2023‑08‑03

If there is any other clue regarding special activity today at the time that these errors start please report

>>> no it is completely random, we also have these errors:

[2025-04-13 17:59:03 #25 *]
[ObjectDB 2.9.2_02] Unexpected exception (Error 990)
Generated by OpenJDK 64-Bit Server VM 17.0.13 (on Windows Server 2019 10.0).
Please report this error on https://www.objectdb.com/issue
com.objectdb.o.InternalException: merge2 com.agile.hummingbird.ObjectNode:'3835496a-7399-4d22-b6b1-91c107ee323d' => merger[15352]-missing:1
com.objectdb.o.InternalException: merge2 com.agile.hummingbird.ObjectNode:'3835496a-7399-4d22-b6b1-91c107ee323d' => merger[15352]-missing:1
   at com.objectdb.o.PBI.h(PBI.java:126)
   at com.objectdb.o.PBI.g(PBI.java:99)
   at com.objectdb.o.OBI.ZZ(OBI.java:249)
   at com.objectdb.o.BQI.Zn(BQI.java:153)
   at com.objectdb.o.MQI.Zn(MQI.java:122)
   at com.objectdb.o.MQI.Zn(MQI.java:122)
   at com.objectdb.o.MQI.Zn(MQI.java:122)
   at com.objectdb.o.PRG.d(PRG.java:795)
   at com.objectdb.o.PRG.h(PRG.java:724)
   at com.objectdb.o.PRG.g(PRG.java:560)
   at com.objectdb.o.QRM.Z1(QRM.java:287)
   at com.objectdb.o.MST.Z1(MST.java:1026)
   at com.objectdb.o.WRA.Z1(WRA.java:313)
   at com.objectdb.o.WSM.Z1(WSM.java:117)
   at com.objectdb.o.QRR.k(QRR.java:260)
   at com.objectdb.o.QRR.i(QRR.java:154)
   at com.objectdb.jpa.JpaQuery.getResultList(JpaQuery.java:677)
   at com.agile.hummingbirdcore.SS_ScheduledSlot.calculateSlotData(SS_ScheduledSlot.java:1146)
   at com.agile.hummingbirdcore.SS_ScheduledSlot.onOneTimeCall(SS_ScheduledSlot.java:227)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
   at com.agile.hummingbirdcore.CyclicProcessor.doSingleTimeCalls(CyclicProcessor.java:268)
   at com.agile.hummingbirdcore.CyclicProcessor.access$300(CyclicProcessor.java:49)
   at com.agile.hummingbirdcore.CyclicProcessor$1.actionPerformed(CyclicProcessor.java:135)
   at java.desktop/javax.swing.Timer.fireActionPerformed(Timer.java:311)
   at java.desktop/javax.swing.Timer$DoPostEvent.run(Timer.java:243)
   at java.desktop/java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:318)
   at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:773)
   at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:720)
   at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:714)
   at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
   at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
   at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:742)
   at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
   at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
   at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
   at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
   at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
   at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)

odb20250413.log (8.2 MB)

support

Joined on 2010‑05‑03

Some initial observations:

Running the Doctor on the database at #4 (yesterday's backup) produced no errors.

Running the Doctor on the database at #5 (today) produced some broken references errors and:

Large Sections with Errors:
Group #1:
Page#5896863 2:'31a9588f-3da4-49ca-b466-bc3f5eaa77b6' 1/2 1980+103/2083

This ObjectNode object was last modified (according to your modified field) today at 12:33, exactly when the errors started.

So apparently something with that transaction produced the database errors.

Running the following query in the explorer:

select from ObjectNode where uUid = "31a9588f-3da4-49ca-b466-bc3f5eaa77b6"

returns 2 objects, which is unexpected of course and will be examined further.

Repairing the database with the Doctor should hopefully keep only the new version and remove the old version with missing parts. Please run the above query on the repaired database to check.

Strangely, the errors in the log (at #1) indicate IDs 9, 10, 3885, 3577 - maybe ObjectProperty instances? These errors are probably related because of the time, 12:33, but these ObjectProperty instances are not corrupted.

Questions:

The Doctor errors in the logs at #3 and #7 above are different. They are not related to this specific incident, right? Have you fixed the database with the Doctor to solve them before today?
Any other errors in the log today (except #1 above)?
Did you try to restart the system without fixing the database?
Does the system work after fixing the database with the Doctor?

ObjectDB Support

hgzpincher

Joined on 2023‑08‑03

The Doctor errors in the logs at #3 and #7 above are different. They are not related to this specific incident, right? Have you fixed the database with the Doctor to solve them before today?

>>> yes they are different, the last time we did run the doctor was on 17.4. (see the log attached to #7)

2. Any other errors in the log today (except #1 above)?

>>> all the errors are in #7 log file

3. Did you try to restart the system without fixing the database?

>>> we did a doctor just some hours ago

4. Does the system work after fixing the database with the Doctor?

>>> after running the doctor it is running now, but currently there is nearly no activity

by the way, we are using

could that cause problems like these ?

#10

hgzpincher

Joined on 2023‑08‑03

we checked our internal log and exactly at the moment where the merger problem started we did have 2 threads accessing/committing that object

#11

support

Joined on 2010‑05‑03

> by the way, we are using
> <locking version-check="false" />
> could that cause problems like these ?

Excellent spot on. It helped a lot in reproducing the issue with a test program and repairing it.

The problem was in cleaning continuing sections of large objects (> 1980 bytes), when these large objects are updated to small objects (<= 1980 bytes). Usually an object is read, updated and written using the same EntityManager and an optimistic lock exception is thrown if another EntityManager breaks this sequence, so it is easy to know when a large object is updated to a small object. When version check is disabled, one EntityManager may update a small object that remains small, assuming clearing sections in not needed, but it may actually override a change to a large object by another EntityManager. So old obsolete sections that should have been deleted can remain in the database.

Version 2.9.3 fixes the bug.

Disabling version check can still cause database integrity issues, as updates are overridden silently, but at the application level, ObjectDB should not crash.

ObjectDB Support

#12

hgzpincher

Joined on 2023‑08‑03

perfect, when can you provide us that new version ?

#13

support

Joined on 2010‑05‑03

It is already available for download.

ObjectDB Support

#14

hgzpincher

Joined on 2023‑08‑03

Hi, observing the customers logs, it looks like the problem is solved, thanks

#15

hgzpincher

Joined on 2023‑08‑03

just now we have the same problem again

-04-29 13:13:11 #65 store]
SectionClassifier: SectionClassifier{'10d140f4-1ffe-4fb2-8bd3-6cb412db24ca'->merger[9507]-missing:1}

[2025-04-29 13:17:26 #66 store]
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[8860]-missing:4}

[2025-04-29 13:17:31 #67 store]
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[8860]-missing:4}

[2025-04-29 13:17:39 #68 store]
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[8860]-missing:4}

[2025-04-29 13:17:39 #69 store]
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[8860]-missing:4}

[2025-04-29 13:43:11 #70 store]
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:11 #71 store]
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:14 #72 store]
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:14 #73 store]
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

#16

hgzpincher

Joined on 2023‑08‑03

please check that again, we have again a more than bad situation

#17

support

Joined on 2010‑05‑03

Copies of the databases before (latest backup) and after the exception will help.

ObjectDB Support

#18

support

Joined on 2010‑05‑03

Rethinking about the repair, it wouldn't cover a mixed environment in which some users use version-check="false" and others version-check="true". This is probably not the case in your system, however, although the repair fixes the test case that we have that reproduces the failure, it may not cover some other specific case. Investigating the corrupted database against the latest database would be needed first to observe whether this is the same issue or another issue with a similar exception.

ObjectDB Support

#19

support

Joined on 2010‑05‑03

You may also try setting a system property (using System.setProperty on application start or as a -D JVM argument) that disables the optimisation that was found to be problematic:

    System.setProperty("com.objectdb.disable.optimization.small", "true");

This is more aggressive than the default in version 2.9.3 and would cover also a mixed environment in which some users use version-check="false" and others version-check="true".

ObjectDB Support

#20

hgzpincher

Joined on 2023‑08‑03

these are the 2 databases:

https://www.dropbox.com/scl/fi/8l09ajznb0haiw3z3uojb/coreSystemDb-backup.7z?rlkey=omnykeb5s1fmlfea9iq5dkhcb&dl=0

https://www.dropbox.com/scl/fi/y5jbjzcojvfb3yj1wb4gy/coreSystemDb-error.7z?rlkey=9gjoqcl0o0wtnawlgq8qhnt5o&dl=0

#21

hgzpincher

Joined on 2023‑08‑03

we also have, from the very past, set this property:

System.setProperty("objectdb.temp.concurrent-file-read", "true");

by the way, in our application it is not possible that some users use version-check="false" and others version-check="true".

#22

hgzpincher

Joined on 2023‑08‑03

do you have any news for us, the pressure is getting bigger and bigger

#23

support

Joined on 2010‑05‑03

Initial checks indicate that this is a different issue than the previous one that was fixed in version 2.9.3.

No errors have been found in the database file by the Doctor and also by manual inspection of the relevant objects. Both mentioned ObjectNode instances in the log on #15 (10d140f4-1ffe-4fb2-8bd3-6cb412db24ca, 3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b) seem valid. Both have been updated yesterday, possibly when the error was generated. So this time it might be a cache issue rather than database corruption.

As the database is not corrupted, there are less clues this time. We may need a new build with more logging details.

Meanwhile It may help if you could provide more information:

Have you run the Doctor? Can you confirm that there are no errors?
Are there additional errors in the log since the lines in #15?
Has the system crashed or the only indication of errors is in the log?
Have you restarted the system after it happened? Did it change anything?

ObjectDB Support

#24

hgzpincher

Joined on 2023‑08‑03

Have you run the Doctor? Can you confirm that there are no errors? >>> no, we just restarted
Are there additional errors in the log since the lines in #15? >>> no, we also added the property mentioned in #19
Has the system crashed or the only indication of errors is in the log? >>> no, just the log
Have you restarted the system after it happened? Did it change anything? >>> yes, we restarted with the property set

#25

support

Joined on 2010‑05‑03

OK. So this seems to be indeed a different issue, as the previous one was a database corruption issue. At least we can assume for now that there is no risk of database corruption, and in the worst case restarting the system to clear all cached data may help, for now.

The property from #19 addresses the previous issue (in more aggressive way than the default), so it is unlikely to solve the new issue.

Looking at #15 again, there was one report (at 13:13:11) regarding one object and two reports (at 13:17, 13:43) regarding another object. When did you restart the system? Is it possible that the errors were generated when these specific objects where modified and queried concurrently, but no exceptions are thrown later, i.e. restarting the system wasn't really essential and may have no effect in this case? This information is important in order to figure out whether a cache is involved or just a bad managed collision. It is also possible (in the optimal scenario) that this is just a noise in the log.

Anyway, the step forward should be a new build in which these log messages are replaced with more detailed messages.

ObjectDB Support

#26

hgzpincher

Joined on 2023‑08‑03

we restarted at:

[2025-04-29 13:43:11 #70 store]
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:11 #71 store]
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:14 #72 store]
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:14 #73 store]
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 14:28:46 #74 store]
Database 'G:\Hummingbird-Database\objectdb\db\coreSystemDb.odb' is closed by 5848@MES-M02

[2025-04-29 14:28:46 #75 server]
Server on port 3333 has stopped by 5848@MES-M02

#27

hgzpincher

Joined on 2023‑08‑03

after running a stress test on our test database after several hours we have these errors

[2025-04-30 22:55:41 #20 store]
SectionClassifier: SectionClassifier{2->merger[85222]-missing:1}

[2025-04-30 23:47:58 #21 store]
SectionClassifier: SectionClassifier{6->merger[23770]-missing:1,9->merger[23762]-missing:1}

[2025-05-01 02:41:39 #22 store]
SectionClassifier: SectionClassifier{2->merger[164354]-missing:1}

[2025-05-01 03:12:08 #23 store]
SectionClassifier: SectionClassifier{2->merger[176336]-missing:1}

[2025-05-01 03:39:55 #24 store]
SectionClassifier: SectionClassifier{2->merger[186155]-missing:1}

[2025-05-01 05:10:36 #25 store]
SectionClassifier: SectionClassifier{2->merger[215993]-missing:1}

[2025-05-01 05:16:43 #26 store]
SectionClassifier: SectionClassifier{2->merger[217815]-missing:1}

[2025-05-01 05:48:58 #27 store]
Database '/Users/hgz/Documents/Hummingbird/App/objectdb/db/coreSystemDb.odb' is closed by 31829@MacBook-Pro.local

[2025-05-01 05:48:58 #28 server]
Server on port 3333 has stopped by 31829@MacBook-Pro.local

in that database

https://www.dropbox.com/scl/fi/w5cgxk0nm264wqjbf0bon/coreSystemDb-3.odb.zip?rlkey=afmdzfzw5kh4s37tkocxsapcg&dl=0

#28

hgzpincher

Joined on 2023‑08‑03

as already mentioned this cannot be just a noise in the log, the system is getting unusable in this error case

#29

support

Joined on 2010‑05‑03

> as already mentioned this cannot be just a noise in the log, the system is getting unusable in this error case

Yesterday's report indicated log errors only (#24, 3), so this update is important. Can you confirm that restarting the system stops the unstable state until the next log error?

> after running a stress test on our test database after several hours we have these errors

We also ran a stress test yesterday with an attempt to mimic your system (including #21, etc.) but couldn't get the errors. More details have been added to the log error messages (in development build, not released yet), but it is unlikely that these details will expose the cause when it happens. A live debugging may be essential. Can you share your stress test? No source is needed, just something runnable that we can try with different versions of objectdb.jar. This path may be very challenging and long but it is unclear what are the alternatives now.

ObjectDB Support

#30

hgzpincher

Joined on 2023‑08‑03

Can you confirm that restarting the system stops the unstable state until the next log error?

>>> yes

Can you share your stress test?

>>> yes, we prepare that, needs some minutes

#31

support

Joined on 2010‑05‑03

If there is a way to shorten the time that it takes for the errors to occur it would be very helpful, e.g. maybe by using larger ObjectNode objects that change their size frequently.

ObjectDB Support

#32

support

Joined on 2010‑05‑03

Running the Doctor on the stress test database (#27) does not show section errors, i.e. the database is not corrupted.

There is one application level error:

Broken References:
[1] com.agile.hummingbird.ObjectNode(2):'29c314bf-e504-4279-ae2c-3b62b8b8fa57'

Can you confirm that this is expected and should be ignored?

ObjectDB Support

#33

hgzpincher

Joined on 2023‑08‑03

this is expected, no problem

#34

hgzpincher

Joined on 2023‑08‑03

we are preparing the stresstest, important is to run simultaneous updates of the same objects at the same time

#35

support

Joined on 2010‑05‑03

Yes, and also simultaneous retrieval (as the log errors are generated during retrieval operations) and probably updates that change the number of 2K sections in the objects. Unfortunately our tests include all that but do not produce the errors.

ObjectDB Support

#36

hgzpincher

Joined on 2023‑08‑03

we are currently trying to create a stress test that shows results faster. It is a pain ... sometimes we have it after 10 minutes, sometimes after 2-3 hours

#37

hgzpincher

Joined on 2023‑08‑03

now, just running our application over the night we have again these entries

[2025-05-01 20:58:19 #21 store]
SectionClassifier: SectionClassifier{'2a9068c9-b843-49a1-93d1-f3034686ecba'->merger[89131]-missing:1}

[2025-05-01 20:58:45 #22 store]
SectionClassifier: SectionClassifier{'2a9068c9-b843-49a1-93d1-f3034686ecba'->merger[89131]-missing:1}

[2025-05-01 20:58:59 #23 store]
SectionClassifier: SectionClassifier{'2a9068c9-b843-49a1-93d1-f3034686ecba'->merger[89131]-missing:1}

[2025-05-02 06:30:01 #24 store]
Database '/Users/hgz/Documents/Hummingbird/App/objectdb/db/coreSystemDb.odb' is closed by 1615@MacBook-Pro.local

[2025-05-02 06:30:01 #25 server]
Server on port 3333 has stopped by 1615@MacBook-Pro.local

at both times there was a single application doing a calculation where the Action and ObjectNode class was involved. The problematic object gets a new Action object linked into the actions collection during that process. You can see that there are now 16798 Action objects linked

here is the database

https://www.dropbox.com/scl/fi/get290bgwfh73hr1p26y3/coreSystemDb.odb.zip?rlkey=89ygzbgyce1fyke4zhrcbwvn5&dl=0

#38

support

Joined on 2010‑05‑03

No Doctor errors in this database except the expected broken references.

> The problematic object gets a new Action object linked into the actions collection during that process.

> we are currently trying to create a stress test that shows results faster. It is a pain ... sometimes we have it after 10 minutes, sometimes after 2-3 hours

Maybe the exception frequency will increase if Actions instances will take more space, as the error seems to be related to changes that change the number of 2KB sections in ObjectNode instances.

ObjectDB Support

#39

hgzpincher

Joined on 2023‑08‑03

now we have another error

[2025-05-03 13:58:55 #31 *]
java.lang.ArrayIndexOutOfBoundsException: Index -4 out of bounds for length 2048
   at com.objectdb.o.BYR.O(BYR.java:912)
   at com.objectdb.o.BYR.R(BYR.java:995)
   at com.objectdb.o.BYR.H(BYR.java:1071)
   at com.objectdb.o.BYR.G(BYR.java:1059)
   at com.objectdb.o.MST.ay(MST.java:525)
   at com.objectdb.o.MST.au(MST.java:491)
   at com.objectdb.o.MST.ZC(MST.java:462)
   at com.objectdb.o.WRA.ZC(WRA.java:268)
   at com.objectdb.o.LDR.u(LDR.java:568)
   at com.objectdb.o.LDR.Yo(LDR.java:485)
   at com.objectdb.o.OBC.Yo(OBC.java:1131)
   at com.objectdb.o.ERF.c(ERF.java:105)
   at com.objectdb.o.TVS.b(TVS.java:105)
   at com.objectdb.o.TVS.b(TVS.java:94)
   at com.objectdb.jpa.EMImpl.refresh(EMImpl.java:759)
   at com.objectdb.jpa.EMImpl.refresh(EMImpl.java:700)
   at com.objectdb.jdo.PMImpl.refresh(PMImpl.java:989)
   at com.agile.hummingbirdcore.CoreDaemon.refresh(CoreDaemon.java:801)
   at com.agile.hummingbirdcore.AC_Action.endAction(AC_Action.java:1609)
   at com.agile.hummingbirdcore.API.executeEndAction(API.java:3017)
   at com.agile.hummingbirdcore.API.handleAction(API.java:2533)
   at com.agile.hummingbirdcore.API.handleActionBulk(API.java:2133)
   at com.agile.hummingbirdcore.API.directRequest(API.java:1695)
   at com.agile.hummingbirdcore.API.handleWebSocketRequest(API.java:1381)
   at com.agile.hummingbirdcore.WebSocketServerListener.onWebSocketText(WebSocketServerListener.java:212)
   at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:296)
   at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
   at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:67)
   at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:235)
   at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:152)
   at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:326)
   at org.eclipse.jetty.websocket.common.extensions.AbstractExtension.nextIncomingFrame(AbstractExtension.java:148)
   at org.eclipse.jetty.websocket.common.extensions.compress.PerMessageDeflateExtension.nextIncomingFrame(PerMessageDeflateExtension.java:111)
   at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension.forwardIncoming(CompressExtension.java:169)
   at org.eclipse.jetty.websocket.common.extensions.compress.PerMessageDeflateExtension.incomingFrame(PerMessageDeflateExtension.java:90)
   at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:202)
   at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:225)
   at org.eclipse.jetty.websocket.common.Parser.parseSingleFrame(Parser.java:259)
   at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:459)
   at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:440)
   at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
   at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
   at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
   at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
   at java.base/java.lang.Thread.run(Thread.java:840)

#40

support

Joined on 2010‑05‑03

The new error may possibly be related to the same low-level issue. ObjectDB uses snapshots of the database, so changes by one transaction are not visible by operations that started before that transaction was committed. Assuming there are no errors in the database itself (as in previous reports) there may be something broken in the snapshot system, which results in using unexpected mix of page versions.

This could be in one of three main levels:

In processing uncommitted flushed changes - Does your application/test use flush (without or with later commit)?
In the page cache that manages all the snapshots.
In reading from the file. It would be useful to know if the errors are still produced when the option in #21 is disabled.

Anyway, given a test case that reproduces the errors, we should hopefully be able to find the exact cause and fix it.

ObjectDB Support

#41

hgzpincher

Joined on 2023‑08‑03

In processing uncommitted flushed changes - Does your application/test use flush (without or with later commit)?
>>> we removed all flushs and since that we do not have these errors any more
In reading from the file. It would be useful to know if the errors are still produced when the option in #21 is disabled.
>>> removing the option does not help

#42

support

Joined on 2010‑05‑03

>>> we removed all flushs and since that we do not have these errors any more

If this is the case it narrows the search for the bug considerably. Have you disabled it in your test or in your application?

Note that there may also be an automatic flush on queries after uncommitted changes.

Anyway, it may take more time to be sure that errors stopped and flush is indeed related to this issue.

ObjectDB Support

#43

hgzpincher

Joined on 2023‑08‑03

Have you disabled it in your test or in your application?

>>> in the application, we stress test against our application

#44

support

Joined on 2010‑05‑03

We now have a test that reproduces the issue and general understanding of what happens. It is indeed a different bug than the first on this thread (posts 1-14), which was fixed in version 2.9.3. It does not cause database corruption. It affects refresh operations after flush, and maybe other operations, but apparently queries are not affected directly. The effect should be local, i.e. between flush to commit/rollback or closing of the EntityManager (although it can happen again if the same conditions repeat). Apparently restart is not needed to recover, just a new EntityManager. It can cause the errors in #15, #26, #27, and #37 as well as other errors, such the error in #39, which looks totally different (the stack trace in #39 helped in creating a test and then finding the bug, as it indicates using refresh).

A new build will be released as soon as a repair is available. Meanwhile, avoiding flush seems as a good workaround, if possible.

ObjectDB Support

#45

hgzpincher

Joined on 2023‑08‑03

that's good news

#46

support

Joined on 2010‑05‑03

Build 2.9.3_01 (just released) should fix the second bug that is reported in this thread (from post 15).

In some cases, flush without commit could cause unexpected behviour in refresh (and possibly other operations) until commit/rollback or closing the EntityManager. The cause was premature release of previous database snapshots that the flush snapshot depends on.

A side effect of the repair could be increased memory usage when the uncommitted flushed transaction takes a long time (or the relevant EntityManager is not closed properly).

This change can be disabled by setting a system property:

    System.setProperty("objectdb.old-flush-optimization", "true");

ObjectDB Support