Issue #2964: missing merger

Type: Bug ReoprtVersion: 2.9.2_03Priority: CriticalStatus: FixedReplies: 27
#1

At our major aerospace customer we are facing again and again the problem that these errors appear suddenly in the log (we always do a doctor in that case but just after some hours it is there again), the system is completely unusable then:

 

[2025-04-22 06:36:36 #67 query.compiler] 
Large number of query plan combinations (32768) in query select o from ObjectNode o where o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.9.11" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.9.10" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.9" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.7" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.6.7" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.6.6" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.6.5" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.11" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.7.3" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.7.2" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.16.5" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.16" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.15" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055559.14.1" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055559.13.1".

[2025-04-22 12:33:37 #68 store] 
SectionClassifier: SectionClassifier{9->merger[2083]-missing:1}

[2025-04-22 12:33:48 #69 store] 
SectionClassifier: SectionClassifier{10->merger[2083]-missing:1}

[2025-04-22 12:33:50 #70 store] 
SectionClassifier: SectionClassifier{3885->merger[2083]-missing:1}

[2025-04-22 12:33:53 #71 store] 
SectionClassifier: SectionClassifier{3577->merger[2083]-missing:1}

[2025-04-22 12:33:59 #72 store] 
SectionClassifier: SectionClassifier{3577->merger[2083]-missing:1}

 

 

#2

This error message indicates that parts of objects  have not been found during retrieval (in objects with IDs 9, 10, 3855 and 3577), but the cause is unclear.

Please advise whether there are errors when you run the Doctor after this happens (and share the errors if any) and whether just restart helps, at least for some time. In addition, please clarify what do you mean by "system is completely unusable" in the context of ObjectDB, i.e. what is your observation regarding ObjectDB operations, exceptions, log, etc. after this happens.

ObjectDB Support
#3

Find attached the log oft he last doctor, regarding the other questions:

In addition, please clarify what do you mean by "system is completely unusable" in the context of ObjectDB, i.e. what is your observation regarding ObjectDB operations, exceptions, log, etc. after this happens.

>>> our complete system strucks with these errors, normal functionalities that need to save changes to objects are not working any more

We’ll try to download the corrupt database now and send you a download link as soon as we have that

#4

Sorry but we still cannot post in your tocket system.

We are using Java 17 now but still javax, could that be the problem ?

Here the link tot he backup of this night, no doctor done on that

https://www.dropbox.com/scl/fi/ulsgyq2t7uz4bhqvyjc5b/coreSystemDb.7z?rlkey=o3tpvdnz78b4ujvy2r69pbz97&dl=0

#5

This ist he log file from today, we have now to shutdown the system, our customer ISAR Aerospace changed to a desaster mode regarding our product

Sorry but we still cannot post in your tocket system.

We are using Java 17 now but still javax, could that be the problem ?

here the link to the current database (will take some more minutes until available
https://www.dropbox.com/scl/fi/ype139jhwcvdbctgoivrg/coreSystemDb-Kopie.7z?rlkey=wsedvt77vn8sksb7k95kh5aa5&dl=0


 

#6

Looking at the Doctor log file (#3 above) - the following lines indicate a corrupted database:

Large Sections with Errors:
Group #1:
Page#5699973 2:'3835471f-1a5e-4c23-82c1-aac8f7b41f14' 1/2 1980+218/2198
Group #2:
Page#5565011 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 1/8 1980+1980/14345
Page#5059295 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 2/8 3960+1980/14345
Page#5565032 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 3/8 5940+1980/14345
Page#5565033 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 4/8 7920+1980/14345
Page#5565034 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 5/8 9900+1980/14345
Page#5565035 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 6/8 11880+1980/14345
Page#5565036 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 7/8 13860+485/14345

So apparently the best quick emergency solution would be to use the Doctor to fix the database.

Maybe the difference between the databases in #4 and #5 can provide a clue regarding the cause of that corruption. If there is any other clue regarding special activity today at the time that these errors start please report.

> We are using Java 17 now but still javax, could that be the problem ?

No. Using Java 17 and javax should be fine.

> Sorry but we still cannot post in your ticket system.

Hopefully Reply should work now.

 

 

ObjectDB Support
#7

If there is any other clue regarding special activity today at the time that these errors start please report

>>> no it is completely random, we also have these errors:

[2025-04-13 17:59:03 #25 *] 
[ObjectDB 2.9.2_02] Unexpected exception (Error 990)
  Generated by OpenJDK 64-Bit Server VM 17.0.13 (on Windows Server 2019 10.0).
Please report this error on https://www.objectdb.com/issue
com.objectdb.o.InternalException: merge2 com.agile.hummingbird.ObjectNode:'3835496a-7399-4d22-b6b1-91c107ee323d' => merger[15352]-missing:1
com.objectdb.o.InternalException: merge2 com.agile.hummingbird.ObjectNode:'3835496a-7399-4d22-b6b1-91c107ee323d' => merger[15352]-missing:1
    at com.objectdb.o.PBI.h(PBI.java:126)
    at com.objectdb.o.PBI.g(PBI.java:99)
    at com.objectdb.o.OBI.ZZ(OBI.java:249)
    at com.objectdb.o.BQI.Zn(BQI.java:153)
    at com.objectdb.o.MQI.Zn(MQI.java:122)
    at com.objectdb.o.MQI.Zn(MQI.java:122)
    at com.objectdb.o.MQI.Zn(MQI.java:122)
    at com.objectdb.o.PRG.d(PRG.java:795)
    at com.objectdb.o.PRG.h(PRG.java:724)
    at com.objectdb.o.PRG.g(PRG.java:560)
    at com.objectdb.o.QRM.Z1(QRM.java:287)
    at com.objectdb.o.MST.Z1(MST.java:1026)
    at com.objectdb.o.WRA.Z1(WRA.java:313)
    at com.objectdb.o.WSM.Z1(WSM.java:117)
    at com.objectdb.o.QRR.k(QRR.java:260)
    at com.objectdb.o.QRR.i(QRR.java:154)
    at com.objectdb.jpa.JpaQuery.getResultList(JpaQuery.java:677)
    at com.agile.hummingbirdcore.SS_ScheduledSlot.calculateSlotData(SS_ScheduledSlot.java:1146)
    at com.agile.hummingbirdcore.SS_ScheduledSlot.onOneTimeCall(SS_ScheduledSlot.java:227)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at com.agile.hummingbirdcore.CyclicProcessor.doSingleTimeCalls(CyclicProcessor.java:268)
    at com.agile.hummingbirdcore.CyclicProcessor.access$300(CyclicProcessor.java:49)
    at com.agile.hummingbirdcore.CyclicProcessor$1.actionPerformed(CyclicProcessor.java:135)
    at java.desktop/javax.swing.Timer.fireActionPerformed(Timer.java:311)
    at java.desktop/javax.swing.Timer$DoPostEvent.run(Timer.java:243)
    at java.desktop/java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:318)
    at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:773)
    at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:720)
    at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:714)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
    at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
    at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:742)
    at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
    at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
    at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
    at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
    at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
    at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)

 

#8

Some initial observations:

Running the Doctor on the database at #4 (yesterday's backup) produced no errors.

Running the Doctor on the database at #5 (today) produced some broken references errors and:

Large Sections with Errors:
Group #1:
Page#5896863 2:'31a9588f-3da4-49ca-b466-bc3f5eaa77b6' 1/2 1980+103/2083

This ObjectNode object was last modified (according to your modified field) today at 12:33, exactly when the errors started.

So apparently something with that transaction produced the database errors.

Running the following query in the explorer:

select from ObjectNode where uUid = "31a9588f-3da4-49ca-b466-bc3f5eaa77b6"

returns 2 objects, which is unexpected of course and will be examined further.

Repairing the database with the Doctor should hopefully keep only the new version and remove the old version with missing parts. Please run the above query on the repaired database to check.

Strangely, the errors in the log (at #1) indicate IDs 9, 10, 3885, 3577 - maybe ObjectProperty instances? These errors are probably related because of the time, 12:33,  but these ObjectProperty instances are not corrupted.

Questions:

  1. The Doctor errors in the logs at #3 and #7 above are different. They are not related to this specific incident, right? Have you fixed the database with the Doctor to solve them before today?
  2. Any other errors in the log today (except #1 above)?
  3. Did you try to restart the system without fixing the database?
  4. Does the system work after fixing the database with the Doctor?
ObjectDB Support
#9
  1. The Doctor errors in the logs at #3 and #7 above are different. They are not related to this specific incident, right? Have you fixed the database with the Doctor to solve them before today?

>>> yes they are different, the last time we did run the doctor was on 17.4. (see the log attached to #7)

2. Any other errors in the log today (except #1 above)?

>>> all the errors are in #7 log file

3. Did you try to restart the system without fixing the database?

>>> we did a doctor just some hours ago

4. Does the system work after fixing the database with the Doctor?

>>> after running the doctor it is running now, but currently there is nearly no activity

by the way, we are using

<locking version-check="false" />

could that cause problems like these ?

 

 

#10

we checked our internal log and exactly at the moment where the merger problem started we did have 2 threads accessing/committing that object

#11

> by the way, we are using
> <locking version-check="false" />
> could that cause problems like these ?

Excellent spot on. It helped a lot in reproducing the issue with a test program and repairing it.

The problem was in cleaning continuing sections of large objects (> 1980 bytes), when these large objects are updated to small objects (<= 1980 bytes). Usually an object is read, updated and written using the same EntityManager and an optimistic lock exception is thrown if another EntityManager breaks this sequence, so it is easy to know when a large object is updated to a small object. When version check is disabled, one EntityManager may update a small object that remains small, assuming clearing sections in not needed, but it may actually override a change to a large object by another EntityManager. So old obsolete sections that should have been deleted can remain in the database.

Version 2.9.3 fixes the bug.

Disabling version check can still cause database integrity issues, as updates are overridden silently, but at the application level, ObjectDB should not crash.

ObjectDB Support
#12

perfect, when can you provide us that new version ?

#13

It is already available for download.

ObjectDB Support
#14

Hi, observing the customers logs, it looks like the problem is solved, thanks

#15

just now we have the same problem again

 

-04-29 13:13:11 #65 store] 
SectionClassifier: SectionClassifier{'10d140f4-1ffe-4fb2-8bd3-6cb412db24ca'->merger[9507]-missing:1}

[2025-04-29 13:17:26 #66 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[8860]-missing:4}

[2025-04-29 13:17:31 #67 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[8860]-missing:4}

[2025-04-29 13:17:39 #68 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[8860]-missing:4}

[2025-04-29 13:17:39 #69 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[8860]-missing:4}

[2025-04-29 13:43:11 #70 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:11 #71 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:14 #72 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:14 #73 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

 

#16

please check that again, we have again a more than bad situation

#17

Copies of the databases before (latest backup) and after the exception will help.

ObjectDB Support
#18

Rethinking about the repair, it wouldn't cover a mixed environment in which some users use version-check="false" and others version-check="true". This is probably not the case in your system, however, although the repair fixes the test case that we have that reproduces the failure, it may not cover some other specific case. Investigating the corrupted database against the latest database would be needed first to observe whether this is the same issue or another issue with a similar exception.

ObjectDB Support
#19

You may also try setting a system property (using System.setProperty on application start or as a -D JVM argument) that disables the optimisation that was found to be problematic:

    System.setProperty("com.objectdb.disable.optimization.small", "true");

This is more aggressive than the default in version 2.9.3 and would cover also a mixed environment in which some users use version-check="false" and others version-check="true".

ObjectDB Support
#20

these are the 2 databases:

https://www.dropbox.com/scl/fi/8l09ajznb0haiw3z3uojb/coreSystemDb-backup.7z?rlkey=omnykeb5s1fmlfea9iq5dkhcb&dl=0

https://www.dropbox.com/scl/fi/y5jbjzcojvfb3yj1wb4gy/coreSystemDb-error.7z?rlkey=9gjoqcl0o0wtnawlgq8qhnt5o&dl=0

#21

we also have, from the very past, set this property:
 

System.setProperty("objectdb.temp.concurrent-file-read", "true");

by the way, in our application it is not possible that some users use version-check="false" and others version-check="true".

#22

do you have any news for us, the pressure is getting bigger and bigger

#23

Initial checks indicate that this is a different issue than the previous one that was fixed in version 2.9.3.

No errors have been found in the database file by the Doctor and also by manual inspection of the relevant objects. Both mentioned ObjectNode instances in the log on #15 (10d140f4-1ffe-4fb2-8bd3-6cb412db24ca,  3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b) seem valid. Both have been updated yesterday, possibly when the error was generated. So this time it might be a cache issue rather than database corruption.

As the database is not corrupted, there are less clues this time. We may need a new build with more logging details.

Meanwhile It may help if you could provide more information:

  1. Have you run the Doctor? Can you confirm that there are no errors?
  2. Are there additional errors in the log since the lines in #15?
  3. Has the system crashed or the only indication of errors is in the log?
  4. Have you restarted the system after it happened? Did it change anything?
ObjectDB Support
#24
  1. Have you run the Doctor? Can you confirm that there are no errors?  >>> no, we just restarted
  2. Are there additional errors in the log since the lines in #15? >>> no, we also added the property mentioned in #19
  3. Has the system crashed or the only indication of errors is in the log? >>> no, just the log
  4. Have you restarted the system after it happened? Did it change anything? >>> yes, we restarted with the property set
#25

OK. So this seems to be indeed a different issue, as the previous one was a database corruption issue. At least we can assume for now that there is no risk of database corruption, and in the worst case restarting the system to clear all cached data may help, for now.

The property from #19 addresses the previous issue (in more aggressive way than the default), so it is unlikely to solve the new issue.

Looking at #15 again, there was one report (at 13:13:11) regarding one object and two reports (at 13:17, 13:43) regarding another object. When did you restart the system? Is it possible that the errors were generated when these specific objects where modified and queried concurrently, but no exceptions are thrown later, i.e. restarting the system wasn't really essential and may have no effect in this case? This information is important in order to figure out whether a cache is involved or just a bad managed collision. It is also possible (in the optimal scenario) that this is just a noise in the log.

Anyway, the step forward should be a new build in which these log messages are replaced with more detailed messages.

ObjectDB Support
#26

we restarted at:

 

[2025-04-29 13:43:11 #70 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:11 #71 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:14 #72 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:14 #73 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 14:28:46 #74 store] 
Database 'G:\Hummingbird-Database\objectdb\db\coreSystemDb.odb' is closed by 5848@MES-M02

[2025-04-29 14:28:46 #75 server] 
Server on port 3333 has stopped by 5848@MES-M02

 

#27

after running a stress test on our test database after several hours we have these errors

[2025-04-30 22:55:41 #20 store] 
SectionClassifier: SectionClassifier{2->merger[85222]-missing:1}

[2025-04-30 23:47:58 #21 store] 
SectionClassifier: SectionClassifier{6->merger[23770]-missing:1,9->merger[23762]-missing:1}

[2025-05-01 02:41:39 #22 store] 
SectionClassifier: SectionClassifier{2->merger[164354]-missing:1}

[2025-05-01 03:12:08 #23 store] 
SectionClassifier: SectionClassifier{2->merger[176336]-missing:1}

[2025-05-01 03:39:55 #24 store] 
SectionClassifier: SectionClassifier{2->merger[186155]-missing:1}

[2025-05-01 05:10:36 #25 store] 
SectionClassifier: SectionClassifier{2->merger[215993]-missing:1}

[2025-05-01 05:16:43 #26 store] 
SectionClassifier: SectionClassifier{2->merger[217815]-missing:1}

[2025-05-01 05:48:58 #27 store] 
Database '/Users/hgz/Documents/Hummingbird/App/objectdb/db/coreSystemDb.odb' is closed by 31829@MacBook-Pro.local

[2025-05-01 05:48:58 #28 server] 
Server on port 3333 has stopped by 31829@MacBook-Pro.local

 

in that database

https://www.dropbox.com/scl/fi/w5cgxk0nm264wqjbf0bon/coreSystemDb-3.odb.zip?rlkey=afmdzfzw5kh4s37tkocxsapcg&dl=0

 

 

#28

as already mentioned this cannot be just a noise in the log, the system is getting unusable in this error case

Reply