Issue #2964: missing merger

Type: Bug ReoprtVersion: 2.9.2_03Priority: CriticalStatus: FixedReplies: 45
#1

At our major aerospace customer we are facing again and again the problem that these errors appear suddenly in the log (we always do a doctor in that case but just after some hours it is there again), the system is completely unusable then:

 

[2025-04-22 06:36:36 #67 query.compiler] 
Large number of query plan combinations (32768) in query select o from ObjectNode o where o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.9.11" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.9.10" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.9" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.7" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.6.7" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.6.6" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.8.6.5" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055562.11" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.7.3" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.7.2" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.16.5" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.16" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055560.15" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055559.14.1" or o.nodePath = "/(CR)SN02/(UC)SN02 12/(YC)TAKT 3/(PR)S2-LPP-0395/(PD)055559.13.1".

[2025-04-22 12:33:37 #68 store] 
SectionClassifier: SectionClassifier{9->merger[2083]-missing:1}

[2025-04-22 12:33:48 #69 store] 
SectionClassifier: SectionClassifier{10->merger[2083]-missing:1}

[2025-04-22 12:33:50 #70 store] 
SectionClassifier: SectionClassifier{3885->merger[2083]-missing:1}

[2025-04-22 12:33:53 #71 store] 
SectionClassifier: SectionClassifier{3577->merger[2083]-missing:1}

[2025-04-22 12:33:59 #72 store] 
SectionClassifier: SectionClassifier{3577->merger[2083]-missing:1}

 

 

#2

This error message indicates that parts of objects  have not been found during retrieval (in objects with IDs 9, 10, 3855 and 3577), but the cause is unclear.

Please advise whether there are errors when you run the Doctor after this happens (and share the errors if any) and whether just restart helps, at least for some time. In addition, please clarify what do you mean by "system is completely unusable" in the context of ObjectDB, i.e. what is your observation regarding ObjectDB operations, exceptions, log, etc. after this happens.

ObjectDB Support
#3

Find attached the log oft he last doctor, regarding the other questions:

In addition, please clarify what do you mean by "system is completely unusable" in the context of ObjectDB, i.e. what is your observation regarding ObjectDB operations, exceptions, log, etc. after this happens.

>>> our complete system strucks with these errors, normal functionalities that need to save changes to objects are not working any more

We’ll try to download the corrupt database now and send you a download link as soon as we have that

#4

Sorry but we still cannot post in your tocket system.

We are using Java 17 now but still javax, could that be the problem ?

Here the link tot he backup of this night, no doctor done on that

https://www.dropbox.com/scl/fi/ulsgyq2t7uz4bhqvyjc5b/coreSystemDb.7z?rlkey=o3tpvdnz78b4ujvy2r69pbz97&dl=0

#5

This ist he log file from today, we have now to shutdown the system, our customer ISAR Aerospace changed to a desaster mode regarding our product

Sorry but we still cannot post in your tocket system.

We are using Java 17 now but still javax, could that be the problem ?

here the link to the current database (will take some more minutes until available
https://www.dropbox.com/scl/fi/ype139jhwcvdbctgoivrg/coreSystemDb-Kopie.7z?rlkey=wsedvt77vn8sksb7k95kh5aa5&dl=0


 

#6

Looking at the Doctor log file (#3 above) - the following lines indicate a corrupted database:

Large Sections with Errors:
Group #1:
Page#5699973 2:'3835471f-1a5e-4c23-82c1-aac8f7b41f14' 1/2 1980+218/2198
Group #2:
Page#5565011 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 1/8 1980+1980/14345
Page#5059295 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 2/8 3960+1980/14345
Page#5565032 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 3/8 5940+1980/14345
Page#5565033 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 4/8 7920+1980/14345
Page#5565034 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 5/8 9900+1980/14345
Page#5565035 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 6/8 11880+1980/14345
Page#5565036 2:'e33e869c-c6ea-467c-add8-38b53fb5d83e' 7/8 13860+485/14345

So apparently the best quick emergency solution would be to use the Doctor to fix the database.

Maybe the difference between the databases in #4 and #5 can provide a clue regarding the cause of that corruption. If there is any other clue regarding special activity today at the time that these errors start please report.

> We are using Java 17 now but still javax, could that be the problem ?

No. Using Java 17 and javax should be fine.

> Sorry but we still cannot post in your ticket system.

Hopefully Reply should work now.

 

 

ObjectDB Support
#7

If there is any other clue regarding special activity today at the time that these errors start please report

>>> no it is completely random, we also have these errors:

[2025-04-13 17:59:03 #25 *] 
[ObjectDB 2.9.2_02] Unexpected exception (Error 990)
  Generated by OpenJDK 64-Bit Server VM 17.0.13 (on Windows Server 2019 10.0).
Please report this error on https://www.objectdb.com/issue
com.objectdb.o.InternalException: merge2 com.agile.hummingbird.ObjectNode:'3835496a-7399-4d22-b6b1-91c107ee323d' => merger[15352]-missing:1
com.objectdb.o.InternalException: merge2 com.agile.hummingbird.ObjectNode:'3835496a-7399-4d22-b6b1-91c107ee323d' => merger[15352]-missing:1
    at com.objectdb.o.PBI.h(PBI.java:126)
    at com.objectdb.o.PBI.g(PBI.java:99)
    at com.objectdb.o.OBI.ZZ(OBI.java:249)
    at com.objectdb.o.BQI.Zn(BQI.java:153)
    at com.objectdb.o.MQI.Zn(MQI.java:122)
    at com.objectdb.o.MQI.Zn(MQI.java:122)
    at com.objectdb.o.MQI.Zn(MQI.java:122)
    at com.objectdb.o.PRG.d(PRG.java:795)
    at com.objectdb.o.PRG.h(PRG.java:724)
    at com.objectdb.o.PRG.g(PRG.java:560)
    at com.objectdb.o.QRM.Z1(QRM.java:287)
    at com.objectdb.o.MST.Z1(MST.java:1026)
    at com.objectdb.o.WRA.Z1(WRA.java:313)
    at com.objectdb.o.WSM.Z1(WSM.java:117)
    at com.objectdb.o.QRR.k(QRR.java:260)
    at com.objectdb.o.QRR.i(QRR.java:154)
    at com.objectdb.jpa.JpaQuery.getResultList(JpaQuery.java:677)
    at com.agile.hummingbirdcore.SS_ScheduledSlot.calculateSlotData(SS_ScheduledSlot.java:1146)
    at com.agile.hummingbirdcore.SS_ScheduledSlot.onOneTimeCall(SS_ScheduledSlot.java:227)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:569)
    at com.agile.hummingbirdcore.CyclicProcessor.doSingleTimeCalls(CyclicProcessor.java:268)
    at com.agile.hummingbirdcore.CyclicProcessor.access$300(CyclicProcessor.java:49)
    at com.agile.hummingbirdcore.CyclicProcessor$1.actionPerformed(CyclicProcessor.java:135)
    at java.desktop/javax.swing.Timer.fireActionPerformed(Timer.java:311)
    at java.desktop/javax.swing.Timer$DoPostEvent.run(Timer.java:243)
    at java.desktop/java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:318)
    at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:773)
    at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:720)
    at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:714)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
    at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
    at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:742)
    at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
    at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
    at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
    at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
    at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
    at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)

 

#8

Some initial observations:

Running the Doctor on the database at #4 (yesterday's backup) produced no errors.

Running the Doctor on the database at #5 (today) produced some broken references errors and:

Large Sections with Errors:
Group #1:
Page#5896863 2:'31a9588f-3da4-49ca-b466-bc3f5eaa77b6' 1/2 1980+103/2083

This ObjectNode object was last modified (according to your modified field) today at 12:33, exactly when the errors started.

So apparently something with that transaction produced the database errors.

Running the following query in the explorer:

select from ObjectNode where uUid = "31a9588f-3da4-49ca-b466-bc3f5eaa77b6"

returns 2 objects, which is unexpected of course and will be examined further.

Repairing the database with the Doctor should hopefully keep only the new version and remove the old version with missing parts. Please run the above query on the repaired database to check.

Strangely, the errors in the log (at #1) indicate IDs 9, 10, 3885, 3577 - maybe ObjectProperty instances? These errors are probably related because of the time, 12:33,  but these ObjectProperty instances are not corrupted.

Questions:

  1. The Doctor errors in the logs at #3 and #7 above are different. They are not related to this specific incident, right? Have you fixed the database with the Doctor to solve them before today?
  2. Any other errors in the log today (except #1 above)?
  3. Did you try to restart the system without fixing the database?
  4. Does the system work after fixing the database with the Doctor?
ObjectDB Support
#9
  1. The Doctor errors in the logs at #3 and #7 above are different. They are not related to this specific incident, right? Have you fixed the database with the Doctor to solve them before today?

>>> yes they are different, the last time we did run the doctor was on 17.4. (see the log attached to #7)

2. Any other errors in the log today (except #1 above)?

>>> all the errors are in #7 log file

3. Did you try to restart the system without fixing the database?

>>> we did a doctor just some hours ago

4. Does the system work after fixing the database with the Doctor?

>>> after running the doctor it is running now, but currently there is nearly no activity

by the way, we are using

<locking version-check="false" />

could that cause problems like these ?

 

 

#10

we checked our internal log and exactly at the moment where the merger problem started we did have 2 threads accessing/committing that object

#11

> by the way, we are using
> <locking version-check="false" />
> could that cause problems like these ?

Excellent spot on. It helped a lot in reproducing the issue with a test program and repairing it.

The problem was in cleaning continuing sections of large objects (> 1980 bytes), when these large objects are updated to small objects (<= 1980 bytes). Usually an object is read, updated and written using the same EntityManager and an optimistic lock exception is thrown if another EntityManager breaks this sequence, so it is easy to know when a large object is updated to a small object. When version check is disabled, one EntityManager may update a small object that remains small, assuming clearing sections in not needed, but it may actually override a change to a large object by another EntityManager. So old obsolete sections that should have been deleted can remain in the database.

Version 2.9.3 fixes the bug.

Disabling version check can still cause database integrity issues, as updates are overridden silently, but at the application level, ObjectDB should not crash.

ObjectDB Support
#12

perfect, when can you provide us that new version ?

#13

It is already available for download.

ObjectDB Support
#14

Hi, observing the customers logs, it looks like the problem is solved, thanks

#15

just now we have the same problem again

 

-04-29 13:13:11 #65 store] 
SectionClassifier: SectionClassifier{'10d140f4-1ffe-4fb2-8bd3-6cb412db24ca'->merger[9507]-missing:1}

[2025-04-29 13:17:26 #66 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[8860]-missing:4}

[2025-04-29 13:17:31 #67 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[8860]-missing:4}

[2025-04-29 13:17:39 #68 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[8860]-missing:4}

[2025-04-29 13:17:39 #69 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[8860]-missing:4}

[2025-04-29 13:43:11 #70 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:11 #71 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:14 #72 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:14 #73 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

 

#16

please check that again, we have again a more than bad situation

#17

Copies of the databases before (latest backup) and after the exception will help.

ObjectDB Support
#18

Rethinking about the repair, it wouldn't cover a mixed environment in which some users use version-check="false" and others version-check="true". This is probably not the case in your system, however, although the repair fixes the test case that we have that reproduces the failure, it may not cover some other specific case. Investigating the corrupted database against the latest database would be needed first to observe whether this is the same issue or another issue with a similar exception.

ObjectDB Support
#19

You may also try setting a system property (using System.setProperty on application start or as a -D JVM argument) that disables the optimisation that was found to be problematic:

    System.setProperty("com.objectdb.disable.optimization.small", "true");

This is more aggressive than the default in version 2.9.3 and would cover also a mixed environment in which some users use version-check="false" and others version-check="true".

ObjectDB Support
#20

these are the 2 databases:

https://www.dropbox.com/scl/fi/8l09ajznb0haiw3z3uojb/coreSystemDb-backup.7z?rlkey=omnykeb5s1fmlfea9iq5dkhcb&dl=0

https://www.dropbox.com/scl/fi/y5jbjzcojvfb3yj1wb4gy/coreSystemDb-error.7z?rlkey=9gjoqcl0o0wtnawlgq8qhnt5o&dl=0

#21

we also have, from the very past, set this property:
 

System.setProperty("objectdb.temp.concurrent-file-read", "true");

by the way, in our application it is not possible that some users use version-check="false" and others version-check="true".

#22

do you have any news for us, the pressure is getting bigger and bigger

#23

Initial checks indicate that this is a different issue than the previous one that was fixed in version 2.9.3.

No errors have been found in the database file by the Doctor and also by manual inspection of the relevant objects. Both mentioned ObjectNode instances in the log on #15 (10d140f4-1ffe-4fb2-8bd3-6cb412db24ca,  3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b) seem valid. Both have been updated yesterday, possibly when the error was generated. So this time it might be a cache issue rather than database corruption.

As the database is not corrupted, there are less clues this time. We may need a new build with more logging details.

Meanwhile It may help if you could provide more information:

  1. Have you run the Doctor? Can you confirm that there are no errors?
  2. Are there additional errors in the log since the lines in #15?
  3. Has the system crashed or the only indication of errors is in the log?
  4. Have you restarted the system after it happened? Did it change anything?
ObjectDB Support
#24
  1. Have you run the Doctor? Can you confirm that there are no errors?  >>> no, we just restarted
  2. Are there additional errors in the log since the lines in #15? >>> no, we also added the property mentioned in #19
  3. Has the system crashed or the only indication of errors is in the log? >>> no, just the log
  4. Have you restarted the system after it happened? Did it change anything? >>> yes, we restarted with the property set
#25

OK. So this seems to be indeed a different issue, as the previous one was a database corruption issue. At least we can assume for now that there is no risk of database corruption, and in the worst case restarting the system to clear all cached data may help, for now.

The property from #19 addresses the previous issue (in more aggressive way than the default), so it is unlikely to solve the new issue.

Looking at #15 again, there was one report (at 13:13:11) regarding one object and two reports (at 13:17, 13:43) regarding another object. When did you restart the system? Is it possible that the errors were generated when these specific objects where modified and queried concurrently, but no exceptions are thrown later, i.e. restarting the system wasn't really essential and may have no effect in this case? This information is important in order to figure out whether a cache is involved or just a bad managed collision. It is also possible (in the optimal scenario) that this is just a noise in the log.

Anyway, the step forward should be a new build in which these log messages are replaced with more detailed messages.

ObjectDB Support
#26

we restarted at:

 

[2025-04-29 13:43:11 #70 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:11 #71 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:14 #72 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 13:43:14 #73 store] 
SectionClassifier: SectionClassifier{'3a428d6d-0b2c-4ab4-bb2e-bd0227a3780b'->merger[6672]-missing:1}

[2025-04-29 14:28:46 #74 store] 
Database 'G:\Hummingbird-Database\objectdb\db\coreSystemDb.odb' is closed by 5848@MES-M02

[2025-04-29 14:28:46 #75 server] 
Server on port 3333 has stopped by 5848@MES-M02

 

#27

after running a stress test on our test database after several hours we have these errors

[2025-04-30 22:55:41 #20 store] 
SectionClassifier: SectionClassifier{2->merger[85222]-missing:1}

[2025-04-30 23:47:58 #21 store] 
SectionClassifier: SectionClassifier{6->merger[23770]-missing:1,9->merger[23762]-missing:1}

[2025-05-01 02:41:39 #22 store] 
SectionClassifier: SectionClassifier{2->merger[164354]-missing:1}

[2025-05-01 03:12:08 #23 store] 
SectionClassifier: SectionClassifier{2->merger[176336]-missing:1}

[2025-05-01 03:39:55 #24 store] 
SectionClassifier: SectionClassifier{2->merger[186155]-missing:1}

[2025-05-01 05:10:36 #25 store] 
SectionClassifier: SectionClassifier{2->merger[215993]-missing:1}

[2025-05-01 05:16:43 #26 store] 
SectionClassifier: SectionClassifier{2->merger[217815]-missing:1}

[2025-05-01 05:48:58 #27 store] 
Database '/Users/hgz/Documents/Hummingbird/App/objectdb/db/coreSystemDb.odb' is closed by 31829@MacBook-Pro.local

[2025-05-01 05:48:58 #28 server] 
Server on port 3333 has stopped by 31829@MacBook-Pro.local

 

in that database

https://www.dropbox.com/scl/fi/w5cgxk0nm264wqjbf0bon/coreSystemDb-3.odb.zip?rlkey=afmdzfzw5kh4s37tkocxsapcg&dl=0

 

 

#28

as already mentioned this cannot be just a noise in the log, the system is getting unusable in this error case

#29

> as already mentioned this cannot be just a noise in the log, the system is getting unusable in this error case

Yesterday's report indicated log errors only (#24, 3), so this update is important. Can you confirm that restarting the system stops the unstable state until the next log error?

> after running a stress test on our test database after several hours we have these errors

We also ran a stress test yesterday with an attempt to mimic your system (including #21, etc.) but couldn't get the errors. More details have been added to the log error messages (in development build, not released yet), but it is unlikely that these details will expose the cause when it happens. A live debugging may be essential. Can you share your stress test? No source is needed, just something runnable that we can try with different versions of objectdb.jar. This path may be very challenging and long but it is unclear what are the alternatives now.

ObjectDB Support
#30

Can you confirm that restarting the system stops the unstable state until the next log error?

>>> yes

Can you share your stress test?

>>> yes, we prepare that, needs some minutes

#31

If there is a way to shorten the time that it takes for the errors to occur it would be very helpful, e.g. maybe by using larger ObjectNode objects that change their size frequently.

ObjectDB Support
#32

Running the Doctor on the stress test database (#27) does not show section errors, i.e. the database is not corrupted.

There is one application level error:

Broken References:
[1] com.agile.hummingbird.ObjectNode(2):'29c314bf-e504-4279-ae2c-3b62b8b8fa57'

Can you confirm that this is expected and should be ignored?

ObjectDB Support
#33

this is expected, no problem

#34

we are preparing the stresstest, important is to run simultaneous updates of the same objects at the same time

#35

Yes, and also simultaneous retrieval (as the log errors are generated during retrieval operations) and probably updates that change the number of 2K sections in the objects. Unfortunately our tests include all that but do not produce the errors.

ObjectDB Support
#36

we are currently trying to create a stress test that shows results faster. It is a pain ... sometimes we have it after 10 minutes, sometimes after 2-3 hours

#37

now, just running our application over the night we have again these entries

[2025-05-01 20:58:19 #21 store] 
SectionClassifier: SectionClassifier{'2a9068c9-b843-49a1-93d1-f3034686ecba'->merger[89131]-missing:1}

[2025-05-01 20:58:45 #22 store] 
SectionClassifier: SectionClassifier{'2a9068c9-b843-49a1-93d1-f3034686ecba'->merger[89131]-missing:1}

[2025-05-01 20:58:59 #23 store] 
SectionClassifier: SectionClassifier{'2a9068c9-b843-49a1-93d1-f3034686ecba'->merger[89131]-missing:1}

[2025-05-02 06:30:01 #24 store] 
Database '/Users/hgz/Documents/Hummingbird/App/objectdb/db/coreSystemDb.odb' is closed by 1615@MacBook-Pro.local

[2025-05-02 06:30:01 #25 server] 
Server on port 3333 has stopped by 1615@MacBook-Pro.local

at both times there was a single application doing a calculation where the Action and ObjectNode class was involved. The problematic object gets a new Action object linked into the actions collection during that process. You can see that there are now 16798 Action objects linked

here is the database

https://www.dropbox.com/scl/fi/get290bgwfh73hr1p26y3/coreSystemDb.odb.zip?rlkey=89ygzbgyce1fyke4zhrcbwvn5&dl=0

 

#38

No Doctor errors in this database except the expected broken references.

> The problematic object gets a new Action object linked into the actions collection during that process. 

> we are currently trying to create a stress test that shows results faster. It is a pain ... sometimes we have it after 10 minutes, sometimes after 2-3 hours

Maybe the exception frequency will increase if Actions instances will take more space, as the error seems to be related to changes that change the number of 2KB sections in ObjectNode instances.

ObjectDB Support
#39

now we have another error

 

[2025-05-03 13:58:55 #31 *] 
java.lang.ArrayIndexOutOfBoundsException: Index -4 out of bounds for length 2048
    at com.objectdb.o.BYR.O(BYR.java:912)
    at com.objectdb.o.BYR.R(BYR.java:995)
    at com.objectdb.o.BYR.H(BYR.java:1071)
    at com.objectdb.o.BYR.G(BYR.java:1059)
    at com.objectdb.o.MST.ay(MST.java:525)
    at com.objectdb.o.MST.au(MST.java:491)
    at com.objectdb.o.MST.ZC(MST.java:462)
    at com.objectdb.o.WRA.ZC(WRA.java:268)
    at com.objectdb.o.LDR.u(LDR.java:568)
    at com.objectdb.o.LDR.Yo(LDR.java:485)
    at com.objectdb.o.OBC.Yo(OBC.java:1131)
    at com.objectdb.o.ERF.c(ERF.java:105)
    at com.objectdb.o.TVS.b(TVS.java:105)
    at com.objectdb.o.TVS.b(TVS.java:94)
    at com.objectdb.jpa.EMImpl.refresh(EMImpl.java:759)
    at com.objectdb.jpa.EMImpl.refresh(EMImpl.java:700)
    at com.objectdb.jdo.PMImpl.refresh(PMImpl.java:989)
    at com.agile.hummingbirdcore.CoreDaemon.refresh(CoreDaemon.java:801)
    at com.agile.hummingbirdcore.AC_Action.endAction(AC_Action.java:1609)
    at com.agile.hummingbirdcore.API.executeEndAction(API.java:3017)
    at com.agile.hummingbirdcore.API.handleAction(API.java:2533)
    at com.agile.hummingbirdcore.API.handleActionBulk(API.java:2133)
    at com.agile.hummingbirdcore.API.directRequest(API.java:1695)
    at com.agile.hummingbirdcore.API.handleWebSocketRequest(API.java:1381)
    at com.agile.hummingbirdcore.WebSocketServerListener.onWebSocketText(WebSocketServerListener.java:212)
    at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:296)
    at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
    at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:67)
    at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:235)
    at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:152)
    at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:326)
    at org.eclipse.jetty.websocket.common.extensions.AbstractExtension.nextIncomingFrame(AbstractExtension.java:148)
    at org.eclipse.jetty.websocket.common.extensions.compress.PerMessageDeflateExtension.nextIncomingFrame(PerMessageDeflateExtension.java:111)
    at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension.forwardIncoming(CompressExtension.java:169)
    at org.eclipse.jetty.websocket.common.extensions.compress.PerMessageDeflateExtension.incomingFrame(PerMessageDeflateExtension.java:90)
    at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:202)
    at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:225)
    at org.eclipse.jetty.websocket.common.Parser.parseSingleFrame(Parser.java:259)
    at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:459)
    at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:440)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.base/java.lang.Thread.run(Thread.java:840)

 

#40

The new error may possibly be related to the same low-level issue. ObjectDB uses snapshots of the database, so changes by one transaction are not visible by operations that started before that transaction was committed. Assuming there are no errors in the database itself (as in previous reports) there may be something broken in the snapshot system, which results in using unexpected mix of page versions.

This could be in one of three main levels:

  1. In processing uncommitted flushed changes - Does your application/test use flush (without or with later commit)?
  2. In the page cache that manages all the snapshots.
  3. In reading from the file. It would be useful to know if the errors are still produced when the option in #21 is disabled.

Anyway, given a test case that reproduces the errors, we should hopefully be able to find the exact cause and fix it.

ObjectDB Support
#41

  1. In processing uncommitted flushed changes - Does your application/test use flush (without or with later commit)?
    >>> we removed all flushs and since that we do not have these errors any more
  2. In reading from the file. It would be useful to know if the errors are still produced when the option in #21 is disabled.
    >>> removing the option does not help
#42

>>> we removed all flushs and since that we do not have these errors any more

If this is the case it narrows the search for the bug considerably. Have you disabled it in your test or in your application?

Note that there may also be an automatic flush on queries after uncommitted changes.

Anyway, it may take more time to be sure that errors stopped and flush is indeed related to this issue.

ObjectDB Support
#43

Have you disabled it in your test or in your application?

>>> in the application, we stress test against our application

 

#44

We now have a test that reproduces the issue and general understanding of what happens. It is indeed a different bug than the first on this thread (posts 1-14), which was fixed in version 2.9.3. It does not cause database corruption. It affects refresh operations after flush, and maybe other operations, but apparently queries are not affected directly. The effect should be local, i.e. between flush to commit/rollback or closing of the EntityManager (although it can happen again if the same conditions repeat). Apparently restart is not needed to recover, just a new EntityManager. It can cause the errors in #15, #26, #27, and #37 as well as other errors, such the error in #39, which looks totally different (the stack trace in #39 helped in creating a test and then finding the bug, as it indicates using refresh).

A new build will be released as soon as a repair is available. Meanwhile, avoiding flush seems as a good workaround, if possible.

ObjectDB Support
#45

that's good news

#46

Build 2.9.3_01 (just released) should fix the second bug that is reported in this thread (from post 15).

In some cases, flush without commit could cause unexpected behviour in refresh (and possibly other operations) until commit/rollback or closing the EntityManager. The cause was premature release of previous database snapshots that the flush snapshot depends on.

A side effect of the repair could be increased memory usage when the uncommitted flushed transaction takes a long time (or the relevant EntityManager is not closed properly).

This change can be disabled by setting a system property:

    System.setProperty("objectdb.old-flush-optimization", "true");

 

 

 

ObjectDB Support

Reply