Issue #2912: retrieval again and again started

Type: Bug Report | Version: 2.9.0 | Priority: High | Status: Fixed | Replies: 18
#1

As already mentioned in other posts, the database repeatedly starts a retrieval like:

RetrievalTask.reportPrimaryResult0: page#2356263, reader.getPos() = 53

or 

[2024-07-04 10:07:11 #45 store] 
RetrievalTask.reportPrimaryResult0: page#2809028, reader.getPos() = 0

[2024-07-04 10:07:11 #46 store] 
SectionClassifier: SectionClassifier{48->merger[3590]-missing:1,49->merger[3590]-missing:1,50->merger[3590]-missing:1}
 

When this happens, the system is nearly unusable for a long period of time. This happens in all versions.

 

#2

The problem with this issue is that we cannot reproduce it and do not understand its cause. If you are able to produce a test that demonstrates it, that would be very helpful.

Build 2.9.0_02 includes another attempt to fix the issue (the chance that this solves it is not high, but it is probably worth trying) plus improved logging, which is expected to add additional stack traces to the log in this case.

(edited: Build 2.9.0_01 -> Build 2.9.0_02)

ObjectDB Support
#3

OK, we have built a new version, which is now active at the customer. We will keep you informed.

#4

Hi, for nearly 2 weeks there have been no more retrievals. We will continue to watch and keep you informed.

#5

If there are no more retrieval errors, that is excellent news. Thank you for the update.

ObjectDB Support
#6

Update: still no retrievals.

#7

If you don't have retrieval exceptions anymore, we can close this issue. It can be re-opened if something changes. Thank you for the update.

ObjectDB Support
#8

We do have the retrieval again, see the attached log. The queries mentioned here are completely unspecific (any query that normally takes a few ms).

Could there be a problem somewhere in the cache / write-to-disk mechanisms? We have hundreds of clients communicating with our core process. Attached is also the configuration that we are using.

#9

The dominant exceptions in this log seem different from those in the original post, so hopefully build 2.9.0_02 did solve the main issue.

There are many query execution timeout exceptions in the new log. Did you set a limit of 10 seconds? In that case an exception is expected, but the log lines should be improved, as reader.getPos() etc. is not relevant (it is printed as the exception propagates up, but is irrelevant in this case).
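For reference, such a limit is typically applied per query via the standard JPA timeout hint (in milliseconds); a minimal sketch, assuming a placeholder persistence unit named "core" and a placeholder entity MyEntity:

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.persistence.Query;

public class QueryTimeoutExample {
    public static void main(String[] args) {
        // "core" is a placeholder persistence unit name
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("core");
        EntityManager em = emf.createEntityManager();
        try {
            // MyEntity is a placeholder entity class
            Query query = em.createQuery("SELECT e FROM MyEntity e");
            // Standard JPA hint: abort execution after the given number of milliseconds
            query.setHint("javax.persistence.query.timeout", 10000);
            query.getResultList();
        } finally {
            em.close();
            emf.close();
        }
    }
}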

There is one place in which "SectionClassifier" from the original post (and all your previous reports on this issue) appears (log #127), but that also happens at the same time as a query execution timeout exception (log #126), so hopefully this is still within the scope of query execution timeouts with irrelevant log info for that case.

Apart from the log, please advise whether there are exceptions that are not related to query execution timeouts, and whether there are side effects such as slowness, etc., that cannot be explained by high load and the re-running of failed queries.

ObjectDB Support
#10

There is a 100-second timeout set. At the times when the timeout happens, the system is very slow.

#11

As it seems now, this is not related to the original issue (except that the log includes info that should be removed).

Obviously queries that need more than 100 seconds slow everything down and should be addressed, but that would require a separate discussion with new information and analysis.

ObjectDB Support
#12

These queries normally take a few milliseconds to a few seconds.

#13

Can you monitor the system and the JVM and see if the sudden slowness relates to high load, the number of concurrent users, JVM heap usage, GC activity, etc.?
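For example, heap usage and GC activity can be sampled from inside the JVM with the standard java.lang.management API; a minimal sketch (the sampling interval and output format are arbitrary):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class JvmMonitor {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("heap used: %d MB / max: %d MB%n",
                heap.getUsed() >> 20, heap.getMax() >> 20);
            // Cumulative GC counts and times; a sudden jump correlates with slowness
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("  %s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(10_000); // sample every 10 seconds
        }
    }
}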

ObjectDB Support
#14

Did you have these query execution timeout exceptions before build 2.9.0_02?

ObjectDB Support
#15

Thread dumps of all ObjectDB related threads, as well as profiler data (when the slowness happens), may help.
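A thread dump can be taken with jstack <pid>, or programmatically with the standard ThreadMXBean API; a minimal sketch (the name filter for ObjectDB threads is only a guess, and jstack gives fuller traces):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
    public static void dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // true, true = also report held monitors and synchronizers,
        // which helps when looking for a deadlock or lock contention
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            // Keep only ObjectDB related threads (name filter is a heuristic)
            if (info.getThreadName().toLowerCase().contains("objectdb")) {
                System.out.print(info);
            }
        }
    }

    public static void main(String[] args) {
        dump();
    }
}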

ObjectDB Support
#16

We will try to get this data.

Did you have these query execution timeout exceptions before build 2.9.0_02? No.

 

The same error happens at other customers, too. Here is the log entry:

[2024-08-30 09:58:27 #37 store] 
RetrievalTask.reportPrimaryResult0: page#720301, reader.getPos() = 1329
com.objectdb.o.UserException: Query execution exceeded the 100000 milliseconds timeout
    at com.objectdb.o.MSG.a(MSG.java:64)
    at com.objectdb.o.PRG.b(PRG.java:1015)
    at com.objectdb.o.BCN.a(BCN.java:302)
    at com.objectdb.o.BCN.YZ(BCN.java:265)
    at com.objectdb.o.TAI.YZ(TAI.java:233)
    at com.objectdb.o.RTT.r(RTT.java:267)
    at com.objectdb.o.RTT.q(RTT.java:248)
    at com.objectdb.o.RST.l(RST.java:183)
    at com.objectdb.o.RTT.g(RTT.java:147)
    at com.objectdb.o.RTT.o(RTT.java:167)
    at com.objectdb.o.RST.n(RST.java:123)
    at com.objectdb.o.PGT.j(PGT.java:155)
    at com.objectdb.o.RST.k(RST.java:95)
    at com.objectdb.o.RTT.g(RTT.java:145)
    at com.objectdb.o.RTT.o(RTT.java:167)
    at com.objectdb.o.RST.n(RST.java:123)
    at com.objectdb.o.PGT.j(PGT.java:155)
    at com.objectdb.o.RST.k(RST.java:95)
    at com.objectdb.o.RTT.g(RTT.java:145)
    at com.objectdb.o.TSK.b(TSK.java:146)
    at com.objectdb.o.TSK.c(TSK.java:96)
    at com.objectdb.o.TSM.a(TSM.java:88)
    at com.objectdb.o.RTT.o(RTT.java:165)
    at com.objectdb.o.RST.n(RST.java:123)
    at com.objectdb.o.PGT.j(PGT.java:155)
    at com.objectdb.o.RST.k(RST.java:95)
    at com.objectdb.o.RTT.g(RTT.java:145)
    at com.objectdb.o.TSK.b(TSK.java:146)
    at com.objectdb.o.TSK.c(TSK.java:96)
    at com.objectdb.o.MST.ax(MST.java:635)
    at com.objectdb.o.MST.ZN(MST.java:591)
    at com.objectdb.o.PLN.ZN(PLN.java:575)
    at com.objectdb.o.TAI.ZN(TAI.java:203)
    at com.objectdb.o.TAI.Ze(TAI.java:136)
    at com.objectdb.o.TQI.Ze(TQI.java:56)
    at com.objectdb.o.TQI.Ze(TQI.java:56)
    at com.objectdb.o.MQI.Zd(MQI.java:142)
    at com.objectdb.o.GQI.Ze(GQI.java:150)
    at com.objectdb.o.PRG.h(PRG.java:698)
    at com.objectdb.o.PRG.g(PRG.java:560)
    at com.objectdb.o.QRM.ZL(QRM.java:287)
    at com.objectdb.o.MST.ZL(MST.java:1026)
    at com.objectdb.o.WRA.ZL(WRA.java:313)
    at com.objectdb.o.WSM.ZL(WSM.java:117)
    at com.objectdb.o.QRR.k(QRR.java:260)
    at com.objectdb.o.QRR.i(QRR.java:154)
    at com.objectdb.jpa.JpaQuery.getResultList(JpaQuery.java:726)
    at com.agile.hummingbirdcore.AC_Action.onActionError(AC_Action.java:984)
    at com.agile.hummingbirdcore.AC_Action.errorAction(AC_Action.java:1616)
    at com.agile.hummingbirdcore.API.executeErrorAction(API.java:2940)
    at com.agile.hummingbirdcore.API.handleAction(API.java:2418)
    at com.agile.hummingbirdcore.API.directRequest(API.java:1621)
    at com.agile.hummingbirdcore.API.handleWebSocketRequest(API.java:1357)
    at com.agile.hummingbirdcore.WebSocketServerListener.onWebSocketText(WebSocketServerListener.java:200)
    at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:296)
    at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
    at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:67)
    at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:235)
    at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:152)
    at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:326)
    at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:202)
    at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:225)
    at org.eclipse.jetty.websocket.common.Parser.parseSingleFrame(Parser.java:259)
    at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:459)
    at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:440)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.lang.Thread.run(Unknown Source)
 

For this query:

SELECT DISTINCT a from Action a WHERE a.objectsInCharge.nodePath = '/(MA)DMF-220' and a.type = 1 and a.state = 1 and a.detail = 2

 

This query normally takes seconds; in general it can happen with any query.
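For reference, a minimal sketch of how we could time this query on the client side to correlate slow executions with the log (the entity and filter values are from our application; the threshold and helper name are arbitrary):

import javax.persistence.EntityManager;
import javax.persistence.Query;

public class SlowQueryProbe {
    // Hypothetical helper: runs the query and logs when it is unusually slow
    static java.util.List<?> runTimed(EntityManager em) {
        Query q = em.createQuery(
            "SELECT DISTINCT a FROM Action a " +
            "WHERE a.objectsInCharge.nodePath = '/(MA)DMF-220' " +
            "AND a.type = 1 AND a.state = 1 AND a.detail = 2");
        long start = System.nanoTime();
        java.util.List<?> result = q.getResultList();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (elapsedMs > 5_000) { // arbitrary threshold: normally this takes ms to seconds
            System.err.println("slow query: " + elapsedMs + " ms, " + result.size() + " results");
        }
        return result;
    }
}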

The database can be found here:

https://www.dropbox.com/scl/fi/7wwnofxwciwgqennj7dzr/coreSystemDb.rar?rlkey=mxjgf7n03up0yjv48xtzvu09r&dl=0

 

 

#17

Running the query on that database didn't show any problem, so this is probably some sort of load / deadlock / race condition issue that happens only in specific conditions.

Build 2.9.0_02 fixed the original issue by adding a synchronized block, which apparently was missing and needed. That was the only change. Examining the code does not reveal a deadlock that could be caused by that synchronized block.
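As a generic illustration only (not ObjectDB's actual code), a missing synchronized block around shared page-cache state is the kind of issue such a fix typically addresses:

// Generic illustration only -- not ObjectDB's actual code.
// Without the synchronized block, two threads can read and update the
// shared cache entry concurrently and observe a half-updated page.
public class PageCacheSketch {
    private final java.util.Map<Long, byte[]> pages = new java.util.HashMap<>();

    public byte[] getOrLoad(long pageNo) {
        synchronized (pages) { // the kind of guard such a fix adds
            byte[] page = pages.get(pageNo);
            if (page == null) {
                page = loadFromDisk(pageNo);
                pages.put(pageNo, page);
            }
            return page;
        }
    }

    private byte[] loadFromDisk(long pageNo) {
        return new byte[8192]; // placeholder for an actual disk read
    }
}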

If you can generate a thread dump when this happens, it might be helpful. Any other additional information (see #13 and #15 above) may also help. If the query execution plan is different from usual when this happens, it should be examined.

Unfortunately the stack trace and the log in #16 do not provide much new info.

ObjectDB Support
#18

The problems were probably caused by too little available RAM. We increased it and have not had any issues for 2 weeks.

#19

Thank you for the update.

Note that posts #1 to #7 are about the "SectionClassifier" bug that was apparently solved in build 2.9.0_02.

Posts #8 and below discuss a separate issue (extremely slow query execution), which was hopefully solved by increasing RAM, as per your update in #18.

If there are any additional issues that are not related to the original posts (#1 to #7), i.e. the stack traces seem different, please report them in new threads with up-to-date details.

ObjectDB Support
