Issue #2912: retrieval again and again started

Type: Bug Report | Version: 2.9.0 | Priority: High | Status: Fixed | Replies: 18
#1

As already mentioned in other posts, the database repeatedly starts a retrieval like:

RetrievalTask.reportPrimaryResult0: page#2356263, reader.getPos() = 53

or 

[2024-07-04 10:07:11 #45 store] 
RetrievalTask.reportPrimaryResult0: page#2809028, reader.getPos() = 0

[2024-07-04 10:07:11 #46 store] 
SectionClassifier: SectionClassifier{48->merger[3590]-missing:1,49->merger[3590]-missing:1,50->merger[3590]-missing:1}
 

When this happens, the system is nearly unusable for a long period of time. This happens in all versions.

 

#2

The problem with this issue is that we cannot reproduce it and do not understand its cause. If you are able to produce a test that demonstrates it, that would be very helpful.

Build 2.9.0_02 includes another attempt to fix the issue (the chance that this solves it is not high, but it is probably worth trying) plus improved logging, which is expected to add additional stack traces to the log in this case.

(edited: Build 2.9.0_01 -> Build 2.9.0_02)

ObjectDB Support
#3

OK, we have built a new version, which is now active at the customer. We will keep you informed.

#4

Hi, for nearly 2 weeks there have been no more retrievals. We will continue to watch and keep you informed.

#5

If there are no more retrieval errors, that is excellent news. Thank you for the update.

ObjectDB Support
#6

Update: still no retrievals.

#7

If you don't have retrieval exceptions anymore, we can close this issue. It can be re-opened if something changes. Thank you for the update.

ObjectDB Support
#8

We do have the retrieval again, see the attached log. The queries mentioned here are completely unspecific (any query that normally takes a few ms).

Could there be a problem somewhere in the cache / write-to-disk mechanisms? We have hundreds of clients communicating with our core process. Attached is also the configuration that we are using.

#9

The dominant exceptions in this log seem different from those in the original post, so hopefully build 2.9.0_02 did solve the main issue.

There are many query execution timeout exceptions in the new log. Did you set a limit of 10 seconds? In that case an exception is expected, but the log lines should be improved, as reader.getPos() etc. is not relevant (it is printed as the exception propagates up, but is irrelevant in this case).
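For reference, such a limit is typically applied per query via the standard JPA timeout hint (in milliseconds); a minimal sketch, assuming a placeholder persistence unit named "core" and a placeholder entity MyEntity:

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.persistence.Query;

public class QueryTimeoutExample {
    public static void main(String[] args) {
        // "core" is a placeholder persistence unit name
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("core");
        EntityManager em = emf.createEntityManager();
        try {
            // MyEntity is a placeholder entity class
            Query query = em.createQuery("SELECT e FROM MyEntity e");
            // Standard JPA hint: abort execution after the given number of milliseconds
            query.setHint("javax.persistence.query.timeout", 10000);
            query.getResultList();
        } finally {
            em.close();
            emf.close();
        }
    }
}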

There is one place in which "SectionClassifier" from the original post (and all your previous reports on this issue) appears (log #127), but that also happens at the same time as a query execution timeout exception (log #126), so hopefully this is still within the scope of query execution timeouts with irrelevant log info for that case.

Apart from the log, please advise whether there are exceptions that are not related to query execution timeouts, and whether there are side effects such as slowness, etc., that cannot be explained by high load and the re-running of failed queries.

ObjectDB Support
#10

There is a 100-second timeout set. At the times when the timeout happens, the system is very slow.

#11

As it seems now, this is not related to the original issue (except that the log includes info that should be removed).

Obviously queries that need more than 100 seconds slow everything down and should be addressed, but that would require a separate discussion with new information and analysis.

ObjectDB Support
#12

These queries normally take a few milliseconds to a few seconds.

#13

Can you monitor the system and the JVM and see if the sudden slowness relates to high load, the number of concurrent users, JVM heap usage, GC activity, etc.?
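For example, heap usage and GC activity can be sampled from inside the JVM with the standard java.lang.management API; a minimal sketch (the sampling interval and output format are arbitrary):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class JvmMonitor {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("heap used: %d MB / max: %d MB%n",
                heap.getUsed() >> 20, heap.getMax() >> 20);
            // Cumulative GC counts and times; a sudden jump correlates with slowness
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("  %s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(10_000); // sample every 10 seconds
        }
    }
}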

ObjectDB Support
#14

Did you have these query execution timeout exceptions before build 2.9.0_02?

ObjectDB Support
#15

Thread dumps of all ObjectDB related threads, as well as profiler data (when the slowness happens), may help.
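A thread dump can be taken with jstack <pid>, or programmatically with the standard ThreadMXBean API; a minimal sketch (the name filter for ObjectDB threads is only a guess, and jstack gives fuller traces):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
    public static void dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // true, true = also report held monitors and synchronizers,
        // which helps when looking for a deadlock or lock contention
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            // Keep only ObjectDB related threads (name filter is a heuristic)
            if (info.getThreadName().toLowerCase().contains("objectdb")) {
                System.out.print(info);
            }
        }
    }

    public static void main(String[] args) {
        dump();
    }
}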

ObjectDB Support
#16

We will try to get this data.

Did you have these query execution timeout exceptions before build 2.9.0_02? No.

 

The same error happens at other customers, too. Here is the log entry:

[2024-08-30 09:58:27 #37 store] 
RetrievalTask.reportPrimaryResult0: page#720301, reader.getPos() = 1329
com.objectdb.o.UserException: Query execution exceeded the 100000 milliseconds timeout
    at com.objectdb.o.MSG.a(MSG.java:64)
    at com.objectdb.o.PRG.b(PRG.java:1015)
    at com.objectdb.o.BCN.a(BCN.java:302)
    at com.objectdb.o.BCN.YZ(BCN.java:265)
    at com.objectdb.o.TAI.YZ(TAI.java:233)
    at com.objectdb.o.RTT.r(RTT.java:267)
    at com.objectdb.o.RTT.q(RTT.java:248)
    at com.objectdb.o.RST.l(RST.java:183)
    at com.objectdb.o.RTT.g(RTT.java:147)
    at com.objectdb.o.RTT.o(RTT.java:167)
    at com.objectdb.o.RST.n(RST.java:123)
    at com.objectdb.o.PGT.j(PGT.java:155)
    at com.objectdb.o.RST.k(RST.java:95)
    at com.objectdb.o.RTT.g(RTT.java:145)
    at com.objectdb.o.RTT.o(RTT.java:167)
    at com.objectdb.o.RST.n(RST.java:123)
    at com.objectdb.o.PGT.j(PGT.java:155)
    at com.objectdb.o.RST.k(RST.java:95)
    at com.objectdb.o.RTT.g(RTT.java:145)
    at com.objectdb.o.TSK.b(TSK.java:146)
    at com.objectdb.o.TSK.c(TSK.java:96)
    at com.objectdb.o.TSM.a(TSM.java:88)
    at com.objectdb.o.RTT.o(RTT.java:165)
    at com.objectdb.o.RST.n(RST.java:123)
    at com.objectdb.o.PGT.j(PGT.java:155)
    at com.objectdb.o.RST.k(RST.java:95)
    at com.objectdb.o.RTT.g(RTT.java:145)
    at com.objectdb.o.TSK.b(TSK.java:146)
    at com.objectdb.o.TSK.c(TSK.java:96)
    at com.objectdb.o.MST.ax(MST.java:635)
    at com.objectdb.o.MST.ZN(MST.java:591)
    at com.objectdb.o.PLN.ZN(PLN.java:575)
    at com.objectdb.o.TAI.ZN(TAI.java:203)
    at com.objectdb.o.TAI.Ze(TAI.java:136)
    at com.objectdb.o.TQI.Ze(TQI.java:56)
    at com.objectdb.o.TQI.Ze(TQI.java:56)
    at com.objectdb.o.MQI.Zd(MQI.java:142)
    at com.objectdb.o.GQI.Ze(GQI.java:150)
    at com.objectdb.o.PRG.h(PRG.java:698)
    at com.objectdb.o.PRG.g(PRG.java:560)
    at com.objectdb.o.QRM.ZL(QRM.java:287)
    at com.objectdb.o.MST.ZL(MST.java:1026)
    at com.objectdb.o.WRA.ZL(WRA.java:313)
    at com.objectdb.o.WSM.ZL(WSM.java:117)
    at com.objectdb.o.QRR.k(QRR.java:260)
    at com.objectdb.o.QRR.i(QRR.java:154)
    at com.objectdb.jpa.JpaQuery.getResultList(JpaQuery.java:726)
    at com.agile.hummingbirdcore.AC_Action.onActionError(AC_Action.java:984)
    at com.agile.hummingbirdcore.AC_Action.errorAction(AC_Action.java:1616)
    at com.agile.hummingbirdcore.API.executeErrorAction(API.java:2940)
    at com.agile.hummingbirdcore.API.handleAction(API.java:2418)
    at com.agile.hummingbirdcore.API.directRequest(API.java:1621)
    at com.agile.hummingbirdcore.API.handleWebSocketRequest(API.java:1357)
    at com.agile.hummingbirdcore.WebSocketServerListener.onWebSocketText(WebSocketServerListener.java:200)
    at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:296)
    at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
    at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:67)
    at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:235)
    at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:152)
    at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:326)
    at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:202)
    at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:225)
    at org.eclipse.jetty.websocket.common.Parser.parseSingleFrame(Parser.java:259)
    at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:459)
    at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:440)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.lang.Thread.run(Unknown Source)
 

For this query:

SELECT DISTINCT a from Action a WHERE a.objectsInCharge.nodePath = '/(MA)DMF-220' and a.type = 1 and a.state = 1 and a.detail = 2

 

This query normally takes seconds; in general it can happen with any query.
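For reference, a minimal sketch of how we could time this query on the client side to correlate slow executions with the log (the entity and filter values are from our application; the threshold and helper name are arbitrary):

import javax.persistence.EntityManager;
import javax.persistence.Query;

public class SlowQueryProbe {
    // Hypothetical helper: runs the query and logs when it is unusually slow
    static java.util.List<?> runTimed(EntityManager em) {
        Query q = em.createQuery(
            "SELECT DISTINCT a FROM Action a " +
            "WHERE a.objectsInCharge.nodePath = '/(MA)DMF-220' " +
            "AND a.type = 1 AND a.state = 1 AND a.detail = 2");
        long start = System.nanoTime();
        java.util.List<?> result = q.getResultList();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (elapsedMs > 5_000) { // arbitrary threshold: normally this takes ms to seconds
            System.err.println("slow query: " + elapsedMs + " ms, " + result.size() + " results");
        }
        return result;
    }
}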

The database can be found here:

https://www.dropbox.com/scl/fi/7wwnofxwciwgqennj7dzr/coreSystemDb.rar?rlkey=mxjgf7n03up0yjv48xtzvu09r&dl=0

 

 

#17

Running the query on that database didn't show any problem, so this is probably some sort of load / deadlock / race condition issue that happens only in specific conditions.

Build 2.9.0_02 fixed the original issue by adding a synchronized block, which apparently was missing and needed. That was the only change. Examining the code does not reveal a deadlock that could be caused by that synchronized block.
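As a generic illustration only (not ObjectDB's actual code), a missing synchronized block around shared page-cache state is the kind of issue such a fix typically addresses:

// Generic illustration only -- not ObjectDB's actual code.
// Without the synchronized block, two threads can read and update the
// shared cache entry concurrently and observe a half-updated page.
public class PageCacheSketch {
    private final java.util.Map<Long, byte[]> pages = new java.util.HashMap<>();

    public byte[] getOrLoad(long pageNo) {
        synchronized (pages) { // the kind of guard such a fix adds
            byte[] page = pages.get(pageNo);
            if (page == null) {
                page = loadFromDisk(pageNo);
                pages.put(pageNo, page);
            }
            return page;
        }
    }

    private byte[] loadFromDisk(long pageNo) {
        return new byte[8192]; // placeholder for an actual disk read
    }
}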

If you can generate a thread dump when this happens, it might be helpful. Any other additional information (see #13 and #15 above) may also help. If the query execution plan is different from usual when this happens, it should be examined.

Unfortunately the stack trace and the log in #16 do not provide much new info.

ObjectDB Support
#18

The problems were probably caused by too little available RAM. We increased it and have not had any issues for 2 weeks.

#19

Thank you for the update.

Note that posts #1 to #7 are about the "SectionClassifier" bug that was apparently solved in build 2.9.0_02.

Posts #8 and below discuss a separate issue (extremely slow query execution), which was hopefully solved by increasing RAM, as per your update in #18.

If there are any additional issues that are not related to the original posts (#1 to #7), i.e. the stack traces seem different, please report them in new threads with up-to-date details.

ObjectDB Support
