Issue #2517: queries more than 10 times slower using 2.8.1 versus 2.7.6

Type: Bug ReoprtVersion: 2.8.1_01Priority: NormalStatus: ClosedReplies: 16
#1

- we restarted to try to use 2.8.1 instead of 2.7.6

- we did a doctor on the database that you can download from www.hummingbird-systems.com, login/pw your database name, main menu objectdb

- we did not do any statistics generate, just tried to use the new version

- our reference query that takes in 2.7.6 around 5-10 seconds takes in 2.8.1 around 150-200 seconds:

SELECT DISTINCT o.parentNode FROM ObjectNode o WHERE o.classIdentifier = "(OP)" AND (o.linkedObjects.classIdentifier = "(US)" and o.linkedObjects.objectNameUpper = "ORG" and o.linkedObjects.type = 1 and o.linkedObjects.state = 0) AND (o.parentNode.properties.name = "confirmed" AND o.parentNode.properties.doubleValue = 0) and o.parentNode.classIdentifier = "(ME)" order by o.parentNode.modified desc

 

 

#2

Please advise whether updating the statistics, as explained on issue #2417 post#15 helps.

It seems that running your query on your sample database with version 2.8.1 is slower if statistics is not updated.

ObjectDB Support
#3

Note that we haven't tested it on your new database but on a old sample database for which statistics was already updated as part of the work on issue #2417. Hopefully updating statistics will also help with the new database. If not, we can investigate it further.

ObjectDB Support
#4

isn't it possible that you create the statistics on that new database directly to avoid to again and again transfer this huge database, we are a bit under time pressure

#5

Since we already checked the database from issue #2417 and the problem that you describe here is also demonstrated by that older database, i.e. statistics must be updated to avoid 10x or more performance lose with this query, there is a good chance that the same applies to your newer database (which we have not downloaded).

Please try running statistics update on it and it may solve the issue. If not, then we can download the new database and run statistics update, so you won't have to upload another file.

ObjectDB Support
#6

results after adding the statistics:

query 1: "select o from ObjectNode o join o.linkedObjects l1 where o.classIdentifier = '(SI)' and o.type = 0 and o.state = 0 and (o.linkedObjects.classIdentifier = '(TC)' and o.linkedObjects.objectNameUpper = 'CAM-MILLING') and (l1.classIdentifier = '(SY)' and l1.objectNameUpper = 'TECHNOLOGY')"

2.7.6, 3 seconds

2.8.1, 16 seconds

 

query 2: "select distinct o from ObjectNode o join o.properties p1 join o.properties p2 join o.properties p10 join o.properties p20 join o.properties p21 where o.classIdentifier = '(OP)' and (o.linkedObjects.classIdentifier = '(TC)' and o.linkedObjects.objectNameUpper = 'CAM-MILLING' and o.linkedObjects.type = 1 and o.linkedObjects.state = 0) and (((p2.name = 'scheduledEndSlot' and p2.doubleValue <= 2603) or (p1.name = 'scheduledSlot' and p1.doubleValue <= 2603)) and (p10.name = 'scheduledState' and p10.doubleValue = 2) and (o.properties.name = 'feedbackState' and o.properties.doubleValue != 13) and (p21.name = 'globalFeedback' and p21.doubleValue = 0) and (p20.name = 'roughPlanning' and p20.doubleValue = 0))"

2.7.6, 2 seconds

2.8.1, 16 seconds

#7

OK. We will download and check. What about the query from #1 above?

ObjectDB Support
#8

the #1 query is around half of the speed compared to 2.7.6

#9

Here is an analysis of query 1 of post #6 above:

select o from ObjectNode o join o.linkedObjects l1
where o.classIdentifier = '(SI)' 
  and o.type = 0
  and o.state = 0
  and (o.linkedObjects.classIdentifier = '(TC)'
  and o.linkedObjects.objectNameUpper = 'CAM-MILLING')
  and (l1.classIdentifier = '(SY)'
  and l1.objectNameUpper = 'TECHNOLOGY')

The issue is again suboptimal selection of a query plan. Versions 2.7.6 and 2.8.1 have more or less the same collection of query plans to select from. Version 2.8.1 is expected to make a better choice in most cases, but apparently in this case the selection of 2.8.1 was not optimal.

There are only 237 ObjectNode instances that satisfy the constraint:

     (o.classIdentifier = '(SI)'  and o.type = 0  and o.state = 0)

Therefore, the optimal query starts by using the composite index "ctso" to quickly reduce the examined nodes. This is actually what version 2.7.6 does.

However, version 2.8.1 selection algorithm is different. Since there are 11,170,730 ObjectNode instances, and there are 269 different combinations of (classIdentifier, type, state), it guesses that:

     (o.classIdentifier = '(SI)'  and o.type = 0  and o.state = 0)

will have about 11,170,730 / 269 = ~41,527 objects rather than 237, and therefore it doesn't select this option.

We will try to improve this in future versions of ObjectDB. However, as a quick solution you can force ObjectDB to use the query plan that uses the index "ctso".

For example in the Explorer:

select o from ObjectNode o join o.linkedObjects l1
where o.classIdentifier = '(SI)'
  and o.type = 0
  and o.state = 0
  and (o.linkedObjects.classIdentifier = '(TC)'
  and o.linkedObjects.objectNameUpper = 'CAM-MILLING')
  and (l1.classIdentifier = '(SY)'
  and l1.objectNameUpper = 'TECHNOLOGY')
[[objectdb.query-plan-text=ctso]]

Or in the code:

 query.setHint("objectdb.query-plan-text", "ctso");

See also the discussion in this thread.

ObjectDB Support
#10

the query hint does not help, measures:

version 2.8.1

query:

select distinct o from ObjectNode o join o.properties p1 join o.properties p2 join o.properties p10 join o.properties p20 join o.properties p21 where o.classIdentifier = '(OP)' and (o.linkedObjects.classIdentifier = '(TC)' and o.linkedObjects.objectNameUpper = 'CAM-MILLING' and o.linkedObjects.type = 1 and o.linkedObjects.state = 0) and (((p2.name = 'scheduledEndSlot' and p2.doubleValue <= 2604) or (p1.name = 'scheduledSlot' and p1.doubleValue <= 2604)) and (p10.name = 'scheduledState' and p10.doubleValue = 2) and (o.properties.name = 'feedbackState' and o.properties.doubleValue != 13) and (p21.name = 'globalFeedback' and p21.doubleValue = 0) and (p20.name = 'roughPlanning' and p20.doubleValue = 0))

180 results in 7263 milliseconds

added: 

[[objectdb.query-plan-text=ctso]]

180 results in 34304 milliseconds

 

please advice, how can we proceed ?

#11

As explained in answer #9 the hint is for your first query in post #6.

Does it help with that query?

In post #10 you only mention the second query from post #6 that we haven't analysed, and therefore cannot suggest the particular recommended hint for that query.

Obviously forcing ObjectDB to use a particular index is not suitable for every query, so the answer in #9 demonstrates it with one query in order for you to proceed and try and use hints as necessary in all your other queries. In the ObjectDB log (if you set <logger name="query" level="debug" />) you can see all the top query plans and try alternative query plans manually using hints, or see which query plan was used in older version of ObjectDB, such as 2.7.6, and adopt it manually in 2.8.1 using hints.

ObjectDB Support
#12

Such a process is not possible to implement. We have a huge amount of customer specific queries and it would kill our flexibility.

how about a way to fall back to the old mechanism till you have something that works better ?

#13

> how about a way to fall back to the old mechanism till you have something that works better ?

Maybe it could may be possible to some extent, but it would be difficult to guarantee exactly the same behaviour,  as different things have changed.

Is there a specific reason for upgrading from version 2.7.6? because the best way to guarantee permanent fall back to the old query plans is to stay with the old ObjectDB version.

> Such a process is not possible to implement. We have a huge amount of customer specific queries and it would kill our flexibility.

Maybe this could be automated? We may be able to implement a new Trainer utility, that given a set of queries, it tests different query plans and chooses and remembers the best query plan. Any other solution may still be kind of guessing. The best way to choose the best query plan is simply to try the 10 leading query plans.

However, for this to work queries must repeat, i.e. the query string of a new query must be the same as of a trained query, except the parameter values.

ObjectDB Support
#14

we would like to benefit from the new features/fixes but we can wait until you have something that works better

#15

OK. If it is possible for you to provide a complete set of queries that work on your last database and covers the needs of your application, then we may be able to use it for tuning, hoping to get better results in general, but particularly better results with your applications.

ObjectDB Support
#16

we startet to implement the query hints, question: is it possible to set a sequence of hints but not only the hint for the first index to use

#17

> is it possible to set a sequence of hints but not only the hint for the first index to use

The "objectdb.query-plan-text" hint is a single string, which when specified, ObjectDB selects a query plan that contains that string in its description, if there is such a query plan. Therefore, it is a single string and not a list of strings. However, after finding the best query (manually) you can use as a hint any string in the description of the query plan, not just a name of an index, so you should be able to cover more than one index by a longer string.

 

ObjectDB Support

Reply