slow performance for a complex query

#1

Hi,

We have a query that is constructed dynamically, an example is like this:

SELECT DISTINCT $1 FROM CDI $1, Organisation $2, Organisation $3 WHERE ((UPPER($1.cdiId) LIKE 'THREAD14%') AND ((($1.parentOrgId='news') OR (($1.parentOrgId=$2.organisationId) AND ($2.parentOrgId='news'))) OR ((($1.parentOrgId=$3.organisationId) AND ($3.parentOrgId=$2.organisationId)) AND ($2.parentOrgId='news'))))

I've also tried running the CDI query first and passing the results into the organisation query as follows:

SELECT DISTINCT $1 FROM CDI $1, Organisation $2, Organisation $3 WHERE (($1.id IN :cdiids) AND ((($1.parentOrgId='news') OR (($1.parentOrgId=$2.organisationId) AND ($2.parentOrgId='news'))) OR ((($1.parentOrgId=$3.organisationId) AND ($3.parentOrgId=$2.organisationId)) AND ($2.parentOrgId='news'))))

passing "10449,62901,63521,64132" as a parameter.

You can try it on the database I've uploaded for issue http://www.objectdb.com/database/forum/350

This query in both variations takes over 3 minutes on a small database (about 10000 Organisations and same amount of CDIs). This is due to the execution order of query components, according to Explorer it retrieves all organisations twice at the start of query. 

I do understand that this query is very complex and I could get the CDIs first and filter the organisations manually. However if I do that, I cannot support pagination over result set with setFirstResult and setMaxResults.

I could do this with subqueries, however as I understand it from this post: http://www.objectdb.com/database/issue/10, subqueries are not supported by ObjectDB.

Is there any other way to specify execution order of query components for a complex query like this? Also, any advise on improving this query would be appreciated. 

Regards,

Natalia.

#2

As you noticed the execution of this query involves nested loops of full scan of Organisation. The inner loop is executed about 100M times. In cases with complex AND/OR structures like this, sometimes JOIN can only be implemented by nested full scan (and indexes cannot help).

There are different types of nested queries. Some nested queries are actually equivalent to JOIN. When the inner query is not dependent on the outer query - no JOIN is required. But such nested query (if possible in your case) can also be simulated by running a first query and then sending the results of the first query as a parameter to the second query.

ObjectDB Support

Reply