Problem using "and" for joined entities and indexed fields

#1

Using @Index for fields and filter more than one field with "and" predicate ends in empty result list.

public class IndexTest {

    @Entity
    @Indices({
        @Index(members = {"children.name"}),
        @Index(members = {"children.age"})
    })
    static class Parent {
        @Embedded
        List<Child> children;
    }

    @Embeddable
    static class Child {
        String name;
        Integer age;
    }

    public static void main(String[] args) {
        Child child1 = new Child();
        child1.name = "Foo";
        child1.age = 42;
        Parent parent = new Parent();
        parent.children = Collections.singletonList(child1);

        EntityManagerFactory emf = Persistence.createEntityManagerFactory("foo.odb");
        EntityManager em = emf.createEntityManager();
        em.getTransaction().begin();
        em.persist(parent);
        em.getTransaction().commit();

        System.out.println(em.createQuery("select from Parent p join p.children c where c.age = 42", Parent.class).getResultList().size());
        System.out.println(em.createQuery("select from Parent p join p.children c where c.name = 'Foo'", Parent.class).getResultList().size());
        System.out.println(em.createQuery("select from Parent p join p.children c where c.age = 42 and c.name = 'Foo'", Parent.class).getResultList().size());
    }
}

This returns:

1
1
0

Please solve this problem. Using indexes is a common practice to fasten the query; but if you can't use these fields in logical expressions like "and" is a big problem.

 

#2

This indeed indicates a bug in using indexes on fields of embedded objects in a collection. We will investigate it. Note, however, that you can also use a combined index, which is much more efficient for the third query:

    @Index(members = { "children.name", "children.age" })

(with or without the separate age index for fast queries on age only).

Apparently the combined index does not have the demonstrated issue.

ObjectDB Support
#3

In short, the bug was using an index that is not fit for purpose in this context. An index on the name field alone is not generally useful in this query. It does filter relevant Parent objects, but then it is too complex to complete the execution and filter by age, as it is not the Parent's age, so candidate parents have to be retrieved and all their children scanned to find the relevant child and the age index cannot help here. This would be different if you query the children directly (if they are defined as entity objects), not through the parents.

Build 2.8.9_01 fixes the issue by simply eliminating the invalid query plan, so these indexes are not used for the 2 field query and a full scan is used instead. Unless of course you have a composite index, as suggested in #2 above, which is the right index for this query.

 

ObjectDB Support
#4

Unfortunately, this solution is very poor. While this scenario may not be commonly encountered, it implies that filtering joined objects can only be achieved through either combined indexed fields only or entirely without an index. If you attempt to filter with at least one indexed field alongside any other field, it will result in ignoring the index. I believe this is a significant limitation of indexes on embedded objects.

For instance, even if the "age" field is not indexed, the index on the "name" field will also be ignored, and a full scan will be performed instead. Is there a reason why it's not feasible to first scan the objects with the index and then use this result to filter the rest?

#5

The challenge is briefly explained in #3 above. You are right that scanning all the children of "candidate parents" in order to filter by names again (in order to check ages) may be faster than a full scan of all the parents. However, it would be significantly inferior to using a composite index.

This bug was probably always there unreported, as collections of embedded objects are not used very frequently, and when used, composite indexes are the natural first choice. Therefore, as the demand for this feature is currently very low it would not be a high priority to implement now a better solution than the solution of 2.8.9_01.

Note that composite indexes on collections of embedded objects may contain many fields, not just 2 fields. An alternative way that you may also consider is to switch the children from embedded objects to entity objects and then using single field indexes.

ObjectDB Support
#6

Update: Performance regression in build 2.8.9_01 (disabling indexes too aggressively) is fixed in build 2.8.9_02.

ObjectDB Support

Reply