Join performance in Objectdb

#1
SELECT count(e) FROM Meta e, ImageNet i, Page p
WHERE e.idImage=i.id AND e.idPage=p.id;

So simple a command cause me half an hour and is running now! Each table has about 160K records. What formula can predict the time for such counting?

TIA

#2

Now, after 2 hours, it is still running.

Maybe I used a wrong command?   

#3

According to your query you use IDs instead of references, which is not recommended when using JPA. It seems that a Meta instance holds the IDS of an Image and a Page instances instead of direct references.

A more JPA conventional variant of your query would be:

SELECT count(e) FROM Meta e, ImageNet i, Page p
WHERE e.image=i AND e.page=p

Anyway, this still doesn't justify execution time of hours.

Since you didn't provide a test case (as requested in the posting instructions) it is impossible to see what is exactly the problem. Maybe the id fields in Page and Image are not annotated with @Id or @Index. In that case, executing the JOIN may require iteration over 160,000 x 160,000 x 160,000 tuples.

The following test was written following your post to demonstrate queries which are similar to your query:

import javax.persistence.*;

public class T474 {
   
    @SuppressWarnings("boxing")
    public static void main(String[] args) {
        EntityManagerFactory emf =
            Persistence.createEntityManagerFactory(
                "objectdb:$objectdb/db/test.tmp;drop");
        EntityManager em = emf.createEntityManager();
        int entityCount = 160000;
       
        System.out.println("Database population started.");
        em.getTransaction().begin();
        for (int i = 0; i < entityCount; i++) {
            B b = new B();
            em.persist(b);
            C c = new C();
            em.persist(c);
            em.persist(new A(b, c));
        }
        em.getTransaction().commit();
        System.out.println("Database population completed.");

        long ms, result, elapsedTime;

        System.out.println("Query 1 started.");
        ms = System.currentTimeMillis();
        TypedQuery<Long> q1 = em.createQuery(
            "SELECT COUNT(a) FROM A a, B b, C c " +
            "WHERE a.bId = b.id AND a.cId = c.id", Long.class);
        result = q1.getSingleResult();
        elapsedTime = System.currentTimeMillis() - ms;
        System.out.println("Query 1 completed in " + elapsedTime +
            " milliseconds (" + result + ")");

        System.out.println("Query 2 started.");
        ms = System.currentTimeMillis();
        TypedQuery<Long> q2 = em.createQuery(
            "SELECT COUNT(a) FROM A a, B b, C c " +
            "WHERE a.b = b AND a.c = c", Long.class);
        result = q2.getSingleResult();
        elapsedTime = System.currentTimeMillis() - ms;
        System.out.println("Query 2 completed in " + elapsedTime +
            " milliseconds (" + result + ")");

        em.close();
        emf.close();
    }
   
    @Entity
    public static class A {
        @Id @GeneratedValue long id;
        B b;
        long bId;
        C c;
        long cId;
       
        public A() {
        }

        public A(B b, C c) {
            this.b = b;
            this.bId = b.id;
            this.c = c;
            this.cId = c.id;
        }
    }
   
    @Entity
    @SequenceGenerator(name="seqB", allocationSize=1000)
    public static class B {
        @Id @GeneratedValue(strategy=GenerationType.SEQUENCE, generator="seqB")
        long id;
    }
   
    @Entity
    @SequenceGenerator(name="seqC", allocationSize=1000)
    public static class C {
        @Id @GeneratedValue(strategy=GenerationType.SEQUENCE, generator="seqC")
        long id;
    }
}

The result output is:

Database population started.
Database population completed.
Query 1 started.
Query 1 completed in 1594 milliseconds (160000)
Query 2 started.
Query 2 completed in 109 milliseconds (160000)

The first query uses a JOIN on ID fields, similarly to your original query. The second query uses JOIN on direct references. The second form is simpler, faster and more appropriate when using JPA.

ObjectDB Support
#4

1, It is supposed that there is an implicit index for the @ID in JPA, right?

2, I add @index to each field of idxxxx;

3, The Meta is parsed from a xml file like:

...
<PIC>
<PIC_URL></PIC_URL>
<PAGE_URL></PAGE_URL>
<ALT_TEXT></ALT_TEXT>
<ANCHOR_TEXT></ANCHOR_TEXT>
<SUR_TEXT></SUR_TEXT>
<PAGE_TITLE></PAGE_TITLE>
<WIDTH></WIDTH>
<HEIGHT></HEIGHT>
</PIC>
...
I have no reference to set in the Meta, unless do additional searching and replacement.

TIA

#5

I think this is a bug. The aggreiation of Objectdb seems to be based on the size of fields of entities:

JPQL> select count(e.id) from Page e
1 java.lang.Long: 161143
Time : 23.889999

JPQL> select count(e.sUrl) from Page e
1 java.lang.Long: 161143
Time : 96.968002

JPQL> select count(e.iSize) from Page e
1 java.lang.Long: 161143
Time : 10.359000

JPQL>

The command tool is from the book of Mike Keith.

Obviously, it is supposed that the count of every field should be the same performance. I have no time to test the field of byte[], you know.

TIA

#6

> 1, It is supposed that there is an implicit index for the @Id in JPA, right?

That is true.

> 2, I add @index to each field of idxxxx;

I am not sure that it would help for this specific query.

> I have no reference to set in the Meta, unless do additional searching and replacement.

Building the database correctly may be more complicated but then it would work faster.

> The aggreiation of Objectdb seems to be based on the size of fields of entities.

I tried to repeat this with the following test but could get your results.

Can you change this test to demonstrate the execution time difference that you experience?

import javax.persistence.*;

public class T474b {
   
    @SuppressWarnings("boxing")
    public static void main(String[] args) {
        EntityManagerFactory emf =
            Persistence.createEntityManagerFactory(
                "objectdb:$objectdb/db/test.tmp;drop");
        EntityManager em = emf.createEntityManager();
        int entityCount = 1600000;
       
        System.out.println("Database population started.");
        em.getTransaction().begin();
        for (int i = 0; i < entityCount; i++) {
            em.persist(new A());
        }
        em.getTransaction().commit();
        System.out.println("Database population completed.");

        long ms, result, elapsedTime;

        System.out.println("Query 1 started.");
        ms = System.currentTimeMillis();
        TypedQuery<Long> q1 = em.createQuery(
            "SELECT COUNT(a.id) FROM A a", Long.class);
        result = q1.getSingleResult();
        elapsedTime = System.currentTimeMillis() - ms;
        System.out.println("Query 1 completed in " + elapsedTime +
            " milliseconds (" + result + ")");

        System.out.println("Query 2 started.");
        ms = System.currentTimeMillis();
        TypedQuery<Long> q2 = em.createQuery(
            "SELECT COUNT(a.value) FROM A a", Long.class);
        result = q2.getSingleResult();
        elapsedTime = System.currentTimeMillis() - ms;
        System.out.println("Query 2 completed in " + elapsedTime +
            " milliseconds (" + result + ")");

        System.out.println("Query 3 started.");
        ms = System.currentTimeMillis();
        TypedQuery<Long> q3 = em.createQuery(
            "SELECT COUNT(a.str) FROM A a", Long.class);
        result = q3.getSingleResult();
        elapsedTime = System.currentTimeMillis() - ms;
        System.out.println("Query 3 completed in " + elapsedTime +
            " milliseconds (" + result + ")");

        em.close();
        emf.close();
    }
   
    @Entity
    public static class A {
        @Id @GeneratedValue long id;
        int value = 1000000;
        String str = "abcdefghijklmnopqrstuvwxyz";

        public A() {
        }
    }
}

 

ObjectDB Support
#7

I believe the class A in your test case is too simple to give a reliable proof. Please see the message "http://www.objectdb.com/database/forum/226". The random strings and number may perform differently.

TIA

Reply