938 words

Join performance in Objectdb

#1
2011-08-27 08:37
SELECT count(e) FROM Meta e, ImageNet i, Page p
WHERE e.idImage=i.id AND e.idPage=p.id;

So simple a command cause me half an hour and is running now! Each table has about 160K records. What formula can predict the time for such counting?

TIA

Institute of Computer Technologies
gzdillon
gzdillon's picture
Joined on 2011-05-17
User Post #90
#2
2011-08-27 09:44

Now, after 2 hours, it is still running.

Maybe I used a wrong command?   

Institute of Computer Technologies
gzdillon
gzdillon's picture
Joined on 2011-05-17
User Post #91
#3
2011-08-28 20:41

According to your query you use IDs instead of references, which is not recommended when using JPA. It seems that a Meta instance holds the IDS of an Image and a Page instances instead of direct references.

A more JPA conventional variant of your query would be:

SELECT count(e) FROM Meta e, ImageNet i, Page p
WHERE e.image=i AND e.page=p

Anyway, this still doesn't justify execution time of hours.

Since you didn't provide a test case (as requested in the posting instructions) it is impossible to see what is exactly the problem. Maybe the id fields in Page and Image are not annotated with @Id or @Index. In that case, executing the JOIN may require iteration over 160,000 x 160,000 x 160,000 tuples.

The following test was written following your post to demonstrate queries which are similar to your query:

import javax.persistence.*;
 
public class F242 {
 
    @SuppressWarnings("boxing")
    public static void main(String[] args) {
        EntityManagerFactory emf =
            Persistence.createEntityManagerFactory(
                "objectdb:$objectdb/db/test.tmp;drop");
        EntityManager em = emf.createEntityManager();
        int entityCount = 160000;
 
        System.out.println("Database population started.");
        em.getTransaction().begin();
        for (int i = 0; i < entityCount; i++) {
            B b = new B();
            em.persist(b);
            C c = new C();
            em.persist(c);
            em.persist(new A(b, c));
        }
        em.getTransaction().commit();
        System.out.println("Database population completed.");
 
        long ms, result, elapsedTime;
 
        System.out.println("Query 1 started.");
        ms = System.currentTimeMillis();
        TypedQuery<Long> q1 = em.createQuery(
            "SELECT COUNT(a) FROM A a, B b, C c " +
            "WHERE a.bId = b.id AND a.cId = c.id", Long.class);
        result = q1.getSingleResult();
        elapsedTime = System.currentTimeMillis() - ms;
        System.out.println("Query 1 completed in " + elapsedTime +
            " milliseconds (" + result + ")");
 
        System.out.println("Query 2 started.");
        ms = System.currentTimeMillis();
        TypedQuery<Long> q2 = em.createQuery(
            "SELECT COUNT(a) FROM A a, B b, C c " +
            "WHERE a.b = b AND a.c = c", Long.class);
        result = q2.getSingleResult();
        elapsedTime = System.currentTimeMillis() - ms;
        System.out.println("Query 2 completed in " + elapsedTime +
            " milliseconds (" + result + ")");
 
        em.close();
        emf.close();
    }
 
    @Entity
    public static class A {
        @Id @GeneratedValue long id;
        B b;
        long bId;
        C c;
        long cId;
 
        public A() {
        }
 
        public A(B b, C c) {
            this.b = b;
            this.bId = b.id;
            this.c = c;
            this.cId = c.id;
        }
    }
 
    @Entity
    @SequenceGenerator(name="seqB", allocationSize=1000)
    public static class B {
        @Id @GeneratedValue(strategy=GenerationType.SEQUENCE, generator="seqB")
        long id;
    }
 
    @Entity
    @SequenceGenerator(name="seqC", allocationSize=1000)
    public static class C {
        @Id @GeneratedValue(strategy=GenerationType.SEQUENCE, generator="seqC")
        long id;
    }
}

The result output is:

Database population started.
Database population completed.
Query 1 started.
Query 1 completed in 1594 milliseconds (160000)
Query 2 started.
Query 2 completed in 109 milliseconds (160000)

The first query uses a JOIN on ID fields, similarly to your original query. The second query uses JOIN on direct references. The second form is simpler, faster and more appropriate when using JPA.

ObjectDB Support
ObjectDB - Fast Object Database for Java (JPA/JDO)
support
support's picture
Joined on 2010-05-03
User Post #578
#4
2011-08-28 23:41

1, It is supposed that there is an implicit index for the @ID in JPA, right?

2, I add @index to each field of idxxxx;

3, The Meta is parsed from a xml file like:

...
<PIC>
<PIC_URL></PIC_URL>
<PAGE_URL></PAGE_URL>
<ALT_TEXT></ALT_TEXT>
<ANCHOR_TEXT></ANCHOR_TEXT>
<SUR_TEXT></SUR_TEXT>
<PAGE_TITLE></PAGE_TITLE>
<WIDTH></WIDTH>
<HEIGHT></HEIGHT>
</PIC>
...
I have no reference to set in the Meta, unless do additional searching and replacement.

TIA

Institute of Computer Technologies
gzdillon
gzdillon's picture
Joined on 2011-05-17
User Post #92
#5
2011-08-29 02:55

I think this is a bug. The aggreiation of Objectdb seems to be based on the size of fields of entities:

JPQL> select count(e.id) from Page e
1 java.lang.Long: 161143
Time : 23.889999
 
JPQL> select count(e.sUrl) from Page e
1 java.lang.Long: 161143
Time : 96.968002
 
JPQL> select count(e.iSize) from Page e
1 java.lang.Long: 161143
Time : 10.359000
 
JPQL>

The command tool is from the book of Mike Keith.

Obviously, it is supposed that the count of every field should be the same performance. I have no time to test the field of byte[], you know.

TIA

Institute of Computer Technologies
gzdillon
gzdillon's picture
Joined on 2011-05-17
User Post #93
#6
2011-08-29 07:25

> 1, It is supposed that there is an implicit index for the @Id in JPA, right?

That is true.

> 2, I add @index to each field of idxxxx;

I am not sure that it would help for this specific query.

> I have no reference to set in the Meta, unless do additional searching and replacement.

Building the database correctly may be more complicated but then it would work faster.

> The aggreiation of Objectdb seems to be based on the size of fields of entities.

I tried to repeat this with the following test but could get your results.

Can you change this test to demonstrate the execution time difference that you experience?

import javax.persistence.*;
 
public class F242b {
 
    @SuppressWarnings("boxing")
    public static void main(String[] args) {
        EntityManagerFactory emf =
            Persistence.createEntityManagerFactory(
                "objectdb:$objectdb/db/test.tmp;drop");
        EntityManager em = emf.createEntityManager();
        int entityCount = 1600000;
 
        System.out.println("Database population started.");
        em.getTransaction().begin();
        for (int i = 0; i < entityCount; i++) {
            em.persist(new A());
        }
        em.getTransaction().commit();
        System.out.println("Database population completed.");
 
        long ms, result, elapsedTime;
 
        System.out.println("Query 1 started.");
        ms = System.currentTimeMillis();
        TypedQuery<Long> q1 = em.createQuery(
            "SELECT COUNT(a.id) FROM A a", Long.class);
        result = q1.getSingleResult();
        elapsedTime = System.currentTimeMillis() - ms;
        System.out.println("Query 1 completed in " + elapsedTime +
            " milliseconds (" + result + ")");
 
        System.out.println("Query 2 started.");
        ms = System.currentTimeMillis();
        TypedQuery<Long> q2 = em.createQuery(
            "SELECT COUNT(a.value) FROM A a", Long.class);
        result = q2.getSingleResult();
        elapsedTime = System.currentTimeMillis() - ms;
        System.out.println("Query 2 completed in " + elapsedTime +
            " milliseconds (" + result + ")");
 
        System.out.println("Query 3 started.");
        ms = System.currentTimeMillis();
        TypedQuery<Long> q3 = em.createQuery(
            "SELECT COUNT(a.str) FROM A a", Long.class);
        result = q3.getSingleResult();
        elapsedTime = System.currentTimeMillis() - ms;
        System.out.println("Query 3 completed in " + elapsedTime +
            " milliseconds (" + result + ")");
 
        em.close();
        emf.close();
    }
 
    @Entity
    public static class A {
        @Id @GeneratedValue long id;
        int value = 1000000;
        String str = "abcdefghijklmnopqrstuvwxyz";
 
        public A() {
        }
    }
}

 

ObjectDB Support
ObjectDB - Fast Object Database for Java (JPA/JDO)
support
support's picture
Joined on 2010-05-03
User Post #581
#7
2011-08-29 08:23

I believe the class A in your test case is too simple to give a reliable proof. Please see the message "http://www.objectdb.com/database/forum/226". The random strings and number may perform differently.

TIA

Institute of Computer Technologies
gzdillon
gzdillon's picture
Joined on 2011-05-17
User Post #94

Post Reply

Please read carefully the posting instructions - before posting to the ObjectDB website.

  • You may have to disable pop up blocking in order to use the toolbar (e.g. in Chrome).
  • Use ctrl + right click to open the browser context menu in the editing area (e.g. for using a browser spell checker).
  • To insert formatted lines (e.g. Java code, stack trace) - select a style in the toolbar and then insert the text in the new created block.
  • Avoid overflow of published source code examples by breaking long lines.
  • You may mark in paragraph code words (e.g. class names) with the code style (can be applied by ctrl + D).
  • Long stack traces (> 50 lines) and complex source examples (> 100 lines) should be posted as attachments.
Attachments:
Maximum file size: 32 MB
Cancel