Database size is much larger than expected (x2)

#1

I store images in ObjectDB.

The schema is:

<URI>::=(<sUrl>, <cMark>, <iSize>, <baBody>, <id>)
<sUrl>::=String
<cMark>::=char
<iSize>::=int
<baBody>::=byte[]
<id>::=@ID

Even such a simple class fails after 155,648 insertions with -Xmx1432m:

...
NO = 151552 @ 8042217472
id = 0 @ 7703244276
Url = http://cimg2.163.com/catchpic/E/E9/E96492634557CE291CDF3AADC710F373.jpg

NO = 155648 @ 8042217472
id = 0 @ 8016831165
Url = http://pic.tiexue.net/pics/2005_10_19_99028_599028.jpg

Exception in thread "main" [ObjectDB 2.2.9_03] javax.persistence.RollbackException
Failed to commit transaction: Java heap space (error 613)
        at com.objectdb.jpa.EMImpl.commit(EMImpl.java:277)
        at image.URIReader.read(URIReader.java:84)
        at image.URIReader.main(URIReader.java:123)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at com.objectdb.o.BYW.r(BYW.java:86)
        at com.objectdb.o.ENH.c(ENH.java:212)
        at com.objectdb.o.ENT.R(ENT.java:738)
        at com.objectdb.o.STA.Z(STA.java:656)
        at com.objectdb.o.STM.H(STM.java:515)
        at com.objectdb.o.OBM.bG(OBM.java:858)
        at com.objectdb.o.OBM.bE(OBM.java:715)
        at com.objectdb.jpa.EMImpl.commit(EMImpl.java:274)
        ... 2 more

Question 1: Why does ObjectDB use so much memory that it fails with "-Xmx1432m", even though I call em.clear() every 0xFFF insertions?

Question 2: Why did the database grow to 16,671,309,824 bytes, while the original images are stored in a single 7GB file?

#2

Question 3: The following messages appear when the program is run in Eclipse 3.5 on Windows XP.

[ObjectDB 2.2.9_03 Enhancer]
5 persistable types have been enhanced:
    image.ImageSogou
    image.Meta
    image.Page
    image.Thumb
    image.URI
7 NON persistable types have been enhanced:
    image.ImageReader
    image.ImageReader$JPcanv
    image.MetaReader
    image.MetaReader$xmler
    image.PageReader
    image.PageTest
    image.URIReader

NO = 256 @ 8042217472
id = 256 @ 15621180
Url = http://www.jspet.com.cn/UploadThumbs/200459214931523.jpg

NO = 512 @ 8042217472
id = 512 @ 29024565
Url = http://www.qdn.cn/Html/yule/yulequan/UploadFile/20058410499734.jpg

NO = 768 @ 8042217472
id = 768 @ 43678127
Url = http://picture.moptv.com/hwdb/8002.jpg

NO = 1024 @ 8042217472
id = 1024 @ 57879036
Url = http://www.fubusi.com/Files/UpFiles/2006022409353334534.jpg

NO = 1280 @ 8042217472
id = 1280 @ 73880803
Url = http://ent.sz.net.cn/images/2005-09/27/xin_140902271845820143085.jpg

NO = 1536 @ 8042217472
id = 1536 @ 90296440
Url = http://www.csonline.com.cn/img/2003-07/22/tn_22a1404.jpg

Exception in thread "main" [ObjectDB 2.2.9_03] javax.persistence.RollbackException
Failed to commit transaction: Failed to write the value of field field image.URI.baURI using reflection (error 613)
at com.objectdb.jpa.EMImpl.commit(EMImpl.java:277)
at image.URIReader.read(URIReader.java:82)
at image.URIReader.main(URIReader.java:126)
Caused by: javax.persistence.PersistenceException: com.objectdb.o.UserException: Failed to write the value of field field image.URI.baURI using reflection
at com.objectdb.o._PersistenceException.b(_PersistenceException.java:47)
at com.objectdb.o.JPE.g(JPE.java:140)
at com.objectdb.o.JPE.g(JPE.java:78)
... 5 more
Caused by: com.objectdb.o.UserException: Failed to write the value of field field image.URI.baURI using reflection
at com.objectdb.o.MSG.d(MSG.java:74)
at com.objectdb.o.UMR.M(UMR.java:863)
at com.objectdb.o.UMR.x(UMR.java:540)
at com.objectdb.o.UML.u(UML.java:516)
at com.objectdb.o.MMM.af(MMM.java:1027)
at com.objectdb.o.UTY.aG(UTY.java:1195)
at com.objectdb.o.UTY.aF(UTY.java:1184)
at com.objectdb.o.ENH.c(ENH.java:202)
at com.objectdb.o.ENT.R(ENT.java:738)
at com.objectdb.o.STA.Z(STA.java:656)
at com.objectdb.o.STM.H(STM.java:515)
at com.objectdb.o.OBM.bG(OBM.java:858)
at com.objectdb.o.OBM.bE(OBM.java:715)
at com.objectdb.jpa.EMImpl.commit(EMImpl.java:274)
... 2 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at com.objectdb.o.OBH.c(OBH.java:150)
at com.objectdb.o.EBW.t(EBW.java:57)
at com.objectdb.o.BYW.w(BYW.java:151)
at com.objectdb.o.SBV.S(SBV.java:125)
at com.objectdb.o.SBT.writeArray(SBT.java:141)
at com.objectdb.o.ART.writeStrictly(ART.java:146)
at com.objectdb.o.TYW.ar(TYW.java:385)
at com.objectdb.o.TYW.at(TYW.java:409)
at com.objectdb.o.TYW.writeElement(TYW.java:259)
at com.objectdb.o.UMR$P.y(UMR.java:931)
at com.objectdb.o.UMR.x(UMR.java:537)
... 13 more
Error opening zip file or JAR manifest missing: /E:/projects/objectdb/bin/objectdb.jar
#3

It is impossible to answer these questions without a sample test case, since many details are missing, but here are some general directions:

  1. Committing every 4096 images may cause memory problems if the images are very large. Try committing every 256 images.
  2. The database size should be larger than the size of the image files on the file system, but more than 100% overhead is not reasonable.
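
The first suggestion can be illustrated with a minimal, database-free sketch. It assumes that persisted objects stay referenced until the transaction commits, so the memory held at any moment grows with the batch size. BatchSink and PendingTracker are hypothetical stand-ins for an EntityManager and its transaction; they are not ObjectDB APIs.

```java
import java.util.ArrayList;
import java.util.List;

class BatchCommitSketch {
    // Persist `count` images of `imageSize` bytes, committing after every `batchSize` images.
    static PendingTracker run(int count, int imageSize, int batchSize) {
        PendingTracker sink = new PendingTracker();
        sink.begin();
        for (int i = 0; i < count; i++) {
            sink.persist(new byte[imageSize]);
            if ((i + 1) % batchSize == 0) {
                sink.commit();
                sink.begin();
            }
        }
        sink.commit(); // commit the final (possibly empty) batch
        return sink;
    }

    public static void main(String[] args) {
        PendingTracker big = run(8192, 1024, 4096);
        PendingTracker small = run(8192, 1024, 256);
        System.out.println("batch 4096: peak pending bytes = " + big.peakPendingBytes);
        System.out.println("batch 256:  peak pending bytes = " + small.peakPendingBytes);
    }
}

// Hypothetical stand-in for an EntityManager plus its transaction.
interface BatchSink {
    void begin();
    void persist(byte[] image);
    void commit();
}

// Tracks how many bytes were uncommitted at the worst moment.
class PendingTracker implements BatchSink {
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    long peakPendingBytes = 0;
    int commits = 0;

    public void begin() {
        // nothing to do in this simulation
    }

    public void persist(byte[] image) {
        pending.add(image);
        pendingBytes += image.length;
        peakPendingBytes = Math.max(peakPendingBytes, pendingBytes);
    }

    public void commit() {
        pending.clear(); // committed objects no longer need to be held
        pendingBytes = 0;
        commits++;
    }
}
```

In a real program the same loop shape would call em.persist(), em.getTransaction().commit(), em.clear() and em.getTransaction().begin() at the corresponding points.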
ObjectDB Support
#4

OK, I finished running the same code in DOS mode. I observed only one java task in DOS mode, but two javaw tasks in Eclipse. I believe the command "java -cp objectdb.jar;. mycode" keeps everything within the "-Xmx1432m" limit, whereas in Eclipse my code needs one javaw process and ObjectDB needs another. So the memory problem causes:

Exception in thread "main" [ObjectDB 2.2.9_03] javax.persistence.RollbackException
Failed to commit transaction: Failed to write the value of field field image.URI.baURI using reflection (error 613)
at com.objectdb.jpa.EMImpl.commit(EMImpl.java:277)

as mentioned in post #2.
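
One way to check which heap limit really applies is to print it from inside the process that runs the code, both on the command line and in Eclipse. The sketch below uses only standard JDK calls; the class name HeapLimitCheck is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class HeapLimitCheck {
    // Maximum heap this JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // JVM arguments passed to this process (for example -Xmx1432m, if it was set).
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```

If the value printed under Eclipse is smaller than the one printed on the command line, the -Xmx option is probably not reaching the launched program.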

#5

>>The database size should be larger than the size of the image files on the file system, but more than 100% overhead is not reasonable.

BTW, the ObjectDB database with the images is indeed twice the original size! You can check this by inserting the same picture multiple times. I cannot send the 7GB file to you over this forum, right? You can get it from here: http://www.sogou.com/labs/dl/t-e.html.

 

#6

You should check your code, especially the conversion of images to byte[]. Maybe it unintentionally doubles the required space.

I just tried the following simple test:

package com.objectdb.test.bug.forum;

import java.util.*;
import javax.persistence.*;


public class T456 {

    public static void main(String[] args) {
        EntityManagerFactory emf =
            Persistence.createEntityManagerFactory("images.odb");
        EntityManager em = emf.createEntityManager();

        for (int i = 0; i < 1000; i++) {
            em.getTransaction().begin();
            em.persist(new Image());
            System.out.println(i);
            em.getTransaction().commit();
            em.clear();
        }

        em.close();
        emf.close();
    }

    @Entity
    static class Image {
        byte[] data = new byte[3000000]; // 3MB per image
    }
}

The resulting database size is about 3GB, as expected.

ObjectDB Support
#7

Another thing to check in your code - maybe every image is unintentionally stored twice.

ObjectDB Support
#8

In fact, I store images as <Thumb>, which extends <URI>. So do <Page>, <Music>, and so on. The superclass stores the actual data, while each derived class adds some special fields, such as width and height.

I guess ObjectDB stores most of the data in both <URI> and <Thumb>, so the total size doubles, right?

How can the derived class store only its special fields when using @Inheritance(strategy=InheritanceType.JOINED)?

#9

ObjectDB ignores @Inheritance, but I think the inheritance strategy should not affect this anyway, even when using ORM-based JPA implementations.

Since you didn't publish your classes, it is unknown where exactly the problem is, but apparently you store the same byte[] more than once - by referencing it from two entity objects, by referencing it more than once in the same entity (two fields, maybe in different levels of the hierarchy), or by including it in an embeddable object that is referenced twice.

 

ObjectDB Support
#10

>> by referencing it more than once in the same entity

So, this is the problem. In OO programming, regardless of how many references there are, there should be only one copy of the original data. Why does ObjectDB store multiple copies?

#11

It is not just ObjectDB. This is how it works in JPA (and JDO).

Only entity objects have identity and support multiple references without duplication.

All the other types are embedded in entity objects and duplicated if they are referenced more than once.

This is indeed different from Java serialization.
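
The Java serialization side of this comparison can be seen with a small standalone sketch. In standard Java serialization, a second reference to an object that was already written to the stream is stored as a short back-reference, so a shared array is written only once. Holder is a hypothetical plain class, not a JPA entity, and the sketch does not touch ObjectDB.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SharedReferenceDemo {
    // Hypothetical holder with two array fields.
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        byte[] first;
        byte[] second;

        Holder(byte[] first, byte[] second) {
            this.first = first;
            this.second = second;
        }
    }

    // Number of bytes produced by serializing the given object graph.
    static int serializedSize(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        int shared = serializedSize(new Holder(data, data));
        int copied = serializedSize(new Holder(data, data.clone()));
        System.out.println("same array referenced twice: " + shared + " bytes");
        System.out.println("two separate arrays:         " + copied + " bytes");
    }
}
```

The first size stays close to one copy of the array, while the second is roughly double. According to the explanation above, a JPA entity with two fields referencing the same byte[] behaves like the second case.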

ObjectDB Support
#12

Hi, I don't believe I store the data more than once. I just call em.persist(obj), and obj has the same fields. Where is the second copy?

If you have time to go through my code, please see the attachment.

Because the structure is fixed, I cannot combine it into one .java file. The original file is "images"; each record has the structure "URL" 0x0d0x0a "Mark" 0x0d0x0a "size" 0x0d0x0a "body" 0x0d0x0a.

You can replace the RandomAccessFile library with java.io.RandomAccessFile.

TIA
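
For readers who do not have the attachment, the record layout described above could be read roughly as follows. This is only a sketch under stated assumptions: the URL, mark and size fields are taken to be ASCII text lines, and the size is taken to be the decimal byte count of the body. Neither detail is confirmed in this thread, and ImageRecordReader and its members are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ImageRecordReader {
    static class Record {
        final String url;
        final char mark;
        final byte[] body;

        Record(String url, char mark, byte[] body) {
            this.url = url;
            this.mark = mark;
            this.body = body;
        }
    }

    // Reads bytes up to the next CRLF and returns them as ASCII, without the terminator.
    static String readLine(DataInputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int prev = -1;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("unexpected end of input");
            if (prev == '\r' && b == '\n') break;
            if (prev >= 0) line.write(prev);
            prev = b;
        }
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Reads one record: URL, mark, size (assumed decimal), body, each followed by CRLF.
    static Record readRecord(DataInputStream in) throws IOException {
        String url = readLine(in);
        String mark = readLine(in);
        int size = Integer.parseInt(readLine(in).trim());
        byte[] body = new byte[size];
        in.readFully(body); // the body is read by length because it may contain CR/LF bytes
        readLine(in);       // consume the CRLF after the body
        return new Record(url, mark.isEmpty() ? ' ' : mark.charAt(0), body);
    }

    static Record parse(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return readRecord(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds one in-memory sample record (made-up values) for demonstration.
    static byte[] sampleRecord() {
        byte[] header = "http://example.com/a.jpg\r\nJ\r\n3\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] body = {10, 13, 10};
        byte[] all = new byte[header.length + body.length + 2];
        System.arraycopy(header, 0, all, 0, header.length);
        System.arraycopy(body, 0, all, header.length, body.length);
        all[all.length - 2] = '\r';
        all[all.length - 1] = '\n';
        return all;
    }

    public static void main(String[] args) {
        Record r = parse(sampleRecord());
        System.out.println(r.url + " mark=" + r.mark + " bodyLength=" + r.body.length);
    }
}
```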

#13

I cannot see any persist call in the attached code.

If you need additional help you will have to isolate the problem and prepare a self contained runnable test case similar to the test case in #6, following the posting instructions, i.e. one Java class with entity classes defined as inner static classes.

ObjectDB Support
#14

???

Which IDE do you use? The persist() call is in URIReader.

OK

I copied everything into one .java file. The main() method is in the URIReader class.

 

#15

Your code contains file reading operations that are not related to ObjectDB and may cause the problem. You will have to follow the posting instructions and simplify the code. Remove any loops and file reading operations and try to demonstrate how pure persisting code causes the problem.

ObjectDB Support
#16

>>Remove any loops and file reading operations

Is this relevant to the problem?

Without real input data, how can I test the real performance of ObjectDB?

The structure of my class is <URI>::=(<sUrl>, <cMark>, <iSize>, <baBody>, <id>). Why would "any loops and file reading operations" cause my class to be persisted in multiple copies?

TIA

#17

Sure. You should test the performance of ObjectDB with real data. But if there is a problem you will have to isolate it in order to get help since this forum cannot provide unlimited debugging services.

Please understand that it doesn't matter for ObjectDB if you store an empty new byte[1000000] or a byte[] of the same size with data that was read from a file. So please demonstrate your problem without external file reading, which is irrelevant for ObjectDB.

The test in #6 demonstrates that ObjectDB stores byte[] as expected.

 

 

ObjectDB Support
#18

OK, this code shows that version 2.2.9_03 doubles the size:

import static util.SingleManager.em;
import static util.SingleManager.trans;

import java.util.Random;

import javax.persistence.DiscriminatorColumn;
import javax.persistence.DiscriminatorType;
import javax.persistence.DiscriminatorValue;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Inheritance;
import javax.persistence.InheritanceType;

public class URIReader {

    public static int read(int iNumber) {
        int i = 0;
        int iSpan = 0xFF;
        Random rd = new Random();
        int iSum = 0;
        String sss = "sjdfowefjadfu aipfk;wejhfoiasuydfajdfljweoifasldjfiejlsjdfaiowufajsdfieuajdfjiaejf;ajdfie;lajdkfjiesljdafjeia;jfe";
        trans.begin();
        for (; i < iNumber; i++) {
            Uris uri = new Uris();
            String sLine;
            sLine = sss.substring(rd.nextInt(100));
            uri.setsUrl(sLine);
            int iSize = rd.nextInt(5000);
            uri.setiSize(iSize);
            byte[] baB = new byte[iSize];
            rd.nextBytes(baB);
            uri.setBaURI(baB);
            iSum += iSize; // total payload bytes handed to persist()
            em.persist(uri);
            // Commit whenever the low 8 bits of i are zero (i = 0, 256, 512, ...).
            if ((i & iSpan) == 0) {
                trans.commit();
                System.out.printf("NO = %d @ %d%n", i, uri.getId());
                System.out.printf("Url = %s%n%n", uri.getsUrl());
                em.clear();
                trans.begin();
            }
        }
        trans.commit();
        return iSum;
    }

    public static void main(String[] args) {
        int i2 = URIReader.read(10000);
        System.out.printf("Images bytes = %d%n", i2);
    }
}

@Entity
@Inheritance(strategy=InheritanceType.JOINED)
@DiscriminatorColumn(discriminatorType=DiscriminatorType.INTEGER,name="type")
@DiscriminatorValue(value="71")
class Uris {
    public Uris() {
    }

    byte[] baURI;
    char cMark;

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof Uris) {
            Uris u = (Uris) obj;
            if (sUrl.equals(u.getsUrl()) && iSize == u.getiSize())
                return true;
            else
                return false;
        }
        return false;
    }

    @Override
    public int hashCode() {
        return sUrl.hashCode();
    }

    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    int id;
    int iSize;
    String sUrl;

    public byte[] getBaURI() {
        return baURI;
    }

    public char getcMark() {
        return cMark;
    }

    public int getId() {
        return id;
    }

    public int getiSize() {
        return iSize;
    }

    public String getsUrl() {
        return sUrl;
    }

    public void setBaURI(byte[] baURI) {
        this.baURI = baURI;
    }

    public void setcMark(char cMark) {
        this.cMark = cMark;
    }

    public void setiSize(int iSize) {
        this.iSize = iSize;
    }

    public void setsUrl(String sUrl) {
        this.sUrl = sUrl;
    }
}

The result:

Images = 25088878

But in fact, the .odb file is 50MB!
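
The comparison made here can be automated with a small helper that divides the database file size by the payload byte count returned by read(). This is only a sketch: the class name OverheadCheck and its command-line arguments are hypothetical. Taking 50MB as roughly 50,000,000 bytes, the numbers in this post give a ratio of about 2.

```java
import java.io.File;

class OverheadCheck {
    // Ratio of the database file size to the payload bytes written into it.
    static double overheadRatio(long fileBytes, long payloadBytes) {
        if (payloadBytes <= 0) {
            throw new IllegalArgumentException("payloadBytes must be positive");
        }
        return (double) fileBytes / payloadBytes;
    }

    public static void main(String[] args) {
        // Hypothetical usage: java OverheadCheck.java <database file> <payload bytes>
        if (args.length < 2) {
            System.out.println("usage: OverheadCheck <database file> <payload bytes>");
            return;
        }
        long fileBytes = new File(args[0]).length();
        long payloadBytes = Long.parseLong(args[1]);
        System.out.printf("file = %d bytes, payload = %d bytes, ratio = %.2f%n",
                fileBytes, payloadBytes, overheadRatio(fileBytes, payloadBytes));
    }
}
```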

#19

This test does indicate a problem, and I can confirm that the database size is twice as large as it should be.

Thank you for the test - it is not exactly as required by the posting instructions (e.g. it is not self contained - it should use EntityManager directly instead of dependency on external classes) - but it was still very helpful.

Following the posting instructions and submitting such a test before #18 could have saved a lot of time.

Anyway, thank you for finding this problem. Hopefully a fix will be released soon.

 

ObjectDB Support
#20

As for the Methodenstreit: I do insist that pointing out the problem is the client's responsibility, while designing a test case is the provider's. This is similar to the fat client versus thin client question. I believe that once a client points out a problem, the provider should produce a well-designed test case to check it themselves. Not all clients are as stubborn as I am, so otherwise the bug may stay in your product. It seems the provider benefits more from this situation.

#21

In many cases (not always) a user can isolate the problem quite easily by starting from the failed program and removing unnecessary code until the problem remains with no irrelevant code. For the vendor this is usually more complicated and sometimes impossible. Note also that the attempt in #6 did not reproduce the problem.

You can report a problem with no test case of course - but don't expect a quick fix in this case.

Please try build 2.2.9_04, which should improve byte[] persisting.

Storing char[], short[], Byte[], Short[], Character[] still remains inefficient in the new build, but byte[] is probably much more important.

 

ObjectDB Support
#22

>> but byte[] is probably much more important.

Yes, I agree. Thanks a lot.
