
Issue #255 - Use temporary files to enable very large transactions

Bug | Version: 1.04 | Priority: Critical | Status: Fixed | Replies: 27
#1
2015-05-11 14:18

We have transactions in which many objects are created. These objects are no longer needed in the current transaction after creation.
Unfortunately, we receive an OutOfMemoryError because the objects are kept in ObjectDB's first-level cache.
In post #4 of the forum thread http://www.objectdb.com/database/forum/921 you wrote that ObjectDB should support very large transactions in a future version.

Can you implement this so that ObjectDB uses temporary files?

btc_es
Joined on 2014-10-20
User Post #41
#2
2015-05-20 05:53

Hello,
For us, this scenario is a blocker. We dereference the newly created and persisted objects from the first-level cache (using flush() and detach()), but byte arrays held in ObjectDB's internal memory still allocate a lot of memory.

It would be very helpful for us to get an evaluation version in which this data is written to a temporary file, even without speed optimizations, with priority on memory usage. Ideally, the threshold would be adjustable.

How feasible do you consider this requirement?
Can you estimate how quickly you could implement it?

btc_es
Joined on 2014-10-20
User Post #45
#3
2015-05-20 15:08

Although ObjectDB already uses temporary files for many activities (e.g. processing large query results), currently the size of a transaction (i.e. the total size of database pages that have to be replaced) is limited by the JVM heap size.

Supporting huge transactions requires some major changes in ObjectDB.

However, it may be possible to provide a temporary inefficient workaround for a specific situation.

For this, we need more information:

  • What is the total size of data in the transaction?
  • What combination of DELETE / INSERT / UPDATE operations are in that huge transaction?
  • Can we assume single user mode (i.e. batch background load) or do you need multi threading support?
  • Why don't you split the transaction to multiple smaller transactions?
ObjectDB Support
ObjectDB - Fast Object Database for Java (JPA/JDO)
support
Joined on 2010-05-03
User Post #2,203
#4
2015-05-21 06:10
  • What is the total size of data in the transaction?
    • The total size is, in principle, unlimited.
  • What combination of DELETE / INSERT / UPDATE operations are in that huge transaction?
    • Any combination is possible.
  • Can we assume single user mode (i.e. batch background load) or do you need multi threading support?
    • We need multi-threading support.
  • Why don't you split the transaction to multiple smaller transactions?
    • Because the content forms one large logical transaction: if there is an error, all changes must be rolled back.
btc_es
Joined on 2014-10-20
User Post #46
#5
2015-05-21 07:52

Given your answers, it may be very difficult to provide a quick solution that addresses your needs, particularly the unlimited size. The memory problem is with Page instances in memory that wrap byte[] content with additional information. Even if the byte[] content is moved to disk (which is also not simple and involves many difficulties), we will not be able to support an unlimited number of Page instances.
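To make the point about Page instances concrete, here is a minimal Java sketch under stated assumptions: the `Page` class, its fields and the `offload`/`read` methods are hypothetical and do not reflect ObjectDB's internal classes. It moves a page's `byte[]` content to a temporary file, but the wrapper object itself (id, offset, length and the object header) stays on the heap, so the number of pages a transaction can touch is still bounded by memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: NOT ObjectDB's internal Page class.
// The byte[] content can be moved to a temporary file, but the Page
// object itself (id, offset, length, object header) stays on the heap.
public class OffloadablePageDemo {

    static final class Page {
        final long pageId;            // which database page this is
        private final int length;     // size of the content in bytes
        private byte[] content;       // null after offload()
        private long fileOffset = -1; // position in the temp file once offloaded

        Page(long pageId, byte[] content) {
            this.pageId = pageId;
            this.content = content;
            this.length = content.length;
        }

        // Append the content to the temp file and drop the in-heap copy.
        void offload(RandomAccessFile file) throws IOException {
            if (content == null) {
                return; // already offloaded
            }
            fileOffset = file.length();
            file.seek(fileOffset);
            file.write(content);
            content = null;
        }

        // Return the content, reading it back from the temp file if needed.
        byte[] read(RandomAccessFile file) throws IOException {
            if (content != null) {
                return content;
            }
            byte[] buf = new byte[length];
            file.seek(fileOffset);
            file.readFully(buf);
            return buf;
        }

        boolean inMemory() {
            return content != null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("pages", ".dat");
        try (RandomAccessFile file = new RandomAccessFile(tmp.toFile(), "rw")) {
            Page page = new Page(7, new byte[] {1, 2, 3, 4});
            page.offload(file);
            byte[] back = page.read(file);
            // The content is no longer held in memory, but the Page object is.
            System.out.println("in memory: " + page.inMemory() + ", bytes read back: " + back.length);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```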

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,205
#6
2015-05-22 11:26

Regarding a new maximum transaction size: in the long term we will be able to eliminate any practical limit (although the database itself has a maximum size).

However, if you are interested in a quick solution, we may be able to increase the current maximum from the heap size to about 20 times the heap size. For example, with a heap size of 5 GB the maximum transaction size can be 100 GB.

Please check if this is sufficient for your current needs.
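The arithmetic behind the 20x estimate, as a small Java sketch. The factor of 20 is taken from the post; the class and method names are illustrative only, and `Runtime.maxMemory()` is used as an approximation of the `-Xmx` setting of the running JVM.

```java
// Arithmetic behind the "about 20 x the heap size" estimate (factor from the post).
public class TransactionLimitEstimate {

    static final long GB = 1024L * 1024 * 1024;
    static final long FACTOR = 20; // "about 20 x the heap size"

    // Estimated maximum transaction size with temporary files enabled.
    static long estimatedMaxTransactionBytes(long heapBytes) {
        return heapBytes * FACTOR;
    }

    public static void main(String[] args) {
        // Example from the post: 5 GB heap -> about 100 GB transaction.
        System.out.println(estimatedMaxTransactionBytes(5 * GB) / GB + " GB"); // 100 GB

        // The same estimate for this JVM; maxMemory() approximates -Xmx and
        // returns Long.MAX_VALUE when there is no limit.
        long heap = Runtime.getRuntime().maxMemory();
        if (heap == Long.MAX_VALUE) {
            System.out.println("no heap limit reported");
        } else {
            System.out.println("this JVM: ~" + estimatedMaxTransactionBytes(heap) / GB + " GB");
        }
    }
}
```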

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,208
#7
2015-05-26 11:58

We managed to add a new option of using temporary files for flush / commit, as suggested in #6 above (i.e. with a heap size of 5 GB the maximum transaction size increases from 5 GB to about 100 GB). Further improvements are possible.

Please let us know if you are interested in a build that integrates this for testing.

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,210
#8
2015-05-27 13:50

Hello,
Yes, we are still interested in this solution. I think the factor of 20 relative to the heap should be sufficient even for our "extreme" scenarios.
For a special use case, we now use a second database. Within the real transaction we work with a copy of this database. This enables us to commit repeatedly on the second database and still recover a consistent state in case of a fault. We commit whenever the application's memory consumption reaches a percentage threshold.
Another advantage of this solution is that we can keep working with queries, which otherwise become very slow after many objects are created.
However, we consider the temporary-file solution better, because it works in all use cases. For other use cases we can't use a separate database.

In short: we are interested in a build that integrates this for testing.
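The memory-percentage trigger described in this post can be sketched with the standard `Runtime` API. `MemoryThreshold` and `shouldCommit` are hypothetical names, not part of ObjectDB, and the used-heap figure is only an approximation because it also counts garbage that has not been collected yet.

```java
// Hypothetical helper (not part of ObjectDB): reports whether the used heap
// has reached a given fraction of the maximum heap.
public class MemoryThreshold {

    // Fraction of the max heap currently in use (0.0 - 1.0), including
    // garbage that the GC has not reclaimed yet.
    static double usedHeapFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    // True when usage reaches the threshold, e.g. 0.75 for 75%.
    static boolean shouldCommit(double threshold) {
        if (threshold <= 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in (0, 1]");
        }
        return usedHeapFraction() >= threshold;
    }

    public static void main(String[] args) {
        System.out.printf("heap in use: %.1f%%%n", usedHeapFraction() * 100);
        System.out.println("commit now at 75%? " + shouldCommit(0.75));
    }
}
```

In the workaround above, such a check would run inside the loop that creates objects, triggering a commit on the second database when it returns true.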

btc_es
Joined on 2014-10-20
User Post #49
#9
2015-05-28 07:26

We are adding some further improvements and tests, so please expect 2-3 weeks for this release.

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,214
#10
2015-06-12 13:38

A build with the ability to use temporary files is now available (2.6.2_06).

To enable temporary files in transaction flush/commit, set a new system property:

    System.setProperty("objectdb.temp.page-file", "true");

This can also be set as a JVM argument:

> java ... -Dobjectdb.temp.page-file=true

Note: if you are using client-server mode, this should be set on the server side.

This new feature should be used only if there is no other option. Splitting huge transactions into several smaller transactions (each of which fits in the JVM heap) is more efficient and preferred.
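If the flag may come either from the command line or from code, a small startup check can make sure it is set before ObjectDB is used. This is a sketch: the helper name is hypothetical, and it assumes (not stated above) that the property has to be in place before the first `EntityManagerFactory` is created. In client-server mode the same check belongs in the server process.

```java
// Hypothetical startup helper (not an ObjectDB API) for the
// "objectdb.temp.page-file" system property.
public class TempPageFileFlag {

    static final String PROPERTY = "objectdb.temp.page-file";

    // Sets the property to "true" only if it was not already given
    // (e.g. via -Dobjectdb.temp.page-file=...); an explicit value is kept.
    static boolean ensureEnabled() {
        if (System.getProperty(PROPERTY) == null) {
            System.setProperty(PROPERTY, "true");
        }
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        // Call first, before Persistence.createEntityManagerFactory(...) (assumption).
        System.out.println(PROPERTY + " enabled: " + ensureEnabled());
    }
}
```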

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,240
#11
2015-09-11 08:40

We tested enabling temporary files in transaction flush/commit, but it seems to have no effect.
The test persists 10,000 entities and calls flush(), then persists the next 10,000 entities, and so on.
After the seventh flush the heap limit of 500 MB is reached and an OutOfMemoryError is thrown.

During the test we only see the files eppdb.odb and eppdb.odb$. Where are the temporary files?

The test was executed both with 'System.setProperty("objectdb.temp.page-file", "true");' and with the JVM parameter '-Dobjectdb.temp.page-file=true'.

ObjectDB version: 2.6.3_04

com.objectdb.o.InternalException: Unexpected internal exception
at com.objectdb.o.TSK.g(TSK.java:114) ~[na:na]
at com.objectdb.o.TSK.f(TSK.java:98) ~[na:na]
at com.objectdb.o.TSM.e(TSM.java:86) ~[na:na]
at com.objectdb.o.UTT.A(UTT.java:370) ~[na:na]
at com.objectdb.o.UTT.l(UTT.java:208) ~[na:na]
at com.objectdb.o.TSK.i(TSK.java:145) ~[na:na]
at com.objectdb.o.TSK.f(TSK.java:95) ~[na:na]
at com.objectdb.o.MST.Vg(MST.java:1337) ~[na:na]
at com.objectdb.o.WRA.Vg(WRA.java:381) ~[na:na]
at com.objectdb.o.WSM.Vg(WSM.java:153) ~[na:na]
at com.objectdb.o.OBM.bR(OBM.java:979) ~[na:na]
at com.objectdb.o.OBM.bO(OBM.java:847) ~[na:na]
at com.objectdb.o.OBM.flush(OBM.java:763) ~[na:na]
at com.btc.ep.base.dal.tests.it.IT_PerformanceTest.performanceAndHeapTest(IT_PerformanceTest.java:55) [com.btc.ep.base.dal.tests.it/:na]
Caused by: java.lang.OutOfMemoryError: Java heap space
at com.objectdb.o.OBH.e(OBH.java:219) ~[na:na]
at com.objectdb.o.VLV.p(VLV.java:274) ~[na:na]
at com.objectdb.o.UPT.G(UPT.java:257) ~[na:na]
at com.objectdb.o.UPT.m(UPT.java:172) ~[na:na]
at com.objectdb.o.TSK.k(TSK.java:183) ~[na:na]
at com.objectdb.o.TSK.i(TSK.java:156) ~[na:na]
at com.objectdb.o.TSK.f(TSK.java:95) ~[na:na]
at com.objectdb.o.UPT.s(UPT.java:157) ~[na:na]
at com.objectdb.o.PGT.q(PGT.java:109) ~[na:na]
at com.objectdb.o.UPT.C(UPT.java:121) ~[na:na]
at com.objectdb.o.URT.l(URT.java:171) ~[na:na]
at com.objectdb.o.TSK.i(TSK.java:145) ~[na:na]
at com.objectdb.o.TSK.f(TSK.java:95) ~[na:na]
at com.objectdb.o.TSM.e(TSM.java:86) ~[na:na]
at com.objectdb.o.UTT.A(UTT.java:370) ~[na:na]
at com.objectdb.o.UTT.l(UTT.java:208) ~[na:na]
at com.objectdb.o.TSK.i(TSK.java:145) ~[na:na]
at com.objectdb.o.TSK.f(TSK.java:95) ~[na:na]
at com.objectdb.o.MST.Vg(MST.java:1337) ~[na:na]
at com.objectdb.o.WRA.Vg(WRA.java:381) ~[na:na]
at com.objectdb.o.WSM.Vg(WSM.java:153) ~[na:na]
at com.objectdb.o.OBM.bR(OBM.java:979) ~[na:na]
at com.objectdb.o.OBM.bO(OBM.java:847) ~[na:na]
at com.objectdb.o.OBM.flush(OBM.java:763) ~[na:na]
at com.btc.ep.base.dal.tests.it.IT_PerformanceTest.performanceAndHeapTest(IT_PerformanceTest.java:55) [com.btc.ep.base.dal.tests.it/:na]
btc_es
Joined on 2014-10-20
User Post #70
#12
2015-09-11 15:01

Do you invoke em.clear after flush to empty the persistence context?

Do you use embedded mode or client-server mode? If you are using client-server mode, the stack trace clearly indicates a client-side OutOfMemoryError (possibly due to a missing clear). The new temporary files take effect only on the server side.

You can specify the location of temporary files in the configuration file. The default path is $temp/ObjectDB, where $temp is the default Java temporary directory (System.getProperty("java.io.tmpdir")).
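A quick way to see where the temporary files end up with the default configuration is to resolve the same path in Java and list its contents. This sketch only covers the default `$temp/ObjectDB` location described above; if `<temp path=...>` is changed in the configuration file, that directory has to be inspected instead. The class name is illustrative.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Lists the default ObjectDB temp directory described in the post:
// $temp/ObjectDB, where $temp = System.getProperty("java.io.tmpdir").
// Only valid when the <temp path=...> setting is left at its default.
public class ObjectDbTempDir {

    static Path defaultTempDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"), "ObjectDB");
    }

    public static void main(String[] args) {
        Path dir = defaultTempDir();
        System.out.println("Default ObjectDB temp dir: " + dir);
        File[] files = dir.toFile().listFiles();
        if (files == null) {
            System.out.println("(not found: no temp files yet, or a custom path is configured)");
            return;
        }
        for (File f : files) {
            System.out.println(f.getName() + "  " + f.length() + " bytes");
        }
    }
}
```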

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,306
#13
2015-09-14 11:17

Yes, we invoke em.clear() after flush(). We can also see the ObjectDB temp file pages0.dat growing after each flush.

We also see many byte arrays in a heap dump that are referenced by ObjectDB classes.

The number of byte arrays grows after each flush and clear, although we trigger GC after clear, and the heap reaches its limit after x flushes.

I believe the byte arrays are database cache pages, because their size matches the configured page size.

We use ObjectDB as an embedded database.

btc_es
Joined on 2014-10-20
User Post #71
#14
2015-09-14 15:55

Can you post a heap dump, or at least relevant information from the heap dump, such as the type and number of instances of these byte[] elements in the heap, as well as the paths to these objects from root objects (to see why they are reachable and cannot be garbage collected)?

If you can share the test it would be even better, since in our test the new setting seems to work.
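One way to capture such a heap dump from inside a test, right after `flush()` and `clear()`, is the HotSpot diagnostic MXBean. This sketch assumes a HotSpot-based JVM (OpenJDK or Oracle), since `com.sun.management.HotSpotDiagnosticMXBean` is not part of the Java SE standard; the helper name and output location are illustrative. Passing `live = true` dumps only reachable objects, so unreachable garbage is excluded from the result.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Writes a heap dump of live (reachable) objects, e.g. right after flush() and clear().
// Assumes a HotSpot-based JVM: com.sun.management is not part of Java SE.
public class HeapDumper {

    static Path dumpLiveHeap(Path dir, String name) throws IOException {
        Path file = dir.resolve(name + ".hprof"); // recent JDKs require the .hprof suffix
        Files.deleteIfExists(file);               // dumpHeap fails if the file already exists
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toAbsolutePath().toString(), true); // true = reachable objects only
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"));
        Path dump = dumpLiveHeap(dir, "after-flush-and-clear");
        System.out.println("Heap dump: " + dump + " (" + Files.size(dump) + " bytes)");
    }
}
```

The resulting `.hprof` file can then be opened in a heap analyzer to count `byte[]` instances and inspect their paths to GC roots.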

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,308
#15
2015-09-15 07:42

Below is the heap dump information, shown as a diff between flush and clear. After flush and clear we invoked a full GC, and then took the heap dump.

The screenshots show a summary by 'number_of_instances' and the paths to these objects from root objects.

Before the flush was executed, the test persisted 10,000 entities.

btc_es
Joined on 2014-10-20
User Post #72
#16
2015-09-15 09:21

Do you use enhanced classes? If not, could you please run again with enhanced classes?

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,309
#17
2015-09-15 10:59

We have just tested with enhanced classes.

But the result is again an OutOfMemoryError.

btc_es
Joined on 2014-10-20
User Post #73
#18
2015-09-15 11:20

Our test is attached.

The JVM arguments for the test:

-Xmx512m
-Dobjectdb.temp.page-file=true

btc_es
Joined on 2014-10-20
User Post #74
#19
2015-09-18 11:30

Thank you for the test case. Build 2.6.3_05 includes some required fixes.

To pass your test you have to use the new build, and also tune the configuration as follows:

    <temp path=... threshold="1mb" />

The threshold indicates a maximum size for a memory-based list. Every flush is stored as a separate list. 10,000 entity objects (in your current test) consume about 2 MB. Therefore, if the threshold is higher, every flush will be stored in memory rather than in a temporary file. Unfortunately, this configuration is per list and not global. It may be better to use a larger threshold, e.g. 10mb, but also flush every 100,000 entity objects instead of every 10,000 entity objects.
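The numbers in this paragraph can be checked with a little arithmetic. A hedged Java sketch, assuming about 2 MB per 10,000 entity objects (roughly 209 bytes each, the figure for this particular test) and that a flush list larger than the threshold is written to the temporary file; the exact boundary comparison inside ObjectDB is not specified above, and the names are illustrative.

```java
// Back-of-the-envelope check using the numbers from the post.
// Assumption: one flush = one list; a list larger than the <temp threshold>
// goes to a temp file, a smaller one stays in memory.
public class FlushSpillEstimate {

    static final long KB = 1024, MB = 1024 * KB;
    static final long BYTES_PER_ENTITY = 2 * MB / 10_000; // ~209 bytes (estimate)

    static boolean spillsToTempFile(long entitiesPerFlush, long thresholdBytes) {
        return entitiesPerFlush * BYTES_PER_ENTITY > thresholdBytes;
    }

    public static void main(String[] args) {
        System.out.println(spillsToTempFile(10_000, 1 * MB));   // true: ~2 MB > 1 MB
        System.out.println(spillsToTempFile(10_000, 10 * MB));  // false: stays in memory
        System.out.println(spillsToTempFile(100_000, 10 * MB)); // true: ~20 MB > 10 MB
    }
}
```

Because the threshold applies per list, flushing larger batches is what lets a larger threshold still route each flush to the temporary file.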

    <cache ref="weak" level2="0" />

When the 2nd level cache is enabled, ObjectDB manages an in-memory list of all the flushed entity IDs, in order to avoid retrieving obsolete content from the L2 cache. In your test the PKs are large (32 characters, and about 160 bytes of overhead in total per entity object), so millions of objects will require a larger heap.
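A rough estimate of that in-memory ID list, using the ~160 bytes per entity object quoted above. The entity counts are illustrative, and the sketch ignores all other heap usage.

```java
// Rough heap estimate for the flushed-ID list kept while the L2 cache is
// enabled, using ~160 bytes per entity object (figure from the post).
public class FlushedIdOverhead {

    static final long BYTES_PER_ID = 160; // 32-character PKs, per the post

    static long overheadBytes(long flushedEntities) {
        return flushedEntities * BYTES_PER_ID;
    }

    public static void main(String[] args) {
        long mib = 1024 * 1024;
        for (long n : new long[] {100_000, 1_000_000, 3_000_000}) {
            System.out.println(n + " entities -> ~" + overheadBytes(n) / mib + " MiB");
        }
    }
}
```

For 3,000,000 entity objects this comes to 480,000,000 bytes, most of a 512 MB heap, which is consistent with the advice to set `level2="0"` for this test.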

As discussed above, this implementation is a temporary solution, and should be used only if there are no other options. It is also slow. A much better solution (faster and with unlimited transaction size) is now planned for ObjectDB 3.0, but will not be released before 2017.

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,310
#20
2015-11-12 08:43

The configuration threshold="1mb" performs well.

However, in our use cases the temp file grows beyond 32 GB when we persist many entities.

Is it really necessary for the temp file to grow larger than the database file?

btc_es
Joined on 2014-10-20
User Post #81
#21
2015-11-12 13:35

Yes, the temporary file can be larger than the database, since the same database page may be stored in the file multiple times representing the state of that page in different transactions.

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,339
#22
2015-11-12 13:44

If the database stores pages multiple times, wouldn't it make more sense to overwrite the old pages, so that the temp file does not grow so much?

This is a critical issue for us.

btc_es
Joined on 2014-10-20
User Post #82
#23
2015-11-12 14:34

Space in the temporary file is eventually reused, of course, but multiple versions still have to be kept for concurrent transactions in order to support MVCC (multiversion concurrency control).
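The idea behind keeping several versions of a page can be illustrated with a toy multi-version page store. This is a hypothetical Java sketch, not ObjectDB's implementation: every write adds a new version, a reader with an older snapshot still sees the version it started with, and space is reclaimed only for versions that no active snapshot can reach. Until then, the store (like the temporary file) holds more data than the current state of the pages.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy multi-version page store: a hypothetical illustration, NOT ObjectDB's
// algorithm. Each write keeps the older versions so that transactions that
// started earlier still see a consistent snapshot.
public class PageVersionStore {

    // pageId -> (commit version -> page content)
    private final Map<Long, TreeMap<Long, byte[]>> pages = new TreeMap<>();
    private long currentVersion = 0;

    // Commit a new state of a page and return its version number.
    long write(long pageId, byte[] content) {
        currentVersion++;
        pages.computeIfAbsent(pageId, k -> new TreeMap<>()).put(currentVersion, content);
        return currentVersion;
    }

    // A reader with the given snapshot sees the newest version <= snapshot.
    byte[] read(long pageId, long snapshot) {
        TreeMap<Long, byte[]> versions = pages.get(pageId);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, byte[]> e = versions.floorEntry(snapshot);
        return e == null ? null : e.getValue();
    }

    // Reclaim space: for each page keep the newest version visible to the
    // oldest active snapshot plus all newer versions; drop the rest.
    void reclaim(long oldestActiveSnapshot) {
        for (TreeMap<Long, byte[]> versions : pages.values()) {
            Long keep = versions.floorKey(oldestActiveSnapshot);
            if (keep != null) {
                versions.headMap(keep, false).clear();
            }
        }
    }

    // Total number of stored page versions (a stand-in for temp file size).
    int storedVersions() {
        int n = 0;
        for (TreeMap<Long, byte[]> versions : pages.values()) {
            n += versions.size();
        }
        return n;
    }

    public static void main(String[] args) {
        PageVersionStore store = new PageVersionStore();
        long oldSnapshot = store.write(1, new byte[] {1}); // version 1 of page 1
        store.write(1, new byte[] {2});                    // version 2 of page 1
        store.write(1, new byte[] {3});                    // version 3 of page 1
        // One page, but three stored versions while an old reader may exist.
        System.out.println("versions: " + store.storedVersions());
        System.out.println("old reader sees: " + store.read(1, oldSnapshot)[0]);
        // Once no transaction is older than version 3, older copies are dropped.
        store.reclaim(3);
        System.out.println("versions after reclaim: " + store.storedVersions());
    }
}
```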

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,342
#24
2015-11-17 07:23

We understand that the temporary file can be larger than the database file if the same database pages are accessed in different transactions.
But we don't do that.

We adapted our earlier example so that it works without the temporary file with less than 2 GB of memory.
But if we enable the temporary file, the file grows to 8 GB.

We can't understand why the temporary file grows beyond 2 GB, which should be sufficient, since in the first case all data can be kept in memory.

(To enable the temporary file, uncomment the System.setProperty("objectdb.temp.page-file", "true") line.)

import java.util.*;
import javax.persistence.*;

public final class OutOfMemoryWhenPersistAndFlushBigDataInSingleTransaction {

    public static void main(String[] args) {
        System.out.println("start");
        //System.setProperty("objectdb.temp.page-file", "true");
        //System.setProperty("objectdb.temp.avoid-page-recycle", "true");
        EntityManagerFactory emf = Persistence
            .createEntityManagerFactory("objectdb:$objectdb/db/test.tmp;drop");
        EntityManager em = emf.createEntityManager();

        // Persist 3,000,000 entities in a single transaction,
        // flushing and clearing the persistence context periodically:
        em.getTransaction().begin();
        for (int i = 0; i < 3000000; i++) {
            Message msg = new MessageImpl();
            msg.setTypeID(UUID.randomUUID().toString()
                + "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890");
            em.persist(msg);

            // Every 100,000 iterations (including i == 0):
            if (i % 100000 == 0) {
                em.flush(); // write pending changes
                em.clear(); // detach all managed objects to release the first-level cache
                System.out.println("flush: " + i);
            }
        }
        em.getTransaction().commit();

        em.close();
        emf.close();
    }

    @Entity
    public static class MessageImpl extends ModelElementImpl implements Message {

        @Basic
        private String typeID;

        @Basic
        @Temporal(TemporalType.TIMESTAMP)
        private Date date;

        /**
         * Construct a new message and set its date to this time of instantiation.
         */
        public MessageImpl() {
            super();
            this.date = new Date();
        }

        @Override
        public void setTypeID(String typeID) {
            this.typeID = typeID;
        }

        @Override
        public String getTypeID() {
            return this.typeID;
        }
    }

    @Entity
    public static abstract class ModelElementImpl implements ModelElement {

        // Primary key: a random UUID without dashes (32 upper-case hex characters)
        @Id
        private String uid;

        protected ModelElementImpl() {
            this.uid = UUID.randomUUID().toString().toUpperCase().replaceAll("-", "");
        }

        @Override
        public int hashCode() {
            if (uid == null) {
                return super.hashCode();
            }
            return uid.hashCode();
        }

        @Override
        public boolean equals(Object obj) {
            if (obj instanceof ModelElement && this.getUid() != null) {
                return this.getUid().equals(((ModelElement) obj).getUid());
            }
            return super.equals(obj);
        }

        @Override
        public final String getUid() {
            return this.uid;
        }
    }

    public static interface Message extends ModelElement {

        public String getTypeID();

        public void setTypeID(String typeID);
    }

    public static abstract interface ModelElement {

        public String getUid();

        @Override
        public boolean equals(Object obj);

        @Override
        public int hashCode();
    }

}

Configuration:

<objectdb>
        <general>
                <temp path="$temp" threshold="1mb" />
                <network inactivity-timeout="0" />
                <url-history size="50" user="true" password="true" />
                <log path="$objectdb/log/" max="8mb" stdout="true" stderr="false" />
                <log-archive path="$objectdb/log/archive/" retain="90" />
                <logger name="*" level="fatal" />
        </general>
        <database>
                <size initial="256kb" resize="256kb" page="2kb" />
                <recovery enabled="true" sync="false" path="." max="128mb" />
                <recording enabled="false" sync="false" path="." mode="write" />
                <locking version-check="true" />
                <processing cache="64mb" max-threads="10" />
                <query-cache results="32mb" programs="500" />
                <extensions drop="temp,tmp" />
                <activation code="***removed***" />
        </database>
        <entities>
                <enhancement agent="false" reflection="ignore" />
                <cache ref="weak" level2="0" />
                <fetch hollow="false" />
                <persist serialization="true" />
                <cascade-persist always="auto" on-persist="false" on-commit="true" />
                <dirty-tracking arrays="true" />
        </entities>
</objectdb>

Can you explain the difference in memory and file-size usage between the in-memory case (< 2 GB) and the temporary-file case (8 GB)?

Note: we currently use version 2.6.3.b07

btc_es
Joined on 2014-10-20
User Post #84
#25
2015-11-18 12:24

You are right, there was a bug in the implementation of this new feature.

Please try build 2.6.4_04.

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,345
#26
2015-11-19 07:16

We have a further observation.

If clear() is not called after flush(), the first-level cache is not cleared automatically when the memory limit is reached, and an OutOfMemoryError occurs. I thought that entities that are no longer dirty after flush() are held by weak references and can be removed by the GC. Can you explain this case?

btc_es
Joined on 2014-10-20
User Post #87
#27
2015-11-19 09:34

ObjectDB manages about 20 different object states, and for each state there are settings that include an indication of whether objects in that state should be referenced by ObjectDB using a strong reference or a weak reference.

A check now shows that objects that are retrieved from the database, modified and then flushed are set to have weak references, as expected. However, new flushed objects are set to have strong references. If there was a reason for this setting, it is no longer clear, so following your request this has been changed to weak references in build 2.6.4_05.

Please check the new build and report any unexpected issues in handling flushed objects.
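The per-state strong/weak choice described in this post can be sketched as a cache entry that holds either a normal reference or a `java.lang.ref.WeakReference`. This is a hypothetical illustration: the state names are invented and only loosely mirror the states mentioned above. A weakly held entity becomes eligible for garbage collection once the application drops its own references; `System.gc()` is only a hint, so the demo does not rely on the object actually being collected.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, NOT ObjectDB code: each object state decides whether the
// persistence context holds its objects strongly or weakly.
public class ReferencePolicyDemo {

    // Invented subset of states; 'strong' marks states that must stay reachable.
    enum State {
        NEW_UNFLUSHED(true),      // pending changes exist only in memory
        NEW_FLUSHED(false),       // already flushed: the GC may reclaim it
        RETRIEVED_FLUSHED(false); // retrieved, modified and flushed

        final boolean strong;

        State(boolean strong) {
            this.strong = strong;
        }
    }

    // Holds an entity either directly (strong) or through a WeakReference.
    static final class Entry {
        private final Object strongRef;
        private final WeakReference<Object> weakRef;

        Entry(Object entity, State state) {
            Objects.requireNonNull(entity, "entity");
            this.strongRef = state.strong ? entity : null;
            this.weakRef = state.strong ? null : new WeakReference<>(entity);
        }

        // Returns the entity, or null if a weakly held entity was collected.
        Object get() {
            return strongRef != null ? strongRef : weakRef.get();
        }

        boolean isWeak() {
            return weakRef != null;
        }
    }

    private final Map<Object, Entry> cache = new HashMap<>();

    void put(Object id, Object entity, State state) {
        cache.put(id, new Entry(entity, state));
    }

    Entry entry(Object id) {
        return cache.get(id);
    }

    public static void main(String[] args) {
        ReferencePolicyDemo ctx = new ReferencePolicyDemo();
        ctx.put(1, new Object(), State.NEW_UNFLUSHED);
        ctx.put(2, new Object(), State.NEW_FLUSHED);
        System.gc(); // only a hint: entry 2 MAY now be cleared, entry 1 never is
        System.out.println("entry 1 still held: " + (ctx.entry(1).get() != null));
        System.out.println("entry 2 is weak: " + ctx.entry(2).isWeak());
    }
}
```

A real cache would also purge entries whose weak reference has been cleared, for example by registering the references with a `ReferenceQueue`.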

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,346
#28
2016-02-11 14:43

Build 2.6.6_04 adds support for temporary files in large update queries.

As before, this must be enabled by setting the system property (on the server side in client-server mode), as described in #10 above.

ObjectDB Support
support
Joined on 2010-05-03
User Post #2,420
