Iterating over large result set

#1

Hi,

Few days ago I was writing some code to iterate over quite a large dataset. Fitting those data in memory was quite a challenge even with 3GB heap space. So I did paging using two loops.

for (int page=0; page < resultCount; page+=pageSize) {
    Query q = em.createQuery("query");
    for (Tuple t : q.setFirstResult(page).setMaxResults(pageSize).getResultList()) {
        // work with fetched data
    }
    em.clear();
}

I'll be using this with data streaming with REST and Jersey. So, my question is - is there a better way to do such thing with ODB? In PostgreSQL for example I can use cursors. With some JDBC drivers there is a hint, that driver should fetch data in packages of given size until whole result set is iterated. 

What I'm thinking about is some kind of streaming of result set from database without building whole object graph representing fetched data in application memory.

#2

You may be able to replace the loop of paging queries with one query by running the query with a hint "objectdb.result-fetch" set to "LAZY".

Another option is to use a report query, i.e. a query that instead of entity objects returns the simple values that you need.

Finally, the paging loop could be more efficient in some cases if instead of specifying first/max results you will add a constraint in each iteration based on the ID (primary key) of the last retrieved entity object.

ObjectDB Support
#3

You may be able to replace the loop of paging queries with one query by running the query with a hint "objectdb.result-fetch" set to "LAZY".

That was first thing I was trying to do, but because objects are quite complicated and big with many @OneToMany relations, there was huge number of additional queries on database to fetch all those objects. Performance was really terrible.

 

Another option is to use a report query, i.e. a query that instead of entity objects returns the simple values that you need.

In some cases I can use this option, but with more than 10 fields in object and additional mappings from relations this is getter / setter nightmare. I use Dozer to map entity objects to DTO and then converting DTO to json / xml using Jersey. This way I can be sure that there is no infinite loops in objects graph as JAXB and other converters don't understand JPA relations.

 

Finally,  the paging loop could be more efficient in some cases if instead of specifying first/max results you will add a constraint in each iteration based on the ID (primary key) of the last retrieved entity object.

This could work, as I don't need pages with even number of objects. The main goal here is to iterate over all records from database. Thanks! This is the next thing to test.

Reply