Inserting entities with String keys and indexes needs more and more RAM compared to primitive integer keys and indexes

#1

Hello,

I have a problem with inserting many entities.

We use strings as primary keys and indexes. But as the database table of the entity type fills with more and more data, inserting further entities also needs more and more RAM, which leads to out-of-memory exceptions.

If we use primitive integers as keys and indexes instead, the memory consumption for new entities is constant.

Can you explain this, and do you have an idea for the case where strings are used for keys and indexes?

See both examples.

#2

Correction: the case with strings as primary keys and indexes only needs more RAM, not more and more RAM. If the maximum heap size is large enough, the solution works without an out-of-memory exception.

#3

These 2 sample programs demonstrate well that automatic long primary keys are indeed much more efficient than String UID primary keys.

The main reasons for this (ordered from the most important):

  1. Automatic @GeneratedValue primary keys are sequential. So when you persist multiple new objects, their primary keys are sequential and therefore they are stored in sequential pages in the database. New objects with UID primary keys may have to be stored spread across the entire database file, so a similar transaction may require searching for positions in the database and processing many more database pages. This difference becomes larger as the total number of objects of that type in the database increases.
  2. Less space is wasted in the database when storing objects sequentially, because every page can be filled to the maximum with new objects.
  3. Long values consume less space than String UIDs and are processed faster.
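
To illustrate reason 1, here is a minimal simulation sketch. It is not ObjectDB code and does not reflect its actual page layout: it simply assumes objects are stored in primary key order, a fixed number of objects per page (the value 50 is made up), and that UID-like keys fall at uniformly random positions among the existing pages. All class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class PageTouchSketch {

    // Sequential keys are larger than all existing keys, so the new objects
    // are appended and only fill new pages, regardless of the database size.
    static int pagesTouchedSequential(int newObjects, int objectsPerPage) {
        return (newObjects + objectsPerPage - 1) / objectsPerPage; // ceiling division
    }

    // Random (UID-like) keys land at arbitrary positions in the existing key
    // order, so each new object may touch a different existing page.
    static int pagesTouchedRandom(int existingObjects, int newObjects,
                                  int objectsPerPage, long seed) {
        int existingPages = Math.max(1,
            (existingObjects + objectsPerPage - 1) / objectsPerPage);
        Random random = new Random(seed);
        Set<Integer> touched = new HashSet<>();
        for (int i = 0; i < newObjects; i++) {
            touched.add(random.nextInt(existingPages)); // page that receives this key
        }
        return touched.size();
    }

    public static void main(String[] args) {
        int perPage = 50;    // assumed page capacity, for illustration only
        int batch = 10_000;  // objects persisted in one transaction
        for (int existing : new int[] {100_000, 1_000_000, 10_000_000}) {
            System.out.printf("existing=%d sequential=%d random=%d%n",
                existing,
                pagesTouchedSequential(batch, perPage),
                pagesTouchedRandom(existing, batch, perPage, 42L));
        }
    }
}
```

In this model the sequential case touches the same number of pages for any database size, while the random case touches more pages as the number of existing objects grows, which matches the trend described above.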
ObjectDB Support
#4

Thank you for the clarification.

The example with long keys needs roughly 750 MB of RAM.

The example with string keys needs roughly 1.5 GB of RAM.

Do you know of any way to reduce the memory consumption, for example with other configuration settings, especially for the example with strings?

#5

> Do you know of any way to reduce the memory consumption?

  • If possible, reduce the size of the transactions.
  • Use enhanced classes.
  • Reduce the cache size. You can disable the L2 data cache by setting its size to 0.
  • Use smaller database pages, so that updates that are spread over the entire database and require changes to many existing pages (due to using a UID primary key) will take less space.
  • Use flush during large transactions with the temp file enabled (see this thread).
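
The first suggestion (smaller transactions) could look roughly like the sketch below. To keep it self-contained, it uses a hypothetical `Store` interface as a stand-in; in a JPA application the four calls would presumably map to `em.getTransaction().begin()`, `em.persist(entity)`, `em.getTransaction().commit()` and `em.clear()`. The class name, method name and batch size are assumptions for illustration.

```java
import java.util.List;

public class BatchInsertSketch {

    /** Hypothetical stand-in for the few EntityManager / EntityTransaction calls used here. */
    interface Store {
        void begin();
        void persist(Object entity);
        void commit();
        void clear();
    }

    /**
     * Persists all entities, committing after every batchSize objects so that
     * no single transaction grows too large. Returns the number of commits.
     */
    static int insertInBatches(Store store, List<?> entities, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        int inBatch = 0;
        for (Object entity : entities) {
            if (inBatch == 0) {
                store.begin();      // start a new, small transaction
            }
            store.persist(entity);
            if (++inBatch == batchSize) {
                store.commit();     // write this batch to the database
                store.clear();      // release the managed objects
                commits++;
                inBatch = 0;
            }
        }
        if (inBatch > 0) {          // commit the last, partially filled batch
            store.commit();
            store.clear();
            commits++;
        }
        return commits;
    }
}
```

Note that splitting the work this way gives up the atomicity of one big transaction, so it only fits if a partially completed import is acceptable.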

1.5 GB does indeed seem too much for transactions of 500,000 relatively small objects. Have you checked that all that memory is really in use? Take a heap snapshot and check the total size of the live objects. The JVM uses the available heap size to delay GC. What happens when you run the program with a smaller max heap size?
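
As a quick check short of a full heap snapshot, something like the following sketch can log the used heap from inside the program. It relies only on the standard `java.lang.management` API; the class and method names are hypothetical, and `System.gc()` is only a hint to the JVM, so the number is an approximation of the live objects rather than an exact figure.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapUsageSketch {

    // Asks the JVM to collect garbage (a hint only) and then reads the used heap,
    // which approximates the size of the objects that are still reachable.
    static long usedHeapAfterGcHint() {
        System.gc();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long used = usedHeapAfterGcHint();
        long max = Runtime.getRuntime().maxMemory(); // reflects the -Xmx setting
        System.out.printf("used heap after GC hint: %d MB (max heap: %d MB)%n",
            used / mb, max / mb);
    }
}
```

Calling `usedHeapAfterGcHint()` before and after the large transaction, and running once more with a smaller `-Xmx` value, should show whether the memory is really occupied or just not yet collected.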

 

ObjectDB Support
#6

If I reduce the RAM to 1 GB, the example still runs through, and 25% of the RAM is always free.

But I still have another question: do @Basic long fields with @Index, instead of String fields, have the same advantages as long primary keys?

#7

> Do @Basic long fields with @Index, instead of String fields, have the same advantages as long primary keys?

The exact question is unclear. Adding an additional @Index long field will not replace or help with the primary key (@Id) if it is inefficient. Please explain which options you have to choose from and what you are trying to do.

ObjectDB Support
#8

You mentioned that long primary keys are more efficient than String UID primary keys.

  1. Automatic @GeneratedValue primary keys are sequential. So when you persist multiple new objects, their primary keys are sequential and therefore they are stored in sequential pages in the database. New objects with UID primary keys may have to be stored spread across the entire database file, so a similar transaction may require searching for positions in the database and processing many more database pages. This difference becomes larger as the total number of objects of that type in the database increases.
  2. Less space is wasted in the database when storing objects sequentially, because every page can be filled to the maximum with new objects.
  3. Long values consume less space than String UIDs and are processed faster.

Do indexed fields of type long have the same advantages over indexed String fields as long primary keys have over String primary keys?

@Index
@Basic
long step;

#9

To some extent, yes, but it depends. For example, are the long values sequential or not?

Anyway, for the primary key it is much more critical, because the entire content of the objects is organised and stored in that order, whereas indexes are usually smaller.

Back to primary keys, here is another advantage of a long PK over a String UID PK that has not been mentioned yet:

  • All references to the objects (in reference fields in other objects) and all the indexes use the PK to reference objects. So a smaller PK (long) will make them smaller as well, i.e. fewer pages to process, store, etc.

Therefore, when possible, the PK should be sequential long values (which is also the default if you do not define any @Id). Note that you can even maintain a separate indexed UID String field if you really need it and still improve efficiency (except when retrieving objects by the UID, as retrieval by index is slightly slower than by PK).
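
To give a rough feeling for the point about references and indexes, the sketch below compares the raw key payload of a long with that of a UID string. It assumes the UID is a standard `java.util.UUID` string (36 ASCII characters), which the thread does not state, and it counts only the key bytes themselves, not ObjectDB's actual on-disk encoding or any per-entry overhead. The names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class KeyPayloadSketch {

    // Raw bytes needed to store the given number of long keys.
    static long longKeyBytes(long keyCount) {
        return keyCount * Long.BYTES; // 8 bytes per long
    }

    // Raw bytes needed to store the same number of UUID strings in UTF-8.
    static long uuidStringKeyBytes(long keyCount) {
        int bytesPerKey = UUID.randomUUID().toString()
            .getBytes(StandardCharsets.UTF_8).length; // 36 ASCII characters
        return keyCount * bytesPerKey;
    }

    public static void main(String[] args) {
        // Assumed workload: 500,000 objects, each key stored once in a
        // reference field and once in each of two indexes.
        long keyCopies = 500_000L * 3;
        System.out.printf("long keys: %d bytes, UUID string keys: %d bytes%n",
            longKeyBytes(keyCopies), uuidStringKeyBytes(keyCopies));
    }
}
```

Under these assumptions every stored copy of the key is 4.5 times larger with UUID strings, before any string length or encoding overhead is added.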

ObjectDB Support
