Memory consumption of empty Strings

#1

After doing some memory related research in my program, I found a possible place for an enhancement of objectdb:

My program has several entities which contain many String fields. Often some of these Strings are empty (""). To avoid unnecessary memory consumption, the String fields within the entities are initialized like:
String firstName = "";

In this case every "empty" attribute shares the same String instance. But after reloading the entities from ObjectDB, every empty String has become a new (unique) String object, which is a waste of memory (refer to http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf - page 26 e.g.). I assume ObjectDB is doing a "new String()" for every String object loaded from the database, while for an empty String a "".intern() would be more memory efficient.
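To illustrate the difference, here is a minimal sketch in plain Java. It does not use ObjectDB at all; the `new String("")` calls only stand in for what I assume happens when a String is loaded from the database. The class and method names are made up for this example.

```java
public class EmptyStringIdentity {

    // Returns true when both references point to the same String object.
    public static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // String literals are interned, so both fields share one object.
        String literalA = "";
        String literalB = "";

        // new String("") always allocates a distinct object. This stands in
        // for a String materialized on load (an assumption, not ObjectDB code).
        String loadedA = new String("");
        String loadedB = new String("");

        System.out.println(sameObject(literalA, literalB));         // true
        System.out.println(sameObject(loadedA, loadedB));           // false
        System.out.println(sameObject(loadedA.intern(), literalA)); // true
    }
}
```

With 200000 entities and 4 empty fields each, the second case means 800000 separate String objects instead of one shared instance.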

I wrote a little SSCCE to demonstrate this effect. The SSCCE contains 2 programs:

Program a (creating the entities):

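import java.util.ArrayList;

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Id;
import javax.persistence.Persistence;

public class CreateEntites {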

  public static void main(String[] args) {
    EntityManagerFactory entityManagerFactory = Persistence.createEntityManagerFactory("sscce.odb");
    EntityManager entityManager = entityManagerFactory.createEntityManager();

    ArrayList<MyEntity> entities = new ArrayList<MyEntity>();

    // create 200000 entries
    for (int i = 0; i < 200000; i++) {
      MyEntity entity = new MyEntity(i);
      entities.add(entity);

      entityManager.getTransaction().begin();
      entityManager.persist(entity);
      entityManager.getTransaction().commit();
      entityManager.clear();
    }

    entityManager.close();
    entityManagerFactory.close();

    System.out.println("let's create a heap dump");

    // used to have some time to create a heap dump
    try {
      Thread.sleep(15000);
    }
    catch (InterruptedException e) {
    }

    // touch the list again so the GC cannot collect it before the heap dump
    for (MyEntity entity : entities) {
    }
  }

  @Entity
  public static class MyEntity {
    @Id
    int    id;

    String firstName = "";
    String lastName  = "";
    String street    = "";
    String city      = "";

    public MyEntity(int id) {
      this.id = id;
    }
  }
}

When running this program I did a heap dump (with VisualVM) while Thread.sleep was executed and analyzed the memory consumption:

Class        Instances    Used memory (bytes)
MyEntity       200.000             10.400.000
char[]           7.154                428.532
String           7.089                226.848
...

 

Program b (loading these entities from the database):

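import java.util.List;

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Id;
import javax.persistence.Persistence;
import javax.persistence.TypedQuery;

public class ReadEntries {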

  public static void main(String[] args) {
    EntityManagerFactory entityManagerFactory = Persistence.createEntityManagerFactory("sscce.odb");
    EntityManager entityManager = entityManagerFactory.createEntityManager();

    List<MyEntity> entities;
    TypedQuery<MyEntity> query = entityManager.createQuery("SELECT myentity FROM MyEntity myentity", MyEntity.class);
    entities = query.getResultList();

    entityManager.close();
    entityManagerFactory.close();

    System.out.println("let's create a heap dump");

    // used to have some time to create a heap dump
    try {
      Thread.sleep(15000);
    }
    catch (InterruptedException e) {
    }

    // touch the list again so the GC cannot collect it before the heap dump
    for (MyEntity entity : entities) {
    }
  }

  @Entity
  public static class MyEntity {
    @Id
    int    id;

    String firstName = "";
    String lastName  = "";
    String street    = "";
    String city      = "";

    public MyEntity(int id) {
      this.id = id;
    }
  }
}

When running this program I did a heap dump (with VisualVM) while Thread.sleep was executed and analyzed the memory consumption:

Class        Instances    Used memory (bytes)
char[]         807.363             13.237.950
String         807.291             25.833.312
MyEntity       200.000             10.400.000
...

 

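As you can see, the memory consumption of this little program has grown from ~10MB to ~50MB. The roughly 800.000 additional String and char[] instances match the 200.000 entities with 4 empty String fields each.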

Possible solution: whenever an empty String is loaded from the database, ObjectDB should not create a new empty String. It should reuse the interned empty String ("".intern()) instead.
 

One further possible enhancement: implement a setting where the developer can specify which Strings should be loaded via String.intern(). Motivation for this enhancement: when loading a large list of persons, many of the first names are the same (here it is Martin, Michael, Thomas, ...). Instead of instantiating every first name as a new String, the user should be able to specify that this field should be loaded via String.intern() (with all its advantages and disadvantages).
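Until such a setting exists, something similar could probably be done on the application side. The following is only a sketch in plain Java: `Person`, `internNames()` and `distinctObjects()` are hypothetical names, and I have not checked how well this fits ObjectDB's loading path. In a real entity, `internNames()` might be called from a JPA `@PostLoad` lifecycle callback.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class InternOnLoadSketch {

    // Hypothetical stand-in for an entity; not part of the SSCCE above.
    public static class Person {
        public String firstName;

        public Person(String firstName) {
            this.firstName = firstName;
        }

        // In a JPA entity this method could carry @PostLoad so that it
        // runs after the entity is loaded (assumption, untested here).
        public void internNames() {
            if (firstName != null) {
                firstName = firstName.intern();
            }
        }
    }

    // Counts distinct String objects by identity (==), not by equals().
    public static int distinctObjects(List<Person> people) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (Person p : people) {
            seen.add(p.firstName);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<Person>();
        for (int i = 0; i < 1000; i++) {
            // new String(...) simulates a freshly loaded, unique String object.
            people.add(new Person(new String("Martin")));
        }
        System.out.println(distinctObjects(people)); // 1000

        for (Person p : people) {
            p.internNames();
        }
        System.out.println(distinctObjects(people)); // 1
    }
}
```

The usual trade-off applies: interning costs a lookup per String and only pays off for fields with many repeated values.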

Thanks for looking into this

Kind regards
Manuel Laggner

#2

Thank you for this analysis and for suggesting solutions.

Build 2.5.4 applies the empty string improvement.

The other enhancement was added as a feature request to the issue tracking system.

ObjectDB Support
#3

Thanks!

I just did a memory analysis with version 2.5.4 and it's looking much better now ;)
