Selective merge/cascade of detached entity

#1

I'm having an issue with my application, and I'm hoping you guys will be able to help. Please forgive typos and obvious errors; I'm having to retype everything from a network that isn't connected to the internet.

A contrived example is:

@Entity
public class Salesman
{
  private String name;

  // Each salesman keeps his own set of numerical targets per company
  @OneToMany(fetch=FetchType.EAGER, cascade=CascadeType.ALL)
  private Map<Company, Collection<Integer>> companyTargets;
}

@Entity
public class Company
{
  private String name;
}

For the sake of the example, each salesman has many numerical targets for each company. Two salesmen will have different sets of numerical targets for the same company.

The application is an unusual one, in that it keeps all objects in memory at all times, in a detached state (I realise the implications here; there are good reasons for this approach). All potential changes are checked by application logic before being committed to the database, so a commit will never fail. Any time an object is modified, the modification is first checked for validity and then committed.
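
To make the shape of this concrete, every write in the application goes through something like the following (a sketch only; the names are made up for illustration and are not my real API):

// Sketch only -- the helper names are invented for this post.
void setSalesmanName(Salesman s, String newName)
{
  synchronized (s)  // detached objects are shared, so access is synchronized
  {
    if (newName == null || newName.isEmpty())  // validate first, so the commit cannot fail
      throw new IllegalArgumentException("invalid name");
    s.setName(newName);    // mutate the detached, in-memory object
    commitToDatabase(s);   // re-attach (merge) and commit -- the expensive step
  }
}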

Let's assume I have many Salesmen and add a new Company. Each salesman is given many numerical targets for the new company. I then need to merge each salesman back into the database to keep it in step. The code I have to do this is akin to:

EntityManager em = emf.createEntityManager();
em.getTransaction().begin();
Company company = new Company("new");
em.persist(company);
em.getTransaction().commit();

giveSalesmenNewTargets(salesmen, company);

em.getTransaction().begin();
for (Salesman s : salesmen)
  em.merge(s);
em.getTransaction().commit();

This appears to check and merge the entire companyTargets map for every salesman. Because there are thousands of Company objects attached to each Salesman, and each Salesman has hundreds of targets per Company, the merge takes a significant amount of time.

What I would ideally like is some way to re-attach a detached salesman with a kind of 'I promise I haven't changed anything' flag, then perform the modification, so that only changes made after the re-attach are written to the database. Is there some way to achieve this? It might look something like:

EntityManager em = emf.createEntityManager();
em.getTransaction().begin();
Company company = new Company("new");
em.persist(company);
em.getTransaction().commit();

em.getTransaction().begin();
for (Salesman s : salesmen)
  em.uncheckedMerge(s); // imaginary method: re-attach without checking for changes
giveSalesmenNewTargets(salesmen, company);
em.getTransaction().commit();

Thanks very much,

Phil

#2

Do you use enhanced classes?

Merging non-dirty objects is expected to be more efficient when classes are enhanced.
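
If build-time enhancement is inconvenient, the Enhancer can also be invoked from Java code at startup, before any entity class is loaded. A minimal sketch (the package pattern is only an example; see the enhancer documentation for the supported argument syntax):

// Enhance all entity classes under a package at runtime.
// "com.example.model.*" is a placeholder -- substitute your own packages.
com.objectdb.Enhancer.enhance("com.example.model.*");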

ObjectDB Support

#3

Hi,

Thanks for the reply. I've introduced the -javaagent option as documented at https://www.objectdb.com/java/jpa/tool/enhancer#Load_Time_Java_Agent_Enhancement_

This improved performance by 5-30%, but it is still significantly slower than I'd hoped. I added printlns at various stages of the above approach, with 100,000 Salesmen and a single target being added to each salesman for the same (pre-persisted) company. The following code was run:

EntityManagerFactory emf = Persistence.createEntityManagerFactory("objectdb:/var/tmp/points.tmp;drop");
EntityManager e = emf.createEntityManager();
e.getTransaction().begin();
Company company = new Company("testCompany");
e.persist(company);
e.getTransaction().commit();
HashSet<Salesman> salesmen = new HashSet<Salesman>();
e.getTransaction().begin();
for (int i = 0; i < 100000; i++)
{
  Salesman s = new Salesman(i+"");
  s.getCompanyTargets().put(company, new HashSet<Integer>());
  e.persist(s);
  salesmen.add(s);
}
e.getTransaction().commit();
e.close();
for (int i = 0; i < 10 ; i++)
{
  EntityManager em = emf.createEntityManager();
  long time = System.currentTimeMillis();
  em.getTransaction().begin();
  System.out.println("Before Merge:\t"+(System.currentTimeMillis()-time));
  for (Salesman s : salesmen)
    em.merge(s);
  System.out.println("After Merge:\t"+(System.currentTimeMillis()-time));
  for (Salesman s : salesmen)
    s.getCompanyTargets().get(company).add(i);
  System.out.println("Before commit:\t"+(System.currentTimeMillis()-time));
  em.getTransaction().commit();
  System.out.println("After commit:\t"+(System.currentTimeMillis()-time));
  em.close();
  System.out.println("After close:\t"+(System.currentTimeMillis()-time));
}

To produce output such as:

Before Merge: 0
After Merge:  2383
Before Commit: 2462
After Commit: 2470
Before Merge: 0
After Merge:  2029
Before Commit: 2068
After Commit: 2931
Before Merge: 0
After Merge:  1862
Before Commit: 1901
After Commit: 2534
Before Merge: 0
After Merge:  1664
Before Commit: 1704
After Commit: 2322
...

As you can see, the merge operations are taking the vast majority of the time, albeit not a consistent amount (most likely due to the VM I'm on). These readings were taken with -javaagent:${workspace_loc:Tutorial}/lib/objectdb.jar in the Eclipse 'VM Arguments' section.

Is there a better way I can achieve this sort of operation?

Thanks,

Phil

#4

In fact, a more useful snippet to focus on would be the one where I insert a new company each time, such as:

EntityManagerFactory emf = Persistence.createEntityManagerFactory("objectdb:/var/tmp/points.tmp;drop");
EntityManager e = emf.createEntityManager();
HashSet<Salesman> salesmen = new HashSet<Salesman>();
e.getTransaction().begin();
for (int i = 0; i < 100000; i++)
{
  Salesman s = new Salesman(i+"");
  e.persist(s);
  salesmen.add(s);
}
e.getTransaction().commit();
e.close();
for (int i = 0; i < 10 ; i++)
{
  EntityManager em = emf.createEntityManager();
  em.getTransaction().begin();
  Company company = new Company("testCompany");
  em.persist(company);
  em.getTransaction().commit();
  long time = System.currentTimeMillis();
  em.getTransaction().begin();
  System.out.println("Before Merge:\t"+(System.currentTimeMillis()-time));
  for (Salesman s : salesmen)
    em.merge(s);
  System.out.println("After Merge:\t"+(System.currentTimeMillis()-time));
  for (Salesman s : salesmen)
  {
    Collection<Integer> tgts = new HashSet<Integer>();
    tgts.add(i);
    s.getCompanyTargets().put(company, tgts);
  }
  System.out.println("Before commit:\t"+(System.currentTimeMillis()-time));
  em.getTransaction().commit();
  System.out.println("After commit:\t"+(System.currentTimeMillis()-time));
  em.close();
  System.out.println("After close:\t"+(System.currentTimeMillis()-time));
}

Which produces timings like:

Before Merge: 0
After Merge:  2528
Before Commit: 2601
After Commit: 2602
Before Merge: 0
After Merge:  2410
Before Commit: 2499
After Commit: 4463
...
Before Merge: 0
After Merge:  5329
Before Commit: 5355
After Commit: 6958
Before Merge: 0
After Merge:  7679
Before Commit: 7710
After Commit: 9695

Are you able to assist here? For this example I changed companyTargets' cascade type to {} (i.e. none), since I thought cascading might explain the increasing commit/merge times, but to no avail.

Thanks,

Phil

#5

Could you please provide a full runnable test, so that we can investigate it?

ObjectDB Support

#6

Thanks for the interest.

I've uploaded the source for the example; it just needs compiling. Main is the class to run, and I've included a sample run as a text file. The VM I ran it on is extremely limited, so I was forced to work with reasonably low numbers of objects. If this completes too quickly on your machines, try altering the main function to be:

public static void main(String[] args)
{
  HashSet<Salesman> salesmen = createSalesmen(100);
  addCompaniesToSalesmen(salesmen, 1000, 100);
  Salesman emptied = redistributeTargets(salesmen);
  System.out.println("Salesman " + emptied + " now has targets:" + emptied.getCompanyTargets());
}

The performance I'm looking for is roughly constant time to add a new company (i.e. one outer-loop iteration in the addCompaniesToSalesmen function). I've also included the second bottleneck use case, in the function redistributeTargets; this should complete as quickly as possible. It is ultimately the speed of these two use cases that will decide whether ObjectDB is worth the investment for my company/project.

Let me know if you need anything else.

Thanks very much for looking in to this with me,

Phil

#7

Thank you for the test program. It clearly demonstrates behaviour that is too slow.

The problem is with the cascading merge operation, which becomes too heavy as the database grows. Notice that on each operation you merge all the salesmen with all their content (i.e. effectively the entire database), which is inefficient.

Replacing the merge with find solves the problem:

  s = em.find(Salesman.class, s.getId());
  // em.merge(s);
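
In the context of the loop from your #4 snippet, this would look roughly as follows (a sketch, assuming Salesman exposes its primary key via getId(), as the find call above implies):

em.getTransaction().begin();
for (Salesman s : salesmen)
{
  // Load a managed instance instead of merging the whole detached graph;
  // changes made from this point on are tracked by the EntityManager.
  Salesman managed = em.find(Salesman.class, s.getId());
  Collection<Integer> tgts = new HashSet<Integer>();
  tgts.add(i);
  managed.getCompanyTargets().put(company, tgts);
}
em.getTransaction().commit();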

The last output lines in that case are:

Company Before Merge: 0
Company After Merge: 4
Company Before commit: 6
Company After commit: 6
Company After close: 6
Company Total time: 794
Redist Before Merge: 23
Redist After Merge: 31
Redist Before Commit: 31
Redist Total Time:  31
Salesman 92 now has targets:{}

I will try to use your test to check whether the merge operation can be improved, but the preferred solution, if possible, is to avoid merge.

By the way, at the end of the test you should add:

  emf.close();

ObjectDB Support
#8

Hi,

Thanks very much for your reply; your alternative approach really does improve things! My only remaining concern is that the reason the detached objects are kept in memory is that they are shared between multiple threads (all using 'synchronized' operations for field access). Could this 'find' method ever return a different object (i.e. original != found but original.equals(found)), thus breaking the sharing?

Otherwise fantastic improvement, thanks very much!

Phil

#9

Each EntityManager manages its own view of the database using its own isolated entity objects. To avoid multiple in-memory objects for the same database object, you may try sharing one EntityManager between all the threads.

I am not sure that the solution of shared detached objects, as is, will work. Please notice that merge doesn't use the detached object itself; it merely synchronizes a managed object in the EntityManager (which is not shared) with the up-to-date content of the detached object. Accordingly, it is common to use the return value of merge, which is the managed object:

    s = em.merge(s);

So you actually have to replace the detached object after the update. Using find and switching to a new detached object after the update has the same effect, but works much faster.

I would try, however, simply sharing the EntityManager and avoiding detached objects and merging altogether.
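
A minimal sketch of that direction (assuming all threads funnel their work through one shared EntityManager; since an EntityManager is not thread safe, access to it must be serialized):

// Sketch: one EntityManager shared by all threads; each unit of work locks it.
// addTarget is illustrative, not an existing API.
void addTarget(EntityManager em, long salesmanId, Company company, int target)
{
  synchronized (em)
  {
    em.getTransaction().begin();
    Salesman s = em.find(Salesman.class, salesmanId); // managed object, no merge needed
    s.getCompanyTargets().get(company).add(target);   // tracked automatically
    em.getTransaction().commit();
  }
}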

ObjectDB Support

#10

The support from you is absolutely fantastic, thanks! I'll look at using the same EntityManager for all threads, but this will likely introduce locking/synchronization issues, so I'm not sure it's a good solution without a significant rework of the existing code, which I was hoping to avoid. If you can find a way to offer a merge-type method that only watches for future changes, that might go a long way towards solving my issue.

Thanks very much for your support!

Phil

#11

Actually, you can consider find as a version of merge that ignores detached changes.

When using merge, the detached entity argument is used only for integrating changes. As noted above, the detached object is not used by the EntityManager to represent the object afterwards; a different managed object is retrieved (and returned by merge), and any further changes are reflected in that object rather than in the detached one.
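
Concretely, continuing the snippets above (a sketch):

s = em.find(Salesman.class, s.getId());   // any unsaved changes in the detached s are ignored
s.getCompanyTargets().put(company, tgts); // made after find, so tracked and committed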

ObjectDB Support
