Dirty checking

#1

Hi, I'm wondering if anyone knows the best way to detect which objects have been changed in the database.

That is, which objects, and which collections of objects, have been dirtied.

In the JDO API there is:

void postDirty(InstanceLifecycleEvent event)

http://www.objectdb.com/api/java/jdo/listener/DirtyLifecycleListener

I'm wondering what the equivalent is in JPA.

From what I can see, there are:

a) lifecycle listener classes, e.g.:

http://www.objectdb.com/java/jpa/persistence/event

and

b) some configuration settings to track changes to collections, e.g.:

http://www.objectdb.com/java/jpa/setting/entities
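A minimal sketch of option (a), kept to the standard library so it compiles without the JPA API: a listener class that records which entity changed and how. In a real application the class would be registered with @EntityListeners and its methods annotated @PostPersist, @PostUpdate and @PostRemove; the HasId interface is a hypothetical stand-in for however your entities expose their ids.

```java
import java.util.ArrayList;
import java.util.List;

// Kind of lifecycle event that was observed.
enum ChangeKind { PERSISTED, UPDATED, REMOVED }

// One recorded change: what happened, to which entity class, with which id.
record Change(ChangeKind kind, Class<?> type, Object id) { }

// Sketch of a JPA entity listener. In a real application the class would be
// registered with @EntityListeners(ChangeRecorder.class) and the three callback
// methods annotated @PostPersist, @PostUpdate and @PostRemove; the annotations
// are left out here so the sketch compiles without the JPA API.
class ChangeRecorder {
    // Hypothetical interface for reading an entity's id; real code would use the
    // entity's own getter (or PersistenceUnitUtil.getIdentifier).
    interface HasId { Object getId(); }

    // Static because the JPA provider creates listener instances itself; assumes
    // the single-threaded, single-EntityManager setup described in this thread.
    static final List<Change> CHANGES = new ArrayList<>();

    void postPersist(Object entity) { add(ChangeKind.PERSISTED, entity); }
    void postUpdate(Object entity)  { add(ChangeKind.UPDATED, entity); }
    void postRemove(Object entity)  { add(ChangeKind.REMOVED, entity); }

    private void add(ChangeKind kind, Object entity) {
        CHANGES.add(new Change(kind, entity.getClass(), ((HasId) entity).getId()));
    }
}
```

Note that such callbacks fire when changes are flushed, which can be before the transaction commits, so the recorded list on its own does not mean the changes were committed.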

 

Examples would be helpful.

 

#2

You can use JPA lifecycle event methods, as you mentioned.

You may also use JDO methods (even if your application uses JPA); for example, you can check whether an entity object is dirty using the static JDOHelper.isDirty method.
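JDOHelper.isDirty relies on ObjectDB's own change tracking. As a library-free illustration of what "dirty" means, here is a different technique, snapshot comparison: remember an entity's tracked field values and later compare them with the current values. The id and state extractor functions are assumptions of this sketch, not part of any ObjectDB API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Library-free dirty check by snapshot comparison. This is NOT how
// JDOHelper.isDirty works; it only illustrates what "dirty" means.
class SnapshotTracker<T> {
    private final Function<T, Object> idOf;          // reads an entity's id
    private final Function<T, List<Object>> stateOf;  // must return a fresh copy of tracked values
    private final Map<Object, List<Object>> snapshots = new HashMap<>();

    SnapshotTracker(Function<T, Object> idOf, Function<T, List<Object>> stateOf) {
        this.idOf = idOf;
        this.stateOf = stateOf;
    }

    // Remember the current state, e.g. right after the object is loaded or committed.
    void snapshot(T entity) {
        snapshots.put(idOf.apply(entity), stateOf.apply(entity));
    }

    // Dirty = never snapshotted, or some tracked value differs from the snapshot.
    boolean isDirty(T entity) {
        List<Object> before = snapshots.get(idOf.apply(entity));
        return before == null || !Objects.equals(before, stateOf.apply(entity));
    }
}
```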

ObjectDB Support
#3

> You may also use JDO methods (even if your application uses JPA); for example, you can check whether an entity object is dirty using the static JDOHelper.isDirty method.

Excellent, thanks!

Another question:

Ideally, I'd like to capture all objects that were changed once a commit completes, i.e. after the transaction has succeeded, so that I can update the views that observe those objects.

It seems that the JPA lifecycle callbacks are invoked separately for each object as it changes, in the middle of a transaction that might not succeed. So I can't calculate any views that depend on other objects, because those objects may not have changed yet in that transaction, and the entire transaction could still fail.

That is, after a commit I'd really like to know which objects changed in that commit.

In other words, a 'post-commit' listener.

I can guarantee single-threaded access to a single EntityManager in an embedded application, i.e. I can guarantee that no other transactions will execute until I've generated the read-only observable view queries, if that helps.
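One way to approximate a 'post-commit' listener under the single-threaded assumption above is to buffer the changes reported by lifecycle callbacks and release them only once commit() has returned normally. This is a minimal standard-library sketch; the class and method names are hypothetical, and the caller is responsible for invoking afterCommit or afterRollback around its EntityTransaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Identifies a changed entity by its class and id.
record EntityRef(Class<?> type, Object id) { }

// Buffers changes reported while a transaction runs and publishes them only
// after a successful commit. Assumes one transaction at a time.
class PostCommitBuffer {
    private final Set<EntityRef> pending = new LinkedHashSet<>(); // repeated updates collapse
    private final List<Consumer<Set<EntityRef>>> listeners = new ArrayList<>();

    void addListener(Consumer<Set<EntityRef>> listener) { listeners.add(listener); }

    // Called from lifecycle callbacks such as @PostPersist/@PostUpdate/@PostRemove.
    void changed(Class<?> type, Object id) { pending.add(new EntityRef(type, id)); }

    // Call only after EntityTransaction.commit() has returned normally.
    void afterCommit() {
        Set<EntityRef> committed = Collections.unmodifiableSet(new LinkedHashSet<>(pending));
        pending.clear();
        for (Consumer<Set<EntityRef>> l : listeners) l.accept(committed);
    }

    // Call after rollback() or a failed commit(): the buffered changes are dropped.
    void afterRollback() { pending.clear(); }
}
```

Because a failed transaction only clears the buffer, listeners never see changes that were not committed.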

 

#4

In other words, given a graph of objects, I'd like to know all the objects that changed and were committed, so that I can generate view messages based on that graph of objects.

For example, if a collection of User entity objects changed in a transaction, I'd like to generate a JSON object listing the Users that were added, deleted, and changed.
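A small sketch of the message described above: committed User changes grouped into added, deleted and changed lists and rendered as JSON. The change record and the JSON shape are assumptions for illustration; a real application would use a JSON library rather than building strings by hand.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical change kinds and message shape, not an ObjectDB or JPA format.
enum UserChangeKind { ADDED, DELETED, CHANGED }

record UserChange(UserChangeKind kind, long id, String username) { }

class UserViewMessage {
    // Builds {"added":[...],"deleted":[...],"changed":[...]} in enum order.
    static String toJson(List<UserChange> changes) {
        StringBuilder json = new StringBuilder("{");
        UserChangeKind[] kinds = UserChangeKind.values();
        for (int i = 0; i < kinds.length; i++) {
            UserChangeKind kind = kinds[i];
            String items = changes.stream()
                .filter(c -> c.kind() == kind)
                .map(c -> "{\"id\":" + c.id() + ",\"username\":\"" + escape(c.username()) + "\"}")
                .collect(Collectors.joining(","));
            json.append('"').append(kind.name().toLowerCase(Locale.ROOT))
                .append("\":[").append(items).append(']');
            if (i < kinds.length - 1) json.append(',');
        }
        return json.append('}').toString();
    }

    // Minimal escaping of backslashes and quotes only.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

For one added user (id 1, "ann") and one changed user (id 2, "bob") this yields {"added":[{"id":1,"username":"ann"}],"deleted":[],"changed":[{"id":2,"username":"bob"}]}.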

That is a simple example; more complex views are composed of data from several different entity classes.

So one way to do this, though probably not the most efficient, is to keep a list of all the entities that changed during a transaction and then, once the transaction commits, follow it with another transaction that uses those entities.

I can guarantee that this view transaction follows immediately (because I'm serializing requests).

I suppose I can guarantee that this view transaction doesn't itself alter the database by rolling it back after I've generated the views.

Your caching probably makes this fairly efficient (by definition, all the data I want was in the last transaction).

But I'm wondering how you would do this.

#5

Your solution seems fine, although not all the details are clear. Why should you roll back that view transaction? What would be the purpose of committing it?

Note that you can also make use of a @Version field that is updated automatically for every entity object when it is changed and the change is committed.
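A sketch of how @Version values could drive view updates, assuming each entity has a @Version field and that the view remembers the id-to-version map it was built from. How the current map is read after commit (for example, a query selecting each entity's id and version) is an assumption and not shown; the comparison itself is plain Java.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Result of comparing two id -> version maps.
record VersionDiff(Set<Long> added, Set<Long> removed, Set<Long> changed) { }

class VersionComparer {
    // 'seen' holds the versions a view was built from; 'current' holds the
    // versions read after the commit (source of that read is assumed).
    static VersionDiff diff(Map<Long, Long> seen, Map<Long, Long> current) {
        Set<Long> added = new TreeSet<>();
        Set<Long> removed = new TreeSet<>();
        Set<Long> changed = new TreeSet<>();
        for (Map.Entry<Long, Long> e : current.entrySet()) {
            Long old = seen.get(e.getKey());
            if (old == null) added.add(e.getKey());                        // new entity
            else if (!old.equals(e.getValue())) changed.add(e.getKey());   // version bumped
        }
        for (Long id : seen.keySet()) {
            if (!current.containsKey(id)) removed.add(id);                 // entity gone
        }
        return new VersionDiff(added, removed, changed);
    }
}
```

This avoids listeners entirely, at the cost of reading the versions of every entity a view covers.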

ObjectDB Support
#6

> Your solution seems fine, although not all the details are clear. Why should you roll back that view transaction? What would be the purpose of committing it?

I see, so I just drop the transaction? Don't I have to either roll back or commit it?

Let's take a step back and explain the use case:

1) The client loads a view from the database.
2) Each view is a 'projection', or 'lens', or 'reactive' view over one or more entity classes.
3) After that, the client is reactive, ie: reacts to changes in the database.
4) The client sends requests to the server, which makes changes to the database.
5) The server pushes changes to the client which updates its view.

In the JavaScript world, libraries like Breeze.js and HorizonDB do this sort of thing.

There are plenty of reactive user-interface libraries: Knockout.js, Vue.js (actually based on a proposal I made in 2013 in the RactiveJS project), etc.

So, now let's give a simple concrete example, with just 2 classes (the actual application has 20 Entity classes and 60+ request types):

Given two entities:

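import java.util.List;
import javax.persistence.*;

@Entity
class Company {
    @Id @GeneratedValue long id;
    String company_name;
    @OneToMany List<User> users;
}

@Entity
class User {
    @Id @GeneratedValue long id;
    String username;
}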

Now assume a single HTML table, showing a list of users with these columns:

Company name, User name

So, first the client loads the user table.
The actual user rows may depend on whether the client is an administrator, a group leader, etc.
That table is a projection/lens/reactive query over a subset of the Company and User entity classes.

Next, a request comes in that deletes a company.
So we need to send a message to the relevant clients telling them to remove the corresponding rows from their tables.
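A sketch of the routing step for this example: given which rows each client currently sees, find the rows each client must remove when a company is deleted. The visibility map (client id to company id to shown user ids) is a hypothetical stand-in for the per-role queries (administrator, group leader, etc.), not something ObjectDB provides.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class DeletionRouter {
    // visibleRows: clientId -> (companyId -> userIds currently shown to that client).
    // Returns clientId -> userIds whose rows that client should remove.
    static Map<String, Set<Long>> rowsToRemove(
            Map<String, Map<Long, Set<Long>>> visibleRows, long deletedCompanyId) {
        Map<String, Set<Long>> result = new TreeMap<>();
        visibleRows.forEach((client, companies) -> {
            Set<Long> users = companies.get(deletedCompanyId);
            // Clients that never saw this company get no message at all.
            if (users != null && !users.isEmpty()) result.put(client, users);
        });
        return result;
    }
}
```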

That's the basic scenario.

Notes:
1. If the transaction fails, nothing gets sent out.
2. In the real application, figuring out what gets sent to whom isn't so straightforward: it can't be done mid-transaction, because most views are composed from several different entity classes and might depend on whether both (or neither) of them are mutated. So these projections can only be calculated post-commit.

So, based on current thinking, I need to:

1) Set the config file to track array changes.
2) Capture all entities changed during the transaction.
   - most likely as a list of each entity's class and its id
3) Then, if the transaction succeeds, immediately follow it with another transaction that calculates the new views based on which entities changed, i.e. a single calculation that runs at the end of every request and computes the view changes.
4) That transaction must not alter the database! So it's safest to just roll it back.
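Steps 2 to 4 can be sketched as one request handler. To keep it runnable without a database, Tx is a hypothetical interface mirroring the begin/commit/rollback/isActive methods of javax.persistence.EntityTransaction; changed entities are reported as plain string keys (e.g. "User#42"), which is also an assumption of this sketch.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

// Mirrors the EntityTransaction methods this flow needs.
interface Tx {
    void begin();
    void commit();
    void rollback();
    boolean isActive();
}

class RequestPipeline<V> {
    private final Tx tx;

    RequestPipeline(Tx tx) { this.tx = tx; }

    // 'mutation' reports each changed entity key to the collector it is given.
    V handle(Consumer<Consumer<String>> mutation, Function<Set<String>, V> computeViews) {
        Set<String> changed = new LinkedHashSet<>();
        tx.begin();
        try {
            mutation.accept(changed::add);
            tx.commit();
        } catch (RuntimeException e) {
            if (tx.isActive()) tx.rollback();
            throw e;              // failed transaction: no view updates are computed
        }
        tx.begin();               // second, read-only transaction for the views
        try {
            return computeViews.apply(changed);
        } finally {
            tx.rollback();        // never commit the view transaction
        }
    }
}
```

Rolling back in a finally block guarantees the view transaction cannot alter the database even if the view calculation throws.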

Correct?

Is there a better way to do this?

The way I'm doing it now is by manually calculating changes specific to each of the 60 request types, which is error-prone and requires significant testing.

What I'm looking for is a way to use dirty tracking to drive a single 'view projection update' calculation at the end of any request.

Thoughts?

#7

OK. Rollback is fine, of course, and with your detailed explanation it is now clear why you need another transaction.

Note that you have to enable array change tracking only if you have Java arrays (e.g. int[], MyEntity[]), as change detection for Java collections and maps (which are preferred, being portable JPA and more efficient) is automatic.

ObjectDB Support
#8

Great, thanks. I'm not using Java arrays, just List<? extends Entity> collections.

Regarding rollback, the JPA state diagram seems to indicate that after a commit, entities automatically move to the 'Detached' state.

However, unless the relevant parts of the object graph have already been retrieved, they won't be retrieved for the detached object. I'm not sure how ObjectDB handles that; I'm guessing that for embedded apps you already have the entire graph.

But just to be safe, I'm running a second transaction and rolling it back.

My app is idempotent (I think), so it probably doesn't even matter whether the view transaction immediately follows the mutation transaction, but I've serialized requests anyway, so I can guarantee that each mutation transaction is immediately followed by a view transaction.

BTW, given how fast ObjectDB is, this is a great use case for your technology. You might want to think about building in a 'reactive view / real-time projection / real-time lens' capability!

 

#9

PS: I really did suggest the UI pattern on which Vue.js is based:

https://github.com/ractivejs/ractive/issues/366

 

 

#10

Surprisingly, it looks like embeddable classes can't register change listeners, i.e.:

@Embeddable @EntityListeners(Listener.class)

So I'm having to convert all @Embeddable classes into @Entity classes,

and to persist each one manually.

Question:

What's the best way to cascade persistence of entities contained in other entities (i.e. those that were previously embeddables)?

 

#11

Changes to embedded objects can be tracked by tracking the containing entity objects, i.e. an entity object with all its embedded objects is considered as one atomic unit regarding change detection.

Please use a new forum thread to discuss cascading, as it is not directly related to the subject of this forum thread ("Dirty checking").

ObjectDB Support
#12

OK, thanks. There are 'embedded' objects that have to be tracked independently, so I'll just make them entities instead.

I'll move the cascading discussion to a new thread.

Thanks.

 
