ObjectDB

Replication error on slave restart

#1

I'm trying to enable replication on a master server that was previously running with "recovery" enabled.

I copied the database file to the slave and followed the rest of the steps in the manual. It worked fine. But if I restart the slave server, I get:

com.objectdb.o.UserException: Failed to synchronize replicated database
        at com.objectdb.o.MSG.d(MSG.java:61)
        at com.objectdb.o.RPT.d(RPT.java:99)
        at com.objectdb.o.RPT.<init>(RPT.java:66)
        at com.objectdb.o.SHN.aw(SHN.java:703)
        at com.objectdb.o.SHN.K(SHN.java:202)
        at com.objectdb.o.HND.run(HND.java:133)
        at java.lang.Thread.run(Unknown Source)

And, though it may be unrelated to this topic, if I enable "recording" on the slave and then restart, I get:

java.lang.NullPointerException
        at com.objectdb.o.RPR.u(RPR.java:248)
        at com.objectdb.o.RPR.<init>(RPR.java:112)
        at com.objectdb.o.SMR.S(SMR.java:650)
        at com.objectdb.o.SMR.n(SMR.java:162)
        at com.objectdb.o.TOL.run(TOL.java:116)
        at com.objectdb.Server.runCommand(Server.java:198)
        at com.objectdb.Server.run(Server.java:103)
        at com.objectdb.Server.main(Server.java:62)

and my *.odb files on the slave turn into *.odb_nonclosed.

I'm using v2.5.4_01.

#2

Could you also please provide additional information about recording mode and replication, so that I can figure out how to clean up unused recording files, or at least estimate their size? I use ObjectDB in an online game and get over a million transactions every day.

#3

You should only enable recording for the master server and not for the slave server.

To save space please try limiting recording to write operations:

    <recording enabled="true" sync="false" path="." mode="write" />

Before you start replication again, copy the master database to the slave and remove any old recording files.

This should be done offline, when the master server is not running. In addition, make sure you shut down the master server normally, so that no recovery file (a file whose name ends with $) is left next to the master database. This procedure is required only once, when setting up replication.
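
The offline copy step could be guarded by a small check like the one below. This is only a sketch under assumptions: the class and method names are made up for illustration, the recovery file is assumed to be the database file name with `$` appended (for example `sm.odb$`), and the paths are passed in by the caller. It uses only the standard `java.nio.file` API, not any ObjectDB API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ReplicationSeed {

    /** True if a file named like the database plus "$" sits next to it (assumed recovery file name). */
    public static boolean hasRecoveryFile(Path database) {
        Path recovery = database.resolveSibling(database.getFileName() + "$");
        return Files.exists(recovery);
    }

    /**
     * Copies the master database file to the slave location. Refuses to copy
     * when a recovery file is present, since that suggests the master was not
     * shut down normally.
     */
    public static void seedSlave(Path masterDb, Path slaveDb) throws IOException {
        if (hasRecoveryFile(masterDb)) {
            throw new IllegalStateException(
                "Recovery file found next to " + masterDb + "; shut the master down normally first");
        }
        Files.createDirectories(slaveDb.toAbsolutePath().getParent());
        Files.copy(masterDb, slaveDb, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ReplicationSeed <master.odb> <slave.odb>");
            return;
        }
        seedSlave(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied " + args[0] + " to " + args[1]);
    }
}
```

Run it only while the master server is stopped. It does not remove old recording files on the slave; that still has to be done by hand.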

Every time you restart the master server a new recording file is created. You should be able to delete old recording files unless they are needed for slave servers that are currently offline. If a recording file is missing you will get the reported "Failed to synchronize replicated database" error message.
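
To get a feel for how much space the recording files take, something like the following could be run against the recording directory. It is a rough sketch, not an ObjectDB tool: it assumes the recording files are named `<number>.odr` (as the `TransactionID.odr` file mentioned later in this thread suggests), the class name and the default path `sm.odr` are hypothetical, and it only reports files. It never deletes anything, because whether an old file is still needed depends on which slaves are offline.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RecordingReport {

    /** Maps the numeric part of each "<number>.odr" file name to its size in bytes, in ascending order. */
    public static TreeMap<Long, Long> sizesById(Path recordingDir) throws IOException {
        TreeMap<Long, Long> sizes = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(recordingDir, "*.odr")) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                String stem = name.substring(0, name.length() - ".odr".length());
                try {
                    sizes.put(Long.parseLong(stem), Files.size(file));
                } catch (NumberFormatException e) {
                    // Not a numerically named file; leave it out of the report.
                }
            }
        }
        return sizes;
    }

    /** Every ID except the highest one: candidates to review by hand, not to delete blindly. */
    public static List<Long> olderThanNewest(TreeMap<Long, Long> sizes) {
        if (sizes.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(sizes.headMap(sizes.lastKey()).keySet());
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "sm.odr"); // hypothetical default path
        TreeMap<Long, Long> sizes = sizesById(dir);
        long total = 0;
        for (Map.Entry<Long, Long> e : sizes.entrySet()) {
            System.out.println(e.getKey() + ".odr  " + e.getValue() + " bytes");
            total += e.getValue();
        }
        System.out.println("total: " + total + " bytes");
        System.out.println("older than newest (review before deleting): " + olderThanNewest(sizes));
    }
}
```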

ObjectDB Support
#4
Failed to synchronize replicated database

It also appeared with a clean DB (with recording from transaction 1) when I restarted the slave server.

...and I have updated to v2.5.4_04.

#5

Another issue with replication... The slave server doesn't like it when I reuse an ID, unlike the master, where everything is perfect!

java.lang.NullPointerException
        at com.objectdb.o.SFL.ae(SFL.java:873)
        at com.objectdb.o.MST.ae(MST.java:1419)
        at com.objectdb.o.MST.Vh(MST.java:1330)
        at com.objectdb.o.WRA.Vh(WRA.java:396)
        at com.objectdb.o.WSM.Vh(WSM.java:184)
        at com.objectdb.o.STC.v(STC.java:514)
        at com.objectdb.o.STC.i(STC.java:248)
        at com.objectdb.o.STC.h(STC.java:142)
        at com.objectdb.o.RPR.run(RPR.java:203)
        at java.lang.Thread.run(Unknown Source)
com.objectdb.o.UserException: Attempt to reuse an existing primary key value (com.marlexgames.sm.db.Level:1)
        at com.objectdb.o.MSG.d(MSG.java:61)
        at com.objectdb.o.PPW.an(PPW.java:196)
        at com.objectdb.o.PGW.ai(PGW.java:201)
        at com.objectdb.o.UPT.C(UPT.java:134)
        at com.objectdb.o.URT.l(URT.java:171)
        at com.objectdb.o.TSK.i(TSK.java:145)
        at com.objectdb.o.TSK.f(TSK.java:95)
        at com.objectdb.o.TSM.e(TSM.java:86)
        at com.objectdb.o.UTT.A(UTT.java:365)
        at com.objectdb.o.UTT.l(UTT.java:203)
        at com.objectdb.o.TSK.i(TSK.java:145)
        at com.objectdb.o.TSK.f(TSK.java:95)
        at com.objectdb.o.MST.Vg(MST.java:1293)
        at com.objectdb.o.WRA.Vg(WRA.java:381)
        at com.objectdb.o.WSM.Vg(WSM.java:153)
        at com.objectdb.o.STC.u(STC.java:497)
        at com.objectdb.o.STC.i(STC.java:245)
        at com.objectdb.o.STC.h(STC.java:142)
        at com.objectdb.o.RPR.run(RPR.java:203)
        at java.lang.Thread.run(Unknown Source)

Level has a program-generated @Id. And sometimes I use "DELETE lvl FROM level lvl" to rebuild the level data (it sometimes has structural changes).

And "Failed to synchronize replicated database" appears regardless of this issue.

#6

My primary goal is to maximize the uptime of the master server, and to use the slave server in read-only mode to process statistics.
But it appears that the slave server fails to replicate the master after it is restarted. And even restarting the master sometimes causes this exception.

#7

Restarting the servers should not stop replication, of course, unless recording is not working on the master server.

Could you please post the configuration files of your servers?

ObjectDB Support
#8

MASTER

<objectdb>
<general>
  <temp path="$temp" threshold="128mb" />
  <network inactivity-timeout="0" />
  <url-history size="50" user="true" password="true" />
  <log path="$objectdb/log/" max="8mb" stdout="false" stderr="false" />
  <log-archive path="$objectdb/log/archive/" retain="90" />
  <logger name="*" level="info" />
</general>
<database>
  <size initial="256kb" resize="256kb" page="2kb" />
  <recovery enabled="false" sync="false" path="." max="128mb" />
  <recording enabled="true" sync="false" path="." mode="write" />
  <locking version-check="true" />
  <processing cache="64mb" max-threads="8" />
  <query-cache results="32mb" programs="500" />
  <extensions drop="temp,tmp" />
</database>
<entities>
  <enhancement agent="false" reflection="warning" />
  <cache ref="weak" level2="0" />
  <persist serialization="false" />
  <cascade-persist always="auto" on-persist="false" on-commit="true" />
  <dirty-tracking arrays="false" />
</entities>
<schema>
</schema>
<server>
  <connection port="6136" max="100" />
  <data path="$objectdb/db" />
</server>
<users>
  <user username="xxx1" password="xxx2" admin="true">
   <dir path="/" permissions="access,modify,create,delete" />
  </user>
</users>
<ssl enabled="false">
  <server-keystore path="$objectdb/ssl/server-kstore" password="pwd" />
  <client-truststore path="$objectdb/ssl/client-tstore" password="pwd" />
</ssl>
</objectdb>

SLAVE

<objectdb>
<general>
  <temp path="$temp" threshold="128mb" />
  <network inactivity-timeout="0" />
  <url-history size="50" user="true" password="true" />
  <log path="$objectdb/log/" max="8mb" stdout="false" stderr="false" />
  <log-archive path="$objectdb/log/archive/" retain="90" />
  <logger name="*" level="info" />
</general>
<database>
  <size initial="256kb" resize="256kb" page="2kb" />
  <recovery enabled="true" sync="false" path="." max="128mb" />
  <recording enabled="false" sync="false" path="." mode="write" />
  <locking version-check="true" />
  <processing cache="64mb" max-threads="8" />
  <query-cache results="32mb" programs="500" />
  <extensions drop="temp,tmp" />
</database>
<entities>
  <enhancement agent="false" reflection="warning" />
  <cache ref="weak" level2="0" />
  <persist serialization="false" />
  <cascade-persist always="auto" on-persist="false" on-commit="true" />
  <dirty-tracking arrays="false" />
</entities>
<schema>
</schema>
<server>
  <connection port="6136" max="100" />
  <data path="$objectdb/db" />
  <replication url="objectdb://xxx.com:6136/SM/sm.odb;user=xxx1;password=xxx2" />
</server>
<users>
  <user username="xxx1" password="xxx2" admin="true">
   <dir path="/" permissions="access,modify,create,delete" />
  </user>
</users>
<ssl enabled="false">
  <server-keystore path="$objectdb/ssl/server-kstore" password="pwd" />
  <client-truststore path="$objectdb/ssl/client-tstore" password="pwd" />
</ssl>
</objectdb>
#9

In order to diagnose the problem, could you please try changing the master recording configuration:

    <recording enabled="true" sync="false" path="." mode="all" />
ObjectDB Support
#10

Same result with mode="all".

#11

Something is unclear with the attached files.

The Slave directory contains a database file (sm.odb) with 4 entity classes, 5 embeddable classes and 108 entity objects.

The master directory contains an empty database file (sm.odb) with no classes and no objects.

Could you please explain what happened to the master database and when and how it became empty?

Because in this situation, when the master database is empty, obviously replication cannot succeed.

ObjectDB Support
#12

Interesting... Maybe I can explain what I did, and then you can explain why it's empty...

Test Case #1:

1) I have 2 separate third-party servers.

2) Deployed and ran the application on the master (new DB file, no transactions yet).

3) Started the ODB server on the slave.

4) Made changes in the application (now it has some transactions).

5) Viewed the db files in Explorer on both master and slave (directly through the /$replication path on the slave) [IDENTICAL]

6) Restarted the ODB server on the slave [NO EXCEPTIONS]

7) While the application was still making changes to the db, restarted the ODB server on the slave [GOT Failed to synchronize]

8) Repeated step 5. [NOT IDENTICAL]

9) Stopped the slave ODB server, because it kept sending me exceptions.

10) Sent you an archive file.


As for the "empty master db", case #2:

1) Not a single transaction has been made since the last test, and the master has never been restarted.

2) I opened the master db with Explorer [DATA PRESENT - AND UP TO DATE]

3) I copied the sm.odb file to my PC and opened it with Explorer [EMPTY]

4) I copied the sm.odr folder to my PC and opened it with Explorer [EMPTY]

5) I changed my local ODB conf (recovery=false, recording=true), restarted and opened it with Explorer [DATA PRESENT - BUT NOT UP TO DATE (more similar to the slave)]

So my answer to your question "Could you please explain what happened to the master database and when and how it became empty?" is: maybe it is because I have not restarted the master ODB server since the beginning of the test.

And now I have a problem that worries me a lot! In February I switched the ODB server to "recording" on the main production server. What are the chances of data loss if I switch it back to recovery mode? Or do I have to keep using recording mode?

#13

I completely do not understand the replication behavior of ODB!

Test Case #3:

1) Same state as at the end of the last test (master: no restarts, no NEW transactions; slave: ODB stopped since the end of case 1)

2) Started the slave ODB server [NO EXCEPTIONS! DATA IS UP TO DATE] (how? why?)

3) Made 1 change to the master ODB

4) Opened the master DB with Explorer (User table, see master.png) [UP TO DATE]

5) Opened the slave DB with Explorer (User table, see slave.png) [NOT UP TO DATE]

6) Got this on the slave after step 3:

com.objectdb.o.UserException: Optimistic lock failed (see multiple nested exceptions)
        at com.objectdb.o.MSG.d(MSG.java:61)
        at com.objectdb.o.UTT.J(UTT.java:574)
        at com.objectdb.o.UTT.l(UTT.java:278)
        at com.objectdb.o.TSK.i(TSK.java:145)
        at com.objectdb.o.TSK.f(TSK.java:95)
        at com.objectdb.o.MST.Vg(MST.java:1293)
        at com.objectdb.o.WRA.Vg(WRA.java:381)
        at com.objectdb.o.WSM.Vg(WSM.java:153)
        at com.objectdb.o.STC.u(STC.java:497)
        at com.objectdb.o.STC.i(STC.java:245)
        at com.objectdb.o.STC.h(STC.java:142)
        at com.objectdb.o.RPR.run(RPR.java:203)
        at java.lang.Thread.run(Unknown Source)
java.lang.NullPointerException
        at com.objectdb.o.SFL.ae(SFL.java:873)
        at com.objectdb.o.MST.ae(MST.java:1419)
        at com.objectdb.o.MST.Vh(MST.java:1330)
        at com.objectdb.o.WRA.Vh(WRA.java:396)
        at com.objectdb.o.WSM.Vh(WSM.java:184)
        at com.objectdb.o.STC.v(STC.java:514)
        at com.objectdb.o.STC.i(STC.java:248)
        at com.objectdb.o.STC.h(STC.java:142)
        at com.objectdb.o.RPR.run(RPR.java:203)
        at java.lang.Thread.run(Unknown Source)

7) Got the exception from step 6 every time I repeated step 3.

As you can see, the slave is now updated only up to the end of the first test case (1 day ago). The master is up to date. I also have a useful field, "updatedAt", which is set to the current timestamp (GMT+0) every time a User entity changes. To avoid confusion about the actual times: the master and slave are at GMT+1, and my PC is at GMT+4 (and so are the *.png screenshots).
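
Since three different offsets are involved, it may help to see how one stored "updatedAt" instant reads in each of them. The sketch below uses only `java.time`; the sample instant is an arbitrary made-up value, not taken from the screenshots, and the class name is hypothetical.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class UpdatedAtOffsets {

    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm xxx");

    /** Formats the same instant as wall-clock time at the given whole-hour offset from GMT. */
    public static String at(Instant instant, int offsetHours) {
        return OffsetDateTime.ofInstant(instant, ZoneOffset.ofHours(offsetHours)).format(FORMAT);
    }

    public static void main(String[] args) {
        Instant sample = Instant.parse("2014-04-01T12:00:00Z"); // arbitrary example value
        System.out.println("stored (GMT+0):      " + at(sample, 0)); // 2014-04-01 12:00 +00:00
        System.out.println("servers (GMT+1):     " + at(sample, 1)); // 2014-04-01 13:00 +01:00
        System.out.println("screenshots (GMT+4): " + at(sample, 4)); // 2014-04-01 16:00 +04:00
    }
}
```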

#14

If it helps, I'm using GlassFish 4.0 with JPA 2.1 (on all of them: master, slave, local). All communication with the DB goes through an EJB persistence unit (with objectdb-jee.jar v2.5.4_04).

#15

It is difficult to advise because the cause of this unexpected behavior is still unclear.

According to your description, it seems that something may be blocking ObjectDB from writing changes and transactions to the disk. So the master database is not being updated regularly, regardless of replication. Therefore, the replication issue could be just a symptom of a bigger problem. This may explain the empty database and recording files.

Does it also happen with the production server? Can you check when the database file (and recording files if enabled) are physically updated? Maybe only when you close the EntityManagerFactory? stop GlassFish? stop the ObjectDB server?
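
One way to check when the files are physically updated is to compare last-modified times before and after an action (a commit, closing the EntityManagerFactory, stopping GlassFish, stopping the ObjectDB server). The following is a minimal sketch using only the standard `java.nio.file` API; the class name is made up, and the directory to watch (for example the server's data directory) is supplied by the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WriteActivityCheck {

    /** Records the last-modified time of every regular file under root, recursively. */
    public static Map<Path, FileTime> snapshot(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<Path, FileTime> times = new LinkedHashMap<>();
        for (Path file : files) {
            times.put(file, Files.getLastModifiedTime(file));
        }
        return times;
    }

    /** Files that are new in the second snapshot or whose last-modified time differs. */
    public static List<Path> changed(Map<Path, FileTime> before, Map<Path, FileTime> after) {
        List<Path> result = new ArrayList<>();
        for (Map.Entry<Path, FileTime> e : after.entrySet()) {
            if (!e.getValue().equals(before.get(e.getKey()))) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "."); // directory to watch
        Map<Path, FileTime> before = snapshot(root);
        System.out.println("Perform the action to test, then press Enter...");
        System.in.read();
        for (Path p : changed(before, snapshot(root))) {
            System.out.println("updated: " + p);
        }
    }
}
```

If only the newest recording file shows up as updated after a commit, and the database file appears only after a shutdown, that would match what is described in this thread.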

Check it first in your development environment in order to avoid losing data in your production database. After stopping the production server (and making a full backup), it would probably be better to start it again with recovery enabled. You may also keep both recovery and recording modes enabled.

If you can provide a test that demonstrates the issue, with detailed instructions on how to run it, it would probably be possible to understand this problem and, if necessary, provide a fix.

ObjectDB Support
#16

Yes, if I restart the GlassFish/ODB server, sm.odb physically changes. Until then, only the last TransactionID.odr file is updated.

#17

If the recording is being updated it may be ok, although the database file should have been updated at least occasionally (even if not on every transaction).

In #13 above the recording was also empty.

Anyway, in order to proceed with checking of this issue we will need a test case that we can run.

ObjectDB Support