Issue #2284: cannot delete objects after crash (see issue 2283)

Type: Bug Report | Version: 2.7.4_03 | Priority: Critical | Status: Fixed | Replies: 16
#1

After the crash we cannot delete specific objects; see the attached screenshots from the Explorer.

The database will soon be available (size 5.561.119 KB) on hummingbird-systems.com (user/password: your database name; main menu option "objectdb").

#2

Meanwhile the database has become completely unusable; we had to stop it and run the Doctor. The latest log is attached.

 

#3

When the Doctor completes please post its output.

We are still investigating the cause of the recovery failure.

ObjectDB Support
#4

We have uploaded the database, as it was before running the Doctor, to our extranet: coreSystemDb_.rar (size 5.589.297 KB). The upload is complete.

We are now in a difficult position with that customer, since all the other database systems they use (Oracle, MSSQL, MySQL) recover from these unexpected server shutdowns without any problems.

What can we do about this? Are there other options for achieving better reliability in situations like these?

#5

Obviously ObjectDB should recover from such situations as well, and usually it does. In fact, this is the first report of such a failure after recovery. You probably do not have a copy of the database and the recovery file (odb$) from after the shutdown and before the server was restarted, but if you do, it may help in understanding the cause.

We will extend our testing of recovery from failure. However, it is also possible that some optimisation of your production OS or disk requires special consideration (e.g. if a synchronized write by ObjectDB to the disk returns before the physical write is completed, while the data is still buffered by the OS / disk). Therefore, it could help if you are able to run on your system some tests that we will provide.

Possible immediate solutions that may help:

  • Enabling recording, as the database can be restored from recording files.
  • Enabling replication, generating a live backup database.

We may also increase the priority of a plan to implement an online doctor that can work on a running database, but this will take time as there are some technical challenges.

ObjectDB Support
#6

We will try to run the tests that you provide. For us it is important that work is done on improving this case; we understand that this will not happen within a few days. But we need a clear picture of when improved behaviour could be available, so that we can communicate it to our customer.

To clarify my understanding:

- Does recording also support automatic recovery?

- Is replication possible in embedded mode?

#7

- Does recording also support automatic recovery?

Recording enables manual recovery by running the Transaction Replayer.

- Is replication possible in embedded mode?

Currently replication is only supported in client-server mode.

Could you please post your production objectdb.conf file?

ObjectDB Support
#8

Our configuration is attached.

#9

You are using recovery in sync="false" mode:

    <recovery enabled="true" sync="false" path="." max="128mb" />

In this mode, commit returns after writing and flushing the transaction updates to the recovery file (the database file itself may be updated later), but possibly before the updates are physically written to the disk, as the OS / disk manager may still buffer them.
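To make the difference between the two modes concrete, here is a minimal Java sketch using the standard java.nio `FileChannel` API. The class `RecoveryCommitSketch` and its `commit` method are hypothetical illustrations, not ObjectDB's implementation; they only model where the sync happens. `write()` hands the bytes to the OS, while `force()` asks the OS and device to put them on stable storage before returning.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical model of committing to a recovery file, with or without sync. */
public class RecoveryCommitSketch {

    private final FileChannel recoveryFile;
    private final boolean sync; // models recovery sync="true" / sync="false"

    public RecoveryCommitSketch(Path recoveryPath, boolean sync) throws IOException {
        this.recoveryFile = FileChannel.open(recoveryPath,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
        this.sync = sync;
    }

    /** Appends one transaction's updates to the recovery file and returns. */
    public void commit(String updates) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(updates.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            recoveryFile.write(buf); // bytes may still sit in OS / disk buffers
        }
        if (sync) {
            // sync="true": wait until content and metadata (the file grew)
            // are physically written before commit returns.
            recoveryFile.force(true);
        }
        // sync="false": return immediately; the OS may persist these bytes
        // later, possibly after writes to the database file itself.
    }

    public void close() throws IOException {
        recoveryFile.close();
    }
}
```

The only difference between the modes in this sketch is the single `force` call, which is also where the performance cost of sync="true" comes from.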

Unfortunately this is an optimisation that involves some risk (as indicated in the manual).

Modern operating systems and disk managers may optimise IO operations by reordering them. In that case some updates may be written physically to the database file before they are written to the recovery file, and then, in case of system failure, the recovery file may not be able to complete a full recovery. Only sync="true" guarantees that the recovery file is always physically updated before the database file.
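The reordering risk can be illustrated with a toy Java model. Everything here is hypothetical and simplified: `WriteBackCache` stands in for OS / disk buffering, `sync()` is a barrier that makes all earlier writes durable, and the simulated crash deliberately picks the unlucky case in which the device had already persisted the newest pending write (the database-file update) but not the older recovery-file write.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not ObjectDB code) of reordered writes in a write-back cache. */
public class ReorderingDemo {

    /** One write, either to the recovery file ("log") or the database file ("db"). */
    record Write(String file, String txId) {}

    static final class WriteBackCache {
        private final List<Write> pending = new ArrayList<>(); // buffered by OS / disk
        private final List<Write> durable = new ArrayList<>(); // physically on disk

        void write(Write w) { pending.add(w); }

        /** Barrier: all earlier writes are physically stored before continuing. */
        void sync() {
            durable.addAll(pending);
            pending.clear();
        }

        /** Power loss: only the newest pending write reached the disk; the rest are lost. */
        List<Write> crash() {
            List<Write> disk = new ArrayList<>(durable);
            if (!pending.isEmpty()) disk.add(pending.get(pending.size() - 1));
            pending.clear();
            return disk;
        }
    }

    /** Recovery can complete only if each database-file write has its log record on disk. */
    static boolean recoverable(List<Write> disk) {
        for (Write w : disk) {
            if (w.file().equals("db") && !disk.contains(new Write("log", w.txId()))) {
                return false;
            }
        }
        return true;
    }

    /** Commits tx1, with or without a sync between the log write and the database write. */
    static boolean survivesCrash(boolean syncLog) {
        WriteBackCache cache = new WriteBackCache();
        cache.write(new Write("log", "tx1"));
        if (syncLog) cache.sync();           // models sync="true"
        cache.write(new Write("db", "tx1")); // database file updated afterwards
        return recoverable(cache.crash());
    }

    public static void main(String[] args) {
        System.out.println("sync=false recoverable: " + survivesCrash(false)); // false
        System.out.println("sync=true  recoverable: " + survivesCrash(true));  // true
    }
}
```

Without the barrier, the crash leaves a database-file update whose recovery record never reached the disk; with the barrier, the recovery record is already durable when the database file is touched.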

This is currently the only plausible explanation for this recovery failure, as no other issues were found in our examination of it.

ObjectDB Support
#10

Thanks, we'll adapt our configuration accordingly.

#11

Please monitor performance after the configuration change and report back, as it may have a significant effect on speed in applications with a high transaction rate. We may be able to examine options that eliminate this risk of recovery failure with a smaller performance hit.

ObjectDB Support
#12

ok

#13

Our performance monitoring shows that creating object structures is significantly slower than with the old setting; slowing down the system like this is more or less unacceptable.

#14

File sync is a slow operation, so this is expected. Although we cannot improve the speed of the sync operation itself (which is mainly hardware-dependent), we may try two possible improvements:

  • Improving the speed of the sync mode by delaying the return from commit, gathering several concurrently committing transactions together, and reducing the number of sync operations to one sync per such group of transactions. This would be useful only in highly multithreaded applications.
  • Improving the reliability of the no-sync setting by delaying physical writing to the database, accumulating updates, and performing one sync of the recovery file before writing a group of updates to the physical database. With this improvement recovery is expected to be reliable with a small effect on performance, but you may still lose the last committed transactions after recovery (the database will just not become corrupted).
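The first option is a form of group commit. A minimal Java sketch of the idea, under stated assumptions: `GroupCommitSketch` is a hypothetical class, the recovery file is modelled as an in-memory list, and a counter stands in for the slow sync call. Committing threads block until a single flusher thread has made their batch durable with one sync.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical group-commit sketch: one sync per batch of concurrent commits. */
public class GroupCommitSketch {

    private record Pending(String updates, CompletableFuture<Void> done) {}

    private final LinkedBlockingQueue<Pending> queue = new LinkedBlockingQueue<>();
    private final List<String> recoveryFile = new ArrayList<>(); // stand-in for the file
    private int syncCount = 0;                                   // number of (slow) syncs

    public GroupCommitSketch() {
        Thread flusher = new Thread(this::flushLoop, "group-commit-flusher");
        flusher.setDaemon(true);
        flusher.start();
    }

    /** Called by a committing thread; returns only after its batch has been synced. */
    public void commit(String updates) {
        Pending p = new Pending(updates, new CompletableFuture<>());
        queue.add(p);
        p.done().join();
    }

    private void flushLoop() {
        List<Pending> batch = new ArrayList<>();
        while (true) {
            try {
                batch.add(queue.take()); // wait for at least one commit
            } catch (InterruptedException e) {
                return;
            }
            queue.drainTo(batch);        // gather all other commits waiting right now
            synchronized (this) {
                for (Pending p : batch) recoveryFile.add(p.updates());
                syncCount++;             // a single sync covers the whole batch
            }
            for (Pending p : batch) p.done().complete(null);
            batch.clear();
        }
    }

    public synchronized int syncCount() { return syncCount; }

    public synchronized int recordCount() { return recoveryFile.size(); }
}
```

With many threads committing at once, `syncCount()` ends up lower than the number of commits; with a single thread it equals it, which is why this option helps only highly multithreaded applications.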

Which of these options better fits your needs?

ObjectDB Support
#15

We would prefer the second option.

#16

OK. We will work on it with high priority.

ObjectDB Support
#17

Version 2.7.5 includes additional protection when recovery is enabled with no sync: it performs one sync of the recovery file before writing a group of updates to the physical database.
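The ordering that this protection relies on can be sketched in Java. This is an illustrative model under assumptions, not ObjectDB's actual implementation: `DeferredWriteSketch` is hypothetical, the fixed `groupSize` trigger is an assumed policy, and I/O is recorded as a trace of strings rather than performed. Commits only append to the recovery file without syncing; database-file writes are deferred, and each group is preceded by one recovery-file sync.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one recovery-file sync before each group of database writes. */
public class DeferredWriteSketch {

    /** Ordered trace of I/O operations, used to check the ordering guarantee. */
    public final List<String> ioTrace = new ArrayList<>();
    private final List<String> pendingDbUpdates = new ArrayList<>();
    private final int groupSize; // assumed trigger: flush after this many commits

    public DeferredWriteSketch(int groupSize) {
        this.groupSize = groupSize;
    }

    /** Commit returns without any sync, so it stays fast. */
    public void commit(String tx) {
        ioTrace.add("log-write " + tx); // buffered write to the recovery file
        pendingDbUpdates.add(tx);       // database-file write is deferred
        if (pendingDbUpdates.size() >= groupSize) {
            writeGroup();
        }
    }

    /** One recovery-file sync, then the whole group goes to the database file. */
    public void writeGroup() {
        if (pendingDbUpdates.isEmpty()) return;
        ioTrace.add("log-sync");        // recovery records become durable first
        for (String tx : pendingDbUpdates) ioTrace.add("db-write " + tx);
        pendingDbUpdates.clear();
    }
}
```

A crash can still lose commits whose recovery records were written but not yet synced, matching the trade-off described for the second option, but no database-file update can reach the disk ahead of its recovery record. For example, 7 commits with a group size of 3, followed by a final `writeGroup()`, cost 3 syncs instead of 7.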

ObjectDB Support
