Type: Bug Report | Version: 2.7.4_03 | Priority: Critical | Status: Fixed | Replies: 16 |
After the crash we cannot delete specific objects; see the attached screenshots of the Explorer.
The database (size 5.561.119 KB) will soon be available on hummingbird-systems.com (user/pw: your db name, main menu option objectdb).
Meanwhile the database became completely unusable, so we had to stop and run the Doctor. Find attached the latest log.
When the Doctor completes please post its output.
Exploring the cause of the recovery failure is still in progress.
We have uploaded the database, as it was before running the Doctor, to our extranet: coreSystemDb_.rar (size 5.589.297 KB). The upload is complete.
Meanwhile we are in a difficult situation with that customer, as all the other database systems they use (Oracle, MSSQL, MySQL) recover from these unexpected server shutdowns without any problems.
What can we do about this? Are there other options for achieving more reliability in situations like these?
Obviously ObjectDB should recover from such situations as well, and usually it does. Actually this is the first report of such a failure after recovery. You probably do not have a copy of the database and the recovery file (odb$) taken after the shutdown and before the server was restarted, but if you do, it may help in understanding the cause.
We will extend our testing of recovery from failure. However, it is also possible that some sort of optimisation in your production OS or disk requires special consideration (e.g. if synchronized writing by ObjectDB to the disk returns before the physical writing is completed and the data is still buffered by the OS / disk). Therefore, it could help if you are able to run some tests that we will provide on your system.
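As a rough illustration of the kind of check meant here, the following Java sketch measures how long a forced write takes on a given disk. It is not an ObjectDB test; the class name `SyncLatencyProbe` and the method `averageForceMicros` are hypothetical, and it uses only the standard `java.nio` `FileChannel.force` call. An average that is far lower than the storage device could physically achieve may hint that the OS or disk controller acknowledges the sync while the data is still in a cache, but the number alone proves nothing.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncLatencyProbe {

    static final int BLOCK_SIZE = 4096;

    /**
     * Writes `iterations` blocks to the file, forcing each one to disk,
     * and returns the average time spent in force() in microseconds.
     * Assumes iterations > 0.
     */
    static double averageForceMicros(Path file, int iterations) throws IOException {
        byte[] block = new byte[BLOCK_SIZE];
        long totalNanos = 0;
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < iterations; i++) {
                ByteBuffer buffer = ByteBuffer.wrap(block);
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
                long start = System.nanoTime();
                channel.force(true); // ask the OS to flush data and metadata
                totalNanos += System.nanoTime() - start;
            }
        }
        return totalNanos / 1000.0 / iterations;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sync-probe", ".bin");
        try {
            double micros = averageForceMicros(file, 200);
            System.out.printf("average force() latency: %.1f microseconds%n", micros);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Running it on the production volume that holds the database, rather than on a temp directory, gives the more relevant figure.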
Possible immediate solutions that may help could be:
We may also increase the priority of a plan to implement an online doctor that can work on a running database, but this will take time as there are some technical challenges.
We will try to run the tests that you provide. It is important for us that work is done on this case; we certainly do not expect it within a few days, but we need a clear picture/plan of when we could have improved behaviour, so that we can communicate it to our customer.
For my understanding:
- does the recording support automatic recovery, too?
- is replication possible in embedded mode?
- does the recording support automatic recovery, too?
Recording enables manual recovery by running the Transaction Replayer.
- is replication possible in embedded mode?
Currently replication is only supported in client-server mode.
Could you please post your production objectdb.conf file?
Find attached our configuration.
You are using recovery in sync="false" mode:
<recovery enabled="true" sync="false" path="." max="128mb" />
In this mode, commit returns after writing and flushing the transaction updates to the recovery file (the database file itself may be updated later), but possibly before the updates are physically written to the disk, as the OS / disk manager may still buffer them.
Unfortunately this is an optimisation that involves some risk (as indicated in the manual).
Modern OS and disk managers may optimise IO operations by reordering them. In that case it is possible that some updates are physically written to the database file before they are written to the recovery file, and then, in case of a system failure, the recovery file may not always be able to complete a full recovery. Only sync="true" guarantees that the recovery file is always physically updated before the database file.
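The ordering issue can be illustrated with a minimal Java sketch. This is not ObjectDB's implementation; the class `WriteAheadSketch`, its `commit` method and the file names are hypothetical, and only standard `java.nio` calls are used. With `sync` set to `true`, `FileChannel.force` is called on the recovery file before the database file is touched, so the recovery data is requested to be on disk first. With `sync` set to `false`, both writes are merely handed to the OS, which is free to persist them in either order.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteAheadSketch {

    /** Appends the update to the recovery file, optionally syncs it, then applies it to the database file. */
    static void commit(Path recoveryFile, Path databaseFile, byte[] update, boolean sync)
            throws IOException {
        try (FileChannel recovery = FileChannel.open(recoveryFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            writeFully(recovery, update);
            if (sync) {
                // Blocks until the OS reports that the recovery data reached the device.
                recovery.force(true);
            }
            // Without the force() above, the OS may still hold this write in its cache.
        }
        try (FileChannel database = FileChannel.open(databaseFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            writeFully(database, update);
        }
    }

    private static void writeFully(FileChannel channel, byte[] data) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wal-sketch");
        byte[] update = "update-1".getBytes(StandardCharsets.UTF_8);
        commit(dir.resolve("demo.odb$"), dir.resolve("demo.odb"), update, true);
        System.out.println(Files.size(dir.resolve("demo.odb")) + " bytes applied");
    }
}
```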
This is currently the only possible explanation for this recovery failure as no other issues were found in the examination of this failure.
Thanks, we'll adapt accordingly.
Please monitor performance after the configuration change and report back, as it may have a significant effect on speed in applications with a high transaction rate. We may be able to examine options to eliminate this risk of recovery failure with a lower performance hit.
ok
Our performance monitoring shows that creating object structures is significantly slower than with the old setting. Slowing down the system like this is more or less unacceptable.
File sync is a slow operation, so this is expected. Although we cannot improve the speed of the sync operation itself (which is mainly hardware-dependent), we may try two possible improvements:
Which of these options fits your needs better?
We would prefer the 2nd option
OK. We will work on it with high priority.
Version 2.7.5 includes an additional protection when recovery is enabled with no sync, by performing one sync of the recovery file before writing a group of updates to the physical database.
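The idea of one sync per group of updates can be sketched in Java as follows. This is only an illustration of the general technique, not the actual 2.7.5 code; the class `GroupSyncSketch` and its methods are hypothetical. Commits append to the recovery file without syncing, and a single `FileChannel.force` call on the recovery file precedes writing the whole pending group to the database file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class GroupSyncSketch implements AutoCloseable {

    private final FileChannel recovery;
    private final FileChannel database;
    private final List<byte[]> pending = new ArrayList<>();
    int syncCount = 0; // number of force() calls, exposed for illustration

    GroupSyncSketch(Path recoveryFile, Path databaseFile) throws IOException {
        recovery = FileChannel.open(recoveryFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        database = FileChannel.open(databaseFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Commit without sync: the update goes to the recovery file and is queued for the database. */
    void commit(byte[] update) throws IOException {
        writeFully(recovery, update);
        pending.add(update);
    }

    /** One sync of the recovery file covers the whole group; only then is the database file written. */
    void flushGroup() throws IOException {
        if (pending.isEmpty()) {
            return;
        }
        recovery.force(true);
        syncCount++;
        for (byte[] update : pending) {
            writeFully(database, update);
        }
        pending.clear();
    }

    private static void writeFully(FileChannel channel, byte[] data) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
    }

    @Override
    public void close() throws IOException {
        recovery.close();
        database.close();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("group-sync");
        try (GroupSyncSketch db = new GroupSyncSketch(dir.resolve("demo.rec"), dir.resolve("demo.db"))) {
            for (int i = 0; i < 100; i++) {
                db.commit(new byte[16]);
            }
            db.flushGroup();
            System.out.println("syncs performed: " + db.syncCount);
        }
    }
}
```

The trade-off in this sketch is that the cost of one sync is spread over many commits, while a commit that has not yet been covered by a sync can still be lost in a crash; what is avoided is the database file getting ahead of the recovery file.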