ObjectDB

Vacuum - Reclaiming Unused Space In Odb

#1

I was curious whether you could assist me with a challenge we are having with our ObjectDB database.  Our odb files are about 50 GB in size and are written to about a million times a day with both adds and updates.  This creates a lot of unused space in the database that we want to reclaim.  Other database vendors provide a vacuum process that allows dumping a database to a flat file (basically an SQL dump) and reloading it.  Is there a capability in ObjectDB to perform a vacuum?  The online backup feature appears to copy the dead space as well, so it does not perform a vacuum.

If there is no way to do this other than manually implementing code to read, delete, and write to a new odb file, is there an algorithm you could provide that plays to ObjectDB's strengths?  We have half a terabyte of data to clean up and want to do this as efficiently as possible.
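In case the manual copy approach is still of interest, here is a minimal sketch of a batched copy into a fresh database using plain JPA. The `Record` entity, the field names, and the database paths are all hypothetical placeholders; only the `objectdb:` connection URL scheme is taken from ObjectDB's documentation. This is an illustrative sketch, not a vetted migration tool:

```java
import javax.persistence.*;
import java.util.List;

public class CompactCopy {

    // Hypothetical entity; substitute your real persistent classes.
    @Entity
    public static class Record {
        @Id long id;
        String payload;
    }

    public static void main(String[] args) {
        // Source (fragmented) and target (fresh) databases; paths are assumptions.
        EntityManagerFactory src = Persistence.createEntityManagerFactory("objectdb:data/old.odb");
        EntityManagerFactory dst = Persistence.createEntityManagerFactory("objectdb:data/new.odb");
        EntityManager in = src.createEntityManager();
        EntityManager out = dst.createEntityManager();
        final int pageSize = 10_000;               // batch size bounds heap usage
        for (int first = 0; ; first += pageSize) {
            List<Record> page = in.createQuery(
                    "SELECT r FROM Record r ORDER BY r.id", Record.class)
                .setFirstResult(first)
                .setMaxResults(pageSize)
                .getResultList();
            if (page.isEmpty()) break;
            out.getTransaction().begin();
            for (Record r : page) {
                // Copy fields into a fresh instance; persisting an instance that is
                // managed by another persistence context is provider-dependent.
                Record copy = new Record();
                copy.id = r.id;
                copy.payload = r.payload;
                out.persist(copy);
            }
            out.getTransaction().commit();
            in.clear();                            // drop copied objects from the cache
        }
        in.close(); out.close(); src.close(); dst.close();
    }
}
```

Copying in bounded batches (rather than loading everything at once) keeps memory flat for a multi-gigabyte source database.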

Thanks!

#2

Unused space in the database is automatically allocated for new stored data, so it is not lost, and usually there is no need to reclaim it.

You may be able to get a vacuum effect by running the Doctor in repair mode. The newly generated database is expected to be compact. Currently this is available only offline, not while the database is open.
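For reference, the Doctor is run from the command line with objectdb.jar on the classpath; passing a second path puts it in repair mode and writes a new database file. The file paths below are placeholders:

```shell
# Diagnosis only (no changes made to the database):
java -cp objectdb.jar com.objectdb.Doctor data/my.odb

# Repair mode: writes a new (compacted) database file, leaving the original intact:
java -cp objectdb.jar com.objectdb.Doctor data/my.odb data/my-new.odb
```

Because the original file is left untouched, the new file can be verified before it replaces the old one.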


ObjectDB Support
#3

Thank you for this information.  We have executed the Doctor a couple of times on some of the larger databases we have.  The results were somewhat surprising: neither execution of the Doctor found any errors.

Original File Size    Post-Doctor File Size    Outcome
17,281,068            14,038,528               18.8% reduction
18,572,690            22,776,320               22.6% increase in size

Can you explain why a database would increase in size after running the Doctor utility?

#4

It is possible for the new database to be larger after running the Doctor because of the use of B-trees, in which pages are at least half full but usually not completely full; the exact size depends on the order in which the data is inserted. Any unused space in a B-tree is normal, however, and may help accelerate future operations.

In any case, there is probably not much wasted space in these databases, either before or after running the Doctor.

50 GB is not very large for a modern drive (including SSD). Why is it important to compact the database to a minimum?


ObjectDB Support
#5

We are trying to optimize the database size because of the time it takes to perform an online backup of large databases.  Our organization requires daily backups of our data, and we have a limited window of time in which to perform them.  We've looked into the replication strategy that ObjectDB offers; however, since we cannot guarantee that data from the master is replicated to the slave, the slaves cannot be trusted as a backup.  We have recently been granted permission from our business to have a monthly maintenance window in which we can bring the database down for a couple of hours, so we are investigating other strategies such as the Doctor utility.

#6

> since we cannot guarantee data from the master is replicated to the slave, the slaves cannot be trusted as a backup.

Actually, backing up the replicated database is the recommended solution. It also provides an additional live backup, which could be more useful than the daily backup in case of a disaster, since it is more up to date.

Maybe you can check the replicated database for some recent transaction before running a backup. Since replication is sequential, if that transaction is available then all the preceding transactions are also expected to be available in the replicated database.
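The check described above can be sketched with plain JPA: the application commits a marker object on the master after its last transaction, then polls the slave for it before starting the backup. The `BackupMarker` entity and the sequence scheme below are hypothetical illustrations, not an ObjectDB API:

```java
import javax.persistence.*;

public class ReplicationCheck {

    // Hypothetical marker entity: the application commits one of these on the
    // master after the final transaction it wants covered by the backup.
    @Entity
    public static class BackupMarker {
        @Id long sequence;
    }

    /**
     * Returns true if the slave has received the marker with the given sequence.
     * Because replication is sequential, a present marker implies that all
     * earlier transactions have also been replicated.
     */
    static boolean slaveHasMarker(EntityManagerFactory slaveEmf, long sequence) {
        EntityManager em = slaveEmf.createEntityManager();
        try {
            return em.find(BackupMarker.class, sequence) != null;
        } finally {
            em.close();
        }
    }
}
```

A backup script would commit marker N on the master, wait until `slaveHasMarker(slave, N)` returns true, and only then back up the slave's database file.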

ObjectDB Support
#7

Based on your feedback, I think you may have misinterpreted my meaning.  We do not use replication because ObjectDB does not guarantee that every request received by the master is replicated to the slave server(s).  Therefore we cannot use replication, because the slave has the potential to differ from the master.  Without that guarantee, we cannot rely on the feature, nor go through additional measures to "validate" that replication worked.  Thank you for your feedback; I believe we have recently arrived at a strategy that will work going forward.  We'll reach out to you for additional help if our newest backup strategy doesn't work.  Warm regards.

#8

Happy to hear that you found a solution. However, could you please clarify the following statement:

> We do not use replication because ObjectDB does not guarantee that every request received by the master is replicated to the slave server(s).

What exactly do you mean? Every request is replicated. ObjectDB cannot guarantee the exact timing of every transaction's replication, but if the slave and master servers are running and there is a connection between them, replication will be completed. Even after a failure, restarting the slave and its connection to the master should complete all missing updates. You can also check for a specific update in the slave; if it exists, all preceding updates also exist.

ObjectDB Support
#9

Kevinwh,

I'm looking into ObjectDB for a new product I'm working on and am interested in why you say "ObjectDb does not guarantee that every request received by the master is replicated to the slave server(s)". Did you experience issues with data not being replicated to a slave?

I'm also interested in the other backup strategy you found, if you can share it.

Thanks

