ObjectDB

Issue #2681: Online Backup degrades responsiveness.

Type: Bug Report
Version: 2.8.5_04
Priority: High
Status: Fixed
Replies: 14
#1

I've had an hourly job running the following for many months now, and it was fine until the last 2 days:

        try {
            CoreService.getLogger(this).info("DB Backup Start...");
            TypedQuery<Thread> backupQuery =
                em.createQuery("objectdb backup", Thread.class);
            Thread backupThread = backupQuery.getSingleResult();
            backupThread.join(); 
            CoreService.getLogger(this).info("DBService Backup DONE!");
        } finally {
            em.close();
        }

For the last 2 days I have noticed that the server, or mainly the database, starts having trouble responding toward the end of the backup process. For example, the backup usually takes about 5 minutes. Two to three minutes after "DB Backup Start..." is logged, lots of players drop off and my server can't respond properly whenever DB calls are involved. Once "DBService Backup DONE!" is logged, everything goes back to normal within 500ms.

 

As I said, this only started happening in the last 2 days, and only when the server is busy, i.e. when the DB is being hit harder during the busy time of the day.

The DB is otherwise working fine, but it has grown to 20 GB with 79 million records. I don't know whether the size of the DB is making the backup process behave badly. For now I have scheduled the backup to run only at 2:30am so that I get less disruption. That's not ideal, so I hope you can help by trying to reproduce the issue with a large DB like the one I described and about 100 concurrent threads reading and writing to the DB per second.

Thanks

#2

Actually, I have a feeling it is due to the backup causing more disk activity than the disk can handle: when I copy very large files to and/or from the drive where the database file resides (as a test), I get the same symptoms.

Is there a way to make the backup go slower and lower the disk usage for backup?

I've upgraded to a drive that is 4x faster, which did lower the error count, but it still doesn't completely fix the issue, and I'm not sure how scalable or costly this approach would be as the DB gets bigger.

#3

It doesn't seem too complicated to support a configurable delay (a sleep, in milliseconds), e.g. after every 1MB copied. Would you be able to test a new build with such a feature?
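
For readers who want to picture the idea, here is a minimal sketch of a copy loop that sleeps after every fixed-size chunk. It is only an illustration of the general technique and is not ObjectDB's implementation; the class name `ThrottledCopy`, the method `copy`, and its parameters are hypothetical, and in-memory streams stand in for the database and backup files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.OutputStream;

// Hypothetical illustration only; NOT ObjectDB's backup code.
class ThrottledCopy {

    /**
     * Copies everything from in to out and sleeps for sleepMillis each time
     * another chunkBytes bytes have been written. Returns the number of pauses.
     */
    static int copy(InputStream in, OutputStream out, int chunkBytes, long sleepMillis)
            throws IOException {
        if (chunkBytes <= 0 || sleepMillis < 0) {
            throw new IllegalArgumentException("chunkBytes must be > 0 and sleepMillis >= 0");
        }
        byte[] buffer = new byte[8192];
        long bytesSincePause = 0;
        int pauses = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            bytesSincePause += n;
            // Pause once for every completed chunk, giving the disk time
            // to serve other requests.
            while (bytesSincePause >= chunkBytes) {
                bytesSincePause -= chunkBytes;
                pause(sleepMillis);
                pauses++;
            }
        }
        return pauses;
    }

    private static void pause(long millis) throws IOException {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new InterruptedIOException("copy interrupted while throttling");
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[3 * 1024 * 1024]; // stand-in for a database file
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pauses = copy(new ByteArrayInputStream(data), out, 1024 * 1024, 5);
        System.out.println("copied " + out.size() + " bytes with " + pauses + " pauses");
    }
}
```

The trade-off is that a longer sleep per chunk lowers the sustained disk load but lengthens the backup proportionally, so the value would have to be tuned against the real workload.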

ObjectDB Support
#4

Thanks for your help. I think that might help with the issue. Can you make it configurable via Java, e.g.:

backupQuery.setParameter("sleepInMillisPerMB", 500); // sleep for 500ms every 1MB

That way I can test it live in prod and see the effects and adjust accordingly.

 

#5

Just wanted to see what's going on with this.  Haven't heard back for a while.

#6

Your request in #4 is implemented in build 2.8.6_01. Please try it.

ObjectDB Support
#7

Thanks, I'll report back.

#8

So I have tried it, but it's not working.

The following works as usual:

        EntityManager em = getEM();
        try {
            getLogger(this).info("Backup Start...");
            TypedQuery<Thread> backupQuery = em.createQuery("objectdb backup", Thread.class);
            Thread backupThread = backupQuery.getSingleResult();
            backupThread.join(); 
            getLogger(this).info("Backup DONE!");
        } finally {
            em.close();
        }

 

The following does not cause the DB to start the backup: no backup file is generated, and "Backup Start..." is logged, followed by "Backup DONE!" within a few milliseconds.

 

        EntityManager em = getEM();
        try {
            getLogger(this).info("Backup Start...");
            TypedQuery<Thread> backupQuery = em.createQuery("objectdb backup", Thread.class);
            backupQuery.setParameter("sleepInMillisPerMB", 500L); // sleep for x ms every 1MB
            Thread backupThread = backupQuery.getSingleResult();
            backupThread.join(); 
            getLogger(this).info("Backup DONE!");
        } finally {
            em.close();
        }
#9

Please try 500 instead of 500L, as in #4. The expected parameter type is Integer. This is a good point, and the next build will accept any type of Number.

However, it is strange that no backup is created. I just tried it, and 500L is silently ignored, but a backup is generated.
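
As an aside, the difference between accepting only Integer and accepting any Number can be sketched as below. This is an assumption-laden illustration rather than ObjectDB's code: the class `BackupParams` and the method `sleepMillisPerMb` are hypothetical names, and the choice to reject non-numeric values loudly (instead of ignoring them silently) is a design decision made for this example.

```java
// Hypothetical illustration only; NOT ObjectDB's parameter handling.
final class BackupParams {

    private BackupParams() {
    }

    /**
     * Converts a query parameter value to a sleep time in milliseconds.
     * Accepts any Number (Integer, Long, Short, ...) so that both 500 and
     * 500L work; a missing value means "no throttling".
     */
    static int sleepMillisPerMb(Object value) {
        if (value == null) {
            return 0; // parameter not set: do not throttle
        }
        if (!(value instanceof Number)) {
            // Fail loudly rather than silently ignoring the parameter.
            throw new IllegalArgumentException(
                "sleepInMillisPerMB must be a Number, got " + value.getClass().getName());
        }
        long millis = ((Number) value).longValue();
        if (millis < 0 || millis > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("sleepInMillisPerMB out of range: " + millis);
        }
        return (int) millis;
    }

    public static void main(String[] args) {
        System.out.println(sleepMillisPerMb(500));  // Integer argument
        System.out.println(sleepMillisPerMb(500L)); // Long argument
    }
}
```

Checking against `Number` instead of `Integer` means callers do not have to know the exact boxed type the database expects.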

ObjectDB Support
#10

Unfortunately, I tried an int first and then the L suffix; both had the same result.

#11

Are you using embedded mode or client-server mode?

If you tried client-server mode, can you try embedded mode?

It is unclear how the use of this parameter can stop backup.

ObjectDB Support
#12

I use client-server mode and have never tried embedded mode. I am not sure I can use embedded mode for my application; it would be a big change.

#13

There was indeed an issue with this feature in client-server mode.

Please try build 2.8.6_02.

ObjectDB Support
#14

Thanks, it has managed to throttle the throughput. Now I just have to see whether this actually fixes the original errors.

#15

This has been confirmed to fix the original problem in production.  Thanks so much again for the unmatched support!
