Thank you for the test case. Build 2.6.3_05 includes some required fixes.
To pass your test you will have to use the new build and also tune the configuration as follows.
<temp path=... threshold="1mb" />
The threshold is the maximum size of a memory-based list. Every flush is stored as a separate list, and 10,000 entity objects (as in your current test) consume about 2MB. Therefore, with a threshold higher than that, every flush will be stored in memory rather than in a temporary file. Unfortunately this setting is per list and not global. It may be better to use a larger threshold, e.g. 10mb, but also to flush every 100,000 entity objects instead of every 10,000 entity objects.
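The arithmetic behind these numbers can be checked with a small sketch. It is only a rough estimate under stated assumptions: the figure of about 2MB per 10,000 entity objects is taken from this test, the class and method names (FlushSizing, estimatedFlushBytes, spillsToTempFile) are made up for illustration and are not part of ObjectDB, and the exact comparison ObjectDB applies at the threshold boundary is assumed.

```java
// Rough sizing sketch, not ObjectDB code. All names here are hypothetical.
final class FlushSizing {
    static final long MB = 1024L * 1024L;

    // Assumption taken from the test above: 10,000 entity objects take about 2MB.
    static final long BYTES_PER_10K_ENTITIES = 2 * MB;

    // Estimated size of the list produced by one flush of the given batch size.
    static long estimatedFlushBytes(long entitiesPerFlush) {
        return entitiesPerFlush * BYTES_PER_10K_ENTITIES / 10_000;
    }

    // True if the estimated list is larger than the threshold, i.e. it would be
    // expected to go to a temporary file instead of staying in memory.
    static boolean spillsToTempFile(long entitiesPerFlush, long thresholdBytes) {
        return estimatedFlushBytes(entitiesPerFlush) > thresholdBytes;
    }

    public static void main(String[] args) {
        // 10,000 per flush with threshold 1mb: about 2MB per list
        System.out.println(spillsToTempFile(10_000, 1 * MB));
        // 10,000 per flush with threshold 10mb: every list would stay in memory
        System.out.println(spillsToTempFile(10_000, 10 * MB));
        // 100,000 per flush with threshold 10mb: about 20MB per list
        System.out.println(spillsToTempFile(100_000, 10 * MB));
    }
}
```

Under these assumptions both suggested combinations (1mb with 10,000 objects per flush, 10mb with 100,000 objects per flush) keep each flushed list above the threshold, so it goes to a temporary file rather than the heap.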
<cache ref="weak" level2="0" />
When the 2nd level cache is enabled, ObjectDB maintains an in-memory list of all the flushed entity IDs, in order to avoid retrieving obsolete content from the L2 cache. In your test the PKs are large (32 characters, and about 160 bytes overhead in total per entity object), so millions of objects will require a larger heap size.
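To get a feeling for the heap cost of that ID list, here is a back-of-the-envelope sketch. It assumes roughly 160 bytes per flushed entity, as estimated above for this test; the class and method names are hypothetical, and real usage depends on the JVM and on the actual key type.

```java
// Back-of-the-envelope estimate, not ObjectDB code. All names are hypothetical.
final class IdListHeapEstimate {
    // Assumption from the estimate above: about 160 bytes per flushed entity ID.
    static final long BYTES_PER_ID = 160;

    // Estimated heap held by the in-memory list of flushed IDs.
    static long estimatedBytes(long flushedEntities) {
        return flushedEntities * BYTES_PER_ID;
    }

    // Whole mebibytes, rounded down.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long[] transactionSizes = {1_000_000L, 5_000_000L, 10_000_000L};
        for (long n : transactionSizes) {
            System.out.println(n + " entities: about "
                    + toMiB(estimatedBytes(n)) + " MiB for the ID list");
        }
    }
}
```

With these assumptions one million flushed objects already account for roughly 150 MiB, and ten million for about 1.5 GiB, which is why switching the 2nd level cache off helps for very large transactions.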
As discussed above, this implementation is a temporary solution, and should be used only if there are no other options. It is also slow. A much better solution (faster and with unlimited transaction size) is now planned for ObjectDB 3.0, but will not be released before 2017.