Query on primary key is slow when using 'IN'

#1

Hi! We have an entity with a composite primary key on "primaryVariantOid" and "storeOid" fields:

(I removed unnecessary fields and methods to make the code shorter and hopefully compile with just the JDK)

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.IdClass;
import java.io.Serial;
import java.io.Serializable;
import java.util.Objects;

@Entity
@IdClass(ProductVariantStoreData.DatabaseId.class)
public final class ProductVariantStoreData implements Serializable {
    
    @Serial
    private static final long serialVersionUID = -4357827914123980944L;
    
    @Id
    private String productVariantOid;
    
    @Id
    private String storeOid;
    
    private ProductVariantStoreData() {}
    
    public ProductVariantStoreData(String productVariantOid, String storeOid) {
        this.productVariantOid = productVariantOid;
        this.storeOid = storeOid;
    }
    
    public static final class DatabaseId {
        
        final String productVariantOid;
        final String storeOid;
        
        public DatabaseId(String productVariantOid, String storeOid) {
            this.productVariantOid = productVariantOid;
            this.storeOid = storeOid;
        }
        
        @Override
        public boolean equals(Object obj) {
            return obj instanceof DatabaseId other &&
                    Objects.equals(productVariantOid, other.productVariantOid) &&
                    Objects.equals(storeOid, other.storeOid);
        }
        
        @Override
        public int hashCode() {
            return Objects.hash(productVariantOid, storeOid);
        }
    }
}

We have about 200M instances of this entity.

The following query finishes instantly:

SELECT FROM ProductVariantStoreData d
WHERE d.storeOid = '1205450'
  AND (d.productVariantOid = '1344763' OR d.productVariantOid = '1344770')
Query plan 1/2 description
============================

[Step 1]
Scan type eu.extech.quant.product_variants.data.ProductVariantStoreData
locating ProductVariantStoreData (d) instances that satisfy:
  (d.productVariantOid='1344770').

[Step 2]
Scan type eu.extech.quant.product_variants.data.ProductVariantStoreData
locating ProductVariantStoreData (d) instances that satisfy:
  (d.productVariantOid='1344763').

[Step 3]
Merge the results of steps 1, 2 using OR.

[Step 4]
Filter the results of step 3
retaining only results that satisfy:
  (d.storeOid='1205450').

[Step 5]
Group the results of step 4 by:
  d.

[Step 6]
Remove duplicates from the groups of step 5.

[Step 7]
Apply selection and prepare final results.

<distinctPlan 1.7056 3,83/4,13 distinct(d)>
  <filterPlan 1.4503 3,83/4,13 (d.storeOid='1205450')>
    <mergePlan or 1.4292 3,81/4,13 0>
      <btreePlan 1.0142 2,72/4,13 d(ProductVariantStoreData) type(ProductVariantStoreData['1344770'->'1344770'*:notNull]) />
      <btreePlan 1.0142 2,72/4,13 d(ProductVariantStoreData) type(ProductVariantStoreData['1344763'->'1344763'*:notNull]) />
    </mergePlan>
  </filterPlan>
</distinctPlan>

However, if I change the two OR'd equalities to IN, the query suddenly takes 2.5 minutes:

SELECT FROM ProductVariantStoreData d
WHERE d.storeOid = '1205450'
  AND d.productVariantOid IN ('1344763', '1344770')
Query plan 1/1 description
============================

[Step 1]
Scan type eu.extech.quant.product_variants.data.ProductVariantStoreData
locating ProductVariantStoreData (d) instances that satisfy:
  and(contains(list('1344763','1344770'),d.productVariantOid),(d.storeOid='1205450')).

[Step 2]
Apply selection and prepare final results.

<btreePlan 23.4232 6,87/7,56 d(ProductVariantStoreData) type(ProductVariantStoreData[and(contains(list('1344763','1344770'),d.productVariantOid),[pos<1>'1205450'->'1205450':notNull])]) />

The same happens programmatically when using javax.jdo.Query, where I declare HashSet parameters for the searched values, and set the query filter to:

param_productVariantOids.contains(this.productVariantOid) &&
param_storeOids.contains(this.storeOid)

I would assume this would have the same query plan as the "IN" query, though I haven't checked because I don't know how to do that in Explorer's Query tab.

I was able to work around this by declaring a separate parameter for each value and OR-ing them together, instead of using the HashSet, but it would be nice to improve the optimizer to get the same performance for queries. Thanks!

#2

There is another issue that causes a complete scan:

SELECT FROM ProductVariantStoreData d
WHERE d.storeOid = '1205450'
  AND FALSE

This query will obviously not find anything, but it still takes 2 minutes to complete. Not being able to use "AND FALSE" makes it awkward to configure the query programmatically if some deeply nested condition is known to always be false.

Query plan 1/1 description
============================

[Step 1]
Scan type eu.extech.quant.product_variants.data.ProductVariantStoreData
locating ProductVariantStoreData (d) instances that satisfy: 
  and(false,(d.storeOid='1205450')).

[Step 2]
Apply selection and prepare final results.

<btreePlan 23.4232 6,86/7,56 d(ProductVariantStoreData) type(ProductVariantStoreData[and(false,[pos<1>'1205450'->'1205450':notNull])]) />

The same issue happens when a HashSet parameter is empty; then `param_productVariantOids.contains(this.productVariantOid)` will obviously never find anything, but the scan still happens. This is what I discovered first, and I was hoping that if the parameter is empty, I could substitute the condition with `&& false` to nudge ObjectDB into immediately returning an empty result, but it still did a full scan.

Reply