Hi! We have an entity with a composite primary key on "primaryVariantOid" and "storeOid" fields:
(I removed unnecessary fields and methods to make the code shorter and hopefully compile with just the JDK)
import javax.persistence.Entity; import javax.persistence.Id; import javax.persistence.IdClass; import java.io.Serial; import java.io.Serializable; import java.util.Objects; @Entity @IdClass(ProductVariantStoreData.DatabaseId.class) public final class ProductVariantStoreData implements Serializable { @Serial private static final long serialVersionUID = -4357827914123980944L; @Id private String productVariantOid; @Id private String storeOid; private ProductVariantStoreData() {} public ProductVariantStoreData(String productVariantOid, String storeOid) { this.productVariantOid = productVariantOid; this.storeOid = storeOid; } public static final class DatabaseId { final String productVariantOid; final String storeOid; public DatabaseId(String productVariantOid, String storeOid) { this.productVariantOid = productVariantOid; this.storeOid = storeOid; } @Override public boolean equals(Object obj) { return obj instanceof DatabaseId other && Objects.equals(productVariantOid, other.productVariantOid) && Objects.equals(storeOid, other.storeOid); } @Override public int hashCode() { return Objects.hash(productVariantOid, storeOid); } } }
We have about 200M instances of this entity.
The following query finishes instantly:
SELECT FROM ProductVariantStoreData d WHERE d.storeOid = '1205450' AND (d.productVariantOid = '1344763' OR d.productVariantOid = '1344770')
Query plan 1/2 description ============================ [Step 1] Scan type eu.extech.quant.product_variants.data.ProductVariantStoreData locating ProductVariantStoreData (d) instances that satisfy: (d.productVariantOid='1344770'). [Step 2] Scan type eu.extech.quant.product_variants.data.ProductVariantStoreData locating ProductVariantStoreData (d) instances that satisfy: (d.productVariantOid='1344763'). [Step 3] Merge the results of steps 1, 2 using OR. [Step 4] Filter the results of step 3 retaining only results that satisfy: (d.storeOid='1205450'). [Step 5] Group the results of step 4 by: d. [Step 6] Remove duplicates from the groups of step 5. [Step 7] Apply selection and prepare final results. <distinctPlan 1.7056 3,83/4,13 distinct(d)> <filterPlan 1.4503 3,83/4,13 (d.storeOid='1205450')> <mergePlan or 1.4292 3,81/4,13 0> <btreePlan 1.0142 2,72/4,13 d(ProductVariantStoreData) type(ProductVariantStoreData['1344770'->'1344770'*:notNull]) /> <btreePlan 1.0142 2,72/4,13 d(ProductVariantStoreData) type(ProductVariantStoreData['1344763'->'1344763'*:notNull]) /> </mergePlan> </filterPlan> </distinctPlan>
However, if I change the two OR'd equalities to IN, the query suddenly takes 2.5 minutes:
SELECT FROM ProductVariantStoreData d WHERE d.storeOid = '1205450' AND d.productVariantOid IN ('1344763', '1344770')
Query plan 1/1 description ============================ [Step 1] Scan type eu.extech.quant.product_variants.data.ProductVariantStoreData locating ProductVariantStoreData (d) instances that satisfy: and(contains(list('1344763','1344770'),d.productVariantOid),(d.storeOid='1205450')). [Step 2] Apply selection and prepare final results. <btreePlan 23.4232 6,87/7,56 d(ProductVariantStoreData) type(ProductVariantStoreData[and(contains(list('1344763','1344770'),d.productVariantOid),[pos<1>'1205450'->'1205450':notNull])]) />
The same happens programmatically when using javax.jdo.Query, where I declare HashSet parameters for the searched values, and set the query filter to:
param_productVariantOids.contains(this.productVariantOid) && param_storeOids.contains(this.storeOid)
I would assume this would have the same query plan as the "IN" query, though I haven't checked because I don't know how to do that in Explorer's Query tab.
I was able to work around this by declaring a separate parameter for each value and OR-ing them together, instead of using the HashSet, but it would be nice to improve the optimizer to get the same performance for queries. Thanks!