Index Definition

Querying without indexes requires iteration over entity objects in the database one by one. This may take a significant amount of time if many entity objects have to be examined. Using proper indexes the iteration can be avoided and complex queries over millions of objects can be executed quickly. Index management introduces overhead in terms of maintenance time and storage space, so deciding which fields to define with indexes should be done carefully.

This page covers the following topics:

Single Field Index Composite Index Multi Part Path Index Indexes in Queries

Single Field Index

The following entity definition uses JDO’s @Index and @Unique annotations to define indexes (JPA's @Index cannot be applied to fields):

import javax.jdo.annotations.Index;
import javax.jdo.annotations.Unique;

@Entityjavax.persistence.Entity - JPA Annotation
 Specifies that the class is an entity.
public class EntityWithSimpleIndex {
    @Index String indexedField1;
    @Index(unique="true") int indexedField2; // unique
    @Index(name="i3") int indexedField3;
    @Unique Integer indexedField4; // unique
    @Unique(name="u2") Date indexedField5; // unique
}

@Unique represents a unique index that prevents duplicate values in the indexed field.
A PersistenceException is thrown on commit (or flush) if different entities have the same value in a unique field (similar to how primary keys behave).

@Index represents either an ordinary index with no unique constraint or a unique index if unique="true"is specified (the default is false).

The optional name attribute has no specific role but might be presented in the ObjectDB Explorer and in logging.

When an entity object is stored in the database every indexed field must contain either null or a value of one of the following persistable types:

Primitive types: boolean, byte, short, char, int, long, float, double.
Equivalent wrapper classes from package java.lang:
Byte, Short, Character, Integer, Long, Float, Double.
java.math.BigInteger, java.math.BigDecimal.
java.lang.String.
java.util.Date, java.sql.Date, java.sql.Time, java.sql.Timestamp.
Any enum type.
Reference to an entity object.
Arrays and collections that contain values of the above types (including null).

Indexes can only be defined on ordinary persistent fields (not on primary key / version fields).

Composite Index

A composite index is an index on more than one persistent field. It is defined by specifying multiple fields in the members attribute of the @Index or @Unique annotations:

@Entityjavax.persistence.Entity - JPA Annotation
 Specifies that the class is an entity.
@Index(members={"lastName","firstName"})
public class EntityWithCompositeIndex {
    String firstName;
    String lastName;
}

When indexed fields are specified explicitly in the members attribute, as shown above, the @Index or @Unique annotation can be attached to either the class or to any persistent field in the class.

Multiple @Index annotations can be wrapped with an @Indices annotation:

@Entityjavax.persistence.Entity - JPA Annotation
 Specifies that the class is an entity.
@Indices({
    @Index(members={"lastName","firstName"}),
    @Index(members={"firstName"}, unique="true")
})
public class EntityWithCompositeIndex {
    String firstName;
    String lastName;
}

Similarly, the @Uniques annotation can wrap multiple @Unique annotations.

As shown above, the members attribute can also be used for a single field index. This is useful if you want to centralize your index definitions at the top of the class and is equivalent to omitting members and attaching the @Index annotation directly to the indexed field.

Multi Part Path Index

The members attribute is also required in order to define indexes on multi part paths:

@Entityjavax.persistence.Entity - JPA Annotation
 Specifies that the class is an entity.
@Index(members={"address.city"})
public class Employee {
    String firstName;
    String lastName;
    Address address;
     :
}

@Embeddable
javax.persistence.Embeddable - JPA Annotation
 Specifies a class whose instances are stored as an intrinsic
 part of an owning entity and share the identity of the entity.class Address {
    String street;
    String city;
     :
}

Indexes must always refer to values that are stored as part of the entity. Therefore, indexes on multi part paths are only allowed when using embeddable classes as fields of embedded object are stored as part of the containing entity.

Composite indexes on multi part paths are also allowed:

@Entityjavax.persistence.Entity - JPA Annotation
 Specifies that the class is an entity.
@Index(members={"addresses.city,addresses.street"})
public class Employee {
     :
    List<Address> addresses;
     :
}

Notice that the paths include a collection, so multiple values will be maintained by the index for every entity.

Multi part paths in a composite index must have the same length. Therefore, the following index definition is invalid:

@Entityjavax.persistence.Entity - JPA Annotation
 Specifies that the class is an entity.
@Index(members={"lastName", "address.city"}) // INVALID
public class Employee {
    String firstName;
    String lastName;
    Address address;
     :
}

Indexes in Queries

ObjectDB manages a BTree for every index. A BTree is an ordered map data structure that ObjectDB maintains in the file system rather than in memory. The keys of the BTree are all the unique values in the indexed field (or arrays of values in composite indexes), in all the entities of that class (including subclasses). Every key is associated with a list of references to the entities that contain that value.

Indexes require maintenance time and consume storage space. Therefore, using indexes wisely requires an understanding of how ObjectDB uses indexes to accelerate query execution.

Indexes are especially efficient in lookup and range queries:

  SELECT p FROM Point p WHERE p.x = 100
  SELECT p FROM Point p WHERE p.x BETWEEN 50 AND 80
  SELECT p FROM Point p WHERE p.x >= 50 AND p.x <= 80

By using an index on field x, ObjectDB can find the results using a range scan, which is very efficient because only branches of the BTree that are relevant are iterated.

A composite index on fields x and y can also be used for the queries above. In this case the order of the fields in the composite index definition is important, because it determines which field is used as the BTree's primary sort key and which field is used as a secondary sort key. If x is specified first, the BTree is ordered by x values, so a range scan on x values is supported. On the other hand, if y is specified first, the BTree is ordered by y values, so a full BTree scan is required to find all the objects with relevant x values. A full index scan is less efficient than a range scan, but might be still more efficient than iteration over the entities data.

A composite index on fields x and y enables quick execution of the following queries:

  SELECT p FROM Point p WHERE p.x = 100 AND p.y = 100
  SELECT p FROM Point p WHERE p.x = 100 AND p.y BETWEEN 50 AND 80

For the second query above, the index order should be x first and y second.

Indexes on collections are useful in JOIN queries:

  SELECT d FROM Document d JOIN d.words w WHERE w = 'JPA'

The Document entity class contains a words field whose type is List<String>. The query retrieves the documents that contain "JPA". An index on words will manage for every word all the documents that contain it. Therefore, using such index, the query above can be executed by a quick index range scan.

The same collection index can also be used for executing the following query:

  SELECT d FROM Document d JOIN d.words w WHERE LENGTH(w) >= 10

But this time a full index scan is required because the index uses the lexicographic order of the words and is not ordered by the length of the words.

The bottom line is that if there is an index that contains all the fields in the WHERE clause of a given query that query will run faster as it will be able to, at the very least, utilize a full index scan. Even better, if the field order as defined in the index matches the field order used in the WHERE clause of the query a more efficient range scan can be performed.

ObjectDB also uses indexes for sorting results and for projection:

  SELECT MIN(p.x) FROM Point p WHERE p.x < p.y ORDER BY p.y

In the above example, a composite index on fields x and y would cover all the fields in the query (including the WHERE and ORDER BY clauses) saving the need to access the entities themselves. Instead, because of the comparison in the WHERE clause, a full index scan would be performed. Additionally, if field y was the first field defined in the index the results would already be produced in the requested order (the order of the scan) thus eliminating the need for a separate sort.

Finally, indexes are also used in MIN and MAX queries:

  SELECT MIN(p.x), MAX(p.x) FROM Point p

Given an index on field x ObjectDB can simply return the first and last key in the BTree, without any iteration.

< Generated Values ^ Entity Classes Schema Evolution >