Indexes

A Btree Example, http://commons.wikimedia.org/wiki/File:Btree.png

Indexes are key to achieving high performance in MongoDB, as they allow the database to search through less documents to satisfy a query. Without an index MongoDB has to scan through all of the documents in a collection to fulfill the query.

An index increases the amount of storage needed to represent a document and the time it takes to insert a document. Indexes trade off faster queries against storage space.

One of the principal things to remember about indexes, is that they are inclusive. Inclusive means they can only answer questions about documents that have been included in the index.

tip

Gotchas

$nin and $ne queries cannot be answered by indexes, and force collection scans. If you need to use these ensure you are filtering down using indexes as much as possible and leaving the $nin and $ne terms to the very last part of the query selector.

MongoDB has several types of indexes

  • Single field indexes
  • Compound indexes
  • Multi key indexes
  • Geo-spatial indexes
  • Text indexes

It also supports a couple of variations of the above indexes

  • Sparse indexes
  • Unique indexes
  • Time To Live indexes
  • Covered Index Queries

Single field indexes

Take the following document

{
  _id: ObjectId("523cba3c73a8049bcdbf6007"),
  name: 'Peter Jackson',
  age: 50,
  nationality: "New Zealand",
  address: {
    street: "Some Street 22"
  },
  department: {
    floor: 1,
    building: 1
  }
}

Figure: An example document

Let’s look at some different ways we can apply a single field index

var values = db.getSisterDB("indexes").values;
values.ensureIndex({name: 1});

Example 1: A single field index

This indexes the name field in ascending order.

var values = db.getSisterDB("indexes").values;
values.ensureIndex({"address.street": 1});

Example 2: A single embedded field index

This indexes the street field, in the embedded document, under the address field.

var values = db.getSisterDB("indexes").values;
values.ensureIndex({department: 1});

Example 3: A embedded document field index

This indexes the department embedded document, allowing for strict equality matches on the it. It will only match on the query for a embedded document, that contains all the fields in the indexed embedded document.

var values = db.getSisterDB("indexes").values;
values.findOne({department: {floor: 1, building: 1}});

Example 4: Embedded document match

Compound indexes

A compound index is an index that contains references to multiple fields within a document.

var values = db.getSisterDB("indexes").values;
values.ensureIndex({nationality: 1, age: -1, name: 1});

Example 5: Create compound index

The compound indexes have some interesting properties. The index is usable if you have a query that includes nationality, age and name. But it’s also able to answer other queries using the index.

  1. Any query starting with nationality
  2. Any query starting with nationality and age
  3. Any query starting with nationality, age and name
  4. Any query starting with nationality and name

For compound indexes order matters as we match from left to right. For example if you reverse a query to start with name and age, it will not match the order of fields in the compound index, and MongoDB will not be able to use the index to speed up the query.

tip

Compound Index Field Order

Always make sure the order of fields in a compound index matches the order of fields in the queries you want to execute against the collection.

One thing to note about using a compound index is sorting. The ordering and direction of the fields in a compound index not only decide if it’s possible to use the index in the query, but are also used for sorting.

Given the index {nationality: 1, age: -1, name: 1}, we can support the following sorting leveraging the index.

var values = db.getSisterDB("indexes").values;
values.find().sort({nationality: 1, age: -1}).toArray();
values.find().sort({nationality: -1, age: 1}).toArray();
values.find().sort({nationality: -1, age: 1, name: -1}).toArray();
values.find().sort({nationality: 1, age: -1, name: 1}).toArray();

Example 6: Create compound index

Sorting can only use the index if they match the order specified or the exact reverse order specified.

Multi key indexes

Multi key indexes lets MongoDB index arrays of values. Take the following example document.

{
  "title": "Superman",
  "tags": ["comic", "action", "xray"],
  "issues": [
    {
      "number": 1,
      "published_on": "June 1938"
    }
  ]
}

Figure: Sample document

Multi key indexes lets us search on the values in the tags array, as well as in the issues array. Let’s create two indexes to cover both.

var comics = db.getSisterDB("store").comics;
comics.ensureIndex({tags: 1});
comics.ensureIndex({issues: 1});

Example 7: Create Multikey fields

The two indexes lets you do exact matches on values in the tags and issues arrays of values.

var comics = db.getSisterDB("store").comics;
comics.find({tags: "action"});
comics.find({issues: {number: 1, published_on: "June 1938"}}).toArray();

Example 8: Query using Multikey indexes

The first query will use the index on tags to return the document. The second will use the index on issues to return the document. The second query is dependent on the order of the fields in the documents indexed. If the number and published_on on fields changed order, the second query would fail.

If the document changes structure over time, it would be better to create a specific compound index on the fields needed in the embedded documents. A better index would be:

var comics = db.getSisterDB("store").comics;
comics.ensureIndex({"issues.number":1, "issues.published_on":1});

Example 9: Create Embedded Array Compound Index

To leverage the index, the second query can be written as follows.

var comics = db.getSisterDB("store").comics;
comics.find({
  "issues.number":1,
  "issues.published_on": "June 1938"}).toArray()

Example 10: Query using the Compound Multikey index

Geo-spatial indexes

MongoDB offers several Geo-spatial index types. These indexes make it possible to perform efficient Geo-spatial queries.

important

GEO JSON

MongoDB supports GEO JSON (http://geojson.org) for performing GEO location queries.

Specialized 2d Sphere index

The 2d Geo-spatial Sphere index, allows applications to perform queries on an earth-like sphere, making for higher accuracy in matching locations.

Take the following example document.

{
  loc: {
    type: "Point",
    coordinates: [60, 79]
  },
  type: "house"
}

Figure: A Simple document with location information

Create a 2dsphere index.

var locations = db.getSisterDB("geo").locations;
locations.ensureIndex({loc: "2dsphere", house: 1});

Example 11: Create 2dsphere compound index

Query the index using a GEO JSON type.

var locations = db.getSisterDB("geo").locations;
locations.find({loc: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        coordinates: [[
          [ 0 , 0 ] , [ 0 , 80 ] , [ 80 , 80 ] , [ 80 , 0 ] , [ 0 , 0 ]
        ]]
      }
    }
  }}).toArray();

Example 12: Query using 2dsphere index

important

Gotchas 2dsphere

The 2d sphere index is a pure GeoSpatial index and is limited to the ranges for latitude (-90 to 90) and longitude (-180 to 180). It also only accepts $geometry queries. In return it’s faster and more accurate than the general 2d index.

General 2d index

The 2d index is a flat index that does not take into consideration any projection. One of the benefits of the 2d index is that it allows for fixed lower and upper bounds for the coordinate system as well as the search resolution. This makes for a more general 2d index.

Let’s add a sample document.

var houses = db.getSisterDB("2d").houses;
houses.insert({
  price_rooms: [10000, 3],
  type: "house"
});

Figure: A Simple document with location information

Notice that the price_rooms is just an array. This is because the 2d index is not inherently tied to the GeoJSON format in the same way as the 2dsphere index.

Let’s create a 2d index.

var houses = db.getSisterDB("2d").houses;
houses.ensureIndex({price_rooms: "2d"}, { min: 0, max: 200000, bits: 32 });

Example 13: Create 2d index

Now let’s look for all houses that fall inside the range of 2000 to 20000 in price and have 0 to 5 rooms.

db.houses.find( { price_rooms :
  { $geoWithin : {
      $box : [ [ 2000 , 0 ] , [ 20000 , 5 ] ]
    }
  }
}).toArray();

Example 14: Query for matching houses

tip

2d indexes

The min and max values let you project any 2d data with numeric values into a 2d index, where you can use geo queries like $near, $box etc to cut and slice the data. Once one realizes it’s a generalized 2d index, it becomes quite useful for a range of queries not easily achieved with the standard query operators.

Text indexes

Text search is integrated into the MongoDB query language from version 2.6 (In 2.4 it was available as beta command). It relies on an underlying text index.

Let’s insert some sample documents.

var entries = db.getSisterDB("blogs").entries;
entries.insert( {
  title : "my blog post",
  text : "I'm writing a blog. yay",
  site: "home",
  language: "english" });
entries.insert( {
  title : "my 2nd post",
  text : "this is a new blog I'm typing. yay",
  site: "work",
  language: "english" });
entries.insert( {
  title : "knives are Fun",
  text : "this is a new blog I'm writing. yay",
  site: "home",
  language: "english" });

Figure: Insert some sample documents

Let’s create the text index.

var entries = db.getSisterDB("blogs").entries;
entries.ensureIndex({title: "text", text: "text"}, { weights: {
    title: 10,
    text: 5
  },
  name: "TextIndex",
  default_language: "english",
  language_override: "language" });

Example 14: Create text index

The example shows how weights can be used to control the weighing of fields in subsequent queries against the index. In this case any search that matches title will be ranked higher than a match in the text field.

We also passed a name option that allows us to give the index a custom name.

The default_language option specifies that any document missing a specific language field should default to english.

The option language_override tells the text index to look for individual documents language definition under the language field. If the language field for a specific document is set to for example Spanish, MongoDB will index it using the Spanish stop list and stemming.

Now let’s query for all the blog entries that contain the blog word, and filter by the site field.

var entries = db.getSisterDB("blogs").entries;
entries.find({$text: {$search: "blog"}, site: "home"})

Example 15: Query using text index

The query matches all the documents, that contain the word blog in either the title or text field, and then filters them by the site field. If you wish to retrieve the search score for each matching document, modify the query slightly.

var entries = db.getSisterDB("blogs").entries;
entries.find({$text: {$search: "blog"}, site: "home"},
  {score: {$meta: "textScore"}}).sort({score: {$meta: "textScore"}});

Example 16: Query using text index

The query includes the score given to the individual matched documents and sorts them in descending order by the score.

important

Text Indexes Can Get Big

Text indexes can grow to be bigger than the actual stored documents, and can take a while to build if the collection is big. They also add additional overhead to writes, compared to simpler indexes.

Sparse indexes

Sparse indexes are indexes, where no values are included for fields that do not exist. Look at the following two documents.

var sparse = db.getSisterDB("indexes").sparse;
sparse.insert({ hello: "world", number: 1 });
sparse.insert({ hello: "world" });

Figure: Sample documents

A non-sparse index for the field number, will contain an entry for both documents in the index. A sparse index will contain only the documents that contain the number field. Sparse indexes can be an efficient way of indexing fields that only occur in a certain percentage of documents in a collection, saving disk IO and memory.

To create a sparse index:

var sparse = db.getSisterDB("indexes").sparse;
sparse.ensureIndex({number: 1}, {sparse: true});

Example 17: Create Sparse Index

Unique indexes

An unique index is different from a normal index in that it only allows a single document to exist for a field value.

var unique = db.getSisterDB("indexes").unique;
unique.ensureIndex({number: 1}, {unique: true});

Example 18: Create Unique Index

Now let’s try to insert some documents

var unique = db.getSisterDB("indexes").unique;
unique.insert({ hello: "world", number: 1 });
unique.insert({ hello: "world", number: 1 });

Figure: Sample documents

The second insert fails, due to the existing document with the field number set to 1.

Time To Live indexes

Time to live indexes (TTL) are a special type of index that will remove documents that fail to meet the index condition. One use for TTL indexes is to model a cache where documents expire after a set amount of time, allowing old documents to be gradually removed by MongoDB. This avoids the need of performing bulk deletes of documents using an external application or script.

Let’s insert some documents.

var ttl = db.getSisterDB("indexes").ttl;
ttl.insert({ created_on: new Date() });

Figure: Sample documents

Let’s define an a TTL index on created_on with an expire time of 1000 seconds in the future.

var ttl = db.getSisterDB("indexes").ttl;
ttl.ensureIndex({created_on: 1}, {expireAfterSeconds: 1000});

Example 19: Create TTL index

The TTL will delete any documents where the created_on field is older than 1000 seconds.

important

Notes about TTL

The expireAfterSeconds is not a hard limit. MongoDB will remove any expired documents some time after they have meet the condition.

It’s important to note that Time to Live indexes only with Date based fields.

Covered Index Queries

Covered index queries are queries that can be answered using only the information stored in the index. Basically MongoDB answers the query using the fields stored in an index, never actually materializing the document into memory. Let’s look at an example usage of covered queries.

var covered = db.getSisterDB("indexes").covered;
covered.insert({ text: "hello", site: "home"});
covered.insert({ text: "hello", site: "work" });

Figure: Sample documents

Let’s define the covered index first.

var covered = db.getSisterDB("indexes").covered;
covered.ensureIndex({text: 1, site: 1});

Example 20: Create Covered index

Next perform a covered index query.

var covered = db.getSisterDB("indexes").covered;
covered.find({text: "hello"}, {_id: 0, text:1, site:1});

Example 21: Query using Covered Index

Let’s retrieve the query plan.

var covered = db.getSisterDB("indexes").covered;
covered.find({text: "hello"}, {_id: 0, text:1, site:1}).explain();

Example 22: Query with Explain

This returns a result with all the details about the query plan executed for this query.

{
  "cursor" : "BtreeCursor text_1_site_1",
  "isMultiKey" : false,
  "n" : 2,
  "nscannedObjects" : 0,
  "nscanned" : 2,
  "nscannedObjectsAllPlans" : 0,
  "nscannedAllPlans" : 2,
  "scanAndOrder" : false,
  "indexOnly" : true,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "millis" : 0,
  ...
  "server" : "christkv.local:27017"
}

Figure: Query plan example, field removed for brevity

Notice how the query plan result includes indexOnly set to true. This means that the query was completely covered by the index and MongoDB never touched the documents.

important

Covered Index Gotchas

Notice how {_id: 0, text:1, site:1} excludes _id. A covered index query cannot include the _id field.