- Schema Book
- Indexes
Indexes
Indexes are key to achieving high performance in MongoDB, as they allow the database to search through less documents to satisfy a query. Without an index MongoDB has to scan through all of the documents in a collection to fulfill the query.
An index increases the amount of storage needed to represent a document and the time it takes to insert a document. Indexes trade off faster queries against storage space.
One of the principal things to remember about indexes, is that they are inclusive
. Inclusive
means they can only answer questions about documents that have been included in the index.
tip
Gotchas
$nin
and $ne
queries cannot be answered by indexes, and force collection scans. If you need to use these ensure you are filtering down using indexes as much as possible and leaving the $nin
and $ne
terms to the very last part of the query selector.
MongoDB has several types of indexes
- Single field indexes
- Compound indexes
- Multi key indexes
- Geo-spatial indexes
- Text indexes
It also supports a couple of variations of the above indexes
- Sparse indexes
- Unique indexes
- Time To Live indexes
- Covered Index Queries
Single field indexes
Take the following document
{
_id: ObjectId("523cba3c73a8049bcdbf6007"),
name: 'Peter Jackson',
age: 50,
nationality: "New Zealand",
address: {
street: "Some Street 22"
},
department: {
floor: 1,
building: 1
}
}
Let’s look at some different ways we can apply a single field index
var values = db.getSisterDB("indexes").values;
values.ensureIndex({name: 1});
This indexes the name field in ascending order.
var values = db.getSisterDB("indexes").values;
values.ensureIndex({"address.street": 1});
This indexes the street
field, in the embedded document, under the address
field.
var values = db.getSisterDB("indexes").values;
values.ensureIndex({department: 1});
This indexes the department
embedded document, allowing for strict equality matches on the it. It will only match on the query for a embedded document, that contains all the fields in the indexed embedded document.
var values = db.getSisterDB("indexes").values;
values.findOne({department: {floor: 1, building: 1}});
Compound indexes
A compound index is an index that contains references to multiple fields within a document.
var values = db.getSisterDB("indexes").values;
values.ensureIndex({nationality: 1, age: -1, name: 1});
The compound indexes have some interesting properties. The index is usable if you have a query that includes nationality, age and name. But it’s also able to answer other queries using the index.
- Any query starting with nationality
- Any query starting with nationality and age
- Any query starting with nationality, age and name
- Any query starting with nationality and name
For compound indexes order matters as we match from left to right. For example if you reverse a query to start with name and age, it will not match the order of fields in the compound index, and MongoDB will not be able to use the index to speed up the query.
tip
Compound Index Field Order
Always make sure the order of fields in a compound index matches the order of fields in the queries you want to execute against the collection.
One thing to note about using a compound index is sorting. The ordering and direction of the fields in a compound index not only decide if it’s possible to use the index in the query, but are also used for sorting.
Given the index {nationality: 1, age: -1, name: 1}
, we can support the following sorting leveraging the index.
var values = db.getSisterDB("indexes").values;
values.find().sort({nationality: 1, age: -1}).toArray();
values.find().sort({nationality: -1, age: 1}).toArray();
values.find().sort({nationality: -1, age: 1, name: -1}).toArray();
values.find().sort({nationality: 1, age: -1, name: 1}).toArray();
Sorting can only use the index if they match the order specified or the exact reverse order specified.
Multi key indexes
Multi key indexes lets MongoDB index arrays of values. Take the following example document.
{
"title": "Superman",
"tags": ["comic", "action", "xray"],
"issues": [
{
"number": 1,
"published_on": "June 1938"
}
]
}
Multi key indexes lets us search on the values in the tags
array, as well as in the issues
array. Let’s create two indexes to cover both.
var comics = db.getSisterDB("store").comics;
comics.ensureIndex({tags: 1});
comics.ensureIndex({issues: 1});
The two indexes lets you do exact matches on values in the tags
and issues
arrays of values.
var comics = db.getSisterDB("store").comics;
comics.find({tags: "action"});
comics.find({issues: {number: 1, published_on: "June 1938"}}).toArray();
The first query will use the index on tags
to return the document. The second will use the index on issues
to return the document. The second query is dependent on the order of the fields in the documents indexed. If the number
and published_on
on fields changed order, the second query would fail.
If the document changes structure over time, it would be better to create a specific compound index on the fields needed in the embedded documents. A better index would be:
var comics = db.getSisterDB("store").comics;
comics.ensureIndex({"issues.number":1, "issues.published_on":1});
To leverage the index, the second query can be written as follows.
var comics = db.getSisterDB("store").comics;
comics.find({
"issues.number":1,
"issues.published_on": "June 1938"}).toArray()
Geo-spatial indexes
MongoDB offers several Geo-spatial index types. These indexes make it possible to perform efficient Geo-spatial queries.
important
GEO JSON
MongoDB supports GEO JSON (http://geojson.org) for performing GEO location queries.
Specialized 2d Sphere index
The 2d Geo-spatial Sphere index, allows applications to perform queries on an earth-like sphere, making for higher accuracy in matching locations.
Take the following example document.
{
loc: {
type: "Point",
coordinates: [60, 79]
},
type: "house"
}
Create a 2dsphere index.
var locations = db.getSisterDB("geo").locations;
locations.ensureIndex({loc: "2dsphere", house: 1});
Query the index using a GEO JSON type.
var locations = db.getSisterDB("geo").locations;
locations.find({loc: {
$geoWithin: {
$geometry: {
type: "Polygon",
coordinates: [[
[ 0 , 0 ] , [ 0 , 80 ] , [ 80 , 80 ] , [ 80 , 0 ] , [ 0 , 0 ]
]]
}
}
}}).toArray();
important
Gotchas 2dsphere
The 2d sphere index is a pure GeoSpatial index and is limited to the ranges for latitude (-90 to 90) and longitude (-180 to 180). It also only accepts $geometry
queries. In return it’s faster and more accurate than the general 2d index.
General 2d index
The 2d index is a flat index that does not take into consideration any projection. One of the benefits of the 2d index is that it allows for fixed lower and upper bounds for the coordinate system as well as the search resolution. This makes for a more general 2d index.
Let’s add a sample document.
var houses = db.getSisterDB("2d").houses;
houses.insert({
price_rooms: [10000, 3],
type: "house"
});
Notice that the price_rooms
is just an array. This is because the 2d index is not inherently tied to the GeoJSON format in the same way as the 2dsphere index.
Let’s create a 2d index.
var houses = db.getSisterDB("2d").houses;
houses.ensureIndex({price_rooms: "2d"}, { min: 0, max: 200000, bits: 32 });
Now let’s look for all houses that fall inside the range of 2000 to 20000 in price and have 0 to 5 rooms.
db.houses.find( { price_rooms :
{ $geoWithin : {
$box : [ [ 2000 , 0 ] , [ 20000 , 5 ] ]
}
}
}).toArray();
tip
2d indexes
The min
and max
values let you project any 2d data with numeric values into a 2d index, where you can use geo queries like $near, $box etc to cut and slice the data. Once one realizes it’s a generalized 2d index, it becomes quite useful for a range of queries not easily achieved with the standard query operators.
Text indexes
Text search is integrated into the MongoDB query language from version 2.6 (In 2.4 it was available as beta command). It relies on an underlying text index.
Let’s insert some sample documents.
var entries = db.getSisterDB("blogs").entries;
entries.insert( {
title : "my blog post",
text : "I'm writing a blog. yay",
site: "home",
language: "english" });
entries.insert( {
title : "my 2nd post",
text : "this is a new blog I'm typing. yay",
site: "work",
language: "english" });
entries.insert( {
title : "knives are Fun",
text : "this is a new blog I'm writing. yay",
site: "home",
language: "english" });
Let’s create the text index.
var entries = db.getSisterDB("blogs").entries;
entries.ensureIndex({title: "text", text: "text"}, { weights: {
title: 10,
text: 5
},
name: "TextIndex",
default_language: "english",
language_override: "language" });
The example shows how weights can be used to control the weighing of fields in subsequent queries against the index. In this case any search that matches title
will be ranked higher than a match in the text
field.
We also passed a name
option that allows us to give the index a custom name.
The default_language
option specifies that any document missing a specific language field should default to english
.
The option language_override
tells the text index to look for individual documents language definition under the language
field. If the language
field for a specific document is set to for example Spanish, MongoDB will index it using the Spanish stop list and stemming.
Now let’s query for all the blog entries that contain the blog
word, and filter by the site field.
var entries = db.getSisterDB("blogs").entries;
entries.find({$text: {$search: "blog"}, site: "home"})
The query matches all the documents, that contain the word blog
in either the title
or text
field, and then filters them by the site
field. If you wish to retrieve the search score for each matching document, modify the query slightly.
var entries = db.getSisterDB("blogs").entries;
entries.find({$text: {$search: "blog"}, site: "home"},
{score: {$meta: "textScore"}}).sort({score: {$meta: "textScore"}});
The query includes the score given to the individual matched documents and sorts them in descending order by the score
.
important
Text Indexes Can Get Big
Text indexes can grow to be bigger than the actual stored documents, and can take a while to build if the collection is big. They also add additional overhead to writes, compared to simpler indexes.
Sparse indexes
Sparse indexes are indexes, where no values are included for fields that do not exist. Look at the following two documents.
var sparse = db.getSisterDB("indexes").sparse;
sparse.insert({ hello: "world", number: 1 });
sparse.insert({ hello: "world" });
A non-sparse index for the field number
, will contain an entry for both documents in the index. A sparse index will contain only the documents that contain the number
field. Sparse indexes can be an efficient way of indexing fields that only occur in a certain percentage of documents in a collection, saving disk IO and memory.
To create a sparse index:
var sparse = db.getSisterDB("indexes").sparse;
sparse.ensureIndex({number: 1}, {sparse: true});
Unique indexes
An unique index is different from a normal index in that it only allows a single document to exist for a field value.
var unique = db.getSisterDB("indexes").unique;
unique.ensureIndex({number: 1}, {unique: true});
Now let’s try to insert some documents
var unique = db.getSisterDB("indexes").unique;
unique.insert({ hello: "world", number: 1 });
unique.insert({ hello: "world", number: 1 });
The second insert fails, due to the existing document with the field number
set to 1
.
Time To Live indexes
Time to live indexes (TTL) are a special type of index that will remove documents that fail to meet the index condition. One use for TTL indexes is to model a cache where documents expire after a set amount of time, allowing old documents to be gradually removed by MongoDB. This avoids the need of performing bulk deletes of documents using an external application or script.
Let’s insert some documents.
var ttl = db.getSisterDB("indexes").ttl;
ttl.insert({ created_on: new Date() });
Let’s define an a TTL index on created_on
with an expire time of 1000
seconds in the future.
var ttl = db.getSisterDB("indexes").ttl;
ttl.ensureIndex({created_on: 1}, {expireAfterSeconds: 1000});
The TTL will delete any documents where the created_on
field is older than 1000
seconds.
important
Notes about TTL
The expireAfterSeconds
is not a hard limit. MongoDB will remove any expired documents some time after they have meet the condition.
It’s important to note that Time to Live indexes only with Date based fields.
Covered Index Queries
Covered index queries are queries that can be answered using only the information stored in the index. Basically MongoDB answers the query using the fields stored in an index, never actually materializing the document into memory. Let’s look at an example usage of covered queries.
var covered = db.getSisterDB("indexes").covered;
covered.insert({ text: "hello", site: "home"});
covered.insert({ text: "hello", site: "work" });
Let’s define the covered index first.
var covered = db.getSisterDB("indexes").covered;
covered.ensureIndex({text: 1, site: 1});
Next perform a covered index query.
var covered = db.getSisterDB("indexes").covered;
covered.find({text: "hello"}, {_id: 0, text:1, site:1});
Let’s retrieve the query plan.
var covered = db.getSisterDB("indexes").covered;
covered.find({text: "hello"}, {_id: 0, text:1, site:1}).explain();
This returns a result with all the details about the query plan executed for this query.
{
"cursor" : "BtreeCursor text_1_site_1",
"isMultiKey" : false,
"n" : 2,
"nscannedObjects" : 0,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
...
"server" : "christkv.local:27017"
}
Notice how the query plan result includes indexOnly
set to true
. This means that the query was completely covered by the index and MongoDB never touched the documents.
important
Covered Index Gotchas
Notice how {_id: 0, text:1, site:1}
excludes _id
. A covered index query cannot include the _id
field.