Schema Design

A Schema,

There are several things to keep in mind when designing a schema for your application.

Read Ratio to Write Ratio

Determining if your application is read heavy or write heavy will lead to how you design your schema. If your application is read heavy, you might want to choose a schema that minimizes the amount of reads from MongoDB.

As an example consider an auction type website. As most operations are read operations caused by people browsing the catalog, it might makes sense to use a denormalized schema for the product including as much relevant information as needed to render the entire product page.

Similarly if your application is write heavy, you might want to ensure that you use a schema that maximizes MongoDB write throughput.

Avoid Application Joins

MongoDB does not support server side joins. All joins have to be performed in the application itself. The performance can suffer if you are pulling back and joining a lot of data due to all the round-trips required to bring back all the data and the time it takes to perform the in application join. If you find your schema is depending on a lot of joins, it might make sense to denormalize the schema in order to reduce the number of joins.

Pre-aggregate Data

Additionally, if you find you are aggregating data in a lot of application queries, you might want to consider pre-aggregating. One example might be a page view counter. Instead of summing up the number of views for a particular page on request, we can increment a view counter for that page each time the page is viewed and use this counter to show the number of page views.

Avoid Growing Documents (MMAP)

If, you find that your schema design creates documents that are constantly growing in size, it will have impact on your disk IO and database performance. Using document buckets and document pre-allocation will help address the issues for the MMAP storage engine.

Avoid Updating Whole Documents (MMAP)

MongoDB provides for atomic operators that let you modify fields in an existing document, and in most cases will cause an in-place update when using the MMAP storage engine. This ensures we spend as little time as possible re-allocating documents in memory and improves write performance.

Pre-allocated Documents (MMAP)

If your schema grows to a known size you can avoid document moves by pre-allocating the maximum size of the document causing all operations on the document to be in-place updates.

Field Names Take up Space (MMAP)

In some cases documents can contain more space allocated for the field names than the actual data stored. For this case you may want to consider compressing your field names if you are using MMAP or switch to using WiredTiger that supports compression using snappy or zlib.

Over Eager Indexing

You might get tempted to add all kinds of indexes to your schema. You have to keep in mind, that each index will impact performance, as they will need to be updated when documents change and the more indexes you have on a collection the more overhead there will be for each write operation. Each index also takes up space and memory so keep in mind that over eager indexing can cause your storage size to balloon.

Custom _id Field

You can save some space and additional indexes by overriding the meaning of the _id field. The only requirement is that _id is a unique field for the collection. For example you might have a structure that contains a timestamp, userid and machineid allowing you to use the _id index to query for those fields without having to create an additional index.

Covered Indexes

If your application can leverage covered indexes, it might help performance given that a query might be completely answerable using the data stored in the index without materializing the underlying documents.