WiredTiger Storage Engine

MongoDB 3.0 introduced an internal storage API that allows new storage engines to be added to MongoDB, and it shipped with the new WiredTiger storage engine as an option. WiredTiger opens up new possibilities for scaling MongoDB vertically by introducing document-level locking and efficient CPU scaling. In this chapter, we provide an overview of the main aspects of the new storage engine.

Overview of WiredTiger

  • Highly concurrent and vertically scalable
  • Document level locking
  • Allows for more tuning of storage engine than MMAP
  • Compression
  • On-line compaction
  • Write-ahead transaction log for the journal
  • Does not support in place updates

Essentials

The WiredTiger storage engine brings document-level locking to MongoDB, meaning that writes no longer block a whole collection or database. While MMAP in 3.0 brought collection-level locking, multiple writes to the same collection are still applied serially and can starve reads from that collection, as reads have to wait for the writes to finish. WiredTiger removes this limitation, allowing multiple writes to happen concurrently against the same collection. This means that writes and reads scale with the number of CPU cores, whereas MMAP had a low ceiling for CPU scaling because lock contention reduced throughput.
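As a rough illustration of why finer-grained locking helps, the toy model below counts how many sequential steps a batch of writes needs under a single collection-level lock versus per-document locking. It assumes enough CPU cores to run every non-conflicting write in parallel and ignores everything else a real storage engine does. The function names are hypothetical, and this is not a description of how WiredTiger is implemented internally.

```python
from collections import Counter


def steps_with_collection_lock(write_targets):
    """Every write takes the same lock, so writes run strictly one after another."""
    return len(write_targets)


def steps_with_document_lock(write_targets):
    """Writes to different documents run in parallel; only writes that hit
    the same document have to queue behind each other."""
    if not write_targets:
        return 0
    return max(Counter(write_targets).values())


# Four writes against one collection: two of them touch document "a".
writes = ["a", "b", "c", "a"]
print(steps_with_collection_lock(writes))  # 4
print(steps_with_document_lock(writes))    # 2
```

In this model the benefit disappears when every write targets the same document, which is one reason to benchmark with a realistic write pattern.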

Another feature WiredTiger introduces is on-disk compression, supporting snappy and zlib. This allows the user to trade CPU usage for higher data compression.

Compression Method    Description
snappy                Balances the compression ratio with low CPU usage
zlib                  Very good compression, but at the cost of higher CPU usage than snappy
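The compressor can also be chosen per collection. The sketch below only builds the options document; it assumes the `storageEngine` option of MongoDB's create collection command and WiredTiger's `block_compressor` configuration key. The helper name `collection_storage_options` is hypothetical.

```python
# The two compressors discussed in this chapter.
SUPPORTED_COMPRESSORS = ("snappy", "zlib")


def collection_storage_options(compressor):
    """Build create-collection options that select a WiredTiger block compressor."""
    if compressor not in SUPPORTED_COMPRESSORS:
        raise ValueError("unsupported compressor: %r" % (compressor,))
    return {
        "storageEngine": {
            "wiredTiger": {"configString": "block_compressor=%s" % compressor}
        }
    }


print(collection_storage_options("zlib"))
```

With the PyMongo driver, these options could be passed along the lines of `db.create_collection("events", **collection_storage_options("zlib"))`, where `events` is a made-up collection name.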

Compression is one of the major benefits of the WiredTiger engine, as it reduces the amount of data that needs to be written to or read from disk. This lowers the number of IO operations needed and allows for better use of the available storage IO bandwidth.
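To get a feel for the effect, the sketch below compresses a batch of similar documents with Python's built-in `zlib` module. It is only an approximation: it serializes the documents as JSON rather than BSON and compresses them as one buffer, whereas the storage engine compresses its own on-disk blocks. The sample documents and the helper name `compression_ratio` are made up for the illustration.

```python
import json
import zlib


def compression_ratio(documents, level=6):
    """Return raw size divided by zlib-compressed size for a list of documents."""
    raw = "\n".join(json.dumps(doc) for doc in documents).encode("utf-8")
    compressed = zlib.compress(raw, level)
    # Sanity check: the compressed bytes must decompress to the original.
    assert zlib.decompress(compressed) == raw
    return len(raw) / len(compressed)


# Documents that repeat the same long field names, as stored documents typically do.
sample_documents = [
    {
        "customer_first_name": "name%d" % i,
        "customer_last_name": "last%d" % i,
        "customer_account_balance": i,
    }
    for i in range(1000)
]

print("compression ratio: %.1fx" % compression_ratio(sample_documents))
```

The actual ratio depends heavily on the data, so measuring with a sample of your own documents is more meaningful than any generic number.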

It also goes a long way towards solving one of the main issues with MMAP, which is the cost of storing all the document field names in each stored document, as these are efficiently compressed. Indexes are also compressed using prefix compression, which allows more of the index to be stored in RAM.
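The idea behind prefix compression can be sketched in a few lines: in a sorted list of keys, each key stores only how many leading characters it shares with the previous key, plus the remaining suffix. This is a generic illustration of the technique, not WiredTiger's actual on-disk index format, and the sample keys are invented.

```python
def prefix_compress(sorted_keys):
    """Encode each key as (length of prefix shared with previous key, suffix)."""
    entries = []
    previous = ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(previous), len(key))
        while shared < limit and previous[shared] == key[shared]:
            shared += 1
        entries.append((shared, key[shared:]))
        previous = key
    return entries


def prefix_decompress(entries):
    """Rebuild the original keys from (shared length, suffix) pairs."""
    keys = []
    previous = ""
    for shared, suffix in entries:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


keys = ["user:1000", "user:1001", "user:1002", "user:2000"]
encoded = prefix_compress(keys)
print(encoded)  # [(0, 'user:1000'), (8, '1'), (8, '2'), (5, '2000')]
assert prefix_decompress(encoded) == keys
```

Keys that share long prefixes, such as compound index entries with a common leading field, benefit the most from this encoding.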

Although WiredTiger accepts two different on-disk formats, MongoDB officially supports only the Record Store engine in 3.0. The Record Store engine is a B+ tree and is optimized for read workloads.

Keep in mind that WiredTiger does not support in-place updates. An update that could be applied in place under MMAP causes the whole document to be rewritten. Even so, WiredTiger could still perform better than MMAP for many workloads, as the additional vertical scalability might offset the cost of writing the new documents.

For durability, WiredTiger uses a write-ahead transaction log, and a checkpoint is taken every 60 seconds or after 2GB of data has been written. This is analogous to the MMAP disk flushing. In the worst case, if the MongoDB process dies, you'll lose up to 60 seconds or 2GB of data.
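The checkpoint rule described above amounts to a simple "whichever comes first" test. The sketch below expresses it as a predicate; the function and constant names are hypothetical, and 2GB is assumed to mean 2 x 1024^3 bytes.

```python
CHECKPOINT_INTERVAL_SECONDS = 60
CHECKPOINT_WRITTEN_BYTES = 2 * 1024 ** 3  # assumption: "2GB" read as 2 GiB


def checkpoint_due(seconds_since_last, bytes_written_since_last,
                   interval=CHECKPOINT_INTERVAL_SECONDS,
                   written_limit=CHECKPOINT_WRITTEN_BYTES):
    """Return True once either threshold has been reached."""
    return (seconds_since_last >= interval
            or bytes_written_since_last >= written_limit)


print(checkpoint_due(10, 500 * 1024 ** 2))  # False: neither threshold reached
print(checkpoint_due(61, 0))                # True: time threshold reached
print(checkpoint_due(5, 3 * 1024 ** 3))     # True: write volume threshold reached
```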

Tuning

WiredTiger allows for tuning some of the storage engine parameters. The three most important to consider are the cache size, the checkpoint interval and logging.

Cache Size

The cache size is the WiredTiger working set size. By default it is set to 50% of the available RAM or 1 GB, whichever is higher.

Tuning the parameter can have a big impact on performance.
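The default rule can be written down directly, which is handy when deciding whether to override it on a given machine. This is a minimal sketch of the rule as stated above; the function name is hypothetical and 1 GB is assumed to mean 1024^3 bytes.

```python
GIB = 1024 ** 3


def default_cache_bytes(total_ram_bytes):
    """Half of RAM, but never less than 1 GiB (assumed meaning of '1 GB')."""
    return max(total_ram_bytes // 2, GIB)


print(default_cache_bytes(16 * GIB) // GIB)  # 8
print(default_cache_bytes(1 * GIB) // GIB)   # 1
```

To use a different value, `mongod` accepts a `--wiredTigerCacheSizeGB` option.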

Checkpoint interval

The default checkpoint thresholds are 60 seconds or 2GB of data, whichever comes first. This setting can be changed to allow for more or less frequent checkpoints, depending on the needs of the application and the durability guarantees you need.
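One way to express these thresholds is a WiredTiger configuration string. The sketch below builds such a string, assuming WiredTiger's `checkpoint=(wait=...,log_size=...)` open-time setting. The helper name is hypothetical, and whether your MongoDB version honors an override passed this way is something to verify before relying on it.

```python
def checkpoint_config(wait_seconds=60, log_size="2GB"):
    """Build a WiredTiger-style checkpoint setting from the two thresholds."""
    if not isinstance(wait_seconds, int) or wait_seconds <= 0:
        raise ValueError("wait_seconds must be a positive integer")
    return "checkpoint=(wait=%d,log_size=%s)" % (wait_seconds, log_size)


print(checkpoint_config())            # checkpoint=(wait=60,log_size=2GB)
print(checkpoint_config(30, "1GB"))   # checkpoint=(wait=30,log_size=1GB)
```

A string like this could be handed to `mongod` through its `--wiredTigerEngineConfigString` option.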

Logging

WiredTiger uses a write-ahead log for journaling, which provides immediate write durability. One of the main things to keep in mind is that the write-ahead log is not needed for crash recovery; crash recovery is handled by the checkpoint mechanism. This stands in contrast to MMAP, where the journal is essential when performing crash recovery.

Important

If you are using a replica set, you might decide to turn off logging because replication is a good enough durability guarantee for your application.
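As a sketch of what that choice looks like at startup, the helper below assembles a `mongod` command line with journaling switched on or off. It assumes the 3.0-era `--journal` and `--nojournal` flags, so check the documentation for the version you run. The helper name, data path and replica set name are made up for the example.

```python
def mongod_args(dbpath, repl_set=None, journal=True):
    """Assemble mongod arguments for a WiredTiger node."""
    args = ["mongod", "--storageEngine", "wiredTiger", "--dbpath", dbpath]
    if repl_set:
        args += ["--replSet", repl_set]
    # Assumption: 3.0-era flags for enabling or disabling the journal.
    args.append("--journal" if journal else "--nojournal")
    return args


# A replica set member that relies on replication instead of the journal.
print(" ".join(mongod_args("/data/db", repl_set="rs0", journal=False)))
```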

When to use WiredTiger

This is a case where it's best to benchmark your specific application, as the results will vary depending on the write and read patterns your application uses, as well as the available hardware. For some workloads you might find that WiredTiger is a huge improvement, while for others there is no improvement, or even a performance degradation. The schema simulator tool was specifically written to help you model and try your workload against MongoDB.
