Internationalization

Earth, courtesy of http://upload.wikimedia.org/wikipedia/commons/2/22/Earth_Western_Hemisphere_transparent_background.png

In this internationalization example, we show a schema that might be useful when you have to support multiple languages on an international site. We are going to use products and product categories in our example. This is a very simplified example and does not take into consideration the more esoteric edge cases of i18n support.

Schema Observations

  • Allows single document reads that include all the translation information, for efficient multi-language support.
  • Potentially requires mass updates, which can be costly if the translations change frequently.

Schema

Below is an example category document. The names field is an embedded document keyed by language. In this case, en-us means American English, and the name of the category is car.

{
  "_id" : 1
  , "names" : {
    "en-us" : "car"
  }
}

The product contains a copy of all its categories as well, so that we can easily show the right name depending on the user's preferred language. This optimizes for single document retrieval and avoids multiple round trips to the database. We will look at the trade-off we make for this optimization later.

{
    "_id" : 1
  , "name" : "car"
  , "cost" : 100
  , "currency" : "usd"
  , "categories" : [{
        "_id" : 1
      , "names" : { "en-us" : "car" }
    } ]
  }
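
To make the read side concrete, here is a minimal sketch in plain JavaScript of how an application might pick the category names for a user's preferred language from a single product document. The helper name localizedCategoryNames and the fallback locale parameter are assumptions made for illustration; they are not part of the schema above. The sample document also assumes a de-de translation has already been cached.

```javascript
// Hypothetical helper: return the name of each cached category in the
// requested locale, falling back to another locale when it is missing.
function localizedCategoryNames(product, locale, fallbackLocale) {
  return product.categories.map(function (category) {
    var names = category.names || {};
    if (Object.prototype.hasOwnProperty.call(names, locale)) {
      return names[locale];
    }
    return names[fallbackLocale];
  });
}

// Sample product document following the schema above.
var product = {
  _id: 1,
  name: "car",
  cost: 100,
  currency: "usd",
  categories: [{ _id: 1, names: { "en-us": "car", "de-de": "auto" } }]
};

console.log(localizedCategoryNames(product, "de-de", "en-us")); // [ 'auto' ]
console.log(localizedCategoryNames(product, "fr-fr", "en-us")); // [ 'car' ]
```

Because the translations travel with the product, no second query against the categories collection is needed to render the page.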

Operations

Add a new translation to a category

We are going to add the de-de locale to the car category shown above. For simplicity's sake, we will assume this category is identified by the _id field set to the value 1.

{lang="js", linenos=on, title="Example 1: Add a new translation"}

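var categoryId = 1;

var categories = db.getSisterDB("shop").categories;
var products = db.getSisterDB("shop").products;

// Step 1: add the translation to the category document itself.
// The names field sits at the top level of the category document.
categories.update({
    _id: categoryId
  }, {
    $set: {
      "names.de-de": 'auto'
    }
  });

// Step 2: update the cached copy in every product that embeds the category.
// multi: true is needed so that more than one product is updated.
products.update({
    "categories._id": categoryId
  }, {
    $set: {
      "categories.$.names.de-de": 'auto'
    }
  }, {
    multi: true
  });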

In the first step, we update the category by adding the new de-de locale to the document.

In the second step, we need to update the cached copy in every product document that contains the car category. The update statement looks for all products where the categories._id field matches categoryId. Then, in each matching product, it adds the new locale to the first element of the categories array that matches that categoryId.
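
The two steps differ only in the field path they touch: the category document stores names at the top level, while each product reaches its cached copy through the positional $ operator. A minimal sketch of a helper that builds both sets of arguments for any locale is shown below. The function name buildAddTranslation and the shape of its return value are assumptions for illustration, and the sketch assumes locale keys such as de-de that contain no dots.

```javascript
// Hypothetical helper (not part of the original example): build the filter,
// update, and options documents needed to add one translation.
function buildAddTranslation(categoryId, locale, name) {
  // Path inside the category document itself.
  var categorySet = {};
  categorySet["names." + locale] = name;

  // Path inside the matching element of each product's categories array.
  var productSet = {};
  productSet["categories.$.names." + locale] = name;

  return {
    category: {
      filter: { _id: categoryId },
      update: { $set: categorySet }
    },
    products: {
      filter: { "categories._id": categoryId },
      update: { $set: productSet },
      options: { multi: true } // apply to every matching product
    }
  };
}

var ops = buildAddTranslation(1, "de-de", "auto");
console.log(JSON.stringify(ops.category.update));
// {"$set":{"names.de-de":"auto"}}
console.log(JSON.stringify(ops.products.update));
// {"$set":{"categories.$.names.de-de":"auto"}}
```

In the shell, these could then be passed on as categories.update(ops.category.filter, ops.category.update) and products.update(ops.products.filter, ops.products.update, ops.products.options).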

Removing a translation from a category

If we wish to remove a translation from a category, we need to first remove it from the category and then update all the product caches.

{lang="js", linenos=on, title="Example 2: Remove a translation"}

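var categoryId = 1;

var categories = db.getSisterDB("shop").categories;
var products = db.getSisterDB("shop").products;

// Step 1: remove the translation from the category document itself.
// The value given to $unset is ignored; an empty string is conventional.
categories.update({
    _id: categoryId
  }, {
    $unset: {
      "names.de-de": ''
    }
  });

// Step 2: remove the cached copy from every product that embeds the category.
products.update({
    "categories._id": categoryId
  }, {
    $unset: {
      "categories.$.names.de-de": ''
    }
  }, {
    multi: true
  });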

This mirrors the add operation. The only change is that we use the $unset update operator to remove the field from the embedded documents.
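
To illustrate what the positional $unset does to a single product, the following plain JavaScript imitates its effect on an in-memory document. This is only an illustration of the semantics, not MongoDB code; the function name unsetCachedTranslation and the sample document are assumptions.

```javascript
// Plain JavaScript imitation of the positional $unset applied to ONE
// product document. Illustration only; MongoDB does this server side.
function unsetCachedTranslation(product, categoryId, locale) {
  // The positional operator targets the first array element that
  // matches the query condition on categories._id.
  var match = product.categories.find(function (category) {
    return category._id === categoryId;
  });
  if (match && match.names) {
    delete match.names[locale];
  }
  return product;
}

var product = {
  _id: 1,
  name: "car",
  categories: [{ _id: 1, names: { "en-us": "car", "de-de": "auto" } }]
};

unsetCachedTranslation(product, 1, "de-de");
console.log(JSON.stringify(product.categories));
// [{"_id":1,"names":{"en-us":"car"}}]
```

Only the de-de key disappears; the other translations and the rest of the product document are left untouched.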

Indexes

In this example, no special indexes are used other than the _id index.

Scaling

Secondary Reads

If the site is read heavy (say, a product catalog), it might make sense to offload reads to secondary servers to scale reading. Whether this works comes down to how much staleness the application can accept, as there might be some delay between a write happening on the primary and that write being replicated to a secondary.
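
One common way to route reads to secondaries is the readPreference option in the connection string. The sketch below builds such a string in plain JavaScript. The host names, the database name, and the replica set name rs0 are placeholders invented for illustration; only the option names replicaSet and readPreference and the value secondaryPreferred are standard connection string settings.

```javascript
// Build a connection string that prefers secondaries for reads.
// Hosts, database, and replica set name are hypothetical placeholders.
function buildCatalogUri(hosts, database, replicaSet) {
  var options = new URLSearchParams({
    replicaSet: replicaSet,
    readPreference: "secondaryPreferred"
  });
  return "mongodb://" + hosts.join(",") + "/" + database + "?" + options.toString();
}

console.log(buildCatalogUri(
  ["db1.example.com:27017", "db2.example.com:27017"],
  "shop",
  "rs0"
));
// mongodb://db1.example.com:27017,db2.example.com:27017/shop?replicaSet=rs0&readPreference=secondaryPreferred
```

With secondaryPreferred, reads go to a secondary when one is available and fall back to the primary otherwise, so a page may briefly show a translation that has since been changed.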

Sharding

The multi language pattern does not really benefit from sharding. It’s more likely that you would shard the collections based on other criteria.

Performance

There is a very obvious trade-off being made here. We accept the cost of updating all the affected products each time we add or remove a locale in exchange for not having to perform multiple reads on the categories collection. Since adding new translations is not likely to happen constantly, the added updates to the products collection are insignificant compared with the benefit of performing single document reads when retrieving the product documents.

A simple exploration of the performance on a single machine with MongoDB 3.0 shows the difference between MMAP and WiredTiger for a narrow simulation using the schema simulation framework mongodb-schema-simulator.

Scenarios

https://github.com/christkv/mongodb-schema-simulator/blob/master/examples/scripts/single_or_replset/multilanguage/multilanguage_add_new_local_scenario.js

https://github.com/christkv/mongodb-schema-simulator/blob/master/examples/scripts/single_or_replset/multilanguage/multilanguage_remove_local_scenario.js

MongoDB runs locally on a MacBook Pro Retina 2015 with an SSD and 16 GB of RAM. The simulation runs with the following parameters against a single MongoDB instance under OS X 10.10 Yosemite.

Add a locale to a category

| Parameters | Value |
|---|---|
| processes | 4 |
| poolSize per process | 50 |
| type | linear |
| Resolution in milliseconds | 1000 |
| Iterations run | 25 |
| Number of users adding locales to category iteration | 1000 |
| Execution strategy | slicetime |

Remove a locale from a category

| Parameters | Value |
|---|---|
| processes | 4 |
| poolSize per process | 50 |
| type | linear |
| Resolution in milliseconds | 1000 |
| Iterations run | 25 |
| Number of users adding locales to category iteration | 1000 |
| Execution strategy | slicetime |

MMAP

The MMAP engine is run using the default settings on MongoDB 3.0.1.

Add Locale to Category Simulation

Figure: add locale to category scenario results

| Statistics | Value |
|---|---|
| Runtime | 30.898 seconds |
| Mean | 0.676 milliseconds |
| Standard Deviation | 0.202 milliseconds |
| 75th percentile | 0.716 milliseconds |
| 95th percentile | 0.865 milliseconds |
| 99th percentile | 1.446 milliseconds |
| Minimum | 0.391 milliseconds |
| Maximum | 6.839 milliseconds |

Remove Locale from Category Simulation

Figure: remove locale from category scenario results

| Statistics | Value |
|---|---|
| Runtime | 31.037 seconds |
| Mean | 0.675 milliseconds |
| Standard Deviation | 0.185 milliseconds |
| 75th percentile | 0.715 milliseconds |
| 95th percentile | 0.867 milliseconds |
| 99th percentile | 1.418 milliseconds |
| Minimum | 0.403 milliseconds |
| Maximum | 5.882 milliseconds |

As expected, the performance is similar because the two patterns are similar. We would expect the performance to be tied to the number of documents that need to be updated when a locale is added to or removed from a category.

WiredTiger

The WiredTiger engine is run using the default settings on MongoDB 3.0.1.

Add Locale to Category Simulation

Figure: add locale to category scenario results

| Statistics | Value |
|---|---|
| Runtime | 30.681 seconds |
| Mean | 0.693 milliseconds |
| Standard Deviation | 0.219 milliseconds |
| 75th percentile | 0.73 milliseconds |
| 95th percentile | 0.889 milliseconds |
| 99th percentile | 1.582 milliseconds |
| Minimum | 0.409 milliseconds |
| Maximum | 7.157 milliseconds |

Remove Locale from Category Simulation

Figure: remove locale from category scenario results

| Statistics | Value |
|---|---|
| Runtime | 30.404 seconds |
| Mean | 0.913 milliseconds |
| Standard Deviation | 0.337 milliseconds |
| 75th percentile | 1.145 milliseconds |
| 95th percentile | 1.316 milliseconds |
| 99th percentile | 1.763 milliseconds |
| Minimum | 0.409 milliseconds |
| Maximum | 7.938 milliseconds |

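WiredTiger is a bit slower than MMAP for this schema, most visibly in the remove scenario (a mean of 0.913 milliseconds versus 0.675 milliseconds). The workload involves a lot of in-place updates, which cause the WiredTiger storage engine to rewrite the entire document.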

Notes

It’s important to consider the trade-off between caching and performing multiple queries. In this case, it’s pretty obvious the caching strategy will pay off, but there might be situations where it doesn’t.

Let’s say you are caching stock ticker prices in a portfolio object where the stock ticker is constantly changing. The constant changes and the writes they require will offset any benefit of caching the latest stock prices in the portfolio document. In this case, it’s better to fetch the latest stock prices from the prices collection instead.

Tip: High rate of change

One more thing to consider: if you find your application is constantly adding new categories or translations, it might be beneficial to duplicate the product information, making an otherwise identical document for each language. This allows single document reads for any language and avoids massive updates across all possible products.
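
As a rough sketch of that alternative, the plain JavaScript below expands one product with embedded translations into one self-contained document per locale. The function name expandPerLocale, the productId and locale fields, and the "id:locale" key format are assumptions for illustration; the original schema does not define them. The product name itself stays untranslated here because the schema above stores only one name per product.

```javascript
// Hypothetical transformation: one product with embedded category
// translations becomes one document per locale.
function expandPerLocale(product) {
  // Collect every locale that appears in any cached category.
  var locales = {};
  product.categories.forEach(function (category) {
    Object.keys(category.names).forEach(function (locale) {
      locales[locale] = true;
    });
  });

  return Object.keys(locales).map(function (locale) {
    return {
      _id: product._id + ":" + locale, // assumed compound key format
      productId: product._id,
      locale: locale,
      name: product.name,
      cost: product.cost,
      currency: product.currency,
      // Keep only categories translated into this locale, one name each.
      categories: product.categories
        .filter(function (category) { return locale in category.names; })
        .map(function (category) {
          return { _id: category._id, name: category.names[locale] };
        })
    };
  });
}

var docs = expandPerLocale({
  _id: 1,
  name: "car",
  cost: 100,
  currency: "usd",
  categories: [{ _id: 1, names: { "en-us": "car", "de-de": "auto" } }]
});

console.log(docs.map(function (doc) { return doc._id; }));
// [ '1:en-us', '1:de-de' ]
```

With this layout, adding a locale means inserting new documents rather than rewriting existing ones, at the cost of storing the shared product fields once per language.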