- Schema Book
- Account Transactions
Account Transactions
MongoDB 3.0 support any notion of a transaction across one or more documents. It does, however, guarantee atomic write operations on single documents. This allows us to implement a Two-Phase commit strategy using double bookkeeping to simulate transactions. However, it’s important to note that due to only single document operations being atomic, MongoDB can only offer transaction-like semantics. It’s still possible for applications to return intermediate results during the two-phase commit or rollback.
In this chapter, we will use two collections to simulate a bank account system. The first collection is accounts and contains all the customer accounts, while the second one is transactions and is our bookkeeping collection. The goal is to transfer 100 from Joe to Peter using a two-phase commit.
Schema Observations
- Simple.
- Requires background processes to manage unfinished transactions.
- Can still cause intermediate results during a two-phase commit or rollback.
Schema
In our simplified schema, we will only keep the basic necessary fields needed to perform the account transaction.
Let’s look at an example document representing an account.
{
"_id": ObjectId("54fd7392742abeef6186a68e")
, "name": "Joe Moneylender"
, "balance": 1000
, "pendingTransactions": []
}
Schema Attributes | Description |
---|---|
name | Name of the account owner (name is used for simplicities sake) |
balance | The account balance in a magic universal currency. |
pendingTransactions | Contains any pending transactions on the account |
A transaction document example would look something like.
{
"_id": ObjectId("54fd7392742abeef6186a68e")
, "source": "Joe Moneylender"
, "destination": "Peter Bum"
, "balance": 200
, "state": "initial"
}
Schema Attributes | Description |
---|---|
source | Name of the source bank account |
destination | Name of the destination bank account |
balance | Contains the amount of the transaction |
state | The state of the transaction. Can be one of initial , pending , committed , done and canceled |
Operations
Performing a successful Transfer
Create two accounts for Joe
and Peter
, crediting them each with an initial balance of 1000
.
var col = db.getSisterDB("bank").accounts;
col.insert({name: "Joe Moneylender", balance: 1000, pendingTransactions:[]});
col.insert({name: "Peter Bum", balance: 1000, pendingTransactions:[]});
We are going perform a transfer from Joe
to Peter
of 100
. We start by creating a new transaction and setting its state to initial
.
var col = db.getSisterDB("bank").transactions;
col.insert({source: "Joe", destination: "Peter", amount: 100, state: "intital"});
After successfully creating the initial transaction, we are going to attempt to apply it to both accounts. First we need to flip the transaction state to pending in order to signal that we are attempting to apply it to both accounts.
var col = db.getSisterDB("bank").transactions;
var transaction = col.findOne({state: "initial"});
col.update({_id: transaction._id}, {$set: {state: "pending"}});
Next, we attempt to apply the transaction to the source account. The transaction
variable represents the previously created transaction document.
var col = db.getSisterDB("bank").accounts;
col.update({
name: transaction.source, pendingTransactions: {$ne: transaction._id}, balance: { $gte: amount}
}, {
$inc: {balance: -transaction.amount}, $push: {pendingTransactions: transaction._id}
});
The update will only succeed if the transaction is not already present on the account and there is enough money to cover the attempted transaction (there is no concept of credit in this world).
If the criteria are met, we push the transaction id to the pendingTransactions
embedded array and deduct the amount from the account balance.
After applying the transaction successfully to the source account, we credit the destination account with the transfer amount requested.
var col = db.getSisterDB("bank").accounts;
col.update({
name: transaction.source, pendingTransactions: {$ne: transaction._id}, balance: { $gte: amount}
}, {
$inc: {balance: -transaction.amount}, $push: {pendingTransactions: transaction._id}
});
As with the source we ensure the transaction does not already exist on the destination account to guard against applying the same transaction multiple times.
Once we have successfully applied the transaction to the source
and the destination
accounts, our account documents will look something like the following (the pendingTransactions ObjectId entries will vary).
{
"_id": ObjectId("54fd7392742abeef6186a68e")
, "name": "Joe Moneylender"
, "balance": 900
, "pendingTransactions": [ObjectId("54fd7392742abeef6186a68e")]
}
{
"_id": ObjectId("54fd7392742abeef6186a68e")
, "name": "Joe Moneylender"
, "balance": 900
, "pendingTransactions": [ObjectId("54fd7392742abeef6186a68e")]
}
After having applied the transaction to the account, we need to update the transaction state to committed
.
var col = db.getSisterDB("bank").transactions;
var transaction = col.findOne({state: "pending"});
col.update({_id: transaction._id}, {$set: {state: "committed"}});
Once the transaction is in committed
state, we need to clean up the source and destination accounts by removing the transaction from both accounts. Let’s start by removing it from the source
account.
var col = db.getSisterDB("bank").transactions;
col.update({name: transaction.source}, {$pull: {pendingTransactions: transaction._id}});
After removing it from the source
account, we remove it from the destination
account.
var col = db.getSisterDB("bank").transactions;
col.update({name: transaction.source}, {$pull: {pendingTransactions: transaction._id}});
Finally, we set the transaction to the done
state, signaling the transfer has been completed.
var col = db.getSisterDB("bank").transactions;
var transaction = col.findOne({state: "pending"});
col.update({_id: transaction._id}, {$set: {state: "done"}});
That wraps up the successful transfer of an amount between two accounts. Next, we need to look at potential failure scenarios.
Transaction failure between initial and pending
If there is a failure while flipping the transaction state to pending
, the recovery process is to set the transaction state to canceled
.
var col = db.getSisterDB("bank").transactions;
var transaction = col.findOne({state: "initial"});
col.update({_id: transaction._id}, {$set: {state: "cancel"}});
Transaction failure between pending and committed
We have a partially applied transaction, leaving the transaction in an unfinished state. In this case we need to reverse
the transaction.
First reverse the debit of the source account.
var col = db.getSisterDB("bank").accounts;
col.update({
name: transaction.source, pendingTransactions: {$in: [transaction._id]}
}, {
$inc: {balance: transaction.amount}, $pull: {pendingTransactions: transaction._id}
});
Then, reverse the credit to the destination account.
var col = db.getSisterDB("bank").accounts;
col.update({
name: transaction.source, pendingTransactions: {$in: [transaction._id]}
}, {
$inc: {balance: -transaction.amount}, $pull: {pendingTransactions: transaction._id}
});
Finally, set the transaction itself to the canceled
state.
var col = db.getSisterDB("bank").transactions;
var transaction = col.findOne({state: "initial"});
col.update({_id: transaction._id}, {$set: {state: "cancel"}});
Transaction failure between committed and done
If the transaction fails between the state of committed
and done
we need to clean up the state of the accounts.
var col = db.getSisterDB("bank").transactions;
col.update({name: transaction.source}, {$pull: {pendingTransactions: transaction._id}});
After removing it from the source
account remove it from the destination
account.
var col = db.getSisterDB("bank").transactions;
col.update({name: transaction.source}, {$pull: {pendingTransactions: transaction._id}});
Finally retry changing the transaction state to done
.
tip
Recovery Process
It’s helpful to have a process running in the background that will look for any transactions left in pending
or committed
state. To determine the time a transaction has been sitting in an interrupted state, it might be helpful to add a createAt
timestamp to all transaction documents.
Undoing/Rolling back a transaction
For some cases you might need to undo (rollback) a transaction due to the application canceling the transaction or because it cannot be recovered (for example, if the one of the accounts does not exist during the transaction).
There are two points in the two-phase commit we can rollback.
- If you have applied the transaction to the accounts, you should not rollback. Instead, create a new transaction based on the existing transaction and switch the original source and destination fields.
- If you have created the transaction but have not yet applied it, you can use the following steps.
First set the transaction state to canceling
var col = db.getSisterDB("bank").transactions;
col.update({_id: transaction._id}, {$set: {state: "canceling"}});
Next, let’s undo the transaction. Notice that a non-applied transaction means the transaction _id
has not yet been removed from the pendingTransactions
array. Let’s undo the transaction on the source
account.
var col = db.getSisterDB("bank").accounts;
col.update({
name: transaction.source, pendingTransactions: transaction._id
}, {
$inc: {balance: transaction.value}, $pull: {pendingTransactions: transaction._id}
});
Next undo the transaction on the destination
account.
var col = db.getSisterDB("bank").accounts;
col.update({
name: transaction.destination, pendingTransactions: transaction._id
}, {
$inc: {balance: -transaction.value} , $pull: {pendingTransactions: transaction._id}
});
Finally set the transaction state to canceled
var col = db.getSisterDB("bank").transactions;
col.update({_id: transaction._id}, {$set: {state: "canceled"}});
Concurrent Transaction Application
Let’s imagine applications A1
and A2
that both start processing the single transaction, T1 while the transaction is still in the
initial` state.
A1
can applyT1
beforeA2
startsA2
will then applyT1
again because it does not appear as pending in theaccounts
documents.
You can avoid this by explicitly stating in the transaction which application is handling it.
var col = db.getSisterDB("bank").transactions;
col.findAndModify({
query: {state: "initial", application: {$exists: 0}}
, update: {$set: {state: "pending", application: "A1"}}
, new: true});
findAndModify
will retrieve and update the document in one atomic operation. This guarantees that only a single application can tag a transaction that is being processed by it. In this case, a transaction in the initial
state, is marked as being processed by A1
should the application
field not exist.
If the transaction fails or needs to be rolled back, you can retrieve the pending
transactions for the specific application, A1
.
var col = db.getSisterDB("bank").transactions;
col.transactions.find({application: "A1", state: "pending"});
Indexes
Since we are retrieving transactions by the _id
field we do not need any additional indexes for the transactions
collection. For the accounts
collection we are retrieving the accounts by the name
field.
var col = db.getSisterDB("bank").accounts;
col.ensureIndex({name:1});
name
index
If you have processes looking up transactions that are unfinished and are still in the committed
or pending
states, continuing to either apply the transactions to the accounts or to roll them back, an additional index on the transaction state
field will make retrieval of transactions more efficient.
var col = db.getSisterDB("bank").transactions;
col.ensureIndex({state:1});
state
index
Scaling
Secondary Reads
All the operations against the queue are write operations so secondary
reads are not useful for this schema.
Sharding
In this case we can shard the accounts
collection by the name
field and use a hashed index for the field to ensure even distribution across all out shards.
var admin = db.getSisterDB("admin");
db.runCommand({enableSharding:'bank'});
db.runCommand({
shardCollection: 'bank.accounts'
, key: {name:'hashed'}
});
name
key
For the transactions
collection we will hash the _id
field as we are looking up transactions mostly by _id
. Looking up the transaction by the state
field is a rare case that we can let incur a scatter gather
query.
var admin = db.getSisterDB("admin");
db.runCommand({enableSharding:'bank'});
db.runCommand({
shardCollection: 'bank.transactions'
, key: {_id:'hashed'}
});
_id
key
Performance
A simple exploration of the performance on a single machine with MongoDb 3.0 shows the difference between MMAP
and WiredTiger
for a narrow simulation using the schema simulation framework mongodb-schema-simulator
.
Scenario
MongoDb runs on a MacBook Pro Retina 2013 with 512 GB ssd and 16 gb ram. The simulation runs with the following parameters against a single mongodb
instance under osx 10.10 Yosemite
.
Parameters | Value |
---|---|
processes | 4 |
poolSize per process | 50 |
type | linear |
Resolution in milliseconds | 1000 |
Iterations run | 25 |
Number of users querying using metadata per iteration | 1000 |
Execution strategy | slicetime |
MMAP
The MMAP
engine is run using the default settings on MongoDB 3.0.1
.
Successfully transfer between accounts
Statistics | Value |
---|---|
Runtime | 48.086 seconds |
Mean | 9951.659 milliseconds |
Standard Deviation | 7131.847 milliseconds |
75 percentile | 12220.925 milliseconds |
95 percentile | 25750.843 milliseconds |
99 percentile | 34520.997 milliseconds |
Minimum | 65.398 milliseconds |
Maximum | 41428.625 milliseconds |
WiredTiger
The WiredTiger
engine is run using the default settings on MongoDB 3.0.1
.
Successfully transfer between accounts
Statistics | Value |
---|---|
Runtime | 46.653 seconds |
Mean | 8655.022 milliseconds |
Standard Deviation | 5743.181 milliseconds |
75 percentile | 11037.199 milliseconds |
95 percentile | 19937.439 milliseconds |
99 percentile | 29972.029 milliseconds |
Minimum | 77.812 milliseconds |
Maximum | 40603.657 milliseconds |
The two storage engines MMAP
and WiredTiger
are fairly close in performance, as the document actually changes are small and thus do not adversely affect WiredTiger
performance.
Notes
important
Notes
Real world applications will likely be more complex and require updating more than just the balance of the accounts. The application might need to update pending credits, pending debits and other details.
If these fields are part of the account document they can still occur within a single update
, ensuring an atomic
update of all the fields.