VoltDB Command Logging Replay

written by Ning Shi on August 10, 2011 with no comments

In the previous blog post (http://newblog.voltdb.com/intro-voltdb-command-logging), Ariel Weisberg described how VoltDB’s command logging feature works. He also briefly mentioned how we replay command logs during the recovery process. In this post, I am going to focus on the replay process and discuss how VoltDB recovers from catastrophic events.<--break->

Goals of Command Logging Replay

The goals of command logging replay are pretty simple:

  1. Ensure that the recovered database is 100% accurate to the last usable transaction in the command log
  2. Complete the recovery process in the shortest possible time

Command logging obviously adds important new functionality to VoltDB’s infrastructure.  That said, the command logging replay subsystem is designed specifically to leverage many of VoltDB’s existing distributed, high performance transaction processing technologies.

How VoltDB Generates A Replay Plan

VoltDB 2.0 will support recovery mode as a command-line argument. When the database starts in this mode, each node scans for snapshots and command logs locally. On a single node, there could be multiple snapshots. But there is at most one command log. For each of the nodes that have command logs, snapshots taken before the first logged transaction in the local command log are skipped. This ensures a proper starting point against which logs can be replayed.  Viable snapshots and logs collectively comprise all of the local information needed to recover the database.

Each node sends its local recovery information to a VoltDB agreement service, to which other nodes have visibility. Once every node has done this, a consistent deterministic plan can be generated independently on each node. A global consensus will be reached about which command logs are faulted and which are usable. If the usable logs constitute a full database, recovery will initiate.

Transactions may have beeen logged redundantly depending on the replication factor in the predecessor production cluster. In order to replay each transaction only once, each recovery node is assigned a set of partitions from the predecessor production cluster. Each node will only replay transactions targeted for that node’s assigned partitions.

How VoltDB Replays Transactions

Transactions are replayed the same way as normal transactions are processed from VoltDB client applications (http://voltdb.com/company/blog/lifecycle-transaction). They are re-initiated using initiators that are local to each log. The original transaction IDs are reused. Transactions are rehashed and sent to the corresponding partition(s) in the new topology. In other words, command logs are topology agnostic. No matter what the topology looks like in the new cluster, as long as there are enough command logs to assemble a full database, the database can be recovered.  This approach leverages VoltDB’s distributed transaction processing infrastructure.

Before transactions are re-initiated, they have to be processed so that they replay in the correct order. Replay uses VoltDB’s transaction processing engine to ensure transactions are executed in global order: http://voltdb.com/company/blog/transaction-ordering-and-replication (again, letting the core VoltDB engine do the heavy lifting). VoltDB ensures that incoming transaction streams are properly sequenced and replayed into the database.

As mentioned above, each VoltDB node only replays transactions that were issued to its selected partition(s). Different types of transactions are handled differently. Single-partition transactions are checked by looking at the partition parameters. For multi-partition transactions, VoltDB records the transactions’ original coordinators, then maps the coordinator site IDs back to the original partition IDs.

On the last step of recovery, after all transactions have been replayed, the database takes a global snapshot. The snapshot truncates the command log so that log files can be reused when the system is ready to accept transactions. At this point, it is possible to recover the database again to the same state by using this snapshot as a starting point.

Summary

Command logging replay is designed to apply snapshots and command logs to quickly recover a VoltDB database.  It adds important new functionality, but also explicitly leverages VoltDB’s core transaction processing infrastructure.  In our benchmarks, a single node VoltDB database running on a commodity server was able to replay the command log of the Voter application at 110,000 transactions per second, which is roughly equivalent to VoltDB’s normal production throughput.