Planned Maintenance Windows

written by vdbdev on January 30, 2014 with no comments

When you need to take the database down for maintenance, this is the recommended approach to do it safely with no data loss using a snapshot.

A snapshot is a point-in-time consistent copy of the entire contents of the database. You may think of it as a backup. It is written to a local disk on each node in a cluster. A snapshot can be taken manually at any time, whether the database is online and available to users or in admin mode, and regardless of whether command-logging or automated snapshots are enabled.

For this maintenance process, the database must first be put into admin mode so that no further user transactions can occur after the snapshot begins. This ensures that the snapshot contains all of the latest data. Later, when the database is restarted, it must be started in admin mode, to prevent users from creating any new records that could interfere with restoring the snapshot, until the snapshot has been loaded and the process is complete.

By relying on a snapshot rather than command-logging, this process allows you to make any type of change to the deployment of the cluster, its hardware, or the database schema. By contrast, the command log recovery process is intended to restart the database as-is with no deployment or schema changes, to recover from power loss or other failures that stopped the cluster.

Process for a planned maintenance window

1) Pause the database (disconnect users)

voltadmin pause

As an optional sanity check, query the record count of a large table, for example:

sqlcmd --port=21211

1> SELECT COUNT(*) FROM votes;

2) Take a manual snapshot

voltadmin save /path/to/save/dir snapshot_name

3) Verify snapshot succeeded

voltadmin show snapshots

This will show a report of the most recent snapshots. You should look at the results and verify that the snapshot completed successfully before stopping the database.

4) Shut down the database

voltadmin shutdown

5) Maintenance Window

During the window, you may make changes, such as:

  • Catalog changes
  • Deployment file changes
  • Hardware changes
  • Upgrading VoltDB

To prevent users from connecting right away when the database restarts, and possibly interfering before the snapshot is fully restored, you can have the database start in admin mode by setting adminstartup to true in your deployment.xml file:

<deployment>
  ...
  <admin-mode port="21211" adminstartup="true"/>
</deployment>

6) Restart the database

voltdb create thecatalog.jar -d deployment.xml -l thelicense.xml -H voltdbserver1 --background

7) Reload the data from the snapshot

voltadmin restore /path/to/save/dir snapshot_name

Optionally, repeat the sanity check, for example:

sqlcmd --port=21211

1> SELECT COUNT(*) FROM votes;

8) Take the database out of admin mode, which will allow users to connect

voltadmin resume