Part 4: Schema Updates and Durability

Documentation

Home » Documentation » Welcome to VoltDB

Part 4: Schema Updates and Durability

Thus far in the tutorial we have restarted the database from scratch and reloaded the data manually each time we changed the schema. This is often the easiest way to make changes when you are first developing your application and making frequent changes. However, as your application — and the data it uses — becomes more complex it is desirable to maintain your database state across sessions.

You may have noticed that in the previous section of the tutorial we modified the schema to add the States table but did not add it to the running database yet. That is because we want to demonstrate ways of modifying the database without having to start from scratch each time.

Note

The following examples make use of functionality available in the VoltDB Enterprise Edition; specifically, the voltadmin update command for on-the-fly schema changes. If you are using the VoltDB Community Edition you will need to save, restart, and restore to complete the schema updates manually. See the section called “Restoring Data to a New Catalog” for details.

Saving and Restoring Data

First let's talk about durability. VoltDB is an in-memory database. Each time you start the database with the create action, it creates a fresh, empty copy of the database. Obviously, in most real business situations you want the data to persist. VoltDB has several features that preserve the database contents across sessions. We will start by looking at snapshots.

Snapshots are a complete disk-based representation of a VoltDB database, including everything needed to reproduce the database after a shutdown. You can create a snapshot of a running VoltDB database at anytime using the voltadmin save command. For example, from our tutorial directory, we can save the data to disk in a snapshot called "townsandpeople".

$ HERE=$(pwd)
$ voltadmin save $HERE/voltdbroot/snapshots/ "townsandpeople"

The arguments to the voltadmin save command are the directory where the snapshot files will be created and the name for the snapshot. Note that the save command requires an absolute path for the directory. In the preceding example, we assign the current working directory to a variable so the snapshot can be saved in the subfolder ./voltdbroot/snapshots. VoltDB creates this folder by default when you start the database. We will learn more about it shortly.

Now that you have a copy of the database contents, we can stop and restart the database. But this time, instead of using the create command, we use recover:

^C
$ voltdb recover

When you specify recover as the startup action, VoltDB looks for and restores the most recent snapshot. And since the snapshot contains both the catalog and the data, you do not have to specify the catalog on the command line.

We can verify that the database was restored by doing some simple SQL queries in our other terminal session:

$ sqlcmd
SQL Command :: localhost:21212
1> select count(*) from towns;
C1     
-------
 193297


(1 row(s) affected)
2> select count(*) from people;
C1    
------
 81691


(1 row(s) affected)

Adding and Removing Tables

Now that we know how to save and restore the database, we can add the States table we defined in Part Three. Adding and dropping tables, or changing stored procedures can be done "on the fly", while the database is running. We start by recompiling the application catalog using our new schema definition. Then we use the voltadmin update command to update the catalog on the running database.[1]

When you update the catalog, you must provide both the catalog and the deployment file. VoltDB uses the deployment file to set runtime configuration options, such as the number of servers, how many partitions per host, and so on. Thus far we have used the default settings when starting the database, so have not needed to specify a deployment file. However, VoltDB creates one for you when you do that. So we can use the default deployment file that VoltDB creates to perform the update. So let's recompile the catalog and update the database. We can also check to make sure our new table exists by checking how many records it contains.

$ voltdb compile -o towns.jar towns.sql
$ voltadmin update towns.jar voltdbroot/deployment.xml
$ sqlcmd
SQL Command :: localhost:21212
1> select count(*) from states;
C1 
---
  0

(1 row(s) affected)
3> exit

Next we can load the state information from the data file and save a new copy of the database.

$ csvloader --skip 1 -f data/states.csv states
$ HERE=$(pwd)
$ voltadmin save $HERE/voltdbroot/snapshots/  "states"

Updating Existing Tables

Now that we have a definitive lookup table for information about the states, we no longer need the redundant columns in the Towns and People tables. We want to keep the FIPS column, State_num, but can remove the State column from each table. Our updated schema for the two tables looks like this:

CREATE TABLE towns (
   town VARCHAR(64),
--   state VARCHAR(2),
   state_num TINYINT NOT NULL,
   county VARCHAR(64),
   county_num SMALLINT NOT NULL,
   elevation INTEGER
);
CREATE TABLE people (
  state_num TINYINT NOT NULL,
  county_num SMALLINT NOT NULL,
--  state VARCHAR(20),
  town VARCHAR(64),
  population INTEGER
);

We can recompile the catalog as before.

Many schema changes, including adding and removing columns, can be done on the fly just like adding and dropping tables. However, there are some limitations. For example, you cannot add new uniqueness constraints to an existing index or column. In these cases, the best method is to save the existing data, then restore the data to a fresh copy of the database using the new catalog.

Although the preceding changes are permissible for voltadmin update, in the next section we will show you how to change the schema by restoring a snapshot to an updated schema, just so you know how it is done.

Restoring Data to a New Catalog

When you start a database with recover, VoltDB restores both the data and the associated catalog. To replace the catalog but keep the data you need to create a new database using the updated catalog and then manually restore the data using the voltadmin restore command.

If you haven't already, stop the database, recompile your catalog using the modified schema (which is townsupdate.sql in the source files), and create a new database using the updated catalog:

^C
$ cp townsupdate.sql towns.sql
$ voltdb compile -o towns.jar towns.sql
$ voltdb create towns.jar

Once the database has started, you can use the voltadmin restore command from another terminal session to restore the data — but not the catalog — from the previous session. The restore command takes the same first two arguments as save. That is, the directory where the snapshot resides and the unique identifier.

$ HERE=$(pwd)
$ voltadmin restore $HERE/voltdbroot/snapshots/ "states"
HOST_ID  HOSTNAME  SITE_ID  TABLE   PARTITION_ID  RESULT   ERR_MSG 
-------- --------- -------- ------- ------------- -------- --------
       0 pollux           0 STATES             -1 SUCCESS          
       0 pollux           0 STATES             -1 SUCCESS          
       0 NULL             0 PEOPLE              1 SUCCESS          
       0 NULL             0 PEOPLE              0 SUCCESS          
       0 NULL             0 PEOPLE              1 SUCCESS          
       0 NULL             0 PEOPLE              0 SUCCESS          
       0 NULL             0 TOWNS               1 SUCCESS          
       0 NULL             0 TOWNS               0 SUCCESS          
       0 NULL             0 TOWNS               0 SUCCESS          
       0 NULL             0 TOWNS               1 SUCCESS          

Again, we can use SQL queries to verify that the column has been removed.

$ sqlcmd
SQL Command :: localhost:21212
1> select top 1 * from towns order by state_num,county_num;
TOWN          STATE_NUM  COUNTY   COUNTY_NUM  ELEVATION 
------------- ---------- -------- ----------- ----------
Autaugaville           1 Autauga            1         49


(1 row(s) affected)

Finally, we can save the database one more time. To save space, we can delete the old snapshots from the directory before saving the new one. Like so:

$ rm voltdbroot/snapshots/*
$ voltadmin save $HERE/voltdbroot/snapshots/ "tutorial"

Durability and Automated Snapshots

Up to now we have manually saved the database when we made changes, which is fine during development or when doing an explicit save and restore to update the catalog. But in most production environments it is important to keep the snapshots up to date in case of accidents or unexpected events, such as system failure.

VoltDB provides several features to ensure the durability of your data. At a minimum, it is a good idea to schedule regular snapshots to maintain a recent copy of the database. Establishing an automated snapshot schedule is one of the configuration options you set when starting the database.

We have been starting VoltDB with the default configuration. However, you can enable specific features using a deployment file. You can see the default deployment file in the voltdbroot subfolder:

$ cat voltdbroot/deployment.xml
<?xml version="1.0"?>
<!-- IMPORTANT: This file is an auto-generated ...
                                               ... -->
<deployment>
   <cluster hostcount="1" sitesperhost="2" />
   <httpd enabled="true">
      <jsonapi enabled="true" />
   </httpd>
</deployment>

The default settings use a single machine, localhost, with two partitions per host and with the http and JSON interfaces enabled. You can copy this file to your working directory and edit it, or create a new deployment.xml file, adding the tags necessary to enable automated snapshots. Your updated deployment file looks like this:

<?xml version="1.0"?>
<deployment>
   <cluster hostcount="1" sitesperhost="2" />
   <httpd enabled="true">
      <jsonapi enabled="true" />
   </httpd>
   <snapshot prefix="tutorial" 
          frequency="5m" 
          retain="3"
   />
</deployment>

The attributes of the <snapshot> tag in the preceding example set the name (or prefix) of the snapshots to "tutorial", the frequency of snapshots to every 5 minutes, and tells VoltDB to retain only the three most recent copies. (Older snapshots are deleted to save space.) Now when we start or recover the database we can specify our custom deployment file to have VoltDB automatically snapshot every 5 minutes.

$ voltdb recover --deployment=deployment.xml

Once automated snapshots are in place, you don't have to manually create a snapshot before stopping the database. You can use the snapshots created automatically by VoltDB, using the command in the preceding example.

Note that the deployment settings are runtime options and are not saved as part of the snapshot. So you should always specify your deployment file on the command line when starting VoltDB unless you want to use the default configuration. VoltDB has several other features that provide additional protection against data loss and increase the resilience of the database, including K-safety, command logging, and database replication. See the Using VoltDB manual for more information about availability options.

This ends Part Four of the tutorial.



[1] If you are using the VoltDB Community Edition, you will need to stop and restart the database to change the schema, as described in the section called “Restoring Data to a New Catalog”.