Special Considerations for Existing 3.x Customers

Product: VoltDB
Version: 3.x

To accommodate new features and improve ease of use, a number of changes have been made to the VoltDB product in version 4.0, including changes to functionality available in earlier versions. Existing customers should be aware of the following changes in behavior that could impact their applications.

1.1. Try the VoltDB Enterprise Edition.

With V4.0, the durability and availability features associated with production use move from the VoltDB Community Edition to the Enterprise Edition. This includes K-safety and snapshot save and restore. We encourage anyone who has been using the Community Edition to try the Enterprise Edition, which comes with a 30-day trial license. The Community Edition remains freely available for research and application development. However, the Enterprise Edition is recommended for all production environments.

1.2. Upgrade both server and Java client systems.

The Java client library implements client affinity, allowing the client to route procedure calls directly to the most appropriate database server. VoltDB 4.0 changes the hashing algorithm used for partitioning. As a result, calling a VoltDB 4.0 database with a VoltDB 3.x Java client library can produce spurious errors and degraded performance. We strongly recommend upgrading both server and client systems when upgrading to VoltDB 4.0. (This change does not affect non-Java clients.)
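For reference, client affinity is controlled through the client configuration object in the Java client library. The following is a minimal sketch of a 4.0 client created with affinity enabled; the hostname voltsvr1 is illustrative:

import org.voltdb.client.Client;
import org.voltdb.client.ClientConfig;
import org.voltdb.client.ClientFactory;

public class AffinityClient {
    public static void main(String[] args) throws Exception {
        // Enable client affinity so calls are routed to the node
        // that owns the target partition (using the 4.0 hashing scheme).
        ClientConfig config = new ClientConfig();
        config.setClientAffinity(true);

        Client client = ClientFactory.createClient(config);
        client.createConnection("voltsvr1"); // illustrative hostname

        // ... issue procedure calls as usual ...
        client.close();
    }
}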

1.3. Check the latest system requirements.

With 4.0, VoltDB has updated system requirements. CentOS 5.8 and OS X 10.6 are no longer supported, and VoltDB now requires Java 7. If you have not already done so, be sure to upgrade your server software to meet the new requirements:

  • CentOS 6.3 and later, Red Hat 6.3 and later, Ubuntu 10.04 and 12.04, or Macintosh OS X 10.7 and later

  • OpenJDK or Oracle/Sun JDK 7 or later

1.4. Update scripts that use the CLI to start VoltDB.

VoltDB 4.0 introduces a new command line interface (CLI) for starting server nodes. The voltdb create, recover, rejoin, and add commands now use standard Linux command line syntax, including argument switches, consistent with the other VoltDB command line utilities voltadmin, csvloader, and sqlcmd. As a result, you will need to update any scripts you use to start VoltDB servers from the command line. (Note that the REST interface has not changed.) For example, the command to start a new database cluster looks like this:

$ voltdb create mycatalog.jar \
     --deployment=deployment.xml \
     --host=voltsvr1 \
     --license=~/license.xml

There are also short forms for the most common switches:

$ voltdb create mycatalog.jar \
     -d deployment.xml \
     -H voltsvr1 \
     -l ~/license.xml

If you would prefer to temporarily continue using the old syntax for the voltdb command, you can either:

  • Use the voltdb3 command which is provided for backwards compatibility.

  • Replace the voltdb script in the VoltDB /bin directory with the voltdb3 script. To do this, we recommend copying /bin/voltdb to /bin/voltdb4 and then copying /bin/voltdb3 to /bin/voltdb, so you have access to both the old and new syntax.

1.5. Changes related to elasticity.

With 4.0, elasticity (the ability to expand a running database) is enabled by default. There are several consequences of this change:

  • You no longer need to explicitly enable elasticity by adding the elastic="enabled" attribute to the <cluster> element in the deployment file.

  • Elasticity changes how content is segmented into partitions. As a result, the old technique of reaching each partition separately by incrementing a partition key modulo the number of partitions no longer works reliably. Instead, a new system procedure returns a set of key values for reaching each of the current partitions. The syntax for the @GetPartitionKeys system procedure is:

    ClientResponse client.callProcedure("@GetPartitionKeys", String datatype)

    The parameter to the procedure is a string value specifying the datatype of the keys to return. The allowable values for the parameter (which is not case sensitive) are "INTEGER", "STRING", and "VARCHAR", where "STRING" and "VARCHAR" are synonyms.

    Note that the results of the @GetPartitionKeys system procedure are valid at the time they are generated. If the cluster is static (that is, no nodes are being added and any rebalancing is complete), the results remain valid until the next elastic event. However, during rebalancing, the distribution of partitions is likely to change. So it is a good idea to call @GetPartitionKeys once to get the keys, act on them, then call the system procedure again to verify that the partitions have not changed. (A sketch of this pattern follows this list.)

  • When adding nodes to a highly available cluster (that is, where the K-safety value is greater than zero), you must add K+1 nodes before the cluster starts to rebalance the partitions. Until you add a full complement of K+1 nodes, any new nodes join the cluster but do not participate in any work.

  • While rebalancing is in progress, the cluster continues to process client requests. However, rebalancing tends to increase the latency of other transactions. Also, depending on the workload and the size of the database, rebalancing can take a significant amount of time. You can configure the balance between ongoing client transactions and rebalancing activity. See the section on elastic scaling in the Using VoltDB manual for details.

  • Elasticity works with all major features of VoltDB, including export, K-safety, command logging, and database replication (DR). For DR, when you use elasticity to expand the master cluster, replication stops. Once the master cluster finishes rebalancing, you can restart the replica — using sufficient nodes to match the new master configuration — then restart the DR agent. You cannot elastically expand the replica cluster.
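As referenced above, the following is a minimal sketch of the @GetPartitionKeys pattern using the Java client library. The procedure name DoPartitionedWork and the result column name PARTITION_KEY are illustrative assumptions; substitute your own single-partition procedure and verify the column name against your server's output:

import org.voltdb.VoltTable;
import org.voltdb.client.Client;
import org.voltdb.client.ClientFactory;

public class PartitionSweep {
    public static void main(String[] args) throws Exception {
        Client client = ClientFactory.createClient();
        client.createConnection("voltsvr1"); // illustrative hostname

        // Fetch one key per current partition.
        VoltTable keys = client.callProcedure("@GetPartitionKeys", "INTEGER")
                               .getResults()[0];

        // Run a single-partition procedure once per partition.
        // DoPartitionedWork is hypothetical, partitioned on an integer column.
        while (keys.advanceRow()) {
            long key = keys.getLong("PARTITION_KEY"); // assumed column name
            client.callProcedure("DoPartitionedWork", key);
        }

        // Call the procedure again to verify the partitions did not change mid-sweep.
        VoltTable check = client.callProcedure("@GetPartitionKeys", "INTEGER")
                                .getResults()[0];
        if (check.getRowCount() != keys.getRowCount()) {
            // A rebalance occurred; repeat the sweep with the new keys.
        }

        client.close();
    }
}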

1.6. Update the deployment file for any database using export.

With 4.0, support for remote export clients has been dropped; all export clients must now run on the database servers. This change provides both more reliability and better performance for the export process.

As part of this change, export configuration in the deployment file has been simplified. Specifically, the <onserver> element has been removed and the exportto attribute is replaced by the target attribute on the parent <export> element. The following example illustrates the new syntax you must use:

<export enabled="true" target="file">
   <configuration>
     <property name="type">csv</property>
     <property name="nonce">MyExport</property>
  </configuration>
</export>

1.7. Evaluate your schema for possible use of ASSUMEUNIQUE.

In previous releases, VoltDB allowed you to define unique indexes on partitioned tables even though uniqueness could not be guaranteed: if an index on a partitioned table does not include the partitioning column, VoltDB can only enforce uniqueness within the partition where the query is executed.

To ensure global uniqueness for UNIQUE and PRIMARY KEY indexes, VoltDB no longer allows you to define such indexes on partitioned tables if the index does not include the partitioning column. As a result, a schema that compiled successfully in the past may fail to compile under VoltDB 4.0.

However, there is an alternative. A new keyword, ASSUMEUNIQUE, has been added. You can use ASSUMEUNIQUE anywhere you use UNIQUE. The difference is that an ASSUMEUNIQUE index can be defined on a partitioned table where the index does not contain the partitioning column. VoltDB treats ASSUMEUNIQUE indexes as if they were unique, providing all the performance advantages of unique indexes. However, it is your responsibility to ensure the index is actually globally unique.

VoltDB verifies that an index entry is unique within the current partition when a record is inserted. The danger is that, if the index is not actually globally unique, any operation that repartitions the data can generate a constraint violation when two matching ASSUMEUNIQUE entries land in the same partition. Operations that can trigger such violations include restoring a snapshot to a different cluster configuration or rebalancing as a result of adding nodes. When this happens, the repartitioning operation is likely to fail and you must correct the conflict before continuing.
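Within a single partition, a violation surfaces immediately as a failed procedure call. The following minimal sketch shows where such an error appears in a Java client; ORDERS is a hypothetical partitioned table with an ASSUMEUNIQUE index, and the call uses the auto-generated default insert procedure:

import org.voltdb.client.Client;
import org.voltdb.client.ClientFactory;
import org.voltdb.client.ProcCallException;

public class InsertWithAssumeUnique {
    public static void main(String[] args) throws Exception {
        Client client = ClientFactory.createClient();
        client.createConnection("voltsvr1"); // illustrative hostname

        try {
            // ORDERS is a hypothetical table; column values are illustrative.
            client.callProcedure("ORDERS.insert", 101, "A-500", 19.99);
        } catch (ProcCallException e) {
            // A duplicate that lands in the same partition fails here.
            // Duplicates in different partitions are not detected until
            // a snapshot restore or rebalance brings them together.
            System.err.println("Insert failed: " + e.getMessage());
        }

        client.close();
    }
}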

Therefore, although there are performance advantages to using ASSUMEUNIQUE indexes, you use them at your own risk. You should ensure that the indexed values are actually globally unique, either by design or by external verification, before inserting records into the database.