There are three steps to running a VoltDB application:
Defining the cluster configuration
Starting the VoltDB database
Starting the client application or applications
The following sections describe the procedures for starting and stopping a VoltDB database in detail.
The schema that is used to compile the application catalog defines how the database is logically structured: what tables to create, which tables are partitioned, and how they are accessed (i.e. what stored procedures to support). The other important aspect of a running database is the physical layout of the cluster that runs the database. This includes information such as:
The number of nodes in the cluster
The number of partitions (or "sites") per node
The amount of K-safety to establish for availability
You define the cluster configuration in the deployment file. The deployment file is an XML file, which you specify when you start the database to establish the correct cluster topology. The basic syntax of the deployment file is as follows:
<?xml version="1.0"?>
<deployment>
   <cluster hostcount="n"
            sitesperhost="n"
            kfactor="n"
   />
</deployment>
The attributes of the <cluster> tag define the physical layout of the hardware that will run the database. Those attributes are:
hostcount — specifies the number of nodes in the cluster.
sitesperhost — specifies the number of partitions (or "sites") per host. In general, this value is related to the number of processor cores per node. Section 6.1.1, “Determining How Many Partitions to Use” explains how to choose a value for this attribute.
kfactor — specifies the K-safety value to use when creating the database. This attribute is optional. If you do not specify a value, it defaults to zero. (See Chapter 11, Availability for more information about K-safety.)
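Putting these attributes together, a minimal deployment file for a hypothetical three-node cluster of dual quad-core machines, with six sites per host and one level of K-safety, might look like the following. The specific values are illustrative only; choose them for your own hardware as described in Section 6.1.1:

```xml
<?xml version="1.0"?>
<!-- Illustrative example: 3 nodes, 6 partitions per node, K-safety of 1 -->
<deployment>
   <cluster hostcount="3"
            sitesperhost="6"
            kfactor="1"
   />
</deployment>
```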
In the simplest case — when running on a single node with no special options enabled — you can skip the deployment file altogether and specify only the catalog on the command line. If you do not specify a deployment file or host, VoltDB defaults to one node, two sites per host, a K-safety value of zero, and localhost as the host.
The deployment file is used to enable and configure many other runtime options related to the database, which are described later in this book. For example, the deployment file specifies whether security is enabled and defines the users and passwords that are used to authenticate clients at runtime. See Chapter 8, Security for more information about security and VoltDB databases.
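As a preview, enabling security in the deployment file looks something like the following sketch. The user shown and the attribute values are purely illustrative assumptions; Chapter 8, Security gives the authoritative syntax for declaring users and enabling authentication:

```xml
<deployment>
   <cluster hostcount="3" sitesperhost="6" kfactor="1" />
   <!-- Illustrative only; see Chapter 8 for the exact syntax -->
   <security enabled="true"/>
   <users>
      <user name="operator" password="mech" groups="ops"/>
   </users>
</deployment>
```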
In general, the number of partitions per node is related to the number of processor cores each system has, the optimal number being approximately 3/4 of the number of CPUs reported by the operating system. For example, if you are using a cluster of dual quad-core processors (in other words, 8 cores per node), the optimal number of partitions is likely to be 6 or 7 partitions per node.
For systems that support hyperthreading (where each physical core supports two threads), the operating system reports twice the number of physical cores. In other words, a dual quad-core system would report 16 virtual CPUs. However, each partition is not quite as efficient as on non-hyperthreading systems. So the optimal number of partitions is more likely to be between 10 and 12 per node in this situation.
Because there are no hard and fast rules, the optimal number of partitions per node is best determined by benchmarking the application to see what combination of cores and partitions produces the best results. However, two important points to keep in mind are:
It is never useful to specify more partitions than the number of CPUs reported by the operating system.
All nodes in the cluster will use the same number of partitions, so the best performance is achieved by using a cluster with all nodes having the same physical architecture (i.e. cores).
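For example, applying the guidelines above to a cluster of hyperthreaded dual quad-core machines (8 physical cores, 16 reported CPUs), you might start benchmarking with a value in the 10 to 12 range. The value shown here is an assumption to benchmark against, not a recommendation:

```xml
<!-- Starting point for benchmarking on hyperthreaded 8-core nodes -->
<cluster hostcount="3"
         sitesperhost="12"
         kfactor="1"
/>
```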
In addition to configuring the database process on each node of the cluster, the deployment file lets you enable and configure a number of features within VoltDB. Export, automatic snapshots, and network partition detection are all enabled through the deployment file. The later chapters of this book describe these features in detail.
An important aspect of these features is that some of them make use of disk resources for persistent storage across sessions. For example, automatic snapshots need a directory for storing snapshots of the database contents. Similarly, export uses disk storage for writing overflow data if the export client cannot keep up with the export queue.
You can specify individual paths for each feature, or you can specify a root directory where VoltDB will create subfolders for each feature as needed. To specify a common root, use the <voltdbroot> tag (as a child of the <paths> element) to identify where VoltDB will store disk files. For example, the following <paths> tag set specifies /tmp as the root directory:
<paths>
   <voltdbroot path="/tmp" />
</paths>
Of course, /tmp is appropriate for temporary files, such as export overflow. But /tmp is not a good location for files that must persist when the server reboots. So you can also identify specific locations for individual features. For example, the following excerpt from a deployment file specifies /tmp as the default root but /opt/voltdbsaves as the directory for automatic snapshots:
<paths>
   <voltdbroot path="/tmp" />
   <snapshots path="/opt/voltdbsaves" />
</paths>
If you specify a root directory path, the directory must exist and the process running VoltDB must have write access to it. VoltDB does not attempt to create an explicitly named root directory path if it does not exist.
On the other hand, if you do not specify a root path or a specific feature path, the root path defaults to ./voltdbroot in the current working directory and VoltDB creates the directory (and subfolders) as needed. Similarly, if you name a specific feature path (such as the snapshots path) and it does not exist, VoltDB will attempt to create it for you.
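For example, the following sketch relies on the default root but names an explicit snapshots directory. Under the behavior described above, VoltDB would create ./voltdbroot automatically and attempt to create /opt/voltdbsaves if it does not already exist; the snapshot path is illustrative:

```xml
<paths>
   <!-- No <voltdbroot> element: the root defaults to ./voltdbroot,
        which VoltDB creates as needed -->
   <snapshots path="/opt/voltdbsaves" />
</paths>
```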
The deployment file defines the expected configuration of your database cluster. However, there are several important aspects of the physical hardware and operating system configuration that you should be aware of before running VoltDB:
VoltDB can operate on heterogeneous clusters. However, best performance is achieved by running the cluster on similar hardware with the same type of processors, number of processors, and amount of memory on each node.
All nodes must be able to resolve the IP addresses and host names of the other nodes in the cluster. That means they must all have valid DNS entries or have the appropriate entries in their local hosts file.
You must run NTP on all of the cluster nodes, preferably synchronizing against the same local time server. If the time skew between nodes in the cluster is greater than 100 milliseconds, VoltDB cannot start the database.
It is strongly recommended that you run NTP with the -x argument. Using ntpd -x stops the server from adjusting time backwards for all but very large increments. If the server time moves backward, VoltDB must pause and wait for time to catch up.