Run Everywhere

written by vdbdev on February 26, 2014 with no comments

VoltDB supports single-partition transactions, which as the name implies, run on only
one partition. VoltDB can run as many single partition transactions in parallel as
there are unique partitions. This enables high transaction throughput as well as
high transaction concurrency execution.

VoltDB also supports multi-partition transactions, which are transactions that involve
all unique partitions within the database. When a multi-partition transaction is
executing, it is the only transaction running in the database at that point in time.
Thus, you can run fewer multi-part transactions as compared to single-part transactions

But what if you wanted to run an operation on all data (each partition), but you
didn’t want to execute the operation as a multi-part transaction? For example, you
may want to run a periodic operation across all the data to locate one or more
particular records, as a scatter/gather type of operation. Or perhaps you want to
age out or delete rows, or perhaps timeout session data. These types of operations
likely don’t have to be transactionally consistent across the data set and thus do
not need to be run as a multi-partition transaction.

Here’s how to do that.

In this example, we’ll modify the Voter sample application to periodically execute
a stored procedure on each partition. This transaction will return the count
of the number of voters in each partition.

Here’s the DDL to define the “Everywhere” stored procedure:

CREATE PROCEDURE FROM CLASS voter.procedures.Everywhere;
PARTITION PROCEDURE Everywhere ON TABLE votes COLUMN phone_number;

The Everywhere Java stored procedure is defined as follows:

package voter.procedures;

import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

public class Everywhere extends VoltProcedure {

// Count the phone numbers who voted in this partition
public final SQLStmt count_votes = new SQLStmt(
        "SELECT count(*) FROM v_votes_by_phone_number;");

public VoltTable[] run(long id) {
    // Count the phone numbers who voted.
    voltQueueSQL(count_votes);
    return voltExecuteSQL(true);
  }
}

The client code uses a VoltDB System Procedure, @GetPartitionKeys,
to retrieve a list of partition values, one for every partition in the database.
Client programs can use this list of partition key values to invoke a single-part
stored procedure on each partition individual, as shown in this client-side
code:

/**
 * Run the a query (or stored procedure) on every partition, as a s
 * ingle-partition transaction.
 *
 * @throws Exception if anything unexpected happens.
 */
public void run_everywhere() throws Exception {
    // Get the partition key for each partition from the database.
    // Once we have partition ids for all partitions we'll loop through
    // and invoke our query or transaction as a single-part transaction.

    VoltTable results[] = client.callProcedure("@GetPartitionKeys", "INTEGER")
                              .getResults();
    VoltTable keys = results[0];
    for (int k = 0;k < keys.getRowCount(); k++) {
        long key = keys.fetchRow(k).getLong(1);
        VoltTable voter_count_table = client.callProcedure("Everywhere", key)
                                        .getResults()[0];
        System.out.println("Partition " + key + " row count = " + 
                              voter_count_table.fetchRow(0).getLong(0));
    }
}

When this code is added to the Voter application and run periodically via the
statistics thread on a single-node cluster with 6 sites per host, the output
looks similar to the following:

Running benchmark...
00:00:05 Throughput 137143/s, Aborts/Failures 0/0
Partition 2 row count = 185484
Partition 1 row count = 186041
Partition 0 row count = 186821
Partition 3 row count = 187131
Partition 5 row count = 186745
Partition 4 row count = 187408
00:00:10 Throughput 125519/s, Aborts/Failures 0/0
Partition 2 row count = 288592
Partition 1 row count = 289928
Partition 0 row count = 290540
Partition 3 row count = 290900
Partition 5 row count = 290025
Partition 4 row count = 290783