VoltDB is a Churning Urn of Groovy Funk

written by Stefano Santoro on February 11, 2014 with no comments

VoltDB is welcoming Groovy into its ecosystem as its first inline procedure language. Code your procedure logic straight into the DDL, bypassing the java procedure requirements to edit/compile java source files separately.

With VoltDB Groovy stored procedures you can code your procedure implementation as part of CREATE PROCEDURE statements in your DDL file.

CREATE PROCEDURE groovy.procedures.Item AS ###
selectItem = new SQLStmt('SELECT ITEM_ID, DESCRIPTION FROM ITEMS WHERE ITEM_ID = ?')
transactOn = { id ->
    voltQueueSQL(selectItem, EXPECT_ZERO_OR_ONE_ROW, id)
    voltExecuteSQL(true)
}
### LANGUAGE GROOVY;
PARTITION PROCEDURE Item ON TABLE ITEMS COLUMN ITEM_ID;

How to write a groovy procedure

You can rely on the fact that the following imports are predefined and made available for your Groovy store procedures:

import static org.voltdb.VoltTypes.*
import static org.voltdb.VoltProcedure.*
import org.voltdb.*
import org.voltdb.groovy.TableBuilder
import org.voltdb.groovy.Tuplerator
import org.voltdb.VoltProcedure.VoltAbortException

You first define instances of SQLStmt that, like in Java, must never change throughout the procedure code. This is because VoltDB reflects on, analyzes, and creates plan for those statements.

You need, then, to define the transactOn closure. VoltDB invokes this closure when it actually executes the procedure. The closure can accept any arguments that its java run method equivalent can. It also must return a VoltTable, or an array of VoltTables, or a long value.

If you need to abort the transaction while the transactOn closure executes, then you need to throw the VoltAbortException.

Also keep in mind the following guidelines.

  • define your SQLStmt’s outside the transactOn closure
  • use tables to persist any state that must live across procedure invocations

Groovy Table Readers and Builders

The following procedure showcases two additions that make reading, and building a table a little easier

    CREATE PROCEDURE voter.procedures.ContestantWinningStates AS ###
        resultStmt = new SQLStmt('''
            SELECT contestant_number, state, SUM(num_votes) AS num_votes
            FROM v_votes_by_contestant_number_state
            GROUP BY contestant_number, state
            ORDER BY 2 ASC, 3 DESC, 1 ASC;
        ''')

        transactOn = { int contestantNumber, int max ->
            voltQueueSQL(resultStmt)

            results = []
            state = ""

            tuplerator(voltExecuteSQL()[0]).eachRow {
                isWinning = state != it[1]
                state = it[1]

                if (isWinning && it[0] == contestantNumber) {
                    results << [state: state, votes: it[2]]
                }         
            }         
            if (max > results.size) max = results.size
            buildTable(state:STRING, num_votes:BIGINT) {
                results.sort { a,b -> b.votes - a.votes }[0..<max].each {
                    row it.state, it.votes
                }
            }
        }
    ### LANGUAGE GROOVY;

tuplerator(VoltTable table) returns a Groovy wrapper around the VoltTable. The eachRow method accepts a closure that is invoked for each row in the table. Column values can easily be accessed by column index, or column name. In the example above ‘it’ is the implicit parameter passed to eachRow closure, and ‘it’ accesses the first column value for the row with the it[0] accessor. The same may be accessed with it['contestantNumber'] or more simply with it.contestantNumber. If you need to access directly the underlying table, you may do so by referring it with the ‘table’ accessor: it.table.get(1,STRING). Other methods available to you are rowAt(int rowNum), which sets the table cursor to the indicated row, and reset(), which resets the table cursor. For example

tuplerator(voltExecuteSQL()[0]).atRow(0)['state']

gets the state column value for the first row.

buildTable(column name: type map) allows you to build tables easily. The method parameters are:

  • a map where the keys are column names, and the values are their respective column types.
  • a closure where row invocations add rows to the underlying table

The code above shows a table with two columns (state, and num_votes), that is fed values from the results collection.

Both may be combined as follows, where the results of one table are processed, and fed into the newly built one:

CREATE PROCEDURE voter.procedures.GetStateHeatmap AS ###
    resultStmt = new SQLStmt('''
        SELECT contestant_number, state, SUM(num_votes) AS num_votes
        FROM v_votes_by_contestant_number_state
        GROUP BY contestant_number, state
        ORDER BY 2 ASC, 3 DESC, 1 ASC;
    ''')

    transactOn = {
        voltQueueSQL(resultStmt)

        state = ""

        buildTable(
            state:STRING, contestant_number:INTEGER, 
            num_votes:BIGINT, is_winning:TINYINT
        ) {
            tuplerator(voltExecuteSQL()[0]).eachRow {
                byte isWinning = state != it.state ? (byte)1 : (byte)0
                state = it.state

                row state, it.contestantNumber, it.numVotes, isWinning
            }
        }
    }
### LANGUAGE GROOVY;

The above code excerpts may be perused by visiting an example DDL in our Github repository

Java Procedures are still faster than Groovy procedures

The syntactic conciseness that leverages the dynamic code interpretation capabilities of Groovy comes at a performance cost. Java should be your procedure implementation language choice if you really need to eek out as much performance as you can. But if you want something to get you started faster with VoltDB, and explore its features, and power, then Groovy procedures are a very good option.