Programming VoltDB – Easy, Flexible and Ultra-fast!
You may believe that the only way to interact with VoltDB is through Java stored procedures. To achieve maximum throughput, VoltDB stored procedures is the way to go. You can achieve upwards of 100,000 transactions per second on a single node. However, you can also achieve significant throughput by interacting with VoltDB conversationally, through ad hoc SQL statements, avoiding the need to pre-compile stored procedures.
This blog will discuss several approaches to interacting with VoltDB programmatically and cover the performance, in terms of transaction throughput, that you should expect with each approach.
Ad Hoc Query
To VoltDB, an ad hoc query is a SQL statement that has not yet been planned (pre-compiled). To test your SQL or to get started quickly with an application, you can avoid writing stored procedures and pass SQL to VoltDB in an ad hoc conversational manner. Ad hoc queries can be passed to VoltDB in several ways: from an application via the VoltDB JDBC driver, via the JSON interface, by passing the SQL statement to the @AdHoc system stored procedure, or interactively using VoltDB Studio or the command line sqlcmd tool.
Each ad hoc SQL invocation is treated as a single transaction. Using this approach, applications can achieve upwards of several thousand transactions per second, depending on the complexity of the SQL being executed.
For example, if we execute the SQL found in the Vote stored procedure from the VoltDB voter application as an ad hoc query, it would look something like this :
Note that this SQL is for example only, you may not want to write your logic this way due to SQL injection issues.
Of course, ad hoc SQL can also be executed with the JDBC driver, but consider issuing the SQL via the VoltDB client api (Java API in this example) as it gives you the additional flexibility of invoking requests in an asynchronous manner.
When VoltDB receives an ad hoc query, VoltDB determines if the query can be executed on a single partition. If possible, as is the case with the query above, VoltDB executes it as such. This achieves greater throughput than those queries that need to be executed across all partitions (called multi-partition queries, explained later in this post).
When you are ready to dive into the deep end of the pool and achieve web scale transaction throughput – executing 10’s of thousands to 100’s of thousands of transactions a second – you need to execute your transactions within VoltDB stored procedures.
Usually there is some portion of your application that is the “fast path”. VoltDB stored procedures allow you to optimize those fast path execution paths to achieve the maximum possible throughput.
If you are unfamiliar with the Java language, writing stored procedures may be initially intimidating. That’s ok. Writing stored procedures is not that difficult and can be done without having to learn all of Java. The fastest way to create your stored procedures is to take the SQL and the program logic that comprises the high velocity portion of your application, and put this logic into a VoltDB Java stored procedure. Our documentation has a nice overview of writing a stored procedure. When you compile this code, VoltDB creates and packages query execution plans for the SQL within your stored procedure at compile time, thus eliminating all of the query plan processing overhead during runtime execution (The overhead that’s in play when executing ad hoc queries).
By coding those transactional execution paths using VoltDB stored procedures, you are enabling VoltDB to pre-compile and ultimately parallelize transaction execution across a cluster of commodity hardware. This architecture is important to understand because it directly influences how you choose to build your VoltDB client applications as the rest of this blog post will discuss.
There are two types of stored procedures: Single-partition and Multi-partition. Understanding the difference is vital to maximizing your application’s performance.
When you create a VoltDB stored procedure, by default, it is treated as a multi-partition. A multi-part procedure is executed on every partition, with VoltDB handling the merging of the results from each partition before returning the result to the client.
Use multi-partition procedures when you wish to retrieve or interact with data that is spread across more than one, potentially all, partitions. When invoking multi-partition procedures, applications can generally experience 1,000 to 10,000 transactions per second, depending on the complexity of the stored procedure SQL being executed. A good example of a multi-partition procedure is the Results procedure that can be found in the Voter sample app. This procedure periodically gathers voting results by retrieving data from all partitions – while a higher rate of Vote transactions are occurring in parallel in each individual partition (discussed in the next section).
If your operation is “single-partition”, it is executed on only one partition. This is important because if only one partition is in use, all other partitions can be executing other single-partition transactions in parallel. This is how your application achieves maximum scalability – by spreading work concurrently across all available partitions in the cluster. Write-intensive workloads that have been historically difficult to scale with existing technology can easily scale in VoltDB.
By default, stored procedures are treated as multi-partition procedures. To identify your procedure as single partition, add the following Java annotation at the beginning of your stored procedure:
You also want to make sure your stored procedure is passing in the key on which you partitioned the data. VoltDB uses this key to determine on which partition to execute the stored procedure SQL. A good example of a single-partition operation is the Vote procedure, partitioned on “phone number”, which can be found in the Voter sample app.
Using single-partition procedures, applications can generally experience 1 00,000+ transactions per second, per node, depending on the type of machine they are being executed on. For example, a MacBook Pro might execute the Voter application at 40,000 transactions a second, while the same application might run at 120,000 transactions a second on a single commodity quad-core Linux box.
For a deeper understanding about single partition vs. multi-partition execution, please refer to the section on designing single partition procedures in the VoltDB documentation.
Automatically Generated Stored Procedures
There are two occasions where VoltDB will automatically generate stored procedures. For partitioned tables where the partitioning column is defined as part of the primary key, VoltDB will automatically generate CRUD (Create, Read, Update, Delete) stored procedures.
Additionally, single SQL statements can be defined in the project definition file and VoltDB will automatically generate the respective stored procedure. For example, including this construct in your project.xml file will generate a MyReservationsByTrip stored procedure that executes the specified SQL:
Note that the SQL can be parameterized, as shown above, and you can also easily identify the stored procedure as single partition or multi-partition. More details on single-statement stored procedures can be found here.
Building client applications with VoltDB is very easy – and the good news is that with VoltDB you can walk before you run. Should you choose to, you can begin building your application using ad hoc queries either through JDBC or any one of our client APIs. Using this strategy, you can easily build an application that supports several thousand transactions a second. As you become more comfortable with VoltDB (it won’t take long!), you can migrate your “fast path” transactions to stored procedures, and reach 100+ thousand transactions a second.
Ultimately to achieve eye popping throughput you’ve read about in the VoltDB benchmarks, such as the 3.4 million transactions per second benchmark, or the recent 695,000 tx/second Node.js benchmark, you will want to encode a majority of your transactional logic as VoltDB stored procedures. To achieve this type of scale you will also want to run your application against a VoltDB cluster of commodity Linux servers (or cloud machines). Clustering will be the subject of my next blog post.