Upserts in VoltDB

written by Andrew Wilson on November 16, 2012 with 3 comments

Upserts in VoltDB

The idea behind an upsert is that you try an update or an insert query first and if it fails, you then do the other query.
Why do an upsert? Upserts tend to be very fast in traditional databases because they can execute in as little as one query or as many as two. Consequently, a good upsert strategy has only two defined queries (insert and update) rather than three (insert, update and select). More importantly, “upsert” itself can be a keyword in which the database understands that it is responsible for figuring out whether to update or insert a record. VoltDB, however does not actually need an upsert statement at all. The execution time of a select statement adds almost no detectable overhead compared to the traditional database’s upsert.

Deciding on whether to insert or update first can be quickly determined by the size of the data. I’ll describe two scenarios and how upserts fit into them.

The first scenario is user profile registration and updates. Your application has an ever-growing list of users registering and occasionally updating their profiles. I’m defining profile as the metadata like address, contact information, etc. and possibly username and password. It is likely that you have more new registrations than profile updates. In this case, you will want to try inserting the record and having it fail and then falling back to the update. Why? Because it is possible that 9 out of 10 profile related operations are new records and consequently the insert will succeed more often than an update.

The second scenario deals with analytical groups. Consider that your application tracks age, sex, income and education as a demographic group with a count. You may have 5 age groups, 7 income groups and 6 education groups that are combined into demographic groups. That leaves you with only (576) 210 possible demographic groups that are created from your user profiles user profiles. You may have millions of user profiles, however you still only have 210 demographic groups. In this case, it is more efficient to attempt the update because the demographic group set is so small that you are more likely to find an existing demographic group and increment its count as opposed to encountering a new demographic group. Just to be clear, in this case it is better to attempt the update, fail and fallback to the insert.

How to upsert in VoltDB

VoltDB throws a VoltAbortException object in the event of a statement failure within a stored procedure. This exception should not be caught and instead be allowed to propagate to the client application. Updating a non-existent record or inserting a record with a duplicate primary key will throw the exception. Consequently, to implement upsert, you must first perform a select and then execute either an update or an insert. This is shown in the following code.

SQL.DDL

CREATE TABLE DEMOGRAPHIC_AGGREGATION (
 key             varchar(200)        UNIQUE NOT NULL,
 network         varchar(100)        NOT NULL,
 sex             varchar(100)        NOT NULL,
 impressions     integer             NOT NULL,
 marital_status  VARCHAR(100)        NOT NULL,
 income          VARCHAR(100)        NOT NULL,
 education       VARCHAR(100)        NOT NULL,
 occupation      VARCHAR(100)        NOT NULL,
 conversions     integer             NOT NULL,
 percent         float               NOT NULL,
 CONSTRAINT key_idx PRIMARY KEY
 (key)
);
PARTITION TABLE DEMOGRAPHIC_AGGREGATION ON COLUMN key;

UpsertDemographicStats.java

@ProcInfo(partitionInfo = "DEMOGRAPHIC_AGGREGATION.key:0", singlePartition = true)
 public class UpsertDemographicStats extends VoltProcedure {
public final SQLStmt select = new SQLStmt(
 "Select network from DEMOGRAPHIC_AGGREGATION where key like ?;");
public final SQLStmt update = new SQLStmt(
 "update DEMOGRAPHIC_AGGREGATION set impressions= ?,
conversions=?, percent=? where key like ?;");
 public final SQLStmt insert = new SQLStmt(
 "INSERT INTO DEMOGRAPHIC_AGGREGATION (key,
network, sex, impressions, marital_status, income,
education, occupation, conversions, percent) "
+ " VALUES (?,?,?,?,?,?,?,?,?,?);");
public long run(String key, String network, String sex, int impressions,
 String maritalStatus, String income, String education,
 String occupation, int conversions, double percent) {
voltQueueSQL(select, key);
 VoltTable[] selectResults = voltExecuteSQL();
 if (selectResults[0].getRowCount() > 0) {
 voltQueueSQL(update, impressions, conversions, percent, key);
 } else {
 voltQueueSQL(insert, key, network, sex, impressions, maritalStatus,
 income, education, occupation, conversions, percent);
 }
  voltExecuteSQL(true);
return 1;
 }
 }

The above example first executes a select statement, which is a very inexpensive, or fast, query because we are running within the context of a single partition and we know the primary (unique) key of the record. We then branch depending upon whether an existing record is found, performing the insert or update operation accordingly.

With VoltDB, this is the recommend way to implement a single transaction upsert operation. To find examples of the code as well as other resources, please visit our GitHub page located at https://github.com/VoltDB/voltdb-labs/tree/master/java/UpsertSample.