Real-time Analytics with JSON

written by vdbdev on October 23, 2013 with no comments

This post describes how to combine VoltDB JSON functionality and materialized views
to perform high velocity real-time analytics. For this example, we’ll make some
simplistic modifications modify the voter application (found in
/examples/voter) to store vote values as JSON and
generate real-time analytics using views on the JSON vote data.

The first thing we need to do is update our “votes” table to use
JSON. The votes table is our high velocity transactional table – we want to
support as many votes per second as possible, so this table is partitioned across
the database. This table tracks phone numbers (the partition column)
along with two other columns, the state the call was made from (so that we can roll
up votes by state), along with the contestent that is being voted for. Let’s modify
this table definition to store both the state and contestent values in JSON, by
adding a varchar(256) column as follows:

CREATE TABLE votes
(
      phone_number       bigint     NOT NULL
    , json_data          varchar(256) NOT NULL
);

In order to provide efficient roll-up counting by state, we’ll next update the
v_votes_by_contestant_number_state materialized view. Materialized views are
refreshed automatically when the underlying data changes. As such performing
database-wide expensive computations such as tallying votes by state becomes extremely
efficient when using a view. Because we’re building this view on the values
of state and contestent-number that are contained within our JSON payload, we must
use the field() JSON column function to extract those values when we define the
view. Here’s the updated DDL:

CREATE VIEW v_votes_by_contestant_number_state
(
  contestant_number
, state
, num_votes
)
AS
   SELECT cast(field(json_data, 'contestant_number') as integer)
        , field(json_data, 'state')
        , COUNT(*)
     FROM votes
 GROUP BY cast(field(json_data, 'contestant_number') as integer)
        , field(json_data, 'state')
;

Finally, we must make a small modification to the voter application to store
our vote data, the state and contestant_number as JSON. The following code
modifies the Vote stored procedure to do just that:

    // Create a JSON string to hold the state and contestant number.
    String json_data = "{\"contestant_number\":\""+ contestantNumber +
                             "\",\"state\":\"" + state + "\"}";
    // Post the vote
    voltQueueSQL(insertVoteStmt, EXPECT_SCALAR_MATCH(1), phoneNumber, json_data);
    voltExecuteSQL(true);

Note that this modification is by way of simple example and not the most optimal
or extensive method to add JSON support. For example, you could migrated your json
formatting all the way back to the client, which would allow the the stored procedure
to become JSON-schema independent. You would not have to update it with each new
JSON field.

Storing voting data in JSON allows you to change the structure and contents of the
data stored within the JSON field, possibly per user or per vote, without changing
the database schema. Should you wish to query new fields within that data, you can
issue ad hoc queries, and even add additional indexes on the values using the field()
function.

Note that the trade-off for this flexibility. There is additional computational
overhead to parse the JSON to extract the field values during materialized view and
index maintenance, but fortunately the cost of maintaining the view is amortized
across all row inserts. As usual, you should benchmark your application and tune
as appropriate.

More Information

For more information please see the following: