VoltDB Real-Time Analytics with JSON

written by John Piekos on October 3, 2013 with no comments

Consider the scenario where you are building an online multi-player game platform, where users can use the platform to create their own games.  Such a platform would have the following high-level requirements:

  1. It needs to be able to process tens of thousands of write transactions, game state changes, per second.  As each player makes a “move” or “action”, that needs to be recorded.
  2. It needs to be able to query player status and ranking.  Each player’s ranking in relation to other players, for example, needs to be computed regularly, across all players and all games.
  3. It would need to support different properties, or data items, per unique game.

VoltDB easily handles requirements #1 and #2.  VoltDB can ingest millions of transactions a second on a modest commodity cluster.  VoltDB’s SQL aggregation and materialized views allow you to invoke queries that quickly and efficiently calculate rankings, aggregations and high velocity counting.

But how would you implement requirement #3, the ability to have custom data elements, columns, for each customer? It is difficult to design a complete database schema in advance, when future business requirements are unknown.  With the added challenge that the data volume will continue to grow, perhaps astronomically.

This is where VoltDB, and in particular, its JSON data processing can help.  In the v3.0 release (released January 2012), VoltDB introduced the concept of storing, querying and reporting on JSON data.  A prior blog post discussed the technical details and developer advantages of using JSON in your VoltDB application, from allowing flexible schema, to adding unstructured data alongside relational data.

How it works

JSON is a industry-standard format that stores both data and structure in a plain text string. So each JSON-encoded string can have a different structure and/or size.

Because it is encoded as plain text, it is easy to store JSON data in a variable length string (VARCHAR) field. What makes VoltDB’s implementation special are the column functions that let you intelligently operate on that data directly within SQL statements. The field()array_element(), and array_length() functions let you parse individual elements of the JSON structure. You can even execute SQL queries that join on fields and elements of JSON text.

For example, it is possible to select records based on the value of a specific JSON field. You can even create materialized views based on elements of JSON data, which are refreshed automatically when the underlying data changes.

Real-time Analytics with VoltDB JSON

Scott Jarr, in his insightful Big Data Continuum blog post explains that “Shortly after data creation, we are often interested in a specific data instance relative to other data that has also arrived recently – how is my network traffic trending, what is my composite risk by trading desk, what is the current state of my online game leader board?  Queries like these on high velocity data are commonly referred to as real-time analytics.”  Real-time analytics is defined as the ability to perform business-related queries on vast amounts of data immediately, as the data arrives, rather than waiting until some time in the future when your data warehouse has batch processed the data.

Building applications that produce real-time analytics can be very challenging. A successful high-velocity application must not only ingest huge amounts of data it must also produce aggregate metrics that make that data meaningful. The first metric people often ask for are counts. “How many times did X happen?” is an easy question to ask, but for the online, mobile and machine generated data streams of tomorrow, it can be tricky to answer. Existing OLAP and OLTP systems have to rely on fuzzy answers, or post-processing that is out of date as soon as it is produced, minutes or hours later.

Because VoltDB was designed for extreme write throughput, a single row can be updated tens of thousands of times per second. Distributed counters with millions of updates per second and thousands of reads per second leverage VoltDB’s full consistency and distributed querying. Furthermore, while other systems are focused on counting, VoltDB’s rank-indexes and materialized view support allows live and instantaneous querying of minimums, maximums, top-N, bottom-N, averages, sums as well as aggregated sums and counts. Furthermore, these constructs can be combined with powerful SQL expressions to build custom, focused metrics.

The accuracy of this real-time analytics is guaranteed because it is comes directly from the relational data within VoltDB tables. What’s more, the analytics can be extended to include the variable fields within JSON documents stored in VARCHAR columns. The VoltDB storage engine natively understands JSON and allows for deep and selective indexing within such documents. Materialized views can be build on keys and arrays, all using straightforward extensions to the SQL data definition and query language.

Conclusion

VoltDB JSON allows your application to compute real-time analytics, including live reports and live decisioning support, over a variety of high velocity data, both structured and semi-structured. 

When data is semi-structured, varied on a per customer or tenant basis, or possibly from different data sources, there is no need to created parallel database schemas or systems to ingest and process these feeds.  By incorporating VoltDB’s JSON processing you can greatly simplify application development while supplying high velocity real-time analytics across differing data feeds, enabling your business to capitalize and make decisions, across all types of data, the moment the it arrives.

Additional Resources

  • VoltDB Vanguard case study: A VoltDB game platform customer reported running 800,000 tx/second through their PHP client on their multi-tenant JSON-based application.  For additional details visit the VoltDB Vanguards webcast page here: http://voltdb.com/resources/webcasts/.