
In-Memory Database Performance Use Cases

Streaming Data Pipeline

A streaming data system is traditionally used to handle a “stream” of arriving events. These systems are built to ingest fast-moving data feeds, but they lack context and state, which are necessary for decision-making. Unlike OLAP and OLTP systems, streaming systems are not optimized to store data or produce fast lookups.

 

VoltDB enables applications to use real-time streaming data to enrich user experience, optimize interactions, and create value. The scale-out, ACID-compliant SQL architecture ensures data durability and provides standard application interfaces (ODBC/JDBC) with broad ad-hoc query capability. Applications can act on real-time, per-event data as it streams in, then export it to a long-term data warehouse or analytics store for reporting and analysis. VoltDB offers real-time ingest for real-time applications while also supporting stateful buffering of the feed for downstream batch processing, so it meets both sets of requirements.

VoltDB scales to "firehose" speeds and includes a built-in pipeline connector called "VoltDB Export". Use VoltDB Export to stream data that has been processed by VoltDB to downstream systems. We ship export connectors for Hadoop (HDFS), HP Vertica, message queues like Kafka and RabbitMQ, and local file systems. Export data can be formatted as CSV, JDBC rows, or Avro messages. The export connector API is open source and can be extended if you need to connect VoltDB to other downstream components or format data using a different serialization.

Ingest

Connect to VoltDB using a native VoltDB driver, by POSTing requests directly over HTTP or by using one of the pre-built loaders to connect to an existing data source.

Native Drivers                     Loaders
C++                                Kafka
Java                               CSV
PHP                                JDBC
Python                             SQL Command Tool
C#                                 Hadoop OutputFormat
JDBC
ODBC
Others: see Clients and Drivers    Vertica UDX

Example: Use the VoltDB Kafka Loader to feed events from Kafka to VoltDB

voltdb/bin/kafkaloader volttable --zookeeper=zkserver:2181 --topic=sourcedata

More on the VoltDB Ecosystem

Two of these loaders are worth a closer look. The VoltDB Kafka Loader makes it easy for VoltDB to ingest streams of data from Kafka message queues. The VoltDB JDBC Loader loads data into VoltDB from relational data stores: it retrieves all records from a specified table in a remote database and inserts them into a matching table in VoltDB. This pattern is often used to install result data, such as the computed user segments used by digital ad network applications, into your fast data pipeline.

VoltDB also includes a Hadoop OutputFormat implementation to import job data from Hadoop into VoltDB. This comes in handy when performing historical analysis and capturing the intelligence needed when making real-time decisions. For example, a digital ad decision-making application may use a user/household segmentation data set, computed in Hadoop, to target different demographics with specific ads.

 

This output format is used by our Hadoop connectors for both Apache Pig and Apache Hive. The connectors are available in our GitHub repository.

VoltDB also built a Vertica UDx, a user-defined extension, that takes a Vertica result set and loads it into VoltDB. The UDx is available for download.

Analyze and Decide

VoltDB processes each incoming event or request as a discrete ACID transaction. A transaction can be one or more SQL statements pre-defined in a DDL file, an ad-hoc SQL statement issued by the application, or a combination of SQL and Java encapsulated in a VoltDB stored procedure.

 

Use VoltDB’s scale-out, shared-nothing, fault-tolerant, ACID SQL database capabilities to:

    • Filter duplicate events
    • Sessionize real-time click streams
    • Enrich (denormalize) incoming data using reference tables
    • Classify events in real time using analytics from Hadoop/Warehouses
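
As an illustration of the sessionizing step in the list above, here is a minimal Java sketch. It is not VoltDB code: the ClickSessionizer class, the 30-minute gap, and the in-memory maps are assumptions made for the example. In a VoltDB application the per-user state would live in tables and the logic would run inside a stored procedure, so that each click is handled as one transaction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch; in VoltDB this state would live in tables
// and the logic in a stored procedure. All names here are illustrative.
class ClickSessionizer {
    // Assumed inactivity gap after which a new session starts (30 minutes).
    static final long SESSION_GAP_MS = 30L * 60L * 1000L;

    private final Map<String, Long> lastClickMs = new HashMap<>();
    private final Map<String, Integer> sessionNumber = new HashMap<>();

    // Returns the session number (starting at 1) that this click belongs to.
    int sessionize(String userId, long timestampMs) {
        Long previous = lastClickMs.put(userId, timestampMs);
        int current = sessionNumber.getOrDefault(userId, 0);
        if (previous == null || timestampMs - previous > SESSION_GAP_MS) {
            current++; // first click, or the gap was too long: open a new session
            sessionNumber.put(userId, current);
        }
        return current;
    }
}
```

Clicks from the same user that arrive within the gap share a session number; a click after a longer pause opens the next session.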

Export

Solving data-at-scale problems requires using multiple tools together: “one size does not fit all.” VoltDB embraces this point of view and includes a native, high-performance integration interface, VoltDB Export. Use VoltDB to process discrete events in real time (thousands to millions of events per second, with per-event responses in milliseconds). Connect VoltDB-processed data feeds to downstream pipeline components like Hadoop, Kafka, or enterprise data warehouse (OLAP) systems using VoltDB Export.

VoltDB Export is fully parallel: it gets faster as you add nodes to the cluster. Export is fault tolerant and implements at-least-once delivery to the downstream system, with optional unique row identifiers so that the destination can handle possible duplicates after a fault. If the connection between VoltDB and the destination is broken or interrupted, VoltDB buffers export data on local disk until export can resume, ensuring durability.
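
At-least-once delivery means a downstream system may occasionally receive the same row twice. The Java sketch below shows one way a consumer could use a unique row identifier to stay idempotent. The IdempotentSink class and its method names are hypothetical and are not part of any VoltDB connector; a real destination would typically rely on a primary key or an upsert instead of an in-memory set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical downstream consumer; not part of any VoltDB connector.
class IdempotentSink {
    private final Set<Long> appliedRowIds = new HashSet<>();
    private final List<String> stored = new ArrayList<>();

    // Stores the payload the first time a row id is seen.
    // Returns false when the row is a redelivery and was ignored.
    boolean deliver(long rowId, String payload) {
        if (!appliedRowIds.add(rowId)) {
            return false; // duplicate delivery after a fault
        }
        stored.add(payload);
        return true;
    }

    // Rows stored so far, in arrival order.
    List<String> rows() {
        return stored;
    }
}
```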

 

Common Export Use Cases

    • Enrich incoming events with static metadata or computed aggregates and export Avro data from VoltDB to Hadoop (HDFS).
    • Evaluate conditions and rules for each incoming event and export alerts or notifications to Kafka, RabbitMQ, or Amazon SNS.
    • Batch incoming data to local flat files (CSV) for collection.
    • Filter duplicate events and send unique rows to HP Vertica or another OLAP DB.
    • Re-assemble split messages using VoltDB and stream the transformed feed to SparkML.

Export is simple to use. Declare a VoltDB table as “EXPORTED” in the VoltDB DDL file, and all rows inserted into that table are handed to the export connector. Connectors serialize content to the required format (CSV, Avro, JDBC) and push it downstream to the destination system.

Example Export Configuration

 

Create an Export table:
CREATE TABLE events (
    event_id INTEGER,
    time    TIMESTAMP,
    payload VARCHAR(128));
EXPORT TABLE events;
 
Configure an Export Connector:
<export enabled="true" target="http">
    <configuration>
        <property name="endpoint">
            http://myhadoopsvr/webhdfs/v1.0/%t/data%p-%g.%d.avro
        </property>
        <property name="type">avro</property>
        <property name="avro.compress">true</property>
        <property name="avro.schema.location">
            http://myhadoopsvr/webhdfs/v1.0/%t/schema.json
        </property>
    </configuration>
</export>
 
Write a new record to the export table:
    INSERT INTO events (event_id, time, payload) VALUES (id, ts, data);

Real-time Analytics

Real-time analytics applications often provide a summary of an incoming data stream. These applications are used to discover real-time insights from fast flowing data produced by social media, mobile devices, sensors, infrastructure, and more. VoltDB offers high speed transactional ACID performance and the ability to process thousands to millions of discrete incoming events per second.

 

Using VoltDB to ingest fast data and perform real-time analytics on it lets you build applications that make data-driven decisions on each event as it enters the data pipeline.

VoltDB enables real-time SQL analytics against fast streams of data. VoltDB analytics use cases fall into three categories: moving windows and aggregation of real-time data for BI dashboards and external applications, per-event analytics to detect fraud or enforce policy, and caching of OLAP analytics to scale high-concurrency querying and serving.

Ingest

Connect to VoltDB using a native VoltDB driver, by POSTing requests directly over HTTP or by using one of the pre-built loaders to connect to an existing data source.

 

Native Drivers                     Loaders
C++                                Kafka
Java                               CSV
PHP                                JDBC
Python                             SQL Command Tool
C#                                 Hadoop OutputFormat
Others: see Clients and Drivers    Vertica UDX

Example: Use the VoltDB Kafka Loader to feed events from Kafka to VoltDB

voltdb/bin/kafkaloader volttable --zookeeper=zkserver:2181 --topic=sourcedata

Analyze Streaming Aggregations

Materialized views calculate groupings and aggregations of data in their associated base tables. VoltDB supports materialized views that are scalable, fully transactional, and always up to date. Materialized views make it easy to maintain streaming aggregations of the real-time data managed in VoltDB, and they can be queried using standard SQL. Use VoltDB’s MPP shared-nothing architecture to scale out aggregation and processing of fast event feeds, then use VoltDB’s distributed SQL support to read database-wide rollups and groupings.

Example: Define a materialized view and use it to query averages over the last N seconds. The complete application is available as the “windowing” example application distributed with VoltDB.

Table that stores timestamped values and is partitioned by UUID.

CREATE TABLE timedata
(
    uuid VARCHAR(36) NOT NULL,
    val BIGINT NOT NULL,
    update_ts TIMESTAMP NOT NULL
);
PARTITION TABLE timedata ON COLUMN uuid;

MATERIALIZED VIEW that pre-aggregates the sum and counts of the rows by second. This allows for fast computation of averages by ranges of seconds.

 

CREATE VIEW agg_by_second
(
    second_ts,
    count_values,
    sum_values
)
AS SELECT TRUNCATE(SECOND, update_ts), COUNT(*), SUM(val)
    FROM timedata
    GROUP BY TRUNCATE(SECOND, update_ts);

 

SQL queries and SQL functions for roll-ups.

 

Find the average value over all tuples across all partitions for the last N seconds, where N is a parameter the user supplies. The query uses the materialized view so that it has to scan fewer tuples. For example, if tuples are being inserted at a rate of 15k/sec and there are 4 partitions, then computing the average for the last 10 seconds from the base table would require scanning 150k rows. With the view, VoltDB scans 1 row per second times the number of partitions, or 40 rows. That is the advantage of pre-aggregating the table sums and counts by second.

CREATE PROCEDURE windowing.Average AS
    SELECT SUM(sum_values) / SUM(count_values)
    FROM agg_by_second
    WHERE second_ts >= TO_TIMESTAMP(SECOND, SINCE_EPOCH(SECOND, NOW) - ?);
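
The effect of pre-aggregating by second can be sketched outside the database. The Java class below is a hypothetical in-memory stand-in for the agg_by_second view, not VoltDB code: it keeps one count and one sum per second and averages over a trailing window, much as the windowing.Average procedure does. It returns a double for readability, which is a choice of the sketch. The two helper methods reproduce the row-count arithmetic from the paragraph above.

```java
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the agg_by_second view.
class SecondBuckets {
    // epoch second -> {count_values, sum_values}
    private final TreeMap<Long, long[]> buckets = new TreeMap<>();

    // Folds one inserted value into the bucket for its second.
    void insert(long epochSecond, long val) {
        long[] bucket = buckets.computeIfAbsent(epochSecond, k -> new long[2]);
        bucket[0] += 1;
        bucket[1] += val;
    }

    // Average over all buckets with second >= nowSecond - n.
    double average(long nowSecond, long n) {
        long count = 0;
        long sum = 0;
        for (long[] bucket : buckets.tailMap(nowSecond - n, true).values()) {
            count += bucket[0];
            sum += bucket[1];
        }
        return count == 0 ? Double.NaN : (double) sum / count;
    }

    // Rows scanned when averaging directly from the base table.
    static long rowsScannedRaw(long eventsPerSecond, long seconds) {
        return eventsPerSecond * seconds;
    }

    // Rows scanned when averaging from the view: one row per second per partition.
    static long rowsScannedView(long partitions, long seconds) {
        return partitions * seconds;
    }
}
```

With the figures from the text, rowsScannedRaw(15000, 10) gives 150000 and rowsScannedView(4, 10) gives 40.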

 

Per Event Analytics and Policy Enforcement

VoltDB real-time analytics processes each incoming event or request as a discrete ACID transaction. A transaction can be one or more SQL statements pre-defined in a DDL file, an ad-hoc SQL statement issued by the application, or a combination of SQL and Java encapsulated in a VoltDB stored procedure.

 

Use VoltDB’s MPP, fault tolerant, ACID, SQL database capabilities to:

  • Alert, alarm, or notify based on the current event and real-time counters and aggregations.
  • Monitor and enforce quotas, balance checks, and resource assignments in real time.
  • Evaluate rules on a per-event basis in combination with OLAP analytics for real-time segmentation, scoring, and mass personalization.

Support hundreds of thousands of request evaluations per second with millisecond-level response latencies. Integrate policy enforcement, alerting, and monitoring, all at streaming speeds, on small VoltDB clusters with SQL and minimal Java. (For an example, run the voter example in the VoltDB download kit.)
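
Per-event quota enforcement comes down to a check and an update that must happen together. The Java sketch below illustrates the idea with a per-phone vote limit in the spirit of the voter example. The VoteQuota class is hypothetical and is not taken from that example. The synchronized method stands in for the atomicity that a VoltDB transaction would provide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-event quota check; not the voter example's code.
class VoteQuota {
    private final Map<Long, Integer> votesByPhone = new HashMap<>();
    private final int maxVotesPerPhoneNumber;

    VoteQuota(int maxVotesPerPhoneNumber) {
        this.maxVotesPerPhoneNumber = maxVotesPerPhoneNumber;
    }

    // Records the vote and returns true if the caller is under quota;
    // otherwise leaves the state unchanged and returns false.
    synchronized boolean tryVote(long phoneNumber) {
        int used = votesByPhone.getOrDefault(phoneNumber, 0);
        if (used >= maxVotesPerPhoneNumber) {
            return false;
        }
        votesByPhone.put(phoneNumber, used + 1);
        return true;
    }
}
```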

Scalable Serving

VoltDB includes code-free bulk loaders to load report data from HP Vertica (using Vertica UDX) and Hadoop (using Hadoop OutputFormat). Cache OLAP report output in VoltDB and use VoltDB to support thousands of concurrent connections and SQL queries to the data. Each VoltDB server supports tens of thousands of concurrent connections. VoltDB’s distributed SQL support enables further refinement, filtering and re-grouping of analytic outputs for high speed users and applications.

 

For more information about loading data from Hadoop to VoltDB click here.

Integrate and Report

ODBC and JDBC Tool Integration

Support for standard ODBC and JDBC interfaces enables integration with standard third-party tools like MicroStrategy, Tableau and SQL. Download clients, drivers and monitoring integrations from the Community Clients and Monitoring page.

SQL Analytics

VoltDB supports a superset of SQL-92 functionality. Our full SQL support is documented in the Using VoltDB Guide.

Integrations

Learn more about VoltDB integrations and real-time analytics on our Integrations page.

Real-time Decision Engine

Data has the greatest value as it enters the pipeline. Real-time interactions can power business decisions, such as customer interaction, security and fraud prevention, and resource optimization.

 

To deliver the throughput and latency that real-time decisions require, the database platform must support moving transaction processing closer to the data. VoltDB is a fast in-memory database that supports SQL and ACID compliance, providing the high-throughput, low-latency responses your applications need to make decisions in real time.

VoltDB’s high-performance ACID SQL transactions enable request-response style applications that scale from thousands to millions of requests per second on small clusters. Supporting thousands of concurrent connections with round-trip latencies in milliseconds (1-5ms @99.9%), VoltDB is an ideal platform for high-speed policy enforcement, authorization, rule evaluation, and quota management.

Ingest

Connect to VoltDB using a native VoltDB driver, by POSTing requests directly over HTTP or by using the ODBC or JDBC drivers. VoltDB provides synchronous and asynchronous interfaces for a variety of popular programming languages. VoltDB drivers include optimizations like built-in load balancing, connection caching, topology aware request routing and other sophisticated features to make writing high performance database applications simpler.

 

Example: Run a VoltDB stored procedure transaction from PHP

 

// Declare the parameter types of the Vote procedure: BIGINT, TINYINT, BIGINT.
$parameters = new \volt\Parameters();
$parameters->push(new \volt\Parameter(\volt\voltdb::WIRE_TYPE_BIGINT));
$parameters->push(new \volt\Parameter(\volt\voltdb::WIRE_TYPE_TINYINT));
$parameters->push(new \volt\Parameter(\volt\voltdb::WIRE_TYPE_BIGINT));
$procedure = new \volt\Procedure('Vote', $parameters);
// Bind the argument values in the same order as the declared types.
$procedure->params()->addInt64($phoneNumber)->addInt8($contestantNumber)
    ->addInt64($maxVotesPerPhoneNumber);
// $voltClient is assumed to be an already-connected client.
$voltClient->invoke($procedure);

 

Example: Run a VoltDB stored procedure from Java

 

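// Assumes imports of Client, ClientConfig and ClientFactory from org.voltdb.client.
ClientConfig clientConfig = new ClientConfig();
Client client = ClientFactory.createClient(clientConfig);
client.createConnection(server);  // server is a hostname string
// Synchronous call: blocks until the Vote transaction completes.
client.callProcedure("Vote", phoneNumber, contestantNumber, maxvotes);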

 

Analyze and Decide

VoltDB processes each incoming event or request as a discrete ACID transaction. A transaction can be one or more SQL statements pre-defined in a DDL file, an ad-hoc SQL statement issued by the application, or a combination of SQL and Java encapsulated in a VoltDB stored procedure.

 

Use VoltDB’s MPP, fault tolerant, ACID, SQL database capabilities for:

  • Real-time personalization based on current context and OLAP segmentation data. Cache historical segmentation reports in VoltDB and combine that data with real-time campaign strategies and per-event context to personalize free-to-play games and mass-personalization marketing campaigns.
  • Policy evaluation and enforcement in real time. Evaluate campaign selection rules in real time for each incoming event - hundreds of thousands of times per second.
  • High speed authorization based on real time balances and per-user state. Maintain real time pre-paid balances and track micro-payments in real time.
  • Resource optimization using per-event least-cost/least-used decisions. Use VoltDB’s performance to track real time status of fast changing environments like financial position tracking, best bid offer tracking, and network utilization and QoS metrics for applications like global risk control and DDoS attack mitigation.
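
To make the balance-based authorization case from the list above concrete, here is a minimal Java sketch of a pre-paid ledger. The PrepaidLedger class, its method names, and the use of cents are assumptions made for illustration and are not VoltDB APIs. In VoltDB the balance check and the debit would run inside a single ACID transaction; the synchronized methods stand in for that guarantee here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pre-paid balance ledger; names and units are illustrative.
class PrepaidLedger {
    private final Map<String, Long> balanceCents = new HashMap<>();

    // Adds funds to a user's balance.
    synchronized void credit(String user, long cents) {
        balanceCents.merge(user, cents, Long::sum);
    }

    // Debits the amount and returns true only if the balance covers it.
    synchronized boolean authorize(String user, long cents) {
        long balance = balanceCents.getOrDefault(user, 0L);
        if (cents <= 0 || balance < cents) {
            return false; // reject invalid amounts and overdrafts
        }
        balanceCents.put(user, balance - cents);
        return true;
    }

    // Current balance, or 0 for an unknown user.
    synchronized long balance(String user) {
        return balanceCents.getOrDefault(user, 0L);
    }
}
```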

Real-time Personalization

VoltDB’s combination of high-throughput performance, ACID transaction capabilities, and stateful, durable database tables makes it easy to write high-velocity request/response applications. Latencies in the 1-5ms range allow personalization parameters to be calculated in line with the user experience. See a complete example application demonstrating VoltDB for high-velocity request-response style applications.

Performance

Each server in a VoltDB cluster can execute tens of thousands of transactions per second. VoltDB’s linear scale-out architecture lets you scale your cluster to match your throughput requirements. Most VoltDB clusters are 3-20 nodes in size. (A 16-node cluster can run upwards of 6 million durable, ACID SQL transactions per second.)

 

VoltDB includes built-in high availability and fault tolerance using active/active, synchronous replication within a datacenter. VoltDB supports asynchronous WAN replication for disaster recovery across datacenters and availability zones.

 

Applications can connect and send requests to any server in the cluster. The VoltDB database takes care of data-routing, replication, high availability, durability - and provides applications a fully ACID, SQL based programming model that scales from thousands to millions of transactions per second.

 

We recently ran YCSB workloads to benchmark VoltDB performance.