Introducing VoltDB v5.2!
VoltDB v5.2 continues to enhance the Database Replication functionality released in v5.1, with resumable replication across cluster failures. v5.2 also adds secure export to Hadoop using Kerberos, introduces support for partial indexes, adds SQL bitwise functions, and lets you modify configuration settings from the VoltDB Management Center. Let’s take a more detailed look at the new capabilities.
Database Replication improvements include:
Ability for database replication (DR) to resume across cluster outages. Previously, DR could continue despite individual node failures (in a K-safe environment). However, failure of either the master or replica cluster would force a restart of DR. Beginning with v5.2, DR can resume across cluster failures when either the master or replica is recovered from command logs. See the chapter on "Database Replication" in the Using VoltDB manual for details.
Hadoop ecosystem improvements include:
Secure export to Hadoop using Kerberos. The HTTP export connector now supports the use of Kerberos authentication when exporting to a WebHDFS endpoint configured to use Kerberos. Read the section on using the HTTP export connector in the Using VoltDB manual for details.
SQL improvements include:
Support for partial indexes. The index definition can now contain a WHERE clause limiting the rows that are included in the index. For example:
CREATE INDEX completed_tasks
ON tasks (task_id, startdate, enddate)
WHERE enddate IS NOT NULL;
The WHERE clause limits the number of rows that get indexed. This is useful if certain columns in the index are not evenly distributed. For example, if you are not interested in records where a column is null, you can use a WHERE clause to exclude those records and optimize the size and performance of the index.
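As a rough illustration of the idea (not VoltDB's implementation), a partial index can be thought of as a lookup structure that only receives entries for rows matching the WHERE predicate. The Python sketch below uses a hypothetical in-memory list of task rows and a hypothetical build_partial_index helper; both are assumptions made for this example.

```python
from datetime import date

# Hypothetical rows standing in for the tasks table.
tasks = [
    {"task_id": 1, "startdate": date(2015, 4, 1), "enddate": date(2015, 4, 3)},
    {"task_id": 2, "startdate": date(2015, 4, 2), "enddate": None},
    {"task_id": 3, "startdate": date(2015, 4, 5), "enddate": date(2015, 4, 9)},
]


def build_partial_index(rows, key_columns, predicate):
    """Map each key tuple to its row, but only for rows passing the predicate."""
    index = {}
    for row in rows:
        if predicate(row):
            key = tuple(row[col] for col in key_columns)
            index[key] = row
    return index


# Mirrors: ON tasks (task_id, startdate, enddate) WHERE enddate IS NOT NULL
completed_tasks = build_partial_index(
    tasks,
    key_columns=("task_id", "startdate", "enddate"),
    predicate=lambda row: row["enddate"] is not None,
)

print(len(completed_tasks))  # 2 of the 3 rows are indexed
```

The unfinished task (task_id 2) never enters the index, which is where the size savings come from.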
New bitwise functions. VoltDB now supports several new functions for performing bitwise operations on BIGINT values. The new functions support standard bitwise operators such as AND, OR, XOR, and NOT, as well as bit shifting operations. See the reference pages for the BITAND(), BITNOT(), BITOR(), BITXOR(), BIT_SHIFT_LEFT(), and BIT_SHIFT_RIGHT() functions in the Using VoltDB manual for details.
New HEX() function. Another new function, HEX(), converts a BIGINT value into its hexadecimal representation as a string. See the reference page for the HEX() function in the Using VoltDB manual for details.
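To show the kind of bit masking the new bitwise functions make possible, here is a small Python model of AND, OR, XOR, and NOT on 64-bit signed integers. It is only a sketch: the lowercase helper names and the permission flags are hypothetical, and it does not attempt to reproduce VoltDB's NULL handling or the shift functions, for which the reference pages are the authority.

```python
MASK64 = (1 << 64) - 1  # BIGINT is a 64-bit signed integer


def to_signed64(value):
    """Interpret the low 64 bits of value as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >= (1 << 63) else value


def bitand(a, b):
    return to_signed64(a & b)


def bitor(a, b):
    return to_signed64(a | b)


def bitxor(a, b):
    return to_signed64(a ^ b)


def bitnot(a):
    return to_signed64(~a)


# Hypothetical permission flags packed into a single integer column.
READ, WRITE, ADMIN = 0b001, 0b010, 0b100

flags = bitor(READ, ADMIN)              # combine two flags
can_write = bitand(flags, WRITE) != 0   # test whether a flag is set
without_admin = bitand(flags, bitnot(ADMIN))  # clear a flag
```

Packing several boolean attributes into one BIGINT column and testing them with a mask is the typical use for functions of this kind.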
VoltDB Management Center features include:
- New Idle Time graph — The Monitor tab contains a new graph, the partition idle time graph, which shows the amount of work being done by each partition on the current server. The graph plots the percentage of time each partition is idle. 100% indicates the partition is doing no work; 0% indicates the partition is constantly in use. The graph also includes lines for the local multi-partition coordinator and the minimum and maximum idle time for the cluster as a whole.
- Ability to change configuration settings — The Admin tab now lets you change deployment settings that are configurable at runtime, including export properties, automated snapshots, security, and selected system settings. Click on the pencil icon next to a property to edit it. Note that if security is enabled, only users with the ADMIN permission are allowed to view and edit the Admin tab settings.
New voltadmin command to stop individual servers. The VoltDB command line utility, voltadmin, now supports the STOP command. The voltadmin STOP command stops the VoltDB server process on the specified node. Note that the STOP command can only be used on a K-safe cluster and will not intentionally shut down the database. The command will only stop a node if enough nodes remain for the cluster to remain viable.
Please download v5.2 today, and share your feedback.
In-Memory Performance with On-Disk Durability and High Availability
VoltDB’s in-memory architecture is designed for performance. It eliminates the significant overhead of multi-threading and locking responsible for the poor performance of traditional RDBMSs that rely on disks.
VoltDB is also designed to ensure that data is never lost. Because VoltDB is an in-memory database, a frequent question is “can data be lost?” Ensuring that data would never be lost was a foundational requirement when VoltDB was designed. VoltDB’s Snapshots and Command Logging features allow you to recover fully, quickly, and easily. Just bring your database back up and VoltDB will do all of the heavy lifting – restoring physical data from snapshots, rebuilding indexes, and replaying transaction logs. VoltDB will have you back to normal operations in no time.
VoltDB snapshots a consistent point-in-time view of the in-memory data and serializes it to local disk. Snapshots are written at each server and are consistent across servers. Read more about snapshots.
To protect data between snapshots, VoltDB logs transaction invocations to disk. VoltDB refers to this as the command log. Command logs are also written at each server. To recover, the snapshot is restored and the command log is replayed. Together snapshots and command logs create durable, replicated copies of the database across all servers. Read more about command logs.
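The recovery sequence described above (restore the snapshot, then replay the command log) can be sketched with a toy key-value store. This is a conceptual model only: the TinyStore class and its "put" and "add" procedures are hypothetical, and in-memory Python objects stand in for what a real system writes to disk.

```python
import copy


class TinyStore:
    """Toy key-value store with snapshot plus command-log recovery."""

    def __init__(self):
        self.data = {}
        self.snapshot = {}
        self.command_log = []  # invocations logged since the last snapshot

    def invoke(self, procedure, *args):
        # Log the invocation (not the resulting data), then apply it.
        self.command_log.append((procedure, args))
        self._apply(procedure, args)

    def _apply(self, procedure, args):
        if procedure == "put":
            key, value = args
            self.data[key] = value
        elif procedure == "add":
            key, amount = args
            self.data[key] = self.data.get(key, 0) + amount
        else:
            raise ValueError(f"unknown procedure: {procedure}")

    def take_snapshot(self):
        # A snapshot captures a point-in-time copy; earlier log entries
        # are no longer needed for recovery.
        self.snapshot = copy.deepcopy(self.data)
        self.command_log.clear()

    def recover(self):
        # Restore the snapshot, then replay every logged invocation in order.
        self.data = copy.deepcopy(self.snapshot)
        for procedure, args in self.command_log:
            self._apply(procedure, args)


store = TinyStore()
store.invoke("put", "balance", 100)
store.take_snapshot()
store.invoke("add", "balance", 25)   # happens after the snapshot

expected = dict(store.data)
store.data = {}                      # simulate losing the in-memory state
store.recover()
```

After recover() the store holds the same contents as before the simulated failure, including the change made after the snapshot was taken.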
High Availability (HA)
VoltDB was designed for HA from the ground up. It’s easy to configure and completely transparent to your applications. Partitions are transparently replicated (active/active and synchronous) on multiple servers, so if a server fails, all data remains available, consistent, and durable for continued operation.
Transparent Scalability with Data Consistency (ACID)
VoltDB's fundamental redesign of the RDBMS provides unparalleled performance and scalability on bare-metal, virtualized and cloud infrastructures.
VoltDB uses a shared-nothing architecture to achieve database parallelism. Data and the processing associated with it are distributed among all the CPU cores within the servers composing a single VoltDB cluster. By extending its shared-nothing foundation to the per-core level, VoltDB exploits and scales with the increasing core-per-CPU counts on modern commodity servers.
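A minimal sketch of the shared-nothing idea follows: each partition owns its own slice of the data, and a row's partitioning key decides which partition handles it. The Cluster and Partition classes and the CRC32-based routing are assumptions chosen for this illustration, not VoltDB's actual hashing scheme.

```python
import zlib


class Partition:
    def __init__(self, partition_id):
        self.partition_id = partition_id
        self.rows = {}  # data owned exclusively by this partition


class Cluster:
    def __init__(self, partition_count):
        self.partitions = [Partition(i) for i in range(partition_count)]

    def partition_for(self, key):
        # Deterministic hash of the partitioning key picks the owner.
        digest = zlib.crc32(str(key).encode("utf-8"))
        return self.partitions[digest % len(self.partitions)]

    def put(self, key, row):
        self.partition_for(key).rows[key] = row

    def get(self, key):
        return self.partition_for(key).rows.get(key)


cluster = Cluster(partition_count=4)
for i in range(100):
    cluster.put(f"customer-{i}", {"id": i})

rows_per_partition = [len(p.rows) for p in cluster.partitions]
```

Because every key maps to exactly one partition, work on different keys can proceed on different partitions without any shared state between them.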
Scaling server capacity is easy and 100% transparent to your application. Simply add servers to scale throughput and storage capacity -- no need to build complex and costly sharding layers. You can build your applications with the confidence that they’ll scale to meet increasing workloads.
VoltDB is also ACID compliant, meaning you don’t have to trade data consistency to achieve performance and scale. Transactions are guaranteed. Your data will be 100% accurate, 100% of the time.
Standards - SQL and Java, Integrations, and Client Support
VoltDB combines the richness and flexibility of SQL for data interaction with a modern, distributed, fault-tolerant, cloud-deployable clustered architecture while maintaining the ACID guarantees of a traditional database system.
VoltDB supports the JSON data type for agility in the application development process. VoltDB also supports several client access methods including stored procedures, JDBC and ad hoc queries. Stored procedures provide the fastest processing of data as queries are moved to the data for processing.
Integrations and Client Support
Recognizing the importance of working together in a broader software ecosystem, VoltDB supports a wide range of integrations, including JDBC (Java Database Connectivity) and ODBC (Open Database Connectivity) for data exchange. In addition to the tools and system procedures that VoltDB provides for monitoring the health of your database, you can also integrate this data into third-party monitoring solutions so they become part of your overall enterprise monitoring architecture.
VoltDB also provides drivers and SDKs for connecting applications written in a range of programming languages. Read more about clients and monitoring here.
VoltDB Integrations in the Data Warehouse Ecosystem
VoltDB offers a broad set of Big Data ecosystem integrations, certifications, industry partnerships and connectors to enable high-speed data export to Hadoop-based data warehouses and long-term analytics stores such as HP Vertica and IBM Netezza.
VoltDB Big Data integrations enable developers to take advantage of the speed and cyclical nature of the import-export data pipeline.
Partners and Certifications
Hortonworks is a leading commercial vendor of Apache Hadoop, the open source platform for storing, managing and analyzing Big Data. The Hortonworks Data Platform distribution of Apache Hadoop provides an open and stable foundation for enterprises and a growing ecosystem to build and deploy Big Data solutions. VoltDB is a Certified Hortonworks partner.
Cloudera offers an enterprise-class implementation of Apache Hadoop. The company’s Cloudera Enterprise helps developers benefit from the experience of the open source and Big Data/Hadoop communities. Cloudera Enterprise includes CDH, the world’s most popular open source Hadoop-based platform, as well as advanced system management and data management tools. VoltDB is a Certified Cloudera partner.
MapR - MapR provides developers with an enterprise-grade Hadoop platform. MapR brings dependability, ease of use and world-record speed to Hadoop, NoSQL, database and streaming applications in one unified distribution for Hadoop. VoltDB is a MapR Advantage Technology partner.
IBM Netezza – VoltDB’s IBM Netezza Export client uses the JDBC Connector to fetch transactional data from VoltDB. The data is written in batches to the Netezza data warehouse. Configuring this behavior is simple and requires no programming. Users automate the export process by identifying the specific VoltDB tables in the schema as sources for export data. At runtime, any data written to the specified tables is automatically sent to the VoltDB export connector, which manages the exchange of the updated information to the Netezza destination. The VoltDB export process transactionally queues export data to the connector automatically. The export client uses a series of poll and acknowledgement requests to transactionally exchange data between VoltDB and Netezza, guaranteeing at-least-once delivery of the data to the destination system. The export client runs within the VoltDB cluster, so it, like VoltDB, is highly available. IBM is a partner. Read more about the JDBC Connector in our documentation.
HP Vertica – VoltDB’s HP Vertica Export client uses the VoltDB JDBC Connector to fetch transactional data from VoltDB and write it, in batches, to the Vertica data warehouse. Configuring this behavior is simple and requires no programming.
Users automate the export process by identifying the specific VoltDB tables in the schema as sources for export data. At runtime, any data written to the specified tables is automatically sent to the VoltDB export connector, which manages the exchange of the updated information to the Vertica destination. The VoltDB export process transactionally queues export data to the connector automatically. The export client uses a series of poll and acknowledgement requests to transactionally exchange data between VoltDB and Vertica, guaranteeing at-least-once delivery of the data to the destination system. The export client runs within the VoltDB cluster, so it, like VoltDB, is highly available. HP Vertica is a partner. Read more about the JDBC Connector in our documentation.
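The poll and acknowledgement exchange described for the Netezza and Vertica export clients can be sketched as a queue that only discards rows once the consumer acknowledges them. This is a conceptual model under stated assumptions: the ExportQueue class, the export_batch function, and the fail_before_ack switch are hypothetical names invented for this example and are not part of VoltDB.

```python
from collections import deque


class ExportQueue:
    """Holds exported rows until the consumer acknowledges them."""

    def __init__(self):
        self._pending = deque()
        self._next_seq = 0

    def append(self, row):
        self._pending.append((self._next_seq, row))
        self._next_seq += 1

    def poll(self, max_rows):
        """Return up to max_rows unacknowledged rows without removing them."""
        return list(self._pending)[:max_rows]

    def ack(self, up_to_seq):
        """Discard rows whose sequence number is <= up_to_seq."""
        while self._pending and self._pending[0][0] <= up_to_seq:
            self._pending.popleft()


def export_batch(queue, destination, max_rows, fail_before_ack=False):
    batch = queue.poll(max_rows)
    if not batch:
        return
    destination.extend(batch)      # write the batch to the target system
    if fail_before_ack:
        return                     # simulated crash: rows stay queued
    queue.ack(batch[-1][0])        # acknowledge only after a successful write


queue = ExportQueue()
for name in ("a", "b", "c"):
    queue.append({"name": name})

warehouse = []
export_batch(queue, warehouse, max_rows=10, fail_before_ack=True)
export_batch(queue, warehouse, max_rows=10)   # retry re-sends the same rows
```

In this run every row reaches the destination, and because the first attempt failed before the acknowledgement, each row arrives twice. That is the trade-off of at-least-once delivery: nothing is lost, but the destination should tolerate duplicates.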
VoltDB provides a wide range of export connectors for integration with other data management components, including CSV, WebHDFS/Hadoop, Kafka, RabbitMQ, and JDBC. The JDBC Connector provides export to data warehouse technologies such as IBM Netezza and HP Vertica. VoltDB also provides developers with simple-to-use examples and instructions to build custom, open-source export connectors. VoltDB Export enables data to arrive in your analytic store sooner, and allows deep analytics to be leveraged with radically lower latency. Read about VoltDB Export in our documentation.
VoltDB Connectors, message queues and interfaces
VoltDB serves as a real-time application database used in conjunction with Hadoop and analytical results derived from Hadoop in applications including real-time scoring, policy enforcement, and customer interaction. VoltDB provides the ability to ingest data as fast as it arrives; perform real-time analytics in-memory; make automated decisions in real time; and continuously pass, or export, processed data into Hadoop. For more about Hadoop integrations, click here.
- Apache Kafka Connector
VoltDB’s Apache Kafka Export, paired with the Kafka importer utility, allows developers to build applications in which VoltDB can both transact on incoming Kafka messages and also deliver data and alerts to Kafka feeds downstream, enabling VoltDB applications to analyze and make decisions on data in the moment. For more on Kafka connectors for VoltDB, click here.
Apache Kafka is a persistent, high-performance, distributed message queue/service. Kafka is highly available, partitions (or shards) messages, and is simple and efficient to use. Useful for serializing and multiplexing streams of data, Kafka provides "at least once" delivery, and gives clients (subscribers) the ability to rewind and replay streams.
In the Apache Kafka model, VoltDB export acts as a Producer. The Kafka connector receives serialized data from the export tables and writes it to a message queue using the Apache Kafka version 0.8 protocols.
Both Kafka and VoltDB are built around shared-nothing clustering. Load is distributed among cluster nodes for performance. Data is replicated among cluster nodes for safety and availability. To handle increasing loads, nodes can be transparently added to the cluster. Nodes can fail or be removed and the remaining cluster will continue to function. Both systems are designed without single points of failure. These features are the hallmark of systems designed for scale.
Kafka is one of the most frequently used streaming vehicles in the Big Data application space. Because of its persistence capabilities, it is often used to front-end Hadoop data feeds. Read more about VoltDB and the Kafka Connector in our documentation.
The integration of RabbitMQ with VoltDB expands developers’ options to export data from VoltDB. RabbitMQ is a popular, scalable, asynchronous message queueing service that supports multiple platforms, multiple languages, and multiple protocols, including AMQP.
RabbitMQ is in wide use in enterprises developing and running applications in the cloud. Read more about the RabbitMQ Connector in our documentation.
Hive is a data warehouse application that can query large datasets held in distributed storage. Hive is a runtime Hadoop support structure that allows developers fluent in SQL to leverage the Hadoop platform with minimal effort.
VoltDB v5.0 includes a VoltDB Hadoop OutputFormat implementation, which can be used to import job data from Hadoop into VoltDB. This is further leveraged by our Hadoop connectors for both Apache Pig and Apache Hive.
VoltDB developers can export VoltDB data in Pig format to Hadoop. Pig was developed to enable developers using Hadoop to focus on analyzing large data sets and spend less time writing mapper and reducer programs.
Pig includes two components: the Pig Latin language, and a runtime environment where Pig Latin programs are executed.
Avro provides a serialization and data exchange format for Hadoop. Avro is used natively by Hadoop utilities such as Pig and Hive. Because it is a binary format, Avro data takes up less network bandwidth than text-based formats such as CSV, providing VoltDB developers with efficiencies when moving processed data out of Hadoop and into VoltDB. Avro features rich data structures; a compact, fast, binary data format; a container file to store persistent data; and Remote Procedure Call (RPC).
Avro offers simple integration with dynamic languages. Code generation is not required to read or write data files nor to use or implement RPC protocols. Code generation is an optional optimization, worth implementing only for statically typed languages.
Avro was designed for developers who prefer strongly typed data serialization or protocol buffer-style tools but who want the flexibility of easy interoperability with dynamic languages.
- Creating a Custom Export
It is also possible to create a custom export connector that runs inside VoltDB. Click here for instructions.
VoltDB is the only open source in-memory NewSQL database. VoltDB was created as an open source project by co-founder Dr. Michael Stonebraker. Once the decision was made to launch a company to support the project, we worked to build an organization committed to developer and customer success with our products.
Why is it important for VoltDB to be open source? The open source community values integrity, participation, the open exchange of ideas, shared purpose, and support for the best ideas. We share those values: our company culture is open and collaborative. We work closely with customers, establishing open, supportive relationships between our developers and our customers’ developers.
VoltDB is developed using the Agile methodology; we believe in rapid prototyping and push updates out to customers on a monthly schedule. Our developers integrate VoltDB with key open source ecosystem components including Kafka, RabbitMQ, Docker, and Hadoop, with more in the pipeline. We also contribute to projects including Kafka, RabbitMQ, and Nagios. Our customers and community members contribute to VoltDB’s open source project on GitHub, helping to improve the quality and robustness of the project and the downstream product.
We make our code available under the popular AGPL license, and offer a free community edition which is used by many community members and educational organizations. We encourage prospects to inspect our code using the community edition.
With VoltDB, our users have the best of both worlds: a freely-available community edition built with open source methodologies, and a robust, commercial-strength product. We invite you to take a look at the community edition, or give us a call to discuss how the commercial edition can meet your needs.