Introducing VoltDB v5.1!
It’s tough work to continually release improvements to VoltDB, especially after a major release such as VoltDB v5.0. But the development team here has done it again, and we’re really excited to share the new functionality we’ve added to VoltDB 5.1.
So what, exactly, did we do?
VoltDB v5.1 adds our next generation of Database Replication functionality, removing any single point of failure, improving performance and scalability, and setting the foundation for future capabilities.
Database Replication improvements include:
- Significantly better performance: rather than using a single replication stream, DR now runs across multiple partitions in parallel. The new DR also ships binary logs of transaction results, so the replica applies the results instead of replaying each transaction.
- No single point of failure: eliminating the DR agent removes a single point of failure and simplifies DR operations.
- More flexibility: you can now specify which tables to replicate rather than having to replicate the entire database. Of course, you can still replicate all of the tables if you like.
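To make the binary-log point concrete, here is a minimal Python sketch of the idea, not VoltDB code. The names `add_vote` and `apply_log`, and the tuple format of the log entries, are made up for illustration. The master runs the procedure once and records the resulting row changes; the replica applies those recorded results without running the procedure again.

```python
# Hypothetical illustration only: none of these names are VoltDB APIs.

def add_vote(table, contestant):
    """Stand-in for a stored procedure. Returns the row changes it produced."""
    new_count = table.get(contestant, 0) + 1
    table[contestant] = new_count
    # The "log" entry records the resulting row state, not the procedure call.
    return [("upsert", contestant, new_count)]


def apply_log(table, log):
    """Replica side: apply recorded results without rerunning the procedure."""
    for op, key, value in log:
        if op == "upsert":
            table[key] = value
        elif op == "delete":
            table.pop(key, None)


master, replica = {}, {}
for name in ["alice", "bob", "alice"]:
    log = add_vote(master, name)   # executed once, on the master
    apply_log(replica, log)        # replica only applies the results
```

After the loop, `replica` holds the same rows as `master` even though `add_vote` never ran on the replica.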
Existing DR customers will need to make a few operational changes to take advantage of these new features. Among the changes you’ll need to implement:
- Identify the tables participating in DR using the DR TABLE statement in the schema.
- Configure DR in the deployment files for both the master and the replica clusters.
See the chapter on Database Replication in the Using VoltDB manual for details.
Of course, v5.1 offers more than just Database Replication. This release also includes:
Export to multiple targets. Previously you could export to only one downstream system. With v5.1, each export table can stream to a different target. For example, you might export de-duped sensor data to Hadoop once it has been processed and export alerts regarding unusual events to HTTP for distribution via SMS, email, or other notification service. See the chapter on exporting live data in the Using VoltDB manual for details.
Batch processing of DDL statements. Defining large database schemas just got much faster. VoltDB 5.0 introduced interactive DDL, eliminating the need for a precompiled application catalog. However, large schemas could take a significant time to process interactively. VoltDB 5.1 solves this problem by allowing you to batch DDL statements. If you have a mix of DDL (data definition language) statements, DML (data manipulation language) statements, and directives, you can batch process only the DDL statements by enclosing them between a file --inlinebatch directive and the specified end marker.
Additional monitoring in the VoltDB Management Center. We’ve added administrative functions to VMC, accessible through a new tab (Admin) that allows you to administer the database by setting various configuration options. You can now pause and resume the database, save and restore snapshots, or review and update the database configuration. If you’ve enabled security, only users with ADMIN permission can see and use the Admin tab in the Management Center.
Download v5.1 today, and let us know what you think!
In-Memory Performance with On-Disk Durability and High Availability
VoltDB’s in-memory architecture is designed for performance. It eliminates the significant overhead of multi-threading and locking responsible for the poor performance of traditional RDBMSs that rely on disks.
VoltDB is also designed to ensure that data is never lost. Because VoltDB is an in-memory database, a frequent question is “can data be lost?” Preventing data loss was a foundational requirement when VoltDB was designed. VoltDB’s Snapshots and Command Logging features allow you to recover fully, quickly, and easily. Just bring your database back up and VoltDB will do all of the heavy lifting: restoring physical data from snapshots, rebuilding indexes, and replaying transaction logs. VoltDB will have you back to normal operations in no time.
VoltDB snapshots a consistent point-in-time view of the in-memory data and serializes it to local disk. Snapshots are written at each server and are consistent across servers. Read more about snapshots.
To protect data between snapshots, VoltDB logs transaction invocations to disk. VoltDB refers to this as the command log. Command logs are also written at each server. To recover, the snapshot is restored and the command log is replayed. Together snapshots and command logs create durable, replicated copies of the database across all servers. Read more about command logs.
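The recovery sequence described above (restore the snapshot, then replay the command log) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not VoltDB's implementation: the class `TinyStore` and its methods are hypothetical, and in-memory lists stand in for the files that would be written to disk.

```python
import copy


class TinyStore:
    """Toy in-memory store with a snapshot and a command log (hypothetical)."""

    def __init__(self):
        self.data = {}          # live in-memory state
        self.snapshot = {}      # stand-in for the last snapshot on disk
        self.command_log = []   # invocations logged since that snapshot

    def invoke(self, key, delta):
        # Log the invocation first, then apply it to memory.
        self.command_log.append((key, delta))
        self._apply(key, delta)

    def _apply(self, key, delta):
        self.data[key] = self.data.get(key, 0) + delta

    def take_snapshot(self):
        # A point-in-time copy; earlier log entries are no longer needed.
        self.snapshot = copy.deepcopy(self.data)
        self.command_log.clear()

    def recover(self):
        # Restore the snapshot, then replay the logged invocations in order.
        self.data = copy.deepcopy(self.snapshot)
        for key, delta in self.command_log:
            self._apply(key, delta)


store = TinyStore()
store.invoke("a", 5)
store.take_snapshot()
store.invoke("a", 2)
store.invoke("b", 1)
before_crash = dict(store.data)

store.data = {}    # simulate losing the in-memory state
store.recover()
```

In this sketch the state after `recover()` matches the state before the simulated crash, because every invocation made after the snapshot is still in the log.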
High Availability (HA)
VoltDB was designed for HA from the ground up. It’s easy to configure and completely transparent to your applications. Partitions are transparently replicated (active/active and synchronous) on multiple servers, so if a server fails, all data remains available, consistent, and durable for continued operation.
Transparent Scalability with Data Consistency (ACID)
VoltDB's fundamental redesign of the RDBMS provides unparalleled performance and scalability on bare-metal, virtualized and cloud infrastructures.
VoltDB uses a shared-nothing architecture to achieve database parallelism. Data and the processing associated with it are distributed among all the CPU cores within the servers composing a single VoltDB cluster. By extending its shared-nothing foundation to the per-core level, VoltDB exploits and scales with the increasing core-per-CPU counts on modern commodity servers.
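As a rough illustration of per-core, shared-nothing partitioning, the Python sketch below routes each row to exactly one partition based on a hash of its key. The `Cluster` class, the `partition_for` function, and the use of CRC32 are assumptions made for this example; VoltDB's actual hashing and routing are not shown here.

```python
import zlib


def partition_for(key, partition_count):
    """Map a key to one partition. CRC32 is an arbitrary choice for the sketch."""
    return zlib.crc32(str(key).encode("utf-8")) % partition_count


class Cluster:
    """Hypothetical cluster with one independent partition per CPU core."""

    def __init__(self, servers, cores_per_server):
        # Each partition owns its own data; nothing is shared between them.
        self.partitions = [dict() for _ in range(servers * cores_per_server)]

    def put(self, key, value):
        index = partition_for(key, len(self.partitions))
        self.partitions[index][key] = value

    def get(self, key):
        index = partition_for(key, len(self.partitions))
        return self.partitions[index].get(key)


cluster = Cluster(servers=2, cores_per_server=4)
for customer_id in range(100):
    cluster.put(customer_id, {"id": customer_id})
```

Because a given key always hashes to the same partition, the work for that key can run on a single core without coordinating with the others.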
Scaling server capacity is easy and 100% transparent to your application. Simply add servers to scale throughput and storage capacity -- no need to build complex and costly sharding layers. You can build your applications with the confidence that they’ll scale to meet increasing workloads.
And VoltDB is ACID-compliant, meaning you don’t have to trade data consistency to achieve performance and scale. Transactions are guaranteed. Your data will be 100% accurate, 100% of the time.
Standards - SQL and Java, Integrations, and Client Support
VoltDB combines the richness and flexibility of SQL for data interaction with a modern, distributed, fault-tolerant, cloud-deployable clustered architecture while maintaining the ACID guarantees of a traditional database system.
VoltDB supports the JSON data type for agility in the application development process. VoltDB also supports several client access methods including stored procedures, JDBC and ad hoc queries. Stored procedures provide the fastest processing of data as queries are moved to the data for processing.
Integrations and Client Support
Recognizing the importance of working together in a broader software ecosystem, VoltDB supports a wide range of integrations, including JDBC (Java Database Connectivity) and ODBC (Open Database Connectivity) for data exchange. In addition to the tools and system procedures that VoltDB provides for monitoring the health of your database, you can also integrate this data into third-party monitoring solutions so they become part of your overall enterprise monitoring architecture.
VoltDB also provides drivers and SDKs to help connect applications written in a range of programming languages. Read more about clients and monitoring here.
VoltDB Integrations in the Data Warehouse Ecosystem
VoltDB offers a broad set of Big Data ecosystem integrations, certifications, industry partnerships and connectors to enable high-speed data export to Hadoop-based data warehouses and long-term analytics stores such as HP Vertica and IBM Netezza.
VoltDB Big Data integrations enable developers to take advantage of the speed and cyclical nature of the import-export data pipeline.
Partners and Certifications
Hortonworks is a leading commercial vendor of Apache Hadoop, the open source platform for storing, managing and analyzing Big Data. The Hortonworks Data Platform distribution of Apache Hadoop provides an open and stable foundation for enterprises and a growing ecosystem to build and deploy Big Data solutions. VoltDB is a Certified Hortonworks partner.
Cloudera offers an enterprise-class implementation of Apache Hadoop. The company’s Cloudera Enterprise helps developers benefit from the experience of the open source and Big Data/Hadoop communities. Cloudera Enterprise includes CDH, the world’s most popular open source Hadoop-based platform, as well as advanced system management and data management tools. VoltDB is a Certified Cloudera partner.
MapR provides developers with an enterprise-grade Hadoop platform. MapR offers dependability, ease-of-use and world-record speed to Hadoop, NoSQL, database and streaming applications in one unified distribution for Hadoop. VoltDB is a MapR Advantage Technology partner.
IBM Netezza – VoltDB’s IBM Netezza Export client uses the VoltDB JDBC Connector to fetch transactional data from VoltDB and write it, in batches, to the Netezza data warehouse. Configuring this behavior is simple and requires no programming. Users automate the export process by identifying the specific VoltDB tables in the schema as sources for export data. At runtime, any data written to the specified tables is automatically sent to the VoltDB export connector, which manages the exchange of the updated information to the Netezza destination. The VoltDB export process automatically and transactionally queues export data to the connector. The export client uses a series of poll and acknowledgement requests to transactionally exchange data between VoltDB and Netezza, guaranteeing at-least-once delivery of the data to the destination system. The export client runs within the VoltDB cluster, so it, like VoltDB, is highly available. IBM is a partner. Read more about the JDBC Connector in our documentation.
HP Vertica – VoltDB’s HP Vertica Export client uses the VoltDB JDBC Connector to fetch transactional data from VoltDB and write it, in batches, to the Vertica data warehouse. Configuring this behavior is simple and requires no programming.
Users automate the export process by identifying the specific VoltDB tables in the schema as sources for export data. At runtime, any data written to the specified tables is automatically sent to the VoltDB export connector, which manages the exchange of the updated information to the Vertica destination. The VoltDB export process automatically and transactionally queues export data to the connector. The export client uses a series of poll and acknowledgement requests to transactionally exchange data between VoltDB and Vertica, guaranteeing at-least-once delivery of the data to the destination system. The export client runs within the VoltDB cluster, so it, like VoltDB, is highly available. HP Vertica is a partner. Read more about the JDBC Connector in our documentation.
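The poll-and-acknowledge exchange described for the export clients can be illustrated with a small Python sketch. Everything here is hypothetical (`ExportQueue`, `export_all`, `flaky_write`) and only models the general pattern: rows stay queued until the destination write is acknowledged, so a failure causes a retry and possibly a duplicate, but never a lost row.

```python
from collections import deque


class ExportQueue:
    """Hypothetical queue of export rows awaiting acknowledgement."""

    def __init__(self, rows):
        self._pending = deque(rows)

    def __len__(self):
        return len(self._pending)

    def poll(self, max_rows):
        # Return the next rows WITHOUT removing them from the queue.
        return list(self._pending)[:max_rows]

    def ack(self, count):
        # Only acknowledged rows are discarded.
        for _ in range(count):
            self._pending.popleft()


def export_all(queue, write_batch, batch_size=2):
    while len(queue) > 0:
        batch = queue.poll(batch_size)
        try:
            write_batch(batch)
        except OSError:
            continue  # not acknowledged, so the same batch is polled again
        queue.ack(len(batch))


delivered = []
failures = {"remaining": 1}


def flaky_write(batch):
    """Destination stand-in that fails once after the rows have arrived."""
    delivered.extend(batch)
    if failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise OSError("connection dropped before acknowledgement")


queue = ExportQueue([1, 2, 3])
export_all(queue, flaky_write)
```

In this run the first batch is delivered twice because the first attempt was never acknowledged. That is the trade-off of at-least-once delivery: the destination may need to tolerate duplicates.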
VoltDB supports a wide range of export connectors to support integration with other data management components including CSV, WebHDFS/Hadoop, Kafka, RabbitMQ, and JDBC. The JDBC Connector provides export to data warehouse technologies such as IBM Netezza and HP Vertica. VoltDB also provides developers with simple-to-use examples and instructions to build custom, open-source export connectors. VoltDB Export enables data to arrive in your analytic store sooner, and allows deep analytics to be leveraged with radically lower latency. Read about VoltDB Export in our documentation.
VoltDB Connectors, message queues and interfaces
VoltDB serves as a real-time application database used in conjunction with Hadoop and analytical results derived from Hadoop in applications including real-time scoring, policy enforcement, and customer interaction. VoltDB provides the ability to ingest data as fast as it arrives; perform real-time analytics in-memory; make automated decisions in real time; and continuously pass, or export, processed data into Hadoop. For more about Hadoop integrations, click here.
- Apache Kafka Connector
VoltDB’s Apache Kafka Export, paired with the Kafka importer utility, allows developers to build applications in which VoltDB can both transact on incoming Kafka messages and deliver data and alerts to downstream Kafka feeds, enabling VoltDB applications to analyze and make decisions on data in the moment. For more on Kafka connectors for VoltDB, click here.
Apache Kafka is a persistent, high performance, distributed message queue/service. Kafka is highly available, partitions (or shards) messages, and is simple and efficient to use. Useful for serializing and multiplexing streams of data, Kafka provides "at least once" delivery, and gives clients (subscribers) the ability to rewind and replay streams.
In the Apache Kafka model, VoltDB export acts as a Producer. The Kafka connector receives serialized data from the export tables and writes it to a message queue using the Apache Kafka version 0.8 protocols.
Both Kafka and VoltDB are built around shared-nothing clustering. Load is distributed among cluster nodes for performance. Data is replicated among cluster nodes for safety and availability. To handle increasing loads, nodes can be transparently added to the cluster. Nodes can fail or be removed and the remaining cluster will continue to function. Both systems are designed without single points of failure. These features are the hallmarks of systems designed for scale.
Kafka is one of the most frequently used streaming vehicles in the Big Data application space. Because of its persistence capabilities, it is often used to front-end Hadoop data feeds. Read more about VoltDB and the Kafka Connector in our documentation.
The integration of RabbitMQ with VoltDB expands developers’ options to export data from VoltDB. RabbitMQ is a popular, scalable, asynchronous message queueing service that supports multiple platforms, multiple languages, and multiple protocols, including AMQP.
RabbitMQ is in wide use in enterprises developing and running applications in the cloud. Read more about the RabbitMQ Connector in our documentation.
Hive is a data warehouse application that can query large datasets held in distributed storage. Hive is a runtime Hadoop support structure that allows developers fluent in SQL to leverage the Hadoop platform with minimal effort.
VoltDB v5.0 includes a VoltDB Hadoop OutputFormat implementation, which can be used to import job data from Hadoop into VoltDB. This is further leveraged by our Hadoop connectors for both Apache Pig and Apache Hive.
VoltDB developers can export VoltDB data in Pig format to Hadoop. Pig was developed to enable developers using Hadoop® to focus on analyzing large data sets and spend less time writing mapper and reducer programs.
Pig includes two components: the Pig Latin language, and a runtime environment where Pig Latin programs are executed.
Avro provides a serialization and data exchange format for Hadoop. Avro is used natively by Hadoop utilities such as Pig and Hive. Because it is a binary format, Avro data takes up less network bandwidth than text-based formats such as CSV, providing VoltDB developers with efficiencies when moving processed data out of Hadoop and into VoltDB. Avro features rich data structures; a compact, fast, binary data format; a container file to store persistent data; and Remote Procedure Call (RPC).
Avro offers simple integration with dynamic languages. Code generation is not required to read or write data files, nor to use or implement RPC protocols. Code generation is an optional optimization, worth implementing only for statically typed languages.
Avro was designed for developers who prefer strongly typed data serialization or protocol buffer-style tools but who want the flexibility of easy interoperability with dynamic languages.
- Creating a Custom Export
It is also possible to create a custom export connector that runs inside VoltDB. Click here for instructions.
VoltDB is the only open source in-memory NewSQL database. VoltDB was created as an open source project by co-founder Dr. Michael Stonebraker. Once the decision was made to launch a company to support the project, we worked to build an organization committed to developer and customer success with our products.
Why is it important for VoltDB to be open source? The open source community values integrity, participation, the open exchange of ideas, shared purpose, and support for the best ideas. We share those values: our company culture is open and collaborative. We work closely with customers, establishing open, supportive relationships between our developers and our customers’ developers.
VoltDB is developed using the Agile methodology; we believe in rapid prototyping and push updates out to customers on a monthly schedule. Our developers integrate VoltDB with key open source ecosystem components including Kafka, RabbitMQ, Docker, and Hadoop, with more in the pipeline. We also contribute to projects including Kafka, RabbitMQ, and Nagios. Our customers and community members contribute to VoltDB’s open source project on GitHub, helping to improve the quality and robustness of the project and the downstream product.
We make our code available under the popular AGPL license, and offer a free community edition which is used by many community members and educational organizations. We encourage prospects to inspect our code using the community edition.
With VoltDB, our users have the best of both worlds: a freely available community edition built with open source methodologies, and a robust, commercial-strength product. We invite you to take a look at the community edition, or give us a call to discuss how the commercial edition can meet your needs.