Debunking Myths about the VoltDB in-memory database
I’d like to take a quick moment to address some myths and misconceptions about VoltDB. Many people selling products who view VoltDB as competition seem to be repeating them. As you’ll read, much of what’s said is just plain FUD.
VoltDB is an in-memory database that has benchmarked at over 3 million transactions a second on bare metal, and recently crushed previous performance records in the cloud, posting eye-popping YCSB (Yahoo! Cloud Serving Benchmark) numbers on AWS, Amazon’s cloud platform. It’s fully transactional, supports full disk durability, and has very low latency and fully native high availability. It’s not surprising that competing interests resort to FUD.
Note the comments below apply to the current shipping version of VoltDB at the time of this posting (4.2), unless otherwise noted.
Myth #1: “VoltDB requires stored procedures.”
This was true for 1.0, but no one seems to have noticed that it’s been false since we shipped 1.1 in 2010. VoltDB supports ad-hoc SQL without any stored procedure use. We have users in production who have never used a single stored procedure.
SQL queries can be run through:
- Our command line SQL console
- A web portal built into every server
- The HTTP/JSON interface
- Native client drivers written in C++, C#, PHP, Python, Node.js and even Java.
- Community-provided language drivers
- ODBC drivers (in development to ship in Q2)
There is one restriction: VoltDB doesn’t support external transaction control; VoltDB is permanently in auto-commit mode. Note this offers more transactionality than most NoSQL systems, and is a shared restriction of several NewSQL or “In-Memory” SQL systems. For those systems that do offer external transaction control, I’m not aware of any that publish benchmarks where transactions require multiple round trips to the client. This is because it’s generally understood to be a bad thing in a performance-critical system.
In addition to ad-hoc SQL, VoltDB auto-generates optimized procedures for various CRUD operations (create, read, update, delete). For example, to fetch a row by primary key from table FOO, I can call a procedure named “FOO.select” with the key value as a parameter.
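As a rough illustration of calling an auto-generated CRUD procedure without writing any Java, the sketch below builds request URLs for the HTTP/JSON interface. It assumes the interface listens on port 8080 under the path /api/1.0/ and takes Procedure and Parameters query arguments, and that ad-hoc SQL is sent through the @AdHoc system procedure; check these against the documentation for your version. The helper name build_call_url and the host are made up for this example.

```python
import json
from urllib.parse import urlencode


def build_call_url(host, procedure, params, port=8080):
    """Build a GET URL for the HTTP/JSON interface.

    Assumption: the endpoint is http://<host>:8080/api/1.0/ and accepts
    'Procedure' and 'Parameters' (a JSON array) as query arguments.
    """
    query = urlencode({
        "Procedure": procedure,
        "Parameters": json.dumps(params),  # parameters travel as a JSON array
    })
    return f"http://{host}:{port}/api/1.0/?{query}"


# Auto-generated CRUD procedure: fetch the row of FOO whose primary key is 42.
select_url = build_call_url("localhost", "FOO.select", [42])

# Ad-hoc SQL: the statement is passed as the single parameter of @AdHoc.
adhoc_url = build_call_url("localhost", "@AdHoc", ["SELECT COUNT(*) FROM FOO"])
```

Either URL can then be fetched with any HTTP client; the response is a JSON document describing the result tables.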
Myth #2: “VoltDB doesn’t support ad-hoc SQL.”
This is just a rephrasing of Myth #1 and is still false.
Myth #3: “VoltDB is slow unless I use stored procedures.”
Well, no. VoltDB can run faster with stored procedures, but it’s still fast if they are not used. In our internal benchmarks on pretty cheap single-socket hardware, we can run about 50k write statements per second, per host with full durability. That’s a bit lower than we can achieve with stored procedures, so we’re doing some engineering work to close the gap. Still, on a 3-node cluster costing about $3,000 for hardware, we can run well over 100k SQL updates per second with durability and double redundancy. If your current system can run that fast, let us know.
Also, the auto-generated CRUD procedures mentioned above run at the full speed of pre-optimized procedures. Using just CRUD, VoltDB is the fastest durable Key-Value store I’m aware of.
Myth #4: “I have to know Java to use VoltDB.”
As of VoltDB 3.0, released over a year ago (we’re on 4.2 today), a user can build VoltDB apps and run the server without ever directly interacting with the Java CLI tools or any Java code.
Client apps can also be written without Java. As mentioned previously, VoltDB supports client drivers in many languages, with community-contributed drivers in others.
If you do want to leverage VoltDB’s ability to perform multi-statement, transactional logic, only the most basic understanding of Java is needed to build stored procedures. We provide example applications in our kit that can be copied from and/or modified. We even support declaring procedures in Groovy right in your DDL.
Myth #5: “VoltDB has garbage collection problems because it is written in Java.”
The short answer is this is just not true.
First, VoltDB uses native C++ code for all of its data storage and SQL execution paths. This is done to avoid putting long-lived data on the Java heap, but also to allow very fine control of memory use. VoltDB’s special native data structures use less memory than competing systems and actively fight allocation fragmentation.
What’s left on the Java heap is either near-immortal configuration and setup data, or very short-lived per-transaction data. This is almost the perfect workload for the Java garbage collector, so there is minimal impact to the user’s workload.
For more detail, read Ariel Weisberg’s excellent blog post on the topic.
Myth #6: “VoltDB’s SQL is very limited.”
This misconception is based on historical truth, but is way out of date in 2014. The original H-Store project, upon which VoltDB is based, started with very limited, OLTP-focused SQL, but much has changed in the past years. VoltDB is rapidly approaching SQL-92 functionality for DML and DQL SQL. One of the final holdouts, subselects, shipped in part in 4.2; more functionality will ship in upcoming versions. In addition to this ANSI SQL baseline, we’ve added materialized view support, function-based index support, native JSON functionality and more.
We have a lot more coming in 2014. We’re proud to announce that all of this is fully Unicode-compliant and is in production use in several countries with multi-byte character sets.
Myth #7: “Yes, VoltDB supports cross-partition transactions, but they are too slow to use.”
Yes we do, and no they aren’t too slow to use. Keep in mind that NoSQL systems often don’t offer cross-partition transactions and many other NewSQL systems have functionality or performance limitations. Few benchmarks stress this functionality, so I’ll be as clear as possible here.
Traditionally, VoltDB’s cross-partition transactions have been slower than our partitioned operations. Additional coordination needs to happen to make these transactions ACID at the “serializable” isolation level at which all VoltDB transactions run. Furthermore, these operations don’t scale with cluster size and sometimes can vary in performance, depending on your networking performance.
But are they too slow to use? I’ll let you be the judge.
- Cross-partition writes run on the order of 1,000 per second, regardless of cluster size. This number can vary by an order of magnitude depending on hardware.
- Cross-partition reads run on the order of 50,000 per second, regardless of cluster size. Again, there is some variability, but less than with writes.
- It’s possible to send writes to each partition individually using the “Send Everywhere” pattern. These writes are transactional at each partition, but individual partitions might fail and rollback independently of others.
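To make the “Send Everywhere” trade-off concrete, here is a toy simulation, not VoltDB code. The Partition class and send_everywhere function are hypothetical names for this sketch. Each partition applies the write as its own all-or-nothing transaction, so one partition can roll back while the others commit, and the caller has to decide what to do about the stragglers.

```python
class Partition:
    """Toy stand-in for one partition: a dict with all-or-nothing writes."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def apply(self, updates, fail=False):
        """Apply every update or none of them; return True on commit."""
        staged = dict(self.rows)  # work on a copy so an abort changes nothing
        staged.update(updates)
        if fail:
            return False          # simulated abort: the staged copy is dropped
        self.rows = staged        # commit
        return True


def send_everywhere(partitions, updates, failing=()):
    """Send the same write to each partition as an independent transaction."""
    return {p.name: p.apply(updates, fail=p.name in failing) for p in partitions}


partitions = [Partition(f"p{i}") for i in range(3)]
results = send_everywhere(partitions, {"rate": 7}, failing={"p1"})

# Only the partitions that rolled back need to be retried.
retry = [p for p in partitions if not results[p.name]]
```

The point of the sketch is the failure mode: after the call, p0 and p2 hold the new value and p1 does not, which is exactly the independence described in the bullet above.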
A common pattern we see is a mix of single-partition writes and cross-partition reads. Cross-partition writes are often used to update infrequently changing lookup tables. Of course, at 1,000 per second, they can still change fairly frequently. Note that if your read operation is scanning millions of tuples, you won’t see the same throughput, but if it’s populating a schema-supported leaderboard or computing a time window statistic using a materialized view, you will. The bottom line is that most of our customers not only use cross-partition transactions, but also consider these kinds of operations one of the key benefits VoltDB offers.
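The reason a leaderboard read stays cheap is that a materialized view keeps the aggregate up to date as rows arrive, so the read touches one entry per group instead of scanning every tuple. The sketch below mimics that idea in plain Python; VoteCountView and its methods are hypothetical names for illustration, and in VoltDB the view would be declared in the schema and maintained by the database itself.

```python
from collections import defaultdict


class VoteCountView:
    """Toy materialized view: COUNT(*) grouped by contestant."""

    def __init__(self):
        self.counts = defaultdict(int)

    def on_insert(self, contestant):
        """Update the aggregate at write time, once per inserted row."""
        self.counts[contestant] += 1

    def leaderboard(self, n):
        """Read the top n groups; cost depends on group count, not row count."""
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


view = VoteCountView()
for contestant in ["ann", "bob", "ann", "cy", "ann", "bob"]:
    view.on_insert(contestant)

top_two = view.leaderboard(2)
```

Six inserts collapse into three counters here; with millions of votes the read would still only look at one counter per contestant.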
We are always working to improve the performance of cross-partition operations, and in recent releases we have made a lot of progress. In 4.0 we introduced read optimizations that got us to the 50k/second number. We also have write optimizations in the pipeline.
Many of these myths and misconceptions are firmly rooted in the world of traditional RDBMSs. VoltDB is a NewSQL database. When evaluating VoltDB, consider the non-traditional features that set us head and shoulders above traditional RDBMSs, NoSQL and other NewSQL options:
- Intelligent, asynchronous clients allow tremendous throughput per-client, as well as terrific management.
- Native export connectors push tuples into downstream systems like message queues, analytic data warehouses and Hadoop.
- Active-Active intra-cluster replication with sub-millisecond latency ensures perfectly consistent high availability and redundant durability.
- Seamless node failure and replacement without impacting serializable ACID semantics.
- Transactional cluster expansion with low performance impact and without impacting serializable ACID semantics provides near-limitless scalability.
- 1500 active, connected clients per server node with minimal latency impact meets the needs of most high-velocity, high transaction volume businesses.
- VoltDB is open source (https://github.com/VoltDB/voltdb)
If you want to hear more, or if you have a use case you want to talk to us about, contact us. We’d love to hear from you. If you’d like to try VoltDB out yourself, you can download a free trial here.