Optimizing Distributed Read Operations in VoltDB

written by Ryan Betts on August 3, 2011

Many VoltDB applications, such as gaming leaderboards and real-time analytics, use multi-partition procedures to compute consistent global aggregates (and other interesting statistics).  Processing distributed read operations efficiently is challenging, especially for performance-sensitive applications.  Based on feedback from our users, we in VoltDB engineering have been enhancing the VoltDB SQL planner over the last few releases to improve this capability.

Executing global aggregates efficiently requires calculating sub-results at each partition replica and combining the sub-results at a coordinating partition to produce the final result.  For example, to calculate a total sum, the VoltDB planner should produce a sub-total at each partition and then sum the sub-totals at the coordinator node.  All of this work must be transparent to the application, of course.
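
As a rough illustration, consider a total sum over a hypothetical partitioned table (the table and column names below are illustrative, not part of the Voter sample):

  -- the application issues a single statement:
  select sum(amount) from orders;

  -- conceptually, each partition computes a sub-total over its local rows:
  --   select sum(amount) from <local rows of orders>;
  -- and the coordinator sums the sub-totals to produce the final result:
  --   select sum(sub_total) from <per-partition sub-totals>;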

A Couple of Examples

Let’s look at a couple of quick examples using the simple Votes table from the Voter sample application that ships with VoltDB, using explain plans to demonstrate the VoltDB planner’s choices.

Each row of the partitioned Votes table is a vote for a contestant.  The table is indexed by voters’ phone numbers.

create table votes (
    phone_number bigint not null,
    contestant_number tinyint not null);
create index idx_votes_tree on votes (phone_number);
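
For concreteness, a few sample rows might look like this (the phone numbers and contestant numbers are made up for illustration):

  insert into votes (phone_number, contestant_number) values (2025551234, 1);
  insert into votes (phone_number, contestant_number) values (6175555678, 2);
  insert into votes (phone_number, contestant_number) values (4155559012, 1);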

To calculate the number of votes received by each contestant:

  select contestant_number, count(phone_number)
    from votes group by contestant_number
    order by contestant_number;

For the above select statement, the planner performs the following work:

RETURN RESULTS TO STORED PROCEDURE
ORDER BY (SORT)
AGGREGATION ops: sum
RECEIVE FROM ALL PARTITIONS
SEND PARTITION RESULTS TO COORDINATOR
AGGREGATION ops: count
SEQUENTIAL SCAN of "VOTES"

Reading from the bottom up, the plan starts with a scan of the Votes table at each distributed partition, counting phone numbers per contestant. Those sub-counts are sent to the coordinator (“RECEIVE FROM ALL PARTITIONS”), where they are summed within each group and sorted by contestant number.
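
Put in SQL terms, the distributed plan behaves roughly like the following two-step rewrite (a conceptual sketch, not actual VoltDB syntax):

  -- at each partition, over that partition's rows only:
  select contestant_number, count(phone_number) as sub_count
    from votes group by contestant_number;

  -- at the coordinator, over the collected per-partition sub-results:
  select contestant_number, sum(sub_count)
    from <partition sub-results> group by contestant_number
    order by contestant_number;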

If the count aggregate had not been pushed past the send/receive pair, the partitions would have sent all rows of the Votes table to the coordinator. That plan would not have scaled and VoltDB would have aborted its execution to protect the coordinator from exhausting memory resources.

The VoltDB planner can distribute combinations of min, max, sum, count, and limit operations to achieve a variety of query outcomes.  Note also how the following example uses the index on phone_number to read rows in sort order, letting each partition stop after the first three rows rather than scanning and sorting the entire table:

select phone_number from votes
   order by phone_number limit 3;

For the above select statement, the planner performs the following work:

RETURN RESULTS TO STORED PROCEDURE
LIMIT 3
ORDER BY (SORT)
RECEIVE FROM ALL PARTITIONS
SEND PARTITION RESULTS TO COORDINATOR
INDEX SCAN of "VOTES" using "IDX_VOTES_TREE" (for sort order only)
  inline (LIMIT 3)

The plan applies the limit at each partition, so each partition returns at most three rows, and then re-applies the limit at the coordinator, after sorting the combined sub-results, to produce the final result.
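
The same push-down applies to the other operations mentioned above. For a simple max, for example, each partition can compute a local maximum so that the coordinator only combines a handful of sub-results (again a conceptual sketch, not planner output):

  select max(phone_number) from votes;

  -- conceptually: each partition computes max(phone_number) over its local rows,
  -- and the coordinator returns the largest of the per-partition maximums.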

Conclusion

In summary, the goal of our ongoing optimizations is to handle an increasing number (and combination) of distributed read operations in VoltDB.  Much of this work has been driven by user feedback in support of real-time analytics, gaming systems, and other applications that need high-performance queries.