Leaderboards: Optimization for Ranking-Related Queries

written by Douglas Patten on October 5, 2012 with one Comment

My name is Xin Jia and I was one of the student interns at VoltDB this summer. For one of my projects, I worked on a feature that greatly improved the performance of ranking-related queries.

Finding the rank of an entry in a sorted list of data is a common operation in a lot of applications. Take a leaderboard within gaming application as an example: it is common to have a scoreboard that keeps track of users’ scores. Questions like, “what are the top-n users and their corresponding scores?”, or “what is the rank of a certain user?” are often asked. While the first question is easy to answer with a simple SQL query with an ORDER BY and a LIMIT, the second one can be hard to answer efficiently.

With the release of VoltDB 2.8.1, finding the rank of a user is no longer an expensive operation.

Assume that we are creating a leaderboard application for a popular online game. Each user has a score representing the current standing of that user in the game. When a player logs into the game, the score of that player along with his or her rank will be displayed.

The table we will use to store all the scores is called scores. It has a player_id column that is the unique ID of a player, and a score column that stores the latest high score of that player. To make locating a player in the table efficient, the player_id column is set as the primary key. Since we are also interested in the ranks of each player, a tree index on the score column that preserves ordering is also created.

CREATE TABLE scores

(
player_id    integer  NOT NULL
, score        integer  NOT NULL
, PRIMARY KEY (player_id)
);

CREATE INDEX player_index ON scores(score);

With this schema it is trivial, and efficient, to find the score of any given player. For example, the following query finds the score of user 10240, and will do so efficiently by using the primary key player_id:

SELECT score FROM scores WHERE player_id = 10240;</pre>
Finding the rank of that player, though simple, can be much more more costly. Assume that the score of player 10240 is 281, the following query below will return the rank of the player:

SELECT COUNT(*) FROM scores WHERE score > 281;
This query is asking the database to compute the number of players have that scores greater than 281. This is the same as asking the rank of the player.

In versions prior to 2.8.1, this query would have to scan through the tree index counting the number of players, one by one, that have scores greater than 281. Starting from version 2.8.1, the underlying tree index will store an extra count of all the children under every node, so the query can be answered instantly.

The performance improvement is noticeable and significant even in in a medium sized dataset. As the size of the dataset increases, the speedup could be orders of magnitude.

Beyond gaming applications, this feature could be used in Capital Market applications to measure the depth of the order book at various price levels. In Digital Advertising, this feature could be used for optimizing targeting such as ranking the demographic groups or keywords associations that respond best to a particular ad.

Finally, note that this feature is not restricted to getting ranks, it can also be applied to range queries like the following:

SELECT COUNT(*) FROM scores WHERE score >= 10 AND score <= 200;

Which returns the number of users within a score range of 10 and 200.

There is only a small number of databases and datastores that support a similar feature. By adding this feature to VoltDB, we hope to bring a much more pleasant user experience to certain types of applications.