13.5. How Export Works

The export connector implements a loosely coupled approach to extracting export data from a running VoltDB database. When export is enabled at runtime:

  1. Insert operations to export-only tables are queued to the export connector.

  2. If the export client is run on the database server, the client starts and links to the export connector. If the client is run remotely, it establishes a link to the connector through one of the standard TCP/IP ports (either the client or admin port). The client then issues POLL requests.

  3. The connector responds to the POLL requests with the next queued data block (or an empty block if the queue is empty).

  4. The client is then responsible for receiving the data and writing it to the appropriate destination.

  5. Finally, the export client sends an ACK message acknowledging that the data block has been exported (at which point the connector can remove that block from the queue) before polling for the next data block.
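
The steps above can be illustrated with a small in-process model. This is only a sketch of the poll/ack pattern, not VoltDB's export API or wire protocol: the class and method names (ExportPollAckSketch, Connector, drain) are hypothetical, blocks are plain strings, and an empty string stands in for the empty block returned when the queue is empty.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of the POLL/ACK exchange. All names are hypothetical;
// nothing here is part of the VoltDB client library.
class ExportPollAckSketch {

    /** Stands in for the connector: holds queued blocks until they are acknowledged. */
    static class Connector {
        private final Deque<String> queue = new ArrayDeque<>();

        // Step 1: an insert into an export-only table is queued.
        void enqueue(String block) {
            queue.addLast(block);
        }

        // Step 3: answer a POLL with the next queued block, or an empty block.
        // The block stays queued until it is acknowledged.
        String poll() {
            String head = queue.peekFirst();
            return head == null ? "" : head;
        }

        // Step 5: on ACK, the connector can remove the block from the queue.
        void ack() {
            queue.pollFirst();
        }

        int pending() {
            return queue.size();
        }
    }

    /** Stands in for the export client: polls, writes, then acknowledges. */
    static List<String> drain(Connector connector) {
        List<String> destination = new ArrayList<>();
        while (true) {
            String block = connector.poll();   // Step 2: issue a POLL request
            if (block.isEmpty()) {
                break;                         // queue is empty; this sketch simply stops
            }
            destination.add(block);            // Step 4: write to the destination
            connector.ack();                   // Step 5: ACK before the next poll
        }
        return destination;
    }

    public static void main(String[] args) {
        Connector connector = new Connector();
        connector.enqueue("row-1");
        connector.enqueue("row-2");
        System.out.println(drain(connector));
    }
}
```

Because the block is removed only on ACK, a client that fails between receiving and writing a block would see the same block again on its next poll in this model.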

Figure 13.3, “The Components of the Export Process” shows the interaction between the VoltDB database, the connector, and the export client.

Figure 13.3. The Components of the Export Process

The export function queues and passes data to the connector automatically. You do not need to do anything explicitly to start the connector; it starts and stops when the database starts and stops. The connector and the export client use a series of poll and ack requests to exchange the data over the TCP port.

The export client decides what is done with the data it receives from the connector. For example, one client writes the serialized data to a sequence of files while another could insert it into an analytic database.

When using a remote client, only one client can connect to the connector at a time. The remote client must also create connections to all nodes in the database cluster, since each node creates its own instance of the connector.

When running the export client on the server, each database server creates one instance each of the connector and the client, distributing the work across the cluster.

13.5.1. Export Overflow

For the export process to work, it is important that the connector and client keep up with the queue of exported information. If too much data gets queued to the connector by the export function without being fetched by the client, the VoltDB server process consumes increasingly large amounts of memory.

If the export client does not keep up with the connector and the data queue fills up, VoltDB starts writing overflow data in the export buffer to disk. This protects your database in several ways:

  • If a remote export client fails, writing to disk helps VoltDB avoid consuming too much memory while waiting for the client to restart.

  • If the database is stopped, the export data is retained across sessions. When the database restarts and the client reconnects, the connector will retrieve the overflow data and reinsert it in the export queue.
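
The general idea of spilling a bounded queue to disk can be sketched as follows. This is a simplified illustration, not VoltDB's implementation: the class name OverflowQueueSketch, the one-block-per-line file format, and the block-count memory limit are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;

// Hypothetical bounded queue that spills to a file once memory is full.
// Blocks are single-line strings so the file can hold one block per line.
class OverflowQueueSketch {
    private final int memoryLimit;      // assumed limit, counted in blocks
    private final Path overflowFile;    // assumed overflow location
    private final Deque<String> inMemory = new ArrayDeque<>();

    OverflowQueueSketch(int memoryLimit, Path overflowFile) {
        this.memoryLimit = memoryLimit;
        this.overflowFile = overflowFile;
    }

    void enqueue(String block) throws IOException {
        // Once anything has spilled, later blocks must spill too to preserve order.
        if (inMemory.size() < memoryLimit && !Files.exists(overflowFile)) {
            inMemory.addLast(block);
        } else {
            Files.write(overflowFile, Collections.singletonList(block),
                    StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    /** Returns the next block in insertion order, or null if nothing is queued. */
    String dequeue() throws IOException {
        if (inMemory.isEmpty() && Files.exists(overflowFile)) {
            // Reinsert the overflow data into the in-memory queue. For brevity the
            // whole file is reloaded at once, which may briefly exceed memoryLimit.
            inMemory.addAll(Files.readAllLines(overflowFile, StandardCharsets.UTF_8));
            Files.delete(overflowFile);
        }
        return inMemory.pollFirst();
    }

    int inMemoryCount() {
        return inMemory.size();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("export-overflow").resolve("overflow.txt");
        OverflowQueueSketch queue = new OverflowQueueSketch(2, file);
        for (String block : new String[] {"b1", "b2", "b3", "b4"}) {
            queue.enqueue(block);
        }
        // A new instance over the same file models a restart: spilled blocks survive.
        OverflowQueueSketch restarted = new OverflowQueueSketch(2, file);
        System.out.println(restarted.dequeue());
    }
}
```

In this model, memory use stays bounded while the consumer is stalled, and anything already written to the file is still available to a new instance created over the same path, which mirrors the two protections listed above.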

You can specify where VoltDB writes the overflow export data using the <exportoverflow> element in the deployment file. For example:

<paths>
   <voltdbroot path="/opt/voltdb/" />
   <exportoverflow path="/tmp/export/"/>
</paths>

If you do not specify a path for export overflow, VoltDB creates a subfolder in the root directory (in the preceding example, /opt/voltdb). See Section 6.1.2, “Configuring Paths for Runtime Features” for more information about configuring paths in the deployment file.

13.5.2. Persistence Across Database Sessions

VoltDB uses disk storage only for overflow data. However, you can force VoltDB to write all queued export data to disk either by calling the @Quiesce system procedure or by requesting a blocking snapshot (that is, calling @SnapshotSave with the blocking flag set). This means you can perform an orderly shutdown of a VoltDB database and ensure all data (including export data) is saved with the following procedure:

  1. Put the database into admin mode with the voltadmin pause command.

  2. Perform a blocking snapshot with voltadmin save, saving both the database and any existing queued export data.

  3. Shut down the database with voltadmin shutdown.

You can then restore the database (and any pending export queue data) by starting the database in admin mode, restoring the snapshot, and then exiting admin mode.