The export-to-Hadoop client fetches the serialized data from the export connector and, rather than writing it out as text files, exports it to the Hadoop Distributed File System (HDFS) using the Sqoop import application from Cloudera. The export-to-Hadoop client is available in the VoltDB Enterprise Edition only.
To use the export-to-Hadoop client, you must have already installed and configured both Hadoop and Sqoop and started the Hadoop file system. The client uses the environment variable HADOOP_HOME to determine where Hadoop is installed and what file system to use as the target of the export. It then extracts and formats the exported data from the VoltDB connector and runs the Sqoop importer to read that data into Hadoop.
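Because the client depends on a valid Hadoop installation, it can help to check the environment before starting it. The following is a minimal sketch only; check_home is a hypothetical helper, not part of VoltDB, Hadoop, or Sqoop, and it assumes that an existing directory is a sufficient sanity check.

```shell
#!/bin/sh
# Sketch: confirm that an installation variable is set and names a directory.
# check_home is a hypothetical helper introduced for illustration only.
check_home() {
    # $1 = variable name (used in messages), $2 = the variable's value
    if [ -z "$2" ]; then
        echo "$1 is not set" >&2
        return 1
    fi
    if [ ! -d "$2" ]; then
        echo "$1 does not point to a directory: $2" >&2
        return 1
    fi
    return 0
}
```

A launcher script could call it as check_home HADOOP_HOME "$HADOOP_HOME" and check_home SQOOP_HOME "$SQOOP_HOME", stopping if either call fails. It does not verify that the Hadoop file system is actually running.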
You start the export-to-Hadoop client with the java command. All of the necessary JAR files for VoltDB, Hadoop, and Sqoop must be in your classpath. The easiest way to ensure this is with an Ant build script. For demonstration purposes, however, the following example uses the shell export command to define CLASSPATH.
The command to start the export-to-Hadoop client looks something like the following:
$ V_PATH="/opt/voltdb/voltdb/*"
$ H_PATH="$HADOOP_HOME/*:$HADOOP_HOME/conf:$HADOOP_HOME/lib/*"
$ S_PATH="$SQOOP_HOME/*:$SQOOP_HOME/lib/*"
$ export CLASSPATH="$V_PATH:$H_PATH:$S_PATH"
$ java org.voltdb.hadoop.VoltDBSqoopExportClient \
--connect client \
--servers myserver \
--nonce ExportData
The export-to-Hadoop client has a number of command line options that let you customize the export process to meet your needs. Many of the options are the same as for the export-to-file client. However, some options are specific to this client and allow you to control the Sqoop importer. In the following description the generic export client options are listed separately from the Sqoop-specific options.
The complete syntax of the command line is as follows:
$ java -classpath {path} \
org.voltdb.hadoop.VoltDBSqoopExportClient \
{arguments...}
$ java -classpath {path} \
org.voltdb.hadoop.VoltDBSqoopExportClient \
--help
The supported export client arguments are:
--servers: A comma-separated list of host names or IP addresses to query.
--nonce: The prefix to use for the directories that the client creates in Hadoop.
--connect: The port to connect to. You specify the type of port (client or admin), not the port number.
The username to use for authenticating to the VoltDB server(s). Required only if security is enabled for the database.
The password to use for authenticating to the VoltDB server(s). Required only if security is enabled for the database. If you specify a username but not a password, the export client prompts you for the password.
(Optional.) The frequency, in minutes, for "rolling" the output file. The default frequency is 60 minutes.
(Optional.) The directory where temporary files are written as part of the export/import process.
(Optional.) Alternate delimiter characters for the CSV output. The text string specifies four characters: the field delimiter, the enclosing character, the escape character, and the record delimiter. To use special or non-printing characters (including the space character), encode the character as an HTML entity. For example, "&lt;" for the "less than" symbol.
(Optional.) Specifies that all CSV fields, even numeric and null fields, are enclosed (in quotation marks, by default).
(Optional.) The HTTP port for connecting to the Hadoop file system (port 8099 by default).
--skipinternals: (Optional.) Eliminates the six columns of VoltDB metadata (such as transaction ID and timestamp) from the output. If you specify --skipinternals, the output files contain only the exported table data.
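The rule that the port is given as a type (client or admin) rather than a number is easy to get wrong in scripts. The following sketch assembles the three arguments used in the earlier example command and rejects anything else as a port type. build_export_args is a hypothetical helper for illustration, not part of VoltDB.

```shell
#!/bin/sh
# Sketch: assemble --connect/--servers/--nonce and validate the port type.
# build_export_args is a hypothetical helper introduced for illustration only.
build_export_args() {
    # $1 = comma-separated server list, $2 = nonce, $3 = port type
    case "$3" in
        client|admin) ;;
        *)
            echo "port type must be 'client' or 'admin', not a port number: $3" >&2
            return 1
            ;;
    esac
    # Print the arguments on one line so the caller can append them to java.
    printf '%s\n' "--connect $3 --servers $1 --nonce $2"
}
```

With CLASSPATH already defined, the result could be used as java org.voltdb.hadoop.VoltDBSqoopExportClient $(build_export_args myserver ExportData client). The unquoted substitution relies on word splitting, so this sketch assumes the server list and nonce contain no spaces.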
The following are supported Sqoop-specific arguments. These arguments are passed directly to the Sqoop importer interface and are documented in detail in the Sqoop documentation:
--hadoop-home: (Optional.) Overrides the value of the HADOOP_HOME environment variable. You must provide a valid Hadoop home directory, either by defining HADOOP_HOME or specifying the location with --hadoop-home.
(Optional.) Displays additional information during Sqoop processing.
--target-dir: (Optional.) The destination directory in HDFS.
--warehouse-dir: (Optional.) A parent directory in HDFS where separate folders are created for each table. The options --target-dir and --warehouse-dir are mutually exclusive.
(Optional.) The string to use for null string values in the output.
(Optional.) The string to use for null values in the output for all datatypes except strings.
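Two of the constraints above can be checked before the client is launched: --target-dir and --warehouse-dir cannot be combined, and a Hadoop home must come from either HADOOP_HOME or --hadoop-home. The following is a sketch of such a pre-flight check. check_sqoop_args is a hypothetical helper, and it only looks for the option names, without validating their values.

```shell
#!/bin/sh
# Sketch: pre-flight check of the Sqoop-specific arguments described above.
# check_sqoop_args is a hypothetical helper introduced for illustration only.
check_sqoop_args() {
    has_target=0
    has_warehouse=0
    has_home=0
    # Record which of the relevant options appear on the command line.
    for arg in "$@"; do
        case "$arg" in
            --target-dir)    has_target=1 ;;
            --warehouse-dir) has_warehouse=1 ;;
            --hadoop-home)   has_home=1 ;;
        esac
    done
    if [ "$has_target" -eq 1 ] && [ "$has_warehouse" -eq 1 ]; then
        echo "--target-dir and --warehouse-dir are mutually exclusive" >&2
        return 1
    fi
    # A Hadoop home must come from the option or from the environment.
    if [ "$has_home" -eq 0 ] && [ -z "${HADOOP_HOME:-}" ]; then
        echo "define HADOOP_HOME or specify --hadoop-home" >&2
        return 1
    fi
    return 0
}
```

A wrapper script could run check_sqoop_args "$@" and start the java command only when the check succeeds.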
