csvloader

csvloader — Imports the contents of a CSV file and inserts it into a VoltDB table.

Synopsis

csvloader table-name [arguments]

csvloader -p procedure-name [arguments]

Description

The csvloader command reads comma-separated values and inserts each valid line of data into the specified table in a VoltDB database. The most common way to use csvloader is to specify the database table to be loaded and a CSV file containing the data, like so:

$ csvloader employees -f acme_employees.csv

Alternately, you can use standard input as the source of the data:

$ csvloader employees < acme_employees.csv

In addition to inserting all valid content into the specified database table, csvloader creates three output files:

  • Error log — The error log provides details concerning any errors that occur while processing the input file. This includes errors in the format of the input as well as errors that occur attempting the insert into VoltDB. For example, if two rows contain the same value for a column that is declared as unique, the error log indicates that the second insert fails due to a constraint violation.

  • Failed input — A separate file contains the contents of each line that failed to load. This file is useful because it allows you to correct any formatting issues and retry just the failed content, rather than having to restart and reload the entire table.

  • Summary report — Once all input lines are processed, csvloader generates a summary report listing how many lines were read, how many were successfully loaded and how long the operation took.

All three files are created, by default, in the current working directory using "csvloader" and the table name as prefixes. For example, using csvloader to insert contestants into the sample voter database creates the following files:

csvloader_contestants_insert_log.log
csvloader_contestants_invalidrows.csv
csvloader_contestants_insert_report.log

Arguments

--blank {error | null | empty }

Specifies what to do with missing values in the input. By default, if a line contains a missing value, an error is reported and the input line ignored. If you do not want missing values to be interpreted as an error, you can use the --blank argument to specify other behaviors. Specifying --blank null enters a null, in the appropriate datatype, as the column value. Specifying --blank empty returns the corresponding "empty" value in the appropriate datatype. An empty value is interpreted as the following:

  • Zero for all numeric columns

  • Zero, or the Unix epoch value, for timestamp columns

  • An empty or zero-length string for VAXCHAR and VARBINARY columns

--escape {character}

Specifies the escape character that must precede a separator or quotation character that is supposed to be interpreted as a literal character in the CSV input. The default escape character is the backslash (\).

-f, --file {file-specification}

Specifies the location of a CSV file to read as input. If you do not specify an input file, csvloader reads input from standard input.

--limitrows {integer}

Specifies the maximum number of rows to be read from the input stream. This argument (along with --skip) lets you load a subset of a larger CSV file.

-m, --maxerrors {integer}

Specifies the target number of errors before csvloader stops processing input. Once csvloader encounters the specified number of errors while trying to insert rows, it will stop reading input and end the process. Note that, since csvloader performs inserts asynchronously, it often attempts more inserts before the target number of exceptions are returned from the database. So it is possible more errors could be returned after the target is met. This argument lets you conditionally stop a large loading process if more than an acceptable number of errors occur.

--nowhitespace

Specifies that the CSV input must not contain any whitespace between data values and separators. By default, csvloader ignores extra space between values, quotation marks, and the value separators. If you use this argument, any input lines containing whitespace will generate an error and not be inserted into the database.

--password {text]

Specifies the password to use when connecting to the database. You must specify a username and password if security is enabled for the database.

--port {port-number}

Specifies the network port to use when connecting to the database. If you do not specify a port, csvloader uses the default client port 21212.

-p, --procedure {procedure-name}

Specifies a stored procedure to use for loading each record from the data file. The named procedure must exist in the database catalog and must accept the fields of the data record as input parameters. By default, csvloader uses the default insert procedure {TABLE-NAME}.insert.

--quotechar {character}

Specifies the quotation character that is used to enclose values. By default, the quotation character is the double quotation mark (").

-r, --reportdir {directory}

Specifies the directory where csvloader writes the three output files. By default, csvloader writes output files to the current working directory. This argument lets you redirect output to an alternative location.

--s, --servers=server-id[,...]

Specifies the network address of one or more nodes of a database cluster. By default, csvloader attempts to insert the CSV data into a database on the local system (localhost). To load data into a remote database, use the --servers argument to specify the database nodes the loader should connect to.

--separator {charactor}

Specifies the character used to separate individual values in the input. By default, the separator character is the comma (,).

--skip {integer}

Specifies the number of lines from the input stream to skip before inserting rows into the database. This argument (along with --limitrows) lets you load a subset of a larger CSV file.

--strictquotes

Specifies that all values in the CSV input must be enclosed in quotation marks. If you use this argument, any input lines containing unquoted values will generate an error and not be inserted into the database.

--user {text}

Specifies the username to use when connecting to the database. You must specify a username and password if security is enabled for the database.

Examples

The following example loads the data from a CSV file, languages.csv, into the helloworld table from the Hello World example database and redirects the output files to the ./logs subfolder.

$ csvloader helloworld -f languages.csv -r ./logs

The following example performs the same function, providing the input interactively.

$ csvloader helloworld -r ./logs
"Hello", "World", "English"
"Bonjour", "Monde", "French"
"Hola", "Mundo", "Spanish"
"Hej", "Verden", "Danish"
"Ciao", "Mondo", "Italian"
CTRL-D