Ingest delimiter-separated values including CSV data
Imply Polaris supports data ingestion from files containing delimiter-separated values, specifically:
- CSV: Comma-separated values.
- TSV: Tab-separated values.
- Other delimiter: Custom text delimiter.
This topic shows the options for ingesting data in a delimiter-separated format.
Ingest delimiter-separated data
You specify the input format in the Parse data stage of creating an ingestion job.
Select CSV, TSV, or Other delimiter in the Input format drop-down menu.
Format settings for all delimiter formats
For delimiter-separated value formats, you can specify the following:
- Number of header rows to skip. If you choose to skip any header rows, Polaris detects the column headers from the first non-skipped row.
- Whether the data contains column headers. If your files do not contain headers or if you want to use your own column names, set Data has header? to No and enter your column headers. Verify the column headers and any skipped rows in the data sample before continuing on to schema editing.
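To make the two settings concrete, here is a minimal local sketch in Python of how skipped rows and header detection interact. It is not Polaris code; the function `read_delimited` and its parameter names are hypothetical and only mirror the settings described above.

```python
import csv
import io


def read_delimited(text, delimiter=",", skip_header_rows=0,
                   has_header=True, column_names=None):
    """Return (headers, data_rows) for delimiter-separated text.

    Hypothetical helper that imitates the settings above; not a Polaris API.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    rows = rows[skip_header_rows:]  # drop the skipped header rows
    if has_header:
        # Headers are taken from the first non-skipped row.
        headers, data = rows[0], rows[1:]
    else:
        # The data has no header row: the caller supplies the column names.
        headers, data = list(column_names), rows
    return headers, data


# One junk line, then the real header row.
sample = "exported 2024-01-01\ntime,city,visits\n2024-01-01T00:00:00Z,Paris,12\n"
headers, data = read_delimited(sample, skip_header_rows=1)

# Skip the file's own header row and assign new column names instead.
renamed_headers, renamed_data = read_delimited(
    "time,city,visits\n2024-01-01T00:00:00Z,Paris,12\n",
    skip_header_rows=1,
    has_header=False,
    column_names=["timestamp", "location", "count"],
)
```

The second call corresponds to setting Data has header? to No after skipping one row, so the original header line never reaches the data.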
You can ingest data in delimiter-separated formats with streaming ingestion. However, with streaming ingestion you can't specify header rows to skip or custom headers.
You can also configure these settings for delimiter-separated values when you create an ingestion job by API.
Define these settings in source.formatSettings in the payload of the API call.
For more information, see Create an ingestion job by API and the Jobs API documentation.
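As a rough illustration of where these settings sit in a request, the following Python sketch builds a job payload as a dictionary. Only the source.formatSettings path comes from this topic. The keys inside it (`format`, `skipHeaderRows`, `columns`) and the `type` value are assumed names used for illustration, so confirm the exact fields in the Jobs API documentation before relying on them.

```python
import json

# Only the "source" -> "formatSettings" nesting is taken from this topic.
# The keys "format", "skipHeaderRows", and "columns", and the "type" value,
# are ASSUMED names for illustration; verify them in the Jobs API reference.
job_request = {
    "type": "batch",
    "source": {
        "formatSettings": {
            "format": "csv",
            "skipHeaderRows": 1,
            "columns": ["timestamp", "location", "count"],
        },
    },
}

# Serialize the request body as JSON text.
payload = json.dumps(job_request, indent=2)
print(payload)
```

The sketch stops at building the request body. Sending it requires the endpoint and authentication details described in Create an ingestion job by API.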
Custom delimiter
If you select Other delimiter, fill in the Delimiter field with the characters used to specify the boundary between data values.
A custom delimiter of , is equivalent to CSV format, and a value of \t is equivalent to TSV format. You can also supply Unicode escape sequences such as \u0001.
When you provide a custom delimiter, Polaris uses it to parse each row. If the rows don't all contain the same number of delimiters, Polaris identifies columns based on the row with the fewest delimiters.
The delimiter applies to how Polaris parses columns from the input files.
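The following Python sketch imitates this behavior locally with the standard `csv` module, using the \u0001 control character as the delimiter. It is an approximation under stated assumptions, not Polaris's parser: the helper `parse_with_delimiter` is hypothetical, and because this topic doesn't say what happens to values beyond the shortest row's column count, the sketch simply drops them.

```python
import csv
import io


def parse_with_delimiter(text, delimiter):
    """Split rows on `delimiter`, keeping as many columns as the shortest row has.

    Hypothetical helper. Dropping the extra values is an assumption made
    for this sketch, not documented Polaris behavior.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # ignore blank lines
    column_count = min(len(row) for row in rows)
    return [row[:column_count] for row in rows]


# In Python source, "\u0001" is the single control character U+0001.
ragged = "a\u0001b\u0001c\n1\u00012\n3\u00014\u00015\n"
parsed = parse_with_delimiter(ragged, "\u0001")

# A comma delimiter behaves like plain CSV parsing.
as_csv = parse_with_delimiter("a,b\n1,2\n", ",")
```

Here the second row has only one delimiter, so two columns are identified for every row.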
You can set the delimiter for parsing multi-value dimensions when creating ingestion jobs using the API. Specify this delimiter in source.formatSettings.listDelimiter of the ingestion job request body.
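The list delimiter works one level below the column delimiter: it splits a single column value into several values. A minimal Python sketch of that idea follows. The helper `split_multi_value`, the sample column names, and the `|` delimiter are all assumptions chosen for illustration, not documented defaults.

```python
def split_multi_value(row, multi_value_columns, list_delimiter):
    """Return a copy of `row` with the named columns split into lists.

    Hypothetical helper showing what a list delimiter does to one parsed row.
    """
    return {
        name: value.split(list_delimiter) if name in multi_value_columns else value
        for name, value in row.items()
    }


# One row after column parsing; "tags" holds several values in one field.
row = {"city": "Paris", "tags": "food|travel|art"}
result = split_multi_value(row, {"tags"}, "|")
```

Choose a list delimiter that differs from the column delimiter. Otherwise the values are already split into separate columns before the list delimiter is applied.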
Example
The following screenshot shows an example of skipping the header row from a CSV file and assigning new column names.
Learn more
For more information about ingestion jobs, see Create an ingestion job.