Imply Polaris supports data ingestion from files containing delimiter-separated values, specifically:
- CSV: Comma-separated values.
- TSV: Tab-separated values.
- Other delimiter: Custom text delimiter.
This topic shows the options for ingesting data in a delimiter-separated format.
Ingest delimiter-separated data
You specify the input format in the Parse data stage of creating an ingestion job. Select CSV, TSV, or Other delimiter in the Input format drop-down menu.
Format settings for all delimiter formats
For delimiter-separated value formats, you can specify the following:
- Number of header rows to skip. If you choose to skip any header rows, Polaris detects the column headers from the first non-skipped row.
- Whether the data contains column headers. If your files don't contain headers, or if you want to use your own column names, set Data has header? to No and enter your column names. Verify the column headers and any skipped rows in the data sample before you continue to schema editing.
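The header settings above interact in a specific order: skipped rows are discarded first, and only then is a header row (if any) read. The following Python sketch illustrates that order with the standard csv module. The function parse_delimited and its parameters are hypothetical names chosen for this example; this is not how Polaris implements parsing.

```python
import csv
import io
from typing import Optional


def parse_delimited(
    text: str,
    delimiter: str = ",",
    skip_rows: int = 0,
    has_header: bool = True,
    custom_columns: Optional[list[str]] = None,
) -> list[dict[str, str]]:
    """Hypothetical illustration of the header settings, not Polaris code."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    # Skipped rows are dropped before any header detection happens.
    rows = rows[skip_rows:]
    if has_header:
        # Column names come from the first row that was not skipped.
        columns, data = rows[0], rows[1:]
    else:
        # "Data has header? = No": the caller supplies the column names.
        if not custom_columns:
            raise ValueError("custom_columns is required when has_header is False")
        columns, data = custom_columns, rows
    return [dict(zip(columns, row)) for row in data]


sample = "id,name\n1,apple\n2,pear\n"
# Skip the original header row and assign new column names.
print(parse_delimited(sample, skip_rows=1, has_header=False,
                      custom_columns=["item_id", "item_name"]))
```

Skipping the header row and supplying your own names, as in the last call, yields rows keyed by item_id and item_name instead of id and name.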
You can ingest delimiter-separated data with streaming ingestion. However, with streaming ingestion you can't specify header rows to skip or custom headers.
You can also configure these settings for delimiter-separated values when you create an ingestion job by API. Define them in source.formatSettings in the payload of the API call.
For more information, see Create an ingestion job by API and the Jobs API documentation.
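As a rough sketch of where these settings live in an API payload, the Python below builds only the source.formatSettings fragment of a job request. Only the path source.formatSettings comes from this page. The key names inside it (format, skipHeaderRows, findColumnsFromHeader, columns) are assumptions modeled on Apache Druid's csv input format, and the helper functions are hypothetical; check the Jobs API reference for the actual schema before using anything like this.

```python
import json
from typing import Optional


def csv_format_settings(
    skip_header_rows: int = 0,
    columns: Optional[list[str]] = None,
) -> dict:
    """Build a HYPOTHETICAL formatSettings object for CSV input.

    Key names are assumptions borrowed from Apache Druid's csv inputFormat.
    """
    settings = {"format": "csv", "skipHeaderRows": skip_header_rows}
    if columns:
        # Custom column names: do not take names from a header row.
        settings["findColumnsFromHeader"] = False
        settings["columns"] = columns
    else:
        # No custom names: read them from the first non-skipped row.
        settings["findColumnsFromHeader"] = True
    return settings


def partial_job_payload(format_settings: dict) -> str:
    # Only source.formatSettings is shown; every other field a real
    # ingestion job requires is intentionally omitted from this fragment.
    return json.dumps({"source": {"formatSettings": format_settings}}, indent=2)


print(partial_job_payload(csv_format_settings(skip_header_rows=1, columns=["id", "name"])))
```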
If you select Other delimiter, enter the characters that mark the boundary between data values in the Delimiter field.
A custom delimiter of "," (comma) is equivalent to CSV format, and a value of "\t" (tab) is equivalent to TSV format. You can also supply the delimiter as a Unicode escape sequence.
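To make the delimiter equivalences concrete, here is a minimal Python sketch that turns a typed delimiter value into real characters and splits rows on it. The helpers resolve_delimiter and split_rows are hypothetical, and the escape \u0001 in the example is an illustrative value chosen for this sketch, not one taken from the Polaris documentation. The sketch splits on the literal delimiter string and does not handle quoted values, so it is simpler than a full CSV parser.

```python
import re

# Matches a typed Unicode escape: backslash, "u", then four hex digits.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def resolve_delimiter(raw: str) -> str:
    """Convert a typed value such as ',', '\\t', or '\\u0001' to real characters."""
    resolved = _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)
    # Treat the two-character sequence backslash-t as a tab.
    return resolved.replace("\\t", "\t")


def split_rows(text: str, raw_delimiter: str) -> list[list[str]]:
    """Split each non-empty line on the literal delimiter (no quote handling)."""
    delimiter = resolve_delimiter(raw_delimiter)
    lines = text.replace("\r\n", "\n").split("\n")
    return [line.split(delimiter) for line in lines if line]


# "," behaves like CSV and "\t" behaves like TSV for the same data.
print(split_rows("a,b\n1,2", ","))
print(split_rows("a\tb\n1\t2", "\\t"))
```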
When you provide a custom delimiter, Polaris uses it to parse each row. If rows contain different numbers of delimiters, Polaris identifies columns based on the row with the fewest delimiters.
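The rule for rows with inconsistent delimiter counts can be expressed as a small calculation: a row with N delimiters holds N + 1 values, so the row with the fewest delimiters sets the column count. The Python below is a sketch of that rule only; inferred_column_count is a hypothetical helper, and it does not model what Polaris does with the surplus values in longer rows.

```python
def inferred_column_count(rows: list[str], delimiter: str) -> int:
    """Number of columns implied by the row with the fewest delimiters.

    Illustrates the documented rule only; handling of extra values in
    rows with more delimiters is not modeled here.
    """
    if not rows:
        raise ValueError("at least one row is required")
    fewest = min(row.count(delimiter) for row in rows)
    # N delimiters separate N + 1 values.
    return fewest + 1


rows = ["a|b|c", "1|2|3", "4|5"]  # the last row has one delimiter fewer
print(inferred_column_count(rows, "|"))  # → 2
```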
The following screenshot shows an example of skipping the header row from a CSV file and assigning new column names.
For more information about ingestion jobs, see Create an ingestion job.