Ingest delimiter-separated values including CSV data
Imply Polaris supports data ingestion from files containing delimiter-separated values, specifically:
- CSV: Comma-separated values.
- TSV: Tab-separated values.
- Other delimiter: Custom text delimiter.
This topic shows the options for ingesting data in a delimiter-separated format.
Ingest delimiter-separated data
You specify the input format in the Parse data stage of creating an ingestion job.
Select CSV, TSV, or Other delimiter in the Input format drop-down menu.
Format settings
For delimiter-separated value formats, you can specify the following format settings:
- Number of header rows to skip. If you choose to skip any header rows, Polaris detects the column headers from the first non-skipped row.
- Whether the data contains column headers. If your files do not contain headers or if you want to use your own column names, set Data has header? to No and enter your column headers. Polaris does not allow multiple column headers to have the same name, so you can use this option to provide new column names. Verify the column headers and any skipped rows in the data sample before continuing on to schema editing.
With streaming ingestion, you can ingest data in delimiter-separated formats but you can't specify the header rows to skip or supply custom headers.
You can also configure these settings for delimiter-separated values when you create an ingestion job by API.
Define these settings in source.formatSettings in the payload of the API call.
For more information, see Create an ingestion job by API and the Jobs API documentation.
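As a rough sketch of where these settings sit in a job request, the following Python snippet builds a request body as a dictionary. Only the source.formatSettings location comes from this topic. Every key inside it (format, skipHeaderRows, columns), the replacement column names, and the helper name build_format_settings are assumptions made for illustration, so check the Jobs API documentation for the actual field names before using them.

```python
import json


def build_format_settings(fmt, skip_header_rows=0, columns=None):
    """Build a hypothetical formatSettings object.

    The key names used here are assumptions, not confirmed Jobs API fields.
    """
    settings = {"format": fmt, "skipHeaderRows": skip_header_rows}
    if columns:
        # Custom column names that replace any headers found in the file.
        settings["columns"] = list(columns)
    return settings


# Skip the original header row and supply replacement column names.
payload = {
    "source": {
        "formatSettings": build_format_settings(
            "csv", skip_header_rows=1, columns=["timestamp", "item", "category"]
        )
    }
}

print(json.dumps(payload, indent=2))
```

A real request body needs more fields than this fragment shows, such as the source and target of the job.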
Custom delimiter
If you select Other delimiter, fill in the Delimiter field with the characters used to specify the boundary between data values.
Do not confuse this field with Multi-value delimiter for multi-value dimensions.
A custom delimiter of , (comma) is equivalent to CSV format, and a value of \t (tab) is equivalent to TSV format. You can also supply Unicode escape sequences such as \u0001.
When you provide a custom delimiter, Polaris uses it to parse each row. If the rows do not all contain the same number of delimiters, Polaris identifies columns based on the row with the fewest delimiters.
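To preview how many columns a file with uneven rows would yield under the rule above, you can count them locally. The following Python sketch is an illustration only, not the Polaris parser. The function names and sample values are hypothetical, and the sketch does not model what happens to the extra values in longer rows.

```python
def split_rows(text, delimiter):
    """Split each non-empty line of text on the given delimiter."""
    return [line.split(delimiter) for line in text.splitlines() if line]


def column_count(rows):
    """Return the column count implied by the row with the fewest delimiters."""
    return min(len(row) for row in rows)


# The Unicode escape \u0001 (start of heading) used as the delimiter.
DELIMITER = "\u0001"

# Made-up sample: the second row has one delimiter, the others have two.
sample = "a\u0001b\u0001c\n1\u00012\n3\u00014\u00015\n"

rows = split_rows(sample, DELIMITER)
print(column_count(rows))  # 2
```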
Multi-value dimensions
The Multi-value delimiter field controls how Polaris parses multi-value dimensions.
If you use the API, specify the multi-value delimiter in source.formatSettings.listDelimiter of the ingestion job request body.
Examples
This section shows examples of ingesting data in various delimited formats.
Basic CSV format
The following example shows a simple CSV format, without any custom format settings.
The example uses the following data:
time,product,department
2022-06-13T10:10:35Z,Bike,Sports
2022-06-13T21:37:06Z,Mouse,Computers
2022-06-13T07:52:29Z,Bike,Sports
2022-06-13T13:57:38Z,Sausages,Grocery
2022-6-14T10:32:08Z,Keyboard,Computers
New column names
The following example discards the header row from the data and assigns new column names.
The example uses the same data as the preceding example.
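The effect of discarding the header row and assigning new names can be approximated locally with Python's csv module. This is a minimal sketch rather than a description of how Polaris processes the file. The replacement names (timestamp, item, category) and the function read_with_new_names are hypothetical.

```python
import csv
import io

DATA = """time,product,department
2022-06-13T10:10:35Z,Bike,Sports
2022-06-13T21:37:06Z,Mouse,Computers
2022-06-13T07:52:29Z,Bike,Sports
2022-06-13T13:57:38Z,Sausages,Grocery
2022-6-14T10:32:08Z,Keyboard,Computers
"""


def read_with_new_names(text, names, skip_rows=1):
    """Skip the leading rows, then label each remaining row with the new names."""
    reader = csv.reader(io.StringIO(text))
    for _ in range(skip_rows):
        next(reader, None)  # discard a header row
    return [dict(zip(names, row)) for row in reader]


records = read_with_new_names(DATA, ["timestamp", "item", "category"])
print(records[0])
```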
Custom delimiter
The following example applies a semicolon (;) as the custom delimiter.
The example uses the following data:
time;product;department
2022-06-13T10:10:35Z;Bike;Sports
2022-06-13T21:37:06Z;Mouse;Computers
2022-06-13T07:52:29Z;Bike;Sports
2022-06-13T13:57:38Z;Sausages;Grocery
2022-6-14T10:32:08Z;Keyboard;Computers
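To check how this data splits before you ingest it, you can read it locally with the same delimiter. The following sketch uses the delimiter parameter of Python's csv.DictReader. It only mirrors the column splitting and is not part of Polaris.

```python
import csv
import io

DATA = """time;product;department
2022-06-13T10:10:35Z;Bike;Sports
2022-06-13T21:37:06Z;Mouse;Computers
2022-06-13T07:52:29Z;Bike;Sports
2022-06-13T13:57:38Z;Sausages;Grocery
2022-6-14T10:32:08Z;Keyboard;Computers
"""

# The first row supplies the column headers; ";" separates the values.
reader = csv.DictReader(io.StringIO(DATA), delimiter=";")
rows = list(reader)

print(reader.fieldnames)
print(rows[0]["product"])
```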
Multi-value dimensions
The following example applies a custom delimiter of ; (semicolon) to demarcate columns and a multi-value delimiter of , (comma) to separate the values within a multi-value dimension.
The example uses the following data:
time;product;department
2022-06-13T10:10:35Z;Bike;Sports,Transportation,Outdoor
2022-06-13T21:37:06Z;Mouse;Computers,Electronics
2022-06-13T07:52:29Z;Bike;Sports,Transportation,Outdoor
2022-06-13T13:57:38Z;Sausages;Grocery,Deli
2022-6-14T10:32:08Z;Keyboard;Computers,Electronics
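The interaction of the two delimiters can be sketched as a two-level split: first on the column delimiter, then on the multi-value delimiter within each field. The following Python snippet is a local illustration under that assumption, with a hypothetical parse_line helper. It does not reproduce how Polaris stores multi-value dimensions.

```python
def parse_line(line, delimiter=";", list_delimiter=","):
    """Split a line into fields, turning multi-value fields into lists."""
    values = []
    for field in line.split(delimiter):
        parts = field.split(list_delimiter)
        # Keep single values as strings; only multi-value fields become lists.
        values.append(parts if len(parts) > 1 else field)
    return values


header = "time;product;department".split(";")
line = "2022-06-13T10:10:35Z;Bike;Sports,Transportation,Outdoor"
row = dict(zip(header, parse_line(line)))

print(row["department"])  # ['Sports', 'Transportation', 'Outdoor']
```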
Learn more
For more information about ingestion jobs, see Create an ingestion job.