
Ingest delimiter-separated values including CSV data

Imply Polaris supports data ingestion from files containing delimiter-separated values, specifically:

  • CSV: Comma-separated values.
  • TSV: Tab-separated values.
  • Other delimiter: Custom text delimiter.

This topic shows the options for ingesting data in a delimiter-separated format.

Ingest delimiter-separated data

You specify the input format in the Parse data stage of creating an ingestion job. Select CSV, TSV, or Other delimiter in the Input format drop-down menu:

Input format options

Format settings

For delimiter-separated value formats, you can specify the following format settings:

  • Number of header rows to skip. If you choose to skip any header rows, Polaris detects the column headers from the first non-skipped row.
  • Whether the data contains column headers. If your files do not contain headers, or if you want to use your own column names, set Data has header? to No and enter your column headers. Polaris does not allow multiple column headers to have the same name, so you can use this option to replace duplicate names with new column names. Verify the column headers and any skipped rows in the data sample before continuing on to schema editing.

Info: With streaming ingestion, you can ingest data in delimiter-separated formats, but you can't specify the number of header rows to skip or supply custom headers.

You can also configure these settings for delimiter-separated values when you create an ingestion job by API. Define these settings in source.formatSettings in the payload of the API call. For more information, see Create an ingestion job by API and the Jobs API documentation.
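As a rough illustration of where these settings sit in a request body, the following Python sketch assembles a source.formatSettings fragment. Only source.formatSettings and listDelimiter are named in this topic. The other keys (format, delimiter, skipHeaderRows, columns), their values, and the helper name build_format_settings are assumptions made for illustration, so confirm them against the Jobs API documentation before use.

```python
import json


def build_format_settings(delimiter=None, skip_header_rows=0,
                          columns=None, list_delimiter=None):
    """Build a hypothetical source.formatSettings fragment.

    Every key except listDelimiter is an assumed name, not a confirmed API field.
    """
    # Assumed convention: "csv" by default, "delimiter" when a custom one is given.
    settings = {"format": "csv" if delimiter is None else "delimiter"}
    if delimiter is not None:
        settings["delimiter"] = delimiter            # assumed key
    if skip_header_rows:
        settings["skipHeaderRows"] = skip_header_rows  # assumed key
    if columns:
        settings["columns"] = list(columns)          # assumed key
    if list_delimiter is not None:
        settings["listDelimiter"] = list_delimiter   # named in this topic
    return settings


payload_fragment = {
    "source": {
        "formatSettings": build_format_settings(delimiter=";", list_delimiter=",")
    }
}
print(json.dumps(payload_fragment, indent=2))
```

The fragment is only the format-related part of a job request. A real request needs the remaining job fields described in the API documentation.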

Custom delimiter

If you select Other delimiter, fill in the Delimiter field with the characters used to specify the boundary between data values. Do not confuse this field with Multi-value delimiter for multi-value dimensions.


A custom delimiter of , is equivalent to CSV format, and a value of \t is equivalent to TSV format. You can also supply Unicode escape sequences such as \u0001.

When you provide a custom delimiter, Polaris uses it to parse each row. If the rows do not all contain the same number of delimiters, Polaris identifies columns based on the row with the fewest delimiters.
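The column-count rule can be sketched in plain Python. This illustrates the behavior as described here and is not Polaris's implementation. The function name infer_column_count and the sample rows are hypothetical, and the sketch only reports the column count because this topic doesn't state what happens to extra values in longer rows.

```python
def infer_column_count(rows, delimiter):
    """Return the column count implied by the row with the fewest delimiters."""
    if not rows:
        return 0
    fewest = min(row.count(delimiter) for row in rows)
    # N delimiters separate N + 1 values.
    return fewest + 1


# "\u0001" is a Unicode escape for the Start of Heading control character.
rows = [
    "2022-06-13T10:10:35Z\u0001Bike\u0001Sports",
    "2022-06-13T21:37:06Z\u0001Mouse",  # one delimiter fewer than the row above
]
column_count = infer_column_count(rows, "\u0001")
print(column_count)
```

Running a check like this on a sample of your file before ingestion can reveal short rows that would otherwise reduce the number of detected columns.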

Multi-value dimensions

The Multi-value delimiter field controls how Polaris parses multi-value dimensions. If you use the API, specify the multi-value delimiter in source.formatSettings.listDelimiter of the ingestion job request body.

Examples

This section shows examples of ingesting data in various delimited formats.

Basic CSV format

The following example shows a simple CSV format, without any custom format settings:

CSV example

The example uses the following data:

time,product,department
2022-06-13T10:10:35Z,Bike,Sports
2022-06-13T21:37:06Z,Mouse,Computers
2022-06-13T07:52:29Z,Bike,Sports
2022-06-13T13:57:38Z,Sausages,Grocery
2022-06-14T10:32:08Z,Keyboard,Computers
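To preview locally how a header row maps values to columns, you can read a few rows of the sample with Python's standard csv module. This is a local sanity check under the assumption that the first row holds the column headers. It doesn't call Polaris.

```python
import csv
import io
from datetime import datetime

# First rows of the sample data, embedded so the sketch is self-contained.
data = """time,product,department
2022-06-13T10:10:35Z,Bike,Sports
2022-06-13T21:37:06Z,Mouse,Computers
2022-06-13T07:52:29Z,Bike,Sports
"""

# DictReader takes the column names from the first row.
rows = list(csv.DictReader(io.StringIO(data)))

for row in rows:
    # The trailing "Z" (UTC) is matched as a literal character here.
    ts = datetime.strptime(row["time"], "%Y-%m-%dT%H:%M:%SZ")
    print(ts.isoformat(), row["product"], row["department"])
```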

New column names

The following example discards the header row from the data and assigns new column names.

New columns example

The example uses the same data as the preceding example.
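The effect of discarding the header row and supplying your own names can be mimicked locally as follows. The replacement names timestamp, item, and category are hypothetical, as is the skip_header_rows variable. The uniqueness check mirrors the rule that column headers can't share a name.

```python
import csv
import io

data = """time,product,department
2022-06-13T10:10:35Z,Bike,Sports
2022-06-13T21:37:06Z,Mouse,Computers
"""

new_columns = ["timestamp", "item", "category"]  # hypothetical replacement names
skip_header_rows = 1

# Column names must be unique.
if len(set(new_columns)) != len(new_columns):
    raise ValueError("column names must be unique")

lines = io.StringIO(data)
for _ in range(skip_header_rows):
    next(lines)  # discard the original header row

# With explicit fieldnames, DictReader treats every remaining row as data.
renamed = list(csv.DictReader(lines, fieldnames=new_columns))
print(renamed[0])
```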

Custom delimiter

The following example applies a custom delimiter of ;.

Custom delimiter example

The example uses the following data:

time;product;department
2022-06-13T10:10:35Z;Bike;Sports
2022-06-13T21:37:06Z;Mouse;Computers
2022-06-13T07:52:29Z;Bike;Sports
2022-06-13T13:57:38Z;Sausages;Grocery
2022-06-14T10:32:08Z;Keyboard;Computers

Multi-value dimensions

The following example applies a custom delimiter of ; to separate columns and a multi-value delimiter of , to separate the values within a multi-value dimension.

MVD example

The example uses the following data:

time;product;department
2022-06-13T10:10:35Z;Bike;Sports,Transportation,Outdoor
2022-06-13T21:37:06Z;Mouse;Computers,Electronics
2022-06-13T07:52:29Z;Bike;Sports,Transportation,Outdoor
2022-06-13T13:57:38Z;Sausages;Grocery,Deli
2022-06-14T10:32:08Z;Keyboard;Computers,Electronics
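The two-level split can be sketched locally with Python's csv module: the column delimiter separates fields, and the multi-value delimiter then splits one field into a list. Treating only department as multi-value is an assumption made for this sketch, and the constant names are hypothetical.

```python
import csv
import io

# A few rows of the sample data, embedded so the sketch is self-contained.
data = """time;product;department
2022-06-13T10:10:35Z;Bike;Sports,Transportation,Outdoor
2022-06-13T21:37:06Z;Mouse;Computers,Electronics
2022-06-13T13:57:38Z;Sausages;Grocery,Deli
"""

COLUMN_DELIMITER = ";"
MULTI_VALUE_DELIMITER = ","

parsed = []
for row in csv.DictReader(io.StringIO(data), delimiter=COLUMN_DELIMITER):
    # Split the multi-value column into a list of individual values.
    row["department"] = row["department"].split(MULTI_VALUE_DELIMITER)
    parsed.append(row)

for row in parsed:
    print(row["product"], row["department"])
```

Because the column delimiter and the multi-value delimiter differ, a comma inside the department field is never mistaken for a column boundary.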

Learn more

For more information about ingestion jobs, see Create an ingestion job.

For other supported formats, see Supported data and file formats.