Ingestion shortcuts

The ingestion process in Imply Polaris requires a source of data, a destination table, and an ingestion job that links the two. To simplify ingestion, Polaris can perform the following tasks for you when you start an ingestion job:

  • Create a table to ingest data into.
  • Determine the appropriate schema to use in your table.
  • Detect the schema of your input data (streaming ingestion only).

Automatic table creation

If the destination table doesn't exist when you start an ingestion job, Polaris creates it for you. All you have to supply is a name for the table. Note that you can't change the table name later.

Polaris determines the appropriate attributes for the table based on details from the job specification. For example, if you define a transformation that aggregates your data, Polaris creates an aggregate table. Otherwise, it creates a detail table.
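The rule above can be sketched in a few lines of Python. This is only an illustration of the decision, not how Polaris implements it: the mapping shape (`columnName`, `expression`), the list of aggregate functions, and the `choose_table_type` helper are all assumptions made for the example.

```python
# Hypothetical job-spec fragment: each mapping pairs an output column with an
# expression string. These names are assumptions, not the Polaris API.
AGGREGATE_PREFIXES = ("SUM(", "COUNT(", "MIN(", "MAX(")


def choose_table_type(mappings):
    """Return "aggregate" if any mapping aggregates data, otherwise "detail"."""
    for mapping in mappings:
        expression = mapping["expression"].strip().upper()
        # str.startswith accepts a tuple and matches any of its entries.
        if expression.startswith(AGGREGATE_PREFIXES):
            return "aggregate"
    return "detail"


aggregating_job = [{"columnName": "total_price", "expression": 'SUM("price")'}]
plain_job = [{"columnName": "id", "expression": '"id"'}]
```

With these inputs, `choose_table_type(aggregating_job)` yields an aggregate table and `choose_table_type(plain_job)` yields a detail table.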

With an automatically created table, there's no need to define a schema for the table in advance. Polaris determines the table schema based on the data itself. For example, it might create a column named id with the string data type and a column named price with the long data type. To read more about automatic determination of your table schema, see Schema detection for tables.
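To make the idea of deriving a schema from the data concrete, here is a minimal sketch that infers a column type from sample values. The inference rules, the `double` type name, and the helper names are assumptions for illustration. The actual detection rules Polaris applies are not described here.

```python
def infer_type(value):
    """Map a Python value to an assumed column data type name."""
    # bool is a subclass of int in Python, so handle it before the int check.
    if isinstance(value, bool):
        return "string"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(records):
    """Build {column name: data type} from the first non-null value seen."""
    schema = {}
    for record in records:
        for name, value in record.items():
            if value is None:
                continue  # a null value says nothing about the type
            schema.setdefault(name, infer_type(value))
    return schema


sample = [{"id": "a1", "price": 12}, {"id": "b2", "price": 30}]
```

For the sample records, `infer_schema(sample)` maps `id` to `string` and `price` to `long`, matching the example in the text.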

To read more about tables and automatically created tables, see Introduction to tables.

Schema detection for tables

Instead of explicitly defining the column names and data types for all data ingested into a table, Polaris can determine the table schema for you. To enable this, select the flexible schema mode when you create a table. If Polaris creates a table for you, it uses the flexible mode by default.

The schema mode determines how Polaris enforces a table's schema, whether the schema is fixed and predetermined by the user (strict mode), or variable and detected at ingestion time (flexible mode).

When you create a flexible mode table, Polaris adds columns to the table based on the columns it identifies in the data. You can also declare a column in a flexible table if you want to ensure that the column has a particular data type. In a strict table, you declare all of the columns of the table.
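The difference between the two modes can be sketched as a small function that resolves the final set of columns. This is a simplified model under stated assumptions: the `resolve_columns` helper and the dictionary representation of columns are hypothetical, and the sketch ignores everything except column names and types.

```python
def resolve_columns(mode, declared, discovered):
    """Return the table's columns as {name: data type} for a schema mode.

    declared:   columns the user defined on the table
    discovered: columns identified in the ingested data
    """
    if mode == "strict":
        # Only user-declared columns exist in a strict table.
        return dict(declared)
    if mode == "flexible":
        # Discovered columns are added, but a declared type takes precedence.
        columns = dict(discovered)
        columns.update(declared)
        return columns
    raise ValueError(f"unknown schema mode: {mode!r}")


declared = {"price": "double"}
discovered = {"id": "string", "price": "long"}
```

Here the flexible table ends up with both `id` and `price`, with `price` keeping its declared `double` type, while the strict table contains only the declared `price` column.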

The flexible schema enforcement mode supports both batch and streaming ingestion.

To compare the use cases for flexible and strict tables, see Flexible table.

Schema auto-discovery of input data

To correctly ingest your data, Polaris needs to understand the schema of the input data used as the source of your ingestion job. When you ingest data using the UI, Polaris samples the data and attempts to determine its schema. You can correct the column names and interpreted data types in the Parse data stage of ingestion.

When you ingest data using the API, you must supply the input schema as an array of field names and data types. Polaris requires this information for ingestion jobs that take a batch input source, such as Amazon S3 or uploaded files. You don't need to specify the schema when your input source is an existing Polaris table, or when you've enabled schema auto-discovery for a streaming ingestion job.
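As a rough illustration, an input schema expressed as an array of field names and data types might look like the list below, and the conditions in the preceding paragraph can be captured in a small helper. The key names `name` and `dataType`, the source labels, and the `requires_input_schema` function are assumptions for this sketch. Check the API reference for the exact request format.

```python
# Hypothetical input schema: an array of field names and data types.
input_schema = [
    {"name": "timestamp", "dataType": "string"},
    {"name": "id", "dataType": "string"},
    {"name": "price", "dataType": "long"},
]


def requires_input_schema(source_kind, auto_discovery=False):
    """Decide whether a job must supply an input schema.

    source_kind is an illustrative label: "s3", "uploaded", "table",
    or "streaming".
    """
    if source_kind == "table":
        return False  # an existing Polaris table already has a schema
    if source_kind == "streaming" and auto_discovery:
        return False  # schema auto-discovery detects the fields
    return True
```

Under these assumptions, batch sources such as `"s3"` or `"uploaded"` always need `input_schema`, while a table source or a streaming source with auto-discovery enabled does not.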

Schema auto-discovery allows you to skip providing the input schema when your ingestion source comes from streaming data and your table has the flexible schema mode. If Polaris creates a table for you, it uses the flexible mode by default.

For example, you might have a long-running streaming job that sends data with a particular schema, and the structure of that data later changes to include new fields. With schema auto-discovery, Polaris automatically detects these new fields and ingests them into the table. If records have missing fields, Polaris fills them with null values in the table.
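The behavior described above can be simulated in plain Python. This sketch only models the outcome (new fields become columns, missing fields become nulls). The `ingest` function and the sample events are hypothetical and say nothing about how Polaris processes a stream internally.

```python
def ingest(records):
    """Return (columns, rows) where every row has a value for every column."""
    columns = []
    for record in records:
        for name in record:
            if name not in columns:
                columns.append(name)  # a newly seen field becomes a column
    # dict.get returns None for a missing field, standing in for null.
    rows = [{name: record.get(name) for name in columns} for record in records]
    return columns, rows


events = [
    {"id": "a1", "price": 12},
    {"id": "b2", "price": 30, "region": "eu"},  # later event adds a field
]
columns, rows = ingest(events)
```

After this runs, `columns` includes `region`, and the first row holds `None` for `region` because the earlier event did not carry that field.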

Schema auto-discovery ingests fields without changing their names, data types, or values. To apply any of these transformations, such as an arithmetic operation on a field, you can optionally provide input expressions. If you do, you must also define the source fields in the input schema.

Learn more about schema auto-discovery.