Data ingestion is the process of loading data into tables in Imply Polaris. You can load data from a variety of batch and streaming sources, and Polaris supports a range of data formats. After you ingest data, you can then query and visualize the data stored in Polaris tables.
If you want to skip the details and jump straight into ingestion, try one of the following guides:
You need the following to ingest data into Polaris:
- A source of data
- A table to store the data
- An ingestion job
A typical workflow to ingest data into Polaris looks like this:
This section discusses each of the stages in more detail. Polaris also offers features that simplify this process, such as automatic table creation. For more details, see Ingestion shortcuts.
Specify a data source
To specify your data source, you either upload files to the Polaris staging area, or you create a connection to one of the ingestion sources, such as Amazon S3 or Confluent Cloud. You also create a connection when your source data comes from an application that pushes event data to Polaris.
If you plan to ingest from one table to another, all you need is the name of the source table.
The following topics provide more information on specifying a data source:
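As an illustrative sketch, a connection definition can be thought of as a small JSON document that names the source and tells Polaris where to read from. The field names below (`type`, `name`, `bucket`, `prefix`) are assumptions for illustration, not the exact Polaris connection API contract.

```python
import json

def build_s3_connection(name: str, bucket: str, prefix: str) -> dict:
    """Assemble an example connection definition for an Amazon S3 source.

    Hypothetical payload shape: the field names here are illustrative only.
    """
    return {
        "type": "s3",       # ingestion source type (assumed field name)
        "name": name,       # connection name that ingestion jobs reference later
        "bucket": bucket,   # the S3 bucket holding the source data
        "prefix": prefix,   # limit the connection to objects under this prefix
    }

connection = build_s3_connection("my-s3-source", "example-bucket", "events/2024/")
print(json.dumps(connection, indent=2))
```

The key idea is that the connection is created once and then referenced by name from any number of ingestion jobs, so credentials and source details stay in one place.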
Create a table
Tables store the data you ingest into Polaris. You can create a table before you start an ingestion job when you want to ensure a particular table type, table mode, or schema. Otherwise, if you provide a new table name in your ingestion job, Polaris creates a table with the appropriate attributes for you.
In either case, you can update your table after creation. For example, you can declare certain columns in the table schema or add a storage policy to manage data retention. You can't change the table name or table type.
The following topics provide more information on creating and updating tables:
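To make the table attributes above concrete, here is a sketch of a table definition with a partially declared schema. Polaris tables do have a type and named, typed columns, but the exact field names below are assumptions for illustration, not the precise table API.

```python
import json

def build_table(name: str) -> dict:
    """Assemble an example table definition with a partial declared schema.

    Illustrative payload only; field names are assumed, not the exact API.
    """
    return {
        "name": name,       # table name; fixed after creation
        "type": "detail",   # table type; also fixed after creation
        "schema": [
            # Declare only the columns you want to pin down up front;
            # the rest can be added or updated later.
            {"name": "__time", "dataType": "timestamp"},
            {"name": "user_id", "dataType": "string"},
        ],
    }

table = build_table("web_events")
print(json.dumps(table, indent=2))
```

Declaring a minimal schema up front, as sketched here, locks in the columns you care about while leaving the table free to evolve afterward.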
Start an ingestion job
The ingestion job in Polaris reads data from the specified source, optionally applies functions to transform the data, and loads the data into the destination table. In the ingestion job, you can apply filters on the data, for instance, to limit ingested data to a particular date range. You can also control some data management features to tune performance and efficiency, such as partitioning and rollup.
You can view the ingestion job spec, the recipe for a particular ingestion job, to track your ingestion history or to use it as the basis for defining future jobs.
The following topics provide more information on ingestion jobs:
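Pulling the pieces together, a batch ingestion job spec ties a source connection to a destination table and can carry an optional filter. The sketch below is hypothetical: the field names and the filter expression are illustrative assumptions, not the exact Polaris jobs API.

```python
import json

def build_batch_job(table_name: str, connection_name: str) -> dict:
    """Assemble an example batch ingestion job spec.

    Hypothetical shape: reads from a named connection, filters rows,
    and loads the result into the destination table.
    """
    return {
        "type": "batch",
        "source": {
            "type": "connection",
            "connectionName": connection_name,  # a previously created connection
        },
        # Optional filter to limit ingested rows, e.g. to a date range
        # (illustrative expression).
        "filterExpression": "\"__time\" >= TIMESTAMP '2024-01-01'",
        "target": {
            "type": "table",
            "tableName": table_name,  # destination table for the ingested data
        },
    }

job = build_batch_job("web_events", "my-s3-source")
print(json.dumps(job, indent=2))
```

Because the spec is a plain document like this, it doubles as a record of what was ingested and as a template you can copy and adjust for future jobs.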
See the following topics for more information: