Data overview
Like many database systems, Imply Polaris stores your data in tables. Behind the scenes, the tables are in the columnar format used by Apache Druid.
This topic provides a general overview of the data lifecycle for both the Polaris UI and the API. For more information on how to use the API, see Polaris API overview.
Supported input data formats
Before you build tables and load data, verify that your source data conforms to the supported data and file formats.
Create a schema
A schema defines the columns and other metadata about your tables. Based on the schema mode of your table, the table's schema may contain declared columns, which are explicitly defined by the user, or automatically discovered columns, which Polaris detects during ingestion.
You can create a schema:
- Manually with the UI. See Table schema.
- Using schema detection when you upload a file. Follow the Quickstart.
- Using the API. See Create a table by API, and the sketch following this list.
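For example, the following Python sketch creates a table with declared columns through the Tables API. The base URL, project ID, and request fields shown here are illustrative assumptions; see Create a table by API for the exact endpoint and request format.

```python
import os

import requests

# Illustrative values: substitute your organization's API base URL,
# project ID, and an API key with permission to manage tables.
BASE_URL = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID"
HEADERS = {"Authorization": f"Bearer {os.environ['POLARIS_API_KEY']}"}

# A table with explicitly declared columns. The schema field names
# below are assumptions; verify them against the API reference.
table = {
    "type": "detail",
    "name": "example_events",
    "schema": [
        {"name": "__time", "dataType": "timestamp"},
        {"name": "user_id", "dataType": "string"},
        {"name": "duration_ms", "dataType": "long"},
    ],
}

response = requests.post(f"{BASE_URL}/tables", headers=HEADERS, json=table)
response.raise_for_status()
print(response.json())
```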
Polaris offers the following schema optimizations:
- Rollup, a form of pre-aggregation that reduces the size of stored data and improves query performance.
- Partitioning, a way to organize your data to improve query performance based on the columns you frequently use in filters.
Additionally, you can use data sketch columns for approximation.
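As a sketch of how these optimizations might appear in a table definition, the payload below declares an aggregate (rollup) table with time partitioning, a clustering column, and a sketch column. The field names here (`partitioningGranularity`, `clusteringColumns`, the sketch data type) are assumptions for illustration only; consult Table schema for the supported options.

```python
import os

import requests

BASE_URL = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID"
HEADERS = {"Authorization": f"Bearer {os.environ['POLARIS_API_KEY']}"}

# Field names below are illustrative assumptions, not the authoritative schema.
aggregate_table = {
    "type": "aggregate",                # aggregate tables roll up rows during ingestion
    "name": "example_events_daily",
    "partitioningGranularity": "day",   # time partitioning for interval-filtered queries
    "clusteringColumns": ["country"],   # cluster on a column you frequently filter by
    "schema": [
        {"name": "__time", "dataType": "timestamp"},
        {"name": "country", "dataType": "string"},
        {"name": "event_count", "dataType": "long"},
        {"name": "unique_users", "dataType": "thetaSketch"},  # approximate distinct counts
    ],
}

response = requests.post(f"{BASE_URL}/tables", headers=HEADERS, json=aggregate_table)
response.raise_for_status()
```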
Ingest data
You can ingest data into Polaris from a variety of sources, such as uploaded files, Amazon S3 buckets, and Kafka topics. Polaris automatically scales the number of ingestion tasks within your project to maintain optimal performance. For details on the available ingestion sources, see Ingestion sources overview.
To ingest data into Polaris, create the following components:
- A table with a defined schema to receive the ingested data.
- A connection to define the source of the data. Connections are not required for ingesting from uploaded files or existing tables.
- An ingestion job to bring in data from the connection.
The ingestion job loads data into a table from the specified source data, whether from a connection or existing data in Polaris. In the ingestion job specification, you also define how the input data maps to the table schema and any transformations to apply. While your data is loading, you can monitor the status of your ingestion job on the Jobs tab in the UI or using the Jobs API.
You can create tables, connections, and ingestion jobs using the UI or the API.
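As an illustration of the API flow, the sketch below creates an S3 connection and a batch ingestion job, then polls the Jobs API until the job finishes. The endpoint paths, job fields (`target`, `source`, `mappings`), and status values are assumptions based on the flow described above; see the Jobs API documentation for the exact request formats.

```python
import os
import time

import requests

BASE_URL = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID"
HEADERS = {"Authorization": f"Bearer {os.environ['POLARIS_API_KEY']}"}

# 1. A connection describing the data source. The S3 fields here are
#    illustrative; uploaded files and existing tables need no connection.
connection = {
    "type": "s3",
    "name": "example-s3-connection",
    "bucket": "example-bucket",
    "prefix": "events/",
}
requests.post(f"{BASE_URL}/connections", headers=HEADERS, json=connection).raise_for_status()

# 2. A batch ingestion job that maps input fields to table columns.
#    The mapping and format fields are assumptions for illustration.
job = {
    "type": "batch",
    "target": {"type": "table", "tableName": "example_events"},
    "source": {
        "type": "connection",
        "connectionName": "example-s3-connection",
        "formatSettings": {"format": "nd-json"},
    },
    "mappings": [
        {"columnName": "__time", "expression": 'TIME_PARSE("timestamp")'},
        {"columnName": "user_id", "expression": '"user_id"'},
    ],
}
response = requests.post(f"{BASE_URL}/jobs", headers=HEADERS, json=job)
response.raise_for_status()
job_id = response.json()["id"]

# 3. Monitor the job until it reaches a terminal state.
while True:
    status = requests.get(f"{BASE_URL}/jobs/{job_id}", headers=HEADERS).json()
    if status.get("status") in ("SUCCESSFUL", "FAILED", "CANCELED"):
        print("Job finished with status:", status.get("status"))
        break
    time.sleep(10)
```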
Work with existing tables and data
After you create a table, you can:
- Add batch data in the UI or using the API.
- Replace existing data for specific time intervals.
- Drop data in the UI or use the Jobs API to create a `delete_data` job. When you drop data, you have the option to drop all data or the data for a time interval.
- Delete the table in the UI or use the Jobs API to create a `drop_table` job. A sketch of both job types follows this list.
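The sketch below submits both job types through the Jobs API. The base URL and headers follow the earlier sketches, and the exact request fields, such as how the time interval is expressed, are assumptions to verify against the Jobs API reference.

```python
import os

import requests

BASE_URL = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID"
HEADERS = {"Authorization": f"Bearer {os.environ['POLARIS_API_KEY']}"}

# Drop only the data in a time interval; omitting the interval field
# (an assumption here) would drop all data in the table.
delete_job = {
    "type": "delete_data",
    "table": "example_events",
    "interval": "2023-01-01T00:00:00Z/2023-02-01T00:00:00Z",
}
requests.post(f"{BASE_URL}/jobs", headers=HEADERS, json=delete_job).raise_for_status()

# Delete the table itself.
drop_job = {"type": "drop_table", "table": "example_events"}
requests.post(f"{BASE_URL}/jobs", headers=HEADERS, json=drop_job).raise_for_status()
```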
To access table operations within the Tables list, click the ... menu button at the far left of the table.
Known limitations
We regularly update Polaris with new features and fixes. If you run into an issue, check Known limitations.
Learn more
See the following topics for more information:
- Quickstart for a tutorial on how to upload data using batch ingestion.
- Ingest from files for strategies and concepts for batch ingestion.
- Load event data for information and examples on how to load event stream data.