Create a schema
Imply Polaris stores data in tables. A table's schema determines how the data is organized. It defines the table's columns, data types, and other metadata about the table.
There are several ways to create a schema in Polaris:
- You can use the UI to create a schema manually by adding and removing columns, as described in Define a schema for an empty table. This method is best suited for cases when you know exactly what your schema should look like and want to define it before streaming any data.
- You can load data and infer a table's schema simultaneously using the schema auto-detection feature available for batch ingestion. Polaris supports schema auto-detection for newline-delimited JSON files up to 10 GB. When you load data with batch ingestion, Polaris scans the first 1,000 entries to create a sample table, then infers a name and a data type for each column from the detected values. This method is best suited for cases when you do not have a predefined schema and want to get started quickly. For additional information, see the quickstart guide.
- You can use the Tables API to create tables and manage schema programmatically. This method is best suited for automated workflows.
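To make the auto-detection idea concrete, the following Python sketch infers a column type from a sample of newline-delimited JSON records. It is an illustration under stated assumptions, not the Polaris implementation: the mapping rules (all integers become long, a mix of integers and decimals becomes double, anything else becomes string) and the function names infer_type and infer_schema are hypothetical.

```python
import json
from typing import Iterable

# The text above says Polaris samples the first 1000 entries; reused here as a default.
SAMPLE_SIZE = 1000


def infer_type(values: list) -> str:
    """Map the JSON values seen for one field to a column type (hypothetical rules)."""
    non_null = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so exclude it explicitly.
    numbers = [v for v in non_null if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if non_null and len(numbers) == len(non_null):
        if all(isinstance(v, int) for v in numbers):
            return "long"
        return "double"
    return "string"


def infer_schema(lines: Iterable[str], sample_size: int = SAMPLE_SIZE) -> dict:
    """Infer {column name: type} from the first `sample_size` NDJSON records."""
    observed: dict = {}
    count = 0
    for line in lines:
        if count >= sample_size:
            break
        if not line.strip():
            continue  # blank lines are not records
        record = json.loads(line)
        count += 1
        for key, value in record.items():
            observed.setdefault(key, []).append(value)
    return {name: infer_type(vals) for name, vals in observed.items()}


sample = [
    '{"city": "Paris", "visits": 3, "score": 1.5}',
    '{"city": "Oslo", "visits": 7, "score": 2}',
]
print(infer_schema(sample))  # {'city': 'string', 'visits': 'long', 'score': 'double'}
```

Because inference only sees the sample, a field whose later values differ from the first entries can end up with a type that does not fit the whole file; that is one reason to review an inferred schema before relying on it.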
This topic explains how to create a schema manually.
Prerequisites
The ManageTables role is required to create and edit a schema in Polaris.
Data types
When you define a schema, you must specify a name and a data type for each column.
Polaris supports the following data types for a table column:
- string: UTF-8 encoded text
- long: a 64-bit integer
- float: a 32-bit floating-point number
- double: a 64-bit floating-point number
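The practical difference between float and double is precision. The following Python snippet is a generic IEEE 754 illustration, not Polaris code. It assumes long is a signed 64-bit integer, as in Java, and uses the standard struct module to round-trip a value through 32-bit and 64-bit storage.

```python
import struct

# Range of a signed 64-bit integer (assumed representation of long).
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


def fits_long(value: int) -> bool:
    """True if the integer fits in a signed 64-bit long column."""
    return LONG_MIN <= value <= LONG_MAX


def as_float32(value: float) -> float:
    """Round-trip a value through 32-bit IEEE 754 storage (float)."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def as_float64(value: float) -> float:
    """Round-trip a value through 64-bit IEEE 754 storage (double)."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


print(fits_long(2**63 - 1))  # True
print(fits_long(2**63))      # False
print(as_float32(0.1))       # 0.10000000149011612
print(as_float64(0.1))       # 0.1
```

Because 0.1 has no exact binary representation, a 32-bit float keeps only about seven significant decimal digits of it, while a 64-bit double keeps about fifteen to sixteen.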
The following restrictions apply to column names:
- Names must be unique and non-empty when trimmed of leading and trailing spaces.
- Names starting with two underscores, such as __time, are reserved for internal use.
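You can check candidate names against these rules before you enter them. The following Python function is a hypothetical helper, not part of Polaris. It assumes that the reserved-prefix and uniqueness checks apply to the name after trimming leading and trailing spaces.

```python
def validate_column_names(names: list[str]) -> list[str]:
    """Return a list of rule violations; an empty list means every name passes."""
    problems = []
    seen = set()
    for raw in names:
        name = raw.strip(" ")  # the rules apply after trimming leading/trailing spaces
        if not name:
            problems.append(f"{raw!r}: name is empty after trimming")
            continue
        if name.startswith("__"):
            problems.append(f"{raw!r}: names starting with '__' are reserved")
        if name in seen:
            problems.append(f"{raw!r}: duplicate name {name!r}")
        seen.add(name)
    return problems


for problem in validate_column_names(["city", " city ", "__time", "   "]):
    print(problem)
```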
Timestamp
Every schema has a timestamp column by default. Polaris uses the timestamp to partition and sort data, and to perform time-based data management operations, such as dropping time chunks.
When you load data into a table, you can select the source field to map to the timestamp. If your dataset does not contain a timestamp, you can set a default timestamp value in your schema. If your dataset has more than one timestamp, you can ingest the additional ones as secondary timestamps. Regardless of the source field, Polaris always stores the primary timestamp in the __time column of your table.
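The following Python sketch illustrates how the primary timestamp for a row might be chosen: use the mapped source field if the row has it, otherwise fall back to the schema default. It is not Polaris code. The field names (order_time, ship_time), the ISO 8601 input format, treating naive values as UTC, and the daily UTC chunk computed by day_chunk are all assumptions chosen for the example.

```python
from datetime import datetime, timezone
from typing import Optional


def primary_timestamp(row: dict, source_field: Optional[str], default: datetime) -> datetime:
    """Return the value destined for __time: the mapped field if present, else the default."""
    raw = row.get(source_field) if source_field else None
    if raw is None:
        return default
    ts = datetime.fromisoformat(raw)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return ts.astimezone(timezone.utc)


def day_chunk(ts: datetime) -> str:
    """Hypothetical daily time chunk (UTC) that a timestamp falls into."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


default_ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
row = {"order_time": "2024-03-06T01:30:00+02:00", "ship_time": "2024-03-07T08:00:00+00:00"}
ts = primary_timestamp(row, "order_time", default_ts)  # ship_time would be a secondary timestamp
print(ts.isoformat(), day_chunk(ts))  # 2024-03-05T23:30:00+00:00 2024-03-05
```

Note that the local date in the source value (March 6) differs from the UTC chunk (March 5); normalizing to one time zone before partitioning keeps chunk boundaries consistent.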
Schema dimensions
Schema dimensions are data columns that contain qualitative information. You can group by, filter, or apply aggregators to dimensions at query time.
Schema measures
Schema measures are numeric (quantitative) data fields derived from the original data source. A schema measure stores the output of an aggregation function, such as a sum or a count, computed over all input rows that share the same timestamp and the same values for every schema dimension.
Schema measures are available when you enable rollup.
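Rollup can be pictured as a group-by performed at ingestion time. The following Python sketch, which is not Polaris code, groups rows by the timestamp and every dimension and computes each measure once per group. It assumes the timestamp has already been truncated to the rollup granularity; the function name rollup and the subset of aggregation functions are hypothetical.

```python
from collections import defaultdict

# Illustrative subset of aggregation functions; the set Polaris offers may differ.
AGGREGATORS = {"sum": sum, "min": min, "max": max, "count": len}


def rollup(rows: list, group_by: list, measures: dict) -> list:
    """Combine rows that share every group_by value, aggregating each measure.

    measures maps a measure name to (aggregation, base column).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in group_by)].append(row)

    result = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        for name, (aggregation, base_column) in measures.items():
            out[name] = AGGREGATORS[aggregation]([m[base_column] for m in members])
        result.append(out)
    return result


rows = [
    {"__time": "2024-03-05T10:00", "city": "Paris", "bytes": 100},
    {"__time": "2024-03-05T10:00", "city": "Paris", "bytes": 50},
    {"__time": "2024-03-05T10:00", "city": "Oslo", "bytes": 70},
]
measures = {
    "total_bytes": ("sum", "bytes"),
    "max_bytes": ("max", "bytes"),  # several measures can share one base column
    "row_count": ("count", "bytes"),
}
for r in rollup(rows, ["__time", "city"], measures):
    print(r)
```

Three input rows become two stored rows. The raw per-row bytes values are no longer available afterward, which is the trade-off rollup makes for smaller tables.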
Dimensions and measures in the schema have a one-to-many (1:N) relationship with the dimensions and measures in data cubes. A data cube can model additional dimensions and measures using expressions, and can also omit schema dimensions and measures as needed.
Define a schema for an empty table
You can create an empty table with a schema definition before loading any data into Polaris.
To define a schema for an empty table, follow these steps:
- Click Tables from the left navigation menu.
- Click Create table.
- Enter a unique name for your table and click Continue.
- Click Edit schema. Polaris displays an empty table.
- Optionally, toggle the Rollup switch to ON to enable rollup. Polaris splits the table view to show dimensions on the left and measures on the right.
- In the Dimensions side of the Edit schema split view, click Add.
- Enter the dimension information: Base column (the source column for your data), Name, and Data type.
- Click Save.
- If you enabled rollup, click Add in the Measures side of the Edit schema split view.
- Enter the measure information: Name, Data type, Aggregation function, and Base column. You can map multiple measures to the same base column. The base column you select determines which aggregation functions are available, and the aggregation function you choose determines which data types are valid for the measure.
- Click Save.
When you are done editing your schema, click Save schema. Your table is now ready for ingestion.
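The fields you enter in these steps can be modeled as a small data structure. The following Python sketch mirrors the UI fields (Base column, Name, Data type, Aggregation function) and checks a few rules stated in this topic: measures require rollup, column names must be unique, and each column uses one of the listed data types. The class names are hypothetical, and this is not the Tables API payload format.

```python
from dataclasses import dataclass, field

DATA_TYPES = {"string", "long", "float", "double"}  # types listed in this topic


@dataclass
class Dimension:
    base_column: str  # source column for the data
    name: str
    data_type: str


@dataclass
class Measure:
    name: str
    data_type: str
    aggregation: str  # for example "sum"; valid choices depend on the base column
    base_column: str


@dataclass
class TableSchema:
    rollup: bool
    dimensions: list[Dimension] = field(default_factory=list)
    measures: list[Measure] = field(default_factory=list)

    def check(self) -> list[str]:
        """Return rule violations; an empty list means the schema passes these checks."""
        problems = []
        if self.measures and not self.rollup:
            problems.append("measures require rollup to be enabled")
        names = [d.name for d in self.dimensions] + [m.name for m in self.measures]
        if len(names) != len(set(names)):
            problems.append("column names must be unique")
        for col in [*self.dimensions, *self.measures]:
            if col.data_type not in DATA_TYPES:
                problems.append(f"{col.name}: unsupported data type {col.data_type!r}")
        return problems


schema = TableSchema(
    rollup=True,
    dimensions=[Dimension("city", "city", "string")],
    measures=[
        Measure("total_bytes", "long", "sum", "bytes"),
        Measure("max_bytes", "long", "max", "bytes"),  # same base column, two measures
    ],
)
print(schema.check())  # []
```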
Table status reference
Polaris displays the current ingestion status for the table in the Status field of the Tables page.
Depending on the data ingestion stage, the table's status can be one of the following:
- Setup incomplete: The table's schema needs to be configured to proceed.
- Ready for ingestion: The table's schema is configured and the table can accept data.
- Ingesting: Polaris is in the process of ingesting data into the table.
- Ingested: Data has been ingested and is ready for querying.
- Deleting: Polaris is in the process of deleting the table.
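If you track table status in a script, for example after reading the Status field from the Tables page, the labels above can be modeled as an enumeration. The following Python sketch is hypothetical: it does not call any Polaris API, and the helper names ready_for_querying and needs_schema are illustrative.

```python
from enum import Enum


class TableStatus(Enum):
    """Status labels shown in the Status field of the Tables page."""
    SETUP_INCOMPLETE = "Setup incomplete"
    READY_FOR_INGESTION = "Ready for ingestion"
    INGESTING = "Ingesting"
    INGESTED = "Ingested"
    DELETING = "Deleting"


def ready_for_querying(status: TableStatus) -> bool:
    """Per the descriptions above, Ingested means data is ready for querying."""
    return status is TableStatus.INGESTED


def needs_schema(status: TableStatus) -> bool:
    """Setup incomplete means the schema must be configured before you can proceed."""
    return status is TableStatus.SETUP_INCOMPLETE


status = TableStatus("Ingested")  # look up a member by its displayed label
print(status.name, ready_for_querying(status))  # INGESTED True
```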