Create a schema

Imply Polaris stores data in tables. A table's schema determines how the data is organized. It defines the table's columns, data types, and other metadata about the table.

There are several ways to create a schema in Polaris:

  • You can use the UI to create a schema manually by adding and removing columns, as described in Define a schema for an empty table. This method is best suited for cases when you know exactly what your schema should look like and want to define it before streaming any data.
  • You can load data and infer a table's schema in one step using the schema auto-detection feature available for batch ingestion. Polaris supports schema auto-detection for newline-delimited JSON files up to 10 GB. When you select batch ingestion to load data, Polaris scans the first 1000 entries to create a sample table, then infers a name and a data type for each column based on the detected values. This method is best suited for cases when you do not have a predefined schema and want to get started quickly. For additional information, see the quickstart guide.
  • You can use the Tables API to create tables and manage schema programmatically. This method is best suited for automated workflows.
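This page does not spell out the exact rules Polaris uses for schema auto-detection. The following Python sketch shows one plausible approach, under stated assumptions: scan up to the first 1000 records of a newline-delimited JSON file, assign `long` to integers and `double` to decimals, widen `long` to `double` when both appear in a column, and fall back to `string` for everything else. The names `infer_schema`, `_value_type`, and `_widen` and the widening rules are hypothetical, not Polaris's implementation, and the sketch does not model timestamp mapping.

```python
import json
from typing import Dict, Iterable, Optional

SAMPLE_SIZE = 1000  # Polaris scans the first 1000 entries


def _value_type(value) -> str:
    """Map a single JSON value to a candidate column type (hypothetical rules)."""
    if isinstance(value, bool):  # bool is a subclass of int; treat it as text
        return "string"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"  # strings, nested objects, and arrays


def _widen(current: Optional[str], new: str) -> str:
    """Return the narrowest type that can hold both observed types."""
    if current is None or current == new:
        return new
    if {current, new} == {"long", "double"}:
        return "double"
    return "string"


def infer_schema(lines: Iterable[str], sample_size: int = SAMPLE_SIZE) -> Dict[str, str]:
    """Infer a column name -> data type mapping from newline-delimited JSON."""
    types: Dict[str, Optional[str]] = {}
    seen = 0
    for line in lines:
        if seen >= sample_size:
            break
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        seen += 1
        for name, value in record.items():
            if value is None:
                types.setdefault(name, None)  # column exists, no type evidence yet
            else:
                types[name] = _widen(types.get(name), _value_type(value))
    # Columns that were only ever null default to string
    return {name: t or "string" for name, t in types.items()}
```

For example, `infer_schema(['{"user": "a", "bytes": 10}', '{"user": "b", "bytes": 2.5}'])` returns `{'user': 'string', 'bytes': 'double'}`, because the `bytes` column contains both an integer and a decimal.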

This topic explains how to create a schema manually.

Prerequisites

The ManageTables role is required to create and edit a schema in Polaris.

Data types

When you define a schema, you must specify a name and a data type for each column.

Polaris supports the following data types for a table column:

  • string: UTF-8 encoded text
  • long: a 64-bit integer
  • float: a 32-bit floating-point number
  • double: a 64-bit floating-point number
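To make the numeric types concrete, the following Python sketch checks whether an integer fits in a `long` column and shows how much precision a 32-bit `float` keeps compared with a 64-bit `double`. It assumes `long` is a signed 64-bit integer, as in Apache Druid; the helper names `fits_long` and `as_float32` are hypothetical.

```python
import struct

# Assumes a signed 64-bit range, as used by Apache Druid longs.
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


def fits_long(value: int) -> bool:
    """Return True if value can be stored in a long column."""
    return LONG_MIN <= value <= LONG_MAX


def as_float32(value: float) -> float:
    """Round-trip a Python float (64-bit) through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]
```

For example, `as_float32(0.1)` returns `0.10000000149011612`, which illustrates why `double` is the safer choice for values that need more than about seven significant digits.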

The following restrictions apply to column names:

  • Names must be unique and non-empty when trimmed of leading and trailing spaces.
  • Names starting with two underscores, such as __time, are reserved for internal use.
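The following Python sketch applies the two column-name restrictions above before you define a schema, for example in a script that prepares column names. The function name `validate_column_names` is hypothetical, and the sketch trims all leading and trailing whitespace, which is slightly broader than trimming spaces only.

```python
from typing import Iterable, List

RESERVED_PREFIX = "__"  # names such as __time are reserved for internal use


def validate_column_names(names: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means every name passes."""
    problems: List[str] = []
    seen = set()
    for raw in names:
        name = raw.strip()  # trim leading/trailing whitespace before checking
        if not name:
            problems.append(f"empty name: {raw!r}")
            continue
        if name.startswith(RESERVED_PREFIX):
            problems.append(f"reserved name: {name!r}")
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
    return problems
```

For example, `validate_column_names(["user", " user "])` reports a duplicate, because both names are `user` after trimming.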

Timestamp

Every schema has a timestamp column by default. Polaris uses the timestamp to partition and sort data, and to perform time-based data management operations, such as dropping time chunks.

When you load data into a table, you can select the source field to map to the timestamp value. If your dataset does not contain a timestamp, you can set a default value for the timestamp in your schema. If your dataset has more than one timestamp, you can ingest the additional timestamps as secondary timestamps. Regardless of the source field for the timestamp, Polaris always stores the primary timestamp in the __time column of your table.
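The following Python sketch mirrors the mapping rule described above: take the selected source field, fall back to a default when there is no usable value, and store the result in `__time`. The helper `map_timestamp` is hypothetical, and it assumes numeric timestamps are milliseconds since the Unix epoch and string timestamps are ISO 8601; Polaris may accept other formats. Other fields, including any secondary timestamps, pass through unchanged.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def map_timestamp(record: Dict[str, Any], source_field: Optional[str],
                  default: datetime) -> Dict[str, Any]:
    """Return a copy of record with the primary timestamp stored in __time.

    Hypothetical helper. Assumes numeric timestamps are milliseconds since
    the Unix epoch and string timestamps are ISO 8601.
    """
    row = dict(record)  # other fields, including secondary timestamps, pass through
    value = record.get(source_field) if source_field else None
    if value is None:
        ts = default  # no source field mapped, or the field is missing or null
    elif isinstance(value, (int, float)):
        ts = datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    else:
        # fromisoformat() rejects a trailing "Z" before Python 3.11
        ts = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    row["__time"] = ts
    return row
```

For example, mapping `{"event_time": "2022-06-01T12:00:00Z"}` with `source_field="event_time"` stores 2022-06-01 12:00 UTC in `__time`, while a record without that field receives the default.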

Schema dimensions

Schema dimensions are data columns that contain qualitative information. You can group by, filter, or apply aggregators to dimensions at query time.

Schema measures

Schema measures are numeric (quantitative) columns derived from fields in the original data source. A schema measure stores the output of an aggregation function applied to all rows that share the same values for every schema dimension in your dataset.

Schema measures are available when you enable rollup.
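The following Python sketch illustrates the idea behind rollup: rows that share a timestamp and the same value for every dimension collapse into one stored row, and each measure is an aggregation over the collapsed rows. It is a conceptual model, not Polaris's implementation. The function name `rollup` is hypothetical, and the sketch assumes `__time` has already been truncated to the rollup granularity.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Aggregator = Callable[[List[Any]], Any]


def rollup(rows: Iterable[Dict[str, Any]], dimensions: List[str],
           measures: Dict[str, Tuple[str, Aggregator]]) -> List[Dict[str, Any]]:
    """Collapse rows that share __time and every dimension value.

    measures maps a measure name to (base column, aggregation function).
    Assumes __time has already been truncated to the rollup granularity.
    """
    groups: Dict[tuple, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        key = (row["__time"],) + tuple(row.get(d) for d in dimensions)
        groups[key].append(row)

    result = []
    for key, members in groups.items():
        out = {"__time": key[0], **dict(zip(dimensions, key[1:]))}
        for name, (base, aggregate) in measures.items():
            out[name] = aggregate([m[base] for m in members])
        result.append(out)
    return result


# Three input rows become two stored rows; both measures share the "bytes" base column.
rows = [
    {"__time": "2022-06-01T00:00", "country": "US", "bytes": 10},
    {"__time": "2022-06-01T00:00", "country": "US", "bytes": 5},
    {"__time": "2022-06-01T00:00", "country": "FR", "bytes": 7},
]
stored = rollup(rows, ["country"], {"sum_bytes": ("bytes", sum), "max_bytes": ("bytes", max)})
print(stored)
```

In this example the two `US` rows merge into a single row with `sum_bytes` 15 and `max_bytes` 10, so the table stores two rows instead of three.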

Dimensions and measures in the schema have a one-to-many (1:N) relationship with the dimensions and measures in data cubes. Data cubes can model additional dimensions and measures using expressions, and can also remove dimensions and measures as needed.

Define a schema for an empty table

You can create an empty table with a schema definition before loading any data into Polaris.

To define a schema for an empty table, follow these steps:

  1. Click Tables from the left navigation menu.
  2. Click Create table.
  3. Enter a unique name for your table and click Continue.
  4. Click Edit schema. Polaris displays an empty table.
  5. Optionally, toggle the Rollup switch to ON to enable rollup. Polaris splits the table view to show dimensions on the left and measures on the right.
  6. In the Dimensions side of the Edit schema split view, click Add.
  7. Enter the dimension information: Base column (the source column for your data), Name, and Data type.
  8. Click Save.
  9. If you enabled rollup, click Add in the Measures side of the Edit schema split view.
  10. Enter the measure information: Name, Data type, Aggregation function, and Base column. You can map multiple measures to the same base column. The base column you select determines which aggregation functions are available, and the aggregation function you choose determines which data types are valid for the measure.
  11. Click Save.

When you are done editing your schema, click Save schema. Your table is now ready for ingestion.

Table status reference

Polaris displays the current ingestion status for the table in the Status field of the Tables page.

Depending on the data ingestion stage, the table's status can be one of the following:

  • Setup incomplete: The table's schema needs to be configured to proceed.
  • Ready for ingestion: The table is ready for ingestion.
  • Ingesting: Polaris is in the process of ingesting data into the table.
  • Ingested: Data has been ingested and is ready for querying.
  • Deleting: Polaris is in the process of deleting the table.
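If you script against table status, for example while waiting for ingestion to finish, a small enum keeps the states explicit. The following Python sketch uses the status labels shown on the Tables page; the names `TableStatus`, `is_queryable`, and `needs_schema` are hypothetical, and the Tables API may report statuses in a different form.

```python
from enum import Enum


class TableStatus(Enum):
    """Table statuses as labeled on the Tables page."""
    SETUP_INCOMPLETE = "Setup incomplete"
    READY_FOR_INGESTION = "Ready for ingestion"
    INGESTING = "Ingesting"
    INGESTED = "Ingested"
    DELETING = "Deleting"


def is_queryable(status: TableStatus) -> bool:
    """Only an Ingested table is described as ready for querying."""
    return status is TableStatus.INGESTED


def needs_schema(status: TableStatus) -> bool:
    """A table in Setup incomplete needs its schema configured first."""
    return status is TableStatus.SETUP_INCOMPLETE
```

For example, `is_queryable(TableStatus("Ingested"))` returns `True`, while a table that is still `Ingesting` is not yet ready for querying.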
Copyright © 2022 Imply Data, Inc