Table schema

Imply Polaris stores data in tables. A table's schema determines how the data is organized: it defines the table's columns, their data types, and other metadata.

This topic covers the data types available for Polaris columns and the three kinds of columns a schema can contain: schema dimensions, schema measures, and the timestamp.

Schema dimensions and schema measures have a one-to-many relationship with data cube dimensions and measures: a data cube can model additional dimensions and measures using expressions, and can also remove dimensions and measures as needed.

Prerequisites

To create and edit a schema in Polaris, you need the following:

  • An existing table.
  • The ManageTables permission assigned to your user profile. For more information on permissions, see Permissions reference.

Schema auto-detection in the UI

Polaris can infer the schema for a table for batch or streaming ingestion with the schema auto-detection feature. With schema auto-detection, Polaris scans the first 1000 entries of your source data to create an input schema. During this data mapping phase, Polaris infers a name and a data type for each column in your table based on the detected values from your input fields. This method is best suited for cases when you do not have a predefined schema and want to get started quickly. For an example, see the quickstart guide.
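To build intuition for what auto-detection does, the following sketch mimics the idea in plain Python: sample values for each input field and pick the narrowest type that fits. This is purely illustrative; Polaris's actual inference logic is internal and more sophisticated.

    def infer_type(values):
        """Infer a Polaris-style column type from sampled string values."""
        def fits(cast):
            for v in values:
                try:
                    cast(v)
                except ValueError:
                    return False
            return True

        if fits(int):
            return "long"
        if fits(float):
            return "double"
        return "string"

    sample = {
        "clicks": ["3", "17", "4"],
        "price": ["9.99", "12.50"],
        "city": ["Paris", "Osaka"],
    }
    print({name: infer_type(vals) for name, vals in sample.items()})
    # {'clicks': 'long', 'price': 'double', 'city': 'string'}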

Create a schema in the UI

You can use the Polaris UI to create a schema manually. You can add and remove columns, as described in the example. This method is best suited for cases when you know exactly what your schema should look like and want to define it before loading any data.

Create a schema with the API

You can use the Tables API to create tables and manage schemas programmatically. This method is best suited for automated workflows.
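As a minimal sketch, the request below creates an aggregate table with a small schema using Python and the requests library. The endpoint path, payload field names, and authorization scheme shown here are assumptions for illustration; confirm the exact contract in the OpenAPI reference.

    import requests

    # Hypothetical values: replace with your organization, project ID, and API key.
    ORG = "example-org"
    PROJECT_ID = "12345"
    API_KEY = "POLARIS_API_KEY"

    # Assumed endpoint shape; check the OpenAPI reference for the exact path.
    url = f"https://{ORG}.api.imply.io/v1/projects/{PROJECT_ID}/tables"

    table = {
        "type": "aggregate",           # assumed field name for the table type
        "name": "example_events",
        "schema": [                    # assumed field name for the column list
            {"name": "__time", "dataType": "timestamp"},
            {"name": "country", "dataType": "string"},
            {"name": "total_amount", "dataType": "double"},
        ],
    }

    resp = requests.post(
        url,
        json=table,
        headers={"Authorization": f"Basic {API_KEY}"},  # auth scheme is an assumption
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())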

Data types

When you define a schema, you must specify a name and a data type for each column. Column definitions are immutable.

Polaris supports the following data types for a table column:

  • string: UTF-8 encoded text
  • long: a 64-bit integer
  • float: a 32-bit floating point number
  • double: a 64-bit floating point number
  • json: nested data in JSON format
  • timestamp: primary timestamp

For details on ingesting nested data, see Create an ingestion job.

The following data types are supported for schema measures in aggregate tables only:

  • thetaSketch: a Theta sketch object
  • HLLSketch: an HLL sketch object

Sketch objects are probabilistic data structures that improve the query performance of distinct count queries with known error distributions. For more information on sketches in Polaris, see Compute results with approximation algorithms.
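As a hypothetical illustration, sketch measures might appear alongside regular columns in an aggregate table's schema as in the fragment below; the field names are assumptions, so verify them against the OpenAPI reference.

    # Hypothetical schema fragment for an aggregate table; field names are assumptions.
    sketch_measures = [
        {"name": "unique_users_theta", "dataType": "thetaSketch"},
        {"name": "unique_sessions_hll", "dataType": "HLLSketch"},
    ]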

The following restrictions apply to column names:

  • Names must be unique and non-empty when trimmed of leading and trailing spaces.
  • Names starting with two underscores, such as __count, are reserved for internal use.
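The helper below encodes these two restrictions in a few lines of Python, which can be handy for validating generated schemas before sending them to the API. It is a convenience sketch, not part of Polaris.

    def validate_column_names(names):
        """Check the documented restrictions on Polaris column names."""
        trimmed = [n.strip() for n in names]
        errors = []
        if any(not n for n in trimmed):
            errors.append("names must be non-empty after trimming spaces")
        if len(set(trimmed)) != len(trimmed):
            errors.append("names must be unique")
        for n in trimmed:
            if n.startswith("__"):
                errors.append(f"'{n}' uses the reserved '__' prefix")
        return errors

    print(validate_column_names(["country", " country", "__count"]))
    # ['names must be unique', "'__count' uses the reserved '__' prefix"]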

Schema dimensions

Schema dimensions are data columns that contain qualitative information. You can group by, filter, or apply aggregators to dimensions at query time.
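For example, a query might filter on the timestamp and group by a dimension. The SQL below is a purely illustrative sketch with hypothetical table and column names; see Query data for how to submit queries to Polaris.

    # Hypothetical table and dimension names, for illustration only.
    sql = """
    SELECT country, COUNT(*) AS events
    FROM example_events
    WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
    GROUP BY country
    ORDER BY events DESC
    """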

Schema measures

Schema measures are quantitative data fields or probabilistic data structures derived from the original data source. A schema measure stores data in aggregated form, based on an aggregation function you apply to your source data in your ingestion job.

Supported aggregation functions include:

  • Count: Counts the number of source rows.
  • Max: Returns the largest value in the column.
  • Min: Returns the smallest value in the column.
  • Sum: Returns the sum of all values in the column. Polaris assigns the Sum aggregation function to measures containing numeric data by default.

Schema measures are only available for aggregate tables. All aggregate tables automatically include a __count measure that counts the number of source data rows that were rolled up into a given table row. The __count measure is populated internally. Do not specify this measure in a table schema or ingestion job specification.
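To make the relationship between source rows and measures concrete, the toy sketch below simulates rollup: rows are grouped by their dimension values, a Sum measure aggregates a numeric field, and __count records how many source rows collapsed into each table row. This is a conceptual illustration, not Polaris code.

    from collections import defaultdict

    # Toy source rows: one dimension ("country") and one numeric field ("amount").
    rows = [
        {"country": "FR", "amount": 10.0},
        {"country": "FR", "amount": 5.0},
        {"country": "JP", "amount": 7.5},
    ]

    aggregated = defaultdict(lambda: {"total_amount": 0.0, "__count": 0})
    for row in rows:
        bucket = aggregated[row["country"]]
        bucket["total_amount"] += row["amount"]  # Sum aggregation
        bucket["__count"] += 1                   # source rows rolled into this table row

    print(dict(aggregated))
    # {'FR': {'total_amount': 15.0, '__count': 2}, 'JP': {'total_amount': 7.5, '__count': 1}}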

Timestamp

Every schema has a timestamp column by default. Polaris uses the timestamp to partition and sort data, and to perform time-based data management operations, such as dropping time chunks. When you create a table without a schema, Polaris automatically creates the primary timestamp column __time.

If you use the Polaris API to manually define your schema, include the __time column in the schema object of the request payload. Only the __time column takes the data type of timestamp. If you do not include __time in the table schema, Polaris automatically creates this column for the table.

When creating an ingestion job, you can transform timestamps using input expressions in the ingestion job specification. For more information, see Timestamp expressions.
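For intuition only: a common transformation derives the primary timestamp from an epoch-milliseconds input field. The Python below shows the equivalent computation; the actual transformation is written as a timestamp expression in the ingestion job specification (see Timestamp expressions).

    from datetime import datetime, timezone

    # Hypothetical input field holding epoch milliseconds.
    event_ts_millis = 1_672_531_200_000

    # Equivalent of deriving __time from an epoch-millis input field.
    __time = datetime.fromtimestamp(event_ts_millis / 1000, tz=timezone.utc)
    print(__time.isoformat())  # 2023-01-01T00:00:00+00:00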

Example

The following example walks you through the steps to create an empty table with a schema definition.

  1. Click Tables from the left navigation menu.
  2. Click Create table.
  3. Enter a unique name for your table and select Aggregate for the table type. For more information, see Types of tables.
  4. Click Create.
  5. Click Edit schema.
  6. Polaris displays an empty table with two columns automatically created: __time and __count. The __time dimension stores the primary timestamp for all Polaris tables. For aggregate tables, the __count measure holds the number of source data rows that were rolled up into a given table row. Polaris splits the table view to show dimensions on the left and measures on the right. For more details, see Schema dimensions and Schema measures.
  7. On the Dimensions side of the split view, click the add icon.
  8. Enter the name and data type of the dimension.
  9. Click Save.
  10. On the Measures side of the split view, click the add icon.
  11. Enter the name and data type of the measure. Certain data types, such as theta and HLL sketches, are only available for measures. The data type of a measure determines the aggregation functions applied during ingestion and querying.

When you finish editing your schema, click Save schema. Your table is now ready for ingestion.

Learn more

To define a table schema using the Polaris API, see Define table schemas by API.
