
Key concepts

Imply Polaris is a real-time database for modern analytics applications, built from Apache Druid and delivered as a fully managed database as a service (DBaaS). It provides intelligent database optimizations that empower you to get all the performance you want from Druid without needing Druid expertise.

Polaris is designed to get you from the initial setup to production in record time. This topic covers the main Polaris concepts and terminology to help you get started.

Audit log

An audit log provides information about administrative and user activities within Polaris.

API key

An API key represents a unique string that authorizes the use of the Polaris API. Essentially, it identifies the application calling the API without the need to manage user identities. For more information on key-based authentication, see API keys authentication.

Asynchronous query

An asynchronous query is a non-blocking, long-running query that provides high throughput at the cost of higher latency than synchronous queries.

Asynchronous queries return a placeholder object, called a future, that represents the result of the asynchronous operation. Common use cases for async queries include executing large joins and downloading high volumes of data, especially queries whose results exceed 100 MB.
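The idea of a future is not specific to Polaris. As a generic analogy only, the sketch below uses Python's standard `concurrent.futures` module (not the Polaris API): submitting work returns a placeholder immediately, and the caller blocks only when it asks for the result. `long_running_query` is a hypothetical stand-in for an expensive query.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import time


def long_running_query() -> list[int]:
    """Hypothetical stand-in for an expensive query, such as a large join."""
    time.sleep(0.1)  # simulate work
    return [1, 2, 3]


with ThreadPoolExecutor(max_workers=1) as executor:
    # submit() returns right away with a Future: a placeholder for the result.
    future: Future = executor.submit(long_running_query)
    print("submitted; done yet?", future.done())  # typically False at this point
    # The caller is free to do other work here without blocking.
    rows = future.result()  # blocks only when the result is actually needed
    print(rows)
```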

Batch ingestion

Batch ingestion refers to an ingestion job that reads a finite amount of data from a source and terminates when all rows have been processed. See Batch ingestion sources for supported ingestion sources.

Precached data

Precached data refers to a subset of your data that is stored in deep storage and also loaded into a caching layer. By default, Polaris stores all data in deep storage and also caches that data to enable high-concurrency, low-latency queries. Precached data counts towards the project’s storage size.

You can control what data gets precached and what data gets retained through cache and retention policies.

Cloud provider

A cloud provider is a cloud computing platform where Polaris services are hosted. For a list of cloud providers Polaris supports, see Cloud providers and regions.

Data cube

A data cube is a multidimensional data model that you use to organize and visualize aggregated data. Data cubes contain data from one or more data sources and provide an interface for users to explore a data set.

DPU

A Data Processing Unit (DPU) is a normalized unit of processing power that Polaris uses to measure data transformation tasks such as asynchronous queries.

DPU-minute

One DPU-minute corresponds to one task consuming one DPU for one minute.
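Extending this definition linearly (an assumption for illustration; Polaris's actual metering and billing rules are not described here), a task that uses d DPUs for m minutes consumes d × m DPU-minutes, and a job's total is the sum over its tasks. The task figures below are hypothetical.

```python
def dpu_minutes(tasks: list[tuple[float, float]]) -> float:
    """Total DPU-minutes for (dpus, minutes) pairs, one pair per task.

    Assumes consumption is linear: a task using d DPUs for m minutes
    consumes d * m DPU-minutes, and totals add up across tasks.
    """
    return sum(dpus * minutes for dpus, minutes in tasks)


# Hypothetical job: one task at 1 DPU for 10 minutes, another at 2 DPUs for 5 minutes.
print(dpu_minutes([(1, 10), (2, 5)]))  # 20
```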

Dashboard

A dashboard combines views from multiple data cubes on a single screen. Using a dashboard, you can create effective and focused data visualizations such as line charts, heatmaps, and vertical bar charts, to name a few. See the Visualizations reference for a description of all visualizations.

Deep storage

Deep storage refers to the long-term storage in Polaris.

Data that is stored only in deep storage doesn't count towards the project's storage size and is substantially cheaper to retain for the long term. You can configure a cache policy to offload precached data that's outside a specific period into deep storage to save costs and conserve resources.
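To make the interplay between a cache policy and storage size concrete, here is a minimal sketch under stated assumptions: the `Segment` type, the `is_precached` rule, and the single `cache_period` window are hypothetical simplifications, and real Polaris cache policy options may differ. Every segment is assumed to live in deep storage, and only the precached ones are counted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Segment:
    """Hypothetical unit of stored data covering one time interval."""
    start: datetime
    size_bytes: int


def is_precached(seg: Segment, now: datetime, cache_period: timedelta) -> bool:
    # Data inside the cache period stays precached; older data lives only in deep storage.
    return seg.start >= now - cache_period


def project_storage_size(segments: list[Segment], now: datetime, cache_period: timedelta) -> int:
    # Only precached data counts towards the project's storage size.
    return sum(s.size_bytes for s in segments if is_precached(s, now, cache_period))


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
segments = [
    Segment(datetime(2024, 6, 29, tzinfo=timezone.utc), 100),  # inside a 30-day window
    Segment(datetime(2024, 3, 1, tzinfo=timezone.utc), 500),   # outside it
]
print(project_storage_size(segments, now, timedelta(days=30)))  # 100
```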

Polaris queries data in deep storage asynchronously. Asynchronous queries return a query ID that you use to check the query's status and retrieve the results. For more information, see Asynchronous query.
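The submit, poll, and fetch pattern described above can be sketched as follows. `FakeAsyncQueryService`, its method names, and the `RUNNING`/`SUCCESS` status strings are hypothetical and only model the flow in memory; they are not the Polaris API, so consult the API reference for the real endpoints and status values.

```python
import itertools
import time


class FakeAsyncQueryService:
    """Hypothetical in-memory stand-in for an asynchronous query endpoint.

    Not the Polaris API: it only models submit -> poll status -> fetch results.
    """

    def __init__(self, polls_until_done: int = 3) -> None:
        self._ids = itertools.count(1)
        self._remaining: dict[str, int] = {}
        self._results: dict[str, list[dict]] = {}
        self._polls_until_done = polls_until_done

    def submit(self, sql: str) -> str:
        query_id = f"query-{next(self._ids)}"
        self._remaining[query_id] = self._polls_until_done
        self._results[query_id] = [{"sql": sql}]  # canned placeholder result
        return query_id  # returned immediately; the query "runs" in the background

    def status(self, query_id: str) -> str:
        self._remaining[query_id] -= 1
        return "SUCCESS" if self._remaining[query_id] <= 0 else "RUNNING"

    def results(self, query_id: str) -> list[dict]:
        return self._results[query_id]


def run_async(service: FakeAsyncQueryService, sql: str, interval_s: float = 0.01) -> list[dict]:
    query_id = service.submit(sql)
    # Use the query ID to poll until the query finishes, then fetch the results.
    while service.status(query_id) != "SUCCESS":
        time.sleep(interval_s)
    return service.results(query_id)


print(run_async(FakeAsyncQueryService(), "SELECT COUNT(*) FROM my_table"))
```

A production client would also handle failure statuses and use a growing poll interval rather than a fixed one.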

Group

A group is a collection of permissions that enable users to perform specific actions in the context of an organization or a project. To learn more about groups, see Manage user groups.
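As a rough illustration of groups as collections of permissions, the sketch below assumes a user's effective permissions are the union of the permissions of every group they belong to, and it ignores the organization versus project scope. The group names, user names, and permission names are all hypothetical; see the Permissions reference for the real permissions.

```python
# Hypothetical groups and permission names, for illustration only.
groups: dict[str, set[str]] = {
    "analysts": {"ViewData", "ViewDashboards"},
    "data-engineers": {"ViewData", "ManageIngestionJobs"},
}
user_groups: dict[str, set[str]] = {
    "alice": {"analysts"},
    "bob": {"analysts", "data-engineers"},
}


def effective_permissions(user: str) -> set[str]:
    # Assumption: a user gets the union of the permissions of all their groups.
    perms: set[str] = set()
    for group in user_groups.get(user, set()):
        perms |= groups[group]
    return perms


def can(user: str, permission: str) -> bool:
    return permission in effective_permissions(user)


print(can("alice", "ManageIngestionJobs"))  # False
print(can("bob", "ManageIngestionJobs"))    # True
```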

Input schema

An input schema represents the names and data types of data used as the source for an ingestion job. You can use the input field names in input expressions, which transform the source data before ingestion. For details on ingestion jobs, see Create an ingestion job.

Job

A job is an asynchronous request that performs various tasks in the background, such as streaming or batch ingestion, async queries, dimension table jobs, and lookup table ingestion jobs.

Mapping

A mapping in an ingestion job describes the relationship between one or more input fields and the destination column in a table. A mapping takes the name of the table column as well as an input expression.

Input expression

The input expression is a SQL expression that describes the transformation applied to input fields.

For more information on mappings, see Map and transform data with input expressions.
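A mapping pairs a destination column name with an input expression. The sketch below builds a few mappings and serializes them to JSON. The `columnName`/`expression` keys and the surrounding `mappings` array are assumptions modeled on the description above, so confirm the exact ingestion job spec in the API reference. The input field names are hypothetical; `__time` is Druid's primary time column, and `TIME_PARSE` and `CONCAT` are Druid SQL functions.

```python
import json
from dataclasses import dataclass


@dataclass
class Mapping:
    """A destination table column plus the SQL input expression that fills it."""
    column_name: str
    expression: str

    def to_json(self) -> dict[str, str]:
        # Key names are assumptions; check the ingestion job API reference.
        return {"columnName": self.column_name, "expression": self.expression}


# Hypothetical input fields: "timestamp", "first_name", "last_name", "price".
mappings = [
    Mapping("__time", 'TIME_PARSE("timestamp")'),                      # parse one input field
    Mapping("full_name", 'CONCAT("first_name", \' \', "last_name")'),  # combine two input fields
    Mapping("price", '"price"'),                                       # pass a field through as is
]

print(json.dumps({"mappings": [m.to_json() for m in mappings]}, indent=2))
```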

OAuth client

An OAuth client allows you to authenticate a third-party application against the Polaris API in a secure manner. An OAuth client contains a client ID, a client secret, and an access token. Polaris uses these credentials to validate your application and authorize the API calls. To learn how to use the OAuth authentication scheme, see Authenticate with OAuth.
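The following is a minimal sketch of how an application might exchange a client ID and secret for an access token, assuming the standard OAuth 2.0 client credentials grant (RFC 6749, section 4.4). Whether Polaris uses exactly this grant and these form fields is an assumption, and `TOKEN_URL` is a hypothetical placeholder; use the values from Authenticate with OAuth. The code only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical token endpoint; replace with the URL documented for your organization.
TOKEN_URL = "https://id.example.com/oauth2/token"


def build_token_request(client_id: str, client_secret: str) -> Request:
    # OAuth 2.0 client credentials grant: the client presents its ID and secret
    # and receives a short-lived access token in return.
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    return Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_token_request("my-client-id", "my-client-secret")
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) would return a JSON response whose `access_token` the application then passes as `Authorization: Bearer <token>` on API calls, per the OAuth 2.0 bearer token convention.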

Organization

An organization is a parent-level entity that maps to your Imply customer account. When you first sign up for a Polaris account, Imply creates an organization for you. Subsequently, you log in to your organization to manage projects, add users, process data, and build dashboards.

Project

A project represents a base entity that holds tables, data sources, files, jobs, data cubes, dashboards, alerts, and reports.

Permission

A permission allows a user to perform a specific action within an organization or a project. To view all available permissions, see Permissions reference.

Real-time data

Real-time data refers to the data that Polaris has ingested in a streaming ingestion job during approximately the past hour.

Region

A region represents a geographic area with one or more physical data centers managed by a cloud provider. For a list of regions Polaris supports, see Cloud providers and regions.

Schema enforcement mode

The schema enforcement mode on a table controls how Polaris enforces the schema on the table.

Flexible mode

For a table with flexible schema enforcement (also known as a flexible table), Polaris auto-discovers the table schema during ingestion and assigns an inferred schema to the table. The schema of a flexible table can have both declared and undeclared columns.

Strict mode

For a table with strict schema enforcement (also known as a strict table), you set a declared schema on the table before ingesting data. All columns in the schema of a strict table are declared columns.

Streaming ingestion

Streaming ingestion refers to an ongoing ingestion job that continuously collects and processes data from an event stream as it is generated.

Consume event data

Consuming from an event stream, also referred to as "pull streaming ingestion," is a streaming ingestion method where Polaris actively consumes data from a streaming source, such as Apache Kafka or Amazon Kinesis. See Consume from an event stream for supported ingestion sources.

Publish event data

Publishing event data, also referred to as "push streaming ingestion," is a streaming ingestion method where you publish event data from a stream to a destination table in Polaris using the Events API. See Publish data from an event stream for supported ingestion sources.

Synchronous query

A synchronous query is the default mode of query execution. A synchronous query is usually short-running, has low latency, and can support highly concurrent workloads.

Table

A table is the central concept in Polaris. Tables store and organize data records in a row-and-column format. All tables have a schema, which represents the table structure in terms of the names and data types for the table columns. If you have data, such as clickstream data, you ingest it into a table so that you can query and visualize it. For more information, see Introduction to tables.

Table schema

A table schema is an ordered collection of columns that describe a table. You can specify the column names and data types for a table in its schema.

A table schema can contain declared columns, undeclared columns, or both.

Declared column

A declared column is a column explicitly provided by the user when creating or updating a table.

Undeclared column

An undeclared column is a column whose schema is inferred by Polaris during ingestion.

Queryable schema

The queryable schema of a table describes all columns in a table that you can query, including both declared columns and undeclared columns.
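The relationship between declared, undeclared, and queryable columns can be sketched as a simple merge. The column names and type labels are hypothetical, and the ordering (declared columns first) and the rule that a declared column takes precedence over an inferred column with the same name are assumptions for illustration.

```python
Column = tuple[str, str]  # (column name, data type)


def queryable_schema(declared: list[Column], undeclared: list[Column]) -> list[Column]:
    # Everything you can query: the declared columns, followed by any undeclared
    # (inferred) columns whose names are not already declared.
    declared_names = {name for name, _ in declared}
    return declared + [col for col in undeclared if col[0] not in declared_names]


# Hypothetical flexible table: some columns declared, others inferred during ingestion.
declared = [("__time", "timestamp"), ("country", "string")]
undeclared = [("browser", "string"), ("latency_ms", "long")]
print(queryable_schema(declared, undeclared))

# A strict table has only declared columns, so its queryable schema is its declared schema.
print(queryable_schema(declared, []))
```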

For more information on table schemas, see Table schema and mode.

Table type

The table type determines whether rollup is applied to a table. Rollup reduces the size of stored data and improves query performance.

Detail table

A detail table has rollup disabled. It stores each ingested record as is. For example, an e-commerce site might keep track of each purchase that is made. When you ingest those purchase records into a detail table, the table shows one row for each purchase.

Aggregate table

An aggregate table has rollup enabled to group individual records according to the table's time granularity and dimensions. For example, an e-commerce site might only be interested in the total sales per hour for a region. In this case, you don't need to see each sales record, only a summary by hour.
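Rollup happens inside Polaris at ingestion time. The Python sketch below only imitates its effect on the e-commerce example, assuming an hourly time granularity, `region` as the only dimension, and a summed `amount` metric (all hypothetical). The detail table keeps all four purchase rows, while the aggregate table keeps one row per hour and region.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical purchase records: (timestamp, region, amount).
purchases = [
    (datetime(2024, 5, 1, 9, 5), "EMEA", 20.0),
    (datetime(2024, 5, 1, 9, 40), "EMEA", 15.0),
    (datetime(2024, 5, 1, 9, 50), "APAC", 30.0),
    (datetime(2024, 5, 1, 10, 10), "EMEA", 5.0),
]

# Detail table: one row per purchase, stored as is.
detail_rows = list(purchases)

# Aggregate table: truncate each timestamp to the hour (the time granularity),
# group by the dimension (region), and sum the metric.
totals: dict[tuple[datetime, str], float] = defaultdict(float)
for ts, region, amount in purchases:
    hour = ts.replace(minute=0, second=0, microsecond=0)
    totals[(hour, region)] += amount

for (hour, region), total in sorted(totals.items()):
    print(hour.isoformat(), region, total)
```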

Task

A job can launch one or more tasks to fulfill a request.

Version

The version, in the context of Polaris, refers to the current release of Polaris.

Custom version

A custom version is a tailored deployment of Polaris that Imply has customized to meet the unique requirements of an organization.