Introduction to data rollup

Modern applications can emit millions of streaming events per day. As this data accumulates, its storage footprint grows, often leading to higher storage costs and slower queries. Imply Polaris uses the Apache Druid data rollup feature to aggregate raw data at predefined intervals during ingestion. By decreasing row counts, rollup can dramatically reduce the size of stored data and improve query performance.

This topic provides an overview of data rollup in Polaris.

Data rollup

Rollup is a form of time-based data aggregation. During ingestion, rollup combines rows that have the same truncated timestamp and the same dimension values into a single stored row, resulting in a condensed data set.
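
For example, with Minute time granularity, two rows that fall within the same minute and share the same dimension values collapse into a single stored row whose measures are aggregated (summed, in this illustration). The column names here are purely hypothetical:

    {"timestamp":"2024-05-01T10:15:04Z","country":"US","clicks":3}
    {"timestamp":"2024-05-01T10:15:42Z","country":"US","clicks":5}

    roll up into:

    {"timestamp":"2024-05-01T10:15:00Z","country":"US","clicks":8}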

You enable rollup by specifying the aggregate table type during table creation. You can then configure the table's time granularity before ingesting data to maximize performance.

When you select the detail table type, Polaris stores each record as it is ingested, without performing any form of aggregation.

Create an aggregate table with rollup when either of the following applies:

  • You want optimal performance or you have strict space constraints.
  • You don't need raw values from high-cardinality dimensions.

Conversely, create a detail table without rollup when any of the following conditions hold:

  • You want to preserve results for individual rows.
  • You don't have any measures that you want to aggregate during the ingestion process.
  • You have many high-cardinality dimensions.

The following screenshots show two tables created using the same dataset. The first table is an aggregate table with time granularity set to Day. The total size of the table is 6.03 MB.

Polaris aggregate table example

The second table is a detail table without rollup. The total size of the table is 6.36 MB.

Polaris detail table example

Time granularity

Time granularity determines how Polaris buckets data across the timestamp dimension, using UTC time. For example, days start at 00:00 UTC.

Polaris supports the following time granularity options:

Time granularity | Description                                 | Example
-----------------|---------------------------------------------|--------------------------
Millisecond      | Buckets input data by millisecond.          | 2016-04-01T01:02:33.080Z
Second           | Buckets input data by second.               | 2016-04-01T01:02:33.000Z
Minute           | Buckets input data by minute.               | 2016-04-01T01:02:00.000Z
15 minute        | Buckets input data by 15-minute intervals.  | 2016-04-01T01:15:00Z
30 minute        | Buckets input data by 30-minute intervals.  | 2016-04-01T01:30:00Z
Hour             | Buckets input data by hour.                 | 2016-04-01T01:00:00.000Z
Day              | Buckets input data by day.                  | 2016-04-01T00:00:00.000Z
Week             | Buckets input data by week.                 | 2016-06-27T00:00:00.000Z
Month            | Buckets input data by month.                | 2016-06-01T00:00:00.000Z
Quarter          | Buckets input data by quarter.              | 2016-04-01T00:00:00.000Z
Year             | Buckets input data by year.                 | 2016-01-01T00:00:00.000Z

Polaris sets the default time granularity to Millisecond.
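
Conceptually, each of these options truncates a timestamp down to the start of its granularity period. As a rough sketch in Druid SQL, the engine underlying Polaris, the TIME_FLOOR function performs the same kind of truncation; the exact expressions Polaris applies internally are not shown here:

    TIME_FLOOR(TIMESTAMP '2016-04-01 01:02:33', 'PT1M')  -- 2016-04-01T01:02:00.000Z (Minute)
    TIME_FLOOR(TIMESTAMP '2016-04-01 01:02:33', 'P1D')   -- 2016-04-01T00:00:00.000Z (Day)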

Example

The following example shows how to create an aggregate table and specify its rollup time granularity. The dataset is a sample of network flow event data, representing packet and byte counts for IP traffic that occurred within a particular second.

To create an aggregate table and specify its rollup granularity, follow these steps:

  1. Download this JSON file containing the sample input data.

  2. Click Tables in the left navigation menu of the Polaris UI.

  3. Click Create table.

  4. Enter a unique name for your table.

  5. Select the Aggregate table type.

  6. On the table detail page, click Load data > Insert data and select the file you downloaded, rollup-data.json.

  7. Click Next > Continue.

  8. On the Insert data page, click on the timestamp dimension, then click Edit.

  9. In the timestamp dialog, select Minute from the Rollup granularity drop-down. This tells Polaris to bucket the timestamps of the original input data by minute. As a result, the table goes from nine rows to five rows.

    Your table should look similar to the following:

    Polaris rollup ingestion

  10. Click Start ingestion.

Note that Polaris interprets the srcIP and dstIP columns as dimensions and the packets and bytes columns as measures. All aggregate tables automatically include a __count measure, which counts the number of source data rows that were rolled up into a given row.
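
When you query an aggregate table, you aggregate the measures again at query time. As a minimal sketch, assuming a hypothetical table name of rollup-tutorial, a SQL query that totals the measures per source/destination pair and reports how many raw events back each total might look like this:

    SELECT
      "srcIP",
      "dstIP",
      SUM("packets") AS total_packets,
      SUM("bytes")   AS total_bytes,
      SUM("__count") AS source_rows
    FROM "rollup-tutorial"
    GROUP BY 1, 2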

The following events were aggregated:

  • Events that occurred during 2018-01-01T01:01:

    {"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}
    {"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":255,"bytes":21133}
    {"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":11,"bytes":5780}
    
  • Events that occurred during 2018-01-01T01:02:

    {"timestamp":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":38,"bytes":6289}
    {"timestamp":"2018-01-01T01:02:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":377,"bytes":359971}
    
  • Events that occurred during 2018-01-02T21:33:

    {"timestamp":"2018-01-02T21:33:14Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":38,"bytes":6289}
    {"timestamp":"2018-01-02T21:33:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":123,"bytes":93999}
    
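With Minute granularity, and assuming the default sum aggregation for the packets and bytes measures, each of those groups rolls up into a single stored row. The resulting rows should look roughly like the following (values computed from the events above):

    {"__time":"2018-01-01T01:01:00Z","srcIP":"1.1.1.1","dstIP":"2.2.2.2","packets":286,"bytes":35937,"__count":3}
    {"__time":"2018-01-01T01:02:00Z","srcIP":"1.1.1.1","dstIP":"2.2.2.2","packets":415,"bytes":366260,"__count":2}
    {"__time":"2018-01-02T21:33:00Z","srcIP":"7.7.7.7","dstIP":"8.8.8.8","packets":161,"bytes":100288,"__count":2}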

Limitations

The following restrictions apply to aggregate tables:

  • Rollup applies to aggregate tables only. A table is either aggregate or detail at creation; once a table is created, you cannot change its type.
  • Once you add data to an aggregate table and specify its rollup granularity, you can only make the granularity coarser, for example from Minute to Hour. Polaris makes the granularity change during compaction. If an aggregate table contains no data and has no active ingestion job associated with it, you can change the rollup granularity to a finer granularity, for example from Hour to Minute.
  • Polaris does not support rollup for nested data because such data is generally high cardinality. If you have nested data, either flatten it into dimensions at ingestion time using JSON_VALUE, as shown in the sketch below, or ingest the data into a string-typed column.
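
As an illustration of the flattening option, Polaris ingestion jobs let you map a table column to a SQL input expression. The sketch below assumes a hypothetical nested input column named event with a city field nested under user; adapt the path to your own data:

    JSON_VALUE("event", '$.user.city')

Mapping an expression like this to a string dimension stores the extracted value as an ordinary column, which can then participate in rollup.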

Learn more

See the following topics for more information:

  • Table schema for the different data types for Polaris columns.
  • Tables v2 API for reference on working with tables in Polaris.
  • Ingestion sources overview for the available sources for ingestion in Polaris.