Introduction to data rollup
Modern-day applications emit millions of streaming events per day. As data accumulates, it increases the storage footprint, which often raises storage costs and slows query performance. Imply Polaris uses the Apache Druid data rollup feature to aggregate raw data at predefined intervals during ingestion. By decreasing row counts, rollup can dramatically reduce the size of stored data and improve query performance.
This topic provides an overview of data rollup in Polaris.
Data rollup
Rollup is a form of time-based data aggregation. It combines multiple rows that share the same timestamp, truncated to the table's time granularity, and the same dimension values into a single aggregated row, resulting in a condensed data set.
You enable rollup by specifying the aggregate table type during table creation. You can then configure the table's time granularity before ingesting data to maximize performance.
When you select the detail table type, Polaris stores each record as it is ingested, without performing any form of aggregation.
The following are optimal scenarios to create an aggregate table with rollup:
- You want optimal performance or you have strict space constraints.
- You don't need raw values from high-cardinality dimensions.
Conversely, create a detail table without rollup when any of the following conditions hold:
- You want to preserve results for individual rows.
- You don't have any measures that you want to aggregate during the ingestion process.
- You have many high-cardinality dimensions.
The following screenshots show two tables created using the same dataset.
The first table is an aggregate table with time granularity set to Day. The total size of the table is 6.03 MB.
The second table is a detail table without rollup. The total size of the table is 6.36 MB.
Time granularity
Time granularity determines how to bucket data across the timestamp dimension using UTC time—days start at 00:00 UTC.
Polaris supports the following time granularity options:
Time granularity | Description | Example |
---|---|---|
Millisecond | Buckets input data by millisecond. | 2016-04-01T01:02:33.080Z |
Second | Buckets input data by second. | 2016-04-01T01:02:33.000Z |
Minute | Buckets input data by minute. | 2016-04-01T01:02:00.000Z |
15 minute | Buckets input data by 15-minute intervals. | 2016-04-01T01:15:00.000Z |
30 minute | Buckets input data by 30-minute intervals. | 2016-04-01T01:30:00.000Z |
Hour | Buckets input data by hour. | 2016-04-01T01:00:00.000Z |
Day | Buckets input data by day. | 2016-04-01T00:00:00.000Z |
Week | Buckets input data by week. | 2016-06-27T00:00:00.000Z |
Month | Buckets input data by month. | 2016-06-01T00:00:00.000Z |
Quarter | Buckets input data by quarter. | 2016-04-01T00:00:00.000Z |
Year | Buckets input data by year. | 2016-01-01T00:00:00.000Z |
Polaris sets the default time granularity to Millisecond.
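The bucketing described above can be sketched in Python as truncating each timestamp to the start of its UTC interval. This is an illustration only, not Polaris or Druid code: the function name `floor_to_granularity` and the lowercase granularity strings are hypothetical, and the Monday week start is an assumption inferred from the Week example in the table.

```python
from datetime import datetime, timedelta, timezone


def floor_to_granularity(ts: datetime, granularity: str) -> datetime:
    """Truncate a timezone-aware timestamp to the start of its UTC bucket."""
    ts = ts.astimezone(timezone.utc)
    if granularity == "millisecond":
        return ts.replace(microsecond=ts.microsecond // 1000 * 1000)
    if granularity == "second":
        return ts.replace(microsecond=0)
    if granularity in ("minute", "15 minute", "30 minute"):
        # A plain minute is a 1-minute step; otherwise read the step from the name.
        step = 1 if granularity == "minute" else int(granularity.split()[0])
        return ts.replace(minute=ts.minute // step * step, second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)

    # Coarser granularities all start at 00:00 UTC of some day.
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "day":
        return day
    if granularity == "week":
        # Assumption: weeks start on Monday, matching the Week example in the table.
        return day - timedelta(days=day.weekday())
    if granularity == "month":
        return day.replace(day=1)
    if granularity == "quarter":
        return day.replace(month=(day.month - 1) // 3 * 3 + 1, day=1)
    if granularity == "year":
        return day.replace(month=1, day=1)
    raise ValueError(f"unsupported granularity: {granularity}")


event_time = datetime(2016, 4, 1, 1, 2, 33, 80_000, tzinfo=timezone.utc)
for name in ("second", "minute", "hour", "day", "quarter", "year"):
    print(name, floor_to_granularity(event_time, name).isoformat())
```

Rows whose timestamps truncate to the same value fall into the same bucket, which is why a coarser granularity generally produces fewer rows.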
Example
The following example shows how to create an aggregate table and specify its rollup time granularity. The dataset is a sample of network flow event data. Each event records the packet and byte counts for IP traffic that occurred within a particular second.
To create an aggregate table and specify its rollup granularity, follow these steps:
1. Download this JSON file containing the sample input data.
2. Click Table from the left navigation menu of the Polaris UI.
3. Click Create table.
4. Enter a unique name for your table.
5. Select the Aggregate table type.
6. On the table detail page, click Load data > Insert data and select the file you downloaded, rollup-data.json.
7. Click Next > Continue.
8. On the Insert data page, click the timestamp dimension, then click Edit.
9. In the timestamp dialog, select Minute from the Rollup granularity drop-down. This tells Polaris to bucket the timestamps of the original input data by minute. As a result, the table goes from nine rows to five rows. Your table should look similar to the following:
10. Click Start ingestion.
Note that Polaris interprets the srcIP and dstIP columns as dimensions and the packets and bytes columns as measures.
All aggregate tables automatically include a __count measure. This measure counts the number of source data rows that were rolled up into a given row.
The following events were aggregated:
- Events that occurred during 2018-01-01T01:01:
  {"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}
  {"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":255,"bytes":21133}
  {"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":11,"bytes":5780}
- Events that occurred during 2018-01-01T01:02:
  {"timestamp":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":38,"bytes":6289}
  {"timestamp":"2018-01-01T01:02:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":377,"bytes":359971}
- Events that occurred during 2018-01-02T21:33:
  {"timestamp":"2018-01-02T21:33:14Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":38,"bytes":6289}
  {"timestamp":"2018-01-02T21:33:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":123,"bytes":93999}
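To make the aggregation concrete, the following Python sketch rolls up the seven events listed above by minute. It is a simplified model rather than the Polaris implementation. It assumes that the packets and bytes measures are aggregated by summing, which this topic does not state, and the function name `rollup_by_minute` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# The seven events listed above, copied from the sample data.
EVENTS = [
    {"timestamp": "2018-01-01T01:01:35Z", "srcIP": "1.1.1.1", "dstIP": "2.2.2.2", "packets": 20, "bytes": 9024},
    {"timestamp": "2018-01-01T01:01:51Z", "srcIP": "1.1.1.1", "dstIP": "2.2.2.2", "packets": 255, "bytes": 21133},
    {"timestamp": "2018-01-01T01:01:59Z", "srcIP": "1.1.1.1", "dstIP": "2.2.2.2", "packets": 11, "bytes": 5780},
    {"timestamp": "2018-01-01T01:02:14Z", "srcIP": "1.1.1.1", "dstIP": "2.2.2.2", "packets": 38, "bytes": 6289},
    {"timestamp": "2018-01-01T01:02:29Z", "srcIP": "1.1.1.1", "dstIP": "2.2.2.2", "packets": 377, "bytes": 359971},
    {"timestamp": "2018-01-02T21:33:14Z", "srcIP": "7.7.7.7", "dstIP": "8.8.8.8", "packets": 38, "bytes": 6289},
    {"timestamp": "2018-01-02T21:33:45Z", "srcIP": "7.7.7.7", "dstIP": "8.8.8.8", "packets": 123, "bytes": 93999},
]


def rollup_by_minute(rows):
    """Group rows by (minute, srcIP, dstIP), sum the measures, and count the inputs."""
    buckets = defaultdict(lambda: {"packets": 0, "bytes": 0, "__count": 0})
    for row in rows:
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        # Truncate the timestamp to the minute; dimensions complete the grouping key.
        key = (ts.strftime("%Y-%m-%dT%H:%M"), row["srcIP"], row["dstIP"])
        bucket = buckets[key]
        bucket["packets"] += row["packets"]  # assumed aggregation: sum
        bucket["bytes"] += row["bytes"]      # assumed aggregation: sum
        bucket["__count"] += 1               # number of source rows rolled up
    return dict(buckets)


for key, measures in rollup_by_minute(EVENTS).items():
    print(key, measures)
```

Under these assumptions, the seven input rows collapse into three. For example, the 2018-01-01T01:01 row holds 286 packets and 35937 bytes with a __count of 3.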
Limitations
The following restrictions apply to aggregate tables:
- Rollup is set for aggregate tables only. Tables are either aggregate or detail at creation. Once a table is created, you cannot change its type.
- Once you add data to an aggregate table and specify its rollup granularity, you can only make the granularity coarser, for example, Minute to Hour. Polaris makes the granularity change during compaction. If an aggregate table does not contain data and there is not an active ingestion job associated with the table, you can change the rollup granularity to a finer granularity, for example, Hour to Minute.
- Polaris does not support rollup for nested data because this data is generally high cardinality. If you have nested data, either flatten it into dimensions at ingestion time using JSON_VALUE or ingest the data into a string-typed column.
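The two workarounds for nested data can be pictured with a small Python sketch. It is a conceptual analogy only and does not show JSON_VALUE syntax or any Polaris API. The nested event, its field names, and the helper `extract_path` are all hypothetical.

```python
import json


def extract_path(record, path):
    """Return the value at a dotted path such as 'flow.srcIP', or None if absent."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical nested event; the field names are illustrative only.
event = {
    "timestamp": "2018-01-01T01:01:35Z",
    "flow": {"srcIP": "1.1.1.1", "dstIP": "2.2.2.2"},
    "packets": 20,
}

# Option 1: flatten the nested values into top-level dimensions.
flattened = {
    "timestamp": event["timestamp"],
    "srcIP": extract_path(event, "flow.srcIP"),
    "dstIP": extract_path(event, "flow.dstIP"),
    "packets": event["packets"],
}

# Option 2: keep the nested object, serialized into a single string column.
as_string = {**event, "flow": json.dumps(event["flow"], sort_keys=True)}

print(flattened)
print(as_string)
```

Flattening yields ordinary dimensions that can take part in rollup, while the string column preserves the nested structure as opaque text.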
Learn more
See the following topics for more information:
- Table schema for the different data types for Polaris columns.
- Tables v2 API for reference on working with tables in Polaris.
- Ingestion sources overview for the available sources for ingestion in Polaris.