
Time series functions (alpha)

Time series functions are an alpha feature that should be considered preview and subject to change or removal at any time. Alpha features are provided "as is" and are not subject to Imply SLAs.

Time series functions operate on data recorded over a period of time. You can use time series functions to analyze time series data, identify trends and seasonality, interpolate values, and load extra time periods to fill in boundary values.

All examples in this topic use the kttm-v2-2019-08-25.json.gz dataset bundled with the Imply distribution of Apache Druid. To load this dataset, open the Druid console and click Load data > Batch-SQL > Example data. Select KoalasToTheMax one day from the list of datasets.

Setup

Time series functions are disabled by default. Enable this feature by loading the imply-timeseries extension.

To load the extension, go to the cluster's Setup page and expand the Advanced config section. Click Add custom extension and enter imply-timeseries in the Name field. You do not need to provide the URL or the path to the extension file because it is included with the Imply distribution of Druid. See Imply bundled extensions for more information.
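If you manage a self-hosted deployment through configuration files instead of the cluster's Setup page, a minimal sketch of the equivalent change, assuming the standard druid.extensions.loadList property in common.runtime.properties (keep any extensions you already load in the list):

druid.extensions.loadList=["imply-timeseries"]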

Time series aggregator

The TIMESERIES aggregation function generates a binary representation of the time series from raw data.

TIMESERIES(timeColumn, dataColumn, window [, maxEntries])
Creates a time series from raw data.
  • timeColumn specifies the name of the column that contains the timestamps for the time series being created. timeColumn can be a numeric column or a time series column.
  • dataColumn specifies the name of the column that contains the data points for the time series being created.
  • window is an ISO 8601 interval that restricts the time range over which the time series is created.
  • maxEntries controls the maximum number of entries allowed in the time series at any point in time. maxEntries defaults to 7200.

To convert the binary output to JSON format, wrap the outermost function with the TIMESERIES_TO_JSON function.

Example

The following query produces a time series using the raw data values stored in the table named kttm-v2-2019-08-25. The resulting output is in JSON format.

SELECT count(*) as total_count,
TIMESERIES_TO_JSON(TIMESERIES("__time", "session_length", '2019-08-25T00:00:00.031Z/2019-08-25T00:05:46.517Z', 203000)) as timeseries_json
FROM "kttm-v2-2019-08-25"
The output resembles the following, truncated for brevity:
{
"total_count": 465346,
"timeseries_json": {
"window":"2019-08-25T00:00:00.031Z/2019-08-25T00:05:46.517Z",
"timestamps":[1566691200031,1566691200059,1566691200178,1566691200965,1566691201241,1566691201858,1566691202501,1566691202525,1566691202688,1566691203546,1566691204014,1566691205170,1566691205223,1566691205379,1566691205833,1566691206205,1566691206208,1566691206250,1566691206550,1566691206977,1566691207544,1566691207587,1566691207611,1566691207747,
...
1566691544969,1566691545010,1566691545586,1566691546054,1566691546361],
"dataPoints":[76261.0,252689.0,1753602.0,62093.0,79764.0,6264.0,8070.0,8078.0,49275.0,82064.0,31420.0,15024.0,17501.0,52173.0,149.0,52950.0,10613.0,82476.0,19424.0,19253.0,1376342.0,68713.0,315220.0,315250.0,102294.0,13215.0,21133.0,56008.0,198372.0,56549.0,144642.0,53529.0,49.0,39488.0,57443.0,1070453.0,2487.0,2302.0,17963.0,60384.0,2307.0,43
...
1401934.0,30915.0,420991.0,278.0,6217.0,37563.0,2.026504E7,1028.0],
"bucketMillis":1,
"bounds":{
"start":{"timestamp":null,"data":null},
"end":{"timestamp":1566691546517,"data":91579.0}
}
}
}

The output is a time series object with the following properties:

  • window: The ISO 8601 time interval for which the time series is created.
  • timestamps: The list of sorted timestamps.
  • dataPoints: The list of data points associated with the timestamps.
  • bucketMillis: The time interval in milliseconds that determines how to segment the data.
  • bounds: If the overall query time interval is greater than the aggregator window, the bounds object contains the closest data points to the window on each end.
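For example, the following sketch filters the query to a wider time range than the aggregator window so that the bounds object can pick up the nearest data points on either side of the window. The filter values are illustrative, and the exact bounds returned depend on the data.

SELECT TIMESERIES_TO_JSON(TIMESERIES("__time", "session_length", '2019-08-25T00:01:00.000Z/2019-08-25T00:04:00.000Z', 203000)) as timeseries_json
FROM "kttm-v2-2019-08-25"
WHERE "__time" >= TIMESTAMP '2019-08-25 00:00:00' AND "__time" < TIMESTAMP '2019-08-25 00:05:00'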

Time series processing functions

You can use time series processing functions to transform or augment a time series. You can apply these functions to a time series repeatedly and in any suitable order.

ADD_TIMESERIES(timeseries1, timeseries2 [, shouldNullPoison])
Adds the data points of the two time series and returns a new time series. The two time series must have identical timestamps.
shouldNullPoison sets the output to null if either input time series is null. shouldNullPoison defaults to false.

DELTA_TIMESERIES(timeseries [, bucketPeriod])
Returns a new time series containing the delta differences between consecutive data points of the input time series. You can use DELTA_TIMESERIES to analyze counter data (an always increasing numeric metric) and determine the rate at which the counter increases over time.
bucketPeriod is an ISO 8601 period over which the deltas are added up (folded) within each time bucket. bucketPeriod defaults to 1 millisecond.

DIVIDE_TIMESERIES(timeseries1, timeseries2 [, shouldNullPoison])
Divides the data points of the first time series by the data points of the second and returns a new time series. The two time series must have identical timestamps. If the denominator data point in any division is 0, the corresponding data point in the result is NaN.
shouldNullPoison defaults to false, in which case the function returns a non-null time series if the first time series is non-null and the second is null. When true, shouldNullPoison sets the output to null if either input time series is null.

MAX_OVER_TIMESERIES(timeseries)
Returns the largest data point value in the time series.

MULTIPLY_TIMESERIES(timeseries1, timeseries2)
Multiplies the data points of the two time series and returns a new time series. The two time series must have identical timestamps.

SUBTRACT_TIMESERIES(timeseries1, timeseries2 [, shouldNullPoison])
Subtracts the data points of the second time series from the data points of the first and returns a new time series. The two time series must have identical timestamps.
shouldNullPoison defaults to false, in which case the function returns a non-null time series if the first time series is non-null and the second is null. When true, shouldNullPoison sets the output to null if either input time series is null.

SUM_TIMESERIES(timeseries, window [, maxEntries])
Returns the sum of the data points in a list of time series. All of the time series in the list must have identical timestamps.
window is an ISO 8601 interval that specifies the time range to consider.
maxEntries controls the maximum number of entries allowed in the time series at any point in time. maxEntries defaults to 260000.

TIMESERIES_TO_JSON(timeseries)
Outputs the time series in JSON format.
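Because each processing function returns a time series, you can chain them. As a sketch, the following query combines two of the functions above to return the largest delta in the example dataset; the result value depends on the data and is not shown here.

SELECT count(*) as total_count,
MAX_OVER_TIMESERIES(DELTA_TIMESERIES(TIMESERIES("__time", "session_length", '2019-08-25T00:00:00.031Z/2019-08-25T00:05:46.517Z', 203000))) as largest_delta
FROM "kttm-v2-2019-08-25"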

Interpolation

Interpolation lets you estimate the values of missing data points at regular intervals in the time series.

The supported interpolation methods include:

  • Linear: Uses linear interpolation to fill the data points for missing timestamps.
  • Padding: Carries forward the last seen value in the time series to the missing timestamps.
  • Backfill: Carries the next seen value in the time series backward to the missing timestamps.

You define the regular time intervals by segmenting time since the Unix epoch (00:00:00 UTC on January 1, 1970) using the bucketPeriod parameter. bucketPeriod is an ISO 8601 period that determines the bucket size into which time is chunked when calculating interpolation points.

The following table lists the interpolation functions that you can apply to the time series object.

LINEAR_INTERPOLATION(timeseries, bucketPeriod)
Creates a time series from the given time series object, using linear interpolation for missing data points.

PADDING_INTERPOLATION(timeseries, bucketPeriod)
Creates a time series from the given time series object, using padding interpolation for missing data points.

BACKFILL_INTERPOLATION(timeseries, bucketPeriod)
Creates a time series from the given time series object, using backfill interpolation for missing data points.
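Interpolation functions return a time series, so to inspect the interpolated values you can wrap the call in TIMESERIES_TO_JSON, as in this sketch:

SELECT TIMESERIES_TO_JSON(LINEAR_INTERPOLATION(TIMESERIES("__time", "session_length", '2019-08-25T00:00:00.031Z/2019-08-25T00:05:46.517Z', 203000), 'PT1M')) as interpolated_json
FROM "kttm-v2-2019-08-25"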

Time-weighted average

You can calculate time-weighted averages of data points within each regular time interval. You define the regular time intervals by segmenting time since the Unix epoch (00:00:00 UTC on January 1, 1970) using the bucketPeriod parameter. Time intervals that have no data points associated with them are not included in the output.

TIME_WEIGHTED_AVERAGE(timeseries, interpolator, bucketPeriod)
Creates a time series from the given time series object, with the time-weighted average for each bucket.
interpolator specifies the interpolation strategy used to fill in missing edge points for time intervals. interpolator accepts the following values: linear, padding, or backfill. See Interpolation for additional details.
bucketPeriod is an ISO 8601 period that determines the bucket size used to calculate the time-weighted average.

Example queries

This section contains examples of time series processing functions.

The following query creates a time series that contains delta differences between the data points:

SELECT count(*) as total_count,
TIMESERIES_TO_JSON(DELTA_TIMESERIES(TIMESERIES("__time", "session_length", '2019-08-25T00:00:00.031Z/2019-08-25T00:05:46.517Z', 203000))) as delta
FROM "kttm-v2-2019-08-25"
The output resembles the following, truncated for brevity:
{
"total_count":465346,
"delta": {
"window":"2019-08-25T00:00:00.031Z/2019-08-25T00:05:46.517Z",
"timestamps":[1566691200031,1566691200059,1566691200178,1566691201241,1566691202501,1566691202525,1566691202688,1566691203546,1566691205223,1566691205379,1566691206205,1566691206250,1566691207544,1566691207611,1566691207747,1566691208852,1566691209424,1566691209643,1566691211161,1566691212085,1566691212106,1566691212136,1566691213559,1566691213598,
...
1566691543580,1566691544324,1566691545010,1566691545586,1566691546054],
"dataPoints":[176428.0,1500913.0,17671.0,1806.0,8.0,41197.0,32789.0,2477.0,34672.0,52801.0,71863.0,1357089.0,246507.0,30.0,7918.0,34875.0,142364.0,88093.0,39439.0,17955.0,1013010.0,15661.0,42421.0,131493.0,49737.0,29011.0,19682.0,36958.0,57481.0,106319.0,208682.0,53522.0,5447.0,64676.0,91347.0,47892.0,19609.0,142112.0,52714.0,33778.0,37210.0,15904.0,35711.0,179455.0,101275.0,147328.0,18283.0,8710.0,113181.0,
...
9715883E7,33.0,9552.0,96944.0,437809.0,1366714.0,390076.0,5939.0,31346.0,2.0227477E7,90551.0],
"bucketMillis":1,
"bounds":{
"start":{"timestamp":null,"data":null},
"end":{"timestamp":null,"data":null}
}
}
}

The following query returns the largest data point value in the time series:

SELECT count(*) as total_count,
MAX_OVER_TIMESERIES(TIMESERIES("__time", "session_length", '2019-08-25T00:00:00.031Z/2019-08-25T00:05:46.517Z', 203000)) as largest_data_point
FROM "kttm-v2-2019-08-25"
The output resembles the following:
{
"total_count":465346,
"largest_data_point":159900372
}

The following query creates a time series with linear interpolation for missing data points at equal intervals of one minute:

SELECT count(*) as total_count,
LINEAR_INTERPOLATION(TIMESERIES("__time", "session_length", '2019-08-25T00:00:00.031Z/2019-08-25T00:05:46.517Z', 203000), 'PT1M') as timeseries
FROM "kttm-v2-2019-08-25"

The following query creates a time series with padding interpolation for missing data points at equal intervals of one hour:

SELECT count(*) as total_count,
PADDING_INTERPOLATION(TIMESERIES("__time", "session_length", '2019-08-25T00:00:00.031Z/2019-08-25T00:05:46.517Z', 203000), 'PT1H') as timeseries
FROM "kttm-v2-2019-08-25"

The following query creates a time series with the time-weighted average for each bucket:

SELECT count(*) as total_count,
TIME_WEIGHTED_AVERAGE(TIMESERIES("__time", "session_length", '2019-08-25T00:00:00.031Z/2019-08-25T00:05:46.517Z', 203000), 'backfill', 'PT1M') as timeseries
FROM "kttm-v2-2019-08-25"