Measures

Measures are aggregations applied to every segment in your data. They can you an overview of the segment, and help distinguish important segments from the less important ones. The measures live in the measure panel.

measure panel with at least one collapsed and one expanded group

You can edit them in the Measures tab of the data cube edit view:

edit measures

Creating measures

Measures can be created by clicking the Add button in the Measure tab.

Simple measures consisting of a single aggregate function over a single column (with an optional filter) can be configured from the Simple measure tab.

Cube edit new measure modal

A measure can also represent some post aggregation or computation (see the specific measure types section below). In that case you would use the Custom tab where you can enter any supported Plywood expression as the measure's formula.

The measure's formula assumes that $main is the Plywood reference to the data that the measure will be aggregating over.

Cube edit new measure modal

Imply can also provide some suggestions about simple measures that you might want to add. This is done by scanning the schema of the underlying data source and automatically suggesting a measure, with the appropriate aggregate, for any column that is not already represented by one of the existing measures.

Cube edit new measure modal

This is particularly useful if you have added a new column to your data source after creating the data cube and now want to represent it in the views.

Measure groups

You can group related measures into measure groups. This can be particularly useful for measures which come from the same basic attribute or are otherwise similar.

For example in a sales dataset scenario you might be interested in Total revenue, Min revenue, Max revenue, Avg revenue, Median revenue per user, and more. It might therefore make sense to put those measures in a group.

To create a group simply click the ... icon in one of the measures and select Add to new group. Then drag the relevant measures into the newly created group.

Measure formatting

You can configure how a measure is formatted from the Advanced tab of the Edit measure dialog.

Cube edit new measure modal

The Format field is used to specify the numeraljs formatting string that will be used to format this measure.

You can specify the format to be for regular numbers, currency, bytes, percentages, and scientific notation. You can also adjust the number of decimal places that you will see in the numbers. For example to see 3 decimal places (with abbreviation) you should set the format to: 0.000 a. For more examples of possible formatting entries see the example formatting table.

Measure transformations

You can transform a measure to be displayed as a 'Percent of parent split' or as 'Percent of total' instead of the default measure display.

Cube edit new measure modal custom transform

Measure fill options

You can specify how a measure should be filled on continuous visualizations (such as the line chart).

measure fill options

There are four possible fill options demonstrated above by four measures that are configured the same with the exception of different fill options.

The possible options are:

Specific measure types

In this section we will look at some of the many specific measure types supported.

Filtered aggregations

Filtered aggregations are very powerful. If, for example, your revenue in the US is a very important measure, you could express it as:

$main.filter($country == 'United States').sum($revenue)

It is also common to express a ratio of something filtered vs unfiltered.

$main.filter($statusCode == 500).sum($requests) / $main.sum($requests)

Ratios

Ratios are often useful to see the relationships in your data.

Here's one that expresses the computation of CPM:

$main.sum($revenue) / $main.sum($impressions) * 1000

Quantiles

A quantile can be a very useful measure or your data. For large datasets it is often more informative to look at the 98th or 99th quantile of the data rather than the max as it would filter out freak values. Similarly a median (or 50% quantile) can present different information than an average measure.

To add a quantile measure to your data simply define a formula like so:

$main.quantile($revenue, 0.98)

It is possible to fine-tune approximateHistogram based quantiles, allowing you to determine your trade-off between performance and accuracy.

Enter a 3rd parameter in the quantile formula of the form 'resolution=400,numBuckets=10,lowerLimit=0,upperLimit=1000' to pass those tuning parameters to the underlying aggregator.

To understand how to tune the approximateHistogram parameters check out the Druid documentation

Rates

Sometime the rate of change (over time) of a measure is very important.

There is a magic constant available for expressions $MillisecondsInInterval that corresponds to the number of milliseconds in the time interval over which the aggregation (whatever it is) is running.

Therefore it is possible to define a measure like

$main.sum($bytes) / $MillisecondsInInterval * 1000

to give you the accurate rate of bytes per second regardless of your filter window or selected split (hour, minute, or whatever).

Double aggregation measures

It is possible to define measures that perform a sub-split and a sub aggregation as part of their overall aggregation. This is needed to express certain calculations otherwise would not be possible.

The general form of a double aggregated measure's formula is:

$main.split(SUB_SPLIT_EXPRESSION).apply('V', $main.INNER_AGGREGATE(...)).OUTER_AGGREGATE($V)

Where:

Two examples of double aggregated measures are provided:

Netflow 95th percentile billing

When dealing with netflow data at an ISP level one metric of interest is 5 minutely 95th percentile which is used for burstable billing.

To add that as a measure (assuming there is a column called bytes that represent the bytes transferred during the interval) set the formula as follows:

$main.split($__time.timeBucket(PT5M)).apply('B', ($main.sum($bytes) * 8) / 300).quantile($B, 0.95)

Here the data is sub-split on 5 minute buckets (PT5M), then aggregated to calculate the bitrate (8 is the number of bit in a byte and 300 is the number of seconds in 5 minutes). Those inner 5 minute bitrates are then aggregated with the 95th percentile.

Average daily active users

When looking at users engaging with your website, service, or app it is often needed to know the number of daily active users.

This can be calculated as:

$main.split($__time.timeBucket(P1D)).apply('U', $main.countDistinct($user)).average($U)

Here the data is sub-split by day (P1D) and then an average is computed on top of that.

Switching metric columns

If you switch how you ingest your underlying metric and can't (or do not want to) recalculate all of the previous data, you could use a derived measure to seamlessly merge these two metrics in the UI.

Let's say you had a metric called revenue_in_dollars and for some reason you will now be ingesting it as revenue_in_cents.

Furthermore right now your users are using the following measure:

$main.sum($revenue_in_dollars)

If your data had a 'clean break' where all events have either revenue_in_dollars or revenue_in_cents with no overlap, you could use:

$main.sum($revenue_in_dollars) + $main.sum($revenue_in_cents) / 100

If instead there was a period where you were ingesting both metrics then the above solution would double count that interval. You can 'splice' these two metrics together at a specific time point.

Logically you should be able leverage the Filtered aggregations to do:

$main.filter(__time < '2016-04-04T00:00:00Z').sum($revenue_in_dollars) + $main.filter('2016-04-04T00:00:00Z' <= __time).sum($revenue_in_cents) / 100

Custom aggregations

Within the measures you have access to the full power of the Plywood expressions. If you ever find yourself needing to go beyond the expressive potential of Plywood you could define your own custom aggregations. The aggregation could be any supported Druid aggregation.

For example Plywood currently does not support the modulo operator. While Druid has no native modulo support either, it is possible to modulo a measure by using a javascript aggregator.

To do so in the data cube options (Advanced tab of the edit view) define:

{
  "customAggregations": {
    "addedMod1337": {
      "aggregation": {
        "type": "javascript",
        "fieldNames": ["added"],
        "fnAggregate": "function(current, added) { return (current + added) % 1337 }",
        "fnCombine": "function(partialA, partialB) { return (partialA + partialB) % 1337 }",
        "fnReset": "function() { return 0; }"
      }
    }
  }
}

Then reference addedMod1337 in a measure's formula like so:

$main.customAggregate('addedMod1337')

This functionality can be used to access any custom aggregations that might be loaded via extensions.

Overview

Deploy

Manage Data

Query Data

Visualize

Configure

Misc