Dimensions are the primary concept when exploring your data. A dimension represents some quality that can distinguish one part of your data from another.
Clicking on a dimension brings up the dimension preview menu which, as well as providing buttons to perform the above operations also, tells you the estimated number of values in the dimension.
The dimensions can be edited in the
Dimensions tab of the data cube edit view:
Dimensions can be created by clicking the
Add button in the
A dimension that is just a direct representation of a column can be configured from the
Simple dimension tab.
A dimension can also represent some arbitrary transformation (see the dimension types section below)
in that case you would use the
Custom tab where you can enter any supported Plywood
expression as the dimension's formula.
Pivot can also provide suggestions about simple dimensions you might want to add. This is done by scanning the underlying data source's schema and automatically suggesting a dimension for any column that is not already represented.
Pivot provides the ability to group dimensions. This can be particularly useful for dimensions which come from the same basic attribute or are related in another way.
For example in a web traffic dataset scenario a lot of different dimensions (such as os, browser name, browser version, etc.) could all be derived from the user agent. It might therefore make sense to put those dimensions in a group.
To create a group, click the
... icon in one of the dimensions and select
Add to new group.
To add a dimension to an existing group, drag the dimension into the group.
ToDo: add screenshot of dimension group menu
In this section we will look at some of the many specific dimensions Pivot supports.
While Druid has a primary time column (called
__time) that is used for partitioning, it's often useful to have more than just one time column.
These time columns can be ingested as regular dimensions formatted in ISO string format (
Time from the type to make a time dimension, you can also configure the preset split granularities that will be presented as defaults.
Most dimensions are categorical, in the sense that they are represented as strings.
String from the type to make a string dimension.
Numeric dimensions can be bucketed during split to create histograms.
Number from the type to make a numeric dimension, you can also configure the preset split bucketing granularities that will be presented as defaults.
Geographic units can be a useful way to segment data.
You might need to create dimensions that are the result of some Boolean expression. Let's say that you are responsible for all accounts in the United States as well as some specific account. You could create a dimension with a formula like:
$country == 'United States' or $accountName.in(['Toyota', 'Honda'])
Here are some examples of common dimensions patterns
In Pivot and Druid is is possible to create dimensions that performs a lookup at query time.
ToDo: screenshot of a dimension with a lookup
If you have a dimension that represents a key into some other table, you may have set up a Druid query-time lookup (QTL)
in which case you would set the formula to
$lookupKey.lookup('my_awesome_lookup'), which would apply the lookup.
You can also apply the
.fallback() action as ether:
$lookupKey.lookup('my_awesome_lookup').fallback($lookupKey)to keep values that were not found as they are.
$lookupKey.lookup('my_awesome_lookup').fallback('missing')to map missing values to the word 'missing'.
A URL dimension is a dimension that somehow corresponds to a URL. It is possible to specify the URL mapping in Pivot.
Imagine you have a column
resourceName which has the following values:
druid-0.8.2 druid-0.8.1 druid-0.7.0 index.html
You could create a dimension that uses the
.extract function to focus on the version number in the column values:
Which would have values:
0.8.2 0.8.1 0.7.0 null
If no existing plywood function meets your needs, you could also define your own custom transformation. The transformation could be any supported Druid extraction function.
To do so in the data cube options (
Advanced tab of the edit view) define:
Then in the dimensions simply reference
stringFun like so: