Dimensions are the primary concept when exploring your data. A dimension represents some quality that can distinguish one part of your data from another.
Clicking on a dimension brings up the dimension preview menu which, as well as providing buttons to perform the above operations, also tells you the estimated number of values in the dimension.
Dimensions can be edited in the Dimensions tab of the data cube edit view.
Dimensions can be created by clicking the Add button.
A dimension that is just a direct representation of a column can be configured from the Simple dimension tab.
A dimension can also represent an arbitrary transformation (see the dimension types section below). In that case, use the Custom tab, where you can enter any supported Plywood expression as the dimension's formula.
You can use the suggestion feature to rapidly add many new dimensions. This is done by scanning the underlying dataset's schema and automatically suggesting a dimension for any column that is not already represented.
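Conceptually, the suggestion step is a set difference between the columns in the schema and the columns that existing dimensions already reference. The following is a minimal sketch of that idea, not the product's implementation; the function name suggest_dimensions, the dictionary shapes, and the sample columns are all assumptions made for illustration.

```python
def suggest_dimensions(schema_columns, existing_dimensions):
    """Suggest one simple dimension per column that is not yet represented."""
    # Columns already referenced by a simple "$column" formula (assumed shape).
    covered = {d["formula"].lstrip("$") for d in existing_dimensions}
    return [
        {"name": col, "formula": f"${col}"}
        for col in schema_columns
        if col not in covered
    ]


# Hypothetical schema and existing dimensions.
schema = ["country", "browser", "os"]
dims = [{"name": "Country", "formula": "$country"}]
suggestions = suggest_dimensions(schema, dims)
print(suggestions)
```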
You can group dimensions together into dimension groups. This can be particularly useful for dimensions which come from the same basic attribute or are related in another way.
For example, in a web traffic dataset, many different dimensions (such as OS, browser name, and browser version) could all be derived from the user agent. It might therefore make sense to put those dimensions in a group.
To create a group, click the ... icon in one of the dimensions and select Add to new group.
To add a dimension to an existing group, drag the dimension into the group.
In this section we will look at some of the many specific types of dimensions that can be created.
While Druid has a primary time column (called __time) that is used for partitioning, it's often useful to have more than just one time column.
These time columns can be ingested as regular dimensions formatted as ISO strings.
Select Time as the type to make a time dimension. You can also configure the preset granularities that will be presented as defaults.
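To illustrate what a granularity does to an ISO-formatted time value, here is a minimal sketch that parses a string and floors it to the hour. The helper name floor_to_hour and the sample timestamp are assumptions for illustration; the actual bucketing happens at query time and is not shown here.

```python
from datetime import datetime, timezone


def floor_to_hour(iso_string):
    """Parse an ISO 8601 string and truncate it to the start of its hour (UTC)."""
    # Older Python versions reject a trailing "Z", so normalize it to an offset.
    ts = datetime.fromisoformat(iso_string.replace("Z", "+00:00"))
    return ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


# Hypothetical value from a secondary time column.
bucket = floor_to_hour("2016-04-20T16:47:13Z")
print(bucket.isoformat())
```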
Most dimensions are categorical, in the sense that they are represented as strings.
Select String as the type to make a string dimension.
Numeric dimensions can be bucketed to create histograms.
Select Number as the type to make a numeric dimension. You can also configure the preset bucketing granularities that will be presented as defaults.
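As a rough picture of numeric bucketing, the sketch below assigns each value to a fixed-width bucket and counts the values per bucket, which is the data behind a histogram. The bucket size of 10, the helper name bucket_start, and the sample values are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bucket_start(value, size):
    """Return the lower bound of the fixed-width bucket that contains value."""
    return math.floor(value / size) * size


# Hypothetical numeric column values, bucketed with a granularity of 10.
values = [3, 7, 12, 18, 25, 29]
histogram = Counter(bucket_start(v, 10) for v in values)
print(sorted(histogram.items()))
```

A smaller bucket size gives a finer histogram at the cost of more buckets.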
Geographic units can be a useful way to segment data.
You might need to create dimensions that are the result of some Boolean expression. Let's say that you are responsible for all accounts in the United States as well as some specific accounts. You could create a dimension with a formula like:
$country == 'United States' or $accountName.in(['Toyota', 'Honda'])
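The formula above yields true or false for every row. The following sketch mirrors the same condition in Python to show how rows would be split; the function name is_my_account and the sample rows are hypothetical and exist only for illustration.

```python
def is_my_account(row):
    """Mirror of the Boolean formula: US accounts, or one of two named accounts."""
    return row["country"] == "United States" or row["accountName"] in ("Toyota", "Honda")


# Hypothetical rows, not taken from any real dataset.
rows = [
    {"country": "United States", "accountName": "Ford"},
    {"country": "Japan", "accountName": "Toyota"},
    {"country": "Germany", "accountName": "BMW"},
]
flags = [is_my_account(r) for r in rows]
print(flags)
```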
Here are some examples of common dimension patterns.
It is possible to create dimensions that perform a lookup at query time.
If you have a dimension that represents a key into some other table, you may have set up a Druid query-time lookup (QTL), in which case you would set the formula to $lookupKey.lookup('my_awesome_lookup'), which would apply the lookup.
You can also apply the .fallback() action as either:
$lookupKey.lookup('my_awesome_lookup').fallback($lookupKey) to keep values that were not found as they are.
$lookupKey.lookup('my_awesome_lookup').fallback('missing') to map missing values to the word 'missing'.
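The difference between the two fallback forms can be sketched with a plain dictionary standing in for the lookup table. This only models the semantics described above; the contents of my_awesome_lookup and the helper names lookup and fallback are assumptions for illustration.

```python
def lookup(table, key):
    """Return the mapped value, or None when the key is not in the table."""
    return table.get(key)


def fallback(value, default):
    """Replace a missed lookup (None) with the given default."""
    return default if value is None else value


# Hypothetical lookup contents.
my_awesome_lookup = {"US": "United States", "FR": "France"}
keys = ["US", "FR", "ZZ"]

# Equivalent of .fallback($lookupKey): keep the original key when not found.
keep_original = [fallback(lookup(my_awesome_lookup, k), k) for k in keys]
# Equivalent of .fallback('missing'): map misses to a fixed word.
mark_missing = [fallback(lookup(my_awesome_lookup, k), "missing") for k in keys]
print(keep_original, mark_missing)
```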
A URL dimension is a dimension whose values correspond to URLs.
Imagine you have a column resourceName which has the following values:
druid-0.8.2
druid-0.8.1
druid-0.7.0
index.html
You could create a dimension that uses the .extract function to focus on the version number in the column values. The resulting dimension would have the values:
0.8.2
0.8.1
0.7.0
null
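Regular-expression extraction of this kind can be sketched in Python as follows. The pattern below is an assumption that happens to reproduce the values listed above; it is not necessarily the expression used in the original formula, and the helper name extract is illustrative.

```python
import re

# Assumed pattern: three dot-separated groups of digits, captured as one group.
VERSION = re.compile(r"(\d+\.\d+\.\d+)")


def extract(value, pattern=VERSION):
    """Return the first captured group, or None when the pattern does not match."""
    match = pattern.search(value)
    return match.group(1) if match else None


resource_names = ["druid-0.8.2", "druid-0.8.1", "druid-0.7.0", "index.html"]
versions = [extract(v) for v in resource_names]
print(versions)
```

Values that do not match the pattern, such as index.html, become null (None in this sketch).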
If no existing Plywood function meets your needs, you can also define your own custom transformation. The transformation can be any supported Druid extraction function.
To do so, define the transformation under a name, such as stringFun, in the data cube options (Advanced tab of the edit view). Then reference stringFun in the dimension's formula.
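A custom transformation is, in essence, a function applied to each dimension value. The sketch below models that idea in Python only; it is not the configuration syntax, and the behavior chosen for string_fun (URL-decoding and trimming) is a hypothetical stand-in for whatever extraction function you configure.

```python
from urllib.parse import unquote


def string_fun(value):
    """Hypothetical transformation: URL-decode the value and trim whitespace."""
    if value is None:
        return None  # Pass nulls through unchanged.
    return unquote(value).strip()


# Hypothetical raw values for the dimension being transformed.
raw_values = ["%20United%20States", "France", None]
transformed = [string_fun(v) for v in raw_values]
print(transformed)
```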