Dimensions are the primary concept when exploring your data. A dimension represents some quality that can distinguish one part of your data from another.
Clicking a dimension opens the dimension preview menu, which provides buttons for the operations described above and also shows the estimated number of values in the dimension.
Dimensions can be edited in the Dimensions tab of the data cube edit view.
Dimensions can be created by clicking the New dimension button.
A dimension that is just a direct representation of a column can be configured from the Simple dimension tab.
A dimension can also represent an arbitrary transformation (see the dimension types section below), in which case you would use the Custom dimension tab, where you can enter any supported Plywood expression as the dimension's formula.
You can use the Suggestions feature to rapidly add many new dimensions.
It scans the underlying dataset's schema and automatically suggests a dimension for any column that is not already represented.
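The core of the Suggestions idea is a set difference between the schema's columns and the columns your dimensions already use. The Python sketch below is a hypothetical illustration of that, not the tool's implementation: it assumes a column counts as "represented" when some existing formula references it as $column, and the schema and formulas are made up.

```python
import re

def referenced_columns(formulas):
    """Collect column names referenced as $name in existing dimension formulas."""
    cols = set()
    for formula in formulas:
        cols.update(re.findall(r"\$(\w+)", formula))
    return cols

def suggest_dimensions(schema_columns, existing_formulas):
    """Suggest one simple '$column' formula per column that no formula references yet."""
    used = referenced_columns(existing_formulas)
    return [f"${col}" for col in schema_columns if col not in used]

# Hypothetical schema and existing dimension formulas.
schema = ["country", "city", "browser", "os"]
existing = ["$country", "$browser.lookup('browser_names')"]
print(suggest_dimensions(schema, existing))  # ['$city', '$os']
```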
You can group dimensions together into dimension groups. This is particularly useful for dimensions that come from the same basic attribute or are related in some other way.
For example, in a web traffic dataset, many dimensions (such as OS, browser name, and browser version) could all be derived from the user agent, so it might make sense to put those dimensions in one group.
To create a group, click the ... icon on one of the dimensions and select Add to a new group.
To add a dimension to an existing group, drag the dimension into the group.
In this section, we look at some of the specific types of dimensions that can be created.
While Druid has a primary time column (called __time) that is used for partitioning, it is often useful to have more than one time column.
These additional time columns can be ingested as regular dimensions formatted as ISO 8601 strings. Select Time as the type to make a time dimension. You can also configure the preset granularities that will be presented as defaults.
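A preset granularity determines how timestamps are grouped: each value is floored to the start of its hour, day, month, and so on. Here is a minimal Python sketch of that idea. The granularity names (PT1H, P1D, P1M) and the GRANULARITIES table are hypothetical stand-ins for whatever presets you configure, and the sketch buckets in the offset carried by the string itself, whereas the real tool may bucket in a configured time zone.

```python
from datetime import datetime

# Hypothetical preset granularities, each mapped to the fields it resets.
GRANULARITIES = {
    "PT1H": dict(minute=0, second=0, microsecond=0),
    "P1D": dict(hour=0, minute=0, second=0, microsecond=0),
    "P1M": dict(day=1, hour=0, minute=0, second=0, microsecond=0),
}

def bucket_time(iso_string, granularity):
    """Parse an ISO 8601 string and floor it to the start of its bucket."""
    ts = datetime.fromisoformat(iso_string)
    return ts.replace(**GRANULARITIES[granularity])

print(bucket_time("2015-09-12T23:47:05+00:00", "P1D").isoformat())
# → 2015-09-12T00:00:00+00:00
```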
Most dimensions are categorical, in the sense that they are represented as strings.
Select String as the type to make a string dimension.
Numeric dimensions can be bucketed during setup to create histograms.
Select Number as the type to make a numeric dimension. You can also configure the preset bucketing granularities that will be presented as defaults.
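Bucketing a numeric dimension amounts to mapping each value to the lower edge of a fixed-width bucket and counting the values in each bucket. The Python sketch below shows that idea under stated assumptions: the bucket size of 10 is hypothetical, buckets start at 0 unless you pass an offset, and the bucketing the tool actually applies may differ in such details.

```python
import math
from collections import Counter

def bucket_start(value, size, offset=0):
    """Return the lower edge of the bucket [start, start + size) containing value."""
    # math.floor (not int) so negative values land in the bucket below zero.
    return math.floor((value - offset) / size) * size + offset

def histogram(values, size):
    """Count how many values fall into each bucket, ordered by bucket start."""
    return dict(sorted(Counter(bucket_start(v, size) for v in values).items()))

print(histogram([3, 7, 12, 15, 19, 42], 10))
# → {0: 2, 10: 3, 40: 1}
```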
Geographic units can be a useful way to segment data.
To use the "Geo marks" or "Geo shade" visualizations, you need to be showing a single dimension configured as the "Geo" type. The underlying data must conform to one of the following specifications: ISO-3166-1 Alpha 2, ISO-3166-1 Alpha 3, ISO-3166-2, UN M49, or Geohash.
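Before configuring a Geo dimension, it can help to check that a column's values at least have the shape of one of the supported specifications. The Python sketch below is a hypothetical helper for that: the patterns test the format of a code only (two letters, three digits, the Geohash base32 alphabet, and so on), not whether the code is actually assigned in the standard.

```python
import re

# Shape-only checks; a value can match and still not be an assigned code.
GEO_PATTERNS = {
    "ISO-3166-1 Alpha 2": re.compile(r"[A-Z]{2}"),
    "ISO-3166-1 Alpha 3": re.compile(r"[A-Z]{3}"),
    "ISO-3166-2": re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}"),  # country code + subdivision
    "UN M49": re.compile(r"\d{3}"),                       # three-digit numeric code
    "Geohash": re.compile(r"[0-9b-hjkmnp-z]+"),           # base32 without a, i, l, o
}

def column_conforms(values, spec):
    """True if every non-null value fully matches the shape of the given spec."""
    pattern = GEO_PATTERNS[spec]
    return all(pattern.fullmatch(v) for v in values if v is not None)

print(column_conforms(["US", "NP", "FR"], "ISO-3166-1 Alpha 2"))  # True
print(column_conforms(["USA", "Nepal"], "ISO-3166-1 Alpha 3"))    # False
```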
You might need to create dimensions that are the result of a Boolean expression. Say that you are responsible for all accounts in the United States as well as a few specific accounts. You could create a dimension with a formula like:
$country == 'United States' or $accountName.in(['Toyota', 'Honda'])
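This formula evaluates to true or false for each row, so the dimension splits the data into two groups. The Python sketch below mirrors that logic row by row; the function name and sample rows are made up for illustration, and this is not how Plywood evaluates expressions internally.

```python
def in_my_accounts(row):
    """Mirror of: $country == 'United States' or $accountName.in(['Toyota', 'Honda'])"""
    return row["country"] == "United States" or row["accountName"] in {"Toyota", "Honda"}

# Hypothetical sample rows.
rows = [
    {"country": "United States", "accountName": "Ford"},
    {"country": "Japan", "accountName": "Toyota"},
    {"country": "Germany", "accountName": "BMW"},
]
print([in_my_accounts(r) for r in rows])  # [True, True, False]
```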
Here are some examples of common dimension patterns.
It is possible to create dimensions that perform a lookup at query time.
If you have a dimension that represents a key into some other table, you may have set up a Druid query-time lookup (QTL), in which case you would set the formula to $lookupKey.lookup('my_awesome_lookup') to apply the lookup.
You can also apply the .fallback() action in either of these forms:
$lookupKey.lookup('my_awesome_lookup').fallback($lookupKey) keeps values that were not found in the lookup as they are.
$lookupKey.lookup('my_awesome_lookup').fallback('missing') maps values that were not found in the lookup to the word 'missing'.
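The two fallback forms differ only in what replaces an unmatched key. The Python sketch below models that behavior, assuming (as the fallback examples imply) that a key not found in the lookup produces a null value. The lookup contents are hypothetical, and the helper functions only imitate the Plywood actions of the same names.

```python
# Hypothetical contents of the 'my_awesome_lookup' table.
MY_AWESOME_LOOKUP = {"us": "United States", "np": "Nepal"}

def lookup(key, table):
    """Imitates .lookup(): the mapped value, or None when the key is not found."""
    return table.get(key)

def fallback(value, default):
    """Imitates .fallback(): replace a missing (None) value with default."""
    return default if value is None else value

keys = ["us", "np", "fr"]
print([lookup(k, MY_AWESOME_LOOKUP) for k in keys])
# → ['United States', 'Nepal', None]
print([fallback(lookup(k, MY_AWESOME_LOOKUP), k) for k in keys])
# → ['United States', 'Nepal', 'fr']
print([fallback(lookup(k, MY_AWESOME_LOOKUP), "missing") for k in keys])
# → ['United States', 'Nepal', 'missing']
```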
A URL dimension is a dimension whose values can be mapped to a URL.
Within the dimension modal, you can add a URL transformation, which adds a "Go to URL" action button for this dimension.
The %s placeholder in the provided URL string is replaced with the given dimension value.
For example, with a URL string such as https://en.wikipedia.org/wiki/%s, the value Nepal would be transformed into https://en.wikipedia.org/wiki/Nepal, the Wikipedia page for Nepal.
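The interpolation step is a simple placeholder substitution. Here is a minimal Python sketch of it; the function name is hypothetical, and percent-encoding the value (so that "United States" becomes "United%20States") is a design choice of this sketch, since the text above does not say whether the tool encodes values.

```python
from urllib.parse import quote

def to_url(template, value):
    """Replace every %s in template with the percent-encoded dimension value."""
    return template.replace("%s", quote(value))

print(to_url("https://en.wikipedia.org/wiki/%s", "Nepal"))
# → https://en.wikipedia.org/wiki/Nepal
```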
Imagine you have a column resourceName with the following values:
druid-0.8.2 druid-0.8.1 druid-0.7.0 index.html
You could create a dimension that uses the .extract function to focus on the version number in the column values. This would return the following values:
0.8.2 0.8.1 0.7.0 null
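The output above is consistent with extracting the first capture group of a regular expression and returning null when nothing matches. The Python sketch below reproduces those values under that assumption; the VERSION pattern and the extract helper are hypothetical and are not the formula used in the original example.

```python
import re

# Hypothetical pattern: one capture group holding a dotted version number.
VERSION = re.compile(r"(\d+\.\d+\.\d+)")

def extract(value, pattern):
    """Return the first capture group of pattern in value, or None if it does not match."""
    match = pattern.search(value)
    return match.group(1) if match else None

values = ["druid-0.8.2", "druid-0.8.1", "druid-0.7.0", "index.html"]
print([extract(v, VERSION) for v in values])
# → ['0.8.2', '0.8.1', '0.7.0', None]
```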
If no existing Plywood function meets your needs, you can also define your own custom transformation, which can be any supported Druid extraction function.
To do so, define the following in the data cube options (in the Advanced tab of the edit view):
Then, in the dimensions, reference stringFun like so:
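For reference, the Python sketch below builds one kind of Druid extraction function, the "regex" type, as a dictionary and prints it as JSON. It is not the stringFun definition referred to above, and it does not show how the data cube's Advanced options wrap such a spec; the pattern and replacement value are made up, while the field names follow Druid's regex extraction function.

```python
import json

# A Druid "regex" extraction function spec; the values are illustrative only.
regex_extraction_fn = {
    "type": "regex",
    "expr": r"(\d+\.\d+\.\d+)",               # hypothetical pattern
    "index": 1,                               # capture group to keep
    "replaceMissingValue": True,              # substitute a value when nothing matches
    "replaceMissingValueWith": "no-version",  # hypothetical replacement
}

print(json.dumps(regex_extraction_fn, indent=2))
```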