Dimensions are the primary concept when exploring your data. A dimension represents some quality that can distinguish one part of your data from another.
Dimensions can be used to split your data into segments, and to focus on a specific segment using filters.
The dimensions can be dragged into the filter bar, show bar, visualization, and pinboard panel to aid in the exploration.
Clicking on a dimension brings up the dimension preview menu which, as well as providing buttons to perform the above operations, also tells you the estimated number of values in the dimension.
Dimensions can be edited in the Dimensions tab of the data cube edit view.
Dimensions can be created by clicking the New dimension button in the Dimensions tab.
A dimension that is just a direct representation of a column can be configured from the Basic tab.
A dimension can also represent some arbitrary transformation (see the dimension types section below), in which case you would use the Custom tab, where you can enter any supported Plywood expression as the dimension's formula.
You can use the Suggestions feature to rapidly add many new dimensions.
This is done by scanning the underlying dataset's schema and automatically suggesting a dimension for any column that is not already represented.
You can organize dimensions into groups. Groups are particularly useful for dimensions that come from the same basic attribute or are related in another way.
For example, in a web traffic dataset scenario, dimensions such as OS, browser name, and browser version could all be derived from the user agent. It might therefore make sense to put those dimensions in a group.
To create a group, click the ... icon in one of the dimensions and select Add to a new group.
To add a dimension to an existing group, drag the dimension into the group.
In this section we will look at some of the many specific types of dimensions that can be created.
While Druid has a primary time column (called __time) that is used for partitioning, it's often useful to have more than just one time column.
These time columns can be ingested as regular dimensions formatted in ISO string format (e.g., 2017-04-20T16:20:00Z).
Select Time as the type to create a time dimension. You can also configure the preset granularities that will be presented as defaults.
Most dimensions are categorical, in the sense that they are represented as strings.
Select String as the type to create a string dimension.
Numeric dimensions can be bucketed during setup to create histograms.
Select Number as the type to create a numeric dimension. You can also configure the preset bucketing granularities that will be presented as defaults.
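As a rough illustration of what bucketing means, the sketch below groups numbers into fixed-width buckets and counts them, which is the shape of data a histogram needs. It is not how Pivot or Druid implements bucketing; the function name, the sample values, and the bucket size of 10 are all assumptions made for the example.

```python
import math
from collections import Counter


def bucket_start(value, size, offset=0):
    """Return the lower bound of the fixed-width bucket containing value."""
    return math.floor((value - offset) / size) * size + offset


# Hypothetical values for a numeric column (not taken from any real dataset).
values = [3, 7, 12, 18, 25, 26, 41]

# Count how many values fall into each bucket of width 10.
histogram = Counter(bucket_start(v, 10) for v in values)

for start in sorted(histogram):
    print(f"{start} to {start + 10}: {histogram[start]}")
```

A smaller bucket size gives a finer histogram at the cost of more buckets, which is why having a few preset granularities to choose from is convenient.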
Geographic units can be a useful way to segment data.
To use the "Geo marks" or "Geo shade" visualizations, show a single dimension configured as the "Geo" type. The underlying data needs to conform to one of the following specifications: ISO-3166-1 Alpha 2, ISO-3166-1 Alpha 3, ISO-3166-2, UN M49, or Geohash.
You might need to create dimensions that are the result of some Boolean expression. Let's say that you are responsible for all accounts in the United States as well as some specific accounts elsewhere. You could create a dimension with a formula like:
$country == 'United States' or $accountName.in(['Toyota', 'Honda'])
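To make the logic of that formula concrete, here is a minimal Python sketch that evaluates the same condition row by row. The sample rows are invented for illustration, and the function name is hypothetical; in Pivot the expression is evaluated by the query engine, not by code like this.

```python
def is_my_account(row):
    """Python equivalent of the Boolean formula, applied to a single row."""
    return (
        row["country"] == "United States"
        or row["accountName"] in ("Toyota", "Honda")
    )


# Hypothetical rows; only the two columns used by the formula are shown.
rows = [
    {"country": "United States", "accountName": "Ford"},
    {"country": "Japan", "accountName": "Toyota"},
    {"country": "Germany", "accountName": "BMW"},
]

# Each row ends up in either the true segment or the false segment.
labels = [is_my_account(r) for r in rows]
print(labels)
```

The resulting dimension has only two values, true and false, so splitting or filtering on it separates your accounts from everything else.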
Here are some examples of common dimension patterns.
It is possible to create dimensions that perform a lookup at query time.
If you have a dimension that represents a key into some other table, you may have set up a Druid query-time lookup (QTL), in which case you would set the formula to $lookupKey.lookup('my_awesome_lookup'), which would apply the lookup.
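Conceptually, a lookup behaves like a key-to-value map applied to every value of the dimension. The sketch below models that with a plain dictionary; the table contents are hypothetical, and it assumes that keys without a match come back as null (None in Python).

```python
# Hypothetical contents of the lookup named 'my_awesome_lookup'.
MY_AWESOME_LOOKUP = {"us": "United States", "np": "Nepal"}


def lookup(key, table):
    """Return the mapped value for key, or None when the key is not in the table."""
    return table.get(key)


keys = ["us", "np", "zz"]
resolved = [lookup(k, MY_AWESOME_LOOKUP) for k in keys]
print(resolved)
```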
You can also apply the .fallback() action in either of these ways:
$lookupKey.lookup('my_awesome_lookup').fallback($lookupKey) to keep values that were not found as they are.
$lookupKey.lookup('my_awesome_lookup').fallback('missing') to map missing values to the word 'missing'.
A URL dimension is a dimension whose values can be mapped to a URL.
Within the dimension modal you can add a URL transformation which will add a "Go to URL" action button for this dimension.
The %s placeholder in the provided string will be replaced with the given dimension value.
In the above example the value Nepal would be transformed into https://en.wikipedia.org/wiki/Nepal, the Wikipedia page for Nepal.
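The substitution can be sketched in a few lines of Python. The template string below is an assumption inferred from the resulting URL, and the helper name is hypothetical; the sketch does not address how special characters in a value are encoded.

```python
# Assumed template, inferred from the Nepal example; %s marks the value.
URL_TEMPLATE = "https://en.wikipedia.org/wiki/%s"


def to_url(template, value):
    """Replace the %s placeholder in template with the dimension value."""
    return template.replace("%s", value)


print(to_url(URL_TEMPLATE, "Nepal"))
```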
Imagine you have a column resourceName with the following values:
druid-0.8.2
druid-0.8.1
druid-0.7.0
index.html
You could create a dimension that uses the .extract function to focus on the version number in the column values:
$resourceName.extract('(\d+\.\d+\.\d+)')
This would return the following values:
0.8.2
0.8.1
0.7.0
null
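The same extraction can be reproduced with Python's re module to check the pattern before using it in a formula. This is only an approximation of the .extract behavior described above: it returns the first capture group when the pattern matches and None (standing in for null) when it does not.

```python
import re

# Same pattern as in the formula: three dot-separated groups of digits.
VERSION_PATTERN = re.compile(r"(\d+\.\d+\.\d+)")


def extract(value):
    """Return the first capture group if the pattern matches, else None."""
    match = VERSION_PATTERN.search(value)
    return match.group(1) if match else None


resource_names = ["druid-0.8.2", "druid-0.8.1", "druid-0.7.0", "index.html"]
versions = [extract(name) for name in resource_names]
print(versions)
```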
If no existing Plywood function meets your needs, you could also define your own custom transformation. The transformation could be any supported Druid extraction function.
For example, if JavaScript is enabled in Druid, you can apply JavaScript functions to a string.
To do so, in the data cube options (in the Advanced tab of the edit view) define:
{
  "customTransforms": {
    "stringFun": {
      "extractionFn": {
        "type": "javascript",
        "function": "function(x) { try { return decodeURIComponent(x).trim().charCodeAt(0) } catch(e) { return null; } }"
      }
    }
  }
}
Then, in the dimensions, simply reference stringFun like so:
$countryURL.customTransform('stringFun')
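To see what the stringFun transformation computes, here is an approximate Python port of the JavaScript function. It is a sketch for reasoning about the output, not something Druid runs, and it differs from the JavaScript in a few edge cases: an empty string yields None here rather than NaN, a malformed escape such as %zz is left as-is rather than raising, and ord returns a full Unicode code point rather than a UTF-16 code unit.

```python
from urllib.parse import unquote


def string_fun(x):
    """Decode a percent-encoded string, trim it, and return the code of its first character."""
    try:
        text = unquote(x, errors="strict").strip()
        return ord(text[0])
    except (UnicodeDecodeError, IndexError, TypeError):
        # Mirrors the catch block of the JavaScript version: give up and return null.
        return None


print(string_fun("%20Nepal"))
```

In this sketch "%20Nepal" decodes to " Nepal", is trimmed to "Nepal", and yields the character code of "N".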