Dimensions
Dimensions correspond to data columns in the data source. In a data cube, you can show and filter data using dimensions.
You can drag dimensions into the filter bar, show bar, and pinboard panels to explore and visualize data. Clicking on a dimension brings up the dimension preview menu along with buttons for performing these operations, as shown in the following image:
After you drag a dimension into the show bar, click the dimension to set the following options:
- Sort by: Sort the dimension in ascending or descending order.
- Limit: Limit the number of values to display. For example, selecting
Limit:5
in the above example displays five countries with the highest number of events.
The limit defaults to 1000 for string type dimensions and 5000 for numeric type dimensions. You can customize the limit by specifying thesplitLimits
property in the Data cube options (as JSON) field of the Advanced tab in the data cube edit view. For more information, see Advanced options. - Overall: Show or hide a row that displays the sum total of the measure for all values returned by the query.
- Others: If you set a Limit, show or hide a row that displays the sum total of all values omitted by the limit.
Dimension groups
You can organize dimensions into groups. Groups are particularly useful for dimensions that come from the same basic attribute or are related in another way.
For example, in a web traffic dataset scenario, dimensions such as OS, browser name, and browser version could all be derived from the user agent. It might therefore make sense to put those dimensions in a group.
To create a group, click the ...
icon in one of the dimensions and select Add to a new group
.
To add a dimension to an existing group, drag the dimension into the group.
Create a custom dimension
You can add new dimensions or edit the existing ones for a data cube. Click the edit icon at the top right, and then click the Dimensions tab in the data cube edit view.
You can configure a dimension that simply represents a column from the data source in the Basic tab.
You can also create a dimension that represent some arbitrary transformation. In this case, use the Custom tab, where you can enter a Druid SQL expression or a supported Plywood expression as the dimension's formula.
For example, the following expression defines a dimension based on a lookup function for country names:
COALESCE(LOOKUP(t."Country", 'store_sales_country_to_iso31661'), t."Country")
This expression performs a lookup that retrieves the country name based on a mapping between country abbreviations and a country name in the store_sales_country_to_iso31661
table. The resulting data cube includes a dimension with the full country name. In effect, this data cube joins rows from the store_sales_country_to_iso31661
and store_sales
data sources.
Dimensions and measures can include a description text which users can view in the Info pop-up window.
Dimension suggestions
You can use the Suggestions feature to rapidly add many new dimensions. Pivot scans the schema that underlies the dataset and automatically suggests a dimension for any column not already represented.
Dimension types
In this section we will look at some of the many specific types of dimensions that can be created.
Time dimensions
Druid has a primary time column called __time
which it uses for partitioning. You can also create additional time columns.
Druid ingests time columns as regular dimensions formatted in ISO 8601 string format, for example 2017-04-20T16:20:00Z
.
When creating or editing a dimension, select Time as the type to configure a time dimension. You can also configure the default granularities that Pivot presents and select a bucketing option.
Pivot uses an algorithm to format time ranges. To override it, go into edit mode for the time dimension and click Format. Enter a specific format to display. For a list of supported formats, see the Moment.js documentation.
The following example datacube shows a time dimension with format ha
to display the hour and am/pm in lowercase:
String dimensions
Most dimensions are categorical, in the sense that they are represented as strings.
Select String from the type to make a string dimension.
Multi-value dimensions
Select the Set/String type to create a multi-value dimension, which is an array of strings. For example, a multi-value dimension named Cities
might contain the following data:
["Paris","London","New York","Sydney"]
Numeric dimensions
Numeric dimensions can be bucketed during setup to create histograms.
Select Number from the type to make a numeric dimension. You can also configure the preset bucketing granularities that will be presented as defaults.
Geographic dimensions
Geographic units can be a useful way to segment data.
To use the Geo marks and Geo shade visualizations, create and display a single dimension configured as the Geo type. Select the appropriate Geo encoding—the underlying data must conform to one of the following specifications:
Latitude and longitude
If your event data contains latitude and longitude coordinates, you can configure the Street map visualization to pinpoint events to precise locations on a map.
The Imply street map visualization is enabled with an account-level feature flag. To enable the feature, contact your Imply support representative.
Once Imply has enabled the feature, you can create dimensions for your latitude and longitude data. Select the Geo type and the latitude or longitude Geo encoding.
See Street map for information on configuring the street map visualization.
Boolean dimensions
You might need to create dimensions that are the result of some Boolean expression. Let's say that you are responsible for all accounts in the United States as well as some specific account. You could create a dimension with a formula like:
- SQL
- Plywood
t.country = 'United States' OR t.accountName IN ('Toyota', 'Honda')
$country == 'United States' or $accountName.in(['Toyota', 'Honda'])
IP dimensions
The Enhanced IP support feature allows you to store a dimension as a complex data type representing an IP address or an IP prefix.
Once you enable this feature and ingest IP data, you can use the IP and IP prefix type dimensions in Pivot to search and filter in your visualizations based on IP addresses and IP prefixes.
When you create a data cube, Pivot auto-detects the complex IP columns and applies the appropriate type:
- IP: An IP address in a 128-bit binary format. Maps to the
ipAddress
complex data type in Druid. - IP prefix: An IP prefix in a 136-bit binary format, comprised of a 128-bit address and an 8-bit prefix value. Maps to the
ipPrefix
complex data type in Druid.
See the Data cubes page for information on using IP dimensions to filter a visualization.
If you create a dimension with type IP or IP prefix on a column that doesn't contain complex IP type data, Pivot converts the dimension type to string. IP filters don't work on strings but you can apply string filters to this data.
Custom dimension examples
Here are some examples of common dimension patterns.
The following examples show expressions as Pivot SQL and Plywood expressions.
Lookups
You can create dimensions that perform a lookup at query time. To have a dimension that represents a key into another table, set up a Druid query-time lookup.
You can then use the LOOKUP
function to pass the lookup key and the lookup table name:
- SQL
- Plywood
LOOKUP(t.lookupKey, 'my_lookup')
$lookupKey.lookup('my_lookup')
To handle values that are not found in the lookup table, you can either retain values that were not found as they are or replace them with a specific string.
For example, to keep values as they are, use the lookup key as the fallback value:
- SQL
- Plywood
COALESCE(LOOKUP(t.lookupKey, 'my_lookup'), t.lookupKey)
$lookupKey.lookup('my_lookup').fallback($lookupKey)
To replace values not found in the lookup table with a string, such as the word missing, pass the string as the fallback value:
- SQL
- Plywood
COALESCE(LOOKUP(t.lookupKey, 'my_lookup'), 'missing')
$lookupKey.lookup('my_lookup').fallback('missing')
Joins
You can create a dimension that performs a join on another table at query time. When you define a dimension, use the PIVOT_LEFT_JOIN
function to enable Pivot to perform a lookup-style (left) join on an arbitrary table.
The function syntax is as follows:
PIVOT_LEFT_JOIN(
t.lookupKey OR ARRAY [expr1, expr2,...],
'my_lookup_table',
'key_column' OR ARRAY [expr1, expr2,...],
'value_column'
)
The first argument is a SQL expression or an array of expressions representing the lookup key. The third argument must be a string literal or an array of string literals representing the key column. If you use arrays, they must have the same length.
The following example demonstrates a lookup for customer status:
PIVOT_LEFT_JOIN(
t.customerId,
'customer_statuses',
'id',
'status'
)
The following example performs a lookup for value Foxtrot
in the nato_phonetic
table. The ['city', 'state']
array in the third argument represents the key column in nato_phonetic
.
PIVOT_LEFT_JOIN(
ARRAY ["cityName", "stateName"],
'nato_phonetic',
ARRAY ['city', 'state'],
'Foxtrot'
)
URL dimensions
A URL dimension is a dimension whose values can be mapped to a URL.
Within the dimension modal you can add a URL transformation which will add a "Go to URL" action button for this dimension.
The provided string will be interpolated (%s
) for the given dimension value.
In the above example the value Nepal
would be transformed into https://en.wikipedia.org/wiki/Nepal
, the Wikipedia page for Nepal.
Extractions
Imagine you have a column resourceName
with the following values:
druid-0.18.2
druid-0.18.1
druid-0.17.0
index.html
You could create a dimension that extracts the version number in the column values:
- SQL
- Plywood
REGEXP_EXTRACT(t.resourceName, '(\d+\.\d+\.\d+)')
$resourceName.extract('(\d+\.\d+\.\d+)')
Which would return the following values:
0.18.2
0.18.1
0.17.0
null
Other examples
Here are some common expressions with which to build custom dimensions from the dimensions that already exist in your data cube:
Returns the character length for every row of the dimension. Applies to the String
dimension type.
- SQL
- Plywood
LENGTH(t."comment")
$dimension.length
This example extracts all numbers from every row of the dimension. If there are no numbers, it returns "missing." Use regular expressions to build extractions. Applies to the String
dimension type.
- SQL
- Plywood
(REGEXP_EXTRACT(t."comment", '([0-9]+)', 0))
$dimension.extract("([0-9]+)").fallback("missing")
This example makes the entire string upper case. The function also accepts 'lowerCase'. Applies to the String
dimension type.
- SQL
- Plywood
UPPER(t."comment")
$dimension.transformCase('upperCase')
This example adds 'End of String' to the end of the dimension's values. Applies to the String
dimension type.
- SQL
- Plywood
CONCAT(t."comment", 'End of String')
$dimension.concat('End of String')
Returns the index value for 'World' in the string for every row in the dimension. In Plywood, this function starts indexing at 0
and returns -1
when 'World' is not found. In Pivot SQL, the equivalent function starts indexing at 1
and returns 0
when no match is found. Applies to the String
dimension type.
- SQL
- Plywood
POSITION('World' IN t."dimension") - 1
$dimension.indexOf('World')
Returns x-y number of characters for every row in the dimension. Applies to the String
dimension type.
- SQL
- Plywood
SUBSTR(t."dimension", x, y)
$dimension.substr(x, y)
Returns true if the value matches a value in the specified set; otherwise, returns false. You can specify the returned values as follows:
$cityName.in(['London','Aachen','Abbotsford']).then('Match Found').fallback('No Match')
Applies to String
and Numeric
data types.
- SQL
- Plywood
t."cityName" IN ('London','Aachen','Abbotsford')
$cityName.in(['London','Aachen','Abbotsford'])
Returns true or false based on whether the given regular expression is matched. You can specify the returned values as follows:
$dimension.match('^Hel*')then('Matched').fallback('Not matched')
Applies to the String
dimension type.
- SQL
- Plywood
REGEXP_LIKE(t."comment", ('^Hel*'))
$dimension.match('^Hel*')
Returns true or false based on whether the input string is matched. You can specify the returned values like this: $dimension.contains('hello').then('Hello Friend').fallback('Bye Friend')
Applies to the String
dimension type.
- SQL
- Plywood
CONTAINS_STRING(t."comment", 'hello')
$dimension.contains('hello')
You can use math operations in custom expressions. Applies to the Numeric
data type.
- SQL
- Plywood
LENGTH(f."comment") + 2
$dimension.length()+2
Returns an absolute value. Applies to the Numeric
data type.
- SQL
- Plywood
ABS(CHAR_LENGTH(t."channel")-100)
($dimension.length()-100).absolute()
Raises the operand to the power that you specify. Applies to the Numeric
data type.
- SQL
- Plywood
POWER(LENGTH(t."comment"), 5)
$dimension.length().power(0.5)
Returns true if the value matches the input value and false if it does not. Only use single quotes with strings. Applies to String
, Numeric
, and Boolean
data types.
- SQL
- Plywood
t."comment" IS 'red'
$dimension.is('example')
Performs an AND operation between the value of the dimension and the second value that you input. Additional Boolean expressions include 'or'
, 'greaterOrEqual'
, 'greaterThan'
, 'lessThanOrEqual'
, 'lessThan'
, and 'not'
. Applies to the Boolean
data type.
- SQL
- Plywood
t."comment" AND (2>1)
$dimension(value).and(2>1)
Creates a time bucket of the duration that you specify. Applies to the Time
data type.
- SQL
- Plywood
TIME_FLOOR(t."__time", 'PT5M')
$__time.timeBucket('PT5M')
Floors the operand time to the nearest duration within the specified timezone. Applies to the Time
data type.
- SQL
- Plywood
TIME_FLOOR(t."__time", 'P1D', 'Asia/Shanghai')
$__time.timeFloor('P1D','Asia/Shanghai')
Shifts the operand time by the interval you specify within the given timezone. Do not use a + sign to shift time forward. Applies to the Time
data type.
- SQL
- Plywood
TIME_SHIFT(t."__time", 'PT1M', -5)
$__time.timeShift('PT1M',-5)
Returns part of a timestamp. Additional possibilities include: 'MINUTE_OF_HOUR'
, 'MINUTE_OF_DAY'
, 'MINUTE_OF_WEEK'
, 'MINUTE_OF_MONTH'
, 'MINUTE_OF_YEAR'
, 'HOUR_OF_DAY'
, 'HOUR_OF_WEEK'
, 'HOUR_OF_MONTH'
, 'HOUR_OF_YEAR'
, 'DAY_OF_WEEK'
, 'DAY_OF_MONTH'
, 'DAY_OF_YEAR'
, 'WEEK_OF_MONTH'
, 'WEEK_OF_YEAR'
, and 'MONTH_OF_YEAR'
. Applies to the Time
data type.
- SQL
- Plywood
TIME_EXTRACT(t."__time", 'SECOND')
$__time.timePart('SECOND_OF_MINUTE')
Casts a number to a time value.
- SQL
- Plywood
MILLIS_TO_TIMESTAMP(t."dimension")
$dimension.cast('time')
In addition to number to time (time
), other supported cast types in Plywood are: from number to string (string
), from time to number (number
), and from string to number (number
). Applies to Time
, String
, and Number
data types.
Custom transformations
DEPRECATED: Custom transformations are only possible in Plywood and their use is discouraged.
If no existing Plywood function meets your needs, you could also define your own custom transformation. The transformation could be any supported Druid extraction function.
For example, if JavaScript is enabled in Druid, you can apply any number of JavaScript functions to a string.
To do so in the data cube options (in the Advanced
tab of the edit view) define:
{
"customTransforms": {
"stringFun": {
"extractionFn": {
"type": "javascript",
"function": "function(x) { try { return decodeURIComponent(x).trim().charCodeAt(0) } catch(e) { return null; } }"
}
}
}
}
Then in the dimensions simply reference stringFun
like so:
$countryURL.customTransform('stringFun')