Managing data cubes
This section explains how to create and configure your data cubes.
Creating a data cube
Create a new data cube from the home view by clicking the + Data cube
button.
The Create new data cube
modal allows you to define the source data for a data cube.
Checking the Auto fill dimensions and measures
checkbox will automatically detect your schema
and create dimensions and measures accordingly.
Another way to create a new data cube is to duplicate an existing data cube and edit its properties.
After creating the data cube you will be taken to the edit screen to adjust any details.
Editing a data cube
Click the pencil icon in the data cube header to edit a data cube.
Within the edit view you can change the title, description, and default timezone of the data cube. You can also edit and create dimensions and measures from their respective tabs.
Schema detection
Pivot will automatically detect the columns in your data source and recommend some possible dimensions and measures.
Schema detection can be helpful when you are creating a new data cube in which case it will allow you to initialize the data cube with some dimensions and measures.
The other use for schema detection is when you evolve your schema and add columns over time.
If you have just added a column and you want a corresponding dimension to be created for it, you can use schema detection to automatically construct that column for you.
This functionality is accessible from the Suggestions
tab of the
New dimension and New measure modals.
How schema detection works
Pivot looks at the dataset metadata and uses the returned list of columns, their types, and their aggregation (in case of rollup) to determine what dimensions and measures to suggest.
Pivot generates dimensions and measures by applying the following rules to the discovered underlying column types:
- Time columns get mapped to a dimension with automatic bucketing by default.
- String columns get mapped to a dimension.
- Numeric columns get mapped to a
SUM
measure or an otherwise appropriate measure if the column is marked as being aggregated as part of rollup. - ApproximateHistogram columns get mapped as a 98th percentile measure.
- HyperUnique and ThetaSketch columns get mapped to a
countDistinct
measure.
Limitations
While schema detection is an invaluable tool for quickly getting you up and running with a new data source, it can never do a perfect job.
Once you create a data cube you should play with it and tailor it to be perfectly suited to your needs.
Don't hesitate to change and delete the auto-generated dimensions and measures; they will always be there in the Suggestions
tab.
In particular, schema detection cannot detect these common scenarios:
- String columns that you might want to see as
countDistinct
- number of unique values. - The perfect granularities to apply to time and numeric dimensions.
- Lookups that you might want to apply to certain dimensions.
- Dimensions that actually correspond to a URL.
- Measures that are interesting when filtered on something.
- Measures that should be seen as a ratio, or some other post aggregation.
Defaults
The default settings control what options are selected in the data cube view when a Pivot user first navigates to the data cube. By default, the data cube shows data from the latest day and which represents the first measure in the list of available measures for that data cube.
You can modify these defaults, add filtering conditions, or set a specific dimension to be shown with the default settings.
To do so, in the data cube edit view, expand the More
chevron and select Defaults.
For the default time filter duration, selected measures, pinned dimensions, refresh rate, or timezone, you can choose new values by clicking the field and selecting new values from the options shown. Be sure to click the Save button to apply your changes.
Default filter values allow you to supplement the default filter, Latest day, with additional filtering criteria. Specify filters as Plywood expressions. For example, you can show data generated in New York by adding the following filter:
$cityName.match("New York")
Any Plywood expression is available to you to use in the Default filter field.
By default, no dimensions are chosen in the data cube view.
To add a default dimension, type the name of the dimension, such as cityName, in the Default shown dimensions field. When you enter a dimension, the editor automatically substitutes the dimension name with a full JSON expression that matches the name and specifies how the results are to be sorted.
For example, the following configuration filters by city name and sets the city name to appear by default in the data cube:
The data cube view would appear as:
Creating time buckets
You can also use the Default shown dimensions field to create segments or buckets of a given dimension by which you're filtering. For example, if you are using a Latest day
filter, you can create hour-long buckets to compare the number of events that happened within every hour of the latest day. To create buckets, use the timeBucket
function. timeBucket
accepts an ISO 8601 duration expression, and creates the duration that you specify. The ISO 8601 expression for one hour is PT1H
. You can use timeBucket
in a JSON string like this:
[{"dimension":"__time","bucketAction":{"op":"timeBucket","duration":"PT1H"},
"sortType":"dimension","direction":"ascending"}]
Save your JSON string in the Default shown dimensions field:
This results in a line chart with markers for the total number of events in every hour of the latest day:
If you need help creating other ISO duration expressions to specify temporal intervals other than one hour, you can search for an online tool that can generate these expressions for you.
Advanced options
The following advanced options are available when you expand the More
chevron.
Subset filter formula: Specify a plywood formula that applies a mandatory, hidden filter to all queries made through this data cube. See row level restriction.
Required filter token: Specify a token that corresponds to a filter token defined in a user role. See token based row level access control.
Primary time dimension: Specify the time dimension that corresponds to when the data point was actually added. This is used for checking the freshness of data and also potentially enforcing a time filter. Defaults to Time
.
Enforce time filter: If set to true, ensures that every query is filtered on the primary time dimension. This should be used if time unbounded queries are likely to be slow due to the volume and tearing of data.
Instances: Configure multiple filtered instances within a single data cube via a configuration object. See example configuration.
Refresh time: Configure how the data cube will calculate the data's latest time.
Defaults to Query
, which means it will query the data to learn about the latest data ingested. This can also be set to predefined in which case latest current time will always be assumed. An alternative 'fixed' time can also be defined as the 'fixedMaxTime' property in the options.
Query caching: Specify query caching behavior. Allowing caching can greatly speed up exploration but can also cause results to be a little out of date especially in realtime rolled up datasets.
Dimension and measure formulae visibility: You can hide formulae for dimensions and measures that appear in the info box of dimensions and measures. Formulae are shown by default. An administrator can select Hide formulae when the formulae might reveal information that is too sensitive for all Pivot users in the organization to view. When the Hide formulae option is selected, an administrator can still view the formula for a dimension or module in its Settings view.
Data cube options
Additional data cube configuration options as JSON with any of the following properties:
fixedMaxTime
: Specify a fixed time value for data cubes to use as the latest "max time" instead of querying for it or using the current time. This configuration should be used in conjunction with theRefresh time
setting set to 'fixed'.timeoutOverride
: Override a connection-wide timeout for queries made through this data cube.exactResultsOnly
: If true, disallows approximate aggregators and force Pivot to make groupBy over topN queries.customAggregations
: Map any Druid aggregation to a Pivot measure. Aggregation definitions should be keyed on measure names.customTransforms
: Map any Druid extraction function function to a Pivot dimension. Extraction function definitions should be keyed on dimension names.crossTabBatchSize
: Configure the number of cross tab rows loaded at once (defaults to80
).piiMask
: Define a threshold below which data is considered Personally identifiable information (PII) and hidden. See the piiMask example.restrictedModeProperties
: Define a restricted edit mode for access control which grants users limited edit functionality for a data cube. See the restrictedModeProperties example.
Time filter presets
Each section of presets in the time filter menu can be configured with custom presets. See an example of latest presets.
latestPresets
: An array of preset objects that will override theLATEST DATA
presets in the time filter menu.currentPresets
: An array of preset objects that will override theCURRENT
presets in the time filter menu.previousPresets
: An array of preset objects that will override thePREVIOUS
presets in the time filter menu.
Querying options
splitLimits
: Provide an array of numbers that will override the default options of[5, 10, 25, 50, 100]
shown in the limit dropdown of the split menu.ignoreDimensionInExplain
: Provide a list of dimensions to ignore when computing an explanation.rankExpression
: Configure the measure whose value is used to order dimension values in the dimension filter menu. Defaults tocount
.druidContext
: An object that will be passed in as the Druid query context.priority
: A number that will be passed as 'priority' into the Druid query context.maxDownloadLimit
: Configure a limit (in number of rows) to downloads.largeDownloadInterval
: Configure a limit (as a time interval) to downloads.
Visual options
suppressOverallByDefault
: If true, hides the "overall" element of a visualization by default.boostPrefixRank
: If true, will "boost" dimension values where matches occur at the beginning of the string over values with matches found in other positions in the dimension filter menu.boostSelectedMeasures
: If true and the data cube view is in multi-measure selection mode, will "boost" selected measures to the top of the measures list in the measure picker menu.grayOutTrivial
: If true, renders "trivial" dimensions in gray in the dimensions panel. A continuous dimension is trivial if it has a max value equal to its min value. A discrete dimension is trivial if it has less than 2 values.alwaysShowCompareControl
: If true, will remove the ability to toggle time compare.disablePreview
: If true, disables the preview of values in the dimension and measure modals.showExplainBadge
: If true, shows a badge with the standard deviation for the sample values used for the explanation.
Examples
Instances
{
"dimensionFormula": "$language",
"values": [
{
"title": "English",
"value": "en"
},
{
"title": "Chinese",
"value": "zh"
},
{
"title": "Spanish",
"value": "es"
}
]
}
dimensionFormula
is a plywood formula that operates on a dimension.
values
is a list of objects with keys of title
, value
where value
is the value that will be matched with the dimensionFormula
expression and title
is the label displayed in the UI.
piiMask
{
"piiMask": {
"aggregate": "$main.count()",
"threshold": 10,
"maskText": "HIDDEN [%i]",
"otherText": "Other (%i)",
"hideAggregates": false
}
}
Note that %i
will be resolved as the row index of the datum
latestPresets
latestPresets
appear as time filter values in the data cube view. You can create presets for custom time values, as in the following
example:
{
"latestPresets": [
{ "name": "1M", "pillLabel": "1 minute", "selection": "$m.timeRange('PT1M', -1)" },
{ "name": "5M", "pillLabel": "5 minutes", "selection": "$m.timeRange('PT1M', -5)" },
{ "name": "30M", "pillLabel": "30 minutes", "selection": "$m.timeRange('PT1M', -30)" },
{ "name": "1H", "pillLabel": "1 hour", "selection": "$m.timeRange('PT1H', -1)" },
{ "name": "6H", "pillLabel": "6 hours", "selection": "$m.timeRange('PT1H', -6)" }
]
}
In the UI, the value of name
appears as the UI label in the presets dropdown, while the optional pillLabel
appears as the label when the
value is selected. If pillLabel
is
not provided, name
is used in both places. The selection
value is the interval of
time selected. The sample presets above would appear as follows:
In the selection
value, $m
is resolved as the data's maximum time and $n
as the current time.
Since $m
is a time value, if you would like to add a preset to view all data for the last seven days,
with full data for each day, you would need to adjust $m
as shown in the following example:
{
"latestPresets": [
{ "name": "7D", "pillLabel": "7 days", "selection": "$m.timeFloor('P1D').timeRange('P1D', -7)" }
]
}
restrictedModeProperties
When set, this property will expose an additional access control list configuration in the data cube's "Access" section, enabling the ability to grant users and/or roles a subset of full edit functionality.
{
"restrictedModeProperties": [
"dimensions",
"measures"
]
}
The configuration above would allow users and roles granted "Restricted edit access" on the "Access" screen the ability to edit the data cube's dimensions and measures, but nothing else.
The values that can be set for this array are:
"title"
: The data cube name"description"
: The data cube description"theme"
: The data cube theme"dimensions"
: Dimensions"measures"
: Measures"group"
: Dimension groups"defaultDuration"
: Default time filter duration"defaultSelectedMeasures"
: Default selected measures"defaultPinnedDimensions"
: Default pinned dimensions"defaultRefreshRate"
: Default refresh rate"defaultTimezone"
: Default time zone"defaultFilter"
: Default filter"defaultSplits"
: Default shown dimensions"subsetFormula"
: Subset filter formula"filterTokens"
: Required filter token"specialTimeDimension"
: Primary time dimension"enforceTimeFilter"
: Enforce time filter"instances"
: The possible instances for the data cube"refreshTimeWith"
: Refresh time"queryCaching"
: Query caching