A data cube is a multidimensional data model used to organize and visualize aggregated data. Data cubes contain data from one or more tables and provide an interface for users to explore a data set. This topic explains how to create and configure a data cube in Imply Polaris.
Create a data cube
A data cube contains data from one or more tables. Before you create a data cube, verify that the data you want to present in a data cube is available in a table. See the Data overview to learn about creating tables and loading data.
Users with the ManageDataCubes role can create data cubes in Polaris. See role groups for default role assignments by group.
To create a data cube, follow these steps:
- Click Data cubes in the left sidebar.
- Click New data cube in the top left corner.
- Choose between a table or a SQL query as your data source:
  - If you choose a table, the table must already exist in Polaris.
  - If you choose a SQL query, write a query that selects the data used to populate the data cube. You can query any data source. Use this option if you want a subset of the table's data, or if you otherwise want to use SQL to select and manipulate the data.
- Give your data cube a descriptive name.
- Choose whether to auto-fill dimensions and measures. When enabled (recommended), Polaris creates a dimension for each column in the table. It also makes some inferences about the data to create several measures. For instance, it creates a measure that represents the count of events returned by the query underlying the data cube. The following figure shows the dimensions created based on the Koalas to the Max data:
For more information about dimension and measure detection, see Schema detection.
After you create a data cube, you can adjust any settings in the edit screen.
You can create additional data cubes by duplicating an existing data cube and editing its properties.
Edit a data cube
To edit a data cube, click the pencil icon in the data cube header.
Within the edit view, you can change the title, description, color theme, default timezone, and data source for the data cube. You can also edit and create dimensions and measures from their respective tabs.
If the underlying schema of the table changes, you can update the data cube schema as well. To do so, access the Dimensions or Measures tab in the Edit data cube view. If there are changes to the underlying schema, the Suggestions button indicates the number of changes detected. Click the Suggestions button to review and accept the suggestions as needed.
For more information about schema detection, see Schema detection.
Schema detection
The dimensions and measures of a data cube make up the schema for the data cube. When you create a data cube, Polaris can derive the schema from the base data source, which you can modify as needed.
How schema detection works
Imply looks at the dataset metadata and uses the returned list of columns, their types, and their aggregation (in case of rollup) to determine what dimensions and measures to suggest.
Imply generates dimensions and measures by applying the following rules to the discovered underlying column types:
- Time columns get mapped to a dimension with automatic bucketing by default.
- String columns get mapped to a dimension.
- Numeric columns get mapped to a SUM measure, or to an otherwise appropriate measure if the column is marked as being aggregated as part of rollup.
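The mapping rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not Polaris's actual implementation: the `Column` class, the type labels, the `rollup_aggregation` field, and the `suggest_schema` function are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    """Hypothetical description of one column in the table metadata."""
    name: str
    type: str  # assumed labels: "time", "string", or "number"
    rollup_aggregation: Optional[str] = None  # e.g. "max" if rollup recorded one


def suggest_schema(columns):
    """Return (dimensions, measures) suggested for the given columns."""
    dimensions = []
    measures = []
    for col in columns:
        if col.type == "time":
            # Time columns become dimensions with automatic bucketing.
            dimensions.append({"name": col.name, "kind": "time", "bucketing": "auto"})
        elif col.type == "string":
            # String columns become plain dimensions.
            dimensions.append({"name": col.name, "kind": "string"})
        elif col.type == "number":
            # Numeric columns become SUM measures unless rollup
            # recorded a different aggregation for the column.
            aggregate = (col.rollup_aggregation or "sum").upper()
            measures.append({"name": col.name, "aggregate": aggregate})
    return dimensions, measures


example_columns = [
    Column("__time", "time"),
    Column("city", "string"),
    Column("bytes", "number"),
    Column("max_latency", "number", rollup_aggregation="max"),
]
example_dimensions, example_measures = suggest_schema(example_columns)
```

In this sketch, `bytes` is suggested as a SUM measure, while `max_latency` keeps the MAX aggregation recorded at rollup.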
Schema detection limitations
While schema detection enables you to set up a new data cube quickly, you may need to test it and tailor it to suit your needs. Try modifying or deleting the auto-generated dimensions or measures. You can always access them in the Suggestions tab if you decide to revert.
In particular, schema detection cannot detect these common scenarios:
- String columns that you might want to see as countDistinct (the number of unique values).
- The perfect granularities to apply to time and numeric dimensions.
- Lookups that you might want to apply to certain dimensions.
- Dimensions that correspond to a URL.
- Measures that are interesting only when filtered on a specific value.
- Measures that should be seen as a ratio, or some other post-aggregation.
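To make some of these scenarios concrete, the following Python sketch computes a distinct count, a filtered measure, and a ratio over a handful of in-memory rows. It illustrates what such measures calculate, not how you define them in Polaris. The sample rows and the helper names `count_distinct`, `filtered_count`, and `ratio` are made up for this example.

```python
# Hypothetical sample events; the column names are illustrative only.
rows = [
    {"user": "alice", "status": "ok"},
    {"user": "bob", "status": "error"},
    {"user": "alice", "status": "ok"},
    {"user": "carol", "status": "ok"},
]


def count_distinct(rows, column):
    """Number of unique values in a column (a countDistinct-style measure)."""
    return len({row[column] for row in rows})


def filtered_count(rows, column, value):
    """Count of rows where the column equals the given value (a filtered measure)."""
    return sum(1 for row in rows if row[column] == value)


def ratio(numerator, denominator):
    """Post-aggregation ratio; returns None when the denominator is zero."""
    return numerator / denominator if denominator else None


unique_users = count_distinct(rows, "user")
error_count = filtered_count(rows, "status", "error")
error_rate = ratio(error_count, len(rows))
```

A string column such as `user` is detected as a dimension, so a measure like `unique_users` has to be added by hand. The same applies to `error_rate`, which combines two aggregates after they are computed.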
Set a default data cube view
When a user navigates to a data cube, the data cube displays data from the latest day and shows the first measure in the list of available measures by default.
You can modify this default view by adding filtering conditions or by setting a specific dimension to be shown with the default settings.
To create a default view:
- Create the view you want to save as your default view in the data cube.
- Open the options menu by clicking the toggle icon at the top right of the data cube view.
- Click Update data cube defaults.
- Click Set current view as default.
- Click Save for all users.
Download data
To download data cube data, click the download icon in the top navigation bar. The Download data page appears with the following fields:
- Filename: Specify the name of the file to generate and download based on the data.
- Max columns to download: Specify the maximum number of columns to download.
- Max rows to download: Specify the maximum number of rows to download.
- Format: Specify the file format for your download. You can use comma-separated values, tab-separated values, XSL format, and newline-delimited JSON format.
- Include metadata: Enable this option to include metadata in the download file. If enabled, the metadata shows the name of the data cube and filtering conditions under which the data was generated, such as Data cube,Wikipedia Filter,"Jun 26 - Jun 27, 2016 9:32pm".
Once you click Download, the download status bar indicates the progress of your download.
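As a rough illustration of how the download fields interact, the following Python sketch applies a column limit, a row limit, and a format to a list of rows. It is not how Polaris generates downloads. The `export_rows` function and its parameters are hypothetical, and placing metadata as leading key-value rows is an assumption based on the example above. The sketch leaves out the spreadsheet format because it would need a third-party library.

```python
import csv
import io
import json


def export_rows(rows, columns, fmt="csv", max_columns=None, max_rows=None, metadata=None):
    """Serialize rows as CSV, TSV, or newline-delimited JSON text."""
    kept_columns = columns[:max_columns]  # None keeps every column
    kept_rows = rows[:max_rows]           # None keeps every row

    if fmt == "json":
        # Newline-delimited JSON: one object per line.
        return "".join(
            json.dumps({c: row[c] for c in kept_columns}) + "\n" for row in kept_rows
        )
    if fmt not in ("csv", "tsv"):
        raise ValueError(f"unsupported format: {fmt}")

    buffer = io.StringIO()
    delimiter = "," if fmt == "csv" else "\t"
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    # Assumed layout: metadata appears as key-value rows before the header.
    for key, value in metadata or []:
        writer.writerow([key, value])
    writer.writerow(kept_columns)
    for row in kept_rows:
        writer.writerow([row[c] for c in kept_columns])
    return buffer.getvalue()


sample_rows = [{"city": "Paris", "count": 3}, {"city": "Oslo", "count": 1}]
sample_metadata = [("Data cube", "Wikipedia"), ("Filter", "Jun 26 - Jun 27, 2016 9:32pm")]
csv_text = export_rows(sample_rows, ["city", "count"], "csv", max_rows=1, metadata=sample_metadata)
json_text = export_rows(sample_rows, ["city", "count"], "json", max_columns=1)
```

The CSV writer quotes the filter value because it contains a comma, which matches the quoting in the metadata example above.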