Manage data cubes
A data cube is a multidimensional data model used to organize and visualize aggregated data. Data cubes contain data from one or more tables and provide an interface for users to explore a data set. This topic explains how to create and configure a data cube in Imply Polaris.
View data cubes
Click Data cubes in the left sidebar to view all data cubes.
You can click the star icon on the right side of a data cube in the list to identify it as a favorite. Polaris displays favorites at the top of the page for easy access.
Create a data cube
A data cube contains data from one or more tables. Before you create a data cube, verify that the data you want to present in a data cube is available in a table. See the Data overview to learn about creating tables and loading data.
Users with AdministerDataCubes
or ManageDataCubes
permissions can create data cubes in Polaris. See Permissions assigned to predefined groups for default permission assignments.
To create a data cube, follow these steps:
- Click Data cubes in the left sidebar.
- Click New data cube in the top left corner.
- Choose between a table or a SQL query as your data source:
- If you choose a table, the table must already be created in Polaris.
- For a SQL query, write a query that selects data used to populate the data cube. You can add queries against any data source. Use this if you want a subset of the table's data, or you otherwise want to use a SQL query to select and manipulate the data.
- Choose whether to Auto-fill dimensions and measures. When enabled (recommended), Polaris creates a dimension for each column in the table. It also makes some inferences about the data to create several measures. For instance, it creates a measure that represents the count of events returned by the query underlying the data cube. The following figure shows the dimensions created based on the Koalas to the Max data:
For more information about dimension and measure detection, see Schema detection.
- Click Next: Create data cube.
- Complete the General properties for the data cube:
- Name: Name of the data cube.
- Description: An optional description.
- Color theme: The color theme to apply to the data cube.
- Default timezone: The data cube's timezone.
- Minimum auto-refresh rate: The minimum rate at which data in the data cube will refresh.
- Click Save to save the data cube.
After you create a data cube, you can adjust any settings in the edit screen.
You can create additional data cubes by duplicating an existing data cube and editing its properties.
Edit a data cube
To edit a data cube, click the pencil icon in the data cube header.
Within the edit view, you can change the general properties of the data cube.
You can also edit and create dimensions and measures from their respective tabs, and manage access and access filters for the data cube.
If the underlying schema of the table changes, you can update the data cube schema as well. To do so, access the Dimensions or Measures tab in the Edit data cube view. If there are changes to the underlying schema, the Suggestions button indicates the number of changes detected. Click the Suggestions button to review and accept the suggestions as needed.
For more information about schema detection, see Schema detection.
Manage access
By default, all users in your project can view a newly created data cube and all of its underlying data. You can use the Access and Access filters tabs to optionally control access to individual data cubes, and filter the data specified user groups can access in the data cube.
Data cube access
The data cube creator and users with the AdministerDatacubes
permission assigned to their profile can edit the data cube. See Permissions reference for more information on permissions.
To control access to an individual data cube at a more granular level:
- Edit the data cube and click the Access tab.
- Click Add people.
- Select one or more options from the drop-down lists for the following access levels:
Can view: View the data cube. Users with this access level also need theAccessVisualization
permission assigned to their profile.
Can download: View and download the data cube. Users with this access level also need theAccessDownloadData
permission assigned to their profile.
Can edit: View, download, and edit the data cube. Users with this access level also need theManageDataCubes
orAdministerDataCubes
permission assigned to their profile.
In cases where different permissions on the individual level and the group level apply to the same user, Polaris applies the most privileged permission. For example, if an individual user has Can view access to a data cube, but they are a member of a group with Can edit access, they have permission to edit the data cube.
Access filters
Access filters allow you to apply a SQL predicate to a data cube's underlying data. You assign the access filter to a user group, so that members of the group view the filtered data when they access the data cube.
You can create multiple access filters for the same data cube, to enable groups of users to have a unique, filtered view of the underlying data.
Access filters are transparent to group members—Polaris provides no indication that they are viewing filtered data.
To create an access filter:
- Edit the data cube and click the Access filters tab.
- Click Enable and require access filters.
Once you enable access filters, only users in groups with applied access filters can query the data cube.
- Click New access filter and enter the following details:
- Name: A name for the access filter.
- Filter: A SQL query to select data for the filter. Preface column names with
t.
—for example,t.country
. - User group: The user group to attach to the filter. You can only attach a user group to one access filter.
- Click Create access filter.
- Use the Allow combining multiple filters checkbox to determine the behaviour when a user is a member of two or more user groups with access filters applied:
If checked, Polaris combines the filters with the OR operator.
If unchecked, the user can't view any data in the data cube. - Click Save.
Data cube access example
Imagine you are running an auction system with buyers and sellers. You have an auctions
table that contains a record of every closed auction, and has columns for buyer_id
and seller_id
.
You want to offer sellers a way to view their own data, and prevent them from viewing any other data. At the same time, you want your internal users to be able to view all of the data.
Here's how you could set this up:
- Create the following roles:
InternalUser
Assign this role to all internal users in your organization.ExternalSeller
Apply theAccessVisualization
permission and assign this role to all sellers.SellerA
,SellerB
, etc
Assign these roles to the sellers in each seller organization (where the letter A, B, etc is the seller organization name).
- Create the following data cubes on the
auctions
data source:Auctions_internal
Create dimensions forbuyer_id
andseller_id
to allow your internal users to query the data.
Apply the Can edit access level to theInternalUser
role.
This data cube doesn't need an access filter.Auctions_sellers
Apply the Can view access level to theExternalSeller
role.
Create an access filter for Seller A, apply a SQL query to select data with Seller A'sseller_id
and assign a user group that belongs to theSellerA
role.
Create additional access filters for the other sellers.
This data cube doesn't need a dimension forseller_id
because it is always filtered to a singleseller_id
value.
You could also set up a similar arrangement for buyers, by creating an ExternalBuyer
role and an Auctions_buyers
data cube with the appropriate buyer access filters.
Create a link to a data cube
You can create a URL that links directly to a data cube. See Query parameters reference for information on the query parameters Polaris supports in the URL.
Schema detection
The dimensions and measures of a data cube make up the schema for the data cube. When you create a data cube, Polaris can derive the schema from the base data source, which you can modify as needed.
How schema detection works
Imply looks at the dataset metadata and uses the returned list of columns, their types, and their aggregation (in case of rollup) to determine what dimensions and measures to suggest.
Imply generates dimensions and measures by applying the following rules to the discovered underlying column types:
- Time columns get mapped to a dimension with automatic bucketing by default.
- String columns get mapped to a dimension.
- Numeric columns get mapped to a
SUM
measure or an otherwise appropriate measure if the column is marked as being aggregated as part of rollup.
Schema detection limitations
While schema detection enables you to set up a new data cube quickly, you may need to test it and tailor it to suit your needs. Try modifying or deleting the auto-generated dimensions or measures. You can always access them in the Suggestions tab if you decide to revert back.
In particular, schema detection cannot detect these common scenarios:
- String columns that you might want to see as
countDistinct
- number of unique values. - The perfect granularities to apply to time and numeric dimensions.
- Lookups that you might want to apply to certain dimensions.
- Dimensions that correspond to a URL.
- Measures that are interesting when filtered on something.
- Measures that should be seen as a ratio, or some other post aggregation.
Set a default data cube view
When a user navigates to a data cube, the data cube displays data from the latest day and shows the first measure in the list of available measures by default.
You can modify this default view by adding filtering conditions or by setting a specific dimension to be shown with the default settings.
Create a default view:
- Create the view you want to save as your default view in the data cube.
- Open the options menu by clicking the toggle icon at the top right of the data cube view.
- Click Update data cube defaults.
- Click Set current view as default.
- Click Save for all users.
Download data
To download data cube data, click the download icon in the top navigation bar. The Download data page appears with the following fields:
- Filename: Specify the name of the file to generate and download based on the data.
- Max columns to download: Specify the maximum number of columns to download.
- Max rows to download: Specify the maximum number of rows to download.
- Format: Specify the file format for your download. Options are:
- CSV (Comma-separated values)
- TSV (Tab-separated values)
- XSLX (Excel spreadsheet)
- JSON (Newline-delimited JSON)
Once you click Download, the download status bar indicates the progress of your download.
Data cube options
Click the options icon in the top navigation bar to set the following data cube options:
- Timezone: Set the data cube’s timezone.
- Query cache: Enable or disable the query cache for all dashboards and data cubes. The cache setting persists until you reload the page. The cache optimizes query performance, but can cause results to be slightly out of date—especially for stream-ingested rolled-up datasources.
- Refresh rate: Set the rate at which data in the data cube refreshes.
- View raw data: View and download the raw data underlying the data cube.
- View essence: View and copy the JSON structure of the data cube.
- Monitor queries: View the underlying queries Polaris makes when you’re working with the data cube. Click Enable monitoring in the dialog to enable the feature.
- Update data cube defaults: Update the initial settings for the data cube view when you first navigate to it.
- Reset view: Reset the data cube view to the default view.
Click the ellipsis in the top navigation bar to access the following options:
- Set up alert: Set up an alert for the data cube.
- Schedule report: Set up a report for the data cube.
- Embed this view: Create an embedding link to this data cube view.
- Add to/Remove from favorites: Identify the data cube as a favorite or remove it from favorites. Polaris displays favorites at the top of the main data cubes page for easy access.