The Imply release notes provide information on features, improvements, and bug fixes in each release. Be sure to read these release notes carefully before upgrading to the 4.0.5 release.
Imply Cloud Free Tier gives you a feature-complete, size-limited Imply Cloud cluster, deployed into your AWS account and managed by Imply (you control the infrastructure and data in your AWS VPC). Sign up for Imply Cloud Free Tier or start a self-hosted trial at Get started with Imply!
Apache Druid 0.20.0-iap contains new features, performance enhancements, bug fixes, and major documentation improvements from our contributors. Check out the complete list of changes and everything tagged to the milestone.
On the Druid engine side, we have made numerous improvements, and we look forward to seeing the expanded possibilities the new features bring.
Keeping the performance edge
Druid is constantly at the forefront of analytical database performance. In the Imply 4.0 release, we have made a number of improvements on the core engine side to keep us on the edge.
In Imply 3.4, vectorized query execution was enabled by default. In Imply 4.0, the query engine has been vectorized to support calculations on group by queries as shown in the example below.
A common type of query that could not be vectorized in previous releases is one involving expression calculations. In the following example, the total_price column is computed on the fly, which makes it an example of a virtual column.
This is a very common usage pattern. In this release, we have made changes to enable vectorized query execution for such queries. As you can see from the benchmarks below, execution time is reduced by a factor of 6 to 11.
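To make the virtual column pattern concrete, here is a minimal sketch of a native groupBy query that sums a total_price expression. The data source name, the price, quantity, and country columns, and the interval are hypothetical. The vectorize and vectorizeVirtualColumns context keys are given as we understand the Druid query context, and their defaults should be checked against the documentation for your version.

```python
import json


def build_group_by_query(datasource: str, interval: str) -> dict:
    """Build a native Druid groupBy query that aggregates an expression virtual column."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "all",
        # Virtual column computed at query time.
        # "price" and "quantity" are hypothetical input columns.
        "virtualColumns": [
            {
                "type": "expression",
                "name": "total_price",
                "expression": "\"price\" * \"quantity\"",
                "outputType": "DOUBLE",
            }
        ],
        "dimensions": ["country"],  # hypothetical grouping dimension
        "aggregations": [
            {"type": "doubleSum", "name": "revenue", "fieldName": "total_price"}
        ],
        # Assumed context keys; verify names and defaults for your Druid version.
        "context": {"vectorize": "true", "vectorizeVirtualColumns": "true"},
    }


query = build_group_by_query("orders", "2020-01-01/2020-02-01")
print(json.dumps(query, indent=2))
```

The resulting JSON could be posted to the Broker's native query endpoint. Comparing timings with the two context keys set to "false" is one way to observe the effect on your own data.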
Secondary partitioning support for auto-compaction
Optimal query performance depends on optimal data layouts.
It’s often difficult to determine the best data layout at ingestion time, since optimal layout may depend on queries run in the future. At the same time, it is difficult to maintain an optimal data layout as new data constantly arrives in the cluster.
Two major data layout factors that contribute to query performance are segment size and partitioning scheme. We have seen speedups of up to 40x on production workloads with optimal data layouts compared to non-optimized layouts.
The ultimate solution is an auto-optimization system that constantly monitors the actual workload and optimizes the data layout. In the Imply 4.0 release, we are taking a big step towards this goal.
In this release, you can set up auto-compaction rules that reshape your segments into optimal sizes with optimized partitioning schemes as you learn about your workload over time. These auto-compaction rules run continuously in the background, so that newly arrived data is optimized over time.
In the following example, queries that filter on specific partitioning columns can quickly eliminate segments without actually reading them.
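The pruning idea can be sketched in a few lines. This is a simplified model, not Druid's implementation: each segment is assumed to cover a half-open range of values on one partitioning column, and the segment names and boundaries are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SegmentRange:
    segment_id: str
    start: Optional[str]  # inclusive lower bound; None means unbounded
    end: Optional[str]    # exclusive upper bound; None means unbounded


def segments_to_scan(segments: List[SegmentRange], value: str) -> List[str]:
    """Return the ids of segments whose range could contain `value`."""
    keep = []
    for seg in segments:
        above_start = seg.start is None or value >= seg.start
        below_end = seg.end is None or value < seg.end
        if above_start and below_end:
            keep.append(seg.segment_id)
    return keep


# Hypothetical layout: three segments range-partitioned on a "country" column.
segments = [
    SegmentRange("seg-0", None, "g"),
    SegmentRange("seg-1", "g", "p"),
    SegmentRange("seg-2", "p", None),
]
print(segments_to_scan(segments, "japan"))
```

A filter such as country = 'japan' only needs the one segment whose range covers that value. The other two are skipped using metadata alone, without reading their data.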
Types of use cases where this might apply are:
- If you are using streaming ingestion but also want to take advantage of partitioning schemes that allow partition pruning
- If you are appending data to a data source with partitioning enabled
- If you have a sub-optimal hash partition spec with too many shards
- If you want to optimize for a new query usage pattern without reingestion
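As an illustration of what such an auto-compaction rule might look like, the sketch below builds a compaction configuration with a range (single_dim) partitioning scheme. The field names reflect our reading of the Druid compaction configuration and should be verified against the documentation for your version. The data source, the partition dimension, and all values are hypothetical.

```python
import json


def build_compaction_config(datasource: str, partition_dimension: str,
                            target_rows: int) -> dict:
    """Build an auto-compaction config that repartitions segments by range."""
    return {
        "dataSource": datasource,
        # Leave the most recent day alone so compaction does not
        # compete with data that is still arriving (assumed value).
        "skipOffsetFromLatest": "P1D",
        "tuningConfig": {
            "partitionsSpec": {
                "type": "single_dim",
                "partitionDimension": partition_dimension,
                "targetRowsPerSegment": target_rows,
            },
            # Assumption: range partitioning runs as a parallel task,
            # so more than one subtask is allowed here.
            "maxNumConcurrentSubTasks": 2,
        },
    }


config = build_compaction_config("orders", "country", 5_000_000)
print(json.dumps(config, indent=2))
```

A configuration of this shape is typically submitted to the Coordinator's compaction configuration endpoint or entered through the web console's compaction dialog.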
Druid web console data source view improvements
The Druid web console now includes the statistical distribution of segment sizes. If you see a significant difference between the minimum, average, and maximum segment sizes, it is a good indicator that a compaction pass can help you improve query performance.
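The same check can be approximated outside the console. The sketch below summarizes a list of segment sizes and applies a simple skew heuristic. The threshold and the sample sizes are hypothetical, and the SQL string shows one way the sizes might be fetched from the sys.segments table.

```python
from statistics import mean
from typing import Dict, List

# One way to fetch sizes through Druid SQL ("orders" is a hypothetical data source).
SEGMENT_SIZE_SQL = """
SELECT "size" FROM sys.segments
WHERE datasource = 'orders' AND is_published = 1
"""


def summarize_segment_sizes(sizes_bytes: List[int]) -> Dict[str, float]:
    """Return the minimum, average, and maximum segment size in bytes."""
    return {
        "min": min(sizes_bytes),
        "avg": mean(sizes_bytes),
        "max": max(sizes_bytes),
    }


def looks_skewed(summary: Dict[str, float], ratio_threshold: float = 4.0) -> bool:
    """Hypothetical heuristic: flag when the largest segment dwarfs the smallest."""
    return summary["max"] / max(summary["min"], 1) >= ratio_threshold


sizes = [12_000_000, 480_000_000, 510_000_000, 35_000_000]  # sample data
summary = summarize_segment_sizes(sizes)
print(summary, looks_skewed(summary))
```

The threshold is arbitrary. The point is only that a wide spread between the smallest and largest segments suggests compaction is worth trying.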
Druid web console query view improvements
We have also made numerous improvements to the Druid web console query view.
First of all, we made it really simple to find problems in the query text. Now you simply have to click on the link in the error log view to jump to the right place.
The auto-run query has been replaced by Live query, which automatically determines when to rerun queries as you change filters.
In the past, if you made a mistake in a query, you had to wait until the query finished before you could make changes. You can now easily cancel your queries, so that you can resume working without having to wait for the previous query to complete.
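Cancellation is also available outside the console. The sketch below builds, but does not send, a cancellation request for a native query, assuming the query was submitted with a known queryId in its context and that the Broker is reachable at the hypothetical address shown. It is not necessarily how the console implements the feature.

```python
import urllib.request
import uuid


def new_query_id() -> str:
    """Generate an id to place in the query context as "queryId" at submission time."""
    return str(uuid.uuid4())


def build_cancel_request(broker_url: str, query_id: str) -> urllib.request.Request:
    """Build a DELETE request against the native query endpoint for `query_id`."""
    url = f"{broker_url.rstrip('/')}/druid/v2/{query_id}"
    return urllib.request.Request(url, method="DELETE")


query_id = new_query_id()
request = build_cancel_request("http://localhost:8082", query_id)  # hypothetical Broker
print(request.get_method(), request.full_url)
# To actually cancel the query: urllib.request.urlopen(request)
```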
Security vulnerability updates
Many dependencies have been updated to newer versions to address security vulnerabilities.
Druid 0.20.0-iap contains 65 bug fixes; you can see the complete list here.
For a full list of open issues, please see https://github.com/apache/druid/labels/Bug
The fastest, greenest Pivot ever
In Imply 4.0, we’ve made a number of improvements to ensure that Pivot remains the fastest, easiest tool for exploratory self-service analytics, even when operating against data sources with very high cardinality columns.
Query times for filter preview results are now much faster than in previous releases, and ad-hoc filter values can now be applied without waiting for preview values to be returned. This allows users to more rapidly apply filter changes, especially when filtering on well-known dimension values.
Additionally, Imply 4.0 adds Alpha support for query cancellation when using Pivot. With this feature enabled, triggered queries that are no longer needed by the UI will be cancelled across the entire stack, freeing up resources on both the Pivot application layer as well as on the Druid cluster.
Some examples of where query cancellation might impact performance and cost on a cluster include navigating away from a dashboard or data cube when a long-running query is still being processed, switching pages on a dashboard, or making rapid changes to filters and shown dimensions on a data cube.
(Alpha) Support for creating data cubes from SQL queries
As part of an ongoing initiative to offer SQL as an alternative to the Plywood expression language throughout the Pivot UI, we’re very excited to announce an early look at a major new capability in Pivot for Imply 4.0 - the ability to create data cubes using a SQL statement wrapper.
Users can now easily create data cubes that leverage recent improvements to the Druid engine’s SQL querying capabilities, most notably using JOIN statements, with full support for dimensions and measure introspection.
Note that Pivot SQL is still an alpha-stage feature. The team is focused on validating that security, performance, and data correctness are not impacted when using these new capabilities. Until then, we don’t recommend using Pivot SQL in production environments.
Improved dashboarding capabilities
We’ve made a number of improvements to the dashboarding management user experience in 4.0, most notably around updating dashboard tiles to reference new data cubes. In Imply 4.0, we’ll ensure that when updating the data cube for a dashboard tile, all possible tile settings remain in place, including shown columns and filters that exist on the new data cube.
Additionally, we’re adding Alpha support for better batch management of dashboard tiles, allowing dashboard editors to easily update data cube references across all matching tiles in a dashboard.
In Imply 4.0, we are removing support for connecting to data sources other than Imply Druid. We’re doing this to ensure that we can continue to provide the most robust, performant user experience when using Imply Druid. If you’re currently connecting Pivot to Apache Druid or another database, please contact our support team for help migrating to Imply Druid.
All changes and bug fixes
- Add support for enabling and disabling individual experimental features in the advanced settings
- Improve performance of filter menu preview queries
- Improve support for dashboards which reference multiple data cubes
- Add support for retaining existing visualization settings if possible when changing the data cube on a dashboard tile
- Improve consistency of results when applying comparisons against overlapping time ranges
- Limit support for sharing system artifacts with "Everyone" to users with the "SeeOtherUsers" permission
- When OIDC is configured, Pivot should automatically initiate the OIDC login flow from the login screen
- [Alpha] Add support for end-to-end cancellation of in-flight queries
- [Alpha] Add support for defining Pivot SQL data cubes using a full SQL statement as the base query
- Fix issue where evaluating whether a data cube can be converted to SQL can cause the browser to crash
- Fix Pivot SQL queries do not correctly set Imply query context attributes
- Fix notification emails for misconfiguration errors not surfacing data cube title when it exists
- Fix alerts should not issue max-time queries before the "check every" condition is met
- Fix alert lifecycle logging incorrectly conflates condition queries with max-time queries
- 3.4.5: Fix visualization error boundary not encompassing all visualization errors
- 3.4.4: Fix comparisons do not work on measures where measure transform is set to values other than "None"
- 3.4.4: Fix some query types can result in an incorrect "Unsupported data type" error
- 3.4.3: Fix measure edit dialog sometimes crashes when entering a SQL formula
- 3.4.2: Fix filters can't be applied before typeahead results return
- 3.4.2: Fix OS X distributions of Imply are blocked by Gatekeeper
- 3.4.2: Fix Pivot redundantly appending to OIDC discovery URL
- 3.4.2: Fix query requests can fail when server-side data cube PII changes have not yet propagated to client
- 3.4.2: Fix clicking on the "Filter" checkbox in the Measures editor causes the modal to be dismissed and the application to stop responding
- 3.4.1: Fix event annotations bubble should only be visible in the hovered chart in multi-measure mode
- 3.4.1: Fix scenario where dashboards can no longer be edited or deleted
- 3.4.1: Fix complex Plywood resplit expressions can cause incorrect results when using comparison periods
- 3.4.1: Fix text input controls sometimes do not respect user input
- 3.4.1: Fix alert preview does not correctly surface an error when the query fails
- 3.4.1: Fix grid visualization does not correctly apply filter by measure settings
- 3.4.1: Fix webhooks can fail to be delivered correctly when dimension values contain special characters
(Alpha) Imply Private Cloud on GCP
Imply is excited to announce the first release of private Imply Cloud on GCP, which is built on Google Kubernetes Engine.
It provides an experience similar to Imply Cloud on AWS in a more secure, self-contained environment completely under customer control.
This implementation provides tighter integration with GKE and Kubernetes environments, and users can choose GCP instances from a pre-selected machine pool.
Other changes and bug fixes
3.4.1 and 3.4.3: Imply Cloud now uses the user's email as the user ID in the login prompt window by default.
3.4.1: The Manager UI has been changed to present only the latest version of each minor release track. For version 3.3, for example, only the latest version, 3.3.5, appears as an upgrade option in the Manager UI. Similarly, for 3.2, only 3.2.8 appears, and so on. Note that the UI change does not affect terms of support.
3.4.1: For any new cluster created with a version of at least 3.3, cloud manager automatically sets druid.query.scheduler.numThreads to druid.server.http.numThreads - 1 via a feature flag (Reserve Threads for Non-Query Request), which helps guarantee that one thread in the pool is reserved for health checks. This step is skipped if users have overridden the setting or if druid.server.http.numThreads is set to 1. The cluster administrator can toggle this on or off in the “Advanced Config” section of the Cluster Setup page.
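The thread reservation rule amounts to simple arithmetic, sketched below. The function and its user_overridden flag are hypothetical and only model the behavior described above. They are not the cloud manager's actual code.

```python
from typing import Optional


def scheduler_threads(http_threads: int, user_overridden: bool = False) -> Optional[int]:
    """Return the value to set for druid.query.scheduler.numThreads.

    Returns None when the step is skipped, that is, when the user has
    supplied their own setting or when there is only one HTTP thread
    and nothing can be reserved.
    """
    if user_overridden or http_threads <= 1:
        return None
    # Reserve one thread in the pool for non-query requests such as health checks.
    return http_threads - 1


print(scheduler_threads(40))
print(scheduler_threads(1))
```

With druid.server.http.numThreads set to 40, the scheduler would be limited to 39 query threads. With a single HTTP thread the setting is left untouched.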
3.4.1: Imply Cloud now disables the “Manage Data” link when the cluster is set to not use Pivot Proxy mode, which further assures data security when running in this mode.
3.4.3: Revamped the “API” cluster page to get rid of redundant and out-of-date information.
3.4.3: Improve cluster rolling upgrade logic by delaying reverting replicationThrottleLimit back to a small number to accommodate the need for faster data replication.
druid.monitoring.monitors="org.apache.druid.client.cache.CacheMonitor" is now set by default for Imply version 3.3 and later, which allows Clarity to collect and display query cache metrics.
- Fixed an issue where changing certain properties in the cluster configuration was treated as a version change and triggered a full cluster restart.
- 3.4.1: Fixed an issue where the Worker Capacity number was sometimes shown incorrectly on the Cluster Overview page.
- 3.4.1: Fixed an issue where the Data Node Capacity number was sometimes shown incorrectly on the Cluster Overview page.
- 3.4.3: Fixed an issue where the updated cluster version list used to determine cluster update options was incomplete.
Imply Self-hosted Manager
(Beta) Support non-Kubernetes Managed Deployment
Imply now provides a generic installer deployment model that works on non-Kubernetes environments (e.g., on-prem or VMs on cloud) for a self-hosted use case.
This deployment workflow starts with deploying Imply Manager, followed by creating the cluster specification and then joining the cluster nodes. The cluster administrator can add and remove nodes dynamically.
Automation of the deployment workflow is supported through the use of user parameters via configuration files.
- Added a 30-day trial license by default in Imply Manager.
- Imply Self-hosted Manager now includes a mode in which cluster admins can see a list of unassigned nodes and choose which cluster a node may join asynchronously. The mode is hidden behind a feature flag and is disabled by default.
- Breaking Change: To upgrade to Imply Druid 4.0 with Imply Self-hosted Manager, you must upgrade to the self-hosted Imply Manager 4.0.
Before 4.0, Imply Druid did not read the JAVA_HOME environment variable. Starting in 4.0, it uses the JAVA_HOME environment variable to determine which Java installation to use. Old versions of the Imply agent image have JAVA_HOME set incorrectly, and as a result, 4.0 will not work correctly with the old agent image.
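Before upgrading, it may help to check that JAVA_HOME on each node points at a usable Java installation. The following is a minimal sketch of such a check. It assumes the conventional bin/java layout on Linux or macOS and is not part of the Imply tooling.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_java(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the java binary under JAVA_HOME, or None if it is unset or invalid."""
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return None
    candidate = Path(java_home) / "bin" / "java"
    # A stale JAVA_HOME points at a directory without a java binary.
    return candidate if candidate.is_file() else None


java = resolve_java()
if java is None:
    print("JAVA_HOME is unset or does not contain bin/java")
else:
    print(f"Java found at {java}")
```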
Documentation site changes
4.0 introduces a new framework, style, and organization for the Imply documentation. We've improved the organizational structure to correspond to the primary product components: Pivot, Druid, Clarity and the Manager, which appear at top right. The Imply doc category contains documentation that is common across products.
The Druid documentation is now bundled in a more integrated manner with the rest of the Imply docs, with a shared look and navigational structure.
Note that the Imply distribution of the Apache Druid documentation constitutes a subset of the Apache Druid documentation, and may include references to artifacts and concepts relevant to Apache Druid only.
Also note that while the URL paths for versions of the documentation earlier than 4.0 haven't changed, some doc paths for the latest doc version may have changed.
Upgrading from previous releases
If you are upgrading from a previous Imply release, please take note of the following sections.
Druid upgrade notes
Be aware of the following changes between 0.19.0-iap and 0.20.0-iap before upgrading. If you're updating from an earlier version than 0.19.1-iap, please see the release notes of the relevant intermediate versions.
Upgrading from earlier releases
When upgrading from Imply 3.3, which is based on Apache Druid 0.18.0, also note any items in the "Updating from previous releases" section of the Imply 3.3 release notes that may be relevant for your deployment.
Deprecation and removal notices
Docker deployment not supported
Imply no longer recommends nor supports Docker-based deployments for new installations. Existing Docker-based deployments are supported through July 1, 2021. If you are currently using a Docker-based deployment, you should migrate to one of the following deployment modes before that date:
End of support
As of July 15, 2020, Imply version 2.x is no longer supported. If you still have active deployments that use Imply version 2.x, you are strongly encouraged to upgrade to the current version as soon as possible. See Subscription Support Maintenance Terms for more information about supported versions.
Changes in 4.0.1
- Fix users cannot enter custom dimension or measure values
- Fix raw data view does not render all columns on Pivot SQL data cubes
- Improve workflow for migrating from native user management to OIDC with local role authority
- Fix exporting data is not properly gated by the "DownloadData" permission
- Add support for integration with the new Imply License Service
Changes in 4.0.2
- Add support for exposing headersTimeout as a Pivot config property
- Fix client query requests do not retry correctly on failure
- Add support for matching defaultRole against external role id
- Fix streaming of export response column headers can fail due to GZIP compression
Changes in 4.0.3
- Fix reset password links should not be shown when passwords are not managed by Pivot
- Fix issue where Zookeeper can enter an unresolvable state causing alerts to not be evaluated
Changes in 4.0.4
- Fix error shown when adding a data cube to favorites
- Fix error shown when attempting to change user profile properties
- Allow users to specify order of AND, OR filters for better performance
- Handle edge case for vectorized OR filters
- Security fix for CVE-2021-25646
Changes in 4.0.5
- Fix filters are not respected when compare or multi-range time selections are used in Pivot SQL
- Fix alert setup form always defines "previous period" as "previous day"
- Fix default values on global filters are not applied when loading a dashboard
- Fix alert conditions with percent delta cannot be inputted correctly
- Fix CSV and Excel exports on time-series visualizations have incorrect column headers
- Fix deserialization of lookups with empty strings
- Allow missing intervals for Parallel task with hash/range partitioning