The following release notes provide information on features, improvements, and bug fixes up to Imply STS release 2021.05-1. Read these release notes carefully before upgrading to 2021.05-1.
If you are upgrading by more than one version, read the intermediate release notes too.
See Previous versions for older releases.
New to Imply? Get started with an Imply Cloud Free Trial or start a self-hosted trial at Get started with Imply!
With Imply Cloud, the Imply team manages your clusters in AWS, while you control the infrastructure and own the data. With self-hosted Imply, you can run Imply on *NIX systems in your own environment or cloud provider.
Changes in 2021.05-1
- Fix for application error tracking
- Revert fix introduced in 2021.05 to address Hadoop race condition
- Fix bug preventing updating, adding, or deleting auto-compaction configurations when one or more auto-compaction configurations created prior to 2021.04 exist
- Fix bug preventing changing auto-compaction `compactionTaskSlotRatio` settings when one or more auto-compaction configurations created prior to 2021.04 exist
Changes in 2021.05
The current version of Imply bundles version 2021.05.0-iap of the Imply distribution of Druid.
Automated metadata cleanup
You can configure automated cleanup to remove records from the metadata store after you delete the corresponding entities from Apache Druid:
- segments records
- audit records
- supervisor records
- rule records
- compaction configuration records
- datasource records created by supervisors
This feature helps maintain performance when you have a high datasource churn rate, meaning you frequently create and delete many short-lived datasources or other related entities. You can limit the length of time to retain unused metadata records to prevent your metadata store from filling up. See Automated cleanup for metadata records.
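Cleanup is controlled through Coordinator runtime properties in the `druid.coordinator.kill.*` family. A minimal sketch, assuming the schedule and retention values shown here are illustrative only (see the linked documentation for the full list of entity types):

```properties
# Remove audit records older than 90 days, checking once a day
druid.coordinator.kill.audit.on=true
druid.coordinator.kill.audit.period=P1D
druid.coordinator.kill.audit.durationToRetain=P90D

# Remove terminated supervisor records on the same schedule
druid.coordinator.kill.supervisor.on=true
druid.coordinator.kill.supervisor.period=P1D
druid.coordinator.kill.supervisor.durationToRetain=P90D
```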
You can use the `ARRAY_AGG` function to flatten an array of strings at query time. It enables reporting and data-export scenarios that require flattening an array of strings. See `ARRAY_AGG` in the list of Aggregation functions.
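For example, a minimal sketch assuming a hypothetical `events` datasource with `user` and `tag` columns:

```sql
-- Collect the distinct tags seen for each user into an array.
-- The optional second argument is the maximum aggregation buffer size in bytes.
SELECT
  "user",
  ARRAY_AGG(DISTINCT "tag", 16384) AS "tags"
FROM "events"
GROUP BY "user"
```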
The Expression aggregator is a native Druid aggregator that lets you use Druid native expressions to perform reduce operations on any number of input columns. This adds significant flexibility to develop functionality unavailable with the default native Druid aggregation functions. It is the underlying technology for aggregate functions such as `ARRAY_AGG`.
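A minimal sketch of an expression aggregator in a native query; the column names and the `fold` expression are illustrative assumptions:

```json
{
  "type": "expression",
  "name": "sum_of_x_plus_y",
  "fields": ["x", "y"],
  "accumulatorIdentifier": "__acc",
  "initialValue": "0",
  "fold": "__acc + x + y"
}
```

Here the `fold` expression is applied to each row, accumulating into `__acc` across the input columns listed in `fields`.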
- Faster inner joins and HyperLogLog (HLL) aggregators
- Improved ingestion reliability for large files with dynamic partitioning
GA support for query cancellation
If the user navigates away from a running query in a data cube or dashboard, Pivot now cancels the query, saving server processing time. This also reduces query load by stopping a running query when the user modifies the data cube, for example, by changing the filter or the displayed dimensions.
Platform updates in this release focus on enhancements to Enhanced Imply Private on Google Kubernetes Engine. Be aware that upgrading to this release may require system downtime for Enhanced Imply Private on GKE deployments. For details, and additional upgrade considerations, see the following notes.
Node capacity increase for Enhanced Imply Private on GKE
For deployments based on Enhanced Imply Private on Google Kubernetes Engine, the number of supported managed nodes has been increased to 1,000 nodes across all clusters for a given Imply Manager instance (assuming default CIDR settings of GKE).
This change involves a network reconfiguration that requires a hard update, meaning Imply deployments will not be accessible for about 30 minutes during the upgrade. Plan your upgrade for a scheduled maintenance window.
Previously, due to the default address reservation configuration and after subtracting IPs required for system use, the maximum number of nodes that the Imply Manager could start across all managed clusters was 61.
The reservation has been changed to limit capacity to 8 pods per node (16 pods per node on system nodes). Because GKE reserves a smaller per-node pod address range when the pod limit is lower, support has increased to around 1,000 managed nodes across all clusters within the Manager, given the default CIDR settings.
ZooKeeper per cluster for Enhanced Imply Private on GKE
Previously, Enhanced Imply Private on Google Kubernetes Engine used a shared ZooKeeper ensemble for all clusters. There is now a ZooKeeper ensemble per cluster, which helps prevent the possibility of a cluster overwhelming ZooKeeper.
Upgrade notes:
- All new clusters created after updating to 2021.05 will use a ZooKeeper per cluster automatically.
- For existing clusters, you can determine which ZooKeeper ensemble a cluster uses from its ZooKeeper hosts path under the advanced settings in the Imply Manager. It will most likely be `imply-default-zookeeper:2181`, but may be `imply-<install-name>-zookeeper:2181` if a secondary cluster was created via the installer.
To update ZooKeeper, change the ZooKeeper hosts path to `imply-<first-13-of-the-cluster-id>-zookeeper:2181`, where `<first-13-of-the-cluster-id>` is the first 13 characters of the cluster ID. You can find the cluster ID in the URL on the Setup page. The relevant portion is everything up to the 2nd dash (-).
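For example, for a hypothetical cluster ID of `1a2b3c4d-5e6f-7890-abcd-ef1234567890`, the portion up to the 2nd dash is `1a2b3c4d-5e6f`, so the hosts path would be `imply-1a2b3c4d-5e6f-zookeeper:2181`.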
After updating the path, click Apply. This results in brief downtime for the cluster.
GCP metadata store autoscaling documented
The Cloud SQL instance used as the metadata store for Enhanced Imply Private on Google Kubernetes Engine supports autoscaling. This feature was previously undocumented. The documentation has been enhanced to describe this behavior.
For details, see the GCP metadata store documentation.
Imply user management with SSO support (alpha)
This release introduces a new user management interface in Imply Cloud with support for external identity providers, as an alpha feature. The new Imply user management provides for a single sign-on (SSO) experience for users, with support for SAML and OIDC identity systems, and a new, central user management console for administrators.
For details, see User management with SSO support.
- Fix SQL-based data cubes fail against datasources with a '.' in the name
- Fix data sketch measures don't get translated to SQL properly in SQL data cubes
- Fix casting string dimensions to integer does not work in SQL data cubes
- Fix deleted roles should not be shown in ACLs
- Fix users should be able to select and copy value from role id field
- Fix alert emails should not render the link into Pivot when `linkHostName` is not set
- Fix alert filter previews do not correctly query against the selected data cube instance
- Fix zero values are incorrectly highlighted in red in the table view
- Fix Plywood incorrectly casts Strings to Numbers using
- Remove non-functional "terms delegate" checkbox from "special columns" UI
- Add support for `milliSecondsInInterval` equivalent functionality in Pivot SQL
- Improve support for emitting tracking event metadata when an export fails
- Fix Clarity alerts can fire incorrectly when configured against newly created clusters
- Fix incorrectly disabled Apply button in the Connect tab for the Kinesis UI Dataloader
- Update Jetty to v9.4.40 to address a bug with Jetty v9.4.39 that breaks some SSL queries
- Fix missing virtual columns in a
- Added options to automatically clean up metadata database
- Fix `HadoopIndexTask` temp segment renaming race condition
- Fix JVM Configs not getting set in Enhanced Imply Private on GKE
- Fix `preStop` hook failures with invalid state
- Fix changing machine types does not result in a rolling update and could cause an outage in Enhanced Imply Private on GKE
- Add support for overriding Auto Repair, Auto Upgrade and max node pool size in Enhanced Imply Private on GKE
Changes in 2021.04-3
- Update to Jetty v9.4.40 to address a bug with Jetty v9.4.39 that breaks some SSL queries
Changes in 2021.04-1
- Fix unable to edit user roles in the Imply Manager for Imply Private
Changes in 2021.04
Row and column level security (alpha)
You can use the SQL Views feature of Imply Druid to limit user access to rows or columns within a datasource. The Imply View Manager lets you define views that expose a subset of data for access to a type of user. You can assign permissions for specific views to user roles to enable or disable user access. See the following:
Configure segment granularity in automatic compaction (alpha)
You can configure automatic compaction to go from a finer granularity to a coarser granularity in the automatic compaction `granularitySpec`.
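A minimal sketch of an auto-compaction configuration that coarsens segment granularity, assuming a hypothetical `wikipedia` datasource; submit it through the Coordinator's compaction configuration API:

```json
{
  "dataSource": "wikipedia",
  "skipOffsetFromLatest": "P1D",
  "granularitySpec": {
    "segmentGranularity": "DAY"
  }
}
```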
Pivot 2.0 (alpha)
Pivot 2.0 is a new visualization engine that simplifies and accelerates building visualizations from data that you query. Pivot 2.0 aims to be highly configurable and highly scalable. For more information, see Pivot 2.0.
New GCP machine types and regions
The machine types available for Enhanced Imply Private on GKE (beta) have been extended to include the following:
- n2-highmem-80, consisting of 80 vCPU, 640 GB RAM, and 3000 GB disk
- n2-highmem-48, consisting of 48 vCPU, 384 GB RAM, and 9000 GB disk
See Google Cloud machine types for more information.
Additional regions include:
New AWS machine type
AWS machine types now include c5ad.8xlarge (32 vCPU, 64 GB RAM, and 1200 GB disk). See Imply Cloud Instance Types for more information.
GKE Public access and Ingress security policy support
Enhanced Imply Private on GKE (beta) now allows you to secure public access by specifying a Cloud Armor Security Policy to restrict access based on IP. See Imply Cloud Instance Types for more information.
Enabling GKE Ingress now provides public access, that is, access to Imply endpoints from outside the GKE network. The public endpoints are shown in the API tab of the Imply Manager for the cluster.
- Fix reports with multiple recipients send emails with corrupted attachments
- Fix data cubes and dashboards continue to refresh when the browser is not in focus
- Fix "all rows" option for report attachments is not correctly saved when creating or editing a report
- Fix error message does not show up when entering an invalid dimension or measure formula
- Fix invalid inputs can crash the "Advanced" tab in the data cube editor
- Fix help text incorrectly indicates sunburst visualization requires 2 dimensions
- Fix dimensions of type "Time" need to be cast to `VARCHAR` when using Pivot SQL
- Fix incorrect compare wrapping on compound measure expressions
- Fix comparisons cannot be disabled in
- [Pivot SQL] Fix issue with incorrect results when comparisons are used against a measure containing `COUNT(*) FILTER (WHERE ...)`
- Add support for showing role identifier as a read-only field on the role edit form
- Security updates
- Add the ability to configure a batch ingestion task to wait for newly indexed segments to become available for query on Historical services before completing; see the sketch after this list
- Fix compaction no longer works if there are overlapping intervals
- Update the web console to only query required or visible columns
- Enable multiple distinct aggregators in same query
- Fix an issue with compaction where, if you run compaction twice, the second run yields better compaction
- Fix an issue where Kinesis lag continues to increase when there is remaining data for the Kinesis stream
- Enforce allow list for JDBC properties by default
- Improve bitmap vector offset to report contiguous groups
- Vectorize 'auto' long decoding
- Enable the ability to request logs through Kafka emitter
- Support protobuf serialization for Avatica connections
- Fix an issue with `APPROX_QUANTILE` returning the wrong median value when the query plan generates a single TopN/GroupBy query
- Improve performance of queries against
- Add expression filter support for vectorized query engines
- Update protobuf and schema registry
- Update Avro and schema registry
- Fix an issue where joins with subqueries fail to plan in default mode but work as expected in SQL compatible mode
- Fix an issue in the group by processor for string output expressions
- Fix an issue where columns needed as inputs to `TransformSpec` on reindexing are not read automatically by default
- Fix an issue where Kinesis resharding causes EOS messages to be logged as "Events thrown away" in metrics
- Fix `CAST` being ignored when aggregating on strings after cast
- Fix issue in column projection
- Security fix for CVE-2021-26919
- Add `sys.kernel.threads-max` parameter to Kubernetes Helm chart
- Allow access to Pivot and Druid console from the Imply Manager during GKE updates
- Add Pivot roles to Imply Manager in GKE
- Improve error message in Enhanced GKE when deployment fails due to GKE Update
- `/mnt/var/tmp` in GKE Enhanced
- Add HTTP to HTTPS redirection for GKE Enhanced
- Fix Imply Manager does not load in GKE mode
- Fix GKE installation fails when external SQL and bucket are used
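For the batch-ingestion availability wait mentioned in the list above, a minimal sketch of the relevant `tuningConfig` fragment; the property name `awaitSegmentAvailabilityTimeoutMillis` and the 10-minute timeout are assumptions based on the feature description:

```json
{
  "tuningConfig": {
    "type": "index_parallel",
    "awaitSegmentAvailabilityTimeoutMillis": 600000
  }
}
```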
Changes in 2021.03
Druid compaction now preserves the query granularity for compacted segments by default. For manual compaction, you can specify a query granularity for compacted segments. For example, if you ingest your data with `hour` query granularity but after two months you only need `day` granularity, you can configure a compaction task to use `day` granularity for compacted segments older than two months. See Data handling with compaction.
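A minimal sketch of such a compaction task, assuming a hypothetical `wikipedia` datasource and an interval that is more than two months old:

```json
{
  "type": "compact",
  "dataSource": "wikipedia",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "interval",
      "interval": "2021-01-01/2021-02-01"
    }
  },
  "granularitySpec": {
    "queryGranularity": "day"
  }
}
```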
Increased security for HTTP and HDFS input sources
You can now configure Druid to apply additional restrictions on the allowed protocols for URIs with HTTP and HDFS ingestion sources. This feature gives cluster operators more control over how users can load data into the cluster. The two new settings are `druid.ingestion.http.allowedProtocols` and `druid.ingestion.hdfs.allowedProtocols`, which default to `["http", "https"]` and `["hdfs"]` respectively. If your ingestion tasks use protocols other than these defaults, add them to the corresponding setting when upgrading to prevent tasks from failing. See Druid upgrade notes for more information.
Configurable option for result fetch size
A new configuration property, `druid.sql.avatica.minRowsPerFrame`, a counterpart to `druid.sql.avatica.maxRowsPerFrame`, gives cluster operators more control over the result 'fetch' size. It indirectly reduces the overall number of fetch requests and can improve JDBC performance with large result sets, which can otherwise require many thousands of separate fetches to transfer. See SQL in the configuration reference.
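For example, a sketch of Broker runtime properties; the values are illustrative only:

```properties
# Serve at least 10,000 rows per JDBC fetch, up to the configured maximum
druid.sql.avatica.minRowsPerFrame=10000
druid.sql.avatica.maxRowsPerFrame=100000
```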
Ingestion support for Confluent Schema Registry
Beta. This feature improves Druid support for Apache Kafka + Apache Avro + Confluent Schema Registry with updated library dependencies, adding support for authorization headers and Schema Registry client configuration. This release also introduces experimental support for Kafka + Protobuf + Schema Registry. Both Avro and Protobuf Schema Registry support are limited to the older "parser" based ingestion specifications rather than the "input format" that has replaced them. See the documentation for Protobuf.
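A sketch of the Schema Registry decoder portion of a Kafka parser spec; the registry URL and credentials are placeholders, and the exact `config` keys depend on your Schema Registry client setup:

```json
{
  "type": "avro_stream",
  "avroBytesDecoder": {
    "type": "schema_registry",
    "url": "https://registry.example.com:8081",
    "config": {
      "basic.auth.credentials.source": "USER_INFO",
      "basic.auth.user.info": "registry-user:registry-password"
    }
  }
}
```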
- Add support for navigating between the Crosstab view and the regular data cube explorer visualizations--visualizations now persist across Crosstab and regular views
- Fix wrong query created when both filter by measure and calculation of change is applied
- Add support for configuring Pivot to show a persistent custom header and/or footer, such as "Confidential," to display for all your Pivot users
- Fix data exports can have incorrect column headers when transformed measures are present
- Fix alert setup form inaccurately renders comparison as "Previous period" after the "current period" has been redefined
- Fix alert comparison period is not correctly preserved by link from alert occurrence to data cube view
- Add support for adding multiple filters against string dimensions--you can now copy and paste to add many string filters simultaneously
- Fix CSV/TSV exports incorrectly show some column values as "undefined"--a new config setting lets you override the default maximum column/row export value
- Add support for adjusting the width of string filter menus in the data cube explorer so that users can view longer values
- Security updates
- Fix the Web Console Services tab to display which Coordinator and Overlord nodes are currently serving as the leader, providing additional insight to cluster operators
- Fix Service view actions when grouping in the Web Console
- Fix runtime error when `IndexedTableJoinMatcher` matches long selector to unique string index
- Introduce primitive-typed `increment` methods for granularity to improve performance
- Remove namespace property that does not exist from JDBC lookup for the Web Console
- Add a parser per
- Experimental support for Confluent Schema Registry with authentication
- Fix streaming ingestion failure caused by empty or null rows in the topic
- Fix a Java runtime exception while applying rule `DruidQueryRule(AGGREGATE)` after upgrading from Imply 3.2.9 to Imply 4.0.4
- Fix the `maxBytesInMemory` check to account for heap overhead of all sinks and hydrants
- Add JDBC handler config for minimum number of rows per frame to improve performance
- Add SQL functions to support bitwise operations on
- Support specifying
- Preserve queryGranularity metadata during compaction by default
- Add a feature to improve query planning for correlated subqueries. To turn on this experimental feature, set
- Fix OvershadowableManager inefficiently handles large numbers of segments in a single time chunk (#10892)
- Fix historicals can use more disk than configured for the segment cache
- Security updates
- Support for a larger number of segments per node for Imply Private on Kubernetes, including for Enhanced Imply Private on Google Kubernetes Engine
- The Manage Data link is now disabled in Imply Cloud when using direct access Pivot
- Improved support for managing cluster configuration declaratively with the ability to specify Imply versions for clusters in Helm charts
Druid upgrade notes
- Because of the enhanced security of this release around HTTP and HDFS input sources, if you use an HDFS input source for anything other than HDFS (for example, WebHDFS), or an HTTP input source for anything other than HTTP or HTTPS, the task will fail upon upgrade. Either update your ingestion spec to conform or update the value of the corresponding configuration property (`druid.ingestion.http.allowedProtocols` or `druid.ingestion.hdfs.allowedProtocols`).
Changes in 2021.02
- Fix preserving the columns in a crosstab when accessed through a URL
- Fix default values on global filters are not applied when loading a dashboard
- Fix Pivot SQL timeout after 60000 ms regardless of configured timeout
- Fix using Postgres as a session or state store is broken due to an incompatibility introduced by a library upgrade
- Fix TLS 1.0 and TLS 1.1 are not correctly supported for Pivot metadata storage connections
- Fix alert conditions with percent delta cannot be entered correctly
- Fix error shown when adding a data cube to favorites
- Fix error shown when attempting to change user profile properties
- Fix global dashboard filters are not updated correctly
- Add support for multiple connections when creating a data cube from Pivot SQL
- Fix reset password link does not include
- Fix filters are not respected when compare or multi-range time selections are used in Pivot SQL
- Fix alert setup form always defines “previous period” as “previous day”
- Improve UX when adding permissions to user roles
- Fix CSV exports on time-series visualizations have incorrect column headers
- Improve error reporting for InputSource errors during sampling
- Add vectorized theta sketch aggregator and rework of
- Add bitwise math function expressions to the Druid native expression system, including `bitwiseConvertLongBitsToDouble` to allow use with double-typed columns (#10605)
- Fix the Web Console to treat null as not defined in AutoForm (#10751)
- Improve handling of damaged segments so that Historicals unload damaged segments automatically when lazy loading on start
- Fix to correctly deserialize empty keys in a map
- Add the latest AWS Web Identity Token Support for Druid to support IRSA
- Fix the Web Console to use new example manifest
- Fix an issue where the Broker starts before SQL metadata view is fully initialized even when
- Fix SegmentAnalyzer to properly close column after retrieving it
- Improve query execution to retain order of AND, OR filter children
- Fix cardinality estimation to calculate numShards independently of partition dimensions
- Fix an issue where the Broker doesn't start if `druid.extensions.useExtensionClassloaderFirst` is set to true and extensions load duplicate jars
- Better handling of Kinesis errors and message gap metric for Kinesis ingestion
- Fix `java.lang.IllegalStateException: 'other' must be a different instance from 'this'` error for some vectorized queries
- Add Ubuntu 18.04 as a supported platform for Imply Private on Linux
- Add support for non-default Kubernetes scheduler via
- Fix using authentication in Imply Private on Kubernetes breaks custom file download
- Fix security section does not appear for a cluster under API in the Cloud Manager UI
- Fix placement of security parameters in the Pivot configuration file