Release notes

Imply 3.4.4 includes the following packages:

Pivot evaluation

The Imply download includes a 30-day evaluation license of Pivot. Full licenses are included with Imply subscriptions. Contact us to learn more!

Druid highlights

New features

GroupBy and Timeseries vectorized query engines enabled by default

Vectorized query engines for GroupBy and Timeseries queries were introduced in Druid 0.16 as opt-in features. Since then, we have extensively tested these engines and feel that the time has come for these improvements to reach a wider audience. Note that not all of the query engine is vectorized at this time, but with this change, any eligible query is vectorized. If you encounter any problems, you can still disable this feature by setting druid.query.vectorize to false.
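
Vectorization can also be controlled per query through the vectorize query context key (which accepts false, true, or force). A minimal sketch; the datasource, interval, and aggregator are placeholders:

```json
{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "intervals": ["2020-01-01/2020-02-01"],
  "granularity": "hour",
  "aggregations": [{ "type": "count", "name": "rows" }],
  "context": { "vectorize": "false" }
}
```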

Druid native batch support for Apache Avro Object Container Files

New in Druid 0.19.0-iap, native batch indexing now supports files encoded in the Apache Avro Object Container File format, allowing batch ingestion of Avro data without needing an external Hadoop cluster. Check out the docs for more details.
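
A minimal sketch of the relevant ioConfig portion of a native batch spec, assuming the druid-avro-extensions extension is loaded; the paths are placeholders:

```json
{
  "ioConfig": {
    "type": "index_parallel",
    "inputSource": {
      "type": "local",
      "baseDir": "/data/avro",
      "filter": "*.avro"
    },
    "inputFormat": {
      "type": "avro_ocf"
    }
  }
}
```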

Updated Druid native batch support for SQL databases

An SqlInputSource has been added in Druid 0.19.0-iap to work with the new native batch ingestion specifications first introduced in Druid 0.17-iap, deprecating the SqlFirehose. Like the SqlFirehose, it currently supports MySQL and PostgreSQL, using the drivers from those extensions. This is a relatively low-level ingestion task, and the operator must take care to manually ensure that the correct data is ingested, either by crafting queries that avoid ingesting duplicate data for appends, or by ensuring that the entire set of data is queried when overwriting. See the docs for more operational details.
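
A sketch of an SqlInputSource, assuming the MySQL extension is loaded; the connection details and query are placeholders:

```json
{
  "type": "sql",
  "database": {
    "type": "mysql",
    "connectorConfig": {
      "connectURI": "jdbc:mysql://db.example.com:3306/mydb",
      "user": "admin",
      "password": "secret"
    }
  },
  "sqls": ["SELECT ts, page, added FROM edits WHERE ts >= '2020-01-01'"]
}
```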

REGEXP_LIKE

A new REGEXP_LIKE function has been added to Druid SQL and native expressions. It behaves similarly to LIKE, except that the pattern is a regular expression.
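
For example (the table and column names are placeholders):

```sql
-- Filter rows whose "page" column matches a regular expression
SELECT page, COUNT(*) AS edits
FROM wikipedia
WHERE REGEXP_LIKE(page, '^Wiki.*')
GROUP BY page
```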

New Coordinator per datasource loadstatus API

A new Coordinator API makes it easier to determine whether the latest published segments are available for querying. It is similar to the existing Coordinator loadstatus API, but it is datasource-specific, may specify an interval, and can optionally refresh the metadata store snapshot to get the most up-to-date information. Note that operators should still exercise caution when using this API to query large numbers of segments, especially if forcing a metadata refresh, as it can potentially be a "heavy" call on large clusters.
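
A sketch of the call shape, with a placeholder datasource and a URL-encoded interval:

```
GET /druid/coordinator/v1/datasources/wikipedia/loadstatus?forceMetadataRefresh=false&interval=2020-01-01%2F2020-02-01
```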

Native batch append support for range and hash partitioning

Part bug fix, part new feature, Druid native batch (once again) supports appending new data to existing time chunks that were partitioned with the hash or range partitioning algorithms. Note that the appended segments currently support only dynamic partitioning, and that after rolling back to an older version, Druid will not recognize these appended segments. To roll back to a previous version, first compact the appended segments with the rest of the time chunk so that the time chunk has a homogeneous partitioning scheme.
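
Appending is requested through the ioConfig of a native batch spec. A sketch with placeholder input settings:

```json
{
  "ioConfig": {
    "type": "index_parallel",
    "inputSource": { "type": "local", "baseDir": "/data/new", "filter": "*.json" },
    "inputFormat": { "type": "json" },
    "appendToExisting": true
  }
}
```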

(Alpha) Indexed tables

Previously, lookup tables were the only native table type that supported direct joins, that is, local joins on data that is available on all query processing nodes. Lookups, however, were limited in that they could comprise only a single key column and a single value column. The Imply distribution of Druid introduces a new type of globally distributed table in 0.19.0-iap: the indexed table.

Indexed tables are multi-column tables that expand what is possible with efficient direct joins on globally distributed data. Indexed tables are backed by Druid segments and distributed across the cluster with broadcast load rules. The segments are created with additional information that tells Druid how to load the table and which columns are the joinable key columns.
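
For illustration, broadcast distribution is handled by a Coordinator load rule such as the following; the rule type exists in open source Druid, while the indexed-table metadata itself is specific to the Imply distribution:

```json
{
  "type": "broadcastForever"
}
```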

For more information, see Druid indexed tables (alpha) in the Imply knowledge base.

Bug fixes

Druid 0.19.0-iap contains 65 bug fixes; you can see the complete list here.

Fix for batch-ingested dynamic partitioned segments not becoming queryable atomically

Druid 0.19.0-iap fixes an important query correctness issue: dynamically partitioned segments produced by a batch ingestion task did not track the overall number of partitions. As a result, when these segments came online, they did so as individual segments rather than as a complete set, leaving periods of swapping during which queries could see mixed sets of segment versions within a time chunk.

Fix to allow 'hash' and 'range' partitioned segments with empty buckets to be queryable

Prior to 0.19.0-iap, Druid had a bug with hash or range partitioning: if data skew left any of the buckets empty after ingestion, the partition set would never be recognized as complete, and so would never become queryable. Druid 0.19.0-iap fixes this issue by adjusting the schema of the partitioning spec. These changes to the JSON format should be backwards compatible; however, rolling back to a previous version will again make these segments unqueryable.

Incorrect balancer behavior

A bug in on-prem Druid versions prior to 0.19.0-iap allowed the Coordinator to operate (incorrectly) when druid.server.maxSize was not set. If Historicals did not have this value set, segments would still load, but they would be balanced effectively at random across the cluster, regardless of which balancer strategy was actually configured. This bug has been fixed, but as a result, druid.server.maxSize must now be set to the sum of the segment cache location sizes on each Historical, or it will not load segments. No action is needed if you are using Imply Cloud.

Pivot highlights

(Alpha) Pivot SQL

Imply 3.4 introduces the ability to define data cubes using SQL expressions, allowing users to more easily define advanced dimension extractions and measure aggregates without using the Plywood expression language.

This is an alpha-grade feature and should not be used in production environments. For more information on enabling this feature for testing purposes, please refer to the Knowledge Base article.

(Alpha) Event annotations

Additionally, in Imply 3.4, we are introducing the capability to add event annotations to time ranges on data cubes, allowing users to easily mark real-world events, such as software releases or advertising campaigns, alongside changes in data cube metrics.

This is also an alpha-grade feature and should not be used in production environments. For more information on enabling this feature for testing purposes, please refer to the Knowledge Base article.

Other changes and bug fixes

Improvements

Bug fixes

Imply Cloud

Behind-the-scenes monitoring improvements enhance the stability of Imply Cloud operations. The Imply field team can now proactively discover and address conditions in the Amazon RDS metadata store that may lead to incidents or downtime in your Druid clusters, before those incidents occur. See Monitoring the metadata store with CloudWatch for more information.

Imply Manager highlights

Imply Helm chart changes

The Imply Manager Helm chart makes it easier to deploy a distributed Imply cluster on Kubernetes. In 3.4, the Imply Manager Helm chart has been enhanced with the following features:

Bug fixes

Upgrading from previous releases

If you are upgrading from a previous Imply release, please take note of the following sections.

Druid upgrade notes

Be aware of the following changes between 0.18.1-iap and 0.19.0-iap before upgrading. If you're upgrading from a version earlier than 0.18.1-iap, please see the release notes of the relevant intermediate versions.

druid.server.maxSize must now be set for Historical servers

As a side effect of a Coordinator bug fix, druid.server.maxSize must now be set for segments to be loaded. No action is needed if you are using Imply Cloud. If you are using on-prem Imply, ensure that the setting is configured correctly before upgrading your clusters, or else segments will not be loaded. See Segment Cache Size in the Druid documentation for more information.
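
A minimal sketch of the relevant Historical runtime.properties, assuming two hypothetical cache locations totaling 600 GB:

```properties
# druid.server.maxSize must equal the sum of the maxSize values below
druid.server.maxSize=600000000000
druid.segmentCache.locations=[{"path":"/mnt/disk1/segments","maxSize":300000000000},{"path":"/mnt/disk2/segments","maxSize":300000000000}]
```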

System table 'sys.segments' column 'payload' has been removed and replaced with dimensions, metrics, and shardSpec

The payload column has been removed from the sys.segments table, which should make queries on this table much more efficient. The most useful fields it contained (the list of dimensions, the list of metrics, and the shardSpec) have been split out into their own columns and remain available to queries.
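
For example, a sketch that inspects the new columns (exposed in the system table as dimensions, metrics, and shard_spec) for a placeholder datasource:

```sql
SELECT segment_id, dimensions, metrics, shard_spec
FROM sys.segments
WHERE datasource = 'wikipedia'
LIMIT 10
```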

Changed default number of segment loading threads

The default value of the druid.segmentCache.numLoadingThreads configuration has changed from the number of cores to the number of cores divided by 6. This should improve Historical behavior out of the box when loading a large number of segments, by limiting the impact on query performance.
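
If you prefer the previous behavior, you can set the value explicitly; the value below is a placeholder for your machine's core count:

```properties
druid.segmentCache.numLoadingThreads=16
```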

Broadcast load rules no longer have colocated datasources

Druid 0.19.0-iap includes a number of preliminary changes to facilitate more efficient join queries, based on the idea of using broadcast load rules to distribute smaller datasources across the cluster so that join operations can be pushed down to individual segment processing. While this is not yet a finished feature, as part of these changes, 'broadcast' load rules no longer have the concept of 'colocated datasources', which attempted to broadcast segments only to servers that had segments of the configured datasources. This did not work well in practice: it was non-atomic, meaning that the broadcast segments would lag behind loads and drops of the colocated datasource, so we decided to remove it.
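
If your existing rules use this field, remove it when upgrading. Before (the datasource name is a placeholder):

```json
{
  "type": "broadcastForever",
  "colocatedDataSources": ["small_table"]
}
```

After:

```json
{
  "type": "broadcastForever"
}
```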

Brokers and realtime tasks may now be configured to load segments from 'broadcast' datasources

As another effect of the previously mentioned preliminary work to introduce efficient broadcast joins, Brokers and realtime indexing tasks now load segments loaded by broadcast rules if a segment cache is configured. Since the feature is not complete, there is little reason to do this in 0.19.0-iap, and it will not happen unless explicitly configured.
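
A sketch of opting a Broker in by configuring a segment cache; the path and size are placeholders:

```properties
druid.segmentCache.locations=[{"path":"/var/druid/broker-segment-cache","maxSize":10000000000}]
```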

lpad and rpad function behavior change

The lpad and rpad functions have undergone a slight behavior change in Druid's default non-SQL-compatible mode, in order to make them behave consistently with PostgreSQL. With the new behavior, if the pad expression is an empty string, the result is the (possibly trimmed) original characters, rather than the empty string being treated as a null and coercing the result to null.
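
A sketch of the new behavior in Druid SQL:

```sql
-- Pads to length 10, but the pad string is empty, so the input is returned as-is.
-- Previously this returned NULL in non-SQL-compatible mode.
SELECT LPAD('druid', 10, '')  -- 'druid'

-- The target length is shorter than the input, so the input is trimmed.
SELECT LPAD('druid', 3, '')   -- 'dru'
```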

Other known issues

For a full list of open issues, please see https://github.com/apache/druid/labels/Bug

Pivot upgrade notes

OIDC configuration change

If you are using an OIDC authentication provider with Pivot, such as Okta, you need to change your OIDC configuration.

The format of the issuer field has changed. Previously, the documentation indicated that the value should include /oauth2/default. Instead, change the issuer value to be the base URL only, without a trailing slash. This URL is automatically concatenated with /.well-known/openid-configuration. In the rare case that the OpenID Configuration Document for your provider is not at this address, you can use the newly introduced discoveryUrl field to provide the exact URL for the OpenID Configuration Document.
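
An illustrative sketch of the updated fields in the Pivot configuration; only issuer and discoveryUrl are named in these notes, and the example URLs are placeholders:

```yaml
# Base URL only: no /oauth2/default and no trailing slash
issuer: https://dev-123456.okta.com
# Optional; set only if the OpenID Configuration Document is at a nonstandard path
# discoveryUrl: https://dev-123456.okta.com/custom/.well-known/openid-configuration
```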

See Using OIDC in Pivot for more information.

Upgrading from earlier releases

When upgrading from Imply 3.3, which is based on Apache Druid 0.18.0, also note any items in the "Updating from previous releases" section of the Imply 3.3 release notes that may be relevant for your deployment.

Deprecation and removal notices

End of support

As of July 15, 2020, Imply version 2.x is no longer supported. If you still have active deployments that use Imply version 2.x, you are strongly encouraged to upgrade to the current version as soon as possible. See Subscription Support Maintenance Terms for more information about supported versions.

Changes in 3.4.1

Pivot Changes

Druid Changes

Changes in 3.4.2

Pivot Changes

Druid Changes

Changes in 3.4.3

Pivot Changes

Druid Changes

Changes in 3.4.4

Pivot Changes

Druid Changes
