Imply 3.4 includes the following packages:
The Imply download includes a 30-day evaluation license of Pivot. Full licenses are included with Imply subscriptions. Contact us to learn more!
Vectorized query engines for GroupBy and Timeseries queries were introduced in Druid 0.16 as opt-in features. Since then, we have extensively tested these engines and feel that the time has come for these improvements to find a wider audience. Not all of the query engine is vectorized at this time, but with this change any query that is eligible for vectorization will be vectorized. This feature may still be disabled if you encounter any problems by setting
New in Druid 0.19.0-iap, native batch indexing supports Apache Avro Object Container File (OCF) encoded files, allowing batch ingestion of Avro data without an external Hadoop cluster. Check out the docs for more details.
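As a rough illustration, a native batch (index_parallel) task that reads Avro OCF files might be shaped like the sketch below. It assumes the druid-avro-extensions extension is loaded and that the input format type is avro_ocf; the datasource name, directory, and timestamp column are hypothetical placeholders. Check the Druid ingestion docs for your version before relying on it.

```python
import json

# Minimal sketch of an index_parallel task reading Avro OCF files.
# Assumptions: druid-avro-extensions is loaded; "avro_example",
# "/data/avro", and the "timestamp" column are hypothetical placeholders.
spec = {
    "type": "index_parallel",
    "spec": {
        "dataSchema": {
            "dataSource": "avro_example",
            "timestampSpec": {"column": "timestamp", "format": "auto"},
            # An empty dimensions list lets Druid discover dimensions itself.
            "dimensionsSpec": {"dimensions": []},
            "granularitySpec": {"segmentGranularity": "day", "queryGranularity": "none"},
        },
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {"type": "local", "baseDir": "/data/avro", "filter": "*.avro"},
            # Avro Object Container File input; no Hadoop cluster involved.
            "inputFormat": {"type": "avro_ocf"},
        },
        "tuningConfig": {"type": "index_parallel"},
    },
}

# Serialize the spec so it can be submitted to the Overlord task endpoint.
print(json.dumps(spec, indent=2))
```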
SqlInputSource has been added in Druid 0.19.0-iap to work with the native batch ingestion specifications first introduced in Druid 0.17-iap, deprecating the SqlFirehose. Like the SqlFirehose, it currently supports MySQL and PostgreSQL, using the drivers from the corresponding extensions. This is a relatively low-level ingestion task, and the operator must take care to manually ensure that the correct data is ingested, either by crafting queries so that no duplicate data is ingested for appends, or by ensuring that the entire set of data to be replaced is queried when overwriting. See the docs for more operational details.
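The following sketch shows roughly how an ioConfig using the SQL input source could look, written as a Python dict. It assumes the input source type is sql with database and sqls fields as described in the Druid docs; the connection URI, credentials, table, and query are hypothetical. The WHERE clause illustrates the operator's responsibility noted above: each query should select only rows that have not already been ingested when appending.

```python
# Minimal sketch of an index_parallel ioConfig using the "sql" input source.
# Assumptions: the MySQL extension is loaded; the connectURI, user, password,
# table name, and query below are hypothetical placeholders.
io_config = {
    "type": "index_parallel",
    "inputSource": {
        "type": "sql",
        "database": {
            "type": "mysql",
            "connectorConfig": {
                "connectURI": "jdbc:mysql://db.example.com:3306/sales",
                "user": "druid_reader",
                "password": "changeme",  # placeholder; prefer a password provider
            },
        },
        # Each query selects one bounded slice of data so that an append
        # never re-reads rows that were already ingested.
        "sqls": [
            "SELECT * FROM orders "
            "WHERE order_ts >= '2020-07-01' AND order_ts < '2020-07-02'"
        ],
    },
    "appendToExisting": True,
}
```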
A contrib extension has been added for Alibaba Cloud Object Storage Service (OSS) to provide both deep storage and an input source for batch ingestion. Since this is a contrib extension, it is not packaged by default in the binary distribution; please see community extensions for details on how to use it in your cluster.
A contrib extension, new in 0.19.0-iap, adds overlord autoscaling for Google Compute Engine (GCE). Unlike the Amazon Web Services overlord autoscaling, which provisions and terminates instances directly, the GCE autoscaler uses Managed Instance Groups to more closely align with how operators are likely to provision their clusters. Like other contrib extensions, it is not packaged by default in the binary distribution; please see community extensions for details on how to use it in your cluster.
A REGEXP_LIKE function has been added to Druid SQL and native expressions. It behaves similarly to LIKE, except that the pattern is a regular expression.
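To make the difference concrete, here is a small Python approximation of the two predicates. It assumes, as the Druid SQL function docs describe, that REGEXP_LIKE returns true if the pattern matches anywhere in the string (anchor with ^ and $ for a whole-string match), while LIKE must cover the entire string. Python's re module stands in for Druid's Java regex engine, so edge cases may differ; ESCAPE clauses are not handled.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Approximate SQL LIKE: % matches any run of characters, _ matches exactly
    one character, and the pattern must cover the whole string."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def regexp_like(value: str, pattern: str) -> bool:
    """Approximate REGEXP_LIKE: true if the regex matches anywhere in value."""
    return re.search(pattern, value) is not None
```

For example, like("druid-0.19", "0.19") is false because LIKE needs a whole-string match, whereas regexp_like("druid-0.19", r"0\.19") is true; regexp_like("druid-0.19", r"^0\.19$") is false again because the pattern is anchored.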
A new Coordinator API makes it easier to determine whether the latest published segments are available for querying. It is similar to the existing Coordinator loadstatus API, but it is datasource-specific, can take an interval, and can optionally force a refresh of the metadata store snapshot to get the most up-to-date information. Operators should still exercise caution when using this API to query large numbers of segments, especially when forcing a metadata refresh, as it can be a "heavy" call on large clusters.
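A client might build the request URL along the lines of the sketch below. The path /druid/coordinator/v1/datasources/{dataSourceName}/loadstatus and the forceMetadataRefresh and interval query parameters are taken from the Druid Coordinator API reference as we understand it; treat them as assumptions and confirm against the API docs for your version. The Coordinator address and datasource name are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def loadstatus_url(coordinator: str, datasource: str,
                   interval: Optional[str] = None,
                   force_refresh: bool = False) -> str:
    """Build a per-datasource load-status URL on the Coordinator (assumed path)."""
    # Forcing a metadata refresh is the "heavy" option; leave it off by default.
    params = {"forceMetadataRefresh": str(force_refresh).lower()}
    if interval is not None:
        # urlencode percent-encodes the "/" in an ISO-8601 interval.
        params["interval"] = interval
    path = f"/druid/coordinator/v1/datasources/{quote(datasource, safe='')}/loadstatus"
    return f"{coordinator.rstrip('/')}{path}?{urlencode(params)}"
```

The resulting URL can be fetched with any HTTP client, for example urllib.request.urlopen, from a script that waits for newly published segments before running downstream queries.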
Part bug fix, part new feature: Druid native batch (once again) supports appending new data to existing time chunks when those time chunks were partitioned with hash or range partitioning algorithms. Currently, the appended segments only support dynamic partitioning, and if you roll back to an older version, Druid will not recognize these appended segments after the downgrade. To roll back to a previous version, compact these appended segments with the rest of the time chunk so that it has a homogeneous partitioning scheme.
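An append task for this case might combine appendToExisting with a dynamic partitionsSpec, as in the sketch below (only the ioConfig and tuningConfig are shown). The input directory, filter, and JSON input format are hypothetical placeholders; the key point is that the appended segments use dynamic partitioning even though the existing time chunks were hash- or range-partitioned.

```python
# Minimal sketch of the ioConfig/tuningConfig portion of an index_parallel
# task that appends to hash- or range-partitioned time chunks.
# The baseDir, filter, and input format are hypothetical placeholders.
append_task_fragment = {
    "ioConfig": {
        "type": "index_parallel",
        "inputSource": {"type": "local", "baseDir": "/data/late-arrivals", "filter": "*.json"},
        "inputFormat": {"type": "json"},
        # Add segments to the existing time chunks instead of overwriting them.
        "appendToExisting": True,
    },
    "tuningConfig": {
        "type": "index_parallel",
        # Appended segments currently must use dynamic partitioning.
        "partitionsSpec": {"type": "dynamic"},
    },
}
```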
Previously, lookup tables were the only native table type that supported direct joins, that is, local joins on data that is available on all query processing nodes. Lookups, however, were limited in that they could comprise only a single key and value column. The Imply distribution of Druid introduces a new type of globally distributed table in 0.19.0-iap: the indexed table.
Indexed tables are multi-column tables that expand what is possible with efficient direct joins on globally distributed data. Indexed tables are backed by Druid segments and distributed among the cluster with broadcast load rules. The segments are created with some additional information that tells Druid how to load the table and which columns are the joinable key columns.
For more information, see Druid indexed tables (alpha) in the Imply knowledge base.
Druid 0.19.0-iap contains 65 bug fixes; you can see the complete list here.
Druid 0.19.0-iap fixes an important query correctness issue where dynamically partitioned segments produced by a batch ingestion task did not track the overall number of partitions. As a result, when these segments came online, they did not do so as a complete set but as individual segments, so there could be periods during the swap when queries returned results from a mix of segment versions within a time chunk.
Prior to 0.19.0-iap, Druid had a bug when using hash or range partitioning: if data skew left any of the buckets empty after ingestion, the partitions would never be recognized as complete and so would never become queryable. Druid 0.19.0-iap fixes this issue by adjusting the schema of the partitioning spec. These changes to the JSON format should be backwards compatible; however, rolling back to a previous version will again make these segments unqueryable.
A bug that affected on-prem Druid versions prior to 0.19.0-iap allowed (incorrect) Coordinator operation if druid.server.maxSize was not set. If none of the Historicals had this value set, the bug allowed segments to load but effectively balanced them randomly across the cluster, regardless of which balancer strategy was actually configured. This bug has been fixed, but as a result, druid.server.maxSize must be set to the sum of the segment cache location sizes on Historicals, or they will not load segments. No action is needed if you are using Imply Cloud.
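The required value is simple arithmetic over the Historical's druid.segmentCache.locations setting, a JSON array of objects with path and maxSize fields. The sketch below assumes every maxSize is given as a plain integer number of bytes (human-readable size strings are not handled); the two cache volumes are hypothetical.

```python
import json


def server_max_size(segment_cache_locations: str) -> int:
    """Sum maxSize (in bytes) across the druid.segmentCache.locations entries.

    Assumes each maxSize is an integer byte count, not a string like "300g".
    """
    locations = json.loads(segment_cache_locations)
    return sum(int(loc["maxSize"]) for loc in locations)


# Hypothetical Historical with two cache volumes of 300 GB and 200 GB.
locations = ('[{"path": "/mnt/disk1/segment-cache", "maxSize": 300000000000},'
             ' {"path": "/mnt/disk2/segment-cache", "maxSize": 200000000000}]')
print(f"druid.server.maxSize={server_max_size(locations)}")
# prints druid.server.maxSize=500000000000
```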
Imply 3.4 introduces the ability to define data cubes using SQL expressions, allowing users to more easily define advanced dimension extractions and measure aggregates without using the Plywood expression language.
This is an alpha-grade feature and should not be used in production environments. For more information on enabling this feature for testing purposes, please refer to the Knowledge Base article.
Additionally, in Imply 3.4, we are introducing the capability to add event annotations to time ranges on data cubes, allowing users to easily annotate real-world events like software releases or advertising campaigns against changes in data cube metrics.
This is also an alpha-grade feature and should not be used in production environments. For more information on enabling this feature for testing purposes, please refer to the Knowledge Base article.
timeShift or other time Plywood function causes a UI crash
serverRoot config setting is applied
Behind-the-scenes monitoring improvements enhance the stability of Imply Cloud operations. The Imply field team can now proactively discover and address conditions in the Amazon RDS metadata store that may lead to incidents or downtime in your Druid clusters before those incidents occur. See Monitoring the metadata store with Cloudwatch for more information.
The Imply Manager Helm chart makes it easier to deploy a distributed Imply cluster over Kubernetes. In 3.4, the Imply Manager Helm chart has been enhanced with the following features:
labels field. See the Deploy with Kubernetes documentation for more information about the Imply Helm chart.
extraVolumeClaimTemplates field. For more information, see Adding another volume claim.
If you are upgrading from a previous Imply release, please take note of the following sections.
Be aware of the following changes between 0.18.1-iap and 0.19.0-iap before upgrading. If you're updating from a version earlier than 0.18.1-iap, please see the release notes of the relevant intermediate versions.
As a side effect of a Coordinator bug fix, druid.server.maxSize must now be set for segments to be loaded. No action is needed if you are using Imply Cloud. If you are using on-prem Imply, ensure that the setting is configured correctly before upgrading your clusters, or segments will not be loaded. See Segment Cache Size in the Druid documentation for more information.
The payload column has been removed from the sys.segments table, which should make queries on this table much more efficient. The most useful fields, the list of metrics and the shardSpec, have been split out and remain available to query.
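Queries that previously parsed payload can select the split-out columns directly. The sketch below builds a request body for Druid's SQL HTTP API (POST /druid/v2/sql). It assumes the split-out columns are named shard_spec and metrics in sys.segments, as in the Druid system tables docs; the wikipedia datasource is hypothetical.

```python
import json

# Request body for Druid's SQL HTTP API (POST /druid/v2/sql).
# Assumed column names: shard_spec and metrics replace the removed payload.
# The datasource name "wikipedia" is a hypothetical placeholder.
query = {
    "query": (
        "SELECT segment_id, shard_spec, metrics "
        "FROM sys.segments "
        "WHERE datasource = 'wikipedia'"
    )
}
body = json.dumps(query)
```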
The default value of druid.segmentCache.numLoadingThreads has changed from the number of cores to the number of cores divided by 6. This should improve Historical behavior out of the box when loading a large number of segments by limiting the impact on query performance.
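For a sense of scale, the new default can be computed as below. This sketch assumes integer division with a floor of one thread, which matches how the Druid configuration reference describes the default as we understand it; a 32-core Historical would then default to 5 loading threads instead of 32.

```python
import os


def default_num_loading_threads(cores: int) -> int:
    # Assumed rule: cores divided by 6 (integer division), never below 1.
    return max(1, cores // 6)


cores = os.cpu_count() or 1
print(default_num_loading_threads(cores))
```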
Druid 0.19.0-iap adds a number of incomplete changes to facilitate more efficient join queries, based on the idea of using broadcast load rules to propagate smaller datasources across the cluster so that join operations can be pushed down to individual segment processing. While this is not yet a finished feature, as part of these changes 'broadcast' load rules no longer have the concept of 'colocated datasources', which attempted to broadcast segments only to servers that had segments of the configured datasource. This did not work well in practice because it was non-atomic: the broadcast segments would lag behind loads and drops of the colocated datasource, so we decided to remove it.
As another effect of the preliminary work to introduce efficient broadcast joins, Brokers and realtime indexing tasks now load segments loaded by broadcast rules if a segment cache is configured. Since the feature is not complete, there is little reason to do this in 0.19.0-iap, and it will not happen unless explicitly configured.
rpad functions have undergone a slight behavior change in Druid's default non-SQL-compatible mode so that they behave consistently with PostgreSQL. With the new behavior, if the pad expression is an empty string, the result is the (possibly trimmed) original characters, rather than the empty string being treated as null and coercing the result to null.
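The new empty-pad behavior can be modeled with a PostgreSQL-style right pad, sketched below in Python. The trimming and empty-pad cases follow the description above; treating a null input or null pad as a null result is an assumption based on the Druid expression docs. This is an illustration, not Druid's implementation.

```python
from typing import Optional


def rpad(value: Optional[str], length: int, chars: Optional[str]) -> Optional[str]:
    """PostgreSQL-style right pad modeling the new Druid behavior."""
    if value is None or chars is None:
        return None                 # assumed: null inputs give a null result
    if length <= 0:
        return ""
    if len(value) >= length:
        return value[:length]       # too long: trim to the requested length
    if chars == "":
        return value                # empty pad: original string, not null
    needed = length - len(value)
    repeats = needed // len(chars) + 1
    return value + (chars * repeats)[:needed]
```

Under this model, rpad("abc", 6, "xy") yields "abcxyx", rpad("abc", 6, "") yields "abc" where the old behavior produced null, and rpad("abcdef", 3, "") yields the trimmed "abc".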
For a full list of open issues, please see https://github.com/apache/druid/labels/Bug
When upgrading from Imply 3.3, which is based on Apache Druid 0.18.0, also note any items in the "Updating from previous releases" section of the Imply 3.3 release notes that may be relevant for your deployment.
As of July 15, 2020, Imply version 2.x is no longer supported. If you still have active deployments that use Imply version 2.x, you are strongly encouraged to upgrade to the current version as soon as possible. See Subscription Support Maintenance Terms for more information about supported versions.