Release notes

Imply 3.2.4 is based on Apache Druid 0.17.0 and includes Pivot.

Pivot evaluation

The Imply download includes a 30-day trial evaluation of Pivot. Full licenses are included with Imply subscriptions — contact us to learn more!

Druid highlights

Batch ingestion improvements

Druid 0.17.0 includes a significant update to the native batch ingestion system. This update adds the internal framework to support non-text binary formats, with initial support for ORC and Parquet. Additionally, native batch tasks can now read data from HDFS.

This rework changes how the ingestion source and data format are specified in the ingestion task. To use the new features, please refer to the documentation on InputSources and InputFormats.

For additional background, see https://github.com/apache/druid/issues/8812.
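
A minimal sketch of how the two new sections fit together in an index_parallel task's ioConfig, assuming a hypothetical HDFS path and Parquet data (the parquet inputFormat requires the corresponding extension to be loaded):

    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "hdfs",
        "paths": "hdfs://namenode:9000/example/data/*.parquet"
      },
      "inputFormat": {
        "type": "parquet"
      }
    }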

Single dimension range partitioning for parallel native batch ingestion

The parallel index task now supports the single_dim type in its partitionsSpec, which enables range-based partitioning on a single dimension.

Please see https://druid.apache.org/docs/0.17.0/ingestion/native-batch.html for details.
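
For illustration, a single_dim partitions spec might look like the following sketch (the dimension name and row target are hypothetical; field names follow the renames described under "Renamed partition spec fields" below):

    "partitionsSpec": {
      "type": "single_dim",
      "partitionDimension": "country",
      "targetNumRowsPerSegment": 5000000
    }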

Compaction changes

Parallel index task split hints

The parallel indexing task has a new tuningConfig option, splitHintSpec, that lets operators control how much data each first-phase subtask reads. There is currently one split hint spec type, SegmentsSplitHintSpec, used for re-ingesting Druid segments.
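
As a sketch, a split hint for re-ingesting segments might look like this (the byte limit is a hypothetical value; see the native batch documentation for the exact field names):

    "tuningConfig": {
      "type": "index_parallel",
      "splitHintSpec": {
        "type": "segments",
        "maxInputSegmentBytesPerTask": 500000000
      }
    }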

Parallel auto-compaction

Auto-compaction can now use the parallel indexing task, allowing for greater compaction throughput.

To control the level of parallelism, the auto-compaction tuningConfig has new parameters, maxNumConcurrentSubTasks and splitHintSpec.

Please see https://druid.apache.org/docs/0.17.0/configuration/index.html#compaction-dynamic-configuration for details.

https://github.com/apache/incubator-druid/pull/8570
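
A minimal sketch of an auto-compaction dynamic config that opts in to parallelism, assuming a hypothetical datasource and subtask count:

    {
      "dataSource": "wikipedia",
      "tuningConfig": {
        "maxNumConcurrentSubTasks": 4
      }
    }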

Stateful auto-compaction

Auto-compaction now uses the partitionsSpec to track changes made by previous compaction tasks, allowing the coordinator to reduce redundant compaction operations.

Please see https://github.com/apache/druid/issues/8489 for details.

If you have auto-compaction enabled, please see the information under "Stateful auto-compaction changes" in the "Druid upgrade notes" section before upgrading.

Parallel query merging on brokers

The Druid broker can now opportunistically merge query results in parallel using multiple threads.

Please see druid.processing.merge.useParallelMergePool in the Broker section of the configuration reference for details on how to configure this new feature.

Parallel merging is enabled by default, and most users should not need to change any of the advanced configuration properties described in the configuration reference.

Additionally, merge parallelism can be controlled on a per-query basis using the query context. Information about the new query context parameters can be found at https://druid.apache.org/docs/0.17.0/querying/query-context.html.

https://github.com/apache/incubator-druid/pull/8578
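
For example, parallel merging can be tuned for a single query through its context, along these lines (the parallelism value is hypothetical):

    "context": {
      "enableParallelMerge": true,
      "parallelMergeParallelism": 4
    }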

SQL-compatible null handling

We have added official documentation for a previously undocumented feature, Druid's SQL-compatible null handling mode.

Please see https://druid.apache.org/docs/0.17.0/configuration/index.html#sql-compatible-null-handling and https://druid.apache.org/docs/0.17.0/design/segments.html#sql-compatible-null-handling for details.
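
For reference, this mode is controlled by a common runtime property; setting it to false enables SQL-compatible null handling, while the default of true keeps Druid's legacy behavior:

    druid.generic.useDefaultValueForNull=false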

Self-discovery resource

A pair of new endpoints has been added to all Druid services. These endpoints return whether the service has received confirmation from the central service discovery mechanism (currently ZooKeeper) that it has been added to the cluster, which makes them useful as health/readiness checks.

The new endpoints are listed in the Druid API reference.

https://github.com/apache/incubator-druid/pull/6702 https://github.com/apache/incubator-druid/pull/9005

Supervisors system table

Task supervisors (e.g., Kafka or Kinesis supervisors) are now recorded in a new sys.supervisors system table.

Please see https://druid.apache.org/docs/0.17.0/querying/sql.html#supervisors-table for details.

https://github.com/apache/incubator-druid/pull/8547

Fast historical start with lazy loading

A new boolean configuration property for historicals, druid.segmentCache.lazyLoadOnStart, has been added.

This new property allows historicals to defer loading of a segment until the first time that segment is queried, which can significantly decrease historical startup times for clusters with a large number of segments.

Please see the configuration reference for details.

https://github.com/apache/incubator-druid/pull/6988
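
For example, in the historical runtime properties:

    druid.segmentCache.lazyLoadOnStart=true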

Historical segment cache distribution change

A new historical property, druid.segmentCache.locationSelectorStrategy, has been added.

If multiple segment storage locations are specified in druid.segmentCache.locations, the new locationSelectorStrategy property lets you choose the strategy used to distribute segments across those locations. The currently supported options are roundRobin and leastBytesUsed.

Please see the configuration reference for details.

https://github.com/apache/incubator-druid/pull/8038
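
A sketch of what this might look like in the historical runtime properties; the exact syntax for specifying the strategy is an assumption here, so confirm it against the configuration reference:

    druid.segmentCache.locations=[{"path":"/mnt/disk1/segments","maxSize":300000000000},{"path":"/mnt/disk2/segments","maxSize":300000000000}]
    druid.segmentCache.locationSelectorStrategy={"type":"roundRobin"}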

New readiness endpoints

A new Broker endpoint has been added: /druid/broker/v1/readiness.

A new Historical endpoint has been added: /druid/historical/v1/readiness.

These endpoints are similar to the existing /druid/broker/v1/loadstatus and /druid/historical/v1/loadstatus endpoints.

They differ in that they do not require authentication or authorization checks, and instead of a JSON body they return only an HTTP 200 (ready) or 503 (not ready) response code.

https://github.com/apache/incubator-druid/pull/8841
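
For example, a readiness probe could check the status code directly (host and port are hypothetical; 8082 is the default Broker port):

    curl -s -o /dev/null -w "%{http_code}\n" http://broker.example.com:8082/druid/broker/v1/readiness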

Support task assignment based on MiddleManager categories

It is now possible to define a "category" name property for each MiddleManager. New worker select strategies that are category-aware have been added, allowing the user to control how tasks are assigned to MiddleManagers based on the configured categories.

Please see the documentation for druid.worker.category in the configuration reference, and the following link, for more details:

https://github.com/apache/druid/pull/7066
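
As an illustrative sketch, a MiddleManager could be tagged in its runtime properties, and the Overlord dynamic worker configuration could then route parallel indexing tasks to that category (the category name is hypothetical, and the exact strategy and spec field names should be confirmed against the configuration reference):

    druid.worker.category=batch-workers

    {
      "selectStrategy": {
        "type": "equalDistributionWithCategorySpec",
        "workerCategorySpec": {
          "strong": true,
          "categoryMap": {
            "index_parallel": {"defaultCategory": "batch-workers"}
          }
        }
      }
    }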

Security vulnerability updates

A large number of dependencies have been updated to newer versions to address security vulnerabilities.

Please see the Druid 0.17.0 release notes for the full list of dependency update PRs.

Pivot highlights

Alerts GA release

With Imply 3.2, Alerts are now available in Pivot for all accounts with the appropriate license, and will be made available in Clarity over the coming weeks. Alerts can be configured with multiple condition thresholds, checking either aggregate measures for a dimension or individual dimension values. Users are notified via email or webhook when an alert triggers.

Scheduled Reporting

Pivot users can now also create regularly scheduled reports, delivered in-application or via email. Scheduled report configurations support all existing data export formats, such as CSV, JSON, and Excel, as email attachments. Additionally, report configurations support email delivery to external, non-Pivot users. As with Alerts, an appropriate license is required to enable the scheduled reports feature.

Improved async download experience

In previous versions of Pivot, data exports using the DownloadLargeData permission were susceptible to unpredictable timeouts. Async downloads are now more robust against long-running queries.

Customization improvements

Pivot can now be configured to hide the underlying version information and its accompanying modal. It can also be configured to display a custom message when a logged-in user does not have any permissions, e.g. when a user authorized via LDAP has no matching groups.

Miscellaneous improvements

Bug fixes

Clarity highlights

We’ve been working hard to bring Clarity up to speed with recent changes in Pivot. In Imply 3.2, Clarity has been updated behind the scenes to pull in many improvements and performance enhancements from Pivot libraries. The primary user-facing result of this will be the ability to configure Alerts in Clarity, but expect many more great Clarity enhancements in the near future!

Upgrading from previous releases

If you are upgrading from a previous Imply release, please take note of the following sections.

Druid upgrade notes

Select native query type has been removed

The deprecated Select native query type has been removed in 0.17.0.

If you have native queries that use Select, you need to modify them to use Scan instead. See the Scan query documentation for syntax and output format details.

For Druid SQL queries that use Select, no changes are needed; the SQL planner already uses the Scan query type under the covers for Select queries.

https://github.com/apache/incubator-druid/pull/8739
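
For example, a simple Select-style query can usually be rewritten as a Scan query along these lines (datasource, interval, and columns are illustrative):

    {
      "queryType": "scan",
      "dataSource": "wikipedia",
      "intervals": ["2020-01-01/2020-01-02"],
      "columns": ["__time", "page", "user"],
      "resultFormat": "compactedList",
      "limit": 100
    }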

Old consoles have been removed

The legacy Coordinator and Overlord consoles have been removed and replaced by the new web console, which is available on both the Coordinator and Overlord.

https://github.com/apache/incubator-druid/pull/8838

Calcite 1.21 upgrade, Druid SQL null handling

Druid 0.17.0 updates Calcite to version 1.21. This newer version of Calcite can make additional optimizations that assume SQL-compatible null handling behavior when planning queries.

If you use Druid SQL and rely on null handling behavior, please read the information at https://druid.apache.org/docs/0.17.0/configuration/index.html#sql-compatible-null-handling and ensure that your Druid cluster is running in SQL-compatible null handling mode before upgrading.

https://github.com/apache/incubator-druid/pull/8566

Logging adjustments

Druid 0.17.0 has tidied up its lifecycle, querying, and ingestion logging.

Please see https://github.com/apache/incubator-druid/pull/8889 for a detailed list of changes. If you relied on specific log messages for external integrations, please review the new logging changes before upgrading.

The full set of log messages can still be seen when logging is set to DEBUG level. Template log4j2 configuration files that show how to enable per-package DEBUG logging are provided in the _common configuration folder in the example clusters under conf/druid.
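
For reference, per-package DEBUG logging in log4j2 is typically enabled with a logger entry like the following (the package and appender name are illustrative):

    <Logger name="org.apache.druid.server.coordinator" level="debug" additivity="false">
      <AppenderRef ref="Console"/>
    </Logger>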

Stateful auto-compaction changes

The auto-compaction scheduling logic in 0.17.0 tracks additional segment partitioning information in Druid's metadata store that is not present in older versions. This information is used to determine whether a set of segments has already been compacted under the cluster's current auto-compaction configurations.

When this metadata is not present, a set of segments is always scheduled for an initial compaction; the metadata is created once they are compacted, allowing the scheduler to skip them later as long as the auto-compaction config is unchanged.

Since this additional segment partitioning metadata is not present before 0.17.0, the auto-compaction scheduling logic will re-compact all segments within a datasource once after the upgrade to 0.17.0.

Because of this one-time re-compaction of the entire set of segments in each datasource that has auto-compaction enabled, expect a temporary increase in scheduled compaction tasks and in deep storage usage after the upgrade, since old segments are retained until they are deleted. Documentation on removing old segments is located at https://druid.apache.org/docs/0.17.0/ingestion/data-management.html#deleting-data

targetCompactionSizeBytes property removed

The targetCompactionSizeBytes property has been removed from the compaction task and auto-compaction configuration. For auto-compaction, maxRowsPerSegment is now a mandatory configuration. For non-auto compaction tasks, any partitionsSpec can be used.

https://github.com/apache/incubator-druid/pull/8573
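
A minimal auto-compaction configuration therefore now looks something like this sketch (datasource and value are hypothetical):

    {
      "dataSource": "wikipedia",
      "maxRowsPerSegment": 5000000
    }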

Compaction task tuningConfig

Due to the parallel auto-compaction changes introduced by #8570, any manually submitted compaction task specs need to be updated to use an index_parallel type for the tuningConfig section instead of index (see the example spec under "Compaction task ioConfig" below). These spec changes should be applied after the cluster is upgraded to 0.17.0.

Existing auto-compaction configs can remain unchanged after the update; the auto-compaction will create non-parallel compaction tasks until the auto-compaction configs are updated to use parallelism post-upgrade.

To control the level of parallelism, the auto-compaction tuningConfig has new parameters, maxNumConcurrentSubTasks and splitHintSpec.

Please see https://druid.apache.org/docs/0.17.0/configuration/index.html#compaction-dynamic-configuration for details.

Compaction task ioConfig

The compaction task now requires an ioConfig in the task spec.

Please see https://druid.apache.org/docs/0.17.0/ingestion/data-management.html#compaction-ioconfig for details.

ioConfig does not have to be added to existing auto-compaction configurations; after the upgrade, the Coordinator will automatically create task specs with ioConfig sections.

https://github.com/apache/incubator-druid/pull/8571
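
Putting the two compaction task changes together, a manually submitted spec might be sketched as follows (datasource and interval are hypothetical; confirm the ioConfig field names against the compaction documentation):

    {
      "type": "compact",
      "dataSource": "wikipedia",
      "ioConfig": {
        "type": "compact",
        "inputSpec": {
          "type": "interval",
          "interval": "2020-01-01/2020-02-01"
        }
      },
      "tuningConfig": {
        "type": "index_parallel"
      }
    }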

Renamed partition spec fields

The targetPartitionSize and maxSegmentSize fields in the partition specs have been deprecated and renamed to targetNumRowsPerSegment and maxRowsPerSegment, respectively.

https://github.com/apache/incubator-druid/pull/8507

Cache metrics are off by default

Cache metrics are now disabled by default. To enable cache metrics, add "org.apache.druid.client.cache.CacheMonitor" to the druid.monitoring.monitors property.

https://github.com/apache/incubator-druid/pull/8561
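
For example:

    druid.monitoring.monitors=["org.apache.druid.client.cache.CacheMonitor"]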

Supervisor API has changed to be consistent with task API

Supervisor task specs should now put the dataSchema, tuningConfig, and ioConfig sections as subfields of a spec field. Please see https://github.com/apache/incubator-druid/pull/8810 for examples.

The old format is still accepted in 0.17.0.
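
The new layout nests the three sections under a spec field, roughly as follows (sections elided for brevity):

    {
      "type": "kafka",
      "spec": {
        "dataSchema": { ... },
        "ioConfig": { ... },
        "tuningConfig": { ... }
      }
    }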

Segments API semantics change

The /datasources/{dataSourceName}/segments endpoint on the Coordinator now returns all used segments (including overshadowed) on the specified intervals, rather than only visible ones.

https://github.com/apache/incubator-druid/pull/8564

Password provider for basic authentication of HttpEmitterConfig

The druid.emitter.http.basicAuthentication property now accepts a password provider.

https://github.com/apache/incubator-druid/pull/8618
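
For example, instead of embedding plaintext credentials, the property can reference an environment variable password provider (the variable name is hypothetical and would hold the credentials):

    druid.emitter.http.basicAuthentication={"type":"environment","variable":"HTTP_EMITTER_BASIC_AUTH"}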

Multivalue expression transformation change

Reusing multi-valued columns in expressions will no longer result in unnecessary Cartesian explosions. Please see the following links for details.

https://github.com/apache/druid/issues/8947 https://github.com/apache/incubator-druid/pull/8957

Kafka/Kinesis ingestion during rolling upgrades

During a rolling upgrade, if a task running 0.17.0 reports to an Overlord running an older version, and that task has made progress reading its stream but rejected all of the records it saw (e.g., all were unparseable), the older Overlord will throw NullPointerExceptions when the task updates it with its current stream offsets.

Previously, there was a bug in this area (https://github.com/apache/incubator-druid/issues/8765) where such tasks would fail to communicate their current offsets to the overlord. The task/overlord publishing protocol has been updated to fix this, but older overlords do not recognize this protocol change.

This condition should be fairly rare.

ParseSpec.verify method removed

If you maintain a custom extension that implements the ParseSpec interface, note that the verify method has been removed from the interface. The @Override annotation on that method will need to be removed in any custom implementations.

https://github.com/apache/druid/pull/8744
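
In practice, the change amounts to deleting the overridden method from your implementation, roughly as sketched below (class and signature details are illustrative):

    // Before (0.16.x): a custom ParseSpec implementation overriding verify().
    @Override
    public void verify(List<String> usedCols)
    {
      // custom validation logic
    }

    // After (0.17.0): delete the method, or at minimum the @Override
    // annotation, since ParseSpec no longer declares verify().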

Known issues

Ingestion spec preview in web console

The preview specs shown for native batch ingestion tasks created in the Data Loader of the web console are not correctly formatted and will fail if you copy and submit them manually. However, submitting these specs through the Data Loader itself produces a correctly formatted spec.

https://github.com/apache/druid/issues/9144

Hadoop version upgrade conflict with ingestion specs

As part of security updates to Imply, the bundled Hadoop client libraries (hadoop-client and hadoop-aws) have been updated from version 2.8.3 to 2.8.5.

If you have specified an earlier Hadoop version either in Druid configuration files (as described in https://druid.apache.org/docs/latest/operations/other-hadoop.html) or in the Hadoop ingestion specs, you need to update the specified version after updating to Imply 3.2.

In your ingestion spec, you can either update the version number indicated by the hadoopDependencyCoordinates property or remove the property entirely from the spec, since the bundled version is now the default. For more information on the property, see the task syntax example in the Druid Hadoop documentation.

Alternatively, after upgrading to 3.2, you can work around the upgrade issue by reinstalling the version of the Hadoop libraries referenced by your configuration or ingestion spec.
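
For example, an ingestion spec pinned to the old version would be updated to the bundled coordinates (or the property removed entirely):

    "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.8.5"]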

Known issues with HDFS input source

In Imply Cloud, HDFS is not currently available as an input source for native batch ingestion (index or index_parallel task types). It continues to be available through Hadoop-based ingestion (index_hadoop task type).

In other environments, the new native batch HDFS input source requires the druid-hdfs-storage extension to be loaded. This is not currently possible unless you are also using HDFS for deep storage; otherwise, you will get an error related to "HdfsDataSegmentKiller" when trying to load the extension.

Other known issues

For a full list of open issues, please see https://github.com/apache/druid/labels/Bug

Upgrading from earlier releases

When upgrading from Imply 3.1, which is based on Apache Druid 0.16.0, please additionally take note of the items in the "Upgrading from previous releases" section of the Imply 3.1 release notes, where relevant for your deployment.

Changes in 3.2.1

Pivot changes

Druid changes

Changes in 3.2.2.3

Pivot changes

Druid changes

Changes in 3.2.2.4

Druid changes

Changes in 3.2.3.1

Pivot changes

Druid changes

Changes in 3.2.3.2

Druid changes

Changes in 3.2.4

Pivot changes

Druid changes
