
Imply Enterprise and Hybrid release notes

Read all release notes carefully, especially the Upgrade and downgrade notes, before upgrading. The following release notes provide information on features, improvements, and bug fixes up to Imply STS release 2024.07.

For information on the LTS release, see the LTS release notes.

If you are upgrading by more than one version, read the intermediate release notes too.

The following end-of-support dates apply in 2023:

  • On January 26, 2023, Imply 2021.01 LTS reached EOL. This means that the 2021.01 LTS release line will no longer receive any patches, including security updates. Imply recommends that you upgrade to the latest LTS or STS release.
  • On January 31, 2023, Imply 2022.01 LTS ended general support status and is eligible only for security support.

For more information, see Lifecycle Policy.

See Previous versions for information on older releases.

Imply evaluation

New to Imply? Get started with an Imply Hybrid (formerly Imply Cloud) Free Trial or start a self-hosted trial at Get started with Imply!

With Imply Hybrid, the Imply team manages your clusters in AWS, while you control the infrastructure and own the data. With self-hosted Imply, you can run Imply on *NIX systems in your own environment or cloud provider.

Imply Enterprise

If you run Imply Enterprise, see Imply product releases & downloads to access the Imply Enterprise distribution. When prompted, log on to Zendesk with your Imply customer credentials.

Changes in 2024.07

Pivot highlights

Multi-axis line chart visualization (beta)

The new multi-axis line chart visualization is optimized for displaying multiple axes: you can display up to 10 measures on a single chart.

You can enable this beta feature through the SDK based visualizations feature flag. Once enabled, you can continue to access the original line chart as well (id: 39612)

Window functions in custom measures

You can now use window functions when defining custom measures in data cubes. See Custom measure examples for details (id: 41427)

Pivot changes

  • Fixed folder duplicates appearing in the dimensions list (id: 61757)
  • Fixed re-sorted axes producing incorrect filters in the table visualization (id: 61723)
  • Fixed some dashboard tiles not applying filters when they are off-screen (id: 61722)
  • Fixed a problem displaying "Other" values in a filtered table visualization (id: 61704)
  • Fixed data cube instances not working with some visualizations (id: 61612)
  • Fixed filter not persisting in the filter bar in the flat table visualization (id: 61499)
  • Fixed a problem with downloads from flat table visualizations when All rows is selected (id: 61045)

Druid changes

  • Added a way for columns to provide GroupByVectorColumnSelectors, which controls how the GroupBy engine operates on them (#16338) (id: 61938)
  • Added druid-parquet-extensions to all example quickstarts (#16664) (id: 61937)
  • Added support for bootstrap segments (#16609) (id: 61844)
  • Added formatted JSON values to web console displays (#16632) (id: 61827)
  • Added druid.azure.account and druid.azure.container properties to Azure deep storage configuration (#16561) (id: 61643)
  • Added authorization checks for permissionless internal requests (#16419) (id: 61543)
  • Added interface method for returning canonical lookup name (#16557) (id: 61542)
  • Added new usage metrics for CPU and memory control groups (#16472) (id: 61401)
  • Added the appropriate hash strategy and the equals method for IP types so they can be grouped on (id: 60641)
  • Added private method handleConnectionStateChanged to handle connection state changes (#16528) (id: 57657)
  • Improved `AbstractSegmentMetadataCache` by changing the log level to debug to avoid logging the signature for each segment (#16565) (id: 61552)
  • Improved allocation and supervisor logs for easier debugging (#16535) (id: 61468)
  • Improved AutoCompactionSnapshotBuilder (#16523) (id: 61447)
  • Improved event hubs by disabling when Kafka extensions aren't loaded (#16559) (id: 61572)
  • Improved window operators by enabling reordering (#16482) (id: 39694)
  • Improved Kafka support by enabling use of CSV input format in Kafka record when "Parse Kafka metadata" is also enabled (#16630) (id: 61255)
  • Improved S3UploadThreadPool by exposing its metrics (#16616) (id: 61732)
  • Improved GroupIteratorForWindowFrame by extending its use for aggregate computations of PeerType ROWS (#16603) (id: 61834)
  • Improved segment allocation by optimizing unused segment query (#16623) (id: 61731)
  • Improved JsonInputFormat by simplifying its serialized form (#15691) (id: 61547)
  • Improved druid.indexer.tasklock.batchAllocationWaitTime by updating default value to zero (#16578) (id: 43004)
  • Improved UsedSegmentChecker by renaming it to PublishedSegmentsRetriever and cleaning up task actions (#16644) (id: 61940)
  • Improved Azure extension by removing an unused converter file (#16541) (id: 61491)
  • Improved ResultCache keys by removing incorrect UTF8 conversion (#16569) (id: 61693)
  • Improved indexing by removing indexrealtime and indexrealtimeappenderator tasks (#16602) (id: 61895)
  • Fixed vector grouping expression deferring evaluation to only consider dictionary-encoded strings as fixed width (#16666) (id: 61965)
  • Fixed `CgroupCpuSetMonitor` throwing null pointers when activated (id: 61841)
  • Fixed null pointer exceptions when CgroupCpuSetMonitor was enabled (#16621) (id: 61806)
  • Fixed duplicate entry logs during pending segment allocation (#16605) (id: 61770)
  • Fixed attempts to publish the same pending segments multiple times (#16605) (id: 61769)
  • Fixed retry logic in BrokerClient (#16618) (id: 61746)
  • Fixed task replica failures due to inconsistent metadata (#16614) (id: 61730)
  • Fixed a bug causing maxSubqueryBytes to fail when segments have missing columns (#16619) (id: 61659)
  • Fixed NestedDataColumnIndexerV4 reporting incorrect cardinality (#16507) (id: 61656)
  • Fixed pagination and filtering regression in supervisor view in the web console (#16571) (id: 61644)
  • Fixed expression column capabilities so that they don't report as dictionary-encoded unless the input is a string (#16577) (id: 61642)
  • Fixed query with floor(exp(least())) in the filter returning incorrect result (#16649) (id: 61594)
  • Fixed query with greatest(floor()) in the filter returning incorrect result (#16649) (id: 61592)
  • Fixed capabilities reported by UnnestStorageAdapter (#16551) (id: 61544)
  • Fixed delta sorting in the explore view table in the web console (#16542) (id: 61501)
  • Fixed race condition in AzureClient factory fetch (#16525) (id: 61466)
  • Fixed a condition where two Coordinators could be elected leader (#16411) (id: 61456)
  • Fixed an ip_stringify() error reporting that the column doesn't exist when there is a segment of null values (id: 61436)
  • Fixed queries filtering for the same condition with both an IN and EQUALS so they don't return empty results (#16597) (id: 61239)
  • Fixed schema backfill count metric (#16536) (id: 60745)
  • Fixed the grouping engine for a query with grouping sets when a limit is applied with order by columns different to the query dimensions (#16534) (id: 60356)
  • Fixed window function in a subquery returning "Cannot convert to Scan query without any columns" (#16502) (id: 42784)
  • Fixed an issue where an is null filter on an unnest query with json_value() returned "Unhandled Query Planning Failure" (id: 37339)
  • Fixed an issue where an is not null filter had no effect on an unnest query with json_value() output (id: 37258)
  • Upgraded DeepJavaLibrary (DJL) to address CVE-2024-37902 (id: 61918)
  • Upgraded Calcite to 1.37 (#16504) (id: 61734) (id: 60501)

Clarity changes

  • Added the ability to distinguish between task types in the ingestion view (id: 62005)
  • Improved percentile calculations to be more accurate and faster (id: 61620)
  • Added fsDevName and fsDirName as dimensions in Raw metrics (id: 62000)
  • Fixed the raw metrics count measure (id: 61892)
  • Added Asia/Yangon timezone (id: 61701)

Imply Manager changes

  • Added support for managing loadBalancerSourceRanges to the Helm chart (id: 3128)
  • You can now forbid a password from including any of the following: username, email, first or last name (id: 43193)
  • Imply Enterprise enhanced on GKE can now be configured to use internal IP addresses for cluster nodes (id: 43068)
  • Fixed a problem with timestamp_format in the CloudFormation logs (id: 61865)
  • Fixed an issue where password requirements were not being validated (id: 3480)

Changes in 2024.06.1

Druid changes

  • Improved the query that's used to fetch unused segments for a datasource. It now finishes more quickly. In a datasource with 1.8 million unused segments, Druid can now return results in less than a second. Previously, results in that scenario could take over 30 seconds. A long wait for results could lead to issues for the Overlord service (#16623) (id: 61731)

Changes in 2024.06

Pivot highlights

Data cube time zone setting

If you change the time zone in a data cube's settings, you can now apply the same time zone the next time you access the data cube, as either a one-time or persistent change. See Managing data cubes for details.

(id: 39603)

Pinned dimensions in query parameters

You can now use pinnedDimensions as a query parameter in a Pivot URL to pin one or more specified dimensions to the sidebar. See Data cube and dashboard query parameters reference for details.

(id: 60664)

Druid highlights

High-precision geospatial filters

High-precision geospatial filters use a geo dimension to provide the same filters and bound types as spatial dimensions and filters but at a higher level of precision, offering more options for working with your geospatial data. They replace the lower-precision geospatial filters that Druid offers out of the box.

To enable them, load the imply-utility-belt extension. For more information, see High-precision geospatial filters.

Zookeeper-based segment loading removed

Improvements to the Druid Coordinator have made HTTP-based segment loading the recommended approach.

Zookeeper-based segment loading, which is known to have issues and has been deprecated for several releases, has therefore been removed.

The following configs have been removed because they are no longer used:

  • druid.coordinator.load.timeout: Not needed, as the default value of this parameter (15 minutes) is known to work well for all clusters.
  • druid.coordinator.loadqueuepeon.type: Not needed, as this value will always be http.
  • druid.coordinator.curator.loadqueuepeon.numCallbackThreads: Not needed, as Zookeeper (Curator)-based segment loading is no longer an option.

If set in any cluster, these configs are ignored by Druid.

Automatic cleanup of compaction configs of inactive datasources is now enabled by default.

(#15705) (id: 60764)

Pivot changes

  • Improved performance of the background reports job runner (id: 60996)
  • Updated AccessDashboards so that users with this permission can't expand a tile in a dashboard or access the underlying data cube (id: 60285)
  • Fixed an error when navigating from the records or records table visualizations to the overall visualization (id: 60787)
  • Fixed report owner's inability to see a report when they are removed from the recipients list (id: 42841)
  • Fixed an issue where alert creation fails when the first data cube in the list is missing a primary time dimension (id: 40948)
  • Fixed a problem with axis data truncating in the line chart visualization (id: 61173)
  • Fixed string dimensions filter failing in some circumstances (id: 60545)
  • Fixed some column headings not appearing in a data cube with multiple measures (id: 43091)

Druid changes

  • Added retries to the S3 client to better handle transient errors related to finding a region (#16438) (id: 39357)
  • Added validation to prevent a datasource that's being ingested into from being queried if the query includes real-time sources. This prevents issues with fetching segment details (#16310) (id: 44895)
  • Added a new API that makes a best effort to trigger a handoff for tasks of a supervisor early: /druid/indexer/v1/supervisor/{supervisorId}/taskGroups/handoff (#16310) (id: 60152)
  • Added support for sorting on complex column types when using the MSQ task engine (id: 60588)
  • Added support for rolling up geo complex columns in MSQ (id: 61133)
  • Added native filter conversion for SCALAR_IN_ARRAY (#16312) (id: 60873)
  • Added MSQ support for using the selective lookup loading (id: 60714)
  • Changed lookups for compaction tasks to no longer unnecessarily load by default (#16420) (id: 61036)
  • Fixed an issue where a rolling upgrade or downgrade caused batch ingestion tasks to fail (#16556) (id: 61523)
  • Fixed a race condition that could occur when you queried data that was in Azure-based deep storage (#16525) (id: 61462)
  • Fixed a data correctness issue that occurred when two exact COUNT(DISTINCT) aggregations were used with certain conditions (#16402) (id: 60355)
  • Fixed an issue where small concurrent lookup queries time out after 5 minutes (id: 60657)
  • Fixed a NPE in the segment schema cache (#16404) (id: 61289)
  • Fixed an issue where sorting on a delta column in the Explore view table results in an error (#16417) (id: 61283)
  • Fixed an issue with Geo columns where deserialization on Peon services violated the buffer (#16389) (id: 60896)
  • Fixed an issue where an exception led to columns leaking (#16365) (id: 60874)
  • Fixed an issue where sketches weren't downsampled sufficiently. This could lead to situations where sketches exceeded their allowed memory usage (#16119) (id: 43055)
  • Fixed issues with type-aware schema discovery related to grouping:
    • Inconsistent results when grouping on real-time data with type-aware schema discovery
    • Discovered LONG and DOUBLE type columns incorrectly reported not having null values, resulting in incorrect null handling when grouping (#16489) (id: 61259)
  • Improved how the web console detects durable storage settings (#16493) (id: 61329)
  • Improved the MSQ export log error message (#16363) (id: 61287)
  • Improved the web console to use globs instead of filters for files (#16452) (id: 61281)
  • Improved the web console's Download all button to produce a single file with concatenated data instead of individual files (#16375) (id: 60857)
  • Improved the Supervisor view in the web console to provide information dynamically (#16318) (id: 60840)
  • Improved the speed of SQL IN queries that use the SCALAR_IN_ARRAY function (#16388) (id: 43245) (id: 42869)
  • Improved sketches that use the MSQ task engine to reduce memory usage when transferring sketches between the controller and worker (#16269) (id: 42171)
  • Updated the web console's Druid doctor check to accept Java 17 (#16250) (id: 61280)
  • Updated org.scala-lang:scala-library from 2.13.11 to 2.13.14 (#16364) (id: 61291)

Imply Manager changes

  • You can now add PreStop hooks to query and master nodes through the Helm chart. Additionally, query pods now include a default PreStop hook that provides time for LoadBalancers and Ingress controllers to reconcile (id: 60925)
  • Added support for c6g.2xlarge (AWS) query nodes in Imply Hybrid (id: 60869)

Changes in 2024.05.2

Pivot changes

  • Fixed an issue where filter tokens weren't correctly applied when a data cube also contained a subset filter definition. For details, see the following Imply Knowledge Base article (id: 61235)

Changes in 2024.05.1

Pivot changes

  • Improved the Pivot alerts API to provide the details for a planned query (id: 40980)

Druid changes

  • Fixed an issue with new tasks failing to get the location of currently running tasks (id: 61110)
  • Fixed an issue with numeric replace-with-default behavior when using Druid legacy null handling mode (id: 61139)

SaaS Clarity changes

  • Added Leader dimension to the service/heartbeat metric (id: 43031)
  • Fixed an issue with Pivot query cancellation (id: 43222)
  • Disabled custom time alerts for less than one minute (id: 43223)
  • Disabled alert previews (id: 43263)

Changes in 2024.05

Pivot highlights

Download types feature flag

If you're using the Async download (alpha) feature, a new feature flag, Download types (Alpha), allows you to control the download options available to users in data cubes. The feature flag accepts a JSON array of values to enable download types and set their priority:

  • ["standard","async"] sets standard download as the default experience, with async download (alpha) available as a secondary option.
  • ["async","standard"] sets async download (alpha) as the default experience, with standard download available as a secondary option.
  • ["standard"] sets standard download as the only download experience.
  • ["async"] sets async download (alpha) as the only download experience.

In the case of an invalid array, Pivot applies ["standard","async"]. An empty array turns off all downloads.

See Download data for more information.

(id: 43248)

Druid highlights

Zookeeper-based segment loading turned off

Zookeeper-based segment loading is no longer supported. You do not need to take any action. Druid ignores the following related configs:

  • druid.coordinator.load.timeout
  • druid.coordinator.loadqueuepeon.type
  • druid.coordinator.curator.loadqueuepeon.numCallbackThreads

Druid now only uses the recommended HTTP loading, which includes improvements to the Coordinator service such as smart segment loading.

As part of this change, compaction configs for inactive datasources are automatically cleaned up by default.

(#15705) (id: 60764)

Manifest files for MSQ task engine exports (beta)

Export queries that use the MSQ task engine now also create a manifest file at the destination, which lists the files created by the query.

During a rolling update, older versions of workers don't return a list of exported files, and older Controllers don't create a manifest file. Therefore, export queries run during this time might have incomplete manifests.

(#15953) (id: 42101)

MSQ task engine compaction state

You can now include a storeCompactionState context parameter in MSQ task engine replace queries. If set to true, segment metadata includes the last compaction state, which allows compaction jobs to skip segments whose compaction state already matches the desired state (#15965) (id: 60301) (id: 39754)
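As a sketch, a SQL API request body that sets this parameter might look like the following (the datasource name and query are hypothetical):

```json
{
  "query": "REPLACE INTO \"events\" OVERWRITE ALL SELECT * FROM \"events\" PARTITIONED BY DAY",
  "context": {
    "storeCompactionState": true
  }
}
```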

Kinesis autoscaling

The Kinesis autoscaler now considers max lag in minutes instead of total lag. To maintain backwards compatibility, this change is opt-in for existing Kinesis connections. To opt in, set lagBased.lagAggregate in your supervisor spec to MAX. New connections use max lag by default. (#16284) (id: 60222) (#16334) (id: 60672) (#16314) (id: 60572)
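To opt in on an existing connection, the supervisor spec change might look like the following sketch. The placement of autoScalerConfig follows the existing lagBased autoscaler configuration and is an assumption here; only the lagAggregate setting comes from this release note:

```json
{
  "type": "kinesis",
  "spec": {
    "ioConfig": {
      "autoScalerConfig": {
        "enableTaskAutoScaler": true,
        "autoScalerStrategy": "lagBased",
        "lagBased": {
          "lagAggregate": "MAX"
        }
      }
    }
  }
}
```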

Double and null values in arrays

Druid now supports double or null values for SQL array types when you use dynamic parameters in a query.

(#16274) (id: 60410)
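For instance, a dynamic parameter carrying double and null array elements might be passed to the SQL API as follows. This is a sketch: the ARRAY parameter type shown, and the datasource and column names, are assumptions.

```json
{
  "query": "SELECT * FROM \"events\" WHERE ARRAY_CONTAINS(?, \"price\")",
  "parameters": [
    { "type": "ARRAY", "value": [10.5, null, 12.0] }
  ]
}
```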

New SCALAR_IN_ARRAY function

You can now use the following function to check if a scalar expression appears in an array:

SCALAR_IN_ARRAY(expr, arr)

(#16306) (id: 60546)
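For example, the following query keeps only rows whose channel value appears in the supplied array (the wikipedia datasource and column names are illustrative):

```sql
-- True for rows whose channel appears in the supplied array
SELECT "channel", COUNT(*) AS "edits"
FROM "wikipedia"
WHERE SCALAR_IN_ARRAY("channel", ARRAY['#en.wikipedia', '#de.wikipedia'])
GROUP BY "channel"
```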

Improved native queries

Native queries can now group on nested columns and arrays.

(#16068) (id: 42483)

Improved performance for LIKE filters

Previously, simple patterns, such as an expression with a few % wildcards, could trigger backtracking that dramatically increased query time. Druid now uses a simple greedy algorithm that avoids backtracking, improving query performance by up to 20% in worst-case scenarios.

(#43169) (id: 43169)
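The greedy approach can be sketched for patterns that contain only % wildcards. This is an illustration of the general technique, not Druid's actual implementation:

```python
def like_match(pattern: str, s: str) -> bool:
    """Greedy, backtracking-free match of a SQL LIKE pattern that uses
    only '%' wildcards ('_' and escape handling are out of scope here)."""
    parts = pattern.split('%')
    # The first literal must anchor at the start, the last at the end.
    if not s.startswith(parts[0]) or not s.endswith(parts[-1]):
        return False
    if len(parts) == 1:          # no wildcard: exact match required
        return s == pattern
    pos = len(parts[0])
    # Greedily place each middle literal at its leftmost possible position.
    for part in parts[1:-1]:
        idx = s.find(part, pos)
        if idx == -1:
            return False
        pos = idx + len(part)
    # The trailing literal must fit after everything matched so far.
    return len(s) - len(parts[-1]) >= pos
```

Greedy leftmost placement is safe here because % can absorb any number of characters, so no earlier choice ever needs to be revisited.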

Centralized datasource schema (alpha)

You can now configure Druid to centralize schema management using the Coordinator service. Previously, Brokers needed to query data nodes and tasks for segment schemas. Centralizing datasource schemas can improve startup time for Brokers and the efficiency of your deployment.

If enabled, the following changes occur:

  • Realtime segment schema changes get periodically pushed to the Coordinator
  • Tasks publish segment schemas and metadata to the metadata database
  • The Coordinator service polls the schema and segment metadata to build datasource schemas
  • Brokers fetch datasource schemas from the Coordinator when possible. If not, the Broker builds the schema.

This behavior is currently opt-in. To enable this feature, set the following configs:

  • In your common runtime properties, set druid.centralizedDatasourceSchema.enabled to true.
  • If you're using MiddleManagers, you also need to set druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled to true in your MiddleManager runtime properties.

You can return to the previous behavior by changing the configs to false.
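As a sketch, the opt-in settings above might look like the following in your runtime property files (the file names shown are the conventional ones and are assumptions here):

```properties
# common.runtime.properties
druid.centralizedDatasourceSchema.enabled=true

# MiddleManager runtime.properties (only if you use MiddleManagers)
druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled=true
```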

You can configure the following properties to control how the Coordinator service handles unused segment schemas:

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| druid.coordinator.kill.segmentSchema.on | Boolean value for enabling automatic deletion of unused segment schemas. If set to true, the Coordinator service periodically identifies segment schemas that are not referenced by any used segment and marks them as unused. At a later point, these unused schemas are deleted. | No | True |
| druid.coordinator.kill.segmentSchema.period | How often to do automatic deletion of segment schemas, in ISO 8601 duration format. Value must be equal to or greater than druid.coordinator.period.metadataStoreManagementPeriod. Only applies if druid.coordinator.kill.segmentSchema.on is set to true. | No | P1D |
| druid.coordinator.kill.segmentSchema.durationToRetain | ISO 8601 duration for the time a segment schema is retained from when it's marked as unused. Only applies if druid.coordinator.kill.segmentSchema.on is set to true. | Yes, if druid.coordinator.kill.segmentSchema.on is set to true. | P90D |
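A sketch of these properties in Coordinator runtime properties, set to the default values described above:

```properties
# Automatic cleanup of unused segment schemas
druid.coordinator.kill.segmentSchema.on=true
druid.coordinator.kill.segmentSchema.period=P1D
druid.coordinator.kill.segmentSchema.durationToRetain=P90D
```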

In addition, the following new metrics are available for monitoring after you enable centralized datasource schemas:

  • metadatacache/schemaPoll/count
  • metadatacache/schemaPoll/failed
  • metadatacache/schemaPoll/time
  • metadatacache/init/time
  • metadatacache/refresh/count
  • metadatacache/refresh/time
  • metadatacache/backfill/count
  • metadatacache/finalizedSegmentMetadata/size
  • metadatacache/finalizedSegmentMetadata/count
  • metadatacache/finalizedSchemaPayload/count
  • metadatacache/temporaryMetadataQueryResults/count
  • metadatacache/temporaryPublishedMetadataQueryResults/count

For more information, see Metrics.

(#15817) (id: 37627)(id: 36862)

Imply Manager highlights

New configurable security policies

In Settings > Security policy, you can configure custom password options and login throttling.

For passwords, some of the parameters you can configure include the following:

  • Minimum and maximum password length
  • Minimum passphrase length
  • Password lifetime
  • Password history

For login throttling options, you can configure the following:

  • Lockout tries: the number of attempts before a user account is locked out
  • Lockout duration: the amount of time the user account is locked out for
  • Disable tries: the number of attempts before a user account is disabled
  • Disable duration: the amount of time the user account is disabled for

(id: 43193)

Pivot changes

  • Improved the flat table, gauge, time series, and overall (beta) visualizations to allow users to drag-and-drop dimensions, measures, and comparisons into the visualization. This functionality is already available in other visualizations (id: 60621)
  • Improved custom date time formatting (id: 40619)
  • Improved the PIVOT_NESTED_AGG function (id: 43264)
  • Improved the experience of switching from the overall visualization to the overall (beta) visualization (id: 60607)
  • Fixed rawDownloadLimit not working in data cube advanced options (id: 34072)
  • Fixed an issue that prevented the All rows async download option from working in the records table visualization (id: 60110)
  • Fixed a rendering issue for overall and flat table visualizations on data cubes without a primary time dimension (id: 60151)
  • Fixed an issue with alert webhook URLs missing protocol (id: 60383)
  • Fixed a problem with Ok and Cancel buttons not appearing when multiple time ranges were applied to a data cube (id: 60502)
  • Fixed an issue where hiding 'overall' for a column dimension would also hide the dimension name (id: 40649)
  • Fixed an issue with dashboard filters resetting from "greater than or equal to/less than or equal to" to "greater than/less than" (id: 40777)
  • Fixed a problem with the time dimension bucket option "default, no bucket" (id: 41304)
  • Fixed an issue where dashboards with mixed tile types didn't apply filters to some visualizations (id: 41787)
  • Fixed a problem with the table visualization hiding nested split values in some circumstances (id: 42517)
  • Fixed an issue where users without the accessDataCubes permission could expand a dashboard tile and navigate directly to a data cube view (id: 60285)
  • Fixed a problem with the overall (beta) visualization when using a LONG column as primary time dimension (id: 43293)

Druid changes

  • Added support for selective loading of lookups so that MSQ task engine workers don't load unnecessary lookups (#16328) (id: 40610)
  • Added the JVM version to JVM monitor metrics (#16262) (id: 60386)
  • Added a new index for pending segments table for datasource and task_allocator_id columns (#16355) (id: 60743)
  • Changed the web console to no longer send transform expressions containing lookups to the sampler, which always resulted in an error. The web console now uses a placeholder (#16234) (id: 60276)
  • Changed the upload buffer size in GoogleTaskLogs to 1 MB instead of 15 MB to allow more uploads in parallel and prevent the MiddleManager service from running out of memory (#16236) (id: 60293)
  • Changed default value of useMaxMemoryEstimates for Hadoop jobs to false (#16280) (id: 60688)
  • Fixed an issue with concurrent replace where you might get duplicate query results due to a race condition (#16144) (id: 60300)
  • Fixed an issue where the log count for the number of datasources affected by auto-kill was wrong (#16341) (id: 60706)
  • Fixed an issue where join queries on complex data types return the wrong results (id: 60698)
  • Fixed an issue where concurrent replace skipped intervals locked by append locks during compaction (#16316) (id: 60583)
  • Fixed an issue where the query context parameter enableTimeBoundaryPlanning: true caused a max time query to return incorrect results when a virtual column was used. TimeBoundary queries don't support virtual columns (id: 60682)
  • Fixed an issue where TimeBoundary queries incorrectly allowed filters that require virtual columns. TimeBoundary queries don't support virtual columns (#16337) (id: 60686)
  • Fixed an incorrect check while generating the MSQ task engine error report (#16273) (id: 60577)
  • Fixed the supervisor offset reset dialog in the web console (#16298) (id: 60571)
  • Fixed an exception that occurs while loading lookups from an empty JDBC source (#16307) (id: 60539)
  • Fixed query timer issues in the web console (#16235) (id: 60266)
  • Fixed windowed aggregates so that they update the aggregation value based on the final compute (#16244) (id: 60390)
  • Fixed an issue where ORDER BY gets ignored on certain GROUPING SETS (#16268) (id: 60389)
  • Fixed CVEs (#16147) (id: 60279)
  • Fixed an issue where the web console could return incorrect creation times and durations for tasks after the Overlord service restarts (#16228) (id: 60223)
  • Fixed issues with the first/last vector aggregators (#16230) (id: 47262)
  • Fixed an issue where groupBy queries that have bit_xor() is null return the wrong result (#16237) (id: 42556)
  • Fixed an issue where ipv4_parse() returns an assertion error instead of null on an invalid IP address string literal (#15916) (id: 42199)
  • Fixed an issue where Broker merge buffers get into a deadlock when multiple simultaneous queries use them (#15420) (id: 37984)
  • Improved the feedback you receive when a task fails due to lock revocation exceptions in task status (#16325) (id: 60710)
  • Improved how scalars work in arrays (#16311) (id: 60705)
  • Improved how Druid parses JSON by using charsetFix (#16212) (id: 60708)
  • Improved the error message when a task fails before becoming ready (#16286) (id: 60525)
  • Improved the performance of OR filters in certain use cases (#16300) (id: 60507)
  • Improved the performance of queries that use filter bundles (#16292) (id: 60500)
  • Improved the user experience for the web console to better indicate when it is in manual capability detection mode and limited features are available (#16191) (id: 60288)
  • Improved the error messages when a supervisor's checkpoint state is invalid (#16208) (id: 60236)
  • Improved MSQ task engine reports to show why range partitioning was not chosen (#16175) (id: 42652)

Imply Manager changes

  • The root password for Imply Enterprise on Kubernetes deployments, including GKE and AKS, now expires one year after startup. Previously, the password expired one year after the image was created (id: 60397)
  • You can now specify GCP resource labels in the installation script for Imply Enterprise on GKE (id: 43069)

Changes in 2024.04.1

Pivot changes

  • Fixed an issue where filter tokens weren't correctly applied when a data cube also contained a subset filter definition. For details, see the following Imply Knowledge Base article (id: 61235)

Changes in 2024.04

Pivot highlights

Database auth tokens

You can now create a database auth token on a Pivot role to enable access to specific Druid data. See Database auth tokens for more information.

(id: 41170)

Druid highlights

Improved array ingest mode

The array mode for arrayIngestMode contains improvements that make it the best choice for any new datasource that contains arrays. Imply strongly recommends array mode over mvd mode: it provides a better experience, including support for a wider range of array types, and avoids certain limitations of mvd mode. Continued improvements to the array ingest mode and array-typed columns are on the roadmap.

The following list describes the behavior based on what you set arrayIngestMode to:

  • If you set it to array, SQL ARRAY types are stored using Druid array columns. This is recommended for new tables.
  • If you set it to mvd, SQL VARCHAR ARRAY types are implicitly wrapped in ARRAY_TO_MV. This causes them to be stored as multi-value strings, using the same STRING column type as regular scalar strings. This is the default behavior when arrayIngestMode is not provided in your query context.
  • If you set it to none, Druid throws an exception when trying to store any type of array.

The following table summarizes the differences in SQL ARRAY handling between arrayIngestMode: array and arrayIngestMode: mvd:

| SQL type | Stored type when arrayIngestMode: array | Stored type when arrayIngestMode: mvd (default) |
| --- | --- | --- |
| VARCHAR ARRAY | ARRAY&lt;STRING&gt; | multi-value STRING |
| BIGINT ARRAY | ARRAY&lt;LONG&gt; | not possible (validation error) |
| DOUBLE ARRAY | ARRAY&lt;DOUBLE&gt; | not possible (validation error) |

In either mode, you can explicitly wrap string arrays in ARRAY_TO_MV to cause them to be stored as multi-value strings.

Note that you cannot mix string arrays and multi-value strings in the same column.
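A sketch of the two storage options side by side in an MSQ ingestion query (the table and column names are hypothetical):

```sql
-- Query context: {"arrayIngestMode": "array"}
INSERT INTO "events"
SELECT
  "__time",
  "tags",                            -- VARCHAR ARRAY stored as ARRAY<STRING>
  ARRAY_TO_MV("tags") AS "tags_mv"   -- explicitly stored as a multi-value STRING
FROM "staging_events"
PARTITIONED BY DAY
```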

(#15920) (id: 43043)

Pivot changes

  • You can now use asymmetric number range filters with flat table, gauge, time series, and overall (beta) visualizations (id: 58507)
  • Added query precision and query caching session persistence (id: 43251)
  • Added FilterWithRegex permission which allows users to use the regex filter for string dimensions (id: 42840)
  • Added Pivot server configuration property disableExternalEmails which allows administrators to disable sending alerts and reports to external email addresses (id: 42201)
  • Added the ability to include only one value in a number filter (id: 43253)
  • Improved the performance and behavior of the PIVOT_NESTED_AGG function (id: 43264)
  • Fixed an issue with the line chart visualization causing dashboards to crash (id: 58506)
  • Fixed an issue with the street map visualization including all data cube Latitude/Longitude dimensions, even the ones not in the visualization (id: 58505)
  • Fixed an issue with the y-axis extending above the line chart visualization boundary for very small values (id: 43300)
  • Fixed unnecessary query in axis-query generation (id: 43244)
  • Fixed an issue with table columns not respecting the time format (id: 43181)
  • Fixed incorrect x-axis in the time series visualization (id: 42725)
  • Fixed dashboard time filter "include end bound" not working in gauge, flat table, time series, and overall (beta) visualizations (id: 42406)
  • Fixed flat table changes not being propagated from data cube to dashboard (id: 42275)
  • Fixed time comparison not working as expected in a line chart visualization when bucketing <=5 minutes (id: 41982)
  • Fixed an issue where dashboard filters reset from Greater than or equal and Less than or equal to Greater than and Less than (id: 40777)

Druid changes

  • Added more logging for S3 retries (#16117) (id: 43161)
  • Added new in filter that preserves the input types (id: 41500)
  • Added new typed in filter (#16039) (id: 48937)
  • Added error code to failure type InternalServerError (#16186) (id: 54432)
  • Added support for using window functions with the MSQ task engine as the query engine (#15470) (id: 39416)
  • Added support for joins in decoupled mode (#15957) (id: 42763)
  • Added segmentsRead and segmentsPublished fields to parallel compaction task completion reports so that you can see how effective a compaction task is (#15947) (id: 38574)
  • Added a new task/autoScaler/requiredCount metric that provides a count of required tasks based on the calculations of the lagBased autoscaler. Compare that value to task/running/count to discover the difference between the current and desired task counts (#16199) (id: 58510)
  • Changed the controller checker for the MSQ task engine to check for closed only (#16161) (id: 43289)
  • Added geospatial interfaces (#16029) (id: 60162)
  • Fixed ColumnType to RelDataType conversion for nested arrays (#16138) (id: 43178)
  • Fixed windowing and scanAndSort query issues on top of joins (#15996) (id: 42717)
  • Fixed REGEXP_LIKE, CONTAINS_STRING, and ICONTAINS_STRING so that they correctly return null for null value inputs in ANSI SQL compatible null handling mode (the default configuration). Previously, they returned false (#15963) (id: 43288)
  • Fixed the Azure icon not rendering in the web console (#16173) (id: 43286)
  • Fixed a bug in the MarkOvershadowedSegmentsAsUnused Coordinator duty to also consider segments that are overshadowed by a segment that requires zero replicas (#16181) (id: 43285)
  • Fixed issues with ARRAY_CONTAINS and ARRAY_OVERLAP with null left side arguments as well as MV_CONTAINS and MV_OVERLAP (#15974) (id: 43162)
  • Fixed an issue where numeric LATEST_BY and EARLIEST_BY aggregations returned incorrect results (#15939) (id: 42342)
  • Fixed a bug in the markUsed and markUnused APIs where an empty set of segment IDs would be inconsistently treated as null or non-null in different scenarios (#16145) (id: 43153)
  • Fixed a bug where export queries did not use the output names specified and exported the temporary column names instead for some queries, such as GROUP BY (#16096) (id: 42826)
  • Fixed a bug where numSegmentsKilled is reported incorrectly (#16103) (id: 42960)
  • Fixed an issue with metric emission in the segment generation phase (#16146) (id: 43152)
  • Fixed a data race in getting results from MSQ select tasks (#16107) (id: 43000)
  • Fixed an issue which can occur when using schema auto-discovery on columns with a mix of array and scalar values and querying with scan queries (#16105) (id: 43007)
  • Fixed a bug where completion task reports are not being generated on index_parallel tasks. (#16042) (id: 42805)
  • Fixed an issue where safe_divide queries returned "Calcite assertion violated" errors (id: 41766)
  • Fixed an issue where SQL-based ingestion fails if the first monitor for druid.server.metrics.ServiceStatusMonitor is ServiceStatusMonitor (id: 38520)
  • Improved ingestion performance by parsing an input stream directly instead of converting it to a string and parsing the string as JSON (#15693) (id: 57692)
  • Improved optimizations to the MSQ task engine for real-time queries so that they are backwards compatible (id: 42658)
  • Improved serialization of TaskReportMap (#16217) (id: 60179)
  • Improved the creation of input row filter predicate in various batch tasks (#16196) (id: 56861)
  • Improved how tasks are fetched from the Overlord to redact credentials (#16182) (id: 52829)
  • Improved the web console to only pick the Kafka input format by default when needed (#16180) (id: 60186)
  • Improved compaction segment read and published fields to include sequential compaction tasks (#16171) (id: 60142)
  • Improved the markUnused API endpoint to handle an empty list of segment versions (#16198) (id: 56864)
  • Improved the segmentIds filter in the markUsed API payload so that it's parameterized in the database query (#16174) (id: 47268)
  • Improved how quickly workers get canceled for the MSQ task engine (#16158) (id: 43179)
  • Improved the MSQ task engine to support IS NOT DISTINCT FROM for SortMerge joins (#16003) (id: 43099)
  • Improved the download query detail archive option in the web console to be more resilient when the detail archive is incomplete (#16071) (id: 42908)
  • Improved the UX for arrayIngestMode in the web console (#15927) (id: 43038)
  • Improved array handling for Booleans to account for queries such as select array[true, false] from datasource (#16093) (id: 42963) (id: 42610)
  • Improved nested columns. Nested column serialization now releases nested field compression buffers as soon as the nested field serialization is completed, which requires significantly less direct memory during segment serialization when many nested fields are present (#16076) (id: 42955)
  • Improved querying to decrease the chance of going OOM with high cardinality data Group By (#16114) (id: 42502)
  • Improved real-time queries that use the MSQ task engine by changing how segments are grouped (#15399) (id: 39167)
  • Optimized isOvershadowed when there is a unique minor version for an interval (#15952) (id: 43287)
  • Updated the following dependencies:
    • redis.clients:jedis from 5.0.2 to 5.1.2 (#16074) (id: 42909)
    • express from 4.18.2 to 4.19.2 in the web console (#16204) (id: 60147)
    • webpack-dev-middleware from 5.3.3 to 5.3.4 in the web console (#16195) (id: 60146)
    • follow-redirects from 1.15.5 to 1.15.6 in the web console (#16134) (id: 43157)
    • axios in the web console (#16087) (id: 42954)
    • druid-toolkit from 0.21.9 to 0.22.11 in the web console (#16213) (id: 60144)

Clarity changes

  • Disabled alert custom time periods of less than one minute (id: 43223)

Imply Manager changes

  • Allowed all kube-system pods to be moved by the cluster-autoscaler in GKE (id: 43198)
  • Prevented Middle Managers from being replaced before task status is synced during rolling updates (id: 40137)
  • Imply Hybrid on AWS:
    • Enabled ServiceStatusMonitor by default (id: 38540)
    • Fixed cluster manager API so that custom extensions are not removed (id: 33190)

Changes in 2024.03.2

Pivot changes

  • Fixed an issue where filter tokens weren't correctly applied when a data cube also contained a subset filter definition. For details, see the following Imply Knowledge Base article (id: 61235)

Changes in 2024.03.1

Druid changes

  • Fixed an issue where the Overlord process could fail to return the location of tasks (id: 60106)

Changes in 2024.03

Pivot highlights

Time series visualization supports more functions

The time series visualization now supports the following time series functions in addition to TIMESERIES and DELTA_TIMESERIES:

  • ADD_TIMESERIES
  • DIVIDE_TIMESERIES
  • MULTIPLY_TIMESERIES
  • SUBTRACT_TIMESERIES

The configuration options have also been simplified. See Visualization reference for details.

(id: 40670)

Druid highlights

Dynamic table append

You can now use the TABLE(APPEND(...)) function to implicitly create unions based on table schemas. For example, the two following queries are equivalent:

TABLE(APPEND('table1','table2','table3'))

and

SELECT column1,NULL AS column2,NULL AS column3 FROM table1
UNION ALL
SELECT NULL AS column1,column2,NULL AS column3 FROM table2
UNION ALL
SELECT column1,column2,column3 FROM table3

Note that if the same columns are defined with different input types, Druid uses the least restrictive column type.

(#15897) (id: 42645)

Renamed segment kill metric

The kill/candidateUnusedSegments/count metric is now called kill/eligibleUnusedSegments/count.

(#15977) (id: 42492)

Improved streaming task completion reports

Streaming task completion reports now include a new field, recordsProcessed, which lists the partitions processed by the task and the count of records for each partition. Use this field to see the actual throughput of tasks and decide whether to scale your workers vertically or horizontally.

(#15930) (id: 42430)
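A sketch of what the new field might look like in a completion report; the exact placement in the report and the partition labels and counts here are illustrative, not the authoritative report schema:

```json
{
  "recordsProcessed": {
    "0": 123456,
    "1": 118200
  }
}
```

Comparing per-partition counts across tasks highlights skew, which helps decide between vertical and horizontal scaling.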

Improved Supervisor rolling restarts

The stopTaskCount config now prioritizes stopping older tasks first. As part of this change, you must also explicitly set a value for stopTaskCount. It no longer defaults to the same value as taskCount.

(#15859) (id: 42143) (id: 40605)
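A minimal sketch of a Kafka supervisor spec with the now-required explicit setting; the topic name and counts are illustrative, and only the relevant ioConfig fields are shown:

```json
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "type": "kafka",
      "topic": "example-topic",
      "taskCount": 4,
      "stopTaskCount": 2
    }
  }
}
```

With this config, a rolling restart stops at most two tasks at a time, starting with the oldest.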

Parallelized incremental segment creation

You can now configure the number of threads used to create and persist incremental segments on the disk using the numPersistThreads property. Use additional threads to parallelize the segment creation to prevent ingestion from stalling or pausing frequently as long as there are sufficient CPU resources available.

(#13982) (id: 32098)
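A minimal sketch of where the property goes in a streaming ingestion spec; the value is illustrative, and only the relevant tuningConfig fields are shown:

```json
{
  "type": "kafka",
  "spec": {
    "tuningConfig": {
      "type": "kafka",
      "numPersistThreads": 2
    }
  }
}
```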

Fixes for deep storage on Google Cloud Storage

This release contains fixes for customers using deep storage on GCS. The issues were caused by updates to the Google Cloud client libraries from an older API client. Affected Imply versions were 2024.01 STS through 2024.02.3 STS. For remediation steps for kill task failures, see Remove orphaned segments in deep storage.

  • Fixed kill task failures caused when trying to delete a file that doesn't exist in Google Cloud Storage (#16047) (id: 42663)
  • Fixed an issue where Druid incorrectly deleted task log events when druid.indexer.logs.kill.enabled is active due to a mismatch in time units between Druid configuration and the Google Cloud client (#16083) (id: 42838)
  • Fixed errors when loading lookups sourced from GCS buckets where the fetched GCS object version is null (#16097) (id: 42916)

Improved query performance for AND filters

Druid query processing now adaptively determines when children of AND filters should compute indexes and when to simply match rows during the scan based on selectivity of other filters. Known as "filter partitioning," it can result in dramatic performance increases, depending on the order of filters in the query.

For example, take a query like SELECT SUM(longColumn) FROM druid.table WHERE stringColumn1 = '1000' AND stringColumn2 LIKE '%1%'. Previously, Druid used indexes when processing filters whenever they were available. That's not always ideal: imagine that stringColumn1 = '1000' matches only 100 rows. With indexes, Druid has to find every value of stringColumn2 for which LIKE '%1%' is true in order to compute the indexes for the filter. If stringColumn2 has more than 100 distinct values, that ends up being more work than simply checking the 100 remaining rows for a match.

With the new logic, Druid checks the selectivity of indexes as it processes each clause of the AND filter. If it determines it would take more work to compute the index than to match the remaining rows, Druid skips computing the index.

The order in which you write filters in the WHERE clause of a query can improve its performance. More improvements are coming, but you can try out the existing ones by reordering filters: put filters whose indexes are cheap to compute, such as IS NULL, =, and comparisons (>, >=, <, and <=), near the start of AND filters so that Druid processes your queries more efficiently. Not ordering your filters this way won't degrade performance relative to previous releases, since the fallback behavior is what Druid did previously.

(#15838) (id: 41535)
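To illustrate the reordering advice with the hypothetical columns above, the two queries below return the same result, but the second gives the adaptive logic a cheap, selective clause to evaluate first:

```sql
-- Less favorable ordering: the expensive LIKE filter comes first.
SELECT SUM(longColumn) FROM druid.table
WHERE stringColumn2 LIKE '%1%' AND stringColumn1 = '1000';

-- Preferred ordering: the cheap equality filter comes first, so Druid can
-- skip computing the LIKE index when few rows remain.
SELECT SUM(longColumn) FROM druid.table
WHERE stringColumn1 = '1000' AND stringColumn2 LIKE '%1%';
```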

Pivot changes

  • Added permission AccessDownloadAsync to allow users to access the async download (alpha) feature when the feature is enabled for your organization (id: 42274)
  • You can now set the Latest data strategy to Query the latest timestamp from the data source, relative to the latest full day in the advanced data cube options (id: 39634)
  • You can now set the default view in a data cube's defaults to be a gauge, flat table, time series, or overall (beta) visualization (id: 41373)
  • You can now choose whether or not to display the year in time values in a table visualization (id: 40988)
  • Fixed an issue where filters and shown dimensions and measures were not preserved when switching to some visualization types (id: 41059)
  • Fixed Pivot showing an error for some time comparisons in a data cube (id: 41013)
  • Fixed a rounding issue in the display of dimensions (id: 42654)
  • Fixed downloads limited to 5,000 rows for flat table, gauge, time series, and overall (beta) visualizations (id: 42600)
  • Fixed failed async downloads producing a truncated file instead of an error (id: 42595)
  • Fixed query precision issues (ids: 42521, 42227, 42230)
  • Fixed async downloads not working with "previous period" comparisons (id: 42247)
  • Fixed Pivot crashing when applying a filter to the records visualization (id: 42189)
  • Fixed dashboard tiles causing save conflicts in flat table, gauge, time series, and overall (beta) visualizations (id: 41414)
  • Fixed lack of indication when data cube is refreshed (id: 40260)

Druid changes

  • Added support for single value aggregated Group By queries for scalars (#15700) (id: 41951)
  • Added support for numeric arrays to columnar frames, which are used in subquery materializations and window functions (#15917) (id: 41784)
  • Added the ability to set custom dimensions for events emitted by the Kafka emitter as a JSON map for the druid.emitter.kafka.extra.dimensions property. For example, druid.emitter.kafka.extra.dimensions={"region":"us-east-1","environment":"preProd"} (#15845) (id: 41961)
  • Added more AWS Kinesis regions and groups to the web console (#15900) (id: 42476)
  • Added support to the web console for Protobuf input formats and the Avro bytes decoder (#15950) (id: 42461)
  • Changed the format of the value of targetDataSource in EXPLAIN clauses for SQL-based ingestion queries back to being a string. For some recent releases, it was a JSON object (#16004) (id: 42575)
  • Changed the severity of a k8sTaskRunner log message to WARN (#15871) (id: 42303)
  • Changed the durationMs properties in MSQ task reports to exclude worker/controller start up time (id: 40311)
  • Fixed an issue where queries that use LATEST_BY or EARLIEST_BY return null when they contain a secondary timestamp column (#15939) (id: 42917)
  • Fixed an issue where Druid incorrectly deleted task log events when druid.indexer.logs.kill.enabled is active due to a mismatch in time units between Druid configuration and the Google Cloud client (#16083) (id: 42838)
  • Fixed errors when loading lookups sourced from GCS buckets where the fetched GCS object version is null (#16097) (id: 42916)
  • Fixed an issue where the data loader for the web console crashes when attempting to parse data that can't be parsed (#15983) (id: 42649)
  • Fixed an issue where string inputs are ingested as null values when they are typed as LONG or BIGINT. For example, decimals like "1.0" or "1.23" were incorrectly treated as NULL instead of 1 or 1.23. (#15615) (id: 42657)
  • Fixed an issue where compaction tasks reports got overwritten. New entries are written to the report instead (#15981) (id: 42673)
  • Fixed an issue that occurred when the castToType parameter is set on auto column schema (#15921) (id: 42434)
  • Fixed an issue where flattenSpec is in the wrong location if you use the web console to generate the supervisor spec for a Kafka ingestion (#15946) (id: 42433)
  • Fixed an issue where Kubernetes environment variables that use underscores would be parsed incorrectly (#15919) (id: 42336)
  • Fixed an issue where the wrong base template would be used for task types included through extensions, such as index_kinesis. For example, if you define druid.indexer.runner.k8s.podTemplate.index_kafka, the KubernetesTaskRunner still used druid.indexer.runner.k8s.podTemplate.base as the base template for tasks (#15915) (id: 42293)
  • Fixed an issue where a query returns the wrong results if PARSE_LONG is null (#15909) (id: 42134)
  • Fixed an issue where MSQ task engine results are truncated and return an error (#16107)
  • Improved Connection Count server select strategy to account for slow connection requests (#15975) (id: 42662)
  • Improved the retry behavior for deep storage connections (#15938) (id: 42690)
  • Improved how segments are counted so that segments still available through deep storage (replicas set to 0) are not marked as unavailable (#16020) (id: 42656)
  • Improved the error message for when a MSQ task engine-based join using the sortMerge option falls back to a broadcast join (#16002) (id: 42655)
  • Improved druid-basic-security performance by using the cache for password hash when validating LDAP passwords (#15993) (id: 42650)
  • Improved concurrent replace to work with supervisors using concurrent locks (#15995) (id: 42648)
  • Improved the web console to detect doubles better (#15998) (id: 42646)
  • Improved the web console to be able to search in tables and columns (#15990) (id: 42647)
  • Improved segment troubleshooting: segments created in the same batch now have the same created_date entry (#15977) (id: 42492)
  • Improved the error messages you get if there's an issue with your PARTITIONED BY clause (#15961) (id: 42462)
  • Improved the web console to support export with the MSQ task engine (#15969) (id: 42460)
  • Improved how connections are counted and servers are selected to account for slow connections (#15975) (id: 42407)
  • Improved the web console to allow compaction config slots to drop to 0, such as when compaction is paused (#15877) (id: 42178)
  • Improved the web console to include system fields when using the batch data loader (#15858) (id: 41918)
  • Updated PostgreSQL from 42.6.0 to 42.7.2 (#15931) (id: 42432)
  • Improved performance for real-time queries that use the MSQ task engine (#15399) (id: 39167)
  • Improved the Coordinator process to better handle an uninitialized cache in node role watchers, which could lead to stuck tasks (#15726) (id: 39099)
  • Improved how expressions are evaluated to ensure thread safety (#15694) (id: 42620)
  • Improved batching of scan results while estimating bytes (#15987) (id: 42507)
  • Updated Log4j from 2.18.0 to 2.22.1 (#15934) (id: 42431)

Platform changes

  • Account settings now display in Imply Hybrid Manager in SSO mode (id: 42372)
  • Fixed an issue with Imply Enterprise on GKE deployments where passing a custom certificate authority certificate for a MySQL instance causes the installation to fail (id: 42316)
  • Fixed a race condition that could cause Enterprise deployments on GKE to fail to start because of files missing from the configuration bundle (id: 42747) (id: 42726)
  • Fixed an issue where GCP automatically deploys a managed Prometheus instance causing pod exhaustion. Imply Enterprise on GKE turns off these automatic Prometheus deployments by default now (id: 42567)

Changes in 2024.02.4

Pivot changes

  • Fixed an issue where filter tokens weren't correctly applied when a data cube also contained a subset filter definition. For details, see the following Imply Knowledge Base article (id: 61235)

Changes in 2024.02.3

Druid changes

  • Fixed an issue where string inputs are ingested as null values when they are typed as LONG or BIGINT. For example, decimals like "1.0" or "1.23" were incorrectly treated as NULL instead of 1 or 1.23. For details, see the following Imply Knowledge Base article (id: 42545)

Changes in 2024.02.2

Druid changes

  • Fixed an issue with filters on expression virtual column indexes incorrectly considering values null in some cases for expressions which translate null values into not null values (id: 42448)

Changes in 2024.02.1

Druid changes

  • Fixed an issue where the Druid console generates a Kafka supervisor spec where flattenSpec is in the wrong place, causing it to be ignored (#15946)

Pivot changes

  • Fixed an issue where Pivot closes unexpectedly when you open the records visualization and apply a filter (id: 42189)

Platform changes

  • Fixed an issue with the GKE enhanced installation where passing a custom certificate authority certificate for a MySQL instance causes the installation to fail (id: 42316)

Changes in 2024.02

Pivot highlights

New overall visualization (beta)

A new overall visualization includes a trend line and an updated properties panel.

You can enable this beta feature through the SDK based visualizations feature flag. Once enabled, the beta overall visualization replaces the standard overall visualization. See Visualizations reference for more information. (ids: 40562, 41090)

Druid highlights

Improved concurrent append and replace

You no longer need to manually specify the task lock type for concurrent append and replace using the taskLockType context parameter. Instead, Druid can determine it for you. Use either a context parameter or a cluster-wide config:

  • Set the context parameter "useConcurrentLocks": true for specific JSON-based batch or streaming ingestion tasks and for datasources. A datasource needs the parameter in situations such as when you want to append data to it while compaction is running.
  • Set useConcurrentLocks to true in the cluster-wide config druid.indexer.task.default.context.

(#1568) (id: 41083)
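Sketches of both forms; the cluster-wide form is a runtime property whose value is a JSON map, and the per-task form goes in the task or query context:

```properties
# Per task or query: add to the context
# { "useConcurrentLocks": true }

# Cluster-wide (runtime property):
druid.indexer.task.default.context={"useConcurrentLocks": true}
```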

Range support for window functions

Window functions now support ranges where both endpoints are unbounded or are the current row. Ranges work in strict mode, which means that Druid will fail queries that aren't supported. You can turn off strict mode for ranges by setting the context parameter windowingStrictValidation to false.

The following example shows a window expression with RANGE frame specifications:

(ORDER BY c)
(ORDER BY c RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
(ORDER BY c RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)

(#15746) (#15365) (id: 41623)
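For example, a complete query using one of the supported RANGE frames; the datasource and column names here are illustrative:

```sql
-- Running total per channel; the frame endpoints are unbounded or the current row.
SELECT
  channel,
  __time,
  SUM(delta) OVER (
    PARTITION BY channel
    ORDER BY __time
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_delta
FROM wikipedia
```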

Ingest from multiple Azure accounts

Azure as an ingestion source now supports ingesting data from multiple storage accounts that are specified in druid.azure.account. To do this, use the new azureStorage schema instead of the previous azure schema. For example,

    "ioConfig": {
"type": "index_parallel",
"inputSource": {
"type": "azureStorage",
"objectGlob": "**.json",
"uris": ["azureStorage://storageAccount/container/prefix1/file.json", "azureStorage://storageAccount/container/prefix2/file2.json"]
},
"inputFormat": {
"type": "json"
},
...
},
...

(#15630) (id: 41428)

Improved performance for real-time queries

If the query context parameter bySegment is set to false for real-time queries, the way results are merged is now more efficient: there is only a single layer of merging, just like for Historical processes. As part of this change, segment metrics, like query/segment/time, are now per-FireHydrant instead of per-Sink.

If you set bySegment to true, the old behavior with two layers of merging is preserved.

(#15757) (id: 41406)

Pivot changes

  • Added maxNumDownloadTasks to Pivot server configuration file, to optionally set the maximum number of tasks to assign to async downloads. See Pivot server config for more information (id: 41092)
  • Added an option to "Go to URL" for URL dimensions in the flat table visualization (id: 41283)
  • Fixed an error that appeared when duplicating a dashboard from the header bar (id: 41537)
  • Fixed a problem with filtering on a dimension with Set/String type that contains nulls (id: 41459)
  • Fixed an issue where async downloads didn't include filters by measure (id: 41435)
  • Fixed records table visualization crashing when scrolling to the bottom in a dashboard tile (id: 41165)
  • Fixed an issue with the records visualization not supporting async download (id: 41289)
  • Fixed dimensions with IDs that contain periods showing as "undefined" in records table visualization (id: 41009)
  • Fixed Pivot 2 visualizations crashing on data cubes with no dimensions (id: 40998)
  • Fixed inability to set "Greater than 0" measure filter in flat table visualization (id: 40985)
  • Fixed a problem with visualization URLs not updating after a measure is deleted from a data cube (id: 40565)
  • Fixed "overall" values rendering incorrectly in line chart visualization when they should be hidden (id: 40501)
  • Fixed incorrect time bucket label for America/Mexico_City timezone in DST (id: 39749)
  • Fixed inability to scroll pinned dimensions list (id: 39647)
  • Fixed discrepancies when applying custom UI colors (id: 40266)
  • Improved handling of time filters in dashboard tiles (id: 41171)
  • Improved measures in tables visualization to show nulls if they contain no data (id: 40665)
  • Improved the display of comparison values in visualizations, by adding the ability to sort by delta and percentage (id: 38539)

Druid changes

  • Added QueryLifecycle#authorize for the gRPC query extension (#15816) (id: 41725)
  • Added nested array index support and fixed some related issues (#15752) (id: 41724)
  • Added support for array types in the web console ingestion wizards (#15588) (id: 41613)
  • Added SQUARE_ROOT function to the timeseries extension: MAP_TIMESERIES(timeseries, 'sqrt(value)') (id: 41516)
  • Added null value index wiring for nested columns (#15687) (id: 41475)
  • Added support to the web console for sorting the segment table on start and end when grouped (#15720) (id: 41438)
  • Added a tile to the web console for the new Azure input source (id: 41317)
  • Added ImmutableLookupMap for static lookups (#15675) (id: 41268)
  • Added caching of value selectors in RowBasedColumnSelectorFactory (#15615) (id: 41265)
  • Added faster k-way merging using tournament trees and 8-byte key strides (#15661) (id: 40987)
  • Added CONCAT flattening filter decomposition (#15634) (id: 40986)
  • Added partition boosting for INSERT with GROUP BY to deal with skewed partitions (#15474) (id: 15015)
  • Added SQL compatibility for numeric first and last column types. The web console also provides an option for first and last aggregation (#15607) (id: 40615)
  • Added differentiation between null and empty strings in SerializablePairStringLong serde (id: 40401)
  • Changed IncrementalIndex#add so that it is no longer thread safe, which improves performance (#15697) (id: 41260)
  • Fixed the KafkaInputFormat parsing incoming JSON newline-delimited (as if it were a batch ingest) rather than as a whole entity (as is typical for streaming ingest) (#15692) (id: 41261)
  • Improved segment locking behavior so that the RetrieveSegmentsToReplaceAction is no longer needed (#15699) (id: 41484)
  • Disabled eager initialization for non-query connection requests (#15751) (id: 41407)
  • Enabled ArrayListRowsAndColumns to StorageAdapter conversion (#15735) (id: 41616)
  • Enabled query request queuing by default when total laning is turned on (#15440) (id: 40807)
  • Fixed web console forcing waitUntilSegmentLoad to true even if the user sets it to false (#15781) (id: 41614)
  • Fixed CVEs (#15814) (id: 41612)
  • Fixed interpolated exception message in InvalidNullByteFault (#15804) (id: 41546)
  • Fixed extractionFns on number-wrapping dimension selectors (#15761) (id: 41443)
  • Fixed the summary iterator in the grouping engine (#15658) (id: 41264)
  • Fixed incorrect scale when reading decimal from parquet (#15715) (id: 41263)
  • Fixed a rendering issue for disabled workers in the web console (#15712) (id: 41259)
  • Fixed issues so that the Kafka emitter now runs all scheduled callables. The emitter intelligently provisions threads to make sure there are no wasted threads and that all callables can run (#15719) (id: 41258)
  • Fixed MSQ task engine intermediate files not being immediately cleaned up in Azure (id: 41243)
  • Fixed audit log entries not appearing for "Mark as used all segments" actions (id: 41080)
  • Fixed an NPE that could occur if the StandardDeviationPostAggregator passed in is null: postAggregations.estimator: null (#15660) (id: 41003)
  • Fixed reverse pull-up lookups in the SQL planner (#15626) (id: 41002)
  • Fixed compaction getting stuck on intervals with tombstones (#15676) (id: 41001)
  • Fixed the result cache causing an exception when a sketch is stored in the cache (#15654) (id: 40885)
  • Fixed concurrent append and replace options in the web console (#15649) (id: 40868)
  • Fixed an issue that blocked queries issued from the small Run buttons (inside larger queries) from being modified by the table actions (#15779) (id: 41515)
  • Improved segment killing performance for Azure (#15770) (id: 38567)
  • Improved the performance of the druid-basic-security extension (#15648) (id: 40884)
  • Improved lookups to register first lookup immediately, regardless of the cache status (#15598) (id: 40863)
  • Improved numerical first and last aggregators so that they work for SQL-based ingestion too (id: 40996)
  • Improved parsing speed for list-based input rows (#15681) (id: 41262)
  • Improved error messages for DATE_TRUNC operators (#15759) (id: 41471)
  • Improved the web console to support using file inputs instead of text inputs for the Load query detail archive dialog (#15632) (id: 40941)
  • Changed the web console to use the new azureStorage input type instead of the azure storage type for ingesting from Azure (#15820) (id: 41723)
  • Changed the cryptographic salt size that Druid uses to 128 bits so that it is FIPS compliant (#15758) (id: 41405)

Changes in 2024.01.4

Pivot changes

  • Fixed an issue where filter tokens weren't correctly applied when a data cube also contained a subset filter definition. For details, see the following Imply Knowledge Base article (id: 61235)

Changes in 2024.01.3

Druid changes

  • Fixed an issue where DataSketches HLL Sketches would erroneously be considered empty. For details see the following Imply Knowledge Base article (id: 41916)

Changes in 2024.01.2

Druid changes

  • Fixed an issue where an exception occurs when queries use filters on TIME_FLOOR (#15778)

Changes in 2024.01.1

Druid changes

  • Fixed an issue with the default value for the inSubQueryThreshold parameter, which resulted in slower than expected queries. The default value for it is now 2147483647 (up from 20) (#15688) (id: 40814)

Changes in 2024.01

Pivot highlights

Pivot now runs natively on macOS ARM systems

We encourage on-prem customers to opt in to an updated distribution format for Pivot by setting an environment variable on your Pivot nodes: IMPLY_PIVOT_NOPKG=1. This format will become the default later in 2024.

This distribution format enables Pivot to target current and future LTS versions of Node.js and provides a compatibility option for customers who are unable to upgrade from legacy Linux distributions such as RHEL 7, CentOS 7, and Ubuntu 18.04. (id: 40447)

Druid highlights

SQL PIVOT and UNPIVOT (beta)

You can now use the SQL PIVOT and UNPIVOT operators to turn rows into columns and column values into rows respectively. (id: 37598)

The PIVOT operator carries out an aggregation and transforms rows into columns in the output. The following is the general syntax for the PIVOT operator:

PIVOT (aggregation_function(column_to_aggregate)
FOR column_with_values_to_pivot
IN (pivoted_column1 [, pivoted_column2 ...])
)
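
For example, the following sketch sums revenue and pivots the channel values into their own columns. The orders datasource and its revenue and channel columns are hypothetical, for illustration only:

```sql
-- Hypothetical datasource and columns, for illustration only
SELECT *
FROM "orders"
PIVOT (
  SUM("revenue")
  FOR "channel"
  IN ('web', 'store')
)
```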

The UNPIVOT operator transforms existing column values into rows. The following is the general syntax for the UNPIVOT operator:

UNPIVOT (values_column 
FOR names_column
IN (unpivoted_column1 [, unpivoted_column2 ... ])
)
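
Conversely, a hypothetical UNPIVOT turns per-channel columns back into rows. The pivoted_orders datasource and its columns are illustrative, not from the release notes:

```sql
-- Hypothetical datasource and columns, for illustration only
SELECT *
FROM "pivoted_orders"
UNPIVOT (
  "revenue"
  FOR "channel"
  IN ("web", "store")
)
```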

New JSON_QUERY_ARRAY function

The JSON_QUERY_ARRAY function is similar to JSON_QUERY except that the return type is always ARRAY<COMPLEX<json>> instead of COMPLEX<json>. Essentially, this function lets you extract arrays of objects from nested data and perform operations such as UNNEST, ARRAY_LENGTH, ARRAY_SLICE, or any other available ARRAY operations on them. (#15521) (id: 40335)
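
As a sketch, assuming a hypothetical events datasource with a nested payload column, you could extract and unnest an array of objects like this:

```sql
-- Hypothetical datasource and JSON path, for illustration only
SELECT t."item"
FROM "events"
CROSS JOIN UNNEST(JSON_QUERY_ARRAY("payload", '$.items')) AS t("item")
```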

Changes to native equals filter

The native query equals filter on mixed type 'auto' columns that contain arrays must now filter by the column's presenting type. If any rows are arrays (the segment metadata and information_schema report the type as an array type), then native queries must also filter using that array type. This does not impact SQL, which already has this limitation due to how the type presents itself. This only impacts mixed type 'auto' columns, which contain both scalars and arrays. (#15503) (id: 40328)

Support for GCS for SQL-based ingestion

You can now use Google Cloud Storage (GCS) as durable storage for SQL-based ingestion and queries from deep storage. (#15398) (id: 35053)
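
As a sketch, enabling GCS as durable storage involves runtime properties along these lines; the bucket and prefix values below are placeholders, and you should confirm the property names against the Druid durable storage documentation for your version:

```properties
# Placeholder values, for illustration only
druid.msq.intermediate.storage.enable=true
druid.msq.intermediate.storage.type=google
druid.msq.intermediate.storage.bucket=your-gcs-bucket
druid.msq.intermediate.storage.prefix=durable-storage
```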

Improved INNER joins

Druid now supports arbitrary join conditions for INNER joins. For INNER joins, Druid looks at the join condition, and any sub-conditions that cannot be evaluated efficiently as part of the join are converted to post-join filters. With this feature, you can perform inequality joins that were not possible before. (#15302) (id: 37564)
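
For instance, a hypothetical inequality join that matches each event to promotions active at the event time might look like this; the equality sub-condition is evaluated as part of the join, while the inequalities become post-join filters. The datasources and columns are illustrative only:

```sql
-- Hypothetical datasources and columns, for illustration only
SELECT e."__time", e."user_id", p."promo_id"
FROM "events" e
INNER JOIN "promotions" p
  ON e."country" = p."country"      -- evaluated as part of the join
 AND e."__time" >= p."start_time"   -- converted to a post-join filter
 AND e."__time" <  p."end_time"     -- converted to a post-join filter
```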

Pivot changes

  • Added the Pivot server configuration property forceNoRedirect, which forces the Pivot UI to always render the splash page without automatic redirection (id: 38986)
  • Added the ability to sort a data cube by the first column by clicking the column header (id: 31363)
  • Fixed an issue where the percent of root measure caused downloads from deep storage to fail (id: 40673)
  • Fixed incorrect sort order in deep storage downloads (id: 40374)
  • Fixed an issue where the flat table visualization with an absolute time filter used "Latest day" when accessed through a link (id: 40339)
  • Fixed functional and display issues in the overall visualization (id: 40271)
  • Fixed an issue where the back button didn't work correctly in the async downloads dialog (id: 40265)
  • Improved query generation in Pivot and Plywood to use the two-valued IS NOT TRUE form of the NOT operator (id: 40638)
  • Improved the data cube measure preview by providing a manual override prompt when the preview fails (id: 38763)
  • Updated the names of the async downloads feature flags to Async Downloads (Deprecated) and Async Downloads, New Engine, 2023 (Alpha) (id: 40525)

Druid changes

  • Added experimental support for first/last data types for double/float/long during native and SQL-based ingestion (#14462) (id: 37231)
  • Added the new config druid.audit.manager.type, which can take the values log or sql (default). This allows audited events to either be logged or persisted in the metadata store (default behavior) (#15480) (id: 37696)
  • Added the new config druid.audit.manager.logLevel, which sets the log level of audit events and can take the values DEBUG, INFO (default), or WARN (#15480) (id: 37696)
  • Added array column type support to EXTEND operator (#15458) (id: 40286)
  • Changed what happens when there are fewer query scheduler threads than server HTTP threads. When that happens, total laning is enforced, and some HTTP threads are reserved for non-query requests, such as health checks. Previously, any request that exceeded lane capacity was rejected. Now, excess requests are queued with a timeout equal to MIN(Integer.MAX_VALUE, druid.server.http.maxQueryTimeout). If the value is negative, requests are queued forever. (#15440) (id: 40776)
  • Changed the ARRAY_TO_MV function to support expression inputs (#15528) (id: 40358)
  • Changed the auto column indexer so that columns that contain only empty arrays or arrays of nulls are stored as ARRAY<LONG> instead of COMPLEX<json> (#15505) (id: 40313)
  • Fixed an issue where null and empty strings were treated equally, and the return value was always null (#15525) (id: 40401)
  • Fixed an issue where lookups fail with an error related to failing to construct FilteredAggregatorFactory (#15526) (id: 40296)
  • Fixed issues related to null handling and vector expression processors (#15587) (id: 40545)
  • Fixed a bug in the ingestion spec to SQL-based ingestion query convertor for the web console (#15627) (id: 40795)
  • Fixed redundant expansion in SearchOperatorConversion (#15625) (id: 40768)
  • Fixed an issue where some ARRAY types were incorrectly treated as COMPLEX types (#15543) (id: 40514)
  • Fixed an NPE with virtual expressions and unnest (#15513) (id: 40348)
  • Fixed an issue where the window function MIN aggregated nulls as 0 (#15371) (id: 40327)
  • Fixed an issue where null filters on datasources with range partitioning could lead to excessive segment pruning, leading to missed results (#15500) (id: 40288)
  • Fixed an issue with window functions where a string cannot be cast when creating HLL sketches (#15465) (id: 39859)
  • Fixed a bug in segment allocation that can potentially cause loss of appended data when running interleaved append and replace tasks. (#15459) (id: 39718)
  • Improved filtering performance by adding support for using underlying column index for ExpressionVirtualColumn (#15585) (#15633) (id: 39668) (id: 40794)
  • Improved how three-valued logic is handled (#15629) (id: 40797)
  • Improved the Broker to be able to use catalog for datasource schemas for SQL queries (#15469) (id: 40796)
  • Improved the Druid audit system to log when a supervisor is created or updated (#15636) (id: 40774)
  • Improved the connections from Brokers and Coordinators to Historical and real-time processes (#15596) (id: 40763)
  • Improved how segment granularity is handled when there is a conflict and the requested segment granularity can't be allocated. Day granularity is now considered after month. Previously, week was used, but weeks do not align with months perfectly. You can still explicitly request week granularity. (#15589) (id: 40701)
  • Improved polling in segment allocation queue to improve efficiency and prevent race conditions (#15590) (id: 40690)
  • Improved the web console to detect EXPLAIN PLAN queries and be able to run them individually (#15570) (id: 40508)
  • Improved the efficiency of queries by reducing the number of expression objects created during evaluation (#15552) (id: 40495)
  • Improved the error message you get if you try to use INSERT INTO and OVERWRITE syntax (id: 37790)
  • Improved the JDBC lookup dialog in the web console to include Jitter seconds, Load timeout seconds, and Max heap percentage options (#15472) (id: 40246)
  • Improved compaction so that it skips for datasources with partial eternity segments, which could result in memory pressure on the Coordinator (#15542) (id: 40075)
  • Improved Kinesis integration so that only checkpoints for partitions with unavailable sequence numbers are reset (#15338) (id: 29788)
  • Improved the performance of the following:
    • how Druid generates queries from Calcite plans
    • the internal SEARCH operator used by other functions
    • the COALESCE function (#15609) (id: 40672) (#15623) (id: 40691)
  • Removed the 'auto' strategy from search queries. Specifying 'auto' is now equivalent to specifying useIndexes (#15550) (id: 40460)

Clarity changes

  • Updated subsetFormula for server cube to accept null values (id: 40254)

Platform changes

  • Added support for JVM memory metrics in GKE ZooKeeper deployments (id: 38855)

Upgrade and downgrade notes

Minimum supported version for rolling upgrade

See "Supported upgrade paths" in the Lifecycle Policy documentation.

Front-coded dictionaries

In 2025, the front-coded dictionaries feature will be enabled by default. Front-coded dictionaries reduce storage and improve performance by optimizing strings with similar prefixes.

Once this feature is enabled, you cannot easily downgrade to an earlier version that doesn't support it.

For more information, see Migration guide: front-coded dictionaries.

If you're already using this feature, you don't need to take any action.
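
If you want to opt in ahead of the default change, front coding is configured per ingestion through the indexSpec in the tuningConfig, roughly as follows; the bucketSize value shown is illustrative, and you should confirm the options against the migration guide for your version:

```json
"indexSpec": {
  "stringDictionaryEncoding": {
    "type": "frontCoded",
    "bucketSize": 4
  }
}
```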

Batch ingestion task failure

There is a known issue with Imply 2024.05 versions where batch ingestion tasks can fail during rolling upgrades or downgrades. If you run into a task failure during an upgrade or downgrade, restart the failed task after the rolling upgrade or downgrade completes. This issue is fixed in Imply Enterprise and Imply Hybrid 2024.06. If you need to avoid such task failures, upgrade to 2024.06 or later.

Filter tokens in Pivot

If you use subset filters in conjunction with filter tokens, upgrade to 2024.07. For details, see the following Imply Knowledge Base article.

Remove orphaned segments in GCS deep storage

If you have orphaned segments from failed kill tasks from 2024.01 STS through 2024.02.3 STS, optionally identify and delete any segments that meet both of the following criteria:

  • Segment exists in deep storage, but has no corresponding metadata store record.
  • Segment is older than 1 week.

Restricting the cleanup to segments older than a week prevents you from deleting pending segments.

stopTaskCount must now be explicitly set

Starting in 2024.03 STS, you must explicitly set a value for stopTaskCount if you want to use it for streaming ingestion. It no longer defaults to the same value as taskCount.
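
As a sketch, a supervisor spec that previously relied on the default now needs the value spelled out in its ioConfig; the values below are illustrative:

```json
"ioConfig": {
  "taskCount": 4,
  "stopTaskCount": 4
}
```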

Segment metrics for real-time queries

Starting in 2024.02 STS, segment metrics for real-time queries (such as query/segment/time) are per-FireHydrant instead of per-Sink when the context parameter bySegment is set to false, which is common for most use cases.

Renamed segment metric

Starting in 2024.03 STS, the kill/candidateUnusedSegments/count metric is now called kill/eligibleUnusedSegments/count. (#15977) (id: 42492)

GroupBy queries that use the MSQ task engine during upgrades

Beginning in 2024.02 STS, the performance and behavior of segment partitioning have been improved. GroupBy queries may fail during an upgrade if some workers are on an older version and some are on a more recent version.

Changes to native equals filter

Beginning in 2024.01 STS, the native query equals filter on mixed type 'auto' columns that contain arrays must filter by the column's presenting type. If any rows are arrays (the segment metadata and information_schema report the type as an array type), then native queries must also filter using that array type. This does not impact SQL, which already has this limitation due to how the type presents itself. This only impacts mixed type 'auto' columns, which contain both scalars and arrays.

Imply Hybrid MySQL upgrade

Imply Hybrid previously used MySQL 5.7 by default. New clusters use MySQL 8 by default. If you have an existing cluster, you need to upgrade its MySQL version since Amazon RDS support for MySQL 5.7 is scheduled to end on February 29, 2024. Although you can opt for extended support from Amazon, you can instead use Imply Hybrid Manager to upgrade your MySQL instance to MySQL 8.

The upgrade should have little to no impact on your queries but does require a reconnection to the database. The process can take an hour and services will reconnect to the database during the upgrade.

In preparation for the upgrade, you need to grant certain permissions to the Cloud Manager IAM role by applying the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "rds:CreateBlueGreenDeployment",
        "rds:PromoteReadReplica"
      ],
      "Resource": [
        "arn:aws:rds:*:*:pg:*",
        "arn:aws:rds:*:*:deployment:*",
        "arn:aws:rds:*:*:*:imply-*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "rds:AddTagsToResource",
        "rds:CreateDBInstanceReadReplica",
        "rds:DeleteBlueGreenDeployment",
        "rds:DescribeBlueGreenDeployments",
        "rds:SwitchoverBlueGreenDeployment"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}

After you grant the permissions, click Apply changes for Amazon RDS MySQL Update on the Overview page of Imply Hybrid Manager.

Three-valued logic

caution

The legacy two-valued logic and the corresponding properties that support it will be removed in the December 2024 STS and January 2025 LTS. The SQL compatible three-valued logic will be the only option.

Update your queries and downstream apps prior to these releases.

SQL standard three-valued logic introduced in 2023.11 primarily affects filters using the logical NOT operation on columns with NULL values. This applies to both query and ingestion time filtering.

The following example illustrates the old and the new behavior. Consider the filter x <> 'some value', which selects rows where x is not equal to 'some value'. Previously, Druid included all rows not matching x = 'some value', including null values. The new behavior follows the SQL standard and matches only rows that have a value and whose value is not equal to 'some value'. Null values are excluded from the results.

Three-valued logic is only enabled if you accept the following default values:

druid.generic.useDefaultValueForNull=false
druid.expressions.useStrictBooleans=true
druid.generic.useThreeValueLogicForNativeFilters=true
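
With those defaults in place, the behavior and a workaround can be sketched as follows, using a hypothetical my_table with a nullable column x:

```sql
-- Under three-valued logic, NULL <> 'some value' evaluates to NULL,
-- so NULL rows are excluded from the result:
SELECT * FROM "my_table" WHERE "x" <> 'some value'

-- To keep matching NULL rows as the legacy behavior did, be explicit:
SELECT * FROM "my_table" WHERE "x" <> 'some value' OR "x" IS NULL
```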

SQL compatibility

caution

The legacy behavior that is not compatible with standard ANSI SQL and the corresponding properties will be removed in the December 2024 STS and January 2025 LTS releases. The SQL compatible behavior introduced in the 2023.09 STS will be the only behavior available.

Update your queries and any downstream apps prior to these releases.

Starting with 2023.09 STS, the default way Druid treats nulls and booleans has changed.

For nulls, Druid now differentiates between an empty string ('') and a record with no data as well as between an empty numerical record and 0.

You can revert to the previous behavior by setting druid.generic.useDefaultValueForNull to true. This property affects both storage and querying, and must be set on all Druid service types to be available at both ingestion time and query time. Reverting this setting to the old value restores the previous behavior without reingestion.

For booleans, Druid now strictly uses 1 (true) or 0 (false). Previously, true and false could be represented either as true and false or as 1 and 0, respectively. In addition, Druid now returns a null value for Boolean comparisons like True && NULL.

druid.expressions.useStrictBooleans primarily affects querying; however, it also affects JSON columns and type-aware schema discovery for ingestion. You can set druid.expressions.useStrictBooleans to false to configure Druid to ingest booleans in 'auto' and 'json' columns as VARCHAR (native STRING) columns that use the string values 'true' and 'false' instead of BIGINT (native LONG). You must set it on all Druid service types for it to be available at both ingestion time and query time.

The following table illustrates some example scenarios and the impact of the changes:

| Query | 2023.08 STS and earlier | 2023.09 STS and later |
| --- | --- | --- |
| Query empty string | Empty string ('') or null | Empty string ('') |
| Query null string | Null or empty | Null |
| COUNT(*) | All rows, including nulls | All rows, including nulls |
| COUNT(column) | All rows excluding empty strings | All rows including empty strings but excluding nulls |
| Expression 100 && 11 | 11 | 1 |
| Expression 100 || 11 | 100 | 1 |
| Null FLOAT/DOUBLE column | 0.0 | Null |
| Null LONG column | 0 | Null |
| Null __time column | 0, meaning 1970-01-01 00:00:00 UTC | 1970-01-01 00:00:00 UTC |
| Null MVD column | '' | Null |
| ARRAY | Null | Null |
| COMPLEX | None | Null |
Update your queries

Before you upgrade from a version prior to 2023.09 to 2023.09 or later, update your queries to account for the changed behavior:

NULL filters

If your queries use NULL in the filter condition to match both nulls and empty strings, you should add an explicit filter clause for empty strings. For example, update s IS NULL to s IS NULL OR s = ''.

COUNT functions

COUNT(column) now counts empty strings. If you want to continue excluding empty strings from the count, replace COUNT(column) with COUNT(column) FILTER(WHERE column <> '').
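
For example, assuming a hypothetical my_table with a channel column, the rewritten query looks like this:

```sql
-- Counts rows where channel is neither NULL nor an empty string,
-- matching the pre-2023.09 behavior of COUNT(channel)
SELECT COUNT("channel") FILTER (WHERE "channel" <> '') AS "channel_count"
FROM "my_table"
```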

GroupBy queries

GroupBy queries on columns containing null values can now have additional entries as nulls can co-exist with empty strings.

Avatica JDBC driver upgrade

info

The Avatica JDBC driver is not packaged with Druid. Its upgrade is separate from any upgrades to Imply.

If you notice intermittent query failures after upgrading your Avatica JDBC driver to version 1.21.0 or later, you may need to set the transparent_reconnection property.
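
As a sketch, the property can be supplied in the Avatica connection URL; the Broker host and port below are placeholders:

```
jdbc:avatica:remote:url=https://your-broker:8082/druid/v2/sql/avatica/;transparent_reconnection=true
```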

Parameter execution changes for Kafka

When using the built-in FileConfigProvider for Kafka, interpolations are now intercepted by the JsonConfigurator instead of being passed down to the Kafka provider. This breaks existing deployments.

For more information, see KIP-297 and #13023.

Deprecation notices

Two-valued logic

Druid's legacy two-valued logic for native filters and the properties for maintaining that behavior are deprecated and will be removed in the December 2024 STS and January 2025 LTS releases.

The ANSI-SQL compliant three-valued logic will be the only supported behavior after these releases. This SQL compatible behavior became the default for deployments that use Imply 2023.11 STS and January 2024 LTS releases.

Update your queries and downstream apps prior to these releases.

For more information, see three-valued logic.

Properties for legacy Druid SQL behavior

Druid's legacy behavior for Booleans and NULLs and the corresponding properties are deprecated and will be removed in the December 2024 STS and January 2025 LTS releases.

The ANSI-SQL compliant treatment of Booleans and null values will be the only supported behavior after these releases. This SQL compatible behavior became the default for Imply 2023.11 STS and January 2024 LTS.

Update your queries and downstream apps prior to these releases.

For more information, see SQL compatibility.

Some segment loading configs deprecated

Starting with 2023.08 STS, the following segment related configs are now deprecated and will be removed in future releases:

  • maxSegmentsInNodeLoadingQueue
  • maxSegmentsToMove
  • replicationThrottleLimit
  • useRoundRobinSegmentAssignment
  • replicantLifetime
  • maxNonPrimaryReplicantsToLoad
  • decommissioningMaxPercentOfMaxSegmentsToMove

Use smartSegmentLoading mode instead, which calculates values for these variables automatically.

SysMonitor support deprecated

Starting with 2023.08 STS, switch to OshiSysMonitor as SysMonitor is now deprecated and will be removed in future releases.

Asynchronous SQL download deprecated

The async downloads feature is deprecated and will be removed in future releases. Instead, consider using Query from deep storage.

End of support

CrossTab view

The CrossTab view feature is no longer supported. Use Pivot 2.0 instead, which incorporates the capabilities of CrossTab view.

Zookeeper-based segment loading

Zookeeper-based segment loading is no longer supported. You do not need to take any action. Druid ignores the following related configs:

  • druid.coordinator.load.timeout
  • druid.coordinator.loadqueuepeon.type
  • druid.coordinator.curator.loadqueuepeon.numCallbackThreads

Druid only uses the recommended HTTP loading, which includes improvements to the Coordinator service such as smart segment loading.

As part of this change, compaction configs for inactive datasources are automatically cleaned up by default. (#15705) (id: 60764)