Imply Enterprise and Hybrid release notes
Imply releases include Imply Manager, Pivot, Clarity, and Imply's distribution of Apache Druid®. Imply delivers improvements more quickly than open source because Imply's distribution of Apache Druid uses the primary branch of Apache Druid. This means that it isn't an exact match to any specific open source release. Any open source version numbers mentioned in the Imply documentation don't pertain to Imply's distribution of Apache Druid.
The following release notes provide information on features, improvements, and bug fixes up to Imply STS release 2025.07. Read all release notes carefully, especially the Upgrade and downgrade notes, before upgrading. Additionally, review the deprecations page regularly to see if any features you use are impacted.
For information on the LTS release, see the LTS release notes.
If you are upgrading by more than one version, read the intermediate release notes too.
The following end-of-support dates apply in 2025:
- On January 26, 2025, Imply 2023.01 LTS reaches EOL. This means that the 2023.01 LTS release line will no longer receive any patches, including security updates. Imply recommends that you upgrade to the latest LTS or STS release.
- On January 31, 2025, Imply 2024.01 LTS ends general support status and will be eligible only for security support.
For more information, see Lifecycle Policy.
See Previous versions for information on older releases.
Imply evaluation
New to Imply? Get started with an Imply Hybrid (formerly Imply Cloud) Free Trial or start a self-hosted trial at Get started with Imply!
With Imply Hybrid, the Imply team manages your clusters in AWS, while you control the infrastructure and own the data. With self-hosted Imply, you can run Imply on *NIX systems in your own environment or cloud provider.
Imply Enterprise
If you run Imply Enterprise, see Imply product releases & downloads to access the Imply Enterprise distribution. When prompted, log on to Zendesk with your Imply customer credentials.
Changes in 2025.07
Druid highlights
Use SET statements for query context parameters
You can now use SET statements to define query context parameters for a query through the Druid console or the API. For example, you can include the following in your query to set the timeout:
SET timeout = 20000;
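For example, a minimal sketch of a full statement that combines several SET statements with the query itself; the datasource and the second parameter are illustrative:

-- the datasource and the sqlTimeZone setting are illustrative
SET timeout = 20000;
SET sqlTimeZone = 'America/Los_Angeles';
SELECT channel, COUNT(*) AS edits
FROM "wikipedia"
GROUP BY channel;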
Improved SQL endpoint
You can now use raw SQL in the HTTP body for `/druid/v2/sql` endpoints. You can set `Content-Type` to `text/plain` instead of `application/json`, so you can provide raw text that isn't escaped (#17937)
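For example, a request along these lines posts the query as raw text with no JSON escaping; the Router address and query are illustrative:

# Router address and query are illustrative
curl -X POST "http://ROUTER:8888/druid/v2/sql" \
  -H "Content-Type: text/plain" \
  -d "SELECT COUNT(*) FROM \"wikipedia\""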
Dart query engine (beta)
The Dart query engine is designed to support highly complex queries, such as large joins, high-cardinality GROUP BY, subqueries, and common table expressions. These query shapes are typical of ad hoc, data warehouse workloads and should be familiar to developers using Apache Spark and Presto. Dart uses multi-threaded workers, conducts in-memory shuffles, and accesses locally cached data directly, rather than reading from deep storage.
To use Dart, add the following to `_common/common.runtime.properties`:
druid.msq.dart.enabled = true
Then, include the `engine` query context parameter in your queries and set it to `msq-dart`.
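For example, a sketch of a SQL API request body that routes a query to Dart; the query itself is illustrative:

{
  "query": "SELECT channel, COUNT(*) FROM \"wikipedia\" GROUP BY channel",
  "context": {
    "engine": "msq-dart"
  }
}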
(id: 67698) (id: 67261) (id: 66855) (id: 66854) (id: 66853) (id: 66852) (id: 64665) (id: 64258) (id: 65720) (id: 66042)
Projections (alpha)
Projections significantly reduce the number of rows to scan when processing certain queries, so they can greatly improve performance. A projection is pre-aggregated data that matches a commonly used query. When you run such a query, Druid uses the projection to process it, reducing the number of rows processed and lowering compute and I/O costs.
You define projections either as part of an ingestion, in the catalog for a datasource, or in the compaction spec for a datasource.
(id: 67624) (id: 67552) (id: 67493) (id: 67393) (id: 67365) (id: 67274) (id: 67128)
Multiple supervisors for a datasource
You can now use more than one supervisor to ingest data into the same datasource. Use the `id` field to distinguish between supervisors ingesting into the same datasource (identified by `spec.dataSchema.dataSource` for streaming supervisors).
(#18149) (#18082)
Segment metadata cache
You can now enable segment metadata caching on the Overlord to significantly improve the performance of segment read, commit, and allocate actions performed by the Overlord during ingestion. You can also enable it on the Coordinator to improve the performance of segment metadata polls.
Set `druid.manager.segments.useIncrementalCache` to one of the following values:
- `never`: Never use the incremental cache.
- `ifSynced`: Use the incremental cache if it's up to date. This setting doesn't block service startup.
- `always`: Always use the incremental cache. This setting blocks service startup until the cache has synced with the metadata store at least once.
If you enable incremental caching, you can control the polling period using `druid.manager.segments.pollDuration` (defaults to `PT1M`). Set the config to an ISO 8601 duration.
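For example, a minimal sketch of the relevant runtime properties; the polling period value is illustrative:

# illustrative values
druid.manager.segments.useIncrementalCache=ifSynced
druid.manager.segments.pollDuration=PT30S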
You can use the following metrics to monitor the performance of the segment metadata cache:
segment/used/count
segment/unused/count
segment/pending/count
segment/metadataCache/sync/time
segment/metadataCache/deleted
segment/metadataCache/skipped
segment/metadataCache/used/stale
segment/metadataCache/used/updated
segment/metadataCache/unused/updated
segment/metadataCache/pending/deleted
segment/metadataCache/pending/updated
segment/metadataCache/pending/skipped
(#17996) (id: 67034) (#17935) (id: 66709) (#17947) (id: 66639) (#17653)
Historical clones
You can use Historical clones to speed up rolling updates when you want to launch a new Historical as a replacement for an existing one.
Set the `cloneServers` field in the Coordinator dynamic config to a map from the target Historical server to the source Historical:
"cloneServers": {"historicalClone":"historicalOriginal"}
When you query your data, you can prefer (`preferClones`), exclude (`excludeClones`), or include (`includeClones`) clones by setting the query context parameter `cloneQueryMode`. By default, clones are excluded while querying.
(id: 65717) (id: 66707) (id: 65718) (#17863) (#17899)(#17956)
Embedded kill tasks on the Overlord (alpha)
You can now run kill tasks directly on the Overlord itself. This requires the segment metadata cache to be enabled. Embedded kill tasks provide several benefits; they:
- kill segments as soon as they're eligible
- don't take up a task slot
- finish faster since they use optimized metadata queries and don't launch a new JVM
- kill a small number of segments per task to avoid holding locks on an interval for too long
- skip locked intervals to avoid head-of-line blocking
- require minimal configuration
- can keep up with a large number of unused segments in the cluster
Use the following configurations for this feature:
- `druid.manager.segments.killUnused.enabled`: Enables or disables the feature. A configuration sketch follows this list.
- `druid.manager.segments.killUnused.bufferPeriod`: The duration before Druid permanently removes a segment from metadata and deep storage. Use this as a buffer period to prevent data loss if you might need data after it's marked unused.
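A minimal sketch of the Overlord properties; the buffer period value is illustrative:

# buffer period value is illustrative
druid.manager.segments.killUnused.enabled=true
druid.manager.segments.killUnused.bufferPeriod=P30D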
As part of this feature, new metrics have been added:
Metric | Description | Dimensions |
---|---|---|
`segment/killed/metadataStore/count` | Number of segments permanently deleted from the metadata store | `dataSource`, `taskId`, `taskType`, `groupId`, `tags` |
`segment/killed/deepStorage/count` | Number of segments permanently deleted from deep storage | `dataSource`, `taskId`, `taskType`, `groupId`, `tags` |
`segment/kill/queueReset/time` | Time taken to reset the kill queue on the Overlord. Emitted only if `druid.manager.segments.killUnused.enabled` is true. | |
`segment/kill/queueProcess/time` | Time taken to fully process all the jobs in the kill queue on the Overlord. Emitted only if `druid.manager.segments.killUnused.enabled` is true. | |
`segment/kill/jobsProcessed/count` | Number of jobs processed from the kill queue on the Overlord. Emitted only if `druid.manager.segments.killUnused.enabled` is true. | |
`segment/kill/skippedIntervals/count` | Number of intervals skipped from kill due to being already locked. Emitted only if `druid.manager.segments.killUnused.enabled` is true. | `dataSource`, `taskId` |
(#18028) (id: 66111)
Pivot highlights
Multi-axis line chart improvements
Improved the ability to customize the y-axis in the multi-axis line chart. You can now:
- Set a custom tick mark interval.
- Specify a maximum value for the y-axis.
- Change the label orientation.
(ids 67040, 67436, 67039)
Alerts improvements
You can now trigger alerts based on the previous value of a measure. To do this, set an alert condition to greater than or less than Previous value.
(ids 67496, 67458)
Other changes
Druid changes
- Added new metrics for real-time ingestion (#17847) (id: 66232)
- Added new Kafka consumer metrics (#17919)
- Added new MSQ task engine metrics and new dimensions to existing metrics (#18121)
- Added the `description` dimension for the `task/run/time` metric
- Added a metric for how long it takes to complete an autoscale action: `task/autoScaler/scaleActionTime` (#17971) (id: 66752)
- Added a `taskType` dimension to Overlord-emitted task count metrics (#18032) (id: 67226)
- Added the following groupBy metrics to the Prometheus emitter: `mergeBuffer/used`, `mergeBuffer/acquisitionTimeNs`, `mergeBuffer/acquisition`, `groupBy/spilledQueries`, `groupBy/spilledBytes`, and `groupBy/mergeDictionarySize` (#17929) (id: 66527)
- Changed the logging level for query cancellation from `warn` to `info` to reduce noise (#18046) (id: 67279)
- Changed query logging so that SQL queries that can't be parsed are no longer logged and don't emit metrics (#18102)
- Changed the logging level for lifecycle logging from `debug` to `info` (#17884) (id: 66354)
- Added `groupId` and `tasks` to Overlord logs (#18046)
- You can now use the `druid.request.logging.rollPeriod` property to configure the log rotation period (default 1 day) (#17976) (id: 66767)
- Improved metric emission on the Broker to include per-query result-level caching (`query/resultCache/hit` returning `1` means the cache was used) (#18063) (id: 67325)
- Added audit logs for the following `BasicAuthorizerResource` update methods: `authorizerUserUpdateListener`, `authorizerGroupMappingUpdateListener`, and `authorizerUpdateListener` (deprecated) (#17916)
- Added support for streaming task logs to Indexers (#18170)
- Improved `ThreadDumpMonitor`: it's no longer part of `MonitorScheduler`, so thread dumps are available even if the `MonitorScheduler` thread is blocked (id: 67230)
- Added the following Coordinator APIs: `/druid/coordinator/v1/cloneStatus` and `/druid/coordinator/v1/brokerConfigurationStatus` (#17899)
- Fixed an issue where MSQ task engine workers deadlocked when retrying (#18254) (id: 68043)
- Improved concurrency for batch and streaming ingestion tasks (#17828)
- Improved how MSQ task engine tasks get canceled, speeding up cancellation and freeing resources sooner (#18095)
- Improved streaming ingestion so that it automatically determines the maximum number of columns to merge (#17917)
- Improved batch segment allocation so that it uses multiple threads when allocating segments for different datasources. Set the config `druid.indexer.tasklock.batchAllocationNumThreads` (default value 5) to the number of threads you want to use for segment allocation. Setting this config to a very large value can hinder performance due to strain on the metadata store (id: 67449) (#18098)
- Added the optional `taskCountStart` property to the lag-based autoscaler. Use it to specify the initial task count for the supervisor when it's submitted (#18098)
- Changed how Druid interacts with streaming input sources. Druid now explicitly prevents seekable stream supervisors (Kafka, Kinesis, and Rabbit) from updating the underlying input stream (such as a topic for Kafka) that is persisted for them. While the API previously allowed such a change, the underlying system doesn't fully support it, so the Supervisor API now returns a 400 error with details on why the change isn't allowed. The docs and the response message describe a workaround for users who still want to make the change (#17955) (id: 66327)
- Added support for big decimal values (id: 63062) (id: 663)
- Fixed an issue with realtime queries using the MSQ task engine that led to incorrect results (#18235) (id: 67972)
- Fixed an issue with numeric vector selectors on `JSON_VALUE` when the least restrictive type contains arrays (#18053) (id: 67356)
- Changed `groupBy` queries. Druid now uses the `groupBy` native query type, rather than `topN`, for SQL queries that group by and order by the same column, have `LIMIT`, and don't have `HAVING`. This speeds up execution of such queries since `groupBy` is vectorized while `topN` is not (#18074) (id: 67423)
- Changed the `MV_OVERLAP` and `MV_CONTAINS` functions to align more closely with the native `in` type filter (#18084)
- Fixed an issue where `equalTo`, `greaterThan`, and `lessThan` specs threw an NPE when used with first or last aggregators that return null (#17911) (id: 66482)
- Improved `json_merge()` to be SQL-compliant when arguments are null. The function now returns null if any argument is null. For example, queries like `SELECT JSON_MERGE(null, null)` and `SELECT JSON_MERGE(null, '')` now return null instead of throwing an error (#17983)
- Fixed the ordering for certain float values in row-based frames (#18181) (id: 66846)
- You can now perform big decimal aggregations using the MSQ task engine (#18164)
- You can now configure a timeout for `index_parallel` and `compact` type tasks. Set the context parameter `subTaskTimeoutMillis` to the maximum time in milliseconds you want to wait before a subtask gets canceled. By default, there's no timeout (#18039) (id: 67252)
- Improved query handling when segments are temporarily missing on Historicals but not detected by Brokers. Druid no longer incorrectly returns partial results (#18025) (id: 67211)
- Fixed an issue with dropping and reingesting segments when using concurrent append and replace (#18216) (id: 67907)
- Fixed an issue where concurrent replace tasks didn't delete rows despite marking them as successfully deleted (#18099) (id: 67455)
- Fixed an issue with URL encoding with Azure (#17887) (id: 66370)
- Improved the concurrency on the Overlord for the task queue (#17828) (id: 66035)
- Improved the concurrency on the Overlord by ensuring that task actions on one datasource don't block actions on other datasources (#17390) (id: 66034)
- Improved the streaming task autoscaler to use a common thread pool (#18163) (id: 66647)
- Fixed an issue that can happen when a cursor is reset and the offset contains an OR filter bundle that includes partial index value matchers (#18029) (id: 67221)
- You can now assign tiered replications to tiers that aren't currently online in the Druid console (#18050)
- You can now filter tasks by the error in the Task view (#18057)
- Improved SQL autocomplete and added JSON autocomplete (#18126)
- Changed how the web console determines what functions are available, improving things like auto-completion (#18214)
- Updated the web console to use the Overlord APIs instead of Coordinator APIs when managing segments, such as marking them as unused (#18172)
- Removed the `IS_INCREMENTAL_HANDOFF_SUPPORTED` context reference from supervisors, as incremental publishing has been the default behavior since version 0.16.0. This context was originally introduced to support rollback to `LegacyKafkaIndexTaskRunner` in versions earlier than 0.16.0, which has since been removed (#18120) (id: 67502)
- Removed the `useMaxMemoryEstimates` config. When set to false, Druid used a much more accurate memory estimate introduced in Druid 0.23.0. That more accurate method is now the only available method. The config has defaulted to false for several releases (#17936)
Pivot changes
- In the line chart visualization, the Multiples property is now limited to a single dimension, in line with the multi-axis line chart (id: 66811)
- Comparisons in reports now use the same unit as the original measure (id: 66480)
- Links to data cubes in alert emails, report emails, and webhooks are now permanent. This means that the links remain valid long after the emails are delivered (id: 23929)
- Fixed DELTA_TIMESERIES to automatically align the window with the time bucket, preventing errors in the time series visualization (id: 67487)
- Fixed dashboard string filters to show suggestions from all relevant data cubes when multiple dimensions share the same title (id: 67484)
- Fixed an issue where an alert would fire multiple times after being changed to a smaller time frame (id: 66546)
Imply Hybrid
- Imply Hybrid now supports `r7gd.8xlarge` instances (id: 68078)
Changes in 2025.04.2
June 18, 2025
Druid changes
- Changed the cast expression to be more lenient when casting arrays of primitive values. Casting multi-element arrays now results in a null value in the target type instead of failing. This behavior is consistent with other primitive value behavior (#18078) (id: 67399)
- Fixed an issue where the NVL function failed if all the inputs were null (#18024) (id: 67214)
- Fixed an issue where JSON_VALUE would return null for all values if a nested field contained an array (#18053) (id: 67356)
- Fixed an issue that occurred during a cursor reset if the offset contained an OR filter bundle (#18029) (id: 67221)
- Fixed an issue where the segment map for a GroupBy query got discarded (#17763) (id: 67430)
Pivot changes
- The line chart visualization now allows you to select a Horizontal scale: Scroll (default) or Fit to view. To display all data in the visible line chart without scrolling, select Fit to view (id: 67054)
- Fixed an issue where you could create collections with the same name (id: 67235)
- Fixed an issue in the Collections API where you could create collections containing data cube and dashboard IDs that didn't exist. This resulted in collections with no assets (id: 67235)
- Fixed an issue with long collection names in the UI (id: 67235)
- Fixed scrolling issues on the collections page (id: 66702)
- Fixed an issue where an alert would fire multiple times after being changed to a smaller time frame (id: 66546)
- Improved the API endpoint to create a collection so that it only accepts the `name` and `description` fields (id: 67235)
Changes in 2025.04.1
May 7, 2025
Druid changes
- Fixed an issue that caused streaming ingestion jobs to fail when upsert was enabled (id: 10053)
Changes in 2025.04
April 22, 2025
Pivot highlights
Collections
You can now group Pivot data cubes and dashboards into collections for easy access. See Collections for more details.
(id: 65850)
Druid highlights
Improved the query results API
The query results API (`GET /druid/v2/sql/statements/{queryId}/results`) now supports an optional `filename` parameter. When provided, the response uses the `Content-Disposition` header to instruct web browsers to save the results as a file instead of displaying them inline.
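For example, assuming `filename` is passed as a query parameter; the query ID and filename are illustrative:

GET /druid/v2/sql/statements/query-abc123/results?filename=results.csv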
(#17840) (id: 66215)
Increase segment load speed (alpha)
You can now increase the speed at which segments get loaded on a Historical by providing a list of servers in the Coordinator dynamic config `turboLoadingNodes`. For these servers, the Coordinator ignores `druid.coordinator.loadqueuepeon.http.batchSize` and uses the value of the respective `numLoadingThreads` instead. Note that putting a Historical in turbo-loading mode might affect query performance since the segment loading threads use more resources.
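A sketch of the dynamic config entry; the server names and the exact entry shape are illustrative assumptions:

"turboLoadingNodes": ["historical-01.example.com:8083", "historical-02.example.com:8083"]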
(#17775) (id: 65714)
Overlord APIs for compaction (alpha)
You can use the following Overlord compaction APIs to manage compaction status and configs. These APIs work the same regardless of whether compaction supervisors are enabled.
Method | Path | Description | Required permission |
---|---|---|---|
GET | /druid/indexer/v1/compaction/config/cluster | Get the cluster-level compaction config | Read configs |
POST | /druid/indexer/v1/compaction/config/cluster | Update the cluster-level compaction config | Write configs |
GET | /druid/indexer/v1/compaction/config/datasources | Get the compaction configs for all datasources | Read datasource |
GET | /druid/indexer/v1/compaction/config/datasources/{dataSource} | Get the compaction config of a single datasource | Read datasource |
POST | /druid/indexer/v1/compaction/config/datasources/{dataSource} | Update the compaction config of a single datasource | Write datasource |
GET | /druid/indexer/v1/compaction/config/datasources/{dataSource}/history | Get the compaction config history of a single datasource | Read datasource |
GET | /druid/indexer/v1/compaction/status/datasources | Get the compaction status of all datasources | Read datasource |
GET | /druid/indexer/v1/compaction/status/datasources/{dataSource} | Get the compaction status of a single datasource | Read datasource |
(#17834) (id: 66269)
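For example, a sketch of fetching the cluster-level compaction config; the Overlord address is illustrative:

# Overlord address is illustrative
curl "http://OVERLORD:8090/druid/indexer/v1/compaction/config/cluster"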
Faster segment metadata operations
Enable segment metadata caching on the Overlord with the runtime property `druid.manager.segments.useCache`. This feature is off by default.
You can set the property to the following values:
- `never`: Cache is disabled (default).
- `always`: Reads are always done from the cache. Service start-up is blocked until the cache has synced with the metadata store at least once. Transactions are blocked until the cache has synced with the metadata store at least once after becoming leader.
- `ifSynced`: Reads are done from the cache only if it has already synced with the metadata store. Unlike the `always` setting, this mode doesn't block service start-up or transactions.
As part of this change, additional metrics have been introduced:
segment/metadataCache/sync/time
segment/metadataCache/transactions/readOnly
segment/metadataCache/transactions/writeOnly
segment/metadataCache/transactions/readWrite
(#17653) (#17824) (#17785) (id: 66110) (id: 65819)
Compaction supervisors (experimental)
You now configure compaction supervisors with the following Coordinator compaction config:
- `useSupervisors`: Enable compaction to run as a supervisor on the Overlord instead of as a Coordinator duty.
- `engine`: Choose between `native` and `msq` to run compaction tasks. The `msq` setting uses the MSQ task engine and can be used only when `useSupervisors` is true.
Previously, you used runtime properties for the Overlord. Support for these has been removed.
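A sketch of the relevant fields in the compaction config; the surrounding config shape is omitted and the field placement is an assumption:

{
  "useSupervisors": true,
  "engine": "msq"
}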
(#17782) (id: 65912)
Automatic kill task interval
The Coordinator can optionally issue kill tasks to clean up unused segments. Starting with this release, an individual kill task is limited to processing at most 30 days' worth of segments by default. This improves the performance of individual kill tasks.
You can restore the previous behavior (no limit on the interval per kill task) by setting `druid.coordinator.kill.maxInterval = P0D`.
(#17680) (id: 63023)
Pivot changes
- Fixed an issue where limits configured on time splits were not correctly applied when a data cube's timezone is set to Etc/UTC (id: 65568)
Druid changes
- Changed JavaScript-backed selector strategies to use GraalJS (#17843) (id: 66312)
- Changed how the Druid console exports data so that it's normalized to how Druid exports data. Additionally, you can export results as Markdown tables (#17845) (id: 66228)
- Changed the cap for `balancerComputeThreads` to 100 (#17855) (id: 66275)
- Improved cleanup of unused datasources from the segment metadata cache (#17853) (id: 66272)
- Improved projections so that they work with compaction (id: 65174) (#17803) (id: 66310)
- Updated the `netty4` version (#17755) (id: 65657)
- Updated `async-http-client` to 3.0.1 (#17646) (id: 65304)
- Updated `parquet-avro` (#17874) (id: 66328)
- Row policies are now enforced for Task SQL and Dart SQL endpoints (#17666) (id: 65091)
Imply Manager changes
- Added support for MySQL 8.4's default authentication method (id: 65921)
- Imply Enterprise now requires Python version 3.8 or later (id: 65824)
- Imply Enterprise on Kubernetes now requires Kubernetes 1.25 or later (id: 65709)
- You can now use Java 21 for Imply Enterprise (id: 65618)
- Fixed an issue where you couldn't supply a custom version (id: 66212)
- Fixed an issue where Imply Manager could get stuck when part of an upgrade timed out (id: 66372)
- Upgraded ZooKeeper client to 3.8.4 (id: 65594)
Changes in 2025.01.1
March 19, 2025
Druid changes
- Fixed a CVE. See advisory (id: 65594)
Changes in 2025.01
Druid highlights
SQL behavior
Starting in 2025.01 STS, you can no longer use non-ANSI-SQL-compliant behavior for Booleans, nulls, and two-valued logic.
Make sure you update your queries to account for this behavior. For more information on how to update your queries, see the SQL compliant mode migration guide.
Support for the configs that enabled the legacy behavior has been removed. They no longer affect your query results. If these configs are set to the legacy behavior, Druid services fail to start.
Remove the following configs:
druid.generic.useDefaultValueForNull=true
druid.expressions.useStrictBooleans=false
druid.generic.useThreeValueLogicForNativeFilters=false
If you want to continue to get the same results, you must update your queries; otherwise your results will be incorrect after you upgrade.
Join hints in MSQ task engine queries
Druid now supports hints for SQL JOIN queries that use the MSQ task engine. Queries can provide hints for the JOIN type at a per-join level. Join hints recursively affect subqueries.
select /*+ sort_merge */ w1.cityName, w2.countryName
from
(
  select /*+ broadcast */ w3.cityName AS cityName, w4.countryName AS countryName
  from wikipedia w3
  LEFT JOIN "wikipedia-set2" w4 ON w3.regionName = w4.regionName
) w1
JOIN "wikipedia-set1" w2 ON w1.cityName = w2.cityName
where w1.cityName = 'New York';
(#17406) (id: 62998)
Front-coded dictionaries
You can specify that Druid use the front-coded dictionaries feature during segment creation. Once Druid starts using segments with front-coded dictionaries, you can't downgrade to a version where Druid doesn't support them. For more information, see Migration guide: front-coded dictionaries.
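A sketch of enabling the feature through the `indexSpec` in an ingestion spec's tuning config, assuming the `stringDictionaryEncoding` option; the `bucketSize` value is illustrative:

"indexSpec": {
  "stringDictionaryEncoding": {
    "type": "frontCoded",
    "bucketSize": 4
  }
}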
Concurrent append and replace
Concurrent append and replace is now generally available.
Deprecation updates
- CentOS support for Imply Enterprise: if you are using CentOS, migrate to a supported operating system: RHEL 7.x and 8.x or Ubuntu 18.04 and 20.04. Support is planned to end in April 2025.
- `ioConfig.inputSource.type.azure` storage schema: update your ingestion specs to use the `azureStorage` storage schema, which provides more capabilities. Support is planned to end in 2026.01 STS.
- ZooKeeper-based task discovery: this hasn't been the default method for task discovery for several releases. Support is planned to end in 2026.01 STS.
For features that have reached end of support in 2025.01 STS, see End of support.
For a more complete list of deprecations including upcoming ones, see Deprecations.
Segment management APIs
APIs for marking segments as used or unused have been moved from the Coordinator to the Overlord service:
- Mark all (non-overshadowed) segments of a datasource as used: `POST /druid/indexer/v1/datasources/{dataSourceName}`
- Mark all segments of a datasource as unused: `DELETE /druid/indexer/v1/datasources/{dataSourceName}`
- Mark multiple (non-overshadowed) segments as used: `POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed`
- Mark multiple segments as unused: `POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused`
- Mark a single segment as used: `POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
- Mark a single segment as unused: `DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
(#17545) (id: 64884)
Improve metadata IO
You can now reduce metadata I/O during segment allocation by using the following Overlord runtime property: `druid.indexer.tasklock.batchAllocationReduceMetadataIO`. This property is set to `true` by default. When set to `true`, the Overlord only fetches the necessary segment payloads during segment allocation.
(#17496) (id: 64772)
New metrics for GroupBy queries
When merging groupBy results, the `GroupByStatsMonitor` now emits the following metrics:
- `mergeBuffer/used`: Number of merge buffers used.
- `mergeBuffer/acquisitionTimeNs`: Total time required to acquire merge buffers.
- `mergeBuffer/acquisition`: Number of queries that acquired a batch of merge buffers.
- `groupBy/spilledQueries`: Number of queries that spilled onto disk.
- `groupBy/spilledBytes`: Number of bytes spilled to disk.
- `groupBy/mergeDictionarySize`: Size of the merging dictionary.
(#17360) (id: 62147)
Auto-compaction using compaction supervisors (alpha)
You can run automatic compaction using compaction supervisors on the Overlord rather than Coordinator duties. Compaction supervisors provide the following benefits over Coordinator duties:
- Can use the supervisor framework to get information about the auto-compaction, such as status or state
- Can more easily suspend or resume compaction for a datasource
- Can use either the native compaction engine or the MSQ task engine
- Are more reactive, submitting tasks as soon as a compaction slot is available
- Track compaction task status to avoid repeatedly recompacting an interval

For more information, see Auto-compaction using compaction supervisors.
(#16291)
Projections (alpha)
Datasources now support projections as an alpha feature. Projections can improve query performance by pre-aggregating data. They are similar to materialized views but are built into a segment and are automatically used when a query matches the projection.
To use a projection, you must ingest a datasource using JSON-based ingestion. Include a `projections` block in your ingestion spec with the following fields: `type`, `name`, `virtualColumns`, `groupingColumns`, and `aggregators`. Note that you can have projections that only include aggregators and no grouping columns, such as when you want to create a projection for the sum of certain columns; see the sketch after this paragraph.
Then, use the following query context flags when running either a native query or SQL query:
- `useProjection`: Accepts a specific projection name and instructs the query engine that it must use that projection; the query fails if the projection doesn't match the query.
- `forceProjections`: Accepts true or false and instructs the query engine that it must use a projection; the query fails if it can't find a matching projection.
- `noProjections`: Accepts true or false and instructs the query engines to not use any projections.
Note that auto-compaction does not preserve projections.
For more information, see the open source Druid issue for projections.
(#17214) (id: 64172) (#17484) (id: 64763)
Realtime query processing for multi-value strings
Realtime query processing no longer considers all strings as multi-value strings during expression processing, fixing a number of bugs and unexpected failures. This should also improve realtime query performance of expressions on string columns.
This change impacts topN queries for realtime segments where rows of data are implicitly null, such as from a property missing from a JSON object.
Before this change, these rows were handled as `[]` instead of null, leading to inconsistency between processing realtime segments and published segments. When processing realtime segments, the value was treated as `[]`, which topN ignores. After publishing, the value became null, which topN doesn't ignore. The same query could therefore have different results before and after segments were persisted.
After this change, the topN engine now treats [] as null when processing realtime segments, which is consistent with published segments.
This change doesn't impact actual multi-value string columns, regardless of whether they're realtime.
(#17386) (id: 63771) (id: 64672)
Druid changes
- Added support to the web console for the `expectedLoadTimeMillis` metric (#17359) (id: 64208)
- Added support for aggregate-only projections (#17484) (id: 64763)
- Added support for UNION in decoupled planning (#17354) (id: 64402)
- Added `ingest/notices/queueSize`, `ingest/pause/time`, and `ingest/notices/time` to the statsd emitter (#17487) (id: 64679) (#17468) (id: 64601)
- Added `druid.expressions.allowVectorizeFallback`, which defaults to false (#17248) (id: 64173)
- Added `stageId` and `workerNumber` to the MSQ task engine's processing thread names (#17324) (id: 64147)
- Added support for a high-precision ST_GEOHASH function that takes the complex column `geo`, which contains longitude and latitude in that order, and returns a hash (id: 63437)
- Added the config `druid.server.http.showDetailedJsonMappingError`, which is similar to `druid.server.http.showDetailedJettyError`, to configure the detail level for JSON mapping error messages (#16821) (id: 62645)
- Changed real-time segment metrics so that they're emitted for each Sink instead of for each FireHydrant. This is a return to the emission behavior prior to the real-time query performance improvements made in 2024.02 (#17170) (id: 61871)
- You no longer have to configure a temporary storage directory on the Middle Manager for durable storage or exports. If it isn't configured, Druid uses the task directory (#17015) (id: 60547)
- Improved the column order for scan queries so that it aligns with the desired signature (#17463) (id: 64441)
- Improved the Query view in the web console to support resizable side panels (#17387) (id: 64404)
- Improved how the Overlord service determines the leader and hands off leadership (#17415) (id: 64312)
- Improved Middle Manager-less ingestion so that the Kubernetes task runner exposes the `getMaximumCapacity` field (#17107) (id: 64168)
- Improved the styling in the web console for the stage timing bar (#17295) (id: 64157)
- Improved autoscaling for supervisors so that scaling doesn't happen when partitions are less than `minTaskCount` (#17335) (id: 64145)
- Improved how the Explore view in the web console handles defaults (#17252) (id: 64020)
- Improved the MSQ task engine to account for situations where there are two simultaneous statistics collectors (#17216) (id: 63987)
- Improved the lookups extension to support iterating over fetched data (#17212) (id: 63939)
- Improved logging to include `taskId` in the handoff notifier thread (#17185) (id: 63882)
- Improved window functions that use the MSQ task engine so that the processor can send any number of rows and columns to the operator without having to partition by column (#17038) (id: 63249)
- Fixed an issue with PostgreSQL metadata storage caused by table name casing issues (#17351) (id: 64128)
- Fixed an issue with supervisor autoscaling where the scale action could get skipped when the supervisor could be publishing or when `minTriggerScaleActionFrequencyMillis` hadn't elapsed (#17356) (id: 64226)
- Fixed an issue in the web console where the progress indication for table input gets stuck at 0 (#17334) (id: 64209)
- Fixed an issue where batch segment allocation fails when there are replicas (#17262) (id: 64169)
- Fixed an issue when grouping on a string array and sorting by it (#17183) (id: 64166)
- Fixed an issue where duplicate compaction tasks might get launched (#17287) (id: 64154)
- Fixed a race condition for failed queries with the MSQ task engine (#17313) (id: 64153)
- Fixed several issues with the Explore view in the web console (#17234) (id: 64005) (#17240) (id: 64010) (#17225) (id: 63985)
- Fixed an issue with querying realtime segments when using concurrent append and replace (#17157) (id: 63852)
- Fixed an issue where Indexer tasks get stuck in a publishing state and must either get killed or hit the timeout (#17146) (id: 63800)
- Removed the unused Coordinator dynamic configs `mergeSegmentsLimit` and `mergeBytesLimit` (#17384) (id: 64267)
Imply Manager changes
- Fixed a problem where updated Helm values were sometimes incorrectly displayed (id: 64648)
Pivot changes
- The async download process now shows more information, including the number of rows processed, while the download is in progress (id: 60947)
- The time series visualization now supports the TIMESERIES function (id: 63901)
- In the records visualization you can now use the Nulls summary pill drop-down to turn off displaying the number of hidden null values (id: 64197)
- You can now set a minimum auto-refresh rate when creating or editing a dashboard (id: 64032)
- You can now preview the time range when adding a relative comparison to a visualization (id: 63944)
- You can now specify the date and time to start evaluating alerts (id: 40669)
- In the general options for a dashboard you can now set a default auto-refresh rate (id: 39798)
- Fixed an issue with editing a report after removing a dimension used as a report filter (id: 63475)
Upgrade and downgrade notes
In addition to the upgrade and downgrade notes, review the deprecations page regularly to see if any features you use are impacted.
Minimum supported version for rolling upgrade
See "Supported upgrade paths" in the Lifecycle Policy documentation.
Segment metadata cache configs
If you need to downgrade to a version where Druid doesn't support the segment metadata cache, you must set the `druid.manager.segments.useCache` config to false or remove it prior to the downgrade.
This feature was introduced in 2025.07 STS.
Kubernetes version
Starting in 2025.04 STS, Imply Enterprise on Kubernetes requires Kubernetes 1.25 or later. (id: 65709)
Python version
Starting in 2025.04 STS, Imply Manager now requires Python 3.8 or later. (id: 65824)
useMaxMemoryEstimates config
Starting in 2025.04 STS, `useMaxMemoryEstimates` is set to false for MSQ task engine tasks. Additionally, the property has been deprecated and will be removed in a future release. Setting this property to false allows for better on-heap memory estimation.
(#17792) (id: 66290)
Default string array ingestion
Starting in 2024.10 STS, SQL-based ingestion with the MSQ task engine defaults to array typed columns instead of multi-value dimensions (MVDs). You must adjust your queries to either use array typed columns or explicitly specify your arrays as MVDs in your ingestion query. For more information, refer to the product feature update that Imply shared.
Front-coded dictionaries
Once Druid starts using segments with front-coded dictionaries, you can't downgrade to a version where Druid doesn't support front-coded dictionaries. For more information, see Migration guide: front-coded dictionaries.
If you're already using this feature, you don't need to take any action.
Automatic compaction
Imply preserves your automatic compaction configurations upon upgrade.
Segment sorting
This feature is in alpha and not backwards compatible with versions earlier than 2024.09. If you enable it, you can't downgrade to a version earlier than 2024.09 STS.
You can now configure Druid to sort segments by something other than time first.
For SQL-based ingestion, include the query context parameter `forceSegmentSortByTime: false`. For JSON-based batch and streaming ingestion, include `forceSegmentSortByTime: false` in the `dimensionsSpec` block, as in the sketch that follows.
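A sketch of a `dimensionsSpec` for JSON-based ingestion; the dimension names and ordering are illustrative:

"dimensionsSpec": {
  "forceSegmentSortByTime": false,
  "dimensions": ["channel", "__time", "page"]
}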
(#16849) (id: 63215)
Changed low-level APIs for extensions
This information is meant for users who write their own Druid extensions and doesn't impact anyone who only uses extensions supported by Imply.
As part of changes starting in 2024.09 to improve Druid, including the changes described in Segment sorting for Druid users, some low-level APIs used by some extensions may no longer be compatible with any existing custom extensions you have. For more information about which interfaces are impacted, see the following pull requests:
Compression for complex metric columns
If you use the `IndexSpec` option `complexMetricCompression` to compress complex metric columns, you can't downgrade to a version that doesn't support compressing those columns.
This feature was introduced in 2024.09 STS.
(#16863) (id: 63277)
Changes to the native equals filter
Beginning in 2024.01 STS, the native query `equals` filter on mixed type 'auto' columns that contain arrays must be filtered as their presenting type. If any rows are arrays (the segment metadata and `information_schema` report the type as some array type), then native queries must also filter as if they are some array type. This doesn't impact SQL, which already has this limitation due to how the type presents itself. It only impacts mixed type 'auto' columns that contain both scalars and arrays.
Imply Hybrid MySQL upgrade
Imply Hybrid previously used MySQL 5.7 by default. New clusters will use MySQL 8 by default. If you have an existing cluster, you'll need to upgrade the MySQL version since the Amazon RDS support end date for this version is scheduled for February 29, 2024. Although you can opt for extended support from Amazon, you can use Imply Hybrid Manager to upgrade your MySQL instance to MySQL 8.
The upgrade should have little to no impact on your queries but does require a reconnection to the database. The process can take an hour and services will reconnect to the database during the upgrade.
In preparation for the upgrade, you need to grant certain permissions to the Cloud Manager IAM role by applying the following policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"rds:CreateBlueGreenDeployment",
"rds:PromoteReadReplica"
],
"Resource": [
"arn:aws:rds:*:*:pg:*",
"arn:aws:rds:*:*:deployment:*",
"arn:aws:rds:*:*:*:imply-*"
],
"Effect": "Allow"
},
{
"Action": [
"rds:AddTagsToResource",
"rds:CreateDBInstanceReadReplica",
"rds:DeleteBlueGreenDeployment",
"rds:DescribeBlueGreenDeployments",
"rds:SwitchoverBlueGreenDeployment"
],
"Resource": "*",
"Effect": "Allow"
}
]
}
After you grant the permissions, click Apply changes for Amazon RDS MySQL Update on the Overview page of Imply Hybrid Manager.
Three-valued logic
The legacy two-valued logic and the corresponding properties that supported it were removed in the January 2025 STS and will be removed in the January 2026 LTS. The SQL-compatible three-valued logic becomes the only option.
Update your queries and downstream apps prior to these releases.
SQL standard three-valued logic introduced in 2023.11 primarily affects filters using the logical NOT operation on columns with NULL values. This applies to both query and ingestion time filtering.
The following example illustrates the old behavior and the new behavior:
Consider the filter `x <> 'some value'`, which selects rows where `x` is not equal to `'some value'`.
Previously, Druid included all rows not matching `x = 'some value'`, including null values.
The new behavior follows the SQL standard and matches only rows that have a value and whose value is not equal to `'some value'`. Null values are excluded from the results.
SQL compatibility
The legacy behavior that isn't compatible with standard ANSI SQL, and the corresponding properties, were removed in the January 2025 STS and January 2026 LTS releases. The SQL-compatible behavior introduced in the 2023.09 STS is now the only available behavior.
Update your queries and any downstream apps prior to these releases.
Starting with 2023.09 STS, the default way Druid treats nulls and booleans changed.
For nulls, Druid now differentiates between an empty string (`''`) and a record with no data, as well as between an empty numerical record and `0`.
For booleans, Druid now strictly uses `1` (true) or `0` (false). Previously, true and false could be represented either as `true` and `false` or as `1` and `0`, respectively. In addition, Druid now returns a null value for Boolean comparisons like `True && NULL`.
The following table illustrates some example scenarios and the impact of the changes:
Query | 2023.08 STS and earlier | 2023.09 STS and later |
---|---|---|
Query empty string | Empty string ('' ) or null | Empty string ('' ) |
Query null string | Null or empty | Null |
COUNT(*) | All rows, including nulls | All rows, including nulls |
COUNT(column) | All rows excluding empty strings | All rows including empty strings but excluding nulls |
Expression 100 && 11 | 11 | 1 |
Expression 100 || 11 | 100 | 1 |
Null FLOAT/DOUBLE column | 0.0 | Null |
Null LONG column | 0 | Null |
Null __time column | 0, meaning 1970-01-01 00:00:00 UTC | 1970-01-01 00:00:00 UTC |
Null MVD column | '' | Null |
ARRAY | Null | Null |
COMPLEX | none | Null |
Update your queries
Before you upgrade, update your queries to account for the following changed behavior:
NULL filters
If your queries use NULL in the filter condition to match both nulls and empty strings, add an explicit filter clause for empty strings. For example, update `s IS NULL` to `s IS NULL OR s = ''`.
COUNT functions
`COUNT(column)` now counts empty strings. If you want to continue excluding empty strings from the count, replace `COUNT(column)` with `COUNT(column) FILTER(WHERE column <> '')`.
GroupBy queries
GroupBy queries on columns containing null values can now have additional entries as nulls can co-exist with empty strings.
Avatica JDBC driver upgrade
The Avatica JDBC driver is not packaged with Druid; its upgrade is separate from any upgrades to Imply.
If you notice intermittent query failures after upgrading your Avatica JDBC driver to version 1.21.0 or later, you may need to set the `transparent_reconnection` property.
Parameter execution changes for Kafka
When using the built-in FileConfigProvider
for Kafka, interpolations are now intercepted by the JsonConfigurator
instead of being passed down to the Kafka provider. This breaks existing deployments.
For more information, see KIP-297 in the Kafka project and #13023.
Deprecation notices
For a more complete list of deprecations and their planned removal dates, see Deprecations.
Some segment loading configs deprecated
The following segment-related configs are now deprecated and will be removed in future releases:
- `replicationThrottleLimit`
- `useRoundRobinSegmentAssignment`
- `maxNonPrimaryReplicantsToLoad`
- `decommissioningMaxPercentOfMaxSegmentsToMove`

Use `smartSegmentLoading` mode instead, which calculates values for these variables automatically.
ioConfig.inputSource.type.azure storage schema
Update your ingestion specs to use the `azureStorage` storage schema, which provides more capabilities.
ZooKeeper-based task discovery
Use HTTP-based task discovery instead, which has been the default since 2022.
End of support
CentOS support
If you are using CentOS, migrate to a supported operating system: RHEL 7.x and 8.x or Ubuntu 18.04 and 20.04. Support for CentOS ended April 2025.
Two-valued logic
Druid's legacy two-valued logic for native filters and the properties for maintaining that behavior were removed in the January 2025 STS and will be removed in the January 2026 LTS release.
The ANSI-SQL compliant three-valued logic is the only supported behavior after these releases. This SQL-compatible behavior became the default in the Imply 2023.11 STS and January 2024 LTS releases.
Update your queries and downstream apps and remove the corresponding configs.
For more information, see three-valued logic.
Properties for legacy Druid SQL behavior
Druid's legacy behavior for Booleans and NULLs and the corresponding properties were removed in the January 2025 STS and will be removed in the January 2026 LTS release.
The ANSI-SQL compliant treatment of Booleans and null values is the only supported behavior after these releases. This SQL-compatible behavior became the default in the Imply 2023.11 STS and January 2024 LTS releases.
Update your queries and downstream apps and remove the corresponding configs.
For more information, see SQL compatibility.
druid.azure.endpointSuffix
The config has been removed. Update any references to use `druid.azure.storageAccountEndpointSuffix` instead.
SysMonitor support
Switch to `OshiSysMonitor`, as `SysMonitor` has been removed.
Asynchronous SQL download
The async downloads feature has been removed. This refers to an older version of async SQL download that has been replaced with a new version with the same name. For more information, see Download data.
ZooKeeper segment serving processes
ZooKeeper-based segment loading has been disabled since 2024.06 STS.
In 2024.08 STS, segment serving processes such as Peons, Historicals, and Indexers no longer create the ZooKeeper `loadQueuePath`. The property `druid.zk.paths.loadQueuePath` is also ignored if it's still in your configs.
If you are still using ZooKeeper-based segment loading and want to upgrade to a more recent release where only HTTP-based segment loading is supported, switch to HTTP-based segment loading before upgrading. For more information, see Segment management.
(#16816) (id: 62629)
Java 8
Java 8 for Druid is at end of support. We recommend you upgrade to Java 17.
JSON columns v3 and v4
JSON columns v3 and v4 are at end of support. Only JSON column v5 is supported; it has been the default for several releases. While Druid can still read v3 and v4 columns, it can only create v5 columns now.
After upgrading to a version with support for a higher JSON version, you cannot downgrade to an earlier version. Imply's distribution of Apache Druid® has been on JSON v5 since the 2024.01 LTS and 2023.09 STS.
Segment loading rules
Smart segment loading automatically calculates the optimal values for settings you previously had to set manually. As a result, the following settings are automatically ignored: `maxSegmentsInNodeLoadingQueue`, `maxSegmentsToMove`, `replicantLifetime`, and `balancerComputeThreads`. Additionally, the `cachingCost` balancer strategy is no longer supported.