Imply Enterprise and Hybrid release notes
Imply releases include Imply Manager, Pivot, Clarity, and Imply's distribution of Apache Druid®. Imply delivers improvements more quickly than open source because Imply's distribution of Apache Druid uses the primary branch of Apache Druid. This means that it isn't an exact match to any specific open source release. Any open source version numbers mentioned in the Imply documentation don't pertain to Imply's distribution of Apache Druid.
The following release notes provide information on features, improvements, and bug fixes up to Imply STS release 2025.01. Read all release notes carefully, especially the Upgrade and downgrade notes, before upgrading. Additionally, review the deprecations page regularly to see if any features you use are impacted.
For information on the LTS release, see the LTS release notes.
If you are upgrading by more than one version, read the intermediate release notes too.
The following end-of-support dates apply in 2025:
- On January 26, 2025, Imply 2023.01 LTS reaches EOL. This means that the 2023.01 LTS release line will no longer receive any patches, including security updates. Imply recommends that you upgrade to the latest LTS or STS release.
- On January 31, 2025, Imply 2024.01 LTS ends general support status and will be eligible only for security support.
For more information, see Lifecycle Policy.
See Previous versions for information on older releases.
Imply evaluation
New to Imply? Get started with an Imply Hybrid (formerly Imply Cloud) Free Trial or start a self-hosted trial at Get started with Imply!
With Imply Hybrid, the Imply team manages your clusters in AWS, while you control the infrastructure and own the data. With self-hosted Imply, you can run Imply on *NIX systems in your own environment or cloud provider.
Imply Enterprise
If you run Imply Enterprise, see Imply product releases & downloads to access the Imply Enterprise distribution. When prompted, log on to Zendesk with your Imply customer credentials.
Changes in 2025.01
Druid highlights
SQL behavior
Starting in 2025.01 STS, you can no longer use the legacy, non-ANSI-SQL-compliant behavior for Booleans, nulls, and two-valued logic.
Make sure you update your queries to account for this behavior. For more information on how to update your queries, see the SQL compliant mode migration guide.
Support for the configs that enabled the legacy behavior has been removed, and they no longer affect your query results. If these configs are set to the legacy behavior, Druid services fail to start.
Remove the following configs:
- `druid.generic.useDefaultValueForNull=true`
- `druid.expressions.useStrictBooleans=false`
- `druid.generic.useThreeValueLogicForNativeFilters=false`
If you want to continue to get the same results, you must update your queries; otherwise, your results will be incorrect after you upgrade.
Join hints in MSQ task engine queries
Druid now supports hints for SQL JOIN queries that use the MSQ task engine. This lets queries specify the JOIN type that should be used at a per-join level. Join hints recursively affect subqueries.
select /*+ sort_merge */ w1.cityName, w2.countryName
from
(
select /*+ broadcast */ w3.cityName AS cityName, w4.countryName AS countryName from wikipedia w3 LEFT JOIN wikipedia-set2 w4 ON w3.regionName = w4.regionName
) w1
JOIN wikipedia-set1 w2 ON w1.cityName = w2.cityName
where w1.cityName='New York';
(#17406) (id: 62998)
Front-coded dictionaries
You can specify that Druid use the front-coded dictionaries feature during segment creation. Once Druid starts using segments with front-coded dictionaries, you can't downgrade to a version where Druid doesn't support them. For more information, see Migration guide: front-coded dictionaries.
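As a sketch, front coding is enabled through the `indexSpec` in the tuning config of an ingestion spec. The exact fields shown here (notably `bucketSize`) are illustrative assumptions; check them against the migration guide before use:

```json
{
  "tuningConfig": {
    "indexSpec": {
      "stringDictionaryEncoding": {
        "type": "frontCoded",
        "bucketSize": 4
      }
    }
  }
}
```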
Concurrent append and replace
Concurrent append and replace is now generally available.
Deprecation updates
- CentOS support for Imply Enterprise: if you are using CentOS, migrate to a supported operating system: RHEL 7.x and 8.x or Ubuntu 18.04 and 20.04. Support is planned to end in April 2025.
- `ioConfig.inputSource.type.azure` storage schema: update your ingestion specs to use the `azureStorage` storage schema, which provides more capabilities. Support is planned to end in 2026.01 STS.
- ZooKeeper-based task discovery: ZooKeeper has not been the default method for task discovery for several releases. Support is planned to end in 2026.01 STS.
For features that have reached end of support in 2025.01 STS, see End of support.
For a more complete list of deprecations including upcoming ones, see Deprecations.
Segment management APIs
APIs for marking segments as used or unused have been moved from the Coordinator to the Overlord service:
- Mark all (non-overshadowed) segments of a datasource as used: `POST /druid/indexer/v1/datasources/{dataSourceName}`
- Mark all segments of a datasource as unused: `DELETE /druid/indexer/v1/datasources/{dataSourceName}`
- Mark multiple segments as used: `POST /druid/indexer/v1/datasources/{dataSourceName}/markUsed`
- Mark multiple (non-overshadowed) segments as unused: `POST /druid/indexer/v1/datasources/{dataSourceName}/markUnused`
- Mark a single segment as used: `POST /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
- Mark a single segment as unused: `DELETE /druid/indexer/v1/datasources/{dataSourceName}/segments/{segmentId}`
(#17545) (id: 64884)
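A client targeting the relocated endpoints might assemble its requests like this sketch. The host, port, and datasource name are hypothetical, and for the `markUnused` endpoint the segment IDs or interval typically go in the request body:

```python
# Sketch: build requests for the Overlord-based segment management APIs.
# The Overlord address and datasource name below are illustrative only.
OVERLORD = "http://localhost:8090"

def mark_segments_unused(datasource: str) -> tuple[str, str]:
    # Mark a specific set of segments as unused; segment IDs or an
    # interval are sent in the POST body.
    return ("POST", f"{OVERLORD}/druid/indexer/v1/datasources/{datasource}/markUnused")

def mark_segment_used(datasource: str, segment_id: str) -> tuple[str, str]:
    # Mark a single segment as used.
    return ("POST", f"{OVERLORD}/druid/indexer/v1/datasources/{datasource}/segments/{segment_id}")

method, url = mark_segments_unused("wikipedia")
print(method, url)  # POST http://localhost:8090/druid/indexer/v1/datasources/wikipedia/markUnused
```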
Improved metadata I/O
You can now reduce metadata I/O during segment allocation by using the Overlord runtime property `druid.indexer.tasklock.batchAllocationReduceMetadataIO`. This property is set to `true` by default. When set to `true`, the Overlord fetches only the necessary segment payloads during segment allocation.
(#17496) (id: 64772)
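If you need to opt out and restore the previous allocation behavior, the property can be set explicitly, as in this sketch of an Overlord runtime properties entry:

```properties
# Overlord runtime.properties (sketch): disable the reduced
# metadata I/O during batch segment allocation (enabled by default).
druid.indexer.tasklock.batchAllocationReduceMetadataIO=false
```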
New metrics for GroupBy queries
When merging groupBy results, the `GroupByStatsMonitor` now emits the following metrics:
- `mergeBuffer/used`: number of merge buffers used
- `mergeBuffer/acquisitionTimeNs`: total time required to acquire merge buffers
- `mergeBuffer/acquisition`: number of queries that acquired a batch of merge buffers
- `groupBy/spilledQueries`: number of queries that spilled onto the disk
- `groupBy/spilledBytes`: number of bytes spilled to the disk
- `groupBy/mergeDictionarySize`: size of the merging dictionary
(#17360) (id: 62147)
Auto-compaction using compaction supervisors (alpha)
You can run automatic compaction using compaction supervisors on the Overlord rather than as Coordinator duties. Compaction supervisors provide the following benefits over Coordinator duties:
- You can use the supervisor framework to get information about the auto-compaction, such as status or state
- You can more easily suspend or resume compaction for a datasource
- You can use either the native compaction engine or the MSQ task engine
- Compaction is more reactive and submits tasks as soon as a compaction slot is available
- Compaction task status is tracked to avoid re-compacting an interval repeatedly
For more information, see Auto-compaction using compaction supervisors.
(#16291)
Projections (alpha)
Datasources now support projections as an alpha feature. Projections can improve query performance by pre-aggregating data. They are similar to materialized views but are built into a segment and are automatically used when a query fits the projection.
To use a projection, you must ingest a datasource using JSON-based ingestion. Include a `projections` block in your ingestion spec with the following fields: `type`, `name`, `virtualColumns`, `groupingColumns`, and `aggregators`. Note that a projection can include only aggregators and no grouping columns, such as when you want to create a projection for the sum of certain columns.
Then, use the following query context flags when running either a native query or SQL query:
- `useProjection`: accepts a specific projection name and instructs the query engine that it must use that projection; the query fails if the projection does not match the query
- `forceProjections`: accepts true or false and instructs the query engine that it must use a projection; the query fails if the engine cannot find a matching projection
- `noProjections`: accepts true or false and instructs the query engine to not use any projections
Note that auto-compaction does not preserve projections.
For more information, see the open source Druid issue for projections.
(#17214) (id: 64172) (#17484) (id: 64763)
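A `projections` block in a JSON-based ingestion spec might look like the following sketch. The column names, aggregator, and `type` value are illustrative assumptions; verify the exact schema against the open source Druid issue for projections:

```json
{
  "projections": [
    {
      "type": "aggregate",
      "name": "city_totals",
      "virtualColumns": [],
      "groupingColumns": [
        { "type": "string", "name": "cityName" }
      ],
      "aggregators": [
        { "type": "longSum", "name": "sum_added", "fieldName": "added" }
      ]
    }
  ]
}
```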
Realtime query processing for multi-value strings
Realtime query processing no longer considers all strings as multi-value strings during expression processing, fixing a number of bugs and unexpected failures. This should also improve realtime query performance of expressions on string columns.
This change impacts topN queries for realtime segments where rows of data are implicitly null, such as from a property missing from a JSON object.
Before this change, these were handled as [] instead of null, leading to inconsistency between processing realtime segments and published segments. When processing segments, the value was treated as [], which topN ignores. After publishing, the value became null, which topN does not ignore. The same query could have different results before and after being persisted.
After this change, the topN engine now treats [] as null when processing realtime segments, which is consistent with published segments.
This change doesn't impact actual multi-value string columns, regardless of whether they're realtime.
(#17386) (id: 63771) (id: 64672)
Druid changes
- Added support to the web console for the `expectedLoadTimeMillis` metric (#17359) (id: 64208)
- Added support for aggregate only projections (#17484) (id: 64763)
- Added support for UNION in decoupled planning (#17354) (id: 64402)
- Added `ingest/notices/queueSize`, `ingest/pause/time`, and `ingest/notices/time` to the statsd emitter (#17487) (id: 64679) (#17468) (id: 64601)
- Added `druid.expressions.allowVectorizeFallback`, which defaults to false (#17248) (id: 64173)
- Added `stageId` and `workerNumber` to the MSQ task engine's processing thread names (#17324) (id: 64147)
- Added support for a high-precision ST_GEOHASH function that takes the complex column `geo`, which contains longitude and latitude in that order, and returns a hash (id: 63437)
- Added the config `druid.server.http.showDetailedJsonMappingError`, which is similar to `druid.server.http.showDetailedJettyError`, to configure the detail level for JSON mapping error messages (#16821) (id: 62645)
- Changed real-time segment metrics so that they are emitted for each Sink instead of for each FireHydrant. This is a return to the emission behavior prior to the real-time query performance improvements made in 2024.02 (#17170) (id: 61871)
- You no longer have to configure a temporary storage directory on the Middle Manager for durable storage or exports. If it isn't configured, Druid uses the task directory (#17015) (id: 60547)
- Improved the column order for scan queries so that it aligns with the desired signature (#17463) (id: 64441)
- Improved the Query view in the web console to support resizable side panels (#17387) (id: 64404)
- Improved how the Overlord service determines the leader and hands off leadership (#17415) (id: 64312)
- Improved Middle Manager-less ingestion so that the Kubernetes task runner exposes the `getMaximumCapacity` field (#17107) (id: 64168)
- Improved the styling in the web console for the stage timing bar (#17295) (id: 64157)
- Improved autoscaling for supervisors so that scaling doesn't happen when partitions are fewer than `minTaskCount` (#17335) (id: 64145)
- Improved how the Explore view in the web console handles defaults (#17252) (id: 64020)
- Improved the MSQ task engine to account for situations where there are two simultaneous statistics collectors (#17216) (id: 63987)
- Improved the lookups extension to support iterating over fetched data (#17212) (id: 63939)
- Improved logging to include `taskId` in the handoff notifier thread (#17185) (id: 63882)
- Improved window functions that use the MSQ task engine so that the processor can send any number of rows and columns to the operator without having to partition by column (#17038) (id: 63249)
- Fixed an issue with PostgreSQL metadata storage caused by table name casing issues (#17351) (id: 64128)
- Fixed an issue with supervisor autoscaling where the scale action could get skipped while the supervisor is publishing or when `minTriggerScaleActionFrequencyMillis` hasn't elapsed (#17356) (id: 64226)
- Fixed an issue in the web console where the progress indication for table input gets stuck at 0 (#17334) (id: 64209)
- Fixed an issue where batch segment allocation fails when there are replicas (#17262) (id: 64169)
- Fixed an issue when grouping on a string array and sorting by it (#17183) (id: 64166)
- Fixed an issue where duplicate compaction tasks might get launched (#17287) (id: 64154)
- Fixed a race condition for failed queries with the MSQ task engine (#17313) (id: 64153)
- Fixed several issues with the Explore view in the web console (#17234) (id: 64005) (#17240) (id: 64010) (#17225) (id: 63985)
- Fixed an issue with querying realtime segments when using concurrent append and replace (#17157) (id: 63852)
- Fixed an issue where Indexer tasks get stuck in a publishing state and must either get killed or hit the timeout (#17146) (id: 63800)
- Removed the unused Coordinator dynamic configs `mergeSegmentsLimit` and `mergeBytesLimit` (#17384) (id: 64267)
Imply Manager changes
- Fixed a problem where updated Helm values were sometimes incorrectly displayed (id: 64648)
Pivot changes
- The async download process now shows more information, including the number of rows processed (id: 60947)
- The time series visualization now supports the TIMESERIES function (id: 63901)
- In the records visualization you can now use the Nulls summary pill drop-down to turn off displaying the number of hidden null values (id: 64197)
- You can now set a minimum auto-refresh rate when creating or editing a dashboard (id: 64032)
- You can now preview the time range when adding a relative comparison to a visualization (id: 63944)
- You can now specify the date and time to start evaluating alerts (id: 40669)
- In the general options for a dashboard you can now set a default auto-refresh rate (id: 39798)
- Fixed an issue with editing a report after removing a dimension used as a report filter (id: 63475)
Upgrade and downgrade notes
In addition to the upgrade and downgrade notes, review the deprecations page regularly to see if any features you use are impacted.
Minimum supported version for rolling upgrade
See "Supported upgrade paths" in the Lifecycle Policy documentation.
Default string array ingestion
Starting in 2024.10 STS, SQL-based ingestion with the MSQ task engine defaults to array typed columns instead of multi-value dimensions (MVDs). You must adjust your queries to either use array typed columns or explicitly specify your arrays as MVDs in your ingestion query. For more information, refer to the product feature update that Imply shared.
Front-coded dictionaries
Once Druid starts using segments with front-coded dictionaries, you can't downgrade to a version where Druid doesn't support front-coded dictionaries. For more information, see Migration guide: front-coded dictionaries.
If you're already using this feature, you don't need to take any action.
Automatic compaction
Imply preserves your automatic compaction configurations upon upgrade.
Segment sorting
This feature is in alpha and not backwards compatible with versions earlier than 2024.09. If you enable it, you can't downgrade to a version earlier than 2024.09 STS.
You can now configure Druid to sort segments by something other than time first.
For SQL-based ingestion, include the query context parameter `forceSegmentSortByTime: false`. For JSON-based batch and streaming ingestion, include `forceSegmentSortByTime: false` in the `dimensionsSpec` block.
(#16849) (id: 63215)
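For JSON-based ingestion, the setting lands in the `dimensionsSpec`, as in this sketch. The dimension names here are illustrative, and `__time` is listed explicitly as an assumption about how to place it after another column in the sort order:

```json
{
  "dimensionsSpec": {
    "forceSegmentSortByTime": false,
    "dimensions": [
      { "type": "string", "name": "cityName" },
      { "type": "long", "name": "__time" }
    ]
  }
}
```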
Changed low-level APIs for extensions
This information is meant for users who write their own Druid extensions and doesn't impact anyone who only uses extensions supported by Imply.
As part of changes starting in 2024.09 to improve Druid, including the changes described in Segment sorting, some low-level APIs may no longer be compatible with existing custom extensions you have. For more information about which interfaces are impacted, see the following pull requests:
Compression for complex metric columns
If you use the `IndexSpec` option `complexMetricCompression` to compress complex metric columns, you cannot downgrade to a version that doesn't support compressing those columns.
This feature was introduced in 2024.09 STS.
(#16863) (id: 63277)
Changes to native equals filter
Beginning in 2024.01 STS, the native query `equals` filter on mixed type 'auto' columns that contain arrays must filter as their presenting type. That is, if any rows are arrays (the segment metadata and `information_schema` report the type as some array type), then native queries must also filter as if they are some array type. This does not impact SQL, which already has this limitation due to how the type presents itself. This only impacts mixed type 'auto' columns, which contain both scalars and arrays.
Imply Hybrid MySQL upgrade
Imply Hybrid previously used MySQL 5.7 by default; new clusters use MySQL 8 by default. If you have an existing cluster, you need to upgrade the MySQL version, since the Amazon RDS support end date for MySQL 5.7 was February 29, 2024. Although you can opt for extended support from Amazon, you can instead use Imply Hybrid Manager to upgrade your MySQL instance to MySQL 8.
The upgrade should have little to no impact on your queries, but it does require a reconnection to the database. The process can take an hour, and services reconnect to the database during the upgrade.
In preparation for the upgrade, you need to grant certain permissions to the Cloud Manager IAM role by applying the following policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"rds:CreateBlueGreenDeployment",
"rds:PromoteReadReplica"
],
"Resource": [
"arn:aws:rds:*:*:pg:*",
"arn:aws:rds:*:*:deployment:*",
"arn:aws:rds:*:*:*:imply-*"
],
"Effect": "Allow"
},
{
"Action": [
"rds:AddTagsToResource",
"rds:CreateDBInstanceReadReplica",
"rds:DeleteBlueGreenDeployment",
"rds:DescribeBlueGreenDeployments",
"rds:SwitchoverBlueGreenDeployment"
],
"Resource": "*",
"Effect": "Allow"
}
]
}
After you grant the permissions, click Apply changes for Amazon RDS MySQL Update on the Overview page of Imply Hybrid Manager.
Three-valued logic
The legacy two-valued logic and the corresponding properties that support it will be removed in the January 2025 STS and January 2026 LTS. The SQL compatible three-valued logic will become the only option.
Update your queries and downstream apps prior to these releases.
SQL-standard three-valued logic, introduced in 2023.11, primarily affects filters that use the logical NOT operation on columns with NULL values. This applies to both query-time and ingestion-time filtering.
The following example illustrates the old behavior and the new behavior:
Consider the filter `x <> 'some value'`, which filters results for which `x` is not equal to `'some value'`. Previously, Druid included all rows not matching `x = 'some value'`, including null values. The new behavior follows the SQL standard and only matches rows that have a value and whose value is not equal to `'some value'`. Null values are excluded from the results.
Three-valued logic is only enabled if you accept the following default values:
- `druid.generic.useDefaultValueForNull=false`
- `druid.expressions.useStrictBooleans=true`
- `druid.generic.useThreeValueLogicForNativeFilters=true`
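The filtering difference can be sketched in Python, with `None` standing in for SQL NULL: under three-valued logic, a row passes a filter only when the filter evaluates to strictly true, and any comparison involving NULL evaluates to unknown.

```python
def eq(x, y):
    # SQL equality: any comparison with NULL yields unknown (None).
    if x is None or y is None:
        return None
    return x == y

def not_(v):
    # SQL NOT: NOT unknown is still unknown.
    if v is None:
        return None
    return not v

rows = ["some value", "other", None]

# Legacy two-valued logic: NULL <> 'some value' counted as a match.
legacy = [x for x in rows if x != "some value"]

# Three-valued logic: keep a row only when the filter is strictly true,
# so NULL rows are excluded.
standard = [x for x in rows if not_(eq(x, "some value")) is True]

print(legacy)    # ['other', None]
print(standard)  # ['other']
```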
SQL compatibility
The legacy behavior that is not compatible with standard ANSI SQL and the corresponding properties will be removed in the January 2025 STS and January 2026 LTS releases. The SQL-compatible behavior introduced in the 2023.09 STS will be the only behavior available.
Update your queries and any downstream apps prior to these releases.
Starting with 2023.09 STS, the default way Druid treats nulls and booleans has changed.
For nulls, Druid now differentiates between an empty string (`''`) and a record with no data, as well as between an empty numerical record and `0`.
You can revert to the previous behavior by setting `druid.generic.useDefaultValueForNull` to `true`. This property affects both storage and querying, and you must set it on all Druid service types for it to take effect at both ingestion time and query time. Reverting this setting to the old value restores the previous behavior without reingestion.
For booleans, Druid now strictly uses `1` (true) or `0` (false). Previously, true and false could be represented either as `true` and `false` or as `1` and `0`, respectively. In addition, Druid now returns a null value for Boolean comparisons like `True && NULL`.
`druid.expressions.useStrictBooleans` primarily affects querying, but it also affects JSON columns and type-aware schema discovery for ingestion. You can set `druid.expressions.useStrictBooleans` to `false` to configure Druid to ingest booleans in `'auto'` and `'json'` columns as `VARCHAR` (native `STRING`) typed columns that use string values of `'true'` and `'false'` instead of `BIGINT` (native `LONG`). You must set it on all Druid service types for it to take effect at both ingestion time and query time.
The following table illustrates some example scenarios and the impact of the changes:
| Query | 2023.08 STS and earlier | 2023.09 STS and later |
|---|---|---|
| Query empty string | Empty string (`''`) or null | Empty string (`''`) |
| Query null string | Null or empty | Null |
| COUNT(*) | All rows, including nulls | All rows, including nulls |
| COUNT(column) | All rows excluding empty strings | All rows including empty strings but excluding nulls |
| Expression 100 && 11 | 11 | 1 |
| Expression 100 \|\| 11 | 100 | 1 |
| Null FLOAT/DOUBLE column | 0.0 | Null |
| Null LONG column | 0 | Null |
| Null `__time` column | 0, meaning 1970-01-01 00:00:00 UTC | 1970-01-01 00:00:00 UTC |
| Null MVD column | `''` | Null |
| ARRAY | Null | Null |
| COMPLEX | none | Null |
Update your queries
Before you upgrade, update your queries to account for the following changed behavior:
NULL filters
If your queries use NULL in the filter condition to match both nulls and empty strings, you should add an explicit filter clause for empty strings. For example, update `s IS NULL` to `s IS NULL OR s = ''`.
COUNT functions
COUNT(column) now counts empty strings. If you want to continue excluding empty strings from the count, replace `COUNT(column)` with `COUNT(column) FILTER(WHERE column <> '')`.
GroupBy queries
GroupBy queries on columns containing null values can now have additional entries as nulls can co-exist with empty strings.
Avatica JDBC driver upgrade
The Avatica JDBC driver is not packaged with Druid; its upgrade is separate from any upgrades to Imply.
If you notice intermittent query failures after upgrading your Avatica JDBC driver to version 1.21.0 or later, you may need to set the `transparent_reconnection` property.
Parameter execution changes for Kafka
When using the built-in `FileConfigProvider` for Kafka, interpolations are now intercepted by the `JsonConfigurator` instead of being passed down to the Kafka provider. This breaks existing deployments.
For more information, see KIP-297 and #13023.
Deprecation notices
For a more complete list of deprecations and their planned removal dates, see Deprecations.
CentOS support
If you are using CentOS, migrate to a supported operating system: RHEL 7.x and 8.x or Ubuntu 18.04 and 20.04. Support for CentOS is planned to end in April 2025.
Some segment loading configs deprecated
The following segment-related configs are now deprecated and will be removed in future releases:
- `replicationThrottleLimit`
- `useRoundRobinSegmentAssignment`
- `maxNonPrimaryReplicantsToLoad`
- `decommissioningMaxPercentOfMaxSegmentsToMove`
Use `smartSegmentLoading` mode instead, which calculates values for these variables automatically.
ioConfig.inputSource.type.azure storage schema
Update your ingestion specs to use the `azureStorage` storage schema, which provides more capabilities.
ZooKeeper-based task discovery
Use HTTP-based task discovery instead, which has been the default since 2022.
End of support
Two-valued logic
Druid's legacy two-valued logic for native filters and the properties for maintaining that behavior are deprecated and will be removed in the January 2025 STS and January 2026 LTS releases.
The ANSI-SQL compliant three-valued logic will be the only supported behavior after these releases. This SQL-compatible behavior became the default in the Imply 2023.11 STS and January 2024 LTS releases.
Update your queries and downstream apps and remove the corresponding configs.
For more information, see three-valued logic.
Properties for legacy Druid SQL behavior
Druid's legacy behavior for Booleans and NULLs and the corresponding properties are deprecated and will be removed in the January 2025 STS and January 2026 LTS releases.
The ANSI-SQL compliant treatment of Booleans and null values will be the only supported behavior after these releases. This SQL-compatible behavior became the default in the Imply 2023.11 STS and January 2024 LTS releases.
Update your queries and downstream apps and remove the corresponding configs.
For more information, see SQL compatibility.
druid.azure.endpointSuffix
The `druid.azure.endpointSuffix` config has been removed. Update any references to use `druid.azure.storageAccountEndpointSuffix` instead.
SysMonitor support
Switch to `OshiSysMonitor`, as `SysMonitor` has been removed.
Asynchronous SQL download
The async downloads feature has been removed. This refers to an older version of async SQL download that has been replaced with a new version with the same name. For more information, see Download data.
ZooKeeper segment serving processes
ZooKeeper-based segment loading has been disabled since 2024.06 STS.
In 2024.08 STS, segment serving processes such as Peons, Historicals, and Indexers no longer create the ZooKeeper `loadQueuePath`. The property `druid.zk.paths.loadQueuePath` is ignored if it is still in your configs.
If you are still using ZooKeeper-based segment loading and want to upgrade to a more recent release where only HTTP-based segment loading is supported, switch to HTTP-based segment loading before upgrading. For more information, see Segment management.
(#16816) (id: 62629)
Java 8
Java 8 for Druid is at end of support. We recommend you upgrade to Java 17.
JSON columns v3 and v4
JSON columns v3 and v4 are at end of support. Only JSON column v5 is supported, and it has been the default for several releases. Druid can still read v3 and v4 columns, but it can only create v5 columns now.
After upgrading to a version with support for a higher JSON version, you cannot downgrade to an earlier version. Imply's distribution of Apache Druid® has been on JSON v5 since the 2024.01 LTS and 2023.09 STS.
Segment loading rules
Smart segment loading automatically calculates the optimal values for settings you previously had to set manually. As a result, the following settings are automatically ignored: `maxSegmentsInNodeLoadingQueue`, `maxSegmentsToMove`, `replicantLifetime`, and `balancerComputeThreads`. Additionally, the `cachingCost` balancer strategy is no longer supported.