Imply Enterprise and Hybrid release notes
Imply releases include Imply Manager, Pivot, Clarity, and Imply's distribution of Apache Druid®. Imply delivers improvements more quickly than open source because Imply's distribution of Apache Druid uses the primary branch of Apache Druid. This means that it isn't an exact match to any specific open source release. Any open source version numbers mentioned in the Imply documentation don't pertain to Imply's distribution of Apache Druid.
The following release notes provide information on features, improvements, and bug fixes up to Imply STS release 2024.10.1. Read all release notes carefully, especially the Upgrade and downgrade notes, before upgrading. Additionally, review the deprecations page regularly to see if any features you use are impacted.
For information on the LTS release, see the LTS release notes.
If you are upgrading by more than one version, read the intermediate release notes too.
The following end-of-support dates apply in 2023:
- On January 26, 2023, Imply 2021.01 LTS reached EOL. This means that the 2021.01 LTS release line will no longer receive any patches, including security updates. Imply recommends that you upgrade to the latest LTS or STS release.
- On January 31, 2023, Imply 2022.01 LTS ended general support status and is eligible only for security support.
For more information, see Lifecycle Policy.
See Previous versions for information on older releases.
Imply evaluation
New to Imply? Get started with an Imply Hybrid (formerly Imply Cloud) Free Trial or start a self-hosted trial at Get started with Imply!
With Imply Hybrid, the Imply team manages your clusters in AWS, while you control the infrastructure and own the data. With self-hosted Imply, you can run Imply on *NIX systems in your own environment or cloud provider.
Imply Enterprise
If you run Imply Enterprise, see Imply product releases & downloads to access the Imply Enterprise distribution. When prompted, log on to Zendesk with your Imply customer credentials.
Changes in 2024.10.1
Druid changes
- Fixed an issue with complex metric compression that caused some complex column data with compression enabled to be read incorrectly, resulting in segment data corruption or system instability due to out-of-memory exceptions. We recommend that you reingest data if you're using complex metric compression. See advisory (#17422) (id: 64406)
Changes in 2024.10
Pivot highlights
Latest time filter granularity
You can now set the latest time filter granularity for a data cube when using the Pivot data cubes API and in the Advanced data cube UI settings.
When using a relative time filter, Pivot applies this level of granularity to align filter boundaries when querying the underlying table. If you used data rollup at ingestion time, set it to the same granularity used during ingestion or use Infer from latest timestamp. If you're not using rollup, set it to Infer from latest timestamp (id: 62573)
Improvement to alert example query
In the Pivot alerts API, `exampleQuery` in the request response now evaluates to `true` or `false`.
You can copy an example query, customize it, and run it to determine whether it would trigger the alert (`true`) or not (`false`) (id: 63434)
Rolling timeframes in reports
You can now set a rolling timeframe when configuring a report with options of 7, 30, and 90 days. This sets the report's end date to the final date of the rolling timeframe, which is the day the report is generated (id: 60175)
Druid highlights
`arrayIngestMode` defaults to `array`
The SQL-based ingestion query context parameter `arrayIngestMode` now defaults to `array` instead of `mvd`. This means that SQL `VARCHAR ARRAY` types are no longer implicitly translated and stored in `VARCHAR` columns; instead, they're stored as `VARCHAR ARRAY`. This change permits other array types such as `BIGINT ARRAY` and `DOUBLE ARRAY` to be inserted with the MSQ task engine into their respective array column types instead of failing as they do in `mvd` mode.
To continue to store multi-value strings, modify any INSERT or REPLACE queries to wrap the array types with the `ARRAY_TO_MV` operator. If you rely on `arrayIngestMode` to ingest MVDs, we recommend that you make these changes as soon as possible. The context parameter is deprecated and is scheduled to be removed in 2025.01 STS and LTS.
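For example, a minimal sketch of a rewritten ingestion query; the `events` datasource and `tags` column are hypothetical names:

```sql
-- Sketch only: "events" and "tags" are hypothetical names.
-- Wrapping the VARCHAR ARRAY in ARRAY_TO_MV stores it as a
-- multi-value VARCHAR column instead of a VARCHAR ARRAY column.
REPLACE INTO "events" OVERWRITE ALL
SELECT
  "__time",
  ARRAY_TO_MV("tags") AS "tags"
FROM "events"
PARTITIONED BY DAY
```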
Validation is in place to prevent mixing `VARCHAR` and `VARCHAR ARRAY` columns in the same table, so any ingestions affected by this change will fail and provide a descriptive error message instead of exhibiting unexpected behavior.
Additionally, the `arrayIngestMode` option of `none` has been removed.
See the following topics for more information:
- Ingest multi-value dimensions for how to ingest multi-value strings.
- Ingest arrays for ingesting arrays.
(#17133) (id: 63778) (#16789) (id: 62550)
Compaction task array handling
Compaction tasks now always use ARRAY mode for multi-value handling (`MultiValueHandling`) when the column schema is not explicitly defined. This preserves the order of the values in the row so that they match what was stored during the initial ingestion.
(#17110) (id: 63726)
Explore view
You can now configure the Explore view on top of a source query instead of only existing tables. You can also point and click to edit the source query, store measures in the source query, and return to the state of your view using stateful URLs.
(#17180) (id: 63912)
Window functions GA
Window functions are now GA. You no longer need to use the query context parameter to run queries that include a window function.
(#17087) (id: 63737)
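For example, a query like the following hypothetical sketch (the `wikipedia` datasource and its columns are illustrative) now runs with no special context parameter:

```sql
-- Hypothetical datasource and columns: a running total per channel.
SELECT
  "channel",
  "__time",
  SUM("added") OVER (PARTITION BY "channel" ORDER BY "__time") AS "running_added"
FROM "wikipedia"
```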
Improved Coordinator management
Improved the user experience around Coordinator management as follows:
- Added the API `GET /druid/coordinator/v1/duties`, which returns a status list of all duty groups currently running on the Coordinator
- The Coordinator now emits the following metrics: `segment/poll/time`, `segment/pollWithSchema/time`, and `segment/buildSnapshot/time`
- Removed redundant logs that indicate normal operation of well-tested aspects of the Coordinator
(#16959) (id: 63772)
Custom extensions
2024.10 STS completes the work started in 2024.09 to transition away from the `StorageAdapter`. This change might affect you if you create your own custom extensions that use it, such as a custom query engine. Before you upgrade, see Changed low level APIs for extensions to check if you're impacted.
(#16985) (id: 63473)
Imply Manager highlights
Imply installations on Kubernetes, excluding GKE enhanced, now support both x86 and ARM architectures. If your Kubernetes engine supports ARM instances, you can add ARM-based node pools to your cluster (id: 63133)
Other changes
Pivot changes
- In the bar chart visualization, you can now group multiple series horizontally instead of stacking them vertically (id: 63887)
- The time series visualization now supports the DOWNSAMPLED_SUM_TIMESERIES aggregation function (id: 63284)
- The PIVOT_LEFT_JOIN function now supports compound keys. See Dimensions: Joins for more information (id: 63493)
- You can now set the `disableAlertDelivery` property in the Pivot config file to hide the Delivery and Access alerts tabs from users (id: 63431)
- Added an option to sort by current value when displaying a comparison in a data cube (id: 63566)
- Improved the display of lengthy measure names in visualizations (id: 36814)
- Fixed an issue where creating a records table visualization on an empty data set could block the report cycle (id: 63929)
- Fixed an issue where reports with no data and no Overall row could block the report cycle (id: 63635)
- Fixed a visualizations display issue where a large number of filters extended beyond the screen boundaries on small screens (id: 63654)
- Fixed a problem with the filter range when sorting a data cube by measure (id: 63566)
- Fixed a problem using boolean dimensions in explore visualizations (id: 63471)
- Fixed an issue with the accuracy of alert example queries (id: 63433)
- Fixed "current day" time filters so that they correctly use the configured refresh rate in visualizations (id: 63159)
Druid changes
- You can now include request headers for HTTP input sources for JSON-based ingestion (`inputSource.requestHeaders`) and SQL-based ingestion (as part of `headers` in your FROM clause). To enable this feature, list the headers you want to allow in `druid.ingestion.http.allowedHeaders` (#16974) (id: 63520) (id: 62867)
- Added support for the `maxSubqueryBytes` context parameter for window functions (#16800) (id: 62151)
- Added support for minimum (MIN), first (FIRST), last (LAST), and average (AVG) scalar expressions over a time series. For example, `MIN_OVER_TIMESERIES` (#2713) (id: 62717)
- Added support for FILTER expressions for time series objects (#2719) (id: 62719)
- Added an option to force segment sort by time to the web console (Select engine > INSERT / REPLACE specific context) (#16967) (id: 63296)
- Added automatic query prioritization based on the period of the actual segments scanned in a query (#17009) (id: 63390)
- Added the `JSON_MERGE` expression, which merges two or more JSON strings or `COMPLEX<json>` expressions into one: `JSON_MERGE(expr1, expr2, [...])`; see the example after this list (#17081) (id: 63602)
- Added a stage graph to the web console to better illustrate how stages are connected (#17135) (id: 63769)
- Added a warning to the logs any time a column relies on `arrayIngestMode` being set to MVD (#17164) (id: 63916)
- Added support for unnest to decoupled planning (#17177) (id: 63960)
- MSQ tasks now determine the task lock type using both the default task context and the query context.
- Improved window functions so that they do not run repeatedly on every PARTITION BY change (#17211) (id: 63918)
- Improved the Explore view in the web console:
  - You can now hide all null columns in a record table, declare certain parameter values as sticky, and expand a nested column into its constituent paths
  - Fixed issues where dragging a STRING or VARCHAR column to a measure control didn't work, filtering on a predefined measure didn't work, and the drag-over indicator sometimes didn't clear (#17213) (id: 63965)
- Fixed an issue where not all output rows got written when the frame writer didn't have capacity (#17209) (id: 63851)
- Fixed an issue with ARRAY_TO_MV where an object selector is asked for a multi-value string but the expression virtual column wasn't expecting a multi-value string expression (#17162) (id: 63828)
- Fixed an issue where the boost column wasn't being written to the frame during a window stage (#17141) (id: 63767)
- Fixed an issue where schema and table names weren't URL encoded when syncing the catalog (#17131) (id: 63764)
- Fixed an issue where the `PostJoinCursor` couldn't be interrupted, leading to joins taking a long time but not timing out (#17099) (id: 63729)
- Fixed an issue where the error message for suggested server memory did not account for the maximum number of concurrent stages (#17108) (id: 63702)
- Fixed an issue where a generic canceled fault was shown instead of the actual error when starting up `RunWorkOrder` (#17069) (id: 63632)
- Fixed the formatting of an error message (#17061) (id: 63577)
- Fixed an issue where JSON content was closed upon encountering an error, making it difficult to detect errors (#17034) (id: 63576)
- Fixed two issues with phase transitions for the MSQ task engine (#17053) (id: 63571)
- Fixed an issue where processing tasks didn't get cleaned up, leading to unnecessary processing for tasks whose results won't get used (#17037) (id: 63525)
- Fixed the `maxRowsInMemory` default for streaming in the web console not being accurate. It is now 150,000 (#17028) (id: 63521)
- Fixed an issue where the supervisor task report API logged the wrong exception (#17016) (id: 63440)
- Fixed an issue where native batch ingestion jobs could fail during rolling upgrades from versions prior to 2024.05 to 2024.09 (#17219) (id: 63422)
- Added the ability to skip emitting certain categories of system metrics reported by the OshiSysMonitor (#16972) (id: 63089)
- Fixed an issue where some SQL types could not be mapped to native Druid types, leading them to be converted to null literals of the cast type; comparisons should've returned boolean values instead of null (#17011) (id: 63383)
- Fixed an issue with segment balancing when the Coordinator period is too low, under 30 seconds (#16984) (id: 63301)
- Fixed an issue where the metadata cache attempts to continuously refresh tombstone segments due to missing metadata. The metadata cache now ignores tombstone segments (#16890) (id: 63078)
- Fixed an issue with window functions when using the MSQ task engine for queries that include the ARRAY_CONCAT_AGG (#16971) (id: 62779)
- Fixed an issue where queries with the ARRAY_CONCAT_AGG aggregator returned empty values (#16885) (id: 62777)
- Improved query filtering so that Druid tries to arrange query filters based on the computational cost of bitmap indexes, prioritizing less expensive filters for computation first. Filters with high compute costs relative to the number of rows they can filter might be omitted. (#17125) (id: 63829) (#17055) (id: 63656)
- Improved how tooltips are displayed in the web console and how CPU counters are displayed (#17132) (id: 63779)
- Improved performance of window functions by comparing Strings in frame bytes without converting them (#17091) (id: 63728)
- Improved worker cancellation for the MSQ task engine to prevent race conditions (#17046) (id: 63580)
- Improved memory management for the MSQ task engine to better support multi-threaded workers in shared JVMs (#17057) (id: 63575)
- Improved the query view in the web console, including now being able to configure the maximum number of tasks through the UI (#16991) (id: 63522)
- Improved the compaction status API response to be compatible with compaction supervisors (#17006) (id: 63402)
- Improved system segment query performance by creating fewer temporary maps (#16981) (id: 63333)
- Improved memory usage for non-query processing tasks by not reserving the query buffers for them (#16887) (id: 63152)
- Improved window functions so that they reject MVDs since they aren't supported for window functions (id: 61505)
- Dependency changes:
  - Changed `commons-lang` to `commons-lang3` (#17156) (id: 63804)
  - Updated `joni` from version 2.1.27 to 2.1.34 (#17017) (id: 63418)
  - Removed `aether` and `net.thisptr` in favor of `maven-resolver-api` (#17017) (id: 63418)
  - Changed `nimbus-jose-jwt` from version 9.37.2 to 8.22.1 to resolve an HTTP 500 error (#16986) (id: 63302)
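As a sketch of the `JSON_MERGE` expression mentioned in the list above (the alias is illustrative; later arguments are assumed to overwrite earlier values for matching keys):

```sql
-- Returns one merged object; in this sketch the result would be {"a":1,"b":2}.
SELECT JSON_MERGE('{"a":1}', '{"b":2}') AS "merged"
```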
Imply Enterprise on K8S
If you run Imply Enterprise on Kubernetes with ARM architecture, you must use an external metadata store and an external deep storage.
Clarity changes
- Added the following ingestion measures to the Ingestion data cube: `Failed Tasks`, `Successful Tasks`, `Failed Tasks %`, `Events Processed %`, `Events Unparseable %`, `Events Thrown Away %`, `Processed With Error %`, and `Distinct Peons` (id: 63664)
Changes in 2024.09
Pivot highlights
Records table (explore) visualization (beta)
The new records table (explore) visualization shows the raw data underlying the data cube. You can use the options panel to show the row indices and display columns from the underlying data source instead of the data cube dimensions.
You can enable this beta feature through the SDK based visualizations feature flag. Once enabled, you can continue to access the original records table as well.
(id: 61646)
Druid highlights
Compressing complex metric columns
Compression is now available for all "complex" metric columns that don't have specialized implementations, through the new `IndexSpec` option `complexMetricCompression`. This option defaults to uncompressed for backwards compatibility but can be configured to any compression strategy (`lz4`, `zstd`, and so on). It works for most complex columns except for compressed-big-decimal and the columns stored by first/last aggregators.
This feature is not backwards compatible with versions earlier than 2024.09 STS. If you enable it, you can't downgrade to a version earlier than 2024.09 STS. Only enable this feature once you're certain that there is no need to roll back to an older version.
(#16863) (id: 63277)
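A minimal sketch of what enabling the option might look like inside an ingestion spec's `tuningConfig`; the option name comes from this release note, while the surrounding fields are standard ingestion spec structure:

```json
{
  "tuningConfig": {
    "type": "index_parallel",
    "indexSpec": {
      "complexMetricCompression": "lz4"
    }
  }
}
```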
Segment sorting (alpha)
This feature is in alpha and not backwards compatible with versions earlier than 2024.09. If you enable it, you can't downgrade to a version earlier than 2024.09 STS.
You can now configure Druid to sort segments by a column other than `__time` first.
For SQL-based ingestion, include the query context parameter `forceSegmentSortByTime: false`. For JSON-based batch and streaming ingestion, include `forceSegmentSortByTime: false` in the `dimensionsSpec` block.
(#16849) (id: 63215)
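A hypothetical `dimensionsSpec` sketch for JSON-based ingestion; the dimension names are illustrative, and the assumption here is that with `forceSegmentSortByTime: false` you can list `__time` explicitly to control its position in the sort order:

```json
{
  "dimensionsSpec": {
    "forceSegmentSortByTime": false,
    "dimensions": [
      "country",
      "__time",
      "city"
    ]
  }
}
```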
Custom extensions
This information is meant for users who write their own custom extensions. It doesn't impact you if you only use extensions supported by Imply.
If you write custom extensions, specifically query engines or anything else involving the `StorageAdapter`, consider skipping the 2024.09 STS. There are ongoing changes to low-level APIs that may impact your extension. Prepare for these changes before upgrading to 2024.09 STS or later. For more information, see pull requests #16533, #16849, and #16917, which are part of this STS release.
For information about the goal of these changes and how they all fit together, see #16985. While this PR is not in 2024.09 STS, it'll be in an upcoming release and describes how the changes in #16533, #16849, and #16917 change how you write custom extensions like query engines.
Imply Manager highlights
ARM support
Starting with Imply Manager 2024.09 and Imply Agent v7, Imply Enterprise on Linux supports x86 and ARM architectures. Previous versions of Imply Manager and Imply Agent only support x86 architecture.
(id: 62789, 62791)
Other changes
Pivot changes
- Fixed flat table visualization reports sometimes generating invalid SQL (id: 63231)
- Fixed dashboard tiles continuously refreshing when a data cube's Latest data strategy is set to use the current time (id: 63137)
- Fixed Pivot not canceling report queries that time out (id: 62668)
Druid changes
- Added a fallback SQL IN filter to expression filters when `VirtualColumnRegistry` is null (#16836) (id: 62752)
- Added the option to use concurrent locks with queries that use the MSQ task engine through the web console (#16899) (id: 61172)
- Added support for the Kinesis input format to the web console (#16850) (id: 62963)
- Added the ability to choose whether or not to use concurrent locks in the Engine menu in the Query view (#16899) (id: 63006)
- Added the ability to handoff tasks for a supervisor early using the web console (#16586) (id: 63065)
- Added column mapping information to the Explain dialog in the web console (#16598) (id: 62770)
- Added metrics about the size of the subquery results materialized on the Broker (#16835) (id: 62146)
- Added a segment loading rate metric for `HttpLoadQueuePeon` on the Coordinator and an expected load time to the response for `/druid/coordinator/v1/loadQueue?simple` (#16691) (id: 62748)
- Added logging in streaming supervisors for when task groups are greater than partition count (#16948) (id: 63237)
- Added indexer task success and failure metrics (#16829) (id: 62708)
- Added the stages counters and warnings from the new detailed status API to the web console (#16809) (id: 62700)
- Changed `IndexedStringDruidPredicateIndexes` to not needlessly look up the index of values (#16860) (id: 62962)
- Changed `__time` in row signatures to be placed according to sort order rather than always first (#16958) (id: 63248)
- Changed Druid table schema resolution to prefer the Druid catalog over the schema manager (#16869) (id: 62961)
- Fixed an issue that was introduced in 2024.08 when ZooKeeper-based segment management was removed (#16816) (id: 62785) (id: 62629)
- Fixed an issue where segment schemas for segments don't get updated. As part of this change, the following metrics have been added: `metadatacache/cold/segment/count`, `metadatacache/cold/refresh/count`, and `metadatacache/cold/process/time` (id: 62781)
- Fixed a buffer capacity race condition in Spatial floats (#16931) (id: 63241)
- Fixed an issue with fetching task reports from the SQL statements endpoint for deployments that use the Middle Manager (#16832) (id: 62714)
- Fixed an issue with nullable DATE TIMESTAMP reduction (#16915) (id: 63067)
- Fixed an issue where the IPV4_PARSE function would return different values for SQL and native queries because of the data type. The function now always returns a value of type BIGINT (#16942) (id: 63200)
- Fixed an issue in the web console where you couldn't collapse a table. Additionally, all tables used in a query are now expanded (#16910) (id: 63068)
- Fixed an issue where the `TooManyRowsInAWindowFault` error is not shown (#16906) (id: 63032)
- Fixed an NPE that occurred when an intermediate column is not found (#16897) (id: 63009) (id: 62778)
- Fixed the Druid Console not being able to open the Submit Supervisor dialog (#16736) (id: 62328)
- Fixed an issue where queries like `select count(distinct c_json) from mytest1` run with the MSQ task engine return `getExtractor() UnsupportedOperationException` errors (#16825) (id: 61958)
- Fixed an issue with the SuperSorter that led to excessive memory use (#16928) (id: 63154)
- Fixed a bug that caused peons to fail while starting up when `WorkerTaskCountStatsMonitor` is used on Middle Manager services (#16875) (id: 62943)
- Fixed an issue where window functions that use the MSQ task engine fail with an `IndexOutOfBoundsException` error (#16865) (id: 62923)
- Fixed inconsistent range current row behavior for window functions (#16833) (id: 62565)
- Fixed a query correctness issue when using more than one MSQ task engine worker for window functions (#16804) (id: 62339)
- Fixed an NPE that occurred during SQL-based ingestion (#16854) (id: 62780)
- Improved the SQL statements endpoint by memoizing the redundant calls to the Overlord (#16839) (id: 62753)
- Improved exception handling in the `druid-pac4j` extension (#16979) (id: 63336)
- Improved the web console so that the Supervisor view can display the Status and Stats columns after the main supervisors are loaded (#16952) (id: 63276)
- Increased query cancellation timeout to 1 second (#16656) (id: 62964)
- Improved the tracking of `IngestionState` in realtime tasks to be more accurate (#16934) (id: 63184)
- Improved error messages for frame files (#16912) (id: 63135)
- Improved the `sys.segments` table to only generate data for columns that are queried (#16841) (id: 6055)
- Improved how the web console handles server context defaults (#16868) (id: 62960)
- Improved performance for the SuperSorter by performing direct merging where possible and increasing parallelism (#16775) (id: 62858)
- Improved window function performance by batching multiple PARTITION BY keys for processing (#16823) (id: 62589)
- Removed references to the outdated `chatAsync` from Rabbit stream supervisors (#16950) (id: 63207)
- Removed unused `cachingCost` strategy runtime properties (#16918) (id: 63072)
- Updated Axios to 1.7.4 to resolve a CVE (#16898) (id: 63007)
Imply Manager changes
- Fixed an issue where moving a deployment to a private GKE failed if a compiled setup script was used or if the setup script was run on a machine where `tfvars` weren't already generated (id: 63264)
- Fixed an issue where restarting a machine also restarted stopped Druid processes (id: 60783)
- Fixed a problem with adding custom user files to a Kubernetes deployment (id: 63096)
Changes in 2024.08
Pivot highlights
Bar chart visualization (beta)
The new bar chart visualization is highly configurable and optimized for the display of multiple measures in separate charts.
You can enable this beta feature through the SDK based visualizations feature flag. Once enabled, you can continue to access the original vertical bars as well.
(id: 41869)
Time zone setting for data cubes and dashboards
You can now select a timezone to apply to all data cubes and dashboards. See Data cube options for details.
(id: 39603)
Druid highlights
New config for how string dimensions are treated
The new optional cluster configuration `druid.indexing.formats.stringMultiValueHandlingMode` gives you the option to override the default mode for string dimensions. The possible values for the config are `SORTED_SET`, `SORTED_ARRAY`, or `ARRAY`. `SORTED_SET` is the default, and the values are not case sensitive.
While this cluster property allows users to manage the multi-value handling mode for string dimension types, we recommend you migrate to using real array types instead of MVDs.
(#16822) (id: 61521)
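A sketch of the override as it might appear in your common runtime properties (the file placement is an assumption based on this being a cluster-wide config):

```properties
# Mode values are not case sensitive: SORTED_SET (default), SORTED_ARRAY, or ARRAY.
druid.indexing.formats.stringMultiValueHandlingMode=ARRAY
```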
Improved query from deep storage
If you enable centralized datasource schemas, you can now query datasources that only exist in deep storage. Previously, at least one segment of the datasource had to be loaded onto a Historical process.
(#16676) (id: 61467)
Imply Manager highlights
Java 17 support
Imply Manager on Linux now supports Java 17.
(id: 61842)
Other changes
Pivot changes
- Explore visualizations now have improved auto-granularity and controls (id: 62023)
- Fixed an issue where downloading from explore visualizations didn't apply a default time filter (id: 62605)
- Fixed standard download failing for flat table visualization when grouping data by time (id: 62451)
- Fixed an issue where the Auto label for the line chart visualization displayed a value different from the actual limit (id: 62015)
- Fixed an issue where data cubes couldn't be refreshed more frequently than once per minute (id: 61963)
- Fixed Pivot allowing users to create multiple folders with the same name (id: 61854)
- Fixed an issue where the download output for HLL sketch columns included decimal points (id: 60859)
- Fixed Pivot 2 visualizations incorrectly bucketing data for number dimensions with underlying string column (id: 60661)
- Fixed records table visualization failing when downloading data or sending a report (id: 33228)
- Fixed query parameter filters not supporting multi-value dimensions (id: 62689)
- Fixed an issue where data cube query parameters weren't applied if the default visualization was an explore visualization (id: 62739)
Druid changes
- Added an option to hide the workbench view toolbar in the web console (#16785) (id: 62485)
- Changed the behavior for HAVING clauses with window functions: they are now rejected (#16742) (id: 62594)
- Changed the Coordinator service to check for tombstones in wrapping storage adapters (#16791) (id: 62554)
- Exposed hooks to customize the workbench view (#16749) (id: 62400)
- Fixed an NPE that occurred while converting `SCALAR_IN_ARRAY` to a `DimFilter` (#16836) (id: 62663)
- Fixed excessive logging from the `druid-basic-security` extension (#16767) (id: 62417)
- Fixed an NPE in number formatting (#16760) (id: 62415)
- Fixed an issue where a windowed aggregation fails if there's an array type selector involved (#16653) (id: 61515)
- Fixed an issue where window functions treated RANGE clauses with compound ORDER BY clauses inconsistently. Some resulted in errors while others didn't (#16718) (id: 62187)
- Fixed a few Druid web console bugs:
  - Unhelpful filter shortcut
  - Titles for action menus
  - Scrolling in the load rules editor
  - MSQ counter calculation (#16735) (id: 62374) (id: 61995)
- The Overlord now uses an in-memory cache to serve task information even after leader re-election (#16750) (id: 62364)
- Fixed a bug when running queries with a limit clause (#16643) (id: 62193)
- Improved WindowFrames to be more specific (#16741) (id: 62547)
- Improved query behavior related to COMPLEX types. They're coerced to numbers in numeric aggregators when possible (#16564) (id: 62562)
- Improved join behavior to be more responsive to cancellation (#16773) (id: 62549)
- Improved Broker parallel merges so that processes get cancelled more quickly when an error occurs (#16748) (id: 62487)
- Improved window functions to provide better feedback when a query is unsupported (#16738) (id: 62424)
- Improved performance (#16714) (id: 62285)
- Improved the error handling for unsupported aggregations when `useApproxCountDistinct` is enabled (#16770) (id: 62230)
- Improved the reset for the SQL data loader (#16696) (id: 62284)
- Improved log cleanup for Kubernetes node discovery (#16701) (id: 62152)
- Improved guardrails for unsupported queries with approximate count distinct on complex columns of unsupported types (#16682) (id: 61936)
- Improved the fallback strategy when the Broker can't materialize a subquery's results (#16679) (id: 61785) (id: 61784) (id: 61783)
- Improved how Druid deduces the column type for aggregators in groupBy subqueries (#16703) (id: 61782)
- Improved kill tasks so that they don't delete a segment from deep storage if an unused segment has the same load spec. The load spec only gets deleted once all the metadata references are removed. (#16667) (id: 43121)
- Optimized metadata store actions for Kill tasks (#16734) (id: 62302)
- The following dependencies have had their versions bumped:
  - `jclouds.version` from 2.5.0 to 2.6.0 (#16796) (id: 62592)
  - `io.grpc:grpc-netty-shaded` from 1.57.2 to 1.65.1 (#16731) (id: 62591)
  - Blueprint from 4 to 5 (#16756) (id: 62395)
- Reduced logging in `RetryableS3OutputStream` (#16853) (id: 62872)
- Removed the following features:
- Firehose ingestion has been removed. It hasn't been the recommended way to ingest data for quite some time. Imply recommends that you use SQL-based ingestion for batch ingestion. (#16758) (id: 62414)
- The native scan query 'legacy' mode has been removed. This change should not impact most deployments. The legacy mode was introduced in Apache Druid® 0.11 to maintain compatibility during an upgrade from older versions of Druid where the scan query was part of a contributor extension. (#16659) (id: 62394)
- Task action audit logging was deprecated in Apache Druid® 0.13 and has been completely removed in this release. This change should not impact most deployments. As part of this removal, the following changes have been made:
  - The API `/indexer/v1/task/{taskId}/segments` is not supported anymore and will return a 404 NOT FOUND response.
  - Druid will not write to or read from the metadata table `druid_taskLog` anymore.
  - The property `druid.indexer.auditlog.enabled` will be ignored by Druid.
  - The metric `task/action/log/time` won't be emitted anymore. (#16309) (id: 62345)
- Batch processing mode for task configs has been removed. Batch ingestion now always uses closed segment sinks. This change should have no impact on most deployments (#16765) (id: 62433)
Imply Manager changes
- You can now configure idle session timeouts in Imply Manager (id: 61616)
- Fixed an issue where creating a new cluster or scaling up data tiers failed to update Imply Hybrid (id: 62092)
- Fixed index validations failing for PostgreSQL tables with names containing uppercase letters (id: 61664)
- Updated the default version of MySQL to 8.0 for new clusters in Imply Private (GKE) (id: 39231)
Clarity changes
- You can now see the space used by each Historical using the `Historicals Space Used %` measure in the Server metrics data cube (id: 62283)
- Fixed an issue with Clarity only showing the first 200 projects (id: 61975)
Changes in 2024.07
Pivot highlights
Multi-axis line chart visualization (beta)
The new multi-axis line chart visualization is optimized for the display of multiple axes—you can display up to 10 measures on a single chart.
You can enable this beta feature through the SDK based visualizations feature flag. Once enabled, you can continue to access the original line chart as well (id: 39612)
Window functions in custom measures
You can now use window functions when defining custom measures in data cubes. See Custom measure examples for details (id: 41427)
Other changes
Pivot changes
- Fixed folder duplicates appearing in the dimensions list (id: 61757)
- Fixed re-sorted axes producing incorrect filters in the table visualization (id: 61723)
- Fixed some dashboard tiles not applying filters when they are off-screen (id: 61722)
- Fixed a problem displaying "Other" values in a filtered table visualization (id: 61704)
- Fixed data cube instances not working with some visualizations (id: 61612)
- Fixed filter not persisting in the filter bar in the flat table visualization (id: 61499)
- Fixed a problem with downloads from flat table visualizations when All rows is selected (id: 61045)
Druid changes
- Added a way for columns to provide `GroupByVectorColumnSelectors`, which controls how the GroupBy engine operates on them (#16338) (id: 61938)
- Added `druid-parquet-extensions` to all example quickstarts (#16664) (id: 61937)
- Added support for bootstrap segments (#16609) (id: 61844)
- Added formatted JSON values to web console displays (#16632) (id: 61827)
- Added druid.azure.account and druid.azure.container properties to Azure deep storage configuration (#16561) (id: 61643)
- Added authorization checks for permissionless internal requests (#16419) (id: 61543)
- Added interface method for returning canonical lookup name (#16557) (id: 61542)
- Added new usage metrics for CPU and memory control groups (#16472) (id: 61401)
- Added the appropriate hash strategy and the equals method for IP types so they can be grouped on (id: 60641)
- Added the private method `handleConnectionStateChanged` to handle connection state changes (#16528) (id: 57657)
- Improved `AbstractSegmentMetadataCache` by changing the log level to debug to avoid logging the signature for each segment (#16565) (id: 61552)
- Improved allocation and supervisor logs for easier debugging (#16535) (id: 61468)
- Improved `AutoCompactionSnapshotBuilder` (#16523) (id: 61447)
- Improved event hubs by disabling them when Kafka extensions aren't loaded (#16559) (id: 61572)
- Improved window operators by enabling reordering (#16482) (id: 39694)
- Improved Kafka support by enabling use of CSV input format in Kafka record when "Parse Kafka metadata" is also enabled (#16630) (id: 61255)
- Improved `S3UploadThreadPool` by exposing its metrics (#16616) (id: 61732)
- Improved `GroupIteratorForWindowFrame` by extending its use for aggregate computations of PeerType ROWS (#16603) (id: 61834)
- Improved segment allocation by optimizing the unused segment query (#16623) (id: 61731)
- Improved `JsonInputFormat` by simplifying its serialized form (#15691) (id: 61547)
- Improved `druid.indexer.tasklock.batchAllocationWaitTime` by updating its default value to zero (#16578) (id: 43004)
- Improved `UsedSegmentChecker` by renaming it to `PublishedSegmentsRetriever` and cleaning up task actions (#16644) (id: 61940)
- Improved the Azure extension by removing an unused converter file (#16541) (id: 61491)
- Improved `ResultCache` keys by removing an incorrect UTF8 conversion (#16569) (id: 61693)
- Improved indexing by removing the `index_realtime` and `index_realtime_appenderator` tasks (#16602) (id: 61895)
- Fixed vector grouping expression deferring evaluation to only consider dictionary-encoded strings as fixed width (#16666) (id: 61965)
- Fixed an NPE that occurred when `segmentGranularity` was set to null (#16713) (id: 62227)
- Fixed null pointer exceptions when `CgroupCpuSetMonitor` was enabled (#16621) (id: 61806) (id: 61841)
- Fixed duplicate entry logs during pending segment allocation (#16605) (id: 61770)
- Fixed attempts to publish the same pending segments multiple times (#16605) (id: 61769)
- Fixed retry logic in `BrokerClient` (#16618) (id: 61746)
- Fixed task replica failures due to inconsistent metadata (#16614) (id: 61730)
- Fixed a bug causing `maxSubqueryBytes` to fail when segments have missing columns (#16619) (id: 61659)
- Fixed `NestedDataColumnIndexerV4` reporting incorrect cardinality (#16507) (id: 61656)
- Fixed a pagination and filtering regression in the supervisor view in the web console (#16571) (id: 61644)
- Fixed expression column capabilities so that they don't report as dictionary-encoded unless the input is a string (#16577) (id: 61642)
- Fixed a query with `floor(exp(least()))` in the filter returning an incorrect result (#16649) (id: 61594)
- Fixed a query with `greatest(floor())` in the filter returning an incorrect result (#16649) (id: 61592)
- Fixed capabilities reported by `UnnestStorageAdapter` (#16551) (id: 61544)
- Fixed delta sorting in the explore view table in the web console (#16542) (id: 61501)
- Fixed race condition in AzureClient factory fetch (#16525) (id: 61466)
- Fixed a condition where two Coordinators are elected leader (#16411) (id: 61456)
- Fixed an `ip_stringify()` error stating that the column doesn't exist when there is a segment of null values (id: 61436)
- Fixed queries filtering for the same condition with both an IN and EQUALS so they don't return empty results (#16597) (id: 61239)
- Fixed schema backfill count metric (#16536) (id: 60745)
- Fixed the grouping engine for a query with grouping sets when a limit is applied with order by columns different to the query dimensions (#16534) (id: 60356)
- Fixed window function in a subquery returning "Cannot convert to Scan query without any columns" (#16502) (id: 42784)
- Fixed an `is null` filter on an unnest query with `json_value()` returning "Unhandled Query Planning Failure" (id: 37339)
- Fixed an `is not null` filter having no effect on an unnest query with `json_value()` output (id: 37258)
- Upgraded DeepJavaLibrary (DJL) to address CVE-2024-37902 (id: 61918)
- Upgraded Calcite to 1.37 (#16504) (id: 61734) (id: 60501)
Clarity changes
- Added the ability to distinguish between task types in the ingestion view (id: 62005)
- More accurate and faster percentile calculations (id: 61620)
- Added `fsDevName` and `fsDirName` as dimensions in Raw metrics (id: 62000)
- Fixed the raw metrics `count` measure (id: 61892)
- Added the `Asia/Yangon` timezone (id: 61701)
Imply Manager changes
- Added support for managing `loadBalancerSourceRanges` to the Helm chart (id: 3128)
- You can now forbid a password from including any of the following: username, email, first or last name (id: 43193)
- Imply Enterprise enhanced on GKE can now be configured to use internal IP addresses for cluster nodes (id: 43068)
- Fixed a problem with `timestamp_format` in the CloudFormation logs (id: 61865)
- Fixed an issue where password requirements were not being validated (id: 3480)
Changes in 2024.06.1
Druid changes
- Improved the query that's used to fetch unused segments for a datasource. It now finishes more quickly. In a datasource with 1.8 million unused segments, Druid can now return results in less than a second. Previously, results in that scenario could take over 30 seconds. A long wait for results could lead to issues for the Overlord service (#16623) (id: 61731)
Changes in 2024.06
Pivot highlights
Data cube time zone setting
If you change the time zone in a data cube's settings, you can now apply the same time zone the next time you access the data cube—as either a one-time or persistent change. See Managing data cubes for details.
(id: 39603)
Pinned dimensions in query parameters
You can now use `pinnedDimensions` as a query parameter in a Pivot URL to pin one or more specified dimensions to the sidebar. See Data cube and dashboard query parameters reference for details.
(id: 60664)
Druid highlights
High-precision geospatial filters
High-precision geospatial filters use a `geo` dimension to provide the same filters and bound types as spatial dimensions and filters but at a higher level of precision, offering more options for how you work with and utilize your geospatial data. They replace the lower-precision geospatial filters that Druid offers out of the box.
To enable them, load the `imply-utility-belt` extension. For more information, see High-precision geospatial filters.
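For example, a sketch of the extension load list in your common runtime properties; `druid.extensions.loadList` is the standard load-list property, and the other entries shown are placeholders for whatever your cluster already loads:

```properties
# Keep your existing extensions and append imply-utility-belt.
druid.extensions.loadList=["druid-s3-extensions", "imply-utility-belt"]
```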
Zookeeper-based segment loading removed
The improvements made to the Druid Coordinator improve the experience with HTTP-based segment loading. Therefore, Zookeeper-based segment loading is being removed, as it is known to have issues and has been deprecated for several releases.
The following configs are being removed as they are not used anymore:
- `druid.coordinator.load.timeout`: not needed, as the default value of this parameter (15 minutes) is known to work well for all clusters
- `druid.coordinator.loadqueuepeon.type`: not needed, as this value will always be `http`
- `druid.coordinator.curator.loadqueuepeon.numCallbackThreads`: not needed, as ZooKeeper (Curator)-based segment loading is not an option anymore
If set in any cluster, these configs will be ignored by Druid.
Automatic cleanup of compaction configs of inactive datasources is now enabled by default.
(#15705) (id: 60764)
Imply Manager highlights
Improved rolling updates for Imply Hybrid
Queries running during a rolling update are now more resilient. Previously, they might fail due to being routed to a decommissioned node.
To use this feature, do the following:
- In your AWS account, grant Imply Manager the following IAM permission: `elasticloadbalancing:DeregisterInstancesFromLoadBalancer`
- In Imply Manager, enable the following feature flag: Deregister instances from ELB during a rolling update.
This feature is exclusive to Imply Hybrid and was deployed as part of a control plane update following the STS release.
Other changes
Pivot changes
- Improved performance of the background reports job runner (id: 60996)
- Updated `AccessDashboards` so that users with this permission can't expand a tile in a dashboard or access the underlying data cube (id: 60285)
- Fixed an error when navigating from the records or records table visualizations to the overall visualization (id: 60787)
- Fixed report owner's inability to see a report when they are removed from the recipients list (id: 42841)
- Fixed an issue where alert creation fails when the first data cube in the list is missing a primary time dimension (id: 40948)
- Fixed a problem with axis data truncating in the line chart visualization (id: 61173)
- Fixed string dimensions filter failing in some circumstances (id: 60545)
- Fixed some column headings not appearing in a data cube with multiple measures (id: 43091)
Druid changes
- Added retries to the S3 client to better handle transient errors related to finding a region (#16438) (id: 39357)
- Added validation to prevent a datasource that's being ingested into from being queried if the query includes real-time sources. This prevents issues with fetching segment details (#16310) (id: 44895)
- Added a new API that makes a best effort to trigger a handoff for tasks of a supervisor early: `/druid/indexer/v1/supervisor/{supervisorId}/taskGroups/handoff` (#16310) (id: 60152)
- Added support for rolling up geo complex columns in MSQ (id: 61133)
- Added native filter conversion for SCALAR_IN_ARRAY (#16312) (id: 60873)
- Added MSQ support for selective lookup loading (id: 60714)
- Changed lookups for compaction tasks to no longer unnecessarily load by default (#16420) (id: 61036)
- Fixed an issue where a rolling upgrade or downgrade caused batch ingestion tasks to fail (#16556) (id: 61523)
- Fixed a race condition that could occur when you queried data that was in Azure-based deep storage (#16525) (id: 61462)
- Fixed an issue where having two exact COUNT(DISTINCT) aggregations with certain conditions produced a data correctness issue (#16402) (id: 60355)
- Fixed an issue where small concurrent lookup queries time out after 5 minutes (id: 60657)
- Fixed a NPE in the segment schema cache (#16404) (id: 61289)
- Fixed an issue where sorting on a delta column in the Explore view table results in an error (#16417) (id: 61283)
- Fixed an issue with Geo columns where deserialization on Peon services violated the buffer (#16389) (id: 60896)
- Fixed an issue where an exception led to columns leaking (#16365) (id: 60874)
- Fixed an issue where sketches weren't downsampled sufficiently. This could lead to situations where sketches exceeded their allowed memory usage (#16119) (id: 43055)
- Fixed issues with type-aware schema discovery related to grouping:
  - Inconsistent results when grouping on real-time data with type-aware schema discovery
  - Discovered LONG and DOUBLE type columns incorrectly reported not having null values, resulting in incorrect null handling when grouping (#16489) (id: 61259)
- Improved how the web console detects durable storage settings (#16493) (id: 61329)
- Improved the MSQ export log error message (#16363) (id: 61287)
- Improved the web console to use globs instead of filters for files (#16452) (id: 61281)
- Improved the web console's Download all button to produce a single file with concatenated data instead of individual files (#16375) (id: 60857)
- Improved the Supervisor view in the web console to provide information dynamically (#16318) (id: 60840)
- Improved the speed of SQL IN queries that use the SCALAR_IN_ARRAY function (#16388) (id: 43245) (id: 42869)
- Improved sketches that use the MSQ task engine to reduce memory usage when transferring sketches between the controller and worker (#16269) (id: 42171)
- Updated the web console's Druid doctor check to accept Java 17 (#16250) (id: 61280)
- Updated `org.scala-lang:scala-library` from 2.13.11 to 2.13.14 (#16364) (id: 61291)
- Updated the MySQL JDBC connector (`mysql:mysql-connector-java`) to 8.2.0 (#16024) (id: 42634)
Imply Manager changes
- You can now add PreStop hooks to query and master nodes through the Helm chart. Additionally, query pods now include a default PreStop hook that provides time for LoadBalancers and Ingress controllers to reconcile (id: 60925)
- Added support for `c6g.2xlarge` (AWS) query nodes in Imply Hybrid (id: 60869)
Changes in 2024.05.2
Pivot changes
- Fixed an issue where filter tokens weren't correctly applied when a data cube also contained a subset filter definition. For details, see the following Imply Knowledge Base article (id: 61235)
Changes in 2024.05.1
Pivot changes
- Improved the Pivot alerts API to provide the details for a planned query (id: 40980)
Druid changes
- Fixed an issue with new tasks failing to get the location of currently running tasks (id: 61110)
- Fixed an issue with numeric replace-with-default behavior when using Druid legacy null handling mode (id: 61139)
SaaS Clarity changes
- Added a Leader dimension to the `service/heartbeat` metric (id: 43031)
- Fixed an issue with Pivot query cancellation (id: 43222)
- Disabled custom time alerts for less than one minute (id: 43223)
- Disabled alert previews (id: 43263)
Changes in 2024.05
Pivot highlights
Download types feature flag
If you're using the Async download (alpha) feature, a new feature flag Download types (Alpha) allows you to control the download options available to users in data cubes. The feature flag accepts a JSON array of values to enable download types and set their priority:
- `["standard","async"]` sets standard download as the default experience, with async download (alpha) available as a secondary option.
- `["async","standard"]` sets async download (alpha) as the default experience, with standard download available as a secondary option.
- `["standard"]` sets standard download as the only download experience.
- `["async"]` sets async download (alpha) as the only download experience.
If the array is invalid, Pivot applies `["standard","async"]`. An empty array turns off all downloads.
See Download data for more information.
(id: 43248)
Druid highlights
Zookeeper-based segment loading turned off
Zookeeper-based segment loading is no longer supported. You do not need to take any action. Druid ignores the following related configs:
- `druid.coordinator.load.timeout`
- `druid.coordinator.loadqueuepeon.type`
- `druid.coordinator.curator.loadqueuepeon.numCallbackThreads`
Druid now only uses the recommended HTTP loading, which includes improvements to the Coordinator service such as smart segment loading.
As part of this change, compaction configs for inactive datasources are automatically cleaned up by default.
(#15705) (id: 60764)
Manifest files for MSQ task engine exports (beta)
Export queries that use the MSQ task engine now also create a manifest file at the destination, which lists the files created by the query.
During a rolling update, older versions of workers don't return a list of exported files, and older Controllers don't create a manifest file. Therefore, export queries run during this time might have incomplete manifests.
(#15953) (id: 42101)
MSQ task engine compaction state
You can now include a `storeCompactionState` context parameter in MSQ task engine replace queries. If set to `true`, segment metadata includes the last compaction state, which allows compaction jobs to skip segments where the compaction state matches the desired state (#15965) (id: 60301) (id: 39754)
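A hypothetical sketch of setting the parameter via a SQL statements API payload; the datasource name and query are illustrative:

```json
{
  "query": "REPLACE INTO \"wikipedia\" OVERWRITE ALL SELECT * FROM \"wikipedia\" PARTITIONED BY DAY",
  "context": {
    "storeCompactionState": true
  }
}
```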
Kinesis autoscaling
The Kinesis autoscaler now considers max lag in minutes instead of total lag. To maintain backwards compatibility, this change is opt-in for existing Kinesis connections. To opt in, set `lagBased.lagAggregate` in your supervisor spec to `MAX`. New connections use max lag by default. (#16284) (id: 60222) (#16334) (id: 60672) (#16314) (id: 60572)
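A sketch of the opt-in for an existing connection. Only `lagAggregate: "MAX"` comes from this note; the exact nesting and the surrounding autoscaler fields are assumptions, so check them against your supervisor spec:

```json
{
  "ioConfig": {
    "autoScalerConfig": {
      "enableTaskAutoScaler": true,
      "autoScalerStrategy": "lagBased",
      "lagAggregate": "MAX"
    }
  }
}
```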
Double and null values in arrays
Druid now supports double or null values for SQL array types when you use dynamic parameters in a query.
(#16274) (id: 60410)
New SCALAR_IN_ARRAY function
You can now use the `SCALAR_IN_ARRAY(expr, arr)` function to check if a scalar expression appears in an array.
(#16306) (id: 60546)
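For example, a hypothetical query (datasource and column names are illustrative) that keeps rows whose `country` value appears in a literal array:

```sql
SELECT *
FROM "events"
WHERE SCALAR_IN_ARRAY("country", ARRAY['US', 'CA', 'MX'])
```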
Improved native queries
Native queries can now group on nested columns and arrays.
(#16068) (id: 42483)
Improved performance for LIKE filters
Previously, simple regular expressions could trigger backtracking that dramatically increased query time, for example, an expression with a few `%` wildcards. Druid now uses a simple greedy algorithm that avoids backtracking, improving query performance by up to 20% in worst-case scenarios.
(#43169) (id: 43169)
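For example, a filter like the following sketch (hypothetical datasource and column) is the kind of multi-wildcard pattern that benefits:

```sql
SELECT COUNT(*)
FROM "events"
WHERE "url" LIKE '%product%checkout%'
```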
Centralized datasource schema (alpha)
You can now configure Druid to centralize schema management using the Coordinator service. Previously, Brokers needed to query data nodes and tasks for segment schemas. Centralizing datasource schemas can improve startup time for Brokers and the efficiency of your deployment.
If enabled, the following changes occur:
- Realtime segment schema changes get periodically pushed to the Coordinator
- Tasks publish segment schemas and metadata to the metadata database
- The Coordinator service polls the schema and segment metadata to build datasource schemas
- Brokers fetch datasource schemas from the Coordinator when possible. If not, the Broker builds the schema.
This behavior is currently opt-in. To enable this feature, set the following configs:
- In your common runtime properties, set `druid.centralizedDatasourceSchema.enabled` to true.
- If you're using MiddleManagers, you also need to set `druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled` to true in your MiddleManager runtime properties.
You can return to the previous behavior by changing the configs to false.
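A sketch of the two settings named above, placed in the respective runtime properties files:

```properties
# common.runtime.properties
druid.centralizedDatasourceSchema.enabled=true

# MiddleManager runtime.properties (only if you use MiddleManagers)
druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled=true
```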
You can configure the following properties to control how the Coordinator service handles unused segment schemas:
| Name | Description | Required | Default |
|---|---|---|---|
| `druid.coordinator.kill.segmentSchema.on` | Boolean value for enabling automatic deletion of unused segment schemas. If set to true, the Coordinator service periodically identifies segment schemas that are not referenced by any used segment and marks them as unused. At a later point, these unused schemas are deleted. | No | True |
| `druid.coordinator.kill.segmentSchema.period` | How often to do automatic deletion of segment schemas, in ISO 8601 duration format. Value must be equal to or greater than `druid.coordinator.period.metadataStoreManagementPeriod`. Only applies if `druid.coordinator.kill.segmentSchema.on` is set to true. | No | P1D |
| `druid.coordinator.kill.segmentSchema.durationToRetain` | ISO 8601 duration for the time a segment schema is retained from when it's marked as unused. Only applies if `druid.coordinator.kill.segmentSchema.on` is set to true. | Yes, if `druid.coordinator.kill.segmentSchema.on` is set to true. | P90D |
In addition, there are new metrics for monitoring after enabling centralized datasource schemas:
- `metadatacache/schemaPoll/count`
- `metadatacache/schemaPoll/failed`
- `metadatacache/schemaPoll/time`
- `metadatacache/init/time`
- `metadatacache/refresh/count`
- `metadatacache/refresh/time`
- `metadatacache/backfill/count`
- `metadatacache/finalizedSegmentMetadata/size`
- `metadatacache/finalizedSegmentMetadata/count`
- `metadatacache/finalizedSchemaPayload/count`
- `metadatacache/temporaryMetadataQueryResults/count`
- `metadatacache/temporaryPublishedMetadataQueryResults/count`
For more information, see Metrics.
(#15817) (id: 37627) (id: 36862)
Imply Manager highlights
New configurable security policies
In Settings > Security policy, you can configure custom password options and login throttling.
For passwords, some of the parameters you can configure include the following:
- Minimum and maximum password length
- Minimum passphrase length
- Password lifetime
- Password history
For login throttling options, you can configure the following:
- Lockout tries: the number of attempts before a user account is locked out
- Lockout duration: the amount of time the user account is locked out for
- Disable tries: the number of attempts before a user account is disabled
- Disable duration: the amount of time the user account is disabled for
(id: 43193)
Other changes
Pivot changes
- Improved the flat table, gauge, time series, and overall (beta) visualizations to allow users to drag-and-drop dimensions, measures, and comparisons into the visualization. This functionality is already available in other visualizations (id: 60621)
- Improved custom date time formatting (id: 40619)
- Improved the `PIVOT_NESTED_AGG` function (id: 43264)
- Improved the experience of switching from the overall visualization to the overall (beta) visualization (id: 60607)
- Fixed `rawDownloadLimit` not working in data cube advanced options (id: 34072)
- Fixed an issue that prevented the All rows async download option from working in the records table visualization (id: 60110)
- Fixed a rendering issue for overall and flat table visualizations on data cubes without a primary time dimension (id: 60151)
- Fixed an issue with alert webhook URLs missing protocol (id: 60383)
- Fixed a problem with Ok and Cancel buttons not appearing when multiple time ranges were applied to a data cube (id: 60502)
- Fixed an issue where hiding 'overall' for a column dimension would also hide the dimension name (id: 40649)
- Fixed an issue with dashboard filters resetting from "greater than or equal to/less than or equal to" to "greater than/less than" (id: 40777)
- Fixed a problem with the time dimension bucket option "default, no bucket" (id: 41304)
- Fixed an issue where dashboards with mixed tile types didn't apply filters to some visualizations (id: 41787)
- Fixed a problem with the table visualization hiding nested split values in some circumstances (id: 42517)
- Fixed an issue where users without the `accessDataCubes` permission could expand a dashboard tile and navigate directly to a data cube view (id: 60285)
- Fixed a problem with the overall (beta) visualization when using a LONG column as primary time dimension (id: 43293)
Druid changes
- Added support for selective loading of lookups so that MSQ task engine workers don't load unnecessary lookups (#16328) (id: 40610)
- Added the JVM version to JVM monitor metrics (#16262) (id: 60386)
- Added a new index for the pending segments table for the datasource and `task_allocator_id` columns (#16355) (id: 60743)
- Changed the web console to no longer send transform expressions containing lookups to the sampler, which always resulted in an error. The web console now uses a placeholder (#16234) (id: 60276)
- Changed the upload buffer size in GoogleTaskLogs to 1 MB instead of 15 MB to allow more uploads in parallel and prevent the MiddleManager service from running out of memory (#16236) (id: 60293)
- Changed the default value of `useMaxMemoryEstimates` for Hadoop jobs to false (#16280) (id: 60688)
- Fixed an issue with concurrent replace where you might get duplicate query results due to a race condition (#16144) (id: 60300)
- Fixed an issue where the log count for the number of datasources affected by auto-kill was wrong (#16341) (id: 60706)
- Fixed an issue where join queries on complex data types return the wrong results (id: 60698)
- Fixed an issue where concurrent replace skipped intervals locked by append locks during compaction (#16316) (id: 60583)
- Fixed an issue where the query context parameter `enableTimeBoundaryPlanning: true` made a max time query return incorrect results when a virtual column was used. TimeBoundary queries don't support virtual columns (id: 60682)
- Fixed an issue where TimeBoundary queries incorrectly allowed filters that require virtual columns. TimeBoundary queries don't support virtual columns (#16337) (id: 60686)
- Fixed an incorrect check while generating an MSQ task engine error report (#16273) (id: 60577)
- Fixed the supervisor offset reset dialog in the web console (#16298) (id: 60571)
- Fixed an exception that occurs while loading lookups from an empty JDBC source (#16307) (id: 60539)
- Fixed query timer issues in the web console (#16235) (id: 60266)
- Fixed windowed aggregates so that they update the aggregation value based on the final compute (#16244) (id: 60390)
- Fixed an issue where ORDER BY gets ignored on certain GROUPING SETS (#16268) (id: 60389)
- Fixed CVEs (#16147) (id: 60279)
- Fixed an issue where the web console could return incorrect creation times and durations for tasks after the Overlord service restarts (#16228) (id: 60223)
- Fixed issues with the first/last vector aggregators (#16230) (id: 47262)
- Fixed an issue where groupBy queries that have `bit_xor() is null` return the wrong result (#16237) (id: 42556)
- Fixed an issue where `ipv4_parse()` returns an assertion error instead of null on an invalid IP address string literal (#15916) (id: 42199)
- Fixed an issue where Broker merge buffers get into a deadlock when multiple simultaneous queries use them (#15420) (id: 37984)
- Improved the feedback you receive in the task status when a task fails due to lock revocation exceptions (#16325) (id: 60710)
- Improved how scalars work in arrays (#16311) (id: 60705)
- Improved how Druid parses JSON by using `charsetFix` (#16212) (id: 60708)
- Improved the error message when a task fails before becoming ready (#16286) (id: 60525)
- Improved the performance of OR filters in certain use cases (#16300) (id: 60507)
- Improved the performance of queries that use filter bundles (#16292) (id: 60500)
- Improved the user experience for the web console to better indicate when it is in manual capability detection mode and limited features are available (#16191) (id: 60288)
- Improved the error messages when a supervisor's checkpoint state is invalid (#16208) (id: 60236)
- Improved MSQ task engine reports to show why range partitioning was not chosen (#16175) (id: 42652)
Imply Manager changes
- The root password for Imply Enterprise on Kubernetes deployments, including GKE and AKS, now expires one year after startup. Previously, the password expired one year after the image was created (id: 60397)
- You can now specify GCP resource labels in the installation script for Imply Enterprise on GKE (id: 43069)
Changes in 2024.04.1
Pivot changes
- Fixed an issue where filter tokens weren't correctly applied when a data cube also contained a subset filter definition. For details, see the following Imply Knowledge Base article (id: 61235)
Changes in 2024.04
Pivot highlights
Database auth tokens
You can now create a database auth token on a Pivot role to enable access to specific Druid data. See Database auth tokens for more information.
(id: 41170)
Druid highlights
Improved array ingest mode
The `array` mode for `arrayIngestMode` contains improvements that make it the best choice for any new datasources that contain arrays. Imply strongly recommends that you use `array` mode instead of `mvd` mode. `array` mode provides a better experience, including support for a wider range of array types. Continued improvements to the `array` ingest mode and array-typed columns are on the roadmap. Additionally, you can avoid certain limitations of `mvd` mode by using `array` mode.
The following list describes the behavior based on what you set `arrayIngestMode` to:
- If you set it to `array`, SQL ARRAY types are stored using Druid array columns. This is recommended for new tables.
- If you set it to `mvd`, SQL `VARCHAR ARRAY` types are implicitly wrapped in `ARRAY_TO_MV`. This causes them to be stored as multi-value strings, using the same `STRING` column type as regular scalar strings. This is the default behavior when `arrayIngestMode` is not provided in your query context.
- If you set it to `none`, Druid throws an exception when trying to store any type of array.
The following table summarizes the differences in SQL ARRAY handling between `arrayIngestMode: array` and `arrayIngestMode: mvd`:

| SQL type | Stored type when `arrayIngestMode: array` | Stored type when `arrayIngestMode: mvd` (default) |
|---|---|---|
| VARCHAR ARRAY | ARRAY&lt;STRING&gt; | multi-value STRING |
| BIGINT ARRAY | ARRAY&lt;LONG&gt; | not possible (validation error) |
| DOUBLE ARRAY | ARRAY&lt;DOUBLE&gt; | not possible (validation error) |
In either mode, you can explicitly wrap string arrays in `ARRAY_TO_MV` to cause them to be stored as multi-value strings.
Note that you cannot mix string arrays and multi-value strings in the same column.
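For example, the following SQL-based ingestion sketch stores the same source array twice: once as an array column and once as a multi-value string wrapped in `ARRAY_TO_MV`. This is a minimal illustration only; the `events` table, the column names, and the inline data are hypothetical:

```sql
-- Run with arrayIngestMode: array in the query context.
-- "events", "tags", and "tagsMv" are hypothetical names.
INSERT INTO events
SELECT
  TIME_PARSE("timestamp") AS __time,
  "tags",                          -- stored as ARRAY<STRING>
  ARRAY_TO_MV("tags") AS "tagsMv"  -- stored as a multi-value STRING
FROM TABLE(
  EXTERN(
    '{"type":"inline","data":"{\"timestamp\":\"2024-01-01T00:00:00Z\",\"tags\":[\"a\",\"b\"]}"}',
    '{"type":"json"}',
    '[{"name":"timestamp","type":"string"},{"name":"tags","type":"ARRAY<STRING>"}]'
  )
)
PARTITIONED BY DAY
```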
(#15920) (id: 43043)
Other changes
Pivot changes
- You can now use asymmetric number range filters with flat table, gauge, time series, and overall (beta) visualizations (id: 58507)
- Added query precision and query caching session persistence (id: 43251)
- Added the `FilterWithRegex` permission, which allows users to use the regex filter for string dimensions (id: 42840)
- Added the Pivot server configuration property `disableExternalEmails`, which allows administrators to disable sending alerts and reports to external email addresses (id: 42201)
- Added the ability to include only one value in a number filter (id: 43253)
- Improved the performance and behavior of the `PIVOT_NESTED_AGG` function (id: 43264)
- Fixed an issue with the line chart visualization causing dashboards to crash (id: 58506)
- Fixed an issue with the street map visualization including all data cube Latitude/Longitude dimensions, even the ones not in the visualization (id: 58505)
- Fixed an issue with the y-axis extending above the line chart visualization boundary for very small values (id: 43300)
- Fixed unnecessary query in axis-query generation (id: 43244)
- Fixed an issue with table columns not respecting the time format (id: 43181)
- Fixed an incorrect x-axis in the time series visualization (id: 42725)
- Fixed dashboard time filter "include end bound" not working in gauge, flat table, time series, and overall (beta) visualizations (id: 42406)
- Fixed flat table changes not being propagated from data cube to dashboard (id: 42275)
- Fixed time comparison not working as expected in a line chart visualization when bucketing <=5 minutes (id: 41982)
- Fixed an issue where dashboard filters reset from Greater than or equal and Less than or equal to Greater than and Less than (id: 40777)
Druid changes
- Added more logging for S3 retries (#16117) (id: 43161)
- Added a new IN filter that preserves the input types (id: 41500)
- Added a new typed IN filter (#16039) (id: 48937)
- Added an error code to the InternalServerError failure type (#16186) (id: 54432)
- Added support for using window functions with the MSQ task engine as the query engine (#15470) (id: 39416)
- Added support for joins in decoupled mode (#15957) (id: 42763)
- Added `segmentsRead` and `segmentsPublished` fields to parallel compaction task completion reports so that you can see how effective a compaction task is (#15947) (id: 38574)
- Added a new `task/autoScaler/requiredCount` metric that provides a count of required tasks based on the calculations of the `lagBased` autoscaler. Compare that value to `task/running/count` to discover the difference between the current and desired task counts (#16199) (id: 58510)
- Changed the controller checker for the MSQ task engine to check for closed only (#16161) (id: 43289)
- Added geospatial interfaces (#16029) (id: 60162)
- Fixed ColumnType to RelDataType conversion for nested arrays (#16138) (id: 43178)
- Fixed windowing `scanAndSort` query issues on top of joins (#15996) (id: 42717)
- Fixed `REGEXP_LIKE`, `CONTAINS_STRING`, and `ICONTAINS_STRING` so that they correctly return null for null value inputs in ANSI SQL compatible null handling mode (the default configuration). Previously, they returned false (#15963) (id: 43288)
- Fixed the Azure icon not rendering in the web console (#16173) (id: 43286)
- Fixed a bug in the `MarkOvershadowedSegmentsAsUnused` Coordinator duty to also consider segments that are overshadowed by a segment that requires zero replicas (#16181) (id: 43285)
- Fixed issues with `ARRAY_CONTAINS` and `ARRAY_OVERLAP` with null left side arguments, as well as `MV_CONTAINS` and `MV_OVERLAP` (#15974) (id: 43162)
- Fixed an issue where numeric LATEST_BY and EARLIEST_BY aggregations show incorrect results (#15939) (id: 42342)
- Fixed a bug in the `markUsed` and `markUnused` APIs where an empty set of segment IDs would be inconsistently treated as null or non-null in different scenarios (#16145) (id: 43153)
- Fixed a bug where export queries did not use the specified output names and exported the temporary column names instead for some queries, such as GROUP BY (#16096) (id: 42826)
- Fixed a bug where `numSegmentsKilled` is reported incorrectly (#16103) (id: 42960)
- Fixed an issue with metric emission in the segment generation phase (#16146) (id: 43152)
- Fixed a data race in getting results from MSQ select tasks (#16107) (id: 43000)
- Fixed an issue that can occur when using schema auto-discovery on columns with a mix of array and scalar values and querying them with scan queries (#16105) (id: 43007)
- Fixed a bug where completion task reports are not generated for `index_parallel` tasks (#16042) (id: 42805)
- Fixed an issue where `safe_divide` queries returned "Calcite assertion violated" errors (id: 41766)
- Fixed an issue where SQL-based ingestion fails if the first monitor for `druid.server.metrics.ServiceStatusMonitor` is `ServiceStatusMonitor` (id: 38520)
- Improved ingestion performance by parsing an input stream directly instead of converting it to a string and parsing the string as JSON (#15693) (id: 57692)
- Improved optimizations to the MSQ task engine for real-time queries so that they are backwards compatible (id: 42658)
- Improved serialization of TaskReportMap (#16217) (id: 60179)
- Improved the creation of input row filter predicate in various batch tasks (#16196) (id: 56861)
- Improved how tasks are fetched from the Overlord to redact credentials (#16182) (id: 52829)
- Improved the web console to only pick the Kafka input format by default when needed (#16180) (id: 60186)
- Improved compaction segment read and published fields to include sequential compaction tasks (#16171) (id: 60142)
- Improved the `markUnused` API endpoint to handle an empty list of segment versions (#16198) (id: 56864)
- Improved the `segmentIds` filter in the `markUsed` API payload so that it's parameterized in the database query (#16174) (id: 47268)
- Improved how quickly workers get canceled for the MSQ task engine (#16158) (id: 43179)
- Improved the MSQ task engine to support `IS NOT DISTINCT FROM` for SortMerge joins (#16003) (id: 43099)
- Improved the download query detail archive option in the web console to be more resilient when the detail archive is incomplete (#16071) (id: 42908)
- Improved the UX for `arrayIngestMode` in the web console (#15927) (id: 43038)
- Improved array handling for Booleans to account for queries such as `select array[true, false] from datasource` (#16093) (id: 42963) (id: 42610)
- Improved nested columns. Nested column serialization now releases nested field compression buffers as soon as the nested field serialization is completed, which requires significantly less direct memory during segment serialization when many nested fields are present (#16076) (id: 42955)
- Improved querying to decrease the chance of out-of-memory errors for GROUP BY queries on high cardinality data (#16114) (id: 42502)
- Improved real-time queries that use the MSQ task engine by changing how segments are grouped (#15399) (id: 39167)
- Optimized `isOvershadowed` when there is a unique minor version for an interval (#15952) (id: 43287)
- Updated the following dependencies:
  - `redis.clients:jedis` from 5.0.2 to 5.1.2 (#16074) (id: 42909)
  - `express` from 4.18.2 to 4.19.2 in the web console (#16204) (id: 60147)
  - `webpack-dev-middleware` from 5.3.3 to 5.3.4 in the web console (#16195) (id: 60146)
  - `follow-redirects` from 1.15.5 to 1.15.6 in the web console (#16134) (id: 43157)
  - `axios` in the web console (#16087) (id: 42954)
  - `druid-toolkit` from 0.21.9 to 0.22.11 in the web console (#16213) (id: 60144)
Clarity changes
- Disabled alert custom time periods of less than one minute (id: 43223)
Imply Manager changes
- Allowed all kube-system pods to be moved by the cluster-autoscaler in GKE (id: 43198)
- Prevented Middle Managers from being replaced before task status is synced during rolling updates (id: 40137)
- Imply Hybrid on AWS:
- Enabled ServiceStatusMonitor by default (id: 38540)
- Fixed cluster manager API so that custom extensions are not removed (id: 33190)
Changes in 2024.03.2
Pivot changes
- Fixed an issue where filter tokens weren't correctly applied when a data cube also contained a subset filter definition. For details, see the following Imply Knowledge Base article (id: 61235)
Changes in 2024.03.1
Druid changes
- Fixed an issue where the Overlord process could fail to return the location of tasks (id: 60106)
Changes in 2024.03
Pivot highlights
Time series visualization supports more functions
The time series visualization now supports the following time series functions in addition to TIMESERIES and DELTA_TIMESERIES:
- ADD_TIMESERIES
- DIVIDE_TIMESERIES
- MULTIPLY_TIMESERIES
- SUBTRACT_TIMESERIES
The configuration options have also been simplified. See Visualization reference for details.
(id: 40670)
Druid highlights
Dynamic table append
You can now use the `TABLE(APPEND(...))` function to implicitly create unions based on table schemas. For example, the following two queries are equivalent:

```sql
TABLE(APPEND('table1','table2','table3'))
```

and

```sql
SELECT column1, NULL AS column2, NULL AS column3 FROM table1
UNION ALL
SELECT NULL AS column1, column2, NULL AS column3 FROM table2
UNION ALL
SELECT column1, column2, column3 FROM table3
```
Note that if the same columns are defined with different input types, Druid uses the least restrictive column type.
(#15897) (id: 42645)
Renamed segment kill metric
The `kill/candidateUnusedSegments/count` metric is now called `kill/eligibleUnusedSegments/count`.
(#15977) (id: 42492)
Improved streaming task completion reports
Streaming task completion reports now have an extra field, `recordsProcessed`. The field lists the partitions processed by that task and the count of records for each partition. You can look at this field to see the actual throughput of tasks and make decisions on whether to scale your workers vertically or horizontally.
(#15930) (id: 42430)
Improved Supervisor rolling restarts
The `stopTaskCount` config now prioritizes stopping older tasks first. As part of this change, you must also explicitly set a value for `stopTaskCount`. It no longer defaults to the same value as `taskCount`.
(#15859) (id: 42143) (id: 40605)
Parallelized incremental segment creation
You can now configure the number of threads used to create and persist incremental segments on the disk using the `numPersistThreads` property. Use additional threads to parallelize segment creation and prevent ingestion from stalling or pausing frequently, as long as sufficient CPU resources are available.
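For example, a streaming supervisor's tuning config might raise the thread count. This is a minimal sketch, assuming a Kafka supervisor; the value shown is illustrative:

```json
"tuningConfig": {
  "type": "kafka",
  "numPersistThreads": 2
}
```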
(#13982) (id: 32098)
Fixes for deep storage on Google Cloud Storage
This release contains fixes for customers using deep storage on GCS. The issues were caused by updates to the Google Cloud client libraries from an older API client. Affected versions of Imply were 2024.01 STS through 2024.02.3 STS. For remediation steps for kill task failures, see Remove orphaned segments in deep storage.
- Fixed kill task failures caused when trying to delete a file that doesn't exist in Google Cloud Storage (#16047) (id: 42663)
- Fixed an issue where Druid incorrectly deleted task log events when `druid.indexer.logs.kill.enabled` is active due to a mismatch in time units between the Druid configuration and the Google Cloud client (#16083) (id: 42838)
- Fixed errors when loading lookups sourced from GCS buckets where the fetched GCS object version is null (#16097) (id: 42916)
Improved query performance for AND filters
Druid query processing now adaptively determines when children of AND filters should compute indexes and when to simply match rows during the scan based on selectivity of other filters. Known as "filter partitioning," it can result in dramatic performance increases, depending on the order of filters in the query.
For example, take a query like `SELECT SUM(longColumn) FROM druid.table WHERE stringColumn1 = '1000' AND stringColumn2 LIKE '%1%'`. Previously, Druid used indexes when processing filters if they were available. That's not always ideal; imagine if `stringColumn1 = '1000'` matches 100 rows. With indexes, Druid has to find every value of `stringColumn2 LIKE '%1%'` that is true in order to compute the indexes for the filter. If `stringColumn2` has more than 100 values, that ends up being more work than simply checking for a match in those 100 remaining rows.
With the new logic, Druid checks the selectivity of indexes as it processes each clause of the AND filter. If it determines it would take more work to compute the index than to match the remaining rows, Druid skips computing the index.
The order in which you write filters in the WHERE clause of a query can improve the performance of the query. More improvements are coming, but you can try out the existing improvements by reordering a query. Put indexes that are less intensive to compute, such as IS NULL, =, and comparisons (`>`, `>=`, `<`, and `<=`), near the start of AND filters so that Druid processes your queries more efficiently. Not ordering your filters this way won't degrade performance relative to previous releases, since the fallback behavior is what Druid did previously.
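The following sketch restates the example above with the recommended ordering: the inexpensive equality clause comes first and the expensive LIKE clause comes last:

```sql
-- Cheap-to-index equality first, expensive pattern match last.
SELECT SUM(longColumn)
FROM druid.table
WHERE stringColumn1 = '1000'
  AND stringColumn2 LIKE '%1%'
```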
(#15838) (id: 41535)
Other changes
Pivot changes
- Added the `AccessDownloadAsync` permission to allow users to access the async download (alpha) feature when the feature is enabled for your organization (id: 42274)
- You can now set the Latest data strategy to Query the latest timestamp from the data source, relative to the latest full day in the advanced data cube options (id: 39634)
- You can now set the default view in a data cube's defaults to be a gauge, flat table, time series, or overall (beta) visualization (id: 41373)
- You can now choose whether or not to display the year in time values in a table visualization (id: 40988)
- Fixed an issue where filters and shown dimensions and measures were not preserved when switching to some visualization types (id: 41059)
- Fixed Pivot showing an error for some time comparisons in a data cube (id: 41013)
- Fixed a rounding issue in the display of dimensions (id: 42654)
- Fixed downloads limited to 5,000 rows for flat table, gauge, time series, and overall (beta) visualizations (id: 42600)
- Fixed failed async downloads producing a truncated file instead of an error (id: 42595)
- Fixed query precision issues (ids: 42521, 42227, 42230)
- Fixed async downloads not working with "previous period" comparisons (id: 42247)
- Fixed Pivot crashing when applying a filter to the records visualization (id: 42189)
- Fixed dashboard tiles causing save conflicts in flat table, gauge, time series, and overall (beta) visualizations (id: 41414)
- Fixed lack of indication when data cube is refreshed (id: 40260)
Druid changes
- Added support for single value aggregated Group By queries for scalars (#15700) (id: 41951)
- Added support for numeric arrays to columnar frames, which are used in subquery materializations and window functions (#15917) (id: 41784)
- Added the ability to set custom dimensions for events emitted by the Kafka emitter as a JSON map for the `druid.emitter.kafka.extra.dimensions` property. For example, `druid.emitter.kafka.extra.dimensions={"region":"us-east-1","environment":"preProd"}` (#15845) (id: 41961)
- Added more AWS Kinesis regions and groups to the web console (#15900) (id: 42476)
- Added support to the web console for Protobuf input formats and the Avro bytes decoder (#15950) (id: 42461)
- Changed the format of the value of `targetDataSource` in EXPLAIN clauses for SQL-based ingestion queries back to being a string. For some recent releases, it was a JSON object (#16004) (id: 42575)
- Changed the severity of a `k8sTaskRunner` log message to WARN (#15871) (id: 42303)
- Changed the `durationMs` properties in MSQ task reports to exclude worker/controller startup time (id: 40311)
- Fixed an issue where queries that use LATEST_BY or EARLIEST_BY return null when they contain a secondary timestamp column (#15939) (id: 42917)
- Fixed an issue where Druid incorrectly deleted task log events when `druid.indexer.logs.kill.enabled` is active due to a mismatch in time units between the Druid configuration and the Google Cloud client (#16083) (id: 42838)
- Fixed errors when loading lookups sourced from GCS buckets where the fetched GCS object version is null (#16097) (id: 42916)
- Fixed an issue where the data loader for the web console crashes when attempting to parse data that can't be parsed (#15983) (id: 42649)
- Fixed an issue where string inputs are ingested as null values when they are typed as LONG or BIGINT. For example, decimals like "1.0" or "1.23" were incorrectly treated as NULL instead of 1 or 1.23. (#15615) (id: 42657)
- Fixed an issue where compaction tasks reports got overwritten. New entries are written to the report instead (#15981) (id: 42673)
- Fixed an issue that occurred when the `castToType` parameter is set on an `auto` column schema (#15921) (id: 42434)
- Fixed an issue where `flattenSpec` is in the wrong location if you use the web console to generate the supervisor spec for a Kafka ingestion (#15946) (id: 42433)
- Fixed an issue where Kubernetes environment variables that use underscores would be parsed incorrectly (#15919) (id: 42336)
- Fixed an issue where the wrong base template would be used for task types included through extensions, such as `index_kinesis`. For example, if you define `druid.indexer.runner.k8s.podTemplate.index_kafka`, the KubernetesTaskRunner still used `druid.indexer.runner.k8s.podTemplate.base` as the base template for tasks (#15915) (id: 42293)
- Fixed an issue where a query returns the wrong results if `PARSE_LONG` is null (#15909) (id: 42134)
PARSE_LONG
is null (#15909) (id: 42134) - Fixed an issue where Druid incorrectly deleted task log events when
druid.indexer.logs.kill.enabled
is active due to a mismatch in time units between Druid configuration and the Google Cloud client (#16083) (id: 42838) - Fixed an issue where MSQ task engine results are truncated and return an error (#16107)
- Improved Connection Count server select strategy to account for slow connection requests (#15975) (id: 42662)
- Improved the retry behavior for deep storage connections (#15938) (id: 42690)
- Improved how segments are counted so that segments still available through deep storage (replicas set to 0) are not marked as unavailable (#16020) (id: 42656)
- Improved the error message for when an MSQ task engine-based join using the `sortMerge` option falls back to a broadcast join (#16002) (id: 42655)
- Improved `druid-basic-security` performance by using the cache for the password hash when validating LDAP passwords (#15993) (id: 42650)
- Improved concurrent replace to work with supervisors using concurrent locks (#15995) (id: 42648)
- Improved the web console to detect doubles better (#15998) (id: 42646)
- Improved the web console to be able to search in tables and columns (#15990) (id: 42647)
- Improved segment troubleshooting. Segments created in the same batch have the same `created_date` entry (#15977) (id: 42492)
- Improved the web console to support export with the MSQ task engine (#15969) (id: 42460)
- Improved how connections are counted and servers are selected to account for slow connections (#15975) (id: 42407)
- Improved the web console to allow compaction config slots to drop to 0, such as when compaction is paused (#15877) (id: 42178)
- Improved the web console to include system fields when using the batch data loader (#15858) (id: 41918)
- Updated PostgreSQL from 42.6.0 to 42.7.2 (#15931) (id: 42432)
- Improved performance for real-time queries that use the MSQ task engine (#15399) (id: 39167)
- Improved the Coordinator process to better handle an uninitialized cache in node role watchers, which could lead to stuck tasks (#15726) (id: 39099)
- Improved how expressions are evaluated to ensure thread safety (#15694) (id: 42620)
- Improved batching of scan results while estimating bytes (#15987) (id: 42507)
- Updated Log4j from 2.18.0 to 2.22.1 (#15934) (id: 42431)
Platform changes
- Account settings now display in Imply Hybrid Manager in SSO mode (id: 42372)
- Fixed an issue with Imply Enterprise on GKE deployments where passing a custom certificate authority certificate for a MySQL instance causes the installation to fail (id: 42316)
- Fixed a race condition that could cause Enterprise deployments on GKE to fail to start because of files missing from the configuration bundle (id: 42747) (id: 42726)
- Fixed an issue where GCP automatically deploys a managed Prometheus instance causing pod exhaustion. Imply Enterprise on GKE turns off these automatic Prometheus deployments by default now (id: 42567)
Changes in 2024.02.4
Pivot changes
- Fixed an issue where filter tokens weren't correctly applied when a data cube also contained a subset filter definition. For details, see the following Imply Knowledge Base article (id: 61235)
Changes in 2024.02.3
Druid changes
- Fixed an issue where string inputs are ingested as null values when they are typed as LONG or BIGINT. For example, decimals like "1.0" or "1.23" were incorrectly treated as NULL instead of 1 or 1.23. For details, see the following Imply Knowledge Base article (id: 42545)
Changes in 2024.02.2
Druid changes
- Fixed an issue with filters on expression virtual column indexes incorrectly considering values null in some cases for expressions which translate null values into not null values (id: 42448)
Changes in 2024.02.1
Druid changes
- Fixed an issue where the Druid console generates a Kafka supervisor spec where `flattenSpec` is in the wrong place, causing it to be ignored (#15946)
Pivot changes
- Fixed an issue where Pivot closes unexpectedly when you open the records visualization and apply a filter (id: 42189)
Platform changes
- Fixed an issue with the GKE enhanced installation where passing a custom certificate authority certificate for a MySQL instance causes the installation to fail (id: 42316)
Changes in 2024.02
Pivot highlights
New overall visualization (beta)
A new overall visualization includes a trend line and an updated properties panel.
You can enable this beta feature through the SDK based visualizations feature flag. Once enabled, the beta overall visualization replaces the standard overall visualization. See Visualizations reference for more information. (ids: 40562, 41090)
Druid highlights
Improved concurrent append and replace
You no longer need to manually specify the task lock type for concurrent append and replace using the `taskLockType` context parameter. Instead, Druid can determine it for you. You can either use a context parameter or a cluster-wide config:
- Use the context parameter `"useConcurrentLocks": true` for specific JSON-based or streaming ingestion tasks and datasources. Datasources need the parameter in situations such as when you want to be able to append data to the datasource while compaction is running.
- Set the cluster-wide config `druid.indexer.task.default.context` to `{"useConcurrentLocks": true}`.
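For example, a minimal sketch of enabling the context parameter on a single JSON-based ingestion task; everything except the `context` block is elided:

```json
{
  "type": "index_parallel",
  "context": {
    "useConcurrentLocks": true
  }
}
```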
(#1568) (id: 41083)
Range support for window functions
Window functions now support ranges where both endpoints are unbounded or are the current row. Ranges work in strict mode, which means that Druid fails queries that aren't supported. You can turn off strict mode for ranges by setting the context parameter `windowingStrictValidation` to `false`.
The following example shows window expressions with RANGE frame specifications:

```sql
(ORDER BY c)
(ORDER BY c RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
(ORDER BY c RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
```
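A complete query using one of these frames might look like the following sketch; the datasource and its `c` and `x` columns are hypothetical:

```sql
-- Running sum over all rows up to and including the current row's value of c.
SELECT
  c,
  SUM(x) OVER (ORDER BY c RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM datasource
```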
(#15746) (#15365) (id: 41623)
Ingest from multiple Azure accounts
Azure as an ingestion source now supports ingesting data from multiple storage accounts that are specified in `druid.azure.account`. To do this, use the new `azureStorage` schema instead of the previous `azure` schema. For example:
"ioConfig": {
"type": "index_parallel",
"inputSource": {
"type": "azureStorage",
"objectGlob": "**.json",
"uris": ["azureStorage://storageAccount/container/prefix1/file.json", "azureStorage://storageAccount/container/prefix2/file2.json"]
},
"inputFormat": {
"type": "json"
},
...
},
...
(#15630) (id: 41428)
Improved performance for real-time queries
If the query context `bySegment` is set to `false` for real-time queries, the way in which layers are merged has been improved to be more efficient. There's now only a single layer of merging, just like for Historical processes. As part of this change, segment metrics, like `query/segment/time`, are now per-FireHydrant instead of per-Sink.
If you set `bySegment` to `true`, the old behavior of two layers of merging is preserved.
(#15757) (id: 41406)
Other changes
Pivot changes
- Added `maxNumDownloadTasks` to the Pivot server configuration file, to optionally set the maximum number of tasks to assign to async downloads. See Pivot server config for more information (id: 41092)
- Added an option to "Go to URL" for URL dimensions in the flat table visualization (id: 41283)
- Fixed an error that appeared when duplicating a dashboard from the header bar (id: 41537)
- Fixed a problem with filtering on a dimension with Set/String type that contains nulls (id: 41459)
- Fixed an issue where async downloads didn't include filters by measure (id: 41435)
- Fixed records table visualization crashing when scrolling to the bottom in a dashboard tile (id: 41165)
- Fixed an issue with the records visualization not supporting async download (id: 41289)
- Fixed dimensions with IDs that contain periods showing as "undefined" in records table visualization (id: 41009)
- Fixed Pivot 2 visualizations crashing on data cubes with no dimensions (id: 40998)
- Fixed inability to set "Greater than 0" measure filter in flat table visualization (id: 40985)
- Fixed a problem with visualization URLs not updating after a measure is deleted from a data cube (id: 40565)
- Fixed "overall" values rendering incorrectly in line chart visualization when they should be hidden (id: 40501)
- Fixed incorrect time bucket label for America/Mexico_City timezone in DST (id: 39749)
- Fixed inability to scroll pinned dimensions list (id: 39647)
- Fixed discrepancies when applying custom UI colors (id: 40266)
- Improved handling of time filters dashboard tiles (id: 41171)
- Improved measures in tables visualization to show nulls if they contain no data (id: 40665)
- Improved the display of comparison values in visualizations, by adding the ability to sort by delta and percentage (id: 38539)
Druid changes
- Added `QueryLifecycle#authorize` for the gRPC query extension (#15816) (id: 41725)
- Added nested array index support and fixed some issues (#15752) (id: 41724)
- Added support for array types in the web console ingestion wizards (#15588) (id: 41613)
- Added the SQUARE_ROOT function to the timeseries extension: `MAP_TIMESERIES(timeseries, 'sqrt(value)')` (id: 41516)
- Added null value index wiring for nested columns (#15687) (id: 41475)
- Added support to the web console for sorting the segment table on start and end when grouped (#15720) (id: 41438)
- Added a tile to the web console for the new Azure input source (id: 41317)
- Added `ImmutableLookupMap` for static lookups (#15675) (id: 41268)
- Added cache value selectors in `RowBasedColumnSelectorFactory` (#15615) (id: 41265)
- Added faster k-way merging using tournament trees and 8-byte key strides (#15661) (id: 40987)
- Added CONCAT flattening and filter decomposition (#15634) (id: 40986)
- Added partition boosting for INSERT with GROUP BY to deal with skewed partitions (#15474) (id: 15015)
- Added SQL compatibility for numeric first and last column types. The web console also provides an option for first and last aggregation (#15607) (id: 40615)
- Added differentiation between null and empty strings in `SerializablePairStringLong` serde (id: 40401)
- Changed `IncrementalIndex#add` so that it is no longer thread safe, which improves performance (#15697) (id: 41260)
- Improved segment locking behavior so that the `RetrieveSegmentsToReplaceAction` is no longer needed (#15699) (id: 41484)
- Disabled eager initialization for non-query connection requests (#15751) (id: 41407)
- Enabled `ArrayListRowsAndColumns` to `StorageAdapter` conversion (#15735) (id: 41616)
- Enabled query request queuing by default when total laning is turned on (#15440) (id: 40807)
- Fixed the web console forcing `waitUntilSegmentLoad` to `true` even if the user sets it to `false` (#15781) (id: 41614)
- Fixed CVEs (#15814) (id: 41612)
- Fixed an interpolated exception message in `InvalidNullByteFault` (#15804) (id: 41546)
- Fixed extractionFns on number-wrapping dimension selectors (#15761) (id: 41443)
- Fixed the summary iterator in the grouping engine (#15658) (id: 41264)
- Fixed an incorrect scale when reading decimals from Parquet (#15715) (id: 41263)
- Fixed a rendering issue for disabled workers in the web console (#15712) (id: 41259)
- Fixed issues so that the Kafka emitter now runs all scheduled callables. The emitter now intelligently provisions threads to make sure there are no wasted threads and all callables can run (#15719) (id: 41258)
- Fixed MSQ task engine intermediate files not being immediately cleaned up in Azure (id: 41243)
- Fixed audit log entries not appearing for "Mark as used all segments" actions (id: 41080)
- Fixed an NPE that could occur if the `StandardDeviationPostAggregator` passed in is null: `postAggregations.estimator: null` (#15660) (id: 41003)
- Fixed reverse pull-up lookups in the SQL planner (#15626) (id: 41002)
- Fixed compaction getting stuck on intervals with tombstones (#15676) (id: 41001)
- Fixed the result cache causing an exception when a sketch is stored in the cache (#15654) (id: 40885)
- Fixed concurrent append and replace options in the web console (#15649) (id: 40868)
- Fixed an issue that blocked queries issued from the small Run buttons (from inside larger queries) from being modified from the table actions (#15779) (id: 41515)
- Improved segment killing performance for Azure (#15770) (id: 38567)
- Improved the performance of the `druid-basic-security` extension (#15648) (id: 40884)
- Improved lookups to register the first lookup immediately, regardless of the cache status (#15598) (id: 40863)
- Improved numerical first and last aggregators so that they work for SQL-based ingestion too (id: 40996)
- Improved parsing speed for list-based input rows (#15681) (id: 41262)
- Improved error messages for DATE_TRUNC operators (#15759) (id: 41471)
- Improved the web console to support using file inputs instead of text inputs for the Load query detail archive dialogue (#15632) (id: 40941)
- Changed the web console to use the new `azureStorage` input type instead of the `azure` storage type for ingesting from Azure (#15820) (id: 41723)
- Changed the cryptographic salt size that Druid uses to 128 bits so that it is FIPS compliant (#15758) (id: 41405)
Changes in 2024.01.4
Pivot changes
- Fixed an issue where filter tokens weren't correctly applied when a data cube also contained a subset filter definition. For details, see the following Imply Knowledge Base article (id: 61235)
Changes in 2024.01.3
Druid changes
- Fixed an issue where DataSketches HLL Sketches would erroneously be considered empty. For details see the following Imply Knowledge Base article (id: 41916)
Changes in 2024.01.2
Druid changes
- Fixed an issue where an exception occurs when queries use filters on TIME_FLOOR (#15778)
Changes in 2024.01.1
Druid changes
- Fixed an issue with the default value for the `inSubQueryThreshold` parameter, which resulted in slower than expected queries. The default value for it is now `2147483647` (up from `20`) (#15688) (id: 40814)
Changes in 2024.01
Pivot highlights
Pivot now runs natively on macOS ARM systems
We encourage on-prem customers to opt in to an updated distribution format for Pivot by setting an environment variable on your Pivot nodes: `IMPLY_PIVOT_NOPKG=1`. This format will become the default later in 2024.
This distribution format enables Pivot to target current and future LTS versions of Node.js and provides a compatibility option for customers who are unable to upgrade from legacy Linux distributions such as RHEL 7, CentOS 7, and Ubuntu 18.04. (id: 40447)
Druid highlights
SQL PIVOT and UNPIVOT (beta)
You can now use the SQL PIVOT and UNPIVOT operators to turn rows into columns and column values into rows respectively. (id: 37598)
The PIVOT operator carries out an aggregation and transforms rows into columns in the output. The following is the general syntax for the PIVOT operator:

```sql
PIVOT (aggregation_function(column_to_aggregate)
  FOR column_with_values_to_pivot
  IN (pivoted_column1 [, pivoted_column2 ...])
)
```
The UNPIVOT operator transforms existing column values into rows. The following is the general syntax for the UNPIVOT operator:

```sql
UNPIVOT (values_column
  FOR names_column
  IN (unpivoted_column1 [, unpivoted_column2 ... ])
)
```
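As a concrete sketch that follows the template above (the `sales` table and its `quarter` and `revenue` columns are hypothetical):

```sql
-- One output column per listed quarter, each holding SUM(revenue).
SELECT *
FROM sales
PIVOT (SUM(revenue) FOR quarter IN ('Q1', 'Q2'))
```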
New JSON_QUERY_ARRAY function
The JSON_QUERY_ARRAY function is similar to JSON_QUERY except that the return type is always `ARRAY<COMPLEX<json>>` instead of `COMPLEX<json>`. Essentially, this function allows extracting arrays of objects from nested data and performing operations such as UNNEST, ARRAY_LENGTH, ARRAY_SLICE, or any other available ARRAY operations. (#15521) (id: 40335)
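For example, a minimal sketch; the datasource and its nested `shipTo` column are hypothetical:

```sql
-- Count the elements of an array of objects extracted from nested data.
SELECT ARRAY_LENGTH(JSON_QUERY_ARRAY(shipTo, '$.items')) AS num_items
FROM datasource
```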
Changes to the native `equals` filter
The native query `equals` filter on mixed type 'auto' columns that contain arrays must now be filtered as their presenting type. So if any rows are arrays (the segment metadata and `information_schema` report the type as some array type), then native queries must also filter as if they are some array type. This does not impact SQL, which already has this limitation due to how the type presents itself. This change only impacts mixed type 'auto' columns, which contain both scalars and arrays. (#15503) (id: 40328)
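For example, a native `equals` filter on such a column might look like the following sketch; the column name and values are hypothetical:

```json
{
  "type": "equals",
  "column": "mixedCol",
  "matchValueType": "ARRAY<STRING>",
  "matchValue": ["a", "b"]
}
```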
Support for GCS for SQL-based ingestion
You can now use Google Cloud Storage (GCS) as durable storage for SQL-based ingestion and queries from deep storage. (#15398) (id: 35053)
Improved INNER joins
Druid can support arbitrary join conditions for INNER join. For INNER joins, Druid will look at the join condition, and any sub-conditions that cannot be evaluated efficiently as part of the join will be converted to a post-join filter. With this feature, you can do inequality joins that were not possible before. (#15302) (id: 37564)
Other changes
Pivot changes
- Added the Pivot server configuration property `forceNoRedirect`, which forces the Pivot UI to always render the splash page without automatic redirection (id: 38986)
- Added the ability to sort a data cube by the first column by clicking the column header (id: 31363)
- Fixed percent of root causing downloads from deep storage to fail (id: 40673)
- Fixed incorrect sort order in deep storage downloads (id: 40374)
- Fixed flat table visualization with absolute time filter using "Latest day" when accessed with link (id: 40339)
- Fixed functional and display issues in the overall visualization (id: 40271)
- Fixed back button not working correctly in async downloads dialog (id: 40265)
- Improved query generation in Pivot and Plywood to use the 2-value IS NOT TRUE version of the NOT operator (id: 40638)
- Improved data cube measure preview by providing a manual override prompt when the preview fails (id: 38763)
- Updated the names of the async downloads feature flags to `Async Downloads (Deprecated)` and `Async Downloads, New Engine, 2023 (Alpha)` (id: 40525)
Druid changes
- Added experimental support for first/last data types for double/float/long during native and SQL-based ingestion (#14462) (id: 37231)
- Added a new config, `druid.audit.manager.type`, which can take the values `log` or `sql` (default). This allows audited events to either be logged or persisted in the metadata store (default behavior) (#15480) (id: 37696)
- Added a new config, `druid.audit.manager.logLevel`, which allows users to set the log level of audit events and can take the values `DEBUG`, `INFO` (default), or `WARN` (#15480) (id: 37696)
- Added array column type support to the EXTEND operator (#15458) (id: 40286)
- Changed what happens when query scheduler threads are fewer than server HTTP threads. When that happens, total laning is enforced, and some HTTP threads are reserved for non-query requests, such as health checks. Previously, any request that exceeded lane capacity was rejected. Now, excess requests are queued with a timeout equal to `MIN(Integer.MAX_VALUE, druid.server.http.maxQueryTimeout)`. If the value is negative, requests are queued forever (#15440) (id: 40776)
- Changed the ARRAY_TO_MV function to support expression inputs (#15528) (id: 40358)
- Changed the auto column indexer so that when columns that contain only empty or null arrays are ingested, they are stored as `ARRAY<LONG>` instead of `COMPLEX<json>` (#15505) (id: 40313)
- Fixed an issue where null and empty strings were treated equally, and the return value was always null (#15525) (id: 40401)
- Fixed an issue where lookups fail with an error related to failing to construct `FilteredAggregatorFactory` (#15526) (id: 40296)
- Fixed issues related to null handling and vector expression processors (#15587) (id: 40545)
- Fixed a bug in the ingestion spec to SQL-based ingestion query converter for the web console (#15627) (id: 40795)
- Fixed redundant expansion in SearchOperatorConversion (#15625) (id: 40768)
- Fixed an issue where some ARRAY types were incorrectly treated as COMPLEX types (#15543) (id: 40514)
- Fixed an NPE with virtual expressions and unnest (#15513) (id: 40348)
- Fixed an issue where the Window function minimum aggregates nulls as 0 (#15371) (id: 40327)
- Fixed an issue where null filters on datasources with range partitioning could lead to excessive segment pruning, leading to missed results (#15500) (id: 40288)
- Fixed an issue with window functions where a string cannot be cast when creating HLL sketches (#15465) (id: 39859)
- Fixed a bug in segment allocation that can potentially cause loss of appended data when running interleaved append and replace tasks. (#15459) (id: 39718)
- Improved filtering performance by adding support for using the underlying column index for `ExpressionVirtualColumn` (#15585) (#15633) (id: 39668) (id: 40794)
- Improved how three-valued logic is handled (#15629) (id: 40797)
- Improved the Broker to be able to use catalog for datasource schemas for SQL queries (#15469) (id: 40796)
- Improved the Druid audit system to log when a supervisor is created or updated (#15636) (id: 40774)
- Improved the connection between Brokers and Coordinators with Historical and real-time processes (#15596) (id: 40763)
- Improved how segment granularity is handled when there is a conflict and the requested segment granularity can't be allocated. Day granularity is now considered after month. Previously, week was used, but weeks do not align with months perfectly. You can still explicitly request week granularity. (#15589) (id: 40701)
- Improved polling in segment allocation queue to improve efficiency and prevent race conditions (#15590) (id: 40690)
- Improved the web console to detect EXPLAIN PLAN queries and be able to run them individually (#15570) (id: 40508)
- Improved the efficiency of queries by reducing the number of expression objects created during evaluations (#15552) (id: 40495)
- Improved the error message you get if you try to use INSERT INTO and OVERWRITE syntax (id: 37790)
- Improved the JDBC lookup dialog in the web console to include Jitter seconds, Load timeout seconds, and Max heap percentage options (#15472) (id: 40246)
- Improved compaction so that it skips for datasources with partial eternity segments, which could result in memory pressure on the Coordinator (#15542) (id: 40075)
- Improved Kinesis integration so that only checkpoints for partitions with unavailable sequence numbers are reset (#15338) (id: 29788)
- Improved the performance of the following (#15609) (id: 40672) (#15623) (id: 40691):
  - how Druid generates queries from Calcite plans
  - the internal SEARCH operator used by other functions
  - the COALESCE function
- Removed the 'auto' strategy from search queries. Specifying 'auto' is now equivalent to specifying `useIndexes` (#15550) (id: 40460)
Clarity changes
- Updated `subsetFormula` for the server cube to accept null values (id: 40254)
Platform changes
- Added support for JVM memory metrics in GKE ZooKeeper deployments (id: 38855)
Upgrade and downgrade notes
In addition to the upgrade and downgrade notes, review the deprecations page regularly to see if any features you use are impacted.
Minimum supported version for rolling upgrade
See "Supported upgrade paths" in the Lifecycle Policy documentation.
Front-coded dictionaries
Starting in 2025.01 STS, the front-coded dictionaries feature will be on by default. Once Druid starts using segments with front-coded dictionaries, you can't downgrade to a version where Druid doesn't support front-coded dictionaries. For more information, see Migration guide: front-coded dictionaries.
Automatic compaction on by default
Starting in 2025.01 STS, automatic compaction will be on by default.
Segment sorting
This feature is in alpha and not backwards compatible with versions earlier than 2024.09. If you enable it, you can't downgrade to a version earlier than 2024.09 STS.
You can now configure Druid to sort segments by something other than time first.
For SQL-based ingestion, include the query context parameter `forceSegmentSortByTime: false`. For JSON-based batch and streaming ingestion, include `forceSegmentSortByTime: false` in the `dimensionsSpec` block.
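For example, a minimal JSON-based ingestion sketch; the dimension names are hypothetical, and the order in which dimensions are listed defines the segment sort order:

```json
"dimensionsSpec": {
  "forceSegmentSortByTime": false,
  "dimensions": ["country", "__time", "city"]
}
```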
(#16849) (id: 63215)
Changed low-level APIs for extensions
This information is meant for users who write their own Druid extensions and doesn't impact anyone who only uses extensions supported by Imply.
As part of changes starting in 2024.09 to improve Druid, including the changes described in Segment sorting for Druid users, some low-level APIs used by some extensions may no longer be compatible with existing custom extensions you have. For more information about which interfaces are impacted, see the related pull requests.
Compression for complex metric columns
If you use the `IndexSpec` option `complexMetricCompression` to compress complex metric columns, you cannot downgrade to a version that doesn't support compressing those columns.
This feature was introduced in 2024.09 STS.
(#16863) (id: 63277)
ZooKeeper segment serving processes
ZooKeeper-based segment loading has been disabled in 2024.06 STS.
In 2024.08 STS, segment serving processes such as Peons, Historicals, and Indexers won't create the ZooKeeper `loadQueuePath` anymore. The property `druid.zk.paths.loadQueuePath` is also ignored if it is still in your configs.
If you are still using ZooKeeper-based segment loading and want to upgrade to a more recent release where only HTTP-based segment loading is supported, switch to HTTP-based segment loading before upgrading. For more information, see Segment management.
(#16816) (id: 62629)
Front-coded dictionaries
In 2025, the front-coded dictionaries feature will be enabled by default. Front-coded dictionaries reduce storage and improve performance by optimizing strings with similar prefixes.
Once this feature is enabled, you cannot easily downgrade to an earlier version that doesn't support it.
For more information, see Migration guide: front-coded dictionaries.
If you're already using this feature, you don't need to take any action.
Batch ingestion task failure
There is a known issue with Imply 2024.05 versions where batch ingestion tasks can fail during rolling upgrades or downgrades. If you run into a task failure during an upgrade or downgrade, restart the failed task after the rolling upgrade or downgrade completes. This issue is fixed in Imply Enterprise and Imply Hybrid 2024.06. If you need to avoid such task failures, upgrade to 2024.06 or later.
Filter tokens in Pivot
If you use subset filters in conjunction with filter tokens, upgrade to 2024.10.1. For details, see the following Imply Knowledge Base article.
Remove orphaned segments in GCS deep storage
If you have orphaned segments from failed kill tasks from 2024.01 STS through 2024.02.3 STS, optionally identify and delete any segments that meet both of the following criteria:
- Segment exists in deep storage, but has no corresponding metadata store record.
- Segment is older than 1 week.
Limiting the deletion to segments older than one week prevents you from deleting pending segments.
`stopTaskCount` must now be explicitly set
Starting in 2024.03 STS, you must explicitly set a value for `stopTaskCount` if you want to use it for streaming ingestion. It no longer defaults to the same value as `taskCount`.
Segment metrics for real-time queries
Starting in 2024.02 STS, segment metrics for real-time queries (such as `query/segment/time`) are per-FireHydrant instead of per-Sink when the context parameter `bySegment` is set to `false`, which is common for most use cases.
Renamed segment metric
Starting in 2024.03 STS, the `kill/candidateUnusedSegments/count` metric is now called `kill/eligibleUnusedSegments/count`.
(#15977) (id: 42492)
GroupBy queries that use the MSQ task engine during upgrades
Beginning in 2024.02 STS, the performance and behavior for segment partitioning has been improved. GroupBy queries may fail during an upgrade if some workers are on an older version and some are on a more recent version.
Changes to the native `equals` filter
Beginning in 2024.01 STS, the native query `equals` filter on mixed type 'auto' columns that contain arrays must be filtered as their presenting type. So if any rows are arrays (the segment metadata and `information_schema` report the type as some array type), then native queries must also filter as if they are some array type. This does not impact SQL, which already has this limitation due to how the type presents itself. This change only impacts mixed type 'auto' columns, which contain both scalars and arrays.
Imply Hybrid MySQL upgrade
Imply Hybrid previously used MySQL 5.7 by default. New clusters will use MySQL 8 by default. If you have an existing cluster, you'll need to upgrade the MySQL version since the Amazon RDS support end date for this version is scheduled for February 29, 2024. Although you can opt for extended support from Amazon, you can use Imply Hybrid Manager to upgrade your MySQL instance to MySQL 8.
The upgrade should have little to no impact on your queries but does require a reconnection to the database. The process can take an hour and services will reconnect to the database during the upgrade.
In preparation for the upgrade, you need to grant certain permissions to the Cloud Manager IAM role by applying the following policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "rds:CreateBlueGreenDeployment",
        "rds:PromoteReadReplica"
      ],
      "Resource": [
        "arn:aws:rds:*:*:pg:*",
        "arn:aws:rds:*:*:deployment:*",
        "arn:aws:rds:*:*:*:imply-*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "rds:AddTagsToResource",
        "rds:CreateDBInstanceReadReplica",
        "rds:DeleteBlueGreenDeployment",
        "rds:DescribeBlueGreenDeployments",
        "rds:SwitchoverBlueGreenDeployment"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
```
After you grant the permissions, click Apply changes for Amazon RDS MySQL Update on the Overview page of Imply Hybrid Manager.
Three-valued logic
The legacy two-valued logic and the corresponding properties that support it will be removed in the December 2024 STS and January 2025 LTS. The SQL compatible three-valued logic will be the only option.
Update your queries and downstream apps prior to these releases.
SQL standard three-valued logic introduced in 2023.11 primarily affects filters using the logical NOT operation on columns with NULL values. This applies to both query and ingestion time filtering.
The following example illustrates the old behavior and the new behavior:
Consider the filter `x <> 'some value'`, which selects results for which `x` is not equal to `'some value'`.
Previously, Druid included all rows not matching `x = 'some value'`, including null values.
The new behavior follows the SQL standard and only matches rows with a value that is not equal to `'some value'`. Null values are excluded from the results.
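The following sketch shows the difference; the datasource and the column `x` are hypothetical:

```sql
-- New behavior: rows where x IS NULL are excluded from the count.
SELECT COUNT(*) FROM datasource WHERE x <> 'some value';

-- To also match null values, add an explicit IS NULL clause:
SELECT COUNT(*) FROM datasource WHERE x <> 'some value' OR x IS NULL;
```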
Three-valued logic is only enabled if you accept the following default values:

```properties
druid.generic.useDefaultValueForNull=false
druid.expressions.useStrictBooleans=true
druid.generic.useThreeValueLogicForNativeFilters=true
```
SQL compatibility
The legacy behavior that is not compatible with standard ANSI SQL and the corresponding properties will be removed in the December 2024 STS and January 2025 LTS releases. The SQL compatible behavior introduced in the 2023.09 STS will be the only behavior available.
Update your queries and any downstream apps prior to these releases.
Starting with 2023.09 STS, the default way Druid treats nulls and booleans has changed.
For nulls, Druid now differentiates between an empty string (`''`) and a record with no data, as well as between an empty numerical record and `0`.
You can revert to the previous behavior by setting `druid.generic.useDefaultValueForNull` to `true`. This property affects both storage and querying, and must be set on all Druid service types to be available at both ingestion time and query time. Reverting this setting to the old value restores the previous behavior without reingestion.
For booleans, Druid now strictly uses `1` (true) or `0` (false). Previously, true and false could be represented either as `true` and `false` or as `1` and `0`, respectively. In addition, Druid now returns a null value for Boolean comparisons like `True && NULL`.
`druid.expressions.useStrictBooleans` primarily affects querying, but it also affects JSON columns and type-aware schema discovery for ingestion. You can set `druid.expressions.useStrictBooleans` to `false` to configure Druid to ingest booleans in `'auto'` and `'json'` columns as VARCHAR (native STRING) typed columns that use string values of `'true'` and `'false'` instead of BIGINT (native LONG). You must set it on all Druid service types to be available at both ingestion time and query time.
The following table illustrates some example scenarios and the impact of the changes:
| Query | 2023.08 STS and earlier | 2023.09 STS and later |
|---|---|---|
| Query empty string | Empty string (`''`) or null | Empty string (`''`) |
| Query null string | Null or empty | Null |
| COUNT(*) | All rows, including nulls | All rows, including nulls |
| COUNT(column) | All rows excluding empty strings | All rows including empty strings but excluding nulls |
| Expression 100 && 11 | 11 | 1 |
| Expression 100 \|\| 11 | 100 | 1 |
| Null FLOAT/DOUBLE column | 0.0 | Null |
| Null LONG column | 0 | Null |
| Null __time column | 0, meaning 1970-01-01 00:00:00 UTC | 1970-01-01 00:00:00 UTC |
| Null MVD column | `''` | Null |
| ARRAY | Null | Null |
| COMPLEX | none | Null |
Update your queries
Before you upgrade from a version prior to 2023.09 to 2023.09 or later, update your queries to account for the changed behavior:
NULL filters
If your queries use NULL in the filter condition to match both nulls and empty strings, you should add an explicit filter clause for empty strings. For example, update `s IS NULL` to `s IS NULL OR s = ''`.
COUNT functions
COUNT(column) now counts empty strings. If you want to continue excluding empty strings from the count, replace `COUNT(column)` with `COUNT(column) FILTER(WHERE column <> '')`.
GroupBy queries
GroupBy queries on columns containing null values can now have additional entries as nulls can co-exist with empty strings.
Avatica JDBC driver upgrade
The Avatica JDBC driver is not packaged with Druid. Its upgrade is separate from any upgrades to Imply.
If you notice intermittent query failures after upgrading your Avatica JDBC driver to version 1.21.0 or later, you may need to set the `transparent_reconnection` property.
Parameter execution changes for Kafka
When using the built-in `FileConfigProvider` for Kafka, interpolations are now intercepted by the `JsonConfigurator` instead of being passed down to the Kafka provider. This breaks existing deployments.
For more information, see KIP-297 and #13023.
Deprecation notices
Two-valued logic
Druid's legacy two-valued logic for native filters and the properties for maintaining that behavior are deprecated and will be removed in the December 2024 STS and January 2025 LTS releases.
The ANSI-SQL compliant three-valued logic will be the only supported behavior after these releases. This SQL compatible behavior became the default for deployments that use Imply 2023.11 STS and January 2024 LTS releases.
Update your queries and downstream apps prior to these releases.
For more information, see three-valued logic.
Properties for legacy Druid SQL behavior
Druid's legacy behavior for Booleans and NULLs and the corresponding properties are deprecated and will be removed in the December 2024 STS and January 2025 LTS releases.
The ANSI-SQL compliant treatment of Booleans and null values will be the only supported behavior after these releases. This SQL compatible behavior became the default for Imply 2023.11 STS and January 2024 LTS.
Update your queries and downstream apps prior to these releases.
For more information, see SQL compatibility.
Some segment loading configs deprecated
Starting with 2023.08 STS, the following segment related configs are now deprecated and will be removed in future releases:
- `maxSegmentsInNodeLoadingQueue`
- `maxSegmentsToMove`
- `replicationThrottleLimit`
- `useRoundRobinSegmentAssignment`
- `replicantLifetime`
- `maxNonPrimaryReplicantsToLoad`
- `decommissioningMaxPercentOfMaxSegmentsToMove`
Use `smartSegmentLoading` mode instead, which calculates values for these variables automatically.
`SysMonitor` support deprecated
Starting with 2023.08 STS, switch to `OshiSysMonitor`, as `SysMonitor` is now deprecated and will be removed in future releases.
Asynchronous SQL download deprecated
The async downloads feature is deprecated and will be removed in future releases. Instead, consider using Query from deep storage.
End of support
CrossTab view
The CrossTab view feature is no longer supported. Use Pivot 2.0 instead, which incorporates the capabilities of CrossTab view.
ZooKeeper-based segment loading
ZooKeeper-based segment loading is no longer supported starting in 2024.06 STS. You do not need to take any action. Druid ignores the following related configs:
- `druid.coordinator.load.timeout`
- `druid.coordinator.loadqueuepeon.type`
- `druid.coordinator.curator.loadqueuepeon.numCallbackThreads`
Druid only uses the recommended HTTP loading, which includes improvements to the Coordinator service such as smart segment loading.
As part of this change, compaction configs for inactive datasources are automatically cleaned up by default.
(#15705) (id: 60764)