
Imply Enterprise and Hybrid release notes

Read all release notes carefully, especially the Upgrade and downgrade notes, before upgrading. The following release notes provide information on features, improvements, and bug fixes up to Imply STS release 2024.04.

For information on the LTS release, see the LTS release notes.

If you are upgrading by more than one version, read the intermediate release notes too.

The following end-of-support dates apply in 2023:

  • On January 26, 2023, Imply 2021.01 LTS reached EOL. This means that the 2021.01 LTS release line will no longer receive any patches, including security updates. Imply recommends that you upgrade to the latest LTS or STS release.
  • On January 31, 2023, Imply 2022.01 LTS ended general support status and is eligible only for security support.

For more information, see Lifecycle Policy.

See Previous versions for information on older releases.

Imply evaluation

New to Imply? Get started with an Imply Hybrid (formerly Imply Cloud) Free Trial or start a self-hosted trial at Get started with Imply!

With Imply Hybrid, the Imply team manages your clusters in AWS, while you control the infrastructure and own the data. With self-hosted Imply, you can run Imply on *NIX systems in your own environment or cloud provider.

Imply Enterprise

If you run Imply Enterprise, see Imply product releases & downloads to access the Imply Enterprise distribution. When prompted, log on to Zendesk with your Imply customer credentials.

Changes in 2024.04

Pivot highlights

Database auth tokens

You can now create a database auth token on a Pivot role to enable access to specific Druid data. See Database auth tokens for more information.

(id: 41170)

Druid highlights

Improved array ingest mode

The array mode for arrayIngestMode contains improvements that make it the best choice for any new datasources that contain arrays. Imply strongly recommends that you use array mode instead of mvd mode: it supports a wider range of array types and avoids certain limitations of mvd mode. Continued improvements to the array ingest mode and array-typed columns are on the roadmap.

The following list describes the behavior based on what you set arrayIngestMode to:

  • If you set it to array, SQL ARRAY types are stored using Druid array columns. This is recommended for new tables.
  • If you set it to mvd, SQL VARCHAR ARRAY types are implicitly wrapped in ARRAY_TO_MV. This causes them to be stored as multi-value strings, using the same STRING column type as regular scalar strings. This is the default behavior when arrayIngestMode is not provided in your query context.
  • If you set it to none, Druid throws an exception when trying to store any type of array.

The following table summarizes the differences in SQL ARRAY handling between arrayIngestMode: array and arrayIngestMode: mvd:

SQL type | Stored type when arrayIngestMode: array | Stored type when arrayIngestMode: mvd (default)
VARCHAR ARRAY | ARRAY<STRING> | multi-value STRING
BIGINT ARRAY | ARRAY<LONG> | not possible (validation error)
DOUBLE ARRAY | ARRAY<DOUBLE> | not possible (validation error)

In either mode, you can explicitly wrap string arrays in ARRAY_TO_MV to cause them to be stored as multi-value strings.

Note that you cannot mix string arrays and multi-value strings in the same column.
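
The mode behavior and type mappings described above can be summarized in a short Python sketch. The helper name stored_type and the dictionary layout are illustrative assumptions and are not part of Druid; only the arrayIngestMode values and the stored types come from these notes.

```python
# Illustrative lookup of how SQL ARRAY types are stored per arrayIngestMode.
# The mapping mirrors the summary above; stored_type() is a hypothetical helper.
STORED_TYPES = {
    "array": {
        "VARCHAR ARRAY": "ARRAY<STRING>",
        "BIGINT ARRAY": "ARRAY<LONG>",
        "DOUBLE ARRAY": "ARRAY<DOUBLE>",
    },
    "mvd": {
        # Numeric arrays cannot be stored in mvd mode (validation error).
        "VARCHAR ARRAY": "multi-value STRING",
    },
    "none": {},  # Storing any type of array raises an exception.
}


def stored_type(sql_type: str, mode: str = "mvd") -> str:
    """Return the stored column type, or raise if the mode rejects the type."""
    if mode not in STORED_TYPES:
        raise ValueError(f"unknown arrayIngestMode: {mode}")
    try:
        return STORED_TYPES[mode][sql_type]
    except KeyError:
        raise ValueError(
            f"{sql_type} cannot be stored when arrayIngestMode is {mode}"
        ) from None


# Query context that opts in to array mode, as recommended for new tables.
query_context = {"arrayIngestMode": "array"}
```

The default argument is mvd because that is the behavior when arrayIngestMode is not provided in the query context.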

(#15920) (id: 43043)

Pivot changes

  • You can now use asymmetric number range filters with flat table, gauge, time series, and overall (beta) visualizations (id: 58507)
  • Added query precision and query caching session persistence (id: 43251)
  • Added FilterWithRegex permission which allows users to use the regex filter for string dimensions (id: 42840)
  • Added Pivot server configuration property disableExternalEmails which allows administrators to disable sending alerts and reports to external email addresses (id: 42201)
  • Added the ability to include only one value in a number filter (id: 43253)
  • Improved the performance and behavior of the PIVOT_NESTED_AGG function (id: 43264)
  • Fixed an issue with the line chart visualization causing dashboards to crash (id: 58506)
  • Fixed an issue with the street map visualization including all data cube Latitude/Longitude dimensions, even the ones not in the visualization (id: 58505)
  • Fixed an issue with the y-axis extending above the line chart visualization boundary for very small values (id: 43300)
  • Fixed unnecessary query in axis-query generation (id: 43244)
  • Fixed an issue with table columns not respecting the time format (id: 43181)
  • Fixed incorrect x-axis in the time series visualization (id: 42725)
  • Fixed dashboard time filter "include end bound" not working in gauge, flat table, time series, and overall (beta) visualizations (id: 42406)
  • Fixed flat table changes not being propagated from data cube to dashboard (id: 42275)
  • Fixed time comparison not working as expected in a line chart visualization when bucketing <=5 minutes (id: 41982)
  • Fixed an issue where dashboard filters reset from Greater than or equal and Less than or equal to Greater than and Less than (id: 40777)

Druid changes

  • Added more logging for S3 retries (#16117) (id: 43161)
  • Added new in filter that preserves the input types (id: 41500)
  • Added new typed in filter (#16039) (id: 48937)
  • Added error code to failure type InternalServerError (#16186) (id: 54432)
  • Added support for using window functions with the MSQ task engine as the query engine (#15470) (id: 39416)
  • Added support for joins in decoupled mode (#15957) (id: 42763)
  • Added segmentsRead and segmentsPublished fields to parallel compaction task completion reports so that you can see how effective a compaction task is (#15947) (id: 38574)
  • Added a new task/autoScaler/requiredCount metric that provides a count of required tasks based on the calculations of the lagBased autoscaler. Compare that value to task/running/count to discover the difference between the current and desired task counts (#16199) (id: 58510)
  • Changed the controller checker for the MSQ task engine to check for closed only (#16161) (id: 43289)
  • Added geospatial interfaces (#16029) (id: 60162)
  • Fixed ColumnType to RelDataType conversion for nested arrays (#16138) (id: 43178)
  • Fixed WindowingscanAndSort query issues on top of Joins (#15996) (id: 42717)
  • Fixed REGEXP_LIKE, CONTAINS_STRING, and ICONTAINS_STRING so that they correctly return null for null value inputs in ANSI SQL compatible null handling mode (the default configuration). Previously, they returned false (#15963) (id: 43288)
  • Fixed the Azure icon not rendering in the web console (#16173) (id: 43286)
  • Fixed a bug in the MarkOvershadowedSegmentsAsUnused Coordinator duty to also consider segments that are overshadowed by a segment that requires zero replicas (#16181) (id: 43285)
  • Fixed issues with ARRAY_CONTAINS and ARRAY_OVERLAP with null left side arguments as well as MV_CONTAINS and MV_OVERLAP (#15974) (id: 43162)
  • Fixed an issue where numeric LATEST_BY and EARLIEST_BY aggregations show incorrect results with latest_by (#15939) (id: 42342)
  • Fixed a bug in the markUsed and markUnused APIs where an empty set of segment IDs would be inconsistently treated as null or non-null in different scenarios (#16145) (id: 43153)
  • Fixed a bug where export queries did not use the output names specified and exported the temporary column names instead for some queries, such as GROUP BY (#16096) (id: 42826)
  • Fixed a bug where numSegmentsKilled is reported incorrectly (#16103) (id: 42960)
  • Fixed an issue with metric emission in the segment generation phase (#16146) (id: 43152)
  • Fixed a data race in getting results from MSQ select tasks (#16107) (id: 43000)
  • Fixed an issue which can occur when using schema auto-discovery on columns with a mix of array and scalar values and querying with scan queries (#16105) (id: 43007)
  • Fixed a bug where completion task reports are not being generated on index_parallel tasks. (#16042) (id: 42805)
  • Fixed an issue where safe_divide queries returned "Calcite assertion violated" errors (id: 41766)
  • Fixed an issue where SQL-based ingestion fails if the first monitor for druid.server.metrics.ServiceStatusMonitor is ServiceStatusMonitor (id: 38520)
  • Improved ingestion performance by parsing an input stream directly instead of converting it to a string and parsing the string as JSON (#15693) (id: 57692)
  • Improved optimizations to the MSQ task engine for real-time queries so that they are backwards compatible (id: 42658)
  • Improved serialization of TaskReportMap (#16217) (id: 60179)
  • Improved the creation of input row filter predicate in various batch tasks (#16196) (id: 56861)
  • Improved how tasks are fetched from the Overlord to redact credentials (#16182) (id: 52829)
  • Improved the web console to only pick the Kafka input format by default when needed (#16180) (id: 60186)
  • Improved compaction segment read and published fields to include sequential compaction tasks (#16171) (id: 60142)
  • Improved the markUnused API endpoint to handle an empty list of segment versions (#16198) (id: 56864)
  • Improved the segmentIds filter in the markUsed API payload so that it's parameterized in the database query (#16174) (id: 47268)
  • Improved how quickly workers get canceled for the MSQ task engine (#16158) (id: 43179)
  • Improved the MSQ task engine to support IS NOT DISTINCT FROM for SortMerge joins (#16003) (id: 43099)
  • Improved the download query detail archive option in the web console to be more resilient when the detail archive is incomplete (#16071) (id: 42908)
  • Improved the UX for arrayIngestMode in the web console (#15927) (id: 43038)
  • Improved array handling for Booleans to account for queries such as select array[true, false] from datasource (#16093) (id: 42963) (id: 42610)
  • Improved nested columns. Nested column serialization now releases nested field compression buffers as soon as the nested field serialization is completed, which requires significantly less direct memory during segment serialization when many nested fields are present (#16076) (id: 42955)
  • Improved querying to decrease the chance of going OOM with high cardinality data Group By (#16114) (id: 42502)
  • Improved real-time queries that use the MSQ task engine by changing how segments are grouped (#15399) (id: 39167)
  • Optimized isOvershadowed when there is a unique minor version for an interval (#15952) (id: 43287)
  • Updated the following dependencies:
    • redis.clients:jedis from 5.0.2 to 5.1.2 (#16074) (id: 42909)
    • express from 4.18.2 to 4.19.2 in the web console (#16204) (id: 60147)
    • webpack-dev-middleware from 5.3.3 to 5.3.4 in the web console (#16195) (id: 60146)
    • follow-redirects from 1.15.5 to 1.15.6 in the web console (#16134) (id: 43157)
    • axios in the web console (#16087) (id: 42954)
    • druid-toolkit from 0.21.9 to 0.22.11 in the web console (#16213) (id: 60144)

Clarity changes

  • Disabled alert custom time periods of less than one minute (id: 43223)

Imply Manager changes

  • Allowed all kube-system pods to be moved by the cluster-autoscaler in GKE (id: 43198)
  • Prevent middle managers from being replaced before task status is synced during rolling updates (id: 40137)
  • Imply Hybrid on AWS:
    • Enabled ServiceStatusMonitor by default (id: 38540)
    • Fixed cluster manager API so that custom extensions are not removed (id: 33190)

Changes in 2024.03.1

Druid changes

  • Fixed an issue where the Overlord process could fail to return the location of tasks (id: 60106)

Changes in 2024.03

Pivot highlights

Time series visualization supports more functions

The time series visualization now supports the following time series functions in addition to TIMESERIES and DELTA_TIMESERIES:

  • ADD_TIMESERIES
  • DIVIDE_TIMESERIES
  • MULTIPLY_TIMESERIES
  • SUBTRACT_TIMESERIES

The configuration options have also been simplified. See Visualization reference for details.

(id: 40670)

Druid highlights

Dynamic table append

You can now use the TABLE(APPEND(...)) function to implicitly create unions based on table schemas. For example, the two following queries are equivalent:

TABLE(APPEND('table1','table2','table3'))

and

SELECT column1,NULL AS column2,NULL AS column3 FROM table1
UNION ALL
SELECT NULL AS column1,column2,NULL AS column3 FROM table2
UNION ALL
SELECT column1,column2,column3 FROM table3

Note that if the same columns are defined with different input types, Druid uses the least restrictive column type.
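
To make the equivalence concrete, here is a minimal Python sketch that builds the expanded UNION ALL query from a set of table schemas. The function name append_as_union and the schema dictionary are hypothetical, used only for illustration. The sketch pads missing columns with NULL and does not model how Druid picks the least restrictive column type.

```python
from typing import Dict, List


def append_as_union(schemas: Dict[str, List[str]]) -> str:
    """Build the UNION ALL query that TABLE(APPEND(...)) stands for.

    `schemas` maps each table name to its column names, in table order.
    """
    # Collect every column once, in first-seen order across all tables.
    all_columns: List[str] = []
    for columns in schemas.values():
        for column in columns:
            if column not in all_columns:
                all_columns.append(column)

    selects = []
    for table, columns in schemas.items():
        # Columns that a table does not have are padded with NULL.
        projected = [c if c in columns else f"NULL AS {c}" for c in all_columns]
        selects.append(f"SELECT {','.join(projected)} FROM {table}")
    return "\nUNION ALL\n".join(selects)


# Hypothetical schemas matching the example above.
example_schemas = {
    "table1": ["column1"],
    "table2": ["column2"],
    "table3": ["column1", "column2", "column3"],
}
```

Calling append_as_union(example_schemas) returns the three-way UNION ALL query shown above.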

(#15897) (id: 42645)

Renamed segment kill metric

The kill/candidateUnusedSegments/count metric is now called kill/eligibleUnusedSegments/count.

(#15977) (id: 42492)

Improved streaming task completion reports

Streaming task completion reports now have an extra field, recordsProcessed. The field lists the partitions processed by that task and the count of records for each partition. You can look at this field to see the actual throughput of tasks and decide whether to scale your workers vertically or horizontally.
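
As a rough illustration of how the field could be used, the following Python sketch sums per-partition counts and derives an average throughput. The exact layout of the report is not given in these notes, so the mapping of partition ID to record count, the helper names, and the sample numbers are all assumptions.

```python
from typing import Dict


def total_records(records_processed: Dict[str, int]) -> int:
    """Sum the per-partition record counts reported by one task."""
    return sum(records_processed.values())


def records_per_second(records_processed: Dict[str, int], duration_seconds: float) -> float:
    """Average throughput of a task over its run time."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return total_records(records_processed) / duration_seconds


# Hypothetical field value: partition ID -> records processed by the task.
sample = {"0": 120_000, "1": 60_000}
```

Comparing this figure across tasks is one way to judge whether individual workers are saturated or whether more tasks are needed.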

(#15930) (id: 42430)

Improved Supervisor rolling restarts

The stopTaskCount config now prioritizes stopping older tasks first. As part of this change, you must also explicitly set a value for stopTaskCount. It no longer defaults to the same value as taskCount.
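
The "older tasks first" ordering can be pictured with a small Python model. This is only an illustration of the selection rule described above, not the supervisor's actual code; the Task type, its fields, and the function name are hypothetical.

```python
from typing import List, NamedTuple


class Task(NamedTuple):
    task_id: str
    start_time: float  # hypothetical field: seconds since the epoch


def tasks_to_stop(tasks: List[Task], stop_task_count: int) -> List[Task]:
    """Pick up to stop_task_count tasks, oldest start time first."""
    if stop_task_count < 0:
        raise ValueError("stop_task_count must not be negative")
    return sorted(tasks, key=lambda task: task.start_time)[:stop_task_count]


# Three running tasks, listed out of order on purpose.
running = [Task("b", 200.0), Task("a", 100.0), Task("c", 300.0)]
```

With stop_task_count set to 2, the model selects tasks a and b and leaves the newest task, c, running.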

(#15859) (id: 42143) (id: 40605)

Parallelized incremental segment creation

You can now configure the number of threads used to create and persist incremental segments on the disk using the numPersistThreads property. Use additional threads to parallelize the segment creation to prevent ingestion from stalling or pausing frequently as long as there are sufficient CPU resources available.

(#13982) (id: 32098)

Fixes for deep storage on Google Cloud Storage

This release contains fixes for customers using deep storage on GCS. The issues were caused by updates to the Google Cloud Client libraries from an older API client. Affected STS versions of Imply were 2024.01 STS through 2024.02.3 STS. For remediation steps for kill task failures see Remove orphaned segments in deep storage.

  • Fixed kill task failures caused when trying to delete a file that doesn't exist in Google Cloud Storage (#16047) (id: 42663)
  • Fixed an issue where Druid incorrectly deleted task log events when druid.indexer.logs.kill.enabled is active due to a mismatch in time units between Druid configuration and the Google Cloud client (#16083) (id: 42838)
  • Fixed errors when loading lookups sourced from GCS buckets where the fetched GCS object version is null (#16097) (id: 42916)

Improved query performance for AND filters

Druid query processing now adaptively determines when children of AND filters should compute indexes and when to simply match rows during the scan based on selectivity of other filters. Known as "filter partitioning," it can result in dramatic performance increases, depending on the order of filters in the query.

For example, take a query like SELECT SUM(longColumn) FROM druid.table WHERE stringColumn1 = '1000' AND stringColumn2 LIKE '%1%'. Previously, Druid used indexes when processing filters if they were available. That's not always ideal; imagine that stringColumn1 = '1000' matches 100 rows. With indexes, Druid has to find every value of stringColumn2 for which LIKE '%1%' is true to compute the indexes for the filter. If stringColumn2 has more than 100 values, that ends up being more work than simply checking for a match in the 100 remaining rows.

With the new logic, Druid checks the selectivity of indexes as it processes each clause of the AND filter. If it determines it would take more work to compute the index than to match the remaining rows, Druid skips computing the index.

The order in which you write filters in the WHERE clause of a query can affect its performance. More improvements are coming, but you can try out the existing improvements by reordering a query. Put filters whose indexes are less expensive to compute, such as IS NULL, =, and comparisons (>, >=, <, and <=), near the start of AND filters so that Druid processes your queries more efficiently. Not ordering your filters in this way won't degrade performance compared to previous releases, since the fallback behavior is what Druid did previously.
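
The effect of filter order can be shown with a toy Python cost model. It is a simplified sketch of the idea, not Druid's implementation: the function name, the cost units, and the sample numbers are hypothetical, and the remaining row count is only an upper-bound estimate.

```python
from typing import List, Tuple

# Each clause: (description, cost to compute its index, rows it matches alone).
Clause = Tuple[str, int, int]


def plan_and_filter(total_rows: int, clauses: List[Clause]) -> List[Tuple[str, str]]:
    """Decide, clause by clause, whether to compute an index or match rows."""
    remaining = total_rows
    plan = []
    for name, index_cost, matching_rows in clauses:
        # Skip the index when computing it costs more than checking the
        # rows that earlier clauses have left over.
        strategy = "index" if index_cost <= remaining else "match rows"
        plan.append((name, strategy))
        # Upper bound on the rows still selected after this clause.
        remaining = min(remaining, matching_rows)
    return plan


# Hypothetical numbers for the example query above.
cheap_first = [
    ("stringColumn1 = '1000'", 1, 100),
    ("stringColumn2 LIKE '%1%'", 5000, 300_000),
]
```

With the equality filter first, the model computes its index and then matches the LIKE filter against the 100 remaining rows. With the clauses reversed, the model computes both indexes, which is the previous behavior.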

(#15838) (id: 41535)

Pivot changes

  • Added permission AccessDownloadAsync to allow users to access the async download (alpha) feature when the feature is enabled for your organization (id: 42274)
  • You can now set the Latest data strategy to Query the latest timestamp from the data source, relative to the latest full day in the advanced data cube options (id: 39634)
  • You can now set the default view in a data cube's defaults to be a gauge, flat table, time series, or overall (beta) visualization (id: 41373)
  • You can now choose whether or not to display the year in time values in a table visualization (id: 40988)
  • Fixed an issue where filters and shown dimensions and measures were not preserved when switching to some visualization types (id: 41059)
  • Fixed Pivot showing an error for some time comparisons in a data cube (id: 41013)
  • Fixed a rounding issue in the display of dimensions (id: 42654)
  • Fixed downloads limited to 5,000 rows for flat table, gauge, time series, and overall (beta) visualizations (id: 42600)
  • Fixed failed async downloads producing a truncated file instead of an error (id: 42595)
  • Fixed query precision issues (ids: 42521, 42227, 42230)
  • Fixed async downloads not working with "previous period" comparisons (id: 42247)
  • Fixed Pivot crashing when applying a filter to the records visualization (id: 42189)
  • Fixed dashboard tiles causing save conflicts in flat table, gauge, time series, and overall (beta) visualizations (id: 41414)
  • Fixed lack of indication when data cube is refreshed (id: 40260)

Druid changes

  • Added support for single value aggregated Group By queries for scalars (#15700) (id: 41951)
  • Added support for numeric arrays to columnar frames, which are used in subquery materializations and window functions (#15917) (id: 41784)
  • Added the ability to set custom dimensions for events emitted by the Kafka emitter as a JSON map for the druid.emitter.kafka.extra.dimensions property. For example, druid.emitter.kafka.extra.dimensions={"region":"us-east-1","environment":"preProd"} (#15845) (id: 41961)
  • Added more AWS Kinesis regions and groups to the web console (#15900) (id: 42476)
  • Added support to the web console for Protobuf input formats and the Avro bytes decoder (#15950) (id: 42461)
  • Changed the format of the value of targetDataSource in EXPLAIN clauses for SQL-based ingestion queries back to being a string. For some recent releases, it was a JSON object (#16004) (id: 42575)
  • Changed the severity of a k8sTaskRunner log message to WARN (#15871) (id: 42303)
  • Changed the durationMs properties in MSQ task reports to exclude worker/controller start up time (id: 40311)
  • Fixed an issue where queries that use LATEST_BY or EARLIEST_BY return null when they contain a secondary timestamp column (#15939) (id: 42917)
  • Fixed an issue where Druid incorrectly deleted task log events when druid.indexer.logs.kill.enabled is active due to a mismatch in time units between Druid configuration and the Google Cloud client (#16083) (id: 42838)
  • Fixed errors when loading lookups sourced from GCS buckets where the fetched GCS object version is null (#16097) (id: 42916)
  • Fixed an issue where the data loader for the web console crashes when attempting to parse data that can't be parsed (#15983) (id: 42649)
  • Fixed an issue where string inputs are ingested as null values when they are typed as LONG or BIGINT. For example, decimals like "1.0" or "1.23" were incorrectly treated as NULL instead of 1 or 1.23. (#15615) (id: 42657)
  • Fixed an issue where compaction tasks reports got overwritten. New entries are written to the report instead (#15981) (id: 42673)
  • Fixed an issue that occurred when the castToType parameter is set on auto column schema (#15921) (id: 42434)
  • Fixed an issue where flattenSpec is in the wrong location if you use the web console to generate the supervisor spec for a Kafka ingestion (#15946) (id: 42433)
  • Fixed an issue where Kubernetes environment variables that use underscores would be parsed incorrectly (#15919) (id: 42336)
  • Fixed an issue where the wrong base template would be used for task types included through extensions, such as index_kinesis. For example, if you define druid.indexer.runner.k8s.podTemplate.index_kafka, the KubernetesTaskRunner still used druid.indexer.runner.k8s.podTemplate.base as the base template for tasks (#15915) (id: 42293)
  • Fixed an issue where a query returns the wrong results if PARSE_LONG is null (#15909) (id: 42134)
  • Fixed an issue where MSQ task engine results are truncated and return an error (#16107)
  • Improved Connection Count server select strategy to account for slow connection requests (#15975) (id: 42662)
  • Improved the retry behavior for deep storage connections (#15938) (id: 42690)
  • Improved how segments are counted so that segments still available through deep storage (replicas set to 0) are not marked as unavailable (#16020) (id: 42656)
  • Improved the error message for when a MSQ task engine-based join using the sortMerge option falls back to a broadcast join (#16002) (id: 42655)
  • Improved druid-basic-security performance by using the cache for password hash when validating LDAP passwords (#15993) (id: 42650)
  • Improved concurrent replace to work with supervisors using concurrent locks (#15995) (id: 42648)
  • Improved the web console to detect doubles better (#15998) (id: 42646)
  • Improved the web console to be able to search in tables and columns (#15990) (id: 42647)
  • Improved segment troubleshooting. Segments created in the same batch have the same created_date entry (#15977) (id: 42492)
  • Improved the error messages you get if there's an issue with your PARTITIONED BY clause (#15961) (id: 42462)
  • Improved the web console to support export with the MSQ task engine (#15969) (id: 42460)
  • Improved performance by reducing the number of metadata calls for the status of active tasks (#15724) (id: 42445)
  • Improved how connections are counted and servers are selected to account for slow connections (#15975) (id: 42407)
  • Improved the web console to allow compaction config slots to drop to 0, such as when compaction is paused (#15877) (id: 42178)
  • Improved the web console to include system fields when using the batch data loader (#15858) (id: 41918)
  • Updated PostgreSQL from 42.6.0 to 42.7.2 (#15931) (id: 42432)
  • Improved performance for real-time queries that use the MSQ task engine (#15399) (id: 39167)
  • Improved the Coordinator process to better handle an uninitialized cache in node role watchers, which could lead to stuck tasks (#15726) (id: 39099)
  • Improved how expressions are evaluated to ensure thread safety (#15694) (id: 42620)
  • Improved batching of scan results while estimating bytes (#15987) (id: 42507)
  • Updated Log4j from 2.18.0 to 2.22.1 (#15934) (id: 42431)

Platform changes

  • Account settings now display in Imply Hybrid Manager in SSO mode (id: 42372)
  • Fixed an issue with Imply Enterprise on GKE deployments where passing a custom certificate authority certificate for a MySQL instance causes the installation to fail (id: 42316)
  • Fixed a race condition that could cause Enterprise deployments on GKE to fail to start because of files missing from the configuration bundle (id: 42747) (id: 42726)
  • Fixed an issue where GCP automatically deploys a managed Prometheus instance causing pod exhaustion. Imply Enterprise on GKE turns off these automatic Prometheus deployments by default now (id: 42567)

Changes in 2024.02.3

Druid changes

  • Fixed an issue where string inputs are ingested as null values when they are typed as LONG or BIGINT. For example, decimals like "1.0" or "1.23" were incorrectly treated as NULL instead of 1 or 1.23. For details, see the following Imply Knowledge Base article (id: 42545)

Changes in 2024.02.2

Druid changes

  • Fixed an issue with filters on expression virtual column indexes incorrectly considering values null in some cases for expressions which translate null values into not null values (id: 42448)

Changes in 2024.02.1

Druid changes

  • Fixed an issue where the Druid console generates a Kafka supervisor spec where flattenSpec is in the wrong place, causing it to be ignored (#15946)

Pivot changes

  • Fixed an issue where Pivot closes unexpectedly when you open the records visualization and apply a filter (id: 42189)

Platform changes

  • Fixed an issue with the GKE enhanced installation where passing a custom certificate authority certificate for a MySQL instance causes the installation to fail (id: 42316)

Changes in 2024.02

Pivot highlights

New overall visualization (beta)

A new overall visualization includes a trend line and an updated properties panel.

You can enable this beta feature through the SDK based visualizations feature flag. Once enabled, the beta overall visualization replaces the standard overall visualization. See Visualizations reference for more information. (ids: 40562, 41090)

Druid highlights

Improved concurrent append and replace

You no longer need to manually specify the task lock type for concurrent append and replace using the taskLockType context parameter. Instead, Druid can determine it for you. You can either use a context parameter or a cluster-wide config:

  • Use the context parameter "useConcurrentLocks": true for specific JSON-based or streaming ingestion tasks and datasources. Datasources need the parameter in situations such as when you want to be able to append data to the datasource while compaction is running.
  • Set "useConcurrentLocks": true in the cluster-wide config druid.indexer.task.default.context.
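
For the per-task option, a minimal Python sketch of adding the parameter to a task's context might look like the following. The helper name and the placeholder task spec are hypothetical; only the useConcurrentLocks parameter comes from these notes.

```python
import copy


def with_concurrent_locks(task_spec: dict) -> dict:
    """Return a copy of task_spec with useConcurrentLocks enabled in its context."""
    updated = copy.deepcopy(task_spec)
    # Keep any context entries the task already has.
    updated.setdefault("context", {})["useConcurrentLocks"] = True
    return updated


# Placeholder spec for illustration; a real task spec has more fields.
base_spec = {"type": "index_parallel", "spec": {}}
```

The helper copies the spec rather than changing it in place, so the same base spec can be reused for tasks that should not use concurrent locks.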

(#1568) (id: 41083)

Range support for window functions

Window functions now support ranges where both endpoints are unbounded or are the current row. Ranges work in strict mode, which means that Druid will fail queries that aren't supported. You can turn off strict mode for ranges by setting the context parameter windowingStrictValidation to false.

The following examples show window expressions with RANGE frame specifications:

(ORDER BY c)
(ORDER BY c RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
(ORDER BY c RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
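
As a hedged sketch, the context parameter can be passed alongside a query in a SQL request body. The helper name, the table tbl, and the columns c and m are hypothetical; the query and context keys follow the usual shape of a Druid SQL API request.

```python
import json


def sql_payload(query: str, strict: bool = True) -> str:
    """Serialize a SQL request body with windowingStrictValidation set."""
    payload = {
        "query": query,
        "context": {"windowingStrictValidation": strict},
    }
    return json.dumps(payload)


# Hypothetical query that uses a RANGE frame ending at the current row.
running_total_query = (
    "SELECT c, SUM(m) OVER "
    "(ORDER BY c RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total "
    "FROM tbl"
)
relaxed_body = sql_payload(running_total_query, strict=False)
```

Leaving strict at its default keeps strict mode on, so unsupported range queries fail instead of running.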

(#15746) (#15365) (id: 41623)

Ingest from multiple Azure accounts

Azure as an ingestion source now supports ingesting data from multiple storage accounts that are specified in druid.azure.account. To do this, use the new azureStorage schema instead of the previous azure schema. For example:

    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "azureStorage",
        "objectGlob": "**.json",
        "uris": ["azureStorage://storageAccount/container/prefix1/file.json", "azureStorage://storageAccount/container/prefix2/file2.json"]
      },
      "inputFormat": {
        "type": "json"
      },
      ...
    },
    ...

(#15630) (id: 41428)

Improved performance for real-time queries

If the query context bySegment is set to false for real-time queries, the way in which layers are merged has been improved to be more efficient. There's now only a single layer of merging, just like for Historical processes. As part of this change, segment metrics, like query/segment/time, are now per-FireHydrant instead of per-Sink.

If you set bySegment to true, the old behavior of two layers of merging is preserved.

(#15757) (id: 41406)

Pivot changes

  • Added maxNumDownloadTasks to Pivot server configuration file, to optionally set the maximum number of tasks to assign to async downloads. See Pivot server config for more information (id: 41092)
  • Added an option to "Go to URL" for URL dimensions in the flat table visualization (id: 41283)
  • Fixed an error that appeared when duplicating a dashboard from the header bar (id: 41537)
  • Fixed a problem with filtering on a dimension with Set/String type that contains nulls (id: 41459)
  • Fixed an issue where async downloads didn't include filters by measure (id: 41435)
  • Fixed records table visualization crashing when scrolling to the bottom in a dashboard tile (id: 41165)
  • Fixed an issue with the records visualization not supporting async download (id: 41289)
  • Fixed dimensions with IDs that contain periods showing as "undefined" in records table visualization (id: 41009)
  • Fixed Pivot 2 visualizations crashing on data cubes with no dimensions (id: 40998)
  • Fixed inability to set "Greater than 0" measure filter in flat table visualization (id: 40985)
  • Fixed a problem with visualization URLs not updating after a measure is deleted from a data cube (id: 40565)
  • Fixed "overall" values rendering incorrectly in line chart visualization when they should be hidden (id: 40501)
  • Fixed incorrect time bucket label for America/Mexico_City timezone in DST (id: 39749)
  • Fixed inability to scroll pinned dimensions list (id: 39647)
  • Fixed discrepancies when applying custom UI colors (id: 40266)
  • Improved handling of time filters dashboard tiles (id: 41171)
  • Improved measures in tables visualization to show nulls if they contain no data (id: 40665)
  • Improved the display of comparison values in visualizations, by adding the ability to sort by delta and percentage (id: 38539)

Druid changes

  • Added QueryLifecycle#authorize for grpc-query-extension (#15816) (id: 41725)
  • Added nested array index support and fixed some issues (#15752) (id: 41724)
  • Added support for array types in the web console ingestion wizards (#15588) (id: 41613)
  • Added SQUARE_ROOT function to the timeseries extension: MAP_TIMESERIES(timeseries, 'sqrt(value)') (id: 41516)
  • Added null value index wiring for nested columns (#15687) (id: 41475)
  • Added support to the web console for sorting the segment table on start and end when grouped (#15720) (id: 41438)
  • Added a tile to the web console for the new Azure input source (id: 41317)
  • Added ImmutableLookupMap for static lookups (#15675) (id: 41268)
  • Added Cache value selectors in RowBasedColumnSelectorFactory (#15615) (id: 41265)
  • Added faster k-way merging using tournament trees and 8-byte key strides (#15661) (id: 40987)
  • Added CONCAT flattening filter decomposition (#15634) (id: 40986)
  • Added partition boosting for INSERT with GROUP BY (dealing with skewed partition) (#15474) (id: 15015)
  • Added SQL compatibility for numeric first and last column types. The web console also provides an option for first and last aggregation (#15607) (id: 40615)
  • Added differentiation between null and empty strings in SerializablePairStringLong serde (id: 40401)
  • Changed IncrementalIndex#add so that it is no longer thread safe, which improves performance (#15697) (id: 41260)
  • Fixed the KafkaInputFormat parsing incoming JSON as newline-delimited (as if it were a batch ingest) rather than as a whole entity (as is typical for streaming ingest) (#15692) (id: 41261)
  • Improved segment locking behavior so that the RetrieveSegmentsToReplaceAction is no longer needed (#15699) (id: 41484)
  • Disabled eager initialization for non-query connection requests (#15751) (id: 41407)
  • Enabled ArrayListRowsAndColumns to StorageAdapter conversion (#15735) (id: 41616)
  • Enabled query request queuing by default when total laning is turned on (#15440) (id: 40807)
  • Fixed web console forcing waitUntilSegmentLoad to true even if the user sets it to false (#15781) (id: 41614)
  • Fixed CVEs (#15814) (id: 41612)
  • Fixed interpolated exception message in InvalidNullByteFault (#15804) (id: 41546)
  • Fixed extractionFns on number-wrapping dimension selectors (#15761) (id: 41443)
  • Fixed summary iterator in grouping engine (#15658) (id: 41264)
  • Fixed incorrect scale when reading decimal from parquet (#15715) (id: 41263)
  • Fixed a rendering issue for disabled workers in the web console (#15712) (id: 41259)
  • Fixed issues so that the Kafka emitter now runs all scheduled callables. The emitter now intelligently provisions threads to make sure there are no wasted threads and all callables can run (#15719) (id: 41258)
  • Fixed MSQ task engine intermediate files not being immediately cleaned up in Azure (id: 41243)
  • Fixed audit log entries not appearing for "Mark as used all segments" actions (id: 41080)
  • Fixed some naming related to AggregatePullUpLookupRule (#15677) (id: 41030)
  • Fixed an NPE that could occur if the StandardDeviationPostAggregator passed in is null: postAggregations.estimator: null (#15660) (id: 41003)
  • Fixed reverse pull-up lookups in the SQL planner (#15626) (id: 41002)
  • Fixed compaction getting stuck on intervals with tombstones (#15676) (id: 41001)
  • Fixed the result cache causing an exception when a sketch is stored in the cache (#15654) (id: 40885)
  • Fixed concurrent append and replace options in the web console (#15649) (id: 40868)
  • Fixed an issue that prevented queries issued from the small Run buttons (inside larger queries) from being modified from the table actions (#15779) (id: 41515)
  • Improved segment killing performance for Azure (#15770) (id: 38567)
  • Improved the performance of the druid-basic-security extension (#15648) (id: 40884)
  • Improved lookups to register the first lookup immediately, regardless of the cache status (#15598) (id: 40863)
  • Improved numerical first and last aggregators so that they work for SQL-based ingestion too (id: 40996)
  • Improved parsing speed for list-based input rows (#15681) (id: 41262)
  • Improved error messages for DATE_TRUNC operators (#15759) (id: 41471)
  • Improved the web console to support using file inputs instead of text inputs for the Load query detail archive dialogue (#15632) (id: 40941)
  • Changed the web console to use the new azureStorage input type instead of the azure storage type for ingesting from Azure (#15820) (id: 41723)
  • Changed the cryptographic salt size that Druid uses to 128 bits so that it is FIPS compliant (#15758) (id: 41405)

Changes in 2024.01.3

Druid changes

  • Fixed an issue where DataSketches HLL Sketches would erroneously be considered empty. For details, see the related Imply Knowledge Base article (id: 41916)

Changes in 2024.01.2

Druid changes

  • Fixed an issue where an exception occurs when queries use filters on TIME_FLOOR (#15778)

Changes in 2024.01.1

Druid changes

  • Fixed an issue with the default value for the inSubQueryThreshold parameter, which resulted in slower than expected queries. The default value is now 2147483647 (up from 20) (#15688) (id: 40814)

Changes in 2024.01

Pivot highlights

Pivot now runs natively on macOS ARM systems

We encourage on-prem customers to opt in to an updated distribution format for Pivot by setting the following environment variable on their Pivot nodes: IMPLY_PIVOT_NOPKG=1. This format will become the default later in 2024.

This distribution format enables Pivot to target current and future LTS versions of Node.js and provides a compatibility option for customers who are unable to upgrade from legacy Linux distributions such as RHEL 7, CentOS 7, and Ubuntu 18.04. (id: 40447)

Druid highlights

SQL PIVOT and UNPIVOT (beta)

You can now use the SQL PIVOT and UNPIVOT operators to turn rows into columns and column values into rows, respectively. (id: 37598)

The PIVOT operator carries out an aggregation and transforms rows into columns in the output. The following is the general syntax for the PIVOT operator:

PIVOT (aggregation_function(column_to_aggregate)
FOR column_with_values_to_pivot
IN (pivoted_column1 [, pivoted_column2 ...])
)

The UNPIVOT operator transforms existing column values into rows. The following is the general syntax for the UNPIVOT operator:

UNPIVOT (values_column 
FOR names_column
IN (unpivoted_column1 [, unpivoted_column2 ... ])
)
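To make the two operators concrete, the following minimal Python sketch mimics what PIVOT and UNPIVOT do to a small set of rows. It does not run against Druid. The sample rows, the column names, and the helper functions pivot_sum and unpivot are hypothetical, and skipping null values during the unpivot step is an assumption of this sketch.

```python
from collections import defaultdict

def pivot_sum(rows, group_col, pivot_col, value_col, pivot_values):
    """Mimic PIVOT (SUM(value_col) FOR pivot_col IN (pivot_values))."""
    # One output row per group; every pivoted column starts as None (null).
    grouped = defaultdict(lambda: {v: None for v in pivot_values})
    for row in rows:
        out = grouped[row[group_col]]
        label = row[pivot_col]
        if label in out and row[value_col] is not None:
            out[label] = (out[label] or 0) + row[value_col]
    return [{group_col: key, **cols} for key, cols in grouped.items()]

def unpivot(rows, values_col, names_col, unpivot_cols):
    """Mimic UNPIVOT (values_col FOR names_col IN (unpivot_cols))."""
    out = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in unpivot_cols}
        for col in unpivot_cols:
            if row[col] is not None:  # assumption: nulls are skipped
                out.append({**kept, names_col: col, values_col: row[col]})
    return out

# Hypothetical sample data: one row per click event batch.
rows = [
    {"country": "US", "channel": "web", "clicks": 10},
    {"country": "US", "channel": "app", "clicks": 5},
    {"country": "US", "channel": "web", "clicks": 2},
    {"country": "FR", "channel": "web", "clicks": 7},
]

# Rows become columns: one "web" and one "app" column per country.
wide = pivot_sum(rows, "country", "channel", "clicks", ["web", "app"])

# Column values become rows again: one row per (country, channel) pair.
long_rows = unpivot(wide, "clicks", "channel", ["web", "app"])
```

Here wide holds one row per country with a column for each channel, and long_rows turns those channel columns back into rows.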

New JSON_QUERY_ARRAY function

The JSON_QUERY_ARRAY function is similar to JSON_QUERY except the return type is always ARRAY<COMPLEX<json>> instead of COMPLEX<json>. Essentially, this function allows extracting arrays of objects from nested data and performing operations such as UNNEST, ARRAY_LENGTH, ARRAY_SLICE, or any other available ARRAY operations. (#15521) (id: 40335)

Changes to native equals filter

The native query equals filter on mixed-type 'auto' columns that contain arrays must now filter values as their presenting type. If any rows are arrays (that is, the segment metadata and information_schema report the type as some array type), native queries must also filter as if the column is an array type. This does not impact SQL, which already has this limitation because of how the type presents itself. The change only affects mixed-type 'auto' columns, which contain both scalars and arrays. (#15503) (id: 40328)

Support for GCS for SQL-based ingestion

You can now use Google Cloud Storage (GCS) as durable storage for SQL-based ingestion and queries from deep storage. (#15398) (id: 35053)

Improved INNER joins

Druid now supports arbitrary join conditions for INNER joins. Druid looks at the join condition and converts any sub-conditions that cannot be evaluated efficiently as part of the join into a post-join filter. With this feature, you can run inequality joins that were not possible before. (#15302) (id: 37564)

Pivot changes

  • Added Pivot server configuration property forceNoRedirect which forces the Pivot UI to always render the splash page without automatic redirection (id: 38986)
  • Added the ability to sort a data cube by the first column, by clicking the column header (id: 31363)
  • Fixed percent of root causing downloads from deep storage to fail (id: 40673)
  • Fixed incorrect sort order in deep storage downloads (id: 40374)
  • Fixed flat table visualization with absolute time filter using "Latest day" when accessed with link (id: 40339)
  • Fixed functional and display issues in the overall visualization (id: 40271)
  • Fixed back button not working correctly in async downloads dialog (id: 40265)
  • Improved query generation in Pivot and Plywood to use the 2-value IS NOT TRUE version of the NOT operator (id: 40638)
  • Improved data cube measure preview by providing a manual override prompt when the preview fails (id: 38763)
  • Updated the names of the async downloads feature flags to Async Downloads (Deprecated) and Async Downloads, New Engine, 2023 (Alpha) (id: 40525)

Druid changes

  • Added experimental support for first/last data types for double/float/long during native and SQL-based ingestion (#14462) (id: 37231)
  • Added new config druid.audit.manager.type, which can take the values log or sql (default). This allows audited events to either be logged or persisted in the metadata store (default behavior). (#15480) (id: 37696)
  • Added new config druid.audit.manager.logLevel, which sets the log level of audit events and can take the values DEBUG, INFO (default), or WARN. (#15480) (id: 37696)
  • Added array column type support to EXTEND operator (#15458) (id: 40286)
  • Changed what happens when query scheduler threads are less than server HTTP threads. When that happens, total laning is enforced, and some HTTP threads are reserved for non-query requests, such as health checks. Previously, any request that exceeded lane capacity was rejected. Now, excess requests are queued with a timeout equal to MIN(Integer.MAX_VALUE, druid.server.http.maxQueryTimeout). If the value is negative, requests are queued forever. (#15440) (id: 40776)
  • Changed the ARRAY_TO_MV function to support expression inputs (#15528) (id: 40358)
  • Changed the auto column indexer so that when columns that contain only empty or null-containing arrays are ingested, they are stored as ARRAY<LONG> instead of COMPLEX<json>. (#15505) (id: 40313)
  • Fixed an issue where null and empty strings were treated equally, and the return value was always null (#15525) (id: 40401)
  • Fixed an issue where lookups fail with an error related to failing to construct FilteredAggregatorFactory (#15526) (id: 40296)
  • Fixed issues related to null handling and vector expression processors (#15587) (id: 40545)
  • Fixed a bug in the ingestion spec to SQL-based ingestion query converter for the web console (#15627) (id: 40795)
  • Fixed redundant expansion in SearchOperatorConversion (#15625) (id: 40768)
  • Fixed an issue where some ARRAY types were treated incorrectly as COMPLEX types instead (#15543) (id: 40514)
  • Fixed an NPE with virtual expressions and unnest (#15513) (id: 40348)
  • Fixed an issue where the minimum aggregation in window functions treated nulls as 0 (#15371) (id: 40327)
  • Fixed an issue where null filters on datasources with range partitioning could lead to excessive segment pruning, leading to missed results (#15500) (id: 40288)
  • Fixed an issue with window functions where a string cannot be cast when creating HLL sketches (#15465) (id: 39859)
  • Fixed a bug in segment allocation that can potentially cause loss of appended data when running interleaved append and replace tasks. (#15459) (id: 39718)
  • Improved filtering performance by adding support for using underlying column index for ExpressionVirtualColumn (#15585) (#15633) (id: 39668) (id: 40794)
  • Improved how three-valued logic is handled (#15629) (id: 40797)
  • Improved the Broker to be able to use catalog for datasource schemas for SQL queries (#15469) (id: 40796)
  • Improved the Druid audit system to log when a supervisor is created or updated (#15636) (id: 40774)
  • Improved the connection between Brokers and Coordinators with Historical and real-time processes (#15596) (id: 40763)
  • Improved how segment granularity is handled when there is a conflict and the requested segment granularity can't be allocated. Day granularity is now considered after month. Previously, week was used, but weeks do not align with months perfectly. You can still explicitly request week granularity. (#15589) (id: 40701)
  • Improved polling in segment allocation queue to improve efficiency and prevent race conditions (#15590) (id: 40690)
  • Improved the web console to detect EXPLAIN PLAN queries and be able to run them individually (#15570) (id: 40508)
  • Improved the efficiency of queries by reducing the number of expression objects created during evaluations (#15552) (id: 40495)
  • Improved the error message you get if you try to use INSERT INTO and OVERWRITE syntax (id: 37790)
  • Improved the JDBC lookup dialog in the web console to include Jitter seconds, Load timeout seconds, and Max heap percentage options (#15472) (id: 40246)
  • Improved compaction so that it skips datasources with partial eternity segments, which could otherwise result in memory pressure on the Coordinator (#15542) (id: 40075)
  • Improved Kinesis integration so that only checkpoints for partitions with unavailable sequence numbers are reset (#15338) (id: 29788)
  • Improved the performance of the following:
    • how Druid generates queries from Calcite plans
    • the internal SEARCH operator used by other functions
    • the COALESCE function (#15609) (id: 40672) (#15623) (id: 40691)
  • Removed the ‘auto’ strategy from search queries. Specifying ‘auto’ will now be equivalent to specifying useIndexes (#15550) (id: 40460)
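The queuing rule for excess requests in the query scheduler change above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not Druid's implementation: the function name queue_timeout is hypothetical, and None is used here to stand for "queued forever".

```python
JAVA_INTEGER_MAX = 2**31 - 1  # value of Integer.MAX_VALUE in Java

def queue_timeout(max_query_timeout):
    """Return the queue timeout for a request that exceeds lane capacity.

    max_query_timeout stands in for druid.server.http.maxQueryTimeout.
    """
    timeout = min(JAVA_INTEGER_MAX, max_query_timeout)
    # A negative value means the request waits in the queue indefinitely.
    return None if timeout < 0 else timeout
```

For example, a very large configured timeout is capped at Integer.MAX_VALUE, while a negative one yields no timeout at all.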

Clarity changes

  • Updated subsetFormula for server cube to accept null values (id: 40254)

Platform changes

  • Added support for JVM memory metrics in GKE ZooKeeper deployments (id: 38855)

Upgrade and downgrade notes

Minimum supported version for rolling upgrade

See "Supported upgrade paths" in the Lifecycle Policy documentation.

Remove orphaned segments in GCS deep storage

If you have orphaned segments from failed kill tasks from 2024.01 STS through 2024.02.3 STS, optionally identify and delete any segments that meet both of the following criteria:

  • Segment exists in deep storage, but has no corresponding metadata store record.
  • Segment is older than 1 week.

Limiting deletion to segments older than one week prevents you from deleting pending segments.
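The two criteria can be combined as in the following minimal Python sketch. It assumes you have already listed the segments in deep storage along with a timestamp for each one and have fetched the set of segment IDs recorded in the metadata store. The function name find_orphaned_segments, the input shapes, and the use of a last-modified timestamp as the segment age are all hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone

def find_orphaned_segments(deep_storage, metadata_ids, now=None,
                           min_age=timedelta(weeks=1)):
    """Return IDs of segments that meet both criteria.

    deep_storage: dict mapping segment ID -> last-modified datetime (UTC).
    metadata_ids: set of segment IDs that have a metadata store record.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - min_age
    return sorted(
        seg_id
        for seg_id, modified in deep_storage.items()
        # Criterion 1: no metadata record. Criterion 2: older than one week.
        if seg_id not in metadata_ids and modified < cutoff
    )

# Hypothetical inputs with a fixed "now" so the result is reproducible.
now = datetime(2024, 4, 1, tzinfo=timezone.utc)
deep_storage = {
    "seg_a": now - timedelta(days=10),  # orphaned and old enough
    "seg_b": now - timedelta(days=2),   # orphaned but possibly pending
    "seg_c": now - timedelta(days=30),  # still has a metadata record
}
candidates = find_orphaned_segments(deep_storage, {"seg_c"}, now=now)
```

Only seg_a qualifies in this example. Review any candidate list before you delete anything from deep storage.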

stopTaskCount must now be explicitly set

Starting in 2024.03 STS, you must explicitly set a value for stopTaskCount if you want to use it for streaming ingestion. It no longer defaults to the same value as taskCount.

Segment metrics for real-time queries

Starting in 2024.02 STS, segment metrics for real-time queries (such as query/segment/time) are per-FireHydrant instead of per-Sink when the context parameter bySegment is set to false, which is common for most use cases.

Renamed segment metric

Starting in 2024.03 STS, the kill/candidateUnusedSegments/count metric is now called kill/eligibleUnusedSegments/count. (#15977) (id: 42492)

GroupBy queries that use the MSQ task engine during upgrades

Beginning in 2024.02 STS, the performance and behavior for segment partitioning has been improved. GroupBy queries may fail during an upgrade if some workers are on an older version and some are on a more recent version.

Changes to native equals filter

Beginning in 2024.01 STS, the native query equals filter on mixed-type 'auto' columns that contain arrays must filter values as their presenting type. If any rows are arrays (that is, the segment metadata and information_schema report the type as some array type), native queries must also filter as if the column is an array type. This does not impact SQL, which already has this limitation because of how the type presents itself. The change only affects mixed-type 'auto' columns, which contain both scalars and arrays.

Imply Hybrid MySQL upgrade

Imply Hybrid previously used MySQL 5.7 by default. New clusters use MySQL 8 by default. If you have an existing cluster, you need to upgrade from MySQL 5.7 because Amazon RDS support for that version is scheduled to end on February 29, 2024. Although you can opt for extended support from Amazon, you can instead use Imply Hybrid Manager to upgrade your MySQL instance to MySQL 8.

The upgrade should have little to no impact on your queries but does require a reconnection to the database. The process can take an hour and services will reconnect to the database during the upgrade.

In preparation for the upgrade, you need to grant certain permissions to the Cloud Manager IAM role by applying the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "rds:CreateBlueGreenDeployment",
        "rds:PromoteReadReplica"
      ],
      "Resource": [
        "arn:aws:rds:*:*:pg:*",
        "arn:aws:rds:*:*:deployment:*",
        "arn:aws:rds:*:*:*:imply-*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "rds:AddTagsToResource",
        "rds:CreateDBInstanceReadReplica",
        "rds:DeleteBlueGreenDeployment",
        "rds:DescribeBlueGreenDeployments",
        "rds:SwitchoverBlueGreenDeployment"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}

After you grant the permissions, click Apply changes for Amazon RDS MySQL Update on the Overview page of Imply Hybrid Manager.

Three-valued logic

caution

The legacy two-valued logic and the corresponding properties that support it will be removed in the December 2024 STS and January 2025 LTS. The SQL compatible three-valued logic will be the only option.

Update your queries and downstream apps prior to these releases.

SQL standard three-valued logic introduced in 2023.11 primarily affects filters using the logical NOT operation on columns with NULL values. This applies to both query and ingestion time filtering.

The following example illustrates the old and new behavior. Consider the filter x <> 'some value', which filters results for which x is not equal to 'some value'. Previously, Druid included all rows not matching x = 'some value', including rows with null values. The new behavior follows the SQL standard and only matches rows that have a value that is not equal to 'some value'. Rows with null values are excluded from the results.

Three-valued logic is only enabled if you accept the following default values:

druid.generic.useDefaultValueForNull=false
druid.expressions.useStrictBooleans=true
druid.generic.useThreeValueLogicForNativeFilters=true
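The effect of three-valued logic on a not-equals filter can be reproduced with any engine that follows the SQL standard for null comparisons. The following minimal sketch uses Python's built-in sqlite3 module as a stand-in for Druid, which is an assumption made purely for illustration. The table t and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("some value",), ("other",), (None,)],
)

# SQL-standard three-valued logic: NULL <> 'some value' is unknown,
# so the row with a null x is not returned.
three_valued = conn.execute(
    "SELECT x FROM t WHERE x <> 'some value' ORDER BY rowid"
).fetchall()

# To keep the rows that the legacy two-valued behavior returned,
# match nulls explicitly.
legacy_like = conn.execute(
    "SELECT x FROM t WHERE x <> 'some value' OR x IS NULL ORDER BY rowid"
).fetchall()

conn.close()
```

If a query or downstream app relied on null rows passing a NOT or <> filter, adding an explicit IS NULL clause like the second query is one way to preserve the earlier results.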

SQL compatibility

caution

The legacy behavior that is not compatible with standard ANSI SQL and the corresponding properties will be removed in the December 2024 STS and January 2025 LTS releases. The SQL compatible behavior introduced in the 2023.09 STS will be the only behavior available.

Update your queries and any downstream apps prior to these releases.

Starting with 2023.09 STS, the default way Druid treats nulls and booleans has changed.

For nulls, Druid now differentiates between an empty string ('') and a record with no data as well as between an empty numerical record and 0.

You can revert to the previous behavior by setting druid.generic.useDefaultValueForNull to true. This property affects both storage and querying, and must be set on all Druid service types to be available at both ingestion time and query time. Reverting this setting to the old value restores the previous behavior without reingestion.

For booleans, Druid now strictly uses 1 (true) or 0 (false). Previously, true and false could be represented either as true and false or as 1 and 0, respectively. In addition, Druid now returns a null value for Boolean comparisons like True && NULL.

druid.expressions.useStrictBooleans primarily affects querying; however, it also affects json columns and type-aware schema discovery for ingestion. You can set druid.expressions.useStrictBooleans to false to configure Druid to ingest booleans in 'auto' and 'json' columns as VARCHAR (native STRING) typed columns that use string values of 'true' and 'false' instead of BIGINT (native LONG). You must set it on all Druid service types to be available at both ingestion time and query time.
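The strict Boolean behavior described above can be sketched in Python as follows. This is a simplified model rather than Druid's expression engine: the helper names to_bool, strict_and, and strict_or are hypothetical, None stands in for null, and the handling of null operands follows standard SQL three-valued logic.

```python
def to_bool(value):
    """Map a value to True or False, or to None for null (unknown)."""
    return None if value is None else bool(value)

def strict_and(a, b):
    """Model a && b under strict booleans: the result is 1, 0, or None."""
    a, b = to_bool(a), to_bool(b)
    if a is False or b is False:
        return 0          # false wins even if the other side is unknown
    if a is None or b is None:
        return None       # unknown, for example True && NULL
    return 1

def strict_or(a, b):
    """Model a || b under strict booleans: the result is 1, 0, or None."""
    a, b = to_bool(a), to_bool(b)
    if a is True or b is True:
        return 1          # true wins even if the other side is unknown
    if a is None or b is None:
        return None
    return 0
```

With these helpers, strict_and(100, 11) and strict_or(100, 11) both return 1 rather than one of the operands, and strict_and(True, None) returns None.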

The following table illustrates some example scenarios and the impact of the changes:

Query | 2023.08 STS and earlier | 2023.09 STS and later
Query empty string | Empty string ('') or null | Empty string ('')
Query null string | Null or empty | Null
COUNT(*) | All rows, including nulls | All rows, including nulls
COUNT(column) | All rows excluding empty strings | All rows including empty strings but excluding nulls
Expression 100 && 11 | 11 | 1
Expression 100 || 11 | 100 | 1
Null FLOAT/DOUBLE column | 0.0 | Null
Null LONG column | 0 | Null
Null __time column | 0, meaning 1970-01-01 00:00:00 UTC | 1970-01-01 00:00:00 UTC
Null MVD column | '' | Null
ARRAY | Null | Null
COMPLEX | none | Null

Update your queries

Before you upgrade from a version prior to 2023.09 to 2023.09 or later, update your queries to account for the changed behavior:

NULL filters

If your queries use NULL in the filter condition to match both nulls and empty strings, you should add an explicit filter clause for empty strings. For example, update s IS NULL to s IS NULL OR s = ''.

COUNT functions

COUNT(column) now counts empty strings. If you want to continue excluding empty strings from the count, replace COUNT(column) with COUNT(column) FILTER(WHERE column <> '').

GroupBy queries

GroupBy queries on columns containing null values can now have additional entries as nulls can co-exist with empty strings.
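The NULL filter and COUNT rewrites above can be checked against a small example. The following minimal Python sketch models the SQL-compatible semantics over a plain list, with None standing in for SQL NULL. The sample values are hypothetical and nothing here runs against Druid.

```python
values = ["a", "", None, "b", None]  # None stands in for SQL NULL

# COUNT(column) in SQL-compatible mode: skips nulls, counts empty strings.
count_column = sum(1 for v in values if v is not None)

# COUNT(column) FILTER(WHERE column <> ''): the comparison is unknown for
# nulls, so both nulls and empty strings are left out of the count.
count_non_empty = sum(1 for v in values if v is not None and v != "")

# s IS NULL OR s = '': matches nulls and empty strings, as the legacy
# s IS NULL filter did.
null_or_empty = [v for v in values if v is None or v == ""]

# s IS NULL on its own now matches only the nulls.
null_only = [v for v in values if v is None]
```

In this example, COUNT(column) yields 3, the filtered count yields 2, and the combined filter matches three rows where s IS NULL alone matches two.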

Avatica JDBC driver upgrade

info

The Avatica JDBC driver is not packaged with Druid. Its upgrade is separate from any upgrades to Imply.

If you notice intermittent query failures after upgrading your Avatica JDBC driver to version 1.21.0 or later, you may need to set the transparent_reconnection property.

Parameter execution changes for Kafka

When using the built-in FileConfigProvider for Kafka, interpolations are now intercepted by the JsonConfigurator instead of being passed down to the Kafka provider. This breaks existing deployments.

For more information, see KIP-297 and #13023.

Deprecation notices

azure ingestion source parameter

Starting in 2024.02, the ioConfig.inputSource.type.azure parameter has been deprecated. Use the new azureStorage parameter instead. The new parameter supports ingesting from multiple accounts.

Two-valued logic

Druid's legacy two-valued logic for native filters and the properties for maintaining that behavior are deprecated and will be removed in the December 2024 STS and January 2025 LTS releases.

The ANSI-SQL compliant three-valued logic will be the only supported behavior after these releases. This SQL compatible behavior became the default for deployments that use Imply 2023.11 STS and January 2024 LTS releases.

Update your queries and downstream apps prior to these releases.

For more information, see three-valued logic.

Properties for legacy Druid SQL behavior

Druid's legacy behavior for Booleans and NULLs and the corresponding properties are deprecated and will be removed in the December 2024 STS and January 2025 LTS releases.

The ANSI-SQL compliant treatment of Booleans and null values will be the only supported behavior after these releases. This SQL compatible behavior became the default for Imply 2023.11 STS and January 2024 LTS.

Update your queries and downstream apps prior to these releases.

For more information, see SQL compatibility.

Some segment loading configs deprecated

Starting with 2023.08 STS, the following segment related configs are now deprecated and will be removed in future releases:

  • maxSegmentsInNodeLoadingQueue
  • maxSegmentsToMove
  • replicationThrottleLimit
  • useRoundRobinSegmentAssignment
  • replicantLifetime
  • maxNonPrimaryReplicantsToLoad
  • decommissioningMaxPercentOfMaxSegmentsToMove

Use smartSegmentLoading mode instead, which calculates values for these variables automatically.

SysMonitor support deprecated

Starting with 2023.08 STS, switch to OshiSysMonitor as SysMonitor is now deprecated and will be removed in future releases.

Asynchronous SQL download deprecated

The async downloads feature is deprecated and will be removed in future releases. Instead, consider using Query from deep storage.

End of support

CrossTab view is deprecated

The CrossTab view feature is no longer supported. Use Pivot 2.0 instead, which incorporates the capabilities of CrossTab view.