Imply Enterprise and Hybrid release notes
Read all release notes carefully, especially the Upgrade and downgrade notes, before upgrading. The following release notes provide information on features, improvements, and bug fixes up to Imply STS release 2024.04.
For information on the LTS release, see the LTS release notes.
If you are upgrading by more than one version, read the intermediate release notes too.
The following end-of-support dates apply in 2023:
- On January 26, 2023, Imply 2021.01 LTS reached EOL. This means that the 2021.01 LTS release line will no longer receive any patches, including security updates. Imply recommends that you upgrade to the latest LTS or STS release.
- On January 31, 2023, Imply 2022.01 LTS ended general support status and is eligible only for security support.
For more information, see Lifecycle Policy.
See Previous versions for information on older releases.
Imply evaluation
New to Imply? Get started with an Imply Hybrid (formerly Imply Cloud) Free Trial or start a self-hosted trial at Get started with Imply!
With Imply Hybrid, the Imply team manages your clusters in AWS, while you control the infrastructure and own the data. With self-hosted Imply, you can run Imply on *NIX systems in your own environment or cloud provider.
Imply Enterprise
If you run Imply Enterprise, see Imply product releases & downloads to access the Imply Enterprise distribution. When prompted, log on to Zendesk with your Imply customer credentials.
Changes in 2024.04
Pivot highlights
Database auth tokens
You can now create a database auth token on a Pivot role to enable access to specific Druid data. See Database auth tokens for more information.
(id: 41170)
Druid highlights
Improved array ingest mode
The `array` mode for `arrayIngestMode` contains improvements that make it the best choice for any new datasources that contain arrays. Imply strongly recommends that you use `array` mode instead of `mvd` mode. `array` mode provides a better experience, including support for a wider range of array types, and lets you avoid certain limitations of `mvd` mode. Continued improvements to the `array` ingest mode and array-typed columns are on the roadmap.
The following list describes the behavior based on what you set `arrayIngestMode` to:
- If you set it to `array`, SQL ARRAY types are stored using Druid array columns. This is recommended for new tables.
- If you set it to `mvd`, SQL VARCHAR ARRAY types are implicitly wrapped in `ARRAY_TO_MV`. This causes them to be stored as multi-value strings, using the same STRING column type as regular scalar strings. This is the default behavior when `arrayIngestMode` is not provided in your query context.
- If you set it to `none`, Druid throws an exception when trying to store any type of array.
The following table summarizes the differences in SQL ARRAY handling between `arrayIngestMode: array` and `arrayIngestMode: mvd`:

| SQL type | Stored type when `arrayIngestMode: array` | Stored type when `arrayIngestMode: mvd` (default) |
|---|---|---|
| VARCHAR ARRAY | ARRAY&lt;STRING&gt; | multi-value STRING |
| BIGINT ARRAY | ARRAY&lt;LONG&gt; | not possible (validation error) |
| DOUBLE ARRAY | ARRAY&lt;DOUBLE&gt; | not possible (validation error) |
In either mode, you can explicitly wrap string arrays in `ARRAY_TO_MV` to cause them to be stored as multi-value strings.
Note that you cannot mix string arrays and multi-value strings in the same column.
(#15920) (id: 43043)
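For example, with `arrayIngestMode: array` set in the query context, array expressions are stored as Druid array columns, while `ARRAY_TO_MV` still lets you opt into multi-value strings per column. This is a sketch; the datasource and column names are illustrative:

```sql
-- Illustrative only; run with query context: {"arrayIngestMode": "array"}
SELECT
  ARRAY['a', 'b'] AS str_array,           -- stored as ARRAY<STRING>
  ARRAY_TO_MV(ARRAY['a', 'b']) AS mv_str  -- explicitly stored as multi-value STRING
FROM example_datasource
```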
Pivot changes
- You can now use asymmetric number range filters with flat table, gauge, time series, and overall (beta) visualizations (id: 58507)
- Added query precision and query caching session persistence (id: 43251)
- Added `FilterWithRegex` permission which allows users to use the regex filter for string dimensions (id: 42840)
- Added Pivot server configuration property `disableExternalEmails` which allows administrators to disable sending alerts and reports to external email addresses (id: 42201)
- Added the ability to include only one value in a number filter (id: 43253)
- Improved the performance and behavior of the `PIVOT_NESTED_AGG` function (id: 43264)
- Fixed an issue with the line chart visualization causing dashboards to crash (id: 58506)
- Fixed an issue with the street map visualization including all data cube Latitude/Longitude dimensions, even the ones not in the visualization (id: 58505)
- Fixed an issue with the y-axis extending above the line chart visualization boundary for very small values (id: 43300)
- Fixed unnecessary query in axis-query generation (id: 43244)
- Fixed an issue with table columns not respecting the time format (id: 43181)
- Fixed incorrect x-axis in the time series visualization (id: 42725)
- Fixed dashboard time filter "include end bound" not working in gauge, flat table, time series, and overall (beta) visualizations (id: 42406)
- Fixed flat table changes not being propagated from data cube to dashboard (id: 42275)
- Fixed time comparison not working as expected in a line chart visualization when bucketing <=5 minutes (id: 41982)
- Fixed an issue where dashboard filters reset from Greater than or equal and Less than or equal to Greater than and Less than (id: 40777)
Druid changes
- Added more logging for S3 retries (#16117) (id: 43161)
- Added new in filter that preserves the input types (id: 41500)
- Added new typed in filter (#16039) (id: 48937)
- Added error code to failure type InternalServerError (#16186) (id: 54432)
- Added support for using window functions with the MSQ task engine as the query engine (#15470) (id: 39416)
- Added support for joins in decoupled mode (#15957) (id: 42763)
- Added `segmentsRead` and `segmentsPublished` fields to parallel compaction task completion reports so that you can see how effective a compaction task is (#15947) (id: 38574)
- Added a new `task/autoScaler/requiredCount` metric that provides a count of required tasks based on the calculations of the `lagBased` autoscaler. Compare that value to `task/running/count` to discover the difference between the current and desired task counts (#16199) (id: 58510)
- Changed the controller checker for the MSQ task engine to check for closed only (#16161) (id: 43289)
- Added geospatial interfaces (#16029) (id: 60162)
- Fixed ColumnType to RelDataType conversion for nested arrays (#16138) (id: 43178)
- Fixed Windowing `scanAndSort` query issues on top of Joins (#15996) (id: 42717)
- Fixed `REGEXP_LIKE`, `CONTAINS_STRING`, and `ICONTAINS_STRING` so that they correctly return null for null value inputs in ANSI SQL compatible null handling mode (the default configuration). Previously, they returned false (#15963) (id: 43288)
- Fixed the Azure icon not rendering in the web console (#16173) (id: 43286)
- Fixed a bug in the `MarkOvershadowedSegmentsAsUnused` Coordinator duty to also consider segments that are overshadowed by a segment that requires zero replicas (#16181) (id: 43285)
- Fixed issues with `ARRAY_CONTAINS` and `ARRAY_OVERLAP` with null left side arguments as well as `MV_CONTAINS` and `MV_OVERLAP` (#15974) (id: 43162)
- Fixed an issue where numeric LATEST_BY and EARLIEST_BY aggregations show incorrect results (#15939) (id: 42342)
- Fixed a bug in the `markUsed` and `markUnused` APIs where an empty set of segment IDs would be inconsistently treated as null or non-null in different scenarios (#16145) (id: 43153)
- Fixed a bug where export queries did not use the output names specified and exported the temporary column names instead for some queries, such as GROUP BY (#16096) (id: 42826)
- Fixed a bug where `numSegmentsKilled` is reported incorrectly (#16103) (id: 42960)
- Fixed an issue with metric emission in the segment generation phase (#16146) (id: 43152)
- Fixed a data race in getting results from MSQ select tasks (#16107) (id: 43000)
- Fixed an issue which can occur when using schema auto-discovery on columns with a mix of array and scalar values and querying with scan queries (#16105) (id: 43007)
- Fixed a bug where completion task reports are not being generated on `index_parallel` tasks (#16042) (id: 42805)
- Fixed an issue where `safe_divide` queries returned "Calcite assertion violated" errors (id: 41766)
- Fixed an issue where SQL-based ingestion fails if the first monitor for `druid.server.metrics.ServiceStatusMonitor` is `ServiceStatusMonitor` (id: 38520)
- Improved ingestion performance by parsing an input stream directly instead of converting it to a string and parsing the string as JSON (#15693) (id: 57692)
- Improved optimizations to the MSQ task engine for real-time queries so that they are backwards compatible (id: 42658)
- Improved serialization of TaskReportMap (#16217) (id: 60179)
- Improved the creation of input row filter predicate in various batch tasks (#16196) (id: 56861)
- Improved how tasks are fetched from the Overlord to redact credentials (#16182) (id: 52829)
- Improved the web console to only pick the Kafka input format by default when needed (#16180) (id: 60186)
- Improved compaction segment read and published fields to include sequential compaction tasks (#16171) (id: 60142)
- Improved the `markUnused` API endpoint to handle an empty list of segment versions (#16198) (id: 56864)
- Improved the `segmentIds` filter in the `markUsed` API payload so that it's parameterized in the database query (#16174) (id: 47268)
- Improved how quickly workers get canceled for the MSQ task engine (#16158) (id: 43179)
- Improved the MSQ task engine to support `IS NOT DISTINCT FROM` for SortMerge joins (#16003) (id: 43099)
- Improved the download query detail archive option in the web console to be more resilient when the detail archive is incomplete (#16071) (id: 42908)
- Improved the UX for `arrayIngestMode` in the web console (#15927) (id: 43038)
- Improved array handling for Booleans to account for queries such as `select array[true, false] from datasource` (#16093) (id: 42963) (id: 42610)
- Improved nested columns. Nested column serialization now releases nested field compression buffers as soon as the nested field serialization is completed, which requires significantly less direct memory during segment serialization when many nested fields are present (#16076) (id: 42955)
- Improved querying to decrease the chance of going OOM with high cardinality data Group By (#16114) (id: 42502)
- Improved real-time queries that use the MSQ task engine by changing how segments are grouped (#15399) (id: 39167)
- Optimized `isOvershadowed` when there is a unique minor version for an interval (#15952) (id: 43287)
- Updated the following dependencies:
  - `redis.clients:jedis` from 5.0.2 to 5.1.2 (#16074) (id: 42909)
  - `express` from 4.18.2 to 4.19.2 in the web console (#16204) (id: 60147)
  - `webpack-dev-middleware` from 5.3.3 to 5.3.4 in the web console (#16195) (id: 60146)
  - `follow-redirects` from 1.15.5 to 1.15.6 in the web console (#16134) (id: 43157)
  - `axios` in the web console (#16087) (id: 42954)
  - `druid-toolkit` from 0.21.9 to 0.22.11 in the web console (#16213) (id: 60144)
Clarity changes
- Disabled alert custom time periods of less than one minute (id: 43223)
Imply Manager changes
- Allowed all kube-system pods to be moved by the cluster-autoscaler in GKE (id: 43198)
- Prevented Middle Managers from being replaced before task status is synced during rolling updates (id: 40137)
- Imply Hybrid on AWS:
- Enabled ServiceStatusMonitor by default (id: 38540)
- Fixed cluster manager API so that custom extensions are not removed (id: 33190)
Changes in 2024.03.1
Druid changes
- Fixed an issue where the Overlord process could fail to return the location of tasks (id: 60106)
Changes in 2024.03
Pivot highlights
Time series visualization supports more functions
The time series visualization now supports the following time series functions in addition to TIMESERIES and DELTA_TIMESERIES:
- ADD_TIMESERIES
- DIVIDE_TIMESERIES
- MULTIPLY_TIMESERIES
- SUBTRACT_TIMESERIES
The configuration options have also been simplified. See Visualization reference for details.
(id: 40670)
Druid highlights
Dynamic table append
You can now use the `TABLE(APPEND(...))` function to implicitly create unions based on table schemas. For example, the following two queries are equivalent:

```sql
TABLE(APPEND('table1','table2','table3'))
```

and

```sql
SELECT column1, NULL AS column2, NULL AS column3 FROM table1
UNION ALL
SELECT NULL AS column1, column2, NULL AS column3 FROM table2
UNION ALL
SELECT column1, column2, column3 FROM table3
```
Note that if the same columns are defined with different input types, Druid uses the least restrictive column type.
(#15897) (id: 42645)
Renamed segment kill metric
The `kill/candidateUnusedSegments/count` metric is now called `kill/eligibleUnusedSegments/count`.
(#15977) (id: 42492)
Improved streaming task completion reports
Streaming task completion reports now include an extra field, `recordsProcessed`. The field lists the partitions processed by that task and the count of records for each partition. You can use this field to see the actual throughput of tasks and decide whether to scale your workers vertically or horizontally.
(#15930) (id: 42430)
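The new field maps each processed partition to its record count. A report snippet might look roughly like the following (an illustrative shape and values, not the exact report schema):

```json
{
  "recordsProcessed": {
    "0": 152034,
    "1": 149987
  }
}
```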
Improved Supervisor rolling restarts
The `stopTaskCount` config now prioritizes stopping older tasks first. As part of this change, you must also explicitly set a value for `stopTaskCount`. It no longer defaults to the same value as `taskCount`.
(#15859) (id: 42143) (id: 40605)
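For example, a streaming supervisor spec might now set both values explicitly. A minimal sketch, assuming a Kafka supervisor (the topic name and counts are illustrative):

```json
{
  "type": "kafka",
  "ioConfig": {
    "topic": "example-topic",
    "taskCount": 4,
    "stopTaskCount": 2
  }
}
```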
Parallelized incremental segment creation
You can now configure the number of threads used to create and persist incremental segments on the disk using the `numPersistThreads` property. Use additional threads to parallelize segment creation to prevent ingestion from stalling or pausing frequently, as long as there are sufficient CPU resources available.
(#13982) (id: 32098)
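A minimal sketch of where the property goes, assuming a Kafka streaming ingestion tuning config (the value is illustrative):

```json
{
  "tuningConfig": {
    "type": "kafka",
    "numPersistThreads": 2
  }
}
```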
Fixes for deep storage on Google Cloud Storage
This release contains fixes for customers using deep storage on GCS. The issues were caused by updates to the Google Cloud client libraries from an older API client. Affected STS versions of Imply were 2024.01 STS through 2024.02.3 STS. For remediation steps for kill task failures, see Remove orphaned segments in deep storage.
- Fixed kill task failures caused when trying to delete a file that doesn't exist in Google Cloud Storage (#16047) (id: 42663)
- Fixed an issue where Druid incorrectly deleted task log events when `druid.indexer.logs.kill.enabled` is active due to a mismatch in time units between the Druid configuration and the Google Cloud client (#16083) (id: 42838)
- Fixed errors when loading lookups sourced from GCS buckets where the fetched GCS object version is null (#16097) (id: 42916)
Improved query performance for AND filters
Druid query processing now adaptively determines when children of AND filters should compute indexes and when to simply match rows during the scan, based on the selectivity of other filters. Known as "filter partitioning," this can result in dramatic performance increases, depending on the order of filters in the query.
For example, take a query like `SELECT SUM(longColumn) FROM druid.table WHERE stringColumn1 = '1000' AND stringColumn2 LIKE '%1%'`. Previously, Druid used indexes when processing filters if they were available. That's not always ideal; imagine if `stringColumn1 = '1000'` matches 100 rows. With indexes, Druid has to find every value of `stringColumn2 LIKE '%1%'` that is true to compute the indexes for the filter. If `stringColumn2` has more than 100 values, it ends up being worse than simply checking for a match in those 100 remaining rows.
With the new logic, Druid checks the selectivity of indexes as it processes each clause of the AND filter. If it determines it would take more work to compute the index than to match the remaining rows, Druid skips computing the index.
The order you write filters in the WHERE clause of a query can improve its performance. More improvements are coming, but you can try out the existing improvements by reordering a query. Put filters with indexes that are less intensive to compute, such as IS NULL, =, and comparisons (`>`, `>=`, `<`, and `<=`), near the start of AND filters so that Druid processes your queries more efficiently. Not ordering your filters this way won't degrade performance from previous releases, since the fallback behavior is what Druid did previously.
(#15838) (id: 41535)
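Concretely, the guidance above means putting the cheap equality filter ahead of the expensive pattern filter (a sketch; the table and column names are illustrative):

```sql
-- Preferred ordering: the = filter is cheap to index and, if it matches
-- few rows, Druid can skip computing the index for the LIKE filter.
SELECT SUM(longColumn)
FROM druid.table
WHERE stringColumn1 = '1000'
  AND stringColumn2 LIKE '%1%'
```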
Pivot changes
- Added permission `AccessDownloadAsync` to allow users to access the async download (alpha) feature when the feature is enabled for your organization (id: 42274)
- You can now set the Latest data strategy to Query the latest timestamp from the data source, relative to the latest full day in the advanced data cube options (id: 39634)
- You can now set the default view in a data cube's defaults to be a gauge, flat table, time series, or overall (beta) visualization (id: 41373)
- You can now choose whether or not to display the year in time values in a table visualization (id: 40988)
- Fixed an issue where filters and shown dimensions and measures were not preserved when switching to some visualization types (id: 41059)
- Fixed Pivot showing an error for some time comparisons in a data cube (id: 41013)
- Fixed a rounding issue in the display of dimensions (id: 42654)
- Fixed downloads limited to 5,000 rows for flat table, gauge, time series, and overall (beta) visualizations (id: 42600)
- Fixed failed async downloads producing a truncated file instead of an error (id: 42595)
- Fixed query precision issues (ids: 42521, 42227, 42230)
- Fixed async downloads not working with "previous period" comparisons (id: 42247)
- Fixed Pivot crashing when applying a filter to the records visualization (id: 42189)
- Fixed dashboard tiles causing save conflicts in flat table, gauge, time series, and overall (beta) visualizations (id: 41414)
- Fixed lack of indication when data cube is refreshed (id: 40260)
Druid changes
- Added support for single value aggregated Group By queries for scalars (#15700) (id: 41951)
- Added support for numeric arrays to columnar frames, which are used in subquery materializations and window functions (#15917) (id: 41784)
- Added the ability to set custom dimensions for events emitted by the Kafka emitter as a JSON map for the `druid.emitter.kafka.extra.dimensions` property. For example, `druid.emitter.kafka.extra.dimensions={"region":"us-east-1","environment":"preProd"}` (#15845) (id: 41961)
- Added more AWS Kinesis regions and groups to the web console (#15900) (id: 42476)
- Added support to the web console for Protobuf input formats and the Avro bytes decoder (#15950) (id: 42461)
- Changed the format of the value of `targetDataSource` in EXPLAIN clauses for SQL-based ingestion queries back to being a string. For some recent releases, it was a JSON object (#16004) (id: 42575)
- Changed the severity of a `k8sTaskRunner` log message to WARN (#15871) (id: 42303)
- Changed the `durationMs` properties in MSQ task reports to exclude worker/controller start up time (id: 40311)
- Fixed an issue where queries that use LATEST_BY or EARLIEST_BY return null when they contain a secondary timestamp column (#15939) (id: 42917)
- Fixed an issue where Druid incorrectly deleted task log events when `druid.indexer.logs.kill.enabled` is active due to a mismatch in time units between the Druid configuration and the Google Cloud client (#16083) (id: 42838)
- Fixed errors when loading lookups sourced from GCS buckets where the fetched GCS object version is null (#16097) (id: 42916)
- Fixed an issue where the data loader for the web console crashes when attempting to parse data that can't be parsed (#15983) (id: 42649)
- Fixed an issue where string inputs are ingested as null values when they are typed as LONG or BIGINT. For example, decimals like "1.0" or "1.23" were incorrectly treated as NULL instead of 1 or 1.23. (#15615) (id: 42657)
- Fixed an issue where compaction task reports got overwritten. New entries are written to the report instead (#15981) (id: 42673)
- Fixed an issue that occurred when the `castToType` parameter is set on `auto` column schema (#15921) (id: 42434)
- Fixed an issue where `flattenSpec` is in the wrong location if you use the web console to generate the supervisor spec for a Kafka ingestion (#15946) (id: 42433)
- Fixed an issue where Kubernetes environment variables that use underscores would be parsed incorrectly (#15919) (id: 42336)
- Fixed an issue where the wrong base template would be used for task types included through extensions, such as `index_kinesis`. For example, if you define `druid.indexer.runner.k8s.podTemplate.index_kafka`, the KubernetesTaskRunner still used `druid.indexer.runner.k8s.podTemplate.base` as the base template for tasks (#15915) (id: 42293)
- Fixed an issue where a query returns the wrong results if `PARSE_LONG` is null (#15909) (id: 42134)
- Fixed an issue where MSQ task engine results are truncated and return an error (#16107)
- Improved Connection Count server select strategy to account for slow connection requests (#15975) (id: 42662)
- Improved the retry behavior for deep storage connections (#15938) (id: 42690)
- Improved how segments are counted so that segments still available through deep storage (replicas set to 0) are not marked as unavailable (#16020) (id: 42656)
- Improved the error message for when an MSQ task engine-based join using the `sortMerge` option falls back to a broadcast join (#16002) (id: 42655)
- Improved `druid-basic-security` performance by using the cache for the password hash when validating LDAP passwords (#15993) (id: 42650)
- Improved concurrent replace to work with supervisors using concurrent locks (#15995) (id: 42648)
- Improved the web console to detect doubles better (#15998) (id: 42646)
- Improved the web console to be able to search in tables and columns (#15990) (id: 42647)
- Improved segment troubleshooting. Segments created in the same batch have the same `created_date` entry (#15977) (id: 42492)
- Improved the error messages you get if there's an issue with your PARTITIONED BY clause (#15961) (id: 42462)
- Improved the web console to support export with the MSQ task engine (#15969) (id: 42460)
- Improved performance by reducing the number of metadata calls for the status of active tasks (#15724) (id: 42445)
- Improved how connections are counted and servers are selected to account for slow connections (#15975) (id: 42407)
- Improved the web console to allow compaction config slots to drop to 0, such as when compaction is paused (#15877) (id: 42178)
- Improved the web console to include system fields when using the batch data loader (#15858) (id: 41918)
- Updated PostgreSQL from 42.6.0 to 42.7.2 (#15931) (id: 42432)
- Improved performance for real-time queries that use the MSQ task engine (#15399) (id: 39167)
- Improved the Coordinator process to better handle an uninitialized cache in node role watchers, which could lead to stuck tasks (#15726) (id: 39099)
- Improved how expressions are evaluated to ensure thread safety (#15694) (id: 42620)
- Improved batching of scan results while estimating bytes (#15987) (id: 42507)
- Updated Log4j from 2.18.0 to 2.22.1 (#15934) (id: 42431)
Platform changes
- Account settings now display in Imply Hybrid Manager in SSO mode (id: 42372)
- Fixed an issue with Imply Enterprise on GKE deployments where passing a custom certificate authority certificate for a MySQL instance causes the installation to fail (id: 42316)
- Fixed a race condition that could cause Enterprise deployments on GKE to fail to start because of files missing from the configuration bundle (id: 42747) (id: 42726)
- Fixed an issue where GCP automatically deploys a managed Prometheus instance causing pod exhaustion. Imply Enterprise on GKE turns off these automatic Prometheus deployments by default now (id: 42567)
Changes in 2024.02.3
Druid changes
- Fixed an issue where string inputs are ingested as null values when they are typed as LONG or BIGINT. For example, decimals like "1.0" or "1.23" were incorrectly treated as NULL instead of 1 or 1.23. For details, see the following Imply Knowledge Base article (id: 42545)
Changes in 2024.02.2
Druid changes
- Fixed an issue with filters on expression virtual column indexes incorrectly considering values null in some cases for expressions which translate null values into not null values (id: 42448)
Changes in 2024.02.1
Druid changes
- Fixed an issue where the Druid console generates a Kafka supervisor spec where `flattenSpec` is in the wrong place, causing it to be ignored (#15946)
Pivot changes
- Fixed an issue where Pivot closes unexpectedly when you open the records visualization and apply a filter (id: 42189)
Platform changes
- Fixed an issue with the GKE enhanced installation where passing a custom certificate authority certificate for a MySQL instance causes the installation to fail (id: 42316)
Changes in 2024.02
Pivot highlights
New overall visualization (beta)
A new overall visualization includes a trend line and an updated properties panel.
You can enable this beta feature through the SDK based visualizations feature flag. Once enabled, the beta overall visualization replaces the standard overall visualization. See Visualizations reference for more information. (ids: 40562, 41090)
Druid highlights
Improved concurrent append and replace
You no longer need to manually specify the task lock type for concurrent append and replace using the `taskLockType` context parameter. Instead, Druid can determine it for you. You can either use a context parameter or a cluster-wide config:
- Use the context parameter `"useConcurrentLocks": true` for specific JSON-based or streaming ingestion tasks and datasources. Datasources need the parameter in situations such as when you want to be able to append data to the datasource while compaction is running.
- Set `useConcurrentLocks` to `true` in the cluster-wide config `druid.indexer.task.default.context`.
(#1568) (id: 41083)
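For example, an ingestion task context enabling the new behavior might look like this (a minimal sketch; only the `useConcurrentLocks` parameter is the subject here):

```json
{
  "context": {
    "useConcurrentLocks": true
  }
}
```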
Range support for window functions
Window functions now support ranges where both endpoints are unbounded or are the current row. Ranges work in strict mode, which means that Druid fails queries that aren't supported. You can turn off strict mode for ranges by setting the context parameter `windowingStrictValidation` to `false`.
The following examples show window expressions with RANGE frame specifications:

```sql
(ORDER BY c)
(ORDER BY c RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
(ORDER BY c RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
```
(#15746) (#15365) (id: 41623)
Ingest from multiple Azure accounts
Azure as an ingestion source now supports ingesting data from multiple storage accounts that are specified in druid.azure.account
. To do this, use the new azureStorage
schema instead of the previous azure
schema. For example,
"ioConfig": {
"type": "index_parallel",
"inputSource": {
"type": "azureStorage",
"objectGlob": "**.json",
"uris": ["azureStorage://storageAccount/container/prefix1/file.json", "azureStorage://storageAccount/container/prefix2/file2.json"]
},
"inputFormat": {
"type": "json"
},
...
},
...
(#15630) (id: 41428)
Improved performance for real-time queries
If the query context `bySegment` is set to `false` for real-time queries, the way in which results are merged has been improved to be more efficient. There's now only a single layer of merging, just like for Historical processes. As part of this change, segment metrics, like `query/segment/time`, are now per-FireHydrant instead of per-Sink.
If you set `bySegment` to `true`, the old behavior of two layers of merging is preserved.
(#15757) (id: 41406)
Pivot changes
- Added `maxNumDownloadTasks` to the Pivot server configuration file, to optionally set the maximum number of tasks to assign to async downloads. See Pivot server config for more information (id: 41092)
- Added an option to "Go to URL" for URL dimensions in the flat table visualization (id: 41283)
- Fixed an error that appeared when duplicating a dashboard from the header bar (id: 41537)
- Fixed a problem with filtering on a dimension with Set/String type that contains nulls (id: 41459)
- Fixed an issue where async downloads didn't include filters by measure (id: 41435)
- Fixed records table visualization crashing when scrolling to the bottom in a dashboard tile (id: 41165)
- Fixed an issue with the records visualization not supporting async download (id: 41289)
- Fixed dimensions with IDs that contain periods showing as "undefined" in records table visualization (id: 41009)
- Fixed Pivot 2 visualizations crashing on data cubes with no dimensions (id: 40998)
- Fixed inability to set "Greater than 0" measure filter in flat table visualization (id: 40985)
- Fixed a problem with visualization URLs not updating after a measure is deleted from a data cube (id: 40565)
- Fixed "overall" values rendering incorrectly in line chart visualization when they should be hidden (id: 40501)
- Fixed incorrect time bucket label for America/Mexico_City timezone in DST (id: 39749)
- Fixed inability to scroll pinned dimensions list (id: 39647)
- Fixed discrepancies when applying custom UI colors (id: 40266)
- Improved handling of time filters dashboard tiles (id: 41171)
- Improved measures in tables visualization to show nulls if they contain no data (id: 40665)
- Improved the display of comparison values in visualizations, by adding the ability to sort by delta and percentage (id: 38539)
Druid changes
- Added `QueryLifecycle#authorize` for the gRPC query extension (#15816) (id: 41725)
- Added nested array index support and fixed some related issues (#15752) (id: 41724)
- Added support for array types in the web console ingestion wizards (#15588) (id: 41613)
- Added SQUARE_ROOT function to the timeseries extension: `MAP_TIMESERIES(timeseries, 'sqrt(value)')` (id: 41516)
- Added null value index wiring for nested columns (#15687) (id: 41475)
- Added support to the web console for sorting the segment table on start and end when grouped (#15720) (id: 41438)
- Added a tile to the web console for the new Azure input source (id: 41317)
- Added `ImmutableLookupMap` for static lookups (#15675) (id: 41268)
- Added cache value selectors in `RowBasedColumnSelectorFactory` (#15615) (id: 41265)
- Added faster k-way merging using tournament trees and 8-byte key strides (#15661) (id: 40987)
- Added CONCAT flattening filter decomposition (#15634) (id: 40986)
- Added partition boosting for INSERT with GROUP BY, to deal with skewed partitions (#15474) (id: 15015)
- Added SQL compatibility for numeric first and last column types. The web console also provides an option for first and last aggregation (#15607) (id: 40615)
- Added differentiation between null and empty strings in `SerializablePairStringLong` serde (id: 40401)
- Changed `IncrementalIndex#add` to no longer be thread safe, which improves performance (#15697) (id: 41260)
- Fixed the KafkaInputFormat parsing incoming JSON as newline-delimited (as if it were a batch ingest) rather than as a whole entity (as is typical for streaming ingest) (#15692) (id: 41261)
- Improved segment locking behavior so that the `RetrieveSegmentsToReplaceAction` is no longer needed (#15699) (id: 41484)
- Disabled eager initialization for non-query connection requests (#15751) (id: 41407)
- Enabled `ArrayListRowsAndColumns` to `StorageAdapter` conversion (#15735) (id: 41616)
- Enabled query request queuing by default when total laning is turned on (#15440) (id: 40807)
- Fixed the web console forcing `waitUntilSegmentLoad` to `true` even if the user sets it to `false` (#15781) (id: 41614)
- Fixed CVEs (#15814) (id: 41612)
- Fixed interpolated exception message in `InvalidNullByteFault` (#15804) (id: 41546)
- Fixed extractionFns on number-wrapping dimension selectors (#15761) (id: 41443)
- Fixed the summary iterator in the grouping engine (#15658) (id: 41264)
- Fixed incorrect scale when reading decimals from Parquet (#15715) (id: 41263)
- Fixed a rendering issue for disabled workers in the web console (#15712) (id: 41259)
- Fixed issues so that the Kafka emitter now runs all scheduled callables. The emitter now intelligently provisions threads to make sure there are no wasted threads and all callables can run (#15719) (id: 41258)
- Fixed MSQ task engine intermediate files not being immediately cleaned up in Azure (id: 41243)
- Fixed audit log entries not appearing for "Mark as used all segments" actions (id: 41080)
- Fixed an NPE that could occur if the `StandardDeviationPostAggregator` passed in is null: `postAggregations.estimator: null` (#15660) (id: 41003)
- Fixed reverse pull-up lookups in the SQL planner (#15626) (id: 41002)
- Fixed compaction getting stuck on intervals with tombstones (#15676) (id: 41001)
- Fixed the result cache causing an exception when a sketch is stored in the cache (#15654) (id: 40885)
- Fixed concurrent append and replace options in the web console (#15649) (id: 40868)
- Fixed an issue that blocked queries issued from the small Run buttons (inside the larger queries) from being modified by the table actions (#15779) (id: 41515)
- Improved segment killing performance for Azure (#15770) (id: 38567)
- Improved the performance of the `druid-basic-security` extension (#15648) (id: 40884)
- Improved lookups to register the first lookup immediately, regardless of the cache status (#15598) (id: 40863)
- Improved numerical first and last aggregators so that they work for SQL-based ingestion too (id: 40996)
- Improved parsing speed for list-based input rows (#15681) (id: 41262)
- Improved error messages for DATE_TRUNC operators (#15759) (id: 41471)
- Improved the web console to support using file inputs instead of text inputs for the Load query detail archive dialogue (#15632) (id: 40941)
- Changed the web console to use the new `azureStorage` input type instead of the `azure` storage type for ingesting from Azure (#15820) (id: 41723)
- Changed the cryptographic salt size that Druid uses to 128 bits so that it is FIPS compliant (#15758) (id: 41405)
Changes in 2024.01.3
Druid changes
- Fixed an issue where DataSketches HLL Sketches would erroneously be considered empty. For details see the following Imply Knowledge Base article (id: 41916)
Changes in 2024.01.2
Druid changes
- Fixed an issue where an exception occurred when queries used filters on TIME_FLOOR (#15778)
Changes in 2024.01.1
Druid changes
- Fixed an issue with the default value for the `inSubQueryThreshold` parameter, which resulted in slower than expected queries. The default value is now `2147483647` (up from `20`) (#15688) (id: 40814)
Changes in 2024.01
Pivot highlights
Pivot now runs natively on macOS ARM systems
We encourage on-prem customers to opt in to an updated distribution format for Pivot by setting an environment variable on your Pivot nodes: `IMPLY_PIVOT_NOPKG=1`. This format will become the default later in 2024.
This distribution format enables Pivot to target current and future LTS versions of Node.js and provides a compatibility option for customers who are unable to upgrade from legacy Linux distributions such as RHEL 7, CentOS 7, and Ubuntu 18.04. (id: 40447)
Druid highlights
SQL PIVOT and UNPIVOT (beta)
You can now use the SQL PIVOT and UNPIVOT operators to turn rows into columns and column values into rows respectively. (id: 37598)
The PIVOT operator carries out an aggregation and transforms rows into columns in the output. The following is the general syntax for the PIVOT operator:
PIVOT (aggregation_function(column_to_aggregate)
FOR column_with_values_to_pivot
IN (pivoted_column1 [, pivoted_column2 ...])
)
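To make the syntax concrete, here is a hedged sketch against a hypothetical `sales` datasource with `region` and `revenue` columns; the names are illustrative, and since PIVOT is beta, verify the current syntax before relying on it:

```sql
-- Produces one output column per listed region value,
-- each holding SUM(revenue) for that region.
SELECT *
FROM sales
PIVOT (
  SUM(revenue)
  FOR region
  IN ('us', 'eu')
)
```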
The UNPIVOT operator transforms existing column values into rows. The following is the general syntax for the UNPIVOT operator:
UNPIVOT (values_column
FOR names_column
IN (unpivoted_column1 [, unpivoted_column2 ... ])
)
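As a hedged sketch, the following turns hypothetical `revenue_us` and `revenue_eu` columns back into (region, revenue) rows; all names are illustrative, and since UNPIVOT is beta, verify the current syntax before relying on it:

```sql
-- Each listed column becomes a row: its name goes into "region"
-- and its value goes into "revenue".
SELECT *
FROM pivoted_sales
UNPIVOT (
  revenue
  FOR region
  IN (revenue_us, revenue_eu)
)
```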
New JSON_QUERY_ARRAY function
The JSON_QUERY_ARRAY function is similar to JSON_QUERY except that the return type is always ARRAY<COMPLEX<json>> instead of COMPLEX<json>. Essentially, this function lets you extract arrays of objects from nested data and perform operations such as UNNEST, ARRAY_LENGTH, ARRAY_SLICE, or any other available ARRAY operation on them. (#15521) (id: 40335)
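For example, assuming a hypothetical `orders` datasource with a nested `shipments` column, you could extract and flatten an array of objects like this (datasource, column, and path names are illustrative):

```sql
-- JSON_QUERY_ARRAY returns ARRAY<COMPLEX<json>>, so the result
-- can be passed to UNNEST to get one row per array element.
SELECT t.product
FROM "orders"
CROSS JOIN UNNEST(JSON_QUERY_ARRAY("shipments", '$.products')) AS t(product)
```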
Changes to the native `equals` filter
The native query `equals` filter on mixed-type 'auto' columns that contain arrays must now filter those columns as their presenting type. That is, if any rows are arrays (the segment metadata and `information_schema` report the type as some array type), then native queries must also filter as if the column is that array type. This does not impact SQL, which already has this limitation due to how the type presents itself. This only impacts mixed-type 'auto' columns that contain both scalars and arrays. (#15503) (id: 40328)
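As a hedged illustration of filtering by presenting type, a native `equality` filter on such a column would declare the array type explicitly; the column name and values here are hypothetical:

```json
{
  "type": "equality",
  "column": "tags",
  "matchValueType": "ARRAY<LONG>",
  "matchValue": [1, 2, 3]
}
```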
Support for GCS for SQL-based ingestion
You can now use Google Cloud Storage (GCS) as durable storage for SQL-based ingestion and queries from deep storage. (#15398) (id: 35053)
Improved INNER joins
Druid now supports arbitrary join conditions for INNER joins. For INNER joins, Druid examines the join condition, and any sub-conditions that cannot be evaluated efficiently as part of the join are converted to a post-join filter. With this feature, you can perform inequality joins that were not possible before. (#15302) (id: 37564)
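For example, an inequality sub-condition can now accompany an equality condition; Druid evaluates the equality as the join itself and applies the inequality as a post-join filter (table and column names are illustrative):

```sql
SELECT o.order_id, p.price
FROM orders o
INNER JOIN prices p
  ON o.product_id = p.product_id      -- evaluated as the join condition
  AND o.__time >= p.effective_time    -- converted to a post-join filter
```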
Pivot changes
- Added the Pivot server configuration property `forceNoRedirect`, which forces the Pivot UI to always render the splash page without automatic redirection (id: 38986)
- Added the ability to sort a data cube by the first column by clicking the column header (id: 31363)
- Fixed percent of root causing downloads from deep storage to fail (id: 40673)
- Fixed incorrect sort order in deep storage downloads (id: 40374)
- Fixed flat table visualization with absolute time filter using "Latest day" when accessed with link (id: 40339)
- Fixed functional and display issues in the overall visualization (id: 40271)
- Fixed back button not working correctly in async downloads dialog (id: 40265)
- Improved query generation in Pivot and Plywood to use the 2-value IS NOT TRUE version of the NOT operator (id: 40638)
- Improved data cube measure preview by providing a manual override prompt when the preview fails (id: 38763)
- Updated the names of the async downloads feature flags to `Async Downloads (Deprecated)` and `Async Downloads, New Engine, 2023 (Alpha)` (id: 40525)
Druid changes
- Added experimental support for first/last data types for double/float/long during native and SQL-based ingestion (#14462) (id: 37231)
- Added a new config, `druid.audit.manager.type`, which can take the values `log` or `sql` (default). This allows audited events to either be logged or persisted in the metadata store (default behavior) (#15480) (id: 37696)
- Added a new config, `druid.audit.manager.logLevel`, which lets you set the log level of audit events and can take the values `DEBUG`, `INFO` (default), or `WARN` (#15480) (id: 37696)
- Added array column type support to the EXTEND operator (#15458) (id: 40286)
- Changed what happens when there are fewer query scheduler threads than server HTTP threads. When that happens, total laning is enforced, and some HTTP threads are reserved for non-query requests, such as health checks. Previously, any request that exceeded lane capacity was rejected. Now, excess requests are queued with a timeout equal to `MIN(Integer.MAX_VALUE, druid.server.http.maxQueryTimeout)`. If the value is negative, requests are queued forever (#15440) (id: 40776)
- Changed the ARRAY_TO_MV function to support expression inputs (#15528) (id: 40358)
- Changed the auto column indexer so that columns containing only empty arrays or arrays of nulls are stored as ARRAY<LONG> instead of COMPLEX<json> (#15505) (id: 40313)
- Fixed an issue where null and empty strings were treated equally, and the return value was always null (#15525) (id: 40401)
- Fixed an issue where lookups failed with an error related to failing to construct `FilteredAggregatorFactory` (#15526) (id: 40296)
- Fixed issues related to null handling and vector expression processors (#15587) (id: 40545)
- Fixed a bug in the ingestion spec to SQL-based ingestion query converter for the web console (#15627) (id: 40795)
- Fixed redundant expansion in SearchOperatorConversion (#15625) (id: 40768)
- Fixed an issue where some ARRAY types were treated incorrectly as COMPLEX types instead (#15543) (id: 40514)
- Fixed an NPE with virtual expressions and unnest (#15513) (id: 40348)
- Fixed an issue where the window function minimum aggregated nulls as 0 (#15371) (id: 40327)
- Fixed an issue where null filters on datasources with range partitioning could lead to excessive segment pruning, leading to missed results (#15500) (id: 40288)
- Fixed an issue with window functions where a string cannot be cast when creating HLL sketches (#15465) (id: 39859)
- Fixed a bug in segment allocation that can potentially cause loss of appended data when running interleaved append and replace tasks. (#15459) (id: 39718)
- Improved filtering performance by adding support for using the underlying column index for `ExpressionVirtualColumn` (#15585) (#15633) (id: 39668) (id: 40794)
- Improved how three-valued logic is handled (#15629) (id: 40797)
- Improved the Broker to be able to use catalog for datasource schemas for SQL queries (#15469) (id: 40796)
- Improved the Druid audit system to log when a supervisor is created or updated (#15636) (id: 40774)
- Improved the connection between Brokers and Coordinators with Historical and real-time processes (#15596) (id: 40763)
- Improved how segment granularity is handled when there is a conflict and the requested segment granularity can't be allocated. Day granularity is now considered after month. Previously, week was used, but weeks do not align with months perfectly. You can still explicitly request week granularity. (#15589) (id: 40701)
- Improved polling in segment allocation queue to improve efficiency and prevent race conditions (#15590) (id: 40690)
- Improved the web console to detect EXPLAIN PLAN queries and be able to run them individually (#15570) (id: 40508)
- Improved query efficiency by reducing the number of expression objects created during evaluations (#15552) (id: 40495)
- Improved the error message you get if you try to use INSERT INTO and OVERWRITE syntax (id: 37790)
- Improved the JDBC lookup dialog in the web console to include Jitter seconds, Load timeout seconds, and Max heap percentage options (#15472) (id: 40246)
- Improved compaction so that it skips datasources with partial-eternity segments, which could result in memory pressure on the Coordinator (#15542) (id: 40075)
- Improved Kinesis integration so that only checkpoints for partitions with unavailable sequence numbers are reset (#15338) (id: 29788)
- Improved the performance of the following:
- how Druid generates queries from Calcite plans
- the internal SEARCH operator used by other functions
- the COALESCE function (#15609) (id: 40672) (#15623) (id: 40691)
- Removed the 'auto' strategy from search queries. Specifying 'auto' is now equivalent to specifying `useIndexes` (#15550) (id: 40460)
Clarity changes
- Updated `subsetFormula` for the server cube to accept null values (id: 40254)
Platform changes
- Added support for JVM memory metrics in GKE ZooKeeper deployments (id: 38855)
Upgrade and downgrade notes
Minimum supported version for rolling upgrade
See "Supported upgrade paths" in the Lifecycle Policy documentation.
Remove orphaned segments in GCS deep storage
If you have orphaned segments from failed kill tasks from 2024.01 STS through 2024.02.3 STS, optionally identify and delete any segments that meet both of the following criteria:
- Segment exists in deep storage, but has no corresponding metadata store record.
- Segment is older than 1 week.
Identifying segments older than a week will prevent deletion of pending segments.
`stopTaskCount` must now be explicitly set
Starting in 2024.03 STS, you must explicitly set a value for `stopTaskCount` if you want to use it for streaming ingestion. It no longer defaults to the same value as `taskCount`.
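As a trimmed sketch of where the setting lives (this is not a complete supervisor spec, and the topic name and counts are illustrative), the value now needs to appear explicitly in the supervisor `ioConfig` if you rely on it:

```json
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "topic": "events",
      "taskCount": 4,
      "stopTaskCount": 4
    }
  }
}
```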
Segment metrics for real-time queries
Starting in 2024.02 STS, segment metrics for real-time queries (such as `query/segment/time`) are per-FireHydrant instead of per-Sink when the context parameter `bySegment` is set to `false`, which is common for most use cases.
Renamed segment metric
Starting in 2024.03 STS, the metric `kill/candidateUnusedSegments/count` is now called `kill/eligibleUnusedSegments/count` (#15977) (id: 42492).
GroupBy queries that use the MSQ task engine during upgrades
Beginning in 2024.02 STS, the performance and behavior of segment partitioning have been improved. GroupBy queries may fail during an upgrade if some workers are on an older version and others are on a more recent version.
Changes to the native `equals` filter
Beginning in 2024.01 STS, the native query `equals` filter on mixed-type 'auto' columns that contain arrays must filter those columns as their presenting type. That is, if any rows are arrays (the segment metadata and `information_schema` report the type as some array type), then native queries must also filter as if the column is that array type. This does not impact SQL, which already has this limitation due to how the type presents itself. This only impacts mixed-type 'auto' columns that contain both scalars and arrays.
Imply Hybrid MySQL upgrade
Imply Hybrid previously used MySQL 5.7 by default. New clusters will use MySQL 8 by default. If you have an existing cluster, you'll need to upgrade the MySQL version since the Amazon RDS support end date for this version is scheduled for February 29, 2024. Although you can opt for extended support from Amazon, you can use Imply Hybrid Manager to upgrade your MySQL instance to MySQL 8.
The upgrade should have little to no impact on your queries but does require a reconnection to the database. The process can take an hour and services will reconnect to the database during the upgrade.
In preparation for the upgrade, you need to grant certain permissions to the Cloud Manager IAM role by applying the following policy:
Show the policy
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"rds:CreateBlueGreenDeployment",
"rds:PromoteReadReplica"
],
"Resource": [
"arn:aws:rds:*:*:pg:*",
"arn:aws:rds:*:*:deployment:*",
"arn:aws:rds:*:*:*:imply-*"
],
"Effect": "Allow"
},
{
"Action": [
"rds:AddTagsToResource",
"rds:CreateDBInstanceReadReplica",
"rds:DeleteBlueGreenDeployment",
"rds:DescribeBlueGreenDeployments",
"rds:SwitchoverBlueGreenDeployment"
],
"Resource": "*",
"Effect": "Allow"
}
]
}
After you grant the permissions, click Apply changes for Amazon RDS MySQL Update on the Overview page of Imply Hybrid Manager.
Three-valued logic
The legacy two-valued logic and the corresponding properties that support it will be removed in the December 2024 STS and January 2025 LTS. The SQL compatible three-valued logic will be the only option.
Update your queries and downstream apps prior to these releases.
SQL standard three-valued logic introduced in 2023.11 primarily affects filters using the logical NOT operation on columns with NULL values. This applies to both query and ingestion time filtering.
The following example illustrates the old behavior and the new behavior:
Consider the filter `x <> 'some value'`, which returns results for which `x` is not equal to `'some value'`.
Previously, Druid included all rows not matching `x = 'some value'`, including rows where `x` is null.
The new behavior follows the SQL standard: the filter matches only rows that have a value and are not equal to `'some value'`.
Null values are excluded from the results.
This change primarily affects filters using the logical NOT operation on columns with NULL values.
Three-valued logic is only enabled if you accept the following default values:
druid.generic.useDefaultValueForNull=false
druid.expressions.useStrictBooleans=true
druid.generic.useThreeValueLogicForNativeFilters=true
SQL compatibility
The legacy behavior that is not compatible with standard ANSI SQL and the corresponding properties will be removed in the December 2024 STS and January 2025 LTS releases. The SQL compatible behavior introduced in the 2023.09 STS will be the only behavior available.
Update your queries and any downstream apps prior to these releases.
Starting with 2023.09 STS, the default way Druid treats nulls and booleans has changed.
For nulls, Druid now differentiates between an empty string (`''`) and a record with no data, as well as between an empty numerical record and `0`.
You can revert to the previous behavior by setting `druid.generic.useDefaultValueForNull` to `true`. This property affects both storage and querying, and must be set on all Druid service types to be available at both ingestion time and query time. Reverting this setting to the old value restores the previous behavior without reingestion.
For booleans, Druid now strictly uses `1` (true) or `0` (false). Previously, true and false could be represented either as `true` and `false` or as `1` and `0`, respectively. In addition, Druid now returns a null value for Boolean comparisons like `True && NULL`.
`druid.expressions.useStrictBooleans` primarily affects querying, but it also affects JSON columns and type-aware schema discovery for ingestion. You can set `druid.expressions.useStrictBooleans` to `false` to configure Druid to ingest booleans in 'auto' and 'json' columns as VARCHAR (native STRING) typed columns that use string values of 'true' and 'false' instead of BIGINT (native LONG). You must set it on all Druid service types to be available at both ingestion time and query time.
The following table illustrates some example scenarios and the impact of the changes:
Show the table
| Query | 2023.08 STS and earlier | 2023.09 STS and later |
|---|---|---|
| Query empty string | Empty string (`''`) or null | Empty string (`''`) |
| Query null string | Null or empty | Null |
| COUNT(*) | All rows, including nulls | All rows, including nulls |
| COUNT(column) | All rows excluding empty strings | All rows including empty strings but excluding nulls |
| Expression 100 && 11 | 11 | 1 |
| Expression 100 \|\| 11 | 100 | 1 |
| Null FLOAT/DOUBLE column | 0.0 | Null |
| Null LONG column | 0 | Null |
| Null __time column | 0, meaning 1970-01-01 00:00:00 UTC | 1970-01-01 00:00:00 UTC |
| Null MVD column | `''` | Null |
| ARRAY | Null | Null |
| COMPLEX | none | Null |
Update your queries
Before you upgrade from a version prior to 2023.09 to 2023.09 or later, update your queries to account for the changed behavior:
NULL filters
If your queries use NULL in the filter condition to match both nulls and empty strings, add an explicit filter clause for empty strings. For example, update `s IS NULL` to `s IS NULL OR s = ''`.
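For instance, a filter that previously matched empty strings implicitly can be rewritten as follows (the datasource and column names are illustrative):

```sql
-- Before: under the legacy behavior this also matched '' rows.
SELECT COUNT(*) FROM "events" WHERE "referrer" IS NULL;

-- After: match nulls and empty strings explicitly.
SELECT COUNT(*) FROM "events" WHERE "referrer" IS NULL OR "referrer" = '';
```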
COUNT functions
`COUNT(column)` now counts empty strings. If you want to continue excluding empty strings from the count, replace `COUNT(column)` with `COUNT(column) FILTER(WHERE column <> '')`.
GroupBy queries
GroupBy queries on columns containing null values can now have additional entries as nulls can co-exist with empty strings.
Avatica JDBC driver upgrade
The Avatica JDBC driver is not packaged with Druid. Its upgrade is separate from any upgrades to Imply.
If you notice intermittent query failures after upgrading your Avatica JDBC driver to version 1.21.0 or later, you may need to set the `transparent_reconnection` property.
Parameter execution changes for Kafka
When using the built-in `FileConfigProvider` for Kafka, interpolations are now intercepted by the `JsonConfigurator` instead of being passed down to the Kafka provider. This breaks existing deployments.
For more information, see KIP-297 and #13023.
Deprecation notices
`azure` ingestion source parameter
Starting in 2024.02, the `ioConfig.inputSource.type.azure` parameter has been deprecated. Use the new `azureStorage` parameter instead. The new parameter supports ingesting from multiple accounts.
Two-valued logic
Druid's legacy two-valued logic for native filters and the properties for maintaining that behavior are deprecated and will be removed in the December 2024 STS and January 2025 LTS releases.
The ANSI-SQL compliant three-valued logic will be the only supported behavior after these releases. This SQL compatible behavior became the default for deployments that use Imply 2023.11 STS and January 2024 LTS releases.
Update your queries and downstream apps prior to these releases.
For more information, see three-valued logic.
Properties for legacy Druid SQL behavior
Druid's legacy behavior for Booleans and NULLs and the corresponding properties are deprecated and will be removed in the December 2024 STS and January 2025 LTS releases.
The ANSI-SQL compliant treatment of Booleans and null values will be the only supported behavior after these releases. This SQL compatible behavior became the default for Imply 2023.11 STS and January 2024 LTS.
Update your queries and downstream apps prior to these releases.
For more information, see SQL compatibility.
Some segment loading configs deprecated
Starting with 2023.08 STS, the following segment related configs are now deprecated and will be removed in future releases:
- `maxSegmentsInNodeLoadingQueue`
- `maxSegmentsToMove`
- `replicationThrottleLimit`
- `useRoundRobinSegmentAssignment`
- `replicantLifetime`
- `maxNonPrimaryReplicantsToLoad`
- `decommissioningMaxPercentOfMaxSegmentsToMove`

Use `smartSegmentLoading` mode instead, which calculates values for these variables automatically.
`SysMonitor` support deprecated
Starting with 2023.08 STS, switch to `OshiSysMonitor`, as `SysMonitor` is now deprecated and will be removed in future releases.
Asynchronous SQL download deprecated
The async downloads feature is deprecated and will be removed in future releases. Instead, consider using Query from deep storage.
End of support
CrossTab view is deprecated
The CrossTab view feature is no longer supported. Use Pivot 2.0 instead, which incorporates the capabilities of CrossTab view.