Imply Enterprise and Hybrid release notes
Read all release notes carefully, especially the Upgrade and downgrade notes, before upgrading. The following release notes provide information on features, improvements, and bug fixes up to Imply STS release 2023.09.
For information on the LTS release, see the LTS release notes.
If you are upgrading by more than one version, read the intermediate release notes too.
The following end-of-support dates apply in 2023:
- On January 26, 2023, Imply 2021.01 LTS reached EOL. This means that the 2021.01 LTS release line will no longer receive any patches, including security updates. Imply recommends that you upgrade to the latest LTS or STS release.
- On January 31, 2023, Imply 2022.01 LTS ended general support status and is eligible only for security support.
For more information, see Lifecycle Policy.
See Previous versions for information on older releases.
Imply evaluation
New to Imply? Get started with an Imply Hybrid (formerly Imply Cloud) Free Trial or start a self-hosted trial at Get started with Imply!
With Imply Hybrid, the Imply team manages your clusters in AWS, while you control the infrastructure and own the data. With self-hosted Imply, you can run Imply on *NIX systems in your own environment or cloud provider.
Imply Enterprise
If you run Imply Enterprise, see Imply product releases & downloads to access the Imply Enterprise distribution. When prompted, log on to Zendesk with your Imply customer credentials.
Changes in 2023.09
Druid highlights
Ingest from multiple Kafka topics to a single datasource
You can now ingest streaming data from multiple Kafka topics to a datasource using a single supervisor.
You can configure the topics for the supervisor spec using a regex pattern as the value for `topic` in the IO config. If you add new topics to Kafka that match the regex, Druid automatically starts ingesting from those new topics.
If you enable multi-topic ingestion for a datasource, downgrading will cause the supervisor to fail. For more information, see Stop supervisors that ingest from multiple Kafka topics before downgrading.
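The following sketch shows what a supervisor IO config for this might look like. The datasource, topic pattern, and broker address are hypothetical, and you should confirm the exact property layout against the Kafka supervisor spec documentation:

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "all_metrics"
    },
    "ioConfig": {
      "topic": "metrics-.*",
      "consumerProperties": {
        "bootstrap.servers": "kafka01.example.com:9092"
      }
    }
  }
}
```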
Hadoop 2 removed
Imply no longer supports using Hadoop 2 with your Druid cluster. Migrate to SQL-based ingestion or native ingestion if you are using Hadoop 2.x for ingestion today. If migrating to Druid's built-in ingestion is not possible, you must upgrade your Hadoop infrastructure to 3.x+ before upgrading to 2023.09.
Legacy GroupBy v1 removed
GroupBy v1 is a legacy engine that has not been supported since 2021 and has been removed in this release. Use GroupBy v2 instead, which has been the default GroupBy engine for several releases. There should be no impact on your queries.
cachingCost strategy removed
The `cachingCost` strategy for segment loading has been removed. Use `cost` instead, which has the same benefits as `cachingCost`.
If you have `cachingCost` set, the system ignores this setting and automatically uses `cost`.
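If your Coordinator dynamic configuration selects a balancer strategy explicitly, the equivalent setting going forward is `cost`. A minimal sketch of the relevant dynamic configuration fragment (all other fields omitted):

```json
{
  "balancerStrategy": "cost"
}
```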
SQL changes
Strict booleans
By default, Druid now handles booleans strictly using `1` (true) or `0` (false). Previously, booleans could be represented either as `true` and `false` or as `1` and `0`, respectively.
This change may impact your query results. For more information, see SQL compatibility in the upgrade notes.
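To illustrate, the following query against a hypothetical `events` datasource returns `1` or `0` under the new strict behavior, where it may previously have produced `true` or `false`:

```sql
-- Under strict booleans, the comparison yields a BIGINT 1 or 0.
SELECT (status = 200) AS is_ok
FROM events
LIMIT 5
```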
Null handling changes
By default, Druid now differentiates between empty records, such as `''`, and null records. Previously, Druid might treat empty records as empty or as null.
This change may impact your query results. For more information, see SQL compatibility in the upgrade notes.
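For example, the following two filtered counts on a hypothetical `events` datasource can now return different values, because the empty string no longer matches null:

```sql
-- Empty strings and nulls are now counted separately.
SELECT
  COUNT(*) FILTER (WHERE referrer = '') AS empty_referrer_rows,
  COUNT(*) FILTER (WHERE referrer IS NULL) AS null_referrer_rows
FROM events
```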
SQL planning and optimization
Druid uses Apache Calcite for SQL planning and optimization. The Calcite version has been upgraded from 1.21 to 1.35.
As part of this upgrade, the recommended syntax for UNNEST has changed. We recommend using CROSS JOIN instead of commas for most queries to prevent issues with precedence:
SELECT column_alias_name1 FROM datasource CROSS JOIN UNNEST(source_expression1) AS table_alias_name1(column_alias_name1) CROSS JOIN UNNEST(source_expression2) AS table_alias_name2(column_alias_name2), ...
For more information, see UNNEST syntax in the upgrade notes.
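A concrete example of the recommended form, using a hypothetical `events` datasource with a multi-value `tags` column:

```sql
-- Each CROSS JOIN UNNEST produces one output row per array element.
SELECT t.tag, COUNT(*) AS cnt
FROM events
CROSS JOIN UNNEST(MV_TO_ARRAY("tags")) AS t(tag)
GROUP BY t.tag
```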
JSON and auto column indexer
For nested columns, the default format for the `json` type has changed to be equivalent to the `auto` type.
This format improves support for nested arrays of strings, longs, and doubles. It also optimizes storage when there is no actual nested data processed.
The new format was introduced with type-aware schema discovery ("auto" type column schema and indexer) in 2023.04. It is not compatible with the following versions:
- 2022.12 STS
- 2023.01 STS
- 2023.02 STS
- 2023.03 STS
- 2023.01 LTS
If you upgrade from one of these versions, you can continue to write nested columns in a backwards compatible format (version 4). For more information, see Nested column format in the upgrade notes.
Broker parallel merge config options
The config paths `druid.processing.merge.pool.*` and `druid.processing.merge.task.*` have been flattened to use `druid.processing.merge.*` instead. The previous paths are deprecated but continue to work in 2023.09. Migrate your settings to the new paths because the old paths will be ignored in the future.
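A before-and-after sketch using one of the affected settings (the parallelism value is illustrative):

```properties
# Deprecated path, still honored in 2023.09
druid.processing.merge.pool.parallelism=8

# New flattened path
druid.processing.merge.parallelism=8
```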
Guava upgrade
The version of Guava that Druid uses has been upgraded from 16.0.1 to 31.1-jre to address bug fixes and security issues. If you use an extension that has a transitive Guava dependency from Druid, it may be impacted.
The extensions that Imply packages and enables with our releases have accounted for this change.
If you have an extension not provided by Imply that uses Guava, you must upgrade the Guava version within the extension and rebuild your extension. A rolling upgrade is only possible if you unload your extension before upgrading the cluster and reload the rebuilt extension, with the newer version of Guava, afterward.
Other changes
Pivot changes
- Improved alert error handling so that Pivot checks the alert's owner for the `SeeErrorMessages` permission before sending a webhook request (id: 37184)
- Improved the alignment of values in the table visualization (id: 36903)
- Improved relative time filters—they are now inclusive of the lower bound and exclusive of the upper bound (id: 37119)
- Fixed the absence of a limit on stack area queries in Pivot 2 (id: 36530)
- Fixed a problem with updating data cube refresh rate (id: 36071)
- Fixed "cannot read properties of undefined" error when an alert query results in an empty data set (id: 35914)
Druid changes
- Added topic name as a column in the Kafka input format (#14857) (id: 36885)
- Added support to ingest from multiple Kafka topics into a single datasource (id: 600)
- Added Kafka topic column controls (#14865) (id: 36884)
- Added sampling factor for `DeterminePartitionsJob` (#13840) (id: 31009)
- Added test and metrics for `KillStalePendingSegments` duty (#14951) (id: 37357)
- Added a configurable buffer period between when a segment is marked unused and deleted by `KillUnusedSegments` duty (#12599) (id: 36770)
- Added support for broadcast segments (#14789) (id: 36614)
- Added lifecycle hooks to `KubernetesTaskRunner` (#14790) (id: 36508)
- Added index filtering on Coordinator service to reduce the log size in Datadog (id: 36329)
- Added grace period for auto-kill based on when a segment is marked unused (id: 35404)
- Added new method for `SqlStatementResource` and `SqlTaskResource` to set request attribute (#14878) (id: 37096)
- Added brush to timechart in web console (#14929) (id: 37225)
- Added format notice for CSV and TSV in web console (#14783) (id: 36790)
- Added dynamic query parameters UI to the web console (#14921) (id: 37226)
- Added format selection for download in web console (#14794) (id: 36615)
- Re-added the v4 writers and added a system config in JSON and auto indexers (id: 37334)
- Consolidated JSON and auto indexers, removing the v4 nested column serializer (#14456) (id: 36967)
- Deprecated `config-magic` in favor of JSON configuration (#14695) (id: 36781)
- Disabled `cachingCost` balancer strategy (#14798) (id: 36788)
- Enabled SQL-compatible null handling mode by default (#14792) (id: 36913)
- Enabled Kafka multi-topic ingestion from the data loader in the web console (#14833) (id: 36779)
- Exposed new Coordinator properties in the dialog in the web console (#14791) (id: 36772)
- Fixed bug in `KillStalePendingSegments` (#14961) (id: 37445)
- Fixed `StringLastAggregatorFactory` equals and toString (#14907) (id: 37371)
- Fixed bug in computed value of `balancerComputeThreads` (#14947) (id: 37308)
- Fixed a mapping issue with the "others" field in line charts in the web console (#14931) (id: 37224)
- Fixed an error caused by datatype mismatch in numeric latest aggregations (id: 37127)
- Fixed latest aggregation for null in time selector (id: 37085)
- Fixed aggregation filter expression processing without projection (#14893) (id: 36956)
- Fixed error messages relating to OVERWRITE keyword (#14870) (id: 36932)
- Fixed MSQ select query failing with `RuntimeException` if REPLACE is run after INSERT (id: 36226)
- Fixed an issue with scaling code repeatedly resubmitting the same supervisor spec for idle supervisors (id: 36019)
- Fixed results when `useGroupingSetForExactDistinct` is set to true (id: 33088)
- Fixed `SimpleChannelUpstreamHandler` exception (id: 33059)
- Fixed UNNEST query with 'not between' filter returning wrong result (id: 32054)
- Fixed "is null" failing to find unnested null values coming from null rows (id: 32042)
- Fixed UNNEST query with `where <unnested column> not in (value list)` returning an empty result set (id: 31861)
- Fixed pushing a NOT filter into the base during UNNEST causing incorrect results (id: 36476)
- Fixed latest vectorization throwing an exception with an expression in time (id: 36269)
- Fixed a bug in `QosFilter` (#14859) (id: 36870)
- Fixed several issues and SQL query reformatting in the web console (#14906) (id: 37090)
- Fixed a bug in result count in the web console (#14786) (id: 36509)
- Improved exception message when `DruidLeaderClient` doesn't find leader node (#14775) (id: 36618)
- Improved speed of `SQLMetadataStorageActionHandlerTest` (#14856) (id: 36776)
- Improved streaming ingestion completion timeout error message (#14636) (id: 36925)
- Improved incremental compilation (#14860) (id: 37041)
- Improved clarity of retention dialog in the web console (#14793) (id: 36616)
- Increased the computed value of `replicationThrottleLimit` (#14913) (id: 37109)
- Improved helper queries by allowing for running inline helper queries in the web console (#14801) (id: 36778)
- Moved some lifecycle management from `doTask` to shutdown for the middle manager-less task runner (#14895) (id: 37081)
- Moved `UpdateCoordinatorStateAndPrepareCluster` duty out of the Coordinator (#14845) (id: 36938)
- Reduced Coordinator logs in normal operation (#14926) (id: 37173)
- Removed DruidAggregateCaseToFilterRule (#14940) (id: 37271)
- Removed deprecated Coordinator dynamic configurations (#14923) (id: 37220)
- Removed config `druid.coordinator.compaction.skipLockedIntervals` (#14807) (id: 36801)
- Removed GroupBy v1 (#14866) (id: 37028)
- Removed segmentsToBeDropped from SegmentTransactionInsertAction (#14883) (id: 36886)
- Removed support for Hadoop 2 (#14763) (id: 36510)
- Replaced `BaseLongVectorValueSelector` with `VectorValueSelector` for `StringFirstAggregatorFactory` `factorizeVector` (#14957) (id: 37384)
- Added a reset offsets supervisor API (#14772) (id: 36773)
- Added a reset to specific offsets dialog to the web console (#14863) (id: 36771)
- Updated `druid.expressions.useStrictBooleans` default to true (#14734) (id: 36916)
- Updated Coordinator dynamic `maxSegmentsToMove` based on cluster skew under `smartSegmentLoading` (id: 35427)
- Updated task view to show execution dialog (#14930) (id: 37223)
- Updated `ServiceMetricEventBuilder` (#14933) (id: 37221)
- Updated `filters.md` (#14917) (id: 37135)
- Updated EARLIEST, EARLIEST_BY, LATEST, LATEST_BY for STRING columns to make `maxStringBytes` optional (#14848) (id: 36986)
- Updated `InvalidNullByteException` to include the output column name (#14780) (id: 37035)
- Updated Coordinator to use separate executor for each Coordinator duty group (#14869) (id: 36874)
- Updated `balancerComputeThreads` to use number of cores (#14902) (id: 37067)
- Upgraded Druid's Calcite dependency to the latest stable version (id: 26962)
- Upgraded `com.ibm.icu:icu4j` from 55.1 to 73.2 (#14853) (id: 36809)
- Upgraded `org.apache.rat:apache-rat-plugin` from 0.12 to 0.15 (#14817) (id: 36800)
- Upgraded `org.apache.maven.plugins:maven-surefire-plugin` (#14813) (id: 36799)
- Upgraded `com.github.oshi:oshi-core` from 6.4.2 to 6.4.4 (#14814) (id: 36798)
- Upgraded `org.scala-lang:scala-library` from 2.13.9 to 2.13.11 (#14826) (id: 36797)
- Upgraded `org.apache.maven.plugins:maven-source-plugin` from 2.2.1 to 3.3.0 (#14812) (id: 36796)
- Upgraded `org.assertj:assertj-core` from 3.19.0 to 3.24.2 (#14815) (id: 36795)
- Upgraded `dropwizard.metrics.version` from 4.0.0 to 4.2.19 (#14824) (id: 36793)
- Upgraded `protobuf.version` from 3.21.7 to 3.24.0 (#14823) (id: 36792)
- Upgraded Guava version to 31.1-jre (#14767) (id: 36917)
- Upgraded `org.apache.commons:commons-compress` from 1.21 to 1.23.0 (#14820) (id: 36789)
- Upgraded `apache.curator.version` from 5.4.0 to 5.5.0 (#14843) (id: 36786)
- Upgraded `org.tukaani:xz` from 1.8 to 1.9 (#14839) (id: 36785)
- Upgraded `commons-cli:commons-cli` from 1.3.1 to 1.5.0 (#14837) (id: 36784)
- Upgraded `io.dropwizard.metrics:metrics-graphite` from 3.1.2 to 4.2.19 (#14842) (id: 36783)
- Upgraded `org.apache.directory.api:api-util` from 1.0.3 to 2.1.3 (#14852) (id: 36775)
- Upgraded `joda-time` from 2.12.4 to 2.12.5 (#14855) (id: 36774)
- Upgraded `jackson-databind` to 2.12.7 (#14770) (id: 36430)
- Upgraded PostgreSQL from 42.4.1 to 42.6.0 (#13959) (id: 36613)
- Updated post filters and filters for UNNEST to include only the subset not pushed to base (id: 37253)
- Security issues (id: 37039)
Platform changes
- Added notification during Imply GKE installation if an operation will cause resources to be deleted (id: 16572)
- Allowed specifying GKE CIDR ranges when using TF directly (id: 37209)
- Supported custom versions in GKE Enhanced (id: 36507)
Clarity changes
- Fixed error "cannot read properties of undefined" error when an alert query results in an empty data set (id: 35914)
AWS Cloud Manager changes
- Added support for additional ARM instances to Imply Hybrid (id: 29972)
Changes in 2023.08
Druid highlights
Explore view in Druid console
The Explore view is a simple, stateless, SQL-backed data exploration view in the web console. It lets users explore data in Druid with point-and-click interaction and visualizations instead of writing SQL and looking at a table. This can provide faster time to value for a user new to Druid and lets a Druid veteran quickly chart data they care about.
The Explore view is accessible from the More (...) menu in the header.
Query from deep storage (alpha)
Druid now supports querying segments that are stored only in deep storage. When you query from deep storage, you can query more of your data without necessarily having to scale your Historical processes to accommodate it. To take advantage of the potential storage savings, make sure you configure your load rules to not load all your segments onto Historical processes.
Note that at least one segment of a datasource must be loaded onto a Historical process so that the Broker can plan the query. It can be any segment though.
For more information, see the query from deep storage documentation.
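As an illustration, the following hedged sketch of retention rules keeps only the most recent month of segments on Historical processes, leaving older segments in deep storage only. The period and tier values are examples, not recommendations:

```json
[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "tieredReplicants": { "_default_tier": 1 }
  },
  { "type": "dropForever" }
]
```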
Schema auto-discovery and array column types
Type-aware schema auto-discovery is now generally available. Druid can determine the schema for the data you ingest rather than you having to manually define the schema.
As part of the type-aware schema discovery improvements, array column types are now generally available. Druid can determine the column types for your schema and assign them to these array column types when you ingest data using type-aware schema auto-discovery with the `auto` column type.
For more information, see Type-aware schema discovery.
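A minimal sketch of enabling type-aware schema discovery in a native ingestion spec; everything outside `dimensionsSpec` is omitted, and an empty dimensions list lets Druid discover the full schema:

```json
{
  "dimensionsSpec": {
    "useSchemaDiscovery": true,
    "dimensions": []
  }
}
```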
Smart segment loading
The Coordinator is now much more stable and user-friendly. In the new smartSegmentLoading mode, it dynamically computes values for several configs which maximize performance.
The Coordinator can now prioritize load of more recent segments and segments that are completely unavailable over load of segments that already have some replicas loaded in the cluster. It can also re-evaluate decisions taken in previous runs and cancel operations that are not needed anymore. Moreover, move operations started by segment balancing do not compete with the load of unavailable segments, thus reducing the reaction time for changes in the cluster and speeding up segment assignment decisions.
Additionally, leadership changes have less impact now, and the Coordinator doesn't get stuck even if re-election happens while a Coordinator run is in progress.
Lastly, the cost balancer strategy performs much better now and is capable of moving more segments in a single Coordinator run. These improvements were made by borrowing ideas from the `cachingCost` strategy. We recommend using `cost` instead since `cachingCost` is now deprecated.
For more information, see:
- Smart segment loading
- Upgrade note for config changes related to smart segment loading
- Deprecation note for some segment related configs
New query filters
Druid now supports the following filters:
- Equality: Use in place of the selector filter. It never matches null values.
- Null: Match null values. Use in place of the selector filter.
- Range: Filter on ranges of dimension values. Use in place of the bound filter. It never matches null values.
Note that Druid's SQL planner uses these new filters in place of their older counterparts by default whenever `druid.generic.useDefaultValueForNull=false` or if `sqlUseBoundAndSelectors` is set to `false` in the SQL query context.
Unlike the previous selector and bound filters, which match only strings, you can use the new filters to filter equality and ranges on ARRAY columns.
For more information, see Query filters.
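Hedged sketches of the three new native filter types, on hypothetical columns `country` and `delta`; see the query filter documentation for the complete property reference:

```json
[
  { "type": "equality", "column": "country", "matchValueType": "STRING", "matchValue": "FR" },
  { "type": "null", "column": "country" },
  { "type": "range", "column": "delta", "matchValueType": "LONG", "lower": 0, "upper": 100 }
]
```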
Guardrail for subquery results (alpha)
Users can now add a guardrail to prevent a subquery's results from exceeding a set number of bytes by setting `druid.server.http.maxSubqueryBytes` in the Broker's config or `maxSubqueryBytes` in the query context. This guardrail is recommended over row-based limiting.
This feature is experimental for now and defaults back to row-based limiting if it fails to get the accurate size of the results consumed by the query.
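A sketch of enabling the guardrail cluster-wide on the Broker; the byte value is illustrative only. The same limit can also be set per query via the `maxSubqueryBytes` query context parameter:

```properties
# Broker runtime.properties
druid.server.http.maxSubqueryBytes=100000000
```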
Added a new OSHI system monitor
Added a new OSHI system monitor (`OshiSysMonitor`) to replace `SysMonitor`. The new monitor has wider support for different machine architectures, including ARM instances. Switch to the new monitor: `SysMonitor` is now deprecated and will be removed in future releases.
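A hedged sketch of the monitor swap in `runtime.properties`; the fully qualified class names follow Druid's usual monitor naming and should be verified against the metrics documentation:

```properties
# Before
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.SysMonitor"]

# After
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.OshiSysMonitor"]
```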
Java 17 support
Druid now fully supports Java 17. Note that this support is specifically for Druid, not Imply's other offerings.
Pivot highlights
Improvements to alerts:
- Updated the alert payload to include the alert query.
- Added data cube properties Minimum alert frequency and Minimum alert timeframe: these allow a user to prohibit alerts with a specified frequency on a data cube. See Managing data cubes for more information.
Other changes
Pivot changes
- Updated alerts to include the query in the alert payload (id: 31383)
- Updated alerts to allow users to prohibit alerts of a particular frequency (id: 34616)
- Updated alerts and reports to only display UI error notifications and send error notification emails to users with the `SeeErrorMessages` permission (id: 35160)
- Updated the permissions associated with importing settings, data cubes, and dashboards (id: 34962)
- Fixed inability to preview a dimension change without the `CreateDataCube` permission (id: 35512)
- Fixed filtering on a non-bucketed numeric dimension causing a partial query error (id: 35329)
- Fixed a problem that prevented alerts created on a Pivot classic data cube from working in Pivot 2 (id: 35255)
- Fixed incorrect error code being generated when a data cube encounters a SQL parse exception—correct 400 error now displays to user (id: 35188)
- Fixed a problem with downloading null values in data cubes (id: 34859)
Druid changes
- Added new filters that replace existing filters (#14542) (#14612) (id: 35060):
  - Use the new equality and null filters instead of selector filters
  - Use the new range filter instead of the bound filter
- Added `IP_COMPARE` function (id: 35700)
- Added ZooKeeper connection state alerts and metrics (#14333) (id: 35454)
- Added support for smartSegmentLoading (#14610) (id: 35696)
- Added Explore view (#14602) (id: 35877)
- Added frames support for string arrays that are null (#14653) (id: 36042)
- Added durable storage selector to the Druid console (#14669) (id: 36057)
- Added support for query from deep storage to the web console (id: 34342)
- Added metric (`compact/segmentAnalyzer/fetchAndProcessMillis`) to report time spent fetching and analyzing segments (#14752) (id: 36335) (id: 36312)
- Added new filters to unnest filter pushdown (#14777) (id: 36345)
- Added new dimensions for `service/heartbeat` (#14743) (id: 36284)
- Added a shortcut menu for `COMPLEX<?>` types (#14668) (id: 36204)
- Added ability to download pages of results from the new async APIs to the web console (#14712) (id: 36201)
- Added `service/heartbeat` metric into `statsd-reporter` (#14564) (id: 35414)
- Added support for `earliest` `aggregatorMergeStrategy` (#14598) (id: 35669)
- Added log statements for `tmpStorageBytes` in MSQ (#14449) (id: 35372)
- Added task toolbox to DruidInputSource (#14507) (id: 35304)
- Changed kill tasks to use bulk file delete API from S3 (id: 32746)
- Changed the default format from OBJECT to OBJECTLINES (#14700) (id: 36085)
- Changed default `handoffConditionTimeout` to 15 minutes (#14539) (id: 35476)
- Enabled `ServiceStatusMonitor` in the example configurations (#14744) (id: 36228)
- Enabled result level cache for `GroupByStrategyV2` on Broker (#11595) (id: 35428)
- Enabled leader dimension in `service/heartbeat` metric in `statsd-reporter` (#14593) (id: 35537)
- Fixed a bug where `json_value()` filter fails to find a scalar value match on a column with mixed scalar values and arrays (id: 35922)
- Fixed a bug where the Druid auto column indexer failed for certain mixed type arrays (#14710) (id: 36097)
- Fixed the response when a task ID is not found in the Overlord process (#14706) (id: 36086)
- Fixed table filters not working in the web console when grouping is enabled (#14668) (id: 36204)
- Fixed an issue with Hadoop ingestion by adding the `PropertyNamingStrategies` from a compatible `jackson-databind` version (#14671) (id: 36205)
- Fixed a bug for `SegmentLoadDropHandler` (#14670) (id: 35989)
- Fixed a bug in `getIndexInfo` for MySQL (#14750) (id: 36274)
- Fixed an issue where a JSON error caused the Next button to be greyed out while typing JSON in the web console (#14712) (id: 36201)
- Fixed a bug that occurred when the return type is STRING but comes from a top-level array-typed column instead of a nested array column (#14729) (id: 36133)
- Fixed a bug introduced by #11201 (#14544) (id: 35363)
- Fixed a resource leak with Window processing (#14573) (id: 35456)
- Fixed two bugs in the web console: service view filtering not working, and the data loader for SQL-based ingestion not picking the best available time column (#14597) (id: 35586)
- Fixed a bug in the Coordinator to ensure that replication factor is reported correctly for async segments (#14701) (id: 36060)
- Fixed `maxCompletedTasks` parameter in `OverlordClientImpl` (#14667) (id: 35984)
- Fixed an NPE that occurs during ingestion due to `datasketches` 4.0.0 (#14568) (id: 35378)
- Fixed a bug in the web console where the cursor jumped to the end or a field failed to display characters (#14632) (id: 35787)
- Fixed a bug where `SELECT ... WHERE json_keys() IS NULL` returns wrong result (id: 22179)
- Fixed issues with equality and range filters matching double values to long typed inputs (#14654) (id: 36027)
- Fixed an NPE that the `StringLast` aggregation throws when vectorization is enabled (id: 35868)
- Fixed time unit for handoff in `CoordinatorBasedSegmentHandoffNotifier` (#14640) (id: 35815)
- Fixed boolean segment filters (#14622) (id: 35701)
- Fixed a null pointer for rows and column stats information (#14617) (id: 35668)
- Improved description field when emitting metric for broadcast failure (#14703) (id: 36101)
- Improved performance of topN queries by minimizing `PostAggregator` computations (#14708) (id: 36253)
- Improved alert message for segment assignments (#14696) (id: 36203)
- Improved MSQ to handle a race condition that occurs when `postCounters` is in flight and the Controller goes offline (#14707) (id: 36200)
- Improved heap footprint of ingesting auto-typed columns by pushing compression and index generation into `writeTo` (#14615) (id: 35703)
- Improved performance when extracting files from input sources (#14677) (id: 36047)
- Improved the core API required for Iceberg extension (#14614) (id: 35876)
- Improved heap footprint of `GenericIndexed` (#14563) (id: 35443)
- Improved SQL statement API error messages (#14629) (id: 35733)
- Increased heap size for router (#14699) (id: 36098)
- Improved SQL planning logic to simplify bounds/range versus selectors/equality (#14619) (id: 35704)
- Improved the schema discovery description in the web console (#14601) (id: 35589)
- Improved Kubernetes performance by storing the task location on the lifecycle object (#14649) (id: 35874)
- Improved query behavior to reserve threads for non-query requests without using laning (#14576) (id: 35706)
- Improved the performance impact of segment deletion on a cluster (#14642) (id: 35819)
- Improved segment deletion performance by using batching (#14639) (id: 35808)
- Improvements to the web console (#14540) (id: 35536):
  - `pages` information for SQL statements API
  - Empty tiered replicants
  - Interactive APIs for MSQ task engine
  - Replication factor column for the metadata table
  - Data format and compression in MSQ task assignment now accounted for
  - Improved errors
  - UI for dynamic compaction
  - Better autorefresh behavior
  - Fixed a bug with the data loader
  - Fixed a bug with counter misalignment in MSQ input counters
- Improved the error code of `InsertTimeOutOfBoundsFault` to be consistent with others (#14495) (id: 35472)
- Improved worker generation (#14546) (id: 35468)
- Removed `chatAsync` parameter, so chat is always async (#14692) (id: 36099)
- Removed the deprecated `InsertCannotOrderByDescending` MSQ fault (#14588) (id: 35539)
- Updated `tough-cookie` from 4.0.0 to 4.1.3 in the web console (#14557) (id: 35401)
- Updated core Apache Kafka dependencies to 3.5.1 (#14721) (id: 36104)
- Updated the `org.mozilla:rhino` dependency (#14765) (id: 36339)
- Updated `decode-uri-component` from 0.2.0 to 0.2.2 in the web console (#13481) (id: 35963)
- Updated `org.xerial.snappy:snappy-java` from 1.1.10.1 to 1.1.10.3 (#14641) (id: 35931)
- Updated version in Iceberg POM (#14605) (id: 35672)
Platform changes
- Added `ingressClassName` support to ingresses in Helm (id: 35954)
Clarity changes
- Improved SSO Clarity landing page when trying to use an invalid account (id: 35199)
Changes in 2023.07
Druid highlights
Time series functions (alpha)
Added support for time series functions. You can use time series functions to analyze time series data, identify trends and seasonality, interpolate values, and load extra time periods to fill in boundary values. Time series functions are disabled by default. Enable this feature by loading the `imply-timeseries` extension. See Time series functions for more information.
Documentation changes
The Druid API reference docs are now collected under an API reference section and are now organized by function.
Pivot highlights
Time series visualization (alpha)
You can now use time series functions to generate a line or bar chart showing the rate of change in your data. You must load the `imply-timeseries` extension and enable the SDK based visualizations feature flag before you can use this feature. See Time series visualization for more information.
Pivot improvements:
- You can now apply date formats to time dimensions. See Time dimensions for more information.
- You can now control the precision of TopN and COUNT DISTINCT queries by setting the new Query precision property in a data cube. See Managing data cubes for more information.
- You can now override the default 40 second query timeout by setting the new Query timeout override property in a data cube. See Managing data cubes for more information.
Other changes in 2023.07
Pivot changes
- Added time series visualization (id: 33035)
- Added the ability to apply date formats to time dimensions (id: 28776)
- Added query precision property to data cubes (id: 33500)
- Added an optional override for data cube 40 second query timeout (id: 33229)
- Added loading symbol to data cube display when measures and dimensions are loading, and improved speed of attribute display (id: 32861)
- Modified the alert occurrence list UI to emphasize the time frame of the occurrence over the time that the alert was triggered (id: 31799)
- Fixed Pivot attempting to validate custom time dimension bucketing as a number when dimension is not named `__time` (id: 35078)
- Fixed alert preview incorrectly showing that an alert always triggers regardless of data and conditions (id: 35007)
- Fixed street map dashboard tile resetting latitude and longitude granularity when panning and zooming (id: 34932)
- Fixed dynamic time filter clause evaluating incorrectly when `maxTime` is set to midnight (id: 34453)
- Fixed Geo Marks visualization appearing in dashboard tile when Geo Shade was selected (id: 29639)
- Fixed empty download file for data cubes with PiiMask (id: 34326)
Druid changes
- Added ability for SQL-based ingestion to write select results to durable storage (#14527) (id: 35343)
- Added full support for Java 17 (#14384) (id: 35341)
- Added `stringEncoding` parameter to DataSketches HLL (#11201) (id: 35177)
- Added the file mapper to handle v2 buffer deserialization (#14429) (id: 34677)
- Added ability to enable cold tier per datasource based on time interval (id: 31185)
- Removed unused coordinator dynamic configurations (#14524) (id: 35279)
- Removed `druid.processing.columnCache.sizeBytes` and `CachingIndexed`, combining string column implementations (#14500) (id: 35191)
- Fixed bug that occurred during `HttpServerInventoryView` initialization (#14517) (id: 35278)
- Fixed incorrect error code on SQL-based ingestion query (id: 35267)
- Fixed NPE in datasketch (id: 35223)
- Fixed `SortMergeJoinFrameProcessor` buffering bugs (#14196) (id: 35189)
- Fixed compatibility issue with `SqlTaskResource` (#14466) (id: 34993)
- Fixed JSON_VALUE expression returning null instead of an array (id: 34833)
- Fixed null handling in `DruidCoordinator.getReplicationFactor` (#14447) (id: 34754)
- Fixed ingestion failing with mixed empty array and object in an array (id: 34681)
- Fixed double synchronize on simple map operations (#14435) (id: 34732)
- Fixed query planning failure if a CLUSTERED BY column contains descending order (#14436) (id: 34729)
- Fixed broker parallel merge to help managed blocking performance (#14427) (id: 34676)
- Fixed Kafka input format reader schema discovery and partial schema discovery (#14421) (id: 34675)
- Fixed queries not responding and "Sequence iterator timed out waiting for data" error in the logs (id: 34647)
- Fixed `HttpServerInventoryView` initialization delayed when server disappears (id: 33770)
- Fixed `sortMerge` query returning error: "SQL requires a join with 'INPUT_REF' condition that is not supported." (id: 32366)
- Fixed emitting negative lag metrics when there are Kafka connection issues (id: 32349)
- Fixed incorrect filtering on a column from an external datasource if named `__time` (#14336) (id: 35129)
- Fixed a problem with S3-compatible implementations (#14290) (id: 35035)
- Improved the ingestion view by splitting it into two views: Supervisors and Tasks (#14395) (id: 34680)
- Improved task update handling in task queue by using separate executor (#14533) (id: 35344)
- Improved `IntervalIterator` (#14530) (id: 35280)
- Improved queries by setting explain attributes after the query is prepared (#14490) (id: 35318)
- Improved coerce exceptions by logging the field name (#14483) (id: 35193)
- Improved `InsertTimeOutOfBounds` error message in SQL-based ingestion (#14511) (id: 35186)
- Improved segment loading (id: 23355)
- Improved subquery guardrail so that it obeys memory limit (id: 13296)
- Improved SQL `OperatorConversions`: introduced `aggregatorBuilder` to allow CAST as literal (#14249) (id: 35033)
- Improved default `clusterStatisticsMergeMode` by making it sequential (#14310) (id: 35031)
- Improved EXPLAIN PLAN attributes (#14441) (id: 35030)
- Improved `CostBalancerStrategy` by deprecating `cachingCost` (#14484) (id: 35027)
- Improved error messaging for coercion errors (id: 34717)
- Improved visibility into `SegmentMetadataCache` (id: 33768)
- Improved visibility into `ChangeRequestHttpSyncer` (id: 33767)
- Improved the `getTasks` API by creating an additional index on the task table (id: 34802)
- Improved logical planning and native query generation by decoupling them in SQL planning (#14232) (id: 34750)
- Improved Kafka supervisors by making them quieter in all bundled `log4j2.xml` files (#14444) (id: 34738)
- Improved handling of mixed type arrays by allowing expression best efforts determination (#14438) (id: 34698)
- Improved variance SQL aggregate function by supporting complex variance object inputs (#14463) (id: 35103)
- Upgraded Hadoop to version 3.3.6 (#14489) (id: 35064)
- Upgraded Avro to the latest version (id: 34697)
Clarity changes
- Added dimension 'description' to Raw Metrics data cube (id: 35241)
- Fixed Clarity UI attempting to validate custom time dimension bucketing as a number when dimension not named `__time` (id: 35078)
Changes in 2023.06.2
Druid changes
- Fixed an NPE with the DataSketches aggregator (id: 35283)
Pivot changes
- Fixed alerts to properly evaluate All conditions (id: 35142)
- Fixed an issue where an alert preview incorrectly showed that the alert would always trigger regardless of data and conditions (id: 35007)
Changes in 2023.06.1
Platform changes
- Security updates (id: 34192)
Changes in 2023.06
Druid highlights
druid.worker.baseTaskDirs now applies to SQL-based ingestion
Starting in 2023.06, multi-stage query (MSQ) tasks for SQL-based ingestion now honor the size you set for task directories. This change allows the MSQ task engine to sort more data at the cost of performance. If a task requires more storage than the size you set, data spills over to S3, which can have performance impacts.
To mitigate the performance impact, you can either increase the number of tasks or increase the size you set for `druid.worker.baseTaskDirs`.
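A sketch of the setting with two hypothetical mount points:

```properties
# Tasks can spread temporary storage across multiple mounts
druid.worker.baseTaskDirs=["/mnt/disk1/task", "/mnt/disk2/task"]
```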
Changed Coordinator config values
The following Coordinator dynamic configs have new default values:
- `maxSegmentsInNodeLoadingQueue`: 500, previously 100
- `maxSegmentsToMove`: 100, previously 5
- `replicationThrottleLimit`: 500, previously 10
These new defaults can improve performance for most use cases.
(#14269) (id: 34131)
Other changes in 2023.06
Pivot changes
- Added `checkFrequency` and `timeFrame` properties to the alert payload (id: 30543)
- Moved the primary time dimension to the general tab in data cube properties (id: 32557)
- Fixed table visualization with duplicate measures rendering values in incorrect sort order (id: 34546)
- Fixed data cube not updating default filter when its dimensions change (id: 34395)
- Fixed incorrect redirect after OIDC authentication (id: 34263)
- Fixed "partial query error" appearing when refreshing or loading Pivot to display multiple measures at once (id: 33502)
- Fixed download modal hanging indefinitely when network errors occur (id: 33501)
- Fixed measure filters not working properly in visualizations when time is the only added dimension (id: 33477)
- Fixed incorrect auto-filled dimension expression when creating a new data cube (id: 33288)
- Fixed no data in dashboard tiles until a global filter is applied (id: 32742)
- Fixed dashboard tiles stuck loading indefinitely while axis and facet queries constantly execute (id: 32741)
- Fixed ability to add the same measure multiple times by expanding the show bar (id: 30397)
- Fixed NULL value appearing twice in filter menu (id: 29738)
- Fixed filtering stack area on two time ranges producing a crash (id: 26619)
Druid changes
- Added more statement attributes to explain plan result (#14391) (id: 34502)
- Adjusted broker parallel merge to help managed blocking be more well behaved (#14427)
- Reverted "Added method to authorize native query using authentication result" to prevent noisy native query logs (#14376) (id: 34495)
- Added logs for deleting files using storage connector (#14350) (id: 34491)
- Added `NullHandling` module initialization for `LookupDimensionSpecTest` (#14393) (id: 34444)
- Added configurable `ColumnTypeMergePolicy` to `SegmentMetadataCache` (#14319) (id: 34080)
- Added TYPE_NAME to the complex serde classes and replaced the hardcoded names (#14317) (id: 34012)
- Added ability to load segments on Peons (#14239) (id: 33607)
- Added `OverlordDuty` to replace `OverlordHelper` and align with `CoordinatorDuty` (#14235) (id: 33595)
- Added `array_to_mv` function to convert arrays into multi-value dimensions (#14236) (id: 33514)
- Added context flag `useAutoColumnSchemas` to use new auto types for MSQ segment generation (#14175) (id: 33499)
- Fixed bug with sparse 'auto' column leading to ingestion failure (id: 34410)
- Fixed an NPE that happens when blocked threads and `workerPool` fail to execute (#14426)
- Fixed `InsertCannotAllocateSegment` not reported when batch segment allocation is in use (id: 34086)
- Fixed log streaming (#14285) (id: 34013)
- Fixed excessive alerts for "Request did not have an authorization check performed" (id: 33755)
- Fixed issue with launching Kubernetes jobs (#14282) (id: 33753)
- Fixed `SegmentAnalyzer` to be more resilient and prefer to use 'error' flag and messages on `ColumnAnalysis` rather than exploding (#14296) (id: 33689)
- Fixed issues with handling arrays with all null elements and arrays of booleans in strict mode (#14297) (id: 33684)
- Fixed an issue with filtering on a single dimension by converting In (#14277) (id: 33633)
- Fixed issue with MSQ rollup ingestion and aggregators with multiple names (#14367) (id: 34382)
- Fixed EARLIEST_BY/LATEST_BY signature and included function name in signature (#14352) (id: 34380)
- Fixed MSQ NPE exception in controller logs for QueryNotSupported error (id: 34197)
- Fixed intermittent MSQ task failure due to inability to copy error (id: 33616)
- Fixed segment metadata queries for auto ingested columns that had all null values (#14262) (id: 33561)
- Fixed joins not optimizing filter correctly (id: 33430)
- Fixed expr getCacheKey implementations not delegating (id: 32638)
- Fixed balancing thread stuck waiting for futures to resolve (id: 30425)
- Improved HLL sketch and Theta sketch estimates so that they can now be used as an expression (#14312) (id: 34391)
- Improved HLLSketchPostAggregator so that it can now be used as an expression (id: 33653)
- Improved Broker logs by changing what gets logged (#14368) (id: 34315)
- Improved exception handling to include all types of exceptions when initializing input source in sampler API (#14355) (id: 34276)
- Improved Druid datasource to datasource ingestion by filtering out tombstone segments (id: 34096)
- Improved EXPLAIN PLAN to return RESOURCES as an ordered collection (#14323) (id: 34011)
- Improved merged counters to only show when they are non-zero (#14311) (id: 33766)
- Improved concurrency to not cancel already running workflows on a branch even when a new commit is pushed (#14279) (id: 33635)
- Improved task queue in Kubernetes task runner when capacity is fully utilized (#14156) (id: 33581)
- Improved error when user attempts to set null retention rules (id: 33057)
- Improved web console time format to account for auto-allowing for leading and trailing spaces (#14224) (id: 33352)
- Improved parent compaction task by limiting number of retries made while submitting a sub-task (id: 31208)
- Removed context parameters from React components (#14366) (id: 34383)
- Removed incorrect optimization (#14246) (id: 33506)
- Removed `AbstractIndex` (#14388) (id: 34445)
- Updated operations per run (#14325) (id: 34113)
- Updated web console DQT to latest version and fixed bigint crash (#14318) (id: 34098)
- Updated heap size of coordinator overlord services in Docker IT environment (#14214) (id: 33594)
- Upgraded the React dependency to v18 (#14380) (id: 34472)
Changes in 2023.05
Druid highlights
Schema auto-discovery (beta)
You can now declare a partial schema for type-aware schema discovery. Previously, the schema had to be fully declared or omitted. (#14076) (id: 32605)
MSQ task engine
Added support for querying lookup and inline data directly for the MSQ task engine (#14048) (id: 32663)
Hadoop 3
Starting with 2023.05 STS, the Imply distribution is compatible with Hadoop 3 by default. If you need a Hadoop 2 compatible build, contact Imply Support.
Pivot highlights
Pivot alert improvements:
- You can now configure alerts to send notification emails to users outside Pivot.
- The Check every and Timeframe properties now appear in the alert payload.
Other Pivot improvements:
- You can now use the Pivot server configuration property `userNameAuthority` to determine whether Pivot or OIDC populates name fields in Pivot. See Pivot server config for details.
Other changes in 2023.05
Pivot changes
- Added ability to send alert emails to users outside Pivot (id: 31178)
- Added Check every and Timeframe properties to alert payload (id: 30543)
- Added Pivot server configuration property `userNameAuthority` that determines whether Pivot or OIDC populates name fields in Pivot (id: 31971)
- Fixed Pivot auto-creating dimensions for ARRAY columns during data cube creation. ARRAY data is not supported in Pivot (id: 33265)
- Fixed null value in time dimension producing an error and preventing visualization (id: 32068)
- Fixed dimension preview using an alias instead of a formula for string filters (id: 33126)
- Fixed date range missing on bubble modal on the horizontal bars view, when doing measure compare (id: 32850)
- Fixed no data showing in dashboard tiles until a global filter is applied (id: 32742)
- Fixed dashboard tiles stuck indefinitely loading while axis and facet queries constantly execute (id: 32741)
- Fixed unsupported time range error after converting a dashboard (id: 32740)
- Fixed non-additive measures not working in Treemap visualizations (id: 32737)
- Fixed inability to add complex comparisons from show bar (id: 32711)
- Fixed dropping a dimension in Pivot 2 not replacing all existing splits (id: 32697)
- Fixed opening a data cube created in SQL mode from home view failing when the Pivot 2 feature flag is turned off (id: 32666)
- Fixed numeric filter with bucketing set to "never bucket" creates duplicate items (id: 32067) and crashes on searching (id: 32056)
- Fixed deleting data cubes and dashboards to check for payload conflicts before commit (id: 32052)
- Fixed relative time filter not working properly in delta filter comparison (id: 29411)
- Changed `No such datasource` status codes to 400 during data cube creation (id: 33105)
- Changed data cube option `boostPrefixRank` to `true` for queries generated from string filters for multi-value dimensions. This boosts the rank of results whose prefix matches the search text (id: 32714)
- Improved Pivot behavior to return user to dashboard after adding a data cube view to a new or existing dashboard (id: 32839)
- Improved context of alert and report logs (id: 32047)
- Improved the consistency of alert trigger time (id: 31798)
Druid changes
- Added logging for merge and push timings for PartialGenericSegmentMergeTask (#14089) (id: 32727)
- Added the ability for tasks to run using multiple different mount points for temporary storage. Set `druid.worker.baseTaskDirs` to an array of locations to enable. If you were using `druid.indexer.task.baseTaskDirPaths`, that setting no longer works. You must switch to `druid.worker.baseTaskDirs` (#14063) (id: 32604)
- Added a new column `start_time` to `sys.servers` that captures the time at which the server was added to the cluster (#13358) (id: 32628)
- Added support for querying lookup and inline data directly for MSQ (#14048) (id: 32663)
- Added check for required avroBytesDecoder property that otherwise causes NPE (#14177) (id: 33230)
- Added more logs for sequential merge (#14097) (id: 32692)
- Added support for multiple result columns with the same name (#14025) (id: 32607)
- Added support in the web console for changing the time column name when reading from external sources in MSQ (#14165) (id: 33091)
- Changed the web console to set the count on the rule history API (#14164) (id: 33086)
- Changed the timeout to get worker capacity in the Druid console to be higher (#14095) (id: 32670)
- Changed `useSchemaDiscovery` to also include the behavior of `includeAllDimensions` to support partial schema declaration without having to set two flags (#14076) (id: 32605)
- Changed to Hadoop 3 by default for Imply distribution (id: 31956)
- Changed compaction tasks so that input specs with intervals not aligned with `segmentGranularity` aren't allowed (#14127) (id: 33018)
- Changed tombstone behavior so that supervisor tombstones get created only when creating a new supervisor fails (id: 32946)
- Fixed bugs with auto encoded long vector deserializers (#14186) (id: 33113)
- Fixed issues with filtering nulls on values coerced to numeric types (id: 33324)
- Fixed a regression where queries with json_value() encounter NPE (id: 33426)
- Fixed miscellaneous bugs in the web console (#14216) (id: 33316)
- Fixed NPE in test parse exception report and added more tests with different thresholds (#14209) (id: 33284)
- Fixed a bug in `CaseOperatorConversion` when the `THEN` clause contains a binary operator expression (id: 33094)
- Fixed Kafka Avro ingestion throwing an unhandled ParseException (id: 29247)
- Fixed task query error decode (#14174) (id: 33090)
- Fixed bug filtering nested columns with expression filters (#14096) (id: 32717)
- Fixed NPE with gs uri having underscores (id: 32749) (#14107)
- Fixed input source security feature not working for MSQ tasks (#14056) (id: 32553)
- Fixed failed queries not releasing lanes in certain scenarios (id: 33028)
- Fixed bugs and added support for boolean inputs to classic long dimension indexer (#14069) (id: 32595)
- Fixed input source security so that the SQL layer can handle input sources with multiple types (#14050) (id: 32538)
- Fixed a 75K-file scenario failing in stage 0 with an "Unable to execute HTTP request" error (id: 32537)
- Fixed natural comparator selection for groupBy (SQL) (#14075) (id: 32668)
- Fixed a bug where you couldn't mark segments as used when the whole datasource is unused (#14185) (id: 33111)
- Fixed an issue where ephemeral storage from the Overlord for Peon tasks wasn't respected (#14201) (id: 33204)
- Fixed a bug where if `json` is explicitly specified, `auto` got returned instead (#14144) (id: 32920)
- Improved MSQ to preserve original ParseException when writing frames (#14122) (id: 32978)
- Improved the tier selector in the web console (#14143) (id: 32915)
- Improved error message when Druid services are not running (#14202) (id: 33262)
- Improved the web console to use stringly-typed schemas in the data loader (#14189) (id: 33206)
- Improved error message for CSV with no properties (#14093) (id: 32771)
- Improved the Avro extension to allow more complex JSONPath expressions (#14149) (id: 32933)
- Improved handling for zero-length intervals (#14136) (id: 33017)
- Improved GCP initialization to be truly lazy (#14077) (id: 32606)
- Improved native JSON error UX (#14155) (id: 33003)
- Improved lookup 404 detection in the web console (#14108) (id: 32820)
Platform changes
- Imply GKE now supports Ubuntu OS in addition to Container OS (id: 33335)
- Updated the version of NGINX that ships with GKE (id: 33203)
- Made `deepstore` the default intermediary data storage type for managed deployments with deep storage (id: 33119)
- Fixed `reserveThreadsForNonQueryRequests` not triggering a restart of query nodes (id: 32652)
Changes in 2023.04
Druid highlights
Auto type column schema (beta)
A new "auto" type column schema and indexer has been added to native ingestion as the next logical iteration of the nested column functionality. This automatic type column indexer that produces the most appropriate column for the given inputs, producing either STRING, ARRAY<STRING>, LONG, ARRAY<LONG>, DOUBLE, ARRAY<DOUBLE>, or COMPLEX<json> columns, all sharing a common 'nested' format.
All columns produced by 'auto' have indexes to aid in fast filtering (unlike classic LONG and DOUBLE columns) and use cardinality based thresholds to attempt to only utilize these indexes when it is likely to actually speed up the query (unlike classic STRING columns).
COMPLEX<json> columns produced by this 'auto' indexer store arrays of simple scalar types differently than their 'json' (v4) counterparts, storing them as ARRAY typed columns. This means that the JSON_VALUE function can now extract entire arrays, for example JSON_VALUE(nested, '$.array' RETURNING BIGINT ARRAY). There is no change with how arrays of complex objects are stored at this time.
This improvement also adds completely new functionality to Druid: ARRAY typed columns, which unlike classic multi-value STRING columns behave with ARRAY semantics. These columns can currently only be created via the 'auto' type indexer when all values are arrays with the same type of elements.
An array data type is a data type that allows you to store multiple values in a single column of a database table. Arrays are typically used to store sets of related data that can be easily accessed and manipulated as a group.
This release adds support for storing arrays of primitive values such as `ARRAY<STRING>`, `ARRAY<LONG>`, and `ARRAY<DOUBLE>` as specialized nested columns instead of breaking them into separate element columns.
(#14014) (#13803) (id: 32406)
These changes affect two additional new features available in 26.0: schema auto-discovery and unnest.
Schema auto-discovery (beta)
We’re adding schema-auto discovery with type inference to Druid. With this feature, the data type of each incoming field is detected when schema is available. For incoming data which may contain added, dropped, or changed fields, you can choose to reject the nonconforming data (“the database is always correct - rejecting bad data!”), or you can let schema auto-discovery alter the datasource to match the incoming data (“the data is always right - change the database!”).
To use this feature, set `spec.dataSchema.dimensionsSpec.useSchemaDiscovery` to `true`. Druid can infer the entire schema or some of it if you explicitly list dimensions in your dimensions list.
Schema auto-discovery is available for native batch and streaming ingestion.
(#13653) (#13672) (#14076)
Sort-merge join and hash shuffle join for MSQ (beta)
You can now perform shuffle joins by setting the context parameter `sqlJoinAlgorithm` to `sortMerge` for the sort-merge algorithm, or omitting it to perform broadcast joins (the default).
Multi-stage queries can use a sort-merge join algorithm. With this algorithm, each pairwise join is planned into its own stage with two inputs. This approach is generally less performant, but more scalable, than broadcast.
Set the context parameter `sqlJoinAlgorithm` to `sortMerge` to use this method.
Broadcast hash joins are similar to how native join queries are executed.
For more information, see Broadcast and Sort-merge.
(#13506) (id: 31556)
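A sketch of selecting the algorithm through the query context of a SQL API request; everything except the `sqlJoinAlgorithm` parameter is a placeholder:

```json
{
  "query": "SELECT ...",
  "context": {
    "sqlJoinAlgorithm": "sortMerge"
  }
}
```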
Storage improvements on dictionary compression
Switching to front-coded dictionary compression (beta) can save up to 30% of storage with little to no impact on query performance.
This release further improves the `frontCoded` type of `stringEncodingStrategy` on `indexSpec` with a new segment format version, which typically has faster read speeds and reduced segment size. This improvement is backwards incompatible with Druid 25.0. This release also adds a new `formatVersion` option, which defaults to the current version `0`. Set `formatVersion` to `1` to start using the new version.
(#13988) (#13996)
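A hedged sketch of opting in to the new format version through `indexSpec` in a tuning config, using the field name as written above; the bucket size is illustrative, and the exact field layout should be confirmed against the `indexSpec` documentation:

```json
{
  "indexSpec": {
    "stringEncodingStrategy": {
      "type": "frontCoded",
      "bucketSize": 4,
      "formatVersion": 1
    }
  }
}
```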
Additionally, overall storage size, particularly with using larger buckets, has been improved.
(#13854)
Pivot highlights
Enterprise and Hybrid customers can now configure the street map visualization without contacting Imply to enable the feature. On-prem customers must still contact Imply before they can set up a street map visualization.
Pivot alert improvements:
- If an alert evaluation cycle is delayed into the time frame of the next evaluation due to technical failures, Pivot now identifies and processes skipped evaluation cycles in an orderly manner.
- You can now trigger an alert when no new data has been ingested, with the Latest data strategy override property. See Create an alert for details.
- Improved the alert preview message in the Pivot UI.
Other changes in 2023.04
Pivot changes
- Added support for triggering an alert even when no new data has been ingested (id: 10662)
- Added the ability to identify and backfill skipped alert evaluation cycles in an orderly manner (id: 31796)
- Added the ability to swap axes for table and sparkline visualizations (id: 30172)
- Added microseconds and nanoseconds to measurement abbreviations (id: 31949)
- Improved the alert preview message (id: 19383)
- Improved the 1 month comparison period in visualizations: now 1 calendar month instead of 30 days (id: 31994)
- Improved alert logging (id: 31793)
- Fixed Open interactive report button in report emails not working (id: 32490)
- Fixed an issue with scheduled reports not retaining view, filter, and split information when opened in Pivot (id: 32301)
- Fixed Pivot downloads using hardcoded split limit, not row limit from download options (id: 32039)
- Fixed comparison data not being included in Pivot 2 data downloads (id: 32019)
- Fixed line chart visualization showing a data point for zero when there is no data (id: 31996)
- Fixed query error on transform measures defined using the Advanced tab (id: 31898)
- Fixed blank spot in sunburst and pie chart visualization when Others option is set to Hide (id: 31784)
- Fixed report not showing dimension data in emails produced from async download (id: 31623)
- Fixed user remaining logged in after changing their password—Pivot now enforces reauthentication (id: 31212)
- Fixed overall comparison failing in report view (id: 30862)
- Fixed filter by measure not being applied to Pivot 2 report (id: 30532)
- Fixed inability to create data cube from a source with nested JSON columns using a "SELECT * from" query (id: 30018)
- Fixed creating a dimension using the Add a dimension link in data cube view producing an error (id: 32300)
Druid changes
- Added ability to add configuration files when using Druid on Kubernetes (#13795) (id: 30736)
- Added tuple sketch SQL support (#13887) (id: 31459)
- Added better null handling for HttpServerInventoryView, HttpRemoteTaskRunner, LookupNodeDiscovery, and SystemSchema (id: 31680)
- Added better FrontCodedIndexed (#13854) (id: 31695)
- Added better JSON column support for literal arrays (id: 20204)
- Added engine as a dimension for sqlQuery metrics (#13906) (id: 31700)
- Added JWT authenticator support for validating ID tokens (#13242) (id: 28173)
- Added timeout to TaskStartTimeoutFault (#13970) (id: 32112)
- Added arrays to nested columns—array columns (#13803) (id: 32111)
- Added Kubernetes task runner live reports (#13986) (id: 32339)
- Added a UI for Overlord dynamic configurations to the Druid console (#13993) (id: 32335)
- Added backwards compatibility mode for frontCoded stringEncodingStrategy (#13988) (id: 32176) (#13996) (id: 32336)
- Added a new error message for task deletion (#14008) (id: 32333)
- Added null handling and proper error message when a server does not exist/has already been removed from ServerInventoryView and BrokerServerView is trying to add a segment for it (id: 29925)
- Added configurable retries to ZooKeeper connections (#13913) (id: 31592)
- Added back function signature for compatibility (#13914) (id: 31600)
- Changed default maxRowsInMemory for realtime ingestion to a lower number (#13939) (id: 31942)
- Changed SQL operators `NVL` and `COALESCE` with 2 arguments to now plan a native `nvl` expression, which supports the vector engine. Multi-argument COALESCE still plans into a `case_searched`, which is not vectorized (#13897) (id: 31566)
- Enabled round robin segment assignment and batch segment allocation by default (#13942) (id: 31954)
- Fixed an issue where JOIN or UNNEST queries over a tombstone segment could fail (#14021) (id: 32486)
- Fixed querying SQL (#14026) (id: 32484)
- Fixed SQL in segment card (#13895) (id: 31507)
- Fixed an off-by-one error in the getCardinality method of FrontCodedIndexedWriter and FrontCodedIntArrayIndexedWriter (#14047) (id: 32481)
- Fixed issues with null pointers on jobResponse (#14010) (id: 32407)
- Fixed an issue where some queries failed on versions later than 2022.05 with the error "Cannot convert query parts into an actual query" (id: 26709)
- Fixed JOIN and UNNEST planning to ensure that duplicate join prefixes are not used (#13943) (id: 31990)
- Fixed an issue with SELECT COUNT(distinct) with GROUP BY returning an HLL cast error in MSQ (id: 32198)
- Fixed an issue so that realtime tasks retry when they fail to pause (#11515) (id: 32079)
- Fixed a bug with expression transform byte[] handling and improved expression transform array handling (id: 31864)
- Fixed Peon errors when executing tasks in IPv6 environments (#13972) (#13995) (id: 32337)
- Fixed the Overlord not becoming a leader when syncing the lock from the metadata store (#14038) (id: 32452)
- Fixed an OOM in the tombstone generating logic in MSQ (#13893) (id: 31563)
- Fixed HSTS for MiddleManager (#13975) (id: 32077)
- Fixed new HSTS header not being applied to MiddleManager service (id: 32084)
- Fixed an NPE for SELECT COUNT(unnested column) when the column has a null row (id: 31851)
- Fixed an issue where UNNEST with a WHERE clause produced a 'Received a non-applicable rewrite' error (id: 31660)
- Fixed an issue where UNNEST with a WHERE clause produced a 'SQL query is unsupported' error (id: 31659)
- Fixed an issue with the start-druid script (#13891) (id: 31549)
- Fixed several issues for Unnest (#13892) (id: 31604)
- Fixed Parquet ingestion bug for uint_32 type fields (id: 31602)
- Fixed KafkaInputFormat when used with Sampler API (#13900) (id: 31555)
- Fixed Load Data UI so that it supports the "kafka" inputType (id: 12898)
- Fixed a bug that occurred when using expression filters to filter `COMPLEX<json>` columns or values extracted from them using `JSON_VALUE` or any other nested column function. The column would be incorrectly treated as all null values (#14096) (id: 32717)
- Improved the subquery guardrail so that it obeys the memory limit (id: 13296)
- Improved nested column index utilization (#13977) (id: 32487)
- Improved segment heap footprint and fixed bug with expression type coercion (#14002) (id: 32334)
- Improved the Druid console to show segment writing progress (#13929) (id: 32002)
- Improved performance by creating new RelDataTypeFactory during SQL planning (#13904) (id: 31562)
- Improved nested column storage format for simple typed columns (id: 29771)
- Improved error message when topic name changes within same supervisor (#13815) (id: 31511)
- Upgraded ZK from 3.5.9 to 3.5.10 to avoid data inconsistency risk (#13715) (id: 30187)
- Upgraded the Fabric8 client to support newer versions of Kubernetes (#13804) (id: 30766)
- Improved window planning by using collation traits to improve subquery logic (#13902) (id: 31596)
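The following is a minimal sketch of the `NVL`/`COALESCE` planning change described in the list above; the datasource `example_events` and its columns are hypothetical:

```sql
-- Two-argument NVL and COALESCE now plan into the native 'nvl' expression,
-- which is eligible for the vectorized engine:
SELECT
  NVL("channel", 'unknown') AS channel_nvl,
  COALESCE("channel", 'unknown') AS channel_coalesce
FROM "example_events"

-- Multi-argument COALESCE still plans into case_searched and is not vectorized:
SELECT COALESCE("channel", "page", 'unknown') AS first_non_null
FROM "example_events"
```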
Platform changes
- Added support for more GKE regions (id: 32513)
- Scaled ZK memory usage based on master instance type (id: 32293)
Clarity changes
Updated Clarity password policy. Passwords for Clarity must now satisfy the following criteria:
- At least 8 characters.
- No older than 90 days.
- Must not match any of the 5 most recent passwords.
- Contains alphabetic, numeric, and special characters.
Additionally, Clarity now locks a user out after 6 invalid login attempts.
(id: 31281)
Changes in 2023.03.1
Druid updates
- Fixed an issue where ingestion using MSQ ran out of memory and failed when durable shuffle storage was enabled (id: 31960)
- Fixed an issue where MSQ tasks ran out of disk even though `intermediateSuperSorterStorageMaxLocalBytes` was set.
Changes in 2023.03
Druid highlights
Information schema now uses numeric column types
This is a breaking change.
The Druid system table (`INFORMATION_SCHEMA`) now uses SQL types instead of Druid types for columns. This change makes the `INFORMATION_SCHEMA` table behave more like standard SQL. You may need to update your queries to avoid unexpected results if you depend on either of the following:
- Numeric fields being treated as strings
- Column numbering starting at 0 (column numbering is now 1-based)
(#13777) (id: 31065)
Unnest (beta) changes
The UNNEST SQL function has been improved. You can now unnest multiple columns within a single query. For example:

```sql
SELECT *
FROM "example_table",
UNNEST(MV_TO_ARRAY("dim3")) AS table_alias1(d3),
UNNEST(ARRAY[dim4, dim5]) AS table_alias2(d45)
```

Additionally, note the following breaking changes:
- The UNNEST SQL function requires you to set the context parameter `enableUnnest` to `true`. There is no required context parameter for the native `unnest` query or for Pivot queries.
- The syntax for the `unnest` datasource for native queries has changed. It now uses a `virtualColumn` to perform the unnest:

  ```json
  "virtualColumn": {
    "type": "expression",
    "expression": "\"column_reference\""
  },
  ```

- The `unnest` datasource no longer has an allow list option.

For more information about these changes to the `unnest` datasource, see `unnest`.
(#13892) (id: 30663) (id: 30537)
Pivot highlights
Improved IP dimension performance: After you upgrade to 2023.03, update any existing IP dimensions to benefit from new performance improvements. To do this, edit the IP dimension in Pivot and:
- Remove `IP_STRINGIFY` from the custom formula. For example, for a column named `ipaddress`:
  - Old custom formula: `IP_STRINGIFY("t"."ipaddress")`
  - New custom formula: `"t"."ipaddress"`
- Change the dimension type from `String` to `IP` or `IP prefix` as appropriate.
Dual axis: The line chart visualization now supports multiple scales and lines. You can display two continuous metrics on the same chart with two axes. See the Visualizations reference for details.
Other changes in 2023.03
Pivot changes
- Added dual axis capability to the line chart visualization (id: 30240)
- Added data cube option View essence to display the JSON data structure for a Pivot 2 data cube (id: 30824)
- Added searchable drop-downs throughout Pivot (id: 23489)
- Fixed transforms not being correctly applied to multi-value dimensions (id: 31192)
- Fixed records visualization cutting off data when long values were present (id: 30219)
- Fixed next available measure not displaying when default measure was removed from data cube (id: 29737)
- Improved error handling for color legends of multi-value dimensions (id: 29907)
- Improved performance of IP dimensions (id: 29444)
- Removed Show metadata option from Pivot 2 data cubes (id: 30158)
Druid changes
- Added support for running indexing tasks on multiple disks for MiddleManagers/Indexers. You can assign multiple base task directories using `druid.indexer.task.baseTaskDirPaths=[\"PATH1\",\"PATH2\",...]` in the runtime properties of the MiddleManager/Indexer (#13476) (id: 16181)
- Added support for range partitioning for Hadoop-based batch ingestion (#13303) (id: 28169)
- Added a new dialog to the web console that shows compaction history (#13861) (id: 31452)
- Added a new Python Druid API for use in Jupyter notebooks (#13787) (id: 31416)
- Added new functionality for Tuple sketches (#13819) (id: 30811). You can now do the following (see the example after this list):
  - Get the sketch output as a base64 string
  - Provide a constant Tuple sketch in the post-aggregation step that can be used in set operations
  - Get the estimated value (sum) of the summary/metrics objects associated with a Tuple sketch
- Added metric for time taken for broker to start up (#13716) (id: 29924)
- Changed the SQL CAST operator conversion to use `Calcites.getColumnTypeForRelDataType` to convert Calcite types to native Druid types, instead of using its own custom `SqlTypeName` to ExprType mapping. This makes it more consistent with the SQL-to-Druid type conversions for most other operators (#13890) (id: 25979)
- Fixed an issue where nested queries for unnest had the wrong output column name (#13892) (id: 30440)
- Fixed an issue where unnest returned different results on the same MV array when there was a null row (#13922) (id: 30537)
- Fixed an issue where MSQ replaces segments of granularity ALL with other granularities, causing the Peon to run out of memory (id: 31454)
- Fixed `expectedSingleContainerOutput.yaml` spelling (#13870) (id: 31422)
- Fixed an issue where checking if durable storage for MSQ is enabled returned inaccurate results (#13881) (id: 31417)
- Fixed an issue where queries with multiple unnests returned incorrect results (id: 31310)
- Fixed an NPE in the Kinesis supervisor when `recordsPerFetch` was not set (id: 31122)
- Fixed an issue where leader redirection didn't work when both plainText and TLS ports were set (id: 31082)
- Fixed infinite checkpointing between tasks and Overlord (#13825) (id: 31056)
- Fixed query cancel NPE in the web console (#13786) (id: 30829)
- Fixed an issue where ShuffleStorage ingestion failed with an OOM error for MSQ (id: 30019)
- Fixed an issue where escaping string literals in lookup queries for lookups loaded on MariaDB failed (id: 30810)
- Fixed ARRAY_AGG so that it works with complex types, and fixed bugs with expression aggregator complex array handling (#13781) (id: 30778)
- Fixed an issue with the SQL planner when virtual column capabilities were null (#13797) (id: 30750)
- Improved null value handling in SQL multi-value string functions (id: 25978)
- Improved dependencies by consolidating the `druid-core`, `extendedset`, and `druid-hll` modules into `druid-processing` to simplify dependency management. Any extensions referencing these should be updated to use `druid-processing` instead. Existing extension binaries should continue to function normally when used with newer versions of Druid (#13698) (id: 30891)
- Improved logs for query errors (#13776) (id: 30776)
- Improved logging for MSQ worker tasks (#13790) (id: 31186)
- Improved speed for composite key joins on IndexedTable (#13516) (id: 31064)
- Improved auto completion in the web console (#13830) (id: 31048)
- Improved HLL sketches to be more optimized (#13737) (id: 30681)
- Improved `/druid/indexer/v1/sampler` to include `logicalDimension`, `physicalDimension`, and `logicalSegmentSchema`: respectively, the list of the most restrictive typed dimension schemas, the list of dimension schemas actually used to sample the data, and the full resulting segment schema for the set of rows sampled (#13711) (id: 29884)
- Improved join performance on dense composite keys (id: 29014)
- Improved client change counter management in HTTP server view (#13010) (id: 28183)
- Removed `FiniteFirehoseFactory` and its implementations (#12852) (id: 23960)
- Upgraded the Druid query toolkit (#13848) (id: 31251)
- Upgraded Kafka version to resolve CVE-2023-25194 (id: 31200)
- Updated Apache Kafka dependencies to 3.4.0 (#13802) (id: 30774)
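The following is a minimal sketch of the Tuple sketch additions noted in the list above. It assumes the DataSketches extension is loaded; the datasource `example_purchases` and its columns are hypothetical, and you should verify the tuple sketch function names against the SQL functions available in your version:

```sql
-- Build a Tuple sketch over a hypothetical user column, attaching a revenue
-- metric, then read back the estimated sum of that metric.
SELECT
  DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE(
    DS_TUPLE_DOUBLES("user_id", "revenue")
  ) AS estimated_revenue_sum
FROM "example_purchases"
```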
Changes in 2023.02.1
Platform changes
- Security updates
Changes in 2023.02
Druid highlights
SQL UNNEST (beta)
You can unnest arrays with either the UNNEST function (SQL) or the `unnest` datasource (native).
The UNNEST function for SQL allows you to unnest arrays by providing a source array expression using the following syntax:
`UNNEST(source_expression) AS table_alias_name(column_alias_name)`
For more information, see either UNNEST (SQL) or `unnest` (native).
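For example, the following sketch unnests a hypothetical multi-value column `tags` in a hypothetical datasource `example_events` (note that later releases recommend the CROSS JOIN form described in the Upgrade and downgrade notes):

```sql
SELECT tag
FROM "example_events", UNNEST(MV_TO_ARRAY("tags")) AS t(tag)
```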
Pivot highlights
Pivot API docs update
We've corrected and improved the Pivot API docs, and added request examples.
Other changes in 2023.02
Pivot changes
- Fixed editing and saving a data cube from the data cubes tab not returning user to data cubes tab (id: 25283)
- Fixed query monitoring showing an empty query for cached queries (id: 30283)
- Fixed inability to create data cube directly from sources that include JSON columns (id: 30017)
- Fixed the UI attempting to validate custom time dimension bucketing as a number when the dimension was not named `__time` (id: 29866)
- Fixed measure "missing value fill" not working when set to previous or interpolation and splitting on time (id: 29857)
- Fixed filtering by null value showing multiple "null" options in drop-down (id: 29638)
- Fixed measure and dimension conversions for chained filter expressions (id: 29233)
- Fixed download queries ignoring "Hide filtered out values" multi-value dimension setting on axis queries (id: 29195)
- Fixed street map visualization suggestions when one of the dimensions was removed (id: 29117)
- Fixed unexpected sort order in table visualization with multiple values (id: 26933)
- Fixed data cube info not showing all relevant data cube names when tiles were from multiple data cubes (id: 20694)
- Fixed inability to configure the settings export limit (id: 30210)
- Fixed multi-value dimension filter returning incorrect results when there were duplicated values in a multi-value column (id: 29629)
- Fixed Horizontal bars visualization not filtering correctly by Delta measure (id: 29378)
- Fixed report attachments using split limits incorrectly (id: 28919)
- Fixed errors not appearing when alert and report generation fails (id: 28857)
- Fixed on-prem Clarity users being unable to see data due to a missing `clarityUser` metric (id: 30587)
- Improved and corrected Pivot API docs (id: 28735, 29975)
- Removed "Show metadata" option from async downloads (id: 29328)
Druid changes
- Added schemaless ingestion to allow discovery of nested columns (id: 29770)
- Added nested column indexer for schemaless ingestion (id: 29769)
- Added SQL functions for UNNEST (id: 28859)
- Added rules to support comma join and UNNEST syntax (id: 28858)
- Added extra time-granularity with DATE_EXPAND (id: 20364)
- Added the API endpoint `CoordinatorCompactionConfigsResource#getCompactionConfigHistory` to return automatic compaction configuration history (#13699) (id: 29962)
- Added a Fallback virtual column that enables falling back to another column if the original column doesn't exist (id: 30524)
- Added feature flag for nested column scalar handling (id: 30181)
- Added SQL version of UNNEST native Druid function (id: 29963)
- Added implementations of semantic interfaces to optimize window processing on top of an ArrayListSegment (id: 29724)
- Added more robust default fetch settings for Kinesis (id: 29689)
- Added support for adding Strict-Transport-Security header to HTTP responses (id: 29558)
- Fixed `json_value` on a scalar virtual column in a WHERE clause returning an error (id: 30620)
- Fixed compaction history returning an empty list instead of 404 when not found (id: 30456)
- Fixed overly verbose batch allocation logs by changing log level from info to debug (id: 30452)
- Fixed MSQ insert with `to_json_string` failing with CannotParseExternalData (id: 30341)
- Fixed an issue where the Overlord could continuously remove and add workers for the same node when httpRemote is enabled (id: 30270)
- Fixed nested column handling of null values (id: 30180)
- Fixed web console data loader not allowing for multiline JSON messages in Kafka (id: 30127)
- Fixed window parameter for timeseries function (id: 30014)
- Fixed JSON column becoming unreadable after appending complex data to scalar data (id: 30006)
- Fixed a variable name in `start-druid` (id: 29587)
- Fixed `json_value` not working for Protobuf bytes (id: 29516)
- Fixed null values missing after ingesting a JSON array as a JSON column (id: 29214)
- Fixed an issue where ingesting an Avro array as a JSON column missed null values in the array (id: 28895)
- Fixed scalar values being ingested as the `COMPLEX<json>` column type when there were nulls (id: 29841)
- Improved query operations by making them pausable (id: 29973)
- Improved and extended table functions functionality (id: 29815)
Platform changes
- Added support for additional ports on headless services (id: 30175)
- Fixed feature flag list not always showing feature flags already enabled on the cluster (id: 30474)
- Removed the feature flag and extension loading option for multi-stage query engine (id: 29835)
- Enabled durable storage for SQL-based ingestion (MSQ task engine) by default for Imply Enterprise and Hybrid clusters if you use S3 as your deep storage option.
- Updated ZK to version 3.7.1 (K8s and GKE Enhanced) (id: 30196)
Changes in 2023.01
Druid highlights
The MSQ task engine now has fault tolerance support for workers. This means an ingestion task will retry if the underlying machine executing ingestion fails. You can enable this behavior by setting `faultTolerance` to `true` in the context parameters for a query.
Pivot highlights
You can now create a URL containing query parameters, to link to a dashboard or data cube. See Query parameters reference for details.
You can now apply a "flat" layout to the Table visualization to display a column for each dimension. See the Visualizations reference for details.
Dashboards and Pivot 2 data cubes now include an option to enable or disable the query cache for all dashboards and data cubes. See Managing data cubes and Managing dashboards for details. This option already existed for Pivot Classic data cubes.
Platform highlights
For Imply Enterprise on GKE, you can now use persistent disks for data nodes by selecting the option when configuring a tier. Using persistent disks can improve stability by making pod rescheduling more streamlined.
Other changes in 2023.01
Pivot changes
- Added ability to create a URL with query parameters, to link to a dashboard or data cube (id: 28068, id: 28067, id: 28106)
- Added flat table layout for table visualizations (id: 24697)
- Added query cache property to options menu for dashboards and Pivot 2 data cubes (id: 23747)
- Improved UX for comparative alert constraints (id: 5330)
- Fixed an issue where changing the visualization reset the dimension/measure selection in data cube view (id: 29538)
- Fixed inability to create a data cube via custom SQL with an aliased JSON column (id: 29206)
- Fixed gap in data cube visualization panel (id: 28915)
- Fixed JSON column introspection failing when creating data cube via custom SQL (id: 28853)
- Fixed UX when attempting to exclude values in filter when there are multiple nulls in the column (id: 27633)
- Fixed query generation when filtering on multiple null values (id: 27632)
- Fixed reports breaking if underlying data cube was deleted (id: 26073)
Druid changes
- Added support for adding the `Strict-Transport-Security` header to HTTP responses (#13489) (id: 29466)
- Added new Broker query metrics (#13420) (id: 28897):
  - Total parallel merge processing 'wall' time
  - Parallel merge pool time spent waiting for results of the 'fastest' and 'slowest' partitions
- Added nested columns support for protobuf (#13519) (id: 28713)
- Added the ability to select a specific emitter for each feed. For example, alerts can go to the `http` emitter, metrics can go to the `statsd` emitter, and `requestLog` events can go to a Kafka emitter. For more information, see Switching emitter (#13363) (id: 28904)
- Added tracking for bytes processed by a task in MSQ task reports (#13520) (id: 28436)
- Allowed the string dimension indexer to handle byte[] values as base64 strings (#13573) (id: 29498)
- Changed how ZooKeeper is specified as a service for the `start-druid-main` script (#13550) (id: 29152)
- Changed Druid to monotonically increase worker capacity based on total resources (#13581) (id: 29497)
- Changed the default max workers to cluster capacity in the Druid console and simplified live reports (#13577) (id: 29493)
- Changed how Druid behaves on a failed start. Druid no longer waits for a graceful shutdown (#13087) (id: 28902)
- Changed the `chatAsync` default to `true`. This means that Druid uses asynchronous communication with Kafka and Kinesis for indexing tasks and ignores the `chatThreads` parameter (#13491) (id: 28901)
- Changed segment allocation behavior (#13503) (id: 28913):
  - The max batch size for `SegmentAllocationQueue` is now 500
  - `batchAllocationMaxWaitTime` is now `batchAllocationWaitTime`, to more accurately reflect the behavior, since the wait time can actually exceed the configured value
  - For more information, see Batching `segmentAllocate` actions
- Changed logging behavior so queries that fail authorization checks are now logged. Previously, they were not (#13564) (id: 29489)
- Changed operators to a push style API (#13600) (id: 29482)
- Changed how Druid extends the FROM grammar from Calcite by using a template file for adding table functions grammar (#13553) (id: 29300)
- Changed task memory computation in the `start-druid` script (#13563) (id: 29297)
- Changed MSQ to only look at `sqlInsertSegmentGranularity` on the outer query (#13537) (id: 29118)
- Fixed error anchors in the Druid console (#13527) (id: 29101)
- Fixed the scope of dependencies in the `protobuf-extensions` pom (#13593) (id: 29488)
- Fixed an issue where the preview in the Druid console stopped using the MSQ task engine when auto is selected for the Engine (#13586) (id: 29494)
- Fixed an issue with Jetty graceful shutdown of data servers when `druid.serverview.type` is set to `http` (#13499) (id: 28899)
- Fixed an issue with JDBC and query metrics causing query failures (#13608) (id: 29487)
- Fixed a typo in a metric name. The correct name is `ingest/segments/count` (#13521) (id: 28933)
- Fixed an issue where serialization of the LocalInputSource object converts relative file paths to absolute paths, changing the meaning of an MSQ ingest query (#13534) (id: 29491)
- Fixed an issue that caused Broker parallel merge pool metrics to not be emitted (#13420) (id: 28897)
- Improved JDBC lookup by quoting and escaping literals to allow reserved identifiers (#13632) (id: 29533)
- Improved nested column storage format for broader compatibility (#13568) (id: 29499)
- Improved the stage UI in the Druid console (#13615) (id: 29483)
- Improved MSQ table functions (#13360) (id: 29100)
- Improved the error message when `theta_sketch_intersect` is used on scalar expressions (#13508) (id: 28914)
- Improved the `TooManyBuckets` error message for the MSQ task engine (#13525) (id: 29102)
- Improved MSQ to track bytes read from an input source in the stage counters of an MSQ task report (#13559) (id: 29502)
- Improved logging related to the SQL planner to validate response headers and include the cause (#13609) (id: 29478)
- Improved the Druid console to better show totals when grouping (#13631) (id: 29477)
- Improved error reporting in the Druid console (#13636) (id: 29476)
- Improved disk usage and made Historicals load segments more quickly (#13394) (id: 29161):
  - Added the `druid.storage.zip` parameter for local storage (defaults to `false`). This changes the default behavior from writing an `index.zip` file to writing a regular directory. This is safe even during a rolling update, because older code already handled unzipped directories being present on local deep storage
  - For more information, see Local
- Improved the Druid console: added support for arrayOfDoublesSketch, fixed padding when aggregating in a table, and added syntax highlighting for window function keywords (#13486) (id: 28912)
- Improved the Druid quickstart script to automate memory-parameter defaults (#13365) (id: 26000)
- Upgraded to Netty 4.1.86.Final (#13604) (id: 29484)
Platform changes
- Fixed an issue where an Imply Enterprise cluster on Kubernetes fails validation (id: 29204)
Upgrade and downgrade notes
Minimum supported version for rolling upgrade
See "Supported upgrade paths" in the Lifecycle Policy documentation.
UNNEST syntax
Starting with 2023.09 STS, the recommended syntax for SQL UNNEST has changed. We recommend using CROSS JOIN instead of commas for most queries to prevent issues with precedence. For example, use:

```sql
SELECT column_alias_name1 FROM datasource
CROSS JOIN UNNEST(source_expression1) AS table_alias_name1(column_alias_name1)
CROSS JOIN UNNEST(source_expression2) AS table_alias_name2(column_alias_name2), ...
```

Do not use:

```sql
SELECT column_alias_name FROM datasource,
UNNEST(source_expression1) AS table_alias_name1(column_alias_name1),
UNNEST(source_expression2) AS table_alias_name2(column_alias_name2), ...
```
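As a concrete sketch of the recommended form, using a hypothetical datasource `example_events` with hypothetical multi-value columns `tags` and `labels`:

```sql
SELECT tag, label
FROM "example_events"
CROSS JOIN UNNEST(MV_TO_ARRAY("tags")) AS t1(tag)
CROSS JOIN UNNEST(MV_TO_ARRAY("labels")) AS t2(label)
```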
SQL compatibility
Starting with 2023.09 STS, the default way Druid treats nulls and booleans has changed.
For nulls, Druid now differentiates between an empty string (`''`) and a record with no data, as well as between an empty numerical record and `0`.
You can revert to the previous behavior by setting `druid.generic.useDefaultValueForNull` to `true`.
For booleans, Druid now strictly uses `1` (true) or `0` (false). Previously, true and false could be represented either as `true` and `false`, respectively, or as `1` and `0`. In addition, Druid now returns a null value for boolean comparisons like `True && NULL`.
You can revert to the previous behavior by setting `druid.expressions.useStrictBooleans` to `false`.
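The following sketch illustrates the null handling change with a hypothetical datasource `example_events` and string column `channel`; in 2023.09 STS and later the two counts are distinct, while earlier versions could conflate them:

```sql
SELECT
  COUNT(*) FILTER (WHERE "channel" = '')    AS empty_string_rows,
  COUNT(*) FILTER (WHERE "channel" IS NULL) AS null_rows
FROM "example_events"
```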
The following table illustrates some example scenarios and the impact of the changes:
| Query | 2023.08 STS and earlier | 2023.09 STS and later |
|---|---|---|
| Query empty string | Empty string (`''`) or null | Empty string (`''`) |
| Query null string | Null or empty | Null |
| COUNT(*) | All rows, including nulls | All rows, including nulls |
| COUNT(column) | All rows excluding empty strings | All rows including empty strings but excluding nulls |
| Expression `100 && 11` | 11 | 1 |
| Expression `100 \|\| 11` | 100 | 1 |
| Null FLOAT/DOUBLE column | 0.0 | Null |
| Null LONG column | 0 | Null |
| Null `__time` column | 0, meaning 1970-01-01 00:00:00 UTC | 1970-01-01 00:00:00 UTC |
| Null MVD column | `''` | Null |
| ARRAY | Null | Null |
| COMPLEX | none | Null |
Update your queries
Before you upgrade from a version prior to 2023.09 to 2023.09 or later, update your queries to account for the changed behavior:
NULL filters
If your queries use NULL in the filter condition to match both nulls and empty strings, you should add an explicit filter clause for empty strings. For example, update `s IS NULL` to `s IS NULL OR s = ''`.
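For example, using a hypothetical datasource `example_events`:

```sql
-- Before: under the old default this matched both nulls and empty strings
SELECT COUNT(*) FROM "example_events" WHERE s IS NULL

-- After: match both explicitly under the new default
SELECT COUNT(*) FROM "example_events" WHERE s IS NULL OR s = ''
```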
COUNT functions
COUNT(column) now counts empty strings. If you want to continue excluding empty strings from the count, replace `COUNT(column)` with `COUNT(column) FILTER(WHERE column <> '')`.
GroupBy queries
GroupBy queries on columns containing null values can now return additional entries, because nulls can coexist with empty strings.
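For example, the following hypothetical query can now return one group where `channel` is null and a separate group where `channel` is the empty string, where older versions may have returned a single combined group:

```sql
SELECT "channel", COUNT(*) AS "count"
FROM "example_events"
GROUP BY 1
```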
Nested column format
Starting with 2023.09 STS, the default format for the `json` type for nested columns has changed to be equivalent to the `auto` type.
When upgrading from a previous version, you can continue to write nested columns in a backward compatible format (version 4).
In a classic batch ingestion job, include `formatVersion` in the `dimensions` list of the `dimensionsSpec` property. For example:
"dimensionsSpec": {
"dimensions": [
"product",
"department",
{
"type": "json",
"name": "shipTo",
"formatVersion": 4
}
]
},
To set the default nested column version, set the desired format version in the common runtime properties. For example:
`druid.indexing.formats.nestedColumnFormatVersion=4`
Stop supervisors that ingest from multiple Kafka topics before downgrading
If you have added supervisors that ingest from multiple Kafka topics in 2023.09 or later, stop those supervisors before downgrading to a version prior to 2023.09 because the supervisors will fail in versions prior to 2023.09.
Remove load rules for query from deep storage before downgrading
If you have added load rules that enable query from deep storage in 2023.08 or later, disable those load rules before downgrading to a version prior to 2023.08. Otherwise the Historicals on older versions will fail to start because they cannot process the latest load rules.
Avatica JDBC driver upgrade
The Avatica JDBC driver is not packaged with Druid. Its upgrade is separate from any upgrades to Imply.
If you notice intermittent query failures after upgrading your Avatica JDBC driver to version 1.21.0 or later, you may need to set the `transparent_reconnection` property.
Information schema now uses numeric column types
This is a breaking change introduced in 2023.03.
The Druid system table (`INFORMATION_SCHEMA`) now uses SQL types instead of Druid types for columns. This change makes the `INFORMATION_SCHEMA` table behave more like standard SQL. You may need to update your queries to avoid unexpected results if you depend on either of the following:
- Numeric fields being treated as strings.
- Column numbering starting at 0. Column numbering is now 1-based.
(#13777) (id: 31065)
Task directories for Druid
If you use the `druid.indexer.task.baseTaskDirPaths` property, note that it no longer works in versions 2023.05 and later. Use `druid.worker.baseTaskDirs` instead.
`druid.worker.baseTaskDirs` applies to SQL-based ingestion
Starting in 2023.06, multi-stage query (MSQ) tasks for SQL-based ingestion now honor the size you set for task directories. This change allows the MSQ task engine to sort more data at the cost of performance. If a task requires more storage than the size you set, data spills over to S3, which can have performance impacts.
To mitigate the performance impact, you can either increase the number of tasks or increase the size you set for `druid.worker.baseTaskDirs`.
Removed property for setting max bytes for dimension lookup cache
Starting with 2023.08 STS, `druid.processing.columnCache.sizeBytes` has been removed because it provided limited utility after a number of internal changes. Leaving this config in place is harmless, but it does nothing.
Removed Coordinator dynamic configs
Starting with 2023.08 STS, the following Coordinator dynamic configs have been removed:
- `emitBalancingStats`: Stats for errors encountered while balancing will always be emitted. Other debugging stats will not be emitted but can be logged by setting the appropriate `debugDimensions`.
- `useBatchedSegmentSampler` and `percentOfSegmentsToConsiderPerMove`: Batched segment sampling is now the standard and will always be on.
Use the new smart segment loading mode instead.
Changed Coordinator config defaults
Starting with 2023.08 STS, the defaults for the following Coordinator dynamic configs have changed:
- `maxSegmentsInNodeLoadingQueue`: 500, previously 100
- `maxSegmentsToMove`: 100, previously 5
- `replicationThrottleLimit`: 500, previously 10
These new defaults can improve performance for most use cases.
Worker input bytes for SQL-based ingestion
Starting with 2023.08 STS, the maximum input bytes for each worker for SQL-based ingestion is now 512 MiB (previously 10 GiB).
Parameter execution changes for Kafka
When using the built-in `FileConfigProvider` for Kafka, interpolations are now intercepted by the `JsonConfigurator` instead of being passed down to the Kafka provider. This breaks existing deployments.
For more information, see KIP-297.
#13023
Deprecation notices
Some segment loading configs deprecated
Starting with 2023.08 STS, the following segment related configs are now deprecated and will be removed in future releases:
- `maxSegmentsInNodeLoadingQueue`
- `maxSegmentsToMove`
- `replicationThrottleLimit`
- `useRoundRobinSegmentAssignment`
- `replicantLifetime`
- `maxNonPrimaryReplicantsToLoad`
- `decommissioningMaxPercentOfMaxSegmentsToMove`
Use `smartSegmentLoading` mode instead, which calculates values for these variables automatically.
SysMonitor support deprecated
Starting with 2023.08 STS, switch to `OshiSysMonitor`, as `SysMonitor` is now deprecated and will be removed in future releases.
CrossTab view is deprecated
The CrossTab view feature is deprecated. It is replaced by Pivot 2.0, which incorporates the capabilities of CrossTab view.
End of support notices
Firehose ingestion
Support for firehose ingestion was removed in Imply 2022.10 and the corresponding LTS release. Firehose had been deprecated since Druid version 0.17. You must transition your ingestion tasks to use `inputSource` and `ioConfig` before upgrading to 2022.10 or later.
Hadoop 2
In 2023.09 STS and later, Imply no longer supports using Hadoop 2 with your Druid cluster. Migrate to SQL-based ingestion or native ingestion if you are using Hadoop 2.x for ingestion today. If migrating to Druid ingestion is not possible, plan to upgrade your Hadoop infrastructure.
GroupBy v1
In 2023.09 STS and later, the v1 legacy GroupBy engine has been removed. Use v2 instead, which has been the default GroupBy engine.
cachingCost segment balancing strategy removed
In 2023.09 STS and later, the `cachingCost` strategy has been removed. Use an alternate segment balancing strategy instead, such as `cost`.