Imply Enterprise and Hybrid release notes

Read all release notes carefully, especially the Upgrade and downgrade notes, before upgrading. The following release notes provide information on features, improvements, and bug fixes up to Imply STS release 2023.09.

For information on the LTS release, see the LTS release notes.

If you are upgrading by more than one version, read the intermediate release notes too.

The following end-of-support dates apply in 2023:

  • On January 26, 2023, Imply 2021.01 LTS reached EOL. This means that the 2021.01 LTS release line will no longer receive any patches, including security updates. Imply recommends that you upgrade to the latest LTS or STS release.
  • On January 31, 2023, Imply 2022.01 LTS ended general support status and is eligible only for security support.

For more information, see Lifecycle Policy.

See Previous versions for information on older releases.

Imply evaluation

New to Imply? Get started with an Imply Hybrid (formerly Imply Cloud) Free Trial or start a self-hosted trial at Get started with Imply!

With Imply Hybrid, the Imply team manages your clusters in AWS, while you control the infrastructure and own the data. With self-hosted Imply, you can run Imply on *NIX systems in your own environment or cloud provider.

Imply Enterprise

If you run Imply Enterprise, see Imply product releases & downloads to access the Imply Enterprise distribution. When prompted, log on to Zendesk with your Imply customer credentials.

Changes in 2023.09

Druid highlights

Ingest from multiple Kafka topics to a single datasource

You can now ingest streaming data from multiple Kafka topics to a datasource using a single supervisor. You can configure the topics for the supervisor spec using a regex pattern as the value for topic in the IO config. If you add new topics to Kafka that match the regex, Druid automatically starts ingesting from them.

If you enable multi-topic ingestion for a datasource, downgrading will cause the supervisor to fail. For more information, see Stop supervisors that ingest from multiple Kafka topics before downgrading.
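
For illustration, the IO config portion of such a supervisor spec might look like the following sketch (topic names and servers are hypothetical; upstream Druid documents the regex property as topicPattern, so check the documentation for your exact version):

"ioConfig": {
  "type": "kafka",
  "consumerProperties": { "bootstrap.servers": "kafka01:9092" },
  "topicPattern": "metrics-.*"
}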

Hadoop 2 removed

Imply no longer supports using Hadoop 2 with your Druid cluster. Migrate to SQL-based ingestion or native ingestion if you are using Hadoop 2.x for ingestion today. If migrating to Druid's built-in ingestion is not possible, you must upgrade your Hadoop infrastructure to 3.x+ before upgrading to 2023.09.

Legacy GroupBy v1 removed

GroupBy v1 is a legacy engine and has not been supported since 2021. It has been removed in this release. Use GroupBy v2 instead, which has been the default GroupBy engine for several releases. There should be no impact on your queries.

cachingCost strategy removed

The cachingCost strategy for segment loading has been removed. Use cost instead, which has the same benefits as cachingCost.

If you have cachingCost set, the system ignores this setting and automatically uses cost.
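
If you previously set the strategy explicitly, the corresponding Coordinator dynamic configuration entry looks like this minimal sketch:

{
  "balancerStrategy": "cost"
}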

SQL changes

Strict booleans

By default, Druid now handles booleans strictly, using 1 (true) or 0 (false). Previously, booleans could be represented either as true and false or as 1 and 0.

This change may impact your query results. For more information, see SQL compatibility in the upgrade notes.
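
For example, a boolean expression now returns 1 or 0 in query results. A minimal sketch against a hypothetical datasource and column:

SELECT "page", ("added" > 100) AS is_big_edit
FROM "wikipedia"
LIMIT 5
-- is_big_edit contains 1 or 0 rather than true or false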

Null handling changes

By default, Druid now differentiates between empty records, such as '', and null records. Previously, Druid could treat empty records as either empty or null.

This change may impact your query results. For more information, see SQL compatibility in the upgrade notes.
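
For example, the following sketch (hypothetical datasource and column) counts empty strings and nulls separately under the new behavior:

SELECT
  COUNT(*) FILTER (WHERE "comment" = '') AS empty_rows,
  COUNT(*) FILTER (WHERE "comment" IS NULL) AS null_rows
FROM "wikipedia"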

SQL planning and optimization

Druid uses Apache Calcite for SQL planning and optimization. The Calcite version has been upgraded from 1.21 to 1.35.

As part of this upgrade, the recommended syntax for UNNEST has changed. We recommend using CROSS JOIN instead of commas for most queries to prevent issues with precedence:

SELECT column_alias_name1
FROM datasource
CROSS JOIN UNNEST(source_expression1) AS table_alias_name1(column_alias_name1)
CROSS JOIN UNNEST(source_expression2) AS table_alias_name2(column_alias_name2), ...

For more information, see UNNEST syntax in the upgrade notes.

JSON and auto column indexer

For nested columns, the default format for the json type has changed to be equivalent to the auto type. This format improves support for nested arrays of strings, longs, and doubles. It also optimizes storage when no nested data is actually processed.

The new format was introduced with type-aware schema discovery ("auto" type column schema and indexer) in 2023.04. It is not compatible with the following versions:

  • 2022.12 STS
  • 2023.01 STS
  • 2023.02 STS
  • 2023.03 STS
  • 2023.01 LTS

If you upgrade from one of these versions, you can continue to write nested columns in a backwards compatible format (version 4). For more information, see Nested column format in the upgrade notes.

Broker parallel merge config options

The paths druid.processing.merge.pool.* and druid.processing.merge.task.* have been flattened to use druid.processing.merge.* instead. The previous paths are now deprecated but continue to work in 2023.09. Migrate your settings to the new paths because the old paths will be ignored in a future release.
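
For example, migrating a single setting might look like the following sketch (the parallelism property and value are illustrative; see the configuration reference for the full list of renamed settings):

# Deprecated path
druid.processing.merge.pool.parallelism=8
# New flattened path
druid.processing.merge.parallelism=8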

Guava upgrade

The version of Guava that Druid uses has been upgraded from 16.0.1 to 31.1-jre to address bug fixes and security issues. If you use an extension that has a transitive Guava dependency from Druid, it may be impacted.

The extensions that Imply packages and enables with our releases have accounted for this change.

If you have an extension not provided by Imply that uses Guava, you must upgrade the Guava version within the extension and rebuild it. Rolling upgrades are only possible if you unload your extension before upgrading the cluster and then reload it with the newer version of Guava after the upgrade.

Other changes

Pivot changes

  • Improved alert error handling so that Pivot checks the alert's owner for the SeeErrorMessages permission before sending a webhook request (id: 37184)
  • Improved the alignment of values in the table visualization (id: 36903)
  • Improved relative time filters: they are now inclusive of the lower bound and exclusive of the upper bound (id: 37119)
  • Fixed the absence of a limit on stack area queries in Pivot 2 (id: 36530)
  • Fixed a problem with updating data cube refresh rate (id: 36071)
  • Fixed "cannot read properties of undefined" error when an alert query results in an empty data set (id: 35914)

Druid changes

  • Added topic name as a column in the Kafka input format (#14857) (id: 36885)
  • Added support to ingest from multiple Kafka topics into a single datasource (id: 600)
  • Added Kafka topic column controls (#14865) (id: 36884)
  • Added sampling factor for DeterminePartitionsJob (#13840) (id: 31009)
  • Added test and metrics for KillStalePendingSegments duty (#14951) (id: 37357)
  • Added a configurable buffer period between when a segment is marked unused and deleted by KillUnusedSegments duty (#12599) (id: 36770)
  • Added support for broadcast segments (#14789) (id: 36614)
  • Added lifecycle hooks to KubernetesTaskRunner (#14790) (id: 36508)
  • Added index filtering on Coordinator service to reduce the log size in Datadog (id: 36329)
  • Added grace period for auto-kill based on when a segment is marked unused (id: 35404)
  • Added new method for SqlStatementResource and SqlTaskResource to set request attribute (#14878) (id: 37096)
  • Added brush to timechart in web console (#14929) (id: 37225)
  • Added format notice for CSV and TSV in web console (#14783) (id: 36790)
  • Added dynamic query parameters UI to the web console (#14921) (id: 37226)
  • Added format selection for download in web console (#14794) (id: 36615)
  • Re-added the v4 writers and added a system config in JSON and auto indexers (id: 37334)
  • Consolidated JSON and auto indexers and removed the v4 nested column serializer (#14456) (id: 36967)
  • Deprecated config-magic in favor of JSON configuration (#14695) (id: 36781)
  • Disabled cachingCost balancer strategy (#14798) (id: 36788)
  • Enabled SQL-compatible null handling mode by default (#14792) (id: 36913)
  • Enabled Kafka multi-topic ingestion from the data loader in the web console (#14833) (id: 36779)
  • Exposed new Coordinator properties in the dialog in the web console (#14791) (id: 36772)
  • Fixed bug in KillStalePendingSegments (#14961) (id: 37445)
  • Fixed StringLastAggregatorFactory equals and toString (#14907) (id: 37371)
  • Fixed bug in computed value of balancerComputeThreads (#14947) (id: 37308)
  • Fixed a mapping issue with the "others" field in line charts in the web console (#14931) (id: 37224)
  • Fixed an error caused by datatype mismatch in numeric latest aggregations (id: 37127)
  • Fixed latest aggregation for null in time selector (id: 37085)
  • Fixed aggregation filter expression processing without projection (#14893) (id: 36956)
  • Fixed error messages relating to OVERWRITE keyword (#14870) (id: 36932)
  • Fixed MSQ select query failing with RuntimeException if REPLACE run after INSERT (id: 36226)
  • Fixed an issue with scaling code repeatedly resubmitting the same supervisor spec for idle supervisors (id: 36019)
  • Fixed results when useGroupingSetForExactDistinct is set to true (id: 33088)
  • Fixed SimpleChannelUpstreamHandler exception (id: 33059)
  • Fixed UNNEST query with 'not between' filter returning wrong result (id: 32054)
  • Fixed "is null" failing to find unnested null values coming from null rows (id: 32042)
  • Fixed UNNEST query with where <unnested column> not in (value list) returning empty result set (id: 31861)
  • Fixed pushing a not filter into the base during UNNEST causing incorrect results (id: 36476)
  • Fixed latest Vectorization throwing exception with expression in time (id: 36269)
  • Fixed a bug in QosFilter (#14859) (id: 36870)
  • Fixed several issues and SQL query reformatting in web console (#14906) (id: 37090)
  • Fixed a bug in result count in the web console (#14786) (id: 36509)
  • Improved exception message when DruidLeaderClient doesn't find leader node (#14775) (id: 36618)
  • Improved speed of SQLMetadataStorageActionHandlerTest (#14856) (id: 36776)
  • Improved streaming ingestion completion timeout error message (#14636) (id: 36925)
  • Improved incremental compilation (#14860) (id: 37041)
  • Improved clarity of retention dialog in the web console (#14793) (id: 36616)
  • Increased the computed value of replicationThrottleLimit (#14913) (id: 37109)
  • Improved helper queries by allowing for running inline helper queries in the web console (#14801) (id: 36778)
  • Moved some lifecycle management from doTask to shutdown for the MiddleManager-less task runner (#14895) (id: 37081)
  • Moved UpdateCoordinatorStateAndPrepareCluster duty out of the Coordinator (#14845) (id: 36938)
  • Reduced Coordinator logs in normal operation (#14926) (id: 37173)
  • Removed DruidAggregateCaseToFilterRule (#14940) (id: 37271)
  • Removed deprecated Coordinator dynamic configurations (#14923) (id: 37220)
  • Removed config druid.coordinator.compaction.skipLockedIntervals (#14807) (id: 36801)
  • Removed groupby v1 (#14866) (id: 37028)
  • Removed segmentsToBeDropped from SegmentTransactionInsertAction (#14883) (id: 36886)
  • Removed support for Hadoop 2 (#14763) (id: 36510)
  • Replaced BaseLongVectorValueSelector with VectorValueSelector for StringFirstAggregatorFactory.factorizeVector (#14957) (id: 37384)
  • Added a reset offsets supervisor API (#14772) (id: 36773)
  • Added a reset to specific offsets dialog to the web console (#14863) (id: 36771)
  • Updated druid.expressions.useStrictBooleans default to true (#14734) (id: 36916)
  • Updated Coordinator dynamic maxSegmentsToMove based on cluster skew under smartSegmentLoading (id: 35427)
  • Updated task view to show execution dialog (#14930) (id: 37223)
  • Updated ServiceMetricEventBuilder (#14933) (id: 37221)
  • Updated filters.md (#14917) (id: 37135)
  • Updated EARLIEST, EARLIEST_BY, LATEST, and LATEST_BY for STRING columns to make maxStringBytes optional (#14848) (id: 36986)
  • Updated InvalidNullByteException to include the output column name (#14780) (id: 37035)
  • Updated Coordinator to use separate executor for each Coordinator duty group (#14869) (id: 36874)
  • Updated balancerComputeThreads to use number of cores (#14902) (id: 37067)
  • Upgraded Druid's Calcite dependency to the latest stable version (id: 26962)
  • Upgraded com.ibm.icu:icu4j from 55.1 to 73.2 (#14853) (id: 36809)
  • Upgraded org.apache.rat:apache-rat-plugin from 0.12 to 0.15 (#14817) (id: 36800)
  • Upgraded org.apache.maven.plugins:maven-surefire-plugin (#14813) (id: 36799)
  • Upgraded com.github.oshi:oshi-core from 6.4.2 to 6.4.4 (#14814) (id: 36798)
  • Upgraded org.scala-lang:scala-library from 2.13.9 to 2.13.11 (#14826) (id: 36797)
  • Upgraded org.apache.maven.plugins:maven-source-plugin from 2.2.1 to 3.3.0 (#14812) (id: 36796)
  • Upgraded org.assertj:assertj-core from 3.19.0 to 3.24.2 (#14815) (id: 36795)
  • Upgraded dropwizard.metrics.version from 4.0.0 to 4.2.19 (#14824) (id: 36793)
  • Upgraded protobuf.version from 3.21.7 to 3.24.0 (#14823) (id: 36792)
  • Upgraded Guava version to 31.1-jre (#14767) (id: 36917)
  • Upgraded org.apache.commons:commons-compress from 1.21 to 1.23.0 (#14820) (id: 36789)
  • Upgraded apache.curator.version from 5.4.0 to 5.5.0 (#14843) (id: 36786)
  • Upgraded org.tukaani:xz from 1.8 to 1.9 (#14839) (id: 36785)
  • Upgraded commons-cli:commons-cli from 1.3.1 to 1.5.0 (#14837) (id: 36784)
  • Upgraded io.dropwizard.metrics:metrics-graphite from 3.1.2 to 4.2.19 (#14842) (id: 36783)
  • Upgraded org.apache.directory.api:api-util from 1.0.3 to 2.1.3 (#14852) (id: 36775)
  • Upgraded joda-time from 2.12.4 to 2.12.5 (#14855) (id: 36774)
  • Upgraded jackson-databind to 2.12.7 (#14770) (id: 36430)
  • Upgraded PostgreSQL from 42.4.1 to 42.6.0 (#13959) (id: 36613)
  • Updated post filters and filters for UNNEST to include only the subset not pushed to base (id: 37253)
  • Addressed security issues (id: 37039)

Platform changes

  • Added notification during Imply GKE installation if an operation will cause resources to be deleted (id: 16572)
  • Allowed specifying GKE CIDR ranges when using Terraform directly (id: 37209)
  • Supported custom versions in GKE Enhanced (id: 36507)

Clarity changes

  • Fixed error "cannot read properties of undefined" error when an alert query results in an empty data set (id: 35914)

AWS Cloud Manager changes

  • Added support for additional ARM instances to Imply Hybrid (id: 29972)

Changes in 2023.08

Druid highlights

Explore view in Druid console

The Explore view is a simple, stateless, SQL-backed data exploration view added to the web console. It lets users explore data in Druid with point-and-click interaction and visualizations instead of writing SQL and looking at a table. This can provide faster time to value for a user new to Druid, and lets a Druid veteran quickly chart data they care about.

The Explore view is accessible from the More (...) menu in the header.

Query from deep storage (alpha)

Druid now supports querying segments that are stored only in deep storage. When you query from deep storage, you can make more data available for querying without necessarily having to scale your Historical processes to accommodate it. To take advantage of the potential storage savings, make sure you configure your load rules so that not all segments are loaded onto Historical processes.

Note that at least one segment of a datasource must be loaded onto a Historical process so that the Broker can plan the query. It can be any segment though.
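
For example, a retention rule chain like the following sketch keeps one month of data on Historicals while leaving older segments queryable from deep storage only (periods and tier names are illustrative; an empty tieredReplicants map means no Historical copies are loaded):

[
  { "type": "loadByPeriod", "period": "P1M", "tieredReplicants": { "_default_tier": 2 } },
  { "type": "loadForever", "tieredReplicants": {} }
]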

Schema auto-discovery and array column types

Type-aware schema auto-discovery is now generally available. Druid can determine the schema for the data you ingest rather than you having to manually define the schema.

As part of the type-aware schema discovery improvements, array column types are now generally available. Druid can determine the column types for your schema and assign them to these array column types when you ingest data using type-aware schema auto-discovery with the auto column type.

For more information, see Type-aware schema discovery.

Smart segment loading

The Coordinator is now much more stable and user-friendly. In the new smartSegmentLoading mode, it dynamically computes values for several configs to maximize performance.

The Coordinator can now prioritize loading recent segments and completely unavailable segments over segments that already have some replicas loaded in the cluster. It can also re-evaluate decisions made in previous runs and cancel operations that are no longer needed. Moreover, move operations started by segment balancing do not compete with the load of unavailable segments, which reduces the reaction time to changes in the cluster and speeds up segment assignment decisions.

Additionally, leadership changes have less impact now, and the Coordinator doesn't get stuck even if re-election happens while a Coordinator run is in progress.

Lastly, the cost balancer strategy performs much better now and is capable of moving more segments in a single Coordinator run. These improvements were made by borrowing ideas from the cachingCost strategy. We recommend using cost instead since cachingCost is now deprecated.

New query filters

Druid now supports the following filters:

  • Equality: Use in place of the selector filter. It never matches null values.
  • Null: Matches null values. Use in place of the selector filter.
  • Range: Filters on ranges of dimension values. Use in place of the bound filter. It never matches null values.

Note that Druid's SQL planner uses these new filters in place of their older counterparts by default whenever druid.generic.useDefaultValueForNull=false or whenever sqlUseBoundAndSelectors is set to false in the SQL query context.

Unlike the previous selector and bound filters, which only match strings, you can use these filters to filter on equality and ranges over ARRAY columns.
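
As an illustration, native filter sketches for the three new types might look like the following (column names are hypothetical):

{ "type": "equality", "column": "country", "matchValueType": "STRING", "matchValue": "US" }
{ "type": "null", "column": "country" }
{ "type": "range", "column": "added", "matchValueType": "LONG", "lower": 0, "upper": 100 }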

For more information, see Query filters.

Guardrail for subquery results (alpha)

Users can now add a guardrail to prevent a subquery's results from exceeding a set number of bytes by setting druid.server.http.maxSubqueryBytes in the Broker config or maxSubqueryBytes in the query context. This guardrail is recommended over row-based limiting.

This feature is experimental for now and falls back to row-based limiting if it fails to get an accurate size of the results consumed by the query.
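
For example, as a per-query context sketch (the byte limit is illustrative):

{
  "query": "SELECT ...",
  "context": { "maxSubqueryBytes": 100000000 }
}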

Added a new OSHI system monitor

Added a new OSHI system monitor (OshiSysMonitor) to replace SysMonitor. The new monitor has wider support for different machine architectures, including ARM instances. Switch to the new monitor: SysMonitor is now deprecated and will be removed in future releases.
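
To switch, replace SysMonitor with the new monitor in your monitor list, along these lines (a sketch; the fully qualified class name is an assumption based on the Druid codebase):

druid.monitoring.monitors=["org.apache.druid.java.util.metrics.OshiSysMonitor"]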

Java 17 support

Druid now fully supports Java 17. Note that this support is specifically for Druid, not Imply's other offerings.

Pivot highlights

Improvements to alerts:

  • Updated the alert payload to include the alert query.
  • Added data cube properties Minimum alert frequency and Minimum alert timeframe: these allow a user to prohibit alerts with a specified frequency on a data cube. See Managing data cubes for more information.

Other changes

Pivot changes

  • Updated alerts to include the query in the alert payload (id: 31383)
  • Updated alerts to allow users to prohibit alerts of a particular frequency (id: 34616)
  • Updated alerts and reports to only display UI error notifications and send error notification emails to users with the SeeErrorMessages permission (id: 35160)
  • Updated the permissions associated with importing settings, data cubes, and dashboards (id: 34962)
  • Fixed inability to preview a dimension change without the CreateDataCube permission (id: 35512)
  • Fixed filtering on a non-bucketed numeric dimension causing a partial query error (id: 35329)
  • Fixed a problem that prevented alerts created on a Pivot Classic data cube from working in Pivot 2 (id: 35255)
  • Fixed incorrect error code being generated when a data cube encounters a SQL parse exception; the correct 400 error now displays to the user (id: 35188)
  • Fixed a problem with downloading null values in data cubes (id: 34859)

Druid changes

  • Added new filters that replace existing filters (#14542) (#14612) (id: 35060):
    • Use the new equality and null filters instead of selector filters
    • Use the new range filter instead of the bound filter
  • Added IP_COMPARE function (id: 35700)
  • Added ZooKeeper connection state alerts and metrics (#14333) (id: 35454)
  • Added support for smartSegmentLoading (#14610) (id: 35696)
  • Added Explore view (#14602) (id: 35877)
  • Added frames support for string arrays that are null (#14653) (id: 36042)
  • Added durable storage selector to the Druid console (#14669) (id: 36057)
  • Added support for query from deep storage to the web console (id: 34342)
  • Added metric (compact/segmentAnalyzer/fetchAndProcessMillis) to report time spent fetching and analyzing segments (#14752) (id: 36335) (id: 36312)
  • Added new filters to unnest filter pushdown (#14777) (id: 36345)
  • Added new dimensions for service/heartbeat (#14743) (id: 36284)
  • Added a shortcut menu for COMPLEX<?> types (#14668) (id: 36204)
  • Added ability to download pages of results from the new async APIs to the web console (#14712) (id: 36201)
  • Added service/heartbeat metric into statsd-reporter (#14564) (id: 35414)
  • Added support for earliest aggregatorMergeStrategy (#14598) (id: 35669)
  • Added log statements for tmpStorageBytes in MSQ (#14449) (id: 35372)
  • Added task toolbox to DruidInputSource (#14507) (id: 35304)
  • Changed kill tasks to use bulk file delete API from S3 (id: 32746)
  • Changed the default format from OBJECT to OBJECTLINES (#14700) (id: 36085)
  • Changed default handoffConditionTimeout to 15 minutes (#14539) (id: 35476)
  • Enabled ServiceStatusMonitor in the example configurations (#14744) (id: 36228)
  • Enabled result level cache for GroupByStrategyV2 on Broker (#11595) (id: 35428)
  • Enabled leader dimension in service/heartbeat metric into statsd-reporter (#14593) (id: 35537)
  • Fixed a bug where json_value() filter fails to find a scalar value match on a column with mixed scalar values and arrays (id: 35922)
  • Fixed a bug where the Druid auto column indexer failed for certain mixed type arrays (#14710) (id: 36097)
  • Fixed the response when a task ID is not found in the Overlord process (#14706) (id: 36086)
  • Fixed table filters not working in the web console when grouping is enabled (#14668) (id: 36204)
  • Fixed an issue with Hadoop ingestion by adding the PropertyNamingStrategies from a compatible jackson-databind version (#14671) (id: 36205)
  • Fixed a bug for SegmentLoadDropHandler (#14670) (id: 35989)
  • Fixed a bug in getIndexInfo for MySQL (#14750) (id: 36274)
  • Fixed an issue where a JSON error caused the Next button to be greyed out while typing JSON in the web console (#14712) (id: 36201)
  • Fixed a bug that occurred when the return type is STRING but comes from a top-level array typed column instead of a nested array column (#14729) (id: 36133)
  • Fixed a bug introduced by #11201 (#14544) (id: 35363)
  • Fixed a resource leak with Window processing (#14573) (id: 35456)
  • Fixed two bugs in the web console: service view filtering not working and the data loader for SQL-based ingestion did not pick the best time column available (#14597) (id: 35586)
  • Fixed a bug in the Coordinator to ensure that replication factor is reported correctly for async segments (#14701) (id: 36060)
  • Fixed maxCompletedTasks parameter in OverlordClientImpl (#14667) (id: 35984)
  • Fixed an NPE that occurs during ingestion due to datasketches 4.0.0 (#14568) (id: 35378)
  • Fixed a bug in the web console where the cursor jumped to the end or a field failed to display characters (#14632) (id: 35787)
  • Fixed a bug where Select ... where json_keys() is null returned a wrong result (id: 22179)
  • Fixed issues with equality and range filters matching double values to long typed inputs (#14654) (id: 36027)
  • Fixed an NPE that the StringLast aggregation throws when vectorization is enabled (id: 35868)
  • Fixed time unit for handoff in CoordinatorBasedSegmentHandoffNotifier (#14640) (id: 35815)
  • Fixed boolean segment filters (#14622) (id: 35701)
  • Fixed a null pointer for rows and column stats information (#14617) (id: 35668)
  • Improved description field when emitting metric for broadcast failure (#14703) (id: 36101)
  • Improved performance of topN queries by minimizing PostAggregator computations (#14708) (id: 36253)
  • Improved alert message for segment assignments (#14696) (id: 36203)
  • Improved MSQ to handle a race condition that occurs when postCounters is in flight and the Controller goes offline (#14707) (id: 36200)
  • Improved heap footprint of ingesting auto typed columns by pushing compression and index generation into writeTo (#14615) (id: 35703)
  • Improved performance when extracting files from input sources (#14677) (id: 36047)
  • Improved the core API required for Iceberg extension (#14614) (id: 35876)
  • Improved heap footprint of GenericIndexed (#14563) (id: 35443)
  • Improved SQL statement API error messages (#14629) (id: 35733)
  • Increased heap size for router (#14699) (id: 36098)
  • Improved SQL planning logic to simplify bounds/range versus selectors/equality (#14619) (id: 35704)
  • Improved the schema discovery description in the web console (#14601) (id: 35589)
  • Improved Kubernetes performance by storing get task location on the lifecycle object (#14649) (id: 35874)
  • Improved query behavior to reserve threads for non-query requests without using laning (#14576) (id: 35706)
  • Improved the performance impact of segment deletion on a cluster (#14642) (id: 35819)
  • Improved segment deletion performance by using batching (#14639) (id: 35808)
  • Improvements to the web console (#14540) (id: 35536):
    • Pages information for the SQL statements API
    • Empty tiered replicants
    • Interactive APIs for the MSQ task engine
    • Replication factor column for the metadata table
    • Data format and compression in MSQ task assignment now accounted for
    • Improved errors
    • UI for dynamic compaction
    • Better auto-refresh behavior
    • Fixed a bug with the data loader
    • Fixed a bug with counter misalignment in MSQ input counters
  • Improved the error code of InsertTimeOutOfBoundsFault to be consistent with others (#14495) (id: 35472)
  • Improved worker generation (#14546) (id: 35468)
  • Removed chatAsync parameter, so chat is always async (#14692) (id: 36099)
  • Removed the deprecated InsertCannotOrderByDescending MSQ fault (#14588) (id: 35539)
  • Updated tough-cookie from 4.0.0 to 4.1.3 in the web console (#14557) (id: 35401)
  • Updated core Apache Kafka dependencies to 3.5.1 (#14721) (id: 36104)
  • Updated the org.mozilla:rhino dependency (#14765) (id: 36339)
  • Updated decode-uri-component from 0.2.0 to 0.2.2 in the web console (#13481) (id: 35963)
  • Updated org.xerial.snappy:snappy-java from 1.1.10.1 to 1.1.10.3 (#14641) (id: 35931)
  • Updated version in Iceberg POM (#14605) (id: 35672)

Platform changes

  • Added ingressClassName support to ingresses in Helm (id: 35954)

Clarity changes

  • Improved SSO Clarity landing page when trying to use an invalid account (id: 35199)

Changes in 2023.07

Druid highlights

Time series functions (alpha)

Added support for time series functions. You can use time series functions to analyze time series data, identify trends and seasonality, interpolate values, and load extra time periods to fill in boundary values. Time series functions are disabled by default. Enable this feature by loading the imply-timeseries extension. See Time series functions for more information.
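
Loading the extension is a one-line addition to the common runtime properties, as in this sketch (keep any existing entries in your load list):

druid.extensions.loadList=["imply-timeseries"]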

Documentation changes

The Druid API reference docs are now collected under an API reference section and organized by function.

Pivot highlights

Time series visualization (alpha)

You can now use time series functions to generate a line or bar chart showing the rate of change in your data. You must load the imply-timeseries extension and enable the SDK based visualizations feature flag before you can use this feature. See Time series visualization for more information.

Pivot improvements:

  • You can now apply date formats to time dimensions. See Time dimensions for more information.
  • You can now control the precision of TopN and COUNT DISTINCT queries by setting the new Query precision property in a data cube. See Managing data cubes for more information.
  • You can now override the default 40-second query timeout by setting the new Query timeout override property in a data cube. See Managing data cubes for more information.

Other changes in 2023.07

Pivot changes

  • Added time series visualization (id: 33035)
  • Added the ability to apply date formats to time dimensions (id: 28776)
  • Added query precision property to data cubes (id: 33500)
  • Added an optional override for the default 40-second data cube query timeout (id: 33229)
  • Added loading symbol to data cube display when measures and dimensions are loading, and improved speed of attribute display (id: 32861)
  • Modified the alert occurrence list UI to emphasize the time frame of the occurrence over the time that the alert was triggered (id: 31799)
  • Fixed Pivot attempting to validate custom time dimension bucketing as a number when dimension is not named __time (id: 35078)
  • Fixed alert preview incorrectly showing that an alert always triggers regardless of data and conditions (id: 35007)
  • Fixed street map dashboard tile resetting latitude and longitude granularity when panning and zooming (id: 34932)
  • Fixed dynamic time filter clause evaluating incorrectly when maxTime is set to midnight (id: 34453)
  • Fixed Geo Marks visualization appearing in dashboard tile when Geo Shade was selected (id: 29639)
  • Fixed empty download file for data cubes with PiiMask (id: 34326)

Druid changes

  • Added ability for SQL-based ingestion to write select results to durable storage (#14527) (id: 35343)
  • Added full support for Java 17 (#14384) (id: 35341)
  • Added stringEncoding parameter to DataSketches HLL (#11201) (id: 35177)
  • Added the file mapper to handle v2 buffer deserialization (#14429) (id: 34677)
  • Added ability to enable cold-tier per datasources based on time interval (id: 31185)
  • Removed unused coordinator dynamic configurations (#14524) (id: 35279)
  • Removed druid.processing.columnCache.sizeBytes and CachingIndexed, and combined string column implementations (#14500) (id: 35191)
  • Fixed bug that occurred during HttpServerInventoryView initialization (#14517) (id: 35278)
  • Fixed incorrect error code on SQL-based ingestion query (id: 35267)
  • Fixed NPE in datasketch (id: 35223)
  • Fixed SortMergeJoinFrameProcessor buffering bugs (#14196) (id: 35189)
  • Fixed compatibility issue with SqlTaskResource (#14466) (id: 34993)
  • Fixed JSON_VALUE expression returning null instead of an array (id: 34833)
  • Fixed null handling in DruidCoordinator.getReplicationFactor (#14447) (id: 34754)
  • Fixed ingestion failing with mixed empty array and object in an array (id: 34681)
  • Fixed double synchronize on simple map operations (#14435) (id: 34732)
  • Fixed query planning failure if a CLUSTERED BY column contains descending order (#14436) (id: 34729)
  • Fixed broker parallel merge to help managed blocking performance (#14427) (id: 34676)
  • Fixed Kafka input format reader schema discovery and partial schema discovery (#14421) (id: 34675)
  • Fixed queries not responding and "Sequence iterator timed out waiting for data" error in the logs (id: 34647)
  • Fixed HttpServerInventoryView initialization delayed when server disappears (id: 33770)
  • Fixed sortMerge query returning error: "SQL requires a join with 'INPUT_REF' condition that is not supported." (id: 32366)
  • Fixed emitting negative lag metrics when there are Kafka connection issues (id: 32349)
  • Fixed incorrect filtering on a column from an external datasource if named __time. (#14336) (id: 35129)
  • Fixed a problem with S3-compatible implementations (#14290) (id: 35035)
  • Improved the ingestion view by splitting it into two views: Supervisors and Tasks (#14395) (id: 34680)
  • Improved task update handling in task queue by using separate executor (#14533) (id: 35344)
  • Improved IntervalIterator (#14530) (id: 35280)
  • Improved queries by setting explain attributes after the query is prepared (#14490) (id: 35318)
  • Improved coerce exceptions by logging the field name (#14483) (id: 35193)
  • Improved InsertTimeOutOfBounds error message in SQL-based ingestion (#14511) (id: 35186)
  • Improved segment loading (id: 23355)
  • Improved subquery guardrail so that it obeys memory limit (id: 13296)
  • Improved SQL OperatorConversions: introduced aggregatorBuilder to allow CAST as literal (#14249) (id: 35033)
  • Improved default clusterStatisticsMergeMode by making it sequential (#14310) (id: 35031)
  • Improved EXPLAIN PLAN attributes (#14441) (id: 35030)
  • Improved CostBalancerStrategy by deprecating cachingCost (#14484) (id: 35027)
  • Improved error messaging for coercion errors (id: 34717)
  • Improved visibility into SegmentMetadataCache (id: 33768)
  • Improved visibility into ChangeRequestHttpSyncer (id: 33767)
  • Improved the getTasks API by creating an additional index on the task table (id: 34802)
  • Improved logical planning and native query generation by decoupling them in SQL planning (#14232) (id: 34750)
  • Improved Kafka supervisors by making them quieter in all bundled log4j2.xml files (#14444) (id: 34738)
  • Improved handling of mixed type arrays by allowing expression best efforts determination (#14438) (id: 34698)
  • Improved variance SQL aggregate function by supporting complex variance object inputs (#14463) (id: 35103)
  • Upgraded Hadoop to version 3.3.6 (#14489) (id: 35064)
  • Upgraded Avro to the latest version (id: 34697)

Clarity changes

  • Added dimension 'description' to Raw Metrics data cube (id: 35241)
  • Fixed Clarity UI attempting to validate custom time dimension bucketing as a number when dimension not named __time (id: 35078)

Changes in 2023.06.2

Druid changes

  • Fixed an NPE with the DataSketches aggregator (id: 35283)

Pivot changes

  • Fixed alerts to properly evaluate All conditions (id: 35142)
  • Fixed an issue where an alert preview incorrectly showed that the alert would always trigger regardless of data and conditions (id: 35007)

Changes in 2023.06.1

Platform changes

  • Security updates (id: 34192)

Changes in 2023.06

Druid highlights

druid.worker.baseTaskDirs now applies to SQL-based ingestion

Starting in 2023.06, multi-stage query (MSQ) tasks for SQL-based ingestion now honor the size you set for task directories. This change allows the MSQ task engine to sort more data at the cost of performance. If a task requires more storage than the size you set, data spills over to S3, which can have performance impacts.

To mitigate the performance impact, you can either increase the number of tasks or increase the size you set for druid.worker.baseTaskDirs.
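
For example, spreading task storage across two mount points looks like this sketch (paths are illustrative; see the druid.worker.baseTaskDirs entry in the 2023.05 notes below):

druid.worker.baseTaskDirs=["/mnt/disk1/tasks", "/mnt/disk2/tasks"]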

Changed Coordinator config values

The following Coordinator dynamic configs have new default values:

  • maxSegmentsInNodeLoadingQueue: 500, previously 100
  • maxSegmentsToMove: 100, previously 5
  • replicationThrottleLimit: 500, previously 10

These new defaults can improve performance for most use cases.

(#14269) (id: 34131)

Other changes in 2023.06

Pivot changes

  • Added checkFrequency and timeFrame properties to the alert payload (id: 30543)
  • Moved the primary time dimension to the general tab in data cube properties (id: 32557)
  • Fixed table visualization with duplicate measures rendering values in incorrect sort order (id: 34546)
  • Fixed data cube not updating default filter when its dimensions change (id: 34395)
  • Fixed incorrect redirect after OIDC authentication (id: 34263)
  • Fixed "partial query error" appearing when refreshing or loading Pivot to display multiple measures at once (id: 33502)
  • Fixed download modal hanging indefinitely when network errors occur (id: 33501)
  • Fixed measure filters not working properly in visualizations when time is the only added dimension (id: 33477)
  • Fixed incorrect auto-filled dimension expression when creating a new data cube (id: 33288)
  • Fixed no data in dashboard tiles until a global filter is applied (id: 32742)
  • Fixed dashboard tiles stuck loading indefinitely while axis and facet queries constantly execute (id: 32741)
  • Fixed ability to add the same measure multiple times by expanding the show bar (id: 30397)
  • Fixed NULL value appearing twice in filter menu (id: 29738)
  • Fixed filtering stack area on two time ranges producing a crash (id: 26619)

Druid changes

  • Added more statement attributes to explain plan result (#14391) (id: 34502)
  • Adjusted broker parallel merge to help managed blocking be more well-behaved (#14427)
  • Reverted "Added method to authorize native query using authentication result" to prevent noisy native query logs (#14376) (id: 34495)
  • Added logs for deleting files using storage connector (#14350) (id: 34491)
  • Added NullHandling module initialization for LookupDimensionSpecTest (#14393) (id: 34444)
  • Added configurable ColumnTypeMergePolicy to SegmentMetadataCache (#14319) (id: 34080)
  • Added TYPE_NAME to the complex serde classes and replaced the hardcoded names (#14317) (id: 34012)
  • Added ability to load segments on Peons (#14239) (id: 33607)
  • Added OverlordDuty to replace OverlordHelper and align with CoordinatorDuty (#14235) (id: 33595)
  • Added array_to_mv function to convert arrays into multi-value dimensions (#14236) (id: 33514)
  • Added context flag useAutoColumnSchemas to use new auto types for MSQ segment generation (#14175) (id: 33499)
  • Fixed bug with sparse 'auto' column leading to ingestion failure (id: 34410)
  • Fixed an NPE that happens when blocked threads and workerPool fail to execute (#14426)
  • Fixed InsertCannotAllocateSegment not being reported when batch segment allocation is in use (id: 34086)
  • Fixed log streaming (#14285) (id: 34013)
  • Fixed excessive alerts for "Request did not have an authorization check performed" (id: 33755)
  • Fixed issue with launching Kubernetes jobs (#14282) (id: 33753)
  • Fixed SegmentAnalyzer to be more resilient and prefer to use 'error' flag and messages on ColumnAnalysis rather than exploding (#14296) (id: 33689)
  • Fixed issues with handling arrays with all null elements and arrays of booleans in strict mode (#14297) (id: 33684)
  • Fixed an issue with filtering on a single dimension by converting In (#14277) (id: 33633)
  • Fixed issue with MSQ rollup ingestion and aggregators with multiple names (#14367) (id: 34382)
  • Fixed EARLIEST_BY/LATEST_BY signature and included function name in signature (#14352) (id: 34380)
  • Fixed MSQ NPE exception in controller logs for QueryNotSupported error (id: 34197)
  • Fixed intermittent MSQ task failure due to inability to copy error (id: 33616)
  • Fixed segment metadata queries for auto ingested columns that had all null values (#14262) (id: 33561)
  • Fixed joins not optimizing filter correctly (id: 33430)
  • Fixed expr getCacheKey implementations not delegating (id: 32638)
  • Fixed balancing thread stuck waiting for futures to resolve (id: 30425)
  • Improved HLL sketch and Theta sketch estimates so that they can now be used as an expression (#14312) (id: 34391)
  • Improved HLLSketchPostAggregator so that it can now be used as an expression (id: 33653)
  • Improved Broker logs by changing what gets logged (#14368) (id: 34315)
  • Improved exception handling to include all types of exceptions when initializing input source in sampler API (#14355) (id: 34276)
  • Improved Druid datasource to datasource ingestion by filtering out tombstone segments (id: 34096)
  • Improved EXPLAIN PLAN to return RESOURCES as an ordered collection (#14323) (id: 34011)
  • Improved merged counters to only show when they are non-zero (#14311) (id: 33766)
  • Improved concurrency to not cancel already running workflows on a branch even when a new commit is pushed (#14279) (id: 33635)
  • Improved task queue in Kubernetes task runner when capacity is fully utilized (#14156) (id: 33581)
  • Improved error when user attempts to set null retention rules (id: 33057)
  • Improved web console time format detection to account for auto, allowing for leading and trailing spaces (#14224) (id: 33352)
  • Improved parent compaction task by limiting number of retries made while submitting a sub-task (id: 31208)
  • Removed context parameters from React components (#14366) (id: 34383)
  • Removed incorrect optimization (#14246) (id: 33506)
  • Removed AbstractIndex (#14388) (id: 34445)
  • Updated operations per run (#14325) (id: 34113)
  • Updated web console DQT to latest version and fixed bigint crash (#14318) (id: 34098)
  • Updated heap size of coordinator overlord services in Docker IT environment (#14214) (id: 33594)
  • Upgraded the React dependency to v18 (#14380) (id: 34472)

Changes in 2023.05

Druid highlights

Schema auto-discovery (beta)

You can now declare a partial schema for type-aware schema discovery. Previously, the schema had to be fully declared or omitted. (#14076) (id: 32605)

MSQ task engine

Added support for querying lookup and inline data directly for the MSQ task engine (#14048) (id: 32663)

Hadoop 3

Starting with 2023.05 STS, the Imply distribution is compatible with Hadoop 3 by default. If you need a Hadoop 2 compatible build, contact Imply Support.

Pivot highlights

Pivot alert improvements:

  • You can now configure alerts to send notification emails to users outside Pivot.
  • The Check every and Timeframe properties now appear in the alert payload.

Other Pivot improvements:

  • You can now use the Pivot server configuration property userNameAuthority to determine whether Pivot or OIDC populates name fields in Pivot. See Pivot server config for details.

Other changes in 2023.05

Pivot changes

  • Added ability to send alert emails to users outside Pivot (id: 31178)
  • Added Check every and Timeframe properties to alert payload (id: 30543)
  • Added Pivot server configuration property userNameAuthority that determines whether Pivot or OIDC populates name fields in Pivot (id: 31971)
  • Fixed Pivot auto-creating dimensions for ARRAY columns during data cube creation. ARRAY data is not supported in Pivot (id: 33265)
  • Fixed null value in time dimension producing an error and preventing visualization (id: 32068)
  • Fixed dimension preview using an alias instead of a formula for string filters (id: 33126)
  • Fixed date range missing on bubble modal on the horizontal bars view, when doing measure compare (id: 32850)
  • Fixed no data showing in dashboard tiles until a global filter is applied (id: 32742)
  • Fixed dashboard tiles stuck indefinitely loading while axis and facet queries constantly execute (id: 32741)
  • Fixed unsupported time range error after converting a dashboard (id: 32740)
  • Fixed non-additive measures not working in Treemap visualizations (id: 32737)
  • Fixed inability to add complex comparisons from show bar (id: 32711)
  • Fixed dropping a dimension in Pivot 2 not replacing all existing splits (id: 32697)
  • Fixed opening a data cube created in SQL mode from the home view failing when the Pivot 2 feature flag is turned off (id: 32666)
  • Fixed numeric filter with bucketing set to "never bucket" creating duplicate items (id: 32067) and crashing on searching (id: 32056)
  • Fixed data cube and dashboard deletion to check for payload conflicts before committing (id: 32052)
  • Fixed relative time filter not working properly in delta filter comparison (id: 29411)
  • Changed No such datasource status codes to 400 during data cube creation (id: 33105)
  • Changed data cube option boostPrefixRank to true for queries generated from string filters for multi-value dimensions. This boosts the rank of results whose prefix matches the search text (id: 32714)
  • Improved Pivot behavior to return the user to the dashboard after adding a data cube view to a new or existing dashboard (id: 32839)
  • Improved context of alert and report logs (id: 32047)
  • Improved the consistency of alert trigger time (id: 31798)

Druid changes

  • Added logging for merge and push timings for PartialGenericSegmentMergeTask (#14089) (id: 32727)
  • Added the ability for tasks to run using multiple different mount points for temporary storage. Set druid.worker.baseTaskDirs to an array of locations to enable. If you were using druid.indexer.task.baseTaskDirPaths, that setting no longer works. You must switch to druid.worker.baseTaskDirs. (#14063) (id: 32604)
  • Added a new column start_time to sys.servers that captures the time at which the server was added to the cluster. (#13358) (id: 32628)
  • Added support for querying lookup and inline data directly for MSQ (#14048) (id: 32663)
  • Added check for required avroBytesDecoder property that otherwise causes NPE (#14177) (id: 33230)
  • Added more logs for sequential merge (#14097) (id: 32692)
  • Added support for multiple result columns with the same name (#14025) (id: 32607)
  • Added support in the web console for changing the time column name when reading from external sources in MSQ (#14165) (id: 33091)
  • Changed the web console to set the count on the rule history API (#14164) (id: 33086)
  • Changed the timeout to get worker capacity in the Druid console to be higher (#14095) (id: 32670)
  • Changed useSchemaDiscovery to also include the behavior of includeAllDimensions to support partial schema declaration without having to set two flags (#14076) (id: 32605)
  • Changed to Hadoop 3 by default for Imply distribution (id: 31956)
  • Changed compaction tasks so that input specs with intervals not aligned with segmentGranularity aren't allowed (#14127) (id: 33018)
  • Changed tombstone behavior so that supervisor tombstones get created only when creating a new supervisor fails (id: 32946)
  • Fixed bugs with auto encoded long vector deserializers (#14186) (id: 33113)
  • Fixed issues with filtering nulls on values coerced to numeric types (id: 33324)
  • Fixed a regression where queries with json_value() encounter NPE (id: 33426)
  • Fixed miscellaneous bugs in the web console (#14216) (id: 33316)
  • Fixed an NPE in the test parse exception report and added more tests with different thresholds (#14209) (id: 33284)
  • Fixed a bug in CaseOperatorConversion when the THEN clause contains a binary operator expression (id: 33094)
  • Fixed Kafka Avro ingestion throwing an unhandled ParseException (id: 29247)
  • Fixed task query error decode (#14174) (id: 33090)
  • Fixed bug filtering nested columns with expression filters (#14096) (id: 32717)
  • Fixed NPE with gs URIs having underscores (id: 32749) (#14107)
  • Fixed input source security feature not working for MSQ tasks (#14056) (id: 32553)
  • Fixed failed queries not releasing lanes in certain scenarios (id: 33028)
  • Fixed bugs and added support for boolean inputs to classic long dimension indexer (#14069) (id: 32595)
  • Fixed input source security so the SQL layer can handle input sources with multiple types (#14050) (id: 32538)
  • Fixed a 75K-file scenario failing in stage 0 with an "Unable to execute HTTP request" error (id: 32537)
  • Fixed natural comparator selection for groupBy (SQL) (#14075) (id: 32668)
  • Fixed a bug where you couldn't mark segments as used when the whole datasource is unused (#14185) (id: 33111)
  • Fixed an issue where ephemeral storage from the Overlord for Peon tasks wasn't respected (#14201) (id: 33204)
  • Fixed a bug where if json was explicitly specified, auto was returned instead (#14144) (id: 32920)
  • Improved MSQ to preserve original ParseException when writing frames (#14122) (id: 32978)
  • Improved the tier selector in the web console (#14143) (id: 32915)
  • Improved error message when Druid services are not running (#14202) (id: 33262)
  • Improved the web console to use stringly-typed schemas in the data loader (#14189) (id: 33206)
  • Improved error message for CSV with no properties (#14093) (id: 32771)
  • Improved the Avro extension to allow more complex JSONPath expressions (#14149) (id: 32933)
  • Improved handling for zero-length intervals (#14136) (id: 33017)
  • Improved GCP initialization to be truly lazy (#14077) (id: 32606)
  • Improved native JSON error UX (#14155) (id: 33003)
  • Improved lookup 404 detection in the web console (#14108) (id: 32820)

Platform changes

  • Imply GKE now supports Ubuntu OS in addition to Container OS (id: 33335)
  • Updated the version of NGINX that ships with GKE (id: 33203)
  • Made deepstore the default intermediary data storage type for managed deployments with deep storage (id: 33119)
  • Fixed whitespace issues causing a full cluster restart when adding a node via API (id: 33071)
  • Fixed reserveThreadsForNonQueryRequests not triggering a restart of query nodes (id: 32652)
  • Fixed an issue where you couldn't use SQL-based ingestion when navigating to the Druid console through Pivot (id: 32590)

Changes in 2023.04

Druid highlights

Auto type column schema (beta)

A new "auto" type column schema and indexer has been added to native ingestion as the next logical iteration of the nested column functionality. This automatic type column indexer that produces the most appropriate column for the given inputs, producing either STRING, ARRAY<STRING>, LONG, ARRAY<LONG>, DOUBLE, ARRAY<DOUBLE>, or COMPLEX<json> columns, all sharing a common 'nested' format.

All columns produced by 'auto' have indexes to aid in fast filtering (unlike classic LONG and DOUBLE columns) and use cardinality-based thresholds to utilize these indexes only when doing so is likely to actually speed up the query (unlike classic STRING columns).

COMPLEX<json> columns produced by this 'auto' indexer store arrays of simple scalar types differently than their 'json' (v4) counterparts, storing them as ARRAY typed columns. This means that the JSON_VALUE function can now extract entire arrays, for example JSON_VALUE(nested, '$.array' RETURNING BIGINT ARRAY). There is no change with how arrays of complex objects are stored at this time.

This improvement also adds completely new functionality to Druid: ARRAY typed columns, which, unlike classic multi-value STRING columns, behave with ARRAY semantics. These columns can currently only be created via the 'auto' type indexer when all values are arrays with the same type of elements.

An array data type is a data type that allows you to store multiple values in a single column of a database table. Arrays are typically used to store sets of related data that can be easily accessed and manipulated as a group.

This release adds support for storing arrays of primitive values such as ARRAY<STRING>, ARRAY<LONG>, and ARRAY<DOUBLE> as specialized nested columns instead of breaking them into separate element columns.
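
For example, building on the JSON_VALUE capability described above, a query sketch over a hypothetical datasource and column can extract an entire array:

SELECT JSON_VALUE("nested", '$.array' RETURNING BIGINT ARRAY) AS extracted_array
FROM "my_datasource"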

(#14014) (#13803) (id: 32406)

These changes affect two additional new features available in 26.0: schema auto-discovery and unnest.

Schema auto-discovery (beta)

We’re adding schema auto-discovery with type inference to Druid. With this feature, the data type of each incoming field is detected when schema is available. For incoming data that may contain added, dropped, or changed fields, you can choose to reject the nonconforming data (“the database is always correct - rejecting bad data!”), or you can let schema auto-discovery alter the datasource to match the incoming data (“the data is always right - change the database!”).

To use this feature, set spec.dataSchema.dimensionsSpec.useSchemaDiscovery to true. Druid can infer the entire schema or some of it if you explicitly list dimensions in your dimensions list.

Schema auto-discovery is available for native batch and streaming ingestion.
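
A minimal dimensionsSpec sketch (the explicitly declared dimension is hypothetical; any remaining fields are discovered automatically):

"dimensionsSpec": {
  "useSchemaDiscovery": true,
  "dimensions": ["country"]
}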

(#13653) (#13672) (#14076)

Sort-merge join and hash shuffle join for MSQ (beta)

You can now perform shuffle joins by setting the context parameter sqlJoinAlgorithm to sortMerge for the sort-merge algorithm, or by omitting it to perform broadcast joins (the default).

Multi-stage queries can use a sort-merge join algorithm. With this algorithm, each pairwise join is planned into its own stage with two inputs. This approach is generally less performant, but more scalable, than broadcast joins.

Set the context parameter sqlJoinAlgorithm to sortMerge to use this method.
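
For example, as a query context sketch:

{
  "context": { "sqlJoinAlgorithm": "sortMerge" }
}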

Broadcast hash joins are similar to how native join queries are executed.

For more information, see Broadcast and Sort-merge.

(#13506) (id: 31556)

Storage improvements on dictionary compression

Switching to front-coded dictionary compression (beta) can save up to 30% of storage with little to no impact on query performance.

This release further improves the frontCoded type of stringEncodingStrategy on indexSpec with a new segment format version that typically has faster read speeds and reduced segment size. This improvement is backwards incompatible with Druid 25.0. A new formatVersion option has been added, which defaults to the current version 0. Set formatVersion to 1 to start using the new version.
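
For example, an indexSpec sketch opting into the new format (the stringDictionaryEncoding field name follows the upstream Druid docs, and the bucketSize value is illustrative):

"indexSpec": {
  "stringDictionaryEncoding": { "type": "frontCoded", "bucketSize": 4, "formatVersion": 1 }
}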

(#13988) (#13996)

Additionally, overall storage size has been improved, particularly when using larger buckets.

(#13854)

Pivot highlights

Enterprise and Hybrid customers can now configure the street map visualization without contacting Imply to enable the feature. On-prem customers must still contact Imply before they can set up a street map visualization.

Pivot alert improvements:

  • If an alert evaluation cycle is delayed into the time frame of the next evaluation due to technical failures, Pivot now identifies and processes skipped evaluation cycles in an orderly manner.
  • You can now trigger an alert when no new data has been ingested, with the Latest data strategy override property. See Create an alert for details.
  • Improved the alert preview message in the Pivot UI.

Other changes in 2023.04

Pivot changes

  • Added support for triggering an alert even when no new data has been ingested (id: 10662)
  • Added the ability to identify and backfill skipped alert evaluation cycles in an orderly manner (id: 31796)
  • Added the ability to swap axes for table and sparkline visualizations (id: 30172)
  • Added microseconds and nanoseconds to measurement abbreviations (id: 31949)
  • Improved the alert preview message (id: 19383)
  • Improved the 1 month comparison period in visualizations: now 1 calendar month instead of 30 days (id: 31994)
  • Improved alert logging (id: 31793)
  • Fixed Open interactive report button in report emails not working (id: 32490)
  • Fixed an issue with scheduled reports not retaining view, filter, and split information when opened in Pivot (id: 32301)
  • Fixed Pivot downloads using hardcoded split limit, not row limit from download options (id: 32039)
  • Fixed comparison data not being included in Pivot 2 data downloads (id: 32019)
  • Fixed line chart visualization showing a data point for zero when there is no data (id: 31996)
  • Fixed query error on transform measures defined using the Advanced tab (id: 31898)
  • Fixed blank spot in sunburst and pie chart visualization when Others option is set to Hide (id: 31784)
  • Fixed reports not showing dimension data in emails produced from async download (id: 31623)
  • Fixed users remaining logged in after changing their password; Pivot now enforces reauthentication (id: 31212)
  • Fixed overall comparison failing in report view (id: 30862)
  • Fixed filter by measure not being applied to Pivot 2 report (id: 30532)
  • Fixed inability to create data cube from a source with nested JSON columns using a "SELECT * from" query (id: 30018)
  • Fixed creating a dimension using the Add a dimension link in data cube view producing an error (id: 32300)

Druid changes

  • Added ability to add configuration files when using Druid on Kubernetes (#13795) (id: 30736)
  • Added tuple sketch SQL support (#13887) (id: 31459)
  • Added better null handling for HttpServerInventoryView, HttpRemoteTaskRunner, LookupNodeDiscovery, and SystemSchema (id: 31680)
  • Improved FrontCodedIndexed (#13854) (id: 31695)
  • Improved JSON column support for literal arrays (id: 20204)
  • Added engine as a dimension for sqlQuery metrics (#13906) (id: 31700)
  • Added JWT authenticator support for validating ID tokens (#13242) (id: 28173)
  • Added timeout to TaskStartTimeoutFault (#13970) (id: 32112)
  • Added arrays to nested columns: array columns (#13803) (id: 32111)
  • Added Kubernetes task runner live reports (#13986) (id: 32339)
  • Added a UI for Overlord dynamic configurations to the Druid console (#13993) (id: 32335)
  • Added backwards compatibility mode for frontCoded stringEncodingStrategy (#13988) (id: 32176) (#13996) (id: 32336)
  • Added a new error message for task deletion (#14008) (id: 32333)
  • Added null handling and a proper error message for the case where a server does not exist or has already been removed from ServerInventoryView while BrokerServerView is trying to add a segment for it (id: 29925)
  • Added configurable retries to ZooKeeper connections (#13913) (id: 31592)
  • Added back function signature for compatibility (#13914) (id: 31600)
  • Changed default maxRowsInMemory for realtime ingestion to a lower number (#13939) (id: 31942)
  • Changed SQL operators NVL and COALESCE with 2 arguments to now plan a native nvl expression, which supports the vector engine. Multi-argument COALESCE still plans into a case_searched, which is not vectorized (#13897) (id: 31566)
  • Enabled round robin segment assignment and batch segment allocation by default (#13942) (id: 31954)
  • Fixed an issue where JOIN or UNNEST queries over a tombstone segment could fail (#14021) (id: 32486)
  • Fixed querying SQL (#14026) (id: 32484)
  • Fixed SQL in segment card (#13895) (id: 31507)
  • Fixed off by one error in FrontCodedIndexedWriter and FrontCodedIntArrayIndexedWriter getCardinality method (#14047) (id: 32481)
  • Fixed issues with null pointers on jobResponse (#14010) (id: 32407)
  • Fixed an issue where some queries failed on versions after 2022.05 with the error "Cannot convert query parts into an actual query" (id: 26709)
  • Fixed join and Unnest planning to ensure that duplicate join prefixes are not used (#13943) (id: 31990)
  • Fixed an issue with SELECT COUNT(distinct) with GROUP BY returning an HLL cast error in MSQ (id: 32198)
  • Fixed an issue so that realtime tasks retry when they fail to pause (#11515) (id: 32079)
  • Fixed a bug with expression transform byte[] handling and improved expression transform array handling (id: 31864)
  • Fixed Peon errors when executing tasks over IPv6 (#13972) (#13995) (id: 32337)
  • Fixed Overlord not becoming a leader when syncing the lock from metadata store (#14038) (id: 32452)
  • Fixed an OOM in the tombstone generating logic in MSQ (#13893) (id: 31563)
  • Fixed HSTS for MiddleManager (#13975) (id: 32077)
  • Fixed new HSTS header not being applied to MiddleManager service (id: 32084)
  • Fixed an NPE for SELECT COUNT(unnested column) when the column has a null row (id: 31851)
  • Fixed UNNEST with WHERE returning a 'Received a non-applicable rewrite' error (id: 31660)
  • Fixed UNNEST with WHERE returning a 'SQL query is unsupported' error (id: 31659)
  • Fixed an issue with the start-druid script (#13891) (id: 31549)
  • Fixed several issues for Unnest (#13892) (id: 31604)
  • Fixed Parquet ingestion bug for uint_32 type fields (id: 31602)
  • Fixed KafkaInputFormat when used with Sampler API (#13900) (id: 31555)
  • Fixed Load Data UI so that it supports the "kafka" inputType (id: 12898)
  • Fixed a bug that occurred when using expression filters for filtering COMPLEX<json> columns or values extracted from them using JSON_VALUE or any other nested column function. The column would be incorrectly treated as all null values (#14096) (id: 32717)
  • Improved subquery guardrail so that it obeys memory limit (id: 13296)
  • Improved nested column index utilization (#13977) (id: 32487)
  • Improved segment heap footprint and fixed bug with expression type coercion (#14002) (id: 32334)
  • Improved the Druid console to show segment writing progress (#13929) (id: 32002)
  • Improved performance by creating new RelDataTypeFactory during SQL planning (#13904) (id: 31562)
  • Improved nested column storage format for simple typed columns (id: 29771)
  • Improved error message when topic name changes within same supervisor (#13815) (id: 31511)
  • Upgraded ZK from 3.5.9 to 3.5.10 to avoid data inconsistency risk (#13715) (id: 30187)
  • Upgraded the fabric client to support newer versions of k8s (#13804) (id: 30766)
  • Window planning: use collation traits to improve subquery logic (#13902) (id: 31596)

Platform changes

  • Added support for more GKE regions (id: 32513)
  • Scaled ZK memory usage based on master instance type (id: 32293)

Clarity changes

  • Updated Clarity password policy. Passwords for Clarity must now satisfy the following criteria:

    • At least 8 characters.
    • No older than 90 days.
    • Must not match the 5 most recent passwords.
    • Must contain alphabetic, numeric, and special characters.

    Additionally, Clarity now locks a user out after 6 invalid login attempts.

    (id: 31281)

Changes in 2023.03.1

Druid updates

  • Fixed an issue where ingestion using MSQ ran out of memory and failed when durable shuffle storage was enabled (id: 31960)
  • Fixed an issue where MSQ tasks ran out of disk even though intermediateSuperSorterStorageMaxLocalBytes was set.

Changes in 2023.03

Druid highlights

Information schema now uses numeric column types

This is a breaking change.

The Druid system table (INFORMATION_SCHEMA) now uses SQL types instead of Druid types for columns. This change makes the INFORMATION_SCHEMA table behave more like standard SQL. You may need to update your queries in order to avoid unexpected results if you depend on either of the following:

  • Numeric fields being treated as strings
  • Column numbering starting at 0. Column numbering is now 1-based.

(#13777) (id: 31065)

Unnest (beta) changes

The UNNEST SQL function has been improved. You can now unnest multiple columns within a single query. For example:

    SELECT * FROM "example_table", UNNEST(MV_TO_ARRAY("dim3")) AS table_alias1(d3), UNNEST(ARRAY[dim4,dim5]) AS table_alias2(d45)

Additionally, note the following breaking changes:

  • The UNNEST SQL function requires you to set the context parameter enableUnnest to true (see the example after this list). There is no required context parameter for the native unnest query or for Pivot queries.

  • The syntax for the unnest datasource for native queries has changed. It now uses a virtualColumn to perform the unnest:

       "virtualColumn": {
    "type": "expression",
    "expression": "\"column_reference\""
    },
  • The unnest datasource no longer has an allow list option.

    For more information about these changes to the unnest datasource, see unnest.

(#13892) (id: 30663) (id: 30537)
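
As referenced in the list above, a minimal sketch of a SQL query context that enables UNNEST; pass it as the context object of your SQL API request:

    {
      "enableUnnest": true
    }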

Pivot highlights

Improved IP dimension performance: After you upgrade to 2023.03, update any existing IP dimensions to benefit from new performance improvements. To do this, edit the IP dimension in Pivot and:

  1. Remove IP_STRINGIFY from the custom formula. For example, for a column named ipaddress:
    • Old custom formula: IP_STRINGIFY("t"."ipaddress")
    • New custom formula: "t"."ipaddress"
  2. Change the dimension type from String to IP or IP prefix as appropriate.

Dual axis: The line chart visualization now supports multiple scales and lines. You can display two continuous metrics on the same chart with two axes. See the Visualizations reference for details.

Other changes in 2023.03

Pivot changes

  • Added dual axis capability to the line chart visualization (id: 30240)
  • Added data cube option View essence to display the JSON data structure for a Pivot 2 data cube (id: 30824)
  • Added searchable drop-downs throughout Pivot (id: 23489)
  • Fixed transforms not being correctly applied to multi-value dimensions (id: 31192)
  • Fixed records visualization cutting off data when long values were present (id: 30219)
  • Fixed next available measure not displaying when default measure was removed from data cube (id: 29737)
  • Improved error handling for color legends of multi-value dimensions (id: 29907)
  • Improved performance of IP dimensions (id: 29444)
  • Removed Show metadata option from Pivot 2 data cubes (id: 30158)

Druid changes

  • Added support for running indexing tasks on multiple disks for MiddleManagers/Indexers. Multiple base task directories can be assigned using druid.indexer.task.baseTaskDirPaths=["PATH1","PATH2",...] in the runtime properties of the MiddleManager/Indexer (#13476) (id: 16181)
  • Added support for range partitioning for Hadoop-based batch ingestion (#13303) (id: 28169)
  • Added a new dialog to the web console that shows compaction history (#13861) (id: 31452)
  • Added a new Python Druid API for use in Jupyter notebooks (#13787) (id: 31416)
  • Added new functionality for Tuple sketches (#13819) (id: 30811). You can now do the following:
    • Get the sketch output as a base64 string
    • Provide a constant Tuple sketch in the post-aggregation step that can be used in set operations
    • Get the estimated value (sum) of summary/metrics objects associated with a Tuple sketch
  • Added metric for time taken for broker to start up (#13716) (id: 29924)
  • Changed the SQL CAST operator conversion to use Calcites.getColumnTypeForRelDataType to convert Calcite types to native Druid types instead of using its own custom SqlTypeName to ExprType mapping. This makes it more consistent with the SQL to Druid type conversions for most other operators (#13890) (id: 25979)
  • Fixed an issue where nested queries for unnest had wrong output column name (#13892) (id: 30440)
  • Fixed an issue where unnest returned different results on the same MV array when there was a null row (#13922) (id: 30537)
  • Fixed an issue where MSQ replaces segments of granularity ALL with other granularities, causing the Peon to run out of memory (id: 31454)
  • Fixed spelling of expectedSingleContainerOutput.yaml (#13870) (id: 31422)
  • Fixed an issue where checking if durable storage for MSQ is enabled returned inaccurate results (#13881) (id: 31417)
  • Fixed an issue where queries with multiple unnests returned incorrect results (id: 31310)
  • Fixed a NPE in Kinesis supervisor when recordsPerFetch was not set (id: 31122)
  • Fixed an issue where leader redirection didn't work when both plainText and TLS ports were set (id: 31082)
  • Fixed infinite checkpointing between tasks and Overlord (#13825) (id: 31056)
  • Fixed query cancel NPE in the web console (#13786) (id: 30829)
  • Fixed an issue where ShuffleStorage ingestion failed with an OOM error for MSQ (id: 30019)
  • Fixed an issue where escaping string literals in lookup queries for lookups loaded on MariaDB failed (id: 30810)
  • Fixed ARRAY_AGG so that it works with complex types, and fixed bugs with expression aggregator complex array handling (#13781) (id: 30778)
  • Fixed an issue with the SQL planner when virtual column capabilities were null (#13797) (id: 30750)
  • Improved null value handling in SQL multi-value string functions (id: 25978)
  • Improved dependencies: consolidated the druid-core, extendedset, and druid-hll modules into druid-processing to simplify dependencies. Any extensions referencing these modules should be updated to use druid-processing instead. Existing extension binaries should continue to function normally when used with newer versions of Druid (#13698) (id: 30891)
  • Improved logs for query errors (#13776) (id: 30776)
  • Improved logging for MSQ worker tasks (#13790) (id: 31186)
  • Improved speed for composite key joins on IndexedTable (#13516) (id: 31064)
  • Improved auto completion in the web console (#13830) (id: 31048)
  • Improved HLL sketches to be more optimized (#13737) (id: 30681)
  • Improved /druid/indexer/v1/sampler to include logicalDimension (the list of the most restrictive typed dimension schemas), physicalDimension (the list of dimension schemas actually used to sample the data), and logicalSegmentSchema (the full resulting segment schema for the set of rows sampled) (#13711) (id: 29884)
  • Improved join performance on dense composite keys (id: 29014)
  • Improved client change counter management in HTTP server view (#13010) (id: 28183)
  • Removed FiniteFirehoseFactory and implementations (#12852) (id: 23960)
  • Upgraded Druid query toolkit (#13848) (id: 31251)
  • Upgraded Kafka version to resolve CVE-2023-25194 (id: 31200)
  • Updated Apache Kafka dependencies to 3.4.0 (#13802) (id: 30774)

Changes in 2023.02.1

Platform changes

  • Security updates

Changes in 2023.02

Druid highlights

SQL UNNEST (beta)

You can unnest arrays with either the UNNEST function (SQL) or the unnest datasource. The UNNEST function for SQL allows you to unnest arrays by providing a source array expression using the following syntax:

UNNEST(source_expression) AS table_alias_name (column_alias_name)

For more information, see either UNNEST (SQL) or unnest (native).
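
As a sketch, assuming a hypothetical datasource example_data with a multi-value string column tags, a query using this syntax might look like the following:

    SELECT tag
    FROM example_data, UNNEST(MV_TO_ARRAY(tags)) AS unnested(tag)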

Pivot highlights

Pivot API docs update

We've corrected and improved the Pivot API docs, and added request examples.

Other changes in 2023.02

Pivot changes

  • Fixed editing and saving a data cube from the data cubes tab not returning user to data cubes tab (id: 25283)
  • Fixed query monitoring showing an empty query for cached queries (id: 30283)
  • Fixed inability to create data cube directly from sources that include JSON columns (id: 30017)
  • Fixed UI attempting to validate custom time dimension bucketing as a number when dimension was not named __time (id: 29866)
  • Fixed measure "missing value fill" not working when set to previous or interpolation and splitting on time (id: 29857)
  • Fixed filtering by null value showing multiple "null" options in drop-down (id: 29638)
  • Fixed measure and dimension conversions for chained filter expressions (id: 29233)
  • Fixed download queries ignoring "Hide filtered out values" multi-value dimension setting on axis queries (id: 29195)
  • Fixed street map visualization suggestions when one of the dimensions was removed (id: 29117)
  • Fixed unexpected sort order in table visualization with multiple values (id: 26933)
  • Fixed datacube info not showing all relevant datacube names when tiles were from multiple datacubes (id: 20694)
  • Fixed inability to configure the settings export limit (id: 30210)
  • Fixed multi-value dimension filter returning incorrect results when there were duplicated values in a multi-value column (id: 29629)
  • Fixed Horizontal bars visualization not filtering correctly by Delta measure (id: 29378)
  • Fixed report attachments using split limits incorrectly (id: 28919)
  • Fixed errors not appearing when alert and report generation fails (id: 28857)
  • Fixed on-prem Clarity users unable to see data due to missing clarityUser metric (id: 30587)
  • Improved and corrected Pivot API docs (id: 28735, 29975)
  • Removed "Show metadata" option from async downloads (id: 29328)

Druid changes

  • Added schemaless ingestion to allow discovery of nested columns (id: 29770)
  • Added nested column indexer for schemaless ingestion (id: 29769)
  • Added SQL functions for UNNEST (id: 28859)
  • Added rules to support comma join and UNNEST syntax (id: 28858)
  • Added extra time-granularity with DATE_EXPAND (id: 20364)
  • Added API endpoint CoordinatorCompactionConfigsResource#getCompactionConfigHistory to return automatic compaction configuration history (#13699) (id: 29962)
  • Added Fallback virtual column that enables falling back to another column if the original column doesn't exist (id: 30524)
  • Added feature flag for nested column scalar handling (id: 30181)
  • Added SQL version of UNNEST native Druid function (id: 29963)
  • Added implementations of semantic interfaces to optimize window processing on top of an ArrayListSegment (id: 29724)
  • Added more robust default fetch settings for Kinesis (id: 29689)
  • Added support for adding Strict-Transport-Security header to HTTP responses (id: 29558)
  • Fixed json_value on scalar virtual column in WHERE clause returning an error (id: 30620)
  • Fixed compaction history returning empty list instead of 404 when not found (id: 30456)
  • Fixed overly verbose batch allocation logs by changing log level from info to debug (id: 30452)
  • Fixed MSQ insert with to_json_string failing with CannotParseExternalData (id: 30341)
  • Fixed an issue where the Overlord could continuously remove and add workers for the same node when httpRemote was enabled (id: 30270)
  • Fixed nested column handling of null values (id: 30180)
  • Fixed web console data loader not allowing for multiline JSON messages in Kafka (id: 30127)
  • Fixed window parameter for timeseries function (id: 30014)
  • Fixed JSON column becoming unreadable after appending complex data to scalar data (id: 30006)
  • Fixed variable name in start-druid (id: 29587)
  • Fixed json_value not working for Protobuf bytes (id: 29516)
  • Fixed null values missing after ingesting a JSON array as a JSON column (id: 29214)
  • Fixed an issue where ingesting an Avro array as a JSON column missed null values in the array (id: 28895)
  • Fixed scalar values being ingested as COMPLEX<json> column type when there were nulls (id: 29841)
  • Improved query operations by making them pausable (id: 29973)
  • Improved and extended table functions functionality (id: 29815)

Platform changes

  • Added support for additional ports on headless services (id: 30175)
  • Fixed feature flag list not always showing feature flags already enabled on the cluster (id: 30474)
  • Removed the feature flag and extension loading option for multi-stage query engine (id: 29835)
  • Enabled durable storage for SQL-based ingestion (MSQ task engine) by default for Imply Enterprise and Hybrid clusters if you use S3 as your deep storage option.
  • Updated ZK to version 3.7.1 (K8s and GKE Enhanced) (id: 30196)

Changes in 2023.01

Druid highlights

The MSQ task engine now has fault tolerance support for workers. This means an ingestion task will retry if the underlying machine executing ingestion fails. You can enable this behavior by setting faultTolerance to true in the context parameters for a query.
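
For example, a minimal sketch of the query context for an MSQ ingestion with worker fault tolerance enabled:

    {
      "faultTolerance": true
    }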

Pivot highlights

You can now create a URL containing query parameters, to link to a dashboard or data cube. See Query parameters reference for details.

You can now apply a "flat" layout to the Table visualization to display a column for each dimension. See the Visualizations reference for details.

Dashboards and Pivot 2 data cubes now include an option to enable or disable the query cache for all dashboards and data cubes. See Managing data cubes and Managing dashboards for details. This option already existed for Pivot Classic data cubes.

Platform highlights

For Imply Enterprise on GKE, you can now use persistent disks for data nodes by selecting the option when configuring a tier. Using persistent disks can improve stability by making pod rescheduling more streamlined.

Other changes in 2023.01

Pivot changes

  • Added ability to create a URL with query parameters, to link to a dashboard or data cube (id: 28068, id: 28067, id: 28106)
  • Added flat table layout for table visualizations (id: 24697)
  • Added query cache property to options menu for dashboards and Pivot 2 data cubes (id: 23747)
  • Improved UX for comparative alert constraints (id: 5330)
  • Fixed changing visualization resets dimension/measure selection in data cube view (id: 29538)
  • Fixed inability to create a data cube via custom SQL with an aliased JSON column (id: 29206)
  • Fixed gap in data cube visualization panel (id: 28915)
  • Fixed JSON column introspection failing when creating data cube via custom SQL (id: 28853)
  • Fixed UX when attempting to exclude values in filter when there are multiple nulls in the column (id: 27633)
  • Fixed query generation when filtering on multiple null values (id: 27632)
  • Fixed reports breaking if underlying data cube was deleted (id: 26073)

Druid changes

  • Added support for adding Strict-Transport-Security header to HTTP responses (#13489) (id: 29466)
  • Added new Broker query metrics (#13420) (id: 28897):
    • Total parallel merge processing 'wall' time
    • Parallel merge pool time spent waiting for results of the 'fastest' and 'slowest' partitions
  • Added nested columns support for protobuf (#13519) (id: 28713)
  • Added the ability to select a specific emitter for each feed. For example, alerts can go to the HTTP emitter, metrics can go to the StatsD emitter, and requestLog events can go to a Kafka emitter. For more information, see Switching emitter (#13363) (id: 28904)
  • Added tracking for bytes processed by a task in MSQ task reports (#13520) (id: 28436)
  • Allowed string dimension indexer to handle byte as base64 strings (#13573) (id: 29498)
  • Changed how ZooKeeper is specified as a service for the start-druid-main script (#13550) (id: 29152)
  • Changed Druid to monotonically increase worker capacity based on total resources (#13581) (id: 29497)
  • Changed the default max workers to cluster capacity in the Druid console and simplify live reports (#13577) (id: 29493)
  • Changed how Druid behaves on a failed start. Druid no longer waits for a graceful shutdown (#13087) (id: 28902)
  • Changed chatAsync default to true. This means that Druid uses asynchronous communication with Kafka and Kinesis for indexing tasks and ignores the chatThreads parameter. (#13491) (id: 28901)
  • Changed segment allocation behavior:
    • The max batch size for SegmentAllocationQueue is now 500
    • batchAllocationMaxWaitTime is now batchAllocationWaitTime to more accurately reflect the behavior since the wait time can actually exceed the configured value
    • For more information, see Batching segmentAllocate actions (#13503) (id: 28913)
  • Changed logging behavior so queries that fail authorization checks are now logged. Previously, they were not (#13564) (id: 29489)
  • Changed operators to a push style API (#13600) (id: 29482)
  • Changed how Druid extends the FROM grammar from Calcite by using a template file for adding table functions grammar (#13553) (id: 29300)
  • Changed task memory computation in the start-druid script (#13563) (id: 29297)
  • Changed MSQ to only look at sqlInsertSegmentGranularity on the outer query (#13537) (id: 29118)
  • Fixed error anchors in the Druid console (#13527) (id: 29101)
  • Fixed scope of dependencies in the protobuf extensions pom (#13593) (id: 29488)
  • Fixed an issue where the preview in the Druid console stopped using the MSQ task engine when auto is selected for the Engine (#13586) (id: 29494)
  • Fixed issue with Jetty graceful shutdown of data servers when druid.serverview.type is set to http (#13499) (id: 28899)
  • Fixed issue with JDBC and query metrics causing query failures (#13608) (id: 29487)
  • Fixed typo in metric name. The correct name is ingest/segments/count (#13521) (id: 28933)
  • Fixed an issue where serialization of the LocalInputSource object converts relative file paths to absolute paths, changing the meaning of an MSQ ingest query (#13534) (id: 29491)
  • Fixed an issue which caused Broker parallel merge pool metrics to not be emitted (#13420) (id: 28897)
  • Improved JDBC lookup by quoting and escaping literals to allow reserved identifiers (#13632) (id: 29533)
  • Improved nested column storage format for broader compatibility (#13568) (id: 29499)
  • Improved the stage UI in the Druid console (#13615) (id: 29483)
  • Improved MSQ table functions (#13360) (id: 29100)
  • Improved the error message when THETA_SKETCH_INTERSECT is used on scalar expressions (#13508) (id: 28914)
  • Improved the TooManyBuckets error message for the MSQ task engine (#13525) (id: 29102)
  • Improved MSQ to track bytes read from an input source in the stage counters of an MSQ task report. (#13559) (id: 29502)
  • Improved logging related to the SQL planner to validate response headers and include the cause (#13609) (id: 29478)
  • Improved the Druid console to better show totals when grouping (#13631) (id: 29477)
  • Improved error reporting in the Druid console (#13636) (id: 29476)
  • Improved disk usage and make Historicals load segments more quickly:
    • Added druid.storage.zip parameter for local storage (defaults to false). This changes default behavior from writing an index.zip to writing a regular directory. This is safe to do even during a rolling update because the older code actually already handled unzipped directories being present on local deep storage
    • For more information, see Local (#13394) (id: 29161)
  • Improved the Druid console: add support for arrayOfDoublesSketch, fix padding when aggregating in a table, and add syntax highlighting for window function keywords. (#13486) (id: 28912)
  • Improved the Druid quickstart script to automate memory-parameter defaults (#13365) (id: 26000)
  • Upgraded to Netty 4.1.86.Final (#13604) (id: 29484)

Platform changes

  • Fixed an issue where an Imply Enterprise cluster on Kubernetes fails validation (id: 29204)

Upgrade and downgrade notes

Minimum supported version for rolling upgrade

See "Supported upgrade paths" in the Lifecycle Policy documentation.

UNNEST syntax

Starting with 2023.09 STS, the recommended syntax for SQL UNNEST has changed. We recommend using CROSS JOIN instead of commas for most queries to prevent issues with precedence. For example, use

SELECT column_alias_name1 FROM datasource CROSS JOIN UNNEST(source_expression1) AS table_alias_name1(column_alias_name1) CROSS JOIN UNNEST(source_expression2) AS table_alias_name2(column_alias_name2), ...

Do not use:

SELECT column_alias_name FROM datasource, UNNEST(source_expression1) AS table_alias_name1(column_alias_name1), UNNEST(source_expression2) AS table_alias_name2(column_alias_name2), ...

SQL compatibility

Starting with 2023.09 STS, the default way Druid treats nulls and booleans has changed.

For nulls, Druid now differentiates between an empty string ('') and a record with no data as well as between an empty numerical record and 0.

You can revert to the previous behavior by setting druid.generic.useDefaultValueForNull to true.

For booleans, Druid now strictly uses 1 (true) or 0 (false). Previously, true and false could be represented either as true and false or as 1 and 0. In addition, Druid now returns a null value for Boolean comparisons like true && NULL.

You can revert to the previous behavior by setting druid.expressions.useStrictBooleans to false.
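
For example, to restore both legacy behaviors, you could set the following in your common runtime properties (a sketch; confirm each setting against your upgrade plan):

    # Revert to legacy null handling
    druid.generic.useDefaultValueForNull=true
    # Revert to legacy, non-strict boolean handling
    druid.expressions.useStrictBooleans=false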

The following table illustrates some example scenarios and the impact of the changes:

| Query | 2023.08 STS and earlier | 2023.09 STS and later |
| --- | --- | --- |
| Query empty string | Empty string ('') or null | Empty string ('') |
| Query null string | Null or empty | Null |
| COUNT(*) | All rows, including nulls | All rows, including nulls |
| COUNT(column) | All rows excluding empty strings | All rows including empty strings but excluding nulls |
| Expression 100 && 11 | 11 | 1 |
| Expression 100 \|\| 11 | 100 | 1 |
| Null FLOAT/DOUBLE column | 0.0 | Null |
| Null LONG column | 0 | Null |
| Null __time column | 0, meaning 1970-01-01 00:00:00 UTC | 1970-01-01 00:00:00 UTC |
| Null MVD column | '' | Null |
| ARRAY | Null | Null |
| COMPLEX | none | Null |

Update your queries

Before you upgrade from a version prior to 2023.09 to 2023.09 or later, update your queries to account for the changed behavior:

NULL filters

If your queries use NULL in the filter condition to match both nulls and empty strings, you should add an explicit filter clause for empty strings. For example, update s IS NULL to s IS NULL OR s = ''.

COUNT functions

COUNT(column) now counts empty strings. If you want to continue excluding empty strings from the count, replace COUNT(column) with COUNT(column) FILTER(WHERE column <> '').
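
As a sketch, assuming a hypothetical datasource events with a string column referrer, the rewrites above look like the following:

    -- NULL filter: match both nulls and empty strings explicitly
    SELECT * FROM events WHERE referrer IS NULL OR referrer = ''

    -- COUNT: continue excluding empty strings from the count
    SELECT COUNT(referrer) FILTER(WHERE referrer <> '') FROM events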

GroupBy queries

GroupBy queries on columns containing null values can now have additional entries as nulls can co-exist with empty strings.

Nested column format

Starting with 2023.09 STS, the default format for the json type for nested columns has changed to be equivalent to the auto type. When upgrading from a previous version, you can continue to write nested columns in a backwards compatible format (version 4).

In a classic batch ingestion job, include formatVersion in the dimensions list of the dimensionsSpec property. For example:

      "dimensionsSpec": {
"dimensions": [
"product",
"department",
{
"type": "json",
"name": "shipTo",
"formatVersion": 4
}
]
},

To set the default nested column version, set the desired format version in the common runtime properties. For example:

druid.indexing.formats.nestedColumnFormatVersion=4

Stop supervisors that ingest from multiple Kafka topics before downgrading

If you have added supervisors that ingest from multiple Kafka topics in 2023.09 or later, stop those supervisors before downgrading, because they will fail on versions prior to 2023.09.
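
One way to stop a supervisor is through the Overlord supervisor API. A sketch, assuming a hypothetical supervisor ID of social_media and an Overlord at localhost:8081:

    # Terminate the supervisor; its tasks stop and publish their segments
    curl -X POST "http://localhost:8081/druid/indexer/v1/supervisor/social_media/terminate"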

Remove load rules for query from deep storage before downgrading

If you have added load rules that enable query from deep storage in 2023.08 or later, disable those load rules before downgrading to a version prior to 2023.08. Otherwise the Historicals on older versions will fail to start because they cannot process the latest load rules.

Avatica JDBC driver upgrade

The Avatica JDBC driver is not packaged with Druid. Its upgrade is separate from any upgrades to Imply.

If you notice intermittent query failures after upgrading your Avatica JDBC driver to version 1.21.0 or later, you may need to set the transparent_reconnection property.

Information schema now uses numeric column types

This is a breaking change introduced in 2023.03.

The Druid system table (INFORMATION_SCHEMA) now uses SQL types instead of Druid types for columns. This change makes the INFORMATION_SCHEMA table behave more like standard SQL. You may need to update your queries in order to avoid unexpected results if you depend on either of the following:

  • Numeric fields being treated as strings.
  • Column numbering starting at 0. Column numbering is now 1-based.

(#13777) (id: 31065)

Task directories for Druid

If you use the druid.indexer.task.baseTaskDirPaths property, note that it no longer works in versions 2023.05 and later. Use druid.worker.baseTaskDirs instead.
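
A sketch of the change in the runtime properties (the paths are hypothetical):

    # No longer works in 2023.05 and later:
    # druid.indexer.task.baseTaskDirPaths=["/mnt/disk1/task","/mnt/disk2/task"]

    # Use this instead:
    druid.worker.baseTaskDirs=["/mnt/disk1/task","/mnt/disk2/task"]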

druid.worker.baseTaskDirs applies to SQL-based ingestion

Starting in 2023.06, multi-stage query (MSQ) tasks for SQL-based ingestion now honor the size you set for task directories. This change allows the MSQ task engine to sort more data at the cost of performance. If a task requires more storage than the size you set, data spills over to S3, which can have performance impacts.

To mitigate the performance impact, you can either increase the number of tasks or increase the size you set for druid.worker.baseTaskDirs.

Removed property for setting max bytes for dimension lookup cache

Starting with 2023.08 STS, druid.processing.columnCache.sizeBytes has been removed because it provided limited utility after a number of internal changes. Leaving this config in place is harmless, but it has no effect.

Removed Coordinator dynamic configs

Starting with 2023.08 STS, the following Coordinator dynamic configs have been removed:

  • emitBalancingStats: Stats for errors encountered while balancing will always be emitted. Other debugging stats will not be emitted but can be logged by setting the appropriate debugDimensions.
  • useBatchedSegmentSampler and percentOfSegmentsToConsiderPerMove: Batched segment sampling is now the standard and will always be on.

Use the new smart segment loading mode instead.
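
As a sketch, you can turn on smart segment loading through the Coordinator dynamic configuration API (the host name is hypothetical):

    curl -X POST "http://COORDINATOR_HOST:8081/druid/coordinator/v1/config" \
      -H "Content-Type: application/json" \
      -d '{"smartSegmentLoading": true}'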

Changed Coordinator config defaults

Starting with 2023.08 STS, the defaults for the following Coordinator dynamic configs have changed:

  • maxSegmentsInNodeLoadingQueue: 500, previously 100
  • maxSegmentsToMove: 100, previously 5
  • replicationThrottleLimit: 500, previously 10

These new defaults can improve performance for most use cases.

Worker input bytes for SQL-based ingestion

Starting with 2023.08 STS, the maximum input bytes for each worker for SQL-based ingestion is now 512 MiB (previously 10 GiB).

Parameter execution changes for Kafka

When using the built-in FileConfigProvider for Kafka, interpolations are now intercepted by the JsonConfigurator instead of being passed down to the Kafka provider. This is a breaking change for existing deployments that rely on this behavior.

For more information, see KIP-297.

(#13023)

Deprecation notices

Some segment loading configs deprecated

Starting with 2023.08 STS, the following segment related configs are now deprecated and will be removed in future releases:

  • maxSegmentsInNodeLoadingQueue
  • maxSegmentsToMove
  • replicationThrottleLimit
  • useRoundRobinSegmentAssignment
  • replicantLifetime
  • maxNonPrimaryReplicantsToLoad
  • decommissioningMaxPercentOfMaxSegmentsToMove

Use smartSegmentLoading mode instead, which calculates values for these variables automatically.

SysMonitor support deprecated

Starting with 2023.08 STS, SysMonitor is deprecated and will be removed in future releases. Switch to OshiSysMonitor instead.

CrossTab view is deprecated

The CrossTab view feature is deprecated. It is replaced by Pivot 2.0, which incorporates the capabilities of CrossTab view.

End of support notices

Firehose ingestion

Support for firehose ingestion will be removed in Imply 2022.10 and in the upcoming LTS release. Firehose has been deprecated since Druid version 0.17. You must transition your ingestion tasks to use inputSource and ioConfig before upgrading to 2022.10.
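
For example, a minimal sketch of a native batch ioConfig that uses inputSource instead of a firehose (the paths and format are hypothetical):

    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "baseDir": "/data/events",
        "filter": "*.json"
      },
      "inputFormat": {
        "type": "json"
      }
    }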

Hadoop 2

In 2023.09 STS and later, Imply no longer supports using Hadoop 2 with your Druid cluster. Migrate to SQL-based ingestion or native ingestion if you are using Hadoop 2.x for ingestion today. If migrating to Druid ingestion is not possible, plan to upgrade your Hadoop infrastructure.

GroupBy v1

In 2023.09 STS and later, the v1 legacy GroupBy engine has been removed. Use v2 instead, which has been the default GroupBy engine.

cachingCost segment balancing strategy removed

In 2023.09 STS and later, the cachingCost strategy has been removed. Use an alternate segment balancing strategy instead, such as cost.