Release notes

Imply 3.3.3 includes the following packages:

Pivot evaluation

The Imply download includes a 30-day evaluation license of Pivot. Full licenses are included with Imply subscriptions. Contact us to learn more!

Druid highlights

New features

Join support

Join is a key operation in data analytics. Prior to 0.18.0, Druid supported some join-related features, such as lookups and semi-joins in SQL. However, those features covered only a limited set of use cases; for other join use cases, users had to denormalize their datasources at ingestion time instead of joining them at query time, which could explode data volume and lengthen ingestion.

Druid 0.18.0 supports real joins for the first time in its history. INNER, LEFT, and CROSS joins are supported for now. For native queries, a new join datasource has been introduced to represent a join of two datasources. Currently, only left-deep joins are allowed: the left-hand datasource must be a table or another join datasource, while the right-hand datasource must be a lookup, inline, or query datasource. Note that joining two Druid table datasources to each other is not supported yet; there should be only one table datasource in a join query.

Druid SQL also supports joins. Under the covers, SQL join queries are translated into one or several native queries that include join datasources.
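
For example, a SQL join between a table datasource and a lookup datasource might look like the following sketch. The clickstream table and country_names lookup are hypothetical names used for illustration; k and v are the key and value columns that a lookup datasource exposes in SQL.

  -- Join a table datasource to a lookup datasource and aggregate
  SELECT t.country_code, l.v AS country_name, SUM(t.clicks) AS total_clicks
  FROM clickstream t
  INNER JOIN lookup.country_names l ON t.country_code = l.k
  GROUP BY t.country_code, l.v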

When a join query is issued, the Broker first evaluates all datasources other than the primary datasource, which is the only table datasource in the query. The evaluation can include executing subqueries for query datasources. Once the Broker has evaluated all non-primary datasources, it replaces them with inline datasources and sends the rewritten query to data nodes. (See Query inlining in Brokers below for more details.) Data nodes use a hash join to process join queries, building a hash table for each non-primary leaf datasource unless one already exists.

Note that only the lookup datasource has a pre-built hash table for now. As a result, joins against any other datasource type may be sub-optimal in terms of performance.

For more information, see Joins in the Druid docs.

Query laning and prioritization

When you run multiple queries with heterogeneous workloads at the same time, you may want to control the resources committed to each query based on its priority. For example, you might want to limit the resources assigned to less important queries so that important queries can execute in time without being disrupted by less important ones.

Query laning allows you to control capacity utilization for heterogeneous query workloads. With laning, the Broker examines and classifies each query in order to assign it to a 'lane'. Lanes have capacity limits, enforced by the Broker, which can be used to ensure that sufficient resources remain available for other lanes or for interactive queries (those with no lane), or to limit overall throughput for queries within a lane.

Automatic query prioritization determines query priority based on the configured strategy. A threshold-based prioritization strategy has been added; it automatically lowers the priority of queries that cross any of a configurable set of thresholds, such as how far in the past the queried data lies, how large an interval the query covers, or how many segments take part in the query.
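
As a rough sketch (assuming the property names documented in the Druid Broker configuration reference; the values below are purely illustrative), a Broker could enable both features like this:

  # Limit concurrent queries and route low-priority queries to a capped 'low' lane
  druid.query.scheduler.numThreads=40
  druid.query.scheduler.laning.strategy=hilo
  druid.query.scheduler.laning.maxLowPercent=20

  # Automatically lower the priority of queries that cross any of these thresholds
  druid.query.scheduler.prioritization.strategy=threshold
  druid.query.scheduler.prioritization.periodThreshold=P1M
  druid.query.scheduler.prioritization.durationThreshold=P7D
  druid.query.scheduler.prioritization.segmentCountThreshold=50
  druid.query.scheduler.prioritization.adjustment=5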

For more information, see query prioritization and laning.

Query inlining in Brokers

Druid is now able to execute a nested query by inlining subqueries. Any type of query can sit on top of any other type of subquery, as in the following example:

             topN
               |
       (join datasource)
         /          \
(table datasource)  groupBy

To execute this query, the Broker first evaluates the leaf groupBy subquery; it sends the subquery to data nodes and collects the results, which are materialized in Broker memory. Once the Broker has collected all results for the groupBy query, it rewrites the topN query by replacing the leaf groupBy with an inline datasource that holds the groupBy results. Finally, the rewritten query is sent to data nodes to execute the topN query.

For more information about query execution, see Query Execution in the Druid documentation.

New dimension in query metrics

Since a native query containing subqueries can be executed part by part, a new subQueryId has been introduced. Each subquery has a different subQueryId but shares the same queryId as its parent query. The subQueryId is available as a new dimension in query metrics.

New configuration

A new configuration property, druid.server.http.maxSubqueryRows, controls the maximum number of rows materialized in Broker memory.
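
For example, to raise the limit on a Broker (the value shown is purely illustrative; see the Broker configuration reference for the default):

  druid.server.http.maxSubqueryRows=200000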

See query and the Broker properties reference in the Druid documentation for more information.

SQL grouping sets

GROUPING SETS is now supported, allowing you to combine multiple GROUP BY clauses into one GROUP BY clause. The GROUPING SETS clause is internally translated into a groupBy query with subtotalsSpec. The LIMIT clause is now applied after subtotalsSpec, rather than to each grouping set.
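
For example, the following query (the datasource and column names are hypothetical) computes counts per (country, device) pair, per country, and overall in a single query:

  SELECT country, device, COUNT(*) AS cnt
  FROM clickstream
  GROUP BY GROUPING SETS ( (country, device), (country), () )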

SQL dynamic parameters

Druid now supports dynamic parameters for SQL. To use dynamic parameters, replace any literal in the query with a question mark (?) character. These question marks represent the places where the parameters will be bound at execution time. See Dynamic Parameters in the Druid documentation for more details.
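
For example, in the following query (the datasource and column names are hypothetical), the two ? placeholders are bound to parameter values, in order, when the query is submitted:

  SELECT country, COUNT(*) AS cnt
  FROM clickstream
  WHERE __time >= ? AND country = ?
  GROUP BY country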

Important changes

Roaring bitmaps as default

Druid supports two bitmap types: Roaring and CONCISE. Since Roaring bitmaps provide a better out-of-the-box experience (generally faster query speed), the default bitmap type has been switched to Roaring. See Compression in the Druid documentation for more details about bitmaps.

Complex metrics behavior change at ingestion time when SQL-compatible null handling is disabled (default mode)

When SQL-compatible null handling is disabled (the default mode), the behavior of complex metric aggregation at ingestion time has been changed to be consistent with query-time behavior: complex metrics are now aggregated to the default 0 value for nulls instead of the nulls being skipped during ingestion.

Array expression syntax change

Druid expressions now support typed constructors for creating arrays, so arrays can be defined with an explicit type. For example, <LONG>[1, 2, null] creates an array of LONG type containing 1, 2, and null. Note that you can still create an array without an explicit type. For example, [1, 2, null] is still valid syntax for an equivalent array; in this case, Druid infers the array type from its elements. The new syntax applies to empty arrays as well: <STRING>[], <DOUBLE>[], and <LONG>[] create empty arrays of STRING, DOUBLE, and LONG type, respectively.

Enabling pending segments cleanup by default

The pendingSegments table in the metadata store is used to create unique new segment IDs for appending tasks, such as Kafka/Kinesis indexing tasks or batch tasks running in append mode. Automatic pending segments cleanup was introduced in 0.12.0 but was disabled by default prior to 0.18.0. It is now enabled by default.

Creating better input splits for native parallel indexing

The Parallel task can now create better splits. Each split can contain multiple input files, grouped based on their size, and empty files are ignored. The split size is controllable with the new split hint spec. See Split Hint spec in the Druid documentation for more details.
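
As a rough sketch, the hint is specified in the tuningConfig of the parallel index task. The field names below follow the Split Hint spec documentation for the size-based hint; the size value is illustrative only:

  "tuningConfig": {
    "type": "index_parallel",
    "splitHintSpec": {
      "type": "maxSize",
      "maxSplitSize": 500000000
    }
  }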

Transform is now an extension point

Transform is an interface that represents a transformation applied to each row at ingestion time. This interface is now an extension point. See Writing your own extensions for how to add your own custom Transform.

chunkPeriod query context is removed

chunkPeriod has been deprecated since 0.14.0 because of its limited usefulness (it was mainly helpful only for groupBy v1). This query context parameter has been removed in 0.18.0.

Experimental support for Java 11

Druid now experimentally supports Java 11. The same Druid binary distribution, which is compiled with Java 8, can be run with Java 11. Our tests on Travis include:

Performance testing results are not available yet.

Warnings for illegal reflective accesses when running Druid with Java 11

Since Java 9, the JVM issues a warning when it finds that a library uses reflection to illegally access internal APIs of the JDK. These warnings will be addressed in future releases by modifying Druid code or upgrading library versions. For now, the warnings can be suppressed by adding JVM options such as --add-opens or --add-exports. See the JDK 11 Migration Guide for more details.

Some of the warnings are:

2020-01-22T21:30:08,893 WARN [main] org.apache.druid.java.util.metrics.AllocationMetricCollectors - Cannot initialize org.apache.druid.java.util.metrics.AllocationMetricCollector
java.lang.reflect.InaccessibleObjectException: Unable to make public long[] com.sun.management.internal.HotSpotThreadImpl.getThreadAllocatedBytes(long[]) accessible: module jdk.management does not "exports com.sun.management.internal" to unnamed module @6955cb39

This warning can be suppressed by adding --add-exports jdk.management/com.sun.management.internal=ALL-UNNAMED.

2020-01-22T21:30:08,902 WARN [main] org.apache.druid.java.util.metrics.JvmMonitor - Cannot initialize GC counters. If running JDK11 and above, add `--add-exports java.base/jdk.internal.perf=ALL-UNNAMED` to the JVM arguments to enable GC counters.

This warning can be suppressed by adding --add-exports java.base/jdk.internal.perf=ALL-UNNAMED.

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

This warning can be suppressed by adding --add-opens java.base/java.lang=ALL-UNNAMED.

Security vulnerability updates

[CVE-2020-1958] Apache Druid LDAP injection vulnerability

CVE-2020-1958 has been reported recently and fixed in 0.18.0 and 0.17.1. When LDAP authentication is enabled, callers of Druid APIs can bypass the credentialsValidator.userSearch filter barrier or retrieve any LDAP attribute values of users that exist on the LDAP server, so long as that information is visible to the Druid server. Please see the description in the link for more details. It is strongly recommended to upgrade to 0.18.0 or 0.17.1 if you are using LDAP authentication with Druid.

Updating Kafka client to 2.2.2

The Kafka client library has been updated to 2.2.2, which fixes CVE-2019-12399.

Pivot highlights

Role-based connections to Druid

With Imply 3.3, it is now possible to link basic authentication in Druid to Pivot roles. User roles can now be configured with an optional auth token, which will be applied to all Druid queries. This allows Pivot to be set up to conditionally expose or hide data sources in a cluster based on a user’s role.

Other changes and bug fixes

Improvements

Bug fixes

Clarity highlights

Clarity re-architecture

Clarity and Pivot have now converged into a single process and a single UI, enabling Clarity to leverage all Pivot features in the future. For more information, see Monitoring.

AccessClarity user permission added

The new AccessClarity permission allows administrators to control access to the Clarity UI embedded in Pivot. With this permission (and when user authorization is enabled in Pivot), users can access all features of the self-hosted Clarity UI at https://:9095/clarity. See Monitoring for more information.

Bug fixes

Imply Manager highlights

Azure and GCS support

You can now set up Microsoft Azure and Google Cloud Storage (GCS) as deep storage systems from the Imply Manager. For more information, see Deep Storage in the planning documentation.

Other changes and bug fixes

The self-hosted Manager can now apply updates to existing server configurations programmatically through Helm charts.

Bug fixes

Upgrading from previous releases

If you are upgrading from a previous Imply release, please take note of the following sections.

Druid upgrade notes

Be aware of the following changes between 0.17.1 and 0.18.0 before upgrading. If you're upgrading from a version earlier than 0.17.1, please also see the release notes of the relevant intermediate versions.

Core extension for Azure

The Azure storage extension has been promoted to a core extension. It now also supports cleanup of stale task logs and segments. When deploying 0.18.0, please ensure that your extensions-contrib directory does not contain any older version of the druid-azure-extensions extension.

Google Storage extension

The Google Storage extension now supports cleanup of stale task logs and segments. When deploying 0.18.0, please ensure that your extensions-contrib directory does not contain any older version of the druid-google-extensions extension.

Hadoop AWS library included in binary distribution

The Hadoop AWS library is now included in the binary distribution for a better out-of-the-box experience. When deploying 0.18.0, please ensure that your hadoop-dependencies directory, and any other directory on the classpath, does not contain duplicate copies of this library.

PostgreSQL JDBC driver for Lookups included in binary distribution

The PostgreSQL JDBC driver for lookups is now included in the binary distribution for a better out-of-the-box experience. When deploying 0.18.0, please ensure that your extensions/druid-lookups-cached-single directory, and any other directory on the classpath, does not contain duplicate JDBC drivers.

Known issues

Kafka streaming ingestion fails for Avro and other data formats

Fixed in Druid 0.18.1 and Imply 3.3.1. Kafka streaming ingestion does not work for data in Avro or any format other than CSV or JSON; Imply throws an exception when it attempts to parse Kafka streams in any other format.

Query failure with topN or groupBy on scan with multi-valued columns

Query inlining in Brokers is newly introduced in 0.18.0, but it has a bug: queries with a topN or groupBy on top of a scan fail if the scan query selects multi-valued dimensions. See https://github.com/apache/druid/issues/9697 for more details.

Misleading segment/unavailable/count metric during handoff

This metric is supposed to also take into account the number of segments served by realtime tasks, but it currently does not. As a result, unavailability appears to spike before new segments are loaded by historicals, even though all segments are actually continuously available on some combination of realtime tasks and historicals.

Slight difference between the result of explain plan for query and the actual execution plan

The result of explain plan for can be slightly different from what Druid actually executes when the query includes joins or subqueries: in the explain plan for result, each part of the query plan is represented as if it were its own native query. For example, for a join of a datasource d1 and a groupBy subquery on datasource d2 (a sample query of this shape appears after the diagrams below), explain plan for could return a plan like the following:

     join
    /    \
scan    groupBy
 |        |
d1       d2

whereas the actual query plan Druid would execute is:

     join
    /    \
  d1    groupBy
          |
         d2
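
For illustration, a query of this shape (datasource d1 joined to a groupBy subquery on datasource d2, using a hypothetical column named dim) could be written as:

  EXPLAIN PLAN FOR
  SELECT d1.dim, s.cnt
  FROM d1
  INNER JOIN (SELECT dim, COUNT(*) AS cnt FROM d2 GROUP BY dim) s
    ON d1.dim = s.dim
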
Other known issues

For a full list of open issues, please see https://github.com/apache/druid/labels/Bug

Upgrading from earlier releases

When upgrading from Imply 3.2, which is based on Apache Druid 0.17.0, please also take note of the items in the "Updating from previous releases" section of the Imply 3.2 release notes, and consider any that are relevant to your deployment.

Changes in 3.3.0.2

Druid changes

Changes in 3.3.1

Pivot changes

Druid changes

Changes in 3.3.1.1

Pivot changes

Changes in 3.3.1.2

Druid changes

Changes in 3.3.2.1

Pivot changes

Druid changes

Changes in 3.3.3

Pivot changes

Druid changes
