Imply 3.4.10 includes the following packages:
The Imply download includes a 30-day evaluation license of Pivot. Full licenses are included with Imply subscriptions. Contact us to learn more!
Vectorized query engines for GroupBy and Timeseries queries were introduced in Druid 0.16 as opt-in features. Since then we have extensively tested these engines and feel that the time has come for these improvements to find a wider audience. Not all of the query engine is vectorized yet, but with this change any eligible query is vectorized by default. If you encounter any problems, you can still disable the feature by setting druid.query.vectorize to false.
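The cluster-wide default is the druid.query.vectorize runtime property. Druid's query context reference also describes a per-query vectorize parameter (false, true, or force). The minimal Python sketch below assumes that parameter and a hypothetical datasource named wikipedia. It builds a native Timeseries query that opts only that one query out of vectorization, which can help confirm whether a problem is vectorization-related before you change the cluster default.

```python
import json


def timeseries_query(datasource: str, interval: str, vectorize: str = "false") -> dict:
    """Build a native Timeseries query that sets the vectorize context flag explicitly."""
    if vectorize not in ("false", "true", "force"):
        raise ValueError(f"unsupported vectorize value: {vectorize!r}")
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "day",
        "aggregations": [{"type": "count", "name": "rows"}],
        # Per-query override of the cluster-wide druid.query.vectorize default (assumed parameter).
        "context": {"vectorize": vectorize},
    }


# "wikipedia" is a hypothetical datasource name.
query = timeseries_query("wikipedia", "2020-01-01/2020-01-08")
print(json.dumps(query, indent=2))
```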
New in Druid 0.19.0-iap, native batch indexing supports files encoded in the Apache Avro Object Container Format (OCF), allowing batch ingestion of Avro data without an external Hadoop cluster. Check out the docs for more details.
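As a sketch, the ioConfig of a native batch (index_parallel) task that reads Avro OCF files from local disk might look like the following. The avro_ocf input format type and the local input source fields reflect our reading of the Druid docs and should be verified there. The directory path and the helper function are hypothetical.

```python
import json


def avro_ocf_ioconfig(base_dir: str, file_filter: str = "*.avro") -> dict:
    """Build an index_parallel ioConfig that reads Avro OCF files from a local directory."""
    return {
        "type": "index_parallel",
        "inputSource": {"type": "local", "baseDir": base_dir, "filter": file_filter},
        # OCF files embed the writer schema in the file header, so no schema registry is configured here.
        "inputFormat": {"type": "avro_ocf"},
        "appendToExisting": False,
    }


print(json.dumps(avro_ocf_ioconfig("/data/events"), indent=2))
```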
An SqlInputSource has been added in Druid 0.19.0-iap to work with the native batch ingestion specifications first introduced in Druid 0.17-iap, deprecating the SqlFirehose. Like the SqlFirehose, it currently supports MySQL and PostgreSQL, using the drivers from those extensions. This is a relatively low-level ingestion task, and the operator must take care to manually ensure that the correct data is ingested: either craft the queries so that no duplicate data is ingested for appends, or ensure that the entire set of data to be replaced is queried when overwriting. See the docs for more operational details.
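One way to avoid ingesting duplicates when appending is to split the source table into non-overlapping, half-open time ranges. The Python sketch below assumes the SqlInputSource JSON shape (type sql, a database object with a connectorConfig, and a list of sqls) as we understand it from the Druid docs. The table, column, and connection details are hypothetical, and the MySQL extension must be loaded for a mysql database type to work.

```python
from datetime import date, timedelta
from typing import List


def daily_sqls(table: str, ts_column: str, start: date, end: date) -> List[str]:
    """One SELECT per day over half-open [day, next day) ranges, so no row is read twice."""
    sqls = []
    day = start
    while day < end:
        next_day = day + timedelta(days=1)
        # Identifiers and dates are interpolated directly: only use trusted, fixed values.
        sqls.append(
            f"SELECT * FROM {table} "
            f"WHERE {ts_column} >= '{day.isoformat()}' AND {ts_column} < '{next_day.isoformat()}'"
        )
        day = next_day
    return sqls


def sql_input_source(connect_uri: str, user: str, password: str, sqls: List[str]) -> dict:
    """Wrap the queries in a MySQL-backed SqlInputSource definition (assumed field names)."""
    return {
        "type": "sql",
        "database": {
            "type": "mysql",
            "connectorConfig": {"connectURI": connect_uri, "user": user, "password": password},
        },
        "sqls": sqls,
    }


# Hypothetical table, column, and connection details.
queries = daily_sqls("events", "created_at", date(2020, 7, 1), date(2020, 7, 4))
source = sql_input_source("jdbc:mysql://db.example.com:3306/app", "druid", "secret", queries)
```

Because each range excludes its upper bound, a row stamped exactly at midnight belongs to exactly one query.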
REGEXP_LIKE
A new REGEXP_LIKE function has been added to Druid SQL and native expressions. It behaves similarly to LIKE, except that the pattern is a regular expression.
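The Python sketch below contrasts the two kinds of pattern. It assumes, as we read the Druid SQL docs, that a REGEXP_LIKE pattern may match anywhere in the value (use ^ and $ for a full match), whereas LIKE must match the whole value. Python's re module stands in for Druid's Java regular expressions, and the two are close but not identical.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern (% and _ wildcards, no ESCAPE clause) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def sql_like(value: str, pattern: str) -> bool:
    # LIKE must match the entire value.
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None


def regexp_like(value: str, pattern: str) -> bool:
    # Assumed REGEXP_LIKE semantics: the regex may match anywhere in the value.
    return re.search(pattern, value) is not None


print(sql_like("druid-0.19.0", "druid%"))        # True
print(sql_like("druid-0.19.0", "0.19"))          # False: LIKE needs a whole-string match
print(regexp_like("druid-0.19.0", r"\d+\.\d+"))  # True
print(regexp_like("druid-0.19.0", r"^\d"))       # False: anchored to the start
```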
A new Coordinator API makes it easier to determine whether the latest published segments are available for querying. It is similar to the existing Coordinator loadstatus API, but it is datasource-specific, can be limited to an interval, and can optionally force a refresh of the metadata store snapshot to get the most up-to-date information. Operators should still exercise caution when using this API to query large numbers of segments, especially when forcing a metadata refresh, because it can be a "heavy" call on large clusters.
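A minimal Python sketch of calling this API follows. The endpoint path /druid/coordinator/v1/datasources/{dataSourceName}/loadstatus and the forceMetadataRefresh and interval query parameters are our understanding of the Druid API reference and should be checked against your version. The Coordinator address and datasource are hypothetical.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote, urlencode


def loadstatus_url(coordinator: str, datasource: str, interval: Optional[str] = None,
                   force_metadata_refresh: bool = False) -> str:
    """Build the datasource-specific loadstatus URL (assumed path and parameter names)."""
    path = (f"{coordinator.rstrip('/')}/druid/coordinator/v1/datasources/"
            f"{quote(datasource, safe='')}/loadstatus")
    # Keep the refresh off by default: forcing it can be expensive on large clusters.
    params = {"forceMetadataRefresh": str(force_metadata_refresh).lower()}
    if interval is not None:
        params["interval"] = interval
    return f"{path}?{urlencode(params)}"


def fetch_loadstatus(url: str) -> dict:
    """GET the URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


url = loadstatus_url("http://coordinator.example.com:8081", "wikipedia",
                     interval="2020-07-01/2020-07-02")
# status = fetch_loadstatus(url)  # requires a reachable Coordinator
```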
Part bug fix, part new feature: Druid native batch (once again) supports appending new data to existing time chunks when those time chunks were partitioned with the hash or range partitioning algorithms. Currently, the appended segments support only dynamic partitioning, and if you roll back to an older version, Druid will not recognize these appended segments after the downgrade. Before rolling back, compact the appended segments with the rest of the time chunk so that it has a homogeneous partitioning scheme.
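As a sketch, an append into such a time chunk combines appendToExisting with a dynamic partitionsSpec. The field names below follow the native batch (index_parallel) spec as we understand it. The helper is hypothetical and produces only the ioConfig and tuningConfig portions of a full task spec.

```python
import json


def append_spec_fragment(input_source: dict, input_format: dict) -> dict:
    """ioConfig/tuningConfig fragment for appending to hash- or range-partitioned time chunks."""
    return {
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": input_source,
            "inputFormat": input_format,
            # Add new segments alongside the existing ones instead of overwriting the time chunk.
            "appendToExisting": True,
        },
        "tuningConfig": {
            "type": "index_parallel",
            # Appended segments currently support only dynamic partitioning.
            "partitionsSpec": {"type": "dynamic"},
        },
    }


fragment = append_spec_fragment(
    {"type": "local", "baseDir": "/data/new", "filter": "*.json"},  # hypothetical path
    {"type": "json"},
)
print(json.dumps(fragment, indent=2))
```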
Previously, lookup tables were the only native table type that supported direct joins, that is, local joins on data that is available on all query processing nodes. Lookups, however, are limited to a single key column and a single value column. The Imply distribution of Druid introduces a new type of globally distributed table in 0.19.0-iap: the indexed table.
Indexed tables are multi-column tables that expand what is possible with efficient direct joins on globally distributed data. Indexed tables are backed by Druid segments and distributed across the cluster with broadcast load rules. The segments are created with some additional information that tells Druid how to load the table and which columns are the joinable key columns.
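To illustrate why multiple columns plus indexed key columns matter, here is a conceptual Python sketch of a table with a hash index per key column, joined locally against rows on the same node. It is not Druid's implementation; the class, function, and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class IndexedTable:
    """Conceptual multi-column table with a hash index on each declared key column."""

    def __init__(self, columns: Sequence[str], rows: List[Tuple], key_columns: Sequence[str]):
        self.columns = list(columns)
        self.rows = rows
        self.indexes: Dict[str, Dict[object, List[int]]] = {}
        for key in key_columns:
            pos = self.columns.index(key)
            index = defaultdict(list)
            for row_id, row in enumerate(rows):
                index[row[pos]].append(row_id)  # key value -> row positions
            self.indexes[key] = dict(index)

    def lookup(self, key_column: str, value) -> List[Dict[str, object]]:
        """Return every row whose key_column equals value, as column -> value dicts."""
        row_ids = self.indexes[key_column].get(value, [])
        return [dict(zip(self.columns, self.rows[i])) for i in row_ids]


def direct_join(left_rows, left_key: str, table: IndexedTable, right_key: str):
    """Inner-join local rows against a locally available table via its index."""
    out = []
    for row in left_rows:
        for match in table.lookup(right_key, row[left_key]):
            out.append({**row, **match})
    return out


countries = IndexedTable(
    columns=["code", "name", "continent"],
    rows=[("FR", "France", "Europe"), ("JP", "Japan", "Asia")],
    key_columns=["code"],
)
events = [{"page": "Paris", "country": "FR"},
          {"page": "Tokyo", "country": "JP"},
          {"page": "Nowhere", "country": "ZZ"}]
joined = direct_join(events, "country", countries, "code")
```

Unlike a lookup, which maps one key to one value, each match here carries every column of the table row.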
For more information, see Druid indexed tables (alpha) in the Imply knowledge base.
Druid 0.19.0-iap contains 65 bug fixes; you can see the complete list here.
Druid 0.19.0-iap fixes an important query correctness issue: dynamically partitioned segments produced by a batch ingestion task did not track the overall number of partitions. As a result, these segments did not come online as a complete set but as individual segments, so there could be periods of swapping during which queries returned results from a mix of segment versions within a time chunk.
Prior to 0.19.0-iap, Druid had a bug with hash or range partitioning: if data skew left any of the buckets empty after ingestion, the partitions were never recognized as complete and so never became queryable. Druid 0.19.0-iap fixes this issue by adjusting the schema of the partitioning spec. These changes to the JSON format should be backwards compatible; however, rolling back to a previous version will make these segments unqueryable again.
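The conceptual Python model below (a hypothetical function, not Druid's code or its actual shard spec format) illustrates the completeness idea behind both fixes. A time chunk should only swap in once its complete set of partitions is present, so the size of that set must be tracked, and empty buckets must not be counted toward it. How Druid actually changes the partitioning spec is not shown.

```python
from typing import Iterable


def chunk_is_complete(present_ids: Iterable[int], core_partitions: int) -> bool:
    """Conceptual rule: a time chunk is queryable only once partitions 0..N-1 are all present."""
    return set(range(core_partitions)) <= set(present_ids)


# Rows per hash bucket; bucket 2 is empty because of data skew (hypothetical numbers).
rows_per_bucket = {0: 120, 1: 95, 2: 0, 3: 40}

# Buggy model: partition id == bucket id, and the expected count includes the empty bucket.
before_ids = [b for b, rows in rows_per_bucket.items() if rows > 0]   # [0, 1, 3]
before_ok = chunk_is_complete(before_ids, len(rows_per_bucket))       # partition 2 never arrives

# Fixed model: only non-empty buckets are counted and numbered as partitions.
non_empty = [b for b, rows in sorted(rows_per_bucket.items()) if rows > 0]
after_ids = list(range(len(non_empty)))                               # [0, 1, 2]
after_ok = chunk_is_complete(after_ids, len(non_empty))

print(before_ok, after_ok)  # False True
```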
A bug in on-prem Druid versions prior to 0.19.0-iap allowed the Coordinator to operate (incorrectly) when druid.server.maxSize was not set. If none of the Historicals had this value set, the bug would let segments load but balance them effectively at random across the cluster, regardless of the balancer strategy actually configured. This bug has been fixed, but as a result druid.server.maxSize must now be set to the sum of the segment cache location sizes on Historicals, or they will not load segments. No action is needed if you are using Imply Cloud.
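For example, a Historical's druid.server.maxSize can be derived from its druid.segmentCache.locations setting, a JSON list of path and maxSize entries. The minimal Python sketch below assumes every maxSize is a plain byte count; it does not parse other size notations. The paths and sizes are hypothetical.

```python
import json


def max_size_from_locations(locations_json: str) -> int:
    """Sum maxSize (bytes) over the entries of druid.segmentCache.locations."""
    locations = json.loads(locations_json)
    return sum(int(loc["maxSize"]) for loc in locations)


# Hypothetical two-disk Historical: 300 GB + 200 GB of segment cache.
locations = ('[{"path": "/mnt/disk1/segments", "maxSize": 300000000000},'
             ' {"path": "/mnt/disk2/segments", "maxSize": 200000000000}]')
print(f"druid.server.maxSize={max_size_from_locations(locations)}")
```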
Imply 3.4 introduces the ability to define data cubes using SQL expressions, allowing users to more easily define advanced dimension extractions and measure aggregates without using the Plywood expression language.
This is an alpha-grade feature and should not be used in production environments. For more information on enabling this feature for testing purposes, please refer to the Knowledge Base article.
Additionally, in Imply 3.4, we are introducing the capability to add event annotations to time ranges on data cubes, allowing users to easily annotate real-world events like software releases or advertising campaigns against changes in data cube metrics.
This is also an alpha-grade feature and should not be used in production environments. For more information on enabling this feature for testing purposes, please refer to the Knowledge Base article.
DATASET instead of TIME operand to timeShift or other time plywood function causes a UI crash
serverRoot config setting is applied
Behind-the-scenes monitoring improvements enhance the stability of Imply Cloud operations. The Imply field team can now proactively discover and address conditions in the Amazon RDS metadata store that may lead to incidents or downtime in your Druid clusters before those incidents occur. See Monitoring the metadata store with Cloudwatch for more information.
The Imply Manager Helm chart makes it easier to deploy a distributed Imply cluster over Kubernetes. In 3.4, the Imply Manager Helm chart has been enhanced with the following features:
labels field. See the Deploy with Kubernetes documentation for more information about the Imply Helm chart.
extraVolumeClaimTemplates field. For more information, see Adding another volume claim.
If you are upgrading from a previous Imply release, please take note of the following sections.
Be aware of the following changes between 0.18.1-iap and 0.19.0-iap before upgrading. If you're updating from a version earlier than 0.18.1-iap, please see the release notes of the relevant intermediate versions.
As a side effect of a Coordinator bug fix, druid.server.maxSize must now be set for segments to be loaded. No action is needed if you are using Imply Cloud. If you are using on-prem Imply, ensure that the setting is configured correctly before upgrading your clusters, or segments will not be loaded. See Segment Cache Size in the Druid documentation for more information.
dimensions, metrics, and shardSpec
The payload column has been removed from the sys.segments table, which should make queries on this table much more efficient. The most useful fields, the list of dimensions, the list of metrics, and the shardSpec, have been split out into their own columns and remain available for queries.
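A minimal Python sketch of reading the split-out columns through the Druid SQL HTTP endpoint follows. The sys.segments column names (segment_id, datasource, dimensions, metrics, shard_spec) and the /druid/v2/sql path are our understanding of the Druid SQL docs; the Broker address and datasource are hypothetical.

```python
import json
import urllib.request

SQL = (
    "SELECT segment_id, dimensions, metrics, shard_spec "
    "FROM sys.segments WHERE datasource = 'wikipedia' LIMIT 10"
)


def sql_request(broker: str, query: str) -> urllib.request.Request:
    """Build a POST request carrying a Druid SQL query as a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{broker.rstrip('/')}/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = sql_request("http://broker.example.com:8082", SQL)
# with urllib.request.urlopen(req) as resp:  # requires a reachable Broker
#     rows = json.load(resp)
```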
The default value of druid.segmentCache.numLoadingThreads has changed from the number of cores to the number of cores divided by 6. This should improve Historical behavior out of the box when loading a large number of segments, limiting the impact on query performance.
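To see what the new default means for a given machine, the Python sketch below computes it. It assumes integer division with a floor of one thread, which is our reading of the Druid configuration reference; the notes above only say "divided by 6".

```python
import os


def default_loading_threads(cores: int) -> int:
    # Assumption: integer division, never below one loading thread.
    return max(1, cores // 6)


cores = os.cpu_count() or 1  # cpu_count() can return None
print(f"druid.segmentCache.numLoadingThreads={default_loading_threads(cores)}")
```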
0.19.0-iap adds a number of incomplete changes toward more efficient join queries. The idea is to use broadcast load rules to propagate smaller datasources across the cluster so that join operations can be pushed down to individual segment processing. Although the feature is not finished yet, as part of these changes 'broadcast' load rules no longer have the concept of 'colocated datasources', which attempted to broadcast segments only to servers that had segments of the configured datasource. This did not work well in practice because it was non-atomic: the broadcast segments would lag behind loads and drops of the colocated datasource, so we decided to remove it.
As another effect of this preliminary work on efficient broadcast joins, Brokers and realtime indexing tasks now load segments covered by broadcast rules if a segment cache is configured. Because the feature is not complete, there is little reason to do this in 0.19.0-iap, and it will not happen unless explicitly configured.
lpad and rpad function behavior change
The lpad and rpad functions have undergone a slight behavior change in Druid's default (non-SQL-compatible) mode so that they behave consistently with PostgreSQL. In the new behavior, if the pad expression is an empty string, the result is the (possibly trimmed) original string, rather than the empty string being treated as null and coercing the result to null.
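The Python sketch below models the PostgreSQL-style semantics described above. It is an illustration, not Druid's implementation: null inputs are not modeled, and the result for a non-positive length follows PostgreSQL as an assumption.

```python
def _fill(pad: str, needed: int) -> str:
    """Repeat pad and cut it to exactly `needed` characters."""
    return (pad * (needed // len(pad) + 1))[:needed]


def lpad(s: str, length: int, pad: str) -> str:
    if length <= 0:
        return ""
    if len(s) >= length:
        return s[:length]  # trimmed: keep the leftmost `length` characters
    if pad == "":
        return s  # empty pad: return the original string instead of null
    return _fill(pad, length - len(s)) + s


def rpad(s: str, length: int, pad: str) -> str:
    if length <= 0:
        return ""
    if len(s) >= length:
        return s[:length]
    if pad == "":
        return s
    return s + _fill(pad, length - len(s))


print(lpad("hi", 5, "xy"))   # xyxhi
print(rpad("hi", 5, "xy"))   # hixyx
print(lpad("hi", 5, ""))     # hi
print(lpad("hello", 3, ""))  # hel
```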
For a full list of open issues, please see https://github.com/apache/druid/labels/Bug
If you are using an OIDC authentication provider with Pivot, such as Okta, you need to change your OIDC configuration.
The format of the issuer field has changed. Previously, the documentation indicated that the value should include /oauth2/default. Instead, set the issuer value to the base URL only, without a trailing slash. /.well-known/openid-configuration is automatically appended to this URL. In the rare case that the OpenID Configuration Document for your provider is not at this address, you can use the newly introduced discoveryUrl field to provide the exact URL of the OpenID Configuration Document.
See Using OIDC in Pivot for more information.
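The rule above can be summarized in a few lines of Python. This hypothetical helper only models the documented behavior; it is not Pivot's code, and the provider URLs are placeholders.

```python
from typing import Optional

WELL_KNOWN = "/.well-known/openid-configuration"


def openid_config_url(issuer: str, discovery_url: Optional[str] = None) -> str:
    """Return where the OpenID Configuration Document is expected to live."""
    if discovery_url:
        return discovery_url  # an explicit discoveryUrl wins
    if issuer.endswith("/"):
        raise ValueError("issuer must be the base URL without a trailing slash")
    return issuer + WELL_KNOWN


print(openid_config_url("https://example.okta.com"))
```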
When upgrading from Imply 3.3, which is based on Apache Druid 0.18.0, also note any items in the "Updating from previous releases" section of the Imply 3.3 release notes that may be relevant for your deployment.
As of July 15, 2020, Imply version 2.x is no longer supported. If you still have active deployments that use Imply version 2.x, you are strongly encouraged to upgrade to the current version as soon as possible. See Subscription Support Maintenance Terms for more information about supported versions.
forceLimitPushDown in SQL (#10253)
IncrementalIndexes (#10248)
IN filters (#10119) (#10313) (#10312)
ExpressionFilter (#10320)
PreparedStatement (#10272)
CombiningFirehose compatibility (#10264)
StringDimensionIndexer (#10245)
StringGroupByColumnSelectorStrategy/bufferComparator (#10325)
stringFirst aggregator (#10351)
comment, created_date and remote_address to audit log entry (#10372)
sqlQueryContext to DefaultRequestLogEvent (#10368)
stringFirst/stringLast crashes at aggregation time (#10332)
headersTimeout as a Pivot config property