Extensions
Druid implements an extension system that lets you add functionality at runtime. Extensions are commonly used to add support for deep storage (such as HDFS and S3), metadata stores (such as MySQL and PostgreSQL), new aggregators, and new input formats.
Imply bundles many commonly used Druid extensions, including core extensions, out of the box. Production clusters generally use at least two extensions: one for deep storage and one for a metadata store.
Not all Druid core extensions are packaged with or intended for use with Imply. For example, the Apache Druid pac4j extension is not supported.
Imply bundled extensions
Each Imply distribution loads a set of Druid extensions by default. Additional extensions are loaded dynamically based on the cluster configuration and the selections you make in the UI.
Default extensions
The following table lists extensions that are loaded by default with Imply distributions (version 4.0.0 or higher). The link on each extension name takes you to the documentation for that extension.
Name | Description |
---|---|
druid-datasketches | Support for approximate counts and set operations with DataSketches. |
druid-histogram | Approximate histograms and quantiles aggregator. This extension is deprecated. Use the DataSketches quantiles aggregator from the druid-datasketches extension instead. |
druid-kafka-indexing-service | Supervised exactly-once Kafka ingestion for the indexing service. |
druid-kinesis-indexing-service | Supervised exactly-once Kinesis ingestion for the indexing service. |
druid-lookups-cached-global | A module that provides JVM-global eager caching for lookups, with JDBC and URI implementations for fetching lookup data. |
Conditional extensions
The following table lists conditional extensions that are loaded dynamically based on the cluster’s configuration. Imply distributions (version 4.0.0 or higher) automatically load these extensions when the associated conditions are satisfied. The link on each extension name takes you to the documentation for that extension, when available.
Name | Description | Condition |
---|---|---|
druid-basic-security | Support for Basic HTTP authentication and role-based access control. | Customer enabled authentication for their cluster. |
druid-hdfs-storage | HDFS deep storage. | Customer uses HDFS for deep storage. |
druid-s3-extensions | A module for interfacing with data in AWS S3 and using S3 as deep storage. | Customer uses AWS S3 for deep storage. |
clarity-emitter | A module for pushing metrics to Imply's Clarity service. | Customer configured a Clarity account. |
imply-druid-security | Druid authentication and authorization extension for Imply Cloud. For Imply internal use only. | Customer enabled authentication for their cluster. |
mysql-metadata-storage | MySQL metadata store. | Customer uses a MySQL-type metadata store. |
postgresql-metadata-storage | PostgreSQL metadata store. | Customer uses a PostgreSQL-type metadata store. |
simple-client-sslcontext | Simple SSLContext provider module to be used by Druid's internal HttpClient when talking to other Druid processes over HTTPS. | Customer enabled TLS for their cluster. |
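To make the condition column concrete, the following minimal Python sketch maps a cluster description to the conditional extensions it would trigger, following the table above. It is an illustration only, not how Imply Manager decides what to load: the ClusterConfig fields and the function name conditional_extensions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model of the conditions in the table above; these field
# names are illustrative and are not Imply Manager settings.
@dataclass
class ClusterConfig:
    deep_storage: Optional[str] = None    # e.g. "hdfs" or "s3"
    metadata_store: Optional[str] = None  # e.g. "mysql" or "postgresql"
    authentication: bool = False
    tls: bool = False
    clarity_account: bool = False

DEEP_STORAGE_EXTENSIONS = {"hdfs": "druid-hdfs-storage", "s3": "druid-s3-extensions"}
METADATA_STORE_EXTENSIONS = {
    "mysql": "mysql-metadata-storage",
    "postgresql": "postgresql-metadata-storage",
}

def conditional_extensions(cfg: ClusterConfig) -> List[str]:
    """Return the conditional extensions whose conditions cfg satisfies."""
    exts: List[str] = []
    if cfg.authentication:
        # The table lists two extensions for the authentication condition.
        exts += ["druid-basic-security", "imply-druid-security"]
    if cfg.deep_storage in DEEP_STORAGE_EXTENSIONS:
        exts.append(DEEP_STORAGE_EXTENSIONS[cfg.deep_storage])
    if cfg.metadata_store in METADATA_STORE_EXTENSIONS:
        exts.append(METADATA_STORE_EXTENSIONS[cfg.metadata_store])
    if cfg.clarity_account:
        exts.append("clarity-emitter")
    if cfg.tls:
        exts.append("simple-client-sslcontext")
    return exts

print(conditional_extensions(ClusterConfig(deep_storage="hdfs", metadata_store="postgresql")))
# ['druid-hdfs-storage', 'postgresql-metadata-storage']
```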
Optional extensions
Imply distributions include optional extensions that can be added to a cluster node to enhance functionality. The following table lists optional extensions packaged with Imply distributions (version 4.0.0 or higher). The link on each extension name takes you to the documentation for that extension, when available.
Name | Description |
---|---|
druid-avro-extensions | Support for data in Apache Avro data format. |
druid-azure-extensions | Microsoft Azure deep storage. |
druid-bloom-filter | Support for providing Bloom filters in Druid queries. |
druid-google-extensions | Google Cloud Storage deep storage. |
druid-kafka-extraction-namespace | Kafka-based namespaced lookup. Requires the namespace lookup extension. |
druid-kerberos | Kerberos authentication for Druid processes. |
druid-orc-extensions | Support for data in Apache ORC data format. |
druid-parquet-extensions | Support for data in Apache Parquet data format. Requires druid-avro-extensions to be loaded. |
druid-protobuf-extensions | Support for data in Protobuf data format. |
druid-stats | Statistics-related module, including variance and standard deviation. |
imply-utility-belt | A module to parse the CloudWatch log container format. For Imply internal use only. |
indexed-table-loader | Joinable indexed tables (alpha). |
druid-google-extensions and druid-azure-extensions are loaded dynamically when a customer uses Google Cloud Storage or Azure Storage, respectively.
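Two optional extensions in the table above depend on other extensions. If you assemble the load list yourself, a minimal Python sketch of a pre-flight check might look like the following. The helper name missing_dependencies is hypothetical, and treating druid-lookups-cached-global as "the namespace lookup extension" is an assumption to confirm against the Druid documentation. Because druid-lookups-cached-global is one of Imply's default extensions, the also_loaded parameter lets you account for extensions that are loaded without being listed.

```python
from typing import Dict, Iterable, List

# Dependencies from the optional extensions table. Treating
# druid-lookups-cached-global as the "namespace lookup extension"
# is an assumption.
REQUIRES: Dict[str, List[str]] = {
    "druid-parquet-extensions": ["druid-avro-extensions"],
    "druid-kafka-extraction-namespace": ["druid-lookups-cached-global"],
}

def missing_dependencies(
    load_list: Iterable[str], also_loaded: Iterable[str] = ()
) -> Dict[str, List[str]]:
    """Map each listed extension to the required extensions it is missing."""
    requested = list(load_list)
    available = set(requested) | set(also_loaded)
    missing: Dict[str, List[str]] = {}
    for ext in requested:
        needed = [dep for dep in REQUIRES.get(ext, []) if dep not in available]
        if needed:
            missing[ext] = needed
    return missing
```

An empty result means every listed extension has its known dependencies available.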
Loading bundled extensions with Imply Manager
You can manage bundled Druid extensions in Imply Manager. To view bundled extensions, open the cluster's Setup tab and go to Advanced config > Druid extensions.
Extensions loaded by default are grouped together under the Default option. You do not need to take any additional action to add them to your cluster.
To enable optional extensions, click the edit icon next to Druid extensions. A pop-up window opens and displays the optional extensions packaged with your Imply distribution.
Loading bundled extensions manually
If your Imply distribution does not include Imply Manager, you can load bundled extensions by adding their names to the druid.extensions.loadList parameter in the common.runtime.properties file. For example, to load the postgresql-metadata-storage and druid-hdfs-storage extensions, use the following configuration:
druid.extensions.loadList=["postgresql-metadata-storage", "druid-hdfs-storage"]
For more information on setting Druid configurations, see Configuration reference.
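The value of druid.extensions.loadList is a JSON array of extension names. If you generate configuration from a script, a minimal Python sketch that renders the line from the example above might look like this. The function name load_list_property is hypothetical; dropping duplicate names is a choice made for this sketch.

```python
import json
from typing import Iterable

def load_list_property(extensions: Iterable[str]) -> str:
    """Render a druid.extensions.loadList line for common.runtime.properties.

    The value is a JSON array; duplicates are dropped, first occurrence wins.
    """
    unique = list(dict.fromkeys(extensions))
    return "druid.extensions.loadList=" + json.dumps(unique)

print(load_list_property(["postgresql-metadata-storage", "druid-hdfs-storage"]))
# druid.extensions.loadList=["postgresql-metadata-storage", "druid-hdfs-storage"]
```

Using json.dumps rather than hand-built string concatenation keeps the quoting valid even if a name contains characters that need escaping.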
Druid core extensions
Core extensions are maintained by Druid committers. Some are in preview status and some are fully production-tested Druid components. For a complete list of core extensions, see Druid core extensions.
Druid community and third-party extensions
Imply does not provide support for community and third-party extensions.
You can also install community and third-party extensions not already bundled with Druid or the Imply distribution.
Community extensions are contributed by Druid community members. These extensions are not packaged with the default Druid tarball and are not maintained by Druid committers. For a list of available community extensions, see Druid community extensions.
Loading community extensions
You can download community extensions using Druid's pull-deps tool. To do so, pass the -c option followed by the Maven coordinate of the extension, in the form org.apache.druid.extensions.contrib:{EXTENSION_NAME}:{DRUID_VERSION}.
The version you provide should match the community Druid version that your Imply distribution is based on. For example, to install druid-time-min-max, run the following command:
java \
-cp "dist/druid/lib/*" \
-Ddruid.extensions.directory="dist/druid/extensions" \
-Ddruid.extensions.hadoopDependenciesDir="dist/druid/hadoop-dependencies" \
org.apache.druid.cli.Main tools pull-deps \
--no-default-hadoop \
-c "org.apache.druid.extensions.contrib:druid-time-min-max:{{druidVersion}}"
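If you install community extensions from a script, a minimal Python sketch that assembles the same pull-deps invocation as an argument list might look like this. The helper name pull_deps_command and the dist_dir default are assumptions derived from the command above; supply the Druid version your Imply distribution is based on.

```python
from typing import List

def pull_deps_command(extension: str, druid_version: str, dist_dir: str = "dist/druid") -> List[str]:
    """Build the pull-deps argument list for a community (contrib) extension."""
    coordinate = f"org.apache.druid.extensions.contrib:{extension}:{druid_version}"
    return [
        "java",
        "-cp", f"{dist_dir}/lib/*",  # passed literally; Java treats it as a classpath wildcard
        f"-Ddruid.extensions.directory={dist_dir}/extensions",
        f"-Ddruid.extensions.hadoopDependenciesDir={dist_dir}/hadoop-dependencies",
        "org.apache.druid.cli.Main", "tools", "pull-deps",
        "--no-default-hadoop",
        "-c", coordinate,
    ]
```

You could then run it with subprocess.run(pull_deps_command("druid-time-min-max", version), check=True), where version is that Druid version. Passing a list instead of a shell string means no shell expands the lib/* wildcard, which matches the effect of the quotes in the original command.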
In common.runtime.properties, add druid-time-min-max to druid.extensions.loadList to instruct Druid to load the extension.
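To script this edit, a minimal Python sketch could update the loadList line in place. It assumes the property occupies a single line, uses the = separator, and holds a valid JSON array; files that use line continuations or the : separator are not handled. The function name add_to_load_list is hypothetical.

```python
import json
import re

# Matches an uncommented, single-line druid.extensions.loadList entry.
LOAD_LIST_RE = re.compile(
    r"^([ \t]*druid\.extensions\.loadList[ \t]*=[ \t]*)(.*)$", re.MULTILINE
)

def add_to_load_list(properties_text: str, extension: str) -> str:
    """Return properties_text with extension added to druid.extensions.loadList."""
    match = LOAD_LIST_RE.search(properties_text)
    if match is None:
        # No loadList yet: append a new one on its own line.
        sep = "" if not properties_text or properties_text.endswith("\n") else "\n"
        return f"{properties_text}{sep}druid.extensions.loadList={json.dumps([extension])}\n"
    current = json.loads(match.group(2))
    if extension not in current:  # adding the same extension twice is a no-op
        current.append(extension)
    new_line = match.group(1) + json.dumps(current)
    return properties_text[: match.start()] + new_line + properties_text[match.end():]
```

All other lines in the file are returned unchanged.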
Loading third-party extensions
To load a third-party extension using Imply Manager, make the extension file available to Imply Manager from a filesystem location or at a URL. After making the extension file available, click Add custom extension and provide the extension name along with the URL or path to the extension file, using the manager:///<path-to-extension> addressing scheme. For more information on pushing custom user files to Imply Manager, see Add custom user files.
To install a third-party extension manually, download the extension and then install it into your dist/druid/extensions/ directory. You can download the extension directly from its distributor or, if it is available from Maven, use the included pull-deps tool. To use pull-deps, specify the full Maven coordinate of the extension in the form groupId:artifactId:version.
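Before passing a third-party coordinate to pull-deps, you might validate its shape. The following minimal Python sketch accepts only the three-part groupId:artifactId:version form described above and deliberately rejects Maven's longer forms (with packaging or classifier). MavenCoordinate and parse_coordinate are hypothetical names, and com.example:my-extension:1.0.0 is a made-up coordinate.

```python
from typing import NamedTuple

class MavenCoordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str

    def __str__(self) -> str:
        # Re-render in the groupId:artifactId:version form.
        return f"{self.group_id}:{self.artifact_id}:{self.version}"

def parse_coordinate(text: str) -> MavenCoordinate:
    """Split a groupId:artifactId:version string, rejecting any other shape."""
    parts = text.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected groupId:artifactId:version, got {text!r}")
    return MavenCoordinate(*parts)

coord = parse_coordinate("com.example:my-extension:1.0.0")
print(["-c", str(coord)])
# ['-c', 'com.example:my-extension:1.0.0']
```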