Skip to main content

Extensions

Druid implements an extension system that allows for adding functionality at runtime. Extensions are commonly used to add support for deep storage (HDFS and S3), metadata stores (MySQL and PostgreSQL), new aggregators, and new input formats.

Imply bundles many commonly used Druid extensions, including core extensions, out of the box. Production clusters generally use at least two extensions: one for deep storage and one for a metadata store.

Not all Druid core extensions are intended for use or packaged with Imply. For instance, the Apache Druid pac4j extension is not supported.

Imply bundled extensions

Each Imply distribution loads a certain set of Druid extensions by default. Additionally, some extensions are loaded dynamically based on the specific conditions and selections made in the UI.

Default extensions

The following table lists extensions that are loaded by default with Imply distributions (version 4.0.0 or higher). The link on each extension name takes you to the documentation for that extension.

NameDescription
druid-datasketchesSupport for approximate counts and set operations with DataSketches.
druid-histogramApproximate histograms and quantiles aggregator. This extension is deprecated. Use the DataSketches quantiles aggregator from the druid-datasketches extension instead.
druid-kafka-indexing-serviceSupervised exactly-once Kafka ingestion for the indexing service.
druid-kinesis-indexing-serviceSupervised exactly-once Kinesis ingestion for the indexing service.
druid-lookups-cached-globalA module for lookups providing a jvm-global eager caching for lookups. It provides JDBC and URI implementations for fetching lookup data.

Conditional extensions

The following table lists conditional extensions that are loaded dynamically based on the cluster’s configuration. Imply distributions (version 4.0.0 or higher) automatically load these extensions when the associated conditions are satisfied. The link on each extension name takes you to the documentation for that extension, when available.

NameDescriptionCondition
druid-basic-securitySupport for Basic HTTP authentication and role-based access control.Customer enabled authentication for their cluster.
druid-hdfs-storageHDFS deep storage.Customer uses HDFS for deep storage.
druid-s3-extensionsA module for interfacing with data in AWS S3 and using S3 as deep storage.Customer uses AWS S3 for deep storage.
clarity-emitterA module for pushing metrics to Imply's Clarity service.Customer configured a Clarity account.
imply-druid-securityDruid authentication and authorization extension for Imply Cloud. For Imply internal use only.Customer enabled authentication for their cluster.
mysql-metadata-storageMySQL metadata store.Customer uses a MySQL-type metadata store.
postgresql-metadata-storagePostgresql metadata store.Customer uses a Postgresql-type metadata store.
simple-client-sslcontextSimple SSLContext provider module to be used by Druid's internal HttpClient when talking to other Druid processes over HTTPS.Customer enabled TLS for their cluster.

Optional extensions

Imply distributions include optional extensions that can be added to a cluster node to enhance functionality. The following table lists optional extensions packaged with Imply distributions (version 4.0.0 or higher). The link on each extension name takes you to the documentation for that extension, when available.

NameDescription
druid-avro-extensionsSupport for data in Apache Avro data format.
druid-azure-extensionsMicrosoft Azure deep storage.
druid-bloom-filterSupport for providing Bloom filters in Druid queries.
druid-google-extensionsGoogle Cloud Storage deep storage.
druid-kafka-extraction-namespaceKafka-based namespaced lookup. Requires namespace lookup extension.
druid-kerberosKerberos authentication for druid processes.
druid-orc-extensionsSupport for data in Apache Orc data format.
druid-parquet-extensionsSupport for data in Apache Parquet data format. Requires druid-avro-extensions to be loaded.
druid-protobuf-extensionsSupport for data in Protobuf data format.
druid-statsStatistics related module including variance and standard deviation.
imply-timeseriesA module to enable time series functions.
imply-utility-beltA module to parse the CloudWatch log container format. For Imply internal use only.
indexed-table-loaderJoinable indexed tables (alpha).

druid-google-extensions and druid-azure-extensions are loaded dynamically when a customer uses Google Cloud Storage or Azure Storage respectively.

Loading bundled extensions with Imply Manager

You can manage bundled Druid extensions in Imply Manager. To view bundled extensions, open the cluster's Setup tab and go to Advanced config > Druid extensions.

Extensions loaded by default are grouped together under the Default option. You do not need to take any additional action to add them to your cluster.

To enable optional extensions, click the edit icon next to Druid extensions. This will open a pop-up window and display optional extensions packaged with your Imply distribution.

Imply Manager extensions

Loading bundled extensions manually

If your Imply distribution does not include Imply Manager, you can load bundled extensions by adding their names to the druid.extensions.loadList parameter of the common.runtime.properties file. For example, to load the postgresql-metadata-storage and druid-hdfs-storage extensions, use the following configuration:

druid.extensions.loadList=["postgresql-metadata-storage", "druid-hdfs-storage"]

For more information on setting Druid configurations, see Configuration reference.

Druid core extensions

Core extensions are maintained by Druid committers. Some are in preview status and some are fully production-tested Druid components. For a complete list of core extensions, see Druid core extensions.

Druid community and third-party extensions

Imply does not provide support for community and third-party extensions.

You can also install community and third-party extensions not already bundled with Druid or the Imply distribution.

Community extensions are contributed by Druid community members. These extensions are not packaged with the default Druid tarball and are not maintained by Druid committers. For a list of available community extensions, see Druid community extensions.

Loading community extensions

You can download community extensions using Druid's pull-deps tool. To do so, specify a -c extension coordinate to pull down, followed by a Maven coordinate: org.apache.druid.extensions.contrib:{EXTENSION_NAME}:{DRUID_VERSION}. The version you provide should match the community Druid version that your Imply distribution is based on. For example, to install druid-time-min-max, run the following command:

java \
-cp "dist/druid/lib/*" \
-Ddruid.extensions.directory="dist/druid/extensions" \
-Ddruid.extensions.hadoopDependenciesDir="dist/druid/hadoop-dependencies" \
org.apache.druid.cli.Main tools pull-deps \
--no-default-hadoop \
-c "org.apache.druid.extensions.contrib:druid-time-min-max:{{druidVersion}}"

In common.runtime.properties, add druid-time-min-max to druid.extensions.loadList to instruct Druid to load the extension.

Loading third-party extensions

To load a third-party extension using Imply Manager, make the extension file available to Imply Manager from a filesystem location or by a URL. After making the extension file available, click Add custom extension and provide the extension name along with the URL or path to the extension file using the manager:///<path-to-extension> addressing scheme. For more information on pushing custom user files to Imply Manager, see Add custom user files.

To install a third-party extension manually, download the extension and then install it into your dist/druid/extensions/ directory. You can download the extension from its distributor's directly or, if it is available from Maven, use the included pull-deps tool. To use pull-deps, specify the full Maven coordinate of the extension in the form of groupId:artifactId:version.