Imply Enterprise on Kubernetes

You can install a distributed, production-ready Imply cluster on the Kubernetes orchestration platform. This lets you take advantage of the deployment, scaling, and redundancy features built into Kubernetes.

Imply's Kubernetes implementation uses a Helm chart to define, deploy, and scale the Imply cluster. This topic covers the default Helm chart delivered with the Minikube quickstart and strategies to reconfigure the default options for a production environment.

Important: Imply supports Helm primarily as a tool to help with deploying. It is not intended as a complete replacement for Imply Manager, and it is not possible to perform many common administrative functions for Imply using Helm. Helm support is limited to the operations described in this documentation and settings in the Imply Helm chart.

For more information on Helm charts, see the Helm docs.

This documentation assumes that you are experienced with Kubernetes. See the Kubernetes documentation for complete information.

Supported environments

Imply works with several Kubernetes engines including the following:

  • Google Kubernetes Engine (GKE)
  • Amazon Elastic Kubernetes Service (EKS)
  • Azure Kubernetes Service (AKS)

You can use any Kubernetes version that these providers support as long as it is version 1.19 or later.

The Imply Helm chart requires:

  • Helm 3.2.0 or later

Newer versions of Kubernetes are supported but may not have been specifically validated.

Imply recommends you deploy Imply Manager on a Linux host to avoid virtualization overhead. You can run the manager on an OSX host for the quickstart environment, but it is not recommended for production deployments. Windows hosts are not supported.

For external metadata storage, Imply Manager and the Druid clusters require MySQL version 5.7.21 or later.
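To check an environment against the minimum versions listed above (Kubernetes 1.19, Helm 3.2.0, MySQL 5.7.21), you can compare version strings numerically. The following is a minimal sketch and not part of Imply, Helm, or Kubernetes; the function name version_ge is hypothetical.

```shell
#!/bin/sh
# version_ge ACTUAL MINIMUM  (hypothetical helper)
# Succeeds (exit 0) when ACTUAL is at least MINIMUM, comparing dot-separated
# fields numerically. A leading "v" and any suffix after the digits of a field
# (for example "-gke" or "+") are ignored.
version_ge() {
  actual=${1#v}
  minimum=${2#v}
  while [ -n "$actual" ] || [ -n "$minimum" ]; do
    a=${actual%%.*}
    m=${minimum%%.*}
    # Advance past the field just read; nothing is left when no dot remains.
    case $actual in *.*) actual=${actual#*.} ;; *) actual= ;; esac
    case $minimum in *.*) minimum=${minimum#*.} ;; *) minimum= ;; esac
    # Keep only the leading digits and treat a missing field as 0.
    a=${a%%[!0-9]*}
    m=${m%%[!0-9]*}
    a=${a:-0}
    m=${m:-0}
    if [ "$a" -gt "$m" ]; then return 0; fi
    if [ "$a" -lt "$m" ]; then return 1; fi
  done
  return 0
}
```

Pass the version strings that kubectl version and helm version report, for example version_ge v3.2.0 3.2.0. Comparing field by field avoids the string-comparison trap where 5.7.9 would sort after 5.7.21.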

While Imply works best in internet-connected environments where Imply Manager can retrieve version updates, it works without internet connectivity (air-gapped environments) as well.

Enhanced Imply Enterprise on GKE is an exception: it requires network connectivity, and installation in air-gapped environments is not supported.

In air-gapped environments, if Imply Manager is unable to reach the Imply servers, you can only deploy clusters of the same version as Imply Manager. Updating to a new version requires first updating Imply Manager.

Ingress ports

By default, a Kubernetes-based installation of Imply requires the following externally accessible ingress ports:

  • Imply Manager requires port 80 (or 443 if TLS is enabled)
  • Pivot requires port 9095
  • Druid requires port 8888

Getting started

Choose from the following options to start using Imply with Kubernetes:

Setting up the Imply Helm repository

The instructions and reference in this guide refer to the Imply Helm chart from the Minikube installation. If you have not followed the steps for installing Imply on Minikube, run the following commands to set up the Imply Helm repository:

helm repo add imply https://static.imply.io/onprem/helm
helm repo update

After you deploy Imply with Kubernetes, see Kubernetes Scaling Reference for information about scaling and managing your cluster.

Fetching and updating the Imply Manager Helm Chart

The Minikube installation includes an Imply Helm chart that works as a good template for creating a Helm chart customized to your environment.

Run the following command to fetch the configuration file for the latest version of the chart:

helm show values imply/imply > values.yaml

After you modify the chart configuration, run the following command to update the deployed release:

helm upgrade {releaseName} imply/imply -f values.yaml

Applying a license

By default, a new Imply trial installation includes a 30-day license. If you have a longer-term license that you want to apply, follow these steps:

  1. Create a file named IMPLY_MANAGER_LICENSE_KEY and paste your license key as the content of the file.
  2. Create a Kubernetes secret named imply-secrets by running the following command:
    kubectl create secret generic imply-secrets --from-file=IMPLY_MANAGER_LICENSE_KEY

To update an existing license, follow these steps:

  1. Delete the existing Kubernetes secret:

    kubectl delete secret imply-secrets
  2. Create a new Kubernetes secret. For example, assuming that IMPLY_MANAGER_LICENSE_KEY is the new license and is available in the current folder, you can run:

    kubectl create secret generic imply-secrets --from-file=IMPLY_MANAGER_LICENSE_KEY 
  3. Perform a rolling restart. For example, given the default deployment name, imply-manager, enter:

    kubectl rollout restart deployment imply-manager

    You can check the deployment name with kubectl get deployments.

If the rolling restart doesn't replace the pods, you can delete them manually as follows:

kubectl delete pod imply-query-xxxxx
kubectl delete pod imply-manager-xxxx

Replace the pod names appropriately for your instance.
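The license update steps above can be combined into one helper. This is a sketch under stated assumptions: the function name rotate_license and the KUBECTL override variable are hypothetical, and the --ignore-not-found flag is added so that the same helper also works when no imply-secrets secret exists yet.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) to print the commands
# instead of running them. The override is a convenience of this sketch.
KUBECTL=${KUBECTL:-kubectl}

# rotate_license KEY_FILE [DEPLOYMENT]  (hypothetical helper)
# Recreates the imply-secrets secret from KEY_FILE, then restarts the manager
# deployment, which defaults to imply-manager.
rotate_license() {
  key_file=$1
  deployment=${2:-imply-manager}
  if [ ! -s "$key_file" ]; then
    echo "license file is missing or empty: $key_file" >&2
    return 1
  fi
  # Stop at the first failing step so a failed create never triggers a restart.
  $KUBECTL delete secret imply-secrets --ignore-not-found || return 1
  # The key=path form keeps the key name fixed whatever the file is called.
  $KUBECTL create secret generic imply-secrets \
    --from-file=IMPLY_MANAGER_LICENSE_KEY="$key_file" || return 1
  $KUBECTL rollout restart deployment "$deployment"
}
```

For example, rotate_license ./IMPLY_MANAGER_LICENSE_KEY uses the default deployment name. Check the actual name with kubectl get deployments and pass it as the second argument if it differs.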

Configuring cluster settings

To configure a Helm chart for your environment, work through the various sections of values.yaml. Use the following brief descriptions to help you plan your updates:

  • deployments: Identifies the cluster components to install. By default, all are set to true. Set the value to false for components running outside the cluster. See the following:

  • security: Set authentication and TLS security options between the services. See Security.

  • manager: Deployment configuration for Imply Manager.

  • druid: Settings for Druid in new clusters. Helm applies the settings during cluster creation the first time you deploy a chart. See Applying configuration changes to Druid for more detail and for information about applying settings to existing clusters.

    • customFiles: A list of custom files that the Druid services load. For example:

      druid:
        customFiles:
          - path: "<location>" # The file path, such as https://remote-custom-file
            executable: <boolean> # Set to true if this is a user-init script or false for a regular file
            unpack: <boolean> # Set to true if the file needs to be unpacked, such as a .tar.gz file, or false if it doesn't
            # Optionally, you can set one of the following two options for the location type to true. They are mutually exclusive, so if you set one to true, set the other to false or omit it.
            hadoopDependency: <boolean>
            classpath: <boolean>
    • userExtensions: A mapping of custom extension names to custom extension paths that Druid services load. For example:

      druid:
        userExtensions:
          <extension-name>: "<location>" # HTTPS URL
          <extension-name>: "<location>" # HTTPS URL
  • master: Configuration for the master service.

  • query: Configuration for the query service.

  • dataTierX: Configuration for the data tier services.

  • zookeeper: ZooKeeper configuration. See ZooKeeper.

  • mysql: Configuration for the default metadata store.

  • minio: Configuration for the default deep storage. See Deep storage.

Using an external metadata storage database

To use an external database for metadata storage:

  1. Disable the default database in the deployments section of values.yaml as follows:

    deployments:
      manager: true
      agents: true

      zookeeper: true
      mysql: false # Set to false to disable.
      minio: true
  2. Configure the connection to the external database. The database must be reachable from the pods within the deployment. The database name cannot contain spaces.

If your cluster is already running, there is no automatic data migration from the default MySQL deployment to your external database. Migrate the data manually if required.

Additionally, the user for the external metadata store should have ALL privileges.

Using an external MySQL DB

To use an external MySQL database, update the metadataStore section for both manager and druid, for example:

manager: #or druid:
  ...
  metadataStore:
    type: mysql
    host: mysql.example.com
    port: 3306
    user: root
    password: imply
    database: imply-manager # For the manager section only.
  ...

Using an external PostgreSQL DB

To use an external PostgreSQL database, update the metadataStore section for both manager and druid, for example:

manager: #or druid:
  ...
  metadataStore:
    type: postgresql
    host: postgresql.example.com
    port: 5432
    user: root
    password: imply
    database: imply-manager # For the manager section only.
  ...

Manually creating the metadata databases

An Imply deployment uses one database for Imply Manager and one database for each Imply cluster. Imply Manager automatically attempts to create the required databases if they do not exist.

If the provided user does not have database creation privileges, manually create the databases before running the application. Both databases must use the utf8mb4 default character set. For example:

CREATE SCHEMA `imply_manager` DEFAULT CHARACTER SET utf8mb4;

Enabling TLS to the external metadata store

To enable TLS validation, add the TLS certificate to the metadataStore section. For example:

manager: # or druid:
  metadataStore:
    ...
    tlsCert: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
  ...

ZooKeeper

The Imply Helm chart includes a quickstart ZooKeeper configuration that is not recommended for production. For production deployments, you can:

  • Configure ZooKeeper 3.5 for high availability.
  • Use an existing highly available ZooKeeper ensemble. For information on clustered ZooKeeper ensembles, see ZooKeeper Administrator's Guide.

Imply currently supports ZooKeeper branches 3.4 and 3.5. Support for ZooKeeper 3.4 will be removed as of LTS 2022.01. Use ZooKeeper 3.5 instead. For information about ZooKeeper 3.5, see ZooKeeper 3.5 Release Notes.

Configuring a highly available ensemble

To increase the number of ZooKeeper nodes, modify the replicaCount in the zookeeper section of values.yaml.

ZooKeeper works best with an odd number of nodes. More nodes allow for more fault tolerance. Set the replicaCount to 3 or a higher odd number to enable a highly available ZooKeeper ensemble. For example:

...
zookeeper:
  replicaCount: 3 # Change this value
...
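The preference for odd replica counts follows from how a ZooKeeper quorum works: the ensemble stays available only while a majority of its servers is up, so an ensemble of N servers tolerates (N - 1) / 2 failures, rounded down. The following sketch makes the arithmetic concrete; both function names are hypothetical.

```shell
#!/bin/sh
# zk_tolerated_failures N  (hypothetical helper)
# Prints how many servers an ensemble of N can lose while a majority of
# servers remains available: (N - 1) / 2, rounded down.
zk_tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

# is_ha_replica_count N  (hypothetical helper)
# Succeeds when N is an odd number of at least 3.
is_ha_replica_count() {
  [ "$1" -ge 3 ] && [ $(( $1 % 2 )) -eq 1 ]
}
```

With these definitions, 3 and 4 replicas both tolerate one failure, while 5 replicas tolerate two, which is why an even count adds cost without adding fault tolerance.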

Using an external ZooKeeper ensemble

If you are using an external ZooKeeper ensemble, disable the default ZooKeeper deployment in the deployments section of the values.yaml. For example:

deployments:
  manager: true
  agents: true

  zookeeper: false # Updated to false to disable.
  mysql: true
  minio: true

Specify the connection details for the external ZooKeeper ensemble in the druid section of values.yaml. The ensemble must be reachable from the pods within the deployment. For example:

druid:
  ...
  zk:
    connectString: "zookeeper.example.com:2181"

If you have already created your cluster, use the Manager UI to configure the external ZooKeeper ensemble.

Deep storage

The Imply Helm chart includes a quickstart MinIO configuration that is not recommended for production. See the Druid Deep storage documentation to choose the best supported deep storage type for you.

To use an external deep storage provider, disable the default MinIO deployment in the deployments section of the values.yaml. For example:

deployments:
  manager: true
  agents: true

  zookeeper: true
  mysql: true
  minio: false # Set to false to disable.

When you use a named deep storage type, the cluster automatically picks up the following:

  • required settings
  • authentication information
  • index logs configuration

Also, the Druid extension is automatically added to the classpath. If you need to change these settings or others, add them to commonRuntimeProperties in the druid section of values.yaml.

If you have already created your cluster, use the Manager UI to configure the external deep storage connection.

For type-specific deep storage instructions, see the following:

Accessing cluster services

By default, the cluster is not accessible outside of Kubernetes. You can run a kubectl port-forward command to enable access to the Manager UI, Pivot, or the Druid console.

To enable specific access, use these commands:

  • Imply Manager

    kubectl --namespace default port-forward svc/imply-manager-int 9097
  • Pivot and Druid console

    kubectl --namespace default port-forward svc/imply-query 8888 9095

You can now access the Imply UIs from your web browser at the following addresses:

  • Imply Manager: http://localhost:9097
  • Druid console: http://localhost:8888
  • Pivot: http://localhost:9095

Exposing cluster services using Kubernetes services

To enable access to the Imply Manager, Pivot, and Druid console without using kubectl, expose them as Kubernetes services.

Review the Kubernetes docs to learn more and choose the best service type for your deployment.

The following examples demonstrate a LoadBalancer service. Depending on your Kubernetes configuration, the load balancer service type may not be available or supported by default.

Manager

To enable the LoadBalancer service for the Manager UI, update the manager.service entry in values.yaml:

manager:
  ...
  service:
    enabled: true # set to true to enable
    type: LoadBalancer
    port: "{{ ternary 80 443 (empty .Values.security.tls) }}" # will use port 80 if tls is not enabled, or 443 if enabled or any port specified
    # nodePort:
    # loadBalancerIP: set this to request a specific IP, refer to your Kubernetes provider documentation for more information
    protocol: TCP # TCP or HTTP, refer to your Kubernetes provider documentation for more information
    annotations: {} # provider specific annotation to customize the service
  ...

Query nodes

To enable the LoadBalancer service for the cluster's query nodes, update the query.service entry in values.yaml:

query:
  ...
  service:
    type: LoadBalancer
    routerPort: 8888 # Leave blank to not expose the router through the Service
    pivotPort: 9095 # Leave blank to not expose Pivot through the Service
    # routerNodePort:
    # pivotNodePort:
    # loadBalancerIP:
    protocol: TCP
    annotations: {}
  ...

Configuring security

You can secure your cluster with Kubernetes TLS and authentication options.

TLS

Kubernetes secrets provide a store for your TLS certificates and keys.

Run the following command to create the Kubernetes secret containing the certificate and key:

kubectl create secret tls imply-ca --key path/to/ca.key --cert path/to/ca.crt

Uncomment the TLS key in the security section of the values.yaml to enable TLS:

security:
  ...
  tls:
    secretName: imply-ca

Run the following command to verify that you have enabled TLS:

kubectl logs -lapp.kubernetes.io/name=imply-manager --tail=10000

TLS: Enabled appears in the logs when it is enabled.

For information on how to generate certificates and general TLS information, see the Imply TLS Docs.

Note that legacy TLS versions (1.0 and 1.1) are supported by default. To turn off support for these versions, set enableLegacyTls to false in the Helm chart:

security:
  enableLegacyTls: false
  ...
...

Authentication

Kubernetes secrets provide a store for your authentication tokens.

Run the following command to create the secret containing the authentication token:

kubectl create secret generic imply-auth-token --from-literal="auth-token=supersecret"

If you pass the authentication token on the command line this way, the token remains in your shell history. This is meant as an example only. Consult the Kubernetes secrets documentation for secure ways to create the secret.
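One way to keep the token out of your shell history is to store it in a file and create the secret from that file. The following is a minimal sketch, assuming that any sufficiently random string is acceptable as the token; the function names and the KUBECTL override variable are hypothetical.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) to print the command
# instead of running it. The override is a convenience of this sketch.
KUBECTL=${KUBECTL:-kubectl}

# generate_token  (hypothetical helper)
# Prints 64 hexadecimal characters read from /dev/urandom, with no trailing
# newline, so the file holds exactly the token.
generate_token() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# create_auth_secret TOKEN_FILE  (hypothetical helper)
# Creates imply-auth-token with the key "auth-token" taken from TOKEN_FILE, so
# the token value never appears on the command line.
create_auth_secret() {
  $KUBECTL create secret generic imply-auth-token \
    --from-file=auth-token="$1"
}

# Example usage (not run here):
#   (umask 077; generate_token > auth-token.txt)
#   create_auth_secret auth-token.txt
```

Delete the token file or move it to a secure location after you create the secret.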

Uncomment the auth key in the security section of the values.yaml to enable authentication. For example:

security:
  ...
  auth:
    secretName: imply-auth-token

Run the following command to verify that you have enabled authentication:

kubectl logs -lapp.kubernetes.io/name=imply-manager --tail=10000

Authentication: Enabled appears in the logs when it is enabled.

See the Imply authentication docs to learn how enabling authentication affects your cluster.

Adding custom user files

For general information on custom user files, see Add custom user files in the Imply Manager documentation.

To have custom user files available while running in Kubernetes, you need to get the files into the /mnt/var/user directory. There are multiple ways to accomplish this. The first option is to use kubectl cp to copy the file into a running Pod. The file will not be available in the container if the Pod is restarted for any reason. To copy a user file using this method, run the following command:

kubectl cp <custom-file-path> <manager-pod-name>:/mnt/var/user/<file-name>

There are two options to get a file into the container that will survive a Pod restart. The first is to create a custom image based on the official Imply Manager Docker image. To accomplish this, create a Dockerfile as follows:

FROM imply/manager:2024.04
COPY CUSTOM_FILE /mnt/var/user/FILENAME

Once the Dockerfile is created, you can build the custom image using the Docker CLI. An example command would look like:

docker build -f <path-to-Dockerfile> -t myorg/manager:<date> <folder-containing-custom-file>

Once the build is complete, you can run docker push to push it to a remote repository your Kubernetes cluster can read from. This method is simpler than adding a custom mount, but can make upgrading your manager harder in the future as you will have to create a new image each time and update your Helm values files.

The last option here, though this is not a complete list of all options, is to place the file on a shared volume and mount the volume to the manager Pod. You can use an NFS storage path, similar to the description in NFS and Kubernetes supported volumes.

If NFS is not a solution you can use, consider mounting a PVC supported by your cloud provider or Kubernetes cluster and then copying the file with kubectl to make it sticky between restarts.

Using custom versions

If you have a custom build of Imply and want to deploy it using Imply Manager, first make the tar.gz file that contains the custom build available to Imply Manager. See Adding custom user files above for instructions. Then update your values.yaml as follows:

manager:
  ...
  customVersions:
    - 2021.01-hdp-2.7.3.2.6.5.0-292

In the above example, the file name of the custom build that we have loaded onto our container is imply-2021.01-hdp-2.7.3.2.6.5.0-292.tar.gz. When using custom files the format of the file name needs to follow the convention imply-<version>.tar.gz, where <version> matches the version number used in the values.yaml file.
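Because the entry in customVersions must match the file name exactly, it can help to derive the version string from the file name rather than typing it twice. The following sketch does that; the function name custom_version_from_file is hypothetical.

```shell
#!/bin/sh
# custom_version_from_file FILE  (hypothetical helper)
# Prints the <version> part of a file named imply-<version>.tar.gz, or fails
# when the name does not follow that convention.
custom_version_from_file() {
  name=${1##*/} # drop any directory part
  case $name in
    imply-?*.tar.gz)
      version=${name#imply-}
      printf '%s\n' "${version%.tar.gz}"
      ;;
    *)
      echo "unexpected file name: $name" >&2
      return 1
      ;;
  esac
}
```

For the file in the example above, the function prints 2021.01-hdp-2.7.3.2.6.5.0-292, which is the value to list under customVersions.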

Configuring clusters to run a specific version of Imply

To configure your clusters to run a specific version of Imply, edit implyVersion in the druid section of values.yaml like so:

druid:
...
implyVersion: "2023.02"

Changing implyVersion does not affect existing clusters. The change applies only to clusters that you create with Helm. See Applying configuration changes to Druid for more details.

Leaving implyVersion blank will install the latest version available. If there is no internet connection, the version that is packaged with the Manager is installed.

Applying configuration changes to Druid

When you run the helm upgrade command, Helm doesn't apply changes in the druid section of values.yaml to existing clusters. Use the Manager UI to configure existing clusters.

Alternatively, you can use the preview update flag to apply modifications to existing clusters in single-cluster deployments only.

The update flag takes these values:

  • disabled: The default, applies the changes to the druid section to new clusters only.
  • hard: Updates an existing cluster through a hard upgrade. All nodes are stopped, possibly resulting in a service interruption.
  • rolling: Applies updates to an existing cluster through a rolling upgrade scheme.

See related comments in values.yaml for more information.

Example configuration for Druid and Pivot properties

If you decide to use the update flag to manage Druid or Pivot configurations as code, note that Druid and Pivot properties are defined differently. You define Druid runtime properties as a list of entries in the format propertykey=propertyvalue, while Pivot configurations must render as valid YAML. The following example shows the format for both Druid and Pivot configuration options:

druid:
  #...
  commonRuntimeProperties:
    - druid.emitter.clarity.clusterName=sandboxImply
  coordinatorRuntimeProperties:
    - jvm.config.xms=-Xms128m
    - jvm.config.xmx=-Xmx1280m
  overlordRuntimeProperties:
    - jvm.config.xms=-Xms128m
    - jvm.config.xmx=-Xmx128m
  brokerRuntimeProperties:
    - jvm.config.xms=-Xms256m
    - jvm.config.xmx=-Xmx256m
    - druid.processing.buffer.sizeBytes=50000000
    - druid.processing.numThreads=1
  routerRuntimeProperties:
    - jvm.config.xms=-Xms64m
    - jvm.config.xmx=-Xmx64m
  historicalRuntimeProperties:
    - jvm.config.xms=-Xms256m
    - jvm.config.xmx=-Xmx256m
    - druid.processing.buffer.sizeBytes=50000000
    - druid.processing.numThreads=1
    - druid.segmentCache.locations=[{"path":"/mnt/var/druid/segment-cache","maxSize":10000000000}]
    - druid.server.maxSize=10000000000
    - druid.historical.cache.useCache=true
    - druid.historical.cache.populateCache=true
    - druid.cache.sizeInBytes=50000000
  middleManagerRuntimeProperties:
    - jvm.config.xms=-Xms64m
    - jvm.config.xmx=-Xmx64m
    - druid.worker.capacity=1
    - druid.indexer.runner.javaOpts=-server -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:MaxDirectMemorySize=3g -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Daws.region=us-east-1
    - druid.processing.buffer.sizeBytes=50000000
    - druid.processing.numThreads=1
  historicalTier1RuntimeProperties: []
  historicalTier2RuntimeProperties: []
  historicalTier3RuntimeProperties: []
  middleManagerTier1RuntimeProperties: []
  middleManagerTier2RuntimeProperties: []
  middleManagerTier3RuntimeProperties: []

  # Pivot configuration must render as valid YAML, so quote each line to preserve its indentation.
  pivotRuntimeProperties:
    - 'userMode: oidc-authentication'
    - 'oidcOptions:'
    - ' issuer: https://example.oidcprovider.com'
    - ' app_base_url: https://my-pivot.com'
    - ' client_id: XXXXXXX'
    - ' client_secret: XXXXXXX'
    - ' scope: "openid profile email"'

Updating and scaling your cluster

With a Kubernetes-based deployment, you must apply software updates to the cluster servers using Imply Manager, as described in Updating software versions. To update the Imply Manager software, follow the instructions in Update software versions.

The upgrade procedures assume that Imply Manager can access the Imply servers where software updates are hosted. If Imply Manager is operating in a disconnected (or air-gapped) network, we recommend the following update procedure:

  1. Update the Imply Manager software first (as described in Update software versions). Optionally, you may want to configure a delay to the Imply Manager update process when working with large clusters.
  2. Apply the update to the cluster servers from the Manager UI. Imply Manager bundles equivalent versions of the software for the Imply cluster servers.

If you need to roll back an upgrade in an air-gapped environment, use the Helm rollback command to restore the Manager to its prior version, and then roll back the cluster upgrade from there.

For information on scaling your cluster, see Kubernetes Scaling Reference, which covers:

  • Adding master, query, and data nodes to your cluster.
  • Increasing disk space available to the data nodes.
  • Adding data tiers.
  • Adding clusters.

Updating Imply Manager software

Helm makes it relatively easy to update the Manager and agent software.

Run the following command to make sure the repository is current:

helm repo update

If you have customized your deployment chart, merge your changes to the updated Imply chart before updating.

Tip: Optionally, you can add a delay to the Imply Manager update process so that all agents have a chance to connect before the update starts. This is useful for larger clusters.

Add the following to your YAML file:

manager:
  extraEnv:
    - name: imply_manager_updateDelay
      # Use ISO-8601 format for the value, such as PT5M
      value: ISO-8601_VALUE
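An ISO-8601 time duration starts with PT, followed by hours, minutes, and seconds parts, so PT5M means five minutes while P5M would mean five months. As a rough pre-check before you edit the file, the following sketch accepts only the hours, minutes, and seconds forms. The function name is hypothetical, and the sketch does not establish which forms Imply Manager itself accepts.

```shell
#!/bin/sh
# is_time_duration VALUE  (hypothetical helper)
# Succeeds for ISO-8601 durations built from hours, minutes, and seconds, such
# as PT5M, PT90S, or PT1H30M. Day and fractional-second forms are rejected by
# this sketch even though ISO-8601 allows them.
is_time_duration() {
  [ "$1" != "PT" ] &&
    printf '%s\n' "$1" | grep -Eq '^PT([0-9]+H)?([0-9]+M)?([0-9]+S)?$'
}
```

For example, is_time_duration PT5M succeeds, while is_time_duration 5m and is_time_duration P5M both fail.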

Run the Helm upgrade command to install the newer version:

helm upgrade -f optional-overrides-file.yaml deployment-name imply/imply

Kubernetes applies the upgrade in a rolling manner, allowing you to avoid downtime.

Java version for the Imply Agent

You can use Java 8, 11, or 17 for the Imply Agent as long as the Druid version also supports that version of Java. Druid added Java 17 support in the 2023.08 STS release.

Agent images for the different Java versions are tagged in the format imply/agent:vVERSION-javaVERSION. Replace the first VERSION with the Agent version and the second with the Java version.

For example, for v20 of the Agent that uses Java 17, the tag is named imply/agent:v20-java17.
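If you generate image tags in a script, a small helper can build the tag and reject Java versions outside the supported set of 8, 11, and 17. This is a sketch; the function name agent_image_tag is hypothetical, and it does not check that the resulting image exists on Docker Hub.

```shell
#!/bin/sh
# agent_image_tag AGENT_VERSION JAVA_VERSION  (hypothetical helper)
# Prints imply/agent:v<AGENT_VERSION>-java<JAVA_VERSION> for Java 8, 11, or 17
# and fails for any other Java version.
agent_image_tag() {
  case $2 in
    8|11|17)
      printf 'imply/agent:v%s-java%s\n' "$1" "$2"
      ;;
    *)
      echo "unsupported Java version: $2" >&2
      return 1
      ;;
  esac
}
```

For example, agent_image_tag 20 17 prints imply/agent:v20-java17.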

You can view a list of the available images on Docker Hub.