2021.02

Install Imply Private on Kubernetes

You can install a distributed, production-ready Imply cluster on the Kubernetes orchestration platform. This lets you take advantage of the deployment, scaling, and redundancy features built into Kubernetes.

Imply's Kubernetes implementation uses a Helm chart to define, deploy, and scale the Imply cluster. This topic covers the default Helm chart delivered with the Imply quickstart and strategies to reconfigure the default options for a production environment.

For more information on Helm charts, see the Helm docs.

This documentation assumes that you are experienced with Kubernetes. See the Kubernetes documentation for complete information.

Supported environments

Imply works with several Kubernetes engines including the following:

  • Google Kubernetes Engine (GKE) 1.13.11
  • Amazon Elastic Kubernetes Service (EKS) 1.13.10
  • Azure Kubernetes Service (AKS) 1.13.12
  • Rancher 2.0.12

The Imply Helm chart requires:

  • Helm 2.16.0 or later, or Helm 3.2.0 or later

Newer versions of Kubernetes are supported, but may not have been specifically validated.

Imply recommends that you deploy Imply Manager on a Linux host to avoid virtualization overhead. You can run the Manager on a macOS host for the quickstart environment, but this is not recommended for production deployments. Windows hosts are not supported.

For external metadata storage, the Imply Manager and the Druid clusters require MySQL version 5.7.21 or later.

Getting started

Choose from the following options to start using Imply with Kubernetes:

  • If you are new to both systems and want to "kick the tires", try the Minikube installation and the Imply quickstart. The Minikube installation guide walks you through the steps to install Imply on a single machine Kubernetes cluster using Minikube. Minikube deployments are suitable for learning use cases only.
  • To deploy a production-ready cluster, Imply recommends that you use a hosted Kubernetes engine. See the specific instructions for the following providers:
    • Deploying on Azure Kubernetes Service
    • Deploying on Google Kubernetes Engine

Setting up the Imply Helm repository

The instructions and reference in this guide refer to the Imply Helm chart from the Minikube installation. If you have not followed the steps for installing Imply on Minikube, run the following commands to set up the Imply Helm repository:

helm repo add imply https://static.imply.io/onprem/helm
helm repo update

After you deploy Imply with Kubernetes, see Kubernetes Scaling Reference for information about scaling and managing your cluster.

Fetching and updating the Imply Manager Helm Chart

The Minikube installation includes an Imply Helm chart that works as a good template for creating a Helm chart customized to your environment.

Run the following command to fetch the configuration file for the latest version of the chart:

helm show values imply/imply > values.yaml

After you modify the chart configuration, run the following command to update the deployed release:

helm upgrade {releaseName} imply/imply -f values.yaml

Applying a license

By default, a new Imply trial installation includes a 30-day license. If you have a longer-term license that you want to apply, follow these steps:

  1. Create a file named IMPLY_MANAGER_LICENSE_KEY and paste your license key as the content of the file.
  2. Create a Kubernetes secret named imply-secrets by running:

    kubectl create secret generic imply-secrets --from-file=IMPLY_MANAGER_LICENSE_KEY

To update an existing license, follow these steps:

  1. Delete the existing Kubernetes secret:

    kubectl delete secret imply-secrets
    
  2. Create a new Kubernetes secret. For example, assuming that IMPLY_MANAGER_LICENSE_KEY is the new license and is available in the current folder, you can run:

    kubectl create secret generic imply-secrets --from-file=IMPLY_MANAGER_LICENSE_KEY 
    
  3. Perform a rolling restart. For example, given the default deployment name, imply-manager, enter:

    kubectl rollout restart deployment imply-manager
    

    You can check the deployment name with kubectl get deployments.

If the rolling restart does not replace the pods, you can delete them manually as follows:

kubectl delete pod imply-query-xxxxx
kubectl delete pod imply-manager-xxxx

Replace the pod names appropriately for your instance.

Configuring cluster settings

To configure a Helm chart for your environment, work through the various sections of values.yaml. Use the following brief descriptions to help you plan your updates:

  • deployments: Identifies the cluster components to install. By default all are set to true. Set the value to false for components running outside the cluster. See the following:
    • Adding clusters
    • Using an external metadata storage database
    • Zookeeper
    • Deep storage
  • security: Set authentication and TLS security options between the services. See Security.
  • manager: Deployment configuration for the Imply Manager.
  • druid: Default settings for Druid in new clusters. Helm applies the settings during cluster creation the first time you deploy a chart. See Applying configuration changes to Druid for more detail.
  • master: Configuration for the master service.
  • query: Configuration for the query service.
  • dataTierX: Configuration for the data tier services.
  • zookeeper: Zookeeper configuration. See Zookeeper.
  • mysql: Configuration for the default metadata store.
  • minio: Configuration for the default deep storage. See Deep storage.

Scaling your cluster

See Kubernetes Scaling Reference for information about:

  • Adding master, query, and data nodes to your cluster.
  • Increasing the disk space available to the data nodes.
  • Adding data tiers.
  • Adding clusters.

Using an external metadata storage database

To use an external database for metadata storage:

  1. Disable the default database in the deployments section of values.yaml as follows:

    deployments:
      manager: true
      agents: true
    
      zookeeper: true
      mysql: false # Set to false to disable.
      minio: true
    
  2. Configure the connection to the external database. The database must be reachable from the pods within the deployment. The database name cannot contain spaces.

If your cluster is already running, there is no automatic data migration from the default MySQL deployment to your external database. Migrate the data manually if required.

Using an external MySQL DB

To use an external MySQL database, update the metadataStore section for both manager and druid, for example:

manager: #or druid:
  ...
  metadataStore:
    type: mysql
    host: mysql.example.com
    port: 3306
    user: root
    password: imply
    database: imply-manager # For the manager section only.
  ...
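
Because the setting applies to both sections, a complete fragment might look like the following sketch. The host name and credentials are the placeholders from the example above. The druid section omits the database key, since the example marks that key as manager-only.

```yaml
manager:
  metadataStore:
    type: mysql
    host: mysql.example.com
    port: 3306
    user: root
    password: imply
    database: imply-manager # Manager section only.

druid:
  metadataStore:
    type: mysql
    host: mysql.example.com
    port: 3306
    user: root
    password: imply
    # No database key here; it applies to the manager section only.
```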

Using an external PostgreSQL DB

To use an external PostgreSQL database, update the metadataStore section for both manager and druid, for example:

manager: #or druid:
  ...
  metadataStore:
    type: postgresql
    host: postgresql.example.com
    port: 5432
    user: root
    password: imply
    database: imply-manager # For the manager section only.
  ...

Manually creating the metadata databases

An Imply deployment uses one database for the Imply Manager and one database for each Imply cluster. The Imply Manager automatically attempts to create the required databases if they do not exist.

If the provided user does not have database creation privileges, manually create the databases before running the application. Both databases must use the utf8mb4 default character set. For example:

CREATE SCHEMA `imply_manager` DEFAULT CHARACTER SET utf8mb4;

Enabling TLS to the external metadata store

To enable TLS validation, add the TLS certificate to the metadataStore section. For example:

manager: # or druid:
  metadataStore:
    ...
    tlsCert: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
    ...

Zookeeper

The Imply Helm chart includes a quickstart Zookeeper configuration that is not recommended for production. For production deployments, you can:

  • Configure Zookeeper for high availability.
  • Use an existing highly available Zookeeper ensemble.

Check the Zookeeper Documentation for more information on clustered Zookeeper ensembles.

Configuring a highly available ensemble

To increase the number of Zookeeper nodes, modify the replicaCount in the zookeeper section of values.yaml.

Zookeeper works best with an odd number of nodes. More nodes allow for more fault tolerance. Set the replicaCount to 3 or a higher odd number to enable a highly available Zookeeper ensemble. For example:

...
zookeeper:
  replicaCount: 3 # Change this value
...

Using an external Zookeeper ensemble

If you are using an external Zookeeper ensemble, disable the default Zookeeper deployment in the deployments section of values.yaml. For example:

deployments:
  manager: true
  agents: true

  zookeeper: false # Updated to false to disable.
  mysql: true
  minio: true

Specify the connection details for the external Zookeeper ensemble in the druid section of values.yaml. The ensemble must be reachable from the pods within the deployment. For example:

druid:
  ...
  zk:
    connectString: "zookeeper.example.com:2181"
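
For a multi-node external ensemble, the connect string typically lists every member. The following sketch assumes three hypothetical hosts and uses the standard Zookeeper format of comma-separated host:port pairs:

```yaml
druid:
  zk:
    # Hypothetical host names; list each ensemble member as host:port.
    connectString: "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
```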

If you have already created your cluster, use the Manager UI to configure the external Zookeeper ensemble.

Deep storage

The Imply Helm chart includes a quickstart MinIO configuration that is not recommended for production. See the Druid Deep storage documentation to choose the best supported deep storage type for you.

To use an external deep storage provider, disable the default MinIO deployment in the deployments section of values.yaml. For example:

deployments:
  manager: true
  agents: true

  zookeeper: true
  mysql: true
  minio: false # this line should be updated to false to disable

When you use a named deep storage type, the cluster automatically picks up:

  • required settings
  • authentication information
  • index logs configuration

Also, the Druid extension is automatically added to the classpath. If you need to change these settings or others, add them to commonRuntimeProperties in the druid section of values.yaml.
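
As a rough illustration, an override might look like the following. This sketch assumes that commonRuntimeProperties takes a list of Druid property lines; check the comments in your copy of values.yaml to confirm the format. The property shown is a Druid S3 indexing-log setting, and its value is a placeholder.

```yaml
druid:
  commonRuntimeProperties:
    # Assumed format: one Druid runtime property per list item.
    - druid.indexer.logs.s3Prefix=example/indexing-logs
```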

If you have already created your cluster, use the Manager UI to configure the external deep storage connection.

See the following for instructions for your deep storage type:

  • Amazon S3
  • Microsoft Azure
  • Google Cloud Storage
  • HDFS
  • NFS and Kubernetes Supported Volumes

Accessing the cluster using services

By default the cluster is not accessible outside of Kubernetes. You can run a kubectl port-forward command to enable access to the Manager UI, Pivot, or the Druid Console.

If you want to enable access to these components without using kubectl, you can expose them as a Kubernetes service.

Review the Kubernetes docs to learn more and choose the best service type for your deployment.

The following examples demonstrate a LoadBalancer service. Depending on your Kubernetes configuration, the load balancer service type may not be available or supported by default.

Manager

To enable the LoadBalancer service for the Manager UI, update the manager.service entry in values.yaml:

manager:
  ...
  service:
    enabled: true # set to true to enable
    type: LoadBalancer
    port: "{{ ternary 80 443 (empty .Values.security.tls) }}" # defaults to port 80 if TLS is not enabled, or 443 if it is; you can also specify any port
    # nodePort:
    # loadBalancerIP: set this to request a specific IP, refer to your Kubernetes provider documentation for more information
    protocol: TCP # TCP or HTTP, refer to your Kubernetes provider documentation for more information
    annotations: {} # provider specific annotation to customize the service
  ...
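
The annotations field accepts provider-specific settings. As a sketch, the following requests an internal load balancer on AWS. The annotation shown belongs to the AWS cloud provider, and other providers use different keys, so treat it as an example rather than a recommended setting:

```yaml
manager:
  service:
    enabled: true
    type: LoadBalancer
    annotations:
      # AWS-specific example; replace with your provider's annotation.
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
```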

Query nodes

To enable the LoadBalancer service for the cluster's query nodes, update the query.service entry in values.yaml:

query:
  ...
  service:
    type: LoadBalancer
    routerPort: 8888  # Leave blank to not expose the router through the Service
    pivotPort: 9095   # Leave blank to not expose Pivot through the Service
    # routerNodePort:
    # pivotNodePort:
    # loadBalancerIP:
    protocol: TCP
    annotations: {}
  ...

Configuring security options

You can secure your cluster with Kubernetes TLS or authentication options.

TLS

Kubernetes secrets provide a store for your TLS certificates and keys.

Run the following command to create the Kubernetes secret containing the certificate and key:

kubectl create secret tls imply-ca --key path/to/ca.key --cert path/to/ca.crt

Uncomment the tls key in the security section of values.yaml to enable TLS:

security:
  ...
  tls:
    secretName: imply-ca

Run the following command to verify that you have enabled TLS:

kubectl logs -lapp.kubernetes.io/name=imply-manager --tail=10000

TLS: Enabled appears in the logs when it is enabled.

For information on how to generate certificates and general TLS information, see the Imply TLS Docs.

Authentication

Kubernetes secrets provide a store for your authentication tokens.

Run the following command to create the secret containing the authentication token:

kubectl create secret generic imply-auth-token --from-literal="auth-token=supersecret"

If you pass the authentication token on the command line this way, the token remains in your shell history. This is meant as an example only. Consult the Kubernetes secrets documentation for secure ways to create the secret.
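
One way to avoid placing the token on the command line is to define the secret in a manifest and apply it with kubectl apply -f. The sketch below uses the standard Kubernetes Secret resource. The token value is a placeholder, and the manifest file should be kept out of version control:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: imply-auth-token
type: Opaque
stringData:
  # Placeholder value; stringData accepts plain text and Kubernetes encodes it.
  auth-token: replace-with-your-token
```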

Uncomment the auth key in the security section of values.yaml to enable authentication. For example:

security:
  ...
  auth:
    secretName: imply-auth-token

Run the following command to verify that you have enabled authentication:

kubectl logs -lapp.kubernetes.io/name=imply-manager --tail=10000

Authentication: Enabled appears in the logs when it is enabled.

See the Imply authentication docs to learn how enabling authentication affects your cluster.

Adding custom user files

The Imply Manager can push custom user files to the nodes of the Imply cluster. Specify the files in the Setup / Advanced Config section of the Manager. The Manager can write files to the following locations:

  • User files (/opt/imply/user)
  • Druid's classpath
  • Hadoop dependencies

You can use the following processing for custom files:

  • None
  • Requires unpacking (extracts using tar -x)
  • Is executable (sets chmod +x)

You can make these files available to the Manager:

  • Over HTTP(S) by providing the URL.
  • Locally by referencing them with the manager:/// scheme.

To make the files available to the Manager locally, copy them to the /mnt/var/user directory inside the manager container. With Kubernetes, you can mount a volume to that directory in the container image and put the custom files on that volume.
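
How the chart exposes volume settings is not covered here, so the following is only a generic Kubernetes sketch of the idea: a pod spec fragment that mounts a persistent volume claim at /mnt/var/user. The container name, volume name, and claim name are hypothetical.

```yaml
# Fragment of a pod template spec, not a complete manifest.
spec:
  containers:
    - name: imply-manager            # hypothetical container name
      volumeMounts:
        - name: user-files
          mountPath: /mnt/var/user   # directory the Manager reads local files from
  volumes:
    - name: user-files
      persistentVolumeClaim:
        claimName: imply-user-files  # hypothetical, pre-created claim
```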

After you make the files available in the specified directory, reference them from the UI using the manager:/// scheme, for example:

  • File: /mnt/var/user/hdfs-site.xml -> Manager path: manager:///hdfs-site.xml

Applying configuration changes to Druid

When you run the helm upgrade command, Helm doesn't apply changes in the druid section of values.yaml to existing clusters. Use the Manager UI to configure existing clusters.

Alternatively, you can use the experimental update flag to apply modifications to existing clusters in single-cluster deployments only.

The update flag takes these values:

  • disabled – The default. Applies changes in the druid section to new clusters only.
  • hard – Updates an existing cluster through a hard upgrade. All nodes are stopped, possibly resulting in a service interruption.
  • rolling – Applies updates to an existing cluster through a rolling upgrade scheme.

See related comments in values.yaml for more information.
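
For example, a rolling update might be requested as sketched below. The placement of the flag directly under the druid section is an assumption; confirm the exact key against the comments in your values.yaml.

```yaml
druid:
  # Assumed location of the experimental flag: disabled, hard, or rolling.
  update: rolling
```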

Updating the Imply Manager software

Helm makes it relatively easy to update the Manager and agent software.

Run the following command to make sure the repository is up to date:

helm repo update

If you have customized your deployment chart, merge your changes to the updated Imply chart before updating.

Run the Helm upgrade command to install the newer version:

helm upgrade -f optional-overrides-file.yaml deployment-name imply/imply

Kubernetes applies the upgrade in a rolling manner, allowing you to avoid downtime.

Copyright © 2021 Imply Data, Inc