Install Imply Private on Kubernetes
You can install a distributed, production-ready Imply cluster on the Kubernetes orchestration platform. This lets you take advantage of the deployment, scaling, and redundancy features built into Kubernetes.
Imply's Kubernetes implementation uses a Helm chart to define, deploy, and scale the Imply cluster. This topic covers the default Helm chart delivered with the Minikube quickstart and strategies to reconfigure the default options for a production environment.
For more information on Helm charts, see the Helm docs.
This documentation assumes that you are experienced with Kubernetes. See the Kubernetes documentation for complete information.
Supported environments
Imply works with several Kubernetes engines including the following:
- Google Kubernetes Engine (GKE) 1.13.11
- Amazon Elastic Kubernetes Service (EKS) 1.13.10
- Azure Kubernetes Service (AKS) 1.13.12
- Rancher 2.0.12
The Imply Helm chart requires:
- Helm 2.16.0, 3.2.0, or later
Newer versions of Kubernetes are supported, but may not have been specifically validated.
Imply recommends that you deploy Imply Manager on a Linux host to avoid virtualization overhead. You can run the Manager on a macOS host for the quickstart environment, but this is not recommended for production deployments. Windows hosts are not supported.
For external metadata storage, the Imply Manager and the Druid clusters require MySQL version 5.7.21 or later.
Getting started
Choose from the following options to start using Imply with Kubernetes.
- If you are new to both systems and want to "kick the tires", try the Minikube installation and the Imply quickstart. The Minikube installation guide walks you through the steps to install Imply on a single machine Kubernetes cluster using Minikube. Minikube deployments are suitable for learning use cases only.
- To deploy a production-ready cluster, Imply recommends that you use a hosted Kubernetes engine. See the specific instructions for the following providers:
Setting up the Imply Helm repository
The instructions and reference in this guide refer to the Imply Helm chart from the Minikube installation. If you have not followed the steps for installing Imply on Minikube, run the following commands to set up the Imply Helm repository:
helm repo add imply https://static.imply.io/onprem/helm
helm repo update
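To confirm that the repository was added and see which chart versions are available, you can list its contents. This is a sketch using Helm 3 syntax (`helm search repo`); the output depends on the chart versions published at the time you run it:

```shell
# List charts available from the newly added repository.
helm search repo imply
```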
After you deploy Imply with Kubernetes, see Kubernetes Scaling Reference for information about scaling and managing your cluster.
Fetching and updating the Imply Manager Helm Chart
The Minikube installation includes an Imply Helm chart that works as a good template for creating a Helm chart customized to your environment.
Run the following command to fetch the configuration file for the latest version of the chart:
helm show values imply/imply > values.yaml
After you modify the chart configuration, run the following command to update the deployed release:
helm upgrade {releaseName} imply/imply -f values.yaml
Applying a license
By default, a new Imply trial installation includes a 30-day license. If you have a longer-term license that you want to apply, follow these steps:
- Create a file named IMPLY_MANAGER_LICENSE_KEY and paste your license key as the content of the file.
- Create a Kubernetes secret named imply-secrets by running:
kubectl create secret generic imply-secrets --from-file=IMPLY_MANAGER_LICENSE_KEY
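You can confirm that Kubernetes created the secret with a standard `kubectl` lookup:

```shell
# Verify the secret exists in the current namespace.
kubectl get secret imply-secrets
```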
To update an existing license, follow these steps:
Delete the existing Kubernetes secret:
kubectl delete secret imply-secrets
Create a new Kubernetes secret. For example, assuming that the new license is in a file named IMPLY_MANAGER_LICENSE_KEY in the current folder, you can run:
kubectl create secret generic imply-secrets --from-file=IMPLY_MANAGER_LICENSE_KEY
Perform a rolling restart. For example, given the default deployment name, imply-manager, enter:
kubectl rollout restart deployment imply-manager
You can check the deployment name with kubectl get deployments.
If the rollout does not restart the pods successfully, you can delete them manually as follows:
kubectl delete pod imply-query-xxxxx
kubectl delete pod imply-manager-xxxx
Replace the pod names appropriately for your instance.
Configuring cluster settings
To configure a Helm chart for your environment, work through the sections of values.yaml. Use the following brief descriptions to help you plan your updates:
- deployments: Identifies the cluster components to install. By default, all are set to true. Set a value to false for components running outside the cluster.
- security: Authentication and TLS security options between the services. See Security.
- manager: Deployment configuration for the Imply Manager.
- druid: Default settings for Druid in new clusters. Helm applies the settings during cluster creation the first time you deploy a chart. See Applying configuration changes to Druid for more detail.
- master: Configuration for the master service.
- query: Configuration for the query service.
- dataTierX: Configuration for the data tier services.
- zookeeper: Zookeeper configuration. See Zookeeper.
- mysql: Configuration for the default metadata store.
- minio: Configuration for the default deep storage. See Deep storage.
Using an external metadata storage database
To use an external database for metadata storage:
Disable the default database in the deployments section of values.yaml as follows:
deployments:
  manager: true
  agents: true
  zookeeper: true
  mysql: false # Set to false to disable.
  minio: true
Configure the connection to the external database. The database must be reachable from the pods within the deployment. The database name cannot contain spaces.
If your cluster is already running, there is no automatic data migration from the default MySQL deployment to your external database. Migrate the data manually if required.
Using an external MySQL DB
To use an external MySQL database, update the metadataStore section for both manager and druid, for example:
manager: #or druid:
...
metadataStore:
type: mysql
host: mysql.example.com
port: 3306
user: root
password: imply
database: imply-manager # For the manager section only.
...
Using an external PostgreSQL DB
To use an external PostgreSQL database, update the metadataStore section for both manager and druid, for example:
manager: #or druid:
...
metadataStore:
type: postgresql
host: postgresql.example.com
port: 5432
user: root
password: imply
database: imply-manager # For the manager section only.
...
Manually creating the metadata databases
An Imply deployment uses one database for the Imply Manager and one database for each Imply cluster. The Imply Manager automatically attempts to create the required databases if they do not exist.
If the provided user does not have database creation privileges, manually create the databases before running the application. Both databases must use the utf8mb4 default character set. For example:
CREATE SCHEMA `imply_manager` DEFAULT CHARACTER SET utf8mb4;
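The statement above creates the Manager database. Each Imply cluster also needs its own database; the name below is a hypothetical example, since the actual name depends on how you configure the cluster:

```sql
CREATE SCHEMA `imply_cluster_1` DEFAULT CHARACTER SET utf8mb4;
```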
Enabling TLS to the external metadata store
To enable TLS validation, add the TLS certificate to the metadataStore section. For example:
manager: # or druid:
metadataStore:
...
tlsCert: |
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
...
Zookeeper
The Imply Helm chart includes a quickstart Zookeeper configuration that is not recommended for production. For production deployments, you can:
- Configure Zookeeper for high availability.
- Use an existing highly available Zookeeper ensemble.
Check the Zookeeper Documentation for more information on clustered Zookeeper ensembles.
Configuring a highly available ensemble
To increase the number of Zookeeper nodes, modify the replicaCount in the zookeeper section of values.yaml.
Zookeeper works best with an odd number of nodes, and more nodes allow for more fault tolerance. Set replicaCount to 3 or a higher odd number to enable a highly available Zookeeper ensemble. For example:
...
zookeeper:
replicaCount: 3 # Change this value
...
Using an external Zookeeper ensemble
If you are using an external Zookeeper ensemble, disable the default Zookeeper deployment in the deployments section of values.yaml. For example:
deployments:
manager: true
agents: true
zookeeper: false # Updated to false to disable.
mysql: true
minio: true
Specify the connection details for the external Zookeeper ensemble in the druid section of values.yaml. The ensemble must be reachable from the pods within the deployment. For example:
druid:
...
zk:
connectString: "zookeeper.example.com:2181"
If you have already created your cluster, use the Manager UI to configure the external Zookeeper ensemble.
Deep storage
The Imply Helm chart includes a quickstart MinIO configuration that is not recommended for production. See the Druid Deep storage documentation to choose the supported deep storage type that best fits your environment.
To use an external deep storage provider, disable the default MinIO deployment in the deployments section of values.yaml. For example:
deployments:
manager: true
agents: true
zookeeper: true
mysql: true
minio: false # Set to false to disable.
When you use a named deep storage type, the cluster automatically picks up:
- required settings
- authentication information
- index logs configuration
Druid also automatically adds the corresponding extension to the classpath. If you need to change these settings or others, add them to commonRuntimeProperties in the druid section of values.yaml.
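For example, here is a hedged sketch of overriding deep storage settings through commonRuntimeProperties, assuming S3 deep storage. The bucket and prefix values are placeholders, and the exact format expected for commonRuntimeProperties should be confirmed against the comments in values.yaml:

```yaml
druid:
  commonRuntimeProperties:
    # Assumption: properties are supplied as key=value entries; confirm the
    # expected format in the comments in values.yaml. Values are placeholders.
    - "druid.storage.type=s3"
    - "druid.storage.bucket=my-deep-storage-bucket"
    - "druid.storage.baseKey=druid/segments"
    - "druid.indexer.logs.type=s3"
    - "druid.indexer.logs.s3Bucket=my-deep-storage-bucket"
    - "druid.indexer.logs.s3Prefix=druid/indexing-logs"
```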
If you have already created your cluster, use the Manager UI to configure the external deep storage connection.
See the following for instructions for your deep storage type:
Accessing the cluster using services
By default, the cluster is not accessible from outside Kubernetes. You can run a kubectl port-forward command to enable access to the Manager UI, Pivot, or the Druid console.
If you want to enable access to these components without using kubectl, you can expose them as a Kubernetes service.
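The port-forward approach can look like the following sketch. The service name and ports here are assumptions based on the defaults in this chart (router on 8888, Pivot on 9095); check the actual names in your deployment before running it:

```shell
# Forward local ports to the in-cluster query service for the Druid
# console and Pivot. Service name and ports are assumptions; verify
# with `kubectl get services`.
kubectl port-forward service/imply-query 8888:8888 9095:9095
```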
Review the Kubernetes docs to learn more and choose the best service type for your deployment.
The following examples demonstrate a LoadBalancer service. Depending on your Kubernetes configuration, the load balancer service type may not be available or supported by default.
Manager
To enable the LoadBalancer service for the Manager UI, update the manager.service entry in values.yaml:
manager:
...
service:
enabled: true # set to true to enable
type: LoadBalancer
port: "{{ ternary 80 443 (empty .Values.security.tls) }}" # Uses port 80 if TLS is not enabled, or 443 if it is; you can also specify any port.
# nodePort:
# loadBalancerIP: set this to request a specific IP, refer to your Kubernetes provider documentation for more information
protocol: TCP # TCP or HTTP, refer to your Kubernetes provider documentation for more information
annotations: {} # provider specific annotation to customize the service
...
Query nodes
To enable the LoadBalancer service for the cluster's query nodes, update the query.service entry in values.yaml:
query:
...
service:
type: LoadBalancer
routerPort: 8888 # Leave blank to not expose the router through the Service
pivotPort: 9095 # Leave blank to not expose Pivot through the Service
# routerNodePort:
# pivotNodePort:
# loadBalancerIP:
protocol: TCP
annotations: {}
...
Configuring security
You can secure your cluster with Kubernetes TLS or authentication options.
TLS
Kubernetes secrets provide a store for your TLS certificates and keys.
Run the following command to create the Kubernetes secret containing the certificate and key:
kubectl create secret tls imply-ca --key path/to/ca.key --cert path/to/ca.crt
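If you do not yet have a certificate and key, you can generate a self-signed pair for testing with openssl. This is a sketch for non-production use, and the subject name is a hypothetical example:

```shell
# Generate a self-signed certificate and key for testing only.
# The CN value is a hypothetical example; substitute your own hostname.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout ca.key -out ca.crt -days 365 \
  -subj "/CN=imply-manager.example.com"
```

For production, use certificates issued by your organization's certificate authority instead.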
To enable TLS, uncomment the tls key in the security section of values.yaml:
security:
...
tls:
secretName: imply-ca
Run the following command to verify that you have enabled TLS:
kubectl logs -lapp.kubernetes.io/name=imply-manager --tail=10000
TLS: Enabled appears in the logs when it is enabled.
For information on how to generate certificates and general TLS information, see the Imply TLS Docs.
Authentication
Kubernetes secrets provide a store for your authentication tokens.
Run the following command to create the secret containing the authentication token:
kubectl create secret generic imply-auth-token --from-literal="auth-token=supersecret"
If you pass the authentication token on the command line this way, the token remains in your shell history. This is meant as an example only. Consult the Kubernetes secrets documentation for secure ways to create the secret.
To enable authentication, uncomment the auth key in the security section of values.yaml. For example:
security:
...
auth:
secretName: imply-auth-token
Run the following command to verify that you have enabled authentication:
kubectl logs -lapp.kubernetes.io/name=imply-manager --tail=10000
Authentication: Enabled appears in the logs when it is enabled.
See the Imply authentication docs to learn how enabling authentication affects your cluster.
Adding custom user files
For general information on custom user files, see Add custom user files in the Imply Manager documentation.
To make custom user files available while running in Kubernetes, you need to get the files into the /mnt/var/user directory. There are multiple ways to accomplish this. The first option is to use kubectl cp to copy the file into a running Pod. While this method is easy, the file is not available in the container if the Pod restarts for any reason. To copy a user file this way, run:
kubectl cp <custom-file-path> <manager-pod-name>:/mnt/var/user/<file-name>
To get a file into the container in a way that survives a Pod restart, there are two options. The first is to create a custom image based on the official Imply Manager Docker image. To accomplish this, create a Dockerfile that looks like:
FROM imply/manager:latest
COPY <custom-file> /mnt/var/user/<file-name>
Once the Dockerfile is created, build the custom image using the Docker CLI. An example command looks like:
docker build -f <path-to-Dockerfile> -t myorg/manager:<date> <folder-containing-custom-file>
Once the build is complete, run docker push to push the image to a remote repository that your Kubernetes cluster can read from. This method is simpler than adding a custom mount, but it can make upgrading the Manager harder in the future because you must create a new image each time and update your Helm values files.
The last option, though this is not a complete list of all options, is to place the file on a shared volume and mount the volume to the manager Pod. You can use an NFS storage path, similar to the description in NFS and Kubernetes supported volumes.
If NFS is not an option for you, consider mounting a PVC supported by your cloud provider or Kubernetes cluster and then copying the file with kubectl so that it persists between restarts.
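As a sketch of the PVC approach, a claim like the following could provide the persistent storage. The name and size are hypothetical, and how the volume gets mounted into the manager Pod depends on your chart and cluster configuration:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: imply-user-files   # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi         # assumption: size the volume for your files
```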
Using custom versions
If you have a custom build of Imply and want to deploy it using the Manager, first make the tar.gz file that contains the custom build available to the Manager. See the instructions above on adding custom user files to the Manager. Once this is accomplished, update your values.yaml as follows:
manager:
...
customVersions:
- 2021.01-hdp-2.7.3.2.6.5.0-292
In the above example, the file name of the custom build loaded onto the container is imply-2021.01-hdp-2.7.3.2.6.5.0-292.tar.gz. When using custom files, the file name must follow the convention imply-<version>.tar.gz, where <version> matches the version number used in the values.yaml file.
Configuring clusters to run a specific version of Imply
To configure your clusters to run a specific version of Imply, edit implyVersion in the druid section of values.yaml like so:
druid:
...
implyVersion: "2021.02"
Changing implyVersion does not affect existing clusters; the change applies only to clusters you create with Helm. See Applying configuration changes to Druid for more details.
Leaving implyVersion blank installs the latest available version. If there is no internet connection, the version packaged with the Manager is installed.
Applying configuration changes to Druid
When you run the helm upgrade command, Helm doesn't apply changes in the druid section of values.yaml to existing clusters. Use the Manager UI to configure existing clusters.
Alternatively, you can use the experimental update flag to apply modifications to existing clusters in single-cluster deployments only.
The update flag takes these values:
- disabled: The default. Applies changes in the druid section to new clusters only.
- hard: Updates an existing cluster through a hard upgrade. All nodes are stopped, possibly resulting in a service interruption.
- rolling: Applies updates to an existing cluster through a rolling upgrade scheme.
See the related comments in values.yaml for more information.
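As a hedged sketch, setting the flag might look like the following. The exact key name and location are assumptions, so confirm them against the comments in values.yaml before relying on this:

```yaml
druid:
  # Experimental: applies druid-section changes to an existing
  # single-cluster deployment. Key placement is an assumption;
  # see the related comments in values.yaml.
  update: rolling
```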
Updating and scaling your cluster
With a Kubernetes-based deployment, you can update the cluster servers to the latest version of the Imply software as described in Updating software versions. When you want to update the Imply Manager software, follow the instructions at Updating the Imply Manager software.
The upgrade procedures assume that the Manager can access the Imply servers where software updates are hosted. If the Imply Manager is operating in a disconnected (or air gapped) network, we recommend the following update procedure:
- Update the Imply Manager software first (as described in Updating the Imply Manager software).
- Apply the update to the cluster servers as usual, from the Manager UI. The Imply Manager bundles equivalent versions of the software for the Imply cluster servers.
If you need to roll back an upgrade in an air-gapped environment, you can use the helm rollback command to restore the Manager to its prior version and then roll back the cluster upgrade from there.
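A sketch of that Manager rollback with Helm, assuming the release name deployment-name used elsewhere in this guide; with no revision argument, helm rollback reverts to the previous release:

```shell
# Inspect the release history, then roll back to the previous revision.
helm history deployment-name
helm rollback deployment-name
```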
For information on scaling your cluster, see Kubernetes Scaling Reference, which covers:
- Adding master, query, and data nodes to your cluster.
- Increasing the disk available to the data nodes.
- Adding data tiers.
- Adding clusters.
Updating the Imply Manager software
Helm makes it relatively easy to update the Manager and agent software.
Run the following command to make sure the repository is up to date:
helm repo update
If you have customized your deployment chart, merge your changes to the updated Imply chart before updating.
Run the helm upgrade command to install the newer version:
helm upgrade -f optional-overrides-file.yaml deployment-name imply/imply
Kubernetes applies the upgrade in a rolling manner, allowing you to avoid downtime.