Install Imply Private on Kubernetes
You can install a distributed, production-ready Imply cluster on the Kubernetes orchestration platform. This lets you take advantage of the deployment, scaling, and redundancy features built into Kubernetes.
Imply's Kubernetes implementation uses a Helm chart to define, deploy, and scale the Imply cluster. This topic covers the default Helm chart delivered with the Imply quickstart and strategies to reconfigure the default options for a production environment.
For more information on Helm charts, see the Helm docs.
This documentation assumes that you are experienced with Kubernetes. See the Kubernetes documentation for complete information.
Supported environments
Imply works with several Kubernetes engines including the following:
- Google Kubernetes Engine (GKE) 1.13.11
- Amazon Elastic Kubernetes Service (EKS) 1.13.10
- Azure Kubernetes Service (AKS) 1.13.12
- Rancher 2.0.12
The Imply Helm chart requires:
- Helm 2.16.0, 3.2.0, or later
Newer versions of Kubernetes are supported, but may not have been specifically validated.
Imply recommends you deploy Imply Manager on a Linux host to avoid virtualization overhead. You can run the Manager on a macOS host for the quickstart environment, but this is not recommended for production deployments. Windows hosts are not supported.
For external metadata storage, the Imply Manager and the Druid clusters require MySQL version 5.7.21 or later.
Getting started
Choose from the following options to start using Imply with Kubernetes:
- If you are new to both systems and want to "kick the tires", try the Minikube installation and the Imply quickstart. The Minikube installation guide walks you through the steps to install Imply on a single-machine Kubernetes cluster using Minikube. Minikube deployments are suitable for learning use cases only.
- To deploy a production-ready cluster, Imply recommends a hosted Kubernetes engine. See the specific instructions for the following providers:
Setting up the Imply Helm repository
The instructions and reference in this guide refer to the Imply Helm chart from the Minikube installation. If you have not followed the steps for installing Imply on Minikube, run the following commands to set up the Imply Helm repository:
helm repo add imply https://static.imply.io/onprem/helm
helm repo update
After you deploy Imply with Kubernetes, see Kubernetes Scaling Reference for information about scaling and managing your cluster.
Fetching and updating the Imply Manager Helm Chart
The Minikube installation includes an Imply Helm chart that works as a good template for creating a Helm chart customized to your environment.
Run the following command to fetch the configuration file for the latest version of the chart:
helm show values imply/imply > values.yaml
After you modify the chart configuration, run the following command to update the deployed release:
helm upgrade {releaseName} imply/imply -f values.yaml
Applying a license
By default, a new Imply trial installation includes a 30-day license. If you have a longer-term license that you want to apply, follow these steps:
- Create a file named IMPLY_MANAGER_LICENSE_KEY and paste your license key as the content of the file.
- Create a Kubernetes secret named imply-secrets by running:
kubectl create secret generic imply-secrets --from-file=IMPLY_MANAGER_LICENSE_KEY
To update an existing license, follow these steps:
- Delete the existing Kubernetes secret:
kubectl delete secret imply-secrets
- Create a new Kubernetes secret. For example, assuming that IMPLY_MANAGER_LICENSE_KEY contains the new license and is available in the current folder, run:
kubectl create secret generic imply-secrets --from-file=IMPLY_MANAGER_LICENSE_KEY
- Perform a rolling restart. For example, given the default deployment name, imply-manager, enter:
kubectl rollout restart deployment imply-manager
You can check the deployment name with kubectl get deployments.
If the rollout does not successfully replace the pods, you can delete them manually as follows:
kubectl delete pod imply-query-xxxxx
kubectl delete pod imply-manager-xxxx
Replace the pod names appropriately for your instance.
Configuring cluster settings
To configure a Helm chart for your environment, work through the various sections of values.yaml. Use the following brief descriptions to help you plan your updates (a sketch of the overall layout follows this list):
- deployments: Identifies the cluster components to install. By default, all are set to true. Set the value to false for components running outside the cluster.
- security: Sets authentication and TLS security options between the services. See Security.
- manager: Deployment configuration for the Imply Manager.
- druid: Default settings for Druid in new clusters. Helm applies the settings during cluster creation the first time you deploy a chart. See Applying configuration changes to Druid for more detail.
- master: Configuration for the master service.
- query: Configuration for the query service.
- dataTierX: Configuration for the data tier services.
- zookeeper: Zookeeper configuration. See Zookeeper.
- mysql: Configuration for the default metadata store.
- minio: Configuration for the default deep storage. See Deep storage.
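For orientation, the top level of values.yaml roughly mirrors the section names above. The sketch below is illustrative only; the authoritative keys, defaults, and inline comments are in the file you fetched with helm show values, and the dataTier1 name here is an assumption based on the dataTierX pattern:
deployments:
  manager: true
  agents: true
  zookeeper: true
  mysql: true
  minio: true
security:
  ...
manager:
  ...
druid:
  ...
master:
  ...
query:
  ...
dataTier1:
  ...
zookeeper:
  ...
mysql:
  ...
minio:
  ...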
Scaling your cluster
See Kubernetes Scaling Reference for information about:
- Adding master, query, and data nodes to your cluster.
- Increasing available disk available to the data nodes.
- Adding data tiers.
- Adding clusters.
Using an external metadata storage database
To use an external database for metadata storage:
Disable the default database in the deployments section of values.yaml as follows:
deployments:
  manager: true
  agents: true
  zookeeper: true
  mysql: false # Set to false to disable.
  minio: true
Configure the connection to the external database. The database must be reachable from the pods within the deployment. The database name cannot contain spaces.
If your cluster is already running, there is no automatic data migration from the default MySQL deployment to your external database. Migrate the data manually if required.
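If you do need to migrate, one possible approach is to dump the default in-cluster MySQL database and load it into the external database. This is only a sketch: the imply-mysql-0 pod name, the credentials, and the placeholder password are assumptions, so substitute the values for your deployment:
kubectl exec imply-mysql-0 -- mysqldump -u root -p<password> imply-manager > imply-manager-dump.sql
mysql -h mysql.example.com -u root -p imply-manager < imply-manager-dump.sql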
Using an external MySQL DB
To use an external MySQL database, update the metadataStore section for both manager and druid, for example:
manager: # or druid:
  ...
  metadataStore:
    type: mysql
    host: mysql.example.com
    port: 3306
    user: root
    password: imply
    database: imply-manager # For the manager section only.
  ...
Using an external PostgreSQL DB
To use an external PostgreSQL database, update the metadataStore section for both manager and druid, for example:
manager: # or druid:
  ...
  metadataStore:
    type: postgresql
    host: postgresql.example.com
    port: 5432
    user: root
    password: imply
    database: imply-manager # For the manager section only.
  ...
Manually creating the metadata databases
An Imply deployment uses one database for the Imply Manager and one database for each Imply cluster. The Imply Manager automatically attempts to create the required databases if they do not exist.
If the provided user does not have database creation privileges, manually create the databases before running the application. Both databases must use the utf8mb4
default character set. For example:
CREATE SCHEMA `imply_manager` DEFAULT CHARACTER SET utf8mb4;
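The per-cluster database needs the same character set. For example, assuming the cluster's metadata database is named imply_druid (this name is an assumption; use whatever database name you configure in the druid metadataStore section):
CREATE SCHEMA `imply_druid` DEFAULT CHARACTER SET utf8mb4;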
Enabling TLS to the external metadata store
To enable TLS validation, add the TLS certificate to the metadataStore
section. For example:
manager: # or druid:
  metadataStore:
    ...
    tlsCert: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
  ...
Zookeeper
The Imply Helm chart includes a quickstart Zookeeper configuration that is not recommended for production. For production deployments, you can:
- Configure Zookeeper for high availability.
- Use an existing highly available Zookeeper ensemble.
Check the Zookeeper Documentation for more information on clustered Zookeeper ensembles.
Configuring a highly available ensemble
To increase the number of Zookeeper nodes, modify the replicaCount in the zookeeper section of values.yaml.
Zookeeper works best with an odd number of nodes; more nodes allow for more fault tolerance. Set the replicaCount to 3 or a higher odd number to enable a highly available Zookeeper ensemble. For example:
...
zookeeper:
  replicaCount: 3 # Change this value
...
Using an external Zookeeper ensemble
If you are using an external Zookeeper ensemble, disable the default Zookeeper deployment in the deployments section of values.yaml. For example:
deployments:
  manager: true
  agents: true
  zookeeper: false # Updated to false to disable.
  mysql: true
  minio: true
Specify the connection details for the external Zookeeper ensemble in the druid section of values.yaml. The ensemble must be reachable from the pods within the deployment. For example:
druid:
  ...
  zk:
    connectString: "zookeeper.example.com:2181"
If you have already created your cluster, use the Manager UI to configure the external Zookeeper ensemble.
Deep storage
The Imply Helm chart includes a quickstart MinIO configuration that is not recommended for production. See the Druid Deep storage documentation to choose the best supported deep storage type for your environment.
To use an external deep storage provider, disable the default MinIO deployment in the deployments section of values.yaml. For example:
deployments:
  manager: true
  agents: true
  zookeeper: true
  mysql: true
  minio: false # this line should be updated to false to disable
When you use a named deep storage type, the cluster automatically picks up:
- required settings
- authentication information
- index logs configuration
Also, the corresponding Druid extension is automatically added to the classpath. If you need to change these settings or others, add them to commonRuntimeProperties in the druid section of values.yaml.
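For example, if you were pointing the cluster at S3 deep storage, properties along these lines would go into commonRuntimeProperties. This is a sketch only: the bucket and base key are placeholders, and the exact formatting of the commonRuntimeProperties value is described by the comments in values.yaml:
druid.storage.type=s3
druid.storage.bucket=your-deep-storage-bucket
druid.storage.baseKey=druid/segments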
If you have already created your cluster, use the Manager UI to configure the external deep storage connection.
See the following for instructions for your deep storage type:
Accessing the cluster using services
By default, the cluster is not accessible outside of Kubernetes. You can run a kubectl port-forward command to enable access to the Manager UI, Pivot, or the Druid console.
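For example, a minimal port-forwarding sketch. The pod name placeholders follow the convention used elsewhere in this topic (check kubectl get pods for the real names), the Manager UI port of 9097 is an assumption, and 8888 and 9095 are the router and Pivot ports shown in the query.service example below:
kubectl port-forward pod/imply-manager-xxxx 9097
kubectl port-forward pod/imply-query-xxxxx 8888 9095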
If you want to enable access to these components without using kubectl, you can expose them as a Kubernetes service.
Review the Kubernetes docs to learn more and choose the best service type for your deployment.
The following examples demonstrate a LoadBalancer service. Depending on your Kubernetes configuration, the load balancer service type may not be available or supported by default.
Manager
To enable the LoadBalancer service for the Manager UI, update the manager.service entry in values.yaml:
manager:
  ...
  service:
    enabled: true # set to true to enable
    type: LoadBalancer
    port: "{{ ternary 80 443 (empty .Values.security.tls) }}" # will use port 80 if tls is not enabled, or 443 if enabled or any port specified
    # nodePort:
    # loadBalancerIP: set this to request a specific IP, refer to your Kubernetes provider documentation for more information
    protocol: TCP # TCP or HTTP, refer to your Kubernetes provider documentation for more information
    annotations: {} # provider specific annotation to customize the service
  ...
Query nodes
To enable the LoadBalancer service for the cluster's query nodes, update the query.service entry in values.yaml:
query:
  ...
  service:
    type: LoadBalancer
    routerPort: 8888 # Leave blank to not expose the router through the Service
    pivotPort: 9095 # Leave blank to not expose Pivot through the Service
    # routerNodePort:
    # pivotNodePort:
    # loadBalancerIP:
    protocol: TCP
    annotations: {}
  ...
Configuring security options
You can secure your cluster with Kubernetes TLS or authentication options.
TLS
Kubernetes secrets provide a store for your TLS certificates and keys.
Run the following command to create the Kubernetes secret containing the certificate and key:
kubectl create secret tls imply-ca --key path/to/ca.key --cert path/to/ca.crt
Uncomment the tls key in the security section of values.yaml to enable TLS:
security:
  ...
  tls:
    secretName: imply-ca
Run the following command to verify that you have enabled TLS:
kubectl logs -lapp.kubernetes.io/name=imply-manager --tail=10000
TLS: Enabled appears in the logs when it is enabled.
For information on how to generate certificates and general TLS information, see the Imply TLS Docs.
Authentication
Kubernetes secrets provide a store for your authentication tokens.
Run the following command to create the secret containing the authentication token:
kubectl create secret generic imply-auth-token --from-literal="auth-token=supersecret"
If you pass the authentication token this way, the token remains in your shell history. This is meant as an example only. Consult the Kubernetes secrets documentation for secure ways to create the secret.
Uncomment the auth key in the security section of values.yaml to enable authentication. For example:
security:
  ...
  auth:
    secretName: imply-auth-token
Run the following command to verify that you have enabled authentication:
kubectl logs -lapp.kubernetes.io/name=imply-manager --tail=10000
Authentication: Enabled appears in the logs when it is enabled.
See the Imply authentication docs to learn how enabling authentication affects your cluster.
Adding custom user files
The Imply Manager can push custom user files to the nodes of the Imply cluster. Specify the files in the Setup / Advanced Config section of the Manager. The Manager can write files to the following locations:
- User files (/opt/imply/user)
- Druid's classpath
- As a Hadoop dependency.
You can use the following processing for custom files:
- None
- Requires unpacking (extracts using tar -x)
- Is executable (sets chmod +x).
You can make these files available to the Manager:
- Over HTTP(S) by providing the URL.
- Locally by referencing them with the manager:/// scheme.
To make the files available to the Manager locally, copy them to the /mnt/var/user directory inside the manager container. With Kubernetes, you can mount a volume to that directory in the container and put the custom files on that volume.
After you make the files available in the specified directory, reference them from the UI using the manager:/// scheme, for example:
- File: /mnt/var/user/hdfs-site.xml -> Manager path: manager:///hdfs-site.xml
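For a quick test, you can also copy a file directly into a running manager pod. This is a sketch only; the pod name placeholder follows the convention used above, and files copied this way do not survive a pod restart, so prefer a mounted volume for anything permanent:
kubectl cp ./hdfs-site.xml imply-manager-xxxx:/mnt/var/user/hdfs-site.xml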
Applying configuration changes to Druid
When you run the helm upgrade command, Helm doesn't apply changes in the druid section of values.yaml to existing clusters. Use the Manager UI to configure existing clusters.
Alternatively, you can use the experimental update flag to apply modifications to existing clusters in single-cluster deployments only.
The update flag takes these values:
- disabled – The default; applies the changes in the druid section to new clusters only.
- hard – Updates an existing cluster through a hard upgrade. All nodes are stopped, possibly resulting in a service interruption.
- rolling – Applies updates to an existing cluster through a rolling upgrade scheme.
See related comments in values.yaml for more information.
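As an illustration only, enabling rolling updates might look like the following. The exact location and name of the update key can vary between chart versions, so confirm it against the comments in your values.yaml before relying on this sketch:
druid:
  ...
  update: rolling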
Updating the Imply Manager software
Helm makes it relatively easy to update the Manager and agent software.
Run the following command to make sure the repository is up to date:
helm repo update
If you have customized your deployment chart, merge your changes to the updated Imply chart before updating.
Run the helm upgrade command to install the newer version:
helm upgrade -f optional-overrides-file.yaml deployment-name imply/imply
Kubernetes applies the upgrade in a rolling manner, allowing you to avoid downtime.
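You can watch the rollout progress with standard Kubernetes tooling, for example (assuming the default imply-manager deployment name):
kubectl rollout status deployment imply-manager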