This document describes how to use Kubernetes to deploy a highly available, distributed Imply cluster managed by the Imply Manager.
While this document takes you through the steps for setting up Imply with the Azure Kubernetes Service (AKS), it is not a complete guide to AKS. For more information, see the Azure Kubernetes Service documentation. For general information on Imply with Kubernetes, see Deploy with Kubernetes.
The following diagram depicts the components deployed in this document:
As shown, the Imply cluster is deployed to Kubernetes pods and relies on Azure services for metadata storage and deep storage. As usual, the Imply cluster is made up of master, query, and data tier nodes, each deployed as Kubernetes pods. Underlying the cluster (and not shown) is ZooKeeper for orchestration. Setup and administration functions are handled through Helm and the Imply Manager.
An Azure account and access to the Azure Cloud Shell or Azure CLI.
A license key from Imply.
Helm 2.16.0 or later, or Helm 3.2.0 or later.
Open the Azure portal and choose Kubernetes services from the list of Azure services.
In the Kubernetes services page, click Create Kubernetes service to open the form in which you configure the Kubernetes cluster:
In the Basics tab, set the following values:
Setting | Value |
---|---|
Subscription | Choose the Azure subscription under which you want to create the resources for the Imply cluster. |
Resource group | Choose the resource group you want to use, or click Create new to create a new group for Imply. Note that the remaining instructions on this page assume a resource group named RG_DRUID. |
Kubernetes cluster name | Any valid cluster name, such as Imply-K8s-1. |
Region | Choose any region desired, but note your choice, since all resources you create later need to reside in the same region as the one chosen for AKS. |
Kubernetes version | Keep the default. |
Primary node pool | For Node size, choose E8 v3 or better. For Node count, choose 3 or more. In general, for the Node size, one of the RAM-optimized Azure virtual machine types, such as the E-series virtual machines, is recommended. For an evaluation cluster, a good starting point is four E8 v3 machines, with three nodes operating as Data servers and a fourth node running the Query and Master servers. Production deployments may require larger machines and additional configuration tuning. |
In the Node pools tab, keep the defaults. Note that VM scale sets allow for automated up- or down-scaling of your cluster to accommodate bursty traffic. For more information on implementing this feature, see the cluster autoscaler documentation.
In the Authentication tab, keep the defaults.
In the Networking tab, click the Advanced network configuration button.
Choose the Virtual network you want to use, or click Create new to create a new one for the cluster.
Verify or adjust the remaining default network settings. Note that the Kubernetes service address range provided for a network needs to be large enough to accommodate the number of service IPs for your cluster, given its size. For an average-sized cluster (10 to 20 nodes), the default is sufficient. The other settings can remain at their default values.
In the Integration tab, select container monitoring and Log Analytics workspace if desired.
In the Tags tab, optionally add tags.
In the Review + Create tab, review your settings and click Create when ready.
The AKS cluster deployment will take a few minutes to complete.
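The portal steps above can also be approximated from the Azure CLI. The following is a minimal sketch rather than part of the original procedure: the function name `create_imply_aks`, the `eastus` region, and the `Standard_E8_v3` size string are illustrative assumptions, and `--network-plugin azure` is assumed to correspond to the advanced network configuration chosen in the portal. Verify the sizes offered in your region before running it.

```shell
# Sketch: create the resource group and a 3-node AKS cluster from the CLI.
# All names and the region are placeholders; adjust them to your environment.
RESOURCE_GROUP="RG_DRUID"
CLUSTER_NAME="Imply-K8s-1"
LOCATION="eastus"            # assumption: reuse this region for every later resource
NODE_SIZE="Standard_E8_v3"   # assumption: CLI name for the E8 v3 size

create_imply_aks() {
  # Create the resource group first, then the cluster inside it.
  az group create --name "$RESOURCE_GROUP" --location "$LOCATION" &&
  az aks create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --location "$LOCATION" \
    --node-count 3 \
    --node-vm-size "$NODE_SIZE" \
    --network-plugin azure \
    --generate-ssh-keys
}

# Run it explicitly when you are ready:
# create_imply_aks
```

Defining the call as a function keeps the settings reviewable in one place before anything is created.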
You will use the Kubernetes client, kubectl, to connect to the AKS cluster. First, follow these steps to set up kubectl:
Start an Azure Cloud Shell or your local Azure CLI session.
Get the credentials for the AKS cluster you just created:
az aks get-credentials --resource-group rg_druid --name {your_cluster_name}
Here, rg_druid is the name of the resource group you configured earlier (RG_DRUID; Azure resource group names are not case-sensitive). Replace {your_cluster_name} with the name of the cluster you just created, Imply-K8s-1 in the steps above.
Run a test kubectl command to verify that it is working:
kubectl get nodes
The output should list the nodes you created, along with their status, roles, age and version, looking something like this:
NAME STATUS ROLES AGE VERSION
aks-agentpool-29979005-vmss000000 Ready agent 6d2h v1.15.10
aks-agentpool-29979005-vmss000001 Ready agent 6d2h v1.15.10
aks-agentpool-29979005-vmss000002 Ready agent 6d2h v1.15.10
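If you want to script this check instead of reading the output by eye, one possible approach is to count the nodes whose STATUS column is exactly `Ready`. The helper name `count_ready_nodes` is hypothetical, and the sketch assumes the default column layout shown above.

```shell
# Sketch: count nodes that report STATUS "Ready".
# Nodes in states such as "NotReady" or "Ready,SchedulingDisabled" are not counted.
count_ready_nodes() {
  kubectl get nodes --no-headers |
    awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Example check against the node count chosen in the Basics tab:
# [ "$(count_ready_nodes)" -ge 3 ] && echo "all expected nodes are Ready"
```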
An Imply cluster uses a relational database to store its metadata. With Azure, you use Azure Database for MySQL as the metadata database.
Set up MySQL as follows:
From the Azure Portal home page, click Azure Database for MySQL servers. You may need to switch to the All services view to see this option.
Click Create Azure Database for MySQL server to add a new MySQL server.
In the Basics tab, set the following values under Project details:
Setting | Value |
---|---|
Subscription | Choose the subscription that contains the RG_DRUID resource group. |
Resource group | Choose the resource group you created for Druid, RG_DRUID. |
Under Server details, set these values:
Setting | Value |
---|---|
Server name | Any valid MySQL server name. |
Data source | None (Use Backup only if restoring from a previously created database). |
Admin username | Enter a valid username. |
Password | Enter a valid password. |
Location | Choose the same location as chosen for the AKS cluster. |
Version | 5.7+ |
Compute + storage | For a test or trial cluster, you can reduce the size to 2 cores (the selector is located along the top). However, you must maintain at least a General Purpose tier. |
Enter tags for the resource, if desired.
Review the configuration and click Create when ready.
To avoid timezone conflict errors, set the MySQL configuration to a specific timezone, as follows:
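For readers who prefer the CLI, the server settings from the tables above and the time zone parameter can be sketched as follows. This is an illustration under stated assumptions, not the procedure from this guide: it uses the single-server `az mysql server` command group, which may be unavailable in newer Azure CLI releases; the server name, admin user, and region are hypothetical; and the UTC offset in the usage comment is only an example value.

```shell
# Sketch: create a General Purpose, 2-vCore MySQL 5.7 server and set its time zone.
RESOURCE_GROUP="RG_DRUID"
MYSQL_SERVER="imply-mysql"   # hypothetical server name
MYSQL_ADMIN="implyadmin"     # hypothetical admin username
LOCATION="eastus"            # assumption: must match the AKS region

create_imply_mysql() {
  # $1: admin password, passed in rather than hard-coded in the script
  az mysql server create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$MYSQL_SERVER" \
    --location "$LOCATION" \
    --admin-user "$MYSQL_ADMIN" \
    --admin-password "$1" \
    --sku-name GP_Gen5_2 \
    --version 5.7 \
    --ssl-enforcement Enabled
}

set_mysql_time_zone() {
  # $1: a fixed offset such as "+00:00"
  az mysql server configuration set \
    --resource-group "$RESOURCE_GROUP" \
    --server-name "$MYSQL_SERVER" \
    --name time_zone \
    --value "$1"
}

# Usage:
# create_imply_mysql 'choose-a-strong-password'
# set_mysql_time_zone "+00:00"
```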
When the MySQL deployment completes, you may wish to note access details for the MySQL instance for use in configuring the cluster connection a few steps later:
In this step, you open up communication from the subnet where the AKS cluster resides to the subnet where MySQL resides.
In the Azure Portal, find Azure Database for MySQL servers.
Select the MySQL database you created in the previous step.
Click Settings > Connection security.
Look for the VNET rules section. If it is not present, verify that the MySQL server you created uses the General Purpose tier.
Click Enabled next to Enforce SSL connection.
You will need access details for MySQL to use in the Imply Druid configuration later.
a. Navigate to the Overview section for the MySQL server.
b. Note or copy the Server name and Server admin login name to a location that you can access later.
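The same connection-security steps might be scripted roughly as shown below. This is a sketch under several assumptions: the virtual network and subnet names are hypothetical, the virtual network is assumed to live in the same resource group as the server (AKS-managed networks often live in a separate node resource group), and the single-server `az mysql server` commands are assumed to be available in your CLI version.

```shell
# Sketch: let the AKS subnet reach the MySQL server, then print access details.
RESOURCE_GROUP="RG_DRUID"
MYSQL_SERVER="imply-mysql"   # hypothetical server name
VNET_NAME="aks-vnet"         # hypothetical virtual network name
SUBNET_NAME="default"        # hypothetical subnet name

allow_aks_subnet() {
  # A VNet rule requires the Microsoft.Sql service endpoint on the subnet.
  az network vnet subnet update \
    --resource-group "$RESOURCE_GROUP" \
    --vnet-name "$VNET_NAME" \
    --name "$SUBNET_NAME" \
    --service-endpoints Microsoft.Sql &&
  az mysql server vnet-rule create \
    --resource-group "$RESOURCE_GROUP" \
    --server-name "$MYSQL_SERVER" \
    --name allow-aks \
    --vnet-name "$VNET_NAME" \
    --subnet "$SUBNET_NAME"
}

show_mysql_access_details() {
  # Prints the server host name and admin login needed for the Imply configuration.
  az mysql server show \
    --resource-group "$RESOURCE_GROUP" \
    --name "$MYSQL_SERVER" \
    --query "{host: fullyQualifiedDomainName, admin: administratorLogin}" \
    --output table
}
```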
The storage account is where the cluster stores its data. Create it as follows:
From the Azure home page, click Storage accounts.
Click Create storage account.
Enter Basic details:
Setting | Value |
---|---|
Subscription | Choose the subscription that contains the RG_DRUID resource group. |
Resource group | Choose the resource group you created for Druid, RG_DRUID. |
Configure Instance details:
Setting | Value |
---|---|
Storage account name | Any valid account name. |
Location | The same location you chose for the AKS cluster. |
Performance | For a test or trial cluster, Standard is sufficient. |
Account kind | Choose StorageV2. |
Replication | Select your preferred replication. |
Access tier | Choose Hot. |
For the Networking details, under Network Connectivity, select either:
Enter the Advanced details:
Setting | Value |
---|---|
Secure transfer required | Enabled. |
Blob soft delete | Disabled. |
Hierarchical namespace | Choose Enabled for hierarchical support (see details below) or Disabled. |
Enter tags for the resource, if desired.
Review the configuration and click Create when ready.
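As a rough CLI equivalent of the storage account settings above, consider the following sketch. The account name and region are hypothetical, `Standard_LRS` is only one of the replication choices, and the name check reflects the Azure rule that storage account names are 3 to 24 lowercase letters and digits.

```shell
# Sketch: validate the account name, then create a StorageV2 account on the Hot tier.
RESOURCE_GROUP="RG_DRUID"
STORAGE_ACCOUNT="implydeepstore01"   # hypothetical account name
LOCATION="eastus"                    # assumption: must match the AKS region

valid_storage_name() {
  # Reject any character outside a-z and 0-9, then check the length bounds.
  case "$1" in
    *[!a-z0-9]*) return 1 ;;
  esac
  [ "${#1}" -ge 3 ] && [ "${#1}" -le 24 ]
}

create_imply_storage() {
  valid_storage_name "$STORAGE_ACCOUNT" || {
    echo "invalid storage account name: $STORAGE_ACCOUNT" >&2
    return 1
  }
  az storage account create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$STORAGE_ACCOUNT" \
    --location "$LOCATION" \
    --sku Standard_LRS \
    --kind StorageV2 \
    --access-tier Hot \
    --https-only true
}
```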
Add a blob service container to the storage account you just created:
Navigate to the Storage account you created.
Scroll to Blob service.
Select Containers.
Depending on whether you enabled hierarchical support:
For the name of the blob storage container, enter any valid name, such as druid.
If hierarchical support is disabled, set the Public access level to Private (no anonymous access). If hierarchical support is enabled, this field is not present in the UI.
You may wish to note down access information for the new container, which you will need in a few steps:
Navigate to Settings > Access keys.
Copy the key1 or key2 key.
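The container and key steps above could also be done from the CLI. The sketch below assumes the hypothetical account name `implydeepstore01` and the container name `druid`; `[0].value` selects the first of the two account keys.

```shell
# Sketch: fetch an account key and create a private blob container with it.
RESOURCE_GROUP="RG_DRUID"
STORAGE_ACCOUNT="implydeepstore01"   # hypothetical account name
CONTAINER="druid"

get_storage_key() {
  # Prints the first access key as plain text.
  az storage account keys list \
    --resource-group "$RESOURCE_GROUP" \
    --account-name "$STORAGE_ACCOUNT" \
    --query "[0].value" \
    --output tsv
}

create_druid_container() {
  # $1: storage account key; "off" means no anonymous access
  az storage container create \
    --name "$CONTAINER" \
    --account-name "$STORAGE_ACCOUNT" \
    --account-key "$1" \
    --public-access off
}

# Usage:
# create_druid_container "$(get_storage_key)"
```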
You can now open the Azure Cloud Shell and follow the instructions in the Kubernetes deployment guide to complete the installation using Helm:
Add the Imply repository to Helm by running:
helm repo add imply https://static.imply.io/onprem/helm
helm repo update
See Deploy with Kubernetes for introductory information on using Helm with Imply.
Create a values.yaml file, populating it with the default values from the latest Imply Helm chart, as follows:
helm show values imply/imply > values.yaml
In values.yaml, set the MinIO and MySQL deployments to false, since you are using Azure storage and the MySQL instance you created in the previous steps.
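Editing the file by hand is the safest route. If you would rather script the change, the sketch below shows one way to do it, assuming the flags appear as `mysql:` and `minio:` keys indented by two spaces under a top-level `deployments:` block; confirm that layout in your generated values.yaml first. The function name is hypothetical.

```shell
# Sketch: set the bundled MySQL and MinIO flags to false in a values file.
# Only lines inside the top-level "deployments:" block are touched.
disable_bundled_services() {
  # $1: path to values.yaml; a backup is kept next to it with a .bak suffix
  sed -i.bak \
    -e '/^deployments:/,/^[^ #]/ s/^\(  mysql:\).*/\1 false/' \
    -e '/^deployments:/,/^[^ #]/ s/^\(  minio:\).*/\1 false/' \
    "$1"
}

# Usage:
# disable_bundled_services values.yaml
```

The address range stops at the next unindented key, so top-level sections that happen to share the same names are left alone.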
Add the license key, typically from a file:
kubectl create secret generic imply-secrets --from-file=IMPLY_MANAGER_LICENSE_KEY
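The command above expects a file named IMPLY_MANAGER_LICENSE_KEY in the current directory. If your license is stored under a different name or path, the `key=path` form of `--from-file` keeps the secret key name unchanged. The wrapper name and the path in the usage comment are hypothetical.

```shell
# Sketch: create the license secret from a file stored at an arbitrary path.
create_license_secret() {
  # $1: path to the file that contains the Imply license key
  kubectl create secret generic imply-secrets \
    --from-file=IMPLY_MANAGER_LICENSE_KEY="$1"
}

# Usage:
# create_license_secret "$HOME/imply/license.txt"
```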
Add metadata store configuration:
Provide the Azure MySQL hostname, username, and password.
Add the certificate, as follows:
manager.metadataStore.tlsCert
If you enabled the Enforce SSL connection option, download the SSL certificate referenced in Step 1: Obtain SSL certificate in the Microsoft documentation. Copy the contents of the PEM file into the tlsCert value. (Use a | and indent each line with 4 spaces.)
(dataTier{X}|query|master).nodeSelector
If you have more than one node pool, use this attribute to pin pods to specific node pools. See details in the Kubernetes nodeSelector documentation. For example, use agent pool:
Configure the same settings for the Druid metadata store configuration as well.
Set Azure Storage as the deepStorage configuration, setting the path to the name of the container you created in the storage account (druid in the earlier example), and supply the other Azure deep storage settings.
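To make the settings above concrete, the sketch below writes them to a separate overrides file. Only `manager.metadataStore.tlsCert`, `deepStorage`, and `nodeSelector` are named in this guide; every other key name, the `azure` type value, and the `agentpool: agentpool` label are assumptions to check against the values.yaml you generated and against your node pool name. Values in angle brackets are placeholders to replace by hand.

```shell
# Sketch: write Azure-specific Helm overrides to their own file.
# The quoted 'EOF' keeps the shell from expanding anything inside the template.
cat > azure-overrides.yaml <<'EOF'
manager:
  metadataStore:
    type: mysql
    host: <server-name>.mysql.database.azure.com
    port: 3306
    user: <admin-user>@<server-name>
    password: <admin-password>
    database: <manager-database>
    tlsCert: |
      -----BEGIN CERTIFICATE-----
      <contents of the downloaded PEM file>
      -----END CERTIFICATE-----
druid:
  metadataStore:
    type: mysql
    host: <server-name>.mysql.database.azure.com
    port: 3306
    user: <admin-user>@<server-name>
    password: <admin-password>
    database: <druid-database>
  deepStorage:
    type: azure
    path: druid
    user: <storage-account-name>
    password: <key1-or-key2>
master:
  nodeSelector:
    agentpool: agentpool
EOF
```

Helm accepts several `-f` flags and gives precedence to later files, so this file can be passed after values.yaml instead of being merged into it.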
Run:
helm install {release-name} imply/imply -f values.yaml
Where {release-name} is the deployment name you chose.
To access the Imply Druid cluster and the Imply Manager locally, make sure you’ve switched the Kubernetes context to {aks-cluster-name}, and get the credentials from the Azure CLI:
az aks get-credentials --resource-group rg_druid --name {your_cluster_name}
Use port forwarding to access Imply Manager, following the instructions presented when finishing the helm installation.
To make Druid clusters accessible inside an Azure virtual network, follow Configure Azure CNI networking in Azure Kubernetes Service (AKS).
For more information on adapting the Helm chart for your deployment, see Deploy with Kubernetes.
You now have a cluster running in AKS. If you are getting to know Imply, learn about how to load data in the quickstart.
For ongoing administration and maintenance, see Deploy with Kubernetes and Using the Imply Manager.