Imply Enterprise on Kubernetes (AKS)
This document describes how to use Kubernetes to deploy a highly available, distributed Imply cluster managed by the Imply Manager.
While this document takes you through the steps for setting up Imply with the Azure Kubernetes Service (AKS), it does not constitute a complete guide for AKS. For much more information, see the Azure Kubernetes Service documentation. Also, for general information on Imply with Kubernetes, see Deploy with Kubernetes.
Azure components and sample architecture
The following diagram depicts the components deployed in this document:
As shown, the Imply cluster components are deployed as Kubernetes pods, and the cluster relies on Azure services for metadata storage and deep storage. As usual, the Imply cluster is made up of Master, Query, and Data tier nodes, which run as Kubernetes pods. Underlying the cluster (and not shown) is ZooKeeper for orchestration. Setup and administration functions are handled through Helm and the Imply Manager.
Requirements
An Azure account and access to the Azure Cloud Shell or Azure CLI.
A license key from Imply.
Helm 2.16.0 or later, or Helm 3.2.0 or later
Step 1: Set up the Azure Kubernetes Service (AKS)
Open the Azure portal and choose Kubernetes services from the list of Azure services.
In the Kubernetes services page, click Create Kubernetes service, opening the form in which you can configure the Kubernetes cluster:
In the Basics tab, set the following values:
| Setting | Value |
| --- | --- |
| Subscription | Choose the Azure subscription under which you want to create the resources for the Imply cluster. |
| Resource group | Choose the resource group you want to use, or click Create new to create a new group for Imply. Note that the remaining instructions on this page assume a resource group named RG_DRUID. |
| Kubernetes cluster name | Any valid cluster name, such as Imply-K8s-1. |
| Region | Choose any region desired, but note your choice, since all resources you create later need to reside in the same region as the one chosen for AKS. |
| Kubernetes version | Keep the default. |
| Primary node pool | For Node size, choose E8 v3 or better. For Node count, choose 3 or more. |

In general, for the Node size, one of the RAM-optimized Azure virtual machine types, such as the E-series virtual machines, is recommended. For an evaluation cluster, a good starting point is four E8 v3 machines, with three nodes operating as Data servers and a fourth node running the Query and Master servers. Production deployments may require larger machines and additional configuration tuning.
In the Node pools tab, keep the defaults. Note that VM scale sets allow for automated up- or down-scaling of your cluster to accommodate bursty traffic. For more information on implementing this feature, see the cluster autoscaler documentation.
In the Authentication tab, keep the defaults.
In the Networking tab, click the Advanced network configuration button.
Choose the Virtual network you want to use, or click Create new to create a new one for the cluster.
Verify or adjust the remaining default network settings. Note that the Kubernetes service address range provided for a network needs to be large enough to accommodate the number of service IPs for your cluster, given its size. For an average-sized cluster (10 to 20 nodes), the default is sufficient. The other settings can remain at their default values.
In the Integration tab, select container monitoring and Log Analytics workspace if desired.
In the Tags tab, optionally add tags.
In the Review + Create tab, review your settings and click Create when ready.
The AKS cluster deployment will take a few minutes to complete.
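If you prefer to script this step, the Azure CLI can create an equivalent cluster. The sketch below reuses the resource group, cluster name, and node size from the settings above; the region is a placeholder, and the command is a minimal example rather than a complete `az aks create` reference.

```bash
# Minimal sketch: create the AKS cluster from the CLI with the values used in this guide.
# Assumes the RG_DRUID resource group already exists; fill in <region> yourself.
az aks create \
  --resource-group RG_DRUID \
  --name Imply-K8s-1 \
  --location <region> \
  --node-count 3 \
  --node-vm-size Standard_E8_v3 \
  --generate-ssh-keys
```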
Step 2: Configure the Kubernetes CLI
You will be using the Kubernetes client, `kubectl`, to connect to the AKS cluster. First follow these steps to set up `kubectl`:
Start an Azure Cloud Shell or your local Azure CLI session.
Get the credentials for the AKS cluster you just created:
az aks get-credentials --resource-group rg_druid --name {your_cluster_name}
`rg_druid` is the name of the resource group you configured. Replace `{your_cluster_name}` with the name of the cluster you just created, `Imply-K8s-1` in the steps above.
Run a test `kubectl` command to verify that it is working:
kubectl get nodes
The output should list the nodes you created, along with their status, roles, age and version, looking something like this:
NAME STATUS ROLES AGE VERSION
aks-agentpool-29979005-vmss000000 Ready agent 6d2h v1.15.10
aks-agentpool-29979005-vmss000001 Ready agent 6d2h v1.15.10
aks-agentpool-29979005-vmss000002 Ready agent 6d2h v1.15.10
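Later, in Step 9, you can pin pods to specific node pools with a nodeSelector. To see which node pool each node belongs to (AKS typically labels nodes with agentpool=<node pool name>), you can run:

```bash
# Show the agentpool label for each node; the value is the node pool name to use in nodeSelector
kubectl get nodes -L agentpool
```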
Step 3: Set up MySQL
An Imply cluster uses a relational database to store its metadata. On Azure, you use Azure Database for MySQL as the metadata database.
Set up MySQL as follows:
From the Azure Portal home page, click Azure Database for MySQL servers. You may need to show all services to see this option.
Click Create Azure Database for MySQL server to add a new MySQL server.
In the Basics tab, set the following values under Project details:
| Setting | Value |
| --- | --- |
| Subscription | Choose the subscription that contains the RG_DRUID resource group. |
| Resource group | Choose the resource group you created for Druid, RG_DRUID. |

Under Server details, set these values:

| Setting | Value |
| --- | --- |
| Server name | Any valid MySQL server name. |
| Data source | None. (Use Backup only if restoring from a previously created database.) |
| Admin username | Enter a valid username. |
| Password | Enter a valid password. |
| Location | Choose the same location as chosen for the AKS cluster. |
| Version | 8.0 |
| Compute + storage | For a test or trial cluster, you can reduce the size to 2 cores (located along the top). However, you must maintain at least the General Purpose tier. |

Enter tags for the resource, if desired.
Review the configuration and click Create when ready.
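If you would rather use the CLI, a roughly equivalent sketch follows. The server name, admin credentials, and region are placeholders, and the SKU shown is just one General Purpose option.

```bash
# Sketch: create a General Purpose Azure Database for MySQL server from the CLI.
# Keep the location the same as the AKS cluster; replace the <...> placeholders.
az mysql server create \
  --resource-group RG_DRUID \
  --name <mysql-server-name> \
  --location <region> \
  --admin-user <admin-user> \
  --admin-password <admin-password> \
  --sku-name GP_Gen5_2 \
  --version 8.0
```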
Step 4: Update the database timezone
To avoid timezone conflict errors, set the MySQL configuration to a specific timezone, as follows:
- Go to Settings > Server parameters.
- Scroll down to the time zone setting.
- Set the time zone value to +00:00 (Coordinated Universal Time).
- Click Save.
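The same change can be made from the CLI; a sketch, assuming the server name placeholder used earlier:

```bash
# Sketch: set the MySQL time_zone server parameter to UTC
az mysql server configuration set \
  --resource-group RG_DRUID \
  --server-name <mysql-server-name> \
  --name time_zone \
  --value "+00:00"
```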
Step 5: Collect MySQL access details
When the MySQL deployment completes, you may wish to note access details for the MySQL instance for use in configuring the cluster connection a few steps later:
- Go to the Overview section for the MySQL server.
- Note the Server name and Server admin login name.
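The same details are available from the CLI if that is more convenient; a sketch:

```bash
# Sketch: print the host name and admin login to use later in values.yaml
az mysql server show \
  --resource-group RG_DRUID \
  --name <mysql-server-name> \
  --query "{host:fullyQualifiedDomainName, adminUser:administratorLogin}" \
  --output table
```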
Step 6: Open a secure connection
In this step, you open up communication from the subnet where the AKS cluster resides to the subnet where MySQL resides.
In the Azure Portal, find Azure Database for MySQL servers.
Select the MySQL database you created in the previous step.
Click Settings > Connection security.
Look for the VNET rules section. If it is not present, verify that you created a General Purpose tier MySQL server.
- Create a new VNET rule by clicking + Add existing virtual network, if there is an existing virtual network you want to use, or + Create new virtual network, to create a new one.
Important: If the service endpoint is not enabled yet, you will need to enable it first. A message will appear in this case; click Enable to continue.
- Enter a name for the rule.
- Select the virtual network to use.
- Select the appropriate subnet for the Subnet name and click OK. (This will be the subnet where your Druid cluster resides.)
Click Enabled next to Enforce SSL connection.
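If you prefer to script this step, the sketch below shows the equivalent CLI calls. The VNet and subnet names are placeholders for the network the AKS cluster uses, and the rule name is arbitrary.

```bash
# Sketch: enable the Microsoft.Sql service endpoint on the AKS subnet,
# then allow that subnet to reach the MySQL server with a VNET rule.
az network vnet subnet update \
  --resource-group RG_DRUID \
  --vnet-name <vnet-name> \
  --name <aks-subnet-name> \
  --service-endpoints Microsoft.Sql

az mysql server vnet-rule create \
  --resource-group RG_DRUID \
  --server-name <mysql-server-name> \
  --name allow-aks-subnet \
  --vnet-name <vnet-name> \
  --subnet <aks-subnet-name>
```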
Step 7: Configure the storage account
The storage account is where the cluster stores its data. Create it as follows:
From the Azure home page, click Storage accounts.
Click Create storage account.
Enter Basic details:
| Setting | Value |
| --- | --- |
| Subscription | Choose the subscription that contains the RG_DRUID resource group. |
| Resource group | Choose the resource group you created for Druid, RG_DRUID. |

Configure Instance details:

| Setting | Value |
| --- | --- |
| Storage account name | Any valid account name. |
| Location | The same location you chose for the AKS cluster. |
| Performance | For a test or trial cluster, Standard is sufficient. |
| Account kind | Choose StorageV2. |
| Replication | Select your preferred replication. |
| Access tier | Choose Hot. |

For the Networking details, under Network connectivity, select either:
- Public endpoint (all networks), or one of the other options in the networking section.
- For higher security, use Public endpoint (selected networks) and select the subnet where your Druid cluster resides.
Enter the Advanced details:
| Setting | Value |
| --- | --- |
| Secure transfer required | Enabled |
| Blob soft delete | Disabled |
| Hierarchical namespace | Choose Enabled for hierarchical support (see details below) or Disabled. |

Enter tags for the resource, if desired.
Review the configuration and click Create when ready.
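The storage account can also be created from the CLI; a sketch, with the account name and region as placeholders and locally redundant storage standing in for whichever replication you chose above:

```bash
# Sketch: create a StorageV2 account with secure transfer required and the Hot access tier
az storage account create \
  --resource-group RG_DRUID \
  --name <storageaccountname> \
  --location <region> \
  --kind StorageV2 \
  --sku Standard_LRS \
  --access-tier Hot \
  --https-only true
```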
Step 8: Create the storage container
Hierarchical namespaces aren't supported for storage accounts.
Add a blob service container to the storage account you just created:
- Navigate to the storage account you created.
- Scroll to Blob service.
- Select Containers.
- Depending on whether you enabled hierarchical support:
  - If disabled, create a new container using the + Container action.
  - If enabled, add a file system using the + File system action.
- For the name of the blob storage container, enter any valid name, such as `druid`.
- If hierarchical support is disabled, set the Public access level to Private (no anonymous access). If hierarchical support is enabled, this field is not present in the UI.
You may wish to note down access information for the new container, which you will need a few steps later in this procedure. To do so:
Navigate to Settings > Access keys.
Copy the key1 or key2 value.
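The key lookup and container creation can also be done from the CLI; a sketch, assuming the container name `druid` used above:

```bash
# Sketch: list the account keys; note key1 or key2 for the deep storage configuration
az storage account keys list \
  --resource-group RG_DRUID \
  --account-name <storageaccountname> \
  --output table

# Create the blob container, if you did not already create it in the portal
az storage container create \
  --account-name <storageaccountname> \
  --account-key <key1-value> \
  --name druid \
  --public-access off
```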
Step 9: Install Druid
You can now open the Azure Cloud Shell and follow the instructions in the Kubernetes deployment guide to complete the installation using Helm:
Add the Imply repository to Helm by running:
helm repo add imply https://static.imply.io/onprem/helm
helm repo update
See Deploy with Kubernetes for introductory information on using Helm with Imply.
Create a `values.yaml` file, populating it with the default values of the latest Imply Helm chart, as follows:
helm show values imply/imply > values.yaml
In `values.yaml`, set the MinIO and MySQL deployments to false, since you're using Azure Storage and the MySQL instance you created in the previous steps.
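For orientation, the relevant part of the file might look like the sketch below; the key names are assumptions to check against the defaults printed by `helm show values imply/imply`.

```yaml
# values.yaml (excerpt) -- sketch only; verify the key names against your chart version
deployments:
  # ...other deployment flags left at their defaults...
  mysql: false   # use the Azure Database for MySQL instance created earlier
  minio: false   # use Azure Storage for deep storage
```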
Create a Kubernetes secret from the Imply license key:
- Create a file named `IMPLY_MANAGER_LICENSE_KEY` and paste your license key as the content of the file.
- Create a Kubernetes secret named `imply-secrets` by running:
kubectl create secret generic imply-secrets --from-file=IMPLY_MANAGER_LICENSE_KEY
Add the metadata store configuration:
- Provide the Azure MySQL hostname, username, and password.
- Add the certificate in `manager.metadataStore.tlsCert`: if you enabled the Enforce SSL connection option, download the SSL certificate referenced in Step 1: Obtain SSL certificate in the Microsoft documentation, and copy the contents of the PEM file into the `tlsCert` value. (Use a `|` and indent each line with 4 spaces.)
- `(dataTier{X}|query|master).nodeSelector`: If you have more than one node pool, use this attribute to pin pods to specific node pools. See details in the Kubernetes nodeSelector documentation. For example, use `agentpool: <node pool name>`.
Configure the same settings for the Druid metadata store configuration as well.
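For orientation, a sketch of what the metadata store sections might look like follows. Treat the structure as an assumption to check against the defaults from `helm show values imply/imply`, and replace the placeholders with the values collected in Step 5.

```yaml
# values.yaml (excerpt) -- sketch only; key names and nesting may differ in your chart version
manager:
  metadataStore:
    host: <mysql-server-name>.mysql.database.azure.com   # Server name from Step 5
    user: <admin-user>@<mysql-server-name>                # Azure single server logins take the form user@servername
    password: <admin-password>
    tlsCert: |
      -----BEGIN CERTIFICATE-----
      ...contents of the downloaded PEM file...
      -----END CERTIFICATE-----
druid:
  metadataStore:
    # repeat the same host, user, password, and tlsCert settings here
```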
Set Azure Storage as the `deepStorage` configuration, setting the path to the name of the container you created in the storage account (`druid` in the example above), along with the other Azure deep storage settings.
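A sketch of the deep storage section follows. As with the metadata store, the key names and the `azure://` path form are assumptions to verify against the chart's default `values.yaml`.

```yaml
# values.yaml (excerpt) -- sketch only; check key names and path format against your chart
deepStorage:
  type: azure
  path: azure://druid              # the blob container created in Step 8
  user: <storageaccountname>       # the storage account name
  password: <storage-account-key>  # key1 or key2 from the Access keys page
```

When `values.yaml` is ready, install the chart by running: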
helm install {release-name} imply/imply -f values.yaml
Where `{release-name}` is the deployment name you chose.
To access the Druid cluster and Imply Manager locally, make sure you've switched the Kubernetes context to `{aks-cluster-name}` by getting the credentials from the Azure CLI:
az aks get-credentials --resource-group rg_druid --name {your_cluster_name}
Use port forwarding to access Imply Manager, following the instructions printed when the Helm installation finishes.
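If the chart's notes are no longer on screen, a minimal port-forwarding sketch follows; the service name is a placeholder to look up with `kubectl get svc`, and 9097 is the default Imply Manager UI port.

```bash
# Sketch: forward the Imply Manager UI to localhost.
# The service name depends on the release name you chose for helm install.
kubectl get svc
kubectl port-forward svc/<manager-service-name> 9097:9097
```

Then browse to http://localhost:9097 to reach the Manager.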
To configure Druid clusters to be accessible inside a virtual network (VNet), follow Configure Azure CNI networking in Azure Kubernetes Service (AKS).
For more information on adapting the Helm chart for your deployment, see Deploy with Kubernetes.
Next steps
You now have a cluster running in AKS. If you are getting to know Imply, learn about how to load data in the quickstart.
For ongoing administration and maintenance, see Deploy with Kubernetes and Using the Imply Manager.