Imply Enterprise on Kubernetes (AKS)
This document describes how to use Kubernetes to deploy a highly available, distributed Imply cluster managed by the Imply Manager.
While this document takes you through the steps for setting up Imply with the Azure Kubernetes Service (AKS), it does not constitute a complete guide for AKS. For much more information, see the Azure Kubernetes Service documentation. Also, for general information on Imply with Kubernetes, see Deploy with Kubernetes.
Azure components and sample architecture
The following diagram depicts the components deployed in this document:
As shown, the Imply cluster components are deployed as Kubernetes pods, and the cluster relies on Azure services for metadata storage and deep storage. As usual, the Imply cluster is made up of Master, Query, and Data tier nodes, which run as Kubernetes pods. Underlying the cluster (and not shown) is ZooKeeper for orchestration. Setup and administration functions are handled through Helm and the Imply Manager.
Requirements
An Azure account and access to the Azure Cloud Shell or Azure CLI.
A license key from Imply.
Helm 2.16.0 or later, or Helm 3.2.0 or later
Step 1: Set up the Azure Kubernetes Service (AKS)
Open the Azure portal and choose Kubernetes services from the list of Azure services.
In the Kubernetes services page, click Create Kubernetes service, opening the form in which you can configure the Kubernetes cluster:
In the Basics tab, set the following values:
| Setting | Value |
| --- | --- |
| Subscription | Choose the Azure subscription under which you want to create the resources for the Imply cluster. |
| Resource group | Choose the resource group you want to use, or click Create new to create a new group for Imply. Note that the remaining instructions on this page assume a resource group named RG_DRUID. |
| Kubernetes cluster name | Any valid cluster name, such as Imply-K8s-1. |
| Region | Choose any region desired, but note your choice, since all resources you create later need to reside in the same region as the one chosen for AKS. |
| Kubernetes version | Keep the default. |
| Primary node pool | For Node size, choose E8 v3 or better. For Node count, choose 3 or more. |

In general, for the Node size, one of the RAM-optimized Azure virtual machine types, such as the E-series virtual machines, is recommended. For an evaluation cluster, a good starting point is four E8 v3 machines, with three nodes operating as Data servers and a fourth node running the Query and Master servers. Production deployments may require larger machines and additional configuration tuning.
In the Node pools tab, keep the defaults. Note that VM scale sets allow for automated up- or down-scaling of your cluster to accommodate bursty traffic. For more information on implementing this feature, see the cluster autoscaler documentation.
In the Authentication tab, keep the defaults.
In the Networking tab, click the Advanced network configuration button.
Choose the Virtual network you want to use, or click Create new to create a new one for the cluster.
Verify or adjust the remaining default network settings. Note that the Kubernetes service address range provided for a network needs to be large enough to accommodate the number of service IPs for your cluster, given its size. For an average-sized cluster (10 to 20 nodes), the default is sufficient. The other settings can remain at their default values.
In the Integration tab, select container monitoring and Log Analytics workspace if desired.
In the Tags tab, optionally add tags.
In the Review + Create tab, review your settings and click Create when ready.
The AKS cluster deployment will take a few minutes to complete.
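If you prefer to script this step, the Azure CLI can create an equivalent cluster. The sketch below reuses the resource group, cluster name, and node size from the settings above; the region is a placeholder, and the command is a minimal example rather than a complete `az aks create` reference.

```bash
# Minimal sketch: create the AKS cluster from the CLI with the values used in this guide.
# Assumes the RG_DRUID resource group already exists; fill in <region> yourself.
az aks create \
  --resource-group RG_DRUID \
  --name Imply-K8s-1 \
  --location <region> \
  --node-count 3 \
  --node-vm-size Standard_E8_v3 \
  --generate-ssh-keys
```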
Step 2: Configure the Kubernetes CLI
You will be using the Kubernetes client, `kubectl`, to connect to the AKS cluster. First follow these steps to set up `kubectl`:
Start an Azure Cloud Shell or your local Azure CLI session.
Get the credentials for the AKS cluster you just created:
az aks get-credentials --resource-group rg_druid --name {your_cluster_name}
`rg_druid` is the name of the resource group you configured. Replace `{your_cluster_name}` with the name of the cluster you just created, `Imply-K8s-1` in the steps above.
Run a test `kubectl` command to verify that it is working:
kubectl get nodes
The output should list the nodes you created, along with their status, roles, age and version, looking something like this:
NAME STATUS ROLES AGE VERSION
aks-agentpool-29979005-vmss000000 Ready agent 6d2h v1.15.10
aks-agentpool-29979005-vmss000001 Ready agent 6d2h v1.15.10
aks-agentpool-29979005-vmss000002 Ready agent 6d2h v1.15.10
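Later, in Step 9, you can pin pods to specific node pools with a nodeSelector. To see which node pool each node belongs to (AKS typically labels nodes with agentpool=<node pool name>), you can run:

```bash
# Show the agentpool label for each node; the value is the node pool name to use in nodeSelector
kubectl get nodes -L agentpool
```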
Step 3: Set up MySQL
An Imply cluster uses a relational database to store its metadata. On Azure, you use Azure Database for MySQL as the metadata database.
Set up MySQL as follows:
From the Azure Portal home page, click Azure Database for MySQL servers. You may need to show all services to see this option.
Click Create Azure Database for MySQL server to add a new MySQL server.
In the Basics tab, set the following values under Project details:
| Setting | Value |
| --- | --- |
| Subscription | Choose the subscription that contains the RG_DRUID resource group. |
| Resource group | Choose the resource group you created for Druid, RG_DRUID. |

Under Server details, set these values:

| Setting | Value |
| --- | --- |
| Server name | Any valid MySQL server name. |
| Data source | None. (Use Backup only if restoring from a previously created database.) |
| Admin username | Enter a valid username. |
| Password | Enter a valid password. |
| Location | Choose the same location as chosen for the AKS cluster. |
| Version | 8.0 |
| Compute + storage | For a test or trial cluster, you can reduce the size to 2 cores (located along the top). However, you must maintain at least the General Purpose tier. |

Enter tags for the resource, if desired.
Review the configuration and click Create when ready.
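If you would rather use the CLI, a roughly equivalent sketch follows. The server name, admin credentials, and region are placeholders, and the SKU shown is just one General Purpose option.

```bash
# Sketch: create a General Purpose Azure Database for MySQL server from the CLI.
# Keep the location the same as the AKS cluster; replace the <...> placeholders.
az mysql server create \
  --resource-group RG_DRUID \
  --name <mysql-server-name> \
  --location <region> \
  --admin-user <admin-user> \
  --admin-password <admin-password> \
  --sku-name GP_Gen5_2 \
  --version 8.0
```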
Step 4: Update the database timezone
To avoid timezone conflict errors, set the MySQL configuration to a specific timezone, as follows:
- Go to Settings > Server parameters.
- Scroll down to the time zone setting.
- Set the time zone value to +00:00 (Coordinated Universal Time).
- Click Save.
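The same change can be made from the CLI; a sketch, assuming the server name placeholder used earlier:

```bash
# Sketch: set the MySQL time_zone server parameter to UTC
az mysql server configuration set \
  --resource-group RG_DRUID \
  --server-name <mysql-server-name> \
  --name time_zone \
  --value "+00:00"
```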
Step 5: Collect MySQL access details
When the MySQL deployment completes, you may wish to note access details for the MySQL instance for use in configuring the cluster connection a few steps later:
- Go to the Overview section for the MySQL server.
- Note the Server name and Server admin login name.
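The same details are available from the CLI if that is more convenient; a sketch:

```bash
# Sketch: print the host name and admin login to use later in values.yaml
az mysql server show \
  --resource-group RG_DRUID \
  --name <mysql-server-name> \
  --query "{host:fullyQualifiedDomainName, adminUser:administratorLogin}" \
  --output table
```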
Step 6: Open a secure connection
In this step, you open up communication from the subnet where the AKS cluster resides to the subnet where MySQL resides.
In the Azure Portal, find Azure Database for MySQL servers.
Select the MySQL database you created in the previous step.
Click Settings > Connection security.
Look for the VNET rules section. If it is not present, verify that you created a General Purpose tier MySQL server.
- Create a new VNET rule by clicking + Add existing virtual network, if there is an existing virtual network you want to use, or + Create new virtual network, to create a new one.
Important: If the service endpoint is not enabled yet, you will need to enable it first. A message will appear in this case; click Enable to continue.
- Enter a name for the rule.
- Select the virtual network to use.
- Select the appropriate subnet for the Subnet name and click OK. (This will be the subnet where your Druid cluster resides.)
Click Enabled next to Enforce SSL connection.
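If you prefer to script this step, the sketch below shows the equivalent CLI calls. The VNet and subnet names are placeholders for the network the AKS cluster uses, and the rule name is arbitrary.

```bash
# Sketch: enable the Microsoft.Sql service endpoint on the AKS subnet,
# then allow that subnet to reach the MySQL server with a VNET rule.
az network vnet subnet update \
  --resource-group RG_DRUID \
  --vnet-name <vnet-name> \
  --name <aks-subnet-name> \
  --service-endpoints Microsoft.Sql

az mysql server vnet-rule create \
  --resource-group RG_DRUID \
  --server-name <mysql-server-name> \
  --name allow-aks-subnet \
  --vnet-name <vnet-name> \
  --subnet <aks-subnet-name>
```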
Step 7: Configure the storage account
The storage account is where the cluster stores its data. Create it as follows:
From the Azure home page, click Storage accounts.
Click Create storage account.
Enter Basic details:
| Setting | Value |
| --- | --- |
| Subscription | Choose the subscription that contains the RG_DRUID resource group. |
| Resource group | Choose the resource group you created for Druid, RG_DRUID. |

Configure Instance details:

| Setting | Value |
| --- | --- |
| Storage account name | Any valid account name. |
| Location | The same location you chose for the AKS cluster. |
| Performance | For a test or trial cluster, Standard is sufficient. |
| Account kind | Choose StorageV2. |
| Replication | Select your preferred replication. |
| Access tier | Choose Hot. |

For the Networking details, under Network connectivity, select either:
- Public endpoint (all networks), or one of the other options in the networking section.
- For higher security, use Public endpoint (selected networks) and select the subnet where your Druid cluster resides.
Enter the Advanced details:
| Setting | Value |
| --- | --- |
| Secure transfer required | Enabled |
| Blob soft delete | Disabled |
| Hierarchical namespace | Choose Enabled for hierarchical support (see details below) or Disabled. |

Enter tags for the resource, if desired.
Review the configuration and click Create when ready.
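The storage account can also be created from the CLI; a sketch, with the account name and region as placeholders and locally redundant storage standing in for whichever replication you chose above:

```bash
# Sketch: create a StorageV2 account with secure transfer required and the Hot access tier
az storage account create \
  --resource-group RG_DRUID \
  --name <storageaccountname> \
  --location <region> \
  --kind StorageV2 \
  --sku Standard_LRS \
  --access-tier Hot \
  --https-only true
```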
Step 8: Create the storage container
Hierarchical namespaces aren't supported for storage accounts.
Add a blob service container to the storage account you just created:
- Navigate to the storage account you created.
- Scroll to Blob service.
- Select Containers.
- Depending on whether you enabled hierarchical support:
  - If disabled, create a new container using the + Container action.
  - If enabled, add a file system using the + File system action.
- For the name of the blob storage container, enter any valid name, such as `druid`.
- If hierarchical support is disabled, set the Public access level to Private (no anonymous access). If hierarchical support is enabled, this field is not present in the UI.
You may wish to note down access information for the new container, which you will need a few steps later in this procedure. To do so:
Navigate to Settings > Access keys.
Copy the key1 or key2 value.
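The key lookup and container creation can also be done from the CLI; a sketch, assuming the container name `druid` used above:

```bash
# Sketch: list the account keys; note key1 or key2 for the deep storage configuration
az storage account keys list \
  --resource-group RG_DRUID \
  --account-name <storageaccountname> \
  --output table

# Create the blob container, if you did not already create it in the portal
az storage container create \
  --account-name <storageaccountname> \
  --account-key <key1-value> \
  --name druid \
  --public-access off
```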
Step 9: Install Druid
You can now open the Azure Cloud Shell and follow the instructions in the Kubernetes deployment guide to complete the installation using Helm:
Add the Imply repository to Helm by running:
helm repo add imply https://static.imply.io/onprem/helm
helm repo update
See Deploy with Kubernetes for introductory information on using Helm with Imply.
Create a `values.yaml` file, populating it with the default values of the latest Imply Helm chart, as follows:
helm show values imply/imply > values.yaml
In `values.yaml`, set the MinIO and MySQL deployments to false, since you're using Azure Storage and the MySQL instance you created in the previous steps.
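For orientation, the relevant part of the file might look like the sketch below; the key names are assumptions to check against the defaults printed by `helm show values imply/imply`.

```yaml
# values.yaml (excerpt) -- sketch only; verify the key names against your chart version
deployments:
  # ...other deployment flags left at their defaults...
  mysql: false   # use the Azure Database for MySQL instance created earlier
  minio: false   # use Azure Storage for deep storage
```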
Create a Kubernetes secret from the Imply license key:
- Create a file named `IMPLY_MANAGER_LICENSE_KEY` and paste your license key as the content of the file.
- Create a Kubernetes secret named `imply-secrets` by running:
kubectl create secret generic imply-secrets --from-file=IMPLY_MANAGER_LICENSE_KEY
Add the metadata store configuration:
- Provide the Azure MySQL hostname, username, and password.
- Add the certificate in `manager.metadataStore.tlsCert`: if you enabled the Enforce SSL connection option, download the SSL certificate referenced in Step 1: Obtain SSL certificate in the Microsoft documentation, and copy the contents of the PEM file into the `tlsCert` value. (Use a `|` and indent each line with 4 spaces.)
- `(dataTier{X}|query|master).nodeSelector`: If you have more than one node pool, use this attribute to pin pods to specific node pools. See details in the Kubernetes nodeSelector documentation. For example, use `agentpool: <node pool name>`.
Configure the same settings for the Druid metadata store configuration as well.
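For orientation, a sketch of what the metadata store sections might look like follows. Treat the structure as an assumption to check against the defaults from `helm show values imply/imply`, and replace the placeholders with the values collected in Step 5.

```yaml
# values.yaml (excerpt) -- sketch only; key names and nesting may differ in your chart version
manager:
  metadataStore:
    host: <mysql-server-name>.mysql.database.azure.com   # Server name from Step 5
    user: <admin-user>@<mysql-server-name>                # Azure single server logins take the form user@servername
    password: <admin-password>
    tlsCert: |
      -----BEGIN CERTIFICATE-----
      ...contents of the downloaded PEM file...
      -----END CERTIFICATE-----
druid:
  metadataStore:
    # repeat the same host, user, password, and tlsCert settings here
```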
Set Azure Storage as the `deepStorage` configuration, setting the path to the name of the container you created in the storage account (`druid` in the example above), along with the other Azure deep storage settings.
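A sketch of the deep storage section follows. As with the metadata store, the key names and the `azure://` path form are assumptions to verify against the chart's default `values.yaml`.

```yaml
# values.yaml (excerpt) -- sketch only; check key names and path format against your chart
deepStorage:
  type: azure
  path: azure://druid              # the blob container created in Step 8
  user: <storageaccountname>       # the storage account name
  password: <storage-account-key>  # key1 or key2 from the Access keys page
```

When `values.yaml` is ready, install the chart by running: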
helm install {release-name} imply/imply -f values.yaml
Where `{release-name}` is the deployment name you chose.
To access the Druid cluster and Imply Manager locally, make sure you've switched the Kubernetes context to `{aks-cluster-name}` by getting the credentials from the Azure CLI:
az aks get-credentials --resource-group rg_druid --name {your_cluster_name}
Use port forwarding to access Imply Manager, following the instructions printed when the Helm installation finishes.
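If the chart's notes are no longer on screen, a minimal port-forwarding sketch follows; the service name is a placeholder to look up with `kubectl get svc`, and 9097 is the default Imply Manager UI port.

```bash
# Sketch: forward the Imply Manager UI to localhost.
# The service name depends on the release name you chose for helm install.
kubectl get svc
kubectl port-forward svc/<manager-service-name> 9097:9097
```

Then browse to http://localhost:9097 to reach the Manager.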
To configure Druid clusters to be accessible inside a virtual network (VNet), follow Configure Azure CNI networking in Azure Kubernetes Service (AKS).
For more information on adapting the Helm chart for your deployment, see Deploy with Kubernetes.
Next steps
You now have a cluster running in AKS. If you are getting to know Imply, learn about how to load data in the quickstart.
For ongoing administration and maintenance, see Deploy with Kubernetes and Using the Imply Manager.