Imply Enterprise on Kubernetes (AKS)

This document describes how to use Kubernetes to deploy a highly available, distributed Imply cluster managed by the Imply Manager.

While this document takes you through the steps for setting up Imply with the Azure Kubernetes Service (AKS), it is not a complete guide to AKS. For more information, see the Azure Kubernetes Service documentation. For general information on Imply with Kubernetes, see Deploy with Kubernetes.

Azure components and sample architecture

The following diagram depicts the components deployed in this document:

[Figure: Azure sample setup]

As shown, the Imply cluster is deployed to Kubernetes pods and relies on Azure services for metadata storage and deep storage. As usual, the Imply cluster is made up of master, query, and data tier nodes, each of which is deployed as a Kubernetes pod. Underlying the cluster (and not shown) is ZooKeeper, which is used for orchestration. Setup and administration functions are handled through Helm and the Imply Manager.


Step 1: Set up the Azure Kubernetes Service (AKS)

  1. Open the Azure portal and choose Kubernetes services from the list of Azure services.

  2. On the Kubernetes services page, click Create Kubernetes service to open the form in which you configure the Kubernetes cluster:

    [Figure: Azure Kubernetes setup]

  3. In the Basics tab, set the following values:

    Subscription: Choose the Azure subscription under which you want to create the resources for the Imply cluster.
    Resource group: Choose the resource group you want to use, or click Create new to create a new group for Imply. Note that the remaining instructions on this page assume a resource group named RG_DRUID.
    Kubernetes cluster name: Any valid cluster name, such as Imply-K8s-1.
    Region: Choose any region desired, but note your choice, since all resources you create later need to reside in the same region as the one chosen for AKS.
    Kubernetes version: Keep the default.
    Primary node pool: For Node size, choose E8 v3 or better. For Node count, choose 3 or more.

    In general, for the Node size, one of the RAM-optimized Azure virtual machine types, such as the E-series virtual machines, is recommended. For an evaluation cluster, a good starting point is four E8 v3 machines, with three nodes operating as Data servers and a fourth node running the Query and Master servers. Production deployments may require larger machines and additional configuration tuning.
  4. In the Node pools tab, keep the defaults. Note that VM scale sets allow automated scaling of your cluster up or down to accommodate bursty traffic. For more information on implementing this feature, see the cluster autoscaler documentation.

  5. In the Authentication tab, keep the defaults.

  6. In the Networking tab, click the Advanced network configuration button.

  7. Choose the Virtual network you want to use, or click Create new to create a new one for the cluster.

  8. Verify or adjust the remaining default network settings. Note that the Kubernetes service address range provided for a network needs to be large enough to accommodate the number of service IPs for your cluster, given its size. For an average-sized cluster (10 to 20 nodes), the default is sufficient. The other settings can remain at their default values.

  9. In the Integration tab, select container monitoring and Log Analytics workspace if desired.

  10. In the Tags tab, optionally add tags.

  11. In the Review + Create tab, review your settings and click Create when ready.

The AKS cluster deployment will take a few minutes to complete.
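
If you prefer to script this step, the portal choices above map roughly onto the Azure CLI. The following is a minimal sketch rather than the documented procedure: the function name create_aks_cluster is hypothetical, the region is whatever you pass in, and --network-plugin azure is assumed to be the CLI counterpart of the Advanced network configuration option.

```shell
# Sketch only: create the resource group and a three-node AKS cluster.
# Arguments: resource group, cluster name, Azure region.
create_aks_cluster() {
  local resource_group="$1" cluster_name="$2" location="$3"

  # The resource group must exist before the cluster is created in it.
  az group create --name "$resource_group" --location "$location" &&
  az aks create \
    --resource-group "$resource_group" \
    --name "$cluster_name" \
    --location "$location" \
    --node-count 3 \
    --node-vm-size Standard_E8_v3 \
    --network-plugin azure \
    --generate-ssh-keys
}
```

For example, `create_aks_cluster RG_DRUID Imply-K8s-1 eastus` would create the resource group and cluster names used on this page; eastus is only a placeholder region.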

Step 2: Configure the Kubernetes CLI

You will be using the Kubernetes client, kubectl, to connect to the AKS cluster. First, follow these steps to set up kubectl:

  1. Start an Azure Cloud Shell or your local Azure CLI session.

  2. Get the credentials for the AKS cluster you just created:

    az aks get-credentials --resource-group rg_druid --name {your_cluster_name}

    rg_druid is the name of the resource group you configured in Step 1 (RG_DRUID; Azure resource group names are not case-sensitive). Replace {your_cluster_name} with the name of the cluster you just created, Imply-K8s-1 in the steps above.

  3. Run a test kubectl command to verify it is working:

    kubectl get nodes

    The output should list the nodes you created, along with their status, roles, age and version, looking something like this:

    NAME                                STATUS   ROLES   AGE    VERSION
    aks-agentpool-29979005-vmss000000   Ready    agent   6d2h   v1.15.10
    aks-agentpool-29979005-vmss000001   Ready    agent   6d2h   v1.15.10
    aks-agentpool-29979005-vmss000002   Ready    agent   6d2h   v1.15.10
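
To check the node status from a script rather than by eye, you can filter the kubectl output. This is a small sketch; the helper name count_not_ready is hypothetical, and it assumes the default column layout shown above, with STATUS in the second column.

```shell
# Sketch only: count nodes whose STATUS column is anything other than "Ready".
# Reads `kubectl get nodes` output (including the header row) on stdin.
# A node reported as "Ready,SchedulingDisabled" is counted as not ready.
count_not_ready() {
  awk 'NR > 1 && $2 != "Ready" { n++ } END { print n + 0 }'
}
```

Running `kubectl get nodes | count_not_ready` prints the number of nodes that are not in the Ready state, so a result of 0 means every node is ready.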

Step 3: Set up MySQL

An Imply cluster uses a relational database to store its metadata. On Azure, you use Azure Database for MySQL as the metadata database.

Set up MySQL as follows:

  1. From the Azure Portal home page, click Azure Database for MySQL servers. You may need to open the All services view to see this option.

  2. Click Create Azure Database for MySQL server to add a new MySQL server.

  3. In the Basics tab, set the following values under Project details:

    Subscription: Choose the subscription that contains the RG_DRUID resource group.
    Resource group: Choose the resource group you created for Druid, RG_DRUID.
  4. Under Server details, set these values:

    Server name: Any valid MySQL server name.
    Data source: None (use Backup only if restoring from a previously created database).
    Admin username: Enter a valid username.
    Password: Enter a valid password.
    Location: Choose the same location as chosen for the AKS cluster.
    Compute + storage: For a test or trial cluster, you can reduce the size to 2 cores (the setting is located along the top). However, you must keep at least the General Purpose tier, because the VNET rules used in Step 6 are not available below it.
  5. Enter tags for the resource, if desired.

  6. Review the configuration and click Create when ready.
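
The same server can also be created from the Azure CLI. The sketch below assumes that the az mysql server command group (the single-server deployment option, which the Connection security and VNET rule settings later on this page appear to belong to) is available in your Azure CLI version. The function name create_mysql_server is hypothetical, and GP_Gen5_2 is assumed here as a General Purpose SKU with 2 cores.

```shell
# Sketch only: create a General Purpose Azure Database for MySQL server.
# Arguments: resource group, server name, region, admin user, admin password.
# Passing the password as an argument can expose it in shell history.
create_mysql_server() {
  local resource_group="$1" server_name="$2" location="$3"
  local admin_user="$4" admin_password="$5"

  az mysql server create \
    --resource-group "$resource_group" \
    --name "$server_name" \
    --location "$location" \
    --admin-user "$admin_user" \
    --admin-password "$admin_password" \
    --sku-name GP_Gen5_2
}
```

Use the same region that you chose for the AKS cluster when calling the function.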

Step 4: Update the database timezone

To avoid timezone conflict errors, set the MySQL configuration to a specific timezone, as follows:

  1. Go to Settings > Server parameters.
  2. Scroll down to the time zone setting.
  3. Set the time zone value to +00:00, Coordinated Universal Time.
  4. Click Save.
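
The time zone parameter can also be set without the portal. This is a minimal sketch, again assuming the az mysql server command group is available; the function name set_mysql_timezone_utc is hypothetical.

```shell
# Sketch only: set the server-level time_zone parameter to UTC.
# Arguments: resource group, MySQL server name.
set_mysql_timezone_utc() {
  local resource_group="$1" server_name="$2"

  az mysql server configuration set \
    --resource-group "$resource_group" \
    --server-name "$server_name" \
    --name time_zone \
    --value "+00:00"
}
```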

Step 5: Collect MySQL access details

When the MySQL deployment completes, you may wish to note access details for the MySQL instance for use in configuring the cluster connection a few steps later:

  1. Go to the Overview section for the MySQL server.
  2. Note the Server name and Server admin login name.
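
If you are working in a shell, the same two values can be read with the Azure CLI. This is a sketch under the same assumption that az mysql server is available; the function name show_mysql_access_details and the output column labels are hypothetical.

```shell
# Sketch only: print the server host name and admin login as a small table.
# Arguments: resource group, MySQL server name.
show_mysql_access_details() {
  local resource_group="$1" server_name="$2"

  az mysql server show \
    --resource-group "$resource_group" \
    --name "$server_name" \
    --query "{serverName: fullyQualifiedDomainName, adminLogin: administratorLogin}" \
    --output table
}
```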

Step 6: Open a secure connection

In this step, you open up communication from the subnet where the AKS cluster resides to the subnet where MySQL resides.

  1. In the Azure Portal, find Azure Database for MySQL servers.

  2. Select the MySQL server you created in Step 3.

  3. Click Settings > Connection security.

  4. Look for the VNET rules section. If it is not present, verify that you created the MySQL server in the General Purpose tier.

    1. Create a new VNET rule by clicking + Adding existing virtual network, if there is an existing virtual network you want to use, or + Create new virtual network, to create a new one.

      Important: If the service endpoint is not enabled yet, you will need to enable it first. A message will appear in this case; click Enable to continue.

    2. Enter a name for the rule.
    3. Select the virtual network to use.
    4. Select the appropriate subnet for the Subnet name and click OK. (This will be the subnet where your Druid cluster resides.)
  5. Click Enabled next to Enforce SSL connection.

  6. You will need access details for MySQL to use in the Druid configuration later.

    1. Navigate to the Overview section for the MySQL server.
    2. Note or copy the Server name and Server admin login name to a location that you can access later.
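
The portal actions in this step correspond roughly to three Azure CLI calls: enabling the service endpoint on the subnet, adding the VNET rule, and enforcing SSL. The following is a sketch under stated assumptions: the function name allow_aks_subnet and the rule name allow-aks-subnet are hypothetical, the subnet is identified by its full resource ID, and Microsoft.Sql is assumed to be the service endpoint that Azure Database for MySQL uses.

```shell
# Sketch only: let the AKS subnet reach the MySQL server and require SSL.
# Arguments: resource group, MySQL server name, resource ID of the AKS subnet.
allow_aks_subnet() {
  local resource_group="$1" server_name="$2" subnet_id="$3"

  # Enable the service endpoint on the subnet first, as the portal prompts you to do.
  az network vnet subnet update \
    --ids "$subnet_id" \
    --service-endpoints Microsoft.Sql &&
  # Add the VNET rule that admits traffic from that subnet.
  az mysql server vnet-rule create \
    --resource-group "$resource_group" \
    --server-name "$server_name" \
    --name allow-aks-subnet \
    --subnet "$subnet_id" &&
  # Equivalent of clicking Enabled next to Enforce SSL connection.
  az mysql server update \
    --resource-group "$resource_group" \
    --name "$server_name" \
    --ssl-enforcement Enabled
}
```

Passing the subnet by resource ID keeps the sketch usable when the virtual network lives in a different resource group from the MySQL server.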

Step 7: Configure the storage account

The storage account is where the cluster stores its data. Create it as follows:

  1. From the Azure home page, click Storage accounts.

  2. Click Create storage account.

  3. Enter Basic details:

    Subscription: Choose the subscription that contains the RG_DRUID resource group.
    Resource group: Choose the resource group you created for Druid, RG_DRUID.
  4. Configure Instance details:

    Storage account name: Any valid account name.
    Location: The same location you chose for the AKS cluster.
    Performance: For a test or trial cluster, Standard is sufficient.
    Account kind: Choose StorageV2.
    Replication: Select your preferred replication.
    Access tier: Choose Hot.
  5. For the Networking details, under Network Connectivity, select either:

    • Public endpoint (all networks), or one of the other options in the networking section.
    • For higher security, use Public endpoint (selected networks) and select the subnet where your Druid cluster resides.
  6. Enter the Advanced details:

    Secure transfer required: Enabled.
    Blob soft delete: Disabled.
    Hierarchical namespace: Choose Enabled for hierarchical support (see details below) or Disabled.
  7. Enter tags for the resource, if desired.

  8. Review the configuration and click Create when ready.
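
As a scripted alternative, the settings above can be expressed with az storage account create. This is a minimal sketch: the function name create_storage_account is hypothetical, Standard_LRS is only an assumed replication choice, and the hierarchical namespace is left at the CLI default.

```shell
# Sketch only: create a StorageV2 account in the Hot tier with secure transfer required.
# Arguments: resource group, storage account name, region.
create_storage_account() {
  local resource_group="$1" account_name="$2" location="$3"

  # Standard_LRS is an assumption; substitute your preferred replication SKU.
  # Add --enable-hierarchical-namespace true if you want hierarchical support.
  az storage account create \
    --resource-group "$resource_group" \
    --name "$account_name" \
    --location "$location" \
    --sku Standard_LRS \
    --kind StorageV2 \
    --access-tier Hot \
    --https-only true
}
```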

Step 8: Create the storage container

Add a blob service container to the storage account you just created:

  1. Navigate to the Storage account you created.

  2. Scroll to Blob service.

  3. Select Containers.

  4. Depending on whether you enabled hierarchical support:

    • If disabled, create a new container using the + Container action.
    • If enabled, add a file system using the + File system action.
  5. For the name of the blob storage container, enter any valid name, such as druid.

  6. If hierarchical support is disabled, set the Public access level to Private (no anonymous access). If hierarchical support is enabled, this field is not present in the UI.

You may wish to note down access information for the new container, which you will need a few steps later in this procedure. To do so:

  1. Navigate to Settings > Access keys.

  2. Copy the value of either key1 or key2.
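
For the case where hierarchical support is disabled, the container and the access key can also be handled from the Azure CLI. The sketch below is illustrative only; the function name create_deep_storage_container is hypothetical, and it simply uses the first key that the account returns.

```shell
# Sketch only: look up an account key, then create a private blob container.
# Arguments: resource group, storage account name, container name.
create_deep_storage_container() {
  local resource_group="$1" account_name="$2" container_name="$3"
  local account_key

  # Take the first of the two account keys.
  account_key="$(az storage account keys list \
    --resource-group "$resource_group" \
    --account-name "$account_name" \
    --query "[0].value" \
    --output tsv)" || return 1

  # "--public-access off" matches the Private (no anonymous access) setting.
  az storage container create \
    --account-name "$account_name" \
    --account-key "$account_key" \
    --name "$container_name" \
    --public-access off
}
```

For example, `create_deep_storage_container RG_DRUID {your_storage_account} druid` creates the container name suggested above.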

Step 9: Install Druid

You can now open the Azure Cloud Shell and follow the instructions in the Kubernetes deployment guide to complete the installation using Helm:

  1. Add the Imply repository to Helm by running:

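    helm repo add imply {imply_helm_repo_url}
    helm repo update

    Replace {imply_helm_repo_url} with the URL of the Imply Helm repository; helm repo add requires both a name and a URL. See Deploy with Kubernetes for introductory information on using Helm with Imply.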

  2. Create a values.yaml file, populating it with the downloaded contents of the latest Helm chart from Imply, as follows:

    helm show values imply/imply > values.yaml
  3. In values.yaml, change the configuration of MinIO and MySQL to false, since you are using Azure Storage and the Azure MySQL instance you created in the previous steps.

  4. Create a Kubernetes Secret key from the Imply license key:

    1. Create a file named IMPLY_MANAGER_LICENSE_KEY and paste your license key as the content of the file.
    2. Create a K8s secret named imply-secrets by running:
      kubectl create secret generic imply-secrets --from-file=IMPLY_MANAGER_LICENSE_KEY
  5. Add metadata store configuration:

    1. Provide the Azure MySQL hostname, username, and password.
    2. Add the certificate, as follows:
    • manager.metadataStore.tlsCert If you enabled the Enforce SSL connection option, download the SSL certificate referenced in Step 1: Obtain SSL certificate in the Microsoft documentation. Copy the contents of the PEM file into the tlsCert value. (Use a | and indent each line with 4 spaces.)

    • (dataTier{X}|query|master).nodeSelector If you have more than one node pool, use this attribute to pin pods to specific node pools. See details in the Kubernetes nodeSelector documentation. For example, use agentpool: <node pool name>.

  6. Configure the same settings for the Druid metadata store configuration as well.

  7. Set Azure Storage as the deepStorage configuration, setting the Path to the name of the container you created in the storage account (druid in the example above), and fill in the other Azure deep storage settings.

  8. Run:

    helm install {release-name} imply/imply -f values.yaml

    Where {release-name} is the deployment name you chose.

  9. To access the Druid cluster and Imply Manager locally, make sure your Kubernetes context is set to {aks-cluster-name}. If needed, get the credentials again with the Azure CLI:

    az aks get-credentials --resource-group rg_druid --name {your_cluster_name}
  10. Use port forwarding to access Imply Manager, following the instructions printed when the helm install command finishes.
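
For the tlsCert value in substep 5 above, the certificate has to be pasted as an indented block under a | marker. A small helper can do the indentation for you. This is a sketch; the function name indent_pem is hypothetical, and the PEM file name is whatever you saved the downloaded certificate as.

```shell
# Sketch only: print a PEM file with every line indented by 4 spaces,
# ready to paste under a "tlsCert: |" line in values.yaml.
# Argument: path to the downloaded PEM file.
indent_pem() {
  sed 's/^/    /' "$1"
}
```

Run `indent_pem {your_certificate}.pem` and copy the output into values.yaml. If the tlsCert key itself is nested more deeply in your file, increase the indentation so the certificate lines sit further in than the key.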

To make Druid clusters accessible inside an Azure virtual network (VNet), follow Configure Azure CNI networking in Azure Kubernetes Service (AKS).

For more information on adapting the Helm chart for your deployment, see Deploy with Kubernetes.

Next steps

You now have a cluster running in AKS. If you are getting to know Imply, learn about how to load data in the quickstart.

For ongoing administration and maintenance, see Deploy with Kubernetes and Using the Imply Manager.