2020.11


Kubernetes Scaling Reference

This topic describes the configuration options for scaling your Imply cluster on Kubernetes. For general information on how to deploy Imply on Kubernetes, see Install Imply Private on Kubernetes.

Configuring the node count

The default Helm configuration deploys one each of the master, query, and data nodes. To increase the number of nodes running in the cluster, modify the value of replicaCount in values.yaml.

For example, to increase the master replica count to three nodes, set the following in values.yaml:

master:
  replicaCount: 3
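The same pattern applies to the other node types. For example, a sketch (replica counts illustrative) scaling the query and data tiers in values.yaml:

```yaml
query:
  replicaCount: 2
dataTier1:
  replicaCount: 3
```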

Configuring available disk

To increase the disk available to data nodes, you can:

  • Increase the size of the requested persistent volume.
  • Add volume claims and configure the Druid segment cache to span them. Additional claims let a data node take advantage of multiple disks.

Carefully consider the disk size you allocate to data nodes before going to production. Increasing the size of a volume is not supported by all storageClass implementations.

Increasing the size of volume claims for a StatefulSet is currently out of the scope of this documentation. Depending on the storageClass used, this can be difficult to change. See https://github.com/kubernetes/kubernetes/issues/68737 for more information.

Increasing requested volume size

To request a larger initial volume size, modify the following section in values.yaml:

dataTier1:
  ...
  segmentCacheVolume:
    storageClassName:
    resources:
      requests:
        storage: 40Gi
  ...

Then configure the corresponding storage settings for Druid in the historicalRuntimeProperties key of the druid section in values.yaml. For example:

druid:
  historicalRuntimeProperties:
    - 'druid.segmentCache.locations=[{"path": "/mnt/var", "maxSize": 40000000000}]'
    - 'druid.server.maxSize=40000000000'

Keep in mind that this size covers the segment cache only. Allow about 25 GB of additional space for other internal processes, such as temporary storage for ingestion, heap dumps, and log files.
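The sizing arithmetic above can be sketched as follows (a hypothetical helper for illustration, not part of Imply or Druid):

```python
# Hypothetical sizing helper for data-node volumes: the volume must
# hold the segment cache plus roughly 25 GB of headroom for other
# internal processes (ingestion temp space, heap dumps, log files).

OVERHEAD_GB = 25  # non-cache headroom recommended above

def recommended_volume_gb(segment_cache_gb: int) -> int:
    """Return the total volume size to request for a data node."""
    return segment_cache_gb + OVERHEAD_GB

# A 40 GB segment cache (druid.server.maxSize=40000000000)
# suggests requesting roughly a 65 GB volume.
print(recommended_volume_gb(40))  # → 65
```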

See the Druid configuration documentation for more information.

Adding a volume claim

You can add volume claims to the extraVolumeClaimTemplates section in values.yaml. For example:

dataTier1:
  ...
  extraVolumeClaimTemplates:
    - metadata:
        name: var2
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  ...

After you add a volume claim, ensure that the data pods mount the volume by updating the extraVolumeMounts section in values.yaml. For example:

dataTier1:
  ...
  extraVolumeMounts:
    - mountPath: "/mnt/var2"
      name: var2
  ...

The value of name in the extraVolumeMounts section must match the name from the extraVolumeClaimTemplates section. Do not use the reserved names var or tmp; they are used by the default segmentCacheVolume and tmpVolume. To configure Druid to use the new volume, update the historical runtime properties in values.yaml:

druid:
  historicalRuntimeProperties:
    - 'druid.segmentCache.locations=[{"path": "/mnt/var", "maxSize": 40000000000}, {"path": "/mnt/var2", "maxSize": 20000000000}]'
    - 'druid.server.maxSize=60000000000'

This adds /mnt/var2 as another available location to cache segments and sets druid.server.maxSize to the combined size of the two locations. See the Druid configuration documentation for more information.
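The relationship above can be sanity-checked with a small sketch (a hypothetical check, not part of Imply or Druid): druid.server.maxSize should cover the combined maxSize of all configured segment cache locations.

```python
import json

# Hypothetical consistency check: sum the maxSize of each segment
# cache location and compare it with druid.server.maxSize.
locations = json.loads(
    '[{"path": "/mnt/var", "maxSize": 40000000000},'
    ' {"path": "/mnt/var2", "maxSize": 20000000000}]'
)
combined = sum(loc["maxSize"] for loc in locations)
print(combined)  # → 60000000000, the value set for druid.server.maxSize
```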

Adding additional data tiers

Adding data tiers works much the same way as scaling. By default, dataTier1 comprises one data node. To add a second data tier, increase the replicaCount for dataTier2 to the desired count. All configuration options available for the default tier are also available for dataTier2 and dataTier3.
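For example, a sketch of enabling a second tier in values.yaml, assuming dataTier2 accepts the same keys shown for dataTier1 above (values illustrative):

```yaml
dataTier2:
  replicaCount: 2
  segmentCacheVolume:
    resources:
      requests:
        storage: 40Gi
```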

More information on Druid data tiers can be found in the Druid multitenancy documentation.

Adding clusters

Add a new cluster as follows:

  1. Run helm list to display a list of the currently deployed releases.

Note the release name of the existing deployment.

  2. Create a values file for the new cluster as follows:

    helm show values imply/imply > cluster-two.yaml
    
  3. Edit the deployments section of cluster-two.yaml and disable everything except the agents key:

    ...
    deployments:
      manager: false
      agents: true
    
      zookeeper: false
      mysql: false
      minio: false
    ...
    

    The second cluster reuses the other resources.

  4. Set the managerHost value to point to the Manager service defined in the existing deployment. Also configure the name for the new cluster, for example:

    ...
    agents:
      managerHost: wanton-toucan-imply-manager-int
      clusterName: cluster-two
    ...
    

In this example, wanton-toucan is the release name of the deployment found using helm list.

Verify that your Kubernetes environment has enough capacity to accommodate the second Druid cluster, as defined by the cluster-two.yaml settings.

  6. Run the following command to deploy the second cluster:

    helm install -f cluster-two.yaml cluster-two imply/imply
    

Changes to the druid settings in the second cluster's values file do not override the defaults; use the Manager UI to modify these values instead. Only changes to the master, query, and dataTierX sections take effect.

Copyright © 2020 Imply Data, Inc