Kubernetes Scaling Reference
This topic describes the configuration options for scaling your Imply cluster on Kubernetes. For general information on how to deploy Imply on Kubernetes, see Install Imply Private on Kubernetes.
Configuring the node count
The default Helm configuration deploys one each of master, query, and data nodes. To increase the number of nodes running in the cluster, modify the value for replicaCount in values.yaml.
For example, to increase the master replica count to three nodes, set the following in values.yaml:
master:
  replicaCount: 3
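After you change values.yaml, you typically apply the update to a running release with helm upgrade. A minimal sketch, assuming an existing release named my-imply (use the release name reported by helm list):
helm upgrade -f values.yaml my-imply imply/imply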
Configuring available disk
To increase the disk available to data nodes you can:
- Increase the size of the requested persistent volume.
- Add volume requests and configure the Druid segment cache to span across them. Additional volume requests can take advantage of multiple disks on a node.
Carefully consider the disk size you allocate to data nodes before going to production. Increasing the size of a volume is not supported by all storageClass implementations.
Increasing the size of volume claims for a StatefulSet is currently out of the scope of this documentation. Depending on the storageClass used, this can be difficult to change. See https://github.com/kubernetes/kubernetes/issues/68737 for more information.
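If you are not sure whether your storageClass supports expansion, one way to check is with kubectl. A minimal sketch, where standard is only an example storage class name:
kubectl get storageclass
kubectl describe storageclass standard | grep -i allowvolumeexpansion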
Increasing requested volume size
To request a larger initial volume size, modify the following section in values.yaml:
dataTier1:
  ...
  segmentCacheVolume:
    storageClassName:
    resources:
      requests:
        storage: 40Gi
  ...
Configure the corresponding storage settings for Druid in the historicalRuntimeProperties key in the druid section of values.yaml. For example:
druid:
  historicalRuntimeProperties:
    - 'druid.segmentCache.locations=[{"path": "/mnt/var", "maxSize": 40000000000}]'
    - 'druid.server.maxSize=40000000000'
Keep in mind that this size covers the segment cache only. Allow about 25 GB of additional space for other internal processes, such as temporary storage for ingestion, heap dumps, and log files.
See the Druid configuration documentation for more information.
Adding a volume claim
You can add volume claims in the extraVolumeClaimTemplates section in values.yaml. For example:
dataTier1:
  ...
  extraVolumeClaimTemplates:
    - metadata:
        name: var2
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  ...
After you add a volume claim, make sure that the data pods mount this volume by updating the extraVolumeMounts section in values.yaml. For example:
dataTier1:
  ...
  extraVolumeMounts:
    - mountPath: "/mnt/var2"
      name: var2
  ...
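Once the data pods restart with the new claim, you can confirm that the volume is bound and mounted. A minimal sketch, assuming a hypothetical data pod named imply-data-tier1-0 (list your actual pod names with kubectl get pods):
kubectl get pvc
kubectl exec imply-data-tier1-0 -- df -h /mnt/var2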
The value of name in the extra volume mounts section must match the name from the extra volume claim templates section. Do not use the reserved names var or tmp; they are used by the default segmentCacheVolume and tmpVolume. To configure Druid to use the new volume, update the historical runtime properties in values.yaml:
druid:
  historicalRuntimeProperties:
    - 'druid.segmentCache.locations=[{"path": "/mnt/var", "maxSize": 40000000000}, {"path": "/mnt/var2", "maxSize": 20000000000}]'
    - 'druid.server.maxSize=60000000000'
This adds /mnt/var2 as another available location to cache segments and sets druid.server.maxSize to the combined size of the two locations. See the Druid configuration documentation for more information.
Adding additional data tiers
Adding data tiers works much the same way as scaling. By default, dataTier1 comprises one data node. To add a second data tier, increase the replicaCount for dataTier2 to the desired count. All configuration options available for the default tier are also available for dataTier2 and dataTier3.
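For example, a minimal values.yaml sketch that brings up a second tier with two data nodes (the replica count here is only an example):
dataTier2:
  replicaCount: 2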
For more information on Druid data tiers, see the Druid multitenancy documentation.
Adding clusters
Add a new cluster as follows:
1. Run helm list to display a list of the currently deployed releases. Note the release name of the existing deployment.
2. Create a values file for the new cluster:
   helm show values imply/imply > cluster-two.yaml
3. Edit the deployments section of cluster-two.yaml and disable everything except the agents key:
   ...
   deployments:
     manager: false
     agents: true
     zookeeper: false
     mysql: false
     minio: false
   ...
   The second cluster reuses the other resources.
4. Set the managerHost value to point to the Manager service defined in the existing deployment. Also configure the name for the new cluster, for example:
   ...
   agents:
     managerHost: wanton-toucan-imply-manager-int
     clusterName: cluster-two
   ...
   In this example, wanton-toucan is the release name of the deployment found using helm list.
5. Verify that your Kubernetes environment has enough capacity to accommodate the second Druid cluster, as defined by the cluster-two.yaml settings.
6. Run the following command to deploy the second cluster:
   helm install -f cluster-two.yaml cluster-two imply/imply
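After the install completes, you can confirm that the agents for the new cluster are running. A minimal sketch, assuming the pod names for the new release include the release name cluster-two (exact naming depends on the chart):
kubectl get pods | grep cluster-two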
Any changes to the druid settings in the second cluster do not change the defaults. Use the Manager UI to modify these values. Only changes to the master, query, and dataTierX sections take effect.