Kubernetes deep storage reference

This topic describes the deep storage configuration options for an Imply cluster on Kubernetes. For general information on how to deploy Imply on Kubernetes, see Install Imply Private on Kubernetes. See the Druid deep storage documentation to choose the deep storage type that best suits your deployment.

S3

To use Amazon S3 as the deep storage mechanism, set the deepStorage configuration in the druid section of values.yaml as follows:

druid:
  ...
  deepStorage:
    type: s3
    path: "s3://my-bucket-name/" # the bucket to use
    user: AKIAIOSFODNN7EXAMPLE # the AWS Access Key ID
    password: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY # the AWS Secret Access Key
  ...

You can omit the user and password fields if the nodes hosting the pods in this deployment have instance roles with access to the bucket.
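For example, a minimal sketch of a role-based configuration, assuming the instance role already grants read/write access to the bucket:

druid:
  ...
  deepStorage:
    type: s3
    path: "s3://my-bucket-name/" # credentials omitted; resolved from the instance role
  ...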

For more configuration options, see the Druid S3 extension documentation.

Azure

To use Microsoft Azure for deep storage, configure the deepStorage type in the druid section of values.yaml as follows:

druid:
  ...
  deepStorage:
    type: azure
    path: <container/optional-prefix>
    user: <azure-storage-account-name>
    password: <azure-account-shared-key>
  ...

You can omit the user and password fields if you are running the Imply cluster on Azure instances with a managed identity that grants access to the storage account.

The path you specify is prefixed with azure://. For example, a path value of druid results in the full path azure://druid, and the segment location becomes azure://druid/<partial_cluster-id>/segment.
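To illustrate, a sketch using a hypothetical storage account named mystorageaccount and a container named druid:

druid:
  ...
  deepStorage:
    type: azure
    path: druid # segments are stored under azure://druid/...
    user: mystorageaccount # hypothetical storage account name
    password: <azure-account-shared-key>
  ...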

To find the account shared key to use, see Authorize with Shared Key in the Azure documentation.

For more configuration options, see the Druid Azure extension documentation.

GCS

To use Google Cloud Storage for deep storage, configure the deepStorage type in the druid section of values.yaml as follows:

druid:
  ...
  deepStorage:
    type: google
    path: <bucket/path>
    user:
    password: |
      {
        "type": "service_account",
        "project_id": "project-id",
        "private_key_id": "key-id",
        ...
      }
  ...

The path you specify is prefixed with gs://. For example, a path value of druid results in the full path gs://druid, and the Druid segment location becomes gs://druid/<partial_cluster-id>/segment.

You can leave the user value blank because it does not apply to the GCS connection configuration.

A Google service account key is only required if you are running outside of Google Cloud, or if the service account attached to the VM does not have read/write access to the bucket.
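For example, when the VM's attached service account already has read/write access to the bucket, a minimal sketch (with a hypothetical bucket name) leaves both credential fields blank:

druid:
  ...
  deepStorage:
    type: google
    path: my-bucket/druid # hypothetical bucket and prefix; stored under gs://my-bucket/druid
    user:
    password:
  ...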

To create the Google service account key to use, see Creating and managing service account keys in the Google Cloud documentation.

For more configuration options, see the Druid Google Cloud Storage extension documentation.

HDFS

New cluster defaults for HDFS can be set under deepStorage in the druid section of values.yaml. For example:

druid:
  ...
  deepStorage:
    type: hdfs
    path: "hdfs://hdfs.example.com:9000/druid"
  ...

Store all HDFS configuration files on the Druid classpath. See Adding custom user files for instructions on how to add them.
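For reference, a minimal core-site.xml sketch, assuming the hdfs://hdfs.example.com:9000 namenode address from the example above; a production deployment typically also needs hdfs-site.xml and any security-related configuration:

<?xml version="1.0"?>
<configuration>
  <!-- Default filesystem URI; must match the host in deepStorage.path -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdfs.example.com:9000</value>
  </property>
</configuration>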

For more configuration options, see the Druid HDFS extension documentation.

NFS and Kubernetes-Supported Volumes

You can use any of the Kubernetes-supported volume types as deep storage. All pods must have access to the same deep storage volumes that you set up.

The type you select must also support the ReadWriteMany access mode. When choosing a volume type, keep in mind that different supported types offer different levels of data durability.

If you want to use a Persistent Volume Claim (PVC), first allocate the PVC outside the Helm chart. Then reference the claim as persistentVolumeClaim.claimName in the extraVolumes and extraVolumeMounts fields in values.yaml, as sketched below.

For more information, see Claims as volumes.
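As a sketch, a PVC allocated outside the Helm chart might look like the following, assuming a hypothetical storage class named shared-nfs that supports ReadWriteMany:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: deep-storage-claim # hypothetical claim name
spec:
  accessModes:
    - ReadWriteMany # required so that all pods can share the volume
  storageClassName: shared-nfs # hypothetical storage class
  resources:
    requests:
      storage: 500Gi # adjust to your segment storage needs

You would then reference the claim in values.yaml, for example:

master/query/dataTierX:
  ...
  extraVolumes:
    - name: deep-storage
      persistentVolumeClaim:
        claimName: deep-storage-claim
  extraVolumeMounts:
    - mountPath: "/mnt/deep-storage"
      name: deep-storage
  ...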

To use an existing PersistentVolume, mount it to each of:

  • master
  • query
  • dataTierX (only required for dataTiers with replicaCount greater than zero)

Update the following sections to mount the PersistentVolume:

master/query/dataTierX:
  ...
  extraVolumes:
    - name: deep-storage
      nfs:
        server: 10.108.211.244
        path: /segments
  extraVolumeMounts:
    - mountPath: "/mnt/deep-storage"
      name: deep-storage
  ...

This example shows an NFS volume mounted at the /mnt/deep-storage path. This is the same path that the volumeClaim uses automatically, if you use one.

To update the defaults for new clusters to use the mount, update deepStorage in the druid section of values.yaml as follows:

druid:
  ...
  deepStorage:
    type: local
    path: "/mnt/deep-storage"
  ...