Deploy on Google Kubernetes Engine

This document describes how to deploy a highly available, distributed Imply cluster on Google Cloud Platform, with Google Kubernetes Engine (GKE) acting as the underlying orchestration engine.

While this document takes you through the steps for setting up Imply with the Google Cloud Platform, it does not constitute a complete guide for GKE or Google Cloud. For more information, see the Google Kubernetes Engine documentation. Also, for general information on deploying Imply on Kubernetes, see Deploy with Kubernetes.

Google Cloud components and sample architecture

This guide takes you through the steps for creating a Google Cloud deployment depicted in the following diagram:

GCP sample setup

As shown, the Imply cluster components run in Kubernetes pods.

For external dependencies, we'll use MySQL (provided by Cloud SQL, Google Cloud's managed relational database service) for metadata storage and Google Cloud Storage for deep storage. All components will be in a single GCP project.

We recommend running all components in a dedicated, private VPC network. Thus, if you want to rely on an existing Cloud SQL instance, make sure you create the GKE cluster in the same VPC. Otherwise, be sure to create the new MySQL instance in the same VPC as the GKE cluster.

The machines in the private VPC will need to be able to connect to the Internet. The following instructions include the minimal steps for configuring the firewall rules required for a private cluster, but you can find detailed instructions in Example GKE Setup in the Google documentation.

Requirements

Step 1. Set up Google Kubernetes Engine (GKE)

  1. From the Google Cloud Platform home, create or select the project in which you want to deploy the GKE cluster.
  2. Click Kubernetes Engine from the left navigation pane. It may take a moment for the Kubernetes Engine API to be enabled. The Kubernetes configuration page appears: GCP Kubernetes setup
  3. In the Cluster basics page, enter a name for the Kubernetes cluster.
  4. Choose the desired location type for the cluster, zonal or regional. Regional (multiple zones) is recommended for production deployments requiring high availability, whereas zonal (single zone) is usually sufficient for experimental or testing environments.
  5. Click default-pool from the left pane, and select the Enable autoscaling option.
  6. Under Nodes, for the machine type, choose one of the high memory machines, such as n2-highmem-8. The default image type, Container-Optimized OS, is acceptable.
  7. For Node security, Metadata, and Automation settings, you can keep the defaults.
  8. For Networking, choose whether the cluster should be public or private (recommended). If you choose a private network, you'll need to configure Cloud NAT for the Kubernetes cluster, as described in the next step, in order to reach the Imply servers and download the software.
  9. Click Create to create the cluster.

After a moment, the new cluster should appear in the Kubernetes cluster list with a green check mark, indicating that it is running.

Note that you can check the status of the cluster by clicking Connect next to your new cluster, and then Open Workloads dashboard.
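
If you prefer the command line to the console, the cluster settings above can be approximated with the gcloud CLI. The following is a minimal sketch under stated assumptions, not the procedure this guide was written against: the cluster name, region, and node counts are placeholders, and the function wrapper exists only so that nothing runs until you call it.

```shell
# Placeholder values (assumptions); replace with your own.
CLUSTER_NAME="imply-k8s-1"
REGION="us-central1"
MACHINE_TYPE="n2-highmem-8"

# Create a regional GKE cluster whose default node pool autoscales.
# For a regional cluster, the node counts below apply per zone.
create_imply_cluster() {
  gcloud container clusters create "$CLUSTER_NAME" \
    --region "$REGION" \
    --machine-type "$MACHINE_TYPE" \
    --num-nodes 1 \
    --enable-autoscaling --min-nodes 1 --max-nodes 3
}
```

Run create_imply_cluster after selecting your project with gcloud config set project. A private cluster needs additional networking flags, which are described in the GKE documentation and are not shown here.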

Step 2. Configure Cloud NAT if using a private cluster

If you chose to use a private cluster in the network settings in the previous step, configure Cloud NAT as described here. If you chose public, you can skip this step.

The steps below summarize steps 3 through 7 of Example GKE setup in the Google documentation. Refer to that document for complete instructions.

  1. Create a firewall rule for your VPC allowing SSH as described in Step 3.
  2. Create IAP SSH permissions on one of your nodes as described in Step 4.
  3. Log into the node as described in Step 5.
  4. Create a NAT configuration using Cloud Router as described in Step 6.
  5. Confirm you can now connect to the Internet from the node.
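
The NAT configuration in item 4 can also be created from the command line. This is a hedged sketch rather than a copy of the Google instructions: the router name, NAT name, network, and region are hypothetical placeholders, and you should check the flags against the gcloud reference for your SDK version before relying on them.

```shell
# Placeholder values (assumptions); replace with your own.
NETWORK="default"
REGION="us-central1"
ROUTER_NAME="nat-router"
NAT_NAME="nat-config"

# Create a Cloud Router, then attach a NAT configuration that covers
# every subnet range and allocates external IP addresses automatically.
create_cloud_nat() {
  gcloud compute routers create "$ROUTER_NAME" \
    --network "$NETWORK" \
    --region "$REGION" &&
  gcloud compute routers nats create "$NAT_NAME" \
    --router "$ROUTER_NAME" \
    --region "$REGION" \
    --auto-allocate-nat-external-ips \
    --nat-all-subnet-ip-ranges
}
```

After calling create_cloud_nat, repeat the connectivity check from a node to confirm that outbound traffic works.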

Step 3. Set up MySQL metadata store

  1. From the console home, go to SQL and click Create instance.
  2. Choose MySQL for the database type.
  3. Enter a password for the instance and enter an instance ID.
  4. For Region, choose the same region as you chose for the GKE cluster.
  5. Click show configuration options.
  6. Choose private IP and under Associated networking choose the VPC used for the Kubernetes cluster.
  7. Click Create.
  8. Give the MySQL database full IAM permission scope in the project. For more information, see Project Access Control in the Google documentation.

When the instance finishes deploying, note the address shown in the Private IP address column. You will need to enter this in the YAML configuration file later.

GCP Kubernetes setup
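
The Helm configuration later in this guide needs a database name and the instance's private IP address. As a rough sketch, both can be handled from the gcloud CLI once the instance exists. The instance ID and database name below are hypothetical; the database name uses an underscore because a later note in this guide says hyphens are not permitted.

```shell
# Placeholder values (assumptions); replace with your own.
INSTANCE_ID="imply-metadata"
DB_NAME="imply_manager"   # underscore instead of a hyphen

# Create an empty database inside the existing Cloud SQL instance.
create_metadata_database() {
  gcloud sql databases create "$DB_NAME" --instance "$INSTANCE_ID"
}

# Print the instance's IP addresses; use the entry whose type is PRIVATE.
show_instance_addresses() {
  gcloud sql instances describe "$INSTANCE_ID" --format 'yaml(ipAddresses)'
}
```

Call create_metadata_database once, then show_instance_addresses whenever you need to look up the address again.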

Step 4. Configure Cloud Storage

  1. Go to the Storage browser and click CREATE BUCKET.
  2. Be sure to select the same region for the new bucket as you chose for the Kubernetes cluster.
  3. Give the storage full IAM permission scope in the project. For more information, see Project Access Control in the Google documentation.

The following shows the remaining configuration settings, which can remain at their default values:

GCP Kubernetes setup
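
For reference, an equivalent bucket can be created with the gcloud CLI. This is a minimal sketch: the bucket name matches the example path used later in this guide, but bucket names are globally unique, so treat both the name and the region as placeholders.

```shell
# Placeholder values (assumptions); replace with your own.
BUCKET_NAME="imply-k8s"
REGION="us-central1"   # should match the region of the GKE cluster

# Create the deep storage bucket in the chosen region with default settings.
create_deep_storage_bucket() {
  gcloud storage buckets create "gs://$BUCKET_NAME" --location "$REGION"
}
```

Call create_deep_storage_bucket once; IAM permissions still need to be granted as described in item 3.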

Step 5. Connect to the GKE Cluster

  1. In the cluster page, click Connect next to the new cluster.

  2. Get credentials for the GKE cluster by copying the command shown and running it in the Google Cloud client. The command should be similar to the following:

    gcloud container clusters get-credentials imply-k8s-1 --region us-central1 --project possible-router-145522

    Here, imply-k8s-1 is the GKE cluster name and possible-router-145522 is the project ID.
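
Before moving on, it can help to confirm that kubectl is now pointed at the new cluster. The following sketch assumes kubectl is installed locally; the function name is hypothetical.

```shell
# Show the active kubectl context, then list the cluster's nodes.
# Contexts created by get-credentials are named gke_<project>_<location>_<cluster>.
verify_cluster_access() {
  kubectl config current-context &&
  kubectl get nodes
}
```

If verify_cluster_access prints a different context or fails to list nodes, rerun the get-credentials command before installing anything.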

Step 6. Install Imply Druid

You can now follow the instructions in the Kubernetes deployment guide to complete the installation using Helm:

  1. Add the Imply repository to Helm by running:

    helm repo add imply https://static.imply.io/onprem/helm
    helm repo update

    See Deploy with Kubernetes for introductory information on using Helm with Imply.

  2. Create a values.yaml file, populating it with the downloaded contents of the latest Helm chart from Imply:

    helm show values imply/imply > values.yaml

  3. In values.yaml, set the deployments configuration of MinIO and MySQL to false, since you're using Google Cloud Storage and the Cloud SQL MySQL instance instead.

    ...
    deployments:
      manager: true
      agents: true
    
      zookeeper: true
      mysql: false
      minio: false
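    ...

  4. Add the license key as a Kubernetes secret, typically from a file. The following command reads the key from a file named IMPLY_MANAGER_LICENSE_KEY in the current directory:

    kubectl create secret generic imply-secrets --from-file=IMPLY_MANAGER_LICENSE_KEY

  5. Configure the metadata store for the Manager in values.yaml: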

    1. Specify the MySQL database name, IP, username, and password.

      Note that the default database name in the Imply Helm chart contains a hyphen, which is not permitted in Cloud SQL database names. Be sure to replace it with the name of the database in the Cloud SQL instance you set up in Step 3.

      For example:

      manager:
        ...
        metadataStore:
          type: mysql
          host: <private_IP_address>
          port: 3306
          user: root
          password: imply
          database: <database_name>

    2. If you want to only allow SSL, add the certificate.

  6. Configure the same settings for the Druid metadata store configuration.

  7. Configure Google Cloud Storage as the deepStorage resource by setting the path to your bucket, for example gs://imply-k8s. (No username or password is required, since the storage lives in the same project.)

    druid:
      ...
      deepStorage:
        type: google
        path: gs://<bucket/path>
    ...

    See other Google deep storage settings for more information.

  8. Install Imply from the shell by running:

    helm install {release-name} imply/imply -f values.yaml

    Where {release-name} is the deployment name you chose.

  9. Follow the instructions printed to the screen when the Helm installation finishes. For GCP, you'll need to configure your local Google Cloud SDK installation to connect to the project and cluster, and then run port forwarding locally to open the Manager, Pivot, or Druid UIs in a browser pointed at localhost.

For more general information on adapting the Helm chart for your deployment, see Deploy with Kubernetes.
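
Several of the values.yaml changes in this step are easy to miss, so a quick check before running helm install can save a failed deployment. The function below is a hypothetical helper, not part of the Imply tooling. It uses line-based grep matching rather than a YAML parser, so treat it as a rough sanity check only.

```shell
# Check a values.yaml file for the settings this guide changes.
# Prints one line per problem and returns non-zero if any are found.
check_values() {
  file="$1"
  status=0
  # The bundled MySQL and MinIO deployments should be switched off.
  for key in mysql minio; do
    if ! grep -Eq "^[[:space:]]*${key}:[[:space:]]*false[[:space:]]*$" "$file"; then
      echo "deployments.${key} is not set to false"
      status=1
    fi
  done
  # Flag database names that contain a hyphen.
  if grep -Eq '^[[:space:]]*database:[[:space:]]*[^#[:space:]]*-' "$file"; then
    echo "a database name contains a hyphen"
    status=1
  fi
  # The host placeholder must be replaced with the private IP address.
  if grep -q '<private_IP_address>' "$file"; then
    echo "host still contains the <private_IP_address> placeholder"
    status=1
  fi
  return "$status"
}
```

For example, check_values values.yaml && helm install {release-name} imply/imply -f values.yaml runs the installation only when no problems are reported.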

Next steps

That's it! You now have a cluster running on GKE. If you are getting to know Imply, learn about how to load data in the quickstart.

For ongoing administration and maintenance, see Deploy with Kubernetes and Using the Imply Manager.
