2021.01 LTS

Install Enhanced Imply Private on Google Kubernetes Engine (beta)

Enhanced Imply Private on Google Kubernetes Engine is a Beta feature. For more information, see Experimental features.

This document describes how to deploy and manage Imply Private on GKE using the enhanced installation mode. This mode offers the simplest setup and management process for running Imply on Google Cloud Platform.

Google Cloud components and sample architecture

The following diagram shows the resulting resources that are created in this deployment mode.

[Figure: GCP enhanced sample setup]

As shown, the following resources are created within a given Google Cloud project:

  • VPC - The network that resources communicate over. An existing VPC can be used, or Imply will create one for you.
  • Public GKE cluster - The Kubernetes cluster within which all compute resources run. Several node pools are also created for the cluster to use when deploying Druid. An existing cluster cannot be used; Imply creates the cluster.
  • Cloud SQL instance - The default metadata store for Druid deployments and for the Imply control plane. An existing Cloud SQL instance can be used, or Imply will create one for you.
  • Cloud Storage bucket - The default deep storage for all Druid deployments. An existing Cloud Storage bucket can be used, or Imply will create one for you.
  • Load balancer - Used to access the Imply control plane.

Requirements

Before proceeding, ensure that the following requirements are met.

GCP Resource Requirements

You will need the following Google Cloud resources:

  • A Google Cloud account
  • A Google Cloud project
  • A user with Owner privileges on the project
  • A Monitoring workspace that contains the project. Allow Google to create the workspace with the same name as your project.

Ensure that the following APIs are enabled prior to execution:

  • Compute Engine API
  • Kubernetes Engine API
  • Service Networking API

To enable these APIs, navigate to the APIs & Services section of the Google Cloud web console, search for each API, and enable it.
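As an alternative to the web console, the same three APIs can be enabled from the command line. The following is a minimal sketch, assuming gcloud is installed and authorized. The function name enable_required_apis and its project ID argument are illustrative and are not part of the Imply setup script.

```shell
# Service names for the Compute Engine, Kubernetes Engine, and
# Service Networking APIs listed above.
REQUIRED_APIS="compute.googleapis.com container.googleapis.com servicenetworking.googleapis.com"

# enable_required_apis PROJECT_ID
# Hypothetical helper: enables the required APIs in the given project.
enable_required_apis() {
  project_id=$1
  # $REQUIRED_APIS is left unquoted on purpose so that each service
  # name is passed as a separate argument.
  gcloud services enable $REQUIRED_APIS --project "$project_id"
}
```

For example, enable_required_apis pure-episode-234323 would enable the APIs in the sample project used later in this document.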

Tool Dependencies

The resources are deployed by a setup script which depends on the following tools:

  • terraform
  • gcloud
  • jq

These tools are included by default in Google Cloud Shell, so we recommend running the setup script from there.
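If you run the setup script somewhere other than Cloud Shell, it can help to confirm that the tools are available first. The following is a small sketch that uses only standard shell features; the function name missing_tools is illustrative.

```shell
# missing_tools TOOL...
# Prints the name of each tool that is not found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Report which of the setup script's dependencies are missing, if any.
missing=$(missing_tools terraform gcloud jq)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools are installed."
fi
```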

Creating an Imply Deployment

The Imply deployment is set up by running a setup script. To download the script, run the command curl -O https://static.imply.io/onprem/setup. We recommend running the setup script in Google Cloud Shell, which includes all of the tools that the script depends on by default.

Authorize with Google Cloud

First, authorize with Google to gain access to the needed resources. To do so, run the command gcloud auth login and follow the prompts. Ensure that you authorize as a user with Owner privileges.
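After logging in, you may want to confirm which account is active and whether it holds the Owner role on the project. The following is a sketch under stated assumptions: the function names are illustrative, and the check only finds accounts that are bound to roles/owner directly as a user, not accounts that inherit the role through a group.

```shell
# active_account: prints the account that gcloud is currently authorized as.
active_account() {
  gcloud auth list --filter="status:ACTIVE" --format="value(account)"
}

# project_owners PROJECT_ID: prints each member bound to roles/owner.
project_owners() {
  gcloud projects get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --filter="bindings.role:roles/owner" \
    --format="value(bindings.members)"
}

# is_project_owner PROJECT_ID: succeeds if the active account is listed
# directly as an owner of the project.
is_project_owner() {
  project_owners "$1" | grep -qxF "user:$(active_account)"
}
```

A call such as is_project_owner pure-episode-234323 returns success or failure, so it can be used in an if statement before starting the setup script.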

Start the Script

To start the script, run sh setup. You will see output similar to the following:

Welcome to the Imply GCP Installer. (build 50aa6b9)

This installer will guide you through setting up the required Google
Cloud Platform resources for running a managed Imply install in GCP
on Kubernetes.

Select Project

Next you will be prompted to select the Google project that you would like to work in.

1) golden-bonbon-283506
2) pure-episode-234323
Select the Google Project ID [pure-episode-234323]:

Enter the number corresponding to the project that you want to use and press enter, or press enter with no value to use the default value in brackets.

Enter License

Next you will be prompted to enter a license key.

Enter your Imply License (blank for trial):

Enter your Imply license here and press enter, or press enter with no value to use a trial license, which allows you to use the product for 30 days. If you start with a trial license, you can update the deployment with a full license later.

Select Region

Next you will be prompted to enter the region for the deployment. This is the region that Imply will be deployed to.

1) asia-east1
2) asia-southeast1
3) europe-west1
4) europe-west4
5) us-central1
6) us-east1
7) us-east4
Select a region [us-central1]:

Enter the number corresponding to the region that you want to use for the deployment, and press enter.

Select HA or Non-HA Deployment

Next you will be prompted whether to deploy the resources across multiple availability zones in the region. When deployed across multiple availability zones, Imply is more resilient to failures in any one particular zone.

Running in multiple zones will create all resources with
redundancies in multiple GCP zones. Resources such as optionally
created MySQL and the Kubernetes cluster will have redundancy
configured across these nodes. This is recommended in production.
Run in multiple zones? [yes]:

Press enter to take the default of using multiple zones, or enter no and press enter for a single-zone deployment. We recommend using multiple zones.

Select Network

Next you will be prompted whether to create a new VPC or use an existing one, and for the CIDR block to use for the deployment. This is the network that the Imply deployment uses when deploying resources.

Creating a separate VPC is recommended but if you require
the Imply cluster to run in an existing VPC to be able to
access resources that cannot otherwise be access you can
select an existing VPC.
Note: When using an existing VPC please ensure that private
service networking is enabled to allow the Imply cluster
to communicate with GCP services.
Create a new VPC [yes]:
CIDR Range for the Imply Cluster to use [10.128.0.0/16]:

Press enter to take the default of creating a new VPC, then either accept the default CIDR range or specify your own. If you want to use an existing VPC, enter no and follow the prompts. In this case, ensure that the CIDR range you specify is available for use.
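When using an existing VPC, one way to check that a candidate range is free is to compare it against the ranges already in use. The following is a sketch of such an overlap check in plain shell arithmetic. The function names are illustrative, the ranges already in use must be gathered separately, and the code assumes IPv4 octets without leading zeros and a shell with 64-bit arithmetic.

```shell
# ip_to_int A.B.C.D: prints the IPv4 address as an integer.
# Assumes octets are written without leading zeros.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_overlaps CIDR_A CIDR_B: succeeds if the two ranges share any address.
# Two CIDR blocks overlap exactly when they agree on the bits covered by
# the shorter of the two prefixes.
cidr_overlaps() {
  a_ip=${1%/*}; a_len=${1#*/}
  b_ip=${2%/*}; b_len=${2#*/}
  if [ "$a_len" -lt "$b_len" ]; then common=$a_len; else common=$b_len; fi
  shift_by=$(( 32 - common ))
  [ $(( $(ip_to_int "$a_ip") >> shift_by )) -eq $(( $(ip_to_int "$b_ip") >> shift_by )) ]
}

# Example: compare the default range with two hypothetical existing subnets.
cidr_overlaps 10.128.0.0/16 10.128.4.0/24 && echo "10.128.4.0/24 overlaps"
cidr_overlaps 10.128.0.0/16 10.129.0.0/16 || echo "10.129.0.0/16 does not overlap"
```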

Select Deep Storage Bucket

Next you will be prompted whether to create a Deep Storage bucket or to use an existing bucket. This bucket is used as the default deep storage for all Imply deployments.

Enter a bucket to use as Deep Storage for Imply or preferably
have one created for use with Imply.
Note: If a bucket is created for you it will be removed on uninstall.
All data in it will be removed as well.
Example existing bucket: gs://my-bucket-1
Existing GCS bucket or New [New]:

Press enter to take the default of creating a new storage bucket. To use an existing bucket, enter it in the form shown in the prompt (for example, gs://my-bucket-1) and follow any further prompts to identify the bucket.

Select Metadata Store

Next you will be prompted whether to create a Cloud SQL instance for storing metadata about your Imply deployment, or to use an existing database. This store is used by default to hold metadata for the Imply control plane and all Imply data planes.

Creating a MySQL Cloud SQL for use by the Imply Manager and
as a default for clusters created by it is recommended. If
you required the Imply Manager to use an existing MySQL
database you can choose to enter those details instead.
Note: The stored Terraform state as well as installer
state may store your sql connection information in plain
text inside a GCS bucket in your account.
Note: If the Database is created for you it will be remove on uninstall.
All data in it will be removed as well.
Create a new MySQL database [yes]:

Press enter to take the default of creating a new Cloud SQL instance, or enter no to use an existing database and follow the prompts to identify the database.

Select Ingress

Next you will be prompted whether to set up ingress. Ingress allows the Imply control plane to be deployed with a Google-managed certificate, enabling secure connections to the manager. This requires at least one configured Cloud DNS managed zone.

Ingress can be automatically configured using Cloud DNS
and Google Managed Certiciates to allow secure connections
to the Imply Manager. This feature requires that a Cloud DNS
managed zone already exist. For more information on setting
up a Cloud DNS zone see:
https://cloud.google.com/dns/docs/zones
Automatically setup ingress [no]: yes
Determining available Cloud DNS Managed Zones
1) gcp-imply-io
Select a zone: 1

If you have no Cloud DNS managed zones configured, press enter to take the default selection, no, because ingress setup is not supported in this case. Otherwise, to use this feature, enter yes, then enter the number corresponding to the Cloud DNS zone that you want to use for the deployment and press enter. The control plane is then hosted in a subdomain of the selected Cloud DNS zone.
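To find out ahead of time whether the project has a Cloud DNS managed zone, you can list the zones with gcloud. The following is a minimal sketch, assuming gcloud is installed and authorized; the function names are illustrative.

```shell
# list_dns_zones PROJECT_ID: prints the name of each Cloud DNS managed zone.
list_dns_zones() {
  gcloud dns managed-zones list --project "$1" --format="value(name)"
}

# has_dns_zone PROJECT_ID: succeeds if at least one managed zone exists,
# which the ingress option requires.
has_dns_zone() {
  [ -n "$(list_dns_zones "$1")" ]
}
```

For example, has_dns_zone pure-episode-234323 || echo "Answer no at the ingress prompt" prints the reminder only when no zone is found.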

Confirm Details

Next you will be prompted to confirm the details of the deployment:

License:
Project ID:  pure-episode-234323
Region:      us-central1
Zone:        [Multizone]
VPC:         [Created]
CIDR:        10.128.0.0/16
Bucket:      [Created]
MySQL:       [Created]
Ingress:     yes
Is the above information correct [yes]:

Ensure that the details are correct, and if so, press enter to take the default option yes to confirm the details, and proceed with the deployment. Otherwise, enter no and the script will restart.

Wait for Deployment to Complete

The deployment takes around 30 minutes. Wait for it to finish deploying all needed resources. When the deployment finishes, you will receive a message similar to the following:

helm_release.manager: Creation complete after 3m3s [id=imply]

Apply complete! Resources: 32 added, 0 changed, 0 destroyed

Outputs:

bucket =
cidr = 10.128.0.0/16
dns_host = imply.gcp.imply.io
dns_zone = gcp-imply-io
gke_id = imply-gke
license = 
project_id = pure-episode-234323
region = us-central1
sql_endpoint =
sql_password =
sql_username =
vpc =
zone =

You can access your Imply Manager at:
https://imply.gcp.imply.io
Note: It may take 15-20 minutes for the SSL certificate to be generated
and during that time you may see a certificate error

The details will include instructions for how to access your deployment.

Create an Imply Manager Admin User

Use the access instructions given at the end of the setup script to access your deployment. Please refer to Adding users and roles for how to add users.

Cluster Management

Please refer here for how to manage your Imply clusters.

Connect to Your Cluster

Please refer here for how to connect to your cluster.

Loading Data into Your Cluster

This section walks through an example of loading data into your cluster from a Google Cloud Storage bucket. Other ingestion methods are available. Please refer to the Druid ingestion documentation for other ingestion types.

By default, the service accounts used on the Druid service pods can only read and write data in the configured deep storage bucket. To ingest data from another storage bucket, you need to explicitly give the agent service account the permissions required to access that bucket. Please see Google's IAM documentation. The agent service account is named imply-default-asa.

Give Service Account Permission to Data Bucket

First, give the service account access to the resource from which you would like to ingest data. In this example, we ingest data from the Google Cloud Storage bucket imply-doc-demo-ingest.

You will need the email address of the agent service account. Use the Google Cloud web console to navigate to your service accounts, find the agent service account, which is named imply-default-asa, and copy its email address.

[Figure: GCP enhanced service account details]

In the Google Cloud web console, navigate to the bucket details for the bucket that you would like to ingest data from, and click the Permissions tab.

[Figure: GCP enhanced bucket permissions]

Using the service account email, add the service account with the Storage Admin role.

[Figure: GCP enhanced bucket add service account permission]

Now the service account has the permission needed to read and write to the bucket.
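The console steps above can also be approximated from the command line. The following is a sketch under stated assumptions: the function names are illustrative, it assumes the agent service account's email begins with imply-default-asa@, and it assumes a gcloud version that includes the gcloud storage command group.

```shell
# agent_sa_email PROJECT_ID: prints the email of the agent service account.
# Assumes the account's email starts with "imply-default-asa@".
agent_sa_email() {
  gcloud iam service-accounts list --project "$1" \
    --filter="email ~ ^imply-default-asa@" --format="value(email)"
}

# grant_bucket_admin SA_EMAIL BUCKET: gives the service account the
# Storage Admin role on the bucket, matching the console steps above.
grant_bucket_admin() {
  gcloud storage buckets add-iam-policy-binding "gs://$2" \
    --member="serviceAccount:$1" --role="roles/storage.admin"
}
```

For example, grant_bucket_admin "$(agent_sa_email pure-episode-234323)" imply-doc-demo-ingest would apply the binding to the example bucket used in this section.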

Ingest Data Using Druid Web Console

Navigate to the Druid Web Console, click new ingestion spec and choose the Google Cloud Storage ingestion type.

[Figure: GCP enhanced choose Google Cloud Storage ingestion type]

Follow the on-screen instructions to provide the path to the data that you want to ingest, along with any filtering or aggregation options that you'd like to apply. After the data is ingested successfully, the corresponding ingestion task is listed in the Ingestion tab with status SUCCESS.

[Figure: GCP enhanced successful ingestion task]

Once Druid loads all of the segment data, the corresponding datasource shows as Fully available under the Datasources tab.

[Figure: GCP enhanced segments fully loaded]

Query the Data

Now the datasource is ready to be queried. Please see the query tutorial documentation.

Updating an Imply Deployment

To update an existing Imply deployment, run the setup script as described in Creating an Imply Deployment. Select the deployment that you'd like to update, and when asked if you'd like to update, enter yes.

Select the Google Project ID [zachsherman]:
1) New
2) default
Select existing deployment to manage or New to create one [New]: 2
An existing deployment was found, update? [yes]:

You will be given the details of the deployment, and prompted whether the details are correct. If you are using an updated version of the script, and don't want to make any explicit changes to your deployment, enter yes. If you would like to make explicit changes to your deployment enter no.

License:
Project ID:  zachsherman
Region:      us-central1
Zone:        [Multizone]
VPC:         [Created]
CIDR:        10.128.0.0/16
Bucket:      [Created]
MySQL:       [Created]
Ingress:     no
Is the above information correct [yes]:

If you entered no, you can then modify the deployment. Follow the steps as described earlier in this document.

Deleting an Imply Deployment

To delete an existing Imply deployment, run the setup script as described in Creating an Imply Deployment. Select the deployment that you'd like to delete, and when asked if you'd like to update, enter no. You will then be given the option to uninstall the deployment.

[Figure: GCP enhanced delete deployment]

Follow the prompts from here.

Advanced

The following section describes some advanced use cases of the deployment.

Multiple deployments

The setup script can be used to create multiple Imply deployments. If you do not currently have any deployments, you are not prompted for a deployment name; instead, your first deployment is automatically named default. If you have at least one deployment, you are prompted to either select an existing deployment or create a new one.

Select the Google Project ID [pure-episode-234323]:
1) New
2) default
Select existing deployment to manage or New to create one [New]:

If using a non-default deployment, the name of the agent service account will be imply-{DEPLOYMENT-NAME}-asa.
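The naming pattern above can be expressed as a small helper, which is convenient when scripting permission changes for several deployments. This is a sketch; the function name agent_sa_name and the deployment name staging are illustrative.

```shell
# agent_sa_name [DEPLOYMENT_NAME]: prints the agent service account name.
# With no argument, the name of the first deployment, default, is used.
agent_sa_name() {
  echo "imply-${1:-default}-asa"
}

agent_sa_name            # prints imply-default-asa
agent_sa_name staging    # prints imply-staging-asa
```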

Common Issues

The following section goes over some common issues that you may face in the field and how to resolve them.

Deployment Terraform State Locked

The setup script uses Terraform to create, update, or delete resources in Google Cloud for the Imply deployment. If Terraform is interrupted while modifying resources, for example because the user's terminal window is closed, the next run of the setup script reports that Terraform could not acquire a lock on the state for the deployment. The message looks similar to the following:

[Figure: GCP enhanced terraform lock issue]

To resolve this situation, ensure that no other operations are being performed on the deployment, and unlock the state file using terraform force-unlock. If this fails, delete the lock file from the Google storage bucket noted in the message.
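The unlock step can be sketched as follows. The function names are illustrative, and the sketch assumes that the lock error includes an ID: line, as Terraform state lock errors typically do. The terraform force-unlock command must be run from the Terraform working directory for the deployment, whose location is not covered in this document.

```shell
# lock_id_from_message: reads a Terraform lock error on standard input
# and prints the value of its "ID:" line.
lock_id_from_message() {
  awk '$1 == "ID:" { print $2; exit }'
}

# unlock_state LOCK_ID: releases the state lock with the given ID.
# Terraform asks for confirmation before unlocking.
unlock_state() {
  terraform force-unlock "$1"
}
```

For example, after saving the error text to a file with the hypothetical name lock-error.txt, unlock_state "$(lock_id_from_message < lock-error.txt)" would release the lock once you have confirmed that no other operation is running.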

Cluster Create or Update Fails to Scale-up Node Pool

You may encounter an issue when creating or updating an existing cluster, where a failure occurs because a particular node pool fails to scale to the required number of nodes, as shown below:

[Figure: GCP enhanced scale-out failure manager]

This can happen when the Google Cloud project reaches a quota limit. To fix this issue, increase the corresponding quota.
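To see which quota may be exhausted, you can list the usage and limit of each regional quota with gcloud. The following is a minimal sketch, assuming gcloud is installed and authorized; the function name quota_usage is illustrative, and some quotas that affect node pools may be project-wide rather than regional.

```shell
# quota_usage REGION PROJECT_ID: prints a table of each regional
# Compute Engine quota with its current usage and limit.
quota_usage() {
  gcloud compute regions describe "$1" --project "$2" \
    --flatten="quotas[]" \
    --format="table(quotas.metric,quotas.usage,quotas.limit)"
}
```

For example, quota_usage us-central1 pure-episode-234323 lists the quotas for the region used in the sample deployment; a metric whose usage equals its limit is a likely cause of the scale-up failure.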

Copyright © 2021 Imply Data, Inc