Install Imply Private on Linux
This document describes how to install the binary distribution of Imply on CentOS Linux. This is a basic, self-hosted deployment that leaves the provisioning and bootstrapping of the cluster nodes to you.
If you have existing Kubernetes expertise and infrastructure, we recommend deploying Imply Private on Kubernetes. Use the installation described here if you cannot use Kubernetes or Imply Cloud.
Overview
The Imply Private on Linux distribution consists of two archives:
- Imply Manager distribution: imply-manager-2021.01.1
- Imply Agent distribution: imply-agent-v1
Upon installation, Imply Agents become the master, query, and data nodes in your cluster. The Imply Manager archive installs the Imply Manager, which serves as the central configuration and administration interface for the Imply deployment.
Additionally, as depicted in the figure below, a complete Imply deployment includes:
- An external database for metadata storage (MySQL or PostgreSQL)
- A ZooKeeper ensemble for server coordination
- A distributed storage system to serve as Druid's deep storage (e.g., S3, GCS, HDFS, or NFS)
See Design for more about cluster servers and processes.
Distribution archive versioning
In the previous section, notice that the names of the Manager and agent distribution archives include the version numbers 2021.01.1 and v1.
There are a few points to note regarding this versioning:
- While the Manager archive version corresponds to the Imply software it is distributed with, the Manager distribution archive is not necessarily bound to that version. That is, you can typically upgrade the Imply cluster to a later version without updating the Manager version.
- Occasionally, a change in a subsequent version of the Imply software may necessitate an update to the Imply Manager software, as installed here. You may wish to upgrade to access new features in the Manager as well. However, it is not typically required for a routine update of a cluster with the latest Imply software version.
- Imply agents follow a simple versioning scheme, such as v1. Like the Manager, agent versions are not necessarily bound to an Imply software version; that is, you can update the cluster to a later version without updating the agent. However, it is possible, although relatively rare, that an update to a new Imply version will first require an update to the underlying Imply agent software.
Currently, there is only a single agent version, so no incompatibilities exist between Imply software versions and agent versions.
Supported Linux Distributions
- CentOS 8
Software Dependencies
The Imply Manager and each Imply Agent must have the following software installed:
- Java 8 (8u92 or higher); Zulu OpenJDK is recommended
- Python 3 (3.6 or higher)
- Perl 5 (5.26 or higher)
The Imply installer will check if these dependencies are installed and will fail if they are not found.
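To catch a missing dependency before running the installer, the minimum versions listed above can be checked with a short pre-flight script. This is only a sketch and not the installer's own logic: the function names are hypothetical, the comparison relies on GNU sort -V, and the Java check assumes the 1.8.0_NNN version string that Java 8 builds normally report.

```shell
#!/bin/sh
# Pre-flight sketch: compare installed versions with the documented minimums.
# All function names here are illustrative, not part of the Imply distribution.

# version_ge A B: succeeds when dotted version A >= B (uses GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# java8_update_ok VERSION: succeeds for Java 8 strings such as 1.8.0_292
# whose update number is at least 92.
java8_update_ok() {
    case "$1" in
        1.8.0_*) update=${1#1.8.0_}; update=${update%%[!0-9]*} ;;
        *) return 1 ;;
    esac
    [ -n "$update" ] && [ "$update" -ge 92 ]
}

# check_dependencies: reports each dependency that is missing or too old.
check_dependencies() {
    rc=0
    java_v=""; python_v=""; perl_v=""
    if command -v java >/dev/null 2>&1; then
        java_v=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    fi
    if command -v python3 >/dev/null 2>&1; then
        python_v=$(python3 --version 2>&1 | awk '{print $2}')
    fi
    if command -v perl >/dev/null 2>&1; then
        perl_v=$(perl -e 'printf "%vd", $^V')
    fi
    java8_update_ok "$java_v" || { echo "Java 8u92 or higher not found (saw: ${java_v:-none})"; rc=1; }
    version_ge "${python_v:-0}" 3.6 || { echo "Python 3.6 or higher not found (saw: ${python_v:-none})"; rc=1; }
    version_ge "${perl_v:-0}" 5.26 || { echo "Perl 5.26 or higher not found (saw: ${perl_v:-none})"; rc=1; }
    return "$rc"
}
```

Calling check_dependencies on each target machine prints one line per problem and returns a non-zero status if anything is missing.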
Networking
The Imply Manager and each Imply Agent machine must be configured with a hostname unique within your network infrastructure. Each hostname must have a corresponding entry in the domain name system (DNS) so that Imply services can connect to each other using the hostname. The fully qualified domain name (FQDN), which is used in DNS, includes the hostname and the entire domain name. The hostname and the domain name labels are separated by periods or dots, as follows: hostname.domain
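The naming requirement above can be verified on each machine before installation. The sketch below splits an FQDN into its hostname and domain parts and checks that a name resolves through the system resolver; the helper names and the example name node1.example.com are hypothetical, and getent is assumed to be available, as it is on glibc-based Linux systems.

```shell
#!/bin/sh
# Sketch: inspect a machine's FQDN and confirm that it resolves.

# fqdn_host FQDN: prints the hostname label (the text before the first dot).
fqdn_host() {
    printf '%s\n' "${1%%.*}"
}

# fqdn_domain FQDN: prints the domain (the text after the first dot), or an
# empty line when the name contains no dot.
fqdn_domain() {
    case "$1" in
        *.*) printf '%s\n' "${1#*.}" ;;
        *)   printf '\n' ;;
    esac
}

# resolves NAME: succeeds when NAME can be resolved by the system resolver.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Example use on a target machine (hostname -f prints the FQDN):
#   fqdn=$(hostname -f)
#   [ -n "$(fqdn_domain "$fqdn")" ] || echo "no domain part in: $fqdn"
#   resolves "$fqdn" || echo "$fqdn does not resolve"
```

Running resolves from every other machine in the cluster, not only locally, gives a better indication that Imply services will be able to reach each other by hostname.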
Installation steps
Before starting, install or verify the availability of the external dependencies mentioned above (a relational database, ZooKeeper, and deep storage). Note that the Manager and the cluster each rely on an external database to store metadata. You can use the same database for both or a separate database for each.
You will need sudo privileges on the target machines to run the installers.
Step 1: Install Imply Manager
On the machine that will serve Imply Manager, follow these steps:
Download the imply-manager-2021.01.1.tar.gz archive to the target directory.
Extract the distribution archive:
$ tar -xvf imply-manager-2021.01.1.tar.gz
To run the installer:
$ sudo imply-manager-2021.01.1/script/install
The installer checks the environment for software dependencies. It raises an error message if they are not found.
Configure the Manager by editing /etc/opt/imply/manager.conf. Specify the connection settings for the Manager store database, ZooKeeper, the Imply metadata storage database, and deep storage, as appropriate for your environment. For example:
IMPLY_MANAGER_LICENSE_KEY=<license (trial if not set)>
IMPLY_MANAGER_STORE_TYPE=<[mysql, postgresql]>
IMPLY_MANAGER_STORE_HOST=<db_host>
IMPLY_MANAGER_STORE_PORT=3306
IMPLY_MANAGER_STORE_USER=<user>
IMPLY_MANAGER_STORE_PASSWORD=<password>
IMPLY_MANAGER_STORE_DATABASE=imply_manager
# These can be set here or through the Imply Manager when creating the cluster
imply_defaults_zkType=external
imply_defaults_zkHosts=<zk_host>:2181
imply_defaults_zkBasePath=imply
imply_defaults_metadataStorageType=<[mysql, postgresql]>
imply_defaults_metadataStorageHost=<db_host>
imply_defaults_metadataStoragePort=3306
imply_defaults_metadataStorageUser=<user>
imply_defaults_metadataStoragePassword=<password>
imply_defaults_deepStorageType=<[azure, google, hdfs, local, s3]>
imply_defaults_deepStoragePath=<e.g. s3://bucket-name/path>
imply_defaults_deepStorageUser=<user>
imply_defaults_deepStoragePassword=<password>
See Enabling TLS, below, for how to configure TLS in the cluster.
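Because the example configuration uses angle-bracket placeholders, it is easy to start the Manager with a value that was never filled in. The following sketch reports settings that are missing, empty, or still contain a placeholder. The function name check_conf is hypothetical, and the choice of which keys to treat as required is an assumption to adapt to your environment.

```shell
#!/bin/sh
# Sketch: flag configuration keys that are unset or still hold a <placeholder>.

# check_conf FILE KEY...: prints one line per problem and returns non-zero
# if any listed KEY is missing, empty, or contains an unreplaced placeholder.
check_conf() {
    conf=$1; shift
    rc=0
    for key in "$@"; do
        # Take the last assignment of KEY in the file, if there is one.
        value=$(sed -n "s/^${key}=//p" "$conf" | tail -n 1)
        case "$value" in
            '')        echo "missing or empty: $key"; rc=1 ;;
            *'<'*'>'*) echo "placeholder not replaced: $key"; rc=1 ;;
        esac
    done
    return "$rc"
}

# Example use (run with sufficient privileges to read the file):
#   check_conf /etc/opt/imply/manager.conf \
#       IMPLY_MANAGER_STORE_TYPE IMPLY_MANAGER_STORE_HOST \
#       IMPLY_MANAGER_STORE_USER IMPLY_MANAGER_STORE_PASSWORD
```

A password that legitimately contains both < and > would be reported as a placeholder, so treat the output as a prompt to look at the file rather than as a definitive verdict.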
Start Imply Manager:
$ sudo systemctl start imply-manager
(Optional) Ensure Imply Manager is running
Ensure all services starting with "imply-" are preceded by a green dot, which indicates running status:
$ systemctl list-dependencies --reverse imply-manager
imply-manager.service
● ├─imply-grove-server.service
● ├─imply-manager-be.service
● ├─imply-manager-fe.service
The Imply Manager UI should now be accessible on port 9097.
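When the installation is scripted, the service may need a few seconds before it answers on port 9097. A small retry helper, sketched below, can wait for it. The helper name retry is hypothetical, and the assumption that the Manager answers a plain HTTP GET on its root URL with a success status has not been confirmed against the product.

```shell
#!/bin/sh
# Sketch: poll a command until it succeeds or a retry limit is reached.

# retry ATTEMPTS DELAY CMD...: runs CMD up to ATTEMPTS times, sleeping DELAY
# seconds between tries; succeeds as soon as CMD succeeds.
retry() {
    attempts=$1; delay=$2; shift 2
    n=1
    while ! "$@"; do
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example use on the Manager machine:
#   retry 30 2 systemctl is-active --quiet imply-manager
#   retry 30 2 curl -fsS -o /dev/null http://localhost:9097
```

Both example commands return a non-zero status until the service is up, which is what lets retry treat them as readiness probes.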
Step 2: Log in and create the cluster
- From a browser, access Imply Manager at http://<hostname>:9097
- Create the administrator account and click Login to proceed.
- From the dashboard, click + New cluster.
- The default values are appropriate for a small-scale deployment. If you did not provide the imply_defaults_* configurations in the manager.conf file, you will need to set them at this time.
- In the Schema field, type a distinguishing name for the Imply Manager database schema, such as imply_manager.
- Click Create cluster to proceed.
- Click OK to confirm cluster creation. Note the Cluster ID indicated in the cluster status page. You will need this in the next step to configure the agents.
Step 3: Install agents
Install the agent on each node that will be part of this cluster. You will need at least one master, one query, and one data node.
The Imply agent must have read and write permissions to the /mnt/var and /mnt/tmp directories. If these directories do not exist, the installer will create them and set appropriate permissions. However, if they do exist and the imply user does not have read and write permissions to them, the install will fail.
Download the imply-agent-v1.tar.gz archive to the target directory.
Extract the distribution archive:
$ tar -xvf imply-agent-v1.tar.gz
To run the installer:
$ sudo imply-agent-v1/script/install
Configure the agent by editing /etc/opt/imply/agent.conf. Set the Manager hostname, the cluster ID you noted in the previous step, and the node type for this agent (master, query, or data):
IMPLY_MANAGER_HOST=<manager_host>
IMPLY_MANAGER_AGENT_CLUSTER=<cluster_id>
IMPLY_MANAGER_AGENT_NODE_TYPE=<[master, query, data]>
The value of IMPLY_MANAGER_HOST must not include the protocol (http:// or https://) section of the URL.
For clusters that use multiple data tiers, IMPLY_MANAGER_AGENT_NODE_TYPE can also be set to dataTier2 or dataTier3.
Start the agent:
$ sudo systemctl start imply-agent
(Optional) Ensure Imply Agent is running
Ensure all services starting with "imply-" are preceded by a green dot, which indicates running status:
$ systemctl list-dependencies --reverse imply-agent
imply-agent.service
● ├─imply-grove-agent.service
● ├─imply-runsvdir.service
Repeat this step for each server in the cluster, specifying the appropriate node type for each.
You can view the agent's status or stop the agent by running systemctl status imply-agent and sudo systemctl stop imply-agent, respectively.
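Since the same three settings are repeated on every node, generating them from a small helper can reduce copy-and-paste mistakes. The sketch below prints the agent settings after rejecting a Manager host that carries a protocol prefix and any node type not named in this document. The function name and the example values are hypothetical.

```shell
#!/bin/sh
# Sketch: render the three agent settings with basic validation.

# render_agent_conf MANAGER_HOST CLUSTER_ID NODE_TYPE: prints the settings on
# stdout; returns non-zero (with a message on stderr) for invalid input.
render_agent_conf() {
    case "$1" in
        ''|*://*) echo "manager host must be a bare hostname: $1" >&2; return 1 ;;
    esac
    [ -n "$2" ] || { echo "cluster ID is required" >&2; return 1; }
    case "$3" in
        master|query|data|dataTier2|dataTier3) ;;
        *) echo "unknown node type: $3" >&2; return 1 ;;
    esac
    printf 'IMPLY_MANAGER_HOST=%s\n' "$1"
    printf 'IMPLY_MANAGER_AGENT_CLUSTER=%s\n' "$2"
    printf 'IMPLY_MANAGER_AGENT_NODE_TYPE=%s\n' "$3"
}

# Example use, with a made-up host and cluster ID:
#   render_agent_conf manager.example.com abc123 data
```

Review the output and merge it into /etc/opt/imply/agent.conf by hand, or with your configuration management tool, rather than overwriting the file, in case the installed file carries settings that this sketch does not produce.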
Step 4: Accessing Imply
Once the nodes in the cluster have been configured, they should appear on the Servers list. Ensure that all of the nodes have joined the cluster, and then locate the IP address or hostname of a query node. Access the Pivot UI from your browser at: http://<query_hostname>:9095
A typical deployment would consist of multiple master, query, and data nodes for high availability. Instead of directly accessing the query node, you would route your requests through a load balancer for resiliency.
Reinstalling
You can reinstall the agent or manager by re-running the installer. The installer detects an existing installation and prompts you to remove the instance. As with installation, reinstalling Imply requires root access.
Existing configuration files are removed during the reinstall. To ensure you don't lose your configuration, back up the manager.conf or agent.conf file to a directory that is not under /opt/imply/agent.
Imply Manager
$ cp /etc/opt/imply/manager.conf ~/manager.conf.bkp
Imply Agent
$ cp /etc/opt/imply/agent.conf ~/agent.conf.bkp
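If you reinstall more than once, a fixed backup name such as manager.conf.bkp is overwritten each time. The sketch below adds a timestamp to each copy and refuses a destination under /opt/imply/agent. The function name and the naming scheme are hypothetical, and the path check only recognizes absolute paths written literally.

```shell
#!/bin/sh
# Sketch: keep timestamped copies of a configuration file.

# backup_conf FILE DEST_DIR: copies FILE to DEST_DIR/<name>.<timestamp>.bkp,
# preserving mode and times, and prints the path of the copy.
backup_conf() {
    case "$2" in
        /opt/imply/agent|/opt/imply/agent/*)
            echo "refusing to back up under /opt/imply/agent" >&2
            return 1 ;;
    esac
    dest="$2/$(basename "$1").$(date +%Y%m%d%H%M%S).bkp"
    cp -p "$1" "$dest" && printf '%s\n' "$dest"
}

# Example use:
#   backup_conf /etc/opt/imply/manager.conf "$HOME"
#   backup_conf /etc/opt/imply/agent.conf "$HOME"
```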
Enabling TLS
See the TLS Docs for information on how to generate certificates and general TLS information.
By default, Imply Manager looks for the certificate files in the following locations:
- Signing certificate: /run/secrets/imply-ca.crt
- Signing key: /run/secrets/imply-ca.key
The imply user must have read permissions to the CA key and certificate files.
To enable TLS, follow these steps:
Copy the certificate and key to the above location on the cluster nodes, or specify a custom location in the following configuration files.
Add the following configurations:
For the manager, add the following lines to /etc/opt/imply/manager.conf:
IMPLY_MANAGER_CA_KEY_PATH=/run/secrets/imply-ca.key
IMPLY_MANAGER_CA_CERT_PATH=/run/secrets/imply-ca.crt
For the agent, add the following line to /etc/opt/imply/agent.conf:
IMPLY_MANAGER_CA_CERT_PATH=/run/secrets/imply-ca.crt
Restart the cluster.
If TLS is successfully enabled, the message TLS: Enabled appears in the logs at manager and agent startup.
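The two conditions described in this section, file permissions for the imply user and the startup message, can both be checked from the shell. The following is a sketch with hypothetical helper names. It assumes that you have sudo rights to run commands as the imply user and that the logs are in /var/log/messages, as described under Logging.

```shell
#!/bin/sh
# Sketch: verify TLS prerequisites and confirm the startup message.

# readable_by USER FILE...: succeeds when USER can read every FILE.
# Requires permission to run commands as USER through sudo.
readable_by() {
    user=$1; shift
    for f in "$@"; do
        sudo -u "$user" test -r "$f" || { echo "$user cannot read $f" >&2; return 1; }
    done
}

# tls_enabled LOGFILE: succeeds when LOGFILE contains "TLS: Enabled".
tls_enabled() {
    grep -q 'TLS: Enabled' "$1"
}

# Example use on the Manager machine:
#   readable_by imply /run/secrets/imply-ca.crt /run/secrets/imply-ca.key
#   sudo sh -c 'grep "TLS: Enabled" /var/log/messages | tail -n 5'
```

On agent machines only the certificate path applies, since the agent configuration shown above references the certificate and not the key.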
Authentication
See the Authentication Docs for more information on Manager authentication.
To enable authentication, follow these steps:
The authentication token can either be loaded from a file or set in the manager and agent configuration files.
Load from file
By default, Imply Manager looks for the authentication token in /run/secrets/imply-auth-token. To load the token from a custom file location, set IMPLY_MANAGER_AUTH_TOKEN_PATH in the manager and agent configuration files.
For the manager, add the following line to /etc/opt/imply/manager.conf:
IMPLY_MANAGER_AUTH_TOKEN_PATH=/run/secrets/imply-auth-token
For the agent, add the following line to /etc/opt/imply/agent.conf:
IMPLY_MANAGER_AUTH_TOKEN_PATH=/run/secrets/imply-auth-token
Set in configuration file
For the manager, add the following line to /etc/opt/imply/manager.conf:
IMPLY_MANAGER_AUTH_TOKEN=<authentication token text>
For the agent, add the following line to /etc/opt/imply/agent.conf:
IMPLY_MANAGER_AUTH_TOKEN=<authentication token text>
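This document does not specify the format of the token, so the following sketch rests on an assumption: that any sufficiently random string, identical on the manager and every agent, is acceptable. Check the Authentication Docs before relying on it. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: create a random token file. ASSUMPTION: the token may be any random
# string shared by the manager and all agents; confirm in the Authentication Docs.

# new_token: prints 64 hexadecimal characters read from /dev/urandom.
new_token() {
    head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
    echo
}

# write_token_file PATH: writes a new token to PATH; a newly created file
# is readable and writable by its owner only.
write_token_file() {
    ( umask 077; new_token > "$1" )
}

# Example use:
#   write_token_file ./imply-auth-token
#   (then copy the same file to the token path on the manager and each agent)
```

By analogy with the TLS files, the imply user presumably needs read access to the token file on each machine, so adjust ownership after copying it into place.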
Logging
Imply Manager and agents write logs to the system log, /var/log/messages on CentOS. If you have trouble installing or starting, be sure to check the log for error messages.
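The system log contains entries from every service on the machine, so filtering it helps when troubleshooting. The sketch below keeps only lines that mention imply and also look like errors. The function name and the keywords error, exception, and fail are assumptions about how problems are worded in the log, not a documented log format.

```shell
#!/bin/sh
# Sketch: pull likely Imply error lines out of a system log file.

# imply_log_errors LOGFILE: prints lines that mention "imply" and contain
# one of the assumed error keywords (case-insensitive).
imply_log_errors() {
    grep -i 'imply' "$1" | grep -iE 'error|exception|fail'
}

# Example use (reading /var/log/messages usually requires root):
#   imply_log_errors /var/log/messages | tail -n 20
```

Because the Manager and agent run as systemd services, journalctl -u imply-manager or journalctl -u imply-agent may show the same messages on systems where the journal is enabled.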