Security is of the utmost importance for any mission-critical application, and at Imply we know this is especially true for applications deployed in the cloud. This document details the security features offered by Imply Hybrid (formerly Imply Cloud).
For information about how to set up your AWS account so that you can run Imply in an AWS sub-account, see Organizing Your AWS Environment Using Multiple Accounts in the AWS documentation. Use the resources there and on this page to secure your Imply Hybrid deployment.
Imply Hybrid is built on the secure foundation of Amazon Web Services (AWS). Your Imply cluster is private and dedicated to you. It is deployed entirely within your own AWS account, in a dedicated virtual private cloud (VPC). This gives you full control over your machines and your data.
Imply also operates user-facing login and management software (“Imply Hybrid Portal”), allowing you to manage and use your clusters through a web interface at https://implycloud.com/. The Imply Hybrid Portal is provisioned on Imply servers and does not run in your AWS account.
Data that you load into your Imply cluster is never stored on Imply servers; it is stored only on EC2 instances and S3 buckets in your own AWS account. However, you may query your cluster through the Imply Hybrid Portal, in which case query requests and responses pass through Imply's network. Imply's network only relays these requests and responses to you and does not store them. This communication is encrypted using TLS; for details, see Data in transit below.
Imply Hybrid requires two linkages between your AWS account and Imply:
- You must grant Imply permissions through IAM for management operations such as launching and terminating instances. This occurs during initial setup of your Imply Hybrid account, as described in the Cloud signup instructions that appear after you sign up.
- The Imply Hybrid VPC in your AWS account will be peered with an Imply-operated VPC containing the Imply Hybrid Portal. This peering allows the Imply Hybrid service to communicate with your cluster through private IP addresses, without requiring access over the internet. For increased security, the standard Imply Hybrid configuration blocks ingress from the internet to your Imply Hybrid VPC.
User authentication and authorization
Imply Hybrid identifies each authorized user of your account by a unique login tied to their email address. Imply Hybrid offers a role-based access control (RBAC) model that allows you to ensure that your users have exactly the permissions they need to do their work.
In particular, you can use the RBAC model to grant or restrict the following permissions for authorized Imply Hybrid users:
- Ability to manage clusters, datasets, and/or other users.
- Ability to manage data cubes (building blocks of visualizations).
- Ability to manage dashboards.
- Ability to see visualizations (data cube view).
- Ability to query the Imply cluster directly with SQL.
- Ability to load new data into the Imply cluster.
In addition to authorized users, Imply Hybrid lets you define API users that can call Druid APIs on your Imply cluster directly. API users can be used for building your own apps on top of the Druid API, automating workloads, and many other functions. The Druid API is protected by TLS encryption and HTTP Basic authentication. API users can be granted permissions tailored to the access they require, including:
- Ability to read or write Druid configuration.
- Ability to load new data into the cluster (all datasets, or specific datasets).
- Ability to query the cluster (all datasets, or specific datasets).
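As a minimal sketch of how an API user might call the Druid API, the following Python builds an HTTP Basic-authenticated POST to Druid's SQL endpoint (`/druid/v2/sql`) using only the standard library. The host name, user name, password, and helper names are hypothetical placeholders; substitute your own cluster endpoint and API user credentials.

```python
import base64
import json
import urllib.request


def build_sql_request(url: str, user: str, password: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a Druid SQL POST request with HTTP Basic auth."""
    if not url.startswith("https://"):
        # The Druid API is served over TLS; refuse to send credentials in cleartext.
        raise ValueError("use an https:// URL so credentials are sent over TLS")
    # HTTP Basic auth: base64-encode "user:password" into the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


def run_sql(req: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the JSON result rows."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `run_sql(build_sql_request("https://druid.example.com/druid/v2/sql", "api-reader", "<password>", "SELECT COUNT(*) FROM my_datasource"))` would return the result rows, provided the (hypothetical) API user has query permission on that datasource.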
At login, you can choose to stay authenticated for 30 days by selecting the Remember Me checkbox. Subsequent logins from that device and browser will not prompt you to authenticate until this trust period expires.
Data in transit
Imply Hybrid uses Transport Layer Security (TLS) for end-to-end encryption of data in transit. In particular, TLS is used to secure communications including the following:
- All communications between your browser and the Imply Hybrid Portal at https://implycloud.com/.
- All communications between the Imply Hybrid Portal and your private Imply cluster.
- All internal traffic involving your data within your Imply cluster, including data ingestion, persistence, and query requests and responses.
- All Druid API calls you make to your Imply cluster.
TLS 1.0 and 1.1 are deprecated for all Imply interfaces, including browser-based UIs such as Pivot and the Imply Manager, as well as APIs. If you access Imply user interfaces with a supported browser, this change should not affect you, because supported browsers use only later protocol versions. However, if you have tools or other client software that access Imply APIs, verify that they use TLS 1.2 or later.
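If you maintain your own client code, one way to make sure it meets the TLS 1.2 requirement is to pin the minimum protocol version explicitly. This is a sketch using Python's standard `ssl` module (Python 3.7 or later); the helper names are illustrative, not part of any Imply tooling.

```python
import ssl
import urllib.request


def make_tls12_context() -> ssl.SSLContext:
    """Return a client context that verifies certificates and refuses TLS older than 1.2."""
    ctx = ssl.create_default_context()  # certificate and hostname verification enabled
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def open_with_tls12(url: str, timeout: float = 30.0):
    """Open an HTTPS URL; the handshake fails if the server offers only TLS 1.0 or 1.1."""
    return urllib.request.urlopen(url, timeout=timeout, context=make_tls12_context())
```

The same context can be passed to any standard-library HTTPS client, so a single helper covers all of a tool's API calls.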
Your Imply cluster also uses ZooKeeper for certain cluster coordination tasks. This traffic is not encrypted. However, Imply Hybrid still safeguards this traffic using network segmentation through AWS EC2 Security Groups. Furthermore, this traffic is restricted to cluster coordination: your data is not at risk of exposure in these communications.
Data at rest
Data that you load into your Imply cluster is never stored on Imply servers. Your data is only stored on EC2 instances and S3 buckets in your own AWS account, and you have full control over it.
Imply Hybrid supports encryption at rest for your data in S3 using Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3), if you enable this option on the S3 buckets linked to Imply Hybrid. Whether to use this option is entirely up to you. If you want to enable it, do so before launching your Imply cluster.
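For example, you could set SSE-S3 as the default encryption on a linked bucket with the AWS SDK for Python (boto3) before launching the cluster. This is a sketch, assuming boto3 is installed and your AWS credentials allow `s3:PutEncryptionConfiguration`; the bucket name you pass in and the helper names are hypothetical.

```python
# SSE-S3 uses Amazon S3-managed keys; in the S3 API this is the "AES256" algorithm.
SSE_S3_CONFIG = {
    "Rules": [
        {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
    ]
}


def enable_sse_s3(bucket: str) -> None:
    """Set SSE-S3 as the default encryption for a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers below work without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=SSE_S3_CONFIG,
    )


def uses_sse_s3(get_bucket_encryption_response: dict) -> bool:
    """Check a GetBucketEncryption response for an SSE-S3 default rule."""
    config = get_bucket_encryption_response.get("ServerSideEncryptionConfiguration", {})
    return any(
        rule.get("ApplyServerSideEncryptionByDefault", {}).get("SSEAlgorithm") == "AES256"
        for rule in config.get("Rules", [])
    )
```

After calling `enable_sse_s3`, you can pass the result of `boto3.client("s3").get_bucket_encryption(Bucket=...)` to `uses_sse_s3` to confirm the setting.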
Imply Hybrid optionally supports encryption of data at rest on your EC2 instances. The mechanism varies based on instance type:
- For EBS-backed instance types (for example, c5 and m5), Imply Hybrid can optionally provision encrypted EBS volumes. This is controlled by the "instance encryption" setting in your Imply Hybrid cluster configuration and is enabled by default for clusters created after April 2, 2018. You can change this setting for an existing cluster, but you must stop the cluster first and then restart it.
- Instance types that use NVMe local storage (for example, i3, c5d, and m5d) are transparently encrypted by hardware on the instance. This encryption is always enabled, regardless of your Imply Hybrid "instance encryption" setting. See the official Amazon documentation on SSD instance store volumes for details.
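For EBS-backed instances, you can confirm the effect of the "instance encryption" setting from your own AWS account by checking each volume's `Encrypted` flag. This is a minimal boto3 sketch, assuming you know the cluster's instance IDs; the helper names are hypothetical. Instance store (NVMe) volumes are not EBS volumes and do not appear in this listing, so the check applies only to EBS-backed instance types.

```python
def unencrypted_volume_ids(describe_volumes_pages) -> list:
    """Return IDs of EBS volumes whose "Encrypted" flag is false, across all result pages."""
    return [
        vol["VolumeId"]
        for page in describe_volumes_pages
        for vol in page.get("Volumes", [])
        if not vol.get("Encrypted", False)
    ]


def fetch_volume_pages(instance_ids):
    """Yield DescribeVolumes pages for volumes attached to the given instances (requires boto3)."""
    import boto3  # lazy import so the filter above works without boto3 installed

    ec2 = boto3.client("ec2")
    paginator = ec2.get_paginator("describe_volumes")
    yield from paginator.paginate(
        Filters=[{"Name": "attachment.instance-id", "Values": list(instance_ids)}]
    )
```

An empty result from `unencrypted_volume_ids(fetch_volume_pages([...]))` means every EBS volume attached to those instances is encrypted.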
Imply Hybrid uses six subnets across three availability zones (AZs) to provide resiliency in case of an AWS data center outage; each AZ has one subnet for dynamic addresses and one for static addresses. These subnets are configured with a route to an internet gateway, which provides egress access to the internet and, optionally, ingress access from it.
Three security groups are set up for the EC2 instances and ELB load balancers: one managed group containing security rules specified by Imply, plus two unmanaged groups, one for EC2 and one for ELB, which start out empty and are intended for custom rules that you provide. By default, no internet-facing rules are configured, and all ingress from the internet is blocked.
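To verify this default from your own account, you could scan the rules of the three security groups for ingress open to the whole internet (`0.0.0.0/0` or `::/0`). This boto3 sketch assumes you look up the group IDs yourself; the function names are hypothetical. It checks only CIDR-based rules; rules that reference other security groups or prefix lists are not examined.

```python
# CIDR blocks that match every IPv4 or IPv6 address.
INTERNET_CIDRS = {"0.0.0.0/0", "::/0"}


def internet_ingress_rules(describe_sg_response: dict) -> list:
    """List (group ID, protocol, from port, to port) for ingress rules open to the whole internet."""
    findings = []
    for group in describe_sg_response.get("SecurityGroups", []):
        for perm in group.get("IpPermissions", []):
            cidrs = {r["CidrIp"] for r in perm.get("IpRanges", [])}
            cidrs |= {r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])}
            if cidrs & INTERNET_CIDRS:
                findings.append(
                    (group["GroupId"], perm.get("IpProtocol"), perm.get("FromPort"), perm.get("ToPort"))
                )
    return findings


def fetch_security_groups(group_ids):
    """Call DescribeSecurityGroups for the given group IDs (requires boto3 and AWS credentials)."""
    import boto3  # lazy import: the scanner above needs only the response dict

    return boto3.client("ec2").describe_security_groups(GroupIds=list(group_ids))
```

With the default configuration, `internet_ingress_rules(fetch_security_groups([...]))` should return an empty list; any entry it returns is a rule that exposes the cluster to the internet.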
To facilitate communication with the Imply Hybrid Manager, the VPC created in your AWS account is peered with the Cloud Manager VPC in Imply's account. This lets the Manager address your Imply cluster using only private IP addresses, without any form of public accessibility.
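You can confirm from your own account that this peering connection exists and is active. This is a boto3 sketch; the VPC ID you pass in stands for the VPC that Imply Hybrid created in your account, and the helper names are hypothetical. The code checks both sides of each connection because it does not assume which account initiated the peering, and it makes a single, unpaginated API call.

```python
def active_peerings_for(vpc_id: str, response: dict) -> list:
    """Return IDs of active peering connections where vpc_id is the requester or the accepter."""
    matches = []
    for pcx in response.get("VpcPeeringConnections", []):
        sides = {
            pcx.get("RequesterVpcInfo", {}).get("VpcId"),
            pcx.get("AccepterVpcInfo", {}).get("VpcId"),
        }
        if vpc_id in sides and pcx.get("Status", {}).get("Code") == "active":
            matches.append(pcx["VpcPeeringConnectionId"])
    return matches


def describe_peerings() -> dict:
    """Call DescribeVpcPeeringConnections once (requires boto3 and AWS credentials)."""
    import boto3  # lazy import: the filter above needs only the response dict

    return boto3.client("ec2").describe_vpc_peering_connections()
```

If `active_peerings_for(your_vpc_id, describe_peerings())` returns an empty list, the Manager cannot reach the cluster over private addresses.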
Ports used in Imply Hybrid
To allow communication between the Imply Manager and the Druid cluster VPCs, Imply configures the following ingress rules in the managed security group.
The table lists the ports, why they are open, and whether they can be disabled for scenarios in which security policies require strict minimization of open ports.
| Port | Process | Purpose and notes |
|---|---|---|
| | | Used for detailed health checks on Master, Query, and Data nodes. Deprecated; will be removed soon. |
| | Coordinator process on Master nodes | Sends node states in the cluster. Coordinator access is required to perform cluster upgrades initiated from the Imply Manager. |
| | Broker process on Query nodes | Used for getting detailed health check information. |
| | Historical process on Data nodes | Used for getting detailed health check information. |
| | Overlord process on Master nodes | Used during rolling updates to check pending indexing tasks. Overlord access is required to perform cluster upgrades initiated from the Imply Manager. |
| | Middle Manager process on Data nodes | Used during rolling updates to check pending indexing tasks. Middle Manager access is required to perform cluster upgrades initiated from the Imply Manager. |
| | Router process on Query nodes | Used for information retrieval and web console access from within the Imply Manager, including the data displayed on the Cluster Overview page of the Imply Manager UI (for example, remaining capacity and server information). The Manage Data link to the Druid web console also uses this port. |
| | | Used for sending server logs to the Imply control plane, where they are surfaced in the Imply Manager UI for quick access. Required for the View server logs feature. |
| | Pivot process on Query nodes | Used for submitting Pivot queries and dashboard requests from the Imply Hybrid Manager UI. |
| 22 | | When granted permission, the Imply support team may connect via SSH to cluster nodes to help with troubleshooting and maintenance. |
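If your security policy requires strict minimization of open ports, you could periodically compare the managed group's ingress rules against the ports you expect from this table. This is a sketch under stated assumptions: you supply the expected TCP port set yourself (for example, include port 22 only while SSH support access is granted), the input is a boto3 `describe_security_groups` response, and the function name is hypothetical.

```python
def unexpected_ingress(describe_sg_response: dict, expected_tcp_ports: set) -> list:
    """Return (group ID, protocol, from port, to port) for rules opening anything not expected."""
    findings = []
    for group in describe_sg_response.get("SecurityGroups", []):
        for perm in group.get("IpPermissions", []):
            proto = perm.get("IpProtocol")
            lo, hi = perm.get("FromPort"), perm.get("ToPort")
            if (
                proto == "tcp"
                and lo is not None
                and hi is not None
                and all(p in expected_tcp_ports for p in range(lo, hi + 1))
            ):
                continue  # every port in this TCP range is on the expected list
            # Anything else (extra TCP ports, UDP, or "-1" meaning all traffic) is reported.
            findings.append((group["GroupId"], proto, lo, hi))
    return findings
```

Because a protocol of `"-1"` opens all ports, such a rule is always reported, which matches the goal of keeping the open-port list minimal.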