Imply overview

Imply is a real-time reporting solution for rapidly ingesting, visualizing, slicing and dicing, drilling into, and aggregating data about critical business activities. It is particularly well suited to low-latency queries on high-volume, high-dimensional, high-cardinality data. You can ingest data either as streams (in real time) or as static files.

Imply is available as a managed cloud service or as packaged software that you install on-premise.

Deployment models

Imply Cloud (AWS managed service)

Imply deploys EC2 instances in a Virtual Private Cloud (VPC) running in your Amazon Web Services account. You own the cluster and the data. The main features include:

  • Easily provision and scale clusters: start, terminate, upgrade, scale up, or scale down clusters with a simple point-and-click interface.
  • Leverage the power of Druid: Imply Cloud is created by the original team behind the Druid project. You can be confident that your clusters are expertly managed and tuned.
  • Easily import data: Seamlessly load streaming and batch data. Built-in support for data ingestion directly from Kafka, S3, HDFS, Spark, Samza, Storm, and HTTP.
  • Operational visibility: Imply includes Clarity, our performance monitoring and analytics solution for Druid clusters.

For docs on Imply Cloud, please see here.

On-premise installation

You can download Imply as packaged software and install it in any on-premise or cloud-based environment. If you deploy Imply on-premise, you will have to self-manage deployment, operations, and updates. The management, easy data loading, and operations features of Imply Cloud are not available for on-premise installations.

For docs on installing Imply on-premise, please see here.

Main components

Diagram

Druid

Druid is the open source analytics data store at the core of the platform. Druid enables arbitrary data exploration, low-latency data ingestion, and fast aggregations at scale. Druid can scale to store trillions of events and ingest millions of events per second. Druid is best used to power user-facing data applications.

For more information about Druid, please visit http://druid.io.

Pivot

Imply Pivot is a web-based UI for visual data exploration. It features dimensional pivoting, slice-and-dice and nested visualization, as well as contextual information and navigation. Use Pivot to perform OLAP operations on your data and to visualize it as soon as it is loaded into the platform.

For more information about Pivot, please visit the Pivot section.

Clarity

Clarity is a DevOps and performance analytics tool that connects to your Imply cluster. Use it to explore anomalies, diagnose performance bottlenecks, and ensure your cluster is working optimally.

Server types

Diagram

  • Query servers running Druid Brokers and Imply Pivot.
  • Data servers running Druid Historical Nodes and Druid MiddleManagers.
  • Master server(s) running a Druid Coordinator and Druid Overlord.
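
As a quick reference, the layout in the list above can be written as a small lookup table. This is only an illustrative sketch: the process names come from the list, while the dictionary and the helper function are names invented here and are not part of Imply or Druid.

```python
# Illustrative layout only. The process names come from the list above;
# SERVER_TYPES and server_type_for are hypothetical names for this sketch.
SERVER_TYPES = {
    "query": ["Druid Broker", "Imply Pivot"],
    "data": ["Druid Historical", "Druid MiddleManager"],
    "master": ["Druid Coordinator", "Druid Overlord"],
}


def server_type_for(process: str) -> str:
    """Return the server type that hosts the given process."""
    for server_type, processes in SERVER_TYPES.items():
        if process in processes:
            return server_type
    raise KeyError(f"unknown process: {process}")


print(server_type_for("Druid Broker"))  # query
```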

Query Server

Query servers are the endpoints that users and client applications interact with. Query servers run a Druid Broker that routes queries to the appropriate data nodes. They also include an Imply Pivot server as a way to directly explore and visualize your data.
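
To make the client side of this concrete, here is a minimal sketch of how an application might address the Broker over HTTP. It assumes that Druid SQL is enabled on the Broker and that the Broker listens on its stock default port, 8082; the host name, the datasource name, and the helper function are hypothetical. The sketch only builds the request, so it can be read and run without a cluster.

```python
import json
import urllib.request


def build_sql_request(broker_url: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Broker's SQL endpoint."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical Query server host and datasource name.
request = build_sql_request(
    "http://query-server.example.com:8082",
    "SELECT COUNT(*) AS events FROM my_datasource",
)
print(request.full_url)

# To run the query against a live cluster, send the request:
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)
```

Because the Broker is the single entry point, the client never needs to know which Data servers hold the relevant data.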

Data Server

Data servers store and ingest data. Data servers run Druid Historical Nodes for storage and processing of large amounts of immutable data, Druid MiddleManagers for ingestion and processing of data, and optionally Tranquility components to assist in streaming data ingestion.

For clusters with complex resource allocation needs, you can break apart the pre-packaged Data server and scale the components individually. This allows you to scale Druid Historical Nodes independently of Druid MiddleManagers, as well as eliminate the possibility of resource contention between historical workloads and real-time workloads.

Master Server

The Master server coordinates data ingestion and storage in your Druid cluster. It is not involved in queries. It is responsible for starting new ingestion jobs and for handling failover of the Druid Historical Node and Druid MiddleManager processes running on your Data servers.

Master servers can be deployed standalone or in a highly available configuration with failover. For failover-based configurations, we recommend moving ZooKeeper and the metadata store onto their own hardware. See the clustering documentation for more details.
