At a high level, Imply is an integrated solution that consists of a powerful analytics engine (Druid) and a collaborative app designed for arbitrary drill-downs (Pivot).
Druid is the open source analytics data store at the core of the Imply platform. Druid enables arbitrary data exploration, low-latency data ingestion, and fast aggregations at scale. It can scale to store trillions of events and ingest millions of events per second, and is best used to power user-facing data applications.
For more information about Druid, please visit https://druid.apache.org/.
Pivot is a web-based app for visual data exploration. It features dimensional pivoting, slice-and-dice, and nested visualizations, as well as contextual information and navigation. Use it to perform OLAP operations on your data and visualize results as soon as the data is loaded into the platform.
For more information about visualizations, please visit the visualize docs.
Clarity is a DevOps and performance analytics tool that connects to your Imply cluster. Use it to explore anomalies, diagnose performance bottlenecks, and ensure your cluster is running optimally.
Query servers are the endpoints that users and client applications interact with. Query servers run a Druid Broker that routes queries to the appropriate data nodes, and a Druid Router that acts as a unified query and API endpoint. They also include an Imply Pivot server as a way to directly explore and visualize your data.
Data servers store and ingest data. They run Druid Historical Nodes for storage and processing of large amounts of immutable data, and Druid MiddleManagers for ingestion and processing of incoming data.
For clusters with complex resource allocation needs, you can break apart the pre-packaged Data server and scale the components individually. This allows you to scale Druid Historical Nodes independently of Druid MiddleManagers, as well as eliminate the possibility of resource contention between historical workloads and real-time workloads.
The Master server coordinates data ingestion and storage in your Druid cluster. It is not involved in queries. It is responsible for starting new ingestion jobs and for handling failover of the Druid Historical Node and Druid MiddleManager processes running on your Data servers.
Master servers can be deployed standalone, or in a highly available configuration with failover. For failover-based configurations, we recommend separating ZooKeeper and the metadata store onto their own hardware. See the Planning documentation for more details.
Imply loads raw data from file systems such as AWS S3, HDFS, or local files, and message buses such as Apache Kafka, or AWS Kinesis. The raw data is converted to a specialized column format that is highly optimized for fast groupings, filters, and aggregations. A traditional database "table" is known as a "datasource" in Imply, and a single Imply deployment may hold multiple datasources.
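As a rough illustration of how a datasource is defined at ingestion time, the sketch below shows the shape of a Druid parallel batch ingestion spec expressed as a Python dict. The datasource name, file path, and column names here are hypothetical placeholders, not part of any real deployment:

```python
# Minimal sketch of a Druid "index_parallel" batch ingestion spec.
# The datasource name ("pageviews"), baseDir, and dimension names
# are hypothetical examples.
ingestion_spec = {
    "type": "index_parallel",
    "spec": {
        "ioConfig": {
            "type": "index_parallel",
            # Where the raw data comes from: local files in this sketch,
            # but S3, HDFS, Kafka, or Kinesis sources follow the same pattern.
            "inputSource": {"type": "local", "baseDir": "/tmp/data", "filter": "*.json"},
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            # The "table" name: a datasource in Imply/Druid terms.
            "dataSource": "pageviews",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            # Columns to store in Druid's optimized column format.
            "dimensionsSpec": {"dimensions": ["channel", "page", "user"]},
        },
        "tuningConfig": {"type": "index_parallel"},
    },
}
```

Submitting a spec like this creates (or appends to) the named datasource; a single deployment can hold many such datasources.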
Datasources can be visualized by creating one or more data cubes in Pivot. Each data cube contains a set of dimensions and measures. Dimensions are attributes of the data that you normally group or filter on. Measures are aggregates. Different visualizations can be created by dragging and dropping dimensions, and one or more measures can be displayed at any time. Data cubes have one primary visualization as the focus, and you can arbitrarily drill into the visualization through any combination of dimensions.
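In SQL terms, a data cube's dimensions correspond to columns you group and filter on, and its measures correspond to aggregate expressions. The sketch below makes that mapping concrete, using an assumed "wikipedia" datasource and hypothetical dimension and measure names:

```python
# Dimensions map to GROUP BY columns; measures map to aggregates.
# "wikipedia", "channel", and "added" are hypothetical names.
dimension = "channel"
measures = ["COUNT(*) AS events", "SUM(added) AS total_added"]

sql = (
    f"SELECT {dimension}, {', '.join(measures)} "
    f'FROM "wikipedia" '
    f"GROUP BY {dimension} "
    f"ORDER BY events DESC"
)
print(sql)
```

Drilling into a visualization amounts to adding another dimension to the grouping, or another predicate to the filter.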
Dashboards can be created from data cubes. Dashboards combine multiple visualizations into a single view and are best suited to distilling information rather than to heavy exploration. Dashboards also support arbitrary filters, and you can expand any given visual in a dashboard to return to the data cube view.
Imply's backend exposes a RESTful interface where you can issue queries directly. For more information, see the API documentation.
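As a sketch of what issuing a query directly looks like, the example below packages a Druid SQL statement as a JSON POST to the Router's SQL endpoint (`/druid/v2/sql`). The host, port, and datasource name are assumptions; adjust them for your deployment:

```python
import json
import urllib.request

# Assumed Router address; 8888 is the default Druid Router port.
ROUTER_URL = "http://localhost:8888/druid/v2/sql"

def build_sql_request(sql: str) -> urllib.request.Request:
    """Package a SQL statement as the JSON POST body the SQL endpoint expects."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        ROUTER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# "wikipedia" is a hypothetical datasource name.
req = build_sql_request('SELECT COUNT(*) FROM "wikipedia"')
# rows = json.load(urllib.request.urlopen(req))  # requires a running cluster
```

The same endpoint accepts parameterized queries and result-format options; see the API documentation for the full request schema.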