Loading data

Imply supports all of Druid's real-time and batch ingestion methods. The most popular configurations are:

Getting started

The easiest way to get started with loading data is to follow the included tutorials.

Hybrid batch/streaming

You can combine batch (file-based) and streaming methods in a hybrid batch/streaming architecture, sometimes called a "lambda architecture". In a hybrid architecture, you use a streaming method to do initial ingestion, and then periodically re-ingest 'finalized' data in batch mode (typically every few hours or nightly).

Hybrid architectures are simple with Druid, since batch-loaded data for a particular time range automatically replaces streaming-loaded data for that same time range. All Druid queries seamlessly access historical data together with real-time data. We recommend this kind of architecture if you need real-time analytics but also need the ability to reprocess historical data. Common reasons for reprocessing historical data include:

Note that with the Kafka indexing service, it is possible to reprocess historical data in a pure streaming architecture, by migrating to a new stream-based datasource whenever you want to reprocess historical data. This is sometimes called a "kappa architecture".
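
To make the replacement rule for hybrid architectures concrete, here is a toy model in Python. It is a sketch for illustration only and not Druid's implementation: the `Segment` class, the `visible_segments` function, the integer `version` field, and the hour labels are all assumptions of this example. It shows a batch re-ingestion of one hour taking the place of the streaming-loaded copy of that hour, while an hour that has not been re-ingested is still served from streaming data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    interval: str  # time range covered (hypothetical hour label)
    version: int   # assumption of this sketch: higher version wins
    source: str    # "streaming" or "batch"
    rows: int


def visible_segments(segments):
    """Return one segment per interval: the one with the highest version."""
    latest = {}
    for seg in segments:
        current = latest.get(seg.interval)
        if current is None or seg.version > current.version:
            latest[seg.interval] = seg
    return [latest[key] for key in sorted(latest)]


segments = [
    Segment("2024-01-01T00", version=1, source="streaming", rows=980),
    Segment("2024-01-01T01", version=1, source="streaming", rows=1010),
    # Later batch re-ingestion of the first hour with 'finalized' data.
    Segment("2024-01-01T00", version=2, source="batch", rows=1000),
]

visible = visible_segments(segments)
for seg in visible:
    print(seg.interval, seg.source, seg.rows)
```

In this model a query over both hours reads the batch copy of the first hour and the streaming copy of the second, which is the behavior a hybrid architecture relies on.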

Realtime nodes

Imply supports using Realtime nodes to load data, but we generally do not recommend this. Realtime nodes are a legacy streaming ingestion mechanism that does not offer an easy way to achieve redundancy, durability, and high availability. They can also be difficult to manage at scale. We believe that in most cases, Tranquility or the Kafka indexing service is a more suitable choice.

Imply does not include built-in configurations for Realtime nodes.
