Loading data


Imply Cloud supports the following real-time and batch ingestion methods:

Getting started

The easiest way to get started with loading data is to follow the included tutorials.

Hybrid batch/streaming

You can combine batch (file-based) and streaming methods in a hybrid batch/streaming architecture, sometimes called a "lambda architecture". In a hybrid architecture, you use a streaming method to do initial ingestion, and then periodically re-ingest 'finalized' data in batch mode (typically every few hours or nightly).
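
To make the flow concrete, here is a toy Python model of a hybrid pipeline. It is only a sketch and does not use any Druid or Imply API: the `ToyDatasource` class, its method names, and the hourly time buckets are all hypothetical. It assumes that batch re-ingestion of a time range replaces whatever was streamed into that range, and that a query sees streamed and batch-loaded ranges together.

```python
class ToyDatasource:
    """Hypothetical in-memory stand-in for a datasource, keyed by hour bucket."""

    def __init__(self):
        self.rows_by_hour = {}  # hour bucket -> list of rows
        self.origin = {}        # hour bucket -> "stream" or "batch"

    def stream_ingest(self, hour, row):
        # Streaming ingestion appends rows to the bucket as they arrive.
        self.rows_by_hour.setdefault(hour, []).append(row)
        self.origin.setdefault(hour, "stream")

    def batch_reingest(self, hour, finalized_rows):
        # Batch ingestion replaces the whole bucket with the finalized rows.
        self.rows_by_hour[hour] = list(finalized_rows)
        self.origin[hour] = "batch"

    def query_total(self):
        # A query reads every bucket, whether streamed or batch-loaded.
        return sum(
            row["value"] for rows in self.rows_by_hour.values() for row in rows
        )


ds = ToyDatasource()
ds.stream_ingest("2024-01-01T00", {"value": 1})
ds.stream_ingest("2024-01-01T00", {"value": 2})
ds.stream_ingest("2024-01-01T01", {"value": 5})
total_before = ds.query_total()

# Later, the finalized data for hour 00 (including a late row) is loaded in batch.
ds.batch_reingest("2024-01-01T00", [{"value": 1}, {"value": 2}, {"value": 4}])
total_after = ds.query_total()
```

In this sketch only the re-ingested hour changes; the hour that is still streaming-only stays queryable alongside it.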

Hybrid architectures are simple with Druid because batch-loaded data for a particular time range automatically replaces streaming-loaded data for that same time range. All Druid queries seamlessly access historical data together with real-time data. We recommend this kind of architecture if you need real-time analytics but also need the ability to reprocess historical data. Common reasons for reprocessing historical data include:

With the Kafka indexing service, you can also reprocess historical data in a pure streaming architecture by migrating to a new stream-based datasource whenever you need to reprocess. This is sometimes called a "kappa architecture".
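
As a rough illustration of migrating to a new stream-based datasource, the Python sketch below builds a Kafka supervisor spec that targets a fresh datasource name and reads the topic from its earliest retained offset. Treat it as a set of assumptions and not a reference: the `_vN` naming convention, both helper functions, the topic name, the broker address, and the `timestamp` column are hypothetical, and the exact spec layout depends on your Druid version. The code only constructs the spec; it does not submit it to a cluster.

```python
import json
import re


def next_datasource_name(current):
    """Bump a trailing _vN suffix (a hypothetical naming convention)."""
    match = re.fullmatch(r"(.*)_v(\d+)", current)
    if match is None:
        return f"{current}_v2"
    base, version = match.group(1), int(match.group(2))
    return f"{base}_v{version + 1}"


def build_supervisor_spec(datasource, topic, bootstrap_servers):
    """Build a minimal Kafka supervisor spec as a plain dict (illustrative only)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumed event-time column; adjust to your data.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                # Start from the earliest retained offset so history is re-read.
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


new_name = next_datasource_name("events_v1")
spec = build_supervisor_spec(new_name, "events", "localhost:9092")
print(json.dumps(spec, indent=2))
```

Reprocessing this way only covers the history that the Kafka topic still retains, so the topic's retention settings bound how far back the new datasource can go.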


