Skip to main content

Query processing

This topic provides a high-level overview of how Apache Druid distributes and processes queries.

The general flow is as follows:

  1. A query enters the Broker service, which identifies the segments with data that may pertain to that query. The list of segments is always pruned by time, and may also be pruned by other attributes depending on how the datasource is partitioned.
  2. The Broker identifies which Historical and MiddleManager services are serving those segments and distributes a rewritten subquery to each of the services.
  3. The Historical and MiddleManager services execute each subquery and return results to the Broker.
  4. The Broker merges the partial results to get the final answer, which it returns to the original caller.

Druid uses time and attribute pruning to minimize the data it must scan for each query.

For filters that are more precise than what the Broker uses for pruning, the indexing structures inside each segment allow Historical services to identify matching rows before accessing the data. Once the Historical service knows which rows match a particular query, it only accesses the requires rows and columns.

To maximize query performance, Druid uses the following techniques:

  • Pruning the set of segments accessed for a query.
  • Within each segment, using indexes to identify which rows must be accessed.
  • Within each segment, only reading the specific rows and columns that are relevant to a particular query.

Learn more‚Äč

See the following topic for more information: