Apache Druid
  • Imply Documentation

›Concepts

Getting started

  • Introduction to Apache Druid
  • Quickstart
  • Docker
  • Single server deployment
  • Clustered deployment

Tutorials

  • Loading files natively
  • Load from Apache Kafka
  • Load from Apache Hadoop
  • Querying data
  • Roll-up
  • Configuring data retention
  • Updating existing data
  • Compacting segments
  • Deleting data
  • Writing an ingestion spec
  • Transforming input data
  • Kerberized HDFS deep storage

Design

  • Design
  • Segments
  • Processes and servers
  • Deep storage
  • Metadata storage
  • ZooKeeper

Ingestion

  • Ingestion
  • Data formats
  • Schema design tips
  • Data management
  • Stream ingestion

    • Apache Kafka
    • Amazon Kinesis
    • Tranquility

    Batch ingestion

    • Native batch
    • Hadoop-based
  • Task reference
  • Troubleshooting FAQ

Querying

  • Druid SQL
  • Native queries
  • Query execution
  • Concepts

    • Datasources
    • Joins
    • Lookups
    • Multi-value dimensions
    • Multitenancy
    • Query caching
    • Context parameters

    Native query types

    • Timeseries
    • TopN
    • GroupBy
    • Scan
    • Search
    • TimeBoundary
    • SegmentMetadata
    • DatasourceMetadata

    Native query components

    • Filters
    • Granularities
    • Dimensions
    • Aggregations
    • Post-aggregations
    • Expressions
    • Having filters (groupBy)
    • Sorting and limiting (groupBy)
    • Sorting (topN)
    • String comparators
    • Virtual columns
    • Spatial filters

Configuration

  • Configuration reference
  • Extensions
  • Logging

Operations

  • Web console
  • Getting started with Apache Druid
  • Basic cluster tuning
  • API reference
  • High availability
  • Rolling updates
  • Retaining or automatically dropping data
  • Metrics
  • Alerts
  • Working with different versions of Apache Hadoop
  • HTTP compression
  • TLS support
  • Password providers
  • dump-segment tool
  • reset-cluster tool
  • insert-segment-to-db tool
  • pull-deps tool
  • Misc

    • Legacy Management UIs
    • Deep storage migration
    • Export Metadata Tool
    • Metadata Migration
    • Segment Size Optimization
    • Content for build.sbt

Development

  • Developing on Druid
  • Creating extensions
  • JavaScript functionality
  • Build from source
  • Versioning
  • Experimental features

Misc

  • Papers

Hidden

  • Apache Druid vs Elasticsearch
  • Apache Druid vs. Key/Value Stores (HBase/Cassandra/OpenTSDB)
  • Apache Druid vs Kudu
  • Apache Druid vs Redshift
  • Apache Druid vs Spark
  • Apache Druid vs SQL-on-Hadoop
  • Authentication and Authorization
  • Broker
  • Coordinator Process
  • Historical Process
  • Indexer Process
  • Indexing Service
  • MiddleManager Process
  • Overlord Process
  • Router Process
  • Peons
  • Approximate Histogram aggregators
  • Apache Avro
  • Microsoft Azure
  • Bloom Filter
  • DataSketches extension
  • DataSketches HLL Sketch module
  • DataSketches Quantiles Sketch module
  • DataSketches Theta Sketch module
  • DataSketches Tuple Sketch module
  • Basic Security
  • Kerberos
  • Cached Lookup Module
  • Apache Ranger Security
  • Google Cloud Storage
  • HDFS
  • Apache Kafka Lookups
  • Globally Cached Lookups
  • MySQL Metadata Store
  • ORC Extension
  • Druid pac4j based Security extension
  • Apache Parquet Extension
  • PostgreSQL Metadata Store
  • Protobuf
  • S3-compatible
  • Simple SSLContext Provider Module
  • Stats aggregator
  • Test Stats Aggregators
  • Ambari Metrics Emitter
  • Apache Cassandra
  • Rackspace Cloud Files
  • DistinctCount Aggregator
  • Graphite Emitter
  • InfluxDB Line Protocol Parser
  • InfluxDB Emitter
  • Kafka Emitter
  • Materialized View
  • Moment Sketches for Approximate Quantiles module
  • Moving Average Query
  • OpenTSDB Emitter
  • Druid Redis Cache
  • Microsoft SQLServer
  • StatsD Emitter
  • T-Digest Quantiles Sketch module
  • Thrift
  • Timestamp Min/Max aggregators
  • GCE Extensions
  • Aliyun OSS
  • Cardinality/HyperUnique aggregators
  • Select
  • Realtime Process
Edit

Query context

General parameters

The query context is used for various query configuration parameters. Query context parameters can be specified in the following ways:

  • For Druid SQL, context parameters are provided either as a JSON object named context to the HTTP POST API, or as properties to the JDBC connection.
  • For native queries, context parameters are provided as a JSON object named context.

These parameters apply to all query types.

propertydefaultdescription
timeoutdruid.server.http.defaultQueryTimeoutQuery timeout in millis, beyond which unfinished queries will be cancelled. 0 timeout means no timeout. To set the default timeout, see Broker configuration
priority0Query Priority. Queries with higher priority get precedence for computational resources.
lanenullQuery lane, used to control usage limits on classes of queries. See Broker configuration for more details.
queryIdauto-generatedUnique identifier given to this query. If a query ID is set or known, this can be used to cancel the query
useCachetrueFlag indicating whether to leverage the query cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Apache Druid uses druid.broker.cache.useCache or druid.historical.cache.useCache to determine whether or not to read from the query cache
populateCachetrueFlag indicating whether to save the results of the query to the query cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses druid.broker.cache.populateCache or druid.historical.cache.populateCache to determine whether or not to save the results of this query to the query cache
useResultLevelCachetrueFlag indicating whether to leverage the result level cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Druid uses druid.broker.cache.useResultLevelCache to determine whether or not to read from the result-level query cache
populateResultLevelCachetrueFlag indicating whether to save the results of the query to the result level cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses druid.broker.cache.populateResultLevelCache to determine whether or not to save the results of this query to the result-level query cache
bySegmentfalseReturn "by segment" results. Primarily used for debugging, setting it to true returns results associated with the data segment they came from
finalizetrueFlag indicating whether to "finalize" aggregation results. Primarily used for debugging. For instance, the hyperUnique aggregator will return the full HyperLogLog sketch instead of the estimated cardinality when this flag is set to false
maxScatterGatherBytesdruid.server.http.maxScatterGatherBytesMaximum number of bytes gathered from data processes such as Historicals and realtime processes to execute a query. This parameter can be used to further reduce maxScatterGatherBytes limit at query time. See Broker configuration for more details.
maxQueuedBytesdruid.broker.http.maxQueuedBytesMaximum number of bytes queued per query before exerting backpressure on the channel to the data server. Similar to maxScatterGatherBytes, except unlike that configuration, this one will trigger backpressure rather than query failure. Zero means disabled.
serializeDateTimeAsLongfalseIf true, DateTime is serialized as long in the result returned by Broker and the data transportation between Broker and compute process
serializeDateTimeAsLongInnerfalseIf true, DateTime is serialized as long in the data transportation between Broker and compute process
enableParallelMergetrueEnable parallel result merging on the Broker. Note that druid.processing.merge.useParallelMergePool must be enabled for this setting to be set to true. See Broker configuration for more details.
parallelMergeParallelismdruid.processing.merge.pool.parallelismMaximum number of parallel threads to use for parallel result merging on the Broker. See Broker configuration for more details.
parallelMergeInitialYieldRowsdruid.processing.merge.task.initialYieldNumRowsNumber of rows to yield per ForkJoinPool merge task for parallel result merging on the Broker, before forking off a new task to continue merging sequences. See Broker configuration for more details.
parallelMergeSmallBatchRowsdruid.processing.merge.task.smallBatchNumRowsSize of result batches to operate on in ForkJoinPool merge tasks for parallel result merging on the Broker. See Broker configuration for more details.
useFilterCNFfalseIf true, Druid will attempt to convert the query filter to Conjunctive Normal Form (CNF). During query processing, columns can be pre-filtered by intersecting the bitmap indexes of all values that match the eligible filters, often greatly reducing the raw number of rows which need to be scanned. But this effect only happens for the top level filter, or individual clauses of a top level 'and' filter. As such, filters in CNF potentially have a higher chance to utilize a large amount of bitmap indexes on string columns during pre-filtering. However, this setting should be used with great caution, as it can sometimes have a negative effect on performance, and in some cases, the act of computing CNF of a filter can be expensive. We recommend hand tuning your filters to produce an optimal form if possible, or at least verifying through experimentation that using this parameter actually improves your query performance with no ill-effects.

Query-type-specific parameters

In addition, some query types offer context parameters specific to that query type.

TopN

propertydefaultdescription
minTopNThreshold1000The top minTopNThreshold local results from each segment are returned for merging to determine the global topN.

Timeseries

propertydefaultdescription
skipEmptyBucketsfalseDisable timeseries zero-filling behavior, so only buckets with results will be returned.

GroupBy

See the list of GroupBy query context parameters available on the groupBy query page.

Vectorization parameters

The GroupBy and Timeseries query types can run in vectorized mode, which speeds up query execution by processing batches of rows at a time. Not all queries can be vectorized. In particular, vectorization currently has the following requirements:

  • All query-level filters must either be able to run on bitmap indexes or must offer vectorized row-matchers. These include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not".
  • All filters in filtered aggregators must offer vectorized row-matchers.
  • All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
  • No virtual columns.
  • For GroupBy: All dimension specs must be "default" (no extraction functions or filtered dimension specs).
  • For GroupBy: No multi-value dimensions.
  • For Timeseries: No "descending" order.
  • Only immutable segments (not real-time).
  • Only table datasources (not joins, subqueries, lookups, or inline datasources).

Other query types (like TopN, Scan, Select, and Search) ignore the "vectorize" parameter, and will execute without vectorization. These query types will ignore the "vectorize" parameter even if it is set to "force".

propertydefaultdescription
vectorizetrueEnables or disables vectorized query execution. Possible values are false (disabled), true (enabled if possible, disabled otherwise, on a per-segment basis), and force (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The "force" setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This will override druid.query.vectorize if it's set.
vectorSize512Sets the row batching size for a particular query. This will override druid.query.vectorSize if it's set.
← Query cachingTimeseries →
  • General parameters
  • Query-type-specific parameters
    • TopN
    • Timeseries
    • GroupBy
  • Vectorization parameters

Technology · Use Cases · Powered by Druid · Docs · Community · Download · FAQ

 ·  ·  · 
Copyright © 2019 Apache Software Foundation.
Except where otherwise noted, licensed under CC BY-SA 4.0.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.