Native queries

Apache Druid supports two query languages: Druid SQL and native queries. This document describes the native query language. For information about how Druid SQL chooses which native query types to use when it runs a SQL query, refer to the SQL documentation.

Native queries in Druid are JSON objects and are typically issued to the Broker or Router processes. Queries can be posted like this:

curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<query_json_file>

Replace <queryable_host>:<port> with the appropriate address and port for your system. For example, if running the quickstart configuration, replace <queryable_host>:<port> with localhost:8888.
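
As an illustration, here is a minimal native query that <query_json_file> could contain: a timeseries query that counts rows per day. The wikipedia datasource and the interval are assumptions made for this example.

{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "granularity": "day",
  "intervals": [ "2015-09-12/2015-09-13" ],
  "aggregations": [
    { "type": "count", "name": "count" }
  ]
}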

You can also enter them directly in the Druid console's Query view. Simply pasting a native query into the console switches the editor into JSON mode.

Druid's native query language is JSON over HTTP, although many members of the community have contributed different client libraries in other languages to query Druid.

The Content-Type and Accept headers can also take 'application/x-jackson-smile'.

curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/x-jackson-smile' -d @<query_json_file>

If the Accept header is not provided, it defaults to the value of the Content-Type header.

Druid's native query is relatively low level, mapping closely to how computations are performed internally. Druid queries are designed to be lightweight and complete very quickly. This means that for more complex analysis, or to build more complex visualizations, multiple Druid queries may be required.

Even though queries are typically made to Brokers or Routers, they can also be accepted by Historical processes and by Peons (task JVMs) that are running stream ingestion tasks. This may be valuable if you want to query results for specific segments that are served by specific processes.
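
For example, a query could be posted directly to a Historical process like this, assuming the Historical listens on its default port 8083:

curl -X POST 'http://<historical_host>:8083/druid/v2/?pretty' -H 'Content-Type:application/json' -d @<query_json_file>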

Available queries

Druid has numerous query types for various use cases. Queries are composed of JSON properties, and the documentation for each query type describes all the properties that can be set.

Aggregation queries

  • Timeseries
  • TopN
  • GroupBy

Metadata queries

  • TimeBoundary
  • SegmentMetadata
  • DatasourceMetadata

Other queries

  • Scan
  • Search

Which query type should I use?

For aggregation queries, if more than one would satisfy your needs, we generally recommend using Timeseries or TopN whenever possible, as they are specifically optimized for their use cases. If neither is a good fit, you should use the GroupBy query, which is the most flexible.
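
For instance, the following sketch shows a topN query that returns the five most-edited pages, reusing the illustrative wikipedia datasource from the earlier example and assuming it has a page dimension:

{
  "queryType": "topN",
  "dataSource": "wikipedia",
  "dimension": "page",
  "metric": "count",
  "threshold": 5,
  "granularity": "all",
  "intervals": [ "2015-09-12/2015-09-13" ],
  "aggregations": [
    { "type": "count", "name": "count" }
  ]
}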

Query cancellation

Queries can be cancelled explicitly using their unique identifier. If the query identifier is set when the query is submitted, or is otherwise known, the following endpoint can be used on the Broker or Router to cancel the query.

DELETE /druid/v2/{queryId}

For example, if the query ID is abc123, the query can be cancelled as follows:

curl -X DELETE "http://host:port/druid/v2/abc123"
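
To make the identifier known in advance, you can set it yourself through the queryId context parameter. A sketch, again using the illustrative wikipedia query:

{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "granularity": "day",
  "intervals": [ "2015-09-12/2015-09-13" ],
  "aggregations": [
    { "type": "count", "name": "count" }
  ],
  "context": { "queryId": "abc123" }
}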

Query errors

If a query fails, you will get an HTTP 500 response containing a JSON object with the following structure:

{
  "error" : "Query timeout",
  "errorMessage" : "Timeout waiting for task.",
  "errorClass" : "java.util.concurrent.TimeoutException",
  "host" : "druid1.example.com:8083"
}

If a query request fails because it is limited by the query scheduler laning configuration, Druid returns an HTTP 429 response with the same JSON object schema as an error response, but with an errorMessage of the form "Total query capacity exceeded" or "Query capacity exceeded for lane 'low'".

The fields in the response are:

  • error: A well-defined error code (see below).
  • errorMessage: A free-form message with more information about the error. May be null.
  • errorClass: The class of the exception that caused this error. May be null.
  • host: The host on which this error occurred. May be null.

Possible codes for the error field include:

  • Query timeout: The query timed out.
  • Query interrupted: The query was interrupted, possibly due to JVM shutdown.
  • Query cancelled: The query was cancelled through the query cancellation API.
  • Resource limit exceeded: The query exceeded a configured resource limit (e.g. groupBy maxResults).
  • Unauthorized request.: The query was denied due to security policy. Either the user was not recognized, or the user was recognized but does not have access to the requested resource.
  • Unsupported operation: The query attempted to perform an unsupported operation. This may occur when using undocumented features or when using an incompletely implemented extension.
  • Truncated response context: An intermediate response context for the query exceeded the built-in limit of 7KB.

    The response context is an internal data structure that Druid servers use to share out-of-band information when sending query results to each other. It is serialized in an HTTP header with a maximum length of 7KB. This error occurs when an intermediate response context sent from a data server (like a Historical) to the Broker exceeds this limit.

    The response context is used for a variety of purposes, but the one most likely to generate a large context is sharing details about segments that move during a query. That means this error can potentially indicate that a very large number of segments moved in between the time a Broker issued a query and the time it was processed on Historicals. This should rarely, if ever, occur during normal operation.

  • Unknown exception: Some other exception occurred. Check errorMessage and errorClass for details, although keep in mind that the contents of those fields are free-form and may change from release to release.