Asynchronous query download is an alpha feature that should be considered experimental and subject to change or removal at any time. Alpha features are provided "as is," and are not subject to Imply SLAs.
Asynchronous query download (async download) lets you run longer-executing queries and retrieve the results after the queries complete. It addresses failures that occur when timeouts interrupt the connection between query clients and the Druid cluster. Running a query with the async download APIs is similar to running one with the synchronous API. Instead of one synchronous API call, async download requires you to call three APIs to:
- Submit the query.
- Poll for query status.
- Fetch the result.
Async download does not:
- Provide file management APIs. Druid is not a file management system, so it does not expose the file containing the results to users.
- Support long retention periods for the query results. Design your client to fetch the query result as soon as possible. The client cannot use deep storage as a query cache layer.
To enable async query downloads, replicate the following properties for the Broker in
broker/runtime.properties and the Coordinator in
Enabling async downloads
|Must be set to ||yes|
Query result storage
Async download supports:
- Local file storage on the Broker by default.
- Amazon S3 storage.
The local storage uses a local disk on the Broker to store result files. Use this storage only for testing.
|Must be set to ||no|
|Directory to store query results.||yes when local storage is used||not defined|
The following example shows the configuration for local storage:
druid-s3-extensions must be loaded to use S3 result storage.
|Must be set to ||yes|
|S3 bucket to store query results.||yes|
|S3 prefix to store query results. Do not share the same path with other objects. The auto cleanup task will delete all objects under the same path if it thinks they are expired.||yes|
|Directory path in local disk to store query results temporarily.||yes|
|Max size of each result file. It should be between 5MiB and 5TiB. Supports human-readable format.||no||100MiB|
|Max total size of all query result files. Supports human-readable format.||no||5GiB|
|This property is intended only to support rare cases. This property defines the size of each chunk to temporarily store in ||no||null|
|This property is intended only to support rare cases. This property defines the max number of times to attempt S3 API calls to avoid failures due to transient errors.||no||10|
The following example shows the configuration for S3 storage:
druid.query.async.storage.type=s3
druid.query.async.storage.s3.bucket=your_bucket
druid.query.async.storage.s3.prefix=your_prefix
druid.query.async.storage.s3.tempDir=/path/to/your/temp/dir
druid.query.async.storage.s3.maxResultsSize=1GiB
druid.query.async.storage.s3.maxTotalResultsSize=10GiB
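The size limits in the table above can be sanity-checked before you deploy a configuration. The following is a minimal sketch, assuming sizes are written as a whole number followed by a binary unit such as MiB or GiB. The helper names `parse_size` and `check_max_results_size` are hypothetical and not part of Druid, and Druid's own parser may accept more formats than this one.

```python
import re

# Binary unit multipliers. Assumption: only these suffixes are handled here;
# Druid's human-readable format may accept others.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

_MIN_RESULT_SIZE = 5 * _UNITS["MiB"]  # lower bound stated in the table
_MAX_RESULT_SIZE = 5 * _UNITS["TiB"]  # upper bound stated in the table


def parse_size(text: str) -> int:
    """Convert a string such as '100MiB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(B|KiB|MiB|GiB|TiB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def check_max_results_size(text: str) -> int:
    """Return the size in bytes, or raise if it is outside 5MiB to 5TiB."""
    size = parse_size(text)
    if not _MIN_RESULT_SIZE <= size <= _MAX_RESULT_SIZE:
        raise ValueError(f"{text!r} is outside the 5MiB to 5TiB range")
    return size
```

For instance, `check_max_results_size("1GiB")` accepts the value used in the example configuration, while a value such as `1MiB` raises a `ValueError`.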
S3 permissions/lifecycle settings
s3:AbortMultipartUpload are required for pushing/fetching query results to/from S3.
s3:ListBucket are required for cleaning up expired results.
When a result file upload fails, Druid aborts the upload to clean up partially uploaded files. However, if the abort fails after a couple of retries, those partially uploaded files can remain in your S3 bucket. To handle them, set a lifecycle rule using AbortIncompleteMultipartUpload for your S3 bucket and prefix. A recommended value for DaysAfterInitiation is 2 * query timeout.
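As an illustration of the recommendation above, the following sketch builds a lifecycle configuration in the shape that the S3 lifecycle API accepts. It assumes the query timeout is known to the caller; the function name and the rule ID are hypothetical. Because S3 expresses DaysAfterInitiation in whole days, the sketch rounds 2 * query timeout up to at least one day.

```python
import math
from datetime import timedelta


def abort_incomplete_upload_rule(prefix: str, query_timeout: timedelta) -> dict:
    """Build an S3 lifecycle configuration that aborts stale multipart uploads."""
    # Recommended value from the text: 2 * query timeout, rounded up to whole
    # days because S3 only accepts an integer number of days (minimum 1).
    days = max(1, math.ceil(2 * query_timeout / timedelta(days=1)))
    return {
        "Rules": [
            {
                "ID": "abort-incomplete-async-result-uploads",  # hypothetical ID
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }
```

The returned dictionary can be passed to whichever S3 client you use to set the bucket lifecycle configuration. Scope the prefix to the one configured for async results so the rule does not affect other uploads.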
Query state and result file management
Async download uses two Coordinator duties to clean up expired query states and result files:
When you turn on async downloads, Druid enables these duties automatically.
These duties support the following properties:
|Retention period of query states and result files. Supports the ISO 8601 duration format.||no||PT60S|
|Duty group run period. Must be a form of ||no||PT30S|
Query execution limits
|Max number of active queries that can run concurrently. Druid queues any queries that exceed the active limit.||no||10|
|Max number of queries Druid can track, including actively running queries, queued queries, complete queries, and failed queries. Druid rejects any queries that exceed the limit. For example if you have 10 actively running queries, 15 queued queries, 20 complete queries, and 5 failed queries with ||no||50|
|Max number of queries to store in the queue. Druid rejects any queries that exceed the limit.||no||20|
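The way the three limits interact can be modeled with a small sketch. This is an illustrative model of the behavior described in the table, not Druid's actual implementation: the order of the checks is an assumption, and the class and field names are hypothetical stand-ins for the configuration properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_concurrent: int = 10  # default active-query limit from the table
    max_tracked: int = 50     # default tracking limit from the table
    max_queued: int = 20      # default queue limit from the table


def admit(running: int, queued: int, complete: int, failed: int,
          limits: Limits = Limits()) -> str:
    """Return 'run', 'queue', or 'reject' for one newly submitted query."""
    tracked = running + queued + complete + failed
    if tracked >= limits.max_tracked:
        return "reject"  # tracking limit reached
    if running < limits.max_concurrent:
        return "run"     # a concurrency slot is free
    if queued < limits.max_queued:
        return "queue"   # all slots busy, but the queue has room
    return "reject"      # queue is full
```

With the default limits, 10 running, 15 queued, 20 complete, and 5 failed queries add up to 50 tracked queries, so `admit(10, 15, 20, 5)` returns `'reject'` in this model.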
Submitting a query
To start a query, POST a request to the /druid/v2/sql/async/ endpoint in the same format as the SQL sync API.
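A submission request might be built as in the following sketch, which uses only the Python standard library. It assumes the request body is the JSON object with a `query` field that the SQL sync API accepts. The function name is hypothetical, and the Broker URL is a placeholder you supply.

```python
import json
import urllib.request


def build_async_submit_request(broker_url: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the async SQL endpoint."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/druid/v2/sql/async/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and parse the response body as JSON.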
When the API succeeds, it returns a JSON object with the following fields:
- asyncResultId: Unique ID for the query. This ID can be different from the queryId for native or SQL queries. Always present.
- state: Query state. Always present. Possible states:
  - INITIALIZED: The query has been set up but has not yet started running.
  - RUNNING: The query has started running. The time it takes for a query to reach RUNNING depends on the load state of the system.
  - COMPLETE: The query has finished and results are ready to fetch.
  - FAILED: The query has failed.
  - UNDETERMINED: The query state is unknown. Druid can return this state when it is aware of the query but cannot determine a valid query state. This is different from a query that is unknown to Druid.
- resultFormat: Result format for this query, taken from the list at SQL sync API. Present when state is COMPLETE, absent otherwise.
- resultLength: Size in bytes of the query results. Present when state is COMPLETE, absent otherwise.
- error: Druid query error object, as described at Query execution failures. Present when state is FAILED, absent otherwise.
The following HTTP status codes may be returned:
HTTP 202 if the query has been accepted. The returned query status object will be in state "INITIALIZED". You can use the Query Status API to check on its status.
HTTP 429 if the query has been rejected due to concurrency limits. Callers should retry the query after waiting an appropriate amount of time. Exponential backoff is encouraged. The returned query status object will be in state "FAILED". The Query Status API will return 404 for this async result ID.
HTTP 4xx (not 429) or 5xx if the query has been rejected for some other reason. The returned query status object will be in state "FAILED". The Query Status API will return 404 for this async result ID.
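The retry guidance for HTTP 429 can be sketched as follows. The `send` callable is a hypothetical stand-in for whatever code submits the query and returns the HTTP status code with the parsed status object; the attempt count and delays are illustrative values, not recommendations from Druid.

```python
import time
from typing import Callable, Tuple


def submit_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until the query is accepted, backing off on HTTP 429."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 202:
            return body  # accepted; body carries asyncResultId and state
        if status != 429:
            # Other 4xx or 5xx: the text only recommends retrying on 429.
            raise RuntimeError(f"query rejected with HTTP {status}: {body.get('error')}")
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # delay doubles on each retry
    raise RuntimeError("query still rejected with HTTP 429 after all attempts")
```

Injecting `sleep` keeps the function easy to test and lets callers add jitter if many clients retry at once.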
Getting query status
After a query has started, you can check its status using the following API.
When the API succeeds, it returns the same JSON object as described in the previous section.
A 404 error will be returned if the async result ID does not exist or if the async result ID has expired.
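A polling loop for the status API might look like the following sketch. The `get_status` callable is hypothetical: it stands in for code that calls the status API and returns the parsed status object, or `None` when the API returns 404. The poll interval and poll budget are illustrative values.

```python
import time
from typing import Callable, Optional

TERMINAL_STATES = {"COMPLETE", "FAILED"}


def wait_for_query(
    get_status: Callable[[], Optional[dict]],
    poll_interval: float = 2.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the query reaches COMPLETE or FAILED and return the status."""
    for _ in range(max_polls):
        status = get_status()
        if status is None:
            # Assumption: the caller maps HTTP 404 to None.
            raise LookupError("async result ID does not exist or has expired")
        if status["state"] in TERMINAL_STATES:
            return status
        sleep(poll_interval)  # INITIALIZED, RUNNING, or UNDETERMINED: keep waiting
    raise TimeoutError("query did not finish within the polling budget")
```

Because results have a limited retention period, a client would typically fetch them immediately after this function returns a status in the COMPLETE state.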
Getting query results
After a query has been completed, you can fetch the results. Attempting to fetch results before query completion returns a 404. This API should return the same results for the same query during the async download retention period, no matter how many times you call the API.
The file, if it is ready.
HTTP 404 if the async result ID does not exist, if the async result ID has expired, or if the async result ID is not in state
The following headers will be set on a successful response:
- Content-Length is set to the result file size.
- Content-Type is set based on the query resultFormat as described in SQL response.
- Content-Disposition is set to "attachment".
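Since Content-Length carries the result file size, a client can use it to detect a truncated download. The following sketch streams a response body to a file-like object and checks the byte count. The function name is hypothetical, and `headers` is assumed to be a mapping that contains the `Content-Length` key as shown.

```python
from typing import BinaryIO, Mapping


def save_results(response: BinaryIO, headers: Mapping[str, str],
                 out: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Stream a result body to `out` and verify its size against Content-Length."""
    expected = int(headers["Content-Length"])
    written = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break  # end of the response body
        out.write(chunk)
        written += len(chunk)
    if written != expected:
        raise IOError(f"expected {expected} bytes but received {written}")
    return written
```

Streaming in chunks avoids holding a large result file in memory, which matters because async downloads are intended for longer-running queries.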
async/result/tracked/count: number of queries tracked by Druid.
async/result/tracked/bytes: total results query size tracked by Druid.
async/sqlQuery/running/count: number of running queries.
async/sqlQuery/queued/count: number of queued queries.
async/sqlQuery/tracked/max: max number of queries tracked by Druid. Must be the same as
async/sqlQuery/running/max: max number of running queries. Must be the same as
async/sqlQuery/queued/max: max number of queued queries. Must be the same as
async/cleanup/result/removed/count: number of query result files successfully deleted in each coordinator run.
async/cleanup/result/failed/count: number of failed attempts to delete query results in each coordinator run.
async/cleanup/metadata/removed/count: number of query states successfully cleaned up in each coordinator run.
async/cleanup/metadata/failed/count: number of failed attempts to clean up query states in each coordinator run.
async/cleanup/metadata/skipped/count: number of query states that have not expired in each coordinator run.
- When the Broker shuts down gracefully, it marks all queries running on it as FAILED before it completely shuts down. However, if the Broker fails before it finishes updating the query status, the states of in-progress queries remain unchanged. In this case, you must manually clean up the queries left in invalid states.