Query data in deep storage by API

info

This is a beta feature. Related APIs are enabled by default for all projects created after July 1, 2024.
If you are running a project created before that time, contact your Polaris support representative to enable access.

In Imply Polaris, deep storage encompasses all persisted data, both precached and noncached. To query data in deep storage, use the /query/sql/statements endpoint. It returns comprehensive results that also include real-time data. By contrast, the /query/sql endpoint returns results only from precached data, even when the data for your query time frame resides in both precache and deep storage.

Polaris queries data in deep storage asynchronously. Asynchronous (async) queries are long-running, high-latency, and high-throughput, whereas synchronous queries are short-running, low-latency, and highly concurrent. Async queries don't return the query results in the same API response. Instead, the response includes a query ID, which you use to check the query's status and retrieve the results. For more information on async queries, see Asynchronous query.

This topic shows you how to query deep storage data asynchronously using the Query API. To submit SQL queries using the SQL workbench in the UI, see Query data. For information on how to write SQL queries, see Druid SQL overview.

Prerequisites

To run async queries, you need the AccessQueries permission. For DML async queries that involve INSERT or REPLACE, you also need the ManageIngestionJobs permission. For more information on permissions, see Permissions reference.
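The permission rule above can be expressed as a small check before you submit a statement. This is a minimal sketch: the helper name `required_permissions` is hypothetical, and it only inspects the first keyword of the statement, so it does not account for leading SQL comments or other statement forms.

```python
def required_permissions(sql: str) -> set[str]:
    """Return the permissions needed to run `sql` as an async query."""
    # Every async query needs AccessQueries.
    permissions = {"AccessQueries"}
    # Simplifying assumption: the statement type is its first keyword.
    words = sql.split()
    first_keyword = words[0].upper() if words else ""
    # INSERT and REPLACE statements also need ManageIngestionJobs.
    if first_keyword in {"INSERT", "REPLACE"}:
        permissions.add("ManageIngestionJobs")
    return permissions


print(sorted(required_permissions('SELECT "country" FROM "Koalas Precache"')))
```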

In the examples below, the key value is stored in the variable named POLARIS_API_KEY. For information about how to obtain an API key and assign permissions, see API key authentication.

Submit a query

To query data in deep storage, send a POST request to /query/sql/statements with a JSON request body containing your query. The request payload supports the same properties as querying precached data.

You can optionally pass the following request parameters:

page to fetch results by page number. If not specified, the response returns all results sequentially, from page 0 to N.
resultFormat to define the format of the results. The default is object. Other options are array, arrayLines, objectLines, and csv.

info

Async queries support INSERT and REPLACE statements for SQL-based ingestion.

Sample request

The following example shows how to submit a request to query data in deep storage:

cURL

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data '{
    "query": "SELECT \"country\", AVG(\"session_length\") as \"avg_session_length\" FROM \"Koalas Precache\" GROUP BY \"country\""
}'

Python

import os
import requests
import json

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements"

apikey = os.getenv("POLARIS_API_KEY")

payload = json.dumps({
  "query": "SELECT \"country\", AVG(\"session_length\") as \"avg_session_length\" FROM \"Koalas Precache\" GROUP BY \"country\""
})
headers = {
  'Content-Type': 'application/json',
  'Authorization': f'Basic {apikey}'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

Sample response

The following example shows a successful response. The response contains information on the state of the query but does not contain query results. Save the queryId to check the query status and, once finished, retrieve the query results.

{
    "createdAt": "2023-11-29T23:28:49.873Z",
    "durationMs": -1,
    "job": null,
    "queryId": "query-7fdc0212-b3b3-408f-b924-5d54539794d0",
    "result": null,
    "schema": [
        {
            "name": "country",
            "type": "VARCHAR",
            "nativeType": "STRING"
        },
        {
            "name": "avg_session_length",
            "type": "DOUBLE",
            "nativeType": "DOUBLE"
        }
    ],
    "state": "ACCEPTED"
}
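Because the submit response carries only metadata, a client typically pulls out the query ID and state before doing anything else. The following is a minimal sketch; the function name `parse_submit_response` is hypothetical, and the sample string is a shortened copy of the response above.

```python
import json


def parse_submit_response(body: str) -> tuple[str, str]:
    """Return (queryId, state) from the JSON body of a submit response."""
    data = json.loads(body)
    return data["queryId"], data["state"]


# Shortened copy of the sample response shown above.
sample = (
    '{"queryId": "query-7fdc0212-b3b3-408f-b924-5d54539794d0", '
    '"state": "ACCEPTED", "result": null}'
)
query_id, state = parse_submit_response(sample)
print(query_id, state)
```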

Get query status

Send a GET request to /query/sql/statements/QUERY_ID to get the status of the async query. Replace QUERY_ID with the queryId value returned in the response when you submitted the query.

To get the status of INSERT and REPLACE async queries, use the Jobs v1 API. For an example, see View and manage jobs by API.

Sample request

The following example shows how to get the status of a request to query data in deep storage:

cURL

curl --location --request GET "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID" \
--header "Authorization: Basic $POLARIS_API_KEY"

Python

import os
import requests

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID"

apikey = os.getenv("POLARIS_API_KEY")

headers = {
  'Authorization': f'Basic {apikey}'
}

response = requests.request("GET", url, headers=headers)

print(response.text)

Sample response

A successful request returns a 200 OK response code and information about the query:

{
    "createdAt": "2023-11-29T23:28:49.873Z",
    "durationMs": 41709,
    "job": null,
    "queryId": "query-7fdc0212-b3b3-408f-b924-5d54539794d0",
    "result": {
        "dataSource": "__query_select",
        "numTotalRows": 5,
        "pages": [
            {
                "id": 0,
                "numRows": 5,
                "sizeInBytes": 340
            }
        ],
        "resultFormat": null,
        "sampleRecords": [
            [
                null,
                30069.8
            ],
            [
                "Brazil",
                104299.0
            ],
            [
                "Canada",
                12621.0
            ],
            [
                "New Zealand",
                30901.5
            ],
            [
                "United States",
                20329.166666666668
            ]
        ],
        "totalSizeInBytes": 340
    },
    "schema": [
        {
            "name": "country",
            "type": "VARCHAR",
            "nativeType": "STRING"
        },
        {
            "name": "avg_session_length",
            "type": "DOUBLE",
            "nativeType": "DOUBLE"
        }
    ],
    "state": "SUCCESS"
}
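Since a query moves from ACCEPTED to SUCCESS over time, clients usually poll the status endpoint until the state stops changing. The sketch below shows one way to do that. It rests on assumptions: the name `wait_for_query` and its parameters are hypothetical, and FAILED is assumed to be a terminal state alongside the SUCCESS state shown above. The status lookup is passed in as a function so the loop itself makes no network calls.

```python
import time
from typing import Callable

# SUCCESS appears in the sample response above.
# FAILED is an assumed terminal state, not confirmed by this topic.
TERMINAL_STATES = {"SUCCESS", "FAILED"}


def wait_for_query(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    max_attempts: int = 60,
) -> dict:
    """Call fetch_status until the query reaches a terminal state."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        # Pause between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"query not finished after {max_attempts} status checks")
```

In practice, `fetch_status` could wrap the GET request from the sample above, for example `lambda: requests.get(url, headers=headers).json()`.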

Get query results

Send a GET request to /query/sql/statements/QUERY_ID/results to get the results of the query. Replace QUERY_ID with the queryId value returned in the response when you submitted the query.

This request doesn’t return results when the async query performs an ingestion job, that is, a query starting with INSERT or REPLACE. To get the status of the ingestion job, see Get job status.

Sample request

The following example shows how to get the results of an async query:

cURL

curl --location --request GET "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID/results" \
--header "Authorization: Basic $POLARIS_API_KEY"

Python

import os
import requests

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID/results"

apikey = os.getenv("POLARIS_API_KEY")

headers = {
  'Authorization': f'Basic {apikey}'
}

response = requests.request("GET", url, headers=headers)

print(response.text)

Sample response

A successful request returns a 200 OK response code and the query results:

[
    {
        "country": null,
        "avg_session_length": 30069.8
    },
    {
        "country": "Brazil",
        "avg_session_length": 104299.0
    },
    {
        "country": "Canada",
        "avg_session_length": 12621.0
    },
    {
        "country": "New Zealand",
        "avg_session_length": 30901.5
    },
    {
        "country": "United States",
        "avg_session_length": 20329.166666666668
    }
]
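For large result sets, you may prefer to fetch one page at a time using the page and resultFormat parameters described under Submit a query. The sketch below builds one results URL per page from the pages array in the status response. It assumes that both parameters are passed as URL query-string parameters on the results request, which this topic does not state explicitly. The function name `page_result_urls` is hypothetical.

```python
from urllib.parse import urlencode


def page_result_urls(
    results_url: str, status: dict, result_format: str = "object"
) -> list[str]:
    """Build one results URL per page listed in a status response."""
    # "result" is null until the query succeeds, so fall back to empty values.
    result = status.get("result") or {}
    pages = result.get("pages") or []
    urls = []
    for page in pages:
        # Assumption: page and resultFormat go in the query string.
        params = urlencode({"page": page["id"], "resultFormat": result_format})
        urls.append(f"{results_url}?{params}")
    return urls


# Trimmed copy of the status response from the Get query status section.
status = {"result": {"pages": [{"id": 0, "numRows": 5, "sizeInBytes": 340}]}}
print(page_result_urls("https://example.invalid/results", status, "csv"))
```

Each URL can then be requested with the same Authorization header as the samples above.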

Cancel a query

Send a DELETE request to /query/sql/statements/QUERY_ID to cancel the query. Replace QUERY_ID with the queryId value returned in the response when you submitted the query.

For INSERT and REPLACE async queries, use the Jobs v1 API. For more information, see View and manage jobs by API.

Sample request

The following example shows how to cancel an async query:

cURL

curl --location --request DELETE "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID" \
--header "Authorization: Basic $POLARIS_API_KEY"

Python

import os
import requests

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID"

apikey = os.getenv("POLARIS_API_KEY")

headers = {
  'Authorization': f'Basic {apikey}'
}

response = requests.request("DELETE", url, headers=headers)

print(response.text)

Sample response

A successful response returns an HTTP 2xx status code without a response body. An error response returns an HTTP 404 status code with a JSON object that describes the error, such as an invalid query ID.
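Because a successful cancellation has no body, the status code is the only signal to check. The following sketch interprets a cancel response along those lines. The function name `interpret_cancel_response` and the shape of its return value are hypothetical, and the error body is treated as opaque JSON because this topic does not describe its fields.

```python
import json


def interpret_cancel_response(status_code: int, body: str = "") -> dict:
    """Summarize the outcome of a DELETE request that cancels a query."""
    # Any 2xx status code means the cancellation succeeded.
    if 200 <= status_code < 300:
        return {"canceled": True, "error": None}
    # Otherwise, keep whatever error detail the response carried.
    try:
        error = json.loads(body) if body else None
    except json.JSONDecodeError:
        error = {"raw": body}
    return {"canceled": False, "error": error}


print(interpret_cancel_response(202))
```

With the requests library, you could call it as `interpret_cancel_response(response.status_code, response.text)`.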

Learn more

See the following topics for more information:

Query data for querying data using the UI.
Query data by API for an overview of querying data using the API.
Query API for reference on the Query API.
Create and manage projects: Get project details to view project storage usage by API.
Query precached data for querying precached data using the API.
Druid SQL documentation for reference on Druid SQL queries.
Set a storage policy by API for setting retention and precache policies using the API.
