Query data by API

After you ingest data into Imply Polaris, you can use the Query API to run queries against your data. You can also submit SQL queries using the SQL workbench in the UI. For information on how to write SQL queries, see Druid SQL overview.

This topic shows how to query data using the Query API.

info

Project-less regional API resources have been deprecated and will be removed by the end of September 2024. See Migrate to project-scoped URL for more information.

Prerequisites

You must have an API key with the AccessQueries permission. Querying from deep storage also requires the ManageIngestionJobs permission.

In the examples below, the key value is stored in the variable named POLARIS_API_KEY. For information about how to obtain an API key and assign permissions, see API key authentication. For more information on permissions, visit Permissions reference.

Determine query use case

Polaris stores data in both cache and deep storage. By default, Polaris caches all data to enable high concurrency and low latency queries. Cached data is pre-loaded into your project and counts towards the project's storage size.

For optimal performance, you should store your most frequently accessed data in the cache. To save costs and conserve resources, especially for less frequently accessed data, you can implement a cache policy that retains data in cache only within a specified time period. Polaris offloads cached data that is outside that period into deep storage. Offloaded data doesn't count towards the project’s storage size but continues to incur deep storage costs. Data that's not cached still remains in deep storage. For information on how to set a cache policy, see Offload data from cache.

Deep storage encompasses all data, including real-time, cached, and non-cached data. Polaris queries deep storage data asynchronously and loads it on demand, so expect performance to be slower compared to querying cached data. You can only query data in deep storage using the Query API.

Polaris offers two endpoints for querying data:

/query/sql provides query results directly in response to a request. Use this endpoint to query cached data.
/query/sql/statements doesn't automatically return the query results. The response includes a query ID that you use to check the query's status and retrieve the results. Use this endpoint to query data in deep storage.

Both types of queries also include results for real-time data. This is data that Polaris has ingested in a streaming ingestion job during approximately the past hour.

The following diagram summarizes data accessibility based on the query type:

Query type diagram

caution

If the time period in your cache policy does not encompass any of the data in the table, no data is cached. You will not be able to query any data in the table if no data is cached. Ensure your cache policy covers at least a portion of data in the table.

For information on cache policies, see Data lifecycle management.

Query cached data

tip

For optimal performance, cache the data you expect to query often, and query the data based on the examples in this section.

Use the /query/sql endpoint to query cached data. You also use this endpoint to query data ingested from a streaming source.

If you have an active cache policy, Polaris only keeps data cached when it's within a specified time period. Data outside this time period only resides in deep storage. To query data that's both cached and in deep storage, use the /query/sql/statements endpoint. For more information, see Query data in deep storage.

The examples in this section show how to query cached data.

Submit a query

To submit a Druid SQL query via the API, send a POST request along with a JSON body containing your query. Change the SQL statement in the query field to modify the query.

Sample request

The following example shows how to submit a query using the API:

cURL
Python

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data-raw '{
    "query": "SELECT continent, COUNT(*) AS counts FROM \"Koalas to the Max\" GROUP BY 1 ORDER BY counts DESC"
}'

import os
import requests
import json

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql"

apikey = os.getenv("POLARIS_API_KEY")

payload = json.dumps({
  "query": "SELECT continent, COUNT(*) AS counts FROM \"Koalas to the Max\" GROUP BY 1 ORDER BY counts DESC"
})
headers = {
  'Accept': 'application/json',
  'Content-Type': 'application/json',
  'Authorization': f'Basic {apikey}'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

Sample response

The following example shows a successful response containing the query result:

View the response

[
    {
        "continent": "North America",
        "counts": 241671
    },
    {
        "continent": "Europe",
        "counts": 169433
    },
    {
        "continent": "Oceania",
        "counts": 35905
    },
    {
        "continent": "Asia",
        "counts": 29902
    },
    {
        "continent": "South America",
        "counts": 25336
    },
    {
        "continent": "Africa",
        "counts": 2263
    },
    {
        "continent": "",
        "counts": 922
    }
]

Query and return CSV format

Specify resultFormat in the JSON object to set the format of the query results. You can return results as JSON objects or arrays or in CSV format.

For additional fields you can define for Query API request payloads, see Query API.

Sample request

The following example shows how to submit a query and return CSV-formatted results:

cURL
Python

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data-raw '{
    "query": "SELECT continent, COUNT(*) AS counts FROM \"Koalas to the Max\" GROUP BY 1 ORDER BY counts DESC",
    "resultFormat": "csv"
}'

import os
import requests
import json

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql"

apikey = os.getenv("POLARIS_API_KEY")

payload = json.dumps({
  "query": "SELECT continent, COUNT(*) AS counts FROM \"Koalas to the Max\" GROUP BY 1 ORDER BY counts DESC",
  "resultFormat": "csv"
})
headers = {
  'Accept': 'application/json',
  'Content-Type': 'application/json',
  'Authorization': f'Basic {apikey}'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

Sample response

The following example shows a successful response containing the query result in CSV format:

North America,241671
Europe,169433
Oceania,35905
Asia,29902
South America,25336
Africa,2263
,922

Query and save results to a file

To save the query results to a file, use the curl -o, --output option or the file handling capabilities of your application's programming language.

Sample request

The following example shows how to submit a query and save the results to a file named output.txt:

cURL
Python

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data-raw '{
    "query": "SELECT continent, COUNT(*) AS counts FROM \"Koalas to the Max\" GROUP BY 1 ORDER BY counts DESC"
}' \
--output output.txt

import os
import requests
import json

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql"

apikey = os.getenv("POLARIS_API_KEY")

payload = json.dumps({
  "query": "SELECT continent, COUNT(*) AS counts FROM \"Koalas to the Max\" GROUP BY 1 ORDER BY counts DESC"
})
headers = {
  'Accept': 'application/json',
  'Content-Type': 'application/json',
  'Authorization': f'Basic {apikey}'
}

response = requests.request("POST", url, headers=headers, data=payload)

with open("output.txt", "w") as f:
    f.write(response.text)

Sample response

The file output.txt contains the query results.

Query data in deep storage

info

This is a beta feature. Related APIs are enabled by default for all projects created after July 1, 2024.
If you are running a project created before that time, contact your Polaris support representative to enable access.

To query data that includes a subset stored in deep storage, you must use the /query/sql/statements endpoint. If you have data stored in cache as well as deep storage and you use the /query/sql endpoint, Polaris returns incomplete query results that only contain information from cached data.

Polaris queries data in deep storage asynchronously and loads it on demand. Asynchronous (async) queries are long running, high latency, and high throughput, in contrast to synchronous queries, which are short running (highly concurrent) and low latency. The main benefit of async queries is that they are very cost effective on infrequently accessed data. Async queries can run for longer periods of time on much larger datasets without affecting the rest of your low-latency application.

When you use the /query/sql/statements endpoint, Polaris doesn't return the query results in the same API response. The response includes a query ID that you use to check the query's status and retrieve the results.

The workflow for async queries is as follows:

Submit a query
Get query status
Get query results

Submit a query

To query data in cache and deep storage, send a POST request to /query/sql/statements with a JSON request body containing your query. The request payload supports the same properties as querying cached data. These types of queries are designed for longer-running use cases. The response includes a query ID that you use to get the query's status and results or cancel the query.

You can optionally pass the following request parameters:

page to fetch results based on page numbers. If not specified, all results are returned sequentially starting from page 0 to N in the same response.
resultFormat to defines the format of the results. The default is object. You can choose arrayLines,objectLines,array, and csv.

Sample request

The following example shows how to submit a request to query data in cache and deep storage:

cURL
Python

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data '{
    "query": "SELECT \"country\", AVG(\"session_length\") as \"avg_session_length\" FROM \"Koalas Cache\" GROUP BY \"country\""
}'

import os
import requests
import json

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements"

apikey = os.getenv("POLARIS_API_KEY")

payload = json.dumps({
  "query": "SELECT \"country\", AVG(\"session_length\") as \"avg_session_length\" FROM \"Koalas Cache\" GROUP BY \"country\""
})
headers = {
  'Content-Type': 'application/json',
  'Authorization': f'Basic {apikey}'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

Sample response

The following example shows a successful response. The response contains information on the state of the query but does not contain query results. Save the queryId to check the query status and, once finished, retrieve the query results.

{
    "createdAt": "2023-11-29T23:28:49.873Z",
    "durationMs": -1,
    "job": null,
    "queryId": "query-7fdc0212-b3b3-408f-b924-5d54539794d0",
    "result": null,
    "schema": [
        {
            "name": "country",
            "type": "VARCHAR",
            "nativeType": "STRING"
        },
        {
            "name": "avg_session_length",
            "type": "DOUBLE",
            "nativeType": "DOUBLE"
        }
    ],
    "state": "ACCEPTED"
}

Get query status

Send a GET request to /query/sql/statements/QUERY_ID to get the status of the query. Replace QUERY_ID with the value of queryId in the response to submit the query.

Sample request

The following example shows how to get the status of a request to query data in cache and deep storage:

cURL
Python

curl --location --request GET "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID" \
--header "Authorization: Basic $POLARIS_API_KEY"

import os
import requests

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID"

apikey = os.getenv("POLARIS_API_KEY")

headers = {
  'Authorization': f'Basic {apikey}'
}

response = requests.request("GET", url, headers=headers)

print(response.text)

Sample response

A successful request returns a 200 OK response code and information about the query:

View the response

{
    "createdAt": "2023-11-29T23:28:49.873Z",
    "durationMs": 41709,
    "job": null,
    "queryId": "query-7fdc0212-b3b3-408f-b924-5d54539794d0",
    "result": {
        "dataSource": "__query_select",
        "numTotalRows": 5,
        "pages": [
            {
                "id": 0,
                "numRows": 5,
                "sizeInBytes": 340
            }
        ],
        "resultFormat": null,
        "sampleRecords": [
            [
                null,
                30069.8
            ],
            [
                "Brazil",
                104299.0
            ],
            [
                "Canada",
                12621.0
            ],
            [
                "New Zealand",
                30901.5
            ],
            [
                "United States",
                20329.166666666668
            ]
        ],
        "totalSizeInBytes": 340
    },
    "schema": [
        {
            "name": "country",
            "type": "VARCHAR",
            "nativeType": "STRING"
        },
        {
            "name": "avg_session_length",
            "type": "DOUBLE",
            "nativeType": "DOUBLE"
        }
    ],
    "state": "SUCCESS"
}

Get query results

Send a GET request to /query/sql/statements/QUERY_ID/results to get the results of the query. Replace QUERY_ID with the value of queryId in the response to submit the query.

Sample request

The following example shows how to get the results from querying data in cache and deep storage:

cURL
Python

curl --location --request GET "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID/results" \
--header "Authorization: Basic $POLARIS_API_KEY"

import os
import requests

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID/results"

apikey = os.getenv("POLARIS_API_KEY")

headers = {
  'Authorization': f'Basic {apikey}'
}

response = requests.request("GET", url, headers=headers)

print(response.text)

Sample response

A successful request returns a 200 OK response code and the query results:

[
    {
        "country": null,
        "avg_session_length": 30069.8
    },
    {
        "country": "Brazil",
        "avg_session_length": 104299.0
    },
    {
        "country": "Canada",
        "avg_session_length": 12621.0
    },
    {
        "country": "New Zealand",
        "avg_session_length": 30901.5
    },
    {
        "country": "United States",
        "avg_session_length": 20329.166666666668
    }
]

Cancel a query

Send a DELETE request to /query/sql/statements/QUERY_ID to cancel the query. Replace QUERY_ID with the value of queryId in the response to submit the query.

Sample request

The following example shows how to cancel a request to query data in cache and deep storage:

cURL
Python

curl --location --request DELETE "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID" \
--header "Authorization: Basic $POLARIS_API_KEY"

import os
import requests

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID"

apikey = os.getenv("POLARIS_API_KEY")

headers = {
  'Authorization': f'Basic {apikey}'
}

response = requests.request("DELETE", url, headers=headers)

print(response.text)

Sample response

A successful response returns an HTTP 2xx status code without a response body. An error response returns an HTTP 404 status code with a JSON object that describes the error, such as an invalid query ID.

Learn more

See the following topics for more information:

Query data for querying data using the UI.
Query API for reference on the Query API.
Druid SQL documentation for reference on Druid SQL queries.
Query using JDBC for querying data over JDBC.
Set a storage policy by API for setting retention and cache policies using the API.

Prerequisites​

Determine query use case​

Query cached data​

Submit a query​

Sample request​

Sample response​

Query and return CSV format​

Sample request​

Sample response​

Query and save results to a file​

Sample request​

Sample response​

Query data in deep storage​

Submit a query​

Sample request​

Sample response​

Get query status​

Sample request​

Sample response​

Get query results​

Sample request​

Sample response​

Cancel a query​

Sample request​

Sample response​

Learn more​

Prerequisites

Determine query use case

Query cached data

Submit a query

Sample request

Sample response

Query and return CSV format

Sample request

Sample response

Query and save results to a file

Sample request

Sample response

Query data in deep storage

Submit a query

Sample request

Sample response

Get query status

Sample request

Sample response

Get query results

Sample request

Sample response

Cancel a query

Sample request

Sample response

Learn more