Query data by API
Project-less regional API resources have been deprecated and will be removed by the end of September 2024.
You must include the project ID in the URL for all regional API calls in projects created after September 29, 2023.
For example: https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID
Projects created before September 29, 2023 can continue to use project-less URLs until the end of September 2024. We strongly recommend updating your regional API calls to include the project ID prior to September 2024. See the API migration guide for more information.
After you ingest data into Imply Polaris, you can use the Query API to run queries against your data. You can also submit SQL queries using the SQL workbench in the UI. For information on how to write SQL queries, see Druid SQL overview.
This topic shows how to query data in a table using the Polaris API.
Prerequisites
This topic assumes you have an API key with the AccessQueries
permission.
Querying from deep storage also requires the ManageIngestionJobs
permission.
In the examples below, the key value is stored in the variable named POLARIS_API_KEY
.
For information about how to obtain an API key and assign permissions, see API key authentication.
For more information on permissions, visit Permissions reference.
Determine query use case
Polaris offers two endpoints to query your data, /query/sql
and /query/sql/statement
.
This section offers guidance to determine which endpoint to choose, depending on your cache policy and use case.
For optimal performance, ensure that regularly accessed data is stored in cache.
The default behavior in Polaris is to keep all data cached.
To query cached data, use /query/sql
.
Polaris returns query results directly in response to the query request.
To save costs and conserve resources, such as when running longer reporting-style queries,
you can offload data from the cache when the data is outside a certain time period.
Data that's not cached still remains in deep storage.
To query cached data as well as data that's not cached but in deep storage, use /query/sql/statements
.
Polaris does not immediately return the query results.
The response includes a query ID that you use to check the query's status and get its results.
Both types of queries also include results for real-time data, or data that has been ingested in a streaming ingestion job from approximately the past hour.
The following diagram summarizes data accessibility based on the query type:
If the time period in your cache policy does not encompass any of the data in the table, no data is cached. You will not be able to query any data in the table if no data is cached. Ensure your cache policy covers at least a portion of data in the table.
For information on cache policies, see Data lifecycle management.
Query cached data
For optimal performance, cache the data you expect to query often, and query the data based on the examples in this section.
Polaris keeps all data cached by default for high concurrency and low latency workloads.
Use the /query/sql
endpoint to query cached data.
You also use this endpoint to query data ingested from a streaming source.
If you have an active cache policy,
Polaris only keeps data cached when it's within a specified time period.
Data outside this time period only resides in deep storage.
To query data that's both cached and in deep storage, use the /query/sql/statements
endpoint.
For more information, see Query data in cache and deep storage.
The examples in this section show how to query cached data.
Submit a query
To submit a Druid SQL query via the API, send a POST
request along with a JSON body containing your query.
Change the SQL statement in the query
field to modify the query.
Sample request
The following example shows how to submit a query using the API:
- cURL
- Python
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql" \
--user ${POLARIS_API_KEY}: \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data-raw '{
"query": "SELECT continent, COUNT(*) AS counts FROM \"Koalas to the Max\" GROUP BY 1 ORDER BY counts DESC"
}'
import os
import requests
import json
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql"
apikey = os.getenv("POLARIS_API_KEY")
payload = json.dumps({
"query": "SELECT continent, COUNT(*) AS counts FROM \"Koalas to the Max\" GROUP BY 1 ORDER BY counts DESC"
})
headers = {
'Accept': 'application/json',
'Content-Type': 'application/json',
'Authorization': f'Basic {apikey}'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Sample response
The following example shows a successful response containing the query result:
View the response
[
{
"continent": "North America",
"counts": 241671
},
{
"continent": "Europe",
"counts": 169433
},
{
"continent": "Oceania",
"counts": 35905
},
{
"continent": "Asia",
"counts": 29902
},
{
"continent": "South America",
"counts": 25336
},
{
"continent": "Africa",
"counts": 2263
},
{
"continent": "",
"counts": 922
}
]
Query and return CSV format
Specify resultFormat
in the JSON object to set the format of the query results.
You can return results as JSON objects or arrays or in CSV format.
For additional fields you can define for Query API request payloads, see Query API.
Sample request
The following example shows how to submit a query and return CSV-formatted results:
- cURL
- Python
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql" \
--user ${POLARIS_API_KEY}: \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data-raw '{
"query": "SELECT continent, COUNT(*) AS counts FROM \"Koalas to the Max\" GROUP BY 1 ORDER BY counts DESC",
"resultFormat": "csv"
}'
import os
import requests
import json
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql"
apikey = os.getenv("POLARIS_API_KEY")
payload = json.dumps({
"query": "SELECT continent, COUNT(*) AS counts FROM \"Koalas to the Max\" GROUP BY 1 ORDER BY counts DESC",
"resultFormat": "csv"
})
headers = {
'Accept': 'application/json',
'Content-Type': 'application/json',
'Authorization': f'Basic {apikey}'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Sample response
The following example shows a successful response containing the query result in CSV format:
North America,241671
Europe,169433
Oceania,35905
Asia,29902
South America,25336
Africa,2263
,922
Query and save results to a file
To save the query results to a file, use the curl -o, --output
option or the file handling capabilities of your application's programming language.
Sample request
The following example shows how to submit a query and save the results to a file named output.txt
:
- cURL
- Python
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql" \
--user ${POLARIS_API_KEY}: \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data-raw '{
"query": "SELECT continent, COUNT(*) AS counts FROM \"Koalas to the Max\" GROUP BY 1 ORDER BY counts DESC"
}' \
--output output.txt
import os
import requests
import json
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql"
apikey = os.getenv("POLARIS_API_KEY")
payload = json.dumps({
"query": "SELECT continent, COUNT(*) AS counts FROM \"Koalas to the Max\" GROUP BY 1 ORDER BY counts DESC"
})
headers = {
'Accept': 'application/json',
'Content-Type': 'application/json',
'Authorization': f'Basic {apikey}'
}
response = requests.request("POST", url, headers=headers, data=payload)
with open("output.txt", "w") as f:
f.write(response.text)
Sample response
The file output.txt
contains the query results.
Query data in cache and deep storage
This is a beta feature available to select customers. Imply must enable the feature for you. Contact your Polaris support representative to find out more.
The queries in this section run asynchronously and load the data on demand. These queries are designed for downloads and reporting purposes. To ensure high performance, do not use these queries as the primary mode of querying.
Use the /query/sql/statements
endpoint to query data that resides in deep storage.
Data in deep storage is a superset of all data, including cached data as well as data
not loaded into cache.
If you have data stored in cache as well as deep storage and you use the /query/sql
endpoint,
your query results will be incomplete, only containing the results for cached data.
For details on setting a cache policy to manage data caching, see Offload data from cache.
Submit a query
To query data in cache and deep storage, send a POST
request to /query/sql/statements
with
a JSON request body containing your query. The request payload supports the same properties as querying cached data.
These types of queries are designed for longer-running use cases, and Polaris does not immediately return the query results.
The response includes a query ID that you use to get the query's status and results or cancel the query.
Sample request
The following example shows how to submit a request to query data in cache and deep storage:
- cURL
- Python
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements" \
--user ${POLARIS_API_KEY}: \
--header 'Content-Type: application/json' \
--data '{
"query": "SELECT \"country\", AVG(\"session_length\") as \"avg_session_length\" FROM \"Koalas Cache\" GROUP BY \"country\""
}'
import os
import requests
import json
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements"
apikey = os.getenv("POLARIS_API_KEY")
payload = json.dumps({
"query": "SELECT \"country\", AVG(\"session_length\") as \"avg_session_length\" FROM \"Koalas Cache\" GROUP BY \"country\""
})
headers = {
'Content-Type': 'application/json',
'Authorization': f'Basic {apikey}'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Sample response
The following example shows a successful response.
The response contains information on the state of the query but does not contain query results.
Save the queryId
to check the query status and, once finished, retrieve the query results.
{
"createdAt": "2023-11-29T23:28:49.873Z",
"durationMs": -1,
"job": null,
"queryId": "query-7fdc0212-b3b3-408f-b924-5d54539794d0",
"result": null,
"schema": [
{
"name": "country",
"type": "VARCHAR",
"nativeType": "STRING"
},
{
"name": "avg_session_length",
"type": "DOUBLE",
"nativeType": "DOUBLE"
}
],
"state": "ACCEPTED"
}
Get query status
Send a GET
request to /query/sql/statements/QUERY_ID
to get the status of the query.
Replace QUERY_ID
with the value of queryId
in the response to submit the query.
Sample request
The following example shows how to get the status of a request to query data in cache and deep storage:
- cURL
- Python
curl --location --request GET "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID" \
--user ${POLARIS_API_KEY}:
import os
import requests
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID"
apikey = os.getenv("POLARIS_API_KEY")
headers = {
'Authorization': f'Basic {apikey}'
}
response = requests.request("GET", url, headers=headers)
print(response.text)
Sample response
A successful request returns a 200 OK
response code and information about the query:
View the response
{
"createdAt": "2023-11-29T23:28:49.873Z",
"durationMs": 41709,
"job": null,
"queryId": "query-7fdc0212-b3b3-408f-b924-5d54539794d0",
"result": {
"dataSource": "__query_select",
"numTotalRows": 5,
"pages": [
{
"id": 0,
"numRows": 5,
"sizeInBytes": 340
}
],
"resultFormat": null,
"sampleRecords": [
[
null,
30069.8
],
[
"Brazil",
104299.0
],
[
"Canada",
12621.0
],
[
"New Zealand",
30901.5
],
[
"United States",
20329.166666666668
]
],
"totalSizeInBytes": 340
},
"schema": [
{
"name": "country",
"type": "VARCHAR",
"nativeType": "STRING"
},
{
"name": "avg_session_length",
"type": "DOUBLE",
"nativeType": "DOUBLE"
}
],
"state": "SUCCESS"
}
Get query results
Send a GET
request to /query/sql/statements/QUERY_ID/results
to get the results of the query.
Replace QUERY_ID
with the value of queryId
in the response to submit the query.
Sample request
The following example shows how to get the results from querying data in cache and deep storage:
- cURL
- Python
curl --location --request GET "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID/results" \
--user ${POLARIS_API_KEY}:
import os
import requests
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID/results"
apikey = os.getenv("POLARIS_API_KEY")
headers = {
'Authorization': f'Basic {apikey}'
}
response = requests.request("GET", url, headers=headers)
print(response.text)
Sample response
A successful request returns a 200 OK
response code and the query results:
[
{
"country": null,
"avg_session_length": 30069.8
},
{
"country": "Brazil",
"avg_session_length": 104299.0
},
{
"country": "Canada",
"avg_session_length": 12621.0
},
{
"country": "New Zealand",
"avg_session_length": 30901.5
},
{
"country": "United States",
"avg_session_length": 20329.166666666668
}
]
Cancel a query
Send a DELETE
request to /query/sql/statements/QUERY_ID
to cancel the query.
Replace QUERY_ID
with the value of queryId
in the response to submit the query.
Sample request
The following example shows how to cancel a request to query data in cache and deep storage:
- cURL
- Python
curl --location --request DELETE "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID" \
--user ${POLARIS_API_KEY}:
import os
import requests
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql/statements/QUERY_ID"
apikey = os.getenv("POLARIS_API_KEY")
headers = {
'Authorization': f'Basic {apikey}'
}
response = requests.request("DELETE", url, headers=headers)
print(response.text)
Sample response
A successful response returns an HTTP 2xx
status code without a response body.
An error response returns an HTTP 404
status code with a JSON object that describes the error, such as an invalid query ID.
Learn more
See the following topics for more information:
- Query data for querying data using the UI.
- Query API for reference on the Query API.
- Druid SQL documentation for reference on Druid SQL queries.
- Query using JDBC for querying data over JDBC.
- Set a storage policy by API for setting retention and cache policies using the API.