Asynchronous query
This is a beta feature. Related APIs are enabled by default for all projects created after July 1, 2024. If you are running a project created before that time, contact your Polaris support representative to enable access.
To disable the async query feature, contact your Polaris support representative.
Imply Polaris stores data in both precache and deep storage. Deep storage encompasses all persisted data, including precached and noncached data. Precached data refers to a subset of your data that is stored in deep storage but is also loaded into a caching layer.
Polaris queries data in deep storage asynchronously. Asynchronous (async) queries are long running, high latency, and high throughput. They can run for longer periods of time on much larger datasets without affecting the rest of the application. Async queries are more cost effective on infrequently accessed data than synchronous (sync) queries.
To run async queries, you need the AccessQueries and ManageIngestionJobs permissions.
For more information on permissions, see Permissions reference.
Common use cases for executing an async query include:
- Retaining a significantly larger amount of historical data without increasing the project size.
- Running long-duration queries that would typically time out on a regular query engine.
- Executing complex queries, such as large joins (often referred to as fact-to-fact joins).
- Downloading high volumes of data. Polaris considers more than 100 MB a high volume for a single query.
The async query feature doesn’t support the direct export of query results to deep storage.
Execute an async query
Use the /query/sql/statements endpoint of the Query API to execute async queries on data in deep storage.
The /query/sql/statements endpoint doesn't return the query results in the same API response. Instead, the response includes a query ID, which you use to check the query's status and retrieve the results.
For more information and examples, see Query data in deep storage.
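The submit, poll, and fetch flow described above can be sketched in Python as follows. This is a minimal illustration, not the definitive client. Only the /query/sql/statements path and the presence of a query ID in the response come from this page. The base URL shape, the Basic authorization header, the queryId and state field names, the SUCCESS and FAILED state values, and the /results sub-path are all assumptions to check against the Query API reference. BASE_URL and API_KEY are hypothetical placeholders.

```python
import json
import time
import urllib.request

# Hypothetical placeholders: substitute your own project URL and API key.
BASE_URL = "https://ORGANIZATION.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID"
API_KEY = "YOUR_API_KEY"


def http_send(method, url, body=None):
    """Send a JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Basic {API_KEY}")  # assumed auth scheme
    request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_async_query(sql, send=http_send, base_url=BASE_URL,
                    poll_seconds=5.0, max_polls=120):
    """Submit an async query, poll until it finishes, and return the results."""
    statements_url = f"{base_url}/query/sql/statements"

    # Step 1: submit the query. The response carries a query ID, not the results.
    submitted = send("POST", statements_url, {"query": sql})
    query_id = submitted["queryId"]  # assumed field name

    # Step 2: poll the status until the query succeeds or fails.
    for _ in range(max_polls):
        status = send("GET", f"{statements_url}/{query_id}")
        state = status.get("state")  # assumed field name and values
        if state == "SUCCESS":
            # Step 3: fetch the results with a separate request.
            return send("GET", f"{statements_url}/{query_id}/results")
        if state == "FAILED":
            raise RuntimeError(f"Async query {query_id} failed: {status}")
        time.sleep(poll_seconds)

    raise TimeoutError(f"Async query {query_id} did not finish after {max_polls} polls")
```

The send parameter is injectable so that you can exercise the polling logic without network access, or swap in an HTTP library of your choice. A call such as run_async_query('SELECT COUNT(*) FROM "example_table"') would use the default sender, where example_table is a hypothetical table name.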
When you query data asynchronously, be aware that you can potentially process a large amount of data, resulting in high usage costs.
This is because async queries include real-time, precached, and noncached data.
For instance, an async SELECT COUNT(*) query may return a higher count than the equivalent sync query.
We recommend limiting the time ranges of your queries to control how much data they process.
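One way to apply the time-range recommendation is to build the filter into the SQL text before submitting it. The sketch below assumes the table's primary timestamp column is named __time and that the query engine accepts TIMESTAMP literals in this format. The helper name bounded_count_query and the table name events are hypothetical.

```python
from datetime import datetime, timedelta


def bounded_count_query(table, end, days):
    """Build a COUNT(*) query restricted to the `days` days before `end`."""
    start = end - timedelta(days=days)
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        f'SELECT COUNT(*) AS "cnt" FROM "{table}" '
        f"WHERE \"__time\" >= TIMESTAMP '{start.strftime(fmt)}' "
        f"AND \"__time\" < TIMESTAMP '{end.strftime(fmt)}'"
    )


# Example: count rows in a hypothetical "events" table for the week ending 2024-07-08.
sql = bounded_count_query("events", datetime(2024, 7, 8), days=7)
print(sql)
```

A half-open interval (inclusive start, exclusive end) avoids counting rows at the boundary twice when you run the query over consecutive windows.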
Learn more
See the following topics for more information:
- Query API for reference on the Query API.
- Query data by API for querying data using the Query API.
- Create and manage projects: Get project details to view the amount of data only in deep storage using the API.
- Downloads from deep storage for downloading data stored in deep storage.
- Data lifecycle management for offloading data from precache.
- Set a storage policy by API for setting retention and precache policies using the API.
- Billing overview for async query billing.