Query data by API
After you ingest data into Imply Polaris, you can use the Query API to run queries against your data. For information on how to write SQL queries, see Druid SQL overview.
The Query API offers two endpoints for querying data:
- /query/sql submits synchronous queries against precached data. This endpoint returns query results in the response. For more information, see Query precached data.
- /query/sql/statements submits asynchronous queries against data in deep storage. This endpoint doesn't return query results in the response. Instead, the response includes a query ID to use when checking the query status and retrieving the results. Because queries against deep storage data run asynchronously, performance might be slower compared to querying precached data. For more information, see Query data in deep storage.
Both types of queries include results for real-time data that a streaming job ingested within approximately the past hour.
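The following Python sketch shows the request flow for both endpoints. It is not an official client, and several details are assumptions modeled on the Druid SQL and SQL statements APIs: the base URL layout, HTTP Basic authentication with the API key as the username, the {"query": ...} request body, the queryId and state response fields, the state values, and the /results path. Confirm each of these against the Query API reference before relying on them. BASE_URL, API_KEY, build_request, send_json, sync_query_request, and run_async_query are hypothetical names, and result pagination is not handled.

```python
import base64
import json
import time
import urllib.request
from typing import Any, Callable, Optional

# Hypothetical placeholders; the URL layout is an assumption, not confirmed by this page.
BASE_URL = "https://ORG.REGION.CLOUD.api.imply.io/v1/projects/PROJECT_ID"
API_KEY = "YOUR_API_KEY"


def build_request(path: str, payload: Optional[dict] = None,
                  method: str = "POST") -> urllib.request.Request:
    """Build (but don't send) an authenticated request to BASE_URL + path."""
    # Assumption: the API key is sent as the HTTP Basic username with an empty password.
    token = base64.b64encode(f"{API_KEY}:".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode()
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def send_json(req: urllib.request.Request) -> Any:
    """Send a request over the network and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def sync_query_request(sql: str) -> urllib.request.Request:
    # Synchronous: the query results come back in this request's response.
    return build_request("/query/sql", {"query": sql})


def run_async_query(sql: str,
                    send: Callable[[urllib.request.Request], Any] = send_json,
                    poll_seconds: float = 5.0) -> Any:
    """Submit an asynchronous query, poll its status by query ID, then fetch results."""
    submitted = send(build_request("/query/sql/statements", {"query": sql}))
    query_id = submitted["queryId"]  # assumed response field name
    status_path = f"/query/sql/statements/{query_id}"
    while True:
        state = send(build_request(status_path, method="GET"))["state"]
        # Assumed terminal states, modeled on the Druid SQL statements API.
        if state == "SUCCESS":
            break
        if state in ("FAILED", "CANCELED"):
            raise RuntimeError(f"query {query_id} ended in state {state}")
        time.sleep(poll_seconds)
    return send(build_request(status_path + "/results", method="GET"))
```

For example, send_json(sync_query_request("SELECT COUNT(*) FROM my_table")) would run a synchronous query against a hypothetical table my_table. run_async_query(sql) would submit an asynchronous query and block until it reaches a terminal state. The transport is passed in as the send argument, so you can test the polling logic with a fake function that never touches the network.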
The following diagram summarizes data accessibility based on the query type:
The following table highlights the differences between synchronous and asynchronous queries in Polaris:
Feature | Synchronous query | Asynchronous query |
---|---|---|
SQL queries supported | SELECT | SELECT, INSERT, REPLACE |
Permissions required | AccessQueries | AccessQueries and ManageIngestionJobs |
Data range covered | Real-time and precached data | Real-time, precached, and noncached data |
Query latency | Seconds | Minutes to hours depending on query complexity |
Max runtime | One minute | Hours to days |
Best use case | Low-latency interactive queries | Scheduled reports, data downloads and exports, long-running complex queries |
Cost | Included in project billing, bound by project capacity | Billed per query based on concurrency factor, auto-scaled on demand |
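The constraints in the table can be expressed as a small routing function. This is a hypothetical helper, not part of the Query API. QueryPlan and choose_endpoint are illustrative names. The function reads only the first keyword of the statement, so a query that starts with WITH is rejected. The 60-second threshold mirrors the one-minute maximum runtime for synchronous queries.

```python
from dataclasses import dataclass
from typing import Tuple

# Statement types each endpoint accepts, per the comparison table.
SYNC_STATEMENTS = {"SELECT"}
ASYNC_STATEMENTS = {"SELECT", "INSERT", "REPLACE"}
SYNC_MAX_RUNTIME_S = 60  # one-minute maximum runtime for synchronous queries


@dataclass(frozen=True)
class QueryPlan:
    endpoint: str
    required_permissions: Tuple[str, ...]


def choose_endpoint(sql: str, needs_noncached_data: bool = False,
                    expected_runtime_s: float = 0.0) -> QueryPlan:
    """Pick the synchronous endpoint when the table allows it, else the asynchronous one."""
    words = sql.split()
    # Naive: only the first keyword is checked, so WITH ... SELECT is rejected.
    keyword = words[0].upper() if words else ""
    if keyword not in ASYNC_STATEMENTS:
        raise ValueError(f"unsupported statement type: {keyword!r}")
    # Synchronous queries cover SELECT over real-time and precached data only.
    if (keyword in SYNC_STATEMENTS and not needs_noncached_data
            and expected_runtime_s <= SYNC_MAX_RUNTIME_S):
        return QueryPlan("/query/sql", ("AccessQueries",))
    return QueryPlan("/query/sql/statements",
                     ("AccessQueries", "ManageIngestionJobs"))
```

The function prefers the synchronous endpoint and falls back to the asynchronous one only when a constraint in the table rules out a synchronous query. This matches the table's guidance that synchronous queries suit low-latency interactive use.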
Data management for optimal query performance
By default, Polaris precaches all data to support high-concurrency, low-latency queries. Precached data is preloaded into your project and counts towards the project's storage size.
For optimal performance, keep your most frequently accessed data in the precache. To save costs and conserve resources, you can set a precache policy that keeps only data within a specified time period in the precache. Less frequently accessed data outside that period is then offloaded. Data offloaded from the precache can only be accessed from deep storage. Offloaded data doesn't count towards the project's storage size but continues to incur deep storage costs. For information on precache policies, see Data lifecycle management.
If the time period in your precache policy doesn't include any of the table's data, no data is precached, and you can't query any data in the table. Make sure your precache policy covers at least some of the data in the table.
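As a rough pre-flight check, the following sketch tests whether a precache window overlaps a table's time range. It assumes the policy is a trailing period that ends at the current time, such as the last 30 days. It also assumes you already know the earliest and latest timestamps in the table. precache_covers_data is a hypothetical helper, and the actual policy semantics in Polaris might differ.

```python
from datetime import datetime, timedelta, timezone


def precache_covers_data(period: timedelta, data_start: datetime,
                         data_end: datetime, now: datetime) -> bool:
    """True if the trailing window [now - period, now] overlaps [data_start, data_end]."""
    window_start = now - period
    # Two closed intervals overlap when each one starts no later than the other ends.
    return window_start <= data_end and data_start <= now


# Example: a table holding only 2023 data, checked on 2024-06-01 (UTC).
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
start = datetime(2023, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 12, 31, tzinfo=timezone.utc)
print(precache_covers_data(timedelta(days=30), start, end, now))   # False: nothing precached
print(precache_covers_data(timedelta(days=365), start, end, now))  # True
```

In this example, a 30-day policy would precache none of the table's data, so the table would become unqueryable. A 365-day policy still covers part of 2023.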
Learn more
See the following topics for more information:
- Query data for submitting queries using the SQL workbench in the UI.
- Query API for reference on the Query API.
- Create and manage projects: Get project details to view project storage usage by API.
- Query precached data for querying precached data by API.
- Query data in deep storage for querying deep storage data by API.
- Druid SQL documentation for reference on Druid SQL queries.
- Set a storage policy by API for setting retention and precache policies using the API.