Query data by API

After you ingest data into Imply Polaris, you can use the Query API to run queries against your data. For information on how to write SQL queries, see Druid SQL overview.

The Query API offers two endpoints for querying data:

/query/sql submits synchronous queries against precached data. This endpoint returns query results in the response. For more information, see Query precached data.
/query/sql/statements submits asynchronous queries against data in deep storage. This endpoint doesn't return query results in the response. Instead, the response includes a query ID to use when checking the query status and retrieving the results. Because queries against deep storage data run asynchronously, performance might be slower compared to querying precached data. For more information, see Query data in deep storage.
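The difference between the two endpoints can be sketched in Python. This is a minimal illustration, not the official client: the base URL, the `Basic` authorization scheme, the `{"query": ...}` payload shape, and the `ACCEPTED` and `RUNNING` state names are all assumptions, so check them against the Query API reference before relying on them. The `build_query_request` and `wait_for_query` helpers are hypothetical names introduced for this example.

```python
import json
import time
import urllib.request

# Hypothetical placeholder; substitute your own organization's API base URL.
BASE_URL = "https://example.invalid/v1"


def build_query_request(project_id, api_key, sql, asynchronous=False):
    """Build (but do not send) a POST request for one of the two endpoints."""
    path = "/query/sql/statements" if asynchronous else "/query/sql"
    url = f"{BASE_URL}/projects/{project_id}{path}"
    body = json.dumps({"query": sql}).encode("utf-8")  # assumed payload shape
    headers = {
        "Authorization": f"Basic {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def wait_for_query(get_status, query_id, poll_seconds=5.0, max_polls=120):
    """Poll an asynchronous query until it leaves the assumed in-progress states.

    get_status is any callable that takes a query ID and returns a dict
    containing a "state" key (an assumed field name).
    """
    for _ in range(max_polls):
        status = get_status(query_id)
        if status["state"] not in ("ACCEPTED", "RUNNING"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} still running after {max_polls} polls")
```

A synchronous request returns results directly when sent with `urllib.request.urlopen`. For an asynchronous request, you would read the query ID from the response and pass it to `wait_for_query` along with a function that fetches the query status.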

Both types of queries include real-time data that a streaming job ingested within approximately the past hour.

The following diagram summarizes data accessibility based on the query type:

[Figure: Query type diagram]

The following table compares synchronous and asynchronous queries in Polaris:

Feature	Synchronous query	Asynchronous query
SQL queries supported	SELECT	SELECT, INSERT, REPLACE
Permissions required	`AccessQueries`	`AccessQueries` for all asynchronous queries; additionally `ManageIngestionJobs` for INSERT or REPLACE
Data range covered	Real-time and precached data	Real-time, precached, and noncached data
Query latency	Seconds	Minutes to hours depending on query complexity
Max runtime	One minute	Hours to days
Best use case	Low-latency interactive queries	Scheduled reports, data downloads and exports, long-running complex queries
Cost	Included in project billing, bound by project capacity	Billed per query based on concurrency factor, auto-scaled on demand
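The statement types and data ranges in the table suggest a simple routing rule, sketched below in Python. The `choose_endpoint` helper is a hypothetical name introduced for this example. It inspects only the first keyword of the statement, so it is a simplification that does not handle queries beginning with other clauses.

```python
def choose_endpoint(sql, needs_noncached_data=False):
    """Return the endpoint path suited to a statement, per the comparison table."""
    stripped = sql.strip()
    if not stripped:
        raise ValueError("empty SQL statement")
    keyword = stripped.split(None, 1)[0].upper()
    if keyword in ("INSERT", "REPLACE"):
        # Only asynchronous queries support INSERT and REPLACE.
        return "/query/sql/statements"
    if keyword == "SELECT":
        # Noncached data is reachable only through asynchronous queries.
        return "/query/sql/statements" if needs_noncached_data else "/query/sql"
    raise ValueError(f"unsupported statement type: {keyword}")
```

For example, `choose_endpoint("SELECT * FROM t")` selects the synchronous endpoint, while the same statement with `needs_noncached_data=True` selects the asynchronous one.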

Data management for optimal query performance

By default, Polaris precaches all data to enable high-concurrency, low-latency queries. Precached data is preloaded into your project and counts towards the project's storage size.

For optimal performance, you should store your most frequently accessed data in the precache. To save costs and conserve resources, especially for less frequently accessed data, you can implement a precache policy that retains data in precache only within a specified time period. Data offloaded from precache can only be accessed from deep storage. Offloaded data doesn't count towards the project’s storage size but continues to incur deep storage costs. For information on precache policies, see Data lifecycle management.

Warning

If the time period in your precache policy does not encompass any of the data in the table, no data is precached. You will not be able to query any data in the table if no data is precached. Ensure your precache policy covers at least a portion of data in the table.
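One way to sanity-check a candidate precache period before applying it is to estimate how much of a table's data it would cover. The Python sketch below models the policy as a trailing window measured back from the current time, which is an assumption: the actual policy format and how Polaris evaluates it are described in Data lifecycle management. The `precached_fraction` helper is a hypothetical name introduced for this example.

```python
from datetime import datetime, timedelta, timezone


def precached_fraction(row_times, precache_window, now):
    """Return the share of rows whose timestamp falls within the trailing window."""
    if not row_times:
        return 0.0
    cutoff = now - precache_window  # rows older than this are treated as offloaded
    kept = sum(1 for t in row_times if t >= cutoff)
    return kept / len(row_times)


# Illustrative timestamps only; real values would come from your table.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
rows = [
    datetime(2024, 5, 31, tzinfo=timezone.utc),
    datetime(2024, 5, 1, tzinfo=timezone.utc),
    datetime(2023, 1, 1, tzinfo=timezone.utc),
]
if precached_fraction(rows, timedelta(days=7), now) == 0.0:
    print("Warning: this window would leave no data in precache")
```

A result of zero corresponds to the situation the warning describes, where the policy period covers none of the table's data.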

Learn more

See the following topics for more information:

Query data for submitting queries using the SQL workbench in the UI.
Query API for reference on the Query API.
Create and manage projects: Get project details to view project storage usage by API.
Query precached data for querying precached data by API.
Query data in deep storage for querying deep storage data by API.
Druid SQL documentation for reference on Druid SQL queries.
Set a storage policy by API for setting retention and precache policies using the API.
