Query data by API

After you ingest data into Imply Polaris, you can use the Query API to run queries against your data. For information on how to write SQL queries, see Druid SQL overview.

The Query API offers two endpoints for querying data:

  • /query/sql submits synchronous queries against precached data. This endpoint returns query results in the response. For more information, see Query precached data.
  • /query/sql/statements submits asynchronous queries against data in deep storage. This endpoint doesn't return query results in the response. Instead, the response includes a query ID to use when checking the query status and retrieving the results. Because queries against deep storage data run asynchronously, performance might be slower compared to querying precached data. For more information, see Query data in deep storage.
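The two endpoints above can be sketched in Python as follows. This is a minimal illustration, not the official client: the base URL, the `{"query": ...}` request body, the `Basic` authorization scheme, and the `ACCEPTED` and `RUNNING` state names are all assumptions to check against the Polaris API reference. The helper names `build_query_request` and `wait_for_statement` are hypothetical.

```python
import json
import time
import urllib.request

# Assumed names for the states an asynchronous query reports while unfinished.
IN_PROGRESS_STATES = {"ACCEPTED", "RUNNING"}


def build_query_request(base_url, api_key, sql, asynchronous=False):
    """Build, but do not send, a POST request for one of the two endpoints.

    base_url is your project's API base URL (an assumption; copy it from
    your own Polaris environment).
    """
    path = "/query/sql/statements" if asynchronous else "/query/sql"
    body = json.dumps({"query": sql}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; confirm it in the authentication docs.
            "Authorization": f"Basic {api_key}",
        },
    )


def wait_for_statement(fetch_state, query_id, poll_seconds=5.0, max_polls=120):
    """Poll an asynchronous query until it leaves its in-progress states.

    fetch_state is a caller-supplied function that takes the query ID and
    returns the current state string, for example by calling the status
    endpoint for that query ID.
    """
    for _ in range(max_polls):
        state = fetch_state(query_id)
        if state not in IN_PROGRESS_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} still running after {max_polls} polls")
```

To send a request, pass the result of `build_query_request` to `urllib.request.urlopen`. For a synchronous query, the response body carries the results. For an asynchronous query, read the query ID from the response, pass it to `wait_for_statement`, and then retrieve the results in a separate call.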

Both types of queries include results for real-time data ingested in a streaming job during approximately the past hour.

The following diagram summarizes data accessibility based on the query type:

[Diagram: data accessibility by query type]

The following table highlights the differences between synchronous and asynchronous queries in Polaris:

|                       | Synchronous query                                      | Asynchronous query                                                          |
|-----------------------|--------------------------------------------------------|-----------------------------------------------------------------------------|
| SQL queries supported | SELECT                                                 | SELECT, INSERT, REPLACE                                                     |
| Permissions required  | AccessQueries                                          | AccessQueries and ManageIngestionJobs                                       |
| Data range covered    | Real-time and precached data                           | Real-time, precached, and noncached data                                    |
| Query latency         | Seconds                                                | Minutes to hours depending on query complexity                              |
| Max runtime           | One minute                                             | Hours to days                                                               |
| Best use case         | Low-latency interactive queries                        | Scheduled reports, data downloads and exports, long-running complex queries |
| Cost                  | Included in project billing, bound by project capacity | Billed per query based on concurrency factor, auto-scaled on demand         |
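As a rough illustration of how the differences above might drive endpoint selection in client code, the following sketch picks an endpoint from the statement type, the data range needed, and an expected runtime. The function `choose_endpoint` and its parameters are hypothetical. The first-keyword check is deliberately simplistic and only recognizes the three statement types listed in the table.

```python
SYNC_ENDPOINT = "/query/sql"
ASYNC_ENDPOINT = "/query/sql/statements"
SYNC_MAX_RUNTIME_SECONDS = 60  # "One minute" max runtime for synchronous queries


def choose_endpoint(sql, needs_noncached_data=False, expected_runtime_seconds=0.0):
    """Pick an endpoint based on the synchronous/asynchronous comparison.

    Only the leading keyword is inspected, so statements that start with
    anything other than SELECT, INSERT, or REPLACE are rejected.
    """
    words = sql.split()
    statement = words[0].upper() if words else ""
    if statement not in {"SELECT", "INSERT", "REPLACE"}:
        raise ValueError(f"unsupported statement type: {statement!r}")

    # INSERT and REPLACE are only supported asynchronously.
    if statement != "SELECT":
        return ASYNC_ENDPOINT
    # Noncached (deep storage only) data is only reachable asynchronously.
    if needs_noncached_data:
        return ASYNC_ENDPOINT
    # Synchronous queries are capped at about one minute of runtime.
    if expected_runtime_seconds > SYNC_MAX_RUNTIME_SECONDS:
        return ASYNC_ENDPOINT
    return SYNC_ENDPOINT
```

A short interactive `SELECT` over precached data resolves to the synchronous endpoint. Every other combination falls through to the asynchronous one, which also requires the ManageIngestionJobs permission.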

Data management for optimal query performance

By default, Polaris precaches all data to enable high-concurrency, low-latency queries. Precached data is preloaded into your project and counts towards the project's storage size.

For optimal performance, you should store your most frequently accessed data in the precache. To save costs and conserve resources, especially for less frequently accessed data, you can implement a precache policy that retains data in precache only within a specified time period. Data offloaded from precache can only be accessed from deep storage. Offloaded data doesn't count towards the project’s storage size but continues to incur deep storage costs. For information on precache policies, see Data lifecycle management.

Caution

If the time period in your precache policy doesn't cover any of the data in the table, no data is precached, and you can't query any data in the table. Ensure your precache policy covers at least a portion of the data in the table.
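One way to reason about this condition before applying a precache policy is to check whether the policy's time window overlaps the table's data at all. The sketch below is an illustration under stated assumptions, not how Polaris evaluates policies. It models the policy as a window that ends at the current time and reaches back by a fixed period. The function `precached_interval` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def precached_interval(data_start, data_end, now, precache_period):
    """Return the (start, end) slice of the table's data that falls inside
    the assumed precache window [now - precache_period, now], or None if
    the window covers none of the data.
    """
    window_start = now - precache_period
    start = max(data_start, window_start)
    end = min(data_end, now)
    if start >= end:
        return None  # no overlap: nothing would be precached
    return (start, end)


# Example: the table holds data for January 2024 only.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
data_start = datetime(2024, 1, 1, tzinfo=timezone.utc)
data_end = datetime(2024, 2, 1, tzinfo=timezone.utc)

too_short = precached_interval(data_start, data_end, now, timedelta(days=30))
long_enough = precached_interval(data_start, data_end, now, timedelta(days=180))
```

In this example, `too_short` is `None` because a 30-day window ending on June 1 starts after the table's last data, while `long_enough` covers all of January.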

Learn more

See the following topics for more information: