
Data lifecycle management

The lifecycle for data in Imply Polaris includes ingesting, querying, and deleting data. You can control the data lifecycle by configuring storage policies. These policies determine the time to live (TTL) for data stored in Polaris and how quickly queries can access that data. When you set storage policies that remove obsolete data or offload less frequently used data, you can decrease storage costs, increase query performance, and reduce manual data maintenance.

You can customize the following storage policies for a table:

  • A retention policy to automatically delete data.
  • A precache policy to control which data is precached and which data resides only in deep storage.

This topic introduces storage policies you can use to delete data or control what data is precached in a table.

Flexible table schema

For a flexible table, when you delete all data or when no data is precached, Polaris removes the undeclared columns from the table's queryable schema. The table schema will no longer display those columns. To re-add the columns to your table schema, either ingest and precache data that includes the desired columns or declare them on the table schema. For more information, see Flexible table.

Lookups

You can't edit storage policies on a table that's used as a lookup source.

Automatically delete data

By default, Polaris retains all data until you manually delete it. You can set a retain-type storage policy, or retention policy, to automatically delete data with timestamps outside the specified time period. Every minute, Polaris evaluates which data meets the criteria for automatic deletion.

Data outside the retention period doesn't appear in queries and doesn't count toward your project size. Polaris schedules the data for permanent deletion 30 days after the retention period expires. You can recover deleted data within this 30-day grace period. Permanent deletion is a periodic, intermittent process: once scheduled, it may take up to an additional 30 days to complete.
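The following Python sketch works through that timeline for a hypothetical date on which data falls outside the retention period. The function `deletion_timeline` and the constant names are illustrative assumptions, not part of any Polaris API. The sketch only adds the two 30-day periods described above.

```python
from datetime import datetime, timedelta, timezone

# Durations stated in this topic; the names are illustrative only.
GRACE_PERIOD = timedelta(days=30)      # window in which deleted data can be recovered
MAX_PURGE_DELAY = timedelta(days=30)   # permanent deletion may take up to 30 more days

def deletion_timeline(expired_at: datetime) -> dict:
    """Key dates for data that fell outside the retention period at `expired_at`."""
    grace_ends = expired_at + GRACE_PERIOD
    return {
        "hidden_from_queries": expired_at,    # not queried, not counted toward project size
        "grace_period_ends": grace_ends,      # permanent deletion is scheduled here
        "deleted_no_later_than": grace_ends + MAX_PURGE_DELAY,
    }

timeline = deletion_timeline(datetime(2024, 12, 1, tzinfo=timezone.utc))
for label, when in timeline.items():
    print(f"{label}: {when:%Y-%m-%d}")
```

For data that leaves the retention period on December 1, 2024, this prints 2024-12-01, 2024-12-31, and 2025-01-30. The last date is the latest point by which permanent deletion completes.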

Offload data from precache

When you ingest data into Imply Polaris, Polaris stores the data in both precache and deep storage. Deep storage holds a superset of all data: it contains the precached data as well as any data not loaded into precache. By default, Polaris precaches all data to support high-concurrency, low-latency workloads. Precaching loads data into your project ahead of query time, and precached data counts toward the storage size of your project. Any data ingested beyond your pre-purchased capacity isn't loaded into precache and is stored only in deep storage.

To save costs and conserve resources, such as when running longer reporting-style queries, you can offload data from precache when the data falls outside a certain time period. You can still query data that resides only in deep storage. However, Polaris loads that data on demand, so expect these queries to be slower than queries on precached data. Only asynchronous queries can access data in deep storage. For information on querying data whether it's precached or only in deep storage, see Query data.

A cached-type storage policy, or precache policy, keeps data in precache only when it's within the specified time period. Every minute, Polaris evaluates which data meets the criteria for removal from precache.


Precache policies encompass retention behavior. Polaris retains all data that is precached, regardless of the time range set in the retention policy.

View table storage

The Tables list view shows the size of every table in terms of what's precached and what's only in deep storage:

Table size with precache policy list

You can also access this information in the table view:

Table size with precache policy table

To view storage usage of a table using the API, see Set a storage policy by API.

Data management and access

  • Don't offload all your data from precache. Ensure at least one segment of data is precached for Polaris to plan and execute queries. If no data is precached, you won't be able to run queries, whether synchronous or asynchronous, on any data.

  • If a column is present only in data in deep storage and not in precached data, Polaris doesn't show that column in the UI.

  • Synchronous queries, submitted using the UI or API, rely on the schema of precached data. This may affect your queries in the following ways:

    • If the detected data type of a column in your precached data differs from its type in data that's only in deep storage, the results use the type from the precached data. This may not be the best-fit type for all your data.
    • If you explicitly reference a column that's in deep storage but not precached (for example, SELECT col1), the query fails.
    • If you indirectly reference a column that's in deep storage but not precached (for example, SELECT *), the query doesn't return any results for that column.
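The following Python sketch roughly illustrates how column resolution for a synchronous query depends on the precached schema. It models the last two cases with a plain set. The column names and the `sync_query_columns` function are hypothetical. This is a simplified model of the behavior described above, not how Polaris implements query planning.

```python
# Hypothetical schema of the precached segments. A column such as "campaign"
# that exists only in deep-storage segments is absent from this set.
PRECACHED_COLUMNS = {"__time", "country", "clicks"}

def sync_query_columns(selected: list[str]) -> list[str]:
    """Columns a synchronous query returns, resolved against the precached schema only."""
    if selected == ["*"]:
        # Indirect reference: deep-storage-only columns are silently left out.
        return sorted(PRECACHED_COLUMNS)
    missing = [col for col in selected if col not in PRECACHED_COLUMNS]
    if missing:
        # Explicit reference to a column missing from the precached schema fails.
        raise ValueError(f"unknown column(s) for a synchronous query: {missing}")
    return selected

print(sync_query_columns(["*"]))  # ['__time', 'clicks', 'country'], with no "campaign"
```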

Storage policy time period

A storage policy applies for the time period that you define in the policy. For example, if your retention policy specifies a one-week duration, Polaris only retains the most recent week of data.

You can specify the time period using a duration relative to the current time or one or more intervals in UTC time. A policy can accept both a duration and intervals.
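The following Python sketch shows one way to reason about whether a timestamp falls within a policy period that combines a duration and intervals. It rests on these assumptions:

- Durations are day-based, such as `P70D`.
- Intervals are half-open, with an exclusive end.
- Naive timestamps are UTC.

The function names are hypothetical. The sketch also evaluates individual timestamps, whereas Polaris applies policies to whole segments.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

def utc(text: str) -> datetime:
    """Parse an ISO 8601 date or datetime; naive values are treated as UTC."""
    value = datetime.fromisoformat(text)
    return value if value.tzinfo else value.replace(tzinfo=timezone.utc)

def parse_day_duration(duration: str) -> timedelta:
    """Parse a day-based ISO 8601 duration such as 'P70D'; other units are out of scope."""
    match = re.fullmatch(r"P(\d+)D", duration)
    if match is None:
        raise ValueError(f"this sketch only handles PnD durations, got {duration!r}")
    return timedelta(days=int(match.group(1)))

def in_policy_period(ts: datetime, now: datetime, duration: Optional[str] = None,
                     intervals: Iterable[str] = ()) -> bool:
    """True if ts matches the duration (relative to now) or any 'start/end' interval."""
    if duration is not None and ts >= now - parse_day_duration(duration):
        return True  # no upper bound, so future timestamps also match a duration
    for interval in intervals:
        start, end = (utc(part) for part in interval.split("/"))
        if start <= ts < end:  # half-open interval: end is exclusive
            return True
    return False

now = datetime(2024, 12, 31, tzinfo=timezone.utc)
print(in_policy_period(utc("2024-11-15"), now, duration="P70D"))  # True
print(in_policy_period(utc("2025-03-01"), now, duration="P70D"))  # True (future data)
print(in_policy_period(utc("2023-06-01"), now, "P90D", ["2022-01-01/2023-01-01"]))  # False
```

A timestamp counts as inside the period if it matches the duration or any interval. This mirrors the statement that a single policy can accept both.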

The following list describes how Polaris applies a storage policy with respect to its time period:

  • If you ingest data outside the storage policy period:

    • Retention policy: Polaris ingests the data but immediately removes it.
    • Precache policy: Polaris loads the data into precache and then immediately offloads it to deep storage.
  • If your policy specifies a duration and your data contains timestamps into the future:

    • Retention policy: Polaris retains the future data.
    • Precache policy: Polaris precaches the future data.
  • If you update a storage policy to lengthen its time period:

    • Retention policy: Polaris does not recover previously deleted data.
      If the deleted data is within the 30-day grace period, you can recover it.

    • Precache policy: Polaris reloads the applicable data into precache.
      Note that this may cause you to exceed your project capacity.

      If you have both retention and precache policies and want to extend both of their time periods, follow this sequence to ensure that Polaris loads your data properly:

      1. Extend the time period of the retention policy.
      2. Restore any deleted data corresponding to the new retention period.
      3. Extend the time period of the precache policy.

Segment granularity

The granularity of the table's segments informs how Polaris identifies data impacted by the policy. Polaris applies the policy to all segments that overlap the policy time period, even if the time chunk of the segment isn't fully contained within the time period. In other words, Polaris also retains or precaches segments that have partial overlap with the storage policy period.
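The following Python sketch shows this overlap rule. It assumes that each segment and the policy period are half-open `[start, end)` ranges in UTC. The names `overlaps` and `affected_segments` are hypothetical helpers for illustration. Take month segments and a period from January 15 to February 10, 2024. Both the January and February segments are affected, even though neither lies fully inside the period.

```python
from datetime import datetime, timezone

def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)

def overlaps(segment: tuple, period: tuple) -> bool:
    """True if two half-open [start, end) ranges share at least one instant."""
    return segment[0] < period[1] and period[0] < segment[1]

def affected_segments(segments: list, period: tuple) -> list:
    """Segments with full or partial overlap; each one is retained or precached whole."""
    return [seg for seg in segments if overlaps(seg, period)]

# Hypothetical month-granularity segments for January through March 2024.
segments = [(utc(2024, 1, 1), utc(2024, 2, 1)),
            (utc(2024, 2, 1), utc(2024, 3, 1)),
            (utc(2024, 3, 1), utc(2024, 4, 1))]
period = (utc(2024, 1, 15), utc(2024, 2, 10))

for start, _ in affected_segments(segments, period):
    print(start.strftime("%Y-%m"))  # prints 2024-01, then 2024-02
```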

Example policy with duration

For example, consider that the current day is December 31, 2024. The duration P70D represents the past 70 days, or October 22 to December 31. If your data is stored in month granularity segments for October, November, and December, Polaris applies the retention or precache behavior to the entirety of segments for these three months. This means that Polaris also retains or precaches all data from October 1 to October 31 in the partially overlapping segment. The following diagram illustrates this example:

Storage policy time period by duration
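You can check the arithmetic in this example with a short Python sketch. It assumes month-granularity segments that start on the first of each month. It also treats December 31 as the last day covered by the duration. The `next_month` helper is illustrative.

```python
from datetime import date, timedelta

def next_month(d: date) -> date:
    """First day of the month after d."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)

today = date(2024, 12, 31)
period_start = today - timedelta(days=70)   # P70D counted back from today
period_end = today + timedelta(days=1)      # exclusive end, so December 31 is included

# Hypothetical month segments, identified by their first day.
month_starts = [date(2024, m, 1) for m in range(9, 13)]
affected = [s for s in month_starts if s < period_end and period_start < next_month(s)]

print(period_start)                              # 2024-10-22
print([s.strftime("%Y-%m") for s in affected])   # ['2024-10', '2024-11', '2024-12']
print((period_start - affected[0]).days)         # 21 days (October 1-21) kept beyond P70D
```

The September segment is untouched because it ends before October 22. The October segment is kept whole, which adds October 1 through 21 to the data covered by the policy.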

Example policy with interval

Consider that your data is stored in month granularity segments. If you provide a time interval that overlaps two segments, Polaris applies the retention or precache behavior to both segments. The following diagram illustrates this example:

Storage policy time period by interval

Combine retention and precache policies

Precache policies and retention policies don't need to overlap, so you can create policies that fit your storage and query performance requirements. For example, consider a retention policy that specifies the period P90D and a precache policy that specifies the earlier time interval 2022-01-01/2023-01-01. Because Polaris retains all precached data regardless of your retention policy, the data for that interval is both precached and retained.

With both policies set, Polaris manages the data as follows:

  • Retain but do not precache data for the last 90 days. You can query this data from deep storage using asynchronous queries.
  • Retain and precache all data from the year 2022. You can query this data using both synchronous and asynchronous queries.
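The following Python sketch classifies a few hypothetical month segments under the two policies in this example. It assumes a current time of December 31, 2024, and models P90D as an open-ended range that starts 90 days earlier. The `storage_state` helper is hypothetical, and the sketch only mirrors the rules described in this topic.

```python
from datetime import datetime, timedelta, timezone

def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)

def overlaps(seg: tuple, period: tuple) -> bool:
    """Half-open [start, end) overlap test, applied per segment."""
    return seg[0] < period[1] and period[0] < seg[1]

def storage_state(seg: tuple, retention: tuple, precache: tuple) -> str:
    if overlaps(seg, precache):
        return "precached and retained"       # precache policies encompass retention
    if overlaps(seg, retention):
        return "retained, deep storage only"  # queryable with asynchronous queries
    return "deleted"                          # outside both policy periods

now = utc(2024, 12, 31)                                   # hypothetical current time
far_future = datetime.max.replace(tzinfo=timezone.utc)   # a duration has no upper bound
retention = (now - timedelta(days=90), far_future)        # P90D
precache = (utc(2022, 1, 1), utc(2023, 1, 1))             # 2022-01-01/2023-01-01

for seg in [(utc(2022, 6, 1), utc(2022, 7, 1)),
            (utc(2023, 6, 1), utc(2023, 7, 1)),
            (utc(2024, 12, 1), utc(2025, 1, 1))]:
    print(seg[0].strftime("%Y-%m"), storage_state(seg, retention, precache))
```

The sketch prints one line per segment:

- 2022-06 is precached and retained.
- 2023-06 is deleted because it falls outside both periods.
- 2024-12 is retained in deep storage only.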

Configure a storage policy

Follow these steps to configure a retention or precache policy:

  1. From the table view, click Manage > Edit data storage policy.
  2. Select the tab for Retention policy or Precache policy.
  3. Set rules for a time period, UTC-based time intervals, or both. If you set multiple rules, Polaris retains (for a retention policy) or precaches (for a precache policy) data that matches any of the rules. Precache policies encompass retention behavior, so any data that's precached is also retained.
  4. Click Save.

The following screenshot shows the Data storage policy dialog:

Storage policy dialog

Learn more

See the following topics for more information: