
Data lifecycle management

The lifecycle for data in Imply Polaris includes ingesting, querying, and deleting data. You can control the data lifecycle by configuring storage policies that determine the time to live (TTL) for data stored in Polaris or its accessibility from cache. When you set storage policies that remove obsolete data or offload data that's less frequently used, you can decrease storage costs, increase query performance, and reduce manual data maintenance.

This topic introduces storage policies you can use to delete data or control what data is cached in a table.

Automatically delete data

By default, Polaris retains all data until you manually delete it. You can set a retain-type storage policy, or retention policy, to automatically delete data with timestamps outside the specified time period. Every minute, Polaris checks for data that meets the criteria for automatic deletion.

Data outside the retention period does not appear in queries or count towards your project size. Polaris schedules the data for permanent deletion 30 days after the retention period expires. You can recover deleted data within this 30-day grace period. Permanent deletion is a periodic process: once scheduled, it may take up to an additional 30 days to complete.
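The timeline described above can be sketched in Python. This is a local model of the documented rules, not a Polaris API: the function name `retention_timeline`, the state labels, and the handling of the exact boundary instants are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=30)     # window in which deleted data is recoverable
MAX_PURGE_DELAY = timedelta(days=30)  # extra time permanent deletion may take


def retention_timeline(row_time, retention, now):
    """Return (state, recoverable_until, purged_by) for one row timestamp.

    Hypothetical helper: it models the documented timeline only.
    """
    expires_at = row_time + retention               # row leaves the retention period
    recoverable_until = expires_at + GRACE_PERIOD   # end of the grace period
    purged_by = recoverable_until + MAX_PURGE_DELAY # latest permanent deletion
    if now < expires_at:
        state = "retained"
    elif now < recoverable_until:
        state = "recoverable"
    elif now < purged_by:
        state = "pending permanent deletion"
    else:
        state = "permanently deleted"
    return state, recoverable_until, purged_by


if __name__ == "__main__":
    utc = timezone.utc
    print(retention_timeline(
        datetime(2024, 1, 1, tzinfo=utc),   # row timestamp
        timedelta(days=90),                 # retention period
        datetime(2024, 4, 15, tzinfo=utc),  # evaluation time
    ))
```

Under these assumptions, with a 90-day retention period, a row timestamped 2024-01-01 leaves the retention period on 2024-03-31, stays recoverable until 2024-04-30, and is permanently deleted by 2024-05-30 at the latest.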

Keep the following points in mind when setting or updating a retention policy for a table:

  • If you ingest data outside the retention period, Polaris ingests the data but removes it immediately.
  • Polaris always retains data with timestamps in the future.
  • When you update an existing retention period to a longer time period, Polaris does not automatically recover previously dropped or deleted data. If the deleted data is still within the 30-day grace period, you can recover it.
  • The table's partitioning setting limits the time granularity for the time period. For example, when the table has a partitioning granularity of day, you can't configure a retention period at the minute level.
  • When you delete all data in a table through a retention policy or a data deletion job, Polaris removes any undeclared columns from both the queryable schema and the table schema. You can re-add these columns to your table schema by ingesting more data that includes the desired columns or by declaring them on the table schema. For more information, see Flexible table.
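
The granularity constraint in the list above can be expressed as a simple ordering check. The sketch below is illustrative only: the list of units, their ordering, and the name `period_unit_allowed` are assumptions, and it assumes a period unit equal to the partitioning granularity is accepted. Polaris may support other granularities or apply additional rules.

```python
# Units ordered from finest to coarsest. This list is an assumption for
# illustration; it is not taken from the Polaris documentation.
GRANULARITY_ORDER = ["minute", "hour", "day", "month", "year"]


def period_unit_allowed(period_unit, partition_granularity):
    """Return True when the period unit is at least as coarse as the
    table's partitioning granularity (hypothetical helper)."""
    return (GRANULARITY_ORDER.index(period_unit)
            >= GRANULARITY_ORDER.index(partition_granularity))


if __name__ == "__main__":
    # A day-partitioned table can't take a minute-level retention period.
    print(period_unit_allowed("minute", "day"))
    print(period_unit_allowed("month", "day"))
```

The same check applies to the time period of a cache policy, which is subject to the same partitioning limit.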

Configure retention policy

Follow these steps to configure a retention policy:

  1. From the table view, click Manage > Edit data storage policy.
  2. Under Retention policy, select Retain data for.
  3. Select the time unit and enter a value for the length of time. Polaris keeps data within this period and schedules data outside this period to be deleted.
  4. Click Save.

The following screenshot shows the Data storage policy dialog with a set retention policy:

[Screenshot: Retain storage policy]

Offload data from cache

Info: This is a beta feature available to select customers. Imply must enable the feature for you. Contact your Polaris support representative to find out more.

When you ingest data into Imply Polaris, Polaris stores the data in both cache and deep storage. Deep storage holds all data: both the data that is cached and the data that isn't loaded into cache. By default, Polaris caches all data to enable high-concurrency, low-latency workloads. Caching pre-loads data into your project, and the cached data counts towards the storage size of your project. Polaris loads any data ingested beyond your pre-purchased capacity into deep storage only, not into the cache.

To save costs and conserve resources, such as when running longer reporting-style queries, you can offload data from the cache when the data is outside a certain time period. Data that resides only in deep storage can still be queried, but the data is loaded on demand so expect performance to be slower than querying cached data. You can currently only query data in deep storage using the API, not the SQL workbench.

A cached-type storage policy, or cache policy, keeps data in cache only while it's within the specified time period. Every minute, Polaris checks for data that meets the criteria for removal from cache.

Info: Cache policies encompass retention behavior. Polaris retains all data that is cached, regardless of the time range set in the retention policy.

Keep the following points in mind when setting or updating a cache policy for a table:

  • If you ingest data outside the cache period, Polaris loads the data onto the cache, then immediately removes it.
  • Polaris always caches data with timestamps in the future.
  • When you update an existing cache policy to a longer time period, Polaris reloads the applicable data onto cache. Note that this may cause you to exceed your project capacity.
  • The table's partitioning setting limits the time granularity for the time period. For example, when the table has a partitioning granularity of day, you can't configure a cache period at the minute level.

Caution: If the time period in your cache policy does not encompass any of the data in the table, no data is cached. You will not be able to query any data in the table if no data is cached. Ensure your cache policy covers at least a portion of data in the table.
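
Before setting a cache policy, it can help to check which timestamps would stay in cache. The following Python sketch models the rules in this section locally. It does not call Polaris, and the names `storage_tier` and `any_data_cached`, the tier labels, and the treatment of a timestamp exactly on the boundary are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def storage_tier(row_time, cache_period, now):
    """Return where a row resides under a cache policy (hypothetical helper).

    Rows newer than `now - cache_period` stay cached. Future timestamps
    give a negative age, so they always count as cached.
    """
    if now - row_time < cache_period:
        return "cache and deep storage"
    return "deep storage only"


def any_data_cached(row_times, cache_period, now):
    """Return True if the cache period covers at least one timestamp."""
    return any(
        storage_tier(t, cache_period, now) == "cache and deep storage"
        for t in row_times
    )


if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2024, 6, 1, tzinfo=utc)
    table_times = [datetime(2024, 1, 1, tzinfo=utc),
                   datetime(2024, 5, 20, tzinfo=utc)]
    for t in table_times:
        print(t.date(), storage_tier(t, timedelta(days=30), now))
    print(any_data_cached(table_times, timedelta(days=30), now))
```

If `any_data_cached` returns False for a sample of the table's timestamps, the chosen period likely falls into the situation the caution describes.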

Configure cache policy

Follow these steps to configure a cache policy:

  1. From the table view, click Manage > Edit data storage policy.
  2. Under Cache policy, select Cache and retain data for.
  3. Select the time unit and enter a value for the length of time. Polaris only caches data within this period. Data outside this period is stored in deep storage.
  4. Click Save.

The following screenshot shows the Data storage policy dialog with a set cache policy:

[Screenshot: Cache storage policy]

Learn more

See the following topics for more information: