Set a storage policy by API

You can set a storage policy on a table to manage the lifecycle and query accessibility of data in the table. A retention policy is a storage policy that determines how long to retain data. A precache policy is a storage policy that determines how long to keep data precached. There are two types of policies you can use when configuring a retention or precache policy:

  • A period-based policy (period), which accepts an ISO 8601 duration.
  • An interval-based policy (intervals), which accepts one or more intervals in addition to an optional ISO 8601 duration.

This topic shows how to assign a storage policy to a table using the Polaris API. For general information on storage policies, see Data lifecycle management.

Prerequisites

You must have an API key with the ManageTables permission. In the examples below, the key value is stored in the variable named POLARIS_API_KEY. See API key authentication to obtain an API key and assign permissions. Visit Permissions reference for more information on permissions.

Create a table with a storage policy

When you create a table, you can customize how long Polaris retains or precaches data in the table. In the table definition, specify your custom retention or precache policy in the storagePolicy property.

Retention policy

A retain-type storage policy, or retention policy, automatically deletes data with timestamps older than the specified time period or outside the specified intervals. The default behavior in Polaris is to retain all data until you delete the data (such as with a delete_data job) or drop the table (such as with a drop_table job).

To create a table with a retention policy, include the storagePolicy.retain property in the request payload. The following example shows a retention policy that retains data for the past three months:

    "storagePolicy": {
      "retain": {
        "type": "period",
        "period": "P3M"
      }
    }

The following example shows a retention policy that retains data for part of 2024 (the interval 2024-01-01/2024-11-30):

    "storagePolicy": {
      "retain": {
        "type": "intervals",
        "intervals": ["2024-01-01/2024-11-30"]
      }
    }

When you use intervals, you must provide at least one interval. To specify multiple intervals, provide a comma-separated list, such as ["2023-01-01/2023-11-30", "2024-01-01/2024-11-30", ...].

Optionally, you can combine a period with intervals. The following example shows a retention policy that retains data for part of 2023 (the interval 2023-01-01/2023-11-30) and data from the last three months:

    "storagePolicy": {
      "retain": {
        "type": "intervals",
        "intervals": ["2023-01-01/2023-11-30"],
        "period": "P3M"
      }
    }

Polaris retains any data that falls within either the intervals or the period.
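The rules above (a period alone, one or more intervals, or intervals plus an optional period) can be sketched as a small Python helper that builds the value for either the retain or the cached property. The function name storage_policy_entry is hypothetical and not part of any Polaris client; it only assembles the JSON structure shown in the examples.

```python
import json

def storage_policy_entry(period=None, intervals=None):
    """Build the object for "retain" or "cached" (hypothetical helper)."""
    if intervals is not None:
        # An interval-based policy needs at least one interval.
        if len(intervals) == 0:
            raise ValueError("an intervals policy needs at least one interval")
        entry = {"type": "intervals", "intervals": list(intervals)}
        if period is not None:
            # The period is optional when intervals are present.
            entry["period"] = period
        return entry
    if period is None:
        raise ValueError("provide a period, intervals, or both")
    return {"type": "period", "period": period}

policy = {
    "storagePolicy": {
        "retain": storage_policy_entry(
            period="P3M", intervals=["2023-01-01/2023-11-30"]
        )
    }
}
print(json.dumps(policy, indent=2))
```

Validating the "at least one interval" rule on the client side is a convenience only; Polaris remains the authority on whether a policy is accepted.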

Precache policy

With a cached-type storage policy, or precache policy, Polaris precaches data within the specified time period. Data outside this time period resides only in deep storage and must be queried asynchronously. The default behavior in Polaris is to keep all data precached.

caution

If the time period in your precache policy does not encompass any of the data in the table, no data is precached. You will not be able to query any data in the table if no data is precached. Ensure your precache policy covers at least a portion of data in the table.
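One way to guard against this situation is to check, before applying an interval-based precache policy, that at least one interval overlaps the time range of the data in the table. The following Python sketch assumes you already know the earliest and latest timestamps in the table (for example, from a query you run yourself), and it assumes intervals are half-open (start inclusive, end exclusive). The function names are hypothetical, and the sketch does not handle period-based policies.

```python
from datetime import datetime

def parse_interval(interval):
    """Split an ISO 8601 interval "start/end" into two datetimes."""
    start, end = interval.split("/")
    return datetime.fromisoformat(start), datetime.fromisoformat(end)

def covers_some_data(intervals, data_start, data_end):
    """Return True if any interval overlaps the range [data_start, data_end)."""
    for interval in intervals:
        start, end = parse_interval(interval)
        # Two half-open ranges overlap when each starts before the other ends.
        if start < data_end and data_start < end:
            return True
    return False

# Assumed data range: June 2024.
data_start = datetime(2024, 6, 1)
data_end = datetime(2024, 7, 1)
print(covers_some_data(["2024-01-01/2024-11-30"], data_start, data_end))
print(covers_some_data(["2022-01-01/2023-01-01"], data_start, data_end))
```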

To create a table with a precache policy, include the storagePolicy.cached property in the request payload. The following example shows a precache policy to precache data for the last month:

    "storagePolicy": {
      "cached": {
        "type": "period",
        "period": "P1M"
      }
    }

The following example shows a precache policy to precache data for the interval 2024-01-01/2024-11-30:

    "storagePolicy": {
      "cached": {
        "type": "intervals",
        "intervals": ["2024-01-01/2024-11-30"]
      }
    }

You can provide a combination of one or more intervals and an optional period for precache policies. The following example shows a precache policy to precache data from part of 2023 (the interval 2023-01-01/2023-11-30) and the last three months:

    "storagePolicy": {
      "cached": {
        "type": "intervals",
        "intervals": ["2023-01-01/2023-11-30"],
        "period": "P3M"
      }
    }

Polaris precaches any data that falls within either the intervals or the period.

Retention and precache policy

You can set both retention and precache policies simultaneously when creating a table.

The following example shows a storage policy definition to retain data for the past three months and precache data for the last month:

    "storagePolicy": {
      "retain": {
        "type": "period",
        "period": "P3M"
      },
      "cached": {
        "type": "period",
        "period": "P1M"
      }
    }

info

Cache policies encompass retention behavior. Polaris retains all precached data, regardless of the time range of the retention policy.

Use the /query/sql/statements endpoint to submit an asynchronous query that accesses data outside the precache period and within the retention period. For example, with a P3M retention period and a P1M precache period, you must use an asynchronous query to access data older than one month but within the last three months. Queries that use the /query/sql endpoint access precached data only. For optimal performance, precache the data that you access regularly and query it using /query/sql. To learn more, see Query data in deep storage.
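The choice between the two endpoints can be sketched as a comparison between the earliest timestamp a query needs and the start of the precached window. In the following Python sketch, the function name endpoint_for and the precache_cutoff parameter are hypothetical; you would compute the cutoff yourself from your precache period, and the dates are made up for illustration.

```python
from datetime import datetime

SYNC_ENDPOINT = "/query/sql"              # reads precached data only
ASYNC_ENDPOINT = "/query/sql/statements"  # can read data in deep storage

def endpoint_for(query_start, precache_cutoff):
    """Pick an endpoint based on the oldest timestamp the query touches."""
    if query_start >= precache_cutoff:
        # Everything the query needs is inside the precached window.
        return SYNC_ENDPOINT
    # Some of the requested data is older than the precached window.
    return ASYNC_ENDPOINT

# Assumed example: the precached window (P1M) starts on 2024-05-15.
precache_cutoff = datetime(2024, 5, 15)
print(endpoint_for(datetime(2024, 6, 1), precache_cutoff))
print(endpoint_for(datetime(2024, 4, 1), precache_cutoff))
```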

Precache policies and retention policies don't need to overlap, so you can create policies that fit your storage and query performance requirements. For example, consider a retention policy that specifies the period P90D and a precache policy that specifies an earlier time interval, 2022-01-01/2023-01-01. Since all precached data is retained regardless of your retention policy, the data for the interval is both precached and retained.

With both policies set, Polaris manages the data as follows:

  • Retain but do not precache data for the last 90 days. You can query this data from deep storage using the /query/sql/statements endpoint.
  • Retain and precache all data from the year 2022. You can query this data synchronously using the /query/sql endpoint as well as from deep storage using the /query/sql/statements endpoint.

The policy would look like this:

    "storagePolicy": {
      "retain": {
        "type": "period",
        "period": "P90D"
      },
      "cached": {
        "type": "intervals",
        "intervals": ["2022-01-01/2023-01-01"]
      }
    }
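To make the combined effect concrete, the following Python sketch classifies a single timestamp under a P90D retention policy and a 2022 precache interval. It is an illustration of the behavior described above, not Polaris code: the classify function is hypothetical, the current time is passed in explicitly, and intervals are assumed to be half-open (start inclusive, end exclusive).

```python
from datetime import datetime, timedelta

def classify(ts, now, retain_days, cached_intervals):
    """Describe where data with timestamp ts would live (illustrative only)."""
    for start, end in cached_intervals:
        if start <= ts < end:
            # Precached data is retained regardless of the retention policy.
            return "precached"
    if ts >= now - timedelta(days=retain_days):
        # Inside the retention period but outside the precache intervals.
        return "deep storage only"
    return "deleted"

now = datetime(2024, 3, 1)  # assumed current time
cached = [(datetime(2022, 1, 1), datetime(2023, 1, 1))]

print(classify(datetime(2022, 6, 1), now, 90, cached))
print(classify(datetime(2024, 2, 1), now, 90, cached))
print(classify(datetime(2023, 6, 1), now, 90, cached))
```

With these assumed dates, the three calls fall into the precached interval, the 90-day retention window, and neither, in that order.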

Sample request

Send a POST request to the /v1/projects/PROJECT_ID/tables endpoint to create a new table with a storage policy. For more information on creating tables, see Create a table by API.

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data '{
"name": "Koalas Retention",
"type": "detail",
"storagePolicy": {
"retain": {
"period": "P3M",
"type": "period"
}
}
}'
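
If you prefer to script the request, the same call can be sketched with Python's standard library. The URL placeholders and the Basic authorization header are copied from the curl example above; the function name create_table_request is hypothetical. The sketch only builds the request object so that you can inspect it before sending.

```python
import json
import os
import urllib.request

# Placeholders from the curl example; replace them with your own values.
BASE_URL = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io"
PROJECT_ID = "PROJECT_ID"

def create_table_request(name, storage_policy, api_key):
    """Build (but do not send) a POST request that creates a detail table."""
    body = {"name": name, "type": "detail", "storagePolicy": storage_policy}
    return urllib.request.Request(
        f"{BASE_URL}/v1/projects/{PROJECT_ID}/tables",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Basic {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = create_table_request(
    "Koalas Retention",
    {"retain": {"type": "period", "period": "P3M"}},
    os.environ.get("POLARIS_API_KEY", ""),
)
print(request.get_method(), request.full_url)
# To send the request, pass it to urllib.request.urlopen(request).
```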

Sample response

The following example shows a successful response:

{
"schema": [
{
"name": "__time",
"dataType": "timestamp"
}
],
"name": "Koalas Retention",
"type": "detail",
"version": 0,
"availability": "available",
"clusteringColumns": [],
"compactionConfig": null,
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdByUser": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdOnTimestamp": "2023-08-10T23:43:01.543495184Z",
"createdTimestamp": "2023-08-10T23:43:01.543495184Z",
"description": null,
"id": "0189e1d5-05a6-7015-bb2c-de10182c7f03",
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastUpdateTimestamp": "2023-08-10T23:43:01.543499193Z",
"modifiedByUser": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"modifiedOnTimestamp": "2023-08-10T23:43:01.543499193Z",
"partitioningGranularity": "day",
"queryableSchema": [],
"storagePolicy": {
"retain": {
"period": "P3M",
"type": "period"
}
},
"schemaMode": "flexible",
"segmentCompactedBytes": 0,
"segmentTotalBytes": 0,
"totalDataSizeBytes": 0,
"totalRows": 0
}

Add or remove a storage policy

You can add or remove a storage policy from a table that already contains data. When you update the retention period on a table to a longer time period, Polaris does not recover previously deleted data.

To restore the default behavior in Polaris, remove the custom storage policies from the table. By default, Polaris retains all data forever and precaches all retained data.

The following storage policy example restores the default retention and precache behavior:

    "storagePolicy": {}

The following storage policy example resets the precache policy to its default behavior and keeps the three-month retention policy:

    "storagePolicy": {
      "cached": null,
      "retain": {
        "period": "P3M",
        "type": "period"
      }
    }

The net effect is for Polaris to retain and precache the past three months of data.

When sending a PUT request to update a table, keep in mind the following differences from creating a table:

  • Supply the table name as a path parameter.
  • Include version in the request body.
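
These two differences can be sketched in Python as a pair of helpers that derive the update path and body from a table object you already fetched. The helper names are hypothetical, and the body contains only the fields that appear in the sample request in this topic (name, type, version, and storagePolicy); whether Polaris requires additional fields for your table is not covered here.

```python
from urllib.parse import quote

def update_path(project_id, table):
    """The table name goes in the path, URL-encoded in case it has spaces."""
    return f"/v1/projects/{project_id}/tables/{quote(table['name'], safe='')}"

def update_payload(table, storage_policy):
    """Carry the current version into the body alongside the new policy."""
    return {
        "name": table["name"],
        "type": table["type"],
        "version": table["version"],
        "storagePolicy": storage_policy,
    }

# Assumed current state of the table, as returned by an earlier GET request.
current = {"name": "Koalas", "type": "aggregate", "version": 0}
new_policy = {"retain": {"type": "period", "period": "P1M"}}

print(update_path("PROJECT_ID", current))
print(update_payload(current, new_policy))
```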

Sample request

Send a PUT request to the /v1/projects/PROJECT_ID/tables/TABLE_NAME endpoint to update a table's storage policy. See the Tables v1 API documentation for more information.

curl --location --request PUT "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables/Koalas" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data '{
"name": "Koalas",
"type": "aggregate",
"version": 0,
"storagePolicy": {
"retain": {
"period": "P1M",
"type": "period"
}
}
}'

Sample response

The following example shows a successful response:

{
"schema": [
{
"dataType": "timestamp",
"type": "dimension",
"name": "__time"
}
],
"timeResolution": "millisecond",
"name": "Koalas",
"type": "aggregate",
"version": 1,
"availability": "available",
"clusteringColumns": [],
"compactionConfig": null,
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdByUser": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdOnTimestamp": "2023-08-10T23:27:51.549893Z",
"createdTimestamp": "2023-08-10T23:27:51.549893Z",
"description": null,
"id": "0189e1c7-22fd-7ea3-addd-a1f06705afa0",
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastUpdateTimestamp": "2023-08-10T23:50:57.554676548Z",
"modifiedByUser": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"modifiedOnTimestamp": "2023-08-10T23:50:57.554676548Z",
"partitioningGranularity": "day",
"queryableSchema": [],
"storagePolicy": {
"retain": {
"period": "P1M",
"type": "period"
}
},
"schemaMode": "flexible",
"segmentCompactedBytes": 0,
"segmentTotalBytes": 0,
"totalDataSizeBytes": 0,
"totalRows": 0
}

View storage usage

Issue a GET request to the /v1/projects/PROJECT_ID/tables/TABLE_NAME endpoint to view the amount of precached data in a table. Compare this size to the total data size to determine how much data resides only in deep storage.

Sample request

The following example shows how to get details for a table. Replace TABLE_NAME with the name of your table.

curl --location --request GET "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables/TABLE_NAME" \
--header "Authorization: Basic $POLARIS_API_KEY"

Sample response

The following example shows a successful response.

{
"queryGranularity": null,
"schema": [
{
"dataType": "timestamp",
"type": "dimension",
"name": "__time"
},
{
"dataType": "long",
"queryAggregator": "sum",
"type": "measure",
"name": "__count"
}
],
"timeResolution": "millisecond",
"name": "size",
"type": "aggregate",
"version": 4,
"availability": "available",
"clusteringColumns": [],
"compactionConfig": null,
"createdByUser": {
"username": "polaris.user@example.com",
"userId": "2703fa02-8360-49f6-8e0b-e3501701434c"
},
"createdOnTimestamp": "2024-08-04T22:04:17.372331Z",
"description": null,
"id": "01911f6c-005c-72f0-8bdd-7f428fa813e6",
"modifiedByUser": {
"username": "polaris.user@example.com",
"userId": "2703fa02-8360-49f6-8e0b-e3501701434c"
},
"modifiedOnTimestamp": "2024-08-05T20:34:50.329468Z",
"partitioningGranularity": "day",
"queryableSchema": [
{
"dataType": "timestamp",
"isDiscovered": false,
"name": "__time",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": false,
"name": "__count",
"isAggregation": true
},
{
"dataType": "string",
"isDiscovered": true,
"name": "diffUrl",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "isRobot",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "added",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "delta",
"isAggregation": false
},
{
"dataType": "string",
"isDiscovered": true,
"name": "flags",
"isAggregation": false
},
{
"dataType": "string",
"isDiscovered": true,
"name": "channel",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "isUnpatrolled",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "isNew",
"isAggregation": false
},
{
"dataType": "double",
"isDiscovered": true,
"name": "deltaBucket",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "isMinor",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "deleted",
"isAggregation": false
},
{
"dataType": "string",
"isDiscovered": true,
"name": "namespace",
"isAggregation": false
},
{
"dataType": "string",
"isDiscovered": true,
"name": "comment",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "commentLength",
"isAggregation": false
},
{
"dataType": "string",
"isDiscovered": true,
"name": "page",
"isAggregation": false
},
{
"dataType": "string",
"isDiscovered": true,
"name": "user",
"isAggregation": false
}
],
"storagePolicy": {
"cached": {
"intervals": [
"2020-01-01T00:00:00.0Z/2022-01-01T00:00:00.0Z"
],
"period": "P1D",
"type": "intervals"
}
},
"schemaMode": "flexible",
"segmentCompactedBytes": 875806706,
"segmentCount": 54,
"segmentTotalBytes": 961011719,
"precachedDataSizeBytes": 173011624,
"totalDataSizeBytes": 963352517,
"totalRows": 677064
}

Process the JSON output to extract the values for precachedDataSizeBytes and totalDataSizeBytes.

Calculate deep storage usage as totalDataSizeBytes - precachedDataSizeBytes.
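
As a minimal sketch of that calculation, the following Python snippet parses a response body and subtracts the two values. The function name deep_storage_only_bytes is hypothetical, and the response text is trimmed to the two relevant fields from the sample response above.

```python
import json

def deep_storage_only_bytes(table):
    """Bytes that reside only in deep storage (total minus precached)."""
    return table["totalDataSizeBytes"] - table["precachedDataSizeBytes"]

# Trimmed copy of the sample response; a real response has many more fields.
response_text = """
{
  "precachedDataSizeBytes": 173011624,
  "totalDataSizeBytes": 963352517
}
"""

table = json.loads(response_text)
print(deep_storage_only_bytes(table))  # 790340893
```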

Learn more

See the following topics for more information: