Set a storage policy by API
You can set a storage policy on a table to manage the lifecycle and query accessibility of data in the table. A retention policy is a storage policy that determines how long to retain data. A precache policy is a storage policy that determines how long to keep data precached. There are two types of policies you can use when configuring a retention or precache policy:
- A period-based policy (period), which accepts an ISO 8601 duration.
- An interval-based policy (intervals), which accepts one or more intervals in addition to an optional ISO 8601 duration.
This topic shows how to assign a storage policy to a table using the Polaris API. For general information on storage policies, see Data lifecycle management.
Project-less regional API resources have been deprecated and will be removed by the end of September 2024. See Migrate to project-scoped URL for more information.
Prerequisites
You must have an API key with the ManageTables permission.
In the examples below, the key value is stored in the variable named POLARIS_API_KEY.
See API key authentication to obtain an API key and assign permissions.
Visit Permissions reference for more information on permissions.
Create a table with a storage policy
When you create a table, you can customize how long Polaris retains or precaches data in the table.
In the table definition, specify your custom retention or precache policy in the storagePolicy property.
Retention policy
A retain-type storage policy, or retention policy, automatically deletes data with timestamps older than the specified time period or outside the specified intervals.
By default, Polaris retains all data until you delete the data (such as with a delete_data job) or drop the table (such as with a drop_table job).
To create a table with a retention policy, include the storagePolicy.retain property in the request payload.
The following example shows a retention policy that retains data for the past three months:
"storagePolicy": {
"retain": {
"type": "period",
"period": "P3M"
}
}
The following example shows a retention policy that retains data for part of 2024 (the interval 2024-01-01/2024-11-30):
"storagePolicy": {
"retain": {
"type": "intervals",
"intervals": ["2024-01-01/2024-11-30"]
}
}
When you use intervals, you must provide at least one interval. To specify multiple intervals, provide a comma-separated list, such as ["2023-01-01/2023-11-30", "2024-01-01/2024-11-30", ...].
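The interval rules above can be checked before you send a request. The following is a minimal sketch, not part of the Polaris API: the helper name intervals_policy is hypothetical, and it assumes intervals use the date-only START/END form shown in the examples.

```python
from datetime import date


def intervals_policy(intervals, period=None):
    """Build an intervals-type policy body after basic client-side checks.

    Hypothetical helper: assumes each interval is 'YYYY-MM-DD/YYYY-MM-DD'.
    """
    if not intervals:
        raise ValueError("an intervals policy needs at least one interval")
    for interval in intervals:
        start, sep, end = interval.partition("/")
        if not sep:
            raise ValueError(f"missing '/' in interval: {interval!r}")
        # date.fromisoformat raises ValueError on malformed dates.
        if date.fromisoformat(start) >= date.fromisoformat(end):
            raise ValueError(f"start must precede end: {interval!r}")
    policy = {"type": "intervals", "intervals": list(intervals)}
    if period is not None:
        # Optional ISO 8601 duration, passed through unchanged.
        policy["period"] = period
    return policy


storage_policy = {
    "retain": intervals_policy(["2023-01-01/2023-11-30", "2024-01-01/2024-11-30"])
}
```

The resulting storage_policy dictionary can be placed under the storagePolicy key of a table definition.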
Optionally, you can combine a period with intervals. The following example shows a retention policy that retains data for part of 2023 (the interval 2023-01-01/2023-11-30) and data from the last three months:
"storagePolicy": {
"retain": {
"type": "intervals",
"intervals": ["2023-01-01/2023-11-30"],
"period": "P3M"
}
}
Polaris retains any data that falls within either the intervals or the period.
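To illustrate the "either the intervals or the period" rule, here is a small sketch that tests whether a timestamp would be covered by such a policy. It is an illustration only, not how Polaris evaluates policies. It assumes intervals are half-open (the end date is excluded), takes the period as a timedelta rather than parsing ISO 8601, and approximates P3M as 90 days. The function name is_covered is hypothetical.

```python
from datetime import datetime, timedelta


def is_covered(ts, now, intervals=(), period=None):
    """Return True if ts is inside any interval or newer than now - period."""
    for interval in intervals:
        start, end = (datetime.fromisoformat(part) for part in interval.split("/"))
        if start <= ts < end:  # assumption: the interval end is exclusive
            return True
    return period is not None and ts >= now - period


now = datetime(2024, 6, 1)
intervals = ["2023-01-01/2023-11-30"]
three_months = timedelta(days=90)  # rough stand-in for P3M

print(is_covered(datetime(2023, 5, 1), now, intervals, three_months))  # inside the interval
print(is_covered(datetime(2024, 5, 1), now, intervals, three_months))  # inside the period
print(is_covered(datetime(2024, 1, 1), now, intervals, three_months))  # in neither
```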
Precache policy
With a cached-type storage policy, or precache policy, Polaris precaches data within the specified time period.
Data outside this time period resides only in deep storage and must be queried asynchronously.
The default behavior in Polaris is to keep all data precached.
If the time period in your precache policy does not encompass any of the data in the table, no data is precached. You will not be able to query any data in the table if no data is precached. Ensure your precache policy covers at least a portion of data in the table.
To create a table with a precache policy, include the storagePolicy.cached property in the request payload.
The following example shows a precache policy to precache data for the last month:
"storagePolicy": {
"cached": {
"type": "period",
"period": "P1M"
}
}
The following example shows a precache policy to precache data for the interval 2024-01-01/2024-11-30:
"storagePolicy": {
"cached": {
"type": "intervals",
"intervals": ["2024-01-01/2024-11-30"]
}
}
You can provide a combination of one or more intervals and an optional period for precache policies.
The following example shows a precache policy to precache data from part of 2023 (the interval 2023-01-01/2023-11-30) and the last three months:
"storagePolicy": {
"cached": {
"type": "intervals",
"intervals": ["2023-01-01/2023-11-30"],
"period": "P3M"
}
}
Polaris precaches any data that falls within either the intervals or the period.
Retention and precache policy
You can set both retention and precache policies simultaneously when creating a table.
The following example shows a storage policy definition to retain data for the past three months and precache data for the last month:
"storagePolicy": {
"retain": {
"type": "period",
"period": "P3M"
},
"cached": {
"type": "period",
"period": "P1M"
}
}
A precache policy implies retention: Polaris retains all precached data, regardless of the time range of the retention policy.
Use the /query/sql/statements endpoint to submit an asynchronous query that accesses data outside the precache period and within the retention period.
For example, with a P3M retention period and a P1M precache period, you must use an asynchronous query to access data older than one month but within the last three months.
Queries that use the /query/sql endpoint access precached data only.
For optimal performance, ensure that you precache data that is regularly accessed, and query the data using /query/sql.
To learn more, see Query data in deep storage.
Precache policies and retention policies don't need to overlap, so you can create policies that fit your storage and query performance requirements. For example, consider a retention policy that specifies the period P90D and a precache policy that specifies the earlier time interval 2022-01-01/2023-01-01. Since all precached data is retained regardless of your retention policy, the data for the interval is both precached and retained.
With both policies set, Polaris manages the data as follows:
- Retain but do not precache data for the last 90 days. You can query this data from deep storage using the /query/sql/statements endpoint.
- Retain and precache all data from the year 2022. You can query this data synchronously using the /query/sql endpoint as well as from deep storage using the /query/sql/statements endpoint.
The policy would look like this:
"storagePolicy": {
"retain": {
"type": "period",
"period": "P90D"
},
"cached": {
"type": "intervals",
"intervals": ["2022-01-01/2023-01-01"]
}
}
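As a rough illustration of how this pair of policies partitions data, the sketch below classifies a timestamp as precached, in deep storage only, or not retained. The function name classify and the returned labels are hypothetical, the interval end is assumed to be exclusive, and the logic only mirrors the behavior described above rather than anything Polaris exposes.

```python
from datetime import datetime, timedelta

RETAIN_PERIOD = timedelta(days=90)   # retention policy: P90D
CACHED_START = datetime(2022, 1, 1)  # precache interval: 2022-01-01/2023-01-01
CACHED_END = datetime(2023, 1, 1)


def classify(ts, now):
    """Classify a timestamp under the example retention and precache policies."""
    if CACHED_START <= ts < CACHED_END:
        # Precached data is retained regardless of the retention policy.
        return "precached"
    if ts >= now - RETAIN_PERIOD:
        # Retained but outside the precache interval: asynchronous queries only.
        return "deep storage only"
    return "not retained"


now = datetime(2024, 6, 1)
print(classify(datetime(2022, 6, 15), now))  # precached
print(classify(datetime(2024, 5, 15), now))  # deep storage only
print(classify(datetime(2023, 6, 1), now))   # not retained
```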
Sample request
Send a POST request to the /v1/projects/PROJECT_ID/tables endpoint to create a new table with a storage policy.
For more information on creating tables, see Create a table by API.
- cURL
- Python
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data '{
"name": "Koalas Retention",
"type": "detail",
"storagePolicy": {
"retain": {
"period": "P3M",
"type": "period"
}
}
}'
import os
import requests
import json
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables"
apikey = os.getenv("POLARIS_API_KEY")
payload = json.dumps({
"name": "Koalas Retention",
"type": "detail",
"storagePolicy": {
"retain": {
"period": "P3M",
"type": "period"
}
}
})
headers = {
'Authorization': f'Basic {apikey}',
'Content-Type': 'application/json'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Sample response
The following example shows a successful response:
{
"schema": [
{
"name": "__time",
"dataType": "timestamp"
}
],
"name": "Koalas Retention",
"type": "detail",
"version": 0,
"availability": "available",
"clusteringColumns": [],
"compactionConfig": null,
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdByUser": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdOnTimestamp": "2023-08-10T23:43:01.543495184Z",
"createdTimestamp": "2023-08-10T23:43:01.543495184Z",
"description": null,
"id": "0189e1d5-05a6-7015-bb2c-de10182c7f03",
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastUpdateTimestamp": "2023-08-10T23:43:01.543499193Z",
"modifiedByUser": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"modifiedOnTimestamp": "2023-08-10T23:43:01.543499193Z",
"partitioningGranularity": "day",
"queryableSchema": [],
"storagePolicy": {
"retain": {
"period": "P3M",
"type": "period"
}
},
"schemaMode": "flexible",
"segmentCompactedBytes": 0,
"segmentTotalBytes": 0,
"totalDataSizeBytes": 0,
"totalRows": 0
}
Add or remove a storage policy
You can add or remove a storage policy from a table that already contains data. When you update the retention period on a table to a longer time period, Polaris does not recover previously deleted data.
To restore the default behavior in Polaris, remove the custom storage policies from the table. By default, Polaris retains all data forever and precaches all retained data.
The following storage policy example restores the default retention and precache behavior:
"storagePolicy": {}
The following storage policy example removes the custom precache policy and keeps the three-month retention policy:
"storagePolicy": {
"cached": null,
"retain": {
"period": "P3M",
"type": "period"
}
}
As a result, Polaris retains and precaches the past three months of data.
When sending a PUT request to update a table, keep in mind the following differences from creating a table:
- Supply the table name as a path parameter.
- Include version in the request body.
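One way to satisfy both points is to derive the update body from the table definition you already have. The sketch below is an assumption-laden illustration: the helper build_update_payload is hypothetical, it copies only the fields that appear in the sample request (name, type, version, storagePolicy), and it assumes the version to send is the one returned when you last fetched the table.

```python
import json


def build_update_payload(table, storage_policy):
    """Build a PUT body from an existing table definition and a new policy."""
    return {
        "name": table["name"],
        "type": table["type"],
        # Assumption: send the version from the most recent GET response.
        "version": table["version"],
        "storagePolicy": storage_policy,
    }


current = {"name": "Koalas", "type": "aggregate", "version": 0}
payload = build_update_payload(
    current, {"retain": {"type": "period", "period": "P1M"}}
)
# The table name also goes in the URL path: .../tables/Koalas
print(json.dumps(payload, indent=2))
```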
Sample request
Send a PUT request to the /v1/projects/PROJECT_ID/tables/TABLE_NAME endpoint to update a table's storage policy.
See the Tables v1 API documentation for more information.
- cURL
- Python
curl --location --request PUT "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables/Koalas" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data '{
"name": "Koalas",
"type": "aggregate",
"version": 0,
"storagePolicy": {
"retain": {
"period": "P1M",
"type": "period"
}
}
}'
import os
import requests
import json
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables/Koalas"
apikey = os.getenv("POLARIS_API_KEY")
payload = json.dumps({
"name": "Koalas",
"type": "aggregate",
"version": 0,
"storagePolicy": {
"retain": {
"period": "P1M",
"type": "period"
}
}
})
headers = {
'Authorization': f'Basic {apikey}',
'Content-Type': 'application/json'
}
response = requests.request("PUT", url, headers=headers, data=payload)
print(response.text)
Sample response
The following example shows a successful response:
{
"schema": [
{
"dataType": "timestamp",
"type": "dimension",
"name": "__time"
}
],
"timeResolution": "millisecond",
"name": "Koalas",
"type": "aggregate",
"version": 1,
"availability": "available",
"clusteringColumns": [],
"compactionConfig": null,
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdByUser": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdOnTimestamp": "2023-08-10T23:27:51.549893Z",
"createdTimestamp": "2023-08-10T23:27:51.549893Z",
"description": null,
"id": "0189e1c7-22fd-7ea3-addd-a1f06705afa0",
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastUpdateTimestamp": "2023-08-10T23:50:57.554676548Z",
"modifiedByUser": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"modifiedOnTimestamp": "2023-08-10T23:50:57.554676548Z",
"partitioningGranularity": "day",
"queryableSchema": [],
"storagePolicy": {
"retain": {
"period": "P1M",
"type": "period"
}
},
"schemaMode": "flexible",
"segmentCompactedBytes": 0,
"segmentTotalBytes": 0,
"totalDataSizeBytes": 0,
"totalRows": 0
}
View storage usage
Issue a GET request to the /v1/projects/PROJECT_ID/tables/TABLE_NAME endpoint to view the amount of precached data in a table.
Compare the precached size with the total data size to determine how much data resides only in deep storage.
Sample request
The following example shows how to get details for a table.
Replace TABLE_NAME with the name of your table.
- cURL
- Python
curl --location --request GET "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables/TABLE_NAME" \
--header "Authorization: Basic $POLARIS_API_KEY"
import os
import requests
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables/TABLE_NAME"
apikey = os.getenv("POLARIS_API_KEY")
headers = {
'Authorization': f'Basic {apikey}'
}
response = requests.request("GET", url, headers=headers)
print(response.text)
Sample response
The following example shows a successful response.
{
"queryGranularity": null,
"schema": [
{
"dataType": "timestamp",
"type": "dimension",
"name": "__time"
},
{
"dataType": "long",
"queryAggregator": "sum",
"type": "measure",
"name": "__count"
}
],
"timeResolution": "millisecond",
"name": "size",
"type": "aggregate",
"version": 4,
"availability": "available",
"clusteringColumns": [],
"compactionConfig": null,
"createdByUser": {
"username": "polaris.user@example.com",
"userId": "2703fa02-8360-49f6-8e0b-e3501701434c"
},
"createdOnTimestamp": "2024-08-04T22:04:17.372331Z",
"description": null,
"id": "01911f6c-005c-72f0-8bdd-7f428fa813e6",
"modifiedByUser": {
"username": "polaris.user@example.com",
"userId": "2703fa02-8360-49f6-8e0b-e3501701434c"
},
"modifiedOnTimestamp": "2024-08-05T20:34:50.329468Z",
"partitioningGranularity": "day",
"queryableSchema": [
{
"dataType": "timestamp",
"isDiscovered": false,
"name": "__time",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": false,
"name": "__count",
"isAggregation": true
},
{
"dataType": "string",
"isDiscovered": true,
"name": "diffUrl",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "isRobot",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "added",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "delta",
"isAggregation": false
},
{
"dataType": "string",
"isDiscovered": true,
"name": "flags",
"isAggregation": false
},
{
"dataType": "string",
"isDiscovered": true,
"name": "channel",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "isUnpatrolled",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "isNew",
"isAggregation": false
},
{
"dataType": "double",
"isDiscovered": true,
"name": "deltaBucket",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "isMinor",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "deleted",
"isAggregation": false
},
{
"dataType": "string",
"isDiscovered": true,
"name": "namespace",
"isAggregation": false
},
{
"dataType": "string",
"isDiscovered": true,
"name": "comment",
"isAggregation": false
},
{
"dataType": "long",
"isDiscovered": true,
"name": "commentLength",
"isAggregation": false
},
{
"dataType": "string",
"isDiscovered": true,
"name": "page",
"isAggregation": false
},
{
"dataType": "string",
"isDiscovered": true,
"name": "user",
"isAggregation": false
}
],
"storagePolicy": {
"cached": {
"intervals": [
"2020-01-01T00:00:00.0Z/2022-01-01T00:00:00.0Z"
],
"period": "P1D",
"type": "intervals"
}
},
"schemaMode": "flexible",
"segmentCompactedBytes": 875806706,
"segmentCount": 54,
"segmentTotalBytes": 961011719,
"precachedDataSizeBytes": 173011624,
"totalDataSizeBytes": 963352517,
"totalRows": 677064
}
Process the JSON output to extract the values for precachedDataSizeBytes and totalDataSizeBytes.
Calculate deep storage usage as totalDataSizeBytes - precachedDataSizeBytes.
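The calculation can be sketched in Python as follows. The function name deep_storage_only_bytes is hypothetical, and the input is trimmed to the two fields from the sample response that matter here.

```python
import json


def deep_storage_only_bytes(table):
    """Return the number of bytes that reside only in deep storage."""
    return table["totalDataSizeBytes"] - table["precachedDataSizeBytes"]


# Two fields copied from the sample response above.
response_text = '{"precachedDataSizeBytes": 173011624, "totalDataSizeBytes": 963352517}'
table = json.loads(response_text)
print(deep_storage_only_bytes(table))  # 790340893
```

In practice you would pass the parsed body of the GET response, for example response.json() when using the requests library.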
Learn more
See the following topics for more information:
- Data lifecycle management for details about storage policies in Polaris.
- Delete data by API for using the Polaris API to delete data and tables.