Restore data by API
You can use the Imply Polaris API to restore the following data deleted from a table as long as it is within the grace period:
- data removed with a soft delete operation
- data expired according to the table's retention policy.
Imply reserves the right to delete data before the end of the 30-day grace period, if needed.
For information on recovering data using the UI, see Manage deleted data.
Project-less regional API resources have been deprecated and will be removed by the end of September 2024. See Migrate to project-scoped URL for more information.
Prerequisites
This topic assumes that you have the following:
- A table that has data soft deleted within the past 30 days.
- An API key with the ManageIngestionJobs and ManageTables permissions. In the examples below, the key value is stored in the variable named POLARIS_API_KEY. To obtain an API key and assign permissions, see API key authentication. For more information on permissions, visit Permissions reference.
Restore data
Data restoration is a job in Polaris in which the job type is restore_data.
The job request takes a root property, interval, which is an ISO 8601 time interval of the data to recover.
While you can delete multiple intervals by providing an array of them, you can only restore a single interval at a time. For data that spans multiple time intervals, create a separate job for each one.
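Because each restore job accepts only one interval, restoring several ranges means building one request body per interval. The following is a minimal sketch of that loop, assuming the same payload shape as the sample request in this topic. The helper build_restore_payloads is hypothetical, not part of the Polaris API, and the second interval value is illustrative.

```python
import json


def build_restore_payloads(table_name, intervals):
    """Return one restore_data job payload per ISO 8601 interval.

    Hypothetical helper: it only builds request bodies, it sends nothing.
    """
    payloads = []
    for interval in intervals:
        # Basic sanity check: an interval is written as "start/end".
        start, sep, end = interval.partition("/")
        if not (start and sep and end):
            raise ValueError(f"Not a start/end interval: {interval!r}")
        payloads.append({
            "type": "restore_data",
            "target": {"type": "table", "tableName": table_name},
            "interval": interval,
        })
    return payloads


payloads = build_restore_payloads(
    "demo_table",
    ["2022-07-01/2022-08-01", "2022-10-01/2022-11-01"],
)
for payload in payloads:
    # Each payload would be sent as its own POST to the jobs endpoint.
    print(json.dumps(payload))
```

Each printed body corresponds to a separate POST request to the jobs endpoint.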
To view data that can be restored (soft deleted data), use the Tables API to send a GET request to the /unusedSegments endpoint.
Polaris returns the soft deleted data as a list of segments that are not in use.
For example, to view soft deleted data in a table called demo_table, make the following request:
GET /v1/projects/{projectId}/tables/demo_table/unusedSegments
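The same request can be sketched in Python. This version uses the standard library's urllib.request rather than the requests package used in the other samples, and it reuses their Basic authorization header. The base URL and PROJECT_ID are placeholders to replace, and both helper functions are hypothetical. Because the shape of the response body is not described here, the sketch returns it as raw text.

```python
import os
import urllib.request
from urllib.parse import quote

# Placeholder host, as in the other samples; replace with your own values.
BASE_URL = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io"


def unused_segments_url(project_id, table_name):
    """Build the unusedSegments URL for a table (path parts are URL-escaped)."""
    return (
        f"{BASE_URL}/v1/projects/{quote(project_id, safe='')}"
        f"/tables/{quote(table_name, safe='')}/unusedSegments"
    )


def fetch_unused_segments(project_id, table_name, apikey):
    """Send the GET request and return the raw response body as text."""
    request = urllib.request.Request(
        unused_segments_url(project_id, table_name),
        headers={"Authorization": f"Basic {apikey}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    apikey = os.getenv("POLARIS_API_KEY")
    if apikey:
        print(fetch_unused_segments("PROJECT_ID", "demo_table", apikey))
    else:
        # Without a key, show the URL that would be requested.
        print("Would send GET", unused_segments_url("PROJECT_ID", "demo_table"))
```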
There's an optional versions field where you can provide a single string that specifies the iteration of data you want to restore. In the versioned request example later in this topic, that string is wrapped in an array. A segment version in Polaris can contain multiple iterations of the same data depending on multiple factors.
Sample request
The following example shows a restore_data job that recovers data for the time interval 2022-07-01/2022-08-01. It recovers the most recent iteration of the data in that interval since there's no version specified.
See the Jobs v1 API documentation for more information.
cURL

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
  --user ${POLARIS_API_KEY}: \
  --header 'Content-Type: application/json' \
  --data '{
    "type": "restore_data",
    "target": {
      "type": "table",
      "tableName": "demo_table"
    },
    "interval": "2022-07-01/2022-08-01"
  }'

Python

import os
import requests
import json

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs"
apikey = os.getenv("POLARIS_API_KEY")

payload = json.dumps({
    "type": "restore_data",
    "target": {
        "type": "table",
        "tableName": "demo_table",
    },
    "interval": "2022-07-01/2022-08-01"
})
headers = {
    'Authorization': f'Basic {apikey}',
    'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Sample response
The following example shows a successful response:
{
  "deleteAll": false,
  "createdBy": {
    "username": "api-key-pok_vipgj...bjjvyo",
    "userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
  },
  "createdTimestamp": "2023-08-10T00:14:21.188501674Z",
  "desiredExecutionStatus": "running",
  "executionStatus": "pending",
  "health": {
    "status": "ok"
  },
  "id": "0189dccb-5804-7bf9-bbbb-6ac085b71b44",
  "lastModifiedBy": {
    "username": "api-key-pok_vipgj...bjjvyo",
    "userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
  },
  "lastUpdatedTimestamp": "2023-08-10T00:14:21.188501674Z",
  "spec": {
    "target": {
      "tableName": "demo_table",
      "type": "table",
      "intervals": [
        "2022-07-01/2022-08-01"
      ]
    },
    "deleteAll": false,
    "type": "restore_data",
    "desiredExecutionStatus": "running"
  },
  "target": {
    "tableName": "demo_table",
    "type": "table",
    "intervals": []
  },
  "type": "restore_data",
  "completedTimestamp": null,
  "startedTimestamp": null
}
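When you script a restore, the fields you are most likely to need from this response are the job id and its executionStatus. The sketch below pulls them from a body shaped like the sample response. The helper summarize_job is hypothetical, and the trimmed JSON string stands in for response.text from the request examples.

```python
import json


def summarize_job(response_text):
    """Extract the fields most useful for tracking a restore job."""
    job = json.loads(response_text)
    # "or {}" guards against fields that are missing or null.
    health = job.get("health") or {}
    spec_target = (job.get("spec") or {}).get("target") or {}
    return {
        "id": job["id"],
        "type": job["type"],
        "executionStatus": job["executionStatus"],
        "health": health.get("status"),
        "intervals": spec_target.get("intervals", []),
    }


# Trimmed copy of the sample response, standing in for response.text.
sample = """{
  "executionStatus": "pending",
  "health": {"status": "ok"},
  "id": "0189dccb-5804-7bf9-bbbb-6ac085b71b44",
  "spec": {"target": {"tableName": "demo_table", "type": "table",
                      "intervals": ["2022-07-01/2022-08-01"]}},
  "type": "restore_data"
}"""

summary = summarize_job(sample)
print(summary["id"], summary["executionStatus"])
```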
Restore an older version of data that's in use
For general information about data partitioning and versions, see Segment generation.
If you want to restore a version of data that has a higher version in use, you need to first soft delete the newer overlapping data version that's in use.
Then, you can restore the earlier version as usual.
A common scenario that requires this process is restoring segments that have been replaced. Polaris soft deletes older versions of segments when you replace data. If you restore the lower version while a higher version is in use, Polaris automatically soft deletes the lower version again, which is why the higher version segment has to be soft deleted first.
Consider the following example in which you ingested data from 2022. You perform the ingestion on January 1, 2024. Then you replace the data on February 1, 2024. You perform a second replace of the data on March 1, 2024.
Your table has the following segments:
- v0 (2024-01-01T22:01:31.100Z), which was soft deleted after you used a replace job to load v1
- v1 (2024-02-01T23:01:31.100Z), which was soft deleted after you used a replace job to load v2
- v2 (2024-03-01T00:01:31.100Z), which is the version that's in use
If you try to restore the v1 segment at this point, Polaris automatically soft deletes it again as soon as it is restored. This occurs because the active v2 segment is more recent. To avoid that, soft delete the v2 segment first. Then, restore the v1 segment.
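The ordering rule in this example can be checked before you submit a restore job. The sketch below is illustrative only: it assumes version identifiers are ISO 8601 timestamps like the ones listed for v0 through v2, and versions_to_soft_delete is a hypothetical helper, not a Polaris API call.

```python
from datetime import datetime


def parse_version(version):
    """Parse a version timestamp such as 2024-02-01T23:01:31.100Z."""
    # Swap the trailing Z for an explicit offset so fromisoformat()
    # also accepts the value on older Python 3 releases.
    return datetime.fromisoformat(version.replace("Z", "+00:00"))


def versions_to_soft_delete(target_version, in_use_versions):
    """Return the in-use versions newer than the version to restore."""
    target = parse_version(target_version)
    return [v for v in in_use_versions if parse_version(v) > target]


v1 = "2024-02-01T23:01:31.100Z"
v2 = "2024-03-01T00:01:31.100Z"

# v2 is in use and newer than v1, so it must be soft deleted first.
print(versions_to_soft_delete(v1, [v2]))
```

An empty list means no in-use version is newer than the target, so the restore can proceed directly.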
The following example restores v1, the 2024-02-01T23:01:31.100Z version. Remember to soft delete the existing data for that interval if it's more recent:
cURL

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
  --user ${POLARIS_API_KEY}: \
  --header 'Content-Type: application/json' \
  --data '{
    "type": "restore_data",
    "target": {
      "type": "table",
      "tableName": "demo_table"
    },
    "interval": "2022-01-01T00:00:00Z/2023-01-01T00:00:00Z",
    "versions": ["2024-02-01T23:01:31.100Z"]
  }'

Python

import os
import requests
import json

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs"
apikey = os.getenv("POLARIS_API_KEY")

payload = json.dumps({
    "type": "restore_data",
    "target": {
        "type": "table",
        "tableName": "demo_table",
    },
    "interval": "2022-01-01/2023-01-01",
    "versions": ["2024-02-01T23:01:31.100Z"]
})
headers = {
    'Authorization': f'Basic {apikey}',
    'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Learn more
See the following topics for more information:
- Manage deleted data for recovering data using the UI.
- Set a storage policy by API for configuring a table's retention policy.