Restore data by API
You can use the Imply Polaris API to restore the following kinds of data deleted from a table, as long as the data is still within the 30-day grace period:
- data removed with a soft delete operation
- data expired according to the table's retention policy.
Imply reserves the right to delete data before the end of the 30-day grace period, if needed.
For information on recovering data using the UI, see Manage deleted data.
Prerequisites
This topic assumes that you have the following:
- A table that has data soft deleted within the past 30 days.
- An API key with the ManageIngestionJobs and ManageTables permissions. In the examples below, the key value is stored in the environment variable named POLARIS_API_KEY. To obtain an API key and assign permissions, see API key authentication. For more information on permissions, visit Permissions reference.
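Because restoration only works inside the grace period, it can help to check how long ago the data was deleted before you submit a job. The following Python sketch uses a hypothetical helper, within_grace_period, that is not part of any Polaris SDK. It compares a timezone-aware UTC deletion timestamp that you supply against the 30-day window. Since Imply may delete data earlier, a True result means the data may still be restorable, not that it is guaranteed to be.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Grace period stated in this topic. Imply may delete data sooner, so a True
# result means "may still be restorable", not "guaranteed restorable".
GRACE_PERIOD = timedelta(days=30)


def within_grace_period(deleted_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if deleted_at is no more than 30 days before now.

    Hypothetical helper: both datetimes must be timezone-aware (UTC).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - deleted_at <= GRACE_PERIOD


if __name__ == "__main__":
    now = datetime(2024, 3, 31, tzinfo=timezone.utc)
    print(within_grace_period(datetime(2024, 3, 10, tzinfo=timezone.utc), now))  # True
    print(within_grace_period(datetime(2024, 2, 1, tzinfo=timezone.utc), now))   # False
```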
View soft deleted data
To view data that can be restored (soft deleted data), use the Tables API to send a GET request to the /unusedSegments endpoint.
Polaris returns the soft deleted data as a list of segments that are not in use.
For example, to view soft deleted data in a table called demo_table, make the following request:
GET /v1/projects/{projectId}/tables/demo_table/unusedSegments
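The same request can be sketched in Python. This version uses only the standard library (urllib) instead of requests, and it follows the Python examples in this topic by sending the key in an Authorization: Basic header. The helper names unused_segments_url and list_unused_segments are hypothetical, the host and project ID are the same placeholders used elsewhere in this topic, and the sketch returns the parsed JSON without assuming any particular response fields.

```python
import json
import os
import urllib.parse
import urllib.request

# Placeholders copied from this topic's examples; replace with your own values.
BASE_URL = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io"
PROJECT_ID = "PROJECT_ID"


def unused_segments_url(table_name: str, project_id: str = PROJECT_ID) -> str:
    """Build the Tables API URL that lists soft deleted (unused) segments."""
    table = urllib.parse.quote(table_name, safe="")
    return f"{BASE_URL}/v1/projects/{project_id}/tables/{table}/unusedSegments"


def list_unused_segments(table_name: str) -> object:
    """Send the GET request and return the parsed JSON body unchanged.

    No response shape is assumed; inspect the result before relying on fields.
    """
    request = urllib.request.Request(
        unused_segments_url(table_name),
        headers={"Authorization": f"Basic {os.environ['POLARIS_API_KEY']}"},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Only call the API when a key is configured.
if __name__ == "__main__" and "POLARIS_API_KEY" in os.environ:
    print(json.dumps(list_unused_segments("demo_table"), indent=2))
```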
Restore data
Data restoration is a job in Polaris with the job type restore_data.
In the request body, provide the name of the target table and an ISO 8601 time interval of the data to recover.
The optional versions field takes a single version string, wrapped in an array, that specifies the iteration of data you want to restore. Polaris can hold multiple iterations (versions) of the data for the same time interval, depending on several factors, such as replacing data.
You can only restore a single interval at a time. This contrasts with data deletion jobs in which you can provide an array of multiple intervals. To restore data over multiple time intervals, create a separate job for each one.
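To restore several intervals, you can build one request body per interval and submit each as its own job. The following Python sketch only builds the bodies; restore_job_payload and restore_job_payloads are hypothetical helper names, and each body would then be sent to the jobs endpoint as in the sample request. The optional versions field is written as a single-element array, matching the examples in this topic.

```python
import json
from typing import List, Optional


def restore_job_payload(table_name: str, interval: str,
                        version: Optional[str] = None) -> dict:
    """Build a restore_data job body for a single ISO 8601 interval."""
    payload = {
        "type": "restore_data",
        "target": {"type": "table", "tableName": table_name},
        "interval": interval,  # exactly one interval per restore job
    }
    if version is not None:
        # The examples in this topic pass one version inside an array.
        payload["versions"] = [version]
    return payload


def restore_job_payloads(table_name: str, intervals: List[str]) -> List[dict]:
    """Restoring several intervals takes one job, so one body, per interval."""
    return [restore_job_payload(table_name, interval) for interval in intervals]


if __name__ == "__main__":
    for body in restore_job_payloads(
        "demo_table", ["2022-07-01/2022-08-01", "2022-09-01/2022-10-01"]
    ):
        print(json.dumps(body))
```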
Segment granularity
If the interval doesn’t align with the granularity of existing segments, Polaris attempts to restore the entirety of any segment that overlaps the interval. This effectively widens the specified time interval.
For example, if you specify a one-hour time interval but your data is stored with day granularity, Polaris restores the entire day of data.
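The widening can be illustrated with a small calculation. This Python sketch is not Polaris code; it assumes naive datetimes in UTC and day-granularity segments, and widen_to_day_segments is a hypothetical name. It rounds the start down and the end up to day boundaries, mirroring the span described above.

```python
from datetime import datetime, timedelta
from typing import Tuple


def widen_to_day_segments(start: datetime, end: datetime) -> Tuple[datetime, datetime]:
    """Return the day-aligned [start, end) span covering the requested interval.

    Illustrates the widening described above; it is not Polaris's own code.
    """
    day_start = start.replace(hour=0, minute=0, second=0, microsecond=0)
    day_end = end.replace(hour=0, minute=0, second=0, microsecond=0)
    if day_end < end:
        # The interval ends partway through a day, so that whole day is included.
        day_end += timedelta(days=1)
    return day_start, day_end


if __name__ == "__main__":
    # A one-hour request on 2022-07-15 widens to the full day.
    s, e = widen_to_day_segments(datetime(2022, 7, 15, 10), datetime(2022, 7, 15, 11))
    print(f"{s.isoformat()}/{e.isoformat()}")  # 2022-07-15T00:00:00/2022-07-16T00:00:00
```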
Data restoration requires that the table doesn't already contain active data for the time period being restored. Such an overlap may not be apparent when the segment identified for restoration spans a coarser time frame than the interval provided in the job.
For example, consider a table with month-granularity segments that contains active data for March 2024 as well as for January 1, 2024.
The table has soft deleted data for January and February 2024.
Suppose you try to restore data using the interval "2024-01-02/2024-03-01", representing January 2 up until March 1.
Although the restoration interval appears not to overlap any existing data in the table, both soft deleted segments are identified for restoration, because the month-granularity January segment covers January 1 as well as the rest of the month.
However, the existing January 1 data prevents Polaris from recovering the entirety of the soft deleted January data.
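One way to see why the January segment conflicts is to list the month segments that the restoration interval touches and compare them with the months that still hold active data. The following Python sketch assumes month-granularity segments and half-open [start, end) intervals; months_overlapping and conflicting_months are hypothetical helpers, not part of the Polaris API.

```python
from datetime import date
from typing import List, Set, Tuple

Month = Tuple[int, int]  # (year, month)


def months_overlapping(start: date, end: date) -> List[Month]:
    """List the month segments that overlap the half-open interval [start, end)."""
    if start >= end:
        return []
    months = []
    year, month = start.year, start.month
    while date(year, month, 1) < end:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def conflicting_months(start: date, end: date, active_months: Set[Month]) -> List[Month]:
    """Month segments selected for restoration that already contain active data."""
    return [m for m in months_overlapping(start, end) if m in active_months]


if __name__ == "__main__":
    # Active data: January 1, 2024 (inside the January segment) and March 2024.
    active = {(2024, 1), (2024, 3)}
    # "2024-01-02/2024-03-01" selects the January and February segments.
    print(months_overlapping(date(2024, 1, 2), date(2024, 3, 1)))          # [(2024, 1), (2024, 2)]
    print(conflicting_months(date(2024, 1, 2), date(2024, 3, 1), active))  # [(2024, 1)]
```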
Sample request
The following example shows a restore_data job that recovers data for the time interval 2022-07-01/2022-08-01. Because no version is specified, it recovers the most recent iteration of the data in that interval.
See the Jobs v1 API documentation for more information.
- cURL
- Python
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
--user ${POLARIS_API_KEY}: \
--header 'Content-Type: application/json' \
--data '{
"type": "restore_data",
"target": {
"type": "table",
"tableName": "demo_table"
},
"interval": "2022-07-01/2022-08-01"
}'
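import os
import json
import requests
# Jobs endpoint for your project; replace the placeholders with your own values.
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs"
# API key with the ManageIngestionJobs and ManageTables permissions.
apikey = os.getenv("POLARIS_API_KEY")
# A restore_data job covers exactly one interval. Omitting "versions"
# restores the most recent iteration of the data in that interval.
payload = json.dumps({
    "type": "restore_data",
    "target": {
        "type": "table",
        "tableName": "demo_table"
    },
    "interval": "2022-07-01/2022-08-01"
})
headers = {
    'Authorization': f'Basic {apikey}',
    'Content-Type': 'application/json'
}
# Submit the job and print the job description that Polaris returns.
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)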
Sample response
The following example shows a successful response:
{
"deleteAll": false,
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdTimestamp": "2023-08-10T00:14:21.188501674Z",
"desiredExecutionStatus": "running",
"executionStatus": "pending",
"health": {
"status": "ok"
},
"id": "0189dccb-5804-7bf9-bbbb-6ac085b71b44",
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastUpdatedTimestamp": "2023-08-10T00:14:21.188501674Z",
"spec": {
"target": {
"tableName": "demo_table",
"type": "table",
"intervals": [
"2022-07-01/2022-08-01"
]
},
"deleteAll": false,
"type": "restore_data",
"desiredExecutionStatus": "running"
},
"target": {
"tableName": "demo_table",
"type": "table",
"intervals": []
},
"type": "restore_data",
"completedTimestamp": null,
"startedTimestamp": null
}
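The new job starts with executionStatus set to pending. In this sample, the requested interval appears under spec.target.intervals, while the top-level target.intervals list is empty. The following Python sketch pulls out the fields you are most likely to need, such as the job id for tracking the job with the Jobs v1 API. The summarize_job helper is hypothetical, and SAMPLE_RESPONSE is a trimmed copy of the response above.

```python
import json

# Trimmed copy of the sample response above (only the fields this sketch reads).
SAMPLE_RESPONSE = """
{
  "id": "0189dccb-5804-7bf9-bbbb-6ac085b71b44",
  "type": "restore_data",
  "executionStatus": "pending",
  "desiredExecutionStatus": "running",
  "health": {"status": "ok"},
  "spec": {"target": {"tableName": "demo_table", "type": "table",
                      "intervals": ["2022-07-01/2022-08-01"]}}
}
"""


def summarize_job(response_text: str) -> dict:
    """Pull the job ID, status, health, and requested intervals from a job response."""
    job = json.loads(response_text)
    return {
        "id": job["id"],
        "type": job["type"],
        "executionStatus": job["executionStatus"],
        "health": job["health"]["status"],
        # Read the requested intervals from the spec, not the top-level target.
        "intervals": job["spec"]["target"]["intervals"],
    }


if __name__ == "__main__":
    print(summarize_job(SAMPLE_RESPONSE))
```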
Restore an older version of data that's in use
For general information about data partitioning and versions, see Segment generation.
To restore a version of data when a higher version is in use for the same interval, first soft delete the newer, overlapping version that's in use.
Then, you can restore the earlier version as usual.
A common scenario that requires this process is restoring segments that have been replaced. When you replace data, Polaris soft deletes the older versions of the affected segments. Although you can restore a lower version, Polaris automatically soft deletes it again because a higher version is in use, so you must soft delete the higher version segment before you restore the lower version segment.
Consider the following example in which you ingested data from 2022 with year granularity. You perform the ingestion on January 1, 2024. Then you replace the data on February 1, 2024, and replace it a second time on March 1, 2024.
Your table has a segment with the following versions:
- v0 (2024-01-01T22:01:31.100Z): soft deleted after you used a replace job to load v1
- v1 (2024-02-01T23:01:31.100Z): soft deleted after you used a replace job to load v2
- v2 (2024-03-01T00:01:31.100Z): the version that's in use
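The rule in this example, that any in-use version newer than the one you want must be soft deleted first, comes down to comparing version timestamps. The following Python sketch is illustrative only: newer_versions_in_use and parse_version are hypothetical helpers that work on version strings like those listed above and do not call Polaris.

```python
from datetime import datetime
from typing import List


def parse_version(version: str) -> datetime:
    """Parse a version timestamp such as 2024-02-01T23:01:31.100Z."""
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so swap in +00:00.
    return datetime.fromisoformat(version.replace("Z", "+00:00"))


def newer_versions_in_use(target: str, in_use: List[str]) -> List[str]:
    """Return in-use versions newer than target; soft delete these before restoring it."""
    target_time = parse_version(target)
    return [v for v in in_use if parse_version(v) > target_time]


if __name__ == "__main__":
    v1 = "2024-02-01T23:01:31.100Z"  # version to restore
    v2 = "2024-03-01T00:01:31.100Z"  # version currently in use
    print(newer_versions_in_use(v1, [v2]))  # ['2024-03-01T00:01:31.100Z']
```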
If you try to restore the v1 segment at this point, Polaris automatically soft deletes it again as soon as it is restored, because the active v2 segment is more recent. To avoid that, soft delete the v2 segment first. Then, restore the v1 segment.
The following example restores v1, the 2024-02-01T23:01:31.100Z version. Remember to first soft delete the existing data for that interval if it's more recent:
- cURL
- Python
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
--user ${POLARIS_API_KEY}: \
--header 'Content-Type: application/json' \
--data '{
"type": "restore_data",
"target": {
"type": "table",
"tableName": "demo_table"
},
"interval": "2022-01-01T00:00:00Z/2023-01-01T00:00:00Z",
"versions": ["2024-02-01T23:01:31.100Z"]
}'
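import os
import json
import requests
# Jobs endpoint for your project; replace the placeholders with your own values.
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs"
# API key with the ManageIngestionJobs and ManageTables permissions.
apikey = os.getenv("POLARIS_API_KEY")
# Restore the v1 iteration of the 2022 data. Soft delete any newer version
# in use for this interval first, or v1 is soft deleted again automatically.
payload = json.dumps({
    "type": "restore_data",
    "target": {
        "type": "table",
        "tableName": "demo_table"
    },
    "interval": "2022-01-01/2023-01-01",
    "versions": ["2024-02-01T23:01:31.100Z"]
})
headers = {
    'Authorization': f'Basic {apikey}',
    'Content-Type': 'application/json'
}
# Submit the job and print the job description that Polaris returns.
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)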
Learn more
See the following topics for more information:
- Manage deleted data for recovering data using the UI.
- Set a storage policy by API for configuring a table's retention policy.