Replace data by API
After you ingest data into a table, you can replace or overwrite data in the table. Replacing data is similar to creating an ingestion job, except that you also specify the time interval for the data to be replaced.
This topic guides you through replacing data in a table using the Polaris API.
For more information on how replacing data works in Imply Polaris, see Replace data.
Prerequisites
This topic assumes that you have the following:
- A table containing data. The examples in this topic continue using the
Koalas
table in Ingest from files. - Replacement data. The examples in this topic use the data in
kttm-2019-08-19-replace.json.gz
. - An API key with the
ManageIngestionJobs
permissions. In the examples below, the key value is stored in the variable namedPOLARIS_API_KEY
. For information about how to obtain an API key and assign permissions, see API key authentication. For more information on permissions, visit Permissions reference.
Replace existing data in a table
Compared to a standard ingestion job that appends data, the configurations for a replacement ingestion job differ in the following ways:
- Ingestion mode: Set
ingestionMode
toreplace
. - Range of data: Provide one or more time intervals, or replace all data:
- To replace all data, set
replaceAll
totrue
. - To replace a certain time interval of data, define
target.intervals
with an array of ISO 8601 time intervals.
- To replace all data, set
The time intervals must align with the partitioning granularity of the job, set in partitionedBy
.
If unset, job uses the same time partitioning as the table.
For example, if the job uses day
time partitioning, you can specify an eight-day interval such as ["2022-06-01/2022-06-09"]
.
However, you can't specify an eight-hour interval such as ["2022-06-01T00:00:00Z/2022-06-01T08:00:00Z"]
.
If the provided time interval doesn’t align with the granularity of existing segments in the table, Polaris replaces the entirety of any segment that overlaps the interval. Polaris removes the entire segment even if the provided interval doesn’t fully encapsulate the segment.
For example, if you specify a one-hour time interval
but your data is stored with day
granularity, Polaris replaces the entire day of data.
Use filter expressions to filter your source data, such as to limit a time range for the incoming data.
Don't confuse timestamp filters with the replacement time intervals. The time intervals in the job refer to data that already exists in the table.
View the data before the replacement
The Koalas
table contains two days of data from 2019-08-19 through 2019-08-20.
Observe the data before the replacement ingestion job by running the following SQL query:
SELECT "__time", "city" FROM "Koalas"
WHERE FLOOR("__time" to DAY) = TIMESTAMP '2019-08-19'
ORDER BY 1 ASC LIMIT 5
Before the replacement, the query returns the following result:
{"__time":"2019-08-19T00:00:00.234Z","city":"Cumming"}
{"__time":"2019-08-19T00:00:00.528Z","city":"Aracati"}
{"__time":"2019-08-19T00:00:00.673Z","city":"Pensacola"}
{"__time":"2019-08-19T00:00:01.161Z","city":"Brighton"}
{"__time":"2019-08-19T00:00:02.190Z","city":"Carlock"}
Sample request
The following example replaces the data in 2019-08-19 with the data in kttm-2019-08-19-replace.json.gz
.
The replacement data updates the value of city
to Wellington
for the entire day.
- cURL
- Python
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data '{
"type": "batch",
"target": {
"type": "table",
"tableName": "Koalas",
"intervals": [
"2019-08-19/2019-08-20"
]
},
"source": {
"type": "uploaded",
"fileList": [
"kttm-2019-08-19-replace.json.gz"
],
"inputSchema": [
{
"dataType": "string",
"name": "timestamp"
},
{
"dataType": "string",
"name": "city"
},
{
"dataType": "string",
"name": "session"
},
{
"dataType": "long",
"name": "session_length"
}
],
"formatSettings": {
"format": "nd-json"
}
},
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"timestamp\")"
},
{
"columnName": "city",
"expression": "\"city\""
},
{
"columnName": "session",
"expression": "\"session\""
},
{
"columnName": "session_length",
"expression": "\"session_length\""
}
],
"ingestionMode": "replace"
}'
import os
import requests
import json
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs"
apikey = os.getenv("POLARIS_API_KEY")
payload = json.dumps({
"type": "batch",
"target": {
"type": "table",
"tableName": "Koalas",
"intervals": [
"2019-08-19/2019-08-20"
]
},
"source": {
"type": "uploaded",
"fileList": [
"kttm-2019-08-19-replace.json.gz"
],
"inputSchema": [
{
"dataType": "string",
"name": "timestamp"
},
{
"dataType": "string",
"name": "city"
},
{
"dataType": "string",
"name": "session"
},
{
"dataType": "long",
"name": "session_length"
}
],
"formatSettings": {
"format": "nd-json"
}
},
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"timestamp\")"
},
{
"columnName": "city",
"expression": "\"city\""
},
{
"columnName": "session",
"expression": "\"session\""
},
{
"columnName": "session_length",
"expression": "\"session_length\""
}
],
"ingestionMode": "replace"
})
headers = {
'Authorization': f'Basic {apikey}',
'Content-Type': 'application/json'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Sample response
A successful request returns a 201 Created
response and the details of the ingestion job.
View the response
{
"source": {
"fileList": [
"kttm-2019-08-19-replace.json.gz"
],
"formatSettings": {
"flattenSpec": {},
"format": "nd-json"
},
"inputSchema": [
{
"dataType": "string",
"name": "timestamp"
},
{
"dataType": "string",
"name": "city"
},
{
"dataType": "string",
"name": "session"
},
{
"dataType": "long",
"name": "session_length"
}
],
"type": "uploaded"
},
"filterExpression": null,
"ingestionMode": "replace",
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"timestamp\")",
"isAggregation": null
},
{
"columnName": "city",
"expression": "\"city\"",
"isAggregation": null
},
{
"columnName": "session",
"expression": "\"session\"",
"isAggregation": null
},
{
"columnName": "session_length",
"expression": "\"session_length\"",
"isAggregation": null
}
],
"maxParseExceptions": 2147483647,
"query": "REPLACE INTO \"Koalas\"\nOVERWRITE\n WHERE (__time >= TIMESTAMP '2019-08-19 00:00:00.0' AND __time < TIMESTAMP '2019-08-20 00:00:00.0')\nSELECT\n TIME_PARSE(\"timestamp\") AS \"__time\",\n \"city\" AS \"city\",\n \"session\" AS \"session\",\n \"session_length\" AS \"session_length\"\nFROM TABLE(\n POLARIS_SOURCE(\n '{\"fileList\":[\"kttm-2019-08-19-replace.json.gz\"],\"formatSettings\":{\"flattenSpec\":{},\"format\":\"nd-json\"},\"inputSchema\":[{\"dataType\":\"string\",\"name\":\"timestamp\"},{\"dataType\":\"string\",\"name\":\"city\"},{\"dataType\":\"string\",\"name\":\"session\"},{\"dataType\":\"long\",\"name\":\"session_length\"}],\"type\":\"uploaded\"}'\n )\n)\nWHERE (TIME_IN_INTERVAL(TIME_PARSE(\"timestamp\"), '2019-08-19/2019-08-20'))\n\nPARTITIONED BY DAY",
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdTimestamp": "2023-08-15T01:27:00.445394651Z",
"desiredExecutionStatus": "running",
"executionStatus": "pending",
"health": {
"status": "ok"
},
"id": "0189f6cd-a85e-7836-8603-4576c2bd60f4",
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastUpdatedTimestamp": "2023-08-15T01:27:00.445394651Z",
"spec": {
"source": {
"fileList": [
"kttm-2019-08-19-replace.json.gz"
],
"formatSettings": {
"flattenSpec": {},
"format": "nd-json"
},
"inputSchema": [
{
"dataType": "string",
"name": "timestamp"
},
{
"dataType": "string",
"name": "city"
},
{
"dataType": "string",
"name": "session"
},
{
"dataType": "long",
"name": "session_length"
}
],
"type": "uploaded"
},
"target": {
"tableName": "Koalas",
"type": "table",
"intervals": [
"2019-08-19/2019-08-20"
]
},
"filterExpression": null,
"ingestionMode": "replace",
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"timestamp\")",
"isAggregation": null
},
{
"columnName": "city",
"expression": "\"city\"",
"isAggregation": null
},
{
"columnName": "session",
"expression": "\"session\"",
"isAggregation": null
},
{
"columnName": "session_length",
"expression": "\"session_length\"",
"isAggregation": null
}
],
"maxParseExceptions": 2147483647,
"type": "batch",
"desiredExecutionStatus": "running"
},
"target": {
"tableName": "Koalas",
"type": "table",
"intervals": []
},
"type": "batch",
"completedTimestamp": null,
"startedTimestamp": null
}
View the data after the replacement
After the replacement, the query returns the following result:
{"__time":"2019-08-19T00:00:00.234Z","city":"Wellington"}
{"__time":"2019-08-19T00:00:00.528Z","city":"Wellington"}
{"__time":"2019-08-19T00:00:00.673Z","city":"Wellington"}
{"__time":"2019-08-19T00:00:01.161Z","city":"Wellington"}
{"__time":"2019-08-19T00:00:02.190Z","city":"Wellington"}
Note that querying for TIMESTAMP '2019-08-20'
returns the same result before and after this job
because that day was not within the replacement interval.
Learn more
See the following topics for more information: