Skip to main content

Replace data by API

After you ingest data into a table, you can replace or overwrite data in the table. Replacing data is similar to creating an ingestion job, except that you also specify the time interval for the data to be replaced.

This topic guides you through replacing data in a table using the Polaris API.

For more information on how replacing data works in Imply Polaris, see Replace data.

Prerequisites

This topic assumes that you have the following:

  • A table containing data. The examples in this topic continue using the Koalas table in Ingest from files.
  • Replacement data. The examples in this topic use the data in kttm-2019-08-19-replace.json.gz.
  • An API key with the ManageIngestionJobs permissions. In the examples below, the key value is stored in the variable named POLARIS_API_KEY. For information about how to obtain an API key and assign permissions, see API key authentication. For more information on permissions, visit Permissions reference.

Replace existing data in a table

Compared to a standard ingestion job that appends data, the configurations for a replacement ingestion job differ in the following ways:

  • Ingestion mode: Set ingestionMode to replace.
  • Range of data: Provide one or more time intervals, or replace all data:
    • To replace all data, set replaceAll to true.
    • To replace a certain time interval of data, define target.intervals with an array of ISO 8601 time intervals.

The time intervals must align with the partitioning granularity of the job, set in partitionedBy. If unset, job uses the same time partitioning as the table.

For example, if the job uses day time partitioning, you can specify an eight-day interval such as ["2022-06-01/2022-06-09"]. However, you can't specify an eight-hour interval such as ["2022-06-01T00:00:00Z/2022-06-01T08:00:00Z"].

Segment granularity

If the provided time interval doesn’t align with the granularity of existing segments in the table, Polaris replaces the entirety of any segment that overlaps the interval. Polaris removes the entire segment even if the provided interval doesn’t fully encapsulate the segment.

For example, if you specify a one-hour time interval but your data is stored with day granularity, Polaris replaces the entire day of data.

Filter data to replace

Use filter expressions to filter your source data, such as to limit a time range for the incoming data.

Don't confuse timestamp filters with the replacement time intervals. The time intervals in the job refer to data that already exists in the table.

View the data before the replacement

The Koalas table contains two days of data from 2019-08-19 through 2019-08-20. Observe the data before the replacement ingestion job by running the following SQL query:

SELECT "__time", "city" FROM "Koalas"
WHERE FLOOR("__time" to DAY) = TIMESTAMP '2019-08-19'
ORDER BY 1 ASC LIMIT 5

Before the replacement, the query returns the following result:

{"__time":"2019-08-19T00:00:00.234Z","city":"Cumming"}
{"__time":"2019-08-19T00:00:00.528Z","city":"Aracati"}
{"__time":"2019-08-19T00:00:00.673Z","city":"Pensacola"}
{"__time":"2019-08-19T00:00:01.161Z","city":"Brighton"}
{"__time":"2019-08-19T00:00:02.190Z","city":"Carlock"}

Sample request

The following example replaces the data in 2019-08-19 with the data in kttm-2019-08-19-replace.json.gz. The replacement data updates the value of city to Wellington for the entire day.

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data '{
"type": "batch",
"target": {
"type": "table",
"tableName": "Koalas",
"intervals": [
"2019-08-19/2019-08-20"
]
},
"source": {
"type": "uploaded",
"fileList": [
"kttm-2019-08-19-replace.json.gz"
],
"inputSchema": [
{
"dataType": "string",
"name": "timestamp"
},
{
"dataType": "string",
"name": "city"
},
{
"dataType": "string",
"name": "session"
},
{
"dataType": "long",
"name": "session_length"
}
],
"formatSettings": {
"format": "nd-json"
}
},
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"timestamp\")"
},
{
"columnName": "city",
"expression": "\"city\""
},
{
"columnName": "session",
"expression": "\"session\""
},
{
"columnName": "session_length",
"expression": "\"session_length\""
}
],
"ingestionMode": "replace"
}'

Sample response

A successful request returns a 201 Created response and the details of the ingestion job.

View the response
{
"source": {
"fileList": [
"kttm-2019-08-19-replace.json.gz"
],
"formatSettings": {
"flattenSpec": {},
"format": "nd-json"
},
"inputSchema": [
{
"dataType": "string",
"name": "timestamp"
},
{
"dataType": "string",
"name": "city"
},
{
"dataType": "string",
"name": "session"
},
{
"dataType": "long",
"name": "session_length"
}
],
"type": "uploaded"
},
"filterExpression": null,
"ingestionMode": "replace",
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"timestamp\")",
"isAggregation": null
},
{
"columnName": "city",
"expression": "\"city\"",
"isAggregation": null
},
{
"columnName": "session",
"expression": "\"session\"",
"isAggregation": null
},
{
"columnName": "session_length",
"expression": "\"session_length\"",
"isAggregation": null
}
],
"maxParseExceptions": 2147483647,
"query": "REPLACE INTO \"Koalas\"\nOVERWRITE\n WHERE (__time >= TIMESTAMP '2019-08-19 00:00:00.0' AND __time < TIMESTAMP '2019-08-20 00:00:00.0')\nSELECT\n TIME_PARSE(\"timestamp\") AS \"__time\",\n \"city\" AS \"city\",\n \"session\" AS \"session\",\n \"session_length\" AS \"session_length\"\nFROM TABLE(\n POLARIS_SOURCE(\n '{\"fileList\":[\"kttm-2019-08-19-replace.json.gz\"],\"formatSettings\":{\"flattenSpec\":{},\"format\":\"nd-json\"},\"inputSchema\":[{\"dataType\":\"string\",\"name\":\"timestamp\"},{\"dataType\":\"string\",\"name\":\"city\"},{\"dataType\":\"string\",\"name\":\"session\"},{\"dataType\":\"long\",\"name\":\"session_length\"}],\"type\":\"uploaded\"}'\n )\n)\nWHERE (TIME_IN_INTERVAL(TIME_PARSE(\"timestamp\"), '2019-08-19/2019-08-20'))\n\nPARTITIONED BY DAY",
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdTimestamp": "2023-08-15T01:27:00.445394651Z",
"desiredExecutionStatus": "running",
"executionStatus": "pending",
"health": {
"status": "ok"
},
"id": "0189f6cd-a85e-7836-8603-4576c2bd60f4",
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastUpdatedTimestamp": "2023-08-15T01:27:00.445394651Z",
"spec": {
"source": {
"fileList": [
"kttm-2019-08-19-replace.json.gz"
],
"formatSettings": {
"flattenSpec": {},
"format": "nd-json"
},
"inputSchema": [
{
"dataType": "string",
"name": "timestamp"
},
{
"dataType": "string",
"name": "city"
},
{
"dataType": "string",
"name": "session"
},
{
"dataType": "long",
"name": "session_length"
}
],
"type": "uploaded"
},
"target": {
"tableName": "Koalas",
"type": "table",
"intervals": [
"2019-08-19/2019-08-20"
]
},
"filterExpression": null,
"ingestionMode": "replace",
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"timestamp\")",
"isAggregation": null
},
{
"columnName": "city",
"expression": "\"city\"",
"isAggregation": null
},
{
"columnName": "session",
"expression": "\"session\"",
"isAggregation": null
},
{
"columnName": "session_length",
"expression": "\"session_length\"",
"isAggregation": null
}
],
"maxParseExceptions": 2147483647,
"type": "batch",
"desiredExecutionStatus": "running"
},
"target": {
"tableName": "Koalas",
"type": "table",
"intervals": []
},
"type": "batch",
"completedTimestamp": null,
"startedTimestamp": null
}

View the data after the replacement

After the replacement, the query returns the following result:

{"__time":"2019-08-19T00:00:00.234Z","city":"Wellington"}
{"__time":"2019-08-19T00:00:00.528Z","city":"Wellington"}
{"__time":"2019-08-19T00:00:00.673Z","city":"Wellington"}
{"__time":"2019-08-19T00:00:01.161Z","city":"Wellington"}
{"__time":"2019-08-19T00:00:02.190Z","city":"Wellington"}

Note that querying for TIMESTAMP '2019-08-20' returns the same result before and after this job because that day was not within the replacement interval.

Learn more

See the following topics for more information: