Skip to main content

Replace data by API

info

Project-less regional API resources have been deprecated and will be removed by the end of September 2024.

You must include the project ID in the URL for all regional API calls in projects created after September 29, 2023. For example: https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID

Projects created before September 29, 2023 can continue to use project-less URLs until the end of September 2024. We strongly recommend updating your regional API calls to include the project ID prior to September 2024. See the API migration guide for more information.

After you ingest data into a table, you can replace or overwrite data in the table. Replacing data is similar to creating an ingestion job, except that you also specify the time interval for the data to be replaced.

This topic guides you through replacing data in a table using the Polaris API.

For more information on how replacing data works in Imply Polaris, see Replace data.

Prerequisites

This topic assumes that you have the following:

  • A table containing data. The examples in this topic continue using the Koalas table in Ingest from files.
  • Replacement data. The examples in this topic use the data in kttm-2019-08-19-replace.json.gz.
  • An API key with the ManageIngestionJobs permissions. In the examples below, the key value is stored in the variable named POLARIS_API_KEY. For information about how to obtain an API key and assign permissions, see API key authentication. For more information on permissions, visit Permissions reference.

Replace existing data in a table

Compared to a standard ingestion job that appends data, the configurations for a replacement ingestion job differ in the following ways:

  • ingestionMode: set to replace
  • target.intervals: provide an array of ISO 8601 time intervals

The time granularity for the replacement interval depends on the time partitioning setting for the table. For example, if your partitioning granularity is month, your interval must start at the beginning of a month and extend over a period of a discrete number of months, such as ["2022-06-01/2022-08-01"].

The replacement time interval refers to a section of data that already exists in the table. While you can also filter by time using filter expressions, filter expressions apply to the source data, not the existing data in the table to replace.

View the data before the replacement

The Koalas table contains two days of data from 2019-08-19 through 2019-08-20. Observe the data before the replacement ingestion job by running the following SQL query:

SELECT "__time", "city" FROM "Koalas"
WHERE FLOOR("__time" to DAY) = TIMESTAMP '2019-08-19'
ORDER BY 1 ASC LIMIT 5

Before the replacement, the query returns the following result:

{"__time":"2019-08-19T00:00:00.234Z","city":"Cumming"}
{"__time":"2019-08-19T00:00:00.528Z","city":"Aracati"}
{"__time":"2019-08-19T00:00:00.673Z","city":"Pensacola"}
{"__time":"2019-08-19T00:00:01.161Z","city":"Brighton"}
{"__time":"2019-08-19T00:00:02.190Z","city":"Carlock"}

Sample request

The following example replaces all data from 2019-08-19 with the data in kttm-2019-08-19-replace.json.gz. The replacement data updates the value of city to Wellington for the entire day.

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data '{
"type": "batch",
"target": {
"type": "table",
"tableName": "Koalas",
"intervals": [
"2019-08-19/2019-08-20"
]
},
"source": {
"type": "uploaded",
"fileList": [
"kttm-2019-08-19-replace.json.gz"
],
"inputSchema": [
{
"dataType": "string",
"name": "timestamp"
},
{
"dataType": "string",
"name": "city"
},
{
"dataType": "string",
"name": "session"
},
{
"dataType": "long",
"name": "session_length"
}
],
"formatSettings": {
"format": "nd-json"
}
},
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"timestamp\")"
},
{
"columnName": "city",
"expression": "\"city\""
},
{
"columnName": "session",
"expression": "\"session\""
},
{
"columnName": "session_length",
"expression": "\"session_length\""
}
],
"ingestionMode": "replace"
}'

Sample response

A successful request returns a 201 Created response and the details of the ingestion job.

View the response
{
"source": {
"fileList": [
"kttm-2019-08-19-replace.json.gz"
],
"formatSettings": {
"flattenSpec": {},
"format": "nd-json"
},
"inputSchema": [
{
"dataType": "string",
"name": "timestamp"
},
{
"dataType": "string",
"name": "city"
},
{
"dataType": "string",
"name": "session"
},
{
"dataType": "long",
"name": "session_length"
}
],
"type": "uploaded"
},
"filterExpression": null,
"ingestionMode": "replace",
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"timestamp\")",
"isAggregation": null
},
{
"columnName": "city",
"expression": "\"city\"",
"isAggregation": null
},
{
"columnName": "session",
"expression": "\"session\"",
"isAggregation": null
},
{
"columnName": "session_length",
"expression": "\"session_length\"",
"isAggregation": null
}
],
"maxParseExceptions": 2147483647,
"query": "REPLACE INTO \"Koalas\"\nOVERWRITE\n WHERE (__time >= TIMESTAMP '2019-08-19 00:00:00.0' AND __time < TIMESTAMP '2019-08-20 00:00:00.0')\nSELECT\n TIME_PARSE(\"timestamp\") AS \"__time\",\n \"city\" AS \"city\",\n \"session\" AS \"session\",\n \"session_length\" AS \"session_length\"\nFROM TABLE(\n POLARIS_SOURCE(\n '{\"fileList\":[\"kttm-2019-08-19-replace.json.gz\"],\"formatSettings\":{\"flattenSpec\":{},\"format\":\"nd-json\"},\"inputSchema\":[{\"dataType\":\"string\",\"name\":\"timestamp\"},{\"dataType\":\"string\",\"name\":\"city\"},{\"dataType\":\"string\",\"name\":\"session\"},{\"dataType\":\"long\",\"name\":\"session_length\"}],\"type\":\"uploaded\"}'\n )\n)\nWHERE (TIME_IN_INTERVAL(TIME_PARSE(\"timestamp\"), '2019-08-19/2019-08-20'))\n\nPARTITIONED BY DAY",
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdTimestamp": "2023-08-15T01:27:00.445394651Z",
"desiredExecutionStatus": "running",
"executionStatus": "pending",
"health": {
"status": "ok"
},
"id": "0189f6cd-a85e-7836-8603-4576c2bd60f4",
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastUpdatedTimestamp": "2023-08-15T01:27:00.445394651Z",
"spec": {
"source": {
"fileList": [
"kttm-2019-08-19-replace.json.gz"
],
"formatSettings": {
"flattenSpec": {},
"format": "nd-json"
},
"inputSchema": [
{
"dataType": "string",
"name": "timestamp"
},
{
"dataType": "string",
"name": "city"
},
{
"dataType": "string",
"name": "session"
},
{
"dataType": "long",
"name": "session_length"
}
],
"type": "uploaded"
},
"target": {
"tableName": "Koalas",
"type": "table",
"intervals": [
"2019-08-19/2019-08-20"
]
},
"filterExpression": null,
"ingestionMode": "replace",
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"timestamp\")",
"isAggregation": null
},
{
"columnName": "city",
"expression": "\"city\"",
"isAggregation": null
},
{
"columnName": "session",
"expression": "\"session\"",
"isAggregation": null
},
{
"columnName": "session_length",
"expression": "\"session_length\"",
"isAggregation": null
}
],
"maxParseExceptions": 2147483647,
"type": "batch",
"desiredExecutionStatus": "running"
},
"target": {
"tableName": "Koalas",
"type": "table",
"intervals": []
},
"type": "batch",
"completedTimestamp": null,
"startedTimestamp": null
}

View the data after the replacement

After the replacement, the query returns the following result:

{"__time":"2019-08-19T00:00:00.234Z","city":"Wellington"}
{"__time":"2019-08-19T00:00:00.528Z","city":"Wellington"}
{"__time":"2019-08-19T00:00:00.673Z","city":"Wellington"}
{"__time":"2019-08-19T00:00:01.161Z","city":"Wellington"}
{"__time":"2019-08-19T00:00:02.190Z","city":"Wellington"}

Note that querying for TIMESTAMP '2019-08-20' returns the same result before and after this job because that day was not within the replacement interval.

Learn more

See the following topics for more information: