If you want to make changes to existing data stored in an Imply Polaris table, you can replace the data for a specified time interval.
There is no option to update or replace by row in Polaris.
This topic covers the concepts of data replacement and how to use the Polaris UI to replace data. For information on how to replace data with the API, refer to Ingestion Jobs API and IngestionJobSpec.
To replace data, you need the ManageIngestionJobs permission. This permission is assigned by default to the Data Manager, Project Admin, and Organization Admin roles.
For information on user roles and permissions, see User roles reference.
How replacing data works
Replacing data for a table in Polaris works similarly to batch ingestion, except that it applies only to a specific time range of the data set.
For the replacement source data, you can upload a new file, choose an existing upload, or upload a file using the API.
When you replace data:
- Polaris replaces only the rows that fall within the time interval.
- Any data outside the time interval within your table remains unaffected.
- Polaris discards any data from your source that lies outside the time interval for the replacement.
To replace all data within a table, you can specify an interval that covers the table's entire time range.
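The following Python sketch models these rules on in-memory rows so the behavior is easy to check by hand. It is an illustration, not how Polaris implements replacement; the "__time" column name and the row-as-dict representation are assumptions.

```python
from datetime import datetime, timezone


def replace_interval(table_rows, source_rows, start, end):
    """Illustrative model of time-interval replacement, not Polaris code.

    Each row is a dict with a "__time" datetime (assumed column name).
    The interval is half-open: start is included, end is excluded.
    """
    def in_interval(row):
        return start <= row["__time"] < end

    # Existing rows outside the interval stay exactly as they were.
    kept = [row for row in table_rows if not in_interval(row)]
    # Source rows inside the interval replace everything the table held there;
    # source rows outside the interval are discarded.
    incoming = [row for row in source_rows if in_interval(row)]
    return sorted(kept + incoming, key=lambda row: row["__time"])


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


table = [
    {"__time": utc(2019, 8, 18), "v": "old-a"},
    {"__time": utc(2019, 8, 19, 12), "v": "old-b"},
    {"__time": utc(2019, 8, 21), "v": "old-c"},
]
source = [
    {"__time": utc(2019, 8, 19, 6), "v": "new-b"},
    {"__time": utc(2019, 8, 22), "v": "new-d"},  # outside the interval: dropped
]
result = replace_interval(table, source, utc(2019, 8, 19), utc(2019, 8, 21))
print([row["v"] for row in result])  # ['old-a', 'new-b', 'old-c']
```

Row "old-b" is replaced, "old-c" survives because the end of the interval is exclusive, and "new-d" is discarded. Passing an interval wide enough to cover every row leaves only source rows in the result, which corresponds to replacing all data in the table.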
To launch an ingestion job that replaces data in your table, open the table details page in the Polaris UI and select Replace data from the ellipsis menu (...).
Set a replacement time interval
Polaris lets you specify the start time and the end time of the interval to replace. The time granularity of the replacement interval depends on the table's time partitioning setting, which is "day" by default.
Polaris replaces all data in the interval, including the From date and excluding the To date. The UI shows you the exact time range of the data affected by the replace data operation, for example: "Replacing data from 08/19/2019 T00:00:00 up to 08/21/2019 T00:00:00."
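As a sketch of the half-open interval at "day" granularity, the following Python helpers build the interval from a From date and a To date, test whether a timestamp falls inside it, and format a message in the same form as the UI text quoted above. The helper names are hypothetical, and the sketch assumes UTC timestamps.

```python
from datetime import date, datetime, time, timezone


def day_interval(from_date, to_date):
    """Hypothetical helper: convert From/To dates into a half-open UTC
    interval [from_date 00:00, to_date 00:00) at "day" granularity."""
    if to_date <= from_date:
        raise ValueError("To date must be later than From date")
    start = datetime.combine(from_date, time.min, tzinfo=timezone.utc)
    end = datetime.combine(to_date, time.min, tzinfo=timezone.utc)
    return start, end


def is_replaced(ts, start, end):
    """From is inclusive and To is exclusive."""
    return start <= ts < end


def describe(start, end):
    """Format the interval like the UI message quoted above."""
    fmt = "%m/%d/%Y T%H:%M:%S"
    return f"Replacing data from {start.strftime(fmt)} up to {end.strftime(fmt)}."


start, end = day_interval(date(2019, 8, 19), date(2019, 8, 21))
print(describe(start, end))
# Replacing data from 08/19/2019 T00:00:00 up to 08/21/2019 T00:00:00.
print(is_replaced(datetime(2019, 8, 20, 23, 59, tzinfo=timezone.utc), start, end))  # True
print(is_replaced(end, start, end))  # False
```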
View and modify schema mapping
After you choose a source file, Polaris samples your source data and automatically maps source columns to existing table columns when it detects matching column names. You can see the automatic mapping on the Design schema page. Polaris also adds columns from the source data that were not previously in the schema.
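The name-based matching can be pictured with a small Python sketch that splits source columns into those that map to existing table columns and those that are new. It assumes an exact, case-sensitive name match, which may not reflect every detail of how Polaris compares names; the column names other than "geocode" are illustrative.

```python
def map_columns(source_columns, table_columns):
    """Sketch of name-based mapping (assumed exact, case-sensitive match).

    Returns (mapped, new): source columns that match an existing table column,
    and source columns the table schema does not have yet.
    """
    existing = set(table_columns)
    mapped = [c for c in source_columns if c in existing]
    new = [c for c in source_columns if c not in existing]
    return mapped, new


table_columns = ["browser", "country", "session"]
source_columns = ["browser", "country", "session", "geocode"]
mapped, new = map_columns(source_columns, table_columns)
print(mapped)  # ['browser', 'country', 'session']
print(new)     # ['geocode']
```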
You can modify or delete any fields in the schema as necessary. Schema changes made in an ingestion job to replace data do not affect the rows outside the time interval.
For existing rows in your table outside the interval to replace, the value for any new columns is
Imagine you are working with the clickstream data in the Koalas to the Max table from the Quickstart. This example shows you how to replace the existing data with the same data set, but with an additional column "geocode" that you can use to set up a Geo dimension type in a data cube.
Download koalas-replace.json.gz for use in the following example.
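Before uploading, you may want to inspect the file locally. The sketch below assumes that koalas-replace.json.gz is gzip-compressed newline-delimited JSON with an ISO 8601 "timestamp" field in UTC; those are assumptions about the file's layout, so adjust the field names if yours differ. It reports the row count, how many rows carry a "geocode" value, and whether every timestamp lies within the interval used in this example.

```python
import gzip
import json
from datetime import datetime, timezone

# Replacement interval used in this example: [2019-08-19, 2019-08-21).
REPLACE_START = datetime(2019, 8, 19, tzinfo=timezone.utc)
REPLACE_END = datetime(2019, 8, 21, tzinfo=timezone.utc)


def summarize(path, time_field="timestamp"):
    """Scan gzipped newline-delimited JSON and summarize rows, time range,
    and how many rows have a non-empty "geocode" value.

    Assumes each timestamp is ISO 8601 in UTC, such as "2019-08-19T00:00:00.031Z".
    """
    rows = with_geocode = 0
    earliest = latest = None
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            rows += 1
            if record.get("geocode") not in (None, ""):
                with_geocode += 1
            # fromisoformat() before Python 3.11 rejects a trailing "Z".
            ts = datetime.fromisoformat(record[time_field].replace("Z", "+00:00"))
            if earliest is None or ts < earliest:
                earliest = ts
            if latest is None or ts > latest:
                latest = ts
    return {"rows": rows, "with_geocode": with_geocode,
            "earliest": earliest, "latest": latest}


def fits_interval(summary, start=REPLACE_START, end=REPLACE_END):
    """True if the file has rows and every timestamp lies in [start, end).
    Source rows outside the interval would be discarded by the replacement."""
    if summary["rows"] == 0:
        return False
    return start <= summary["earliest"] and summary["latest"] < end
```

For example, `fits_interval(summarize("koalas-replace.json.gz"))` returning True would suggest that no source rows fall outside the replacement interval used in the steps that follow.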
The "koalas" table from the Quickstart contains clickstream data for two days: 2019-08-19 and 2019-08-20. The following steps guide you through replacing all data within the table.
- Navigate to the "Koalas to the Max" table you created in the Quickstart. Note that there are 29 columns in the table.
- From the ellipsis menu (...) in the top right, select Replace data.
- On the Replace data dialog, select Replace data by time interval.
- For the From date, enter the year, month, and day: 2019 08 19.
- For the To date, enter the year, month, and day: 2019 08 21. The Polaris UI displays the time interval to replace.
- Click Confirm.
- In the confirmation dialog, click Confirm again to verify that you want to replace data.
- On the Add data page, click Select files from your computer and choose koalas-replace.json.gz, the file you downloaded earlier.
- Click Continue.
- On the Replace data 2019-08-19 to 2019-08-21 / Design schema page, you can see the additional column, "geocode". For this example, accept the schema changes as Polaris presents them. When working with your own data, you may want to add or remove columns as needed.
- Click Start ingestion. When your ingestion job completes, you can see that there are now 30 columns and all rows have a value for the "geocode" column.