
Replace data

If you want to make changes to existing data stored in an Imply Polaris table, you can replace the data for a specified time interval.

Polaris does not provide an option to update or replace data row by row.

This topic covers the concepts of data replacement and how to use the Polaris UI to replace data. For information on how to replace data with the API, refer to Jobs v2 API.
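
To script the same operation against the Jobs v2 API, the overall shape of the request looks like the sketch below. Treat it as an illustration only: the endpoint path, the authentication header, and the job-spec fields shown here ("ingestionMode", "intervals", and the source and target shapes) are assumptions for the example. Consult the Jobs v2 API documentation for the authoritative request schema.

```python
# Hypothetical sketch of replacing data through the Jobs v2 API.
# The endpoint path, header format, and job-spec field names below are
# illustrative assumptions; check the Jobs v2 API reference for the
# authoritative schema.
import requests

ORG_URL = "https://example.app.imply.io"  # assumption: your organization's URL
API_KEY = "your-polaris-api-key"          # assumption: key with ManageTables and
                                          # ManageIngestionJobs permissions

job_spec = {
    "type": "batch",
    "target": {"type": "table", "tableName": "Koalas to the Max"},
    "source": {"type": "uploaded", "fileList": ["kttm-replace.json.tar.gz"]},
    # Assumed fields: replace rows only within this half-open time interval.
    "ingestionMode": "replace",
    "intervals": ["2019-08-19T00:00:00Z/2019-08-21T00:00:00Z"],
}

response = requests.post(
    f"{ORG_URL}/v2/jobs",
    json=job_spec,
    headers={"Authorization": f"Basic {API_KEY}"},
)
response.raise_for_status()
print(response.json())  # job details, including the job ID to poll for status
```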

Prerequisites

To replace data, you need the ManageTables and ManageIngestionJobs permissions. These permissions are assigned to the Organization Admin, Project Admin, and Data Manager groups by default. For information on permissions, see Permissions reference.

How replacing data works

Replacing data for a table in Polaris works much like batch ingestion, except that it applies only to a specified time interval of the data set.

As the source data for the replacement, you can upload a new file, choose from an existing upload, or upload a file using the API.

When you replace data:

  • Polaris replaces only the rows that fall within the specified time interval.
  • Any data outside the time interval within your table remains unaffected.
  • Polaris discards any data from your source that lies outside the time interval for the replacement.

To replace all data within a table, you can specify an interval that covers the table's entire time range.
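
As a toy model of these rules (illustrative Python, not Polaris internals), rows in the table survive or get replaced depending on whether they fall inside the half-open replacement interval, and source rows outside the interval are dropped:

```python
from datetime import datetime

# Toy model of replace-by-interval semantics (illustrative only).
INTERVAL_START = datetime(2019, 8, 19)  # inclusive "From"
INTERVAL_END = datetime(2019, 8, 21)    # exclusive "To"

def in_interval(ts):
    return INTERVAL_START <= ts < INTERVAL_END

# Existing table rows: only those inside the interval are replaced.
table_rows = [
    {"__time": datetime(2019, 8, 18, 12), "note": "outside interval, unaffected"},
    {"__time": datetime(2019, 8, 19, 9),  "note": "inside interval, replaced"},
]
# Source rows: only those inside the interval are ingested.
source_rows = [
    {"__time": datetime(2019, 8, 20, 3),  "note": "inside interval, ingested"},
    {"__time": datetime(2019, 8, 22, 0),  "note": "outside interval, discarded"},
]

result = [r for r in table_rows if not in_interval(r["__time"])]
result += [r for r in source_rows if in_interval(r["__time"])]
for row in result:
    print(row["__time"], "-", row["note"])
```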

From the Polaris UI, choose Replace data from the ellipsis menu (...) on the table details page to launch an ingestion job that replaces data in your table.

Set a replacement time interval

Polaris lets you specify the start time and end time of the interval to replace. The granularity available for the replacement interval depends on the table's time partitioning, which is day by default.

The replacement time interval must be at least as coarse as the table's time partitioning. For example, if a table is partitioned by day, you cannot specify an eight-hour interval such as 2022-06-01T00:00:00Z/2022-06-01T08:00:00Z; however, you can specify an eight-day interval such as 2022-06-01/2022-06-09.

Polaris replaces all data in the interval, including the From date and excluding the To date. The UI shows you the exact time range of the data affected by the replace data operation, for example: "Replacing data from 08/19/2019 T00:00:00 up to 08/21/2019 T00:00:00."
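
Restated in code, the boundary rule for a day-partitioned table simply requires both endpoints of the interval to fall on day boundaries. The check below is an illustration, not a Polaris API:

```python
from datetime import datetime, time

# Illustrative alignment check for a day-partitioned table (not a Polaris API):
# both endpoints of the replacement interval must fall on day boundaries.
def aligned_to_day(start, end):
    return start.time() == time(0, 0) and end.time() == time(0, 0)

# Eight-hour interval: not allowed with day partitioning.
print(aligned_to_day(datetime(2022, 6, 1, 0), datetime(2022, 6, 1, 8)))  # False
# Eight-day interval: allowed.
print(aligned_to_day(datetime(2022, 6, 1), datetime(2022, 6, 9)))        # True
```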

View and modify schema mapping

After you choose a source file, Polaris samples your source data and automatically maps columns from your source data to existing columns in your table when it detects matching column names. You can review the automatic mapping on the Design schema page. Polaris also adds any columns from the source data that were not previously in the schema.

You can modify or delete fields as necessary for the schema. Schema changes made in a replace-data ingestion job do not affect rows outside the time interval.

For existing rows in your table that fall outside the replacement interval, the value of any new column is null.
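
As a small illustration of that last point (again, toy Python rather than Polaris behavior), a column present only in the new source data reads as null on pre-existing rows outside the interval:

```python
# Toy illustration: a "geocode" column that exists only in the new source data
# reads as null (None) on pre-existing rows outside the replaced interval.
old_row = {"__time": "2019-08-18T12:00:00Z", "city": "Berlin"}  # outside the interval
new_row = {"__time": "2019-08-19T09:00:00Z", "city": "Austin",
           "geocode": "30.27,-97.74"}  # hypothetical ingested value

columns = ["__time", "city", "geocode"]
for row in (old_row, new_row):
    print({col: row.get(col) for col in columns})  # old_row: geocode=None
```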

Example

Imagine you are working with the clickstream data in the Koalas to the Max table from the Quickstart. This example shows you how to replace the existing data with the same data set, but with an additional column "geocode" that you can use to set up a Geo dimension type in a data cube.

Download kttm-replace.json.tar.gz for use in the following example.

The "koalas" table from the Quickstart contains clickstream data for two days: 2019-08-19 and 2019-08-20. The following steps guide you through replacing all data within the table.

  1. Navigate to the "Koalas to the Max" table you created in the Quickstart. Note that there are 29 columns in the table.
  2. From the ellipsis menu (...) in the top right, select Replace data.
  3. On the Replace data dialog, select Replace data by time interval.
  4. For the From date, enter the year, month, and day as follows: 2019 08 19.
  5. For the To date, enter the year, month, and day: 2019 08 21. Note that the Polaris UI displays the exact time interval to replace (screenshot: Polaris replace data dialog).
  6. Click Confirm to confirm your choice.
  7. Click Confirm on the confirmation dialog to reconfirm you want to replace data.
  8. On the Insert data page, click Select files from your computer and choose the file you downloaded earlier: kttm-replace.json.tar.gz.
  9. Click Continue.
  10. On the Replace data 2019-08-19 to 2019-08-21 / Design schema page, you can see the additional column, "geocode" (screenshot: Polaris replace data map schema). For the sake of the example, accept the schema changes as presented in Polaris. When working with your own data, you may want to add or remove columns as needed.
  11. Click Start ingestion. When your ingestion job completes, you can see that there are now 30 columns and all rows have a value for the "geocode" column.

Learn more

See the following topics for more information:

  • Polaris quickstart for a walk-through of batch ingestion.
  • Ingestion sources for ingestion sources and strategies.