
Guide for Confluent Cloud ingestion

This guide walks through the end-to-end process to ingest data into Imply Polaris from Confluent Cloud. For information on creating Confluent Cloud connections in Polaris, see Connect to Confluent Cloud.

The following diagram summarizes the end-to-end process of connecting to your Confluent Cloud source and ingesting from it. Shaded boxes represent steps taken within Polaris, and unshaded boxes represent steps taken outside Polaris.

info

The screen captures in this guide show the configurations for Confluent Cloud services in December 2023. They may not reflect the current state of the product.

Prerequisites

To complete the steps in this guide, you need the following:

  • A Confluent Cloud account containing Apache Kafka topics to ingest. See Supported formats for requirements on the data format for ingestion.

  • The Confluent Cloud permissions to do the following:

    • View cluster settings in the Confluent Cloud console.
    • Create an API key with access to Kafka resources.
    • Produce messages to Kafka topics.

    See the Confluent Cloud documentation on Access management.
  • Permissions in Polaris to create tables, connections, and ingestion jobs: ManageTables, ManageConnections, and ManageIngestionJobs, respectively. For more information on permissions, see Permissions reference.

Get details from Confluent Cloud

In this section, you get the Confluent Cloud bootstrap server details and record the name of the topic that Polaris will ingest data from.

In the Confluent Cloud console:

  1. Access your cluster and click Topics in the left pane.

  2. Copy and save the Topic name to ingest data from.

  3. Click Cluster settings in the left pane.

    Confluent Cloud cluster settings

  4. Copy and save the Bootstrap server setting.

Create an API key in Confluent Cloud

In this section, you create an API key that Polaris uses to connect to Confluent Cloud.

  1. In the Confluent Cloud console, access your cluster and click API Keys in the left pane.

  2. Click +Add key and create a key with your chosen scope.

  3. Once created, copy and save the Key and Secret.

    Confluent Cloud API key
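
Optionally, you can confirm that the key and bootstrap server work together before moving on. The following sketch uses the confluent-kafka Python client (an assumption; it isn't required by Polaris) to list cluster metadata and check that your topic is visible. Replace the placeholder values with the details you saved.

```python
from confluent_kafka.admin import AdminClient

# Placeholders: substitute the bootstrap server, API key, and secret you saved earlier.
conf = {
    "bootstrap.servers": "<BOOTSTRAP_SERVER>",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<CONFLUENT_API_KEY>",
    "sasl.password": "<CONFLUENT_API_SECRET>",
}

admin = AdminClient(conf)
# list_topics times out and raises if the cluster is unreachable or authentication fails.
metadata = admin.list_topics(timeout=10)
print("<TOPIC_NAME>" in metadata.topics)  # True if the key can see your topic
```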

Add data to your topic

In this section, you add data to your topic in Confluent Cloud.

In the Confluent Cloud console:

  1. Access your cluster and click Topics in the left pane.

  2. Click your topic and go to the Messages tab.

  3. Click Actions > Produce new message.

    Confluent Cloud produce message

  4. Enter the message details and click Produce. Make sure you can see the event in the console.

info

If your data has a time field that you intend to use as the primary timestamp, it must fall within the late message rejection period, which is 30 days by default. Otherwise, you can ingest the event timestamp from the Kafka metadata.

When you start an ingestion job later, you can preview and ingest the data into Polaris.
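
If you prefer to add test data programmatically instead of through the console, here's a minimal sketch that produces a JSON message with the confluent-kafka Python client (an assumption; any Kafka producer works). The credentials, topic name, and event fields are placeholders.

```python
import json
from datetime import datetime, timezone

from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "<BOOTSTRAP_SERVER>",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<CONFLUENT_API_KEY>",
    "sasl.password": "<CONFLUENT_API_SECRET>",
})

# A current timestamp keeps the event inside the default 30-day
# late message rejection period described above.
event = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "value": "hello-polaris",
}

producer.produce("<TOPIC_NAME>", value=json.dumps(event).encode("utf-8"))
producer.flush()  # block until delivery completes
```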

Create a Confluent Cloud connection

In this section, you create a Confluent Cloud connection in Polaris.

  1. In Imply Polaris, go to Sources > Create source > Confluent Cloud.

  2. In the New connection dialog, enter the following details:

    • Connection name: A unique name for your connection.
    • Description: An optional description for the connection.
    • Topic name: The name of the topic you copied.
    • Bootstrap servers: The bootstrap server you copied.
    • Confluent API key: The API key you copied.
    • Confluent API key secret: The API key secret you copied.

    Confluent Cloud connection UI

  3. Click Test connection to verify that Polaris can connect to Confluent Cloud.

For more details on these fields, see Confluent Cloud connection information.
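
If you automate your setup, you can also create connections through the Polaris API. The sketch below is hypothetical: the endpoint path, connection type, and field names are assumptions based on the dialog fields above, so check the Polaris API documentation for the exact request schema, and substitute your organization's API URL and a Polaris API key.

```python
import requests

# Hypothetical request: the URL pattern, endpoint, and body fields below are
# assumptions, not a verified schema. Consult the Polaris API reference.
POLARIS_API = "https://<ORGANIZATION>.<REGION>.<CLOUD_PROVIDER>.api.imply.io"
headers = {"Authorization": "Bearer <POLARIS_API_KEY>"}

connection = {
    "type": "confluent",                  # assumed connection type identifier
    "name": "my-confluent-connection",
    "bootstrapServers": "<BOOTSTRAP_SERVER>",
    "topicName": "<TOPIC_NAME>",
    "secrets": {
        "type": "sasl_plain_text",        # assumed secret type
        "username": "<CONFLUENT_API_KEY>",
        "password": "<CONFLUENT_API_SECRET>",
    },
}

response = requests.post(f"{POLARIS_API}/v1/connections", json=connection, headers=headers)
response.raise_for_status()
print(response.json())
```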

Start an ingestion job

In this section, you create an ingestion job to add data from your Confluent Cloud topic into a table in Polaris.

info

In this guide, Polaris automatically creates the table based on details in the job definition. For greater control over table properties such as partitioning or schema enforcement, create the table manually before starting your first ingestion job. For details, see Introduction to tables.

  1. In Imply Polaris, go to Jobs > Create job > Insert data.

  2. Click New table.

  3. Enter a name for the table, and click Next.

  4. Select the Confluent Cloud source, then the connection name, and click Next.

    Select source

  5. Verify the input format.

    Polaris doesn't ingest events older than the late message rejection period (30 days by default). If your events don't include a timestamp, or if their timestamps fall outside this period, you can use the Kafka event timestamp as the primary timestamp. Alternatively, you can change the late message rejection period in the next step.

  6. Click Continue.

  7. Continue through the load data wizard and configure your ingestion job based on your data and use case.

    The Starting offset setting determines whether Polaris ingests events already present in the Kafka topic:

    • Beginning: Polaris ingests all existing events in the topic, as shown in the preview, as well as future events sent to the topic.
    • End: You can preview existing events in the ingestion job, but Polaris only ingests events sent to the topic after the ingestion job starts.

    For a conceptual comparison with Kafka consumer offsets, see the sketch after these steps.
  8. Click Start ingestion to launch the ingestion job.
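
The Starting offset options behave like a Kafka consumer's auto.offset.reset setting: Beginning corresponds to reading from the earliest available offset, End to the latest. Polaris manages offsets internally, so the sketch below is only a conceptual illustration using the confluent-kafka Python client with placeholder values.

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "<BOOTSTRAP_SERVER>",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<CONFLUENT_API_KEY>",
    "sasl.password": "<CONFLUENT_API_SECRET>",
    "group.id": "starting-offset-demo",  # placeholder consumer group
    # "earliest" mimics Beginning (read existing events);
    # "latest" mimics End (read only events produced from now on).
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["<TOPIC_NAME>"])

msg = consumer.poll(timeout=10.0)
print(msg.value() if msg else "No events received yet")
consumer.close()
```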

Learn more

See the following topics for more information: