Guide for Confluent Cloud ingestion
This guide walks through the end-to-end process to ingest data into Imply Polaris from Confluent Cloud. For information on creating Confluent Cloud connections in Polaris, see Connect to Confluent Cloud.
The following diagram summarizes the end-to-end process of connecting to your Confluent Cloud source and ingesting from it. Shaded boxes represent steps taken within Polaris, and unshaded boxes represent steps taken outside Polaris.
The screen captures and instructions in this guide show the configurations for Confluent Cloud services in December 2023. They may not reflect the current state of the product.
Prerequisites
To complete the steps in this guide, you need the following:
A Confluent Cloud account containing Apache Kafka topics to ingest. See Supported formats for requirements on the data format for ingestion.
The Confluent Cloud permissions to do the following:
- View cluster settings in the Confluent Cloud console.
- Create an API key with access to Kafka resources.
- Produce messages to Kafka topics.
See the Confluent Cloud documentation on Access management.
Permissions in Polaris to create tables, connections, and ingestion jobs: ManageTables, ManageConnections, and ManageIngestionJobs, respectively. For more information on permissions, visit Permissions reference.
Get details from Confluent Cloud
In this section, you get the Confluent Cloud bootstrap server details and record the name of the topic that Polaris will ingest data from.
In the Confluent Cloud console:
Access your cluster and click Topics in the left pane.
Copy and save the Topic name to ingest data from.
Click Cluster settings in the left pane.
Copy and save the Bootstrap server setting.
Create an API key in Confluent Cloud
In this section, you create an API key that Polaris will use to authenticate with Confluent Cloud.
In the Confluent Cloud console, access your cluster and click API Keys in the left pane.
Click +Add key and create a key with your chosen scope.
Once created, copy and save the Key and Secret.
Add data to your topic
In this section, you add data to your Kafka topic in Confluent Cloud.
In the Confluent Cloud console:
Access your cluster and click Topics in the left pane.
Click your topic and go to the Messages tab.
Click Actions > Produce new message.
Enter the message details and click Produce. Make sure you can see the event in the console.
If your data has a time field that you intend to use as the primary timestamp, its value must fall within the late message rejection period, which is 30 days by default. Otherwise, you can use the event timestamp from the Kafka metadata instead.
When you start an ingestion job later, you can preview and ingest the data into Polaris.
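If you prefer to produce test events programmatically rather than through the console, the steps above can be sketched with the confluent-kafka Python client. This is a minimal sketch: the bootstrap server, topic, and API key values are placeholders for the ones you saved earlier, and the field name `timestamp` is only an example of a field you might map as the primary timestamp. The client call is deferred to a helper so the configuration and event builders can run without a live cluster.

```python
import json
from datetime import datetime, timedelta, timezone


def build_producer_config(bootstrap_server, api_key, api_secret):
    # Confluent Cloud authenticates API keys over SASL/PLAIN on TLS.
    return {
        "bootstrap.servers": bootstrap_server,
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": api_key,      # Confluent API key
        "sasl.password": api_secret,   # Confluent API key secret
    }


def build_event(payload, rejection_days=30):
    # Stamp the event with the current UTC time so the time field falls
    # inside the late message rejection period (30 days by default).
    # "timestamp" is an illustrative field name, not a Polaris requirement.
    now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=rejection_days)
    assert now >= cutoff  # a freshly stamped event is always ingestible
    return dict(payload, timestamp=now.isoformat())


def produce_event(config, topic, event):
    # Deferred import: requires `pip install confluent-kafka` and network
    # access to your cluster, so it is not exercised in this sketch.
    from confluent_kafka import Producer

    producer = Producer(config)
    producer.produce(topic, value=json.dumps(event).encode("utf-8"))
    producer.flush()
```

After producing an event this way, you should see it in the Messages tab just as if you had produced it from the console.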
Create a Confluent Cloud connection
In this section, you create a Confluent Cloud connection in Polaris.
In Imply Polaris, go to Sources > Create source > Confluent Cloud.
In the New connection dialog, enter the following details:
- Connection name: A unique name for your connection.
- Description: An optional description for the connection.
- Topic name: The name of the topic you copied.
- Bootstrap servers: The bootstrap server you copied.
- Confluent API key: The API key you copied.
- Confluent API key secret: The API key secret you copied.
Click Test connection to confirm that Polaris can connect to Confluent Cloud.
For more details on these fields, see Confluent Cloud connection information.
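The same connection can also be created programmatically with the Connections v1 API mentioned at the end of this guide. The sketch below only assembles a request body from the values you saved; the field names and structure follow the general pattern of that API but are assumptions here, so verify them against the Connections v1 API reference before use.

```python
import json


def build_connection_request(name, topic, bootstrap_servers, api_key, api_secret):
    # Field names below are illustrative assumptions; check the
    # Connections v1 API reference for the exact request schema.
    return {
        "type": "confluent",
        "name": name,
        "description": "Example Confluent Cloud connection",
        "topicName": topic,
        "bootstrapServers": bootstrap_servers,
        "secrets": {
            "type": "sasl_plain",
            "username": api_key,      # Confluent API key
            "password": api_secret,   # Confluent API key secret
        },
    }


body = build_connection_request(
    "my-confluent-connection",
    "my-topic",
    "pkc-example.us-east-1.aws.confluent.cloud:9092",
    "API_KEY",
    "API_SECRET",
)
print(json.dumps(body, indent=2))
```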
Start an ingestion job
In this section, you create an ingestion job to add data from your Confluent Cloud topic into a table in Polaris.
In this guide, Polaris automatically creates the table based on details in the job definition. For greater control over table properties, such as partitioning or schema enforcement, create the table manually before starting your first ingestion job. For details, see Introduction to tables.
In Imply Polaris, go to Jobs > Create job > Insert data.
Click New table.
Enter a name for the table, and click Next.
Select the Confluent Cloud source, then the connection name, and click Next.
Verify the input format.
Polaris doesn't ingest data older than the late message rejection period (30 days by default). You can use the Kafka event timestamp as the primary timestamp if your events don't include a timestamp or if your event timestamp is older than the late message rejection period. To do so, select Parse Kafka metadata. Alternatively, you can change the late message rejection period in the next step.
Click Continue.
Continue through the load data wizard and configure your ingestion job based on your data and use case.
The Starting offset setting determines how Polaris treats events already sent to the Kafka topic:
- Beginning: Polaris ingests all existing events, including those shown in the preview, as well as future events sent to the topic.
- End: You can preview existing events in the ingestion job, but Polaris only ingests events sent to the topic after the ingestion job begins.
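The two behaviors can be summarized with a small sketch, assuming a topic that already holds some events when the job starts (the event names and helper function are illustrative, not part of Polaris):

```python
def events_to_ingest(existing_events, future_events, starting_offset):
    # "Beginning": ingest everything already in the topic plus new events.
    # "End": skip existing events; ingest only those produced after the
    # ingestion job begins.
    if starting_offset == "Beginning":
        return existing_events + future_events
    if starting_offset == "End":
        return future_events
    raise ValueError(f"unknown starting offset: {starting_offset}")


existing = ["event-1", "event-2"]   # already in the topic when the job starts
future = ["event-3"]                # produced after the job starts

assert events_to_ingest(existing, future, "Beginning") == ["event-1", "event-2", "event-3"]
assert events_to_ingest(existing, future, "End") == ["event-3"]
```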
Click Start ingestion to begin ingestion.
Learn more
See the following topics for more information:
- Connect to Confluent Cloud for information on creating a Confluent Cloud connection in the Polaris UI.
- Ingest data from Confluent Cloud by API for information on using the Connections v1 API and the Jobs v1 API to ingest event data from Confluent Cloud.