You can ingest streaming data into a table in Imply Polaris from an external third-party provider. If you want to push event data into Polaris instead, see Batch ingestion.
Streaming ingestion from an external provider requires the following in Polaris:
- A table to receive the ingested data.
- A connection to an external streaming service.
- An ingestion job to bring in data from the connection.
Polaris automatically ingests data as it arrives in the data stream defined in the connection.
Polaris currently supports external connections to Confluent Cloud, a fully managed, cloud-native service for Apache Kafka. The following characteristics apply to ingestion from Confluent Cloud to Polaris:
- Exactly-once guarantees.
- Occurs across the Internet with TLS encryption and SASL authentication.
This topic describes how to establish a connection from Polaris to an external provider and how to ingest data into Polaris from the connection.
To configure ingestion from external providers, you need the appropriate permissions on your Polaris user account as well as information from the external service from which to ingest data.
Polaris users with the ManageFiles role can create and edit connections. Those with the ManageIngestionJobs role can create ingestion jobs for connections. See the User roles reference for more information on roles and their permissions.
You can ingest newline-delimited JSON data in streaming ingestion from external providers. See Supported data and file formats for more information.
Each event timestamp must be within 7 days of ingestion time. Polaris rejects events with timestamps older than 7 days. Consider using batch ingestion if you want to ingest older data.
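If you want to check ahead of time whether your events fall inside this window, the following minimal sketch generates newline-delimited JSON events and flags any whose timestamp is more than 7 days old. The field names are illustrative only, not a schema Polaris requires.

```python
import json
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

# Hypothetical events; the field names are illustrative, not a required schema.
events = [
    {"timestamp": (now - timedelta(hours=1)).isoformat(), "user": "alice", "clicks": 3},
    {"timestamp": (now - timedelta(days=10)).isoformat(), "user": "bob", "clicks": 1},
]

cutoff = now - timedelta(days=7)

for event in events:
    if datetime.fromisoformat(event["timestamp"]) < cutoff:
        print("Older than 7 days; Polaris would reject it:", event["timestamp"])
        continue
    # Newline-delimited JSON: one JSON object per line, no enclosing array.
    print(json.dumps(event))
```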
Stream and connection details
The following details are required from the external provider to create a connection from Polaris.
Connections to external providers require credentials for Polaris to access the external source of data. The connection type determines what credentials are required. Polaris never displays existing credentials.
A Polaris connection to Confluent Cloud requires the following information:
Topic name. The name of the Confluent Cloud topic that contains the event data.
Bootstrap servers. A list of one or more host and port pairs representing the addresses of brokers in the Kafka cluster, in the form host1:port1,host2:port2,... For details on where to find the bootstrap server in Confluent Cloud, see Access cluster settings in the Confluent Cloud Console. You can also use the Confluent Cloud API to read a cluster and find its bootstrap server information in the kafka_bootstrap_endpoint field of the cluster object (see the sketch after this list).
A Confluent Cloud API key with access to Kafka resources. The API key consists of a key and a secret. The Polaris connection takes both the key and the secret to access data through the connection. For information on creating and managing API keys in Confluent Cloud, see Use API Keys to Control Access.
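If you prefer to look up the bootstrap servers programmatically, here is a minimal sketch against the Confluent Cloud API. The /cmk/v2/clusters path, the environment query parameter, and the response shape are assumptions based on Confluent's public API documentation; the cluster and environment IDs are placeholders, and the Cloud API key used here is a separate credential from the Kafka API key you give to the Polaris connection.

```python
import requests

# Assumed Confluent Cloud (CMK v2) endpoint and response shape; verify against
# the Confluent Cloud API docs for your account.
CLOUD_API_KEY = "..."           # Confluent Cloud API key (not the Kafka API key)
CLOUD_API_SECRET = "..."
CLUSTER_ID = "lkc-xxxxxx"       # hypothetical cluster ID
ENVIRONMENT_ID = "env-xxxxxx"   # hypothetical environment ID

resp = requests.get(
    f"https://api.confluent.cloud/cmk/v2/clusters/{CLUSTER_ID}",
    params={"environment": ENVIRONMENT_ID},
    auth=(CLOUD_API_KEY, CLOUD_API_SECRET),
    timeout=30,
)
resp.raise_for_status()
cluster = resp.json()

# The endpoint typically looks like SASL_SSL://pkc-xxxxx.<region>.<provider>.confluent.cloud:9092;
# strip the protocol prefix if you only want host:port pairs for the connection.
print(cluster["spec"]["kafka_bootstrap_endpoint"])
```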
Create a connection
Each connection is associated with a single source of data. If you plan to ingest data from multiple topics or streams, create a new connection for each one.
To create a new connection to Confluent Cloud:
Click Sources from the left navigation menu. You can also add connections when you create a new table or add data to an existing table.
The following screenshot shows the sources view:
Click New source and select Confluent Cloud. Polaris displays a Create connection dialog with form fields specific to the connection.
Provide the following information for the connection:
- Name of the connection in Polaris. The connection name must be unique within your Polaris organization and cannot be changed.
- An optional description of the connection.
- Connection properties specific to the external provider. See Prerequisites.
Verify the connection between Polaris and the external provider before creating the connection. Select Test connection from the Create connection dialog.
The following screenshot shows an example of a successful connection:
Polaris displays a warning if it is unable to connect to the external source; however, you can still save your connection information.
Your connections are displayed by source type on the Sources page. Click the ellipsis icon next to any connection to edit, test, or delete the connection.
Ingest from a connection
This section assumes you already have an existing table for ingesting data. See Introduction to tables for information on how to create a table.
After you create a connection to an external provider in Polaris, you start an ingestion job to import data from the external provider into a Polaris table.
When you start an ingestion job from a Confluent Cloud connection, Polaris consumes data from the Kafka topic defined in the connection.
You can create multiple jobs from a single connection. This allows you to ingest streaming data from the connection into multiple tables. However, a table can only have one associated streaming job.
To create a streaming ingestion job from a connection:
Navigate to Tables from the left pane.
Click the table that you want to receive the ingested data.
For an empty table with no schema, select the Load data tile. Otherwise, click Load data > Add data in the top navigation panel.
Click the Confluent Cloud source type in the left pane.
Select the connection to ingest from and click Next.
Polaris displays a Starting offset dialog. Select one of the following and click Next.
- Beginning: Ingest all of the existing data in the Kafka topic. Note that Polaris only ingests events whose timestamp is within the last seven days.
- End: Ingest only events that arrive after the connection and ingestion job are created.
Map data from your source to the table. Configure which source columns map to which Polaris columns, and apply transformations to the source data if necessary. The following screenshot shows the schema mapping step:
Click Start ingestion. When the streaming job starts, Polaris automatically starts to ingest data from the Confluent Cloud topic specified in the connection.
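To verify the end-to-end flow, you can produce a test event to the topic and watch it appear in the table. Below is a minimal sketch using the confluent-kafka Python client; the bootstrap server, topic name, and event fields are placeholders, and the Kafka API key and secret are the same credentials you supplied to the connection.

```python
import json
from datetime import datetime, timezone

from confluent_kafka import Producer  # pip install confluent-kafka

# Placeholder bootstrap server and credentials; use the values from your
# Confluent Cloud cluster and the Kafka API key given to the Polaris connection.
producer = Producer({
    "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.username": "<KAFKA_API_KEY>",
    "sasl.password": "<KAFKA_API_SECRET>",
})

# Illustrative event; the timestamp is current, so it falls inside the 7-day window.
event = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "user": "alice",
    "clicks": 3,
}

def on_delivery(err, msg):
    if err is not None:
        print("Delivery failed:", err)
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] at offset {msg.offset()}")

producer.produce("my_topic", value=json.dumps(event), callback=on_delivery)
producer.flush()
```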
To stop a streaming ingestion job:
- Navigate to Jobs from the left pane.
- Click the job you want to cancel.
- Select Cancel and confirm the request.
Only users who have permissions to create connections can update and remove connection credentials and delete connections.
You can modify or delete an inactive connection. If the connection has an active ingestion job, cancel the job before you update or delete the connection.
To delete a connection:
- Navigate to Sources from the left pane.
- Click the ellipsis icon next to the connection and select Delete.
- Confirm the request.
Navigate to Jobs from the left pane to monitor the status of your ingestion job from external connections.
To view specific errors related to event ingestion, go to Tables. Click the menu icon on the right side of the row for your table and select View jobs.
You can also run a SQL query against the table to check that new data in the connection source is being ingested.
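For example, a query that returns the row count and the latest event timestamp should show both values growing while the stream is active. The sketch below submits such a query through the Polaris API; the base URL, endpoint path, authorization scheme, and table name are assumptions, so check the Polaris API documentation for your organization's query endpoint.

```python
import requests

# Assumed Polaris SQL query endpoint and auth scheme; verify against the
# Polaris API docs. "demo_table" is a hypothetical table name.
POLARIS_BASE_URL = "https://example-org.us-east-1.aws.api.imply.io"  # placeholder
API_KEY = "..."  # a Polaris API key with query permissions

sql = 'SELECT COUNT(*) AS row_count, MAX(__time) AS latest_event FROM "demo_table"'

resp = requests.post(
    f"{POLARIS_BASE_URL}/v1/projects/default/query/sql",  # assumed path
    headers={"Authorization": f"Basic {API_KEY}"},         # assumed auth scheme
    json={"query": sql},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```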
Go to Streaming to view dashboards that monitor the overall health of your event stream, including:
- Ingest latency
- Events processed
- Events rejected for arriving too late
- Unparseable events
- Rows output
See the following topics for more information:
- Ingest data from external providers via API for details on ingesting data from an external provider using the Polaris API.
- Ingest events with the Aiven HTTP Sink Connector for Kafka for how to configure Kafka Connect to read data from a Kafka topic and publish data to a Polaris table.