Ingestion source reference
This topic summarizes the features for various ingestion sources in Imply Polaris.
For information about usage and configuration options, see the topics for individual ingestion sources. For a conceptual overview of ingestion, see Ingestion sources overview. To learn about Polaris support for AWS PrivateLink and the regions where it is available, see PrivateLink.
Batch ingestion
Polaris supports the following batch ingestion sources.
Feature | Amazon S3 | File Upload | Table-to-table
---|---|---|---
Ingestion method | Read from S3 | Publish files to Polaris file staging; file ingestion is Polaris internal | Polaris internal
Semantics | Atomic | Atomic | Atomic
Supported data types | JSON, CSV / TSV, Parquet, ORC, Avro OCF | JSON, CSV / TSV, Parquet, ORC, Avro OCF | Polaris table
Private networking options | N/A | PrivateLink | N/A
Security for data in transit | TLS by default, user controlled | TLS | Polaris internal
Authentication | IAM roles with short-lived auth keys | API keys, OAuth token-based auth | Polaris internal
Clouds | AWS | All; AWS only if PrivateLink | N/A
AWS regions | All | All from public internet; Polaris-supported AWS regions if PrivateLink | N/A
Good for | High-volume batch ingestion | Getting started quickly, small to medium use cases, trials and POCs | Re-indexing data
Streaming ingestion
Polaris supports the following streaming ingestion sources.
Feature | Confluent Cloud | Amazon Kinesis | Apache Kafka or AWS MSK | Kafka Connector to Apache Kafka or MSK | Events API
---|---|---|---|---|---
Ingestion method | Consume | Consume | Consume | Publish | Publish
Semantics | Exactly once | Exactly once | Exactly once | At least once | At most once (no retries) or at least once (retries)
Supported data types | JSON, CSV / TSV, Avro, Protobuf | JSON, CSV / TSV, Avro, Protobuf | JSON, CSV / TSV, Avro, Protobuf | JSON, CSV / TSV, Avro, Protobuf | JSON, CSV / TSV
Private networking options | PrivateLink | N/A | PrivateLink | PrivateLink | PrivateLink
Security for data in transit | TLS by default, user controlled | TLS by default, user controlled | TLS for SASL/SCRAM; for MSK with IAM, TLS by default, user controlled | TLS | TLS
Authentication | SASL authentication with long-lived auth keys | IAM role assumption with short-lived auth keys | SASL/PLAIN, SASL/SCRAM; MSK also supports IAM role assumption with short-lived auth keys | OAuth token-based auth | API keys, OAuth token-based auth
Clouds | All | AWS | All for Apache Kafka; AWS for MSK | All; AWS only if PrivateLink | All; AWS only if PrivateLink
AWS regions | All | All | All | All from public internet; Polaris-supported AWS regions if PrivateLink | All from public internet; Polaris-supported AWS regions if PrivateLink
Good for | High data volume and high throughput | High data volume and high throughput | High data volume and high throughput | Getting started quickly, small to medium use cases, trials and POCs | Use cases and configurations where you don't want to run your own Kafka, for example IoT
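The two Events API delivery modes differ only in whether the sender retries when it doesn't receive an acknowledgement. The following Python sketch simulates that trade-off with a scripted sequence of network outcomes. The `publish` function and the outcome labels are hypothetical and are not part of the Polaris Events API or any Polaris client; the sketch only illustrates the general at-most-once and at-least-once semantics named in the table.

```python
from typing import Iterator, List

# Hypothetical outcomes for one delivery attempt:
#   "ok"           -> the receiver stored the event and the sender got the ack
#   "lost_request" -> the event never reached the receiver
#   "lost_ack"     -> the receiver stored the event, but the ack was lost
def publish(events: List[str], outcomes: Iterator[str], retry: bool) -> List[str]:
    """Return the events the receiver stored under a scripted failure pattern."""
    stored: List[str] = []
    for event in events:
        while True:
            outcome = next(outcomes)
            if outcome != "lost_request":
                stored.append(event)  # the receiver kept this copy
            if outcome == "ok" or not retry:
                break  # acknowledged, or retries are disabled
    return stored


events = ["e1", "e2"]

# At most once: no retries, so a lost request loses e2 for good.
print(publish(events, iter(["lost_ack", "lost_request"]), retry=False))
# ['e1']

# At least once: retry until acknowledged, so a lost ack duplicates e1.
print(publish(events, iter(["lost_ack", "ok", "lost_request", "ok"]), retry=True))
# ['e1', 'e1', 'e2']
```

In this sketch, enabling retries trades the risk of losing an event for the risk of storing it more than once.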
Schema metadata
Polaris supports the following schema metadata sources.
Feature | Confluent Schema Registry | Schema specification
---|---|---
Ingestion method | Consume metadata | Publish metadata
Supported data types | Confluent Schema Registry | Avro: JSON schema; Protobuf: descriptor file
Private networking options | N/A | PrivateLink
Security for data in transit | TLS by default, user controlled | TLS
Authentication | | API keys, OAuth token-based auth
Clouds | All | All; AWS only if PrivateLink
AWS regions | All | All from public internet; Polaris-supported AWS regions if PrivateLink
Good for | Avro or Protobuf consume ingestion | Avro or Protobuf consume ingestion
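For the schema specification source, an Avro schema is itself a JSON document. The following Python sketch builds a minimal Avro record schema and serializes it to JSON. The record name, namespace, and fields are hypothetical placeholders, and the sketch doesn't show how you publish the schema to Polaris; see the schema specification topic for that.

```python
import json

# Hypothetical event schema; replace the names and fields with your own.
page_view_schema = {
    "type": "record",
    "name": "PageView",
    "namespace": "com.example.events",
    "fields": [
        # Avro logical type: milliseconds since the Unix epoch, stored as a long.
        {"name": "timestamp", "type": {"type": "long", "logicalType": "timestamp-millis"}},
        {"name": "user_id", "type": "string"},
        {"name": "page", "type": "string"},
        # Optional field: a union with "null" listed first so the default can be null.
        {"name": "duration_ms", "type": ["null", "int"], "default": None},
    ],
}

# json.dumps converts Python None to JSON null.
schema_json = json.dumps(page_view_schema, indent=2)
print(schema_json)
```

For Protobuf, the corresponding artifact is a descriptor file, which you can generate from your `.proto` definitions with `protoc --descriptor_set_out=<file>`.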