Ingestion source reference

This topic summarizes the features of the various ingestion sources in Imply Polaris.

For information about usage and configuration options, see the topics for the individual ingestion sources. For a conceptual overview of ingestion, see Ingestion sources overview. To learn which AWS regions Polaris supports for AWS PrivateLink, see PrivateLink.

Batch ingestion

Polaris supports the following batch ingestion sources.

Amazon S3

  • Ingestion method: Read from S3
  • Semantics: Atomic
  • Supported data types: JSON, CSV / TSV, Parquet, ORC, Avro OCF
  • Private networking options: N/A
  • Security for data in transit: TLS by default, user controlled
  • Authentication: IAM roles with short-lived auth keys
  • Clouds: AWS
  • AWS regions: All
  • Good for: High-volume batch ingestion

File upload

  • Ingestion method: Publish files to Polaris file staging; file ingestion is Polaris internal
  • Semantics: Atomic
  • Supported data types: JSON, CSV / TSV, Parquet, ORC, Avro OCF
  • Private networking options: PrivateLink
  • Security for data in transit: TLS
  • Authentication: API keys, OAuth token-based auth
  • Clouds: All; AWS only with PrivateLink
  • AWS regions: All from the public internet; Polaris-supported AWS regions with PrivateLink
  • Good for: Getting started quickly, small to medium use cases, trials and POCs

Table-to-table

  • Ingestion method: Polaris internal
  • Semantics: Atomic
  • Supported data types: Polaris table
  • Private networking options: N/A
  • Security for data in transit: Polaris internal
  • Authentication: Polaris internal
  • Clouds: N/A
  • AWS regions: N/A
  • Good for: Re-indexing data
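
For example, the file upload workflow above maps to two API calls: stage the file, then submit a batch job that references it. The following Python sketch is illustrative only: the base URL shape, endpoint paths, and job-spec field names are assumptions (recent Polaris API versions may also scope these paths by project), so confirm them against the Polaris API reference before use.

```python
# Minimal sketch: stage a local file in Polaris, then submit a batch
# ingestion job that reads it into a table. All endpoint paths and
# job-spec field names are assumptions; verify them against the
# Polaris API reference.
import os

import requests

BASE = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io"  # assumed URL shape
HEADERS = {"Authorization": f"Basic {os.environ['POLARIS_API_KEY']}"}  # API-key auth

# 1. Publish the file to Polaris file staging (TLS in transit).
with open("events.json", "rb") as f:
    upload = requests.post(
        f"{BASE}/v1/files",  # assumed staging endpoint
        headers=HEADERS,
        files={"file": ("events.json", f)},
    )
upload.raise_for_status()

# 2. Submit an atomic batch job against the staged file.
job_spec = {  # illustrative field names, not a verified contract
    "type": "batch",
    "target": {"type": "table", "tableName": "example_table"},
    "source": {
        "type": "uploaded",
        "fileList": ["events.json"],
        "formatSettings": {"format": "nd-json"},
    },
}
job = requests.post(f"{BASE}/v2/jobs", headers=HEADERS, json=job_spec)
job.raise_for_status()
print("Started job:", job.json().get("id"))
```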

Streaming ingestion

Polaris supports the following streaming ingestion sources.

Confluent Cloud

  • Ingestion method: Consume
  • Semantics: Exactly once
  • Supported data types: JSON, CSV / TSV, Avro, Protobuf
  • Private networking options: PrivateLink
  • Security for data in transit: TLS by default, user controlled
  • Authentication: SASL authentication with long-lived auth keys
  • Clouds: All
  • AWS regions: All
  • Good for: High data volume and high throughput

Amazon Kinesis

  • Ingestion method: Consume
  • Semantics: Exactly once
  • Supported data types: JSON, CSV / TSV, Avro, Protobuf
  • Private networking options: N/A
  • Security for data in transit: TLS by default, user controlled
  • Authentication: IAM role assumption with short-lived auth keys
  • Clouds: AWS
  • AWS regions: All
  • Good for: High data volume and high throughput

Apache Kafka or Amazon MSK

  • Ingestion method: Consume
  • Semantics: Exactly once
  • Supported data types: JSON, CSV / TSV, Avro, Protobuf
  • Private networking options: PrivateLink
  • Security for data in transit: TLS for SASL/SCRAM; TLS by default for MSK with IAM, user controlled
  • Authentication: SASL/PLAIN, SASL/SCRAM; MSK also supports IAM role assumption with short-lived auth keys
  • Clouds: All for Apache Kafka; AWS for MSK
  • AWS regions: All
  • Good for: High data volume and high throughput

Kafka Connector to Apache Kafka or MSK

  • Ingestion method: Publish
  • Semantics: At least once
  • Supported data types: JSON, CSV / TSV, Avro, Protobuf
  • Private networking options: PrivateLink
  • Security for data in transit: TLS
  • Authentication: OAuth token-based auth
  • Clouds: All; AWS only with PrivateLink
  • AWS regions: All from the public internet; Polaris-supported regions with PrivateLink
  • Good for: Getting started quickly, small to medium use cases, trials and POCs

Events API

  • Ingestion method: Publish
  • Semantics: At most once (no retries) or at least once (with retries)
  • Supported data types: JSON, CSV / TSV
  • Private networking options: PrivateLink
  • Security for data in transit: TLS
  • Authentication: API keys, OAuth token-based auth
  • Clouds: All; AWS only with PrivateLink
  • AWS regions: All from the public internet; Polaris-supported regions with PrivateLink
  • Good for: Use cases and configurations where you don't want to own Kafka, for example, IoT
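
The two delivery semantics listed for the Events API come down to client behavior: a single POST with no retry is at most once, while retrying failed POSTs yields at least once. The sketch below shows the retrying variant in Python; the endpoint path is an assumed shape and the payload format is newline-delimited JSON, so take the exact push URL for your connection from the Polaris console or API reference.

```python
# Sketch of at-least-once delivery to the Events API: retry failed POSTs
# with exponential backoff. The endpoint path below is an assumed shape;
# look up the real push URL for your connection in Polaris.
import json
import os
import time

import requests

EVENTS_URL = (
    "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io"
    "/v1/events/CONNECTION_NAME"  # assumed push-endpoint shape
)
HEADERS = {
    "Authorization": f"Basic {os.environ['POLARIS_API_KEY']}",  # or a Bearer OAuth token
    "Content-Type": "application/json",
}


def push_events(events, max_retries=5):
    """POST newline-delimited JSON events, retrying for at-least-once delivery."""
    body = "\n".join(json.dumps(event) for event in events)
    for attempt in range(max_retries):
        resp = requests.post(EVENTS_URL, headers=HEADERS, data=body)
        if resp.ok:
            return
        time.sleep(2**attempt)  # back off, then retry
    resp.raise_for_status()  # out of retries: surface the last error


push_events([{"__time": "2023-01-01T00:00:00Z", "sensor": "a1", "value": 7.0}])
```

Note that the retries are exactly what makes this at least once: a request that succeeded server-side but timed out on the client gets sent again, so downstream consumers should tolerate occasional duplicate events.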

Schema metadata

Polaris supports the following schema metadata sources.

Confluent Schema Registry

  • Ingestion method: Consume metadata
  • Supported data types: Confluent Schema Registry
  • Private networking options: N/A
  • Security for data in transit: TLS by default, user controlled
  • Authentication:
  • Clouds: All
  • AWS regions: All
  • Good for: Avro or Protobuf consume ingestion

Schema specification

  • Ingestion method: Publish metadata
  • Supported data types: Avro: JSON schema; Protobuf: descriptor file
  • Private networking options: PrivateLink
  • Security for data in transit: TLS
  • Authentication: API keys, OAuth token-based auth
  • Clouds: All; AWS only with PrivateLink
  • AWS regions: All from the public internet; Polaris-supported regions with PrivateLink
  • Good for: Avro or Protobuf consume ingestion
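
As a sketch of the schema specification path, the following Python snippet attaches an inline Avro schema (published as JSON) to a hypothetical streaming job spec rather than consuming it from Confluent Schema Registry. The endpoint path and every job-spec field name here are assumptions, so check the Polaris API reference for the actual contract.

```python
# Sketch: supply an inline Avro schema ("schema specification") with a
# streaming job instead of reading it from Confluent Schema Registry.
# The endpoint path and all job-spec field names are assumptions;
# confirm the real contract in the Polaris API reference.
import os

import requests

BASE = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io"  # assumed URL shape
HEADERS = {"Authorization": f"Basic {os.environ['POLARIS_API_KEY']}"}

avro_schema = {  # the Avro schema you publish alongside the job
    "type": "record",
    "name": "SensorReading",
    "fields": [
        {"name": "__time", "type": "long"},
        {"name": "sensor", "type": "string"},
        {"name": "value", "type": "double"},
    ],
}

job_spec = {  # illustrative streaming job spec, not a verified contract
    "type": "streaming",
    "target": {"type": "table", "tableName": "sensor_readings"},
    "source": {
        "type": "connection",
        "connectionName": "my_kafka_connection",  # hypothetical connection
        "formatSettings": {"format": "avro", "schema": avro_schema},
    },
}
resp = requests.post(f"{BASE}/v2/jobs", headers=HEADERS, json=job_spec)
resp.raise_for_status()
print("Started streaming job:", resp.json().get("id"))
```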