
Connect to Amazon Kinesis

To ingest data into Druid from Amazon Kinesis, you can use the Druid Kinesis indexing service which:

  • Reads events using Kinesis's own shard and sequence number mechanisms to guarantee exactly-once ingestion.
  • Provides the ability to ingest historical data.

This tutorial walks you through the steps to:

  • Configure Druid to load the Kinesis indexing service.
  • Supply your AWS Kinesis connection credentials for Druid.
  • Create an ingestion spec where you can specify your AWS Kinesis endpoint and stream name. For more details about the supervisor, see Amazon Kinesis ingestion.

Before starting

This tutorial assumes that you have already set up an Imply cluster and that you have the privileges to modify the cluster configuration. See the Quickstart for more information.

The tutorial also assumes that you have an AWS account and are able to set up or access a Kinesis stream and load data into it. If you load the Wikipedia sample data or load the Wikimedia IRC data into your stream, you can use your Kinesis data source to continue with data parsing steps as part of the Data Ingestion tutorial.

Before you configure Druid to connect to Kinesis, collect the following information needed to complete the setup:

  • The AWS Kinesis endpoint where your stream resides, for example, kinesis.us-east-1.amazonaws.com.
  • The Kinesis stream name to read from.
  • Information needed to access the stream. For example, an AWS Access Key and AWS Secret Access Key.
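
The values collected above can be kept together in one small record before you change any Druid configuration. The following is a minimal sketch in Python, not part of Druid or Imply: the KinesisConnection class and its field names are hypothetical, the stream name is a placeholder, and the endpoint pattern assumes a standard commercial AWS region like the one in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KinesisConnection:
    """Hypothetical container for the values this tutorial asks you to collect."""

    region: str  # for example "us-east-1"
    stream: str  # the Kinesis stream name to read from

    @property
    def endpoint(self) -> str:
        # Assumption: standard commercial-region endpoint pattern.
        # Other AWS partitions (for example the China regions) use a different suffix.
        return f"kinesis.{self.region}.amazonaws.com"


# "wikipedia" is a placeholder stream name.
conn = KinesisConnection(region="us-east-1", stream="wikipedia")
print(conn.endpoint)
```

Credentials are deliberately left out of the record so that they are not written down alongside non-secret settings.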

Configure Druid

To ingest data from AWS Kinesis, update the Druid configuration to load the Kinesis indexing service and to supply your AWS credentials.

Load the Kinesis indexing service

Kinesis data ingestion requires the Druid Kinesis indexing service on the Overlord and the Middle Managers. If you use Imply Hybrid (formerly Imply Cloud), the extension for the Kinesis indexing service is loaded automatically.

If you're running the unmanaged quickstart Imply cluster, edit the Druid common.runtime.properties file to load the druid-kinesis-indexing-service extension.

  1. Open the runtime properties configuration file at <imply_home>/conf-quickstart/druid/_common/common.runtime.properties.
  2. Add druid-kinesis-indexing-service to the list of extensions to load. For example:
druid.extensions.loadList=["druid-histogram", "druid-datasketches", "druid-kafka-indexing-service", "imply-utility-belt", "druid-kinesis-indexing-service"]

For more information, see Amazon Kinesis ingestion.
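
If you would rather script the load list change than edit the file by hand, the step can be sketched as follows. This is an illustration only: the add_extension helper is hypothetical, and it assumes that druid.extensions.loadList sits on a single uncommented line whose value is a JSON array, as in the example above.

```python
import json
import re

EXTENSION = "druid-kinesis-indexing-service"

# Matches an uncommented loadList line and captures the key part and the JSON array.
LOAD_LIST = re.compile(r"^(druid\.extensions\.loadList[ \t]*=[ \t]*)(\[.*\])[ \t]*$", re.MULTILINE)


def add_extension(properties_text: str, extension: str = EXTENSION) -> str:
    """Return the properties text with `extension` present in the load list."""
    match = LOAD_LIST.search(properties_text)
    if match is None:
        raise ValueError("druid.extensions.loadList not found")
    extensions = json.loads(match.group(2))  # the value is a JSON array of strings
    if extension not in extensions:  # leave the list unchanged if already present
        extensions.append(extension)
    new_line = match.group(1) + json.dumps(extensions)
    return properties_text[: match.start()] + new_line + properties_text[match.end():]


# Demonstration on an in-memory string rather than the real file.
sample = 'druid.extensions.loadList=["druid-histogram", "druid-datasketches"]\n'
print(add_extension(sample), end="")
```

To apply it to a real cluster, read the contents of common.runtime.properties, pass them through add_extension, and write the result back. Keep a backup of the original file.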

Keep your common runtime properties file open. You'll make more changes in the next step.

Supply your AWS credentials

Druid requires AWS credentials to access the Kinesis API.

If you are using Imply on-prem, you can add the following properties to the list of Druid common runtime properties:

  • druid.kinesis.accessKey: the access key for AWS.
  • druid.kinesis.secretKey: the secret access key for AWS.

For example, append the following to common.runtime.properties, replacing <AWS Access Key> and <AWS Secret Key> with your credentials:

#
# Kinesis AWS Credentials
#

druid.kinesis.accessKey=<AWS Access Key>
druid.kinesis.secretKey=<AWS Secret Key>
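
To avoid typing secrets into an editor, the same lines can be generated from environment variables. The sketch below is one possible approach, not an Imply tool: kinesis_credential_properties is a hypothetical helper, and it assumes that the conventional AWS variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set in the shell that runs it.

```python
import os


def kinesis_credential_properties(env=os.environ) -> str:
    """Build the two Druid credential properties from an environment mapping."""
    access_key = env["AWS_ACCESS_KEY_ID"]  # raises KeyError if the variable is unset
    secret_key = env["AWS_SECRET_ACCESS_KEY"]
    return (
        "#\n# Kinesis AWS Credentials\n#\n\n"
        f"druid.kinesis.accessKey={access_key}\n"
        f"druid.kinesis.secretKey={secret_key}\n"
    )


# Demonstration with placeholder values instead of real credentials.
demo_env = {
    "AWS_ACCESS_KEY_ID": "<AWS Access Key>",
    "AWS_SECRET_ACCESS_KEY": "<AWS Secret Key>",
}
print(kinesis_credential_properties(demo_env), end="")
```

Calling the function with no argument reads the real environment. The output can then be appended to common.runtime.properties.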

If you use Imply Hybrid, you can set the properties in the Imply Manager:

  1. From the home page, click Manage for the cluster to connect to Kinesis.

  2. Navigate to Setup and expand the Advanced config options.

  3. Under service properties, add the runtime properties and supply your AWS credentials. For example:

    (Screenshot: Imply Manager service configuration)

  4. Click Apply changes.

In your production environment, be sure to follow your company's security policies when configuring AWS access.

Restart Druid

Restart your Druid cluster after making changes. If you are running Imply Hybrid, the Imply Manager UI lets you restart your cluster immediately after applying changes. If you are running an unmanaged cluster, restart it manually.

Create an ingestion spec

Use the Druid Load data UI to start a supervisor to access your Kinesis stream.

  1. Click Start a new spec.
  2. On the start tab, select Amazon Kinesis and click Connect data.
  3. Configure the Stream name and AWS Endpoint to connect to your Kinesis stream. If required, add the AWS assumed role ARN or AWS external ID for your stream.
  4. Click Apply.

Druid connects to your stream and loads sample records into the UI. For example:

(Screenshot: Kinesis data loader)

You can also submit an ingestion spec to the indexer API using the command line. See the Kinesis indexing service documentation for more details.
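
As a rough illustration of the API route, the sketch below builds a minimal Kinesis supervisor spec and posts it to the Overlord's supervisor endpoint (/druid/indexer/v1/supervisor). Treat it as a starting point under stated assumptions: both helper functions are hypothetical, the data source name, timestamp column, and granularity settings are placeholders, and the spec layout should be checked against the Kinesis ingestion reference for your Druid version.

```python
import json
import urllib.request


def build_kinesis_supervisor_spec(stream, endpoint, datasource, timestamp_column="timestamp"):
    """Return a minimal supervisor spec as a dict (field values are placeholders)."""
    return {
        "type": "kinesis",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "kinesis",
                "stream": stream,      # the Kinesis stream name
                "endpoint": endpoint,  # for example kinesis.us-east-1.amazonaws.com
                "inputFormat": {"type": "json"},
                "useEarliestSequenceNumber": True,  # start from the oldest available records
            },
            "tuningConfig": {"type": "kinesis"},
        },
    }


def submit_supervisor(spec, overlord_url):
    """POST the spec to the Overlord (or Router) at overlord_url and return the parsed reply."""
    request = urllib.request.Request(
        f"{overlord_url}/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


spec = build_kinesis_supervisor_spec("wikipedia", "kinesis.us-east-1.amazonaws.com", "wikipedia")
print(json.dumps(spec, indent=2))
```

The example only prints the spec. To start the supervisor, call submit_supervisor(spec, "<your Overlord or Router URL>") against a running cluster. If your stream requires an assumed role, the corresponding ioConfig settings would also need to be added.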

Next steps

If you are using the Druid Load data UI, you can continue with the data parsing steps to set up your ingestion spec. See Data Ingestion.

For more detail about Kinesis ingestion, see Amazon Kinesis ingestion.