Loading data (Kinesis)

The Druid Kinesis indexing service ingests data from Amazon Kinesis. It offers exactly-once ingestion guarantees and can ingest historical data. It runs as part of the core Druid services and does not require any additional processes.

The Kinesis indexing service uses supervisors, which run on the Overlord and manage the creation and lifetime of Kinesis indexing tasks. Because ingestion does not depend on event time, the service can handle non-recent events while still providing exactly-once ingestion semantics.

Starting with Imply 2.9.0, the bundled Kinesis indexing extension is a revamped open source edition that Imply has also contributed upstream to Apache Druid. There are some differences between the older Imply-proprietary extension and our newer open source version. For full details, please refer to the Imply release notes.

Submitting a Supervisor Spec

The Kinesis indexing service requires that the druid-kinesis-indexing-service extension be loaded on an Imply Cloud cluster.

A supervisor for a datasource is started by submitting a supervisor spec via the AWS Kinesis option under Continuous ingestion in the + Add Datasets view of a cluster:

[Screenshot: + Add Datasets]

[Screenshot: AWS Kinesis]

A supervisor spec can also be submitted via HTTP POST to http://<OVERLORD_IP>:<OVERLORD_PORT>/druid/indexer/v1/supervisor, for example:

curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http://localhost:8090/druid/indexer/v1/supervisor
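
The same request can be sketched in Python using only the standard library. This is a minimal illustration rather than an official client: the function names `build_supervisor_request` and `submit_supervisor` are hypothetical, the default `http://localhost:8090` simply mirrors the curl example above, and the response is returned as parsed JSON without assuming its exact shape.

```python
import json
import urllib.request

# Endpoint path taken from the documentation above.
SUPERVISOR_PATH = "/druid/indexer/v1/supervisor"


def build_supervisor_request(spec, overlord_url="http://localhost:8090"):
    """Build (but do not send) the POST request for a supervisor spec.

    `spec` is the supervisor spec as a Python dict; `overlord_url` is an
    assumed Overlord base URL and should be adjusted for your cluster.
    """
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        overlord_url.rstrip("/") + SUPERVISOR_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_supervisor(spec, overlord_url="http://localhost:8090", timeout=30):
    """Send the spec to the Overlord and return the parsed JSON response."""
    request = build_supervisor_request(spec, overlord_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To use it, load the spec file with `json.load(open("supervisor-spec.json"))` and pass the resulting dict to `submit_supervisor`. Separating request construction from sending makes it possible to inspect the URL and headers before anything is sent to the cluster.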

A sample supervisor spec is shown below:

{
  "type": "kinesis",
  "dataSchema": {
    "dataSource": "metrics-kinesis",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": [],
          "dimensionExclusions": [
            "timestamp",
            "value"
          ]
        }
      }
    },
    "metricsSpec": [
      {
        "name": "count",
        "type": "count"
      },
      {
        "name": "value_sum",
        "fieldName": "value",
        "type": "doubleSum"
      },
      {
        "name": "value_min",
        "fieldName": "value",
        "type": "doubleMin"
      },
      {
        "name": "value_max",
        "fieldName": "value",
        "type": "doubleMax"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE"
    }
  },
  "tuningConfig": {
    "type": "kinesis",
    "maxRowsPerSegment": 5000000
  },
  "ioConfig": {
    "stream": "metrics",
    "endpoint": "kinesis.us-east-1.amazonaws.com",
    "taskCount": 1,
    "taskDuration": "PT1H"
  }
}
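
To make the parse spec more concrete, the sketch below builds a sample record that the spec could parse and shows which fields would end up as dimensions. It is a simplified illustration, not Druid's actual implementation: because `dimensions` is empty, Druid discovers dimensions from the incoming data and skips the timestamp column and anything listed in `dimensionExclusions`. The field names `host` and `region` and the helper `discovered_dimensions` are hypothetical.

```python
import json


def discovered_dimensions(record, exclusions, timestamp_column="timestamp"):
    """Approximate which top-level fields become dimensions.

    Simplified model: every field that is neither the timestamp column
    nor listed in the exclusions is treated as a dimension.
    """
    excluded = set(exclusions) | {timestamp_column}
    return sorted(name for name in record if name not in excluded)


# A hypothetical event as it might be written to the "metrics" stream.
sample_record = {
    "timestamp": "2024-01-01T00:00:00Z",  # read by timestampSpec
    "host": "web-1",                      # hypothetical dimension
    "region": "us-east-1",                # hypothetical dimension
    "value": 12.5,                        # feeds value_sum, value_min, value_max
}

# The JSON string that a producer would put on the Kinesis stream.
payload = json.dumps(sample_record)

# Exclusions copied from the sample spec's dimensionExclusions.
dims = discovered_dimensions(sample_record, ["timestamp", "value"])
print(dims)  # ['host', 'region']
```

Excluding `value` from the dimensions keeps it from being stored as a string column, since it is already aggregated by the metrics in `metricsSpec`.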

More information

Please refer to Druid's Kinesis indexing service documentation for more details.
