
Guide for Amazon Kinesis ingestion

This guide walks through the end-to-end process to ingest data into Imply Polaris from Amazon Kinesis. For information on creating Kinesis connections in Polaris, see Ingest from Amazon Kinesis.

The following diagram summarizes the end-to-end process of connecting to your Amazon Kinesis source and ingesting from it. Shaded boxes represent steps taken within Polaris, and unshaded boxes represent steps taken outside Polaris.

info

The screen captures in this guide show the configurations for Amazon Web Services (AWS) and Kinesis services in February 2024. They may not reflect the current state of the product.

Prerequisites

To complete the steps in this guide, you need the following:

  • An Amazon Kinesis stream containing data to ingest. See Supported formats for requirements on the data format for ingestion.

    info

    Note that setting up ingestion of Kinesis data into Polaris is more straightforward if you already have data in your Kinesis stream. Add data to the stream before you follow the steps in this guide.

  • AWS Identity and Access Management (IAM) permissions to create roles, create policies, and attach policies to roles. See the AWS documentation on Allow users and groups to create and modify roles.

  • Permissions in Polaris to create tables, connections, and ingestion jobs: ManageTables, ManageConnections, and ManageIngestionJobs, respectively. For more information on permissions, visit Permissions reference.

Get Imply's IAM role identifier

In this section, you record the Amazon Resource Name (ARN) and the external ID of Imply's AWS role. When you create a new role in AWS, you include these details to allow Imply to assume your role.

  1. In Imply Polaris, go to Sources > Create source > Amazon Kinesis.

  2. Copy and save the ARN and the external ID of Imply's IAM role in the New connection dialog.

    Polaris new Kinesis connection

Get the Kinesis stream ARN

In this section, you record the Amazon Resource Name (ARN) of your Kinesis stream.

  1. In AWS, search for "Kinesis" and select that service.

    AWS Kinesis service

  2. Click Data streams in the left pane and then click the name of your stream.

  3. Copy and save the ARN of the stream. You'll need it in a later step.

    AWS Kinesis stream info
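Before pasting the stream ARN into a policy, it can help to confirm that you copied it completely. The following is a minimal sketch, assuming the usual data stream ARN layout `arn:PARTITION:kinesis:REGION:ACCOUNT_ID:stream/STREAM_NAME`. The helper name `parse_stream_arn` and the example ARN are hypothetical and not part of Polaris or AWS tooling.

```python
from typing import NamedTuple


class StreamArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    stream_name: str


def parse_stream_arn(arn: str) -> StreamArn:
    """Split a Kinesis data stream ARN into its parts, or raise ValueError."""
    parts = arn.strip().split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kinesis":
        raise ValueError(f"not a Kinesis ARN: {arn!r}")
    resource = parts[5]
    prefix = "stream/"
    if not resource.startswith(prefix) or len(resource) == len(prefix):
        raise ValueError(f"not a data stream resource: {resource!r}")
    return StreamArn(parts[1], parts[3], parts[4], resource[len(prefix):])


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    print(parse_stream_arn(
        "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream"
    ))
```

The parsed region and stream name are the same values you enter later when you create the connection in Polaris.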

Create an AWS permissions policy

In this section, you create a permissions policy that grants permissions to access specific Kinesis resources. When you create a new role in AWS, you attach the permissions policy to the role, so that your role has permission to access the resources.

  1. Navigate to the IAM Dashboard in AWS.

  2. Select Policies in the left sidebar, then click Create policy.

  3. In the Select a service section, select Kinesis.

  4. Click the JSON tab.

  5. Replace the contents in the Policy editor with the following policy. Replace KINESIS ARN with the Kinesis stream ARN you copied in the previous section.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "kinesis:ListStreams",
            "kinesis:DescribeStream"
          ],
          "Resource": ["*"]
        },
        {
          "Effect": "Allow",
          "Action": [
            "kinesis:ListShards",
            "kinesis:GetShardIterator",
            "kinesis:GetRecords"
          ],
          "Resource": ["KINESIS ARN"]
        }
      ]
    }

    The policy editor in the UI should look something like this:

    Polaris new Kinesis policy

  6. Click Next.

  7. Provide a name for the permissions policy, then click Create policy.
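If you prefer to generate the policy document instead of editing it by hand, the substitution in step 5 can be sketched in Python. This is an illustrative helper only: the function name `build_permissions_policy` and the example ARN are assumptions, and the output is the same JSON shown above with your stream ARN filled in.

```python
import json


def build_permissions_policy(stream_arn: str) -> dict:
    """Return the permissions policy from this guide for one Kinesis stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Account-level discovery actions, granted on all resources.
                "Effect": "Allow",
                "Action": ["kinesis:ListStreams", "kinesis:DescribeStream"],
                "Resource": ["*"],
            },
            {
                # Read actions, scoped to the single stream being ingested.
                "Effect": "Allow",
                "Action": [
                    "kinesis:ListShards",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                ],
                "Resource": [stream_arn],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream"
    print(json.dumps(build_permissions_policy(arn), indent=2))
```

You can paste the printed JSON into the Policy editor in place of the default contents.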

Create an AWS IAM role

In this section, you create an AWS IAM role to which you attach the permissions policy to grant access to the Kinesis stream and a trust policy to authorize Imply to assume the role.

  1. Navigate to the IAM Dashboard in AWS.

  2. Select Roles in the left sidebar, then click Create role.

  3. For the trusted entity, select Custom trust policy.

  4. In the Custom trust policy section, AWS provides a template trust policy for your role. In the Principal object, enter the following key-value pair, replacing IMPLY ARN with the Imply ARN you saved in Get Imply's IAM role identifier. This allows Imply's IAM role to assume the role that you create.

    "AWS": "IMPLY ARN"

    AWS IAM role trust policy

  5. In the Edit statement pane, identify the section for Add a condition and click Add.

  6. Complete the following details:

    • Condition key: Select sts:ExternalId.
    • Qualifier: Select Default.
    • Operator: Select StringEquals.
    • Value: Enter the external ID of Imply's IAM role that you saved in Get Imply's IAM role identifier.

    AWS IAM role trust policy

  7. Click Add condition, then click Next.

  8. Now you add permissions to the AWS role. This allows your IAM role to access your Kinesis stream.

    Search for and select the policy name you created in the previous section.

    AWS IAM role permissions

  9. Click Next.

  10. Enter a descriptive value for Role name, review the trust and permissions policies, then click Create role.

  11. Click the role to view its details. Record the ARN of the role to use when you create a connection in Polaris.

    AWS IAM role details
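For reference, the trust policy that results from steps 4 through 7 should resemble the document produced by the sketch below. It assumes the standard AWS trust policy shape, with an `sts:AssumeRole` action and a `StringEquals` condition on `sts:ExternalId`. The function name and the example values are hypothetical; use the ARN and external ID shown in your own New connection dialog.

```python
import json


def build_trust_policy(imply_role_arn: str, external_id: str) -> dict:
    """Return a trust policy that lets one IAM principal assume the role,
    but only when it presents the expected external ID."""
    if not imply_role_arn or not external_id:
        raise ValueError("both the Imply role ARN and the external ID are required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": imply_role_arn},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(
        build_trust_policy(
            "arn:aws:iam::111111111111:role/example-imply-role",
            "example-external-id",
        ),
        indent=2,
    ))
```

Comparing this output with the JSON that AWS shows on the role's Trust relationships tab is a quick way to spot a missing condition or a mistyped principal.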

Create a Kinesis connection

In this section, you create a Kinesis connection in Polaris. First follow the steps in Get Imply's IAM role identifier.

  1. In Imply Polaris, go to Sources > Create source > Amazon Kinesis.

  2. In the New connection dialog, enter the following details:

    • Connection name: A unique name for your connection.
    • Description: An optional description for the connection.
    • Kinesis stream name: The name of the Kinesis stream.
    • AWS endpoint: The endpoint of the Kinesis stream, such as kinesis.us-east-1.amazonaws.com. The Kinesis data stream can be in any AWS region. To find your AWS endpoint, refer to the AWS service endpoints documentation.
    • IAM role ARN: The ARN of the AWS role you created.

    For more details on these fields, see Kinesis connection information.

  3. Click Test connection to ensure that Polaris can make a connection to the Kinesis stream and that the stream contains data.

    Kinesis connection UI
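The AWS endpoint value follows the pattern of the example above, `kinesis.REGION.amazonaws.com`. The sketch below derives it from a region name. It assumes a standard commercial AWS region; other partitions, such as the China regions, use a different domain suffix, so confirm the result against the AWS service endpoints documentation. The helper name `kinesis_endpoint` is hypothetical.

```python
import re

# Loose shape check for region names such as us-east-1 or ap-southeast-2.
_REGION_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def kinesis_endpoint(region: str) -> str:
    """Return the Kinesis endpoint host for a standard commercial region."""
    region = region.strip().lower()
    if not _REGION_PATTERN.match(region):
        raise ValueError(f"unexpected region name: {region!r}")
    return f"kinesis.{region}.amazonaws.com"


if __name__ == "__main__":
    print(kinesis_endpoint("us-east-1"))
```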

Start an ingestion job

In this section, you create an ingestion job to add data from your Kinesis stream into a table in Polaris.

info

In this guide, Polaris automatically creates the table based on details in the job definition. For greater control over table properties such as partitioning or schema enforcement, create the table manually before starting your first ingestion job. For details, see Introduction to tables.

  1. In Imply Polaris, go to Jobs > Create job > Insert data.

  2. Click New table.

  3. Enter a name for the table, and click Next.

  4. Select the Amazon Kinesis source, then the connection name, and click Next.

    Select source

  5. Verify the input format and fields in the parsed data and click Continue.

  6. Continue through the load data wizard and configure your ingestion job based on your data and use case.

    i. Polaris doesn't ingest data older than the late message rejection period (30 days by default). You can change this rejection period or add a rejection period to filter out events with timestamps in the future.

    ii. The Starting offset setting determines how Polaris handles events that are already in the Kinesis stream:

    • Beginning: Polaris ingests all events shown in the preview as well as future events sent to the stream.
    • End: Polaris ingests only events sent to the stream after the ingestion job begins. You can still preview the existing events in the ingestion job.

  7. Click Start ingestion to begin ingestion.
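To make the rejection periods in step 6 concrete, the following sketch shows the kind of timestamp check they describe. It is a conceptual illustration only, not Polaris's implementation: the function name, parameter names, and the exact boundary handling are assumptions. The 30-day default comes from this guide.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_accepted(
    event_time: datetime,
    now: datetime,
    late_period: timedelta = timedelta(days=30),
    early_period: Optional[timedelta] = None,
) -> bool:
    """Return True if an event timestamp falls inside the accepted window."""
    # Reject events older than the late message rejection period.
    if event_time < now - late_period:
        return False
    # Optionally reject events stamped too far in the future.
    if early_period is not None and event_time > now + early_period:
        return False
    return True


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for age_days in (1, 29, 31):
        event_time = now - timedelta(days=age_days)
        print(age_days, "days old ->", is_accepted(event_time, now))
```

With the defaults, an event stamped 31 days ago is filtered out while one stamped 29 days ago is kept. If historical data in your stream seems to be missing after ingestion, the late message rejection period is one of the first settings to check.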

Learn more

See the following topics for more information: