Guide for Amazon S3 ingestion

This guide walks through the end-to-end process to ingest data into Imply Polaris from Amazon S3. For information on creating S3 connections in Polaris, see Connect to S3.

The following diagram summarizes the end-to-end process of connecting to your S3 source and ingesting from it. Shaded boxes represent steps taken within Polaris, and unshaded boxes represent steps taken outside Polaris.

The screen captures in this guide show the configurations for Amazon services as of October 2023. They may not reflect the current state of the product.

Prerequisites

To complete the steps in this guide, you need the following:

  • An Amazon S3 bucket containing objects to ingest. See Supported formats for requirements on the data format for ingestion.

  • AWS Identity and Access Management (IAM) permissions to create roles, create policies, and attach policies to roles. See the AWS documentation on Allow users and groups to create and modify roles.

  • Permissions in Polaris to create tables, connections, and ingestion jobs: ManageTables, ManageConnections, and ManageIngestionJobs, respectively. For more information on permissions, visit Permissions reference.

Get Imply's IAM role identifier

In this section, you record the Amazon Resource Name (ARN) and the external ID of Imply's AWS role. When you create a new role in AWS, you include these details to allow Imply to assume your role.

  1. In Imply Polaris, go to Sources > Create source > Amazon S3.

  2. Copy and save the ARN and the external ID of Imply's IAM role in the New connection dialog.

    Polaris new S3 connection

Get the S3 bucket ARN

In this section, you record the Amazon Resource Name (ARN) of your S3 bucket.

  1. In AWS, search for "S3" and select that service.

    AWS S3 service

  2. Click Buckets in the left pane and then click the name of your bucket.

  3. Click the Properties tab and copy and save the ARN of the bucket. You'll need it in a later step.

    AWS S3 bucket info
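If you know the bucket name, you can cross-check the ARN you copied from the console. The following is a minimal sketch, assuming a bucket in the standard `aws` partition; the function name `bucket_arn` and the bucket name `example-bucket` are placeholders for illustration, not part of Polaris or AWS tooling.

```python
def bucket_arn(bucket_name: str, partition: str = "aws") -> str:
    """Build the ARN for an S3 bucket.

    S3 bucket ARNs leave the region and account ID fields empty,
    which is why three colons follow "s3".
    """
    if not bucket_name or "/" in bucket_name:
        raise ValueError(f"expected a bare bucket name, got {bucket_name!r}")
    return f"arn:{partition}:s3:::{bucket_name}"


# "example-bucket" is a placeholder; substitute your own bucket name.
print(bucket_arn("example-bucket"))  # prints arn:aws:s3:::example-bucket
```

The value printed should match the ARN shown on the Properties tab of your bucket.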

Create an AWS permissions policy

In this section, you create a permissions policy that grants permissions to access specific S3 resources. When you create a new role in AWS, you attach the permissions policy to the role, so that your role has permission to access the resources.

  1. Navigate to the IAM Dashboard in Amazon Web Services (AWS).

  2. Select Policies in the left sidebar, then click Create policy.

  3. In the Select a service section, select S3.

  4. Click the JSON tab.

  5. Replace the contents in the Policy editor with the following policy. Replace S3 ARN with the S3 bucket ARN you copied in the previous section.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:ListBucket"
          ],
          "Resource": [
            "S3 ARN"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject"
          ],
          "Resource": [
            "S3 ARN/*"
          ]
        }
      ]
    }

    The policy editor in the UI should look something like this:

    Polaris new S3 policy

  6. Click Next.

  7. Provide a name for the permissions policy, then click Create policy.
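To avoid hand-editing the two Resource entries, you could generate the policy document from the bucket ARN. This is a sketch only: the helper name `build_permissions_policy` and the ARN `arn:aws:s3:::example-bucket` are illustrative assumptions, and the output is meant to be pasted into the Policy editor rather than submitted to AWS by the script.

```python
import json


def build_permissions_policy(bucket_arn: str) -> dict:
    """Return the permissions policy from this guide for one bucket.

    s3:ListBucket applies to the bucket itself, while s3:GetObject
    applies to the objects inside it, hence the "/*" suffix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


# Placeholder ARN; replace it with the bucket ARN you saved earlier.
policy = build_permissions_policy("arn:aws:s3:::example-bucket")
print(json.dumps(policy, indent=2))
```

Keeping the bucket-level and object-level statements separate mirrors the policy above: listing is granted on the bucket ARN, and reading is granted on the objects under it.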

Create an AWS IAM role

In this section, you create an AWS IAM role. You attach the permissions policy to grant the role access to your S3 resources, and you define a trust policy that authorizes Imply to assume the role.

  1. Navigate to the IAM Dashboard in Amazon Web Services (AWS).

  2. Select Roles in the left sidebar, then click Create role.

  3. For the trusted entity, select Custom trust policy.

  4. In the Custom trust policy section, AWS provides a template trust policy for your role. In the Principal object, enter the following key-value pair. Replace IMPLY ARN with the ARN you saved in Get Imply's IAM role identifier. This allows Imply's IAM role to assume the role that you create.

    "AWS": "IMPLY ARN"

    AWS IAM role trust policy

  5. In the Edit statement pane, identify the section for Add a condition and click Add.

  6. Complete the following details:

    • Condition key: Select sts:ExternalId.
    • Qualifier: Select Default.
    • Operator: Select StringEquals.
    • Value: Enter the external ID of Imply's IAM role that you saved in Get Imply's IAM role identifier.

    AWS IAM role add condition complete

  7. Click Add condition, then click Next.

  8. Now you add permissions to the AWS role. This allows your IAM role to access your S3 bucket.

    Search for and select the policy name you created in the previous section.

    AWS IAM role permissions

  9. Click Next.

  10. Enter a descriptive value for Role name, review the trust and permissions policies, then click Create role.

  11. Click the role to view its details. Record the ARN of the role to use when you create a connection in Polaris.

    AWS IAM role details
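After you complete the principal and condition steps above, the trust policy should resemble the document produced by the following sketch. It assumes the standard AWS trust policy layout with the `sts:AssumeRole` action; the helper name `build_trust_policy`, the role ARN, and the external ID are placeholders, not values issued by Imply.

```python
import json


def build_trust_policy(imply_role_arn: str, external_id: str) -> dict:
    """Return a trust policy that lets one principal assume the role,
    but only when it presents the expected external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": imply_role_arn},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


# Both arguments are placeholders; use the ARN and external ID
# you copied from the New connection dialog in Polaris.
trust = build_trust_policy(
    "arn:aws:iam::111122223333:role/example-imply-role",
    "example-external-id",
)
print(json.dumps(trust, indent=2))
```

Comparing this output with the policy shown in the AWS console is a quick way to confirm that the principal and the external ID condition were both entered.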

Create an S3 connection

In this section, you create an Amazon S3 connection in Polaris. First follow the steps in Get Imply's IAM role identifier.

  1. In Imply Polaris, go to Sources > Create source > Amazon S3.

  2. In the New connection dialog, enter the following details:

    • Connection name: A unique name for your connection.
    • Description: An optional description for the connection.
    • Bucket name: Name of the S3 bucket to ingest data from.
    • Prefix: An optional prefix to limit access to certain files in the bucket.
    • AWS endpoint: The endpoint of the S3 service, such as s3.us-east-1.amazonaws.com.
    • IAM role ARN: The ARN of the AWS role you created.

S3 connection UI

For more details on these fields, see S3 connection information.
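Before you fill in the dialog, it can help to collect the values in one place and check them for common mix-ups, such as pasting the bucket ARN where the bucket name belongs. The sketch below is purely illustrative: the class `S3ConnectionDetails` and its field names are hypothetical and do not represent the Polaris API, and the checks are assumptions based on the formats shown in this guide.

```python
import re
from dataclasses import dataclass


@dataclass
class S3ConnectionDetails:
    """Hypothetical container mirroring the New connection dialog fields."""
    connection_name: str
    bucket_name: str
    aws_endpoint: str
    iam_role_arn: str
    description: str = ""  # optional
    prefix: str = ""       # optional

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means none were found."""
        issues = []
        if not self.connection_name:
            issues.append("Connection name is required.")
        if self.bucket_name.startswith("arn:"):
            issues.append("Bucket name should be the plain name, not the ARN.")
        # IAM role ARNs look like arn:aws:iam::<12-digit account>:role/<name>.
        if not re.fullmatch(r"arn:aws[\w-]*:iam::\d{12}:role/.+", self.iam_role_arn):
            issues.append("IAM role ARN does not look like a role ARN.")
        return issues


# All values below are placeholders.
details = S3ConnectionDetails(
    connection_name="example-connection",
    bucket_name="example-bucket",
    aws_endpoint="s3.us-east-1.amazonaws.com",
    iam_role_arn="arn:aws:iam::111122223333:role/example-polaris-role",
)
print(details.problems())  # prints []
```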

Start an ingestion job

In this section, you create an ingestion job to add data from the S3 connection into a table in Polaris.

In this guide, Polaris automatically creates the table based on details in the job definition. For greater control over table properties such as partitioning or schema enforcement, create the table manually before starting your first ingestion job. For details, see Introduction to tables.

  1. In Imply Polaris, go to Jobs > Create job > Insert data.

  2. Click New table.

  3. Enter a name for the table, and click Next.

  4. Select the Amazon S3 source, then the connection name. Polaris lists the objects in the bucket that it has permission to view, based on the s3:ListBucket permission in your policy and the connection prefix.

    Select source

  5. Select the file to ingest, then click Next.

  6. Verify the input format and fields in the parsed data and click Continue.

  7. Continue through the load data wizard and configure your ingestion job based on your data and use case.

  8. Click Start ingestion to launch the job.
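If the file list in step 4 looks shorter than expected, the connection prefix is a likely cause. S3 prefixes match the leading characters of the object key, so the effect can be approximated as in the sketch below. The function name `visible_keys` and the object keys are made up for illustration, and how Polaris applies the prefix internally is an assumption here.

```python
def visible_keys(object_keys, prefix=""):
    """Keep only the keys that start with the prefix.

    An empty prefix matches every key, which corresponds to a
    connection created without a prefix.
    """
    return [key for key in object_keys if key.startswith(prefix)]


# Made-up object keys for illustration.
keys = [
    "events/2023/10/a.json",
    "events/2023/11/b.json",
    "logs/c.json",
]
print(visible_keys(keys, "events/"))
# prints ['events/2023/10/a.json', 'events/2023/11/b.json']
```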

Learn more

See the following topics for more information: