Guide for S3 ingestion
This guide walks through the end-to-end process to ingest data into Imply Polaris from Amazon S3. For information on creating S3 connections in Polaris, see Ingest from S3.
The following diagram summarizes the end-to-end process of connecting to your S3 source and ingesting from it:
The screen captures in this guide show the configurations for Amazon services as of October 2023. They may not reflect the current state of the product.
Prerequisites
To complete the steps in this guide, you need the following:
An Amazon S3 bucket containing objects to ingest. See Supported formats for requirements on the data format for ingestion.
The AWS Identity and Access Management (IAM) permissions to create roles, create policies, and attach policies to roles. See the AWS documentation on Allow users and groups to create and modify roles.
Permissions in Polaris to create tables, connections, and ingestion jobs: ManageTables, ManageConnections, and ManageIngestionJobs, respectively. For more information on permissions, see Permissions reference.
Get Imply's IAM role identifier
In this section, you record the Amazon Resource Name (ARN) and the external ID of Imply's AWS role. When you create a new role in AWS, you include these details to allow Imply to assume your role.
In Imply Polaris, go to Sources > Create source > Amazon S3.
Copy and save the ARN and the external ID of Imply's IAM role in the New connection dialog.
Create an AWS permissions policy
In this section, you create a permissions policy that grants permissions to access specific S3 resources. When you create a new role in AWS, you attach the permissions policy to the role, so that your role has permission to access the resources.
Navigate to the IAM Dashboard in Amazon Web Services (AWS).
Select Policies in the left sidebar, then click Create policy.
For Service, select S3.
Add an action to grant permission to list all objects in the S3 bucket.
For Actions allowed, search for and select ListBucket.
For Resources, select Specific.
Click Add ARNs. In Resource bucket name, enter the name of the bucket, then confirm with Add ARNs.
If you plan to use this role to grant Imply access to ingest from multiple S3 buckets, repeat the previous step for each bucket.
Add an action to grant permission to retrieve objects in the S3 bucket.
For Actions allowed, search for and select GetObject.
For Resources, select Specific.
Click Add ARNs. In Resource bucket name, enter the name of the bucket, then confirm with Add ARNs.
If you plan to use this role to grant Imply access to ingest from multiple S3 buckets, repeat the previous step for each bucket.
Click Next and enter a descriptive value for Policy name, then click Create policy.
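For reference, the policy that the console steps above produce can be sketched as a JSON document built in Python. This is a minimal sketch, not the exact output of the AWS console: the function name build_permissions_policy and the bucket name example-bucket are hypothetical, and the /* suffix on the GetObject resource assumes you want to allow access to every object in each bucket.

```python
import json


def build_permissions_policy(bucket_names):
    """Return an IAM policy document granting list and read access to the buckets."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    # Assumption: allow every object in each bucket; narrow the pattern if needed.
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is granted on the bucket itself.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arns,
            },
            {
                # s3:GetObject is granted on objects, so the ARN includes a key pattern.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": object_arns,
            },
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder; pass one name per bucket you want to ingest from.
    print(json.dumps(build_permissions_policy(["example-bucket"]), indent=2))
```

Passing several bucket names yields one statement per action with multiple resources, which mirrors repeating the Add ARNs step for each bucket.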
Create an AWS IAM role
In this section, you create an AWS IAM role to which you attach the permissions policy to grant access to the S3 resources and a trust policy to authorize Imply to assume the role.
Navigate to the IAM Dashboard in Amazon Web Services (AWS).
Select Roles in the left sidebar, then click Create role.
For the trusted entity, select Custom trust policy.
In the Custom trust policy section, AWS provides a template trust policy for your role. In the Principal object, enter the following key-value pair. Replace IMPLY ARN with the ARN you saved in Get Imply's IAM role identifier. This allows Imply's IAM role to assume the role that you create.
"AWS": "IMPLY ARN"
In the Edit statement pane, identify the section for Add a condition and click Add.
Complete the following details:
- Condition key: Select sts:ExternalId.
- Qualifier: Select Default.
- Operator: Select StringEquals.
- Value: Enter the external ID of Imply's IAM role that you saved in Get Imply's IAM role identifier.
Click Add condition, then click Next.
Now you add permissions to the AWS role. This allows your IAM role to access your S3 bucket.
Search for and select the policy name you created in the previous section.
Click Next.
Enter a descriptive value for Role name, review the trust and permissions policies, then click Create role.
Click the role to view its details. Record the ARN of the role to use when you create a connection in Polaris.
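The trust policy that results from the Principal and condition steps above should resemble the following sketch. It assumes the AWS template statement uses the sts:AssumeRole action, and the function name build_trust_policy is hypothetical. IMPLY ARN and EXTERNAL ID are placeholders for the values you copied from the New connection dialog in Polaris.

```python
import json


def build_trust_policy(imply_role_arn, external_id):
    """Return a trust policy that lets the given role assume yours with an external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The principal is Imply's IAM role, identified by its ARN.
                "Principal": {"AWS": imply_role_arn},
                "Action": "sts:AssumeRole",
                # The role can only be assumed when the caller supplies this external ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholders only; substitute the values saved from Polaris.
    print(json.dumps(build_trust_policy("IMPLY ARN", "EXTERNAL ID"), indent=2))
```

Comparing this output with the JSON shown on the review page before you click Create role is a quick way to confirm that both the principal and the condition were saved.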
Create an S3 connection in Polaris
In this section, you create an Amazon S3 connection in Polaris. First follow the steps in Get Imply's IAM role identifier.
In the New connection dialog, enter the following details:
- Connection name: A unique name for your connection.
- Description: An optional description for the connection.
- Bucket name: Name of the S3 bucket to ingest data from. The bucket name should be one of the bucket names you listed when creating your AWS permissions policy.
- Prefix: An optional prefix to limit access to certain files in the bucket.
- AWS endpoint: The endpoint of the S3 service, such as s3.us-east-1.amazonaws.com.
- IAM role ARN: The ARN of the AWS role you created.
For more details on these fields, see S3 connection information.
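Before you submit the dialog, it can help to sanity-check the values you collected. The following sketch is illustrative only: the dictionary keys and function names are hypothetical and are not Polaris API field names, and it assumes that every field not marked optional above is required. The endpoint helper covers only the standard s3.REGION.amazonaws.com form shown in this guide.

```python
import re

# Shape of an IAM role ARN: arn:<partition>:iam::<12-digit account ID>:role/<path and name>
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def regional_s3_endpoint(region):
    """Build a regional S3 endpoint in the form shown in this guide."""
    return f"s3.{region}.amazonaws.com"


def check_connection_fields(fields):
    """Return a list of problems found in a dict of New connection values."""
    problems = []
    # Assumption: description and prefix are the only optional fields.
    for key in ("connection_name", "bucket_name", "aws_endpoint", "iam_role_arn"):
        if not fields.get(key):
            problems.append(f"{key} is required")
    arn = fields.get("iam_role_arn", "")
    if arn and not ROLE_ARN_PATTERN.match(arn):
        problems.append("iam_role_arn does not look like an IAM role ARN")
    return problems


if __name__ == "__main__":
    # All values below are placeholders.
    example = {
        "connection_name": "example-s3-connection",
        "bucket_name": "example-bucket",
        "aws_endpoint": regional_s3_endpoint("us-east-1"),
        "iam_role_arn": "arn:aws:iam::123456789012:role/example-polaris-role",
    }
    print(check_connection_fields(example))
```

An empty list means the values have the expected shape. It does not confirm that the role or bucket exists.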
Create an ingestion job
In this section, you create an ingestion job to add data from the S3 connection into a table in Polaris.
In this guide, Polaris automatically creates the table based on details in the job definition. For greater control over table properties, such as partitioning or schema enforcement, create the table manually before starting your first ingestion job. For details, see Introduction to tables.
In Imply Polaris, go to Jobs > Create job > Insert data.
Enter a name for the table, and click Next.
Select the Amazon S3 source, then the connection name. Polaris lists the objects in the bucket that it has permissions to view based on the ListBucket policy and the connection prefix.
Select the file to ingest, then click Next.
Continue through the load data wizard and configure your ingestion job based on your data and use case.
Click Start ingestion to begin ingestion.
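If the file you expect does not appear in the object list, the connection prefix is a common cause. The sketch below illustrates how an S3 key prefix narrows a listing by plain string matching on object keys. It is an illustration of S3 prefix semantics, not a description of Polaris internals, and the function name and sample keys are hypothetical.

```python
def keys_visible_with_prefix(object_keys, prefix=""):
    """Return the keys that start with the prefix; an empty prefix matches every key."""
    return [key for key in object_keys if key.startswith(prefix)]


if __name__ == "__main__":
    # Placeholder object keys.
    keys = ["logs/2023/a.json", "logs/2023/b.json", "raw/c.csv"]
    print(keys_visible_with_prefix(keys, "logs/"))
```

Because the match is on the start of the key, a prefix without a trailing slash, such as logs, also matches keys like logs-archive/d.json.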