
Connect to Amazon S3

To ingest data from Amazon S3 into Imply Polaris, create an Amazon S3 connection and use it as the source of an ingestion job. Create a unique connection for each S3 bucket from which you want to ingest data.

Polaris authenticates with S3 using IAM role assumption or IAM access keys. In line with security best practices, we recommend using IAM role assumption when possible. IAM role assumption requires that your Polaris cluster be hosted on AWS. Before creating the connection, familiarize yourself with IAM role assumption in Polaris.

This topic provides reference information to create an S3 connection.

Tip: For an end-to-end guide to S3 ingestion in Polaris, see Guide for S3 ingestion.

Create a connection

Create an S3 connection as follows:

  1. Click Sources from the left navigation menu.
  2. Click Create source and select Amazon S3.
  3. Enter the connection information.
  4. Click Test connection to confirm that the connection is successful.
  5. Click Create connection to create the connection.

The following screenshot shows an example connection created in the UI. For more information, see Create a connection.

[Screenshot: S3 connection UI]

Connection information

Follow the steps in Create a connection to create the connection. The connection requires the following information from S3:

  • Bucket name: The name of the S3 bucket that contains the data to ingest.

  • AWS S3 endpoint: The endpoint of the S3 service, such as s3.us-east-1.amazonaws.com. To find your AWS endpoint, refer to the AWS service endpoints documentation.

  • Prefix (optional): You can limit access to designated files in the S3 bucket by specifying a prefix. The connection will be limited to the set of files matching this prefix. For example, logs/20221014T00:00:00.

    The prefix that you apply when creating a connection is separate from prefixes you can supply when selecting objects for an ingestion job. The prefix of a connection acts as an active file filter that limits the data accessible from the S3 connection.
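
As a rough illustration of how a connection prefix narrows the set of accessible objects, the following Python sketch filters a list of object keys with a plain starts-with check. The function name and the sample keys are hypothetical, and the starts-with rule is an assumption; Polaris's actual matching logic may differ.

```python
def keys_visible_to_connection(keys, connection_prefix=""):
    """Return the object keys that start with the connection prefix.

    Assumption: the prefix is compared against the full object key
    with a simple starts-with test. An empty prefix matches every key.
    """
    return [key for key in keys if key.startswith(connection_prefix)]


# Hypothetical object keys in a bucket.
sample_keys = [
    "logs/20221014T00:00:00/part-0.json",
    "logs/20221015T00:00:00/part-0.json",
    "metrics/20221014.json",
]

for key in keys_visible_to_connection(sample_keys, "logs/20221014T00:00:00"):
    print(key)
```

Under this assumption, a prefix supplied later for an ingestion job could only narrow the selection further, because objects outside the connection prefix are never visible to the job.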

Authentication

Polaris authenticates with Amazon S3 using IAM role assumption (recommended) or access keys.

IAM role assumption

To authenticate using IAM role assumption (recommended), create an AWS IAM role in your account. The IAM role should have permissions to access your S3 objects, and it should grant access for Polaris to assume your role.

You should be familiar with the information in Secure connections to AWS and the AWS documentation on Managing access to resources.

Ensure you have the following:

  • ARN of IAM role: In the S3 connection, provide the Amazon Resource Name (ARN) of your AWS role that Imply will assume. For example, arn:aws:iam::123456789012:role/s3-access-role.

  • Trust policy attached to the IAM role: Authorizing access to your S3 data from Polaris requires a trust policy added to your IAM role. The trust policy allows Polaris to assume the role. For an example, see Trust policy.

  • Permissions policy attached to the IAM role: Attach a permissions policy to your IAM role. The permissions policy should list your S3 resources as well as the actions that grant Polaris access to your data. For an example, see Example IAM permissions policy. The policy uses the following actions:

    • s3:GetObject to retrieve objects from the S3 bucket.

    • s3:ListBucket (optional) to list the objects in the S3 bucket.
      Polaris doesn't require this permission to ingest from S3; however, Imply strongly recommends you include the permission because it makes viewing and selecting objects to ingest more straightforward.

      Note that s3:ListBucket is the name of the permission that allows a user to list the objects in a bucket. ListObjectsV2 is the name of the API call that lists the objects in a bucket.
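
Because a malformed role ARN is an easy mistake to make, it can help to check the ARN before entering it in the connection. The following Python sketch is one way to do that, assuming the standard IAM role ARN layout, in which the region field is empty and a 12-digit account ID follows `iam::`. The function name `parse_role_arn` is hypothetical and not part of any Polaris or AWS tooling.

```python
import re

# IAM is a global service, so the region field between "iam" and the
# account ID is empty: arn:<partition>:iam::<account-id>:role/<name>.
ROLE_ARN_PATTERN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@/-]+)",
    re.ASCII,
)


def parse_role_arn(arn):
    """Split an IAM role ARN into its parts, or raise ValueError."""
    match = ROLE_ARN_PATTERN.fullmatch(arn)
    if match is None:
        raise ValueError(f"not a valid IAM role ARN: {arn!r}")
    partition, account_id, role = match.groups()
    return {"partition": partition, "account_id": account_id, "role": role}


print(parse_role_arn("arn:aws:iam::123456789012:role/s3-access-role"))
```

The check is purely syntactic. It does not confirm that the role exists or that its trust and permissions policies are attached.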

Example IAM permissions policy

The following example shows an IAM policy that can be attached to your IAM role for IAM role assumption. The policy grants permissions to view and obtain data from your S3 bucket. Replace S3 ARN with the ARN for your S3 resource—for example, arn:aws:s3:::bucket_name.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "S3 ARN"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "S3 ARN/*"
      ]
    }
  ]
}
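
If you manage several buckets, you might generate this policy instead of editing it by hand. The following Python sketch builds the same two statements from a bucket name. The function name `build_permissions_policy` and its `include_list_bucket` flag are hypothetical, and the bucket name `bucket_name` is a placeholder.

```python
import json


def build_permissions_policy(bucket_name, include_list_bucket=True):
    """Build a permissions policy document for one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    statements = []
    if include_list_bucket:
        # s3:ListBucket applies to the bucket itself.
        statements.append({
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [bucket_arn],
        })
    # s3:GetObject applies to the objects inside the bucket.
    statements.append({
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": [f"{bucket_arn}/*"],
    })
    return {"Version": "2012-10-17", "Statement": statements}


print(json.dumps(build_permissions_policy("bucket_name"), indent=2))
```

Note that the two actions target different resources: the bucket ARN for listing, and the bucket ARN followed by `/*` for reading objects.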

Access keys

To authenticate using access keys, supply the access key ID and secret access key from your AWS access key pair.

Select files for ingestion

To ingest files from an S3 connection, select the Amazon S3 source when you load data into a new or existing table. In the following example, the Amazon S3 source is selected for data ingestion.

[Screenshot: S3 connection sources]

Select the S3 connection that matches the S3 bucket with the data you want to ingest. You can then select individual files or use the text field to enter a prefix or a wildcard (glob) pattern.

[Screenshot: S3 file list]

Filter objects by pattern

Polaris matches a prefix or wildcard pattern on the key names of the S3 objects. For example, for the URI s3://foo/bar/file.json, foo is the bucket name defined in the connection. Polaris matches the prefix or wildcard against bar/file.json.
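
To make the split between bucket name and key name concrete, the following Python sketch separates an S3 URI into the two parts using the standard library. The function name `split_s3_uri` is hypothetical. It only illustrates which part of the URI a pattern is compared against.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Return (bucket, key) for a URI of the form s3://bucket/key.

    Caveat: keys containing "?" or "#" are not handled here, because
    urlparse treats those characters as query and fragment separators.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The path keeps its leading slash, which is not part of the key.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://foo/bar/file.json")
print(bucket)
print(key)
```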

The following list shows examples of wildcard patterns and their selections:

  • *.json: objects ending in .json with no prefix
  • ?.json: objects ending in .json with no prefix and a single-character base name
  • **.json: objects ending in .json with any prefix
  • data/{January,February}/*.json: objects ending in .json with the prefix data/January/ or data/February/
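
To experiment with patterns like these before starting a job, you can approximate the matching locally. The following Python sketch translates a glob into a regular expression under assumed semantics that agree with the list above: `*` and `?` don't cross a `/`, `**` does, and `{a,b}` selects alternatives. The helper names are hypothetical, nested braces are not supported, and this is not Polaris's own matcher.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a compiled regex (approximate semantics)."""
    parts = []
    i = 0
    in_braces = False
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith("**", i):
            parts.append(".*")        # may cross "/" boundaries
            i += 2
            continue
        if ch == "*":
            parts.append("[^/]*")     # stays within one path segment
        elif ch == "?":
            parts.append("[^/]")      # exactly one non-"/" character
        elif ch == "{" and not in_braces and "}" in pattern[i:]:
            parts.append("(?:")
            in_braces = True
        elif ch == "}" and in_braces:
            parts.append(")")
            in_braces = False
        elif ch == "," and in_braces:
            parts.append("|")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))


def glob_matches(pattern, key):
    """Return True if the whole key matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(key) is not None


# Hypothetical object keys to try against each pattern.
for key in ["file.json", "a.json", "bar/file.json", "data/January/x.json"]:
    hits = [p for p in ["*.json", "?.json", "**.json",
                        "data/{January,February}/*.json"]
            if glob_matches(p, key)]
    print(key, hits)
```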

Don't use a wildcard that encompasses the file extension, such as data/*, if the matching objects include more than one file type. To ingest data from more than one file type, create a separate ingestion job for each type.

When using a prefix or wildcard pattern, you can't additionally select individual files using the file selector or the URI dialog.

View and edit URIs

When you select individual files, you can click Use URIs to view a list of URIs for your selected files. You can edit this list to add or remove objects for ingestion.

[Screenshot: S3 file URIs]

Learn more

See the following topics for more information: