
Connect to Amazon S3

To ingest data from Amazon S3 into Imply Polaris, create an Amazon S3 connection and use it as the source of an ingestion job. Create a unique connection for each S3 bucket from which you want to ingest data.

Polaris authenticates with S3 using IAM role assumption. Before setting up an S3 connection, familiarize yourself with IAM role assumption in Polaris.

This topic provides reference information to create an S3 connection.

Tip: For an end-to-end guide to S3 ingestion in Polaris, see Guide for S3 ingestion.

Create a connection

Create an S3 connection as follows:

  1. Click Sources from the left navigation menu.
  2. Click Create source and select Amazon S3.
  3. Enter the connection information.
  4. Click Test connection to confirm that the connection is successful.
  5. Click Create connection to create the connection.

The following screenshot shows an example connection created in the UI. For more information, see Create a connection.

Screenshot: S3 connection UI

Connection information

Follow the steps in Create a connection to create the connection. The connection requires the following information from S3:

  • Information about the S3 bucket to ingest from.

    • Bucket name: The name of the S3 bucket that contains the data to ingest.

    • AWS S3 endpoint: The endpoint of the S3 service, such as s3.us-east-1.amazonaws.com. To find your AWS endpoint, refer to the AWS service endpoints documentation.

    • Prefix (optional): You can limit access to designated files in the S3 bucket by specifying a prefix. The connection can then access only the objects whose keys start with this prefix. For example, logs/20221014T00:00:00.

      The prefix you apply when creating a connection is separate from the prefixes you can supply when selecting objects for an ingestion job. The connection prefix acts as a filter on all data accessible through the connection: an ingestion job that uses the connection can only select objects within that prefix.

  • Authorization to access the S3 bucket. For more information, see Secure connections to AWS and the AWS documentation on Managing access to resources.

    • ARN of IAM role: The Amazon Resource Name (ARN) of the AWS IAM role that Polaris assumes. For example, arn:aws:iam::123456789012:role/s3-access-role.

    • Trust policy attached to the IAM role: Your IAM role needs a trust policy that authorizes Polaris to assume the role and access your S3 data. For an example, see Trust policy.

    • Permissions policy attached to the IAM role: To grant Polaris access to view and ingest data from your S3 bucket, attach a permissions policy to the IAM role that lists your S3 resources and includes the following actions:

      • s3:GetObject to retrieve objects from the S3 bucket.
      • s3:ListBucket (optional) to list the objects in the S3 bucket.
        This permission is not required to ingest from S3; however, Imply strongly recommends you include the permission because it makes viewing and selecting objects to ingest more straightforward. Note that s3:ListBucket is the name of the permission that allows a user to list the objects in a bucket. ListObjectsV2 is the name of the API call that lists the objects in a bucket.
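The connection prefix described under Prefix (optional) can be pictured as a filter applied before any job-level selection. The following Python sketch assumes that both the connection prefix and a job's prefix are plain string prefixes on the full object key; the function names and sample keys are hypothetical and do not reflect how Polaris implements the filter.

```python
def visible_keys(keys, connection_prefix=""):
    """Keys a connection with this prefix can access.

    Assumption: the connection prefix is a plain string prefix
    on the full object key (everything after the bucket name).
    """
    return [key for key in keys if key.startswith(connection_prefix)]


def select_for_job(keys, connection_prefix="", job_prefix=""):
    """Keys a job can select: connection filter first, then job prefix.

    Assumption: the job prefix is also matched against the full key.
    """
    return [
        key
        for key in visible_keys(keys, connection_prefix)
        if key.startswith(job_prefix)
    ]


# Hypothetical object keys, not real data.
keys = [
    "logs/20221014T00:00:00/app.json",
    "logs/20221014T00:00:00/db.json",
    "logs/20221015T00:00:00/app.json",
    "exports/users.csv",
]

print(visible_keys(keys, "logs/20221014T00:00:00"))
# ['logs/20221014T00:00:00/app.json', 'logs/20221014T00:00:00/db.json']
print(select_for_job(keys, "logs/", "exports/"))
# []
```

The second call returns an empty list: under these assumptions, a job prefix can only narrow the objects a connection exposes, never reach outside them.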

Example IAM permissions policy

The following example shows an IAM policy that you can attach to your IAM role. The policy grants the role the permissions Polaris needs to view and read data from your S3 bucket. Replace S3 ARN with the ARN of your S3 bucket, for example arn:aws:s3:::bucket_name. Keep the /* suffix in the second statement so that s3:GetObject applies to the objects in the bucket.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "S3 ARN"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "S3 ARN/*"
      ]
    }
  ]
}
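If you prepare policies for several buckets, you can fill in the placeholder programmatically. The following Python sketch builds the same policy as the example above for a given bucket name; the function name and the bucket name example-bucket are placeholders. It only prints the JSON; attaching the policy to your IAM role still happens in AWS.

```python
import json


def s3_permissions_policy(bucket_name):
    """Build the example permissions policy for one bucket.

    s3:ListBucket applies to the bucket itself, so its resource is the
    bucket ARN. s3:GetObject applies to objects, so its resource is the
    bucket ARN followed by /* (every key in the bucket).
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [bucket_arn]},
            {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": [f"{bucket_arn}/*"]},
        ],
    }


# "example-bucket" is a placeholder bucket name.
print(json.dumps(s3_permissions_policy("example-bucket"), indent=2))
```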

Select files for ingestion

To ingest files from an S3 connection, select the Amazon S3 source when you load data into a new or existing table. In the following example, the Amazon S3 source is selected for data ingestion.

Screenshot: S3 connection sources

Select the S3 connection that matches the S3 bucket with the data you want to ingest. You can then select individual files or use the text field to enter a prefix or a wildcard (glob) pattern.

Screenshot: S3 file list

Filter objects by pattern

Polaris matches a prefix or wildcard pattern on the key names of the S3 objects. For example, for the URI s3://foo/bar/file.json, foo is the bucket name defined in the connection. Polaris matches the prefix or wildcard against bar/file.json.
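The split between bucket and key can be sketched in Python as follows. The helper name is hypothetical; it splits on the first / after s3:// rather than using a general URL parser, because object keys may contain characters such as ? or # that a URL parser would treat as query or fragment delimiters.

```python
def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key).

    Splits on the first / after the scheme instead of using a URL
    parser, because keys may contain characters such as ? or #.
    """
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    bucket, _, key = uri[len(scheme):].partition("/")
    return bucket, key


print(split_s3_uri("s3://foo/bar/file.json"))  # ('foo', 'bar/file.json')
```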

The following list shows examples of wildcard patterns and their selections:

  • *.json: objects ending in .json with no prefix
  • ?.json: objects ending in .json with no prefix and a single-character base name
  • **.json: objects ending in .json with any prefix
  • data/{January,February}/*.json: objects ending in .json with the prefix data/January/ or data/February/
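To experiment with these patterns, the following Python sketch translates a glob into a regular expression and matches it against an object key. It assumes the semantics in the list above: * and ? stop at /, ** crosses /, and {a,b} lists literal alternatives without nesting. This is an approximation for illustration, not Polaris's matcher, and edge cases (for example, an unmatched {, which raises ValueError here) may behave differently in Polaris.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob pattern into a compiled regular expression.

    Assumed semantics (an approximation, not Polaris's matcher):
      **     any characters, including /
      *      any characters except /
      ?      exactly one character except /
      {a,b}  one of the literal alternatives (no nesting)
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        elif pattern[i] == "{":
            end = pattern.index("}", i)  # ValueError if unmatched
            options = pattern[i + 1:end].split(",")
            parts.append("(?:" + "|".join(map(re.escape, options)) + ")")
            i = end + 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern, key):
    """True if the entire object key matches the pattern."""
    return glob_to_regex(pattern).fullmatch(key) is not None


print(matches("*.json", "a.json"), matches("*.json", "dir/a.json"))    # True False
print(matches("?.json", "a.json"), matches("?.json", "ab.json"))       # True False
print(matches("**.json", "x/y/z.json"))                                # True
print(matches("data/{January,February}/*.json", "data/March/1.json"))  # False
```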

If the objects you select include more than one file type, don't use a wildcard that spans the file extension, such as data/*, which matches every extension. To ingest data from more than one file type, create a separate ingestion job for each type.
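One way to plan the separate jobs is to group candidate object keys by extension. The following Python sketch uses a hypothetical helper and sample keys; it groups by the last extension only, so a compressed file such as events.json.gz lands under .gz.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_extension(keys):
    """Group object keys by their last file extension.

    Each group is a candidate file set for one ingestion job.
    Keys with no extension are grouped under the empty string.
    """
    groups = defaultdict(list)
    for key in keys:
        groups[PurePosixPath(key).suffix].append(key)
    return dict(groups)


# Hypothetical keys, not real data.
print(group_by_extension(["data/a.json", "data/b.json", "data/c.csv"]))
# {'.json': ['data/a.json', 'data/b.json'], '.csv': ['data/c.csv']}
```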

When using a prefix or wildcard pattern, you can't additionally select individual files using the file selector or the URI dialog.

View and edit URIs

When you select individual files, you can click Use URIs to view a list of URIs for your selected files. You can edit this list to add or remove objects for ingestion.

Screenshot: S3 file URIs
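Assuming the URI list uses the s3://bucket/key form described in Filter objects by pattern, each entry pairs the connection's bucket with one selected key. A minimal Python sketch with a hypothetical helper and placeholder names:

```python
def to_uris(bucket, keys):
    """Build s3://bucket/key URIs for keys selected in one bucket."""
    return [f"s3://{bucket}/{key}" for key in keys]


# "foo" and the keys are placeholders.
print(to_uris("foo", ["bar/file.json", "bar/file2.json"]))
# ['s3://foo/bar/file.json', 's3://foo/bar/file2.json']
```

Under that assumption, adding an object to the list means adding a URI in the same s3://bucket/key form.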

Learn more

See the following topics for more information: