Connect to Amazon S3
To ingest data from Amazon S3 into Imply Polaris, create an Amazon S3 connection and use it as the source of an ingestion job. Create a unique connection for each S3 bucket from which you want to ingest data.
Polaris authenticates with S3 using IAM role assumption or IAM access keys. As a security best practice, use IAM role assumption when possible. IAM role assumption requires that your Polaris cluster is hosted on AWS. Before creating the connection, familiarize yourself with IAM role assumption in Polaris.
This topic provides reference information to create an S3 connection.
For an end-to-end guide to S3 ingestion in Polaris, see Guide for S3 ingestion.
Create a connection
Create an S3 connection as follows:
- Click Sources from the left navigation menu.
- Click Create source and select Amazon S3.
- Enter the connection information.
- Click Test connection to confirm that the connection is successful.
- Click Create connection to create the connection.
The following screenshot shows an example connection created in the UI. For more information, see Create a connection.
Connection information
Follow the steps in Create a connection to create the connection. The connection requires the following information from S3:
- Bucket name: The name of the S3 bucket that contains the data to ingest.
- AWS S3 endpoint: The endpoint of the S3 service, such as s3.us-east-1.amazonaws.com. To find your AWS endpoint, refer to the AWS service endpoints documentation.
- Prefix (optional): You can limit access to designated files in the S3 bucket by specifying a prefix. The connection will be limited to the set of files matching this prefix. For example, logs/20221014T00:00:00.
The prefix that you apply when creating a connection is separate from prefixes you can supply when selecting objects for an ingestion job. The prefix of a connection acts as an active file filter that limits the data accessible from the S3 connection.
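As a rough illustration of how a connection prefix narrows what is reachable, the following sketch filters a list of object keys by prefix. The function name visible_keys and the sample keys are hypothetical, and the sketch assumes plain string-prefix matching on the object key; it is not Polaris code.

```python
def visible_keys(keys, connection_prefix=""):
    """Return the keys that start with the connection prefix.

    Assumption: the connection prefix is compared against the start of
    each object key as a plain string. An empty prefix exposes every key.
    """
    return [key for key in keys if key.startswith(connection_prefix)]


# Hypothetical object keys in one bucket.
sample_keys = [
    "logs/20221014T00:00:00/part-0.json",
    "logs/20221015T00:00:00/part-0.json",
    "metrics/cpu.json",
]

# Only the object under the example prefix remains selectable.
print(visible_keys(sample_keys, "logs/20221014T00:00:00"))
```

Any prefix or pattern supplied later in an ingestion job would then apply only to the keys that survive this filter.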
Authentication
Polaris authenticates with Amazon S3 using IAM role assumption (recommended) or access keys.
IAM role assumption
To authenticate using IAM role assumption (recommended), create an AWS IAM role in your account. The IAM role should have permissions to access your S3 objects, and it should grant access for Polaris to assume your role.
You should be familiar with the information in Secure connections to AWS and the AWS documentation on Managing access to resources.
Ensure you have the following:
- ARN of IAM role: In the S3 connection, provide the Amazon Resource Name (ARN) of your AWS role that Imply will assume. For example, arn:aws:iam::123456789012:role/s3-access-role.
- Trust policy attached to the IAM role: Authorizing access to your S3 data from Polaris requires a trust policy added to your IAM role. The trust policy allows Polaris to assume the role. For an example, see Trust policy.
- Permissions policy attached to the IAM role: Attach a permissions policy to your IAM role. The permissions policy should list your S3 resources as well as actions granting Polaris access to your data. See an example in the following section. The policy uses the following actions:
  - s3:GetObject (required) to retrieve objects from the S3 bucket.
  - s3:ListBucket (optional) to list the objects in the S3 bucket. Polaris doesn't require this permission to ingest from S3; however, Imply strongly recommends you include the permission because it makes viewing and selecting objects to ingest more straightforward.
    Note that s3:ListBucket is the name of the permission that allows a user to list the objects in a bucket. ListObjectsV2 is the name of the API call that lists the objects in a bucket.
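Before pasting a role ARN into the connection form, it can help to check its shape. IAM is a global service, so the region field of an IAM ARN is empty and exactly two colons separate iam from the 12-digit account ID. The sketch below is a hypothetical local check; the function name parse_role_arn and the accepted character set are assumptions, and it does not call AWS or Polaris.

```python
import re

# Assumed shape: arn:<partition>:iam::<12-digit account ID>:role/<path and name>
ROLE_ARN_PATTERN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([A-Za-z0-9_+=,.@/-]+)"
)


def parse_role_arn(arn):
    """Split an IAM role ARN into its parts, or raise ValueError."""
    match = ROLE_ARN_PATTERN.fullmatch(arn)
    if match is None:
        raise ValueError(f"not a well-formed IAM role ARN: {arn!r}")
    partition, account_id, role_name = match.groups()
    return {
        "partition": partition,
        "account_id": account_id,
        "role_name": role_name,
    }


print(parse_role_arn("arn:aws:iam::123456789012:role/s3-access-role"))
```

A check like this only validates the format. Whether Polaris can actually assume the role still depends on the trust policy and permissions policy attached to it.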
Example IAM permissions policy
The following example shows an IAM policy that can be attached to your IAM role for IAM role assumption.
The policy grants permissions to view and obtain data from your S3 bucket.
Replace S3 ARN with the ARN for your S3 resource, for example, arn:aws:s3:::bucket_name.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "S3 ARN"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "S3 ARN/*"
      ]
    }
  ]
}
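If you generate policies for several buckets, a small helper can fill in the placeholder consistently. The following is a minimal sketch that reproduces the example policy for a given bucket name; the function name build_permissions_policy and its include_list_bucket flag are hypothetical. Note that s3:ListBucket applies to the bucket ARN itself, while s3:GetObject applies to the objects under it (the /* suffix).

```python
import json


def build_permissions_policy(bucket_name, include_list_bucket=True):
    """Build the example permissions policy for one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    statements = []
    if include_list_bucket:
        # Optional: lets objects in the bucket be listed for selection.
        statements.append(
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [bucket_arn]}
        )
    # Required: lets objects in the bucket be read.
    statements.append(
        {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": [f"{bucket_arn}/*"]}
    )
    return {"Version": "2012-10-17", "Statement": statements}


# bucket_name is the placeholder used in the example above.
print(json.dumps(build_permissions_policy("bucket_name"), indent=2))
```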
Access keys
To authenticate using access keys, supply the access key ID and secret access key from your AWS access key pair.
Select files for ingestion
To ingest files from an S3 connection, select the Amazon S3 source when you load data into a new or existing table. In the following example, the Amazon S3 source is selected for data ingestion.
Select the S3 connection that matches the S3 bucket with the data you want to ingest. You can then select individual files or use the text field to enter a prefix or a wildcard (glob) pattern.
Filter objects by pattern
Polaris matches a prefix or wildcard pattern on the key names of the S3 objects.
For example, for the URI s3://foo/bar/file.json, foo is the bucket name defined in the connection. Polaris matches the prefix or wildcard against bar/file.json.
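The split between bucket name and object key can be sketched in a few lines. The helper name split_s3_uri is hypothetical; the sketch only illustrates which part of a URI a prefix or wildcard is compared against.

```python
def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key).

    The key is everything after the first slash that follows the bucket
    name, and is the part a prefix or wildcard is matched against.
    """
    scheme, separator, rest = uri.partition("://")
    if scheme != "s3" or not separator:
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = rest.partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


bucket, key = split_s3_uri("s3://foo/bar/file.json")
print(bucket)  # foo
print(key)     # bar/file.json
```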
The following list shows examples of wildcard patterns and their selections:
- *.json: objects ending in .json with no prefix
- ?.json: objects ending in .json with no prefix and a single character base name
- **.json: objects ending in .json with any prefix
- data/{January,February}/*.json: objects ending in .json with the prefix data/January/ or data/February/
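To preview which keys a pattern would select, you can approximate the matching locally. The sketch below translates a glob into a regular expression using only the behavior implied by the examples above: * and ? do not cross a / separator, ** does, and {a,b} lists alternatives. The function names are hypothetical, and the exact glob dialect Polaris applies is an assumption here, so treat the results as an estimate.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a compiled regex (assumed semantics, see above)."""
    parts = []
    in_group = False
    i = 0
    while i < len(pattern):
        char = pattern[i]
        if pattern.startswith("**", i):
            parts.append(".*")  # any characters, including "/"
            i += 2
            continue
        if char == "*":
            parts.append("[^/]*")  # any run of characters within one segment
        elif char == "?":
            parts.append("[^/]")  # exactly one character within one segment
        elif char == "{" and not in_group:
            parts.append("(?:")
            in_group = True
        elif char == "}" and in_group:
            parts.append(")")
            in_group = False
        elif char == "," and in_group:
            parts.append("|")
        else:
            parts.append(re.escape(char))
        i += 1
    if in_group:
        raise ValueError(f"unclosed '{{' in pattern: {pattern!r}")
    return re.compile("".join(parts))


def key_matches(pattern, key):
    """Return True if the whole object key matches the glob."""
    return glob_to_regex(pattern).fullmatch(key) is not None


# Hypothetical object keys, relative to the bucket.
for key in ["file.json", "a.json", "bar/file.json", "data/January/x.json"]:
    selected = [p for p in ["*.json", "?.json", "**.json",
                            "data/{January,February}/*.json"]
                if key_matches(p, key)]
    print(key, selected)
```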
Don't use a wildcard that encompasses the file extension if you have more than one file type. If you want to ingest data from more than one file type, create a separate ingestion job for each.
When using a prefix or wildcard pattern, you can't additionally select individual files using the file selector or the URI dialog.
View and edit URIs
When you select individual files, you can click Use URIs to view a list of URIs for your selected files. You can edit this list to add or remove objects for ingestion.
Learn more
See the following topics for more information:
- For an end-to-end guide on S3 ingestion in Polaris, see Guide for S3 ingestion.
- To learn how to ingest data from Amazon S3 using the Polaris API, see Ingest data from Amazon S3 by API.
- For details on how to ingest metadata pertaining to the S3 data, see Ingest object metadata.