Connect to Amazon S3
To ingest data from Amazon S3 into Imply Polaris, create an Amazon S3 connection and use it as the source of an ingestion job. Create a unique connection for each S3 bucket from which you want to ingest data.
Polaris authenticates with S3 using IAM role assumption. Before setting up an S3 connection, familiarize yourself with IAM role assumption in Polaris.
This topic provides reference information to create an S3 connection.
For an end-to-end guide to S3 ingestion in Polaris, see Guide for S3 ingestion.
Create a connection
Create an S3 connection as follows:
- Click Sources from the left navigation menu.
- Click Create source and select Amazon S3.
- Enter the connection information.
- Click Test connection to confirm that the connection is successful.
- Click Create connection to create the connection.
The following screenshot shows an example connection created in the UI. For more information, see Create a connection.
Connection information
Follow the steps in Create a connection to create the connection. The connection requires the following information from S3:
Information about the S3 bucket to ingest from:
- Bucket name: The name of the S3 bucket that contains the data to ingest.
- AWS S3 endpoint: The endpoint of the S3 service, such as `s3.us-east-1.amazonaws.com`. To find your AWS endpoint, refer to the AWS service endpoints documentation.
- Prefix (optional): You can limit access to designated files in the S3 bucket by specifying a prefix. The connection is limited to the set of files matching this prefix, for example, `logs/20221014T00:00:00`. The prefix that you apply when creating a connection is separate from prefixes you can supply when selecting objects for an ingestion job. The prefix of a connection acts as an active file filter that limits the data accessible from the S3 connection.
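To illustrate how a connection prefix narrows the objects available, the following Python sketch filters a list of object keys. It assumes the connection prefix behaves like an S3 key prefix, that is, a plain string match on the start of each key. The helper `keys_visible_to_connection` and the sample keys are hypothetical.

```python
def keys_visible_to_connection(keys, connection_prefix=""):
    """Return the object keys a connection with this prefix could see.

    Assumption: the prefix is a plain string prefix on the object key,
    like S3's own Prefix parameter. An empty prefix exposes every key.
    """
    return [key for key in keys if key.startswith(connection_prefix)]


keys = [
    "logs/20221014T00:00:00/part-0.json",
    "logs/20221014T01:00:00/part-0.json",
    "archive/2021/part-0.json",
]
print(keys_visible_to_connection(keys, "logs/20221014T00:00:00"))
# ['logs/20221014T00:00:00/part-0.json']
```

Under this assumption the match is on the raw string, not on folder boundaries: a prefix of `logs/2022` would also match a key such as `logs/2022-archive/part-0.json`. Ending the prefix with `/` restricts it to a single folder.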
Authorization to access the S3 bucket. For more information, see Secure connections to AWS and the AWS documentation on Managing access to resources.
- ARN of IAM role: The Amazon Resource Name (ARN) of the AWS IAM role that Polaris assumes. For example, `arn:aws:iam::123456789012:role/s3-access-role`.
- Trust policy attached to the IAM role: To authorize Polaris to access your S3 data, add a trust policy to your IAM role that allows Polaris to assume the role. For an example, see Trust policy.
- Permissions policy attached to the IAM role: To grant Polaris access to view and ingest data from your S3 buckets, attach a permissions policy to the IAM role that lists your S3 resources and includes the following actions:
  - `s3:GetObject` to retrieve objects from the S3 bucket.
  - `s3:ListBucket` (optional) to list the objects in the S3 bucket. This permission is not required to ingest from S3; however, Imply strongly recommends including it because it makes viewing and selecting objects to ingest more straightforward. Note that `s3:ListBucket` is the name of the permission that allows a user to list the objects in a bucket, whereas `ListObjectsV2` is the name of the API call that lists the objects in a bucket.
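An IAM role ARN has the form `arn:partition:iam::account-id:role/role-name`. Because IAM is a global service, the region field is empty, which is why two colons follow `iam`. The following Python sketch checks that a value has this shape before you enter it in the connection. The regular expression is a simplified assumption that does not enforce every IAM naming rule, and `parse_role_arn` is a hypothetical helper, not part of Polaris.

```python
import re

# Simplified shape of an IAM role ARN. The region field is empty for IAM,
# so "iam" is followed by two colons and then the 12-digit account ID.
ROLE_ARN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):iam::(?P<account>\d{12}):role/(?P<name>[\w+=,.@/-]+)$"
)


def parse_role_arn(arn):
    """Return (partition, account_id, role_name_with_path) or raise ValueError."""
    match = ROLE_ARN.match(arn)
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    return match.group("partition"), match.group("account"), match.group("name")


print(parse_role_arn("arn:aws:iam::123456789012:role/s3-access-role"))
# ('aws', '123456789012', 's3-access-role')
```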
Example IAM permissions policy
The following example shows an IAM policy that you can attach to your IAM role. The policy grants the role the listed permissions for Polaris to view and obtain data from your S3 bucket.
Replace `S3 ARN` with the ARN for your S3 resource, for example, `arn:aws:s3:::bucket_name`.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "S3 ARN"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "S3 ARN/*"
      ]
    }
  ]
}
```
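If you manage several buckets, you might prefer to generate this policy rather than edit the placeholder by hand. The following Python sketch builds the same two statements for a given bucket name; `s3_read_policy` and `example-bucket` are hypothetical. Note that `s3:ListBucket` is granted on the bucket ARN itself, while `s3:GetObject` is granted on the objects in the bucket, hence the `/*` suffix.

```python
import json


def s3_read_policy(bucket_name):
    """Build the example permissions policy for one bucket.

    s3:ListBucket applies to the bucket ARN; s3:GetObject applies to
    the objects inside it (bucket ARN + "/*").
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [bucket_arn]},
            {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": [f"{bucket_arn}/*"]},
        ],
    }


print(json.dumps(s3_read_policy("example-bucket"), indent=2))
```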
Select files for ingestion
To ingest files from an S3 connection, select the Amazon S3 source when you load data into a new or existing table. In the following example, the Amazon S3 source is selected for data ingestion.
Select the S3 connection that matches the S3 bucket with the data you want to ingest. You can then select individual files or use the text field to enter a prefix or a wildcard (glob) pattern.
Filter objects by pattern
Polaris matches a prefix or wildcard pattern against the key names of the S3 objects.
For example, for the URI `s3://foo/bar/file.json`, `foo` is the bucket name defined in the connection, and Polaris matches the prefix or wildcard against `bar/file.json`.
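The following Python sketch splits an S3 URI the same way: everything between `s3://` and the next `/` is the bucket, and the remainder is the object key that prefixes and patterns apply to. It uses plain string operations rather than `urllib.parse`, because S3 keys can contain characters such as `?` or `#` that a URL parser would treat as a query or fragment. `split_s3_uri` is a hypothetical helper.

```python
def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key)."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # The bucket ends at the first "/"; everything after it is the key.
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


print(split_s3_uri("s3://foo/bar/file.json"))  # ('foo', 'bar/file.json')
```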
The following list shows examples of wildcard patterns and the objects they select:
- `*.json`: objects ending in `.json` with no prefix
- `?.json`: objects ending in `.json` with no prefix and a single-character base name
- `**.json`: objects ending in `.json` with any prefix
- `data/{January,February}/*.json`: objects ending in `.json` with the prefix `data/January/` or `data/February/`
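To make the semantics of these patterns concrete, the following Python sketch translates a wildcard pattern into a regular expression that reproduces the four examples above. It assumes that `*` matches within a single path segment, `**` matches across `/`, `?` matches exactly one character other than `/`, and `{a,b}` matches either alternative. This is an illustration under those assumptions, not Polaris's matcher: it does not handle nested braces, character classes such as `[abc]`, or escaping, and Polaris may behave differently in edge cases. `glob_to_regex` and `matches` are hypothetical helpers.

```python
import re


def glob_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regex over object keys."""
    out = []
    i = 0
    in_group = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            if pattern.startswith("**", i):
                out.append(".*")  # ** crosses "/" boundaries (any prefix)
                i += 2
                continue
            out.append("[^/]*")  # * stays within one path segment
        elif ch == "?":
            out.append("[^/]")  # ? is exactly one character other than "/"
        elif ch == "{" and not in_group:
            out.append("(?:")  # start of an alternation group
            in_group = True
        elif ch == "}" and in_group:
            out.append(")")
            in_group = False
        elif ch == "," and in_group:
            out.append("|")  # separates alternatives inside {...}
        else:
            out.append(re.escape(ch))
        i += 1
    if in_group:
        raise ValueError(f"unclosed '{{' in pattern: {pattern!r}")
    return re.compile("".join(out))


def matches(pattern, key):
    """True if the whole object key matches the wildcard pattern."""
    return glob_to_regex(pattern).fullmatch(key) is not None


print(matches("*.json", "file.json"), matches("*.json", "data/file.json"))
# True False
```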
If your data includes more than one file type, don't use a wildcard that matches across file extensions. To ingest data from more than one file type, create a separate ingestion job for each.
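One way to follow this advice is to derive one pattern per file extension and use each pattern in its own ingestion job. The following Python sketch, built around the hypothetical helper `patterns_per_file_type`, collects the extensions present in a list of object keys and returns a `**`-style pattern for each. Keys without an extension are skipped.

```python
from pathlib import PurePosixPath


def patterns_per_file_type(keys):
    """Return one '**<extension>' pattern per file extension found in keys.

    PurePosixPath.suffix returns only the last extension, so a key such as
    'a.json.gz' is grouped under '.gz'. Keys with no extension are ignored.
    """
    suffixes = {PurePosixPath(key).suffix for key in keys}
    return sorted(f"**{suffix}" for suffix in suffixes if suffix)


print(patterns_per_file_type(["data/a.json", "data/b.json", "data/c.csv", "README"]))
# ['**.csv', '**.json']
```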
When using a prefix or wildcard pattern, you can't additionally select individual files using the file selector or the URI dialog.
View and edit URIs
When you select individual files, you can click Use URIs to view a list of URIs for your selected files. You can edit this list to add or remove objects for ingestion.
Learn more
See the following topics for more information:
- For an end-to-end guide on S3 ingestion in Polaris, see Guide for S3 ingestion.
- To learn how to ingest data from Amazon S3 using the Polaris API, see Ingest data from Amazon S3 by API.
- For details on how to ingest metadata pertaining to the S3 data, see Ingest object metadata.