Upload files by API

The Files API lets you upload and manage files in the Imply Polaris file staging area. We recommend completing all file uploads before starting ingestion. After uploading your files, you can use the UI or the API to ingest the data into a Polaris table.

For information on the data formats and compression formats supported by Polaris, see Supported formats.

This topic walks you through uploading files with the Files API. If you want to skip the details, check out the examples.

Prerequisites

This topic assumes that you have an API key with the ManageFiles permission. In the examples below, the key value is stored in the variable named POLARIS_API_KEY. See Authenticate with API keys to obtain an API key. For more information on permissions, visit Permissions reference.
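As a minimal sketch of the setup the examples assume, the helper below reads the key from the POLARIS_API_KEY environment variable and builds the Authorization header used in the sample requests. The name `build_auth_headers` is hypothetical, and failing fast on a missing key is a choice made for this sketch, not something the API requires.

```python
import os


def build_auth_headers(env=None):
    """Return the Authorization header for Polaris requests.

    `build_auth_headers` is a hypothetical helper, not part of any Polaris SDK.
    """
    env = os.environ if env is None else env
    api_key = env.get("POLARIS_API_KEY")
    if not api_key:
        # Fail early instead of sending "Basic None" to the API
        raise RuntimeError("Set the POLARIS_API_KEY environment variable first")
    return {"Authorization": f"Basic {api_key}"}
```

Passing a dictionary as `env` makes the helper easy to exercise without touching the real environment.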

Upload a file to staging

To upload a file, submit a POST request to the Files API. You can only upload one file, up to 2 GB in size, in a single request. This limit refers to the size of the file transmitted by the browser or HTTP client. You may upload a file that's larger than 2 GB on disk if your browser or client compresses the file in transit to below 2 GB.

In the request, include form data enclosing the path and name of the file to upload. For cURL requests, precede the file path with file=@ without any spaces. If you get an error indicating that your request does not contain a file to upload, double check that you've included the @ sign.

File names in Polaris must be unique. Uploading a file with the same name as a previously uploaded file results in a 409 Conflict error.
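One way a client might react to the name-uniqueness rule is to classify the status code of the upload response. The sketch below is illustrative only: `upload_outcome` is a hypothetical helper, and treating any 2xx status as success is an assumption, since this topic does not state the exact success code.

```python
from http import HTTPStatus


def upload_outcome(status_code):
    """Classify the HTTP status of an upload request (hypothetical helper)."""
    if 200 <= status_code < 300:
        # Assumption: any 2xx status means the upload was accepted
        return "uploaded"
    if status_code == HTTPStatus.CONFLICT:
        # 409 Conflict: a file with this name is already in staging
        return "name-conflict"
    return "error"
```

A caller could rename the local file or delete the staged copy when the result is "name-conflict".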

Info: The media type of the HTTP request is multipart/form-data; however, you should not set the Content-Type: multipart/form-data header explicitly. Allow your client to assign the header automatically because it will also properly set the associated boundary directive.
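To see why the boundary matters, the following sketch builds a single-file multipart/form-data body by hand using only the Python standard library. It illustrates roughly what an HTTP client does for you and is not how Polaris or any particular client implements it; `encode_multipart_file` is a hypothetical name.

```python
import uuid


def encode_multipart_file(field_name, filename, content):
    """Build a multipart/form-data body for one file (illustrative only).

    Returns the Content-Type header value and the encoded body.
    """
    # The boundary is generated per request and must match in header and body
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, head + content + tail


content_type, body = encode_multipart_file(
    "file", "kttm-2019-08-20.json.gz", b"example bytes"
)
print(content_type)
```

Because the boundary in the header has to match the delimiter in the body, a hand-written Content-Type: multipart/form-data header without the boundary directive leaves the server unable to split the parts.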

Sample request

The following example shows how to upload a file called kttm-2019-08-20.json.gz to Polaris:

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/files" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--form 'file=@"kttm-2019-08-20.json.gz"'

Sample response

The following example shows a response to a successful file upload:

{
  "id": "3576326d-5fd5-4fdc-97aa-88d72bcbd7e9",
  "name": "kttm-2019-08-20.json.gz",
  "sizeBytes": 13800837,
  "dataFormat": "nd-json",
  "compressionFormat": "gz",
  "digest": {
    "algo": "md5",
    "hash": "62D2E21501AA2BCF3F6BC1A5AC21A862"
  },
  "uploadedByUserId": "d3c723aa-52f2-4ab0-b23f-7b5c4aaf3ded",
  "uploadedOnDatetime": "2022-02-03T00:21:18.164186373Z"
}

List all files

To view the files uploaded in the staging area of Polaris, issue a GET request to the Files API. By default, Polaris returns 1000 files per page of results. To access subsequent pages of results, use the paginationOffset query parameter in your request. For example,

GET /v1/projects/PROJECT_ID/files?paginationOffset=1000

To get information on a particular file, specify the name of the file in the request URL. For example,

GET /v1/projects/PROJECT_ID/files/kttm-2019-08-19.json.gz

For additional query parameters, see the Files API documentation.
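The pagination described above can be sketched as a generator that advances paginationOffset one page at a time. This is a sketch under stated assumptions: `iter_all_files` is a hypothetical name, `fetch_page` is a caller-supplied function that performs the GET request for a given offset and returns the decoded JSON body, and a page shorter than the page size is taken to mean there are no more results.

```python
def iter_all_files(fetch_page, page_size=1000):
    """Yield every file entry, requesting one page of results at a time.

    `fetch_page(offset)` is supplied by the caller (an assumption of this
    sketch). It should request .../files?paginationOffset=<offset> and
    return the decoded JSON body as a dictionary.
    """
    offset = 0
    while True:
        page = fetch_page(offset).get("files", [])
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means there are no more results
            break
        offset += page_size
```

Keeping the HTTP call outside the generator lets you reuse it with any client and test it with canned responses.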

Sample request

The following example shows how to list files uploaded to Polaris:

curl --location --request GET "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/files" \
--header "Authorization: Basic $POLARIS_API_KEY"

Sample response

The following example shows a successful response listing the files and associated metadata:

{
  "files": [
    {
      "id": "1c3b309b-6a0f-4a46-a1ad-07e0793b83fd",
      "name": "kttm-2019-08-19.json.gz",
      "sizeBytes": 8213401,
      "dataFormat": "nd-json",
      "compressionFormat": "gz",
      "digest": {
        "algo": "md5",
        "hash": "E370CAC6A8D6A938A08279200883D7B3"
      },
      "uploadedOnDatetime": "2022-02-02T15:07:50Z"
    },
    {
      "id": "3576326d-5fd5-4fdc-97aa-88d72bcbd7e9",
      "name": "kttm-2019-08-20.json.gz",
      "sizeBytes": 13800837,
      "dataFormat": "nd-json",
      "compressionFormat": "gz",
      "digest": {
        "algo": "md5",
        "hash": "62D2E21501AA2BCF3F6BC1A5AC21A862"
      },
      "uploadedOnDatetime": "2022-02-03T00:21:18Z"
    }
  ],
  "space": {
    "allocatedBytes": 1000000000000,
    "usedBytes": 1637281596,
    "remainingBytes": 998362718404
  }
}
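As an illustration of how a client might use this response, the sketch below summarizes the file count and staging space from a decoded listing. `summarize_staging` is a hypothetical helper, and the `listing` dictionary is a trimmed copy of the sample response above.

```python
def summarize_staging(listing):
    """Summarize a decoded file listing (hypothetical helper)."""
    space = listing["space"]
    used_percent = 100 * space["usedBytes"] / space["allocatedBytes"]
    return {
        "file_count": len(listing["files"]),
        "used_percent": round(used_percent, 2),
        "remaining_bytes": space["remainingBytes"],
    }


# Trimmed copy of the sample response, keeping only the fields used here
listing = {
    "files": [
        {"name": "kttm-2019-08-19.json.gz", "sizeBytes": 8213401},
        {"name": "kttm-2019-08-20.json.gz", "sizeBytes": 13800837},
    ],
    "space": {
        "allocatedBytes": 1000000000000,
        "usedBytes": 1637281596,
        "remainingBytes": 998362718404,
    },
}
summary = summarize_staging(listing)
print(summary)
```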

Delete a file

You can delete a file by sending a DELETE request to the Files API and specifying the filename to be deleted as a path parameter.

Sample request

The following example shows how to delete a file on Polaris:

curl --location --request DELETE "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/files/kttm-2019-08-20.json.gz" \
--header "Authorization: Basic $POLARIS_API_KEY"

Sample response

The Files API returns the status code 204 No Content in response to a successful request. If the file does not exist in your Polaris staging environment, the Files API returns a 404 Not Found status code.
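The status codes above can be handled in a script along these lines. This is a minimal sketch that uses only the Python standard library instead of the requests package used elsewhere in this topic. `delete_outcome` and `delete_file` are hypothetical helpers, and `files_url` stands for the Files API endpoint of your project.

```python
import urllib.error
import urllib.parse
import urllib.request


def delete_outcome(status_code):
    """Map the DELETE status codes described above to an outcome."""
    if status_code == 204:
        return "deleted"
    if status_code == 404:
        return "not-found"
    return "unexpected"


def delete_file(files_url, filename, api_key):
    """Send the DELETE request and classify the result (hypothetical helper)."""
    request = urllib.request.Request(
        f"{files_url}/{urllib.parse.quote(filename)}",
        method="DELETE",
        headers={"Authorization": f"Basic {api_key}"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return delete_outcome(response.status)
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx and 5xx; the status code is on the exception
        return delete_outcome(error.code)
```

Treating "not-found" as a non-fatal result makes repeated cleanup runs safe, since the file is absent either way.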

Examples

The following Python examples show how to upload a file named kttm-2019-08-20.json.gz to the Polaris staging area and verify that the upload completed successfully.

Upload file

This example shows you how to check if a file exists in the Polaris staging area and, if not, submit an upload request.

import os
import requests
import sys

# Replace placeholders with your information
ORGANIZATION_NAME = "example"
REGION = "us-east-1"
CLOUD_PROVIDER = "aws"
PROJECT_ID = "12375ffx-f7x4-4f0x-a1a6-3b3424987ee0"

# Read the API key from the POLARIS_API_KEY environment variable.
# Do not hard-code API key credentials in production scripts and
# do not check them into version control systems.
# See https://docs.imply.io/polaris/api-keys for more information.
apikey = os.getenv("POLARIS_API_KEY")

# Supply the name of the file to upload in the following string variable
filename = "kttm-2019-08-20.json.gz"

# Store the endpoint for the Files API
url = f"https://{ORGANIZATION_NAME}.{REGION}.{CLOUD_PROVIDER}.api.imply.io/v1/projects/{PROJECT_ID}/files"

def post_to_upload(url, filename):
    def do_post():
        headers = {
            'Authorization': f'Basic {apikey}'
        }
        # Open the file in a context manager so the handle is closed after the request
        with open(filename, 'rb') as f:
            return requests.post(url, headers=headers, files={'file': f})

    print(f"Submitting request to upload '{filename}'")
    response = do_post()

    # Raise an exception on response errors
    response.raise_for_status()

    return response


def get_to_check(url):
    def do_get():
        headers = {
            'Authorization': f'Basic {apikey}'
        }
        return requests.get(url, headers=headers)

    print(f"Checking for '{filename}' on Polaris")
    response = do_get()

    # Raise an exception on response errors
    response.raise_for_status()

    return response


def check_existing(url):
    # Any HTTP error response is treated as "file not found"
    try:
        response = get_to_check(url)
        found_it = True
        file_metadata = response.json()
    except requests.HTTPError:
        found_it = False
        file_metadata = {}

    return found_it, file_metadata


# Make sure there isn't already a file with the same name in Polaris
already_exists, _ = check_existing(f"{url}/{filename}")

if already_exists:
    sys.exit(f"'{filename}' is already on Polaris")

# Submit the upload
post_to_upload(url, filename)

# Check whether the file has been uploaded
upload_success, _ = check_existing(f"{url}/{filename}")
if upload_success:
    print(f"Success! '{filename}' is uploaded on Polaris")

Verify upload

This example shows you how to periodically poll the Files API to check on the upload of a file. When the file upload is confirmed, the MD5 checksums are compared to verify data integrity.

import os
import hashlib
import requests
import time

# Replace placeholders with your information
ORGANIZATION_NAME = "example"
REGION = "us-east-1"
CLOUD_PROVIDER = "aws"
PROJECT_ID = "12375ffx-f7x4-4f0x-a1a6-3b3424987ee0"

# Read the API key from the POLARIS_API_KEY environment variable.
# Do not hard-code API key credentials in production scripts and
# do not check them into version control systems.
# See https://docs.imply.io/polaris/api-keys for more information.
apikey = os.getenv("POLARIS_API_KEY")

# Supply the name of the file to upload in the following string variable
filename = "kttm-2019-08-20.json.gz"

# Store the endpoint for the Files API
url = f"https://{ORGANIZATION_NAME}.{REGION}.{CLOUD_PROVIDER}.api.imply.io/v1/projects/{PROJECT_ID}/files"

# Compute the MD5 checksum of the local file; the context manager closes the handle
with open(filename, 'rb') as f:
    file_md5 = hashlib.md5(f.read()).hexdigest()

def get_to_check(url):
    def do_get():
        headers = {
            'Authorization': f'Basic {apikey}'
        }
        return requests.get(url, headers=headers)

    print(f"Checking for '{filename}' on Polaris")
    response = do_get()

    # Raise an exception on response errors
    response.raise_for_status()

    return response

def check_existing(url):
    # Any HTTP error response is treated as "file not found"
    try:
        response = get_to_check(url)
        found_it = True
        file_metadata = response.json()
    except requests.HTTPError:
        found_it = False
        file_metadata = {}

    return found_it, file_metadata

# Check whether the file has been uploaded
upload_success, file_metadata = check_existing(f"{url}/{filename}")

if not upload_success:
    start = time.time()
    print("Checking for the upload every 5 seconds")

    while not upload_success:
        # Wait 5 seconds before checking again
        time.sleep(5)

        elapsed = time.time() - start
        print(f"\nSeconds elapsed since the first check: {elapsed:.2f}")

        # Check whether the file has been uploaded
        upload_success, file_metadata = check_existing(f"{url}/{filename}")

print(f"\n'{filename}' is uploaded on Polaris")

# Verify data integrity by comparing md5sum hashes
polaris_hash = file_metadata["digest"]["hash"].lower()
if file_md5 == polaris_hash:
    print(f"The md5sum check succeeds for '{filename}'.")
else:
    print(f"Error: The md5sum hashes do not match for '{filename}'.")

print(f"Local value: '{file_md5}'\n"
      f"Polaris value: '{polaris_hash}'")
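The verification example reads the entire file into memory before hashing, which can be costly for files near the 2 GB upload limit. As a possible alternative, the sketch below hashes the file in fixed-size chunks and compares digests case-insensitively, since the sample responses report the hash in uppercase. `md5_hex` and `digests_match` are hypothetical helper names, and the 1 MiB chunk size is an arbitrary choice.

```python
import hashlib


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digests_match(local_hex, polaris_hex):
    """Compare two hex digests, ignoring letter case."""
    return local_hex.lower() == polaris_hex.lower()
```

In the example above, `file_md5 = md5_hex(filename)` could stand in for the single-read computation without changing the rest of the script.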

Learn more

See the following topics for more information: