Upload files by API
The Files API lets you upload and manage files in the Imply Polaris file staging area. We recommend completing all file uploads before starting ingestion. After uploading your files, you can use the UI or the API to ingest the data into a Polaris table.
For information on the data formats and compression formats supported by Polaris, see Supported formats.
This topic walks you through uploading files with the Files API. To skip the details, go to the examples.
Prerequisites
This topic assumes that you have an API key with the ManageFiles permission.
In the examples below, the key value is stored in the variable named POLARIS_API_KEY.
See Authenticate with API keys to obtain an API key.
For more information on permissions, visit Permissions reference.
Upload a file to staging
To upload a file, submit a POST request to the Files API.
You can only upload one file, up to 2 GB in size, in a single request.
This limit refers to the size of the file transmitted by the browser or HTTP client.
You may upload a file that's larger than 2 GB on disk if your browser or client compresses the file in transit to below 2 GB.
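A quick size check before uploading can catch oversized files early. The following is a minimal sketch, not part of the Files API: the helper name fits_upload_limit is hypothetical, and the byte value used for "2 GB" is an assumption because the limit is not stated in bytes. The check uses the on-disk size, which is only an upper bound when the client compresses the file in transit.

```python
import os

# Assumption: "2 GB" is interpreted as 2 * 1024**3 bytes.
# Adjust this value if the limit is defined in decimal gigabytes.
MAX_UPLOAD_BYTES = 2 * 1024**3


def fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True when the on-disk size of `path` is at most `limit` bytes."""
    return os.path.getsize(path) <= limit
```

For example, call fits_upload_limit("kttm-2019-08-20.json.gz") before submitting the POST request, and compress or split the file if it returns False.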
In the request, include form data enclosing the path and name of the file to upload.
For cURL requests, precede the file path with file=@ without any spaces.
If you get an error indicating that your request does not contain a file to upload, double-check that you've included the @ sign.
File names in Polaris must be unique.
Uploading a file with the same name as a previously uploaded file results in a 409 Conflict error.
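One way to handle the duplicate-name case in a script is to treat 409 Conflict as "already uploaded" rather than as a failure. The sketch below assumes `session` is an object with the same interface as requests.Session; the function name upload_unless_duplicate and its parameters are hypothetical.

```python
import os


def upload_unless_duplicate(session, files_url, api_key, path):
    """POST the file at `path`; return its metadata, or None on 409 Conflict.

    `session` is assumed to behave like requests.Session.
    """
    headers = {"Authorization": f"Basic {api_key}"}
    with open(path, "rb") as handle:
        # Pass (file name, file object) so the upload uses the base name only
        files = {"file": (os.path.basename(path), handle)}
        response = session.post(files_url, headers=headers, files=files)
    if response.status_code == 409:
        # A file with this name already exists in the staging area
        return None
    # Raise an exception on any other response error
    response.raise_for_status()
    return response.json()
```

With the requests library installed, you could call upload_unless_duplicate(requests.Session(), url, apikey, "kttm-2019-08-20.json.gz") and skip the upload step when it returns None.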
The media type of the HTTP request is multipart/form-data; however, do not set the Content-Type: multipart/form-data header explicitly. Let your client assign the header automatically so that it also sets the associated boundary directive correctly.
Sample request
The following example shows how to upload a file called kttm-2019-08-20.json.gz to Polaris:
cURL:
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/files" \
  --header "Authorization: Basic $POLARIS_API_KEY" \
  --form 'file=@"kttm-2019-08-20.json.gz"'
Python:
import os
import requests
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/files"
apikey = os.getenv("POLARIS_API_KEY")
headers = {
    'Authorization': f'Basic {apikey}'
}
# Open the file in binary mode and close it once the request completes
with open('kttm-2019-08-20.json.gz', 'rb') as f:
    response = requests.request("POST", url, headers=headers, files={'file': f})
print(response.text)
Sample response
The following example shows a response to a successful file upload:
{
  "id": "3576326d-5fd5-4fdc-97aa-88d72bcbd7e9",
  "name": "kttm-2019-08-20.json.gz",
  "sizeBytes": 13800837,
  "dataFormat": "nd-json",
  "compressionFormat": "gz",
  "digest": {
    "algo": "md5",
    "hash": "62D2E21501AA2BCF3F6BC1A5AC21A862"
  },
  "uploadedByUserId": "d3c723aa-52f2-4ab0-b23f-7b5c4aaf3ded",
  "uploadedOnDatetime": "2022-02-03T00:21:18.164186373Z"
}
List all files
To view the files uploaded in the staging area of Polaris, issue a GET request to the Files API.
By default, Polaris returns 1000 files per page of results.
To access subsequent pages of results, use the paginationOffset query parameter in your request. For example:
GET /v1/projects/PROJECT_ID/files?paginationOffset=1000
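Paging through every file can be sketched as a loop that advances paginationOffset until a page comes back empty. This is an illustrative sketch under stated assumptions: paginationOffset is taken to be the number of files to skip (as the example above suggests), each page is assumed to carry its entries under the "files" key (as in the sample list response), and `session` is assumed to behave like requests.Session. The function name list_all_files is hypothetical.

```python
def list_all_files(session, files_url, api_key):
    """Collect every file entry by advancing paginationOffset page by page.

    `session` is assumed to behave like requests.Session.
    """
    headers = {"Authorization": f"Basic {api_key}"}
    all_files = []
    offset = 0
    while True:
        response = session.get(
            files_url,
            headers=headers,
            params={"paginationOffset": offset},
        )
        # Raise an exception on response errors
        response.raise_for_status()
        batch = response.json().get("files", [])
        if not batch:
            # An empty page means there are no more results
            break
        all_files.extend(batch)
        # Skip past everything received so far, whatever the page size was
        offset += len(batch)
    return all_files
```

Advancing by the number of entries actually returned, rather than by a fixed 1000, keeps the loop correct if the page size differs from the default, at the cost of one extra request that returns the empty page.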
To get information on a particular file, specify the name of the file in the request URL. For example:
GET /v1/projects/PROJECT_ID/files/kttm-2019-08-19.json.gz
For additional query parameters, see the Files API documentation.
Sample request
The following example shows how to list files uploaded to Polaris:
cURL:
curl --location --request GET "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/files" \
  --header "Authorization: Basic $POLARIS_API_KEY"
Python:
import os
import requests
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/files"
apikey = os.getenv("POLARIS_API_KEY")
headers = {
    'Authorization': f'Basic {apikey}'
}
response = requests.request("GET", url, headers=headers)
print(response.text)
Sample response
The following example shows a successful response listing the files and associated metadata:
{
  "files": [
    {
      "id": "1c3b309b-6a0f-4a46-a1ad-07e0793b83fd",
      "name": "kttm-2019-08-19.json.gz",
      "sizeBytes": 8213401,
      "dataFormat": "nd-json",
      "compressionFormat": "gz",
      "digest": {
        "algo": "md5",
        "hash": "E370CAC6A8D6A938A08279200883D7B3"
      },
      "uploadedOnDatetime": "2022-02-02T15:07:50Z"
    },
    {
      "id": "3576326d-5fd5-4fdc-97aa-88d72bcbd7e9",
      "name": "kttm-2019-08-20.json.gz",
      "sizeBytes": 13800837,
      "dataFormat": "nd-json",
      "compressionFormat": "gz",
      "digest": {
        "algo": "md5",
        "hash": "62D2E21501AA2BCF3F6BC1A5AC21A862"
      },
      "uploadedOnDatetime": "2022-02-03T00:21:18Z"
    }
  ],
  "space": {
    "allocatedBytes": 1000000000000,
    "usedBytes": 1637281596,
    "remainingBytes": 998362718404
  }
}
Delete a file
To delete a file, send a DELETE request to the Files API and specify the name of the file as a path parameter.
Sample request
The following example shows how to delete a file on Polaris:
cURL:
curl --location --request DELETE "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/files/kttm-2019-08-20.json.gz" \
  --header "Authorization: Basic $POLARIS_API_KEY"
Python:
import os
import requests
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/files/kttm-2019-08-20.json.gz"
apikey = os.getenv("POLARIS_API_KEY")
headers = {
    'Authorization': f'Basic {apikey}'
}
response = requests.request("DELETE", url, headers=headers)
print(response.text)
Sample response
The Files API returns the status code 204 No Content in response to a successful request.
If the file does not exist in your Polaris staging environment, the Files API returns a 404 Not Found status code.
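Because a successful delete returns no body, a script has to branch on the status code. The sketch below shows one way to do that; the function name delete_file is hypothetical, and `session` is assumed to behave like requests.Session.

```python
def delete_file(session, files_url, api_key, filename):
    """Delete `filename` from staging.

    Returns True on 204 No Content and False on 404 Not Found.
    `session` is assumed to behave like requests.Session.
    """
    headers = {"Authorization": f"Basic {api_key}"}
    response = session.delete(f"{files_url}/{filename}", headers=headers)
    if response.status_code == 204:
        return True
    if response.status_code == 404:
        # The file was not in the staging area
        return False
    # Raise an exception on any other response error
    response.raise_for_status()
    raise RuntimeError(f"Unexpected status code: {response.status_code}")
```

Returning False for 404 lets cleanup scripts treat "already gone" as a normal outcome while still failing loudly on authentication or server errors.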
Examples
The following Python examples show how to upload a file named kttm-2019-08-20.json.gz to the Polaris staging area and verify that the upload completed successfully.
Upload file
This example shows you how to check if a file exists in the Polaris staging area and, if not, submit an upload request.
import os
import requests
import sys
# Replace placeholders with your information
ORGANIZATION_NAME = "example"
REGION = "us-east-1"
CLOUD_PROVIDER = "aws"
PROJECT_ID = "12375ffx-f7x4-4f0x-a1a6-3b3424987ee0"
# Read the API key from the POLARIS_API_KEY environment variable.
# Do not supply your API key credentials in production scripts and
# do not check them into version control systems.
# See https://docs.imply.io/polaris/api-keys for more information.
apikey = os.getenv("POLARIS_API_KEY")
# Supply the name of the file to upload in the following string variable
filename = "kttm-2019-08-20.json.gz"
# Store the endpoint for the Files API
url = f"https://{ORGANIZATION_NAME}.{REGION}.{CLOUD_PROVIDER}.api.imply.io/v1/projects/{PROJECT_ID}/files"
def post_to_upload(url, filename):
    def do_post():
        headers = {
            'Authorization': f'Basic {apikey}'
        }
        # Open the file in binary mode and close it once the request completes
        with open(filename, 'rb') as f:
            return requests.request("POST", url, headers=headers, files={'file': f})
    print(f"Submitting request to upload '{filename}'")
    response = do_post()
    # Raise an exception on response errors
    response.raise_for_status()
    return response

def get_to_check(url):
    def do_get():
        headers = {
            'Authorization': f'Basic {apikey}'
        }
        return requests.get(url, headers=headers)
    print(f"Checking for '{filename}' on Polaris")
    response = do_get()
    # Raise an exception on response errors
    response.raise_for_status()
    return response

def check_existing(url):
    # Any HTTP error, including 404 Not Found, is treated as "file not found"
    try:
        response = get_to_check(url)
        found_it = True
        file_metadata = response.json()
    except requests.HTTPError:
        found_it = False
        file_metadata = {}
    return found_it, file_metadata

# Make sure there isn't already a file with the same name in Polaris
already_exists, _ = check_existing(f"{url}/{filename}")
if already_exists:
    sys.exit(f"'{filename}' is already on Polaris")
# Submit the upload
post_to_upload(url, filename)
# Check whether the file has been uploaded
upload_success, _ = check_existing(f"{url}/{filename}")
if upload_success:
    print(f"Success! '{filename}' is uploaded on Polaris")
Verify upload
This example shows you how to periodically poll the Files API to check on the upload of a file. When the file upload is confirmed, the MD5 checksums are compared to verify data integrity.
import os
import hashlib
import requests
import time
# Replace placeholders with your information
ORGANIZATION_NAME = "example"
REGION = "us-east-1"
CLOUD_PROVIDER = "aws"
PROJECT_ID = "12375ffx-f7x4-4f0x-a1a6-3b3424987ee0"
# Read the API key from the POLARIS_API_KEY environment variable.
# Do not supply your API key credentials in production scripts and
# do not check them into version control systems.
# See https://docs.imply.io/polaris/api-keys for more information.
apikey = os.getenv("POLARIS_API_KEY")
# Supply the name of the file to upload in the following string variable
filename = "kttm-2019-08-20.json.gz"
# Store the endpoint for the Files API
url = f"https://{ORGANIZATION_NAME}.{REGION}.{CLOUD_PROVIDER}.api.imply.io/v1/projects/{PROJECT_ID}/files"
# Compute the MD5 checksum of the local file
with open(filename, 'rb') as f:
    file_md5 = hashlib.md5(f.read()).hexdigest()

def get_to_check(url):
    def do_get():
        headers = {
            'Authorization': f'Basic {apikey}'
        }
        return requests.get(url, headers=headers)
    print(f"Checking for '{filename}' on Polaris")
    response = do_get()
    # Raise an exception on response errors
    response.raise_for_status()
    return response

def check_existing(url):
    # Any HTTP error, including 404 Not Found, is treated as "file not found"
    try:
        response = get_to_check(url)
        found_it = True
        file_metadata = response.json()
    except requests.HTTPError:
        found_it = False
        file_metadata = {}
    return found_it, file_metadata

# Check whether the file has been uploaded
upload_success, file_metadata = check_existing(f"{url}/{filename}")
if not upload_success:
    start = time.time()
    print("Checking for the upload every 5 seconds")
    while not upload_success:
        # Wait 5 seconds before checking again
        time.sleep(5)
        elapsed = time.time() - start
        print(f"\nSeconds elapsed since polling started: {elapsed:.2f}")
        # Check whether the file has been uploaded
        upload_success, file_metadata = check_existing(f"{url}/{filename}")
print(f"\n'{filename}' is uploaded on Polaris")
# Verify data integrity by comparing md5sum hashes
polaris_hash = file_metadata["digest"]["hash"].lower()
if file_md5 == polaris_hash:
    print(f"The md5sum check succeeds for '{filename}'.")
else:
    print(f"Error: The md5sum hashes do not match for '{filename}'.")
    print(f"Local value: '{file_md5}'\n"
          f"Polaris value: '{polaris_hash}'")
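The script above reads the whole file into memory to compute its checksum, which may be costly for files near the upload limit. As an alternative, the following sketch hashes the file in fixed-size chunks and compares the result with the digest in the file metadata. The helper names md5_of_file and matches_polaris_digest are hypothetical; the metadata shape follows the sample responses in this topic, where the hash is an uppercase hexadecimal string.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the lowercase hex MD5 digest of `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_polaris_digest(path, file_metadata):
    """Compare the local MD5 with the digest reported in the file metadata."""
    polaris_hash = file_metadata["digest"]["hash"].lower()
    return md5_of_file(path) == polaris_hash
```

In the script above, file_md5 = md5_of_file(filename) could replace the single read() call without changing the rest of the comparison.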
Learn more
See the following topics for more information:
- Files v1 API for reference on managing files in Polaris.
- Ingest data from files for ingesting the uploaded files to a table in Polaris.