Upload files by API

The Files API lets you upload and manage files in the Imply Polaris file staging area. We recommend completing all file uploads before starting ingestion. After uploading your files, you can use the UI or the API to ingest the data into a Polaris table.

For information on the data formats and compression formats supported by Polaris, see Supported formats.

This topic walks you through the process to upload files with the Files API. If you want to skip the details, check out the examples.

Prerequisites

This topic assumes that you have an OAuth token with the ManageFiles role. In the examples below, the token value is stored in the variable named IMPLY_TOKEN. See Authenticate API requests to obtain an access token and assign service account roles. Visit User roles reference for more information on roles and their permissions.
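The Python samples in this topic use IMPLY_TOKEN as a Python variable. The following is a minimal sketch of one way to populate it. It assumes that you export the token as an environment variable named IMPLY_TOKEN, which is a convention for these examples and not a Polaris requirement. The function name auth_headers is hypothetical.

```python
import os


def auth_headers(env_var="IMPLY_TOKEN"):
    """Build the Authorization header for Polaris API requests.

    Assumes the OAuth token is exported in the environment variable named
    by env_var (a hypothetical convention, not a Polaris requirement).
    """
    token = os.environ.get(env_var)
    if not token:
        # Fail early with a clear message instead of sending an empty bearer token
        raise RuntimeError(f"Set the {env_var} environment variable to your OAuth token")
    return {"Authorization": f"Bearer {token}"}
```

You can then pass headers=auth_headers() to any of the requests calls shown in this topic.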

Upload a file to staging

To upload a file, submit a POST request to the Files API. You can only upload one file, up to 2 GB in size, in a single request. This limit refers to the size of the file transmitted by the browser or HTTP client. You may upload a file that's larger than 2 GB on disk if your browser or client compresses the file in transit to below 2 GB.
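A file that is larger than 2 GB on disk fits within the limit only if your client compresses it in transit. A conservative pre-flight check can flag such files before you send them. This sketch assumes that "2 GB" means 2 × 10⁹ bytes, the smaller of the decimal and binary readings, so the check errs on the safe side. Confirm the exact threshold in the Files API reference. The names MAX_UPLOAD_BYTES and fits_upload_limit are hypothetical.

```python
import os

# Assumption: treat "2 GB" as 2 * 10**9 bytes. This is the smaller of the
# decimal and binary interpretations, so the check errs on the safe side.
MAX_UPLOAD_BYTES = 2 * 1000**3


def fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at path is no larger than limit on disk.

    A False result doesn't guarantee that the upload fails: the file can
    still be accepted if the client compresses it in transit below the limit.
    """
    return os.path.getsize(path) <= limit
```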

In the request, send the file as form data in a field named file. For cURL requests, precede the file path with file=@ without any spaces. If you get an error indicating that your request does not contain a file to upload, double-check that you've included the @ sign.

File names in Polaris must be unique. Uploading a file with the same name as a previously uploaded file results in a 409 Conflict error.
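To avoid a 409 Conflict, you can compare the names you plan to upload with the names already in the staging area. The following sketch works on the parsed JSON body of the list request described in List all files, which contains a files array whose entries have a name field. The function name find_conflicts is hypothetical.

```python
def find_conflicts(planned_names, list_response):
    """Return the planned file names that already exist in the staging area.

    list_response is the parsed JSON body of GET /v1/files, which contains
    a "files" array of objects with a "name" field.
    """
    existing = {f["name"] for f in list_response.get("files", [])}
    # Preserve the caller's order so the output is predictable
    return [name for name in planned_names if name in existing]


listing = {"files": [{"name": "kttm-2019-08-19.json.gz"}, {"name": "kttm-2019-08-20.json.gz"}]}
print(find_conflicts(["kttm-2019-08-20.json.gz", "kttm-2019-08-21.json.gz"], listing))
# → ['kttm-2019-08-20.json.gz']
```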

The media type of the HTTP request is multipart/form-data; however, don't set the Content-Type: multipart/form-data header explicitly. Let your client set the header automatically so that it also includes the boundary parameter that separates the parts of the request body.
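With the Python requests library, passing the file through the files parameter is what makes the client build the multipart/form-data body and the matching Content-Type header, including the boundary. The following sketch prepares a request without sending it so you can inspect the generated header. The token and file content are placeholders, not real values.

```python
import requests

url = "https://api.imply.io/v1/files"

# Prepare (but don't send) a POST request. The "file" form field carries a
# (filename, content) tuple; requests generates the multipart body for it.
prepared = requests.Request(
    "POST",
    url,
    headers={"Authorization": "Bearer <token>"},
    files={"file": ("kttm-2019-08-20.json.gz", b"placeholder bytes")},
).prepare()

content_type = prepared.headers["Content-Type"]
print(content_type)  # multipart/form-data; boundary=<randomly generated string>

# The generated boundary is the delimiter that appears in the request body
boundary = content_type.split("boundary=", 1)[1]
```

This also shows why you shouldn't set the header yourself: requests only adds its generated Content-Type when the request doesn't already have one, so a hand-written multipart/form-data header would lack the boundary.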

Sample request

The following example shows how to upload a file called kttm-2019-08-20.json.gz to Polaris:

cURL

curl --location --request POST 'https://api.imply.io/v1/files' \
--header "Authorization: Bearer $IMPLY_TOKEN" \
--form 'file=@"kttm-2019-08-20.json.gz"'

Python

import os
import requests

# Assumes the token is exported as the IMPLY_TOKEN environment variable
IMPLY_TOKEN = os.environ["IMPLY_TOKEN"]

url = "https://api.imply.io/v1/files"

headers = {
    'Authorization': 'Bearer {token}'.format(token=IMPLY_TOKEN)
}

# Open the file in binary mode and close it after the request completes
with open('kttm-2019-08-20.json.gz', 'rb') as f:
    files = {'file': f}
    response = requests.request("POST", url, headers=headers, files=files)

print(response.text)

Sample response

The following example shows a response to a successful file upload:

{
    "id": "3576326d-5fd5-4fdc-97aa-88d72bcbd7e9",
    "name": "kttm-2019-08-20.json.gz",
    "sizeBytes": 13800837,
    "dataFormat": "nd-json",
    "compressionFormat": "gz",
    "digest": {
        "algo": "md5",
        "hash": "62D2E21501AA2BCF3F6BC1A5AC21A862"
    },
    "uploadedByUserId": "d3c723aa-52f2-4ab0-b23f-7b5c4aaf3ded",
    "uploadedOnDatetime": "2022-02-03T00:21:18.164186373Z"
}
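In the sample response above, the digest hash is uppercase hexadecimal, while Python's hashlib returns lowercase hexadecimal, so normalize the case before comparing checksums. The following sketch parses an abbreviated copy of that sample response and extracts the fields you typically need later. The helper md5_matches is hypothetical.

```python
import hashlib
import json

# Abbreviated copy of the sample response from a successful upload
response_text = """
{
    "id": "3576326d-5fd5-4fdc-97aa-88d72bcbd7e9",
    "name": "kttm-2019-08-20.json.gz",
    "sizeBytes": 13800837,
    "dataFormat": "nd-json",
    "compressionFormat": "gz",
    "digest": {"algo": "md5", "hash": "62D2E21501AA2BCF3F6BC1A5AC21A862"}
}
"""

metadata = json.loads(response_text)
file_id = metadata["id"]
polaris_md5 = metadata["digest"]["hash"].lower()  # hashlib uses lowercase hex


def md5_matches(local_bytes, expected_hex):
    """Compare the MD5 of local data with a hash reported by Polaris."""
    return hashlib.md5(local_bytes).hexdigest() == expected_hex.lower()


print(file_id, metadata["name"], f"{metadata['sizeBytes'] / 1e6:.1f} MB")
```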

List all files

To view the files uploaded to the Polaris staging area, send a GET request to the Files API. To get information on a particular file, append the file name to the request URL, for example https://api.imply.io/v1/files/kttm-2019-08-20.json.gz.
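If a file name contains characters that aren't valid in a URL path, such as spaces, percent-encode it before you append it to the endpoint. This sketch assumes that Polaris decodes percent-encoded file names in the path; the function name file_url is hypothetical.

```python
from urllib.parse import quote

FILES_ENDPOINT = "https://api.imply.io/v1/files"


def file_url(filename):
    """Return the Files API URL for a single file.

    quote() percent-encodes characters such as spaces; safe="" also encodes
    "/" so the name stays a single path segment (assumption: Polaris
    decodes the percent-encoded name).
    """
    return f"{FILES_ENDPOINT}/{quote(filename, safe='')}"


print(file_url("kttm-2019-08-20.json.gz"))
# → https://api.imply.io/v1/files/kttm-2019-08-20.json.gz
```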

Sample request

The following example shows how to list files uploaded to Polaris:

cURL

curl --location --request GET 'https://api.imply.io/v1/files' \
--header "Authorization: Bearer $IMPLY_TOKEN"

Python

import os
import requests

# Assumes the token is exported as the IMPLY_TOKEN environment variable
IMPLY_TOKEN = os.environ["IMPLY_TOKEN"]

url = "https://api.imply.io/v1/files"

headers = {
    'Authorization': 'Bearer {token}'.format(token=IMPLY_TOKEN)
}

response = requests.request("GET", url, headers=headers)

print(response.text)

Sample response

The following example shows a successful response listing the files and associated metadata:

{
    "files": [
        {
            "id": "1c3b309b-6a0f-4a46-a1ad-07e0793b83fd",
            "name": "kttm-2019-08-19.json.gz",
            "sizeBytes": 8213401,
            "dataFormat": "nd-json",
            "compressionFormat": "gz",
            "digest": {
                "algo": "md5",
                "hash": "E370CAC6A8D6A938A08279200883D7B3"
            },
            "uploadedOnDatetime": "2022-02-02T15:07:50Z"
        },
        {
            "id": "3576326d-5fd5-4fdc-97aa-88d72bcbd7e9",
            "name": "kttm-2019-08-20.json.gz",
            "sizeBytes": 13800837,
            "dataFormat": "nd-json",
            "compressionFormat": "gz",
            "digest": {
                "algo": "md5",
                "hash": "62D2E21501AA2BCF3F6BC1A5AC21A862"
            },
            "uploadedOnDatetime": "2022-02-03T00:21:18Z"
        }
    ],
    "space": {
        "allocatedBytes": 1000000000000,
        "usedBytes": 1637281596,
        "remainingBytes": 998362718404
    }
}
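The space object reports the staging quota in bytes; in the sample response, remainingBytes equals allocatedBytes minus usedBytes. The following sketch summarizes a list response. It works on the parsed JSON body and relies only on the fields shown in the sample. The function name summarize_listing is hypothetical.

```python
def summarize_listing(listing):
    """Print each staged file and the staging space usage; return remaining bytes.

    listing is the parsed JSON body of GET /v1/files.
    """
    for f in listing["files"]:
        print(f"{f['name']}: {f['sizeBytes']:,} bytes "
              f"({f['dataFormat']}, {f['compressionFormat']})")

    space = listing["space"]
    used_pct = 100 * space["usedBytes"] / space["allocatedBytes"]
    print(f"Used {space['usedBytes']:,} of {space['allocatedBytes']:,} bytes ({used_pct:.2f}%)")
    return space["remainingBytes"]


sample = {
    "files": [
        {"name": "kttm-2019-08-19.json.gz", "sizeBytes": 8213401,
         "dataFormat": "nd-json", "compressionFormat": "gz"},
        {"name": "kttm-2019-08-20.json.gz", "sizeBytes": 13800837,
         "dataFormat": "nd-json", "compressionFormat": "gz"},
    ],
    "space": {"allocatedBytes": 1000000000000, "usedBytes": 1637281596,
              "remainingBytes": 998362718404},
}
remaining = summarize_listing(sample)
```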

Delete a file

You can delete a file by sending a DELETE request to the Files API and specifying the filename to be deleted as a path parameter.

Sample request

The following example shows how to delete a file from Polaris:

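cURL

curl --location --request DELETE 'https://api.imply.io/v1/files/kttm-2019-08-20.json.gz' \
--header "Authorization: Bearer $IMPLY_TOKEN"

Python

import os
import requests

# Assumes the token is exported as the IMPLY_TOKEN environment variable
IMPLY_TOKEN = os.environ["IMPLY_TOKEN"]

url = "https://api.imply.io/v1/files/kttm-2019-08-20.json.gz"

headers = {
    'Authorization': 'Bearer {token}'.format(token=IMPLY_TOKEN)
}

response = requests.request("DELETE", url, headers=headers)

# A successful delete returns 204 No Content with an empty body, so print the status code
print(response.status_code)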

Sample response

The Files API returns the status code 204 No Content in response to a successful request. If the file does not exist in your Polaris staging environment, the Files API returns a 404 Not Found status code.
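A 404 Not Found response means the file isn't in the staging area, so a cleanup script can often treat it like a successful delete. The following sketch maps the two status codes described above to outcomes and treats any other code as an error. The function name interpret_delete_status is hypothetical.

```python
def interpret_delete_status(status_code):
    """Map a Files API DELETE status code to an outcome.

    204 and 404 are the codes described for this endpoint; any other code
    is treated as an unexpected error.
    """
    if status_code == 204:
        return "deleted"
    if status_code == 404:
        # Nothing to delete: the file isn't in the staging area
        return "not found"
    raise RuntimeError(f"Unexpected status code from DELETE: {status_code}")


# With requests: outcome = interpret_delete_status(response.status_code)
```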

Examples

The following Python examples show how to upload a file named kttm-2019-08-20.json.gz to the Polaris staging area and verify that the upload completed successfully.

Upload file

This example shows you how to check if a file exists in the Polaris staging area and, if not, submit an upload request.

import requests
import sys

# Supply the name of your organization in the following string variable
ORG_NAME = ""

# Supply the client ID and client secret in the following string variables.
# Do not supply OAuth credentials in production scripts and
# do not check them into version control systems.
# See https://docs.imply.io/polaris/oauth/ for more information.
CLIENT_ID = ""
CLIENT_SECRET = ""

# Supply the name of the file to upload in the following string variable
FILENAME = "kttm-2019-08-20.json.gz"

# Store endpoints for Polaris OAuth API and Files API
TOKEN_ENDPOINT = "https://id.imply.io/auth/realms/{org_name}/protocol/openid-connect/token".format(org_name=ORG_NAME)
FILES_ENDPOINT = "https://api.imply.io/v1/files"

access_token = None

def update_token():
    global access_token

    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "grant_type": "client_credentials",
    }

    response = requests.post(TOKEN_ENDPOINT, data=params)
    response.raise_for_status()

    access_token = response.json()['access_token']


def post_to_upload(url, filename):
    def do_post():
        headers = {
            "Authorization": "Bearer {token}".format(token=access_token)
        }
        # Open the file for each attempt and close it once the request completes
        with open(filename, 'rb') as f:
            files = {'file': f}
            return requests.request("POST", url, headers=headers, files=files)

    print(f"Submitting request to upload '{filename}'")
    response = do_post()

    # If the token is missing or expired, refresh it and retry once
    if response.status_code == 401:
        print("Refreshing token")
        update_token()
        response = do_post()

    # Raise an exception on response errors
    response.raise_for_status()

    return response


def get_to_check(url):
    def do_get():
        headers = {
            "Authorization": "Bearer {token}".format(token=access_token)
        }
        return requests.get(url, headers=headers)

    print(f"Checking for '{FILENAME}' on Polaris")
    response = do_get()

    # If the token is missing or expired, refresh it and retry once
    if response.status_code == 401:
        print("Refreshing token")
        update_token()
        response = do_get()

    # Raise an exception on response errors
    response.raise_for_status()

    return response


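def check_existing(url):
    try:
        response = get_to_check(url)
    except requests.HTTPError as exception:
        # 404 Not Found means the file isn't in the staging area;
        # re-raise any other error instead of hiding it
        if exception.response is not None and exception.response.status_code == 404:
            return False, {}
        raise

    return True, response.json()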


# Make sure there isn't already a file with the same name in Polaris
already_exists, _ = check_existing(f"{FILES_ENDPOINT}/{FILENAME}")

if already_exists:
    sys.exit(f"'{FILENAME}' is already on Polaris")

# Submit the upload
post_to_upload(FILES_ENDPOINT, FILENAME)

# Check whether the file has been uploaded
upload_success, _ = check_existing(f"{FILES_ENDPOINT}/{FILENAME}")
if upload_success:
    print(f"Success! '{FILENAME}' is uploaded on Polaris")
else:
    print(f"'{FILENAME}' was not found on Polaris after the upload request")

Verify upload

This example shows you how to periodically poll the Files API to check on the upload of a file. When the upload is confirmed, the script compares the MD5 checksum of the local file with the checksum that Polaris reports to verify data integrity.

import hashlib
import requests
import time

# Supply the name of your organization in the following string variable
ORG_NAME = ""

# Supply the client ID and client secret in the following string variables.
# Do not supply OAuth credentials in production scripts and
# do not check them into version control systems.
# See https://docs.imply.io/polaris/oauth/ for more information.
CLIENT_ID = ""
CLIENT_SECRET = ""

# Supply the name of the file to upload in the following string variable
FILENAME = "kttm-2019-08-20.json.gz"

# Store endpoints for Polaris OAuth API and Files API
TOKEN_ENDPOINT = "https://id.imply.io/auth/realms/{org_name}/protocol/openid-connect/token".format(org_name=ORG_NAME)
FILES_ENDPOINT = "https://api.imply.io/v1/files"

access_token = None
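# Compute the local MD5 checksum in 1 MiB chunks so large files aren't read into memory at once
md5 = hashlib.md5()
with open(FILENAME, 'rb') as local_file:
    for chunk in iter(lambda: local_file.read(1024 * 1024), b""):
        md5.update(chunk)
file_md5 = md5.hexdigest()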

def update_token():
    global access_token

    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "grant_type": "client_credentials",
    }

    response = requests.post(TOKEN_ENDPOINT, data=params)
    response.raise_for_status()

    access_token = response.json()['access_token']


def get_to_check(url):
    def do_get():
        headers = {
            "Authorization": "Bearer {token}".format(token=access_token)
        }
        return requests.get(url, headers=headers)

    print(f"Checking for '{FILENAME}' on Polaris")
    response = do_get()

    # If the token is missing or expired, refresh it and retry once
    if response.status_code == 401:
        print("Refreshing token")
        update_token()
        response = do_get()

    # Raise an exception on response errors
    response.raise_for_status()

    return response


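def check_existing(url):
    try:
        response = get_to_check(url)
    except requests.HTTPError as exception:
        # 404 Not Found means the file isn't in the staging area yet;
        # re-raise any other error instead of hiding it
        if exception.response is not None and exception.response.status_code == 404:
            return False, {}
        raise

    return True, response.json()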

# Check whether the file has been uploaded
upload_success, file_metadata = check_existing(f"{FILES_ENDPOINT}/{FILENAME}")

# Stop polling after this many seconds (an arbitrary limit; adjust as needed)
MAX_WAIT_SECONDS = 600

if not upload_success:
    start = time.time()
    print("Checking for the upload every 5 seconds")

while not upload_success:
    # Wait 5 seconds before checking again
    time.sleep(5)

    elapsed = time.time() - start
    print(f"\nSeconds elapsed since polling started: {elapsed:.2f}")
    if elapsed > MAX_WAIT_SECONDS:
        raise TimeoutError(f"'{FILENAME}' was not found on Polaris after {MAX_WAIT_SECONDS} seconds")

    # Check whether the file has been uploaded
    upload_success, file_metadata = check_existing(f"{FILES_ENDPOINT}/{FILENAME}")

print(f"\n'{FILENAME}' is uploaded on Polaris")

# Verify data integrity by comparing md5sum hashes
polaris_hash = file_metadata["digest"]["hash"].lower()
if file_md5 == polaris_hash:
    print(f"The md5sum check succeeds for '{FILENAME}'.")
else:
    print(f"Error: The md5sum hashes do not match for '{FILENAME}'.")

print(f"Local value: '{file_md5}'\n"
      f"Polaris value:  '{polaris_hash}'")

Learn more

See the following topics for more information:

  • Files API for reference on managing files in Polaris.
  • Ingest to table for ingesting the uploaded files to a table in Polaris.
Copyright © 2022 Imply Data, Inc