Upload files by API

The Files API lets you upload and manage files in the Imply Polaris file staging area. We recommend completing all file uploads before starting ingestion. After uploading your files, you can use the UI or the API to ingest the data into a Polaris table.

For information on the data formats and compression formats supported by Polaris, see Supported formats.

This topic walks you through uploading files with the Files API. To skip the details, go straight to the examples.

Prerequisites

This topic assumes that you have an API key with the ManageFiles permission. In the examples below, the key value is stored in the variable named POLARIS_API_KEY. See Authenticate with API keys to obtain an API key. For more information on permissions, visit Permissions reference.

Upload a file to staging

To upload a file, submit a POST request to the Files API. Each request can upload only one file, up to 2 GB in size. This limit refers to the size of the file as transmitted by the browser or HTTP client, so you may upload a file that's larger than 2 GB on disk if your browser or client compresses it in transit to below 2 GB.
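As a rough pre-flight check for the size limit, the sketch below compares a file's on-disk size against 2 GB before any request is sent. The constant MAX_UPLOAD_BYTES and the helper fits_in_one_request are hypothetical names, not part of the Files API, and reading "2 GB" as 2 * 1024**3 bytes is an assumption. Because the limit applies to the transmitted size, a file that fails this check may still upload if your client compresses it in transit.

```python
import os
import tempfile

# Assumption: "2 GB" is read as 2 * 1024**3 bytes. The limit may instead be
# decimal (2 * 10**9), so adjust this constant if needed.
MAX_UPLOAD_BYTES = 2 * 1024**3


def fits_in_one_request(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file's on-disk size is within the limit (hypothetical helper)."""
    return os.path.getsize(path) <= limit


# Usage with a small temporary file standing in for real data
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1024)
    sample_path = tmp.name

print(fits_in_one_request(sample_path))             # 1 KiB file against the default limit
print(fits_in_one_request(sample_path, limit=512))  # same file against a 512-byte limit
os.remove(sample_path)
```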

In the request, include form data enclosing the path and name of the file to upload. For cURL requests, precede the file path with file=@ without any spaces. If you get an error indicating that your request does not contain a file to upload, double-check that you've included the @ sign.

File names in Polaris must be unique. Uploading a file with the same name as a previously uploaded file results in a 409 Conflict error.

The media type of the HTTP request is multipart/form-data; however, you should not set the Content-Type: multipart/form-data header explicitly. Allow your client to assign the header automatically because it will also properly set the associated boundary directive.
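To illustrate why the header is best left to the client, the following sketch hand-builds a multipart/form-data body using only the standard library. It is not how Polaris or the requests library implements encoding; build_multipart is a hypothetical helper. The point is that the boundary token in the Content-Type header must match the delimiter lines in the body, which a hard-coded header cannot guarantee.

```python
import uuid


def build_multipart(field, filename, payload):
    """Encode one file as a multipart/form-data body plus its matching header value."""
    # The boundary is a random token that must not occur inside the payload
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # The header value carries the same boundary that delimits the body
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, head + payload + tail


content_type, body = build_multipart("file", "kttm-2019-08-20.json.gz", b"example bytes")
print(content_type)
```

In practice, clients such as cURL and requests generate the boundary and the header together, so you don't need code like this in an upload script.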

Sample request

The following example shows how to upload a file called kttm-2019-08-20.json.gz to Polaris:

cURL

curl --location --request POST 'https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/files' \
--user ${POLARIS_API_KEY}: \
--form 'file=@"kttm-2019-08-20.json.gz"'

Python

import os
import requests

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/files"

apikey = os.getenv("POLARIS_API_KEY")

headers = {
    'Authorization': f'Basic {apikey}'
}

# Open the file in binary mode; the with block closes it after the request
with open('kttm-2019-08-20.json.gz', 'rb') as f:
    response = requests.post(url, headers=headers, files={'file': f})

print(response.text)

Sample response

The following example shows a response to a successful file upload:

{
    "id": "3576326d-5fd5-4fdc-97aa-88d72bcbd7e9",
    "name": "kttm-2019-08-20.json.gz",
    "sizeBytes": 13800837,
    "dataFormat": "nd-json",
    "compressionFormat": "gz",
    "digest": {
        "algo": "md5",
        "hash": "62D2E21501AA2BCF3F6BC1A5AC21A862"
    },
    "uploadedByUserId": "d3c723aa-52f2-4ab0-b23f-7b5c4aaf3ded",
    "uploadedOnDatetime": "2022-02-03T00:21:18.164186373Z"
}

List all files

To view the files uploaded in the staging area of Polaris, issue a GET request to the Files API. You can also use this endpoint with the name of the file in the request URL to get information on a particular file.
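When you request a single file by name, the name becomes part of the URL path. A minimal sketch of building that URL follows, assuming that any characters with special meaning in URLs should be percent-encoded. The helper file_url and the file name "my data.json" are hypothetical, and BASE_URL is the placeholder URL from the sample requests.

```python
from urllib.parse import quote

# Placeholder base URL copied from the sample requests
BASE_URL = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/files"


def file_url(base_url, filename):
    """Return the URL for one file's metadata (hypothetical helper)."""
    # safe="" also encodes "/" so the name stays a single path segment
    return f"{base_url.rstrip('/')}/{quote(filename, safe='')}"


print(file_url(BASE_URL, "kttm-2019-08-20.json.gz"))
print(file_url(BASE_URL, "my data.json"))
```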

Sample request

The following example shows how to list files uploaded to Polaris:

cURL

curl --location --request GET 'https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/files' \
--user ${POLARIS_API_KEY}:

Python

import os
import requests

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/files"

apikey = os.getenv("POLARIS_API_KEY")

headers = {
    'Authorization': f'Basic {apikey}'
}

response = requests.request("GET", url, headers=headers)

print(response.text)

Sample response

The following example shows a successful response listing the files and associated metadata:

{
    "files": [
        {
            "id": "1c3b309b-6a0f-4a46-a1ad-07e0793b83fd",
            "name": "kttm-2019-08-19.json.gz",
            "sizeBytes": 8213401,
            "dataFormat": "nd-json",
            "compressionFormat": "gz",
            "digest": {
                "algo": "md5",
                "hash": "E370CAC6A8D6A938A08279200883D7B3"
            },
            "uploadedOnDatetime": "2022-02-02T15:07:50Z"
        },
        {
            "id": "3576326d-5fd5-4fdc-97aa-88d72bcbd7e9",
            "name": "kttm-2019-08-20.json.gz",
            "sizeBytes": 13800837,
            "dataFormat": "nd-json",
            "compressionFormat": "gz",
            "digest": {
                "algo": "md5",
                "hash": "62D2E21501AA2BCF3F6BC1A5AC21A862"
            },
            "uploadedOnDatetime": "2022-02-03T00:21:18Z"
        }
    ],
    "space": {
        "allocatedBytes": 1000000000000,
        "usedBytes": 1637281596,
        "remainingBytes": 998362718404
    }
}
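As an illustration of working with this response, the sketch below parses a trimmed copy of the sample above and reports the file names and the share of staging space in use. It only assumes the field names shown in the sample response. The variable names are illustrative.

```python
import json

# Trimmed copy of the sample response above
listing_json = """
{
    "files": [
        {"name": "kttm-2019-08-19.json.gz", "sizeBytes": 8213401},
        {"name": "kttm-2019-08-20.json.gz", "sizeBytes": 13800837}
    ],
    "space": {
        "allocatedBytes": 1000000000000,
        "usedBytes": 1637281596,
        "remainingBytes": 998362718404
    }
}
"""

listing = json.loads(listing_json)

# Collect the names of the staged files
names = [entry["name"] for entry in listing["files"]]

# Compute how much of the allocated staging space is used
space = listing["space"]
percent_used = 100 * space["usedBytes"] / space["allocatedBytes"]

print(names)
print(f"{percent_used:.2f}% of staging space used")
```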

Delete a file

You can delete a file by sending a DELETE request to the Files API and specifying the filename to be deleted as a path parameter.

Sample request

The following example shows how to delete a file on Polaris:

cURL

curl --location --request DELETE 'https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/files/kttm-2019-08-20.json.gz' \
--user ${POLARIS_API_KEY}:

Python

import os
import requests

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/files/kttm-2019-08-20.json.gz"

apikey = os.getenv("POLARIS_API_KEY")

headers = {
    'Authorization': f'Basic {apikey}'
}

response = requests.request("DELETE", url, headers=headers)

print(response.text)

Sample response

The Files API returns the status code 204 No Content in response to a successful request. If the file does not exist in your Polaris staging environment, the Files API returns a 404 Not Found status code.
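One way a script might act on these two status codes is sketched below. The helper file_is_gone is hypothetical. Treating 404 as "nothing left to delete" rather than as a failure is a design choice for cleanup scripts that may run more than once, not behavior prescribed by the Files API.

```python
def file_is_gone(status_code):
    """Return True if the file no longer exists after a DELETE request (hypothetical helper)."""
    if status_code == 204:
        # Successful deletion
        return True
    if status_code == 404:
        # The file was not in the staging area to begin with
        return True
    # Any other status is unexpected for this sketch
    raise RuntimeError(f"Unexpected status code from DELETE: {status_code}")


print(file_is_gone(204))
print(file_is_gone(404))
```

With the requests library, you would pass response.status_code from the DELETE call to this helper.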

Examples

The following Python examples show how to upload a file named kttm-2019-08-20.json.gz to the Polaris staging area and verify that the upload completed successfully.

Upload file

This example shows you how to check if a file exists in the Polaris staging area and, if not, submit an upload request.

import os
import requests
import sys

# Replace these values with your organization name, region, and cloud provider
ORGANIZATION_NAME = "imply"
REGION = "REGION"
CLOUD_PROVIDER = "CLOUD_PROVIDER"

# Read the API key from the POLARIS_API_KEY environment variable.
# Do not hard-code API key credentials in scripts and
# do not check them into version control systems.
# See https://docs.imply.io/polaris/api-keys for more information.
apikey = os.getenv("POLARIS_API_KEY")

# Supply the name of the file to upload in the following string variable
filename = "kttm-2019-08-20.json.gz"

# Store the endpoint for the Files API, using the same URL form as the sample requests
url = f"https://{ORGANIZATION_NAME}.{REGION}.{CLOUD_PROVIDER}.api.imply.io/v1/files"

def post_to_upload(url, filename):
    def do_post():
        headers = {
            'Authorization': f'Basic {apikey}'
        }
        # Open the file in binary mode; the with block closes it after the request
        with open(filename, 'rb') as f:
            return requests.post(url, headers=headers, files={'file': f})

    print(f"Submitting request to upload '{filename}'")
    response = do_post()

    # Raise an exception on response errors
    response.raise_for_status()

    return response


def get_to_check(url):
    def do_get():
        headers = {
            'Authorization': f'Basic {apikey}'
        }
        return requests.get(url, headers=headers)

    print(f"Checking for '{filename}' on Polaris")
    response = do_get()

    # Raise an exception on response errors
    response.raise_for_status()

    return response


def check_existing(url):
    try:
        response = get_to_check(url)
        found_it = True
        file_metadata = response.json()
    except requests.HTTPError as exception:
        found_it = False
        file_metadata = {}

    return found_it, file_metadata


# Make sure there isn't already a file with the same name in Polaris
already_exists, _ = check_existing(f"{url}/{filename}")

if already_exists:
    sys.exit(f"'{filename}' is already on Polaris")

# Submit the upload
post_to_upload(url, filename)

# Check whether the file has been uploaded
upload_success, _ = check_existing(f"{url}/{filename}")
if upload_success:
    print(f"Success! '{filename}' is uploaded on Polaris")

Verify upload

This example shows you how to periodically poll the Files API to check on the upload of a file. When the file upload is confirmed, the MD5 checksums are compared to verify data integrity.

import os
import hashlib
import requests
import time

# Replace these values with your organization name, region, and cloud provider
ORGANIZATION_NAME = "imply"
REGION = "REGION"
CLOUD_PROVIDER = "CLOUD_PROVIDER"

# Read the API key from the POLARIS_API_KEY environment variable.
# Do not hard-code API key credentials in scripts and
# do not check them into version control systems.
# See https://docs.imply.io/polaris/api-keys for more information.
apikey = os.getenv("POLARIS_API_KEY")

# Supply the name of the file to verify in the following string variable
filename = "kttm-2019-08-20.json.gz"

# Store the endpoint for the Files API, using the same URL form as the sample requests
url = f"https://{ORGANIZATION_NAME}.{REGION}.{CLOUD_PROVIDER}.api.imply.io/v1/files"

# Compute the MD5 checksum of the local file
with open(filename, 'rb') as f:
    file_md5 = hashlib.md5(f.read()).hexdigest()

def get_to_check(url):
    def do_get():
        headers = {
            'Authorization': f'Basic {apikey}'
        }
        return requests.get(url, headers=headers)

    print(f"Checking for '{filename}' on Polaris")
    response = do_get()

    # Raise an exception on response errors
    response.raise_for_status()

    return response

def check_existing(url):
    try:
        response = get_to_check(url)
        found_it = True
        file_metadata = response.json()
    except requests.HTTPError as exception:
        found_it = False
        file_metadata = {}

    return found_it, file_metadata

# Check whether the file has been uploaded
upload_success, file_metadata = check_existing(f"{url}/{filename}")

if not upload_success:
    start = time.time()
    print("Checking for the upload every 5 seconds")

while not upload_success:
    # Wait 5 seconds before checking again
    time.sleep(5)

    elapsed = time.time() - start
    print(f"\nSeconds elapsed since the first check: {elapsed:.2f}")

    # Check whether the file has been uploaded
    upload_success, file_metadata = check_existing(f"{url}/{filename}")

print(f"\n'{filename}' is uploaded on Polaris")

# Verify data integrity by comparing md5sum hashes
polaris_hash = file_metadata["digest"]["hash"].lower()
if file_md5 == polaris_hash:
    print(f"The md5sum check succeeds for '{filename}'.")
else:
    print(f"Error: The md5sum hashes do not match for '{filename}'.")

print(f"Local value: '{file_md5}'\n"
      f"Polaris value:  '{polaris_hash}'")
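The example above reads the whole file into memory to compute its checksum. For files near the 2 GB upload limit, a chunked read may be preferable. The following is a minimal sketch using only the standard library. The helpers md5_of_file and hashes_match are hypothetical names, and the 1 MiB chunk size is an arbitrary choice. The comparison ignores case because the sample responses show the Polaris hash in uppercase, while hexdigest() returns lowercase.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute a file's MD5 hex digest without loading the whole file into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_match(local_hex, polaris_hex):
    """Compare two hex digests, ignoring letter case."""
    return local_hex.lower() == polaris_hex.lower()


# Usage with a small temporary file standing in for the uploaded file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    sample_path = tmp.name

local_hash = md5_of_file(sample_path, chunk_size=2)
print(hashes_match(local_hash, local_hash.upper()))
os.remove(sample_path)
```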

Learn more

See the following topics for more information:

  • Files v1 API for reference on managing files in Polaris.
  • Ingest data from files for ingesting the uploaded files to a table in Polaris.
Copyright © 2023 Imply Data, Inc