Ingest data from files by API

This topic walks you through the process to ingest data from uploaded files into a table with the Jobs v1 API.

For a list of all ingestion options, see Sources.

Prerequisites

Before ingesting from files, you need the following:

Files uploaded to Polaris. The example in this topic uses the following files:
- kttm-2019-08-19.json.gz
- kttm-2019-08-20.json.gz
To upload files using the Files API, visit Upload files.
An API key with the ManageIngestionJobs permission. In the examples below, the key value is stored in the variable named POLARIS_API_KEY. For information about how to obtain an API key and assign permissions, see API key authentication. For more information on permissions, visit Permissions reference.

Load data into a table

Launch a batch ingestion job to import data from your uploaded files to a destination table in Polaris. Submit a POST request to the Jobs v1 API and pass the job specification as a payload to the request. The job spec is a JSON object that requires the following fields:

type: string representing the type of job. Set this property to batch for batch ingestion.
target: object describing the destination for ingested data. Within the target object, set the type to table and specify the Polaris table name in tableName. For example:
```
"target": {
    "type": "table",
    "tableName": "Koalas"
},
```
createTableIfNotExists: Boolean that directs Polaris to create the table if it doesn't already exist (false by default). When this property is true and the table does not exist, Polaris automatically creates the table using the framework in Automatically created tables.

source: object describing the source of input data. Within this object, supply the following values:

type: the type of source data. Set this to uploaded.
fileList: array of files to ingest.
inputSchema: the schema of the input data. You only need to define the columns that will be ingested.
formatSettings: the data format settings. All the files listed in an ingestion job must have the same format, such as newline-delimited JSON.

The following example shows a source object for batch ingestion:

"source": {
    "type": "uploaded",
    "fileList": [
        "kttm-2019-08-19.json.gz",
        "kttm-2019-08-20.json.gz"
    ],
    "inputSchema": [
        {
            "dataType": "string",
            "name": "timestamp"
        },
        {
            "dataType": "string",
            "name": "city"
        },
        {
            "dataType": "string",
            "name": "session"
        },
        {
            "dataType": "long",
            "name": "session_length"
        }
    ],
    "formatSettings": {
        "format": "nd-json"
    }
},

mappings: array describing how the input fields of the source data map to the columns of the Polaris table. Always quote your input fields to avoid syntax errors.

Map the timestamp field from the event request body to the __time column name. See Timestamp expressions for details on the input field requirements and expressions for time.

This example directly maps the input fields to the table columns with any transformations:
```
 "mappings": [
     {
         "columnName": "__time",
         "expression": "TIME_PARSE(\"timestamp\")"
     },
     {
         "columnName": "city",
         "expression": "\"city\""
     },
     {
         "columnName": "session",
         "expression": "\"session\""
     },
     {
         "columnName": "session_length",
         "expression": "\"session_length\""
     }
 ]
```
For details on transforming your data during ingestion, see Map and transform input fields.

Sample request

The following example shows how to load data from kttm-2019-08-19.json.gz and kttm-2019-08-20.json.gz into Koalas.

For more information about the request payload for creating an ingestion job, see the Jobs v1 API documentation.

cURL
Python

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
    "type": "batch",
    "target": {
        "type": "table",
        "tableName": "Koalas"
    },
    "createTableIfNotExists": true,
    "source": {
        "type": "uploaded",
        "fileList": [
            "kttm-2019-08-19.json.gz",
            "kttm-2019-08-20.json.gz"
        ],
        "inputSchema": [
            {
                "dataType": "string",
                "name": "timestamp"
            },
            {
                "dataType": "string",
                "name": "city"
            },
            {
                "dataType": "string",
                "name": "session"
            },
            {
                "dataType": "long",
                "name": "session_length"
            }
        ],
        "formatSettings": {
            "format": "nd-json"
        }
    },
    "mappings": [
        {
            "columnName": "__time",
            "expression": "TIME_PARSE(\"timestamp\")"
        },
        {
            "columnName": "city",
            "expression": "\"city\""
        },
        {
            "columnName": "session",
            "expression": "\"session\""
        },
        {
            "columnName": "session_length",
            "expression": "\"session_length\""
        }
    ]
}'

import os
import requests
import json

url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs"

apikey = os.getenv("POLARIS_API_KEY")

payload = json.dumps({
  "type": "batch",
  "target": {
    "type": "table",
    "tableName": "Koalas"
  },
  "createTableIfNotExists": True,
  "source": {
    "type": "uploaded",
    "fileList": [
      "kttm-2019-08-19.json.gz",
      "kttm-2019-08-20.json.gz"
    ],
    "inputSchema": [
      {
        "dataType": "string",
        "name": "timestamp"
      },
      {
        "dataType": "string",
        "name": "city"
      },
      {
        "dataType": "string",
        "name": "session"
      },
      {
        "dataType": "long",
        "name": "session_length"
      }
    ],
    "formatSettings": {
      "format": "nd-json"
    }
  },
  "mappings": [
    {
      "columnName": "__time",
      "expression": "TIME_PARSE(\"timestamp\")"
    },
    {
      "columnName": "city",
      "expression": "\"city\""
    },
    {
      "columnName": "session",
      "expression": "\"session\""
    },
    {
      "columnName": "session_length",
      "expression": "\"session_length\""
    }
  ]
})
headers = {
  'Authorization': f'Basic {apikey}',
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

Sample response

The following example shows a response to a successful ingestion job launch:

View the response

{
    "source": {
        "fileList": [
            "kttm-2019-08-19.json.gz",
            "kttm-2019-08-20.json.gz"
        ],
        "formatSettings": {
            "flattenSpec": {},
            "format": "nd-json"
        },
        "inputSchema": [
            {
                "dataType": "string",
                "name": "timestamp"
            },
            {
                "dataType": "string",
                "name": "city"
            },
            {
                "dataType": "string",
                "name": "session"
            },
            {
                "dataType": "long",
                "name": "session_length"
            }
        ],
        "type": "uploaded"
    },
    "filterExpression": null,
    "ingestionMode": "append",
    "mappings": [
        {
            "columnName": "__time",
            "expression": "TIME_PARSE(\"timestamp\")",
            "isAggregation": null
        },
        {
            "columnName": "city",
            "expression": "\"city\"",
            "isAggregation": null
        },
        {
            "columnName": "session",
            "expression": "\"session\"",
            "isAggregation": null
        },
        {
            "columnName": "session_length",
            "expression": "\"session_length\"",
            "isAggregation": null
        }
    ],
    "maxParseExceptions": 2147483647,
    "query": "INSERT INTO \"Koalas\"\nSELECT\n  TIME_PARSE(\"timestamp\") AS \"__time\",\n    \"city\" AS \"city\",\n    \"session\" AS \"session\",\n    \"session_length\" AS \"session_length\"\nFROM TABLE(\n  POLARIS_SOURCE(\n    '{\"fileList\":[\"kttm-2019-08-19.json.gz\",\"kttm-2019-08-20.json.gz\"],\"formatSettings\":{\"flattenSpec\":{},\"format\":\"nd-json\"},\"inputSchema\":[{\"dataType\":\"string\",\"name\":\"timestamp\"},{\"dataType\":\"string\",\"name\":\"city\"},{\"dataType\":\"string\",\"name\":\"session\"},{\"dataType\":\"long\",\"name\":\"session_length\"}],\"type\":\"uploaded\"}'\n  )\n)\n\n\nPARTITIONED BY DAY",
    "createdBy": {
        "username": "api-key-pok_vipgj...bjjvyo",
        "userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
    },
    "createdTimestamp": "2023-09-11T22:00:34.547944346Z",
    "desiredExecutionStatus": "running",
    "executionStatus": "pending",
    "health": {
        "status": "ok"
    },
    "id": "018a8642-b9f3-7dbc-bee7-0e38c3e30c12",
    "lastModifiedBy": {
        "username": "api-key-pok_vipgj...bjjvyo",
        "userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
    },
    "lastUpdatedTimestamp": "2023-09-11T22:00:34.547944346Z",
    "spec": {
        "source": {
            "fileList": [
                "kttm-2019-08-19.json.gz",
                "kttm-2019-08-20.json.gz"
            ],
            "formatSettings": {
                "flattenSpec": {},
                "format": "nd-json"
            },
            "inputSchema": [
                {
                    "dataType": "string",
                    "name": "timestamp"
                },
                {
                    "dataType": "string",
                    "name": "city"
                },
                {
                    "dataType": "string",
                    "name": "session"
                },
                {
                    "dataType": "long",
                    "name": "session_length"
                }
            ],
            "type": "uploaded"
        },
        "target": {
            "tableName": "Koalas",
            "type": "table",
            "intervals": []
        },
        "createTableIfNotExists": true,
        "filterExpression": null,
        "ingestionMode": "append",
        "mappings": [
            {
                "columnName": "__time",
                "expression": "TIME_PARSE(\"timestamp\")",
                "isAggregation": null
            },
            {
                "columnName": "city",
                "expression": "\"city\"",
                "isAggregation": null
            },
            {
                "columnName": "session",
                "expression": "\"session\"",
                "isAggregation": null
            },
            {
                "columnName": "session_length",
                "expression": "\"session_length\"",
                "isAggregation": null
            }
        ],
        "maxParseExceptions": 2147483647,
        "type": "batch",
        "desiredExecutionStatus": "running"
    },
    "target": {
        "tableName": "Koalas",
        "type": "table",
        "intervals": []
    },
    "type": "batch",
    "completedTimestamp": null,
    "startedTimestamp": null
}

Learn more

See the following topics for more information:

Jobs v1 API for reference on working with ingestion jobs in Polaris.
Upload files by API for uploading files to Polaris using the API.
Create an ingestion job by API for information on creating, monitoring, and canceling an ingestion job.
Ingest from files for ingesting from files using the UI.

Prerequisites​

Load data into a table​

Sample request​

Sample response​

Learn more​

Prerequisites

Load data into a table

Sample request

Sample response

Learn more