Ingest data from files by API

This topic walks you through the process to ingest data from uploaded files into a table with the Jobs v1 API.

For a list of all ingestion options, see Sources.

Prerequisites

Before ingesting from files, you need the following:

Load data into a table

Launch a batch ingestion job to import data from your uploaded files to a destination table in Polaris. Submit a POST request to the Jobs v1 API and pass the job specification as a payload to the request. The job spec is a JSON object that requires the following fields:

  • type: string representing the type of job. Set this property to batch for batch ingestion.

  • target: object describing the destination for ingested data. Within the target object, set the type to table and specify the Polaris table name in tableName. For example:

    "target": {
      "type": "table",
      "tableName": "Koalas"
    },
  • createTableIfNotExists: Boolean that directs Polaris to create the table if it doesn't already exist (false by default). When this property is true and the table does not exist, Polaris automatically creates the table using the framework in Automatically created tables.

  • source: object describing the source of input data. Within this object, supply the following values:

    • type: the type of source data. Set this to uploaded.
    • fileList: array of files to ingest.
    • inputSchema: the schema of the input data. You only need to define the columns that will be ingested.
    • formatSettings: the data format settings. All the files listed in an ingestion job must have the same format, such as newline-delimited JSON.

    The following example shows a source object for batch ingestion:

    "source": {
      "type": "uploaded",
      "fileList": [
        "kttm-2019-08-19.json.gz",
        "kttm-2019-08-20.json.gz"
      ],
      "inputSchema": [
        {
          "dataType": "string",
          "name": "timestamp"
        },
        {
          "dataType": "string",
          "name": "city"
        },
        {
          "dataType": "string",
          "name": "session"
        },
        {
          "dataType": "long",
          "name": "session_length"
        }
      ],
      "formatSettings": {
        "format": "nd-json"
      }
    },
  • mappings: array describing how the input fields of the source data map to the columns of the Polaris table. Always quote your input fields to avoid syntax errors.

    Map the timestamp field from the input data to the __time column. See Timestamp expressions for details on the input field requirements and expressions for time.

    This example parses the timestamp input field into the __time column and maps the remaining input fields directly to table columns without any transformations:

    "mappings": [
      {
        "columnName": "__time",
        "expression": "TIME_PARSE(\"timestamp\")"
      },
      {
        "columnName": "city",
        "expression": "\"city\""
      },
      {
        "columnName": "session",
        "expression": "\"session\""
      },
      {
        "columnName": "session_length",
        "expression": "\"session_length\""
      }
    ]

    For details on transforming your data during ingestion, see Map and transform input fields.
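
The fields described above can also be assembled programmatically. The following Python sketch is one possible way to build the job spec as a dictionary; it is an illustration, not part of the Polaris API. The helper names quote_field and build_job_spec are hypothetical. The sketch assumes that the files are newline-delimited JSON and that every input field other than the timestamp maps directly to a column of the same name.

```python
import json


def quote_field(name: str) -> str:
    """Wrap an input field name in double quotes for use in a mapping expression."""
    # Assumption: this sketch does not try to escape embedded quotes; it rejects them.
    if '"' in name:
        raise ValueError(f"field name contains a double quote: {name!r}")
    return f'"{name}"'


def build_job_spec(table_name, files, input_schema, time_field, create_table=False):
    """Assemble a batch ingestion job spec from the required fields."""
    mappings = [
        # Parse the timestamp input field into the __time column.
        {"columnName": "__time", "expression": f"TIME_PARSE({quote_field(time_field)})"}
    ]
    # Map every other input field directly to a column of the same name.
    for column in input_schema:
        if column["name"] != time_field:
            mappings.append(
                {"columnName": column["name"], "expression": quote_field(column["name"])}
            )
    return {
        "type": "batch",
        "target": {"type": "table", "tableName": table_name},
        "createTableIfNotExists": create_table,
        "source": {
            "type": "uploaded",
            "fileList": list(files),
            "inputSchema": list(input_schema),
            "formatSettings": {"format": "nd-json"},
        },
        "mappings": mappings,
    }


spec = build_job_spec(
    table_name="Koalas",
    files=["kttm-2019-08-19.json.gz", "kttm-2019-08-20.json.gz"],
    input_schema=[
        {"dataType": "string", "name": "timestamp"},
        {"dataType": "string", "name": "city"},
        {"dataType": "string", "name": "session"},
        {"dataType": "long", "name": "session_length"},
    ],
    time_field="timestamp",
    create_table=True,
)
print(json.dumps(spec, indent=2))
```

Building the mappings from the input schema keeps the two lists in sync and applies the quoting consistently. Fields that need transformations would have to be mapped by hand.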

Sample request

The following example shows how to load data from kttm-2019-08-19.json.gz and kttm-2019-08-20.json.gz into Koalas.

For more information about the request payload for creating an ingestion job, see the Jobs v1 API documentation.

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
  "type": "batch",
  "target": {
    "type": "table",
    "tableName": "Koalas"
  },
  "createTableIfNotExists": true,
  "source": {
    "type": "uploaded",
    "fileList": [
      "kttm-2019-08-19.json.gz",
      "kttm-2019-08-20.json.gz"
    ],
    "inputSchema": [
      {
        "dataType": "string",
        "name": "timestamp"
      },
      {
        "dataType": "string",
        "name": "city"
      },
      {
        "dataType": "string",
        "name": "session"
      },
      {
        "dataType": "long",
        "name": "session_length"
      }
    ],
    "formatSettings": {
      "format": "nd-json"
    }
  },
  "mappings": [
    {
      "columnName": "__time",
      "expression": "TIME_PARSE(\"timestamp\")"
    },
    {
      "columnName": "city",
      "expression": "\"city\""
    },
    {
      "columnName": "session",
      "expression": "\"session\""
    },
    {
      "columnName": "session_length",
      "expression": "\"session_length\""
    }
  ]
}'
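
The same request can be prepared in Python with only the standard library. The following is a minimal sketch rather than an official client: build_job_request is a hypothetical helper, the URL placeholders and the Basic authorization header mirror the curl example, and the API key is assumed to be in the POLARIS_API_KEY environment variable. The request is built but not sent, so you can inspect it first.

```python
import json
import os
import urllib.request

# Placeholders copied from the curl example; substitute your own values.
BASE_URL = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io"
PROJECT_ID = "PROJECT_ID"


def build_job_request(job_spec: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that launches an ingestion job."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/projects/{PROJECT_ID}/jobs",
        data=json.dumps(job_spec).encode("utf-8"),
        headers={
            "Authorization": f"Basic {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same payload as the curl example, written compactly.
job_spec = {
    "type": "batch",
    "target": {"type": "table", "tableName": "Koalas"},
    "createTableIfNotExists": True,
    "source": {
        "type": "uploaded",
        "fileList": ["kttm-2019-08-19.json.gz", "kttm-2019-08-20.json.gz"],
        "inputSchema": [
            {"dataType": "string", "name": "timestamp"},
            {"dataType": "string", "name": "city"},
            {"dataType": "string", "name": "session"},
            {"dataType": "long", "name": "session_length"},
        ],
        "formatSettings": {"format": "nd-json"},
    },
    "mappings": [
        {"columnName": "__time", "expression": 'TIME_PARSE("timestamp")'},
        {"columnName": "city", "expression": '"city"'},
        {"columnName": "session", "expression": '"session"'},
        {"columnName": "session_length", "expression": '"session_length"'},
    ],
}

api_key = os.environ.get("POLARIS_API_KEY", "YOUR_API_KEY")
request = build_job_request(job_spec, api_key)
print(request.get_method(), request.full_url)

# To send the request, uncomment the following lines:
# with urllib.request.urlopen(request) as response:
#     job = json.load(response)
```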

Sample response

The following example shows a response to a successful ingestion job launch:

{
  "source": {
    "fileList": [
      "kttm-2019-08-19.json.gz",
      "kttm-2019-08-20.json.gz"
    ],
    "formatSettings": {
      "flattenSpec": {},
      "format": "nd-json"
    },
    "inputSchema": [
      {
        "dataType": "string",
        "name": "timestamp"
      },
      {
        "dataType": "string",
        "name": "city"
      },
      {
        "dataType": "string",
        "name": "session"
      },
      {
        "dataType": "long",
        "name": "session_length"
      }
    ],
    "type": "uploaded"
  },
  "filterExpression": null,
  "ingestionMode": "append",
  "mappings": [
    {
      "columnName": "__time",
      "expression": "TIME_PARSE(\"timestamp\")",
      "isAggregation": null
    },
    {
      "columnName": "city",
      "expression": "\"city\"",
      "isAggregation": null
    },
    {
      "columnName": "session",
      "expression": "\"session\"",
      "isAggregation": null
    },
    {
      "columnName": "session_length",
      "expression": "\"session_length\"",
      "isAggregation": null
    }
  ],
  "maxParseExceptions": 2147483647,
  "query": "INSERT INTO \"Koalas\"\nSELECT\n TIME_PARSE(\"timestamp\") AS \"__time\",\n \"city\" AS \"city\",\n \"session\" AS \"session\",\n \"session_length\" AS \"session_length\"\nFROM TABLE(\n POLARIS_SOURCE(\n '{\"fileList\":[\"kttm-2019-08-19.json.gz\",\"kttm-2019-08-20.json.gz\"],\"formatSettings\":{\"flattenSpec\":{},\"format\":\"nd-json\"},\"inputSchema\":[{\"dataType\":\"string\",\"name\":\"timestamp\"},{\"dataType\":\"string\",\"name\":\"city\"},{\"dataType\":\"string\",\"name\":\"session\"},{\"dataType\":\"long\",\"name\":\"session_length\"}],\"type\":\"uploaded\"}'\n )\n)\n\n\nPARTITIONED BY DAY",
  "createdBy": {
    "username": "api-key-pok_vipgj...bjjvyo",
    "userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
  },
  "createdTimestamp": "2023-09-11T22:00:34.547944346Z",
  "desiredExecutionStatus": "running",
  "executionStatus": "pending",
  "health": {
    "status": "ok"
  },
  "id": "018a8642-b9f3-7dbc-bee7-0e38c3e30c12",
  "lastModifiedBy": {
    "username": "api-key-pok_vipgj...bjjvyo",
    "userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
  },
  "lastUpdatedTimestamp": "2023-09-11T22:00:34.547944346Z",
  "spec": {
    "source": {
      "fileList": [
        "kttm-2019-08-19.json.gz",
        "kttm-2019-08-20.json.gz"
      ],
      "formatSettings": {
        "flattenSpec": {},
        "format": "nd-json"
      },
      "inputSchema": [
        {
          "dataType": "string",
          "name": "timestamp"
        },
        {
          "dataType": "string",
          "name": "city"
        },
        {
          "dataType": "string",
          "name": "session"
        },
        {
          "dataType": "long",
          "name": "session_length"
        }
      ],
      "type": "uploaded"
    },
    "target": {
      "tableName": "Koalas",
      "type": "table",
      "intervals": []
    },
    "createTableIfNotExists": true,
    "filterExpression": null,
    "ingestionMode": "append",
    "mappings": [
      {
        "columnName": "__time",
        "expression": "TIME_PARSE(\"timestamp\")",
        "isAggregation": null
      },
      {
        "columnName": "city",
        "expression": "\"city\"",
        "isAggregation": null
      },
      {
        "columnName": "session",
        "expression": "\"session\"",
        "isAggregation": null
      },
      {
        "columnName": "session_length",
        "expression": "\"session_length\"",
        "isAggregation": null
      }
    ],
    "maxParseExceptions": 2147483647,
    "type": "batch",
    "desiredExecutionStatus": "running"
  },
  "target": {
    "tableName": "Koalas",
    "type": "table",
    "intervals": []
  },
  "type": "batch",
  "completedTimestamp": null,
  "startedTimestamp": null
}
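
Most of the response echoes the submitted spec. For tracking a newly launched job, a few top-level fields are usually the ones of interest. The following Python sketch pulls them out of a trimmed copy of the sample response; summarize_job is a hypothetical helper, and the choice of fields is an assumption about what is useful, not a Polaris convention.

```python
import json

# Trimmed copy of the sample response, keeping only the fields read below.
response_body = """
{
  "id": "018a8642-b9f3-7dbc-bee7-0e38c3e30c12",
  "type": "batch",
  "executionStatus": "pending",
  "desiredExecutionStatus": "running",
  "health": {"status": "ok"},
  "target": {"tableName": "Koalas", "type": "table", "intervals": []},
  "createdTimestamp": "2023-09-11T22:00:34.547944346Z",
  "startedTimestamp": null,
  "completedTimestamp": null
}
"""


def summarize_job(job: dict) -> dict:
    """Pick out the fields that identify a job and describe its current state."""
    return {
        "job_id": job["id"],
        "table": job["target"]["tableName"],
        "status": job["executionStatus"],
        "health": job["health"]["status"],
        # A null startedTimestamp means the job has not begun executing yet.
        "started": job["startedTimestamp"] is not None,
    }


summary = summarize_job(json.loads(response_body))
print(summary)
```

In the sample response, executionStatus is pending and startedTimestamp is null, so the job has been accepted but has not yet started. Keep the id value if you need to refer to the job later.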

Learn more

See the following topics for more information: