Ingest inline data by API
Imply Polaris supports batch ingestion from data you provide inline. You provide inline data directly in the ingestion job spec that you submit to the Jobs API.
With inline data, Polaris supports the following data formats:
- Newline-delimited JSON (ND-JSON)
- CSV
- TSV
- Semicolon-separated values (;)
- Pipe-separated values (|)
For all other formats, upload the file first and then ingest from it. For details on how to ingest from files using the API, see Ingest data from files by API. For a list of all ingestion source options, see Ingestion sources overview.
This topic shows how to use the API to create an ingestion job with inline data in the request.
For reference on providing inline data using SQL-based ingestion, see the EXTERN function.
Prerequisites
To ingest inline data, you need an API key with the ManageIngestionJobs permission.
In the examples below, the key value is stored in the variable named POLARIS_API_KEY.
For information about how to obtain an API key and assign permissions, see API key authentication.
For more information on permissions, visit Permissions reference.
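As a minimal sketch of how the key might be read and turned into request headers in Python: the helper name build_headers and its env parameter are hypothetical and only serve the illustration. The header values mirror the ones used in the sample request later in this topic.

```python
import os


def build_headers(env=os.environ):
    """Build request headers from the POLARIS_API_KEY variable.

    build_headers is a hypothetical helper, not part of any Polaris client.
    """
    api_key = env.get("POLARIS_API_KEY")
    if not api_key:
        # Fail early instead of sending a request with "Basic None".
        raise RuntimeError("Set POLARIS_API_KEY before calling the Jobs API")
    return {
        "Authorization": f"Basic {api_key}",
        "Content-Type": "application/json",
    }
```

Failing early when the variable is unset avoids sending a request that can only be rejected.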
Ingest inline data
Submit a POST request to the /v1/projects/PROJECT_ID/jobs endpoint to start an ingestion job.
You don't need to create a table before starting an ingestion job. Set createTableIfNotExists to true in the ingestion job spec to instruct Polaris to automatically determine the table attributes from the job spec.
For details, see Automatically created tables.
In the request payload, include the inline data in the source parameter.
The source object takes the following fields:
- type: Set to inline.
- data: String containing the raw data.
  ND-JSON example:
  "{\"timestamp\": 1722553997421, \"color\": \"red\", \"value\": \"#f00\"}\n{\"timestamp\": 1722554089087, \"color\": \"blue\",\"value\": \"#00f\"}"
  CSV example:
  "0,1722553997421,values,formatted\n1,1722554089087,as,CSV"
- formatSettings: The format of the data, either {"format": "nd-json"} or {"format": "csv"} for ND-JSON or delimiter-separated values, respectively. Polaris supports comma (,), tab (\t), semicolon (;), and pipe (|) delimiters in inline data.
- inputSchema: The schema of the input data as an array of objects, each containing name and dataType descriptors.
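For delimiter-separated data, the data string can be produced with Python's standard csv module rather than by hand, which takes care of quoting fields that contain the delimiter. The sketch below is an illustration only: the helper name to_inline_delimited is hypothetical, and because this topic does not show how a non-comma delimiter is declared in formatSettings, the sketch builds only the data string.

```python
import csv
import io


def to_inline_delimited(rows, delimiter=","):
    """Serialize rows to a delimiter-separated string for the data field.

    Hypothetical helper; delimiter may be ",", "\t", ";", or "|".
    """
    buffer = io.StringIO()
    # Use "\n" between rows to match the CSV example above.
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    # Drop the trailing newline written after the last row.
    return buffer.getvalue().rstrip("\n")


rows = [
    [0, 1722553997421, "values", "formatted"],
    [1, 1722554089087, "as", "CSV"],
]
print(to_inline_delimited(rows))
```

Passing delimiter=";" or delimiter="|" yields the semicolon- or pipe-separated form of the same rows.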
The following example shows a full source definition:
"source": {
  "type": "inline",
  "data": "{\"timestamp\": 1722553997421, \"color\": \"red\", \"value\": \"#f00\"}\n{\"timestamp\": 1722554089087, \"color\": \"blue\",\"value\": \"#00f\"}",
  "inputSchema": [
    {
      "dataType": "long",
      "name": "timestamp"
    },
    {
      "dataType": "string",
      "name": "color"
    },
    {
      "dataType": "string",
      "name": "value"
    }
  ],
  "formatSettings": {
    "format": "nd-json"
  }
},
For more information about the request payload for creating an ingestion job, see the Jobs v1 API documentation.
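Escaping the ND-JSON string by hand is error-prone, so one option is to build the source object from Python dictionaries and let json.dumps handle the escaping. This is a sketch under assumptions: the helper name build_inline_source is hypothetical, and the field names are the ones shown in the source definition in this topic.

```python
import json


def build_inline_source(records, input_schema):
    """Build an inline source object with ND-JSON data.

    Hypothetical helper: each record becomes one JSON object per line.
    """
    # json.dumps escapes any newline inside a value, so joining
    # on "\n" keeps exactly one record per line.
    data = "\n".join(json.dumps(record) for record in records)
    return {
        "type": "inline",
        "data": data,
        "inputSchema": input_schema,
        "formatSettings": {"format": "nd-json"},
    }


records = [
    {"timestamp": 1722553997421, "color": "red", "value": "#f00"},
    {"timestamp": 1722554089087, "color": "blue", "value": "#00f"},
]
input_schema = [
    {"dataType": "long", "name": "timestamp"},
    {"dataType": "string", "name": "color"},
    {"dataType": "string", "name": "value"},
]
source = build_inline_source(records, input_schema)
print(json.dumps(source, indent=2))
```

The resulting dictionary can be placed under the source key of the job spec before the spec is serialized for the request.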
Sample request
The following example shows how to load inline data into a table named inline-colors. The request is shown first in cURL, then in Python.
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
  --header "Authorization: Basic $POLARIS_API_KEY" \
  --header "Content-Type: application/json" \
  --data-raw '{
    "type": "batch",
    "target": {
      "type": "table",
      "tableName": "inline-colors"
    },
    "createTableIfNotExists": true,
    "source": {
      "type": "inline",
      "data": "{\"timestamp\": 1722553997421, \"color\": \"red\", \"value\": \"#f00\"}\n{\"timestamp\": 1722554089087, \"color\": \"blue\",\"value\": \"#00f\"}",
      "inputSchema": [
        {
          "dataType": "long",
          "name": "timestamp"
        },
        {
          "dataType": "string",
          "name": "color"
        },
        {
          "dataType": "string",
          "name": "value"
        }
      ],
      "formatSettings": {
        "format": "nd-json"
      }
    },
    "mappings": [
      {
        "columnName": "__time",
        "expression": "MILLIS_TO_TIMESTAMP(\"timestamp\")"
      },
      {
        "columnName": "color",
        "expression": "\"color\""
      },
      {
        "columnName": "value",
        "expression": "\"value\""
      }
    ]
  }'
import json
import os

import requests

# Replace the placeholders with your organization, region, cloud provider, and project ID.
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs"

# Read the API key from the POLARIS_API_KEY environment variable.
apikey = os.getenv("POLARIS_API_KEY")

# Ingestion job spec with the data provided inline in the source object.
payload = json.dumps({
    "type": "batch",
    "target": {
        "type": "table",
        "tableName": "inline-colors"
    },
    "createTableIfNotExists": True,
    "source": {
        "type": "inline",
        "data": "{\"timestamp\": 1722553997421, \"color\": \"red\", \"value\": \"#f00\"}\n{\"timestamp\": 1722554089087, \"color\": \"blue\",\"value\": \"#00f\"}",
        "inputSchema": [
            {
                "dataType": "long",
                "name": "timestamp"
            },
            {
                "dataType": "string",
                "name": "color"
            },
            {
                "dataType": "string",
                "name": "value"
            }
        ],
        "formatSettings": {
            "format": "nd-json"
        }
    },
    # Map input fields to table columns; __time is derived from the millisecond timestamp.
    "mappings": [
        {
            "columnName": "__time",
            "expression": "MILLIS_TO_TIMESTAMP(\"timestamp\")"
        },
        {
            "columnName": "color",
            "expression": "\"color\""
        },
        {
            "columnName": "value",
            "expression": "\"value\""
        }
    ]
})

headers = {
    "Authorization": f"Basic {apikey}",
    "Content-Type": "application/json"
}

# Submit the job and print the raw response body.
response = requests.post(url, headers=headers, data=payload)
print(response.text)
Sample response
The following example shows a response to a successful ingestion job launch:
{
"source": {
"data": "{\"timestamp\": 1722553997421, \"color\": \"red\", \"value\": \"#f00\"}\n{\"timestamp\": 1722554089087, \"color\": \"blue\",\"value\": \"#00f\"}",
"inputSchema": [
{
"dataType": "long",
"name": "timestamp"
},
{
"dataType": "string",
"name": "color"
},
{
"dataType": "string",
"name": "value"
}
],
"formatSettings": {
"flattenSpec": null,
"format": "nd-json"
},
"type": "inline"
},
"context": {
"mode": "nonStrict",
"sqlQueryId": "0191104b-7e8e-72f5-9e2b-b2498a572a35",
"maxNumTasks": 75,
"faultTolerance": true,
"taskAssignment": "auto",
"maxParseExceptions": 2147483647,
"finalizeAggregations": true,
"durableShuffleStorage": true,
"catalogValidationEnabled": false,
"clusterStatisticsMergeMode": "SEQUENTIAL",
"groupByEnableMultiValueUnnesting": false
},
"filterExpression": null,
"ingestionMode": "append",
"mappings": [
{
"columnName": "__time",
"expression": "MILLIS_TO_TIMESTAMP(\"timestamp\")",
"isAggregation": null
},
{
"columnName": "color",
"expression": "\"color\"",
"isAggregation": null
},
{
"columnName": "value",
"expression": "\"value\"",
"isAggregation": null
}
],
"maxParseExceptions": 2147483647,
"query": "INSERT INTO \"inline-colors\"\nSELECT\n MILLIS_TO_TIMESTAMP(\"timestamp\") AS \"__time\",\n \"color\" AS \"color\",\n \"value\" AS \"value\"\nFROM TABLE(\n POLARIS_SOURCE(\n '{\"data\":\"{\\\"timestamp\\\": 1722553997421, \\\"color\\\": \\\"red\\\", \\\"value\\\": \\\"#f00\\\"}\\n{\\\"timestamp\\\": 1722554089087, \\\"color\\\": \\\"blue\\\",\\\"value\\\": \\\"#00f\\\"}\",\"inputSchema\":[{\"dataType\":\"long\",\"name\":\"timestamp\"},{\"dataType\":\"string\",\"name\":\"color\"},{\"dataType\":\"string\",\"name\":\"value\"}],\"formatSettings\":{\"format\":\"nd-json\"},\"type\":\"inline\"}'\n )\n)\n\n\nPARTITIONED BY DAY",
"createdBy": {
"username": "api-key-pok_7udiv...xrujvd",
"userId": "b6340b70-3f30-4ccd-86a0-fe74ebfc7cbe"
},
"createdTimestamp": "2024-08-01T23:34:28.750852Z",
"desiredExecutionStatus": "running",
"executionStatus": "pending",
"health": {
"status": "ok"
},
"id": "0191104b-7e8e-72f5-9e2b-b2498a572a35",
"lastModifiedBy": {
"username": "api-key-pok_7udiv...xrujvd",
"userId": "b6340b70-3f30-4ccd-86a0-fe74ebfc7cbe"
},
"lastUpdatedTimestamp": "2024-08-01T23:34:28.750852Z",
"spec": {
"source": {
"data": "{\"timestamp\": 1722553997421, \"color\": \"red\", \"value\": \"#f00\"}\n{\"timestamp\": 1722554089087, \"color\": \"blue\",\"value\": \"#00f\"}",
"inputSchema": [
{
"dataType": "long",
"name": "timestamp"
},
{
"dataType": "string",
"name": "color"
},
{
"dataType": "string",
"name": "value"
}
],
"formatSettings": {
"flattenSpec": null,
"format": "nd-json"
},
"type": "inline"
},
"target": {
"tableName": "inline-colors",
"type": "table",
"intervals": null
},
"context": {
"mode": "nonStrict",
"sqlQueryId": "0191104b-7e8e-72f5-9e2b-b2498a572a35",
"maxNumTasks": 75,
"faultTolerance": true,
"taskAssignment": "auto",
"maxParseExceptions": 2147483647,
"finalizeAggregations": true,
"durableShuffleStorage": true,
"catalogValidationEnabled": false,
"clusterStatisticsMergeMode": "SEQUENTIAL",
"groupByEnableMultiValueUnnesting": false
},
"clusteringColumns": [],
"createTableIfNotExists": true,
"filterExpression": null,
"ingestionMode": "append",
"mappings": [
{
"columnName": "__time",
"expression": "MILLIS_TO_TIMESTAMP(\"timestamp\")",
"isAggregation": null
},
{
"columnName": "color",
"expression": "\"color\"",
"isAggregation": null
},
{
"columnName": "value",
"expression": "\"value\"",
"isAggregation": null
}
],
"maxParseExceptions": 2147483647,
"partitionedBy": null,
"replaceAll": false,
"type": "batch"
},
"target": {
"tableName": "inline-colors",
"type": "table",
"intervals": null
},
"type": "batch",
"completedTimestamp": null,
"startedTimestamp": null
}
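The response above is long, and most of it echoes the submitted spec. A short sketch of pulling out the fields that identify and track the new job follows. The helper name summarize_job is hypothetical, and the field names are taken from the sample response rather than from a formal schema.

```python
def summarize_job(job):
    """Pick the fields most useful for tracking a newly launched job.

    Hypothetical helper; missing or null fields come back as None.
    """
    health = job.get("health") or {}
    target = job.get("target") or {}
    return {
        "id": job.get("id"),
        "table": target.get("tableName"),
        "executionStatus": job.get("executionStatus"),
        "desiredExecutionStatus": job.get("desiredExecutionStatus"),
        "health": health.get("status"),
    }


# Trimmed copy of the sample response above.
sample = {
    "id": "0191104b-7e8e-72f5-9e2b-b2498a572a35",
    "executionStatus": "pending",
    "desiredExecutionStatus": "running",
    "health": {"status": "ok"},
    "target": {"tableName": "inline-colors", "type": "table", "intervals": None},
}
print(summarize_job(sample))
```

With the requests library, the dictionary passed to the helper would typically come from response.json().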
Learn more
See the following topics for more information:
- Ingest inline data for more details on the inline data source.
- Jobs v1 API for reference on working with ingestion jobs in Polaris.
- Upload files by API for uploading files to Polaris using the API.
- Create an ingestion job by API for information on creating, monitoring, and canceling an ingestion job.