Ingest nested data by API
In Imply Polaris, you can directly ingest nested JSON data and run queries to extract information from the nested objects. For example, consider ingesting the following record:
{
"time":"2022-6-14T10:32:08Z",
"product":"Keyboard",
"department":"Computers",
"shipTo":{
"firstName": "Sandra",
"lastName": "Beatty",
"address": {
"street": "293 Grant Well",
"city": "Loischester",
"state": "FL",
"country": "TV",
"postalCode": "88845-0066"
},
"phoneNumbers": [
{"type":"primary","number":"1-788-771-7028 x8627" },
{"type":"secondary","number":"1-460-496-4884 x887"}
]
},
"details":{
"color":"plum",
"price":"40.00"
}
}
The json
data type in Polaris lets you store and access nested JSON data at both ingestion and query time.
Keep the following in mind when ingesting nested data in Polaris:
- You can only ingest nested data to detail tables, not aggregate tables.
- A nested data column is a dimension column with the data type
json
. - You create nested data columns in the ingestion job using an identity mapping in the ingestion job spec that references the input column as a
json
data type. - You can optionally apply SQL JSON functions in input expressions to parse nested data before ingestion.
This topic shows how to ingest and query nested data using the Polaris API.
Prerequisites
This topic requires the following:
Data in the file
nested_example_data.json
. Upload this file to your Polaris staging area. To upload files using the Files API, visit Upload files.An API key with the
ManageTables
,ManageIngestionJobs
, andAccessQueries
permissions.
In the examples below, Polaris reads the key value from the variable namedPOLARIS_API_KEY
. For information about how to obtain an API key and assign permissions, see API key authentication. For more information on permissions, visit Permissions reference.
You do not have to create a table before starting an ingestion job. When you set createTableIfNotExists
to true
in the ingestion job spec, Polaris automatically determines the table attributes from the job spec.
For details, see Automatically created tables.
Ingest and transform JSON data
Send a POST
request to the /v1/projects/PROJECT_ID/jobs
endpoint to create an ingestion job.
In the inputSchema
array of the request payload, assign the json
data type for the JSON fields in the source data.
The mappings
array of the request payload describes a one-to-one relationship between the fields of the input data and the columns of the Polaris table.
You can map JSON fields from the source data to a table column with an identity mapping.
If your data format is in CSV or another delimiter-separated format, assign the string
data type for the JSON fields in the input schema.
Define the mappings expression for the JSON column as PARSE_JSON(\"inputField\")
.
You can also extract and transform fields using an input expression.
For example, to populate the price
column, apply an input expression referencing the details
field of the input data; the expression extracts the nested price information from details
.
JSON_VALUE(\"details\", '$.price')
The ingestion job spec in this example also directs Polaris to automatically create the table if it doesn't exist. Polaris creates the table with the following columns:
__time
: the primary timestamp of the tableproduct
: a string column with product namesdetails
: a JSON column containing product descriptorsdepartment
: a string column with department labelsshipTo
: a JSON column with user details and contact informationprice
: a string column with price information extracted fromdetails
. Note that this string column contains numeric data. To ensure thatprice
is a numeric type such asfloat
, you can create the table and declare its schema before starting ingestion.
For more information on transforming data at ingestion, see Map and transform data with input expressions.
See the Jobs v1 API documentation for a description of all required parameters to create an ingestion job.
Sample request
The following example request creates an ingestion job for the nested_demo
table using the nested_example_data.json
file.
- cURL
- Python
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
"type": "batch",
"target": {
"type": "table",
"tableName": "nested_demo"
},
"createTableIfNotExists": true,
"source": {
"type": "uploaded",
"fileList": [
"nested_example_data.json"
],
"inputSchema": [
{
"name": "time",
"dataType": "string"
},
{
"name": "product",
"dataType": "string"
},
{
"name": "details",
"dataType": "json"
},
{
"name": "department",
"dataType": "string"
},
{
"name": "shipTo",
"dataType": "json"
}
],
"formatSettings": {
"format": "nd-json"
}
},
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"time\")"
},
{
"columnName": "product",
"expression": "\"product\""
},
{
"columnName": "details",
"expression": "\"details\""
},
{
"columnName": "department",
"expression": "\"department\""
},
{
"columnName": "shipTo",
"expression": "\"shipTo\""
},
{
"columnName": "price",
"expression": "JSON_VALUE(\"details\", '$.price')"
}
]
}'
import os
import requests
import json
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs"
apikey = os.getenv("POLARIS_API_KEY")
payload = json.dumps({
"type": "batch",
"target": {
"type": "table",
"tableName": "nested_demo"
},
"createTableIfNotExists": True,
"source": {
"type": "uploaded",
"fileList": [
"nested_example_data.json"
],
"inputSchema": [
{
"name": "time",
"dataType": "string"
},
{
"name": "product",
"dataType": "string"
},
{
"name": "details",
"dataType": "json"
},
{
"name": "department",
"dataType": "string"
},
{
"name": "shipTo",
"dataType": "json"
}
],
"formatSettings": {
"format": "nd-json"
}
},
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"time\")"
},
{
"columnName": "product",
"expression": "\"product\""
},
{
"columnName": "details",
"expression": "\"details\""
},
{
"columnName": "department",
"expression": "\"department\""
},
{
"columnName": "shipTo",
"expression": "\"shipTo\""
},
{
"columnName": "price",
"expression": "JSON_VALUE(\"details\", '$.price')"
}
]
})
headers = {
'Authorization': f'Basic {apikey}',
'Content-Type': 'application/json'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Sample response
A successful request returns a 201 Created
response and the ingestion job details.
Click to view the response
{
"source": {
"fileList": [
"nested_example_data.json"
],
"formatSettings": {
"flattenSpec": {},
"format": "nd-json"
},
"inputSchema": [
{
"dataType": "string",
"name": "time"
},
{
"dataType": "string",
"name": "product"
},
{
"dataType": "json",
"name": "details"
},
{
"dataType": "string",
"name": "department"
},
{
"dataType": "json",
"name": "shipTo"
}
],
"type": "uploaded"
},
"filterExpression": null,
"ingestionMode": "append",
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"time\")",
"isAggregation": null
},
{
"columnName": "product",
"expression": "\"product\"",
"isAggregation": null
},
{
"columnName": "details",
"expression": "\"details\"",
"isAggregation": null
},
{
"columnName": "department",
"expression": "\"department\"",
"isAggregation": null
},
{
"columnName": "shipTo",
"expression": "\"shipTo\"",
"isAggregation": null
},
{
"columnName": "price",
"expression": "JSON_VALUE(\"details\", '$.price')",
"isAggregation": null
}
],
"maxParseExceptions": 2147483647,
"query": "INSERT INTO \"nested_demo\"\nSELECT\n TIME_PARSE(\"time\") AS \"__time\",\n \"product\" AS \"product\",\n \"details\" AS \"details\",\n \"department\" AS \"department\",\n \"shipTo\" AS \"shipTo\",\n JSON_VALUE(\"details\", '$.price') AS \"price\"\nFROM TABLE(\n POLARIS_SOURCE(\n '{\"fileList\":[\"nested_example_data.json\"],\"formatSettings\":{\"flattenSpec\":{},\"format\":\"nd-json\"},\"inputSchema\":[{\"dataType\":\"string\",\"name\":\"time\"},{\"dataType\":\"string\",\"name\":\"product\"},{\"dataType\":\"json\",\"name\":\"details\"},{\"dataType\":\"string\",\"name\":\"department\"},{\"dataType\":\"json\",\"name\":\"shipTo\"}],\"type\":\"uploaded\"}'\n )\n)\n\n\nPARTITIONED BY DAY",
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdTimestamp": "2023-09-12T21:56:53.432058431Z",
"desiredExecutionStatus": "running",
"executionStatus": "pending",
"health": {
"status": "ok"
},
"id": "018a8b65-b638-72c2-a65c-b50663d57e8d",
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastUpdatedTimestamp": "2023-09-12T21:56:53.432058431Z",
"spec": {
"source": {
"fileList": [
"nested_example_data.json"
],
"formatSettings": {
"flattenSpec": {},
"format": "nd-json"
},
"inputSchema": [
{
"dataType": "string",
"name": "time"
},
{
"dataType": "string",
"name": "product"
},
{
"dataType": "json",
"name": "details"
},
{
"dataType": "string",
"name": "department"
},
{
"dataType": "json",
"name": "shipTo"
}
],
"type": "uploaded"
},
"target": {
"tableName": "nested_demo",
"type": "table",
"intervals": []
},
"createTableIfNotExists": true,
"filterExpression": null,
"ingestionMode": "append",
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"time\")",
"isAggregation": null
},
{
"columnName": "product",
"expression": "\"product\"",
"isAggregation": null
},
{
"columnName": "details",
"expression": "\"details\"",
"isAggregation": null
},
{
"columnName": "department",
"expression": "\"department\"",
"isAggregation": null
},
{
"columnName": "shipTo",
"expression": "\"shipTo\"",
"isAggregation": null
},
{
"columnName": "price",
"expression": "JSON_VALUE(\"details\", '$.price')",
"isAggregation": null
}
],
"maxParseExceptions": 2147483647,
"type": "batch",
"desiredExecutionStatus": "running"
},
"target": {
"tableName": "nested_demo",
"type": "table",
"intervals": []
},
"type": "batch",
"completedTimestamp": null,
"startedTimestamp": null
}
Query ingested JSON data
Use the Query API to query the nested data in your table.
The following example queries the number of products sold in the west coast of the United States:
SELECT
product,
COUNT(product) AS pdt_cnt
FROM nested_demo
WHERE JSON_VALUE(shipTo, '$.address.state') in ('WA', 'OR', 'CA', 'HI', 'AK')
GROUP BY 1
ORDER BY 2 DESC
See Nested columns for information on supported Druid SQL functions and example queries.
Sample request
The following example queries the nested_demo
table:
- cURL
- Python
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
"query": "SELECT product, COUNT(product) AS pdt_cnt FROM nested_demo WHERE JSON_VALUE(shipTo, '$.address.state') in ('WA', 'OR', 'CA', 'HI', 'AK') GROUP BY 1 ORDER BY 2 DESC"
}'
import os
import requests
import json
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql"
apikey = os.getenv("POLARIS_API_KEY")
payload = json.dumps({
"query": "SELECT product, COUNT(product) AS pdt_cnt FROM nested_demo WHERE JSON_VALUE(shipTo, '$.address.state') in ('WA', 'OR', 'CA', 'HI', 'AK') GROUP BY 1 ORDER BY 2 DESC"
})
headers = {
'Authorization': f'Basic {apikey}',
'Content-Type': 'application/json'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Sample response
A successful request returns a 200 OK
response code and the query result, for example:
[
{
"product": "Bike",
"pdt_cnt": 2
},
{
"product": "Mouse",
"pdt_cnt": 1
},
{
"product": "Sausages",
"pdt_cnt": 1
}
]
Learn more
See the following topics for more information:
- Tables v1 API for reference on creating tables.
- Jobs v1 API for reference on initiating ingestion jobs.
- Nested data for information on support for nested data in the Polaris UI.