Skip to main content

Ingest nested data by API

In Imply Polaris, you can directly ingest nested JSON data and run queries to extract information from the nested objects. For example, consider ingesting the following record:

{
"time":"2022-6-14T10:32:08Z",
"product":"Keyboard",
"department":"Computers",
"shipTo":{
"firstName": "Sandra",
"lastName": "Beatty",
"address": {
"street": "293 Grant Well",
"city": "Loischester",
"state": "FL",
"country": "TV",
"postalCode": "88845-0066"
},
"phoneNumbers": [
{"type":"primary","number":"1-788-771-7028 x8627" },
{"type":"secondary","number":"1-460-496-4884 x887"}
]
},
"details":{
"color":"plum",
"price":"40.00"
}
}

The json data type in Polaris lets you store and access nested JSON data at both ingestion and query time.

Keep the following in mind when ingesting nested data in Polaris:

  • You can only ingest nested data to detail tables, not aggregate tables.
  • A nested data column is a dimension column with the data type json.
  • You create nested data columns in the ingestion job using an identity mapping in the ingestion job spec that references the input column as a json data type.
  • You can optionally apply SQL JSON functions in input expressions to parse nested data before ingestion.

This topic shows how to ingest and query nested data using the Polaris API.

Prerequisites

This topic requires the following:

  • Data in the file nested_example_data.json. Upload this file to your Polaris staging area. To upload files using the Files API, visit Upload files.

  • An API key with the ManageTables, ManageIngestionJobs, and AccessQueries permissions.
    In the examples below, Polaris reads the key value from the variable named POLARIS_API_KEY. For information about how to obtain an API key and assign permissions, see API key authentication. For more information on permissions, visit Permissions reference.

You do not have to create a table before starting an ingestion job. When you set createTableIfNotExists to true in the ingestion job spec, Polaris automatically determines the table attributes from the job spec. For details, see Automatically created tables.

Ingest and transform JSON data

Send a POST request to the /v1/projects/PROJECT_ID/jobs endpoint to create an ingestion job. In the inputSchema array of the request payload, assign the json data type for the JSON fields in the source data.

The mappings array of the request payload describes a one-to-one relationship between the fields of the input data and the columns of the Polaris table. You can map JSON fields from the source data to a table column with an identity mapping.

info

If your data format is in CSV or another delimiter-separated format, assign the string data type for the JSON fields in the input schema. Define the mappings expression for the JSON column as PARSE_JSON(\"inputField\").

You can also extract and transform fields using an input expression. For example, to populate the price column, apply an input expression referencing the details field of the input data; the expression extracts the nested price information from details.

JSON_VALUE(\"details\", '$.price')

The ingestion job spec in this example also directs Polaris to automatically create the table if it doesn't exist. Polaris creates the table with the following columns:

  • __time: the primary timestamp of the table
  • product: a string column with product names
  • details: a JSON column containing product descriptors
  • department: a string column with department labels
  • shipTo: a JSON column with user details and contact information
  • price: a string column with price information extracted from details. Note that this string column contains numeric data. To ensure that price is a numeric type such as float, you can create the table and declare its schema before starting ingestion.

For more information on transforming data at ingestion, see Map and transform data with input expressions.

See the Jobs v1 API documentation for a description of all required parameters to create an ingestion job.

Sample request

The following example request creates an ingestion job for the nested_demo table using the nested_example_data.json file.

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
"type": "batch",
"target": {
"type": "table",
"tableName": "nested_demo"
},
"createTableIfNotExists": true,
"source": {
"type": "uploaded",
"fileList": [
"nested_example_data.json"
],
"inputSchema": [
{
"name": "time",
"dataType": "string"
},
{
"name": "product",
"dataType": "string"
},
{
"name": "details",
"dataType": "json"
},
{
"name": "department",
"dataType": "string"
},
{
"name": "shipTo",
"dataType": "json"
}
],
"formatSettings": {
"format": "nd-json"
}
},
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"time\")"
},
{
"columnName": "product",
"expression": "\"product\""
},
{
"columnName": "details",
"expression": "\"details\""
},
{
"columnName": "department",
"expression": "\"department\""
},
{
"columnName": "shipTo",
"expression": "\"shipTo\""
},
{
"columnName": "price",
"expression": "JSON_VALUE(\"details\", '$.price')"
}
]
}'

Sample response

A successful request returns a 201 Created response and the ingestion job details.

Click to view the response
{
"source": {
"fileList": [
"nested_example_data.json"
],
"formatSettings": {
"flattenSpec": {},
"format": "nd-json"
},
"inputSchema": [
{
"dataType": "string",
"name": "time"
},
{
"dataType": "string",
"name": "product"
},
{
"dataType": "json",
"name": "details"
},
{
"dataType": "string",
"name": "department"
},
{
"dataType": "json",
"name": "shipTo"
}
],
"type": "uploaded"
},
"filterExpression": null,
"ingestionMode": "append",
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"time\")",
"isAggregation": null
},
{
"columnName": "product",
"expression": "\"product\"",
"isAggregation": null
},
{
"columnName": "details",
"expression": "\"details\"",
"isAggregation": null
},
{
"columnName": "department",
"expression": "\"department\"",
"isAggregation": null
},
{
"columnName": "shipTo",
"expression": "\"shipTo\"",
"isAggregation": null
},
{
"columnName": "price",
"expression": "JSON_VALUE(\"details\", '$.price')",
"isAggregation": null
}
],
"maxParseExceptions": 2147483647,
"query": "INSERT INTO \"nested_demo\"\nSELECT\n TIME_PARSE(\"time\") AS \"__time\",\n \"product\" AS \"product\",\n \"details\" AS \"details\",\n \"department\" AS \"department\",\n \"shipTo\" AS \"shipTo\",\n JSON_VALUE(\"details\", '$.price') AS \"price\"\nFROM TABLE(\n POLARIS_SOURCE(\n '{\"fileList\":[\"nested_example_data.json\"],\"formatSettings\":{\"flattenSpec\":{},\"format\":\"nd-json\"},\"inputSchema\":[{\"dataType\":\"string\",\"name\":\"time\"},{\"dataType\":\"string\",\"name\":\"product\"},{\"dataType\":\"json\",\"name\":\"details\"},{\"dataType\":\"string\",\"name\":\"department\"},{\"dataType\":\"json\",\"name\":\"shipTo\"}],\"type\":\"uploaded\"}'\n )\n)\n\n\nPARTITIONED BY DAY",
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"createdTimestamp": "2023-09-12T21:56:53.432058431Z",
"desiredExecutionStatus": "running",
"executionStatus": "pending",
"health": {
"status": "ok"
},
"id": "018a8b65-b638-72c2-a65c-b50663d57e8d",
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastUpdatedTimestamp": "2023-09-12T21:56:53.432058431Z",
"spec": {
"source": {
"fileList": [
"nested_example_data.json"
],
"formatSettings": {
"flattenSpec": {},
"format": "nd-json"
},
"inputSchema": [
{
"dataType": "string",
"name": "time"
},
{
"dataType": "string",
"name": "product"
},
{
"dataType": "json",
"name": "details"
},
{
"dataType": "string",
"name": "department"
},
{
"dataType": "json",
"name": "shipTo"
}
],
"type": "uploaded"
},
"target": {
"tableName": "nested_demo",
"type": "table",
"intervals": []
},
"createTableIfNotExists": true,
"filterExpression": null,
"ingestionMode": "append",
"mappings": [
{
"columnName": "__time",
"expression": "TIME_PARSE(\"time\")",
"isAggregation": null
},
{
"columnName": "product",
"expression": "\"product\"",
"isAggregation": null
},
{
"columnName": "details",
"expression": "\"details\"",
"isAggregation": null
},
{
"columnName": "department",
"expression": "\"department\"",
"isAggregation": null
},
{
"columnName": "shipTo",
"expression": "\"shipTo\"",
"isAggregation": null
},
{
"columnName": "price",
"expression": "JSON_VALUE(\"details\", '$.price')",
"isAggregation": null
}
],
"maxParseExceptions": 2147483647,
"type": "batch",
"desiredExecutionStatus": "running"
},
"target": {
"tableName": "nested_demo",
"type": "table",
"intervals": []
},
"type": "batch",
"completedTimestamp": null,
"startedTimestamp": null
}

Query ingested JSON data

Use the Query API to query the nested data in your table.

The following example queries the number of products sold in the west coast of the United States:

SELECT
product,
COUNT(product) AS pdt_cnt
FROM nested_demo
WHERE JSON_VALUE(shipTo, '$.address.state') in ('WA', 'OR', 'CA', 'HI', 'AK')
GROUP BY 1
ORDER BY 2 DESC

See Nested columns for information on supported Druid SQL functions and example queries.

Sample request

The following example queries the nested_demo table:

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/query/sql" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
"query": "SELECT product, COUNT(product) AS pdt_cnt FROM nested_demo WHERE JSON_VALUE(shipTo, '$.address.state') in ('WA', 'OR', 'CA', 'HI', 'AK') GROUP BY 1 ORDER BY 2 DESC"
}'

Sample response

A successful request returns a 200 OK response code and the query result, for example:

[
{
"product": "Bike",
"pdt_cnt": 2
},
{
"product": "Mouse",
"pdt_cnt": 1
},
{
"product": "Sausages",
"pdt_cnt": 1
}
]

Learn more

See the following topics for more information:

  • Tables v1 API for reference on creating tables.
  • Jobs v1 API for reference on initiating ingestion jobs.
  • Nested data for information on support for nested data in the Polaris UI.