Ingest using SQL by API

info

Project-less regional API resources have been deprecated and will be removed by the end of September 2024.

You must include the project ID in the URL for all regional API calls in projects created after September 29, 2023. For example: https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID

Projects created before September 29, 2023 can continue to use project-less URLs until the end of September 2024. We strongly recommend updating your regional API calls to include the project ID prior to September 2024. See the API migration guide for more information.

You can use SQL to define batch ingestion jobs in Imply Polaris. With SQL-based ingestion, you describe the source data, destination table, and any data transformations in a single SQL statement. SQL-based ingestion is an alternative approach to using the ingestion job spec to create your batch ingestion job.

This topic shows you how to submit a SQL-based ingestion job using the API.

Prerequisites

This topic assumes that you have the following:

  • A batch ingestion source.
  • An API key with the ManageIngestionJobs permission. In the examples below, the key value is stored in the variable named POLARIS_API_KEY. For information about how to obtain an API key and assign permissions, see API key authentication. For more information on permissions, see Permissions reference.

You do not have to create a table before starting an ingestion job. When you set createTableIfNotExists to true in the ingestion job spec, Polaris automatically determines the table attributes from the job spec. For details, see Automatically created tables.

Submit SQL-based ingestion job

Send a POST request to the /v1/projects/PROJECT_ID/jobs endpoint to create a SQL-based ingestion job. Define a SQL-based ingestion job with the following properties:

  • type: Set the job type to sql.

  • query: The SQL statement that describes the ingestion job. For syntax reference and example queries, see Ingest using SQL. Escape any double quotes within the SQL statement so that the request payload remains valid JSON.

info

You can view the SQL for previous ingestion jobs even if they weren't sql-type ingestion jobs. Polaris returns this information in the query property of the Get job details endpoint.

  • context: You can optionally set context parameters to configure query planning. For more information, see Context parameters.

  • parameters: If your query is a parameterized SQL query, use this property to specify the data types and values for the dynamic parameters. For an example, see Submit parameterized SQL ingestion.

See the Jobs v1 API documentation for more information.
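One way to avoid hand-escaping the double quotes in the query property is to build the request body programmatically and let a JSON serializer handle the escaping. The following Python sketch is illustrative only; the SQL and field values come from the sample request below:

```python
import json

# SQL statement written naturally, with unescaped double quotes.
sql = (
    'INSERT INTO "example-table" '
    'SELECT TIME_PARSE("timestamp") AS __time, * '
    "FROM TABLE(POLARIS_UPLOADED(files => ARRAY['wikipedia.json.gz'], format => 'json')) "
    'EXTEND("channel" VARCHAR, "isRobot" VARCHAR, "timestamp" VARCHAR) '
    "PARTITIONED BY DAY"
)

# json.dumps escapes the embedded double quotes, producing a valid JSON payload.
payload = json.dumps({"type": "sql", "createTableIfNotExists": True, "query": sql})
```

The serialized payload is then suitable for use as the request body without any manual backslash-escaping.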

Sample request

The following example creates a SQL-based ingestion job to ingest data into example-table from an uploaded file called wikipedia.json.gz:

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
  --user ${POLARIS_API_KEY}: \
  --header "Content-Type: application/json" \
  --data-raw '{
    "type": "sql",
    "createTableIfNotExists": true,
    "query": "INSERT INTO \"example-table\" SELECT TIME_PARSE(\"timestamp\") AS __time, * FROM TABLE(POLARIS_UPLOADED(files => ARRAY['\''wikipedia.json.gz'\''], format => '\''json'\'')) EXTEND(\"channel\" VARCHAR, \"isRobot\" VARCHAR, \"timestamp\" VARCHAR) PARTITIONED BY DAY"
}'
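The same request can be issued from Python's standard library. This is a sketch with the URL placeholders left unfilled; note that the API key is sent as the username in HTTP Basic auth with an empty password (the trailing colon in curl's --user):

```python
import base64
import json
import os
import urllib.request

def build_sql_job_request(base_url: str, project_id: str,
                          api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST /v1/projects/{project_id}/jobs request (does not send it)."""
    # Basic auth: the API key is the username, the password is empty.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/v1/projects/{project_id}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )

# Placeholder base URL; substitute your organization, region, and cloud provider.
req = build_sql_job_request(
    "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io",
    "PROJECT_ID",
    os.environ.get("POLARIS_API_KEY", ""),
    {"type": "sql", "createTableIfNotExists": True,
     "query": 'INSERT INTO "example-table" SELECT ...'},
)
# Sending it is one more line once the request is built:
# response = urllib.request.urlopen(req)
```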

Sample response

The following example shows a successful response:

{
  "query": "INSERT INTO \"example-table\" SELECT TIME_PARSE(\"timestamp\") AS __time, * FROM TABLE(POLARIS_UPLOADED(files => ARRAY['wikipedia.json.gz'], format => 'json')) EXTEND(\"channel\" VARCHAR, \"isRobot\" VARCHAR, \"timestamp\" VARCHAR) PARTITIONED BY DAY",
  "target": {
    "tableName": "example-table",
    "intervals": [],
    "type": "table"
  },
  "createTableIfNotExists": true,
  "context": {},
  "parameters": [],
  "type": "sql",
  "id": "d4ee23e1-7b8d-4941-9b5f-ad6520c79a62",
  "desiredExecutionStatus": "running",
  "createdBy": {
    "username": "api-key-pok_vipgj...bjjvyo",
    "userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
  },
  "lastModifiedBy": {
    "username": "api-key-pok_vipgj...bjjvyo",
    "userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
  },
  "executionStatus": "pending",
  "health": {
    "status": "ok"
  },
  "createdTimestamp": "2023-04-11T00:30:51.00199Z",
  "lastUpdatedTimestamp": "2023-04-11T00:30:51.00199Z",
  "startedTimestamp": null,
  "completedTimestamp": null
}
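Note that the job is created with executionStatus set to pending and runs asynchronously. A common pattern is to poll the Get job details endpoint until the job reaches a terminal state. The terminal status names below are an assumption for illustration; check the Jobs v1 API documentation for the authoritative set of executionStatus values:

```python
# Hypothetical terminal statuses for illustration; consult the Jobs v1 API
# reference for the authoritative set of executionStatus values.
TERMINAL_STATUSES = {"completed", "failed", "canceled"}

def is_finished(job: dict) -> bool:
    """Return True when a job details response shows a terminal status."""
    return job.get("executionStatus") in TERMINAL_STATUSES

# A polling loop would GET /v1/projects/PROJECT_ID/jobs/JOB_ID, parse the
# JSON response, and sleep between calls until is_finished(...) returns True.
```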

Submit parameterized SQL ingestion

This example uses the following SQL to ingest from example_table1 into example_table2, filtering the source rows by time. The query uses ? placeholders; you pass the values for these ordered parameters in the parameters property of the request body.

INSERT INTO "example_table2"
SELECT *
FROM "example_table1"
WHERE __time >= ? AND __time < ?
PARTITIONED BY DAY

The complete request body for the parameterized query is as follows:

{
  "type": "sql",
  "query": "INSERT INTO \"example_table2\" SELECT * FROM \"example_table1\" WHERE __time >= ? AND __time < ? PARTITIONED BY DAY",
  "parameters": [
    {
      "type": "TIMESTAMP",
      "value": "2023-03-01 00:00:00"
    },
    {
      "type": "TIMESTAMP",
      "value": "2023-05-01 00:00:00"
    }
  ]
}
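Dynamic parameters are positional: the first entry in parameters binds the first ? in the query, and so on. One way to build them from Python datetime objects (an illustrative sketch, not a Polaris client API) is:

```python
import json
from datetime import datetime

def timestamp_param(dt: datetime) -> dict:
    """Format a datetime as a TIMESTAMP dynamic parameter entry."""
    return {"type": "TIMESTAMP", "value": dt.strftime("%Y-%m-%d %H:%M:%S")}

body = {
    "type": "sql",
    "query": ('INSERT INTO "example_table2" SELECT * FROM "example_table1" '
              "WHERE __time >= ? AND __time < ? PARTITIONED BY DAY"),
    # Order matters: the first entry binds the first ? placeholder.
    "parameters": [
        timestamp_param(datetime(2023, 3, 1)),
        timestamp_param(datetime(2023, 5, 1)),
    ],
}
payload = json.dumps(body)
```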

Sample request

The following example ingests data from another table, selecting only rows whose __time falls between two timestamp parameters:

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/jobs" \
  --user ${POLARIS_API_KEY}: \
  --header "Content-Type: application/json" \
  --data '{
    "type": "sql",
    "query": "INSERT INTO \"example_table2\" SELECT * FROM \"example_table1\" WHERE __time >= ? AND __time < ? PARTITIONED BY DAY",
    "parameters": [
      {
        "type": "TIMESTAMP",
        "value": "2023-03-01 00:00:00"
      },
      {
        "type": "TIMESTAMP",
        "value": "2023-05-01 00:00:00"
      }
    ]
}'

Sample response

The following example shows a successful response:

{
  "query": "INSERT INTO \"example_table2\" SELECT * FROM \"example_table1\" WHERE __time >= ? AND __time < ? PARTITIONED BY DAY",
  "context": {
    "mode": "nonStrict",
    "clusterStatisticsMergeMode": "SEQUENTIAL",
    "finalizeAggregations": true,
    "taskAssignment": "auto",
    "groupByEnableMultiValueUnnesting": false,
    "maxNumTasks": 75,
    "faultTolerance": true,
    "durableShuffleStorage": true
  },
  "parameters": [
    {
      "type": "TIMESTAMP",
      "value": "2023-03-01 00:00:00"
    },
    {
      "type": "TIMESTAMP",
      "value": "2023-05-01 00:00:00"
    }
  ],
  "target": {
    "tableName": "example_table2",
    "type": "table",
    "intervals": []
  },
  "createTableIfNotExists": true,
  "createdBy": {
    "username": "api-key-pok_vipgj...bjjvyo",
    "userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
  },
  "createdTimestamp": "2023-07-26T22:55:54.919807Z",
  "desiredExecutionStatus": "running",
  "executionStatus": "pending",
  "health": {
    "status": "ok"
  },
  "id": "ee9a3580-62dc-4987-a12b-24b49e30f478",
  "lastModifiedBy": {
    "username": "api-key-pok_vipgj...bjjvyo",
    "userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
  },
  "lastUpdatedTimestamp": "2023-07-26T22:55:54.919807Z",
  "spec": {
    "query": "INSERT INTO \"example_table2\" SELECT * FROM \"example_table1\" WHERE __time >= ? AND __time < ? PARTITIONED BY DAY",
    "context": {
      "mode": "nonStrict",
      "clusterStatisticsMergeMode": "SEQUENTIAL",
      "finalizeAggregations": true,
      "taskAssignment": "auto",
      "groupByEnableMultiValueUnnesting": false,
      "maxNumTasks": 75,
      "faultTolerance": true,
      "durableShuffleStorage": true
    },
    "parameters": [
      {
        "type": "TIMESTAMP",
        "value": "2023-03-01 00:00:00"
      },
      {
        "type": "TIMESTAMP",
        "value": "2023-05-01 00:00:00"
      }
    ],
    "type": "sql",
    "desiredExecutionStatus": "running"
  },
  "type": "sql",
  "completedTimestamp": null,
  "startedTimestamp": null
}

Learn more

For reference on SQL syntax as well as examples of queries for SQL-based ingestion, see Ingest using SQL.