Create a table by API
You can create and manage tables in Imply Polaris using the Tables v1 API.
Tables are a central element of Polaris: they are where you store and organize your data records.
When creating a table, you can also define a schema to specify the column names and column types in the table. You can define the table's schema at creation time, or you can define the schema after creating the table and before loading data.
You do not have to create a table before starting an ingestion job. When you set createTableIfNotExists to true in the ingestion job spec, Polaris automatically determines the table attributes from the job spec.
For details, see Automatically created tables.
This topic walks you through the process to create a table using the Polaris API.
Prerequisites
You must have an API key with the ManageTables permission.
In the examples below, the key value is stored in the variable named POLARIS_API_KEY.
For information about how to obtain an API key and assign permissions, see API key authentication.
For more information on permissions, visit Permissions reference.
Create a basic table
To create a table in Polaris, issue a POST request to the Tables v1 API.
At a minimum, the following details are required to create a table:
- Table name. Table names must be unique in your Polaris organization. Creating a table with the same name as a previously existing table results in a 409 Conflict error. You use the table name in API requests to get information about a table, update a table, or ingest data into a table.
- Table type. Polaris tables are either detail tables or aggregate tables. A detail table is a regular table that stores each record as it is ingested. An aggregate table combines multiple rows together based on the table's rollup granularity and dimensions.
Keep the following in mind when creating a table:
- You cannot change a table's name or type.
- All tables have the default timestamp column __time whether or not you define a schema at creation time.
- Tables are created in flexible mode by default.
Sample request
The following example shows a basic request to create a detail table named Koalas:
{
"name": "Koalas",
"type": "detail"
}
Polaris applies the default values for all settings not specified. For more details on properties you can set for a table, see Create a table with a schema.
The request is shown below first as a cURL command and then as a Python script.
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
"name": "Koalas",
"type": "detail"
}'
import os
import requests
import json
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables"
apikey = os.getenv("POLARIS_API_KEY")
payload = json.dumps({
"name": "Koalas",
"type": "detail"
})
headers = {
'Authorization': f'Basic {apikey}',
'Content-Type': 'application/json'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Sample response
The following example shows a successful response from creating a table.
Even if you don't define a schema when you create a table, Polaris adds the default timestamp column named __time.
{
"schema": [
{
"name": "__time",
"dataType": "timestamp"
}
],
"type": "detail",
"name": "Koalas",
"id": "cc69003c-0c0f-430b-93e2-c60745d9329b",
"version": 0,
"createdTimestamp": "2023-04-14T00:05:50.882753433Z",
"lastUpdateTimestamp": "2023-04-14T00:05:50.882761613Z",
"availability": "available",
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"partitioningGranularity": "day",
"totalDataSizeBytes": 0,
"totalRows": 0,
"clusteringColumns": [],
"description": null,
"schemaMode": "flexible",
"queryableSchema": []
}
If you get a 400 Bad Request error, confirm that the request payload is properly formatted JSON and contains both the type and name fields.
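The two error cases described in this section can be checked in client code before and after sending the request. The following is a minimal sketch, not part of the Polaris API: the helper names check_create_payload and explain_status are hypothetical, and the hint messages only restate the 400 and 409 cases described above.

```python
# Hypothetical client-side helpers; names and messages are illustrative only.

REQUIRED_FIELDS = ("name", "type")
TABLE_TYPES = ("detail", "aggregate")


def check_create_payload(payload):
    """Return a list of problems found in a create-table payload (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not payload.get(field):
            problems.append(f"missing required field: {field}")
    table_type = payload.get("type")
    if table_type and table_type not in TABLE_TYPES:
        problems.append(f"unknown table type: {table_type}")
    return problems


def explain_status(status_code):
    """Map the status codes discussed in this topic to a short hint."""
    hints = {
        400: "Bad Request: check that the payload is valid JSON with name and type.",
        409: "Conflict: a table with this name already exists.",
    }
    return hints.get(status_code, f"Unhandled status code: {status_code}")
```

For example, check_create_payload({"name": "Koalas"}) reports the missing type field, and explain_status(response.status_code) can be called on the response object returned by requests.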
Create a table with a schema
In addition to the name and table type, you can specify the following characteristics when you create a table:
- Description. A description of the table limited to 1000 characters.
- Schema mode. The schema enforcement mode of the table, either flexible or strict. The schema mode determines how Polaris enforces a schema on the table. The default mode is flexible.
  - For a flexible table, Polaris auto-discovers the table schema during ingestion. In addition to skipping table schema declaration, flexible tables can also use schema auto-discovery on streaming ingestion jobs. This allows you to skip writing inputSchema and mappings in the streaming ingestion job spec. For more information, see Create a streaming ingestion job by API.
  - For a strict table, you must explicitly assign a schema to the table before ingesting data into the table.
  You can also declare columns in the table schema for a flexible table.
- Table schema. The schema field of the request payload takes an array of column definitions.
  The schema requirement depends on the schema enforcement mode of the table. With strict mode, you must declare all columns of the table schema. With flexible mode, declaring columns is optional. The advantage of declaring columns in the schema is to enforce a strict schema for those columns. If a column is not declared, its data type may change as more data is ingested.
  The format of a column definition depends on the table type.
  For a detail table, declare each column with the following syntax:
  {
    "dataType": <data type of the column>,
    "name": <name of the column>
  },
  For an aggregate table, declare each column with the following syntax. Set the column type to "dimension" or "measure".
  {
    "type": <type of the column>,
    "dataType": <data type of the column>,
    "name": <name of the column>
  },
- Partitioning schema. Polaris uses segment partitioning to decrease storage size and improve query speeds. Polaris always partitions your data by timestamp first. The following fields configure the table's partitioning schema:
  - partitioningGranularity: time partitioning
  - clusteringColumns: dimensions for secondary partitioning
- Rollup schema. For aggregate tables only, Polaris uses data rollup to aggregate raw data at predefined intervals. You specify the rollup granularity in the timeResolution field. If you want to set a custom duration or time zone and origin, supply a period granularity using queryGranularity. When set, queryGranularity overrides the rollup granularity in timeResolution. See Tables v1 API for more information.
- Storage policy. Use the storagePolicy property to optionally set a storage policy on the table. You can use a storage policy to configure automatic data deletion for data older than a certain time period. For information on creating a table with a storage policy, see Set a storage policy by API.
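The two column definition formats described above can be produced by one small helper. This is only a sketch: the function name column_def and its parameters are hypothetical, and it emits just the name, dataType, and type keys shown in the syntax above.

```python
# Hypothetical helper that builds a column definition for the "schema" array.

COLUMN_TYPES = ("dimension", "measure")


def column_def(name, data_type, table_type="detail", column_type=None):
    """Build one column definition in the format that matches the table type."""
    if table_type == "detail":
        # Detail tables take only a name and a data type.
        if column_type is not None:
            raise ValueError("detail table columns do not take a column type")
        return {"name": name, "dataType": data_type}
    if table_type == "aggregate":
        # Aggregate tables also need the column type: dimension or measure.
        if column_type not in COLUMN_TYPES:
            raise ValueError("aggregate columns must be 'dimension' or 'measure'")
        return {"type": column_type, "name": name, "dataType": data_type}
    raise ValueError(f"unknown table type: {table_type}")


# Example: a schema for an aggregate table with two dimensions and one measure.
aggregate_schema = [
    column_def("__time", "timestamp", "aggregate", "dimension"),
    column_def("city", "string", "aggregate", "dimension"),
    column_def("max_session_length", "long", "aggregate", "measure"),
]
```

The resulting list can be passed as the schema value of the request payload.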
Sample request
The following example creates a table with the following characteristics:
- Aggregate table named "Koalas Subset"
- Strict schema mode with the table schema fully declared
- Data partitioned by day
- Data rolled up by hour
- Table schema
  - __time: the primary timestamp required for all Polaris tables
  - city: a string dimension
  - session: a string dimension
  - max_session_length: a long-typed measure containing the maximum session lengths for each rolled up row
The request body to create the example is as follows:
{
"type": "aggregate",
"name": "Koalas Subset",
"partitioningGranularity": "day",
"timeResolution": "hour",
"schemaMode": "strict",
"schema": [
{
"type": "dimension",
"name": "__time",
"dataType": "timestamp"
},
{
"type": "dimension",
"name": "city",
"dataType": "string"
},
{
"type": "dimension",
"name": "session",
"dataType": "string"
},
{
"type": "measure",
"name": "max_session_length",
"dataType": "long"
}
]
}
The request is shown below first as a cURL command and then as a Python script.
curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
"type": "aggregate",
"name": "Koalas Subset",
"partitioningGranularity": "day",
"timeResolution": "hour",
"schemaMode": "strict",
"schema": [
{
"type": "dimension",
"name": "__time",
"dataType": "timestamp"
},
{
"type": "dimension",
"name": "city",
"dataType": "string"
},
{
"type": "dimension",
"name": "session",
"dataType": "string"
},
{
"type": "measure",
"name": "max_session_length",
"dataType": "long"
}
]
}'
import os
import requests
import json
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables"
apikey = os.getenv("POLARIS_API_KEY")
payload = json.dumps({
"type": "aggregate",
"name": "Koalas Subset",
"partitioningGranularity": "day",
"timeResolution": "hour",
"schemaMode": "strict",
"schema": [
{
"type": "dimension",
"name": "__time",
"dataType": "timestamp"
},
{
"type": "dimension",
"name": "city",
"dataType": "string"
},
{
"type": "dimension",
"name": "session",
"dataType": "string"
},
{
"type": "measure",
"name": "max_session_length",
"dataType": "long"
}
]
})
headers = {
'Authorization': f'Basic {apikey}',
'Content-Type': 'application/json'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Sample response
The following example shows a successful response:
{
"timeResolution": "hour",
"schema": [
{
"dataType": "timestamp",
"type": "dimension",
"name": "__time"
},
{
"dataType": "string",
"type": "dimension",
"name": "city"
},
{
"dataType": "string",
"type": "dimension",
"name": "session"
},
{
"dataType": "long",
"queryAggregator": null,
"type": "measure",
"name": "max_session_length"
},
{
"dataType": "long",
"queryAggregator": "sum",
"type": "measure",
"name": "__count"
}
],
"type": "aggregate",
"name": "Koalas Subset",
"id": "754abbb2-c460-4771-a036-xxxxxxxxxxxx",
"version": 0,
"createdTimestamp": "2023-03-21T20:55:48.111406237Z",
"lastUpdateTimestamp": "2023-03-21T20:55:48.111413317Z",
"availability": "available",
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"partitioningGranularity": "day",
"totalDataSizeBytes": 0,
"totalRows": 0,
"clusteringColumns": [],
"description": null,
"schemaMode": "strict"
}
For measures in aggregate tables, Polaris automatically sets the aggregation function used at query time based on the ingestion time aggregation.
This value is displayed in the response field queryAggregator. You do not define the query aggregator when creating the table.
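To inspect which query aggregator Polaris assigned to each measure, you can read the queryAggregator field from the parsed response. The following sketch assumes the response has already been parsed into a dictionary (for example with response.json()); the function name measure_aggregators is hypothetical, and the sample data is trimmed from the response above.

```python
# Hypothetical helper that lists the query aggregator of each measure column.


def measure_aggregators(table):
    """Return {measure name: queryAggregator} for a parsed table response."""
    return {
        column["name"]: column.get("queryAggregator")
        for column in table.get("schema", [])
        if column.get("type") == "measure"
    }


# Trimmed copy of the sample response shown above.
sample_table = {
    "type": "aggregate",
    "name": "Koalas Subset",
    "schema": [
        {"dataType": "timestamp", "type": "dimension", "name": "__time"},
        {"dataType": "string", "type": "dimension", "name": "city"},
        {"dataType": "long", "queryAggregator": None, "type": "measure",
         "name": "max_session_length"},
        {"dataType": "long", "queryAggregator": "sum", "type": "measure",
         "name": "__count"},
    ],
}

aggregators = measure_aggregators(sample_table)
```

For the sample data, aggregators maps max_session_length to None and __count to "sum", matching the null and "sum" values in the response.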
Add a schema to an existing table
To add or update a schema on a table, submit a PUT request to the Tables API. Include the following information in your request:
- name: Name of the table. You supply this both as a path parameter in the URL and as part of the JSON payload sent with the request.
- type: The table type. Either detail or aggregate. A table's type cannot be changed.
- version: The current version of the table. The version starts at 0 and increments each time you make a change to the table. Send a GET request to get table details to view the table type and version.
- schema: The table schema in the form of a JSON array of column definitions that specify columns and their corresponding data types. Strict tables require you to declare all columns in the table schema before you ingest data.
Polaris applies the updated schema to subsequent ingestion jobs and does not backfill data from previous jobs.
Existing ingestion jobs for strict tables do not automatically ingest new columns.
To ingest data into new columns for strict tables, start a new ingestion job and include the new mapping from the input field to the table column in mappings.
For more information, see Create an ingestion job.
Streaming ingestion jobs into flexible tables may use schema auto-discovery to automatically detect and ingest new input fields as undeclared columns without having to start a new ingestion job.
For more information on schema changes, see Updating a schema.
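Because the PUT request must repeat the table's name, type, and current version, it can help to build the payload from the table details rather than typing the values by hand. The following is a minimal sketch under stated assumptions: the function names table_url and schema_update_payload are hypothetical, and the table argument is assumed to be the parsed result of a GET request for the table details, containing the name, type, and version fields shown in the sample responses. The table name is percent-encoded because it is part of the URL path.

```python
from urllib.parse import quote

# Hypothetical helpers for preparing a schema update request.


def table_url(base_url, project_id, table_name):
    """Build the table URL, percent-encoding the path segments."""
    project = quote(project_id, safe="")
    table = quote(table_name, safe="")
    return f"{base_url}/v1/projects/{project}/tables/{table}"


def schema_update_payload(table, schema):
    """Build the PUT payload from current table details and a new schema."""
    return {
        "type": table["type"],        # must match the existing table type
        "name": table["name"],        # must match the name in the URL
        "version": table["version"],  # current version of the table
        "schema": schema,
    }


# Example with placeholder values; no request is sent here.
current = {"type": "detail", "name": "Koalas Geography", "version": 0}
url = table_url(
    "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io",
    "PROJECT_ID",
    current["name"],
)
payload = schema_update_payload(
    current,
    [
        {"name": "__time", "dataType": "timestamp"},
        {"name": "continent", "dataType": "string"},
    ],
)
```

The url and payload values can then be passed to requests.request("PUT", ...) together with the same headers used in the other examples, with the payload serialized by json.dumps.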
Sample request
The following example shows how to set the schema for a table named Koalas Geography:
The request is shown below first as a cURL command and then as a Python script.
curl --location --request PUT "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables/Koalas%20Geography" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--header "Content-Type: application/json" \
--data-raw '{
"type": "detail",
"name": "Koalas Geography",
"version": 0,
"schema": [
{
"name": "__time",
"dataType": "timestamp"
},
{
"name": "continent",
"dataType": "string"
},
{
"name": "country",
"dataType": "string"
},
{
"name": "city",
"dataType": "string"
}
]
}'
import os
import requests
import json
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables/Koalas Geography"
apikey = os.getenv("POLARIS_API_KEY")
payload = json.dumps({
"type": "detail",
"name": "Koalas Geography",
"version": 0,
"schema": [
{
"name": "__time",
"dataType": "timestamp"
},
{
"name": "continent",
"dataType": "string"
},
{
"name": "country",
"dataType": "string"
},
{
"name": "city",
"dataType": "string"
}
]
})
headers = {
'Authorization': f'Basic {apikey}',
'Content-Type': 'application/json'
}
response = requests.request("PUT", url, headers=headers, data=payload)
print(response.text)
Sample response
The following example shows a successful response for setting the table's input schema:
{
"type": "detail",
"name": "Koalas Geography",
"id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
"version": 1,
"lastUpdateTimestamp": "2022-07-21T23:02:47.354643482Z",
"availability": "available",
"createdBy": {
"username": "service-account-docs-demo",
"userId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
},
"lastModifiedBy": {
"username": "service-account-docs-demo",
"userId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
},
"partitioningGranularity": "day",
"totalDataSizeBytes": 0,
"totalRows": 0,
"schema": [
{
"name": "__time",
"dataType": "timestamp"
},
{
"name": "continent",
"dataType": "string"
},
{
"name": "country",
"dataType": "string"
},
{
"name": "city",
"dataType": "string"
}
]
}
An invalid table schema results in a 400 Bad Request status code with a JSON body describing the error.
Some factors that cause a table schema to be invalid include the following:
- Empty column names.
- Leading or trailing spaces in column names.
- A column name prefixed with double underscores.
- More than 400 columns defined.
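The factors listed above can be checked on the client before sending the request. The following is a sketch only: the function name schema_problems is hypothetical, the checks cover just the factors listed here (Polaris may reject a schema for other reasons), and the exemption for the default __time column is an assumption based on the sample requests in this topic, which declare __time explicitly.

```python
# Hypothetical pre-flight check for the schema problems listed in this topic.

MAX_COLUMNS = 400


def schema_problems(schema):
    """Return a list of problems found in a list of column definitions."""
    problems = []
    if len(schema) > MAX_COLUMNS:
        problems.append(f"too many columns: {len(schema)} (limit {MAX_COLUMNS})")
    for column in schema:
        name = column.get("name") or ""
        if name == "":
            problems.append("empty column name")
        elif name != name.strip():
            problems.append(f"leading or trailing spaces: {name!r}")
        elif name.startswith("__") and name != "__time":
            # Assumption: __time is allowed because it is the default column.
            problems.append(f"reserved double-underscore prefix: {name!r}")
    return problems
```

An empty list means none of the listed problems were found. The server response remains the authoritative result.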
Change the schema mode
You can change a table's schema enforcement mode between strict and flexible. For information on the conditions required for changing the schema mode, see Schema mode conversion.
To change a table's schema mode using the API, issue a PUT request to the Tables API.
Include the following information in your request:
- Table name. You supply this both as a path parameter in the URL and as part of the JSON payload sent with the request.
- Table type. Either detail or aggregate. A table's type cannot be changed.
- Table version. The current version of the table. The version starts at 0 and increments each time you make a change to the table. Send a GET request to get table details to view the table type and version.
- Schema mode. Either strict or flexible.
Sample request
The following example shows how to change the schema mode from strict to flexible for a table named Koalas:
The request is shown below first as a cURL command and then as a Python script.
curl --location --request PUT "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables/Koalas" \
--header "Content-Type: application/json" \
--header "Authorization: Basic $POLARIS_API_KEY" \
--data '{
"type": "aggregate",
"name": "Koalas",
"version": 0,
"schemaMode": "flexible"
}'
import os
import requests
import json
url = "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables/Koalas"
apikey = os.getenv("POLARIS_API_KEY")
payload = json.dumps({
"type": "aggregate",
"name": "Koalas",
"version": 0,
"schemaMode": "flexible"
})
headers = {
'Authorization': f'Basic {apikey}',
'Content-Type': 'application/json'
}
response = requests.request("PUT", url, headers=headers, data=payload)
print(response.text)
Sample response
The following example shows a successful response for changing the table's schema mode:
{
"schema": [
{
"dataType": "timestamp",
"type": "dimension",
"name": "__time"
}
],
"timeResolution": "millisecond",
"name": "Koalas",
"type": "aggregate",
"version": 1,
"availability": "available",
"clusteringColumns": [],
"compactionConfig": null,
"createdBy": {
"username": "xxxxxxxxxxxx@imply.io",
"userId": "d84ec32d-933d-43b2-9904-xxxxxxxxxxxx"
},
"createdByUser": {
"username": "xxxxxxxxxxxx@imply.io",
"userId": "d84ec32d-933d-43b2-9904-xxxxxxxxxxxx"
},
"createdOnTimestamp": "2023-07-11T20:58:56.10012Z",
"createdTimestamp": "2023-07-11T20:58:56.10012Z",
"description": null,
"id": "3d3bab1d-73de-41f1-a146-234df4a5bef4",
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastUpdateTimestamp": "2023-07-11T21:28:33.336331775Z",
"modifiedByUser": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"modifiedOnTimestamp": "2023-07-11T21:28:33.336331775Z",
"partitioningGranularity": "day",
"queryableSchema": [],
"storagePolicy": null,
"schemaMode": "flexible",
"segmentCompactedBytes": 0,
"segmentTotalBytes": 0,
"totalDataSizeBytes": 0,
"totalRows": 0
}
Learn more
For more information on tables, see the following topics:
- Tables v1 API for reference on creating and managing tables.
- Introduction to tables for an overview of tables.
- Table schema for details on table schemas and how to create a schema in the UI.
For more information on ingesting data into tables, see the following topics:
- Ingestion sources overview for a list of data sources.
- Ingest data from files for batch ingestion into a table using the Polaris API.
- Ingest data from a table for batch ingestion from an existing table using the Polaris API.
- Push event data for push-based stream ingestion into a table using the Polaris API.