Create a table by API

Project-less regional API resources have been deprecated and will be removed by the end of September 2024.

You must include the project ID in the URL for all regional API calls in projects created after September 29, 2023. For example: https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID

Projects created before September 29, 2023 can continue to use project-less URLs until the end of September 2024. We strongly recommend updating your regional API calls to include the project ID prior to September 2024. See the API migration guide for more information.
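As a minimal sketch of the URL format described above, the following Python helper assembles a project-scoped base URL. The function name and its parameters are hypothetical; only the URL pattern comes from this page.

```python
def polaris_base_url(organization: str, region: str, cloud_provider: str, project_id: str) -> str:
    """Return the project-scoped base URL for regional API calls.

    Hypothetical helper: only the URL pattern is taken from this page.
    """
    host = f"{organization}.{region}.{cloud_provider}.api.imply.io"
    return f"https://{host}/v1/projects/{project_id}"


# Placeholder values; substitute your own organization, region, provider, and project ID.
print(polaris_base_url("ORGANIZATION_NAME", "REGION", "CLOUD_PROVIDER", "PROJECT_ID"))
```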

You can create and manage tables in Imply Polaris using the Tables v1 API.

Tables are central to Polaris: they are where you store and organize your data records.

A table's schema specifies the column names and column types in the table. You can define the schema when you create the table, or after you create the table and before you load data.

You do not have to create a table before starting an ingestion job. When you set createTableIfNotExists to true in the ingestion job spec, Polaris automatically determines the table attributes from the job spec. For details, see Automatically created tables.

This topic walks you through the process to create a table using the Polaris API.

Prerequisites

You must have an API key with the ManageTables permission. In the examples below, the key value is stored in the variable named POLARIS_API_KEY. For information about how to obtain an API key and assign permissions, see API key authentication. For more information on permissions, visit Permissions reference.
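The curl examples in this topic pass the key with --user ${POLARIS_API_KEY}:, which is HTTP Basic authentication with the key as the username and an empty password. If you call the API from Python instead, the sketch below reproduces the header that curl builds. The helper name is hypothetical, and reading the key from an environment variable is an assumption that mirrors the curl examples.

```python
import base64
import os


def basic_auth_header(api_key: str) -> dict:
    """Build the Authorization header that `curl --user KEY:` would send."""
    # Basic auth encodes "username:password"; here the password is empty.
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Read the key from the environment, as the curl examples do.
api_key = os.environ.get("POLARIS_API_KEY", "")
headers = basic_auth_header(api_key)
```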

Create a basic table

To create a table in Polaris, issue a POST request to the Tables v1 API. At a minimum, the following details are required to create a table:

  • Table name. Table names must be unique in your Polaris organization. Creating a table with the same name as an existing table results in a 409 Conflict error. You use the table name in API requests to get information about a table, update a table, or ingest data into a table.
  • Table type. Polaris tables are either detail tables or aggregate tables. A detail table is a regular table that stores each record as it is ingested. An aggregate table combines multiple rows together based on the table’s rollup granularity and dimensions.

Keep the following in mind when creating a table:

  • You cannot change a table's name or type.
  • All tables have the default timestamp column __time whether or not you define a schema at creation time.
  • Tables are created in flexible mode by default.

Sample request

The following example shows a basic table request to create a detail table named Koalas:

{
"name": "Koalas",
"type": "detail"
}

Polaris applies the default values for all settings not specified. For more details on properties you can set for a table, see Create a table with a schema.

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables" \
--user ${POLARIS_API_KEY}: \
--header "Content-Type: application/json" \
--data-raw '{
"name": "Koalas",
"type": "detail"
}'
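For readers who prefer Python to curl, here is a rough equivalent of the request above using only the standard library. It prepares the request without sending it, so you can inspect it first. The function name and the YOUR_API_KEY placeholder are illustrative, not part of the Polaris API.

```python
import base64
import json
import urllib.request


def build_create_table_request(base_url: str, api_key: str, name: str, table_type: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that creates a table."""
    body = json.dumps({"name": name, "type": table_type}).encode("utf-8")
    # Same credentials format as `curl --user KEY:` (empty password).
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/tables",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


request = build_create_table_request(
    "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID",
    "YOUR_API_KEY",
    "Koalas",
    "detail",
)
# To send it, pass the prepared request to urllib.request.urlopen(request).
```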

Sample response

The following example shows a successful response from creating a table. Even if you don't define a schema when you create a table, Polaris adds the default timestamp column named __time.

{
"schema": [
{
"name": "__time",
"dataType": "timestamp"
}
],
"type": "detail",
"name": "Koalas",
"id": "cc69003c-0c0f-430b-93e2-c60745d9329b",
"version": 0,
"createdTimestamp": "2023-04-14T00:05:50.882753433Z",
"lastUpdateTimestamp": "2023-04-14T00:05:50.882761613Z",
"availability": "available",
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"partitioningGranularity": "day",
"totalDataSizeBytes": 0,
"totalRows": 0,
"clusteringColumns": [],
"description": null,
"schemaMode": "flexible",
"queryableSchema": []
}

If you get a 400 Bad Request error, confirm that the request payload is properly formatted JSON and contains both the type and name fields.
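One way to catch these two causes before sending the request is a small local check, sketched below in Python. The function is hypothetical and covers only the conditions named above; Polaris may reject a payload for other reasons.

```python
import json


def check_create_payload(raw: str) -> list:
    """Return a list of problems that would likely trigger a 400 Bad Request."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as err:
        return [f"payload is not valid JSON: {err.msg}"]
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    problems = []
    for field in ("name", "type"):
        if field not in payload:
            problems.append(f"missing required field: {field}")
    return problems


print(check_create_payload('{"name": "Koalas", "type": "detail"}'))  # prints []
print(check_create_payload('{"name": "Koalas"}'))  # prints ['missing required field: type']
```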

Create a table with a schema

In addition to the name and table type, you can specify the following characteristics when you create a table:

  • Description. A description of the table limited to 1000 characters.

  • Schema mode. The schema enforcement mode of the table, either flexible or strict. The schema mode determines how Polaris enforces a schema on the table. The default mode is flexible.

    • For a flexible table, Polaris auto-discovers the table schema during ingestion.

      Besides letting you skip the table schema declaration, flexible tables can use schema auto-discovery on streaming ingestion jobs. With auto-discovery, you can omit inputSchema and mappings from the streaming ingestion job spec. For more information, see Create a streaming ingestion job by API.

    • For a strict table, you must explicitly assign a schema to the table before ingesting data into the table. You can also declare columns in the table schema for a flexible table.

  • Table schema. The schema field of the request payload takes an array of column definitions.

    The schema requirement depends on the schema enforcement mode of the table. With strict mode, you must declare all columns of the table schema. With flexible mode, declaring columns is optional. Declaring a column enforces a strict schema for that column, whereas the data type of an undeclared column may change as more data is ingested.

    The format of a column definition depends on the table type.

    • For a detail table, declare each column with the following syntax:

      {
      "dataType": <data type of the column>,
      "name": <name of the column>
      },
    • For an aggregate table, declare each column with the following syntax, setting the column type to "dimension" or "measure":

      {
      "type": <type of the column>,
      "dataType": <data type of the column>,
      "name": <name of the column>
      },
  • Partitioning schema. Polaris uses segment partitioning to decrease storage size and improve query speeds. Polaris always partitions your data by timestamp first. The following fields configure the table's partitioning schema:

    • partitioningGranularity: time partitioning
    • clusteringColumns: dimensions for secondary partitioning
  • Rollup schema. For aggregate tables only, Polaris uses data rollup to aggregate raw data at predefined intervals. You specify the rollup granularity in the timeResolution field.

    If you want to set a custom duration or time zone and origin, supply a period granularity using queryGranularity. When set, queryGranularity overrides the rollup granularity in timeResolution. See Tables v1 API for more information.

  • Storage policy. Use the storagePolicy property to optionally set a storage policy on the table. You can use a storage policy to configure automatic data deletion for data older than a certain time period. For information on creating a table with a storage policy, see Set a storage policy by API.
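The column formats and optional fields above can be assembled programmatically. The following Python sketch shows one way to do it; the helper names are hypothetical, and the only field names used are the ones listed on this page.

```python
def detail_column(name: str, data_type: str) -> dict:
    """Column definition for a detail table."""
    return {"name": name, "dataType": data_type}


def aggregate_column(name: str, data_type: str, column_type: str) -> dict:
    """Column definition for an aggregate table."""
    if column_type not in ("dimension", "measure"):
        raise ValueError('column type must be "dimension" or "measure"')
    return {"type": column_type, "name": name, "dataType": data_type}


def table_payload(name, table_type, schema, schema_mode="flexible", **options):
    """Assemble a create-table payload.

    `options` passes through optional fields named on this page, such as
    description, partitioningGranularity, clusteringColumns, or timeResolution.
    """
    payload = {"name": name, "type": table_type, "schemaMode": schema_mode, "schema": schema}
    payload.update(options)
    return payload


payload = table_payload(
    "Koalas Subset",
    "aggregate",
    [
        aggregate_column("__time", "timestamp", "dimension"),
        aggregate_column("city", "string", "dimension"),
        aggregate_column("max_session_length", "long", "measure"),
    ],
    schema_mode="strict",
    partitioningGranularity="day",
    timeResolution="hour",
)
```

Keeping the column builders separate for detail and aggregate tables makes it harder to send a column without its type to an aggregate table by accident.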

Sample request

The following example creates a table with the following characteristics:

  • Aggregate table named "Koalas Subset"
  • Strict schema mode with the table schema fully declared
  • Data partitioned by day
  • Data rolled up by hour
  • Table schema
    • __time: the primary timestamp required for all Polaris tables
    • city: a string dimension
    • session: a string dimension
    • max_session_length: a long-typed measure containing the maximum session lengths for each rolled up row

The request body to create the example is as follows:

{
"type": "aggregate",
"name": "Koalas Subset",
"partitioningGranularity": "day",
"timeResolution": "hour",
"schemaMode": "strict",
"schema": [
{
"type": "dimension",
"name": "__time",
"dataType": "timestamp"
},
{
"type": "dimension",
"name": "city",
"dataType": "string"
},
{
"type": "dimension",
"name": "session",
"dataType": "string"
},
{
"type": "measure",
"name": "max_session_length",
"dataType": "long"
}
]
}

curl --location --request POST "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables" \
--user ${POLARIS_API_KEY}: \
--header "Content-Type: application/json" \
--data-raw '{
"type": "aggregate",
"name": "Koalas Subset",
"partitioningGranularity": "day",
"timeResolution": "hour",
"schemaMode": "strict",
"schema": [
{
"type": "dimension",
"name": "__time",
"dataType": "timestamp"
},
{
"type": "dimension",
"name": "city",
"dataType": "string"
},
{
"type": "dimension",
"name": "session",
"dataType": "string"
},
{
"type": "measure",
"name": "max_session_length",
"dataType": "long"
}
]
}'

Sample response

The following example shows a successful response:

{
"timeResolution": "hour",
"schema": [
{
"dataType": "timestamp",
"type": "dimension",
"name": "__time"
},
{
"dataType": "string",
"type": "dimension",
"name": "city"
},
{
"dataType": "string",
"type": "dimension",
"name": "session"
},
{
"dataType": "long",
"queryAggregator": null,
"type": "measure",
"name": "max_session_length"
},
{
"dataType": "long",
"queryAggregator": "sum",
"type": "measure",
"name": "__count"
}
],
"type": "aggregate",
"name": "Koalas Subset",
"id": "754abbb2-c460-4771-a036-xxxxxxxxxxxx",
"version": 0,
"createdTimestamp": "2023-03-21T20:55:48.111406237Z",
"lastUpdateTimestamp": "2023-03-21T20:55:48.111413317Z",
"availability": "available",
"createdBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"partitioningGranularity": "day",
"totalDataSizeBytes": 0,
"totalRows": 0,
"clusteringColumns": [],
"description": null,
"schemaMode": "strict"
}

For measures in aggregate tables, Polaris automatically sets the aggregation function used at query time based on the ingestion time aggregation. This value is displayed in the response field queryAggregator. You do not define the query aggregator when creating the table.
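To see which query-time aggregator Polaris assigned, you can read queryAggregator from the response. The Python sketch below does this for a trimmed copy of the sample response; the function name is hypothetical.

```python
import json

# Trimmed copy of the sample response above; only the fields used here are kept.
response_text = """
{
  "name": "Koalas Subset",
  "schema": [
    {"dataType": "timestamp", "type": "dimension", "name": "__time"},
    {"dataType": "long", "queryAggregator": null, "type": "measure", "name": "max_session_length"},
    {"dataType": "long", "queryAggregator": "sum", "type": "measure", "name": "__count"}
  ]
}
"""


def measure_aggregators(table: dict) -> dict:
    """Map each measure column name to its queryAggregator value (None if unset)."""
    return {
        column["name"]: column.get("queryAggregator")
        for column in table.get("schema", [])
        if column.get("type") == "measure"
    }


print(measure_aggregators(json.loads(response_text)))
# prints {'max_session_length': None, '__count': 'sum'}
```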

Add a schema to an existing table

To add or update a schema on a table, submit a PUT request to the Tables API. Include the following information in your request:

  • name: Name of the table. You supply this both as a path parameter in the URL and as part of the JSON payload sent with the request.
  • type: The table type. Either detail or aggregate. A table's type cannot be changed.
  • version: The current version of the table. The version starts at 0 and increments each time you make a change to the table. Send a GET request to get table details to view the table type and version.
  • schema: The table schema, in the form of a JSON array of column definitions that specify columns and their corresponding data types. Strict tables require you to declare all columns in the table schema before you ingest data.

Polaris applies the updated schema to subsequent ingestion jobs and does not backfill data from previous jobs.

Existing ingestion jobs for strict tables do not automatically ingest new columns. To ingest data into new columns for strict tables, start a new ingestion job and include the new mapping from the input field to the table column in mappings. For more information, see Create an ingestion job.

Streaming ingestion jobs into flexible tables may use schema auto-discovery to automatically detect and ingest new input fields as undeclared columns without having to start a new ingestion job.

For more information on schema changes, see Updating a schema.
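Because the PUT request repeats the table's name, type, and current version, it can be convenient to build the request body from the table details returned by a GET request. The Python sketch below illustrates the idea; the function name is hypothetical, and current stands in for the parsed GET response.

```python
def schema_update_payload(current_table: dict, new_schema: list) -> dict:
    """Build the PUT body for a schema update from the table's current details.

    `current_table` is the parsed JSON from a GET request for the table.
    """
    return {
        "name": current_table["name"],
        "type": current_table["type"],        # unchanged; a table's type is fixed
        "version": current_table["version"],  # the table's current version
        "schema": new_schema,
    }


# Stand-in for the parsed response of a GET request for the table.
current = {"name": "Koalas Geography", "type": "detail", "version": 0}
body = schema_update_payload(
    current,
    [
        {"name": "__time", "dataType": "timestamp"},
        {"name": "continent", "dataType": "string"},
    ],
)
```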

Sample request

The following example shows how to set the schema for a table named Koalas Geography:

curl --location --request PUT "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables/Koalas%20Geography" \
--user ${POLARIS_API_KEY}: \
--header "Content-Type: application/json" \
--data-raw '{
"type": "detail",
"name": "Koalas Geography",
"version": 0,
"schema": [
{
"name": "__time",
"dataType": "timestamp"
},
{
"name": "continent",
"dataType": "string"
},
{
"name": "country",
"dataType": "string"
},
{
"name": "city",
"dataType": "string"
}
]
}'
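Because the table name is also a path parameter, a name that contains spaces, such as Koalas Geography, has to be percent-encoded in the URL. A small Python sketch of one way to do that follows; the helper name is hypothetical.

```python
from urllib.parse import quote


def table_url(base_url: str, table_name: str) -> str:
    """Return the URL for a single table, percent-encoding the table name."""
    # safe="" also encodes "/" so the name always stays one path segment.
    return f"{base_url}/tables/{quote(table_name, safe='')}"


print(table_url(
    "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID",
    "Koalas Geography",
))
# prints https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables/Koalas%20Geography
```

The JSON payload still carries the unencoded name; only the URL path needs encoding.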

Sample response

The following example shows a successful response for setting the table's schema:

{
"type": "detail",
"name": "Koalas Geography",
"id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
"version": 1,
"lastUpdateTimestamp": "2022-07-21T23:02:47.354643482Z",
"availability": "available",
"createdBy": {
"username": "service-account-docs-demo",
"userId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
},
"lastModifiedBy": {
"username": "service-account-docs-demo",
"userId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
},
"partitioningGranularity": "day",
"totalDataSizeBytes": 0,
"totalRows": 0,
"schema": [
{
"name": "__time",
"dataType": "timestamp"
},
{
"name": "continent",
"dataType": "string"
},
{
"name": "country",
"dataType": "string"
},
{
"name": "city",
"dataType": "string"
}
]
}

An invalid table schema results in a 400 Bad Request status code with a JSON body describing the error. Some factors that cause a table schema to be invalid include the following:

  • Empty column names.
  • Leading or trailing spaces in column names.
  • A column name prefixed with double underscores.
  • More than 400 columns.
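The factors above can be checked locally before you submit a schema. The following Python sketch is a hypothetical pre-flight check, not Polaris's own validation. It assumes that the built-in __time column is exempt from the double-underscore rule, which this page implies but does not state.

```python
MAX_COLUMNS = 400  # limit stated on this page


def schema_problems(schema: list) -> list:
    """Return reasons a schema would be rejected, based on the factors listed above."""
    problems = []
    if len(schema) > MAX_COLUMNS:
        problems.append(f"more than {MAX_COLUMNS} columns defined")
    for column in schema:
        name = column.get("name", "")
        if not name:
            problems.append("empty column name")
        elif name != name.strip():
            problems.append(f"leading or trailing spaces in {name!r}")
        # Assumption: the built-in __time column is exempt from the prefix rule.
        elif name.startswith("__") and name != "__time":
            problems.append(f"double-underscore prefix in {name!r}")
    return problems


example = [{"name": "__time"}, {"name": " city"}, {"name": "__secret"}, {"name": ""}]
for problem in schema_problems(example):
    print(problem)
```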

Change the schema mode

You can change a table's schema enforcement mode between strict and flexible. For information on the conditions required for changing the schema mode, see Schema mode conversion.

To change a table's schema mode using the API, issue a PUT request to the Tables API. Include the following information in your request:

  • Table name. You supply this both as a path parameter in the URL and as part of the JSON payload sent with the request.
  • Table type. Either detail or aggregate. A table's type cannot be changed.
  • Version number. The current version of the table. The version starts at 0 and increments each time you make a change to the table. Send a GET request to get table details to view the table type and version.
  • Schema mode. Either strict or flexible.

Sample request

The following example shows how to change the schema mode from strict to flexible for a table named Koalas:

curl --location --request PUT "https://ORGANIZATION_NAME.REGION.CLOUD_PROVIDER.api.imply.io/v1/projects/PROJECT_ID/tables/Koalas" \
--header 'Content-Type: application/json' \
--user ${POLARIS_API_KEY}: \
--data '{
"type": "aggregate",
"name": "Koalas",
"version": 0,
"schemaMode": "flexible"
}'

Sample response

The following example shows a successful response for changing the table's schema mode:

{
"schema": [
{
"dataType": "timestamp",
"type": "dimension",
"name": "__time"
}
],
"timeResolution": "millisecond",
"name": "Koalas",
"type": "aggregate",
"version": 1,
"availability": "available",
"clusteringColumns": [],
"compactionConfig": null,
"createdBy": {
"username": "xxxxxxxxxxxx@imply.io",
"userId": "d84ec32d-933d-43b2-9904-xxxxxxxxxxxx"
},
"createdByUser": {
"username": "xxxxxxxxxxxx@imply.io",
"userId": "d84ec32d-933d-43b2-9904-xxxxxxxxxxxx"
},
"createdOnTimestamp": "2023-07-11T20:58:56.10012Z",
"createdTimestamp": "2023-07-11T20:58:56.10012Z",
"description": null,
"id": "3d3bab1d-73de-41f1-a146-234df4a5bef4",
"lastModifiedBy": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"lastUpdateTimestamp": "2023-07-11T21:28:33.336331775Z",
"modifiedByUser": {
"username": "api-key-pok_vipgj...bjjvyo",
"userId": "a52cacf6-3ddc-48e5-8675-xxxxxxxxxxxx"
},
"modifiedOnTimestamp": "2023-07-11T21:28:33.336331775Z",
"partitioningGranularity": "day",
"queryableSchema": [],
"storagePolicy": null,
"schemaMode": "flexible",
"segmentCompactedBytes": 0,
"segmentTotalBytes": 0,
"totalDataSizeBytes": 0,
"totalRows": 0
}
