IngestionJobSpec
Specification for a batch ingestion job.
Example
{
  "jobType": "ingestion",
  "fileList": ["logicalFilename1", ...],
  "timestampMapping": {
    "inputField": "myTimestampColumn",
    "format": "iso",
    "missingValue": "2022-01-01"
  },
  "columnRenames": [
    {
      "inputField": "source",
      "outputColumn": "target"
    },
    ...
  ],
  "maxParsingErrors": 1000,
  "formatSettings": {
    "format": "nd-json"
  }
}
Properties
jobType
String required
Possible values: ingestion
Type of the job.
fileList
array[String] required
List of files to ingest. All listed files must have the same format (for example, newline-delimited JSON) and, if specified, the same formatSettings. To ingest files with different formats or format settings into the same table, split them into multiple ingestion jobs.
isReplace
Boolean
Default: false
Controls behavior when the data to ingest falls within an interval for which the table already has data. If false (the default), Polaris appends the new data and preserves any existing data for those intervals. If true, the new data replaces any existing data in the overlapping intervals.
intervals
array[String]
Specifies the intervals of data to ingest.
Required if you set isReplace to true.
If not specified, empty, or null (the default), Polaris discovers the intervals by inspecting the data to ingest. If specified, Polaris ignores data outside the given intervals.
Each interval must have a span coarser than the table’s time partitioning. For example, if a table has a time partitioning of day, you cannot specify an eight-hour interval such as 2022-06-01T00:00:00Z/2022-06-01T08:00:00Z; however, you can specify an eight-day interval such as 2022-06-01/2022-06-09.
Example: ["2021-05-01/2021-05-02", "2021-08-01/2021-08-02"]
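As an illustration of how isReplace and intervals work together, the following sketch replaces eight days of data in a table assumed to be partitioned by day. The file name, timestamp column, missing-value default, and interval are placeholders reused from the examples on this page, not values from a real table.

```json
{
  "jobType": "ingestion",
  "fileList": ["logicalFilename1"],
  "isReplace": true,
  "intervals": ["2022-06-01/2022-06-09"],
  "timestampMapping": {
    "inputField": "myTimestampColumn",
    "format": "iso",
    "missingValue": "2022-01-01"
  },
  "formatSettings": {
    "format": "nd-json"
  }
}
```

Because isReplace is true, intervals must be set explicitly. Under the behavior described above, existing data within 2022-06-01/2022-06-09 is replaced, and rows in the file that fall outside that interval are ignored.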
timestampMapping
TimestampMapping required
Describes which input field should be used as the Druid timestamp column.
For rows that do not have the specified input timestamp field, define their default timestamp in missingValue.
columnRenames
Array
Any column renames to apply. Specify each column rename as a JSON object with the following fields:
inputField: Name of the field in the input data.
outputColumn: Desired new name for the input field.
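For example, a columnRenames value that renames two input fields might look like the following. Only the columnRenames property of the spec is shown, and the field and column names are hypothetical.

```json
{
  "columnRenames": [
    {
      "inputField": "src_ip",
      "outputColumn": "sourceIp"
    },
    {
      "inputField": "dst_ip",
      "outputColumn": "destinationIp"
    }
  ]
}
```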
maxParsingErrors
Integer
Maximum number of parsing errors allowed before the job fails.
maxSavedParsingErrors
Integer
Default: 5
Maximum number of parsing errors to save.
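A sketch of the two parsing-error properties set together; the values are illustrative only. Here maxParsingErrors sets the number of parsing errors allowed before the job fails to 1000, and maxSavedParsingErrors raises the number of saved errors from the default of 5 to 20.

```json
{
  "maxParsingErrors": 1000,
  "maxSavedParsingErrors": 20
}
```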
excludedColumns
array[String]
Columns that should not be ingested.
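For example, to skip two columns during ingestion, list them by name. Only the excludedColumns property of the spec is shown, and the column names are hypothetical.

```json
{
  "excludedColumns": ["debugInfo", "rawPayload"]
}
```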
formatSettings
JSON object
Data format settings that apply to all files in the ingestion job. Define the appropriate settings based on the format of the files to ingest.
Polaris automatically detects the file type based on the file extension. If you specify a value for format in formatSettings that does not match the automatically detected type, Polaris attempts to ingest based on the user-specified value.