Supported data and file formats

This topic is a reference for data and file format support in Imply Polaris. In an ingestion job, Polaris automatically detects the data format or compression format from the file extension. If you specify a format that does not match the automatically detected type, Polaris attempts to ingest based on the user-specified value.

Supported source data formats

The following table describes the data formats that Polaris supports for batch and streaming ingestion.

Format                      | Batch ingestion | Streaming ingestion
Newline-delimited JSON      | Yes             | Yes
Delimiter-separated values  | Yes             | Yes
Apache ORC                  | Yes             | No
Apache Parquet              | Yes             | No
Apache Avro                 | Yes (Avro OCF)  | Yes (not supported for push streaming)
Protocol Buffers (Protobuf) | No              | Yes (not supported for push streaming)

Polaris supports nested data for all supported data formats.

For details on how to specify your input data schema for Avro and Protobuf formats, see Specify input schema by API.

If your file uses the UTF-8 character encoding (the most common text encoding), ensure that the file does not begin with a byte order mark. A byte order mark can interfere with parsing of the UTF-8 data.
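As an illustration of the byte order mark check, the following minimal Python sketch removes a leading UTF-8 byte order mark from raw file content before upload. The function name strip_utf8_bom and the sample record are hypothetical and not part of Polaris.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample: one newline-delimited JSON record preceded by a BOM.
sample = codecs.BOM_UTF8 + b'{"timestamp": "2024-01-01T00:00:00Z", "value": 1}\n'
cleaned = strip_utf8_bom(sample)
```

Reading a text file with Python's "utf-8-sig" codec has a similar effect, because that codec skips the byte order mark when decoding.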

Supported file compression formats

Polaris supports the following compression formats for uploaded files:

ZIP files and TAR files are not supported.

You can send gzipped data in push streaming ingestion with the HTTP header Content-Encoding: gzip.
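The following Python sketch shows one way a client might prepare a gzipped newline-delimited JSON payload with the Content-Encoding: gzip header. It only builds the header and body; the push streaming endpoint URL, authentication, and any other headers are not shown here and depend on your Polaris setup. The function name and the sample events are hypothetical.

```python
import gzip
import json


def build_gzip_push_payload(events):
    """Return (headers, body) for a gzipped newline-delimited JSON payload."""
    # One JSON object per line, encoded as UTF-8 without a byte order mark.
    ndjson = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    body = gzip.compress(ndjson)
    # Content-Encoding tells the receiver that the body is gzip-compressed.
    headers = {"Content-Encoding": "gzip"}
    return headers, body


# Hypothetical sample events for illustration only.
events = [
    {"timestamp": "2024-01-01T00:00:00Z", "value": 1},
    {"timestamp": "2024-01-01T00:00:01Z", "value": 2},
]
headers, body = build_gzip_push_payload(events)
```

The returned headers and body can then be passed to the HTTP client of your choice.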

File size limit

Polaris supports individual files up to 10 GB. This limit refers to the size of the file transmitted by the browser or HTTP client.

You may upload a file that's larger than 10 GB on disk if your browser or client compresses the file to less than 10 GB in transit.
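To estimate whether a large file would fit under the limit once compressed, a rough Python sketch such as the following can measure the gzip-compressed size of data read in chunks, without holding the whole file in memory. It makes two assumptions that are not stated in this topic: that the client compresses with gzip, and that 10 GB means 10 × 1000³ bytes. The names are hypothetical.

```python
import zlib

# Assumption: the limit is interpreted as decimal gigabytes.
LIMIT_BYTES = 10 * 1000**3


def gzipped_size(chunks):
    """Return the number of bytes the chunks occupy after gzip compression."""
    # wbits=31 makes zlib emit a gzip header and trailer around the stream.
    compressor = zlib.compressobj(wbits=31)
    total = 0
    for chunk in chunks:
        total += len(compressor.compress(chunk))
    total += len(compressor.flush())
    return total


def fits_limit(chunks, limit=LIMIT_BYTES):
    """Return True if the gzip-compressed size is within the limit."""
    return gzipped_size(chunks) <= limit


# Hypothetical sample: highly repetitive data compresses far below its raw size.
sample_chunks = [b"a" * 1_000_000 for _ in range(3)]
compressed_size = gzipped_size(sample_chunks)
```

The actual size in transit depends on the compression your browser or client applies, so treat the result as an estimate.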

Timestamp requirements

Polaris requires all data to have a timestamp. The timestamp is used to partition and sort data and to perform time-based data management operations, such as dropping time chunks. You can transform timestamps or fill in missing timestamps in your ingestion job. For information about parsing and transforming timestamps from your source data, see Timestamp expressions.

Late arriving event data

For streaming ingestion, the event timestamp must be within 30 days of ingestion time. Polaris rejects events with timestamps older than 30 days. To override this period, set the late message rejection period in the ingestion job.

Otherwise, if you need to ingest older data, use batch ingestion.
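The 30-day rule above can be sketched as a client-side check that decides whether an event is recent enough for streaming ingestion or should go through batch ingestion instead. This is an illustration only, not how Polaris evaluates events. The function names are hypothetical, and treating an event that is exactly 30 days old as acceptable is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Default late message rejection period described in this section.
DEFAULT_REJECTION_PERIOD = timedelta(days=30)


def is_within_rejection_period(event_time, now=None, period=DEFAULT_REJECTION_PERIOD):
    """Return True if event_time is no older than `period` relative to `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: an event exactly `period` old is still accepted.
    return now - event_time <= period


def choose_ingestion_path(event_time, now=None, period=DEFAULT_REJECTION_PERIOD):
    """Return "streaming" for recent events and "batch" for older ones."""
    if is_within_rejection_period(event_time, now=now, period=period):
        return "streaming"
    return "batch"


# Fixed reference time so the example is reproducible.
reference_now = datetime(2024, 6, 30, tzinfo=timezone.utc)
recent_event = reference_now - timedelta(days=29)
old_event = reference_now - timedelta(days=31)
```

Passing a longer period models a job in which the late message rejection period has been overridden.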

Learn more

For information on supported timestamp formats in Polaris, see Timestamp expressions.