This topic is a reference for data and file format support in Imply Polaris. In an ingestion job, Polaris automatically detects the data format or compression format from the file extension. If you specify a format that does not match the automatically detected type, Polaris attempts to ingest the data based on the user-specified value.
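The detect-then-override behavior can be pictured with a small sketch. It is only an illustration and not Polaris's implementation. The extension tables (`DATA_FORMATS`, `COMPRESSION_FORMATS`, `UNSUPPORTED_ARCHIVES`) and the function `resolve_format` are hypothetical, and they list only extensions for formats this page mentions (Avro, gzip, and the unsupported ZIP and TAR archives). For simplicity, only the data format can be overridden here.

```python
from pathlib import PurePath
from typing import Optional, Tuple

# Hypothetical extension tables for illustration only; they are not
# Polaris's actual detection rules.
DATA_FORMATS = {".avro": "avro"}
COMPRESSION_FORMATS = {".gz": "gzip"}
UNSUPPORTED_ARCHIVES = {".zip", ".tar"}


def resolve_format(
    filename: str, user_format: Optional[str] = None
) -> Tuple[Optional[str], Optional[str]]:
    """Return (data_format, compression) for an uploaded file name.

    The data format is guessed from the extension, but a user-specified
    format always takes precedence, even when it disagrees with the guess.
    """
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    if any(s in UNSUPPORTED_ARCHIVES for s in suffixes):
        raise ValueError(f"archive files are not supported: {filename}")

    # Peel off a trailing compression extension such as ".gz".
    compression = None
    if suffixes and suffixes[-1] in COMPRESSION_FORMATS:
        compression = COMPRESSION_FORMATS[suffixes.pop()]

    detected = DATA_FORMATS.get(suffixes[-1]) if suffixes else None
    return (user_format if user_format is not None else detected), compression
```

For example, `resolve_format("events.avro.gz")` yields `("avro", "gzip")`, while `resolve_format("events.bin", "avro")` trusts the caller and yields `("avro", None)`.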
Supported source data formats
The following table describes the data formats that Polaris supports for batch and streaming ingestion.
| Format | Batch ingestion | Streaming ingestion |
|---|---|---|
| Apache Avro | Yes (Avro OCF) | Yes (not supported for push streaming) |
| Protocol Buffers (Protobuf) | No | Yes (not supported for push streaming) |
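Batch ingestion accepts Avro as object container files (OCF). According to the Apache Avro specification, an OCF begins with the ASCII bytes `O`, `b`, `j` followed by the byte value 1, so a quick pre-upload sanity check can read just those four bytes. The helper names `is_avro_ocf_header` and `looks_like_avro_ocf` are hypothetical. The check inspects only the header of an uncompressed file; a gzipped OCF would need to be decompressed first, and a matching header does not prove the rest of the file is valid.

```python
# Magic bytes defined by the Apache Avro object container file spec:
# ASCII "O", "b", "j" followed by the byte 1.
AVRO_OCF_MAGIC = b"Obj\x01"


def is_avro_ocf_header(header: bytes) -> bool:
    """Return True if `header` starts with the Avro OCF magic bytes."""
    return header[: len(AVRO_OCF_MAGIC)] == AVRO_OCF_MAGIC


def looks_like_avro_ocf(path: str) -> bool:
    """Read only the first few bytes of an uncompressed file and check them."""
    with open(path, "rb") as f:
        return is_avro_ocf_header(f.read(len(AVRO_OCF_MAGIC)))
```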
Polaris supports nested data for all supported data formats.
For details on how to specify your input data schema for Avro and Protobuf formats, see Specify input schema by API.
Supported file compression formats
Polaris supports the following compression formats for uploaded files:
ZIP files and TAR files are not supported.
You can send gzipped data in push streaming ingestion with the HTTP header
File size limit
Polaris supports individual files up to 10 GB. This limit refers to the size of the file transmitted by the browser or HTTP client.
You may upload a file that's larger than 10 GB on disk if your browser or client compresses the file to less than 10 GB in transit.
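Because the limit applies to the bytes in transit, one rough pre-upload check is to measure a file's gzip-compressed size without storing the compressed output. This sketch assumes the client compresses with gzip and reads "10 GB" as 10 * 10**9 bytes; both are assumptions, as are the names `compressed_size`, `fits_upload_limit`, and `MAX_UPLOAD_BYTES`. The bytes your browser or HTTP client actually sends can differ, so treat the result as an estimate.

```python
import gzip
import io
import shutil
from typing import BinaryIO

# Assumption: "10 GB" read as decimal gigabytes (10 * 10**9 bytes).
MAX_UPLOAD_BYTES = 10 * 10**9


class _ByteCounter(io.RawIOBase):
    """Writable sink that discards data and only counts the bytes written."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        n = len(b)
        self.count += n
        return n


def compressed_size(stream: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Return the gzip-compressed size of `stream`, reading it in chunks."""
    counter = _ByteCounter()
    with gzip.GzipFile(fileobj=counter, mode="wb") as gz:
        shutil.copyfileobj(stream, gz, chunk_size)
    return counter.count


def fits_upload_limit(stream: BinaryIO, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Estimate whether the compressed transfer would stay within `limit`."""
    return compressed_size(stream) <= limit
```

For a file on disk, pass the object returned by `open(path, "rb")`; streaming in chunks keeps memory use flat even for files far larger than the limit.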
Polaris requires all data to have a timestamp. The timestamp is used to partition and sort data and to perform time-based data management operations, such as dropping time chunks. You can transform timestamps or fill in missing timestamps in your ingestion job. For information about parsing and transforming timestamps from your source data, see Timestamp expressions.
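To picture how a timestamp drives partitioning, sorting, and dropping time chunks, the following sketch groups rows into one chunk per UTC day and sorts each chunk by time. It mirrors the behavior described above and is not how Polaris stores data. The day granularity, the `__time` key, the `default_time` fallback for rows missing a timestamp, and the function names are all assumptions for illustration; timestamps are expected to be timezone-aware.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Optional


def day_chunk(ts: datetime) -> str:
    """Return the UTC day a timezone-aware timestamp falls in, e.g. '2024-03-01'."""
    return ts.astimezone(timezone.utc).date().isoformat()


def partition_by_day(
    rows: List[dict], default_time: Optional[datetime] = None
) -> Dict[str, List[dict]]:
    """Group rows into day-sized time chunks, each sorted by timestamp.

    Rows without a "__time" value get `default_time`; if no default is
    given, the row is rejected, since every row needs a timestamp.
    """
    chunks: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        ts = row.get("__time")
        if ts is None:
            if default_time is None:
                raise ValueError(f"row has no timestamp: {row!r}")
            ts = default_time
        chunks[day_chunk(ts)].append({**row, "__time": ts})
    for chunk in chunks.values():
        chunk.sort(key=lambda r: r["__time"])
    return dict(chunks)
```

Dropping a time chunk then amounts to removing one key, for example `chunks.pop("2024-03-01")`, which discards every row from that day at once.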
Late arriving event data
For streaming ingestion, the event timestamp must be within 30 days of ingestion time. Polaris rejects events with timestamps older than 30 days. To override this period, set the late message rejection period in the ingestion job.
Alternatively, if you need to ingest older data, use batch ingestion.
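The 30-day rule can be expressed as a client-side pre-check that routes each event either to streaming or to a batch backfill. This is a minimal sketch, assuming timezone-aware timestamps and treating an event as late only when it is strictly older than the rejection period; the exact boundary handling and the names `is_late`, `split_for_ingestion`, and `DEFAULT_REJECTION_PERIOD` are assumptions, not documented Polaris behavior. If your ingestion job overrides the late message rejection period, pass the same value as `rejection_period`.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

# Default rejection period described above; an ingestion job can override it.
DEFAULT_REJECTION_PERIOD = timedelta(days=30)


def is_late(
    event_time: datetime,
    ingestion_time: Optional[datetime] = None,
    rejection_period: timedelta = DEFAULT_REJECTION_PERIOD,
) -> bool:
    """Return True if streaming ingestion would likely reject this event."""
    if ingestion_time is None:
        ingestion_time = datetime.now(timezone.utc)
    # Assumption: an event exactly at the boundary is still accepted.
    return event_time < ingestion_time - rejection_period


def split_for_ingestion(
    event_times: Iterable[datetime],
    ingestion_time: datetime,
    rejection_period: timedelta = DEFAULT_REJECTION_PERIOD,
) -> Tuple[List[datetime], List[datetime]]:
    """Split timestamps into (stream, batch) lists."""
    stream: List[datetime] = []
    batch: List[datetime] = []
    for t in event_times:
        (batch if is_late(t, ingestion_time, rejection_period) else stream).append(t)
    return stream, batch
```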
For information on supported timestamp formats in Polaris, see Timestamp expressions.