File formats

When you use Imply Lumi in your observability workflows, you typically have an incoming stream of events for the logs you want to capture and analyze. In some cases, you might need to ingest a batch of events stored in a file. This can be useful when performing a backfill to load historical events or for a quick evaluation of your data in Lumi.

Of the available ingestion integrations, you can use file upload and S3 pull to upload events as a batch. This topic lists the supported file and compression formats that you can use for these integrations.

Event formats

The following table describes the event formats supported for batch ingestion:

Format	File upload	S3 pull
Splunk® HEC JSON	✅	✅
JSON (generic)	✅	✅
Splunk® CSV	✅	✅
CSV (generic)	✅	✅
Plain text	❌	✅

The following sections describe the formats in more detail.

Splunk HEC JSON

The Splunk HEC JSON format refers to the HTTP request format for the Splunk HEC endpoint. At minimum, it requires the top-level field event that contains the raw event message.

Lumi automatically detects the Splunk HEC JSON format when the object has the top-level field event and at least one of the following top-level fields: time, host, source, sourcetype, index, or fields.

To send an event with user attributes, include the attributes in a nested JSON object assigned to fields. The following example shows a Splunk HEC JSON event that includes attributes for key1 and key2:

{
  "event": "Demo log",
  "time": "2025-11-14T22:46:11Z",
  "index": "demo",
  "fields": {
    "key1": "value1",
    "key2": [
      "value2.0",
      "value2.1"
    ]
  }
}

If you include a custom attribute, such as status, as a top-level field instead of nesting it in fields, Lumi doesn't assign it as a user attribute.

For more information, see Format events for HTTP event collector in the Splunk documentation and Lumi HEC examples.
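
As a rough illustration of the nesting rule, the following Python sketch builds an event like the example above. The helper name build_hec_event and its parameters are hypothetical and not part of any Lumi or Splunk API. It only shows that user attributes go under fields, not at the top level.

```python
import json
from datetime import datetime, timezone


def build_hec_event(message, attributes, index=None, when=None):
    """Return a Splunk HEC JSON object with user attributes nested under "fields".

    Hypothetical helper for illustration only.
    """
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    event = {
        "event": message,
        # ISO 8601 UTC timestamp, in the same style as the example above.
        "time": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # User attributes are nested here, never added as top-level keys.
        "fields": dict(attributes),
    }
    if index is not None:
        event["index"] = index
    return event


payload = build_hec_event(
    "Demo log",
    {"key1": "value1", "key2": ["value2.0", "value2.1"]},
    index="demo",
    when=datetime(2025, 11, 14, 22, 46, 11, tzinfo=timezone.utc),
)
print(json.dumps(payload, indent=2))
```

Because the object contains event plus time, index, and fields, it matches the detection rule for Splunk HEC JSON described earlier in this section.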

JSON

For generic JSON, Lumi extracts each top-level field as a separate user attribute. You can format JSON events using any of the following structures:

Separate objects: each object is a separate event.

For example:

{"time": "2025-11-14T22:46:11Z", "event": "example log 1"}{"time": "2025-11-14T23:46:11Z", "event": "example log 2"}

Array of objects: each object in the array is an event.

For example:

[
  {"time": "2025-11-14T22:46:11Z", "event": "example log 1"},
  {"time": "2025-11-14T23:46:11Z", "event": "example log 2"}
]

Newline-delimited objects: each line is a separate event.

For example:

{"time": "2025-11-14T22:46:11Z", "event": "example log 1"}
{"time": "2025-11-14T23:46:11Z", "event": "example log 2"}

Lumi doesn't support the following:

A single JSON object that contains all the events.
JSON files exported from Splunk.
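
To check locally that a file uses one of the three supported structures, a sketch like the following can help. It is not Lumi's parser. The function name iter_events is hypothetical, and it relies only on the Python standard library's json.JSONDecoder.raw_decode to read one top-level value at a time.

```python
import json


def iter_events(text):
    """Yield one event per object from any of the three supported layouts."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            # Array of objects: each element is an event.
            yield from value
        else:
            # Separate or newline-delimited objects: each object is an event.
            yield value


separate = '{"event": "example log 1"}{"event": "example log 2"}'
array = '[{"event": "example log 1"}, {"event": "example log 2"}]'
ndjson = '{"event": "example log 1"}\n{"event": "example log 2"}\n'

for sample in (separate, array, ndjson):
    print([e["event"] for e in iter_events(sample)])
```

Each sample yields the same two events. A single JSON object that wraps all events would yield only one item here, which reflects why that layout is listed as unsupported.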

Splunk CSV

The Splunk CSV format represents the format of a CSV file that Splunk exports. When Lumi reads this format, it looks for the fields _raw and _time, which contain the raw event message and timestamp respectively. A Splunk export includes these fields by default. See the Splunk documentation for more information.
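
Before uploading, you might want to confirm that an exported file has the two expected columns. The following Python sketch does that with the standard csv module. The sample rows, the host column, and the function name read_splunk_csv are invented for illustration and don't come from a real export.

```python
import csv
import io


def read_splunk_csv(handle):
    """Return (timestamp, raw message) pairs from a Splunk-style CSV export."""
    reader = csv.DictReader(handle)
    missing = {"_raw", "_time"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    return [(row["_time"], row["_raw"]) for row in reader]


# Invented sample data; a real export typically has more columns.
sample = io.StringIO(
    '"_raw","_time","host"\n'
    '"example log 1","2025-11-14T22:46:11Z","web-1"\n'
    '"example log 2","2025-11-14T23:46:11Z","web-2"\n'
)
for timestamp, message in read_splunk_csv(sample):
    print(timestamp, message)
```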

CSV

For events in generic CSV format, Lumi automatically parses each field as a user attribute. The CSV format requires a header line.

Compressed files

Compressing a file reduces its size for storage and transfer. You can ingest a compressed file as long as its contents use one of the supported event formats.

Lumi supports the following compression formats:

Brotli
BZIP2
DEFLATE
GZIP
LZMA
LZ4
Snappy
XZ
Z
ZSTD
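
As one way to prepare a compressed file, the following Python sketch writes newline-delimited JSON events to a GZIP file using the standard gzip module, then reads the file back to verify it. The output path and the .json.gz naming are assumptions made for this example. This topic doesn't state a required naming scheme for compressed files.

```python
import gzip
import json
import os
import tempfile

events = [
    {"time": "2025-11-14T22:46:11Z", "event": "example log 1"},
    {"time": "2025-11-14T23:46:11Z", "event": "example log 2"},
]

# Hypothetical output location; ".json" stays in place as the base extension.
path = os.path.join(tempfile.mkdtemp(), "events.json.gz")

# Write one JSON object per line into the GZIP stream.
with gzip.open(path, "wt", encoding="utf-8") as out:
    for event in events:
        out.write(json.dumps(event) + "\n")

# Read the file back to confirm that the round trip preserves the events.
with gzip.open(path, "rt", encoding="utf-8") as src:
    restored = [json.loads(line) for line in src]

print(restored == events)
```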

Automatic format detection

Lumi automatically detects an event or file format using the following heuristics:

Check the file extension: Identify the file extension and check for the following:
- A base file extension of .json or .ndjson indicates JSON format.
- A base file extension of .csv indicates CSV format.
Inspect file contents: When Lumi can't determine the file format from the extension, it evaluates the file contents. Lumi reads the first 1024 bytes and checks for the following:
- Contents starting with { or [ indicate JSON format.
- Contents that contain commas in a consistent pattern on each line indicate CSV format.
Identify Splunk-specific formats: Using the format detected in the previous steps, Lumi checks for Splunk-specific features:
- When the JSON contains event and one of the supported metadata fields, Lumi uses Splunk HEC JSON format.
- When the CSV contains the headers _raw and _time, Lumi uses Splunk CSV format.
If the JSON or CSV format doesn't have Splunk-specific indicators, Lumi parses events in generic JSON or CSV format, respectively.
If neither JSON nor CSV is detected, Lumi treats the event format as plain text and proceeds with line-based parsing.
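
The heuristics above can be approximated in a short Python sketch. This is not Lumi's implementation. The function names, the list of compression suffixes, and the reading of "consistent pattern" as an equal, nonzero comma count per line are all assumptions. The sketch also assumes the data is already decompressed and checks only the first JSON object for Splunk indicators.

```python
import csv
import json
import os

HEC_METADATA = {"time", "host", "source", "sourcetype", "index", "fields"}
# Assumed compression suffixes to strip before reading the base extension.
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz", ".zst", ".lz4", ".br", ".z"}


def base_extension(filename):
    root, ext = os.path.splitext(filename.lower())
    if ext in COMPRESSION_SUFFIXES:
        ext = os.path.splitext(root)[1]
    return ext


def looks_like_csv(head):
    lines = [line for line in head.splitlines() if line.strip()]
    if len(lines) > 1 and not head.endswith("\n"):
        lines = lines[:-1]  # the last line may be cut off at the 1024-byte mark
    counts = {line.count(",") for line in lines}
    # Assumption: "consistent" means the same nonzero comma count on each line.
    return len(counts) == 1 and counts.pop() > 0


def first_json_object(text):
    try:
        value, _ = json.JSONDecoder().raw_decode(text.lstrip())
    except json.JSONDecodeError:
        return None
    if isinstance(value, list) and value:
        value = value[0]
    return value if isinstance(value, dict) else None


def detect_format(filename, data):
    text = data.decode("utf-8", errors="replace")
    head = data[:1024].decode("utf-8", errors="replace")
    ext = base_extension(filename)
    # Extension first; fall back to the first 1024 bytes of content.
    if ext in (".json", ".ndjson") or (
        ext != ".csv" and head.lstrip().startswith(("{", "["))
    ):
        kind = "json"
    elif ext == ".csv" or looks_like_csv(head):
        kind = "csv"
    else:
        return "plain text"
    # Then look for Splunk-specific indicators.
    if kind == "json":
        obj = first_json_object(text)
        if obj and "event" in obj and HEC_METADATA.intersection(obj):
            return "Splunk HEC JSON"
        return "JSON"
    header = next(csv.reader(head.splitlines()[:1]), [])
    if {"_raw", "_time"} <= set(header):
        return "Splunk CSV"
    return "CSV"


print(detect_format("events.log", b'{"message": "hello", "level": "info"}\n'))
print(detect_format("export.csv", b'"_raw","_time"\n"example log 1","2025-11-14T22:46:11Z"\n'))
```

Simple comma counting misreads quoted fields that contain commas, so treat the content check as a rough approximation only.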

The following diagram shows the decision tree of format detection for an example event. The arrows in bold show the pathway of the example event detected as generic JSON.

Format detection diagram
