
Event formats reference


When you use Imply Lumi in your observability workflows, you typically have an incoming stream of events for the logs you want to capture and analyze. In some cases, you might need to ingest a batch of events stored in a file. This can be useful when performing a backfill to load historical events or for a quick evaluation of your data in Lumi.

This topic lists the supported event and file compression formats you can use to batch ingest events.
The formats apply to file upload and S3 pull.

Supported formats

You can use the following event formats for batch ingestion:

  • CSV
  • JSON
  • Splunk CSV
  • Splunk HEC
  • Plain text

The following sections describe the formats in more detail. See Automatic detection for the heuristics Lumi uses to detect the event format and Manual specification for how to set the format explicitly.

CSV

For events in generic CSV format, Lumi automatically parses each field as a user attribute.

Consider the following example CSV:

host,timestamp,location
127.0.0.1,24/Mar/2025:16:25:29 -0500,"Chicago, IL"

Lumi generates the following user attributes, shown in alphabetical order:

host: 127.0.0.1
location: Chicago, IL
timestamp: 24/Mar/2025:16:25:29 -0500

When you manually specify CSV instead of using format auto-detection, you can provide the following settings:

  • Delimiter: Defaults to , for CSV. Specify any single character or \t for TSV.
  • Rows to skip: Defaults to 0. Set this value if you want to skip any preamble rows or override headers in the file.
  • Custom headers: By default, Lumi infers headers from the file. You can provide your own header names, separated by commas (regardless of your delimiter). For example, timestamp,message,host.
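
As a concrete illustration of the defaults above, the same mapping can be sketched with Python's csv module. This is illustrative only, not Lumi's parser:

```python
import csv
import io

# The example CSV from above; headers come from the first row.
data = 'host,timestamp,location\n127.0.0.1,24/Mar/2025:16:25:29 -0500,"Chicago, IL"\n'

# DictReader infers headers from the first row and uses "," as the
# delimiter, mirroring the default settings described above.
reader = csv.DictReader(io.StringIO(data))
for row in reader:
    # Each field becomes a user attribute, shown in alphabetical order.
    for name, value in sorted(row.items()):
        print(f"{name}: {value}")
```

Note that the quoted field "Chicago, IL" survives as a single value even though it contains the delimiter.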

JSON

For generic JSON, Lumi extracts each top-level field as a separate user attribute. You can format JSON events using any of the following structures:

  • Separate objects: each object is a separate event.

    For example:

    {"time": "2025-11-14T22:46:11Z", "log": "example log 1"}{"time": "2025-11-14T23:46:11Z", "log": "example log 2"}
  • Array of objects: each object in the array is an event.

    For example:

    [
      {"time": "2025-11-14T22:46:11Z", "log": "example log 1"},
      {"time": "2025-11-14T23:46:11Z", "log": "example log 2"}
    ]
  • Newline-delimited objects: each line is a separate event.

    For example:

    {"time": "2025-11-14T22:46:11Z", "log": "example log 1"}
    {"time": "2025-11-14T23:46:11Z", "log": "example log 2"}

Lumi doesn't support the following:

  • A single JSON object that contains all the events.
  • JSON files exported from Splunk.
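
To show that the three supported structures all decompose into the same event list, here is a minimal Python sketch (illustrative only, not Lumi's implementation):

```python
import json

# Newline-delimited objects: each non-empty line is one event.
ndjson = (
    '{"time": "2025-11-14T22:46:11Z", "log": "example log 1"}\n'
    '{"time": "2025-11-14T23:46:11Z", "log": "example log 2"}\n'
)
events = [json.loads(line) for line in ndjson.splitlines() if line.strip()]

# Array of objects: the array itself is the event list.
events_from_array = json.loads(
    '[{"time": "2025-11-14T22:46:11Z", "log": "example log 1"},'
    ' {"time": "2025-11-14T23:46:11Z", "log": "example log 2"}]'
)

# Separate concatenated objects: decode incrementally with raw_decode.
concatenated = (
    '{"time": "2025-11-14T22:46:11Z", "log": "example log 1"}'
    '{"time": "2025-11-14T23:46:11Z", "log": "example log 2"}'
)
decoder = json.JSONDecoder()
events_from_stream, pos = [], 0
while pos < len(concatenated):
    obj, pos = decoder.raw_decode(concatenated, pos)
    events_from_stream.append(obj)

# All three structures yield the same two events.
assert events == events_from_array == events_from_stream
```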

Splunk CSV

The Splunk CSV format is the format of a CSV file exported from Splunk. When Lumi reads this format, it looks for the fields _raw and _time, which contain the raw event message and the timestamp, respectively. A Splunk export includes these fields by default. See the Splunk documentation for more information.
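
For orientation, the snippet below reads the two key columns from a hypothetical two-row sample shaped like a Splunk CSV export; the sample data is invented, and real exports typically include additional columns:

```python
import csv
import io

# Hypothetical sample resembling a Splunk CSV export; real exports
# include _time and _raw by default, alongside other columns.
export = (
    '"_time","_raw"\n'
    '"2025-03-24T16:25:29.000-05:00","127.0.0.1 - GET /index.html 200"\n'
)

for row in csv.DictReader(io.StringIO(export)):
    # _raw holds the raw event message, _time the event timestamp.
    timestamp, message = row["_time"], row["_raw"]
    print(timestamp, "->", message)
```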

Splunk HEC

The Splunk HEC format refers to the JSON format for an HTTP request to the Splunk HEC endpoint. At minimum, it requires the top-level field event that contains the raw event message.

Lumi automatically detects Splunk HEC when the object has the top-level fields for event and at least one of the Splunk event metadata fields: time, host, source, sourcetype, index, fields.

To send an event with user attributes, include the attributes in a nested JSON object assigned to fields. If you include a custom attribute—for example, status—as a top-level field, Lumi doesn't assign the user attribute. For more information, see Format events for HTTP event collector in the Splunk documentation.

The following example shows a Splunk HEC JSON event that includes attributes for key1 and key2:

{
  "event": "Demo log",
  "time": "2025-11-14T22:46:11Z",
  "index": "demo",
  "fields": {
    "key1": "value1",
    "key2": [
      "value2.0",
      "value2.1"
    ]
  }
}

In this case, Lumi generates the following user attributes:

index: demo
key1: value1
key2: [value2.0, value2.1]
sourcetype: httpevent

For additional examples, see Send events with Splunk HEC.
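
The attribute mapping described above can be sketched in Python: the index metadata field and everything nested under fields become user attributes, while the raw message stays in event. This is an assumed, illustrative mapping, not Lumi's code:

```python
import json

# The HEC example from above: custom attributes live under "fields".
hec_event = json.loads('''
{
  "event": "Demo log",
  "time": "2025-11-14T22:46:11Z",
  "index": "demo",
  "fields": {
    "key1": "value1",
    "key2": ["value2.0", "value2.1"]
  }
}
''')

# Flatten metadata plus nested fields into attributes; a custom field
# placed at the top level (outside "fields") would NOT be included.
attributes = {"index": hec_event["index"], **hec_event.get("fields", {})}
print(sorted(attributes))  # ['index', 'key1', 'key2']
```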

Plain text

Plain text encompasses all other event formats. The raw log line becomes the event message, and the time at which Lumi received the event becomes the event timestamp. You can set up a pipeline to extract user attributes or manually assign the event timestamp.

Compressed files

A compressed format allows you to save storage space and streamline data management tasks. You can ingest a compressed file in one of the supported event formats.

Lumi supports the following compression formats:

  • Brotli
  • BZIP2
  • DEFLATE
  • GZIP
  • LZMA
  • LZ4
  • Snappy
  • XZ
  • Z
  • ZSTD
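
As one example, newline-delimited JSON events can be GZIP-compressed before ingestion. The sketch below round-trips the payload in memory using Python's standard library:

```python
import gzip
import json

events = [
    {"time": "2025-11-14T22:46:11Z", "log": "example log 1"},
    {"time": "2025-11-14T23:46:11Z", "log": "example log 2"},
]

# Serialize as newline-delimited JSON, then gzip the payload in memory.
payload = "\n".join(json.dumps(event) for event in events).encode("utf-8")
compressed = gzip.compress(payload)

# A file written with these bytes and named, e.g., events.json.gz keeps
# the .json base extension that format auto-detection relies on.
assert gzip.decompress(compressed) == payload
```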

Automatic detection

Lumi automatically detects the event format using the following heuristics:

  1. Determine the format based on the base file extension. For a compressed file like .json.gz, the base file extension is .json.

    • CSV: indicated by the extension .csv
    • JSON: indicated by extension .json or .ndjson
  2. If the file extension is inconclusive, evaluate file contents in the first 1024 bytes:

    • CSV: contains at least two rows, where the first row contains at least two non-blank columns such as time,event
    • JSON: starts with { or [
  3. If neither CSV nor JSON is detected, Lumi treats the event format as plain text and proceeds with line-based parsing.

  4. Identify any Splunk-specific formats:

    • Splunk CSV: a CSV file whose header includes the _raw and _time fields
    • Splunk HEC: a JSON object with the top-level field event and at least one Splunk event metadata field (time, host, source, sourcetype, index, fields)

  5. If Lumi doesn't detect a Splunk format, parse events in generic CSV or JSON format, respectively.

The following diagram shows the decision tree of format detection for an example event. The arrows in bold show the pathway of the example event detected as generic JSON.

Format detection diagram
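
The generic-format heuristics above can be sketched as follows. The ordering of the content checks and the set of compression suffixes are assumptions (JSON is checked first here so that NDJSON content doesn't trip the comma-based CSV heuristic), and the Splunk-specific steps are omitted; this is not Lumi's implementation:

```python
import pathlib

# Assumed set of compression suffixes to strip before step 1.
COMPRESSION_EXTS = {".gz", ".bz2", ".xz", ".zst", ".br", ".lz4", ".sz", ".z"}

def detect_format(filename: str, head: bytes) -> str:
    """Sketch of the detection heuristics for the generic formats."""
    # Step 1: base file extension, ignoring a compression suffix like .gz.
    suffixes = pathlib.Path(filename).suffixes
    if suffixes and suffixes[-1].lower() in COMPRESSION_EXTS:
        suffixes = suffixes[:-1]
    base_ext = suffixes[-1].lower() if suffixes else ""
    if base_ext == ".csv":
        return "csv"
    if base_ext in (".json", ".ndjson"):
        return "json"
    # Step 2: evaluate the first 1024 bytes of the file's contents.
    text = head[:1024].decode("utf-8", errors="replace").lstrip()
    if text.startswith(("{", "[")):
        return "json"
    lines = text.splitlines()
    if len(lines) >= 2 and sum(1 for col in lines[0].split(",") if col.strip()) >= 2:
        return "csv"
    # Step 3: fall back to plain text.
    return "plain text"

print(detect_format("events.json.gz", b""))  # json
```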

Manual specification

For S3 pull ingestion, you can specify an event format so that Lumi parses events as that type rather than relying on format auto-detection. You can select any of the supported formats. For example, you might have comma-separated data that you want to treat as plain text instead of CSV. Or you might have JSON events misinterpreted as Splunk HEC, which has specific requirements on how to supply custom fields.

You can manually designate the format for S3 pull ingestion in the following ways:

  • IAM key attribute: Applies to all incoming events from S3 pull configured with the IAM key.

    To set the format on an IAM key:

    1. Go to the Keys page in Lumi and find your IAM key.
    2. For the S3 pull integration, click the ellipsis and select Edit attributes.
    3. Under Format, click the drop-down and select your format.
    4. Click Save.
  • Backfill job specification: Applies to incoming events from a single backfill job. Overrides any format assigned on the IAM key.
    When creating a job, set Format in the UI or formatOptions in your API request.

For recurring ingestion, create a separate IAM key for each format type. For backfill ingestion, create a job for each format type if you want to override the one on the key.

Learn more

See the following topics for more information: