Supported file formats

To take advantage of Apache Druid's querying capabilities, you ingest or load your original data from an input source. This topic is a reference for the supported input formats and sources.

Natively supported input formats by type

Druid supports the following input file types natively without loading an extension:

JSON

Streaming ingestion: Kafka, Kinesis
Batch ingestion: Index parallel (index), Azure blob
Druid supports newline-delimited JSON. See also the JSON data format and the data ingestion tutorial.
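
As a rough illustration of what "newline-delimited JSON" means, the sketch below parses two records that each sit on their own line, with no enclosing array. The sample rows and the `parse_ndjson` helper are made up for this example; only the `{"type": "json"}` input format object reflects how a JSON input format is typically declared in an ingestion spec.

```python
import json

# Hypothetical sample data: one complete JSON object per line, no enclosing array.
NDJSON_SAMPLE = (
    '{"timestamp": "2024-01-01T00:00:00Z", "page": "Home", "views": 3}\n'
    '{"timestamp": "2024-01-01T00:01:00Z", "page": "Docs", "views": 5}\n'
)


def parse_ndjson(text):
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


rows = parse_ndjson(NDJSON_SAMPLE)

# Minimal inputFormat object for JSON data, serialized the way it would
# appear inside an ingestion spec (assumed placement: the ioConfig section).
json_input_format = {"type": "json"}
input_format_text = json.dumps(json_input_format)
```

Each element of `rows` corresponds to one line of the input, which is the shape Druid expects when it reads JSON records.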

Comma separated values (CSV)

Streaming ingestion: Kafka, Kinesis
Batch ingestion: Index parallel (index)
See CSV data format.

Tab separated values (TSV) and custom delimiters

Streaming ingestion: Kafka, Kinesis
Batch ingestion: Index parallel (index)
See TSV data format.
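
The following sketch shows how input format objects for CSV, TSV, and a custom delimiter might be assembled. The `delimited_input_format` helper and the column names are hypothetical; the `type`, `columns`, `findColumnsFromHeader`, and `delimiter` keys are assumed to match the CSV and TSV data format references, so check those topics before relying on them.

```python
import json


def delimited_input_format(kind, columns=None, delimiter=None):
    """Build an inputFormat dict for delimited text (hypothetical helper).

    kind      -- "csv" or "tsv"
    columns   -- explicit column names; if omitted, read them from the header row
    delimiter -- custom field delimiter, only meaningful for "tsv"
    """
    if kind not in ("csv", "tsv"):
        raise ValueError("kind must be 'csv' or 'tsv'")
    if delimiter is not None and kind != "tsv":
        raise ValueError("a custom delimiter requires kind='tsv'")

    spec = {"type": kind}
    if columns:
        spec["columns"] = list(columns)
    else:
        spec["findColumnsFromHeader"] = True
    if delimiter is not None:
        spec["delimiter"] = delimiter
    return spec


# Column names below are placeholders for illustration only.
csv_format = delimited_input_format("csv", columns=["timestamp", "page", "views"])
tsv_format = delimited_input_format("tsv")
pipe_format = delimited_input_format("tsv", columns=["timestamp", "page"], delimiter="|")

print(json.dumps(pipe_format, indent=2))
```

Custom delimiters are expressed as a TSV input format with a different `delimiter`, which is why the helper rejects a delimiter for `csv`.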

Input formats supported with an extension by type

Druid supports the following input file types through an extension:

Avro OCF

Extension: druid-avro-extensions
Batch ingestion: Index parallel (index), or Hadoop with the Avro Hadoop parser
See Avro OCF data format.

Avro Stream

Extension: druid-avro-extensions
Streaming ingestion: Kafka, Kinesis
See Avro stream data format.

Parquet

Extension: druid-parquet-extensions
Streaming ingestion: Kafka, Kinesis
Batch ingestion: Index parallel (index), or Hadoop with the Parquet Hadoop parser
See Parquet data format.

ORC

Extension: druid-orc-extensions
Batch ingestion: Index parallel (index), or Hadoop with the ORC Hadoop parser
See ORC data format.

Protobuf

Extension: druid-protobuf-extensions
Streaming ingestion: Kafka, Kinesis
See Protobuf parser.
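
Formats in this group only work once the matching extension is loaded. As a minimal sketch, the snippet below maps each format listed above to its extension and renders a load-list line. The `extensions_for` and `load_list_property` helpers are hypothetical, and the `druid.extensions.loadList` property (usually set in `common.runtime.properties`) is an assumption to confirm against the extensions documentation for your deployment.

```python
import json

# Format-to-extension mapping taken from the list above (keys lowercased).
FORMAT_EXTENSIONS = {
    "avro ocf": "druid-avro-extensions",
    "avro stream": "druid-avro-extensions",
    "parquet": "druid-parquet-extensions",
    "orc": "druid-orc-extensions",
    "protobuf": "druid-protobuf-extensions",
}


def extensions_for(formats):
    """Return the sorted, de-duplicated extensions needed for the given formats."""
    needed = set()
    for name in formats:
        key = name.strip().lower()
        if key not in FORMAT_EXTENSIONS:
            raise KeyError(f"no extension known for format: {name!r}")
        needed.add(FORMAT_EXTENSIONS[key])
    return sorted(needed)


def load_list_property(extensions):
    """Render a load-list property line (property name is an assumption)."""
    return "druid.extensions.loadList=" + json.dumps(extensions)


needed = extensions_for(["Avro OCF", "Avro Stream", "ORC"])
print(load_list_property(needed))
```

Both Avro formats share one extension, so the helper de-duplicates before rendering the line.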

Supported input formats by ingestion type

This section lists the file formats Druid supports for each ingestion type.

Streaming ingestion

Druid supports the following file formats for streaming ingestion:

Kafka

Kinesis

Azure blob

JSON

Batch ingestion

Druid supports the following file formats for batch ingestion:

Index parallel (index)

Hadoop batch ingestion

Learn more

See the following topics for more information:

Ingestion overview for basic information about ingestion.
Data formats for examples and configuration options.
