Docker

In this quickstart, we will download the Apache Druid image from Docker Hub and set it up on a single machine using Docker and Docker Compose. The cluster will be ready to load data after completing this initial setup.

Before beginning the quickstart, it is helpful to read the general Druid overview and the ingestion overview, as the tutorials will refer to concepts discussed on those pages. Additionally, familiarity with Docker is recommended.

Prerequisites

  • Docker

Getting started

The Druid source code contains an example docker-compose.yml that pulls a Druid image from Docker Hub. It is well suited as an example environment for experimenting with Docker-based Druid configurations and deployments.

Compose file

The example docker-compose.yml creates a container for each Druid service, as well as ZooKeeper and a PostgreSQL container as the metadata store. Deep storage is a local directory, by default ./storage relative to your docker-compose.yml, which is mounted as /opt/data and shared between the Druid containers that require access to deep storage. The Druid containers are configured via an environment file.
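
The sketch below illustrates the shape of that file. It is abridged: the image tag, port mappings, and the name of the environment file are assumptions for illustration, and the real example file defines every Druid service.

version: "2.2"
services:
  postgres:
    image: postgres:latest
    environment:
      - POSTGRES_USER=druid
      - POSTGRES_DB=druid
      - POSTGRES_PASSWORD=diurd        # illustrative credentials
  zookeeper:
    image: zookeeper:3.5
  coordinator:
    image: apache/druid:0.19.0         # assumed tag; match the version you pulled
    command: [ "coordinator" ]
    ports:
      - "8081:8081"
    volumes:
      - ./storage:/opt/data            # local deep storage shared across containers
    env_file:
      - environment                    # common Druid configuration
    depends_on:
      - zookeeper
      - postgres
  router:
    image: apache/druid:0.19.0
    command: [ "router" ]
    ports:
      - "8888:8888"                    # serves the Druid console
    env_file:
      - environment
    depends_on:
      - zookeeper
      - postgres
      - coordinator

The broker, historical, and middlemanager services follow the same pattern as the coordinator and router.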

Configuration

Configuration of the Druid Docker container is done via environment variables. These may additionally specify paths to the standard Druid configuration files.

Special environment variables:

  • JAVA_OPTS -- set Java options
  • DRUID_LOG4J -- set the entire log4j.xml verbatim
  • DRUID_LOG_LEVEL -- override the default log level in the default log4j configuration
  • DRUID_XMX -- set the Java -Xmx heap limit
  • DRUID_XMS -- set the Java -Xms initial heap size
  • DRUID_MAXNEWSIZE -- set the Java max new size
  • DRUID_NEWSIZE -- set the Java new size
  • DRUID_MAXDIRECTMEMORYSIZE -- set the Java max direct memory size
  • DRUID_CONFIG_COMMON -- full path to a file of Druid 'common' properties
  • DRUID_CONFIG_${service} -- full path to a file of properties for the named Druid service
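
For instance, a small environment file combining several of these variables might look like the following; the memory sizes and the path are illustrative, not recommendations:

DRUID_XMS=1g
DRUID_XMX=1g
DRUID_MAXDIRECTMEMORYSIZE=2g
DRUID_LOG_LEVEL=info
DRUID_CONFIG_COMMON=/opt/druid/conf/druid/cluster/_common/common.runtime.properties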

In addition to the special environment variables, the script that launches Druid in the container will also attempt to pass any environment variable starting with the druid_ prefix to the Druid process as a command-line configuration property. For example, the environment variable

druid_metadata_storage_type=postgresql

would be translated into

-Ddruid.metadata.storage.type=postgresql

for the Druid process in the container.

The Druid docker-compose.yml example uses a single environment file to specify the complete Druid configuration. In production use cases, however, we suggest using either DRUID_CONFIG_COMMON and DRUID_CONFIG_${service} or specially tailored, service-specific environment files.
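
As a sketch of the second approach, each service in the compose file could layer a service-specific environment file over a shared one; both file names here are hypothetical:

services:
  broker:
    image: apache/druid:0.19.0
    command: [ "broker" ]
    env_file:
      - environment-common             # settings shared by every service
      - environment-broker             # broker-specific overrides, e.g. heap sizes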

Launching the cluster

Run docker-compose up to launch the cluster attached to your shell, or docker-compose up -d to run it in the background. If you are using the example files directly, run this command from distribution/docker/ in your Druid installation directory.
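
Concretely, assuming a checkout of the Druid source or an unpacked distribution:

cd distribution/docker
docker-compose up -d              # start all containers in the background
docker-compose ps                 # verify that every service shows as Up
docker-compose logs -f router     # optionally follow one service's logs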

Once the cluster has started, you can navigate to http://localhost:8888. The Druid router process, which serves the Druid console, listens at this address.

(Screenshot: the Druid console)

It takes a few seconds for all the Druid processes to fully start up. If you open the console immediately after starting the services, you may see some errors that you can safely ignore.
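
If you prefer to script the wait rather than refresh the console, you can poll the router's health endpoint; every Druid process exposes /status/health on its HTTP port:

curl http://localhost:8888/status/health
# prints "true" once the router is ready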

From here you can follow along with the Quickstart, or extend your docker-compose.yml to add any additional external service dependencies as necessary.

Docker memory requirements

If you experience any processes crashing with a 137 error code, you likely don't have enough memory allocated to Docker. Exit code 137 means the process was killed with SIGKILL (128 + 9), typically by the kernel's out-of-memory killer. 6 GB may be a good place to start.
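
To confirm that a container was killed for memory and to watch per-container usage, standard Docker tooling is enough; the container name below is illustrative:

docker inspect --format '{{.State.OOMKilled}} (exit {{.State.ExitCode}})' docker_historical_1
docker stats --no-stream          # one-shot snapshot of per-container memory use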
