GCE Extensions

To use this Apache Druid (incubating) extension, make sure to include gce-extensions.

At the moment, this extension only enables Druid to autoscale instances in GCE.

The extension scales instances up and down through GCE Managed Instance Groups (MIG from now on). This choice eases the configuration of the machines and simplifies their management.

For this reason, to use this extension, the user must have created:

An instance template with the right machine type and image to be used to run the MiddleManager
A MIG configured to use the instance template created in the point above

Moreover, to be able to rescale the machines in the MIG, the Overlord must run with a service account that grants the following two scopes from the Compute Engine API:

https://www.googleapis.com/auth/cloud-platform
https://www.googleapis.com/auth/compute

Overlord Dynamic Configuration

The Overlord can dynamically change worker behavior.

The JSON object can be submitted to the Overlord via a POST request at:

http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker

Optional Header Parameters for auditing the config change can also be specified.

Header Param Name	Description	Default
`X-Druid-Author`	author making the config change	""
`X-Druid-Comment`	comment describing the change being done	""

A sample worker config spec is shown below:

{
  "autoScaler": {
    "envConfig" : {
      "numInstances" : 1,
      "projectId" : "super-project",
      "zoneName" : "us-central1-a",
      "managedInstanceGroupName" : "druid-middlemanagers"
    },
    "maxNumWorkers" : 4,
    "minNumWorkers" : 2,
    "type" : "gce"
  }
}
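
As an illustration of the submission step, the following minimal Python sketch builds the POST request for a spec like the sample above, including the optional audit headers. The helper name `build_worker_config_request`, the host `overlord.example.com`, the port, and the author and comment values are placeholders chosen for this example and are not part of the extension. The sketch only constructs the request; sending it is left as a commented-out line.

```python
import json
import urllib.request


def build_worker_config_request(overlord_base_url, worker_config, author="", comment=""):
    """Build (but do not send) the POST request carrying the worker config."""
    body = json.dumps(worker_config).encode("utf-8")
    request = urllib.request.Request(
        url=overlord_base_url.rstrip("/") + "/druid/indexer/v1/worker",
        data=body,
        method="POST",
    )
    request.add_header("Content-Type", "application/json")
    # The audit headers are optional, so they are only set when provided.
    if author:
        request.add_header("X-Druid-Author", author)
    if comment:
        request.add_header("X-Druid-Comment", comment)
    return request


# Placeholder values throughout: adjust project, zone, and MIG name.
worker_config = {
    "autoScaler": {
        "envConfig": {
            "numInstances": 1,
            "projectId": "super-project",
            "zoneName": "us-central1-a",
            "managedInstanceGroupName": "druid-middlemanagers",
        },
        "maxNumWorkers": 4,
        "minNumWorkers": 2,
        "type": "gce",
    }
}

request = build_worker_config_request(
    "http://overlord.example.com:8090",  # hypothetical Overlord address
    worker_config,
    author="ops",
    comment="enable GCE autoscaling",
)
# To actually submit the change:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before touching a live Overlord.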

The configuration of the autoscaler has only two levels.

The outer level specifies the type (always gce in this case) and two numeric values, maxNumWorkers and minNumWorkers, which define the bounds within which the number of instances must stay at any time.

The inner level is the envConfig, which specifies:

The numInstances, which sets how many workers are spawned at each request to provision more workers. It is safe to leave this at 1
The projectId, which names the project in which the MIG resides
The zoneName, which identifies the zone in which the MIG is located
The managedInstanceGroupName, which names the MIG containing the instances created or removed
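
The two levels described above can be checked on the client side before a spec is submitted. The following is a minimal sketch of such a check, not something the extension itself performs; the function name `validate_gce_autoscaler` and the specific rules (for example, requiring numInstances to be at least 1) are assumptions made for illustration.

```python
def validate_gce_autoscaler(spec):
    """Return a list of problems found in a worker config spec; empty if none."""
    problems = []
    auto_scaler = spec.get("autoScaler", {})

    # Outer level: type and worker bounds.
    if auto_scaler.get("type") != "gce":
        problems.append("type must be 'gce'")
    min_workers = auto_scaler.get("minNumWorkers")
    max_workers = auto_scaler.get("maxNumWorkers")
    if not isinstance(min_workers, int) or not isinstance(max_workers, int):
        problems.append("minNumWorkers and maxNumWorkers must be integers")
    elif min_workers < 0 or min_workers > max_workers:
        problems.append("need 0 <= minNumWorkers <= maxNumWorkers")

    # Inner level: envConfig fields.
    env_config = auto_scaler.get("envConfig", {})
    for key in ("projectId", "zoneName", "managedInstanceGroupName"):
        if not env_config.get(key):
            problems.append("envConfig." + key + " is missing or empty")
    num_instances = env_config.get("numInstances")
    if not isinstance(num_instances, int) or num_instances < 1:
        problems.append("envConfig.numInstances must be a positive integer")

    return problems


good_spec = {
    "autoScaler": {
        "envConfig": {
            "numInstances": 1,
            "projectId": "super-project",
            "zoneName": "us-central1-a",
            "managedInstanceGroupName": "druid-middlemanagers",
        },
        "maxNumWorkers": 4,
        "minNumWorkers": 2,
        "type": "gce",
    }
}
print(validate_gce_autoscaler(good_spec))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a spec at once.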

Please refer to the Overlord Dynamic Configuration section in the main documentation for parameters other than the ones specified here, such as selectStrategy.

Known limitations

The module internally uses the ListManagedInstances call from the API. Although the API documentation states that the call can be paged through using the pageToken argument, the responses to this call do not provide any nextPageToken with which to set that parameter. This means that the extension can operate safely with a maximum of 500 MiddleManager instances at any time (the maximum number of instances returned for each call).
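
To make the effect of the missing token concrete, the sketch below simulates a paged listing in Python. It does not call the real GCE API or reproduce the extension's code: `make_fake_fetch` is a hypothetical stand-in for the listing call, and the response is modeled as a dictionary with `managedInstances` and `nextPageToken` keys. When the simulated API omits the token, a standard paging loop stops after the first page of 500 instances.

```python
PAGE_SIZE = 500  # maximum instances returned per call, per the limitation above


def list_all_instances(fetch_page):
    """Collect instances by following nextPageToken until none is returned."""
    instances = []
    page_token = None
    while True:
        response = fetch_page(page_token)
        instances.extend(response.get("managedInstances", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            return instances


def make_fake_fetch(total, return_tokens):
    """Simulated API holding `total` instances; optionally omits nextPageToken."""
    names = ["mm-%d" % i for i in range(total)]

    def fetch_page(page_token):
        start = int(page_token) if page_token else 0
        end = min(start + PAGE_SIZE, total)
        response = {"managedInstances": names[start:end]}
        if return_tokens and end < total:
            response["nextPageToken"] = str(end)
        return response

    return fetch_page


# With 1200 simulated instances, only the first page is visible without tokens.
seen_without_tokens = list_all_instances(make_fake_fetch(1200, return_tokens=False))
seen_with_tokens = list_all_instances(make_fake_fetch(1200, return_tokens=True))
print(len(seen_without_tokens), len(seen_with_tokens))  # prints: 500 1200
```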
