Set up Clarity

You can configure an on-prem Imply deployment to use either SaaS Clarity (recommended) or an on-prem Clarity instance. Both approaches are described below.

Imply Enterprise with SaaS Clarity

SaaS Clarity offers easier setup and maintenance than on-prem Clarity. The following steps describe how to set up Imply to use SaaS Clarity. Before starting, request API credentials for your Clarity account from Imply. Once you have them, configure Clarity as follows.

For managed Imply

If you use the Imply Manager with Imply Enterprise (formerly Imply Private), follow these steps to enable Clarity:

  1. In the Imply Manager UI, click the user icon at the top-right corner of the UI.
  2. Click Master settings.
  3. Click Account.
  4. Enter the Clarity username and password provided to you by Imply into the Clarity user and Clarity password fields and save your settings.
  5. Restart all nodes in the cluster.

All clusters should now report metrics to your SaaS Clarity account.

For unmanaged Imply

If you do not use Imply Manager with Imply Enterprise, follow these steps to enable Clarity:

  1. Open the Imply configuration file, common.runtime.properties.

  2. Add clarity-emitter to druid.extensions.loadList.

    If a Druid service specifies druid.extensions.loadList independently, update that service's druid.extensions.loadList as well and restart the service. For example, if the Broker configuration includes druid.extensions.loadList, add clarity-emitter to druid/broker/runtime.properties. A complete example loadList appears after these steps.

  3. Add the following emitter configuration settings at the end of the file. If you have existing emitter configs, remove those first.

    # Enable JVM monitoring.
    druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
    
    # Enable Clarity emitter.
    druid.emitter=clarity
    
    # API details provided by Imply.
    druid.emitter.clarity.recipientBaseUrl=https://cc.imply.io/d/<orgname>
    druid.emitter.clarity.basicAuthentication=<orgname>:<apikey>
    
    # Cluster name; should be different for each cluster.
    druid.emitter.clarity.clusterName=<my-cluster-name>
    

    For additional settings, see Optional Clarity emitter configurable properties.

  4. Restart all nodes in the cluster.
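For reference, a loadList with the Clarity emitter enabled might look like the following. The other extension names are illustrative and depend on your deployment; keep your existing extensions and append clarity-emitter:

# Illustrative loadList; retain the extensions already present in your configuration.
druid.extensions.loadList=["druid-histogram", "druid-kafka-indexing-service", "clarity-emitter"]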

Optional Clarity emitter configurable properties

You can configure Clarity behavior and settings by adding the following properties to the Druid properties file, common.runtime.properties. Prepend all properties with druid.emitter.clarity. followed by the field name. For example, druid.emitter.clarity.recipientBaseUrl.

Field | Type | Description | Default | Required
----- | ---- | ----------- | ------- | --------
recipientBaseUrl | String | HTTP endpoint events will be posted to, such as http://<clarity collector host>:<port>/d/<username> | [required] | yes
basicAuthentication | String | Basic auth credentials, typically <username>:<password> | null | no
clusterName | String | Cluster name used to tag events | null | no
anonymous | Boolean | Determines if hostnames should be scrubbed from events | false | no
maxBufferSize | Integer | Maximum size of the event buffer | min(250MB, 10% of heap) | no
maxBatchSize | Integer | Maximum size of the HTTP event payload | 5MB | no
flushCount | Integer | Number of events that triggers a flush | 500 | no
flushBufferPercentFull | Integer | Percentage of buffer fill (byte-based) that triggers a flush | 25 | no
flushMillis | Integer | Period between flushes if not triggered by flushCount or flushBufferPercentFull | 60s | no
flushTimeOut | Integer | Flush timeout | Long.MAX_VALUE | no
timeOut | ISO8601 period | HTTP client response timeout | PT1M | no
batchingStrategy | String [ARRAY, NEWLINES] | How events are batched together in the payload | ARRAY | no
compression | String [NONE, LZ4, GZIP] | Compression algorithm used | LZ4 | no
lz4BufferSize | Integer | Block size in bytes for the LZ4 compressor | 65536 | no
samplingRate | Integer | Percentage of metrics to emit, for sampled metrics | 100 | no
sampledMetrics | List | The event types to sample | ["query/wait/time", "query/segment/time", "query/segmentAndCache/time"] | no
sampledNodeTypes | List | The node types to sample | ["druid/historical", "druid/peon", "druid/realtime"] | no
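For example, to emit only half of the sampled per-segment metrics, compress payloads with GZIP, and flush every 30 seconds, you could add the following. The values are illustrative, not recommendations:

# Illustrative values; tune for your workload.
druid.emitter.clarity.samplingRate=50
druid.emitter.clarity.compression=GZIP
druid.emitter.clarity.flushMillis=30000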

SSL options

The Clarity emitter supports HTTPS (TLS) without any special configuration. If you need to use a custom trust store, specify the extra properties in the following table. Prepend each property with druid.emitter.clarity.ssl., for example, druid.emitter.clarity.ssl.protocol.

If you do not specify trustStorePath, a custom SSL context is not created; the default SSL context is used instead.

Field | Description | Default | Required
----- | ----------- | ------- | --------
protocol | SSL protocol to use. | TLSv1.2 | no
trustStoreType | The type of the key store where trusted root certificates are stored. | java.security.KeyStore.getDefaultType() | no
trustStorePath | The file path or URL of the TLS/SSL key store where trusted root certificates are stored. | none | no
trustStoreAlgorithm | Algorithm used by the TrustManager to validate certificate chains. | javax.net.ssl.TrustManagerFactory.getDefaultAlgorithm() | no
trustStorePassword | The Password Provider or String password for the trust store. | none | yes, if trustStorePath is specified
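For example, a custom trust store configuration might look like the following. The path and password are placeholders for your environment:

# Hypothetical trust store location; replace with your own.
druid.emitter.clarity.ssl.protocol=TLSv1.2
druid.emitter.clarity.ssl.trustStorePath=/path/to/truststore.jks
druid.emitter.clarity.ssl.trustStorePassword=<truststore_password>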

Proxy server options

You can configure the Clarity emitter to connect to Clarity through a proxy server via HTTP tunneling with the CONNECT method.

Set the following properties to configure the proxy connection. Prepend each property with druid.emitter.clarity.proxy., for example, druid.emitter.clarity.proxy.host.

Field | Type | Description | Default | Required
----- | ---- | ----------- | ------- | --------
host | String | The hostname of the proxy server to connect to. | none | yes
port | Integer | The port to connect to on the proxy server. | none | yes
user | String | Username for basic auth, if required by the proxy server. | none | no
password | String | Password for basic auth, if required by the proxy server. | none | no
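For example, to tunnel through an authenticated proxy, you might set the following. The host, port, and credentials are placeholders:

# Hypothetical proxy endpoint and credentials; replace with your own.
druid.emitter.clarity.proxy.host=proxy.example.com
druid.emitter.clarity.proxy.port=3128
druid.emitter.clarity.proxy.user=<proxy_user>
druid.emitter.clarity.proxy.password=<proxy_password>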

On-prem Clarity

Under the covers, Clarity uses Druid to store metrics. For a production on-prem installation, you should install a separate Druid cluster (the collection cluster) to receive performance data from the monitored Druid cluster (the monitored cluster).

In evaluation settings, a single cluster can act as both the monitored and collection cluster. In a production setting, however, this is strongly discouraged: the collection cluster should run independently from the cluster being monitored, so that monitoring functions such as alerting keep working even if the availability or performance of the production cluster is degraded. Separation also prevents Clarity operations from impacting production cluster performance.

Similarly, running Pivot from the collection cluster's Imply instance, rather than the production one, provides the same benefit.

Enabling Clarity on-prem involves these steps:

  1. Set up a metrics collection cluster.
  2. Configure your monitored cluster to emit metrics to Kafka.
  3. Configure the Kafka topic to which the monitored cluster emits metrics as a data source on your metrics collection cluster.
  4. Enable the embedded Clarity UI in your Pivot configuration.

The following diagram shows a high-level architecture for monitoring your Imply cluster with Clarity:

[Diagram: Clarity architecture]

Step 1: Set up a metrics collection cluster

Skip this step if you plan to use the same cluster for metrics emitting and metrics collection. In production, use separate clusters.

Set up a cluster for metrics collection. Most metrics are query telemetry events, which are emitted once per query per segment. Clusters commonly have thousands of segments, so the volume of telemetry events can be high. Consider the following factors when sizing your cluster:

  • Minimize the size requirements of the metrics collection cluster by using load and drop rules to set a retention window on your data, as shown in the example after this list.
  • You can configure the percentage of metrics emitted by the monitored cluster. If you have high query concurrency and want to limit the amount of telemetry, set druid.emitter.clarity.samplingRate when enabling the metric emitter. Configure this property on the metrics emitting cluster, not the metrics collection cluster.
  • For a large metrics cluster, increase taskCount in your Kafka supervisor spec when configuring Kafka ingestion. This property controls the parallelism used to ingest metrics. You may also need to increase the number of data servers to run ingestion tasks and store queryable data.
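For example, the following sketch sets a retention window through the Druid Coordinator rules API, loading the most recent 30 days of metrics and dropping everything older. The datasource name matches the druid-metrics datasource created in Step 3; the period and replicant count are illustrative:

# Illustrative retention: keep 30 days of metrics, drop older segments.
curl -XPOST -H'Content-Type: application/json' \
  http://<coordinator_address>:8081/druid/coordinator/v1/rules/druid-metrics \
  -d'[
    {"type": "loadByPeriod", "period": "P30D", "tieredReplicants": {"_default_tier": 2}},
    {"type": "dropForever"}
  ]'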

Verify that the druid-histogram extension is in the druid.extensions.loadList in the druid/_common/common.runtime.properties config file. This extension is used for computing 98th percentile latency metrics.

Step 2: Enable the metric emitter on the monitored cluster

For every cluster that you want to monitor, configure the Clarity emitter by following these steps:

  1. Ensure that the clarity-emitter-kafka extension is in the druid.extensions.loadList in the druid/_common/common.runtime.properties file for the emitting cluster.

  2. Remove or comment out existing druid.emitter and druid.emitter.* configs in druid/_common/common.runtime.properties and replace them with the following:

    druid.emitter=clarity-kafka
    druid.emitter.clarity.topic=druid-metrics
    druid.emitter.clarity.producer.bootstrap.servers=kafka1.example.com:9092
    druid.emitter.clarity.clusterName=clarity-collection-cluster
    
    1. Replace kafka1.example.com:9092 with a comma-delimited list of the Kafka brokers in your environment.
    2. The clusterName value (clarity-collection-cluster above) can be any string; it helps Clarity users tell different clusters apart in the Clarity UI.

Step 3: Configure Kafka ingestion on your metrics collection cluster

Ensure that the Druid Kafka indexing service extension is loaded on the metrics collection cluster. See extensions for information on loading Druid extensions.

Download the Clarity Kafka supervisor spec. Apply the spec by running the following command from the directory where you downloaded it. Replace <overlord_address> with the IP address of the machine running the Overlord process in your Imply cluster; this is typically the Master server in the Druid cluster.

curl -XPOST -H'Content-Type: application/json' -d@clarity-kafka-supervisor.json http://<overlord_address>:8090/druid/indexer/v1/supervisor

The Clarity emitter writes to the druid-metrics topic. Start up Druid and verify that druid-metrics exists as a datasource in the collection cluster.

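You can also verify from the command line by listing the datasources known to the collection cluster's Broker (8082 is the default Broker port):

curl http://<broker_address>:8082/druid/v2/datasources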

Step 4: Configure Clarity-specific settings

Clarity maintains a connection to the Druid collection cluster that is separate from Pivot's own connection to Druid. Accordingly, you need to configure the connection separately.

Add the following minimum configuration settings to your Pivot configuration file, found at conf/pivot/config.yaml (or conf-quickstart/pivot/config.yaml for a quickstart instance) under your Imply installation home.

# Specify the metrics cluster to connect to
metricsCluster:
  host: localhost:8082 # Enter the IP of your metrics collecting broker node here

# Enter the name of your clarity data source
metricsDataSource: druid-metrics

# Instead of relying on auto-detection you can explicitly specify which clusters should be available from the cluster dropdown
clientClusters: ["default"]

# If your metrics data source does not have a histogram (approxHistogram) metric column then take it out of the UI by suppressing it
#suppressQuantiles: true

# If your metrics data source does have a histogram you can specify a tuning config here
#quantileTuning: "resolution=40"

Replace localhost in the metricsCluster configuration with the IP address of the metrics collection cluster.

Provide at least one cluster name in the clientClusters parameter, or Pivot may fail to start. The name should match the druid.emitter.clarity.clusterName value in the emitting cluster's common.runtime.properties file.

Depending on the configuration of the Druid collection cluster, you may need additional settings. If authentication is enabled in Druid, add a defaultDbAuthToken property with the auth type, username, and password to the metricsCluster configuration. For example:

metricsCluster:
  host: <broker_host>:<broker_port>
  ...
  defaultDbAuthToken:
    type: 'basic-auth'
    username: <auth_user_name>
    password: <auth_password>

If TLS is enabled, add the protocol property and provide the certificate information to the metricsCluster configuration:

metricsCluster:
  host: <MetricClusterBrokerHost>:<BrokerPort>
  protocol: tls
  ca: <certificate>

For a self-signed certificate, you can use tls-loose as the protocol:

metricsCluster:
  host: <MetricClusterBrokerHost>:<BrokerPort>
  protocol: tls-loose

Likewise, any connection parameter available for connecting Pivot to Druid can also be used in the metricsCluster configuration to connect Clarity to the metrics collection cluster. See metricsCluster connection optional parameters below for more information.

Access Clarity

If Pivot is running, restart it for the configuration changes to take effect. After restarting Pivot, you can open Clarity at the following address:

http://<pivot_address>:9095/clarity

Users need the AccessClarity permission to access the Clarity UI in Pivot. Of the built-in roles, only Super Admins have this permission, so assign it to other users and roles as appropriate for your system.

metricsCluster connection optional parameters

See Step 4: Configure Clarity-specific settings for the basic connection settings that connect Clarity to the metrics collection cluster. You can also set the following connection settings, which are optional, or required only as dictated by your metrics collection cluster's Druid configuration. They are equivalent to, but separate from, the corresponding settings in the Pivot configuration.

Field | Description
----- | -----------
timeout | The timeout for the metric queries. Default is 40000.
protocol | The connection protocol: plain (the default), tls-loose, or tls. When using the tls protocol, you must also specify ca, cert, key, and passphrase.
ca | Applies when protocol is tls. A trusted certificate of the certificate authority, if using self-signed certificates. Should be PEM-formatted text.
cert | Applies when protocol is tls. The client-side certificate to present. Should be PEM-formatted text.
key | Applies when protocol is tls. The private key file name. The key should be PEM-formatted text.
passphrase | Applies when protocol is tls. A passphrase for the private key, if needed.
defaultDbAuthToken | If Druid authentication is enabled, the default token used to authenticate against this connection.
socksHost | If Clarity needs to connect to Druid via a SOCKS5 proxy, the hostname of the proxy host.
socksUsername | The user for the SOCKS proxy, if needed.
socksPassword | The password for proxy authentication, if needed.
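For example, to raise the query timeout and route the Clarity connection through a SOCKS5 proxy, you might extend the configuration as follows. The proxy host and credentials are placeholders:

metricsCluster:
  host: <broker_host>:8082
  timeout: 60000 # raise the metric query timeout
  socksHost: socks-proxy.example.com # hypothetical proxy host
  socksUsername: <proxy_user>
  socksPassword: <proxy_password>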