Set up Clarity
You can configure an on-prem Imply deployment to use either SaaS Clarity (recommended) or an on-prem Clarity instance. Both approaches are described below.
Imply Enterprise with SaaS Clarity
SaaS Clarity offers improved setup and maintenance over on-prem Clarity. The following steps describe how to set up Imply to use SaaS Clarity. Before starting, request API credentials for your Clarity account. Once you have that information, configure Clarity as follows.
For managed Imply
If you use the Imply Manager with Imply Enterprise (formerly Imply Private), follow these steps to enable Clarity:
- In the Imply Manager UI, click the user icon at the top-right corner of the UI.
- Click Master settings.
- Click Account.
- Enter the Clarity username and password provided to you by Imply into the Clarity user and Clarity password fields and save your settings.
- Restart all nodes in the cluster.
All clusters should now report metrics to your SaaS Clarity account.
For unmanaged Imply
If you do not use Imply Manager with Imply Enterprise, follow these steps to enable Clarity:
Open the Imply configuration file,
common.runtime.properties
.Add
clarity-emitter
todruid.extensions.loadList
.If you specify
druid.extensions.loadList
for a Druid service independently, update this configuration for each respective service'sdruid.extensions.loadList
and restart the service. For example, if the Broker configuration includesdruid.extensions.loadList
, thenclarity-emitter
needs to be added todruid/broker/runtime.properties
.Add the following emitter configuration settings at the end of the file. If you have existing emitter configs, remove those first.
# Enable JVM monitoring. druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"] # Enable Clarity emitter. druid.emitter=clarity # API details provided by Imply. druid.emitter.clarity.recipientBaseUrl=https://cc.imply.io/d/<orgname> druid.emitter.clarity.basicAuthentication=<orgname>:<apikey> # Cluster name; should be different for each cluster. druid.emitter.clarity.clusterName=<my-cluster-name>
Note additional settings in Optional Clarity emitter configurable properties.
Restart all nodes in the cluster.
Optional Clarity emitter configurable properties
You can configure Clarity behavior and settings by adding the following properties to the Druid properties file, common.runtime.properties
. Prepend all properties with druid.emitter.clarity.
followed by the field name. For example, druid.emitter.clarity.recipientBaseUrl
.
Field | Type | Description | Default | Required |
---|---|---|---|---|
recipientBaseUrl | String | HTTP endpoint events will be posted to, such as http://<clarity collector host>:<port>/d/<username> | [required] | yes |
basicAuthentication | String | Basic auth credentials, typically <username>:<password> | null | no |
clusterName | String | Cluster name used to tag events | null | no |
anonymous | Boolean | Determines if hostnames should be scrubbed from events | FALSE | no |
maxBufferSize | Integer | Maximum size of event buffer | min(250MB, 10% of heap) | no |
maxBatchSize | Integer | Maximum size of HTTP event payload | 5MB | no |
flushCount | Integer | Number of events before a flush is triggered | 500 | no |
flushBufferPercentFull | Integer | Percentage of buffer fill that will trigger a flush (byte-based) | 25 | no |
flushMillis | Integer | Period between flushes if not triggered by flushCount or flushBufferPercentFull | 60s | no |
flushTimeOut | Integer | Flush timeout | Long.MAX_VALUE | no |
timeOut | ISO8601 Period | HTTP client response timeout | PT1M | no |
batchingStrategy | String [ARRAY, NEWLINES] | How events are batched together in the payload | ARRAY | no |
compression | String [NONE, LZ4, GZIP] | Compression algorithm used | LZ4 | no |
lz4BufferSize | Integer | Block size for the LZ4 compressor in bytes | 65536 | no |
samplingRate | Integer | Percentage of metrics to emit, for sampled metrics | 100 | no |
sampledMetrics | List | The event types to sample | ["query/wait/time", "query/segment/time", "query/segmentAndCache/time"] | no |
sampledNodeTypes | List | The node types to sample | ["druid/historical", "druid/peon", "druid/realtime"] | no |
SSL options
Clarity HTTP supports HTTPS (TLS) without any special configuration.
If you need to use a custom trust store, you can specify the extra configurations in the following table.
Prepend all properties with druid.emitter.clarity.ssl.
—for example, druid.emitter.clarity.ssl.protocol
.
If you do not specify trustStorePath
, a custom SSL context is not created; the default SSL context is used instead.
Field | Description | Default | Required |
---|---|---|---|
protocol | SSL protocol to use. | TLSv1.2 | no |
trustStoreType | The type of the key store where trusted root certificates are stored. | java.security.KeyStore.getDefaultType() | no |
trustStorePath | The file path or URL of the TLS/SSL key store where trusted root certificates are stored. | none | no |
trustStoreAlgorithm | Algorithm to be used by TrustManager to validate certificate chains. | javax.net.ssl.TrustManagerFactory.getDefaultAlgorithm() | no |
trustStorePassword | The Password Provider or String password for the TrustStore. | none | yes, if trustStorePath is specified. |
Proxy server options
You can configure the Clarity emitter to connect to Clarity through a proxy server via HTTP tunneling
with the CONNECT
method.
Set the following properties to configure the proxy connection.
Prepend all properties with druid.emitter.clarity.proxy.
—for example, druid.emitter.clarity.proxy.host
.
Field | Type | Description | Default | Required |
---|---|---|---|---|
host | String | The hostname of the proxy server to connect to. | none | yes |
port | Integer | The port to connect to on the proxy server. | none | yes |
user | String | Username for basic auth, if required by the proxy server. | none | no |
password | String | Password for basic auth, if required by the proxy server. | none | no |
On-prem Clarity
Under the covers, Clarity uses Druid to store metrics. For a production on-prem installation, you should install a separate Druid cluster (the collection cluster) to receive performance data from the monitored Druid cluster (the monitored cluster).
In evaluation settings, it's possible to have a single cluster acting as both the monitored and collection cluster. However, in a production setting, this is strongly discouraged; the monitoring cluster should run independently from the cluster being monitored to ensure that monitoring functions, such as alerting, continue working if the availability or performance of the the production cluster is degraded. It also prevents Clarity operations from impacting production cluster performance.
Similarly, running Pivot from the secondary Imply instance provides the equivalent advantage.
Enabling Clarity on-prem involves these steps:
- Set up a metrics collection cluster.
- Configure your monitored cluster to emit metrics to Kafka.
- Configure the Kafka topic to which the monitored cluster emits metrics as a data source on your metrics collection cluster.
- Enable the embedded Clarity UI in your Pivot configuration.
The following diagram shows a high-level architecture for monitoring your Imply cluster with Clarity:
Step 1: Set up a metrics collection cluster
Skip this step if you plan to use the same cluster for metrics emitting and metrics collection. In production, use separate clusters.
Set up a cluster for metrics collection. Most metrics are query telemetry events, which are emitted once per query per segment. It is common for clusters to have thousands of segments resulting in numerous query telemetry events. Consider the following factors when sizing your cluster:
- Minimize size requirements of the metrics collection cluster by using load and drop rules to set a retention window on your data.
- You can configure the percentage of metrics emitted by the monitored cluster. If you have high query concurrency and wish to limit the amount of telemetry emitted, set
druid.emitter.clarity.samplingRate
when enabling the metric emitter on the monitored cluster. Configure this property on the metrics emitting cluster, not the metrics collection cluster. - For a large metrics cluster, increase the size of
taskCount
in your Kafka supervisor spec when configuring Kafka ingestion. This property configures the amount of parallelism used to process metrics. You may also need to increase the number of data servers to manage ingestion jobs and store queryable data.
Verify that the druid-histogram
extension is in the druid.extensions.loadList
in the druid/_common/common.runtime.properties
config file. This extension is used for computing 98th percentile latency metrics.
Step 2: Enable the metric emitter on the monitored cluster
For every cluster that you want to monitor, configure the Clarity emitter by following these steps:
Ensure that the
clarity-emitter-kafka
extension is in thedruid.extensions.loadList
indruid/_common/common.runtime.properties
file for the emitting cluster.Remove or comment out existing
druid.emitter
anddruid.emitter.*
configs indruid/_common/common.runtime.properties
and replace them with the following:druid.emitter=clarity-kafka druid.emitter.clarity.topic=druid-metrics druid.emitter.clarity.producer.bootstrap.servers=kafka1.example.com:9092 druid.emitter.clarity.clusterName=clarity-collection-cluster
- Replace
kafka1.example.com:9092
with a comma-delimited list of Kafka brokers in your environment. - The "clarity-collection-cluster" string can be anything you want, but it is intended to be used to help Clarity users tell different clusters apart in the Clarity UI.
- Replace
Step 3: Configure Kafka ingestion on your metrics collection cluster
Ensure that the Druid Kafka indexing service extension is loaded on the metrics collection cluster. See extensions for information on loading Druid extension.
Download the Clarity Kafka supervisor spec. Apply the spec by running the following command from the directory to which you downloaded the spec.
Replace overlord_address
with the IP address of the machine running the Overlord process in your Imply cluster.
This is typically the Master server in the Druid cluster.
curl -XPOST -H'Content-Type: application/json' -d@clarity-kafka-supervisor.json http://<overlord_address>:8090/druid/indexer/v1/supervisor
The Clarity emitter will write to a druid-metrics
topic. Start up Druid and verify that druid-metrics
exists as a datasource in the collection cluster.
Step 4: Configure Clarity-specific settings
Clarity maintains a connection to the Druid collection cluster that is separate from Pivot's own connection to Druid. Accordingly, you need to configure the connection separately.
Add the following minimum configuration settings to your Pivot configuration file. You can find the Pivot configuration file in conf/pivot/config.yaml
and conf-quickstart/pivot/config.yaml
(for a quickstart instance) in your Imply installation home.
# Specify the metrics cluster to connect to
metricsCluster:
host: localhost:8082 # Enter the IP of your metrics collecting broker node here
# Enter the name of your clarity data source
metricsDataSource: druid-metrics
# Instead of relying on auto-detection you can explicitly specify which clusters should be available from the cluster dropdown
clientClusters: ["default"]
# If your metrics data source does not have a histogram (approxHistogram) metric column then take it out of the UI by suppressing it
#suppressQuantiles: true
# If your metrics data source does have a histogram you can specify a tuning config here
#quantileTuning: "resolution=40
Replace localhost
in the metricsCluster
configuration with the IP address of the metrics collection cluster.
Provide at least one cluster name in clientClusters
parameter or Pivot may fail to start up. The name should match the one
used in druid.emitter.clarity.clusterName
in the emitting cluster's common.runtime.properties
configuration file.
Depending on the configuration of the Druid collecting cluster, you may need additional settings.
If authentication is enabled in Druid, you need to add a username and password to the metricsCluster
configuration by adding the defaultDbAuthToken
property with the auth
type. For example:
metricsCluster:
host: <broker_host>:<broker_port>
...
defaultDbAuthToken:
type: 'basic-auth'
username: <auth_user_name>
password: <auth_password>
If TLS is enabled, add the protocol
property and provide the certificate information to the metricsCluster
configuration:
metricsCluster:
host: <MetricClusterBrokerHost>:<BrokerPort>
protocol: tls
ca: <certificate>
For a self-signed certificate, you can use tls-loose
as the protocol:
metricsCluster:
host: <MetricClusterBrokerHost>:<BrokerPort>
protocol: tls-loose
Likewise, you can use any connection parameter available for connecting Pivot to Druid in the metricsCluster
configuration for connecting Clarity to the metrics collection cluster as well. See
metricsCluster settings for more information on those settings.
Access Clarity
If Pivot is running, you need to restart it to have the configuration change take effect. After restarting Pivot, you can open Clarity at the following address:
http://<pivot_address>:9095/clarity
The AccessClarity
permission is required for users to access the Clarity UI in Pivot. Of the built-in roles,
only "Super Admins" have this permission, so you'll need to allocate this permission to the users and roles as
appropriate for your system.
metricsCluster connection optional parameters
See Configure Clarity settings for the basic connection settings to connect Clarity to the metrics collection cluster. You can set the following connection settings for Clarity. These settings are optional, or required only as necessitated by your metrics collection Druid configuration. They are equivalent to, but separate from, those in the Pivot configuration.
Field | Description |
---|---|
timeout | The timeout for the metric queries. Default is 40000. |
protocol | The connection protocol, one of plain (the default), tls-loose , or tls . When using the tls protocol, you must also specify ca , cert , key and passphrase . |
ca | Applies when protocol is tls. A trusted certificate of the certificate authority if using self-signed certificates. Should be PEM-formatted text. |
cert | Applies when protocol is tls. The client side certificate to present. Should be PEM-formatted text. |
key | Applies when protocol is tls. The private key file name. The key should be PEM-formatted text. |
passphrase | Applies when protocol is tls. A passphrase for the private key, if needed. |
defaultDbAuthToken | If Druid authentication is enabled, the default token used to authenticate against this connection. |
socksHost | If Clarity needs to connect to Druid via a SOCKS5 proxy, the hostname of the proxy host. |
socksUsername | The user for the Socks proxy, if needed. |
socksPassword | The password for proxy authentication, if needed. |