Metrics retrieval

How Prometheus scrapes metrics

Prometheus uses a so-called "pull mechanism": defined "targets" are "scraped" at a set interval. The resulting series of samples is stored sequentially on disk, in Prometheus's own custom format. There is no special requirement for the storage itself, as long as it is fast enough to keep up with the ingestion rate. More information can be found here. The generic configuration (shortened here) for what to collect looks like this:

global:
  scrape_interval:     20s
  evaluation_interval: 20s

rule_files: ['/prometheus/config/rules/*.yml']

scrape_configs:
- job_name: 'broker'
  file_sd_configs:
  - files:
    - /prometheus/config/targets/broker/*.json
- job_name: 'vm'
  file_sd_configs:
  - files:
    - /prometheus/config/targets/vm/*.json
- job_name: 'cadvisor'
  file_sd_configs:
  - files:
    - /prometheus/config/targets/cadvisor/*.json
- job_name: 'prometheus'
  file_sd_configs:
  - files:
    - /prometheus/config/targets/prometheus/*.json
- job_name: 'PickAname'
  scheme: http
  metrics_path: /actuator/prometheus
  tls_config:
    insecure_skip_verify: true
  file_sd_configs:
  - files:
    - /prometheus/config/targets/PickAname/*.json
- job_name: 'custom-tenant-apps'
  file_sd_configs:
  - files:
    - /prometheus/config/targets/tenant-apps/*.json

The referenced JSON files are populated by the Axual CLI, based upon the cluster configuration files. scrape_interval and evaluation_interval are set to the same value here, but can be altered independently. The rule_files setting is related to Alerting.
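
For reference, such a target file follows the standard Prometheus file-based service discovery format, sketched below. The hostname, port and label values shown here are placeholders, not values from an actual installation; the real files are generated by the Axual CLI.

[
  {
    "targets": ["broker-1.example.internal:7071"],
    "labels": {
      "axual_cluster": "cluster-example1"
    }
  }
]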

Prometheus targets

When Prometheus is running, the list and status of the current targets and their endpoints can be inspected through the web interface of the Prometheus server, via:
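
The same list of targets can also be retrieved programmatically from the Prometheus HTTP API; a minimal sketch, assuming the default port 9090 and a placeholder hostname:

curl http://prometheus.example.internal:9090/api/v1/targets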

If you decide to modify the YAML files, be aware that YAML is indentation-based: one space too many or too few and the entire configuration file will be deemed "corrupt". If this happens, the error is not very explicit, as Prometheus cannot always pinpoint where the problem occurred (i.e. it may report the wrong line number in the configuration file). Be sure to enable an equivalent of the "show whitespaces" setting in your editor when editing these types of files.
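
If the promtool binary that ships with Prometheus is available on the host, it can be used to validate the configuration before reloading; the path below is an assumption and should be adjusted to your installation:

promtool check config /prometheus/config/prometheus.yml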

Querying metrics

Metrics can be queried using Prometheus directly, or via Grafana. Below are some examples of Prometheus queries for different components.

Use "Graph" to execute those queries, it can be found in the top menu, or by visiting the following URL:

Prometheus web interface

Example query

Below is an example Prometheus query, obtaining a broker metric:

avg by(axual_cluster) (kafka_network_RequestMetrics_99thPercentile{name="TotalTimeMs",request="FetchFollower"})
Table 1. Results

Element                               Value
{axual_cluster="cluster-example1"}    547.2866666666669
{axual_cluster="cluster-example2"}    551.345

The results show that in both clusters, the 99th percentile of the total time spent handling "FetchFollower" requests (i.e. other brokers catching up with partition leaders) is roughly 550 ms, about half a second.
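
Assuming the same JMX exporter naming convention, similar queries can be written for the other Kafka request types; for example, the equivalent query for Produce requests (verify that this request label value is present in your setup) would be:

avg by(axual_cluster) (kafka_network_RequestMetrics_99thPercentile{name="TotalTimeMs",request="Produce"})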

Obtaining metrics directly from the service

Each defined endpoint (i.e. every component with the setting <component>_PROMETHEUS_ENABLED set to true, plus cAdvisor and Node Exporter, which are always enabled) is also directly accessible on the same URL that Prometheus scrapes, and it will present the full list of metrics that are "exposed".

For example, the URL

shows this:

# HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
# TYPE jmx_config_reload_failure_total counter
jmx_config_reload_failure_total 0.0
# HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_bytes gauge
jvm_buffer_pool_used_bytes{pool="direct",} 3.4966002E7
jvm_buffer_pool_used_bytes{pool="mapped",} 1.05050053816E11
...
{omitted}
...

We’re not going to list them all, as that’s far too much information. The metrics that are exposed are listed here.

The registered MBeans and the recorded metrics carry the same names, but use a different syntax.

MBean:

kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec

Metrics:

kafka_server_BrokerTopicMetrics_Count{name="MessagesInPerSec",topic="dummy",}
kafka_server_BrokerTopicMetrics_FifteenMinuteRate{name="MessagesInPerSec",topic="dummy",}
kafka_server_BrokerTopicMetrics_FiveMinuteRate{name="MessagesInPerSec",topic="dummy",}
kafka_server_BrokerTopicMetrics_MeanRate{name="MessagesInPerSec",topic="dummy",}
kafka_server_BrokerTopicMetrics_OneMinuteRate{name="MessagesInPerSec",topic="dummy",}
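
As a sketch of how these metrics can be combined in a query (the topic label values will differ per installation), the one-minute message rate per topic could be obtained with something like:

sum by(topic) (kafka_server_BrokerTopicMetrics_OneMinuteRate{name="MessagesInPerSec"})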