Alerting
Introduction
The following alerts are coming out-of-the-box with Axual 2020.2
-
Connect Metrics Unreachable - When connect metrics unreachable for more than 5 minutes
-
Connector Failing - When connect task failing for more than 5 minutes
To read more about alerting see Alerts |
platform-deploy/include/prometheus/alert-rules/connect.yml
groups:
- name: ConnectRules
rules:
- alert: ConnectMetricsUnreachable
expr: up{is_axual_connect="true"} == 0
for: 5m
labels:
severity: high
callout: true
annotations:
summary: "Instance '{{ $labels.axual_instance }}' machine '{{ $labels.instance }}' connect metrics unreachable for more than 5 minutes."
- alert: ConnectorFailing
expr: kafka_connect_connector_metrics_status{is_axual_connect="true"} == 4
for: 5m
labels:
severity: high
callout: true
annotations:
summary: "Instance '{{ $labels.axual_instance }}' machine '{{ $labels.instance }}' connect task failing for more than 5 minutes."