Alerting
The following alerts ship out of the box with Axual-Connect:
- Connect Metrics Unreachable: fires when the metrics endpoint for Connect has been unreachable for more than 5 minutes.
- Connector Failing: fires when a connector task has been failing for more than 5 minutes.
To read more about alerting, see Alerting.
Below is an example alert definition for Connect.
platform-deploy/include/prometheus/alert-rules/connect.yml
groups:
  - name: ConnectRules
    rules:
      - alert: ConnectMetricsUnreachable
        expr: up{is_axual_connect="true"} == 0
        for: 5m
        labels:
          severity: high
          callout: true
        annotations:
          summary: "Instance '{{ $labels.axual_instance }}' machine '{{ $labels.instance }}' connect metrics unreachable for more than 5 minutes."
      - alert: ConnectorFailing
        expr: kafka_connect_connector_metrics_status{is_axual_connect="true"} == 4
        for: 5m
        labels:
          severity: high
          callout: true
        annotations:
          summary: "Instance '{{ $labels.axual_instance }}' machine '{{ $labels.instance }}' connect task failing for more than 5 minutes."
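One way to check that these rules fire as described is a Prometheus rule unit test run with `promtool test rules`. The sketch below is an illustration, not part of the Axual deployment: the test file name `connect_test.yml`, its location next to `connect.yml`, and the label values `demo` and `connect-1:8080` are all assumptions chosen for the example.

```yaml
# connect_test.yml (hypothetical file name), placed next to connect.yml.
# Run with: promtool test rules connect_test.yml
rule_files:
  - connect.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Metrics endpoint reported as down for the whole test window.
      - series: 'up{is_axual_connect="true", axual_instance="demo", instance="connect-1:8080"}'
        values: '0+0x10'
      # Connector status held at 4, the value the ConnectorFailing rule matches.
      - series: 'kafka_connect_connector_metrics_status{is_axual_connect="true", axual_instance="demo", instance="connect-1:8080"}'
        values: '4+0x10'
    alert_rule_test:
      # After 6 minutes the 5m "for" window has elapsed, so both alerts should fire.
      - eval_time: 6m
        alertname: ConnectMetricsUnreachable
        exp_alerts:
          - exp_labels:
              severity: high
              callout: "true"
              is_axual_connect: "true"
              axual_instance: demo
              instance: connect-1:8080
            exp_annotations:
              summary: "Instance 'demo' machine 'connect-1:8080' connect metrics unreachable for more than 5 minutes."
      - eval_time: 6m
        alertname: ConnectorFailing
        exp_alerts:
          - exp_labels:
              severity: high
              callout: "true"
              is_axual_connect: "true"
              axual_instance: demo
              instance: connect-1:8080
            exp_annotations:
              summary: "Instance 'demo' machine 'connect-1:8080' connect task failing for more than 5 minutes."
      # At 3 minutes the alert is still pending, so no firing alert is expected.
      - eval_time: 3m
        alertname: ConnectMetricsUnreachable
        exp_alerts: []
```

If the real series carry additional labels (for example a connector name), those labels would also need to appear under `exp_labels`, because promtool compares the full label set of each firing alert.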