Consumer Lag in Records

The metric reports the lag of a consumer group on a topic/partition at any given time, i.e. the number of messages that have been published but not yet consumed by that consumer group.

Use cases

Is my application able to handle the amount of messages that are produced to the topic I am consuming?

It’s crucial to have consumers that can handle the load produced to the topic.

Using this metric you can see whether your application has any issues with that. To do so, follow the basic usage below; depending on your case, you can query it with or without an aggregator.

If the value is constant and positive, or increasing, your application is lagging and you should investigate.

Possible problems:

  • not enough consumers

  • one of the consumers is stuck (for example, can’t deserialize a message from one of the partitions)

  • not enough partitions (partitions are Kafka's unit of parallelism, so more partitions allow more consumers to read in parallel)
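
The rule above (constant and positive, or increasing, means lagging) can be sketched as a small check on a series of lag values. This is a minimal illustration in Python; the function name classify_lag and its labels are hypothetical and not part of the metrics API, and comparing only the first and last value of the window is a deliberate simplification.

```python
from typing import Sequence


def classify_lag(values: Sequence[float]) -> str:
    """Classify a time-ordered series of lag values (in records).

    Hypothetical helper for illustration; not part of the metrics API.
    Only the first and last value of the window are compared.
    """
    if not values:
        return "no-data"
    first, last = values[0], values[-1]
    if last == 0:
        # Nothing left to consume at the end of the window.
        return "caught-up"
    if last >= first:
        # Constant and positive, or increasing: investigate.
        return "lagging"
    # Lag is still positive but shrinking.
    return "recovering"


print(classify_lag([640, 640, 640]))                  # constant and positive
print(classify_lag([2135, 2135, 2085, 1795, 576]))    # shrinking
```

A real alert would likely look at a longer window than five data points before concluding that a consumer is lagging.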

Basic usage

Please refer to the consumer lag in records example provided in the API docs.


This request asks for the lag, in messages/records, of the stream payment-events-stream on environment dev for the application payment_event_emitter over the last 5 minutes, with a step size of 1 minute.

Basic Request
{
  "metric": "io.axual.application/consumer_lag_records",
  "stepSize": "PT1M",
  "timeWindow": "PT5M",
  "filter": {
    "type": "AND",
    "filters": [
      {
        "type": "FIELD",
        "field": "environment",
        "operation": "EQUALS",
        "value": "dev"
      },
      {
        "type": "FIELD",
        "field": "stream",
        "operation": "EQUALS",
        "value": "payment-events-stream"
      },
      {
        "type": "FIELD",
        "field": "applicationId",
        "operation": "EQUALS",
        "value": "payment_event_emitter"
      }
    ]
  }
}
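
If you build this request from code, the body can be assembled with a small helper. The sketch below is in Python and only mirrors the JSON structure shown above; the helper names field_filter and build_lag_request are hypothetical, and how the body is sent (endpoint, authentication) is not covered here and should be taken from the API docs.

```python
import json
from typing import Optional


def field_filter(field: str, value: str) -> dict:
    """Build one FIELD/EQUALS filter entry, as used in the request above."""
    return {"type": "FIELD", "field": field, "operation": "EQUALS", "value": value}


def build_lag_request(
    environment: str,
    stream: str,
    application_id: str,
    step_size: str = "PT1M",
    time_window: str = "PT5M",
    aggregator: Optional[str] = None,
) -> dict:
    """Assemble the request body for the consumer lag in records metric.

    Hypothetical helper; it only reproduces the JSON layout from this page.
    """
    body = {
        "metric": "io.axual.application/consumer_lag_records",
        "stepSize": step_size,
        "timeWindow": time_window,
    }
    if aggregator is not None:
        # Same shape as the aggregated request shown on this page.
        body["aggregator"] = aggregator
        body["groupBy"] = []
    body["filter"] = {
        "type": "AND",
        "filters": [
            field_filter("environment", environment),
            field_filter("stream", stream),
            field_filter("applicationId", application_id),
        ],
    }
    return body


request = build_lag_request("dev", "payment-events-stream", "payment_event_emitter")
print(json.dumps(request, indent=2))
```

Durations such as PT1M and PT5M are ISO 8601 durations (1 minute and 5 minutes).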

The following part of a sample response shows the lag of the application for each partition.

Basic Response
{
  "type": "UNGROUPED",
  "dataPoints": [
    {
      "timestamp": "2022-10-24T11:15:00",
      "value": 2135,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "0",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:16:00",
      "value": 2135,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "0",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:17:00",
      "value": 2085,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "0",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:18:00",
      "value": 1795,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "0",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:19:00",
      "value": 576,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "0",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:15:00",
      "value": 640,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "1",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:16:00",
      "value": 640,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "1",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:17:00",
      "value": 640,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "1",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:18:00",
      "value": 330,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "1",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:19:00",
      "value": 0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "1",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:15:00",
      "value": 2434,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "2",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:16:00",
      "value": 2260,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "2",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:17:00",
      "value": 2221,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "2",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:18:00",
      "value": 710,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "2",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:19:00",
      "value": 90,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "2",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:15:00",
      "value": 770,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "3",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:16:00",
      "value": 770,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "3",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:17:00",
      "value": 770,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "3",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:18:00",
      "value": 361,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "3",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:19:00",
      "value": 40,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "3",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:15:00",
      "value": 883,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "4",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:16:00",
      "value": 883,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "4",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:17:00",
      "value": 883,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "4",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:18:00",
      "value": 503,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "4",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:19:00",
      "value": 93,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "4",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Records"
    }
  ]
}

This metric can be used to see how many messages are left for the consumer to read, and to configure alerting for situations where a consumer is stuck.
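
As an illustration of such alerting, the sketch below groups the data points of a response by partition and flags partitions whose lag is positive and never decreases within the window. It is written in Python; the function names and the "never decreases" rule are assumptions made for this example, not behaviour of the API. The input is an excerpt of the sample response above (partitions 0 and 1, first three minutes), reduced to the fields the code needs.

```python
from collections import defaultdict


def lag_by_partition(response: dict) -> dict:
    """Return {partition: [lag values ordered by timestamp]}."""
    series = defaultdict(list)
    for point in response["dataPoints"]:
        partition = point["labels"]["partition"]
        series[partition].append((point["timestamp"], point["value"]))
    # ISO 8601 timestamps of the same format sort correctly as strings.
    return {p: [value for _, value in sorted(points)] for p, points in series.items()}


def stuck_partitions(response: dict) -> list:
    """Flag partitions whose lag is positive and never decreases (assumed rule)."""
    stuck = []
    for partition, values in lag_by_partition(response).items():
        never_decreases = all(b >= a for a, b in zip(values, values[1:]))
        if len(values) >= 2 and values[-1] > 0 and never_decreases:
            stuck.append(partition)
    return sorted(stuck)


# Excerpt of the sample response above, trimmed to the relevant fields.
excerpt = {
    "dataPoints": [
        {"timestamp": "2022-10-24T11:15:00", "value": 2135, "labels": {"partition": "0"}},
        {"timestamp": "2022-10-24T11:16:00", "value": 2135, "labels": {"partition": "0"}},
        {"timestamp": "2022-10-24T11:17:00", "value": 2085, "labels": {"partition": "0"}},
        {"timestamp": "2022-10-24T11:15:00", "value": 640, "labels": {"partition": "1"}},
        {"timestamp": "2022-10-24T11:16:00", "value": 640, "labels": {"partition": "1"}},
        {"timestamp": "2022-10-24T11:17:00", "value": 640, "labels": {"partition": "1"}},
    ]
}

print(stuck_partitions(excerpt))  # partition "1" shows no progress in this excerpt
```

Over the full five minutes of the sample response, the lag of partition 1 does drop, so a three-minute window is too short to draw conclusions; choose the window to match how quickly your consumers normally catch up.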

Advanced usage

Using aggregator

By adding an aggregator to the request, the number of unconsumed messages is aggregated over all partitions.

For instance, using the sum aggregation function returns the total number of unconsumed messages in the topic.

Request using sum aggregator
{
  "metric": "io.axual.application/consumer_lag_records",
  "stepSize": "PT1M",
  "timeWindow": "PT5M",
  "aggregator": "sum",
  "groupBy": [],
  "filter": {
    "type": "AND",
    "filters": [
      {
        "type": "FIELD",
        "field": "environment",
        "operation": "EQUALS",
        "value": "dev"
      },
      {
        "type": "FIELD",
        "field": "stream",
        "operation": "EQUALS",
        "value": "payment-events-stream"
      },
      {
        "type": "FIELD",
        "field": "applicationId",
        "operation": "EQUALS",
        "value": "payment_event_emitter"
      }
    ]
  }
}

The response below shows the total number of unconsumed messages of the stream on a Kafka cluster. As you can see, the value decreases over time, meaning that messages are being consumed.

Response using sum aggregator
{
  "type": "UNGROUPED",
  "dataPoints": [
    {
      "timestamp": "2022-10-24T11:15:00",
      "value": 6862,
      "labels": {},
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:16:00",
      "value": 6688,
      "labels": {},
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:17:00",
      "value": 6599,
      "labels": {},
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:18:00",
      "value": 3699,
      "labels": {},
      "unit": "Records"
    },
    {
      "timestamp": "2022-10-24T11:19:00",
      "value": 799,
      "labels": {},
      "unit": "Records"
    }
  ]
}

Whereas the sum aggregator shows all unconsumed messages, another option is to use max to see the lag of the most lagging partition.
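
To see how the aggregated values relate to the per-partition values, the aggregation can be reproduced client-side. The Python sketch below uses the per-partition values from the basic response for two of the timestamps; the function name aggregate is hypothetical, and the max values are computed locally here, under the assumption that the max aggregator takes the maximum over partitions per timestamp.

```python
from collections import defaultdict
from typing import Callable, Iterable, Tuple

# (timestamp, partition, lag) taken from the basic response above.
points = [
    ("2022-10-24T11:15:00", "0", 2135),
    ("2022-10-24T11:15:00", "1", 640),
    ("2022-10-24T11:15:00", "2", 2434),
    ("2022-10-24T11:15:00", "3", 770),
    ("2022-10-24T11:15:00", "4", 883),
    ("2022-10-24T11:19:00", "0", 576),
    ("2022-10-24T11:19:00", "1", 0),
    ("2022-10-24T11:19:00", "2", 90),
    ("2022-10-24T11:19:00", "3", 40),
    ("2022-10-24T11:19:00", "4", 93),
]


def aggregate(data: Iterable[Tuple[str, str, int]], func: Callable) -> dict:
    """Apply func (e.g. sum or max) to the lag values of each timestamp."""
    by_timestamp = defaultdict(list)
    for timestamp, _partition, value in data:
        by_timestamp[timestamp].append(value)
    return {ts: func(values) for ts, values in sorted(by_timestamp.items())}


print(aggregate(points, sum))  # totals match the sum response: 6862 and 799
print(aggregate(points, max))  # largest per-partition lag: 2434 and 576
```

The sum tells you how much work is outstanding in total, while the max points at the single partition that is furthest behind, which is useful when one consumer is stuck and the others keep up.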