Consumer Time to Catch Up

The metric provides the estimated time a consumer group needs to catch up on a topic/partition at any given time, i.e. the number of seconds needed to consume the messages that have been published but not yet consumed by this consumer group.

Use cases

Is my application able to handle the amount of messages that are produced to the topic I am consuming?

It’s crucial to have consumers that can handle the load produced to the topic.

This metric shows whether your application has any issues with that. To check, use the basic usage described below, with or without an aggregator depending on your case.

If the value is constant, or negative and increasing, your application is keeping up and there is nothing to investigate.

In other cases, there are possible problems:

  • not enough consumers

  • one of the consumers is stuck (for example, can’t deserialize a message from one of the partitions)

  • not enough partitions (partitions are Kafka's unit of parallelism, so more partitions allow faster consumption)
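
The rule of thumb above can be sketched in code. This is only an illustration: the function name looks_healthy is hypothetical and not part of the API, and it applies the rule exactly as stated (constant, or negative and increasing) to the values of one partition in chronological order.

```python
from typing import Sequence


def looks_healthy(values: Sequence[float]) -> bool:
    """Apply the rule of thumb to one partition's values (oldest first).

    Hypothetical helper: healthy means the series is constant,
    or every value is negative and the series never decreases.
    """
    if len(values) < 2:
        return True  # not enough data points to see a trend
    constant = all(v == values[0] for v in values)
    negative_and_increasing = all(v < 0 for v in values) and all(
        later >= earlier for earlier, later in zip(values, values[1:])
    )
    return constant or negative_and_increasing


print(looks_healthy([-372.0, -30.0]))    # negative and moving toward zero
print(looks_healthy([5.0, 40.0, 90.0]))  # positive and growing
```

A series that fails this check is a reason to look at the possible problems listed above, not proof that one of them is present.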

Basic usage

Please refer to the consumer time to catch up example provided in the API docs.


This request asks for the value in seconds for the stream payment-events-stream on environment dev for the application payment_event_emitter over the last 5 minutes with a step size of 1 minute.

Basic Request
{
  "metric": "io.axual.application/consumer_time_to_catch_up",
  "stepSize": "PT1M",
  "timeWindow": "PT5M",
  "filter": {
    "type": "AND",
    "filters": [
      {
        "type": "FIELD",
        "field": "environment",
        "operation": "EQUALS",
        "value": "dev"
      },
      {
        "type": "FIELD",
        "field": "stream",
        "operation": "EQUALS",
        "value": "payment-events-stream"
      },
      {
        "type": "FIELD",
        "field": "applicationId",
        "operation": "EQUALS",
        "value": "payment_event_emitter"
      }
    ]
  }
}
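
If you build this request from code, a small helper keeps the three filters consistent. The sketch below only assembles the JSON body shown above; the helper names field_filter and build_request are hypothetical, and sending the request is left out because the endpoint and authentication are not covered in this section.

```python
import json


def field_filter(field: str, value: str) -> dict:
    """One FIELD/EQUALS filter entry, as used in the request above."""
    return {"type": "FIELD", "field": field, "operation": "EQUALS", "value": value}


def build_request(environment: str, stream: str, application_id: str,
                  step_size: str = "PT1M", time_window: str = "PT5M") -> dict:
    """Assemble the request body for consumer_time_to_catch_up."""
    return {
        "metric": "io.axual.application/consumer_time_to_catch_up",
        "stepSize": step_size,
        "timeWindow": time_window,
        "filter": {
            "type": "AND",
            "filters": [
                field_filter("environment", environment),
                field_filter("stream", stream),
                field_filter("applicationId", application_id),
            ],
        },
    }


body = build_request("dev", "payment-events-stream", "payment_event_emitter")
print(json.dumps(body, indent=2))
```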

The response can contain positive, negative and zero values, which mean:

  • positive value : the consumer is lagging behind

  • negative value : the estimated time after which the consumer will be reading the last offset message

  • zero value : the consumer is neither catching up nor lagging behind
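
The three cases can be turned into a readable message, for example for a dashboard or a log line. This is a minimal sketch; the function name describe and the wording of the messages are assumptions, not API output.

```python
def describe(value: float) -> str:
    """Translate a data point value (in seconds) into a short message."""
    if value > 0:
        return f"lagging behind by an estimated {value:g} s"
    if value < 0:
        return f"expected to reach the last offset in about {-value:g} s"
    return "neither catching up nor lagging behind"


for sample_value in (0, -30.0, 12.5):
    print(sample_value, "->", describe(sample_value))
```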

Basic Response
{
  "type": "UNGROUPED",
  "dataPoints": [
    {
      "timestamp": "2022-10-24T11:16:00",
      "value": 0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "0",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:17:00",
      "value": -2502.0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "0",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:18:00",
      "value": -372.0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "0",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:19:00",
      "value": -30.0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "0",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:16:00",
      "value": 0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "1",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:17:00",
      "value": 0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "1",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:18:00",
      "value": -66.0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "1",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:19:00",
      "value": 0.0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "1",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:16:00",
      "value": -780.0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "2",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:17:00",
      "value": -3414.0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "2",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:18:00",
      "value": -30.0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "2",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:19:00",
      "value": -6.0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "2",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:16:00",
      "value": 0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "3",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:17:00",
      "value": 0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "3",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:18:00",
      "value": -54.0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "3",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:19:00",
      "value": -6.0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "3",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:16:00",
      "value": 0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "4",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:17:00",
      "value": 0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "4",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:18:00",
      "value": -78.0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "4",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:19:00",
      "value": -12.0,
      "labels": {
        "__name__": "kafka_consumergroup_lag",
        "axual_cluster": "jupiter",
        "consumergroup": "axual-qa-dev-payment_event_emitter",
        "partition": "4",
        "pod": "jupiter-kafka-exporter-6867f8ccf4-9b8fs",
        "topic": "axual-qa-dev-payment-events-stream"
      },
      "unit": "Seconds"
    }
  ]
}

This metric can be used to find out how many seconds are left for consumers to catch up and to configure alerting for situations when a consumer is lagging.
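
An alerting check on top of an ungrouped response could look like the sketch below. It assumes the response has already been parsed into a dict; the function name lagging_partitions and the threshold parameter are hypothetical, and the positive value for partition 1 in the sample is made up for illustration (it does not appear in the response above).

```python
def lagging_partitions(response: dict, threshold_seconds: float = 0.0) -> dict:
    """Return {partition: latest value} for partitions above the threshold."""
    latest = {}
    for point in response["dataPoints"]:
        partition = point["labels"]["partition"]
        current = latest.get(partition)
        # Timestamps share one ISO 8601 format, so string comparison orders them.
        if current is None or point["timestamp"] > current["timestamp"]:
            latest[partition] = point
    return {
        partition: point["value"]
        for partition, point in latest.items()
        if point["value"] > threshold_seconds
    }


sample = {
    "type": "UNGROUPED",
    "dataPoints": [
        {"timestamp": "2022-10-24T11:18:00", "value": -372.0,
         "labels": {"partition": "0"}, "unit": "Seconds"},
        {"timestamp": "2022-10-24T11:19:00", "value": -30.0,
         "labels": {"partition": "0"}, "unit": "Seconds"},
        # Made-up positive value, used only to show a lagging partition.
        {"timestamp": "2022-10-24T11:19:00", "value": 45.0,
         "labels": {"partition": "1"}, "unit": "Seconds"},
    ],
}

print(lagging_partitions(sample))
```

Only the most recent data point per partition is checked, so a partition that was lagging earlier in the window but has recovered does not trigger the alert.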

Advanced usage

Using aggregator

By adding an aggregator to the request, the seconds left to consume messages are aggregated over all partitions.

For instance, requesting the avg aggregation function returns the average estimated time to catch up.

Request using avg aggregator
{
  "metric": "io.axual.application/consumer_time_to_catch_up",
  "stepSize": "PT1M",
  "timeWindow": "PT5M",
  "groupBy": [],
  "aggregator": "avg",
  "filter": {
    "type": "AND",
    "filters": [
      {
        "type": "FIELD",
        "field": "environment",
        "operation": "EQUALS",
        "value": "dev"
      },
      {
        "type": "FIELD",
        "field": "stream",
        "operation": "EQUALS",
        "value": "payment-events-stream"
      },
      {
        "type": "FIELD",
        "field": "applicationId",
        "operation": "EQUALS",
        "value": "payment_event_emitter"
      }
    ]
  }
}

The response below shows the average number of seconds needed to consume the messages of the stream on a Kafka cluster. Note that the data points are not returned in chronological order. From 11:17 onwards the value moves toward zero, meaning that the consumers are catching up.

Response using avg aggregator
{
  "type": "UNGROUPED",
  "dataPoints": [
    {
      "timestamp": "2022-10-24T11:18:00",
      "value": -120.0,
      "labels": {},
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:19:00",
      "value": -10.8,
      "labels": {},
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:16:00",
      "value": -156.0,
      "labels": {},
      "unit": "Seconds"
    },
    {
      "timestamp": "2022-10-24T11:17:00",
      "value": -1183.2,
      "labels": {},
      "unit": "Seconds"
    }
  ]
}
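
To see where these numbers come from, the averages can be recomputed from the per-partition values of the basic response. The sketch below assumes that avg is a plain arithmetic mean over the five partitions at each timestamp; the variable names are illustrative only.

```python
from statistics import mean

# Values of partitions 0..4 per timestamp, copied from the basic response.
values_by_timestamp = {
    "2022-10-24T11:16:00": [0, 0, -780.0, 0, 0],
    "2022-10-24T11:17:00": [-2502.0, 0, -3414.0, 0, 0],
    "2022-10-24T11:18:00": [-372.0, -66.0, -30.0, -54.0, -78.0],
    "2022-10-24T11:19:00": [-30.0, 0.0, -6.0, -6.0, -12.0],
}

# Sort by timestamp so the result reads chronologically.
averages = {ts: mean(vals) for ts, vals in sorted(values_by_timestamp.items())}

for ts, avg in averages.items():
    print(ts, avg)
```

The results match the four values in the response above.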

Other aggregators are also available: sum adds up the values of all partitions, and max returns the highest value across partitions, which is the most lagging partition when the values are positive.
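
The effect of sum and max can be tried locally on the per-partition values of the basic response. This sketch assumes both aggregators work per timestamp across partitions, in the same way as avg; the function name summarize is hypothetical.

```python
# Values per partition for two timestamps, copied from the basic response.
values_by_timestamp = {
    "2022-10-24T11:18:00": {"0": -372.0, "1": -66.0, "2": -30.0, "3": -54.0, "4": -78.0},
    "2022-10-24T11:19:00": {"0": -30.0, "1": 0.0, "2": -6.0, "3": -6.0, "4": -12.0},
}


def summarize(per_partition: dict) -> dict:
    """Sum and highest value across partitions for one timestamp."""
    highest = max(per_partition, key=per_partition.get)
    return {
        "sum": sum(per_partition.values()),
        "max": per_partition[highest],
        "max_partition": highest,
    }


summary = {ts: summarize(vals) for ts, vals in values_by_timestamp.items()}

for ts, result in summary.items():
    print(ts, result)
```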

Using groupBy

To get the response grouped by a label, use groupBy.

Request using groupBy
{
  "metric": "io.axual.application/consumer_time_to_catch_up",
  "stepSize": "PT1M",
  "timeWindow": "2022-10-24T11:14:00Z/2022-10-24T11:19:00Z",
  "groupBy": [
    "partition"
  ],
  "aggregator": "avg",
  "filter": {
    "type": "AND",
    "filters": [
      {
        "type": "FIELD",
        "field": "environment",
        "operation": "EQUALS",
        "value": "dev"
      },
      {
        "type": "FIELD",
        "field": "stream",
        "operation": "EQUALS",
        "value": "payment-events-stream"
      },
      {
        "type": "FIELD",
        "field": "applicationId",
        "operation": "EQUALS",
        "value": "payment_event_emitter"
      }
    ]
  }
}

The response below differs from those of other metrics: the data is aggregated over timestamps, so each group holds a single data point with a null timestamp.

Response using groupBy
{
  "type": "GROUPED",
  "groups": [
    {
      "labels": {
        "partition": "0"
      },
      "dataPoints": [
        {
          "timestamp": null,
          "value": -726.0,
          "labels": {
            "partition": "0"
          },
          "unit": "Seconds"
        }
      ]
    },
    {
      "labels": {
        "partition": "1"
      },
      "dataPoints": [
        {
          "timestamp": null,
          "value": -16.5,
          "labels": {
            "partition": "1"
          },
          "unit": "Seconds"
        }
      ]
    },
    {
      "labels": {
        "partition": "2"
      },
      "dataPoints": [
        {
          "timestamp": null,
          "value": -1057.5,
          "labels": {
            "partition": "2"
          },
          "unit": "Seconds"
        }
      ]
    },
    {
      "labels": {
        "partition": "3"
      },
      "dataPoints": [
        {
          "timestamp": null,
          "value": -15.0,
          "labels": {
            "partition": "3"
          },
          "unit": "Seconds"
        }
      ]
    },
    {
      "labels": {
        "partition": "4"
      },
      "dataPoints": [
        {
          "timestamp": null,
          "value": -22.5,
          "labels": {
            "partition": "4"
          },
          "unit": "Seconds"
        }
      ]
    }
  ]
}
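
The grouped values can be reproduced from the basic response by averaging each partition over its timestamps. The sketch assumes that the four data points per partition shown in the basic response (11:16 to 11:19) are the ones being averaged; the variable names are illustrative only.

```python
from statistics import mean

# Values per partition at 11:16, 11:17, 11:18 and 11:19,
# copied from the basic response.
values_by_partition = {
    "0": [0, -2502.0, -372.0, -30.0],
    "1": [0, 0, -66.0, 0.0],
    "2": [-780.0, -3414.0, -30.0, -6.0],
    "3": [0, 0, -54.0, -6.0],
    "4": [0, 0, -78.0, -12.0],
}

# One average per partition, taken over all timestamps.
grouped = {partition: mean(vals) for partition, vals in values_by_partition.items()}

for partition, avg in grouped.items():
    print(partition, avg)
```

Under that assumption the results equal the values in the grouped response, which also explains why each group carries no timestamp.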