Offset Distribution
Kafka consumer applications can store their position for one or more topic partitions inside Kafka. This allows a consumer application to continue consuming where it left off if it is stopped or fails. Offset Distribution makes sure the committed offsets are distributed to all Kafka Clusters of the tenant instance. This enables consumer applications to migrate from one Kafka Cluster to another, for example an application moving from an on-premises installation to the cloud and switching to the cloud Kafka Cluster.
The Distribution Model controls whether committed offsets are distributed, and to which clusters.
How it works
The Offset Distributor syncs consumer group offsets from the source cluster to a target cluster, so that consumer applications can start on the target cluster without having to process all data on the topic.
Suppose messages are distributed to a new Kafka cluster using the Message Distributor. Without offset distribution, no consumer group offset is stored on the new cluster, so a consumer application starting there with the same group ID could process weeks of messages again.
The Offset Distributor syncs the consumer group offsets frequently. It does so by determining timestamps for all consumer groups on all Kafka topics, gathered from the __consumer_offsets topic of the source cluster.
On the target cluster, an Offset Committer receives the offset timestamps from the Offset Distributor and sets the consumer group offsets based on those timestamps.
This process has a delay of less than 1 minute, so a consumer application that starts on the target cluster may still process a limited amount of data twice.
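The timestamp-based translation can be illustrated with a small sketch. It assumes the committer picks, on the target partition, the earliest offset whose record timestamp is at or after the timestamp received for the group, which mirrors how Kafka's own timestamp lookup (offsetsForTimes) behaves. The exact rule the Offset Committer applies is not stated on this page. The function name and the two-column input format (offset, timestamp in milliseconds) are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper: translate a timestamp into an offset on the target partition.
# stdin: one "offset timestamp_ms" pair per line, in offset order (illustrative format).
# $1:    timestamp (ms) received for a consumer group.
offset_for_timestamp() {
  awk -v ts="$1" '
    # Print the first offset whose record timestamp is at or after ts, then stop.
    $2 >= ts { print $1; exit }
  '
}

# Sample target-partition log: offsets 100-103 with increasing timestamps.
printf '%s\n' '100 1000' '101 2000' '102 3000' '103 4000' | offset_for_timestamp 2500
```

With the sample data, timestamp 2500 resolves to offset 102, because offset 101 (timestamp 2000) lies before it. If no record is at or after the timestamp, the sketch prints nothing.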
Check status
From an operational standpoint, it is important to validate:
- Is the Offset Distributor running on the source cluster?
- Is the Offset Committer running on the target cluster?
In combination with the Message Distributor, this may result in 4 connector tasks running for each Kafka cluster in a bilateral or multilateral Distribution setup:
- message distributor
- offset distributor
- offset committer
- (optional) schema distributor
Enabling Offset Distributor with Distributor Helm Charts
See the Deploying Distributor page for instructions on deploying the Offset Distributor.
Enabling Offset Committer with Distributor Helm Charts
See the Deploying Distributor page for instructions on deploying the Offset Committer.
Check functionality
Offset Distributor
To check whether the Offset Distributor on the source cluster is running (apart from checking its status, logs, etc.), you can get consumer group details from the broker, for example by opening a shell in the Pod or by using RedPanda.
- First find the offset-distributor consumer group:
  /opt/kafka/bin/kafka-consumer-groups.sh --command-config /tmp/client --bootstrap-server $kafkahost --list
  The offset-distributor consumer group is named _<cluster_name>-offset-distributor-level-X-to-X, for example: _axual-demo-cluster01-offset-distributor-level-31-to-target.
- Then describe the consumer group:
  /opt/kafka/bin/kafka-consumer-groups.sh --command-config /tmp/client --bootstrap-server $kafkahost --describe --group _<cluster_name>-offset-distributor-level-X-to-X
  You should see the consumer group's CURRENT-OFFSET and LOG-END-OFFSET on the many partitions of the __consumer_offsets topic. Note the LAG and verify that the number is reduced periodically; this shows that the Offset Distributor is active.
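The two checks above can be scripted. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes that the --describe output contains a header row with a LAG column, as kafka-consumer-groups.sh normally prints. Both functions read the command output from standard input.

```shell
# Hypothetical helpers for checking the offset-distributor consumer group.

# Keep only offset-distributor groups from `--list` output (read from stdin).
distributor_groups() {
  grep -e '-offset-distributor-'
}

# Sum the LAG column of `--describe` output (read from stdin).
total_lag() {
  awk '
    # Find the LAG column in the header row, so column order does not matter.
    lag_col == 0 { for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i; next }
    # Add numeric lag values; "-" (no committed offset) is skipped.
    $lag_col ~ /^[0-9]+$/ { total += $lag_col }
    END { print total + 0 }
  '
}
```

For example, piping the --list command into distributor_groups shows the group name to use, and piping the --describe command for that group into total_lag gives a single number. Running it a few times, some seconds apart, should show the total dropping periodically when the Offset Distributor is active.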
Offset Distributor & Committer
To verify offset distribution on the target cluster, you can compare the consumer group offsets of the source cluster with those of the target cluster.
For example, the following command shows the offsets and lag of all consumer groups:
  /opt/kafka/bin/kafka-consumer-groups.sh --command-config /tmp/client --bootstrap-server $kafkahost --describe --offsets --all-groups
When Offset Distribution works properly, the offsets of consumer groups that have no active consumer application on the target cluster are updated rapidly on that cluster.
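The comparison can be partly automated. The sketch below is an illustration under stated assumptions, not a product feature: the function names are hypothetical, and it expects the output of the --describe --all-groups command from each cluster to be saved to a file, with a header row containing GROUP, TOPIC, PARTITION and CURRENT-OFFSET columns. Because the Offset Committer sets offsets from timestamps, the numeric offsets on the two clusters may differ, so the sketch only checks that each committed group, topic and partition on the source also has a committed offset on the target.

```shell
# Hypothetical helpers; assume `--describe --all-groups` output with a header row
# containing GROUP, TOPIC, PARTITION and CURRENT-OFFSET columns.

# Print "group topic partition" for every row with a committed offset (stdin).
committed_keys() {
  awk '
    # Header row: remember where each column sits.
    /CURRENT-OFFSET/ {
      for (i = 1; i <= NF; i++) {
        if ($i == "GROUP") g = i
        if ($i == "TOPIC") t = i
        if ($i == "PARTITION") p = i
        if ($i == "CURRENT-OFFSET") c = i
      }
      next
    }
    # Data row with a numeric committed offset.
    c && $c ~ /^[0-9]+$/ { print $g, $t, $p }
  ' | LC_ALL=C sort -u
}

# $1, $2: files holding the saved output from the source and the target cluster.
# Prints keys that have a committed offset on the source but not on the target.
missing_on_target() {
  src_keys=$(mktemp)
  tgt_keys=$(mktemp)
  committed_keys < "$1" > "$src_keys"
  committed_keys < "$2" > "$tgt_keys"
  LC_ALL=C comm -23 "$src_keys" "$tgt_keys"
  rm -f "$src_keys" "$tgt_keys"
}
```

An empty result from missing_on_target suggests that every committed offset on the source has a counterpart on the target. Entries for topics that are not distributed to the target cluster are expected to show up as missing and can be ignored.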