When managing Apache Kafka, you may need to monitor how many messages are stored in a topic. Although Kafka doesn’t provide a direct command to count messages, you can do this with the kafka-run-class.sh script and the kafka.tools.GetOffsetShell utility. This guide explains the steps for counting messages in a Kafka topic, with detailed examples.
Prerequisites
Before you begin, ensure the following:
- Kafka Installation: Kafka must be installed and running on your system.
- Broker Access: You need network access to at least one Kafka broker. The commands in this guide connect to brokers, not to ZooKeeper.
- Kafka Environment Configured: The environment variable KAFKA_HOME should point to your Kafka installation directory.
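Before running any of the commands, it can help to confirm that KAFKA_HOME really points at an installation. The following is a minimal sketch; the function name check_kafka_home is hypothetical and not part of Kafka, and it only checks that the launcher script exists and is executable.

```shell
#!/bin/bash
# check_kafka_home: hypothetical helper (not part of Kafka).
# Returns 0 when $KAFKA_HOME/bin/kafka-run-class.sh exists and is executable,
# and 1 (with a message on stderr) otherwise.
check_kafka_home() {
  if [ -z "${KAFKA_HOME:-}" ]; then
    echo "KAFKA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$KAFKA_HOME/bin/kafka-run-class.sh" ]; then
    echo "kafka-run-class.sh not found or not executable under $KAFKA_HOME/bin" >&2
    return 1
  fi
  return 0
}

# Example use at the top of a script:
#   check_kafka_home || exit 1
```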
Steps to Count Messages in a Kafka Topic
Understand the GetOffsetShell Tool
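The kafka.tools.GetOffsetShell utility fetches the latest and earliest offsets for each partition of a Kafka topic. By subtracting the earliest offset from the latest offset for each partition, you can determine the number of messages stored. For topics that use log compaction or transactions, some offsets no longer correspond to a stored message, so treat the result as an upper bound.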
Run the GetOffsetShell Command
The kafka-run-class.sh script executes the GetOffsetShell utility. Here’s the basic syntax:
$KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list <broker-list> --topic <topic-name>
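- --broker-list: Specifies the list of Kafka brokers (e.g., broker1:9092,broker2:9092).
- --topic: Specifies the Kafka topic to analyze.
Note: The script later in this guide uses --bootstrap-server, which newer Kafka releases accept in place of --broker-list. Use whichever flag your Kafka version supports.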
Retrieve Earliest and Latest Offsets
Run the command twice—once for the earliest offsets and once for the latest offsets:
1. Fetch Earliest Offsets:
$KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list broker1:9092 --topic my-topic --time -2
Info: The --time -2 option retrieves the first offset (earliest message) available in each partition.
Output:
my-topic:0:0
my-topic:1:0
This output indicates that partitions 0 and 1 both start at offset 0.
2. Fetch Latest Offsets:
$KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list broker1:9092 --topic my-topic --time -1
Info: The --time -1 option fetches the latest offset, which represents the next position for new messages in the log.
Output:
my-topic:0:500
my-topic:1:600
This output indicates that partition 0 ends at offset 500 and partition 1 ends at offset 600.
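Each output line has the form topic:partition:offset, so it can be split on the colon. The sketch below shows one way to label the fields with awk. The function name print_offsets is hypothetical, and the input is the sample output shown above rather than a live broker.

```shell
#!/bin/bash
# print_offsets LABEL: hypothetical helper (not part of Kafka).
# Reads "topic:partition:offset" lines on stdin and prints one labeled line
# per partition. Lines that do not have exactly three fields are skipped.
print_offsets() {
  awk -F ":" -v label="$1" \
    'NF == 3 { printf "%s partition %s: %s offset %s\n", $1, $2, label, $3 }'
}

# Sample data copied from the example outputs (no broker needed).
printf 'my-topic:0:0\nmy-topic:1:0\n' | print_offsets "earliest"
printf 'my-topic:0:500\nmy-topic:1:600\n' | print_offsets "latest"
```

With a running cluster, the GetOffsetShell command can be piped into print_offsets in place of the printf lines.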
Calculate the Total Message Count
For each partition, subtract the earliest offset from the latest offset and sum the results:
- Partition 0: 500 - 0 = 500
- Partition 1: 600 - 0 = 600
Total Messages: 500 + 600 = 1100 messages
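The per-partition subtraction can also be done mechanically. The following sketch assumes the earliest and latest outputs have been saved to two files; the function name count_per_partition and the file names are hypothetical. It matches lines by partition number, so the two files do not need to list partitions in the same order.

```shell
#!/bin/bash
# count_per_partition EARLIEST_FILE LATEST_FILE: hypothetical helper.
# Both files hold "topic:partition:offset" lines as printed by GetOffsetShell.
count_per_partition() {
  awk -F ":" '
    # First file: remember the earliest offset of each partition.
    NR == FNR { earliest[$2] = $3; next }
    # Second file: latest minus earliest gives the count for that partition.
    {
      count = $3 - earliest[$2]
      printf "partition %s: %d\n", $2, count
      total += count
    }
    END { printf "total: %d\n", total }
  ' "$1" "$2"
}

# Sample data from the worked example, written to temporary files.
tmpdir=$(mktemp -d)
printf 'my-topic:0:0\nmy-topic:1:0\n' > "$tmpdir/earliest.txt"
printf 'my-topic:0:500\nmy-topic:1:600\n' > "$tmpdir/latest.txt"
count_per_partition "$tmpdir/earliest.txt" "$tmpdir/latest.txt"
rm -r "$tmpdir"
```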
Combining Steps in a Script
To simplify the process, you can create a script that automates these calculations:
#!/bin/bash
# Counts the messages in a topic as (sum of latest offsets) - (sum of earliest offsets).
BROKER_LIST="broker1:9092"
TOPIC_NAME="my-topic"
# Sum the earliest offsets (--time -2) over all partitions; "+ 0" prints 0 if there is no output.
EARLIEST=$("$KAFKA_HOME/bin/kafka-run-class.sh" kafka.tools.GetOffsetShell --bootstrap-server "$BROKER_LIST" --topic "$TOPIC_NAME" --time -2 | awk -F ":" '{sum += $3} END {print sum + 0}')
# Sum the latest offsets (--time -1) the same way.
LATEST=$("$KAFKA_HOME/bin/kafka-run-class.sh" kafka.tools.GetOffsetShell --bootstrap-server "$BROKER_LIST" --topic "$TOPIC_NAME" --time -1 | awk -F ":" '{sum += $3} END {print sum + 0}')
# Both values are already totals, so a single subtraction gives the message count.
TOTAL=$((LATEST - EARLIEST))
echo "Total messages in topic $TOPIC_NAME: $TOTAL"
Save this script as count-kafka-messages.sh and run it to quickly count messages in a topic.
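If you want to reuse the script for several topics, one option is to take the broker list and topic as arguments instead of hard-coding them. The following is a sketch under the same assumptions as the script above (KAFKA_HOME is set and the tool accepts --bootstrap-server); the function names sum_offsets and count_messages are hypothetical.

```shell
#!/bin/bash
# sum_offsets BROKERS TOPIC TIME: hypothetical helper.
# Prints the sum of the offsets reported for all partitions.
# TIME is the --time value: -2 for earliest, -1 for latest.
sum_offsets() {
  "$KAFKA_HOME/bin/kafka-run-class.sh" kafka.tools.GetOffsetShell \
    --bootstrap-server "$1" --topic "$2" --time "$3" |
    awk -F ":" '{ sum += $3 } END { printf "%d\n", sum }'
}

# count_messages BROKERS TOPIC: prints latest total minus earliest total.
count_messages() {
  local earliest latest
  earliest=$(sum_offsets "$1" "$2" -2)
  latest=$(sum_offsets "$1" "$2" -1)
  echo $((latest - earliest))
}

# Runs only when exactly two arguments are supplied, for example:
#   ./count-kafka-messages.sh broker1:9092 my-topic
if [ "$#" -eq 2 ]; then
  count_messages "$1" "$2"
fi
```

Note that this sketch does not detect a failed connection: if the tool prints nothing, both sums are 0 and the reported count is 0.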
Understanding the --time Parameter
The --time parameter specifies which offsets to fetch:
- --time -2: Fetches the earliest available offset in the log for each partition.
- --time -1: Fetches the latest offset, which represents the next position for new messages in the log.
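Because -2 and -1 are easy to mix up, a script can hide them behind readable names. This is a small sketch of that idea; the function name time_flag is hypothetical and covers only the two values described above.

```shell
#!/bin/bash
# time_flag NAME: hypothetical helper mapping a readable name to a --time value.
# Prints -2 for "earliest" and -1 for "latest"; returns 1 for anything else.
time_flag() {
  case "$1" in
    earliest) printf '%s\n' "-2" ;;
    latest)   printf '%s\n' "-1" ;;
    *) echo "unknown position: $1" >&2; return 1 ;;
  esac
}

# Example: build the option for the earliest offsets.
printf '%s\n' "--time $(time_flag earliest)"
```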