Count Messages in Kafka Topic

How to Count Messages in a Kafka Topic Using kafka-run-class.sh

When managing Apache Kafka, you may need to find out how many messages are stored in a topic. Although Kafka doesn’t provide a dedicated command to count messages, you can derive the count with the kafka-run-class.sh script and the kafka.tools.GetOffsetShell utility. This guide walks through the steps for counting messages in a Kafka topic, with examples.



Prerequisites

Before you begin, ensure the following:

  1. Kafka Installation: Kafka must be installed and running on your system.
  2. Broker Access: You need network access to at least one Kafka broker. GetOffsetShell connects to the brokers directly, so ZooKeeper access is not required.
  3. Kafka Environment Configured: The environment variable KAFKA_HOME should point to your Kafka installation directory.

Steps to Count Messages in a Kafka Topic

Understand the GetOffsetShell Tool

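The kafka.tools.GetOffsetShell utility fetches the latest and earliest offsets for each partition of a Kafka topic. Subtracting the earliest offset from the latest offset of a partition gives the number of offsets in that partition’s retained log, which equals the number of stored messages on an ordinary topic. On log-compacted topics, or topics written with transactions, some offsets no longer correspond to readable messages, so the difference is an upper bound rather than an exact count.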

Run the GetOffsetShell Command

The kafka-run-class.sh script executes the GetOffsetShell utility. Here’s the basic syntax:

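  • --broker-list: Specifies the Kafka brokers to connect to as comma-separated host:port pairs (e.g., broker1:9092,broker2:9092). Newer Kafka releases accept --bootstrap-server for the same purpose.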
  • --topic: Specifies the Kafka topic to analyze.
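As a minimal sketch, the invocation with these two options could be wrapped in a shell helper like the one below. It assumes KAFKA_HOME is set as described in the prerequisites; the function name get_offsets and the broker address and topic name in the example call are hypothetical.

```shell
# Sketch: run GetOffsetShell through kafka-run-class.sh.
# Arguments: $1 = broker list, $2 = topic.
get_offsets() {
  "${KAFKA_HOME}/bin/kafka-run-class.sh" kafka.tools.GetOffsetShell \
    --broker-list "$1" \
    --topic "$2"
}

# Example call (hypothetical broker address and topic name):
#   get_offsets localhost:9092 my-topic
```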

Retrieve Earliest and Latest Offsets

Run the command twice, once for the earliest offsets and once for the latest offsets:

1. Fetch Earliest Offsets:

Info: The --time -2 option retrieves the earliest available offset in each partition.
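A sketch of that call, assuming KAFKA_HOME is set; the helper name earliest_offsets and the broker address and topic name in the example call are placeholders, not values from this guide:

```shell
# Sketch: fetch the earliest offset of every partition of a topic.
# Arguments: $1 = broker list, $2 = topic.
earliest_offsets() {
  "${KAFKA_HOME}/bin/kafka-run-class.sh" kafka.tools.GetOffsetShell \
    --broker-list "$1" --topic "$2" --time -2
}

# Example call (placeholder values):
#   earliest_offsets localhost:9092 my-topic
```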

Output (one line per partition, in the form topic:partition:offset):

    <topic>:0:0
    <topic>:1:0

This output indicates that partitions 0 and 1 both start at offset 0.

2. Fetch Latest Offsets:

Info: The --time -1 option fetches the latest offset, which is the position the next new message will take in the log.
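A sketch of that call, assuming KAFKA_HOME is set; the helper name latest_offsets and the broker address and topic name in the example call are placeholders, not values from this guide:

```shell
# Sketch: fetch the latest (log end) offset of every partition of a topic.
# Arguments: $1 = broker list, $2 = topic.
latest_offsets() {
  "${KAFKA_HOME}/bin/kafka-run-class.sh" kafka.tools.GetOffsetShell \
    --broker-list "$1" --topic "$2" --time -1
}

# Example call (placeholder values):
#   latest_offsets localhost:9092 my-topic
```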

Output (one line per partition, in the form topic:partition:offset):

    <topic>:0:500
    <topic>:1:600

This output indicates that the log end offset is 500 for partition 0 and 600 for partition 1.

Calculate the Total Message Count

For each partition, subtract the earliest offset from the latest offset and sum the results:

  • Partition 0: 500 - 0 = 500
  • Partition 1: 600 - 0 = 600

Total Messages: 500 + 600 = 1100
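The same subtraction can be done mechanically. The sketch below assumes the two GetOffsetShell outputs were saved to files (the names earliest.txt and latest.txt are hypothetical) and that each line has the form topic:partition:offset; the function name per_partition_counts is likewise made up for this example.

```shell
# Sketch: per-partition and total counts from two saved GetOffsetShell outputs.
# $1 = file with earliest offsets, $2 = file with latest offsets.
per_partition_counts() {
  awk -F: '
    NR == FNR { earliest[$2] = $3; next }   # first file: remember earliest offsets
    {
      count = $3 - earliest[$2]             # second file: latest minus earliest
      total += count
      printf "partition %s: %.0f\n", $2, count
    }
    END { printf "total: %.0f\n", total }
  ' "$1" "$2"
}
```

With the example offsets from this guide saved to those two files, calling per_partition_counts earliest.txt latest.txt should report 500 for partition 0, 600 for partition 1, and a total of 1100.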



Combining Steps in a Script

To simplify the process, you can create a script that automates these calculations:
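One way such a script could look is sketched below. It is an illustration, not a definitive implementation: the function names are made up for this example, it assumes KAFKA_HOME is set and awk is available, and it expects GetOffsetShell to print one topic:partition:offset line per partition.

```shell
#!/usr/bin/env bash
# Usage: ./count-kafka-messages.sh <broker-list> <topic>
# Sketch only: assumes KAFKA_HOME is set and that GetOffsetShell prints
# one "topic:partition:offset" line per partition.

# Run GetOffsetShell; $3 is the --time value (-2 = earliest, -1 = latest).
get_offsets() {
  "${KAFKA_HOME}/bin/kafka-run-class.sh" kafka.tools.GetOffsetShell \
    --broker-list "$1" --topic "$2" --time "$3"
}

# Read offset lines on stdin and print the sum of the offset field.
sum_offsets() {
  awk -F: 'NF == 3 { total += $3 } END { printf "%.0f\n", total }'
}

# Print the total message count for a topic.
count_messages() {
  local brokers="$1"
  local topic="$2"
  local earliest_out latest_out earliest latest

  # Stop early if either lookup fails, instead of reporting a bogus total.
  earliest_out=$(get_offsets "$brokers" "$topic" -2) || return 1
  latest_out=$(get_offsets "$brokers" "$topic" -1) || return 1

  earliest=$(printf '%s\n' "$earliest_out" | sum_offsets)
  latest=$(printf '%s\n' "$latest_out" | sum_offsets)

  # Summing (latest - earliest) per partition equals the difference of the sums.
  echo "Total messages in ${topic}: $((latest - earliest))"
}

if [ "$#" -eq 2 ]; then
  count_messages "$1" "$2"
else
  echo "Usage: $0 <broker-list> <topic>" >&2
fi
```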

Save this script as count-kafka-messages.sh and run it to quickly count messages in a topic.


Understanding the --time Parameter

The --time parameter specifies which offsets to fetch:

  • --time -2: Fetches the earliest available offset in the log for each partition.
  • --time -1: Fetches the latest offset, which represents the next position for new messages in the log.

