How to Purge Data from Kafka Topics



Purging data from Kafka topics can be essential when you need to start fresh or clear out unwanted data without deleting the topic itself. While Kafka doesn’t have a direct “purge” command, it offers cleanup policies that control how data is retained and deleted in topics. This guide covers how to purge data from Kafka topics configured with the delete and compact cleanup policies, providing practical command-line examples to help you perform data purges effectively.


TL;DR

  • For delete policy topics, modify retention.ms to trigger data deletion.
  • For compact policy topics, consider using tombstone records, fully deleting the topic, or temporarily switching to a delete policy for easier purging.


Background: Kafka Cleanup Policies

Kafka cleanup policies control how topics handle message retention and deletion. Understanding these policies is essential for managing data lifecycle and ensuring smooth Kafka operations.

  1. Delete Policy (cleanup.policy=delete): Log segments are deleted once they are older than the retention period (retention.ms) or once the partition grows beyond retention.bytes.
  2. Compact Policy (cleanup.policy=compact): Keeps at least the latest record for each unique key and removes older records with the same key during log compaction.

Each policy has distinct methods for purging data, so let’s explore how to clear data in topics with each policy type.


How to Purge Data in Topics with the Delete Cleanup Policy

If a topic’s cleanup.policy is set to delete, you can temporarily change its retention settings to clear all data. This approach is suitable for cases where you don’t need to delete the topic but want to empty it.

Step 1: Temporarily Lower Retention Time

Set the topic’s retention time to a very low value to clear data. For example, to set the retention time to 1 second:
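One way to apply this from the command line is the kafka-configs.sh tool that ships with Kafka. The sketch below wraps the calls in small shell functions. The broker address localhost:9092, the topic name my-topic, and the function names are placeholders, and it assumes a Kafka version whose kafka-configs.sh accepts --bootstrap-server for topic settings.

```shell
# Assumed values: adjust the broker address and tool path for your cluster.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"

# Show the topic's current overrides so the old retention.ms can be restored later.
show_topic_config() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" --describe
}

# Set retention.ms to 1000 ms (1 second) on the given topic.
lower_retention() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" \
    --alter --add-config retention.ms=1000
}

# Usage:
#   show_topic_config my-topic
#   lower_retention my-topic
```

Recording the output of show_topic_config before lowering the value makes Step 3 easier, since it tells you whether the topic had its own retention.ms override.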

Step 2: Wait for the Retention Check to Run

Kafka periodically checks for log segments that are eligible for deletion. The interval is set by the broker setting log.retention.check.interval.ms, which defaults to 5 minutes (300000 ms). Wait a few minutes, or lower this setting on the brokers for a faster purge.
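To confirm that the purge has happened, you can compare each partition's earliest and latest offsets. The following is a sketch that assumes a Kafka release shipping kafka-get-offsets.sh (older releases expose the same tool only as the GetOffsetShell class); the helper names and the broker address are placeholders.

```shell
# Assumed values: adjust the broker address and tool path for your cluster.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
KAFKA_GET_OFFSETS="${KAFKA_GET_OFFSETS:-kafka-get-offsets.sh}"

# Print topic:partition:offset lines for a topic.
# $2 is the --time value: -2 = earliest offset, -1 = latest offset.
offsets_at() {
  "$KAFKA_GET_OFFSETS" --bootstrap-server "$BOOTSTRAP" \
    --topic "$1" --time "$2" | sort
}

# Succeeds when earliest and latest offsets match on every partition,
# which means no records remain readable.
topic_is_empty() {
  [ "$(offsets_at "$1" -2)" = "$(offsets_at "$1" -1)" ]
}

# Usage:
#   topic_is_empty my-topic && echo "purged"
```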

Step 3: Reset Retention Time to Default

Once data is deleted, reset the retention.ms setting to its default or previous value:
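A sketch of both variants with kafka-configs.sh follows: removing the override so the topic falls back to the broker default, or restoring a specific earlier value. The function names, the broker address, and the 7-day value in the usage comment are illustrative assumptions.

```shell
# Assumed values: adjust the broker address and tool path for your cluster.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"

# Remove the topic-level override so the broker default applies again.
reset_retention_to_default() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" \
    --alter --delete-config retention.ms
}

# Restore a specific retention.ms value (in milliseconds).
restore_retention() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" \
    --alter --add-config "retention.ms=$2"
}

# Usage:
#   reset_retention_to_default my-topic
#   restore_retention my-topic 604800000   # 7 days
```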

Tip: If you have a Kafka UI or management tool, you can also update retention.ms through the graphical interface.


Alternative: Use kafka-delete-records for Delete Policy Topics

Another way to purge data is to use the kafka-delete-records command. This approach is helpful when you need precise control over the offsets to be deleted.

Step 1: Create an Offset Configuration File

Specify the partitions and set each offset to -1 to clear all data:
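As an illustration, the snippet below writes such a file for a topic that is assumed to have two partitions (0 and 1). The topic name my-topic and the file name delete-records.json are placeholders; list one entry per partition your topic actually has. An offset of -1 tells the tool to delete up to the current end of the partition.

```shell
# Assumed values: adjust the topic name and file path.
TOPIC="${TOPIC:-my-topic}"
OFFSET_FILE="${OFFSET_FILE:-delete-records.json}"

# Write the offset file; add one entry per partition of the topic.
cat > "$OFFSET_FILE" <<EOF
{
  "version": 1,
  "partitions": [
    { "topic": "$TOPIC", "partition": 0, "offset": -1 },
    { "topic": "$TOPIC", "partition": 1, "offset": -1 }
  ]
}
EOF
```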

Step 2: Execute the kafka-delete-records Command

Run the following command to delete records in the specified partitions:
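A minimal sketch of the invocation, assuming the offset file from Step 1 is named delete-records.json and the broker listens on localhost:9092 (both placeholders, as is the function name):

```shell
# Assumed values: adjust the broker address and tool path for your cluster.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
KAFKA_DELETE_RECORDS="${KAFKA_DELETE_RECORDS:-kafka-delete-records.sh}"

# Delete records up to the offsets listed in the given JSON file.
delete_records() {
  "$KAFKA_DELETE_RECORDS" --bootstrap-server "$BOOTSTRAP" \
    --offset-json-file "$1"
}

# Usage:
#   delete_records delete-records.json
```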

This will delete all records from the specified partitions without deleting the topic itself.


How to Purge Data in Topics with the Compact Cleanup Policy

Topics with a compact cleanup policy retain the latest record for each key and purge older records. Since compaction doesn’t allow straightforward deletion, here are a few approaches to purge data in compacted topics.

Option 1: Delete and Recreate the Topic

The simplest way to clear data from a compacted topic is to delete it and create a fresh one. This method fully purges all data, but you must recreate the topic with the same partition count, replication factor, and configuration.
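The sequence could look like the sketch below, which uses kafka-topics.sh. The function names, the broker address, and the partition and replication values in the usage comment are assumptions; take the real values from the describe output before deleting anything.

```shell
# Assumed values: adjust the broker address and tool path for your cluster.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
KAFKA_TOPICS="${KAFKA_TOPICS:-kafka-topics.sh}"

# Show partitions, replication factor, and overrides before deleting.
describe_topic() {
  "$KAFKA_TOPICS" --bootstrap-server "$BOOTSTRAP" --describe --topic "$1"
}

# Delete the topic. Deletion is asynchronous, so it may take a moment
# before the name can be reused.
delete_topic() {
  "$KAFKA_TOPICS" --bootstrap-server "$BOOTSTRAP" --delete --topic "$1"
}

# Recreate it as a compacted topic: $1 = name, $2 = partitions, $3 = replication factor.
create_compacted_topic() {
  "$KAFKA_TOPICS" --bootstrap-server "$BOOTSTRAP" --create --topic "$1" \
    --partitions "$2" --replication-factor "$3" \
    --config cleanup.policy=compact
}

# Usage:
#   describe_topic my-topic
#   delete_topic my-topic
#   create_compacted_topic my-topic 3 2
```

If the create call fails because the topic still exists, the deletion has probably not finished yet; retrying after a short wait is usually enough.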

Option 2: Purge Data with Tombstone Records

If you want to keep the topic but delete specific records, send a tombstone record (a message with a null value) for each key. During compaction, a tombstone removes all earlier records with the same key, and the tombstone itself is discarded once delete.retention.ms has elapsed.

Step 1: Lower the delete.retention.ms Setting

Set delete.retention.ms to a short value (e.g., 1 second):
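With kafka-configs.sh this might look as follows. The broker address, topic name, and function name are placeholders.

```shell
# Assumed values: adjust the broker address and tool path for your cluster.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"

# Keep tombstones for only 1000 ms (1 second) after compaction.
lower_delete_retention() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" \
    --alter --add-config delete.retention.ms=1000
}

# Usage:
#   lower_delete_retention my-topic
```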

Step 2: Send Tombstone Records for Each Key

Publish a message with a null value for each key you want to remove. Once the log cleaner has compacted the topic and the records are gone, reset delete.retention.ms to its original value.
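One possible way to publish tombstones from the shell is the console producer. This sketch assumes a Kafka version whose kafka-console-producer.sh supports the null.marker property (older versions cannot send null values this way, and a client library or a tool such as kcat would be needed instead). It also assumes keys are plain strings without tab characters. The function names and the keys.txt file are hypothetical.

```shell
# Assumed values: adjust the broker address and tool paths for your cluster.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
KAFKA_CONSOLE_PRODUCER="${KAFKA_CONSOLE_PRODUCER:-kafka-console-producer.sh}"
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"

# Read keys from stdin (one per line) and send "key<TAB>NULL" lines;
# null.marker=NULL makes the producer send a null value for each key.
send_tombstones() {
  awk '{ printf "%s\tNULL\n", $0 }' |
    "$KAFKA_CONSOLE_PRODUCER" --bootstrap-server "$BOOTSTRAP" --topic "$1" \
      --property parse.key=true --property null.marker=NULL
}

# Afterwards, drop the delete.retention.ms override again.
reset_delete_retention() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" \
    --alter --delete-config delete.retention.ms
}

# Usage:
#   send_tombstones my-topic < keys.txt
#   reset_delete_retention my-topic
```

The tab separator is the console producer's default key separator, which is why no key.separator property is set here.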

Option 3: Temporarily Switch to Delete Policy (Advanced)

If sending tombstones for each key isn’t feasible, you can temporarily switch the topic to a delete cleanup policy. This approach requires setting retention.ms=-1 initially to prevent unintended deletions.

Step 1: Set retention.ms to Prevent Deletion
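A sketch of this step with kafka-configs.sh is shown here; a retention.ms of -1 disables time-based deletion. The broker address, topic name, and function name are placeholders.

```shell
# Assumed values: adjust the broker address and tool path for your cluster.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"

# retention.ms=-1 means "no time limit", so nothing is deleted by age
# when the policy is switched in the next step.
disable_time_retention() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" \
    --alter --add-config retention.ms=-1
}

# Usage:
#   disable_time_retention my-topic
```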

Step 2: Switch to Delete Policy and Perform Cleanup

Switch the topic's cleanup.policy to delete, then follow the steps from the delete policy section to purge the data. Once the data is gone, revert the policy to compact as described in Step 3.
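The policy switch itself might be done like this, again as a sketch with placeholder names and an assumed broker address:

```shell
# Assumed values: adjust the broker address and tool path for your cluster.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"

# Switch the topic from compaction to time/size-based deletion.
use_delete_policy() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" \
    --alter --add-config cleanup.policy=delete
}

# Usage:
#   use_delete_policy my-topic
```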

Step 3: Reset the Policy to Compact
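Reverting could look like the following sketch. The names and broker address are placeholders; remember to also restore the topic's previous retention.ms if it had one.

```shell
# Assumed values: adjust the broker address and tool path for your cluster.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"

# Switch the topic back to log compaction.
use_compact_policy() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" \
    --alter --add-config cleanup.policy=compact
}

# Usage:
#   use_compact_policy my-topic
```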

Note: Changing the cleanup policy may impact other applications, so use this method with caution and preferably in non-production environments.


Additional Tips and Best Practices

  1. Avoid Frequent Purges: Regular purging may indicate inefficiencies. If purging is required frequently, consider adjusting the topic’s cleanup policies or retention settings.
  2. Test in Development Environments: Test any data-purging operations in staging or development environments to avoid data loss in production.
  3. Choose Appropriate Retention Policies: Setting the correct retention time and cleanup policy can help manage data size effectively without the need for frequent manual intervention.

