Purging data from Kafka topics can be essential when you need to start fresh or clear out unwanted data without deleting the topic itself. While Kafka doesn’t have a direct “purge” command, it offers cleanup policies that control how data is retained and deleted in topics. This guide covers how to purge data from Kafka topics configured with the delete and compact cleanup policies, providing practical command-line examples to help you perform data purges effectively.
TL;DR
- For delete policy topics, modify retention.ms to trigger data deletion.
- For compact policy topics, consider using tombstone records, fully deleting the topic, or temporarily switching to a delete policy for easier purging.
Background: Kafka Cleanup Policies
Kafka cleanup policies control how topics handle message retention and deletion. Understanding these policies is essential for managing data lifecycle and ensuring smooth Kafka operations.
- Delete Policy: Messages are deleted after a specified retention period (retention.ms), clearing space as needed.
- Compact Policy: Keeps the latest record for each unique key and deletes older records, ensuring each key retains only its latest version.
Each policy has distinct methods for purging data, so let’s explore how to clear data in topics with each policy type.
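Before choosing a method, it helps to confirm which policy a topic actually uses. The sketch below pulls the cleanup.policy value out of the text printed by kafka-configs --describe. The function name extract_cleanup_policy is illustrative, and the parsing assumes the output contains a token of the form cleanup.policy=<value>, which may differ between Kafka versions.

```shell
#!/usr/bin/env bash
# Sketch: read `kafka-configs --describe` output on stdin and print the
# value of cleanup.policy (for example "delete", "compact", "compact,delete").
# Assumption: the output contains a token shaped like cleanup.policy=<value>.
extract_cleanup_policy() {
  # grep -o prints each match on its own line; the first match is the
  # config entry itself, later matches may come from the synonyms list.
  grep -o 'cleanup\.policy=[a-z,]*' | head -n 1 | cut -d= -f2
}
```

One possible usage is kafka-configs --bootstrap-server <KAFKA_CLUSTER> --entity-type topics --entity-name <TOPIC_NAME> --describe --all | extract_cleanup_policy. The --all flag exists in recent Kafka releases; on older versions --describe alone lists only explicit overrides, so a topic that relies on the broker default would print nothing.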
How to Purge Data in Topics with a Delete cleanup.policy
If a topic’s cleanup.policy is set to delete, you can temporarily change its retention settings to clear all data. This approach is suitable for cases where you don’t need to delete the topic but want to empty it.
Step 1: Temporarily Lower Retention Time
Set the topic’s retention time to a very low value to clear data. For example, to set the retention time to 1 second:
kafka-configs --bootstrap-server <KAFKA_CLUSTER> --entity-type topics --entity-name <TOPIC_NAME> --alter --add-config retention.ms=1000
Step 2: Wait for the Retention Check to Run
Kafka brokers periodically check for log segments that are eligible for deletion. The interval is set by the broker-level log.retention.check.interval.ms setting, which defaults to 300000 ms (5 minutes). Wait a few minutes, or lower this setting for a faster purge.
Step 3: Reset Retention Time to Default
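Once the data is deleted, restore the retention.ms setting. The command below removes the topic-level override, so the topic falls back to the broker default. If the topic previously had a custom value, set it again with --add-config retention.ms=<previous value> instead: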
kafka-configs --bootstrap-server <KAFKA_CLUSTER> --entity-type topics --entity-name <TOPIC_NAME> --alter --delete-config retention.ms
Tip: If you have a Kafka UI or management tool, you can also update retention.ms through the graphical interface.
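The three steps above can be chained in a small shell function. This is a minimal sketch rather than a hardened tool: the function name purge_by_retention and the KAFKA_CONFIGS and WAIT_SECONDS variables are illustrative assumptions, not Kafka settings, and the 300-second wait assumes the default retention check interval.

```shell
#!/usr/bin/env bash
# Sketch: lower retention, wait for the broker's retention check, then
# remove the override. KAFKA_CONFIGS and WAIT_SECONDS are hypothetical
# helper variables so the command and wait time can be overridden.
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs}"
WAIT_SECONDS="${WAIT_SECONDS:-300}"

purge_by_retention() {
  local bootstrap="$1" topic="$2"

  # Step 1: set retention to 1 second so existing data becomes eligible.
  "$KAFKA_CONFIGS" --bootstrap-server "$bootstrap" --entity-type topics \
    --entity-name "$topic" --alter --add-config retention.ms=1000 || return 1

  # Step 2: give the retention check time to run.
  sleep "$WAIT_SECONDS"

  # Step 3: drop the override so the topic uses the broker default again.
  "$KAFKA_CONFIGS" --bootstrap-server "$bootstrap" --entity-type topics \
    --entity-name "$topic" --alter --delete-config retention.ms
}
```

Call it as purge_by_retention <KAFKA_CLUSTER> <TOPIC_NAME>. If the first command succeeds and the script is interrupted before the last one, the topic keeps the 1-second retention, so check the topic configuration afterwards.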
Alternative: Use kafka-delete-records for Delete Policy Topics
Another way to purge data is to use the kafka-delete-records command. This approach is helpful when you need precise control over the offsets to be deleted.
Step 1: Create an Offset Configuration File
Save the following as offsetToBeDeleted.json. List each partition and set its offset to -1 to delete all records up to that partition’s high watermark:
{
  "partitions": [
    {"topic": "myTopic", "partition": 0, "offset": -1},
    {"topic": "myTopic", "partition": 1, "offset": -1}
  ],
  "version": 1
}
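Step 2: Execute the kafka-delete-records Command
Run the following command to delete records in the specified partitions. The --command-config option points to a client properties file and is only needed when the cluster requires security settings: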
kafka-delete-records --bootstrap-server <KAFKA_CLUSTER> --command-config security.properties --offset-json-file offsetToBeDeleted.json
This will delete all records from the specified partitions without deleting the topic itself.
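For topics with many partitions, writing the offset file by hand gets tedious. The following sketch generates it for partitions 0 through N-1. The function name make_offsets_json is illustrative, and the partition count is assumed to be known already (for example from kafka-topics --describe).

```shell
#!/usr/bin/env bash
# Sketch: print a kafka-delete-records offset file that targets every
# partition of a topic, with offset -1 for each entry.
# Arguments: <topic> <number of partitions>
make_offsets_json() {
  local topic="$1" partitions="$2" p=0 comma

  printf '{\n  "partitions": [\n'
  while [ "$p" -lt "$partitions" ]; do
    # Every entry except the last one needs a trailing comma.
    if [ "$p" -lt $((partitions - 1)) ]; then comma=","; else comma=""; fi
    printf '    {"topic": "%s", "partition": %d, "offset": -1}%s\n' \
      "$topic" "$p" "$comma"
    p=$((p + 1))
  done
  printf '  ],\n  "version": 1\n}\n'
}
```

For example, make_offsets_json myTopic 2 > offsetToBeDeleted.json produces the same file as the one shown in Step 1.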
Purging Data in Topics with a Compact cleanup.policy
Topics with a compact cleanup policy retain the latest record for each key and purge older records. Since compaction doesn’t allow straightforward deletion, here are a few approaches to purge data in compacted topics.
Option 1: Delete and Recreate the Topic
The simplest way to clear data from a compacted topic is to delete it and create a fresh one. This method fully purges all data but requires you to recreate the topic with its original settings (partitions, replication factor, and any configuration overrides):
kafka-topics --bootstrap-server <KAFKA_CLUSTER> --delete --topic <TOPIC_NAME>
kafka-topics --bootstrap-server <KAFKA_CLUSTER> --create --topic <TOPIC_NAME> --partitions <NUM_PARTITIONS> --replication-factor <NUM_REPLICAS>
Option 2: Purge Data with Tombstone Records
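If you want to keep the topic but delete specific records, send a tombstone record (a message with the record’s key and a null value) for each key. During the next compaction, the log cleaner removes the older records for that key, and the tombstone itself is removed once delete.retention.ms has elapsed.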
Step 1: Lower the delete.retention.ms Setting
Set delete.retention.ms to a short value (e.g., 1 second):
kafka-configs --bootstrap-server <KAFKA_CLUSTER> --entity-type topics --entity-name <TOPIC_NAME> --alter --add-config delete.retention.ms=1000
Step 2: Send Tombstone Records for Each Key
Publish a message with a null value for each key you want to remove. Once compaction has removed those keys, reset delete.retention.ms to its original value.
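One way to produce the tombstones from the command line is to feed key/marker pairs to kafka-console-producer. The sketch below only builds the input lines. The function name tombstone_lines, the default separator, and the default marker are illustrative assumptions, and keys are assumed not to contain the separator character.

```shell
#!/usr/bin/env bash
# Sketch: read one key per line on stdin and print "key<separator><marker>"
# lines, the shape kafka-console-producer reads when parse.key=true.
# Arguments (both optional): <separator> <null marker>
tombstone_lines() {
  local separator="${1:-|}" marker="${2:-__NULL__}" key

  # The extra test handles a final line without a trailing newline.
  while IFS= read -r key || [ -n "$key" ]; do
    # Skip blank lines so no record with an empty key is produced.
    if [ -n "$key" ]; then
      printf '%s%s%s\n' "$key" "$separator" "$marker"
    fi
  done
}
```

A possible pipeline is tombstone_lines '|' '__NULL__' < keys.txt | kafka-console-producer --bootstrap-server <KAFKA_CLUSTER> --topic <TOPIC_NAME> --property parse.key=true --property 'key.separator=|' --property null.marker=__NULL__. This relies on the console producer’s null.marker property, which is only present in newer Kafka releases (3.2 and later, to the best of this guide’s knowledge); on older versions the marker would be written as a literal value, so verify against your installation first.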
Option 3: Temporarily Switch to Delete Policy (Advanced)
If sending tombstones for each key isn’t feasible, you can temporarily switch the topic to a delete cleanup policy. Set retention.ms=-1 first so that switching policies does not immediately expire existing records under the topic’s current retention setting.
Step 1: Set retention.ms to Prevent Deletion
kafka-configs --bootstrap-server <KAFKA_CLUSTER> --entity-type topics --entity-name <TOPIC_NAME> --alter --add-config retention.ms=-1
Step 2: Switch to Delete Policy and Perform Cleanup
kafka-configs --bootstrap-server <KAFKA_CLUSTER> --entity-type topics --entity-name <TOPIC_NAME> --alter --add-config cleanup.policy=delete
Follow the steps from the delete policy section to purge the data, then revert to compact.
Step 3: Reset the Policy to Compact
kafka-configs --bootstrap-server <KAFKA_CLUSTER> --entity-type topics --entity-name <TOPIC_NAME> --alter --add-config cleanup.policy=compact
Note: Changing the cleanup policy may impact other applications, so use this method with caution and preferably in non-production environments.
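The order of operations matters in Option 3, so it can help to see the whole sequence in one place. This is a sketch under the same assumptions as the steps above: the function names alter_topic and purge_compacted_topic and the KAFKA_CONFIGS and WAIT_SECONDS variables are illustrative, and the wait assumes the default retention check interval.

```shell
#!/usr/bin/env bash
# Sketch: guard retention, switch a compacted topic to delete, purge it,
# then restore the compact policy. KAFKA_CONFIGS and WAIT_SECONDS are
# hypothetical helper variables, not Kafka settings.
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs}"
WAIT_SECONDS="${WAIT_SECONDS:-300}"

# Applies a single --add-config or --delete-config change to a topic.
alter_topic() {
  local bootstrap="$1" topic="$2" action="$3" config="$4"
  "$KAFKA_CONFIGS" --bootstrap-server "$bootstrap" --entity-type topics \
    --entity-name "$topic" --alter "$action" "$config"
}

purge_compacted_topic() {
  local b="$1" t="$2"
  # Guard: unlimited retention before the policy changes.
  alter_topic "$b" "$t" --add-config retention.ms=-1 || return 1
  # Switch the topic to the delete policy.
  alter_topic "$b" "$t" --add-config cleanup.policy=delete || return 1
  # Purge: expire everything, then wait for the retention check.
  alter_topic "$b" "$t" --add-config retention.ms=1000 || return 1
  sleep "$WAIT_SECONDS"
  # Restore: remove the retention override, then go back to compact.
  alter_topic "$b" "$t" --delete-config retention.ms || return 1
  alter_topic "$b" "$t" --add-config cleanup.policy=compact
}
```

Call it as purge_compacted_topic <KAFKA_CLUSTER> <TOPIC_NAME>. Because the function stops at the first failing command, a failure partway through can leave the topic on the delete policy, so confirm the final configuration before producers resume writing.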
Additional Tips and Best Practices
- Avoid Frequent Purges: Regular purging may indicate inefficiencies. If purging is required frequently, consider adjusting the topic’s cleanup policies or retention settings.
- Test in Development Environments: Test any data-purging operations in staging or development environments to avoid data loss in production.
- Choose Appropriate Retention Policies: Setting the correct retention time and cleanup policy can help manage data size effectively without the need for frequent manual intervention.