Kafka Active Segment Deletion: When and How It Happens

Apache Kafka manages log data by splitting each partition into segments and retaining them according to time-based or size-based thresholds. A question that often arises is how Kafka handles segment deletion, especially for low-volume topics whose partitions sit idle. At SocketDaddy.com, we regularly work with Kafka, and understanding its retention and deletion policies is essential for tuning these configurations. This post focuses on when Kafka deletes log segments and how it handles active segments that would otherwise sit idle beyond retention limits.


TL;DR

Kafka does not delete active segments directly. However, if an active segment breaches retention.ms due to inactivity or low data volume, Kafka will force a roll, transitioning it to an inactive state that is eligible for deletion. This mechanism prevents any segment from remaining open indefinitely when retention limits are exceeded.


What Is an Active Segment in Kafka?

Kafka stores log data within partitions, with each partition containing multiple log segments. Each partition has exactly one active segment, which receives all incoming writes. This segment remains active until it meets the configured thresholds set by segment.ms (time) or segment.bytes (size). When either threshold is exceeded, Kafka rolls the segment, closing it and creating a new one.

Typically, active segments are not immediately subject to deletion by retention policies. Kafka restricts retention-based deletion to segments that have already rolled and are marked as inactive, preserving data integrity by ensuring active segments are not deleted prematurely.

Kafka’s Retention Policy and Segment Rolling

Two key configurations manage Kafka’s segment rolling:

  1. segment.ms (Time-Based): This setting specifies the maximum time an active segment remains open before Kafka initiates a roll. When segment.ms is exceeded, Kafka rolls the segment and creates a new active one.
  2. segment.bytes (Size-Based): This configuration sets the maximum size of an active segment in bytes. Once a segment reaches this size, Kafka rolls it regardless of the time elapsed. This mechanism is particularly useful for high-volume topics.

Upon rolling, the segment becomes inactive, and Kafka’s retention policies (retention.ms and retention.bytes) govern its eligibility for deletion.
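The two rolling thresholds can be illustrated with a small model. This is only a sketch, not Kafka's implementation: the ActiveSegment class, the should_roll function, and their fields are hypothetical names invented for illustration, and timestamps are assumed to be plain millisecond integers.

```python
from dataclasses import dataclass


@dataclass
class ActiveSegment:
    """Simplified stand-in for a partition's active segment (not a Kafka class)."""
    created_ms: int   # when the segment was opened (assumed wall-clock ms)
    size_bytes: int   # bytes written so far


def should_roll(segment, now_ms, segment_ms, segment_bytes, incoming_bytes=0):
    """Return True when either rolling threshold would be exceeded."""
    too_old = now_ms - segment.created_ms > segment_ms
    too_big = segment.size_bytes + incoming_bytes > segment_bytes
    return too_old or too_big


ONE_HOUR_MS = 60 * 60 * 1000
seg = ActiveSegment(created_ms=0, size_bytes=512)

# Young and small: the segment stays active.
print(should_roll(seg, 10_000, ONE_HOUR_MS, 1024))                      # False
# Older than segment.ms: rolled on time.
print(should_roll(seg, ONE_HOUR_MS + 1, ONE_HOUR_MS, 1024))             # True
# The next write would overflow segment.bytes: rolled on size.
print(should_roll(seg, 10_000, ONE_HOUR_MS, 1024, incoming_bytes=600))  # True
```

Either condition alone is enough to trigger a roll, which is why a high-volume topic usually rolls on size while a quiet one tends to roll on time.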

What Happens to an Active Segment When Retention Is Breached?

Active segments are generally shielded from deletion, as they’re still receiving writes. However, Kafka includes a mechanism to handle cases where an active segment breaches retention.ms due to low data volume or idle conditions. When this occurs, Kafka force-rolls the active segment, closing it and transitioning it to inactive status, making it eligible for deletion based on retention policies.

This approach prevents segments from remaining open indefinitely when retention limits are reached. The broker’s log output records the moment Kafka detects such a segment and force-rolls it after it surpasses the retention.ms threshold.

Relevant Sections in Kafka’s Codebase

In Kafka’s UnifiedLog.scala file, several key methods outline the segment retention and deletion logic:

  1. deletableSegments Method: This method iterates over segments and applies deletion criteria based on retention policies. It ensures active segments are not immediately deleted but force-rolls them when necessary due to retention.ms breaches.
  2. Forced Rolling on Retention Breach: The code includes specific logic to handle idle segments by rolling them once they exceed retention.ms. This aspect of Kafka’s behaviour prevents active segments from remaining open longer than specified in the retention settings.
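The interplay between the two points above can be modelled in a few lines. The following is a toy sketch under stated assumptions, not a port of UnifiedLog.scala: Segment, PartitionLog, roll, and delete_expired are hypothetical names, a segment's age is assumed to be measured from its newest record timestamp, and a freshly rolled segment is assumed to be stamped with the current time.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Toy segment: a base offset and its newest record timestamp."""
    base_offset: int
    largest_timestamp_ms: int


class PartitionLog:
    """Toy partition log; the last entry in `segments` is the active segment."""

    def __init__(self, segments):
        self.segments = list(segments)

    def roll(self, next_offset, now_ms):
        # Assumption: a fresh, empty segment is stamped with the current time.
        self.segments.append(Segment(next_offset, now_ms))

    def delete_expired(self, now_ms, retention_ms, next_offset):
        """Delete segments older than retention_ms, oldest first."""
        expired_count = 0
        for seg in self.segments:
            if now_ms - seg.largest_timestamp_ms <= retention_ms:
                break  # stop at the first segment still within retention
            expired_count += 1

        if expired_count and expired_count == len(self.segments):
            # Even the active segment has expired: roll first so the
            # partition always keeps one active segment to write to.
            self.roll(next_offset, now_ms)

        deleted = self.segments[:expired_count]
        self.segments = self.segments[expired_count:]
        return [seg.base_offset for seg in deleted]


# Idle partition: last write at t=2000 ms, retention.ms = 5000.
log = PartitionLog([Segment(0, 1_000), Segment(100, 2_000)])
deleted = log.delete_expired(now_ms=10_000, retention_ms=5_000, next_offset=200)
print(deleted)                                 # [0, 100]
print([s.base_offset for s in log.segments])   # [200]
```

In this model the formerly active segment (base offset 100) is deleted only after a new, empty active segment has been created, which mirrors the roll-then-delete order described in the list.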

For further details on this behaviour and historical discussions, the following Kafka JIRA issues provide valuable context:

  • KAFKA-7910: Document retention.ms behavior with record timestamp
  • KAFKA-4336: This issue addresses how Kafka handles retention and deletion in log segments, with particular reference to active segments.
  • KAFKA-475: This issue discusses segment management and improvements around forced rolling under low data conditions.

Best Practices for Configuring Segment Rolling and Retention

Configuring segment rolling and retention settings is important to ensure optimal Kafka performance, especially for topics with low or infrequent data. Here are a few best practices:

  • Configure segment.ms Mindfully: Setting a reasonable segment.ms for low-volume topics helps ensure timely segment rolls, preventing any segment from staying open indefinitely. This prevents active segments from hitting retention.ms without rolling.
  • Set retention.ms Thoughtfully: For low-data topics, setting retention.ms appropriately helps avoid premature deletion. This balance prevents unnecessary segment rolling and deletion for partitions that receive data infrequently.
  • Monitor Segment and Log Activity: Regularly monitoring Kafka broker logs can provide insights into how retention and rolling configurations impact segment behaviour. Log messages about segment rolls indicate when Kafka enforces a force-roll due to retention policies.
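As a rough aid for reviewing these settings, the sketch below compares a topic's overrides against Kafka's documented topic-level defaults and flags combinations that may behave unexpectedly on quiet topics. The review_topic_config function and both of its checks are hypothetical heuristics written for this post, not part of Kafka or any client library.

```python
# Topic-level defaults as documented by Apache Kafka (values in ms / bytes).
DEFAULTS = {
    "segment.ms": 604_800_000,       # 7 days
    "segment.bytes": 1_073_741_824,  # 1 GiB
    "retention.ms": 604_800_000,     # 7 days
    "retention.bytes": -1,           # -1 means no size limit
}


def review_topic_config(overrides):
    """Return notes for settings that may surprise on low-volume topics."""
    cfg = {**DEFAULTS, **overrides}
    notes = []

    # Hypothetical heuristic: retention.ms = -1 disables time-based
    # retention, so the comparison only makes sense for other values.
    if cfg["retention.ms"] != -1 and cfg["segment.ms"] > cfg["retention.ms"]:
        notes.append(
            "segment.ms exceeds retention.ms: an idle active segment is "
            "likely to reach retention before its time-based roll"
        )

    # Hypothetical heuristic: one full segment is larger than the size budget.
    if cfg["retention.bytes"] != -1 and cfg["segment.bytes"] > cfg["retention.bytes"]:
        notes.append("segment.bytes exceeds retention.bytes")

    return notes


# Low-volume topic: roll daily, but retain for only one hour.
for note in review_topic_config({"segment.ms": 86_400_000, "retention.ms": 3_600_000}):
    print(note)
```

A check like this could run in a deployment script before topic settings are applied; the thresholds worth flagging will depend on each topic's traffic pattern.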

Conclusion

Kafka’s segment management strategy balances data retention with efficient storage handling. Active segments are typically exempt from retention-based deletion until they roll. However, Kafka force-rolls any active segment that breaches retention.ms due to inactivity, ensuring compliance with retention limits without prematurely deleting active data.
