Kafka retention.ms: How Kafka Decides How Long to Keep Your Data

Let’s talk about Kafka’s retention.ms setting, one of those settings that seems small but is actually a big deal when it comes to how Kafka handles your data. Imagine you’ve got a whiteboard in your office, and every few days, you erase old notes to make room for new ones. That’s basically what retention.ms does in Kafka—except it’s way more sophisticated and doesn’t involve dry-erase markers.

In this article, we’ll break down what kafka retention.ms means, how it works, and why it matters for managing your Kafka topics. Whether you’re building a real-time analytics system or just tinkering with Kafka for fun, understanding this setting will save you headaches (and maybe some disk space too).


TL;DR

Kafka retention.ms is a configuration that determines how long Kafka keeps messages in a topic. After the configured time (in milliseconds), old messages are eligible for deletion to free up disk space. You can set this value based on your use case, whether you need short-term logs or long-term archives.


What is kafka retention.ms?

Kafka is like a giant logbook where messages are written as they come in. But unlike a regular logbook, Kafka doesn’t keep messages forever (unless you want it to). The retention.ms setting controls how long Kafka holds onto messages before they’re cleaned up.

Here’s the breakdown:

  • Retention: How long Kafka stores messages in a topic.
  • ms: This value is in milliseconds. For example, 604800000 (7 days).
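Because the unit is milliseconds, it’s easy to drop or add a zero. Here’s a minimal Python sketch that converts a readable duration into a retention.ms value. The helper name to_retention_ms is made up for illustration and isn’t part of any Kafka client.

```python
from datetime import timedelta

def to_retention_ms(duration: timedelta) -> int:
    """Convert a duration into whole milliseconds for a retention.ms value."""
    # Floor-dividing two timedeltas returns an int, so no float rounding creeps in.
    return duration // timedelta(milliseconds=1)

print(to_retention_ms(timedelta(days=7)))    # 604800000
print(to_retention_ms(timedelta(hours=24)))  # 86400000
```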

If you don’t explicitly set retention.ms on a topic, the topic inherits the broker’s default retention period, which is 7 days out of the box (log.retention.hours=168).

Important!
There are three related settings on the broker: log.retention.ms, log.retention.minutes, and log.retention.hours. If more than one of them is set, log.retention.ms takes priority over log.retention.minutes, which in turn takes priority over log.retention.hours. These are broker-wide defaults; a topic-level retention.ms overrides whichever one wins. Choose your configurations carefully.
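To make the precedence concrete, here’s a small Python sketch that resolves the broker-wide default from a dictionary of server.properties values. The function name is hypothetical; the sketch assumes the stock default of log.retention.hours=168 when nothing is set and treats any negative result as -1 (no time limit).

```python
from typing import Mapping

DEFAULT_RETENTION_HOURS = 168  # stock default for log.retention.hours (7 days)

def effective_broker_retention_ms(props: Mapping[str, str]) -> int:
    """Resolve the broker-wide retention default, in milliseconds.

    Precedence: log.retention.ms, then log.retention.minutes, then log.retention.hours.
    """
    if "log.retention.ms" in props:
        ms = int(props["log.retention.ms"])
    elif "log.retention.minutes" in props:
        ms = int(props["log.retention.minutes"]) * 60 * 1000
    else:
        ms = int(props.get("log.retention.hours", DEFAULT_RETENTION_HOURS)) * 60 * 60 * 1000
    # Treat any negative value as -1, meaning "no time limit".
    return ms if ms >= 0 else -1

print(effective_broker_retention_ms({}))  # 604800000 (168 hours)
print(effective_broker_retention_ms({"log.retention.hours": "48", "log.retention.minutes": "90"}))  # 5400000
```

In the second call, minutes win over hours, so the result is 90 minutes rather than 48 hours.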


Why is kafka retention.ms Important?

Think of it as managing your clutter. Without retention limits, Kafka topics could grow indefinitely, eventually eating up all your disk space. The kafka retention.ms setting helps you strike a balance between:

  1. Disk Space Management: Keep only the data you need and delete the rest.
  2. Use Case Requirements: Retain messages long enough for your consumers to process them.
  3. Cost Efficiency: Storage isn’t free—longer retention times mean more disk usage.
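A quick back-of-the-envelope calculation shows how retention translates into disk. This Python sketch assumes a steady write rate and ignores compression, index files, and the extra time segments linger before deletion, so treat the result as a rough estimate rather than a capacity plan. The function name is hypothetical.

```python
def estimate_retained_bytes(bytes_per_sec: int, retention_ms: int, replication_factor: int) -> int:
    """Rough disk footprint of one topic across the cluster, in bytes."""
    # Each replica stores a full copy of the data kept for retention_ms.
    return bytes_per_sec * retention_ms * replication_factor // 1000

# 1 MB/s of writes, 7 days of retention, replication factor 3
total = estimate_retained_bytes(1_000_000, 7 * 86_400_000, 3)
print(f"{total / 1e12:.2f} TB")  # 1.81 TB
```

Dropping retention to 1 day in the same scenario brings the estimate down to roughly 259 GB, which is why retention is usually the first knob to check when disks fill up.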

How Does kafka retention.ms Work?

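Kafka evaluates the data in each partition against the retention period you’ve set for the topic. Once that period has passed, the data becomes eligible for deletion. Note that Kafka doesn’t delete individual messages: a partition is stored as a series of log segment files, and retention removes whole segments.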

Here’s the high-level process:

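  1. A message is produced and appended to the active (newest) log segment of a partition.
  2. Every message carries a timestamp, set either by the producer or by the broker, depending on the topic’s message.timestamp.type.
  3. A background task on the broker periodically checks each partition. Once the newest message in a segment is more than retention.ms old, the whole segment is deleted.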

For example, with retention.ms=86400000 (24 hours), messages become eligible for removal once they are more than 24 hours old; they are actually removed when the segment holding them is deleted.
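Kafka deletes whole log segments rather than individual messages, and a segment only becomes eligible once its newest record is more than retention.ms old. The Python sketch below models that rule in simplified form. The Segment class and expired_segments function are hypothetical, and the real broker checks additional conditions that this model leaves out.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    base_offset: int
    largest_timestamp_ms: int  # timestamp of the newest record in the segment

def expired_segments(segments: List[Segment], retention_ms: int, now_ms: int) -> List[Segment]:
    """Return the segments that time-based retention would delete (simplified model).

    Segments must be ordered oldest to newest. The scan starts at the oldest
    segment and stops at the first one that is still within retention.
    """
    if retention_ms < 0:  # -1 disables time-based retention
        return []
    expired = []
    for seg in segments:
        if now_ms - seg.largest_timestamp_ms > retention_ms:
            expired.append(seg)
        else:
            break
    return expired

segments = [
    Segment(base_offset=0, largest_timestamp_ms=50_000_000),
    Segment(base_offset=1_000, largest_timestamp_ms=120_000_000),
    Segment(base_offset=2_000, largest_timestamp_ms=199_000_000),  # active segment
]
print([s.base_offset for s in expired_segments(segments, 86_400_000, now_ms=200_000_000)])  # [0]
```

The segment at offset 1,000 survives even if it contains records older than 24 hours, because its newest record is only about 22 hours old.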


Configuring kafka retention.ms

Setting kafka retention.ms is simple, and you can do it at the topic level or as a broker-wide default.

Topic-Level Configuration:
If you want different retention times for different topics, set it when creating or updating the topic:
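For example, you can pass --config retention.ms=... to kafka-topics.sh when creating a topic, or use kafka-configs.sh with --alter and --add-config retention.ms=... on an existing one. The Python sketch below only assembles those command lines. The broker address localhost:9092, the topic name clicks, and the partition and replica counts are placeholder assumptions, and it assumes the Kafka scripts are on your PATH and accept --bootstrap-server.

```python
import shlex
from typing import List

def create_topic_cmd(bootstrap: str, topic: str, retention_ms: int,
                     partitions: int = 3, replication_factor: int = 1) -> List[str]:
    """Build a kafka-topics.sh command that creates a topic with its own retention.ms."""
    return [
        "kafka-topics.sh", "--bootstrap-server", bootstrap,
        "--create", "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
        "--config", f"retention.ms={retention_ms}",
    ]

def alter_retention_cmd(bootstrap: str, topic: str, retention_ms: int) -> List[str]:
    """Build a kafka-configs.sh command that changes retention.ms on an existing topic."""
    return [
        "kafka-configs.sh", "--bootstrap-server", bootstrap,
        "--alter", "--entity-type", "topics", "--entity-name", topic,
        "--add-config", f"retention.ms={retention_ms}",
    ]

print(shlex.join(create_topic_cmd("localhost:9092", "clicks", 86_400_000)))
print(shlex.join(alter_retention_cmd("localhost:9092", "clicks", 3_600_000)))
```

To actually run one of these, pass the list to subprocess.run(cmd, check=True). Building an argument list instead of a single string avoids shell-quoting surprises.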

Broker-Level Configuration:
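To set a default retention time for all topics on a broker, add a line such as log.retention.ms=604800000 (7 days) to the broker’s configuration file (server.properties). Note that the broker property carries the log. prefix, while the topic-level property is plain retention.ms.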

This acts as the fallback for topics that don’t have a specific retention.ms configured.
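The fallback rule fits in a few lines. In this Python sketch (hypothetical function and topic names), a topic’s own retention.ms wins and every other topic uses the broker default.

```python
from typing import Dict, Mapping

def effective_topic_retention(topics: Mapping[str, Mapping[str, str]],
                              broker_default_ms: int) -> Dict[str, int]:
    """Map each topic to the retention.ms it will actually use."""
    # A topic-level override wins; otherwise fall back to the broker default.
    return {
        name: int(overrides.get("retention.ms", broker_default_ms))
        for name, overrides in topics.items()
    }

topics = {
    "clicks": {"retention.ms": "3600000"},  # hypothetical topic with an override
    "orders": {},                           # hypothetical topic without one
}
print(effective_topic_retention(topics, broker_default_ms=604_800_000))
# {'clicks': 3600000, 'orders': 604800000}
```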


When to Adjust kafka retention.ms

The retention period you choose depends on your use case. Here are some common scenarios:

1. Short Retention for Real-Time Systems

If you’re building a real-time analytics platform, you probably don’t need to keep messages for weeks. A shorter retention period (e.g., 1 hour or 1 day) is sufficient.

Example: retention.ms=3600000 sets a 1-hour retention period; retention.ms=86400000 sets 1 day.

2. Long Retention for Historical Analysis

For applications that need historical data—like fraud detection or trend analysis—you’ll want a longer retention period, maybe weeks or even months.

Example: retention.ms=2592000000 sets a 30-day retention period.

3. Permanent Storage

Yes, Kafka can store messages indefinitely if you set retention.ms=-1. This is great for use cases where Kafka acts as an archive or primary data store. Just make sure you’ve got the disk space to back it up!

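Example: retention.ms=-1 turns off time-based deletion. Size-based retention (retention.bytes) also needs to stay at its default of -1, or Kafka will still delete old segments once a partition grows past that size.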


How kafka retention.ms Affects Consumers

One common question is: What happens to consumers when messages are deleted?

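Deletion doesn’t interrupt consumers that are keeping up, but it does change what they can read. Once messages are deleted:

  • New Consumers: They won’t see the old messages, because they’re gone. A new consumer group starts at the oldest message still available or at the end of the log, depending on its auto.offset.reset setting.
  • Existing Consumers: If they’ve already processed those messages, no problem. But if they try to rewind past the retention period, Kafka won’t have the data. The same goes for a consumer that falls too far behind: if its committed offset points at deleted data, it is repositioned according to auto.offset.reset (or fails, if that setting is none), and the messages it never read are skipped.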
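auto.offset.reset is a consumer setting (earliest, latest, or none) that decides where to start when a committed offset is missing or no longer exists. This simplified Python model covers only the second case, a committed offset that falls outside the range still on disk. The function name is hypothetical, and a real client configured with none raises an exception instead of returning.

```python
def resume_position(committed: int, log_start: int, log_end: int, auto_offset_reset: str) -> int:
    """Where a consumer resumes, given the range of offsets still on disk.

    Only models the out-of-range case (for example, after retention deleted data).
    """
    if log_start <= committed <= log_end:
        return committed  # data is still there, nothing is skipped
    if auto_offset_reset == "earliest":
        return log_start  # jump to the oldest message that survived retention
    if auto_offset_reset == "latest":
        return log_end    # skip everything and wait for new messages
    raise LookupError(f"offset {committed} is out of range and auto.offset.reset={auto_offset_reset!r}")

# A lagging consumer committed offset 500, but retention already removed offsets below 800.
print(resume_position(500, log_start=800, log_end=1_200, auto_offset_reset="earliest"))  # 800
```

In this scenario, offsets 500 through 799 are gone, and that consumer never sees them.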

Best Practices for kafka retention.ms

  1. Match Retention to Business Needs:
    Don’t just pick a number out of thin air. Think about how long your consumers need access to the data.
  2. Monitor Disk Usage:
    Use Kafka’s JMX metrics (for example, the per-partition kafka.log:type=Log,name=Size metric) to keep an eye on disk usage. Retention periods directly impact how much storage Kafka consumes.
  3. Use Compact Topics When Necessary:
    For keyed topics where only the latest value per key matters, consider log compaction (cleanup.policy=compact) instead of time-based retention. Compaction keeps the latest version of each key, regardless of age.
  4. Test Before You Commit:
    Shortening retention.ms can lead to data loss if consumers aren’t ready for it. Always test changes in a staging environment.
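Conceptually, compaction replays the log and keeps the last value written for each key. This Python sketch (a hypothetical compact function working on an in-memory list) shows only the end result. Real compaction runs in the background on closed segments, preserves offsets, and keeps tombstones (delete markers) for delete.retention.ms before dropping them; the sketch drops them right away.

```python
from typing import Dict, List, Optional, Tuple

def compact(log: List[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Keep only the latest value per key; a None value is a tombstone."""
    latest: Dict[str, Optional[str]] = {}
    for key, value in log:
        latest[key] = value  # later writes overwrite earlier ones
    # Keys whose latest value is a tombstone disappear entirely.
    return {k: v for k, v in latest.items() if v is not None}

log = [("user-1", "alice@old.example"), ("user-2", "bob@example.com"),
       ("user-1", "alice@new.example"), ("user-2", None)]
print(compact(log))  # {'user-1': 'alice@new.example'}
```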

Common Misconceptions

  1. Messages are deleted exactly when retention.ms expires.
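    Nope. Kafka deletes whole log segments, not individual messages, and a segment is only eligible once its newest message is more than retention.ms old. The active segment keeps accepting new messages until it rolls (controlled by segment.ms and segment.bytes), and the retention check itself runs periodically (log.retention.check.interval.ms, 5 minutes by default). With the default segment.ms of 7 days, data on a slow-moving topic can outlive retention.ms by days, not just a slight delay.
  2. Short retention saves all the disk space.
    It helps, but disk usage never drops to zero: Kafka still keeps the active segment, its index files, and any segments that haven’t expired yet, on every replica.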
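To see how large the gap from the first misconception can get, here’s a rough Python estimate of the worst case for a record written at the very start of a segment. It assumes steady traffic and the defaults segment.ms=604800000 (7 days) and log.retention.check.interval.ms=300000 (5 minutes), and it ignores size-based rolling via segment.bytes, which can make segments roll sooner. The function name is hypothetical.

```python
from datetime import timedelta

def approx_worst_case_deletion_ms(retention_ms: int,
                                  segment_ms: int = 604_800_000,
                                  check_interval_ms: int = 300_000) -> int:
    """Rough upper estimate of how long a record survives, in milliseconds.

    The segment rolls about segment_ms after its first record, its newest record
    then needs retention_ms to expire, and the retention check may run up to
    check_interval_ms later.
    """
    return segment_ms + retention_ms + check_interval_ms

# retention.ms = 1 hour with default segment settings
print(timedelta(milliseconds=approx_worst_case_deletion_ms(3_600_000)))  # 7 days, 1:05:00
```

Lowering segment.ms on a short-retention topic narrows this gap, at the cost of more, smaller segment files.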

Questions?

If you have questions or want to share how you use retention.ms in your projects, drop a comment below, and I’ll respond as fast as I can.

