Apache Kafka is a popular distributed streaming platform that thousands of companies worldwide use to build scalable, high-throughput, real-time streaming systems. Topic naming conventions have been one of the most hotly debated subjects around this technology for years. This post covers best practices for naming Kafka topics, how to decide on a naming convention, and the do’s and don’ts to keep in mind when you set up your systems.
Despite Kafka’s popularity among companies like The New York Times, Pinterest, and LinkedIn, there’s little guidance on naming Kafka topics. There is plenty of material on choosing the partition count and replication factor for your topics, but not much on how to name them.
You would typically choose topic names based on the conventions and practices followed at your company, or on your personal preferences. The few results that show up on Google for Kafka topic naming conventions mostly recommend something like one of these:
<namespace>.<product>.<event-type>
<application>-<data-type>-<event-type>
<organization>.<application-name>.<event-type>.<event>
Chris Riccomini, in his excellent blog post, says that he has had great success with the following convention:
<message type>.<dataset name>.<data name>
All of these look perfectly fine and meet the needs. However, is there more than what meets the eye? Let’s look more closely at what makes a good topic name.
What names are valid in Kafka?
Kafka enforces a set of “legal” characters that can constitute a topic name. Valid characters are the ASCII alphanumeric characters, ‘.’, ‘_’, and ‘-’. So any name made up entirely of characters from the following class can be a valid Kafka topic name.
val legalChars = "[a-zA-Z0-9\\._\\-]"
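As a rough illustration of the rule above, the following Python sketch checks a candidate name against the same character class. The function name is hypothetical. Beyond the character class, Kafka also rejects the names "." and ".." and names longer than 249 characters; those two checks are not mentioned in this post and are included here only for completeness.

```python
import re

# Same character class as above; the trailing "+" requires at least one character.
LEGAL_TOPIC_NAME = re.compile(r"[a-zA-Z0-9._-]+")

# Kafka's upper bound on topic name length (not discussed in the post itself).
MAX_NAME_LENGTH = 249


def is_legal_topic_name(name: str) -> bool:
    """Return True if `name` passes Kafka's basic topic-name rules."""
    if name in (".", ".."):
        return False
    if len(name) > MAX_NAME_LENGTH:
        return False
    # fullmatch ensures every character belongs to the legal class.
    return LEGAL_TOPIC_NAME.fullmatch(name) is not None
```

For example, is_legal_topic_name("orders.created") passes, while a name containing a space or a slash does not.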
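However, because of limitations in metric names, topic names that contain a period (‘.’) or an underscore (‘_’) can collide: Kafka replaces periods with underscores when it builds metric names, so two topics whose names differ only in those characters map to the same metric. To avoid issues, use one or the other, but not both.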
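One way to spot such collisions ahead of time is to normalize periods to underscores and group the names that end up identical. This is a minimal sketch; the function name is an assumption made for illustration and is not part of any Kafka API.

```python
from collections import defaultdict


def find_metric_collisions(topic_names):
    """Group topic names that become identical once '.' is replaced by '_'."""
    groups = defaultdict(list)
    # Deduplicate first so a name listed twice is not reported as a collision.
    for name in sorted(set(topic_names)):
        groups[name.replace(".", "_")].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

Given "orders.created" and "orders_created", the function returns both names in one group, which signals that one of them should be renamed before it is created.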
There is no control or way around what Kafka enforces. However, you can use this rule as a foundation to get creative and develop more standard naming conventions.
It is worth emphasizing that topic names are case-sensitive. So topicName is not the same as topicname or TopicName. Kafka would treat all three as separate topics.
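Because names that differ only in letter case are easy to create by accident, a small check can flag them. The sketch below is one possible approach, with a hypothetical function name.

```python
from collections import defaultdict


def find_case_variants(topic_names):
    """Group topic names that differ only in letter case."""
    groups = defaultdict(set)
    for name in topic_names:
        # Lowercasing is enough here because legal topic names are ASCII.
        groups[name.lower()].add(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Running it over topicName, topicname, and TopicName returns all three in a single group.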
Naming the topics
Let’s look at some guidelines that you should consider when naming your Kafka Topics.
Decide the format for the topic names.
The first and most important thing to decide is the format you want all your topics to follow. As mentioned, Kafka allows all ASCII alphanumeric characters, periods, underscores, and hyphens. You can format a topic name in several ways. Some examples are:
my-topic-name
myTopicName
my_topic_name
itlabs.mytopic.name
ITLabs-Website-Tracker
Deciding this as a first step is necessary to ensure consistency in your naming conventions and patterns. Readability and ease of understanding of the topic names play a key role in these decisions, and you don’t want inconsistent patterns in your system’s topic names.
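To make the consistency point concrete, here is a small sketch that classifies a name by format and lists the names that do not follow the chosen one. The style labels and regular expressions are assumptions picked for this example; adapt them to whatever format you settle on.

```python
import re

# Hypothetical style definitions; each requires at least one separator or hump.
STYLES = {
    "kebab-case": re.compile(r"[a-z0-9]+(-[a-z0-9]+)+"),
    "snake_case": re.compile(r"[a-z0-9]+(_[a-z0-9]+)+"),
    "dotted": re.compile(r"[a-z0-9]+(\.[a-z0-9]+)+"),
    "camelCase": re.compile(r"[a-z][a-z0-9]*([A-Z][a-z0-9]*)+"),
}


def detect_style(name):
    """Return the first style whose pattern matches the whole name, else 'other'."""
    for style, pattern in STYLES.items():
        if pattern.fullmatch(name):
            return style
    return "other"


def off_style(topic_names, expected):
    """List the names that do not follow the expected style."""
    return [name for name in topic_names if detect_style(name) != expected]
```

With these definitions, my-topic-name is classified as kebab-case, while ITLabs-Website-Tracker falls into "other" because it mixes capital letters with hyphens.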
What fields should be part of your Kafka topic naming conventions?
The next step in naming Kafka topics is defining what fields should go in the name and in what order they should appear. Some of the best practices Chris Riccomini suggests are:
Do not use fields that change.
Avoid fields in topic names that would change over time, such as the consumer name, the team name, or the owner of the topic. Once you create a topic in Kafka, it is impossible to rename it.
Leave metadata and schema information out of names
If the nature of the data in a topic, or any other information a field would carry, is available elsewhere, such as in Kafka’s topic metadata or a schema registry, leave it out of the topic name. The schema registry can provide you with information about the schema for a given topic. Kafka brokers provide topic metadata. Since there are other sources of truth for this information, it’s best to keep it out of the topic names.
Avoid partitions, security info, etc. in topic names
This is similar to the previous point. All the metadata, such as the partition count, security levels, and configurations, is available in the topic’s metadata and through the Kafka brokers. Avoid including these fields in topic names.
Don’t tie topic names to consumers or producers
Never decide on a topic name based on the producers or the consumers of that topic. The number of producers and the number of consumers can change over time. You should not include a dynamic field value that changes over time in a topic name.
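One way to apply the guidelines above is to build every topic name through a single helper that only accepts stable fields. The sketch below assumes the three-field, dot-separated layout quoted earlier (<message type>.<dataset name>.<data name>); the function name, the per-field pattern, and the sample values are illustrative assumptions, not part of that convention.

```python
import re

# Assumed rule for each field: lowercase letters and digits, optionally hyphenated.
FIELD = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def build_topic_name(message_type, dataset_name, data_name):
    """Join three stable fields into a dot-separated topic name."""
    fields = {
        "message_type": message_type,
        "dataset_name": dataset_name,
        "data_name": data_name,
    }
    for label, value in fields.items():
        # Rejecting '.' and '_' inside a field keeps the separator unambiguous.
        if not FIELD.fullmatch(value):
            raise ValueError(f"{label}={value!r} must be lowercase letters, digits, or hyphens")
    return ".".join(fields.values())
```

Because the signature has no slot for a team, owner, consumer, or partition count, those volatile values cannot leak into a name built this way.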
Enforcing Kafka Topic Naming Conventions
The first step toward making sure users adhere to the naming conventions is to prevent topics from being created ad hoc. You can do this by disabling automatic topic creation in Apache Kafka: set auto.create.topics.enable=false in the broker configuration.
Another way to ensure the naming conventions are followed is to automate the topic creation. Take the topic names as inputs and validate the pattern before creating them.
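A validated creation step could look roughly like the sketch below. The convention pattern (three lowercase, dot-separated fields) is an assumption for this example, and create_topic stands in for whatever function actually creates the topic in your environment; it is passed in as a plain callable so the sketch does not depend on any particular Kafka client.

```python
import re

# Assumed convention: exactly three dot-separated, lowercase, optionally hyphenated fields.
_FIELD = r"[a-z0-9]+(?:-[a-z0-9]+)*"
CONVENTION = re.compile(rf"{_FIELD}(?:\.{_FIELD}){{2}}")


def create_topic_checked(name, create_topic):
    """Validate `name` against the convention, then delegate to `create_topic`.

    `create_topic` is a hypothetical callable that takes the topic name and
    performs the real creation.
    """
    if not CONVENTION.fullmatch(name):
        raise ValueError(f"topic name {name!r} does not follow the naming convention")
    create_topic(name)
    return name
```

A name that fails validation raises before the creation callable is ever invoked, so nothing non-conforming reaches the cluster through this path.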
An automated script or utility should also monitor the topics in a Kafka Cluster to validate the topic names and flag any violations of the standard naming conventions and formats.
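The monitoring step can be sketched as a function that takes the current list of topic names and returns the ones that break the convention. How the list is obtained is left out on purpose; the default pattern and the function name are assumptions for illustration. Names starting with a double underscore are skipped because Kafka uses that prefix for internal topics such as __consumer_offsets.

```python
import re

# Assumed convention: three dot-separated fields of lowercase letters, digits, or hyphens.
DEFAULT_CONVENTION = re.compile(r"[a-z0-9-]+(\.[a-z0-9-]+){2}")


def audit_topics(topic_names, convention=DEFAULT_CONVENTION):
    """Return the sorted list of topic names that violate the convention."""
    violations = []
    for name in sorted(topic_names):
        if name.startswith("__"):
            # Kafka-internal topics are outside the team's naming convention.
            continue
        if not convention.fullmatch(name):
            violations.append(name)
    return violations
```

Run on a schedule, a check like this gives an early warning about topics that slipped past the creation process.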
Conclusion
While you can technically name your Kafka topics anything you want (as long as the name meets Kafka’s legal character rules), you should have a standard naming convention for the topics you create. This is essential to keep your Kafka environment from becoming cluttered. Such conventions and standards must be enforced early in the life of the environment because once you create a Kafka topic, it is impossible to rename it.