Tautik Agrahari

Kafka Essentials

image

Kafka is a message stream that holds the messages. Internally Kafka has topics.

Every topic has 'n' partitions. Message is sent to a topic and depending on the configured hash key, it is put into a partition.

Within partition, messages are ordered → no ordering guarantee across partitions.

When you increase number of partitions, existing messages stay as is - no messages would be sent to them.

Key Kafka Concepts

https://bear-images.sfo2.cdn.digitaloceanspaces.com/tautik/35pm.webp

Message Distribution: When you send a message to a topic, the message is sent to one of the partitions based on:

partition = hash(partition_key) % number_of_partitions

Limitation of Kafka: You cannot reduce the number of partitions (proposal for more consumers).

number_of_consumers <= number_of_partitions

For more consumers, you need more partitions:

When you increase partitions:

Gets records from one/some/all partitions (assigned).

Ordering Guarantees

Commit & Data Deletion

Commit: How do you tell Kafka that you have processed a message? You don't delete messages here (like in SQS). What you do is you say "hey I've read till this point" or "I commit till this message" so next time when a consumer from that consumer group wakes up, it gets messages from that point onwards.

Auto commit configuration is available, but the goal is you commit after you're done processing (or before, as per what you are building).

Data deletion: Because there's no delete API, Kafka has a data deletion policy which says "I'll delete a message older than 7 days, 14 days, 30 days" - you configure your retention period.

You need to know:

You expect your consumers to process messages within the retention period. So you set retention period with buffer - if you expect processing within 14 days, set retention to 30-40 days.