Apache Kafka for Beginners

Apache Kafka is generating a lot of buzz these days. When Kafka is used in the right way and for the right use case, it has unique features that make it a highly attractive option for data integration.

While LinkedIn, where Kafka originated, is the most well-known user, there are plenty of companies successfully using this technology.

So now that the word is out, it seems the world wants to know: What does it do? How do I use it?

In this post, I’ll try to answer those questions. I’ll begin by briefly introducing Kafka, and then demonstrate some of Kafka’s basic commands by walking through an example scenario.

What Is Kafka and How Do You Use It?

Kafka is very simple to describe at a high level, but has an unbelievable depth of technical detail when you dig deeper. The Kafka documentation does an excellent job of clarifying the many design and implementation subtleties in the system, so we will not attempt to clarify them all here. In summary, Kafka is a distributed publish-subscribe messaging system that is designed to be fast, durable, and scalable.

Like many publish-subscribe messaging systems, Kafka maintains feeds of messages in topics. Producers write data to topics and consumers read from them; the messages themselves are stored on the Kafka brokers, while ZooKeeper keeps cluster metadata. Since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes.

Messages are simply byte arrays, so developers can use them to store an object in any format, such as JSON or plain strings. It is possible to attach a key to each message, in which case the producer guarantees that all messages with the same key arrive at the same partition. When consuming from a topic, it is possible to organize a consumer group with multiple consumers. Each consumer in a consumer group reads messages from a unique subset of partitions in each topic it subscribes to, so each message is delivered to one consumer in the group, and all messages with the same key arrive at the same consumer.
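To make the key-to-partition idea concrete, here is a toy sketch in plain Python. It only illustrates the principle (Kafka’s default partitioner actually uses a murmur2 hash of the key bytes): hashing the key to pick a partition means messages with the same key always land in the same partition, in order.

```python
# Toy illustration of keyed partitioning: same key -> same partition.
# NOT Kafka's real partitioner (which uses murmur2); just the core idea.

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # Stable hash of the key, so a given key always maps to one partition.
    return sum(key.encode()) % NUM_PARTITIONS

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in [("user-1", "login"), ("user-2", "click"), ("user-1", "logout")]:
    partitions[partition_for(key)].append((key, value))

# Both "user-1" messages end up in the same partition, preserving their order.
```

Because a partition is consumed by exactly one consumer in a group, this is also what guarantees that all messages for a given key are processed by the same consumer.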

What makes Kafka unique is that it treats each topic partition as a log (an ordered set of messages). Each message in a partition is assigned a unique offset. Kafka does not attempt to track which messages were read by each consumer and retain only the unread ones; rather, it retains all messages for a set amount of time, and consumers are responsible for tracking their own position in each log. As a result, Kafka can support a large number of consumers and retain large amounts of data with very little overhead.
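The log-plus-offset model described above can be sketched in a few lines. This is an in-memory toy, not Kafka itself: the "broker" only appends messages and never tracks readers, while each consumer remembers its own offset.

```python
# Toy partition log: the broker appends messages and retains them all;
# it does not track which consumer has read what.
log = []
for msg in ["m0", "m1", "m2", "m3"]:
    log.append(msg)  # a message's offset is simply its index in the log

# Each consumer tracks its own position independently.
offsets = {"consumer-a": 0, "consumer-b": 2}

def poll(consumer: str, max_records: int = 10):
    start = offsets[consumer]
    records = log[start:start + max_records]
    offsets[consumer] = start + len(records)  # the consumer advances its own offset
    return records

# consumer-a reads from the beginning; consumer-b resumes at offset 2.
```

Because the broker keeps no per-consumer state, adding another consumer is just another entry in the offsets map, which is why many consumers can read the same log cheaply.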

Kafka Core APIs

Producer API: This API is used to publish a stream of records to one or more Kafka topics.

Consumer API: This API is used to subscribe to one or more topics and process the stream of records produced to them.

Streams API: This API is used to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.

Connector API: This API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems.
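As a rough mental model of how these APIs fit together, here is a pure-Python toy (not the real Kafka client libraries; the topic names and records are made up for illustration): a producer appends records to a topic, a stream processor transforms one topic into another, and a consumer reads the result.

```python
# In-memory stand-ins for topics; real code would use Kafka client libraries.
topics = {"orders": [], "order-totals": []}

def produce(topic, record):           # plays the role of the Producer API
    topics[topic].append(record)

def consume(topic):                   # plays the role of the Consumer API
    return list(topics[topic])

def stream_process(src, dst, fn):     # plays the role of the Streams API:
    for record in consume(src):       # consume an input topic and produce a
        produce(dst, fn(record))      # transformed output topic

produce("orders", {"id": 1, "qty": 2, "price": 5.0})
produce("orders", {"id": 2, "qty": 1, "price": 3.5})
stream_process("orders", "order-totals",
               lambda r: {"id": r["id"], "total": r["qty"] * r["price"]})
```

The Connector API covers the remaining edge of this picture: instead of your own produce/consume calls, reusable connectors move data between Kafka topics and external systems such as databases.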

Basic Command Usage

To Start the ZooKeeper Server

bin\windows\zookeeper-server-start.bat config\

To Start the Kafka Server

bin\windows\kafka-server-start.bat config\

To Create a Topic

bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic a2cart
Created topic "a2cart".

To List Topics

bin\windows\kafka-topics.bat --list --zookeeper localhost:2181

To Send Some Messages via the Broker

bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic a2cart

To Consume Messages via the Consumer

bin\windows\kafka-console-consumer.bat --zookeeper localhost:2181 --topic a2cart --from-beginning

To Describe Topic Details

bin\windows\kafka-topics.bat --describe --zookeeper localhost:2181

Setting up a multi-broker cluster

Copy the broker properties file once for each additional broker, then change the IP address and port for the broker in the server 1 and server 2 properties files.

cp config/ config/
cp config/ config/
