Apache Kafka for Beginners
Apache Kafka is generating a lot of buzz these days. When Kafka is used in the right way and for the right use case, its unique features make it a highly attractive option for data integration.
While LinkedIn, where Kafka originated, is its most well-known user, many other companies are successfully using this technology.
So now that the word is out, it seems the world wants to know: What does it do? How do you use it?
In this post, I’ll try to answer those questions. I’ll begin by briefly introducing Kafka, and then demonstrate some of Kafka’s basic commands by walking through an example scenario.
What Is Kafka and How Do You Use It?
Kafka is very simple to describe at a high level, but has an unbelievable depth of technical detail when you dig deeper. The Kafka documentation does an excellent job of clarifying the many design and implementation subtleties in the system, so we will not attempt to clarify them all here. In summary, Kafka is a distributed publish-subscribe messaging system that is designed to be fast, durable, and scalable.
Like many publish-subscribe messaging systems, Kafka maintains feeds of messages in topics. Producers write data to topics and consumers read from topics; the messages themselves are stored on brokers, while ZooKeeper keeps cluster metadata. Since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes.
Messages are simply byte arrays, and developers can use them to store any object in any format – JSON, plain strings, and so on. It is possible to attach a key to each message, in which case the producer guarantees that all messages with the same key will arrive at the same partition. When consuming from a topic, it is possible to organize a consumer group with multiple consumers. Each consumer in a consumer group will read messages from a unique subset of partitions in each topic they subscribe to, so each message is sent to one consumer in the group, and all messages with the same key arrive at the same consumer.
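The key-to-partition idea can be sketched in a few lines of Python. This is illustrative only: Kafka's default partitioner actually uses a murmur2 hash of the serialized key, but the principle is the same – a deterministic hash of the key modulo the partition count.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition deterministically.

    Illustrative stand-in for Kafka's default partitioner,
    which uses murmur2 rather than MD5.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Messages with the same key always land in the same partition,
# so one consumer in the group sees all of them, in order.
assert partition_for("user-42", 3) == partition_for("user-42", 3)
```

Because the mapping depends only on the key and the partition count, per-key ordering is preserved as long as the number of partitions does not change.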
What makes Kafka unique is that it treats each topic partition as a log (an ordered set of messages). Each message in a partition is assigned a unique offset. Kafka does not attempt to track which messages were read by each consumer and retain only unread messages; rather, it retains all messages for a set amount of time, and consumers are responsible for tracking their own position in each log. Consequently, Kafka can support a large number of consumers and retain large amounts of data with very little overhead.
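A toy model makes the offset mechanics concrete. The Python sketch below models one topic partition as an in-memory append-only log; real Kafka persists the log to disk and expires old segments by time or size, and the `PartitionLog` class here is purely hypothetical.

```python
class PartitionLog:
    """Toy model of one Kafka topic partition as an append-only log."""

    def __init__(self):
        self.messages = []  # a message's offset is its index in this list

    def append(self, msg: bytes) -> int:
        self.messages.append(msg)
        return len(self.messages) - 1  # offset of the newly appended message

    def read(self, offset: int, max_count: int = 10):
        return self.messages[offset:offset + max_count]

log = PartitionLog()
for payload in (b"m0", b"m1", b"m2"):
    log.append(payload)

# Each consumer tracks its own position; the broker does not.
consumer_offset = 0
batch = log.read(consumer_offset)
consumer_offset += len(batch)  # advance only after processing the batch
```

Because the broker only appends and serves ranges, adding more consumers costs almost nothing: each one just keeps its own offset.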
Kafka Core APIs
Producer API: This API is used to publish a stream of records to one or more Kafka topics.
Consumer API: This API is used to subscribe to one or more topics and process the stream of records produced to them.
Streams API: This API is used to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
Connector API: Allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems.
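To give a feel for the Producer and Consumer APIs, here is a minimal sketch using the third-party kafka-python client. It assumes a broker running on localhost:9092 and the a2cart topic created in the walkthrough below; the group id is a made-up example.

```python
# Requires the third-party kafka-python package (pip install kafka-python)
# and a broker reachable at localhost:9092.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("a2cart", key=b"user-42", value=b"hello kafka")
producer.flush()  # block until the message is actually sent

consumer = KafkaConsumer(
    "a2cart",
    bootstrap_servers="localhost:9092",
    group_id="demo-group",          # consumers sharing a group split partitions
    auto_offset_reset="earliest",   # start from the oldest retained message
)
for record in consumer:
    print(record.offset, record.key, record.value)
    break
```

Note how the consumer names a group: run two copies of this script with the same group_id and Kafka will divide the topic's partitions between them.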
Basic Commands Usage
To Start Zookeeper Server
bin\windows\zookeeper-server-start.bat config\zookeeper.properties
To Start Kafka Server
bin\windows\kafka-server-start.bat config\server.properties
To Create a topic
bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic a2cart
Created topic "a2cart".
To List Topics
bin\windows\kafka-topics.bat --list --zookeeper localhost:2181
To Send Some Messages via the Producer
bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic a2cart
To Consume Messages via the Consumer
bin\windows\kafka-console-consumer.bat --zookeeper localhost:2181 --topic a2cart --from-beginning
To Describe Topic Details
bin\windows\kafka-topics.bat --describe --zookeeper localhost:2181
Setting up a multi-broker cluster
Copy server.properties into server-1.properties and server-2.properties. Then give each copy its own broker id, port, and log directory, since all three brokers will run on the same machine.
cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties
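Following the Kafka quickstart, the edits to the two copies would look roughly like this (the ports and log directories below are the quickstart's example values, not requirements):

```
# config/server-1.properties
broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-1

# config/server-2.properties
broker.id=2
listeners=PLAINTEXT://:9094
log.dirs=/tmp/kafka-logs-2
```

The broker.id must be unique for every node in the cluster; the distinct ports and log directories are only needed because these brokers share one machine.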