Tuesday, October 12, 2021

Kafka vs. RabbitMQ - Let's Compare Both

Hello Friends,

I have published posts about "Apache Kafka" before; check them first if you want to learn its basic concepts.

> What is Apache Kafka?

> Apache Kafka, Set It Up


So, now let's compare Apache Kafka with RabbitMQ.


What is it?

Kafka - Apache Kafka is an event streaming platform. In simple words, it is a message bus optimised for high-ingress data streams and replay.
RabbitMQ - RabbitMQ is the most widely deployed open-source message broker. It is a solid, mature, general-purpose message broker that supports standardised protocols such as AMQP, STOMP, MQTT, HTTP, and WebSockets.


Asynchronous Messaging Patterns

Kafka - Apache Kafka isn’t an implementation of a message broker. Instead, it’s a distributed streaming platform. Kafka’s storage layer is implemented using a partitioned transaction log. Kafka also provides a Streams API to process streams in real time and a Connect API for easy integration with various data sources.
RabbitMQ - Based on queues and exchanges, RabbitMQ is an implementation of a message broker, often referred to as a service bus. It natively supports both ephemeral and durable messaging patterns.


Primary Use

Kafka - Build applications that process and re-process streamed data on disk.
RabbitMQ - It handles high-throughput, reliable background jobs, plus communication and integration within and between applications.


Support for High Availability

Kafka - Yes, through replication of topic partitions across brokers.
RabbitMQ - Yes, through clustering and queue replication across nodes.
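For example, assuming a Kafka cluster with at least three brokers, a topic can be created with a replication factor so that each partition is copied to multiple brokers (a sketch; the topic name is just an illustration):

    $ bin/kafka-topics.sh --create --topic ha-demo \
        --partitions 3 --replication-factor 3 \
        --bootstrap-server localhost:9092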


Federated Queues

Kafka - No, Kafka does not support federated queues.
RabbitMQ - Supports federated queues. This feature offers a way of balancing the load of a single queue across nodes or clusters.
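Federation ships as a RabbitMQ plugin and has to be enabled before use; these are the standard plugin commands:

    $ rabbitmq-plugins enable rabbitmq_federation
    $ rabbitmq-plugins enable rabbitmq_federation_management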


Management & Monitoring

Kafka - Yes: command-line tools, JMX metrics, and GUIs such as Kafka Tool; an HTTP API is available via add-ons.
RabbitMQ - Yes: an HTTP API, a command-line tool, and a web UI (dashboard) for managing and monitoring RabbitMQ.
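RabbitMQ's dashboard and HTTP API come from the management plugin, enabled like this (the UI then listens on port 15672 by default):

    $ rabbitmq-plugins enable rabbitmq_management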


Retry logic and dead-letter

Kafka - No, not out of the box; retries and dead-lettering must be implemented at the application level.
RabbitMQ - Yes, it supports delivery retries and dead-letter queues.


Message Time-To-Live (TTL)

Kafka - No
RabbitMQ - Yes
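As an illustration, a per-queue message TTL can be applied in RabbitMQ via a policy; this sketch sets a 60-second TTL on all queues (the policy name and queue pattern are arbitrary examples):

    $ rabbitmqctl set_policy TTL ".*" '{"message-ttl":60000}' --apply-to queues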


Delayed/scheduled messages

Kafka - No
RabbitMQ - Yes


Message retention

Kafka - Kafka persists all messages by design, up to a configured retention period per topic. With regard to message retention, Kafka does not track the consumption status of its consumers, since it acts as a message log.
Consumers can consume every message as many times as they want, and they can travel back and forth “in time” by manipulating their partition offset. Periodically, Kafka reviews the age of messages in topics and evicts those that are old enough.
RabbitMQ - RabbitMQ removes messages from storage as soon as consumers successfully consume them. This behaviour cannot be modified, and it is part of almost all message brokers’ design.
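To sketch Kafka's configurable retention, the retention period of an existing topic can be changed with kafka-configs.sh (here, seven days expressed in milliseconds; the topic name reuses the quickstart example):

    $ bin/kafka-configs.sh --bootstrap-server localhost:9092 \
        --alter --entity-type topics --entity-name quickstart-events \
        --add-config retention.ms=604800000

And to “travel back in time”, a consumer group's offsets can be reset, assuming the group already exists (the group name is hypothetical):

    $ bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
        --group my-group --topic quickstart-events \
        --reset-offsets --to-earliest --execute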


Fault handling

Kafka - Kafka does not provide any dead-letter mechanism out of the box. With Kafka, it is up to us to provide and implement message retry mechanisms at the application level.
RabbitMQ - Provides tools such as delivery retries and dead-letter exchanges (DLX) to handle message-processing failures, whether transient or persistent (such as a bug in the consumer).
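As a minimal sketch of dead-lettering, a policy can point queues at a dead-letter exchange; the exchange name "my-dlx" below is a made-up example:

    $ rabbitmqctl set_policy DLX ".*" '{"dead-letter-exchange":"my-dlx"}' --apply-to queues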


Scale

Kafka - Yes, its partitioned architecture means it scales better horizontally (scale-out).
RabbitMQ - Yes, though it scales better vertically (scale-up).
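Kafka's horizontal scaling is visible at the topic level: the partition count can be raised so more consumers can read in parallel (a sketch reusing the quickstart topic):

    $ bin/kafka-topics.sh --bootstrap-server localhost:9092 \
        --alter --topic quickstart-events --partitions 6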


Hosted Solution & Enterprise Support

Kafka - Yes, CloudKarafka
RabbitMQ - Yes, CloudAMQP


When to Use Which?
Kafka is preferable when we require:
1. Strict message ordering.
2. Message retention for extended periods, including the possibility of replaying past messages.
3. The ability to reach a high scale when traditional solutions do not suffice.


RabbitMQ is preferable when we need:
1. Advanced and flexible routing rules.
2. Message timing control (controlling either message expiry or message delay).
3. Advanced fault handling capabilities, in cases when consumers are more likely to fail to process messages (either temporarily or permanently).
4. Simpler consumer implementations.



Bottom line:
While RabbitMQ and Kafka are sometimes interchangeable, their implementations are very different from each other. As a result, we can’t view them as members of the same category of tools; one is a message broker, and the other is a distributed streaming platform.
So in an event-driven architecture-based system, we could use RabbitMQ to send commands between services and use Kafka to implement business event notifications.

I hope this post helps you choose between Kafka and RabbitMQ for your backend architecture.


Thank You

Wednesday, October 6, 2021

Apache Kafka, Set It Up

A short introduction to Kafka

Kafka is an event-based messaging system that safely moves data between systems.

Related Post
Before you start, I suggest reading my previous post, "What is Apache Kafka?".

Let's talk about setup.
You need Java running on your system; let's check the current Java version. Just open a terminal and execute "java -version".

  $ java -version
  java version "1.8.0_301"
  Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
  Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)
  
Once you have Java running on your system, follow these steps to set up and run Kafka:

1. Download & start the services
a. Get the latest Kafka release from the official Kafka downloads page.
b. Unzip the archive and navigate to the extracted directory in a terminal.
c. Run the following commands to start all services in the correct order (use a new terminal for each command):

    Start the ZooKeeper service
    $ bin/zookeeper-server-start.sh config/zookeeper.properties

    Start the Kafka broker service
    $ bin/kafka-server-start.sh config/server.properties

2. Create a topic to store your events.
Kafka is a distributed event streaming platform that lets you read, write, store, and process events (also called records or messages in the documentation) across many machines.
Example events are payment transactions, geolocation updates from mobile phones, shipping orders, sensor measurements from IoT devices or medical equipment, and much more. These events are organized and stored in topics. Very simplified, a topic is similar to a folder in a filesystem, and the events are the files in that folder.
So before you can write your first events, you must create a topic (named "quickstart-events"). Open another terminal session and run:
$ bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092
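You can verify the topic was created, and inspect its partition count and replication details, with the --describe flag:

$ bin/kafka-topics.sh --describe --topic quickstart-events --bootstrap-server localhost:9092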

3. Write some events into the topic
A Kafka client communicates with the Kafka brokers via the network for writing (or reading) events. Once received, the brokers will store the events in a durable and fault-tolerant manner for as long as you need—even forever.
Run the console producer client to write a few events into your topic. By default, each line you enter will result in a separate event being written to the topic.

$ bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092
  This is my first event
  This is my second event
    
4. Read the events
Open another terminal session and run the console consumer client to read the events you just created:

$ bin/kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092
This is my first event
This is my second event
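Because Kafka retains events, you can stop and rerun the consumer at any time and read them again. If you instead pass a --group flag, the consumer commits its offsets as part of a consumer group, so a rerun without --from-beginning shows only new events (the group name is just an example):

$ bin/kafka-console-consumer.sh --topic quickstart-events \
    --group quickstart-group --bootstrap-server localhost:9092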



5. Higher-level diagram

So that was the setup of "Apache Kafka"; in the next post we will use Kafka in a NodeJS application.


What is Apache Kafka?

Introduction
Apache Kafka is an event streaming platform.

Let's talk about it in detail.
Kafka is an event-based messaging system that safely moves data between systems. Depending on how each component is configured, it can act as a transport for real-time event tracking or as a replicated distributed database.
Kafka combines three key capabilities so you can implement your use cases for event streaming end-to-end with a single battle-tested solution:
  • To publish (write) and subscribe to (read) streams of events, including continuous import/export of your data from other systems.
  • To store streams of events durably and reliably for as long as you want.
  • To process streams of events as they occur or retrospectively.
And all this functionality is provided in a distributed, highly scalable, elastic, fault-tolerant, and secure manner.

How does Apache Kafka work?
Kafka is a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol.

Servers: Kafka is run as a cluster of one or more servers that can span multiple datacenters or cloud regions. Some of these servers form the storage layer, called the brokers. Other servers run Kafka Connect to continuously import and export data as event streams to integrate Kafka with your existing systems such as relational databases as well as other Kafka clusters.

Clients: They allow you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner even in the case of network problems or machine failures. Kafka ships with some such clients included, which are augmented by dozens of clients provided by the Kafka community: clients are available for Java and Scala including the higher-level Kafka Streams library, for Go, Python, C/C++, and many other programming languages as well as REST APIs.

Kafka APIs
In addition to command line tooling for management and administration tasks, Kafka has five core APIs for Java and Scala:
  1. The Admin API to manage and inspect topics, brokers, and other Kafka objects.
  2. The Producer API to publish (write) a stream of events to one or more Kafka topics.
  3. The Consumer API to subscribe to (read) one or more topics and to process the stream of events produced to them.
  4. The Kafka Streams API to implement stream processing applications and microservices.
  5. The Kafka Connect API to build and run reusable data import/export connectors that consume (read) or produce (write) streams of events from and to external systems and applications so they can integrate with Kafka.
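The bundled shell tools build on these APIs; for instance, listing topics goes through the Admin API under the hood:

    $ bin/kafka-topics.sh --list --bootstrap-server localhost:9092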

Let's see some use cases
  • Messaging: Kafka works well as a replacement for a more traditional message broker. Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc). In comparison to most messaging systems Kafka has better throughput, built-in partitioning, replication, and fault-tolerance which makes it a good solution for large scale message processing applications.
  • Website Activity Tracking: The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type.
  • Metrics: Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralised feeds of operational data.
  • Log Aggregation: Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files.
  • Stream Processing: Many users of Kafka process data in processing pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing.
  • Event Sourcing: Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.

Let's see some common terms

Term - Description

Cluster - The collective group of machines that Kafka is running on.

Broker - A single Kafka instance.

Topic - Topics are used to organize data. You always read and write to and from a particular topic.

Partition - Data in a topic is spread across a number of partitions. Each partition can be thought of as a log file, ordered by time. To guarantee that you read messages in the correct order, only one instance can read from a particular partition at a time.

Producer - A client that writes data to one or more Kafka topics.

Consumer - A client that reads data from one or more Kafka topics.

Replica - Partitions are typically replicated to one or more brokers to avoid data loss.

Leader - Although a partition may be replicated to one or more brokers, a single broker is elected the leader for that partition, and it is the only one allowed to write or read to/from that partition.

Consumer group - A collective group of consumer instances, identified by a groupId. In a horizontally scaled application, each instance would be a consumer, and together they would act as a consumer group.

Group coordinator - An instance in the consumer group that is responsible for assigning partitions to the consumers in the group.

Offset - A certain point in the partition log. When a consumer has consumed a message, it "commits" that offset, meaning that it tells the broker that the consumer group has consumed that message. If the consumer group is restarted, it will restart from the highest committed offset.
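To see consumer groups and committed offsets in practice, the bundled kafka-consumer-groups.sh tool describes a group's position per partition; its output includes CURRENT-OFFSET, LOG-END-OFFSET, and LAG columns (the group name here is a hypothetical example):

    $ bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
        --describe --group quickstart-group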



So this was an introductory post about "Apache Kafka"; in the next post I will explain its basic setup.