What are the differences between Apache Kafka and RabbitMQ?

(Updated May 2017 - it’s been 4.5 years!)

Kafka is a general purpose message broker, like RabbitMQ, with similar distributed deployment goals, but with very different assumptions on message model semantics. I would be skeptical of the "AMQP is more mature" argument and instead look at the facts of how each solution solves your problem.

TL;DR:

a) Use Kafka if you have a fire hose of events (20k+/sec per producer) you need delivered in partitioned order 'at least once' with a mix of online and batch consumers, but most importantly you’re OK with your consumers managing the state of your “cursor” on the Kafka topic.

Kafka’s main superpower is that it is less like a queue system and more like a circular buffer that scales with the disk available on your cluster, and thus lets you re-read messages.
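To make the "cursor" idea concrete, here is a minimal sketch in plain Python. It is a toy model, not the Kafka client API: the names TopicLog, Consumer, poll and seek are made up for illustration. The point is only that the log keeps the messages while each consumer keeps its own position, so re-reading is just moving the position back.

```python
class TopicLog:
    """Toy append-only log: it stores messages, not consumer positions."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the message just written

    def read(self, offset, max_messages=10):
        # Reading never removes anything from the log.
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Hypothetical consumer: it, not the log, remembers how far it has read."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_messages=10):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)  # advance our own cursor
        return batch

    def seek(self, offset):
        self.offset = offset  # rewind (or skip ahead) at will


log = TopicLog()
for event in ["a", "b", "c"]:
    log.append(event)

online = Consumer(log)
first_pass = online.poll()   # reads everything written so far
online.seek(0)
replay = online.poll()       # same messages again, nothing was consumed away

late_batch = Consumer(log, offset=1)
late_read = late_batch.poll()  # an independent consumer with its own cursor
```

Two consumers reading the same log do not interfere with each other, which is what makes the mix of online and batch consumers cheap in this model.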

b) Use Rabbit if you have messages (20k+/sec per queue) that need to be routed in complex ways to consumers, you want per-message delivery guarantees, you need one or more features of protocols like AMQP 0.9.1, 1.0, MQTT, or STOMP, and you want the broker to manage the state of which consumer has been delivered which message.

RabbitMQ’s main superpowers are that it’s a scalable, high performance queue system with well-defined consistency rules, and the ability to create interesting exchange topologies.

Neither offers "filter/processing" capabilities - if you need that, consider using a data flow or stream processing framework - there are many: Apache Beam (which is an abstraction on top of Google Dataflow, Flink, Spark, or Apex), Storm, NiFi, direct use of Apex, Flink, or Spark or Spring Cloud Data Flow on top of one of these solutions to add computation, filtering, querying, on your streams. You may also want to use something like Apache Cassandra or Geode or Ignite as your queryable stream cache.

Kafka traditionally hasn’t offered transactional semantics in its writes, though this is changing in 0.11.

Pivotal has recently published a reasonably fair post on when to use RabbitMQ or Kafka, which I provided some input into. Pivotal is the owner of RabbitMQ but is also a fan of using the right tool for the job, and encouraging open source innovation … and thus is a fan of Kafka!

Details:

Firstly, on RabbitMQ vs. Kafka. They are both excellent solutions, RabbitMQ being more mature, but both have very different design philosophies. Fundamentally, I'd say RabbitMQ is broker-centric, focused around delivery guarantees between producers and consumers, with transient preferred over durable messages. Whereas Kafka is producer-centric, based around partitioning a fire hose of event data into durable message brokers with cursors, supporting batch consumers that may be offline, or online consumers that want messages at low latency.

RabbitMQ uses the broker itself to maintain state of what's consumed (via message acknowledgements) - it uses Erlang's Mnesia to maintain delivery state around the broker cluster. Kafka doesn't have message acknowledgements; it assumes the consumer keeps track of what's been consumed so far. Kafka brokers use Zookeeper to reliably maintain their state across a cluster.
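As a rough illustration of broker-managed delivery state, here is a toy queue in plain Python. It is not RabbitMQ's implementation or any client library's API; AckQueue and its methods are hypothetical names. It only shows the shape of the idea: the broker remembers every delivered-but-unacknowledged message, and puts it back if the consumer rejects it.

```python
from collections import deque


class AckQueue:
    """Toy queue where the broker tracks deliveries until they are acked."""

    def __init__(self):
        self._ready = deque()   # messages waiting to be delivered
        self._unacked = {}      # delivery tag -> message still in flight
        self._next_tag = 1

    def publish(self, message):
        self._ready.append(message)

    def deliver(self):
        if not self._ready:
            return None
        message = self._ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self._unacked[tag] = message  # broker remembers the delivery
        return tag, message

    def ack(self, tag):
        del self._unacked[tag]  # acknowledged: the broker can forget it

    def nack(self, tag):
        # Rejected: requeue at the head so the original order is kept.
        self._ready.appendleft(self._unacked.pop(tag))

    def in_flight(self):
        return len(self._unacked)


queue = AckQueue()
queue.publish("order-1")
queue.publish("order-2")

tag, message = queue.deliver()
queue.nack(tag)                      # consumer failed, message goes back
tag, redelivered = queue.deliver()   # same message is delivered again
queue.ack(tag)
tag, next_message = queue.deliver()
```

Contrast this with a log plus consumer-held offset: here the consumer holds no position at all, and once a message is acked it is gone from the broker's point of view.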

RabbitMQ presumes that consumers are mostly online, and any messages "in wait" (persistent or not) are held opaquely (i.e. no cursor). Kafka was based from the beginning around both online and batch consumers, and also has producer message batching - it's designed for holding and distributing large volumes of messages.

RabbitMQ provides rich routing capabilities with AMQP 0.9.1's exchange, binding and queuing model. Kafka has a very simple routing approach - in AMQP parlance it uses topic exchanges only.
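For readers who haven't used AMQP 0.9.1 topic exchanges: a binding pattern is a dot-separated list of words where * matches exactly one word and # matches zero or more words. The sketch below is a plain-Python illustration of that matching rule, not broker code; the binding table and queue names are invented for the example.

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP-style topic pattern matches a routing key."""
    return _match(pattern.split("."), routing_key.split("."))


def _match(pattern_words, key_words):
    if not pattern_words:
        return not key_words  # both exhausted means a full match
    head, rest = pattern_words[0], pattern_words[1:]
    if head == "#":
        # '#' may absorb zero or more words, so try every split point.
        return any(_match(rest, key_words[i:]) for i in range(len(key_words) + 1))
    if not key_words:
        return False
    if head == "*" or head == key_words[0]:
        return _match(rest, key_words[1:])
    return False


# Hypothetical bindings: (pattern, queue name).
bindings = [
    ("orders.*.created", "billing"),
    ("orders.#", "audit"),
    ("#", "firehose"),
]


def route(routing_key):
    """List the queues that would receive a message with this routing key."""
    return [queue for pattern, queue in bindings if topic_matches(pattern, routing_key)]


created = route("orders.eu.created")
cancelled = route("orders.eu.item.cancelled")
payment = route("payments.failed")
```

With these bindings, one published message can fan out to several queues depending on its routing key, and that decision is made in the broker rather than by the producer.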

Both solutions run as distributed clusters, but RabbitMQ's philosophy is to make the cluster transparent, as if it were a virtual broker. Kafka makes partitions explicit by forcing the producer to know it is partitioning a topic's messages across several nodes; this has the benefit of preserving ordered delivery within a partition.
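Here is a small sketch of why keyed partitioning preserves order per key. It is plain Python with in-memory lists standing in for partitions, and the CRC32 hash is my own choice for the example, not the hash Kafka's clients use. All events with the same key land in the same partition, so their relative order survives even though there is no global order across partitions.

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for the example


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # A stable hash means the same key always maps to the same partition.
    # CRC32 is an illustrative choice only.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-2", "logout"),
    ("user-1", "logout"),
]

# The producer picks the partition; each partition is an append-only list.
for key, action in events:
    partitions[partition_for(key)].append((key, action))


def events_for(key):
    """Read one key's events back from its partition, in stored order."""
    return [action for k, action in partitions[partition_for(key)] if k == key]


user1_actions = events_for("user-1")
user2_actions = events_for("user-2")
```

The trade-off is that the producer has to care about the key it partitions on, which is exactly the "explicit" part of the design.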

RabbitMQ ensures queued messages are stored in published order even in the face of requeues or channel closure. One can set up a topology and ordered delivery similar to Kafka's using the consistent hash exchange or sharding plugin, or build even more interesting topologies.

Put another way, Kafka presumes that producers generate a massive stream of events on their own timetable - there's no room for throttling producers because consumers are slow, since the data is too massive. The whole job of Kafka is to provide the "shock absorber" between the flood of events and those who want to consume them in their own way -- some online, others offline - only batch consuming on an hourly or even daily basis.

Performance-wise, both are excellent performers, but have major architectural differences. RabbitMQ has demonstrated setups of over a million messages/sec; Kafka has demonstrated setups of several million messages/sec. The primary architectural difference is that RabbitMQ handles its messages largely in-memory and thus uses a large cluster in these benchmarks (30+ nodes), whereas Kafka proudly leverages the powers of sequential disk I/O and requires less hardware (this benchmark uses 3x 6 core / 32 GB RAM nodes).

This older paper indicates Kafka handled 500,000 messages published per second and 22,000 messages consumed per second on a 2-node cluster with 6-disk RAID 10.
http://research.microsoft.com/en...

Now, a word on AMQP. Frankly, it seems the standard was a mess but has stabilized. Officially there is a 1.0 specification standardized by OASIS. In practice it is a forked standard, with 0.9.1 being broadly deployed in production, and a smaller number of users of 1.0.

AMQP has lost some of its sheen and momentum, but it has already succeeded in its goal of helping to break the hold TIBCO had on high performance, low latency messaging through 2007 or so. Now there are many options.

quotes:

https://www.quora.com/What-are-the-differences-between-Apache-Kafka-and-RabbitMQ