With Kafka, on the other hand, when you write a message to the broker, the broker immediately writes it to its on-disk queue rather than holding it in memory. But isn't that slower? No, it's not, because the writes land in the OS page cache, which is managed more efficiently than garbage-collected memory. Then, when consuming, rather than keeping metadata for each individual message being received, consumers simply have a log position -- they periodically commit, which tells the broker that all of the messages up to that point have been consumed. If they never commit, eventually another consumer will get those messages.
So basically, it scales a ton better because you're just doing scads of sequential I/O with occasional commits, rather than tracking a bunch of messages in memory individually (which in theory should be fast but causes GC problems).
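To make that concrete, here's a toy sketch in Python. It is not Kafka's code or API: `ToyLog`, `consume`, `demo`, and the group name are all made up for illustration, and the "broker" is just a newline-delimited file. It only shows the shape of the idea: appends are sequential writes, and the only per-consumer state is one committed offset.

```python
import os
import tempfile


class ToyLog:
    """Hypothetical stand-in for one broker partition: an append-only file."""

    def __init__(self, path):
        self.path = path
        open(path, "a", encoding="utf-8").close()  # create the file if missing
        self.committed = {}  # consumer group -> next offset to read

    def append(self, message):
        # Sequential write to the end of the file. Messages must not
        # contain newlines in this toy format.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(message + "\n")

    def read_from(self, offset, max_messages):
        # Offsets are simply message indexes into the file.
        with open(self.path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        return lines[offset:offset + max_messages]

    def commit(self, group, offset):
        # One integer per group, instead of per-message bookkeeping.
        self.committed[group] = offset

    def committed_offset(self, group):
        return self.committed.get(group, 0)


def consume(log, group, batch_size, commit=True):
    """Read a batch starting at the group's committed offset."""
    start = log.committed_offset(group)
    batch = log.read_from(start, batch_size)
    if commit:
        log.commit(group, start + len(batch))
    return batch


def demo():
    with tempfile.TemporaryDirectory() as d:
        log = ToyLog(os.path.join(d, "partition-0.log"))
        for i in range(5):
            log.append(f"msg-{i}")
        first = consume(log, "group-a", 2)                  # reads and commits
        crashed = consume(log, "group-a", 2, commit=False)  # reads, never commits
        retry = consume(log, "group-a", 2)                  # next consumer takes over
        return first, crashed, retry, log.committed_offset("group-a")


if __name__ == "__main__":
    print(demo())
```

The consumer that reads without committing stands in for one that died mid-batch: since the committed offset never moved, the next `consume` call for the same group gets the same messages again.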
EDIT: should add that morkbot had a great link too:
https://news.ycombinator.com/item?id=6874607 http://www.quora.com/RabbitMQ/RabbitMQ-vs-Kafka-which-one-fo...
:-)
Writing Java code so that there are no perceivable GC pauses is an art, but it is not impossible to achieve.
The JVM might require more RAM upfront, but a well-written program is usually reasonably memory-efficient, too, so the consumed memory grows reasonably slowly with the problem size.
Writing things in pure C is often just too time-consuming.
I'm not convinced. Java I/O is far from perfect, and Kafka is probably very heavy on the I/O side.
> and usually faster than e.g. Go.
That's strange, since Go to some degree was intended as a replacement for Java without having Java's downsides. Why would Go be less performant?
I'd be interested if someone would write such a framework in Rust though. C++ is of course the default expectation, but the use of Java somehow surprises me in this case.
You should never discredit a language, especially with blanket statements such as "it's faster than Go". In what respects and in what areas? Here's a blog post which benchmarks Go and Scala: http://eng.42go.com/scala-vs-go-tcp-benchmark/
They found Go to perform better than Scala; however, it had a high footprint. Every language has its tradeoffs, and Java and Go are no exception.
Here's a good writeup talking about sequential I/O in Java: http://mechanical-sympathy.blogspot.com/2011/12/java-sequent...
Note that, unlike Hadoop, Google's original MapReduce system was written in C++.
RabbitMQ developer on the kafka-users list: http://mail-archives.apache.org/mod_mbox/kafka-users/201306....
SO discussion on several queuing systems: http://stackoverflow.com/questions/731233/activemq-or-rabbit...
And it has sharding, which no other message queue has (I think).
- You receive a message, but the system can't tell you why you received it nor what you should do. (The Trial)
- It's not a distributed messaging system with bugs. Actually, you are the bug. (Metamorphosis)
As an aside, I went to a tech conference in Prague two months ago and visited Café Slavia, a hangout not just of Kafka, but also author Milan Kundera and president/poet Václav Havel. I had a glass of absinthe in their honour.
... or maybe you're just overthinking it.
A company-wide, scalable event bus like this can totally revolutionize the way you build services.
Shameless (and shameful) plug, but if anyone wants to be part of such an enterprise that's already gained traction with big companies, send me a message!
A few of the major improvements (from https://archive.apache.org/dist/kafka/0.8.0/RELEASE_NOTES.ht...):
* Intra-cluster replication support
* Support multiple data directories
* Many new internal metrics
* Time based log segment rollout
Plus many bug fixes and other improvements.