At a glance

Kafka is a distributed publish-subscribe messaging system developed at LinkedIn, designed to support very high throughput, persistent messages, and parallel loading into Hadoop. A proposal has been submitted to make Kafka an Apache incubator project.


Latest articles

Kafka proposed as Apache incubator project

At the end of November 2010, LinkedIn officially published its first public Kafka release, along with some design documents. It was immediately clear that it was going to be a real contender in the messaging systems arena, and some of its features immediately caught my eye: Hadoop support, a distributed and persistent queue, and extremely high throughput (hundreds of thousands of messages per second).

At DataSift we deal with several thousand messages every second, which means we need to move data around really, really fast. For a while, we've been using Redis internally for the intermediate queues, perhaps abusing its pub-sub support. Redis has served us well so far, but it has shown several minor issues, and we've been gradually replacing it with real messaging systems, or to be more precise, with a high-performance duo: Kafka and 0mq. I'm going to talk about how we use 0mq and the patterns that are more appropriate for each use case in another article.

Today I'm happy to read that Kafka has been proposed as an Apache incubator project. Since November, it's come a long way: after the native Scala and Java clients, I contributed the first PHP client, and now there are clients in Python, Ruby, C#, Clojure, and Node.js. This week, my colleague Ben wrote a C++ producer that we'll contribute soon.

Kafka is still a young project, but it's maturing fast, and we're confident enough to use it in production (as a matter of fact, we've been using it for months now) in front of our HBase cluster and to collect monitoring events sent from all our internal services. We chose Kafka especially for its persistent storage (which is essentially a partitioned binary log), but we plan to do some analytics via its support for Hadoop soon. Its distributed nature (coordination between consumers and brokers is done via ZooKeeper) makes it very appealing too.
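To make the "partitioned binary log" idea a bit more concrete, here is a minimal in-memory sketch in Python. It is not Kafka's API or its on-disk format: the class name `PartitionedLog`, its methods, the CRC32-based partitioning and the use of message indexes as offsets are all assumptions made for illustration. The only point is that messages are appended and never removed on read, and that a consumer keeps track of its own position in each partition.

```python
import zlib
from typing import List, Tuple


class PartitionedLog:
    """Toy in-memory partitioned, append-only log (hypothetical, not Kafka's API)."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        # One independent append-only list of messages per partition.
        self._partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Assumption: a deterministic hash, so a key always maps to the same partition.
        return zlib.crc32(key) % len(self._partitions)

    def append(self, key: bytes, message: bytes) -> Tuple[int, int]:
        """Append a message and return the (partition, offset) of the new entry."""
        partition = self.partition_for(key)
        log = self._partitions[partition]
        log.append(message)
        return partition, len(log) - 1

    def read(self, partition: int, offset: int, max_messages: int = 10) -> List[bytes]:
        """Return up to max_messages starting at offset; reading removes nothing."""
        return self._partitions[partition][offset:offset + max_messages]


log = PartitionedLog(num_partitions=4)
p, first = log.append(b"service-a", b"cpu=0.42")
log.append(b"service-a", b"cpu=0.57")
print(log.read(p, first))  # prints [b'cpu=0.42', b'cpu=0.57']
```

Because reads leave the log untouched, a consumer that restarts can simply resume from the last offset it remembers, or go back and replay older messages.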

We encourage everyone to try it out and vote for it by subscribing to the general incubator mailing list. If you're already using Kafka, I'd love to hear about your experiences with it. Please leave a comment below.

Update 2011-07-04: Kafka is now an Apache Incubator Project!

1 response to "Kafka proposed as Apache incubator project"

Turning it into an Apache project was certainly a good idea.

Lorenzo Alberton

Lorenzo has been working with large enterprise UK companies for the past 10+ years and is currently CTO at DataSift. He's an international conference speaker and a long-time contributor to many open source projects.

