Apache Kafka – original research paper

The original Apache Kafka research paper by Jay Kreps, Neha Narkhede, and Jun Rao is now available.

The abstract says: “Log processing has become a critical component of the data pipeline for consumer internet companies. We introduce Kafka, a distributed messaging system that we developed for collecting and delivering high volumes of log data with low latency. Our system incorporates ideas from existing log aggregators and messaging systems, and is suitable for both offline and online message consumption. We made quite a few unconventional yet practical design choices in Kafka to make our system efficient and scalable. Our experimental results show that Kafka has superior performance when compared to two popular messaging systems. We have been using Kafka in production for some time and it is processing hundreds of gigabytes of new data each day.”
