2011 – 2026
Apache Kafka Versions
Every release of Apache Kafka, from the 0.7 incubating release open-sourced out of LinkedIn, through the platform-defining APIs (replication, Connect, Streams, exactly-once), to the current line. Kafka 4.3 (May 2026) is the current release; Kafka has been KRaft-only since 4.0 (March 2025), when Apache ZooKeeper was removed entirely. This page gives ship dates and headline changes per version, makes the multi-year KRaft transition legible, and covers the Apache-2.0-vs-Confluent-Community-License story.
The LinkedIn origin and the log as a first-class abstraction
Kafka was built at LinkedIn around 2010 by Jay Kreps, Neha Narkhede, and Jun Rao, to solve a problem the company had outgrown: moving huge streams of activity data (page views, metrics, logs) between systems without a tangle of point-to-point pipelines. The name is a nod to the writer Franz Kafka — Kreps liked the idea of a system "optimized for writing." It was open-sourced in 2011, entered the Apache Incubator that year, and graduated to a top-level Apache project in October 2012.
The idea that made Kafka more than a message queue was treating the append-only commit log as the core abstraction — an ordered, replayable, durable sequence of records that many consumers can read independently at their own position. Kreps laid this out in the widely read 2013 essay "The Log." That single design choice is why Kafka became infrastructure for event streaming, event sourcing, and stream processing rather than just A-to-B messaging — and why every feature that followed (Connect, Streams, exactly-once, tiered storage) is a layer on top of the log.
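To make the log abstraction concrete, here is a minimal in-memory sketch. The `Log` class and the consumer names are hypothetical illustrations, not Kafka's API; the point is only that records are appended in order, reads never remove anything, and each consumer keeps its own position.

```python
class Log:
    """Toy append-only log (hypothetical; not Kafka's API)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return the offset it was assigned."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at offset."""
        return self._records[offset:offset + max_records]


log = Log()
for event in ["page_view", "click", "purchase"]:
    log.append(event)

# Each consumer tracks its own position; reading never deletes records.
positions = {"analytics": 0, "billing": 2}
analytics_batch = log.read(positions["analytics"])
billing_batch = log.read(positions["billing"])

# Any consumer can rewind to offset 0 and replay the full history.
replayed = log.read(0)
```

Because consumption is just "read from an offset", adding a new consumer or replaying old data costs the producer nothing, which is the property the later layers build on.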
The API build-out — replication, Connect, Streams, exactly-once
Many of Kafka's most consequential features shipped during its 0.x years, each turning it into a broader platform. Intra-cluster replication in 0.8 (2013) was the reliability watershed: partitions gained replicas and leader election, so a broker could fail without losing data. That is the release that made Kafka safe for systems of record.
Kafka Connect (0.9, 2015) added a framework for streaming data in and out of external systems with reusable connectors, so Kafka became the hub of a data pipeline rather than one hop in it. Kafka Streams (0.10, 2016) put a full stream-processing library inside Kafka — joins, aggregations, and windowing as a client library, no separate cluster to run.
Then exactly-once semantics in 0.11 (2017) — the idempotent producer plus multi-partition transactions — closed the last correctness gap, letting a read-process-write pipeline commit its inputs and outputs atomically. It's a fair argument that 0.11 was a bigger release than the 1.0 that followed it a few months later; 1.0 mostly signaled that the project considered itself production-mature.
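The idempotent-producer half of that guarantee can be sketched as sequence-number deduplication on the broker side. This is a simplified illustration, not the broker's actual implementation: the `Partition` class, its method names, and the return strings are all hypothetical, and transactions are not modeled at all.

```python
class Partition:
    """Toy partition that deduplicates retried writes per producer (hypothetical)."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, value):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            # A retry of a record that was already written: acknowledge it
            # again but do not append a second copy.
            return "duplicate"
        if seq != last + 1:
            # A gap means an earlier record went missing; refuse to reorder.
            raise ValueError("out-of-order sequence")
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return "appended"


partition = Partition()
first = partition.append(producer_id=7, seq=0, value="order-created")
retry = partition.append(producer_id=7, seq=0, value="order-created")  # network retry
second = partition.append(producer_id=7, seq=1, value="order-paid")
```

The retry is harmless because the sequence number, not the payload, identifies the write, so a producer can resend after a timeout without creating duplicates.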
The KRaft transition — why ZooKeeper was removed
From the beginning, Kafka stored its cluster metadata — which brokers exist, which topics and partitions they hold, who the controller is — in Apache ZooKeeper, a separate distributed coordination service. That worked, but it meant every Kafka deployment was really two systems to run, secure, and reason about, and ZooKeeper became a scaling ceiling: the number of partitions a cluster could hold was bounded by how fast the controller could load metadata out of ZooKeeper.
KIP-500, first proposed in 2019, set out to replace ZooKeeper with KRaft — a built-in Raft consensus quorum where a set of Kafka controllers store the metadata as an event log inside Kafka itself. No external dependency, faster failover, and metadata that scales to millions of partitions. It was a foundational change to how Kafka runs, so it shipped deliberately over several years.
The milestones are version-pinned, which is why this page tracks them so precisely: KRaft was early access in 2.8 (2021), a fuller preview in 3.0, and production-ready for new clusters in 3.3 (2022, KIP-833). Migrating an existing ZooKeeper cluster took longer: the KIP-866 migration path was early access in 3.4–3.5 and production-ready in 3.6 (2023).
Kafka 3.9 (2024) is the pivotal bookend: the last release that supports ZooKeeper at all, and therefore the mandatory bridge release — a ZooKeeper cluster upgrades to 3.9, migrates to KRaft, and only then moves forward. Kafka 4.0 (March 2025) removed ZooKeeper entirely; 4.x is KRaft-only. So the practical answer to "do I still need ZooKeeper?" is a version boundary: yes through 3.9, no from 4.0.
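The version boundary can be written down as a small helper. This is a sketch under stated assumptions: the function names are hypothetical, it models only the ZooKeeper boundary described above, and it ignores any other per-release upgrade prerequisites, so treat it as a planning aid rather than an upgrade tool.

```python
def parse(version):
    """Turn a version string such as '3.9.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def zookeeper_supported(version):
    """True through the 3.9 line, False from 4.0 onward."""
    return parse(version)[:2] < (4, 0)


def upgrade_steps(current, target, uses_zookeeper):
    """List the coarse steps to move a cluster from current to target."""
    steps = []
    if uses_zookeeper and not zookeeper_supported(target):
        # 3.9 is the bridge release: get there first, then migrate metadata.
        if parse(current)[:2] < (3, 9):
            steps.append("upgrade to 3.9")
        steps.append("migrate ZooKeeper to KRaft on 3.9")
    steps.append(f"upgrade to {target}")
    return steps


plan = upgrade_steps("3.5.1", "4.0.0", uses_zookeeper=True)
```

For a ZooKeeper cluster on 3.5.1 heading to 4.0.0, `plan` lists the bridge upgrade, the migration, and then the final upgrade, in that order.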
Tiered storage, the new consumer protocol, and Queues
With ZooKeeper handled, the recent releases have reshaped the parts of Kafka developers touch most. Tiered storage (KIP-405, early access in 3.6, GA in 3.9) lets brokers offload older log segments to remote object storage, decoupling how much history you keep from how much local disk you buy.
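The decoupling can be illustrated with a simplified placement rule. The parameter names below echo the `local.retention.ms` and `retention.ms` topic settings, but the function itself is a hypothetical sketch: it classifies a closed segment purely by age and ignores size-based retention and the copy step that happens before local deletion.

```python
DAY_MS = 24 * 60 * 60 * 1000


def place_segment(age_ms, local_retention_ms, retention_ms):
    """Say where a closed log segment of the given age would live (simplified)."""
    if age_ms > retention_ms:
        return "deleted"   # past total retention: gone everywhere
    if age_ms > local_retention_ms:
        return "remote"    # past local retention: only in object storage
    return "local"         # still recent: served from broker disk


# Keep one day on local disk but thirty days of history overall.
placements = [
    place_segment(age_days * DAY_MS, local_retention_ms=1 * DAY_MS, retention_ms=30 * DAY_MS)
    for age_days in (0, 7, 45)
]
```

With these numbers, local disk only has to hold about a day of data while consumers can still read a month back.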
The next-generation consumer rebalance protocol (KIP-848, early access in 3.7, GA in 4.0) moved partition-assignment logic from the client group leader to the broker and made rebalances incremental, ending the "stop-the-world" pauses that hit large consumer groups every time a member joined or left.
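The difference incremental rebalancing makes can be sketched with a toy assignor. This is not the KIP-848 assignor; `incremental_rebalance` and the member names are hypothetical, and the sketch only shows the core idea that a join should move the few partitions needed to balance the group while everything else keeps being processed.

```python
def incremental_rebalance(assignment, new_member):
    """Add new_member, moving only as many partitions as balance requires.

    Returns (new_assignment, moved_partitions). The input is not mutated.
    """
    result = {member: list(parts) for member, parts in assignment.items()}
    result[new_member] = []
    total = sum(len(parts) for parts in result.values())
    quota = total // len(result)  # minimum share for every member
    moved = []
    while len(result[new_member]) < quota:
        # Take one partition from whichever existing member holds the most.
        donor = max(
            (m for m in result if m != new_member),
            key=lambda m: len(result[m]),
        )
        partition = result[donor].pop()
        result[new_member].append(partition)
        moved.append(partition)
    return result, moved


before = {"a": [0, 1, 2], "b": [3, 4, 5]}
after, moved = incremental_rebalance(before, "c")
```

Here only two of six partitions change owner; under a stop-the-world rebalance all six would have been revoked and reassigned.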
And Queues for Kafka (Share Groups, KIP-932, early access in 4.0, preview in 4.1, production-ready in 4.2) gave Kafka true queue semantics for the first time: multiple consumers cooperatively processing records from the same partitions with per-record acknowledgement, so the classic work-queue pattern no longer needs a partition per worker.
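Per-record acknowledgement can be sketched as a small state machine over one partition. The `SharePartition` class and its method names are hypothetical and much simpler than the real share-group protocol (no acquisition timeouts, no delivery-count limits); the sketch only shows several workers pulling from the same partition and a failed record becoming available again.

```python
class SharePartition:
    """Toy partition with per-record delivery state (hypothetical)."""

    def __init__(self, records):
        self.records = list(records)
        self.state = ["available"] * len(self.records)

    def acquire(self, max_records=1):
        """Hand out the next available offsets and mark them acquired."""
        acquired = []
        for offset, current in enumerate(self.state):
            if current == "available" and len(acquired) < max_records:
                self.state[offset] = "acquired"
                acquired.append(offset)
        return acquired

    def acknowledge(self, offset):
        """Processing succeeded: the record will not be delivered again."""
        self.state[offset] = "acknowledged"

    def release(self, offset):
        """Processing failed: make the record available to another worker."""
        self.state[offset] = "available"


jobs = SharePartition(["job-0", "job-1", "job-2"])
worker1 = jobs.acquire()   # first worker takes offset 0
worker2 = jobs.acquire()   # second worker takes offset 1 from the same partition
jobs.acknowledge(worker1[0])
jobs.release(worker2[0])   # second worker gives up on its record
worker3 = jobs.acquire()   # a third worker picks the released record up
```

Since delivery state is tracked per record rather than per partition, the number of workers is no longer capped by the number of partitions.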
Confluent and the Community License
Kafka's three creators left LinkedIn in 2014 to found Confluent, the company that drives much of Kafka's development and sells Confluent Platform (a Kafka distribution with extras) and Confluent Cloud (a managed service). Confluent went public on NASDAQ (ticker CFLT) in 2021. Apache Kafka itself remains an Apache Software Foundation project under Apache 2.0; Confluent is the largest contributor, not the owner.
The licensing wrinkle worth keeping straight: in December 2018, Confluent moved several of its own Confluent Platform add-ons (KSQL, since renamed ksqlDB, plus the Schema Registry, REST Proxy, and some connectors) from Apache 2.0 to the source-available Confluent Community License. The motive was the same one behind MongoDB's SSPL and the later relicensings at Elastic and Redis: stop hyperscalers from selling the software as a service without contributing back. The CCL lets you use, modify, and redistribute the code but not offer it as a competing managed service, and the OSI has not certified it, so those components are source-available, not open source. None of this touches Apache Kafka, which this page tracks and which stays Apache 2.0. When people ask "is Kafka open source?" the answer is a clean yes; the CCL question is really a question about Confluent's extras.