http://blog.parsely.com/post/3886/pykafka-now/
Unfortunately, as the OP illustrates, there are now two widely used Python + Kafka drivers (pykafka and kafka-python), and as of recently, a third, confluent-kafka-python, which is a thin wrapper over librdkafka.
The reason for all this fragmentation is that Kafka was quite the moving target for non-JVM languages over the past three years. We have used it in production since Kafka 0.7, so we've had to live through it all blow-by-blow. I'm hoping that with Kafka 0.10 recently released, we can finally unify the community around a single driver (somehow).
Msgs/s
confluent_kafka_consumer : 277573.293164 / 261407.908007 = 1.061
pykafka_consumer : 33433.342585 / 33976.938217 = 0.984
pykafka_consumer_rdkafka : 164311.503412 / 172008.742201 = 0.955
python_kafka_consumer : 37667.971237 / 38622.727894 = 0.975
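As a quick sanity check, the last column can be recomputed from the raw msgs/s figures; the values are plain ratios, not percentages. This is a minimal sketch: the thread doesn't say which run is the numerator (presumably one setup with Docker networking and one without), so the tuple order below simply mirrors the table.

```python
# Throughput in msgs/s, copied from the table: (first figure, second figure).
# Which run each figure corresponds to is an assumption; the thread doesn't say.
results = {
    "confluent_kafka_consumer": (277573.293164, 261407.908007),
    "pykafka_consumer": (33433.342585, 33976.938217),
    "pykafka_consumer_rdkafka": (164311.503412, 172008.742201),
    "python_kafka_consumer": (37667.971237, 38622.727894),
}


def ratios(table):
    """Return first/second throughput per client as a plain ratio (not a percentage)."""
    return {name: first / second for name, (first, second) in table.items()}


if __name__ == "__main__":
    r = ratios(results)
    for name, value in r.items():
        print(f"{name:<26} {value:.3f}")
    # Difference between the largest and smallest ratio across clients.
    print(f"spread: {max(r.values()) - min(r.values()):.3f}")
```

The table's figures look truncated to three decimals rather than rounded, which is why `confluent_kafka_consumer` prints as 1.062 here with `:.3f`.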
So yes, Docker network magic adds overhead, but the bias is consistent across all clients.
There is still value in comparing different clients under the same network constraints. Yes, it is a contrived setup (noted in the post), but at least it is the same contrived setup for each test.
Anyway, benchmarks are always fun to read. I hope kafka-python makes someone out there smile. That's the best benchmark in my book.
Good ole laptop benchmarks