Apache Flink 1.9.0 Release Announcement

(flink.apache.org)

33 points · Benfromparis · 6y ago · 12 comments

continuations · 6y ago

Apache has a large number of stream processing frameworks:

Flink vs Spark vs Storm vs Kafka vs Samza vs Apex

How do they compare? How would you choose which one to use?

StevePerkins · 6y ago

I don't have experience with Samza or Apex, but as for the first three:

1. Flink - Focused on stateful stream processing.

2. Spark - Focused on batch processing. Can be used for continuous streams, but approaches them as "micro-batches".

3. Kafka - A message queue system (for all practical purposes). Has an optional stream processing add-on for basic needs.
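To make the contrast between points 1 and 2 concrete, here is a toy sketch in plain Python. It is not Flink or Spark code: the function names `per_event_counts` and `micro_batch_counts` and the sample events are made up for illustration, and the batches are cut by element count for simplicity, whereas Spark's micro-batches are normally triggered on a time interval. Both functions keep a running count per key; the first emits an updated result on every event, the second only once per batch.

```python
from collections import defaultdict
from itertools import islice


def per_event_counts(events):
    """Stateful per-event processing: update keyed state and emit on every event."""
    counts = defaultdict(int)  # keyed state that survives across events
    for key in events:
        counts[key] += 1
        yield key, counts[key]


def micro_batch_counts(events, batch_size):
    """Micro-batching: buffer up to batch_size events, then emit one snapshot."""
    counts = defaultdict(int)
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        for key in batch:
            counts[key] += 1
        yield dict(counts)  # one result per batch, not per event


# Hypothetical sample input
events = ["a", "b", "a", "c", "a"]
per_event = list(per_event_counts(events))
batched = list(micro_batch_counts(events, batch_size=2))
```

Both approaches end with the same counts. The difference is how often a result becomes visible: five outputs for the per-event version versus three for the batched one.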

Separate use cases and strengths aside, it's worth calling out that all of these products are primarily backed by completely different companies. Apache is a consortium made of many companies, and serves as common branding for "community editions" of their "enterprise edition" products. There can be quite a lot of overlap between sponsored products in this consortium.

lern_too_spel · 6y ago

Spark supports both microbatch and continuous stream processing.

The Apache Software Foundation is not a consortium of companies but a single non-profit that provides organizational support for open source projects. Some of those projects have contributors who are employed by other companies to work on them, and some have only volunteer contributors.

nivertech · 6y ago

4. Apache Beam (same model as Google Cloud Dataflow)

jdm2212 · 6y ago

The only two of those I know are Kafka and Flink. For those two: Flink is much more full-featured and performant (basically the full Google DataFlow API, and several orders of magnitude faster than Kafka Streaming), but Kafka Streaming has a stupid simple API that is useful if you need streaming because $reason but don't care about scaling up to infinity. If you're doing some really hacky demoware, Kafka Streaming will probably be faster to spin up because you just need the Kafka Streaming jar and a Kafka cluster.

BFLpL0QNek · 6y ago

Do you have any numbers to back up the claim that Flink is faster than KStreams, and under what scenario?

I am genuinely interested, as I use KStreams a lot, but the engineering discipline in the API leaves a lot to be desired, and I'm more than happy to switch APIs if Flink is that much better.

barrkel · 6y ago

You don't need me to search the internet for you, but https://thenewstack.io/apache-gets-another-real-time-stream-... has some comparisons.

There's also Apache Beam, which is an API for streaming, and has Flink and Apex execution engines. Google's Cloud Dataflow is another implementation of Apache Beam.
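As a rough illustration of what "an API with several execution engines" means, here is a toy sketch in plain Python. It is not Beam's actual API: the `Pipeline` class and the `run_batch` / `run_streaming` functions are hypothetical names invented for this example (Beam calls its engines "runners"). The point is only that the job is described once and then handed to whichever engine you pick.

```python
class Pipeline:
    """Engine-agnostic description of a job: just an ordered list of steps."""

    def __init__(self):
        self.steps = []

    def map(self, fn):
        self.steps.append(("map", fn))
        return self

    def filter(self, pred):
        self.steps.append(("filter", pred))
        return self


def run_batch(pipeline, source):
    """Hypothetical batch engine: applies each step to the whole dataset at once."""
    data = list(source)
    for kind, fn in pipeline.steps:
        if kind == "map":
            data = [fn(x) for x in data]
        else:
            data = [x for x in data if fn(x)]
    return data


def run_streaming(pipeline, source):
    """Hypothetical streaming engine: pushes one element at a time through all steps."""
    for x in source:
        keep = True
        for kind, fn in pipeline.steps:
            if kind == "map":
                x = fn(x)
            elif not fn(x):
                keep = False  # element dropped by a filter step
                break
        if keep:
            yield x


# The job is defined once, then run on either engine.
job = Pipeline().map(lambda x: x * 2).filter(lambda x: x > 4)
batch_result = run_batch(job, [1, 2, 3, 4])
stream_result = list(run_streaming(job, [1, 2, 3, 4]))
```

Both engines produce the same output for the same job, which is the property that lets you switch execution engines without rewriting the pipeline.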

As to which one to choose, you need to evaluate them; there are no simple answers. If you have Hadoop already, then Apex may be a better fit than Flink; OTOH, if you already do Akka stuff, then Flink might integrate better with your stack. If you have more batch than streaming use cases, maybe you want Spark. Etc.

readme3 · 6y ago

Include Storm in the mix too. Storm 2.0 was released recently. We have been using Storm for a long time and we really like its (a) simple programming model, (b) support for a wide variety of sources (e.g. Kinesis, EventHub), and (c) easy troubleshooting. We did evaluate Spark Streaming (we use Spark for batch workloads and it works well), but fell back to Storm because of the above.

jdm2212 · 6y ago

This is exciting! I've been using Flink a lot lately, and fine-grained recovery is going to be very useful for [work stuff]!

whoevercares · 6y ago

From a WeChat post I heard that 1.5M LOC changed. Wow.
