- Storm does not allow arbitrary state in operators (what Nathan Marz calls "bolts"). This makes implementing the runtime easier (for example, it can replay tuple sends for fault tolerance), but it limits what kinds of applications one can make. Yes, I'm on board with the idea that we should avoid mutable state as much as possible, but people who build real applications want it. Supporting that state does mean fault tolerance in our system requires more work, so it's a trade-off.
- Storm programs are implemented in Java. Streams applications are implemented in our programming language, which has the rather pedestrian name Streams Programming Language, but is usually just called SPL. This may seem minor, but it's a big deal. Marz is working on a higher-level language in Clojure. Implementing programs in a higher-level language enables developers to abstract away many issues related to high-performance distributed systems. I compare it to the difference between writing assembly code and writing C code. (Or the difference between writing Python code and writing C code.) The code that we generate is similar in principle to how one writes a Storm application. Which brings me to...
- Storm runs on the JVM; we generate C++ code, which then gets compiled.
Neither Storm nor Streams is the first or only system in this area. Stream programming is also popular for hardware, but that is usually synchronous and if there's state, it's shared-memory. Storm and Streams are distributed and asynchronous. There are academic distributed streaming systems such as Borealis. The research name for Streams is System S, and there are many academic papers about it, or that use it as a platform for other research: http://dl.acm.org/results.cfm?h=1&cfid=66087472&cfto...
And for the record, I am impressed with Storm.
The "state spout" abstraction, a future feature for Storm, will alleviate the performance problems with using an external database. For the time being, though, smart use of batching/checkpointing is sufficient for most applications.
Also, Storm topologies can be written in any language. Storm has great multi-language support.
I do agree though that higher level abstractions are important. That will come later, once we're confident that we've mastered the primitives for doing fault-tolerant realtime computation.
Maybe I misunderstand your point, but Storm does allow state in Bolts - a Bolt is just a Java object, so it can have member variables. That's how aggregation (e.g. counting events per user) is done. Of course, if you want a Bolt that scales horizontally, you need to account for the state being split across several instances of the Bolt class; and if you need the state to survive a restart, you need to keep it in an external database instead of in the object's memory.
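To make the per-user counting example concrete, here is a minimal sketch in plain Java. It deliberately does not use Storm's actual API: `CountingBolt`, `UserRouter` and `CountingDemo` are hypothetical names, and the hash-based routing is only an assumption about how tuples for one user could be kept on one instance. It shows state held in a member variable, and that state being split across several instances.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo driver; not part of Storm.
class CountingDemo {
    public static void main(String[] args) {
        UserRouter router = new UserRouter(3);
        for (String user : new String[] {"alice", "bob", "alice", "carol", "alice"}) {
            router.send(user);
        }
        // Each user's count lives only on the instance that user hashes to.
        System.out.println("alice=" + router.instanceFor("alice").countFor("alice"));
        System.out.println("bob=" + router.instanceFor("bob").countFor("bob"));
    }
}

// Hypothetical stand-in for a bolt: an ordinary object with in-memory state.
class CountingBolt {
    private final Map<String, Long> countsByUser = new HashMap<>();

    // Called once per incoming event; updates the member-variable state.
    void execute(String userId) {
        countsByUser.merge(userId, 1L, Long::sum);
    }

    long countFor(String userId) {
        return countsByUser.getOrDefault(userId, 0L);
    }
}

// Hypothetical router (an assumption, not Storm code): hashes the user id
// so that all events for one user reach the same bolt instance.
class UserRouter {
    private final List<CountingBolt> instances = new ArrayList<>();

    UserRouter(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            instances.add(new CountingBolt());
        }
    }

    CountingBolt instanceFor(String userId) {
        return instances.get(Math.floorMod(userId.hashCode(), instances.size()));
    }

    void send(String userId) {
        instanceFor(userId).execute(userId);
    }
}
```

Nothing here survives a restart: the counts live only in each object's memory, which is why durable state would have to go to an external store.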
I had assumed there was no arbitrary state because of the replay semantics. Let's say bolt A sends tuples to bolt B. B has internal state. A sends tuples t1, t2, t3 and t4. A receives acknowledgements that t1, t3 and t4 were processed. So t2 needs to be replayed. But the semantics of what that means is undefined - B has internal state that already incorporates, for certain, t3 and t4, and maybe t2. (While it's unlikely, you never know where a tuple got lost.) So replaying t2 is problematic - do you just blindly replay it, and allow potentially broken semantics? The alternative is to do rollback, which is quite hairy.
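The t1..t4 scenario can be simulated in a few lines. This is only a sketch under stated assumptions, not Storm code: `SummingBolt`, `ReplayScenario` and `ReplayDemo` are hypothetical names, and each tuple is assumed to carry the value matching its number (t1 = 1, t2 = 2, and so on). A never sees an acknowledgement for t2 and blindly replays it; the result depends on whether t2 was lost before B processed it or only its acknowledgement was lost.

```java
// Hypothetical demo driver; not part of Storm.
class ReplayDemo {
    public static void main(String[] args) {
        // The intended sum of t1..t4 is 1 + 2 + 3 + 4 = 10.
        System.out.println("t2 lost before B saw it: sum=" + ReplayScenario.run(false));
        System.out.println("t2 processed, ack lost: sum=" + ReplayScenario.run(true));
    }
}

// Hypothetical stateful bolt B: keeps a running sum in memory.
class SummingBolt {
    private long sum = 0;

    void execute(long value) {
        sum += value;
    }

    long sum() {
        return sum;
    }
}

class ReplayScenario {
    // Returns B's final sum after A sends t1..t4, gets no ack for t2,
    // and blindly replays t2. The flag says whether the original t2
    // actually reached B (i.e. only the ack was lost).
    static long run(boolean t2ReachedB) {
        SummingBolt b = new SummingBolt();
        b.execute(1);          // t1, acknowledged
        if (t2ReachedB) {
            b.execute(2);      // t2 processed, but its ack never arrives
        }
        b.execute(3);          // t3, acknowledged
        b.execute(4);          // t4, acknowledged
        b.execute(2);          // blind replay of t2
        return b.sum();
    }
}
```

In the first case the replay repairs the state (10); in the second it double-counts t2 (12). A cannot tell the two cases apart from the missing acknowledgement alone, which is the ambiguity described above.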