At the risk of repeating myself: I don't have any conclusions.
> There is absolutely no intrinsic value in a big runtime.
And yet there is cost. It is unclear if that cost is a factor.
> Now, one can trivially make a <1KB read-eval-print runtime. So I'll answer your question with a question: why do people not use <1KB runtimes?
Because they are not useful.
We are looking at a business problem, thinking about the ways people can solve that problem, and cross-comparing the tooling used by those different solutions.
Is there really nothing to be gained here?
The memory-central approach clearly wins out heavily, and that (plus the fact we can map-reduce across cores or machines as our problem gets bigger) is a huge advantage of the KDB-powered solution. It's also the obvious implementation for a KDB-powered solution.
Is this Spark-based solution not the typical way Spark is implemented?
Could a 10MB solution do the same if it can't get into L1 cache? Is it worth trying to figure out how to make Spark work correctly if the JVM has a size limit? Is that a size limit?
There are a lot of questions here that require more experiments to answer, but one thing stands out to me: Why bother?
If I've got a faster tool that encourages the correct approach, why should I bother trying to figure these things out? Or, put perhaps more clearly: What do I gain with that 10MB?
That CUDA solution is exciting... There is stuff to think about there.