His example is:
sequence
.map(|x: T0| ...: T1)
.scan(|a: T1, b: T1| ...: T1)
.filter(|x: T1| ...: bool)
.flat_map(|x: T1| ...: sequence<T2>)
.collect()
It would be written in Futhark something like this:
sequence
|> map (\x -> ...)
|> scan (\x y -> ...)
|> filter (\x -> ...)
|> map (\x -> ...)
|> flatten
I haven't studied it in depth, but it's pretty readable.
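For comparison, both pipelines can be sketched in plain Python. The concrete element functions here (add one, running sum, keep evens, duplicate) are made up for illustration, since the original uses placeholders:

```python
from itertools import accumulate, chain

sequence = [1, 2, 3, 4]

mapped = [x + 1 for x in sequence]                          # map
scanned = list(accumulate(mapped))                          # scan (running reduction)
filtered = [x for x in scanned if x % 2 == 0]               # filter
flat = list(chain.from_iterable([x, x] for x in filtered))  # flat_map = map + flatten

print(flat)  # [2, 2, 14, 14]
```

Note that `flat_map` in the first version and `map` followed by `flatten` in the Futhark version are the same operation, which is why the translation needs one extra step.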
The example you showed is very much how I think about PRQL pipelines. Syntax is slightly different but semantics are very similar.
At first I thought that PRQL doesn't have scan, but loop actually fulfills the same function. I'm going to look more into comparing the two.
It is a joke, but an SQL engine can be massively parallel. You just don't know it; it simply gives you what you want. And in many ways the operations resemble what you do in, for example, CUDA.
CUDA backend for DuckDB or Trino would be one of my go-to projects if I were laid off.
What could be good is relational + array model. I have some ideas on https://tablam.org, and building not just the language but the optimizer in tandem I think will be very nice.
• Datalog is much, much better on these axes.
• Tutorial D is also better than SQL.
It solves all the warts of SQL while still being true to its declarative execution: trailing commas, the from statement first so a query reads as a composable pipeline, temporary variables for expressions, intuitive grouping.
Sometimes when I have a problem, I just generate a bunch of "possible solutions" with a constraint solver (e.g. MiniZinc), which produces GBs of CSVs describing candidate solutions, then let DuckDB analyze which ones are suitable. DuckDB is amazing.
Term rewriting languages probably work better at this than I would expect? It is kind of sad how little experience I have built up with that sort of thing. And I think I'm above a large percentage of developers out there.
Raph is a super nice guy and a pleasure to talk to. I'm glad we have people like him around!
Hardware architectures like Tera MTA were much more capable but almost no one could write effective code for them even though the language was vanilla C++ with a couple extra features. Then we learned how to write similar software architecture on standard CPUs. The same problem of people being bad at it remained.
The common thread in all of this is people. Humans as a group are terrible at reasoning about non-trivial parallelism. The tools almost don't matter. Reasoning effectively about parallelism involves manipulating a space that is quite evidently beyond most human cognitive abilities to reason about.
Parallelism was never about the language. Most people can't build the necessary mental model in any language.
To your point, we also didn't need a new language to adopt this paradigm. A library and a running system were enough (though, semantically, it did offer unique language-like characteristics).
Sure, it's a bit antiquated now that we have more sophisticated iterations for the subdomains it was most commonly used for, but it hit a kind of sweet spot between parallelism utility and complexity of knowledge or reasoning required of its users.
The syntax and semantics should constrain the kinds of programs that are easy to write in the language to ones that the compiler can figure out how to run in parallel correctly and efficiently.
That's how you end up with something like Erlang or Elixir.
Throwing InfiniBand or IP on top is really, structurally, more of the same.
Chapel definitely can target a single GPU.
Overall, it seems to be a really interesting problem!
Going the other direction, making channel runtimes run SIMD, is trivial.
Disclaimer: I have not watched the video yet.
Or basically a generic nestable `remote_parallel_map` for python functions over lists of objects.
I haven't had a chance to fully watch the video yet, and I understand it focuses on lower levels of abstraction / GPU programming. But I'd love to know how this fits into what the speaker is looking for and what it's missing (other than, obviously, it not being a way to program GPUs). (Also, full disclosure: I am a co-founder.)
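A purely local sketch of what such a nestable `remote_parallel_map` could look like, using a thread pool as a stand-in for remote workers; the name and signature are illustrative, not any actual product API:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(fn, items, max_workers=8):
    """Fan fn out over items in parallel; local stand-in for a remote_parallel_map."""
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(fn, items))

# Nestable: each outer worker can itself fan out an inner parallel map.
result = parallel_map(
    lambda xs: parallel_map(lambda x: x * x, xs),
    [[1, 2], [3, 4]],
)
print(result)  # [[1, 4], [9, 16]]
```

Threads are used here (rather than processes) precisely so the nesting works without pickling issues; a real remote version would instead ship the function and each sub-list to separate machines.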
P.S. I'm joking, I do love Go, even though it's by no means a perfect language to write parallel applications with
Nothing yet? Damn...
However, Erlang has very little to say about parallelization of loops, or about the levels between a single loop and an HTTP request.
Nor would it be a good base for such things; if you're worried about getting maximum parallel performance out of your CPUs, you pretty much by necessity need to start from a base where single-threaded performance is already roughly optimal, such as C, C++, or Rust. Go at the very outside, and that's already a bit of a stretch in my opinion. BEAM does not have that level of single-threaded performance. There's no point in making BEAM fully utilize 8 CPUs for this sort of parallel performance when all that does is get you back to where a single thread of Rust already runs.
(I think this is an underappreciated aspect of trying to speed things up with multiple CPUs. There's no point straining to get 8 CPUs running in some sort of complicated perfect synchronization in your slow-ish language when you could just write the same thing in a compiled language and get it on one CPU. I particularly look at the people who think that GIL removal in Python is a big deal for performance and wonder what they're thinking... a 32-core machine parallelizing Python code perfectly, with no overhead, might still be outperformed by a single-core Go process and would almost certainly be beaten by a single-core Rust process. And perfect parallelization across 32 cores is a pipe dream. Unless you've already maxed out single-core performance, you don't need complicated parallelization; you need to write in a faster language to start with.)
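The back-of-the-envelope math behind that, with made-up throughput numbers rather than benchmarks:

```python
# Illustrative, assumed numbers (not measurements): suppose a compiled
# language runs this workload ~30x faster per core than Python.
python_per_core = 1.0
compiled_per_core = 30.0
cores = 32

# Grant Python impossible, perfectly linear scaling with zero overhead:
python_32_cores = python_per_core * cores      # 32.0 units of throughput
ratio = python_32_cores / compiled_per_core    # versus ONE compiled core

print(round(ratio, 2))  # ~1.07: barely ahead of a single compiled core
```

Under any realistic scaling (Amdahl's law, synchronization overhead), the 32-core Python version loses outright.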
The thing I would really like to see is some research on how to run the Erlang concurrency model on a GPU.
Some of the operations Erlang does are things GPUs don't want to do at all, including basics like pattern matching; GPUs simply do not want that sort of branch-heavy code.
"Erlang" is being over specific here. No conventional CPU language makes sense on a GPU at all.