The problem with number crunching or maths is that it is very difficult to cut the whole computation into smaller units and pre-emptively schedule it. If that is possible for a specific use case, then it is moderately easy to replace that part with NIFs. For efficient maths you also need to convert the internal tagged number representation to the machine-native one, and that conversion is itself expensive. Solving both of these problems in the generic case, while preserving all the good parts, is very difficult.
NIFs cannot be pre-empted, so they must return quickly or risk causing lots of problems (see https://erlang.org/doc/man/erl_nif.html for slightly more detail on what this means). As such you can't just write some big function in C to do number crunching.
The NIF documentation mentions some ways around the problem, but all of them take some effort, or have tradeoffs of some sort. I was really excited when “dirty” NIFs were introduced, which can tell the BEAM that they'll run for a while, thus appearing to allow for long-running NIFs with no extra work other than setting a flag. However, it turns out that the BEAM just spins up N threads for scheduling dirty NIFs, and if you have too many such NIFs, too bad, some won't get scheduled till the others have completed. In retrospect it should have been obvious that there couldn't be a silver bullet for this problem, because it really isn't easy.
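To make the "take some effort" part concrete, here is a minimal sketch in plain C of the chunking workaround: the long computation is turned into a resumable job that processes a bounded slice and then returns. The names `sum_job` and `sum_job_step` are hypothetical, and nothing here touches the real NIF API; in an actual NIF the "call me again" step would be done with something like `enif_schedule_nif`, with `enif_consume_timeslice` used to decide when to stop.

```c
#include <stddef.h>

/* Hypothetical resumable job: sum of squares over data[0..len). */
typedef struct {
    const double *data;
    size_t len;
    size_t next;   /* index of the first unprocessed element */
    double acc;    /* running result */
} sum_job;

/* Process at most `budget` elements, then return to the caller.
 * Returns 1 when the job is finished, 0 if it must be called again. */
static int sum_job_step(sum_job *job, size_t budget)
{
    size_t end = job->next + budget;
    /* Clamp to the end of the data; also guards against size_t overflow. */
    if (end > job->len || end < job->next)
        end = job->len;

    for (size_t i = job->next; i < end; i++)
        job->acc += job->data[i] * job->data[i];

    job->next = end;
    return job->next == job->len;
}
```

The caller keeps invoking `sum_job_step` until it returns 1. All the state that would normally live in loop variables has to be moved into the struct by hand, which is exactly the extra effort being complained about.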
Erlang may well be my favorite language, but as you imply, it's just not going to be the right approach for everything: in my experience, it's absolutely fantastic in its niche but that niche is quite small. I think that's fine, though. For me, where Erlang does make sense, its concurrency approach makes it unbeatable, and I'll live with the performance tradeoffs. It turns out that basically all the NIFs I've had to write were just to gain access to functionality that Erlang doesn't expose (e.g. network namespaces on Linux, which are supported now, but weren't when I needed them).
It's actually worse than that; as I recall, the BEAM's internal representations of numbers do not necessarily map to the CPU's (for instance, there are no fixed-width types; you have integers and floats, and integers can be arbitrarily large). The work to perform that conversion, do the math, and convert back would almost assuredly mean that a single calculation takes more time than just doing it within the BEAM. The only way to save time would be to convert once, do a bunch of math, and convert back. Which would, yes, prevent pre-emption, AND require indication of intent (so brand new language constructs, minimally).
That's a lot to expect of the user, and a lot to implement in the language...all to avoid just writing a NIF.
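A rough illustration of "convert once, do a bunch of math, convert back", in plain C. The tagging scheme here (4 low tag bits, `0xF` meaning "small integer") is an assumption made up for the sketch, not a description of the BEAM's actual word layout, and `term`, `make_small`, `small_value` and `dot_tagged` are hypothetical names. Each input is untagged once, every intermediate result stays in a native register, and only the final result is tagged again. Overflow is ignored, whereas a real VM would have to promote to a bignum.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t term;          /* hypothetical tagged machine word */
#define TAG_BITS  4
#define TAG_MASK  0xFu
#define TAG_SMALL 0xFu          /* assumed tag for small integers */

static term make_small(int64_t v)
{
    return ((term)v << TAG_BITS) | TAG_SMALL;
}

static int is_small(term t)
{
    return (t & TAG_MASK) == TAG_SMALL;
}

/* Assumes two's complement and an arithmetic right shift for signed values. */
static int64_t small_value(term t)
{
    return (int64_t)t >> TAG_BITS;
}

/* Dot product over tagged inputs; the accumulator is never tagged.
 * Returns 0 on success, -1 if any input is not a small integer. */
static int dot_tagged(const term *a, const term *b, size_t n, term *out)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_small(a[i]) || !is_small(b[i]))
            return -1;
        acc += small_value(a[i]) * small_value(b[i]);
    }
    *out = make_small(acc);     /* single conversion back at the end */
    return 0;
}
```

Doing the same thing one operation at a time would tag and untag every product and every partial sum, which is the overhead described above.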
There wouldn't need to be an indication of intent, other than writing the math separately from any function calls. I don't know how much code fits this pattern, but it's an idea that could be explored. I think that's part of what HiPE is supposed to do, but I haven't looked into HiPE in a long time.
I don't understand why. If you have a maths-intensive operation like matrix multiplication using untagged maths, why does that prevent pre-emption? Why does it require indication of intent?
And there's already a basically zero-overhead way to implement pre-emption - safepoints - that's what the JVM does when it wants to pre-empt in user-space.
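For anyone who hasn't seen the idea, here is a minimal sketch of safepoint polling in plain C, using matrix multiplication over untagged doubles as the workload. It is only an illustration of the general technique: `preempt_requested`, `yields_taken`, `safepoint_poll` and `matmul` are hypothetical names, the flag would be set by some scheduler thread that is not shown, and a real VM's poll mechanism and yield path will look different.

```c
#include <stdatomic.h>
#include <stddef.h>

static atomic_int preempt_requested;  /* set by a hypothetical scheduler thread */
static int yields_taken;              /* counts how often we actually yielded */

/* Cheap check placed at a point where the computation's state is consistent. */
static void safepoint_poll(void)
{
    if (atomic_load_explicit(&preempt_requested, memory_order_relaxed)) {
        atomic_store_explicit(&preempt_requested, 0, memory_order_relaxed);
        yields_taken++;   /* a real runtime would switch to another task here */
    }
}

/* c = a * b for n x n row-major matrices of untagged doubles. */
static void matmul(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
        safepoint_poll();  /* poll once per output row, not per multiply */
    }
}
```

When no pre-emption is requested, the cost is one relaxed load and a predictable branch per row. Where the polls go is a tradeoff: polling less often is cheaper but increases the worst-case delay before the code yields.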