I should be able to provide platform APIs to allocate memory, maybe do some floating-point math, and that's about it. Additional features would just mean supplying whatever extra API implementations the runtime needs to interact with the host machine.
Context: this was a huge barrier for me on my project https://github.com/mandiant/STrace/blob/16859a811e4af7c68259...
Perhaps more relevant: what's the point of taking machine code capable software, compiling it into WASM, and then compiling that back into machine code? A reinvention of FFI perhaps?
We could have some web standard where we ship native code to the user, which the user's machine then executes directly (using some form of sandboxing). Maybe we could have a standard API so that the operating system's interfaces don't play a role. But you still have to ship x86_64, aarch64, ARM32, RISC-V 32, RISC-V 64, MIPS, PowerPC, etc. And you'll need to support multiple versions of all these different instruction sets; some users may have old x86 machines without SSE, so you need a version of your x86 code with x87 floats rather than SSE floats. And you need one version of your ARM native code with hard floats for machines with FPUs, and one version with soft floats for machines without. And you need to constantly add new variations as new instruction sets or new versions of old instruction sets are released.
I don't trust most websites to get all of this correct. And if websites got it wrong, or if websites ever stopped adding new targets, or ever removed support for old targets, we would end up with a web which isn't usable on certain CPU architectures, which is the opposite of what the web is all about.
The alternative is that we have some specification for intermediate code which is low level enough to be a universal compile target but abstract enough to be possible to compile to essentially any CPU architecture. That's what WASM is.
Isn't that exactly what Java did years ago? And what Microsoft's .NET bytecode did when they had to remove the Microsoft JVM?
Standard WASM is stack based, unlike every real computer out there. It also explicitly supports unaligned memory access, which is terrible for native execution performance. It was designed around the limitations of the browser sandbox, as an evolution of asm.js, a backwards-compatible JavaScript subset.
I don't know why you'd need to ship different versions for different instruction sets; that's exactly the problem Java, and to a lesser extent .NET, solves. The runtime is platform dependent, the binary isn't. You can build a JAR and run it on anything from an old MIPS server to a Windows 11 machine or an M2 Mac, as long as your code targets an old enough runtime. You don't need to compile separate versions for each platform at all.
Building platform-independent executables has already been solved by Java, and very few people actually use it. I'm not against an alternative to Java, or anything else tainted by Oracle for that matter; I just don't see the use cases outside the browser for platforms precompiling WASM.
> what's the point of taking machine code capable software, compiling it into WASM, and then compiling that back into machine code?
That's where the proliferation of instruction sets and instruction set versions is relevant. My comment wasn't trying to address the Java/.NET part of your question.
My response to "Why not Java bytecode" is here: https://news.ycombinator.com/item?id=32743910. I don't know enough about the design of the CLR to comment. In any case, neither Java bytecode nor CLR bytecode is a proper open standard, so they seem like bad fits for the web unless Microsoft or Oracle wanted to lead an effort to turn them into web standards.
I don't have enough experience with compiler back-end theory to respond intelligently to the challenges and benefits of a stack-based IR. But I know that you can view a stack language as a language with arbitrarily many registers, translate it into SSA form, and then do efficient register allocation, so the stack-based nature might be okay. I agree that allowing unaligned memory access seems like an issue, but I suppose the alternative is to have different alignment requirements per architecture, which hurts interoperability. I'd love to hear from WASM implementers whether this causes problems in practice.
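To make that translation concrete, here's a toy sketch (not how any real Wasm compiler is implemented, and covering only `const` and `add`) of turning stack-machine operations into SSA-style three-address code with fresh virtual registers:

```javascript
// Toy translation of stack-machine ops into SSA-style three-address code.
// Each stack slot becomes a fresh virtual register, which a register
// allocator could then map onto real machine registers.
function stackToSSA(ops) {
  const stack = [];
  const code = [];
  let n = 0;
  for (const op of ops) {
    if (op.push !== undefined) {
      const r = `v${n++}`;              // fresh virtual register
      code.push(`${r} = const ${op.push}`);
      stack.push(r);
    } else if (op.op === "add") {
      const b = stack.pop();            // pops become register reads
      const a = stack.pop();
      const r = `v${n++}`;
      code.push(`${r} = add ${a}, ${b}`);
      stack.push(r);
    }
  }
  return code;
}

console.log(stackToSSA([{ push: 2 }, { push: 3 }, { op: "add" }]));
// [ 'v0 = const 2', 'v1 = const 3', 'v2 = add v0, v1' ]
```

Once the code is in this form, the "stack" has disappeared entirely; what remains looks like the input to an ordinary register allocator.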
Some use cases outside browsers are edge computing and distributing binaries (for computational efficiency) with NPM packages: for example, distributing cryptographic libraries as Wasm instead of error-prone reimplementations in JS.
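As a self-contained illustration of consuming a Wasm binary from JS (a hand-encoded minimal module exporting `add`, standing in for whatever compiled library an NPM package would actually ship as a .wasm file):

```javascript
// A minimal, hand-encoded Wasm module exporting `add(a, b) -> a + b`.
// A real package would load a compiled .wasm file; inline bytes just
// keep this example self-contained.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section, one body
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0; local.get 1; i32.add; end
]);

const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));
console.log(instance.exports.add(2, 3)); // 5
```

The same bytes run unchanged in Node, Deno, browsers, or any standalone Wasm runtime, which is the portability point being made here.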
Firefox has some parts which are compiled to WebAssembly, and Mozilla wrote an article about it [0]. They seem to use WebAssembly to sandbox the code and improve security.
[0]: https://hacks.mozilla.org/2020/02/securing-firefox-with-weba...
If I understand your question correctly, the point is that wasm becomes the agreed-upon target, so you only need language -> wasm compilation and you can then distribute that artifact everywhere. Then you can either JIT it or recompile it into machine code at the endpoint, as an optimization.
That would help reduce the number of compilers needed for N languages and M architectures from O(MN) to O(M+N) (similar to the function of an IR).
On the other hand, wasm is low-level enough to handle C.
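With some hypothetical numbers plugged in (the 10 and 8 are made up for illustration), the O(MN) vs. O(M+N) difference looks like this:

```javascript
// Hypothetical: 10 source languages, 8 target architectures.
const languages = 10;
const targets = 8;

const directCompilers = languages * targets; // one compiler per (language, target) pair
const viaWasm = languages + targets;         // one frontend per language + one backend per target

console.log(directCompilers, viaWasm); // 80 18
```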
.NET / CLR was the previous attempt to do something like this, and it did have compilers for many different languages targeting it. But the downside there was that the runtime itself wasn't meaningfully portable for a very long time. If it had been open and cross-platform from the get-go, who knows, perhaps wasm would have been a CIL subset.