Tooling for WebAssembly is held mostly by the browser vendors. It is such a nice format to work with once you remove all the fluff. WebAssembly tooling should not take seconds to do what should take milliseconds, and it should be usable as a library, not just a command-line program.
I developed a unique way to write interpreters based on threaded code jumps and basic block versioning when I made MiniVM (https://github.com/FastVM/minivm). It was both larger and more dynamic than WebAssembly. Web49 started as a way to compile WebAssembly to MiniVM, but soon pivoted into its own interpreter and tooling. I could not be happier with it in its current form and am excited to see what else it can do with more work.
I'd be very interested to read more about this. It looks like you are using "one big function" with computed goto (https://github.com/FastVM/Web49/blob/main/src/interp/interp....). My experience working on this problem led me to the same conclusion as Mike Pall, which is that compilers do not do well with this pattern (particularly when it comes to register allocation): http://lua-users.org/lists/lua-l/2011-02/msg00742.html
I'm curious how you worked around the problem of poor register allocation in the compiler. I've come to the conclusion that tail calls are the best solution to this problem: https://blog.reverberate.org/2021/04/21/musttail-efficient-i...
As compared to hand-written assembly or the tail-call technique you describe, yes. But (for the benefit of onlookers) a threaded switch, especially one using computed gotos, is still more performant than a traditional function dispatch table.
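For onlookers, here is a minimal sketch of what "threaded" dispatch with computed gotos looks like (hypothetical opcodes, not Web49's actual code). Each handler ends by jumping directly to the next opcode's handler, so the CPU sees one indirect branch per handler site rather than a single shared, hard-to-predict branch at the top of a loop:

```c
#include <stdint.h>
#include <stddef.h>

enum { OP_PUSH1, OP_ADD, OP_HALT };

/* Threaded dispatch via computed goto (a GNU C extension supported by
   GCC and Clang). The dispatch `goto *labels[*ip++]` is replicated at
   the end of every handler instead of returning to a central switch. */
static int64_t run(const uint8_t *code) {
    static void *labels[] = { &&do_push1, &&do_add, &&do_halt };
    int64_t stack[64];
    size_t sp = 0;
    const uint8_t *ip = code;

    goto *labels[*ip++];          /* initial dispatch */

do_push1:
    stack[sp++] = 1;
    goto *labels[*ip++];          /* dispatch replicated per handler */
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    goto *labels[*ip++];
do_halt:
    return stack[sp - 1];
}
```

A plain `switch` in a loop funnels every handler back through one branch; replicating the dispatch gives the branch predictor per-handler history, which is where most of the speedup over a function-pointer dispatch table comes from.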
Has there been any movement in GCC wrt the tailcalls feature?
One of the limitations of computed gotos is the inability to derive the address of a label from outside the function. You always end up with some amount of superfluous conditional code for selecting the address inside the function, or indexing through a table.

Several years ago, when exploring this space, I discovered a hack, albeit one that only works with GCC (IIRC), at least as of ~10 years ago. GCC supports inline function definitions, inline functions have visibility into goto labels (notwithstanding that you're not supposed to make use of them), and, most surprisingly, GCC also supports attaching __attribute__((constructor)) to inline function definitions. This means you can export a map of goto labels that can be used to initialize VM data structures, permitting (in theory) more efficient direct threading.
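To illustrate the problem being worked around: since `&&label` is only legal inside the function that owns the label, the usual portable approach is the "superfluous conditional code" described above, i.e. a one-time export call where the interpreter fills in its own label table before any real execution. A hedged sketch with hypothetical names:

```c
#include <stddef.h>

enum { OP_NOP, OP_HALT, OP_COUNT };

/* Common workaround: calling interp(NULL, table) makes the function
   export its label addresses; a loader can then pre-translate opcodes
   into direct label addresses ("direct threading"). The `if` branch is
   the superfluous check that the GCC label-export hack avoids. */
static int interp(void **code, void **out_labels) {
    static void *labels[OP_COUNT] = { &&do_nop, &&do_halt };
    if (out_labels) {                      /* export mode, runs once */
        for (int i = 0; i < OP_COUNT; i++) out_labels[i] = labels[i];
        return 0;
    }
    void **ip = code;
    goto **ip;
do_nop:
    ip++;
    goto **ip;
do_halt:
    return 1;
}
```

Usage: the host calls `interp(NULL, labels)` once at startup, builds a program directly out of label addresses, and then runs `interp(prog, NULL)` with no per-instruction table lookup.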
The tailcall technique is a much more sane and profitable approach, of course.
Note that that message is from twelve years ago. A lot's changed since then, not just in compilers but in CPUs. Branch prediction is a lot better now.
And as to why the rest are faster: I spent a lot of time optimizing the interpreter and learning the best ways to write interpreters. It's mostly jump threading and mixed data.
Nice work!
This doesn't match my experience. After working on this problem a lot, I came to the conclusion that musttail with opaque function calls is one of the best ways of getting good code out of the compiler: https://blog.reverberate.org/2021/04/21/musttail-efficient-i...
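A minimal sketch of that style, with hypothetical opcodes (the `MUSTTAIL` macro expands to Clang's `__attribute__((musttail))` where available, and to nothing elsewhere, in which case the calls still work but may grow the stack). Each handler is its own small function, so the compiler does register allocation per handler, and the guaranteed tail call keeps dispatch as a single jump:

```c
#include <stdint.h>

#if defined(__clang__)
#define MUSTTAIL __attribute__((musttail))
#else
#define MUSTTAIL /* plain call: correct, but not a guaranteed tail call */
#endif

enum { OP_INC, OP_HALT };

typedef int64_t (*op_fn)(const uint8_t *ip, int64_t acc);

static int64_t op_inc(const uint8_t *ip, int64_t acc);
static int64_t op_halt(const uint8_t *ip, int64_t acc);

/* Dispatch table of per-opcode handler functions. */
static op_fn const dispatch[] = { op_inc, op_halt };

static int64_t op_inc(const uint8_t *ip, int64_t acc) {
    acc += 1;
    ip += 1;
    /* Tail-call the next handler: dispatch stays one indirect jump. */
    MUSTTAIL return dispatch[*ip](ip, acc);
}

static int64_t op_halt(const uint8_t *ip, int64_t acc) {
    (void)ip;
    return acc;
}

static int64_t run(const uint8_t *code) {
    return dispatch[*code](code, 0);
}
```

Keeping the interpreter state (`ip`, `acc`) in the handler arguments is the key trick: the calling convention pins those values to registers across every handler, which is exactly what one-big-function interpreters struggle to get from the register allocator.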
I'm really interested in a fast interpreter-only Wasm VM that can allow the host to share some of its memory with the VM.
> git clone ...
> make -j
> ./bin/wasm2wat ./test/core/address.wast
I get:

>>> ./bin/wat2wasm test/core/address.wast
unexpected word: `` byte=256

Will work on MiniVM stop as a result of Web49?