It's still getting faster, too: for example, engines are starting to use trap-based bounds checking, which greatly reduces overhead on 64-bit platforms.
Could you elaborate on “trap-based bounds checking”, or provide a reference to learn more? My searching mostly pointed to “trap” as being a WASM concept, rather than some processor feature that improves bounds checks.
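It means using the processor's memory protection features, rather than a check instruction at every memory access, to stop programs from reading memory they're not supposed to read. The engine reserves a block of address space larger than any address a 32-bit Wasm pointer can reach and leaves everything past the end of the real memory inaccessible. An out-of-bounds access then causes a hardware fault, which the engine catches and turns into a Wasm trap. Reserving that much address space for each memory is only practical with a 64-bit address space, which is why the trick applies to 64-bit platforms.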
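A minimal sketch of the idea in C, assuming a POSIX system (Linux or macOS) with a 64-bit address space. The function names (`reserve_linear_memory`, `load_checked`, `load_guarded`) are made up for illustration and are not any engine's API. It reserves 4 GiB of inaccessible address space, opens up only the first part as usable memory, and compares an explicit compare-per-access load with one that simply lets the hardware fault on an out-of-bounds index.

```c
#define _DEFAULT_SOURCE /* glibc: expose MAP_ANONYMOUS, MAP_NORESERVE, sigsetjmp */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Every uint32_t index is < 4 GiB, so a 4 GiB reservation covers all of them.
   Reserving this much address space is only cheap on 64-bit platforms. */
#define RESERVE_BYTES ((size_t)1 << 32)

static sigjmp_buf trap_env;

/* Fault handler: jump back to the load site, turning the fault into a "trap". */
static void on_fault(int sig) {
    (void)sig;
    siglongjmp(trap_env, 1);
}

static void install_trap_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL); /* some systems (e.g. macOS) raise SIGBUS */
}

/* Reserve 4 GiB with no access rights, then make only the first mem_size
   bytes readable/writable. Everything after that is a guard region.
   mem_size should be a multiple of the OS page size (64 KiB Wasm pages are). */
static uint8_t *reserve_linear_memory(size_t mem_size) {
    void *base = mmap(NULL, RESERVE_BYTES, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED) return NULL;
    if (mem_size > 0 && mprotect(base, mem_size, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, RESERVE_BYTES);
        return NULL;
    }
    return (uint8_t *)base;
}

/* Software bounds check: a compare-and-branch on every access. */
static int load_checked(const uint8_t *mem, size_t mem_size, uint32_t index,
                        uint8_t *out) {
    if (index >= mem_size) return -1; /* trap */
    *out = mem[index];
    return 0;
}

/* Trap-based check: no comparison against the memory size. An out-of-bounds
   index lands in the PROT_NONE guard region, the CPU faults, and the handler
   reports it. (Real engines do not call sigsetjmp per access; the handler
   inspects the faulting instruction instead. sigsetjmp keeps this short.) */
static int load_guarded(const volatile uint8_t *mem, uint32_t index,
                        uint8_t *out) {
    if (sigsetjmp(trap_env, 1) != 0) return -1; /* reached via on_fault */
    *out = mem[index];
    return 0;
}
```

Usage: call `install_trap_handler()` once, get memory from `reserve_linear_memory(65536)`, and both loaders return `-1` for an index such as `70000`; only `load_checked` pays for a comparison to get there. Actual Wasm loads also add a constant offset to the index, which is one reason real engines reserve extra guard space beyond 4 GiB.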
Excellent post! WebAssembly appears to be approximately half as fast as native. Do you know where that gap comes from, and how much of it can be expected to close?