RPython generated from Sail booting RISC-V Linux: 2 minutes
JavaScript TinyEmu booting RISC-V Linux in my browser (Safari, M1 Mac): 1.25 seconds
https://bellard.org/jslinux/vm.html?cpu=riscv64&url=buildroo...
(2nd or 3rd time once files are cached from the internet)
Any questions?
There is a general parallel here with Python vs. compiled code. Many developers prefer the flexibility that Python gives them over the speed that compiled code gives. Once the problem space is well defined and static, or small enough to be well understood, it makes sense to spend the developer time to create a high-performance solution. But during the exploration phase, it is better to use a dynamic language that may be slower.
The only time the RISC-V instruction set should be changing is when new instructions are being added. During the extension development process, the set of instructions, their meaning, and especially the binary encoding of individual instructions can all change.
I have been the person doing the modifications to Spike during development of RISC-V extensions, and in particular during a quite fluid stage of the development of the Vector extension. I know how easy it is to do this. Just as easy as Sail, I would say.
Here's one example of why.
Most RISC-V emulators decode instructions using a series of nested switch statements. Zeroth, switch on bits 1:0 to distinguish non-C instructions from each page of C (if C is implemented). First, switch on the "opcode" field, bits 6:2, e.g. OP-IMM or LOAD or BRANCH. Second, typically, switch on the "funct3" field, bits 14:12, which distinguishes e.g. ADD / SLT / SLTU / AND / OR / XOR / SLL / SRL for arithmetic instructions, or BEQ / BNE / BLT / BLTU / BGE / BGEU for conditional branches, or operand size for loads and stores. Third, for some instructions, switch on the "funct7" field, bits 31:25, to distinguish between e.g. ADD / SUB or SRL / SRA.
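For illustration, here is a minimal sketch of that nested-switch style covering only a handful of RV32I instructions. It is not taken from any particular emulator; the `op_t` enum and the `decode_switch` function are names made up for this example, and the compressed (C) pages are simply rejected.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch only. */
typedef enum {
    OP_ILLEGAL, OP_ADD, OP_SUB, OP_SLL, OP_SRL, OP_SRA, OP_BEQ, OP_BNE
} op_t;

static op_t decode_switch(uint32_t insn)
{
    /* Zeroth: bits 1:0. Anything other than 0b11 is a C page,
       which this sketch does not implement. */
    if ((insn & 0x3) != 0x3)
        return OP_ILLEGAL;

    uint32_t opcode = (insn >> 2) & 0x1f;   /* bits 6:2   */
    uint32_t funct3 = (insn >> 12) & 0x7;   /* bits 14:12 */
    uint32_t funct7 = (insn >> 25) & 0x7f;  /* bits 31:25 */

    switch (opcode) {                       /* first level */
    case 0x0c:                              /* OP (register-register) */
        switch (funct3) {                   /* second level */
        case 0:
            switch (funct7) {               /* third level */
            case 0x00: return OP_ADD;
            case 0x20: return OP_SUB;
            default:   return OP_ILLEGAL;
            }
        case 1:
            return funct7 == 0x00 ? OP_SLL : OP_ILLEGAL;
        case 5:
            switch (funct7) {
            case 0x00: return OP_SRL;
            case 0x20: return OP_SRA;
            default:   return OP_ILLEGAL;
            }
        default:
            return OP_ILLEGAL;              /* rest of OP omitted here */
        }
    case 0x18:                              /* BRANCH */
        switch (funct3) {
        case 0:  return OP_BEQ;
        case 1:  return OP_BNE;
        default: return OP_ILLEGAL;         /* BLT etc. omitted here */
        }
    default:
        return OP_ILLEGAL;                  /* other opcodes omitted here */
    }
}
```

Even in this tiny subset, adding or re-encoding one instruction means finding the right spot in up to three levels of switch.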
This is pretty fast and efficient and makes compact code/tables, but it is high maintenance.
Spike decodes instructions with a loop searching a linear list of MASK and MATCH values until it finds the correct instruction. So, by the way, does my simple "trv" emulator.
Here is my own complete executable C definition of RV32I:
https://github.com/brucehoult/trv/blob/main/instructions.inc
The 3rd and 4th values (the hex ones) are the MATCH and MASK values. The logic is "if ((instruction & MASK) == MATCH)" for example:
if ((instruction & 0xfe00707f) == 0x40000033) rd = rs1 - rs2; // sub
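The MASK/MATCH test generalizes to a table scan. The following is a rough sketch of that idea, not the actual code from Spike or trv: the `insn_def` struct, the `table` array and `decode_linear` are hypothetical names, and only five RV32I instructions are listed, using their standard MATCH and MASK values.

```c
#include <stddef.h>
#include <stdint.h>

/* One row per instruction: a name plus its MATCH and MASK values. */
typedef struct {
    const char *name;
    uint32_t    match;
    uint32_t    mask;
} insn_def;

static const insn_def table[] = {
    { "add",  0x00000033, 0xfe00707f },
    { "sub",  0x40000033, 0xfe00707f },
    { "addi", 0x00000013, 0x0000707f },
    { "beq",  0x00000063, 0x0000707f },
    { "lw",   0x00002003, 0x0000707f },
};

/* Scan the table in order; return the first matching name, or NULL
   if no entry matches (illegal or not-yet-listed instruction). */
static const char *decode_linear(uint32_t insn)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if ((insn & table[i].mask) == table[i].match)
            return table[i].name;
    }
    return NULL;
}
```

Adding an instruction, or changing its encoding during extension development, is then a one-line edit to the table.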
These linear comparisons are of course slow, but very easy to maintain.
Spike makes two concessions to speed:
1) the more frequently-executed instructions gradually sift their way to the start of the list
2) there is a hash table using the entire 32 bits of each instruction (i.e. including any register fields and literals) as key, mapping to a struct with the already-decoded instruction. The last few thousand instructions are kept in the hash table. So dynamically most instructions can avoid the linear search.
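To make the two concessions concrete, here is a small sketch under stated assumptions. It is not Spike's implementation: the names (`defs`, `cache`, `decode_cached`, `slow_lookups`), the 1024-entry direct-mapped cache, the multiplicative hash, and the "swap with the previous entry on each hit" rule are all choices made for this example. It also assumes no two table entries can match the same instruction, so reordering is safe.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { const char *name; uint32_t match, mask; } insn_def;

/* Mutable, so that hot entries can move toward the front. */
static insn_def defs[] = {
    { "add",  0x00000033, 0xfe00707f },
    { "sub",  0x40000033, 0xfe00707f },
    { "addi", 0x00000013, 0x0000707f },
    { "beq",  0x00000063, 0x0000707f },
    { "lw",   0x00002003, 0x0000707f },
};
#define NDEFS (sizeof defs / sizeof defs[0])

/* Direct-mapped cache keyed on all 32 instruction bits (size assumed). */
#define CACHE_SIZE 1024
typedef struct { uint32_t bits; const char *name; int valid; } cache_entry;
static cache_entry cache[CACHE_SIZE];

static unsigned long slow_lookups;  /* counts linear searches, for testing */

static const char *decode_cached(uint32_t insn)
{
    /* Multiplicative hash; the top 10 bits index the 1024 slots. */
    cache_entry *e = &cache[(uint32_t)(insn * 2654435761u) >> 22];
    if (e->valid && e->bits == insn)
        return e->name;                 /* fast path: already decoded */

    slow_lookups++;
    const char *name = NULL;
    for (size_t i = 0; i < NDEFS; i++) {
        if ((insn & defs[i].mask) == defs[i].match) {
            name = defs[i].name;
            if (i > 0) {                /* sift one step toward the front */
                insn_def tmp = defs[i - 1];
                defs[i - 1] = defs[i];
                defs[i] = tmp;
            }
            break;
        }
    }

    /* Remember the result, replacing whatever occupied this slot. */
    e->bits = insn;
    e->name = name;
    e->valid = 1;
    return name;
}
```

A real emulator would cache a fully decoded struct (operands, handler pointer) rather than just a name, but the lookup structure is the same.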
(There are other aspects to ISA emulation, such as memory address protection and translation and caching the results of that, but the principles are the same.)