Implementing a Virtual Machine in C

(blog.felixangell.com)

159 points · freefouran · 11y ago · 50 comments

44 comments · 15 top-level

ternaryoperator11y ago· 9 in thread

I find these kinds of very basic intro articles frustrating. They till the same ground over and over: a tiny instruction set implemented with a switch statement. None of the more difficult issues are addressed: exception handling, linking to libraries or other programs written for the same VM, portability of programs across architectures, accessing the OS for services like file I/O, time, etc. These are all the things that make a toy not a toy.

Every CS student in the world has written a toy VM just like this one.

dandrews11y ago

That's a little ungenerous of you. The toy VM has presented an aha! moment to all of us at one time - or more than once, when you learn about Turing equivalence and the rabbit hole that leads from that.

Skip the articles beneath your skill level and move along.

ternaryoperator11y ago

Agreed, it was a bit ungenerous, esp. b/c I didn't realize til someone mentioned it below that the author was a teenager.

My objection, though, is not what's above/below me, it's that given the title, I was expecting a deeper article only to find one that has appeared numerous times. It was frustration I was expressing more than condescension. But you're quite right in using the term "ungenerous." I appreciate the gentle reproof.

tptacek11y ago

The details you're talking about are pretty much just details; for instance, a foreign function interface for C functions (and thus system calls) is a pretty straightforward exercise that involves just a bit about the C ABI for your platform.

It's much easier to teach someone who knows how to build a basic register machine how to set up the stack for a call to a C function than it is to teach someone who doesn't know how a register machine works how to build one.

Also: there are reasons you might want to build a simple VM that has no access to system services and no notion of an exception. BPF is a good example of such a virtual machine.

If there's something frustrating about this article to me, it's that it lacks a motivating example. Why would you want to build this VM? What are you compiling down to it from?

tylermac111y ago

Considering the author is 16, I think it's a great write up/exercise.

rounak11y ago

You're being unnecessarily harsh. If you got frustrated, it means the article wasn't targeted at you.

I'd appreciate it if you did an advanced version of this writeup :)

titanix211y ago

Since I also wrote a toy VM where the toolchain supports linking of different code files (albeit the runtime doesn't implement CALL / RET) I decided to make a blog post about this specific topic.

https://netspring.wordpress.com/2015/05/10/toy-virtual-machi...

maguirre11y ago

I feel the same way. I went to read the post expecting a lot more than what I found and came away feeling both more knowledgeable than I thought I was and more ignorant for not knowing that I could get away with calling what I saw a "Virtual Machine".

tptacek11y ago

It's an instruction set, with an instruction dispatcher, a stack, and a register file. Why would it surprise you that someone would call it a VM?

Are you maybe getting your signals crossed between the kind of VM this article is talking about (in the p-code sense of a VM) and virtualization systems?

bitwize11y ago

"Virtual machine" here means "abstract machine" as in JVM or CLR, not "multiplexed hardware".

jcoffland11y ago· 8 in thread

I'd like to see such an article on a register based VM. Pawn and Lua are nice examples. Most VMs are stack based but this is mainly because they are conceptually easier to understand. Register based machines have some real advantages, like requiring far fewer instructions inside tight loops.

tptacek11y ago

Stack VMs aren't used just because they're easier to understand:

* In interpreted environments, registers are stored in memory anyways, so the advantage of simulating them isn't as great

* It is easier to generate code for stack machines, because you don't need to run register allocation

* There's a tradeoff in instruction complexity versus number of instructions between stack and register machines
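
The third point can be made concrete with a rough sketch in C. Everything here is an assumption made for illustration: the opcode names (`S_PUSH`, `R_ADD`, and so on), the encodings, and the functions `run_stack` and `run_reg` are hypothetical and not taken from the article or from any real VM. Both loops compute `2 + 3` and return how many instructions were dispatched. The register version assumes the operands already sit in registers, as they would inside a loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration only. */
enum { S_PUSH, S_ADD, S_HALT };   /* stack machine    */
enum { R_ADD, R_HALT };           /* register machine */

/* Stack VM: operands are implicit (top of stack), so each instruction
   is small, but more instructions are dispatched.
   Returns the dispatch count, or 0 on an unknown opcode. */
size_t run_stack(const int *code, int *result) {
    int stack[16];
    int sp = 0;
    size_t dispatched = 0;
    for (size_t ip = 0;;) {
        dispatched++;
        switch (code[ip]) {
        case S_PUSH:                      /* PUSH imm */
            assert(sp < 16);
            stack[sp++] = code[ip + 1];
            ip += 2;
            break;
        case S_ADD:                       /* pop two, push the sum */
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            sp--;
            ip += 1;
            break;
        case S_HALT:
            assert(sp >= 1);
            *result = stack[sp - 1];
            return dispatched;
        default:
            return 0;
        }
    }
}

/* Register VM: one wider instruction names all three operands.
   Returns the dispatch count, or 0 on an unknown opcode. */
size_t run_reg(const int *code, int *regs) {
    size_t dispatched = 0;
    for (size_t ip = 0;;) {
        dispatched++;
        switch (code[ip]) {
        case R_ADD:                       /* ADD dst, src1, src2 */
            regs[code[ip + 1]] = regs[code[ip + 2]] + regs[code[ip + 3]];
            ip += 4;
            break;
        case R_HALT:
            return dispatched;
        default:
            return 0;
        }
    }
}
```

With the stack program `{S_PUSH, 2, S_PUSH, 3, S_ADD, S_HALT}` the dispatcher runs four times. With the register program `{R_ADD, 2, 0, 1, R_HALT}` and registers preloaded with 2 and 3 it runs twice, at the cost of a four-word instruction.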

rdc1211y ago

This paper [1] (though it is a tad old now) shows that the register machine approach still outperforms the stack machine by a fair margin, largely because it needs fewer instructions.

I wonder which is the better source representation for a JIT, though.

[1] https://www.usenix.org/legacy/events/vee05/full_papers/p153-...

userbinator11y ago

> In interpreted environments, registers are stored in memory anyways, so the advantage of simulating them isn't as great

...unless the interpreter maps VM registers directly onto machine registers. With some VMs it's possible, and then you can get very good performance.

crzwdjk11y ago

Registers are stored in memory, but ideally "memory" means L1 cache, and it seems to me like register VMs would have better cache locality. This might be why they're getting more popular relative to stack VMs as the speed advantage of cache increases.

voidiac11y ago

Register-based VMs are not more complex than stack-based VMs. You can even keep a stack; the only difference is that your instructions operate on registers and not directly on the stack. The only part I consider more complex for a register-based VM is generating code for it.

If you are interested here is a toy vm I wrote myself: https://github.com/byo3rn/ire

PS: I suck at documentation.

bch11y ago

You can see a project switch from a stack-based VM to a register-based VM here[0] [1].

For [1], search page for "vdbe" (virtual database engine). Hopefully you find interesting things.

[0] https://www.sqlite.org/vdbe.html

[1] http://www.sqlite.org/src/timeline?c=2007-11-11&n=400

ddfreyne11y ago

I created a register-based VM named RCPU, for educational purposes. You can find it at https://github.com/ddfreyne/rcpu. It has most of what you’d expect from a register-based VM, and even has video output (rcpu-assemble samples/video.rcs and then rcpu-emulate the resulting file with --video).

I intend to write up my findings, but haven’t gotten around to it yet. I do have slides for a talk for this project though: https://speakerdeck.com/ddfreyne/simulating-a-cpu-with-ruby

rednovae11y ago

Not an article, but this is a register-based VM that should be simple and straightforward to understand. It also comes with an assembler, disassembler and debugger.

https://github.com/endeav0r/hsvm

aftbit11y ago· 5 in thread

I'm a bit disappointed - this VM doesn't have instructions for looping or branching, nor does it really use the registers in any way. I was hoping to read a writeup that introduced some concepts that were used in real (non-toy) systems.

voidiac11y ago

Branching would be something like:

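  case JMP: {
    // Jump to the address stored in the operand slot. The -1 assumes the
    // main loop increments ip after every instruction.
    ip = program[ip + 1] - 1;
    break;
  }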

aspl11y ago

Hi. I updated the article to include a bit more information on registers/branches, and added an example with a loop on GitHub. I wanted to leave the registers (or rather, the instructions for moving/setting registers) as an exercise for the reader, to see if they could implement them themselves :)

jwdunne11y ago

You could implement looping by implementing an instruction, say 'JMP', that takes an 'address' as a parameter that modifies the ip. Branching can be implemented similarly with an instruction called 'BNZ', which takes two parameters, a register and an address, which changes the ip to the given address if the register is non-zero. SUB would be useful so you can subtract a register from itself to branch if equal. Other routes include using the top of the stack with your operations and branching, so you can stick to one parameter in the above instructions.
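
A minimal sketch of that idea in C, under stated assumptions: the opcode names, the three-operand `OP_ADD`/`OP_SUB` encoding, the four-register file, and the `run` function are all hypothetical choices for this example and are not the article's code. `OP_JMP` takes an address, and `OP_BNZ` takes a register and an address, as described above.

```c
#include <assert.h>
#include <stddef.h>

enum { NREGS = 4 };
enum { OP_HALT, OP_SET, OP_ADD, OP_SUB, OP_JMP, OP_BNZ };

/* Width of each instruction in words, indexed by opcode. */
static const size_t width[] = { 1, 3, 4, 4, 2, 3 };

/* Bounds-checked access to a register named by an operand. */
static int *reg(int *regs, int idx) {
    assert(idx >= 0 && idx < NREGS);
    return &regs[idx];
}

/* Runs until OP_HALT. Returns 0 on a clean halt, -1 on a bad opcode,
   a truncated instruction, or running off the end of the program. */
int run(const int *code, size_t len, int regs[NREGS]) {
    size_t ip = 0;
    while (ip < len) {
        int op = code[ip];
        if (op < OP_HALT || op > OP_BNZ || ip + width[op] > len)
            return -1;
        const int *arg = &code[ip + 1];
        switch (op) {
        case OP_HALT:
            return 0;
        case OP_SET:                      /* SET r, imm */
            *reg(regs, arg[0]) = arg[1];
            ip += width[op];
            break;
        case OP_ADD:                      /* ADD dst, a, b */
            *reg(regs, arg[0]) = *reg(regs, arg[1]) + *reg(regs, arg[2]);
            ip += width[op];
            break;
        case OP_SUB:                      /* SUB dst, a, b */
            *reg(regs, arg[0]) = *reg(regs, arg[1]) - *reg(regs, arg[2]);
            ip += width[op];
            break;
        case OP_JMP:                      /* JMP addr: unconditional */
            ip = (size_t)arg[0];
            break;
        case OP_BNZ:                      /* BNZ r, addr: jump if r != 0 */
            ip = (*reg(regs, arg[0]) != 0) ? (size_t)arg[1] : ip + width[op];
            break;
        }
    }
    return -1;
}
```

A loop that sums 4 + 3 + 2 + 1 into register 2 would then be `{OP_SET,0,4, OP_SET,1,1, OP_SET,2,0, OP_ADD,2,2,0, OP_SUB,0,0,1, OP_BNZ,0,9, OP_HALT}`, where 9 is the word offset of the `OP_ADD` instruction. This sketch sets `ip` directly on a jump instead of relying on the main loop's increment.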

stevekemp11y ago

Agreed. I wrote a simple virtual machine, and the most interesting thing was writing the compiler that would read my "assembly", and spit out the bytecode. Including handling labels and jumping instructions.

My VM was very simple, and I've not touched it for a while, but the whole thought of how to handle branching was what made it more interesting than other similar toy systems.

https://github.com/skx/simple.vm
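
For readers curious about the label handling mentioned above, here is a rough sketch in C of how label operands can be resolved to addresses. It is not the code from the linked repository. The `Line` struct, `find_label`, and `assemble` are hypothetical names, the input is assumed to be already tokenized, and every instruction is assumed to be one opcode word plus at most one operand word.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One already-tokenized source line (hypothetical layout). */
typedef struct {
    const char *label;   /* label defined at this line, or NULL     */
    int opcode;
    int has_operand;     /* 1 if an operand word follows the opcode */
    const char *target;  /* operand given as a label name, or NULL  */
    int imm;             /* numeric operand, used when target==NULL */
} Line;

static size_t line_size(const Line *l) {
    return l->has_operand ? 2 : 1;
}

/* Walks the source, summing instruction sizes, to find the word
   address of a label. Works for forward references as well.
   Returns -1 if the label is not defined. */
static long find_label(const Line *src, size_t n, const char *name) {
    size_t addr = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i].label && strcmp(src[i].label, name) == 0)
            return (long)addr;
        addr += line_size(&src[i]);
    }
    return -1;
}

/* Emits bytecode words into out. Returns the number of words written,
   or -1 on an undefined label or if out is too small. */
long assemble(const Line *src, size_t n, int *out, size_t cap) {
    assert(src != NULL && out != NULL);
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        if (w + line_size(&src[i]) > cap)
            return -1;
        out[w++] = src[i].opcode;
        if (!src[i].has_operand)
            continue;
        if (src[i].target) {
            long addr = find_label(src, n, src[i].target);
            if (addr < 0)
                return -1;
            out[w++] = (int)addr;
        } else {
            out[w++] = src[i].imm;
        }
    }
    return (long)w;
}
```

This version rescans the source for every label reference, which keeps the sketch short. A larger assembler would more likely build a symbol table in a first pass and emit code in a second.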

Someone11y ago

This _is_ a toy system, but "No looping or branching" does not imply "toy system". Counterexample: http://dtrace.org/guide/chapter.html#chp-intro-6

neverartful11y ago· 2 in thread

Contrary to the naysayers, I like seeing stuff like this. Why? Because it's a simple, gentle introduction. It's easily digestible for the newcomer. And it might be easy enough to encourage a newcomer to start building their own VM that goes on to be something real.

For those that criticize it and find faults with it -- I'm sure the author would consider pull requests. Or you could provide your own fork with all the improvements that you believe are necessary.

aikah11y ago

Calling people who criticize that blog post "naysayers" is childish, at best. Once something is out there on the web it's going to get some criticism. That's normal, and there is nothing wrong with it.

sdoering11y ago

The question (and OP made that clear) is how you criticize. Trolling is not the right way, I believe.

Neither is cosplaying Captain Obvious.

amelius11y ago· 2 in thread

This project is nice for educational purposes, but I wouldn't call it a VM; I'd call it a "bytecode interpreter".

I think nowadays it is kind of a minimum requirement to have the intermediate code JIT-compiled (or at least compiled).

I'm also missing a garbage collector, although that is not necessarily part of a VM (but often is). See NaCl for a counterexample. By the way, a project that I'd like to see is an efficient garbage collector implemented inside the VM, instead of as part of the VM.

karmakaze11y ago

It's interesting that VM for some doesn't mean the same as virtual machine. A bytecode interpreter _is_ a vm with the bytecodes representing opcodes of the machine. What this isn't is a 'modern VM' complete with JIT and GC.

TazeTSchnitzel11y ago

Why is JIT a minimum requirement?

earlz11y ago· 1 in thread

I'm pretty sure everyone has written their own toy VMs, but I'll go ahead and throw mine out there (well, the one of the three I've written that I like best). It's called LightVM and is intended to be capable of running on tiny microcontrollers.

The coolest thing about it is that the opcodes and registers are extremely general purpose. So, to do a branch, you do `mov IP, label`, or even a "push.mv" instruction which when used against IP is basically the same as the usual "call" instruction, but can also be used with data registers to save a register to the stack and then set it to a value.

I've found the hardest thing about making a VM isn't making a VM, but rather making the infrastructure around it (assembler, debugger, compilers, etc)

https://bitbucket.org/earlz/lightvm/overview
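
To illustrate the general idea of treating the instruction pointer as an ordinary register, here is a small sketch in C. It is not LightVM's actual encoding or behavior. The `Vm` struct, the opcode names, and the fixed three-word instruction format are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

enum { REG_IP, REG_A, REG_B, NREGS };   /* IP is just register 0 */
enum { OP_HALT, OP_MOV, OP_PUSH_MV };   /* MOV r, imm ; PUSH_MV r, imm */

typedef struct {
    int regs[NREGS];
    int stack[16];
    int sp;
} Vm;

/* Returns 0 on OP_HALT, -1 on a bad opcode or an out-of-range IP. */
int run(Vm *vm, const int *code, size_t len) {
    for (;;) {
        int ip = vm->regs[REG_IP];
        if (ip < 0 || (size_t)ip >= len)
            return -1;
        if (code[ip] == OP_HALT)
            return 0;
        if ((size_t)ip + 3 > len)
            return -1;
        int op = code[ip], r = code[ip + 1], imm = code[ip + 2];
        assert(r >= 0 && r < NREGS);

        /* Advance IP before executing, so an instruction that writes
           to IP overrides the default fall-through. */
        vm->regs[REG_IP] = ip + 3;

        switch (op) {
        case OP_MOV:        /* with r == REG_IP this is a jump */
            vm->regs[r] = imm;
            break;
        case OP_PUSH_MV:    /* with r == REG_IP this acts like a call */
            assert(vm->sp < 16);
            vm->stack[vm->sp++] = vm->regs[r];  /* save old value */
            vm->regs[r] = imm;
            break;
        default:
            return -1;
        }
    }
}
```

Because IP has already been advanced when `OP_PUSH_MV` saves it, the value pushed is the address of the next instruction, which is what a return would need. A matching pop-into-register instruction is left out to keep the sketch short.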

dkersten11y ago

> So, to do a branch, you do `mov IP, label`, or even a "push.mv" instruction which when used against IP is basically the same as the usual "call" instruction

I wrote something a little like this once too - there was a register stack and call, jump, branch were all implemented by pushing or popping the register stack.

vbezhenar11y ago· 1 in thread

For those who want to implement a VM as an exercise, I recommend implementing a simple JIT compiler after that. You'll probably be impressed by the performance improvement, and it's a fun exercise to do. I used GNU lightning to generate machine code.

rounak11y ago

Any pointers/links/tutorials for this? Thanks.

jCanvas11y ago· 1 in thread

I think the title is very misleading. This is not a virtual machine but an interpreter for a made up assembly language. There is nothing wrong with that and I am sure a beginner would find it very useful. But reading the title I was expecting something quite different.

dalke11y ago

Virtual machines include "interpreters for a made up assembly language." Quoting from http://en.wikipedia.org/wiki/Virtual_machine#Process_virtual... :

> A process VM, sometimes called an application virtual machine, or Managed Runtime Environment (MRE), runs as a normal application inside a host OS and supports a single process. ... Process VMs are implemented using an interpreter; performance comparable to compiled programming languages is achieved by the use of just-in-time compilation.

It points to several examples of process VMs. One is Parrot. Quoting from http://en.wikipedia.org/wiki/Parrot_virtual_machine :

> Parrot is a register-based process virtual machine designed to run dynamic languages efficiently. It is possible to compile Parrot assembly language and PIR (an intermediate language) to Parrot bytecode and execute it.

(I quoted that one over the Java and Python virtual machines because it uses the phrase "assembly language" in the context of the VM.)

donpdonp11y ago

New concepts are introduced at a satisfying pace. Each bit of code is explained thoroughly. Nice writeup.

emmanueloga_11y ago

I am starting to sound like a broken record, but here it goes. If you want a more complete tutorial on writing stack based virtual machines, check "The Elements of Computing Systems" and its accompanying course, http://www.nand2tetris.org/.

The book teaches you to build:

1) A CPU from basic electronics elements

2) An assembler to generate machine code

3) A bytecode VM that can be simulated, and a translator that generates assembly from the bytecode

4) A basic programming language that generates bytecode

5) An operating system using that language.

I'm midway through building the Assembler and VM myself :-).

tjscanlon11y ago

For everyone who enjoyed this or wants to take it a step further, I recommend writing a CHIP-8 emulator. I used the following source: http://www.multigesture.net/articles/how-to-write-an-emulato... and it was very helpful.

ggambetta11y ago

For people looking for less "toy" implementations, I've written two emulators, an 8086 one and a Z80 one.

There's libz80 (https://github.com/ggambetta/libz80) which is (AFAIK) quite complete and correct but just a library, and the 8086 one (https://github.com/ggambetta/emulator-backed-remakes) which is incomplete and buggy but serves a much more interesting purpose :)

phodo11y ago

While seemingly simple, the non-Turing-complete example is not too far off from the (simple) Forth-like stack-based programming language found and executed in bitcoin transactions.

https://en.bitcoin.it/wiki/Script

pjonesdotca11y ago

C is not my thing so a few years ago trying to sort out how a VM works, I created a VM in Ruby.

Practical? Not in the least. But, it was a good weekend's worth of fun.

https://github.com/patrickjonesdotca/carban

bvanslyke11y ago

For a project that goes a bit deeper (branching, I/O, etc.), consider writing a Chip8 simulator. There are lots of games written in Chip8 bytecode to test with!
