Table of Contents for Programming Languages: a survey

Other Intermediate languages (ILs) designed to be targetted by multiple high-level languages, or pedagogical or minimalistic


Parrot started out as a runtime for Perl6. Then it refocused on being an interoperable VM target for a variety of languages. However, it hasn't been very successful (due to not enough volunteers being motivated to spend enough hours hacking on it), and even Perl6 is moving away from it.

Even if unsuccessful, it is still of interest because it is one of the few VMs designed with interoperation between multiple HLLs in mind.

Note that multiple core Parrot devs claim that all core Parrot devs hate Parrot's object model:

It's register-based. It provides garbage collection.

It has a syntactic-sugar IR language called PIR (which handles register allocation and supports named registers), an assembly-language called PASM, an AST serialization format called PAST, and a bytecode called PBT.

Its objects are called PMCs (Polymorphic Containers).

Its set of opcodes are extensible (a program written in Parrot can define custom opcodes). Parrot itself contains a lot of opcodes:

At one point there was an effort called M0 to redefine things from a small, core set of opcodes but i don't know what happened to it; this appears to be the list of M0 opcodes: . I dunno if the M0 project is still ongoing, see . The repo seems to be at . There is also an IL that compiles to M0: .

There was an earlier effort for some sort of core language called L1 . Not sure what happened with that either.


The M0 opcodes:

control flow: noop goto (go to a fixed offset in the current bytecode segment) goto_if (conditionally go to a fixed offset in the current bytecode segment) goto_chunk (go to an offset in another chunk)

arithmetic: add_i add_n sub_i sub_n mult_i mult_n div_i div_n mod_i (remainder) mod_n isgt_i isgt_n isge_i isge_n convert_i_n (convert from integer to numeric) convert_n_i

bitwise arithmetic: ashr (right bitshift with sign extension) lshr (right bitshift without sign extension) shl (left bitshift) and or xor

Memory/GC ops: gc_alloc sys_alloc sys_free copy_mem

todo: set set_imm deref set_ref set_byte get_byte set_word get_word csym ccall_arg ccall_ret ccall print_s print_i print_n exit

_i means arith on 'integers', _n means arith on 'two numeric registers', "Treat *$2 and *$3 as integer or floating-point values, (operate on) them and store the result in *$1."

todo; explain the crytic ones; descriptions here




Parrot is also a register based virtual machine. It defines four types of registers:

    Numbers (i.e. floating point)
    Polymorphic Containers (PMCs), which reference complex types and structures

Like LLVM, Parrot does not define a maximum number of registers: each function uses as many registers as it needs. Functions do not re-use registers for different purposes by storing their values to memory, they specify a new register number instead. The Parrot runtime will handle assignment of virtual machine registers to CPU registers.

So far as I can tell, integer registers are the width of the host CPU on which the VM is running. A Parrot bytecode might find itself using either 32 or 64 bit integer registers, determined at runtime and not compile time. This is fascinating if correct, though it seems like BigNum? handling would be somewhat complicated by this. " --



Used as a target for Haxe; being replaced by HashLink? VM in this capacity. Neko and Haxe and HashLink? have the same lead dev (Neko precedes Haxe).


Haxe currently uses NekoVM? as its primary VM target, but it's switching to HashLink?. Neko and Haxe and HashLink? have the same lead dev (Neko precedes Haxe). Why is Haxe switching to HashLink?? And why isn't hashlink just called 'Neko 3.0'? explains that "Neko was my first complete virtual machine. It was designed before Haxe was even created and was meant to run various programming languages. In the end only two languages ended up targeting Neko: NekoML?, the language of the Neko compiler, and of course Haxe. Back then, the Neko virtual machine was not especially designed to run Haxe and suffered from some limitations, the main one being performance....Neko is very slow. The main reason for this is that it's dynamically typed". Hashlink by contrast is "strictly-typed register-based" (Neko is stack-based / a stack machine: [1].



Pedagogical language


C- language

Pedagogical language. Not to be confused with C--.



Primitive types: byte, word, string, real, big (and some short versions of some of these) Addressing modes: immediate, indirect and double indirect

Garbage Collection

there are separate instructions (omitted from below) for calling into a different module (and for allocating a stack frame prior to that call)


Question marks mean that there are a bunch of instructions of this form for different types.

arithmetic: add?, and?, div?, mod?, mul?, negf (negate real) comparison and branching and jumping: beq?, bge?, bgt?, ble?, blt?, bne?, call, casec (case), frame (Allocate frame for local call), goto (computed goto), jmp, mspawn other operations on primitive types: addc (concatenate strings), cons? (cons, "Allocate new list element"), head?, indc (index by character), indx (array index returning a pointer to the indexed location), ind? (array index returning the value at the location), insc (insert character into string), lena (len of array), lenc (len of string), lenl (len of linked list) multithreading: alt ("Alternate between communications"; "selects between a set of channels ready to communicate"), exit, nbalt (non-blocking alt) type conversion and coercion: cvt?? exception handling: eclr ("Clear exception stack"; reserved for internal use by the implementation, to cancel exception handlers on return from a function), module stuff: load (load module), mnewz (Allocate object given type from another module), memory: lea (load effective address, e.g. explicitly perform addressing mode), mov?, movm (copy range of memory), movmp (copy range of memory and update reference counts), movp (copy pointer and update reference counts), movpc (compute memory address of an immediate program counter value)

todo: the rest of these:

new, newz - Allocate object

    Syntax:		new	src, dst
    		newz	src, dst
    Function:	dst = malloc(src->size);
    		initmem(dst, src->map);

The new instruction allocates and initializes storage to a new area of memory. The size and locations of pointers are specified by the type descriptor number given as the src operand. A pointer to the newly allocated object is placed in dst. Any space not occupied by pointers has undefined value.

The newz instruction additionally guarantees that all non-pointer values are set to zero. It is not used by Limbo. newa, newaz - Allocate array

    Syntax:		newa	src1, src2, dst
    		newaz	src1, src2, dst
    Function:	dst = malloc(src2->size * src1);
    		for(i = 0; i < src1; i++)
    			initmem(dst + i*src2->size, src2->map);

The newa instruction allocates and initializes an array. The number of elements is specified by the src1 operand. The type of each element is specified by the type descriptor number given as the src2 operand. Space not occupied by pointers has undefined value. The newaz instruction additionally guarantees that all non-pointer values are set to zero; it is not used by Limbo. newcx - Allocate channel

    Syntax:		newcw	dst
    		newcb	dst
    		newcl	dst
    		newcf	dst
    		newcp	dst
    		newcm	src, dst
    		newcmp	src, dst
    Function:	dst = new(Channel)

The newc instruction allocates a new channel of the specified type and stores a reference to the channel in dst. For the newcm instruction the source specifies the number of bytes of memory used by values sent on the channel (see the movm instruction above). For the newcmp instruction the first operand specifies a type descriptor giving the length of the structure and the location of pointers within the structure (see the movmp instruction above). orx - Logical OR

    Syntax:		orb	src1, src2, dst
    		orw	src1, src2, dst
    		orl	src1, src2, dst
    Function:	dst = src1 | src

These instructions compute the bitwise OR of the two operands addressed by src1 and src2 and store the result in the dst operand. recv - Receive from channel

    Syntax:		recv	src, dst
    Function:	dst = <-src

The recv instruction receives a value from some other thread on the channel specified by the src operand. Communication is synchronous, so the calling thread will block until a corresponding send or alt is performed on the channel. The type of the received value is determined by the channel type and the dst operand specifies where to place the received value. ret - Return from function

    Syntax:		ret
    Function:	npc = link(fp)
    		mod = mod(fp)
    		fp = frame(fp)
    		pc = npc

The ret instruction returns control to the instruction after the call of the current function. send - Send to channel

    Syntax:		send	src, dst
    Function:	dst <-= src

The send instruction sends a value from this thread to some other thread on the channel specified by the dst operand. Communication is synchronous so the calling thread will block until a corresponding recv or alt is performed on the channel. The type of the sent value is determined by the channel type and the dst operand specifies where to retrieve the sent value. shlx - Shift left arithmetic

    Syntax:		shlb	src1, src2, dst
    		shlw	src1, src2, dst
    		shll	src1, src2, dst
    Function:	dst = src2 << src1

The shl instructions shift the src2 operand left by the number of bits specified by the src1 operand and store the result in the dst operand. Shift counts less than 0 or greater than the number of bits in the object have undefined results. shrx - Shift right arithmetic

    Syntax:		shrb	src1, src2, dst
    		shrw	src1, src2, dst
    		shrl	src1, src2, dst
    Function:	dst = src2 >> src1

The shr instructions shift the src2 operand right by the number of bits specified by the src1 operand and store the result in the dst operand. Shift counts less than 0 or greater than the number of bits in the object have undefined results. slicea - Slice array

    Syntax:		slicea	src1, src2, dst
    Function:	dst = dst[src1:src2]

The slicea instruction creates a new array, which contains the elements from the index at src1 to the index src2-1. The new array is a reference array which points at the elements in the initial array. The initial array will remain allocated until both arrays are no longer referenced. slicec - Slice string

    Syntax:		slicec	src1, src2, dst
    Function:	dst = dst[src1:src2]

The slicec instruction creates a new string, which contains characters from the index at src1 to the index src2-1. Unlike slicea , the new string is a copy of the elements from the initial string. slicela - Assign to array slice

    Syntax:		slicela	  src1, src2, dst
    Function:	dst[src2:] = src1

The src1 and dst operands must be arrays of equal types. The src2 operand is a non-negative integer index. The src1 array is assigned to the array slice dst[src2:]; src2 + nelem(src1) must not exceed nelem(dst). spawn - Spawn function

    Syntax:		spawn	src, dst
    Function:	fork();

The spawn instruction creates a new thread and calls the function specified by the dst operand. The argument frame passed to the thread function is specified by the src operand and should have been created by the frame instruction. subx - Subtract

    Syntax:		subb	src1, src2, dst
    		subf	src1, src2, dst
    		subw	src1, src2, dst
    		subl	src1, src2, dst
    Function:	dst = src2 - src1

The sub instructions subtract the operands addressed by src1 and src2 and stores the result in the dst operand. For subb, the result is truncated to eight bits. tail - Tail of list

    Syntax:		tail	src, dst
    Function:	dst = src->next

The tail instruction takes the list specified by the src operand and creates a reference to a new list with the head removed, which is stored in the dst operand. tcmp - Compare types

    Syntax:		tcmp	src, dst
    Function:	if(typeof(src) != typeof(dst))

The tcmp instruction compares the types of the two pointers supplied by the src and dst operands. The comparison will succeed if the two pointers were created from the same type descriptor or the src operand is nil; otherwise, the program will error. The dst operand must be a valid pointer. xorx - Exclusive OR

    Syntax:		xorb	src1, src2, dst
    		xorw	src1, src2, dst
    		xorl	src1, src2, dst
    Function:	dst = src1 ^ src2

These instructions compute the bitwise exclusive-OR of the two operands addressed by src1 and src2 and store the result in the dst operand.


Guile's VM

Guile 2.0: stack machine

About 180 instructions:

A selection of some of the instructions (quotes and paraphrased quotes from the documentation):

Lexical Environment Instructions:

These instructions access and mutate the lexical environment of a compiled procedure—its free and bound variables.

Top-Level Environment Instructions:

These instructions access values in the top-level environment: bindings that were not lexically apparent at the time that the code in question was compiled. The location in which a toplevel binding is stored can be looked up once and cached for later. The binding itself may change over time, but its location will stay constant. Currently only toplevel references within procedures are cached, as only procedures have a place to cache them, in their object tables.

Procedure Call and Return Instructions:

Function Prologue Instructions: todo

Trampoline Instructions:

Branch Instructions:

Data Constructor Instructions:

Loading Instructions:

Dynamic Environment Instructions:

Miscellaneous Instructions:

Inlined Scheme Instructions:

Inlined Mathematical Instructions:

Inlined Bytevector Instructions:


Guile 2.2's VM: register machine

A selection of some of the instructions (quotes and paraphrased quotes from the documentation):

Each VM instruction starts by indicating which operation it is, and then follows by encoding its source and destination operands. Each procedure declares that it has some number of local variables, including the function arguments. These local variables form the available operands of the procedure, and are accessed by index.

The local variables for a procedure are stored on a stack. Calling a procedure typically enlarges the stack, and returning from a procedure shrinks it. Stack memory is exclusive to the virtual machine that owns it.

In addition to their stacks, virtual machines also have access to the global memory (modules, global bindings, etc) that is shared among other parts of Guile, including other VMs.

The registers that a VM has are as follows:

    ip - Instruction pointer
    sp - Stack pointer
    fp - Frame pointer

The structure of the top stack frame is as follows:

   +==================+ <- fp + 2 = SCM_FRAME_PREVIOUS_SP (fp)
   | Dynamic link     |
   | Return address   |
   +==================+ <- fp
   | Local 0          |
   | Local 1          |
   | ...              |
   | Local N-1        |
   \------------------/ <- sp

The dynamic link is the offset of the fp that was in effect before this program was applied, relative to the current fp.

A function call in Guile is very cheap: the VM simply hands control to the procedure. The procedure itself is responsible for asserting that it has been passed an appropriate number of arguments. For example, only calls to keyword-argument procedures “pay” for the cost of parsing keyword arguments. (At the time of this writing, calling procedures with keyword arguments is typically two to four times as costly as calling procedures with a fixed set of arguments.)

Consider the following Scheme code as an example:

  (define (foo a)
    (lambda (b) (list foo a b)))

Within the lambda expression, foo is a top-level variable, a is a lexically captured variable, and b is a local variable.

Another way to refer to a and b is to say that a is a “free” variable, since it is not defined within the lambda, and b is a “bound” variable.

Going back to our example, b may be allocated on the stack, as it is never mutated.

a may also be allocated on the stack, as it too is never mutated. Within the enclosed lambda, its value will be copied into (and referenced from) the free variables vector.

foo is a top-level variable, because foo is not lexically bound in this example.

Guile allocates all variables on the stack. When a lexically enclosed procedure with free variables—a closure—is created, it copies those variables into its free variable vector. References to free variables are then redirected through the free variable vector.

Variables that are assigned are usually allocated in boxes, so that continuations and closures can capture their identity and not their value at one point in time. If a variable is ever set!, however, it will need to be heap-allocated instead of stack-allocated, so that different closures that capture the same variable can see the same value. Also, this allows continuations to capture a reference to the variable, instead of to its value at one point in time. For these reasons, set! variables are allocated in “boxes”—actually, in variable cells.

Thus perhaps counterintuitively, what would seem “closer to the metal”, viz set!, actually forces an extra memory allocation and indirection. Sometimes Guile’s optimizer can remove this allocation, but not always.

A compiled procedure is a compound object consisting of its bytecode and a reference to any captured lexical variables. In addition, when a procedure is compiled, it has associated metadata written to side tables, for instance a line number mapping, or its docstring. You can pick apart these pieces with the accessors in (system vm program). A procedure may reference data that was statically allocated when the procedure was compiled.

We can see how these concepts tie together by disassembling the foo function we defined earlier to see what is going on:

scheme@(guile-user)> (define (foo a) (lambda (b) (list foo a b))) scheme@(guile-user)> ,x foo Disassembly of #<procedure foo (a)> at #xea4ce4:

   0    (assert-nargs-ee/locals 2 0)    ;; 2 slots (1 arg)    at (unknown file):1:0
   1    (make-closure 1 7 1)            ;; anonymous procedure at #xea4d04 (1 free var)
   4    (free-set! 1 0 0)               ;; free var 0
   6    (mov 0 1)
   7    (return-values 2)               ;; 1 value

Disassembly of anonymous procedure at #xea4d04:

   0    (assert-nargs-ee/locals 2 2)    ;; 4 slots (1 arg)    at (unknown file):1:16
   1    (toplevel-box 1 74 58 68 #t)    ;; `foo'
   6    (box-ref 1 1)                   
   7    (make-short-immediate 0 772)    ;; ()                 at (unknown file):1:28
   8    (cons 2 2 0)                    
   9    (free-ref 3 3 0)                ;; free var 0
  11    (cons 3 3 2)
  12    (cons 2 1 3)
  13    (return-values 2)               ;; 1 value

Lexical Environment Instructions:

Top-Level Environment Instructions:

Procedure Call and Return Instructions:

Function Prologue Instructions:

Branch Instructions:

Dynamic Environment Instructions:

Miscellaneous Instructions:

Inlined Scheme Instructions:

There are also '/immediate' (immediate addr mode) variants of the vector and struct ops.

Inlined Mathematical Instructions:

There are also '/immediate' (immediate addr mode) variants of 'add' and 'sub'.

Unboxed Floating-Point Arithmetic instructions:

There are also a set of Constant Instructions, a set of Inlined Bytevector Instructions, and a set of Unboxed Integer Arithmetic instructions (interestingly this has both a left shift and a right shift).


See also Guile section of proj-plbook-plChLispLangs, proj-plbook-plChImplementationCaseStudies, proj-plbook-plChSingleIntermedLangs, proj-plbook-plChTargetsImpl.

Neohapsis Toy VM 101

Four General Purpose Registers: r1, r2, r3, r4

Instruction Registers: IP, baseip

Stack Registers: SP, basesp

Flag Registers: flags

Our Virtual CPU Instruction Set:

MOV r32, r32 MOV [r1], r32 MOV r1, [r1] MOV r32, value CMP r32, value INC/DEC r32 AND/OR r1, value XOR r32,r32 PUSH/POP r32 JMP (Relative / Direct) JE, JL, JG CALL (r1 / value) EXITVM

Mesa and Cedar

Mesa is interesting because a paper was published claiming that its bytecode had a very high code density (roughly 4 bytes per HLL statement; ).



"Simplicity is a typed, combinator-based, functional language without loops and recursion, designed to be used for crypto-currencies and blockchain applications"


much of the following is quoted or paraphrased from [2]


core Simplicity has 9 types of terms:

Simplicity's 'bit machine' has "two non-empty stacks of data frames. One stack holds read-only frames, and the other stack holds write-only frames. A data frame consists of an array of cells and a cursor pointing either at one of the cells or pointing just beyond the end of the array. Each cell contains one of three values: a zero bit, a one bit, or an undefined value." and 10 instructions:

I think Simplicity terms are translated into sequences of bit machine instructions?

Ethereum Virtual Machine (EVM) (for cryptocurrency smart contracts)

Ethereum is a platform for writing applications on top of a blockchain. The applications can send messages and currency between each other. The applications are run in virtual machines by third parties, and are charged 'gas' for each computational step.

It has all of a stack, local random-access memory ('local' as in function-local, like a locally scoped variable; this memory appears empty at the beginning of each function (even if this is a reentrant call); when the call returns to a parent function the parent's memory is intact), and persistent storage. The word size is 256-bit, and all arithmetic is mod 2^256. There is a 256 call depth limit. Many opcodes return '0' in case of an error (rather than terminating execution). There is no built-in 'cron' or 'timer' facility for programs to be executed periodically; all execution must be triggered externally by a user (this external user may, however, be an external cron script). Each step of execution costs 'gas'; the fees may be seen at At any given time, there is a maximal execution time (in terms of 'gas') (the GASLIMIT opcode returns the current limit). The current gas limit may be seen at At the time of this writing, the limit is 3141592. talks a bit about some design choices.


A list of 127 (as of this writing) opcodes is at , appendix H 'Virtual Machine Specification', PDF page 22. () ( an older version with 56 opcodes may be found at , section section 'Virtual machine opcodes'; this older version also has slightly different instruction description phrasing that may be easier to read, which are copied below for much of the wording of the instruction descriptions; differences in the versions include the addition of ADDMOD, MULMOD, EXTCODSIZE, EXTCODECOPY, BLOCKHASH replacing PREVHASH, JUMPDEST, LOG*, CALLCODE, SIGNEXTEND, DUP2+, SWAP2+, the removal of NEG, the return value of CREATE).

The VM halts with an error if:

These conditions can be checked before executing each instruction, so "no instruction can, through its execution, cause an exceptional halt" -- [3]

Here's the list:

Terminating opcode:

Basic opcodes:

Cryptographic opcodes:

Ethereum message opcodes:

Ethereum blockchain opcodes:

Memory and control flow opcodes:

Push Immediate Value and DUP and SWAP opcodes


Interaction with other messages / call stack opcodes:

Error handling opcode:

EVM and proofs of correctness

EVM: proposal to remove dynamic jumps to assist static analysis

Proposal to remove the two dynamic jump instructions, JUMP and JUMPI (JUMPI is a conditional jump, not an immediate form of JUMP), and replace them with five primitive instructions: JUMPTO, JUMPIF, BEGINSUB, JUMPSUB, RETURNSUB.

JUMPTO (JUMPTO jump_target) is an immediate jump (that is, a jump to a constant address embedded inline in the program code). JUMPIF (JUMPIF jump_target) is a conditional form of JUMPTO.

BEGINSUB, JUMPSUB, and RETURNSUB are for making subroutines. Each subroutine can only have a single entry point, which is marked with BEGINSUB (BEGINSUB n_args, n_results). RETURNSUB (RETURNSUB) returns. JUMPSUB (JUMPSUB jump_target) jumps into a subroutine.

In addition, the proposal provides the following non-essential (but efficient) new instructions:

BEGINDATA: specifies that all of the following bytes to the end of the bytecode are data, and not reachable code.

JUMPV n, jumpdest ...: (n and each jumpdest are two inline bytes): jump to one of a vector of n JUMPDEST offsets (the index is on top of the stack). "If the index is greater than or equal to n - 1 the last (default) offset is used...used to implement O(1) jumptables"

JUMPSUBV n, beginsub: like JUMPV but like a JUMPSUB instead of a JUMPTO.

GETLOCAL x, PUTLOCAL x: (x is two inline bytes): push/pop the stack, getting/putting the result from/to local variable n. "Local variable n is the n'th stack item below the frame pointer".

The proposal provides for the VM to maintain internally (not accessible to the program) a frame pointer, a return stack (of places to return to when subroutines return), and a frame stack (a stack of frame pointers corresponding to the return pointers in the return stack) (afaict, the return stack and the frame stack could be combined). "The frame pointer FP is set to SP + n_args at entry to the currently executing subroutine....Defining the frame pointer so as to include the arguments is unconventional, but better fits our stack semantics and simplifies the remainder of the proposal."

"The first instruction of an array of EVM bytecode begins execution of a main routine with no arguments, SP and FP set to 0, and with one value on the return stack—code size - 1. (Executing the virtual byte of 0 after this offset causes an EVM to stop. Thus executing a RETURNSUB with no prior JUMPSUB or JUMBSUBV—that is, in the main routine—executes a STOP.)"

The proposal also defines some validity conditions. The preexisting EVM validity conditions (asserted at each instruction) are:

The new validity conditions are:

"In practice, we must test at runtime for conditions 1 and 2—sufficient gas and sufficient stack. We don’t know how much gas there will be, we don’t know how deep a recursion may go, and analysis of stack depth even for non-recursive programs is non-trivial. All of the remaining conditions we validate statically."

"All of the instructions are O(1) with a small constant, requiring just a few machine operations each, whereas a JUMP or JUMPI must do an O(log n) binary search of an array of JUMPDEST offsets before every jump."

The 'Validation' section of [19] gives some example validation code.

Tooling and variant implementations

"...there is a way to make compilers that are not subject to these sorts of bugs: build the compiler inside of a proof assistant, and prove its correctness using a formal proof that can be checked by a machine. This is exemplified by the CompCert? project, which is built in the proof assistant Coq and has accompanying proofs of correctness. In the CSmith study, no bugs were found in the parts of CompCert? that had been proven correct.

Elle is a project to do this same thing, for a structured programming language (Elle-Core, or just “Elle” for short when clear from context) that compiles down to the EVM.

Elle builds on Eth-Isabelle, a formal specification of the EVM...inside of the Isabelle proof assistant...

On top of this EVM implementation, Elle contains a syntactic definition of the Elle-Core language along with a formal semantics for it. It contains an implementation of a translation from Elle-Core to EVM (described in detail here), as well as a correctness proof linking the semantics of Elle-Core programs to their compiled EVM counterparts (described in detail here).

Elle-Core provides users with a structured programming abstraction, freeing them from having to reason directly about addresses of program locations when describing control-flow (e.g. the jumps and conditional jumps used to implement if-statements and for-loops). With Elle’s structural abstraction, it becomes straightforward to implement conventional control structures such as if, for, unless, etcetera. "

More info



Intermediate language targeting the EVM.

"LLL was always meant to be very simple and minimalistic; essentially just a tiny wrapper over coding in ASM directly. In my opinion just use serpent; it has direct access to opcodes so it is a superset of LLL but it also has a whole bunch of high-level features as well for when you want them. The downside is that the compiler is more complex and so theoretically might contain more bugs." [20]





Intermediate language between Solidity and the EVM.


Michelson (Tezos VM)

A smallish, statically typed, functional, concatenative, stack-based language.

most of the following is paraphrased or quoted from



" Why Michelson?

At first sight, Michelson is a strange language. It doesn’t include many features like polymorphism, closures, or named functions. Compared to a language like Haskell or OCaml, it seems underpowered; its stack is not always easy to deal with; there is no standard library. However, these restrictions are largely motivated by the language’s design goals.

Tezos takes a slightly different view from Ethereum regarding the role of smart contracts. We think of our platform more as a way to implement pieces of business logic than as a generic “world computer”. Looking at Ethereum, most contracts implement things like multisig wallets, vesting and distribution rules, etc. Michelson is targeted to these applications, not the case of arbitrary programs.

Michelson is designed as a readable compilation target, though it can be hand written. The goal is that even the output of a compiler can be understood. We intend the language to be simple enough that developers can build their own analysis tools and compilers should they prefer to do so. This is a departure from the EVM’s bytecode which more closely resembles assembly. With a lower-level bytecode, you usually need confidence in both your program and the compiler toolchain. With Michelson you can more easily check over and verify properties of the program that is actually executed. Using a higher-level bytecode also simplifies the process of proving properties about the compiled output. Programs written in Michelson can be reasonably analyzed by SMT solvers and formalized in Coq without the need for more complicated techniques like separation logic. Similarly, the restrictions imposed by the forced indentation and capitalization ensure that the source cannot be obfuscated with indentation and alignment tricks.

Our current implementation of Michelson is based around an OCaml GADT, which we have used to verify the type-soundness of the language. Additionally, the implementation of a stack based language maps directly to the semantics. The same is not true for any efficient implementation of the lambda-calculus. There have also been two formally verified implementations of Michelson, one in Coq and one in F*. One day, we hope to replace our current implementation with a verified one.

Finally, one of the main advantages of Tezos is that the system is amendable. We want to start with a small core language in which we are confident and add features as good use cases are created for them. We don't want to throw everything into the language in at the onset and then break backwards compatibility.

So, why Michelson? To provide a straightforward platform for business logic, to provide a readable bytecode, and to be introspectable. When I was working with Olin Shivers, he was very fond of saying that one should always use a "tool small enough for the job". Michelson has been carefully designed to be that tool. " [21]

Michelson Types

core primitive constant types:


other core types:

domain-specific types:

Michelson Instructions

Michelson core control structures and instructions

Michelson core stack instructions

Michelson core comparison instructions

There is syntactic sugar for merging COMPARE and comparison combinators, and also for branching:

NEQLTGTLEGE}: same as EQ/NEQ/LT/GT/LE/GE followed by IF.
NEQLTGTLEGE}: same as COMPARE followed by EQ/NEQ/LT/GT/LE/GE followed by IF.

Michelson boolean instructions

Michelson integer instructions

Performing computations between values of different int types must be done via explicit casts.

" For specifying arithmetics, we consider that integers are all stored on 64 bits (the largest integer size) so that we can express the operations, in particular casts, using usual bitwise masks. In this context, the type indicator functions are defined as follows (which can be read both as a constraint on the bitpatttern and as a conversion operation). Uint64 (x) = Int64 (x) = x Uint32 (x) = x & 0x00000000FFFFFFFF Int32 (x) = x & 0x00000000FFFFFFFF

(x & 0x80000000 ? 0xFFFFFFFF00000000 : 0)
 Uint16 (x) = x & 0x000000000000FFFF
 Int16 (x) = x & 0x000000000000FFFF
           | (x & 0x8000 ? 0xFFFFFFFFFFFF0000 : 0)
 Uint8 (x) = x & 0x00000000000000FF
 Int8 (x) = x & 0x00000000000000FF
          | (x & 0x80 ? 0xFFFFFFFFFFFFFF00 : 0)"

Alternative instructions that check for overflows (and are like a 'FAIL' if there is an overflow):

Bitwise logical instructions available only on unsigned integers (in addition to bool):

Logical shifts are available only on unsigned integers:

Michelson operations on strings

("Strings are mostly used for naming things without having to rely on external ID databases. So what can be done is basically use string constants as is, concatenate them and use them as keys.")

Michelson operations on pairs

Michelson operations on sets

Recall that sets are homogeneous and typed generic/polymorphic; their type includes the type of the elements they contain.

Michelson operations on maps

Michelson operations on optional values

Michelson operations on unions



AntShares VM

AntShares VM Instructions

Copied/paraphrased from [22]:


Flow control:




Arithmetic (Note: Arithmetic inputs are limited to signed 32-bit integers, but may overflow their output):



Sketch of proposal for smart contracts in Ripple cryptocurrency

CIYAM's proposal for smart contracts/automated transactions in NXT cryptocurrency



"Each instruction contains an opcode, the immediate flag, two register addresses ri and rj , and a w-size third operand A , which is either interpreted as a register address or an immediate depending on the value of the immediate flag. Each instruction fills two w-size words."

instructions shared with Lajos:

instructions in tinyRAM but not in Lajos:

Lajos (a putative successor/variant to tinyRAM)

"Lajos instructions are w -sized. This allows us to store one instruction per memory location, but it means that we cannot have full w -sized immediate values. If w -sized immediates are necessary, they can be encoded in the program text and thus pre-loaded in memory and accessible via LOAD . (Note that unlike a real processor, there is no additional overhead for memory access)....Lajos instructions contain the same fields as in TinyRAM?, but since they are packed into a single w -size word, the immediate size depends on choice of w and k . A convenient choice is w = 32 and k = 32 , yielding a 16-bit immediate...Reduced immediate size has an important ramification: immediate values can no longer be used to address all 2 w memory locations. This mainly limits the jump and memory access instructions; as a result, in Lajos these instructions are modified such that immediate values offset a base address."

"Lajos has a zero register: any read from r0 yields 0, and any write to r0 is discarded."

The zero register "allows further instruction set simplification. In particular, Lajos eliminates all 5 CMP * instructions, adding in their place SUBS ; along with SUB and XOR , all functionality is retained. Further, MOV can be eliminated, since a zero operand is always available for the ADD , SUB , OR , and XOR instructions."

"having relative jumps allows us to neatly eliminate the CMOV instruction."

"SHR and SHL are both relatively expensive instructions: the limitations of arithmetic circuits dictate that bit shifting a variable distance involves an unrolled loop. Consequently, both SHR and SHL imply a loop of length w in the transition verifier. Lajos reduces this burden by eliminating SHR , since it can be emulated by SHL followed by DIV . The semantics of SHL are also modified from TinyRAM? to better accommodate emulated SHR ."

"The READ instruction becomes quite costly when the tape lengths are large; since the READ instruction must be encoded in the transition verifier, this cost is repeated for each execution step. However, as with the program text itself, the tapes can be pre-loaded into memory before execution starts, obviating the READ instruction entirely. To facilitate this, the Lajos assembler provides two directives, inTape and auxTape , that set the memory location at which the tapes are loaded. (See the TrAsm? documentation for more details.)"

instructions shared with tinyRAM:

instructions in Lajos but not in tinyRAM:

TinyRAM? was originally a Harvard architecture, but has been modified to vnTinyRAM, a von Neumann variant ("program and data are stored in the same read-write address space; programs may use runtime code generation").[23] [24]


The zk-SNARK framework can be used by a party who known a satisfying instance of an NP problem to prove to another party the existence of the satisfier without revealing the identity of the satisfier. The NP statement to be proved can be expressed in any of: TinyRAM?, R1CS ("(Rank-1 Constraint Systems), which is a language that is similar to arithmetic circuit satisfiability"), BACS ("(Bilinear Arithmetic Circuit Satisfiability). This simplifies the writing of NP statements when the additional flexibility of R1CS is not needed. Internally, it reduces to R1CS"), USCS ("Unitary-Square Constraint Systems"), TBCS ("Two-input Boolean Circuit Satisfiability").

" An instance of the language is specified by a set of equations over a prime field F, and each equation looks like: < A, (1,X) > * < B , (1,X) > = < C, (1,X) > where A,B,C are vectors over F, and X is a vector of variables.

In particular, arithmetic (as well as boolean) circuits are easily reducible to this language by converting each gate into a rank-1 constraint. See [BCGTV13] Appendix E (and "System of Rank 1 Quadratic Equations") for more details about this. "


" GCC uses three main intermediate languages to represent the program during compilation: GENERIC, GIMPLE and RTL. GENERIC is a language-independent representation generated by each front end. It is used to serve as an interface between the parser and optimizer. GENERIC is a common representation that is able to represent programs written in all the languages supported by GCC.

GIMPLE and RTL are used to optimize the program. GIMPLE is used for target and language independent optimizations (e.g., inlining, constant propagation, tail call elimination, redundancy elimination, etc). Much like GENERIC, GIMPLE is a language independent, tree based representation. However, it differs from GENERIC in that the GIMPLE grammar is more restrictive: expressions contain no more than 3 operands (except function calls), it has no control flow structures and expressions with side-effects are only allowed on the right hand side of assignments. See the chapter describing GENERIC and GIMPLE for more details. " --


"GENERIC [163] is an abstract syntax representation of C that takes care of parsing and linking issues. It remains at the same level of abstraction as C source: control structures are preferred over control-flow graph; expression trees are preferred over three-address code expressions; typed variables are preferred over pseudo-registers or registers; stack frames remain implicit. It is the first main intermediate language used in the GCC compiler suite" -- Yannick Moy. Automatic Modular Static Safety Checking for C Programs. PhD thesis, Université Paris-Sud, January 2009, section 2.1, page 42

gcc Gimple

"SIMPLE is an intermediate language in the McCAT? compiler that facilitates alias and dependency analyses. It is simpler than GENERIC, with expression trees translated into three-address code expressions. GIMPLE is a clone of SIMPLE where control structures are translated into control-flow graphs. It is the second main intermediate language used in the GCC compiler suite." -- Yannick Moy. Automatic Modular Static Safety Checking for C Programs. PhD thesis, Université Paris-Sud, January 2009, section 2.1, page 42

gcc RTL

GNU Lightning

"a portable, fast and easily retargetable dynamic code generation system....The interface that it exposes to is that of a standardized RISC architecture loosely based on the SPARC and MIPS chips."

"There are a few general-purpose registers (six, not including those used to receive and pass parameters between subroutines), and arithmetic operations involve three operands—either three registers or two registers and an arbitrarily sized immediate value. ... There are at least seven integer registers, of which six are general-purpose, while the last is used to contain the frame pointer (FP). The frame pointer can be used to allocate and access local variables on the stack, using the allocai or allocar instruction.

Of the general-purpose registers, at least three are guaranteed to be preserved across function calls (V0, V1 and V2) and at least three are not (R0, R1 and R2). "

Copied or paraphrased from :

types: "

Each instruction is composed of:

Each instruction takes two or three operands; in most cases, one of them can be an immediate value instead of a register.

most non-memory operations only take integers (either signed or unsigned) as operands; this was done in order to reduce the instruction set, and because most architectures only provide word and long word operations on registers.

I've omitted the variant suffixes 'r' and 'i' which say whether or not the last operand is immediate.

Binary ALU operations:

These accept three operands; the last one can be an immediate. 'add' and 'sub' have three variants, plain, 'c' (set carry), and 'x' (add carry). addx operations must directly follow addc, and subx must follow subc; otherwise, results are undefined.

Unary ALU operations:

Compare instructions:

These accept three operands. The last two operands are compared, and the first operand, that must be an integer register, is set to either 0 or 1.

Transfer operations:

Network extensions:

Load operations:

Store operations:

Flow control, argument management, for the caller:

Flow control, argument management, for the callee:

Branch instructions:

Takes three arguments, one jump location and two functions to be compared.

!(O2 <= O3) goto O1), bordr (if (O2 == O2) && (O3 == O3) goto O1), bunordr (if !(O2 != O2)(O3 != O3) goto O1)


See documentation for section Labels in .


Used by Racket.


Was part of

libjit instructions


Portable Assembly Forth (PAF)

" An issue that a number of portable assembly languages have had is that they require the code to be organized in functions that follow the standard calling convention (ABI) of the platform, which usually prevents tail-call optimization. PAF provides ABI calls and definitions for interfacing with the rest of the world, but also PAF calls and definitions, which (unlike ABI calls) can be tail-call-optimized and can therefore be used as universal control flow primitives [Ste77] (see Section 3.10 and 3.11).

Another problem is that indirect branches and calls have a high cost, because the compiler has to assume that every branch/call can reach any entry point. PAF introduces tags to specify which branches/calls can reach which entry points (see Section 3.9 and 3.10).

The most significant difference between PAF and Forth is that PAF contains restrictions that ensure that the stack depth is always statically determinable, so stack items can be mapped to registers (Section 3.3 and 3.9). It is interesting that these restrictions are relatively minor and don’t affect much Forth code; it’s also interesting to see an example of Forth code that is affected (see Section 5).


The main innovations of PAF are: tags indicate the control flow for indirect branches and calls; and PAF has two kinds of calls and definitions: the ABI ones follow the platform’s calling convention and are useful for interfacing to the outside world, while the PAF ones allow tail-call elimination and are useful for implementing general control structures.


2 Previous Work

2.1 ((why not just use C?))

Another problem of C (and probably a reason why it is not used as often as compiler target language as for interpreters) is that its control flow is quite inflexible: Code is divided into C functions, that can be called and from which control flow can return; the only other way to change control flow across functions is longjmp() . Varargs in combination with other language features have led to calling conventions where the caller is responsible for removing the arguments from the stack. This makes it impossible to implement guaranteed tail-call optimization, which would be necessary to use C calls as a general control flow primitive [Ste77] As a result, any control flow that does not fit the C model, such as unlimited tail calls, backtracking, coroutining, and even exceptions is hard to map to C efficiently


2.2 ((why not just use LLVM?))

LLVM shares many of the problems of have to divide the code into functions that follow some calling convention, restricting the kind of control flow that is possible. To work around this problem, it is possible to add your own calling convention, but that is not easy


2.3 ((why not just use C--?))

C-- has been designed as portable assembly language. Many considerations went into its design, and it appears to be well-designed, if a little too complex for my taste, but the project appears to be stagnant...

2.4 ((why not just use GNU Lightning?))

GNU Lightning cannot perform any register allocation on its own. Therefore the front end has to perform the register allocation. It also does not perform instruction selection; each Lightning instruction is translated to at least one native instruction...

it is...possible to implement your own calling conventions and other control flow, ..., but (from reading the manual) it is not clear if this can be integrated with the stack handling by GNU Lightning und if one can use the processor’s call instruction for your own calling convention.

It is possible to use better code generation technology with the GNU Lightning interface, and also to provide ways to use the processor’s call and return instructions for your own calling convention. With these changes, wouldn’t the GNU Lightning interface be the perfect portable assembly language? It would certainly satisfy the basic requirements of a portable assembly language, but as a replacement for a language like C, it misses conveniences like register allocation...

the relation between PAF and the machine is not as direct as for GNU Lightning: There is register allocation and instruction selection, there may be instruction scheduling, and code replication. Instruction selection and instruction scheduling make better code possible (at the cost of slower compilation); register allocation interacts with these phases, and leaving it to the clients would require duplicated work in the clients, as register allocation is not really language-specific


3.2 Target we define the class of machines that we target with PAF:

PAF targets general-purpose computer architectures, i.e., the architectures that have been designed as compiler targets, such as AMD64, ARM, IA-32, IA-64, MIPS, PowerPC?, SPARC.

Memory on the target machines is byte-addressed with a flat address space; e.g., DSPs with separate X and Y address spaces are not target machines.

The target machines use modulo (wrap-around) arithmetics and and signed numbers are represented in 2s-complement representation.

The target machines have a uniform register set for integers and addresses (not, e.g., accumulators with different size than address registers), and possibly separate (but internally also uniform) floating point registers.


3.9 Control flow inside definitions

The standard Forth words begin again until ahead if then cs-roll are available in PAF and are useful for building structured control flow, such as if ... then ... elsif ... then ... else ... end . While one can construct any control flow with these words [Bad90], if you want to implement labels and gotos, it’s easier to use labels and gotos. Therefore, PAF (unlike Forth) provides that, too: L: name defines a label and goto: name jumps to it.

PAF also supports indirect gotos: ’ name / tag produces the address of label name, and goto/ tag jumps to a label passed on the stack. The tag indicates which gotos can jump to which labels; a PAF program must not jump to a label address generated with a different tag. E.g., a C compiler targeting PAF could use a separate tag for each switch statement and the labels occuring there.


Whichever method of control flow you use, on a control flow join the statically determined stack depth has to be the same on all joining control flows. This ensures that the PAF compiler can always determine the stack depth and can map stack items to registers even across control flow. This is a restriction compared to Forth, but most Forth code conforms with this restriction. Breaking this rule is detected and reported as error by the PAF compiler. So the tags have another benefit in connection with the stack-depth rule: The static stack depth for a given tag must be the same (for all labels and all gotos), but they can be different for different tags. If there were no tags, all labels and gotos in a definition would have to have the same stack depth. ...

3.10 PAF Definitions and PAF calls...

The stack effects of all definitions whose address is taken with the same tag have to be compatible. I.e., there must be one stack effect that describes all of them; e.g., ( x x -- x ) is a valid stack effect of both + and drop (although the minimal stack effect of drop is ( x -- ) ), so + and drop have compatible stack effects.

The use of tags here has two purposes: It informs the PAF compiler about the control flow; and it also informs it about the stack effect of the indirect call (while a Forth compiler usually has to assume that execute can call anything, and have any stack effect). Or conversely, in connection with the stack-depth rule: Tags allow different stack effects for indirectly called definitions with different tags; without tags, all indirectly called definitions would have to have the same stack effect.


3.11 ABI definitions and ABI calls

We need to specify the stack effect explicitly as signature of an ABI definition or call. The syntax for such a signature is [xr]*-[xr]* , where x indi- cates a cell (machine word/integer/address) argument, and r a floating-point argument; the letters before the - indicate parameters, and the letters afterwards the results. The division into x and r reflects the division into general-purpose registers and floating-point registers on real machines, and the role these registers play in many calling conventions.

A definition conforming to the calling convention is defined with abi: sig name


You can take the address of an ABI funtion with abi’ name and call it with abi-exec. sig . There are no tail calls to ABI functions, because we cannot guarantee that tail calls can be optimized in all calling conventions.


3.12 Why have two kinds of definitions and two kinds of calls?

The PAF definitions and calls allow to implement various control structures such as backtracking through tail calls [Ste77]. They also allow the compiler to use flexible and possibly more efficient calling interfaces than the ABI calling convention. On the other hand, the ABI counterparts allow interfacing with other languages and using dynamically or statically linked binary libraries, including callbacks, and using PAF to build such libraries (e.g., as plug-ins).

... 3.13 Exceptions

It is possible to build non-local control-flow such as exceptions with tail-calls, but it is often more convenient to let a PAF definition correspond to a source language function/method/procedure (no need to spread locals across several definitions). Exceptions are a common non-local control-flow construct, so PAF includes them


4 Non-Features ... Garbage collection...Types PAF does not perform type checking during compi-lation, nor at run-time; also, there is no overloading of several operations on the same operator based on types....Debugger...SIMD...


5 PAF vs. Forth

PAF has restrictions and features that allow the compiler to statically determine the stack depth. As a consequence, in PAF there is no need to implement the stacks in memory, with a stack pointer for each stack (data stack and return stack for cells, floating-point stack for floating-point values). In contrast, Forth needs to have a separate mem- ory area and stack pointer for each stack, and while stack items can be kept in registers for most of the code, there are some words (in particular, execute ) and code patterns (unbalanced stack effects on control flow joins), that force stack items into memory and usually also force stack pointer updates.

This property of Forth is avoided in PAF by requiring balanced stack effects on control flow joins (see Section 3.9), and by replacing execute with exec. tag (see Section 3.10); all definition addresses returned for a particular tag are required to have compatible stack effects, so exec. tag has a statically determined stack effect

The effect on real programs is relatively small: most Forth code has balanced stack effects for control flow anyway, and most occurences of ’ and execute can be converted to their tagged variants, because programmers keep the stack depth statically determinable in order to keep the code understandable. However, there are cases where the restrictions are not so easy to comply with. E.g., object-oriented packages in Forth use execute for words with arbitrary stack effects.


6 Related work

We have discussed C, LLVM, C--, and Vcode/GNU Lightning in Section 2. There are projects that are similar to PAF in using a restricted or modified form of a higher-level language as portable assembler:

Machine Forth (which evolved into colorForth) is a simple variant of Forth created by Chuck Moore, the inventor of Forth. It closely corresponds to the instructions on his Forth CPUs, but he also wrote an implementation for IA-32 that creates native code. The IA-32 compiler is very simple, basically just expanding the words into short machine code sequences.

It does not map stack items beyond the top-of-stack to registers, yet the generated code is relatively compact; this reflects the fact that machine Forth is close to the machine, including IA-32.


7 Conclusion

The main contributions of PAF are:


One may wonder about the “absence” of some operations in PAF; e.g., there is <? U<? , but only =? + - * . The reason is that, on the two’s-complement machines that PAF targets, these operations are the same for signed and unsigned numbers


UEFI's EBC (EFI Byte Code) Chapter 21, page 985, PDF page 1034 (instruction set listing starts at 21.8, page 994, PDF page 1043)

Many arguments support both direct and indirect indexing. 'Natural values' are special UEFI-specified number representations used to abstract away from bit length.

moves: MOV, MOVI (immediate), MOVIn (immediate indexed value), MOVn (unsigned natural value), MOVsn (signed natural value, MOVREL (relative indexing)

arithmetic: ADD, SUB, MUL, MULU (unsigned MUL) DIV, DIVU, MOD, MODU, NEG

logical: AND, OR, NOT, XOR

shifts and conversions: ASHR, SHL, SHR

conversions: EXTNDB (sign-extend a byte (8-bit) value), EXTNDW (sign-extend a 16-bit value), EXTNDD (sign-extend a 32-bit value)

comparisons: CMP (=, <=, >=, unsigned <=, unsigned >), CMPI (CMP immediate)

control: JMP (unconditional, or conditional upon condition bit in FLAGS register; absolute, or relative; possibly immediate; 32-bit immediate value (in which case there is also a register operand, which could be the zero register, and the immediate value is an offset), or 64-bit operand (that must be immediate; no register operand), JMP8 (relative jump to signed 8-bit offset; unconditional, or conditional upon condition bit in FLAGS register), CALL, RET

stack ops: PUSH, PUSHn (unsigned natural), POP, POPn

misc: BREAK (halt/error; get version; debug breakpoint; system call; create thunk; set compiler version), LOADSP (load FLAGS register with a general-purpose register), STORESP


Mu calls itself a 'micro' virtual machine for managed languages (especially dynamic ones): in comparison to VMs such as the JVM and .NET which are "designed for certain kinds of languages", it is "minimalist", "explicitly designed to support the development of new languages", and "target[s] a low level of abstraction"; the IR is "SSA-based rather than stack-based, the type system is much simpler, the concurrency primitives are low-level, and all high-level optimizations are the responsibility of the client run-time system, not the micro virtual machine".

In my opinion, Mu appears to position itself as a better LLVM. In comparison to LLVM, Mu is "not a compiler framework", and "performance-critical language-specific optimizations are performed outside the micro virtual machine in higher layers of the run-time stack" [25] (but it seems to me that if Mu took off as an IR, it would be possible to write an optimizer framework over it, and provide common language-agnostic optimizer passes, just as LLVM does on top of its core LLVM IR). More importantly, Mu has built-in support for garbage collection and concurrency. [26] states that the Mu has "just three concerns: hardware (compiler back end), memory (garbage collector), and concurrency (scheduler and memory model for language implementers)." "Mu is designed to support managed languages while the LLVM is designed for C-like languages", for example, "the LLVM IR type system contains raw pointer types but not reference types". ", LLVM’s focus on heavy-duty static optimizing compilation of C/C++ leaves its support for dynamic just-in-time (JIT) compilation as somewhat of an afterthought, receiving less attention to quality and maintenance. In contrast, dynamic languages rely heavily on on-going run-time (re)compilation to achieve good performance". "LLVM’s focus on C means that support for garbage collection, concurrency, and dynamic typing are minimal or weak."

design criteria

Draining the Swamp: Micro Virtual Machines as Solid Foundation for Language Development lists eight design principles:



numbers: int, float, double


calling: func, thread, stack


misc notes

"The top-level unit of the Mu IR is a code bundle , which contains definitions of types, function signatures, constants, global cells and functions. A function has basic blocks and instructions. The Mu instruction set contains LLVM-like primitive arithmetic and logical instructions as well as Mu-specific garbage-collection-aware memory operations, thread/stack operations, and traps"

particularly IL-ish things

(this is meant to be an IL after all)

"SSA-based rather than stack-based"

"For the convenience of the micro virtual machine rather than the client, the types of the operands are explicitly written as type arguments so that Mu does not need to infer the type of any instruction from the types of its operands."

"Mu IR programs must explicitly truncate, extend, convert or cast the arguments to match the signature."

"The client must explicitly generate TAILCALL instructions to utilize this feature. Mu implementations need not automatically convert conventional CALL s into TAILCALL s, though an implementation might."

" Exception handling within a function, for example, a throw statement in a try - catch block in Java, should be translated to branching instructions ( BRANCH and BRANCH2 ) in the Mu IR. In this case, Mu is not aware of any exceptions being thrown. "


exact garbage collection.

instruction set




The QEMU emulator/virtualization system internally uses a compiler called "Tiny Code Generator" (TCG) to translate the instructions for the processor being emulated into the instructions for the host processor. The idea is that when TCG is extended to emulate some type of processor, someone writes a translation from the operations of the ISA of the emulated processor into the "TCG frontend ops" (TCG frontend instruction set); and when TCG is ported to itself run on a new platform, someone writes a translation from the "TCG backend ops" (TCG backend instruction set). Then when QEMU is emulating some software, the software being emulated is translated to frontend ops, the frontend ops are translated to backend ops, and the backend ops are translated into instructions for the host processor.

QEMU TCG Backend instrution set

List of backend ops:

3264): byte swap a 16-bit/32-bit/64-bit quantity
1632)(SU): Sign (S) or Zero (U) extend a (8/16/32)-bit operand


moving data between registers and arbitrary host memory:

1632)(SU) Load an (8/16/32)-bit quantity from host memory and (sign or zero)-extend
1632): Store an (8/16/32)-bit quantity to host memory

moving data between registers and arbitrary target memory:

ST)(8163264)(SU): same as above

Code Flow:


CALL, BR, DISCARD, NOP do not appear to be available in the frontend [27], but perhaps i'm misunderstanding.

The frontend also includes some extra-op helpers:

temporary registers:


Note that "As of 2014, QEMU does not support 3D acceleration" [28].




The Wikipedia description is good and fairly concise, so i'll just quote it here:

" The SECD machine is a highly influential[citation needed] virtual machine and abstract machine intended as a target for functional programming language compilers. The letters stand for Stack, Environment, Control, Dump, the internal registers of the machine. These registers point to linked lists in memory.

The machine was the first to be specifically designed to evaluate lambda calculus expressions. It was originally described by Peter J. Landin as part of his ISWIM programming language definition in 1963. The description published by Landin was fairly abstract, and left many implementation choices open (like an operational semantics). Hence the SECD machine is often presented in a more detailed form, such as Peter Henderson's Lispkit Lisp compiler, which has been distributed since 1980. Since then it has been used as the target for several other experimental compilers.

In 1989 researchers at the University of Calgary worked on a hardware implementation of the machine.[1]

Informal description

When evaluation of an expression begins, the expression is loaded as the only element of C. The environment E, stack S and dump D begin empty.

During evaluation of C it is converted to reverse Polish notation with ap (for apply) being the only operator. For example, the expression F (G X) (a single list element) is changed to the list X:G:ap:F:ap.

Evaluation of C proceeds similarly to other RPN expressions. If the first item in C is a value, it is pushed onto the stack S. More exactly, if the item is an identifier, the value pushed onto the stack will be the binding for that identifier in the current environment E. If the item is an abstraction, a closure is constructed to preserve the bindings of its free variables (which are in E), and it is this closure which is pushed onto the stack.

If the item is ap, two values are popped off the stack and the application done (first applied to second). If the result of the application is a value, it is pushed onto the stack.

If the application is of an abstraction to a value, however, it will result in a lambda calculus expression which may itself be an application (rather than a value), and so cannot be pushed onto the stack. In this case, the current contents of S, E, and C are pushed onto D (which is a stack of these triples), S is reinitialised to empty, and C is reinitialised to the application result with E containing the environment for the free variables of this expression, augmented with the binding that resulted from the application. Evaluation then proceeds as above.

Completed evaluation is indicated by C being empty, in which case the result will be on the stack S. The last saved evaluation state on D is then popped, and the result of the completed evaluation is pushed onto the stack contents restored from D. Evaluation of the restored state then continues as above.

If C and D are both empty, overall evaluation has completed with the result on the stack S. Registers and memory

The SECD machine is stack-based. Functions take their arguments from the stack. The arguments to built-in instructions are encoded immediately after them in the instruction stream.

Like all internal data-structures, the stack is a list, with the S register pointing at the list's head or beginning. Due to the list structure, the stack need not be a continuous block of memory, so stack space is available as long as there is a single free memory cell. Even when all cells have been used, garbage collection may yield additional free memory. Obviously, specific implementations of the SECD structure can implement the stack as a canonical stack structure, so improving the overall efficiency of the virtual machine, provided that a strict bound be put on the dimension of the stack.

The C register points at the head of the code or instruction list that will be evaluated. Once the instruction there has been executed, the C is pointed at the next instruction in the list—it is similar to an instruction pointer (or program counter) in conventional machines, except that subsequent instructions are always specified during execution and are not by default contained in subsequent memory locations, as it is the case with the conventional machines.

The current variable environment is managed by the E register, which points at a list of lists. Each individual list represents one environment level: the parameters of the current function are in the head of the list, variables that are free in the current function, but bound by a surrounding function, are in other elements of E.

The dump, at whose head the D register points, is used as temporary storage for values of the other registers, for example during function calls. It can be likened to the return stack of other machines.

The memory organization of the SECD machine is similar to the model used by most functional language interpreters: a number of memory cells, each of which can hold either an atom (a simple value, for example 13), or represent an empty or non-empty list. In the latter case, the cell holds two pointers to other cells, one representing the first element, the other representing the list except for the first element. ...


A number of additional instructions for basic functions like car, cdr, list construction, integer addition, I/O, etc. exist. They all take any necessary parameters from the stack. " -- (this article has additional Further reading links not in the above excerpt)

Reynolds 1993 says that the dump (the D in SECD) is a representation of a continuation.




The Amber Machine by Luca Cardelli

"The Amber machine is a stack machine designed as an intermediate language for compiling higher-order languages, in the tradition of SECD machines [Landin 64] and combinator machines [Turner 79] [Curien 86]. This is a revision of the Functional Abstract Machine described in [Cardelli 83, Cardelli 84], and is specialized for the Amber language [Cardelli 86]"



SAIL: Static Analysis Intermediate Language with a Two-Level Representation

"SAIL is an intermediate language designed for static analysis; it explicitly both a high- and a low-level representation, with mappings between them, to facilitate both analysis (on the low-level representation) and reporting (through the high-level representation)." [29]

"SAIL is a front-end for program analysis systems that provides a two-level representation, consisting of both a language-specific high-level intermediate language as well as a low-level, language-independent representation. SAIL provides a precise mapping from the low-level instructions to the high-level expressions and statements used in the source code, greatly simplifying tasks such as error reporting and relating reasoning on the low-level representation back to the original source code. SAIL also provides support for control flow graph construction as well as compact and efficient serialization. SAIL currently parses C and C++ code and uses GCC 4.3.4 to generate its high-level intermediate representation." [30]

"In contrast to CIL, Sail provides both a high and a low-level representation; however, Sail 's high-level representation is much higher than CIL whereas its low-level represen- tation is much lower. While CIL allows for intricate program transformations, Sail is exclusively targeted for performing static analysis" [31]



F) f, #bytes_passed, (dest)?..., RET(PF), where P means no return value (and so no 'dest' in CALLP) and F means a return value (and a 'dest' in CALLF))



16 x 16-bit registers. Special registers:





"Morte is a minimalist implementation of the calculus of constructions ...You can think of Morte as a very low-level intermediate language for functional languages"

Morte is said to be 'superoptimizing'; it removes recursion by transforming recursive data structures to things that hold information telling you how to do folds on them (Boehm-Berarducci encoding), and represents it in a lambda-calculus. Then it essentially optimizes by inlining; it applies functions to their arguments (beta reduction), and applies .



Wren VM


Wren VM instructions

Much of the following is often quoted or paraphrased from [33].

There are also a bunch of primitives in [34] . These are invoked with the CALL or SUPER opcodes:

class Object (the root class; classes have both a superclass and a metaclass (except that Object has no superclass); Object's metaclass is objectMetaclass ):

class Class (a subclass of Object, like everything else; its metaclass is itself (Class))

class objectMetaclass (a subclass of Class; objectMetaclass's metaclass is Class):

class Bool:

class Fiber:

class Fn:

class Null:

class Num:

class String:

class List:

class Map:

class Range:

class System:

Wren VM links




IL4 Lisp-ahtava VM

Vm made for demoscene 4k intros.

Lisp syntax.

"Comments start with a # char and go up to the end of the line. "

" The language is totally untyped (well, strictly speaking the type is "32-bits" :). That means you have to be careful to call the right arithmetic and comparison functions (as well as others). The values may very well be pointers, integers, floats or whatever. "

top level constructs:

in the 'bytecode functions': "

    (fun glVertex3fv (arr) (stdcall (external_symbol "_glVertex3fv@4") arr))
    (fun glViewport (x y w h) (stdcall (external_symbol "_glViewport@16") x y w h))

There are different forms for functions which return a floating point value, namely "stdcall-fp" and "cdecl-fp".


-- [35]

types of items in the source code are (from

types of constants are (from, 'parse_const'):

types of statements are (from

imbc (intermediate bytecode) opcodes (from

An demoscene 4k intro, 'roseshank', included with the distribution also contains a 'core' library with:


(or) ^ (xor) ~ (not) 2> (shr) >> (sar)



Forth VM made for demoscene 4k intros.

Some core words (from ccore.h):





stack manipulation:

float math:

signed int math:

(or), ^ (xor)





repl stuff:

Some graphics constants (from cgl.h): constants:

Some graphics functions (from cgl.h):

Some core bytecodes(?) (from code.c, or from [36]):


" Cogol exists at a level just above assembly language. Generally, most lines in a Cogol program each correspond to a single line of assembly, but there are some important features of the language:

    Basic features include named variables with assignments and operators that have more readable syntax. For example, ADD A1 A2 3 becomes z = x + y;, with the compiler mapping variables onto addresses.
    Looping constructs such as if(){}, while(){}, and do{}while(); so the compiler handles branching.
    One-dimensional arrays (with pointer arithmetic), which are used for the Tetris board.
    Subroutines and a call stack. These are useful for preventing duplication of large chunks of code, and for supporting recursion.

The compiler (which I wrote from scratch) is very basic/naive, but I've attempted to hand-optimize several of the language constructs to achieve a short compiled program length.

Here are some short overviews about how various language features work:


The source code is tokenized linearly (single-pass), using simple rules about which characters are allowed to be adjacent within a token. When a character is encountered that cannot be adjacent to the last character of the current token, the current token is deemed complete and the new character begins a new token. Some characters (such as { or ,) cannot be adjacent to any other characters and are therefore their own token. Others (like > or =) are only allowed to be adjacent to themselves, and can thus form tokens like >>> or ==. Whitespace characters force a boundary between tokens but aren't themselves included in the result. The most difficult character to tokenize is - because it can both represent subtraction and unary negation, and thus requires some special-casing. "


" parallel, register-based, bytecode virtual machine


Feature list:

Viua instructions


Project Oberon 2013 RISC.mod

see also


A low-level retargetable code generator with an IR (the same sort of thing as LLVM). Designed for the Webassembly project. Written in Rust. Formerly called Cretonne.


" ...the 16-bit Ridiculously Simple Computer (RiSC?-16), a teaching ISA that is based on the Little Computer (LC-896) developed by Peter Chen at the Uni- versity of Michigan. The RiSC?-16 is an 8-register, 16-bit computer. All addresses are shortword- addresses (i.e. address 0 corresponds to the first two bytes of main memory, address 1 corresponds to the second two bytes of main memory, etc.). Like the MIPS instruction-set architecture, by hardware convention, register 0 will always contain the value 0. The machine enforces this: reads to register 0 always return 0, irrespective of what has been written there. "

instructions: add addi nand lui sw lw beq jalr

WebKit's Air Assembly language

?? Is this the same as:

See also:

WebKit's Bare Bones Backend / B3 IR


" an instruction set architecture created by Andrew Tanenbaum for his MIC-1 architecture. It is used to teach assembly basics in his book Structured Computer Organization.

IJVM is mostly a subset of the JVM assembly language that is used in the Java platform. This instruction set is so simple that it's difficult to write complex programs in it (for example, no shift instructions are provided). " -- [37]


There's also a set of special ARRAY instructions. Instruction Stack before* Stack after Description


33 instructions (from slide 27):

(the github repo documentation also mentions another instruction, SLEEP)





16 bit register based virtual machine

16 registers

50 instructions:


Embed VM

skx simplevm

" The following are examples of all instructions:

:test :label goto 0x44ff # Jump to the given address goto label # Jump to the given label jmpnz label # Jump to label if Zero-Flag is not set jmpz label # Jump to label if Zero-Flag is set

store #1, 33 # store 33 in register 1 store #2, "Steve" # Store the string "Steve" in register 1. store #1, #3 # register1 is set to the contents of register #3.

exit # Stop processing. nop # Do nothing

print_int #3 # Print the (integer) contents of register 3 print_str #3 # Print the (string) contents of register 3

system #3 # Call the (string) command stored in register 3

add #1, #2, #3 # Add register 2 + register 3 contents, store in reg 1 sub #1, #2, #3 # sub register 2 + register 3 contents, store in reg 1 mul #1, #2, #3 # multiply register 2 + register 3 contents, store in reg 1 concat #1, #2,#3 # store concatenated strings from reg2 + reg3 in reg1.

dec #2 # Decrement the integer in register 2 inc #2 # Increment the integer in register 2

string2int #3 # Change register 3 to have a string from an int is_integer #3 # Does the given register have an integer content? int2string #3 # Change register 3 to have an int from a string is_string #3 # Does the given register have a string-content?

cmp #3, #4 # Compare contents of reg 3 & 4, set the Z-flag. cmp #3, 42 # Compare contents of reg 3 with the constant 42. sets z. cmp #3, "Moi" # Compare contents of reg 3 with the constant string "Moi". sets z.

peek #1, #4 # Load register 1 with the contents of the address in #4. poke #1, #4 # Set the address stored in register4 with the contents of reg1. random #2 # Store a random integer in register #2.

push #1 # Store the contents of register #1 in the stack pop #1 # Load register #1 with the contents of the stack. call 0xFFEE # Call the given address. call my_label # Call the defined label ret # Return from a called-routine. "





  // Misc
  FUNC_CALL: 11,
  EVAL: 12,
  COPY: 15,
  EXIT: 16,
  COND_JUMP: 17,
  JUMP: 18,
  PROPSET: 21,
  // Comparisons
  // Math
  ADD: 100,
  MUL: 101,
  MINUS: 102,
  DIV: 103}; "


" Table 2: LIL General-Purpose Instructions LIL syntax; Description Arithmetic:




Table 3: LIL VM-Specific Instructions LIL syntax Description


JNI Handles:

Thread Pointer:


RockSalt RTL

"RTL is a small RISC-like language for computing with bit-vectors. The language abstracts over an architecture’s definitionof machine state, which in the case of the x86 includes the variouskinds of registers shown in Figure 1 as well as a memory, represented as afinite map from addresses to bytes. Internally, the language supports a countably infinite supply of local variables thatcan be used to hold intermediate bit-vector values. The RTL instruction set is sketched in Figure 3 and includes standard arithmetic, logic, and comparison operations for bit vectors; operations to sign/zero-extend and truncate bit vectors; an operation to load an immediate value into a local variable; operationsto load/store values in local variables from/to registers; operations to load and store bytes into memory; and a special operation fornon-deterministically choosing a bit-vector value of a particular size. We use dependent types to embed the language into Coq and ensure that only bit-vectors of the appropriate size are used in instructions."

Figure 3.The RTL Language

Machine locations: PC


Local variables: x, y, z \in identifier

Arithmetic operators:

Comparison operators:

RTL instructions:

UCSD p-code


" The transputer instruction set consisted of 8-bit instructions assembled from opcode and operand nibbles. The upper nibble contained the 16 possible primary instruction codes, making it one of the very few commercialized minimal instruction set computers. The lower nibble contained the one immediate constant operand, commonly used as an offset relative to the Workspace (memory stack) pointer. Two prefix instructions allowed construction of larger constants by prepending their lower nibbles to the operands of following instructions. Further instructions were supported via the instruction code Operate (Opr), which decoded the constant operand as an extended zero-operand opcode, providing for almost endless and easy instruction set expansion as newer implementations of the transputer were introduced.

The 16 'primary' one-operand instructions were: Mnemonic Description J Jump – add immediate operand to instruction pointer LDLP Load Local Pointer – load a Workspace-relative pointer onto the top of the register stack PFIX Prefix – general way to increase lower nibble of following primary instruction LDNL Load non-local – load a value offset from address at top of stack LDC Load constant – load constant operand onto the top of the register stack LDNLP Load Non-local pointer – load address, offset from top of stack NFIX Negative prefix – general way to negate (and possibly increase) lower nibble LDL Load Local – load value offset from Workspace ADC Add Constant – add constant operand to top of register stack CALL Subroutine call – push instruction pointer and jump CJ Conditional jump – depending on value at top of register stack AJW Adjust workspace – add operand to workspace pointer EQC Equals constant – test if top of register stack equals constant operand STL Store local – store at constant offset from workspace STNL Store non-local – store at address offset from top of stack OPR Operate – general way to extend instruction set

All these instructions take a constant, representing an offset or an arithmetic constant. If this constant was less than 16, all these instructions coded to one byte.

The first 16 'secondary' zero-operand instructions (using the OPR primary instruction) were: Mnemonic Description REV Reverse – swap two top items of register stack LB Load byte BSUB Byte subscript ENDP End process DIFF Difference ADD Add GCALL General Call – swap top of stack and instruction pointer IN Input – receive message PROD Product GT Greater Than – the only comparison instruction WSUB Word subscript OUT Output – send message SUB Subtract STARTP Start process OUTBYTE Output byte – send one-byte message OUTWORD Output word – send one-word message


Transputers were intended to be programmed using the programming language occam, based on the communicating sequential processes (CSP) process calculus. The transputer was built to run Occam specifically, more than contemporary CISC designs were built to run languages like Pascal or C.


The transputer did not include a memory management unit (MMU) or a virtual memory system...The transputer's lack of support for virtual memory inhibited the porting of mainstream variants of the Unix operating system" -- [39]








Schneider's IL

"a first-order language IL that re- stricts function application to tail position, and does not allow mutually recursive definitions. We give two semantic interpretations to the language: The functional interpretation IL/F given in Section 3.2 yields a standard, first-order func- tional language with a tail call restriction. The imperative interpretation IL/I given in Section 3.3 reveals a low-level imperative register transfer lan- guage.

3.1 Syntax of IL We emphasize the first-order nature of the language by using different al- phabets for the names of variables and functions. x ranges over V , the alphabet for variables, which denote values of base type. f ranges over L, the alphabet for labels, which we use to denote first-order functions. The language is parametrized over a structure of simple expressions which we call operations and denote by Op. By convention, e ranges over Op. We assume a partial function [[·?]] : (V → V ) * V which evaluates an op- eration given the values of the variables. Evaluation of an operation cannot change the value of a variable, hence operations cannot have side effects. As [[·?]] is a partial function, operation evaluation may err: [[e?]] E = ⊥, but evaluation is deterministic.

Figure 3.1: Syntax of IL Exp \oe s, t ::= let x = e in s first-order let

if x then {s} else {t} conditional
x value
fun f x_bar = s in t second-order recursive let
fx_bar application


The following syntax is an alternative presentation of the syntax of IL, which generates the same abstract syntax trees, but is more suggestive of IL/I’s imperative interpreta- tion:

Exp \in s, t ::= x := e; s assignment

if x then {s} else {t} conditional
return x value
block f x {s}; {t} block definition
goto f x jump + parallel assignment

" --


Popular ILs