Table of Contents for Programming Languages: a survey

Intermediate languages

There are many languages whose goal is not to be a good language for humans to write or read code in, but rather to be a good language for a compiler or interpreter to target.

TODO: separate this section into 'implementation tours' in the implementation section, and purely target language tours, here.

Chapter : a tour of some language implementations

Go

Haskell: GHC

Python: CPython

http://docs.python.org/devguide/compiler.html

Python: PyPy

PyPy? is a reimplementation of Python in RPython.

RPython is a restricted subset of Python, with restrictions on dynamic typing, reflection, and metaprogramming to enable type inference at compile time.

PyPy? is also a compiler for RPython, which adds in JIT analysis. The RPython compiler is written in Python.

Then the PyPy? Python interpreter is written in a mixture of Python (for slow initialization) and RPython (for the fast part) (i think?). This way the JIT analysis is applied to the result. I think? Not sure I understand.

I think it provides an extension API called CPyExt?, not sure though.

RPython:

Perl6: Rakudo

Perl6 source code is parsed and the parse tree is annotated by firing "action methods" during parsing. The annotated AST is called a QAST. The QAST is then compiled to a virtual machine bytecode. Various virtual machines are planned to be supported, include Parrot, JVM, and MoarVM?. The compilation steps from source code to bytecode are implemented in a subset of Perl6 called 'NQP' (Not Quite Perl).

The Perl6 object model is being reimplemented in the 6model project.

Links:

Smalltalk: Squeak

Squeak runs on a VM.

The VM is implemented in Slang, a subset of Smalltalk that can be efficiently optimized.

Erlang

http://prog21.dadgum.com/127.html

Erlang -> Core Erlang -> BEAM -> optimized BEAM

Comparisons and observations

Generally the implementation of a high-level language in a restricted 'core' version of that same language define the core by producing a statically typed variant and disallowing various metaprogramming constructs.

Chapter : a tour of some targets, IRs, VMs and runtimes

(todo move most of the above here)

We'll refer to the top of the stack as TOP0, and to the second position on the stack as TOP1, and to the third position as TOP2, etc.

stacks:

cpython: block stack

stack ops: cpython:

arithmetic:

JVM

stack-oriented

9 primitive types: int, long, short, byte, char, float, double, bool, reference

invokedynamic Dynalink (Dynamic Linker Framework)

JVM bytecode verification • JVM bytecode is statically verified before execution - An instruction must work on stack operands and variables of the right type - A method must use no more local variables and no more local stack positions than it claims to - For every point in the bytecode, the local stack has the same depth every time it is reached - A method must not throw more exceptions than it admits - A method must end with a return value or throw instruction - Method must not use one half of a two word value

Additional JVM runtime checks • Array bounds check • Array assignment type checks • Null reference checks • Checked casts • Bottom line: - A JVM program cannot read or overwrite arbitrary memory - Better debugging, better security - No buffer overflow attacks, worms, etc as in C/C++

" -- Rasmus Ejlers Møgelberg, http://itu.dk/people/mogel/SPLC2012/lectures/SPLC.2012.12.pdf

Links:

http://itu.dk/people/mogel/SPLC2012/lectures/SPLC.2012.12.pdf
http://www.theserverside.com/news/1363881/The-Working-Developers-Guide-to-Java-Bytecode
http://en.wikipedia.org/wiki/Java_bytecode
http://www.netmite.com/android/mydroid/dalvik/docs/java-bytecode.html
The Java Virtual Machine Specification: Structure of the Java Virtual Machine http://pages.cs.wisc.edu/~fischer/cs536.f13/jvm.html

on Android

Android has a non-canonical JVM compiler that called 'Dalvik' that compiles to another Dalvik-specific VM.

It doesn't support invokedynamic.

CLR

DLR for dynamic languages (built on top of CLR)

alternate, open implementation: Mono

15 primitive types: bool, byte, sbyte, char, decimal, double, float, int, uint, long, ulong, object, short, ushort, string

-- http://msdn.microsoft.com/en-us/library/ya5y69ds.aspx

The CIL has 67 base instructions and 33 object model instructions, according to " Common Language Infrastructure (CLI) Partition III CIL Instruction Set".

" Rasmus Ejlers Møgelberg Common Language Infrastructure • Much the same philosophy as JVM, but - Many source languages: C#, VB.NET, F#, SML, JScript, Eiffel, Ruby - Tail calls support functional languages - True generics in byte-code: safer and faster - User-defined structs " -- Rasmus Ejlers Møgelberg, http://itu.dk/people/mogel/SPLC2012/lectures/SPLC.2012.12.pdf

Instruction encoding

Instructions are one or more byte opcodes (currently, one or two bytes) followed by zero or more operands. They can also have prefixes.

LLVM

Some issues with LLVM:

http://doc.pypy.org/en/latest/faq.html#could-we-use-llvm

LLVM ISA

Instructions

Terminator Instructions: ret, br, switch, indirectbr, invoke, resume, unreachable

Binary Operations: add, fadd, sub, fsub, mul, fmul, udiv, sdiv, fdiv, urem, srem, frem

Bitwise Binary Operations: shl, lshr, ashr, and, or, xor

Vector Operations: extractelement, insertelement, shufflevector,

Aggregate Operations: extractvalue, insertvalue,

Memory Access and Addressing Operations: alloca, load, store, fence, cmpxchg, atomicrmw, getelementptr,

Conversion Operations: trunc , zext , sext , fptrunc , fpext , fptoui , fptosi , uitofp , sitofp , ptrtoint , inttoptr , bitcast ,

Other Operations: icmp, fcmp, phi, select, call, va_arg, landingpad,

Intrinsics

Variable Argument Handling Intrinsics: llvm.va_start, llvm.va_end, llvm.va_copy,

Accurate Garbage Collection Intrinsics: llvm.gcroot, llvm.gcread, llvm.gcwrite,

Code Generator Intrinsics: llvm.returnaddress, llvm.frameaddress, llvm.stacksave, llvm.stackrestore, llvm.prefetch, llvm.pcmarker, llvm.readcyclecounter,

Standard C Library Intrinsics: llvm.memcpy, llvm.memmove, llvm.memset, llvm.sqrt, llvm.powi, llvm.sin, llvm.cos, llvm.pow, llvm.exp, llvm.exp2, llvm.log, llvm.log10, llvm.log2, llvm.fma, llvm.fabs, llvm.copysign, llvm.floor, llvm.ceil, llvm.trunc, llvm.rint, llvm.nearbyint, llvm.round,

Bit Manipulation Intrinsics: llvm.bswap, llvm.ctpop, llvm.ctlz, llvm.cttz,

Arithmetic with Overflow Intrinsics: llvm.sadd.with.overflow, llvm.uadd.with.overflow, llvm.ssub.with.overflow, llvm.usub.with.overflow, llvm.smul.with.overflow, llvm.umul.with.overflow,

Specialised Arithmetic Intrinsics: llvm.fmuladd,

Half Precision Floating Point Intrinsics: llvm.convert.to.fp16, llvm.convert.from.fp16,

Debugger Intrinsics: llvm.dbg.declare, llvm.dbg.value¶

Exception Handling Intrinsics: llvm.eh.typeid.for, llvm.eh.sjlj.setjmp, llvm.eh.sjlj.longjmp, llvm.eh.sjlj.lsda, llvm.eh.sjlj.callsite,

Trampoline Intrinsics: llvm.init.trampoline, llvm.adjust.trampoline

Memory Use Markers: llvm.lifetime.start, llvm.lifetime.end, llvm.invariant.start, llvm.invariant.end

General Intrinsics:llvm.var.annotation, llvm.ptr.annotation.*, llvm.annotation.*, llvm.trap, llvm.debugtrap, llvm.stackprotector, llvm.stackprotectorcheck, llvm.objectsize, llvm.expect, llvm.donothing,

" The LLVM instruction set defines a register based virtual machine with an interesting twist: it has an infinite number of registers. In keeping with its design point as a compiler intermediate representation, LLVM registers enable static single assignment form. A register is used for exactly one value and never reassigned, making it easy for subsequent processing to determine whether values are live or can be eliminated. " -- http://codingrelic.geekhold.com/2010/07/virtual-instruction-sets-opcode.html

LLVM Links

http://llvm.org/docs/LangRef.html

CPython bytecode

Two stacks:

main stack
block stack: "Per frame, there is a stack of blocks, denoting nested loops, try statements, and such."

Stack ops:

various rots, dups
in-place binary arithmetic on top0, top1 (doesn't consume top1)

Arithmetic:

arithmetical negation (-6 -> 6), boolean negation (-6 -> False), inversion (-6 -> 5), unary positive (does nothing by default but can be overridden by __pos__ in custom classes)

Links:

http://docs.python.org/release/3.3.2/library/dis.html#bytecodes
www.troeger.eu/files/teaching/pythonvm08.pdf

Parrot

Parrot started out as a runtime for Perl6. Then it refocused on being an interoperable VM target for a variety of languages. However, it hasn't been very successful (due to not enough volunteers being motivated to spend enough hours hacking on it), and even Perl6 is moving away from it.

Even if unsuccessful, it is still of interest because it is one of the few VMs designed with interoperation between multiple HLLs in mind.

Note that multiple core Parrot devs claim that all core Parrot devs hate Parrot's object model: http://whiteknight.github.io/2011/09/10/dust_settles.html http://www.modernperlbooks.com/mt/2012/12/the-implementation-of-perl-5-versus-perl-6.html

It's register-based. It provides garbage collection.

It has a syntactic-sugar IR language called PIR (which handles register allocation and supports named registers), an assembly-language called PASM, an AST serialization format called PAST, and a bytecode called PBT.

Its objects are called PMCs (Polymorphic Containers).

Its set of opcodes are extensible (a program written in Parrot can define custom opcodes). Parrot itself contains a lot of opcodes: http://docs.parrot.org/parrot/devel/html/ops.html

At one point there was an effort called M0 to redefine things from a small, core set of opcodes but i don't know what happened to it; this appears to be the list of M0 opcodes: https://github.com/parrot/parrot/blob/m0/src/m0/m0.ops . I dunno if the M0 project is still ongoing, see http://leto.net/dukeleto.pl/2011/05/what-is-m0.html https://github.com/parrot/mole http://reparrot.blogspot.com/2011/07/m0-roadmap-goals-for-q4-2011.html http://gerdr.github.io/on-parrot/rethinking-m0.html . The repo seems to be at https://github.com/parrot/parrot/tree/m0 . There is also an IL that compiles to M0: https://github.com/parrot/m1/blob/master/docs/pddxx_m1.pod .

There was an earlier effort for some sort of core language called L1 http://wknight8111.blogspot.com/2009/06/l1-language-of-parrot-internals.html . Not sure what happened with that either.

The M0 opcodes:

https://github.com/parrot/parrot/blob/m0/docs/pdds/draft/pdd32_m0.pod

control flow: noop goto (go to a fixed offset in the current bytecode segment) goto_if (conditionally go to a fixed offset in the current bytecode segment) goto_chunk (go to an offset in another chunk)

arithmetic: add_i add_n sub_i sub_n mult_i mult_n div_i div_n mod_i (remainder) mod_n isgt_i isgt_n isge_i isge_n convert_i_n (convert from integer to numeric) convert_n_i

bitwise arithmetic: ashr (right bitshift with sign extension) lshr (right bitshift without sign extension) shl (left bitshift) and or xor

Memory/GC ops: gc_alloc sys_alloc sys_free copy_mem

todo: set set_imm deref set_ref set_byte get_byte set_word get_word csym ccall_arg ccall_ret ccall print_s print_i print_n exit

_i means arith on 'integers', _n means arith on 'two numeric registers', "Treat *$2 and *$3 as integer or floating-point values, (operate on) them and store the result in *$1."

todo; explain the crytic ones; descriptions here https://github.com/parrot/parrot/blob/m0/docs/pdds/draft/pdd32_m0.pod

Links:

http://docs.parrot.org/parrot/latest/html/

Parrot

Parrot is also a register based virtual machine. It defines four types of registers:

    Integers
    Numbers (i.e. floating point)
    Strings
    Polymorphic Containers (PMCs), which reference complex types and structures

Like LLVM, Parrot does not define a maximum number of registers: each function uses as many registers as it needs. Functions do not re-use registers for different purposes by storing their values to memory, they specify a new register number instead. The Parrot runtime will handle assignment of virtual machine registers to CPU registers.

So far as I can tell, integer registers are the width of the host CPU on which the VM is running. A Parrot bytecode might find itself using either 32 or 64 bit integer registers, determined at runtime and not compile time. This is fascinating if correct, though it seems like BigNum? handling would be somewhat complicated by this. " -- http://codingrelic.geekhold.com/2010/07/virtual-instruction-sets-opcode.html

MoarVM

MoarVM? is a VM built for Perl6's Rakudo implementation (the most canonical Perl6 implementation as of this writing).

Register-based
primitive types: int, num, str, object
uses libuv for async I/O, libtommath for multiple precision arithmetic, libatomic_ops for atomic operations, uthash for hashes

Links:

http://jnthn.net/papers/2013-yapceu-moarvm.pdf
http://perl6.com/MoarVM_talk1.pdf‎
opcodes (that's pretty unenlightening; here's the implementation, i think: https://github.com/MoarVM/MoarVM/blob/master/src/core/interp.c )
http://6guts.wordpress.com/2013/05/31/moarvm-a-virtual-machine-for-nqp-and-rakudo/

GHC Core

Possible extensions:

GHC STG

GHC Cmm (C--)

Smalltalk

From http://wiki.squeak.org/squeak/2267 , the operations available in Slang are:

"&" "

"+" "-" "" "
" min: max: bitAnd: bitOr: bitXor: bitShift: "<" "<=" "=" ">" ">=" "~=" "==" isNil notNil whileTrue: whileFalse: to:do: to:by:do: ifTrue: ifFalse: ifTrue:ifFalse: ifFalse:ifTrue: at: at:put: 1 bitInvert32 preIncrement integerValueOf: integerObjectOf: isIntegerObject:

" and: or: not

Links:

http://www.cosc.canterbury.ac.nz/wolfgang.kreutzer/cosc205/squeak.html
http://www.marcusdenker.de/talks/07SCGSmalltalk/11Bytecode.pdf
http://www.mirandabanda.org/bluebook/bluebook_chapter26.html#TheBytecodes26
http://www.mirandabanda.org/bluebook/bluebook_chapter27.html
http://www.mirandabanda.org/bluebook/bluebook_chapter28.html (the bytecode instruction set is here)
http://www.mirandabanda.org/bluebook/bluebook_chapter29.html (specification of the primitive methods)

The Smalltalk-80 Bytecodes Range Bits Function 0-15 0000iiii Push Receiver Variable #iiii 16-31 0001iiii Push Temporary Location #iiii 32-63 001iiiii Push Literal Constant #iiiii 64-95 010iiiii Push Literal Variable #iiiii 96-103 01100iii Pop and Store Receiver Variable #iii 104-111 01101iii Pop and Store Temporary Location #iii 112-119 01110iii Push (receiver, true, false, nil, -1, 0, 1, 2) [iii] 120-123 011110ii Return (receiver, true, false, nil) [ii] From Message 124-125 0111110i Return Stack Top From (Message, Block) [i] 126-127 0111111i unused 128 10000000 jjkkkkkk Push (Receiver Variable, Temporary Location, Literal Constant, Literal Variable) [jj] #kkkkkk 129 10000001 jjkkkkkk Store (Receiver Variable, Temporary Location, Illegal, Literal Variable) [jj] #kkkkkk 130 10000010 jjkkkkkk Pop and Store (Receiver Variable, Temporary Location, Illegal, Literal Variable) [jj] #kkkkkk 131 10000011 jjjkkkkk Send Literal Selector #kkkkk With jjj Arguments 132 10000100 jjjjjjjj kkkkkkkk Send Literal Selector #kkkkkkkk With jjjjjjjj Arguments 133 10000101 jjjkkkkk Send Literal Selector #kkkkk To Superclass With jjj Arguments 134 10000110 jjjjjjjj kkkkkkkk Send Literal Selector #kkkkkkkk To Superclass With jjjjjjjj Arguments 135 10000111 Pop Stack Top 136 10001000 Duplicate Stack Top 137 10001001 Push Active Context 138-143 unused 144-151 10010iii Jump iii + 1 (i.e., 1 through 8) 152-159 10011iii Pop and Jump 0n False iii +1 (i.e., 1 through 8) 160-167 10100iii jjjjjjjj Jump(iii - 4) *256+jjjjjjjj 168-171 101010ii jjjjjjjj Pop and Jump On True ii *256+jjjjjjjj 172-175 101011ii jjjjjjjj Pop and Jump On False ii *256+jjjjjjjj 176-191 1011iiii Send Arithmetic Message #iiii 192-207 1100iiii Send Special Message #iiii 208-223 1101iiii Send Literal Selector #iiii With No Arguments 224-239 1110iiii Send Literal Selector #iiii With 1 Argument 240-255 1111iiii Send Literal Selector #iiii With 2 Arguments

Dis

Primitive types: byte, word, string, real, big (and some short versions of some of these) Addressing modes: immediate, indirect and double indirect

Garbage Collection

there are separate instructions (omitted from below) for calling into a different module (and for allocating a stack frame prior to that call)

ISA

Question marks mean that there are a bunch of instructions of this form for different types.

arithmetic: add?, and?, div?, mod?, mul?, negf (negate real) comparison and branching and jumping: beq?, bge?, bgt?, ble?, blt?, bne?, call, casec (case), frame (Allocate frame for local call), goto (computed goto), jmp, mspawn other operations on primitive types: addc (concatenate strings), cons? (cons, "Allocate new list element"), head?, indc (index by character), indx (array index returning a pointer to the indexed location), ind? (array index returning the value at the location), insc (insert character into string), lena (len of array), lenc (len of string), lenl (len of linked list) multithreading: alt ("Alternate between communications"; "selects between a set of channels ready to communicate"), exit, nbalt (non-blocking alt) type conversion and coercion: cvt?? exception handling: eclr ("Clear exception stack"; reserved for internal use by the implementation, to cancel exception handlers on return from a function), module stuff: load (load module), mnewz (Allocate object given type from another module), memory: lea (load effective address, e.g. explicitly perform addressing mode), mov?, movm (copy range of memory), movmp (copy range of memory and update reference counts), movp (copy pointer and update reference counts), movpc (compute memory address of an immediate program counter value)

todo: the rest of these:

new, newz - Allocate object

    Syntax:		new	src, dst
    		newz	src, dst
    Function:	dst = malloc(src->size);
    		initmem(dst, src->map);

The new instruction allocates and initializes storage to a new area of memory. The size and locations of pointers are specified by the type descriptor number given as the src operand. A pointer to the newly allocated object is placed in dst. Any space not occupied by pointers has undefined value.

The newz instruction additionally guarantees that all non-pointer values are set to zero. It is not used by Limbo. newa, newaz - Allocate array

    Syntax:		newa	src1, src2, dst
    		newaz	src1, src2, dst
    Function:	dst = malloc(src2->size * src1);
    		for(i = 0; i < src1; i++)
    			initmem(dst + i*src2->size, src2->map);

The newa instruction allocates and initializes an array. The number of elements is specified by the src1 operand. The type of each element is specified by the type descriptor number given as the src2 operand. Space not occupied by pointers has undefined value. The newaz instruction additionally guarantees that all non-pointer values are set to zero; it is not used by Limbo. newcx - Allocate channel

    Syntax:		newcw	dst
    		newcb	dst
    		newcl	dst
    		newcf	dst
    		newcp	dst
    		newcm	src, dst
    		newcmp	src, dst
    Function:	dst = new(Channel)

The newc instruction allocates a new channel of the specified type and stores a reference to the channel in dst. For the newcm instruction the source specifies the number of bytes of memory used by values sent on the channel (see the movm instruction above). For the newcmp instruction the first operand specifies a type descriptor giving the length of the structure and the location of pointers within the structure (see the movmp instruction above). orx - Logical OR

    Syntax:		orb	src1, src2, dst
    		orw	src1, src2, dst
    		orl	src1, src2, dst
    Function:	dst = src1 | src

These instructions compute the bitwise OR of the two operands addressed by src1 and src2 and store the result in the dst operand. recv - Receive from channel

    Syntax:		recv	src, dst
    Function:	dst = <-src

The recv instruction receives a value from some other thread on the channel specified by the src operand. Communication is synchronous, so the calling thread will block until a corresponding send or alt is performed on the channel. The type of the received value is determined by the channel type and the dst operand specifies where to place the received value. ret - Return from function

    Syntax:		ret
    Function:	npc = link(fp)
    		mod = mod(fp)
    		fp = frame(fp)
    		pc = npc

The ret instruction returns control to the instruction after the call of the current function. send - Send to channel

    Syntax:		send	src, dst
    Function:	dst <-= src

The send instruction sends a value from this thread to some other thread on the channel specified by the dst operand. Communication is synchronous so the calling thread will block until a corresponding recv or alt is performed on the channel. The type of the sent value is determined by the channel type and the dst operand specifies where to retrieve the sent value. shlx - Shift left arithmetic

    Syntax:		shlb	src1, src2, dst
    		shlw	src1, src2, dst
    		shll	src1, src2, dst
    Function:	dst = src2 << src1

The shl instructions shift the src2 operand left by the number of bits specified by the src1 operand and store the result in the dst operand. Shift counts less than 0 or greater than the number of bits in the object have undefined results. shrx - Shift right arithmetic

    Syntax:		shrb	src1, src2, dst
    		shrw	src1, src2, dst
    		shrl	src1, src2, dst
    Function:	dst = src2 >> src1

The shr instructions shift the src2 operand right by the number of bits specified by the src1 operand and store the result in the dst operand. Shift counts less than 0 or greater than the number of bits in the object have undefined results. slicea - Slice array

    Syntax:		slicea	src1, src2, dst
    Function:	dst = dst[src1:src2]

The slicea instruction creates a new array, which contains the elements from the index at src1 to the index src2-1. The new array is a reference array which points at the elements in the initial array. The initial array will remain allocated until both arrays are no longer referenced. slicec - Slice string

    Syntax:		slicec	src1, src2, dst
    Function:	dst = dst[src1:src2]

The slicec instruction creates a new string, which contains characters from the index at src1 to the index src2-1. Unlike slicea , the new string is a copy of the elements from the initial string. slicela - Assign to array slice

    Syntax:		slicela	  src1, src2, dst
    Function:	dst[src2:] = src1

The src1 and dst operands must be arrays of equal types. The src2 operand is a non-negative integer index. The src1 array is assigned to the array slice dst[src2:]; src2 + nelem(src1) must not exceed nelem(dst). spawn - Spawn function

    Syntax:		spawn	src, dst
    Function:	fork();
    		if(child)
    			dst(src);

The spawn instruction creates a new thread and calls the function specified by the dst operand. The argument frame passed to the thread function is specified by the src operand and should have been created by the frame instruction. subx - Subtract

    Syntax:		subb	src1, src2, dst
    		subf	src1, src2, dst
    		subw	src1, src2, dst
    		subl	src1, src2, dst
    Function:	dst = src2 - src1

The sub instructions subtract the operands addressed by src1 and src2 and stores the result in the dst operand. For subb, the result is truncated to eight bits. tail - Tail of list

    Syntax:		tail	src, dst
    Function:	dst = src->next

The tail instruction takes the list specified by the src operand and creates a reference to a new list with the head removed, which is stored in the dst operand. tcmp - Compare types

    Syntax:		tcmp	src, dst
    Function:	if(typeof(src) != typeof(dst))
    			error("typecheck");

The tcmp instruction compares the types of the two pointers supplied by the src and dst operands. The comparison will succeed if the two pointers were created from the same type descriptor or the src operand is nil; otherwise, the program will error. The dst operand must be a valid pointer. xorx - Exclusive OR

    Syntax:		xorb	src1, src2, dst
    		xorw	src1, src2, dst
    		xorl	src1, src2, dst
    Function:	dst = src1 ^ src2

These instructions compute the bitwise exclusive-OR of the two operands addressed by src1 and src2 and store the result in the dst operand.

LuaVM

" By default, Lua has a maximum stack frame size of 250. This is encoded as MAXSTACK in llimits.h. The maximum stack frame size in turn limits the maxi mum number of locals per function, which is set at 200, encoded as LUAI_MAXVARS in luaconf.h. Other limits found in the same file include the maximum number of upval ues per function (60), encoded as LUAI_MAXUPVALUES , call depths, the minimum C stack size, etc. Also, wit h an sBx field of 18 bits, jumps and control structures cannot exceed a jump distance of about 131071. "

0 MOVE Copy a value between registers 1 LOADK Load a constant into a register 2 LOADBOOL Load a boolean into a register 3 LOADNIL Load nil values into a range of registers 4 GETUPVAL Read an upvalue into a register 5 GETGLOBAL Read a global variable into a register 6 GETTABLE Read a table element into a register 7 SETGLOBAL Write a register value into a global variable 8 SETUPVAL Write a register value into an upvalue 9 SETTABLE Write a register value into a table element 10 NEWTABLE Create a new table 11 SELF Prepare an object method for calling 12 ADD Addition operator 13 SUB Subtraction operator 14 MUL Multiplication operator 15 DIV Division operator 16 MOD Modulus (remainder) operator 17 POW Exponentiation operator 18 UNM Unary minus operator 19 NOT Logical NOT operator 20 LEN Length operator 21 CONCAT Concatenate a range of registers 22 JMP Unconditional jump 23 EQ Equality test 24 LT Less than test 25 LE Less than or equal to test 26 TEST Boolean test, with conditional jump (if not (R(A) <=> C) then PC++) 27 TESTSET Boolean test, with conditional jump and assignment (if (R(B) <=> C) then R(A) := R(B) else PC++) 28 CALL Call a closure 29 TAILCALL Perform a tail call 30 RETURN Return from function call 31 FORLOOP Iterate a numeric for loop 32 FORPREP Initialization for a numeric for loop 33 TFORLOOP Iterate a generic for loop 34 SETLIST Set a range of array elements for a table 35 CLOSE Close a range of locals being used as upvalues 36 CLOSURE Create a closure of a function prototype 37 VARARG Assign vararg function arguments to registers

this table is taken almost verbatim from http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf

summary:

move
loads (constant, bool, nil)
get/set (upval, global, table)
misc: newtable, self, close, closure, vararg
arithmetic: add, sub, mul, div, mod, pow, unm (unary minus)
logic: not, test, testset (both test and testset can be used to implement both 'and' and 'or', see http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf)
arrays: len, concat, setlist
jumps: jmp, call, tailcall, return (also some other conditionals)
conditionals: eq, lt, le (also test and testset)
loops: forloop, forprep, tforloop

Links:

asm.js

Links:

http://asmjs.org/spec/latest/

Guile's VM

Links:

http://www.gnu.org/software/guile/manual/html_node/A-Virtual-Machine-for-Guile.html

C--

Dalvik VM

todo http://www.netmite.com/android/mydroid/dalvik/docs/dalvik-bytecode.html

(todo is this wrong?) interestingly, Dalvik only has 16 registers.

however:

" The Dalvik instruction set implements an interesting compromise: it is register based, but there are a finite number of them as opposed to the theoretically infinite registers of LLVM or Parrot. Dalvik supports 65,536 registers, a vast number compared to hardware CPUs and presumably sufficient to implement SSA (if desired) in reasonably large functions.

Even more interestingly, not all Dalvik instructions can access all registers. Many Dalvik instructions dedicate 4 bits to the register number, requiring their operands to be stored in the first 16 registers. A few more instructions have an 8 bit instruction number, to access the first 256. There are also instructions to copy the value to or from any of the 65,536 registers to a low register, for a subsequent instruction to access.

It took a while to understand the rationale for this choice, and I'm still not confident I fully get it. Clearly the Dalvik designers believe that keeping data in one of the high registers will be faster than explicitly storing it to memory, even if the vast number of registers end up mostly residing in RAM. Addressing data as register numbers instead of memory addresses should make it easier for the VM to dynamically remap Dalvik registers to the real hardware registers. For example, if it can predict that virtual register 257 will likely be used in the near future it can be kept in a CPU register instead of being immediately stored to memory. " -- http://codingrelic.geekhold.com/2010/07/virtual-instruction-sets-opcode.html

Non-deprecated, non-implementation-specific instructions (many instructions have _2ADDR forms, and many also have _LIT8 and _LIT16 forms, and many have _WIDE forms, and many have _FLOAT, _INT, and _LONG forms, and many have _BOOLEAN, _BYTE, _CHAR, _OBJECT, _SHORT, and _WIDE forms, and many have _16 and _32 forms, and many of these are mixed, e.g. ADD_INT_2ADDR. I'll only list the base instruction in these cases)

ADD AGET AND APUT ARRAY_LENGTH CHECK_CAST CMPG CMPL CMP CONST CONST_CLASS CONST_HIGH16 CONST_STRING CONST_STRING_JUMBO DIV DOUBLE_TO_FLOAT DOUBLE_TO_INT DOUBLE_TO_LONG FILLED_NEW_ARRAY FILLED_NEW_ARRAY_RANGE FILL_ARRAY_DATA FLOAT_TO_DOUBLE FLOAT_TO_INT FLOAT_TO_LONG GOTO IF_EQ IF_EQZ IF_GE IF_GEZ IF_GT IF_GTZ IF_LE IF_LEZ IF_LT IF_LTZ IF_NE IF_NEZ IGET INSTANCE_OF INT_TO_BYTE INT_TO_CHAR INT_TO_DOUBLE INT_TO_FLOAT INT_TO_LONG INT_TO_SHORT INVOKE_DIRECT INVOKE_DIRECT_RANGE INVOKE_INTERFACE INVOKE_INTERFACE_RANGE INVOKE_STATIC INVOKE_STATIC_RANGE INVOKE_SUPER INVOKE_SUPER_RANGE INVOKE_VIRTUAL INVOKE_VIRTUAL_RANGE IPUT LONG_TO_DOUBLE LONG_TO_FLOAT LONG_TO_INT MONITOR_ENTER MONITOR_EXIT MOVE MOVE_16 MOVE_EXCEPTION MOVE_FROM16 MOVE_OBJECT MOVE_OBJECT_16 MOVE_OBJECT_FROM16 MOVE_RESULT MOVE_RESULT_OBJECT MOVE_RESULT_WIDE MOVE_WIDE MOVE_WIDE_16 MOVE_WIDE_FROM16 MUL NEG NEW_ARRAY NEW_INSTANCE NOP NOT OR PACKED_SWITCH REM RETURN RETURN_OBJECT RETURN_VOID RETURN_WIDE RSUB SGET SHL SHL SHR SPARSE_SWITCH SPUT SPUT SUB SUB THROW USHR XOR

Links:

NekoVM

BEAM VM

The Erlang virtual machine. Elixir also targets it.

Links:

Nock

Goals: a functional-style virtual machine, with a very simple specification, yet also practically executable

Pros:

Extremely short specification
No empty list makes definitions of some simple functions simpler (but 'tail' still has to return a sentinel if you pass it a one-item list)
No empty list means [x] = x (as with ordinary parentheses)
Functional
Mind-expanding in that it rather different both from imperative virtual machines and from other extant functional languages

Cons:

No opcode for subtraction means that subtraction takes linear time in the value of the argument! The author suggests that Nock should come with a standard library with functions such as subtraction, and that compilers should detect subtraction routines and replace them with platform primitives. Another solution would be to use the 'hint' facility to annotate the code where operations like subtraction occur, according to another specification, so that a compiler/interpreter doesn't have to detect them. Another solution might be to eschew the use of the provided number data type for representation of numbers in programs running on Nock, and instead re-represent numbers with a higher-order positional notation encoding built out of trees. This would allow sublinear-time computation of subtraction, but would be two levels removed from platform arithmetic primitives (e.g. the higher-order number representation layer, and the Nock layer).
In general, because the language is so small, there are probably other common constructs that will suffer from the problem with subtraction, e.g. slow execution due to inability to use platform primitives. In order to practically run the language, a standard library or set of 'hints' will probably have to be added, at which point effectively we are working in another, larger language.
No empty list means that a wrapper type must be used when a program wants to express a list than might be empty
Implicit cons, while cute, makes code harder to understand and probably makes implementations more complex

People: C. Guy Yarvin

Primitive data types: non-negative integers, binary trees with non-negative integer leaves

Data literal constructors and primitive operations

(NonNegativeIntegers?, BinaryTrees?, successor, isLeaf, isStructurallyEqual, select)

where successor(a number) = that number + 1, isLeaf(a number) = 1, isLeaf(interior node of a tree) = 0, and select takes two arguments, a list and a number, and the number it treated as an index into the (binary) tree in the manner explained in http://www.urbit.org/2013/08/22/Chapter-2-nock.html ; the subtree (or leaf node, which would be a natural number) at that index is returned.

Data literal idiosyncracies

There is no empty list. The results of isTree use the convention 1 is False, 0 is True.

Primitive instruction set

select, identity, nock, isLeaf, successor, isStructurallyEqual

The 'nock' primitive instruction requires further explanation. See 'The nock primitive instruction', below.

Each primitive instruction is represented by a numerical opcode, from 0 to 5.

Implicit primitive instruction cons

There is no opcode for a 'cons' operator or for any other way to construct a binary tree. However, Nock programs can indeed cons; the cons operator, instead of being represented explicitly by an opcode, is represented implicitly in an idiosyncratic/clever way in the representation of a Nock program.

A Nock program is represented by a binary tree with non-negative integer leaves. Most Nock instructions are of the form [a OPCODE c], where 'a' and 'c' are arbitary binary trees which are interpreted as arguments to the instruction, and OPCODE is a number that specifies the instruction.

To write a program instruction that will execute 'cons(a, b)', you put a tree in the middle OPCODE position, instead of a number. An instruction of the form [a [b c] [d e]] produces the following binary tree as a result: [eval([a b c]) eval([a d e])].

Macro instructions

if, compose, push-and-compose, call-on-result-of, hint

'if' takes three arguments (in addition to a state argument), the first of which must evaluate to 0 or 1; the result is either the second or the third argument depending on what the first argument evaluated to.

'compose' takes two arguments (in addition to a state argument), and applies the first argument to the state, and then applies the second argument to the result.

'push-and-compose' takes two arguments (in addition to a state argument), and applies the first argument to the state, then pushes the result onto the state to produce a new state, and then applies the second argument to the new state.

'call-on-result-of' takes two arguments (in addition to a state argument), and applies the first argument to the state to produce a temporary data structure called a 'core', then selects into this 'core' using the second argument as an index, and evaluates the result.

'hint' takes two arguments (in addition to a state argument), and simply throws the first argument away and evaluates the second argument applied to the state. If the first argument is not a number but a tree, then some computation is supposed to be done involving the first argument, after which the result of this computation is ignored: the head of the tree is discarded and the tail of the tree is applied to the state and then discarded. The idea of 'hint' is to provide hints to the compiler/interpreter but without affecting the actual result of the computation. When the first argument is not a number but a tree, we have a 'computed hint' which is not hardcoded but which is itself the result of some computation.

Each macro instruction is represented by a numerical opcode, from 6 to 10.

Syntax

Nock programs are expression encoded as abstract syntax trees.

A syntactically valid Nock node is a tree of the form [subject verb object], where [verb object] satisfies the 'formula syntactic validity condition' below. Each of subject, verb, object may be numbers or trees. Note that, because we are considering these trees to be binary, [subject verb object] is a shorthand for [subject [verb object]]. A pair [verb object] is called a 'formula'.

formula syntactic validity condition: One of the following (a) or (b) is true: (a) verb is a valid Nock opcode, and object is a binary tree which, considered as a linked list, is at least long enough to provide enough arguments to the instruction represented by the opcode in verb; or (b) 'verb' is of the form '[verb2 object2] [verb3 object3]', and this syntactic validity condition (recursively) holds for each of those two formulas.

Presumably in order to achieve extreme conciseness, the Nock specification doesn't actually contain the concept of syntactically valid/invalid Nock nodes; any node is taken as legitimate source code but 'invalid' trees merely cause the execution of a trivial infinite loop.

Note also that since Nock nodes may be dynamically constructed and evaluated during execution, it is not in general possible to determine at compile time whether a given Nock expression will give rise to and try to execute a syntactically invalid Nock expression as it is running, although i imagine that most programmers will constrain themselves to only use a restricted set of Nock idioms which can be proven not to have this problem.

Evaluation strategy

Nock is a functional language based on expression evaluation.

To evaluate a syntactically valid Nock node of the form [subject verb object]:

If verb is an opcode, then one follows the reduction rule specified by its opcode. Note that many of the reduction rules are recursive, that is, in order to compute the reduction you must first compute the reduction of another Nock expression.

If our node is of the form [subject [verb2 object2] [verb3 object3]], then the result is [eval([subject verb2 object2]) eval([subject verb3 object3])].

According to the specification, the result of evaluating a syntactically invalid Nock node is to get the same expression back; so one might say that the Nock specification does not distinguish between invalid syntax and programs which trivially loop; however in practise a compiler or interpreter would easily be able to distinguish what i am here calling syntactically valid Nock nodes.

As mentioned above, because the reduction rules are recursive, the reduction of a Nock node proceeds by expanding a Nock node into new Nock nodes that must be evaluated in order to compute the final result. In other words, an interpreter would be computing new Nock expressions on the fly and then evaluating them. It is possibly for these new Nock expressions to be syntactically invalid.

The subject

The 'subject' in Nock represents state. As a functional language, there is no state in Nock except what is passed explicitly, e.g. in the value of the subject. All Nock primitive instructions that require recursive evaluation (that is, all of them except for 'select' and 'identity') pass in the same state that they were given to the recursive evaluation.

However, it is possible for a Nock expression to cause the evaluation of another Nock evaluation in the context of a modified state. A Nock expression explicitly represents the state in the subject, so to form one expression that invokes a subexpression with modified state, all you have to do is to place something other than the subject that you were given into the subject position in the subexpression. It is convenient to speak of a formula being applied to the subject, or equivalently to the state. This facility is used in the macro instructions. For example, we noted above that 'compose' takes two arguments (in addition to a state argument), and applies the first argument to the state, and then applies the second argument to the result. So, in the subexpression that calls for the evaluation of the second argument, the subject position is filled by the result of evaluating the application of the first argument to the state.

Nock does not define any facilities for I/O; it is a pure language.

The nock primitive instruction

Definition

The 'nock primitive instruction', whose opcode is 2, is found in nodes of the form:

[subject 2 [verb1 object1] [verb2 object2]]

It is evaluated to:

eval([eval(subject verb1 object1) eval(subject verb2 object2)])

The subexpression eval(subject verb2 object2) should evaluate to a syntactically valid formula [verb3 object3], so that the result of the inner evaluations is a syntactically valid Nock formula, that is, recalling that [subject3 [verb3 object3]] can be written in the shorthand [subject3 verb3 object3],

eval(subject verb1 object1) = subject3 eval(subject verb2 object2) = [verb2 object3] [eval(subject verb1 object1) eval(subject verb2 object2)] = [subject3 verb3 object3]

Contrast with other the other instructions

Primitive instructions 'select' and 'identity' do not require recursive evaluation. Primitive instructions 'isTree' and 'successor' and 'isStructurallyEqual' cause recursive evaluation of their single argument (applied to the subject but untransformed) before they apply their characteristic operation. The implicit primitive operation cons causes recursive evaluation of both of its arguments (applied to the subject but untransformed).

All of those operations share the following form: possibly, they first fully reduce (recursively evaluate) each of evaluate their arguments separately, and then they do something to the result that can be computed without further recursive Nock evaluation, something that can be accomplished in a number of steps that scales at most linearly with the size of the fully reduced arguments. Clearly, no such set of such instructions could be Turing-equivalent, because a reduction of an expression that only involved them could always be completed in time linear in the size of the expression.

All of those instructions also share the property that one can could check their arguments at 'compile time' to make sure they are syntactically valid formulas.

The nock primitive instruction is different. Reduction of a Nock instruction produces a top-level Nock node which still has to be evaluated (the outer 'eval' in 'eval([eval(subject verb1 object1) eval(subject verb2 object2)])'), even after the evaluation of its arguments (the inner 'eval's in that same expression). This is in contrast to every other primitive instruction.

Each of the Nock macro instructions contains the nock primitive instruction, directly or indirectly, and so they may also cause recursive evaluation.

Use in loops

Note first that for any subject, the Nock expression '[subject select 1]' evaluates to 'subject'; we write eval([subject select 1]) = subject.

This implies that any Nock expression of the form:

[subject 2 [0 1] x] or, written with names instead of opcodes: [subject nock [select 1] x]

will reduce as follows:

eval(eval([subject select 1]) eval([subject x]))

eval([subject eval([subject x])])

That is, it causes the evaluation of a Nock expression with the same subject as originally, and with a formula that is given by the result of evaluating an operation on that subject. If we place program code (Nock expressions) within the subject, we can write x so that it dynamically selects this code; so the innermost eval will select some code out of the subject, and then the outermost eval will execute it. However, the code from the subject still has access to the subject, and so if it chooses, it can select its own code from the subject again, and execute that. For this sort of thing to work, we must not only evaluate an argument which has already been evaluated (like eval), but we must have a way to pass in the same subject both times.

For instance, the following Nock expression causes an infinite loop when evaluated: [[[a 0 1] [0 1]] 2 [a 0 1] [0 1]]. A nicer way of writing that, using names instead of opcodes, might be:

[[nock [select 1] [select 1]] nock [select 1] [select 1]]

This is of the form:

subject: [nock [select 1] [select 1]] verb: nock argument1: [select 1] argument2: [select 1]

Note that the subject is identical to the formula (recall that in Nock, 'formula' means [verb object] within the Nock expression form [subject [verb object]], and that [subject verb argument1 argument2] is shorthand for [subject [verb [argument1 argument2]]], so [verb argument1 argument2] = [verb [argument1 argument2]] is of the form [verb object]).

What happens when we evaluate this expression? The verb is not an opcode but a tree, so the top level operation is actually an implicit cons. Reducing this, we get:

[subject nock [select 1] [select 1]]

[eval([subject select 1]) eval([subject select 1])]

[subject subject]

[[nock [select 1] [select 1]] [nock [select 1] [select 1]]]

[[nock [select 1] [select 1]] nock [select 1] [select 1]]

which is identical to what we started with.

Contrast with 'eval' primitive in other languages

The first thing i thought of when i say the nock primitive instruction was other languages' 'eval', but whereas 'eval' takes one argument, nock takes two arguments, both of which are recursively evaluated, and also a 'subject', which is duplicated and used in the evaluation of both arguments.

This is a critical difference, because it allows the nock instruction (in conjuction with 'select') to provide the power needed for looping. The duplication of the subject is needed so that program code from the subject can be copied and consumed in each loop iteration yet still available for the next iteration.

Why does nock need to take two arguments and evaluate both of them, rather than just taking one argument and resulting in eval([subject eval(argument1)]? If we wanted a loop that progressed, rather than just infinitely looping as in the example above, we would need to update some sort of state that persists between loop iterations. But we'd still want to execute the same loop code in each iteration. The subject serves as this persistent state, and formula selects the code to be executed in the next iteration. So the evaluation of the first argument is what updates the state, and the evaluation of the second argument is like a continuation which tells us what to do in the next iteration.

Contrast with 'apply' primitive in other languages

In lambda calculus, for example, we have an infinite loop of the form:

(lambda x . x x) (lambda x . x x)

The implicit primitive operation here is 'apply'. We could write this:

apply((lambda x . x x), (lambda x . x x))

Salient differences between the nock primitive and apply are:

apply involves variables, nock does not
apply requires the interpreter to either (a) substitute the bound variables immediately, which would require scanning over the whole subexpression at each reduction step, which would be inefficient, or (b) maintain an environment that maps bound variables to their values. In Nock, the subject takes the place of a lexical environment, and the program, not the interpreter, is responsible for maintaining whatever lookup tables it needs in there
the places where the bound values are to be substituted in depend on the exact function being applied; at a given reduction step, the interpreter may have to insert them in different places depending on the program code. In Nock and other combinatory systems, where values are substituted depends only on the reduction rules associated with each instruction (although this only holds if you look at the reductions one step at a time; if the interpreter memoizes chains of reduction steps, it will have to learn new substitution patterns).

Comparison with 'S' primitive in SKI combinatory calculus

The nock primitive operation corresponds exactly to the S combinator in combinatory calculus. The S combinator is defined:

Sxyz = xz(yz)

The 'z' corresponds to the subject in the nock primitive; the 'z' is copied and both arguments are applied to it, and then the first result is applied to the second result. The inner eval in the nock primitive corresponds to the xz and yz applications, and the outer eval corresponds to the application of xz to yz.

Interestingly, the K combinator (which takes an argument and then ignores a second argument, yielding the first argument: Kxy = x), also corresponds to the 'identity' primitive operation in Nock, which takes the argument and then ignores the subject. The S and K combinators form a Turing-universal basis. This provides another way to look at the Nock primitive operations; the nock and identity primitives already form a Turing-universal basis, and then the select, successor, isLeaf, and isStructurallyEqual primitives, and the cons implicit primitive, add the minimal operations necessary for operating with non-negative integers and binary trees.

So Nock can be thought of as the SKI combinatory calculus endowed with more useful primitive data types (so that you don't have to use unnatural encodings, e.g. Church numerals, for data), as well as with a notation that, by explicitly and prominently placing the 'subject' in every expression, is a little clearer.

Alternate Nock notation

We can rewrite Nock expressions into more a familiar notation by:

writing the verb first instead of in the middle
putting the subject at the end instead of the beginning
separating the subject from the arguments with a vertical bar symbol ('

using mnemonics in place of numerical opcodes: SEL, ID, NOCK, CELL, SUCC, EQL, IF, COMP, PUSH, CALL, HINT
placing a newline whenever there is a

, and indenting

Now the reduction of the infinite loop above becomes:

eval[NOK [SEL 1] [SEL 1]

NOK [SEL 1] [SEL 1]]

eval[eval[[SEL 1]

NOK [SEL 1] [SEL 1]]

eval[[[SEL 1]

NOK [SEL 1] [SEL 1]]]]

eval[NOK [SEL 1] [SEL 1]

NOK [SEL 1] [SEL 1]]

or, if we define 'subject' to be NOK [SEL 1] [SEL 1] and don't always have newlines with all '

's:

eval[NOK [SEL 1] [SEL 1]

subject]

eval[eval[[SEL 1]

subject]
eval[[[SEL 1]	subject]]]

eval[NOK [SEL 1] [SEL 1]

subject]

Runtimes

Apache Portable Runtime

"OS threads, blocking IO, and other system things"

libuv

asynchronous I/O and some other stuff.

Used by nodejs, Rust language, Luvit, Julia, pyuv, MoarVM?.

libumem

Used by nodejs

helps with debugging memory leaks (see http://www.joyent.com/blog/walmart-node-js-memory-leak for an example)

Other useful runtime libraries

https://en.wikipedia.org/wiki/GNU_C_Library
- also Bionic, a simplified libc variant for Android
libuv for async I/O
libtommath for multiple precision arithmetic
uthash for hashes
libjit (see also http://codingrelic.geekhold.com/2010/07/exploring-libjit.html )

libatomic_ops

"Provides implementations for atomic memory update operations on a number of architectures"

"Note that experience with this package contributed to the design of C11 and C++11 atomics, which represent a careful refinement of the API. If your platform supports well-implemented C11 or C++11 atomics, please use those instead. "

documentation: http://en.cppreference.com/w/c/atomic

C11 atomics:

atomic_flag_test_and_set: sets an atomic_flag to true and returns the old value
atomic_flag_clear: sets an atomic_flag to false
atomic_store: stores a value in an atomic object
atomic_load: reads a value from an atomic object
atomic_exchange: swaps a value with the value of an atomic object
atomic_compare_exchange_strong: swaps a value with the an atomic object if the old value is what is expected, otherwise reads the old value
atomic_compare_exchange_weak: swaps a value with the an atomic object if the old value is what is expected, otherwise reads the old value (is like strong but allowed to fail spuriously)
atomic_fetch_add
atomic_fetch_sub
atomic_fetch_or
atomic_fetch_xor
atomic_fetch_and

All of the above atomics take an optional parameter, memory_order, which can be one of relaxed memory_order_relaxed, memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel, memory_order_seq_cst. The default is memory_order_seq_cst (the strongest sequencing condition). Here's what these do: http://en.cppreference.com/w/c/atomic/memory_order

atomic_init: initializes an existing atomic object
atomic_is_lock_free: indicates whether the atomic object is lock-free
atomic_thread_fence: generic memory order-dependent fence synchronization primitive
atomic_signal_fence: fence between a thread and a signal handler executed in the same thread
atomic_flag (type; must be initialized with ATOMIC_FLAG_INIT). atomic_flag is lock-free. The other atomics may or may not be implemented with locks.
ATOMIC_VAR_INIT (note: this initialization operation is NOT atomic)

"Members of an atomic struct/union may not be accessed individually. The whole struct must first be copied to a non-atomic variable of compatible type."

"The ++,--, and compound assignment operators (e.g. +=) are atomic read-modify-write operations."

-- http://fileadmin.cs.lth.se/cs/Education/EDAN25/F06.pdf

Minimal Forths

http://stackoverflow.com/questions/407987/what-are-the-primitive-forth-operators?rq=1

http://newsgroups.derkeiler.com/Archive/Comp/comp.lang.forth/2005-09/msg00337.html

http://code.google.com/p/subtle-stack/downloads/detail?name=jonesforth.s.txt

http://groups.google.com/group/comp.lang.forth/msg/10872cb68edcb526

" A long time ago, I had a book called "Threaded Interpretive Languages", published I think by Byte, that discussed how to implement a Forth-like language (I don't think they ever called it Forth) in Z80 assembly.

You may not have a Z80 handy, or want one, but the book might be instructive.

+1 for Loeliger's book. The book NEVER said the language was FORTH. Between that book and the Ritter & Walker paper in Byte around the same time, I learned enough to build a reentrant direct-threaded code interpreter for a test set project a few years later at General Dynamics. – John R. Strohm Dec 13 '09 at 4:41

+1 for that book which I love, it was the book which hooked me on the idea of inventing and studying languages – Andy Dent Apr 18 '10 at 10:16

I picked up that book used based on this recommendation. It's great! – Barry Brown

C IL interpreters

http://stackoverflow.com/questions/6839737/are-there-any-c-to-bytecode-compilers

Ruva VM (Ruby VM)

http://ruva.rubyforge.org/classes/Ruva/VM/Opcodes.html

Neohapsis Toy VM 101

http://recon.cx/2008/a/craig_smith/Neohapsis-VM-101.pdf

Four General Purpose Registers: r1, r2, r3, r4

Instruction Registers: IP, baseip

Stack Registers: SP, basesp

Flag Registers: flags

Our Virtual CPU Instruction Set:

MOV r32, r32 MOV [r1], r32 MOV r1, [r1] MOV r32, value CMP r32, value INC/DEC r32 AND/OR r1, value XOR r32,r32 PUSH/POP r32 JMP (Relative / Direct) JE, JL, JG CALL (r1 / value) EXITVM

UCSD Pascal p-code

https://en.wikipedia.org/wiki/UCSD_Pascal

https://en.wikipedia.org/wiki/P-code_machine

Cint, a C/C++ interpreter

VM instructions (from cint-5.18.00/cint/src/bc_inst.h ):

  void LD(G__value* pval);
  void LD(int a);
  void CL(void);
  void OP2(int opr);
  int CNDJMP(int addr=0);
  int JMP(int addr=0);
  void POP(void);
  void LD_FUNC(const char* fname,int hash,int paran,void* pfunc,
               struct G__ifunc_table_internal* ifunc, int ifn);
  void LD_FUNC_BC(struct G__ifunc_table* ifunc,int ifn,int paran,void *pfunc);
  void LD_FUNC_VIRTUAL(struct G__ifunc_table* ifunc,int ifn,int paran,void *pfunc);
  void RETURN(void);
  void CAST(int type,int tagnum,int typenum,int reftype);
  void CAST(G__TypeInfo& x);
  void OP1(int opr);
  void LETVVAL(void);
  void ADDSTROS(int os);
  void LETPVAL(void);
  void TOPNTR(void);
  void NOT(void);
  void BOOL(void);
  int ISDEFAULTPARA(int addr=0);
  void LD_VAR(struct G__var_array* var,int ig15,int paran,int var_type);
  void ST_VAR(struct G__var_array* var,int ig15,int paran,int var_type);
  void LD_MSTR(struct G__var_array* var,int ig15,int paran,int var_type);
  void ST_MSTR(struct G__var_array* var,int ig15,int paran,int var_type);
  void LD_LVAR(struct G__var_array* var,int ig15,int paran,int var_type);
  void ST_LVAR(struct G__var_array* var,int ig15,int paran,int var_type);
  void CMP2(int operator2);
  void PUSHSTROS(void);
  void SETSTROS(void);
  void POPSTROS(void);
  void SETTEMP(void);
  void FREETEMP(void);
  void GETRSVD(const char* item);
  void REWINDSTACK(int rewind);
  int CND1JMP(int addr=0);
 private:
  void LD_IFUNC(struct G__ifunc_table* p_ifunc,int ifn,int hash,int paran,int funcmatch,int memfunc_flag);
 public:
  void NEWALLOC(int size,int isclass_array);
  void SET_NEWALLOC(int tagnum,int var_type);
  void SET_NEWALLOC(const G__TypeInfo& type);
  void DELETEFREE(int isarray);
  void SWAP();
  void BASECONV(int formal_tagnum,int baseoffset);
  void STORETEMP(void);
  void ALLOCTEMP(int tagnum);
  void POPTEMP(int tagnum);
  void REORDER(int paran,int ig25);
  void LD_THIS(int var_type);
  void RTN_FUNC(int isreturn);
  void SETMEMFUNCENV(void);
  void RECMEMFUNCENV(void);
  void ADDALLOCTABLE(void);
  void DELALLOCTABLE(void);
  void BASEDESTRUCT(int tagnum,int isarray);
  void REDECL(struct G__var_array* var,int ig15);
  void TOVALUE(G__value* pbuf);
  void INIT_REF(struct G__var_array* var,int ig15,int paran,int var_type);
  void PUSHCPY(void);
  void LETNEWVAL(void);
  void SETGVP(int pushpop);
  void TOPVALUE(void);
  void CTOR_SETGVP(struct G__var_array* var,int ig15,int mode); 
  int TRY(int first_catchblock=0,int endof_catchblock=0);
  void TYPEMATCH(G__value* pbuf);
  void ALLOCEXCEPTION(int tagnum);
  void DESTROYEXCEPTION(void);
  void THROW(void);
  void CATCH(void);
  void SETARYINDEX(int newauto);
  void RESETARYINDEX(int newauto);
  void GETARYINDEX(void);

  void PAUSE();

  void NOP(void);

  // new instructions
  void ENTERSCOPE(void);
  void EXITSCOPE(void);
  void PUTAUTOOBJ(struct G__var_array* var,int ig15);
  void CASE(void* x);
  /* void SETARYCTOR(int num); */
  void MEMCPY();
  void MEMSETINT(int mode,map<long,long>& x);
  int JMPIFVIRTUALOBJ(int offset,int addr=0);
  void VIRTUALADDSTROS(int tagnum,struct G__inheritance* baseclass,int basen);
  void cancel_VIRTUALADDSTROS();

Mesa and Cedar

Mesa is interesting because a paper was published claiming that its bytecode had a very high code density (roughly 4 bytes per HLL statement; http://en.wikipedia.org/wiki/Mesa_%28programming_language%29#cite_ref-3 ).

Links:

Chapter: encoding

Instruction encoding

Fixed or variable length

The various syntactic constructs of a target language program can be of fixed or variable length.

An advantage of fixed-length encodings is that the instructions can be decoded in parallel.

Variable length encodings

For constructs with variable-length encodings, a way of finding the end of each construct must be provided. The possibilities and tradeoffs are similar to string length/termination schemes, and particularly to the tradeoffs between C-style strings and Pascal-style strings (see , below), except:

the language can have a hierarchical/recursive syntactic structure, meaning that it may be possible for variable-length length syntactic constructs to be composed of smaller fixed-length parts, or vice versa.
- this leads to interesting implications on the other advantages and disadvantages; for example, if one had terminator sentinel values to mark the end of the higher-level, containing syntactic contstruct, but not for the smaller constructs contained within it, this would only inhibit parallel decoding within each higher-level construct, not between them; and the chain of serial dependencies would likely be short, allowing semi-parallel decoding even within it
in many cases the length of variable length syntactic constructs can be inferred without spending space on a length header (e.g. in most variable-length assembly languages, the length of an instruction is inferred from its opcode)
since we are talking about program and not data, the language designer can specify various sentinel values and make sure no other instructions are equal to or possible even that no other instructions contain those values. Compare to the case with strings, where C-terminated strings cannot store binary data without escaping, which is a real handicap; by contrast, if an assembly language designer wanted to reserve opcode 0 for some sort of sentinel or terminator, it would be easy for them to simply not use opcode 0 for any other instruction.

Data encoding

String length/termination schemes

The two most popular schemes are 'Pascal-style' strings (each string has a fixed-length header (two bytes is a popular choice) that specifies the string length) and 'C-style strings' (each string is termination by a zero byte, or 'NULL').

Pascal-style strings have the disadvantages that:

they have a maximum length
if you are just looking at the bitstream without knowing which bytes are the string header, you don't know where the string ends
- this means that if you have a one-off error and lose your position in the bitstream, you will not be able to recover
- this also means that you cannot have a bunch of decoders looking at different parts of the encoded data and parse it in parallel; there are serial dependencies between each data item and the previous one, because you must know the length of the previous one before you know where the next one starts
if the header length is more than one byte, then it will waste space for strings of length smaller than 256

C-style strings have the disadvantages that:

there is no constant-time way to determine the string-length
- this makes some string operations that are O(n) less efficient, because sometimes there is a way to do the inner loop more efficiently but only if you know the length ahead of time, and for an O(n) operation it is often not worth it to walk thru the entire string before starting the real work
the string cannot store a NULL (without escaping); this means that the language must treat binary data separately from strings
they are prone to programming mistakes that can lead to security bugs, involving inserting NULLs into strings or buffer overflows
technically speaking, UTF-8 can contain NULL bytes and so cannot be stored unescaped as a C-style string. For this reason, many languages use Modified UTF-8.

Pascal-style strings can be extended by various encoding schemes that allow arbitrary lengths to be encoded into a variable-length header. For example:

the header itself can be null-terminated
the header could use https://en.wikipedia.org/wiki/Elias_gamma_coding , that is, the header can consist of two parts, a unary count of digits, and a binary representation of length whose number of digits is specified in the first part
the header could use some other prefix-free code such as the ones in https://en.wikipedia.org/wiki/Prefix-free_code#Techniques_2

notes-computer-programming-programmingLanguagesBook-programmingLanguagesPartTargetLanguages

Intermediate languages

Chapter : a tour of some language implementations

Go

Haskell: GHC

Python: CPython

Python: PyPy

Perl6: Rakudo

Smalltalk: Squeak

Erlang

Comparisons and observations

Chapter : a tour of some targets, IRs, VMs and runtimes

JVM

on Android

CLR

Instruction encoding

Links

LLVM

LLVM ISA

Instructions

Intrinsics

LLVM Links

CPython bytecode

Parrot

MoarVM

GHC Core

GHC STG

GHC Cmm (C--)

Smalltalk

Dis

ISA

Links

LuaVM

asm.js

Guile's VM

C--

Dalvik VM

NekoVM

BEAM VM

Nock

Data literal constructors and primitive operations

Data literal idiosyncracies

Primitive instruction set

Implicit primitive instruction cons

Macro instructions

Syntax

Evaluation strategy

The subject

The nock primitive instruction

Definition

Contrast with other the other instructions

Use in loops

eval([subject eval([subject x])])

[eval([subject select 1]) eval([subject select 1])]

[subject subject]

[[nock [select 1] [select 1]] [nock [select 1] [select 1]]]

[[nock [select 1] [select 1]] nock [select 1] [select 1]]

Contrast with 'eval' primitive in other languages

Contrast with 'apply' primitive in other languages

(lambda x . x x) (lambda x . x x)

Comparison with 'S' primitive in SKI combinatory calculus

Alternate Nock notation

Links

Runtimes

Apache Portable Runtime

libuv

libumem

Other useful runtime libraries

libatomic_ops

Links

Minimal Forths

C IL interpreters

Ruva VM (Ruby VM)

Neohapsis Toy VM 101

UCSD Pascal p-code

Cint, a C/C++ interpreter

Mesa and Cedar

Links

Chapter: encoding

Instruction encoding