Bayle Shanks's website: proj-jasper-jasperAssemblyNotes3

i've been reading about Lua's VM from

http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf

which is an excellent, easy-to-read document. I'm interested in the VM because Lua seems somewhat small, and the VM seems somewhat small, and i am looking at it as a basis set of operations for Lua (a core language). Seen for this purpose, there are some unnecessarily complex parts of the VM, doubtless for efficiency, such as:

the LOADBOOL instruction also has a built-in optional SKIP
some instructions (those with an RK in http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf , including GETTABLE, SETTABLE, ADD/SUB/MUL/DIV/MOD/POW) have a built-in 'constant addressing mode' (the high order bit of the 8-bit register number operand says whether this is to be interpreter as direct addressing (a register) or constant addressing (a constant)), but not all possible instructions (e.g. MOVE is just direct addressing in both operands). Note that the constants in these instructions are indices into the constant table, not immediate constants.

so i guess an assembly for Jasper would want to be less complex. Other thoughts:

at first glance, the proliferation of LOADK vs LOADNIL vs LOADBOOL seems wasteful, but upon second glance, it provides useful type information to the reader or compiler/interpreter
GETGLOBAL and SETGLOBAL don't refer to the global ID directly, but indirectly via the constant table
Lua's 'constant addressing mode' may be a better idea than my 'immediate addressing mode + variable-length constant suffixes'. Even if you wanted variable-length constants, at least with Lua's way these would be kept in the constant tables, rather than interspersed with code. My concerns about locality of reference and caching would probably be satisfied with Lua, since its 'code segments' are only the length of one function (identifying code segments with functions also makes things slightly less complicated). Note that I'd still like an 'immediate addressing mode' for small values. If we went this way, maybe we could also dispense with variable length prefixes and fix a set of prefix flags?
(begin part already copied to plPartImplementation). it's interesting to me how some languages go from a source language to a assembly-language-like IR or VM (eg Python, Lua, Java, Smalltalk?, Clang/LLVM), and other languages go from a source language to a non-assembly-like 'core language' and from there to either a C-like language or a VM (eg Haskell->GHC Core -> STG -> C--, Hoon -> Hoon 'non-synthetic glyphs' -> Nock, Shen -> KLambda). Note that (a) most languages' assembly-like language or VM or C-like language is Turing/von Neumann/imperative/linear, but Nock is not; (b) probably many languages that i am saying go straight to an assembly-like language have some sort of desugaring step, but to me desugaring a few constructs in a complex language is distinct from defining a small core language, although i agree that the distinction there is subjective; (c) there are also some language implementations that pass thru a 'generic AST' stage, such as gcc's Gimple, which i don't know much about but which i guess may be like an assembly-like IR, except in tree form instead of linear form (gcc goes: source language -> gcc GENERIC -> gcc GIMPLE -> gcc RTL; RTL is an assembly-like language; GENERIC is a generic AST language; GIMPLE is a 3-operand variant of GENERIC, i think); i see this more an a tree-structured form of assembly than as a core language.
- i'm guessing that the proliferation of steps in GHC is because they ultimately convert from one computational paradigm (functional language/'graph machine'/lambda-calculusy with lots of partial evaluation, laziness, closures) to another (turing machine/imperative/von Neumann); this probably happens in the STG -> C-- step.
- (end part already copied to plPartImplementation).
- i guess i like the architecture with the core language as an extra step. This allows you to conceptually highlight which high-level operations are important in your language, as well as to focus implementation effort around these without worrying about VM impedence mismatch until the next level down. For example, in Haskell we see that things like partially applied functions, closures (let-bound stuff), variables applied to arguments, and generic thunks are all pretty 'core'.
  - however one disadvantage here is that you lose the chance for optimized compilation of some larger structures, eg parts of the language that are broken into many small parts in the core language. One way to retain this is to have a large core, comparable to Guile's VM with about 180 instructions, but that's too much complexity for me. Another way is to allow lower level compilation stages to look back at higher-level representations, such as the original source code; http://www.aosabook.org/en/llvm.html argues/implies that doing that is part of what makes gcc dirty, but i might argue it's a good thing, as long as it's only permitted for optimization, not actual semantics, so that if the higher-level is unavailable, the lower level/later compilation stages can still do their job.

---

so, addressing modes:

immediate
constant table (constant table immediate)
direct
indirect

(note: in this scheme there is no stack addressing mode!)

an alternative:

immediate / constant table: high-order bit indicates whether this is an immediate constant or a constant-table constant
direct
indirect
stack

an alternative:

1 bit says: constant if constant: one bit says: immediate constant or constant table constant if non-constant: one bit says: direct or indirect stacks, and the top-two elements of each stack, are memory-mapped (so, as described above, with 4 stacks, that's 16 memory-mapped locations; one bit is whether this is a straight read/write or a pop/push, and one bit is whether this is to the TOS or the second-element-of-stack, and two bits are which stack) could also have one bit for whether we are accessing 'special' memory, like the memory-mapped stacks, or 'ordinary' memory; 'special' memory has the properties that (a) when you read from a special memory location, you might get a value different from what you last wrote to it, (b) 'reads' might cause mutation (e.g. when they are interpreted as POPs). We might extend this to say that special memory is 'volatile' and then accesses to it should be executed immediately and non-reordered because they might have arbitrary side-effects. We might also say that special memory is always thread-local and never shared.

also, as noted above, we need an 'emulation'/'cascaded addressing mode' bit (at least for direct and indirect addressing, do we also need it for constants?). do we need PC-relative addressing? do we need table (eg displacement) addressing?

if we have: constant bit, direct/indirect bit, emulation bit, special bit, that's 4 bits, so if we have 16-bit words, we have 12 bits left, ie 4k. note that in this scheme, addresses can be up to 4k but constants can be up to 16k, because constants don't need 'emulation' or 'special' bits.

(Instead, could use the ARM-ish barrel-shift-ish encoding where the 12 low-order bits of immediate constants are interpreted straighforwardly, but the two high-order bits shift the others, e.g. if the two high order bits are 00 then the value is the value given by the 12 low-order bits, if the two high order bits are 01 then the value is 2*(the value given by the 12 low-order bits), if 10 then 4*(the value given by the 12 low-order bits), and 11 is multiplication by 8, allowing representation of all values up to 4k, and some values with trailing zeros up to 32k. This sounds useful/efficient but is slightly more complicated, so i probably won't do it)

---

in [1] i wrote:

" Note that Haskell is implemented a bit differently from many languages; this stems from Haskell's properties:

lazy (ie. at runtime we often pass around suspended expressions or 'thunks' rather than values)
higher-order functions (ie. functions are first-class values)
currying (ie. if we apply a function f to two arguments using the expression 'f x y', and the result type is an integer, then (f x) is a function which returns a function which takes one argument and returns an integer; in some implementations, the 'f' in (f x y) could be internally implemented either by (a) a function which takes one argument and returns a function which returns an integer, (b) a function which takes two arguments, and returns an integer (c) a function which takes more than two arguments, in which case the result of 'f x y' is a partially applied function)

It is therefore assumed that Haskell programs will, at runtime, very frequently be dealing with unevaluated thunks, with closures which represent unevaluated or partially evaluated functions, and with the application of functions whose arity was unknown at compile time. "

due to these considerations, the GHC compiler appears at first rather complicated (for example: in reading about it i heard about 'spineless tagless G-machines' and i found a paper describing that and read about a weird system in which to evaluate a function whose arity is only known at runtime you put a bunch of potential arguments on the stack and then jump into code for the function and then IT decides how many arguments to consume, and if it generates another function as a result, it calls that function directly to consume the rest of the arguments, etc; then in later papers i found that they dropped this weird 'push/enter' system for the simpler 'eval/apply' system that i would have expected (the caller looks up the arity of function at runtime and then calls the function with only the arguments it will use, then calling the output of the function if appropriate), and also that they also later switched from a tagless system to one with tags (but not type tags, rather a tag for whether an object has been evaluated yet, and for which outermost constructor was used to generate the object, which is type-specific because different types have different sets of constructors, so we kind of have a type-within-a-type and the type is known at compile-time but the choice of constructor, the type-within-a-type, is not!). And then there are info tables describing each object, and the garbage collector has to read these, and the garbage collector also apparently can sometimes help in propagating tags for some reason, and it all sounds very complicated)

but now that i've read about that, i wonder if i can't boil down the basic operations that the runtime is trying to do into an intermediate VM language, and leave questions like push/enter vs. eval/apply, and of how/where you store the stuff that Haskell puts in type tags and info tables, to the VM implementation (compiler or interpreter). Then Jasper Core could do all this Haskelly stuff but the reference implementation could be comprehensible. Of course, the optimized reference implementation could compile this intermediate language into something more efficient, like Haskell does.

Now, what is the difference between GHC Core and the thing i am talking about? I guess it's that i'm talking about an intermediate VM language for what instructions are being executed at runtime, rather than a representation of the source code that is being compiled.

operations like: APPLY (what else)

also, other 'runtime'-y things, such as COW (copy-on-write)

http://www.haskell.org/haskellwiki/Ministg

so, this suggests the place of expressions are included in jasper assembly; not only because we could have an AST instead of a linear sequence of VM instructions, but also because (for the same reason that lazy languages are like a reduction of expression graphs), thunks must contain expressions. also, should think about how to generalize/unify this with dynamically passing around references to imperative code.

one piece of the puzzle that is missing is how to combine delimited continuation and friends (eg metaprogramming based on direct manipulation of the call stack) with haskell-style graph reduction this may also be related to the puzzle of how to have tail calls but also support good stack traces; as ministg points out, "Lazy evaluation means that the evaluation stack used by the STG machine is not so useful for diagnosing uncaught exceptions. The main issue is that the evaluation context in which thunks are created can be different to the evaluation context in which thunks are evaluated. The same problem does not occur in eager languages because the construction of (saturated) function applications and their evaluation happens atomically, that is, in the same evaluation context. ". In fact, you might say that in such a language there are not 2, but 4, stack contexts for each function: the lexical (source code) ancestors of the function definition in the AST, the lexical (source code) ancestors of the function application in the AST, the ancestors in the call stack at the runtime time of the function's creation, and the ancestors in the call stack at the runtime time of the function's calling. should also look at di franco's effort to combine call-stack metaprogramming with constraint propagation

people say 'synchronize' and have target language instructions like SYNC but what they really mean is often 'WAIT until the other thread gets here, or WAIT until the other thread does this', so i should call that opcode WAIT.

the idea of haskell data constructors as a type-within-a-type is interesting and possibly a fundamental way to generalize/simplify. So now we seem to have at least 3 types of types: atomic types (Hoon's odors), composite types (C structs, Python objects, Python and Haskell and Lisp lists, Python dicts, Hoon's structural typing), and the choice of data constructor within a Haskell type (eg is this List a Cons or a NilList?? is this Maybe a Nothing or a Just?) in addition, there are interface types vs representation types, which may be an orthogonal axis

why is a g machine graph not tree (recursion, but why?)? http://stackoverflow.com/questions/11921683/understanding-stg says "Haskell's pervasive recursion mean that Haskell expressions form general graphs, hence graph-reduction and not tree-reduction" but i'm not sure i understand that, wouldn't we just be working with a lazily generated (possibly infinite) tree?

another primitive that we may need to deal with is fexprs, that is, (first-class) functions that don't evaluate their arguments first: https://en.wikipedia.org/wiki/Fexpr (how is this different from any function in the context of a lazy language? should we just generalize this to strictness annotations on each argument?)

APPLY COW WAIT GETTABLE? SETTABLE?

from [2]: some core operations from Guile's VM:

getting and setting and isBound? of values from/to variables in various namespaces (the stack, the local variable stack frame, the 'toplevel environment', the 'program’s vector of captured variables'
boxing and unboxing
make-closure
procedure call and return
branch
data constructor (also primitive data?)
loading
primitive ops
not
structural equality
reference equality
null?
get, set
cons, car, cdr
struct-vtable (meta)
class-of
number
bool
nil
eol
char
symbol
list/vector/array
struct

a new (to me, not to the world) interpretation of Nock's core instructions:

select, identity, nock, isLeaf, successor, isStructurallyEqual:

head and tail on s-expressions,
get the environment as a value,
apply a function to the environment then run a function under the new environment being the result,
primitive arithmetic,
structural equality

in other words:

s-expression manipulation
environment manipulation
run a function within an environment
primitive arithmetic
structural equality

(note: no mention of stack manipulation/continuations/jumping here, except for 'run a function within an environment')

(note: we can construct functions because we can build an AST as data and then execute it as code using Nock)

(note: why is this 'functional'? maybe b/c: (a) operations are defined in terms of macros ie combinators, (b) to use eg the environment you copy it, you dont mutate it)

this interpretation shows the similarity between even Nock and various other core languages:

define and execute functions
manipulate composite data structures
manipulate the environment
primitive arithmetic
equality
(note: i guess we get boolean ops like AND because we have access to structural equality, not just equality, so we can simply compare inputs to entries in a truth table)
(note: as Nock shows, we can build both 'if' and also program composition from these primitives)

so some stack ops:

DROP DUP SWAP OVER ROT DUP2 DROP2 PICK TUCK

DRIP: a b c ... -> a a c ... (MOV reg1 reg2) DIP: a b c ... -> b b c ... (MOV reg2 reg1) NUP: a b c ... -> a b b c ... (PUSH reg2 onto the (rest of the) stack) NIP: a b c ... -> a c ... (POP the (rest of the) stack into reg2) TUCK: a b c ... -> a b a c ... (PUSH reg1 onto the (rest of the) stack) TAKE: a b c ... -> c b ... (POP the (rest of the) stack into reg1)

PUSH POP OVER UNDER PEEK

cow, fold, map as core ops

---

some things u see in a lot of languages:

APPLY (call a function, possibly with partial arguments or currying) APPLYTAIL return GOTO/JMP conditional JMP comparisons, esp. equality a way to make a composite data structure a way to index into or destructure that composite data structure arithmetic and other 'primitive functions' looping constructs variable assignment a way to make a closure (a function with an environment of variables to be used in that function) switch/jump table (special case: 'case' of an ADT) try, raise, catch call FFI

and some have: call/cc, delimited continuations, and other 'first class stack frame metaprogramming' first-class, executable representation of code polymorphic dispatch the nock operator (do something to the environment, then do something to the code, then evaluate the resulting code upon the resulting environment)

some popular data structures: tuples linked lists (used as binary trees in the Lisp or Nock way, e.g. first/rest) random access/packed lists/arrays/vectors dicts graphs? functions, closures, objects, tagged unions/ADTs

stacks seem to be used mostly for keeping track of evaluating an expression, which is shaped like a tree, linearly (eg in the AST you don't need the stack, but in bytecode you do). the call stack can be thought of as an extension to this

could have calls with arbitrary many arguments; have the second operand have a sentinel that means 'composite argument', which is found outside of the fixed-width bytecode (are the composite args embedded following the instruction, or in a special constant table?)

---