proj-oot-ootVm

Introduction and motivation

"Oot" has 3 well-defined compilation stages:

Oot -> Oot Core -> Oot VM -> platform-specific code

Oot is the high-level language. It is built from Oot Core using Oot Core's metaprogramming facilities.

Oot VM is a virtual machine targeted by Oot. Oot Assembly is the linear intermediate language/assembly language that the Oot VM runs. Oot Bytecode is a particular encoding (meta)format for Oot Assembly. Oot Core is interpreted/JIT compiled/compiled to Oot Assembly.

For bootstrapping, the Oot compiler is available in Oot Assembly.

Runtime 'administration/services' such as garbage collection, greenthread scheduling, copy-on-write, bounds checking, and non-strict evaluation are implemented 'beneath' Oot Assembly, in the Oot VM.

The rationale for Oot Core is:

Why have a VM?

Why have a linear Oot Assembly layer underneath Oot Core, instead of just interpreting the Oot Core AST directly, or instead of compiling Oot Core directly to machine code? The main reason is to serve as a thought experiment. This helps me:

Another reason is performance. Since i haven't implemented this yet, i don't have hard data on this, but I suspect that it will be easier to quickly execute a linear series of fixed-size instructions than to walk an AST, if only because this is close to the architecture of today's commodity computers. Wikipedia says "for interpreters, an AST causes more overhead than a bytecode interpreter, because of nodes related to syntax performing no useful work, of a less sequential representation (requiring traversal of more pointers) and of overhead visiting the tree" but "using AST has been proposed as a better intermediate format for just-in-time compilers than bytecode".

Another reason is that everybody else is doing it. Other recent interpreted high-level languages all seem to take this approach (eg Java, C#, Python, Perl, Ruby; even Lisp machines compiled Lisp to a stack-based linear language).

Note that although we are making an Oot VM, it might not be used by all implementations of Oot; optimized implementations of Oot might compile to another target, might directly interpret the AST (for JIT compilers that find this useful), or might create their own variant instruction set and bytecode (similar to LuaJIT? or Dalvik). In contrast to Oot Core, whose structure is occasionally visible to programs when reflection or other metaprogramming is used, the Oot VM is really just an implementation detail (except that some of Oot VM's limits might be inherited by the HLL, eg the limit of ~2048 local variables).

Why make our own new VM/bytecode, rather than targeting an existing one (eg JVM, .Net/CLR, Javascript, LLVM, BEAM, RPython, etc)? Because no existing VM quite meets our requirements:

Some properties we want Oot VM to have:

What do we mean by 'preserve intent', and why do we want that? We mean: don't map a specific operation S to OA code C if, by looking at the resulting C, you cannot tell that it is actually S without doing non-local analysis.

Some examples:

There are three reasons to 'preserve intent':

Efficiency: We want Oot Assembly to be reasonably efficient to interpret; however, typically efficiency should not be at the expense of preservation of intent or customizability. We expect that performance-minded platform-specific implementations might compile Oot Assembly into custom high-performance bytecode formats anyways.

Some features of Oot VM:

Encoding Format

Instructions are encoded in a sequence of 64-bit frames.

2 format bits + 62 bits depending on format

The format bits are as follows:

In the PRIMARY format, each 64-bit frame represents a single instruction. The PRIMARY format is the workhorse of OA. Most OA code can be straightforwardly represented in PRIMARY format. It is 3-operand register-based with addressing modes. You might call it a '4-operand format' because the opcode itself has an associated addressing mode, allowing 'first-class functions' to be stored in memory and called in one instruction (provided they take 2 arguments).

In the LONG format, many 64-bit frames may be combined to represent a single instruction or "sentence". The LONG format is an extremely extensible format which can represent each operand as a 'phrase' which may contain various modifiers and subphrases; this is an extensible AST serialization format. It is intended to be used when modes, modifiers, and metadata must be attached to instructions or to individual operands within instructions, as well as to allow OA to be extended to cleanly represent other programming styles.

In the SHORT format, one 64-bit frame represents 8 instructions. The SHORT format is intended to be used to concisely represent the calculation of single expressions. It is stack-based.

Short Format

1 format bit + (2 bit addr mode + 5 bit operand) + 7 x (1 subformat bit + 2 bit addr mode + 5 bit operand)

The SHORT format is a 64-bit packed grouping of 8 sub-instructions, each 8 bits long except the first, which has only 7 bits because the frame's first bit is consumed by the format bit (which is '0').

Because the leaf and interior nodes of an expression produce values which are consumed exactly once, and because the interior and top nodes consume their inputs (as opposed to assigning them to a variable from which they might be re-used in some other expression later), expressions can be represented in a concise RPN form: a sequence of operations which are all implicitly assumed to take their inputs from, and push their results onto, the stack. The SHORT format provides for arguments to be PUSHed onto the stack from any of the first 16 registers, for a sequence of any of the first 32 opcodes to be called with stack-implicit arguments, and for a single value produced at the end to be optionally POPed from the stack and stored into any of the first 16 registers. Stretches of code which require no control flow operations besides function calling and value selection, no more than the first 16 registers, the first 32 opcodes, and the first 4 addressing modes, and no metadata or modifiers, could conceivably be written entirely in this format.

The first instruction is a 'PUSH' instruction which pushes a value onto the stack. The next six instructions are each either PUSHes or operations which take their arguments from, and push their results to, the stack. The last instruction is either a POP instruction which pops a value from the stack, or another operation.

In the first subinstruction, the addressing mode and the operand specify the data which is to be PUSHed onto the stack.

In the next six subinstructions, the subformat bit chooses between 'PUSH' or 'OPERATION'. In the case of PUSH, the addressing mode and the operand are used as in the first subinstruction, to specify the data which is to be PUSHed onto the stack. In the case of OPERATION, the addressing mode and the operand are used to specify an opcode which is executed.

In the last, 8th subinstruction, the subformat bit chooses between 'POP' or 'OPERATION'. In the case of POP, the addressing mode and the operand are used to specify the effective address to which the result which is POPed from the stack is to be stored. In the case of OPERATION, the addressing mode and the operand are used as in the previous six instructions, to specify an opcode which is executed against the stack.

The inputs and outputs of the OPERATION instructions are all implicitly the data stack (stack0) (eg the inputs are POPed from the stack and the outputs are PUSHed to the stack). Note that OPERATION instructions may consume any number of inputs from the stack (todo: must this number be fixed for each operation, or can some eg consume the stack up to some terminator? i'm thinking if not fixed, then known at compile-time at least).
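To make the packing concrete, here is a minimal decoding sketch in Python. It assumes MSB-first bit packing within the frame, and assumes that subformat bit 0 selects PUSH/POP while 1 selects OPERATION; neither choice is pinned down above, so treat this as illustrative rather than normative.

    def decode_short(frame):
        """Split a 64-bit SHORT frame into 8 (kind, addr_mode, operand) triples."""
        assert (frame >> 63) == 0                  # SHORT: the first format bit is '0'
        # first subinstruction: 2-bit addr mode + 5-bit operand, always a PUSH
        subs = [('PUSH', (frame >> 61) & 0x3, (frame >> 56) & 0x1F)]
        # seven more subinstructions: 1 subformat bit + 2-bit addr mode + 5-bit operand
        for i in range(7):
            bits = (frame >> (48 - 8 * i)) & 0xFF
            subformat, addr_mode, operand = bits >> 7, (bits >> 5) & 0x3, bits & 0x1F
            if i < 6:
                kind = 'PUSH' if subformat == 0 else 'OPERATION'
            else:                                  # 8th slot: POP or OPERATION
                kind = 'POP' if subformat == 0 else 'OPERATION'
            subs.append((kind, addr_mode, operand))
        return subs

Under these assumptions, something like r2 := (r0 + r1) * r0 would occupy one frame as PUSH r0, PUSH r1, ADD, PUSH r0, MUL in the first five slots, padding operations in slots six and seven, and POP r2 in the final slot (the text does not yet specify a no-op for padding).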

Primary Format

2 format bits + (2 custom flag bits + 4 addr mode bits + 8 operand bits) + 3 x (4 addr mode bits + 12 operand bits)
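As a companion to the SHORT-format sketch above, field extraction for a PRIMARY frame might look as follows in Python, under the same MSB-first packing assumption (the actual values of the two format bits are likewise unspecified here):

    def decode_primary(frame):
        """Split a 64-bit PRIMARY frame into its fields."""
        fmt   = (frame >> 62) & 0x3      # 2 format bits
        flags = (frame >> 60) & 0x3      # 2 custom flag bits
        # opcode field: 4 addr mode bits + 8-bit operand
        opcode_field = ((frame >> 56) & 0xF, (frame >> 48) & 0xFF)
        # three operand fields: 4 addr mode bits + 12-bit operand each
        operands = [((frame >> (44 - 16 * i)) & 0xF,
                     (frame >> (32 - 16 * i)) & 0xFFF)
                    for i in range(3)]
        return fmt, flags, opcode_field, operands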

Opcodes

The first field is the opcode field. The first four bits of the opcode field are the addressing mode bits. The addressing mode is applied to the 8-bit operand to generate the opcode field value. The opcode field value is broken down into:

Long Format

todo

see ootAssemblyThoughts for the previous proposal. Note that now that we are stealing 2 format bits from the header, it would have to be modified at least a little. Maybe since we have short formats now we should ditch the 8-bit payload format; that saves 1 bit, we still need another. Also, the addr modes are probably custom, not 'split addressing'. Also, since we have the other formats now, no need to specify anything for role 6.

Also, do we want to add a 'map' that says what the child nodes will be and what their offsets are, rather than just linearly reading and encountering roles and then a stop?

Also, do we want to add backwards offsets at each child node to say how to find its parent?

Both of these ideas make sense if we think of this as an AST serialization rather than as a natural language 'stream encoding'.

Also, the format currently described at ootAssemblyThoughts was designed to permit arbitrarily long sentences, by using 'stops' instead of headers with length. But in the real world, we'd probably prefer to have a header, perhaps with a 'combine with next' semantic when the header gives the max length.

We might also want to have a map to the starting location (offset) of each child of the current node, and the role of that child. And maybe, if the node has a single parent, the offset back (or forwards) to that parent. Otoh this information is implicit in the data in any case, and who is to say that it is more efficient to store it explicitly; that depends a lot on the application. And besides, i'm not trying to make some complicated red-black tree or anything, i'm trying to make something very simple.

But if we did have a map, it would have to have length limits. And maybe the numbers 4 and 16 are useful here (and possibly 8). If we had a 4-bit role and a 4-bit offset for each child, that's 8 bits in the map, so with 8 children we'd already fill up a 64-bit frame. If we had a 3-bit role and a 3-bit offset, and 8 children, that's 48 bits, so at least we have room for a little data here, or maybe an offset to the parent. If the length needed for the children exceeds these limits, we can say 'use the parenthesis construct', which is a 64-bit frame that is mostly just a size of the subtree.

Custom Instructions

This proposal allows us to start with a tiny, very easy to implement set of primitives, and then expand them using the language itself, rather than making a tradeoff between ease of porting and expansiveness of the instruction set.

There are various types of 'custom' metaprogrammability, eg custom ops, custom types. i think this means that ops, types etc should be classified into:

My language may be a little sloppy below, since i made up that terminology right now, after writing the rest.

In addition to custom instructions, there are first-class functions that can be called like instructions by using the addressing mode on the opcode field; this functionality is referred to as 'call3'. Typically the opcode's addressing mode is immediate constant; but by using 'constant table' addressing mode, module functions can be called, and by using the other addressing modes, function pointers sitting in memory can be called (so in this last case, which function is executed is only known at run-time, not compile-time).

call3s are like macros: they substitute their 'body' inline wherever they appear, except that references to the special ARG0 memory location are replaced by the effective address indicated by the first operand, and references to ARG1 and ARG2 are replaced either by references to the values indicated by the second and third operands, or by their effective addresses, depending on whether the assigned opcode is less than 0xC00; a 'JMP' whose target is the instruction following the invocation is appended to the body. Custom instructions do create a new 'scope' with respect to label numbers: custom instructions are not aware of and cannot jump to labels outside of their own implementation, and labels within custom instructions are only accessible from jumps within those instructions; if custom instructions containing labels are inline expanded, the labels must be re-numbered so as not to conflict with labels in the calling code.

Some interpreters may implement custom instructions using a hidden stack instead of inline expansion.

Custom instructions may not invoke themselves, and they may not make use of custom instructions defined after themselves. There is no way to recursively call, much less 'tailcall', a custom instruction; when you define a custom instruction, please use iteration instead of recursion. This limitation ensures that custom instructions may be inlined (or, if they are implemented using a hidden stack, that there is a fixed upper bound on the hidden stack depth).
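As an illustration of these macro-like semantics, here is a toy inline expander in Python. The tuple-based instruction representation, the ('LABEL', n) encoding, and the helper names are all invented for this sketch, not part of OA:

    def expand_custom(body, args, return_label, fresh_label):
        """Inline-expand a custom instruction body at a call site.
        body: list of (opcode, operand, ...) tuples; operands may be the
        placeholders 'ARG0'/'ARG1'/'ARG2' or ('LABEL', n) pairs."""
        renames = {}
        out = []
        for op, *operands in body:
            new_ops = []
            for o in operands:
                if o in ('ARG0', 'ARG1', 'ARG2'):
                    o = args[int(o[-1])]     # caller's effective address or value
                elif isinstance(o, tuple) and o[0] == 'LABEL':
                    # labels get a fresh scope so they cannot collide with the caller's
                    o = ('LABEL', renames.setdefault(o[1], fresh_label()))
                new_ops.append(o)
            out.append((op, *new_ops))
        out.append(('JMP', return_label))    # resume at the instruction after the invocation
        return out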

todo: do we place some requirements on call3'd subroutines that they must complete quickly and cannot utilize the underlying platform, for the purpose of being able to use the underlying platform stack as the hidden stack without saving it when we want to return control to the underlying platform?

Custom instructions are declared by a map of opcodes to a tuple (module identifier, function).

Custom instructions must not have cyclic dependencies, and must be defined in order, such that when one of them is defined, any custom instructions used within its implementation have already been defined. (The order of the assigned opcodes may differ from dependency order.)

The form of first-class functions is the same as custom instruction declarations (with a 0 in the opcode field). If the opcode operand resolves to an integer, then an instruction (built-in or custom) is indicated; if it resolves to a pointer, a first-class function is indicated (note: this means that there must be a 'function type' which can be represented as either an integer or a pointer).

Function declarations include:

Custom addressing mode declarations include:

The first four custom addressing modes are 'direct'; their operands represent registers, and their effective address is that register; these are the only ones whose effective address may be a register.

Custom addressing modes must leave the stacks unchanged (except if they are direct and the register operand is a memory-mapped register). This makes it easier to analyze the program for changes to the stack.

TODO: actually what if we want to use the extra addr mode bits to represent per-operand custom flags? in that case we need to wrap the whole instruction, not just the argument evaluation.

There are also custom flag bits, which are defined by a wrapper that is called in place of, and wraps, the following op. Like call3s, they can look at the link register to get a pointer to the instruction that they are supposed to wrap. When they are called, INTROSPECT_INSTRUCTION is automatically called on the wrapped instruction to break that instruction into its parts (4 operands and 4 addressing modes) and put those parts on the stack. The custom flag implementation can, and typically does, use a form of EVAL to execute those parts, possibly after modifying them.
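A toy model of that wrapper protocol might look as follows; everything here (the two-tuple instruction model, the eval_parts callback) is invented for illustration:

    def introspect_instruction(instr):
        """Model of INTROSPECT_INSTRUCTION: split an instruction into its
        4 addressing modes and 4 operands."""
        modes, operands = instr              # instr modeled as two 4-tuples
        return list(modes) + list(operands)

    def trace_flag(wrapped, eval_parts):
        """Example custom flag: log the wrapped instruction's parts, then
        execute them via a form of EVAL (possibly after modifying them)."""
        parts = introspect_instruction(wrapped)
        print('about to execute:', parts)
        eval_parts(parts)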

Requirements on custom flag bit semantics:

OA does not support variadic instructions.

Self-modifying code per se is not supported, although EVAL is.

Addressing Modes

There are 4 built-in 'core' addressing modes and 12 custom addressing mode slots.

Built-in addr modes are:

'immediate constant' in the output addressing mode with operand '0' means 'discard'. Other operands for 'immediate constant' addressing mode in the output are forbidden, except in certain compile-time metaprogrammy operations, for example IMPORT.

'constant table' in the output addressing mode is forbidden unless the constant referenced is a global static pointer.

some ideas for other addressing modes:

Modules

Modules must declare custom instructions, custom addressing modes, and their constant table. All of these declarations must be done completely at compile-time. Custom types don't need to be declared; they are just stored as pointers.

There are two places to declare custom instructions: as unused opcode immediates, and as function-typed entries in a constant table. There (probably) is a separate constant table for instructions (ie for when the opcode field has constant table addressing mode). todo: is there any difference between custom instructions as immediate opcodes, and as function-typed entries in a constant table? I'm thinking that the latter would be more like library functions and the former would be more like new 'instructions', but what is the difference? Perhaps the former is hardwired as builtins in the OA implementation and the latter is per-module; or the former is per-whole-program (allowing faster dispatch) and the latter is per-module. Or the former can be either OA code or hardwired in the implementation, while the latter can be OA code or an external C library. Or the latter can be used by a module loaded at runtime but the former cannot. Or something. Currently my best guess is that only constant-table customs are per-module; the others are either per-program or part of the implementation (perhaps an interchangeable part, though; eg each program can specify an Oot 'profile', different ones of which have different selections, and some profiles are implementation-dependent).

Modules may import other modules. This is done by listing the other module in the imports table within the module info table. There is also a runtime instruction IMPORT. This instruction must use immediate-mode addressing to specify the destination integer identifier (this is to allow the implementation to know what module identifiers are being used at compile-time).

Modules may also import custom opcodes, types, and addressing modes from other modules into their own. This is done by listing the other module in the metaprogramming imports table within the module info table. There is no way to do this at runtime. Trying to import metaprogramming from different modules that define overlapping opcode, type, or addressing mode identifiers is a compile-time error.

There is a special module BUILTINS. This is meant to be an implementation-dependent module that contains opcodes, types, and addressing modes that might be given special treatment by an Oot Assembly implementation. By tracing imports of opcodes, types, and addressing modes from this module, the implementation can know at compile-time where its optimizations can be applied. The semantics of standard opcodes, types, and addressing modes, even non-core ones, must not be altered in the BUILTINS module's implementation; however BUILTINS can also define non-standard opcodes, types, and addressing modes. (todo is this a good idea? or just autoimport all implementation-dependent builtins to EVERY module?)

The BUILTINS module is auto-imported at module identifier 0. The current module is auto-imported at module identifier 1.

Absolute memory addresses (constant table mode), eg those in the operands of JMPs, are all implicitly module relative (pointers, however, can be shared between modules). Integer JMP addresses refer to a count of Oot Assembly instructions in the module.

Each module has:

Length fields are 16 bits, but their values cannot exceed 4k. Lengths count how many items each table has (not the number of bytes the table takes up).

(todo: is this really the best format? should we (just) place the lengths in the tables themselves, or only have lengths and let the pointers be computed?)

Symbol names data is the data which can be stripped for more compact builds. It contains ASCII string names and source maps for the module and for each defined custom addr mode, custom opcode, and function declaration.

Types

There are some built-in types that don't need to be declared (the first 16, perhaps?).

types todo: probably want at least integer, floating point, string (bytestring or unicode or both?), object?; the INSP of Parrot, and mb also of MoarVM?

another idea for 4 core types:

- pointer
- int16
- other/custom static type (has already been statically typechecked)
- dynamic (boxed/wrapped, tagged type; runtime typechecking requested)

an idea for another 12 'base' types:

- boolean/single bit
- unsigned int16
- byte
- float64
- arbitrarily large/precise rational
- unicode string
- array
- cons cell/linked list/binary tree
- graph/hash/dict (opaque, as opposed to AST)
- function/callable, thunk/promise/future/lazy value
- AST/expression/block/logic
- greenthread/process/channel(/actor?)

(note that some implementations might not support all of these types; eg an embedded implementation might not support floating point, arbitrarily large/precise rational, unicode)

OA is typed in the sense that every scope has a type declaration for every register it uses, and every instruction has a declared type signature; there is, however, a 'dynamic' type. 'dynamic' type means that the location can hold a value of any boxed type. Boxed types have a type tag. Values in locations with 'dynamic' type must be 'coerced' via a COERCE instruction to their actual type before they are moved to locations with their actual type, or passed as arguments to instructions demanding their actual types; the COERCE instruction causes a runtime exception if the value isn't actually the correct type (as determined by its type tag). Type identifiers are 12-bit integers.
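A sketch of what boxing and COERCE could look like, assuming a boxed value carries its 12-bit type tag; Box and CoerceError are illustrative names, not part of OA:

    class CoerceError(Exception):
        pass

    class Box:
        """A 'dynamic' value: the value plus its 12-bit type tag."""
        def __init__(self, type_tag, value):
            self.type_tag, self.value = type_tag, value

    def coerce(box, expected_tag):
        """COERCE: check the runtime type tag before the value may flow into
        a location or instruction demanding a specific type."""
        if box.type_tag != expected_tag:
            raise CoerceError(f'expected type {expected_tag}, got {box.type_tag}')
        return box.value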

There are also a few special memory locations whose type varies by scope, namely the call3 inputs and output registers. In addition, there are stacks; stacks are typed via 'stack maps' like in the JVM. Stack maps are like regexps, and can contain unbounded repetitions of patterns. Instruction type signatures include a specification of their alteration of the stack (in the form of eg: upon entry, i expect the top of the stack to contain an int8 and an int16, which will be consumed; upon exit, the top of the stack will contain an int16 (note: the instruction does not HAVE to consume the inputs, but if it does not, it must list these as outputs)).

(i suppose in the end we also need a way to coerce raw bytes to typed something, for interop?)

Some instructions have polymorphic forms, eg forms whose input type signature includes 'dynamic'. Some of these instructions also have specialized forms, that is, forms where the 'dynamic' in the type signature is replaced by a specific type. In this case, the instruction will have the same mnemonic but a different opcode, and the Oot implementation handles dispatch to the specific form if there is one (by, at runtime, checking for a specific form before each call to an instruction with a dynamic input type signature).
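Reusing the Box sketch above, that runtime dispatch might be modeled like this (the two-table layout is an assumption for illustration):

    def dispatch(mnemonic, boxed_args, specialized, generic):
        """Try a form specialized to the runtime tags; otherwise fall back to
        the polymorphic form (same mnemonic, different opcode)."""
        tags = tuple(a.type_tag for a in boxed_args)
        impl = specialized.get((mnemonic, tags))
        return (impl or generic[mnemonic])(*boxed_args)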

todo i think this is out of date: Just like memory locations are typed, the stack is typed. But since the program is allowed to arbitrarily manipulate the stack, we cannot always statically know what the 'type signature' of the stack (the stack map) will be. So long as the number of things on stack0 is predictable, and stack1 is only being accessed implicitly by CALLs and RETURNs and their associates, we can infer the stack map (like the JVM verifier does (used to; now i think stack maps are made by the compiler)). But if the program wants to change the stack in a weird, unpredictable way, then it has to include assertions saying "i'm about to add an arbitrary number of things to the stack" and "i'm about to directly manipulate stack1, consuming n elements" (so that the compiler can remember that, at that point in time, the stack-so-far above the n consumed elements is known, but everything added on top of that is unknown). When the program is done doing weird stack writes, it asserts how much it consumed, and the rest of the implicit stack map governs again. If the program wants to read the unpredictable regions of stack0, or any of stack1, then each time before reading it must assert that there is no stack underflow, and also assert the type of the thing it reads (which can be 'dynamic'). This allows the program to use the stack efficiently in normal circumstances; to manipulate the stack arbitrarily if it wants to, at the expense of runtime checks; and to do one-time manipulations (such as handling exceptions) which leave the stack in a predictable state, after which efficient stack usage can resume.

Although OA is designed for optional static typing, unlike eg the JVM, there may not be any typechecking done at runtime; typechecking is provided as a separate OA program (written in OA). OA users are urged to typecheck any untrusted or newly written code before running it.

(do we allow 'subtypes'? eg can we write a function that says "i'll take anything which is an int or a float" and then assert at runtime that some value is an int or a float, without saying which one?)

Type declarations include:

Values of the 'pointer' type are not necessarily 'pointers' on the underlying platform; they are simply some implementation-dependent reference value that allows dereferencing via the indirect addressing mode, and 'pointer arithmetic' to traverse adjacent fields. A 'pointer' could be represented by a machine pointer although with information about the width of record fields, by an integer into an array of emulated 'memory locations', by an object with methods to 'get' and 'set' its value and to get the 'next' and 'previous' locations, etc. Dereferencing a 'pointer' could involve calling garbage collection read-barriers, and writing to it could involve calling GC write-barriers, all 'under the hood'.
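For instance, a 'pointer' over an array of emulated memory cells might be modeled as below; this is one illustrative representation, not a requirement:

    class EmulatedPointer:
        """A 'pointer': get/set plus next/prev, over emulated memory cells."""
        def __init__(self, cells, index):
            self.cells, self.index = cells, index

        def get(self):
            # a real implementation might invoke a GC read barrier here
            return self.cells[self.index]

        def set(self, value):
            # ...and a GC write barrier here
            self.cells[self.index] = value

        def next(self):
            return EmulatedPointer(self.cells, self.index + 1)

        def prev(self):
            return EmulatedPointer(self.cells, self.index - 1)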

Memory

4k local variable locations or 'registers' (or fewer; each module header says how many it needs; perhaps memory sizes are always powers of two (this apparently makes eg pointer tagging easier))

memory map and memory-mapped special locations:

0: PC (program counter) (do we forbid writing to this? i think so; we want to make programs use control flow ops like JMP for easier analysis)
1: ERR
2: stack0 push/pop
3: direct read/write access to the top of stack0 (without pushing or popping)
4-7: GPRs
8: stack1 push/pop
9: direct read/write access to the top of stack1 (without pushing or popping)
10: current size of stack0
11: current size of stack1
12: MODE
13: ARG0 (call3 operand0/output)
14: ARG1 (call3 operand1/usually input)
15: ARG2 (call3 operand2/usually input)
0x010-0x7FF: general purpose registers (GPRs) (accessible in the opcode field)
0x800-0xFFF: general purpose registers (GPRs) (not accessible in the opcode field)

note: this is not to say that the virtual machine can only access 4k of memory, just that it has only 4k of registers. A pointer can be placed in any of these memory locations, and that pointer can point to somewhere else. Likewise, the PC can point to memory locations outside of this map. These alien memory locations are always considered to be 'higher' than this memory (eg if compared, an alien memory address is 'greater than' 0xFFF) but don't have to actually be that way in the underlying hardware. The alien memory addresses can be incremented as usual. These 4k locations are the only ones that can be accessed in direct addressing mode; however other locations can be accessed via indirect addressing mode. There are also absolute JMPs, which although 'absolute' are actually relative to each module.

There is no way to get a pointer to a register.

I call these locations 'registers' but really they are more like local variables.
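A sketch of how reads and writes to the memory-mapped locations might be dispatched; the VM class and error type are stand-ins, and ERR's mode-dependent write semantics (see the Errors section) are omitted here:

    class VMError(Exception):
        pass

    class VM:
        def __init__(self):
            self.registers = [0] * 0x1000      # the 4k register file
            self.stack0, self.stack1 = [], []

    def write_register(vm, n, value):
        if n == 0:
            raise VMError('writes to the PC are forbidden; use JMP')  # per the note above
        elif n == 2:
            vm.stack0.append(value)            # stack0 push
        elif n == 3:
            vm.stack0[-1] = value              # overwrite top of stack0
        elif n == 8:
            vm.stack1.append(value)            # stack1 push
        elif n == 9:
            vm.stack1[-1] = value              # overwrite top of stack1
        else:
            vm.registers[n] = value            # ERR/MODE/ARGn/GPRs

    def read_register(vm, n):
        if n == 2:
            return vm.stack0.pop()             # stack0 pop
        if n in (3, 9):
            return (vm.stack0 if n == 3 else vm.stack1)[-1]  # peek without popping
        if n == 8:
            return vm.stack1.pop()             # stack1 pop
        if n in (10, 11):
            return len(vm.stack0 if n == 10 else vm.stack1)  # current sizes
        return vm.registers[n]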

If you do an operation on 'the contents of register 7' but the operation calls for some struct, then it is implicit that register 7 actually holds a pointer to that struct. If the contents of register 7 is typed as a struct (of some specific type), then the semantics of 'copy register 7 into register 8' is to copy the STRUCT (with COW? maybe not, if it is not boxed), not the pointer.

It is illegal and undefined to read from uninitialized registers.

The machine handles much of memory allocation/garbage collection transparently.

Stack

There are two stacks, intended to be used as a data stack (stack0) and a call stack (stack1). However, these stacks have different properties, and furthermore are related. The state of the part of the data stack relating to the current (context? block? function?) (eg the current 'stack frame' or 'activation record') must be statically known at each line of code ('state' meaning its height, and the type of each object in it).

The call stack contains only one type of item, 'stack frames'/'activation records' (or pointers to them), which each contain a PC (program counter), registers (local variables), and the associated data stack 'frame'; however, unlike the data stack, the call stack can be arbitrarily manipulated, that is, there is no need to statically know its height.

If a program attempts to access beyond the end of either stack, or even beyond the end of the current 'stack frame' within the data stack, then the program can get any value as a result, or an error might occur.

Note that as we pass between different (contexts? blocks? functions?; is the current whatever-it-is at the head of the call stack?), the current stack frame of the data stack changes. If the call stack is manipulated, eg if two items are SWAPped, then it's as if the currently-unreachable part of the data stack below the current stack frame also changes. So in a way, manipulating the call stack also manipulates the data stack, but not the other way around (however, manipulating the call stack doesn't change the current data stack frame).

Note that the call and data stacks can/should be thought of, and implemented, as a single stack, composed of 'activation records' each of which contains a PC, registers, and a mini data stack. Whether the activation records are actually on this single stack, or whether the single stack contains pointers to heap-allocated activation records, or some mixture, is an implementation detail (probably the stack actually contains the activation records/stack frames when there is no need for a closure that outlasts the relevant dynamic context, and contains pointers to activation records when a closure over a record may be needed). The separation of data stack/call stack may be thought of as (a) a safety mechanism, to make it clear to which part of the stack the mandatory typechecking applies, and (b) an encapsulation and convenience, to prevent OA code from having to directly deal with stack pointers or the structure of activation records.
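A minimal model of that single combined stack, with activation records as plain data; whether records are stored inline or behind pointers is left to the implementation:

    from dataclasses import dataclass, field
    from typing import Any, List

    @dataclass
    class ActivationRecord:
        pc: int                                           # saved program counter
        registers: List[Any]                              # this frame's local variables
        data_stack: List[Any] = field(default_factory=list)  # this frame's mini data stack

    call_stack: List[ActivationRecord] = []   # stack1: records, or pointers to them

    def current_data_frame():
        """The current data-stack frame belongs to the record atop the call stack."""
        return call_stack[-1].data_stack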

Modes

The MODE register sets various mode bits. By default, it starts out in mode 0 (all bits zero).

bits in MODE from MSB to LSB:

Aside from 'external' state such as memory and I/O, and the stack, OA itself has no 'hidden state'; the state is visible in the registers.

Errors

There are four error handling modes, controlled by the MODE register.

In 'crash' mode, any attempt to write a non-zero value to the ERR register is immediately fatal.

In 'handle' mode, any attempt to write a non-zero value to the ERR register succeeds and then immediately causes the current value of the PC to be pushed onto stack1, and then causes a JMP to the beginning of the program. Note that programs run in this mode should be sure to check the ERR register at the beginning and if it is non-zero, to 'handle' the error in some fashion.

In 'restart' mode, any attempt to write a non-zero value to the ERR register immediately causes the stacks to be emptied and a JMP to the beginning of the program. Note that 'restart' could be emulated by putting a routine to clear the stacks and the ERR register at the beginning of the program and running in 'handle' mode.

In 'tolerant' mode, any attempt to write to the ERR register succeeds.

Note that it is possible for a subroutine to leave the ERR register in a non-zero state by saving the current MODE, setting the error-handling mode to 'tolerant', setting the ERR register, then restoring the error-handling mode. By this method (or by the NONFATALERROR opcode, which does the same thing), a subroutine can return an 'error code' which is zero (unchanged) upon success and non-zero otherwise, which may be more convenient than returning multiple values via the stack.
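The four modes can be summarized as ERR-write semantics; in this sketch the numeric encoding of the mode bits and the VM plumbing are assumptions:

    CRASH, HANDLE, RESTART, TOLERANT = range(4)   # assumed encoding of the mode bits

    class FatalError(Exception):
        pass

    def write_err(vm, value):
        if value == 0:
            vm.err = 0
            return
        if vm.error_mode == CRASH:
            raise FatalError(value)          # immediately fatal
        elif vm.error_mode == HANDLE:
            vm.err = value                   # the write succeeds...
            vm.stack1.append(vm.pc)          # ...the current PC is pushed onto stack1...
            vm.pc = 0                        # ...and control JMPs to the start of the program
        elif vm.error_mode == RESTART:
            vm.stack0.clear(); vm.stack1.clear()
            vm.pc = 0                        # stacks emptied, restart from the beginning
            # (whether ERR stays set across the restart is not pinned down above)
        else:                                # TOLERANT
            vm.err = value                   # the write simply succeeds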

Calling

There are (at least) three explicit CALL instructions, in addition to CALL3 functionality:

Limits

Note that although the instructions can only hold 12-bit immediate values, other addressing modes can be used to give larger values. Unless otherwise specified, anywhere an integer may be given, its size may range up to 64k - 1 (and possibly larger in some implementations). Many things have a limit of 4k - 1, however. These include:

(todo: is the 4k limit really needed for any of these?)

OA implementations may also limit the number of allowed labels. The limit must be at least 4k.

FLAGS (lack thereof)

There are two flag bits per instruction.

Note that further use-cases for operand-level flags can be emulated by custom opcodes.

Boxing

in the boxing:

Custom flag bit suggestions

Atomics and memory model

todo

how about:

but mb we won't have shared memory at all?? in which case we need at least some IPC/message-passing primitives instead.

Concurrency

The Oot VM offers only cooperative multitasking. The MAINT instruction must be called to cause the VM to consider yielding control to another greenthread, and one blocking greenthread causes the others to block.
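A minimal sketch of this cooperative scheme, modeling greenthreads as iterators of opcodes and treating 'MAINT' as the only switch point; the round-robin policy is an assumption:

    from collections import deque

    def run(threads, execute):
        """Round-robin greenthreads, switching only at MAINT instructions."""
        ready = deque(iter(t) for t in threads)
        while ready:
            t = ready.popleft()
            for op in t:
                if op == 'MAINT':
                    ready.append(t)          # consider yielding: rotate to the next thread
                    break
                execute(op)                  # no preemption between MAINTs

    # two threads interleaving at their MAINT points: prints a, c, b, d
    run([iter(['a', 'MAINT', 'b']), iter(['c', 'MAINT', 'd'])], execute=print)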

Note that Oot Core is preemptively multitasked; the compiler/interpreter inserts MAINTs upon:

and avoids blocking (or, creates another OS-level thread if a potentially blocking operation must be called).

An object passed between processes cannot contain pointers to outside of itself.

Garbage collection

todo: do we garbage collect cyclic references (like Python) or do we require weak pointers to break the cycle (like Apple Objective C ARC)? Do we have mark-and-sweep style GC or reference counting-style or some combination? Are we like Golang? Like Python? Like Inferno?

Garbage collection is only considered upon memory allocations (if there is no more memory available; todo do we do garbage collection before requesting more memory from the OS, or only if the OS says that no more memory is available?) or upon calls to the MAINT instruction.

todo: how does the HLL tell the VM where pointers are in data structures? What about pointers to locations 'inside' data structures (as opposed to the head of data structures)? do we have read barriers? write barriers? handles?

Typechecking

Explicit stack maps are not required (because requiring them would make it harder to write a code generator), and stack typechecking is not guaranteed (because it would increase startup time), but code that would fail stack typechecking is nevertheless illegal. A stack map inference engine and checker will be written in Oot and included in the Oot distribution; perhaps the executable will use it by default, or at least have an option to require it before executing code (other than the Oot language implementation itself).

Notes on some instructions

JMP: if given a pointer, JMPs to that pointer (can you JMP to 'data' locations or just locations marked as 'code'?); if given an integer argument, jumps to the n-th Oot Assembly instruction.

NONFATALERROR: see above.

Core instructions

These instructions must be implemented by someone porting OA. All other 'base' instructions are ultimately reduced to these.

When an instruction 'returns' a value, this means that the contents of effective address of operand 0 is set to this value.

probably not core but still considering:

add-on profile modules:

More Instructions (non-core instructions)

Note that Oot Core can be simpler than Oot Assembly in that there may be instructions in here which are not Oot Core primitives, but rather correspond to Oot Core Library constructs. In this case the compiler/interpreter uses the 'jet' system to recognize the Oot Core Library call and override it with a single Oot VM instruction.

functional:

collections:

hashing:

automata:

comparisons:

arithmetic:

moves:

control:

stack ops:

graph:

memory:

system calls:

promises:

(todo: what else?)