proj-oot-ootAssemblyThoughts

Oot Bytecode and Oot Assembly

See https://gitlab.com/oot/ovm

Introduction

Oot Bytecode (OotB?) is a simple, easy to implement virtual machine. Oot Core is implemented in OotB?. OotB? only has a few 'primitive' instructions; this language is called 'primitive OotB?'. The OotB? Stdlib contains implementations of other instructions built out of the primitive instructions. This makes it easy to bootstrap a naive implementation of Oot; all you have to do is to implement an interpreter for primitive OotB?.

Motivation

Why did we invent yet another bytecode format? Design goals include:

Popular existing bytecodes are not 'very easy' to port to a new platform, and are too 'low level' to directly preserve high-level language constructs (HLL constructs could be preserved via annotation, but then the annotation format effectively forms another programming language anyways).

Oot Bytecode is very easy to initially implement because it has so few primitive instructions. This naive implementation will be very slow but can be incrementally improved by overriding the default interpretation of various stddlib instructions with custom versions using the target platform's native constructs.

Oot Bytecode preserves high-level language constructs via the 'stdlib' instructions that are built out of the primitives. This allows a compiler to an HLL to emit somewhat readable target code that uses the platform's native data structures just by recognizing those stdlib instructions that have an obvious counterpart in the target platform.

Note that we want to be able to efficiently compile the Oot Core implementation from OotB? to even a very inexpressive, static language. Therefore we try to limit run-time dynamicity in OotB? and in the Oot Core implementation.

Overview

Oot Bytecode is the object language of an imperative virtual machine with very few primitive instructions. We hoped that typical Oot Bytecode will be able to be compiled with relatively little effort into relatively readable code in foreign HLLs.

It is similar to the ISA of a CISC 3-operand memory-memory machine, but it is more abstract (perhaps similarly to Lua bytecode). It abstracts away from details such as data representation format, and its 'memory locations' or 'registers' have semantics more similar to local variable identifiers rather than an array of bytes (in this way it is more similar to eg Lua bytecode). It has operations analogous to pointer arithmetic, but these actually correspond to moving across fields in a struct rather than to indexing into an array of bytes. It is relatively agnostic regarding the actual bit width of integers, pointers, etc; in fact, a single item in the constant table can represent an arbitrarily large object. It is a memory-memory (rather than register) machine, but it can directly address (with absolute immediate addressing) only 4k 'memory locations'; however, opaque pointers can point to other 'memory locations' beyond what can be directly addressed.

It has 8 addressing modes, plus a 'boxed' addressing mode bit. The use of 3-operand format and 8 addressing modes is designed to reduce the need for intermediate instructions moving data between virtual registers, which makes it slightly harder to compile programs in other VM architectures into readable HLL code.

It has 3 instruction encodings, SHORT, MEDIUM, and LONG, all three of which consist of a sequence of 64-bit units. SHORT packs 8 fixed-length instructions into each 64-bit unit; MEDIUM is a fixed-length encoding with 64-bits per instruction; and LONG is a variable-length encoding. MEDIUM is the basic encoding; SHORT is a denser encoding that can only represent a subset of instructions; and LONG is an extensible encoding.

The compilation unit for Oot Bytecode is an Oot Bytecode Module. Modules have important limits as a consequence of length limitations in the 64-bit MEDIUM encoding; most notably, the number of functions directly exported by one single module cannot exceed 2048 (a single module can, however, also export other modules, and collectively these can export more than 2048 functions).

Oot Bytecode has polymorphic typing and also has first-class functions and has a function calling syntax in which the opcode field of the instruction, instead of containing a number identifing a Oot Bytecode Core instruction, contains a reference to a memory location which itself contains a reference to a particular function. This is accomplished by giving an addressing mode to the opcode (hence, the opcode field is called the 'opcode operand').

In addition, many integer opcodes are assigned to refer not to Oot Bytecode Core primitive instructions, but rather to 'custom instructions' implemented in the Oot Bytecode Standard Library. These opcodes do not need to be implemented by an Oot Bytecode implementer; they need only implement the Oot Bytecode Core instructions, since the Library is ultimately defined in terms of those primitives. Advanced implementations can treat the custom instructions as compile-time macros and expand and inline them at compile-time; or they can replace the provided implementations with platform-native implementations.

Oot Assembly, the textual language, also has a separate compile-time macro facility, which are expanded during compilation to Oot Bytecode. Why have 'custom instructions' if you already have compile-time macros? Because Oot Bytecode 'custom instructions' can more easily be replaced by with platform-native implementations by an interpreter.

SootB

todo now it's the same as SHORT, plus 64-bit LOADI (16-bit immediates) with restricted addr modes, and maybe a STOREI that can write to 11-bits of registers

SootB? is a simple stack-machine single-threaded subset of OotB?. SootB? is designed to be really easy to implement an interpreter for. There is a compiler from OotB? interpreter to SootB? implemented in SootB?, so this allows you to bootstrap OotB? and hence Oot from just a SootB? interpreter. This will be slow and you will probably want to eventually extend your SootB? interpreter into a full OotB? interpreter, but in the meantime you'll have a working Oot installation. Since SootB? is a subset of OotB?, you can work on extending it incrementally (however you won't experience any speed gains until you are all done, at which point you can throw out the provided OotB?-on-SootB? interpreter and replace it with your own native OotB? interpreter).

In the following, TOS stands for 'top-of-stack', the value on top of the stack.

A SootB? machine has program code, a program counter, a stack, a linear memory, and an error register. It is untyped. Each location in the stack and the memory can hold a value which can be treated alternately as an unsigned short integer (which can at least hold values between 0 to 32767, inclusive), or as a pointer. Pointer arithmetic and arithmetic between pointers and integers is permitted.

Program inputs including any file descriptors needed for console I/O are placed on the stack before starting the program. The program terminates when a HALT instruction is encountered.

SootB? instructions:

Instructions must be 64-bit aligned. Note that ANNOTATE, J, BRZ, BRNZ, LOAD, STORE, SYSCALL, and LOADI are encoded as 16 bits (4 bit opcode and 12 bit data), and everything else is just 4-bits (just the opcode).

The code of the initial program is not necessarily accessible via memory. However, you can allocate memory and put code in it and jump into that.

Program code does not necessarily have to fit in 4k, but you can only absolute jump to the first 4k of memory. You could have a jump table and LOAD an item from it and then JPOP to that, though. Similarly, by storing a pointer in the 4k you can load/store beyond 4k, but you can only directly load/store to the first 4k.

Annotation types:

some syscalls:

In more detail:

todo: we need some way for GC to query for a list of pointers in any memory subspace, or at least if a given memory location contains a pointer

Instruction encoding:

Each instruction is encoded as a fixed-length 64-bit chunk, given below:

todo

How SootB is a subset of OotB

Each SootB? instruction is also an OotB? instruction:

OOTB imports and loading

Behavior inherited from Oot

how module loading works

To minimize PATH manipulation, the PATH must be manipulated via the path_insert() and path_remove() functions. To minimize filesystem queries, the contents of filesystem directories in the PATH may be examined and cached at any time (they may or may not be examined lazily rather than at program start); you must call the system path_cache_invalid(subpath) function if you need to discard part of this cache after a possible mid-runtime change in the contents of these directories.

The init() functions of modules are run at some point before the namespace exported by a module is accessed by other modules, but possibly lazily, in parallel, and in any order. A module may import a module that imports it (cyclic imports are allowed). However, the init()s of modules must not cyclicly depend on one another.

If a module is imported by multiple other modules, these may or may not be realized as separate instances of the module.

Authentication

A module is considered 'weakly authenticated' or just 'authenticated' if it is signed by a public key trusted by the implementation (typically this will include keys given on the commandline and keys in a persistent keystore), and if all of its imports are authenticated (or, in the case of cyclic imports, if all modules in the cycle are signed by a public key trusted by the implementation). It is considered 'strongly authenticated' if it is authenticated and all of its non-official import statements include an inline public key or public key hash, and all of its imports are strongly authenticated (or, in the case of cyclic imports, if all modules in the cycle meet these conditions).

Oot implementations may include commandline switches to determine:

Instruction encodings

todo: the following is a little out of date. Now the shortest format is 8 bits, with 2 form bits (Bootx; 64 opcodes which are just shortcuts for longer fully specified instructions); the next is 16 bits (Boot), with 1 form bit (and 4 bit opcode, 3 bit operand, 2x4-bit operands); the next is 32 bits (BootX?), with 3 form bits (and 5 bit opcode, 3x4bit operands, 4x3bit addr modes); the next is 64 bits (OVM), with 4 form bits (and 5x8bit fields, 5x4bit addr modes; fields are opcode, generic polymorphism type specifier, 3x operands; addr modes are as in BootX? plus another bit for boxed/unboxed).

Oot Bytecode instructions come in (64-bit-aligned) chunks of 64 bits. Each 64-bit chunk is either in SHORT, MEDIUM, or LONG format. The first one or two bits (MSBs) are 'format bits' which indicate whether the format is SHORT, MEDIUM, or LONG. If the first bit is 0, then the format is SHORT. If the first two bits are 11, then the format is MEDIUM. If the first two bits are 10, then the format is LONG. That is to say,

SHORT encoding format:

4 opcode bits + 3 x (2 addr mode bits + 2 operand bits)

(note: every 64 bits, the first opcode bit must be "0")

MEDIUM encoding format:

2 form bits, 2 implementation-dependent bits, 4 x (3 addr mode bits + 12 operand-data bits)

LONG encoding format:

TODO mb add csexp and XML stuff, see ootAssemblyNotes13.

LONG format is a variable-length instruction format. Then length of LONG instructions range from 64-bits to 2048-bits (128 16-bit words). The length of a LONG instruction is always a multiple of 64-bits.

LONG consists of a stream of 16-bit words. This is organized into:

The first word is the initial LONG header:

The amount of trailing padding can be inferred by rounding up 'length' to the next multiple of 4, and subtracting the given 'length' from that number.

LONG_TYPE:

In general, LONG_TYPEs below 64 may be skipped.

A phrase header word is of one of two formats, distinguished by the value of their most-significant bit:

Single-word phrase header format:

Multi-word phrase header format:

When the payload-is-phrase bit is 0, this means that the word immediately following the header is a 16-bit literal constituting the 'payload' (role 'operation/payload') of the phrase; if the phrase length is longer than 1, then the next word after that must be a (sub-)phrase header. When the payload-is-phrase bit is 1, this means that the word immediately following the header is a (sub-)phrase header (and the payload, if there is one, will be a subphrase will role 'payload', or with roles 0-15).

To encode a payload literal longer than 16 bits (available choices are 32 bits, 64 bits, 128 bits, and 256 bits), set payload-is-phrase to 'true' and then encode the payload within a subphrase with some role 0-15 (the role here indicates the amount of padding within the subphrase). Note that these subphrases must have leading padding as needed so that the payload literal is aligned. Phrases with roles 0-15 must always have 'payload-is-phrase' set to 0 and must not have any subphrases.

A sub-phrase is just a nested phrase and has the same format as a phrase.

The 'phrase role' defines the semantics of the phrase with respect to its parent phrase (or with respect to the entire instruction, if top-level). (A subset of) 32 roles are currently defined:

roles 0-15:

roles 16-31:

A null single-word phrase header (that is, one whose value as an integer is 0) is typically used for padding. Note that any phrase header whose value as an unsigned integer is less than 2048 has the most-significant-5-bits set to 0, meaning that it is a single-word phrase header of role 0, and can be ignored.

A top-level 'operation' role is like the opcode field of a MEDIUM instruction (field 1). A top-level 'input' role is like the input operands of a MEDIUM instruction; to give multiple operands, have multiple phrases with the 'input' role, with the 'rightmost' one first (eg the first phrase with an 'input' role corresponds to field 4 of a typical MEDIUM instruction, and the next phrase with an 'input' role corresponds to field 3 of a typical MEDIUM instruction). A top-level 'output' role is like the output operands of a MEDIUM instruction, that is, the 'dest' field (field 2).

(todo: what about instructions with atypical numbers of inputs and outputs? in that case, do we follow instruction semantics, or stick with MEDIUM field correspondence?)

A top-level 'context' role is like the 'context' field (field 0) of a MEDIUM instruction.

To encode an immediate constant operand, just place the constant literal as the payload of a phrase with a suitable role (eg to encode an immediate literal opcode, create a phrase with an 'operation' role and place a word with the opcode itself as the payload of this phrase).

To encode an operand with some other addr mode, set the operand as the payload, but include a sibling phrase of role 'addr mode' with the desired addr mode as payload.

To encode a CALL instruction with many arguments, just give many phrases with role 'input'.

The semantics of the other roles (annotation, both input and output, modality, constraint, conjunction, alternate format, grouping) have not yet been defined.

Addressing modes

addr modes: 000: immediate constant 001: register direct (note: in SHORT mode, register direct applied to DATASTACK or CALLSTACK (registers 0 and 1) operates on registers 4 and 5 instead; in other modes, it is simply illegal to apply register direct to DATASTACK or CALLSTACK). Similary, during execution of a custom instruction in SHORT mode, register direct on register 2 redirects to register 6. 010: register indirect 011: stack / register indirect with the value in the register post-increment on read, pre-decrement on write

100: constant table 101: boxed register direct 110: boxed register indirect 111: boxed stack / register indirect with the value in the register post-increment on read, pre-decrement on write

With the side-effecting addr mode (post/pre increment/decrement, 0011), if there are more than one in the same instruction, then for the primitive instructions, they are processed in order of right to left.

Note that only addressing modes 000-011 are available for the SHORT opcode operand.

immediate constant: the integer contained in the operand is the value. If an immediate constant is given as an argument to an instruction which expects a writable effective address, then a 'dummy' effective address is generated, but writes to it are discarded. Note: a read from a 'boxed' immediate constant is a constant table read.

constant table: the integer contained in the operand is an index into the module's constant table. If an constant table is given as an argument to an instruction which expects a writable effective address, then a 'dummy' effective address is generated, but writes to it are discarded.

register direct: the operand is the effective address

register indirect: the operand is interpreted as an address, and the effective address is the value found at that address. Note: This address mode provides the functionality provided by LOAD/STORE in RISC architectures.

indirect-offset: The operand is split into two operands; let A = the first 6 bits, and let B = the second 6 bits. A is interpreted as an address; let C be the value found at that address, where C is to be interpreted as an address. The effective address is C + B. Note: this addressing mode provides efficient access to fields within a struct, where a pointer to the struct is in one of the first 64 memory locations, and the struct has up to 64 fields.

stack (indirect with read post-increment, write pre-decrement): This is like 'register indirect', except that if the effective address is read from, the value at the address in the operand is incremented after the rest of the instruction executes, and if the effective address is written to, the value at the address in the operand first incremented before the effective address is determined.Note: This addressing mode provides the equivalent of POP and PUSH commands when applied to a stack pointer with a downward-growing stack, and also can be used for scanning an array in a loop.

boxed addressing modes: whenever a value is read or written, a 'boxing function' is called; in the case of a read, it is passed the value and returns the value, in the case of a write (to a location other than the DISCARD location, 0), it is passed an effective address and is expected to write to that address. todo: how to determine the boxing function? The two best choices seem to be: per-program, or per-type(class). but even if per-program, doesnt the boxing fn need to be told the type, so that it can add in a type tag? what about preserving boxed state over fn calls, doesn't that imply that the boxer is a wrapper rather than a getter/setter; or are Oot Views implemented some other way?

(note / design choices / motivations: why provide stack addressing if there is also register addressing? (a) a way to segregate 'temporaries' from 'local variables' (b) natural data structure for some language-y things, such as call stack (c) a way to hold more things in memory than there are registers with efficient allocation/deallocation, as long as you only use the ones near the top of the stack)

Opcodes

Opcodes 0-15 are 'primitive'. Opcodes 16-127 are either unassigned or custom. Opcodes 128-255 are sugar for first-class-function-style CALLs to stdlib functions.

Custom instructions

The implementation for a custom instruction cannot be recursive; it cannot call other custom instructions that directly or indirect call itself; that is, the static call graph (or dependency graph) between custom instructions must be acyclic.

The implicit 'call stack' of custom instructions has a depth limit of 4; a custom instruction cannot call a custom instruction which calls a custom instruction which calls a custom instruction which calls a custom instruction.

The implementation of a custom instruction, if any contained custom instructions are recursively expanded until only primitive (Oot Bytecode Core) instructions remain, must take up less than 2048 Oot Bytecode Core instructions.

These restrictions allow an Oot Bytecode implementation to implement custom instructions by:

Note that these restrictions don't apply to all dynamic first-class-functions, just to custom instructions. Custom instructions (like core instructions) are invoked by an address mode of 0000 (immediate constant) on the opcode operand. Ordinary first-class functions are invoked by other address modes on the opcode operand.

(todo: also, can a 'custom instruction' call an ordinary function in the standard library? surely it can CALL an address passed into it, so the question is not whether it can CALL at all, it's just, is it allowed to have prior knowledge of the standard library functions? i'm leaning towards yes; this isn't much different from allowing it to access numerical constants in its module's constant table)

Custom instructions do not necessarily execute atomically; care should be taken to write custom instructions in a threadsafe manner. Stdlib custom instructions are threadsafe unless otherwise noted.

Global constants

Constant 0 is nil. Constant 1 is the current module.

Primitive types

Note: type safety is extremely important for the 'capability' type because an implementation could break capability security if it could forge capabilities by implicit coercion from other types. Therefore, items of type CAPABILITY can only appear in the CAP register or in address spaces created by CAPMALLOC.

Stdlib types

todo: move this to a separate file

these types used in initial loading:

(note: even though typeclasses and types (abstract and concrete types) are both listed here, they are distinct in that a type signature like

(==) :: (Eq a) => a -> a -> Bool

cannot be satisfied if the two inputs are different concrete types. )

(note: in general, actual function signatures use type constraints and type variables, like Haskell; just using abstract and concrete types directly cannot represent the difference between

(==) :: (Eq a) => a -> a -> Bool and (==) :: (Eq a, Eq b) => a -> b -> Bool

eg, whether it is a typechecking error to compare ints and strings for equality)

we could have a shorthand though where these 'types' are used directly in the function signature (would that mean the first or the second of these possibilities?)

also, i guess we could use these 'types' are used in stack maps. That implies that they should be the second possiblity when used in the type signature, eg

(==) :: Eq -> Eq -> Bool

is shorthand for

(==) :: (Eq a, Eq b) => a -> b -> Bool

because there can be no guarantee that two positions in the stack map that are both Eq have the same concrete type, just that each of their concrete types is Eq.

)

Memory

There are 256 directly-accessible memory locations or 'registers'.

The first 8 registers have special meanings:

todo: frame pointer? (%rbp vs %rsp)

todo: really?: It is illegal for user code to access the DATASTACK or CALLSTACK registers using REGISTER addressing mode (register indirect is allowed though); this is because the stacks might not be implemented using memory-mapped stacks. Similarly for R3 if during the execution of a 'custom instruction'.

It is illegal for user code to access the 'reserved for implementation' registers.

Threads and processes

OotB? threads and processes aren't necessarily OS threads and processes. An OotB? thread has its own registers, data stack, and call stack, but shares other memory with other threads in the same process. An OotB?