proj-oot-ootAssemblyThoughts

Oot Bytecode and Oot Assembly

See https://gitlab.com/oot/ovm

Introduction

Oot Bytecode (OotB) is a simple, easy-to-implement virtual machine. Oot Core is implemented in OotB. OotB has only a few 'primitive' instructions; this subset is called 'primitive OotB'. The OotB Stdlib contains implementations of other instructions built out of the primitive instructions. This makes it easy to bootstrap a naive implementation of Oot; all you have to do is implement an interpreter for primitive OotB.

Motivation

Why did we invent yet another bytecode format? Design goals include:

Popular existing bytecodes are not 'very easy' to port to a new platform, and are too 'low level' to directly preserve high-level language constructs (HLL constructs could be preserved via annotation, but then the annotation format effectively forms another programming language anyways).

Oot Bytecode is very easy to initially implement because it has so few primitive instructions. This naive implementation will be very slow but can be incrementally improved by overriding the default interpretation of various stdlib instructions with custom versions using the target platform's native constructs.

Oot Bytecode preserves high-level language constructs via the 'stdlib' instructions that are built out of the primitives. This allows a compiler to an HLL to emit somewhat readable target code that uses the platform's native data structures just by recognizing those stdlib instructions that have an obvious counterpart in the target platform.

Note that we want to be able to efficiently compile the Oot Core implementation from OotB to even a very inexpressive, static language. Therefore we try to limit run-time dynamicity in OotB and in the Oot Core implementation.

Overview

Oot Bytecode is the object language of an imperative virtual machine with very few primitive instructions. We hope that typical Oot Bytecode can be compiled with relatively little effort into relatively readable code in foreign HLLs.

It is similar to the ISA of a CISC 3-operand memory-memory machine, but it is more abstract (perhaps similarly to Lua bytecode). It abstracts away from details such as data representation format, and its 'memory locations' or 'registers' have semantics more like local variable identifiers than like an array of bytes (in this way it is again similar to eg Lua bytecode). It has operations analogous to pointer arithmetic, but these actually correspond to moving across fields in a struct rather than to indexing into an array of bytes. It is relatively agnostic regarding the actual bit width of integers, pointers, etc; in fact, a single item in the constant table can represent an arbitrarily large object. It is a memory-memory (rather than register) machine, but it can directly address (with absolute immediate addressing) only 4k 'memory locations'; however, opaque pointers can point to other 'memory locations' beyond what can be directly addressed.

It has 8 addressing modes, plus a 'boxed' addressing mode bit. The use of a 3-operand format and 8 addressing modes is designed to reduce the need for intermediate instructions that move data between virtual registers; such intermediate instructions make programs in other VM architectures slightly harder to compile into readable HLL code.

It has 3 instruction encodings, SHORT, MEDIUM, and LONG, all three of which consist of a sequence of 64-bit units. SHORT packs 8 fixed-length instructions into each 64-bit unit; MEDIUM is a fixed-length encoding with 64-bits per instruction; and LONG is a variable-length encoding. MEDIUM is the basic encoding; SHORT is a denser encoding that can only represent a subset of instructions; and LONG is an extensible encoding.

The compilation unit for Oot Bytecode is an Oot Bytecode Module. Modules have important limits as a consequence of length limitations in the 64-bit MEDIUM encoding; most notably, the number of functions directly exported by one single module cannot exceed 2048 (a single module can, however, also export other modules, and collectively these can export more than 2048 functions).

Oot Bytecode has polymorphic typing, first-class functions, and a function-calling syntax in which the opcode field of the instruction, instead of containing a number identifying an Oot Bytecode Core instruction, contains a reference to a memory location which itself contains a reference to a particular function. This is accomplished by giving an addressing mode to the opcode (hence, the opcode field is called the 'opcode operand').

In addition, many integer opcodes are assigned to refer not to Oot Bytecode Core primitive instructions, but rather to 'custom instructions' implemented in the Oot Bytecode Standard Library. These opcodes do not need to be implemented by an Oot Bytecode implementer; they need only implement the Oot Bytecode Core instructions, since the Library is ultimately defined in terms of those primitives. Advanced implementations can treat the custom instructions as compile-time macros and expand and inline them at compile-time; or they can replace the provided implementations with platform-native implementations.

Oot Assembly, the textual language, also has a separate compile-time macro facility; these macros are expanded during compilation to Oot Bytecode. Why have 'custom instructions' if you already have compile-time macros? Because Oot Bytecode 'custom instructions' can more easily be replaced with platform-native implementations by an interpreter.

SootB

todo now it's the same as SHORT, plus 64-bit LOADI (16-bit immediates) with restricted addr modes, and maybe a STOREI that can write to 11-bits of registers

SootB is a simple stack-machine, single-threaded subset of OotB. SootB is designed to be really easy to implement an interpreter for. There is an OotB interpreter implemented in SootB, so this allows you to bootstrap OotB (and hence Oot) from just a SootB interpreter. This will be slow and you will probably want to eventually extend your SootB interpreter into a full OotB interpreter, but in the meantime you'll have a working Oot installation. Since SootB is a subset of OotB, you can work on extending it incrementally (however, you won't experience any speed gains until you are all done, at which point you can throw out the provided OotB-on-SootB interpreter and replace it with your own native OotB interpreter).

In the following, TOS stands for 'top-of-stack', the value on top of the stack.

A SootB machine has program code, a program counter, a stack, a linear memory, and an error register. It is untyped. Each location in the stack and the memory can hold a value which can be treated alternately as an unsigned short integer (which can hold at least the values from 0 to 32767, inclusive), or as a pointer. Pointer arithmetic and arithmetic between pointers and integers is permitted.

Program inputs including any file descriptors needed for console I/O are placed on the stack before starting the program. The program terminates when a HALT instruction is encountered.

SootB instructions:

Instructions must be 64-bit aligned. Note that ANNOTATE, J, BRZ, BRNZ, LOAD, STORE, SYSCALL, and LOADI are encoded as 16 bits (4 bit opcode and 12 bit data), and everything else is just 4-bits (just the opcode).

The code of the initial program is not necessarily accessible via memory. However, you can allocate memory and put code in it and jump into that.

Program code does not necessarily have to fit in 4k, but you can only absolute jump to the first 4k of memory. You could have a jump table and LOAD an item from it and then JPOP to that, though. Similarly, by storing a pointer in the 4k you can load/store beyond 4k, but you can only directly load/store to the first 4k.

Annotation types:

some syscalls:

In more detail:

todo: we need some way for GC to query for a list of pointers in any memory subspace, or at least if a given memory location contains a pointer

Instruction encoding:

Each instruction is encoded as a fixed-length 64-bit chunk, given below:

todo

How SootB is a subset of OotB

Each SootB instruction is also an OotB instruction:

OOTB imports and loading

Behavior inherited from Oot

how module loading works

To minimize PATH manipulation, the PATH must be manipulated via the path_insert() and path_remove() functions. To minimize filesystem queries, the contents of filesystem directories in the PATH may be examined and cached at any time (they may or may not be examined lazily rather than at program start); you must call the system path_cache_invalid(subpath) function if you need to discard part of this cache after a possible mid-runtime change in the contents of these directories.

The init() functions of modules are run at some point before the namespace exported by a module is accessed by other modules, but possibly lazily, in parallel, and in any order. A module may import a module that imports it (cyclic imports are allowed). However, the init()s of modules must not cyclically depend on one another.

If a module is imported by multiple other modules, these may or may not be realized as separate instances of the module.

Authentication

A module is considered 'weakly authenticated' or just 'authenticated' if it is signed by a public key trusted by the implementation (typically this will include keys given on the commandline and keys in a persistent keystore), and if all of its imports are authenticated (or, in the case of cyclic imports, if all modules in the cycle are signed by a public key trusted by the implementation). It is considered 'strongly authenticated' if it is authenticated and all of its non-official import statements include an inline public key or public key hash, and all of its imports are strongly authenticated (or, in the case of cyclic imports, if all modules in the cycle meet these conditions).

Oot implementations may include commandline switches to determine:

Instruction encodings

todo: the following is a little out of date. The current formats are:

8 bits (Bootx): 2 form bits; 64 opcodes which are just shortcuts for longer fully specified instructions.

16 bits (Boot): 1 form bit, 4-bit opcode, 3-bit operand, 2x4-bit operands.

32 bits (BootX): 3 form bits, 5-bit opcode, 3x4-bit operands, 4x3-bit addr modes.

64 bits (OVM): 4 form bits, 5x8-bit fields, 5x4-bit addr modes; the fields are opcode, generic polymorphism type specifier, and 3 operands; the addr modes are as in BootX plus another bit for boxed/unboxed.

Oot Bytecode instructions come in (64-bit-aligned) chunks of 64 bits. Each 64-bit chunk is either in SHORT, MEDIUM, or LONG format. The first one or two bits (MSBs) are 'format bits' which indicate whether the format is SHORT, MEDIUM, or LONG. If the first bit is 0, then the format is SHORT. If the first two bits are 11, then the format is MEDIUM. If the first two bits are 10, then the format is LONG.
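
To make the dispatch concrete, here is a minimal Python sketch that classifies a 64-bit chunk by its format bits; the function name and the convention that bit 63 is the first (most significant) bit are assumptions for illustration, not part of the spec.

def chunk_format(chunk):
    # 'chunk' is a 64-bit chunk as an unsigned integer; bit 63 is the first bit
    if (chunk >> 63) == 0:
        return 'SHORT'
    if (chunk >> 62) == 0b11:
        return 'MEDIUM'
    return 'LONG'                    # top two bits are 10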

SHORT encoding format:

4 opcode bits + 3 x (2 addr mode bits + 2 operand bits)

(note: every 64 bits, the first opcode bit must be "0")
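
As a rough sketch of the field layout just described, the following decodes one 16-bit SHORT slot; the MSB-first ordering of the fields within the slot is an assumption for illustration.

def decode_short_slot(word):
    # word: one 16-bit SHORT slot: 4 opcode bits, then 3 x (2 addr-mode bits + 2 operand bits)
    opcode = (word >> 12) & 0xF      # in the first slot of a chunk, the top opcode bit is the '0' format bit
    operands = []
    for shift in (8, 4, 0):          # three operand fields, assumed packed MSB-first
        mode = (word >> (shift + 2)) & 0x3
        data = (word >> shift) & 0x3
        operands.append((mode, data))
    return opcode, operands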

MEDIUM encoding format:

2 form bits, 2 implementation-dependent bits, 4 x (3 addr mode bits + 12 operand-data bits)
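
And a corresponding hedged sketch for MEDIUM; which of the four (addr mode, data) fields plays the role of opcode operand, destination, or input is left to the field-role discussion further below, and the MSB-first ordering here is again an assumption.

def decode_medium(chunk):
    # chunk: a 64-bit MEDIUM instruction (top two form bits are 11)
    impl_bits = (chunk >> 60) & 0x3            # 2 implementation-dependent bits
    fields = []
    for i in range(4):                         # 4 x (3 addr-mode bits + 12 operand-data bits)
        shift = 45 - 15 * i
        mode = (chunk >> (shift + 12)) & 0x7
        data = (chunk >> shift) & 0xFFF
        fields.append((mode, data))
    return impl_bits, fields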

LONG encoding format:

TODO mb add csexp and XML stuff, see ootAssemblyNotes13.

LONG format is a variable-length instruction format. The length of a LONG instruction ranges from 64 bits to 2048 bits (128 16-bit words), and is always a multiple of 64 bits.

LONG consists of a stream of 16-bit words. This is organized into:

The first word is the initial LONG header:

The amount of trailing padding can be inferred by rounding up 'length' to the next multiple of 4, and subtracting the given 'length' from that number.
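
For instance (a one-line sketch; 'length' is counted in 16-bit words, per the description above):

def trailing_padding_words(length):
    # pad the LONG instruction out to the next multiple of 4 16-bit words (a 64-bit boundary)
    return -length % 4        # e.g. length 5 -> 3 padding words; length 8 -> 0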

LONG_TYPE:

In general, LONG_TYPEs below 64 may be skipped.

A phrase header word is of one of two formats, distinguished by the value of their most-significant bit:

Single-word phrase header format:

Multi-word phrase header format:

When the payload-is-phrase bit is 0, this means that the word immediately following the header is a 16-bit literal constituting the 'payload' (role 'operation/payload') of the phrase; if the phrase length is longer than 1, then the next word after that must be a (sub-)phrase header. When the payload-is-phrase bit is 1, this means that the word immediately following the header is a (sub-)phrase header (and the payload, if there is one, will be a subphrase with role 'payload', or with roles 0-15).

To encode a payload literal longer than 16 bits (available choices are 32 bits, 64 bits, 128 bits, and 256 bits), set payload-is-phrase to 'true' and then encode the payload within a subphrase with some role 0-15 (the role here indicates the amount of padding within the subphrase). Note that these subphrases must have leading padding as needed so that the payload literal is aligned. Phrases with roles 0-15 must always have 'payload-is-phrase' set to 0 and must not have any subphrases.

A sub-phrase is just a nested phrase and has the same format as a phrase.

The 'phrase role' defines the semantics of the phrase with respect to its parent phrase (or with respect to the entire instruction, if top-level). (A subset of) 32 roles are currently defined:

roles 0-15:

roles 16-31:

A null single-word phrase header (that is, one whose value as an integer is 0) is typically used for padding. Note that any phrase header whose value as an unsigned integer is less than 2048 has the most-significant-5-bits set to 0, meaning that it is a single-word phrase header of role 0, and can be ignored.

A top-level 'operation' role is like the opcode field of a MEDIUM instruction (field 1). A top-level 'input' role is like the input operands of a MEDIUM instruction; to give multiple operands, have multiple phrases with the 'input' role, with the 'rightmost' one first (eg the first phrase with an 'input' role corresponds to field 4 of a typical MEDIUM instruction, and the next phrase with an 'input' role corresponds to field 3 of a typical MEDIUM instruction). A top-level 'output' role is like the output operands of a MEDIUM instruction, that is, the 'dest' field (field 2).

(todo: what about instructions with atypical numbers of inputs and outputs? in that case, do we follow instruction semantics, or stick with MEDIUM field correspondence?)

A top-level 'context' role is like the 'context' field (field 0) of a MEDIUM instruction.

To encode an immediate constant operand, just place the constant literal as the payload of a phrase with a suitable role (eg to encode an immediate literal opcode, create a phrase with an 'operation' role and place a word with the opcode itself as the payload of this phrase).

To encode an operand with some other addr mode, set the operand as the payload, but include a sibling phrase of role 'addr mode' with the desired addr mode as payload.

To encode a CALL instruction with many arguments, just give many phrases with role 'input'.

The semantics of the other roles (annotation, both input and output, modality, constraint, conjunction, alternate format, grouping) have not yet been defined.

Addressing modes

addr modes:

000: immediate constant
001: register direct (note: in SHORT mode, register direct applied to DATASTACK or CALLSTACK (registers 0 and 1) operates on registers 4 and 5 instead; in other modes, it is simply illegal to apply register direct to DATASTACK or CALLSTACK. Similarly, during execution of a custom instruction in SHORT mode, register direct on register 2 redirects to register 6.)
010: register indirect
011: stack / register indirect, with the value in the register post-incremented on read and pre-decremented on write
100: constant table
101: boxed register direct
110: boxed register indirect
111: boxed stack / register indirect, with the value in the register post-incremented on read and pre-decremented on write

With the side-effecting addr mode (stack mode 011: post-increment on read, pre-decrement on write), if more than one appears in the same instruction, then for the primitive instructions they are processed from right to left.

Note that only addressing modes 000-011 are available for the SHORT opcode operand.

immediate constant: the integer contained in the operand is the value. If an immediate constant is given as an argument to an instruction which expects a writable effective address, then a 'dummy' effective address is generated, but writes to it are discarded. Note: a read from a 'boxed' immediate constant is a constant table read.

constant table: the integer contained in the operand is an index into the module's constant table. If a constant table operand is given as an argument to an instruction which expects a writable effective address, then a 'dummy' effective address is generated, but writes to it are discarded.

register direct: the operand is the effective address

register indirect: the operand is interpreted as an address, and the effective address is the value found at that address. Note: This address mode provides the functionality provided by LOAD/STORE in RISC architectures.

indirect-offset: The operand is split into two fields; let A = the first 6 bits, and let B = the second 6 bits. A is interpreted as an address; let C be the value found at that address, where C is to be interpreted as an address. The effective address is C + B. Note: this addressing mode provides efficient access to fields within a struct, where a pointer to the struct is in one of the first 64 memory locations, and the struct has up to 64 fields.

stack (indirect with read post-increment, write pre-decrement): This is like 'register indirect', except that if the effective address is read from, the value held in the location named by the operand is incremented after the rest of the instruction executes, and if the effective address is written to, the value held in the location named by the operand is first decremented before the effective address is determined. Note: this addressing mode provides the equivalent of POP and PUSH commands when applied to a stack pointer with a downward-growing stack, and can also be used for scanning an array in a loop.

boxed addressing modes: whenever a value is read or written, a 'boxing function' is called; in the case of a read, it is passed the value and returns the value, in the case of a write (to a location other than the DISCARD location, 0), it is passed an effective address and is expected to write to that address. todo: how to determine the boxing function? The two best choices seem to be: per-program, or per-type(class). but even if per-program, doesn't the boxing fn need to be told the type, so that it can add in a type tag? what about preserving boxed state over fn calls, doesn't that imply that the boxer is a wrapper rather than a getter/setter; or are Oot Views implemented some other way?

(note / design choices / motivations: why provide stack addressing if there is also register addressing? (a) a way to segregate 'temporaries' from 'local variables' (b) natural data structure for some language-y things, such as call stack (c) a way to hold more things in memory than there are registers with efficient allocation/deallocation, as long as you only use the ones near the top of the stack)
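
A hedged sketch of operand reads under the unboxed modes described above; 'mem' stands for the directly addressable memory locations and 'ktab' for the module constant table (both names are assumptions), and the stack-mode post-increment is applied immediately here rather than after the whole instruction, which is a simplification.

def read_operand(mode, data, mem, ktab):
    if mode == 0b000:                # immediate constant
        return data
    if mode == 0b001:                # register direct
        return mem[data]
    if mode == 0b010:                # register indirect
        return mem[mem[data]]
    if mode == 0b011:                # stack: indirect, post-increment on read
        value = mem[mem[data]]
        mem[data] += 1
        return value
    if mode == 0b100:                # constant table (also: 'boxed' immediate)
        return ktab[data]
    raise NotImplementedError("boxed modes 101-111 route reads through a boxing function")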

Opcodes

Opcodes 0-15 are 'primitive'. Opcodes 16-127 are either unassigned or custom. Opcodes 128-255 are sugar for first-class-function-style CALLs to stdlib functions.

Custom instructions

The implementation of a custom instruction cannot be recursive; it cannot call other custom instructions that directly or indirectly call it; that is, the static call graph (or dependency graph) between custom instructions must be acyclic.

The implicit 'call stack' of custom instructions has a depth limit of 4; a custom instruction cannot call a custom instruction which calls a custom instruction which calls a custom instruction which calls a custom instruction.

The implementation of a custom instruction, if any contained custom instructions are recursively expanded until only primitive (Oot Bytecode Core) instructions remain, must take up less than 2048 Oot Bytecode Core instructions.
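
A hedged sketch of checking these three restrictions over the static call graph; 'calls' (mapping each custom instruction to the custom instructions it invokes, with repeats) and 'own_size' (its own count of core instructions, excluding expansions) are assumed inputs, and the exact depth-counting convention is a guess.

def check_custom_instruction(op, calls, own_size, depth=1, seen=()):
    assert depth <= 4, "custom-instruction call depth limit exceeded"
    assert op not in seen, "custom-instruction call graph must be acyclic"
    expanded = own_size[op]
    for callee in calls[op]:
        expanded += check_custom_instruction(callee, calls, own_size,
                                             depth + 1, seen + (op,))
    assert expanded < 2048, "expanded implementation must be under 2048 core instructions"
    return expanded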

These restrictions allow an Oot Bytecode implementation to implement custom instructions by:

Note that these restrictions don't apply to all dynamic first-class-functions, just to custom instructions. Custom instructions (like core instructions) are invoked by an address mode of 0000 (immediate constant) on the opcode operand. Ordinary first-class functions are invoked by other address modes on the opcode operand.

(todo: also, can a 'custom instruction' call an ordinary function in the standard library? surely it can CALL an address passed into it, so the question is not whether it can CALL at all, it's just, is it allowed to have prior knowledge of the standard library functions? i'm leaning towards yes; this isn't much different from allowing it to access numerical constants in its module's constant table)

Custom instructions do not necessarily execute atomically; care should be taken to write custom instructions in a threadsafe manner. Stdlib custom instructions are threadsafe unless otherwise noted.

Global constants

Constant 0 is nil. Constant 1 is the current module.

Primitive types

Note: type safety is extremely important for the 'capability' type because an implementation could break capability security if it could forge capabilities by implicit coercion from other types. Therefore, items of type CAPABILITY can only appear in the CAP register or in address spaces created by CAPMALLOC.

Stdlib types

todo: move this to a separate file

these types used in initial loading:

(note: even though typeclasses and types (abstract and concrete types) are both listed here, they are distinct in that a type signature like

(==) :: (Eq a) => a -> a -> Bool

cannot be satisfied if the two inputs are different concrete types. )

(note: in general, actual function signatures use type constraints and type variables, like Haskell; just using abstract and concrete types directly cannot represent the difference between

(==) :: (Eq a) => a -> a -> Bool and (==) :: (Eq a, Eq b) => a -> b -> Bool

eg, whether it is a typechecking error to compare ints and strings for equality)

we could have a shorthand though where these 'types' are used directly in the function signature (would that mean the first or the second of these possibilities?)

also, i guess we could use these 'types' in stack maps. That implies that they should be the second possibility when used in a type signature, eg

(==) :: Eq -> Eq -> Bool

is shorthand for

(==) :: (Eq a, Eq b) => a -> b -> Bool

because there can be no guarantee that two positions in the stack map that are both Eq have the same concrete type, just that each of their concrete types is Eq.

)

Memory

There are 256 directly-accessible memory locations or 'registers'.

The first 8 registers have special meanings:

todo: frame pointer? (%rbp vs %rsp)

todo: really?: It is illegal for user code to access the DATASTACK or CALLSTACK registers using REGISTER addressing mode (register indirect is allowed though); this is because the stacks might not be implemented using memory-mapped stacks. Similarly for R3 during the execution of a 'custom instruction'.

It is illegal for user code to access the 'reserved for implementation' registers.

Threads and processes

OotB threads and processes aren't necessarily OS threads and processes. An OotB thread has its own registers, data stack, and call stack, but shares other memory with other threads in the same process. An OotB process has its own memories, unlike threads; it also (like threads) has its own registers and stacks.

Each thread and each process has an integer ID, called a TID and PID respectively. Distinct threads are guaranteed to have distinct TIDs, and distinct processes are guaranteed to have distinct PIDs, but that's all that is guaranteed about them.

New threads and new processes are created via FORKTHREAD and FORKPROCESS, respectively. These take an argument which is an address to JMP to; the new child thread/process will start out as a clone of the old parent, but then the child will JMP to the given address whereas the parent will continue by executing the next instruction after the fork instruction. They return the TID or PID of the child thread or process.

Different processes and tasks may or may not execute in parallel. An implementation must guarantee, however, that even if one process or task goes into a tight infinite loop, all other processes and tasks still make progress.

When callbacks are present, there are no guarantees on when they may be called, and they may or may not execute in parallel on a separate thread.

Primitive instructions

Primitive opcodes range from 0 to 255 (although right now there are only TODO of them). The first 8 of these are reserved for metaprogramming.

Primitive instructions execute atomically.

0: comment
1: annotation (some kinds of annotation: code type annotation; memory type annotation; variable name annotation, plus how many instructions this name annotation is in scope for)
2: source map
3: assertion
?: are there prefix modifiers? (mb: 4: prefix annotation (?: i don't like this because the implementation must know its semantics in order to execute the code))
5: block begin/end (some kinds of blocks: stereotyped HLL idiom (execute this); data (skip this); code block (skip this))
6: alternative translation block (eg CALL as a single LONG instruction vs CALL as many MEDIUM instructions; but in general every alternative block may span multiple instructions). Perhaps this has the effect of an unconditional JMP to the last alternative block, for those interpreters that can't understand the other alternatives
7: ?

8: J (note: the operands are added together; if constant addressing mode, they refer to the LABEL constant table) (is the dest operand used as a link register, eg put the instruction after the current one into that? probably)
9: BZ (branch if zero)
10: BNZ (branch if not zero)
11: CPY (copy)
12: ADD-U15 (addition) (todo: should this just be polymorphic ADD?). The result of ADD-U15 must satisfy result < 2^15, otherwise it may crash or trap or yield the wrong result.
13: SUB-U15 (subtraction). The result of SUB-U15 must satisfy 0 <= result, otherwise it may crash or trap or yield the wrong result.
14: LEQ (<=)
15: EQ (arithmetic equality)
16: RECV: recv(dest, channel, callback). If 'callback' is 0, then blocking I/O. todo: how about just an address to write '1' to upon success?
17: SEND: send(channel, msg, callback). If 'callback' is 0, then blocking I/O. todo: how about just an address to write '1' to upon success?
18: SYSCALL: syscall(fn). Call external library function. This tells the implementation to use platform-native calling conventions to call 'fn', which is a reference to a native library function. The arguments to fn are found on the data stack, and after the call the result, if any, should be placed on the data stack.

Note that the result of ADD-U15 and SUB-U15 must satisfy 0 <= result < 2^15, otherwise they may crash or trap or yield the wrong result.
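
As a sketch of this contract (whether an out-of-range result crashes, traps, or yields a wrong result is left open above, so this sketch simply raises):

def add_u15(a, b):
    result = a + b
    if not 0 <= result < 2**15:
        raise OverflowError("ADD-U15 result out of range")
    return result

def sub_u15(a, b):
    result = a - b
    if not 0 <= result < 2**15:
        raise OverflowError("SUB-U15 result out of range")
    return result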

??: POLL: poll(dest, channel, timeout)
??: todo: if instead of callbacks, OPEN and RECV etc write a '1' to some memory location upon finishing, then can we use POLL to look for these?

?: OPEN: open(dest (channel output), filepath, callback) (note: 'filepath' may also be an integer file descriptor, may refer to channels as well as files, etc). If callback is 0, then blocking. todo: how to handle file mode? encoded in filepath? filepath is actually a tuple? callback is actually a tuple? todo: how about just an address to write '1' to upon success?
?: CLOSE: close(channel, callback). If callback is 0, then blocking. todo: how about just an address to write '1' to upon success?
?: LOADMOD: loadmodule; loadmod(dest, module_name)
?: UNLOADMOD: unloadmodule; unloadmod(module)

(note: SUB-U15V is not needed because we can use LEQ to test if y <= x when considering doing (x - y))
?: SEQ (structural equality)
?: LOG (todo)
?: NOP (no-op)
?: HALT (end program)
?: BREAK (break to debugger)
?: BAD (illegal instruction)
?: MALLOC and MFREE (MALLOC requires the 'm' capability. MALLOC conveys the capabilities to the new memory by putting them in the CAP register. All capabilities are conveyed unless the process lacks the 'e' capability, in which case 'x' will not be conveyed.)
?: MLEN (how many memory locations are in the given subaddress space?). ADDRSIZE(dest, pointer)

?: CAS. Compare-and-swap. CAS(dest, old, new); compares the contents of 'dest' with 'old' and, if they match, sets 'dest' to 'new'. Sets ERR to 1 if the compare failed, that is, if 'dest' did not hold 'old'; sets ERR to 0 if the compare succeeded. This is all atomic because all primitive instructions are atomic.
?: LOADL. Load-link. LL(dest, src). Like CPY except that in addition, the platform begins 'watching' the src address. Which address is being 'watched' is threadlocal. Only one address may be 'watched' at a time by any given thread; if another LOADL is executed, the old address is no longer watched.
?: STOREC. Store-conditional. STOREC(dest, src). Like CPY if (a) the address 'dest' is being 'watched' because it was 'src' in the most recent LOADL invocation on this thread, and (b) no thread has altered 'dest' (regardless of whether the value was changed back later) since the most recent LOADL on this thread; if these conditions are true, the value from 'src' is written to 'dest', and 0 is written to ERR; otherwise, 'dest' is not written to by this instruction, and 1 is written to ERR.
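
A minimal sketch of the CAS semantics as read here (i.e. the usual compare-and-swap: write 'new' into 'dest' on success); 'mem' is a memory mapping and 'err' a one-element list standing in for the ERR register, both assumptions. In a real implementation the whole operation is atomic, which plain Python does not by itself guarantee.

def cas(mem, err, dest, old, new):
    if mem[dest] == old:
        mem[dest] = new      # compare succeeded: swap in the new value
        err[0] = 0
    else:
        err[0] = 1           # compare failed: 'dest' is left unchanged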

??: FORKTHREAD(tid (dest), addr_to_jump_to_in_new_process). Requires the forking process to have the 'm' permission.
??: FORKPROCESS(pid (dest), addr_to_jump_to_in_new_process). Requires the forking process to have the 'm' permission.
??: GETTID(dest)
??: GETPID(dest)

?: CAPMALLOC, CAPMFREE: create/release a new subaddress space that can hold capabilities. CAPMALLOC requires the process to be holding the 'c' capability and the 'm' capability.
?: CAPMLEN (like MLEN for CAPMALLOC'd memory)
?: CAPUNION: capability union: CAPUNION(dest, cap1, cap2)
?: CAPONLYADDR: from a capability, create a new capability that only contains those permissions relating directly to the address subspace containing the given pointer. CAPONLYADDR(dest, cap, pointer)
?: CAPONLYPERM: from a capability, create a new capability that only contains those permissions in a certain type mask. CAPONLYPERM(dest, cap, permission_type_mask).
?: CAPGETPERM: query the permissions granted by a capability for the subaddress space containing the given pointer (returns a 16-bit permission_type_mask). CAPGETPERM(dest, cap, pointer)
?: CAPGETADDRS: query a list of all subaddress spaces (directly) contained in a capability, and create a (CAPMALLOC'd) address subspace containing them, returning this new subaddress space in a pointer. CAPPUSHADDRS(dest, cap). (note: this is a neat trick to create a low-level list, but it means that accesses to this list won't be boundschecked by higher-level code at typechecking time. I am a little uncomfortable with that. I suppose we can encapsulate this with a function returning a higher-level list later. But in fact, I'm somewhat uncomfortable with the very idea of subaddress spaces with dynamic size. I guess this can't be helped if we have dynamic-size MALLOCs, though?)
?: CAPSEAL: 'seal' a capability to an address, rendering it unusable except by CAPUNSEAL or CAPJMPSEAL. CAPSEAL(dest, cap, address). The purpose of this instruction is to allow an untrusted 'middleman' to transport capabilities without being able to use them.
?: CAPUNSEAL: unseal a sealed capability; requires 'u' permission on the address subspace containing the seal address. CAPUNSEAL(dest, cap)
?: CAPJMPSEAL: JMP to the given address, after first replacing CAP with the union of the contents of the given (potentially sealed) capability CAPEXE and the (potentially sealed) capability CAPDATA, provided that (a) both CAPDATA and CAPEXE are sealed to the same address, and (b) CAPEXE has (potentially sealed) execute permission on the address that is being JMP'd to. CAPJMPSEAL(addr, CAPEXE, CAPDATA). Note that 'u' unseal permission is NOT required. If there is an error preventing the JMP from completing, CAP will not be affected. The purpose of this instruction is to allow certain areas of memory to only be operated upon via calling into certain code.

??: EXEC_INSTR (from stack)

(note: ordinary CPYs can swap capabilities in and out of the CAP register, but a CAP can only be stored in an address subspace created with CAPMALLOC; all of these primitives must check to guarantee that capabilities are only being input from and output to either the CAP register or an address subspace created by CAPMALLOC)

(note: CAPDIFF and CAPISECT (set difference and intersection) and CAPCALLSEAL can be non-primitives)

Bootstrapping and VM plugins

In order to make it easy to write an OotB interpreter, much of the functionality of OotB has been implemented in OotB itself, in the form of custom instructions, syscalls, and a standard library. These components are loaded by the interpreter before program execution begins. This means that the only things that an implementor must implement are (a) primitive OotB, and (b) this bootstrapping system. The code of these components is part of the OotB specification and is not modifiable by conforming OotB programs or modules. The purpose of this is just to make it easier to port OotB to a new system, not to allow users to metaprogram OotB.

Although there will surely be some use-cases for implementations that allow the user to replace the provided components with others, for the most part, advanced interpreters can and should eventually override this 'plugin' system with an efficient native implementation of equivalent functionality, if developer time allows. Note however that changes to the code of these components will not in themselves be considered a backwards-incompatible change to OotB, provided that the change does not necessitate any backwards incompatibility observable by end users (even though any such change may require advanced interpreters, which have replaced the component in question with an optimized native implementation, to make deep changes to those native implementations). As OotB stabilizes, we might someday change the policy and begin considering any change to the code of these components as backwards-incompatible.

There are three files, loaded in order:

custom_instructions.ob0 and syscalls.ob0 share a format, given below. stdlib is in the format of an OotB module file.

The format for custom_instructions.ob0 and syscalls.ob0 is:

All byte ordering is little-endian. The file must be <64KiB?.

Custom instruction signatures must be one of the following:

TODO should we also put some jump tables in here? the SYSCALLS might be able to use that

TODO do we need a signature for each instruction so that stack addr mode knows which operands to decrement and which to increment and which to do both (=neither)?

More about custom_instructions.bin:

Note that 'requested stack depth' should be used to give the precomputed maximum stack depth required for the 'hidden stack' (since the custom instruction CFG is acyclic, a finite maximum depth can be computed if each custom instruction has a bounded maximum local stack usage).

Limitations on code in custom_instructions.bin:

Execution environment of custom_instructions.bin:

More about syscalls.bin:

The following syscalls are typically included:

TODO: can't we at first just put most of these into vmexec_instr for easier bootstrapping?

The following syscalls are considered 'vm plugins': vminit, vmexec_instr, vmlea*, vmsea, vmjmp, vmtimer, vmshort, vmlong, vmloadk, vmsyscall, vmterminate. This means that they are invoked automatically by the VM in the course of processing ordinary code (but not while processing code inside a syscall implementation). In conjunction with the other syscalls in syscalls.bin, these are responsible for implementing 'address subspaces', capabilities, pre-emption (via vmtimer), decoding of SHORT and LONG encodings, module management, etc.

Limitations on code in syscalls.bin:

Execution environment of syscalls:

The syscall global address space: The implementation provides a special global address space for use by syscalls (16 memory locations). The syscall global address space may contain arbitrary random data and should be initialized by vminit. It is not expected that this is the only global memory available; rather, syscalls are expected to MALLOC more blocks of memory as needed and store a root set of pointers to MALLOC'd blocks from these 16 global locations. todo: perhaps to avoid hidden state, we should make this part of the register set, and just forbid r/w from it unless you're in a syscall? sort of like the CAP register, i guess?

Each syscall is called with the following on the hidden stack: (a) the return address, and (b) the arguments, and (c) a pointer to the syscall global address space. The following instructions and syscalls have special behavior:

Note that although ordinary CALLs (including first-class functions) are caller-saved, SYSCALLs are not. SYSCALLs must avoid unintentionally clobbering registers.

At program start, vminit is called. When not inside a syscall, instructions are executed by calling vmexec_instr. All reads from memory go through vmlea, all writes to memory go through vmsea, and all jumps (but not relative branches) go through vmjmp. All reads and writes from/to 'special' registers (todo: which reg #s are those; these are things like PC, CAP, etc) go through lear and sear. SHORT encoded instruction groups are executed by calling vmshort and executing the code it returns. LONG encoded instructions are executed by passing control to vmlong. MALLOC and FREE syscalls call the implementations in syscalls.bin. Constant-table addressing is processed via the vmloadk syscall. vmtimer is called every k instructions (where k is a positive finite implementation- or user-defined constant). vmsyscall is called to dispatch syscalls. vmterminate is called upon abnormal program termination (and is ordinarily called by HALT). These syscalls are all called through the 'vmsyscall' call.

More about stdlib.oob:

stdlib.oob is loaded via the 'loadmodule' syscall, into module 1. When told to load to module 1, loadmodule should check if module 1 has already been loaded (via checking a flag that it maintains in the global syscall address block); if module 1 has been loaded before, it should return with an error. If module 1 has not been loaded, it should load the stdlib, but bypass signature checking, and then if the load is successful, it should store a reference to the stdlib into the global syscall address block, and globally mark module 1 as previously loaded. This procedure allows signature checking to be skipped for the stdlib (since the stdlib will be requested to be loaded by the implementation, which is trusted, before program start).

Capabilities security

Primitive capabilities (must be implemented by any primitive OotB implementation)

There is a 'capability register' CAP, which holds a CAPABILITY (which is a set of 'permissions'). Unboxing is capability-aware in the following sense: upon unboxing a capability input, the current value of CAP is saved on an internal stack, and CAP is replaced by CAP unioned with the new capability; upon the end of this instruction, however, the internal stack is popped and the previous CAP is restored.

The current capability set determines whether, for each subaddress space, memory addresses in that subaddress space may be:

Other bits in the capability are not per-address-space (that is, a process can use these permissions whenever the contents of the CAP register contain them):

permission_type_mask is a 16-bit mask.

The primitive OotB implementation must ensure that all reads check the 'r' capability, all writes check the 'w' capability, and all inter-subaddress-space jumps/branches/executions check the 'x' capability.

There is a TRANSITIVE 'Access' relation on subaddress spaces; if one subaddress space X 'has Access to' or 'Access'es another one Y, then if you have a capability on X it applies to Y too.

After calling MALLOC, the CAP register contains full permissions on the newly allocated memory. Note that none of the other capability instructions provide any way to increase the permissions in a capability unless you already have another capability with the other permissions.

Note that any executing code always has full permissions on the 'register' address space. todo: and beyond the top-of-the-stack for the data stack eg it can push new items and read what it pushed? for both stacks?

todo: a way to create subaddress spaces for the stack(s) 'up to some point'.

Non-primitive (boxed) capability-augmented pointers (implemented by the stdlib on top of primitives)

Boxed pointers may carry capabilities, holding the following data:

Array bounds are represented by the subaddress space, not by the capability; so narrowing such bounds is done by creating a new subaddress space (which is a view onto a subset of an existing one).

Introspection

Constant 0 is an object holding information about the current module and implementation. This information is constant over each invocation of the program. It includes:

ABI, calling convention, stack convention

Oot Assembly doesn't fix an ABI or calling convention in the sense of a specific, standard way for OotB binaries to be called into from outside code at a level below the FFI. For ease of interoperability, it is expected that each Oot implementation will adopt the most popular ABI on its target platform. However, within the Oot Assembly model, we have CALLs and we have a stack, so we need to define a convention for calling within OotB code. This does not mean that implementations 'actually' follow this ABI on the metal, just that the simulated OVM uses this convention within the VM being simulated.

OOTB File format

An Oot Bytecode module file has the following format. It is little-endian.

The filename must match the modulename (except that on platforms that support it, an '.OotB' extension must be added at the end), except on weird platforms where this isn't practical. Modulenames must start with a letter and contain only ASCII lowercase letters, numbers, and underscore. The length of every import path must be less than 256 (so modulenames should be well under 256). Motivated by Windows compatibility [1], the following module names are forbidden: CON, PRN, AUX, NUL, COM1, COM2, ..., COM9, LPT1, LPT2, ..., LPT9.

Header

The fixed-length header contains everything needed to determine whether this OotB module can satisfy some import request given in another module (assuming that this module is valid; that is checked later), except for the name of the module, which the loader already knows since it must match the filename. It contains the module version, custom attributes, and the public key of the signer, if any. It also contains the timestamp of the HLL source code from which this module was built, to allow the implementation to dynamically recompile the OotB module if its source has changed.

Header::TOC

The TOC lists offsets to the other parts of the module. This is to allow the module to be loaded in some way other than proceeding through each section one by one. The ordering of the other file segments is fixed, and each segment (except for the last segment, the code segment) is preceded by some sort of length, so the loader can also skip the TOC and just load everything else in order.

Each entry in the TOC is 4 bytes; it is an offset from beginning of file to beginning of the specified segment, or 0 if the specified segment is empty (even empty segments take up 2 bytes of space to say that they are of 0 length).

todo:

Filename

Signature

Import table

These entries are numbered starting with 1; number 0 refers to this module.

Each entry:

module import:

file import:

do we allow exclusion of specific versions?

Note: like Ruby Bundler or Rust Cargo .lock files, the point of the 'preferred' minor version is to give a version for all of the dependencies that the developer knows is jointly available and works together. No patch version is given because the system still prefers the highest available patch version (so as to support security updates); if the highest available patch version is 'bad' then the user will have to pick a different one on the commandline.

Constant table

Each entry:

todo: some items in the constant table will be imported from other modules; have a field with an index into the import table to specify the source module in that case

Note: the constants in the table will be loaded into constant 1 and up, because constant 0 is reserved for introspection on this module and on the runtime.

todo:

items:
* ?type?
* 2 bytes: 1/2 of length in bytes (note: min value is 1, max value is 2047; so min length is 2 bytes, max is 4094 bytes)
* n bytes: data

note: there should be a way to import an external file (at compile time) into a single constant. This will have no length limit.
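
A hedged sketch of reading one constant-table item under this layout (the still-undecided '?type?' field is omitted; 'buf' and 'pos' are assumed names):

import struct

def read_constant_item(buf, pos):
    (half_len,) = struct.unpack_from('<H', buf, pos)   # little-endian, per the file format
    length = 2 * half_len                              # the stored value is half of the byte length
    data = buf[pos + 2 : pos + 2 + length]
    return data, pos + 2 + length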

all memory segment types:

binary blob:

imported constant (note: after import, the type of this item will be whatever the type of the imported item is):

mmapped file resource:

Function table

todo: need a way for functions to declare that they won't use more than n registers and that they won't use more than m stack space (directly/per invocation; if they recursively call themselves they could use more overall)

Initial memory segment table

each entry is of the form:

Segment 0 is the program (type 'code segment'), segment 1 is the registers (type 'registers'), segment 2 is the data stack ('read/write memory segment'), and segment 3 is the call stack ('read/write memory segment').

The segment content consists of two 16-bit fields for each location in memory. The first 16 bits are interpreted as an 'address mode'. 0 and 1 are immediate or constant table. Other address modes 2-15 are illegal here, for now. The second 16-bit field is either an immediate value or an index into the constant table.

To encode a pointer to the beginning of another memory segment, the index to that memory segment (that is, an index into this table) is put into the constant table, and then the corresponding constant index is encoded here. Note that the constant table memory segment entries can also contain an offset, to construct pointers targeting the middle of a segment.
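
A sketch of materializing one initial-memory location from its two 16-bit fields, assuming mode 0 means immediate and mode 1 means constant-table index (the text above says '0 and 1 are immediate or constant table' without pinning down which is which), and assuming 'ktab' is the already-loaded constant table:

def init_location(mode, value, ktab):
    if mode == 0:
        return value                 # immediate (assumed)
    if mode == 1:
        return ktab[value]           # constant-table index (assumed); may resolve to a segment pointer
    raise ValueError("address modes 2-15 are illegal in initial memory segments")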

The purpose of this table is to minimize startup latency by allowing the module to 'preload' an initial state.

The reason this is separate from the constant table is to avoid cluttering the constant table with these segments, which are probably large and which will probably not be referred to after initialization.

Code segment

This is the last segment in an OotB file and has no length header.

Oot Assembly format (textual format)

Note that if you want to port Oot to a new platform, it is not necessary for you to implement an assembler that can read this format. All you have to do is implement Oot Bytecode. An assembler has already been implemented in Oot Bytecode.

The motivation for a textual assembly format is merely to provide a standard way for implementors to manually read and write Oot Bytecode without using a hex editor. Programmer conveniences are limited, as it is expected that most of the time implementors will not be writing Oot Bytecode directly.

todo: probably want to add S-expr representation for 'subexpression' addr mode

The Oot Assembly format makes the constant tables implicit.

An Oot Assembly file consists of ASCII text. Its syntax is 'line oriented'; a 'line' is delimited by either the character '\n' or the character ';'. However, a backslash followed by a newline is ignored (removed from the Oot Assembly file at an early stage of processing), and a semicolon inside of a string literal is treated as an ordinary character. A double backslash is treated as a single backslash. A single backslash is often treated as a control character sequence. Whitespace, consisting of the characters [ \t\n\v\r\f] (space, horizontal tab, newline, vertical tab, carriage return, and form feed), acts as a separator and is otherwise ignored, except for \n, which terminates lines as described above.

The '#' character is used for single-line comments; the '#' character and anything following it, up to the end of the line, is ignored. (these are NOT included in the Oot Bytecode target using the 'comment' opcode).

Empty lines (including lines consisting of nothing other than whitespace and comments) are ignored.

A line's type is determined by its first non-whitespace character, which is one of the following:

variable)")

Identifiers match the regex "[a-zA-Z_][a-zA-Z0-9_]+". Labels and constants are identifiers.

Directives are:

Label declaration are either:

Ordinary label identifiers must not start with a lower case letter. Ordinary labels must be unique within a module (and cannot have the same name as a compile-time constant) and are mapped to integers and included in the module's label constant table in the generated Oot Bytecode. Relative labels are resolved at compile-time into PC-relative offsets.

Non-anonymous local labels must be unique within the module. Anonymous local labels need not be unique.

Literals are:

Predefined compile-time constants are:

Label references look like:

Variables look like either:

Ordinary lines look like:

instruction_name operand1, operand2, operand3

By convention, operand1 is the 'destination' or 'output' or 'write' operand and operand2 and operand3 are the 'input' or 'read' operands, but some instructions don't have 2 inputs and 1 output and hence don't follow this convention. Actually, the opcode itself is also an operand, which we call the 'opcode operand' or 'operand0'.

To syntactically indicate an immediate or constant mode operand, just write the constant as a literal (the assembler will automatically identify constants which don't fit in immediate mode and put them in the module constant table).

To syntactically indicate a register mode operand, just use a variable.

todo much of the following is out of date.

For example, the following has register mode for the first (destination) operand, and immediate mode for the second and third (input) operands:

ADD x, 2, 2

To syntactically indicate a register indirect operand, give a variable prefixed with an asterisk ('*'). For example:

ADD *x, 2, 2

To syntactically indicate an indirect-offset operand, give '*(' followed by a variable followed by a '+' followed by an unsigned decimal integer or hex literal in the range [0, 63] followed by ')':

ADD *(x+20), 2, 2

To syntactically indicate an indirect-read-post-increment-write-pre-decrement operand, use the form PUSH or PUSH(variable) or --variable for operand1 or POP or POP(variable) or variable++ for operand2 or operand3; if '(variable)' is omitted, then '(DS)' is assumed:

ADD PUSH, POP, POP

To syntactically indicate an indirect-read-post-decrement-write-pre-increment operand, use the form ++variable for operand1 or variable-- for operand2 or operand3; if '(variable)' is omitted, then '(DS)' is assumed:

ADD ++x, y--, z--

To syntactically indicate a subexpression mode operand, use the form (@label) (or () to indicate a 'subexpression return'):

ADD (), 2, (@COMPUTE_SECOND_ADDEND)

(todo: is there a way to directly do something like a subexpression addr mode on a 0-ary first-class function? would we like an addr mode for that? of course we can always put the first-class function in a labeled block and call it)

Unboxed address modes are used by default. To use boxed (eg addr modes with the MSB set; addr modes 8-15), prefix with $. Eg:

ADD $(), $2, $(@COMPUTE_SECOND_ADDEND)

yields addr modes 15, 8, 15 instead of 7, 0, 7.
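
Pulling the surface syntax above together, here is a rough, hedged sketch of classifying a textual operand into a descriptive addressing-mode name; the mode names and regexes are illustrative guesses only, and, as noted above, parts of this syntax are out of date.

import re

def classify_operand(text):
    boxed = text.startswith('$')          # '$' prefix selects the boxed variant
    t = text[1:] if boxed else text
    if t in ('PUSH', 'POP') or re.fullmatch(r'PUSH\(\w+\)|POP\(\w+\)|--\w+|\+\+\w+|\w+\+\+|\w+--', t):
        mode = 'stack (pre/post inc/dec)'
    elif re.fullmatch(r'\*\(\w+\+(\d+|0x[0-9a-fA-F]+)\)', t):
        mode = 'indirect-offset'
    elif t.startswith('*'):
        mode = 'register indirect'
    elif t == '()' or re.fullmatch(r'\(@\w+\)', t):
        mode = 'subexpression'
    elif re.fullmatch(r'-?\d+|0x[0-9a-fA-F]+', t):
        mode = 'immediate/constant'
    else:
        mode = 'register direct'
    return mode, boxed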

Instruction_name can be a numerical opcode or constant instead of an instruction name identifier.

To get the numerical opcode or constant refering to an instruction, prefix the instruction name with &. For example, to copy the numerical opcode for ADD into variable 'x', use:

CPY x, &ADD

To write an instruction with an addressing mode other than immediate or constant, replace instruction_name with a parentheses-surrounded operand expression. For example, to add 2 and 2 and place the result into variable 'y', use:

CPY x, &ADD
(x) y, 2, 2

todo:

Check out:

and some specifics:

are there subexpressions, blocks?

Limits

There are only 2k directly-accessible memory locations or 'locals' or 'registers'.

Implicit lexical nesting via the 'subexpression' addressing mode must be acyclic and is limited to a depth of 255.

Within any one module, there can be only:

(numbers 0-2047 of each of these are reserved for the Oot Bytecode Stdlib)

Note that only the first 4k of these (including the 2k Stdlib ones) are directly accessible in MEDIUM mode.

Within any one function, only:

In the textual Oot Assembly format, some assemblers may have the capability of automatically creating implicit structs and breaking modules apart into many submodules so as to give the illusion that some of these limits do not exist.

usage tips

Three ways to programatically generate Oot Bytecode:

todo, finish this 'spec'

todos


(copied from [2])

The idea of 'alias' and 'value' addressing is that each memory cell has two 'levels': the 'alias' level, which might contain a pointer (or actually, maybe an entire 'alias record', which is a pointer with some attached metadata), or nothing if the cell is not a symlink but an actual value; and the 'value' level, which in the case of non-symlinked cells contains the contents of the cell, and which in the case of symlinked cells looks like it contains the contents of the symlink target. So when you use value addressing on a symlink cell (that is, one with a non-empty alias level), you are really manipulating the cell that is the symlink target; and when you use alias addressing on such a cell, you are reading or mutating its symlink record.

In order to CREATE an alias, a third addressing mode is needed, because you want to take the 'address' of a cell, and place it into the alias level of a different cell. Instead of 2 different alias-related addressing modes, we could have some special instructions. Note that an 'address-of' mode could not be written to, only read, so maybe that one should be omitted?

We also need a sentinel to represent an empty alias cell. We can use 0, which would mean that the PC cannot be aliased, which is no great loss.

Also, is this 'alias level' identical to the 'meta level' above a 'perspective' (view), or do we need yet another mode for that? i'm guessing it's a field within the metadata in the meta level

if we used an 'alias' addressing mode, and a special instruction/opcode for GET_ADDRESS_IDENTIFIER and CREATE_ALIAS then:

If we wanted to look at or manipulate metainformation in the alias record, we would first move the alias record itself into an ordinary value cell, by issuing a CREATE_ALIAS instruction whose input had alias addressing (r) and whose destination had value addressing (*r) (this creates an alias from the destination cell to the alias record which is controlling the input cell; if we wanted to alias a destination cell to the same symlink target as a source cell, we would instead do CREATE_ALIAS with value addressing in the input (the more typical case)). (note: if this is what we're doing, tho, then 'alias' addressing in CREATE_ALIAS's output is useless, which seems like a waste). Note: this is irregular in that the CREATE_ALIAS and GET_ADDRESS_IDENTIFIER opcodes would interpret value addressing differently from every other opcode.

A more regular solution would be to use an address-of addressing mode, and to have GETALIAS and SETALIAS operations. To create an alias from r4 to r3, you would SETALIAS(input= address-of(r3), destination= address-of(r4)). To create an alias from r4 to whatever r3 is directly aliased to, GETALIAS(input=address-of(r3), destination=r1); SETALIAS(input= r1, destination= address-of(r4)). Etc. I like this better.

A downside there is that address-of can't go in the destination operand. But i guess that's good, it means we have a spare 1/2 addressing mode. hmm, mb too clever, but we could use that in place of SETALIAS... presumably SETALIAS will be somewhat common, but GETALIAS will not. to clear a symlink, we can copy 0 in using this addressing mode

Note that SETALIAS is a special case of setting .__GET and .__SET.

File format todo


at some point we'll want to have, written in Oot Bytecode, not just the OotB stdlib, but also an OotB disassembler, assembler, and API for OotB manipulation.

MB look to YASM's library for guidance on that stuff.

---

note similarity of primitive instruction set to https://en.wikipedia.org/wiki/Little_man_computer and https://en.wikipedia.org/wiki/CARDboard_Illustrative_Aid_to_Computation and https://github.com/jarcane/MicroMini/blob/master/docs/MicroMini.txt

---

if the second goal of Oot Assembly is to serve as a generic AST format that preserves HLL intent, then we want a clear way to represent an HLL call even if it has more than 2 arguments; so we want (1) an annotation to mark the beginning of a CALL block that pushes arguments onto the stack and then consumes them with a CALL, and (2) a stereotyped form for such a block. and what to do about keyword arguments?

also, generalize this; mb give it its own 'annotation' opcode, in addition to the usual one (and in addition to the 'prefix annotation' one and in addition to the 'data block (skip this)' one, if any, and in addition to the 'comment' one)

or, should we just use LONG format for this sort of thing (eg for CALLs with more than 2 arguments)? i guess that would mean finishing the LONG format.

---

it's essential that the interpreter know where all pointers are at all times, so that it can resize the stack and the heap and move stuff around as it does so.

if things are statically typed, great, but when things are of type DYN, we must make sure that there is a pointer map (like in Haskell) at all times -- this must be a required component of boxing, i guess, and DYN must always be boxed of course (to have the type tag, and now also to have the pointer map).
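
As a rough sketch of what such a boxed DYN representation might look like (assuming a bitmap-style pointer map; the field names and layout are my own, not the actual OotB? boxing format):

  #include <stdint.h>

  /* Hypothetical boxed DYN value. The header carries a type tag plus a
     pointer map so the runtime can find every pointer when it relocates
     the stack or heap. */
  typedef struct {
      uint32_t type_tag;     /* dynamic type of the payload */
      uint32_t num_words;    /* number of 64-bit payload words */
      uint64_t pointer_map;  /* bit i set => payload[i] is a pointer
                                (enough for up to 64 payload words) */
      uint64_t payload[];    /* the actual fields */
  } BoxedDyn;

  /* Example GC/relocation helper: visit every pointer field. */
  static void visit_pointers(BoxedDyn *b, void (*visit)(uint64_t *slot)) {
      for (uint32_t i = 0; i < b->num_words; i++)
          if (b->pointer_map & ((uint64_t)1 << i))
              visit(&b->payload[i]);
  }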

---

should custom instructions execute atomically? Maybe use STM there.

---

todo: at some point we need to add a way ('GRANTADDR'?) for one subaddress space to ACCESS another. And a way to delete those grants later. And a way to list them.

---

todo LOADL requires an effective address in both the 'input' and the output.

---

some sort of local .constant definition so as to reserve space in the constant table when a constant is only used in one place (so that the whole constant table has more chance of fitting in cache)?

---

in case thread or process forking isn't available, consider a non-cloning variation on forkthread and forkprocess

---

i'm still going back and forth between:

the arguments for the 'hooks' are:

the arguments against the 'hooks' are:

perhaps after implementing one of them i'll feel that it is or isn't too complicated.

---

have to think carefully about FFIs; that's one thing that i don't know much about that could have a large impact on the design, since a big part of the point of making OotB? easy to implement on various platforms including HLLs is to make interop easy with these HLLs.

---

TODO need a way to load a struct corresponding to metadata for any imported module (and remember, module 0 is self, 1 is stdlib). This allows you to get first-class-functions for module fns, and to access more than 256 exports.

---

ok so even if the stack is not mmapped, we still need a way to save it to the heap, and to swap in another stack saved in the heap (and create a new stack in the heap of a certain size, but assuming that the format of the stack when saved on the heap is not opaque, we can just use MALLOC for that)

we also probably need a way to get the current height of the stack
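
Purely as an illustration (the operation names SAVESTACK/SWAPSTACK/STACKHEIGHT and the representation of a saved stack as an ordinary heap block are assumptions, not proposed instructions), this could look like:

  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  /* Hypothetical interpreter state: a contiguous stack plus its height. */
  typedef struct {
      uint64_t *stack;
      size_t    height;     /* number of live slots */
      size_t    capacity;
  } Vm;

  /* STACKHEIGHT: get the current height of the stack. */
  static size_t stackheight(const Vm *vm) { return vm->height; }

  /* SAVESTACK: copy the live portion of the stack into a heap block
     (here just malloc'd, matching the 'we can just use MALLOC' remark). */
  static uint64_t *savestack(const Vm *vm) {
      uint64_t *saved = malloc((vm->height + 1) * sizeof(uint64_t));
      saved[0] = vm->height;                       /* record the height */
      memcpy(saved + 1, vm->stack, vm->height * sizeof(uint64_t));
      return saved;
  }

  /* SWAPSTACK: replace the current stack contents with a saved one
     (assumes vm->capacity is large enough). */
  static void swapstack(Vm *vm, const uint64_t *saved) {
      vm->height = (size_t)saved[0];
      memcpy(vm->stack, saved + 1, vm->height * sizeof(uint64_t));
  }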

---

ok, the primitive instruction set and instruction encoding are coming along. Things that remain to be done:

(copied from proj-oot-ootAssemblyNotes13).

---

Recall that the goal here is to be able to directly execute untrusted Oot Assembly code, sandboxing it only via placing restricted capabilities in the Capabilities register before calling it (via SEALCALL).

---

Regarding a program which writes instructions into an addr_subspace and then wants to execute them: the program should have to 'load in' the addr_subspace (either as raw, or as ob0), using one or more SYSCALLs. These SYSCALLs may not be available on every platform. Eg if you compile Oob to assembly for some embedded platform, then most likely you will not be able to compile dynamically-generated Oob to assembly on the embedded platform at runtime.

---

JZREL: in imm2. # conditional relative jump (branch if 'in' is zero). Note: imm2 input is signed. 1 is added to non-negative inputs (b/c 0 is just an infinite loop, not useful), so range is (-32768:-1 unioned with 1:32768)
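
A sketch of how that offset adjustment could be decoded, assuming imm2 is a signed 16-bit field and the jump is relative to the JZREL instruction itself (both are assumptions on my part):

  #include <stdint.h>

  /* Effective branch offset for a hypothetical JZREL.
     Raw imm2 range: -32768..32767; non-negative values are shifted up by 1,
     so the effective range is -32768..-1 union 1..32768 (never 0). */
  static int32_t jzrel_offset(int16_t imm2) {
      return (imm2 >= 0) ? (int32_t)imm2 + 1 : (int32_t)imm2;
  }

  /* In the interpreter loop (sketch):
     if (input_is_zero) pc += jzrel_offset(imm2); else pc += 1;  */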

---

i guess the constant table can be unified with the function table and the jump table

---

  1. memory leak when new registers are defined and then finished with, yet still referenced in addr_subspaces
  2. this is actually a little tricky since in theory Oob code could capture the registers and try to use them later. How to restrict that? Or do we just use refcounting or tracing GC? TODO

---

ordinary code shouldn't have execute permissions on a CODE pointer within custom syscalls

---

so we have 1 addr mode bit left (2 of them specify 1 of 4 basic addr modes: immediate, register, register indirect, and RESERVED; the other one specifies BOXED or unboxed).

when this bit is set, it could force us into REGISTER INDIRECT mode regardless of the other bits. The BOXED bit still does its thing, but the other 2 bits are now used to specify a displacement from +1 to +4 (no +0 because ordinary indirect mode already does that)
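
Here is one way the resulting 4-bit operand mode field could be decoded, purely as an illustration of the scheme in the two paragraphs above (the bit positions and enum names are my own assumptions):

  #include <stdint.h>

  typedef enum { MODE_IMMEDIATE, MODE_REGISTER, MODE_REG_INDIRECT, MODE_RESERVED } BasicMode;

  typedef struct {
      BasicMode mode;
      int       boxed;         /* the BOXED bit always keeps its meaning */
      int       displacement;  /* 0 normally; +1..+4 when the spare bit is set */
  } OperandMode;

  /* Hypothetical layout: bit 3 = spare bit, bit 2 = BOXED, bits 1..0 = basic mode. */
  static OperandMode decode_mode(uint8_t bits) {
      OperandMode m;
      m.boxed = (bits >> 2) & 1;
      if ((bits >> 3) & 1) {
          /* Spare bit set: forced register indirect, displacement +1..+4
             (+0 is excluded because plain register indirect already covers it). */
          m.mode = MODE_REG_INDIRECT;
          m.displacement = (bits & 3) + 1;
      } else {
          m.mode = (BasicMode)(bits & 3);
          m.displacement = 0;
      }
      return m;
  }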

---

i should probably re-describe the relationship between Boot, Ovm, and the Ovm encodings.

Boot is a dead-simple VM. Ovm runs on top of Boot. In addition, Ovm is an extension of Boot, in that the Boot instruction encoding is the same as the Ovm SHORT instruction encoding.

Ovm provides all the various instructions that Boot appears to leave out (eg signed integer arithmetic and floating point arithmetic). Ovm is more extensible and more powerful than Boot, with addressing modes, custom instructions and subaddress spaces, dynamic generic instructions, and first-class functions.

Ovm supports three instruction subencodings: SHORT (which is identical to Boot), MEDIUM (the 'main' one), and LONG (an extensible but more complex encoding).

Both Boot (=Ovm SHORT) and Ovm MEDIUM have fixed-length instruction encodings in which each instruction can be interpreted independently of the others. This is why, in MEDIUM encoding, we don't have quoted Raw bytestrings of binary data, or 64-bit absolute jump constants after a JMP instruction, or 64-bit LOADIs, or meta-commands that switch instruction sets or encodings. LONG can have that sort of thing. Ovm LONG has things like grouping constructs, quoted commands, transaction begin/ends, 64-bit LOADIs and jump constants, and quoted raw binary data.

---

toread

http://www.atmel.com/Images/compiler.pdf

---

in lowEndTargetsUnsorted2 i analyzed the histogram of how much memory various microcontrollers (aka MCUs aka uCs) have on Digikey. My analysis is:

recall also that:

Why do we care about fitting in cache when we're not performance-oriented, or about microcontrollers when Oot is not a systems language? The reason is that my idea is that programs written in Oot will be running on massively parallel machines, machines which have at least tens of thousands of processors, with each processor being relatively small and slow. We need to make sure that Oot's runtime fits in the small memory space of each of these processors. Since consumer machines like that don't exist yet, we have to guess what the memory limit will be.

Note that https://jaycarlson.net/microcontrollers/ defines microcontrollers (aka MCUs aka uCs) as "processors with completely self-contained RAM, flash, and peripherals". Is that really what we want to target? No, probably not -- we want LOTS of CPUs, for cheap (which probably implies high power efficiency too) -- we don't care if each one has peripherals (although we probably do need each one to have its own memory). It would probably be cheaper/more efficient to manufacture and distribute a ton of CPUs, each with their own memory, at once, without giving each one its own peripherals. So MCUs are just a guideline, not an actual target.

There are some MCUs, like the Propeller (8 cores) and XMOS (2-24 cores?) and Epiphany (18 cores in Parallela board, 1024 cores in Epiphany-V chip), which are all multicore. These would at first seem like good targets, but actually we want MASSIVE parallelism, which is probably very different from 8-core parallelism -- eg the XMOS has some stuff to help the cores synchronize and communicate and share memory and schedule threads, and without knowing much about it, i bet this stuff wouldn't scale from 8 cores to 64k cores very easily, just like the cache coherency in PC multicores won't scale. Presumably the Propeller (8 cores) is the same, although i'm not sure. Fans of the Propeller and the XMOS say that the point is not really parallel processing (for that, they recommend just running multiple threads on one higher-end core), but rather to do multiple real-time things with hardware without bothering with interrupts; the Propeller and the XMOS have fewer peripherals than other MCUs, but the idea is that you can devote some of the cores to acting as peripherals; and the reason that avoiding interrupts is good is that it makes it easier to prove that the system will be deterministic and will meet timing requirements (also it's simpler to understand).

However, it might be worthwhile to learn about the programming paradigm offered to programmers who write programs for the Propeller and the XMOS, because presumably this is a good paradigm for concurrency. (note: XMOS seems to be the usual message-passing over channels, plus some thread synchronization barriers (are those just for cores on the same tile?), plus maybe some event stuff that i haven't had time to look at yet)

The Epiphany is presumably more scalable -- they claim it should scale to a billion processors -- but so far there hasn't been evidence of more than 1024. Supposedly even the Parallela boards can be connected to each other: "The architecture also allows parallella boards to be combined into a cluster with a fast inter-chip 'eMesh' interconnect, extending the logical grid of cores (creating almost unlimited scaling potential)."

People talk about FPGAs but they also say that it's much, much harder to program these, compared to MCUs. I assume that with FPGAs you really have to get into Verilog/VHDL, RTL, and all of the timing and layout complexity (and hardware programming language non-intuitiveness) that that implies.

In the end the most cost-effective thing to do will probably end up being to use GPUs. GPUs are mostly SIMD but, at the cost of inefficiency, i think we can still use them to run many instances of Turing-complete state machines. Another alternative is Adapteva/Parallela/Epiphany.

The Epiphany-V local stores are 64k.

"so interesting numbers relating to amount of memory available per core are:

...

GPU program size limits:

https://www.opengl.org/discussion_boards/showthread.php/182182-Shader-Program-size-limit https://www.reddit.com/r/opengl/comments/2n8mjz/size_limits_for_shaders_on_various_gpus/ https://stackoverflow.com/questions/2617957/glsl-maximum-number-of-instructions

" -- lowEndTargets/gpus.txt

and i think the upshot is:

to summarize even shorter:

note: the tiled CPU-node, memory-node, network-node design at the end of ootAssembly19 only gives each CPU node 4k of local memory (plus 1k of non-local working memory at each of the 4 neighboring network nodes). So 8k 16-bit words is possibly 4x too much; we might really only have 4k bytes; otoh in that design each memory node also has 7k of fabric memory, and each CPU node is next to 4 memory nodes, so each CPU node could potentially be allocated another 28k -- but we should leave that for the user application to use, esp. since allocation isn't guaranteed. Let's split the difference; we'll rely on each CPU node being able to use the 4k of non-local working memory for our language in that design, so now we have 8k BYTES, not 8k words, of RAM; but this includes memory that needs to be used for implementing any fabric platform VM stuff, so again, the core interpreter stuff should fit in 4k BYTES. That's pretty tight; we may have to give up on multithreading implementation at the lowest level (and instead implement multithreading at a higher level). Otoh the interpreter code can go in 'ROM'; this RAM just has to fit the interpreter stack etc.

---

OK so as per the discussion in ootAssemblyNotes22 for the built-in 'chip' networking (imagine this as the on-chip network of the tiled/fabric architecture discussed in the previous section and in ootAssemblyNotes19):

messages:

channels:

shared memory blocks:

Note that these numbers are all derived from the 16-bit word size choice; if addresses are one word, that's 64k nodes; if there are 64k nodes, then for a data structure with one bit for every addressable node to fit within one message payload, the payload size must be >= 4k words (64k bits = 8k bytes = 4k 16-bit words).

Error correction/reliability/exactly-once-delivery is implicitly provided and may be assumed by the application, except for possible dropping of async messages; this is supposed to be like hardware/IPC, not like external network transmission (if you wanted to access a 'real' (external) network then you may need more than 16-bit addresses anyway).

Incoming packets may or may not be buffered. These buffers are opaque to the application (the buffer size cannot be observed or changed, and the application cannot tell if there are currently any packets queuing in its incoming buffers). The possible presence of a buffer is noted only because it affects the semantics of the network.

Special fixed-length message types (from the message size discussion in ootConcurrencyNotes7):

signals:

64-bit messages:

(alternate, probably better, idea: 64-BYTE messages, with 16-byte header (64-bit src addr, 64-bit dest addr) and 48-byte payload; this allows an interior/wrapped packet to have ample room for a 32-byte header and 16-byte payload, or a 16-byte header and 32-byte payload; note however that a single Ed25519 signature (512-bits/64-bytes) would still be too large to fit in the payload)
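
To make that alternate layout concrete, here is the 64-byte message as a C struct sketch; only the sizes (16-byte header of two 64-bit addresses, 48-byte payload) come from the note above, the field names are mine:

  #include <stdint.h>

  /* Hypothetical 64-byte message: 16-byte header + 48-byte payload. */
  typedef struct {
      uint64_t src_addr;     /* 64-bit source address */
      uint64_t dest_addr;    /* 64-bit destination address */
      uint8_t  payload[48];  /* room for a wrapped packet with a 32-byte header
                                and 16-byte payload (or 16+32), but not for a
                                64-byte Ed25519 signature */
  } Message64;

  _Static_assert(sizeof(Message64) == 64, "message must be exactly 64 bytes");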

---

todo (from ootAssemblyNotes19):

(what other modes do other ISAs have? scaled offsets come to mind)

so, we have three options:

---

In Boot we were going to make everything 16 bits, but many platforms don't support any float smaller than F32. So maybe either the floats should be F32, or we should support float arithmetic of at least 16 bits but not guarantee that it won't actually behave like F32, or even F64 or higher.

---