Difference between revision 10 and current revision
No diff available.---
so the current proposal for OA (Oot Assembly):
EXTENSIBILITY
This proposal allow us to start with a tiny, very easy to implement set of primitives, and then expand them using the language itself, rather than to make a tradeoff between ease of porting and expansiveness of instruction set.
There are various types of 'custom' metaprogrammabilty, eg custom ops, custom types. i think this means that ops, types etc should be classified into:
My language may be a little sloppy below, since i made up that terminology right now, after writing the rest.
ENCODING FORMAT
Instructions are encoded in a sequence of 64-bit frames.
2 format bits + 62 bits depending on format
The format bits are as follows:
In the MEDIUM format, each 64-bit frame represents a single instruction. The MEDIUM format is the workhorse of OA. Most OA code can be straighforwardly represented in MEDIUM format. It is 3-operand register-based with addressing modes. You might call it a '4-operand format' because the opcode itself has an associated addressing mode, allowing 'first-class functions' to be stored in memory and called in one instruction (provided they take 2 arguments).
In the LONG format, many 64-bit frames may be combined to represent a single instruction or "sentence". The LONG format is an extremely extensible format which can represent each operand as a 'phrase' which may contain various modifiers and subphrases; this is an extensible AST serialization format. It is intended to be used when modes, modifiers, and metadata must be attached to instructions or to individual operands within instructions, as well as to allow OA to be extended to cleanly represent other programming styles.
In the SHORT format, one 64-bit frame represents 8 instructions. The SHORT format is intended to be used to concisely represent the calculation of single expressions. It is stack-based.
SHORT FORMAT
1 format bit + (2 bit addr mode + 5 bit operand) + 7 x (1 subformat bit + 2 bit addr mode + 5 bit operand)
The SHORT format is a 64-bit packed grouping of 8 sub-instructions, each one 8 bits long except for the first one, in which the first bit was consumed by the format bit (which is '0').
Because the leaf and interior nodes of expressions produce values which are consumed exactly once, and because the interior and top nodes of expressions consume their inputs (as opposed to assigning them to a variable from which they might be re-used in some other expression later), expressions can be represented in a concise RPN form in which there is a sequence of various operations which are all implicitly assumed to take their inputs from, and push their inputs onto, the stack. The SHORT format provides for arguments to be PUSHed onto the stack from any of the first 16 registers, for a sequence of any of the first 32 opcodes to be called with stack-implicit arguments, and for a single value produced at the end to be optionally POPed from the stack and stored into any of the first 16 registers. Stretches of code which require no control flow operations besides function calling and value selection, no more than the first 16 registers, no more than the first 32 opcodes, no more than the first 4 addressing modes, and no metadata or modifiers, could conceivably be written entirely in this format.
The first instruction is a 'PUSH' instruction which pushes a value onto the stack. The next 6 instructions are each either PUSHs or operations which take their arguments from, and push their results to, the stack. The last instruction is either POP instruction which pops a value from the stack, or another operation.
In the first subinstruction, the addressing mode and the operand specify the data which is to be PUSHed onto the stack.
In the next six subinstructions, the subformat bit chooses between 'PUSH' or 'OPERATION'. In the case of PUSH, the addressing mode and the operand are used as in the first subinstruction, to specify the data which is to be PUSHed onto the stack. In the case of OPERATION, the addressing mode and the operand are used to specify an opcode which is executed.
In the last, 8th subinstruction, the subformat bit chooses between 'POP' or 'OPERATION'. In the case of POP, the addressing mode and the operand are used to specify the effective address to which the result which is POPed from the stack is to be stored. In the case of OPERATION, the addressing mode and the operand are used as in the previous six instructions, to specify an opcode which is executed against the stack.
The inputs and outputs of the OPERATION instructions are all implicitly the data stack (stack0) (eg the inputs are POPed from the stack and the outputs are PUSHd to the stack). Note that OPERATION instructions may consume any number of inputs from the stack (todo must this number be fixed for each operation, or can some eg consume the stack up to some terminator? i'm thinking if not fixed, then known at compile-time at least).
MEDIUM FORMAT
(OLD idea: 2 format bits + (2 custom flag bits + 4 addr mode bits + 8 operand bits) + 3 x (4 addr mode bits + 12 operand bits))
2 format bits + (2 addr mode bits + 12 operand bits) + 3 x (2 custom addr mode bits + 2 addr mode bits + 12 operand bits)
(note: when the 2 'custom addr mode bits' are 00, then the addr mode is fully determined by the 2 addr mode bits; when the 2 custom addr mode bits are nonzero, then custom code is executed; in other words, if we put these together to form 4 total addr mode bits/16 addr modes, then the first 4 of these addr modes are specified and the other 12 addr modes are custom)
OPCODES
The first field is the opcode field. The first four bits of the opcode field are the address mode bits. The addressing mode is applied to the 11-bit operand to generate the opcode field value. The opcode field value is broken down into:
LONG FORMAT
todo
see ootAssemblyThoughts for the previous proposal (um, it's not there anymore? perhaps i moved it to ootAssemblyNotes8?). Note that now that we are stealing 2 format bits from the header, it would have to be modified at least a little. Maybe since we have short formats now we should ditch the 8-bit payload format; that saves 1 bit, we still need another. Also, the addr modes are probably custom, not 'split addressing'. Also, since we have the other formats now, no need to specify anything for role 6.
Also, do we want to add a 'map' that says what the child nodes will be and what their offsets are, rather than just linearly reading and encountering roles and then a stop?
Also, do we want to add backwards offsets at each child node to say how to find its parent?
Both of these ideas make sense if we think of this as an AST serialization rather than as a natural language 'stream encoding'.
Also, the format currently described at ootAssemblyThoughts was designed to permit arbitrarily long sentences, by using 'stops' instead of headers with length. But in the real world, we'd probably prefer to have header, perhaps with a 'combine with next' semantic if the header says the max length.
We might also want to have a map to the starting location (offset) of each child of the current node, and the role of that child. And maybe, if the node has a single parent, the offset back (or forwards) to that parent. Otoh this information is implicit in the data in any case, and who is to say that it is more efficient to store it explicitly; that depends a lot on the application. And besides, i'm not trying to make some complicated red-black tree or anything, i'm trying to make something very simple.
But if we did have a map, it would have to have length limits. And maybe the numbers 4 and 16 are useful here (and possibly 8). If we had a 4-bit role and a 4-bit offset for each child, that's 8 bits in the map, so if we had 8 children we'd already fill up a 64-bit frame. If we had a 3-bit role and a 3-bit offset, and 8 children, that's 48 bits, so at least we have room for a little data here or maybe an offset to the parent. If the length needed for the children exceeds these limits, we can ay 'use the parenthesis construct' which is a 64-bit frame that is mostly just a size of the subtree.
ALL FORMATS
CUSTOM INSTRUCTIONS In addition to custom instructions, there are first-class functions that can be called like instructions via using the addressing mode on the opcode field; this functionality is referred to as 'call3'. Typically the opcode's addressing mode is immediate constant; but by using 'constant table' addressing mode, module functions can be called, and by using the other addressing modes, function pointers sitting in memory can be called (so in this last case, which function is executed is only known at run-time, not compile-time).
call3s are like macros that substitute their 'body' inline whenever they are seen, except with references to the special ARG0 memory location replaced by the effective address indicated by the first operand, and references to ARG1 and ARG2 replaced by either references to the values indicated by the second and third operands, or the effective addresses, depending upon if the assigned opcode is less than 0xC00; and 'JMP' appended to their body whose target is the instruction following their invocation. Custom instructions do create a new 'scope' with respect to label numbers; custom instructions are not aware of and cannot jump to labels outside of their own implementation, and labels within custom instructions are only accessible from jumps within those instructions; if custom instructions containing labels are inline expanded, the labels must be re-numbered so as not to conflict with labels in the calling code.
Some interpreters may implement custom instructions using a hidden stack instead of inline expansion.
Custom instructions may not invoke themselves; and they may not make use of custom instructions defined after themselves. There is no way to recursively call much less 'tailcall' a custom instruction; when you define a custom instruction, please use iteration instead of recursion. This limitation ensures that custom instructions may be inlined (or, that if they are implemented using a hidden stack, that there is a fixed upper bound on the hidden stack depth).
todo: do we place some requirements on call3'd subroutines that they must complete quickly and cannot utilize the underlying platform, for the purpose of being able to use the underlying platform stack as the hidden stack without saving it when we want to return control to the underlying platform?
Custom instructions are declared by a map of opcodes to a tuple (module identifier, function).
Custom instructions must not have cyclic dependencies, and must be defined in order such that when one of them is defined, any custom instructions used within its implementation have already been defined. (the order of the assigned opcodes may be different from dependency order).
The form of first-class functions is the same as custom instruction declarations (with a 0 in the opcode field). If the opcode operand resolves to an integer, then an instruction (built-in or custom) is indicated; if it resolves to a pointer, a first-class function is indicated (note: this means that there must be a 'function type' which can be represented as either an integer or a pointer).
Function declarations include:
Custom addressing mode declarations include:
The first four custom addressing modes are 'direct'; their operands represent registers, and their effective address is that register; these are the only ones whose effective address may be a register.
Custom addressing modes must leave the stacks unchanged (except if they are direct and the register operand is a memory-mapped register). This makes it easier to analyze the program for changes to the stack.
TODO: actually what if we want to use the extra addr mode bits to represent per-operand custom flags? in that case we need to wrap the whole instruction, not just the argument evaluation.
There are also custom flag bits, which are defined by a wrapper that is called in place of and wraps the following op. Like call3s, they can look at the link register to get a pointer to the instruction that they are supposed to wrap. When they are called INTROSPECT_INSTRUCTION is automatically called on the wrapped instruction to break that instruction into (4 operands and 4 addressing modes) and put those parts on the stack. The custom flag implementation can and typically does use a form of EVAL to execute those parts, possibly after modifying them.
Requirements on custom flag bit semantics:
OA does not support variadic instructions.
Self-modifying code per se is not supported, although EVAL is.
ADDRESSING MODES
There are 4 built-in 'core' addressing modes and 12 custom addressing mode slots.
Built-in addr modes are:
'immediate constant' in the output addressing mode with operand '0' means 'discard'. Other operands for 'immediate constant' addressing mode in the output are forbidden, except in certain compile-time metaprogrammy operations, for example IMPORT.
'constant table' in the output addressing mode is forbidden unless the constant referenced is a global static pointer.
some ideas for other addressing modes:
MODULES
Modules must declare custom instructions, custom addressing modes, and their constant table. All of these declarations must be done completely at compile-time. Custom types don't need to be declared; they are just stored as pointers.
There are two places to declare custom instructions; as unused opcode immediates, and as function-typed entries in a constant table. There (probably) is a separate constant table for instructions (ie for when the opcode field has constant table addressing mode). tombdo is there any difference between custom instructions as immediate opcodes, and as function-typed entries in a constant table? I'm thinking that the latter would be more like library functions and the former would be more like new 'instructions' but what is the difference? Perhaps the difference between the two is that the former is hardwired as builtins in the OA implementation and the latter is per-module; or that the former is per-whole-program (allowing faster dispatch) and the latter is per-module. Or that the former can be either OA code or hardwired in the implementation, while the latter can be OA code or an external C library. Or the latter can be used by a module loaded at runtime but the former cannot. Or something. Currently my best guess is that only constant table customs are per-module; the others are either per-program, or part of the implementation (perhaps an interchangable part though, eg each program can specify an Oot 'profile' different ones of which have different selections, and some profiles are implementation-dependent..).
Modules may import other modules. This is done by listing the other module in the imports table within the module info table. There is also a runtime instruction IMPORT. This instruction must use immediate-mode addressing to specify the destination integer identifier (this is to allow the implementation to know what module identifiers are being used at compile-time).
Modules may also import custom opcodes, types, and addressing modes from the other modules into their own. This is done by listing the other module into the metaprogramming imports table within the module info table. There is no way to do this at runtime. Trying to import metaprogramming from different modules that define overlapping opcode, type, or addressing mode identifiers is a compile-time error.
There is a special module BUILTINS. This is meant to be an implementation-dependent module that contains opcodes, types, and addressing modes that might be given special treatment by an Oot Assembly implementation. By tracing imports of opcodes, types, and addressing modes from this module, the implementation can know at compile-time where its optimizations can be applied. The semantics of standard opcodes, types, and addressing modes, even non-core ones, must not be altered in the BUILTINS module's implementation; however BUILTINS can also define non-standard opcodes, types, and addressing modes. (todo is this a good idea? or just autoimport all implementation-dependent builtins to EVERY module?)
The BUILTINS module is auto-imported at module identifier 0. The current module is auto-imported at module identifier 1.
Absolute memory addresses (constant table mode), eg those in the operands of JMPs, are all implicitly module relative (pointers, however, can be shared between modules). Integer JMP addresses refer to a count of Oot Assembly instructions in the module.
Each module has:
Length fields are 16-bits, but their values cannot be more than 4k. Lengths mean how many items each table has (not the number of bytes that the table takes up).
(todo: is this really the best format? should we (just) place the lengths in the tables themselves, or only have lengths and let the pointers be computed?)
Symbol names data is the data which can be stripped for more compact builds. It contains ASCII string names and source maps for the module and for each defined custom addr mode, custo opcode, function declaration.
TYPES
There are some built-in types that don't need to be declared (the first 16, perhaps?).
types todo: probably want at least integer, floating point, string (bytestring or unicode or both?), object?; the INSP of Parrot, and mb also of MoarVM?
another idea for 4 core types: pointer int16 other/custom static type (has already been statically typechecked) dynamic (boxed/wrapped, tagged type; runtime typechecking requested)
an idea for another 12 'base' types:
boolean/single bit unsigned int16 byte float64 arbitrarily large/precise rational unicode string array cons cell/linked list/binary tree graph/hash/dict (opaque, as opposed to AST) function/callable, thunk/promise/future/lazy value AST/expression/block/logic greenthread/process/channel(/actor?)
(note that some implementations might not support all of these types; eg an embedded implementation might not support floating point, arbitrarily large/precise rational, unicode)
OA is typed in the sense that every scope has a type declaration for every register it uses, and every instruction has a declared type signature; there is however a 'dynamic' type. 'dynamic' type means that this location can hold a value of any boxed type. Boxed types have a type tag. Values in locations with 'dynamic' type must be 'coerced' via a COERCE instruction to their actual type before they are moved to locations with their actual type, or passed as arguments to instruction demanding their actual types; the COERCE instruction causes a runtime exception if the value isn't actually the correct type (as determined by its type tag). Type identifiers are 12-bit integers.
There are also a few special memory locations whose type varies by scope, namely the call3 inputs and output registers. In addition, there are stacks; stacks are typed via 'stack maps' like in the JVM. Stack maps are like regexps, and can contain unbounded repetitions of patterns. Instruction type signatures include a specification of their alteration of the stack (in the form of eg: upon entry, i expect the top of the stack to contain an int8 and an int16, which will be consumed; upon exit, the top of the stack will contain an int16 (note: the instruction does not HAVE to consume the inputs, but if it does not, it must list these as outputs)).
(i suppose in the end we also need a way to coerce raw bytes to typed something, for interop?)
Some instructions have polymorphic forms, eg forms whose input type signature includes 'dynamic'. Some of these instructions also have specialized forms, that is, forms where the 'dynamic' in the type signature is replaced by a specific type. In this case, the instruction will have the same mnemonic but a different opcode, and the Oot implementations handles dispatch to the specific form if there is one (by, at runtime, checking for a specific form before each call to an instruction with a dynamic input type signature).
Just like memory locations are typed, the stack is typed. But since the program is allowed to arbitrarily manipulate the stack, we cannot always statically know what the 'type signature' of the stack (the stack map) will be. So, what we do is we say that, so long as the number of things on stack0 is predicatable, and stack1 is only being accessed implicitly by CALLs and RETURNS and their associates, we can infer the stack map (like the JVM verifier does (used to, now i think stack maps are made by the compiler)), but if the program wants to change the stack in a weird unpredictable way, then it has to start including assertions saying "i'm about to add an arbitrary number of things to the stack" and "i'm about to directly manipulate stack1, consuming n elements" (so that the compiler can remember that, at that point in time, the stack-so-far (above the n consumed elements) is known but everything added to the stack on top of that is unknown). When the program is done doing weird stack writes, it can then assert how much it consumed and then the program continues with the rest of the implicit stack map governing (i could probably find a clearer way to write that...). And if it wants to read the unpredictable regions of stack0, or any of stack1, then each time before reading, it must assert that there is no stack underflow, and also the type of the thing it read (which can be 'dynamic'). This allows the program to use the stack efficiently in normal circumstances; to manipulate the stack arbitrarily if it wants to, at the expense of runtime checks; and to do one-time manipulations (such as handling exceptions) which leave the stack in a predictable state after which efficient stack usage can resume.
Although OA is designed for optional static typing, unlike eg the JVM, there may not be any typechecking done at runtime; typechecking is provided as a separate OA program (written in OA). OA users are urged to typecheck any untrusted or newly written code before running it.
(do we allow 'subtypes'? eg can we write a function that says "i'll take anything which is an int or a float" and then assert at runtime that some value is an int or a float, without saying which one?)
Type declarations include:
Values of the 'pointer' type are not necessarily 'pointers' on the underlying platform; they are simply some implementation-dependent reference value that allows dereferencing via the indirect addressing mode, and 'pointer arithmetic' to traverse adjacent fields. A 'pointer' could be represented by a machine pointer although with information about the width of record fields, by an integer into an array of emulated 'memory locations', by an object with methods to 'get' and 'set' its value and to get the 'next' and 'previous' locations, etc. Dereferencing a 'pointer' could involve calling garbage collection read-barriers, and writing to it could involve calling GC write-barriers, all 'under the hood'.
MEMORY
4k local variable locations or 'registers' (or less; in each module header it says how much it needs; perhaps memory sizes are always powers of two (this apparently makes eg pointer tagging easier))
memory map and memory-mapped special locations
0: PC (program counter) (do we forbid writing to this? i think so; we want to make programs use control flow ops like JMP for easier analysis) 1: ERR 2: stack0 push/pop 3: direct read/write access to the top of stack0 (without pushing or popping) 4-7: GPRs 8: stack1 push/pop 9: direct read/write access to the top of stack1 (without pushing or popping) 10: current size of stack 0 11: current size of stack 1 12: MODE 13: ARG0 (call3 operand0/output) 14: ARG1 (call3 operand1/usually input) 15: ARG2 (call3 operand2/usually input) 0x010-0x7FF: general purpose registers (GPRs) (accessible in the opcode field) 0x800-0xFFF: general purpose registers (GPRs) (not accessible in the opcode field)
note: this is not to say that the virtual machine can only access 4k of memory, just that it has only 4k of registers. A pointer can be placed in any of these memory locations, and that pointer can point to somewhere else. Likewise, the PC can point to memory locations outside of this map. These alien memory locations are always considered to be 'higher' than this memory (eg if compared, the alien memory address is 'greater than' 0xFFFF) but don't have to actually be that way in the underlying hardware. The alien memory addresses can be incremented as usual. These 4k locations are the only ones that can be accessed in direct addressing mode; however other locations can be accessed via indirect addressing mode. There are also absolute JMPS, which although 'absolute' are actually relative to each module.
There is no way to get a pointer to a register.
note: there are two stacks, intended to be used as data stack (stack0) and call stack (stack1) but the result is undefined if a program attempts to access beyond the end of the stack; this means that these can be implemented as a single stack
I call these locations 'registers' but really they are more like local variables.
If you do an operation on 'the contents of register 7' but the operation calls for some struct, then it is implicit that register 7 actually holds a pointer to that struct. If the contents of register 7 is typed as a struct (of some specific type), then the semantics of 'copy register 7 into register 8' is to copy the STRUCT (with COW? maybe not, if it is not boxed), not the pointer.
It is illegal and undefined to read from uninitialized registers.
The machine handles much of memory allocation/garbage collection transparently.
MODES
The MODE register sets various modes. It has 1 bits. By default, it starts out in mode 0.
bits in MODE from MSB to LSB:
Aside from 'external' state such as memory and I/O, and the stack, OA itself has no 'hidden state'; the state is visible in the registers.
ERRORS
There are four error handling modes, controlled by the MODE register.
In 'crash' mode, any attempt to write a non-zero value to the ERR register is immediately fatal.
In 'handle' mode, any attempt to write a non-zero value to the ERR register succeeds and then immediately causes the current value of the PC to be pushed onto stack1, and then causes a JMP to the beginning of the program. Note that programs run in this mode should be sure to check the ERR register at the beginning and if it is non-zero, to 'handle' the error in some fashion.
In 'restart' mode, any attempt to write a non-zero value to the ERR register immediately causes the stacks to be emptied and a JMP to the beginning of the program. Note that 'restart' could be emulated by putting a routine to clear the stacks and the ERR register at the beginning of the program and running in 'handle' mode.
In 'tolerant' mode, any attempt to write to the ERR register succeeds.
Note that it is possible for a subroutine to leave the ERR register in a non-zero state by saving the current MODE, setting the error-handling mode to 'tolerant', setting the ERR register, then restoring the error-handling mode. By this method (or by the NONFATALERROR opcode, which does the same thing), a subroutine can return an 'error code' which is (unchanged) upon success and non-zero otherwise, which may be more convenient than returning multiple values via the stack.
CALLING
There are (at least) three explicit CALL instructions, in addition to CALL3 functionality:
LIMITS
Note than although the instructions can only hold 12-bit immediate values, other addressing modes can be used to give larger values. Unless otherwise specified, anywhere an integer may be given, its size may range up to 64k - 1 (and possibly larger in some implementations). Many things have a limit of 4k - 1, however. These include:
(todo: is the 4k limit really needed for any of these?)
OA implementations may also limit the number of allowed labels. The limit must be at least 4k.
FLAGS (lack thereof)
There are two flag bits per instruction.
Note that further use-cases for operand-level flags can be emulated by custom opcodes.
BOXING
in the boxing:
CUSTOM FLAG BITS SUGGESTIONS
ATOMICS AND MEMORY MODEL todo
how about:
but mb we are not even having shared memory at all?? in which case we need some IPC/message passing primitives at least, instead.
NOTES ON SOME INSTRUCTIONS
JMP: if given a pointer, JMP to that pointer (can you JMP to 'data' locations or just locations marked as 'code'?); if given an integer argument, jumps to the n-th Oot Assembly instruction. NONFATALERR, see above
CORE INSTRUCTIONS These instructions must be implemented by someone porting OA. All other 'base' instructions are ultimately reduced to these.
When an instruction 'returns' a value, this means that the contents of effective address of operand 0 is set to this value.
probably not core but still considering:
add-on profile modules:
---
so two things to note: