can we/should we allow CALL3 to have INOUT operands? 2 of them, or all 3? maybe implement CALL3 by just pushing the 2 'input' operands onto the stack, in which case, upon return, it just POPs the stack into the 3rd operand? that would allow a function that wants 2 INOUT operands to leave both results on the stack. but it would also mean that CALL3 has differing stack behavior depending on what is called, which i don't like.
so i don't think i'll do that. But it's just something to think about.
i guess you could have two different CALL3 addr mode flags to have this behavior, or the normal behavior.
---
in any case, one possible implementation of CALL3 is for it to have 'arguments passed on stack' semantics, so the two input arguments are pushed onto the stack beforehand, and then afterwards the one output argument is popped from the stack and moved into the destination register.
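a minimal sketch of those semantics, in terms of a hypothetical interpreter with a data stack and a register file (all the names here are made up):

    def call3_stack_semantics(vm, fn_ptr, in1, in2, dest):
        # push the two input operands onto the data stack
        vm.stack.append(vm.regs[in1])
        vm.stack.append(vm.regs[in2])
        # the callee consumes the two stack slots and leaves one result
        vm.call(fn_ptr)
        # pop the single output into the destination register
        vm.regs[dest] = vm.stack.pop()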
---
macros are expanded inline; one macro which is invoked multiple times will generate multiple instances in the generated code. A label within a macro will generate a unique label for each instance of the macro.
can nested macro invocations refer to labels from outer macro invocations? If so, the compiler will have to pass in the unique label names when calling a nested macro. Perhaps you could let the user handle this by allowing labels to be assigned to variables.
better idea: if the same label is defined more than once, the most recent definition (in terms of static position in the instruction stream) overwrites the others
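a sketch of one reading of that rule in a two-pass assembler (the Label/Instruction node classes are assumptions): the label table is just a dict where later definitions overwrite earlier ones:

    def build_label_table(items):
        # items: flat list of Label / Instruction objects after macro expansion
        labels, addr = {}, 0
        for item in items:
            if isinstance(item, Label):
                labels[item.name] = addr   # most recent static definition wins
            else:
                addr += 1
        return labels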
---
sounds related to my idea of using the extra operand to pass type info:
Tag-free garbage collection using explicit type parameters http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53.1106&rep=rep1&type=pdf
---
so, if the number used for passing types has 8 bits, how should an 8-bit immediate 'type' in this field be defined?
8-bits surely won't be enough for all of the types we may ever need, including custom types, struct types, function types, so we can just have a subset, and we'll have to just pass a pointer to a 'type' for others even when the type being passed is statically known (and so could otherwise be an 'immediate').
maybe the largest subset (in terms of bits needed) is function types, so let's consider those first. We'd like to at least handle functions with 2 inputs and one output. The largest field width that can be multiplied by 3 and still fit in 8 bits is 2 bits. So, we have 6-bit immediate function types, with 2 bits each for each argument and for the return value. So, for each argument and for the return value we can encode 4 types. I think the obvious choices are:
'nil' can be used when we have functions that have less than 2 inputs or less than one output. ('nil' can be another name for what some languages call 'unit' and represent using the literal '()', and for what some languages call 'void')
These function types occupy 64 out of the 256 possible immediate types.
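a sketch of the packing; the low-to-high field order (arg1, arg2, return) and the 4-entry type table are assumptions:

    # each field is one of 4 codes, e.g. 0=nil plus 3 common types TBD
    def pack_fn_type(arg1, arg2, ret):
        assert all(0 <= x <= 3 for x in (arg1, arg2, ret))
        return (ret << 4) | (arg2 << 2) | arg1   # 6 bits total, 64 values

    def unpack_fn_type(t):
        return (t & 3, (t >> 2) & 3, (t >> 4) & 3)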
Now, isn't everything really a function type? Because we'll be calling functions and passing these types for polymorphism, so it would make sense for these types to be the function types.
This suggests that we choose the other (256-64 = 192) types to just be the most commonly needed function types, similarly to the 64 8-bit instructions.
However, when we have a pointer to a constant struct representing a 'type', this must be a recursive tree, and what encoding is the leaf of that tree? May as well make this part of the encoding. Which means that we need to reserve a lot of this space for the base types.
Let's reserve a block of 64 for base types.
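to make the 'recursive tree' idea concrete, here's a sketch of what a heap-allocated type descriptor might look like (the node layout and the constant names are hypothetical):

    class TypeDesc:
        # code: one of the 256 one-byte types; the 64 'base type' codes
        # double as leaf codes and composite type constructors
        def __init__(self, code, children=()):
            self.code = code
            self.children = children   # empty tuple for leaf (base) types

    # e.g. a linked list of int32 (LINKED_LIST and INT32 are assumed codes):
    # list_of_int32 = TypeDesc(LINKED_LIST, (TypeDesc(INT32),))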
Now we have used 64 for the base types, and 64 for the function types, and we have 128 left. What about 3-argument functions with only 2 distinct types? Assuming we can swap the 2 arguments, there are two forms of this:
a <- f(b, b)
a <- f(a, b)
Each of a and b can take 3 bits, which means that each of the two forms takes 6 bits; adding 1 bit to select which of the two forms, the total takes 7 bits. 2^7 = 128, which is exactly our remaining 128. So these functions can only access (without pointing to a larger, constant struct) the most common 8 out of the 64 base types.
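a sketch of that 7-bit layout (1 form-select bit plus two 3-bit type fields; the exact bit order is an assumption):

    def pack_fn2(form, a, b):
        # form: 0 for 'a <- f(b, b)', 1 for 'a <- f(a, b)'
        # a, b: indices into the 8 most common base types
        assert form in (0, 1) and 0 <= a <= 7 and 0 <= b <= 7
        return (form << 6) | (a << 3) | b   # 7 bits: 128 values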
I might suggest:
bool, int32, int64, uint32, uint64, f32, f64, ptr
But that doesn't leave any room for: dyn, custom/obj, string.
Also, since we are now talking about the format for the entire 'type struct', not just for the immediate operand, we need a way to specify 'this is a composite type'. We could repurpose 'dynamic' for that. Alternately, we could leave all that to the 64 'base types' (after all, we also need to specify WHICH composite type we have: union or product type? or, homogenous array? hetero array? linked list? struct? tuple? discriminated union (and then: option? simple enum?)? nondiscriminated union? keyword/interned string?). But this suggests that we really want at least 'dyn' (or whatever we replace it with) among the 8 base types for functions-with-two-same-type-arguments. So, in the 8 base types, we should probably give up one or more of:
I suggest giving up 64-bit ints (in the 8 common base types; they can still be in the 64 base types).
Also, we would like these 3-bit common types to be a superset of the 2-bit common types above. So we need nil in there.
So, the 2-bit function operand types are:
And the 3-bit common types are:
We could also consider giving up one of f32 or f64, probably f32, and putting something else in that slot, like dyn or obj (obj could simply be RESERVED or CUSTOM, so as to allow HLLs to efficiently reuse this same system and put their own thing in there, which could be an object type). Or giving up bool. Or giving up both bool and f64.
And the 64 base types must include all composite type constructors as well as actual base types.
for inspiration, maybe see also:
https://en.wikipedia.org/wiki/Comparison_of_programming_languages_(basic_instructions)
https://en.wikipedia.org/wiki/Primitive_data_type
https://en.wikipedia.org/wiki/Template:Programming_language_comparisons
on the question of whether f32 or f64 is more common, it's contentious: https://softwareengineering.stackexchange.com/questions/305760/why-is-the-most-common-integer-number-32-bits-but-the-most-common-floating-poin
now that i've heard of NaN tagging, that's more ammo for me to choose f64, given that having a system which is good for implementing an HLL is part of the goal here.
yeah, i like the idea of giving up f32 and adding in either DYNAMIC (which means a type variable to be provided later), ANY (which means a boxed/self-describing type) or CUSTOM (which means the semantics of this are reserved for a HLL building on top of LOVM). Remember, the ones we don't choose here can still be in the base 64 types, we are just choosing which 8 types are available in the shortcut for functions with two operands with the same type. For now let's just say CUSTOM or, maybe better since we haven't quite decided, RESERVED.
i'm tempted to get rid of bool and have two of these choices, but i guess functions with a bool output are pretty common, so we want to represent those succinctly. One goal could be for these 8 types to be able to represent every function in Boot, as well as obvious alternatives; dunno if we have any actual bools in Boot, but we do have compare-and-jumps, for which an obvious alternative would be compare (although compare could also be trinary compare, which doesn't output bool; a compare that was exactly analogous to compare-and-jump would be binary). And we also clearly have both int32 and uint32 in Boot, because i found that i had to describe that in footnotes to some of the instructions.
so the 3-bit common types are:
Out of the 256 one-byte types, we have:
so we have 56 other types. Included in that list might be the above, plus:
- f32
- int64
- uint64
- int8
- uint8
- int16
- uint16
- VARIABLE/DYNAMIC (type variable)
- ANY (boxed)
- unicode string (how terminated?)
- bytestring (null-terminated)
- bytestring (32-bit length prefixed)
- bytestring (8-bit length prefixed)
- bytestring (16-bit length prefixed)
- fixed point
- CODEPTR (as distinguished from an ordinary data pointer)
- type
- some of these? or is that too crazy? i think it's too crazy: https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats
- function
And some composite type constructors:
- array (homogenous)
- struct (which, here, i guess is the same as tuple/heterogenous array/tree and maybe S-expr if you forget about the linked-list-ness, since any subelement can be COMPOSITE)
- linked list (which, here, i guess is the same as S-expr, since any subelement can be COMPOSITE)
- discriminated union (which, here, i guess is the same as ADT, since any subelement can be COMPOSITE)
- nondiscriminated union
- option (yeah this can already be represented but it's probably worth pulling out into its own type)
- comptime
- annotation
This is only 8 items so far, out of 56 spots, so i think we'll have room for whatever we need (be sure to leave lots of RESERVEDs and CUSTOMs!).
We could have 16 RESERVEDs and 16 CUSTOMs, which means we have 32 slots to fill now, of which the above might be the first 26, which means we'd have 6 left. That's more like it.
i'm sure other language implementations must have stuff like this (internal semi-compact semi-standardized representations of composite data types), what are they? i don't think i've come across it.
i'm worried that this is getting much higher-level than i'd like for LOVM, however. Maybe a lot of this should be bumped up to the OVM level, and just left as CUSTOM here.
for LOVM, maybe all we need are 16 non-composite types, saving spots for COMPOSITE and function types for OVM as above:
maybe for ??? we let one composite type slip in there, probably bytestring of some sort. Null-terminated or length-prefixed? If length-prefixed, what is the bitwidth of the prefix (i'm guessing 32 bits, since we are 32-bit oriented here in LOVM)? We could have both null-terminated and length-prefixed if we drop int16 or uint16, and if we drop both int16 and uint16 then we can have one other thing too.
So for now let's just say:
yes, i think that's much better. All the composite type stuff is too complicated for the LOVM level.
But now if we can't pass function types, how do we do polymorphism?
One alternative is just to not do polymorphism (in which case we don't have to define types here at all). But another alternative is to do only OOP-style "receiver"-based polymorphism, in which dispatch takes only one primitive type argument. This is enough for what we really want polymorphism for at this level (generic arithmetic ops e.g. addition over all of int32, uint32, f64); and frankly, it's all that some target platforms will support anyhow (and more than many will support, which means that on those platforms we'll have to compile e.g. polymorphic ADD to a dispatcher)
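e.g., a sketch of what compiling polymorphic ADD down to a single-tag dispatcher could look like on such a platform (the type codes and monomorphic helper names are assumptions):

    T_INT32, T_UINT32, T_F64 = 0, 1, 2   # assumed type codes

    def poly_add(type_code, x, y):
        # 'receiver'-style dispatch on a single primitive type argument
        if type_code == T_INT32:
            return int32_add(x, y)       # hypothetical monomorphic helpers
        if type_code == T_UINT32:
            return uint32_add(x, y)
        if type_code == T_F64:
            return f64_add(x, y)
        raise ValueError("unsupported type code")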
---
how does Wren represent strings? interestingly, it has both a 32-bit length prefix AND a null terminator. Having the null terminator means that it can just directly call C library functions for basic string ops. It also has a hash value:
" A heap-allocated string object. struct sObjString { Obj obj;
// Number of bytes in the string, not including the null terminator. uint32_t length;
// The hash value of the string's contents. uint32_t hash;
// Inline array of the string's bytes followed by a null terminator. char value[FLEXIBLE_ARRAY];}; " -- https://github.com/wren-lang/wren/blob/main/src/vm/wren_value.h
this suggests to me that if we have a bytestring type, that we specify BOTH that:
- the length is stored with the string (e.g. a 32-bit length prefix)
- the bytes are followed by a null terminator
this lets us use whichever native primitive bytestrings the platform provides (if any).
bytestrings would just be defined as self-describing arrays of bytes (that is, an array-of-bytes data structure that contains the length of the array in the structure, so that you don't have to pass the length around separately). When you want byte arrays with zeros in them (which the null terminator would otherwise cut short), the programmer can just use pointers and keep track of the length themselves.
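a sketch of the resulting in-memory layout, Wren-style (32-bit length prefix, then the bytes, then a null terminator):

    import struct

    def make_bytestring(data: bytes) -> bytes:
        # [u32 length][bytes...][0x00]; length excludes the terminator
        return struct.pack("<I", len(data)) + data + b"\x00"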
---
yknow, regarding the issue of having the 'smallstack' be a register accessible from Boot, necessitating the concept of a Boot 'user code' profile, you could just put the smallstack registers above the 8 registers that Boot can see. My issue with that is that we really would like to do stack manipulation in the compact 16-bit instructions. But:
so, i'm still leaning towards just having a Boot 'user code' profile, but it's something to think about. The second best alternative seems to be (sketch below):
- use the zero registers to save the current smallstack depth, and a pointer to the memory where you put the smallstack and the other registers.
- use a higher register (>7) that Boot can't see to indicate 'if you see this in an operand, pop/push to smallstack instead of storing in a register'
- now the 16-bit encoding can't access SMALLSTACK at all, but Boot can use all of the 8 registers it can see (with the zero registers functioning as immutable zeros)
- we could even make the register containing the TOS a normal Boot register; this won't violate Boot semantics because Boot can't do anything to cause the stack to be pushed or popped, so within Boot program code this just acts as an ordinary register
- use some or all of those 64 8-bit instructions as compact ways to encode instructions that use the stack
- unfortunately the most obvious encoding (just offer every opcode in a stack-only form) probably isn't good enough b/c we also want to compactly have a mixture of stack and register arguments, and we have 64 potential opcodes and 64 8-bit instruction slots
- but 64 is probably still enough for the most commonly used cases, so it's probably fine
hmm after writing that down i like it more and more..
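a sketch of the 'higher register means smallstack' operand decode (the particular register number, 8, is an assumption):

    SMALLSTACK = 8   # first register number that Boot can't see (assumption)

    def read_operand(vm, reg):
        # reading the magic register pops the smallstack; anything else is a plain read
        return vm.smallstack.pop() if reg == SMALLSTACK else vm.regs[reg]

    def write_operand(vm, reg, val):
        # writing the magic register pushes the smallstack
        if reg == SMALLSTACK:
            vm.smallstack.append(val)
        else:
            vm.regs[reg] = val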
---
probably want an opaque calling convention. probably want an 'entry' or 'prolog' (function prolog) instruction that takes as an argument how many registers are needed (i guess separately for int and ptr regs), and an 'epilog' instruction to define the end of each function scope in the linear instruction stream. mb use something like GNU lightning's 'argument management' to access arguments: https://www.gnu.org/software/lightning/manual/lightning.html#The-instruction-set. Or maybe just define a convention there.
between these, and CALL, and RETURN, the calling convention can be implemented opaquely.
remember the old interop instructions: xentry xcall0 xcalli xcallp xcallii xcallmm xcallim xcallip xcallpm xcallpp xcall xcallv xlibcall0 xlibcalli xlibcallm xlibcallp xlibcall xlibcallv xret0 xreti xretp xpostcall
---
i GUESS we could sometimes use the 4th operand to just pass another argument, not always to pass type information
---
maybe CALL takes 4 arguments: the destination register, the address to call, the number of int registers in use (ie to be saved if the calling convention makes them caller-save), the number of ptr registers in use
otoh don't we want to pass type information in CALL, at least sometimes (mb a separate polymorphic call instr)?
hmm could split the # of registers operand into 2 4-bit operands. But that would mean that only 16 regs can be saved in between calls.. i guess the rest could be callee-saved tho..
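the split operand is just two nibbles; a sketch:

    def pack_reg_counts(n_int, n_ptr):
        # two 4-bit counts in one 8-bit operand; 16 regs max per bank
        assert 0 <= n_int <= 15 and 0 <= n_ptr <= 15
        return (n_int << 4) | n_ptr

    def unpack_reg_counts(op):
        return op >> 4, op & 0xF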
ENTRY would have a similar problem tho (need to specify how many int regs needed, how many ptrs needed, where to put the type info, where to put the link register). I guess 'where to put the type info' and 'where to put the link register' could be a convention; but specifying them would be nice because mb where you want to put them is 'on the stack'.
do we even want a pseudostack register for the MEMORY stack (MEMSTACK as opposed to the SMALLSTACK)?
also does ENTRY need to specify how many arguments there are? how many int args and also how many ptr args?
---
one thing to consider is not having all of the 255 'registers' mean the same thing, not just as a convention, but as fundamental semantics. Could have some or all of:
if there are only up to 16 caller-saved and 16 callee-saved regs, then you can specify how many you need of either one or the other of those classes, both int and ptr, in one 8-bit immediate operand.
---
yknow, first, we could just have a separate REGSAVE instruction rather than making the # of things to save argument(s) to CALL.
Second, this was not wrong above, but remember that the caller (CALL) only needs to specify how many of the caller-saved regs they need saved, and the callee (ENTRY/PROLOG) only needs to specify how many of the callee-saved regs they need saved.
---
if the CALL instruction doesn't take a destination register argument, then it could just be:
Where the arguments are, where the link register is, and where the return value(s) go could all be specified by the calling convention.
Although, to be fair, the type information could be specified by the calling convention just as easily.
---
arg regs and return regs are caller-saved
---
does the caller have to alloc shadow space on mstack? MS x86-64 convention has 32 bytes of caller-allocated shadow space. sysv x64 does not have this tho.
i think not. No shadow space, no 'red zone'.
---
if the convention for passing types to polymorphic functions is:
then you can make the 4th operand just be the 1st argument, and get an additional (code density) benefit from it when calling non-polymorphic functions.
---
one issue with a CALL3 mechanism is: how do you know whether the CALL3 operands are to be interpreted as integer registers or pointer registers? Seems like you would need to use the addressing modes (either of the "opcode", ie the function pointer, or of the operands) to disambiguate this. There are a lot of possibilities:
(note: you could mb just leave out the 0- and 1- argument forms (just pass immediate zeros for the arguments you don't need))
(note: you don't need immediate mode for pointers, do you? but maybe you do, just to pass null pointers? could also interpret that as PC-relative)
i guess we could redefine the operand addr modes for CALL3. Use 1 bit for int-or-ptr, and 2 bits for any 4 out of: immediate, register, smallstack offset, stack choice, register indirect. So this isn't too hard, it's just an additional complexity.
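a sketch of that redefinition, picking (arbitrarily) immediate/register/smallstack-offset/register-indirect as the 4 modes:

    CALL3_MODES = ('immediate', 'register', 'smallstack offset', 'register indirect')

    def decode_call3_operand_mode(bits3):
        # 1 bit: int (0) vs ptr (1) bank; 2 bits: addressing mode
        bank = 'ptr' if (bits3 & 0b100) else 'int'
        return bank, CALL3_MODES[bits3 & 0b11]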
also, note that unlike normal calls, CALL3 doesn't do any caller-saving, so the convention is different (everything is callee-save)
---
yknow, most callers probably won't need many or any of the caller-saved registers to be saved, as they will use the callee-save registers for anything that needs to be saved across calls, especially since there are so many callee-save registers available in this architecture. So:
ENTRY/prologue is more likely to need to save stuff (the callee-save registers), and so should probably include two full saving operands.
Similarly, RET will often have to restore the callee-save registers, and so should probably include two full restore operands in addition to the value being returned (or maybe the two values being returned, since we have an extra operand).
---
but doesn't RET usually need to pop off the stack too? which stack? both? so doesn't it need to be told how many to pop off?
no; i think in many conventions, the only thing on the stack are the args (and maybe the return addr), and the callER cleans up the stack. This does, however, suggest that CALL does actually need to know how many arguments there are.
---
So, so far we're looking at:
CALL &fn #argcount(4-bit each for int and ptr; so 16 args max per bank) arg0/type arg1 -- or: one operand for int argcount and one for ptr, and no arg1 -- we don't need it to do anything except store the current address in the link register, but it does whatever the platform calling convention needs it to do (e.g. push args onto the stack)
ENTRY (maybe: #argcount) callee-saves-used-int callee-saves-used-ptr ?? -- pushes the needed numbers of callee-saves registers onto the stack. Note: the first callee-saves ptr register is the link register, so if you are going to issue a CALL within this function but don't need any callee-saves registers for computation, then callee-saves-used-ptr should be 1, not 0.
RET (maybe: return_addr) return_value (maybe: #argcount) callee-saves-used-int callee-saves-used-ptr -- restores the callee-saves registers from the stack, puts return_value into the return register, then jumps to the link register (or, could leave the 'link register' on the stack but adjust the stack pointer, and then jump indirectly to the 'link register' value past the end of the stack)
---
so what if the platform doesn't provide as many callee-saved regs as we do? eg if we provide 64 callee-saved regs, and the platform provides 3, then we don't really want to save 61 regs upon CALL, because most of those probably aren't needed. So, we kind of do want CALL's operands to tell us how many int and how many ptr callee-saved registers are currently in use/actually need to be saved.
this suggests either:
CALL &fn #argcount(4-bit each for int and ptr; so 16 args max per bank) #callee-saves(4-bit each for int and ptr; so 16 max per bank) #caller-saves(4-bit each for int and ptr; so 16 max per bank)
or
CALL &fn #argcount(4-bit each for int and ptr; so 16 args max per bank) #callee-savesInt #callee-savesPtr
in option 1, arguments are caller-saved but they don't count in the #caller-saves count
i'm leaning towards the first variant.
so, we seem to have 48 registers per bank: 16 argument registers, 16 caller-saved, and 16 callee-saved.
we also presumably have some 'specials', such as the stack pointer. If we had 16 of these 'other things', then we'd have 64 registers per bank.
---
actually even in x86-64 there are a lot of situations where only 2 GB of code is allowed; see 13.3 64-bit memory models in Windows and 13.4 64-bit memory models in Linux and BSD and 13.5 64-bit memory models in Intel-based Mac (Darwin) in https://www.agner.org/optimize/calling_conventions.pdf
---
could also use Lua-style CALLs:
" CALLA B CR(A), ... ,R(A+C-2) := R(A)(R(A+1), ... ,R(A+B-1))Performs a function call, with register R(A) holding the reference to thefunction object to be called. Parameters to the function are placed in theregisters following R(A). If B is 1, the function has no parameters. If B is 2or more, there are (B-1) parameters.If B is 0, the function parameters range from R(A+1) to the top of the stack.This form is used when the last expression in the parameter list is afunction call, so the number of actual parameters is indeterminate.Results returned by the function call is placed in a range of registersstarting from R(A). If C is 1, no return results are saved. If C is 2 or more,(C-1) return values are saved. If C is 0, then multiple return results aresaved, depending on the called function. " -- http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf
actually we could provide both (both Lua-style calls, and calls that use a fixed set of registers for argument and return passing)...
---
yknow, instead of providing how many caller-saved registers to actually save upon each CALL, we could store this in special REGISTERS instead (since it will often be set upon ENTRY and stay constant across a function). Then you have fewer CALL operands and you can exceed the 16-register limit and use more of the potential 255 registers
so we're looking at:
CALL fnptr #argcount(4-bit each for int and ptr; so 16 args max per bank) arg0/type ? -- reads the #s of caller-saves from special registers, and saves them, then calls, then restores them
mb also CALLLUA (same arguments as CALL) -- same as CALL but instead of the arguments being the first 16 registers, they are taken from the registers immediately after the fnptr register
ENTRY caller-save-needed-int caller-save-needed-ptr callee-save-needed-int callee-save-needed-ptr -- puts all 4 args into special registers, and saves the callee-save-neededs (do we also need #args tho? mb, but probably not..)
RET return-value -- reads the #s of callee-saves from special registers, and restores them
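a sketch of how the three instructions cooperate under this variant (the special register names and the save/restore helpers are assumptions):

    def op_entry(vm, cs_int, cs_ptr, ces_int, ces_ptr):
        # stash all four counts in special registers...
        vm.sp_caller_saves = (cs_int, cs_ptr)
        vm.sp_callee_saves = (ces_int, ces_ptr)
        vm.push_registers('callee-save', ces_int, ces_ptr)  # ...and save callee-saves now

    def op_call(vm, fnptr, argcount, arg0):
        n_int, n_ptr = vm.sp_caller_saves      # counts set by our own ENTRY
        vm.push_registers('caller-save', n_int, n_ptr)
        vm.do_call(fnptr, argcount, arg0)
        vm.pop_registers('caller-save', n_int, n_ptr)

    def op_ret(vm, return_value):
        n_int, n_ptr = vm.sp_callee_saves
        vm.pop_registers('callee-save', n_int, n_ptr)
        vm.return_reg = return_value
        vm.jump(vm.link_reg)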
---
should we have IP-relative addressing so that position-independent code is ez? Or mb use some other register and do everything relative to that?
makes me go back to thinking about those X,Y registers from apple
http://www.obelisk.me.uk/6502/addressing.html http://www.emulator101.com/6502-addressing-modes.html http://skilldrick.github.io/easy6502/#addressing
could use our wealth of registers as a sort of 'zero page'. Do we want to allow indexing into the set of registers though? Probably not.
---
oo remember how pretty this used to be:
" initial commit Bayle Shanks authored 4 years ago
OVM is a virtual machine platform for programming language implementation. It is distinguished by a small, easy-to-port implementation, yet with the ability to preserve some high-level constructs when used as a target IL. Ovm is a prototype and is incomplete and buggy. There have not been any releases yet. Ovm is open-source (Apache 2.0 license).
Motivation

Ovm is being (very slowly) developed for the Oot programming language (which is also under development), to serve as:
- the language in which Oot will be implemented
- a target IL (intermediate language) for Oot programs
Ovm is motivated by Oot's aspirations to be an easy-to-port 'glue language' that can run under, interoperate with, and transpile to/from a variety of languages and platforms. The reason for Ovm's existence is that many other virtual machines:
- are too big: many VMs have large implementations and long specs, making them hard to port
- constrain control flow: many VMs and ILs are written to support a particular HLL (high-level language) and only support control flow patterns of that HLL (for example, some VMs/ILs have trouble with tail calls)
- are too low-level: on the other hand, there are some tiny VMs that consist of only a handful of assembly instructions and few or no registers. When HLL constructs are compiled into assembly, program structure can be difficult to recover; such a language would not be suitable as an IL for transpilation between HLLs
Ovm deals with these issues:
- Ovm has a small implementation. Much of its functionality is in its standard library, which is itself written in Ovm (so you don't have to port it)
- Ovm is assembly-like and supports arbitrary control flow patterns via computed GOTO and exposure of the callstack
- Ovm is slightly higher-level than assembly and provides enough registers for a simple mapping of HLL variables to registers in most cases, but more importantly, it is extensible, which allows the standard library to define some higher-level constructs
The implementation of even relatively simple and common operations as standard library functions makes them inefficient. The motivation of this is just to allow us to have a small core, for portability, and push most of the functionality into the standard library. An Ovm implementation which values efficiency over portability could simply override some of these standard library calls with native implementations. Ovm is structured to make this easy to do.
What's here

This repo contains:
- ovm: command-line frontend to pymovm
- pymovm: a minimal Ovm implementation (in Python)
- oasm: Ovm assembler
- odis: Ovm disassembler
- pyooblib: a Python library for working with .oob files (and .ob0 files and raw ovm bytecode files)
- cinstr.ob0 and syscall.ob0, part of the 'standard library' of Ovm
- docs: There are no docs yet. For now, see http://www.bayleshanks.com/proj-oot-ootAssemblyThoughts (and possibly the later items in the http://www.bayleshanks.com/proj-oot-ootAssemblyNotes* series eg http://www.bayleshanks.com/proj-oot-ootAssemblyNotes13 ).
Ovm understands three file types (raw and .ob0 formats are understood by the core OVM implementation, .oob format is supported by standard library functions):
- raw: raw Ovm bytecode
- .ob0: a simple module format containing a list of functions, their type signatures, their bytecode implementations
- .oob: "Oot bytecode", a more elaborate module format also containing imports, constant tables, signatures, optional debugging information, etc (not implemented yet).
Overview of the Ovm language

Ovm is an assembly-like 3-operand CISC memory-to-memory register machine with 2 primitive types, 4 primitive address modes, 16 primitive instructions, 256 registers, a data stack, and a call stack.

Primitive types:
- U16: unsigned 16-bit integer
- PTR: pointer
Primitive address modes:
- immediate
- register
- register indirect (the equivalent of LOAD/STORE)
- stack (post-increment/pre-decrement upon read/write; reading/writing to a stack pointer pops/pushes the stack)
Primitive instructions:
- annotate (not a real instruction, but an annotation of this point in the code)
- loadi (load immediate)
- cpy (copy)
- add-u16-u16 (addition on integers)
- sub-u16-u16 (subtraction on integers)
- leq-u16-u16 (<= comparison on integers)
- add-ptr-u16 (add an integer to a pointer)
- sub-ptr-u16 (subtract an integer from a pointer)
- call3 (call the function represented by a function pointer, passing in 2 inputs and getting 1 output)
- lea (load effective address: given an addressing mode and operand data, compute an effective address)
- syscall (call various language or system services)
- jzrel (conditional relative branch; branch if input is zero)
- jnzrel (conditional relative branch; branch if input is non-zero)
- j (absolute jump (or static, or immediate jump); jump to a fixed label)
- ji (indirect jump (or dynamic jump); jump to a pointer in memory)
- (maybe: swap?)
Ovm will provide compile-time type checking (not yet implemented). Ovm provides the following features which are atypical for assembly-like languages:
- Generic and ad-hoc polymorphism
- Single-instruction function calls to first-class functions: a memory location holding a reference to a function is called, passing in 2 inputs, and assigning one output
- Capability-based security (not yet implemented)
- Separate data and call stacks; also, the stack and the register file are first-class and may be swapped in a single instruction to allow concise 'caller saving' of registers, greenthread context switching, etc.
- Extensibility features:
  - custom instructions, written in Ovm
  - custom syscalls, written in Ovm
  - user-defined data structures with getters and setters written in Ovm (not yet implemented)
  - language hooks which call Ovm code (not yet implemented)
Ovm has three bytecode encodings, SHORT (16-bit fixed-length instructions; defined but not implemented yet), MEDIUM (64-bit fixed-length instructions), and LONG (under development; variable-length instructions composed of a 64-bit aligned stream of 16-bit words). " -- https://gitlab.com/oot/ovm
---
how many regs to allow the use of? we want to save some for the implementation, right?
since i kind of like 64 anyway, there's a nice justification for that; 64*4=256, so if we only allow 64 regs in each bank and if registers hold 32-bit words then the implementation can convert register number to byte number in a heap-stored register bank
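i.e., the whole bank fits in one 256-byte block and the conversion is a 2-bit shift:

    def reg_byte_offset(reg_num):
        # 0 <= reg_num < 64; registers are 32-bit (4-byte) words,
        # so the bank spans 64 * 4 = 256 bytes
        return reg_num << 2   # reg_num * 4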
---
i really like the spirit of QBE vs LLVM:
https://c9x.me/compile/doc/llvm.html
" QBE is a much smaller scale project with different goals than LLVM.
QBE is for amateur language designers.
It does not address all the problems faced when conceiving an industry-grade language. If you are toying with some language ideas, using LLVM will be like hauling your backpack with a truck, but using QBE will feel more like riding a bicycle.
QBE is about the first 70%, not the last 30%.
It attempts to pinpoint, in the extremely vast compilation literature, the optimizations that get you 70% of the performance in 10% of the code of full blown compilers.
For example, copy propagation on SSA form is implemented in 160 lines of code in QBE!
QBE is extremely hackable.
First, it is, and will remain, a small project (less than 8 kloc). Second, it is programmed in non-fancy C99 without any dependencies. Third, it is able to dump the IL and debug information in a uniform format after each pass.
On my Core 2 Duo machine, QBE compiles in half a second (without optimizations)."
it's probably not for us right now, for similar reasons as LLVM: (i) it's SSA (and right now we're looking for simplicity, not optimization); (ii) it's only for a few platforms (and right now we're looking for extreme and extremely easy portability)
but still interesting
---
two things to consider about CALLs and callee-save and caller-save variables:
1) the idea i had above of having the number of caller-save variables in use, and the number of callee-save variables in use, both be in special registers, works well in terms of this virtual machine, but if implemented straightforwardly has an implementation cost:
- either these numbers are actually stored in platform registers, consuming a bunch of registers, or they are stored on the stack or heap, causing an extra unnecessary memory access upon each CALL/RETURN
- this is because putting these things in registers makes them DYNAMIC variables, instead of statically known immediates
- they are saved/restored automatically with the other registers, so it's not a 'problem' in that sense, but it is a performance problem
- what these really should be are 'lexically scoped'; in this case 'linear lexical scope' would be enough, e.g. "any CALL/RET instruction after a PROLOG instruction and before the next EPILOG instruction takes its number-of-caller-saves/callee-saves from the values given in the PROLOG instruction"
- but, lexical scoping, even linear lexical scoping, is perhaps slightly more complex than we want this language to be?
- i dunno if it's actually a problem; is there really much of a complexity cost from this? If a compiler, you can still be doing single-pass; just have a few more state variables as you read thru the code that store the last encountered number-of-caller-saves/callee-saves (see the sketch after point 2). If an interpreter, you can just use extra registers, as above, even though there's a perf cost to that.
- still, even lexically linear stuff rubs me the wrong way a little. It seems it would be best to just have instruction operands in CALL and RETURN that contain the number of things that may have to be saved.
2) in actual register machines, the calling convention is just that, a convention, and programs have the option of using any register to pass arguments, not just the argument registers. However, in our case, some implementations might put some registers in actual platform registers and others on the stack or on the heap, so it might be simpler if we just made the caller-save/callee-save/argument register distinction an enforced property rather than a convention. That is, upon entry to the callee after a call, ONLY the argument registers are guaranteed to be preserved. Note that this strongly suggests that we should encapsulate callee-saving via an ENTRY/PROLOG instruction, so that implementations that wipe those registers anyway don't spend time redundantly saving the already-wiped values (as they would if saving were left up to the program to do explicitly).
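here's the single-pass compiler point from 1) as a sketch: 'linear lexical scope' just means carrying a couple of state variables while walking the instruction stream (emit is a hypothetical code emitter):

    def lower(code):
        # single pass; no scoping machinery needed, just two state variables
        n_caller = n_callee = 0
        for instr in code:
            if instr.op == 'PROLOG':
                n_caller, n_callee = instr.n_caller_saves, instr.n_callee_saves
            emit(instr, n_caller, n_callee)   # uses the most recent PROLOG's counts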
so so far, i'm thinking:
---
Another place that extra/special registers might be useful is in defining custom instructions. Again, though, that would allow custom instructions to be redefined dynamically, whereas we probably want them to be defined 'lexically linearly' instead.
---
still have to figure out if we're doing hardware-style CALLs, where the argument registers are a fixed group of registers, or Lua-style CALLs, where the argument registers are after the register holding the address to be jumped to. I'm thinking that we should do it hardware-style because maybe there's some reason they all do that, and even if not, there may be some impedance mismatch in translating to that (for example in tail calls), although i haven't thought it thru so i'm not really sure if there's a cost or not.
---
agner likes elf best: https://www.agner.org/optimize/calling_conventions.pdf
---
could still consider stuff like custom instructions, with a special register determining what the custom instructions do
---
an example of someone else's simple assembly language and simulator/debugger:

---
points out that labels can be the only thing in a line.
i think labels should be the only thing in a line, and those lines should always start with :
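that rule makes the assembler's line classification trivial; a sketch:

    def classify_line(line):
        line = line.strip()
        if line.startswith(':'):
            return ('label', line[1:])   # e.g. ':loop' defines the label 'loop'
        return ('instruction', line)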
---
the goal of LOVM is to be a portable 'assembly-like' language which is somewhat efficient and a somewhat good IL for transpilation:
---
consider having branch/load offset ranges like MIPS (whose 16-bit offsets are larger than RISC-V's 12-bit ones):
---
mb define an optional 'status register' (which CANT be used as a GPR, to ease efficient implementation) and an optional GPR 'accumulator' to (1) allow bigger immediates in special instr forms and (2) ease an orthogonal 8-bit instruction set
---
could specify that instruction boundaries must be consistent throughout the program, that is, you can't jump into the middle of an instruction as viewed by the alignment implied by decoding starting from the PC (and going forward or back) at any other time
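that property can be checked once at load time; a sketch (instr_length and jump_targets are hypothetical decoder helpers):

    def check_instruction_boundaries(code, entry=0):
        boundaries, pc = set(), entry
        while pc < len(code):
            boundaries.add(pc)
            pc += instr_length(code, pc)   # hypothetical: decode length at pc
        for target in jump_targets(code):  # hypothetical: enumerate static jump targets
            assert target in boundaries, "jump into the middle of an instruction"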
---
my calculation at the top of lowEndTargetsUnsorted5 regarding the NXP i.MX 6ULZ suggests that we stay under 512 bytes of per-task state.
in units of 32-bit words (4 bytes), that's 128 words.
so if we have 2 banks of 32 registers each, plus 2 smallstacks of 32 items each, that's already at the limit of 128 words, and we don't have room for metadata! So maybe that's too much.
It's still less than Erlang's processes though, which are at least 309 words [1] -- but that includes some non-stack heap as well as the stack (the process heap, which contains the stack, is 233 words).
so this suggests that we reduce the registers in each bank and each smallstack from 32 to 16. If we allow direct access to deep smallstack (indexed to TOS) as if it were another register bank, then a program can ignore the smallstack and effectively have 32 registers in each bank if it wants, which is enough.
So in each bank (before making space for the special registers) we'd have:
- 8 caller-save (temporary) GPRs
- 8 callee-save GPRs
- 16 smallstack GPRs
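checking that against the 128-word budget:

    2 banks * (8 + 8 + 16) registers/bank = 2 * 32 = 64 words = 256 bytes

which leaves 64 of the 128 words for the special registers and metadata.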
are arguments passed on the smallstack, or in the caller-save GPRs, or on the memory stack? With this setup we have more stack locs, so maybe on the smallstack.
---
so we're looking at:
one concern is that this proposal doesn't seem to provide enough caller-saves; in fact you want more caller-saves than callee-saves, not fewer. I suppose if arguments are passed on smallstack, though, those arguments can be thought of as caller-saves.
so maybe take the special registers out of callee-saves.
also, you probably want to interleave the caller- and callee-saves a little so that the 16-bit encoding, which can only easily access the first 8 regs, can access some of each. Just extend the Boot convention, i guess.
---
Looking ahead to the 64 8-bit encoding shortcuts, probably what we want to do is analyze and find the common things and then rationalize it so that we have a regular ISA subset. Initial thoughts, according to some stuff in my plChAssemblyFrequentInstructions:
Note that R1 thru R4 comprise the accumulator, TOS of SMALLSTACK, TOS-of-SMALLSTACK-with-implicit-push-pop, memstack pointer.
That's already all of our budget! But if it weren't, what else is common? Going thru everything in plChAssemblyFrequentInstructions, and writing down the stuff that appears in spot #1 in the various lists:
totals (top only): 10 mv, 11 ld, 2 lm, 2 addi, 2 call
writing down the stuff that appears in spot #2 in the various lists:
totals (top only): 3 mv, 8 ld, 3 lm, 5 cond branch or compare, 2 add
so i think it's clear that mov and ld are the most common, all things considered. After that maybe lm, and then maybe a conditional branch.
HOWEVER we won't have the offsets that these things usually do, so these might not be right for us, we'll have to see.
if we really do just have 32 MOVs between R1-R4 and 32 LDs between R1-R4, then we can completely regularize:
let's go with that for now, as this high degree of regularization may keep the interpreter code smaller
---
i think when making the calling convention we really need to think about making calling not very costly, a la Forth
---
And remember that if we only use 64 register specifiers then we can use four addressing modes even in the 32-bit encoding
LOVM can have a limited number of registers but OVM can have 255. LOVM can have four addressing modes, or 8 if we don't specify the register bank in the operand.
---
RISC-V says that they didn't use a CAS instruction because it would have 3 source operands (or 5, for double-wide CAS) (and apparently the maximal number of input operands over the entire instruction set determines how many ports are required on the register file, which makes hardware power and mb performance worse).
Here are some things with 3 source operands:
i think we want those in LOVM (and probably even in BootX?)
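for example, CAS itself: 3 source operands (address, expected, new) plus 1 output. a sketch of it as an interpreted instruction (atomicity is hand-waved here, and vm.mem is an assumed flat memory array):

    def op_cas(vm, addr, expected, new):
        # compare-and-swap: 3 sources (addr, expected, new), 1 output (the old value)
        old = vm.mem[addr]
        if old == expected:
            vm.mem[addr] = new
        return old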