(note: this file was created pretty late in the game; Boot notes used to be in the ootAssemblyNotes series, and maybe also the ootOvmNotes series)
---
---
---
if we have 2 banks of 7 regs each, that's more than the 8 regs in x86. But it's less than the 16 regs in x86_64, so i think it's okay.
ARM has 16 regs but note that this includes the stack pointer, link register, and PC. So i think we're okay but not 100% sure.
---
mb don't specify bitwidth, except that it's at least 32. Not a problem b/c e.g. 32-bit width can be emulated by the program by bitmasking (ANDing with 2^32 - 1, i.e. 0xFFFFFFFF) after each op. int32 ops in LOVM could compile to op+bitmasking in Boot.
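a minimal sketch of the op+bitmasking idea, in Python (whose ints are unbounded, standing in for a machine word of "at least 32" bits). The helper names add32, mul32 and to_signed32 are made up for illustration and are not Boot or LOVM instructions:

```python
MASK32 = (1 << 32) - 1  # 0xFFFFFFFF: keeps only the low 32 bits


def add32(a, b):
    """Wrapping 32-bit add emulated on a wider word: do the op, then mask."""
    return (a + b) & MASK32


def mul32(a, b):
    """Wrapping 32-bit multiply, same op-then-mask pattern."""
    return (a * b) & MASK32


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement int32."""
    x &= MASK32
    return x - (1 << 32) if x & (1 << 31) else x


if __name__ == "__main__":
    print(hex(add32(0xFFFFFFFF, 1)))
    print(to_signed32(0xFFFFFFFF))
```

the mask is all 32 low bits set; ANDing with a single power of two would keep only one bit.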
---
---
how does AArch64 do it?
https://stackoverflow.com/questions/44949124/absolute-jump-with-a-pc-relative-data-source-aarch64
says to "rely on the compiler generated literal pools as usual:
LDR x9, =0xBADC0FFEE0DDF00D BR x9 "
https://modexp.wordpress.com/2018/10/30/arm64-assembly/#branch also has a 'B' mnemonic tho
https://wiki.cdot.senecacollege.ca/wiki/AArch64_Register_and_Instruction_Quick_Start uses 'BR'
according to https://armkeil.blob.core.windows.net/developer/Files/pdf/graphics-and-multimedia/ARMv8_InstructionSetOverview.pdf , B has a range of +-128 MB; that's about 28 bits
so, seems like the AArch64 way is:
according to https://developer.arm.com/documentation/dui0473/c/writing-arm-assembly-language/literal-pools the literal pool offset from the instruction can be +-4k or +1k for 16-bit Thumb code.
mb see also discussion at https://stackoverflow.com/questions/41906688/what-are-the-semantics-of-adrp-and-adrl-instructions-in-arm-assembly
however, in Boot, since we may be compiling Boot to some other ISA, we don't like indirect jumps to loaded constants, because the compiler needs to be able to identify all potential jump target immediates to fix them up once the actual locations are known. But perhaps there's some advantage to grouping together these addresses into a 'literal pool'? Not sure.
---
okay i split jrel into 3 variants:
the idea is that:
a fancy assembler can offer a jrl pseudoinstruction and then just use the smallest one it can for each relative jump.
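a rough sketch of how such a jrl pseudoinstruction might pick a variant. The variant names and offset widths below are placeholders invented for this example (they are NOT the actual three jrel variants), and the function names are hypothetical too:

```python
# Hypothetical variants as (name, signed offset width in bits), smallest first.
# Placeholder values only; substitute the real jrel variants and widths.
VARIANTS = [("jrel_small", 8), ("jrel_medium", 16), ("jrel_large", 32)]


def fits_signed(value, bits):
    """True if value is representable as a two's-complement number of that width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def pick_jrl(offset):
    """Return the name of the smallest variant whose offset field can hold offset."""
    for name, bits in VARIANTS:
        if fits_signed(offset, bits):
            return name
    raise ValueError("offset %d does not fit any variant" % offset)


if __name__ == "__main__":
    for off in (100, -129, 40000):
        print(off, pick_jrl(off))
```

note that a real assembler would probably have to iterate: choosing a bigger variant for one jump moves the labels after it, which can change the offsets of other jumps.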
---
i guess one benefit to 'literal pools' is that literals can have natural alignment. In my current proposal, in the 16-bit or variable-length encoding, a 32-bit constant could be not in 32-bit alignment.
but, in the variable length encoding, you're always reading instructions which aren't aligned anyways, and in the 16-bit encoding, things are only 16-bit aligned which probably is smaller than the wordsize on the underlying platform anyway. And having a literal pool (a) means maybe more memory traffic, because the literal pool is not necessarily next to the instruction, and (b) might complicate the compiler, which has to fix up things that are not known to be jump targets except that some jump instruction refers to them (esp. if the referring jump is after the literal pool).
i am not enough of an expert in this stuff to know which of those is worse. For now i'll leave it like this.
---
riscv really does include the jump immediate in the instruction, mb we should too, for the 16 bit encoding?
done
---
could have a special register that is (or may be in some cases/is permitted to be) cleared at the end of each instruction except for li. that allows the implementation to treat li to that register as fused with the following instr, allowing the benefits of having some instructions being followed by an immediate value without actually doing that. or, could get a similar effect while still having the 'real' opcode first by constraining the data in the second part to be some special instruction (but then the data has to be parsed out...)
---
hmm i dunno i'm thinking again about the choice to have jump instructions that reference words in the instruction stream. won't this mess up hardware implementations bigtime? Because a hardware implementation can't assume that it has all the information it needs to execute an instruction after loading the instruction.
otoh i guess ARM literal pool load instructions like
LDR x9, =0xBADC0FFEE0DDF00D
'should' have the same problem, and they're fine. All i'm talking about is something like that, except that you load from the literal pool into the PC instead of into some other register. So imo it actually shouldn't be much harder for the chip designer.
so i think it's good.
---
removed this from undef behavior section (in 16 bits this is impossible):
---
removed/rewrote this:
Instructions with exactly one unsigned immediate operand may have any unsigned value from 0 thru 4194303, inclusive, in that operand. Instructions with two operands, where these are unsigned immediates, may have any value from 0 thru 63, inclusive, in the first operand, and any value from 0 thru 65535, inclusive, in the second operand. Instructions with three operands, where these are unsigned immediates, may have any value from 0 thru 63, inclusive, in the first operand and any value from 0 thru 255, inclusive, in each other operand. When an immediate operand type for this instruction is signed, unsigned ranges from 0 to 63 are replaced by signed ranges from -32 to 31, unsigned ranges from 0 to 255 are replaced by signed ranges from -128 to 127, unsigned ranges from 0 to 65535 are replaced by signed ranges from -32768 to 32767, and unsigned ranges from 0 to 4194303 are replaced by signed ranges from -2097152 to 2097151.
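the ranges in the removed text are what you get from operand fields of 22, 16, 8 and 6 bits (these widths are inferred from the numbers, not quoted from the spec). A small check, with made-up helper names:

```python
def unsigned_range(bits):
    """Inclusive (min, max) of an unsigned field of the given width."""
    return (0, (1 << bits) - 1)


def signed_range(bits):
    """Inclusive (min, max) of a two's-complement field of the given width."""
    return (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)


# Operand layouts implied by the ranges: one 22-bit operand,
# or 6 + 16 bits, or 6 + 8 + 8 bits. Each layout totals 22 bits.
LAYOUTS = {1: [22], 2: [6, 16], 3: [6, 8, 8]}

if __name__ == "__main__":
    for n_operands, widths in LAYOUTS.items():
        print(n_operands, [(unsigned_range(w), signed_range(w)) for w in widths])
```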
removed/rewrote this:
removed/rewrote this:
Note that the opcodes (see table below) have the following properties:
Note that later additions that assign instruction(s) to the RESERVED opcode may break the 'only such' parts of these properties.
The short descriptions under Instructions, above, have been kept to at most 80 characters per line.
---
nah, i have a soft spot for 16-bit
---
old:
---
Still reconsidering overlapping all the register banks, and adding a 6-element small stack to Boot. This would keep Boot implementable in 16 registers. It would mean that LOVM would have 32+32 = 64 items of state = 512 bytes of state instead of 1k+fp_regs like in the current proposal (assuming each register is 64 bits). Semantically we would still ban integers and pointers from mixing -- so an implementation could choose to keep those registers separate if it wanted. Would we have one small stack or two? The overlap way of doing things would say one I guess. Don't know if this is worth it because Boot is in some ways actually less expressive because it loses the ability to directly address 16 registers -- although it gains the ability to have more than eight registers of each type. The benefit that I like the most is that now if you want a small stack you don't have to change the semantics of Boot at all when moving to LOVM. A tangential benefit is that since it's easy to reduce the size of the stack to six, you can give the platform two more hidden registers (one as a small stack TOS ptr and one as a temporary). I don't know man. This would make type checking require extra work, and it goes against the principle that more registers is better in LOVM. Could just provide the small stacks anyway.
---
Still considering providing two small stacks in boot.
done
---
i'm still not happy with the lack of zero-operand instructions for the smallstacks. The 8-bit encoding isn't able to help much here because it has so few bits, and because i really want to be able to copy/load from/to any of the first 4 registers. The main value of the stack in this design is only that it gives access to a few more 'registers' without having to make operands bigger (10 registers instead of 7; it could be more but for now the 10 is chosen because i'm worried about implementations on 32-register machines having enough hidden registers for their own use).
an alternative would be to just make the 64 8-bit ops be the 64 opcodes, with all operands assumed to be the pseudostack register -- and then to special case the obviously useless ops (e.g. cp from/to the smallstack is useless)
---
the motivation for requiring that stack depth be statically knowable at each instruction of the program is so that a compiler can map the stack locations to fixed register locations.
The reason we only count variance at the end of an instruction is so that ENTRY at the beginning of a function can clean it up.
---
actually that presents a bit of a conundrum.
first, if we disallow smallstack depth variance at the end of an instruction, then immediately after a RET instruction (which restores the stack to the way it was before ENTRY), we'll have forbidden depth variance at that moment. The better solution would seem to be to exempt one of ENTRY or RET from the condition (ENTRY if the condition is enforced before the instruction, RET if after).
Say that stack locations TOS, TOS-1, TOS-2 are mapped to registers R3, R2, R1 in function1. Now function1 calls function2. So at this point the compiler would want to replace TOS with R3 in function2.
But if function3 has an empty stack when it calls function2, then it will have stack location TOS mapped to R1. So the compiled code for function2 conflicts with this.
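a toy illustration of that conflict, assuming a bottom-indexed mapping where the bottom stack slot is R1, the next is R2, and so on (register names follow the example above; the helper names are made up):

```python
def reg_for_slot(index_from_bottom):
    """Bottom-indexed mapping: slot 0 (bottom of stack) -> R1, slot 1 -> R2, ..."""
    return "R%d" % (index_from_bottom + 1)


def tos_register(depth):
    """Register holding TOS when the stack holds `depth` items (depth >= 1)."""
    return reg_for_slot(depth - 1)


def first_push_register(caller_depth):
    """Register that the callee's first push lands in, given the caller's depth."""
    return reg_for_slot(caller_depth)


if __name__ == "__main__":
    # function1 calls function2 with 3 items on the stack: TOS is R3.
    print(tos_register(3), first_push_register(3))
    # function3 calls function2 with an empty stack: the first push lands in R1.
    print(first_push_register(0))
```

so the register that a given stack slot of function2 lives in depends on who called it, which is why one fixed compiled body for function2 doesn't work under this mapping.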
A solution would be to make the entire stack caller-save, and require the stack to be empty upon each function call (except perhaps for arguments passed on the stack). So make the caller push everything from the smallstack to memstack as part of each calling sequence. But this negates some of the benefits of having a stack (quick function calls).
What we'd like instead to do is to allow the implementation to do some behind-the-scenes register window stuff to spill from the bottom of the smallstack if it wants, and maintain the stack as a stack rather than mapping it to fixed registers, while exposing to the program the illusion that it was emptied after the function call. This mojo would occur upon encountering the ENTRY instruction, so we'd make the requirement for a constant stack apply at the beginning, not end, of instructions, and relax that requirement for a constant stack.
Another way of doing this (which i think i like better) is that if the smallstack doesn't all fit in registers, the part in registers is merely the top few items. By indexing from the top instead of the bottom we avoid this problem, but we get another one: pushing or popping the smallstack requires us to do n extra MOVs, where n is the number of stack items in registers.
And a third way is for the implementation to have indirection when accessing the smallstack, rather than mapping it to a fixed register.
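a sketch of the second option (index from the top, keep only the top few items in registers), to show where the extra MOVs come from. The class and field names are invented for this example, and the spill/refill policy is just one plausible choice:

```python
class TopWindow:
    """Smallstack whose top n items live in 'registers'; the rest spill to memory.

    regs[0] is always TOS, regs[1] is TOS-1, and so on, so every push or pop
    has to shift the resident items by one register.
    """

    def __init__(self, n):
        self.regs = [None] * n  # register-resident part, regs[0] = TOS
        self.spill = []         # memory-resident part, deepest items
        self.count = 0          # number of items currently in regs
        self.movs = 0           # register-to-register moves performed

    def push(self, value):
        if self.count == len(self.regs):
            self.spill.append(self.regs[-1])  # deepest resident item goes to memory
        else:
            self.count += 1
        for i in range(self.count - 1, 0, -1):  # shift residents away from TOS
            self.regs[i] = self.regs[i - 1]
            self.movs += 1
        self.regs[0] = value

    def pop(self):
        if self.count == 0:
            raise IndexError("pop from empty smallstack")
        value = self.regs[0]
        for i in range(self.count - 1):  # shift residents toward TOS
            self.regs[i] = self.regs[i + 1]
            self.movs += 1
        if self.spill:
            self.regs[self.count - 1] = self.spill.pop()  # refill from memory
        else:
            self.regs[self.count - 1] = None
            self.count -= 1
        return value


if __name__ == "__main__":
    w = TopWindow(3)
    for v in (1, 2, 3, 4):
        w.push(v)
    print(w.regs, w.spill, w.movs)
```

the cost is roughly one MOV per resident item on each push or pop, which is the overhead that the indirection option trades for an extra index lookup.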
actually i think we should remove the restriction on smallstack depth variance:
so, (in almost all cases) the (entire) stack cannot be statically assigned to fixed registers (without any indirection).
---
we could encode the branch and jump targets to avoid useless stuff like zero offsets from the next instruction, but:
---
removed from spec (this should go in a design doc, not in the spec):
The 0 in the most-significant bit is intended to allow Boot to be made a part of other instruction formats which use 0 in this bit to indicate a Boot instruction, a 1 to indicate something else (for example, instructions of different lengths). That is to say, instructions with a 1 in the most-significant bit are reserved for extensions.
--- design
do we implement labels in Boot assembly? i'm thinking not. LOVM will have a more usable assembler, Boot is just a target language.
---
i decided to leave them embedded in the instruction stream. Otherwise labels could only be loaded with an 8k range, which would either make implementations jump thru hoops by doing things like branching to somewhere near a desired label just to collect the label, and then branching back (at presumably great cost to performance); or it would mean that the Oot implementation would have to be careful never to compile to something with more than 8k within a function (LLVM only allows indirect branching within a function), which i'm not sure would be feasible.
As a bonus, now we can assume that any Boot implementation supports larger programs.
We can have a BootX?