:hardbreaks:
:toc: macro
:toclevels: 1
Version: unreleased/in-progress/under development (0.0.0-0)
THIS DOCUMENT IS OUT OF DATE. The markdown version, boot_reference.md, has replaced it (although that's not finished yet).
Boot is a low-level 'assembly language' virtual machine (VM) that is easy to implement.
toc::[]
TODO: rewrite to match the current design. The pyboot implementation, particularly the instruction listings in constants.py, is (mostly) more up to date than this document. Note also that a simpler 32-bit encoding has been introduced; this 16-bit encoding has been made an optional 'compressed' encoding (maybe, since it's optional, it should be moved to BootX?). Later: actually, I'm now thinking of simplifying even further by removing stacks and adding floating point; see ootAssemblyNotes26.txt.
TODO: actually, let's go back to defining int32 as '32 bits OR MORE' and make integer overflow fully undefined behavior. This is because signed integer overflow is undefined behavior in C, and we want to allow porters to compile arithmetic straightforwardly to their unchecked C equivalents; and also because [1] says that in many cases it is much more efficient for variables to be register width. I also made a note about this at the end.
Boot's purpose is to promote the portability of OVM (Oot Virtual Machine) by serving as a portable target language underneath it.
The reason for writing a new assembly language, rather than adopting LLVM, WebAssembly, or similar, is that we want the entire Oot toolchain to run on top of many platforms, including 'on top of'/within existing high-level languages such as Python. This means that the entire Boot implementation needs to be quick and easy to port (not just retargeted to a new backend, but ported in its entirety), and it needs to support opaque platform-native pointers of unknown size.
Highlights:
Instruction encoding: 16 bits, little endian. The least significant bit (the 'instruction encoding length format bit') is always 0.
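As a minimal sketch of the encoding rule above, the following Python reads one 16-bit little-endian instruction word from a byte string and checks that the format bit is 0. The function name `read_instruction` and the choice to raise `ValueError` are assumptions made for illustration; they are not taken from pyboot.

```python
def read_instruction(code: bytes, offset: int) -> int:
    """Read one 16-bit little-endian instruction word starting at byte `offset`.

    Hypothetical helper for illustration only; not part of pyboot.
    """
    if offset < 0 or offset + 2 > len(code):
        raise ValueError("truncated instruction")
    # Little endian: the first byte holds the low-order 8 bits.
    word = int.from_bytes(code[offset:offset + 2], "little")
    # The least significant bit is the 'instruction encoding length format bit'
    # and is always 0 for this 16-bit encoding.
    if word & 1:
        raise ValueError("format bit is 1: not a 16-bit Boot instruction")
    return word
```

For example, the bytes `0x34 0x12` are read as the word `0x1234`, whose least significant bit is 0.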
Four formats:
For each format, from LSB to MSB, the fields are:
In graphical form (from MSB to LSB) (0,1 = constants, o = opcode, x = operand 2, y = operand 1, z = operand 0):
I-formats:
R-formats:
P-formats:
TODO: out of date; this is now the 'compact' format, which should go in a different document.
Note that some implementations could choose to consider the subinstructions of otwo, oone, ozero as separate encoding formats.
Note that some implementations could choose to consider the I-format as consisting of two subformats, depending on how many bits the immediate constant has.
The I11 and I8 instructions are:
Datatypes: int32, ptr
Registers: two banks of 16 registers each; one for int, one for ptr. The first 5 registers in each bank are special. The next 3 are GPRs. Some operations are only available on the first 8 registers.
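A minimal sketch of this register layout in Python, for orientation only. The names `RegisterFile`, `register_class`, and `usable_in_restricted_operand` are hypothetical, and the label `"unspecified"` for registers 8..15 reflects that this section does not state their role.

```python
NUM_REGISTERS = 16   # per bank
NUM_SPECIAL = 5      # the first 5 registers in each bank are special
NUM_LOW = 8          # some operations can only address the first 8 registers


class RegisterFile:
    """Two separate banks: one for int32 values, one for opaque pointers."""

    def __init__(self) -> None:
        self.ints = [0] * NUM_REGISTERS
        # Pointers are opaque platform-native values; None is a placeholder here.
        self.ptrs = [None] * NUM_REGISTERS


def register_class(index: int) -> str:
    """Classify a register index within either bank (labels are assumptions)."""
    if not 0 <= index < NUM_REGISTERS:
        raise ValueError("register index out of range")
    if index < NUM_SPECIAL:
        return "special"
    if index < NUM_LOW:
        return "gpr"
    return "unspecified"  # the role of registers 8..15 is not stated in this section


def usable_in_restricted_operand(index: int) -> bool:
    """True if the register is among the first 8, reachable by every operation."""
    return 0 <= index < NUM_LOW
```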
Instructions:
TODO: add the other new instructions
| Category | Instructions |
|---|---|
| annotation | ann |
| constants | ldi sai |
| loads and stores and moves | ld ldh ldhu ldb ldbu ldp st sth stb stp cp cpp |
| arithmetic of ints | add addi sub mul |
| bitwise arithmetic | sl sru srs and or xor |
| arithmetic of pointers | aip |
| comparison control flow | beq beqp bne blt |
| control flow | jpc jy |
| concurrency | CAS fence acquire release |
| misc | more |
| sys | sysinfo |
| RESERVED | 3 R instructions, 1 I instruction |
| RESERVED | 6 two-argument instructions |
| RESERVED | 7 one-argument instructions |
| RESERVED | 8 zero-argument instructions |
| RESERVED | 1 instruction |
Other ideas: in out push pop dup drop swap over read write halt impdep poll sysinfo xcall devop, CAP register/stack cmov select call ret malloc dealloc realloc
In the following, #X is an immediate constant, $X is an integer register, &X is a pointer register, and x is an untyped argument. All immediate constants are signed two's-complement ints.
From left to right, the arguments go into operands op0, op1, op2. Immediate operands are always on the right (the highest-numbered operand). When two immediate operands are combined into an #imm6 (as with instruction jd), op1 supplies the high-order bits and op2 the low-order bits (imm6 = (op1 << 3) + op2).
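The imm6 formula can be sketched in Python as follows. This assumes each operand field is 3 bits wide (implied by two operands forming a 6-bit immediate) and that the combined value is then read as a signed two's-complement number, per the statement that all immediates are signed; the helper names `combine_imm6` and `sign_extend` are hypothetical.

```python
def combine_imm6(op1: int, op2: int) -> int:
    """Combine two 3-bit operand fields into a raw 6-bit value.

    op1 supplies the high-order bits and op2 the low-order bits:
    imm6 = (op1 << 3) + op2.
    """
    if not (0 <= op1 <= 7 and 0 <= op2 <= 7):
        raise ValueError("operand fields are assumed to be 3 bits wide")
    return (op1 << 3) + op2


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a signed two's-complement int."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & sign_bit else value
```

With op1 = 0b101 and op2 = 0b011, the raw field is 0b101011 (43); read as a signed 6-bit value, that is -21.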
The branch immediates have a special encoding, because -2, -1, and 0 are not needed: at the time a branch instruction executes, the PC points to the beginning of the instruction after the branch. Since instructions are 16 bits, -2 would branch right back to the branch instruction itself, causing an infinite loop; -1 would branch into the middle of the branch instruction; and 0 would end up at the same place as if there were no branch. Therefore the branch instructions interpret the values 0..7 of op2 as the offsets [-6, -5, -4, -3, 1, 2, 3, 4], in that order. The program counter is in units of bytes and instructions are 16 bits (2 bytes), so a branch immediate of +2 skips over one instruction, +4 skips over two instructions, -4 skips backwards two instructions, and -6 skips backwards three instructions. The odd-numbered branch immediates (-5, -3, 1, 3) are illegal in Boot but are present for compatibility with BootX, which is a variable-length instruction set including some 1-byte instructions.
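The branch immediate mapping can be sketched as a lookup table. In this Python sketch the names `BRANCH_OFFSETS`, `decode_branch_offset`, and `branch_target` are hypothetical, and rejecting odd offsets with `ValueError` is one possible way for a Boot-only implementation to treat the encodings that are illegal in Boot.

```python
# Index is the 3-bit value of op2; entry is the byte offset relative to the PC,
# which already points at the instruction after the branch.
BRANCH_OFFSETS = (-6, -5, -4, -3, 1, 2, 3, 4)


def decode_branch_offset(op2: int) -> int:
    """Map the 3-bit op2 field of a branch to its byte offset."""
    if not 0 <= op2 <= 7:
        raise ValueError("op2 must be a 3-bit value")
    offset = BRANCH_OFFSETS[op2]
    # Odd offsets exist only for BootX compatibility and are illegal in Boot.
    if offset % 2 != 0:
        raise ValueError("odd branch offset is illegal in Boot")
    return offset


def branch_target(pc_after_branch: int, op2: int) -> int:
    """Compute the byte address a taken branch jumps to."""
    return pc_after_branch + decode_branch_offset(op2)
```

For a branch located at byte 8, the PC during execution is 10; an op2 value of 5 decodes to +2, so the target is byte 12, which skips the one instruction at byte 10.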
Jump and branch offsets are in units such that an offset of 2 represents one Boot instruction. Platforms that represent Boot code in memory such that one instruction spans more or fewer than 2 memory locations should adjust the jump and branch offsets accordingly when executing the instructions jpc, beq, beqp, bne, and blt, and should also report the correct INSTRUCTION_SIZE in SYSINFO. Program code that dynamically computes code pointers by adding integers to the PC or to other code pointers should be sure to take INSTRUCTION_SIZE into account.
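One way a platform might perform this adjustment is sketched below in Python. The function name `scale_offset` is hypothetical, and the sketch assumes that INSTRUCTION_SIZE is the number of platform memory locations occupied by one instruction, which is an interpretation of the paragraph above rather than a confirmed definition.

```python
def scale_offset(boot_offset: int, instruction_size: int) -> int:
    """Convert a Boot jump/branch offset into platform memory locations.

    A Boot offset of 2 means one instruction, so the offset is first turned
    into an instruction count and then multiplied by the platform's
    INSTRUCTION_SIZE (assumed: memory locations per instruction).
    """
    if instruction_size <= 0:
        raise ValueError("instruction_size must be positive")
    if boot_offset % 2 != 0:
        raise ValueError("odd offsets do not address a whole Boot instruction")
    return (boot_offset // 2) * instruction_size
```

For instance, on a hypothetical platform that stores each instruction in a single list slot (INSTRUCTION_SIZE of 1), a Boot offset of +4 becomes +2 slots; with an INSTRUCTION_SIZE of 2 the offset is unchanged.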
meta:
constants:
register loads and stores and moves:
stack and register/stack loads and stores and moves:
(note: you can use 'cp' and 'cpp' to achieve DUP, and by also using the zero register, DROP)
arithmetic of ints:
bitwise arithmetic:
arithmetic of pointers:
(?: I'm not sure if these are worth including; the idea is to make up for the fact that the platform word size and pointer size are not known at compile time. Note also that a fancy VM could easily search for these at load time and substitute in the appropriate platform-specific immediate constants. These also serve to provide an ADDI on pointers, which is probably very useful.)
conditional branches:
unconditional jumps: