- Boot (Oot Boot) reference
Version: unreleased (0.0.0-0)
Boot is a low-level 'assembly language' virtual machine (VM) that is easy to implement.
- Introduction
Boot's purpose is to promote the portability of LoVM (Low Virtual Machine) by serving as a portable target language underneath it.
The reason for writing a new assembly language, rather than adopting LLVM or WebAssembly or similar, is that we want the entire Oot toolchain to run on top of many platforms, including 'on top of'/within existing high-level languages such as Python. This means that the entire Boot implementation needs to be quick and easy to port (not just retarget to a new backend, but entirely port), and it needs to support opaque platform-native pointers of unknown size.
Highlights:
- 3-operand fixed-length register machine
- signed 32-bit integers
- opaque pointer representation
- 15 integer registers and 15 pointer registers, and a zero and a null pointer register
- about 100 instructions, plus optional instructions
- RISC-like; no addressing modes (or rather, each instruction has one fixed addressing mode), and only a few instructions access (non-register) memory
- instructions for compare-and-swap and memory fence
- instructions for I/O and memory allocation
- Cheatsheet
instruction encoding: 32 bits: 3 operands (1 byte each), and a 1-byte opcode. The first operand is restricted to a maximum value of 63 (unsigned), so the first two bits of each instruction are guaranteed to be 0.
Datatypes: int32, ptr
Registers: two banks of 16 registers each; one for int, one for ptr. The first register in each bank is constant zero.
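As an illustration of the encoding above, here is a minimal Python sketch of packing and unpacking one instruction. The byte order is an assumption: op0 is taken to be the most significant byte (so that its two unused high bits are the first two bits of the instruction) and the opcode the least significant byte. The names encode and decode are hypothetical helpers, not part of Boot.

```python
def encode(op0, op1, op2, opcode):
    """Pack three operand bytes and an opcode byte into one 32-bit word.

    Assumed layout (not fixed by the text above): op0 | op1 | op2 | opcode,
    with op0 in the most significant byte.
    """
    if not 0 <= op0 <= 63:
        raise ValueError("op0 must be in 0..63")
    for name, value in (("op1", op1), ("op2", op2), ("opcode", opcode)):
        if not 0 <= value <= 255:
            raise ValueError(name + " must be in 0..255")
    return (op0 << 24) | (op1 << 16) | (op2 << 8) | opcode


def decode(word):
    """Split a 32-bit instruction word into (op0, op1, op2, opcode)."""
    if not 0 <= word < (1 << 32):
        raise ValueError("instruction must fit in 32 bits")
    op0 = (word >> 24) & 0xFF
    if op0 > 63:
        raise ValueError("top two bits of an instruction must be 0")
    return op0, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF
```

Under this assumed layout, any valid word is below 2^30, which is one way an implementation could cheaply reject malformed instructions.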
Instructions:
Boot instructions fall into three categories:
- Small profile: These can be easily ported almost anywhere
- Standard profile: This is what OVM requires. This profile adds integer division, reading the program counter, indirect branching to a previously read program counter value, floating point, atomics, constant tables, and 'systems' instructions for allocating memory, I/O and filesystem operations, interoperation, querying metadata about platform capabilities, and logging.
- Optional instructions: These are not required but can be added, either to expose additional facilities to Boot programs, or to provide more efficient native implementations of certain operations.
small profile (47 instructions; all opcodes are below 128):
== |
---|
annotation | ann |
constants | li sai |
loads and stores and moves | cp cpp lb lbu lh lhu lp lw sb sh sp sw pushi popi pushp popp |
arithmetic of ints | add sub mul addi add32 sub32 mul32 addi32 |
bitwise arithmetic | and or xor not sll srs sru |
arithmetic of pointers | addp addpi addppi addpwi pcmp |
non-branching conditionals | cmovi cmovip cmovpp |
comparison control flow | bne bnep blt beq beqp |
other control flow | jrel jt |
misc | ann halt |
== |
---|
TODO unsigned integer arithmetic or conversions?
standard profile adds (52 instructions, for 92 total; all opcodes are below 128):
== |
---|
integer division | div rem |
floating point | i2f lf sf ceil flor trunc nearest addf subf mulf divf copysign bnan binf beqf bltf fcmp |
atomics (sequential consistency) | lpsc lwsc spsc swsc casrmwsc casrmwpsc fencesc |
constants and constant tables | lkp lkpb jk lkf |
memory allocation | malloc_shared malloc_local mfree mrealloc |
I/O | in out inp outp devop |
other control flow | jy lpc |
misc | sysinfo log |
== |
---|
optional instructions (all opcodes are 128 or greater):
== |
---|
atomics (release consistency) | lprc lwrc sprc swrc casrmwrc crsprmwrc |
atomics (relaxed consistency) | lprlx lwrlx sprlx swrlx casrmwrlx crsprmwrlx |
atomic additional rmw ops (sequential consistency) | addrmwa aprmwa andrmwa orrmwa xorrmwa addrmwa |
atomic additional rmw ops (release consistency) | addrmwrc aprmwrc andrmwrc orrmwrc xorrmwrc addrmwrc |
atomic additional rmw ops (relaxed consistency) | addrmwrlx aprmwrlx andrmwrlx orrmwrlx xorrmwrlx addrmwrlx |
additional floating point | remf sqrt minf maxf powf bnonfinite bgtf beqtotf blttotf ftotcmp |
implementation-defined | impl1 thru impl16 |
filesys | read write open close seek flush poll |
interop | xentry xcall0 xcalli xcallp xcallii xcallmm xcallim xcallip xcallpm xcallpp xcall xcallv xlibcall0 xlibcalli xlibcallm xlibcallp xlibcall xlibcallv xret0 xreti xretp xpostcall |
misc | break |
In the following, #X is an immediate constant, $X is an integer register, &X is a pointer register, _ is an unused argument that must always be 0 in proper Boot programs (Boot implementations are free to make use of these locations however), and X is an untyped argument. All immediate constants are signed two's-complement ints.
From left to right, the arguments go into operands op0, op1, op2. Immediate operands are always on the right (the highest-numbered operand). When two immediate operands are combined into an #imm16 (as with instruction li), op1 is the high-order bits and op2 is the low order bits (imm16 = (op1 << 8) + op2). Similarly for #imm24 (imm24 = (op0 << 16) + (op1 << 8) + op2).
JREL and branch immediates are in units of bytes. JREL and branch immediates which are not divisible by 4 are illegal (because Boot instructions are always 32-bit aligned). Platforms which represent Boot code in memory in such a way that one instruction spans more or fewer than 4 memory locations should adjust the jrel and branch offsets accordingly before executing them.
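The operand rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: operand bytes are treated as unsigned values 0..255 when combined, and the combined field is then read as a two's-complement number. The helper names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def imm8(op2):
    """Signed 8-bit immediate held in the last operand."""
    return to_signed(op2, 8)


def imm16(op1, op2):
    """Signed 16-bit immediate: op1 is the high byte, op2 the low byte."""
    return to_signed((op1 << 8) + op2, 16)


def offset_in_instructions(offset_bytes):
    """Convert a jrel/branch byte offset to a count of 32-bit instructions.

    Offsets that are not a multiple of 4 are illegal.
    """
    if offset_bytes % 4 != 0:
        raise ValueError("branch offset must be divisible by 4")
    return offset_bytes // 4
```

A platform that stores one instruction per memory cell, for example, could use offset_in_instructions to rescale offsets before executing a jump.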
constants:
- li $dest #imm16: LoaD 16-bit Immediate int constant
- sai $dest #imm16: Shift left 16-bits then Add 16-bit Immediate int constant
- lkp &dest #imm16: LoaD K-th Ptr constant into &dest
- lkpb #imm24: LoaD K-th Ptr constant into &3
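One use of li followed by sai is to build a full 32-bit constant from two 16-bit immediates. The Python sketch below models this under assumptions not stated above: sai computes ($dest << 16) + #imm16 with the result wrapped to 32 bits, and both immediates are signed. Because the low half is signed, the high half needs a carry adjustment whenever the low half is negative. All function names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def li(imm16):
    """Model of li: the register receives the signed 16-bit immediate."""
    return to_signed(imm16, 16)


def sai(reg, imm16):
    """Model of sai (assumed semantics): (reg << 16) + imm16, wrapped to 32 bits."""
    return to_signed((reg << 16) + to_signed(imm16, 16), 32)


def split_constant(c):
    """Return signed immediates (hi, lo) such that sai(li(hi), lo) == c."""
    c = to_signed(c, 32)
    lo = to_signed(c, 16)
    hi = to_signed((c - lo) >> 16, 16)  # c - lo is a multiple of 65536
    return hi, lo
```

For example, split_constant(0x1234FFFF) yields a high half of 0x1235 and a low half of -1, since adding -1 after the shift borrows from the upper 16 bits.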
register loads and stores:
- lw $dest &addr #imm8: LoaD 32-bit int from memory (&addr + #imm8) to register
- lh $dest &addr #imm8: LoaD 16-bit int from memory (&addr + #imm8) to register
- lhu $dest &addr #imm8: LoaD 16-bit unsigned int from memory (&addr + #imm8) to register
- lb $dest &addr #imm8: LoaD 8-bit int from memory (&addr + #imm8) to register
- lbu $dest &addr #imm8: LoaD 8-bit unsigned int from memory (&addr + #imm8) to register
- lp &dest &addr #imm8: LoaD Pointer from memory (&addr + #imm8) to register
- sw &addr $src #imm8: STore 32-bit int from register to memory (&addr + #imm8)
- sh &addr $src #imm8: STore 16-bit int from register to memory (&addr + #imm8)
- sb &addr $src #imm8: STore 8-bit int from register to memory (&addr + #imm8)
- sp &addr &src #imm8: STore pointer from register to memory (&addr + #imm8)
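The difference between the signed and unsigned narrow loads can be sketched in Python. This is an illustration only: memory is modeled as a bytes object indexed by a plain integer (real Boot pointers are opaque), byte order is assumed to be little-endian, and lb/lh are assumed to sign-extend while lbu/lhu zero-extend. The load helper is hypothetical.

```python
def load(mem, addr, imm8, size, signed):
    """Read `size` bytes at (addr + imm8) and widen to an integer.

    signed=True models lb/lh/lw (sign extension);
    signed=False models lbu/lhu (zero extension).
    Little-endian byte order is an assumption of this sketch.
    """
    start = addr + imm8
    return int.from_bytes(mem[start:start + size], "little", signed=signed)


mem = bytes([0xFF, 0x80, 0x00, 0x00])

lb_value = load(mem, 0, 0, 1, signed=True)    # byte 0xFF read as signed
lbu_value = load(mem, 0, 0, 1, signed=False)  # same byte read as unsigned
lh_value = load(mem, 0, 0, 2, signed=True)    # halfword 0x80FF read as signed
lhu_value = load(mem, 0, 0, 2, signed=False)  # same halfword read as unsigned
```

Here the same stored byte 0xFF widens to -1 with the signed load and to 255 with the unsigned load.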
moves:
- cp $dest $src _: CoPY int from register to register (equivalent to addi $dest $src 0)
- cpp &dest &src _: CoPY Pointer from register to register (equivalent to addpi &dest &src 0)
arithmetic of ints (result is undefined if it does not fit in 32 bits):
- add $dest $src1 $src2: $dest = $src1 + $src2
- addi $dest $src1 #imm8: $dest = $src1 + #imm8
- sub $dest $src1 $src2: $dest = $src1 - $src2
- mul $dest $src1 $src2: $dest = $src1 * $src2
arithmetic of ints (as above, but result always defined and all results mod 2^32):
- add32, addi32, sub32, mul32
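The wraparound behaviour of the 32 variants can be sketched in Python, whose integers are unbounded and therefore need explicit reduction. Presenting the reduced result as a signed value is an assumption of this sketch (chosen because Boot integers are described as signed 32-bit); the function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF


def wrap32(value):
    """Reduce value mod 2^32 and present it as a signed 32-bit integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def add32(a, b):
    return wrap32(a + b)


def sub32(a, b):
    return wrap32(a - b)


def mul32(a, b):
    return wrap32(a * b)
```

With plain add, sub, and mul the same inputs would leave the result undefined whenever it does not fit in 32 bits; the 32 variants always produce the wrapped value.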
bitwise arithmetic:
- sll $dest $shiftamount $src: shift left
- sru $dest $shiftamount $src: shift right unsigned
- srs $dest $shiftamount $src: shift right signed
- and $dest $src1 $src2
- or $dest $src1 $src2
- xor $dest $src1 $src2
- not $dest $src1 _: $dest = bitwise_NOT($src1)
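The two right shifts differ only in what fills the vacated high bits. A Python sketch, assuming shift amounts in the range 0..31 (the behaviour for larger amounts is not specified above) and keeping the operand order shiftamount, src used by the instructions:

```python
MASK32 = 0xFFFFFFFF


def wrap32(value):
    """Reduce value mod 2^32 and present it as a signed 32-bit integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def sll(shift, src):
    """Shift left; bits shifted past bit 31 are discarded."""
    return wrap32(src << shift)


def sru(shift, src):
    """Shift right unsigned: vacated high bits are filled with zeros."""
    return wrap32((src & MASK32) >> shift)


def srs(shift, src):
    """Shift right signed: vacated high bits copy the sign bit."""
    return wrap32(src) >> shift
```

For a negative input such as -16, srs by 4 gives -1 while sru by 4 gives 0x0FFFFFFF.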
arithmetic of pointers:
- addp &dest &src1 $src2: &dest = &src1 + $src2 (Add to Pointer) (note: some pointers may not be able to be added to)
- addpi &dest &src1 #imm8: &dest = &src1 + #imm8 (Add immediate to Pointer, in native pointer units)
- addppi &dest &src1 #imm8: &dest = &src1 + #imm8 (Add immediate to Pointer, in units of PTR_SIZE)
- addpwi &dest &src1 #imm8: &dest = &src1 + #imm8 (Add immediate to Pointer, in units of WORD_SIZE)
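The three immediate forms differ only in how the immediate is scaled. The Python sketch below models a pointer as a plain integer address purely for illustration; real Boot pointers are opaque and of unknown size. The values PTR_SIZE = 8 and WORD_SIZE = 4 are hypothetical and describe one imagined platform whose native pointer unit is a byte.

```python
# Hypothetical platform parameters, chosen only for this illustration.
PTR_SIZE = 8   # native units occupied by one pointer
WORD_SIZE = 4  # native units occupied by one 32-bit int


def addpi(ptr, imm8):
    """Advance by imm8 native pointer units."""
    return ptr + imm8


def addppi(ptr, imm8):
    """Advance by imm8 pointer-sized slots."""
    return ptr + imm8 * PTR_SIZE


def addpwi(ptr, imm8):
    """Advance by imm8 word-sized slots."""
    return ptr + imm8 * WORD_SIZE
```

Code that steps through an array of pointers with addppi, or an array of ints with addpwi, stays portable because it never hard-codes either size.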
conditional branches:
- beq $src0 $src1 #imm8: branch-if-equal
- beqp &src0 &src1 #imm8: branch-if-equal on ptrs
- bne $src0 $src1 #imm8: branch-if-not-equal
- bnep &src0 &src1 #imm8: branch-if-not-equal on ptrs
- blt $src0 $src1 #imm8: branch-if-less-than
unconditional jumps and other control flow:
- jrel #imm24: jump (PC-relative)
- jy &target _ _: dYnamic (indirect) jump. &target must be a code pointer provided at runtime (either by lpc, lkp, lkpi, or by platform-specific or foreign code) and which did not have pointer arithmetic performed on it.
- jk #imm24: Jump to the #imm24-th pointer in the pointer constant table
- jt $index #imm16: Jump to index within local jump table. The jump table is embedded in the instruction stream immediately following the JT instruction. The table length is #imm16, so there are #imm16 32-bit entries in the table, taking up the same space as #imm16 Boot instructions. Each table entry is a 32-bit signed integer offset, in bytes, from the program location following the end of the jump table. Since Boot instructions are always 32 bits, these offsets should always be a multiple of 4; if the platform stores Boot instructions in some other format, it may need to adjust these offsets before executing the jump. The quantity $index is interpreted as an unsigned index into the table. If $index is less than #imm16, a jump is performed to the program location specified by the offset in the table entry at the given index; if $index is greater than or equal to #imm16, execution continues from the program location following the end of the jump table (equivalent to a jump to a table entry of offset 0).
- lpc &dest _ _: &dest = PC (program counter)
- halt $ncond $result #imm8: if $ncond is equal to zero, then end program, returning result code ($result + #imm8). Otherwise, if $ncond is nonzero, do nothing.
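The jt target computation described in the list above can be sketched in Python. The sketch assumes byte addressing with 4-byte instructions and takes the table entries as an already-decoded list of signed offsets; the function name jt_target and its parameters are hypothetical.

```python
def jt_target(entries, jt_address, index):
    """Return the byte address that execution continues at after a jt.

    entries:    signed byte offsets stored in the table (len(entries) == #imm16)
    jt_address: byte address of the jt instruction itself
    index:      contents of $index, reinterpreted as unsigned 32-bit
    """
    # The table starts right after the 4-byte jt instruction.
    table_end = jt_address + 4 + 4 * len(entries)
    unsigned_index = index & 0xFFFFFFFF
    if unsigned_index < len(entries):
        # Offsets are relative to the location following the table.
        return table_end + entries[unsigned_index]
    # Out-of-range index: fall through past the table.
    return table_end
```

Because the index is treated as unsigned, a negative $index behaves like a very large index and falls through to the code after the table.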
memory allocation:
- malloc_shared &dest $size $memory_domain: Requests allocation of a block of $size bytes of memory in memory domain $memory_domain. If successful, a pointer to the new block is stored in &dest; otherwise the null pointer (&0) is stored in &dest. memory_domain is RESERVED for future use; always use $0 for now.
- malloc_local &dest $size: Like malloc_shared, but the allocated memory must only be used for thread-local storage. All atomic operations lose their atomicity and ordering guarantees when acting on local memory (e.g. lpsc becomes equivalent to ordinary lp).
- mrealloc_local &dest $newsize &oldptr: attempts to allocate a new block of local memory of size $newsize, copy the contents of the entire block at &oldptr into it, and then mfree &oldptr. If it succeeds, the new block is assigned to &dest; if it fails, the null pointer (&0) is assigned to &dest, and in this case &oldptr is not mfree'd.
- mrealloc_shared &dest $newsize &oldptr: like mrealloc_local, but for a block of shared memory.
- mfree &src: deallocates &src
atomics:
- casrmw{sc,rc,rlx} &dest $new $old: compare-and-swap atomic (must be within the same memory domain). Upon success, $3 = $new; otherwise, $3 = the contents of &dest. The sc/rc/rlx indicates one of sequential consistency, release consistency (casrmwrc is both an acquire and a release), or relaxed semantics.
- casrmwp{sc,rc,rlx} &dest &new &old is like casrmw{sc,rc,rlx}, but where the values are pointers instead of integers (and &3 is used instead of $3).
- fencesc $memory_domain _ _: instruction/memory access reordering barrier; prevents any memory operations on the given memory_domain from appearing to be reordered across the FENCE instruction. Sequential consistency semantics.
- {lp,lw,sp,sw}{sc,rc,rlx} are like {lp,lw,sp,sw} but atomic, and with {sequential consistency, release consistency, or relaxed} semantics, respectively.
Relaxed semantics operations are atomic but provide no other guarantees beyond their corresponding non-atomic variants. Release Consistency semantics are defined later, but if you are familiar with them, they are RCpc; that is, the ordering operations themselves are ordered with Processor Consistency semantics. Release Consistency loads are acquires and stores are releases. Release Consistency also implies atomicity. Sequential Consistency operations provide the same guarantees as the corresponding Release Consistency operation, and in addition all Sequential Consistency operations also appear in program order in a single total order over this memory_domain observed by all threads along with all other sequentially consistent instructions.
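The result convention of casrmw described above can be modeled in Python. This is a sketch only: the class SharedCell is hypothetical, a lock stands in for whatever atomicity mechanism a real implementation would use, and the return value plays the role of $3.

```python
import threading


class SharedCell:
    """A single int memory location with a modeled compare-and-swap."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()  # stand-in for hardware atomicity

    def casrmw(self, new, old):
        """Store `new` if the cell currently holds `old`.

        Returns what $3 would hold afterwards: `new` on success,
        otherwise the current contents of the cell.
        """
        with self._lock:
            if self.value == old:
                self.value = new
                return new
            return self.value


cell = SharedCell(5)
first = cell.casrmw(9, 5)   # expected value matches, so the swap happens
second = cell.casrmw(1, 5)  # expected value is stale, so nothing is stored
```

Under this reading, a failed compare-and-swap that finds the location already holding $new leaves the same value in $3 as a success would, so a retry loop built on this model would need some other way to tell the two cases apart.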
I/O:
- in $dest $device #imm8: read one integer of input from device ($device + #imm8). If successful, $3 is set to 0; if unsuccessful, a non-zero error code is written to $3.
- out $src $device #imm8: write one integer of output to device ($device + #imm8). If successful, $3 is set to 0; if unsuccessful, a non-zero error code is written to $3.
- inp &dest $device #imm8: read one pointer of input from device ($device + #imm8). If successful, $3 is set to 0; if unsuccessful, a non-zero error code is written to $3.
- outp &src $device #imm8: write one pointer of output to device ($device + #imm8). If successful, $3 is set to 0; if unsuccessful, a non-zero error code is written to $3.
- devop $value $device #imm8: device-specific control operation of type #imm8 on a device. If successful, $3 is set to 0; if unsuccessful, a non-zero error code is written to $3.
interop:
- xentry _ #nargsi_imm8 #nargsp_imm8: enter Boot function from external platform. This instruction should be placed at each entry point that may be called from foreign code. #nargsi_imm8 and #nargsp_imm8 indicate the number of integer and pointer arguments expected.
- xret0 &return_address _ _: return from Boot function to external platform at &return_address, returning void
- xreti &return_address $return_value #imm8: return from Boot function to external platform at &return_address, returning the integer ($return_value + #imm8)
- xretp &return_address &return_value #imm8: return from Boot function to external platform at &return_address, returning the pointer (&return_value + #imm8)
- xcall0 &target_address _ _: call external subroutine with no arguments
- xcalli &target_address $arg1 #imm8: call external subroutine with one integer argument ($arg1 + #imm8)
- xcallp &target_address &arg1 #imm8: call external subroutine with one pointer argument (&arg1 + #imm8)
- xcallm &target_address #imm16: call external subroutine with one immediate integer argument (#imm16)
- xcallii &target_address $arg1 $arg2: call external subroutine with two integer arguments
- xcallmm &target_address #arg1_imm8 #arg2_imm8: call external subroutine with two immediate integer arguments
- xcallim &target_address $arg1 #arg2_imm8: call external subroutine with one integer argument and one immediate integer argument
- xcallip &target_address $arg1 &arg2: call external subroutine with one integer argument and one pointer argument
- xcallpm &target_address &arg1 #arg2_imm8: call external subroutine with one pointer argument and one immediate integer argument
- xcallpp &target_address &arg1 &arg2: call external subroutine with two pointer arguments
- xcall &target_address #nargsi_imm8 #nargsv_imm8: call external subroutine with #nargsi_imm8 integer arguments and #nargsv_imm8 pointer arguments placed as per Boot calling convention
- xlibcall0 #libfn_imm24: call external library function #libfn_imm24 with no arguments. Equivalent to doing an lkp to load a pointer constant #libfn_imm24, then doing an xcall0 to that pointer.
- xlibcalli $arg1 #libfn_imm16: call external library function #libfn_imm16 with one integer argument $arg1. Equivalent to doing an lkp to load a pointer constant #libfn_imm16, then doing an xcalli to that pointer with $arg1 0.
- xlibcallm #arg1_imm8 #libfn_imm16: call external library function #libfn_imm16 with one immediate integer argument #arg1_imm8. Equivalent to doing an lkp to load a pointer constant #libfn_imm16, then doing an xcalli to that pointer with $0 #arg1_imm8.
- xlibcallp &arg1 #libfn_imm16: call external library function #libfn_imm16 with one pointer argument &arg1. Equivalent to doing an lkp to load a pointer constant #libfn_imm16, then doing an xcallp to that pointer with &arg1 0.
- xlibcall{ii,im,mm,ip,pm,pp}: etc (todo document, and add to tables above)
- xlibcall #nargsi_imm8 #nargsv_imm8 #libfn_imm8: call external library function #libfn_imm8 with #nargsi_imm8 integer arguments and #nargsv_imm8 pointer arguments placed as per Boot calling convention. Equivalent to doing an lkp to load a pointer constant #libfn_imm8, then doing an xcall to that pointer.
- xpostcall _ #nretsi_imm8 #nretsp_imm8: enter Boot function after returning from external subroutine call. This instruction should immediately follow each xcall.
TODO are all these interop functions in the tables above yet?
TODO don't use 'i' as the mnemonic for both integer and immediate. Note that right now in xcalls we have 'm' for immediate, but elsewhere we use 'i'.
misc:
- ann X Y Z: ANNotation; no effect on execution
TODO what other instructions are in the tables above but not in these schema yet?
TODO revise instruction counts in those tables
TODO start copying in from = General architecture from other file
---
notes
For interop with platforms with varargs or a variable number of return arguments, when a vararg is or may be passed, the calling convention is that the last two pointer arguments passed contain pointers to the integer and pointer varargs, and the last two integer arguments passed contain the counts of integer and pointer varargs.
For interop on platforms which pass values that are neither 32-bit integers nor pointers: if such a value is guaranteed to fit within 32 bits, it is passed as an integer; otherwise the value is stored in memory and a pointer to the value is passed.
The motivation for malloc_local is that, in order to be able to provide the concurrency guarantees required by this spec, some Boot implementations may create and use locks to control access to blocks of shared memory returned by malloc; in some cases even non-atomic, unordered load or store instructions could cause the Boot implementation to acquire a lock. malloc_local lets such an implementation know that it does not have to setup and use locks for this memory segment, and represents an assurance by the programmer that this memory segment will only ever be accessed by the same thread that called malloc_local.
All arithmetic is performed as if the values were unsigned 32-bit quantities; all offsets and additions of integers to pointers interpret these values as signed, using two's complement encoding.
Note that an implementation may legally provide sequential consistency when the program requests only release or relaxed consistency; furthermore, the additional RMW ops may be implemented using the CAS primitive; therefore, all of the atomics in the optional instructions may legally be implemented using only the atomics in the standard profile as primitives.
Note that any program containing any of the implementation-defined instructions impl1 thru impl16 cannot be said to be a proper Boot program; such a program is rather a program in some implementation-defined dialect/extension of the Boot language.