proj-oot-bootReferenceOld200622

  1. Boot (Oot Boot) reference

Version: unreleased (0.0.0-0)

Boot is a low-level 'assembly language' virtual machine (VM) that is easy to implement.

  1. Introduction

Boot's purpose is to promote the portability of LoVM (Low Virtual Machine) by serving as a portable target language underneath it.

The reason for writing a new assembly language, rather than adopting LLVM, WebAssembly, or similar, is that we want the entire Oot toolchain to run on many platforms, including on top of (or within) existing high-level languages such as Python. This means that the entire Boot implementation needs to be quick and easy to port (not just retargeted to a new backend, but ported in its entirety), and it needs to support opaque platform-native pointers of unknown size.

Highlights:

    1. Cheatsheet

Instruction encoding: 32 bits: 3 operands (1 byte each) and a 1-byte opcode. The first operand is restricted to a maximum value of 63 (unsigned), so the first two bits of each instruction are guaranteed to be 0.
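The encoding above can be sketched in Python. This is only an illustration: the cheatsheet does not state the order of the four bytes within the word, so the layout used here (op0 in the most significant byte, then op1, op2, and the opcode in the least significant byte) is an assumption, chosen so that the limit of 63 on op0 makes the top two bits zero. The function names encode and decode are hypothetical.

```python
def encode(opcode, op0, op1, op2):
    """Pack one instruction into a 32-bit word (ASSUMED byte layout)."""
    if not 0 <= op0 <= 63:
        raise ValueError("op0 must be in 0..63")
    for field in (opcode, op1, op2):
        if not 0 <= field <= 255:
            raise ValueError("opcode, op1 and op2 must each fit in one byte")
    # Assumed layout: op0 | op1 | op2 | opcode, most significant byte first.
    return (op0 << 24) | (op1 << 16) | (op2 << 8) | opcode


def decode(word):
    """Unpack a 32-bit word into (opcode, op0, op1, op2)."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction must be a 32-bit value")
    if word >> 30:
        raise ValueError("top two bits must be 0 (op0 is at most 63)")
    opcode = word & 0xFF
    op0 = (word >> 24) & 0xFF
    op1 = (word >> 16) & 0xFF
    op2 = (word >> 8) & 0xFF
    return opcode, op0, op1, op2
```

Under this assumed layout, decode(encode(opcode, op0, op1, op2)) returns the original four fields; a different byte order would only change the shift amounts.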

Datatypes: int32, ptr

Registers: two banks of 16 registers each; one for int, one for ptr. The first register in each bank is constant zero.
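A minimal sketch of one register bank, in Python. The class name RegisterBank is hypothetical, and two behaviours are assumptions not stated above: that a write to register 0 is silently discarded, and that the zero value of the pointer bank is modelled as None.

```python
class RegisterBank:
    """One bank of 16 registers whose register 0 always reads as zero."""

    def __init__(self, zero, size=16):
        self._zero = zero
        self._regs = [zero] * size

    def _check(self, index):
        if not 0 <= index < len(self._regs):
            raise IndexError("register index out of range")

    def read(self, index):
        self._check(index)
        return self._zero if index == 0 else self._regs[index]

    def write(self, index, value):
        self._check(index)
        if index != 0:  # ASSUMPTION: writes to register 0 are ignored
            self._regs[index] = value


int_regs = RegisterBank(zero=0)     # integer bank
ptr_regs = RegisterBank(zero=None)  # pointer bank; None stands in for "zero"
```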

Instructions:

Boot instructions fall into three categories:

small profile (47 instructions; all opcodes are below 128):

==
annotation                    ann
constants                     li sai
loads and stores and moves    cp cpp lb lbu lh lhu lp lw sb sh sp sw pushi popi pushp popp
arithmetic of ints            add sub mul addi add32 sub32 mul32 addi32
bitwise arithmetic            and or xor not sll srs sru
arithmetic of pointers        addp addpi addppi addpwi pcmp
non-branching conditionals    cmovi cmovip cmovpp
comparison control flow       bne bnep blt beq beqp
other control flow            jrel jt
misc                          ann halt
==

TODO unsigned integer arithmetic or conversions?

standard profile adds (52 instructions, for 92 total; all opcodes are below 128):

==
integer division                    div rem
floating point                      i2f lf sf ceil flor trunc nearest addf subf mulf divf copysign bnan binf beqf bltf fcmp
atomics (sequential consistency)    lpsc lwsc spsc swsc casrmwsc casrmwpsc fencesc
constants and constant tables       lkp lkpb jk lkf
memory allocation                   malloc_shared malloc_local mfree mrealloc
I/O                                 in out inp outp devop
other control flow                  jy lpc
misc                                sysinfo log
==

optional instructions (all opcodes are 128 or greater):

==
atomics (release consistency)                         lprc lwrc sprc swrc casrmwrc crsprmwrc
atomics (relaxed consistency)                         lprlx lwrlx sprlx swrlx casrmwrlx crsprmwrlx
atomic additional rmw ops (sequential consistency)    addrmwa aprmwa andrmwa orrmwa xorrmwa addrmwa
atomic additional rmw ops (release consistency)       addrmwrc aprmwrc andrmwrc orrmwrc xorrmwrc addrmwrc
atomic additional rmw ops (relaxed consistency)       addrmwrlx aprmwrlx andrmwrlx orrmwrlx xorrmwrlx addrmwrlx
additional floating point                             remf sqrt minf maxf powf bnonfinite bgtf beqtotf blttotf ftotcmp
implementation-defined                                impl1 thru impl16
filesys                                               read write open close seek flush poll
interop                                               xentry xcall0 xcalli xcallp xcallii xcallmm xcallim xcallip xcallpm xcallpp xcall xcallv xlibcall0 xlibcalli xlibcallm xlibcallp xlibcall xlibcallv xret0 xreti xretp xpostcall
misc                                                  break
==

In the following, #X is an immediate constant, $X is an integer register, &X is a pointer register, _ is an unused argument that must always be 0 in proper Boot programs (Boot implementations are free to use these locations however they wish), and X is an untyped argument. All immediate constants are signed two's-complement ints.

From left to right, the arguments go into operands op0, op1, op2. Immediate operands are always on the right (the highest-numbered operand). When two immediate operands are combined into an #imm16 (as with instruction li), op1 holds the high-order bits and op2 holds the low-order bits (imm16 = (op1 << 8) + op2). Similarly, for #imm24, imm24 = (op0 << 16) + (op1 << 8) + op2.
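The two formulas can be written out directly in Python. The helper as_signed is an illustrative addition: it shows the usual two's-complement reading of an n-bit field, on the assumption that the sign of a combined immediate is taken from the top bit of the combined field. All function names here are hypothetical.

```python
def imm16(op1, op2):
    """Combine two operand bytes: op1 is the high byte, op2 the low byte."""
    return (op1 << 8) + op2


def imm24(op0, op1, op2):
    """Combine three operand bytes, op0 being the highest-order byte."""
    return (op0 << 16) + (op1 << 8) + op2


def as_signed(value, bits):
    """Reinterpret an unsigned `bits`-wide field as two's complement."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit
```

For example, as_signed(imm16(0xFF, 0xFE), 16) reads the operand bytes 0xFF, 0xFE as the 16-bit immediate -2.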

JREL and branch immediates are in units of bytes. JREL and branch immediates which are not divisible by 4 are illegal (because Boot instructions are always 32-bit aligned). Platforms which represent Boot code in memory in such a way that one instruction spans more or fewer than 4 memory locations should adjust the jrel and branch offsets accordingly before executing them.
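As an illustration of the adjustment described above, here is a sketch for a host that stores each instruction in a fixed number of memory locations, for example one Python list element per instruction. The function name and the locations_per_instruction parameter are hypothetical. Whether an offset is measured from the branch itself or from the following instruction is not stated here, so the sketch converts the offset only and does not compute a target address.

```python
INSTRUCTION_BYTES = 4  # every Boot instruction is 32 bits wide


def adjust_offset(byte_offset, locations_per_instruction=1):
    """Convert a byte-unit jrel/branch offset into host memory locations."""
    if byte_offset % INSTRUCTION_BYTES != 0:
        raise ValueError("offset not divisible by 4 is illegal")
    instructions = byte_offset // INSTRUCTION_BYTES  # exact, also for negatives
    return instructions * locations_per_instruction
```

With one list element per instruction, a byte offset of -8 becomes an index offset of -2.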

constants:

register loads and stores:

moves:

arithmetic of ints (result is undefined if the result is greater than 32 bits):

arithmetic of ints (as above, but result always defined and all results mod 2^32):

bitwise arithmetic:

arithmetic of pointers:

conditional branches:

unconditional jumps and other control flow:

atomics (sequential consistency):

Relaxed-semantics operations are atomic but provide no guarantees beyond those of their corresponding non-atomic variants. Release Consistency semantics are defined later; for readers already familiar with the term, they are RCpc, that is, the ordering operations themselves are ordered with Processor Consistency semantics. Release Consistency loads are acquires and stores are releases. Release Consistency also implies atomicity. Sequential Consistency operations provide the same guarantees as the corresponding Release Consistency operations; in addition, all Sequential Consistency operations appear, in program order, in a single total order over this memory_domain that is observed by all threads and that includes all other sequentially consistent instructions.

I/O:

interop:

TODO are all these interop functions in the tables above yet?

TODO don't use 'i' as the mnemonic for both integer and immediate. Note that right now in xcalls we have 'm' for immediate, but elsewhere we use 'i'.

misc:

TODO what other instructions are in the tables above but not in these schema yet?

TODO revise instruction counts in those tables

TODO start copying in from = General architecture from other file

---

notes

For interop with platforms with varargs or a variable number of return arguments, when a vararg is or may be passed, the calling convention is that the last two pointer arguments passed contain pointers to the integer and pointer varargs, and the last two integer arguments passed contain counts of the integer and pointer varargs.
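A small sketch of how a caller might lay out its arguments under this convention. The function name pack_call_args is hypothetical, pointers are modelled as Python list references, and the order within each trailing pair (integer varargs first, then pointer varargs) is an assumption taken from the order of the wording above.

```python
def pack_call_args(fixed_ints, fixed_ptrs, int_varargs, ptr_varargs):
    """Build the integer and pointer argument lists for a vararg call."""
    # Last two integer arguments: counts of integer and pointer varargs.
    int_args = list(fixed_ints) + [len(int_varargs), len(ptr_varargs)]
    # Last two pointer arguments: "pointers" to the two vararg arrays
    # (ASSUMPTION: integer varargs first, pointer varargs second).
    ptr_args = list(fixed_ptrs) + [list(int_varargs), list(ptr_varargs)]
    return int_args, ptr_args
```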

For interop on platforms which pass values that are neither 32-bit integers nor pointers: if such a value is guaranteed to fit within 32 bits, it is passed as an integer; otherwise the value is stored in memory and a pointer to the value is passed.
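The rule can be sketched as a small marshalling helper. Everything here is illustrative: the name marshal, the use of a Python list as stand-in memory (with list indices standing in for pointers), and the little-endian reading of small values are all assumptions.

```python
def marshal(raw, memory):
    """Classify a foreign value, given as bytes, as an int or ptr argument."""
    if len(raw) <= 4:
        # Fits within 32 bits: pass it as an integer.
        # ASSUMPTION: bytes are read in little-endian order.
        return ("int", int.from_bytes(raw, "little"))
    # Too large: store it in memory and pass a pointer to it instead.
    memory.append(bytes(raw))
    return ("ptr", len(memory) - 1)
```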

The motivation for malloc_local is that, in order to provide the concurrency guarantees required by this spec, some Boot implementations may create and use locks to control access to blocks of shared memory returned by malloc; in some cases even non-atomic, unordered load or store instructions could cause the Boot implementation to acquire a lock. malloc_local lets such an implementation know that it does not have to set up and use locks for this memory segment, and represents an assurance by the programmer that this memory segment will only ever be accessed by the same thread that called malloc_local.

All arithmetic is performed as if the values were unsigned 32-bit quantities; all offsets and additions of integers to pointers interpret these values as signed, using two's-complement encoding.
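In a host language with unbounded integers such as Python, the two readings can be sketched as follows. The functions add32, sub32 and mul32 are named after the mnemonics in the cheatsheet and model only the "results mod 2^32" behaviour; to_signed32 is a hypothetical helper showing how the same 32 bits would be read when used as an offset.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """Unsigned 32-bit addition, result taken mod 2^32."""
    return (a + b) & MASK32


def sub32(a, b):
    """Unsigned 32-bit subtraction, result taken mod 2^32."""
    return (a - b) & MASK32


def mul32(a, b):
    """Unsigned 32-bit multiplication, result taken mod 2^32."""
    return (a * b) & MASK32


def to_signed32(value):
    """Read a 32-bit quantity as a signed two's-complement offset."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value
```

For example, sub32(0, 4) yields 0xFFFFFFFC, which to_signed32 reads as the offset -4.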

Note that an implementation may legally provide sequential consistency when the program requests only release or relaxed consistency. Furthermore, the additional RMW ops may be implemented using the CAS primitive. Therefore, all of the atomics in the optional instructions may legally be implemented using only the sequentially consistent atomics in the standard profile as primitives.
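To illustrate how an additional RMW op can be built from CAS alone, here is a sketch of a fetch-and-add written as a CAS retry loop. Python has no native CAS, so the Cell class simulates an atomic memory word with a lock; Cell, cas, rmw and fetch_add are hypothetical names, and the convention that cas returns the previously stored value is an assumption, not something this document specifies.

```python
import threading


class Cell:
    """Simulated 32-bit memory word offering atomic load and CAS."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def cas(self, expected, new):
        """Store `new` if the cell holds `expected`; return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


def rmw(cell, update):
    """Generic read-modify-write built only from load and CAS."""
    while True:
        old = cell.load()
        if cell.cas(old, update(old) & 0xFFFFFFFF) == old:
            return old  # CAS succeeded; report the value seen before the update


def fetch_add(cell, delta):
    return rmw(cell, lambda old: old + delta)
```

The same rmw loop gives and, or and xor variants by changing the update function; the loop retries whenever another thread changed the cell between the load and the CAS.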

Note that any program containing any of the implementation-defined instructions impl1 thru impl16 cannot be said to be a proper Boot program; such a program is rather a program in some implementation-defined dialect/extension of the Boot language.