Bayle Shanks's website: proj-oot-bootxReferenceOld201018

Boot (Oot BootX) reference

Version: unreleased (0.0.0-0)

BootX is a set of optional extensions to Boot. These extensions variously add instructions, define additional syscalls, define additional SYSINFO behavior, or further specify details which are unspecified in Boot.

Profiles

For convenient reference, certain subsets of these extensions can be referred to as 'profiles'.

The following profiles are defined:

small: very common functionality
standard: small + common operating system-provided functionality
performance: standard + various common operations which could be computed using the primitives provided by the Standard profile, but which may be faster if implemented natively

Any of these may be prefixed with either 'vanilla 32-bit' or 'vanilla 64-bit' to indicate the combination of the indicated profile with the indicated 'vanilla' restrictions.

Stubs

Much functionality may be trivially implemented in a way that always returns a null result or an exceptional condition, or in a way that does not take advantage of native platform facilities, even when present. This is compliant provided that the corresponding functionality is indicated as 'stub' or 'partially stubbed'.

For example, if the Small profile is implemented but every attempt to allocate memory returns null, the implementation must not be described as "implementing the BootX Small profile", but may be described as "implementing the BootX Small profile, partially stubbed".

For another example, if the Standard profile is implemented in a way that prevents more than one thread/process from executing simultaneously, yet the target platform natively provides true parallel processing, then the implementation must be described as "Standard profile, partially stubbed".

Small profile

The Small profile consists of the following functionality enhancement extension:

Sysinfo

plus the following new instruction extensions:

Floating point 1
Integer division
Misc

plus the following new syscall extensions:

Local memory allocation
Memcpy

Standard profile

The Standard profile includes everything in the Small profile, plus the following extensions:

Instruction:

Atomics rc 1

Syscall:

Clocks
Environment Variables
Filesys
Process control
Shared memory allocation
TUI

Performance profile

The Performance profile includes everything in the Standard profile, plus the following extensions:

Instruction:

Atomics rc 2
Atomics seqc 1
Atomics seqc 2
Floating point 2
Floating point Triglog
Non-branching conditionals
SIMD

Syscall:

IPC 1
IPC 2

Functionality extensions

Syscall functionality extension

TODO

Instruction extensions

These extensions add new instructions.

Integer division instruction extension

integer division

div rem

Floating point 1 instruction extension

floating point

i2f lf sf ceil flor trunc nearest addf subf mulf divf copysign bnan binf beqf bltf fcmp

Floating point 2 instruction extension

Includes Floating point 1 and adds:

additional floating point

remf sqrt minf maxf powf bnonfinite bgtf beqtotf blttotf ftotcmp

Floating point Triglog instruction extension

Includes Floating point 2 and adds:

additional floating point

TODO

trig, log, exp etc fns from math.h (at this point are we missing any other math from C library or math.h?)

Floating point elusive eight instruction extension

additional floating point

TODO

https://www.evanmiller.org/statistical-shortcomings-in-standard-math-libraries.html#functions

Misc instruction extension

misc	log break hint

TODO should these be separated?

HINTs may be executed as NOPs. They are intended for forward compatibility; later versions of the specification may define semantics for various HINTs with the understanding that some implementations may execute them as NOPs.

Atomics seqc 1 instruction extension

atomics (sequential consistency)

lpsc lwsc spsc swsc casrmwsc casprmwsc fencesc

Atomics seqc 2 instruction extension

atomic additional rmw ops (sequential consistency)

addrmwa aprmwa andrmwa orrmwa xorrmwa

Atomics rc 1 instruction extension

atomics (release consistency)

lprc lwrc sprc swrc casrmwrc casprmwrc

Atomics rc 2 instruction extension

atomics (relaxed consistency)	lprlx lwrlx sprlx swrlx casrmwrlx casprmwrlx
atomic additional rmw ops (release consistency)	addrmwrc aprmwrc andrmwrc orrmwrc xorrmwrc addrmwrc
atomic additional rmw ops (relaxed consistency)	addrmwrlx aprmwrlx andrmwrlx orrmwrlx xorrmwrlx addrmwrlx

TODO: aren't the normal Boot operations already relaxed consistency?

Non-branching conditionals instruction extension

SIMD extension

TODO

Syscall extensions

These extensions add new syscalls.

Filesys syscall extension

Syscalls: TODO explain syscalls only; mb put these sort of extensions under a separate heading?

filesys

read write open close seek flush poll

Environment variables extension

IPC 1 syscall extension

TODO

IPC 2 syscall extension

TODO

TUI syscall extension

TODO

Process control syscall extension

TODO

Local memory allocation syscall extension

TODO

xlib 2: malloc(size: uint32)

Allocate a new region of SIZE bytes of memory and return a pointer to the beginning of it.

REGION argument must have been returned by a previous malloc, and must not have been previously mfree'd.

Memcpy syscall extension

TODO

xlib 1: memcpy(dst: ptr, src: ptr, size: int32)

Copy SIZE bytes starting at memory location SRC to memory starting at memory location DST.

Shared memory allocation syscall extension

memory allocation

malloc_shared malloc_local

(TODO: which of malloc_shared/malloc_local is ordinary malloc? i think the ordinary malloc is already malloc_local)

Clocks syscall extension

TODO

enumerate available clocks and/or request clock with capabilities
default wallclock clock
default monotonic clock (may reset upon each Boot invocation) (units unknown?)
get current time of clock
get info about clock (e.g. precision/units of clock)
what about asking for 32-bit vs 64-bit precision? What about getting the date? Do we offer seconds since unix epoch? Nanoseconds?
do timers and alarms go in here, or elsewhere? alarms seem like a process control thing
is setting clocks allowed (probably not?)

see https://stackoverflow.com/questions/3523442/difference-between-clock-realtime-and-clock-monotonic https://man7.org/linux/man-pages/man2/clock_gettime.2.html

Restrictive extensions

These extensions further specify details which are unspecified in Boot.

Vanilla 32-bit extension

integers are defined to be 32-bit, represented using little-endian with two's-complement for signed values
arithmetic is mod 2^32
pointers are defined to be represented as 32-bit integers
INT32_SIZE is 4
INT16_SIZE is 2
PTRD_SIZE is 4

Vanilla 64-bit extension

integers are defined to be 64-bit, represented using little-endian with two's-complement for signed values
arithmetic is mod 2^64
pointers are defined to be represented as 64-bit integers
INT32_SIZE is 4
INT16_SIZE is 2
PTRD_SIZE is 8
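
As an illustration of the two 'vanilla' restrictions above, here is a minimal Python sketch of wrap-around arithmetic and little-endian two's-complement encoding. The helper names (wrap_add, to_signed, encode_le) are hypothetical and are not part of Boot or BootX.

```python
def wrap_add(a: int, b: int, bits: int = 32) -> int:
    """Add two register values with arithmetic mod 2^bits."""
    return (a + b) & ((1 << bits) - 1)


def to_signed(value: int, bits: int = 32) -> int:
    """Interpret an unsigned bit pattern as a two's-complement signed integer."""
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def encode_le(value: int, bits: int = 32) -> bytes:
    """Little-endian byte representation of a (possibly negative) integer."""
    return (value & ((1 << bits) - 1)).to_bytes(bits // 8, "little")
```

Passing bits=64 models the vanilla 64-bit extension; the default models vanilla 32-bit.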

TODO

from old boot:

instructions for compare-and-swap and memory fence

Boot instructions fall into three categories:

Small profile: These can be easily ported almost anywhere
Standard profile: This is what OVM requires. This profile adds integer division, reading the program counter, indirect branching to a previously read program counter value, floating point, atomics, constant tables, and 'systems' instructions for allocating memory, I/O and filesystem operations, interoperation, querying metadata about platform capabilities, and logging.
Optional instructions: These are not required but can be added, either to expose additional facilities to Boot programs, or to provide more efficient native implementations of certain operations.

 pushi popi pushp popp

arithmetic of ints (result is undefined if the result is greater than 32 bits):

add $dest $src1 $src2: $dest = $src1 + $src2
addi $dest $src1 #imm8: $dest = $src1 + #imm8
sub $dest $src1 $src2: $dest = $src1 - $src2
mul $dest $src1 $src2: $dest = $src1 * $src2

standard profile adds (52 instructions, for 92 total; all opcodes are below 128):

==
constants and constant tables	lkp lkpb jk lkf
non-branching conditionals	cmovi cmovip cmovpp
other control flow	lpc
==

optional instructions (all opcodes are 128 or greater):

==
implementation-defined	impl1 thru impl16
interop	xentry xcall0 xcalli xcallp xcallii xcallmm xcallim xcallip xcallpm xcallpp xcall xcallv xlibcall0 xlibcalli xlibcallm xlibcallp xlibcall xlibcallv xret0 xreti xretp xpostcall

64-bit jumps

indirect control flow

lci jy

general lci, with target that doesn't have to be xentry

lpc, for (intrusive) debuggers?

ldptrd, for data, in addition to ldptri (lci)?

lkp &dest #imm16: LoaD K-th Ptr constant into &dest
lkpb #imm24: LoaD K-th Ptr constant into &3
pcmp &dest &src1 &src2: &dest = 1 if &src1 > &src2, or 0 if &src1 == &src2, or -1 if &src1 < &src2 todo if we have opaque refs, what if they are incomparable?
lpc $dest _ _: $dest = PC (program counter)
jk #imm24: Jump to the #imm24-th pointer in the pointer constant table
jt $index #imm16: Jump to index within local jump table. The jump table is embedded in the instruction stream immediately following the JT instruction; the table length is #imm16 (so there are #imm16 32-bit entries in the table, taking up the same space as #imm16 Boot instructions). Each table entry is a 32-bit signed integer offset, in bytes, from the program location following the end of this jump table. (Since Boot instructions are always 32 bits, these offsets should always be a multiple of 4; if the platform stores Boot instructions in some other format it may need to adjust these offsets before executing the jump.) The quantity $index is interpreted as an unsigned index into this table. If $index is less than #imm16, a jump is performed to the program location specified by the offset in the table entry at the given index; if $index is greater than or equal to #imm16, then execution continues from the program location following the end of this jump table (equivalent to a jump to a table entry of offset 0).
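
The jt target computation described above can be sketched in Python as follows. This is only a model under stated assumptions: the instruction stream is a byte string with 4 bytes per instruction, table entries are little-endian (as in the 'vanilla' extensions), $index is a 32-bit value, and the function name jt_target is hypothetical.

```python
import struct


def jt_target(code: bytes, jt_addr: int, index: int, imm16: int) -> int:
    """Return the byte address that execution continues from after a JT.

    code    -- the instruction stream (assumed 4 bytes per Boot instruction)
    jt_addr -- byte address of the JT instruction itself
    index   -- value of $index
    imm16   -- number of 32-bit entries in the table
    """
    table_start = jt_addr + 4              # table immediately follows JT
    table_end = table_start + 4 * imm16    # location following the table
    index &= 0xFFFFFFFF                    # $index is interpreted as unsigned
    if index >= imm16:
        return table_end                   # out of range: fall through
    # Assumption: entries are stored little-endian.
    (offset,) = struct.unpack_from("<i", code, table_start + 4 * index)
    return table_end + offset
```

Note that a negative $index becomes a large unsigned value and therefore falls through to the location after the table.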

atomics (sequential consistency):

casrmw{sc,rc,rlx} &dest $new $old: compare-and-swap atomic (must be within the same memory domain). Upon success, $3 = $new; otherwise, $3 = the contents of &dest. The sc/rc/rlx indicates one of sequential consistency, release consistency (casrmwrc is both an acquire and a release), or relaxed semantics.
casrmw{sc,rc,rlx}p &dest &new &old is like casrmw{sc,rc,rlx}, but where the values are pointers instead of integers (and &3 is used instead of $3).
fencesc $memory_domain _ _: instruction/memory access reordering barrier; prevents any memory operations on the given memory_domain from appearing to be reordered across the FENCE instruction. Sequential consistency semantics.
malloc_shared &dest $size $memory_domain: Requests allocation of a block of $size bytes of memory in memory domain $memory_domain. If successful, a pointer to the new block is stored at &dest; otherwise the null pointer (&0) is stored at &dest. memory_domain is RESERVED for future use; always use $0 for now.
malloc_local &dest $size: Like malloc_shared but the allocated memory must only be used for thread-local storage. All atomic operations lose their atomicity and ordering guarantees when acting on local memory (e.g. so lpsc becomes equivalent to ordinary lp, etc).
mrealloc_local &dest $newsize &oldptr: attempts to allocate a new block of local memory of size $newsize, copy the contents of the entire block at &oldptr into it, and then mfree &oldptr. If it succeeds, the new block is assigned to &dest; if it fails, the null pointer (&0) is assigned to &dest; in this case &oldptr is not mfree'd.
mrealloc_shared &dest $newsize &oldptr: attempts to allocate a new block of memory of size $newsize &oldptr
mfree &src: deallocates &src
{lp,lw,sp,sw}{sc,rc,rlx} are like {lp,lw,sp,sw} but atomic, and with {sequential consistency, release consistency, or relaxed} semantics, respectively.
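
To make the mrealloc_local success and failure cases above concrete, here is a toy Python model of an allocator. Everything in it is hypothetical (the ToyHeap class, integer 'pointers', the capacity limit); it also assumes that a shrinking reallocation copies only the bytes that fit, a case the text above does not cover.

```python
class ToyHeap:
    """Toy allocator: maps integer 'pointers' to bytearrays. 0 is the null pointer."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # total bytes that may be allocated
        self.blocks = {}              # pointer -> bytearray
        self.next_ptr = 1

    def used(self) -> int:
        return sum(len(b) for b in self.blocks.values())

    def malloc(self, size: int) -> int:
        if self.used() + size > self.capacity:
            return 0                  # allocation failed: null pointer
        ptr = self.next_ptr
        self.next_ptr += 1
        self.blocks[ptr] = bytearray(size)
        return ptr

    def mfree(self, ptr: int) -> None:
        del self.blocks[ptr]

    def mrealloc(self, newsize: int, oldptr: int) -> int:
        newptr = self.malloc(newsize)
        if newptr == 0:
            return 0                  # failure: the old block is NOT freed
        old = self.blocks[oldptr]
        n = min(len(old), newsize)    # assumption: truncate when shrinking
        self.blocks[newptr][:n] = old[:n]
        self.mfree(oldptr)            # success: the old block is released
        return newptr
```

In this model the new block is allocated before the old one is freed, so a reallocation can fail even when the new size alone would fit.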

(also need an instruction to flush icache? this might belong in some sort of self-modifying extension tho b/c a boot->platform compiler/interpreter might not be available at runtime)

Relaxed semantics operations are atomic but provide no other guarantees beyond their corresponding non-atomic variants. Release Consistency semantics are defined later but if you are familiar with it, they are RCpc; that is, the ordering operations themselves are ordered with Processor Consistency semantics. Release Consistency loads are acquires and stores are releases. Release Consistency also implies atomicity. Sequential Consistency operations provide the same guarantees as the corresponding Release Consistency operation, and in addition all Sequential Consistency operations also appear in program order in a single total order over this memory_domain observed by all threads along with all other sequentially consistent instructions.

devop $value $device #imm8: implementation-dependent control operation of type #imm8 on a device. If successful, $3 is set to 0; if unsuccessful, a non-zero error code is written to $3.
xcall0 &target_address _ _: call external subroutine with no arguments
xcalli &target_address $arg1 #imm8: call external subroutine with one integer argument ($arg1 + #imm8)
xcallp &target_address &arg1 #imm8: call external subroutine with one pointer argument (&arg1 + #imm8)
xcallm &target_address #imm16: call external subroutine with one integer argument (#imm16)
xcallii &target_address $arg1 $arg2: call external subroutine with two integer arguments
xcallmm &target_address #arg1_imm8 #arg2_imm8: call external subroutine with two immediate integer arguments
xcallim &target_address $arg1 #arg2_imm8: call external subroutine with one integer argument and one immediate integer argument
xcallip &target_address $arg1 &arg2: call external subroutine with one integer argument and one pointer argument
xcallpm &target_address &arg1 #arg2_imm8: call external subroutine with one pointer argument and one immediate integer argument
xcallpp &target_address &arg1 &arg2: call external subroutine with two pointer arguments
xlibcall0 #libfn_imm24: call external library function #libfn_imm24 with no arguments. Equivalent to doing an lkp to load a pointer constant #libfn_imm24, then doing an xcall0 to that pointer.
xlibcalli $arg1 #libfn_imm16: call external library function #libfn_imm16 with one integer argument $arg1. Equivalent to doing an lkp to load a pointer constant #libfn_imm16, then doing an xcalli to that pointer with $arg1 0.
xlibcallm #arg1_imm8 #libfn_imm16: call external library function #libfn_imm16 with one immediate integer argument #arg1_imm8. Equivalent to doing an lkp to load a pointer constant #libfn_imm16, then doing an xcalli to that pointer with $0 #arg1_imm8.
xlibcallp &arg1 #libfn_imm16: call external library function #libfn_imm16 with one pointer argument &arg1. Equivalent to doing an lkp to load a pointer constant #libfn_imm16, then doing an xcallp to that pointer with &arg1 0.
xlibcall{ii,im,mm,ip,pm,pp}: etc (todo document, and add to tables above)
lentry32 &dest ; #imm32: &dest = Load register with a code pointer to an xentry instruction
jmp32 ; #imm32: unconditional jump

Some instructions are followed by data.

A semicolon means that the instruction is followed by data; 'instr ; data'.

jump constants only 32 bits

lentry and JMP data is relative to beginning of program

make move instructions non-branching conditionals:

cp $dest $src $cond: if $cond == 0, then CoPy int from register to register (equivalent to addi $dest $src 0)
cpp &dest &src $cond: if $cond == 0, then CoPy Pointer from register to register

a way to write Boot code into memory and then jump into it?

select (nonbranching conditional)

The motivation for malloc_local is that, in order to be able to provide the concurrency guarantees required by this spec, some Boot implementations may create and use locks to control access to blocks of shared memory returned by malloc; in some cases even non-atomic, unordered load or store instructions could cause the Boot implementation to acquire a lock. malloc_local lets such an implementation know that it does not have to set up and use locks for this memory segment, and represents an assurance by the programmer that this memory segment will only ever be accessed by the same thread that called malloc_local.

Note that an implementation may legally provide sequential consistency when the program requests only release or relaxed consistency; furthermore, the additional RMW ops may be implemented using the CAS primitive; therefore, all of the atomics in the optional instructions may legally be implemented using only the atomics in the small profile as primitives.
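
The claim that the additional RMW ops can be built from the CAS primitive can be sketched as a retry loop. The Python below is a model only: the Cell class and add_rmw function are hypothetical, a lock stands in for hardware atomicity, 32-bit wrap-around is assumed, and the CAS here reports success as a boolean rather than through $3.

```python
import threading


class Cell:
    """One shared memory word with a compare-and-swap primitive."""

    def __init__(self, value: int = 0):
        self.value = value
        self._lock = threading.Lock()   # stands in for hardware atomicity

    def load(self) -> int:
        with self._lock:
            return self.value

    def cas(self, new: int, old: int) -> bool:
        """Atomically store `new` if the cell still holds `old`."""
        with self._lock:
            if self.value == old:
                self.value = new
                return True
            return False


def add_rmw(cell: Cell, delta: int) -> int:
    """Atomic add built only from load + CAS; returns the previous value."""
    while True:
        old = cell.load()
        # Retry if another thread changed the cell between load and CAS.
        if cell.cas((old + delta) & 0xFFFFFFFF, old):
            return old
```

The same loop shape would serve for and, or, and xor by changing how the new value is computed from the old one.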

instructions to allow alignment?

other forms of in,out which read many bytes at a time to/from preallocated buffers, and identify the device using pointers

in2 &dest &device $length: read in up to $length memory locations from device whose pointer is &device, to buffer at pointer &dest
out2 &device &src $length: write out up to $length memory locations from buffer at pointer &src to device whose pointer is &device

also nonblocking

---

floating point and other arith and other instrs from wasm and llvm

syscalls: plan9, klambda, posix, windows, macos, musl libc, android, ios, aws, freertos, python, l4, lists of frequent syscalls, wasm

clocks file rw seek file handle management open/close file management mv attributes networking nonblocking io python event loop

concurrency rw instructions (loads/stores with various memory orders) concurrency rmw instructions (cas, etc) concurrency process management instrs (fork etc)

tui: setcursorabsolute, setcursorrelative, getdimensions, setdimensions, clearscreen, printcharatcursor, getchar

graphics setpixel, getpixel, setpalette, setmode, getmodes, setcustommode? (custom screen size, custom #s of colors)

audio

pico8 https://www.lexaloffle.com/bbs/?tid=28207

---

note in boot spec that bootx will define some syscalls below 128? and some sinfos? and some/all instructions? or maybe just dont mention it much? or maybe say that some things are RESERVED for extension languages?

---

sinfo for:

FEATURES bitmask
is instruction N supported? (subquery)
is syscall N supported? (subquery)

xcall &target_address #nargsi_imm8u #nargsp_imm8u: eXternal CALL subroutine
xentr _ #nargsi_imm8u #nargsp_imm8u: eXternal ENTRypoint to Boot function
xaftr _ #nretsi_imm8u #nretsp_imm8u: place immediately AFTeR xcall or xlib
xret0 &return_address _ _: RETurn void to external platform
xreti &return_address $return_val #imm8: return int32 ($return_val + #imm8)
xretp &return_address &return_val #imm8: return ptr (&return_val + #imm8)
xcall: call external subroutine with #nargsi_imm8u integer arguments and #nargsp_imm8u pointer arguments. Before executing this instruction, the arguments must be placed as per the Boot Calling Convention
xentr: This instruction should be placed at each entry point that may be called from foreign code. #nargsi_imm8u and #nargsp_imm8u indicate the number of integer and pointer arguments expected. Every code path starting in xentr must end in an xret or xtail, with no other 'xentr's in between. The xentr most immediately previous to an instruction, if any, is considered to begin an 'xentr subroutine' containing that instruction. No source instruction shall branch or jump to a target location within any xentr subroutine, unless either both source and target are within the same xentr subroutine, or the jump is by way of one of the instructions: xcall, xtail, xlib, xret0, xreti, or xretp.
xaftr: reenter Boot function after returning from external subroutine call. This instruction should immediately follow each xcall. #nretsi_imm8u and #nretsp_imm8u indicate the number of int32s and ptrs being returned, respectively.
xret0, xreti, xretp: return from Boot function to external platform at &return_address. xreti's int32 $return_val is interpreted as signed. &return_address must be the value that was in pointer register 4 upon the corresponding xentr.
xretp: if &return_val holds a ptrc, then #imm8 must be 0
xtail: xtail cannot be used to call any function taking more than 3 integer arguments or more than 3 pointer arguments, unless it is within an xentr routine.

If the function being called takes a variable number of arguments, then the total number of integer arguments is passed in register $11 and the total number of pointer arguments is passed in register $12.

If more than 3 integer arguments or more than 3 pointer arguments need to be passed, then a pointer to the remaining integer arguments is passed in &11 and/or a pointer to the remaining pointer arguments is passed in &12. The contents of the memory holding the additional arguments may be overwritten by the callee, just as with registers 5,6,7. However, the registers 11,12 (both banks) themselves are still callee-saved and, if modified, must be restored before return. The callee must not deallocate the memory pointed to by pointer registers 11 or 12 (that is, the memory holding the additional arguments).
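
As a rough illustration of the rule above, the following Python sketch splits an argument list into the part passed directly and the overflow part passed through memory. It assumes, based on the mention of registers 5, 6, 7, that the first three arguments of each kind travel in those registers; the function name place_args and the dictionary layout are hypothetical.

```python
def place_args(int_args, ptr_args):
    """Split arguments into register-passed and memory-passed groups."""
    regs = {
        "int": list(int_args[:3]),    # assumed to travel in $5, $6, $7
        "ptr": list(ptr_args[:3]),    # assumed to travel in &5, &6, &7
    }
    overflow = {
        "int": list(int_args[3:]),    # buffer whose address goes in &11
        "ptr": list(ptr_args[3:]),    # buffer whose address goes in &12
    }
    return regs, overflow
```

An empty overflow list corresponds to the case where no pointer to remaining arguments needs to be passed.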

On platforms which pass values which are neither 32-bit integers nor pointers, such an argument is handled as follows: if the value is guaranteed to fit within 32 bits, it is passed as an integer; otherwise the value is stored in memory and a pointer to the value is passed.

memory allocation	mallo mfree
interop	xcall xentr xaftr xret0 xreti xretp xtail

---

l8m #imm9u: Load 8-bit iMmediate int constant (the 8 least-significant-bits of the #imm9u) into either $1 or $2, depending on if the most-significant-bit of the #imm9u is 0 or 1, respectively
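
A minimal Python sketch of how the l8m immediate could be decoded, following the description above. The function name decode_l8m is hypothetical, and the 8-bit constant is returned as raw bits, since the text does not say whether it is signed.

```python
from typing import Tuple


def decode_l8m(imm9u: int) -> Tuple[int, int]:
    """Return (destination register number, 8-bit constant) for an l8m immediate."""
    if not 0 <= imm9u < 512:
        raise ValueError("imm9u must fit in 9 bits")
    dest = 2 if imm9u & 0x100 else 1   # most-significant bit selects $2 (1) or $1 (0)
    return dest, imm9u & 0xFF          # low 8 bits are the constant
```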

---

undef behav:

trying to return, using the xret functions, to any &return_address other than the one passed in upon xentr
branching or jumping between distinct xentr subroutines, or into an xentr subroutine from outside of it (without using the interoperation instructions).
failing to restore callee-saved registers before returning or tail calling with xret0, xreti, xretp, xtail
mfreeing the memory allocated by a caller for extra arguments

---

jmp #imm22u: unconditional JuMP

---

split 8-bit immediate offsets into 2 4-bit immediate offsets, and have one of those be ints, and the other be ptrs, so that you can specify an offset into a struct mixing ints and ptrs
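
One way the split-offset idea above might work is sketched below in Python. The nibble assignment (high nibble counts ints, low nibble counts pointers) and the function name split_offset are assumptions made for illustration; the sizes default to those of the vanilla 32-bit extension.

```python
def split_offset(imm8: int, int_size: int = 4, ptr_size: int = 4) -> int:
    """Byte offset of a struct field that follows some ints and some pointers."""
    n_ints = (imm8 >> 4) & 0xF   # assumption: high nibble counts ints
    n_ptrs = imm8 & 0xF          # assumption: low nibble counts pointers
    return n_ints * int_size + n_ptrs * ptr_size
```

Under these assumptions the same immediate resolves to the correct byte offset whether pointers are 4 or 8 bytes wide, for example by passing ptr_size=8 on a vanilla 64-bit platform.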

---

something like RISC-V's RV32V (see section 'Why RISC-V's RV32V vector extension is better than fixed-width SIMD' in the plBook RISC-V chapter for why this instead of traditional SIMD)