proj-oot-bootxReferenceOld201018

Boot (Oot BootX) reference

Version: unreleased (0.0.0-0)

BootX is a set of optional extensions to Boot. These extensions variously add instructions, define additional syscalls, define additional SYSINFO behavior, or further specify details which are unspecified in Boot.

Profiles

For convenient reference, certain subsets of these extensions can be referred to as 'profiles'.

The following profiles are defined:

Any of these may be prefixed with either 'vanilla 32-bit' or 'vanilla 64-bit' to indicate the combination of the indicated profile with the indicated 'vanilla' restrictions.

Stubs

Much functionality may be trivially implemented in a way that always returns a null result or an exceptional condition, or in a way that does not take advantage of native platform facilities, even when present. This is compliant provided that the corresponding functionality is indicated as 'stub' or 'partially stubbed'.

For example, if the Small profile is implemented but every attempt to allocate memory returns null, the implementation must not be described as "implementing the BootX Small profile", but may be described as "implementing the BootX Small profile, partially stubbed".

For another example, if the Standard profile is implemented in a way that prevents more than one thread/process from executing simultaneously, yet the target platform natively provides true parallel processing, then the implementation must be described as "Standard profile, partially stubbed".

Small profile

The Small profile consists of the following functionality enhancement extension:

plus the following new instruction extensions:

plus the following new syscall extensions:

Standard profile

The Standard profile includes everything in the Small profile, plus the following extensions:

Instruction:

Syscall:

Performance profile

The Performance profile includes everything in the Standard profile, plus the following extensions:

Instruction:

Syscall:

Functionality extensions

Syscall functionality extension

TODO

Instruction extensions

These extensions add new instructions.

Integer division instruction extension

integer division div rem

Floating point 1 instruction extension

floating point i2f lf sf ceil flor trunc nearest addf subf mulf divf copysign bnan binf beqf bltf fcmp

Floating point 2 instruction extension

Includes Floating point 1 and adds:

additional floating point remf sqrt minf maxf powf bnonfinite bgtf beqtotf blttotf ftotcmp

Floating point Triglog instruction extension

Includes Floating point 2 and adds:

additional floating point TODO

trig, log, exp etc fns from math.h (at this point are we missing any other math from C library or math.h?)

Floating point elusive eight instruction extension

additional floating point TODO

TODO

https://www.evanmiller.org/statistical-shortcomings-in-standard-math-libraries.html#functions

Misc instruction extension

misc log break hint

TODO should these be separated?

HINTs may be executed as NOPs. They are intended for forward compatibility; later versions of the specification may define semantics for various HINTs with the understanding that some implementations may execute them as NOPs.

Atomics seqc 1 instruction extension

atomics (sequential consistency) lpsc lwsc spsc swsc casrmwsc casprmwsc fencesc

Atomics seqc 2 instruction extension

atomic additional rmw ops (sequential consistency) addrmwa aprmwa andrmwa orrmwa xorrmwa

Atomics rc 1 instruction extension

atomics (release consistency) lprc lwrc sprc swrc casrmwrc casprmwrc

Atomics rc 2 instruction extension

atomics (relaxed consistency) lprlx lwrlx sprlx swrlx casrmwrlx casprmwrlx
atomic additional rmw ops (release consistency) addrmwrc aprmwrc andrmwrc orrmwrc xorrmwrc
atomic additional rmw ops (relaxed consistency) addrmwrlx aprmwrlx andrmwrlx orrmwrlx xorrmwrlx

TODO: aren't the normal Boot operations already relaxed consistency?

Non-branching conditionals instruction extension

SIMD extension

TODO

Syscall extensions

These extensions add new syscalls.

Filesys syscall extension

Syscalls: TODO explain syscalls only; mb put these sort of extensions under a separate heading?

filesys read write open close seek flush poll

Environment variables extension

IPC 1 syscall extension

TODO

IPC 2 syscall extension

TODO

TUI syscall extension

TODO

Process control syscall extension

TODO

Local memory allocation syscall extension

TODO

xlib 2: malloc(size: uint32)

Allocate a new region of memory of SIZE bytes and return a pointer to the beginning of it.

xlib 3: mfree(region: ptr)

Free a region of memory beginning at pointer REGION.

The REGION argument must have been returned by a previous malloc, and must not have been previously mfree'd.
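The mfree precondition above can be illustrated with a small host-side sketch in C. Everything here is an assumption made for illustration: the names xlib_malloc and xlib_mfree, the fixed-size tracking table, the treatment of a zero SIZE, and the -1 return on a violated precondition (this document does not say what happens in that case).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_LIVE 64 /* hypothetical limit, only for this sketch */

/* Regions handed out by xlib_malloc and not yet passed to xlib_mfree. */
static void *live[MAX_LIVE];

/* Hypothetical stand-in for xlib 2: malloc(size: uint32). */
void *xlib_malloc(uint32_t size) {
    for (size_t i = 0; i < MAX_LIVE; i++) {
        if (live[i] == NULL) {
            /* Assumption: a zero SIZE still yields a distinct region. */
            void *p = malloc(size ? size : 1);
            live[i] = p; /* stays NULL (slot unused) if the host malloc failed */
            return p;
        }
    }
    return NULL; /* tracking table full */
}

/* Hypothetical stand-in for xlib 3: mfree(region: ptr).
   Returns 0 on success, or -1 if REGION is not a live region,
   i.e. it was never returned by xlib_malloc or was already freed. */
int xlib_mfree(void *region) {
    if (region == NULL) {
        return -1;
    }
    for (size_t i = 0; i < MAX_LIVE; i++) {
        if (live[i] == region) {
            live[i] = NULL;
            free(region);
            return 0;
        }
    }
    return -1;
}
```

A checking wrapper like this is mainly useful in a debugging implementation; a production implementation could simply treat a violated precondition as undefined.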

Memcpy syscall extension

TODO

xlib 1: memcpy(dst: ptr, src: ptr, size: int32)

Copy SIZE bytes starting at memory location SRC to memory starting at memory location DST.
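As a rough sketch of the memcpy semantics in C: the name xlib_memcpy is hypothetical, and two behaviors not covered by the text are assumptions here, namely that a non-positive SIZE copies nothing (SIZE is a signed int32) and that SRC and DST do not overlap.

```c
#include <stdint.h>

/* Hypothetical stand-in for xlib 1: memcpy(dst: ptr, src: ptr, size: int32). */
void xlib_memcpy(void *dst, const void *src, int32_t size) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    /* Assumption: the loop body never runs when SIZE is zero or negative. */
    for (int32_t i = 0; i < size; i++) {
        d[i] = s[i];
    }
}
```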

Shared memory allocation syscall extension

memory allocation malloc_shared malloc_local

(TODO: which of malloc_shared/malloc_local is ordinary malloc? i think the ordinary malloc is already malloc_local)

Clocks syscall extension

TODO

see https://stackoverflow.com/questions/3523442/difference-between-clock-realtime-and-clock-monotonic https://man7.org/linux/man-pages/man2/clock_gettime.2.html

Restrictive extensions

These extensions further specify details which are unspecified in Boot.

Vanilla 32-bit extension

Vanilla 64-bit extension




TODO




from old boot:

Boot instructions fall into three categories:

 pushi popi pushp popp 

arithmetic of ints (result is undefined if the result is greater than 32 bits):

standard profile adds (52 instructions, for 92 total; all opcodes are below 128):

==
constants and constant tables lkp lkpb jk lkf
non-branching conditionals cmovi cmovip cmovpp
other control flow lpc
==

optional instructions (all opcodes are 128 or greater):

==
implementation-defined impl1 thru impl16
interop xentry xcall0 xcalli xcallp xcallii xcallmm xcallim xcallip xcallpm xcallpp xcall xcallv xlibcall0 xlibcalli xlibcallm xlibcallp xlibcall xlibcallv xret0 xreti xretp xpostcall

64-bit jumps

indirect control flow lci jy

general lci, with target that doesn't have to be xentry

lpc, for (intrusive) debuggers?

ldptrd, for data, in addition to ldptri (lci)?

atomics (sequential consistency):

(also need an instruction to flush icache? this might belong in some sort of self-modifying extension tho b/c a boot->platform compiler/interpreter might not be available at runtime)

Relaxed semantics operations are atomic but provide no other guarantees beyond those of their corresponding non-atomic variants. Release Consistency semantics are defined later; for readers already familiar with the term, they are RCpc, that is, the ordering operations themselves are ordered with Processor Consistency semantics. Release Consistency loads are acquires and stores are releases. Release Consistency also implies atomicity. Sequential Consistency operations provide the same guarantees as the corresponding Release Consistency operations; in addition, all Sequential Consistency operations appear, in program order, in a single total order over this memory_domain that is observed by all threads, along with all other sequentially consistent instructions.
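One way to picture the three levels is to map them onto C11 memory orders. This is only an approximation under stated assumptions: the enum and function names are hypothetical, and C11 acquire/release is used as a stand-in for the RCpc semantics that this spec defines itself.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names for the three consistency levels described above. */
typedef enum { BOOT_RLX, BOOT_RC, BOOT_SC } boot_order;

/* Loads: Release Consistency loads are acquires. */
memory_order load_order(boot_order o) {
    switch (o) {
    case BOOT_RLX: return memory_order_relaxed;
    case BOOT_RC:  return memory_order_acquire;
    default:       return memory_order_seq_cst;
    }
}

/* Stores: Release Consistency stores are releases. */
memory_order store_order(boot_order o) {
    switch (o) {
    case BOOT_RLX: return memory_order_relaxed;
    case BOOT_RC:  return memory_order_release;
    default:       return memory_order_seq_cst;
    }
}

/* Atomic load at the requested consistency level. */
uint32_t boot_atomic_load(_Atomic uint32_t *addr, boot_order o) {
    return atomic_load_explicit(addr, load_order(o));
}

/* Atomic store at the requested consistency level. */
void boot_atomic_store(_Atomic uint32_t *addr, uint32_t value, boot_order o) {
    atomic_store_explicit(addr, value, store_order(o));
}
```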

Some instructions are followed by data.

A semicolon means that the instruction is followed by data; 'instr ; data'.

jump constants only 32 bits

lentry and JMP data is relative to beginning of program

make move instructions non-branching conditionals:

a way to write Boot code into memory and then jump into it?

select (nonbranching conditional)

The motivation for malloc_local is that, in order to be able to provide the concurrency guarantees required by this spec, some Boot implementations may create and use locks to control access to blocks of shared memory returned by malloc; in some cases even non-atomic, unordered load or store instructions could cause the Boot implementation to acquire a lock. malloc_local lets such an implementation know that it does not have to set up and use locks for this memory segment, and represents an assurance by the programmer that this memory segment will only ever be accessed by the same thread that called malloc_local.

Note that an implementation may legally provide sequential consistency when the program requests only release or relaxed consistency; furthermore, the additional RMW ops may be implemented using the CAS primitive; therefore, all of the atomics in the optional instructions may legally be implemented using only the atomics in the small profile as primitives.
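The claim that the additional RMW ops can be built from CAS can be sketched with C11 atomics. The function names below are hypothetical, and returning the old value is an assumption (this document does not say what the RMW instructions return). The CAS used is sequentially consistent, which by the note above is a legal way to serve release or relaxed requests as well.

```c
#include <stdatomic.h>
#include <stdint.h>

/* A pure function computing the new value from the old value and an operand. */
typedef uint32_t (*rmw_op)(uint32_t old, uint32_t operand);

static uint32_t op_add(uint32_t old, uint32_t operand) { return old + operand; }
static uint32_t op_and(uint32_t old, uint32_t operand) { return old & operand; }
static uint32_t op_or(uint32_t old, uint32_t operand)  { return old | operand; }
static uint32_t op_xor(uint32_t old, uint32_t operand) { return old ^ operand; }

/* Emulate an atomic read-modify-write using only a sequentially
   consistent compare-and-swap. Returns the value seen before the update. */
uint32_t rmw_via_cas(_Atomic uint32_t *addr, rmw_op op, uint32_t operand) {
    uint32_t old = atomic_load(addr);
    /* On failure, atomic_compare_exchange_weak refreshes 'old' with the
       current contents, so the new value is recomputed on each retry. */
    while (!atomic_compare_exchange_weak(addr, &old, op(old, operand))) {
        /* retry */
    }
    return old;
}
```

For example, rmw_via_cas(&x, op_add, 3) plays the role of an add-RMW instruction, at the cost of a retry loop under contention.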

instructions to allow alignment?

other forms of in,out which read many bytes at a time to/from preallocated buffers, and identify the device using pointers

also nonblocking

---

floating point and other arith and other instrs from wasm and llvm

syscalls: plan9, klambda, posix, windows, macos, musl libc, android, ios, aws, freertos, python, l4, lists of frequent syscalls, wasm

clocks file rw seek file handle management open/close file management mv attributes networking nonblocking io python event loop

concurrency rw instructions (loads/stores with various memory orders) concurrency rmw instructions (cas, etc) concurrency process management instrs (fork etc)

tui: setcursorabsolute, setcursorrelative, getdimensions, setdimensions, clearscreen, printcharatcursor, getchar

graphics setpixel, getpixel, setpalette, setmode, getmodes, setcustommode? (custom screen size, custom #s of colors)

audio

pico8 https://www.lexaloffle.com/bbs/?tid=28207

---

note in boot spec that bootx will define some syscalls below 128? and some sinfos? and some/all instructions? or maybe just don't mention it much? or maybe say that some things are RESERVED for extension languages?

---

sinfo for:


If the function being called takes a variable number of arguments, then the total number of integer arguments is passed in register $11 and the total number of pointer arguments is passed in register $12.

If more than 3 integer arguments or more than 3 pointer arguments need to be passed, then a pointer to the remaining integer arguments is passed in &11 and/or a pointer to the remaining pointer arguments is passed in &12. The contents of the memory holding the additional arguments may be overwritten by the callee, just as with registers 5,6,7. However, the registers 11,12 (both banks) themselves are still callee-saved and, if modified, must be restored before return. The callee must not deallocate the memory pointed to by pointer registers 11 or 12 (that is, the memory holding the additional arguments).

On platforms which pass values that are neither integers nor pointers, an argument that is neither a 32-bit integer nor a pointer is passed as an integer if its value is guaranteed to fit within 32 bits; otherwise the value is stored in memory and a pointer to the value is passed.
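The overflow-argument rule can be modeled in C for the integer bank. This is a sketch, not the calling convention itself: the struct and function names are hypothetical, and it assumes (as the reference to registers 5,6,7 suggests) that the first three integer arguments travel in those registers and the rest are reached through the pointer in &11.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_ARG_REGS 3 /* assumption: registers 5, 6, 7 carry the first three */

/* Hypothetical model of how the integer arguments of one call are laid out. */
typedef struct {
    int32_t  arg_regs[NUM_ARG_REGS]; /* stand-ins for $5, $6, $7 */
    size_t   used_regs;              /* how many of arg_regs are meaningful */
    int32_t *overflow;               /* stand-in for &11; NULL if not needed */
    size_t   overflow_count;         /* number of arguments behind overflow */
} int_args;

/* Split COUNT integer arguments into register arguments and an overflow
   area. The overflow pointer aliases the caller's array; the callee may
   overwrite that memory but must not deallocate it. */
int_args marshal_int_args(int32_t *args, size_t count) {
    int_args out = { {0, 0, 0}, 0, NULL, 0 };
    out.used_regs = count < NUM_ARG_REGS ? count : NUM_ARG_REGS;
    for (size_t i = 0; i < out.used_regs; i++) {
        out.arg_regs[i] = args[i];
    }
    if (count > NUM_ARG_REGS) {
        out.overflow = args + NUM_ARG_REGS;
        out.overflow_count = count - NUM_ARG_REGS;
    }
    return out;
}
```

The pointer bank would be handled the same way, with &12 holding the pointer to the remaining pointer arguments.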

memory allocation mallo mfree
interop xcall xentr xaftr xret0 xreti xretp xtail

---

---

undef behav:

---

---

split 8-bit immediate offsets into 2 4-bit immediate offsets, and have one of those be ints, and the other be ptrs, so that you can specify an offset into a struct mixing ints and ptrs
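A sketch of how such a split immediate might be decoded, in C. Which nibble counts ints and which counts ptrs is not decided in the note above; putting the int count in the high nibble is an assumption, as is the function name.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode an 8-bit immediate into (int slots, ptr slots) and convert it to a
   byte offset for a platform with the given int and pointer sizes.
   Assumption: high nibble = number of ints to skip, low nibble = number of
   ptrs to skip. */
size_t mixed_offset_bytes(uint8_t imm, size_t int_size, size_t ptr_size) {
    size_t int_slots = (imm >> 4) & 0xF;
    size_t ptr_slots = imm & 0xF;
    return int_slots * int_size + ptr_slots * ptr_size;
}
```

The same immediate 0x23 (skip 2 ints and 3 ptrs) then resolves to 32 bytes with 4-byte ints and 8-byte pointers, and to 20 bytes when both are 4 bytes, so the encoded program does not need to know the platform's pointer size.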

---

something like RISC-V's RV32V (see section 'Why RISC-V's RV32V vector extension is better than fixed-width SIMD' in the plBook RISC-V chapter for why this instead of traditional SIMD)

see also ARM SVE, SVE2

---

at least 16-way permutes/shuffles (register) scatter/gather (memory; can be used for longer permutes, but in memory)

e.g. ARM NEON VTBL, VTBX; see https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-5-rearranging-vectors

e.g. consider also stuff like vpshufb, vpermps, vcompressps, vpscatterdd, vpgatherdd; also see https://branchfree.org/2018/05/30/smh-the-swiss-army-chainsaw-of-shuffle-based-matching-sequences/ ?

if we restrict ourselves to 16-way stuff, then we have 4-bit indices, and we can pack 16 indices into 64 bits.
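The packing arithmetic can be sketched in C: 16 lanes times 4 bits per index is exactly 64 bits. The function names and the lane order (lane 0 in the least significant nibble) are assumptions for illustration, and the scalar loop only models what a register permute would do.

```c
#include <stdint.h>

/* Pack sixteen 4-bit lane indices into one 64-bit word.
   Assumption: lane 0 occupies the least significant nibble. */
uint64_t pack_indices(const uint8_t idx[16]) {
    uint64_t packed = 0;
    for (int i = 0; i < 16; i++) {
        packed |= (uint64_t)(idx[i] & 0xF) << (4 * i);
    }
    return packed;
}

/* Extract the 4-bit index for one lane (lane must be in 0..15). */
uint8_t unpack_index(uint64_t packed, int lane) {
    return (uint8_t)((packed >> (4 * lane)) & 0xF);
}

/* Scalar model of a 16-way permute: dst[i] = src[index of lane i].
   dst must not alias src. */
void permute16(const uint8_t src[16], uint64_t packed, uint8_t dst[16]) {
    for (int i = 0; i < 16; i++) {
        dst[i] = src[unpack_index(packed, i)];
    }
}
```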

see also https://www.cnx-software.com/2017/08/07/how-arm-nerfed-neon-permute-instructions-in-armv8/

---