Bayle Shanks's website: proj-oot-bootReferenceOld201018

Boot (Oot Boot) reference

Version: unreleased (0.0.0-0)

Boot is a low-level 'assembly language' virtual machine (VM) that is easy to implement.

# Introduction

Boot is a target language that is easy to implement on a wide variety of platforms, even on very primitive bare metal, or 'on top of'/within existing high-level languages such as Python.

Highlights:

3-operand fixed-length register machine
signed 32-bit integers
integers and pointers both have implementation-dependent sizes in memory, and may be different sizes
opaque pointer representation
7 integer registers, 7 pointer registers, 15 opaque registers to copy values of any type, and a zero register, and a null pointer register
<=64 instructions
RISC-like; no addressing modes (or rather, each instruction has one fixed addressing mode), and only a few instructions access (non-register) memory
instructions for I/O, memory allocation, and calling to and from host platform
an extension, BootX, is available (well, rather, it's in progress) that specifies more instructions and functionality (see bootx_reference.md)

[TOC]

# Instruction encoding

Each instruction is 4 bytes. The bytes are:

op0 (operand 0)
op1
op2
opcode

Op0 is restricted to a maximum value of 63 (when interpreted as unsigned), meaning that its 2 most-significant bits are always zero.
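As a minimal sketch (in Python, which the introduction mentions as a possible host), packing and unpacking one instruction word could look like this. It assumes the bytes appear in the order listed above (op0 first, opcode last); `encode` and `decode` are hypothetical helper names, not part of Boot.

```python
def encode(op0: int, op1: int, op2: int, opcode: int) -> bytes:
    """Pack one Boot instruction as the 4 bytes op0, op1, op2, opcode."""
    if not 0 <= op0 <= 63:
        # The 2 most-significant bits of op0 must be zero.
        raise ValueError("op0 must be in 0..63")
    for name, value in (("op1", op1), ("op2", op2), ("opcode", opcode)):
        if not 0 <= value <= 255:
            raise ValueError(f"{name} must fit in one byte")
    return bytes([op0, op1, op2, opcode])


def decode(word: bytes) -> tuple[int, int, int, int]:
    """Unpack 4 bytes into (op0, op1, op2, opcode)."""
    if len(word) != 4:
        raise ValueError("a Boot instruction is exactly 4 bytes")
    op0, op1, op2, opcode = word  # iterating over bytes yields ints
    if op0 & 0xC0:
        # A 1 in either top bit of op0 is encoding space reserved for extensions.
        raise ValueError("op0 uses encoding space reserved for extensions")
    return op0, op1, op2, opcode
```

For example, `decode(encode(3, 1, 200, 22))` returns `(3, 1, 200, 22)`.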

# Datatypes

Two primary datatypes:

int32 (32-bit integers)
ptr (pointers)

ptr has two subtypes:

- ptrd (data pointers)
- ptrc (code pointers)

# Registers

Two banks of 8 registers each; one for int32, one for ptr. The first register in the int32 bank is constant zero, and the first register in the ptr bank is the null pointer; writes to these registers have no effect. The notation $n refers to the n-th int32 register, and &n refers to the n-th ptr register; for example, the first and last registers in each bank are $0, $7, &0, &7.

# Instructions

47 instructions

| Category | Mnemonics |
|---|---|
| annotation | ann |
| load constants | l32m |
| loads and stores and copies | lb lbu lh lhu lp lw sb sh sp sw cp cpp |
| arithmetic of ints | add sub mul addm |
| bitwise arithmetic | and or xor not shl shrs shru |
| adding ints to pointers | app ap32 ap16 ap appm ap32m ap16m apm |
| comparison control flow | bne blt bltu beq bnep beqp |
| other control flow | jrls jrlm jrll jy lpc |
| I/O | in inp |
| interop | xlib |
| misc | break impl sinfo |

(Notation for the instruction tables below.) #imm6, #imm8, #imm16, and #imm22 are immediate constants using two's complement encoding (#imm6 is 6 bits rather than 8, and #imm22 is 22 bits rather than 24, because op0 can only reach 63, as noted in 'Instruction encoding' above). #imm22u, #imm16u, #imm8u, and #imm6u are immediate constants interpreted as unsigned. $X is an integer register, &X is a pointer register, and X is an untyped argument. _ is an unused argument that must always be 0 in proper Boot programs (Boot implementations, however, are free to make use of these locations). Immediate constants without a 'u' suffix are signed two's-complement ints.

From left to right, the arguments go into operands op0, op1, op2. Immediate operands are always on the right (the highest-numbered operand). When two immediate operands are combined into an #imm16 (as with instruction lm), op1 holds the high-order bits and op2 the low-order bits (imm16 = (op1 << 8) + op2). Similarly for #imm22 (imm22 = (op0 << 16) + (op1 << 8) + op2).
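The operand-combining rules above can be sketched as follows. This is an illustrative Python version; `to_signed`, `imm16`, and `imm22` are hypothetical helper names. Whether a particular instruction's immediate is signed comes from its argument types (e.g. 'i16' versus 'u16').

```python
def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned `bits`-wide value as two's complement."""
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def imm16(op1: int, op2: int, signed: bool = True) -> int:
    """Combine op1 (high byte) and op2 (low byte) into a 16-bit immediate."""
    raw = (op1 << 8) + op2
    return to_signed(raw, 16) if signed else raw


def imm22(op0: int, op1: int, op2: int, signed: bool = True) -> int:
    """Combine all 3 operands; op0 <= 63, so the result fits in 22 bits."""
    raw = (op0 << 16) + (op1 << 8) + op2
    return to_signed(raw, 22) if signed else raw
```

For instance, `imm22(32, 0, 0)` gives -2097152 and `imm22(31, 255, 255)` gives 2097151, the signed range that Boot Assembly allows for a single 22-bit operand.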

JREL and branch immediates are in units of bytes of Boot code, and they may not jump into the middle of an instruction. Platforms which compile or represent Boot code in memory in ways such that one instruction spans more or fewer than 4 memory locations must adjust the jrel and branch offsets accordingly before executing them.
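For example, an interpreter that keeps decoded instructions in a list, one slot per 4-byte word of Boot code, has to turn byte offsets into slot offsets. The Python sketch below assumes that offsets are measured from the next instruction (as the jump descriptions below state for jr) and that any 32-bit constant embedded in the instruction stream, as with l32m, also occupies its own slot; `target_index` is a hypothetical name.

```python
WORD_BYTES = 4  # each Boot instruction (and each embedded 32-bit constant) is 4 bytes


def target_index(current: int, byte_offset: int, program_len: int) -> int:
    """Convert a jump/branch byte offset into an index into a list of words.

    Assumption: the offset is relative to the word after the current one.
    """
    if byte_offset % WORD_BYTES != 0:
        # This would land in the middle of an instruction (undefined behavior).
        raise ValueError("offset is not a multiple of 4 bytes")
    target = current + 1 + byte_offset // WORD_BYTES
    if not 0 <= target < program_len:
        # Jumping outside the program is also undefined behavior.
        raise ValueError("target lies outside the program")
    return target
```

Under these assumptions, a branch at slot 5 with a byte offset of -8 resumes at slot 4.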

Mnemonics with a trailing 'm' represent instructions involving an 'iMmediate' (although not all instructions with immediates have a trailing 'm' in the mnemonic).

annotation:

ann ? ? ?: ANNotation; no effect on execution

load constants:

l32m $dest $src #imm8u: Load 32-bit iMmediate int constant (embedded in instruction stream immediately following instruction), shift it left by #imm8u, add it to $src, then write it to $dest
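A minimal Python sketch of the l32m arithmetic, treating register contents as 32-bit patterns. It assumes the embedded 32-bit constant has already been read from the instruction stream (its byte order is not fixed here), that the shifted value wraps mod 2^32 like other int32 arithmetic, and that an interpreter resumes execution after the constant, 8 bytes past the start of the l32m. The name `l32m_result` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def l32m_result(src: int, constant: int, shift: int) -> int:
    """Value written to $dest by `l32m $dest $src #imm8u`.

    src:      current value of $src (only the low 32 bits matter)
    constant: the 32-bit word embedded after the instruction
    shift:    the #imm8u operand, 0..255 (shifts of 32 or more yield 0 here)
    """
    shifted = ((constant & MASK32) << shift) & MASK32
    return (src + shifted) & MASK32
```

For instance, `l32m $1 $0 0` followed by the word 0x12345678 loads 0x12345678 into $1, since $0 is always 0.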

loads and stores and copies:

lw $dest &addr #imm8: Load Word int32 from memory addr (&addr + #imm8*INT32_SIZE)
lh $dest &addr #imm8: Load Halfword int16 from memory addr (&addr + #imm8*INT16_SIZE)
lhu $dest &addr #imm8: Load Halfword Unsigned int16 from memory addr (&addr + #imm8*INT16_SIZE)
lb $dest &addr #imm8: Load Byte int8 from memory addr (&addr + #imm8)
lbu $dest &addr #imm8: Load Byte Unsigned int8 from memory addr (&addr + #imm8)
lp &dest &addr #imm8: Load Ptr from memory addr (&addr + #imm8*PTRD_SIZE)
sw &addr $src #imm8: Store Word int32 to memory addr (&addr + #imm8*INT32_SIZE)
sh &addr $src #imm8: Store Halfword int16 to memory addr (&addr + #imm8*INT16_SIZE)
sb &addr $src #imm8: Store Byte int8 to memory addr (&addr + #imm8)
sp &addr &src #imm8: Store Ptr to memory addr (&addr + #imm8*PTRD_SIZE)
cp $dest $src $cond: if $cond == 0, then CoPy int between registers
cpp &dest &src $cond: if $cond == 0, then CoPy Pointer between registers

arithmetic of ints (result always defined and all results mod 2^32):

add $dest $src1 $src2: $dest = $src1 + $src2
addm $dest $src1 #imm8: $dest = $src1 + #imm8
sub $dest $src1 $src2: $dest = $src1 - $src2
mul $dest $src1 $src2: $dest = $src1 * $src2

bitwise arithmetic:

shl $dest $src #imm8u: Shift Left (C's '<<' operator) by #imm8u bits
shru $dest $src #imm8u: Shift Right Unsigned (logical shift) by #imm8u bits
shrs $dest $src #imm8u: Shift Right Signed (arithmetic shift) by #imm8u bits
and $dest $src1 $src2
or $dest $src1 $src2
xor $dest $src1 $src2
not $dest $src1 $cond: if $cond == 0, then $dest = bitwise_NOT($src1)

Adding ints to Pointers (only valid on data pointers, not code pointers):

app &dest &src1 $src2: &dest = &src1 + $src2*PTRD_SIZE
ap32 &dest &src1 $src2: &dest = &src1 + $src2*INT32_SIZE
ap16 &dest &src1 $src2: &dest = &src1 + $src2*INT16_SIZE
ap &dest &src1 $src2: &dest = &src1 + $src2
appm &dest &src1 #imm8: &dest = &src1 + #imm8*PTRD_SIZE
ap32m &dest &src1 #imm8: &dest = &src1 + #imm8*INT32_SIZE
ap16m &dest &src1 #imm8: &dest = &src1 + #imm8*INT16_SIZE
apm &dest &src1 #imm8: &dest = &src1 + #imm8
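Because pointers are opaque, an implementation may represent a data pointer however it likes. One possibility, sketched here in Python, is a (region, offset) pair. The sizes below are hypothetical values that a byte-addressed host might choose (sinfo reports the real ones), and `DataPtr` and the function names are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical sizes, in memory locations; these are implementation-dependent.
PTRD_SIZE = 8
INT32_SIZE = 4
INT16_SIZE = 2


@dataclass(frozen=True)
class DataPtr:
    """One possible opaque ptrd: a region handle plus a location offset."""
    region: int
    offset: int


def add_scaled(src: DataPtr, count: int, elem_size: int) -> DataPtr:
    # count is a signed int32, so a negative count moves backwards.
    return DataPtr(src.region, src.offset + count * elem_size)


def app(src: DataPtr, count: int) -> DataPtr:
    return add_scaled(src, count, PTRD_SIZE)


def ap32(src: DataPtr, count: int) -> DataPtr:
    return add_scaled(src, count, INT32_SIZE)


def ap16(src: DataPtr, count: int) -> DataPtr:
    return add_scaled(src, count, INT16_SIZE)


def ap(src: DataPtr, count: int) -> DataPtr:
    return add_scaled(src, count, 1)  # unscaled, in memory locations
```

The immediate forms (appm, ap32m, ap16m, apm) would compute the same thing with #imm8 in place of $src2.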

conditional branches:

beq $src0 $src1 #imm8: Branch-if-EQual
beqp &src0 &src1 #imm8: Branch-if-EQual on Ptrs
bne $src0 $src1 #imm8: Branch-if-Not-Equal
bnep &src0 &src1 #imm8: Branch-if-Not-Equal on Ptrs
blt $src0 $src1 #imm8: Branch-if-Less-Than
bltu $src0 $src1 #imm8: Branch-if-Less-Than-Unsigned
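blt and bltu read the same register bits differently (see the notes on blt and bltu below). A small Python sketch of the two comparisons, with hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF


def as_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement int32."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def blt_taken(a: int, b: int) -> bool:
    """blt: branch if $src0 < $src1, comparing as signed int32s."""
    return as_signed32(a) < as_signed32(b)


def bltu_taken(a: int, b: int) -> bool:
    """bltu: branch if $src0 < $src1, comparing as unsigned int32s."""
    return (a & MASK32) < (b & MASK32)
```

With the bit pattern 0xFFFFFFFF in $src0 and 0 in $src1, blt branches (-1 < 0) but bltu does not (4294967295 > 0).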

unconditional jumps and other control flow:

jr #imm9: unconditional Jump (Relative to the next instruction) (range +-255)
jrls #imm9: unconditional Jump Relative (to the next instruction) Long; 9-bit signed immediate offset (twos complement of concatenation of all 3 operands)
jrlm #imm22: unconditional Jump Relative (to the next instruction) Long; 22-bit signed immediate offset (twos complement of concatenation of all 3 operands)
jrll #imm32: unconditional Jump Relative (to the next instruction) Long; 32-bit signed offset in next word in instruction stream
jy &target _ _: Jump dYnamic (indirect)
lpc &dest: Load Program Counter

I/O:

in1 $dest $device #imm8u: read IN one int8 from device ($device + #imm8u)
out1 $device $src #imm8u: write OUT one int8 to device ($device + #imm8u)
in &dest &device $len: read IN $len int8s from device $device
out $device &src $len: write OUT $len int8s to device $device

interop:

xlib #libfn_imm8u #nargsi_imm6u #nargp_imm8u: call eXternal LIBrary function

misc:

break ? ? ?: BREAKpoint (implementation-dependent debugging)
impl ? ? ?: IMPLementation-dependent instruction
sinfo $dest #query1_imm8u #query2_imm8u: $dest = System INFOrmation query

## Notes on certain instructions

app, ap32, ap16, ap: the int32 arguments $src2 are interpreted as signed, so although only addition is provided, subtraction can also be accomplished
ann: implementations may ignore or strip ann instructions
blt: the int32 arguments (in $src0 and $src1) are signed
bltu: the int32 arguments (in $src0 and $src1) are unsigned
cp, cpp, not, halt: to make these instructions unconditional, use $0 for $cond, since $0 always holds 0
cp: when $cond == 0, this is equivalent to addm $dest $src 0
cpp: this is not equivalent to app with a zero offset, because cpp can be used on code pointers (ptrc), whereas app can only be used on data pointers (ptrd)
impl and break: Strictly speaking, any program containing either of these instructions is invalid Boot code, as it is not really a Boot program but rather is a program in some implementation-defined dialect/extension of the Boot language
in, out, inp, outp: If successful, $3 is set to the number of bytes read or written (either 0 or 1 for IN). For IN, if nothing was available to be read or if EOF was reached, IN writes 0 to $3 and the output register holds an arbitrary value. For either IN or OUT, some platforms may in some cases return 0 in $3 rather than returning an error. In case of an error, a negative error code is written to $3. The device number is interpreted as unsigned.
jmp: the #imm22u specifies a code location in terms of bytes from the start of the program
jrel: JREL 0 (the instruction whose encoding is all-zero bits) is illegal
jy: &target must be a code pointer provided at runtime (either by lpc, or by platform-specific or foreign code) and which did not have pointer arithmetic performed on it.
lb: guaranteed to produce values between -128 and 127, inclusive (when interpreted as signed).
lbu: guaranteed to produce values between 0 and 255, inclusive (when interpreted as unsigned)
lh: guaranteed to produce values between -32768 and 32767, inclusive (when interpreted as signed).
lhu: guaranteed to produce values between 0 and 65535, inclusive (when interpreted as unsigned)
sb, sh: store the least-significant 8 and 16 bits of $src, respectively
sinfo: returns in $dest the value selected by query_1 (query_2 should always be 0 when query_1 is 0, 1, 2, or 3):

- 0: PTRD_SIZE, the number of memory locations per ptrd
- 1: INT32_SIZE, the number of memory locations per int32
- 2: INT16_SIZE, the number of memory locations per int16
- 3: VERSION (currently 0)
- 4: INTMAX_32
- 5: INTMAX_16
- 6: INTMAX_8
- 247-254: implementation-defined
- others: RESERVED for extensions

xlib: call external library function #libfn_imm8u with #nargsi_imm6u integer arguments and #nargp_imm8u pointer arguments, placed as per the Boot calling convention. See below for defined libfn numbers.
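The conditional forms of cp and not described in the notes above, together with the always-zero $0, can be sketched in Python as below. The register-bank class and function names are hypothetical, and values are stored as unsigned 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF


class IntBank:
    """A bank of 8 int32 registers; $0 always reads as 0 and ignores writes."""

    def __init__(self) -> None:
        self._regs = [0] * 8

    def __getitem__(self, n: int) -> int:
        return self._regs[n]

    def __setitem__(self, n: int, value: int) -> None:
        if n != 0:  # writes to $0 have no effect
            self._regs[n] = value & MASK32


def cp(regs: IntBank, dest: int, src: int, cond: int) -> None:
    """cp $dest $src $cond: copy only when $cond == 0."""
    if regs[cond] == 0:
        regs[dest] = regs[src]


def not_(regs: IntBank, dest: int, src: int, cond: int) -> None:
    """not $dest $src1 $cond (named not_ because `not` is a Python keyword)."""
    if regs[cond] == 0:
        regs[dest] = ~regs[src]  # masked to 32 bits by IntBank
```

Passing 0 as `cond` selects $0, which makes both operations unconditional.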

# Arithmetic

Int32 overflow on addition, subtraction, and multiplication wraps around (that is, mathematically the operations are done mod 2^32). Note that the operations add, addm, sub, and mul give valid results whether you consider the int32 operands to be signed two's complement or unsigned, as long as you interpret the result the same way.

For example, for multiplication, imagine if we had 3-bit integers instead of 32-bit integers. If we multiply the unsigned representations of 2*3, that is, 010*011, the result is 6, that is, 110. In two's complement, 010 represents 2 and 011 represents 3, and 110 represents -2; and 2*3 = 6 = -2 mod 2^3. To give another example, if we multiply the unsigned representations of 6*2, that is, 110*010, the result is 12, and 12 mod 2^3 is 4, that is, 100. In two's complement, 110 represents -2 and 010 represents 2, and the result, 100, represents -4; and -2*2 = -4 = 4 mod 2^3. Do note that these results are only correct modulo the bitwidth; for example, 2*3 = 6, but mul 010 011 = 110, which when viewed as two's complement yields 2*3 = -2, an incorrect result in ordinary arithmetic; but -2 is equivalent to 6 mod 8, so the result is correct in mod 8 arithmetic. The examples in this paragraph use mod 8, but Boot itself uses mod 2^32.

On many platforms, it may be easiest to implement add, sub, addm by viewing the int32s as unsigned integers and then applying unsigned addition, subtraction, because many platforms don't implement wrap-around signed numbers.
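Following that suggestion, a Python sketch can compute add, sub, and mul on unsigned bit patterns and then reduce mod 2^bits. The `bits` parameter (32 in Boot) is an illustrative addition that lets the same code reproduce the 3-bit examples above; the function names are hypothetical.

```python
def wrap(x: int, bits: int = 32) -> int:
    """Reduce x mod 2**bits, giving an unsigned bit pattern."""
    return x & ((1 << bits) - 1)


def as_signed(x: int, bits: int = 32) -> int:
    """Read an unsigned bit pattern as two's complement."""
    x = wrap(x, bits)
    return x - (1 << bits) if x >> (bits - 1) else x


def add(a: int, b: int, bits: int = 32) -> int:
    return wrap(a + b, bits)


def sub(a: int, b: int, bits: int = 32) -> int:
    return wrap(a - b, bits)


def mul(a: int, b: int, bits: int = 32) -> int:
    return wrap(a * b, bits)
```

`mul(0b110, 0b010, bits=3)` returns 0b100 (4), and `as_signed(0b100, bits=3)` is -4, matching the second 3-bit example. With the default 32 bits, `add(0x7FFFFFFF, 1)` wraps to 0x80000000, which `as_signed` reads as -2147483648.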

Note that many arithmetic operations are provided only for integers; the only arithmetic you can do to pointers is add or subtract integers to/from them.

# (Not) mixing integer bitwidths

When in registers and being operated upon, the internal representation of int32s is a defined sequence of bits; however, when in memory, the internal representation of integers is opaque. For example, if a memory location x contains a 32-bit integer and you read it with lh or lhu, the value that is read is unspecified other than that it fits in 16 bits. Similarly, if a memory location contains an 8-bit integer and you read it using lw, the value that is read is unspecified other than that it's some int32 (also, reading a byte using lw near the edge of accessible memory causes undefined behavior if fewer than INT32_SIZE memory locations of accessible memory remain, starting with the location read). You cannot write a sequence of bytes (8-bit integers) into memory and then usefully read it back using lw, and you cannot write a 32-bit integer into memory and then usefully read out its component bytes.

Furthermore there is no guarantee that 32-bit integers occupy more than one memory location, or that larger integer bitwidths occupy more memory locations than smaller; it's possible for both of INT16_SIZE, INT32_SIZE to be identically 1 (this can happen if the implementation chooses to make each single memory location large enough to store 32 bits of data).

The instructions lb, lbu, lh, lhu guarantee that the numbers read into registers are in certain ranges that fit in 8 bits (lb, lbu) and 16 bits (lh, lhu). However, lb and lh produce signed two's complement representations in the destination register; note that the bit pattern of a small negative number in a 32-bit register, when coded with signed two's complement, is equivalent to a number larger than 16 bits if interpreted as unsigned. For example, a -1 in a register, signed, would be viewed as 2^32 - 1 = 4294967295 unsigned.
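As an illustration, here is how an implementation might produce register values for lb, lbu, lh, and lhu, assuming the memory location holds exactly the 8 or 16 bits written by sb or sh. The Python function names mirror the mnemonics but are otherwise hypothetical.

```python
def lb(stored: int) -> int:
    """Signed 8-bit load: result in -128..127."""
    stored &= 0xFF
    return stored - 0x100 if stored & 0x80 else stored


def lbu(stored: int) -> int:
    """Unsigned 8-bit load: result in 0..255."""
    return stored & 0xFF


def lh(stored: int) -> int:
    """Signed 16-bit load: result in -32768..32767."""
    stored &= 0xFFFF
    return stored - 0x10000 if stored & 0x8000 else stored


def lhu(stored: int) -> int:
    """Unsigned 16-bit load: result in 0..65535."""
    return stored & 0xFFFF
```

`lb(0xFF)` is -1, whose 32-bit register bit pattern, `lb(0xFF) & 0xFFFFFFFF`, is 4294967295 when read as unsigned; `lbu(0xFF)` is 255.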

Boot guarantees that bytes (8-bit integers) have a size 1 in memory (meaning that values that are stored with sb occupy one memory location). INT8_SIZE is always 1.

# I/O

If the standard console streams STDIN and STDOUT exist on the platform and are supported by the implementation, they must be devices #0 and #1, respectively. Device #2 must be STDERR if it exists; otherwise it should be an alias to STDOUT or may be a null device (one which never emits anything and to which writing has no effect).

An implementation does not have to support INP, OUTP.
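A hedged sketch of how a host implementation in Python might back in1 and out1 with binary streams, following the device numbering above (device 2 aliasing STDOUT when there is no separate STDERR) and the $3 conventions in the notes on in and out. The class name, the choice of -1 as the error code, and returning the input byte as 0..255 are assumptions, not part of Boot.

```python
from typing import BinaryIO, Dict, Optional, Tuple

MASK32 = 0xFFFFFFFF
ERR_NO_DEVICE = -1  # hypothetical negative error code


class Devices:
    """Hypothetical device table: 0 = STDIN, 1 = STDOUT, 2 = STDERR or alias."""

    def __init__(self, stdin: BinaryIO, stdout: BinaryIO,
                 stderr: Optional[BinaryIO] = None) -> None:
        self.streams: Dict[int, BinaryIO] = {
            0: stdin,
            1: stdout,
            2: stderr if stderr is not None else stdout,
        }

    def in1(self, device_reg: int, imm: int) -> Tuple[int, int]:
        """Return (value for $dest, value for $3)."""
        stream = self.streams.get((device_reg + imm) & MASK32)  # unsigned
        if stream is None:
            return 0, ERR_NO_DEVICE
        data = stream.read(1)
        if not data:
            return 0, 0  # nothing available or EOF: $dest is arbitrary
        return data[0], 1

    def out1(self, device_reg: int, src: int, imm: int) -> int:
        """Write the low 8 bits of src; return the value for $3."""
        stream = self.streams.get((device_reg + imm) & MASK32)
        if stream is None:
            return ERR_NO_DEVICE
        stream.write(bytes([src & 0xFF]))
        return 1
```

A real host would pass `sys.stdin.buffer` and `sys.stdout.buffer`; in-memory `io.BytesIO` objects work just as well for testing.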

# xlib calls

libfn number 0 is defined below; numbers 3 thru 127 are RESERVED for extensions, and numbers 128 thru 254 are implementation-defined.

libfn 0: end the program with result code 'result'. The result code is interpreted as signed.

# The Boot Calling Convention

Up to 3 integer arguments and up to 3 pointer arguments are passed in registers.

Registers 1, 2, 4, 5 (all banks) are caller-saved. Registers 3, 6, 7 (all banks) are callee-saved.

Pointer register 3 is used as a memory stack pointer when applicable (TODO what does 'when applicable' mean?), otherwise it is callee-saved.

Pointer register 5 is used as a return address pointer/link register. When using xlib, there is no need to set this register; xlib sets it if needed.

Registers 1, 2, 4 (in both the integer bank and the pointer bank) are used to pass arguments and return values, with arguments assigned from left to right to registers in increasing order. Upon making a call, up to 3 integer arguments are in integer registers 1, 2, 4, up to 3 pointer arguments are in pointer registers 1, 2, 4, and the return address is in pointer register 5.

Registers 5,6,7 (both banks) are caller-saved scratch registers and may be overwritten and used for any purpose by the callee.

Upon returning from a call, up to 3 integer and up to 3 pointer return values will be found in registers 1,2,4 using the same convention as for calling.
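The register assignment for a call can be sketched in Python as follows. The function and constant names are hypothetical, and the sketch covers only the register-passed case of up to 3 arguments of each kind.

```python
from typing import Dict, Sequence, Tuple

ARG_REGISTERS = (1, 2, 4)  # argument/return registers, leftmost argument first
LINK_REGISTER = 5          # pointer register holding the return address


def place_arguments(
    int_args: Sequence[int],
    ptr_args: Sequence[object],
    return_address: object,
) -> Tuple[Dict[int, int], Dict[int, object]]:
    """Map call arguments to (integer bank, pointer bank) register numbers."""
    if len(int_args) > 3 or len(ptr_args) > 3:
        raise ValueError("only 3 integer and 3 pointer arguments go in registers")
    int_regs = dict(zip(ARG_REGISTERS, int_args))
    ptr_regs: Dict[int, object] = dict(zip(ARG_REGISTERS, ptr_args))
    ptr_regs[LINK_REGISTER] = return_address
    return int_regs, ptr_regs
```

`place_arguments([10, 20, 30], ["p"], "ret")` yields `{1: 10, 2: 20, 4: 30}` for the integer bank and `{1: "p", 5: "ret"}` for the pointer bank; return values come back in registers 1, 2, 4 in the same order.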

# Undefined behavior and arbitrary values

These lists are probably accidentally incomplete right now, but we hope to make them comprehensive as time goes on.

## Undefined behavior

The following are undefined behaviors in Boot. Any program containing undefined behavior on any codepath has undefined behavior as a whole:

branching or jumping to a location outside the bounds of the program
branching or jumping to a location in the middle of an instruction
creating a pointer to or accessing memory that was neither malloc'd, nor provided to the Boot program by an external function, unless the platform permits this in an implementation-dependent way
performing addition (arithmetic) on a code pointer, or accessing or using a code pointer upon which arithmetic was performed
mfreeing malloc'd memory more than once
mfreeing a pointer which was not previously returned by malloc
branching or jumping back to the same instruction
jrel 0
an instruction with op0 > 63 (when interpreted as unsigned)
any opcode which is RESERVED
loading a non-integer into an int32 register
loading a non-pointer into a ptr register
load from a memory location that is in the middle of an integer or pointer
any instruction that does not have a 0 for an operand listed with a '_' above (e.g. mallo in op2)

## Arbitrary values

The following do not cause undefined behavior and do not make the whole program invalid, but do not define the resulting values of certain operations:

loading part of an integer by using lh, lhu, lb, lbu on a larger-bitwidth integer (this is guaranteed to produce an int16 for lh and lhu, and an int8 for lb and lbu, but otherwise the value produced is not specified)
loading a larger bitwidth than was stored by using lw, lh, lhu on a smaller-bitwidth integer (this is guaranteed to produce an int32 for lw, and an int16 for lh and lhu, but otherwise the value produced is not specified)

# Boot Assembly

Boot Assembly is a plaintext syntax for Boot.

Boot Assembly is ASCII text. Each line is processed separately; lines are delimited by the newline character, '\n' (a byte with the value 10). Whitespace is defined as one of the characters: ' \t\n\r\f\v' (where \t indicates tab, \n indicates newline, etc). Lines which are all whitespace, or which begin with a semicolon, are skipped. Trailing whitespace on any line is ignored.

Lines begin with an instruction mnemonic, which consists of lowercase letters and digits and is at most 5 characters long. This is followed by whitespace, followed by a number (a string of the digits 0 thru 9, possibly prefixed by one of '-' or '+') denoting the first operand, op0. This may be followed by more whitespace and a second number (op1), and possibly by more whitespace and a third number (op2). This may be followed by whitespace, which may be followed by a semicolon. After a semicolon, the rest of the line is a comment (all characters are ignored) up to the first newline, which still terminates the line.

Operands are integers in base 10 and may be prefixed by '+' or '-' to indicate sign. Instructions with exactly one unsigned immediate operand may have any value from 0 thru 4194303, inclusive, in that operand. Instructions with two operands, when these are unsigned immediates, may have any value from 0 thru 63, inclusive, in the first operand, and any value from 0 thru 65535, inclusive, in the second operand. Instructions with three operands, when these are unsigned immediates, may have any value from 0 thru 63, inclusive, in the first operand and any value from 0 thru 255 in each of the other operands. When an immediate operand type for an instruction is signed, the unsigned range 0 to 63 is replaced by the signed range -32 to 31, 0 to 255 by -128 to 127, 0 to 65535 by -32768 to 32767, and 0 to 4194303 by -2097152 to 2097151. Operands of register type must be in the range 0 to 15, inclusive.

Therefore, instruction lines must match the following regular expression (regex): `^([a-z][a-z0-9]+)(\s+([-+]?[0-9]+))(\s+([-+]?[0-9]+))?(\s+([-+]?[0-9]+))?\s*(;.*)?$`

The last line in the file must end in a newline, unless the last line is all whitespace.
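The line rules above can be sketched as a small Python parser built on the regex given above. It is a hypothetical helper, not a reference assembler: it skips blank and comment lines, rejects mnemonics longer than 5 characters, and leaves per-instruction operand range checks (which depend on each opcode's argument types) to the caller.

```python
import re
from typing import List, Optional, Tuple

# The instruction-line regex above; re.ASCII makes \s match only ' \t\n\r\f\v'.
LINE_RE = re.compile(
    r"^([a-z][a-z0-9]+)(\s+([-+]?[0-9]+))(\s+([-+]?[0-9]+))?"
    r"(\s+([-+]?[0-9]+))?\s*(;.*)?$",
    re.ASCII,
)
WHITESPACE = " \t\n\r\f\v"


def parse_line(line: str) -> Optional[Tuple[str, List[int]]]:
    """Parse one line of Boot Assembly into (mnemonic, operands).

    Returns None for lines that are skipped (all whitespace, or comments).
    """
    if line.strip(WHITESPACE) == "" or line.startswith(";"):
        return None
    match = LINE_RE.match(line.rstrip(WHITESPACE))  # trailing whitespace ignored
    if match is None:
        raise ValueError(f"malformed instruction line: {line!r}")
    mnemonic = match.group(1)
    if len(mnemonic) > 5:
        raise ValueError(f"mnemonic longer than 5 characters: {mnemonic!r}")
    operands = [int(g) for g in match.group(3, 5, 7) if g is not None]
    return mnemonic, operands
```

For example, `parse_line("add 1 2 3 ; sum")` returns `("add", [1, 2, 3])`.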

# Reserved and implementation-defined items

Another category is items which are reserved for future use. These items are reserved for use in future versions of Boot itself, and should not be used by either extensions or by implementations.

Implementations must not define or use items which are reserved for extensions, or items which are reserved for future use; if they do so, they risk incompatibility with extensions or future Boot versions. Extension languages must not define or use items which are defined in Boot to be implementation-dependent.

## Reserved instruction encoding space

The limitation of op0 to have zeros in the two most-significant bits is intended to allow Boot to be made a part of other instruction formats which use zero values in these bits to indicate a Boot instruction, and non-zero values to indicate something else (for example, instructions of different lengths). That is to say, instructions with a 1 in either of the two most-significant bits in op0 are reserved for extensions.

# Misc

Instruction mnemonics are at most 5 characters, and are all lowercase alphanumeric. Mnemonics begin with an alphabetic character and are followed by one or more alphanumeric characters.

Note that the opcodes (see table below) have the following properties:

opcodes between 0-1, inclusive, have 22-bit immediates spanning op0, op1, op2, and are the only such opcodes
opcodes between 2-4, inclusive, have 16-bit immediates spanning op1, op2, and are the only such opcodes
opcodes between 5-44, inclusive, have an 8-bit immediate in op2, and are the only such opcodes (excluding mallo and mfree which have _ (forced 0) in op2)
opcodes between 6-31, inclusive, have a signed 8-bit immediate in op2, and are the only such opcodes
opcodes between 6-14, and 32-36, inclusive, are control flow, and are the only such opcodes
opcodes between 37-40, inclusive, are I/O, and are the only such opcodes
opcodes between 15-26, inclusive, are load/stores, and are the only opcodes that directly access memory (although there are others, such as mallo, which call system subroutines that probably access memory)
opcodes between 45-48, inclusive, have a conditional in op2, and are the only such opcodes
opcodes between 45-58, inclusive, have an integer register in op2, and are the only such opcodes

Note that later additions that assign instruction(s) to the RESERVED opcode may break the 'only such' parts of these properties.

The short descriptions under Instructions, above, have been kept to at most 80 characters per line.

# Opcodes and argument types

Type identifiers in the following table:

i22, i16, i8: signed immediate of the specified bitwidth
u22, u16, u8, u6: unsigned immediate of the specified bitwidth
ri, rp, ra: register specifier for int32, ptr, anytype bank, respectively
_: must be 0

Opcode and argument type table:

0: jrel ('i22',)
1: jmp ('u22',)
2: lentr ('rp', 'i16',)
3: lm ('ri', 'u16',)
4: sam ('ri', 'u16',)
5: ann ('u8', 'u8', 'u8',)
6: beq ('ri', 'ri', 'i8',)
7: beqp ('rp', 'rp', 'i8',)
8: bne ('ri', 'ri', 'i8',)
9: bnep ('rp', 'rp', 'i8',)
10: blt ('ri', 'ri', 'i8',)
11: bltu ('ri', 'ri', 'i8',)
12: halt ('ri', 'ri', 'i8',)
13: xreti ('rp', 'ri', 'i8',)
14: xretp ('rp', 'rp', 'i8',)
15: lw ('ri', 'rp', 'i8',)
16: lh ('ri', 'rp', 'i8',)
17: lhu ('ri', 'rp', 'i8',)
18: lb ('ri', 'rp', 'i8',)
19: lbu ('ri', 'rp', 'i8',)
20: lp ('rp', 'rp', 'i8',)
21: la ('ra', 'ra', 'i8',)
22: sw ('rp', 'ri', 'i8',)
23: sh ('rp', 'ri', 'i8',)
24: sb ('rp', 'ri', 'i8',)
25: sp ('rp', 'rp', 'i8',)
26: sa ('ra', 'ra', 'i8',)
27: addm ('ri', 'ri', 'i8',)
28: appm ('rp', 'rp', 'i8',)
29: ap32m ('rp', 'rp', 'i8',)
30: ap16m ('rp', 'rp', 'i8',)
31: apm ('rp', 'rp', 'i8',)
32: xcall ('rp', 'u8', 'u8',)
33: xentr ('_', 'u8', 'u8',)
34: xlib ('u6', 'u8', 'u8',)
35: xaftr ('_', 'u8', 'u8',)
36: xret0 ('rp', '_', '_',)
37: in ('ri', 'ri', 'u8',)
38: out ('ri', 'ri', 'u8',)
39: inp ('rp', 'ri', 'u8',)
40: outp ('ri', 'rp', 'u8',)
41: shl ('ri', 'ri', 'u8',)
42: shru ('ri', 'ri', 'u8',)
43: shrs ('ri', 'ri', 'u8',)
44: sinfo ('ri', 'u8', 'u8',)
45: cp ('ri', 'ri', 'ri',)
46: cpp ('rp', 'rp', 'ri',)
47: cpa ('ra', 'ra', 'ri',)
48: not ('ri', 'ri', 'ri',)
49: app ('rp', 'rp', 'ri',)
50: ap32 ('rp', 'rp', 'ri',)
51: ap16 ('rp', 'rp', 'ri',)
52: ap ('rp', 'rp', 'ri',)
53: add ('ri', 'ri', 'ri',)
54: sub ('ri', 'ri', 'ri',)
55: mul ('ri', 'ri', 'ri',)
56: and ('ri', 'ri', 'ri',)
57: or ('ri', 'ri', 'ri',)
58: xor ('ri', 'ri', 'ri',)
59: mallo ('rp', 'ri', '_',)
60: mfree ('rp', '_', '_',)
61: break ('u8', 'u8', 'u8',)
62: impl ('u8', 'u8', 'u8',)
63: RESERVED ()

# TODO

update Opcode and argument type table for recent changes
rethink the choice of register numbers for the Boot calling convention (and rethink which of the 4 first registers should be the SMALLSTACK pointer in loot)
todo finish the writeup in ootAssemblyNotes27
make the immediates imm3, so that they'll fit the 16-bit encoding
start copying in from = General architecture from other file
search for TODOs above
maybe xret0 could take an $ncond?
mb get rid of inp, outp, ap16, ap16m?
or.. mb add ina, outa? (no, i'd rather get rid of inp, outp)
without a way to read the PC, can't manually push a return address onto a stack for calling; so reduced to only using xcall, which can only call xentr's. Is this an issue? Maybe... we wanted to preserve xcall for EXTERNAL platform calls (or call to externally visible entry points in Boot code), we want to allow the code to call itself normally. Otoh maybe it's fine for now. This makes this even less of an ASSEMBLY language however. Would like to add CALL and RET in this case but no room. Mb remove ap16, ap16m?
if making CALL, RET, then need a stack for them. Make pointer register 3 unusable/reserved for system use? What about inside xcalls? Or specify that we are using register 3? How would CALL, RET be compiled to JVM -- would we make CALL-subroutines and translate them to methods, just like XCALLs, or would we make a gigantic switch statement, sort of like an interpreter?
mb remove HALT and just have xret instead? and mb xret to 0 is halt, or something like that. Hmm, some platforms will have some cleanup stuff to do upon HALT, and we don't want to make xret less efficient by having to check if the return addr is zero every time; and if we started the program by 'calling' it from a virtual/sentinel address (which would be found in the link register, &4, at the beginning of the program), then letting the program xret to that address to halt, now we have to introduce the inefficiency of checking the xret address on each xret call. Not sure if that inefficiency is material, though.
how do we do halfwords if the platform natively stores words in one storage location? Can halfwords still overlap with words? Is it possible to write to both the high and low halfword and then read the word (and each halfword) and see the expected results? I think not; I think whether there is overlap or not is implementation-dependent (and so is undefined behavior to rely upon? Nah, surely you are allowed to scan through memory at any stride -- it's just that the values you see are in that case are not deterministic with what was written, in the overlap sense). Must document.
todo: should you be able to load ints into ptr regs, or should you be able to load ptrs into int regs, when doing blind copying?
ptr regs are bigger
but it might be nice to check at load time if a memory location is definitely a ptr, before even allowing it into a ptr register
however by that reasoning, maybe you would want to choose the pointer regs for blind copying anyways, b/c otherwise when you 'launder' pointers through the int32 regs during a blind copy, they would lose their 'pointer' attribute when written back out
mb introduce yet another register type?!?
actually i think this is the best idea. For systems with locations in terms of bytes, this wont even be much memory (16 bytes is only 4 int32s)
rename anytype to byte? but right now they can hold other HLL constructs too..
should i add the jy instruction, with the proviso that it must only be used intraprocedure? would it be hard to convert to a switch statement on JVM?
probably, and probably also:
rename lentr back to lcm; specify (if not already specified) that valid xcall targets must have an xentr
specify that the target of a jy instruction must not be in a different 'compilation unit', which is implementation-defined; a jy, all lcms that might feed into it, and all of their targets, must all be available in one compilation unit at compile time
reiterate that jy, like other jumps, must not jump between xcall subroutines
mb, along with CALL, we should split codeptr/ptrc into two data types, 'Boot code ptr' and 'external code ptr', because mb external code ptrs are a different format than Boot code ptrs. Eg external code ptrs could be a native ptr (into a Harvard memory), and a Boot code ptr could just be an integer describing an offset from the start of code (that a Boot interpreter uses, as opposed to a compiler, where the Boot code pointers will probably just be external code ptrs).
two questions:
should we make a separate register bank for codeptrs? Ot1h, this makes it easier for implementations to verify that no arithmetic is done on codeptrs. Otoh, we do probably need a few of these, so it adds state; also, an implementation can get around the no-arithmetic-on-codeptrs thing just by writing the codeptr to memory and reading it back as something else (this is undefined, but so is arithmetic on codeptrs anyhow). An alternative would be to say that arithmetic can only be performed on the first 8 ptrs (note that later we want to put SMALLSTACK in there). However, this alternative doesn't help out Harvard architectures, like a new register bank would. The only really really safe thing is to not allow codeptrs to be read/written from general memory, e.g. Java has return addrs on the stack and that's it? But don't they allow method handles to be written as data? i guess java keeps track of the type of stuff in memory so it's not a problem either way. Todo check out what they do there
should we define CALL-subroutines as code between a target of a CALL instruction, and the next such, and require these blocks of code to be like 'xentr-subroutines' in the sense that you can't jump between them except via CALL? This would allow CALL-subroutines to be compiled to functions/methods in e.g. JVM, rather than having a sort of switch statement and compiling return addresses on a return stack to constant integers, then dispatching through a SWITCH statement to actually go to them.
if wacky control flow, then no need for x* interop instrucs here; move to BootX; also move calling conventions there
make section on BootX noting that it defines interop, library and system calling, calling convention, file ops, and that ppl should use parts of that rather than reinventing the wheel
should I/O really be spec'd as blocking? esp. input. For now i removed "I/O is blocking"
should OUT be allowed to return 0 in $3, perhaps as part of non-blocking I/O? right now i say no; think about this more
consider eliminating IN and OUT and just having read and write syscalls
better define stuff like '32-bit offset in instruction stream' seen in jrl and l32m by defining #imm32 elsewhere
consider moving all I/O into syscalls
mb platformspecific int bitwidth after all (and bitwidth, and intmax, in sysinfo) (and not specifying the result of overflow, rather than specifying wraparound?). With minimum of 16 bits specified.
maybe we should specify that one of our regs is designated 'stack pointer'
l32m could use its operands to specify: which register to load into, the number of bytes that the immediate has, the displacement where the immediate is to be found (as a register operand). Having a number of bytes would allow only having to use 1 or 2 bytes for small immediates, and having a displacement which is a register operand would allow dynamic indexing into nearby arrays of immediates. The number of bytes could be powers of 2, with 0, so 0, 1, 2, 4, 8 (the other 3 possibilities are unused). If the number of bytes is zero, then the displacement is instead a 3-bit immediate, and there is nothing else in the instruction stream. This also allows 'scaling up' to 64-bit immediates on systems that support that.
one issue with 'immediate arrays' is that, in order to find the next instruction after the immediate pool in the instruction stream, we need to specify the length of the array. Maybe the unused three possibilities could specify this (immediate arrays with 8 items of length 1, or 4 items of length 4, or 8 items of length 8, or something like that).
since l32m is already loading immediates in the instruction stream, maybe have an absolute jump that loads an immediate from the instruction stream?
l32m replace by: l32m dest offset sz means dest = load (sz bytes) from (PC+ offset) ?

later todos (transfer to other file):

unsigned integer arithmetic or conversions?
verify that all instructions fit in the compact encoding with the given number of operands
it's annoying that in Boot Assembly, lm and sam always interpret the immediate constant literal as unsigned; when you want to load a signed constant, you have to do the conversion to and from unsigned (when you are writing or reading Boot Assembly) yourself. But if we change this to allow both, it complicates implementation, because then there is no guarantee that you can roundtrip between a Boot assembler and a Boot disassembler. Otoh, maybe we don't care about roundtripping, because it isn't that hard to write a Boot assembler, and anyways, you can run the assembler/disassembler on another system; the only thing that the porter needs to implement on the target platform is the VM itself. On the third hand, it doesn't help much to allow both, because the disassembler still doesn't know which one is desired in any particular instruction, so it would only help for writing Boot assembly, not for reading it, and imo reading is where you'd really want it anyhow. The disassembler could always output comments on those lines showing the signed values.
normalize the position of memory addresses and their offsets, eg in STORE, put the address in the same place it was in LOAD (in accordance with the 2nd bullet point in Ugly in [1]). Look to see if there are any other memory address calculation instructions in here.
use 'int' instead of 'int32'; there is no crashing on overflow, size is at least 32 bits, and the lower 32 bits of the result are correct