proj-oot-bootExtendedReferenceOld200622

Boot Extended (Oot BootX) reference

:hardbreaks: :toc: macro :toclevels: 0

Version: unreleased/in-progress/under development (0.0.0-0)

BootX extends Boot with more instructions (for a total of about 60), as well as concurrency semantics. Many of the new instructions are functions that could be written as subroutines in Boot, but some of them are OS or VM services.

This document extends the document boot_reference.adoc.

Instruction encoding

The instruction encoding is mixed-length.

Encodings are little-endian on disk, over the network, or when presented in serialized form, and are typically stored in host byte order in memory.
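As a small illustration of the byte-order rule above, the following sketch serializes a 16-bit instruction word to its little-endian serialized form and reads it back. The function names and the example word are hypothetical; only the little-endian convention comes from the text.

```python
import struct

def serialize_instr16(word):
    """Pack a 16-bit instruction word into its serialized (little-endian) form."""
    return struct.pack("<H", word)

def deserialize_instr16(data):
    """Unpack two serialized bytes back into a host-order integer."""
    return struct.unpack("<H", data)[0]

# Hypothetical instruction word, chosen only to make the byte order visible.
encoded = serialize_instr16(0x1234)
```

The least-significant byte comes first in the serialized form, whatever byte order the host uses in memory.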

The least-significant bits of the instruction indicate the instruction's width in the following manner:

The format and semantics of the 16-bit instructions are identical to Boot.

The encoding format of the 8-bit instructions is (from LSBit to MSBit):

Opcodes 0 thru 31 (inclusive) are available for implementation-dependent or user use. The last 32 opcodes (32 thru 63) assigned to the 8-bit instruction opcodes are RESERVED. Our intention is to wait until we have a number of BootX programs to analyze, and then assign the 32 RESERVED 8-bit instruction opcodes to be 'shortcuts' for common sequences of one or more 16- or 32-bit instructions (dictionary compression).
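One way the planned analysis might look is sketched below: count every short run of consecutive instructions in a program and keep the most frequent ones as shortcut candidates. The function name, the mnemonic-list program representation, and ranking by raw frequency are all assumptions made for illustration; the text does not specify a selection method.

```python
from collections import Counter

def most_common_sequences(program, max_len=3, slots=32):
    """Count every run of 1..max_len consecutive instructions; return the most frequent."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(program) - n + 1):
            counts[tuple(program[i:i + n])] += 1
    return [seq for seq, _ in counts.most_common(slots)]

# Hypothetical instruction stream, written as mnemonics for readability.
program = ["ld", "add-int", "st", "ld", "add-int", "st", "jrel"]
shortcuts = most_common_sequences(program, max_len=3, slots=4)
```

A real selection would probably weigh the bytes saved per replacement and not just the raw count, since a longer sequence saves more space each time it is replaced.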

The encoding format of the 32-bit instructions is (from LSBit to MSBit):

(todo; maybe op0 should be least-significant, since in the instructions, it's op0 which tends to be an input, so we want this to be loaded first)

The instruction 'ldi-i16' could be considered to have its own encoding format; 5 bits of the operands are used to determine an output target in a unique way.

When operands are used as 'subopcodes' using the INSTR- instructions, then the address mode bits and the operand_value bits are combined for the subopcode (so these subopcodes are 7 bits, not 5).

N-bit instructions must be N-bit aligned. Branch and Jump targets are in units of N-bits. So, for example, 8-bit instructions must be 8-bit aligned, 16-bit instructions must be 16-bit aligned, 16-bit instruction branch and jump targets count in units of 16-bits, 32-bit instruction branch and jump targets count in units of 32-bits.
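The alignment and target-unit rules can be sketched as two small helpers. Both function names are hypothetical, and the sketch assumes byte-addressed storage for the instruction stream.

```python
def target_byte_offset(instr_width_bits, target_units):
    """Convert a branch/jump target, counted in instruction-width units, to a byte offset."""
    return target_units * (instr_width_bits // 8)

def is_aligned(byte_offset, instr_width_bits):
    """True if an instruction of the given width may start at byte_offset."""
    return byte_offset % (instr_width_bits // 8) == 0
```

For example, a target of 3 means byte offset 6 for a 16-bit instruction but byte offset 12 for a 32-bit instruction.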

Registers

Registers are as in Boot, except that in displacement or indexed addressing modes, a base register of 0 refers not to the zero register, but rather to the PC.

TODO: in BootX, the registers can each hold a 64-bit quantity, or a floating-point quantity -- however, coercing these without conversion has undefined results

TODO: change (in Boot!) to separate reg banks for ints and ptrs

Addressing modes

An "addressing mode" tells how to interpret an "operand_value" to specify how to obtain an input for an instruction, or how to place an output.

There are 8 addressing modes. From 0 to 7, they are:

Sometimes an instruction input value (computed by interpreting an operand_value with an addressing mode) is called an 'effective value'. Sometimes an address (computed by interpreting an operand_value with an addressing mode) in which an instruction output is to be placed is called an 'effective address'; 'address' is used here loosely since registers and smallstack are included in potential 'effective addresses'.

In the following, the bits of an 'operand_value' are interpreted as an unsigned integer.

Here's what each addressing mode means. First we'll give an English-language summary, and then a formal definition in pseudocode.

If an effective address is given but not an effective value, then the effective value is the value found at the effective address.

In the above, when the operand_value is interpreted as a register but the register is number 2, then, if that register would be written to, then instead smallstack is pushed, and if that register would be read from, instead smallstack is popped.

Now we'll define the addressing modes more formally. A pseudocode notation based on Python is used. 'addressing_mode' is an integer between 0 and 7, inclusive, and operand_value is an unsigned (non-negative) integer.

....

# for when the operand is used as an input
def get(addressing_mode, operand_value):
  if addressing_mode == IMMEDIATE_ADDR_MODE:
      return operand_value
  if addressing_mode == REGISTER_DIRECT_ADDR_MODE:
      if operand_value == 2:
        return pop_smallstack()
      else:
        return registers[operand_value]
  if addressing_mode == REGISTER_INDIRECT_ADDR_MODE:
      if operand_value == 2:
        effective_address = pop_smallstack()
      else:
        effective_address = registers[operand_value]
      
      return memory[effective_address]
  if addressing_mode == STACK_ADDR_MODE:
      return smallstack_as_array[operand_value]
  if addressing_mode == CONSTANT_ADDR_MODE:
      return constants[operand_value];
  if addressing_mode == DISPLACEMENT_ADDR_MODE:
      base_address_register = operand_value >> 2;
      displacement_register = operand_value & 3;
      if displacement_register == 2:
        displacement = pop_smallstack()
      else:
        displacement = registers[displacement_register]
      if base_address_register == 2:
        base_address = pop_smallstack()
      else:
        base_address = registers[base_address_register]
      
      effective_address = base_address + displacement
      return memory[effective_address]
  if addressing_mode == INDEXED_ADDR_MODE:
      base_address_register = operand_value >> 2;
      index_register = (operand_value & 3) + 2;
      if index_register == 2:
        index = pop_smallstack()
      else:
        index = registers[index_register]
      if base_address_register == 2:
        base_address = pop_smallstack()
      else:
        base_address = registers[base_address_register]
      effective_address = base_address + index
      return memory[effective_address]
  if addressing_mode == PREDECREMENT_POSTINCREMENT_ADDR_MODE:
      if operand_value == 2:
        # todo what should we do here?
        return pop_smallstack()
      else:
        result = memory[registers[operand_value]]
        registers[operand_value] = registers[operand_value] + 1
        return result

# for when the operand is used as an output
def put(addressing_mode, operand_value, value):
  if addressing_mode == IMMEDIATE_ADDR_MODE:
      do_special_io_output(operand_value, value)
  if addressing_mode == REGISTER_DIRECT_ADDR_MODE:
      if operand_value == 2:
        push_smallstack(value)
      else:
        registers[operand_value] = value
  if addressing_mode == REGISTER_INDIRECT_ADDR_MODE:
      if operand_value == 2:
        effective_address = pop_smallstack()
      else:
        effective_address = registers[operand_value]
      
      memory[effective_address] = value
  if addressing_mode == STACK_ADDR_MODE:
      smallstack_as_array[operand_value] = value
  if addressing_mode == CONSTANT_ADDR_MODE:
      memory[constants[operand_value]] = value;
  if addressing_mode == DISPLACEMENT_ADDR_MODE:
      base_address_register = operand_value >> 2;
      displacement_register = operand_value & 3;
      if displacement_register == 2:
        displacement = pop_smallstack()
      else:
        displacement = registers[displacement_register]
      if base_address_register == 2:
        base_address = pop_smallstack()
      else:
        base_address = registers[base_address_register]
      
      effective_address = base_address + displacement
      memory[effective_address] = value
  if addressing_mode == INDEXED_ADDR_MODE:
      base_address_register = operand_value >> 2;
      index_register = (operand_value & 3) + 2;
      if index_register == 2:
        index = pop_smallstack()
      else:
        index = registers[index_register]
      if base_address_register == 2:
        base_address = pop_smallstack()
      else:
        base_address = registers[base_address_register]
      effective_address = base_address + index
      memory[effective_address] = value
  if addressing_mode == PREDECREMENT_POSTINCREMENT_ADDR_MODE:
      if operand_value == 2:
        # TODO what should this be?
        push_smallstack(value)
      else:
        registers[operand_value] = registers[operand_value] - 1
        memory[registers[operand_value]] = value
....

Note that the use of addressing modes sometimes causes side-effects (that is, persistent changes or 'mutations' of system state in addition to the actual production of the input value or placement of the output value).

When there are addressing mode side-effects for inputs to an instruction, inputs are processed in the following order: first op0, then op1, then op2. All inputs are processed before any outputs. When there are addressing mode side-effects for multiple outputs to an instruction or when the same operand is both an input and an output, the ordering and multiplicity of addressing mode side-effects may vary depending on the instruction; refer to the instruction documentation.

If not otherwise stated, any non-addressing-mode side-effects of an instruction (for example, pushing a result onto smallstack) are applied before any addressing mode output side-effects.

For conditional instructions, side-effects are applied to all input operands regardless of whether the condition is satisfied.
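The input-ordering rule matters most when two inputs both name register 2, because each read pops smallstack. The following minimal sketch models only register-direct reads; the `Machine` class, its method names, and the numeric value chosen for `REGISTER_DIRECT` are assumptions for illustration.

```python
REGISTER_DIRECT = 1  # hypothetical numeric value, used only by this sketch

class Machine:
    def __init__(self):
        self.registers = [0] * 16
        self.smallstack = []

    def get(self, mode, operand_value):
        # Only register-direct mode is modelled in this sketch.
        assert mode == REGISTER_DIRECT
        if operand_value == 2:
            return self.smallstack.pop()  # reading register 2 pops smallstack
        return self.registers[operand_value]

    def read_inputs(self, operands):
        """Evaluate input operands strictly in the order op0, op1, op2."""
        return [self.get(mode, value) for mode, value in operands]

m = Machine()
m.smallstack = [10, 20]  # 20 is on top
inputs = m.read_inputs([(REGISTER_DIRECT, 2), (REGISTER_DIRECT, 2)])
```

Here op0 receives the former top of stack (20) and op1 receives the item beneath it (10), because op0 is processed first.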

Immediate addressing mode device outputs

Here is where the result of an instruction is sent when the output operand is in IMMEDIATE mode:

(todo: is it a good idea to intermix 'real' stuff like STDOUT with debugging stuff like STDERR? this prevents compilers from easily removing all of the debugging output instructions)

For byte-oriented outputs, sending a value between 0 and 255 produces one byte of output. Sending a value less than 0 or greater than 255 results in either a critical error or in zero or more bytes of output being generated; whether or not there is an error, how many bytes of output there are, and what the contents of these bytes are is implementation-dependent.
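A minimal sketch of one permitted behavior for a byte-oriented output follows. The function name is hypothetical, and raising an exception for out-of-range values is just one of the implementation-dependent choices the text allows (here standing in for a critical error).

```python
import io

def send_byte(stream, value):
    """Write one byte for values 0..255; otherwise raise (one permitted choice)."""
    if 0 <= value <= 255:
        stream.write(bytes([value]))
    else:
        # Assumption: this sketch picks the 'critical error' option.
        raise ValueError("value out of byte range: %d" % value)

out = io.BytesIO()
send_byte(out, 72)
send_byte(out, 105)
```

Another implementation could instead emit zero or more bytes for an out-of-range value, so portable programs should stay within 0 to 255.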

Opcode addressing modes

The following opcodes are mapped to ordinary instructions, with the mapping given in the tables below:

This gives us a total of 128 three-operand instructions.

When the addressing mode is CONSTANT, or the operand_value is less than 16, and the addressing mode is register direct, register indirect, stack, displacement, indexed, or predecrement_postincrement, this is treated as a function call. The opcode is processed as an ordinary operand read according to address mode, and the result is assumed to be a pointer to a function. The return address (the codepointer pointing to the instruction after the function call) is pushed onto SMALLSTACK. Then, the other three operands of the function call instruction are processed as input operands and the values read are pushed onto SMALLSTACK in order of op2, op1, op0 (so the top-of-SMALLSTACK now contains op0). Then, a jump is executed to the function pointer.
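The stack layout produced by such a function call can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes the three operand values have already been read as inputs.

```python
def call_function(smallstack, function_pointer, return_address, op0, op1, op2):
    """Push the return address, then op2, op1, op0, and return the new PC."""
    smallstack.append(return_address)
    for value in (op2, op1, op0):
        smallstack.append(value)
    return function_pointer  # the interpreter would jump here

stack = []
new_pc = call_function(stack, function_pointer=0x40, return_address=0x12,
                       op0=1, op1=2, op2=3)
```

After the call, op0 is on top of SMALLSTACK and the return address sits beneath the three arguments.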

'constant' type arguments

When an instruction takes a 'constant' argument (type 'c' in the tables below), the addressing mode bits are not interpreted as an addressing mode, and instead are concatenated with the operand_value bits to form an immediate constant (the addressing mode bits are the most-significant bits). Therefore these immediate constants are 7 bits, not just 4.
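The concatenation can be sketched in one line. The function name is hypothetical; the 3-bit and 4-bit field widths follow from the 8 addressing modes and the 7-bit total stated above.

```python
def constant_argument(addr_mode_bits, operand_value_bits):
    """Concatenate 3 addressing-mode bits (high) with 4 operand_value bits (low)."""
    assert 0 <= addr_mode_bits < 8 and 0 <= operand_value_bits < 16
    return (addr_mode_bits << 4) | operand_value_bits
```

For example, addressing mode bits 0b101 and operand_value bits 0b0011 yield the 7-bit constant 0b1010011.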

TODO: get rid of stack-aware operands; make predec/postinc addr mode register 2 push/pop from smallstack; leave register direct register 2 RESERVED for now.

Instructions

The 32 three-operand instructions in opcode addressing mode IMMEDIATE:

0. annotate: c c c (can be ignored)
1. INSTR-TWO-ARITH2: c ? ? (used to encode two-operand instructions)
2. INSTR-TWO-BOOTX-1: c ? ? (used to encode two-operand instructions)
3. INSTR-TWO-BOOTX-2: c ? ? (used to encode two-operand instructions)
4. jrel: c c c (unconditional relative jump by a constant signed amount)
5. ldi: c c oi (load immediate 14-bit int)
6. ldabs: c c o (2-operand absolute instruction format; load from absolute 14-bit memory location; negative locations are implementation-dependent, positive locations are slots from memory data segment start (if any))
7. stabs: c c i (2-operand absolute instruction format; store to absolute 14-bit memory location; negative locations are implementation-dependent, positive locations are slots from memory data segment start (if any))
8. ld: c o ip (load from memory address plus unsigned constant)
9. st: c ip i (store from register to memory address plus unsigned constant)
10. bne-int: c ii ii (branch-if-not-equal on ints)
11. ble-int: c ii ii (branch if less-than-or-equal on ints)
12. bne-ptr: c ip ip (branch-if-not-equal on pointers)
13. ble-ptr: c ip ip (branch if less-than-or-equal on pointers)
14. addi-int: c io ii (in-place addition of ints and immediate constant)
15. addi-ptr-int: c iop ii (in-place addition of ints and immediate constant to ptr)
16. sll: c io ii (shift left logical (multiplication by 2^c (mod MAX_INT+1)))
17. srl: c io ii (shift right logical (division by 2^c, rounding towards zero))
18. sra: c io ii (shift right arithmetic (division by 2^c, rounding towards negative infinity))
19. add-int: oi ii ii (addition of ints)
20. add-ptr-int: op ip ii (add an int to a pointer)
21. bitor: oi ii ii (bitwise OR)
22. bitand: oi ii ii (bitwise AND)
23. bitxor: oi ii ii (bitwise XOR)
24. sub-ptr: op ip ip (subtraction of pointers)
25. mul-int: oi ii ii (integer multiply)
26. add-f: of if if (float addition)
27. csel: o i i (if top of SMALLSTACK is zero, then CPY op1 into op2, otherwise CPY op0 into op2)
28. pop-from-one-stack-push-to-another-multi: sio sio ii (operands are the two stacks, and the number of items to move between them) (note: if either stack is in memory, atomicity may not be supported; see SYSINFO, todo)
29.
30.
31. CAS: iop i i
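The three shift instructions differ only in how they treat the high bits. The sketch below models them on a fixed-width word; the 16-bit `WORD_BITS` value, the helper `to_signed`, and the reading of sll as reduction modulo 2^WORD_BITS are assumptions, since the native int width is platform-dependent.

```python
WORD_BITS = 16                 # assumed word size for this sketch only
MASK = (1 << WORD_BITS) - 1

def to_signed(x):
    """Reinterpret a WORD_BITS-wide bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << WORD_BITS) if x >> (WORD_BITS - 1) else x

def sll(x, c):
    return (x << c) & MASK             # multiply by 2^c, modulo 2^WORD_BITS

def srl(x, c):
    return (x & MASK) >> c             # unsigned divide by 2^c

def sra(x, c):
    return (to_signed(x) >> c) & MASK  # signed divide by 2^c, rounding toward -infinity
```

For instance, the bit pattern for -3 shifted right arithmetically by 1 gives the pattern for -2, while a logical shift of the same pattern gives a large positive value.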

(todo: the instruction ordering rules haven't been enforced beyond this point)

The 16 three-operand instructions in opcode addressing mode REGISTER_DIRECT:

16. mul-f: of if if (float division)
17. eq-i: oi ii ii (test for equality, resulting in 0 or 1)
18. neq-i: oi ii ii
19. leq-i: oi ii ii
20. le-i: oi ii ii
21. eq-f: oi if if
22. neq-f: oi if if
23. leq-f: oi if if
24. le-f: oi if if
25. div-f: of if if (float division)
26. min-f: of if if (float min)
27. max-f: of if if (float max)
28. mul-f: of if if (float multiplication)
29. copysign-f: of if if (float op1's sign is changed to op2's sign)
30. negcopysign-f: of if if (float op1's sign is changed to the negation of op2's sign)
31. xorsign-f: of if if (float op1's sign is changed to xor of op1's and op2's signs)
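The three sign-manipulation instructions can be sketched with `math.copysign`. The function names mirror the mnemonics but are otherwise hypothetical, and `a` and `b` stand for the values read from op1 and op2.

```python
import math

def copysign_f(a, b):
    """a with its sign replaced by b's sign."""
    return math.copysign(a, b)

def negcopysign_f(a, b):
    """a with its sign replaced by the negation of b's sign."""
    return math.copysign(a, -b)

def xorsign_f(a, b):
    """a with its sign replaced by the XOR of a's and b's signs."""
    return -a if math.copysign(1.0, b) < 0 else a
```

Testing the sign with `math.copysign(1.0, b)` instead of `b < 0` keeps the result correct when `b` is negative zero.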

The 16 three-operand instructions in opcode addressing mode REGISTER_INDIRECT:

16. loadc: c c o (load constant; the implementation of constant storage is up to the implementation, but one simple idea is to put them in the space before the start of the program)
17. jmp: c c c (jump to absolute address. Positive locations are bytes in the BootX instruction stream relative to program start. The interpretation of negative locations is implementation-dependent)
18. beq-int: c ii ii
19. blt-int: c ii ii
20. beq-ptr: c ii ii
21. blt-ptr: c ii ii
22. lev: op ii ii (load effective value; takes an addressing mode and an operand_value, and returns an effective value, possibly causing side-effects)
23. lea: op ii ii (load effective address; takes an addressing mode (other than immediate, register, stack, or constant) and an operand_value (other than 2), and returns an effective address, possibly causing side-effects)
24. ld-postincrement-scaled: o iop ii (load from op1, then add op0 to op1)
25. st-predecrement-scaled: iop ii i (add op1 to op2, then store op0 to the (new) op2)
26. ld-idx: o ip ii (load from (memory address plus integer))
27. st-idx: ip ii i (store to (memory address plus integer))
28. push-smallstack-idx-scaled: ip ii ii (push onto SMALLSTACK from (memory address plus index*scale))
29. pop-smallstack-idx-scaled: ip ii ii (pop from SMALLSTACK into (memory address plus index*scale))
30. mac-int: ioi ii ii (multiply-accumulate on integer: op2 = op2 + op1*op0)
31. mac-ptr: iop ii ii (multiply-accumulate on pointer: op2 = op2 + op1*op0)

The 16 three-operand instructions in opcode addressing mode STACK:

16. sub-f: of if if (float subtraction)
17. ld-multi: o ii ip (load multiple items starting at the pointer in op0 into a series of places starting with op2, where the number of items is op1) (this may or may not be atomic on a given platform; see SYSINFO, todo)
17. st-multi: op ii i (store multiple items starting at op0 into a series of memory locations starting with op2, where the number of items is op1) (this may or may not be atomic on a given platform; see SYSINFO, todo)
18. ldi-i16: c c c (load signed 16-bit immediate into indicated register or stack location (the most significant bit is register (0) (register addr mode) or smallstack (smallstack addr mode) (1), and the next most significant 4 bits are the register number or smallstack location; selecting register 2 (register/stack bit is 0, location is 2) indicates a push to smallstack; another way of looking at this is that op2 is donating 2 bits of its addr mode to the concatenated constant))
19. ccpy-nz: o i i (conditional copy; if op0 is non-zero, copy op1 to op2)
20. ccpy-z: o i i (conditional copy; if op0 is zero, copy op1 to op2)
21.
22. cadd-nz-int: ioi ii ii (conditional add; if op0 is non-zero, then op2 = op2 + op1)
23. cadd-z-int: ioi ii ii (conditional add; if op0 is zero, then op2 = op2 + op1)
24. cadd-nz-ptr: iop ii ii
25. cadd-z-ptr: iop ii ii
26. cneg-nz: oi ii ii (conditional negate; if op0 is non-zero, then op2 = -op1)
27. cneg-z: oi ii ii (conditional negate; if op0 is zero, then op2 = -op1)
28. cbitnot-nz: oi ii ii (conditional bitnot; if op0 is non-zero, then op2 = bitnot(op1))
29. cbitnot-z: oi ii ii (conditional bitnot; if op0 is zero, then op2 = bitnot(op1))
30. csel-pushsmallstack: i i i (conditional select and push to smallstack; if op0 is 0, then push op1 to smallstack; otherwise push op2)
31. csel-multi: o i ii (take 2 items from op1 in the same manner as st-multi, and conditionally select; if op0 is 0, then copy the first item to op2; otherwise copy the second item to op2)

The 16 three-operand instructions in opcode addressing mode DISPLACEMENT:

16. ld-idx2: o ip ii (multiply the index op0 by 2, then execute ld-idx)
17. ld-idx4: o ip ii (multiply the index op0 by 4, then execute ld-idx)
18. ld-idx8: o ip ii (multiply the index op0 by 8, then execute ld-idx)
19. st-idx2: ip ii i (multiply the index op1 by 2, then execute st-idx)
20. st-idx4: ip ii i (multiply the index op1 by 4, then execute st-idx)
21. st-idx8: ip ii i (multiply the index op1 by 8, then execute st-idx)
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.

The 16 three-operand instructions in opcode addressing mode INDEX:

16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31.

The 16 three-operand instructions in opcode addressing mode PREDECREMENT_POSTINCREMENT:

16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31.

The 16 two-operand INSTR-TWO-BOOTX-1 instruction opcodes and mnemonics and operand signatures are:

0. INSTR-ONE-BOOTX: c ? (used to encode one-operand instructions)
1.
2. pickk: c sio (pick c on stack)
3. rollk: c sio (roll c on stack)
4. csel-stack: sio ii (pop two items from stack op1; if op0 == 0, push the item that was popped first back onto op1, otherwise push the second item)
5. sysinfo (query system metadata): c o
6. pop: o sim
7. push: som i
8. cpy: o i (copy from register to register)
9. neg: oi ii (arithmetic negation)
10. bitnot: oi ii (bitwise negation)
11. neg-f: of if (arithmetic negation of float)
12. cvt-int-f: of ii
13. fclass-f: oi if (classify a floating point number; see eg FCLASS in RISC-V, eg FCLASS.S in section 8.9)
14.
15.

The 16 two-operand INSTR-TWO-BOOTX-2 instruction opcodes and mnemonics and operand signatures are:

0. sqrt-f
1. roundeven-f-int ('round' in wasm)
2. trunc-f-int
3. ceil-f-int
4. roundawayfrom0-f-int ('round' in C)
5. sqrt-f
6.
7.
8.
9.
10. malloc: op ii
11. mrealloc: iop ii
12. cret-nz: sio ii (if op0 is nonzero, pop CODEPTR from indicated stack, then jmp to it)
13. cret-z: sio ii (if op0 is zero, pop CODEPTR from indicated stack, then jmp to it)
14.
15.
16.

The 16 two-operand INSTR-TWO-BOOTX-3 instructions are:

. pushselif-eq: i i (pop two items from smallstack; if item0 == item1, push op0 to op2, otherwise, push op1 to op2)
5. pushselif-ne: i i (pop two items from smallstack; if item0 != item1, push op0 to op2, otherwise, push op1 to op2)
6. pushselif-le: i i (pop two items from smallstack; if item0 <= item1, push op0 to op2, otherwise, push op1 to op2)
7. pushselif-lt: i i (pop two items from smallstack; if item0 < item1, push op0 to op2, otherwise, push op1 to op2)
8.
9.
10.
11.
12.
13.
14.
15.

The 16 INSTR-ONE-BOOTX one-operand instruction opcodes and mnemonics and operand signatures are:

0. instr-syscall: c (used to encode zero-operand misc instructions)
1. instr-syscall2: c
2.
3.
4.
5.
6.
7.
8.
9.
10. ret: sio (pop codeptr from indicated stack, then jmp to it)
11.
12. jd: ipp (dynamic jump)
13.
14. mdealloc: ip
15.

The 16 SYSCALL2 zero-operand instructions are:

0.
1. exec
2. get
3. put
4. spawn
5. pctrl (process control, eg join/wait, kill, etc -- or should these each be separate?)
6. time
7. rand
8. environ
9. getpid
10. signal (?? not sure if we want to do it this way -- signal handler setup)
11. create
12. delete
13. floor-f-int
14.
15. divmod-int: (on SMALLSTACK; consume 2 items and push dividend, then push remainder)

The 16 SYSCALL zero-operand instructions are the same as in Boot.

Semantics

Concurrency semantics

In shared memory (memory allocated with malloc-shared), LOAD, STORE, CAS, and atomics (the operations labeled -atomic) are atomic within the chunk allocated by malloc in one allocation. All implicit loads and stores of single Boot words (ints, or pointers) are tear-free/atomic. Nothing else in shared memory is guaranteed to be atomic. Some or all of the -atomic operations may not be available on all implementations. Some implementations may implement some atomics, or even all memory accesses to shared memory, via a lock at the granularity of the entire region allocated by one call to malloc-shared.
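The lock-per-region strategy mentioned above could look roughly like the following sketch. The `SharedRegion` class, its method names, and the boolean return value of `cas` are assumptions for illustration; the text does not define the CAS result convention.

```python
import threading

class SharedRegion:
    """One allocation; every access takes the region-wide lock."""
    def __init__(self, size):
        self._words = [0] * size
        self._lock = threading.Lock()

    def load(self, index):
        with self._lock:
            return self._words[index]

    def store(self, index, value):
        with self._lock:
            self._words[index] = value

    def cas(self, index, expected, new):
        """Compare-and-swap: store new only if the current value equals expected."""
        with self._lock:
            if self._words[index] == expected:
                self._words[index] = new
                return True
            return False

region = SharedRegion(4)
region.store(0, 5)
```

Because one lock covers the whole region, accesses to different words in the same allocation serialize against each other; this trades throughput for simplicity.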

todos

probably won't do these b/c they can be emulated with existing instructions, and there isn't much opcode space here so these should go in OVM instead:

so with the current plan, in 32-bit, we'll get 16 more 3-operand instructions plus 8 more 2-operand ones. This will allow us to make a lot more things 3-operand, so may as well rewrite the whole thing. So we got tons of space; after just moving stuff around (and freeing up loadpc), we have 8 3-operand opcodes, 9 2-operand opcodes, 7 1-operand opcodes are open. Ideas:

with addition of:

note: the f64 instructions operate on 16 separate floating point registers

note: what about using half-precision (16-bit) floats instead? then we wouldn't need the separate registers. 64-bit floats ('doubles') could be available in OVM.

we could even just do unspecified 'native precision', but i'd prefer not to -- the point of native precision is to have a portable way to express computations, which may involve distances between pointers, and we don't know the pointer size so we can't know the integer size. but once we move outside of that, it's probably better to know how much precision you're dealing with.

but a problem with this is that i bet there are more architectures with f64 support than f16 support. In fact, it's worse than that; a quick check shows that ARM Cortex-M4F only supports f32 (single-precision floating point). Whereas Javascript and Lua and Python only support f64.

so, i think we'll stick with f64.

note: what about sign-extension and zero-extension of integers of various bitwidths? what about coerce-i16-i and coerce-i-i16? i guess those aren't very useful without knowing the bitwidth of the native ints (which a program could get from sysinfo and then have a switch statement based on the result, but it's probably simpler not to)

note: instead of all the 3-operand i16 and f64 arithmetic, perhaps we'd like more kinds of compare-and-branch instructions on native ints and ptrs?

new instructions:

==
arithmetic on native ints: divmod-int
stack ops: pickk, rollk (note: these obsolete the 1-operand dup, swap, over instructions)
64-bit floating point (optional): add-f64, mul-f64, div-f64, bne-f64, ble-f64, neg-f64, peek-f64, poke-f64, fclass-f64, cvt-f64-i16, cvt-i16-f64, coerce-f64-i16, coerce-i16-f64
16-bit integers: add-i16, mul-i16, cvt-i16-i, cvt-i-i16, neg-i16, peek-i16, poke-i16, bne-i16, ble-i16, divmod-i16
syscalls (optional): exec, delete, get, put, spawn, pctrl, time, rand, environ, getpid, signal
==

new critical errors: divide-by-zero, (??: conversion-out-of-range? should that really be a critical error?)

note: does float divide by zero cause a critical error, or does it just create Inf? Can you choose?

todo: specify that native ints are 2s complement, and also that you can give native ints as inputs to -i16 operations on native ints, and the effect is to take just the lowest-order 16 bits of the native int. You can also give -i16s as input to native int operations, and the effect is to (zero-extend or sign-extend?) the i16. This allows us to use bne-int to compare i16s (but watch out comparing an i16 to an int; it may compare as unequal even if the low-order bits are the same, if int has any high-order bits set). However, since the i16 has a sign bit where longer ints have a '32768' bit, ble-int won't work right on i16s; it will interpret i16 -32768 as int 32768. So, we want a ble-i16, but we have no room for that. Alternately, we could change the 'i16's to 'u16's; but then (a) we want a sub-i16, and we have no room in the 3-operands for that, and also (b) if the native ints happen to be 16-bit then ble-int won't work again. So, maybe we should say that we CANNOT use bne-int or ble-int on i16s (it's a type error), and that we must use cvt-i16-i first, which i guess sign-extends the i16 to an int. Is this a significant enough hit to i16 efficiency such that we should provide bne-i16 and ble-i16, and move mul-f64 and div-f64 (or something else) to make room? Should we consume our last 3-operand RESERVED? ok so far i consumed the RESERVED and moved divmod-int.
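The comparison problem described in this note can be made concrete with a short sketch. Both helper names are hypothetical, and a Python int stands in for a native int wider than 16 bits.

```python
def sign_extend_i16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def zero_extend_i16(x):
    """Interpret the low 16 bits of x as an unsigned value."""
    return x & 0xFFFF

minus_one = 0xFFFF  # bit pattern of i16 -1
```

With zero-extension, the pattern for i16 -1 becomes 65535, so a native less-than-or-equal comparison against 1 gives the wrong answer; with sign-extension it becomes -1 and the comparison behaves as expected.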

note: we clearly can't use most of the addr modes on float regs, should we use the bits differently? like, mb just offer register direct, immediate (which are still interpreted as signed integers), and constant, and allow 5 value bits instead of 4? Should we then have 32 f64 regs instead of 16 (eg RISC-V has 32)? Alternately, if the addr mode has any indirection, we treat this as a PEEK or a POKE to normal memory through the normal registers. I guess the latter is more consistent.

todo: instead of offering PEEK and POKE for -i16 and -f64, we could just provide bitwise truncation and sign- and zero-extension, and say, just use those with indirect addressing modes to load and store. That makes more sense since these loads and stores are to ordinary memory, right? Well, no, not with f64; how many native memory spots it will occupy is non-portable (it will be 1 on 64-bit machines and 4 on 16-bit machines). So, maybe do that with i16 but not f64?

todo: hmm aside from f64 and i16 we're looking pretty stable. But there are still some decisions to be made then regarding f64 and i16. Throwing in f64 and i16 adds some complexity, is it worth it? If so, i16 or u16? How do f64s get read from and stored into memory, and how many spaces do they take up? How about u16s? What happens if you attempt an int operation on an int16? Are there sign-extend and zero-extend ops for int16? How do you convert from ints to int16s and could this cause a critical error? How many f64 operations do we expose? Is f64 divide by 0 a critical error or does it produce inf, and can this be configured?

note: Will we be IEEE 754 compliant? I don't think so; it seems to me that IEEE-754-2008 may require SQRT and ABS and multiple rounding modes. Also, the RISC-V spec comments, "The C99 language standard effectively mandates the provision of a dynamic rounding mode register". Perhaps the RISC-V floating point operations would be the simplest way that would support the standard. I guess we could say we support it if OUR standard required various assembler intrinsics that computed in 'software' for what the VM itself doesn't do at runtime. I'd rather just keep BootX simple, though, and say that we don't support IEEE-754-2008, although we do provide a subset of the operations defined there. If we wanted to be as complex as RISC-V, why would we even create BootX at all? OVM will have more opcode space and can have all those other operations.

note: in fact, it seems that even Python doesn't support IEEE-754 out of the box: [1]. And Python is used a lot for numerical computing. My motto: if Python doesn't support some numerical thing, then we really don't need it (at least not at the OVM level; maybe Oot stdlib could have it).

note: we need another feature flag in SYSINFO to state whether or not instructions (aside from CAS) are still atomic when they access memory multiple times, eg CPY (r6) (r7), which copies a word from the address pointed to by r7 to the address pointed to by r6; another example is LD R6 (R7), which loads a word from the address pointed to by the address pointed to by R7 into R6 (double indirection, b/c LD itself is already indirect once, and indirect addr mode was used).

note: we may want to add instructions for bitwise set, clear, test, and toggle; note that bitwise test would be another 3-operand branching instruction. And we may want to add bitwise rotate left, bitwise rotate right, and a bitwise rotate-through-carry.

note: the 'float' below is a 'native float', just like the 'int' is a 'native int' and 'ptr' is a 'native pointer'. ints, pointers, and floats can all fit in one memory location. 'float' is guaranteed to have at least the precision and range of an IEEE half-precision (binary16) floating point number (eg integers between 0 and 2048 can be exactly represented; integers at least up to 65504 round to no more than a multiple of 32). On a 64-bit platform, pointers might be 64-bits and so C doubles might be used as 'floats'; on a 32-bit platform, C floats might be used as 'floats'. Note that, although both 'native ints' and 'native floats' must fit within one memory location, the bitwidth of our 'native ints' is not guaranteed to match that of our 'native floats'; e.g. on a platform such as Javascript in which the only number type is a 64-bit C double, C doubles might be used as 'floats' even though our 'ints' might be 32-bits (because 32 is the largest power of two that fits within the 53-bit significand precision of 64-bit double-precision floating point).
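The binary16 guarantees quoted in this note can be checked with Python's `struct` module, whose `e` format code is IEEE half precision. The helper name is hypothetical; the sketch only illustrates the minimum precision, not how any implementation stores floats.

```python
import struct

def round_to_binary16(x):
    """Round x to the nearest IEEE binary16 value and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Integers up to 2048 survive the round trip; 2049 does not.
exact = round_to_binary16(2048.0)
inexact = round_to_binary16(2049.0)

# Near the top of the range, representable values are multiples of 32.
coarse = round_to_binary16(65000.0)
```

The same kind of check applies to doubles: `float(2**53 + 1) == float(2**53)` is true in Python, which is the 53-bit significand limit the note refers to.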

The floating point instructions are optional.

We do not provide all floating point operations and modes required by IEEE 754-2008, but where provided, floating point operations match the behavior demanded by IEEE 754-2008. Regarding mode restrictions, we follow WebAssembly; quoting from them:

"

and

" When the result of any arithmetic operation other than neg, abs, or copysign is a NaN, the sign bit and the fraction field (which does not include the implicit leading digit of the significand) of the NaN are computed as follows:

If the fraction fields of all NaN inputs to the instruction all consist of 1 in the most significant bit and 0 in the remaining bits, or if there are no NaN inputs, the result is a NaN with a nondeterministic sign bit, 1 in the most significant bit of the fraction field, and all zeros in the remaining bits of the fraction field.

Otherwise the result is a NaN with a nondeterministic sign bit, 1 in the most significant bit of the fraction field, and nondeterministic values in the remaining bits of the fraction field. " -- [3]

and

" min and max operators treat -0.0 as being effectively less than 0.0.

In floating point comparisons, the operands are unordered if either operand is NaN, and ordered otherwise. " -- [4]

note: 'unordered' means that eq, lt, le/leq, gt, ge/geq return false, and ne/neq returns true (matching the behavior of [5])
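Python floats follow the same unordered-comparison rule, so the behavior can be demonstrated directly. The `min_f` helper below is a hypothetical sketch of the "-0.0 is less than 0.0" rule only; it does not model NaN inputs.

```python
import math

nan = float("nan")

# Every ordered comparison involving NaN is false; only != is true.
comparisons = {
    "eq": nan == 1.0,
    "lt": nan < 1.0,
    "le": nan <= 1.0,
    "gt": nan > 1.0,
    "ge": nan >= 1.0,
    "ne": nan != 1.0,
}

def min_f(a, b):
    """Minimum that treats -0.0 as less than 0.0 (NaN handling is not modelled)."""
    if a == b == 0.0:
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b
```

The special case is needed because `-0.0 == 0.0` is true under ordinary comparison, so a plain `a < b` test cannot tell the two zeros apart.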

note: as of now we don't have round, min, or max, so some of the above is not applicable; however i've kept it in in case we add those later, or for me to copy-and-paste into OVM's spec.

---


golang's calling convention?

" Go has it's own ABI calling convention:

"

"

---

mb 'dropc: c sio (drop c items from stack)'? but that's only useful on SMALLSTACK, b/c o/w just use ADD on the stack pointer with an immediate-mode operand

---

Or we could have 3-operand destructive compare-and-conditionals, eg CINC-LE x a b: if a <= b then x=x+1

and/or a 3-operand CMOVnz: CMOVnz dest src thing_that_might_be_nonzero: if thing_that_might_be_nonzero != 0 then dest = src
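The two proposed conditionals above can be sketched as follows. The register file is modelled as a dictionary and the register names are hypothetical; only the conditions and effects come from the text.

```python
def cinc_le(regs, x, a, b):
    """CINC-LE x a b: if regs[a] <= regs[b] then regs[x] += 1."""
    if regs[a] <= regs[b]:
        regs[x] += 1

def cmov_nz(regs, dest, src, cond):
    """CMOVnz dest src cond: if regs[cond] != 0 then regs[dest] = regs[src]."""
    if regs[cond] != 0:
        regs[dest] = regs[src]

regs = {"x": 0, "a": 3, "b": 5, "dest": 0, "src": 9, "flag": 0}
cinc_le(regs, "x", "a", "b")          # 3 <= 5, so x is incremented
cmov_nz(regs, "dest", "src", "flag")  # flag is zero, so dest is unchanged
```

Both are destructive in the sense that the destination is also an input, which is why each needs a full three-operand slot.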

maybe also CSEL, CSINC, CSINV, CSNEG, which are 3-operand (like ARM64) and also pop a boolean off of SMALLSTACK.

maybe also 1-operand CINC, CINV, CNEG which pop a boolean off of SMALLSTACK, and then conditionally mutate the operand-indicated register (or effective address) in the respective way (increment, etc). And a 2-operand CCPY as above. And a 2-operand CSEL which pushes the selected result to SMALLSTACK.

if we did any of that 'bool on SMALLSTACK' stuff we'd want to add a CMP operation.

seems like many of these would eat up a lot of 3-operand instructions. So probably save it for OVM. But consider the non-3-operand ones a little longer. Like bool-on-SMALLSTACK-based 2-operand CSEL-push-to-smallstack, 2-operand CCPY (i tentatively added those two).

---