---
32bit ideas
so with the current plan, in 32-bit, we'll get 16 more 3-operand instructions plus 8 more 2-operand ones. This will allow us to make a lot more things 3-operand, so we may as well rewrite the whole thing. So we've got tons of space; after just moving stuff around (and freeing up loadpc), we have 8 3-operand opcodes, 9 2-operand opcodes, and 7 1-operand opcodes open. Ideas:
with addition of:
note: the f64 instructions operate on 16 separate floating point registers
note: what about using half-precision (16-bit) floats instead? then we wouldn't need the separate registers. 64-bit floats ('doubles') could be available in OVM.
we could even just do unspecified 'native precision', but i'd prefer not to -- the point of native precision is to have a portable way to express computations, which may involve distances between pointers, and we dont know the pointer size so we cant know the integer size. but once we move outside of that, it's probably better to know how much precision you're dealing with.
but a problem with this is that i bet there are more architectures with f64 support than f16 support. In fact, it's worse than that; a quick check shows that ARM Cortex-M4F only supports f32 (single-precision floating point). Whereas JavaScript and Lua and Python only support f64.
so, i think we'll stick with f64.
note: what about sign-extension and zero-extension of integers of various bitwidths? what about coerce-i16-i and coerce-i-i16? i guess those aren't very useful without knowing the bitwidth of the native ints (which a program could get from sysinfo and then have a switch statement based on the result, but it's probably simpler not to)
note: instead of all the 3-operand i16 and f64 arithmetic, perhaps we'd like more kinds of compare-and-branch instructions on native ints and ptrs?
new instructions:
== | |
---|---|
arithmetic on native ints | divmod-int |
stack ops | pickk, rollk (note: these obsolete the 1-operand dup, swap, over instructions) |
64-bit floating point (optional) | add-f64, mul-f64, div-f64, bne-f64, ble-f64, neg-f64, peek-f64, poke-f64, fclass-f64, cvt-f64-i16, cvt-i16-f64, coerce-f64-i16, coerce-i16-f64 |
16-bit integers | add-i16, mul-i16, cvt-i16-i, cvt-i-i16, neg-i16, peek-i16, poke-i16, bne-i16, ble-i16, divmod-i16 |
syscalls (optional) | exec, delete, get, put, spawn, pctrl, time, rand, environ, getpid, signal |
== |
new critical errors: divide-by-zero, (??: conversion-out-of-range? should that really be a critical error?)
note: does float divide by zero cause a critical error, or does it just create Inf? Can you choose?
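sketch of the two candidate policies, just to make the question concrete (Python; `CriticalError`, `div_f64` and the `trap_on_zero` flag are made-up names for illustration, not part of any spec here). The non-trapping branch follows the usual IEEE 754 convention of a signed infinity, with NaN for 0/0:

```python
import math


class CriticalError(Exception):
    """Hypothetical stand-in for a VM critical error."""


def div_f64(a: float, b: float, trap_on_zero: bool) -> float:
    """Divide two f64 values under one of the two candidate policies."""
    if b == 0.0:  # true for both +0.0 and -0.0
        if trap_on_zero:
            raise CriticalError("divide-by-zero")
        if a == 0.0 or math.isnan(a):
            return math.nan  # 0/0 (or NaN/0) has no meaningful result
        # Result sign is sign(a) XOR sign(b), as in IEEE 754.
        a_neg = math.copysign(1.0, a) < 0
        b_neg = math.copysign(1.0, b) < 0
        return -math.inf if a_neg != b_neg else math.inf
    return a / b
```

if the choice were configurable, the flag would presumably be a piece of VM state rather than a per-instruction argument; that is left open here.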
todo: specify that native ints are 2s complement, and also that you can give native ints as inputs to -i16 operations, and the effect is to take just the lowest-order 16 bits of the native int. You can also give i16s as input to native int operations, and the effect is to (zero-extend or sign-extend?) the i16. This allows us to use bne-int to compare i16s (but watch out comparing an i16 to an int; it may compare as unequal even if the low-order bits are the same, if the int has any high-order bits set). However, since the i16 has a sign bit where longer ints have a '32768' bit, ble-int won't work right on i16s; it will interpret i16 -32768 as int 32768. So, we want a ble-i16, but we have no room for that. Alternately, we could change the 'i16's to 'u16's; but then (a) we want a sub-i16, and we have no room in the 3-operands for that, and also (b) if the native ints happen to be 16-bit then ble-int won't work again. So, maybe we should say that we CANNOT use bne-int or ble-int on i16s (it's a type error), and that we must use cvt-i16-i first, which i guess sign-extends the i16 to an int. Is this a significant enough hit to i16 efficiency that we should provide bne-i16 and ble-i16, and move mul-f64 and div-f64 (or something else) to make room? Should we consume our last 3-operand RESERVED? ok so far i consumed the RESERVED and moved divmod-int.
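a small sketch of the pitfalls above, modelling a native int wider than 16 bits with Python's unbounded ints (the helper names `trunc16`, `zext16`, `sext16` are made up for illustration; which extension cvt-i16-i uses is still an open question in these notes):

```python
def trunc16(x: int) -> int:
    """Keep only the low-order 16 bits, as an -i16 op would see a native int."""
    return x & 0xFFFF


def zext16(bits: int) -> int:
    """Zero-extend a 16-bit pattern to a wider native int."""
    return bits & 0xFFFF


def sext16(bits: int) -> int:
    """Sign-extend a 16-bit two's-complement pattern to a wider native int."""
    bits &= 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits


minus_one = trunc16(-1)              # bit pattern 0xFFFF
assert zext16(minus_one) == 65535    # what a plain int op sees without sign extension
assert sext16(minus_one) == -1       # what a sign-extending cvt-i16-i would give

# ble-int on raw (zero-extended) patterns orders -1 after 1:
assert not (zext16(trunc16(-1)) <= zext16(trunc16(1)))
assert sext16(trunc16(-1)) <= sext16(trunc16(1))

# bne-int pitfall: same low 16 bits, but the int has high-order bits set
assert trunc16(0x10005) == trunc16(5) and 0x10005 != 5
```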
note: we clearly can't use most of the addr modes on float regs, should we use the bits differently? like, mb just offer register direct, immediate (which are still interpreted as signed integers), and constant, and allow 5 value bits instead of 4? Should we then have 32 f64 regs instead of 16 (eg RISC-V has 32)? Alternately, if the addr mode has any indirection, we treat this as a PEEK or a POKE to normal memory through the normal registers. I guess the latter is more consistent.
todo: instead of offering PEEK and POKE for -i16 and -f64, we could just provide bitwise truncation and sign- and zero-extension, and say, just use those with indirect addressing modes to load and store. That makes more sense since these loads and stores are to ordinary memory, right? Well, no, not with f64; how many native memory spots it will occupy is non-portable (it will be 1 on 64-bit machines and 4 on 16-bit machines). So, maybe do that with i16 but not f64?
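to illustrate the non-portability: a sketch that splits one f64 into native memory words for a given word size (Python; `f64_to_words` / `words_to_f64` are hypothetical helpers, and putting the low-order word first is an arbitrary assumption, the notes don't pick an order):

```python
import struct


def f64_to_words(value: float, word_bits: int) -> list:
    """Split the 64-bit IEEE 754 pattern of `value` into native words, low word first."""
    if word_bits <= 0 or 64 % word_bits != 0:
        raise ValueError("word_bits must be a positive divisor of 64")
    (pattern,) = struct.unpack("<Q", struct.pack("<d", value))
    mask = (1 << word_bits) - 1
    return [(pattern >> shift) & mask for shift in range(0, 64, word_bits)]


def words_to_f64(words: list, word_bits: int) -> float:
    """Inverse of f64_to_words: reassemble the pattern and reinterpret it as f64."""
    pattern = 0
    for index, word in enumerate(words):
        pattern |= word << (index * word_bits)
    return struct.unpack("<d", struct.pack("<Q", pattern))[0]
```

so `len(f64_to_words(x, 64))` is 1 and `len(f64_to_words(x, 16))` is 4, which is exactly the count a portable program can't know ahead of time.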
todo: hmm aside from f64 and i16 we're looking pretty stable. But there are still some decisions to be made then regarding f64 and i16. Throwing in f64 and i16 adds some complexity, is it worth it? If so, i16 or u16? How do f64s get read from and stored into memory, and how many spaces do they take up? How about u16s? What happens if you attempt an int operation on an i16? Are there sign-extend and zero-extend ops for i16? How do you convert from ints to i16s and could this cause a critical error? How many f64 operations do we expose? Is f64 divide by 0 a critical error or does it produce inf, and can this be configured?
note: Will we be IEEE 754 compliant? I don't think so; it seems to me that IEEE-754-2008 may require SQRT and ABS and multiple rounding modes. Also, the RISC-V spec comments, "The C99 language standard effectively mandates the provision of a dynamic rounding mode register". Perhaps the RISC-V floating point operations would be the simplest way that would support the standard. I guess we could say we support it if OUR standard required various assembler intrinsics that computed in 'software' for what the VM itself doesn't do at runtime. I'd rather just keep BootX simple, though, and say that we don't support IEEE-754-2008, although we do provide a subset of the operations defined there. If we wanted to be as complex as RISC-V, why would we even create BootX at all? OVM will have more opcode space and can have all those other operations.
note: in fact, it seems that even Python doesn't support IEEE-754 out of the box: [1]. And Python is used a lot for numerical computing. My motto: if Python doesn't support some numerical thing, then we really don't need it (at least not at the OVM level; maybe Oot stdlib could have it).
The 32 three-operand instruction opcodes and mnemonics and operand signatures are (note: in 32-bit instructions, constants ('c') are 7 bits, not 3 or 4 bits, because the addressing mode is treated as part of the constant):
0. bne-int: c ii ii (branch-if-not-equal on ints)
1. bne-ptr: c ip ip (branch-if-not-equal on pointers)
2. jrel: c c c (unconditional relative jump by a constant signed amount)
3. ldi: oi c c (load immediate 8-bit int)
4. ld: c o si (load from memory address plus unsigned constant)
5. st: c so i (store from register to memory address plus unsigned constant)
6. addi-int: c io ii (in-place addition of ints and immediate constant)
7. addi-ptr-int: c iop ii (in-place addition of ints and immediate constant to ptr)
8. ble-int: c ii ii (branch if less-than-or-equal on ints)
9. ble-ptr: c ip ip (branch if less-than-or-equal on pointers)
10. add-int: oi ii ii (addition of ints)
11. add-ptr-int: op ip ii (add an int to a pointer)
12. CAS
13. bne-i16
14. annotate: c c c (can be ignored)
15. bitor: io ii ii (bitwise OR)
16. bitand: io ii ii (bitwise AND)
17. bitxor: io ii ii (bitwise XOR)
18. sub-ptr: op ip ip (subtraction of pointers)
19. mul-int: oi ii ii (integer multiply)
20. sll: io c ii (shift left logical (multiplication by 2^c (mod MAX_INT+1)))
21. srl: io c ii (shift right logical (division by 2^c, rounding towards zero))
22. sra: io c ii (shift right arithmetic (division by 2^c, rounding towards negative infinity))
23. ble-i16
24. add-i16: io16 ii16 ii16 (integer addition of 16-bits)
25. mul-i16: io16 ii16 ii16 (integer multiplication of 16-bits)
26. add-f64: iof64 if64 if64 (float64 addition)
27. mul-f64: iof64 if64 if64 (float64 multiplication)
28. div-f64: iof64 if64 if64 (float64 division)
29. bne-f64: c if64 if64 (branch-if-not-equal on float64)
30. ble-f64: c if64 if64 (branch-if-less-than-or-equal on float64)
31. instr-two: c ? ? (used to encode two-operand instructions)
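a sketch of the three shifts as described in the list above, on fixed-width two's-complement bit patterns (Python; `WIDTH = 16` is an assumed native int width chosen only for illustration, and srl's "division" is read here as acting on the unsigned bit pattern):

```python
WIDTH = 16                    # assumed native int width, for illustration only
MASK = (1 << WIDTH) - 1


def to_signed(bits: int) -> int:
    """Interpret a WIDTH-bit pattern as a two's-complement signed int."""
    bits &= MASK
    return bits - (1 << WIDTH) if bits >> (WIDTH - 1) else bits


def sll(x: int, c: int) -> int:
    """Shift left logical: multiply by 2^c, wrapping modulo 2^WIDTH."""
    return (x << c) & MASK


def srl(x: int, c: int) -> int:
    """Shift right logical: divide the unsigned pattern by 2^c, rounding towards zero."""
    return (x & MASK) >> c


def sra(x: int, c: int) -> int:
    """Shift right arithmetic: divide the signed value by 2^c, rounding towards -infinity."""
    return (to_signed(x) >> c) & MASK
```

for example, with -7 stored as the pattern 0xFFF9, `to_signed(sra(0xFFF9, 1))` gives -4 (floor of -3.5), while `srl(0xFFF9, 1)` gives 0x7FFC because the sign bit is shifted in as a zero.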
The 16 two-operand instruction opcodes and mnemonics and operand signatures are:
0. cpy: o i (copy from register to register)
1. pop: o sim
2. push: som i
3. sysinfo (query system metadata): o c
4. neg: io ii (arithmetic negation)
5. bitnot: io ii (bitwise negation)
6. pickk: c sio (pick c on stack)
7. rollk: c sio (roll c on stack)
8. neg-f64: iof64 if64 (arithmetic negation of float64)
9. peek-f64: of64 ip64 (load f64 from external memory address)
10. poke-f64: op64 if64 (store f64 to external memory address)
11. cvt-i16-i: oi ii16 (convert int16 to int (sign-extend?))
12. cvt-i-i16: oi16 ii (convert int to int16 (how do we deal with ints bigger than 2^15-1 or smaller than -2^15? is that a critical error or do we mandate saturation or something like that?))
13. neg-i16: oi16 ii16
14. fclass-f64: oi if64 (classify a floating point number; see eg FCLASS in RISC-V, eg FCLASS.S in section 8.9)
15. instr-one: c ? (used to encode one-operand instructions)
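sketch of why pickk and rollk can stand in for the 1-operand dup, swap and over (Python, with a list as the stack and the top at the end; the Forth-style zero-based indexing, where 0 means the top item, is an assumption, the notes don't fix the indexing):

```python
def pick(stack: list, k: int) -> None:
    """Copy the k-th item from the top (0 = top) and push the copy."""
    stack.append(stack[-1 - k])


def roll(stack: list, k: int) -> None:
    """Remove the k-th item from the top (0 = top) and push it on top."""
    stack.append(stack.pop(-1 - k))


s = [1, 2, 3]
pick(s, 0)                 # dup
assert s == [1, 2, 3, 3]

s = [1, 2, 3]
pick(s, 1)                 # over
assert s == [1, 2, 3, 2]

s = [1, 2, 3]
roll(s, 1)                 # swap
assert s == [1, 3, 2]

s = [1, 2, 3]
roll(s, 2)                 # rot
assert s == [2, 3, 1]
```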
The 16 one-operand instruction opcodes and mnemonics and operand signatures are:
0. jd: pc (dynamic jump)
1.
2.
3. cvt-f64-i16: if64 (convert float64 to int16, pushing 4 entries onto SMALLSTACK)
4. cvt-i16-f64: iof64 (convert int16 to float64, popping 4 entries from SMALLSTACK) (is this round? floor? what do we do when the f64 is larger or smaller than the largest or smallest int16 -- is this a critical error or do we saturate or something else?)
5. coerce-f64-i16: if64 (coerce float64 to int16, pushing 4 entries onto SMALLSTACK)
6. coerce-i16-f64: iof64 (coerce int16 to float64, popping 4 entries from SMALLSTACK)
7. malloc: op
8. mdealloc: ip
9. mrealloc: iop
10. peek-i16: ip (load i16 from external memory address, pushing onto SMALLSTACK)
11. poke-i16: iop (store i16 to external memory address, popping from SMALLSTACK)
12.
13.
14. syscall2
15. syscall: c (used to encode zero-operand misc instructions)
The 16 SYSCALL2 zero-operand instructions are:
0. exec
1.
2. get
3. put
4. spawn
5. pctrl (process control, eg join/wait, kill, etc -- or should these each be separate?)
6. time
7. rand
8. environ
9. getpid
10. signal (?? not sure if we want to do it this way -- signal handler setup)
11. create
12. delete
13.
14. divmod-i16: (on SMALLSTACK; consume 2 items and push quotient, then push remainder)
15. divmod-int: (on SMALLSTACK; consume 2 items and push quotient, then push remainder)
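sketch of divmod-int on SMALLSTACK (Python; assumptions, none of which the notes settle: the first value pushed is the quotient, the divisor is the top item with the dividend beneath it, division floors the way Python's `divmod` does, and `CriticalError` is a made-up name for the divide-by-zero critical error):

```python
class CriticalError(Exception):
    """Hypothetical stand-in for a VM critical error."""


def divmod_int(stack: list) -> None:
    """Consume two items; push the quotient, then the remainder."""
    if len(stack) < 2:
        raise CriticalError("stack underflow")
    if stack[-1] == 0:
        # Check before popping so the stack is untouched on error.
        raise CriticalError("divide-by-zero")
    divisor = stack.pop()
    dividend = stack.pop()
    quotient, remainder = divmod(dividend, divisor)  # floored division (assumption)
    stack.append(quotient)
    stack.append(remainder)


s = [17, 5]
divmod_int(s)
assert s == [3, 2]          # remainder ends up on top
```

a truncating (round-towards-zero) divmod would give different results for negative operands, so the rounding rule would need to be pinned down in the spec.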
The 16 INSTR-ZERO zero-operand instructions are:
0. halt (terminate program execution)
1. break (mark breakpoint for a debugger)
2. fence-seq
3. peek-i8
4. poke-i8
5. devop
6. read
7. write
8. open
9. close
10. seek
11. flush
12. poll
13. coerce-int-ptr: coerce int in T to ptr
14. log
15. library
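sketch of how the escape opcodes in the four lists above nest (Python; it takes already-extracted fields so it assumes nothing about bit layout, but it does assume that each escape's sub-opcode sits in the leading operand field, as the 'c ? ?' and 'c ?' signatures suggest, and that syscall2 carries its sub-opcode the same way syscall does; `classify` is a made-up name):

```python
INSTR_TWO = 31   # three-operand opcode that escapes to the two-operand table
INSTR_ONE = 15   # two-operand opcode that escapes to the one-operand table
SYSCALL2 = 14    # one-operand opcode that escapes to the SYSCALL2 table
SYSCALL = 15     # one-operand opcode that escapes to the INSTR-ZERO table


def classify(opcode: int, a: int, b: int, c: int) -> tuple:
    """Return (table name, opcode within that table, remaining operand fields)."""
    if opcode != INSTR_TWO:
        return ("three-operand", opcode, (a, b, c))
    if a != INSTR_ONE:
        return ("two-operand", a, (b, c))
    if b == SYSCALL2:
        return ("SYSCALL2", c, ())
    if b == SYSCALL:
        return ("INSTR-ZERO", c, ())
    return ("one-operand", b, (c,))


assert classify(10, 1, 2, 3) == ("three-operand", 10, (1, 2, 3))   # add-int
assert classify(31, 0, 4, 5) == ("two-operand", 0, (4, 5))         # cpy
assert classify(31, 15, 7, 2) == ("one-operand", 7, (2,))          # malloc
assert classify(31, 15, 14, 15) == ("SYSCALL2", 15, ())            # divmod-int
assert classify(31, 15, 15, 0) == ("INSTR-ZERO", 0, ())            # halt
```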
an idea for the 64 8-bit instructions (we really only have 6 bits because there are 2 encoding format bits in the 8-bit instruction encoding):
8-bit instruction ideas: 0. ANNOTATE
1. SMALLSTACK SWAP 2. SMALLSTACK OVER 3. SMALLSTACK DROP 4. SMALLSTACK DUP
5. SMALLSTACK ROT
6. POP SMALLSTACK and PUSH to MEMSTACK 7. POP MEMSTACK and PUSH to SMALLSTACK
8. POP SMALLSTACK into ERR 9. POP SMALLSTACK into R4 10. POP SMALLSTACK into R5 11. PUSH ERR onto SMALLSTACK 12. PUSH R4 onto SMALLSTACK 13. PUSH R5 onto SMALLSTACK
14. LD through T into R4 (that is, load in register indirect mode with register T, and put it into R4) 15. LD through R4 and PUSH it onto SMALLSTACK 16. ST R4 through T 17. POP SMALLSTACK and ST it through R4 18. (POP SMALLSTACK, add it to the pointer in R4), LOAD from the addr in parens and PUSH it to SMALLSTACK 19. (POP SMALLSTACK, add it to the pointer in R4), POP SMALLSTACK and ST it to the addr in parens
20. PUSHPC onto MEMSTACK 21. POP MEMSTACK and JD
22. SMALLSTACK ADD-ptr-int 23. SMALLSTACK ADD-int 24. SMALLSTACK NEG 25. sub-ptr SMALLSTACK 26. MUL SMALLSTACK 27. SLL SMALLSTACK 28. SRL SMALLSTACK 29. SRA SMALLSTACK
30. SKIPNZ-int SMALLSTACK 31. SKIPZ-int SMALLSTACK 32. SKIPEQ-int SMALLSTACK 33. SKIPEQ-ptr SMALLSTACK 34. SKIPLE-int SMALLSTACK 35. SKIPLE-ptr SMALLSTACK 36. SKIPLT-int SMALLSTACK 37. SKIPLT-ptr SMALLSTACK 38. SKIPGE-int SMALLSTACK 39. SKIPGE-ptr SMALLSTACK
40. LD SMALLSTACK 41. ST SMALLSTACK
42. PUSH 0 onto SMALLSTACK 43. PUSH 1 onto SMALLSTACK
44. PUSH addr of MEMSTACK (R1) onto SMALLSTACK 45. POP from SMALLSTACK to addr of MEMSTACK (R1)
46. JREL +1 47. JREL +2 48. JREL +3 49. JREL +4 50. JREL +5 51. JREL +6 52. JREL -2 53. JREL -3 54. JREL -4 55. JREL -5 56. JREL -6 57. JREL -7
58. BITAND SMALLSTACK 59. BITOR SMALLSTACK 60. BITXOR SMALLSTACK 61. BITNOT SMALLSTACK
62. CAS SMALLSTACK 63. FENCE
(this fits in all of our compute operations, even SRA and ROT, except that we can only access the first 8 registers; and no syscalls, not even malloc or sysinfo; and no way to load immediate constants; and JREL is restricted to +-6 (that means we can go fwd or back skipping over 3 16-bit instructions) (JREL is +-1024 in the 16-bit ISA). And we even managed to fit in EQ, LE, LT, and GE (but not NE).)
(in reality, not only will we profile and choose the 8-bit format to represent the most common instructions or instruction sequences, but we need to leave at least 8 of these free as 'custom instructions' for VMs like OVM implemented on top of BootX)
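sketch of the JREL slots in the 8-bit list above, mapping opcodes 46..57 to their offsets exactly as listed (Python; `jrel8_offset` is a made-up name). The list skips 0 and -1; one plausible reading, which is only an assumption, is that offsets are relative to the following instruction, so 0 would be a no-op and -1 a jump-to-self:

```python
def jrel8_offset(opcode: int) -> int:
    """Return the relative jump encoded by 8-bit opcodes 46..57."""
    if 46 <= opcode <= 51:
        return opcode - 45         # 46..51 -> +1..+6
    if 52 <= opcode <= 57:
        return -(opcode - 50)      # 52..57 -> -2..-7
    raise ValueError("not a JREL opcode")


assert [jrel8_offset(op) for op in range(46, 58)] == [
    1, 2, 3, 4, 5, 6, -2, -3, -4, -5, -6, -7,
]
```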
---
actually let's make 'native floats' of unspecified size, that fit in the same space as native ints and native pointers. Now we can get rid of the f64 registers.
---
ARM Cortex-M4 offers single-precision floating point, which is 32 bits, the same as the native pointer bitwidth (so it seems pretty feasible to have floats of the same precision as pointers)
---
how to represent program memory:
if we use pointers:
if we say Boot instructions take up one 'word' and that program memory is word-addressed, then, in BootX?