Bayle Shanks's website: proj-oot-ootAssemblyNotes20

see also ootNotes30, where i start working on an ISA that might be suitable for genetic algorithms and the like.

that gave me some ideas that should maybe be applied to Boot or to OVM:

the LABEL and JUMPLABEL instructions
CALL pushing the return operand onto the stack, and RET using it
having more 3-operand instructions (somehow)
maybe allowing direct changes to the PC so as to not need the JD instruction (and maybe not even JREL, if the ADD operand size is large enough)
consider using signed arithmetic
the weird addressing modes i have there

also this makes me question if we shouldn't:

somehow add addressing modes to Boot
find a 32-bit instead of 64-bit encoding for OVM

hmm.. it doesn't seem possible to add addressing modes to Boot without either:

reducing the number of registers
going to a 32-bit encoding
going to a variable-length encoding
going to a 2-operand encoding

a variable-length encoding doesn't have to be irregular; some instructions (like LDI and absolute jumps) can just take some extra words at the end. Which is probably fine in most contexts. But:

it's not good for the 'genetic algorithms' application, and the related 'algorithmic compression contexts' application; it'll be better for the optimizer if all of the instructions are the same length
it's not good for the 'program analysis within the Oot toolchain' application

if we're willing to partially give up on program analysis, however, and restrict each function to ~25 local variables, and encode most constant programmatically, then it's perfectly possibly to make a 32-bit version of OVM: 4 fields (opcode, op2, op1, op0), each field has 3 addr mode bits and 5 value bits. Alternately, the opcode field has 3 addr mode bits and 5 value bits, and the other 3 fields have 3 addr mode bits and 4 value bits, and we have 3 bits left over; two of them are used for instruction encoding format choice, and 1 bit is reserved for future use. Using the same scheme as with Boot, this gives us 31 3-operand instructions, plus 31 2-operand instructions, plus 31 1-operand instructions, plus 31 0-operand instructions; of course the 1-operand and 0-operand instructions aren't as useful, so that's basically 64 2-operand+ instructions. With 8 addressing modes and on the order of 64 major instructions it may not be a 1-weekend project, and it's still lacking some stuff from OVM, most notably generics, address space types, custom instructions, and hooks, but still it seems like a suitable place to start, maybe even more suitable than Boot.

otoh for the 'algorithmic compression' application we may not need main memory at all; a few registers are probably enough to make lots of interesting patterns. In which case we can get rid of the register indirect addressing mode and the load/stores, and we can probably even find some way to get down to 1 addr mode bit instead of 2 (although i'm not sure how yet).

but even if the primary encoding were 32 bits, 16 bits probably still makes some sense, as many instructions won't need the addressing modes, won't access registers above 16, and won't be using the extra opcodes.

does 8 bits make sense? Well, if we use the current version of boot, extended format instructions must have 3 bits set a certain way, and we certainly couldn't afford to lose 3 of 8 bits, so these may not (currently) be compatible. But ignoring that:

you could have 3 operands each with 1 addr mode bit and 1 value bit, plus a 2-bit opcode. That isn't great because there's lots of things you'd want to do with that 1 addr mode bit (immediate vs register direct? register direct vs stack? register direct on TOS/second-on-stack vs stack push/pop? or just have 4 registers with no addr modes?). You could also do stuff like that weird format i thought of which is implicitly stack by default except that some bits say whether to do something else for the first input and for the output. Also, 4 opcodes isn't much. Also, there's not even a single format encoding bit there.
you could have 1 format bit, 4 opcode bits, and 3 operands each of 1 bit
you could have variable-length in the manner of the JVM; some bytecodes indicate more operands in the bytes to follow
you could have fully variable-length instructions, in which the operand fields are in different places. Or, similarly, the instructions could be 'all opcode' with a gigantic lookup table, many variants of many instructions, and only a few registers (6502-style ISA)

so it sounds to me like it would be good to retain the option to have 8 bits, but not to actually work on it (yet). It's probably not good for anything except for code density.

but 16 bits makes sense. And 32 bits makes sense (because, unless you only want less than 8 registers, 16 bits is probably too small to have both 3-operands and good addr modes). And 64 bits makes sense (at least for the program analysis application, because you can have 256 registers, and for OVM, because you can have the generics type field). And reserving the option for something even longer makes sense (for my weird natural language inspired encoding ideas).

if you had the format encoding bits for all of these, however, then it might look like this:

8 bits: 1 format encoding bit, 7 other bits (to be determined)
16 bits: 2 format encoding bits, 5 opcode bits, 3*3 operand bits
OR the current Boot proposal: 16 bits: 4 opcode bits (only one of which is free for longer formats), 3*4 operand bits
32 bits: 3 format encoding bits, (3+4)*4 field bits, 1 reserved bit (or, consume the reserved bit to make 5-bit operands)
OR to be compatible with the current Boot proposal: 32 bits: 4 format encoding bits, (3+4)*4 field bits
64 bits: 4 format encoding bits, (4+8)*5 field bits. NOTE: if we're compatible with the current Boot proposal, then there is no room for format bits for an extended format here, unless opcodes are allocated to that (which is problematic because opcodes are 8 bits long in this format).

working backwards, if we want the 64-bit format to use 4 format encoding bits, and we want 8 bits to use 1 format encoding bit, then 16 bits must be at least 2 format encoding bits, and 32 must be at most 3 format encoding bits. So, we'd want to reserve 4 3-operand opcodes in Boot (for 2 format-encoding bits). With the current Boot proposal, there aren't enough free opcodes for that.

We could do it if we demoted ANNOTATE to a 2-operand instruction, or if we demoted ld and st to 2-operand instructions. However, if you think about it, that isn't really enough; the 8-bit encoding demands an entire bit, not just 2 combinations out of 4. Also, if you think about it, why are we taking bits out of the opcode field anyway? That's the field that needs them the most. If we are short on bits, we really should demote the operand fields to 3-bits before we remove any bits from the opcode field. This would give us the first option above, 2 format encoding bits, 5 opcode bits, 3*3 operand bits; it would, of course, be nice to double our supply of 3-operand instructions. Of course, the issue with that is that since we're reserving 5 special registers, with 8 registers total that means we'd only have 3 GPRs.

Also, in the above, it seems strange to have 5 bits for the opcode for the 16-bit variant but only 4 bits for the opcode for the 32-bit variant. We could add that 'reserved bit' in the 32-bit variant to the opcode field, though -- that's nice but it doesn't do much when the addressing mode is not immediate or split, so it's a little bit of a waste.

Or we could just give up on the 8-bit encoding and the 32-bit encoding (which is the current plan). That's probably better because it's simpler -- but it misses out on 32-bits, the central 'sweet spot' that many other processors hit.

But if we have 5 instruction formats (well, 4 fixed-length ones), does the 'fixed-length' aspect even do us any good any more? Of course, we can always 'up-project' everything to the 64-bit form for program analysis, but if we're going to do that, why not allow JVM-style variable-length in the smaller bitwidths?

otoh i gotta say that the nice arithmetic progression in the number of format bits, and the idea of having a format for each power-of-two doubling, attracts me. The 16-bit and 32-bit ISA would still be easy to build (although now, with 5 bits, the 16-bit ISA could easily accomodate 'way too many' instructions -- a good problem to have, i guess).

also, note, instead of my 'natural-languagy-ISA' idea for the 'extended' language, we could just have something like SEXPRs (of course, SEXPRs could be easily encoded using my 'natural-languagy-ISA' so that's not an argument against it, except for simplicity).

and how does having 4 different instruction lengths with fixed field locations (or at least 3, since the 8-bit format might not follow that pattern) compare to JVM-style opcode-in-the-first-byte-plus-more-bytes-of-data-depending-on-opcode?

It's slightly easier to delineate instructions in this system, but in the JVM system instruction lengths are just an opcode-lookup-table, which isn't that hard (for a computer).
This system is slightly easier to code a compiler or interpreter for (excluding the 8-bit variant again).
In this system, instructions with very few 3 operands are inefficiently coded (eg zero-operand instructions still use all of their operand fields for 'instr-two', 'instr-one' and 'instr-zero' opcodes), whereas instructions with many operands are efficient; by contrast, in the JVM system, instructions with zero operands are most efficient (just a bytecode), and instructions with many operands are less efficient. This system probably has the upper hand there; according to [1], 60% of x86 instructions have 2 operands, 37% have 1, and only 3% have zero operands.

Adding the 2 format bits to the 16-bit instructions would make them almost unusable (at least, in terms of efficiency) for general computation, though; with only 3 GPRs it's very inefficient. This means that with this system, 16-bits is pretty much only good for compressing 32-bit code; so if we do this, maybe Boot should be moved to 32-bits (even though that means 8 addressing modes).

Also, it should be noted that this won't help with the thing that originally got me thinking about this today; for the purpose of compressing strings using Kolmogorov algorithmic complexity (finding short programs that make interesting patterns similar to the patterns in the string you are trying to compress), we'd prefer to have 16-bit instructions with fewer registers and addressing modes, instead of more registers and no addressing mode; and we don't want 32 opcodes (the format bits themselves are okay because we can just ignore them and not even try to mutuate them). The bespoke 16-bit format i made up earlier, with 16-bit opcodes and 3 fields each with 2 addr mode bits and 2 value bits, would probably be better for this application. I guess the remaining contribution to the argument for more formats from that line of thinking is just that addressing modes are useful, and since 32-bit instruction encodings can have addressing modes, it's a shame to just skip over 32-bit right to 64-bit.

other objections are:

the 32-bit one seems like the one you'd want beginners to target -- but then they have to implement 8 addressing modes
we have to rewrite it -- will i ever stop redesigning this project and just get it done?

Perhaps my biggest objection is that this destroys the beauty/usefulness of the 16-bit format, by cutting it from 11 GPRs to 4, and for what? More complexity.

hmm. i can't decide.

an alternative would be to reduce the operand length in the 64-bit format, freeing up more format encoding bits up there. That allows us to have one more format encoding bit in the 32-bit format (using the reserved bit; so we'd have 4 operand bits instead of 5 in the 32-bit format). Then if we eliminate the 8-bit format, we can go back to only reserving two opcodes (instead of any full bits) for the 16-bit format to indicate a format change.

OK i went back and looked at the old pymovm source to see how bad it actually is to add addressing modes. In terms of interpreter complexity, it's bad but not horrible. However, it also adds significant complexity to the assembler because you have to parse the addressing modes. Another issue with addressing modes is that now none of our formats are good RISC formats (no implicit load/store, and many registers); instead, we have one good 32-bit CISC format, and two subencodings which could be mixed in for code density (16-bit and 8-bit).

i guess the real issue is: the 16-bit current proposal has a sort of beauty. Most full, fixed-length, simple, orthogonal, 3-operand instruction sets are at least 32 bits, but here was one that was just 16 bits -- twice as good code density. And it really felt like it could be both simple to implement even in a small amount of ROM, and sufficient for all sorts of stuff. But this new proposal robs it of its beauty; instead we get a 16-bit mode that is not sufficient for stuff, and a futuristic mixed ISA with so many parts that the implementation would get larger.

and the new proposal still doesn't give us what i was thinking about when i came up with the new proposal; a 16-bit ISA that would be great for genetic algorithms (because for that we want addressing modes, and even less registers).

So it seems we have three design points in front of us; a 32-bit ISA with extensions up and down from 8- to 64-bits+, suitable for general computation but not terribly easy to implement; a beautiful 16-bit ISA with room to extend to 64-bits+ but not to 8- or 32, suitable for Boot; and a beautiful 16-bit ISA with addressing modes but only 2 registers, suitable for genetic algorithms.

Can we gain anything by giving up the extension down to 8-bits? This frees up one bit from the 16-bit mode. We can then give up the 32 opcodes (go back from 5-bit to 4-bit opcode field) to get another bit. We still need the third bit to switch into 32-bit mode, so we can extend 2 operands to 4-bits, but not all 3. This is not as elegant but it is sufficient; we extend op0 and op1, which lets us do CPYs (MOVs) and any other 2-operand stuff with all 16 registers; the restriction is only for 3-operand instructions, and only in the destination operand, which is restricted to the first 8 regs. So there will be some extra CPYs here sometimes, but that's it. This has the additional benefit of freeing up 1 bit in 32-bit mode, and 1 bit in 64-bit mode. But it completely sacrifices 8-bit mode, which may be useful for compression; and 16-bit mode, while now passable, is maybe not beautiful like it was.

Can we do better? Yes; if we wanted 8-bit mode to be a real fixed-length simple ISA then we need all the bits we can get there. But if we only wanted to use 8-bit mode for compressed instructions (a predefined lookup table of common instructions, giving extra weight to instructions that require 64-bit or 32-bit mode, and/or if we need alignment, then common components of common bigrams of instructions), then we can give up more than one bit for formatting there. If we give up two bits for formatting in 8-bit mode (allowing the remaining 6 bits to specify one of 64 common instructions, instead of 128 for 7 bits), then we can give back the extra reserved bits in 32-bit mode and 64-bit mode, and have 1 formatting bit for 16-bit mode, 2 for 8-bit mode, 3 for 32-bit mode, and 4 for 64-bit mode. So we can go back to having 4 modes, and 16-bit mode can again access 16 registers, except not in the output operand. Its beauty is marred but it's again passable as a bootstrap language.

I like this.

There is another way, though. We could keep 16-bit mode how it is, with 2 adjacent instructions reserved for instruction encoding. This means that any other mode burns 3 bits to say that it isn't 16-bit mode. Instead of adding a forth or fifth instruction selection bit to 32-bits, we can do the same thing in 32-bits (this time reserve 4 instructions), meaning that 6 bits are needed for an instruction to say that it is neither 16-bit nor 32-bit. Now in 64-bit mode, we reduce the operand value size from 8 to 7 bits, and get 5 bits to spare; since we already had 4 reserved, now we have 9, of which we consume 6, and consume 1 more for the extended format, leaving 2 bits reserved for 64 bits. This is probably better in a practical sense, because the 16-bit mode probably needs its third operand more than 64-bits needs to include 1 more bit of values. But it's aethetically annoying, because now 64-bit mode can't load any 16-bit immediate in one instruction, and can't use 8-bit immediate constants.

Is there another way? Yes -- we could demote ANNOTATE to 2-operands, remove LDX and STX (load indirect indexed) from BootX?, then reserve 4 3-operand instructions in the 16-bit mode. Now other modes need 2 bits to say that they aren't 16-bit mode. So there is no 8-bit mode, and 32-bit mode has 3 format bits, and 64-bit mode has 4 format bits.

With all of these reserved 3-operand instructions, though, we aren't really getting the full benefit out of the 3-operand format; we end up making operations like MUL, SUB, SLL, AND, OR, XOR in-place. In fact, the only 'normal, computational' 3-operand instruction is ADD!

---

i figured out how to do CALL and RET, where CALL takes not only an address to CALL, but arguments to pass in, and an operand with the effective address of where to put the return value:

just push the effective address for the return value onto the stack along with the return address. Then RET uses this to determine where to put the returned value.

one issue with this is that it is weird; it may hinder interop with other runtimes; for example it sort of prevents the CALL from being in tail position; even if we have a separate TAILCALL form of call, now we are expecting RET to know whether the caller called it as tail or not. Better would be just to consider CALL a compile-time macro that desugars to a real CALL and then a CPY; but that doesn't quite work either because we need to be able to execute dynamically-generated CALLs.

---

this guy making a Forth assembly language layer feels that Forth's separate stacks can be a detriment (which suggests that maybe we should just unify the DATASTACK and CODESTACK like most other languages out there):

" 2) The entire point of PAF is to get rid of separate stacks and separate stack pointers, to allow to put all stack items in registers that fit, and use a common stack for the spilled stack items. Implementing PAF on top of a conventional Forth would not achieve that objective. "

---

this guy (Hugh Aguilar) comments on the PAF project saying that you can't really make a portable assembly layer below a HLL unless the processors share a reasonable minimum number of registers and a reasonably orthogonal instruction set (i don't really see how this could be true; perhaps he means 'assuming you want fairly good performance'?).

he opines that today, you can assume 16 registers, which is enough for all popular modern processors:

" The idea of a portable assembler is good, as this would allow Forth to be quickly ported to any processor that fits the basic profile (there is not much difference between the MIPS, ARM and ColdFire?