Bayle Shanks's website: proj-jasper-jasperAssemblyOld1

ISA encoding ideas

load the first 16 bits. Look at the form bits (do we need 1 or 3? if we need 3 we have a problem..) to determine which form we are in. if large or huge form, look at opcode to determine how many operands to load afterwords.

note: we can have two form bits easily if we don't have 'medium form w/o result' and if we omit one of 'large form' or 'huge form'

huge form: 1 variable length instruction with 32-bit header: 32-bit header and zero or more 16-bit operands header: 3 scale bits, 4 addressing mode bits for each of 3 operands, 8 opcode bits, and what are the other 9 bits?

large form: 1 variable length instruction with 16-bit header: 16-bit header and zero or more 16-bit operands header: 2 scale bits, 3 addressing mode bits for each of 3 operands, 4 opcode bits, 1 'form' bit to indicate whether we're in medium or not

medium form w/o result: 1 instruction in 32 bits: 16-bit header and two eight-bit operands (result is pushed onto stack) header: 2 scale bits, 3 addressing mode bits for each of 3 operands, 4 opcode bits, 1 form bit

medium form: 1 instruction in 32 bits: 16-bit header and three four-bit operands header: 3 addressing mode bits for each of 3 operands, 5 opcode bits, one 'form' bit to indicate whether we're in small or not, and what about the other bit? second header: 2 scale bits, what about the two bits left over?

small form: 1 instruction in 16 bits: header: 2 addressing mode bits for each of 3 operands, 5 opcode bits, one 'form' bit to indicate whether we're in small or not, 1 operand bit for each of 3 operands, and what about the other bit?

tiny form: 3 instructions in 16 bits: 4-bit header followed by 3 4-bit opcodes header has 2 scale bits, one 'form' bit to indicate whether we're in tiny or not, and what is the other bit? the operands for each opcode are assumed to be on the stack

as you can see many of these only permit 4 bits for the opcode!

note: 'immediate' mode makes no sense for results; could we use this as a substitute for 'form' bits? i don't think so b/c i think that would mean wasing the other result mode bit, too

when there are 2 scale bits the scales are: 1,4,8,16. when 3 scale bits, the scales are: 1,2,4,8,16,32,64,128

when there are 2 addressing mode bits, the 4 addressing modes are immediate, register, register indirect, stack. stack addressing mode means that the operand specifies a position in the stack, and if this is an input, that position is rot'd to the top and then pop'd, and if this is an output, the output is pushed and then rot'd to that position. when there are 3 addressing mode bits, the first four modes are as above and the last four modes are custom.

in all cases either all the opcodes are custom, or the last half of them are (e.g. the first opcode bit would be whether this opcode is 'custom' or not).

with 2 form bits and no variable length:

tiny (3 instructions in 16 bits, top of stack operands): 2 form bits, 2 scale bits, 3 4-bit opcodes (all stack addr mode with operand zero, e.g. all args and result are popped and pushed from the top of the stack) small (1 instruction in 16 bits, 1-bit operands): 2 form bits, 2 scale bits, 4 opcode bits, 2 addr mode bits x 3 operands, 1 operand bit x 2 operands; NOTE: 3rd operand's value is implicit (usually zero; usually the third operand is the result so this means you must put the result into register zero or push it into the top of the stack, etc) medium (1 instruction in 32 bits, 4-bit operands): 2 form bits, 5 opcode bits, 3 addr mode bits x 3 operands + 2 scale bits, 2 reserved bits, 3 4-bit operands huge (1 instruction in 80 bits, 16-bit operands): 2 form bits, 5 opcode bits, 3 addr mode bits x 3 operands + 2 scale bits, a bunch of bits left over, + 3 x 16-bit operands? but then we're misaligned on 32-bit boundary. should we just go 32-bit? do we even care about 32-bit boundaries?

actually since small/tiny is 16 bits, and medium is 32 bits, we are already misaligned on 32-bit boundaries, unless we require tiny/smalls to always come in pairs. which we could do. we could also have the following 16-bits in a huge instruction be a tiny/smalls, to regain alignment 32-bit; but we will have lost 64-bit and higher alignment (which makes sense, as a huge would be more than 64 bits). we could also have the last 48-bits in a huge instruction just be a combination of mediums and tiny/smalls.

if we are going to restrict alignment like this, we may as well take out the form bits and put them as a header in each block. but that sounds complicated..

hmmm... maybe if we have no scale bits for large:

large: header: 2 form bits, 5 opcode bits, 3 addressing mode bits for each of 3 operands

yes, i guess this is best. embrace the 64k limit.. use special opcodes to escape it via arbitrary length addressing.

so:

now we are 64-bit aligned. Each 64-bit region might contain:

1 large instruction (1 instruction)
2 mediums (2 instructions)
1 medium and 2 smalls (3 instructions)
1 medium and 1 small and 1 tiny (5 instructions)
1 medium and 2 tinies (7 instructions)
4 smalls (4 instructions)
3 smalls and 1 tiny (6 instructions)
2 smalls and 2 tinies (8 instructions)
1 small and 3 tinies (10 instructions)
4 tinies (12 instructions)

i guess for maximal efficiency half the opcodes should be fixed. So for tiny instructions, we have 1 bit 'custom or not', and 8 fixed opcodes, and 8 custom opcodes. and for others, we have those 8 fixed plus another 8 fixed, and those 8 custom plus another 8 custom

so we need to design a good 8 opcode ISA for the tinies, and a good 16 opcode ISA in general.

some candidates:

MOV PUSH POP CMP ADD SUB AND OR NOT CALLIF NOP

random notes:

MOV PUSH POP CMP (but which comparison? mb BLE would be better, or even BNZ) AND OR CALLIF ADD (signed)

best guess so far:

MOV PUSH POP BLE (if operand1 is less than operand2, then goto operand3) (and what about BNE?) AND OR CALLIF (store some registers, possibly push current PC onto a call stack, possibly pop a call stack to get dest addr) ADD (signed)

note: CALLIF makes me think it may be good to have another stack mode where the operand specifies which register points to the stack to be used, rather than the position within the main stack; then we'd get implicit PUSH and POP. actually i guess you could do this by having a 'stack register'.

note: seems that we can't have an indexed, scaled address mode because where would we put the scaling parameter?

note: it would be nice to have an indexed address mode and a choose-your-stack address mode. should we only have 2 custom modes?

note: shall we have any 4-operand forms? that would allow us to have a parameterized CMP. Or, CMP could affect a set flag in a flag register.

note: with only 4/5 bit opcodes, we are starved for opcodes. Need to have a good, fast way of accessing custom instructions not currently assigned an opcode.

note: if the PC is an accessible register and if you had a zillion operands, then you don't need BLE and BNE per se, you could have 'if A <= B then put C into D'. But unless it was register 0 or 1, then you couldn't do this in tiny mode, which would cripple tiny mode.

mb add a PC addressing mode? for jasper's purposes though, since we dont want to expose linear memory too much, this isn't as useful (for relative jumps, maybe, but not for locating a data storage area relative to the PC)

also mb we don't need our 'stack choice addressing mode' because we should just put our choice of stack into the stack register prior to executing the command.

best guess so far: CPY BLE BNE AND OR NAND CALLIF ADD (signed) note: 'add' is also 'get' in a struct, which means it is also 'apply' in a function? mb this should use 'apply'?

actually the scales can also be dynamic.. and in fact they can just be the type annotations. One can be reserved for dynamic typing.

hmmm but then the 'scale bits' are 'type bits' and are absolutely required, unless we want to assume dynamic typing

hmmm i notice there's no room for them in 'small' or 'large' format then:

tiny (3 instructions in 16 bits, operands from top of stack): 2 format bits, 2 type bits, 3 x 4-bit opcodes. No addr mode or operands are specified because they are all implicitly assumed to be a default (usually stack addressing mode with operand zero, e.g. popping and pushing from the top of the stack)
small (1 instruction in 16 bits, 1-bit operands): 2 format bits, 5 opcode bits, 2 type bits, (2 addr mode bits + 1 operand bit) x 2 operands (third operand is implicitly assumed to be a default (usually to push the result onto the top of the stack), 1 unused bit
medium (1 instruction in 32 bits, 4-bit operands): 2 format bits, 5 opcode bits, 2 type bits, (3 addr mode bits + 4 operand bits) x 3 operands, 2 unused bits
large (1 instruction in 64 bits, 16-bit operands): 2 format bits, 5 opcode bits, 3 addr mode bits x 3 operands + 3 x 16-bit operands, NO TYPE BITS?

ok so let's just have one format bit and eliminate two formats:

tiny (3 instructions in 16 bits, operands from top of stack): 1 format bit, 2 type bits, 3 x 4-bit opcodes. No addr mode or operands are specified because they are all implicitly assumed to be a default (usually stack addressing mode with operand zero, e.g. popping and pushing from the top of the stack)
large (1 instruction in 64 bits, 16-bit operands): 1 format bit, 2 type bits, 4 opcode bits, 3 addr mode bits x 3 operands + 3 x 16-bit operands

if we are doing this we may as well just have 64-bit alignment:

tiny (15 instructions in 64 bits, operands from top of stack): 1 format bit, 2 type bits, 15 x 4-bit opcodes, 1 bit unused. No addr mode or operands are specified because they are all implicitly assumed to be a default (usually stack addressing mode with operand zero, e.g. popping and pushing from the top of the stack)
large (1 instruction in 64 bits, 16-bit operands): 1 format bit, 2 type bits, 4 opcode bits, 3 addr mode bits x 3 operands + 3 x 16-bit operands

15 implicit stack instructions seem like a lot, i worry we wont be able to fill them. let's use small instead of tiny:

small (7 instructions in 64 bits, operands from top of stack): 1 format bit, 2 type bits, 7 x (4-bit opcodes + 1 custom addr mode flag + 2 bits addr mode + 1 bit operand), 5 bits unused. No addr mode or operands are specified because they are all implicitly assumed to be a default (usually stack addressing mode with operand zero, e.g. popping and pushing from the top of the stack)
large (1 instruction in 64 bits, 16-bit operands): 1 format bit, 2 type bits, 4 opcode bits, (1 bit custom addr mode flag + 2 addr mode bits ) x 3 operands + 3 x 16-bit operands

hmmm those 4 opcode bits are really cramping my style. also we have 5 bits unused in the small format. doesn't seem to me to make much sense to add another opcode bit in the small format if we can't fit it in the large format. i'm not sure that 2 bit operands are much better than 1 bit operands (although maybe they are, that's 4 registers instead of 2). but how about we give the extra bits to yield a single immediate operand that can be used for any subinstruction in this instruction? and/or a codetable switch?

5 bits unused: 3 of these bits go towards an 'immediate mode append' which is optionally appended to the one bit operand for any immediate mode arguments, using the custom addressing modes. now 2 bits are left unused.

hmm.. i guess we COULD still use a prefix encoding for format, having two format with 2 form bits, and 1 with 1 format bit:

small: format 00 medium: format 01 large: format 1

would kind of like to save one format for metadirectives like code table changes though, so that you can have something that very quickly looks very far ahead on 64-bit boundaries to find potential metadirectives:

meta: format 00 small: format 01 large: format 1

which would only leave space for 1 custom mode

best guess so far: CPY BLE BNE AND OR NAND CALLIF ? GET ?

ISA encoding ideas

Instructions are packed into 64-bit superinstructions and are aligned.

Superinstructions can have multiple formats. All superinstructions start with at least one 'superinstruction format bit'. If the first superinstruction format bit is a '1', then the format is 'large'. Otherwise, there is a second superinstruction format bit. If it is a '1' (e.g. 01 for both of them), then the format is 'small'. If the second bit is '0' (e.g. 00 for both of them), then the format is 'reserved'.

The 'small' and 'large' formats are as followed:

small (7 instructions in 64 bits, operands from top of stack): 2 format bits, 2 type bits, 4 parameter-register bits, 7 x (1 bit custom opcode flag + 3-bit opcodes + 1 custom addr mode flag + 2 bits addr mode + 1 bit operand).
- note: mb add more operand bits? think about jumping to subroutines from a jump table
large (1 instruction in 64 bits, 16-bit operands): 1 format bit, 2 type bits, 1 bit custom opcode flag, 3 opcode bits, (1 bit custom addr mode flag + 2 addr mode bits ) x 3 operands + 3 x 16-bit operands

the format bits are described above. the type bits describe which version of polymorphic instructions are used. Type 11 is 'dynamic', otherwise the type bits are looked up in the opcodetable (for custom opcodes) or in the fixed opcodetable (for non-custom opcodes). opcode bits specify an opcode. If the custom opcode flag is 0, then one of the fixed opcodes is specified. If the custom opcode flag is 1, then the opcodetable is consulted. Similarly, if the custom addr mode flag is 0, one of the fixed addr modes is specified, otherwise, the codetable is consulted. The 4 parameter-register bits are copied into the parameter register.

addr modes:

immediate, parameter register plus immediate, stack, stack indirect

immediate means that the value is just the operand itself parameter register plus immediate means that the value is the value of the parameter register plus the value of the operand stack means that the value may be found/placed at n-th position on the stack, where n is the operand indirect means that the value may be found at the memory location denoted by going to the memory location denoted by the operand, then going to the location denoted by the value stored there

should change MOV to CPY. A clearer name, and also it avoids confusion with 'move semantics' in HLLs (since we might actually want a MOV someday too).

best guess so far: CPY (note: what to do with the third operand? stack selector?) BEQ (note: PC-relative jump, not absolute) NAND OR CALLEQ (note: or maybe just JMP? then what to do with other operands? build in indirect indexing to this?) RET (note: eliminate if no CALLEQ) GET (note: do we need this if indirect indexing?) ADD (signed)

So now we would like a POP in the core ISA.

CPY BLE BNE AND OR NOT POP GET

some other items on our wishlist:

because NOT is unary, NOT as an orthogonal addressing mode bit , rather than a separate operation
CALL/RET
CMP
only one of BLE/BNE
more type bits
more operand bits in small mode

also, lets add another bit to the small operands, now that we have to constantly address all these registers.

in MIPS they have beq and bne but they replace ble etc with stuff like 'set if less than (a, b, r0); beq (r0, zero, LABEL)': http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Mips/pseudojump.html

CPY CMP BEQ AND OR NOT POP GET ?

current guess: the stack is not the same as/mapped into low memory, it is separate, and only accessible using the stack addressing mode. This seems restrictive but it is not much since each operand always has its own addressing mode, so there's never a time you'd want to write to the stack and then read the value from low memory; the only time you'll hurt is if you write to low memory but then want to access that value using the implicit stack mode of tiny. in this case we can put registers into low memory (i'm thinking to put them above the first 32 bits, because these 32 bits are the only things accessible by 'small' instructions with 1-bit operands)

mb instructions like CMP and LEA, which would like parameters but also outputs, could use half of the output bits for their parameter and the other half for the output. So if we have 4 bit operands that's 4 choices for the param and 4 low-memory-mapped 'registers' for the output.

we could abandon the addressing modes and only use LEA..

4-bit operands in small:

small (5 instructions in 64 bits, operands from top of stack): 2 format bits, 2 type bits, 4 parameter-register bits, 5 x (1 bit custom opcode flag + 3-bit opcodes + 1 custom addr mode flag + 2 bits addr mode + 4 bit operand) + 1 reserved bit.

4 addr modes instead of 8:

small (5 instructions in 64 bits, operands from top of stack): 2 format bits, 2 type bits, 4 parameter-register bits, 5 x (1 bit custom opcode flag + 3-bit opcodes + 1 custom addr mode flag + 1 bits addr mode + 4 bit operand) + 6 reserved bits.
large (1 instruction in 64 bits, 16-bit operands): 1 format bit, 2 type bits, 1 bit custom opcode flag, 3 opcode bits, (1 bit custom addr mode flag + 1 addr mode bits ) x 3 operands + 3 x 16-bit operands + 3 reserved bits

now we can add more operands and types, shew:

small (5 instructions in 64 bits, operands from top of stack): 2 format bits, 3 type bits, 4 parameter-register bits, 5 x (1 bit custom opcode flag + 4-bit opcodes + 1 custom addr mode flag + 1 bits addr mode + 4 bit operand).
large (1 instruction in 64 bits, 16-bit operands): 1 format bit, 3 type bits, 1 bit custom opcode flag, 4 opcode bits, (1 bit custom addr mode flag + 1 addr mode bits ) x 3 operands + 1 reserved bit + 3 x 16-bit operands

and we even have a reserved bit in the large format!

and if we wanted we could have 1 format bit in the small format and a two in the large and have a reserved bit in small!

so the addressing modes are:

immediate (or memory indirect postincrement, e.g. PUSH, for an output) memory

and the 4 low-memory-mapped 'registers' are:

PC index stack? GPR

actually i guess 'index' and 'stack' are just conventions:

PC GPR GPR GPR

and actually having the PC first makes sense in a way, but since the PC is always 16 bits, it means that when working with 4-bit words, the first four words are unusable, which sucks when you can only address the first 16 words with a 4-bit operand. And you'll probably rarely work with 4-bit words within the 16-bit PC. So why not memory-map the PC in a little higher, so that it is accessible when addressing the first 16 16-bit words, but not the first 4 4-bit words? To make it easy to remember, put it in the 16th 16-bit word, i.e. in bits 240-256 (location F in hexadecimal when using 16-bit words).

and now we have space for 8 more fixed instructions. Best guess:

CPY (? what to do with second operand? perhaps include an LEA type thing in here, or addition of two indices, or CPY a block at once?) CMP BEQ AND OR NOT POP (this is really a POP and a ROT combined since you can specify a location other than TOS to POP) PUSH (this is really a PUSH and a ROT combined since you can specify a location other than TOS to PUSH) LEA ADD CALL (? what to do with the 3 operands? i guess one of them is the location of the call stack pointer, and one is the jump address? shall we even include a parameter? can we do some backwords calling and handler trees with this? can we combine CALL and RET?) RET (? what to do with the 3 operands? i guess one of them is the location of the call stack pointer? shall we even include a return value? can we do some backwords calling and handler trees with this? can we combine CALL and RET?) NOP CONV ? (use the second operand to specify the second to/from type? then we have two bits left; specify a custom conversion parameter?) ?? expression evaluation? ?? exceptions/backwards calling? ?? concurrency? ?? transactions? ?? perspectives? ??3 more to go

and now that we have 8 types, we can have 4 of those be fixed:

1-bit 4-bit (unsigned, i guess) 16-bit (unsigned, i guess) dynamic

the 4 LEA modes are as given above (indirect index, index indirect, memory indirect index, memory index indirect).

yay!

now, do we even need the parameter register in 'small'? i think not, since we are splitting the third operand.

small (5 instructions in 64 bits, operands from top of stack): 2 format bits, 3 type bits, 4 reserved bits, 5 x (1 bit custom opcode flag + 4-bit opcodes + 1 custom addr mode flag + 1 bits addr mode + 4 bit operand).

if we have variable word sizes, then what word size are the pointers in the addressing modes in terms of? one guess is:

if the pointer is in the operand, then it depends on the type. E.g. if the type is 4-bits, and the operand is 2, then that is pointing at bit 8.
if the pointer is in memory, then it is 16 bits

and other alternative is:

if the pointer is in the operand, then same as above
if the pointer is in memory, it depends on where it is stored. The first 16 bits of memory are 1-bit pointers with bit word length (e.g. they indicate either bit 0 or bit 1 of memory). The first 16 nibbles of memory are either 1-bit or 4-bit word length, and of 1-bit or 4-bit width. Everything else is a 16-bit pointer. E.g. in terms of 16 bits:

word 0: 1-bit pointers to 1-bit words words 1-3: 4-bit pointers to 4-bit words words 4+: 16-bit pointers to 16-bit words

i think i like the second alternative better than the first. but one feature/bug with the second alternative is that small superinstructions working in 4-bit type can never access memory above 16-bit word 3; even if you use LEA to put a high-memory address into low-memory, and use memory-indirect addr mode, you can't reach it because the high-memory address will be misinterpreted.

therefore here's alternative 3:

bit 0: some special behavior? the PC? bits 0-3 inclusive: 3 1-bit pointers bits 4-15: 3 4-bit pointers bits 16-64: 3 16-bit pointers bits 64+: 16-bit pointers

in this scheme was can put the PC at zero again.

addr modes for inputs:

immediate, memory, memory indirect, custom

immediate means that the value is just the operand itself memory means that the operand is a pointer to the memory location in which the value may be found memory indirect means that the operand is a pointer to a memory location X which is a pointer to a memory location Y in which the value may be found custom is a user-specified pure function

addr modes for outputs:

memory indirect indexed postincrement, memory, memory indirect, custom

memory indirect indexed postincrement means that the operand is a pointer to a memory location X, which is a pointer to a memory location Y, and that the value should be placed into Y, and X should be incremented memory means that the operand is a pointer to the memory location into which the result should be placed memory indirect means that the operand is a pointer to a memory location X which is a pointer to a memory location Y into which the result should be placed custom is a user-specified pure function (the same as for inputs, e.g. only one custom mode may be specified)

An operand that is pointing to a pointer is interpreted in a peculiar way:

1-bit word size:

0: the PC 1-3: 3 1 bit pointers 4-7: 4 4-bit pointers 8-15: 8 16-bit pointers

redo:

bit 0: the PC (16-bit pointer) bits 0-3 inclusive: 3 1-bit pointers bits 4-15: 3 4-bit pointers bits 16-64: 3 16-bit pointers bits 64+: 16-bit pointers

so, note that even an instruction using type 1-bit within a small superinstruction (4-bit operands), can access the PC (which is treated as a 16-bit pointer), and can also access

redo:

if we have variable word sizes, then what word size are the pointers here in terms of? When a 16-bit-width value, including an operand, is interpreted as a pointer, then it simply a 16-bit pointer to the 2^16 bits in memory; when a 4-bit-width value, including an operand, is interpreted as a pointer, then its value as a pointer is interpreted in a peculiar way:

0: the PC (16-bit pointer) 1-4: memory locations at bits 16, 17, 18, 19 5-7: memory locations at bits 20, 24, 28 8-15: (8-F in hex): memory locations at bits 32, 48, 64, 80, 96, 112, 128, 144

furthermore, the bit width of a pointer in a memory location is determined, not by the type mode of the instruction, but by the memory location, according to the above table; in other words:

memory bit 0: 16-bit pointer memory bit 16-19: 1-bit pointer memory bit 20, 24, 28: 4-bit pointer memory bits 32+: 16-bit pointer

this means that a conversion from a 4-bit pointer to a 16-bit pointer or back again is not as simply as a sign-extension.

todo: how do we specify conversions with type flags? maybe use the second operand?

ok we're getting close to a good basic polymorphic assembly, but we need to add in all the jasper-y things..

also we need to take out stuff that only makes sense with linear memory mapping, and replace it with HLL constructs

random idea: combine MOV-from-result-register and LEA and then have two-operand commands that place the result into a result register, usually followed by a MOV-from-result-register-LEA to place the result where it belongs. This is like having a complex lvalue in an assignment, i guess.

size x:

x format bits 2 + ceil(log_2(x)) type bits 1 + x opcode bits + remainder after other fields after operand remainder, but ratchet ceil(log_2(x)) addr mode bits * 3 (x-1) operand bits * 3 + remainder after other fields but before opcode remainder

so size 1 (8 bits): 1 format bit 2 type bits 3 opcode bit + 2 remainder = 5 opcode bits 0 addr mode bits 0 operand bits (0 bit per operand)

size 2 (16 bits): 2 format bit 3 type bits 5 opcode bits + 0 remainder = 5 opcode bits 3 addr mode bits 3 operand bits (1 bits per operand)

size 3 (32 bits): 3 format bit 4 type bits 5 opcode bit + 0 remainder = 5 opcode bits 9 addr mode bits 9 operand bits + 3 remainder = 12 operand bits (4 bits per operand)

size 3 (64 bits): 4 format bit 3 type bits 5 opcode bit + 1 remainder = 6 opcode bits 9 addr mode bits 12 operand bits + 30 remainder = 42 operand bits (14 bits per operand)

size 4 (128 bits): 5 format bit 4 type bits 6 opcode bit + 2 remainder = 8 opcode bits 12 addr mode bits 15 operand bits + 26 remainder = 99 operand bits (33 bits per operand)

...

redo

size x:

x format bits (the size in unary with terminator, e.g. a prefix encoding) ceil(log_2(x)) type bits if avail after opcode x opcode bits + remainder after other fields after operand remainder, but only 1 up to ceil(log_2(x)) addr mode bits * 3 if avail after format, x opcode bits (x-1) operand bits * 3 + remainder after other fields but before opcode remainder

so size 1 (2 bits): 1 format bit 0 type bits 1 opcode bits = 1 opcode bits 0 addr mode bits 0 operand bits

so size 2 (4 bits): 2 format bits 0 type bits 2 opcode bits = 2 opcode bits 0 addr mode bits 0 operand bits

so size 3 (8 bits): 3 format bits 2 type bits 3 opcode bits + 1 remainder = 3 opcode bits 0 addr mode bits 0 operand bits

size 4 (16 bits): 4 format bit 2 type bits 4 opcode bits + 0 remainder = 4 opcode bits 3 addr mode bits 3 operand bits (1 bits per operand)

size 5 (32 bits): 5 format bit 3 type bits 5 opcode bit + 1 remainder = 6 opcode bits 3 addr mode bits 15 operand bits + 0 remainder = 15 operand bits (5 bits per operand)

size 6 (64 bits): 6 format bits 3 type bits 6 opcode bit + 1 remainder = 7 opcode bits 9 addr mode bits 18 operand bits + 21 remainder = 39 operand bits (13 bits per operand)

size 7 (128 bits): 7 format bit 3 type bits 7 opcode bit + 0 remainder = 7 opcode bits 9 addr mode bits 21 operand bits + 81 remainder = 102 operand bits (34 bits per operand)

---

variable-length format that can scale to arbitrarily large operand sizes:

x format bits (the size in unary with terminator, e.g. a prefix encoding) x opcode bits (x - 2) first addr mode bits (x - 2) first operand bits x type bits (x - 2) second addr mode bits (x - 3) second operand bits (x - 2) third addr mode bits (x - 3) third operand bits ? append to operand 3

so size 1 (2 bits): 1 format bit 1 opcode bits 0 type bits 0 addr mode bits 0 operand bits

so size 2 (4 bits): 2 format bits 2 opcode bits 0 type bits 0 addr mode bits 0 operand bits

so size 3 (8 bits): 3 format bits 3 opcode bits 1 first addr mode bits 1 first operand bits 0 type bits 0 second, third addr mode, operand

size 4 (16 bits): 4 format bit 4 opcode bits 2 first addr mode bits 2 first operand bits 4 type bits 0 addr mode bits 0 operand bits

size 5 (32 bits): 5 format bit 5 opcode bits 3 first addr mode bits 3 first operand bits 5 type bits 3 second addr mode bits 2 second operand bits 3 third addr mode bits 2 third operand bits 1 append to third operand bits

size 6 (64 bits): 6 format bit 6 opcode bits 4 first addr mode bits 4 first operand bits 6 type bits 4 second addr mode bits 3 second operand bits 4 third addr mode bits 3 third operand bits 24 append to third operand bits

size 7 (128 bits): 7 format bit 7 opcode bits 5 first addr mode bits 5 first operand bits 7 type bits 5 second addr mode bits 4 second operand bits 5 third addr mode bits 4 third operand bits 79 append to third operand bits

the fixed types are:

dynamic 1-bit -- size 1 stops here (custom flag) -- size 2 stops here 2-bit unsigned 2-bit signed -- size 3 stops here 4-bit unsigned 4-bit signed 8-bit unsigned 8-bit signed -- size 4 stops here 16-bit unsigned 16-bit signed 32-bit unsigned 32-bit signed 64-bit unsigned 64-bit signed 128-bit unsigned 128-bit signed -- size 5 stops here 256-bit unsigned 256-bit signed -- -- ... note: also prob need a 'native pointer' type!

(the -- is because we only go up to the current instruction size for the variable word size types) (note: when there is more than 1 type bit, the second type bit is 'custom')

--- Instructions are packed into 64-bit superinstructions and are aligned.

The 'small' and 'large' formats are as followed:

small (5 instructions in 64 bits, operands from top of stack): 2 format bits, 3 type bits, 4 reserved bits, 5 x (1 bit custom opcode flag + 4-bit opcodes + 1 custom addr mode flag + 1 bits addr mode + 4 bit operand).
large (1 instruction in 64 bits, 16-bit operands): 1 format bit, 3 type bits, 1 bit custom opcode flag, 4 opcode bits, (1 bit custom addr mode flag + 1 addr mode bits ) x 3 operands + 1 reserved bit + 3 x 16-bit operands

the format bits are described above. the type bits describe which version of polymorphic instructions are used. Type 00 is 'dynamic', type 01 is '1-bit', type 10 is '4 bits unsigned', type 11 is '16 bits unsigned'. There are also 4 custom types, which are looked up in the opcodetable. opcode bits specify an opcode. If the custom opcode flag is 0, then one of the fixed opcodes is specified. If the custom opcode flag is 1, then the opcodetable is consulted. Similarly, if the custom addr mode flag is 0, one of the fixed addr modes is specified, otherwise, the codetable is consulted.

so we need to design a good 16 opcode ISA for the fixed opcodes.

addr modes for inputs:

immediate, memory, memory indirect, custom

addr modes for outputs:

memory indirect indexed postincrement, memory, memory indirect, custom

0: the PC (16-bit pointer) 1-4: memory locations at bits 16, 17, 18, 19 5-7: memory locations at bits 20, 24, 28 8-15: (8-F in hex): memory locations at bits 32, 48, 64, 80, 96, 112, 128, 144

furthermore, the bit width of a pointer in a memory location is determined, not by the type mode of the instruction, but by the memory location, according to the above table; in other words:

memory bit 0: 16-bit pointer memory bit 16-19: 1-bit pointer memory bit 20, 24, 28: 4-bit pointer memory bits 32+: 16-bit pointer

this means that a conversion from a 4-bit pointer to a 16-bit pointer or back again is not as simply as a sign-extension. really you can think of the 4-bit pointer operands as pointing to a register, and the 16-bit ones as pointing to a bit in memory, into which is mapped the registers.

for 2-bit pointers, the scheme is:

0: the PC (16-bit pointer) 1: memory location at bit 16 2: memory location at bit 20 3: memory locations at bit 32

so, note that even an instruction using a 4-bit operand as a pointer within a small superinstruction can access the PC (which is treated as a 16-bit pointer), and can also access any location in memory via indirection via the 16-bit pointers accessed via operand values 8-F.

out of the 8 types, 4 are fixed:

1-bit 4-bit (unsigned, i guess) 16-bit (unsigned, i guess) dynamic

and the other 4 are custom

LEA and CMP split the third operand; 2 bits are used as a 2-bit pointer, and the other two are used as a 2-bit parameter choosing the type of complex addressing mode or comparison.

the 4 LEA modes are:

(1) i give you a value. Take the contents of the index register, and add the value to that. This is a pointer to the target. (2) i give you a value. Deref the contents of the index register, and add the value to the result. This is a pointer to the target. (3) i give you a pointer. Retrieve the value at that pointer. Take the contents of the index register, and add the value to that. This is a pointer to the target. (4) i give you a pointer. Retrieve the value at that pointer. Deref the contents of the index register, and add the value to the result. This is a pointer to the target.

the 4 comparison modes are:

less than or equal to
lt
gt
geq

---

todo actually decide on one, i dont like the above one anymore, instructions should be at least a nibble if not 8 bits. Change the following:

todo i like the bundle pipeline idea; 2 6-bit packets

1 len bit 3 type bits 2 opcode bits 2 addr mode bits 2 operand bits 2 opcode bits 2 addr mode bits 2 operand bits

in general, could have every other count of the unary integer length field be a bundled pipeline of two guys. Could also have an 'i' format in which the first operand is 2x the size of the other two; and a 'j' format in which there is only one operand. Could get rid of immediate mode for the std 'd' format. These could be determined by operand.

So:

the rules are:

for odd formats: x unary format bits x+1 type bits x+1 opcode bits x 2 divide remaining bits by 4 assign evenly to addr mode and operand bits addr mode bits are x-1 any remainer goes to second operand, then first operand, then type bits

for even formats: x unary format bits x+1 type bits x+1 opcode bits divide remaining bits by 3 assign evenly to addr mode and operand bits addr mode bits are x any remainer goes to opcode, then type

fmt 1 (16 bits): 1 unary format bit 2 type bits 2 opcode bits 4 addr mode + operand bits (1 bit addr mode + 1 bits operand x 2 operands) or (1 bit addr mode + 3 bits operand x 1 operand) 3 opcode bits 4 addr mode + operand bits

fmt 2 (16 bits): 2 format bits 3 type bits 4 opcode bits 9 addr mode + operand bits (2 bit addr mode + 1 bits operand x 3 operands) or (2 bit addr mode + 7 bits operand x 1 operand) 1 unused bits were added to opcode

fmt 3 (32 bits): 3 unary format bits 4 type bits 4 opcode bits 8 addr mode + operand bits (3 bit addr mode + 1 bits operand x 2 operands) or (3 bit addr mode + 5 bits operand x 1 operand) 5 opcode bits 8 addr mode + operand bits

fmt 4 (32 bits): 4 unary format bits 5 type bits 5 opcode bits 18 addr mode + operand bits (4 bit addr mode + 2 bits operand x 3 operands) or (4 bit addr mode + 5 bits operand x 1 operand)

fmt 5 (64 bits): 5 unary format bits 6 type bits 6 opcode bits 20 addr mode + operand bits (5 bit addr mode + 5 bit operand x 2 operands) or (5 bit addr mode + 11 bits operand x 1 operand) 7 opcode bits 20 addr mode + operand bits

fmt 6 (64 bits): 6 unary format bits 8 type bits 8 opcode bits 42 addr mode + operand bits (6 bit addr mode + 8 bit operand x 3 operands) or (6 bit addr mode + 36 bits operand x 1 operand) 2 unused bits were added to type+opcode

first 2 addr modes are: custom, memory (register) indirect (good for stack) the one-operand thing is implied by opcode, its not an addr mode next addr modes are immediate, indirect

1st opcode bit is custom 1st 2 opcodes:? MOV mb MOVIDX (*(a + b)) -> C or MOVIDX2 a -> (*(b + c)) PUSH or PUSHIDX PUSH (*(a + b)) onto C

other popular ones are shl, xor, add, call, cmp, pop, inc, jz, lea, test, je, jnz

eh, i think the above is still not good enough at 16 and 32 bits. i guess we have to reduce the unary formatting bit overhead.

i guess this is hard b/c we insist on encoding the length, and because of all of the custom bits. We have 1 custom bit for type, 1 custom bit for opcode, and 1 custom bit per operand. That's 5 custom bits per instruction, which is as much as we're giving each operand in 32-bits below.

i decided we'd have to make operands shorter in order to permit enough bits for addressing modes and types and opcodes. with 2 addr mode bits, we get 2 fixed addr modes. with 3 type bits and 3 opcode bits, we get 4 fixed types and 4 fixed opcodes. we really need at least two types: native pointer and dynamic, so we need at least 2 bits (1 custom, and 1 for these two). We really need at least 2 opcodes, so we need at least 2 bits. We really need at least 2 addr modes, so we need at least 2 bits. 1 format bit + 2 bits addr mode x 3 operands + 2 type bits + 2 opcode bits = 11 bits. So we only have enough for 1 bit operands! But we have 2 bits left over. We could use these for bigger operands, or more opcodes, or more types, or some combination. For 2-operand instructions, we already have 2 bits per operand, so it's not like we can't do MOVs. It really sucks to only have 2 (fixed) opcodes in 16-bits. So let's use at least one of these for an opcode, and the other one for an operand bit.

---

-- 4 bits for fixed opcodes: AND OR NOT ADD POP (this is really a POP and a ROT combined since you can specify a location other than TOS to POP) LEA i guess this is like Jasper's give-me-the-link-itself-not-the-destination CALL (? what to do with the 3 operands? i guess one of them is the location of the call stack pointer, and one is the jump address? shall we even include a parameter? can we do some backwords calling and handler trees with this? can we combine CALL and RET?) RET (? what to do with the 3 operands? i guess one of them is the location of the call stack pointer? shall we even include a return value? can we do some backwords calling and handler trees with this? can we combine CALL and RET?) JMP CMP ? what is this exactly? can it work with only 3 operands? or do we need separate EQ, NEQ, LEQ, LT, GT, GEQ? is TEST better? BEQ ? do we need this? just use CMP and BT or BF BNE ? do we need this? just use CMP and BT or BF BF PUSH (this is really a PUSH and a ROT combined since you can specify a location other than TOS to PUSH) CONV (type conversion, two operand) ??1 more

?? expression evaluation? ?? exceptions/backwards calling? ?? concurrency? ?? transactions/marking code blocks? ?? perspectives? ?? while loop?

-- 6 bits for fixed opcodes: ...

other popular ones are shl, xor, add, call, cmp, pop, inc, jz, lea, test, je, jnz remaining: shl, xor, test,

discarded idea for 2-bit opcode:

BNE: if a != b, branch to c (c is signed value, relative to PC, e.g. for 2 bits, one of -2, -1, +2, +3) this is nice for Turing-completeness but its kinda dumb because in 16-bits we only have a 2-bit operand, so we can only branch by 2, which is only about 10% of branches according to Exploring a stack architecture, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.131.4548&rep=rep1&type=pdf . But i guess you could load the amount to branch into memory location 2, and then work with memory locations 0 and 1, and then do BNE (2,0,1), e.g. if the content of 0 is != the contents of 1, then branch by the amount given by the contents of 2.

Another problem with the BNE is the 1-bit inputs. If we are using 0 as the stack register, it isn't very useful to compare against that.

discarded ideas for 4-bit opcodes:

??NOP

---

how does haskell represent boxed types in memory? see

ok, is this (i was talking about the type specifications for instruction operands)) really a type or is it an interface? e.g. in the following, we have *(a[b]) -> c. Does this mean that a must be a pointer and b must be an integer? Or can a be any indexable type (which in Jasper, means any gettable type) and b is whatever a takes as an index? If the latter, then do we even need a separate type for 'pointer'? i think we still do, because we want to have things like stack pointers for which no a priori information may be available about the type of what it points to, and we want the stack pointer to live in a separate memory space so that we can also have data. ok here's my decision:

---

idea: could represent which operands the type applies to in the type argument using special bits. This is redundant with the verb but it allows the decoder to load the input operands from memory, if needed, without first looking up the verb (so that can be done in parallel)

we have 3 operands and two possibilities (1: that operand is an input AND also the type in the type argument is that operand, or 0: not), so that would be 2^8 to cover all possibilities. So that would be 5 of our 8 bits, and then we'd have 1 bit to flag custom types, leaving us with just 4 bits (16 fixed types). Or we could burn our 3 reserved bits on this.

neither of those seem attractive.

---

idea: y'know the type bits should allow us to have less than the typical amount of opcode bits b/c we don't need separate opcodes for ea. type. so it's kinda dumb to have 8 opcode bits and 8 type bits.

what if we go to a 8-register-ish system? now we only need 6 bits to have a base and an index:

fmt x (not recommended, i prefer the previous one) (64 bits): 1 format bit 3 bit addr mode for type 6 bit type operand (incld. custom) 3 bit opcode addr mode 6 opcode bit operand (incld. custom) 41 addr mode + operand bits (1 custom addr mode bit + 3 bit addr mode + 6 bits operand x 3 operands + 1 addr mode bit + 2 bit operand + 2 bit operand) or (1 custom addr mode bit + 3 bit addr mode + 14 bits operand x 1 operands + 1 custom addr mode bit + 3 bit addr mode + 8 bits operand x 1 operands + 1 addr mode bit + 2 bit operand + 2 bit operand) or (1 custom addr mode bit + 3 bit addr mode + 26 bits operand x 1 operand + 1 addr mode bit + 2 bit operand + 2 bit operand)

10 bits saved

we have 10 bits left now, so we could make the small operands into 3 bit + 4 addr mode ones and still have one bit to spare.

but.. but.. isn't that a huge waste of bits??

anyhow, i don't feel good about that.. i want 16 registers, uniformally. so we are back to:

fmt x (64 bits): 1 format bit 3 bit addr mode for type 8 bit type operand (incld. custom) 3 bit opcode addr mode 8 opcode bit operand (incld. custom) 41 addr mode + operand bits (1 custom addr mode bit + 3 bit addr mode + 8 bits operand x 3 operands + 1 addr mode bit + 2 bit operand + 2 bit operand) or (1 custom addr mode bit + 3 bit addr mode + 18 bits operand x 1 operands + 1 custom addr mode bit + 3 bit addr mode + 10 bits operand x 1 operands + 1 addr mode bit + 2 bit operand + 2 bit operand) or (1 custom addr mode bit + 3 bit addr mode + 32 bits operand x 1 operand + 1 addr mode bit + 2 bit operand + 2 bit operand)

actually let's just use status reg and accumulator when we need more inputs/outputs

so back to having 5 bits to spare:

fmt x (64 bits): 1 format bit 3 bit addr mode for type 8 bit type operand (incld. custom) 3 bit opcode addr mode 8 opcode bit operand (incld. custom) 36 addr mode + operand bits (1 custom addr mode bit + 3 bit addr mode + 8 bits operand x 3 operands) or (1 custom addr mode bit + 3 bit addr mode + 18 bits operand x 1 operands + 1 custom addr mode bit + 3 bit addr mode + 10 bits operand x 1 operands) or (1 custom addr mode bit + 3 bit addr mode + 32 bits operand x 1 operand) 5 reserved bits

now e.g. ADD uses the status register (location 1) as carry input, and then overwrites it with carry output (so ADD takes two input operands, one output operand, one implicit 1-bit input operand, and one implicit one-bit output operand).

wait a minute, we're really close to just having a format bit + reserved bits + a uniform 5 fields:

fmt x (64 bits): 1 format bit 3 reserved bits 60 bits for (1 custom addr mode bit + 3 bit addr mode + 8 bits operand) x 5 (opcode, type, 3 x operand)

let's just do that, it's simpler.

---

if we used all 12 type bits plus 3 reserved bits for a type/modifier for each of 3 operands, that's only 5 type bits per operand, incld. addr mode. If we also wanted some modifier bits for the opcode ('verb'), and we didn't use the 3 reserved bits, that's 3 bits per operand incld. addr mode.

if we redid everything with 3 bits instead of 4 (remember, we used 8 bits b/c its 4 x 2):

fmt x (64 bits): 1 format bit 13 reserved bits 50 bits for (1 custom addr mode bit + 3 bit addr mode + 6 bits operand) x 5 (opcode, type, 3 x operand)

so if ea. of the 4 opcode+operands had 5 bits for type incld. addr mode, that would cost 4 x 5 = 20 bits, but we only have 4 instead of 5 operands, so we'd still have 3 reserved bits. but we can't fit an addr mode + immediate type into 5 bits.

if we redid everything with 3 bit registers and 2 bit addr mode:

fmt x (64 bits): 1 format bit 18 reserved bits 45 bits for (1 custom addr mode bit + 2 bit addr mode + 6 bits operand) x 5 (opcode, type, 3 x operand)

if ea. of the 4 opcode+operands had 2 bits for addr mode plus 5 bits for type, we'd have -1 reserved bits..

if we didn't have a format bit we could do that:

fmt x (64 bits): 2 bit type addr mode + 5 bit type + 1 bit custom addr mode + 2 bit addr mode + 6 bits operand = 16 bits x 4 = 64 bits

if we disallowed a type on the third operand, then we only need 3 types and 4 operands:

fmt x (64 bits): 7 reserved bits 2 bit type addr mode + 5 bit type = 7 bits x 3 operands + 1 bit custom addr mode + 2 bit addr mode + 6 bits operand = 9 bits x 4 operands = 57 bits

so we could get back to 3 bit addr modes:

fmt x (64 bits): 3 reserved bits 2 bit type addr mode + 5 bit type = 7 bits x 3 operands + 1 bit custom addr mode + 3 bit addr mode + 6 bits operand = 10 bits x 4 operands = 61 bits

note: if only 1 type, could use the 3 reserved bits to tell the decoder what format the operand fields are in, where '-' means 'let the instruction handle it' and X means 'preload an operand of the given type':

--- X X- -X XX X-- -X- XX-

if we are specifying 2 types, then we have less choices because we never have to 'let the instruction handle it' for an input.

--- X X- XX X-- XX-

and if we only allow at most one output/'let the instruction handle it':

--- X X- XX XX-

and if we demand at least one input:

X X- XX XX-

note: a disadvantage of having a type for each operand is that the jump table for opcodes balloons, assuming we allow ad hoc polymorphism; we have 6 bit opcodes but 21-bit effective opcodes, since each combination of 6 bit opcode x 5 bit type x 5 bit type x 5 bit verb modifier could be implemented a different way. Such a jump table would take 2 megs of memory all by itself! Instead we would probably implement the jump table as a more compact data structure that would take more time to index into.

This is an argument for keeping a lid on the bit length of the types, as well as for allowing less types per instruction.

fmt x (64 bits): 11 reserved bits 3 operand format/type application bits (see above; ---, X, X-, -X, XX, X--, -X-, XX-) 1 bit custom addr mode + 3 bit type addr mode + 6 bit type = 10 bits x 1 operands 40 bits: 1 bit custom addr mode + 3 bit addr mode + 6 bits operand = 10 bits x 4 operands

note: could save two possibilities in the format/type application bits by disallowing the -X and -X- patterns; mb could use these for alternate instruction formats; or for having differently-sized operands; however if we have asymmetrically sized operands when there are 2 operands, then we probably want both of these as-is so that we can have a big operand of type X, or a big operand of type dictated by the instruction)

fmt x (64 bits): 3 operand format/type application bits (see above; ---, X, X-, -X, XX, X--, -X-, XX-) 1 conditional execution bit (0 = execute only if status register == TRUE, 1 = always execute) -- or would 1 format bit be better? 10 bits flags: 1 bit custom addr mode + 3 bit type addr mode + 6 bit flag operand = 10 bits x 1 10 bits type: 1 bit custom addr mode + 3 bit type addr mode + 6 bit type = 10 bits x 1 10 bits opcode: 1 bit custom addr mode + 3 bit addr mode + 6 bits operand = 10 bits x 1 30 bits operands: 1 bit custom addr mode + 3 bit addr mode + 6 bits operand = 10 bits x 3 operands

now we have a jump table that has only a 12-bit index, e.g. 4k entries. Note however that that means that the first 3 operand format/type application bits are redundant with the operand, because each operand can only support a certain choice of those 3 bits; unless we want to allow operands to be polymorphic in those bits, e.g. to have signatures of different lengths. i think a good compromise is to allow operands to be polymorphic if they want to, but not to include that in the jump table, e.g. the code to switch on that is within the opcode handler.

as for the flags, we can run a pre/post (wrapper) handler that accesses the flags. We can have 3 of them be fixed, and 3 of them be custom. Perhaps the 3 fixed flags can say whether the 3 operands are promises/thunks or actual values. Or mb only 1 flag for that?

so i guess it's a good idea to make each of 3 operands, the opcode, and a type, into fields of the same format. So we have a choice to make: 6 bit operands (+ 4 bit addr mode = 10 bits per), with space for more flags, or 8 bit operands (12 bits total per = 50 for 5), with only 4 bits for flags and format bits. half of 6 bits is 3 bits, so we'd have 8 registers that can be used as base or index for any operand. with 8 bits we'd get 16 registers. for the opcode, we need one custom flag bit, leaving only 5 bits for fixed operands (32 operands) for 6 bits; and the same for types. However, one good thing about this is that the opcode+type forms a 12-bit dispatch, which is a somewhat manageable 4K. To reduce that farther we could assign some of the type bits to type flags (like promise/thunk) which essentially cause calls to wrappers before and/or after the main instruction. But then we'd have less than 32 types.