proj-oot-ootAssemblyNotes24

Difference between revision 6 and current revision

No diff available.

---

190529

we need to take a hard look at if Boot is meeting its goals: i'm not sure that it's any easier to implement than RISC-V RV32I.

it provides other benefits:

it's better than WASM (no weird structured control flow), simpler than JVM, much simpler than CIL, simpler and lower level than Lua, and simpler than ARM (esp. in instruction encoding). But it may be more complicated than RISC-V. And RISC-V is more popular, so even if it were equal complexity, RISC-V should win.

i used to be scared of RISC-V's instruction encoding variants, but now we have a bunch of those too! And ours are (right now, at least) even less regular than theirs, and also theirs are more thought out and have a bunch of properties that they claim are important for simple, efficient hardware decoding:

Yes, their 32-bit and 64-bit platforms are different not only in pointer size, which makes it harder for us to write code targeting both. But we already have an unknown pointer size in Boot, is it really that much trouble to have distinct 32-bit and 64-bit targets? And it's probably not so much trouble to make it worth it to have to start from scratch toolchain-wise with a new assembly language.

Yes, for our purposes, it's slightly simpler to include the necessary syscalls as opcodes. But again, not worth the complexity of introducting a new assembly language.

the only thing that i think might really be important is pointer opaqueness. I'm imagining interop situations where we are running inside a high-level platform like Python, and we want to interoperate with its data structures. In this context we don't have pointers, we have opaque references. We can store them, but not in integers. We can't do pointer arithmetic on them even if we want to. I'm imagining that we will be accessing struct fields (e.g. objects) by assigning a numerical offset for each field and then adding that offset to the base 'reference's, which is why i include the operation of adding an int to a pointer. (now that i say this, it strikes me that even blt-ptr (bltp) is too much, ok i just took that out).

in older versions of Boot we had systems that made the pointer size totally opaque; the program didn't even have to know it for memory layout b/c memory was accessed in terms of pointer-sized words. Should we go back to that?

In any case, we should at least work on our instruction encoding to make it more regular, more like RISC-V.

i think the takeaways are:

---

ok so thinking a little more, we need to go back a little to the more complex system where memory was a linear sequence of locations of opaque type and size. The reason is that the whole reason that we want totally opaque pointers is in order to use Boot code in situations like when we are running on top of Python; memory we allocate is Python arrays and we access Python objects by assigning a numbering to their fields and then treating the object reference as a pointer and adding integers to that pointer to specify the fields. The ability to do this is why Boot is more useful for interop than RISC-V.

so some considerations:

So, TODO:

---

so far i'm leaning towards/TODO:

---

if our memory locations could literally even be references to consecutive fields in a C struct, then some of those C struct fields can hold i32s, some can't, some can hold pointers, some can't, and this changes for different types of structs. So in that case, either you can to check the size of things at every single location, or the program has to have some knowledge at compile-time.

And if you are relying on the program to have compile-time knowledge about what fits where, then is there anything left to be done? And in that case, are we really any better off than RISC-V?

Perhaps we want the Boot program to have the POSSIBILTY of doing low-level interfacing like this, but to also have a sort of 'general case' memory type that it can do internal computations in, which is more predictable?

here's a few cases to consider:

If we rule out the idiosyncratic memories then things probably become more tractable. This means that memory locations never refer to things like references to consecutive fields in a C struct -- you would treat such things as fields of bytes, instead. But in this case how would we interop with structs on top of a typed HLL language, like Haskell?

---

so, i think that direct interop with external data structure layout is always going to require either program knowledge, or a debilitating amount of runtime introspection. When we say that we want Boot programs to be universal, all that we can really ask for is for the existence of uniform memory for internal data structures -- so e.g. if a hash map library is written in Boot, we want to be able to re-use that same code when running on top of Haskell, on top of Python, or on top of raw assembly. Now again, we can do this at compile time (have constants for the size of ptrs, the sizeof i323s, the size of i16s), or we can do this at runtime (introspection/sysinfo, possibly additional instructions such as addsizeof to make it easier to iterate through memory, although we still need to do other stuff when we need to e.g. compute how many memory locations an array of 100 ptrs takes up).

so, can we/should we just use RISC-V for this? maybe... the semantics we are using are pretty different (pointer arithmetic is invalid) but that may not be a problem. should think further. my instinct is no, the semantics are so different that you don't gain much by using it, so just make something new, it'll confuse ppl less anyways. but that may just be Not Invented Here syndrome.

---

okay i've thought about this a little today, i think it's still worth doing, and here's why:

the other option we are considering is just using RISC-V, but adapted for our case.

this means just specifying that pointers are opaque.

so what happens when you load a pointer? what instruction do you use? We need to add a 'load pointer' instruction, and a 'store pointer' instruction.

and what happens when you add an int to a pointer? what instruction do you use? You could just use 'add'. But then the VM has to check what type is in the registers each type it encounters an 'add', in order to know whether we are adding integers, or adding one pointer and one integer. it would be better to have an add-int-ptr, like Boot.

what happens if you try to do pointer arithmetic? it fails at runtime.

So, we need at least some new instructions (load and store pointer) and could really use some others (add int to pointer). That's already probably enough to make us think about a new language. And pointer registers seem to fit really well too.

we could always do an extension to RISC-V tho...

and there are other minor reasons to go with a new language. We're going to have to redefine the RISC-V semantics a little (no pointer arithmetic), and we're going to have to add the syscall fns. It's more confusing to have an overlay like this than just to make our own new thing.

---

can we make better immediate formats? actually i don't buy RISC-V's idea of splitting the immediates to keep the sign bit in the most-significant-bit. I think it's simpler to keep them together.

---

this 'BootX? should have no runtime requirement over Boot' is getting too expensive. I put in a special 32-to-8 and 8-to-32 cpy (mov) instruction (with its own instruction format) so we could access all 32 BootX? registers from Boot. And i put in extra cpfis, cptis, cpfps, cptps (cpy from int stack, cpy to int stack, cp from pointer stack, cpy to pointer stack) instructions so that we could access deep into the stack without pushing and popping. And i kept the intstack and ptrstack at capacity 8 so that those instructions can access all of it.

But this is getting ridiculous. I'm going to take all that out and assume that BootX? will need extra state and can store it in allocated memory.

---

i used to have this:

Five formats. For each format, from LSB to MSB, the fields are:

In graphical form (from MSB to LSB):

but i'm gonig to change it to consolidate i5, i8, i12 into one format with a 2-bit signed immediate and a 3-bit opcode. The instructions can then add in the other operands themselves. This is slightly less efficient (now JPC only gets an 11-bit signed immediate; also the encoding logic has to split up the immediate), but it's slightly simpler/less scary (b/c 2 less formats), and it has the property that RISC-V likes that the signed bit of the immediate is in the same place everywhere.

i saved the former one in boot_old/pyboot_190529_i5_i8_i12_encoding/encoding.py

later that day:

ok i did that but i think it's more complicated, not less.

i saved it in

../boot_old/pyboot_190530_r_i_only_encoding

Also i reread the RISC-C comments about keeping the sign bit in one place and i think they are talking about when they have branch offset immediates in units of 2; rather than just multiplying the immediate by 2 they do some complicated thing where they swap around the location of bits, but they leave the sign bit in one place, and i think that's what they were talking about.

Regarding keeping the registers in one place, since in the immediate formats some registers are being used as immediates, i think it's fine for those to move, which is what the old format did.

Regarding the simplicity of having only R- and I- formats, i found it was cleaner to add in R2 R1 R0 subformats anyways, and i think the complexity of juggling the immediate bits outweighs not having I5 I8 I12 formats.

So forget about swapping around the sign bit, that's stupid.

But i like having a fixed number of opcode bits for the Is, even at the expense of reducing JPC range by 2x -- it's probably much less efficient in terms of code density, but it's cleaner. So let's do:

---

ok i've mostly implemented Boot now but i havent finished testing it yet. It took me much of yesterday to write the initial encode/decode logic, then today I spent about half a day changing the encoding, and the other half writing the actual VM, and the first draft of the VM is done now but i've only tested the first few instruction. So i think it's likely that this is a two-day job in total (which i guess means a two-weekend job for hobbyists) (although, even though someone else wouldn't have to spend time changing the design of Boot itself, they would have to spend more time reading and understanding the spec i wrote, so maybe it really is a three-day job).

Anyhow, the point is, unlike the previous attempt, this is a success, in that:

it's sufficiently easy to implement!!!

(although i dunno if i'd call it 'dead simple'; i could have left out the I encodings b/c 32-bits is bigger anyways, i could have left out the rtwo rone rzero/MORE instructions, i could have chosen either a register or a stack machine rather than mixing them, i could have had a variable-length encoding that just had whatever data was needed after a one-byte opcode).

However i noticed that i now think that BootX?, as i currently envision it, wouldn't really be that much more complicated to implement, at least not if you don't include all the instructions. I think much of the complexity that i ran into before came from complicated ideas for the addressing modes etc, for example, having 'stack-aware' instructions that take an operand of $S to mean, not auto-push-pop to/from the stack, but rather, do something to the stack itself. Also my CALL3 idea led to difficulty in terms of how the return argument from the instruction being called would get assigned to the register indicated in the destination operand (although as i think about it at this moment, i don't see the problem; just have the RET instruction look at the jump target and if it's right after a CALL3 instruction then look at that instruction to get the return register).

Anyway, so i've simplified the addressing modes, and i don't think this would be a problem anymore. In fact, the encoding might even be easier, since with bigger operands the need for immediate encoding formats is not so important (with a 32-bit encoding you wouldn't need those immediate formats so encoding/decoding gets simpler). Also, now that Boot and BootX? have different numbers of registers, the story of incrementally extending Boot to BootX? is not so good as it was. And because the 16-bit encoding was so cramped (only +-1KiB? direct jumps), i felt the need to let a constant table creep in to Boot, which is complexity we wouldn't need with BootX? b/c we could have a 21-bit immediate constant.

And there was other stuff i was dreaming about with BootX?, about having it be a good emulator and having all these hooks and having capabilites and having polymorphic functions, but now i think that that belongs only in OVM if anywhere. BootX? is just a lowest-common-denominator target platform to make it easier for me to port everything on top of it.

So what about just calling the thing we're calling BootX?, "Boot", and having "Boot0" just be a subset of Boot's instructions (also Boot0 would only use the 32-bit instruction encoding, whereas Boot could have more)?

To be clear, what i currently mean by "Boot" (was: BootX?) is:

32-bit fixed length encoding: 3 encoding length format bits, 6 opcode bits, 2 opcode addr mode bits, 3 x (5 operand bits, 2 addr mode bits)

Boot0 would have the addressing modes (this removes the necessity for us to provide immediate-mode instruction variants like ADDI that are only needed b/c we don't have an immediate constant addr mode), but might not have the instructions supporting any types beyond int and ptr. Boot0 would have the 32x2 registers, plus 32x2 stack items.

to be clear, i am really excited that the 16-bit encoding of Boot seems to be really complete and dense, possibly even denser than anything else out there. And working with those constraints did help me hone my ideas about what is really essential and fundamental at this level of the stack of intermediate languages. But now i have those ideas honed, and the primary remaining goal for Boot is ultimately just to be an easy-to-implement, easy-to-use, portable target, not to win contests about code density and beautiful compactness. If the 32-bit format is easier to compile to than the 16-bit format (b/c of the greater number of registers), and not tons harder to implement, and if we'd have to compile Boot-32 to a Boot-16-only machine (due to greater number of registers in Boot-32), rather than easily extend the Boot 16 machine , then we should just start out implementing the 32-bit one, and consider the 16-bit format just an optimization.

---

if we start out with 32-bits though i'm just bothered by having so much state. If there's 32 registers, times two banks, and 16 stack locs, times two stacks, on a 64-bit platform that's:

32*4 + 32*8 + 16*4 + 16*8 = 576 bytes of state

so how about just going back to 8 addressing modes. Now we have:

16*4 + 16*8 + 16*4 + 16*8 = 384 bytes of state

Still a ton but not quite as excessive.

First three registers are still 0, stack, tos, and still have their special behavior (this is a little redundant, but it make them more useful in the other addr modes, and makes everything more uniform). Except i had a new idea: there should be a 'zero page' of pre-allocated memory, and the zero register in the pointer register bank should point to it, rather than be the PC (for the PC, there is a PC-relative addr mode, see below).

Now we have:

3 encoding length format bits, 4 opcode bits, 3 opcode addr mode bits, 1 opcode bit to control whether or not addr modes are in use or are just more opcodes, 3 x (4 operand bits, 3 addr mode bits)

(16*8 + 16) opcodes = 144

addr modes (note that some of these have two forms, int and ptr, for the two 'banks'):

Should &0 holds a null pointer? Or maybe some platforms won't let us have a null pointer? Let's see:

so... i guess this doesn't decide it? Sounds like on most systems, there is a null pointer so we are good, but there isn't always, but otoh, even on systems where there isn't, we could always allocate one byte somewhere and then declare that to be OUR null pointer.

Regarding which instructions take forced immediate operands (which is important/special because then the 3 addressing mode bits are part of that immediate), we could just start by saying that only that weird bank of 16 opcodes in the 'immediate' mode do this (or, do we want to use 'immediate' mode as a CALL3 to nearby PC-relative locations? i don't think that's very useful...).

---

i dunno, now that i look at all that, i think it does make it more complex after all: