proj-oot-ootAssemblyNotes24

---

190529

we need to take a hard look at whether Boot is meeting its goals: i'm not sure that it's any easier to implement than RISC-V RV32I.

it provides other benefits:

it's better than WASM (no weird structured control flow), simpler than JVM, much simpler than CIL, simpler and lower level than Lua, and simpler than ARM (esp. in instruction encoding). But it may be more complicated than RISC-V. And RISC-V is more popular, so even if the two were equally complex, RISC-V should win.

i used to be scared of RISC-V's instruction encoding variants, but now we have a bunch of those too! And ours are (right now, at least) even less regular than theirs, and also theirs are more thought out and have a bunch of properties that they claim are important for simple, efficient hardware decoding:

Yes, their 32-bit and 64-bit platforms differ in more than just pointer size, which makes it harder for us to write code targeting both. But we already have an unknown pointer size in Boot; is it really that much trouble to have distinct 32-bit and 64-bit targets? It's probably not enough trouble to justify starting from scratch toolchain-wise with a new assembly language.

Yes, for our purposes, it's slightly simpler to include the necessary syscalls as opcodes. But again, not worth the complexity of introducing a new assembly language.

the only thing that i think might really be important is pointer opaqueness. I'm imagining interop situations where we are running inside a high-level platform like Python, and we want to interoperate with its data structures. In this context we don't have pointers, we have opaque references. We can store them, but not in integers. We can't do pointer arithmetic on them even if we want to. I'm imagining that we will be accessing struct fields (e.g. of objects) by assigning a numerical offset to each field and then adding that offset to the base 'reference', which is why i include the operation of adding an int to a pointer. (now that i say this, it strikes me that even blt-ptr (bltp) is too much, ok i just took that out).
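a minimal sketch of what such opaque references could look like when hosted on Python. Everything named here (OpaqueRef, Point, FIELD_NAMES, add_int_ptr, load, store) is made up for illustration and is not part of Boot; the per-type field numbering table is an assumption about how offsets might be assigned.

```python
class OpaqueRef:
    """Hypothetical opaque reference: a host object plus a field number."""
    __slots__ = ("_base", "_offset")

    def __init__(self, base, offset=0):
        self._base = base
        self._offset = offset
    # Deliberately no __int__/__index__: a reference cannot become an integer.


class Point:
    """Stand-in for some host-language object we want to interoperate with."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


# Assumed field numbering, one table per host type.
FIELD_NAMES = {Point: ("x", "y")}


def _field_name(ref):
    names = FIELD_NAMES[type(ref._base)]
    if not 0 <= ref._offset < len(names):
        raise IndexError("reference points outside the object")
    return names[ref._offset]


def add_int_ptr(ref, n):
    """The one arithmetic operation allowed on a reference: ref + int."""
    if not isinstance(n, int):
        raise TypeError("offset must be an integer")
    return OpaqueRef(ref._base, ref._offset + n)


def load(ref):
    return getattr(ref._base, _field_name(ref))


def store(ref, value):
    setattr(ref._base, _field_name(ref), value)


p = OpaqueRef(Point(3, 4))
y_ref = add_int_ptr(p, 1)   # field number 1 is 'y' under the assumed numbering
store(y_ref, 10)
```

there is no way to turn `p` into an integer or to subtract two references here; the only things a program can do with one are store it, add an int to it, and load/store through it.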

in older versions of Boot we had systems that made the pointer size totally opaque; the program didn't even have to know it for memory layout b/c memory was accessed in terms of pointer-sized words. Should we go back to that?

In any case, we should at least work on our instruction encoding to make it more regular, more like RISC-V.

i think the takeaways are:

---

ok so thinking a little more, we need to go back a little to the more complex system where memory was a linear sequence of locations of opaque type and size. The whole reason that we want totally opaque pointers is to be able to use Boot code in situations like running on top of Python: memory we allocate is Python arrays, and we access Python objects by assigning a numbering to their fields, then treating the object reference as a pointer and adding integers to that pointer to specify the fields. The ability to do this is why Boot is more useful for interop than RISC-V.
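a rough sketch of the "memory we allocate is Python arrays" half of this, under the assumption that one memory location is one list slot that can hold any value. The names Region, Ptr, alloc, add_int_ptr, load and store are hypothetical, not Boot definitions.

```python
class Region:
    """Hypothetical memory region: a Python list of opaque locations."""
    def __init__(self, n_locations):
        self.cells = [None] * n_locations


class Ptr:
    """Opaque pointer into a region; programs never see the index as an int."""
    def __init__(self, region, index):
        self.region = region
        self.index = index


def alloc(n_locations):
    """Allocate a fresh region and return a pointer to its first location."""
    return Ptr(Region(n_locations), 0)


def add_int_ptr(ptr, n):
    return Ptr(ptr.region, ptr.index + n)


def _check(ptr):
    if not 0 <= ptr.index < len(ptr.region.cells):
        raise IndexError("pointer outside its region")


def load(ptr):
    _check(ptr)
    return ptr.region.cells[ptr.index]


def store(ptr, value):
    _check(ptr)
    ptr.region.cells[ptr.index] = value


block = alloc(3)
inner = alloc(1)
store(block, 42)                      # a location can hold an integer...
store(add_int_ptr(block, 1), inner)   # ...or a pointer, in exactly one slot
```

the program never learns how many bytes an int or a pointer takes; each occupies one location, which is the sense in which the locations have opaque type and size.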

so some considerations:

So, TODO:

---

so far i'm leaning towards/TODO:

---

if our memory locations could literally even be references to consecutive fields in a C struct, then some of those C struct fields can hold i32s, some can't, some can hold pointers, some can't, and this changes for different types of structs. So in that case, either you have to check the size of things at every single location, or the program has to have some knowledge at compile-time.

And if you are relying on the program to have compile-time knowledge about what fits where, then is there anything left to be done? And in that case, are we really any better off than RISC-V?

Perhaps we want the Boot program to have the POSSIBILITY of doing low-level interfacing like this, but to also have a sort of 'general case' memory type that it can do internal computations in, which is more predictable?

here's a few cases to consider:

If we rule out the idiosyncratic memories then things probably become more tractable. This means that memory locations never refer to things like references to consecutive fields in a C struct -- you would treat such things as fields of bytes, instead. But in this case how would we interop with structs on top of a typed HLL like Haskell?

---

so, i think that direct interop with external data structure layout is always going to require either program knowledge, or a debilitating amount of runtime introspection. When we say that we want Boot programs to be universal, all that we can really ask for is for the existence of uniform memory for internal data structures -- so e.g. if a hash map library is written in Boot, we want to be able to re-use that same code when running on top of Haskell, on top of Python, or on top of raw assembly. Now again, we can do this at compile time (have constants for the size of ptrs, the size of i32s, the size of i16s), or we can do this at runtime (introspection/sysinfo, possibly additional instructions such as addsizeof to make it easier to iterate through memory, although we still need to do other stuff when we need to e.g. compute how many memory locations an array of 100 ptrs takes up).
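to make the "how many memory locations does an array of 100 ptrs take" question concrete, here is a small sketch. The SysInfo record, the two example platforms and their size values are assumptions for illustration (sizes are counted in memory locations, and alignment/padding is ignored); nothing here is an actual Boot sysinfo interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SysInfo:
    """Hypothetical platform description: sizes measured in memory locations."""
    ptr_size: int
    i32_size: int
    i16_size: int


# Two assumed hosts: a byte-addressed 32-bit machine, and a Python host
# where every value occupies exactly one list slot.
RAW_32BIT = SysInfo(ptr_size=4, i32_size=4, i16_size=2)
ON_PYTHON = SysInfo(ptr_size=1, i32_size=1, i16_size=1)


def array_locations(info, kind, count):
    """Locations needed for `count` items of `kind` ('ptr', 'i32' or 'i16')."""
    return getattr(info, kind + "_size") * count


def bucket_layout(info):
    """Field offsets for a made-up hash map bucket: i32 key, ptr value, ptr next.

    Returns (key_offset, value_offset, next_offset, total_size).
    """
    key_off = 0
    value_off = key_off + info.i32_size
    next_off = value_off + info.ptr_size
    total = next_off + info.ptr_size
    return key_off, value_off, next_off, total


print(array_locations(RAW_32BIT, "ptr", 100), array_locations(ON_PYTHON, "ptr", 100))
print(bucket_layout(RAW_32BIT), bucket_layout(ON_PYTHON))
```

the same `bucket_layout` code serves both hosts; whether `info` is a set of compile-time constants or comes from a runtime query only changes when the offsets get computed, not the library logic.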

so, can we/should we just use RISC-V for this? maybe... the semantics we are using are pretty different (pointer arithmetic is invalid) but that may not be a problem. should think further. my instinct is no, the semantics are so different that you don't gain much by using it, so just make something new, it'll confuse ppl less anyways. but that may just be Not Invented Here syndrome.

---

okay i've thought about this a little today, i think it's still worth doing, and here's why:

the other option we are considering is just using RISC-V, but adapted for our case.

this means just specifying that pointers are opaque.

so what happens when you load a pointer? what instruction do you use? We need to add a 'load pointer' instruction, and a 'store pointer' instruction.

and what happens when you add an int to a pointer? what instruction do you use? You could just use 'add'. But then the VM has to check what type is in the registers each time it encounters an 'add', in order to know whether we are adding integers, or adding one pointer and one integer. it would be better to have an add-int-ptr, like Boot.

what happens if you try to do pointer arithmetic? it fails at runtime.

So, we need at least some new instructions (load and store pointer) and could really use some others (add int to pointer). That's already probably enough to make us think about a new language. And pointer registers seem to fit really well too.
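a sketch of the difference, as a toy interpreter. The overloaded `add_overloaded` has to inspect operand types every time it runs, while a machine with separate int and pointer register banks and a dedicated add-int-ptr does not. The Ptr and Machine classes, the 32-bit wraparound and the register count of 8 are all assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit integer registers


class Ptr:
    """Toy opaque pointer: a base object plus an offset."""
    def __init__(self, base, offset=0):
        self.base = base
        self.offset = offset


def add_overloaded(a, b):
    """One 'add' for everything: the VM must check types on every execution."""
    if isinstance(a, int) and isinstance(b, int):
        return (a + b) & MASK32
    if isinstance(a, Ptr) and isinstance(b, int):
        return Ptr(a.base, a.offset + b)
    if isinstance(a, int) and isinstance(b, Ptr):
        return Ptr(b.base, b.offset + a)
    raise TypeError("pointer arithmetic is invalid")  # e.g. ptr + ptr


class Machine:
    """Separate register banks, so each opcode already knows its operand types."""
    def __init__(self, n_regs=8):
        self.int_regs = [0] * n_regs
        self.ptr_regs = [None] * n_regs

    def add(self, rd, rs1, rs2):
        # int + int only; no type check needed
        self.int_regs[rd] = (self.int_regs[rs1] + self.int_regs[rs2]) & MASK32

    def add_int_ptr(self, pd, ps, rs):
        # ptr + int only; operands come from different banks by construction
        p = self.ptr_regs[ps]
        self.ptr_regs[pd] = Ptr(p.base, p.offset + self.int_regs[rs])


m = Machine()
m.int_regs[1], m.int_regs[2] = 5, 7
m.add(0, 1, 2)
m.ptr_regs[1] = Ptr("some host object")
m.add_int_ptr(0, 1, 0)
```

in the second design, adding two pointers is not something that fails at runtime; there is simply no instruction that encodes it.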

we could always do an extension to RISC-V tho...

and there are other minor reasons to go with a new language. We're going to have to redefine the RISC-V semantics a little (no pointer arithmetic), and we're going to have to add the syscall fns. It's more confusing to have an overlay like this than just to make our own new thing.

---

can we make better immediate formats? actually i don't buy RISC-V's idea of splitting the immediates to keep the sign bit in the most-significant-bit. I think it's simpler to keep them together.
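for reference, here is what the split looks like for the RISC-V S-type (store) format, where imm[11:5] sits in instruction bits 31:25 and imm[4:0] in bits 11:7, next to a decoder for a contiguous 12-bit field. The contiguous layout (immediate in bits 31:20) is only an assumed example of "keeping them together", not a proposed Boot format, and the function names are made up.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def encode_s_imm_riscv(imm):
    """Place a 12-bit immediate into the S-type immediate fields only."""
    imm &= 0xFFF
    hi = (imm >> 5) & 0x7F   # imm[11:5] -> instruction bits 31:25
    lo = imm & 0x1F          # imm[4:0]  -> instruction bits 11:7
    return (hi << 25) | (lo << 7)


def decode_s_imm_riscv(instr):
    """Reassemble the split S-type immediate, then sign-extend."""
    hi = (instr >> 25) & 0x7F
    lo = (instr >> 7) & 0x1F
    return sign_extend((hi << 5) | lo, 12)


def encode_imm_contiguous(imm):
    """Assumed alternative: all 12 immediate bits together in bits 31:20."""
    return (imm & 0xFFF) << 20


def decode_imm_contiguous(instr):
    return sign_extend((instr >> 20) & 0xFFF, 12)


for value in (-2048, -4, 0, 5, 2047):
    assert decode_s_imm_riscv(encode_s_imm_riscv(value)) == value
    assert decode_imm_contiguous(encode_imm_contiguous(value)) == value
```

in software the contiguous decoder is one shift and one mask fewer. The RISC-V spec's stated motivation for the split is hardware: register fields stay in the same bit positions across formats and the sign bit is always bit 31, so how much the simpler layout costs depends on whether the target is a software VM or a hardware decoder.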

---

this 'BootX should have no runtime requirement over Boot' is getting too expensive. I put in a special 32-to-8 and 8-to-32 cpy (mov) instruction (with its own instruction format) so we could access all 32 BootX registers from Boot. And i put in extra cpfis, cptis, cpfps, cptps (cpy from int stack, cpy to int stack, cpy from pointer stack, cpy to pointer stack) instructions so that we could access deep into the stack without pushing and popping. And i kept the intstack and ptrstack at capacity 8 so that those instructions can access all of it.

But this is getting ridiculous. I'm going to take all that out and assume that BootX