
proj-oot-ootAssemblyNotes12

if SootB? is also SHORT, i guess we really should find a way to at least call custom instructions from there.

also, if we've 'broken the seal' of non-fixed-size instructions in SHORT, OR if we use 16-bit fixed size instructions, should consider if we can do better and get some other 'addressing modes' in there besides pure stack.

if we throw some register addressing in there, that doesn't really increase the complexity of implementation very much imo, it's still a good bootstrap language.

---

we basically want 3 addr modes:

we need at least ~4 bits for opcode, plus some way to call longer 'custom' instructions.

3-operand format is nice but 2-operand mode is probably more efficient.

do we need to call first-class instructions? nah, save that for MEDIUM. So we never need the 4th (3rd) operand.

With fixed length, we'd need at least 16 bits; because you really want at least 8 bits for absolute jumps, and you want the bitlength to be a power of 2. Another reason for this is 4 opcode bits + 2 operands * (2 addr mode bits) = 8 bits, without even putting in the operands yet.

So let's say we have 16 bits.

For absolute jumps, that gives us an instruction format with 4-bit opcodes and 12 bits of data.

For 2 operands with 2 bit addr modes, that's (16 - 4-bit opcode - 2*2 addr mode bits)/2 = 4 bits per operand left.

Now if we need more opcode bits, so that we can have more instructions, that reduces the above number. For 3 bits per operand, we can have 6-bit opcodes, and for 2 bits per operand, 8-bit opcodes.

but recall that since this is SHORT mode, we need 2 form bits now.
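To make the bit-budget arithmetic above concrete, here's a tiny sketch (the field widths are just the ones being mooted in these notes, nothing settled):

```python
# Bit-budget sketch for a fixed 16-bit, 2-operand instruction word.
def operand_bits(total=16, form_bits=0, opcode_bits=4, addr_mode_bits=2, operands=2):
    """Bits left per operand after form, opcode, and addr-mode fields."""
    return (total - form_bits - opcode_bits - operands * addr_mode_bits) // operands

operand_bits(opcode_bits=4)               # 4 bits per operand
operand_bits(opcode_bits=6)               # 3 bits per operand
operand_bits(opcode_bits=8)               # 2 bits per operand
operand_bits(opcode_bits=4, form_bits=2)  # with the 2 SHORT form bits: 3 bits per operand
```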

---

Since we only really need 3 addr modes, and since stack addr mode doesn't need an operand, we could compress things further:

---

So we're looking at:

---

wait, we COULD do first-class functions in here pretty easily, since we have 6 opcode bits, if we just use the first 2 bits to indicate when that happens;

---

if we added a 4th format we could also pack 3 4-bit stack opcodes in there.. jeez this is getting to be a lot of formats. Also, that would no longer be fixed-length. So mb not? Also, will chains of 3 stack-addressed instructions be all that common when we allow register addressing too?

---

could we possibly reduce this whole thing to about 8 bits?

---

so that reminds me, we'd really like at least 4 addr modes:

---

previous proposal is nice but 5-bit immediates aren't much; that only gives us a JMP table with 32 entries. We could accept variable-length addressing w/r/t the FORM bits. This wouldn't be too much more crazy than the old, simpler, 4-bit proposal:

---

so how about:

2 formats:

well now we're getting into the same sort of packed SHORT format we had before, where the first (and maybe the last) instruction has a special encoding, to accommodate those 2 form bits. But now we're trying to do a 2-operand format instead of 1. The old ootAssemblyNotes7 SHORT format used a system similar to the limitations on expressions in terms of pushes/pops and parenthesis described slightly earlier in ootAssemblyNotes11. The idea is that you probably want to PUSH the outcome of most of the instructions except at the end.

---

So how about:

2 formats:

hmm:

---

for the SHORT ones:

(inout)2 (inout)2: 2x 2-bit ordinary (see below)
(inout)2 imm2: 2-bit ordinary + 2-bit immediate
POP, or if it's an imm4, then it becomes an imm3

instructions:

plus one MEDIUM instruction (LOADI 16-bits; note it can fit more than 16 bits but for SootB? we don't assume that our regs can hold more than 16 bits; i guess we should offer variants of LOADI that load into register 0, register 1, stack, or TOS? ordinary MEDIUM mode can support each of those. Maybe we also offer a 'STOREI' which writes a 16-bit immediate to any 11-bit address in (register?) memory)
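A minimal sketch of how a bootstrap interpreter might steer the LOADI variants mooted above (the variant names and the state layout are assumptions, not anything from these notes):

```python
# Hypothetical LOADI dispatch: a 16-bit immediate steered to register 0,
# register 1, a stack push, or top-of-stack.
def loadi(state, variant, imm16):
    if variant == 'r0':
        state['regs'][0] = imm16
    elif variant == 'r1':
        state['regs'][1] = imm16
    elif variant == 'push':
        state['stack'].append(imm16)
    elif variant == 'tos':
        state['stack'][-1] = imm16
    else:
        raise ValueError('unknown LOADI variant: %r' % variant)
    return state
```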

NOTE: in MEDIUM, 'BR' in user-level code is constrained to only use 8-bits (one of the two inputs). The other input is reserved for the implementation. It is expected that the implementation uses this when inlining custom instructions (which increases the distance between points in user-level code).

todo: for hierarchical modules, need a way to LOADK an imported module into somewhere, and then LOADK from that memory location to access something exported by that module. I guess just treat modules as address subspaces.

---

y'know, i'm backing off the idea of unifying SootB? and SHORT. The thing is, SHORT is supposed to be efficient, and we won't know what's efficient until we profile, and we may want to change it each release to be most efficient. Whereas SootB? is for bootstrapping, so it probably shouldn't change so rapidly. Otoh SootB? bootstrapping will run more efficiently if code density is high.

i guess the answer is, unify them but not yet? mb until then just use the subset-of-MEDIUM thing?

---

eh, maybe unifying SootB? and SHORT is good after all.

i guess what's really going on is that i think it's not worth spending as much time and energy as i have been on the details of SootB? (and even OotB?) assembly encodings at this point. This is supposed to be just the underpinnings for an easy-to-implement, interoperable implementation of Oot. And Oot isn't even designed yet.

---

i'm considering killing the second 4 addr modes and replacing them with a 'meta' bit. This might fit in with 3-Lisp too (although there the 'reflection procedures' were declared as such at the declaration site, not the callsite).

we also want some way to do 'stack' addressing, though. Maybe mmap it after all.

as for the sort of thing a 'meta' bit could be for, i'm copying this bit in here from ootAssemblyNotes7:

---

(following the previous) If meta level 2 is specifying a metaview as the view when targeting an edge (that is, the target of an edge is a node, but we can specify the metaview of that node, and then from there we can traverse from that node to other 'nodes' representing reified edges), then maybe meta level 3 is like an edge whose target is a view which is a graph which represents the different meta levels and their interrelationships.

 noting that there's different ways to construe meta (eg application level meta, implementation level (pointers), etc), as noted above

---

the 'meta' bit could choose between one of two 'view registers', which (similar to the capability registers) contain something which changes the 'view mode'.

my sense however is that 'meta' should be a mode within addressing modes (eg a direct product with existing addressing modes, doubling their amount; a modifier bit attached to addressing modes).

---

the interesting thing is that the 'unbox' bit and 'meta' bit are going in opposite directions, in a sense. A BOXED thing is a pointer (when the unbox bit is OFF), and a meta thing is a pointer (when the meta bit is ON); a pointer is more complicated (more meta), so the complicated case occurs when the unbox bit is off but when the meta bit is on.

of course, if i had made it a 'box' bit that wouldn't be. But there's a reason it's an 'unbox' bit; treating a pointer in memory as a pointer is more primitive, less abstract, than pretending it is the thing it points to.

i think it's because 'unbox' refers to application level/platform stuff, whereas 'meta' refers to pure abstraction.

---

copied from ootModuleNotes1:

About C and C++ header files, and the C++ modules proposal [1]:

recommendations:

---

could we just skip both SootB? and OotB? and go directly to Root Core?

No, i don't think so, because it's more intimidating to write a parser than to write an interpreter for a linear instruction stream.

---

Could we just trust OotB? code and dispense with capabilities (capabilities would then be on the Root Core level)? Maybe... this gives up the ability to execute OotB? code in a sandbox, but perhaps we don't really need that, perhaps we only need to sandbox in the Oot Core interpreter.

Of course, OotB? could still be an efficient bytecode REPRESENTATION for Oot Core. But i'm saying that the bootstrapping OotB? interpreter need not handle capabilities.

And we could reduce the primitives instructions for (R?)Oot Core down to what we are getting for SHORT mode (SootB?) anyways..

And we could write the module loading code later, as part of the Root Core implementations, so that the bootstrapping interpreter doesn't have to do it.

So then we'd have only:

to bootstrap a naive port, just implement Primitive OotB?. To make it faster:

Note that Root Core can be encoded in OotB?.

So what about the other difficult thing in OotB?, the unboxing (and 'meta' bits, and LONG, etc)? Just leave that in the encoding, but don't use it in OotB?; that's for Root Core represented using the OotB? encoding!

hmmm.. i kinda like this... seems like the Root Core interpreter may not be the best place for language services, though? I thought we wanted to implement those beneath the implementation of language constructs, at least naively. Well... i guess that qualifies, actually.. the Root Core language implementation will be much simpler, and we are an interpreter tower level below the Oot Core implementation.

---

ok so if we are following the immediately previous section, then the tower can be easily conceptualized:

OotB? is a language in itself, but it's also an AST representation encoding which can be used to encode Root Core or Oot Core code. When we come to additional 'higher-level' things like that for which it is not clear how OotB? should support them, we should try to push them up to Root Core. For example, there is an unboxing bit in OotB? MEDIUM format, but this isn't used in OotB?, only in Root Core. Also, when there is something that would put too much implementation burden on porters, we should push it up to Root Core. For example, capabilities.

In a naive Oot implementation, then, the Oot program is syntactic sugar for an Oot Core program. The Oot Core program runs on an interpreter implemented in Root Core, which is itself running on a Root Core interpreter implemented in OotB? (which is itself syntactic sugar for a program composed only of OotB? primitives).

---

mb should have a different name for Oot Bytecode, the encoding format, and OotB?, the language.

Maybe Oot Bytecode vs Oot Assembly.

---

the BCPL language is interesting. BCPL was the predecessor to B, which was the predecessor to C. "B was essentially the BCPL system stripped of any component Thompson felt he could do without." BCPL was one of the first languages to use a bytecode, called OCODE, for portability. Later, a lower-level assembly VM was added below the bytecode, called INTCODE. INTCODE was initially very simple, with 8 primary functions: LOAD, ADD, STORE, JUMP, JUMP ON TRUE, JUMP IF FALSE, CALL, EXECUTE OPERATION, with 23 more functions accessible via EXECUTE OPERATION, and then about 5-10 OS interfacing functions also in EXECUTE OPERATION.

There were two accumulators, two index registers, an address register, and the PC. Like OotB?, the size of words was unspecified. An instruction was "a 3 bit function field, and an address field of unspecified size, 2 bits for index modification and an indirection bit"

See proj-plbook-plChSingleIntermedLangs for details.

---

hmm, as nice as it seems when i describe the idea of keeping OotB? simple and pushing anything complicated up to Root Core, it feels wrong.

So maybe keep capabilities and unboxing in OotB?.

I do think we should abandon SootB?, though.

And perhaps also Root Core should not be a wholly separate stage. I do think that the Oot Core implementation should be written in a restricted subset of Oot, and i do think we might have a special Oot implementation for this restricted subset (since it can assume static typing, etc). And so maybe when Oot is interpreted, it will be interpreted by an interpreter written in Oot Core, which is itself interpreted in an OotB? interpreter. But when compiled, i don't think ordinary Oot code should compile to this restricted subset, it should compile directly to OotB?. And since we can write platform-independent self-hosting compilers, we'll write a compiler from Oot Core to OotB?.

So the plan is:

---

another way we could have quintuples and interpret one field as 'meta' is just to reserve one field for metadata/annotations/prefix modifiers.

Eg we could take 2 bits from each of the 4 fields to make 8 bits of metadata per instruction, reducing the operand sizes from 12-bits to 10-bits (we could also interpret this metadata as being a 2-bit 'meta addr mode' plus 6 bits of 'meta operand').

Or we could also say that the last three fields (the 'true operands') should be largest, then the instruction field, then the meta field, and similarly with their addr mode bits. If the instruction field were only 8 bits, and the meta addr field were only 1 bit (so the whole meta field were 7 bits, not 8), then we could make the 'true operands' each 11 bits: 2 form bits, 1 meta addr mode bit, 6 meta operand bits, 2 instruction addr mode bits, 8 instruction operand bits, 3 x (4 addr mode bits + 11 operand bits). Lotsa nasty odd numbers in there, though. Also, the addressable instructions drop from 4k to 256. Also, that goes against the principle of allowing any register to hold a first-class function which can be immediately called. Unless... we reduce the number of registers to those addressable by the instruction field, and let the three 'true' operand fields hold larger numbers than the number of registers!

I don't think we need 8 bits of meta, and 10-bit operands are getting tight. How about 256 registers and:

2 form bits, 4 meta bits, 2 instruction addr mode bits, 8 instruction operand bits, 3 x (4 addr mode bits + 12 operand bits)
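As a sanity check that this layout fills exactly 64 bits, here's a hypothetical packer (the field order and names are assumptions; MSB-first is assumed too):

```python
# Pack the proposed 64-bit word:
#   2 form | 4 meta | 2 instr addr mode | 8 instr operand
#   | 3 x (4 addr mode + 12 operand)  = 64 bits total.
def pack(form, meta, i_mode, i_op, operands):
    assert len(operands) == 3
    word = (form & 0x3) << 62
    word |= (meta & 0xF) << 58
    word |= (i_mode & 0x3) << 56
    word |= (i_op & 0xFF) << 48
    shift = 32
    for mode, op in operands:  # three (addr mode, operand) pairs, 16 bits each
        word |= (((mode & 0xF) << 12) | (op & 0xFFF)) << shift
        shift -= 16
    return word
```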

8 instruction operand bits sounds perfect, from the perspective of how many instructions (and custom instructions) we need, and from the perspective of keeping the max number of registers down to a reasonable number. But otoh recall that we are intending to map HLL local variables to registers. 255 locals is a pretty low limit; out of the popular languages that i've looked at, only Lua and 1988 C come near that (200 local limit and 127, respectively). Also, Perl6's MoarVM? has ~700 instructions.

So maybe:

2 form bits, 1 meta addr mode bit, 4 meta bits, 2 instruction addr mode bits, 10 instruction operand bits, 3 x (4 addr mode bits + 11 operand bits)

or just:

2 form bits, 4 meta bits, 2 instruction addr mode bits, 11 instruction operand bits, 3 x (4 addr mode bits + 11 operand bits)

2k (11 bits) makes me happy in that:

buuuut... i think taking a whole OS page for the registers is pushing it. It would be nicer if the interpreter could fit the registers, and some additional internal state, onto one page.

So, mb back to:

2 form bits, 1 meta addr mode bit, 4 meta bits, 2 instruction addr mode bits, 10 instruction operand bits, 3 x (4 addr mode bits + 11 operand bits)

The issue with this is just that those 5 meta bits are totally unused right now, which is a big waste of space.

---

one reason to have 8 bits of registers is the idea of per-module custom instruction dispatch tables. With 10 bits of registers, even if we can only have 256 modules imported into any one module, that's an 18-bit dispatch table.

But even with 8 bits of registers, that's a 64k x 2 (assuming 16-bit addressing) dispatch table. Which is already way too large.
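The table-size arithmetic, spelled out (the 2-byte entry size is the 16-bit addressing assumed above):

```python
# Dispatch-table sizing: one entry per (module, opcode) pair.
def table_bytes(opcode_bits, module_bits, entry_bytes=2):
    return (1 << (opcode_bits + module_bits)) * entry_bytes

table_bytes(10, 8)  # 18-bit index -> 512 KiB of 16-bit entries
table_bytes(8, 8)   # 16-bit index -> 128 KiB, still large
```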

So maybe make the 'custom instructions' per-program instead of per-module.

---

incidentally, if you want quints, then we're getting closer to 128-bit instructions if we wanted 16 bit operands:

5 x (4 addr mode bits + 12 operand bits) = 80
5 x (8 addr mode bits + 12 operand bits) = 100
8 form bits + 5 x (8 addr mode bits + 12 operand bits) = 108

still have 20 bits free though, and we don't really need 8 addr mode bits, so this doesn't look like a great bargain.

---

i'm thinking of bringing SootB? back in the form of an 'extensible interpreter' with EXEC; that is, you can specify, in Oot code, how to handle the 2 custom addr mode bits (including the unbox bit), the form bits, and the meta bits. And how to deal with address subspaces, which can include getters and setters and capabilities. And perhaps handle 'custom instructions' too. You specify this stuff with 'plugins' for the interpreter, then you do something like 'EXEC' to tell the interpreter to start using those plugins. The underlying dispatch in the interpreter doesn't change; so this way you are not running an OotB? interpreter on top of a SootB? interpreter, you are running an extensible OotB? interpreter with plugins written in OotB?.
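A sketch of that 'extensible interpreter' shape: the core dispatch loop stays untouched on the fast path, and EXEC just installs plugins for the rare escape hatches (the hook names and instruction representation here are illustrative, not proposed):

```python
# Extensible-interpreter sketch: plugins only run when an escape-hatch bit
# (e.g. meta or unbox) is set; ordinary instructions never touch them.
hooks = {}

def EXEC(name, plugin):
    hooks[name] = plugin

def core_dispatch(instr, state):
    # stand-in for the ordinary, fixed dispatch of the interpreter
    if instr['op'] == 'add':
        state['acc'] += instr['arg']
    return state

def step(instr, state):
    for bit in ('meta', 'unbox'):          # rare path: defer to a plugin
        if instr.get(bit) and bit in hooks:
            return hooks[bit](instr, state)
    return core_dispatch(instr, state)     # common path: no plugin overhead
```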

---

here's a proposal (i think this was a quals proposal, not an actual project) for a language to define data representation format/layout:

http://tap2k.org/projects/WIL/

---

note: to prevent security hazards, mb 0 is false and 1 is true, but must admit the possibility of some other integer, eg 2 or -1, being given.

---

mb just make it uadd08 after all, why not?

well, here's a reason why not: we mandate that registers must be able to accommodate at least U15s. Less confusing for implementors this way.

---

http://cs.lmu.edu/~ray/notes/squid/ is a neat IR, close to our own in some ways. It aspires to platform representation format independence. This webpage which describes it has admirably concise wording.

in addition, because the examples at the bottom show HLL-style syntax rather than assembly style, this would make it clear to a newbie how you could possibly do this sort of HLL stuff in assembly.

---

So if we don't want a Primitive OotB? implementor to have to implement LOADMODULE, then what primitives do we need?

We could make custom instruction lookup extensible. But that seems to gut the purpose of having an extensible interpreter instead of one interpreter running on the other, because it would be really slow. Remember, in the simplest case (unbox bit zero, meta bit zero, instruction-level meta bits all zero?), we want to run an instruction without touching the plugin code.

But if we don't do that, then the implementation must at least sort of understand the concept of different modules, right? It must know which module a piece of code is currently in, and maintain a dispatch table for each module (or map them all together).

No, not if we make custom instructions per-program instead of per-module. Which is fine provided that custom instructions are really just a bootstrapping technique, not a truly modular instruction set. In this case all we need is ISDEFINE and DEFINE. Note that SYSCALLs can also be defined.
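A minimal sketch of per-program custom-instruction dispatch with just those two primitives (the table representation and handler calling convention are assumptions):

```python
# Per-program (not per-module) custom instruction table, driven by
# DEFINE and ISDEFINE as named in the notes.
_custom = {}

def DEFINE(opcode, handler):
    _custom[opcode] = handler

def ISDEFINE(opcode):
    return 1 if opcode in _custom else 0

def dispatch(opcode, *args):
    return _custom[opcode](*args)
```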

---

LONG format would be simpler if we just had a series of 64-bit length fields!

but since we're 16-bit-y, maybe a series of 16-bit length fields instead? This is still different because the grouping of fields into 'instructions' is variable-length.
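One way a 'series of 16-bit length fields' might parse (this layout, a 16-bit unit count followed by that many 16-bit units, is an assumption; the notes leave the details open):

```python
import struct

# Parse one LONG-format field: a 16-bit unit count, then that many 16-bit units.
def read_field(buf, offset=0):
    (n,) = struct.unpack_from('>H', buf, offset)
    units = list(struct.unpack_from('>%dH' % n, buf, offset + 2))
    return units, offset + 2 + 2 * n
```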

---

so i guess the main things in LONG are:

for representing arbitrary-length fields, there are a few obvious choices:

---

y'know, on an actual 16-bit machine, yes, 1k registers (which, in most implementations, probably each must be able to hold a native pointer) will take up 2k bytes of memory; but on typical machines, which are 64-bit, 1k regs will take up 8k of memory, which is two full pages! To make it take up half a page, you'd need at most ~256 regs. Which means <256 local vars, since the implementation needs some regs.

of course, most functions won't need all those regs, and better implementations will take advantage of that.

---

eh, if we're gonna have 'quints', may as well be more uniform:

2 form bits + 2 flag bits + 5 x (4 addr mode bits + 8 data bits)

dunno what that'll accomplish, maybe something like letting the meta stuff address any register. This is kind of a big waste, though, since there are 12 bits of meta stuff (and 2 custom flag bits) that aren't needed for most instructions. But it's general and clean...

and it means that even with max registers, if each register is the size of a native pointer, then the registers fit into half of a 4k page on a 64-bit machine, leaving the other half for other implementation state.

This means that we would succumb to Lua-esque restrictions on number of local vars. Maybe worse, as, to be clean, we'd probably just limit the HLL to 128 vars and let the implementation use the other 128.

We'll also have trouble inlining custom instructions that use more than a few registers. We should define a small, fixed set of scratch regs for the custom instruction. Mb the implementation is responsible for saving and restoring these via a hidden stack? Or mb each custom instruction says how many it needs? Or mb just provide the 'hidden' stack explicitly, make each custom instruction declare how much hidden stack space it needs, and make the custom instruction 'callee save' to it as needed. Callee save is insecure, so we'll have to assume that any custom instructions are maximally trusted. Since custom instructions must not be recursive, we can statically determine the maximal nesting depth and similarly the maximal hidden stack space needed once we know the set of available custom instructions.
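Since custom instructions may not be recursive, their call graph is a DAG, so the maximal hidden-stack requirement is just a longest-path computation; a sketch (the data shapes are assumptions):

```python
# needs: hidden-stack words each custom instruction declares for itself.
# calls: which custom instructions each one invokes (acyclic by the
#        no-recursion rule, so plain memoized recursion terminates).
def max_hidden_stack(needs, calls):
    memo = {}
    def total(i):
        if i not in memo:
            memo[i] = needs[i] + max((total(j) for j in calls.get(i, ())), default=0)
        return memo[i]
    return max(map(total, needs), default=0)
```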

If we don't use it in the language, then the implementation could use the meta stuff or the flag bits for implementation-dependent annotation, eg whether or not this instruction has already been JIT'd. I guess we should specify whether that's allowed.

Yeah, ok, let's give the implementation at least one bit:

2 form bits + 1 implementation-dependent bit + 1 flag bit + 5 x (4 addr mode bits + 8 data bits)

the implementation-dependent bit must be 0 in Oot bytecode in a module file.
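A decoder sketch for this uniform layout (the bit positions follow from the field order above, with MSB-first packing assumed):

```python
# Unpack: 2 form | 1 implementation-dependent | 1 flag
#         | 5 x (4 addr mode + 8 data) = 64 bits.
def unpack(word):
    form = (word >> 62) & 0x3
    impl = (word >> 61) & 0x1
    flag = (word >> 60) & 0x1
    fields = []
    for i in range(5):
        shift = 48 - 12 * i           # each of the 5 fields is 12 bits wide
        fields.append(((word >> (shift + 8)) & 0xF, (word >> shift) & 0xFF))
    return form, impl, flag, fields
```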

---

a minimal/bootstrap implementation of capabilities would be easier if we didn't provide the ability to union capabilities together; so code would have to be constantly swapping different capabilities into the CAP register in order to access different address subspaces. Perhaps this unioning could be implemented later at a higher level?

i don't immediately see how to simply add the unioning on at a higher level later.. and forcing even higher-level code to manually swap stuff into the CAP register just for this is unacceptable.

could we instead just get rid of the special CAPMALLOC'd subspaces, and just unify capabilities with pointers? But that would necessarily mean typechecking to make sure we don't coerce capabilities.

or... capability tags on everything.. or... a separate map of where all capabilities are in various address subspaces, checked before each write

hmm.. none of these sound very appetizing

---

stuffing as much as possible into address subspace getters/setters seems like a good idea for easy bootstrapping.

---

hmm... we could just go OOP early and encapsulate the allowed operations on capabilities that way... ("early" just means in Oot Assembly, which is supposed to be somewhat lower level..)

---

ok i think you could do most everything by providing the following hooks in the interpreter:

Regarding address subspaces, they each also have getters and setters (and maybe jumpers), but calling these hooks is done by the core get and set hooks, so the porter doesn't have to worry about that. The type of address subspace will be statically known and fixed per-program (and hence fixed if you are using OotB? to run Oot) (in the same sense that the set of custom instructions will be fixed), so a more advanced implementation of an OotB?