now, to specify a bit field width which is a power of 2 between 2^0 and 2^7, we need to choose between 8 values (the exponents 0 through 7), which takes 3 bits.
so we can pack together a bunch of bit field width specifications into a few bytes, with 3 bits for each spec. or maybe we should make it 4 bits so that two specs fit evenly into each byte (a packing sketch follows below).
what are some things we may need to specify?
that's 9 items, so let's limit ourselves to 8 because it's a power of 2.
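to make the packing concrete, here is a minimal C sketch of the 4-bit-per-spec variant, assuming eight specs per header and the exponent encoding described above (the names and layout are illustrative only, not a settled format):

    #include <stdint.h>

    /* sketch: each spec is an exponent k (0..7 fits in 3 bits; rounded up
       to a 4-bit nibble so two specs fit per byte), and the actual field
       width is 2^k. eight specs pack into 4 bytes, low nibble first. */
    void pack_specs(const uint8_t exps[8], uint8_t out[4]) {
        for (int i = 0; i < 4; i++)
            out[i] = (uint8_t)((exps[2*i] & 0x0F) | ((exps[2*i + 1] & 0x0F) << 4));
    }

    /* recover the width of spec i: width = 2^exponent */
    unsigned spec_width(const uint8_t packed[4], int i) {
        uint8_t k = (packed[i / 2] >> ((i % 2) * 4)) & 0x0F;
        return 1u << k;
    }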
--
so i am suggesting that there is a 'self-describing' format for bytecode which specifies these bit widths in its first few bytes. This makes the format suitable for embedded as well as desktop encodings. i am not suggesting that most Oot Core bytecode interpreters should actually have the flexibility to accept files with arbitrary values for these bitwidths, just that conceptually it's a good way to specify things.
for example, low-end PIC MCUs had "12-bit instructions" made of "5 address bits to specify the memory operand, and 9-bit branch destinations", i.e. a bit width of 5 for memory operands and 9 for pointers. If we demand that Oot bit widths themselves be powers of 2, those round up to 8 bits for operands and 16 for pointers. Which is not actually very satisfactory.
note that low-end PICs now have about 35 instructions (http://en.wikipedia.org/wiki/PIC_microcontroller#Instruction_set), so they need 6 bits for the opcode. Lua bytecode has 38 opcodes, also using 6 bits ( http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf ).
which suggests a two-step system, because on embedded systems we wouldn't want each program to waste something like 8 bytes specifying these bit widths, even in the fully self-describing format: first a few bits specify how wide each bit width spec is, and then the specs themselves follow at that width. the obvious choice for the first step is 3 bits, allowing spec widths of 2^0 through 2^7 bits.
but this is too much: with 3 bits, the width spec itself can be up to 2^7 = 128 bits wide, which can name bit widths up to 2^(2^7), whereas we only want to specify bit widths up to 2^7. if we had 2 bits, the maximum spec width would be 2^3 = 8 bits instead of 2^7 = 128, and an 8-bit spec (naming widths up to 2^8 = 256) is just right.
so e.g. if the two bits are both 1, that means that the bit width spec for something is one byte. but wait, we never use a width of zero, so each spec can store (width - 1): if the two bits are 00, the bit width spec is one bit (bit widths 1 or 2); if the two bits are 01, the spec is two bits (bit widths up to 4); if the two bits are 10, we have a four-bit bit width spec (bit widths up to 16); and if the two bits are 11, we have an eight-bit bit width spec (bit widths up to 256).
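in C, the decode for this two-bit scheme would be something like the following (function names are just illustrative):

    /* sketch of the two-step decode: a 2-bit prefix selects how wide each
       bit width spec is (00 -> 1 bit, 01 -> 2, 10 -> 4, 11 -> 8), i.e.
       spec_bits = 2^prefix; an n-bit spec then stores (width - 1), since
       width zero is never used, so it can name widths 1..2^n. */
    unsigned spec_bits_from_prefix(unsigned prefix) {  /* prefix in 0..3 */
        return 1u << prefix;                           /* 1, 2, 4, or 8 */
    }

    unsigned width_from_spec(unsigned spec_value) {
        return spec_value + 1;  /* widths 1..2^n for an n-bit spec */
    }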
now we'll never really use those first few choices, because even old PICs and Lua need bit widths of 5 or more (we aren't trying to support esoteric computing here, with small bit widths just for the heck of it). So we would always see either 10 or 11. So we really only need 1 bit:
0: 4-bit bitwidth spec
1: 8-bit bitwidth spec
so in the self-describing format, the first byte could contain some bits that choose between 4-bit and 8-bit bitwidth specs.
Alternately, we could assume that tight embedded systems won't use more than a bit width of 16 for anything, and that systems which do have a bit width above 16 won't mind a few extra bytes at the beginning of each Oot program, and have just one bit, which selects whether all subsequent bit width specs are 4 bits or 8 bits.
note also that in the case that the bit width specifications only need to go up to 16, we also know that no bit width will be below 4, so maybe we can save a couple more bits with some clever encoding, although that's probably more trouble than it's worth.
so if we have 8 things that need to be specified, we have a leading byte with that bit and maybe some other spec junk, and then we spend either 4 or 8 bytes on the bit width specifications. What if we are so embedded that we don't want to waste even 5 bytes? Maybe add another bit that allows us to turn off specification entirely and accept the minimal default values (probably 6- or 7-bit opcodes, 8-bit integers, 8- or 9-bit pointers, etc).
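to make this concrete, here is a rough C sketch of parsing such a header. the bit positions, the (width - 1) encoding, and the particular default widths are all just illustrative assumptions, not a settled format:

    #include <stdint.h>
    #include <stddef.h>

    /* hypothetical header layout (bit positions invented for illustration):
       byte 0, bit 0: 0 -> 4-bit width specs follow, 1 -> 8-bit specs follow
       byte 0, bit 1: 1 -> no specs follow; use the built-in minimal defaults
       the remaining bits of byte 0 are left for other spec junk. */
    typedef struct {
        unsigned widths[8];  /* decoded bit widths, one per spec slot */
        size_t header_len;   /* bytes consumed: 1, 5, or 9 */
    } oot_header;

    /* e.g. 6-bit opcodes, 8-bit integers, 9-bit pointers, ... (made up) */
    static const unsigned minimal_widths[8] = {6, 8, 9, 8, 8, 8, 8, 8};

    oot_header parse_header(const uint8_t *p) {
        oot_header h;
        int wide = p[0] & 1;
        int use_defaults = (p[0] >> 1) & 1;
        if (use_defaults) {
            for (int i = 0; i < 8; i++) h.widths[i] = minimal_widths[i];
            h.header_len = 1;
        } else if (!wide) {            /* 8 specs * 4 bits = 4 bytes */
            for (int i = 0; i < 8; i++)
                h.widths[i] = 1 + ((p[1 + i/2] >> ((i % 2) * 4)) & 0x0F);
            h.header_len = 1 + 4;
        } else {                       /* 8 specs * 8 bits = 8 bytes */
            for (int i = 0; i < 8; i++)
                h.widths[i] = 1 + p[1 + i];
            h.header_len = 1 + 8;
        }
        return h;
    }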
Note that PICs originally had 12-bit instructions, AVR has 16-bit instructions, MSP430 has 16-bit instructions, ARM Thumb originally had 16-bit instructions and now has 32, and Lua bytecode has 32-bit instructions; so at a minimum the total number of bits per instruction (including opcode and operands) will be >=8, and most likely >=16. EmbedVM, which takes only 3 KiB on an AVR, has 8-bit instructions, with some instructions taking 8- or 16-bit extensions. PyMite has 8-bit opcodes and some instructions take 16-bit arguments (all bytecodes after 90 (0x5A)). eLua seems to follow Lua: https://github.com/elua/elua/blob/master/src/lua/lopcodes.h . What does Java Card do? I know Java Card opcodes are 8 bits, but how long are potential arguments?
So let's say that the Oot word size is 16 bits at a minimum, and that Oot opcodes are 6 bits at a minimum. So an instruction has at least 10 bits besides the opcode for operands, etc. If we tried to fit all instructions into 16 bits, with no extended instructions for e.g. memory addressing, and if the pointer size was at least 8 bits, then we'd only have 2 bits left for another operand, so a LOAD or STORE instruction could only address 4 registers. Or, like PIC, we could replace LOAD and STORE with instructions that implicitly choose a register. One choice (not PIC's) would be to have LOAD and STORE but only to an implicit register, and then we have to follow or precede them with a general MOV. A MOV has 10 bits for all operands, so it could have 5 bits per register, allowing at most 32 registers. PIC's style is rather to have a MOV with one implicit register, in which case we'd have 10 bits to choose the other one (1024 registers, which i guess only makes sense if you memory-map them like PIC does).
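as a sketch of the arithmetic above, here is one possible 16-bit layout in C; the field order and the exact LOAD/STORE format are assumptions for illustration, not a settled encoding:

    #include <stdint.h>

    /* sketch of the 16-bit MOV format discussed above (field order assumed):
       [15:10] opcode (6 bits)  [9:5] dst reg (5 bits)  [4:0] src reg (5 bits)
       two 5-bit register fields allow at most 32 registers. */
    static inline unsigned op (uint16_t insn) { return insn >> 10; }
    static inline unsigned dst(uint16_t insn) { return (insn >> 5) & 0x1F; }
    static inline unsigned src(uint16_t insn) { return insn & 0x1F; }

    /* for a LOAD/STORE with an 8-bit address operand, only 2 bits remain:
       [15:10] opcode  [9:8] reg (4 registers)  [7:0] address */
    static inline unsigned ls_reg (uint16_t insn) { return (insn >> 8) & 0x3; }
    static inline unsigned ls_addr(uint16_t insn) { return insn & 0xFF; }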
That all seems pretty tight, and i think i would rather have some more bits for modes to make the instructions more orthogonal. It seems like most embedded systems have either 16-bit instruction sizes, or 8-bit instruction sizes with some instructions taking up to 16-bit extensions. Lua's choice of a uniform 32-bit instruction also seems reasonable.
I guess 32 bits is more uniform, but the tighter thing to do would be to have extensions; so some instructions-with-arguments are 8 bits, some are 16, some 24, some 32. Is there any advantage to having 8/16/32 and disallowing the 24? The only thing i can think of is that that way instructions can be made to align on 32-bit boundaries, which might be worth it already. Is there a reason that ARM Thumb and Thumb-2 didn't offer any 8-bit instructions, only 16? Maybe because if you don't have enough commonly used 8-bit instructions, you might waste some opcode space on the 8-bit opcodes, depending on how instructions are formatted? Maybe because the CPU can decode more easily if everything is on 16-bit boundaries? Maybe because there just weren't enough instructions without big operands to make it worth the extra complexity? Also, i notice that e.g. PyMite