proj-oot-ootOvmNotes1

see also ootVm.txt; that file is older and needs to be integrated into the list in the following section

---

so i think some stuff that OVM should implement is:

todo should also read more about the OOP parts of JVM and CIL.

todo should also read about Cmm/C-- runtime support.

todo update Oot todos.

as you can see, the purpose of OVM is coming into focus:

but everything should be not very dynamic/metaprogrammable/dynamically typed (remember the lesson from PyPy?), so this is different from Oot Core.

---

Instruction format (128-bit):

1 x 16-bit opcode
3 x (16-bit operand + 8-bit addr mode)
1 x (32-bit operand + 8-bit addr mode)

16 bytes
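A minimal sketch of packing and unpacking this 128-bit layout, just to check that the fields fit. The field names (op0..op3, mode0..mode3) and the bit order (most significant field first, big-endian bytes) are assumptions made for illustration; the notes above fix only the field widths.

```python
# Field widths in bits, in the order listed in the notes.
# Names and bit order are assumptions, not part of the design.
FIELDS = [
    ("opcode", 16),
    ("op0", 16), ("mode0", 8),
    ("op1", 16), ("mode1", 8),
    ("op2", 16), ("mode2", 8),
    ("op3", 32), ("mode3", 8),
]
assert sum(width for _, width in FIELDS) == 128


def encode(values):
    """Pack a dict of field values into 16 bytes."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word.to_bytes(16, "big")


def decode(raw):
    """Unpack 16 bytes back into a dict of field values."""
    word = int.from_bytes(raw, "big")
    values = {}
    # Peel fields off the low end, so walk the layout backwards.
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

Round-tripping a dict through encode and then decode should give back the same values.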

Note that there is no space for encoding length format bits here -- UNLESS you reduced the opcode by 5 bits to 11 bits. Which isn't crazy. So maybe:

5 encoding length format bits
1 x 11-bit opcode
3 x (16-bit operand + 8-bit addr mode)
1 x (32-bit operand + 8-bit addr mode)

16 bytes

We could also have a 64-bit encoding format:

4 encoding length format bits + 2 RESERVED bits + 10-bit opcode + 4 x (8-bit operand + 4-bit addr mode)
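A quick bit-budget check for the three layouts above, plus how many opcodes each opcode width can name. The dictionary keys are just labels made up for this sketch.

```python
# Each layout is a list of field widths in bits, taken from the notes above.
LAYOUTS = {
    "128-bit, 16-bit opcode": [16] + [16 + 8] * 3 + [32 + 8],
    "128-bit, 5 format bits + 11-bit opcode": [5, 11] + [16 + 8] * 3 + [32 + 8],
    "64-bit": [4, 2, 10] + [8 + 4] * 4,
}


def total_bits(layout):
    return sum(layout)


for name, layout in LAYOUTS.items():
    bits = total_bits(layout)
    print(f"{name}: {bits} bits = {bits // 8} bytes")

# Number of distinct opcodes each opcode width can name.
OPCODE_COUNTS = {bits: 2 ** bits for bits in (16, 11, 10)}
```

So giving up 5 opcode bits for format bits still leaves 2048 opcodes in the 128-bit format, and the 64-bit format has room for 1024.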

---

i dunno man, this seems like a lot of work to implement in assembly language.

also, what about the idea of incremental implementation? how is the porter going to be able to just implement the bare VM and use existing implementations of most of the opcodes?

i guess some of these things could be compile-time macros (eg hash tables).

but i feel like we really have a couple of levels here.

maybe some/many of these things would be 'standard library functions' at the OVM level (eg hash tables).

hmm, that makes a lot more sense to me. So we would specify a smaller core VM, which has to actually be implemented, and then a large standard library. And then porters would have to implement the VM, and then for better interoperability and efficiency on a given platform they could go ahead and incrementally override parts of the standard library with platform-specific stuff.
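A rough sketch of that split, assuming a name-based dispatch table. The class VM, its three tables, and the list_sum example are all hypothetical; the point is only that a portable library routine is written in terms of primitives, and a porter can later swap in a platform-specific version without touching callers.

```python
class VM:
    def __init__(self):
        # Primitives: every port has to supply these.
        self.primitives = {}
        # Portable standard library, written only in terms of vm.call(...).
        self.stdlib = {}
        # Optional platform-specific replacements for stdlib entries.
        self.overrides = {}

    def call(self, name, *args):
        if name in self.primitives:
            return self.primitives[name](*args)
        if name in self.overrides:
            return self.overrides[name](*args)
        return self.stdlib[name](self, *args)


def portable_list_sum(vm, items):
    """Library routine that uses only the 'add' primitive."""
    total = 0
    for item in items:
        total = vm.call("add", total, item)
    return total


vm = VM()
vm.primitives["add"] = lambda a, b: a + b
vm.stdlib["list_sum"] = portable_list_sum

portable_result = vm.call("list_sum", [1, 2, 3])  # runs the portable version

# Later, the porter overrides the library routine with a native one.
vm.overrides["list_sum"] = sum
native_result = vm.call("list_sum", [1, 2, 3])    # runs the override
```

Both calls give the same answer; only the implementation behind the name changes.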

another issue is static typing. There's a tension here:

I think the solution is: (a) some of the dynamic stuff will be in the layer above (in the implementation of Oot Core on top of OVM) (b) there is some dynamic stuff at this level but it is easy to tell from looking at each instruction if it has any dynamicity. For example, if we use my idea for 'polymorphic assembly' by way of a type operand, then instructions whose type operands are constant mode are statically typed. This means that OVM code that is composed only of non-dynamic instructions can be efficiently compiled. And the language implementation itself will be like that.

Still, this suggests that maybe we are trying to do too many things at once.

Should we have one layer for a 'language implementing language', and then OVM is a 'runtime VM' implemented in that language? The problem with that may be that the 'runtime VM' has to support funky control flow like delimited continuations, so we don't want the language implementing language to impose and abstract away something like a restrictive call chain/stack abstraction, because then it seems like we have another interpreter-on-top-of-an-interpreter layer. But i'm not sure i fully understand that part, so that objection could be invalid. todo/hmmm.

My immediate thoughts are that Oot itself may be the 'language implementing language' that the reference implementation is ultimately written in. So when i say 'it's a lot of work to write this in assembly' that's not relevant, because the Oot implementation will be compiled to Boot, we don't have to write it in Boot directly (except maybe once to bootstrap). But is this really true? I don't expect our compiler to be smart enough to write efficient Boot code for things like context switches in the scheduler.

And, in any case we actually want the runtime VM to have the property that it supports dynamic typing yet you can easily identify code with static typing, because this will help JIT'ing, compilers, etc. This is certainly helpful for efficient compilation of a self-hosting implementation of Oot itself, but it'll be helpful for user code as well, because users will be able to write statically typed Oot code, we can use the ordinary toolchain to compile that to OVM, and then the toolchain will be able to recognize that the OVM code is statically typed and compile it down further rather than interpreting it.

---

so here's the design i'm coming up with. It seems odd to me, in that i don't think i've heard of it being done this way before, but it seems to satisfy the considerations noted in the previous section:

OVM is a VM with opcodes.

Some of the opcodes are primitive. A porter has to implement these. For example, BEQ.

The opcodes which are not primitive are 'standard library functions'. These have implementations provided in the language of OVM, in terms of primitive opcodes (or other library opcodes; but there are no circular dependencies between library opcodes, except for self-referencing recursion (a function can call itself)). For example, hashtable_get(table, key). A porter can start out not implementing these and then implement them incrementally later on to improve interoperation and efficiency.

Some of the opcodes, including some (but probably not all) of the primitive ones, and including some but not all of the standard library ones, are (what's the word for this? secured? protected? guarded? fenced? barricaded? shielded? defended? prohibited? restricted? controlled? access-controlled? restrictedaccess? unsafe? let's say 'unsafe'), in a protection-ring-security-model sense. If we are given some untrusted user code to run, we had better scan through it and make sure it doesn't contain any of these opcodes (alternately, the OVM could have a special mode where it faults on unsafe instructions). For example, array_get_noboundscheck(array, location).
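A minimal sketch of the 'scan through it' option. It assumes a program is a list of (opcode_name, operands) tuples and that the code can't be modified after the scan; both are assumptions for illustration, as are the function names. With a fixed-width instruction format the scan is a simple linear pass.

```python
# Only array_get_noboundscheck comes from the notes; a real list would be longer.
UNSAFE_OPCODES = {"array_get_noboundscheck"}


def find_unsafe(program):
    """Return (index, opcode) for every unsafe instruction in the program."""
    return [
        (index, instr[0])
        for index, instr in enumerate(program)
        if instr[0] in UNSAFE_OPCODES
    ]


def verify_untrusted(program):
    """Refuse to load untrusted code that contains any unsafe opcode."""
    offenders = find_unsafe(program)
    if offenders:
        raise PermissionError(f"unsafe opcodes in untrusted code: {offenders}")
```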

Standard library opcode implementations can call unsafe opcodes.

Some of the opcodes can sometimes or always be 'dynamic' (the others are called 'static'). This may make them difficult to statically and efficiently compile to some targets. It is possible to determine whether each instruction instance is static or dynamic just by syntactically inspecting it. For example, 'ADD' (polymorphic addition) is dynamic when its type operand is not a constant.
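A sketch of what 'syntactically inspecting' could look like. The addressing modes, the Instr record, and the EVAL opcode are hypothetical; only ADD and the rule 'dynamic when the type operand is not a constant' come from the notes.

```python
from dataclasses import dataclass
from enum import Enum


class AddrMode(Enum):
    CONSTANT = 0  # operand value is encoded in the instruction itself
    REGISTER = 1  # operand value is only known at run time


@dataclass(frozen=True)
class Instr:
    opcode: str
    type_operand: int = 0
    type_mode: AddrMode = AddrMode.CONSTANT


# Opcodes that are dynamic only when their type operand is not constant.
POLYMORPHIC = {"ADD"}
# Opcodes that are always dynamic (EVAL is a made-up placeholder).
ALWAYS_DYNAMIC = {"EVAL"}


def is_static(instr):
    """Decide static vs. dynamic from the instruction alone, no analysis."""
    if instr.opcode in ALWAYS_DYNAMIC:
        return False
    if instr.opcode in POLYMORPHIC:
        return instr.type_mode is AddrMode.CONSTANT
    return True


def is_fully_static(program):
    """True if every instruction in the program is static."""
    return all(is_static(instr) for instr in program)
```

A toolchain could use is_fully_static to decide whether a chunk of OVM code can be compiled down further or has to be interpreted.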

The Oot implementation is written in terms of only static instructions, and can freely use unsafe opcodes.

User code that contains only static opcodes can be most efficiently compiled.

User code that is untrusted cannot contain unsafe opcodes. However, it can contain safe standard library opcodes which are themselves implemented using unsafe opcodes.

This design should:

Should we partition the opcode address space to make it easy to recognize unsafe and primitive and static opcodes? Yeah, why not, we have 16 bits.
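One way the partition could look, assuming a 16-bit opcode whose top three bits are flags. The specific bit assignment is made up for this sketch; any fixed assignment would allow the same cheap checks.

```python
# Hypothetical flag bits in a 16-bit opcode.
UNSAFE_BIT = 1 << 15   # set: unsafe opcode
LIBRARY_BIT = 1 << 14  # set: standard library opcode; clear: primitive
DYNAMIC_BIT = 1 << 13  # set: opcode may be dynamic; clear: always static
INDEX_MASK = DYNAMIC_BIT - 1  # low 13 bits select the opcode within its class


def make_opcode(index, *, unsafe=False, library=False, dynamic=False):
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("opcode index does not fit in 13 bits")
    flags = 0
    if unsafe:
        flags |= UNSAFE_BIT
    if library:
        flags |= LIBRARY_BIT
    if dynamic:
        flags |= DYNAMIC_BIT
    return flags | index


def is_unsafe(opcode):
    return bool(opcode & UNSAFE_BIT)


def is_primitive(opcode):
    return not opcode & LIBRARY_BIT


def is_always_static(opcode):
    return not opcode & DYNAMIC_BIT
```

With this kind of layout, checking untrusted code for unsafe opcodes is a single bit test per instruction rather than a table lookup.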

I'm thinking that memory management would work like this: there are primitive operations that do stuff (like loading, storing, copying) without managing memory, and primitive operations that do things like incrementing/decrementing reference counts, and reading/writing through read/write barriers. Some or all of loading/storing/copying directly without memory management is unsafe, and messing with reference counts is unsafe. Then memory-aware variants of stuff like loading, storing, and copying are provided, and untrusted code (or portable user code) uses those.
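A toy sketch of that layering for stores, assuming plain reference counting. Obj, the slots list, and all the function names are hypothetical. The unsafe primitives do one thing each; the safe store is built out of them and keeps the counts consistent.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.freed = False


# --- unsafe primitives: no memory management, or raw refcount fiddling ---

def raw_store(slots, index, obj):
    slots[index] = obj


def incref(obj):
    obj.refcount += 1


def decref(obj):
    obj.refcount -= 1
    if obj.refcount == 0:
        obj.freed = True  # stand-in for actually releasing the memory


# --- safe, memory-aware variant that untrusted code would use ---

def store(slots, index, obj):
    old = slots[index]
    # Take the new reference before dropping the old one, so that storing
    # an object over itself never frees it.
    if obj is not None:
        incref(obj)
    raw_store(slots, index, obj)
    if old is not None:
        decref(old)
```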

---

i guess OVM should have some optional instructions that some platforms and not others implement, and that are not provided by macros/stdlib unless implemented.

For example, unicode stuff: embedded platforms usually won't support this because it requires a lot of data, but desktop platforms usually will. We want the HLL to be able to use the platform unicode stuff if it is there because interop, but otherwise the HLL must do without.
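A small sketch of how the HLL side might cope with an optional instruction. The Platform class and the opcode name unicode_normalize_nfc are hypothetical; the desktop implementation here borrows Python's unicodedata module as a stand-in for 'the platform unicode stuff'.

```python
import unicodedata


class Platform:
    def __init__(self, optional_ops):
        # Optional opcodes this platform chose to implement.
        self.optional_ops = dict(optional_ops)

    def supports(self, name):
        return name in self.optional_ops

    def call(self, name, *args):
        if name not in self.optional_ops:
            raise NotImplementedError(name)
        return self.optional_ops[name](*args)


desktop = Platform(
    {"unicode_normalize_nfc": lambda s: unicodedata.normalize("NFC", s)}
)
embedded = Platform({})  # no unicode tables on this platform


def normalize_if_possible(platform, text):
    """Use the platform's normalization if present, otherwise do without."""
    if platform.supports("unicode_normalize_nfc"):
        return platform.call("unicode_normalize_nfc", text)
    return text
```

There is deliberately no stdlib fallback here: if the platform doesn't implement the optional opcode, the caller just gets the text back unchanged.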

---

some notes from RPython [1]:

"Note that there are tons of special cased restrictions that you’ll encounter as you go. The exact definition is “RPython is everything that our translation toolchain can accept” :)"

ok that's crazy

---

my conclusions from the previous section:

we should do: