see also ootVm.txt; that file is older and needs to be integrated into the list in the following section
---
so i think some stuff that OVM should implement is:
todo: should also read more about the OOP parts of JVM and CIL. todo: should also read about Cmm/C-- runtime support. todo: update Oot todos.
as you can see, the purpose of OVM is coming into focus:
but none of this should be very dynamic/metaprogrammable/dynamically typed (remember the lesson from PyPy?), so this is different from Oot Core.
---
Instruction format (128-bit):
1 x 16-bit opcode
3 x (16-bit operand + 8-bit addr mode)
1 x (32-bit operand + 8-bit addr mode)
Note that there is no space for encoding length format bits here -- UNLESS you reduce the opcode by 5 bits, to 11 bits. Which isn't crazy. So maybe:
5 encoding length format bits
1 x 11-bit opcode
3 x (16-bit operand + 8-bit addr mode)
1 x (32-bit operand + 8-bit addr mode)
We could also have a 64-bit encoding format:
4 encoding length format bits + 2 RESERVED bits + 10-bit opcode + 4 x (8-bit operand + 4-bit addr mode)
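To sanity-check the bit budgets (5+11+3*(16+8)+(32+8) = 128; 4+2+10+4*(8+4) = 64), here's a sketch in C of how a decoder might pull fields out of the 128-bit form. The field order and the helper are my own assumptions, just to make the layout concrete:

#include <stdint.h>

/* A 128-bit instruction as two 64-bit words. */
typedef struct { uint64_t lo, hi; } Instr128;

/* Extract n bits starting at bit position `pos` of the 128-bit word. */
static uint32_t bits(Instr128 in, unsigned pos, unsigned n) {
    uint64_t w = (pos < 64) ? (in.lo >> pos) : (in.hi >> (pos - 64));
    if (pos < 64 && pos + n > 64)       /* field straddles the two words */
        w |= in.hi << (64 - pos);
    return (uint32_t)(w & ((n < 32) ? ((1u << n) - 1) : 0xFFFFFFFFu));
}

/* Hypothetical field positions for the 5+11+3*(16+8)+(32+8) layout: */
/*   uint32_t len_fmt = bits(in, 0, 5);    encoding length format    */
/*   uint32_t opcode  = bits(in, 5, 11);                             */
/*   uint32_t op0     = bits(in, 16, 16);  first 16-bit operand      */
/*   uint32_t mode0   = bits(in, 32, 8);   its addr mode, etc.       */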
---
i dunno man, this seems like a lot of work to implement in assembly language.
also, what about the idea of incremental implementation? how is the porter going to be able to just implement the bare VM and use existing implementations of most of the opcodes?
i guess some of these things could be compile-time macros (eg hash tables).
but i feel like we really have a couple of levels here.
maybe some/many of these things would be 'standard library functions' at the OVM level (eg hash tables).
hmm, that makes a lot more sense to me. So we would specify a smaller core VM, which has to actually be implemented, and then a large standard library. And then porters would have to implement the VM, and then for better interoperability and efficiency on a given platform they could go ahead and incrementally override parts of the standard library with platform-specific stuff.
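A sketch of what that could look like in C (all names hypothetical): a dispatch table where standard-library opcodes ship with portable default handlers, which a porter can later overwrite with platform-native ones:

#include <stdint.h>

typedef struct VM VM;
typedef void (*OpHandler)(VM *vm, const uint8_t *operands);

typedef struct {
    OpHandler handler;
    int is_primitive;   /* primitives MUST be supplied by the porter */
} OpEntry;

extern OpEntry optable[65536];   /* 16-bit opcode space */

/* default portable implementation of, e.g., hashtable_get, written
   in terms of primitive opcodes (body not shown) */
void hashtable_get_default(VM *vm, const uint8_t *operands);

/* a porter incrementally overrides an entry with a native handler: */
void port_override(uint16_t opcode, OpHandler native) {
    optable[opcode].handler = native;
}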
another issue is static typing. There's a tension here:
I think the solution is: (a) some of the dynamic stuff will be in the layer above (in the implementation of Oot Core on top of OVM) (b) there is some dynamic stuff at this level but it is easy to tell from looking at each instruction if it has any dynamicity. For example, if we use my idea for 'polymorphic assembly' by way of a type operand, then instructions whose type operands are constant mode are statically typed. This means that OVM code that is composed only of non-dynamic instructions can be efficiently compiled. And the language implementation itself will be like that.
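For example, a minimal sketch of that syntactic check, assuming a hypothetical MODE_CONST addressing-mode value for the type operand:

#include <stdint.h>

enum { MODE_CONST = 0 /* hypothetical constant addr mode */ };

typedef struct {
    uint16_t opcode;
    uint8_t  type_operand_mode;  /* addr mode of the type operand */
} DecodedInstr;

/* An instruction is statically typed iff its type operand is in
   constant mode; a compiler pass can use this to carve out maximal
   statically-typed regions for efficient compilation. */
static int is_static(DecodedInstr i) {
    return i.type_operand_mode == MODE_CONST;
}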
Still, this suggests that maybe we are trying to do too many things at once.
Should we have one layer for a 'language implementing language', and then OVM is a 'runtime VM' implemented in that language? The problem with that may be that the 'runtime VM' has to support funky control flow like delimited continuations, so we don't want the language-implementing language to impose a restrictive call chain/stack abstraction that hides the underlying control flow, because then it seems like we have another interpreter-on-top-of-an-interpreter layer. But i'm not sure i fully understand that part, so that objection could be invalid. todo/hmmm.
My immediate thoughts are that Oot itself may be the 'language implementing language' that the reference implementation is ultimately written in. So when i say 'it's a lot of work to write this in assembly' that's not relevant, because the Oot implementation will be compiled to Boot, we don't have to write it in Boot directly (except maybe once to bootstrap). But is this really true? I don't expect our compiler to be smart enough to write efficient Boot code for things like context switches in the scheduler.
And, in any case we actually want the runtime VM to have the property that it supports dynamic typing yet you can easily identify code with static typing, because this will help JIT'ing, compilers, etc. This is certainly helpful for efficient compilation of a self-hosting implementation of Oot itself, but it'll be helpful for user code as well, because users will be able to write statically typed Oot code, we can use the ordinary toolchain to compile that to OVM, and then the toolchain will be able to recognize that the OVM code is statically typed and compile it down further rather than interpreting it.
---
so here's the design i'm coming up with. It seems odd to me, in that i don't think i've heard of it being done this way before, but it seems to satisfy the considerations noted in the previous section:
OVM is a VM with opcodes.
Some of the opcodes are primitive. A porter has to implement these. For example, BEQ.
The opcodes which are not primitive are 'standard library functions'. These have implementations provided in the language of OVM, in terms of primitive functions (or other library functions; but there are no circular dependencies between library functions, except for self-referencing recursion (a function can call itself)). For example, hashtable_get(table, key). A porter can start out not implementing these and then implement them incrementally later on to improve interoperation and efficiency.
Some of the opcodes, including some (but probably not all) of the primitive ones, and some (but not all) of the standard library ones, are (what's the word for this? secured? protected? guarded? restricted? privileged? let's say 'unsafe') in a protection-ring-security-model sense. If we are given some untrusted user code to run, we had better scan through it and make sure it doesn't contain any of these opcodes (alternately, the OVM could have a special mode where it faults on privileged instructions). For example, array_get_noboundscheck(array, location).
Standard library opcode implementations can call unsafe opcodes.
Some of the opcodes can sometimes or always be 'dynamic' (the others are called 'static'). This may make them difficult to statically and efficiently compile to some targets. It is possible to determine whether each instruction instance is static or dynamic just by syntactically inspecting it. For example, 'ADD' (polymorphic addition) is dynamic when its type operand is not a constant.
The Oot implementation is written in terms of only static instructions, and can freely use unsafe opcodes.
User code that contains only static opcodes can be most efficiently compiled.
User code that is untrusted cannot contain unsafe opcodes. However, it can contain safe standard library opcodes which are themselves implemented using unsafe opcodes.
This design should:
Should we partition the opcode address space to make it easy to recognize unsafe and primitive and static opcodes? Yeah, why not, we have 16 bits.
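For example, one possible partition (the bit assignments here are my own invention): use the top bits of the opcode as class flags, so the verifier and compiler can classify an instruction without a table lookup.

#include <stdint.h>

#define OP_UNSAFE    0x8000u  /* bit 15: unsafe (trusted code only)    */
#define OP_PRIMITIVE 0x4000u  /* bit 14: primitive (porter-supplied)   */
#define OP_DYNAMIC   0x2000u  /* bit 13: may be dynamic                */

static int op_is_unsafe(uint16_t op)    { return (op & OP_UNSAFE)    != 0; }
static int op_is_primitive(uint16_t op) { return (op & OP_PRIMITIVE) != 0; }
static int op_is_dynamic(uint16_t op)   { return (op & OP_DYNAMIC)   != 0; }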
I'm thinking that memory management would work like this: there are primitive operations that do stuff (like loading, storing, copying) without managing memory, and primitive operations that do things like incrementing/decrementing reference counts, and reading/writing through read/write barriers. Some or all of the direct, non-memory-managed loading/storing/copying is unsafe, and messing with reference counts is unsafe. Then memory-aware variants of loading, storing, copying are provided, and untrusted code (or portable user code) uses those.
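A sketch of that split, with hypothetical names: the safe store is just the raw store wrapped in refcounting and a write barrier (the exact ordering of barrier vs. store is a detail I'm glossing over).

void raw_store(void **slot, void *val);      /* unsafe: no mem mgmt  */
void rc_incref(void *obj);                   /* unsafe               */
void rc_decref(void *obj);                   /* unsafe               */
void write_barrier(void **slot, void *val);  /* unsafe: notify GC    */

/* safe, memory-aware variant built from the unsafe primitives */
void safe_store(void **slot, void *val) {
    rc_incref(val);            /* new value gains a reference */
    rc_decref(*slot);          /* old value loses one         */
    write_barrier(slot, val);  /* tell the GC the slot changed */
    raw_store(slot, val);
}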
---
i guess OVM should have some optional instructions that some platforms implement and others don't, and that are not provided by macro/stdlib fallbacks when the platform doesn't implement them.
For example, unicode stuff: embedded platforms usually won't support this because it requires a lot of data, but desktop platforms usually will. We want the HLL to be able to use the platform unicode stuff if it is there, because of interop, but otherwise the HLL must do without.
---
some notes from RPython [1]:
"Note that there are tons of special cased restrictions that you’ll encounter as you go. The exact definition is “RPython is everything that our translation toolchain can accept” :)"
ok that's crazy
---
my conclusions from the previous section:
we should do:
---
if you don't want to allow pointers into the middle of structures, then you probably want to deal with pairs (base_pointer, offset).
---
i guess all we really need to keep track of is where the pointers are. If we know where the pointers are on the stacks and in registers, and we know where the pointers are in memory, then we can prevent sandboxed code from manufacturing its own pointers. (is that how the CLR prevents sandboxed code from manufacturing its own pointers by reading integers in from memory and treating them as pointers?)
in order to know where pointers are in memory we probably have to have data structure declarations, and force all allocated memory to be one of these data structures (although 'array of bytes' is a valid structure, provided that bytes can never be loaded as pointers).
data structure primitives include i32, ptr, probably also i64, fixed-length arrays, variable-length arrays.
i guess for these purposes we don't really need to know if e.g. such-and-such field within a data structure is a u32 or an i32, we just need its size. So we don't really need everything to be typed (although that may be useful at the OVM level for other reasons anyhow). And we don't really need 'objects' with encapsulated methods, we just need something more like C structs.
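So the descriptor language might look something like this (a sketch; the tags and layout are my assumptions): sizes and pointer-ness only, no signedness, per the note above.

#include <stdint.h>

typedef enum {
    K_I32, K_I64, K_PTR,
    K_ARRAY_FIXED,   /* element type + compile-time count        */
    K_ARRAY_VAR,     /* element type + count known at runtime    */
    K_STRUCT         /* sequence of field descriptors            */
} Kind;

typedef struct TypeDesc {
    Kind kind;
    uint32_t count;                 /* K_ARRAY_FIXED: element count */
    const struct TypeDesc *elem;    /* arrays: element type         */
    const struct TypeDesc **fields; /* K_STRUCT: field list         */
    uint32_t nfields;
} TypeDesc;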
---
for another perspective, consider the runtimeless restricted variant of D, "BetterC":
" Retained Features
Nearly the full language remains available. Highlights include:
Unrestricted use of compile-time features
Full metaprogramming facilities
Nested functions, nested structs, delegates and lambdas
Member functions, constructors, destructors, operator overloading, etc.
The full module system
Array slicing, and array bounds checking
RAII (yes, it can work without exceptions)
scope(exit)
Memory safety protections
Interfacing with C++
COM classes and C++ classes
assert failures are directed to the C runtime library
switch with strings
final switch
unittest...
Unavailable Features
D features not available with BetterC:
Garbage Collection
TypeInfo and ModuleInfo
Classes
Built-in threading (e.g. core.thread)
Dynamic arrays (though slices of static arrays work) and associative arrays
Exceptions
synchronized and core.sync
Static module constructors or destructors"
(i don't know why dynamic arrays don't work, i think i read that it has something to do with GC)
this suggests a different path that i mentioned once before, where there would be yet another language in between BootX and OVM. This other language would be a C-like runtimeless language.
In this case the other language would have the interop stuff and would only run trusted code, and the OVM would run untrusted code (and would do things like bounds-checking and convert cooperative multitasking to preemptive).
I'm not sure that there's really much benefit to not just doing as i said above, though, and having a subset of the OVM instructions be 'unsafe', and many of the OVM instructions be macroinstructions. One issue with that is that multitasking remains cooperative, not fully pre-emptive; however, we can fix that by having a facility in the VM to designate an instruction to execute every N instructions. As for forcing read and write barriers upon access to various memory locations, the safe instructions can do that.
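A sketch of that facility (names are mine): a fuel counter in the dispatch loop that runs a designated instruction every N ordinary instructions, which is enough to turn cooperative code effectively preemptive.

typedef struct VM VM;
void run_designated_instruction(VM *vm);   /* e.g. a scheduler check */
void execute_one(VM *vm);                  /* normal dispatch        */

void interp_loop(VM *vm, long n) {
    long fuel = n;
    for (;;) {                  /* halt/trap handling omitted */
        if (--fuel <= 0) {
            run_designated_instruction(vm);
            fuel = n;           /* reset the quantum */
        }
        execute_one(vm);
    }
}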
---
Two alternatives for how unsafe instructions could be encoded in OVM:
1) BootX instructions (8-, 16-, 32-bit) are always interpreted as unsafe instructions that do the same thing as in BootX. The "safe variants" of these (e.g. safe ADD) are different opcodes and/or are in a 64- or 128-bit encoding. This allows the compiler to macrocompile some 'safe' instructions into a series of 'unsafe' instructions while leaving other 'safe' instructions as single (128-bit) instructions. Before beginning execution of untrusted code, OVM must scan it and check that no unsafe instructions are in there (which is easy because the unsafe ones are those with less than a 128-bit encoding, or maybe less than 64-bit). Then during execution any unsafe instructions which are encountered are assumed to be trusted, and executed immediately.
2) BootX instructions are interpreted as the 'safe variant' of themselves, e.g. LOAD does bounds-checking and reference-counting, etc. This allows the OVM program to be expressed with mostly short (8-, 16-, 32-bit) instructions. However to express unsafe instructions you would either (a) reuse the same opcodes but have them interpreted differently when the VM is in the 'ring 0' protection ring/domain state, or (b) have variants of all of the BootX-analog opcodes that are 'unsafe', and those can only be executed in the 'ring 0' state.
The advantage of (1) is that the compiler can intersperse unsafe instructions (such as those to update reference counts) with 'user code' without jumping to special unsafe subroutines and without also interspersing 'switch protection ring/domain' instructions. The advantage of (2) is that most user code can be more compact.
I guess it's likely that unsafe instructions may outnumber user code in compiled code, because every primitive actual operation like 'LOAD' will usually be surrounded by a few unsafe instructions (bounds check, update reference counts). Of course the toolchain might not work like that at all (it may compile all user code to unsafe instructions if it compiles any of it), but i bet if we're optimizing, it's better to optimize for the use case where safe and unsafe code is interspersed than for pure user code size.
So i'm leaning towards (1). This is also useful for expressing the definition of macroinstructions in terms of primitives and other previously defined macroinstructions.
We could also do both; have one 'ring' state that means 'treat BootX-encoded instructions as unsafe BootX instructions' and a different 'ring' state that means 'treat BootX-encoded instructions as safe user code'. That would also allow us to immediately begin executing untrusted code without scanning it first (and then maybe switch to the other mode later after it's been scanned and/or partially compiled). Heck, why not?
So i'm leaning towards 'both'.
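A sketch of 'both', under my assumptions about the encoding lengths: a ring flag consulted at dispatch, plus the option-(1) scan that proves a block of code contains no short (unsafe) encodings before it's run or compiled in trusted mode. instr_length() is assumed to decode an instruction's encoded size.

#include <stdint.h>
#include <stddef.h>

typedef enum { RING_TRUSTED, RING_UNTRUSTED } Ring;
/* in RING_TRUSTED, short BootX-encoded instructions run as unsafe ops;
   in RING_UNTRUSTED, the same encodings run as their safe variants */

size_t instr_length(const uint8_t *p);     /* 1, 2, 4, 8 or 16 bytes */

/* verifier: 1 iff the block contains only 128-bit (safe) encodings */
int all_instructions_128bit(const uint8_t *code, size_t len) {
    for (size_t i = 0; i < len; ) {
        size_t n = instr_length(code + i);
        if (n < 16) return 0;              /* found a short (unsafe) one */
        i += n;
    }
    return 1;
}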
---
Also there are a few alternatives for defining the semantics of the safe variants of BootX instructions:
(1) we can just define them anew. The fact that they are 'variants' is something we humans know but the computer doesn't know that at all.
(2) we define them using 'hooks', such as hooks for loading and storing stuff to registers, hooks for loading and storing stuff to memory, etc. The semantics of the instructions are defined by applying the original instructions, modified by the hooks.
For (2), it's conceptually nice, but imagine the hooks for an instruction like LOAD. We probably want a hook that executes before LOAD, and can access and alter its inputs or even return without calling LOAD (by raising an error or just returning successfully), and also a hook that executes after LOAD, and can access and alter its output or even raise an error. If the hooks can do all that, it's probably simpler just to redefine the entire instruction. Using choice (1) from the previous section, it's easy to redefine the entire instruction and also call the original instruction at some point within the redefinition.
So i'm leaning towards (1).
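A sketch of choice (1) for, say, LOAD on an array: the safe variant is simply a fresh definition that invokes the original unsafe instruction in the middle. The names, and the assumption that vm_fault doesn't return, are mine.

typedef struct { void *base; long len; } Array;

long unsafe_load(const Array *a, long i);  /* original primitive op   */
void vm_fault(const char *why);            /* raises a VM error;      */
                                           /* assumed not to return   */

long safe_load(const Array *a, long i) {
    if (i < 0 || i >= a->len)
        vm_fault("array index out of bounds"); /* pre-check           */
    return unsafe_load(a, i);                  /* call the original   */
}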
We could still have a 'boxed' addr mode bit and have hooks for that.
---
The GC and the serializer/pickler need to know where pointers are in data structures in memory. Consider something like an array of pointers whose length is only known at runtime.
This data structure's type must be some sort of regular-expression-like template that says "There is a sequence of N pointers here". So where do we store N? Probably at the beginning of the array (another alternative is something like C strings, but let's not support that).
Another alternative is to store N in a fat pointer at each pointer that points to the data structure. But there may be many of those, so it's more memory-efficient to store it once in the data structure itself (although that's "intrusive" if we need to interop with native data structures in this manner; but i think we can just say that native data structures can't be under the GC and can't be serialized, unless someone writes special-case support routines for them).
We could call the part of the data structure that stores N the 'header', because the GC and serializer need to know how to interpret this header, whereas they don't need to interpret the rest of the data structure (they just need to know where the pointers are in the rest).
Now, consider an array of structs of type T, where each struct has two fields, and each field contains a pointer. This is best expressed with "array" being a template/generic type, and in this case the items (the type variable in the template) are of type T. We can support generic types which are possibly polymorphic at runtime if we generalize the header to contain the values assigned to the type variables.
What about the definition of the data structure itself (the thing that says that T has two fields, each of which is a pointer; and the thing that says that array<T> has a header with N and T followed by a payload consisting of an N-item sequence of items, each of which is a T)? We could put that in the header too, making the data structure self-describing. But this would be a ton of duplication because we really only need the definition once, and we may have lots of instances of that data type. So we should store the type elsewhere.
Should the header contain the type of the object? I guess it may as well; again, that allows for runtime polymorphism. Even if the compiler can deduce the type of everything, somewhere the GC would probably want to maintain a big table of the types of each pointer, in which case it wastes no additional memory to move that information into the header for each data structure.
Should the type in the header be a 16-bit type ID? 32-bit? Let's just make it a native pointer to the type definition. Wait, actually, that's confusing if we also have immediate constant types in the instructions, because those would have to either be integers, or the encoding needs to know which operands are types. Hmm, but if types are not pointers, then since type descriptions are of variable size, you have to use the type ID as an index into a table of type pointers, which is inefficient for a common operation. Hmm... We could also just say that where a pointer is expected, an immediate constant is taken to be an index into the pointer constant table.
So i guess that's what our boxed objects look like:
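a sketch of the layout implied by the above (field order and widths are my guesses): a header holding a native pointer to the type definition plus any runtime type-variable bindings (e.g. N and then T for array<T>), followed by the payload.

#include <stdint.h>

typedef struct TypeDesc TypeDesc;

typedef struct {
    const TypeDesc *type;     /* native ptr to the type definition    */
    uintptr_t       tyargs[]; /* runtime type-variable bindings, e.g.
                                 N, then T; payload follows in memory */
} BoxHeader;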
But what about heterogeneous arrays of dynamic size and composition? Now the header itself needs to have a dynamic array of type literals. So actually the integers and the type pointers in the header may be interspersed.
I guess type literals should themselves be a core type.
One question is: do we allow internal pointers, that is, pointers that don't point at the base address of a data structure (also, do we allow recursive internal pointers, that is, pointers to another location within the same data structure?). We could also forbid direct internal pointers but allow fat ptrs/references of the form: pair (base ptr, offset) (i'm leaning towards that). TODO
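If we go with the fat-reference option, a reference is just this pair (sketch), so the GC only ever sees base pointers it knows about:

#include <stddef.h>

typedef struct {
    void   *base;    /* always points at an object's header */
    size_t  offset;  /* byte offset into the payload        */
} Ref;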
---
Another question is:
When the BootX-encoded instructions are interpreted as unsafe, are they interpreted just like actual BootX code (that should be passed thru unchanged by the compiler into the BootX object code, so they will directly affect the actual BootX registers), or just as doing the same thing as those BootX instructions in OVM?
I'm thinking the latter, because:
(a) the target platform for some OVM implementations may be other than BootX
(b) we don't know which OVM registers are being mapped to which BootX registers anyways, so there's not much that we could do
So i'm leaning towards saying that they are all operating on OVM registers etc; they are not being 'passed thru' by a compiler into BootX object code, rather they are ordinary OVM instructions.
---
it would be so nice to have the 64-bit encoding's 8-bit operands refer to some set of 256 registers. But i'm not sure how that would coexist with having 16-bit register operands (or lexical variables?) in the 128-bit encoding. User code should be able to have more than 256 local variables. And we don't want to do an extra step of register allocation, do we? Otoh we could say that we have 256 REGISTERS but 64k VARIABLES. Hmm, not sure. TODO.
---
" While the common intermediate language (CIL) does have operators that can fetch and set arbitrary memory (and thus violate memory safety), it also has the following memory-safe operators and the CLR strongly encourages their use in most programming:
Field-fetch operators (LDFLD, STFLD, LDFLDA) that fetch (read), set and take the address of a field by name.
Array-fetch operators (LDELEM, STELEM, LDELEMA) that fetch, set and take the address of an array element by index. All arrays include a tag specifying their length. This facilitates an automatic bounds check before each access." -- [4]
---
So how does the GC scan a stack to find 'managed pointers'?
Here's how CLR does it:
https://mattwarren.org/2019/01/21/Stackwalking-in-the-.NET-Runtime/ https://github.com/dotnet/coreclr/blob/master/Documentation/botr/stackwalking.md#managed-frames
" Managed Frames
Because the runtime owns and controls the JIT (Just-in-Time compiler) it can arrange for managed methods to always leave a crawlable frame. One solution here would be to utilize a rigid frame format for all methods (e.g. the x86 EBP frame format). In practice, however, this can be inefficient, especially for small leaf methods (such as typical property accessors).
Since methods are typically called more times than their frames are crawled (stack crawls are relatively rare in the runtime, at least with respect to the rate at which methods are typically called) it makes sense to trade method call performance for some additional crawl time processing. As a result the JIT generates additional metadata for each method it compiles that includes sufficient information for the stack crawler to decode a stack frame belonging to that method.
This metadata can be found via a hash-table lookup with an instruction pointer somewhere within the method as the key. The JIT utilizes compression techniques in order to minimize the impact of this additional per-method metadata.
Given initial values for a few important registers (e.g. EIP, ESP and EBP on x86 based systems) the stack crawler can locate a managed method and its associated JIT metadata and use this information to roll back the register values to those current in the method's caller. In this fashion a sequence of managed method frames can be traversed from the most recent to the oldest caller. This operation is sometimes referred to as a virtual unwind (virtual because we're not actually updating the real values of ESP etc., leaving the stack intact). "
"All these examples end up calling into the Thread::StackWalkFrames?(..) method here"
"It’s worth pointing out that the only way you can access it from C#/F#/VB.NET code is via the StackTrace? class, only the runtime itself can call into Thread::StackWalkFrames?(..) directly."
" ProcessIp?(..) here has the job of looking up the current managed method (if any) based on the current instruction pointer (IP). It does this by calling into EECodeInfo::Init(..) here and then ends up in one of:
EEJitManager::JitCodeToMethodInfo(..) here, that uses a very cool looking data structure referred to as a '[https://github.com/dotnet/coreclr/blob/release/2.2/src/inc/nibblemapmacros.h#L12-L26 nibble map]'
NativeImageJitManager::JitCodeToMethodInfo(..) here
ReadyToRunJitManager::JitCodeToMethodInfo(..) here"
and it looks to me like in the first case (EEJitManager) we end up here in RangeSection* ExecutionManager::GetRangeSection