
proj-oot-ootOvmNotes1

see also ootVm.txt; that file is older and needs to be integrated into the list in the following section

---

so i think some stuff that OVM should implement is:

todo: should also read more about the OOP parts of JVM and CIL.
todo: should also read about Cmm/C-- runtime support.
todo: update Oot todos.

as you can see, the purpose of OVM is coming into focus:

but everything should avoid being very dynamic/metaprogrammable/dynamically typed (remember the lesson from PyPy?), so this is different from Oot Core.

---

Instruction format (128-bit):

1 x 16-bit opcode
3 x (16-bit operand + 8-bit addr mode)
1 x (32-bit operand + 8-bit addr mode)

16 bytes

Note that there is no space for encoding length format bits here -- UNLESS you reduced the opcode by 5 bits to 11 bits. Which isn't crazy. So maybe:

5 encoding length format bits
1 x 11-bit opcode
3 x (16-bit operand + 8-bit addr mode)
1 x (32-bit operand + 8-bit addr mode)

16 bytes

We could also have a 64-bit encoding format:

4 encoding length format bits
2 RESERVED bits
1 x 10-bit opcode
4 x (8-bit operand + 4-bit addr mode)

8 bytes
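
to make the bit budget concrete, here's a minimal C decoding sketch for the 128-bit format with the 5 encoding length format bits. The byte order and exact field packing are just illustrative assumptions, not decided:

  #include <stdint.h>
  #include <string.h>

  typedef struct {
      uint8_t  len_fmt;      /* 5 bits: encoding length format */
      uint16_t opcode;       /* 11 bits */
      uint16_t operand16[3]; /* three 16-bit operands */
      uint8_t  mode[4];      /* one 8-bit addr mode per operand */
      uint32_t operand32;    /* one 32-bit operand */
  } Instr128;

  /* Decode 16 instruction bytes (assumed little-endian) into fields. */
  static Instr128 decode128(const uint8_t b[16]) {
      Instr128 in;
      uint16_t head = (uint16_t)(b[0] | (b[1] << 8));
      in.len_fmt = head & 0x1F;    /* low 5 bits */
      in.opcode  = head >> 5;      /* remaining 11 bits */
      for (int i = 0; i < 3; i++) {
          const uint8_t *p = b + 2 + 3 * i;  /* 2-byte operand + 1-byte mode */
          in.operand16[i] = (uint16_t)(p[0] | (p[1] << 8));
          in.mode[i]      = p[2];
      }
      memcpy(&in.operand32, b + 11, 4);  /* 32-bit operand */
      in.mode[3] = b[15];                /* its 8-bit addr mode */
      return in;
  }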

---

i dunno man, this seems like a lot of work to implement in assembly language.

also, what about the idea of incremental implementation? how is the porter going to be able to just implement the bare VM and use existing implementations of most of the opcodes?

i guess some of these things could be compile-time macros (eg hash tables).

but i feel like we really have a couple of levels here.

maybe some/many of these things would be 'standard library functions' at the OVM level (eg hash tables).

hmm, that makes a lot more sense to me. So we would specify a smaller core VM, which has to actually be implemented, and then a large standard library. And then porters would have to implement the VM, and then for better interoperability and efficiency on a given platform they could go ahead and incrementally override parts of the standard library with platform-specific stuff.

another issue is static typing. There's a tension here:

I think the solution is: (a) some of the dynamic stuff will be in the layer above (in the implementation of Oot Core on top of OVM) (b) there is some dynamic stuff at this level but it is easy to tell from looking at each instruction if it has any dynamicity. For example, if we use my idea for 'polymorphic assembly' by way of a type operand, then instructions whose type operands are constant mode are statically typed. This means that OVM code that is composed only of non-dynamic instructions can be efficiently compiled. And the language implementation itself will be like that.
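
here's a minimal sketch of that syntactic check, reusing the Instr128 struct from the decoding sketch above; MODE_CONST and opcode_is_polymorphic are made-up names standing in for whatever we actually define:

  /* hypothetical 'immediate constant' addr mode */
  #define MODE_CONST 0x00

  extern int opcode_is_polymorphic(uint16_t op);  /* hypothetical table lookup */

  /* An instruction instance is static iff its opcode is not polymorphic
     at all, or its type operand (assumed here to be operand 0) is a
     constant. */
  static int is_static_instr(const Instr128 *in) {
      if (!opcode_is_polymorphic(in->opcode))   /* e.g. BEQ: always static */
          return 1;
      return in->mode[0] == MODE_CONST;         /* e.g. ADD with constant type */
  }

a block of OVM code is then efficiently compilable if every instruction in it passes this check.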

Still, this suggests that maybe we are trying to do too many things at once.

Should we have one layer for a 'language implementing language', and then OVM is a 'runtime VM' implemented in that language? The problem with that may be that the 'runtime VM' has to support funky control flow like delimited continuations, so we don't want the language implementing language to impose a restrictive call chain/stack abstraction that hides the underlying stack from us, because then it seems like we have another interpreter-on-top-of-an-interpreter layer. But i'm not sure i fully understand that part, so that objection could be invalid. todo/hmmm.

My immediate thoughts are that Oot itself may be the 'language implementing language' that the reference implementation is ultimately written in. So when i say 'it's a lot of work to write this in assembly' that's not relevant, because the Oot implementation will be compiled to Boot, we don't have to write it in Boot directly (except maybe once to bootstrap). But is this really true? I don't expect our compiler to be smart enough to write efficient Boot code for things like context switches in the scheduler.

And, in any case we actually want the runtime VM to have the property that it supports dynamic typing yet you can easily identify code with static typing, because this will help JIT'ing, compilers, etc. This is certainly helpful for efficient compilation of a self-hosting implementation of Oot itself, but it'll be helpful for user code as well, because users will be able to write statically typed Oot code, we can use the ordinary toolchain to compile that to OVM, and then the toolchain will be able to recognize that the OVM code is statically typed and compile it down further rather than interpreting it.

---

so here's the design i'm coming up with. It seems odd to me, in that i don't think i've heard of it being done this way before, but it seems to satisfy the considerations noted in the previous section:

OVM is a VM with opcodes.

Some of the opcodes are primitive. A porter has to implement these. For example, BEQ.

The opcodes which are not primitive are 'standard library functions'. These have implementations provided in the language of OVM, in terms of primitive functions (or other library functions; but there are no circular dependencies between library functions, except for self-referencing recursion (a function can call itself)). For example, hashtable_get(table, key). A porter can start out not implementing these and then implement them incrementally later on to improve interoperation and efficiency.

Some of the opcodes, including some (but probably not all) of the primitive ones, and including some but not all of the standard library ones, are (what's the word for this? secured? protected? guarded? fenced? barricaded? shielded? defended? prohibited? restricted? controlled? access-controlled? restricted-access? unsafe? let's say 'unsafe'), in a protection-ring-security-model sense. If we are given some untrusted user code to run, we had better scan through it and make sure it doesn't contain any of these opcodes (alternately, the OVM could have a special mode where it faults on privileged instructions). For example, array_get_noboundscheck(array, location).

Standard library opcode implementations can call unsafe opcodes.

Some of the opcodes can sometimes or always be 'dynamic' (the others are called 'static'). This may make them difficult to statically and efficiently compile to some targets. It is possible to determine whether each instruction instance is static or dynamic just by syntactically inspecting it. For example, 'ADD' (polymorphic addition) is dynamic when its type operand is not a constant.

The Oot implementation is written in terms of only static instructions, and can freely use unsafe opcodes.

User code that contains only static opcodes can be most efficiently compiled.

User code that is untrusted cannot contain unsafe opcodes. However, it can contain safe standard library opcodes which are themselves implemented using unsafe opcodes.

This design should:

Should we partition the opcode address space to make it easy to recognize unsafe and primitive and static opcodes? Yeah, why not, we have 16 bits.
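
e.g., something like this (the specific flag bits are placeholders, not decided):

  /* top bits of the 16-bit opcode as property flags */
  #define OP_UNSAFE    0x8000  /* bit 15: unsafe/privileged */
  #define OP_PRIMITIVE 0x4000  /* bit 14: primitive (porter must implement) */
  #define OP_DYNAMIC   0x2000  /* bit 13: possibly dynamic */

  #define OPCODE_IS_UNSAFE(op)  (((op) & OP_UNSAFE) != 0)

then the scan of untrusted code is a single mask test per instruction.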

I'm thinking that memory management would work like this: there are primitive operations that do stuff (like loading, storing, copying) without managing memory, and primitive operations that do things like incrementing/decrementing reference counts, and reading/writing through read/write barriers. Some or all of loading/storing/copying directly without memory management is unsafe, and messing with reference counts is unsafe. Then memory-aware variants of stuff like loading, storing, copying are provided, and untrusted code (or portable user code) uses those.
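
a sketch of how a memory-aware store could be composed out of the unsafe primitives (all the names here are illustrative):

  typedef struct Obj Obj;

  /* unsafe primitives -- only trusted code may use these directly */
  extern Obj  *raw_load_ptr(Obj **slot);
  extern void  raw_store_ptr(Obj **slot, Obj *v);
  extern void  rc_incr(Obj *o);
  extern void  rc_decr(Obj *o);                 /* may free the object */
  extern void  write_barrier(Obj **slot, Obj *v);

  /* memory-aware variant -- what untrusted/portable code uses */
  static void safe_store_ptr(Obj **slot, Obj *v) {
      Obj *old = raw_load_ptr(slot);
      if (v) rc_incr(v);             /* keep the new referent alive */
      write_barrier(slot, v);        /* let the GC observe the mutation */
      raw_store_ptr(slot, v);
      if (old) rc_decr(old);         /* the old referent may now be freed */
  }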

---

i guess OVM should have some optional instructions that some platforms implement and others don't, and that are not provided by macros/stdlib on platforms that don't implement them.

For example, unicode stuff: embedded platforms usually won't support this because it requires a lot of data, but desktop platforms usually will. We want the HLL to be able to use the platform unicode stuff if it is there because interop, but otherwise the HLL must do without.

---

some notes from RPython [1]:

"Note that there are tons of special cased restrictions that you’ll encounter as you go. The exact definition is “RPython is everything that our translation toolchain can accept” :)"

ok that's crazy

---

my conclusions from the previous section:

we should do:

---

if you don't want to allow pointers into the middle of structures, then you probably want to deal with pairs (base_pointer, offset).

---

---

i guess all we really need to keep track of is where the pointers are. If we know where the pointers are on the stacks and in registers, and we know where the pointers are in memory, then we can prevent sandboxed code from manufacturing its own pointers. (is that how CLR prevents sandboxed code from manufacturing its own pointers, i.e. from reading integers in from memory and using them as pointers?)

in order to know where pointers are in memory we probably have to have data structure declarations, and force all allocated memory to be one of these data structures (although 'array of bytes' is a valid structure, provided that bytes can never be loaded as pointers).

data structure primitives include i32, ptr, probably also i64, fixed-length arrays, variable-length arrays.

i guess for these purposes we don't really need to know if e.g. such-and-such field within a data structure is a u32 or an i32, we just need its size. So we don't really need everything to be typed (although that may be useful at the OVM level for other reasons anyhow). And we don't really need 'objects' with encapsulated methods, we just need something more like C structs.
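
a sketch of what such C-struct-like layout declarations could record: just sizes and pointer positions (the encoding is an assumption, and the offsets below assume 32-bit pointers):

  #include <stdint.h>

  typedef enum { F_I32, F_I64, F_PTR, F_ARRAY_FIXED, F_ARRAY_VAR } FieldKind;

  typedef struct {
      FieldKind kind;
      uint32_t  count;    /* element count, for F_ARRAY_FIXED */
      uint32_t  offset;   /* byte offset within the structure */
  } FieldDesc;

  typedef struct {
      uint32_t         nfields;
      const FieldDesc *fields;
  } LayoutDesc;

  /* e.g. a layout for: struct { i32 len; ptr head; ptr tail; } */
  static const FieldDesc list_fields[] = {
      { F_I32, 0, 0 }, { F_PTR, 0, 4 }, { F_PTR, 0, 8 },
  };
  static const LayoutDesc list_layout = { 3, list_fields };

the GC and the sandbox verifier only ever consult the F_PTR entries; everything else is opaque payload.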

---

for another perspective, consider the runtimeless restricted variant of D, "BetterC":

[3]

" Retained Features

Nearly the full language remains available. Highlights include:

    Unrestricted use of compile-time features
    Full metaprogramming facilities
    Nested functions, nested structs, delegates and lambdas
    Member functions, constructors, destructors, operator overloading, etc.
    The full module system
    Array slicing, and array bounds checking
    RAII (yes, it can work without exceptions)
    scope(exit)
    Memory safety protections
    Interfacing with C++
    COM classes and C++ classes
    assert failures are directed to the C runtime library
    switch with strings
    final switch
    unittest

...

Unavailable Features

D features not available with BetterC:

    Garbage Collection
    TypeInfo and ModuleInfo
    Classes
    Built-in threading (e.g. core.thread)
    Dynamic arrays (though slices of static arrays work) and associative arrays
    Exceptions
    synchronized and core.sync
    Static module constructors or destructors

"

(i don't know why dynamic arrays don't work, i think i read that it has something to do with GC; presumably it's that D's dynamic arrays are allocated on the GC heap)

this suggests a different path that i mentioned once before, where there would be yet another language in between BootX and OVM. This other language would be a C-like runtimeless language.

In this case the other language would have the interop stuff and would only run trusted code, and the OVM would run untrusted code (and would do things like bounds-checking and convert cooperative multitasking to preemptive).

I'm not sure that there's really much benefit to not just doing as i said above, though, and have a subset of the OVM instructions be 'unsafe', and have many of the OVM instructions be macroinstructions. One issue with that is that multitasking remains cooperative, not fully pre-emptive; we can fix that by having a facility in the VM to designate an instruction to execute every N instructions, however. As for forcing read and write barriers upon access to various memory locations, the safe instructions can do that.

---

Two alternatives for how unsafe instructions could be encoded in OVM:

1) BootX instructions (8-, 16-, 32-bit) are always interpreted as unsafe instructions that do the same thing as in BootX. The "safe variants" of these (e.g. safe ADD) are different opcodes and/or are in a 64- or 128-bit encoding. This allows the compiler to macrocompile some 'safe' instructions into a series of 'unsafe' instructions while leaving other 'safe' instructions as single (128-bit) instructions. Before beginning execution of untrusted code, OVM must scan it and check that no unsafe instructions are in there (which is easy because the unsafe ones are those in less-than-128-bit encodings, or maybe less than 64-bit). Then during execution any unsafe instructions which are encountered are assumed to be trusted, and executed immediately.

2) BootX instructions are interpreted as the 'safe variant' of themselves, e.g. LOAD does bounds-checking and reference-counting, etc. This allows the OVM program to be expressed with mostly short (8-, 16-, 32-bit) instructions. However, to express unsafe instructions you would either (a) reuse the same opcodes but interpret them differently when the VM is in the 'ring 0' protection ring/domain state, or (b) have variants of all of the BootX-analog opcodes that are 'unsafe', and those can only be executed in the 'ring 0' state.

The advantage of (1) is that the compiler can intersperse unsafe instructions (such as those to update reference counts) with 'user code' without jumping to special unsafe subroutines and without also interspersing 'switch protection ring/domain' instructions. The advantage of (2) is that most user code can be more compact.

I guess it's likely that unsafe instructions would outnumber user-code instructions in compiled code, because every primitive actual operation like 'LOAD' will usually be surrounded by a few unsafe instructions (bounds check, update reference counts). Of course the toolchain might not work like that at all (it may compile all user code to unsafe instructions if it compiles any of it), but i bet if we're optimizing, it's better to optimize for the use case where safe and unsafe code is interspersed than for pure user code size.

So i'm leaning towards (1). This is also useful for expressing the definition of macroinstructions in terms of primitives and other previously defined macroinstructions.

We could also do both; have one 'ring' state that means 'treat BootX-encoded instructions as unsafe BootX instructions' and a different 'ring' state that means 'treat BootX-encoded instructions as safe user code'. That would also allow us to immediately begin executing untrusted code without scanning it first (and then maybe switch to the other mode later after it's been scanned and/or partially compiled). Heck, why not?

So i'm leaning towards 'both'.
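
a sketch of what the two ring states could look like in the interpreter's dispatch (the names and shape are made up):

  #include <stdint.h>

  typedef enum {
      RING_TRUSTED,    /* BootX-encoded instrs run as their raw, unsafe selves */
      RING_UNTRUSTED,  /* BootX-encoded instrs run as their safe variants */
  } RingState;

  typedef struct { RingState ring; /* ...rest of VM state... */ } VM;

  extern void exec_unsafe(VM *vm, uint16_t opcode);  /* no checks */
  extern void exec_safe(VM *vm, uint16_t opcode);    /* bounds checks, refcounts, ... */

  static void dispatch_bootx(VM *vm, uint16_t opcode) {
      if (vm->ring == RING_TRUSTED)
          exec_unsafe(vm, opcode);
      else
          exec_safe(vm, opcode);
  }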

---

Also there are a few alternatives for defining the semantics of the safe variants of BootX instructions:

(1) we can just define them anew. The fact that they are 'variants' is something we humans know but the computer doesn't know that at all.

(2) we define them using 'hooks', such as hooks for loading and storing stuff to registers, hooks for loading and storing stuff to memory, etc. The semantics of the instructions are defined by applying the original instructions, modified by the hooks.

For (2), it's conceptually nice, but imagine the hooks for an instruction like LOAD. We probably want a hook that executes before LOAD, and can access and alter its inputs or even return without calling LOAD (by raising an error or just returning successfully), and also a hook that executes after LOAD, and can access and alter its output or even raise an error. If the hooks can do all that, it's probably simpler just to redefine the entire instruction. Using choice (1) in the previous section, it's easy to redefine the entire instruction and also call the original instruction at some point within the redefinition.

So i'm leaning towards (1).

We could still have a 'boxed' addr mode bit and have hooks for that.

---

The GC and the serializer/pickler needs to know where pointers are in data structures in memory. Consider something like an array of pointers whose length is only known at runtime.

This data structure's type must be some sort of regular-expression-like template that says "There is a sequence of N pointers here". So where do we store N? Probably at the beginning of the array (another alternative is something like C strings, but let's not support that).

Another alternative is to store N in a fat pointer at each pointer that points to the data structure. But there may be many of those, so it's more memory-efficient to store it once in the data structure itself (although that's "intrusive" if we need to interop with native data structures in this manner; but i think we can just say that native data structures can't be under the GC and can't be serialized, unless someone writes special-case support routines for them).

We could call the part of the data structure that stores N the 'header', because the GC and serializer need to know how to interpret this header, whereas they don't need to interpret the rest of the data structure (they just need to know where the pointers are in the rest).

Now, consider an array of structs of type T, where each struct has two fields, and each field contains a pointer. This is best expressed with "array" being a template/generic type, and in this case the items (the type variable in the template) are of type T. We can support generic types which are possibly polymorphic at runtime if we generalize the header to contain the values assigned to the type variables.

What about the definition of the data structure itself (the thing that says that T has two fields, each of which is a pointer; and the thing that says that array<T> has a header with N and T followed by a payload consisting of an N-item sequence of items, each of which is a T)? We could put that in the header too, making the data structure self-describing. But this would be a ton of duplication because we really only need the definition once, and we may have lots of instances of that data type. So we should store the type elsewhere.

Should the header contain the type of the object? I guess it may as well; again, that allows for runtime polymorphism. Even if the compiler can deduce the type of everything, somewhere the GC would probably want to maintain a big table of the types of each pointer, in which case it wastes no additional memory to move that information into the header for each data structure.

Should the type in the header be a 16-bit type ID? 32-bit? Let's just make it a native pointer to the type definition. Wait, actually, that's confusing if we also have immediate constant types in the instructions, because those would have to either be integers, or the encoding needs to know which operands are types. Hmm, but if types are not pointers, then since type descriptions are of variable size, you have to use the type ID as an index into a table of type pointers, which is inefficient for a common operation. Hmm... We could also just say that where a pointer is expected, an immediate constant is taken to be an index into the pointer constant table.
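
a rough C rendering of the header ideas so far (the exact fields are still open, so this is just a sketch):

  #include <stdint.h>

  typedef struct TypeDesc TypeDesc;  /* the out-of-line type definition */

  typedef struct {
      const TypeDesc *type;    /* native pointer to the type definition */
      uint32_t        n;       /* e.g. element count, for array<T> */
      /* values assigned to the type's type variables (their count is
         fixed by the type definition); the payload follows in memory */
      const TypeDesc *tyvars[];
  } ObjHeader;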

So i guess that's what our boxed objects look like:

But what about heterogeneous arrays of dynamic size and composition? Now the header itself needs to have a dynamic array of type literals. So actually the integers and the type pointers in the header may be interspersed.

I guess the type literal should itself be a core type.

One question is: do we allow internal pointers, that is, pointers that don't point at the base address of a data structure (also, do we allow recursive internal pointers, that is, pointers to another location within the same data structure?). We could also forbid direct internal pointers but allow fat ptrs/references of the form: pair (base ptr, offset) (i'm leaning towards that). TODO

---

Another question is:

When the BootX-encoded instructions are interpreted as unsafe, are they interpreted just like actual BootX code (that should be passed thru unchanged by the compiler into the BootX object code, so they will directly affect the actual BootX registers), or just as doing the same thing as those BootX instructions in OVM?

I'm thinking the latter, because:

(a) the target platform for some OVM implementations may be other than BootX
(b) we don't know which OVM registers are being mapped to which BootX registers anyways, so there's not much that we could do

So i'm leaning towards saying that they are all operating on OVM registers etc; they are not being 'passed thru' by a compiler into BootX object code, rather they are ordinary OVM instructions.

---

it would be so nice to have the 64-bit encoding's 8-bit operands relate to some set of 256 registers. But i'm not sure how that would coincide with having 16 bit registers (or lexical variables?) in the 128-bit encoding. User code should be able to have more than 256 local variables. And we don't want to do an extra step of register allocation, do we? Otoh we could say that we have 256 REGISTERS but 64k VARIABLES. Hmm, not sure. TODO.

---

" While the common intermediate language (CIL) does have operators that can fetch and set arbitrary memory (and thus violate memory safety), it also has the following memory-safe operators and the CLR strongly encourages their use in most programming:

    Field-fetch operators (LDFLD, STFLD, LDFLDA) that fetch (read), set and take the address of a field by name.
    Array-fetch operators (LDELEM, STELEM, LDELEMA) that fetch, set and take the address of an array element by index. All arrays include a tag specifying their length. This facilitates an automatic bounds check before each access.

" -- [4]

---

So how does the GC scan a stack to find 'managed pointers'?

Here's how CLR does it:

https://mattwarren.org/2019/01/21/Stackwalking-in-the-.NET-Runtime/ https://github.com/dotnet/coreclr/blob/master/Documentation/botr/stackwalking.md#managed-frames

" Managed Frames

Because the runtime owns and controls the JIT (Just-in-Time compiler) it can arrange for managed methods to always leave a crawlable frame. One solution here would be to utilize a rigid frame format for all methods (e.g. the x86 EBP frame format). In practice, however, this can be inefficient, especially for small leaf methods (such as typical property accessors).

Since methods are typically called more times than their frames are crawled (stack crawls are relatively rare in the runtime, at least with respect to the rate at which methods are typically called) it makes sense to trade method call performance for some additional crawl time processing. As a result the JIT generates additional metadata for each method it compiles that includes sufficient information for the stack crawler to decode a stack frame belonging to that method.

This metadata can be found via a hash-table lookup with an instruction pointer somewhere within the method as the key. The JIT utilizes compression techniques in order to minimize the impact of this additional per-method metadata.

Given initial values for a few important registers (e.g. EIP, ESP and EBP on x86 based systems) the stack crawler can locate a managed method and its associated JIT metadata and use this information to roll back the register values to those current in the method's caller. In this fashion a sequence of managed method frames can be traversed from the most recent to the oldest caller. This operation is sometimes referred to as a virtual unwind (virtual because we're not actually updating the real values of ESP etc., leaving the stack intact). "

"All these examples end up calling into the Thread::StackWalkFrames?(..) method here"

"It’s worth pointing out that the only way you can access it from C#/F#/VB.NET code is via the StackTrace? class, only the runtime itself can call into Thread::StackWalkFrames?(..) directly."

" ProcessIp?(..) here has the job of looking up the current managed method (if any) based on the current instruction pointer (IP). It does this by calling into EECodeInfo::Init(..) here and then ends up in one of:

    EEJitManager::JitCodeToMethodInfo(..) here, that uses a very cool looking data structure referred to as a 'nibble map' (https://github.com/dotnet/coreclr/blob/release/2.2/src/inc/nibblemapmacros.h#L12-L26)
    NativeImageJitManager::JitCodeToMethodInfo(..) here
    ReadyToRunJitManager::JitCodeToMethodInfo(..) here"

and it looks to me like in the first case (EEJitManager) we end up here in RangeSection* ExecutionManager::GetRangeSection(TADDR addr):

https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/codeman.cpp#L4378-L4459

So, uh, yeah, it looks to me like it's exactly what the above blog post implies: there is some sort of simple list of code ranges and we are simply searching this list to identify which range the current frame's instruction pointer is inside!

not sure exactly what this stuff means: " Unwinding ‘JITted’ Code

Finally, we’re going to look at what happens with ‘managed code’, i.e. code that started off as C#/F#/VB.NET, was turned into IL and then compiled into native code by the ‘JIT Compiler’. This is the code that you generally want to see in your ‘stack trace’, because it’s code you wrote yourself!

Help from the ‘JIT Compiler’

Simply, what happens is that when the code is ‘JITted’, the compiler also emits some extra information, stored via the EECodeInfo class, which is defined here. Also see the ‘Unwind Info’ section in the JIT Compiler <-> Runtime interface, note how it features separate sections for TARGET_ARM, TARGET_ARM64, TARGET_X86 and TARGET_UNIX.

In addition, in CodeGen::genFnProlog() here the JIT emits a function ‘prologue’ that contains several pieces of ‘unwind’ related data. This is also implemented in CEEJitInfo::allocUnwindInfo(..) in this piece of code, which behaves differently for each CPU architecture: "

If it finds the code then for each frame it returns a MethodDesc to the caller (the code that wanted to walk the stack) " Managed code, well technically ‘managed code’ that was JITted to ‘native code’, so more accurately a managed stack frame. In this situation the MethodDesc class defined here is provided, you can read more about this key CLR data-structure in the corresponding BotR chapter. "

---

So the upshot from the previous section is, for each frame in the stack:

there is some sort of simple list of code ranges and we are simply searching this list to identify which range the current frame's instruction pointer is inside!

what does the JVM do? "The VM identifies the method that owns the stack frame by looking up the instruction pointer value in the method code block tables. " [5]

" When an exception is thrown through a PC, we first examine which try block covers the program range including that PC. For the table- driven exception handling [55, 20], which is commonly used to im- plement exception handling, as in the Java VM [48], the compiler constructs the table that maps a range of PCs to a try block. With the table, the PC is searched for and is mapped to a try block. If a try block is found, we then examine whether the handler can catch " -- [6]

Recall that the Book of the Runtime said:

" Because the runtime owns and controls the JIT (Just-in-Time compiler) it can arrange for managed methods to always leave a crawlable frame. One solution here would be to utilize a rigid frame format for all methods (e.g. the x86 EBP frame format). In practice, however, this can be inefficient, especially for small leaf methods (such as typical property accessors).

Since methods are typically called more times than their frames are crawled (stack crawls are relatively rare in the runtime, at least with respect to the rate at which methods are typically called) it makes sense to trade method call performance for some additional crawl time processing. As a result the JIT generates additional metadata for each method it compiles that includes sufficient information for the stack crawler to decode a stack frame belonging to that method. "

So, it seems to me that we can simplify the implementation by simply specifying that in an OVM stack frame there is a pointer to the relevant metadata (a stack frame Type object, i guess). These will be at a specific location relative to the Frame Pointers, which will be chained together. Problem solved, i think? This is inefficient in that every single stack frame has this superfluous extra item, but it will make our implementation code simpler (no need to map code ranges to the instruction pointers found in stack frames, which is probably important because in the general case (e.g. transpilation) we can't assume that pointers are comparable, etc, which makes representing and doing arithmetic on 'code pointers' tricky).
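
a sketch of that frame layout and the resulting stack scan (the layout and names are assumptions):

  typedef struct TypeDesc TypeDesc;   /* says where the pointers in a frame are */

  typedef struct Frame Frame;
  struct Frame {
      Frame          *caller_fp;    /* chained frame pointer */
      const TypeDesc *frame_type;   /* metadata for this frame's layout */
      /* locals follow */
  };

  extern void gc_scan_frame(Frame *fp, const TypeDesc *t);  /* hypothetical */

  /* walk from the most recent frame to the oldest; no PC-to-range maps needed */
  static void scan_stack(Frame *fp) {
      for (; fp != NULL; fp = fp->caller_fp)
          gc_scan_frame(fp, fp->frame_type);
  }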

This solution does grate on me though, because it is inefficient, and leaf function calling is one place where you really really do want to be efficient.

I guess we could leave this up to the implementation; maybe we should just say that there is an OVM instruction that maps a frame pointer to the frame type, and the implementation can implement this however it wants (PC->frame type maps or storing a frame type pointer in the frame).

I guess we might still have that other problem that CLR has, about interleaving our stack frames with native stack frames.

maybe see also the 'Stack Unwinding (other runtimes)' section in https://mattwarren.org/2019/01/21/Stackwalking-in-the-.NET-Runtime/

---

Just look at this huge piece of code; this demonstrates why we are motivated to recreate all of this (to have a simpler system, at the cost of lower performance):

https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp

(btw, [7] guides you through the above code)

---

anyhow so the upshot of the above is:

---

arg, it would be so pretty to give separate functions to the 64-bit encoding (8-bit operands) and the 128-bit encoding (16-bit operands). Even though it may mean another layer of register allocation. Let's do it.

Let's put all the unsafe stuff in the 64-bit encoding.

So OVM has two layers; the 'unsafe/interop' layer and the 'user code/managed code VM' layer.

---

i'm thinking that the 'registers' in OVM will actually reference locals, e.g. they are transparently stored/reloaded when you make/return from a function call.

This is similar to how Lua does it:

" Local variables are equivalent to certain registers in the currentstack frame, while dedicated opcodes allow read/write of globals and upvalues ... By default, Lua has a maximum stack frame size of 250. This is encoded as MAXSTACK inllimits.h. The maximum stack frame size in turn limits the maximum number of localsper function, which is set at 200, encoded as LUAI_MAXVARS in luaconf.h. Other limitsfound in the same file include the maximum number of upvalues per function (60), encodedas LUAI_MAXUPVALUES, call depths, the minimum C stack size, etc. Also, with an sBx fieldof 18 bits, jumps and control structures cannot exceed a jump distance of about 131071 " -- [8]

for upvals and globals, there are GETUPVAL, GETGLOBAL, SETUPVAL, SETGLOBAL.

---

So, the lower (64-bit encoding/unsafe) layer of OVM will:

---

as ECMA-335 says,

" III.4 Object model instructions The instructions described in the base instruction set are independent of the object model being executed. Those instructions correspond closely to what would be found on a real CPU. The object model instructions are less built-in than the base instructions in the sense that they could be built out of the base instructions and calls to the underlying operating system. [Rationale: The object model instructions provide a common, efficient implementation of a set of services used by many (but by no means all) higher-level languages. They embed in their operation a set of conventions defined by the CTS. This include (among other things):  Field layout within an object  Layout for late bound method calls (vtables)  Memory allocation and reclamation  Exception handling  Boxing and unboxing to convert between reference-based objects and value types "

i guess some of this is stuff that would be in the upper (safe/128-bit) layer of OVM.

---

so i guess the purpose of the levels so far is:

Boot/BootX, Oot Core, Oot-nostdlib, and Oot are (informally) specified and may be (after 1.0) (somewhat) relied upon by users/other projects. OVM and pre-Oot are considered implementation details not to be relied upon. Note that although the semantics of Oot are defined in terms of pre-Oot and oot-nostdlib plus metaprogramming, implementations are free to recognize and compile these 'metaprogrammed' constructs in a hardwired way.

---

yeah, the more i think about it, the more i think that OVM having local variables (rather than registers) and a call stack may be the key to the purpose of this level of abstraction.

you can also say that the purpose of OVM is to provide services. OVMlow provides services, and OVMhigh provides ANTI-services; that is, useful restrictions on expressiveness (such as privacy and memory safety).

---

(note: i added this section later, in 1911 (Dec 2019), but put it here so it would be next to the other relevant sections before it):

i'm considering getting rid of the BootX layer, and making Boot focused more on simplicity (rather than any care for code density or expressiveness) and on having all of the primitives needed (rather than saving some primitives for BootX).

The downside is that this means that if we have to port implementations of e.g. garbage collection, they will only be available in Boot, rather than the nicer BootX. However, in reality they will probably either be in some higher-level language (such as either Oot, or some special-purpose HLL that compiles to Boot), or they will be closely coupled to the platform, in which case they'll be written in some platform-specific language like LLVM.

if we do this, one outstanding question would be: should we include primitives in Boot that are not strictly necessary but are low-level and useful for performance, such as acquire/release atomic loads and stores?

i also am thinking about getting rid of one of OVMlow or OVMhigh. But then there's a problem: the idea was that if you need good interop (with platform dicts etc) then you can implement OVMlow, which is easier to implement than OVMhigh, but OVMhigh provides a simple abstraction with safety. If you only have OVMlow, then there's no JVM-like layer, and the safety properties are buried in the implementation of the compilation from Oot Core to OVMlow. If you only have OVMhigh, then implementing interop on some platform is roughly as hard as building a JVM on that platform, which is too hard.

If we combine OVMlow and OVMhigh, should we restrict OVMhigh to statically known typing? Hmm.. the dynamic/static divide was another reason to keep around both OVMlow and OVMhigh.

one idea is just having interop functions as optional extensions to Boot, then mucking with the toolchain to determine whether a particular build of OVM uses these or uses generic implementations (which are in the OVM project, not in the Boot project). And then only having OVMhigh. That sounds like it may be a practical solution. But wait.. one issue is that then we can't swap out LLVM for Boot on some platforms. That may be a problem; we're not supposed to be tied to Boot.

But maybe i'm not thinking clearly; just use LLVM like a Boot without the extensions, then implement the platform-specific extensions, if desired, in the OVM implementation. No, the goal here is to be able to have good interop on a new platform without porting OVMhigh directly.

But actually i think we can put interop functions into Boot if we want; whatever is on top of Boot would have to find these functions either there or in whatever other platform it's on anyways, so no harm in putting them in Boot.

And maybe it's not so crazy; if you want to add good interop for a given platform, just add the needed features to that platform's Boot, and then interpret or compile the Boot object code of the Oot implementation into the platform. We're still missing some stuff with this methodology; the platform control flow is not being used here, instead we have an interpreter or compiler running over the Boot code, which implements its own control flow, so Oot functions are not ordinary native first-class functions on the platform (although they can probably be wrapped via one of those interop methods...). And for the same reason, it will be very inefficient; we're mapping all the variables into the available Boot registers, etc. But if you wanted to map Oot code (not just the Oot implementation's code) into something reasonable in the target platform, you'd have to go up a few levels and map ASTs at the Oot Core level anyhow (well, that's not quite true, since Oot code, not just Oot implementation code, is being compiled into OVMlow; but it's true that mapping from an AST will get more reasonable results than linearizing and then mapping that). And if you wanted to make stuff efficient, you'd probably go up to at least the level of OVMhigh (although there i'm not sure.. there do seem to be opportunities at the level of OVMlow, which is like RPython and has static typing -- but perhaps these would be better suited to an AST of an HLL that compiles to Boot, rather than to the linear instructions of an interpreted OVMlow, anyways).

so what we are looking at here is maybe an alternation:

So, regarding porting:

---

for boxed OVM values, is everything a promise/thunk? Or can you have a boxed value that is not a thunk?

Should we have the 4th addr mode bit indicate 'boxed'?

What about something that is already a struct and therefore what goes in the actual local variable register is a pointer? I guess 'boxed' indicates that it's to be treated as a reference type, not a value type (and perhaps also that it's a thunk).

And what about copy-on-write-ness? That's something that applies to value types, not reference types, but it involves some degree of boxing.

So should we have the 4th addr mode bit indicate 'boxed'? Or 'copy-on-write'? Or value vs. reference type? Or thunk-ness? Or nullable type? If only one of these (probably 'boxed'), then how do we indicate the others? I guess they should be 'in the box', that is, part of the generic box representation wrapper.

---

if we use fat pointers, then how should that work? My proposal:

4 x pointer width (which must be at least 16 bits):

alternative:

Or just make it (4 x pointer width + 16 bits) and do both:
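
a guess at what the '(4 x pointer width + 16 bits)' version could look like, stitching together fields mentioned elsewhere in these notes (base+offset references, a capacity/length, a type pointer, and 16 bits of flags); entirely provisional, since the field lists above were left open:

  #include <stdint.h>

  typedef struct TypeDesc TypeDesc;

  typedef struct {
      void           *base;     /* base address of the allocation */
      uintptr_t       offset;   /* in lieu of internal pointers */
      uintptr_t       length;   /* element count / capacity */
      const TypeDesc *type;     /* type of the referent */
      uint16_t        flags;    /* the extra 16 bits: boxedness, COW, etc. */
  } FatPtr;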

---

i guess enforcing bounds, the stuff about privileges, and stuff like nullable types in boxed representations are 'anti-services' and so should go in the OVMhigh layer only.

---

the current plan for the smallstack capacity is:

or another plan could be:

the current plan for the smallstack capacity is:

i'm thinking that OVMhigh would just have a single memory stack, not two, because otherwise you have to allocate two stacks which seems annoying. Interestingly, this didn't come up in Boot, b/c the stacks there were of a small, fixed size. So perhaps fixing the size of the stacks is why i came up with this weird idea of having one stack for each primitive type.

OVMhigh could still use two stacks if it wanted to use them to pass arguments, i guess.

eh, just leave it as two stacks. It's not much worse. But two smallstacks or big ones? I'm guessing two smallstacks.

---

.net has a type 'transient ptr' which may only be used within the body of a single method, and points to a value (that will not be moved by the GC) in unmanaged memory. These are only generated internally by the CLI, not created by the user. They can be used for interop to pass to unmanaged code. [10]

---

if we exploit the idea that you usually allocate powers-of-two-sized memory regions, then you can store array capacities in smaller spaces in fat pointers (by storing the log2 of the capacity).
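
a tiny sketch of that encoding (capacities get rounded up to the next power of two):

  #include <stdint.h>

  /* store the log2; one byte covers any practical capacity */
  static uint8_t encode_cap(uintptr_t cap) {
      uint8_t log2cap = 0;
      while (((uintptr_t)1 << log2cap) < cap) log2cap++;
      return log2cap;
  }

  static uintptr_t decode_cap(uint8_t log2cap) {
      return (uintptr_t)1 << log2cap;
  }

  /* e.g. encode_cap(4096) == 12, and decode_cap(12) == 4096 */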

---

so i was thinking that we might have a few different kinds of pointer-like things, like raw pointers (which can only be manipulated by OVMlow) and fat array pointers (which can be manipulated by OVMhigh); and it seems that yeah, this is what other languages do:

[11] says that Go slices are basically fat array pointers. It also says that

"Go famously has pointers, including internal pointers, but not pointer arithmetic. You can take the address of (nearly) anything, but you can’t make that pointer point at anything else, even if you took the address of an array element. Pointer arithmetic would undermine Go’s type safety, so it can only be done through special mechanisms in the unsafe package."

Note: the 'nearly' link sorta says that in Go you can take an address of:

but not of:

The HN discussion points out that in these other languages, slices are also fat pointers:

One guy said, yeah, but what else could slices be? And the answer was, well you could have a pointer to a struct that has the fat ptr info and the content (or a pointer to the content):

"For instance traditional "oo" languages don't usually use fat pointers for dynamic dispatch, you have a single pointer to an instance which holds a pointer to its vtable. In Rust however, the "object pointer" is a fat pointer of (vtable, instance). Go's interface pointers are the same (which combined with nils leads to the dreaded "typed nil" issue)."

Also in both Rust and Go, interfaces are fat pointers:

And someone else pointed out another way to do slices: "Another way to implement fat pointers is to keep the pointer representation the same but have another structure like an interval tree that stores a map of pointers to the extra information on bounds. That's how I implemented Bounds Checking GCC back in the day: https://www.doc.ic.ac.uk/~phjk/BoundsChecking.html It has the advantage that pointers remain compatible with code that wasn't specially compiled to know about fat pointers (such as system libraries)." [13]

---

millstone 16 hours ago

Fat pointers + threads permit torn reads/writes, leading to memory unsafety. It seems weird that Go allowed memory unsafety for this narrow use case - AFAIK only slices and interfaces are fat pointers.


DougBTX 12 hours ago

There's some discussion about unsafety in the face of concurrency from Rob Pike here: https://talks.golang.org/2012/splash.article#TOC_13.

> There is one important caveat: Go is not purely memory safe in the presence of concurrency. Sharing is legal and passing a pointer over a channel is idiomatic (and efficient).

> Some concurrency and functional programming experts are disappointed that Go does not take a write-once approach to value semantics in the context of concurrent computation, that Go is not more like Erlang for example. Again, the reason is largely about familiarity and suitability for the problem domain. Go's concurrent features work well in a context familiar to most programmers. Go enables simple, safe concurrent programming but does not forbid bad programming. We compensate by convention, training programmers to think about message passing as a version of ownership control. The motto is, "Don't communicate by sharing memory, share memory by communicating."


the8472 12 hours ago

If you need thread-safety you must at least use atomics anyway. Either your platform has transactional memory or double-wide CAS, or you need to fall back to locks.

---

some usages of tagged values in various languages: " For instance, Erlang uses first two bits of a machine word to differentiate between objects on heap (boxed values), lists, and immediates which use next two bits to further differentiate between small integers, ports, pids, etc. It’s important to understand here that those bits are tags, not types. In other words, if we have a user type foo we can’t reconstruct it using the tags. Erlang does have types in the form of type specifications but they are not used by the compiler. ... Functional languages with static type systems such as Haskell or OCaml are a bit of both. They do reserve the first bit (or two) to distinguish between boxed and unboxed values, and the values also store information (as a number) about the constructor they were created with. However, in general, the values at run-time do not store any information about their originator type. " [14]

---

i guess, since we are making something where OVMhigh gets preempted but at the OVMlow level there is no preemption (by our own greenthread system, at least), only cooperative multitasking, we should also have a construct that marks a segment of code as non-preemptible, so that this automatic preemption doesn't apply there.

that marking construct must itself be OVMlow, b/c to allow OVMhigh code to block preemption would be unsafe in the sense that it violates our guarantee of preemption.

---

i think the places for the OVMhigh -> OVMlow compiler to put the preemption/yield statements (or at least the statements that count cycles or instructions and yield whenever that count passes some threshold) are probably:

this is maybe less careful than Erlang's BEAM VM (which i've heard increments an abstract measure of computational time every instruction?), but it may be more careful than Golang's (which i've heard never preempts in a tight purely computational loop?, which must be galling when there is an unexpected hang in one OS thread due to a tight infinite loop in user code).

The sloppiness of not doing it at every instruction is okay because we're not concerned about realtime performance (or even soft realtime, i guess?), rather we're only concerned that we always preempt in finite time (although we try to do at least a little better than that... :).
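
a sketch of the check the compiler would insert at those points (names are illustrative; the counter would really be per-green-thread state, not a global):

  #define FUEL_QUANTUM 1000   /* instructions/cycles between preemption points */

  extern void scheduler_yield(void);  /* hypothetical: switch green threads */

  static int fuel = FUEL_QUANTUM;

  static inline void preempt_check(void) {
      if (--fuel <= 0) {
          fuel = FUEL_QUANTUM;
          scheduler_yield();
      }
  }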

---

https://en.m.wikipedia.org/wiki/Type-length-value

---

what should we do if the host language supports arbitrary-precision numbers? Don't we want to be able to interop with that?

i think we should not support them. Only 32-bit or 64-bit. We can have libraries that support them, though (just as with unicode).

---

should we have Pepsi style external blocks? i think not. after all, we have a zillion very different targets.

---

Alice VM / SEAM came after CLI; how is it different in terms of tracking pointers so that the user program can't forge pointers out of integers? Is there a simpler way to do this? All we really need is a way to know what is a reference versus a non-reference on the stack and in items in the heap, right?

---

In OVM I'm leaning towards saying that 64-bit ints or floats take up two local variable spots. I'm leaning towards having ints and floats in the same registers but separating references. Also, to reduce state, I'm leaning towards having ints and floats in the same registers in Boot also.
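
a sketch of the two-slot convention (the low-word-first order is just an assumption):

  #include <stdint.h>

  static uint64_t read_i64(const uint32_t *locals, int slot) {
      return (uint64_t)locals[slot] | ((uint64_t)locals[slot + 1] << 32);
  }

  static void write_i64(uint32_t *locals, int slot, uint64_t v) {
      locals[slot]     = (uint32_t)v;
      locals[slot + 1] = (uint32_t)(v >> 32);
  }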

---

In OVM, reserve a register/variable to be used as a temporary by the macro compiler from OVMhigh to OVMlow, for loading in variables past 255. Also reserve the first 8 for type-variable references, in case we want to use 12 bits of a 16-bit operand to indicate these.

---

In OVM, the stacks have capacity 32, 16 of which is reserved for the macro compiler. But note that the macro compiler maintains the invariant that any routine which uses these stack locations pops them before the end of that routine -- therefore these stack locations can in theory be compiled away, and therefore, unless a context switch could happen in the middle of these macro routines, space for these stack locations does not have to be allocated as thread state.

(note: i'm thinking of changing it back to 16 from 32 tho, and prohibiting OVMhigh from using the smallstacks)

---

OVMhigh has no undefined behavior. This is necessary for pointers-as-capabilities; if there were undefined behavior then possibly on some implementations a program would be able to forge pointers.

One example of undefined behavior that must be prevented is data races (at least, on platforms which specify that data races are undefined behavior, for example C specifies that) (recall the distinction between race conditions and data races; see link in plPartConcurrency if you forgot). This means we must be somewhat conservative in our concurrency, because it must not be possible for the OVMhigh program to contain a data race.

The Bartosz Milewski methodology will probably be useful here (e.g. https://nwcpp.org/talks/2009/Ownership_Systems_against_Data_Races.pdf ). I think the upshot is that data is either immutable, unique (movable but not copyable; not aliasable), thread-local, or protected by an 'ownership'-based automatic locking system.

---

---

FORTRAN's arithmetic IF:

"

tjalfi 8 hours ago [-]

Arithmetic IF directly matches a hardware instruction on the IBM 704.

This was the first computer with a Fortran compiler.

IF (VALUE) 10, 20, 30

The three arguments after the value are labels that the program jumps to.

The equivalent of this line in C is something like this:

  if (value < 0)
  {
    goto 10;
  }
  else if (value == 0)
  {
    goto 20;
  }
  else
  {
    goto 30;
  }

"

so it's like a branch instruction with 4 operands.

---

ok so OVMhigh should not have catch-fire undefined behavior at all, and furthermore, it should not allow data races to manufacture pointers.

following http://altair.cs.oswego.edu/pipermail/memory-model-design/2018-June/000085.html , we give the following guarantees:

---

things to annotate:

variables/values at a point
points in code statically
points in code dynamically?
regions of code lexically
regions of code dynamically

---

" Swift's exceptions (which are implemented more like Result and less like unwinding) have the error type always boxed. The caller initializes the "swift error" register to 0, and if there's an exception the callee sets that register to hold the boxed error's pointer. This makes error propagation really fast (just don't change the register), and also doesn't require a Result to actually be materialized in the success case (avoid a ton of copies). Sadly this doesn't transfer over to Rust well, so they can't easily use the native Swift machinery that was added to llvm (swiftcc). "

---

" Swift reserves a callee-preserved register for a method's self argument (pointer) to make repeated calls faster. Cool? "

level 2 simon_o 5 points · 2 days ago

Do people here have an opinion on that? As far as I know, LuaJIT? does the same.

Are there any reasons not to do this?

level 3 ubsan 1 point · 1 day ago

One reason might be that you can do, if you have something like f(g(h(x))), under the sysv ABI, you can do something like (iirc):

  mov rax, [x]
  call h
  call g
  call f

---

so what we are looking at here is maybe an alternation:

So, regarding porting:

What should the "C-like" language be called? How about "Foot"?

no, actually i think that's kinda pointless. Why flatten the compiled Oot code at the OVMhigh step just to have an AST at the OVMlow step? Maybe OVMhigh should have an AST but not OVMlow. Maybe there should be a HLL that compiles to OVMlow, to make it easy to write the OVMhigh implementation, but that's not in the flow of compiled Oot code.

so it seems that all we've accomplished is to eliminate BootX; we haven't managed to eliminate either OVMlow or OVMhigh. Although we did maybe come up with a snazzy name for one of them, or for an HLL targeting one of them ('foot').

Maybe OVMhigh should be an AST though. Hmm... yeah that makes a lot of sense.

---

so the current plan is

For porters:

Notes on language technology:

Some remaining questions:

update: apparently some ppl think 'foot' is gross. So find a different name. Maybe 'LoVM'? Or mb the HLL itself is called Lo or Lolang, and its bytecode (wordcode) representation is LoVM. Or mb 'lowvm' and 'lowlang', but then we lose the connection to OVM which is the next level up.

---

memory allocator ideas:

---

see also ootVm.txt; that file is older and needs to be integrated into this series