see also ootVm.txt; that file is older and needs to be integrated into the list in the following section

---

so i think some stuff that OVM should implement is:

tail calls
management of activation frames/stack frames (in some way) (e.g. save them on the heap when they need to escape for closures)
- so do we introduce the confining stack abstraction here? i think so but am not sure
"a 'closure' instruction which handles all of the issues related to nested lexical scoping, and the management of activation frames" or something
serialization/pickling (see SEAM and their graph store (SEAM Abstract Store) for purposes of GC and pickling; for each bit of code they store it both as "Alice Abstract Code (so that it can be pickled), and also as compiled code (either compiled to bytecode or to native code). Compiled code is represented as binary 'chunk' nodes in the SEAM Abstract Store; since these are opaque to SEAM, the garbage collector can't tell which other nodes they reference, therefore binary code is wrapped by a pair node which contains (points to) both a binary chunk node and also to an 'immediate environment' which is a node holding references to other nodes that are referred to by the binary code. Binary code is position-independent in anticipation of being moved around by the garbage collector.")
- note: this layer can't pickle functions by itself, because that requires some sort of programming-language representation of code; however, it offers facilities for other layers to do so: it can be extended with custom node types, so the HLL can define some sort of bytecode or AST representation, and the OVM backend/platform can define native code blobs. So, assuming that the HLL does that, functions can be pickled
- note: one of the important functions here is to serialize the (probably cyclic) graph of pointers in stuff in memory
GC/memory management, read/write barriers
- ownership, e.g. distinction b/t 'move' and 'copy'
- aliases, value types vs reference types ?
- finalizers, 'resources' (in the SEAM sense of wrapping of external objects that cannot be fully reified inside the VM's object graph)
register assignment (on the backend)
arrays, aggregates, dicts, linked lists/S-expressions
- and interop with native data structures
something like futures (see SEAM transients)
concurrency: spawning, synchronization and communication
- see L4
- see Golang (channels and Goroutines)
- see Erlang
scheduler (note: interacts with I/O because knows not to reschedule a thread waiting on I/O until the I/O completes; also must support nonblocking I/O)
- see L4
- see Python event loops
- since the HLL implementation on the layer above is trusted, we can use cooperative multitasking; the HLL must implement pre-emption if desired
bounds-checking
capabilities/sandboxing
- however, the HLL implementation on the layer above is trusted; this is a service to make it easy for the HLL to implement capabilities, not a sandbox of the HLL implementation itself; does this mean we have an 'ambient authority/ring 0' notion so the HLL can turn off bounds-checking and debug secrets and the like?
- guaranteed private methods and fields in OOP objects
- seal/unseal
- inability to create a reference unless you are given it (e.g. cannot cast an integer to a reference)
virtualization
calling conventions
- interop; glue with calling to/from platform, and with platform globals and other stuff
do we need to support any OOP stuff?
polymorphism? parametric or ad hoc?
expanded set of primitives (e.g. map, reduce); see ootAssemblyOps*
first class functions? suspended computations (is this different from futures)?
package management/binary module stuff
- dynamic loading/importing/linking/hotpatching of modules (mb call them 'components')
runtime
- constant tables
parsing and/or regexps?
direct compilation should be possible even to some platforms on which Boot can only be interpreted (because Boot uses unrestricted indirect jumps to implement a call stack), provided that these platforms offer enough flexibility (tail calls, first-class continuations, closures, lookup tables, activation records on the stack, etc) to express the sort of control flow that we need
i guess it has to have structured control flow constructs so that it can support transpilation to native platforms. Should look at Haxe ( https://community.haxe.org/t/how-to-implement-new-haxe-target/1599/4 ).
copy-on-write
object model
laziness?
promises
distributed message-passing/mailboxes/calls and 'eventual send'
channels
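The serialization item in the list above mentions flattening a (probably cyclic) graph of pointers. A minimal Python sketch of the usual technique, assuming a toy hypothetical `Node` type whose only pointers are its `children`: give each node an integer ID the first time it is reached and write pointers as IDs, so cycles and sharing survive the round trip. This is not SEAM's actual pickle format.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical heap node: an opaque tag plus a list of outgoing pointers."""
    tag: str
    children: list = field(default_factory=list)


def serialize(root):
    """Flatten a possibly cyclic graph into (tag, [child ids]) records.

    Each node gets an integer ID the first time it is reached; pointers are
    then written as IDs, so shared and cyclic references survive.
    """
    ids = {}    # id(node) -> assigned index
    order = []  # nodes in index order; order[0] is the root
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in ids:
            continue
        ids[id(node)] = len(order)
        order.append(node)
        stack.extend(reversed(node.children))
    return [(n.tag, [ids[id(c)] for c in n.children]) for n in order]


def deserialize(records):
    """Rebuild the graph: allocate every node first, then patch in the pointers."""
    nodes = [Node(tag) for tag, _ in records]
    for node, (_, child_ids) in zip(nodes, records):
        node.children = [nodes[i] for i in child_ids]
    return nodes[0]


# Usage: a two-node cycle a -> b -> a.
a = Node("a")
b = Node("b", [a])
a.children.append(b)
records = serialize(a)  # [("a", [1]), ("b", [0])]
rebuilt = deserialize(records)
```

The two-pass rebuild (allocate everything, then fill in pointers) is what lets back-references to not-yet-rebuilt nodes work.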

todo: read more about the OOP parts of JVM and CIL
todo: read about Cmm/C-- runtime support
todo: update Oot todos

as you can see, the purpose of OVM is coming into focus:

services e.g. GC that i think are annoying/administrative rather than 'cool semantics'
- note: it should be possible to override these services with their platform-native equivalents (e.g. in Python or Java, use Python's or Java's GC if possible). Note that this means that an Oot program can't depend on their exact semantics, since these will vary by platform
code generation stuff that i think is annoying (eg register allocation)
primitives
maybe some optimization?
interop

but nothing here should be very dynamic/metaprogrammable/dynamically typed (remember the lesson from PyPy), so this is different from Oot Core.

---

Instruction format (128-bit):

1 x 16-bit opcode 3 x (16-bit operand + 8-bit addr mode) 1 x (32-bit operand + 8-bit addr mode)

16 bytes

Note that there is no space for encoding length format bits here -- UNLESS you reduced the opcode by 5 bits to 11 bits. Which isn't crazy. So maybe:

5 encoding length format bits 1 x 11-bit opcode 3 x (16-bit operand + 8-bit addr mode) 1 x (32-bit operand + 8-bit addr mode)

16 bytes

We could also have a 64-bit encoding format:

4 encoding length format bits + 2 RESERVED bits + 10-bit opcode + 4 x (8-bit operand + 4-bit addr mode)
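The bit budgets above do add up (16 + 3*24 + 40 = 128, and 4 + 2 + 10 + 4*12 = 64). A minimal sketch of packing and unpacking these layouts, assuming fields are packed most-significant-first in the order listed and the word is stored big-endian; both choices, and the field names, are assumptions rather than a settled spec.

```python
def fields_128():
    """5 length-format bits, 11-bit opcode, 3 x (16-bit operand + 8-bit mode), 1 x (32 + 8)."""
    fs = [("fmt", 5), ("opcode", 11)]
    for i in range(3):
        fs += [(f"op{i}", 16), (f"mode{i}", 8)]
    fs += [("op3", 32), ("mode3", 8)]
    return fs


def fields_64():
    """4 length-format bits, 2 reserved bits, 10-bit opcode, 4 x (8-bit operand + 4-bit mode)."""
    fs = [("fmt", 4), ("reserved", 2), ("opcode", 10)]
    for i in range(4):
        fs += [(f"op{i}", 8), (f"mode{i}", 4)]
    return fs


def encode(fields, values):
    """Pack named field values into one integer, first-listed field in the top bits."""
    word = 0
    for name, width in fields:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word


def decode(fields, word):
    """Inverse of encode: peel fields off from the least-significant end."""
    out = {}
    for name, width in reversed(fields):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


# The budgets add up.
assert sum(w for _, w in fields_128()) == 128
assert sum(w for _, w in fields_64()) == 64

insn = {"fmt": 0b11111, "opcode": 0x123,
        "op0": 1, "mode0": 0, "op1": 2, "mode1": 0, "op2": 3, "mode2": 0,
        "op3": 0xDEADBEEF, "mode3": 1}
word = encode(fields_128(), insn)
raw = word.to_bytes(16, "big")  # exactly 16 bytes
```

Putting the length-format bits first means a decoder can read them before it knows how long the instruction is.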

---

i dunno man, this seems like a lot of work to implement in assembly language.

also, what about the idea of incremental implementation? how is the porter going to be able to just implement the bare VM and use existing implementations of most of the opcodes?

i guess some of these things could be compile-time macros (eg hash tables).

but i feel like we really have a couple of levels here.

maybe some/many of these things would be 'standard library functions' at the OVM level (eg hash tables).

hmm, that makes a lot more sense to me. So we would specify a smaller core VM, which has to actually be implemented, and then a large standard library. And then porters would have to implement the VM, and then for better interoperability and efficiency on a given platform they could go ahead and incrementally override parts of the standard library with platform-specific stuff.

another issue is static typing. There's a tension here:

if this is to be a language designed for IMPLEMENTING another language, like PyPy's RPython, then it has to be statically typed
if this is to be a VM for a RUNTIME of another language, and then language on top is dynamic and supports distributed 'mobile code' and hotswapping/'live coding', then it has to support dynamicity

I think the solution is: (a) some of the dynamic stuff will be in the layer above (in the implementation of Oot Core on top of OVM) (b) there is some dynamic stuff at this level but it is easy to tell from looking at each instruction if it has any dynamicity. For example, if we use my idea for 'polymorphic assembly' by way of a type operand, then instructions whose type operands are constant mode are statically typed. This means that OVM code that is composed only of non-dynamic instructions can be efficiently compiled. And the language implementation itself will be like that.

Still, this suggests that maybe we are trying to do too many things at once.

Should we have one layer for a 'language implementing language', and then OVM is a 'runtime VM' implemented in that language? The problem with that may be that the 'runtime VM' has to support funky control flow like delimited continuations, so we don't want the language implementing language to impose and abstract away something like a restrictive call chain/stack abstraction, because then it seems like we have another interpreter-on-top-of-an-interpreter layer. But i'm not sure i fully understand that part, so that objection could be invalid. todo/hmmm.

My immediate thoughts are that Oot itself may be the 'language implementing language' that the reference implementation is ultimately written in. So when i say 'it's a lot of work to write this in assembly' that's not relevant, because the Oot implementation will be compiled to Boot, we don't have to write it in Boot directly (except maybe once to bootstrap). But is this really true? I don't expect our compiler to be smart enough to write efficient Boot code for things like context switches in the scheduler.

And, in any case we actually want the runtime VM to have the property that it supports dynamic typing yet you can easily identify code with static typing, because this will help JIT'ing, compilers, etc. This is certainly helpful for efficient compilation of a self-hosting implementation of Oot itself, but it'll be helpful for user code as well, because users will be able to write statically typed Oot code, we can use the ordinary toolchain to compile that to OVM, and then the toolchain will be able to recognize that the OVM code is statically typed and compile it down further rather than interpreting it.

---

so here's the design i'm coming up with. It seems odd to me, in that i don't think i've heard of it being done this way before, but it seems to satisfy the considerations noted in the previous section:

OVM is a VM with opcodes.

Some of the opcodes are primitive. A porter has to implement these. For example, BEQ.

The opcodes which are not primitive are 'standard library functions'. These have implementations provided in the language of OVM, in terms of primitive functions (or other library functions; but there are no circular dependencies between library functions, except for self-referencing recursion (a function can call itself)). For example, hashtable_get(table, key). A porter can start out not implementing these and then implement them incrementally later on to improve interoperation and efficiency.
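A minimal sketch of the primitive/library split from a porter's point of view, assuming a hypothetical name-keyed opcode table. The op names (`add`, `mul`, `square`, `sum_of_squares`) are illustrative stand-ins, not real OVM opcodes; a real `hashtable_get` would follow the same pattern. Library ops are defined only in terms of other ops, and a porter can later replace any of them with a native version.

```python
from typing import Callable, Dict

Op = Callable[..., int]


class OVM:
    """Hypothetical opcode table: porter-supplied primitives plus overridable library ops."""

    REQUIRED_PRIMITIVES = ("add", "mul")

    def __init__(self, primitives: Dict[str, Op]):
        missing = [p for p in self.REQUIRED_PRIMITIVES if p not in primitives]
        if missing:
            raise ValueError(f"porter must implement primitives: {missing}")
        self.ops: Dict[str, Op] = dict(primitives)
        # Default library ops, written only in terms of other ops. They look
        # up their callees through self.op() at call time, so overriding a
        # lower-level op also affects every library op built on top of it.
        self.ops["square"] = lambda x: self.op("mul", x, x)
        self.ops["sum_of_squares"] = lambda x, y: self.op(
            "add", self.op("square", x), self.op("square", y))

    def override(self, name: str, impl: Op) -> None:
        """Swap in a platform-native implementation of a library op."""
        self.ops[name] = impl

    def op(self, name: str, *args: int) -> int:
        return self.ops[name](*args)


# Usage: a porter supplies only the primitives at first...
vm = OVM({"add": lambda a, b: a + b, "mul": lambda a, b: a * b})
first = vm.op("sum_of_squares", 3, 4)

# ...and later overrides one library op with a native version.
native_calls = []


def native_square(x: int) -> int:
    native_calls.append(x)
    return x * x


vm.override("square", native_square)
second = vm.op("sum_of_squares", 3, 4)
```

Late binding through the table is one way to get "override incrementally" for free; the no-circular-dependencies rule above is what keeps the defaults well-founded.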

Some of the opcodes, including some (but probably not all) of the primitive ones, and some but not all of the standard library ones, are (what's the word for this? secured? protected? guarded? fenced? barricaded? shielded? defended? prohibited? restricted? controlled? access-controlled? let's say 'unsafe') in a protection-ring security-model sense. If we are given some untrusted user code to run, we had better scan through it and make sure it doesn't contain any of these opcodes (alternatively, the OVM could have a special mode where it faults on privileged instructions). For example, array_get_noboundscheck(array, location).

Standard library opcode implementations can call unsafe opcodes.

Some of the opcodes can sometimes or always be 'dynamic' (the others are called 'static'). This may make them difficult to statically and efficiently compile to some targets. It is possible to determine whether each instruction instance is static or dynamic just by syntactically inspecting it. For example, 'ADD' (polymorphic addition) is dynamic when its type operand is not a constant.
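A sketch of that syntactic check, assuming a hypothetical instruction representation in which every operand carries an addressing mode and polymorphic instructions mark which operand is the type operand. The names `Mode`, `Operand`, `Insn` and the type ID are made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence, Tuple


class Mode(Enum):
    CONST = "const"        # immediate constant in the instruction
    REGISTER = "register"  # value read from a register at run time


@dataclass(frozen=True)
class Operand:
    mode: Mode
    value: int


@dataclass(frozen=True)
class Insn:
    opcode: str
    operands: Tuple[Operand, ...]
    type_operand: Optional[int] = None  # index of the type operand, if polymorphic


def is_static(insn: Insn) -> bool:
    """Purely syntactic: no type operand at all, or one in constant mode."""
    if insn.type_operand is None:
        return True
    return insn.operands[insn.type_operand].mode is Mode.CONST


def all_static(code: Sequence[Insn]) -> bool:
    """Code made only of static instructions can be compiled ahead of time."""
    return all(is_static(i) for i in code)


INT = 1  # hypothetical type ID
r0, r1, r5 = (Operand(Mode.REGISTER, n) for n in (0, 1, 5))
static_add = Insn("ADD", (Operand(Mode.CONST, INT), r0, r1), type_operand=0)
dynamic_add = Insn("ADD", (r5, r0, r1), type_operand=0)  # type comes from r5
```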

The Oot implementation is written in terms of only static instructions, and can freely use unsafe opcodes.

User code that contains only static opcodes can be most efficiently compiled.

User code that is untrusted cannot contain unsafe opcodes. However, it can contain safe standard library opcodes which are themselves implemented using unsafe opcodes.

This design should:

allow trusted, static code (such as the Oot toolchain) to be efficiently compiled to code that doesn't waste time bounds-checking, etc, and that can do anything (including serialize stuff, debug secrets, etc)
allow untrusted code to run in a sandboxed fashion (where bounds-checking is guaranteed, privacy restrictions are enforced, etc)
allow porters to start by implementing a small subset of the opcodes (the primitives) and add the rest incrementally (or never)
allow static user code to be compiled
not introduce another language layer or another layer of interpreter dispatch

Should we partition the opcode address space to make it easy to recognize unsafe and primitive and static opcodes? Yeah, why not, we have 16 bits.
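One hypothetical way to partition the space, assuming the 16-bit opcode field: reserve the top three bits as flags. Recognizing an unsafe, primitive, or possibly-dynamic opcode is then a single mask test, and the scan of untrusted code described earlier is a loop over those tests. The bit positions and opcode numbers are made up for illustration.

```python
# Hypothetical partition of a 16-bit opcode space: the top three bits are flags.
UNSAFE_BIT = 1 << 15         # unsafe: only trusted code may contain it
PRIMITIVE_BIT = 1 << 14      # primitive: every porter must implement it
MAYBE_DYNAMIC_BIT = 1 << 13  # some instances may be dynamic (see the type operand)


def is_unsafe(opcode: int) -> bool:
    return bool(opcode & UNSAFE_BIT)


def is_primitive(opcode: int) -> bool:
    return bool(opcode & PRIMITIVE_BIT)


def may_be_dynamic(opcode: int) -> bool:
    return bool(opcode & MAYBE_DYNAMIC_BIT)


def check_untrusted(opcodes) -> None:
    """Reject untrusted code that contains any unsafe opcode.

    Safe library opcodes pass even if their own (trusted) implementations
    use unsafe opcodes internally.
    """
    for pc, op in enumerate(opcodes):
        if is_unsafe(op):
            raise PermissionError(f"unsafe opcode {op:#06x} at instruction {pc}")


# Illustrative assignments (hypothetical numbering).
BEQ = PRIMITIVE_BIT | 0x001
ADD = PRIMITIVE_BIT | MAYBE_DYNAMIC_BIT | 0x002
HASHTABLE_GET = 0x003
ARRAY_GET_NOBOUNDSCHECK = UNSAFE_BIT | PRIMITIVE_BIT | 0x004
```

This leaves 13 bits (8192 opcodes) within each flag combination; with the 11-bit opcode of the 5-format-bit layout, it would leave only 8 bits (256).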

I'm thinking that memory management would work like this: there are primitive operations that do stuff (like loading, storing, copying) without managing memory, and primitive operations that do things like incrementing/decrementing reference counts, and reading/writing through read/write barriers. Some or all of loading/storing/copying directly without memory management is unsafe, and messing with reference counts is unsafe. Then memory-aware variants of stuff like loading, storing, copying is provided, and untrusted code (or portable user code) uses that.
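A minimal sketch of that layering, assuming plain reference counting (read/write barriers are left out) and a toy hypothetical heap in which every field holds an object ID or `None`. The unsafe primitives do no memory management; the safe `store_ref` is composed only from them.

```python
class Heap:
    """Toy refcounted heap. Assumption: every field holds an object id or None."""

    def __init__(self):
        self.objects = {}   # object id -> {field name: object id or None}
        self.refcount = {}  # object id -> reference count
        self.freed = []     # ids freed so far, in order
        self.next_id = 1

    def alloc(self, **fields):
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = dict(fields)
        self.refcount[oid] = 1  # the caller holds the first reference
        return oid

    # --- unsafe primitives: no memory management at all ---
    def raw_load(self, obj, name):
        return self.objects[obj][name]

    def raw_store(self, obj, name, value):
        self.objects[obj][name] = value

    def incref(self, obj):
        if obj is not None:
            self.refcount[obj] += 1

    def decref(self, obj):
        if obj is None:
            return
        self.refcount[obj] -= 1
        if self.refcount[obj] == 0:
            del self.refcount[obj]
            fields = self.objects.pop(obj)
            self.freed.append(obj)
            for child in fields.values():  # release what the dead object pointed to
                self.decref(child)

    # --- safe, memory-aware store built only from the unsafe primitives ---
    def store_ref(self, obj, name, new):
        """incref before decref, so overwriting a field with its current value is harmless."""
        old = self.raw_load(obj, name)
        self.incref(new)
        self.raw_store(obj, name, new)
        self.decref(old)


h = Heap()
box = h.alloc(slot=None)
a = h.alloc()
h.store_ref(box, "slot", a)  # a: 1 (ours) + 1 (box) = 2
h.decref(a)                  # drop our reference; box keeps a alive
h.store_ref(box, "slot", a)  # self-overwrite: a stays at 1
b = h.alloc()
h.store_ref(box, "slot", b)  # a drops to 0 and is freed
```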

---

i guess OVM should have some optional instructions that some platforms implement and others don't, and that are not provided by macros/stdlib on platforms that don't implement them.

For example, unicode stuff: embedded platforms usually won't support this because it requires a lot of data, but desktop platforms usually will. We want the HLL to be able to use the platform unicode stuff if it is there because interop, but otherwise the HLL must do without.

---

some notes from RPython [1]:

"Note that there are tons of special cased restrictions that you’ll encounter as you go. The exact definition is “RPython is everything that our translation toolchain can accept” :)"

ok that's crazy

"variables should contain values of at most one type as described in Object restrictions at each control flow point, that means for example that joining control paths using the same variable to contain both a string and a int must be avoided."
"all module globals are considered constants. Their binding must not be changed at run-time. Moreover, global (i.e. prebuilt) lists and dictionaries are supposed to be immutable: modifying e.g. a global list will give inconsistent results. However, global instances don’t have this restriction, so if you need mutable global state, store it in the attributes of some prebuilt singleton instance."
"for loops restricted to builtin types"
"generators very restricted."
"range does not necessarily create an array, only if the result is modified"
"run-time definition of classes or functions is not allowed."
"generators are supported, but their exact scope is very limited. you can’t merge two different generator in one control point."
"exceptions fully supported"
integer, float, boolean work
strings: "a lot of, but not all string methods are supported and those that are supported, not necesarilly accept all arguments. Indexes can be negative. In case they are not, then you get slightly more efficient code if the translator can prove that they are non-negative. When slicing a string it is necessary to prove that the slice start and stop indexes are non-negative. There is no implicit str-to-unicode cast anywhere. Simple string formatting using the % operator works, as long as the format string is known at translation time; the only supported formatting specifiers are %s, %d, %x, %o, %f, plus %r but only for user-defined instances. Modifiers such as conversion flags, precision, length etc. are not supported."
tuples: "no variable-length tuples...Each combination of types for elements and length constitute a separate and not mixable type."
lists: "lists are used as an allocated array....if you use a fixed-size list, the code is more efficient....Negative or out-of-bound indexes are only allowed for the most common operations..."
dicts: "dicts with a unique key type only, provided it is hashable"
"sets are not directly supported in RPython. Instead you should use a plain dict and fill the values with None. Values in that dict will not consume space."
"function declarations may use defaults and *args, but not keywords."
"function calls may be done to a known function or to a variable one, or to a method....If you need to call a function with a dynamic number of arguments, refactor the function itself to accept a single argument which is a regular list."
"A number of builtin functions can be used. The precise set can be found in rpython/annotator/builtin.py (see def builtin_xxx())....int, float, str, ord, chr... are available as simple conversion functions. Note that int, float, str... have a special meaning as a type inside of isinstance only."
- note: the current set is: range, enumerate, reversed, hasattr, zip, min, max. Plus conversions: bool, int, float, chr, unichr, bytearray, tuple, list
"methods and other class attributes do not change after startup...single inheritance is fully supported...classes are first-class objects too"
"The only special methods that are honoured are __init__, __del__, __len__, __getitem__, __setitem__, __getslice__, __setslice__, and __iter__. To handle slicing, __getslice__ and __setslice__ must be used; using __getitem__ and __setitem__ for slicing isn’t supported. Additionally, using negative indices for slicing is still not support, even when using __getslice__. Note that the destructor __del__ should only contain simple operations; for any kind of more complex destructor, consider using instead rpython.rlib.rgc.FinalizerQueue."
"Exceptions are by default not generated for simple cases...Code with no exception handlers does not raise exceptions...By supplying an exception handler, you ask for error checking. Without, you assure the system that the operation cannot fail. This rule does not apply to function calls: any called function is assumed to be allowed to raise any exception...Exceptions explicitly raised or re-raised will always be generated"

---

my conclusions from the previous section:

we should do:

variables have a single type
module globals are clearly marked as immutable or not; and the only type of mutable module global is a reference (whose dereferenced value might mutate, not the reference itself)
some sort of for-loop iteration but only over built-in types
generators, but very restricted: you can't merge two different generators in one control point
exceptions
int, float, bool
strings (ASCII for us)
no negative indices/slice indices
a primitive printf-ish thing
tuples are fixed length, their fields are typed
fixed-length lists are typed as such
dicts
no sets
no variadic functions
range, enumerate, reversed, hasattr, zip, min, max
bool, int, float, chr, unichr, bytearray, tuple, list
here are the magic methods that should be used at some level: __init__, __del__, __len__, __getitem__, __setitem__, __getslice__, __setslice__, __iter__
find out what RPython means by "simple operations" in [2]
functions/classes/methods can't be defined or modified at runtime
single inheritance; classes are first-class objects
slicing only via __getslice__/__setslice__ (not __getitem__/__setitem__); __del__ should only contain simple operations, and anything more complex should go through something like RPython's rpython.rlib.rgc.FinalizerQueue

---

if you don't want to allow pointers into the middle of structures, then you probably want to deal with pairs (base_pointer, offset).
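A minimal sketch of such a pair, assuming a hypothetical representation where the base is the whole allocation (a Python list stands in for it). Because the base always points at the start of an object, the GC never has to map an interior address back to its containing object, and every access can be bounds-checked against the object's length.

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class FatPtr:
    """Hypothetical interior reference: the whole allocation plus an element offset."""
    base: list    # the only thing the GC needs to see; always an object start
    offset: int

    def add(self, delta: int) -> "FatPtr":
        """Pointer arithmetic changes the offset, never the base."""
        return FatPtr(self.base, self.offset + delta)

    def _check(self) -> None:
        if not 0 <= self.offset < len(self.base):
            raise IndexError(
                f"offset {self.offset} outside object of length {len(self.base)}")

    def load(self):
        self._check()
        return self.base[self.offset]

    def store(self, value) -> None:
        self._check()
        self.base[self.offset] = value


arr = [10, 20, 30]
p = FatPtr(arr, 0).add(2)
p.store(p.load() + 1)  # arr is now [10, 20, 31]
```

The cost is a wider reference (two words instead of one); the payoff is that out-of-range pointer arithmetic is caught at the access, not silently turned into a pointer into some other object.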

---

i guess all we really need to keep track of is where the pointers are. If we know where the pointers are on the stacks and in registers, and we know where the pointers are in memory, then we can prevent sandboxed code from manufacturing its own pointers. (is that how CLR prevents sandboxed code from manufacturing its own pointers by reading integers from memory in as pointers?)

in order to know where pointers are in memory we probably have to have data structure declarations, and force all allocated memory to be one of these data structures (although 'array of bytes' is a valid structure, provided that bytes can never be loaded as pointers).

data structure primitives include i32, ptr, probably also i64, fixed-length arrays, variable-length arrays.

i guess for these purposes we don't really need to know if e.g. such-and-such field within a data structure is a u32 or an i32, we just need its size. So we don't really need typed everything (although that may be useful at the OVM level for other reasons anyhow). And we don't really need 'objects' with encapsulated methods, we just need something more like C structs.
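A sketch of such declarations, assuming fields are described only by size and a pointer flag, an 8-byte pointer, natural alignment, and no trailing padding (all assumptions). From a declaration the GC or serializer can compute where the pointers are, and a safe load can refuse to read a non-pointer slot as a pointer.

```python
from dataclasses import dataclass
from typing import List, Tuple

PTR_SIZE = 8  # assumption: 64-bit target


@dataclass(frozen=True)
class Field:
    size: int             # bytes; u32 vs i32 doesn't matter here, only the size
    is_ptr: bool = False


BYTE = Field(1)
I32 = Field(4)
I64 = Field(8)
PTR = Field(PTR_SIZE, is_ptr=True)


def fixed_array(elem: Field, n: int) -> List[Field]:
    return [elem] * n


def layout(fields: List[Field]) -> Tuple[int, List[int]]:
    """Return (size in bytes, byte offsets of pointer fields).

    Assumes natural alignment (each field aligned to its own size) and no
    trailing padding.
    """
    offset = 0
    ptr_offsets = []
    for f in fields:
        offset = -(-offset // f.size) * f.size  # round up to a multiple of f.size
        if f.is_ptr:
            ptr_offsets.append(offset)
        offset += f.size
    return offset, ptr_offsets


def check_ptr_load(ptr_offsets: List[int], offset: int) -> None:
    """Sandboxed code may load a pointer only from a declared pointer slot."""
    if offset not in ptr_offsets:
        raise PermissionError(f"offset {offset} is not a pointer field")


node = [I32, PTR, I32, PTR]          # struct { i32; ptr; i32; ptr }
node_size, node_ptrs = layout(node)  # 32 bytes, pointers at offsets 8 and 24
```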

---

for another perspective, consider the runtimeless restricted variant of D, "BetterC":

[3]

" Retained Features

Nearly the full language remains available. Highlights include:

    Unrestricted use of compile-time features
    Full metaprogramming facilities
    Nested functions, nested structs, delegates and lambdas
    Member functions, constructors, destructors, operating overloading, etc.
    The full module system
    Array slicing, and array bounds checking
    RAII (yes, it can work without exceptions)
    scope(exit)
    Memory safety protections
    Interfacing with C++
    COM classes and C++ classes
    assert failures are directed to the C runtime library
    switch with strings
    final switch
    unittest

...

Unavailable Features

D features not available with BetterC:

    Garbage Collection
    TypeInfo and ModuleInfo
    Classes
    Built-in threading (e.g. core.thread)
    Dynamic arrays (though slices of static arrays work) and associative arrays
    Exceptions
    synchronized and core.sync
    Static module constructors or destructors"

(i don't know why dynamic arrays don't work, i think i read that it has something to do with GC)

this suggests a different path that i've mentioned before, where there would be yet another language in between BootX and OVM. This other language would be a C-like runtimeless language.

In this case the other language would have the interop stuff and would only run trusted code, and the OVM would run untrusted code (and would do things like bounds-checking and convert cooperative multitasking to preemptive).

I'm not sure there's really much benefit over just doing what i said above, though: have a subset of the OVM instructions be 'unsafe', and have many of the OVM instructions be macroinstructions. One issue with that is that multitasking remains cooperative, not fully pre-emptive; however, we can fix that by having a facility in the VM to designate an instruction to execute every N instructions. As for forcing read and write barriers upon access to various memory locations, the safe instructions can do that.
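A minimal sketch of the "designated instruction every N instructions" facility, assuming a hypothetical toy interpreter in which the designated instruction is simply a yield to a round-robin scheduler; Python generators stand in for VM threads.

```python
from collections import deque


def vm_thread(name, program, quantum, trace):
    """Interpret a toy program, yielding to the scheduler every `quantum` instructions.

    The periodic yield plays the role of the designated instruction; the
    program itself contains no yields, so from its point of view it is
    being pre-empted.
    """
    acc = 0
    for count, (op, arg) in enumerate(program, start=1):
        if op == "add":
            acc += arg
        trace.append((name, count))
        if count % quantum == 0:
            yield  # hand control back to the scheduler
    return acc


def schedule(threads):
    """Round-robin over generator-based VM threads until every one finishes."""
    ready = deque(threads.items())
    results = {}
    while ready:
        name, thread = ready.popleft()
        try:
            next(thread)
        except StopIteration as done:
            results[name] = done.value
        else:
            ready.append((name, thread))
    return results


trace = []
program = [("add", 1)] * 5
results = schedule({
    "A": vm_thread("A", program, 2, trace),
    "B": vm_thread("B", program, 2, trace),
})
```

The two threads interleave in blocks of two instructions even though neither program asks to yield.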

---

Two alternatives for how unsafe instructions could be encoded in OVM:

1) BootX instructions (8-, 16-, 32-bit) are always interpreted as unsafe instructions that do the same thing as in BootX. The "safe variants" of these (e.g. safe ADD) are different opcodes and/or are in a 64- or 128-bit encoding. This allows the compiler to macrocompile some 'safe' instructions into a series of 'unsafe' instructions while leaving other 'safe' instructions as single (128-bit) instructions. Before beginning execution of untrusted code, OVM must scan it and check that no unsafe instructions are in there (which is easy because the unsafe ones are those with less than a 128-bit encoding, or mb less than 64-bit). Then during execution any unsafe instructions which are encountered are assumed to be trusted, and executed immediately.

2) BootX instructions are interpreted as the 'safe variant' of themselves, e.g. LOAD does bounds-checking and reference-counting, etc. This allows the OVM program to be expressed with mostly short (8-, 16-, 32-bit) instructions. However, to express unsafe instructions you would either (a) reuse the same opcodes but have them interpreted differently when the VM is in the 'ring 0' protection ring/domain state, or (b) have 'unsafe' variants of all of the BootX-analog opcodes, which can only be executed in the 'ring 0' state.

The advantage of (1) is that the compiler can intersperse unsafe instructions (such as those to update reference counts) with 'user code' without jumping to special unsafe subroutines and without also interspersing 'switch protection ring/domain' instructions. The advantage of (2) is that most user code can be more compact.

I guess it's likely that unsafe instructions will outnumber user-code instructions in compiled code, because every primitive operation like 'LOAD' will usually be surrounded by a few unsafe instructions (bounds check, update reference counts). Of course the toolchain might not work like that at all (it may compile all user code to unsafe instructions if it compiles any of it), but i bet that if we're optimizing, it's better to optimize for the use case where safe and unsafe code is interspersed than for pure user code size.

So i'm leaning towards (1). This is also useful for expressing the definition of macroinstructions in terms of primitives and other previously defined macroinstructions.

We could also do both; have one 'ring' state that means 'treat BootX-encoded instructions as unsafe BootX instructions' and a different 'ring' state that means 'treat BootX-encoded instructions as safe user code'. That would also allow us to immediately begin executing untrusted code without scanning it first (and then maybe switch to the other mode later after it's been scanned and/or partially compiled). Heck, why not?

So i'm leaning towards 'both'.
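A sketch of the 'both' option, assuming (hypothetically) that BootX-style encodings are exactly the 8-, 16- and 32-bit ones, so the scan for option (1) treats anything shorter than 64 bits as unsafe. The ring state then decides whether a short instruction gets its unsafe BootX meaning or its safe variant; long encodings are treated here as always safe. `Ring`, `scan_untrusted` and `semantics` are illustrative names only.

```python
from enum import Enum


class Ring(Enum):
    TRUSTED = 0    # short encodings run as unsafe BootX-style instructions
    UNTRUSTED = 1  # short encodings run as their safe variants


SHORT_ENCODINGS = {8, 16, 32}  # BootX-style encoding lengths, in bits


def scan_untrusted(code) -> bool:
    """Option (1): untrusted code may only use long (64/128-bit) encodings."""
    return all(bits not in SHORT_ENCODINGS for bits, _ in code)


def semantics(ring: Ring, bits: int, opcode: str):
    """Decide which meaning an instruction gets in the current ring state."""
    if bits not in SHORT_ENCODINGS:
        return ("safe", opcode)    # long encodings are always safe OVM ops
    if ring is Ring.TRUSTED:
        return ("unsafe", opcode)  # same effect as in BootX, but on OVM registers
    return ("safe", opcode)        # e.g. LOAD with bounds check and refcounting


code = [(32, "LOAD"), (128, "ADD")]
```

Untrusted code like `code` fails the scan, but it could still start running immediately in the `UNTRUSTED` ring, with its short `LOAD` getting the safe meaning.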

---

Also there are a few alternatives for defining the semantics of the safe variants of BootX instructions:

(1) we can just define them anew. The fact that they are 'variants' is something we humans know but the computer doesn't know that at all.

(2) we define them using 'hooks', such as hooks for loading and storing stuff to registers, hooks for loading and storing stuff to memory, etc. The semantics of the instructions are defined by applying the original instructions, modified by the hooks.

For (2), it's conceptually nice, but imagine the hooks for an instruction like LOAD. We probably want a hook that executes before LOAD, and can access and alter its inputs or even return without calling LOAD (by raising an error or just returning successfully), and also a hook that executes after LOAD, and can access and alter its output or even raise an error. If the hooks can do all that, it's probably simpler just to redefine the entire instruction. Using choice (1) in the previous section, it's easy to redefine the entire instructions and also call the original instruction at some point within the redefinition.

So i'm leaning towards (1).
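A minimal sketch of approach (1), assuming a toy flat memory in which an array stores its length N in its first word (hypothetical layout): the safe LOAD is written from scratch and calls the original unsafe LOAD in the middle.

```python
class Machine:
    """Toy flat word memory; an array is laid out as [N, item0, ..., item(N-1)]."""

    def __init__(self, size: int):
        self.mem = [0] * size

    def load_unsafe(self, addr: int) -> int:
        """The BootX-style LOAD: reads any address, no checks at all."""
        return self.mem[addr]

    def load_safe(self, array_base: int, index: int) -> int:
        """Safe LOAD, redefined from scratch but calling the unsafe LOAD inside."""
        length = self.load_unsafe(array_base)  # the header word holds N
        if not 0 <= index < length:
            raise IndexError(f"index {index} out of bounds for length {length}")
        return self.load_unsafe(array_base + 1 + index)


m = Machine(8)
m.mem[2:6] = [3, 7, 8, 9]  # a 3-element array [7, 8, 9] at address 2
```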

We could still have a 'boxed' addr mode bit and have hooks for that.

---

The GC and the serializer/pickler need to know where pointers are in data structures in memory. Consider something like an array of pointers whose length is only known at runtime.

This data structure's type must be some sort of regular-expression-like template that says "There is a sequence of N pointers here". So where do we store N? Probably at the beginning of the array (another alternative is something like C strings, but let's not support that).

Another alternative is to store N in a fat pointer, i.e. alongside each pointer that points to the data structure. But there may be many of those, so it's more memory-efficient to store it once in the data structure itself (although that's "intrusive" if we need to interop with native data structures in this manner; but i think we can just say that native data structures can't be under the GC and can't be serialized, unless someone writes special-case support routines for them).

We could call the part of the data structure that stores N the 'header', because the GC and serializer need to know how to interpret this header, whereas they don't need to interpret the rest of the data structure (they just need to know where the pointers are in the rest).

Now, consider an array of structs of type T, where each struct has two fields, and each field contains a pointer. This is best expressed with "array" being a template/generic type, and in this case the items (the type variable in the template) are of type T. We can support generic types which are possibly polymorphic at runtime if we generalize the header to contain the values assigned to the type variables.

What about the definition of the data structure itself (the thing that says that T has two fields, each of which is a pointer; and the thing that says that array<T> has a header with N and T followed by a payload consisting of an N-item sequence of items, each of which is a T)? We could put that in the header too, making the data structure self-describing. But this would be a ton of duplication because we really only need the definition once, and we may have lots of instances of that data type. So we should store the type elsewhere.

Should the header contain the type of the object? I guess it may as well; again, that allows for runtime polymorphism. Even if the compiler can deduce the type of everything, somewhere the GC would probably want to maintain a big table of the types of each pointer, in which case it wastes no additional memory to move that information into the header for each data structure.

Should the type in the header be a 16-bit type ID? 32-bit? Let's just make it a native pointer to the type definition. Wait, actually, that's confusing if we also have immediate constant types in the instructions, because those would have to either be integers, or the encoding needs to know which operands are types. Hmm, but if types are not pointers, then since type descriptions are of variable size, you have to use the type ID as an index into a table of type pointers, which is inefficient for a common operation. Hmm... We could also just say that where a pointer is expected, an immediate constant is taken to be an index into the pointer constant table.

So i guess that's what our boxed objects look like:

header
- pointer to the type definition of the type of this object
- a list of values of type variables (if this object is of a generic type)
- a list of sizes (integers)
payload
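A sketch of this header layout, assuming hypothetical type definitions where a struct lists its fields as 'ptr' or 'word', and a generic array takes its element type from the first type-variable value and its length from the first size. The type definition is stored once and shared; only the header is per object. Nested generic element types are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(eq=False)
class StructDef:
    """Type definition, stored once and shared: each field is 'ptr' or 'word'."""
    name: str
    fields: List[str]


@dataclass(eq=False)
class ArrayDef:
    """Generic array<T>: T comes from the header's type_args[0], N from sizes[0]."""
    name: str = "array"


TypeDef = Union[StructDef, ArrayDef]


@dataclass(eq=False)
class Boxed:
    type_def: TypeDef                                       # header: pointer to type definition
    type_args: List[TypeDef] = field(default_factory=list)  # header: type-variable values
    sizes: List[int] = field(default_factory=list)          # header: e.g. N for an array
    payload: list = field(default_factory=list)


def slot_kinds(obj: Boxed) -> List[str]:
    """Expand the header plus the shared type definition into per-slot kinds."""
    if isinstance(obj.type_def, StructDef):
        return list(obj.type_def.fields)
    elem = obj.type_args[0]  # assumption: elements are structs, no nested generics
    return list(elem.fields) * obj.sizes[0]


def pointer_slots(obj: Boxed) -> List[int]:
    """What the GC and serializer need: which payload slots hold pointers."""
    return [i for i, kind in enumerate(slot_kinds(obj)) if kind == "ptr"]


P = StructDef("P", ["word", "ptr"])
pair = Boxed(P, payload=[0, None])
arr = Boxed(ArrayDef(), type_args=[P], sizes=[2], payload=[0, None, 0, None])
```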

But what about heterogeneous arrays of dynamic size and composition? Now the header itself needs to have a dynamic array of type literals. So actually the integers and the type pointers in the header may be interspersed.

I guess type literals should be a core type itself.

One question is: do we allow internal pointers, that is, pointers that don't point at the base address of a data structure (also, do we allow recursive internal pointers, that is, pointers to another location within the same data structure?). We could also forbid direct internal pointers but allow fat ptrs/references of the form: pair (base ptr, offset) (i'm leaning towards that). TODO

---

Another question is:

When the BootX-encoded instructions are interpreted as unsafe, are they interpreted just like actual BootX code (that should be passed thru unchanged by the compiler into the BootX object code, so they will directly affect the actual BootX registers), or just as doing the same thing as those BootX instructions in OVM?

I'm thinking the latter, because:

(a) the target platform for some OVM implementations may be other than BootX; (b) we don't know which OVM registers are being mapped to which BootX registers anyways, so there's not much that we could do

So i'm leaning towards saying that they all operate on OVM registers etc; they are not being 'passed thru' by a compiler into BootX object code, rather they are ordinary OVM instructions.

---

it would be so nice to have the 64-bit encoding's 8-bit operands refer to some set of 256 registers. But i'm not sure how that would fit with having 16-bit registers (or lexical variables?) in the 128-bit encoding. User code should be able to have more than 256 local variables, and we don't want to do an extra step of register allocation, do we? On the other hand, we could say that we have 256 REGISTERS but 64k VARIABLES. Hmm, not sure. TODO.

---

" While the common intermediate language (CIL) does have operators that can fetch and set arbitrary memory (and thus violate memory safety), it also has the following memory-safe operators and the CLR strongly encourages their use in most programming:

    Field-fetch operators (LDFLD, STFLD, LDFLDA) that fetch (read), set and take the address of a field by name.
    Array-fetch operators (LDELEM, STELEM, LDELEMA) that fetch, set and take the address of an array element by index. All arrays include a tag specifying their length. This facilitates an automatic bounds check before each access.

" -- [4]

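A rough C sketch of what LDELEM/STELEM-style access amounts to: the array carries its own length, and every element access is checked against it. IntArray, int_array_new, ldelem and stelem are hypothetical names; the real CLR throws IndexOutOfRangeException on a bad index, whereas this sketch just returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical length-tagged array of 64-bit ints: like CIL arrays,
   it always carries its own length. */
typedef struct {
    size_t length;
    int64_t items[];
} IntArray;

static IntArray *int_array_new(size_t length) {
    IntArray *a = calloc(1, sizeof(IntArray) + length * sizeof(int64_t));
    if (a != NULL) a->length = length;
    return a;
}

/* LDELEM-like read: refuses out-of-bounds indices instead of reading past the end. */
static bool ldelem(const IntArray *a, size_t index, int64_t *out) {
    if (index >= a->length) return false;
    *out = a->items[index];
    return true;
}

/* STELEM-like write, with the same check. */
static bool stelem(IntArray *a, size_t index, int64_t value) {
    if (index >= a->length) return false;
    a->items[index] = value;
    return true;
}
```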
---

So how does the GC scan a stack to find 'managed pointers'?

Here's how CLR does it:

https://mattwarren.org/2019/01/21/Stackwalking-in-the-.NET-Runtime/ https://github.com/dotnet/coreclr/blob/master/Documentation/botr/stackwalking.md#managed-frames

" Managed Frames

Because the runtime owns and controls the JIT (Just-in-Time compiler) it can arrange for managed methods to always leave a crawlable frame. One solution here would be to utilize a rigid frame format for all methods (e.g. the x86 EBP frame format). In practice, however, this can be inefficient, especially for small leaf methods (such as typical property accessors).

Since methods are typically called more times than their frames are crawled (stack crawls are relatively rare in the runtime, at least with respect to the rate at which methods are typically called) it makes sense to trade method call performance for some additional crawl time processing. As a result the JIT generates additional metadata for each method it compiles that includes sufficient information for the stack crawler to decode a stack frame belonging to that method.

This metadata can be found via a hash-table lookup with an instruction pointer somewhere within the method as the key. The JIT utilizes compression techniques in order to minimize the impact of this additional per-method metadata.

Given initial values for a few important registers (e.g. EIP, ESP and EBP on x86 based systems) the stack crawler can locate a managed method and its associated JIT metadata and use this information to roll back the register values to those current in the method's caller. In this fashion a sequence of managed method frames can be traversed from the most recent to the oldest caller. This operation is sometimes referred to as a virtual unwind (virtual because we're not actually updating the real values of ESP etc., leaving the stack intact). "

"All these examples end up calling into the Thread::StackWalkFrames(..) method here"

"It’s worth pointing out that the only way you can access it from C#/F#/VB.NET code is via the StackTrace class, only the runtime itself can call into Thread::StackWalkFrames(..) directly."

" ProcessIp(..) here has the job of looking up the current managed method (if any) based on the current instruction pointer (IP). It does this by calling into EECodeInfo::Init(..) here and then ends up in one of:

    EEJitManager::JitCodeToMethodInfo(..) here, that uses a very cool looking data structure referred to as a ‘nibble map’ (https://github.com/dotnet/coreclr/blob/release/2.2/src/inc/nibblemapmacros.h#L12-L26)
    NativeImageJitManager::JitCodeToMethodInfo(..) here
    ReadyToRunJitManager::JitCodeToMethodInfo(..) here"

and it looks to me like in the first case (EEJitManager) we end up here in RangeSection* ExecutionManager::GetRangeSection(TADDR addr):

https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/codeman.cpp#L4378-L4459

So, uh, yeah, it looks to me like it's exactly what the above blog post implies: there is some sort of simple list of code ranges and we are simply searching this list to identify which range the current frame's instruction pointer is inside!

not sure exactly what this stuff means: " Unwinding ‘JITted’ Code

Finally, we’re going to look at what happens with ‘managed code’, i.e. code that started off as C#/F#/VB.NET, was turned into IL and then compiled into native code by the ‘JIT Compiler’. This is the code that you generally want to see in your ‘stack trace’, because it’s code you wrote yourself! Help from the ‘JIT Compiler’

Simply, what happens is that when the code is ‘JITted’, the compiler also emits some extra information, stored via the EECodeInfo class, which is defined here. Also see the ‘Unwind Info’ section in the JIT Compiler <-> Runtime interface, note how it features separate sections for TARGET_ARM, TARGET_ARM64, TARGET_X86 and TARGET_UNIX.

In addition, in CodeGen::genFnProlog() here the JIT emits a function ‘prologue’ that contains several pieces of ‘unwind’ related data. This is also implemented in CEEJitInfo::allocUnwindInfo(..) in this piece of code, which behaves differently for each CPU architecture: "

If it finds the code, then for each frame it returns a MethodDesc to the caller (the code that wanted to walk the stack): " Managed code, well technically ‘managed code’ that was JITted to ‘native code’, so more accurately a managed stack frame. In this situation the MethodDesc class defined here is provided, you can read more about this key CLR data-structure in the corresponding BotR chapter. "

---

So the upshot of the previous section is that, for each frame on the stack, the runtime keeps a simple list of code ranges and searches it to find the range that contains that frame's instruction pointer.
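The lookup can be sketched in C as a search over a table of code ranges. This is an assumption, not the CLR's actual code (which, per the links above, goes through RangeSections plus a nibble map); here the table is kept sorted by start address so a binary search can be used, and MethodInfo, CodeRange and lookup_method are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-method metadata, standing in for the CLR's MethodDesc. */
typedef struct {
    const char *name;
} MethodInfo;

/* One contiguous block of generated code, covering [start, end). */
typedef struct {
    uintptr_t start;
    uintptr_t end;
    const MethodInfo *method;
} CodeRange;

/* Find the method whose code contains ip. `ranges` must be sorted by start
   and non-overlapping. Returns NULL for IPs outside managed code
   (e.g. native frames). */
static const MethodInfo *lookup_method(const CodeRange *ranges, size_t n,
                                       uintptr_t ip) {
    size_t lo = 0, hi = n; /* search window [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ip < ranges[mid].start)
            hi = mid;
        else if (ip >= ranges[mid].end)
            lo = mid + 1;
        else
            return ranges[mid].method;
    }
    return NULL;
}
```

Note that this relies on code addresses being ordered integers, which is exactly the assumption that gets shaky under transpilation.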

what does the JVM do? "The VM identifies the method that owns the stack frame by looking up the instruction pointer value in the method code block tables. " [5]

" When an exception is thrown through a PC, we first examine which try block covers the program range including that PC. For the table-driven exception handling [55, 20], which is commonly used to implement exception handling, as in the Java VM [48], the compiler constructs the table that maps a range of PCs to a try block. With the table, the PC is searched for and is mapped to a try block. If a try block is found, we then examine whether the handler can catch " -- [6]

Recall that the Book of the Runtime said:

" Because the runtime owns and controls the JIT (Just-in-Time compiler) it can arrange for managed methods to always leave a crawlable frame. One solution here would be to utilize a rigid frame format for all methods (e.g. the x86 EBP frame format). In practice, however, this can be inefficient, especially for small leaf methods (such as typical property accessors). "

So it seems to me that we can simplify the implementation by specifying that every OVM stack frame contains a pointer to the relevant metadata (the stack frame's Type object, i guess), at a fixed location relative to the frame pointer, with the frame pointers chained together. Problem solved, i think? This is inefficient in that every single stack frame carries this superfluous extra item, but it will make our implementation code simpler: there is no need to map instruction pointers to code ranges. That is probably important, because in the general case (e.g. transpilation) we can't assume that code pointers are comparable, etc., which makes representing and doing arithmetic on 'code pointers' tricky.

This solution does grate on me though, because it is inefficient, and leaf function calling is one place where you really really do want to be efficient.

I guess we could leave this up to the implementation; maybe we should just say that there is an OVM instruction that maps a frame pointer to the frame type, and the implementation can implement this however it wants (PC->frame type maps or storing a frame type pointer in the frame).
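A minimal C sketch of the "store the frame type in the frame" option, assuming a fixed two-word frame header (caller's frame pointer, frame type pointer). Frame, FrameType, frame_type and count_ref_slots are hypothetical; frame_type plays the role of the proposed OVM instruction, so an implementation using PC -> frame type tables would only have to change that one function. For brevity the frames here hold no actual slots; count_ref_slots just shows how a GC-style walk would use the per-frame type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame-type descriptor: which slots of a frame hold
   managed references, so a GC could scan them. */
typedef struct FrameType {
    const char *function_name;
    size_t n_slots;
    const unsigned char *slot_is_ref; /* 1 = slot holds a managed pointer */
} FrameType;

/* Hypothetical header that every OVM frame would start with. */
typedef struct Frame {
    struct Frame *caller;   /* chained frame pointers */
    const FrameType *type;  /* the extra word every frame pays for */
} Frame;

/* "Map a frame pointer to its frame type". */
static const FrameType *frame_type(const Frame *fp) {
    return fp->type;
}

/* Walk from the most recent frame to the oldest, counting how many
   reference slots a GC would have to scan. */
static size_t count_ref_slots(const Frame *fp) {
    size_t refs = 0;
    for (; fp != NULL; fp = fp->caller) {
        const FrameType *t = frame_type(fp);
        for (size_t i = 0; i < t->n_slots; i++)
            refs += t->slot_is_ref[i];
    }
    return refs;
}
```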

I guess we might still have that other problem that CLR has, about interleaving our stack frames with native stack frames.

maybe see also the 'Stack Unwinding (other runtimes)' section in https://mattwarren.org/2019/01/21/Stackwalking-in-the-.NET-Runtime/

---

Just look at this huge piece of code; it demonstrates why we are motivated to recreate all of this as a simpler system, at the cost of lower performance:

https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/stackwalk.cpp

(btw, [7] guides you through the above code)

---

anyhow so the upshot of the above is:

we probably need an OVM instruction that maps a frame pointer to the type of that frame
(runtime call) stacks should probably be a special case of our type system, since they are composed of a heterogeneous list of frames, but the types of those frames are not all in a header; rather, they are computed from each frame itself in a platform-specific manner
maybe we should allow heterogeneous arrays to also have each item contain its own header with its own type, rather than forcing the whole array header to hold all of the item types
maybe we should generalize the concept of families of types (such as the family of call stack frame types) whose actual exact type must be computed from an instance, rather than being found in a header

---

arg, it would be so pretty to give separate functions to the 64-bit encoding (8-bit operands) and the 128-bit encoding (16-bit operands). Even though it may mean another layer of register allocation. Let's do it.

Let's put all the unsafe stuff in the 64-bit encoding.

So OVM has two layers; the 'unsafe/interop' layer and the 'user code/managed code VM' layer.

---

i'm thinking that the 'registers' in OVM will actually reference locals, i.e. they are transparently saved and restored when you make a function call and when you return from one.

This is similar to how Lua does it:

" Local variables are equivalent to certain registers in the current stack frame, while dedicated opcodes allow read/write of globals and upvalues ... By default, Lua has a maximum stack frame size of 250. This is encoded as MAXSTACK in llimits.h. The maximum stack frame size in turn limits the maximum number of locals per function, which is set at 200, encoded as LUAI_MAXVARS in luaconf.h. Other limits found in the same file include the maximum number of upvalues per function (60), encoded as LUAI_MAXUPVALUES, call depths, the minimum C stack size, etc. Also, with an sBx field of 18 bits, jumps and control structures cannot exceed a jump distance of about 131071 " -- [8]

for upvals and globals, there are GETUPVAL, GETGLOBAL, SETUPVAL, SETGLOBAL.
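A minimal C sketch of Lua-style register windows, assuming all frames live in one value stack and each function's 'registers' are slots relative to the current frame base, so 'saving and restoring' them on call and return is just moving the base. Vm, reg, vm_call and vm_return are hypothetical names; upvalues and globals (GETUPVAL etc.) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 1024
#define MAX_DEPTH 64

/* Hypothetical VM state: "registers" are slots in the current frame of
   one value stack. */
typedef struct {
    int64_t stack[STACK_SLOTS];
    size_t base;                 /* start of the current frame */
    size_t saved_base[MAX_DEPTH];
    size_t depth;
} Vm;

static int64_t *reg(Vm *vm, uint8_t r) { return &vm->stack[vm->base + r]; }

/* Enter a callee whose frame starts at the caller's register first_arg, so
   arguments already placed there become the callee's r0, r1, ... Nothing is
   copied: the caller's lower registers survive because the callee's window
   starts above them. */
static void vm_call(Vm *vm, uint8_t first_arg, size_t callee_frame_size) {
    assert(vm->depth < MAX_DEPTH);
    assert(vm->base + first_arg + callee_frame_size <= STACK_SLOTS);
    vm->saved_base[vm->depth++] = vm->base;
    vm->base += first_arg;
}

static void vm_return(Vm *vm) {
    assert(vm->depth > 0);
    vm->base = vm->saved_base[--vm->depth];
}
```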

---

So, the lower (64-bit encoding/unsafe) layer of OVM will:

have ~256 per-function local 'registers' (or are pointer registers separate again, meaning we would have 2*256? probably)
introduce the call stack abstraction

---

as ECMA-335 says,

" III.4 Object model instructions: The instructions described in the base instruction set are independent of the object model being executed. Those instructions correspond closely to what would be found on a real CPU. The object model instructions are less built-in than the base instructions in the sense that they could be built out of the base instructions and calls to the underlying operating system. [Rationale: The object model instructions provide a common, efficient implementation of a set of services used by many (but by no means all) higher-level languages. They embed in their operation a set of conventions defined by the CTS. This includes (among other things): field layout within an object; layout for late bound method calls (vtables); memory allocation and reclamation; exception handling; boxing and unboxing to convert between reference-based objects and value types "

i guess some of this is stuff that would be in the upper (safe/128-bit) layer of OVM.

---

so i guess the purpose of the levels so far is:

Boot/BootX: a portability layer. Easy to implement, even as an interpreter on top of a HLL, and not constraining (e.g. no structured programming or even a call stack abstraction). Some attention paid to code density and efficiency.
- two layers: Boot: easy to implement; BootX: macrocompiles to Boot; greater code density/readability/expressiveness/semantics-preservation.
OVM: take care of all the annoying/low-level parts of implementing a programming language runtime, including interop. Two layers (OVMhigh macrocompiles to OVMlow):
- OVMlow: provides function calls, local variables. Only 256 really local variables. Maybe restricted to statically known typing?
- OVMhigh: macrocompiles to OVMlow; safe/sandboxable. No undefined behavior.
Oot Core: a gem-like thing of perfect beauty [9]. Captures Oot's semantics and core concepts. Lacks convenient syntax, is verbose due to explicitness.
pre-Oot: a metaprogrammable language that is compiled to, or interpreted as, Oot Core. Metaprogramming is in terms of the Oot Core AST. (Should the metaprogramming used from here on up only involve local/context-free transformations, e.g. macros? Should we have a layer that only uses 'template macros', i.e. macros where blocks of code are directly (but hygienically) substituted in, not evaluated at compile time? Also, i was talking about allowing all-caps to escape macro hygiene; should we do that?)
Oot-nostdlib: built out of pre-Oot using only metaprogramming
Oot: Oot-nostdlib plus an always-accessible standard library, which may include additional metaprogramming

Boot/BootX, Oot Core, Oot-nostdlib, and Oot are (informally) specified and may be (after 1.0) (somewhat) relied upon by users/other projects. OVM and pre-Oot are considered implementation details not to be relied upon. Note that although the semantics of Oot are defined in terms of pre-Oot and oot-nostdlib plus metaprogramming, implementations are free to recognize and compile these 'metaprogrammed' constructs in a hardwired way.

---

yeah, the more i think about it, the more i think that OVM having local variables (rather than registers) and a call stack may be the key to the purpose of this level of abstraction.

you can also say that the purpose of OVM is to provide services. OVMlow provides services, and OVMhigh provides ANTI-services; that is, useful restrictions on expressiveness (such as privacy and memory safety).

---

(note: i added this section later, in 1911 (Dec 2019), but put it here so it would be next to the other relevant sections before it):

i'm considering getting rid of the BootX layer, and making Boot focused more on simplicity (rather than on code density or expressiveness) and on having all of the primitives needed (rather than saving some primitives for BootX).

The downside is that this means that if we have to port implementations of e.g. garbage collection, they will only be available in Boot, rather than the nicer BootX?. However, in reality they will probably either be in some higher-level language (such as either Oot, or some special-purpose HLL that compiles to Boot), or they will be closely coupled to the platform, in which case they'll be written in some platform-specific language like LLVM.

if we do this, one outstanding question would be: should we include primitives in Boot that are not strictly necessary but are low-level and useful for performance, such as acquire/release atomic loads and stores?

i also am thinking about getting rid of one of OVMlow or OVMhigh. But then there's a problem: the idea was that if you need good interop (with platform dicts etc) then you can implement OVMlow, which is easier to implement than OVMhigh, but OVMhigh provides a simple abstraction with safety. If you only have OVMlow, then there's no JVM-like layer, and the safety properties are buried in the implementation of the compilation from Oot Core to OVMlow. If you only have OVMhigh, then implementing interop on some platform is roughly as hard as building a JVM on that platform, which is too hard.

If we combine OVMlow and OVMhigh, should we restrict OVMhigh to statically known typing? Hmm.. the dynamic/static divide was another reason to keep around both OVMlow and OVMhigh.

one idea is just having interop functions as optional extensions to Boot, then mucking with the toolchain to determine whether a particular build of OVM uses these or uses generic implementations (which are in the OVM project, not in the Boot project), and then only having OVMhigh. That sounds like it may be a practical solution. But wait.. one issue is that then we can't swap out LLVM for Boot on some platforms. That may be a problem; we're not supposed to be tied to Boot.

But maybe i'm not thinking clearly; just use LLVM like a Boot without the extensions, then implement the platform-specific extensions, if desired, in the OVM implementation. No, the goal here is to be able to have good interop on a new platform without porting OVMhigh directly.

But actually i think we can put interop functions into Boot if we want; whatever is on top of Boot would have to find these functions either there or in whatever other platform it's on anyways, so no harm in putting them in Boot.

And maybe it's not so crazy; if you want to add good interop for a given platform, just add the needed features to that platform's Boot, and then interpret or compile the Boot object code of the Oot implementation into the platform. We're still missing some stuff with this methodology: the platform's control flow is not being used here; instead we have an interpreter or compiler running over the Boot code, which implements its own control flow, so Oot functions are not ordinary native first-class functions on the platform (although they can probably be wrapped via one of those interop methods...). And for the same reason, it will be very inefficient; we're mapping all the variables into the available Boot registers, etc. But if you wanted to map Oot code (not just the Oot implementation's code) into something reasonable in the target platform, you'd have to go up a few levels and map ASTs at the Oot Core level anyhow (well, that's not quite true, since Oot code, not just Oot implementation code, is being compiled into OVMlow; but it's true that mapping from an AST will get more reasonable results than linearizing and then mapping that). And if you wanted to make stuff efficient, you'd probably go up to at least the level of OVMhigh (although there i'm not sure; there do seem to be opportunities at the level of OVMlow, which is like RPython and has static typing, but perhaps these would be better suited to an AST of an HLL that compiles to Boot than to the linear instructions of an interpreted OVMlow anyway).

so what we are looking at here is maybe an alternative:

Boot, simplified, and with optional interop instructions (things like 'create platform-native first-class function from this Boot code', 'lookup key in dict', 'open file')
forget BootX
a "C-like" low-level HLL with a standardized AST, which compiles to Boot
OVM (OVMhigh), written in that low-level HLL
Oot Core, implemented on top of OVM

So, regarding porting:

if you just want to run Oot programs (such as the Oot implementation itself), you can just port Boot, with none of the extensions (or just the floating-point extensions, and/or the file system extensions, etc, if you want to run programs that use that stuff)
if you want to make Oot run efficiently on your platform, then you write a new backend for the "C-like" language that OVM is implemented in
you shouldn't need to reimplement OVM or Oot Core unless you really want efficiency or very good interop

---

for boxed OVM values, is everything a promise/thunk? Or can you have a boxed value that is not a thunk?

Should we have the 4th addr mode bit indicate 'boxed'?

What about something that is already a struct and therefore what goes in the actual local variable register is a pointer? I guess 'boxed' indicates that it's to be treated as a reference type, not a value type (and perhaps also that it's a thunk).

And what about copy-on-write-ness? That's something that applies to value types, not reference types, but it involves some degree of boxing.

So should we have the 4th addr mode bit indicate 'boxed'? Or 'copy-on-write'? Or value vs. reference type? Or thunk-ness? Or nullable type? If only one of these (probably 'boxed'), then how do we indicate the others? I guess they should be 'in the box', that is, part of the generic box representation wrapper.

---

if we use fat pointers, then how should that work? My proposal:

4 x pointer width (which must be at least 16 bits):

base pointer
offset
capacity (max offset + 1)
16 flag bits: not sure what these would be used for, but maybe privileges (e.g. read-only vs writable, aliasable). Could also use some of these for the stuff that tagged pointers are used for; e.g. type tags, ref counts, whether an object has a destructor, etc.

alternative:

storage space identifier (e.g. memory bank 1, memory bank 2)
base pointer
offset
capacity (max offset + 1)

Or just make it (4 x pointer width + 16 bits) and do both:

storage space identifier (e.g. memory bank 1, memory bank 2)
base pointer
offset
capacity (max offset + 1)
16 flag bits
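A C sketch of the combined '4 x pointer width + 16 bits' layout, with bounds and write-permission checks. The field names and the FP_WRITABLE flag are assumptions (the notes leave the flag bits open). Pointer arithmetic only changes the offset, never the base, which is what makes the (base ptr, offset) form of internal pointer easy for a GC to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the notes leave the meaning of the flags open. */
enum { FP_WRITABLE = 1u << 0 };

typedef struct {
    uintptr_t space;    /* storage space identifier, e.g. memory bank */
    uint8_t *base;      /* start of the allocation */
    uintptr_t offset;   /* current position, in bytes */
    uintptr_t capacity; /* max offset + 1 */
    uint16_t flags;
} FatPtr;

/* Unsigned wraparound makes a "negative" offset huge, so the bounds
   checks below reject it too. */
static FatPtr fp_add(FatPtr p, intptr_t delta) {
    p.offset += (uintptr_t)delta;
    return p;
}

static bool fp_load_u8(FatPtr p, uint8_t *out) {
    if (p.offset >= p.capacity) return false;
    *out = p.base[p.offset];
    return true;
}

static bool fp_store_u8(FatPtr p, uint8_t v) {
    if (p.offset >= p.capacity || !(p.flags & FP_WRITABLE)) return false;
    p.base[p.offset] = v;
    return true;
}
```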

---

i guess enforcing bounds, the stuff about privileges, and stuff like nullable types in boxed representations is an 'anti-service' and so should go in the OVMhigh layer only.

---

the current plan for the smallstack capacity is:

capacity 32 for Boot/BootX user code
OVMlow implementation and OVMhigh->OVMlow macrocompiler consumes 16 of this capacity
capacity 16 remains for OVMhigh user code to make use of?

or another plan could be:

capacity 32 for Boot/BootX user code
OVMlow implementation and OVMhigh->OVMlow macrocompiler consumes 16 of this capacity
capacity 16 remains for OVMhigh implementation to make use of; OVMhigh presents up to 64k stack sizes to user code (each function says how many spaces it uses, like with registers)
OVMhigh user code doesn't have any access to smallstacks at all.

i'm thinking that OVMhigh would just have a single memory stack, not two, because otherwise you have to allocate two stacks which seems annoying. Interestingly, this didn't come up in Boot, b/c the stacks there were of a small, fixed size. So perhaps fixing the size of the stacks is why i came up with this weird idea of having one stack for each primitive type.

OVMhigh could still use two stacks if it wanted to use them to pass arguments, i guess.

eh, just leave it as two stacks. It's not much worse. But two smallstacks or big ones? I'm guessing two smallstacks.

---

.NET has a type 'transient ptr' which may only be used within the body of a single method, and points to a value (that will not be moved by the GC) in unmanaged memory. These are only generated internally by the CLI, not created by the user. They can be used for interop to pass to unmanaged code. [10]

---

if we exploit the idea that you usually allocate powers-of-two-sized memory regions, then you can store array capacities in smaller spaces in fat pointers (by storing the log2 of the capacity).
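A C sketch of the log2 trick, assuming capacities are always rounded up to a power of two at allocation time. For 64-bit capacities the exponent fits in 6 bits. encode_capacity, decode_capacity and round_up_pow2 are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Store a capacity as its log2; only exact powers of two are representable. */
static bool encode_capacity(uint64_t capacity, uint8_t *log2_out) {
    if (capacity == 0 || (capacity & (capacity - 1)) != 0) return false;
    uint8_t k = 0;
    while ((capacity >> k) != 1) k++;
    *log2_out = k;
    return true;
}

static uint64_t decode_capacity(uint8_t log2_cap) {
    assert(log2_cap < 64);
    return (uint64_t)1 << log2_cap;
}

/* Round a requested size up to the next power of two, i.e. the capacity
   the allocator would actually hand out. */
static uint64_t round_up_pow2(uint64_t n) {
    assert(n <= (UINT64_C(1) << 63));
    if (n <= 1) return 1;
    n--;
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    n |= n >> 32;
    return n + 1;
}
```

The cost is up to 2x wasted space per allocation, in exchange for shrinking the capacity field from a full word to a few bits.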

---

so i was thinking that we might have a few different kinds of pointer-like things, like raw pointers (which can only be manipulated by OVMlow) and fat array pointers (which can be manipulated by OVMhigh); and it seems that yes, this is what other languages do:

[11] says that Go slices are basically fat array pointers. It also says that

"Go famously has pointers, including internal pointers, but not pointer arithmetic. You can take the address of (nearly) anything, but you can’t make that pointer point at anything else, even if you took the address of an array element. Pointer arithmetic would undermine Go’s type safety, so it can only be done through special mechanisms in the unsafe package."

Note: the 'nearly' link sorta says that in Go you can take an address of:

a variable, pointer indirection, or slice indexing operation; or a field selector of an addressable struct operand; or an array indexing operation of an addressable array; or, as a special exception, a composite literal.

but not of:

values in a map and the return values from function and method calls
- The return value of a function only becomes addressable when put into a variable
functions

The HN discussion points out that in these other languages, slices are also fat pointers:

Go: https://blog.golang.org/go-slices-usage-and-internals
D: https://dlang.org/spec/arrays.html#dynamic-arrays (see also Walter Bright's suggestion for C to adopt fat ptrs: https://www.digitalmars.com/articles/b44.html )
Rust
C++ std::string_view
- someone also mentioned C++ spans and views
Haskell ByteStrings
String.substring in Java (until 2007, see [12])
maybe numpy arrays
"Algol 68 slices are also fat pointers. Only they also carry a `stride` field, since in Algol 68 you can slice multidimensional arrays in any direction. "
random guy's array implementation in C: https://tse.gratis/aArray/

One commenter asked, yeah, but what else could slices be? The answer was that you could instead have a pointer to a struct that holds the fat-pointer info and the content (or a pointer to the content):

"For instance traditional "oo" languages don't usually use fat pointers for dynamic dispatch, you have a single pointer to an instance which holds a pointer to its vtable. In Rust however, the "object pointer" is a fat pointer of (vtable, instance). Go's interface pointers are the same (which combined with nils leads to the dreaded "typed nil" issue)."

Also in both Rust and Go, interfaces are fat pointers:

Rust trait object 'pointers' are (vtable, instance)
Go interfaces: https://research.swtch.com/interfaces
- "Go interfaces are also fat pointers. Basically, the pointer to the type info/method set and the actual data are stored separately. As a result of this you can use pointers as interface types without boxing them. For example you can take an interior pointer to an array, or a C pointer, and use it as an interface type without boxing since the type info doesn't need to be stored with the data."

And someone else pointed out another way to do slices: "Another way to implement fat pointers is to keep the pointer representation the same but have another structure like an interval tree that stores a map of pointers to the extra information on bounds. That's how I implemented Bounds Checking GCC back in the day: https://www.doc.ic.ac.uk/~phjk/BoundsChecking.html It has the advantage that pointers remain compatible with code that wasn't specially compiled to know about fat pointers (such as system libraries)." [13]

---

millstone:

Fat pointers + threads permit torn reads/writes, leading to memory unsafety. It seems weird that Go allowed memory unsafety for this narrow use case - AFAIK only slices and interfaces are fat pointers.

DougBTX:

There's some discussion about unsafety in the face of concurrency from Rob Pike here: https://talks.golang.org/2012/splash.article#TOC_13.

> There is one important caveat: Go is not purely memory safe in the presence of concurrency. Sharing is legal and passing a pointer over a channel is idiomatic (and efficient).

> Some concurrency and functional programming experts are disappointed that Go does not take a write-once approach to value semantics in the context of concurrent computation, that Go is not more like Erlang for example. Again, the reason is largely about familiarity and suitability for the problem domain. Go's concurrent features work well in a context familiar to most programmers. Go enables simple, safe concurrent programming but does not forbid bad programming. We compensate by convention, training programmers to think about message passing as a version of ownership control. The motto is, "Don't communicate by sharing memory, share memory by communicating."

the8472:

If you need thread-safety you must at least use atomics anyway. Either your platform has transactional memory or double-wide CAS, or you need to fall back to locks.

---

some usages of tagged values in various languages: " For instance, Erlang uses first two bits of a machine word to differentiate between objects on heap (boxed values), lists, and immediates which use next two bits to further differentiate between small integers, ports, pids, etc. It’s important to understand here that those bits are tags, not types. In other words, if we have a user type foo we can’t reconstruct it using the tags. Erlang does have types in the form of type specifications but they are not used by the compiler. ... Functional languages with static type systems such as Haskell or OCaml are a bit of both. They do reserve the first bit (or two) to distinguish between boxed and unboxed values, and the values also store information (as a number) about the constructor they were created with. However, in general, the values at run-time do not store any information about their originator type. " [14]
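For comparison, a minimal C sketch of low-bit tagging in the OCaml style (low bit 1 = small immediate integer, low bit 0 = pointer to a word-aligned heap block). This is a simplified assumption, not Erlang's or OCaml's exact layout. It also assumes two's complement and an arithmetic right shift on negative numbers, which is what GCC and Clang do; strictly, ISO C leaves that implementation-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t Value;

static bool is_int(Value v) { return (v & 1u) != 0; }

/* Shift left by one and set the tag bit; the top bit of n is lost. */
static Value from_int(intptr_t n) { return ((uintptr_t)n << 1) | 1u; }

/* Arithmetic shift drops the tag bit and restores the sign. */
static intptr_t to_int(Value v) {
    assert(is_int(v));
    return (intptr_t)v >> 1;
}

/* Heap pointers are at least 2-byte aligned, so their low bit is already 0. */
static Value from_ptr(void *p) {
    assert(((uintptr_t)p & 1u) == 0);
    return (uintptr_t)p;
}

static void *to_ptr(Value v) {
    assert(!is_int(v));
    return (void *)v;
}
```

As the quote says, these bits are tags, not types: nothing here says what kind of object the pointer refers to.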

---

i guess since we are arranging for OVMhigh code to be preempted automatically, while at the OVMlow level there is no preemption (by our own greenthread system, at least), only cooperative multitasking, we should also have a construct that marks a segment of code as non-preemptible, so that this automatic preemption doesn't apply there.

that marking construct must itself be OVMlow, b/c to allow OVMhigh code to block preemption would be unsafe in the sense that it violates our guarantee of preemption.

---

i think the places for the OVMhigh -> OVMlow compiler to put the preemption/yield statements (or at least the statements that count cycles or instructions and yield whenever that count passes some threshold) are probably:

before each backwards jump (so that we get all the loops)
before and after each call to external code (because the external code could block or at least take a long time)
before each function call and return
every X instructions

this is maybe less careful than Erlang's BEAM VM (which i've heard increments an abstract measure of computational time every instruction?), but it may be more careful than Golang's (which i've heard never preempts in a tight purely computational loop?, which must be galling when there is an unexpected hang in one OS thread due to a tight infinite loop in user code).

The sloppiness of not checking at every instruction is okay because we're not concerned about realtime performance (or even soft realtime, i guess?); we're only concerned that we always preempt in finite time (although we try to do at least a little better than that... :).
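A C sketch of the counting approach, including the OVMlow-only non-preemptible region from above. It assumes the compiler emits a preempt_check at each yield point, passing the number of instructions executed since the previous check. GreenThread, preempt_check, the no_preempt_* markers and TIME_SLICE are hypothetical, and a real do_yield would switch green threads. If the budget runs out inside a non-preemptible region, the yield is deferred to the end of the region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TIME_SLICE 1000

/* Hypothetical per-green-thread scheduling state. */
typedef struct {
    int32_t budget;       /* instructions left before a yield is due */
    uint32_t no_preempt;  /* nesting depth of non-preemptible regions */
    bool yield_pending;   /* budget ran out while preemption was blocked */
    uint32_t yields;      /* number of yields taken (for illustration) */
} GreenThread;

static void do_yield(GreenThread *t) {
    /* A real scheduler would switch to another green thread here. */
    t->yields++;
    t->yield_pending = false;
    t->budget = TIME_SLICE;
}

/* Emitted by the OVMhigh -> OVMlow compiler at each yield point. */
static void preempt_check(GreenThread *t, int32_t cost) {
    t->budget -= cost;
    if (t->budget > 0) return;
    if (t->no_preempt > 0) t->yield_pending = true;
    else do_yield(t);
}

/* OVMlow-only markers; OVMhigh code never gets access to these. */
static void no_preempt_begin(GreenThread *t) { t->no_preempt++; }

static void no_preempt_end(GreenThread *t) {
    assert(t->no_preempt > 0);
    if (--t->no_preempt == 0 && t->yield_pending) do_yield(t);
}
```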

---

https://en.m.wikipedia.org/wiki/Type-length-value

---

what should we do if the host language supports arbitrary-precision numbers? Don't we want to be able to interop with that?

i think we should not support them. Only 32-bit or 64-bit. We can have libraries that support them, though (just as with unicode).

---

should we have Pepsi style external blocks? i think not. after all, we have a zillion very different targets.

---

Alice VM / SEAM came after the CLI; how does it differ in how it tracks pointers so that a user program can't forge pointers out of integers? Is there a simpler way to do this? All we really need is a way to distinguish references from non-references on the stack and in heap items, right?

---

In OVM I'm leaning towards saying that 64-bit ints or floats take up two local variable slots. I'm leaning towards having ints and floats in the same registers but keeping references separate. Also, to reduce state, I'm leaning towards having ints and floats share registers in Boot too.
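A C sketch of 64-bit values occupying two adjacent 32-bit local slots, with ints and floats sharing the same slots (references would live in a separate set). The low-half-first order and the helper names are assumptions; memcpy is used to reinterpret bits without undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local-variable slots shared by ints and floats. */
typedef uint32_t Slot;

static void store_u64(Slot *locals, size_t r, uint64_t u) {
    locals[r] = (Slot)(u & 0xffffffffu); /* low half in slot r */
    locals[r + 1] = (Slot)(u >> 32);     /* high half in slot r+1 */
}

static uint64_t load_u64(const Slot *locals, size_t r) {
    return (uint64_t)locals[r] | ((uint64_t)locals[r + 1] << 32);
}

static void store_i64(Slot *locals, size_t r, int64_t v) {
    store_u64(locals, r, (uint64_t)v);
}

static int64_t load_i64(const Slot *locals, size_t r) {
    uint64_t u = load_u64(locals, r);
    int64_t v;
    memcpy(&v, &u, sizeof v);
    return v;
}

static void store_f64(Slot *locals, size_t r, double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    store_u64(locals, r, u);
}

static double load_f64(const Slot *locals, size_t r) {
    uint64_t u = load_u64(locals, r);
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}
```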

---

In OVM, reserve one register/variable to be used as a temporary by the OVMhigh -> OVMlow macrocompiler when loading variables past 255. Also reserve the first 8 for type variable references, in case we want to use 12 bits of a 16-bit operand to indicate these.

---

In OVM, the stacks have a capacity of 32, 16 of which are reserved for the macro compiler. Note that the macro compiler maintains the invariant that any routine that uses these stack locations pops them before the end of that routine; therefore these stack locations can in theory be compiled away, and so, unless a context switch could happen in the middle of one of these macro routines, space for these stack locations does not have to be allocated as thread state.

(note: i'm thinking of changing it back to 16 from 32 tho, and prohibiting OVMhigh from using the smallstacks)
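A sketch of the smallstack invariant, assuming the 32/16 split described above: a macro-compiler routine records the depth on entry and asserts it is unchanged on exit, which is what would let its slots be compiled away. Names are hypothetical, and the check is a runtime assert here only for illustration (the macro compiler would guarantee it statically).

```c
#include <assert.h>
#include <stdint.h>

#define SMALLSTACK_CAP 32   /* total capacity (may go back to 16) */
#define MACRO_RESERVED 16   /* top part reserved for the macro compiler */

typedef struct {
    int64_t items[SMALLSTACK_CAP];
    int depth;
} smallstack;

void ss_push(smallstack *s, int64_t v) {
    assert(s->depth < SMALLSTACK_CAP);
    s->items[s->depth++] = v;
}

int64_t ss_pop(smallstack *s) {
    assert(s->depth > 0);
    return s->items[--s->depth];
}

/* On entry to a macro routine: user code may only have used the unreserved part. */
int macro_enter(const smallstack *s) {
    assert(s->depth <= SMALLSTACK_CAP - MACRO_RESERVED);
    return s->depth;
}

/* On exit: everything the routine pushed must have been popped. */
void macro_exit(const smallstack *s, int entry_depth) {
    assert(s->depth == entry_depth);
}
```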

---

OVMhigh has no undefined behavior. This is necessary for pointers-as-capabilities; if there were undefined behavior then possibly on some implementations a program would be able to forge pointers.

One example of undefined behavior that must be prevented is data races (at least on platforms which specify that data races are undefined behavior, as C does) (recall the distinction between race conditions and data races; see link in plPartConcurrency if you forgot). This means we must be somewhat conservative in our concurrency, because it must not be possible for an OVMhigh program to contain a data race.

The Bartosz Milewski methodology will probably be useful here (e.g. https://nwcpp.org/talks/2009/Ownership_Systems_against_Data_Races.pdf ). I think the upshot is that data is either immutable, unique (movable but not copyable; not aliasable), thread-local, or protected by an 'ownership'-based automatic locking system.
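A rough runtime sketch of the four categories as a tag on each object, just to make the access rules concrete. It is a simplification: Milewski's system enforces these mostly statically, via types, and the LOCKED case here only records which thread holds the lock. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SHARE_IMMUTABLE, SHARE_UNIQUE, SHARE_THREAD_LOCAL, SHARE_LOCKED } share_mode;

typedef struct {
    share_mode mode;
    int owner_thread;   /* UNIQUE / THREAD_LOCAL: the only thread that may access it */
    int lock_holder;    /* LOCKED: thread currently holding the owner's lock, or -1 */
} obj_header;

/* May thread tid write this object without risking a data race? */
bool may_write(const obj_header *o, int tid) {
    switch (o->mode) {
    case SHARE_IMMUTABLE:    return false;                  /* never written after construction */
    case SHARE_UNIQUE:
    case SHARE_THREAD_LOCAL: return o->owner_thread == tid; /* only one thread can reach it */
    case SHARE_LOCKED:       return o->lock_holder == tid;  /* must hold the lock */
    }
    return false;
}

/* Immutable data may be read by anyone; otherwise reading needs the same rights as writing. */
bool may_read(const obj_header *o, int tid) {
    return o->mode == SHARE_IMMUTABLE || may_write(o, tid);
}

/* A unique object can be moved to another thread (never copied or aliased). */
void move_unique(obj_header *o, int from_tid, int to_tid) {
    assert(o->mode == SHARE_UNIQUE && o->owner_thread == from_tid);
    o->owner_thread = to_tid;
}
```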

---

Fortran's arithmetic IF:

tjalfi:

Arithmetic IF directly matches a hardware instruction on the IBM 704.

This was the first computer with a Fortran compiler.

IF (VALUE) 10, 20, 30

The three arguments after the value are labels that the program jumps to.

The equivalent of this line in C is something like this (C labels can't be bare numbers, so they are written L10, L20, L30 here):

  if (value < 0)
  {
    goto L10;
  }
  else if (value == 0)
  {
    goto L20;
  }
  else
  {
    goto L30;
  }

so it's like a branching instruction with 4 operands (the value plus three jump targets).
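Seen as a VM instruction, the arithmetic IF is one register operand plus three branch targets. A hypothetical encoding and interpreter step (the struct layout and names are assumptions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ARITH_IF value_reg, lt_target, eq_target, gt_target */
typedef struct {
    uint8_t  value_reg;
    uint32_t lt_target, eq_target, gt_target;
} arith_if;

/* Returns the next program counter, chosen by the sign of the value. */
uint32_t exec_arith_if(const arith_if *ins, const int64_t *regs) {
    assert(ins != NULL && regs != NULL);
    int64_t v = regs[ins->value_reg];
    if (v < 0)  return ins->lt_target;
    if (v == 0) return ins->eq_target;
    return ins->gt_target;
}
```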

---

ok so OVMhigh should not have catch-fire undefined behavior at all, and furthermore, it should not allow data races to manufacture pointers.

following http://altair.cs.oswego.edu/pipermail/memory-model-design/2018-June/000085.html , we give the following guarantees:

a single thread’s execution is consistent with the single thread semantics of the language, given the value returned by a shared memory read
a read returns the value of some write to the same address after ((some event TBD, such as a fence or an acquire or release; this part is important so that a read of memory that was freed and reallocated cannot return values from its previous incarnation))

---

things to annotate:

variables/values at a point
points in code statically
points in code dynamically?
regions of code lexically
regions of code dynamically

---

" Swift's exceptions (which are implemented more like Result and less like unwinding) have the error type always boxed. The caller initializes the "swift error" register to 0, and if there's an exception the callee sets that register to hold the boxed error's pointer. This makes error propagation really fast (just don't change the register), and also doesn't require a Result to actually be materialized in the success case (avoid a ton of copies). Sadly this doesn't transfer over to Rust well, so they can't easily use the native Swift machinery that was added to llvm (swiftcc). "

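To make the convention concrete, a C emulation under assumptions: an explicit `boxed_error **` out-parameter stands in for Swift's dedicated error register, and the function names are made up. The caller zeroes the slot; a failing callee boxes the error and stores the pointer; intermediate frames only test the slot and return, never copying the error.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int code; char msg[32]; } boxed_error;

/* Callee: on failure, heap-allocates ("boxes") the error and sets the slot. */
int parse_digit(char c, boxed_error **err) {
    if (c < '0' || c > '9') {
        boxed_error *e = malloc(sizeof *e);
        assert(e != NULL);            /* sketch only: real code needs an OOM path */
        e->code = 1;
        strcpy(e->msg, "not a digit");
        *err = e;                     /* "set the error register" */
        return 0;
    }
    return c - '0';
}

/* Middle layer: propagation leaves the slot untouched; nothing is copied. */
int parse_two_digits(const char *s, boxed_error **err) {
    int hi = parse_digit(s[0], err);
    if (*err) return 0;
    int lo = parse_digit(s[1], err);
    if (*err) return 0;
    return hi * 10 + lo;
}
```

The caller starts with `boxed_error *err = NULL;` and checks `err` once after the outermost call.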
---

" Swift reserves a callee-preserved register for a method's self argument (pointer) to make repeated calls faster. Cool? "

simon_o:

Do people here have an opinion on that? As far as I know, LuaJIT does the same.

Are there any reasons not to do this?

ubsan:

One reason might be that, if you have something like f(g(h(x))), under the SysV ABI you can do something like (iirc):

  mov rax, [x]
  call h
  call g
  call f

---

so what we are looking at here is maybe an alternation:

Boot, simplified, and with optional interop instructions (things like 'create platform-native first-class function from this Boot code', 'lookup key in dict', 'open file')
forget BootX?
a "C-like" low-level HLL with a simple standardized AST representation, which compiles to Boot. This language respects the same limits that a language interpreted by the old idea of OVMlow would have had: 256 local variables, a nesting limit of 16 for expressions (so that they can fit on SMALLSTACK), etc., and the standardized AST shows this.
OVM (OVMhigh), written in that low-level HLL
Oot Core, implemented on top of OVM

So, regarding porting:

if you just want to run Oot programs (such as the Oot implementation itself), you can just port Boot, with none of the extensions (or just the floating-point extensions, and/or the file system extensions, etc, if you want to run programs that use that stuff)
if you want to make Oot run efficiently on your platform, then you write a new backend for the "C-like" language that OVM is implemented in
you shouldn't need to reimplement OVM or Oot Core unless you really want efficiency or very good interop

What should the "C-like" language be called? How about "Foot"?

no, actually i think that's kinda pointless. Why flatten the compiled Oot code at the OVMhigh step just to have an AST at the OVMlow step? Maybe OVMhigh should have an AST but not OVMlow. Maybe there should be a HLL that compiles to OVMlow, to make it easy to write the OVMhigh implementation, but that's not in the flow of compiled Oot code.

so it seems that all we've accomplished is to eliminate BootX; we haven't managed to eliminate either OVMlow or OVMhigh. Although we did maybe come up with a snazzy name for one of them, or for an HLL targeting one of them ('foot').

Maybe OVMhigh should be an AST though. Hmm... yeah that makes a lot of sense.

---

so the current plan is

Boot, dead simple fixed-length register machine assembly language, with optional interop extensions to the instruction set (things like 'create platform-native first-class function from this Boot code', 'lookup key in dict', 'open file'). Has undefined behavior and trusts the program implicitly. Does not define stack behavior or a function calling ABI (except for interop calls). BootX is no more. Comparable to LLVM or RISC-V, but easier to implement (at the expense of efficiency), and focused on interop.
OVMlow, a VM with a linear instruction stream. Statically typed. Has undefined behavior and trusts the program implicitly. Boot code can be embedded and is directly executed. Defines function calling semantics. -- Foot: a HLL that compiles to OVMlow; has limits on e.g. number of local variables suited to the OVMlow instruction encoding. Comparable to C, but maybe slightly lower level, and easier to implement, at the expense of efficiency and limits.
OVMhigh, a VM that runs ASTs. Dynamically typed. Compiles to OVMlow in a simple manner (i.e. mostly macro expansion). Safe; does not have undefined behavior, and does not fully trust the program (e.g. mandatory bounds-checking; multitasking is preemptive, not cooperative; Boot code which is embedded is not directly executed, but is instead executed as ordinary OVMhigh code in a safe manner). Comparable to JVM. -- does OVMhigh have limits on e.g. number of local variables suited to the OVMlow instruction encoding? -- the idea is that OVMlow and OVMhigh provide all of the nasty generic low-level language services such as garbage collection, so that the Oot Core implementation in OVMhigh can focus on implementing Oot Core semantics
Oot Core: captures the semantics of Oot. Comparable to small Scheme.
pre-Oot: a subset of Oot, with enough metaprogramming to build the rest of Oot via metaprogramming
Oot-nostdlib: Oot without the standard library (note: stdlib is allowed to contain more metaprogramming)
Oot: Oot-nostdlib + stdlib

For porters:

the easiest thing to do is just to port vanilla Boot. This will allow Oot programs that don't need the extensions (e.g. those that use the console but not the file system or GUI; e.g. the Oot compiler/interpreter itself) to run.
to provide more I/O capabilities, implement the relevant Boot extensions.
to provide more interop, implement the relevant Boot extensions. The idea is that getting to this step will be significantly easier than porting e.g. LLVM or the JVM to your platform; this ease of porting is why we bothered to invent Boot.
to provide better efficiency and more interop, port OVMlow or OVMhigh
to provide even better efficiency and more interop, port Oot Core

Notes on language technology:

Boot is a relatively language-agnostic language implementation technology that could be used by a wide variety of other projects in place of LLVM when they prioritize being easy to port to host platforms over efficiency
Foot is a language implementation technology which is less language-agnostic than Boot but still somewhat language-agnostic. It could be used by a wide variety of other projects in place of LLVM when they prioritize ease of porting and interop with host platforms over efficiency, and when they need to support powerful control structures in the HLL.

Some remaining questions:

do we provide most of the language services at the OVMlow or OVMhigh level? Providing them at the OVMlow level makes OVMlow harder to port, but it makes it easier for porters to override them with platform-native alternatives (without having to port all of OVMhigh). My current guess is that we should provide hooks in OVMlow (if not directly in Boot) for porters to register platform-native alternatives, but if no such alternatives are registered, then our implementation of these services is part of OVMhigh, not OVMlow. So OVMlow is really focused only on providing those basic structures (such as a call stack) that porters will probably want to integrate with the platform-native alternatives, and which can't be easily split out into interop APIs. Whenever something can be cleanly split into an interop API hook, that API is put in Boot if possible (e.g. probably dict access), or, if not there, then in OVMlow if possible (e.g. probably anything having to do with call stacks); but the Oot default implementation is in OVMhigh if possible (which allows us to write most of this stuff in Foot rather than in Boot).
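A sketch of the hook idea, under assumptions: a service (here a hypothetical string hash, as might back dict access) is called through a slot that a porter can fill with a platform-native implementation; if nothing is registered, the portable default runs. The service, its signature, and djb2 as the default are illustrative choices, not part of the design above.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t (*hash_fn)(const char *key);

/* Portable default (stand-in for an implementation written in the low-level HLL). */
static size_t default_hash(const char *key) {
    size_t h = 5381;                               /* djb2 */
    for (; *key; key++) h = h * 33 + (unsigned char)*key;
    return h;
}

/* Hook slot: NULL means "no platform-native alternative registered". */
static hash_fn registered_hash = NULL;

void register_native_hash(hash_fn f) { registered_hash = f; }

/* What the rest of the VM calls. */
size_t svc_hash(const char *key) {
    assert(key != NULL);
    return registered_hash ? registered_hash(key) : default_hash(key);
}
```

A port would call something like `register_native_hash(platform_hash)` at startup, where `platform_hash` is whatever the host provides (hypothetical name).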

update: apparently some ppl think 'foot' is gross. So find a different name. Maybe 'LoVM'? Or mb the HLL itself is called Lo or Lolang, and its bytecode (wordcode) representation is LoVM. Or mb 'lowvm' and 'lowlang', but then we lose the connection to OVM, which is the next level up.

---

memory allocator ideas:

each physical CPU gets its own arena of memory (or we could be like jemalloc and give each CPU 4 arenas?)
use a buddy allocator to allocate total memory between CPUs -- this is simple and quick so it should reduce pause times due to concurrency in memory allocation
use a slab allocator within CPUs
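A compact sketch of the buddy-allocator part (the per-CPU slab allocator is not shown). Assumptions for illustration: the pool is 2^4 = 16 pages, blocks are identified by page offset rather than pointer, and free blocks are tracked in a simple per-order table instead of linked free lists.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER 4
#define NPAGES    (1 << MAX_ORDER)

/* is_free[k][off]: a free block of 2^k pages starts at page offset off. */
static bool is_free[MAX_ORDER + 1][NPAGES];

void buddy_init(void) {
    for (int k = 0; k <= MAX_ORDER; k++)
        for (int off = 0; off < NPAGES; off++)
            is_free[k][off] = false;
    is_free[MAX_ORDER][0] = true;          /* start with one big free block */
}

/* Allocates 2^order pages; returns the page offset, or -1 if nothing fits. */
int buddy_alloc(int order) {
    assert(order >= 0 && order <= MAX_ORDER);
    for (int k = order; k <= MAX_ORDER; k++) {
        for (int off = 0; off < NPAGES; off += 1 << k) {
            if (!is_free[k][off]) continue;
            is_free[k][off] = false;
            while (k > order) {            /* split, freeing the upper halves */
                k--;
                is_free[k][off + (1 << k)] = true;
            }
            return off;
        }
    }
    return -1;
}

/* Frees 2^order pages at off, merging with free buddies as far as possible. */
void buddy_free(int off, int order) {
    assert(order >= 0 && order <= MAX_ORDER && off % (1 << order) == 0);
    while (order < MAX_ORDER) {
        int buddy = off ^ (1 << order);
        if (!is_free[order][buddy]) break;
        is_free[order][buddy] = false;     /* absorb the buddy */
        off &= ~(1 << order);              /* merged block starts at the lower offset */
        order++;
    }
    is_free[order][off] = true;
}
```

Splitting and merging only ever touch a block's single buddy, which is what keeps the per-CPU carve-up cheap.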

---

see also ootVm.txt; that file is older and needs to be integrated into this series

proj-oot-ootOvmNotes1
