proj-oot-ootOvmNotes2

see also ootVm.txt; that file is older and needs to be integrated into this series

---

we really should look more closely at existing 'multitarget' programming languages:

the most well known/successful ones i can think of are:

others include:

lists: -- https://github.com/jashkenas/coffeescript/wiki/List-of-languages-that-compile-to-JS#multitarget

dead stuff that i don't think needs to be looked at:

---

we should also consider just targeting Haxe or Shen, which already have lots of targets

---

so how do haxe and shen work?

Haxe's compiler is written in OCaml; it compiles source to an AST and then has a separate backend for each target that compiles the AST to that target.

Shen compiles to a small core language, KLambda or KL, with about 46 functions (with only about 35 basis functions needed to express the rest), which then has a separate backend for each target.

i feel like haxe is more concerned with source-to-source 'transpilation'.

Shen porting guide: https://github.com/Shen-Language/shen-sources/blob/master/doc/porting.md

the intersection of haxe and shen (certified for Shen) target platforms:

this is almost also what Fantom supports, although Fantom targets the JVM rather than Java source -- Fantom uses a bytecode for the CLR and JVM, so it is less concerned with source-to-source 'transpilation'.

if you add in Shen 'uncertified' you also get C++ and Python, for both Haxe and Shen (but not Fantom)

fantom for JVM and CLR uses fcode, which is a bytecode [2] [3] [4], but for JS translates to JS at compile time [5]

see also http://www.fandev.org/doc/compiler/

docs and code about fcode: https://bitbucket.org/fantom/fan-1.0/src/default/src/compiler/fan/fcode/FOp.fan https://bitbucket.org/fantom/fan-1.0/src/default/src/compiler/fan/fcode/format.txt https://bitbucket.org/fantom/fan-1.0/src/default/src/compiler/fan/assembler/CodeAsm.fan

(forum messages regarding fcode: https://fantom.org/forum/topic/1668 )

info about how good the various haxe targets are:

https://community.haxe.org/t/where-is-haxe-heading/405

the neko VM was written by the lead Haxe guy, and was the primary VM for haxe. Now Haxe is planning to replace Neko with Hashlink [6].

https://hashlink.haxe.org/ https://github.com/HaxeFoundation/hashlink/wiki https://haxe.org/blog/hashlink-indepth/ https://haxe.org/blog/hashlink-in-depth-p2/

why switch to hashlink? and why isn't hashlink just called 'Neko 3.0'? https://haxe.org/blog/hashlink-indepth/ explains that 'Neko was my first complete virtual machine. It was designed before Haxe was even created and was meant to run various programming languages. In the end only two languages ended up targeting Neko: NekoML, the language of the Neko compiler, and of course Haxe. Back then, the Neko virtual machine was not especially designed to run Haxe and suffered from some limitations, the main one being performance....Neko is very slow. The main reason for this is that it's dynamically typed'. Hashlink by contrast is "strictly-typed register-based" (Neko is stack-based / a stack machine [7]).

In Hashlink, at the beginning of each function, the type of each register (and, i guess, the number of registers) is declared.

Note that Hashlink DOES have a dyn (dynamic) type, however.

See also https://nekovm.org/faq/ for some comparisons between Neko and some other, older projects. https://github.com/HaxeFoundation/neko/blob/master/src/neko/Bytecode.nml https://nekovm.org/doc/lua/

---

for fantom:

"Fantom's type system is simple by design. All variables are statically typed, as they are in C# and Java. Fantom rejects generic types due to their complexity, but it does have a set of built-in generic types: List, Map, and Func. Fantom can also take on the feel of a dynamically typed language through dynamic calls and automatic downcasting. Fantom has an easy to use reflection API and metaprogramming capabilities. " [8]

---

for nim: " Features or modules that the JavaScript platform does not support are not available. This includes:

    manual memory management (alloc, etc.)
    casting and other unsafe operations (cast operator, zeroMem, etc.)
    file management
    most modules of the standard library
    proper 64 bit integer arithmetic
    unsigned integer arithmetic

However, the modules strutils, math, and times are available! To access the DOM, use the dom module that is only available for the JavaScript platform. " [9]

---

new backends for tensorflow (doesn't look very useful for us):

https://www.tensorflow.org/xla/developing_new_backend

---

see also ootVm.txt; that file is older and needs to be integrated into this series

---

so the current thinking is (this is mostly just a rehash of the layer list near the end of ootOvmNotes1.txt):

'Programming language implementation technology' language stages:

Oot-specific stages:

some remaining questions:

---

hmm i've been thinking about the question of 'where do we handle control flow that most target platforms won't support, e.g. continuations/goto and exceptions?'. There seems to be a conflict between, on the one hand, making LoVM something that maps reasonably to C, and on the other hand, supporting these advanced control flow constructs so that we don't end up with OVM->LoVM->Boot where both OVM and Boot can handle these things directly but LoVM obscures them behind another layer of interpretation (alternately, so that on platforms that do support these things, such as assembly or Scheme, we can compile these control flow constructs directly/efficiently, without an unnecessary interpretive layer and without having to implement all of OVM directly on the target platform, skipping LoVM).

how do other languages handle this? Now that i think about it, perhaps most of the languages i know with powerful control flow do implement it directly in assembly, or at least in something assembly-like such as LLVM, without higher-level calls?:

" Pointer tagging avoids looking up the info table

Haskell

mycase :: Maybe Int -> Int
mycase x = case x of Just z -> z; Nothing -> 10

Cmm

mycase_entry()                       -- corresponds to forcing 'x'
  entry:
    R1 = R2;                         -- R1 = 'x'
    I64[Sp - 8] = mycase_exit;       -- setup case continuation
    Sp = Sp - 8;
    if (R1 & 7 != 0) goto crL;       -- check pointer tag to see if x eval'd
    jump I64[R1] ();                 -- x not eval'd, so eval
  exit:
    jump mycase_exit ();             -- jump to case continuation

mycase_exit()                        -- case continuation
  entry:
    v::I64 = R1 & 7;                 -- get tag bits of 'x' and put in local variable 'v'
    if (_crD::I64 >= 2) goto crE;    -- can use tag bits to check which constructor we have
    R1 = stg_INTLIKE_closure+417;    -- 'Nothing' case
    Sp = Sp + 8;
    jump (I64[Sp + 0]) ();           -- jump to continuation ~= return
  exit:
    R1 = I64[R1 + 6];                -- get 'z' thunk inside Just
    Sp = Sp + 8;
    R1 = R1 & (-8);                  -- clear tags on 'z'
    jump I64[R1] ();                 -- force 'z' thunk
" -- https://web.archive.org/web/20191225135205/http://www.scs.stanford.edu/16wi-cs240h/slides/ghc-compiler.html

so, it seems like this stuff tends to require one of either:

toread: https://www.google.com/search?client=ubuntu&channel=fs&q=how+to+implement+delimited+continuation&ie=utf-8&oe=utf-8

---

this guy argues that if you have continuations at all, you should have multi-prompt delimited continuations, but that probably these aren't worth it: https://rain-1.github.io/scheme-continuations.html

incidentally, it's probably worth reading his blog "This page is for documenting things learned from implementing tarot, a basic self hosted scheme compiler that I did.": https://rain-1.github.io/scheme

---

here is that blog post on implementing delimited continuations in Guile, i finally know enough to understand it:

https://wingolog.org/archives/2010/02/26/guile-and-delimited-continuations

sounds like either a continuation-passing transform of everything, or direct stack manipulation, is required.

so, it seems like continuations tend to require one of either:

---

so my current feeling is that LoVM should provide at least:

because these are things that seem to be related to the abstraction of function calling and its (possible) use of a stack.

so yeah, that means we are a little more than just a very C-like, Oberon-like thing -- we can still be simple, but we may not be able to compile code in a simple manner to C or Oberon. Is there some way that LoVM programs could indicate to the LoVM compiler when the extra fancy stuff is NOT required, that is, when we CAN compile function calls to simple C function calls?

---

should we provide a facility in LoVM to allow code to choose whether a frame is allocated on the stack or on the heap, and whether other variables are allocated on the stack or on the heap?

---

if we're providing all of the above, then we're implementing the whole stack frame abstraction. In which case it probably makes sense to provide local variables as well, referred to by index, like Guile's VM. After all, for transpilation, it would be good to have LoVM locals translate into locals on the target platform. One wrinkle here is that Guile's VM has all stack slots a uniform 64-bits (so that local variables can be referred to by index, instead of precomputed offsets from either T.O.S. or from the current frame pointer); i feel like we were aiming for LoVM to be a lower-level language that does not force all stack slots into a standard size. Maybe declare variable types (like HashLink and C), or (somewhat equivalently) provide stack maps with the source code.
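
a purely illustrative sketch of that tradeoff in C struct terms (neither of these is actual Guile/HashLink/LoVM layout; the field names are invented):

#include <stdint.h>

/* Guile-style: every slot is a uniform 64-bit cell, so local k is just slot[k]. */
typedef struct {
    uint64_t slot[8];
} UniformFrame;

/* HashLink/C-style: each local's type is declared up front, slots have different
   sizes, and the compiler precomputes each local's offset; a GC stack map would
   additionally mark which slots hold pointers. */
typedef struct {
    uint8_t  flags;    /* local 0 */
    int32_t  counter;  /* local 1 */
    void    *obj;      /* local 2 (pointer slot) */
    double   x;        /* local 3 */
} TypedFrame;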

however, if we're doing all this, then it seems to me that we almost may as well skip LoVM and go right to OVM. Other stuff like garbage collection, maybe thunks, maybe even OOP probably needs to be intimately tied to the frame abstraction stuff.

which makes me wonder if maybe we should pull back a little and just offer low-level instructions with which to interact directly with stacks, stack frames and slots in frames. This hurts readable transpilability but is good for a language in which to implement OVM. One middle ground would be to provide that stuff but also a 'simple call' that doesn't support any fancy stuff (possibly recursively in any function called from the simple call; so simply called things can be directly translated into C-like platforms, and can be inlined).

and what about setjmp/longjmp? at first glance it looks useful, but i think it still doesn't support frames on the heap, or even guarantee that frames are preserved in some of the cases necessary for coroutines.
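
for reference, a minimal sketch of the setjmp/longjmp pattern; note that longjmp can only unwind to a setjmp frame that is still live on the call stack, and the frames it unwinds past are gone for good, which is why this doesn't get you heap frames or coroutines:

#include <setjmp.h>
#include <stdio.h>

static jmp_buf handler;   /* records stack pointer, instruction pointer, callee-saved regs */

static void inner(void) {
    /* 'throw': unwind back to the setjmp point. The frames between here and
       there are abandoned; there is no way to resume them later. */
    longjmp(handler, 42);
}

int main(void) {
    int code = setjmp(handler);   /* returns 0 the first time, 42 after the longjmp */
    if (code == 0) {
        inner();
    } else {
        printf("escaped with code %d\n", code);
    }
    return 0;
}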

---

and on the other hand, our 256 'registers' really already are the low-level, transpilable local variables, aren't they?

so maybe what we are looking for here is actually just to create instructions which, assuming that these registers are actually in stack frames, go ahead and swap the 'current' stack frame, and manipulate stack frames, where each frame is treated as an opaque thing. We'll probably need instructions to find the previous stack frame (walk the stack), to determine the type (layout) of each stack frame, to make a call and set up a new stack frame, to make a call and NOT set up a new stack frame (tail call), to demand storage of a frame on the heap, to indicate that some function and any functions called from it don't need any of this fancy functionality, to cut and paste contiguous sequences of stack frames (for delimited continuations), etc.
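
a hypothetical sketch of what such an instruction set fragment might look like (all names invented here, nothing settled):

/* Hypothetical LoVM frame-manipulation opcodes -- invented names, just to make
   the idea above concrete; nothing here is a settled design. */
enum FrameOp {
    OP_CALL,          /* push a new frame and jump to the callee */
    OP_TAILCALL,      /* jump to the callee, reusing the current frame */
    OP_RET,           /* pop the current frame and return to the caller */
    OP_FRAME_PARENT,  /* get an opaque handle to the caller's frame (walk the stack) */
    OP_FRAME_LAYOUT,  /* query the type/layout descriptor of a frame handle */
    OP_FRAME_TO_HEAP, /* demand that a frame live on the heap */
    OP_FRAME_SLICE,   /* cut a contiguous run of frames out of the stack (delimited continuation capture) */
    OP_FRAME_SPLICE,  /* paste a previously captured run of frames back onto a stack */
    OP_SIMPLE_CALL    /* call, promising that the callee (transitively) needs none of the fancy stuff */
};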

Guile's VM is probably a good example of this sort of thing.

---

i dunno, on the other hand, what's wrong with saying, "if the target platform is a less powerful language, like C, then LoVM code transpiles simply but powerful Oot control flow will only be emulated; and if the target platform is a more powerful language, like Guile, then powerful Oot control flow can transpile simply but only if the porter implements OVM directly on the platform."

that is, maybe we should just have '(almost)-least-common-denominator' control flow in LoVM, and put the delimited continuations in OVM. Otherwise, on platforms that don't have them natively, we are forcing the LoVM implementor to implement them, which makes porting less likely to occur.

the thing that stops me from doing that though is the existence of KLambda, the Shen core language. We would be saying that in order to reasonably transpile to Shen's platform via KLambda, the porter has to implement all of OVM, which is probably much more work than implementing KLambda. On the other hand, Shen is the only language I know of with powerful control flow yet a focus on being easy to port via a small, specified core language, whereas there are tons of languages with more or less the same structured-style control flow (often plus exceptions) (although, i guess Scheme (small), the whole language, could be thought of as another example of this; still, for just two examples, maybe it's okay to say that for those two platforms, they may have to do the work of porting OVM rather than LoVM -- anyways, even without porting OVM, they still get clean transpilation of simple control flow). Also, i haven't even looked at KLambda, I may have the wrong idea about it and maybe it's not as relevant as i think for this decision -- i'm not sure if it has delimited continuations, for example.

i guess my next step is just to actually look at KLambda:

notes on wacky stuff that klambda requires:

so, i don't see any delimited continuations there. I think currying/partial application and freeze and tail recursion are already the sort of thing that we were planning to do at the OVM level (except, maybe tail recursion should be at this level?).
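
(for reference, Shen's 'freeze' is essentially thunk creation; here's a minimal C sketch of a memoizing thunk/force pair, assuming int-valued computations -- the names are invented:)

/* Minimal sketch of a memoizing thunk ("freeze"/force), assuming values are ints. */
typedef struct {
    int  forced;              /* has the thunk been evaluated yet? */
    int  value;               /* cached result, valid once forced != 0 */
    int (*compute)(void *);   /* suspended computation */
    void *env;                /* captured environment */
} Thunk;

static int force(Thunk *t) {
    if (!t->forced) {
        t->value = t->compute(t->env);
        t->forced = 1;
    }
    return t->value;
}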

so, 'least common denominator control flow in LoVM' is still a possibility. I still can't decide though.

It would be nice to have the LoVM layer encapsulate the control flow. Otherwise, is that layer even doing anything? Let's review the purposes of the LoVM/OVM split, and what LoVM is supposed to be doing:

But surely control flow is interrelated with concurrency scheduling, dynamic data representation, garbage collection, etc? Can LoVM even be split from OVM? My guess is yes, because lots of languages are written in, and some even compile to, C, so there is a useful step in between assembly and a high-level VM. HOWEVER, a language whose goal is to be good at implementing OVM, and a language which is a good target language for compiled OVM, are two different goals; the second of these is what's giving us trouble, and the argument 'but C does this' only works for the first of these, because plenty of VMs don't compile to C.

stuff i should probably read before making this decision:

https://wingolog.org/archives/2011/07/12/static-single-assignment-for-functional-programmers (talks about CPS vs ANF vs SSA)

https://pdfs.semanticscholar.org/1de0/ccff4fad711ca43637f5c66c43a3e65db140.pdf Continuations in the Java Virtual Machine -- annoyingly, they call THEIR virtual machine Ovm too! I thought of that name a long time before seeing this paper! I guess there are only 26 three-letter acronyms that end with 'VM'!

https://eli.thegreenplace.net/2017/on-recursion-continuations-and-trampolines/

https://wingolog.org/archives/2014/01/12/a-continuation-passing-style-intermediate-language-for-guile

---

from https://wingolog.org/archives/2014/01/12/a-continuation-passing-style-intermediate-language-for-guile

" As far as intermediate languages go, basically there are two choices these days: something SSA-based, or something CPS-based. ... Why not ANF instead? If you recall from my SSA and CPS article, I mentioned that ANF is basically CPS with fewer labels. It tries to eliminate "administrative" continuations, whereas Guile's CPS labels everything. There is no short-hand let form.

ANF proponents tout its parsimony as a strength, but I do not understand this argument. I like having labels for everything...More importantly, labelling every control-flow point allows me to reason precisely about control flow... So I am happy with CPS, relative to ANF.

But what about SSA? In my previous article, I asked SSA proponents to imagine returning a block from a block. Of course it doesn't make any sense; SSA is a first-order language. But modern CPS is also first-order, is the thing! Modern CPS distinguishes "continuations" syntactically from functions, which is exactly the same as SSA's distinction between basic blocks and functions. CPS and SSA really are the same on this level.

The fundamental CPS versus SSA difference is, as Stephen Weeks noted a decade ago, one of data structures: do you group your expressions into basic blocks stored in a vector (SSA), or do you nest them into a scope tree (CPS)? It's not clear that I made the correct choice.

"

from https://wingolog.org/archives/2011/07/12/static-single-assignment-for-functional-programmers

" ...I am a functional programmer. By that I mean that lambda is my tribe... SSA is what the "machine tribe" uses as an intermediate langauge in their compilers...CPS is what "we" use... ... The machine tribe doesn't roll with closures, continuations, or tail calls. But they do have loops, and they crunch a lot of data. The most important thing for a compiler of a machine-tribe language like C is to produce efficient machine code for loops.

Clearly, I'm making some simplifications here. But if you look at a machine-tribe language like Java, you will be dealing with many control-flow constructs that are built-in to the language (for, while, etc.) instead of layered on top of recursion like loops in Scheme. What this means is that large, important parts of your program have already collapsed to a first-order control-flow graph problem. Layering other optimizations on top of this like inlining (the mother of all optimizations) only expands this first-order flow graph. More on "first-order" later.

So! After decades of struggling with this problem, after having abstracted away from assembly language to three-address register transfer language, finally the machine folks came up with something truly great: static single-assignment (SSA) form. The arc here is away from the machine, and towards more abstraction, in order to be able to optimize better, and thus generate better code. ... Peoples of the machine tribe, could you imagine returning a block as a value? Didn't think so. It doesn't make sense to return a label. But that's exactly what the lambda-calculus is about. One may represent blocks as functions, and use them as such, but one may also pass them as arguments and return them as values. Such blocks are of a higher order than the normal kind of block that is a jump target. Indeed it's the only way to express recursion in the basic lambda calculus.

That's what I mean when I say that CPS is good as a higher-order intermediate language, and when I say that SSA is a good first-order intermediate language.

If you have a fundamentally higher-order language, one in which you need to loop by recursion, then you have two options: do whole-program analysis to aggressively closure-convert your program to be first-order, and then you can use SSA, or use a higher-order IL, and use something more like CPS.

...

But if you can't do whole-program analysis -- maybe because you want to extend your program at runtime, support separate compilation, or whatever -- then you can't use SSA as a global IL. That's not to say that you shouldn't identify first-order segments of your program and apply SSA-like analysis and optimization on them, of course! That's really where the lambda tribe should go. "

---

btw two other misc notes from https://wingolog.org/archives/2014/01/12/a-continuation-passing-style-intermediate-language-for-guile

"the SSA approach of values having "implicit" names by their positions in a vector (instead of explicit ephemeral name-to-index mappings) is terrifying to me."

"One thing I did recently was to make all non-tail $call nodes require $kreceive continuations; if, as in the common case, extra values were unused, that was reflected in an unused rest argument."

and one from

"In the lambda tribe, we don't actually program in the lambda-calculus, you see. If you read any of our papers there's always a section in the beginning that defines the language we're working in, and then defines its semantics as a translation to the lambda-calculus.

This translation is always possible, for any programming language, and indeed Peter Landin did so in 1965 for Algol. Landin's original translations used his "J operator" to capture continuations, allowing a more direct translation of code into the lambda-calculus.

I wrote more on Landin, letrec, and the Y combinator a couple of years ago, but I wanted to mention one recent paper that takes a modern look at J, A Rational Deconstruction of Landin's J Operator. This paper is co-authored by V8 hacker Kevin Millikin, and cites work by V8 hackers Mads Sig Ager and Lasse R. Nielsen. Indeed all three seem to have had the privilege of having Olivier Danvy as PhD advisor. That's my tribe!

Anyway, J was useful in the context of Landin's abstract SECD machine, used to investigate the semantics of programs and programming languages. However it does not help the implementor of a compiler to a normal machine, and intermediate languages are all about utility. The answer to this problem, for functional programmers, was to convert the source program to what is known as continuation-passing style (CPS). ...

Expressing a program in CPS has a number of practical advantages:

    CPS is capable of expressing higher-order control-flow, for languages in which functions may be passed as values.
    All temporary values are named. Unreferenced names represent dead code, or code compiled for effect only. Referenced names are the natural input to a register allocator.
    Continuations correspond to named basic blocks. Their names in the source code correspond to a natural flow analysis simply by tracing the definitions and uses of the names. Flow analysis enables more optimizations, like code motion.
    Full beta-reduction is sound on terms of this type, even in call-by-value languages.

...

Depending on how you implement your CPS language, you can also attach notes to different continuations to help your graph reduce further: this continuation is an effect context (because its formal parameter is unreferenced in its body, or because you knew that when you made it), so its caller can be processed for effect and not for value; this one is of variable arity (e.g. can receive one or two values), so we could jump directly to the right handler, depending on what we want; etc.

...

Note that nowhere in there did I mention Scheme's call-with-current-continuation! For me, the utility of CPS is in its explicit naming of temporaries, continuations, and its affordances for optimization. Call/cc is a rare construct in Guile Scheme, that could be optimized better with CPS, but one that I don't care a whole lot about, because it's often impossible to prove that the continuation doesn't escape, and in that case you're on the slow path anyway.

So that's CPS. Continuations compile to jumps within a function, and functions get compiled to closures, or labels for toplevel functions. The best reference I can give on it is Andrew Kennedy's 2007 paper, Compiling With Continuations, Continued. CWCC is a really fantastic paper and I highly recommend it.

...

Note that all calls to blocks are tail calls. Reminds you of CPS, doesn't it? For more, see Richard Kelsey's classic paper, A Correspondence Between Continuation-Passing Style and Static Single Assignment Form, or my earlier article about Landin, Steele, letrec, and labels.

But for a shorter, readable, entertaining piece, see Appel's SSA is Functional Programming. I agree with Appel that we in the lambda-tribe get too hung up on our formalisms, when sometimes the right thing to do is draw a flow-graph.

"

---

so why aren't we using either CPS or SSA? Because our layer cake is for a different purpose: not optimization, but making porting easier (mainly by encapsulating the differences between platforms)

so what's the purpose of LoVM? i have a longer list above, but in line with the other two here, i might say that it is:

---

so far my takeaway from those Guile guy posts on CPS stuff is that:

---

seems like tail calls are sort of just interprocedural GOTOs already. So in a way, we don't have to do anything special to implement them, provided LoVM still has jump/goto instructions.
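
(illustration: for a self tail call the equivalence is literal; in C:)

/* A tail-recursive sum, and the same function with the tail call turned into a goto --
   which is all a tailcall instruction has to do when the target is known. */
static long sum_rec(long n, long acc) {
    if (n == 0) return acc;
    return sum_rec(n - 1, acc + n);   /* tail call */
}

static long sum_goto(long n, long acc) {
top:
    if (n == 0) return acc;
    acc += n;
    n -= 1;
    goto top;                         /* the "tail call", as a jump */
}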

so this lines up nicely with the view that LoVM should either provide a plethora of control flow options (call, tailcall, etc.), and/or just provide primitives like: set up this frame on the call stack; copy this frame to the heap; pop the top frame off the call stack and discard it; recall this frame from the heap onto the call stack (and/or just make it the current frame in a 'stackless' sort of way); do the stuff you need to do (e.g. push caller-save registers onto the stack) to make a conventional call to this address; etc.

and maybe even stuff like partially apply this function, etc, and maybe even stuff like create this thunk, execute this thunk.

and maybe with exceptions, since LLVM supports those?

So LoVM would be like C, but with instructions for calling exposed like this, and with other options besides standard function calls.

continuations are not terribly far-fetched, as in a way, they are just a better setjmp/longjmp, which even C has

---

https://pdfs.semanticscholar.org/1de0/ccff4fad711ca43637f5c66c43a3e65db140.pdf simply describes what continuations do as capturing the call stack (or some part of it), local variables, and the instruction pointer.
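
roughly, a captured continuation is then something like the following (an illustrative sketch of the idea, not the paper's actual data structure):

#include <stddef.h>

/* A captured (full) continuation: a copy of the relevant part of the execution
   stack plus enough register state to resume where capture happened. */
typedef struct {
    void   *stack_copy;   /* snapshot of the captured portion of the call stack */
    size_t  stack_len;    /* number of bytes captured */
    void   *frame_ptr;    /* saved frame/base pointer into that snapshot */
    void   *resume_pc;    /* saved instruction pointer to resume at */
} Continuation;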

they implement three kinds of continuations, and the user can choose which one they want:

" First-class continuations: This is the most general kind of continuations. There are no limitations with regard to the number of times a continuation can be called, or its lifetime. Since they need to copy the full execution stack, they are more costly than the other kinds of continuations.

Delimited continuations: The user first obtains a continuation bound which is then used to obtain a delimited continuation. Such a continuation is valid for as long as the execution stack does not exit its bound. An invalid continuation throws an exception when called. Valid delimited continuations can still be called any number of times. A delimited continuation costs less than a full continuation, since they only need to copy a subsection of the execution stack, given by the continuation bound and the point where callcc is called.

One-shot continuations: A one shot continuation can be called a single time. They are valid for as long as the execution does not leave the callcc frame. They are the cheapest continuations available, since the system only has to save the instruction pointer. "

they talk about applications for each of these

---

" Computed gotos is basically a combination of two new features for C. The first is taking addresses of labels into a void*:

    void* labeladdr = &&somelabel;
    somelabel: code;

The second is invoking goto on a variable expression instead of a compile-time-known label, i.e.:

    void* table[];  /* addresses */
    goto *table[pc];
" [10]

---

so so far, it seems like our 'weird control structure' needs can be met by just a few primitives:

but we probably need a few more in order to deal with closures, right?

i recall that Lua has a tailcall instruction and one or a few instructions to deal with closures; could model ourselves after that.
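
(for reference, the usual lowering of a closure at this level is a code pointer plus a captured environment; a minimal C sketch, with invented names:)

/* The standard lowering of a closure: a code pointer plus its captured environment.
   Sketch only; a real LoVM closure instruction would also have to say how the
   environment is allocated (stack vs heap) and traced by the GC. */
typedef struct {
    int (*code)(void *env, int arg);   /* compiled body; takes the environment explicitly */
    void *env;                          /* captured variables (upvalues) */
} Closure;

static int call_closure(Closure *c, int arg) {
    return c->code(c->env, arg);
}

/* example captured environment and body for:  adder(x) = lambda y: x + y */
typedef struct { int x; } AdderEnv;

static int adder_body(void *env, int y) {
    return ((AdderEnv *)env)->x + y;
}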

so maybe we don't really need to get into frames directly in LoVM, except as needed for closures.

and should we add something about exceptions, just because they are so common on platforms, even though delimited continuations can probably do them already? and something like the JVM's invokedynamic?

and should we do partial function application, because KLambda has it?

so i think this is shaping up well.

we are getting kind of close to the Lua virtual machine, which is nice.

---

mb instead of 16 registers per bank in Boot, or 256, we should have 32 or 64. Why?

eh, i dunno man, i'm pretty sure that most implementations will have to allocate the full max amount for each greenthread, which is a hefty chunk of memory if we need millions of threads. Also, i want this to be at least somewhat appropriate for embedded, where the total amount of register state will hurt at least a little. I think we should stick with 16 for now, actually.
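
(rough arithmetic, assuming 64-bit register slots: 16 regs x 8 bytes = 128 bytes per bank per greenthread, so a million greenthreads is on the order of 128 MB just for register banks; at 256 regs x 8 bytes = 2 KB per bank, a million greenthreads would be ~2 GB, before counting stacks or anything else.)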

also, should we reserve the first 4 regs (since they will be put to use in LoVM)? The first 8 (since we also want a memstack and a frame pointer and mb a link register and mb reserve one for the toolchain)? That's necessary if we want valid Boot code to be amenable to running within OVM later. But i guess not, because we want the LoVM implementation in Boot to be able to implement that stuff.

---

yeah, i think we want LoVM 'registers' to be locals, not registers. i think that is like Lua's VM after all, btw.

but we are still holding off on GC until OVM (i bet we'll at least need to add some instruction to give you the 'roots' of all of the frames)

so LoVM is kind of like the Lua VM as regards frames, but like C in that there are explicit pointers and memory allocation. Like the Lua VM we have tailcalls and closures, and like neither, we have delimited continuations.

---

so, LoVM is the sorta-intersection of LLVM and WASM, but easier to implement than either of them, and with more powerful control flow. LoxVM is like LoVM but with some extensions for a more compact encoding: variable-length encoding and address modes; LoxVM can be directly macrocompiled to lovm (provided that relative jumps don't get too big). LoVM competes with LLVM and WASM.

the idea is that lovm is like assembly, except that it encapsulates function calling, "stack frames" ("activation frames"), and local variables. It also provides some features that both LLVM and WASM provide, such as arrays (i think).

lo is a language that compiles to lovm. It is just a lovm frontend, with few conveniences. It competes with BASIC, C, Oberon, Forth. It's sort of like C minus various features and plus powerful control flow, or like BASIC plus pointers and structured programming.

these are all unsafe, close-to-the-metal languages suitable for writing e.g. Oot's garbage collector.

---

with our current 64-bit encoding, we only have 256 opcodes. i worry that that will not quite be enough. otoh, we have a whole field for polymorphism, so we will probably be using many fewer opcodes than i think. also, if we really need to, we can use the polymorphism field for subopcodes, too.
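
purely illustrative (only the 8-bit opcode field is implied above; the split of the remaining bits is invented here):

#include <stdint.h>

/* One way a 64-bit instruction word could be carved up -- illustration only. */
typedef struct {
    uint8_t  opcode;   /* 8 bits -> at most 256 primary opcodes */
    uint8_t  poly;     /* polymorphism field; could double as a subopcode when needed */
    uint8_t  op0;      /* operand fields (invented) */
    uint8_t  op1;
    uint32_t imm;      /* rest of the 64-bit instruction word */
} Insn;                /* 8 bytes total */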

---

i guess one of the things that is different about Oot, which explains the proliferation of low-level target languages/compilation/interpretation stages, is that i want to have an HLL, which means it needs services that are usually provided in a runtime; but i want it to be portable, so i'm trying to develop an easily portable runtime, which means developing an easily portable language in which to write a runtime. i think C-- (Cmm) was supposed to be this, but i dunno if it hit the mark.

and the even more different thing is that i want interoperability with and maybe even transpilation to an HLL host language.

so, i don't want to just use LLVM or the JVM or the CLR, because they are too hard to port onto a host language.

but i do want to go the bytecode route, to make porting and porting of interoperation easy.

so my VM is going to be as complicated as the JVM, which i said was too complicated. So i need to specify a simpler language beneath it.

---

Cminor, from compcert https://xavierleroy.org/publi/compcert-backend.pdf starting page 18, looks pretty cool

---

some stuff that Cmm keeps around in registers:

" -- STG registers

Sp -- Stack ptr; points to last occupied stack location.
SpLim -- Stack limit
Hp -- Heap ptr; points to last occupied heap location.
HpLim -- Heap limit register
CurrentTSO -- pointer to current thread's TSO
CurrentNursery -- pointer to allocation area
HpAlloc -- allocation count for heap check failure
  -- We keep the address of some commonly-called 
  -- functions in the register table, to keep code
  -- size down:
  | GCEnter1		-- stg_gc_enter_1
  | GCFun		-- stg_gc_fun
  -- Base offset for the register table, used for accessing registers
  -- which do not have real registers assigned to them.  This register
  -- will only appear after we have expanded GlobalReg into memory accesses
  -- (where necessary) in the native code generator.
  | BaseReg
  -- Base Register for PIC (position-independent code) calculations
  -- Only used inside the native code generator. It's exact meaning differs
  -- from platform to platform  (see compiler/nativeGen/PositionIndependentCode.hs).
  | PicBaseReg
  " -- https://gitlab.haskell.org/ghc/ghc/wikis/commentary/compiler/cmm-type#global-registers-and-hints

---

should maybe also look in the ootLibrariesNotes.txt to see if we should include some fns that are in stdlibs in some languages. e.g. https://doc.rust-lang.org/core/index.html

---

remember to check Michelson

---

" First: in Rust, we can pass a fixed-length array by reference without it degrading into a pointer as it does in C. For instance,

    fn get_element_3(array: [u8; 10]) -> u8 {
        // This bounds check is trivially proven and will not be performed at runtime.
        array[3]
    }

    // This attempt to pass a 2-element array is a compile error.
    // get_element_3([0; 2]);

Neither of those statements holds in C. As a result, we use fixed-length arrays in several places in the demo where we didn't in C++.

Second, if the length of an array is known (to us, the programmer) but not known (to the poor compiler), we can hoist bounds checks to a convenient place. For instance, this routine as written performs runtime bounds checks at each loop iteration:

    // Note that the array is a slice of runtime-determined length.
    fn fill_1024(array: &mut [u8], color: u8) {
        for i in 0..1024 {
            array[i] = color;
        }
    }

We can check the length outside the loop, and make the length visible to the compiler, like this:

    fn fill_1024(array: &mut [u8], color: u8) {
        // Perform an explicit, checked slice of the array before entering the loop.
        let array = &mut array[..1024];

        for i in 0..1024 {
            array[i] = color;
        }
    } "

---

https://www.academia.edu/26447075/A_real-time_virtual_machine_implementation_for_small_microcontrollers

---

5.2 Using virtual machine abstraction to support concurrency

One abstract extension to a VM that was found to provide great overall value to the system is nonpreemptive, user-level concurrency. As shown in Fig. 11, a VM that maintains its own concurrency has advantages over a real-time operating system. Switching contexts in a VM can be done in one virtual instruction, whereas a native RTOS-driven system requires saving and restoring contexts, which may involve dozens of variables and the saving of a task state. Also, a concurrent VM is more controlled than a real-time operating system task switch, which can occur at almost any time, requiring critical regions of code to protect against race conditions and errors. ...

5.2.1 Implementing a task instruction to support concurrency: New instructions were added to the Pascal Virtual Machine to support multitasking and message passing as high level abstract instructions. A new task instruction was added to the interpreter to support multitasking along with the necessary Pascal language extension as follows:

status := task(function);
   function:
     TASKINIT=0: status: 1=parent, 2=child
     TASKPEND=1: status: is new task ID
     TASKWHO=2: status: is my task ID
     TASKKILL=3: No status

With this new special function call, new tasks could be created using the subfunction TASKINIT which performs a similar function as the UNIX fork() command within a single PCode instruction. Modeling task creation after the UNIX fork command was a matter of convenience for this research, although it does have some advantages.

...

The interpreter was modified to support up to five tasks (an arbitrary choice that depends on the memory available). When a new task is created, an exact duplicate of the working stack of the parent task is created and a new entry is made in a TaskState table for the new child task. The return value from this function call indicates whether the currently executing task is still the parent task (1) or the child task (0) that is an exact copy. The return value can then be used to cause each of the tasks to follow their own execution path depending on whether it is the parent or the child task. As shown in Fig. 12, when a new task is created only four values need to be remembered to save and restore the task; namely the PC, SP, MP, and EP registers. PC is the program counter, which is saved to remember what instruction this task was executing when it was suspended so that it can pick up where it left off when it starts running again. SP is the stack pointer which indicates where the current top of the stack is for this task. Since each new task has its own copy of the stack, each task must remember where that is in memory for its own stack. MP is the mark pointer which indicates where the base of the current stack frame is. EP is the extreme pointer which indicates the top of the current stack frame so that the VM knows where to put the next stack frame if another procedure or function is called from the current context. There is no need to save any working registers because the PMachine is stack based and nothing else is stored in registers between instructions besides what is on the stack, making task switching relatively fast and efficient. Another subfunction, TASKPEND, is to allow the currently executing task to voluntarily give up the virtual processor and allow a different task to run. ...

TASKKILL can be called for a task to permanently kill itself and allow other tasks to run instead

5.3 Implementing a mail instruction to support concurrency: A second instruction was implemented to support message passing between tasks in the VM. The mail() instruction was created to handle this functionality:

Status := mail(function, parameter1, parameter2, parameter3);
   Function: MAILSEND=0
     parameter1: destination task ID
     parameter2: length of message to send
     parameter3: buffer address of message to send

   MAILRCV=1
     parameter1: my task ID
     parameter2: size of message buffer for receiving
     parameter3: buffer address where to put received message

... In this implementation, a call by a task to receive mail will cause the task to pend if no mail is available. The task then becomes ready to run again when mail arrives. The task that sent the mail continues to run until it voluntarily gives up the processor or pends on its own mailbox. This implementation was chosen because the native operating system used in the timing comparison performs this way, allowing the timing comparison to be fair. Another faster implementation might be to allow the receiving task to wake up immediately, which would make the system appear to switch tasks faster but is not commonly how embedded operating systems work.

...

Others showed that VMs are capable of doing things that native machine hardware is not effective at doing by implementing VMs that do not emulate real machines. The Esterel and the Harrison VMs are good examples of this [10,41]. Additionally, other research showed that abstraction is a way to increase performance in a VM as shown by the Carbayonia and Hume VMs [18,19].

10. Craig ID (2005) Virtual Machines. Springer, New York
41. Plummer B, Khajanchi M, Edwards SA (2006) An Esterel Virtual Machine for Embedded Systems. In: Proceedings of synchronous languages, applications, and programming (SLAP 2006)
18. Gutierrez DA, Soler FO (2008) Applying lightweight flexible virtual machines to extensible embedded systems. In: Proceedings of the 1st workshop on isolation and integration in embedded systems. ACM, Glasgow, pp 23–28
19. Hammond K, Michaelson G (2003) Hume: a domain-specific language for real-time embedded systems. In: Proceedings of the 2nd international conference on generative programming and component engineering. Springer, Erfurt, pp 37–56 "

-- https://www.academia.edu/26447075/A_real-time_virtual_machine_implementation_for_small_microcontrollers

---

so some things which were pushed from BootX to LoVM: