proj-oot-ootOvmNotes2

see also ootVm.txt; that file is older and needs to be integrated into this series

---

we really should look more closely at existing 'multitarget' programming languages:

the most well known/successful ones i can think of are:

others include:

lists: -- https://github.com/jashkenas/coffeescript/wiki/List-of-languages-that-compile-to-JS#multitarget

dead stuff that i don't think needs to be looked at:

---

we should also consider just targeting Haxe or Shen, which already have lots of targets

---

so how do haxe and shen work?

Haxe's compiler is written in OCaml; it compiles source to an AST and then has a separate backend for each target that compiles the AST to that target.

Shen compiles to a small core language, KLambda or KL, with about 46 functions (with only about 35 basis functions needed to express the rest), which then has a separate backend for each target.

i feel like haxe is more concerned with source-to-source 'transpilation'.

Shen porting guide: https://github.com/Shen-Language/shen-sources/blob/master/doc/porting.md

the intersection of haxe and shen (certified for Shen) target platforms:

this is almost exactly what Fantom supports too, although Fantom targets the JVM rather than Java source -- Fantom uses a bytecode for the CLR and JVM, so it is less concerned with source-to-source 'transpilation'.

if you add in Shen 'uncertified' you also get C++ and Python, for both Haxe and Shen (but not Fantom)

Fantom for the JVM and CLR uses fcode, which is a bytecode [2] [3] [4], but for JS it translates to JS at compile time [5]

see also http://www.fandev.org/doc/compiler/

docs and code about fcode: https://bitbucket.org/fantom/fan-1.0/src/default/src/compiler/fan/fcode/FOp.fan https://bitbucket.org/fantom/fan-1.0/src/default/src/compiler/fan/fcode/format.txt https://bitbucket.org/fantom/fan-1.0/src/default/src/compiler/fan/assembler/CodeAsm.fan

(forum messages regarding fcode: https://fantom.org/forum/topic/1668 )

info about how good the various haxe targets are:

https://community.haxe.org/t/where-is-haxe-heading/405

the Neko VM was written by the lead Haxe developer, and was the primary VM for Haxe. Now Haxe is planning to replace Neko with HashLink [6].

https://hashlink.haxe.org/ https://github.com/HaxeFoundation/hashlink/wiki https://haxe.org/blog/hashlink-indepth/ https://haxe.org/blog/hashlink-in-depth-p2/

why switch to hashlink? and why isn't hashlink just called 'Neko 3.0'? https://haxe.org/blog/hashlink-indepth/ explains that 'Neko was my first complete virtual machine. It was designed before Haxe was even created and was meant to run various programming languages. In the end only two languages ended up targeting Neko: NekoML, the language of the Neko compiler, and of course Haxe. Back then, the Neko virtual machine was not especially designed to run Haxe and suffered from some limitations, the main one being performance....Neko is very slow. The main reason for this is that it's dynamically typed". Hashlink by contrast is "strictly-typed register-based" (Neko is stack-based / a stack machine: [7]).

At the beginning of each function, the type of each register (and, i guess, the number of registers) are declared.

Note that Hashlink DOES have a dyn (dynamic) type, however.

See also https://nekovm.org/faq/ for some comparisons between Neko and some other, older projects. https://github.com/HaxeFoundation/neko/blob/master/src/neko/Bytecode.nml https://nekovm.org/doc/lua/

---

for fantom:

"Fantom's type system is simple by design. All variables are statically typed, as they are in C# and Java. Fantom rejects generic types due to their complexity, but it does have a set of built-in generic types: List, Map, and Func. Fantom can also take on the feel of a dynamically typed language through dynamic calls and automatic downcasting. Fantom has an easy to use reflection API and metaprogramming capabilities. " [8]

---

for nim: " Features or modules that the JavaScript? platform does not support are not available. This includes:

    manual memory management (alloc, etc.)
    casting and other unsafe operations (cast operator, zeroMem, etc.)
    file management
    most modules of the standard library
    proper 64 bit integer arithmetic
    unsigned integer arithmetic

However, the modules strutils, math, and times are available! To access the DOM, use the dom module that is only available for the JavaScript platform. " [9]

---

new backends for tensorflow (doesn't look very useful for us):

https://www.tensorflow.org/xla/developing_new_backend

---


so the current thinking is (this is mostly just a rehash of the layer list near the end of ootOvmNotes1.txt):

'Programming language implementation technology' language stages:

Oot-specific stages:

some remaining questions:

---

hmm i've been thinking about the question of 'where do we handle control flow that most target platforms won't support, e.g. continuations/goto and exceptions?'. There seems to be a conflict between making LoVM something that maps reasonably to C on the one hand, and, on the other hand, supporting these advanced control flow constructs so that we don't end up with an OVM->LoVM->Boot pipeline where both OVM and Boot can handle these things directly but LoVM obscures them behind another layer of interpretation (alternately, so that on platforms that do support these things, such as assembly or Scheme, we can compile these control flow constructs directly/efficiently, without an unnecessary interpretive layer and without having to implement all of OVM directly on the target platform, skipping LoVM).

how do other languages handle this? Now that i think about it, perhaps most of the languages i know with powerful control flow do implement it directly in assembly, or at least in something assembly-like such as LLVM, without higher-level calls?:

" Pointer tagging avoids looking up the info table

Haskell

mycase :: Maybe Int -> Int
mycase x = case x of Just z -> z; Nothing -> 10

Cmm

mycase_entry()                      -- corresponds to forcing 'x'
  entry:
    R1 = R2;                        -- R1 = 'x'
    I64[Sp - 8] = mycase_exit;      -- setup case continuation
    Sp = Sp - 8;
    if (R1 & 7 != 0) goto crL;      -- check pointer tag to see if x eval'd
    jump I64[R1] ();                -- x not eval'd, so eval
  exit:
    jump mycase_exit ();            -- jump to case continuation

mycase_exit()                       -- case continuation
  entry:
    v::I64 = R1 & 7;                -- get tag bits of 'x' and put in local variable 'v'
    if (_crD::I64 >= 2) goto crE;   -- can use tag bits to check which constructor we have
    R1 = stg_INTLIKE_closure+417;   -- 'Nothing' case
    Sp = Sp + 8;
    jump (I64[Sp + 0]) ();          -- jump to continuation ~= return
  exit:
    R1 = I64[R1 + 6];               -- get 'z' thunk inside Just
    Sp = Sp + 8;
    R1 = R1 & (-8);                 -- clear tags on 'z'
    jump I64[R1] ();                -- force 'z' thunk
" -- https://web.archive.org/web/20191225135205/http://www.scs.stanford.edu/16wi-cs240h/slides/ghc-compiler.html

so, it seems like this stuff tends to require one of either:

toread: https://www.google.com/search?client=ubuntu&channel=fs&q=how+to+implement+delimited+continuation&ie=utf-8&oe=utf-8

---

this guy argues that if you have continuations at all, you should have multi-prompt delimited continuations, but that probably these aren't worth it: https://rain-1.github.io/scheme-continuations.html

incidentally, it's probably worth reading his blog "This page is for documenting things learned from implementing tarot, a basic self hosted scheme compiler that I did.": https://rain-1.github.io/scheme

---

here is that blog post on implementing delimited continuations in Guile, i finally know enough to understand it:

https://wingolog.org/archives/2010/02/26/guile-and-delimited-continuations

sounds like either a continuation-passing transform of everything, or direct stack manipulation, is required.
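to make the first option concrete, here's a minimal sketch of a continuation-passing transform in C (illustrative only; names are made up, and this is not Guile's actual scheme): instead of returning, the transformed function hands its result to an explicit continuation argument, so 'the rest of the computation' becomes an ordinary value that can be stored, resumed, or dropped.

#include <stdio.h>

/* direct style: the result goes back to the caller via the return slot */
static int add1_direct(int x) { return x + 1; }

/* CPS: the "return" is an explicit call to a continuation argument */
typedef void (*cont)(int result, void *env);

static void add1_cps(int x, cont k, void *env) {
    k(x + 1, env);               /* returning == invoking the continuation */
}

static void print_result(int result, void *env) {
    (void)env;
    printf("result = %d\n", result);
}

int main(void) {
    printf("result = %d\n", add1_direct(41));
    add1_cps(41, print_result, NULL);   /* same answer, continuation-passing */
    return 0;
}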

so, it seems like continuations tend to require one of either:

---

so my current feeling is that LoVM should provide at least:

because these are things that seem to be related to the abstraction of function calling and its (possible) use of a stack.

so yeah, that means we are a little more than just a very C-like, Oberon-like thing -- we can still be simple, but we may not be able to compile code in a simple manner to C or Oberon. Is there some way that LoVM programs could indicate to the LoVM compiler when the extra fancy stuff is NOT required, that is, when we CAN compile function calls to simple C function calls?

---

should we provide a facility in LoVM to allow code to choose whether a frame is allocated on the stack or on the heap, and whether other variables are allocated on the stack or on the heap?

---

if we're providing all of the above, then we're implementing the whole stack frame abstraction. In which case it probably makes sense to provide local variables as well, referred to by index, like Guile's VM. After all, for transpilation, it would be good to have LoVM locals translate into locals on the target platform. One wrinkle here is that Guile's VM has all stack slots a uniform 64-bits (so that local variables can be referred to by index, instead of precomputed offsets from either T.O.S. or from the current frame pointer); i feel like we were aiming for LoVM to be a lower-level language that does not force all stack slots into a standard size. Maybe declare variable types (like HashLink and C), or (somewhat equivalently) provide stack maps with the source code.

however, if we're doing all this, then it seems to me that we almost may as well skip LoVM and go right to OVM. Other stuff like garbage collection, maybe thunks, maybe even OOP probably needs to be intimately tied to the frame abstraction stuff.

which makes me wonder if maybe we should pull back a little and just offer low-level instructions with which to interact directly with stacks, stack frames and slots in frames. This hurts readable transpilability but is good for a language in which to implement OVM. One middle ground would be to provide that stuff but also a 'simple call' that doesn't support any fancy stuff (possibly recursively in any function called from the simple call; so simply called things can be directly translated into C-like platforms, and can be inlined).

and what about setjmp/longjmp? at first glance it looks useful, but i think it still doesn't support frames on the heap, or even guarantee preserved frames in some cases necessary for coroutines.
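a small C sketch of why (illustrative only): setjmp/longjmp can only unwind to a frame that is still live below you on the stack; it discards the intervening frames rather than saving them, and jumping back 'into' a function whose frame is gone is undefined behavior -- which is exactly what coroutines and re-invocable continuations would need.

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

static void deep(void)   { longjmp(env, 1); }  /* non-local exit upward */
static void middle(void) { deep(); }

int main(void) {
    if (setjmp(env) == 0) {
        middle();                 /* first pass: call down, longjmp back up */
    } else {
        printf("unwound back to main\n");
    }
    /* the frames of middle() and deep() were discarded, not captured;
       there is no way to jump forward into them again, so this mechanism
       can't give us coroutines, heap frames, or full continuations. */
    return 0;
}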

---

and on the other hand, our 256 'registers' really already are the low-level, transpilable local variables, aren't they?

so maybe what we are looking for here is actually just to create instructions which, assuming that these registers are actually in stack frames, swap the 'current' stack frame and manipulate stack frames, where each frame is considered as an opaque thing. We'll probably need instructions to find the previous stack frame (walk the stack), to determine the type (layout) of each stack frame, to make a call and set up a new stack frame, to make a call and NOT set up a new stack frame (tail call), to demand storage of a frame on the heap, to indicate that some function and any functions called from it don't need any of this fancy functionality, to cut and paste contiguous sequences of stack frames (for delimited continuations), etc.
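as a strawman, here is roughly what 'frames as opaque, individually-manipulable things' could look like inside a LoVM implementation -- a hypothetical C sketch, not a design commitment:

#include <stdint.h>
#include <stdlib.h>

/* a frame is a self-contained object: dynamic link + resume point + slots.
   Because it is self-contained, "demand storage of a frame on the heap",
   "walk the stack", and "cut and paste contiguous sequences of frames"
   (for delimited continuations) are all just pointer/struct operations. */
typedef struct Frame {
    struct Frame *caller;    /* previous frame (walk the stack) */
    uint32_t      resume;    /* code address to resume at in the caller */
    uint16_t      layout;    /* id describing the type/layout of the slots */
    uint16_t      nslots;
    uint64_t      slots[];   /* the "registers"/locals of this activation */
} Frame;

/* ordinary call: push a new frame */
static Frame *call_frame(Frame *caller, uint32_t resume, uint16_t layout, uint16_t n) {
    Frame *f = calloc(1, sizeof(Frame) + n * sizeof(uint64_t));
    f->caller = caller; f->resume = resume; f->layout = layout; f->nslots = n;
    return f;
}

/* tail call: reuse the current frame instead of pushing a new one */
static Frame *tail_call_frame(Frame *current, uint16_t layout) {
    current->layout = layout;   /* caller link and resume address stay the same */
    return current;
}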

GuileVM is probably a good example of this sort of thing.

---

i dunno, on the other hand, what's wrong with saying, "if the target platform is a less powerful language, like C, then LoVM code transpiles simply but powerful Oot control flow will only be emulated; and if the target platform is a more powerful language, like Guile, then powerful Oot control flow can transpile simply but only if the porter implements OVM directly on the platform."

that is, maybe we should just have '(almost)-least-common-denominator' control flow in LoVM, and put the delimited continuations in OVM. Otherwise, on platforms that don't have them natively, we are forcing the LoVM implementor to implement them, which makes porting less likely to occur.

the thing that stops me from doing that though is the existence of KLambda, the Shen core language. We would be saying that in order to reasonably transpile to Shen's platform via KLambda, the porter has to implement all of OVM, which is probably much more work than implementing KLambda. On the other hand, Shen is the only language I know of with powerful control flow yet a focus on being easy to port via a small, specified core language, whereas there are tons of languages with more or less the same structured-style control flow (often plus exceptions) (although, i guess Scheme (small), the whole language, could be thought of as another example of this; still, for just two examples, maybe it's okay to say that for those two platforms, they may have to do the work of porting OVM rather than LoVM -- anyways, even without porting OVM, they still get clean transpilation of simple control flow). Also, i haven't even looked at KLambda, I may have the wrong idea about it and maybe it's not as relevant as i think for this decision -- i'm not sure if it has delimited continuations, for example.

i guess my next step is just to actually look at KLambda:

notes on wacky stuff that klambda requires:

so, i don't see any delimited continuations there. I think currying/partial application and freeze and tail recursion are already the sort of thing that we were planning to do at the OVM level (except, maybe tail recursion should be at this level?).

so, 'least common denominator control flow in LoVM' is still a possibility. I still can't decide though.

It would be nice to have the LoVM layer encapsulate the control flow. Otherwise, is that layer even doing anything? Let's review the purposes of the LoVM/OVM split, and what LoVM is supposed to be doing:

But surely control flow is interrelated with concurrency scheduling, dynamic data representation, garbage collection, etc? Can LoVM even be split from OVM? My guess is yes, because lots of languages are written in, and some even compile to, C, so there is a useful step in between assembly and a high-level VM. HOWEVER, a language whose goal is to be good at implementing OVM, and a language which is a good target language for compiled OVM, are two different goals; the second of these is what's giving us trouble, and the argument 'but C does this' only works for the first of these, because plenty of VMs don't compile to C.

stuff i should probably read before making this decision:

https://wingolog.org/archives/2011/07/12/static-single-assignment-for-functional-programmers (talks about CPS vs ANF vs SSA)

https://pdfs.semanticscholar.org/1de0/ccff4fad711ca43637f5c66c43a3e65db140.pdf Continuations in the Java Virtual Machine. annoyingly, they call THEIR virtual machine Ovm too! I thought of that name a long time before seeing this paper! I guess there are only 26 three-letter acronyms that end with 'VM'!

https://eli.thegreenplace.net/2017/on-recursion-continuations-and-trampolines/

https://wingolog.org/archives/2014/01/12/a-continuation-passing-style-intermediate-language-for-guile

---

from https://wingolog.org/archives/2014/01/12/a-continuation-passing-style-intermediate-language-for-guile

" As far as intermediate languages go, basically there are two choices these days: something SSA-based, or something CPS-based. ... Why not ANF instead? If you recall from my SSA and CPS article, I mentioned that ANF is basically CPS with fewer labels. It tries to eliminate "administrative" continuations, whereas Guile's CPS labels everything. There is no short-hand let form.

ANF proponents tout its parsimony as a strength, but I do not understand this argument. I like having labels for everything...More importantly, labelling every control-flow point allows me to reason precisely about control flow... So I am happy with CPS, relative to ANF.

But what about SSA? In my previous article, I asked SSA proponents to imagine returning a block from a block. Of course it doesn't make any sense; SSA is a first-order language. But modern CPS is also first-order, is the thing! Modern CPS distinguishes "continuations" syntactically from functions, which is exactly the same as SSA's distinction between basic blocks and functions. CPS and SSA really are the same on this level.

The fundamental CPS versus SSA difference is, as Stephen Weeks noted a decade ago, one of data structures: do you group your expressions into basic blocks stored in a vector (SSA), or do you nest them into a scope tree (CPS)? It's not clear that I made the correct choice.

"

from https://wingolog.org/archives/2011/07/12/static-single-assignment-for-functional-programmers

" ...I am a functional programmer. By that I mean that lambda is my tribe... SSA is what the "machine tribe" uses as an intermediate langauge in their compilers...CPS is what "we" use... ... The machine tribe doesn't roll with closures, continuations, or tail calls. But they do have loops, and they crunch a lot of data. The most important thing for a compiler of a machine-tribe language like C is to produce efficient machine code for loops.

Clearly, I'm making some simplifications here. But if you look at a machine-tribe language like Java, you will be dealing with many control-flow constructs that are built-in to the language (for, while, etc.) instead of layered on top of recursion like loops in Scheme. What this means is that large, important parts of your program have already collapsed to a first-order control-flow graph problem. Layering other optimizations on top of this like inlining (the mother of all optimizations) only expands this first-order flow graph. More on "first-order" later.

So! After decades of struggling with this problem, after having abstracted away from assembly language to three-address register transfer language, finally the machine folks came up with something truly great: static single-assignment (SSA) form. The arc here is away from the machine, and towards more abstraction, in order to be able to optimize better, and thus generate better code. ... Peoples of the machine tribe, could you imagine returning a block as a value? Didn't think so. It doesn't make sense to return a label. But that's exactly what the lambda-calculus is about. One may represent blocks as functions, and use them as such, but one may also pass them as arguments and return them as values. Such blocks are of a higher order than the normal kind of block that is a jump target. Indeed it's the only way to express recursion in the basic lambda calculus.

That's what I mean when I say that CPS is good as a higher-order intermediate language, and when I say that SSA is a good first-order intermediate language.

If you have a fundamentally higher-order language, one in which you need to loop by recursion, then you have two options: do whole-program analysis to aggressively closure-convert your program to be first-order, and then you can use SSA, or use a higher-order IL, and use something more like CPS.

...

But if you can't do whole-program analysis -- maybe because you want to extend your program at runtime, support separate compilation, or whatever -- then you can't use SSA as a global IL. That's not to say that you shouldn't identify first-order segments of your program and apply SSA-like analysis and optimization on them, of course! That's really where the lambda tribe should go. "

---

btw two other misc notes from https://wingolog.org/archives/2014/01/12/a-continuation-passing-style-intermediate-language-for-guile

"the SSA approach of values having "implicit" names by their positions in a vector (instead of explicit ephemeral name-to-index mappings) is terrifying to me."

"One thing I did recently was to make all non-tail $call nodes require $kreceive continuations; if, as in the common case, extra values were unused, that was reflected in an unused rest argument."

and one from https://wingolog.org/archives/2011/07/12/static-single-assignment-for-functional-programmers :

"In the lambda tribe, we don't actually program in the lambda-calculus, you see. If you read any of our papers there's always a section in the beginning that defines the language we're working in, and then defines its semantics as a translation to the lambda-calculus.

This translation is always possible, for any programming language, and indeed Peter Landin did so in 1965 for Algol. Landin's original translations used his "J operator" to capture continuations, allowing a more direct translation of code into the lambda-calculus.

I wrote more on Landin, letrec, and the Y combinator a couple of years ago, but I wanted to mention one recent paper that takes a modern look at J, A Rational Deconstruction of Landin's J Operator. This paper is co-authored by V8 hacker Kevin Millikin, and cites work by V8 hackers Mads Sig Ager and Lasse R. Nielsen. Indeed all three seem to have had the privilege of having Olivier Danvy as PhD advisor. That's my tribe!

Anyway, J was useful in the context of Landin's abstract SECD machine, used to investigate the semantics of programs and programming languages. However it does not help the implementor of a compiler to a normal machine, and intermediate languages are all about utility. The answer to this problem, for functional programmers, was to convert the source program to what is known as continuation-passing style (CPS). ...

Expressing a program in CPS has a number of practical advantages:

    CPS is capable of expressing higher-order control-flow, for languages in which functions may be passed as values.
    All temporary values are named. Unreferenced names represent dead code, or code compiled for effect only. Referenced names are the natural input to a register allocator.
    Continuations correspond to named basic blocks. Their names in the source code correspond to a natural flow analysis simply by tracing the definitions and uses of the names. Flow analysis enables more optimizations, like code motion.
    Full beta-reduction is sound on terms of this type, even in call-by-value languages.

...

Depending on how you implement your CPS language, you can also attach notes to different continuations to help your graph reduce further: this continuation is an effect context (because its formal parameter is unreferenced in its body, or because you knew that when you made it), so its caller can be processed for effect and not for value; this one is of variable arity (e.g. can receive one or two values), so we could jump directly to the right handler, depending on what we want; etc.

...

Note that nowhere in there did I mention Scheme's call-with-current-continuation! For me, the utility of CPS is in its explicit naming of temporaries, continuations, and its affordances for optimization. Call/cc is a rare construct in Guile Scheme, that could be optimized better with CPS, but one that I don't care a whole lot about, because it's often impossible to prove that the continuation doesn't escape, and in that case you're on the slow path anyway.

So that's CPS. Continuations compile to jumps within a function, and functions get compiled to closures, or labels for toplevel functions. The best reference I can give on it is Andrew Kennedy's 2007 paper, Compiling With Continuations, Continued. CWCC is a really fantastic paper and I highly recommend it.

...

Note that all calls to blocks are tail calls. Reminds you of CPS, doesn't it? For more, see Richard Kelsey's classic paper, A Correspondence Between Continuation-Passing Style and Static Single Assignment Form, or my earlier article about Landin, Steele, letrec, and labels.

But for a shorter, readable, entertaining piece, see Appel's SSA is Functional Programming. I agree with Appel that we in the lambda-tribe get too hung up on our formalisms, when sometimes the right thing to do is draw a flow-graph.

"

---

so why aren't we using either CPS or SSA? Because our layer cake is for a different purpose: not optimization, but making porting easier (mainly by encapsulating the differences between platforms)

so what's the purpose of LoVM? i have a longer list above, but in line with the other two here, i might say that it is:

---

so far my takeaway from those Guile guy posts on CPS stuff is that:

---

seems like tail calls sort of are just interprocedural GOTOs already. So in a way, we don't have to do anything special to implement them, provided LoVM still has jump/goto instructions.
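a tiny illustration of that view (plain C; the example is mine): the tail call in the first version is just the backwards jump in the second -- nothing new has to be pushed if the callee can reuse the current frame.

/* tail-recursive version: the recursive call is in tail position */
int gcd(int a, int b) {
    if (b == 0) return a;
    return gcd(b, a % b);        /* tail call */
}

/* the same function with the tail call spelled as a jump: same frame,
   arguments overwritten, control transferred back to the top */
int gcd_jump(int a, int b) {
top:
    if (b == 0) return a;
    { int t = a % b; a = b; b = t; }
    goto top;
}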

so this lines up nicely with the view that LoVM should either provide a plethora of control flow options (call, tailcall, etc), and/or just provide primitives like: set up this frame on the call stack; copy this frame to the heap; pop the top frame off the call stack and discard it; recall this frame from the heap onto the call stack and/or just make it the current frame in a 'stackless' sort of way; do the stuff you need to do (e.g. push caller-save registers onto the stack) to make a conventional call to this address; etc.

and maybe even stuff like 'partially apply this function', 'create this thunk', 'execute this thunk', etc.
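for concreteness, a hypothetical C sketch of what those reduce to at this level: a partial application is a code pointer plus captured arguments, and a thunk is the zero-argument case plus a 'forced yet?' flag. (Names and layout are made up; this is not a proposed LoVM ABI.)

#include <stdlib.h>

/* partial application: code pointer + the argument captured so far */
typedef struct {
    int (*code)(int captured, int arg);
    int captured;
} Partial;

static int add(int x, int y) { return x + y; }

static Partial *partially_apply_add(int x) {
    Partial *p = malloc(sizeof *p);
    p->code = add;
    p->captured = x;
    return p;
}

static int apply(Partial *p, int y) { return p->code(p->captured, y); }

/* thunk: delayed zero-argument computation, forced at most once */
typedef struct {
    int (*code)(void);
    int forced;
    int value;
} Thunk;

static int force(Thunk *t) {
    if (!t->forced) { t->value = t->code(); t->forced = 1; }
    return t->value;
}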

and maybe with exceptions, since LLVM supports those?

So LoVM would be like C, but with instructions for calling exposed like this, and with other options besides standard function calls.

continuations are not terribly far-fetched, as in a way, they are just a better setjmp/longjmp, which even C has

---

https://pdfs.semanticscholar.org/1de0/ccff4fad711ca43637f5c66c43a3e65db140.pdf simply describes what continuations do as capturing the call stack (or some part of it), local variables, and the instruction pointer.

they implement three kinds of continuations, and the user can choose which one they want:

" First-class continuations: This is the most general kind of continuations. There are no limitations with regard to thenumber of times a continuation can be called, or its lifetime. Since they need to copy the full execution stack, they are more costly than the other kinds of continuations.

Delimited continuations: The user first obtains a continuation bound which is then used to obtain a delimited continuation. Such a continuation is valid for as long as the execution stack does not exit its bound. An invalid continuation throws an exception when called. Valid delimited continuations can still be called any number of times. A delimited continuation costs less than a full continuation, since they only need to copy a subsection of the execution stack, given by the continuation bound and the point where callcc is called.

One-shot continuations: A one shot continuation can be called a single time. They are valid for as long as the execution does not leave the callcc frame. They are the cheapest continuations available, since the system only has to save the instruction pointer. "

they talk about applications for each of these

---

" Computed gotos is basically a combination of two new features for C. The first is taking addresses of labels into a void*. void* labeladdr = &&somelabel; somelabel: code. The second is invoking goto on a variable expression instead of a compile-time-known label, i.e.: void* table[]; addresses goto *table[pc]; " [10]

---

so so far, it seems like our 'weird control structure' needs can be met by just a few primitives:

but we probably need a few more in order to deal with closures, right?

i recall that Lua has a tailcall instruction and one or a few instructions to deal with closures; could model ourselves after that.

so maybe we don't really need to get into frames directly in LoVM, except as needed for closures.

and should we add something about exceptions, just because they are so common on platforms, even though delimited continuations can probably do them already? and something like the JVM's invokedynamic?

and should we do partial function application, because KLambda has it?

so i think this is shaping up well.

we are getting kind of close to the Lua virtual machine, which is nice.

---

mb instead of 16 registers per bank in Boot, or 256, we should have 32 or 64. Why?

eh, i dunno man, i'm pretty sure that most implementations will have to allocate the full max amount for each greenthread, which is a hefty chunk of memory if we need millions of threads. Also, i want this to be at least somewhat appropriate for embedded, where the total amount of register state will hurt at least a little. I think we should stick with 16 for now, actually.

also, should we reserve the first 4 regs (since they will be put to use in LoVM)? The first 8 (since we also want a memstack and a frame pointer and mb a link register and mb reserve one for the toolchain)? That's necessary if we want valid Boot code to be amenable to running within OVM later. But i guess not, because we want the LoVM implementation in Boot to be able to implement that stuff.

---

yeah, i think we want LoVM 'registers' to be locals, not registers. i think that is like Lua's VM after all, btw.

but we are still holding off on GC until OVM (i bet we'll at least need to add some instruction to give you the 'roots' of all of the frames)

so LoVM is kind of like the Lua VM as regards frames, but like C in that there are explicit pointers and memory allocation. Like the Lua VM we have tailcalls and closures, and like neither, we have delimited continuations.

---

so, LoVM is the sorta-intersection of LLVM and WASM, but easier to implement than either of them, and with more powerful control flow. LoxVM is like LoVM but with some extensions for a more compact encoding: variable-length encoding and address modes; LoxVM can be directly macrocompiled to lovm (provided that relative jumps don't get too big). LoVM competes with LLVM and WASM.

the idea is that lovm is like assembly, except that it encapsulates function calling, "stack frames" ("activation frames"), and local variables. It also provides some features that both LLVM and WASM provide, such as arrays (i think).

lo is a language that compiles to lovm. It is just a lovm frontend, with few conveniences. It competes with BASIC, C, Oberon, Forth. It's sort of like C minus various features and plus powerful control flow, or like BASIC plus pointers and structured programming.

these are all unsafe, close-to-the-metal languages suitable for writing e.g. Oot's garbage collector.

---

with our current 64-bit encoding, we only have 256 opcodes. i worry that that will not quite be enough. otoh, we have a whole field for polymorphism, so we will probably be using many fewer opcodes than i think. also, if we really need to, we can use the polymorphism field for subopcodes, too.

---

i guess one of the things that is different about Oot, which explains the proliferation of low-level target languages/compilation/interpretation stages, is that i want to have an HLL, which means it needs services that are usually provided in a runtime, but i want it to be portable, so i'm trying to develop an easily portable runtime, which means developing an easily portable language in which to write a runtime. i think C-- (Cmm) was supposed to be this, but i dunno if it hit the mark.

and the even more different thing is that i want interoperability with and maybe even transpilation to an HLL host language.

so, i don't want to just use LLVM or JVM or CLR, because they are too hard to port onto a host language.

but i do want to go the bytecode route, to make porting and porting of interoperation easy.

so my VM is going to be as complicated as the JVM, which i said was too complicated. So i need to specify a simpler language beneath it.

---

Cminor, from compcert https://xavierleroy.org/publi/compcert-backend.pdf starting page 18, looks pretty cool

---

some stuff that Cmm keeps around in registers:

" -- STG registers

Sp -- Stack ptr; points to last occupied stack location.
SpLim -- Stack limit
Hp -- Heap ptr; points to last occupied heap location.
HpLim -- Heap limit register
CurrentTSO -- pointer to current thread's TSO
CurrentNursery -- pointer to allocation area
HpAlloc -- allocation count for heap check failure
  -- We keep the address of some commonly-called 
  -- functions in the register table, to keep code
  -- size down:
  | GCEnter1		-- stg_gc_enter_1
  | GCFun		-- stg_gc_fun
  -- Base offset for the register table, used for accessing registers
  -- which do not have real registers assigned to them.  This register
  -- will only appear after we have expanded GlobalReg into memory accesses
  -- (where necessary) in the native code generator.
  | BaseReg
  -- Base Register for PIC (position-independent code) calculations
  -- Only used inside the native code generator. It's exact meaning differs
  -- from platform to platform  (see compiler/nativeGen/PositionIndependentCode.hs).
  | PicBaseReg
  " -- https://gitlab.haskell.org/ghc/ghc/wikis/commentary/compiler/cmm-type#global-registers-and-hints

---

should maybe also look in the ootLibrariesNotes.txt to see if we should include some fns that are in stdlibs in some languages. e.g. https://doc.rust-lang.org/core/index.html

---

remember to check Michelson

---

" First: in Rust, we can pass a fixed-length array by reference without it degrading into a pointer as it does in C. For instance,

fn get_element_3(array: [u8; 10]) -> u8 {
    // This bounds check is trivially proven and will not be performed at runtime.
    array[3]
}

// This attempt to pass a 2-element array is a compile error.
get_element_3([0; 2]);

Neither of those statements holds in C. As a result, we use fixed-length arrays in several places in the demo where we didn't in C++.

Second, if the length of an array is known (to us, the programmer) but not known (to the poor compiler), we can hoist bounds checks to a convenient place. For instance, this routine as written performs runtime bounds checks at each loop iteration:

// Note that the array is a slice of runtime-determined length.
fn fill_1024(array: &mut [u8], color: u8) {
    for i in 0..1024 {
        array[i] = color;
    }
}

We can check the length outside the loop, and make the length visible to the compiler, like this:

fn fill_1024(array: &mut [u8], color: u8) {
    // Perform an explicit, checked, slice of the array before entering the loop.
    let array = &mut array[..1024];

    for i in 0..1024 {
        array[i] = color;
    }
} "

---

https://www.academia.edu/26447075/A_real-time_virtual_machine_implementation_for_small_microcontrollers

---

5.2 Using virtual machine abstraction to support concurrency

One abstract extension to a VM that was found to provide great overall value to the system is nonpreemptive, user-level concurrency. As shown in Fig. 11, a VM that maintains its own concurrency has advantages over a real-time operating system. Switching contexts in a VM can be done in one virtual instruction, whereas a native RTOS-driven system requires saving and restoring contexts, which may involve dozens of variables and the saving of a task state. Also, a concurrent VM is more controlled than a real-time operating system task switch, which can occur at almost any time, requiring critical regions of code to protect against race conditions and errors. ...

5.2.1 Implementing a task instruction to support concurrency

New instructions were added to the Pascal Virtual Machine to support multitasking and message passing as high level abstract instructions. A new task instruction was added to the interpreter to support multitasking along with the necessary Pascal language extension as follows:

status := task(function);
   function:
     TASKINIT=0: status: 1=parent, 2=child
     TASKPEND=1: status: is new task ID
     TASKWHO=2: status: is my task ID
     TASKKILL=3: No status

With this new special function call, new tasks could be created using the subfunction TASKINIT which performs a similar function as the UNIX fork() command within a single PCode instruction. Modeling task creation after the UNIX fork command was a matter of convenience for this research, although it does have some advantages.

...

The interpreter was modified to support up to five tasks (an arbitrary choice that depends on the memory available). When a new task is created, an exact duplicate of the working stack of the parent task is created and a new entry is made in a TaskState table for the new child task. The return value from this function call indicates whether the currently executing task is still the parent task (1) or the child task (0) that is an exact copy. The return value can then be used to cause each of the tasks to follow their own execution path depending on whether it is the parent or the child task. As shown in Fig. 12, when a new task is created only four values need to be remembered to save and restore the task; namely the PC, SP, MP, and EP registers. PC is the program counter, which is saved to remember what instruction this task was executing when it was suspended so that it can pick up where it left off when it starts running again. SP is the stack pointer which indicates where the current top of the stack is for this task. Since each new task has its own copy of the stack, each task must remember where that is in memory for its own stack. MP is the mark pointer which indicates where the base of the current stack frame is. EP is the extreme pointer which indicates the top of the current stack frame so that the VM knows where to put the next stack frame if another procedure or function is called from the current context. There is no need to save any working registers because the PMachine is stack based and nothing else is stored in registers between instructions besides what is on the stack, making task switching relatively fast and efficient. Another subfunction, TASKPEND, is to allow the currently executing task to voluntarily give up the virtual processor and allow a different task to run. ...

TASKKILL can be called for a task to permanently kill itself and allow other tasks to run instead

5.3 Implementing a mail instruction to support concurrency

A second instruction was implemented to support message passing between tasks in the VM. The mail() instruction was created to handle this functionality:

Status := mail(function, parameter1, parameter2, parameter3);
   Function: MAILSEND=0
     parameter1: destination task ID
     parameter2: length of message to send
     parameter3: buffer address of message to send

   MAILRCV=1
     parameter1: my task ID
     parameter2: size of message buffer for receiving
     parameter3: buffer address where to put received message

... In this implementation, a call by a task to receive mail will cause the task to pend if no mail is available. The task then becomes ready to run again when mail arrives. The task that sent the mail continues to run until it voluntarily gives up the processor or pends on its own mailbox. This implementation was chosen because the native operating system used in the timing comparison performs this way, allowing the timing comparison to be fair. Another, faster implementation might be to allow the receiving task to wake up immediately, which would make the system appear to switch tasks faster but is not commonly how embedded operating systems work.

...

Others showed that VMs are capable of doing things that native machine hardware is not effective at doing by implementing VMs that do not emulate real machines. The Esterel and the Harrison VMs are good examples of this [10,41]. Additionally, other research showed that abstraction is a way to increase performance in a VM as shown by the Carbayonia and Hume VMs [18,19].

10. Craig ID (2005) Virtual Machines. Springer, New York
41. Plummer B, Khajanchi M, Edwards SA (2006) An Esterel Virtual Machine for Embedded Systems. In: Proceedings of synchronous languages, applications, and programming (SLAP 2006)
18. Gutierrez DA, Soler FO (2008) Applying lightweight flexible virtual machines to extensible embedded systems. In: Proceedings of the 1st workshop on isolation and integration in embedded systems. ACM, Glasgow, pp 23–28
19. Hammond K, Michaelson G (2003) Hume: a domain-specific language for real-time embedded systems. In: Proceedings of the 2nd international conference on generative programming and component engineering. Springer, Erfurt, pp 37–56 "

-- https://www.academia.edu/26447075/A_real-time_virtual_machine_implementation_for_small_microcontrollers
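just to spell out how small that per-task state is (per the paper's Fig. 12 description above), a hypothetical C rendering -- the field names follow the paper, everything else is made up:

#include <stdint.h>

/* everything the P-Machine needs to suspend/resume a task; all other state
   lives on the task's own copy of the working stack */
typedef struct {
    uint32_t pc;   /* program counter: which instruction to resume at */
    uint32_t sp;   /* stack pointer: current top of this task's stack */
    uint32_t mp;   /* mark pointer: base of the current stack frame */
    uint32_t ep;   /* extreme pointer: top of the current stack frame */
} TaskState;

static TaskState task_table[5];   /* "up to five tasks" in their implementation */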

---

so some things which were pushed from BootX to LoVM:

what else?

---

'boundedLoad' and 'boundedStore' for security

---

https://llvm.org/docs/Coroutines.html

---

i'm getting pretty excited about Loot. I think it has the potential to be 'a thin layer over assembly' more than C is, by having fewer features: e.g. no implicit conversions; no nested functions that can refer to an enclosing function's variables (see https://en.wikipedia.org/wiki/Man_or_boy_test -- although it looks like that page says C already doesn't do that, in that a nested function B can't refer to A's variables, which is also the restriction i was thinking of); no variadic functions; no functions with many arguments. Although not quite, because it would be more portable and eschew (direct) casting (although, like C, it would not be low-level in the sense of 'C Is Not a Low-level Language').

Some restrictions of LOVM/Loot compared to C:

mb see also http://damienkatz.net/2013/01/the_unreasonable_effectiveness_of_c.html https://news.ycombinator.com/item?id=8668680

---

Lisp, Forth, C, Basic, Oberon, JVM/CLR are sometimes considered as 'simple' lower-level languages

Lisp and Forth are quite different though. Both have a certain syntactic and semantic simplicity. But Lisp does not have manual memory management, and it even does some fancy things with activation records (closures). And, of course, Lisp has local variables (although some Forths do, eg https://www.complang.tuwien.ac.at/forth/gforth/Docs-html/Local-Variables-Tutorial.html). And, of course, in Lisp the mechanism for passing parameters and return addresses to functions is invisible, whereas in Forth it is visible.

Lisp also has continuations. Closures and continuations make it hard to even compile Lisp to C in a straightforward way.

For these reasons, i guess Lisp is higher level than Forth.

https://wiki.c2.com/?ForthVsLisp makes some other points:

---

i guess one could say that Forth is 'extensible assembly' if you take the model given in http://yosefk.com/blog/my-history-with-forth-stack-machines.html :

" A Forth interpreter keeps an instruction pointer into this array (ip), a data stack (ds), and a return stack (rs), and does this:

while(true) { switch(*ip) { arithmetics (+,-,*...): case PLUS: ds.push(ds.pop() + ds.pop()); ++ip; stack manipulation (drop,swap,rot...): case DROP: ds.pop(); ++ip; literal numbers (1,2,3...): case LITERAL: ds.push(ip[1]); ip+=2; control flow: case COND_BRANCH: if(!ds.pop()) ip+=ip[1]; else ip+=2; case RETURN: ip = rs.pop(); user-defined words: save return address & jump default: rs.push(ip+1); ip = *ip; } }

That's it, pretty much. Similar, say, to the virtual stack machine used to implement Java. One difference is that compiling a Forth program is basically writing to the code array in a WYSIWYG fashion. COMPILE SOMETHING simply appends the address of the word SOMETHING to the end of the code array. So does plain SOMETHING when Forth is compiling rather than interpreting, as it is between a colon and a semicolon, that is, when a word is defined.

So

: DRAW-RECTANGLE 2DUP UP RIGHT DOWN LEFT ;

simply appends {&2dup,&up,&right,&down,&left,RETURN} to the code array.

... : ( 41 word drop ; immediate
( That was the definition for the comment word. )
( Now we can add comments to what we are doing! )

Now. we. can. add. comments. to. what. we. are. doing.

What this does is define a word (Forth's name for a function) called "(". "(" is executed at compile time (as directed by IMMEDIATE). It tells the compiler to read bytes from the source file (that's what the word called, um, WORD is doing), until a ")" – ASCII 41 – is found. Those bytes are then ignored (the pointer to them is removed from the stack with DROP). So effectively, everything inside "( … )" becomes a comment. "

in other words, you can 'add opcodes' using DEFINE (': ... ;'), and then when faced with a non-primitive word, the compiler replaces it with a pointer to the code for that word, and the runtime jumps to that pointer when it encounters it.

hmm, that's an interesting idea...

---

" ...the key attributes of a stack seem to be a) how comprehensible the implementation is, and b) how much it allows one to collaborate with others. Mainstream stacks enable collaboration in the short to medium term, but gradually grow incomprehensible to most people and thereby divide the community between implementors and users. ...

In contrast, simple stacks like Forth are far more comprehensible. But the poor safety guarantees make it intractable in practice to share code. Forth doesn't seem to be a write-once language/stack. But it does feel like a solipsistic stack, with everyone working in their own little private universe. ... Forth allows you to implement it quickly and with minimum requirements of the stack beneath it. However it exacts a big cost in safety. You can usually interpret any bit of memory as any type. "

---

i like Mu. See plChCtmLangs for links.

i don't love the syntax tho.

Some notes:

Other available stuff (from https://github.com/akkartik/mu/blob/main/vocabulary.md ):

    offset 0: write index
    offset 4: read index
    offset 8: size of array (in bytes)
    offset 12: start of array data
    Invariant: 0 <= read <= write <= size

Syscalls:

(i already copied the above to plChCtmLangs)

(it's implemented on top of something called SubX, which i don't like, even though the motivation for it seems the same as the motivation for Boot)

Mu is big on memory safety, which we don't want yet for LOVM; we don't want/need to do array bounds checking or force fat pointers ('handles') for long-lived pointers.

Mu exposes machine registers and only has 6 regs. Boot is like that, but i don't think i want to do that at the LOVM level.

---

so some other small languages to look at:

forth:

lisp:

Mu.

macroassemblers, HLA.

C: The subset Cs eg

Oberon.

The Python subset dialects, eg

https://github.com/certik/bcompiler

What other small things?

i hear about Self and Io being very simple.

---

could have a forth at the boot/bootx level. probably would lead to inefficient use of registers tho.

---

i looked at StoneKnifeForth briefly. It's less similar to what i want to do than i thought, although the README is right up my alley.

If i understand correctly, there is an interpreter written in Python (tinyboot.py) (for the purpose of bootstrapping) which has a (too-tiny) language and seems to be rather platform-agnostic. Then there is a self-hosting compiler to x86 (tinyboot1.tbf1) with a larger language which has a lot of x86- and ELF-specific stuff intermixed.

So i'm more interested in the language of tinyboot1.tbf1, but its implementation is intermixed with platform-specific stuff and compiler-specific stuff. The platform-agnostic/compiler-agnostic stuff in the interpreter is too minimalistic for me:

" compile_time_dispatch = { '(': eat_comment, 'v': dataspace_label, ':': define_function, 'b': literal_byte, '#': literal_word, '*': allocate_space, '^': set_start_address, '[': start_conditional, ']': end_conditional, '{': start_loop, '}': end_loop, ' ': eat_byte, '\n': eat_byte, "'": skip_literal_byte, }

...

run_time_dispatch = {
    '(': jump,
    'W': write_out,
    'G': read_byte,
    'Q': quit,
    '-': subtract,
    '<': less_than,
    '@': fetch,
    '!': store,
    # 'f': fetch_byte, not yet needed
    's': store_byte,
    ';': return_from_function,
    '[': conditional,
    ']': nop,
    '{': nop,
    '}': loop,
    ' ': nop,
    '\n': nop,
    "'": literal_byte,
}" -- https://github.com/kragen/stoneknifeforth/blob/master/tinyboot.py

i guess i can sort of eyeball the subset of stuff that tinyboot1.tbf1 adds which is not platform-specific or compiler-specific:

" Compile-time primitives: ( — defines a comment that extends to the next right-paren v — defines a dataspace label, like for a variable. b — compiles a literal byte, numerically, into data space.

  1. — compiles a literal four-byte little-endian number into data space. ^ — the location where the program should start executing [everything else is just definitions] space, newline — ignored
— defines a function [ — begins a conditional block, which consumes the value on top of the stack and is skipped if that value is zero ] — ends a conditional block, matching the latest unmatched [ { — begins a loop } — ends a loop — consumes the value on top of the stack and loops back to the matching { if it is nonzero

Run-time primitives: G — gets a byte from stdin W — given an address and size on the stack, writes the specified number of bytes to stdout Q — exits the program with a return code of 0

Defined in this file, all necessarily run-time: + — addition d — DUP p — DROP x — SWAP

— an equality test

> — a greater-than test O — boolean OR N — boolean NOT " -- a subset of an excerpt from [11]

---

thoughts on macros for the assembler:

i took a look at m4, and the C preprocessor, and the wikipedia page on preprocessors, and also found a small general macro preprocessor called m1.

i think for assembly language we actually have a much simpler task than the C preprocessor or m4, because we don't need to process and insert text in the middle of an expression in an HLL-parsed language, which demands that the user be able to express a whole bunch of stuff. All we need is to to insert a template consisting of assembly lines (with some variables interpolated at well-defined places). So our task is less 'textual' and expansive. I might call these 'line-oriented macros', in contrast to Wikipedia's 'lexical macros' and 'syntactic macros'.

Also, recall that C's preprocessor doing textual includes, rather than imports, causes trouble. It causes long compile times, because an oft-used included file has to be included, as text, into a zillion source files -- rather than compiling an import once, and then importing it in sort of processed or compiled form (so the time-intensive text parsing stage only has to be done once for an oft-used import). So we should avoid any features that would make it necessary to think of an import as textual rather than symbolic -- and any features where the processing of an import is stateful, in a non-modular way.

For example, i don't think we should have something exactly like #ifdef, because then each time you #include that file in another file, you have to "re-run" the preprocessor on it, because the state of what is and isn't already #ifdef'd might have changed (a special case of this is that the order of #includes matters -- and, if #ifdef state persists across files, the order of files in compilation matters). Rather, we should either have the macros be entirely stateless, or we should allow the #import statement to have parameters, but not have any state beyond these (in other words, an import can be parsed as text once, and can have variables within it, and these variables may have to be re-evaluated upon each import, but the compiler knows what these variables are, and it doesn't have to redo any text parsing). This would allow conditional compilation for purposes of platform-specific implementations of things, without requiring textual inclusion.
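a concrete C-preprocessor illustration of the statefulness being described (the file names and macro are made up): the same header expands differently depending on what happened to be #define'd before it was included, so it has to be reprocessed as text in every translation unit, and include order matters.

/* config.h */
#define USE_FAST_WIDGET

/* widget.h -- its expansion depends on preprocessor state at inclusion time */
#ifdef USE_FAST_WIDGET
static inline int widget(int x) { return x << 1; }
#else
static inline int widget(int x) { return x * 2; }
#endif

/* a.c */
#include "config.h"
#include "widget.h"   /* sees USE_FAST_WIDGET: gets the shift version */

/* b.c */
#include "widget.h"   /* same header, different state: gets the multiply version */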

Perhaps even better would be to disallow file-level import parameters, and allow only global import parameters; this would ensure that, within a given compilation, it's not possible for two different source files to import the same import but give it different parameters. If our variant of #ifdef is used ONLY for platform-specificity (and for global config options) then that should be okay. I'm not sure how much that really saves, however; it's probably not that hard for the compiler to keep two lookup tables, one that associates each import with its parsed representation, and another that associates a pair of (import, arguments) with an internal representation.

---

i think all we need is something like Lisp's backquote comma, and variables, and a conditional and a negated conditional, and possibly simple postfix arithmetic:

.mimp (macro import)

.mdef macroname arg0 arg1 (macro define) .mend (or .mfed): terminate .mdef block

.mset varname value (varname = value)

.mif .mfi (or, maybe the body of an .mif can only be a single line, which may however be another macro)

.mifn (ifnot)

.varname (interpolate/read variable varname)

() (postfix integer arithmetic expression delimiter) + - * (postfix integer arithmetic operators)

Expansion of macros within macros is allowed but nested macros must only refer to previously defined macros (so no recursion even though macros are recursively expanded at the time of encountering their definition) (and no state persists between files, so either the previously defined macro must be in this file, or it must be in an import).

Note that the . prefix is used for all 'preprocessor-y stuff'

In ordinary code, .variables can be used in place of operands, and macronames can be used in place of instruction mnemonics.

so for example:

.mdef macro1 addr
.mset addr2 (.addr 1 +)
jmp .addr2
.mend

.mdef macro2 shouldjump
.mif shouldjump macro1
.mifn shouldjump jrel 2
.mend

.mset constant1 1

now in ordinary code you can have stuff like:

add $3 $2 #.constant1
macro2 1
macro2 .constant1

for extra fun, could even limit the number of arguments that a macro can take, so that lines with macros can be parsed just like ordinary instruction lines, and both can assume a small number of max arguments.

---

if you don't even offer dup (and i guess swap?) maybe all stack computations could be expressed as expressions? which could help compiling lovm. when more is needed, registers could be used. I think the condition is that everything offered can be expressed as consuming some fixed number of arguments off the stack, and then pushing exactly one result onto the stack

otoh the backend could just scan for stuff like dup and swap and if they aren't present, then assume it's an expression

---

Wren may be good inspiration for OVM; we might want a simple OOP-ish language with static-ness and moderately high perf

should also consider JVM, CLR, Lua, Scheme, and mb other embeddable langs eg the ones in TIC-80: Lua, Moonscript, Javascript (using Duktape, which claims to be full JS, not a subset), Wren, and Fennel.

---

oo remember how pretty this used to be:

" initial commit Bayle Shanks authored 4 years ago

OVM is a virtual machine platform for programming language implementation. It is distinguished by a small, easy-to-port implementation, yet with the ability to preserve some high-level constructs when used as a target IL. Ovm is a prototype and is incomplete and buggy. There have not been any releases yet. Ovm is open-source (Apache 2.0 license).

Motivation

Ovm is being (very slowly) developed for the Oot programming language (which is also under development), to serve as:

    the language in which Oot will be implemented
    a target IL (intermediate language) for Oot programs

Ovm is motivated by Oot's aspirations to be an easy-to-port 'glue language' that can run under, interoperate with, and transpile to/from a variety of languages and platforms. The reason for Ovm's existence is that many other virtual machines:

    are too big: many VMs have large implementations and long specs, making them hard to port
    constrain control flow: many VMs and ILs are written to support a particular HLL (high-level language) and only support control flow patterns of that HLL (for example, some VMs/ILs have trouble with tail calls)
    are too low-level: on the other hand, there are some tiny VMs that consist of only a handful of assembly instructions and few or no registers. When HLL constructs are compiled into assembly, program structure can be difficult to recover; such a language would not be suitable as an IL for transpilation between HLLs

Ovm deals with these issues:

    Ovm has a small implementation. Much of its functionality is in its standard library, which is itself written in Ovm (so you don't have to port it)
    Ovm is assembly-like and supports arbitrary control flow patterns via computed GOTO and exposure of the callstack
    Ovm is slightly higher-level than assembly and provides enough registers for a simple mapping of HLL variables to registers in most cases, but more importantly, it is extensible, which allows the standard library to define some higher-level constructs

The implementation of even relatively simple and common operations as standard library functions makes them inefficient. The motivation of this is just to allow us to have a small core, for portability, and push most of the functionality into the standard library. An Ovm implementation which values efficiency over portability could simply override some of these standard library calls with native implementations. Ovm is structured to make this easy to do.

What's here

This repo contains:

    ovm: command-line frontend to pymovm
    pymovm: a minimal Ovm implementation (in Python)
    oasm: Ovm assembler
    odis: Ovm disassembler
    pyooblib: a Python library for working with .oob files (and .ob0 files and raw ovm bytecode files)
    cinstr.ob0 and syscall.ob0, part of the 'standard library' of Ovm
    docs: There are no docs yet. For now, see http://www.bayleshanks.com/proj-oot-ootAssemblyThoughts (and possibly the later items in the http://www.bayleshanks.com/proj-oot-ootAssemblyNotes* series eg http://www.bayleshanks.com/proj-oot-ootAssemblyNotes13 ).

Ovm understands three file types (raw and .ob0 formats are understood by the core OVM implementation, .oob format is supported by standard library functions):

- raw: raw Ovm bytecode
- .ob0: a simple module format containing a list of functions, their type signatures, and their bytecode implementations
- .oob: "Oot bytecode", a more elaborate module format also containing imports, constant tables, signatures, optional debugging information, etc (not implemented yet)

Overview of the Ovm language

Ovm is an assembly-like 3-operand CISC memory-to-memory register machine with 2 primitive types, 4 primitive address modes, 16 primitive instructions, 256 registers, a data stack, and a call stack.

Primitive types:

- U16: unsigned 16-bit integer
- PTR: pointer

Primitive address modes:

- immediate
- register
- register indirect (the equivalent of LOAD/STORE)
- stack (post-increment/pre-decrement upon read/write; reading/writing to a stack pointer pops/pushes the stack)

Primitive instructions:

- annotate (not a real instruction, but an annotation of this point in the code)
- loadi (load immediate)
- cpy (copy)
- add-u16-u16 (addition on integers)
- sub-u16-u16 (subtraction on integers)
- leq-u16-u16 (<= comparison on integers)
- add-ptr-u16 (add an integer to a pointer)
- sub-ptr-u16 (subtract an integer from a pointer)
- call3 (call the function represented by a function pointer, passing in 2 inputs and getting 1 output)
- lea (load effective address: given an addressing mode and operand data, compute an effective address)
- syscall (call various language or system services)
- jzrel (conditional relative branch; branch if input is zero)
- jnzrel (conditional relative branch; branch if input is non-zero)
- j (absolute jump (or static, or immediate jump); jump to a fixed label)
- ji (indirect jump (or dynamic jump); jump to a pointer in memory)
- (maybe: swap?)

Ovm will provide compile-time type checking (not yet implemented). Ovm provides the following features which are atypical for assembly-like languages:

- Generic and ad-hoc polymorphism
- Single-instruction function calls to first-class functions: a memory location holding a reference to a function is called, passing in 2 inputs, and assigning one output
- Capability-based security (not yet implemented)
- Separate data and call stacks; also, the stack and the register file are first-class and may be swapped in a single instruction to allow concise 'caller saving' of registers, greenthread context switching, etc.
- Extensibility features:

  - custom instructions, written in Ovm
  - custom syscalls, written in Ovm
  - user-defined data structures with getters and setters written in Ovm (not yet implemented)
  - language hooks which call Ovm code (not yet implemented)

Oot has three bytecode encodings, SHORT (16-bit fixed length instructions; defined but not implemented yet), MEDIUM (64-bit fixed length instructions), and LONG (under development; variable-length instructions composed of a 64-bit aligned stream of 16-bit words). " -- https://gitlab.com/oot/ovm

---

Agner likes ELF best: https://www.agner.org/optimize/calling_conventions.pdf

---

David Chisnall's CPU Feature Wishlist https://www.informit.com/articles/article.aspx?p=1328797

Polymorphic Instructions and Typed Memory

User Mode Traps

Hardware Sparse Arrays

Mondrian Memory Protection

Message Passing Primitives

(better read the article for details)

---

from lowEndTargetsUnsorted4:

i think we can assume at least:

so we might say that we want the runtime to fit in 16k and everything to fit in 128k (e.g. we assume that even when we are not in ROM we want to fit in L2)

---

Call3 with a destination register in immediate mode could be used to just pass that value as an argument to the subroutine, for the subroutine to use however it likes.

---

64-bit instruction encoding: the four most-significant bits (i.e. the first four bits of the first byte) are always 1110.

The first 4 bits of each 12-bit field are the addressing mode. The last 8 bits are the operand.

op3 is often interpreted in a special way for polymorphism.

Any instruction in the 64-bit instruction encoding can be translated to a sequence of instructions in the 32-bit encoding in a straightforward way.
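To make this concrete, here is a minimal decoding sketch in Python (the language of pymovm). The field layout assumed below is an illustration, not the spec: 4 format bits (1110), then a 12-bit opcode, then four 12-bit operand fields op0..op3, each split into a 4-bit addressing mode and an 8-bit operand; the helper name decode64 is also made up.

# minimal sketch of decoding one 64-bit instruction, under an ASSUMED layout:
#   bits 63..60: format bits, always 0b1110
#   bits 59..48: opcode (12 bits)
#   bits 47..36, 35..24, 23..12, 11..0: op0..op3, each 12 bits,
#     high 4 bits = addressing mode, low 8 bits = operand (e.g. register 0..255)

def decode64(instr):
    assert (instr >> 60) == 0b1110, "not a 64-bit-encoding instruction"
    opcode = (instr >> 48) & 0xFFF
    ops = []
    for i in range(4):
        field = (instr >> (36 - 12 * i)) & 0xFFF
        mode = field >> 8         # 4-bit addressing mode
        operand = field & 0xFF    # 8-bit operand
        ops.append((mode, operand))
    return opcode, ops

# example: opcode 3, operands in mode 1 (say, register mode), registers 10, 11, 12, 0
instr = (0b1110 << 60) | (3 << 48) | (0x10A << 36) | (0x10B << 24) | (0x10C << 12) | 0x100
print(decode64(instr))   # (3, [(1, 10), (1, 11), (1, 12), (1, 0)])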

---

CAP register for access control/security/privileges/capabilities

---

HAT ('heads-and-tails') format to get the advantages of both a variable-length and a fixed-length encoding:

http://scale.eecs.berkeley.edu/papers/xoxo-meng.pdf

---

" VMs have a lot of nice advantages in theory, chiefly that they are very cross-platform and come with state-of-the-art runtime tools (like garbage collection) that you’d otherwise have to build yourself. Unfortunately, they traditionally have some weird problems that are showstoppers for many languages, such as the lack of value types on the JVM. No major VM supports coroutines, the clear winner in approaches to composable concurrency, so if you want this feature you’re back to building your own runtime. Project loom and the Wasm effect handlers proposal are exciting here, though. If and when wasm achieves its ambitious goals2 there will be far fewer reasons left to go native, and VMs have every chance to play a starring role in future language innovation – much as LLVM has over the last decade. https://wiki.openjdk.java.net/display/loom/Main https://github.com/effect-handlers/wasm-effect-handlers " [12]

---

i like

https://thakeenathees.github.io/pocketlang/getting-started-learn-in-15-minutes.html

---

Forth-style metaprogramming is appropriate for any language that is linear, for example assembly language or a virtual machine. So mb Oot Assembly, or mb OVM.

---

i talk about OVM maybe having 'custom instructions' provided by a standard library which builds up capabilities like hash tables; i had been thinking of runtime mechanisms for dynamically loading these but actually now i realize that you can make the custom instructions compile-time only and then treat them like compile-time macros (a sketch of this is below).

maybe the registers that i am reserving so that the implementation has room to cache the top of the in-memory stack could do double-duty of being available for 'custom instructions'. Could reserve 8 registers, and then divide them into 4 registers just for macros and a 4-item smallstack just for macros.

otoh OVM has a larger instruction encoding and so more registers, right? So why use the 32 registers that Oot Assembly uses, why not use higher-numbered registers, to reduce contention?
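A sketch of that idea in Python (hypothetical mnemonics, macro names, and register numbers; the real Ovm assembler and standard library are not structured like this): the assembler expands any non-primitive mnemonic by looking it up in a macro table, so a 'custom instruction' costs nothing at run time beyond the primitives it expands into, and the macro body is free to use a few registers reserved as macro scratch space (r248..r251 is just an assumed convention here).

# treat 'custom instructions' as assembler-time macros rather than run-time calls
PRIMITIVES = {'loadi', 'cpy', 'add-u16-u16', 'sub-u16-u16', 'call3', 'syscall'}

# a 'custom instruction' is a function from its operands to a list of
# primitive (or other custom) instructions; cpy is assumed to be (dest, src)
def swap_macro(a, b):
    scratch = 'r248'                       # assumed macro scratch register
    return [('cpy', scratch, a), ('cpy', a, b), ('cpy', b, scratch)]

MACROS = {'swap': swap_macro}

def expand(program):
    out = []
    for op, *args in program:
        if op in PRIMITIVES:
            out.append((op, *args))
        elif op in MACROS:
            out.extend(expand(MACROS[op](*args)))   # recursively expand macros
        else:
            raise ValueError('unknown instruction ' + op)
    return out

print(expand([('loadi', 'r1', 5), ('loadi', 'r2', 7), ('swap', 'r1', 'r2')]))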

---

some things to look into:

---

assembly vs forth and assembly vs lisp; features/complexities that each has that the other(s) lack:

assembly:
+ operands

forth:
+ non-"finiteness" of commands (there is not a small number of potential words (i say small, b/c actually the number of potential words probably is bounded))
+ functions (words)
+ macros (words)
+ run-time "macros"
+ forth's compile-time/runtime distinction and how it enables something like polymorphism

lisp:
+ non-"finiteness" of commands (there is not a small number of potential words (i say small, b/c actually the number of potential words probably is bounded))
+ functions (words)
+ subexpressions
+ variables/environments
+ list data structure
+ macros

---

https://pithlessly.github.io/allocgate.html suggests that you should implement vtable-based runtime-polymorphism by replacing each pointer to the object with a 'fat pointer' consisting of two pointers; one pointer to the vtable and one pointer to the object instance data
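A minimal sketch of the contrast in Python (hypothetical names; in a real implementation the fat pointer would be two machine words, not Python objects): with a 'thin' pointer every instance stores its own vtable pointer, whereas with a fat pointer the reference itself is a (vtable, instance-data) pair and the instance data stays vtable-free.

# 'thin pointer' style: every instance carries its own vtable pointer
class ThinObject:
    def __init__(self, vtable, data):
        self.vtable = vtable      # one extra word stored in every instance
        self.data = data

def thin_call(obj, method, *args):
    return obj.vtable[method](obj.data, *args)

# 'fat pointer' style (the article's suggestion): the instance data holds no
# vtable; every *reference* to the object is a (vtable, data) pair instead
def fat_call(ref, method, *args):
    vtable, data = ref
    return vtable[method](data, *args)

# a tiny 'interface' with two methods, shared by both styles
counter_vtable = {'get': lambda data: data['n'],
                  'inc': lambda data, k: data.__setitem__('n', data['n'] + k)}

thin = ThinObject(counter_vtable, {'n': 0})
thin_call(thin, 'inc', 2)
print(thin_call(thin, 'get'))     # 2

ref = (counter_vtable, {'n': 0})  # fat pointer: two words
fat_call(ref, 'inc', 5)
print(fat_call(ref, 'get'))       # 5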

---

" Hide All the Ugliness Behind Nice APIs

In Deegen framework, the semantics of each bytecode is described by a C++ function. One of the most important design philosophy of Deegen is to abstract away all the nasty parts of an interpreter. I will demonstrate with a simplified example for the Add bytecode:

void Add(TValue lhs, TValue rhs) {
  if (!lhs.Is<tDouble>() || !rhs.Is<tDouble>()) {
    ThrowError("Can't add!");
  } else {
    double res = lhs.As<tDouble>() + rhs.As<tDouble>();
    Return(TValue::Create<tDouble>(res));
  }
}

...

Notice how much nasty work we have abstracted away: decoding the bytecode, loading the operands and constants, throwing out errors, storing results to the stack frame, and dispatching to the next bytecode. All of these interpreter details either happen automatically, or happen with a simple API call (e.g., ThrowError or Return).

...

DEEGEN_DEFINE_BYTECODE(Add) {
  Operands(
    BytecodeSlotOrConstant("lhs"),
    BytecodeSlotOrConstant("rhs")
  );
  Result(BytecodeValue);
  Implementation(Add);
  Variant(
    Op("lhs").IsBytecodeSlot(),
    Op("rhs").IsBytecodeSlot()
  );
  Variant(
    Op("lhs").IsConstant<tDoubleNotNaN>(),
    Op("rhs").IsBytecodeSlot()
  );
  Variant(
    Op("lhs").IsBytecodeSlot(),
    Op("rhs").IsConstant<tDoubleNotNaN>()
  );
} " [13]

---

on a "Lisp-like Forth":

" .

~ doug-moen edited 29 hours ago

link flag
    I find myself wondering a lot what a more accessible Forth might look like; are there more flexible, composable, simple abstractions like the Forth “word” out there? Our current GUI paradigms can’t be irreducible in complexity; is there a radically simpler alternative that empowers individuals? What else could an individual-scale programming language look like, that is not only designed to enable simplicity, but to outright disallow complexity?

I think a “more accessible Forth” could look like a tiny 1960’s style Lisp interpreter that emphasizes extreme implementation simplicity over features that make modern Lisp nicer to use. Although such a language would be primitive compared to modern languages, it would be as malleable as Forth, and way easier to use, due to having local variables, garbage collected dynamic data structures, and memory safety. Like Forth, the implementation could fit into a few kilobytes of memory, because the original 1960 Lisp interpreter would have had to be that small.

...

~ snej:

Do you mean with Lisp syntax, i.e. S-expressions? Because there are a bunch of tiny Lisps like that, i.e. SectorLisp.

Otherwise I guess you have a language that’s “backwards Lisp” or “Lisp but it’s RPN.” You’d have to change a lot of the standard vocabulary, because on the downside functions don’t know how many arguments they’re given, and on the plus side they can have multiple return values. "

the reason i took note of this is b/c "on the downside functions don't know how many arguments they're given" is an interesting thought about (something simpler than) Lisp

---

wasm tail call:

"There are two ways to call a function in Wasm MVP: call and call_indirect. The WebAssembly? tail call proposal adds their tail call counterparts: return_call and return_call_indirect" -- https://v8.dev/blog/wasm-tail-call

---

on the WASM GC extension:

https://github.com/WebAssembly/gc/blob/main/proposals/gc/Overview.md https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md

interestingly, the spec for the GC language extension is more about types than about GC itself

we have creation, get, set, equivalence-test for:

we have references (like the addressof (&) operator in C)

we have a language for specifying types (eg the fields in a struct, but also function types), which makes more sense to me in their LISP IR than in linear bytecode.

we have call_ref to call function pointers

we have various special types, eg an 'any' type and a 'func' type (which is like an 'any' type but restricted to the subset of function types)

we have an i31 type to allow unboxed small integers: 31-bit integers in a 32-bit word, with one bit reserved for pointer tagging.
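The underlying trick is ordinary pointer tagging; a Python sketch (the choice of tag bit and polarity here is illustrative, not what any particular engine mandates, and the helper names are made up): one bit of the 32-bit word says whether the word is an unboxed small integer or a heap reference, which is why only 31 bits are left for the integer.

# illustrative tagging scheme: low bit 1 = unboxed small int, low bit 0 = pointer

def tag_int(n):
    assert -2**30 <= n < 2**30           # must fit in 31 bits (signed)
    return ((n & 0x7FFFFFFF) << 1) | 1   # shift left, set the tag bit

def is_int(word):
    return (word & 1) == 1

def untag_int(word):
    n = word >> 1
    if n >= 2**30:                       # sign-extend the 31-bit value
        n -= 2**31
    return n

w = tag_int(-5)
print(is_int(w), untag_int(w))           # True -5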

we have a (down)cast operator

https://github.com/WebAssembly/gc/blob/main/proposals/gc/Overview.md shows how these features are enough to allow the HLL to implement objects with method tables, and/or to implement closures.

https://v8.dev/blog/wasm-gc-porting notes some disadvantages (at least of the current "MVP" version of the GC extension):

---