proj-oot-ootNotes38

https://github.com/nesbox/TIC-80 has 80K of RAM and accepts 64K of user code in any of: Lua, MoonScript, JavaScript (using Duktape, which claims to be full JS, not a subset), Wren, and Fennel.

---

Racket seems to have good metaprogrammy stuff; maybe just implement in Racket, at least for a prototype.

that might be good for our core language and up, but what about the lower levels?

makes me wonder, how is Racket itself implemented? Maybe they already did what we want to do. So, i took a look. They recently switched to being implemented on top of Chez Scheme (in the source code, the version on top of Chez Scheme is called Racket CS, and the old version is called Racket BC ('Before Chez', i assume)). Apparently the old version had a large C core that made the Racket implementation unwieldy to work with, but the Racket team says that Chez Scheme's C kernel is much smaller than that old C core, and that in general Racket CS is easier to improve than Racket BC was.

So, what is this small C kernel in Chez Scheme? Unfortunately i did not find it to be very obvious or documented. Furthermore i didn't find documentation for a 'core lisp' within Chez Scheme.

Chez Scheme does have a file called "scheme.h", but i think it is only for FFI (something for C programs that interoperate with Chez to include), rather than a core of Chez, though i could be wrong. It is described in https://cisco.github.io/ChezScheme/csug9.5/csug9_5.pdf, section 4.8 ('C Library Routines'), PDF page 94.

I've heard that Chez Scheme uses the nanopass compiler framework. https://andykeep.com/pubs/dissertation.pdf briefly describes the Chez Scheme nanopass implementation, with an overview particularly in section 3.3.3 ('Workings of the new compiler'). This section says that there are "approximately 50 passes and approximately 35 nanopass languages"! Lots of passes are fine, but 35 languages is too many to get one's head around.

I think this reveals another goal that i have for Oot. The (reference) compiler should be understandable by newcomers. I thought i liked the idea of nanopass compilers because each pass is simple, but if the cost is to have 35 layers to learn about, that's just too much. Now, i'm sure most of these 35 languages are just simple variations on the previous one, but a newcomer first encountering the code doesn't know which ones are major transitions and which aren't, so this just makes things more confusing (although perhaps really good documentation could help). Having a smaller number of language layers helps with that.
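
To make this concrete, here's a toy sketch of what a single nanopass-style pass looks like (these are not Chez's actual intermediate languages; the tuple-based AST and all names are made up for illustration). Each pass translates between two nearly-identical languages:

    # L0: e ::= var | ('lambda', x, e) | ('app', e, e) | ('let', x, e, e)
    # L1: same as L0, but without 'let'

    def desugar_let(e):
        """One nanopass, L0 -> L1: rewrite (let x = v in b) as ((lambda x. b) v)."""
        if isinstance(e, str):                      # variable reference
            return e
        tag = e[0]
        if tag == 'lambda':
            _, x, body = e
            return ('lambda', x, desugar_let(body))
        if tag == 'app':
            _, f, a = e
            return ('app', desugar_let(f), desugar_let(a))
        if tag == 'let':
            _, x, val, body = e
            return ('app', ('lambda', x, desugar_let(body)), desugar_let(val))
        raise ValueError(f"not an L0 expression: {e!r}")

    # desugar_let(('let', 'x', 'y', ('app', 'x', 'x')))
    #   => ('app', ('lambda', 'x', ('app', 'x', 'x')), 'y')

Each pass really is trivial on its own; the comprehension cost is in the number of intermediate languages, not in any one pass.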

I also realized something in the way i think about these layers (although i think i've realized, and written about, this before). Part of the value of a 'bytecode' sort of thing is that it makes really clear what functionality is in a given layer of the language; you can mostly see what it does by reading the list of instructions. In a lower bytecode layer (like Boot and LOVM, but maybe not so much JVM, CLR, or OVM), the functionality outside of the bytecode instructions themselves is trivial, unlike, say, C, in which stuff like the functionality of maintaining local variables is not in any single core library function but creeps into the language itself, or, worse, various Lisps which support closures and GC.
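
As an illustration (a toy, not Boot or LOVM): in a bytecode layer, even 'local variables' are just two explicit instructions that you can read in the dispatch loop, rather than machinery hidden in the language:

    def run(program):
        stack, locals_ = [], {}
        for op, *args in program:
            if op == 'PUSH':    stack.append(args[0])
            elif op == 'ADD':   b, a = stack.pop(), stack.pop(); stack.append(a + b)
            elif op == 'STORE': locals_[args[0]] = stack.pop()   # locals are just an instruction
            elif op == 'LOAD':  stack.append(locals_[args[0]])
            elif op == 'PRINT': print(stack.pop())
            else: raise ValueError(f"unknown opcode: {op}")

    run([('PUSH', 2), ('PUSH', 3), ('ADD',), ('STORE', 'x'),
         ('LOAD', 'x'), ('LOAD', 'x'), ('ADD',), ('PRINT',)])    # prints 10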

The goal of an understandable (and small) implementation is also why i don't like LLVM's encoding, which has things like variable-width integers (https://releases.llvm.org/8.0.1/docs/BitCodeFormat.html#variable-width-value); i feel like things are easier to understand if you can, to some extent, naively hack on the object file with a hex editor. And it's why i don't like LLVM's requirement that the compiler construct an SSA CFG before handing off the code; i think that's great for LLVM's goal of being an optimizer backend, but it's poor for my goal of a small reference compiler that is quickly comprehensible. I imagine that the Oot compiler could have an optional optimization flow which would do stuff like that, but a newcomer could more quickly understand the semantics by turning all that stuff off and looking at Oot's basic 'reference compiler flow'.
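
To show what i mean about variable-width integers: as i understand the scheme from that page, a VBR-n value is emitted in n-bit chunks, least significant chunk first, where the low n-1 bits of each chunk carry payload and the high bit means 'another chunk follows'. A decoder sketch (my own illustration, not LLVM code):

    def decode_vbr(chunks, n):
        """chunks: successive n-bit chunk values, least significant chunk first.
        Returns (decoded value, number of chunks consumed)."""
        value, shift = 0, 0
        for i, chunk in enumerate(chunks):
            value |= (chunk & ((1 << (n - 1)) - 1)) << shift   # low n-1 bits are payload
            shift += n - 1
            if not (chunk >> (n - 1)) & 1:                     # high bit clear: last chunk
                return value, i + 1
        raise ValueError("truncated VBR value")

    # decode_vbr([0b1011, 0b0011], 4) == (27, 2): a VBR-4 encoding of 27

So a field's width depends on its value, and since the bitstream packs chunks at arbitrary bit offsets, a value need not even be byte-aligned; you can't eyeball or patch one in a hex dump.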

By the way, i think the figures and (colorized) discussion in the 'Approach' section of http://www.cs.utah.edu/~mflatt/racket-on-chez-jan-2018/ are great for an architecture overview, we should do that sometime.

---

for some reason very late last night i had the inclination to look up that weird mathy intermediate language Om and its ilk. I had some trouble finding it because it seems to have changed its name and/or been replaced by other stuff. I finally found it all and put my notes on this archeology in section 'HTS, Pure, Infinity, EXE, Om project archeology' in plChProofLangs.

My conclusion is:

" Om as a programming language has a core type system, thePTS∞— the pure type system with the infinite numberof universes. This type system represents the core of the language. Higher languages form a set of front-ends to thiscore. Here is example of possible languages: 1) Language for inductive reasoning, based on CiC? with extensions; 2)Homotopy Core with interval [0,1] for proving J and funExt; 3) Stream Calculus for deep stream fusion (Futhark);3) Pi-calculus for linear types, coinductive reasoning and runtime modeling (Erlang, Ling, Rust). These languagesdesugar toPTS∞as an intermediate language before extracting to target language4.Not all terms from higher languages could be desugared to PTS. As was shown by Geuvers[8] we cannot buildinduction principle inside PTS, we need a fixpoint extension to PTS. And also we cannot build the J and funExt terms.But still PTS is very powerful, it’s compatible with System F libraries. The properties of that libraries could be provenin higher languages with Induction and/or [0,1] Homotopy Core. Then runtime part could be refined to PTS, extractedto target and run in an environment.We see two levels of extensions to PTS core: 1) Inductive Types support; 2) Homotopy Core with [0,1] and itseliminators. We will touch a bit this topic in the last section of this document " -- https://raw.githubusercontent.com/groupoid/pure/1133b524a241bf381f10699b9c08d05acf81a99a/doc/om.pdf

" PTS

    PI

MLTT

    PI
    SIGMA
    ID
    INDUCTION

HTS

    PI
    SIGMA
    PATH
    HIT
    COMP" -- https://groupoid.space/homotopy/

" PTS

    PI

MLTT

    PI
    SIGMA
    ID
    INDUCTION

CUBICAL

    PI
    SIGMA
    PATH
    INDUCTION
    COMP
    GLUE" -- https://web.archive.org/web/20190102060830/https://groupoid.space/mltt/infinity/

Regarding HTS/MLTT/Infinity/EXE: these seem to be layers on top of Om that compile to Om, but that possibly also add stuff like induction, J, and funExt that cannot be built on top of PTS according to https://raw.githubusercontent.com/groupoid/pure/1133b524a241bf381f10699b9c08d05acf81a99a/doc/om.pdf ; maybe the pi calculus and stream calculus stuff is included in that too, i'm not sure. The old EXE page, https://web.archive.org/web/20190102060830/https://groupoid.space/mltt/infinity/, contains spawn, receive, send, but the new https://groupoid.space/homotopy/ page does not.

The old EXE page has a section on 'Effects':

" Effect

Effect syntax extensions defined basic process calculus axioms, IO and exception handling.

    data Effect: * :=
         (receive: Receive → Effect)
         (spawn: Spawn → Effect)
         (send: Send → Effect)
         (try: Exception → Effect)
         (raise: Exception → Effect)
         (write: File → Effect)
         (read: File → Effect)

Process

  record Process: (Sigma: *) → (X: *) → * :=
         (action: Sigma → X → GenServer X)

Spawn

  record Spawn:
         (proc: Process)
         (raise: list Eff)

Send

  record Send: (Sigma: *) → * :=
         (message: Sigma)
         (to: Process Sigma)

"

Not sure what the difference between MLTT and HTS is, and which of them, if any, correspond to Infinity and to EXE.

The new page https://groupoid.space/homotopy/ says of MLTT (superscript infinity): "In its core it has only comp primitive (on empty homogeneous systems) and CCHM connections (for J computability)." I'm guessing the 'CCHM connections (for J computability)' refers to the same J that the old Om paper said can't be derived from PTS. I'm not sure if the 'comp primitive' is the part that compiles to Om (suggested by its being the only other thing besides 'CCHM connections'), or if it's another extension that can't be compiled to Om (suggested by the table on that page, which looks like a table of primitives and has PI in both PTS and MLTT). It's also confusing that that table puts COMP in the HTS section, NOT in the MLTT section.

Also, the old stuff seems to talk slightly more about programming language stuff, whereas the new https://groupoid.space/homotopy/ sounds like its purpose is to "compute all the MLTT [rules]" and to have "full HoTT? computability". I think that's more likely a difference in writing style than a difference in goals, but i'm not sure.

My conclusions so far are:

---

so another rundown on the current layer idea:

Layers, top to bottom:

    Oot (3 profiles: std, small, nostdlib)
    Preoot
    Oot Core
    OVM
    LOVM with standard libraries --- Lo
    LOVM --- LOVM assembly --- Lo (HLL thin layer on top of LOVM)
    Boot (with BootX? profiles) --- Boot assembly

Purposes (of Preoot, Oot Core, OVM, LOVM, and Boot, respectively):

    Metaprogramming
    Lowering, simplified syntax
    Main implementation of Oot semantics
    implementation of low-level stuff
    simple compilation

There's a key dividing point in this stack (i think between OVM and LOVM):

    Above this point i'll finally get to start working on my dreams for Oot syntax and semantics, instead of generic implementation-y stuff!

    Below this point we have a generic toolbox of 'programming language implementation technologies', not very specific to Oot and generally useful for creating portable programming languages that care more about simplicity than efficiency.

    Below this point we have unsafe languages with C-like wild undefined behavior; above this point we have safe languages with mandatory bounds checking and no wild undefined behavior (only 'controlled crashes'), suitable for security and sandboxing (possibly modulo relying on actual separate OS processes for security boundaries, due to CPU bugs?).

    Above this point we have stuff that is less likely to be reused outside of the Oot project.

    Highly optimized or highly platform-interoperable implementations will probably skip everything below this point and directly implement either OVM or Oot Core on the platform (it may be hard to go higher than that, because Oot metaprogramming will operate on either the Oot Core or Preoot representation).
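
A hypothetical sketch of this stack as a compiler pipeline (every function name here is a placeholder i made up, not real Oot tooling): a port stops lowering at whichever layer it implements natively:

    def expand_preoot(src):  return src   # Preoot: metaprogramming / macro expansion
    def lower_to_core(src):  return src   # Oot Core: lowering, simplified syntax
    def core_to_ovm(src):    return src   # OVM: main implementation of Oot semantics
    def ovm_to_lovm(src):    return src   # LOVM: implementation of low-level stuff
    def lovm_to_boot(src):   return src   # Boot: simple compilation

    PIPELINE = [('Preoot', expand_preoot), ('Oot Core', lower_to_core),
                ('OVM', core_to_ovm), ('LOVM', ovm_to_lovm), ('Boot', lovm_to_boot)]

    def compile_down_to(src, native_layer):
        """Stop lowering at the layer a given port implements natively, e.g.
        compile_down_to(src, 'OVM') for an optimized port that skips LOVM and Boot."""
        for name, lower in PIPELINE:
            src = lower(src)
            if name == native_layer:
                return src
        raise ValueError(f"unknown layer: {native_layer!r}")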