Bayle Shanks's website: proj-oot-ootCoreNotes2

thinking about the bit on models of computation i just wrote, some primitives for oot: pointer (turing head location in memory) array (turing tape, imperatives) goto, while (imps) integers in variables functions in variable, function application logic gates if 2d arrays (automata) copying an expression in one step (cow) expressions linear lists of code variable environemnts (lambda) mu successor functions with multiple arguments function composition recursion, primitive recursion expression trees selection structural equality nock operator

other common concepts: call stacks strings dicts lists regex db tables relation? types maps, folds what from logic programming? transition table grammar category theory? lattices? ordering relations?

can strings, cellular automata, etc be generalized with a 'topology' (or is this a geometry? i think topology) on top of something unstructured like a graph? is this like oot. pattersn/graph regexs/structural types? relatio to Chu spaces?

---

some more stack ops:

FLAG: place an opaque 'flag' on the stack CLEAR-TO-FLAG, ROT-TO-FLAG, etc; applied in the section from the top of the stack (TOS) down to the first flag

a 'flag' is like an Oot 'boundary' and maybe the command should be called STACKBOUND instead of FLAG.

(http://www.reocities.com/connorbd/varaq/varaqspec.html has this)

---

also might want list-to-stack and stack-to-list ops; e.g. SHATTER to 'Reduces a list to its component elements and pushes them on the stack in order' [1] and CONSUME n or CONSUME-TO-BOUNDARY to take the first n elements of the stack (or all elements up to a BOUNDARY) and return them as a list

(http://www.reocities.com/connorbd/varaq/varaqspec.html has this)

also http://www.reocities.com/connorbd/varaq/varaqspec.html has

empty? 'Examines a list on the stack and returns 1 if its value is null (pagh), a 0 if it contains anything.'
split 'Pops a list off the stack and returns the first item and the rest of the list.'
cons 'Takes an object and adds it to the head of a list. Equivalent to the LISP cons operator. '

---

http://www.reocities.com/connorbd/varaq/varaqspec.html also has some interesting stack-to-string ops:

strtie 'Concatenates the top two strings on the stack into one.'
compose 'Pops objects (executing proc objects if necessary) off the stack until a marker (placed by qaw) is hit and combines them into one string. '
streq 'Pops the top two strings on the stack, compares them, and returns 1 if identical, 0 if not. '
strcut 'Pops two values and a string and returns the section of the string between character startval and character endval.'
strmeasure 'Pops a string off the stack and returns its length in characters. '
explode 'Separates individual "words" in a string by whitespace.'

---

http://www.reocities.com/connorbd/varaq/varaqspec.html control flow and variables:

by default, names are lookup up if defined, like LISP. There is a quote operator that allow the name itself to be specified.
there is a 'set' operator to assign to a name
there is an 'eval'
there is an 'ifyes' and an 'ifno'
there is a 'return' (they call it 'escape')
there is a 'repeat'

---

[Self-proj-plbook-plChModelsOfComputation], and also the notes in section 'some ideas relating to generalizing/translating the instructions of Nock' in [Self-proj-oot-ootNotes12], make me think that defining an Oot core might just be picking one thing for each of these roles;

eg for Nock:

what is the thing that is like an atom in Nock
what is the thing that is like a cell in Nock
what is the thing that is like the state in Nock
what is the thing that is like the cons in Nock
what is the thing that is like the instruction in Nock
what is the thing that is like the select instruction in Nock
what is the thing that is like the arg instruction in Nock
what is the thing that is like the Nock instruction in Nock (the S combinator in combinatorial calc)
what is the thing that is like the isCell instruction in Nock
what is the thing that is like the addition instruction in Nock
what is the thing that is like the structural equality instruction in Nock

and for the "another writeup" section of [Self-proj-plbook-plChModelsOfComputation]:

what is the thing that provides data executed as code?
what is the thing that provides abstraction?
what is the thing that provides extensible memory?
what is the thing that provides conditionals?
what is the thing that provides looping?

and for the lambda calc model:

what is the thing that is like substitution in lambda calc?
what is the thing that is like variables in lambda calc? etc

and for the turing machine model:

what is the thing that is like pointers in turing machines?
what is the thing that is like instruction words in turing machines?
what is the thing that is like the tape in turing machines? etc

and for the imperative language with WHILE model:

what is the thing that is like while loops imperative languages with WHILE?

etc

and for Haskell:

what is the thing that is like lists in Haskell?
what is the thing that is partial function application in Haskell?
what is the thing that is currying in Haskell?
what is the thing that is types in Haskell?

etc

and for Python:

what is the thing that is like lists in Python?
what is the thing that is like dicts in Python?

etc

once (a) these lists are all completed, and (b) you have answer a few of those questions (for instance, if you only provided one thing like each thing in Nock, you'd be done, in terms of Turing completeness), you should have enough for a core language. Since i want Oot Core to not be orthogonal, but instead to have the sum of many basic models of computation, we should instead (c) answer almost all of the questions, for all the models of computation, and take the sum to be Oot Core.

i like the idea of starting with graphs as a data structure. The difference from dicts is just that in our library functions etc, we emphasize the whole graph as one object, not just those elements reachable from directly from the first node; and also that we provide a little more infrastructure than usual (labels, reification, etc) to give our graphs a little more 'flavor' (by which i mean, conventions for the interpretation of structural form). Similarly to how the 4-role (5-role, if you count the address of each instruction) system of syntax for Oot Assembly (see [Self-proj-oot-ootAssemblyNotes4]) gives a lot of 'flavor' to syntax.

---

so i guess mb:

first-class call stack, and first-class closures; goto, switch, but also while, repeat..until, if, foreach; grammar rule 'reductions'

probably not a full C-style 'for' b/c it seems kinda hacky, just syntactic sugar for a fancy 'while'. Perhaps a constrained 'for', though, which is always simply incrementing until it hits a limit

---

'return' to return in the middle of a fn; equivalent to assigning the return args to a temporary variable, then a JMP to the end of the fn, and insertion of a statement at the end that returns from those temp vars

(also 'break', 'continue')

---

keyword args and currying: hold onto future keyword until reqd args are satisfied

---

Oot needs a concept of:

a variable/document that has various versions/revisions (which might be arranged in a chain (linear time), a DAG (version control), etc
distinct from the concept of a particular value

---

starting with BASIC-like:

types: string, bignum integer, bignum float, graph, higher-order functions, ADTs
if
goto
some sort of concurrency thingamajig
function defn
function calling
- with keywords, defaults
macros (what else? syntax transforms? fexprs?)
foreach
continue
break
while
stuff for constructing graphs
patterns and pattern matching
bools
and/or/not
isEq
isPtrEq
var assignment
multidim arrays
error handling stuff
'is'
annotations (capable of representing transactions, labels, type annotations, function annotations, field annotations, assertions, etc)
comments
views

[2] is telling; webassembly is pretty small and the intersection of asmj.s, nacl, and webassembly is similar to webassembly. Suggests a slightly higher-level:

types:

int, float, void, low-level SIMD vectors, fn pointers, pointers

commands:

module
export
function defn
statements
- block
- expression (see below)
- if
- while, switch
- continue, break
- labelled
- goto (?but note webassembly comment below)
Function calling (statements and/or expressions)
- call(fn name literal): specify the callee by index into a function table
- call(fn ptr)
- addressof: obtain a function pointer value for a given function
- return
memory access statements
- heap access (load, store, alloca, load_with_offset, store_with_offset)
- load/store global
expressions
- expression
- literal
- get_local, set_local
- member
- ternary ?: operator
- comma: evaluate and ignore the result of the first operand, evaluate and return the second operand
Arithmetic expressions
- add, sub, mul
- neg
- copysign (? only in webassembly?)
- div, mod, rem
- shifts: shl, shr, sar
- <, <=
- ==, !=
- and, or, not, xor
- Infinity, NaN?
- various trig (note, not in webassembly mvp)
- exp, log, pow
- ceil, floor, round, trunc
- sqrt
- abs
- min, max
- E, PI
- clz (count leading zeros), ctz (count trailing zeros), popcnt (count number of ones)
- coercions
  - float/int (various)
  - ptr/int

other stuff that looks cool from nacl:

threading
- llvm.nacl.read.tp (i8* @llvm.nacl.read.tp(): Returns a read-only thread pointer. The value is controlled by the embedding sandbox’s runtime)
- : nacl: "Threading is explicitly supported through C11/C++11’s threading libraries as well as POSIX threads. Communication between threads should use atomic primitives as described in Memory Model and Atomics." [3]. global variable TLS models are not supported. volatile is not supported. See PNaCl C/C++ Language Support: Memory Model and Atomics.
freaky control flow
- C11 setjmp and longjmp
atomics
- llvm.nacl.atomic.store, llvm.nacl.atomic.load, llvm.nacl.atomic.rmw (read-modify-write operations; compile-time constant specifying the operation as one of: add, sub, or, and, xor, exchange), llvm.nacl.atomic.cmpxchg, llvm.nacl.atomic.fence, llvm.nacl.atomic.fence.all, llvm.nacl.atomic.is.lock.free: See atomic intrinsics
  - note to self: how do these compare to LLVM's? see http://llvm.org/releases/3.3/docs/LangRef.html#memmodel http://llvm.org/releases/3.3/docs/Atomics.html
instrinsics:
- memcpy, memmove, memset
- bswap (?)
- llvm.stacksave, llvm.stackrestore (These intrinsics are used to implement language features like scoped automatic variable sized arrays in C99. llvm.stacksave returns a value that represents the current state of the stack. This value may only be used as the argument to llvm.stackrestore, which restores the stack to the given state)
- llvm.trap (This intrinsic is lowered to a target dependent trap instruction, which aborts execution)
unreachable

webassembly comment regarding goto: " Break and continue statements can only target blocks or loops in which they are nested. This guarantees that all resulting control flow graphs are reducible, which leads to the following advantages:

    Simple and size-efficient binary encoding and compilation.
    Any control flow—even irreducible—can be transformed into structured control flow with the Relooper algorithm, with guaranteed low code size overhead, and typically minimal throughput overhead (except for pathological cases of irreducible control flow). Alternative approaches can generate reducible control flow via node splitting, which can reduce throughput overhead, at the cost of increasing code size (potentially very significantly in pathological cases).
    The signature-restricted proper tail-call feature would allow efficient compilation of arbitrary irreducible control flow."

" If you use only while-loops, for- loops, repeat-loops, if-then(-else), break, and continue, then your flow graph is reducible. ... Why Care About Back/Retreating Edges? 1. Proper ordering of nodes during iterative algorithm assures number of passes limited by the number of “nested” back edges. 2. Depth of nested loops upper-bounds the number of nested back edges " -- http://infolab.stanford.edu/~ullman/dragon/w06/lectures/dfa3.pdf

" Certain control-flow patterns make flowgraphs irreducible. Such patterns are called improper regions, and, in general, they are multiple-entry strongly connected components of a flowgraph ... as long as we avoid gotos, specifically, gotos into loop bodies (my note: based on a footnote at the bottom of the page, this includes loops made up of ifs and gotos) ... statistical studies..have shown that irreducibility is infrequent, even in languages that make...(little)...effort to restrict control-flow constructs...before structured programming became a serious concern...over 90% of a selection of real-world Fortran 77 programs have reducible control flow and all of a set of 50 large Fortran programs are reducible. Thus, irreducible flowgraphs occur rarely in practice...however, they do occur, so we must make sure our approaches to control- and data-flow analysis are capable of dealing with them.

There are three practical approaches to dealing with irreducibility...iterative data-flow analysis, as described in section 8.4, on irreducible regions...The second is to use a technique called node splitting that transforms irreducible regions into reducible ones. If irreducibility were common, node splitting could be very expensive, since it could exponentially increase the size of the flowgraph; fortunately, this is not the case in practice. The third approach is to perform an induced iteration on the lattice of monotone functions from the lattice to itself (see Sections 8.5 and 8.6). " -- Advanced Compiler Design Implementation, section 7.5, page 196 https://books.google.com/books?isbn=1558603204 Steven S. Muchnick - 1997

varargs not supported.

missing (todo see what this is again):

from NaCL?: icmp, fcmp, phi, select, insertelement, extractelement, bitcast, imul

missing (todo, replace vectors with arrays and structs with graphs? or both with graphs?)

from asm.js, sorta: arrays, structs

more on what a 'signature-restricted tail call' might be:

Tail Call Optimization in GCC

To address optimization needs, GCC has introduced the concept called "sibcalls" (short for "sibling calls"). Basically, a sibcall is an optimized tail call, but restricted to functions that share a similar signature. In other words, the compiler considers two functions as being siblings if they share the same structural equivalence of return types, as well as matching space requirements of their arguments.

For example, again assuming the ABI of ix86 Linux/UNIX, a tail call from function foo to bar would be a potential optimization candidate, because both share the same return type. Two arguments of type short are represented internally by using four bytes altogether, which is the same size as one long long argument:

int foo (long long a); int bar (short a, short b);

This restriction is necessary because in a chain of sibcalls, the top-most caller who is calling a tail-recursive function (and being unaware of it) attempts to clean up the callee's arguments when computation has finished. However, if this callee is allowed to exceed its own incoming argument space to perform a sibcall to a function requiring more argument stack space, you would end up with a memory leak when the top-most caller attempts to free the stack slots.

Another reason, related to this, is the shifting of the return address; see Figure 3. Apart from being a technical challenge, it would also break binary compatibility with other programs and libraries that do not support this notion of stack handling. Unaware third-party procedures would not be prepared to perform stack-shifting operations or, alternatively, to let the callee worry about the necessary memory clean ups. "

-- http://www.drdobbs.com/tackling-c-tail-calls/184401756

toread

http://www.wseas.us/e-library/conferences/2013/Vouliagmeni/INMAT/INMAT-05.pdf

pdf page 3 (page 43)

Lua VM has a three-argument CALL: the (register with a reference to the) function to be called, the number of arguments (in subsequent registers), and the number of return parameters to use

---

mb Oot Core should NOT have all the parsing that Oot has; specifically, grouping, precedence, etc; maybe, grouping-wise, it should be a simple Lisp-like close-to-AST thing with explicit parens and blocks. Oot Core should be valid Oot, but not the other way around.

This would imply, however, that Oot is NOT just Oot Core + metaprogramming, because our metaprogramming can't affect parsing/grouping (right?). However, the following would still hold: Oot Core = some_subset_of(Oot) + metaprogramming

---

hmm, it would be really nice for Oot Core to be really simple.

But it would also be nice for Oot to be just Oot Core + metaprogramming (+ the Oot stdlib).

Is there a way to 'square the circle'?

The most obvious way would seem to be to allow Oot Core enough metaprogramming to change its syntax.

But this conflicts with my goal that Oot should not be able to change its syntax.

So should Oot Core be MORE expressive than Oot in this way?

That would be cool, but then it's probably harder for a compiler to staticly optimize compilation of Oot Core code.

We could just have another layer of interpretation on top of this; when new syntax is defined for Oot Core, it essentially compiles this to a parser and then runs this parser on top of itself (as if it is getting the code as a string and 'eval'ing it, or like "reader macros", or like another layer of interpreter running within its runtime).

This seems pretty inefficient, although at least it would only be a compile time inefficiency.

(note that the COLA guy, Piumarta, wants to demonstrate that "* there is no reason to consider statically-compiled code (from any source) as being any less malleable at runtime than is dynamically-compiled code; and that * the choice of whether to make a particular language/system feature/component dynamic or (effectively) static should be nothing more than a function of personal style, with no implications for malleability or performance" [4]; maybe that isn't quite the same as what i'm thinking of, but it sounds like he's claiming there's a way to do stuff like this without performance penalty).

If Oot Core is to allow metaprogrammed changes to its syntax, and then compiling segments of the source code with the new syntax, is that really any better than just saying "an Oot compiler is bootstrapped in Oot Core"?

I guess one difference, although i don't know if it matters, is that Oot also compiles TO Oot Core; furthermore, Oot Core is used as an internal representation for Oot's metaprogramming constructs. So it's relationship to Oot Core is tighter than its relationship to Oot Bytecode (or to the JVM, or C, or whatever Oot Core is implemented in).

So i guess we'd end up having a Racket-like system where various sections of source code are marked as being this-or-that 'language', and then at compile time the metaprogramming that defines that 'language' is consulted.

Do we really need to disable this at the Oot level, in order to achieve my criteria that the syntax is uncustomizable (and therefore easy to read after you learn it once)? I guess not -- we can just say that "Oot syntax" is unchanging, but you could use metaprogramming to define other languages (even Haskell, Lisp, Java, etc, or custom DSLs) and have separate blocks of those 'other languages' in your source code. That isn't quite as bad as eg custom infix precedence in Haskell -- the reader will still know how to parse anything in an "Oot language" block. I worry that it invites the Lisp curse, though; that is, i worry that people will make up these other languages all the time, making the end result hard to read. But otoh if we really want Oot to be a great tool for language development, maybe we gotta allow ppl to easily make their own DSLs with their own syntax.

Would these language blocks be per-file or per-block? The most flexible choice is per-block, of course, but maybe per-file would be better to discourage their use. Otoh i could see things like eg Ruby on Rails which really want to have a little DSL block within one file (perhaps to specify a routing configuration for a website, without having to have a whole separate file just for that). Also, things like regexs fit here. Which highlight that this idea is really further thinking about my old idea of 'string-level metaprogramming' to allow regex-like things to be implemented via metaprogramming.

Of course i don't really want to restrict the types of syntaxes that could be used, so in general, this is a string-level metaprogramming construct, eg you can write your own lexer and parser that just gets the block as a string (of course if you want to go so far as to have uneven levels of {} or to have different string parsing conventions, either of which would prevent the Oot parser from correctly determining the end of your 'block', then you have to put the foreign code inside an Oot string HERE document instead of just using the {} block syntax; so maybe you should just always have to do that, to make things simpler). But usually a higher-level LL(k) (or even LL(1)) parsing syntax would be used, or a parser combinator syntax, or a PEG (Parsing Expression Grammar).

Now, wouldn't this prevent the compiler implementation from being really efficient when parsing ordinary Oot code? No, because we can use a trick like Urbit's Jets; the compiler could look at the Oot Core file defining, via Oot Core metaprogramming, how to parse Oot, and it could look at the Oot version number of this (or even look at a hash, although that would take longer), and assuming it recognizes the version, it can ignore this parsing metaprogramming and override it with its own custom optimized parsing implementation for Oot code of that version.

---

Related to the previous, and thinking about Piumarta's idea that a language implementation should be completely "self-describing (from the metal, or even FPGA gates, up) exposing all aspects of its implementation for inspection and incremental modification", and that "late-binds absolutely everything: programming (parsing through codegen to runtime and ABI)...":

and thinking about how Oot Bytecode is in a sense just an AST representation format for Oot Core,

and thinking about how Oot Bytecode is perhaps a stab at a common 'interchange format' for representing ASTs while transpiling,

and thinking about how Piumarta's COLA stack has a low-level language, Pepsi, which appears to have dynamic dispatch, and wondering how Oot Bytecode could be a good format for representing Pepsi ASTs,

perhaps we should think of Oot Bytecode not as something with a single definite operational semantics, but only as something with a default operational semantics; that more generally, Oot Bytecode instructions represent more abstract concepts (or families of related concepts), and that Oot Bytecode can be executed by different interpreters which ascribe different meanings to the same bytecode.

More concretely, imagine that we want to express Pepsi code as Oot Bytecode. We would need to have a way of representing a Pepsi-style dynamic dispatch invocation of a method (a 'message send' i think he calls it). This can't be Oot Bytecode's ordinary CALL instruction, because that doesn't do dynamic dispatch. But different languages have slightly different semantics for their dynamic dispatch, so if we created a DYNAMIC-CALL-PEPSI instruction/opcode, we'd still have to create another similar one for Python, for Common Lisp, etc. An alternative is to just use the existing Oot Bytecode 'CALL' opcode to represent Pepsi calls too, and to say via metaprogramming that "in the context of a Pepsi code block, when you see CALL, don't do what you normally do when you see a CALL, instead run this other code". In general, the "Pepsi code block" of Oot Bytecode could never be run by the Oot Bytecode implementation directly at all, but rather it could just be data that is passed to a custom Pepsi interpreter that whose input AST just happens to be given in the format of Oot Bytecode. But, to provide a 'metaprogramming hierarchy', we could provide a way to re-use Oot Bytecode dispatch, to blacklist or whitelist certain primitive and stdlib instructions, and then to allow even the Oot Bytecode primitives correspoding to these instruction to be redefined, as if they were custom instructions.

This reminds me of my 'levels of interpretation' fuzzy idea; one way to think about the 'level's was that one symbol could have different (but related) meanings depending on context.

This ties in with the idea behind the LONG format of Oot Bytecode, namely that it could be used to encode ASTs for program fragments written in an enriched language that can't be expressed by MEDIUM format (which is constrained by its fixed-length 64-bit encoding), for example, a language with instruction level or operand level flags (such as for atomicity). So I already had the idea that Oot Bytecode LONG format could be used to as a generic AST "interchange format" to represent alternative languages; this just extends that to MEDIUM format, both by wholesale running of a new interpreter on top of the existing one, and also by contemplating overriding the implementation of primitives and standard library functions in the manner of custom instructions.

Note that in addition to a facility to have 'foreign' Oot Bytecode segments run on your own interpreter, with that interpreter written in Oot Bytecode, it would be nice if your interpreter could be 'self-hosting', eg it is itself interpreter by itself/with its own interpretation of its instructions. Of course this would require first bootstrapping it via an Oot Bytecode implementation, but what i am saying is that Oot Bytecode should facilitate providing a bootstrapping chain, and if such a bootstrapping chain is given, then you should be able to run code (using the Oot Bytecode AST data format and dispatcher, but using your own custom instruction definitions) in which the custom instruction definitions are themselves interpreted in terms of the previous link of the chain.

---

So we seem to be converging towards a concept for how Oot and Oot Core and Oot Bytecode relate (not much changed since our previous stuff but i'll restate it anyways).

Oot is syntactic sugar for Oot Core. Oot is implemented by using Oot Core's metaprogramming constructs. Oot is compiled to Oot Core, and metaprogramming reflecting on Oot code sees an Oot Core AST.

Oot Core is more verbose than Oot mostly because it is more explicit. Oot Core is a simple, powerful, high-level language. It is high-level in that its implementation is expected to provide 'administrative' services such as (possibly limited to acyclic reference counting) garbage collection and scheduling. Oot Core compiles to Oot Bytecode. Oot Core is implemented in Oot (which is then compiled to Oot Bytecode to provide bootstrapping).

Oot Bytecode serves two different functions.

First, Oot Bytecode is a language in itself, which is the target language for Oot Core.

Second, Oot Bytecode is a format for representing ASTs. In particular, it is the format for representing Oot Core ASTs internally. The semantics of instructions in the AST can be modified; the usual semantics is only a default. For languages whose syntax doesn't cleanly fit in the default Oot Bytecode MEDIUM format, an extensible LONG format is available.

These functions impose various requirements.

Because Oot aspires to be ported to many platforms, Oot Bytecode, as the target language for Oot, needs to be very easy to port. This means that Oot Bytecode needs to be very easy to naively implement (and then incrementally improved), in order to allow Oot to be easily ported to new platforms. To make it easy to implement, it has a very small set of primitives, so that an implementer need only implement these primitives, and a dispatcher. Other instructions are implemented in terms of these primitives; to incrementally improve an implementation, an implementer can then provide custom implementations for more and more non-primitive instructions. Oot Bytecode's stdlib provides implementations of services such as garbage collection and scheduling. This allows Oot Bytecode to be easily targeted to platforms (such as native assembly) that don't already provide these services, while also allowing interoperability with other platforms (such as Java or Golang or Python) that do (interoperability is achieved by Oot Bytecode implementations which override the relevant stdlib functions with calls to the platform services).

Because Oot aspires to be 'transpiled' to readable sourcecode of various other HLLs, Oot Bytecode, as the target language for Oot, needs to preserve high-level intent. For this reason, it must represent Oot Core at a sufficiently high-level so as to make it relatively easy to translate it to high-level constructs (such as 'for' loops) in the target HLL.

So, Oot Bytecode is the level at which language services are ultimately expressed (even if they are orginally written in mostly in Oot). These services can be individually overriden by platform-native implementations when those are available on the target platform. Oot Core assumes the existence of these services and is more abstract. Oot is syntactic sugar metaprogrammed on top of Oot Core.

---

OK i skimmed flite, it looks great. Some other Haskell-y 'cores' to look into:

flite (f-lite): https://www.cs.york.ac.uk/fp/reduceron/memos/Memo9.txt SECD (Alice VM?) GHC Core

also: Klambda

Nock (already skimmed) is there OCaml bytecode? Smalltalk, Self, Io

what else?

---

capabilities are implemented underneath (not on top of) Oot Core

---

perhaps Root Core doesn't have capabilities, though

---

introspectable expressions

---

a key idea for 'Restricted' cores like ROot Core and RPython seems to be:

things are static, not dynamic; eg variables have a static type, the length of tuples is statically known, functions cannot be constructed
you can get at 'unboxed' primitives

---

yeah so i think some key aspects of Root Core vs Oot Core will be:

Oot Core is metaprogrammable, Root Core is not
Oot Core is dynamic, Root Core is fully static
Oot Core is boxed, Root Core is unboxed
Oot Core is above services, Root Core is below services
Oot Core has capabilities, Root Core trusts everyone (and implements capabilities for Oot Core)

the last one brings up a potential difficulty; if Root Core compiles into OotB?, but we wanted OotB? to already have capabilities. This suggests that either Root Core should have capabilities, or OotB? should not.

The problem with OotB? not having capabilities is that foreign OotB? code can get around any higher-level implementation of capabiliites by just working with the unboxed (raw) pointers. Note that this is the straw that broke the camel's back w/r/t SootB?; capabilities just seem like too much work for our bootstrapper to have to implement; if OotB? didn't have capabilities, maybe we could get away with merging OotB? and SootB?.

---

(generalized, regex) pattern matching should be in Oot Core

---

i'm struck how even LLVM, C, and EXE ( http://groupoid.space/exe.htm#raise ) have escape continuations/exceptions of some sort. It seems like even core languages include these. I note that EXE models them as effects, so i guess that's similar to Haskell's IO monad exceptions.

The reason why it's striking is that it seems simpler to me just to have GOTO, and it seems to me that 'effects' should be reserved for stuff like I/O, and exceptions are rather just control flow.

This makes me think that a fundamental tension causing a lot of complexity in programming languages is: how fundamental is the call stack? To put it another way, imperative with GOTO vs. procedural vs. functional; in imperative-w/-GOTO, control flow is simple; it's like an assembly language Program Counter (synonymous with Instruction Pointer) (PC, IP); in procedural, control flow is simple; 'while' loops; and in functional, control flow is simple; evaluate expressions like in math. But when you mix these together, you get tension; to implement procedural or functional in imperative with GOTO, you introduce a call stack. In the most basic versions of the procedural or functional paradigms, the call stack is inviolate; eg if you have an equation x = (2*y) + z, there's no way that you would get to y, and then in the middle of evaluating y, suddenly jump back up to give a result to x without evaluating z. But that's the sort of thing that happens with exceptions (or other continuations or 'non-local returns').

Similarly, 'break' and 'continue' are amazing metaprogramming magic if you think of them from the purely functional paradigm. Which is why it's notable that exceptions seem so common even in simple or core languages for these paradigms.

But if you look at the implementation of these things in imperative-with-GOTO, they are simple; you just mess with the stack a little.

---

" HNC stands for HN compiler. HN is a tool to facilitate writing C programs. It is a macro language for C. It is not embedded in C but uses a new syntax and a new type system to express C programs. ... If HN proves useful, we plan to write several tools on top of it:

    a compiler for a concurrent language for massively parallel supercomputers, in the spirit of SISAL;
    ports of HN to languages other than C, e.g. PHP and Javascript;
    further improvements of the language, e.g. quasi-quotation, regions, pi-sigma type system.

...

An example

When it's is ready,

    fold f x0 l =
        x = x0
        p = l
        while *p != NULL
            x := f x *p
            p := p->next
        x

    x = fold (\x y -> x + y + 1) 0 l

will get expanded into

    int x = 0;
    list_int *p = l;
    while (*p != NULL)
    {
        x += y + 1;
        p = p->next;
    }

A key point is that control abstractions such as the one provided by the fold macro above, though ubiqutous in code, cannot be abstracted away in C. Also note less punctuation, a full polymorphic type inference and genuine C memory model.

...

HN0 is the first version of HN language, having the very minimal set of constructs: a primitive macros working as always inlined higher order functions, an assignment, effectful statements, imperative control structures (while and if).

SPL stands for "Stream Programming Language" and is another syntax for a functional subset of HN0, inspired by J language. SPL and HN0 will share the same optimizer and back-end. "

is this related?

https://translate.google.com/translate?hl=en&sl=ru&u=http://vag.a.wiki-site.com/index.php/HN0&prev=search

---

"LISP (McCarthy?, 1958) that was built upon: cons, nil, eq, atom, car, cdr, lambda, apply and id. " -- [5]

---

[–]SrPeixinho?[S] 1 point 10 months ago

It would be a small subset of Haskell. It still captures most of the nice things about functional programming. ADTs + pure functions is the essence of Haskell to me and I'd not suffer that much from losing everything else. I imagine a very small language with System F on its core. It can perfectly have sum types, as I just showed on the example. Sum types can easily be compiled to readable code in common languages with tagged dictionaries. Of course, "if/else" isn't nearly as nice as matching, but it would work and be recognizable. Moreover, you get many of the good parts of Haskell (such as laziness) when you compile that language to Haskell.

    permalinkembedsaveparentgive gold

---

notes from OM/EXE

data and record (see below) (i might say 'codata' instead of 'record')
spawn, receive, send ('process calculus' things)
macros
types
normalization
'Logic'
- REC - Recursor
- INDUCTION - Dependent Eliminator
- EQU - Propositional Equality
- SIMPLE - Simplification, K combinator
- FREGE - Frege, S combinator
- PROP - Propositions
- fixpoint
case
Pattern Matching
let
I/O
- they have "IO - Input/Output Free Monad" and also "IOI - Infinity Input/Output Cofree Comonad"
laziness ("Lazy - Lazy Type")
bool
- true, false
maybe
nat(ural numbers)
list
- STREAM: head, tail; "stream is a record ((codata)) form of the list's cons constructor. It models the infinity list that has no terminal element. "
unit ('()'; "0-tuple")
1-Tuple ('Mon')
2-Tuple ('Prod')
Morte ('Recursive and Corecursive samples')
'sum' with constructors 'left, right' (is this sum type?)
'product' with constructors 'pr1' and 'pr2'; or constructor 'cons'
empty, or bot (bottom type)
OB (category theory object): elem, eq, point
HOM ('homomorphism'): elem, eq, point; 'elem' seems to be a mapping from one structure to another, 'eq' seems to be governed by a requirement that if x1 == x2 then elem(x1) == elem(x2), where '==' is the version of EQ in each structure, and 'point' seems to be governed by a requirement that elem(X.point) == Y.point, where X and Y are the structures standing in the homomorphic relationship (todo)
'option' (how is this different from 'sum'?)
AST (presumably used for macros?)
- identifiers, list literals, options, nat, parens, a -> b, f x, forall ( identifier : type ) -> whatever

lambda ( identifier : type ) -> whatever

eq
exists
vector (limited length list)
PURE ('return', probably like Haskell 'return')
FUNCTOR: fmap
APPLICATIVE: ap; also says 'extend' and then something about 'pure' and 'functor'
MONAD: join; also says 'extend' and then something about 'pure' and 'functor'
PROCESS
SIGMA (?): pr1, pr2 constructors
SETOID: ob, hom; 'hom' appear to have the attached requirement that for any X,Y in Ob, HOM(X,Y). Note that HOM is defined above as Homomorphism. Listed under 'logic', not 'coinductive', in the bottom. "Setoid in essence is a type with an equality." Wikipedia says, "In mathematics, a setoid (also called an E-set) is a set (or type) equipped with an equivalence relation...setoids may be used when a difference between identity and equivalence must be maintained"
GROUPOID: i dont understand this yet, see notes in plChProofLangs
effect
'extend' keyword in record typedefs
Om has constructors: star (has constructor 'index' which is a natural number; could this be like the * that means 'kind' in haskell?), var (variable; has constructor 'name'), app (application; has constructors func, arg), arrow, pi (both 'arrow' and 'pi' have constructors name (string), arg (Om), body (Om); perhaps these are like function definitions? elsewhere it says that pi is "universal quantifications")
try, raise
do (?)

notes on 'data' and 'record': inductive data type declaration 'data' and coinductive data type declaration 'record'. Eg (quoting http://c2.com/cgi/wiki?CoinductiveDataType 's syntax):

 type List A   = cons:(A,List A)->* | nil:()->*
 type Stream A = first:*->(A) & rest:*->(Stream A)

and quoting EXE syntax:

data list: (A:*) → * := (nil: () → list A) (cons: A → list A → list A)

record stream: (A: *) → * := (head: A) (tail: stream A)

("stream is a record form of the list's cons constructor. It models the infinity list that has no terminal element.")

my notes: why does 'nil' take '()'; shouldn't it just not take anything?

---

so just to review, some low-level core 'paradigms':

imperative assembly:

arithmetic: add sub, mb mul neg, mb div mod, shifts, bitwise
conditional branches, and comparisons: <= etc
absolute jump (goto)
immediate constants
loads, stores
primitive data: integers (signed/unsigned, various bitwidths), floats
composite data: pointers, linear address space

two-stack assembly (Forth):

data stack
call stack
stack ops like dup, swap, pick, roll, etc
loads, stores, loads and stores between stacks
integers

SK combinators:

projection
substitution

Nock:

SK
+integers
+cons cells
+structural equality

SECD:

apply (with partial application/closures)
environment
expression stack
call stack
am i missing something here?

Haskell:

+SECD
+cons-cells
ADTs, pattern matching

lambda calc:

lambda
function application
variables, and an environment assigning values to them

turing machine:

linear address space of bits
machine head

lisp:

normalize and reduce
+cons-cells

mu-recursion

Primitive data:

integers
floats
strings
bits
functions
(i put pointers under 'composite')

procedural:

(no goto, but:)
while
call/return
if and/or switch

hof:

map, fold, filter
am i missing something here?

oop:

getters, setters
call object method
am i missing something here?

Composite data paradigms:

cons cells (including car, cdr, isAtom, possibly amongst others)
lists (homo or hetero)
arrays (possibly multidim)
- linear address space
associative arrays (dicts)
pointers
tuples (and structs/named tuples)
discriminated unions

---

i guess you can think of both C and Forth as 'close to the metal'; 'portable macro assemblers' plus a little bit more.

Oot Assembly is supposed to be kind of on that level (although not quite; it isn't CTM for performance motivations, just for the sake of portability and interoperability). It isn't exactly Forth, because it has variables, not just a stack.

And it also has some functional stuff, such as the first-class functions.

---

and i guess Oot Core could start out as (minimal subset of: (C + Forth + Haskelly stuff), plus ditching the instruction encoding length constraints (eg instead of two positional operands, arbitrary multiple optional keyword inputs and outputs, etc)

of course Oot Core also has closures, automatic memory management, lenient evaluation, unevaluated-until-runtime AST expressions, metaprogramming, etc, so it's quite a bit more than this. But you might say that those are 'fancy semantics' and a few new constructs, in addition to the existing set of 'core constructs' given by (minimal subset of: (C + Forth + Haskelly stuff))

---

(polymorphic) typed lambda calculus (system F style?) looks like it may be just what we need (or at least, inspiration for the answer) for a simple (but verbose) notation for types that uses a similar syntax to types as for functions

eg the polymorphic identity fn:

λ(a : *) → λ(x : a) → x

(so, the first argument is a type, 'a'; the second is a value in that type; then, that value is returned)

the type of the previous function is:

∀(a : *) → ∀(x : a) → a

(from the first few sections of http://www.haskellforall.com/2016/04/data-is-code.html , which is a good tutorial for this)

---

elsewhere, i talked about unevaluated ASTs, that is, things that are even more dynamic than thunks because you can introspect on their ASTs at runtime.

What is the difference between these and just a quoted list in LISP?

i suppose the difference is that our 'unevaluated ASTs' are similar to lazy functions: if their evaluation is called for, they will evaluate like normal. In contrast to quoted lists in LISP, which are already in normal form; that is, their type is just 'AST', where our 'unevaluated ASTss type is whatever they reduce to as an expression.

---

tangentially, 3 defns of 'high level assembly language', given in http://www.plantation-productions.com/Webster/HighLevelAsm/HLADoc/HLARef/HLARef_pdf/01_Overview.pdf :

" David Salomon in his 1992 text "Assemblers and Loaders" (Ellis Horwood, ISBN 0-13-052564- (alternate, search)2) ...offers the following definitions for a High Level Assembler (or HLA): A high-level assembler language (HLA) is a programming language where each instruction is translated into a few machine instructions. The translator is somewhat more complex than an assembler, but much simpler than a compiler. Such a language should not have features like the if , for , and case control structures, complex arithmetic, logical expressions, and multi-dimensional arrays. It should consist of simple instructions, closely resembling traditional assembler instructions, and of a few simple data types. "

" A high-level assembler language (HLA) is a language that combines most of the features of higher-level languages (easy to use control structures, variables, scope, data types, block structure) with one important feature of assembler languages namely, machine dependence. "

" A "high level assembly language" (HLAL) is a language that provides a set of statements or instructions that practically map one-to-one to machine instructions of the underlying architecture. The HLAL exposes the underlying machine architecture including access to machine registers, flags, memory, I/O, and addressing modes. Any operation that is possible with a traditional assembler should be possible within the HLAL. In addition to providing access to the underlying architecture, the HLAL must provide some abstractions that are not normally found in traditional assemblers and that are typically found in traditional high level languages; this could include structured control statements (e.g., if, for, and while ), high level data types and data structuring facilities, extensive compile-time language facilities, run-time expression evaluation, and standard library support. A "High Level Assembler" is a translator that converts a high level assembly language to machine code. "

Oot Assembly doesn't meet these criteria because we don't expose the underlying architecture. The reason we are 'assembly-like' is just because we want to be easy to implement. I include these quotes, rather, because they list some things that Oot Core, at least, and mb Oot Assembly, will want to support.

---

the issue with whether to represent Oot Assembly expressions with POPs and PUSHs or with parens, and in the latter case, what happens in a context when they should be interpreted lazily eg top-down rather than bottom-up, suggests that that sort of issue should be bumped up to Oot Core, that Oot Core should be the interop language, not OotB? (even though OotB? can be a representation format for Oot Core), that we should use PUSHEs and POPs and leave expressions to Oot Core, and also provides some insight into the sort of way that Oot Core is going to be focused on defining the interpretation of Oot Core.

---

yknow, i don't think it's reasonable to say that Oot is just Oot Core + metaprogramming and therefore shares the same syntax, b/c:

we dont want the grouping syntax to be metaprogrammable in Oot
we want convenient grouping syntax in Oot
we want Oot Core to be simple to implement, which implies not too much 'sugar' grouping syntax

We could say that Oot Core contains metaprogramming operators for changing the syntax which are disallowed in Oot. But i guess that would look somewhat like defining how to parse a new language and compile it to Oot Core, right? So why not just do that.

So, how about instead that we want Oot's SEMANTICS to be just metaprogramming on top of Oot Core's semantics; but its syntax may have more grouping, etc. Of course, what we want is for Oot Core's syntax to be a 'desugared' version of Oot's

---