Oot Core is an intermediate language (IL) for Oot.
Goals:
- Bootstrapping Oot: a small language that can be easily implemented yet that we can write Oot in, so that we can bootstrap porting by implementing Oot Core on a new platform, and then writing Oot in Oot Core
- A kernel language that defines the model of computation that Oot is built upon
- A kernel language that could be the target of many other high-level languages, not just Oot (like JVM or CLR bytecode)
- Simple and regular
- Few opcodes
- Orthogonal opcodes
- Simple to understand
- Oot Core interpreter can be implemented with little code: This is one form of simplicity. This is needed to make Oot feasible for embedded platforms. This will also help things stay in cache on desktops.
- Possibility of high code density: This is one form of simplicity. This is needed to make Oot feasible for embedded platforms. This will also help things stay in cache on desktops.
- Low-level constructs (possibly in std library): This language should be able to interoperate with C libraries and express anything that C can, allowing it to be used for systems work
- Portable bytecode interpreter: a language that Oot programs can be compiled to that can then be efficiently interpreted
- The only IL that Oot toolchain folks really need to know
- Amenable to proofs
- Secure: if we have capabilities or anything like that, they should be in Oot Core, so that a proof of secureness can be written in terms of the simple Oot Core, rather than the complex Oot
Non-goals:
- Execution speed: if people want fast execution speed, they should compile Oot directly to their platform's machine code
- Human read/writability: Similar to e.g. GHC Core, we don't expect people to use this language; it can require excessive type annotations, awkward/lengthy syntax, boilerplate, etc
- High code density in initial implementation: We don't want to actually spend time making a very memory-efficient implementation, we just want the possibility to do that later, and for this design constraint to remind us to keep things simple
- A low-level IL for compiler writers, just before native code generation: This is not like Haskell's Cmm or STG
- A stepping stone between Oot and typical assembly language: We don't care about representing today's machine models
- A kernel language that necessarily includes all of the most important parts of Oot: We want Oot Core to define a model of computation, but not to define the structures that Oot provides to make a programmer's life easier. It's possible that some of the main constructs in Oot will end up in the latter category.
Properties:
- Includes metaprogramming: We want to be able to write Oot in Oot Core incrementally, e.g. to be able to write parts of Oot in layers and to use the previous layer's constructs while writing the next layer
- No syntactic sugar: Syntactic sugar is just for making programmers' lives easier, at the expense of complexity. That's what Oot is for, not Oot Core.
- Statically typed: RPython, GHC Core, MoarVM? all seem to demonstrate that an efficient kernel language requires this
- Limits on runtime metaprogramming: RPython, GHC Core, MoarVM? all seem to demonstrate that a kernel language requires some restrictions on metaprogramming in order to permit efficiency
- Because of its simplicity and amenability to proofs, this is an intermediate language in which some optimizations might be performed
Oot core design todos
- linear or tree or graph representation?
- are you sure you don't want a 2nd IL rather than making Oot Core do everything?
--
Operations in some opcodes
- add, subtract, multiply, divide
- or is arithmetic encapsulated as 'arbitrary primitive ops on unboxed primitive atoms' rather than given its own opcodes? (then there is an 'op lookup table'? ops can be polymorphic in this limited way?)
- bitfield stuff (set, clear, any, all, and also see below)
- right shift
- arithmetic left shift
- logical left shift
- load, store
- jump
- conditional jump
- compare
- negate (e.g. 3 -> -3)
- and, or, not, xor
- call (can be used to call external library functions)
- graph manipulation
- vector ops
- matrix ops?
- Nock 'nock'/SKI combinator 'substitute'?
- evaluate expression? apply function?
- distinction between pure and imperative code? a big expression/imperative divide in Oot Core? what about sequenced vs. unsequenced imperative code? also, should expressions have a specified evaluation order, or should they be unordered? (if expression = functionally pure (except for debugging), then i guess they should be unordered)
- fixed bit-width arithmetic, or only multiple precision?
- looping constructs (for, while, foreach)
- __get, __set?
- inverses?
- relations?
- tail calls
- 'approximate mode/type' for numbers where we don't care about the bit width and just want to use what the platform recommends?
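A few of the candidate opcodes above (load, add, jump, conditional jump) are already enough for a toy dispatch loop. This is only an illustration: the opcode names, the register model, and the tuple encoding are all invented here, not part of any Oot Core spec.

```python
# A toy interpreter loop over a handful of the candidate opcodes.
# Program: list of (opcode, args...) tuples; registers: a dict.
# All names and encodings are invented for illustration.

def run(program):
    regs = {}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "load":            # load an immediate into a register
            regs[args[0]] = args[1]
        elif op == "add":           # dest = src1 + src2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "jump":          # unconditional jump to an index
            pc = args[0]
            continue
        elif op == "cjump":         # jump if the register is non-zero
            if regs[args[0]]:
                pc = args[1]
                continue
        elif op == "halt":
            break
        pc += 1
    return regs

# Count r0 down from 3 to 0 by repeatedly adding -1 and looping.
prog = [
    ("load", "r0", 3),
    ("load", "r1", -1),
    ("add", "r0", "r0", "r1"),
    ("cjump", "r0", 2),
    ("halt",),
]
```

After `run(prog)`, register `r0` holds 0.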
Stack ops?
- push, pop
- ROT
- return stack
- data stack
Concurrency
- memory barrier?
- fork, join
- greenthread scheduler?
- promises?
- async call/await?
- transactional memory?
- message sending, queueing?
- threadlocals, monitored shared memory, immutables, uniques?
- map, reduce?
- SIMD (GPU-like) stuff?
- channels
- DSM with various memory models? atomics?
- mutexes?
- condition variables? pthreads?
- shared queues?
- what else?
I/O?
or should this be a std lib?
memory
- malloc/heap allocation?
- stack allocation?
- GC?
ooty stuff
- boundaries? (i think so)
- annotations? (i think so)
- copy-on-write?
- references?
- views?
- node splits/combinations, 'from', other 'lineage' stuff if any (@)
- lazy evaluation strategy?
- boxed/unboxed?
- extensible primitive types: unboxed things like int, string, and float could be thought of as inscrutable 'atoms', which are treated in a uniform way and which can have primitive 'operations' (platform primitives or external libraries) attached to them. This would prevent unwanted opcode proliferation by letting us have the single concept of 'addition' instead of one opcode for adding ints, one for adding floats, etc. If we use intersection types, then this allows the polymorphic system to encompass concepts like 16-bit vs. 32-bit floats, and possibly also addressing modes (although perhaps indirection/pointers/references should be their own thing in Oot Core). These operations could be imperative and strict (but should they be? surely there are primitive expression ops too?)
- i guess a pointer to a numerical location in memory is also an unboxed atom?
- tree/graph representation of code?
- bounds-checking?
- types?
- polymorphic? or has that been resolved by now? (i'm guessing the latter)
- option/maybe types?
- capabilities
- ontology stuff?
- semantics? (i think not)
- constraint satisfaction?
- lazy/eager markings? (probably so, see Strict Core for GHC)
- primitive recursion?
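The 'extensible primitive types' idea above can be sketched as an op lookup table keyed by (operation name, atom type), so that one generic opcode dispatches to the right primitive instead of there being one opcode per type. Everything here (the names, the use of Python's own types as stand-ins for atom types) is an invented illustration, not a spec.

```python
# Sketch of the 'op lookup table' idea: a single generic 'primop'
# dispatches on the atom's type, so 'add' covers ints, floats, and
# strings without three separate opcodes. Names are illustrative only.

OP_TABLE = {
    ("add", int): lambda a, b: a + b,
    ("add", float): lambda a, b: a + b,
    ("add", str): lambda a, b: a + b,   # concatenation as the string 'add'
    ("neg", int): lambda a: -a,
}

def primop(name, *atoms):
    # Atoms stay opaque; we only look at the first atom's type to dispatch.
    fn = OP_TABLE[(name, type(atoms[0]))]
    return fn(*atoms)
```

For example, `primop("add", 2, 3)` returns `5` and `primop("add", "a", "b")` returns `"ab"`; new atom types extend the table rather than the opcode set.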
metaprogramming
- macro/custom opcodes
- other typesafe compile-time macros?
- fexprs (probably not, for efficiency)
- virtualization, computable goto
Links
See/learn from programmingPartTargetLanguages.
---
Bytecode format
Self-describing format
Oot allows an arbitrarily large number of local variables, etc, and works on any pointer width. Yet in Oot Core's bytecode, we want to fix the size of instructions and of various bitfields. How to reconcile these?
In the header of each Oot program is a set of arbitrarily large self-describing integers. Here's how they work:
- the first byte contains various flags. One of these flags says 'skip the self-describing stuff and accept the minimum lengths'
- if self-describing was not skipped, then another flag in the first byte says whether bit-width descriptors are 4 bits each or arbitrarily long
- if they are 4 bits each, then there are four bytes, each of which describes the bit-widths of two things (see ootCoreThoughts)
- if they are arbitrarily long, then there are at least eight bytes, each of which gives the bit-width of one thing. If the value of any of these bytes is 255 (the max), that byte is instead followed by TWO bytes which describe the bit-width (note that in this case we are describing bit-widths sufficient to hold integers larger than the approximate number of atoms in the universe; this may be useful for obscure descriptive purposes but will probably be unusual). If both of those two bytes are all 1s (65535), then they are in turn followed by FOUR bytes which describe the bit-width. Etc.; so each of the 8 fields is of variable length, but only in very unusual circumstances.
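The escape scheme for one field (1 byte, escaping to 2 bytes when all 1s, then to 4, and so on, doubling each time) can be sketched as a small decoder. Big-endian byte order is an assumption here; the notes above don't fix an endianness.

```python
# Decode one variable-length bit-width field from the header:
# read `width` bytes; if they are all 1s, skip them and retry with
# double the width. Big-endian order is assumed for illustration.

def read_field(buf, pos):
    width = 1
    while True:
        value = int.from_bytes(buf[pos:pos + width], "big")
        if value != (1 << (8 * width)) - 1:   # not all 1s: this is the value
            return value, pos + width
        pos += width                          # all 1s: escape to wider field
        width *= 2
```

So `read_field(bytes([42]), 0)` yields `(42, 1)`, while a first byte of 255 means the real value lives in the following two bytes, e.g. `read_field(bytes([255, 1, 0]), 0)` yields `(256, 3)`.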