let's use something similar to the nanopass compiler framework: https://github.com/akeep/nanopass-framework/blob/master/doc/user-guide.pdf?raw=true , https://github.com/akeep/nanopass-framework
static analysis must be able to say whether a given symbol is an fexpr or not, at least in most cases
---
why is it that compiling one HLL to another, e.g. CoffeeScript -> JavaScript, draws complaints that it's a hack and hard to debug, whereas compiling an HLL to an existing VM, e.g. Clojure -> JVM or Elixir -> ErlangVM, doesn't?
Part of it may be that VMs are designed to be intermediate target languages and have appropriate facilities for that, whereas languages like JavaScript weren't designed to be intermediate target languages. What are the 'appropriate facilities' that intermediate target languages should have?
- constructs for passing along debugging/src lineno information
- what else?
- todo this would probably be a good thing to ask Lambda the Ultimate someday (tangentially, also would be good to ask for IDE API pointers, and any news on Yegge's Grok, which was portrayed as an effort to provide a generic/standard API for that sort of thing)
Oot should be a good intermediate target language and so should support these features.
callGC, the function used to explicitly invoke the garbage collector, should return a bool saying whether it is finished, so that the program doesn't waste time calling it 100 times when it has nothing to do (a complicated version of a wasteful spinlock, i guess). there should be a function callGCDuration(duration) which calls callGC over and over again until either callGC says it's finished, or the given duration has passed.
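A minimal Python sketch of the callGCDuration idea; the names call_gc_for and call_gc are placeholders for the callGC/callGCDuration described above, and the True-means-finished convention is the assumption being illustrated:

```python
import time

def call_gc_for(duration_s, call_gc):
    """Repeatedly invoke the collector until it reports it is finished
    or the time budget runs out. call_gc is assumed to return True
    once a full collection cycle is complete."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        if call_gc():      # collector says it has nothing left to do
            return True
    return False           # budget exhausted before the collector finished
```

The boolean return is what prevents the wasteful-spinlock behavior: the loop exits as soon as the collector reports completion instead of burning the whole budget.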
4-stage implementation:
- oot
- oot core
- oot bytecode/oot assembly
- underlying platform (js, LLVM, etc)
What is the difference between Oot Core and Oot Assembly (Oot Bytecode), and why have both?
Second question first (why have Oot Bytecode in addition to Oot Core?)
In one sense, Oot Bytecode is just an implementation detail. Like Dalvik and the JVM, or Lua and LuaJIT, one could imagine that some implementations might eventually replace Oot bytecode with something else, especially since the design goal of Oot Bytecode is not efficiency. I am not sure if we even want to support Oot Bytecode as a standard, stable language the way that the JVM supports Java bytecode (although doing that seems to have worked well for Java; Java appears to have attracted third-party tooling partly because of this step). But here are some benefits of bytecode:
- introducing an additional abstraction layer provides easier portability: now you only have to rewrite the bytecode interpreter for each new platform, instead of rewriting the oot core interpreter. It remains to be seen how much easier this will be.
- could potentially enable more radical experiments with Oot: by introducing an additional abstraction layer, the primitives implemented in the bytecode are likely to be more decoupled than they may otherwise be from the form of Oot Core, which would make it easier for us to do experiments on Oot core, while being supported by things like Copy-On-Write and automatic memory management already implemented in the bytecode interpreter
- introducing an additional abstraction layer makes the code more readable and provides clarity about what services the interpreter must provide (eg Copy-On-Write, garbage collection), as well as about how Oot Core conceptually relates to the platform (eg introducing abstractions like stack frames)
- could even be a platform for multiple languages: just as there are multiple languages on the JVM and on the CLR, if we DO stabilize the bytecode, then the bytecode interpreter could in theory be a platform for languages other than Oot. However, historically, multiple languages tend not to build on the same VM until the language the VM was designed for becomes big, and in addition there always seems to be something wrong with other languages' VMs that makes them unsuitable for your new language. So this is unlikely. If it were to happen, though, it could help get more developer attention for the Oot VM.
- compiling to bytecode represents a 'semi-compilation' step after which we could save the bytecode to disk; this means that the expensive parsing of textual Oot can be done once, and in future runs an interpreter only has to interpret the bytecode (also, interpreting bytecode may be quicker than interpreting the AST directly).
However, the main reason i'm thinking of Oot bytecode is different. The main reason is that I am trying to think about what the small 'core' of Oot is, and one way to think of that is to think about what an Oot Assembly would look like (what are the primitive instructions? what flags are in the bytecode? do we have addressing modes, and if so, which ones?). In addition, i really just want to explore all of the main computational paradigms, and all of the simplest and most fundamental languages, and in a certain sense assembly is very fundamental; thinking about how an Oot bytecode and Oot assembly would look helps me appreciate the design problems and choices made by assembly languages, which gives me inspiration for Oot.
In addition, bytecode serves as a thought experiment for implementation issues. It helps me clarify my thinking early about issues like what does the stack look like, where are various things stored, what are the internal data representations of common types, etc.
Now, the other question; what is the difference between Oot Core and Oot Assembly?
- oot core is made of expression trees, while Oot Assembly is a flat, linear sequence of instructions
- like Haskell, the ordering of statements in Oot Core is not necessarily the order of execution (eg 'a = b*c; c = range(0,infinity)[3]*5; b = c*2; print a'); in Oot Assembly, the statements are in order of execution (except for branching/jumping instructions, of course)
- oot core is a graph made out of nodes, each of which can have arbitrary size, whereas Oot Bytecode instructions are of fixed size (not every bytecode format uses fixed-size instructions, but ours does; even the others tend to have a small fixed upper limit on instruction size)
- oot bytecode is 'imperativized': eg transaction annotations, exception-handling annotations, etc, are turned into instructions, for easy implementation
- long expressions like a.b.c.d.e, which must be executed in multiple steps, are turned into multiple steps in oot bytecode
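As a sketch of that last point, here is a toy lowering pass in Python that flattens a dotted chain into a linear sequence of fixed-size instructions; the opcode names (LOAD_VAR, GET_ATTR) and the temporary registers are hypothetical, not actual Oot bytecode:

```python
def lower_attr_chain(expr):
    """Lower a dotted chain like 'a.b.c.d.e' into a linear list of
    fixed-size (opcode, operand, destination) instructions, one
    attribute load per step. Opcode names are hypothetical."""
    base, *attrs = expr.split('.')
    instrs = [('LOAD_VAR', base, 't0')]
    for i, attr in enumerate(attrs):
        # each step reads the previous temporary and writes a new one
        instrs.append(('GET_ATTR', (f't{i}', attr), f't{i+1}'))
    return instrs
```

The point is that a single arbitrarily large Oot Core node becomes a flat run of uniform instructions, which is exactly the tree-to-linear transformation described above.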
---
i think we want to support both lexically-scoped closures, and dynamic scoping, and first-class call stacks.
so, instead of allocating 'stack frames' on the stack, we want to allocate them on the heap; call them 'activation records' or, my preferred term, 'activation frames'
in order to do lexical upvariables, we'll need these closures to remember the lexical context they came from, eg by using a saguaro stack: https://en.wikipedia.org/wiki/Parent_pointer_tree#Use_in_programming_language_runtimes
so even though the call stack (activation frame stack) is just a linear stack (and can handle the dynamic scoping), we still have to use a 'saguaro stack' to remember lexical scoping
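A minimal Python sketch of heap-allocated activation frames with parent pointers, assuming lexical lookup walks the saguaro-stack parent chain (the class and method names are hypothetical):

```python
class Frame:
    """Heap-allocated activation frame. The lexical_parent links form
    the saguaro (parent-pointer) tree used for lexical variable lookup;
    the dynamic call chain would be tracked separately by the interpreter."""
    def __init__(self, lexical_parent=None):
        self.vars = {}
        self.lexical_parent = lexical_parent

    def lookup(self, name):
        # walk outward through enclosing lexical scopes
        frame = self
        while frame is not None:
            if name in frame.vars:
                return frame.vars[name]
            frame = frame.lexical_parent
        raise NameError(name)
```

Because frames live on the heap, a closure can simply hold a reference to its defining frame, and that frame (and its parents) stays alive after the dynamic call returns.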
(and what about the tree-like 'handler stack'?)
---
in case i haven't written this down anywhere, the idea is that we make 5 things (2 of them repeated for each platform):
- for each platform, a naive/reference/toy implementation of Oot Core on that platform, written in another platform language that is (a) very simple, readable, and easy to follow, (b) concise, but not necessarily efficient.
- a naive/reference/toy implementation of Oot in Oot Core that is (a) very simple, readable, and easy to follow, (b) concise, but not necessarily efficient.
- a naive/reference/toy implementation of Oot in Oot that is (a) very simple, readable, and easy to follow, (b) concise, but not necessarily efficient. This implementation has hooks for overrides for the implementation of various things, and for optimization extensions (these can be: more efficient implementations of certain language constructs or standard library functions; more passes/replacing passes, and extra data and annotations generated during optimization). These overrides may be cross-platform or platform-specific.
- We then write an ugly, big, efficient implementation using these hooks; the rule being that any changes to language semantics must be done in the reference implementation too; the reference and the ugly implementations must have identical semantics, and the ugly implementation is only a more efficient implementation of the same language. The idea is that (a) the reference implementation can fit in each person's head, (b) the function of each component of the ugly implementation and its overall position in the compilation pipeline can be gleaned from where it is hooked in.
- for each platform, an efficient implementation of Oot Core in that platform, written in Oot
if desired, for some platforms an ugly, big, efficient implementation of Oot may then be created by starting with the naive Oot-in-Oot, adding the ugly-Oot-in-Oot-Core extensions, and then adding further platform-specific extensions.
The idea is that:
- Oot 'fits in your head'; all there is to understand is Oot Core, and then the simple reference implementation of Oot in Oot Core
- Oot Core 'fits in your head'; all there is to understand is the simple reference implementation of Oot Core on a platform that you already understand
- new language features are easy to understand, because they are first written in a simple (but possibly inefficient) way in the pure Oot reference implementation
- Easy portability: to bootstrap Oot on a new platform, port the simple reference implementation of Oot Core to it; that's all the coding you have to do. Then run the naive-Oot-in-Oot-Core implementation on that to get an inefficient Oot implementation. Then run the ugly-Oot-in-Oot extended version to get a more efficient Oot implementation.
- if you want to make some platform more efficient, add platform-specific overrides to ugly-Oot-in-Oot
Note that like Nock's 'jets', Oot and Oot Core both support libraries written in pure Oot that are then overridden by platform-specific implementations, which may be alternate hand-written Oot Core implementations, or implementations in some external language such as C.
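One way the override hooks might be wired up, sketched in Python; the registry, the hook names, and the string-concat example are all hypothetical, just to show how a platform-specific build could replace a reference implementation while keeping identical semantics:

```python
# Registry of installed overrides, keyed by construct/library-function name.
HOOKS = {}

def hook(name):
    """Decorator: register an override for a named construct or pass."""
    def register(fn):
        HOOKS[name] = fn
        return fn
    return register

def dispatch(name, default, *args):
    """Call the installed override if any, else the reference version."""
    return HOOKS.get(name, default)(*args)

def ref_string_concat(a, b):
    # simple reference semantics; this is what defines the language
    return a + b

# a platform-specific build might install a "faster" implementation,
# which by rule must have identical semantics to the reference one:
@hook('string_concat')
def fast_string_concat(a, b):
    return ''.join((a, b))
```

The rule from above is encoded in the structure: semantics live in the reference functions, and an override is only ever an alternate implementation looked up at dispatch time, so you can always delete the overrides and fall back to the readable version.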
---
someday write a backend for Boot for
http://www.capstone-engine.org/
---
The Implementation of Lua 5.0, section 5 ("Functions and Closures"), describes a simple implementation of closures using indirect pointers called 'upvalues'.
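A simplified Python sketch of the upvalue idea: an indirect cell that reads through to the enclosing frame's slot while that frame is live ('open'), and keeps its own copy of the value once the frame is popped ('closed'). The real Lua mechanism (a shared list of open upvalues, closed on scope exit) is more involved than this:

```python
class Upvalue:
    """Indirect pointer to a variable captured by a closure.
    While open, reads and writes go through to the enclosing frame's
    slot; after close(), the cell holds its own copy of the value."""
    def __init__(self, frame, name):
        self.frame, self.name = frame, name   # frame modeled as a dict
        self.closed, self.value = False, None

    def get(self):
        return self.value if self.closed else self.frame[self.name]

    def set(self, v):
        if self.closed:
            self.value = v
        else:
            self.frame[self.name] = v

    def close(self):
        # snapshot the slot and drop the frame reference
        self.value, self.closed, self.frame = self.frame[self.name], True, None
```

The indirection is what lets two closures over the same variable see each other's writes while the variable is live, without requiring the frame itself to stay on a contiguous stack.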
---