ideas-computer-jasper-jasperCompilationPipeline

Stages

user source filters -> scanner -> parser -> language reader macro application -> user reader macro application -> language compilation time macro application -> user compilation time macro application

Each of these steps is available to be called separately, and can be swapped out in a modular fashion.
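As a rough illustration of what "swapped out in a modular fashion" could mean, here is a minimal sketch in Python that treats each stage as a plain function and the pipeline as an ordered list of them. The names run_pipeline, make_stage and the placeholder stages are hypothetical; nothing here is an existing Jasper API, and the real stages would transform the program rather than just record their names.

```python
from typing import Any, Callable, List

Stage = Callable[[Any], Any]

def run_pipeline(stages: List[Stage], source: Any) -> Any:
    """Feed the output of each stage into the next, in order."""
    result = source
    for stage in stages:
        result = stage(result)
    return result

def make_stage(name: str) -> Stage:
    """Placeholder stage (hypothetical): appends its own name so the order is visible."""
    return lambda data: data + [name]

STAGE_NAMES = [
    "user source filters",
    "scanner",
    "parser",
    "language reader macro application",
    "user reader macro application",
    "language compilation time macro application",
    "user compilation time macro application",
]
default_stages = [make_stage(name) for name in STAGE_NAMES]

# Swapping out a stage is just replacing one entry of the list.
custom_stages = list(default_stages)
custom_stages[1] = make_stage("my scanner")
```

Because each stage is an ordinary function, it can also be called on its own, outside of run_pipeline.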

---

User source filters

The code can optionally be passed to user program(s) to transform.

Scanner

The purpose of the scanner is to turn our source code, which is a stream of characters, into a stream of tokens.

The output is a sequence of tokens. Tokens are divided into string constants, EOL, EMPTYLINES, other grouping symbols, numbers, and preidentifiers. Numbers start with a digit and consist of non-whitespace characters. Preidentifiers start with a letter or symbol and may contain anything except whitespace.
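A minimal sketch of such a scanner in Python follows. Several details are assumptions made only for illustration, since the notes do not pin them down: string constants are double-quoted with no escapes, the grouping symbols are ( ) [ ] { }, grouping symbols and quotes end a number or preidentifier, and a run of one or more blank lines becomes a single EMPTYLINES token.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)

TOKEN_RE = re.compile(r'''
    (?P<STRING>"[^"\n]*")              # assumed: double-quoted, no escapes
  | (?P<EMPTYLINES>\n(?:[ \t]*\n)+)    # a line break followed by blank line(s)
  | (?P<EOL>\n)
  | (?P<GROUP>[()\[\]{}])              # assumed set of grouping symbols
  | (?P<NUMBER>\d[^\s()\[\]{}"]*)      # starts with a digit, then non-whitespace
  | (?P<PREID>[^\s()\[\]{}"]+)         # starts with a letter or symbol
  | (?P<SKIP>[ \t\r]+)                 # whitespace within a line
''', re.VERBOSE)

def scan(source: str) -> List[Token]:
    """Turn a string of characters into a list of (kind, text) tokens."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"cannot scan at offset {pos}")
        kind = m.lastgroup
        if kind in ("EOL", "EMPTYLINES"):
            tokens.append((kind, ""))
        elif kind != "SKIP":
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens
```

For example, scan('x = 3px') yields a preidentifier x, a preidentifier =, and a number 3px; whether 3px is a meaningful number is left to later stages.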

---

Note that although the second-stage lexer, the stage 1 parser, and the stage 2 parser can most easily be explained as separate stages, they can most efficiently all be done at the same time, by processing the tokens sequentially. For this reason, they are not separate modules, but are all submodules of one module, the 'parser'. The user can still swap out these submodules, however (they are delineated by 'what do you do when you reach this token').

Second-stage lexer

The purpose of the second-stage lexer is to transform preidentifiers. Each preidentifier is transformed into a tightly grouped sequence of preidentifiers, a label, or an identifier. This stage exists to handle the interaction between whitespace and grouping given by the syntax rules, which say that symbols adjacent to alphanumeric identifiers without intervening whitespace are infix-bound to them.

now that we have broken up preidentifiers, we classify them:

output of second-stage lexer

Now our tokens are string constants, EOL, EMPTYLINES, other grouping symbols, numbers, uppercase labels, capitalized labels, and identifiers.
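The classification step could look roughly like the Python sketch below. The exact rules are not given in these notes, so the ones used here are assumptions: a piece whose letters are all uppercase is an uppercase label, a piece that merely starts with an uppercase letter is a capitalized label, and everything else (including pure symbols) is an identifier. The function name classify and the kind names are hypothetical.

```python
def classify(piece: str) -> str:
    """Classify one already-split preidentifier piece (assumed rules, see above)."""
    letters = [c for c in piece if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        return "UPPERCASE_LABEL"
    if piece[:1].isupper():
        return "CAPITALIZED_LABEL"
    return "IDENTIFIER"
```

Under these assumed rules a single capital letter such as X counts as an uppercase label; a different tie-break would be just as easy to write.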

Stage 1 Parser

The purpose of the stage 1 parser is to process the auto-grouping syntax rules.

Stage 2 Parser

The purpose of the stage 2 parser is to convert the sequence of tokens into a tree structure.

It simply iterates through the sequence, descending upon opening grouping constructs, and ascending upon closing grouping constructs. The } closing grouping construct may cause multiple ascents.
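The descend/ascend loop can be sketched in Python with an explicit stack. This is only an illustration under assumptions: tokens are plain strings, only ( ) and { } are handled, each node is a list whose first element records what kind of boundary it is, and the reason } can ascend more than once is taken to be that it implicitly closes any ( still open inside the block. The name build_tree is hypothetical.

```python
from typing import List

def build_tree(tokens: List[str]) -> list:
    """Build a nested-list tree from a flat token sequence."""
    root: list = ["root"]
    stack = [root]  # path from the root to the node currently being filled
    for tok in tokens:
        if tok in ("(", "{"):
            node = [tok]            # first element records the boundary type
            stack[-1].append(node)
            stack.append(node)      # descend
        elif tok == ")":
            if len(stack) == 1 or stack[-1][0] != "(":
                raise SyntaxError("unmatched )")
            stack.pop()             # ascend once
        elif tok == "}":
            # Assumed rule: } also closes any ( still open inside the block,
            # which is how one token can cause several ascents.
            while len(stack) > 1 and stack[-1][0] != "{":
                stack.pop()
            if len(stack) == 1:
                raise SyntaxError("unmatched }")
            stack.pop()
        else:
            stack[-1].append(tok)   # ordinary token: a leaf of the current node
    if len(stack) != 1:
        raise SyntaxError("unclosed grouping construct")
    return root
```

With these rules the token sequence { f ( a } produces a block node containing f and a parenthesized group containing a, even though the ( was never explicitly closed.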

output of parser

A tree structure containing tokens of the following types: string constants, numbers, uppercase labels, capitalized labels, and identifiers. The tree has various types of boundaries within it, including blocks and data structures. At this stage, whitespace and grouping constructs have been completely processed and are no longer present.

Note that, as of this stage, the only part of the Jasper language which has been processed is the low-level syntax needed to define identifiers and to do grouping. Everything before this point is fairly general and could easily be reused by another language that wanted to adopt Jasper's identifier and auto-grouping rules.

---

Language reader macro application

Identifiers which match a reader macro pattern in the language reader macro table (perhaps this matching is done at lexing time) have that reader macro applied to them.

These are macros which are integral to the syntax of the language, e.g. to grouping or to the interpretation of literals.
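One way to picture a reader macro table is as a list of (pattern, function) pairs that is walked over the parse tree. The Python sketch below assumes that patterns are regular expressions matched against whole identifiers, that the first matching entry wins, and that expansions are not re-scanned; none of this is decided in these notes. The #neg: macro is invented purely as an example.

```python
import re
from typing import Callable, List, Tuple, Union

Tree = Union[str, list]
MacroTable = List[Tuple[re.Pattern, Callable[[re.Match], Tree]]]

def apply_reader_macros(tree: Tree, table: MacroTable) -> Tree:
    """Return a copy of the tree with matching identifiers replaced by expansions."""
    if isinstance(tree, list):
        return [apply_reader_macros(child, table) for child in tree]
    for pattern, macro in table:
        m = pattern.fullmatch(tree)
        if m:
            return macro(m)   # first matching entry wins (assumed)
    return tree

# Invented example macro: `#neg:x` expands to the subtree ["negate", "x"].
example_table: MacroTable = [
    (re.compile(r"#neg:(.+)"), lambda m: ["negate", m.group(1)]),
]
```

The same function would serve both reader macro stages: it is called once with the language table and then again with the user table.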

Perhaps we should make use of Clojure's convention of # as a reader macro prefix?

todo

---

User reader macro application

Identifiers which match a reader macro pattern in the user reader macro table (perhaps this matching is done at lexing time) have that reader macro applied to them.

This gives the user a chance to transform their program after parsing but before the application of the language-level macros. This is discouraged unless the user really needs to extend the language in deep ways, because it may make the code harder for others to read by violating readers' expectations of how core language constructs work. Generally speaking, taking advantage of this feature can be thought of as the creation of a new dialect of Jasper, rather than merely a new library; the hope is that new language features created in this fashion will be used experimentally and then either be adopted into standard Jasper or discarded, rather than splintering the Jasper community (although inevitably some small communities will persist in using some of these dialects). If more prosaic uses of this feature are found, the aim is to expand Jasper to provide special-purpose metaprogramming functionality for those uses, so that the only remaining use of this feature will be experimental dialect creation. Note that by using this feature, you can override Jasper's reserved words.

---

Language compilation time macro application

The bulk of language features, reserved words, etc, are implemented here.


User compilation time macro application

Perhaps the user also has some macros that they want expanded at compile time.



Cross-compilation

In order to initially create Jasper, i suppose i'll have to code up the compilation pipeline in another language. I will put as much as possible into the Jasper reader and language-level macros, and as little as possible into the core language, to make this as easy as possible.

Shen has a similar approach and has already been ported to a wide variety of platforms, so maybe i should initially target Shen. Otoh, Clojure is more popular and targets the JVM, the CLR, and JavaScript, which are most of what i'd like anyhow (i'd also like Python; Shen has Ruby, but neither has Python).

see also http://blog.fogus.me/2012/04/25/the-clojurescript-compilation-pipeline/

here's the 45 primitive functions of Shen (a sublanguage called Klambda): http://www.lambdassociates.org/specification/shen_1.8.htm#The%20Primitive%20Functions%20of%20K%20Lambda

hmm, that looks good. Mb i should just target Klambda! na, may as well target Shen if targeting Klambda.. although should have a proto-Jasper that is comparable in power to Klambda.

So, i guess the plan is:

something i wrote on this earlier:

If i were to implement it, i would probably write a Jasper Core interpreter in Jasper Core, then a Jasper interpreter in Jasper (this would give a chance to write something in the language to see which other language changes should be made), then a Jasper Core->Haskell compiler in Haskell (at this point Jasper Core could be compiled to Haskell and hence run, and the JC interpreter could be debugged), then a Jasper->Jasper Core compiler in Jasper Core (at this point Jasper could be compiled to Haskell and hence run, and the full interpreter could be debugged), then a Jasper->Jasper Core compiler in Jasper, then a Jasper->Haskell compiler in Jasper (at this point the Haskell compiler code could be obsoleted and further changes to the language or the compilation to Haskell could be written in Jasper itself).