Bayle Shanks's website: ideas-computer-jasper-jasperCompilationPipeline

Stages

user source filters -> scanner -> parser -> language reader macro application -> user reader macro application -> language compilation time macro application -> user compilation time macro application

Each of these steps is available to be called separately, and can be swapped out in a modular fashion.
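
A minimal sketch of what "swappable in a modular fashion" could look like, assuming each stage is just a function from its input to its output. The names run_pipeline, strip_trailing_spaces, naive_scanner and shouting_scanner are hypothetical placeholders, not part of any Jasper implementation.

```python
from functools import reduce
from typing import Callable, List

# Assumption: every stage is a plain function; the pipeline is their composition.
Stage = Callable[[object], object]


def run_pipeline(stages: List[Stage], source: object) -> object:
    """Feed the source through each stage in order."""
    return reduce(lambda value, stage: stage(value), stages, source)


# Hypothetical stand-ins for a user source filter and a scanner.
def strip_trailing_spaces(text: str) -> str:
    return "\n".join(line.rstrip() for line in text.split("\n"))


def naive_scanner(text: str) -> list:
    return text.split()


def shouting_scanner(text: str) -> list:
    return [word.upper() for word in text.split()]


default_stages = [strip_trailing_spaces, naive_scanner]
# Swapping out one stage means replacing one entry in the list.
custom_stages = [strip_trailing_spaces, shouting_scanner]

print(run_pipeline(default_stages, "f x  \ng y"))  # ['f', 'x', 'g', 'y']
print(run_pipeline(custom_stages, "a b"))          # ['A', 'B']
```

Each stage can also be called on its own, e.g. naive_scanner("f x"), which is the other property asked for above.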

---

User source filters

The code can optionally be passed to user program(s) to transform.

Scanner

The purpose of the scanner is to turn our source code, which is a stream of characters, into a stream of tokens.

throw out comments
take out string constants
a valid token is a sequence that does not have whitespace; or an EOL; or an EMPTYLINES (two or more newlines separated by nothing but whitespace)

The output is a sequence of tokens. Tokens are divided into string constants, EOL, EMPTYLINES, other grouping symbols, numbers, and preidentifiers. Numbers start with a digit and consist of non-whitespace characters. Preidentifiers start with a letter or a symbol and contain anything except whitespace.
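
The scanner rules above can be sketched roughly as follows. Several things here are assumptions and not decided anywhere in these notes: comments are taken to run from // to end of line, string constants are double-quoted with no escapes, and a grouping symbol only counts when it stands alone between whitespace. The function name scan and the (kind, text) token shape are also hypothetical.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)

# Order matters: earlier alternatives win at a given position.
_PATTERNS = [
    ("COMMENT", r"//[^\n]*"),           # ASSUMED comment syntax
    ("STRING", r'"[^"\n]*"'),           # ASSUMED string syntax, no escapes
    ("EMPTYLINES", r"\n(?:[ \t]*\n)+"), # 2+ newlines separated only by blanks
    ("EOL", r"\n"),
    ("SPACE", r"[ \t]+"),
    ("WORD", r'[^\s"]+'),               # any run without whitespace or quotes
]
_TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in _PATTERNS))
GROUPING = {"(", ")", "{", "}", "[", "]"}  # ASSUMED set of grouping symbols


def scan(source: str) -> List[Token]:
    tokens: List[Token] = []
    for match in _TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("COMMENT", "SPACE"):
            continue  # comments and blanks are thrown out
        if kind in ("EOL", "EMPTYLINES"):
            tokens.append((kind, ""))
        elif kind == "STRING":
            tokens.append((kind, text[1:-1]))  # drop the quotes
        elif text in GROUPING:
            tokens.append(("GROUP", text))
        elif text[0].isdigit():
            tokens.append(("NUMBER", text))
        else:
            tokens.append(("PREIDENT", text))
    return tokens


for token in scan('x = 3 // note\n\n\n"hi there" y++z\n'):
    print(token)
```

Note that y++z comes out as a single preidentifier; breaking it up is left to the second-stage lexer.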

---

Note that although the second-stage lexer, the stage 1 parser, and the stage 2 parser can most easily be explained as separate stages, they can most efficiently all be done at the same time, by processing the tokens sequentially. For this reason, they are not separate modules, but are all submodules of one module, the 'parser'. The user can still swap out these submodules, however (they are delineated by 'what do you do when you reach this token').

Second-stage lexer

The purpose of the second-stage lexer is to transform preidentifiers. Preidentifiers are transformed into tightly grouped sequences of preidentifiers, into labels, and into identifiers. The reason this stage exists is to process the interaction between whitespace and grouping given by the syntax rules, which say that symbols next to alphanumeric identifiers without intervening whitespace are infix-bound to them.

convert all preidentifiers with two or more dashes as follows: x--y --> (-- x y)
convert preidentifiers that start with a symbol and contain alphanumerics as follows e.g. ++ar --> (++ ar) (note: you may have expected something like {++ _ ar}, e.g. partial application of ++ as if it were infix and the second argument were not specified, but that would mess up things like -3; use _++ar if you want to partially apply ar as the second argument)
break up preidentifiers that start with a letter and end with symbols as follows, e.g. abc++ -> (++ abc)
break up preidentifiers that start with a letter and contain symbols and then more alphanumerics as follows, e.g. abc++efg -> (++ abc efg)
break up preidentifiers recursively (should we right or left associate?), e.g. abc++efg++hij --> (++ abc efg++hij) --> (++ abc (++ efg hij))
tight infixification of single identifiers surrounded by attached parens
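
A rough sketch of the break-up rules above (all but the last one, tight infixification around attached parens, which is not covered). It assumes right association for the recursive case, following the worked example, and assumes that _ counts as an alphanumeric so that _++ar yields (++ _ ar). The name break_up and the use of nested tuples for the grouped form are hypothetical.

```python
import re
from typing import Tuple, Union

Grouped = Union[str, Tuple]

# Split into alternating runs of word characters and symbol characters.
_RUNS = re.compile(r"\w+|[^\w\s]+")


def _is_symbol(run: str) -> bool:
    return re.match(r"\w", run) is None


def _build(runs) -> Grouped:
    if len(runs) == 1:
        return runs[0]
    if _is_symbol(runs[0]):
        # ++ar --> (++ ar)
        return (runs[0], _build(runs[1:]))
    if len(runs) == 2:
        # abc++ --> (++ abc)
        return (runs[1], runs[0])
    # abc++efg++hij --> (++ abc (++ efg hij)), i.e. right-associated (ASSUMED)
    return (runs[1], runs[0], _build(runs[2:]))


def break_up(preidentifier: str) -> Grouped:
    runs = _RUNS.findall(preidentifier)
    return _build(runs) if runs else preidentifier


print(break_up("x--y"))           # ('--', 'x', 'y')
print(break_up("++ar"))           # ('++', 'ar')
print(break_up("abc++"))          # ('++', 'abc')
print(break_up("abc++efg++hij"))  # ('++', 'abc', ('++', 'efg', 'hij'))
```

Switching to left association would only change the last branch of _build, which is one reason to keep the question open until there is real code to try it on.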

now that we have broken up preidentifiers, we classify them:

uppercase labels start with an uppercase letter and contain only
capitalized labels start with a capital letter and contain only alphanumerics
identifiers either start with a lowercase letter and contain only alphanumerics, or start with a symbol and contain only symbols and numbers

output of second-stage lexer

Now our tokens are string constants, EOL, EMPTYLINES, other grouping symbols, numbers, uppercase labels, capitalized labels, and identifiers.

Stage 1 Parser

The purpose of the stage 1 parser is to process the auto-grouping syntax rules.

auto-add colons (in some cases semicolons)
convert EMPTYLINES to opening and closing braces (note that beginning-of-file and end-of-file must be treated as EMPTYLINES for this purpose)
convert colons and semicolons to parens (but leave something in to indicate the sequentialness of semicolons)
loose infixification
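
One possible reading of the EMPTYLINES rule above, as a sketch: the beginning of the file opens a block, each EMPTYLINES closes the current block and opens the next, and the end of the file closes the last block. This interpretation, the function name emptylines_to_braces, and the use of plain strings as tokens are all assumptions; the other three steps are not covered.

```python
from typing import List

EMPTYLINES = "EMPTYLINES"


def emptylines_to_braces(tokens: List[str]) -> List[str]:
    body = list(tokens)
    # Beginning and end of file already act as EMPTYLINES, so explicit ones
    # at the edges would only produce empty blocks; drop them.
    while body and body[0] == EMPTYLINES:
        body.pop(0)
    while body and body[-1] == EMPTYLINES:
        body.pop()

    out = ["{"]                  # beginning-of-file treated as EMPTYLINES
    for token in body:
        if token == EMPTYLINES:
            out.extend(["}", "{"])  # close one block, open the next
        else:
            out.append(token)
    out.append("}")              # end-of-file treated as EMPTYLINES
    return out


print(emptylines_to_braces(["a", "b", EMPTYLINES, "c"]))
# ['{', 'a', 'b', '}', '{', 'c', '}']
```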

Stage 2 Parser

The purpose of the stage 2 parser is to convert the sequence of tokens into a tree structure.

It simply iterates through the sequence, descending upon opening grouping constructs, and ascending upon closing grouping constructs. The } closing grouping construct may cause multiple ascents.
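
The descend/ascend loop can be sketched with an explicit stack. The assumption made here about "multiple ascents" is that a } also closes any parens still open inside its block; the notes do not spell this out. Node, build_tree and the restriction to ( ) { } are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    opener: str  # "ROOT", "(" or "{"
    children: List[Union["Node", str]] = field(default_factory=list)


OPENERS = {"(", "{"}


def build_tree(tokens: List[str]) -> Node:
    root = Node("ROOT")
    stack = [root]  # path from the root to the node currently being filled
    for token in tokens:
        if token in OPENERS:
            node = Node(token)
            stack[-1].children.append(node)
            stack.append(node)           # descend
        elif token == ")":
            if stack[-1].opener != "(":
                raise SyntaxError("unmatched )")
            stack.pop()                  # ascend once
        elif token == "}":
            # ASSUMPTION: } implicitly closes parens left open in the block,
            # which is where the multiple ascents come from.
            while stack[-1].opener == "(":
                stack.pop()
            if stack[-1].opener != "{":
                raise SyntaxError("unmatched }")
            stack.pop()
        else:
            stack[-1].children.append(token)
    if len(stack) != 1:
        raise SyntaxError("unclosed " + stack[-1].opener)
    return root


tree = build_tree(["{", "f", "(", "g", "x", "}", "y"])
print(tree)
```

In the example, the single } ascends twice: once out of the open paren around g x, and once out of the block itself.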

output of parser

A tree structure with tokens in it, where the tokens are of type string constant, number, uppercase label, capitalized label, or identifier. The tree has various types of boundaries within it, including blocks and data structures. At this stage, whitespace and grouping constructs have been completely processed and are no longer present.

Note that, as of this stage, the only part of the Jasper language which has been processed is the low-level syntax needed to define identifiers and to do grouping. Everything before this point is fairly general and could easily be reused for another language that wanted to use Jasper's valid-identifier and auto-grouping rules.

---

Language reader macro application

Identifiers which match a reader macro pattern in the language reader macro table (perhaps this matching is done at lexing time) have that reader macro applied to them.

These are macros which are integral to the syntax of the language, e.g. to grouping or to the interpretation of literals.

Perhaps we should make use of Clojure's convention of # as a reader macro prefix?
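
A sketch of table-driven reader macro application, assuming the table is an ordered list of (pattern, function) pairs and the tree is nested lists of strings. The #name and name! patterns and their expansions are made up purely for illustration; nothing here is settled Jasper syntax.

```python
import re
from typing import Callable, List, Tuple, Union

Tree = Union[str, list]
ReaderMacro = Callable[[str], Tree]
MacroTable = List[Tuple[re.Pattern, ReaderMacro]]


def apply_reader_macros(tree: Tree, table: MacroTable) -> Tree:
    """Replace every identifier matching a table pattern by its expansion."""
    if isinstance(tree, list):
        return [apply_reader_macros(child, table) for child in tree]
    for pattern, macro in table:
        if pattern.fullmatch(tree):
            return macro(tree)  # first matching entry wins
    return tree


# HYPOTHETICAL language-level entry: #name --> (quote name)
language_table: MacroTable = [
    (re.compile(r"#\w+"), lambda ident: ["quote", ident[1:]]),
]
# HYPOTHETICAL user-level entry: name! --> (mutate name)
user_table: MacroTable = [
    (re.compile(r"\w+!"), lambda ident: ["mutate", ident[:-1]]),
]

tree = ["f", "#x", ["g", "z!"]]
# Language reader macros run first, then user reader macros, as in the pipeline.
expanded = apply_reader_macros(apply_reader_macros(tree, language_table), user_table)
print(expanded)  # ['f', ['quote', 'x'], ['g', ['mutate', 'z']]]
```

Using the same function with two different tables keeps the language and user stages separately swappable.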

todo

---

User reader macro application

Identifiers which match a reader macro pattern in the user reader macro table (perhaps this matching is done at lexing time) have that reader macro applied to them.

This gives the user a chance to transform their program after parsing but before the application of the language-level macros. This is discouraged unless the user really needs to extend the language in deep ways, because it may make the code harder for others to read by violating readers' expectations of how core language constructs work. Generally speaking, taking advantage of this feature can be thought of as the creation of a new dialect of Jasper, rather than merely a new library; the hope is that new language features created in this fashion will be used experimentally and then either be adopted into standard Jasper or discarded, rather than splintering the Jasper community (although inevitably some small communities will persist in using various of these dialects). If more prosaic uses of this feature are found, the aim is to expand Jasper to provide special-purpose metaprogramming functionality for those uses, so that the only remaining use of this feature will be experimental dialect creation. Note that by using this feature, you can override Jasper's reserved words.

---

Language compilation time macro application

The bulk of language features, reserved words, etc, are implemented here.

User compilation time macro application

Perhaps the user also has some macros that they want expanded at compile time.

Cross-compilation

In order to initially create Jasper, i suppose i'll have to code up the compilation pipeline in another language. I will put as much as possible into the Jasper reader and language-level macros, and as little as possible into the core language, to make this as easy as possible.

Shen has a similar approach and has already been ported to a wide variety of platforms, so maybe i should initially target Shen. Otoh, Clojure is more popular and targets JVM, CLR, and JavaScript, which are most of what i'd like anyhow (i'd also like Python; Shen has Ruby, but neither has Python).

here's the 45 primitive functions of Shen (a sublanguage called Klambda): http://www.lambdassociates.org/specification/shen_1.8.htm#The%20Primitive%20Functions%20of%20K%20Lambda

hmm, that looks good. Mb i should just target Klambda! na, may as well target Shen if targeting Klambda.. although should have a proto-Jasper that is comparable in power to Klambda.

So, i guess the plan is:

first thing to do is to write a Jasper interpreter in Jasper, to see what's missing and improve it before the language is implemented
then write a Jasper-to-proto-Jasper compiler
- now we have an interpreter in proto-Jasper
create a proto-Jasper interpreter in Shen
- now we have a Jasper interpreter that can run on various platforms
later on, if we have the need for speed, write a proto-Jasper interpreter directly on the platforms
later on, if we have the need for speed, write a Jasper compiler in Jasper

something i wrote on this earlier:

If i were to implement it, i would probably write a Jasper core interpreter in Jasper core, then a Jasper interpreter in Jasper (this would give a chance to write something in the language to see which other language changes should be made), then a Jasper Core->Haskell compiler in Haskell (at this point Jasper Core could be compiled to Haskell and hence run, and the JC interpreter could be debugged), then a Jasper->Jasper Core compiler in Jasper Core (at this point Jasper could be compiled to Haskell and hence run, and the full interpreter could be debugged), then a Jasper->Jasper Core compiler in Jasper, then a Jasper->Haskell compiler in Jasper (at this point the Haskell compiler code could be obsoleted and further changes to the language or the compilation to Haskell could be written in Jasper itself).