ideas-computer-jasper-jasperImplementationNotes1

Hmm, Guile can be embedded; perhaps target Guile. TODO.

---

Platform targets


Initial target

We plan to bootstrap by targeting a high-level language that is not necessarily embeddable in or interoperable with popular languages, but rather super-good at metaprogramming and hence well-suited for designing programming languages in. We'll write a Jasper interpreter in this. Then, once we are self-hosting (once we have a running Jasper interpreter written in Jasper), we'll write a Jasper-to-X compiler for the secondary target.

Candidates: Racket, Scala, Haskell.

Secondary target

To provide maximal interoperability, we'll want to compile Jasper to an existing high-level multiplatform language. So far Lua seems like the best choice, but asm.js ( http://asmjs.org/spec/latest/ ) is a close second (note that LLVM bitcode is compilable to asm.js).

Actually, an idea: to define RJasper, take the intersection of asm.js, the JVM, the CLR, LLVM, KLambda, RPython, Lua, C--, emscripten's LLVM subset ( https://github.com/kripken/emscripten/wiki/CodeGuidelinesAndLimitations ), Nock, and Dis. Or something like that. Maybe we want the union rather than the intersection, or, more properly speaking, something in between the two (e.g. each fundamental construct from each of these: not a minimal basis set, but not an exact embedding of each of these in the result either).

Other known potential candidates for the bootstrap target: Clojure, Guile, Shen, Scala, Fantom, Neko. The bootstrap target should probably support embedding in C, or else run on both the JVM and the CLR plus at least one other target.


Tertiary target

To provide maximal portability, we could also write a Jasper-to-C compiler (written in Jasper).

misc notes

http://stackoverflow.com/questions/809710/what-is-the-best-language-to-write-a-compiler-in

notes on interoperability between the JVM and the CLR:

http://stackoverflow.com/questions/3602091/call-clr-code-from-jvm?rq=1

http://stackoverflow.com/questions/11825220/jvm-over-clr-and-viceversa

http://xmlvm.org/clr2jvm/

http://www.badlogicgames.com/wordpress/?p=2583

For future reference I should note Neko VM and LLVM.

http://www.boredomandlaziness.org/2012_07_01_archive.html

http://stackoverflow.com/questions/2167610/what-is-a-good-vm-for-developing-a-hobby-language

http://tratt.net/laurie/tech_articles/articles/fast_enough_vms_in_fast_enough_time

https://www.google.com/search?q=jvm+clr+vm+lua

https://www.google.com/search?q=jvm+clr+vm+neko

https://www.google.com/search?q=jvm+clr+vm+parrot

https://www.google.com/search?q=jvm+clr+vm+llvm

" make bootstrap

The 'bootstrap' target does not just compile GCC, but compiles it several times. It uses the programs compiled in a first round to compile itself a second time, and then again a third time. It then compares these second and third compiles to make sure it can reproduce itself flawlessly. This also implies that it was compiled correctly.

"

links about implementing languages in racket, scheme, scala, haskell

racket or scheme

the racket compiler is self-hosting

http://cs.brown.edu/courses/cs173/2012/book/ http://matt.might.net/articles/implementing-exceptions/ http://docs.racket-lang.org/guide/performance.html http://pl.barzilay.org/lec03.txt http://stackoverflow.com/questions/12345647/rewrite-this-script-by-designing-an-interpreter-in-racket http://cs.brown.edu/courses/cs173/2012/OnLine/ http://cs.brown.edu/courses/cs173/2012/Assignments/Python2/designs.html http://jeremykun.com/tag/racket/page/2/ http://stackoverflow.com/questions/10449052/peter-norvigs-regular-expression-compiler-from-udacity-course-rewritten-in-rack http://stackoverflow.com/questions/3345397/how-is-racket-different-than-scheme https://github.com/geoffhill/lc-compiler http://www.cs.indiana.edu/~dyb/pubs/nano-jfp.pdf

scala

the scala compiler is self-hosting

http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/ https://github.com/frolovv/scheme-interpreter-in-scala http://niklaskorz.de/post/40418904638/small-bf-interpreter-in-scala http://brianmckenna.org/blog/sexp_scala http://stackoverflow.com/questions/9064292/brainfuck-compiler-in-scala http://code.google.com/p/lua-compiler-in-scala/ http://tomlee.co/category/compiler-construction-with-scala-and-bcel/ http://tomlee.co/2008/07/jvm-compiler-construction-with-scala-and-bcel-part-15/ http://www.slideshare.net/thomaslee/open-source-compiler-construction-for-the-jvm-lca2011-miniconf http://www.furidamu.org/scalisp/pres/index.html www.irisa.fr/triskell/publis/2010/Fouquet10a.pdf about using scala as a target language

haskell

the haskell compiler is self-hosting

http://www.haskell.org/haskellwiki/Applications_and_libraries/Compilers_and_interpreters http://www.haskell.org/haskellwiki/Applications_and_libraries/Compiler_tools http://www.mail-archive.com/haskell-cafe@haskell.org/msg59160.html

a 1987 book by one of the main Haskell guys (Simon Peyton Jones) about how to compile functional programming languages: http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/

http://stackoverflow.com/questions/5451645/implementing-lazy-functional-languages http://bloggingmath.wordpress.com/2010/01/20/writing-a-compiler-in-haskell-compiler-series-part-i/ http://propella.blogspot.com/2009/04/prolog-in-haskell.html http://jonathan.tang.name/files/scheme_in_48/tutorial/overview.html http://www.defmacro.org/ramblings/lisp-in-haskell.html https://github.com/azylman/scheme-interpreter-in-haskell http://codereview.stackexchange.com/questions/18981/a-stack-based-language-interpreter-in-haskell http://stackoverflow.com/questions/6271285/does-haskell-have-something-like-gensym-in-racket http://stackoverflow.com/questions/6299406/code-generation-for-compiler-in-haskell http://stackoverflow.com/questions/5234398/is-there-a-book-which-teaches-about-compilers-using-haskell http://alephnullplex.appspot.com/blog/view/2010/01/12/lbach-1-introduction https://github.com/wecing/compiler-in-haskell https://github.com/adrianomelo/c-compiler-in-haskell http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.139.1175 http://blog.poucet.org/2007/06/simplistic-compiler-in-haskell/ http://stackoverflow.com/questions/679815/haskell-how-to-best-to-represent-a-programming-languages-grammar http://www.cs.uu.nl/wiki/bin/view/Center/CompilerConstructionInHaskell http://www.cs.uu.nl/wiki/bin/view/Cco/CourseSchedule (taught by one of the architects of Utrecht Haskell Compiler) http://mahler.cse.unsw.edu.au/webcms2/course/index.php?cid=2202 https://github.com/thsutton/passingcuriosity.com/blob/master/_drafts/2011-01-24-compiler-construction-jvm.md http://home.adelphi.edu/sbloch/class/archive/272/spring2012/syllabus.html http://chplib.wordpress.com/2010/04/07/tock-a-compiler-for-concurrent-languages-written-in-haskell/ http://www.cse.unsw.edu.au/~dons/blog/2006/12/11#interpreters-with-reader-monads http://en.wikibooks.org/wiki/Haskell/Write_Yourself_a_Scheme_in_48_Hours http://www.mail-archive.com/haskell-cafe@haskell.org/msg59166.html http://www.cs.uu.nl/wiki/HUT/WebHome http://www.mail-archive.com/haskell-cafe@haskell.org/msg59358.html

general

http://stackoverflow.com/questions/1669/learning-to-write-a-compiler http://compilers.iecc.com/crenshaw/ https://class.coursera.org/compilers//lecture/preview http://www.dreamincode.net/forums/topic/268945-an-introduction-to-compiler-design-part-ii-parsing/

misc or unsorted

http://queue.acm.org/detail.cfm?id=2068896

http://matt.might.net/articles/implementing-a-programming-language/

http://jeremykun.com/2011/09/16/chai-basic/

http://www.furidamu.org/blog/2012/03/23/scalisp-a-lisp-interpreter-in-scala/ http://code.google.com/p/scalalogic/

http://offthelip.org/?p=123

http://stackoverflow.com/questions/5451645/implementing-lazy-functional-languages

http://www.cse.chalmers.se/edu/year/2011/course/TIN321/lectures/proglang-08.html

http://code.google.com/p/mjollnir/ http://matt.might.net/articles/cek-machines/ http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours http://news.ycombinator.com/item?id=4764088 http://blog.yuvimasory.com/2011/03/parser-combinators-and-interpreters-in.html

https://github.com/dwayne?tab=repositories https://bitbucket.org/rohinmshah/forth-interpreter/overview#followers http://lambda.jimpryor.net/lambda_evaluator/ http://hashcollision.org/whalesong/

random compiler course using llvm: http://www.cse.chalmers.se/edu/year/2012/course/TDA282/lectures.html http://llvm.org/docs/tutorial/OCamlLangImpl1.html

http://asmjs.org/spec/latest/

(after a post in which seanmcdirmid asserts that MIMD (task parallelism) doesn't scale and SIMD (data parallelism) does:)

seanmcdirmid:
> but MIMD is more easily applied. Almost any application can be refactored to spawn lightweight threads for many calculations without any explicit knowledge of the π-calculus.

MIMD hasn't been shown to scale, and it's not locking that is the problem, but I/O and memory.

> To your point and for a well-understood example, make -j doesn't always result in faster compilations. It may if you have the ability to leverage imbalances in CPU and storage hierarchies, but you can also kill your performance with context switches (including disk seeking).

When Martin Odersky began pushing actors as a solution to Scala and multi-core, this was my immediate thought: the Scala compiler is slow, can this make it go faster? It was pretty obvious after some thought that the answer was no. But then we have no way yet of casting compilation as a data-parallel task (a point in favor of task parallelism, but it doesn't help us much).


The Spineless Tagless G-Machine — NOT! Kevin Hammond

https://github.com/jashkenas/coffee-script/wiki/List-of-languages-that-compile-to-JS

---

http://metalua.luaforge.net/

---

" A Wealth of Information I feel that there is a lot of stuff -- low level C like languages like C-- and VMs like Parrot -- that I'm not aware of. It would be great if there's was a summary of the different options for little language implementers like myself. Is CPS the "trick" used to interpret/compile functional languages nowadays or are there other options? In praticular I'm interested in different options for a language with eager/strict semantics. By Johan Tibell at Thu, 2006-07-13 19:39

A-Normal Form
    Is CPS the "trick" used to interpret/compile functional languages nowadays or are there other options? 

One other option is A-normal form (ANF), which stands for administrative normal form (A-normal form is impossible to google for). See the thread Retrospective: The Essence of Compiling with Continuations. The retrospective paper is a one-page summary paper which puts ANF in context w.r.t. CPS. The original paper, also linked from that thread, provides the detailed spec for ANF. The thread also contains a link to a comp.lang.scheme thread with some discussion of the two.

Jens Axel Soegaard posted a short Scheme program on the Scheme Cookbook which demonstrates Transforming CoreScheme into A-Normal Form. By Anton van Straaten at Sat, 2006-07-15 10:54

" ---
login or register to post comments
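A tiny illustration of the ANF idea above, in C (the names f, g, h, x, y are placeholders, not from the thread): A-normal form just means every intermediate result is bound to a name, so the order of evaluation becomes explicit.

    int g(int a) { return a + 1; }
    int h(int a) { return a * 2; }
    int f(int a, int b) { return a + b; }

    int main(void) {
        int x = 1, y = 2;
        /* nested form: the order of g(x) vs h(y) is implicit */
        int r1 = f(g(x), h(y));
        /* A-normal form: each intermediate value is bound to a
           fresh temporary, making evaluation order explicit */
        int t1 = g(x);
        int t2 = h(y);
        int r2 = f(t1, t2);
        return r1 == r2 ? 0 : 1;
    }

---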

http://lambda-the-ultimate.org/node/1617#comment-19661

"

Language-independent VMs

Designing a good language-independent VM is much harder than it might seem. In particular I've seen no VM which I would like to target my Kogut compiler with. My compiler currently produces C code.

Targeting a preexisting VM is practical only if we are willing to adapt the language to the VM, or if the language's semantics is already similar to that of an existing language for which the VM was designed.

The problem with targeting VMs, compared to targeting C or assembler, is that most VMs are high-level relative to the other choices, especially those which are interpreted rather than JIT-ed.

Besides the well-known issues of tail calls and continuations, VMs often include a runtime type system, garbage collection, calling functions with arguments, objects with fields, strings, dictionaries, typesafe arithmetic etc. It is unlikely that these operations match the semantics of an independently developed language. The type system will most likely not match at all. There are tons of problematic areas:

    How are functions with a statically unknown arity handled?
    Can objects other than functions be applied?
    How are object fields identified? Are they accessed when knowing the object type or not, possibly including subtyping? Are fields identified by strings or symbols, or indices computed at compile time? Is the set of fields known at object creation time, or is the semantics of field access resolved dynamically?
    What is the nature of characters, elements of strings? Unicode code points? UTF-16 code units? Bytes? Are they distinguished from 1-character strings? Are strings mutable, and can they change their length?
    How is hashing and equality used by a hash table specified?
    What is the semantics of weak references and finalizers? Is a finalizer specified when its associated object is created, or can it be added at any time to any object? May a finalizer refer to the object being finalized, or to other finalizable objects? Are finalizers run by other threads?
    What numeric types are available? What is the effect of overflowing fixed-size integers: returning bignums? an exception? wrapping modulo integer size? What is the semantics of integer division and remainder of negative numbers? What happens when you try to add objects which are not of a builtin numeric type?
    Which exceptions are thrown from particular erroneous situations? What happens when an array is accessed outside of bounds? What happens on integer and floating point division by zero? Are exceptions resumable? Is exception handling code in a tail position with respect to try?

Of course in each case the intended semantics can be somehow obtained from the primitives offered by the VM. But this often requires implementing a given source operation by many VM operations, most of which are not needed in typical cases (e.g. to translate error conditions and error signalling). Or it requires emulating features using completely different features for silly reasons (e.g. a language wants object fields to be identified by symbols whose names are sequences of Unicode code points, while a VM identifies them by mutable 8-bit strings compared by contents, or vice versa). This is a significant instruction-decoding overhead even if the VM took care to make individual opcodes fast. It is tempting to modify the language to suit the VM, which is a sign of weakness of the language implementation technology.

I suppose a faster result is obtained by targeting a VM which is either low-level and compiled into machine code, or designed with the given source language in mind.

For example NekoVM looks very far from my language Kogut. .NET and the JVM are different still, and so is Parrot. From these choices I suppose .NET would be best; it is sufficiently low-level. But I decided that it would be better to build the runtime myself in C, which is so close to the processor that the logic needed to obtain the semantics I wanted had a low overhead.

I don't advocate developing an entire compiler or interpreter in C. Here high level languages are better, especially for a compiler. I'm talking only about the runtime. This is painful too but this pain has a reason. By Qrczak at Fri, 2006-07-14 15:47

"
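To make one of the mismatches above concrete, here is a minimal sketch in C (my illustration, not from the comment) of what "overflowing fixed-size integers returns bignums" costs on a target that only has wrapping machine arithmetic: every source-level add becomes a check plus a slow path.

    #include <limits.h>
    #include <stdio.h>

    /* returns 0 on success; 1 when the source language's semantics
       (promote to bignum / raise an exception) would kick in */
    int checked_add(long a, long b, long *out) {
        if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b))
            return 1;  /* overflow: take the slow path */
        *out = a + b;
        return 0;
    }

    int main(void) {
        long r;
        if (checked_add(LONG_MAX, 1, &r))
            printf("overflow: would promote to bignum here\n");
        else
            printf("%ld\n", r);
        return 0;
    }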

--

"

Functional backend

First thing to do is ask yourself, how much, and which type of fun do I want to have? Writing the whole thing in C (like popular php, perl, python and ruby) can be frustrating.

For example, if you wanted to be able to call functions written in python, you might consider a Johan-Lang to Python translator, and call Python as your assembler. :) Then you wouldn't have to deal with bytecode generation, and you'd have a huge library at your nascent language's fingertips.

Camlp4 Tutorial

Definitely read lots of sources before you start. There are some great ideas in those interpreters (and in some cases, bucketfuls of pain).

edit: You said small, functional language written in C, and I left out possibly your best resources: 1001 Scheme implementations! http://www.cs.indiana.edu/scheme-repository/imp.html http://www.schemers.org/Implementations/ http://www.aracnet.com/~briand/scheme_eval.html By Adam K at Thu, 2006-07-13 19:43

"

-- http://lambda-the-ultimate.org/node/1617#comment-19637

"

One interesting approach

One interesting approach might be to use the techniques from From Interpreter to Compiler and Virtual Machine and A Functional Correspondence between Evaluators and Abstract Machines. It should be straightforward how to represent an AST in C: it is just a labelled tree. Defunctionalization (not really defined but illustrated; ask Google if you want a more formal approach) is one of apparently three* methods for implementing higher-order languages in first-order ones. It should be pretty clear how the final abstract or virtual machines can be implemented in C, especially without the worry of GC (which these papers don't cover).

"
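A minimal sketch of defunctionalization in C, for a made-up toy program with two lambdas (all names here are for illustration only): each lambda becomes a tag plus its captured variables, and a single first-order apply() dispatches on the tag.

    #include <stdio.h>
    #include <stdlib.h>

    /* the two "lambdas" of the toy source program:
       (lambda (x) (+ x n))   and   (lambda (x) (* x x)) */
    enum tag { ADD_N, SQUARE };

    /* a defunctionalized closure: a tag plus captured variables */
    struct closure {
        enum tag tag;
        int n;             /* captured by ADD_N; unused by SQUARE */
    };

    /* the single first-order apply function */
    int apply(struct closure *c, int x) {
        switch (c->tag) {
        case ADD_N:  return x + c->n;
        case SQUARE: return x * x;
        }
        abort();  /* unreachable: all tags handled above */
    }

    int main(void) {
        struct closure add5 = { ADD_N, 5 };
        struct closure sq   = { SQUARE, 0 };
        printf("%d %d\n", apply(&add5, 3), apply(&sq, 3));  /* 8 9 */
        return 0;
    }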

"

I've decided to go with an all C interpreter

After reading about a few different VMs and intermediate languages recommended here I've decided to try to build something in C first. I believe that it'll be interesting and instructive for me. I've found a number of different papers and articles on interpreter implementation, but all are written using higher-level languages (such as Haskell and OCaml). What I need is some practical hints on implementing a strict lambda calculus in C. More specifically I need some pointers on AST representations and value representation (i.e. not fully applied functions/lambdas/closures).

When I wrote an interpreter in Haskell I used something like this (simplified):

data Exp = Lam Id Exp
         | App Exp Exp
         | Var Id
         | Lit Int

And for values:

data Value = VFun (Value -> Eval Value)
           | VInt Int

I.e. embedding functions directly in Haskell, which won't be possible using C (or will it?). So how to do all that trampolining, CPS and closure stuff in C? Any good resources out there?

P.S. I will probably use the Boehm GC as it looks mature enough and I don't want to write a GC as well. By Johan Tibell at Mon, 2006-07-17 15:16

LC in 90 minutes
    So how to do all that trampolining, CPS and closure stuff in C?

The 90 Minute Scheme to C compiler might be helpful, although it describes a compiler, not an interpreter. Scheme is close enough to lambda calculus that the issues you mention are essentially the same at the implementation level. By Anton van Straaten at Mon, 2006-07-17 15:26

login or register to post comments

"

"

Boehm GC

    P.S. I will probably use the Boehm GC as it looks mature enough and I don't want to write a GC as well. 

I have used the Boehm GC to keep things tidy in the AST generation and processing part of a compiler written in C. Saves a lot of bother.

I am not too sure whether the Boehm GC is well suited for memory management in functional languages, which typically allocate many small, short-lived objects, e.g. cons cells. This is not a typical pattern in C as such. The reason for my misgivings is that a Bigloo Scheme programme I was working on spent over 90% of its time in GC, and was slower than the same programme written in Python or Lua, and far slower than Ocaml. Bigloo Scheme compiles to C and uses the Boehm GC. Other programmes I tried out in Bigloo Scheme, which were less list intensive, ran pretty much as fast as C. Take this with a pinch of salt. As I am not an experienced Scheme programmer, I may have written the Scheme programme incorrectly in some way. By Glyn Adgie at Tue, 2006-07-18 13:54

It might be worthwhile to look at TinyScheme

If you can wrap your head around the code, that particular interpreter is very tight, and it has stood the test of time fairly well. Don't assume that it has support for GC right -- last time I looked, there were bugs, but if you contact me offline I can describe the issues for you.

Boehm GC is not a good place to start if you ultimately want precise GC. It leaks like a proverbial sieve. Also, if you are writing an interpreter you don't need something like Boehm, because you are in a position to easily keep track of all the references. My suggestion on this would be to try hard for a precise GC. The results are worth it.

Feel free to contact me offline about this. By shap at Tue, 2009-04-28 20:09

"
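For reference, the Boehm GC usage pattern discussed above, as a minimal sketch (assumes libgc is installed; compile with -lgc): replace malloc with GC_MALLOC and never call free, and deliberately churn through the small short-lived cons cells the comment above worries about.

    #include <gc.h>
    #include <stdio.h>

    struct cons { int car; struct cons *cdr; };

    int main(void) {
        GC_INIT();
        struct cons *keep = NULL;
        /* allocate many small, short-lived cells -- the cons-heavy
           allocation pattern the comment above worries about */
        for (int i = 0; i < 1000000; i++) {
            struct cons *c = GC_MALLOC(sizeof *c);
            c->car = i;
            c->cdr = keep;
            keep = (i % 1000 == 0) ? c : keep;  /* most cells die young */
        }
        printf("last kept car: %d\n", keep->car);
        return 0;
    }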

"

--

notes about llvm

(a) asm.js is fast; (b) LLVM can compile to asm.js via emscripten

stuff built on top of llvm:

http://vmkit.llvm.org/

clojure and llvm: https://groups.google.com/forum/#!topic/clojure/KrwtTsdYZ8I

http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospective.html

" Unfortunately, LLVM in its current state is really designed as a static compiler optimizer and back end. LLVM code generation and optimization is good but expensive. The optimizations are all designed to work on IR generated by static C-like languages. Most of the important optimizations for optimizing Python require high-level knowledge of how the program executed on previous iterations, and LLVM didn't help us do that. An example of needing to apply high-level knowledge to code generation is optimizing the Python stack access. LLVM will not fold loads from the Python stack across calls to external functions (ie the CPython runtime, so all the time). We eventually wrote an alias analysis to solve this problem, but it's an example of what you have to do if you don't roll your own code generator. LLVM also comes with other constraints. For example, LLVM doesn't really support back-patching, which PyPy? uses for fixing up their guard side exits. It's a fairly large dependency with high memory usage, but I would argue that based on the work Steven Noonan did for his GSOC that it could be reduced, especially considering that PyPy?'s memory usage had been higher. "

looks like LLVM is working on it, and by the time I need it, it will be there:

https://groups.google.com/forum/#!topic/llvm-dev/oJtjkJPaHmc

http://stackoverflow.com/questions/6833068/why-is-llvm-considered-unsuitable-for-implementing-a-jit

http://www.ffconsultancy.com/ocaml/hlvm/

"


Interesting. Your work is on statically typed functional languages, where the complaints I heard (linked in the post) were from people implementing dynamically typed imperative/OO languages. I wonder whether the typing or the functionalness has a bigger impact. – Sean McMillan
"