hmm, Guile can be embedded; perhaps target Guile. todo.
---
Lower priority targets:
We plan to bootstrap by targeting a high-level language that is not necessarily embeddable in or interoperable with popular languages, but is instead very good at metaprogramming and hence well-suited for designing programming languages in. We'll write a Jasper interpreter in this. Then, once we become self-hosting (once we have a running Jasper interpreter written in Jasper), we'll write a Jasper-to-X compiler for the secondary target.
Candidates: Racket, Scala, Haskell.
To provide maximal interoperability, we'll want to compile Jasper to an existing high-level multiplatform language. So far Lua seems like the best choice, but http://asmjs.org/spec/latest/ is close (note that LLVM bitcode is compilable to asm.js).
Actually, an idea: to define RJasper, take the intersection of asm.js, the JVM, the CLR, LLVM, KLambda, RPython, Lua, C--, emscripten's supported LLVM subset ( https://github.com/kripken/emscripten/wiki/CodeGuidelinesAndLimitations ), Nock, and Dis, or something like that. Maybe we want the union rather than the intersection, or more properly speaking, something in between the two (e.g. each fundamental construct from each of these; not a minimal basis set, but not an exact embedding of each of these in the result either).
Other known potential candidates for the bootstrap target: Clojure, Guile, Shen, Scala, Fantom, Neko. The bootstrap target should probably support C embedding, or else target the JVM, the CLR, and at least one other platform.
Pros and cons:
To provide maximal portability, we could also write a Jasper-to-C compiler (written in Jasper).
http://stackoverflow.com/questions/809710/what-is-the-best-language-to-write-a-compiler-in
notes on interoperability between the JVM and the CLR:
http://stackoverflow.com/questions/3602091/call-clr-code-from-jvm?rq=1
http://stackoverflow.com/questions/11825220/jvm-over-clr-and-viceversa
http://www.badlogicgames.com/wordpress/?p=2583
For future reference I should note NekoVM and LLVM
http://www.boredomandlaziness.org/2012_07_01_archive.html
http://stackoverflow.com/questions/2167610/what-is-a-good-vm-for-developing-a-hobby-language
http://tratt.net/laurie/tech_articles/articles/fast_enough_vms_in_fast_enough_time
" make bootstrap
The 'bootstrap' target does not just compile GCC, but compiles it several times. It uses the programs compiled in a first round to compile itself a second time, and then again a third time. It then compares these second and third compiles to make sure it can reproduce itself flawlessly. This also implies that it was compiled correctly.
"
racket or scheme
the racket compiler is self-hosting
http://cs.brown.edu/courses/cs173/2012/book/ http://matt.might.net/articles/implementing-exceptions/ http://docs.racket-lang.org/guide/performance.html http://pl.barzilay.org/lec03.txt http://stackoverflow.com/questions/12345647/rewrite-this-script-by-designing-an-interpreter-in-racket http://cs.brown.edu/courses/cs173/2012/OnLine/ http://cs.brown.edu/courses/cs173/2012/Assignments/Python2/designs.html http://jeremykun.com/tag/racket/page/2/ http://stackoverflow.com/questions/10449052/peter-norvigs-regular-expression-compiler-from-udacity-course-rewritten-in-rack http://stackoverflow.com/questions/3345397/how-is-racket-different-than-scheme https://github.com/geoffhill/lc-compiler http://www.cs.indiana.edu/~dyb/pubs/nano-jfp.pdf
scala
the scala compiler is self-hosting
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/ https://github.com/frolovv/scheme-interpreter-in-scala http://niklaskorz.de/post/40418904638/small-bf-interpreter-in-scala http://brianmckenna.org/blog/sexp_scala http://stackoverflow.com/questions/9064292/brainfuck-compiler-in-scala http://code.google.com/p/lua-compiler-in-scala/ http://tomlee.co/category/compiler-construction-with-scala-and-bcel/ http://tomlee.co/2008/07/jvm-compiler-construction-with-scala-and-bcel-part-15/ http://www.slideshare.net/thomaslee/open-source-compiler-construction-for-the-jvm-lca2011-miniconf http://www.furidamu.org/scalisp/pres/index.html www.irisa.fr/triskell/publis/2010/Fouquet10a.pdf about using scala as a target language
haskell
the haskell compiler is self-hosting
http://www.haskell.org/haskellwiki/Applications_and_libraries/Compilers_and_interpreters http://www.haskell.org/haskellwiki/Applications_and_libraries/Compiler_tools http://www.mail-archive.com/haskell-cafe@haskell.org/msg59160.html
a 1987 book by one of the main Haskell guys about how to compile functional programming languages: http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/
http://stackoverflow.com/questions/5451645/implementing-lazy-functional-languages http://bloggingmath.wordpress.com/2010/01/20/writing-a-compiler-in-haskell-compiler-series-part-i/ http://propella.blogspot.com/2009/04/prolog-in-haskell.html http://jonathan.tang.name/files/scheme_in_48/tutorial/overview.html http://www.defmacro.org/ramblings/lisp-in-haskell.html https://github.com/azylman/scheme-interpreter-in-haskell http://codereview.stackexchange.com/questions/18981/a-stack-based-language-interpreter-in-haskell http://stackoverflow.com/questions/6271285/does-haskell-have-something-like-gensym-in-racket http://stackoverflow.com/questions/6299406/code-generation-for-compiler-in-haskell http://stackoverflow.com/questions/5234398/is-there-a-book-which-teaches-about-compilers-using-haskell http://alephnullplex.appspot.com/blog/view/2010/01/12/lbach-1-introduction https://github.com/wecing/compiler-in-haskell https://github.com/adrianomelo/c-compiler-in-haskell http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.139.1175 http://blog.poucet.org/2007/06/simplistic-compiler-in-haskell/ http://stackoverflow.com/questions/679815/haskell-how-to-best-to-represent-a-programming-languages-grammar http://www.cs.uu.nl/wiki/bin/view/Center/CompilerConstructionInHaskell http://www.cs.uu.nl/wiki/bin/view/Cco/CourseSchedule (taught by one of the architects of Utrecht Haskell Compiler) http://mahler.cse.unsw.edu.au/webcms2/course/index.php?cid=2202 https://github.com/thsutton/passingcuriosity.com/blob/master/_drafts/2011-01-24-compiler-construction-jvm.md http://home.adelphi.edu/sbloch/class/archive/272/spring2012/syllabus.html http://chplib.wordpress.com/2010/04/07/tock-a-compiler-for-concurrent-languages-written-in-haskell/ http://www.cse.unsw.edu.au/~dons/blog/2006/12/11#interpreters-with-reader-monads http://en.wikibooks.org/wiki/Haskell/Write_Yourself_a_Scheme_in_48_Hours http://www.mail-archive.com/haskell-cafe@haskell.org/msg59166.html http://www.cs.uu.nl/wiki/HUT/WebHome http://www.mail-archive.com/haskell-cafe@haskell.org/msg59358.html
general
http://stackoverflow.com/questions/1669/learning-to-write-a-compiler http://compilers.iecc.com/crenshaw/ https://class.coursera.org/compilers//lecture/preview http://www.dreamincode.net/forums/topic/268945-an-introduction-to-compiler-design-part-ii-parsing/
misc or unsorted
http://queue.acm.org/detail.cfm?id=2068896
http://matt.might.net/articles/implementing-a-programming-language/
http://jeremykun.com/2011/09/16/chai-basic/
http://www.furidamu.org/blog/2012/03/23/scalisp-a-lisp-interpreter-in-scala/ http://code.google.com/p/scalalogic/
http://www.cse.chalmers.se/edu/year/2011/course/TIN321/lectures/proglang-08.html
http://code.google.com/p/mjollnir/ http://matt.might.net/articles/cek-machines/ http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours http://news.ycombinator.com/item?id=4764088 http://blog.yuvimasory.com/2011/03/parser-combinators-and-interpreters-in.html
https://github.com/dwayne?tab=repositories https://bitbucket.org/rohinmshah/forth-interpreter/overview#followers http://lambda.jimpryor.net/lambda_evaluator/ http://hashcollision.org/whalesong/
random compiler course using llvm: http://www.cse.chalmers.se/edu/year/2012/course/TDA282/lectures.html http://llvm.org/docs/tutorial/OCamlLangImpl1.html
(after a post in which seanmcdirmid asserts that MIMD (task parallelism) doesn't scale and SIMD (data parallelism) does:
seanmcdirmid 4 hours ago
> but MIMD is more easily applied. Almost any application can be refactored to spawn lightweight threads for many calculations without any explicit knowledge of the π-calculus.
MIMD hasn't been shown to scale, and it's not locking that is the problem, but I/O and memory.
> To your point and for a well-understood example, make -j doesn't always result in faster compilations. It may if you have the ability to leverage imbalances in CPU and storage hierarchies, but you can also kill your performance with context switches (including disk seeking).
When Martin Odersky began pushing actors as a solution to Scala and multi-core, this was my immediate thought: the Scala compiler is slow, can this make it go faster? It was pretty obvious after some thought that the answer was no. But then we have no way yet of casting compilation as a data-parallel task (a point in favor of task parallelism, but it doesn't help us much).
The Spineless Tagless G-Machine — NOT! Kevin Hammond
https://github.com/jashkenas/coffee-script/wiki/List-of-languages-that-compile-to-JS
---
" A Wealth of Information I feel that there is a lot of stuff -- low level C like languages like C-- and VMs like Parrot -- that I'm not aware of. It would be great if there's was a summary of the different options for little language implementers like myself. Is CPS the "trick" used to interpret/compile functional languages nowadays or are there other options? In praticular I'm interested in different options for a language with eager/strict semantics. By Johan Tibell at Thu, 2006-07-13 19:39
Is CPS the "trick" used to interpret/compile functional languages nowadays or are there other options?
One other option is A-normal form (ANF), which stands for administrative normal form (A-normal form is impossible to google for). See the thread Retrospective: The Essence of Compiling with Continuations. The retrospective paper is a one-page summary paper which puts ANF in context w.r.t. CPS. The original paper, also linked from that thread, provides the detailed spec for ANF. The thread also contains a link to a comp.lang.scheme thread with some discussion of the two.
Jens Axel Soegaard posted a short Scheme program on the Scheme Cookbook which demonstrates Transforming CoreScheme into A-Normal Form. By Anton van Straaten at Sat, 2006-07-15 10:54
http://lambda-the-ultimate.org/node/1617#comment-19661
"
Language-independent VMs
Designing a good language-independent VM is much harder than it might seem. In particular I've seen no VM which I would like to target my Kogut compiler with. My compiler currently produces C code.
Targeting a preexisting VM is practical only if we are willing to adapt the language to the VM, or if the language's semantics is already similar to an existing language for which the VM was designed.
The problem with targeting VMs, compared to targeting C or assembler, is that most VMs are high-level relative to the other choices, especially those which are interpreted rather than JIT-ed.
Besides the well-known issues of tail calls and continuations, VMs often include a runtime type system, garbage collection, calling functions with arguments, objects with fields, strings, dictionaries, typesafe arithmetic etc. It is unlikely that these operations match the semantics of an independently developed language. The type system will most likely not match at all. There are tons of problematic areas:
How are functions with a statically unknown arity handled?
Can objects other than functions be applied?
How are object fields identified? Are they accessed when knowing the object type or not, possibly including subtyping? Are fields identified by strings or symbols, or indices computed at compile time? Is the set of fields known at object creation time, or is the semantics of field access resolved dynamically?
What is the nature of characters, elements of strings? Unicode code points? UTF-16 code units? Bytes? Are they distinguished from 1-character strings? Are strings mutable, and can they change their length?
How is hashing and equality used by a hash table specified?
What is the semantics of weak references and finalizers? Is a finalizer specified when its associated object is created, or can it be added at any time to any object? May a finalizer refer to the object being finalized, or to other finalizable objects? Are finalizers run by other threads?
What numeric types are available? What is the effect of overflowing fixed-size integers: returning bignums? an exception? wrapping modulo integer size? What is the semantics of integer division and remainder of negative numbers? What happens when you try to add objects which are not of a builtin numeric type?
Which exceptions are thrown from particular erroneous situations? What happens when an array is accessed outside of bounds? What happens on integer and floating point division by zero? Are exceptions resumable? Is exception handling code in a tail position with respect to try?
Of course in each case the intended semantics can be somehow obtained from primitives offered by the VM. But this often requires translating a given source operation into many VM operations, most of which are not needed in typical cases (e.g. to translate error conditions and error signalling). Or it requires emulating features using completely different features for silly reasons (e.g. a language wants object fields to be identified by symbols having names being sequences of Unicode code points, while a VM identifies them by mutable 8-bit strings compared by contents, or vice versa). This is a significant instruction decoding overhead even if the VM took care to make individual opcodes fast. It is tempting to modify the language to suit the VM, which is a sign of weakness of the language implementation technology.
I suppose a faster result is obtained by targeting a VM which is either low-level and compiled into machine code, or designed with the given source language in mind.
For example NekoVM looks very far from my language Kogut. .NET and JVM are different still, and so is Parrot. From these choices I suppose .NET would be best; it is sufficiently low-level. But I decided that it would be better to build the runtime myself in C, which is so close to the processor that the logic needed to obtain the semantics I wanted had a low overhead.
I don't advocate developing an entire compiler or interpreter in C. Here high level languages are better, especially for a compiler. I'm talking only about the runtime. This is painful too but this pain has a reason. By Qrczak at Fri, 2006-07-14 15:47
--
"
Functional backend
First thing to do is ask yourself, how much, and which type of fun do I want to have? Writing the whole thing in C (like popular php, perl, python and ruby) can be frustrating.
For example, if you wanted to be able to call functions written in python, you might consider a Johan-Lang to Python translator, and call Python as your assembler. :) Then you wouldn't have to deal with bytecode generation, and you'd have a huge library at your nascent language's fingertips.
Camlp4 Tutorial
Definitely read lots of sources before you start. There are some great ideas in those interpreters (and in some cases, bucketfuls of pain).
edit: You said small, functional language written in C, and I left out possibly your best resources: 1001 Scheme implementations! http://www.cs.indiana.edu/scheme-repository/imp.html http://www.schemers.org/Implementations/ http://www.aracnet.com/~briand/scheme_eval.html By Adam K at Thu, 2006-07-13 19:43
-- http://lambda-the-ultimate.org/node/1617#comment-19637
"
One interesting approach
One interesting approach might be to use the techniques from From Interpreter to Compiler and Virtual Machine and A Functional Correspondence between Evaluators and Abstract Machines. It should be straightforward to represent an AST in C; it is just a labelled tree. Defunctionalization (not really defined but illustrated; ask Google if you want a more formal approach) is one of apparently three* methods for implementing higher-order languages in first-order ones. It should be pretty clear how the final abstract or virtual machines can be implemented in C, especially without the worry of GC added (which these papers don't cover).
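To illustrate the defunctionalization part, a minimal sketch of my own (a toy example, not code from those papers): each lambda in the source program becomes a tag plus the variables it captures, and every call goes through a first-order apply() that dispatches on the tag.

#include <stdio.h>

/* Higher-order source program (hypothetical):
     add_n n     = \x -> x + n
     compose f g = \x -> f (g x)
   Defunctionalized: one constructor per lambda, plus apply(). */

enum tag { ADD_N, COMPOSE };

struct fun {
    enum tag tag;
    int n;              /* environment of \x -> x + n   */
    struct fun *f, *g;  /* environment of \x -> f (g x) */
};

int apply(struct fun *fn, int x) {
    switch (fn->tag) {
    case ADD_N:   return x + fn->n;
    case COMPOSE: return apply(fn->f, apply(fn->g, x));
    }
    return 0;  /* unreachable */
}

int main(void) {
    struct fun add1 = { ADD_N, 1, NULL, NULL };
    struct fun add2 = { ADD_N, 2, NULL, NULL };
    struct fun comp = { COMPOSE, 0, &add1, &add2 };
    printf("%d\n", apply(&comp, 39));  /* 39 + 2 + 1 = 42 */
    return 0;
}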
"
I've decided to go with an all C interpreter
After reading about a few different VMs and intermediate languages recommended here I've decided to try to build something in C first. I believe that it'll be interesting and instructive for me. I've found a number of different papers and articles on interpreter implementation but all are written using more high-level languages (such as Haskell and OCaml). What I need are some practical hints on implementing a strict lambda calculus in C. More specifically I need some pointers on AST representations and value representation (i.e. not-fully-applied functions/lambdas/closures).
When I wrote an interpreter in Haskell I used something like this (simplified):
data Exp = Lam Id Exp
         | App Exp Exp
         | Var Id
         | Lit Int
And for values:
data Value = VFun (Value -> Eval Value)
           | VInt Int
I.e. embedding functions directly in Haskell, which won't be possible using C (or will it?). So how to do all that trampolining, CPS and closure stuff in C? Any good resources out there?
P.S. I will probably use the Boehm GC as it looks mature enough and I don't want to write a GC as well. By Johan Tibell at Mon, 2006-07-17 15:16
So how to do all that trampolining, CPS and closure stuff in C?
The 90 Minute Scheme to C compiler might be helpful, although it describes a compiler, not an interpreter. Scheme is close enough to lambda calculus that the issues you mention are essentially the same at the implementation level. By Anton van Straaten at Mon, 2006-07-17 15:26
"
"
Boehm GC
P.S. I will probably use the Boehm GC as it looks mature enough and I don't want to write a GC as well.
I have used the Boehm GC to keep things tidy in the AST generation and processing part of a compiler written in C. Saves a lot of bother.
I am not too sure whether the Boehm GC is well suited for memory management in functional languages, which typically allocate many small, short-lived objects, e.g. cons cells. This is not a typical pattern in C as such. The reason for my misgivings is that a Bigloo Scheme programme I was working on spent over 90% of its time in GC, and was slower than the same programme written in Python or Lua, and far slower than Ocaml. Bigloo Scheme compiles to C and uses the Boehm GC. Other programmes I tried out in Bigloo Scheme, which were less list intensive, ran pretty much as fast as C. Take this with a pinch of salt. As I am not an experienced Scheme programmer, I may have written the Scheme programme incorrectly in some way. By Glyn Adgie at Tue, 2006-07-18 13:54
If you can wrap your head around the code, that particular interpreter is very tight, and it has stood the test of time fairly well. Don't assume that it has support for GC right -- last time I looked, there were bugs, but if you contact me offline I can describe the issues for you.
Boehm GC is not a good place to start if you ultimately want precise GC. It leaks like a proverbial sieve. Also, if you are writing an interpreter you don't need something like Boehm, because you are in a position to easily keep track of all the references. My suggestion on this would be to try hard for a precise GC. The results are worth it.
Feel free to contact me offline about this. By shap at Tue, 2009-04-28 20:09
"
--
(a) asm.js is fast; (b) LLVM can compile to asm.js via emscripten
stuff built on top of llvm:
clojure and llvm: https://groups.google.com/forum/#!topic/clojure/KrwtTsdYZ8I
it looks like LLVM is working on it, and by the time I need it, it will be there:
https://groups.google.com/forum/#!topic/llvm-dev/oJtjkJPaHmc
http://stackoverflow.com/questions/6833068/why-is-llvm-considered-unsuitable-for-implementing-a-jit
http://www.ffconsultancy.com/ocaml/hlvm/
"
Interesting. Your work is on statically typed functional languages, whereas the complaints I heard (linked in the post) were from people implementing dynamically typed imperative/OO languages. I wonder whether the typing or the functionalness has the bigger impact. – Sean McMillan Mar 5 '12 at 21:49
Both static typing and functional style have a big impact, but you can address the mismatch. LLVM's own mem2reg optimization pass actually transforms imperative code over value types (ints, floats, etc.) from memory operations into purely functional static single-assignment code (the kind HLVM generates naturally). Dynamic typing is harder because it makes it impossible to attain predictably good performance, but some simple solutions should be effective, such as representing all values as unboxed unions or compiling functions for all possible combinations of argument types on demand. – Jon Harrop Mar 5 '12 at 22:01
"
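To make the "unboxed unions" suggestion concrete, a minimal sketch of my own (an illustration, not HLVM's actual representation): every runtime value is a small tag plus an inline payload, so the common arithmetic paths touch no heap.

#include <stdio.h>

enum vtag { V_INT, V_FLT };

struct value {
    enum vtag tag;
    union { long i; double f; } u;  /* payload stored inline, not boxed */
};

static struct value add(struct value a, struct value b) {
    struct value r;
    if (a.tag == V_INT && b.tag == V_INT) {  /* fast path: no boxing, no heap */
        r.tag = V_INT;
        r.u.i = a.u.i + b.u.i;
    } else {                                 /* mixed: promote to double */
        double x = (a.tag == V_INT) ? (double)a.u.i : a.u.f;
        double y = (b.tag == V_INT) ? (double)b.u.i : b.u.f;
        r.tag = V_FLT;
        r.u.f = x + y;
    }
    return r;
}

int main(void) {
    struct value a = { V_INT, { .i = 40 } };
    struct value b = { V_FLT, { .f = 2.0 } };
    printf("%g\n", add(a, b).u.f);  /* 42 */
    return 0;
}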
" It takes a long time to start up is the biggest complaint - however, this is not so much of an issue if you did what Java does and start up in interpreter mode, and use LLVM to compile the most used parts of the program. "
" For dynamic languages, LLVM might not be the right tool, just because it was designed for optimizing system programming languages like C and C++ which are strongly/statically typed and support very low level features. In general the optimizations performed on C don't really make dynamic languages fast, because you're just creating an efficient way of running a slow system. Modern dynamic language JITs do things like inlining functions that are only known at runtime, or optimizing based on what type a variable has most of the time, which LLVM is not designed for. "
Julia uses LLVM
http://www.aosabook.org/en/llvm.html
---
perhaps some of the complexity of Haskell's type system can be resolved not by simplifying the type system, but just by improving the UI to Haskell's type checker. One part of this is the error messages. One problem is that if a number of type signatures are, taken together, inconsistent, the type checker will complain about a seemingly arbitrary one of them, whereas to a programmer, that one is not always where the problem is. Imposing an ordering may help here. The most obvious ordering would be runtime order: act as if you are interpreting the program, and the first time you hit an inconsistency, emit an error on that statement. This would be undecidable in general, but in most cases there is a clear hierarchy to the program that can be deduced at compile time; e.g. "Main" has calls to Abc and Def, and Abc calls Xyz. Now if a type signature in Main that is passed to Abc induces a type signature in Abc that is passed to Xyz, which is used to compute a local variable X within Xyz, and X conflicts with a type signature on another local variable Y within Xyz on a subsequent line, then the type checker should complain about Y's type signature.
the type checker should also report the other induced types that led to the error, e.g. the types of the parameters in the Abc and Xyz calls which had a bearing on the induced type of X.
the key there is to remove the need for the programmer to 'debug' types by inserting a bunch of intermediate type signatures to find out where the problem lies.
Another thing the type checker errors could provide is examples. Rather than just saying "Something of TypeA