hmm, Guile can be embedded; perhaps target Guile. todo.
---
Lower priority targets:
We plan to bootstrap by targeting a high-level language that is not necessarily embeddable in or interoperable with popular languages, but is instead very good at metaprogramming and hence well-suited for designing programming languages in. We'll write a Jasper interpreter in this. Then, once we become self-hosting (once we have a running Jasper interpreter written in Jasper), we'll write a Jasper-to-X compiler for the secondary target.
Candidates: Racket, Scala, Haskell.
To provide maximal interoperability, we'll want to compile Jasper to an existing high-level multiplatform language. So far Lua seems like the best choice, but http://asmjs.org/spec/latest/ is close (note that LLVM bitcode is compilable to asm.js).
Actually, an idea: to define RJasper, take the intersection of asm.js, the JVM, the CLR, LLVM, KLambda, RPython, Lua, C--, emscripten's supported LLVM subset ( https://github.com/kripken/emscripten/wiki/CodeGuidelinesAndLimitations ), Nock, and Dis, or something like that. Maybe we want the union rather than the intersection, or more properly speaking, something in between the two (e.g. each fundamental construct from each of these; not a minimal basis set, but not an exact embedding of each of these in the result either).
Other known potential candidates for the bootstrap target: Clojure, Guile, Shen, Scala, Fantom, Neko. The bootstrap target should probably support C embedding, or else target the JVM, the CLR, and at least one other platform.
Pros and cons:
To provide maximal portability, we could also write a Jasper-to-C compiler (written in Jasper).
http://stackoverflow.com/questions/809710/what-is-the-best-language-to-write-a-compiler-in
notes on interoperability between the JVM and the CLR:
http://stackoverflow.com/questions/3602091/call-clr-code-from-jvm?rq=1
http://stackoverflow.com/questions/11825220/jvm-over-clr-and-viceversa
http://www.badlogicgames.com/wordpress/?p=2583
For future reference I should note NekoVM and LLVM
http://www.boredomandlaziness.org/2012_07_01_archive.html
http://stackoverflow.com/questions/2167610/what-is-a-good-vm-for-developing-a-hobby-language
http://tratt.net/laurie/tech_articles/articles/fast_enough_vms_in_fast_enough_time
" make bootstrap
The 'bootstrap' target does not just compile GCC, but compiles it several times. It uses the programs compiled in a first round to compile itself a second time, and then again a third time. It then compares these second and third compiles to make sure it can reproduce itself flawlessly. This also implies that it was compiled correctly.
"
racket or scheme
the racket compiler is self-hosting
http://cs.brown.edu/courses/cs173/2012/book/ http://matt.might.net/articles/implementing-exceptions/ http://docs.racket-lang.org/guide/performance.html http://pl.barzilay.org/lec03.txt http://stackoverflow.com/questions/12345647/rewrite-this-script-by-designing-an-interpreter-in-racket http://cs.brown.edu/courses/cs173/2012/OnLine/ http://cs.brown.edu/courses/cs173/2012/Assignments/Python2/designs.html http://jeremykun.com/tag/racket/page/2/ http://stackoverflow.com/questions/10449052/peter-norvigs-regular-expression-compiler-from-udacity-course-rewritten-in-rack http://stackoverflow.com/questions/3345397/how-is-racket-different-than-scheme https://github.com/geoffhill/lc-compiler http://www.cs.indiana.edu/~dyb/pubs/nano-jfp.pdf
scala
the scala compiler is self-hosting
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/ https://github.com/frolovv/scheme-interpreter-in-scala http://niklaskorz.de/post/40418904638/small-bf-interpreter-in-scala http://brianmckenna.org/blog/sexp_scala http://stackoverflow.com/questions/9064292/brainfuck-compiler-in-scala http://code.google.com/p/lua-compiler-in-scala/ http://tomlee.co/category/compiler-construction-with-scala-and-bcel/ http://tomlee.co/2008/07/jvm-compiler-construction-with-scala-and-bcel-part-15/ http://www.slideshare.net/thomaslee/open-source-compiler-construction-for-the-jvm-lca2011-miniconf http://www.furidamu.org/scalisp/pres/index.html www.irisa.fr/triskell/publis/2010/Fouquet10a.pdf about using scala as a target language
haskell
the haskell compiler is self-hosting
http://www.haskell.org/haskellwiki/Applications_and_libraries/Compilers_and_interpreters http://www.haskell.org/haskellwiki/Applications_and_libraries/Compiler_tools http://www.mail-archive.com/haskell-cafe@haskell.org/msg59160.html
a 1987 book by one of the main Haskell guys about how to compile functional programming languages: http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/
http://stackoverflow.com/questions/5451645/implementing-lazy-functional-languages http://bloggingmath.wordpress.com/2010/01/20/writing-a-compiler-in-haskell-compiler-series-part-i/ http://propella.blogspot.com/2009/04/prolog-in-haskell.html http://jonathan.tang.name/files/scheme_in_48/tutorial/overview.html http://www.defmacro.org/ramblings/lisp-in-haskell.html https://github.com/azylman/scheme-interpreter-in-haskell http://codereview.stackexchange.com/questions/18981/a-stack-based-language-interpreter-in-haskell http://stackoverflow.com/questions/6271285/does-haskell-have-something-like-gensym-in-racket http://stackoverflow.com/questions/6299406/code-generation-for-compiler-in-haskell http://stackoverflow.com/questions/5234398/is-there-a-book-which-teaches-about-compilers-using-haskell http://alephnullplex.appspot.com/blog/view/2010/01/12/lbach-1-introduction https://github.com/wecing/compiler-in-haskell https://github.com/adrianomelo/c-compiler-in-haskell http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.139.1175 http://blog.poucet.org/2007/06/simplistic-compiler-in-haskell/ http://stackoverflow.com/questions/679815/haskell-how-to-best-to-represent-a-programming-languages-grammar http://www.cs.uu.nl/wiki/bin/view/Center/CompilerConstructionInHaskell http://www.cs.uu.nl/wiki/bin/view/Cco/CourseSchedule (taught by one of the architects of Utrecht Haskell Compiler) http://mahler.cse.unsw.edu.au/webcms2/course/index.php?cid=2202 https://github.com/thsutton/passingcuriosity.com/blob/master/_drafts/2011-01-24-compiler-construction-jvm.md http://home.adelphi.edu/sbloch/class/archive/272/spring2012/syllabus.html http://chplib.wordpress.com/2010/04/07/tock-a-compiler-for-concurrent-languages-written-in-haskell/ http://www.cse.unsw.edu.au/~dons/blog/2006/12/11#interpreters-with-reader-monads http://en.wikibooks.org/wiki/Haskell/Write_Yourself_a_Scheme_in_48_Hours http://www.mail-archive.com/haskell-cafe@haskell.org/msg59166.html http://www.cs.uu.nl/wiki/HUT/WebHome http://www.mail-archive.com/haskell-cafe@haskell.org/msg59358.html
general
http://stackoverflow.com/questions/1669/learning-to-write-a-compiler http://compilers.iecc.com/crenshaw/ https://class.coursera.org/compilers//lecture/preview http://www.dreamincode.net/forums/topic/268945-an-introduction-to-compiler-design-part-ii-parsing/
misc or unsorted
http://queue.acm.org/detail.cfm?id=2068896
http://matt.might.net/articles/implementing-a-programming-language/
http://jeremykun.com/2011/09/16/chai-basic/
http://www.furidamu.org/blog/2012/03/23/scalisp-a-lisp-interpreter-in-scala/ http://code.google.com/p/scalalogic/
http://www.cse.chalmers.se/edu/year/2011/course/TIN321/lectures/proglang-08.html
http://code.google.com/p/mjollnir/ http://matt.might.net/articles/cek-machines/ http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours http://news.ycombinator.com/item?id=4764088 http://blog.yuvimasory.com/2011/03/parser-combinators-and-interpreters-in.html
https://github.com/dwayne?tab=repositories https://bitbucket.org/rohinmshah/forth-interpreter/overview#followers http://lambda.jimpryor.net/lambda_evaluator/ http://hashcollision.org/whalesong/
random compiler course using llvm: http://www.cse.chalmers.se/edu/year/2012/course/TDA282/lectures.html http://llvm.org/docs/tutorial/OCamlLangImpl1.html
(after a post in which seanmcdirmid asserts that MIMD (task parallelism) doesn't scale and SIMD (data parallelism) does:
seanmcdirmid 4 hours ago
> but MIMD is more easily applied. Almost any application can be refactored to spawn lightweight threads for many calculations without any explicit knowledge of the π-calculus.
MIMD hasn't been shown to scale, and it's not locking that is the problem, but I/O and memory.
> To your point and for a well-understood example, make -j doesn't always result in faster compilations. It may if you have the ability to leverage imbalances in CPU and storage hierarchies, but you can also kill your performance with context switches (including disk seeking).
When Martin Odersky began pushing actors as a solution to Scala and multi-core, this was my immediate thought: the Scala compiler is slow, can this make it go faster? It was pretty obvious after some thought that the answer was no. But then we have no way yet of casting compilation as a data-parallel task (a point in favor of task parallelism, but it doesn't help us much).
The Spineless Tagless G-Machine — NOT! Kevin Hammond
https://github.com/jashkenas/coffee-script/wiki/List-of-languages-that-compile-to-JS
---
" A Wealth of Information I feel that there is a lot of stuff -- low level C like languages like C-- and VMs like Parrot -- that I'm not aware of. It would be great if there's was a summary of the different options for little language implementers like myself. Is CPS the "trick" used to interpret/compile functional languages nowadays or are there other options? In praticular I'm interested in different options for a language with eager/strict semantics. By Johan Tibell at Thu, 2006-07-13 19:39
Is CPS the "trick" used to interpret/compile functional languages nowadays or are there other options?
One other option is A-normal form (ANF), which stands for administrative normal form (A-normal form is impossible to google for). See the thread Retrospective: The Essence of Compiling with Continuations. The retrospective paper is a one-page summary paper which puts ANF in context w.r.t. CPS. The original paper, also linked from that thread, provides the detailed spec for ANF. The thread also contains a link to a comp.lang.scheme thread with some discussion of the two.
Jens Axel Soegaard posted a short Scheme program on the Scheme Cookbook which demonstrates Transforming CoreScheme into A-Normal Form. By Anton van Straaten at Sat, 2006-07-15 10:54
http://lambda-the-ultimate.org/node/1617#comment-19661
"
Language-independent VMs
Designing a good language-independent VM is much harder than it might seem. In particular I've seen no VM which I would like to target my Kogut compiler with. My compiler currently produces C code.
Targeting a preexisting VM is practical only if we are willing to adapt the language to the VM, or if the language's semantics is already similar to an existing language for which the VM was designed.
The problem with targeting VMs, compared to targeting C or assembler, is that most VMs are high-level relative to the other choices, especially those which are interpreted rather than JIT-ed.
Besides the well-known issues of tail calls and continuations, VMs often include a runtime type system, garbage collection, calling functions with arguments, objects with fields, strings, dictionaries, typesafe arithmetic etc. It is unlikely that these operations match the semantics of an independently developed language. The type system will most likely not match at all. There are tons of problematic areas:
How are functions with a statically unknown arity handled?
Can objects other than functions be applied?
How are object fields identified? Are they accessed when knowing the object type or not, possibly including subtyping? Are fields identified by strings or symbols, or indices computed at compile time? Is the set of fields known at object creation time, or is the semantics of field access resolved dynamically?
What is the nature of characters, elements of strings? Unicode code points? UTF-16 code units? Bytes? Are they distinguished from 1-character strings? Are strings mutable, and can they change their length?
How is hashing and equality used by a hash table specified?
What is the semantics of weak references and finalizers? Is a finalizer specified when its associated object is created, or can it be added at any time to any object? May a finalizer refer to the object being finalized, or to other finalizable objects? Are finalizers run by other threads?
What numeric types are available? What is the effect of overflowing fixed-size integers: returning bignums? an exception? wrapping modulo integer size? What is the semantics of integer division and remainder of negative numbers? What happens when you try to add objects which are not of a builtin numeric type?
Which exceptions are thrown from particular erroneous situations? What happens when an array is accessed outside of bounds? What happens on integer and floating point division by zero? Are exceptions resumable? Is exception handling code in a tail position with respect to try?
Of course in each case the intended semantics can be somehow obtained from primitives offered by the VM. But this often requires translating a given source operation into many VM operations, most of which are not needed in typical cases (e.g. to translate error conditions and error signalling). Or it requires emulating features using completely different features for silly reasons (e.g. a language wants object fields to be identified by symbols having names being sequences of Unicode code points, while a VM identifies them by mutable 8-bit strings compared by contents, or vice versa). This is a significant instruction decoding overhead even if the VM took care to make individual opcodes fast. It is tempting to modify the language to suit the VM, which is a sign of weakness of the language implementation technology.
I suppose a faster result is obtained by targeting a VM which is either low-level and compiled into machine code, or designed with the given source language in mind.
For example NekoVM looks very far from my language Kogut. .NET and JVM are different still, and so is Parrot. From these choices I suppose .NET would be best; it is sufficiently low-level. But I decided that it would be better to build the runtime myself in C, which is so close to the processor that the logic needed to obtain the semantics I wanted had a low overhead.
I don't advocate developing an entire compiler or interpreter in C. Here high level languages are better, especially for a compiler. I'm talking only about the runtime. This is painful too but this pain has a reason. By Qrczak at Fri, 2006-07-14 15:47
--
"
Functional backend
First thing to do is ask yourself, how much, and which type of fun do I want to have? Writing the whole thing in C (like popular php, perl, python and ruby) can be frustrating.
For example, if you wanted to be able to call functions written in python, you might consider a Johan-Lang to Python translator, and call Python as your assembler. :) Then you wouldn't have to deal with bytecode generation, and you'd have a huge library at your nascent language's fingertips.
Camlp4 Tutorial
Definitely read lots of sources before you start. There are some great ideas in those interpreters (and in some cases, bucketfuls of pain).
edit: You said small, functional language written in C, and I left out possibly your best resources: 1001 Scheme implementations! http://www.cs.indiana.edu/scheme-repository/imp.html http://www.schemers.org/Implementations/ http://www.aracnet.com/~briand/scheme_eval.html By Adam K at Thu, 2006-07-13 19:43
-- http://lambda-the-ultimate.org/node/1617#comment-19637
"
One interesting approach
One interesting approach might be to use the techniques from From Interpreter to Compiler and Virtual Machine and A Functional Correspondence between Evaluators and Abstract Machines. It should be straightforward to represent an AST in C; it is just a labelled tree. Defunctionalization (not really defined but illustrated; ask Google if you want a more formal approach) is one of apparently three* methods for implementing higher-order languages in first-order ones. It should be pretty clear how the final abstract or virtual machines can be implemented in C, especially without the worry of GC added (which these papers don't cover).
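To illustrate the defunctionalization part, a minimal sketch of my own (a toy example, not code from those papers): each lambda in the source program becomes a tag plus the variables it captures, and every call goes through a first-order apply() that dispatches on the tag.

#include <stdio.h>

/* Higher-order source program (hypothetical):
     add_n n     = \x -> x + n
     compose f g = \x -> f (g x)
   Defunctionalized: one constructor per lambda, plus apply(). */

enum tag { ADD_N, COMPOSE };

struct fun {
    enum tag tag;
    int n;              /* environment of \x -> x + n   */
    struct fun *f, *g;  /* environment of \x -> f (g x) */
};

int apply(struct fun *fn, int x) {
    switch (fn->tag) {
    case ADD_N:   return x + fn->n;
    case COMPOSE: return apply(fn->f, apply(fn->g, x));
    }
    return 0;  /* unreachable */
}

int main(void) {
    struct fun add1 = { ADD_N, 1, NULL, NULL };
    struct fun add2 = { ADD_N, 2, NULL, NULL };
    struct fun comp = { COMPOSE, 0, &add1, &add2 };
    printf("%d\n", apply(&comp, 39));  /* 39 + 2 + 1 = 42 */
    return 0;
}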
"
I've decided to go with an all C interpreter
After reading about a few different VMs and intermediate languages recommended here I've decided to try to build something in C first. I believe that it'll be interesting and instructive for me. I've found a number of different papers and articles on interpreter implementation but all are written using more high-level languages (such as Haskell and OCaml). What I need are some practical hints on implementing a strict lambda calculus in C. More specifically I need some pointers on AST representations and value representation (i.e. not-fully-applied functions/lambdas/closures).
When I wrote an interpreter in Haskell I used something like this (simplified):
data Exp = Lam Id Exp
         | App Exp Exp
         | Var Id
         | Lit Int
And for values:
data Value = VFun (Value -> Eval Value)
           | VInt Int
I.e. embedding functions directly in Haskell, which won't be possible using C (or will it?). So how to do all that trampolining, CPS and closure stuff in C? Any good resources out there?
P.S. I will probably use the Boehm GC as it looks mature enough and I don't want to write a GC as well. By Johan Tibell at Mon, 2006-07-17 15:16
So how to do all that trampolining, CPS and closure stuff in C?
The 90 Minute Scheme to C compiler might be helpful, although it describes a compiler, not an interpreter. Scheme is close enough to lambda calculus that the issues you mention are essentially the same at the implementation level. By Anton van Straaten at Mon, 2006-07-17 15:26
"
"
Boehm GC
P.S. I will probably use the Boehm GC as it looks mature enough and I don't want to write a GC as well.
I have used the Boehm GC to keep things tidy in the AST generation and processing part of a compiler written in C. Saves a lot of bother.
I am not too sure whether the Boehm GC is well suited for memory management in functional languages, which typically allocate many small, short-lived objects, e.g. cons cells. This is not a typical pattern in C as such. The reason for my misgivings is that a Bigloo Scheme programme I was working on spent over 90% of its time in GC, and was slower than the same programme written in Python or Lua, and far slower than Ocaml. Bigloo Scheme compiles to C and uses the Boehm GC. Other programmes I tried out in Bigloo Scheme, which were less list intensive, ran pretty much as fast as C. Take this with a pinch of salt. As I am not an experienced Scheme programmer, I may have written the Scheme programme incorrectly in some way. By Glyn Adgie at Tue, 2006-07-18 13:54
If you can wrap your head around the code, that particular interpreter is very tight, and it has stood the test of time fairly well. Don't assume that it has support for GC right -- last time I looked, there were bugs, but if you contact me offline I can describe the issues for you.
Boehm GC is not a good place to start if you ultimately want precise GC. It leaks like a proverbial sieve. Also, if you are writing an interpreter you don't need something like Boehm, because you are in a position to easily keep track of all the references. My suggestion on this would be to try hard for a precise GC. The results are worth it.
Feel free to contact me offline about this. By shap at Tue, 2009-04-28 20:09
"
--
(a) asm.js is fast; (b) LLVM can compile to asm.js via emscripten
stuff built on top of llvm:
clojure and llvm: https://groups.google.com/forum/#!topic/clojure/KrwtTsdYZ8I
it looks like LLVM is working on it, and by the time I need it, it will be there:
https://groups.google.com/forum/#!topic/llvm-dev/oJtjkJPaHmc
http://stackoverflow.com/questions/6833068/why-is-llvm-considered-unsuitable-for-implementing-a-jit
http://www.ffconsultancy.com/ocaml/hlvm/
"
Interesting. Your work is on statically typed functional languages, whereas the complaints I heard (linked in the post) were from people implementing dynamically typed imperative/OO languages. I wonder whether the typing or the functionalness has the bigger impact. – Sean McMillan Mar 5 '12 at 21:49
Both static typing and functional style have a big impact, but you can address the mismatch. LLVM's own mem2reg optimization pass actually transforms imperative code over value types (ints, floats, etc.) from memory operations into purely functional static single-assignment code (the kind HLVM generates naturally). Dynamic typing is harder because it makes it impossible to attain predictably good performance, but some simple solutions should be effective, such as representing all values as unboxed unions or compiling functions for all possible combinations of argument types on demand. – Jon Harrop Mar 5 '12 at 22:01
"
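To make the "unboxed unions" suggestion concrete, a minimal sketch of my own (an illustration, not HLVM's actual representation): every runtime value is a small tag plus an inline payload, so the common arithmetic paths touch no heap.

#include <stdio.h>

enum vtag { V_INT, V_FLT };

struct value {
    enum vtag tag;
    union { long i; double f; } u;  /* payload stored inline, not boxed */
};

static struct value add(struct value a, struct value b) {
    struct value r;
    if (a.tag == V_INT && b.tag == V_INT) {  /* fast path: no boxing, no heap */
        r.tag = V_INT;
        r.u.i = a.u.i + b.u.i;
    } else {                                 /* mixed: promote to double */
        double x = (a.tag == V_INT) ? (double)a.u.i : a.u.f;
        double y = (b.tag == V_INT) ? (double)b.u.i : b.u.f;
        r.tag = V_FLT;
        r.u.f = x + y;
    }
    return r;
}

int main(void) {
    struct value a = { V_INT, { .i = 40 } };
    struct value b = { V_FLT, { .f = 2.0 } };
    printf("%g\n", add(a, b).u.f);  /* 42 */
    return 0;
}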
" It takes a long time to start up is the biggest complaint - however, this is not so much of an issue if you did what Java does and start up in interpreter mode, and use LLVM to compile the most used parts of the program. "
" For dynamic languages, LLVM might not be the right tool, just because it was designed for optimizing system programming languages like C and C++ which are strongly/statically typed and support very low level features. In general the optimizations performed on C don't really make dynamic languages fast, because you're just creating an efficient way of running a slow system. Modern dynamic language JITs do things like inlining functions that are only known at runtime, or optimizing based on what type a variable has most of the time, which LLVM is not designed for. "
Julia uses LLVM
http://www.aosabook.org/en/llvm.html
---
perhaps some of the complexity of Haskell's type system can be resolved not by simplifying the type system, but just by improving the UI to Haskell's type checker. One part of this is the error messages. One problem is that if a number of type signatures are, taken together, inconsistent, the type checker will complain about a seemingly arbitrary one of them, whereas to a programmer, that one is not always where the problem is. Imposing an ordering may help here. The most obvious ordering would be runtime order: act as if you are interpreting the program, and the first time you hit an inconsistency, emit an error on that statement. This would be undecidable in general, but in most cases there is a clear hierarchy to the program that can be deduced at compile time; e.g. "Main" has calls to Abc and Def, and Abc calls Xyz. Now if a type signature in Main that is passed to Abc induces a type signature in Abc that is passed to Xyz, which is used to compute a local variable X within Xyz, and X conflicts with a type signature on another local variable Y within Xyz on a subsequent line, then the type checker should complain about Y's type signature.
the type checker should also report the other induced types that led to the error, e.g. the types of the parameters in the Abc and Xyz calls which had a bearing on the induced type of X.
the key there is to remove the need for the programmer to 'debug' types by inserting a bunch of intermediate type signatures to find out where the problem lies.
Another thing the type checker errors could provide is examples. Rather than just saying "Something of TypeA