proj-plbook-plChImplementationCaseStudies


Table of Contents for Programming Languages: a survey

Chapter : a tour of some targets, IRs, VMs and runtimes

We'll refer to the top of the stack as TOP0, and to the second position on the stack as TOP1, and to the third position as TOP2, etc.
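A quick illustration of this convention, as a sketch in Python (modeling the stack as a list with the top at the end; this is just to pin down the naming, not any particular VM's representation):

    # Model an operand stack as a Python list whose last element is the top.
    stack = ['a', 'b', 'c']   # 'c' was pushed most recently

    def top(stack, n):
        """Return TOPn, the (n+1)-th position from the top of the stack."""
        return stack[-1 - n]

    assert top(stack, 0) == 'c'   # TOP0: top of stack
    assert top(stack, 1) == 'b'   # TOP1: second position
    assert top(stack, 2) == 'a'   # TOP2: third position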

stacks:

cpython: block stack

stack ops: cpython:

arithmetic:

Chapter : programming language implementation case studies

Go

Haskell: GHC

Optimizations:

See also the Haskell section in plChSingleIntermedLangs.

Haskell: Gofer

Haskell (random wikibook section on reduction?)

https://en.wikibooks.org/wiki/Haskell/Graph_reduction

Haskell FLRC

Python: CPython

http://docs.python.org/devguide/compiler.html

" Python’s parser is an LL(1) parser mostly based off of the implementation laid out in the Dragon Book [Aho86]." -- https://docs.python.org/devguide/compiler.html#parse-trees

" The abstract syntax tree (AST) is a high-level representation of the program structure without the necessity of containing the source code; it can be thought of as an abstract representation of the source code. The specification of the AST nodes is specified using the Zephyr Abstract Syntax Definition Language (ASDL) [Wang97]. " -- https://docs.python.org/devguide/compiler.html#abstract-syntax-trees-ast

" The AST is generated from the parse tree...The function begins a tree walk of the parse tree, creating various AST nodes as it goes along. It does this by allocating all new nodes it needs, calling the proper AST node creation functions for any required supporting functions, and connecting them as needed." -- https://docs.python.org/devguide/compiler.html#parse-tree-to-ast

" A control flow graph (often referenced by its acronym, CFG) is a directed graph that models the flow of a program using basic blocks that contain the intermediate representation (abbreviated “IR”, and in this case is Python bytecode) within the blocks. Basic blocks themselves are a block of IR that has a single entry point but possibly multiple exit points. The single entry point is the key to basic blocks; it all has to do with jumps. An entry point is the target of something that changes control flow (such as a function call or a jump) while exit points are instructions that would change the flow of the program (such as jumps and ‘return’ statements). What this means is that a basic block is a chunk of code that starts at the entry point and runs to an exit point or the end of the block.

As an example, consider an ‘if’ statement with an ‘else’ block. The guard on the ‘if’ is a basic block which is pointed to by the basic block containing the code leading to the ‘if’ statement. The ‘if’ statement block contains jumps (which are exit points) to the true body of the ‘if’ and the ‘else’ body (which may be NULL), each of which are their own basic blocks. Both of those blocks in turn point to the basic block representing the code following the entire ‘if’ statement.

CFGs are usually one step away from final code output. Code is directly generated from the basic blocks (with jump targets adjusted based on the output order) by doing a post-order depth-first search on the CFG following the edges. " -- https://docs.python.org/devguide/compiler.html#control-flow-graphs
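The if/else example above is easy to see in actual CPython bytecode via the stdlib dis module; the jump instructions mark the basic-block boundaries the quote describes. A sketch (exact opcode names vary across CPython versions):

    import dis

    def f(cond):
        if cond:
            x = 1
        else:
            x = 2
        return x

    # The guard block ends in a conditional jump (e.g. POP_JUMP_IF_FALSE),
    # the true body ends in an unconditional jump (e.g. JUMP_FORWARD) over
    # the else body, and both bodies flow into the block holding 'return x'.
    dis.dis(f)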

" AST to CFG to Bytecode

With the AST created, the next step is to create the CFG. The first step is to convert the AST to Python bytecode without having jump targets resolved to specific offsets (this is calculated when the CFG goes to final bytecode). Essentially, this transforms the AST into Python bytecode with control flow represented by the edges of the CFG.

Conversion is done in two passes. The first creates the namespace (variables can be classified as local, free/cell for closures, or global). With that done, the second pass essentially flattens the CFG into a list and calculates jump offsets for final output of bytecode.

...

PySymtable_Build() begins by entering the starting code block for the AST (passed-in) and then calling the proper symtable_visit_xx function (with xx being the AST node type). Next, the AST tree is walked with the various code blocks that delineate the reach of a local variable as blocks are entered and exited using symtable_enter_block() and symtable_exit_block(), respectively.

Once the symbol table is created, it is time for CFG creation, whose code is in Python/compile.c . This is handled by several functions that break the task down by various AST node types.

Emission of bytecode is handled by the following macros: ...

    ADDOP() : add a specified opcode ...
    ADDOP_JABS() : create an absolute jump to a basic block
    ADDOP_JREL() : create a relative jump to a basic block ...

In addition to emitting bytecode based on the AST node, handling the creation of basic blocks must be done. Below are the macros and functions used for managing basic blocks:

    NEW_BLOCK() : create block and set it as current ...

Once the CFG is created, it must be flattened and then final emission of bytecode occurs. Flattening is handled using a post-order depth-first search. Once flattened, jump offsets are backpatched based on the flattening and then a PyCodeObject is created.

" -- https://docs.python.org/devguide/compiler.html#ast-to-cfg-to-bytecode

Links:

Python: PyPy

The PyPy project has two goals: (1) to reimplement Python in RPython; (2) to be a generic toolkit for programming language implementation, based on RPython. This section is only about (1). For (2), see PyPy (seen as a toolkit for creating programming languages). For RPython considered as a language in itself, see RPython.

Perl5

I'm not positive that everything here pertains to Perl5 rather than to some other version of Perl.

Links:

Perl6: Rakudo

Perl6 source code is parsed, and the parse tree is annotated by firing "action methods" during parsing. The annotated AST is called a QAST. The QAST is then compiled to virtual machine bytecode. Various virtual machines are planned to be supported, including Parrot, the JVM, and MoarVM. The compilation steps from source code to bytecode are implemented in a subset of Perl6 called NQP (Not Quite Perl).

For MoarVM considered as a language in itself, see the relevant section of [1].

The Perl6 object model is being reimplemented in the 6model project.

Links:

Smalltalk: Squeak

Squeak runs on a VM.

The VM is implemented in Slang, a restricted subset of Smalltalk that can be translated to efficient C code.

Erlang

http://prog21.dadgum.com/127.html

Erlang -> Core Erlang -> BEAM -> optimized BEAM
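(Each of these stages can be dumped for inspection via erlc options; if I recall the flags right, 'erlc +to_core mod.erl' emits the Core Erlang form and 'erlc -S mod.erl' emits BEAM assembly.)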

Guile 2.1

http://www.gnu.org/software/guile/docs/master/guile.html/Compiler-Tower.html#Compiler-Tower

" Guile defines a tower of languages, starting at Scheme and progressively simplifying down to languages that resemble the VM instruction set (see Instruction Set).

Each language knows how to compile to the next, so each step is simple and understandable. Furthermore, this set of languages is not hardcoded into Guile, so it is possible for the user to add new high-level languages, new passes, or even different compilation targets.

Languages are registered in the module, (system base language):

(use-modules (system base language))

They are registered with the define-language form.

Scheme Syntax: define-language [#:name] [#:title] [#:reader] [#:printer] [#:parser=#f] [#:compilers='()] [#:decompilers='()] [#:evaluator=#f] [#:joiner=#f] [#:for-humans?=#t] [#:make-default-environment=make-fresh-user-module]

    Define a language.
    This syntax defines a <language> object, bound to name in the current environment. In addition, the language will be added to the global language set. For example, this is the language definition for Scheme:
    (define-language scheme
      #:title	"Scheme"
      #:reader      (lambda (port env) ...)
      #:compilers   `((tree-il . ,compile-tree-il))
      #:decompilers `((tree-il . ,decompile-tree-il))
      #:evaluator	(lambda (x module) (primitive-eval x))
      #:printer	write
      #:make-default-environment (lambda () ...))

The interesting thing about having languages defined this way is that they present a uniform interface to the read-eval-print loop. This allows the user to change the current language of the REPL:

    scheme@(guile-user)> ,language tree-il
    Happy hacking with Tree Intermediate Language!  To switch back, type `,L scheme'.
    tree-il@(guile-user)> ,L scheme
    Happy hacking with Scheme!  To switch back, type `,L tree-il'.
    scheme@(guile-user)>

" -- http://www.gnu.org/software/guile/docs/master/guile.html/Compiler-Tower.html#Compiler-Tower

The normal tower of languages when compiling Scheme goes like this:

    Scheme
    Tree Intermediate Language (Tree-IL)
    Continuation-Passing Style (CPS)
    Bytecode 
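The interesting property of the tower is that the whole compiler is just the composition of small one-step compilers. A minimal sketch of that idea in Python (the names and string-mangling "compilers" here are purely illustrative, not Guile's actual representation):

    # Each language registers only the next language down and a one-step
    # compiler to it; a full compilation chains these steps together.
    compilers = {
        "scheme":  ("tree-il",  lambda src: "tree-il(%s)" % src),
        "tree-il": ("cps",      lambda ir:  "cps(%s)" % ir),
        "cps":     ("bytecode", lambda ir:  "bytecode(%s)" % ir),
    }

    def compile_down(program, from_lang, to_lang):
        """Compile by chaining one-step compilers until to_lang is reached."""
        lang = from_lang
        while lang != to_lang:
            lang, step = compilers[lang]
            program = step(program)
        return program

    print(compile_down("(+ 1 2)", "scheme", "bytecode"))
    # -> bytecode(cps(tree-il((+ 1 2))))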

http://wingolog.org/archives/2013/11/26/a-register-vm-for-guile

See also the Guile section of proj-plbook-plChLispLangs, proj-plbook-plChMiscIntermedLang, proj-plbook-plChSingleIntermedLangs, proj-plbook-plChTargetsImpl.

LuaJIT 2

http://nominolo.blogspot.com/2012/07/implementing-fast-interpreters.html

LadyVM JVM

Links:

Potion

Lua-like bytecode.

VM instructions [2] (50 instructions):

Links:

Ruby

Detail case study: explicit vs. implicit procs in Ruby

" Given this:

    def a(&block)
      yield
    end
    def b
      yield
    end

Running `a {1+1}` is 4x slower than running `b {1+1}`. " -- https://news.ycombinator.com/item?id=9120409

Why is it slower? "There are two expensive things going on here: first, vm_make_env_object is creating a new environment for the call. This ends up doing a heap allocation using xmalloc (see line 461). Moreover, that call to rb_proc_alloc on line 683 ends up doing an allocation to create the Proc itself. Thus, every time you call a method that has an explicitly defined block parameter, you incur this allocation cost – and heap allocations are far slower than anything you can do on the Ruby stack....why isn't it equally expensive to yield to a block, even if that block isn't explicitly declared? ...MRI has optimized this common case, providing an entirely separate pathway, from parsing through execution, for blockless yields (it's actually a primitive in YARV, the meta-language to which Ruby is compiled at execution time). Because of this special pathway, calls to yield end up being handled by a dedicated C function in the interpreter (vm_invoke_block in vm_insnhelper.c, if you're interested), and are much faster than their explicitly blocked brethren." [3]

But why?

"> Is it because declaring &block means my code in the method body can now "do stuff" with `block`, whereas in the implicit case I don't have a reference so I can't do any funny business but just yield? That would make sense.

That's exactly it. With `&block`, you get a reified Proc on which you can do all kinds of stuff[0] and which you can move around. With a `yield` and an implicit block, you can't do any of that, you can just call the block. So the interpreter can bypass all the Proc reification as it knows the developer won't be able to use it, it can keep just a C-level structure instead of having to setup a Ruby-level object.

There's a not-completely-dissimilar situation with `arguments` in javascript: if you're using it, the runtime has to leave a number of optimisations off because you can do much, much more with it than with just formal arguments.

[0] http://ruby-doc.org//core-2.2.0/Proc.html has plenty of fun methods, which you can only access via a reified proc " -- [4]

"

mdavidn 1 day ago

This isn't just a syntactic difference. A Proc can outlive the method that defined it, so the interpreter must lift local variables from the stack to the heap. A block, however, can never outlive the method that defined it, so local variables can remain on the stack. In theory, the interpreter could perform an escape analysis and implicitly convert `block.call` to `yield` if no `block` reference survives local scope. Presumably MRI 2.3 does this analysis.

Python has a similar performance impact for closures. Defining a lambda forces local variables onto the heap, even if the lambda is never instantiated, e.g. when defined inside an infrequently true condition.

reply " -- [5]
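The Python situation mentioned in that last paragraph is easy to see with the stdlib dis module: merely defining a closure forces the captured local into a heap-allocated cell, which changes the access opcodes. A sketch (opcode names vary somewhat across CPython versions):

    import dis

    def no_closure():
        x = 1
        return x             # x lives in a plain frame slot

    def with_closure():
        x = 1
        f = lambda: x        # closing over x forces it into a heap cell,
        return x             # even though f is never called

    print(no_closure.__code__.co_cellvars)    # ()      -- no cells needed
    print(with_closure.__code__.co_cellvars)  # ('x',)  -- x is cell-allocated
    dis.dis(no_closure)      # x accessed with LOAD_FAST
    dis.dis(with_closure)    # x accessed with LOAD_DEREF (through the cell)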

Links:

Links

Pedagogical implementations

Funa's Turbo Pascal

Igor Funa has made a self-hosting Turbo Pascal compiler (this is not the Borland Turbo Pascal compiler). He explains how it works:

http://turbopascal.org/turbo-pascal-internals

Rust

A list of Rust 'language items' may be found at the end of:

https://github.com/rust-lang/rust/blob/master/src/librustc/middle/lang_items.rs

my summary:

char, str, int (various sizes), unsigned int (various sizes), slice, const_ptr, mut_ptr

trait items: send, sized, unsize, copy, sync

drop

coerce_unsized

add, sub, mul, div, rem, neg, not, bitxor, bitand, bitor, shl, shr; _assign variants of most of the things on this line

index, index_mut

struct items: range, range_from, range_to, range_full

cell items: unsafe_cell

more trait items: deref, deref_mut

fn, fn_mut, fn_once

eq, ord

fn items: str_eq

    // A number of panic-related lang items. The `panic` item corresponds to
    // divide-by-zero and various panic cases with `match`. The
    // `panic_bounds_check` item is for indexing arrays.
    //
    // The `begin_unwind` lang item has a predefined symbol name and is sort of
    // a "weak lang item" in the sense that a crate is not required to have it
    // defined to use it, but a final product is required to define it
    // somewhere. Additionally, there are restrictions on crates that use a weak
    // lang item, but do not have it defined.

panic_fn, panic_bounds_check_fn, panic_fmt

exchange_malloc_fn, exchange_free_fn, strdup_uniq_fn

start_fn;

eh_personality, eh_personality_catch, eh_unwind_resume, msvc_try_filter

exchange_heap, owned_box

data items: phantom_data

no_copy_bound, non_zero, debug_trait

Julia

https://docs.julialang.org/en/release-0.4/devdocs/eval/

WebKit JS

Comparisons and observations

Generally, implementations of a high-level language in a restricted 'core' version of that same language define the core by producing a statically typed variant and disallowing various metaprogramming constructs (see e.g. RPython for PyPy, Slang for Squeak, and NQP for Rakudo, above).

Links

Other lists of implementations and implementation case studies