proj-plbook-plPartImplementation

Table of Contents for Programming Languages: a survey

Part V: Implementation of a programming language

This is not a book about how to implement a programming language. However, even at the earliest stages of language design, it is valuable to have some idea of how certain choices in the design of the language could make the implementation much less efficient or much more difficult to write. For this purpose, we'll provide a cursory overview of implementation topics, touching as we go on some of the language design choices that may have a large impact.

Chapter : the general pipeline

when do we convert to normal forms?

todo

" The first big phase of the compilation pipeline is parsing. You need to take your input and turn it into a tree. So you go through preprocessing, lexical analysis (aka tokenization), and then syntax analysis and IR generation. Lexical analysis is usually done with regexps. Syntax analysis is usually done with grammars. You can use recursive descent (most common), or a parser generator (common for smaller languages), or with fancier algorithms that are correspondingly slower to execute. But the output of this pipeline stage is usually a parse tree of some sort.

The next big phase is Type Checking. ...

The third camp, who tends to be the most isolated, is the code generation camp. Code generation is pretty straightforward, assuming you know enough recursion to realize your grandparents weren't Adam and Eve. So I'm really talking about Optimization...

-- http://steve-yegge.blogspot.ca/2007/06/rich-programmer-food.html
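To make the lexing and parsing steps described above concrete, here is a minimal sketch in Python (the grammar, token encoding, and function names are invented for illustration): a regexp-based tokenizer followed by a recursive-descent parser for simple arithmetic, producing a parse tree as nested tuples.

```python
import re

# lexical analysis: regexps turn the input string into a token stream
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(src):
    tokens = []
    for num, op in TOKEN_RE.findall(src):
        tokens.append(("NUM", int(num)) if num else ("OP", op))
    return tokens

# syntax analysis: recursive descent, one function per grammar rule
#   expr := term (('+' | '-') term)*
#   term := NUM | '(' expr ')'
def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)
    def expr():
        nonlocal pos
        node = term()
        while peek() in (("OP", "+"), ("OP", "-")):
            op = tokens[pos][1]; pos += 1
            node = (op, node, term())
        return node
    def term():
        nonlocal pos
        kind, val = peek()
        pos += 1
        if kind == "NUM":
            return val
        if val == "(":
            node = expr()
            pos += 1  # consume ')'
            return node
        raise SyntaxError("unexpected token: %r" % (val,))
    return expr()

tree = parse(tokenize("1 + (2 - 3)"))
print(tree)  # ('+', 1, ('-', 2, 3))
```

Note that the tokenizer and the parser are separate stages: the parser never sees raw characters, only tokens.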

modified from Python's docs:

" The usual steps for compilation are:

    Parse source code into a parse tree
    Transform parse tree into an Abstract Syntax Tree
    Transform AST into a Control Flow Graph
    Emit bytecode based on the Control Flow Graph" -- https://docs.python.org/devguide/compiler.html#abstract
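Python itself exposes the first and last of these stages: the `ast` module shows the tree produced by parsing, and `dis` shows the bytecode that is eventually emitted (the Control Flow Graph stage is internal to CPython and not exposed). A quick demonstration:

```python
import ast
import dis

src = "x = a + b * 2"

# parse source code into an Abstract Syntax Tree
tree = ast.parse(src)
print(ast.dump(tree))

# compile (CFG construction happens internally) and emit bytecode
code = compile(tree, "<example>", "exec")
dis.dis(code)
```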

discussions on the benefits of the introduction of a 'Core' or 'middle' language in between the HLL AST and a lower-level, LLVM-ish language:

some advantages of a core language:

Links:


todo: somewhere put my observations on LLLs and target languages:

'target' languages include both 'Core languages' and LLLs

goal for:

Core languages tend to be similar to the HLL, except:

* de-sugared (various local 'desugar' transformations have been applied)
* fewer primitives than the HLL (eg de-sugaring various control flow constructs to GOTO)
* in safe or prescriptive languages, sometimes the Core primitives are more powerful than the HLL's (eg in Rust, the MIR core language has GOTO but Rust does not)
* function calling is still retained
* types are explicit
* a bunch of other things are explicit too, so that the result is far too verbose for human use, but the control flow is still similar to the original HLL's
* like the HLL but unlike an LLL, still nonlinear (parenthesized subexpressions are still present)
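As a toy illustration of de-sugaring to a Core language with fewer, lower-level primitives (the AST encoding and instruction names here are invented): a `while` loop lowered to labels and conditional GOTOs.

```python
import itertools

_fresh = itertools.count()

def fresh_label():
    return "L%d" % next(_fresh)

# HLL AST node (invented encoding): ('while', cond, body)
# Core instructions: ('label', L), ('branch_if_false', cond, L), ('goto', L)
def desugar_while(node):
    _, cond, body = node
    top, done = fresh_label(), fresh_label()
    return [
        ("label", top),
        ("branch_if_false", cond, done),  # exit the loop when cond is false
        *body,
        ("goto", top),                    # back edge to the loop head
        ("label", done),
    ]

core = desugar_while(("while", ("lt", "i", "n"), [("incr", "i")]))
for instr in core:
    print(instr)
```

The Core program is longer and unreadable compared to the source, but every construct in it is trivial to translate further down.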

LLLs and 'assembly' languages are more similar to each other than you might expect.

LLL general properties:

Main families of LLLs:

* imperative
  * stack vs. register machines
    * stack machines (eg the JVM, Python's, mb Forth)
    * register machines
      * 3-operand vs 2- or 1-operand register machines
      * various calling conventions
      * LOAD/STORE vs addressing modes vs LOAD/STORE with constant argument but self-modifying code
      * explicit constant instructions (eg ADDC) vs a constant addressing mode vs LOADK
      * are registers and memory locations actual memory locations upon which address arithmetic can be done (as in assembly), or just variable identifiers (as in Lua)?
* functional
  * eg https://en.wikipedia.org/wiki/SECD_machine
* combinator
  * eg Nock

Also, instruction encodings might be fixed-width or variable (most are variable).
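To contrast the two imperative families above, here is the expression 2 + 3 run on a toy stack machine and on a toy 3-operand register machine (a sketch; the instruction names are invented):

```python
def run_stack(program):
    # stack machine: operands are implicit, so instructions are compact
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack[-1]

def run_register(program, nregs=4):
    # register machine: operands are explicit register numbers
    regs = [0] * nregs
    for op, *args in program:
        if op == "LOADK":                 # load a constant into a register
            regs[args[0]] = args[1]
        elif op == "ADD":                 # 3-operand form: dst, src1, src2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
    return regs[0]

print(run_stack([("PUSH", 2), ("PUSH", 3), ("ADD",)]))                     # 5
print(run_register([("LOADK", 1, 2), ("LOADK", 2, 3), ("ADD", 0, 1, 2)]))  # 5
```

The stack program needs no operand fields on ADD; the register program needs fewer instructions per expression but wider ones.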

Primitive operations in LLLs: not as much variance as you might expect. Generally:


Links regarding stack machine vs register machines:

---

Someone's response to the question "Does anyone know if there is research into the effects of register counts on VMs?":

" Someone 1819 days ago [-]

If you can map all VM registers to CPU registers, the interpreter will be way simpler.

If you have more VM registers than CPU registers, you have to write code to load and store your VM registers.

If you have more CPU registers than VM registers, you will have to write code to undo some of the register spilling done by the compiler (if you want your VM to use the hardware to the fullest.)

So, the optimum (from a viewpoint of 'makes implementation easier') depends on the target architecture.

Of course, a generic VM cannot know how many registers (and what kinds; uniform register files are not universally present) the target CPU will have.

That is a reason that choosing either zero (i.e. a stack machine) or infinitely many (i.e. SSA) are popular choices for VMs: for the first, you know for sure your target will not have fewer registers than your VM, for the second, that it will not have more.

If you choose any other number of VM registers, a fast generic VM will have to handle both cases.

Alternatively, of course, one can target a specific architecture, and have the VM be fairly close to the target; Google's NaCl is (was? I think they changed things a bit) an extreme example. I have not checked the code, but this, I think, is similar. " -- https://news.ycombinator.com/item?id=2930109

---

Chapter : modularity

whole program analysis

if modules are separately compiled and types are allowed to remain completely abstract (e.g. at the time a module is compiled, the in-memory representation of the values in the module may be unknown), then a host of optimizations cannot be performed. Alternatives:

see section "compilation model" in "Retrospective Thoughts on BitC?" http://www.coyotos.org/pipermail/bitc-dev/2012-March/003300.html

https://news.ycombinator.com/item?id=5422094

ml functors

intermediate representations (IRs)

" Some common intermediate representations:

Examples:

-- [5]

[6] also describes and motivates SSA and CPS in more detail. It defines SSA and explains why you need phi in SSA, and refers to an algorithm to determine where to place the phis.
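A tiny worked example of why phi is needed: in SSA each variable is assigned exactly once, so straight-line code just gets its variables renamed, but at a control-flow join the current value depends on which predecessor executed, and phi is the construct that expresses that choice. The sketch below renames straight-line three-address code into SSA form (the code representation is invented for illustration):

```python
# straight-line three-address code: (dst, op, args...)
def to_ssa(code):
    version = {}          # current SSA version number of each variable
    def use(v):
        return "%s%d" % (v, version[v]) if v in version else v
    out = []
    for dst, op, *args in code:
        args = [use(a) for a in args]          # reads use the old version
        version[dst] = version.get(dst, 0) + 1  # each write makes a new one
        out.append((use(dst), op, *args))
    return out

ssa = to_ssa([
    ("x", "const", 1),
    ("x", "add", "x", "x"),   # reads the old x, writes a new x
    ("y", "add", "x", "x"),
])
for instr in ssa:
    print(instr)

# At a join point (e.g. after an if/else where both branches assign x),
# no single renamed version is correct, so SSA inserts
#   x3 = phi(x1, x2)
# to select the version from whichever predecessor actually ran.
```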

Abstract machines and the compilers that love/hate them

Introducing structured control flow

Some target languages, such as WASM and SPIR-V, require some amount of structure in control flow. If the source language doesn't provide this structure, it must be introduced.
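One baseline technique for this (sometimes called the dispatcher or 'loop-plus-switch' pattern; more sophisticated algorithms such as the Relooper recover nicer structure) is to wrap the entire CFG in a single loop whose body dispatches on a label variable, so that only structured constructs remain. A minimal sketch in Python, with an invented block encoding:

```python
# CFG encoding (invented): each basic block is a function that mutates
# the state and returns the label of its successor (None = exit).
# Arbitrary GOTOs become assignments to 'label'; the only control flow
# left is one structured loop plus a dispatch.

def run_structured(blocks, entry, state):
    label = entry
    while label is not None:          # the single structured loop
        label = blocks[label](state)  # dispatch on the label variable
    return state

def body(s):
    s["i"] += 1
    return "loop"

blocks = {
    "entry": lambda s: "loop",
    "loop":  lambda s: "body" if s["i"] < 3 else "exit",
    "body":  body,
    "exit":  lambda s: None,
}
print(run_structured(blocks, "entry", {"i": 0}))  # {'i': 3}
```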

Links:

Chapter : possible compiler safety checks beyond typechecking

check to see if there are any reads of uninitialized variables; this could be implemented as a type system feature
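A sketch of such a check as a simple forward dataflow pass over straight-line three-address code (the instruction encoding is invented; a real checker must also handle branches, e.g. by intersecting the initialized-variable sets of all predecessors at a join):

```python
def check_initialized(code, params=()):
    # forward pass: a variable is initialized once it has been assigned
    initialized = set(params)
    errors = []
    for dst, op, *args in code:
        for a in args:
            if isinstance(a, str) and a not in initialized:
                errors.append("read of uninitialized variable %r" % a)
        initialized.add(dst)
    return errors

print(check_initialized([
    ("x", "const", 1),
    ("y", "add", "x", "z"),   # z was never assigned
]))
```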

Chapter : linker

issue in C:

in C, according to one of the comments on http://www.slideshare.net/olvemaudal/deep-c , it is claimed (I didn't check) that if you declare your own printf with the wrong signature, it will still be linked to the printf in the standard library, but will crash at runtime; e.g. "void printf(int x, int y); main() {int a=42, b=99; printf(a, b);}" will apparently crash.

-- A new programming language might want to throw a compile-time error in such a case (as C++ apparently does, according to the slides).

Links:

Chapter : concurrency implementation

atomics

SIMD, MIMD

GPU

Useful algorithms and data structures

Chapter: designing and implementing a virtual machine

"Instructions should not bind together operations which an optimizing compiler might otherwise choose to separate in order to produce a more efficient program."

Chapter: ?? where to put these

tags

descriptors

stacks

cactus/saguaro stacks

random book on stack-based hardware architectures: http://www.ece.cmu.edu/~koopman/stack_computers/contents.html

trampolines
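A trampoline converts deep (e.g. tail-) recursion into a loop: instead of making a recursive call, a function returns a thunk, and a driver loop keeps invoking thunks until a non-callable result appears, so the native call stack never grows. A minimal sketch in Python:

```python
def trampoline(f, *args):
    # keep bouncing until the result is no longer a thunk
    result = f(*args)
    while callable(result):
        result = result()
    return result

def countdown(n):
    if n == 0:
        return "done"
    return lambda: countdown(n - 1)  # return a thunk instead of recursing

print(trampoline(countdown, 100000))  # completes without stack overflow
```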

vtables (C++)
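A sketch of C++-style vtable dispatch simulated in Python (the names and layout are invented for illustration): each class has one shared table of function pointers with a fixed slot layout, each object carries only a pointer to its class's table, and a virtual call indexes the table through the object.

```python
# one vtable per class, shared by all its instances; the slot order is
# fixed at "compile time": slot 0 = speak
DOG_VTABLE = [lambda self: "woof"]
CAT_VTABLE = [lambda self: "meow"]

def make_dog():
    return {"vtable": DOG_VTABLE}  # the object stores just a vtable pointer

def make_cat():
    return {"vtable": CAT_VTABLE}

def virtual_call(obj, slot):
    return obj["vtable"][slot](obj)  # indirect call through the table

print(virtual_call(make_dog(), 0))  # woof
print(virtual_call(make_cat(), 0))  # meow
```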

interpreter vs spec:

" The technique that we had for Smalltalk was to write the VM in itself, so there’s a Smalltalk simulator of the VM that was essentially the only specification of the VM. You could debug and you could answer any question about what the VM would do by submitting stuff to it, and you made every change that you were going to make to the VM by changing the simulator. After you had gotten everything debugged the way you wanted, you pushed the button and it would generate, without human hands touching it, a mathematically correct version of C that would go on whatever platform you were trying to get onto." -- Alan Kay, http://queue.acm.org/detail.cfm?id=1039523

metacircular interpreters

Futamura projections https://news.ycombinator.com/item?id=7061913

"

nostrademons 1 day ago


IIUC PyPy is all 3 Futamura projections.

I'm a little rusty on the details, but IIUC the core of PyPy