proj-plbook-plChLlvmImpl

Table of Contents for Programming Languages: a survey

LLVM

The LLVM IR language

See [proj-plbook-plChLlvmLang LLVM IR].

Note: even though LLVM IR is in SSA form, the HLL compiler author doesn't have to do SSA construction to generate LLVM; see https://llvm.org/docs/tutorial/OCamlLangImpl5.html#llvm-ir-for-if-then-else and https://llvm.org/docs/tutorial/OCamlLangImpl7.html . In short, you can easily generate Phi nodes for conditionals, and you can stack-allocate mutable variables, in which case they live "in memory" and SSA form isn't required for them. And if you are concerned about not making full use of registers when everything is on the stack, LLVM's mem2reg pass will promote stack variables into registers when appropriate.
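
A rough sketch of what that looks like with the C++ IRBuilder API (the helper name emitMutableLocal is made up for illustration; only the LLVM calls are real):

```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Function.h"

using namespace llvm;

// Emit a mutable local variable without doing any SSA construction in the
// frontend: allocate a stack slot in the entry block, then load/store it.
static Value *emitMutableLocal(IRBuilder<> &B, Function *F, Value *Init) {
  // Allocas conventionally go in the entry block so that mem2reg can
  // later promote them to SSA registers.
  IRBuilder<> EntryB(&F->getEntryBlock(), F->getEntryBlock().begin());
  AllocaInst *Slot = EntryB.CreateAlloca(B.getInt32Ty(), nullptr, "x");

  B.CreateStore(Init, Slot);                                // x = Init
  Value *Old = B.CreateLoad(B.getInt32Ty(), Slot, "x.val");
  Value *New = B.CreateAdd(Old, B.getInt32(1), "x.inc");
  B.CreateStore(New, Slot);                                 // x = x + 1
  return Slot;
}
```

Running the mem2reg (or SROA) pass afterwards rewrites these loads and stores into SSA values with phi nodes, so the frontend never has to compute dominance frontiers itself.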

tips and details

TODO: reorganize chapters so that e.g. stuff about LLVM isn't spread between plPartImplementation, plChImplementationCaseStudies, plChIntermedLanguages, plChTargetsImpl, plChTargetLanguages; although perhaps have two places, one talking about the compiler's target/intermediate language as a language in itself, and one talking about the project as a tool for implementing other languages; BUT HYPERLINK THESE CLEARLY. Also, have a place to put specific tips and details for each of the various target languages/intermediate languages/language toolkits/compiler frameworks.

tail calls

LLVM has support for tail calls via the 'tail' and 'musttail' markers on the 'call' instruction, but afaict this support is quite limited because:

"Note that 'invoke' instructions (aka calls that can throw exceptions, and which have local handlers) are never safe to be marked for tail calls: if the callee function unwinds, the discarded stack would be needed." -- http://nondot.org/sabre/LLVMNotes/GuaranteedEfficientTailCalls.txt

Links:

delimited continuations

i'm not sure if LLVM supports delimited continuations

http://www.typedynamic.com/2012/09/compiler-intermediate-representations.html suggests that exceptions are "evocative of delimited continuations" but it seems to me that exceptions are a very special case.

http://www.reddit.com/r/programming/comments/13cmss/z3_an_llvm_backed_runtime_for_ocaml/c72wl1m suggests that three ways to implement delimited continuations in LLVM are:

(to me, if these sorts of things are the only ways to get delimited continuations, then i'd say that LLVM doesn't 'support' delimited continuations, because in these cases the implementers of the language on top of LLVM are either building the control flow themselves out of jumps, or doing something 'unsafe' -- but i may not be understanding correctly)

Links:

C focus

e.g. http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043730.html

LLVM and functional languages

todo summarize

From http://www.reddit.com/r/programming/comments/13cmss/z3_an_llvm_backed_runtime_for_ocaml/ :

" [–]gasche 10 points 1 year ago

How does performance compare to OCamlJit2?

The usual problems with C or LLVM backends for OCaml are:

    having to use a potentially costly calling convention (whereas OCaml's is optimized for nice tail call elimination and currying)
    trying to match OCaml's extremely fast [http://stackoverflow.com/a/8567429/298143 compilation of exception raising and catching]
    interacting with the garbage collector, which may pass GC roots in registers rather than on the stack, something that is not often planned for in runtimes parametrized over an optional, generic GC

Even when targeting languages that are optimized to death (LLVM, C), these three issues alone are often enough to make programs slower than when using the apparently rather naive (not globally-optimizing) native OCaml compilation. Good choices of data structure representation, GC, calling and exception compilation can go a very long way, that more and more complex optimization techniques actually have a hard time competing with. Spending a man-year of work on a bright new register allocator might not actually do much better on real-world code if function calls are slower for fundamental architectural reasons.

julesjacobs:

You don't have to use the C calling convention with LLVM; it has a calling convention named "fastcc" that also supports tail calls. I'm not sure how fast it is compared to OCaml though.

The last two problems can be greatly reduced at the cost of slightly more expensive function calls. You transform each function so that it can exit in multiple different ways: return normally, throw exception, capture stack for GC. This can be done by returning a return code for each type of return. On a 'throw exception' type of return, if there is an exception handler in the current stack frame, invoke the handler. Otherwise return to the next stack frame (again with a 'throw exception' return code). On a 'capture stack for GC' type of return, record the live local variables as GC roots, and continue further capture of the stack by also returning a 'capture stack for GC' return code from this stack frame. This approach has the advantage that it supports reasonably fast exceptions, GC roots are in registers, and the exception control flow is exposed as normal control flow to LLVM optimization passes. The disadvantages are that throwing an exception across many stack frames is slow, and after each function return you have to check a return code. You also need some ingenuity to rebuild the stack after unrolling it for GC.

You can use the same stack capturing strategy to implement delimited continuations without CPS or unsafe stack manipulation.

Raphael_Amiard [submitter]:

Your comment is spot on, and you successfully listed every problem this project has.

Additionally, this project is running OCaml bytecode, which is untyped, so it loses a bit more potential for optimization.

    About 1, you can use fastcc, which supports tail calls, and that's what we are using.
    About 2, spot on, we're using setjmp and longjmp. If you're doing a backend for ocamlopt rather than a bytecode compiler, you might be able to use LLVM zero-cost exceptions, but I don't know what the speed would be compared to ocamlopt.
    About 3, spot on, LLVM support for garbage collection is bad in this regard; you can't have roots in registers.

Some more information about the trade-offs i made here "
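
julesjacobs' multi-exit scheme above is easier to see in code. Here is a minimal sketch, in plain C++, of what generated code following that scheme might look like; the Exit/Result types and all names are my own illustration, not any LLVM or OCaml interface:

```
#include <cstdint>

// Every transformed function reports how it is exiting.
enum class Exit : uint8_t { Normal, Exception, CaptureForGC };

struct Result {
  Exit kind;      // which kind of exit occurred
  int64_t value;  // normal return value, exception object, etc.
};

Result callee();  // some transformed function

// A transformed caller: after every call it inspects the exit kind.
Result caller() {
  Result r = callee();
  switch (r.kind) {
  case Exit::Normal:
    return {Exit::Normal, r.value + 1};  // ordinary control flow
  case Exit::Exception:
    // No handler in this frame: propagate by returning the same code,
    // unwinding one frame per return.
    return r;
  case Exit::CaptureForGC:
    // Record this frame's live locals as GC roots (elided), then keep
    // returning so outer frames can capture themselves too.
    return r;
  }
  return r;
}
```

Exceptions and GC stack capture thus become ordinary returns that LLVM's optimizers can see, at the cost of a branch on the return code after every call.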

"

Ob. grinch: LLVM & functional languages?

    Alice ML should have back-ended SEAM on LLVM, instead of going it alone 

LLVM has been mentioned a few times recently, but unless I'm missing something, it doesn't really seem to be a good fit for functional languages at the moment. For a start, it apparently has self-tail-call support, but no general tail-call optimization, which is pretty important for languages like ML.

Also, LLVM's "low level" nature means it doesn't deal with things like nested lexical scoping at the VM level, so functional languages have to treat LLVM in the same way as they would a native target, but with a more limited range of possibilities, afaict. Granted, they achieve some portability in return.

VMs for functional languages are typically higher-level, supporting e.g. a 'closure' instruction which handles all of the issues related to nested lexical scoping, and the management of activation frames. There are a lot of advantages to this from the perspective of the language developer, including e.g. easier integration with debugging tools.

So it seems to me that using LLVM as a target for a functional language right now could present quite a few challenges. At the very least, it seems far from clear that it's a no-brainer choice for a functional language implementor. The only "real" languages that compile to LLVM right now are C and C++. The sample Scheme implementation that's available isn't realistic enough to draw any (good) conclusions from. By Anton van Straaten at Sat, 2004-12-25 07:52


" -- http://lambda-the-ultimate.org/node/440#comment-3246

Links:

Details

structure of an LLVM program

"

    A Module represents a source file (roughly) or a translation unit (pedantically). Everything else is contained in a Module.
    Most notably, Modules house Functions, which are exactly what they sound like: named chunks of executable code. (In C++, both functions and methods correspond to LLVM Functions.)
    Aside from declaring its name and arguments, a Function is mainly a container of BasicBlocks....
    An Instruction, in turn, is a single code operation. The level of abstraction is roughly the same as in RISC-like machine code: an instruction might be an integer addition, a floating-point divide, or a store to memory, for example.

Most things in LLVM—including Function, BasicBlock, and Instruction—are C++ classes that inherit from an omnivorous base class called Value. A Value is any data that can be used in a computation—a number, for example, or the address of some code. Global variables and constants (a.k.a. literals or immediates, like 5) are also Values. " -- http://adriansampson.net/blog/llvm.html

Using C++'s range-based for syntax, you can iterate over a Function as, literally, a list of BasicBlocks, and over a BasicBlock as a list of Instructions (see the example in section 'Inspecting IR in Our Pass' in http://adriansampson.net/blog/llvm.html , and the sketch below).
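
A minimal sketch of that traversal (the helper name is invented; the iteration pattern is the standard one from the C++ API):

```
#include "llvm/IR/Module.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

// Walk a Module exactly as described above: Functions contain BasicBlocks,
// BasicBlocks contain Instructions, and range-based for works at each level.
static void dumpStructure(Module &M) {
  for (Function &F : M) {
    errs() << "function: " << F.getName() << "\n";
    for (BasicBlock &BB : F) {
      errs() << "  block: " << BB.getName() << "\n";
      for (Instruction &I : BB)
        errs() << "    " << I.getOpcodeName() << "\n";
    }
  }
}
```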

LLVM IR is in SSA form [1].

LLVM and garbage collection

"There are LLVM backed languages that provide garbage collection; Julia and Crystal are the first two that come to mind. Haskell also has an LLVM back-end. " -- [2]

LLVM and providing a C ABI

" LLVM does NOT provide full C compatibility for you.

In more technical terms, any language that provides good C compatibility and uses LLVM as a backend needs to reimplement large chunks of the ABI in its frontend! This well known issue in the LLVM community causes a great deal of duplication and bugs.

Implementing a complete C ABI (with struct arguments and returns) is incredibly tricky, and not really a lot of fun. QBE provides you with IL operations to call in (and be called by) C with no pain. Moreover the ABI implementation in QBE has been thoroughly tested by fuzzing and manual tests.

" -- [3]

" > Implementing a complete C ABI (with struct arguments and returns) is incredibly tricky, and not really a lot of fun.

This is so true. Rust does it here: https://github.com/rust-lang/rust/tree/master/compiler/rustc... and it's a lot of code for no good reasons. " -- [4]

" +1 I plan to write a toy compiler and considered llvm. This one thing is the reason I will never use llvm for a hobby project.

Even for non-toy compilers, I think it's a waste of human resources that every frontend has to implement the C ABI. " -- [5]

types of passes in LLVM

http://llvm.org/docs/WritingAnLLVMPass.html#pass-classes-and-requirements :
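
The linked page distinguishes pass classes (ModulePass, CallGraphSCCPass, FunctionPass, LoopPass, RegionPass, ...) by the unit of IR their run method is handed. For orientation, here is a sketch of a FunctionPass under the legacy pass manager; the pass itself, CountBlocks, is invented for illustration:

```
#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

namespace {
// A FunctionPass sees one Function at a time; the other pass classes have
// analogous run methods over their respective units of IR.
struct CountBlocks : public FunctionPass {
  static char ID;
  CountBlocks() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    errs() << F.getName() << " has " << F.size() << " basic blocks\n";
    return false; // we did not modify the IR
  }
};
} // namespace

char CountBlocks::ID = 0;
static RegisterPass<CountBlocks> X("countblocks", "Count basic blocks (example)");
```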

calling conventions

There are some built-in calling conventions, but you cannot add a custom one without modifying LLVM [6].
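
For example (a sketch via the C++ API; the CallingConv enumerators shown are real built-ins, the surrounding helper is illustrative only):

```
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/CallingConv.h"

using namespace llvm;

// Pick one of the built-in calling conventions for a function and for a
// call to it; both sides must agree, or the call's behavior is undefined.
static void useFastCC(Function *F, CallInst *CI) {
  F->setCallingConv(CallingConv::Fast);   // 'fastcc' in the textual IR
  CI->setCallingConv(CallingConv::Fast);
  // Other built-ins include CallingConv::C, CallingConv::Cold and
  // CallingConv::GHC; adding a brand-new convention means patching
  // CallingConv.h and the relevant target backends.
}
```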

MIR

http://llvm.org/docs/MIRLangRef.html

bitcode

backend implementation


supported architectures

http://llvm.org/docs/doxygen/html/Triple_8h_source.html ( see http://stackoverflow.com/questions/15036909/clang-how-to-list-supported-target-architectures )

variant implementations

opinions

Links

Implementing LLVM backends: