proj-plbook-plChLlvmLang

Table of Contents for Programming Languages: a survey

LLVM IR

This chapter is about the LLVM IR programming language. For information about other parts of LLVM, the language implementation toolkit, see LLVM.

" LLVM IR is designed to host mid-level analyses and transformations that you find in the optimizer section of a compiler. It was designed with many specific goals in mind, including supporting lightweight runtime optimizations, cross-function/interprocedural optimizations, whole program analysis, and aggressive restructuring transformations, etc. The most important aspect of it, though, is that it is itself defined as a first class language with well-defined semantics. To make this concrete, here is a simple example of a .ll file:

define i32 @add1(i32 %a, i32 %b) {
entry:
  %tmp1 = add i32 %a, %b
  ret i32 %tmp1
}

define i32 @add2(i32 %a, i32 %b) {
entry:
  %tmp1 = icmp eq i32 %a, 0
  br i1 %tmp1, label %done, label %recurse

recurse:
  %tmp2 = sub i32 %a, 1
  %tmp3 = add i32 %b, 1
  %tmp4 = call i32 @add2(i32 %tmp2, i32 %tmp3)
  ret i32 %tmp4

done:
  ret i32 %b
}

This LLVM IR corresponds to this C code, which provides two different ways to add integers:

unsigned add1(unsigned a, unsigned b) {
  return a+b;
}

// Perhaps not the most efficient way to add two numbers.
unsigned add2(unsigned a, unsigned b) {
  if (a == 0) return b;
  return add2(a-1, b+1);
}

As you can see from this example, LLVM IR is a low-level RISC-like virtual instruction set. Like a real RISC instruction set, it supports linear sequences of simple instructions like add, subtract, compare, and branch. These instructions are in three address form, which means that they take some number of inputs and produce a result in a different register. LLVM IR supports labels and generally looks like a weird form of assembly language.

Unlike most RISC instruction sets, LLVM is strongly typed with a simple type system (e.g., i32 is a 32-bit integer, i32** is a pointer to pointer to 32-bit integer) and some details of the machine are abstracted away. For example, the calling convention is abstracted through call and ret instructions and explicit arguments. Another significant difference from machine code is that the LLVM IR doesn't use a fixed set of named registers, it uses an infinite set of temporaries named with a % character. " -- http://www.aosabook.org/en/llvm.html

comparing LLVM to alternatives:

" Java and .NET virtual machines. These systems provide a JIT compiler, runtime support, and a very well defined bytecode format. This means that any language that can compile to the bytecode format (and there are dozens of them) can take advantage of the effort put into the optimizer and JIT as well as the runtime. The tradeoff is that these implementations provide little flexibility in the choice of runtime: they both effectively force JIT compilation, garbage collection, and the use of a very particular object model. This leads to suboptimal performance when compiling languages that don't match this model closely, such as C (e.g., with the LLJVM project).

A second success story is perhaps the most unfortunate, but also most popular way to reuse compiler technology: translate the input source to C code (or some other language) and send it through existing C compilers. This allows reuse of the optimizer and code generator, gives good flexibility, control over the runtime, and is really easy for front-end implementers to understand, implement, and maintain. Unfortunately, doing this prevents efficient implementation of exception handling, provides a poor debugging experience, slows down compilation, and can be problematic for languages that require guaranteed tail calls (or other features not supported by C).

A final successful implementation of this model is GCC. GCC supports many front ends and back ends, and has an active and broad community of contributors. GCC has a long history of being a C compiler that supports multiple targets with hacky support for a few other languages bolted onto it. As the years go by, the GCC community is slowly evolving a cleaner design. As of GCC 4.4, it has a new representation for the optimizer (known as "GIMPLE Tuples") which is closer to being separate from the front-end representation than before. Also, its Fortran and Ada front ends use a clean AST.

While very successful, these three approaches have strong limitations to what they can be used for, because they are designed as monolithic applications " -- http://www.aosabook.org/en/llvm.html

Some issues with LLVM:

LLVM ISA

Instructions

http://llvm.org/docs/LangRef.html

http://llvm.org/docs/LangRef.html#instruction-reference

Terminator Instructions (control flow between basic blocks):

Terminator Instructions, added later:

Binary Operations:

Bitwise Binary Operations:

Vector Operations:

Aggregate Operations:

Memory Access and Addressing Operations:

Conversion Operations (all take a destination type):

Conversion Operations, added later:

Other Operations:

Other operations, added later:

Intrinsics

Variable Argument Handling Intrinsics:

Accurate Garbage Collection Intrinsics:

Code Generator Intrinsics: llvm.returnaddress, llvm.frameaddress, llvm.stacksave, llvm.stackrestore, llvm.prefetch, llvm.pcmarker, llvm.readcyclecounter,

Code Generator Intrinsics, added later: llvm.addressofreturnaddress, llvm.localescape, llvm.localrecover, llvm.readregister, llvm.writeregister, llvm.get.dynamic.area.offset, llvm.clearcache, llvm.instrprof.increment, llvm.instrprof.increment.step, llvm.instrprof.value.profile, llvm.thread.pointer

Standard C Library Intrinsics: llvm.memcpy, llvm.memmove, llvm.memset, llvm.sqrt, llvm.powi, llvm.sin, llvm.cos, llvm.pow, llvm.exp, llvm.exp2, llvm.log, llvm.log10, llvm.log2, llvm.fma, llvm.fabs, llvm.copysign, llvm.floor, llvm.ceil, llvm.trunc, llvm.rint, llvm.nearbyint, llvm.round,

Standard C Library Intrinsics, added later: llvm.minnum, llvm.maxnum

Bit Manipulation Intrinsics: llvm.bswap, llvm.ctpop, llvm.ctlz, llvm.cttz,

Bit Manipulation Intrinsics, added later: llvm.bitreverse, llvm.fshl, llvm.fshr

Arithmetic with Overflow Intrinsics: llvm.sadd.with.overflow, llvm.uadd.with.overflow, llvm.ssub.with.overflow, llvm.usub.with.overflow, llvm.smul.with.overflow, llvm.umul.with.overflow,

Specialised Arithmetic Intrinsics: llvm.fmuladd,

Specialised Arithmetic Intrinsics, added later: llvm.canonicalize

Experimental Vector Reduction Intrinsics, added later: llvm.experimental.vector.reduce.: add, fadd, mul, fmul, and, or, xor, smax, smin, umax, umin, fmax, fmin

Half Precision Floating Point Intrinsics: llvm.convert.to.fp16, llvm.convert.from.fp16,

Debugger Intrinsics: llvm.dbg.declare, llvm.dbg.value

Debugger Intrinsics, added later: llvm.dbg.addr

Exception Handling Intrinsics: llvm.eh.typeid.for, llvm.eh.sjlj.setjmp, llvm.eh.sjlj.longjmp, llvm.eh.sjlj.lsda, llvm.eh.sjlj.callsite,

Exception Handling Intrinsics, added later: llvm.eh.begincatch, llvm.eh.endcatch, llvm.eh.exceptionpointer

Trampoline Intrinsics: llvm.init.trampoline, llvm.adjust.trampoline

Masked Vector Intrinsics, added later: llvm.masked.: load, store, gather, scatter, expandload, compresstore

Memory Use Markers: llvm.lifetime.start, llvm.lifetime.end, llvm.invariant.start, llvm.invariant.end

Memory Use Markers, added later: llvm.launder.invariant.group, llvm.strip.invariant.group

Constrained Floating-Point Intrinsics, added later: llvm.experimental.constrained.: fadd, fsub, fmul, fdiv, frem, fma

Constrained libm-equivalent Intrinsics, added later: llvm.experimental.constrained.: sqrt, pow, powi, sin, cos, exp, exp2, log, log10, log2, rint, nearbyint

General Intrinsics: llvm.var.annotation, llvm.ptr.annotation.*, llvm.annotation.*, llvm.trap, llvm.debugtrap, llvm.stackprotector, llvm.stackprotectorcheck (later removed), llvm.objectsize, llvm.expect, llvm.donothing,

General Intrinsics, added later: llvm.codeview.annotation, llvm.stackguard, llvm.ssa_copy, llvm.type.test, llvm.type.checked.load, llvm.experimental.deoptimize, llvm.experimental.guard, llvm.load.relative, llvm.sideeffect

Stack map intrinsics, added later: llvm.experimental.stackmap, llvm.experimental.patchpoint

Element Wise Atomic Memory Intrinsics, added later: llvm.memcpy.element.unordered.atomic, llvm.memmove.element.unordered.atomic, llvm.memset.element.unordered.atomic

" The LLVM instruction set defines a register based virtual machine with an interesting twist: it has an infinite number of registers. In keeping with its design point as a compiler intermediate representation, LLVM registers enable static single assignment form. A register is used for exactly one value and never reassigned, making it easy for subsequent processing to determine whether values are live or can be eliminated. " -- http://codingrelic.geekhold.com/2010/07/virtual-instruction-sets-opcode.html

Extensions, proposals, and simplifications

todo: llva: "There has been work on things like virtual ISA's using LLVM (see http://llvm.org/pubs/2003-10-01-LLVA.html), but the result of this research was, IMHO, that it's not currently a worthwhile endeavor. You also can do things like restrict bitcode in ways that make it portable (like, for example, PNaCl), but this is closer to the LLVA work than anything else (It's essentially a portability layer) It actually still requires porting, just porting to the single portability layer. "

PNaCL

PNaCl is (mostly) a subset of LLVM that Google wants to use as a portable low-level browser 'assembly language'. As a (mostly) subset which is still complete enough to write programs in, I figure it might be useful as a description of a 'simpler' LLVM, especially for someone first learning LLVM (like me). See the PNaCl section in [1].

Links