- # Pointer tagging
"Most dynamic languages use pointer tagging, so most ints fit inside a pointer and don’t allocate memory. This goes back at least to Smalltalk-76, and probably all the way to LISP." [1]
"Alan Kay explicitly said that he copied this from Lisp. It’s a really old technique." [2]
- # Resolving jump addresses
Example: https://elle.readthedocs.io/en/latest/implementation.html
- Symbol patching
"One neat trick I pulled from otcc is the symbol patching mechanism. Namely, when we don't know the value of a given symbol yet while emitting code we use the bits dedicated to this value in the machine code to store a linked list of all locations to be patched with the given symbol value. See the code of the patch function for details about that."
More symbol patching ("linker relaxation"): https://www.sifive.com/blog/all-aboard-part-3-linker-relaxation-in-riscv-toolchain
- # Code generation tools
- LLVM: see other chapters about LLVM.
- Dynasm http://luajit.org/dynasm.html
targets: x86, x64, ARM, PowerPC, and MIPS
MIT license
- ## LibJIT https://www.gnu.org/software/libjit/
Started as part of the DotGNU project.
targets: x86, x86-64, arm, also has an interpreter for unsupported architectures
IR: https://code.google.com/archive/p/libjit-linear-scan-register-allocator/wikis/LibJITVirtualMachineInstructionSet.wiki
Links:
- https://eli.thegreenplace.net/2013/10/17/getting-started-with-libjit-part-1
- https://eli.thegreenplace.net/2013/11/12/getting-started-with-libjit-part-2/
- https://eli.thegreenplace.net/2014/01/07/getting-started-with-libjit-part-3
- https://code.google.com/archive/p/libjit-linear-scan-register-allocator/wikis/LLVM_and_GNU_Lightning.wiki "How does Libjit compare to LLVM or GNU lightning?"
- http://lists.ximian.com/pipermail/mono-devel-list/2009-April/031640.html
Lesser GPL
The MIR project considers it a competitor as one of 3 "projects which could be considered or adapted as real universal light-weight JIT competitors". Compared to MIR:
"
- LIBJIT is bigger:
- 80K C lines (for LIBJIT w/o dynamic Pascal compiler) vs 10K C lines for MIR (excluding C to MIR compiler)
- 420KB object file vs 170KB
- LIBJIT has fewer optimizations: only copy propagation and register allocation" -- [3]
- GNU Lightning
https://www.gnu.org/software/lightning/
targets: aarch64, alpha, arm, hppa, ia64, mips, powerpc, risc-v, s390, sparc and x86
Comparison with LibJIT and LLVM: http://lists.ximian.com/pipermail/mono-devel-list/2009-April/031640.html
Lesser GPL
- ## Asmjit
https://github.com/asmjit/asmjit
targets: x86, x64, ARM, AArch64
zlib License
- ## QBE
https://c9x.me/compile/
https://c9x.me/git/qbe.git
QBE vs LLVM
The MIR project considers it a competitor as one of 3 "projects which could be considered or adapted as real universal light-weight JIT competitors". Compared to MIR:
"
- It is small (10K C lines)
- It uses SSA based IR (kind of simplified LLVM IR)
- It has the same optimizations as MIR-generator plus aliasing but QBE has no inlining
- It generates assembler code which makes QBE 30 [times] slower in machine code generation than MIR-generator" -- [4]
Tutorials and discussions:
Related:
MIT License
Opinions:
- "I’m only slightly disappointed to see Phi nodes, which are a bit less elegant than block arguments IMO (which MLIR, Swift and Cranelift’s newer IRs use – rationale). But of course it’s no deal-breaker." -- [5]
- "I was quite disappointed to see that there’s no pointer type in the IR. That means that it will never be able to target a CHERI platform (or any other architecture where pointers are not Integers), so the Morello system that I’m writing code on right now can never be a target." -- [6]
Notes:
- "QBE IR has a little quirk in that you cannot jump to the first block in a function. You can have two blocks right on top of each other and jump to the second block." [7]
Links:
- MIR
https://github.com/vnmakarov/mir
https://developers.redhat.com/blog/2020/01/20/mir-a-lightweight-jit-compiler-project/
https://github.com/vnmakarov/mir/blob/master/MIR.md
https://linuxplumbersconf.org/event/7/contributions/732/attachments/507/952/The-LightWeight-JIT-Compiler-Slides.pdf
https://developers.redhat.com/articles/2022/02/16/code-specialization-mir-lightweight-jit-compiler#tracing_and_meta_tracing_in_a_jit_compiler
targets: x86_64, aarch64, ppc64be, ppc64le, s390x Linux and x86_64 macOS
MIT license
MIR also has a C->MIR compiler "C11 standard w/o standard optional variable arrays, complex, and atomics" [8]
discussions:
- RyuJIT
https://github.com/dotnet/coreclr/blob/master/Documentation/botr/ryujit-overview.md
"...the next generation Just in Time Compiler (aka “JIT”) for the AMD64 .NET runtime."
The MIR project considers it a competitor as one of 3 "projects which could be considered or adapted as real universal light-weight JIT competitors". Compared to MIR:
"
- RyuJIT is even bigger: 360K SLOC
- RyuJIT optimizations are basically MIR-generator optimizations minus SCCP
- RyuJIT uses SSA "
- LIBFirm
https://github.com/libfirm/libfirm
- ## Cranelift
https://github.com/CraneStation/cranelift
- ## NanoJIT
https://github.com/dibyendumajumdar/nanojit
- ## LLVM JIT (ORC)
https://llvm.org/docs/ORCv2.html
(much slower startup than smaller projects like MIR [9])
- ## GCC JIT (libgccjit)
https://gcc.gnu.org/onlinedocs/jit/
(much slower startup than smaller projects like MIR [10])
The evolution of the ABI is documented in: https://gcc.gnu.org/onlinedocs/jit/topics/compatibility.html#abi-symbol-tags
- ## GCC
See also https://www.reddit.com/r/ProgrammingLanguages/comments/jhaand/what_code_generators_are_there/g9ycpg8/
- Code generation tool discussion
"The key difficulty in adding garbage collection to code emitted with tools like this is being able to accurately tell what the root set is." -- [11]
- ## Code generation lists and links
- Calling conventions
- Calling conventions misc
"Varargs in combination with other language features have led to calling conventions where the caller is responsible for removing the arguments from the stack. This makes it impossible to implement guaranteed tail-call optimization, which would be necessary to use C calls as a general control flow primitive [Ste77]"
https://c9x.me/compile/bib/abi-x64.pdf https://c9x.me/compile/bib/abi-arm64.pdf
- # Executable format tools
- Debugging instrumentation
- DWARF
- Memory
- Memory management tools
- Misc
- Dealing with stack alignment requirements when implementing VMs
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/using-the-stack-in-aarch64-implementing-push-and-pop gives methods of dealing with AArch64's requirement that SP be 16-byte aligned when implementing VMs that expect to be able to push or pop single registers.
- ## Libc and stack size
- https://ariadne.space/2021/06/25/understanding-thread-stack-sizes-and-how-alpine-is-different/
- Stack size is invisible in C and the effects on "portability"
- Some reasons for Go to not make system calls through the standard C library
- "You cannot effectively handle stack overflow situations if you use the system stack on Linux...I found it was just easier to allocate my own stack segment and immediately switch to it and just leave the loader-provided stack segment alone....This also makes stack overflow very deterministic (only depends on the program) and controllable--there is a compiler option to set the stack size in KiB." [12]
- "Managing your own pthread stacks can make threads more costly though, can't it? Use of pthread_set_stack requires stack_size at least PTHREAD_STACK_MIN which on glibc linux tends to be 16384, four times larger than the initial cost of a demand-paged stack map. I understand that on musl this is 2048, so that's nice."
- "Yeah, I think there would be some overhead, because it doesn't look like pthread_create allows specifying a stack segment. But I wouldn't be using pthread_create (or any C library for that matter), because Virgil goes all the way down to kernel; there is no C environment."
- "Your choices are:
- Make a "stack" yourself, in heap memory, and implement your algorithm to manually push/pop from the stack you implemented (often not hard to do, but feels like reimplementing for no reason).
- Spawn a new thread, where in most languages you can set the stack size. This is what I tend to do, although it does seem like a waste to have an extra thread for no reason other than to get more stack." [13]
- "Zig appears to have a proposed solution to this similar to your option 1, move recursion to the heap. https://github.com/ziglang/zig/issues/1639 They're also looking at static analysis of the maximum stack depth which might help in some cases." [14]
- "LLVM-MOS targets 6502 and so goes out of its way to use as little of the hardware stack as possible: it does global program optimization, figuring out what calls what, and what functions don't need to be reentrant so it could allocate as many activation records as possible statically, FORTRAN-style." [15]