proj-plbook-plChOptimizationImpl


"Since you mentioned escape analysis: would it be possible to optimize x ++ y to avoid copying of x (just change the last tail pointer to point to y), if x is not used anywhere else? Does GHC do this?" -- Roman Cheplyaka, Feb 27 '11 at 13:47

"@Roman: No, GHC's escape analysis only applies to local let-bindings. There is no interprocedural analysis going on. For your example, you'd need to statically ensure that there is no other pointer anywhere in the heap that points to x or any of its successors. You'd need linear or uniqueness types to prove something like that." -- nominolo, Feb 27 '11 at 14:00

-- http://stackoverflow.com/questions/5132350/how-do-haskell-compilers-decide-whether-to-allocate-on-the-heap-or-the-stack#comment5759432_5132862
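
To make the sharing hazard concrete, a sketch in C with made-up types (GHC heap objects are not literally C structs): in-place append just redirects the last tail pointer, but the mutation is visible through every other alias of x or of any of its suffixes, which is why uniqueness/linearity must be proven first:

    #include <stdio.h>

    typedef struct Cell { int head; struct Cell *tail; } Cell;

    /* In-place x ++ y: walk to the end of x and redirect its last
       tail pointer to y. No copying -- but it mutates x, so it is
       only sound if x and all of its suffixes are reachable from
       nowhere else (the uniqueness condition from the comment above). */
    static Cell *append_in_place(Cell *x, Cell *y) {
        if (!x) return y;
        Cell *p = x;
        while (p->tail) p = p->tail;
        p->tail = y;                 /* observable through any alias! */
        return x;
    }

    int main(void) {
        Cell c2 = {2, NULL}, c1 = {1, &c2};   /* x = [1,2] */
        Cell d1 = {9, NULL};                  /* y = [9]   */
        Cell *z = &c2;                        /* an alias into x's spine */
        append_in_place(&c1, &d1);
        /* z's list silently grew: it now sees 2 -> 9 */
        printf("%d %d\n", z->head, z->tail ? z->tail->head : -1);
        return 0;
    }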

https://en.wikipedia.org/wiki/Hash_consing

https://shipilev.net/jvm-anatomy-park/

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/hoopl-haskell10.pdf

---

"some keywords to get you started are “constant propagation”, “common subexpression elimination”, “loop invariant code motion”, “global value numbering”, “strength reduction”, “scalar replacement of aggregates”, “dead code elimination”, and “loop unrolling”." -- [1]

---

the "performance" section of this blog post talks about some late-stage language optimizations that were particularly helpful for one application:

https://blogs.msdn.microsoft.com/dotnet/2018/08/20/bing-com-runs-on-net-core-2-1/

---

https://v8.dev/blog/v8-lite

---

example:

http://blog.kevmod.com/2020/05/python-performance-its-not-just-the-interpreter/ discussion: https://news.ycombinator.com/item?id=23235930

---

JIT

https://carolchen.me/blog/jits-intro/ https://carolchen.me/blog/jits-impls/ https://blog.erlang.org/a-first-look-at-the-jit/

---

"Most of the heavy-duty optimizations in modern industrial compilers happen on the linear IR representation anyway." [2]

---

pizlonator:

The lack of condition codes is a big deal for anyone relying on overflow checked arithmetic, like modern safe languages that do this for all integer math by default, or dynamic languages where it’s necessary for the JIT to speculate that the dynamic “number” type (which in those languages is either like a double or like a bigint semantically) is being used as an integer.

RISC-V means three instructions instead of two in the best case. It requires five or more instead of two in bad cases. That’s extremely annoying since these code sequences will be emitted frequently if that’s how all math in the language works.

Also worth noting, since this always comes up, that these things are super hard for a compiler to optimize away. JSC tries very aggressively but only succeeds a minority of the time (we have a backwards abstract interpreter based on how values are used, a forward interpreter that uses a simplified octagon domain to prove integer ranges, and a bunch of other things - and it’s not enough). So, even with very aggressive compilers you will often emit sequences to check overflow. It’s ideal if this is just a branch on a flag because this maximizes density. Density is especially important in JITs; more density means not just better perf but better memory usage since JITed instructions use dirty memory.
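
For reference, the checked-arithmetic pattern under discussion can be written in GCC/Clang C with __builtin_add_overflow; a minimal sketch (not JSC's code):

    #include <stdbool.h>
    #include <stdio.h>

    /* Overflow-checked addition, the pattern a safe language emits for
       every '+'. __builtin_add_overflow (a GCC/Clang builtin) performs
       the add, stores the wrapped result, and returns true on overflow.
       On x86 the check is typically add + jo, branching on the overflow
       flag; without condition codes the same check needs extra
       comparison instructions, which is the complaint above. */
    static bool checked_add(int a, int b, int *out) {
        return !__builtin_add_overflow(a, b, out);
    }

    int main(void) {
        int r;
        if (checked_add(2000000000, 2000000000, &r))
            printf("ok: %d\n", r);
        else
            printf("overflow: fall back to bignum\n");
        return 0;
    }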


tom_mellior:

> JSC [...] a forward interpreter that uses a simplified octagon domain to prove integer ranges

Off-topic, but could you point me to more details on this?

...

pizlonator:

I didn’t call it octagon when I wrote it because I didn’t know that I reinvented a shitty/awesome version (shitty because it misses optimization opportunities, awesome because by missing them it converges super fast while still nailing the hard bits).

Look at DFGIntegerRangeOptimizationPhase.cpp

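JSC's actual phase is in the file named above; as a generic sketch of a forward integer-range analysis (a plain interval domain here, simpler than octagon, which additionally bounds differences x - y between variable pairs; all names made up):

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Toy interval domain: each value is tracked as [lo, hi]. The
       compiler may delete an overflow check when interval arithmetic
       proves the result stays within the machine-integer range. */
    typedef struct { int64_t lo, hi; } Range;

    /* Abstract addition: add the bounds (done in 64-bit so the toy
       itself cannot overflow for 32-bit inputs). */
    static Range range_add(Range a, Range b) {
        return (Range){ a.lo + b.lo, a.hi + b.hi };
    }

    /* Join (union) of two ranges, used where control flow merges. */
    static Range range_join(Range a, Range b) {
        return (Range){ a.lo < b.lo ? a.lo : b.lo,
                        a.hi > b.hi ? a.hi : b.hi };
    }

    /* Can the overflow check for a 32-bit add be eliminated? */
    static bool add_cannot_overflow(Range a, Range b) {
        Range r = range_add(a, b);
        return r.lo >= INT32_MIN && r.hi <= INT32_MAX;
    }

    int main(void) {
        Range i = {0, 1000};   /* e.g. a loop counter with proven bounds */
        Range c = {1, 1};      /* the constant 1 */
        printf("i + 1 needs overflow check: %s\n",
               add_cannot_overflow(i, c) ? "no" : "yes");
        return 0;
    }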

---

about alloca:

https://www.gnu.org/software/libc/manual/html_node/Advantages-of-Alloca.html https://www.gnu.org/software/libc/manual/html_node/Disadvantages-of-Alloca.html https://stackoverflow.com/questions/1018853/why-is-the-use-of-alloca-not-considered-good-practice
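
In short, a sketch of the trade-off those links describe: alloca carves the buffer out of the caller's stack frame, so deallocation is free and automatic, but there is no failure indication and a large or unbounded size simply crashes the stack.

    #include <alloca.h>
    #include <stdio.h>
    #include <string.h>

    /* alloca: a stack-pointer bump, freed automatically when the
       function returns. No NULL on failure: an oversized request
       overflows the stack, which is the core objection in the links
       above. Never use it with untrusted sizes or inside a loop. */
    static void greet(const char *name) {
        size_t n = strlen(name) + 8;
        char *buf = alloca(n);           /* lives until greet returns */
        snprintf(buf, n, "hello, %s", name);
        puts(buf);
    }                                    /* buf released here */

    int main(void) { greet("world"); return 0; }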

---

https://www.amazon.ca/Advanced-Compiler-Design-Implementation-Muchnick/dp/1558603204

---

" What are the most valuable GCC optimizations for x86-64?

GCC-9.0, i7-9700K under FC29 SPECInt2000 Est. GCC -O2 GCC -O0 + simple RA + combiner -fno-inline 5458 4342 (80%) -finline 6141 4339 (71%) " -- [3]

---

see "Major Types of Optimizations" slide for a good intro to some optimizations:

https://passlab.github.io/CSCE513/notes/lecture03_ISA_Principles.pdf

---

Which optimizations are most important?

"The ~8 passes to write if you're going to bother. •Inline, Unroll (& Vectorize), CSE, DCE, Code Motion, Constant Fold, Peephole ... Many compilers just do those, get ~80% best-case perf. " -- [4]

---

QBE does at least:

"Copy elimination. Sparse conditional constant propagation. Dead instructions elimination. Registerization of small stack slots. Split spiller and register allocator thanks to SSA form. (Simpler and faster than graph coloring.) Smart spilling heuristic based on loop analysis. Linear register allocator with hinting. "

---

" MIR project competitors

I've been studying JITs for a long time. Before starting the MIR project, I thought about adapting the existing code. Here is a comparison of MIR with the simple compiler projects I considered:

Project SLOC License IR type Major Optimizations Output MIR 16K C MIT non-SSA Inlining, GCSE, SCCP, RA, CP, DCE, LA Binary code DotGNU? LibJIT? 80K C LGPL non-SSA Only RA and primitive CP Binary code .NET RyuJIT? 360K C++ MIT SSA MIR ones minus SCCP plus LICM, Range Binary code QBE 10K C MIT SSA MIR ones plus aliasing minus inlining Assembler LIBFirm 140K C LGPL2 SSA RA, Inlining, DCE, LICM, CP, LA, Others Assembler CraneLift?