books-programmingLanguages-programmingLanguagesChTradeoffsImpl

Table of Contents for Programming Languages: a survey

Chapter : Why high-level languages tend to be slower

Aside: algorithms can matter more than languages

"It has been a long-known lesson in programming that different languages for the same algorithm give you factors of a few in improvement (let's say 5-10x here), but that improved algorithms give you orders of magnitude improvement (10-100x). " -- http://danbricklin.com/log/2013_06_19.htm#jsspeed

however some disagree:

"With an interpreted language, either you are lucky and have an efficient routine that does what you need, or...the two orders of magnitude paid up-front are hard to recover with algorithmic improvements" -- http://web.archive.org/web/20090721080837/http://eigenclass.org/hiki/wide-finder-conclusions

layers of abstraction

In general, whenever one writes a program on a higher layer of abstraction which is executed on some intermediate layer platform, there is an efficiency penalty. For example, a program running on a VM will almost never be faster, and will often be slower, than the same program running on the underlying hardware, for the same reason that running an algorithm on top of an emulator can be slower than running the same algorithm directly on the underlying hardware that the emulator is running on.

Many high level languages use VMs.

late binding

If late binding is used, then a method call must be resolved at runtime to the actual code that will be executed. This resolution is an extra step that an early-bound call does not pay.

function calling

(antidote: inlining)

overuse of hashes and string allocations

https://speakerdeck.com/alex/why-python-ruby-and-javascript-are-slow

todo

GC

or reference counting

bounds checking

runtime type checking

runtime evaluation strategy

e.g. lazy

e.g. Prolog

runtime multitasking

e.g. greenthreads

GILs

not making use of platform primitives

???

" Managed languages made deliberate design tradeoffs to optimize for programmer productivity even when that was fundamentally in tension with, and at the expense of, performance efficiency… In particular, managed languages chose to incur costs even for programs that don’t need or use a given feature; the major examples are assumption/reliance on always-on or default-on garbage collection, a virtual machine runtime, and metadata. But there are other examples; for instance, managed apps are built around virtual functions as the default, whereas C++ apps are built around inlined functions as the default, and an ounce of inlining prevention is worth a pound of devirtualization optimization cure. " -- http://herbsutter.com/2012/04/02/reader-qa-when-will-better-jits-save-managed-code/

metadata

see the metadata subsection in the next chapter, on memory; following pointers through these larger memory structures takes time, too

computed goto

if you are writing an interpreter, commonly the structure will involve a dispatcher which looks at the next instruction to be executed, executes a segment of code corresponding to that instruction, and then returns to the dispatcher. You might think that the best way to do this would be a switch statement on the next instruction. But actually the best way is to build a dispatch table and then, at the end of each instruction-specific code segment, look at the next instruction, look up the address to jump to in the dispatch table, and jump to it directly. There are two reasons this is faster. First, a switch statement incurs bounds-checking overhead; if you know beforehand that the next instruction is a valid instruction that appears somewhere in the dispatch table, then you don't have to pay for this check. Second, most hardware branch predictors will work better, because they can associate separate prediction state with each instruction's jump site, rather than lumping everything into the single indirect jump of the switch statement.

Empirically, this is about 20% faster. Implementations of Python, Ruby, and Java (on Dalvik, the Android Java VM) use this. But many high level languages don't provide a computed goto instruction, so you can't do this in them. Writing an interpreter is a niche need, but an important one for platform metaprogrammability.

todo how close is tailcall to this?

links: http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/ http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Labels-as-Values.html http://www.emulators.com/docs/nx25_nostradamus.htm

Chapter : Why high-level languages tend to use more memory

GC

metadata attached to values

e.g. type information, thunks

boxing, e.g. linked list vs. array

prevention of inlining object members into the object struct

prevention of e.g. intrusive linked lists

Chapter : Why some languages take a long time to compile

todo

Case studies

scala

Chapter : Constraints that performance considerations of the language and the toolchain impose on language design

Whole program analysis

other stuff that Go does (later: what?!?)

a sequence point is "a point in the program's execution sequence where all previous side effects SHALL have taken place and all subsequent side-effects SHALL NOT have taken place" -- http://www.slideshare.net/olvemaudal/deep-c

if each computation step is a sequence point, then the programmer doesn't have to think too much about what is going on. But if there are few sequence points, there is more room for optimization. Similarly, if the order of evaluation of expressions is unspecified, then the compiler can optimize more.

something about memory model and reordering of statements e.g. java volatile

if you pad data structures so that fields fall on word (alignment) boundaries, you get better performance

what else?