"It has been a long-known lesson in programming that different languages for the same algorithm give you factors of a few in improvement (let's say 5-10x here), but that improved algorithms give you orders of magnitude improvement (10-100x). " -- http://danbricklin.com/log/2013_06_19.htm#jsspeed
However, some disagree:
"With an interpreted language, either you are lucky and have an efficient routine that does what you need, or...the two orders of magnitude paid up-front are hard to recover with algorithmic improvements" -- http://web.archive.org/web/20090721080837/http://eigenclass.org/hiki/wide-finder-conclusions
In general, whenever one writes a program on a higher layer of abstraction which is executed on some intermediate layer platform, there is an efficiency penalty. For example, a program running on a VM will almost never be faster, and will often be slower, than the same program running on the underlying hardware, for the same reason that running an algorithm on top of an emulator can be slower than running the same algorithm directly on the underlying hardware that the emulator is running on.
Many high-level languages use VMs.
If late binding is used, then a method call must be resolved at runtime to the actual code that will be executed. This resolution is an extra step at every call, and it also tends to prevent the compiler from inlining the callee.
(antidote: inlining)
see [1]
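To make the cost concrete, here is a minimal Python sketch (the Greeter class and the function names are invented for illustration): the first loop re-resolves the method at every call, while the second hoists the bound method into a local variable once, a crude manual stand-in for what an inlining/devirtualizing compiler does automatically.

  import timeit

  class Greeter:
      def greet(self):
          return "hi"

  g = Greeter()

  def late_bound(n):
      # each iteration looks up 'greet' on g at runtime (late binding)
      total = 0
      for _ in range(n):
          total += len(g.greet())
      return total

  def hoisted(n):
      # resolve the method once, outside the loop; the loop then calls
      # directly through a local variable
      greet = g.greet
      total = 0
      for _ in range(n):
          total += len(greet())
      return total

  print(timeit.timeit(lambda: late_bound(100000), number=10))
  print(timeit.timeit(lambda: hoisted(100000), number=10))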
todo
=== computation < threadlocal memory access < shared memory access < memory allocations ===
In general, computation is cheaper than thread-local memory access, which is cheaper than shared memory access, which is cheaper than memory allocation.
Of course, very long computations could take longer than memory access (todo: for example, would doing a lookup on a cached hash table with string keys take longer than an uncached access to main memory?)
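As a rough illustration, here is a Python micro-benchmark sketch (in CPython even integers are heap objects, so the numbers only loosely track the hierarchy above): a loop that only does arithmetic on locals versus the same loop allocating a small fresh object each iteration.

  import timeit

  # arithmetic on local variables, no new objects beyond cached small ints
  compute = timeit.timeit("c = a + b", setup="a, b = 1, 2", number=1000000)

  # the same trivial work, but each iteration also allocates a small list
  allocate = timeit.timeit("c = [a, b]", setup="a, b = 1, 2", number=1000000)

  print("compute :", compute)
  print("allocate:", allocate)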
Some things that tend to take a lot of time ([2]):
or reference counting
e.g. lazy
e.g. prolog
e.g. greenthreads
" Managed languages made deliberate design tradeoffs to optimize for programmer productivity even when that was fundamentally in tension with, and at the expense of, performance efficiency… In particular, managed languages chose to incur costs even for programs that don’t need or use a given feature; the major examples are assumption/reliance on always-on or default-on garbage collection, a virtual machine runtime, and metadata. But there are other examples; for instance, managed apps are built around virtual functions as the default, whereas C++ apps are built around inlined functions as the default, and an ounce of inlining prevention is worth a pound of devirtualization optimization cure. " -- http://herbsutter.com/2012/04/02/reader-qa-when-will-better-jits-save-managed-code/
see metadata subsection in next section on memory; following pointers in these larger memory structures takes time, too
" The Go authors had a pretty good article on what's wrong with Java's performance : pointers everywhere. Every last little thing that isn't a primitive type is a pointer. Everywhere, in every bit of code.
That means a "new Object()" takes up 16 bytes (8 bytes for the object, 8 for the pointer to it). That means you fill a cache line by allocating 4 objects, or 2 objects containing a single reference, or ...
So in java you should never program a line drawing loop by using 2 vectors, because 2 vectors, each with 2 32-bit ints take up 8*2 (2 pointers to the objects you're using) + 8*2 (overhead for the objects) + 4*2 (the actual data) = 40 bytes of data. No way you can fit that in registers and still use registers to actually calculate things. So instead you should use 4 ints and just forget about the objects, and even that will only work if you never call any functions.
Same loop in C/C++/Pascal/Go/... using structs takes 8 bytes (they don't keep structs on the heap), which, if necessary, fits in 1 register (granted, in practice we're talking 2 registers, but still).
People might reply to this with benchmarks, but if you actually analyse the java code where java beats or is comparable with C/C++ you're going to see zero object allocations. You're not even going to see them using bool in the extreme cases, rather they'll bitshift into ints to effectively generate packed bools (certainly in SAT benchmarks). This is not realistic java code, which would have been way slower.
Java's memory model is the main culprit at this point in time. Java can do incredible tricks with programs, and actually exposes them, enabling lots of language creativity on the JVM. But there's a pretty sizeable cost in speed and memory usage. " -- iofj
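The quote is about Java, but the same "everything is a heap object behind a pointer" overhead is easy to observe in Python with sys.getsizeof. A sketch (the Vec2 class is invented for illustration; exact byte counts vary by CPython version and platform):

  import sys

  class Vec2:
      def __init__(self, x, y):
          self.x = x
          self.y = y

  v = Vec2(3, 4)
  # the instance itself: object header plus a pointer to its attribute dict...
  print(sys.getsizeof(v))
  # ...the attribute dict is a separate heap object...
  print(sys.getsizeof(v.__dict__))
  # ...and even a plain int is a boxed heap object (roughly 28 bytes on
  # 64-bit CPython), versus 4 or 8 bytes for a machine integer
  print(sys.getsizeof(3))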
e.g. type information, thunks
boxing, e.g. linked list vs. array (see the sketch after this list)
prevention of inlining object members into the object struct
prevention of e.g. intrusive linked lists
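A small Python sketch of the boxing point (sizes are approximate and CPython-specific): a list holds pointers to individually boxed int objects, while array.array stores the same values as packed machine integers.

  import sys
  from array import array

  n = 1000
  boxed = list(range(n))          # n pointers, each to a separate int object
  packed = array('i', range(n))   # n 32-bit ints stored contiguously

  # the list's pointer table plus the boxed ints it points to
  print(sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed))
  # the packed array (payload stored inline)
  print(sys.getsizeof(packed))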
todo
See my StackOverflow question: http://stackoverflow.com/questions/28118218/rationale-for-design-choices-causing-jvm-clr-languages-to-have-long-startup/
Here is my summary answer to that question: http://stackoverflow.com/questions/28118218/rationale-for-design-choices-causing-jvm-clr-languages-to-have-long-startup/28379292#28379292
A good choice would be LL(1) syntax, or at least LL(k), with no symbol table required at parse time. This enables you to write a simple recursive descent parser or to use a wide variety of existing tools.
todo: list of languages with LL(k) grammars: Python, what else?
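As a sketch of how simple such a parser can be, here is a recursive descent parser in Python for a made-up toy grammar of integers, '+', '*', and parentheses; each parsing function decides what to do by looking at only the single next token, which is exactly what LL(1) buys you.

  import re

  def tokenize(s):
      # numbers, operators and parentheses; whitespace is skipped
      return re.findall(r"\d+|[+*()]", s)

  class Parser:
      def __init__(self, tokens):
          self.tokens = tokens
          self.pos = 0

      def peek(self):
          # the single token of lookahead that LL(1) allows
          return self.tokens[self.pos] if self.pos < len(self.tokens) else None

      def eat(self, tok):
          assert self.peek() == tok, "expected %r, got %r" % (tok, self.peek())
          self.pos += 1

      def expr(self):          # expr -> term ('+' term)*
          value = self.term()
          while self.peek() == '+':
              self.eat('+')
              value += self.term()
          return value

      def term(self):          # term -> factor ('*' factor)*
          value = self.factor()
          while self.peek() == '*':
              self.eat('*')
              value *= self.factor()
          return value

      def factor(self):        # factor -> NUMBER | '(' expr ')'
          if self.peek() == '(':
              self.eat('(')
              value = self.expr()
              self.eat(')')
              return value
          value = int(self.peek())
          self.pos += 1
          return value

  print(Parser(tokenize("2 + 3 * (4 + 1)")).expr())   # prints 17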
If code anywhere in the program can, at any time, redefine the addition operator to do something else (this occurs, for example, with some forms of 'open classes'), and if this redefinition changes the meaning of the addition operator throughout the whole program, then the compiler can't optimize very much: it must not merge or fold additions in a way that forgets where the original addition operations occurred, and it can't assume that addition has its usual properties (no side effects, deterministic, no chance of throwing an exception, associative, commutative, 0 is an identity, etc.). In addition, either the runtime must be able to at least partially recompile the rest of the program when the redefinition happens, or every addition must be compiled as an indirect function call rather than a single machine instruction.
Even if the program never actually redefines addition, if the language allows the possibility, then the compiler must prove statically (at compile time) that this particular program will not do so before it can optimize addition or use machine instructions for it. This could be tricky if the program contains any instances of 'eval' applied to a dynamically-generated string.
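A Python sketch of the problem (the Money class and function names are invented for illustration): because '+' on Money instances is looked up through the class at run time, and the class can be rewritten by any code at any time, the implementation cannot compile 'a + b' down to a fixed routine, let alone a single machine add.

  class Money:
      def __init__(self, cents):
          self.cents = cents
      def __add__(self, other):
          return Money(self.cents + other.cents)

  a, b = Money(100), Money(250)
  print((a + b).cents)        # 350

  # any code, anywhere, can later rebind the operator on the (open) class:
  def logged_add(self, other):
      print("adding!")        # '+' now has a side effect
      return Money(self.cents + other.cents)

  Money.__add__ = logged_add
  print((a + b).cents)        # the same expression now also prints "adding!"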
The less runtime metaprogramming the language permits, the less often this sort of thing comes up. If metaprogramming is possible but happens entirely at compile time (e.g. compile-time macros), the problem is avoided.
If runtime metaprogramming is permitted by the language, then consider restricting its lexical scope, so that the compiler can easily be sure that at least most of the code (the code outside the lexical scope of the runtime metaprogramming) is not affected. For example, a form of 'eval' might be provided that takes input and returns output, and may use metaprogramming within the 'eval' itself, but does not have the power to affect the interpretation of expressions outside of the 'eval'.
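Python's eval/exec come close to this when passed explicit namespace dictionaries: the evaluated code reads its inputs from, and writes its results into, those dictionaries, and cannot rebind names in the surrounding module. (A sketch of the containment idea only; a stripped __builtins__ dict is not a real security sandbox.)

  ns = {"__builtins__": {}, "x": 20}   # the only names the eval'd code can see

  print(eval("x * 2 + 1", ns))         # 41

  exec("y = x + 1", ns)                # the assignment lands in ns, not in this module
  print(ns["y"])                       # 21
  # print(y)                           # would raise NameError: the outer scope is untouched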
Another way out (or at least to 'contain the damage') is to allow metaprogramming to define new terms but never to redefine the meaning of terms already defined at compile-time.
One construct that is not often thought of as 'metaprogramming' but that can cause the same sort of problem is the 'import' statement, especially if it operates at runtime. If an import statement can redefine existing terms at runtime, then the compiler cannot finish dealing with expressions containing such a term until all of the imports that might affect those expressions have been resolved. Dynamic interpreted languages such as Ruby sometimes have import constructs of this sort [4].
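A Python sketch of the same issue (the 'helpers' module is fabricated in-process just so the example is self-contained): because the import runs at run time and rebinds an existing name, the meaning of later calls to greet cannot be pinned down until the import has executed.

  import sys, types

  def greet():
      return "hello"

  print(greet())                        # "hello"

  # stand-in for some module on disk that happens to export 'greet'
  helpers = types.ModuleType("helpers")
  exec("def greet():\n    return 'patched hello'", helpers.__dict__)
  sys.modules["helpers"] = helpers

  from helpers import greet             # runs at run time; rebinds the existing name
  print(greet())                        # "patched hello"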