Table of Contents for Programming Languages: a survey

Implementation: Internal representations

Naming identifiers

The problem

This seems like it should not even be an issue. Clearly, identifiers are uniquely identified in the source code, so what is the problem? The problems are:

in the source code, identifiers are identified by strings; but it is too slow to deal with strings at runtime
strings in the source code are only unique within a given context
some languages allow the dynamic generation of new identifiers (eg gensym)
some languages allow the dynamic access (reflection) to the string names of some or all identifiers
some intermediate languages may want to have a fixed-length representation for identifiers

To recapitulate the previous list, in the source code identifiers are strings. We could therefore satisfy the semantics of the language by implementating all variable storage as lookup tables whose keys are strings. There would be a different table for each context (for example, each lexical scope would have its own table), and sometimes the language implementation may have to search multiple tables for one variable access (for instance, for lexically scoped closures). We would have a standard 'mangling' and escaping procedure to create unique dynamically-named identifiers (for example, we could say that all identifiers starting with '_' (one underscore) are dynamically-created, we could never dynamically create any identifier starting with '__' (two underscores), and if the source code contains any identifiers starting with one underscore, we substitute the one underscore with two underscores).

This works but it could be incredibly slow. Clearly, we don't have enough memory to pre-allocate a dense lookup table with one empty slot for every possibly identifier string, so we have to use a spare table representation such as a hash table. But this means that accesses to the values in the table, that is to say every variable access, takes a significant amount of time. So, for example, let's imagine a program that spends most of its time inside the following loop:

multiply integer x by integer y i = y - 1 result = x while i > 0: result = result + x i = i - 1

consider the statement "i = i - 1"; this would not be executed as a single assembly language instruction; rather, there would be a hashtable lookup into a table of variables to determine the memory location for the string 'i' (possibly multiple hashtable lookups if i is not in the current scope but is in a lexically enclosing scope).

This slowness seems unnecessary, because if you were hand-compiling this into assembly, you would keep i in a register and do i - 1 in a single statement. Even if you kept i in a stack frame, you would only take a few assembly statements to increment i. But looking up "i" in a hashtable could take very many assembly statements.

Identifiers as integers

Typically, for this reason, languages internally map identifiers to integers, not strings. Then variables are stored in a dense table (by this i mean that the table actually has a place in memory to store the value corresponding to each consecutive integer, up the some upper bound, typically the number of identifiers present in some lexical context at compile time). Now lookup into this table is cheap.

What about gensyms? One could have the storage area dynamically grow based on how many gensyms have been dynamically created. This could be the same table as for other identifiers, or it could be a special secondary table.

What about reflection? One could keep around tables of string identifiers in the source code, mapped to their corresponding integer identifier values. Lookup into these tables would be expensive, but this expense would only need to be incurred at the time the reflection operations were executed.

But, there is more than one lexical context in most programs. And, sometimes execution in one context may be permitted to reference values from some other context (for example, languages with both local and global variables; or languages which allow expressions to reference variables defined in enclosing lexical scopes). So we have to think about how many of these variable storage tables exist, to which table a given identifier in the source code is mapped, in what order the tables are searched, and how, where, and when the tables are stored.

These issues overlap with the implementation of control flow (see also Chapter [Self-proj-plbook-plChControlImpl]).

Local variables, activation records, lexical context, dynamic context, stack frames

By 'lexical context' is meant the lexical position within the source code of a given statement. By 'dynamic context' is meant control flow state at some time during program execution (including, for example, the call stack, which specifies the nested set of subroutine calls which the program is currently 'within').

Consider a statically-typed imperative language which only has local, lexically-scoped variables; does not permit the access of local variables in any enclosing lexical scopes; is single-threaded; does not permit reflection or dynamic creation of variables; and does not permit program state to be 'saved' and later restored; and explicitly returns function results.

In such a language:

at any point during execution, the point of execution exists within one single lexical context, and the mapping of identifiers names to variables depends only on that lexical context
the string names of all possible identifiers in each context are statically known, and are not needed at runtime
when the program leaves a subroutine, any variable values assigned during that subroutine can be thrown away, except for those explicitly returned as function results.

In this situation, we could implement the language by having the compiler generate, for each lexical context, a static mapping from string identifier names to integers; and then throwing away the string identifier names and keeping only the integers (detail: for purposes of printing useful error messages in case of a runtime error, we might want to keep the mapping between the integers and the original string names at runtime anyways).

At runtime, as the flow of control enters each lexical context, memory would be allocated for all of the variables defined in that lexical context (note that in a recursive process, the same lexical context may be re-entrantly entered multiple times, meaning that it's possible that a new copy of all local variables associated with a lexical scope be created upon each recursive entrance to the same lexical scope; we say "for each 'dynamic context', there is a copy of all of these variables"). As we have static typing, the amount of memory needed to store each variable is statically known. As to how the identifier integers map to the storage locations of the corresponding variables, we have a few choices.

One idea that can't work is to have the identifier integers actually be absolute pointers (within the address space of the executing program's process). This can't work because if we recursively re-enter a procedure, we need the same identifier to now refer to a different memory location.

(A) One choice is to have the identifier integers be pointer offsets from some base address that varies dynamically.

(B) Another choice is to add a layer of indirection; for each dynamic context there is a table of pointers, one for each variable; identifier integers are indices in this table; to access a variable's value, first look up the pointer in this table, then follow that pointer to find the value.

An advantage of (A) is that we don't have indirection; that is, we don't have to do a second pointer lookup to access variables; that is, we only have one memory access instead of two.

An advantage of (B) is that each consecutive integer refers to a new variable, regardless of actual variable storage size; and also that we DO have indirection, which may be useful for memory management by allowing the memory manager to move around the physical location of variable storage during program execution.

(C) Another choice is to combine (A) and (C); for variables whose storage space requirements are small (for example, no larger than a pointer), put the values themselves in the lookup table; for other variables, store the values elsewhere and put a pointer to the actual value in a lookup table.

The scheme in (C) could be expanded to also force storage outside of the lookup table when the variable's value will be returned from the current function; preventing the need to copy values out of local variables and into a return parameter upon function return.

In each of these three schemes, we create a table of variables for each function invocation. These tables are called __activation records__ (or sometimes __activation frames__).

In a language in which the only activation records that ever need to be accessed are ones corresponding to the currently executing function, and to functions which called it (and so are still present on the call stack), and in which activation records never need to be 'saved' eg in a closure, we can store activation records in the stack. When we do this, these activation records are called __stack frames__.

saguaro stacks, closures

In some languages the variables corresponding to some dynamic and lexical context can persist longer than the corresponding entry on the call stack. For example, in languages with 'closures', one can create a first-class function value and return it; the first-class function value however can retain a reference to variables in enclosing lexical scopes. This necessitates continuting to store those variables even after the function in whose lexical scope those variables are returns, as long as the first-class function value exists. There could be multiple distinct first-class function values with references to variables in the same enclosing lexical scope; in some languages, the first-class functions, if called, can alter these values, and other distinction first-class functions can see these changes, if they are later called.

One way to represent this is to allocate the activation records somewhere other than the stack (ie, the 'heap', which is often used to mean all of allocatable memory except for the stack). In this case activation records are not the same as stack frames. If the language allows functions to reference enclosing lexical contexts, then these activation records will have, conceptually, a tree-like structure, representing the lexical enclosing relation (where the root is the outermost lexical context, the entire program or module or source code file). Resolving a variable access may require walking up the tree, starting from the currently executing location, and going towards the root. Due to this tree-like structure, this is sometimes referred to as a 'saguaro stack'.

namespaces, namespace concatenation vs explicit specification

What about a language with both local and global variables? This is an example in which variables exist in two different namespaces.

There could be a local variable named 'x' and also a global variable named 'x' (depending on the language, a reference to 'x' may implicitly refer to the local variable if one with the name exists, and a global variable otherwise; in which case, if both exist, we say that the global variable is 'shadowed', eg inaccessable by the usual route due to another variable taking its place).

If we map identifiers to integers, how do different namespaces map to integers?

One alternative is to map strings to integers at compile time, and map the same string in different namespaces to the same integer, and then do resolution at runtime.

Another alternative is to give each namespace a number, and to resolve each identifier at compile-time to a pair of integers (namespace, identifier). For example, the local namespace could be '0' and the global namespace could be '1'; global variable 'x' might be (1, 0).

Another alternative is to create a composite namespace of identifier integers where various integers within this namespace reference identifiers in different original namespaces. For example, local variable 'x' could be 0, local variable 'y' could be 1, and global variable 'x' could be 2, and global variable 'g' could be 3. The original namespaces could correspond to contiguous regions in the composite namespace (as in the previous example, where the local variables were the region 0 thru 1, and the global variable(s) were the region of 2 thru 3); or not (for example, if we started with the previous example but then also said that local variable 'l' was 4). If there are different namespace types, and many namespaces of that type, one might use a triplet, (namespace type, namespace, id); for example, if each lexical scope has its own namespace.

Because different types of namespaces (such as local variables vs. global) might persist for different amounts of time, these might be implemented as separate lookup tables.

Out of these alternatives, the runtime-resolution alternative preserves the structure of the source code, but requires more work at runtime; giving each namespace a number is clear and admits efficient runtime access, but increases the number of bits needed to store each reference; and creating a composite namespace allows a compact representation of identifiers but may require more work at runtime to determine to which actual variable storage table a given identifier integer corresponds.

Note that many language implementations adopt an implict version of the numbered namespace choice; instead of giving the namespaces numbers, they simply use different instructions to reference each one, for instance the separate GETUPVAL and GETGLOBAL in the Lua 5.1 VM.

How many bits for various things?

For example, if identifiers are being mapped to integers, how many bits do we give each one? This sets an upper bound on how many distinct identifiers we can handle within one context. We could have the size of identifier integers be arbitrarily large and vary per-program, but this would make language implementation more difficult and less efficient.

Often embedded sytems have lower limits. Sometimes special-purpose languages have lower limits. Newer implementations often have higher limits.

The following numbers are approximate (eg we say "64k" for 65534).

Here is the maximum number of identifiers/variables set by various languages and language implementations (possibly 'within one block' etc):

Visual Basic 1998: 64k [1]
Stata: 2k [2]
SAS pre 9.1: 32k [3]
C#: 64k [4]
SQL IBM DB2: 4k "Maximum number of variables and constants in an SQL statement " [5]
ANSI C 1988: 127 ANSI C X3.???-1988 section 2.2.4.1
C++ 1996: 1024 [6]
Java Card (embedded): 255 [7]
Java/JVM: 64k [8]
Lua: 200 [9] "By default, Lua has a maximum stack frame size of 250. This is encoded as MAXSTACK in llimits.h. The maximum stack frame size in turn limits the maximum number of locals per function, which is set at 200, encoded as LUAI_MAXVARS in luaconf.h. Other limits found in the same file include the maximum number of upvalues per function (60), encoded as LUAI_MAXUPVALUES, call depths, the minimum C stack size, etc. Also, with an sBx field of 18 bits, jumps and control structures cannot exceed a jump distance of about 131071."
emacs' elisp: used to be 1000 [10], is now 1300 [11]

Another common limit is the maximum number of parameters in a function call:

SQL IBM DB2 9.7: 32k "Maximum number of parameters in a procedure with LANGUAGE SQL" [12]
SQL IBM DB2 9.7: 90 "Maximum number of parameters in an external procedure with PROGRAM TYPE SUB" [13]
ANSI C 1988: 31 ANSI C X3.???-1988 section 2.2.4.1
C++ 1996: 256 [14]
Java Card (embedded): 255 [15]
Python: 255 [16]
Java/JVM: 255 [17]

Another common limit is the maximum number of fields in a structure:

ANSI C 1988: 127 ANSI C X3.???-1988 section 2.2.4.1
C++ 1996: 16k data members in a single class, structure, or union [18]
C++ 1996: 4k members declared in a single class [19]
Java Card (embedded): 255 [20]
Java/JVM: 64k [21]

Some languages have a maximum nesting level:

SQL IBM DB2 9.7: 64 "Maximum number of nested levels for routines" [22]
ANSI C 1988: 15 nesting levels of compound statements, iteration control structures, and selection control structures ANSI C X3.???-1988 section 2.2.4.1
ANSI C 1988: 32 expressions nested by parentheses within a full expression ANSI C X3.???-1988 section 2.2.4.1
C++ 1996: 256 Nesting levels of compound statements, iteration control structures, and selection control structures [23]
C++ 1996: 256 Nesting levels of parenthesized expressions within a full expression [24]

Some language have a maximum number of (significant) characters in identifier names:

SQL IBM DB2 9.7: 128 "Identifier in an SQL procedure" [25]
Visual Basic 2015: 1023 [26] [27]
Stata: 32 [28]
SAS: 32 [29]
ANSI C 1988: 31 expressions nested by parentheses within a full expression ANSI C X3.???-1988 section 2.2.4.1
C++ 1996: 1024 [30]
Java/JVM: 64k [31]

Some language have a maximum number of characters per line:

Visual Basic 1998: 1023 [32]
Visual Basic 2015: 64k [33]
ANSI C 1988: 509 ANSI C X3.???-1988 section 2.2.4.1
C++ 1996: 64k [34]

Some language have a maximum number of characters per string constant or per varying-length string:

Visual Basic 2015: 2^31 [35] [36]
Stata 12: 244 per varying-length string [37]
Stata 13: ~2^31 per varying-length string [38]
ANSI C 1988: 509 ANSI C X3.???-1988 section 2.2.4.1
C++ 1996: 64k [39]
JVM: 64k [40]

Some language have a maximum number of array dimensions:

Visual Basic 2015: 32 [41] [42]
Java/JVM: 255 [43]

Some language have a maximum number of declarations per module, or external identifiers per module:

Visual Basic 1998: ~1500 [44]
ANSI C 1988: 511 ANSI C X3.???-1988 section 2.2.4.1
C++ 1996: 64k External identifiers in one translation unit [45]
Java Card (embedded): 255 classes and interfaces per package [46]
Java Card (embedded): 128 external packages referenced in a package [47]

Some default max stack depts:

emacs's elisp has a default max-lisp-eval-depth of 500 [48] (but i tried this just now in my emacs and got 600; and [49] says the default is 400, and used to be 300 [50])

distantly related:

The Need for Large Register Files in Integer Codes

Strings

String interning

https://en.wikipedia.org/wiki/String_interning

Numbers

See the implementation of arithmetic.

Hash tables

Links:

Ruby: https://blog.engineyard.com/2013/hash-lookup-in-ruby-why-is-it-so-fast
Comparison: http://incise.org/hash-table-benchmarks.html
Comparison: http://preshing.com/20110603/hash-table-performance-tests/

How much precision is typically desired in numeric calculations?

http://www.jpl.nasa.gov/edu/news/2016/3/16/how-many-decimals-of-pi-do-we-really-need/

https://news.ycombinator.com/item?id=11316401

If you have some time to spare, watch this talk "The astonishing simplicity of everything" by Neil Turok. He explains all this so beautifully.

https://www.youtube.com/watch?v=f1x9lgX8GaE

---

http://www.hpcalc.org/details.php?id=1693

notjpl 19 days ago

I agree, that's a poor answer by NASA director and chief engineer. Here is a better answer:

The precision used for calculations is dependent on the number of "steps" required to get to the final result. Roughly, for N repeated calculations you lose somewhere between sqrt(N) * eps to N * eps of precision (eps=2e-16 for IEEE64).

Here are some actual examples:

IEEE64 (~16 decimal digits) is OK for interplanetary navigation for few months, where relatively low accuracy is required.

With the same precision, you start to lose phase accuracy above 24 hours if you're simulating GPS constellations. You need quad precision or above for simulations > 24 hours.

For simulating planet trajectories and solar system stability (Lyapunov time of planets), IEEE64 is good for ~10 mya in the future (Neptune-Pluto Lyapunov time), IEEE128 for ~200-1000mya, above that it is recommended to use 256bit floats and above. This is assuming typically ~1000 steps per simulated orbit.

Fun fact: we know from simulations that Pluto trajectory is stable for >10G years, but unpredictable above >10M years because of chaotic (but stable) interaction with Neptune.

[1] https://en.wikipedia.org/wiki/Stability_of_the_Solar_System

cperciva 19 days ago

Here are some actual examples:

Something to add to your list of examples: During the first Gulf war, 28 US soldiers died due to accumulated rounding errors in the Patriot Missile battery computers: https://www.ima.umn.edu/~arnold/disasters/patriot.html

(This was in fact a known issue, and operators had been instructed to reboot the computers every 8 hours. Unfortunately this instruction ignored the fact that, in the field, nobody wanted to be responsible for turning off their defensive systems for a minute.)

notjpl 19 days ago

Yeah, that doesn't surprise me in the least, many high-tech military systems have MTBF/MTTF of a few hours at best. Also, that's what you get when you try to do radar time computations using 24-bit fixed point in Ada.

Back to astronomy, in many astronomy libraries (such as astropy library) computations regarding time are done using 2 doubles (about 106 bit precision). 1 double is not enough.

_brandmeyer_ also mentioned something important that I totally forgot - any trigonometric computation requires computing modulo-pi to an accuracy of 1 ulp, which requires storing PI to ~1144 bits for double precision (for numbers near pi) (see Kahan argument reduction paper).

Since Intel processsors don't reach the required precision for IEEE standard above pi/2, this modulo reduction is done in software to this day. gcc maintains a 1144 bit PI constant and does a 1144 bit modulo every time you compute a sine/cosine above pi.

TLDR - 344 decimal digits of PI are used. High-precision PI computation is surprisingly more common than we expect...

[1] http://docs.astropy.org/en/stable/time/index.html

[2] https://software.intel.com/en-us/blogs/2014/10/09/fsin-docum...

[3] https://gcc.gnu.org/gcc-4.3/changes.html

[4] http://www.csee.umbc.edu/~phatak/645/supl/Ng-ArgReduction.pd...

cperciva 19 days ago

any trigonometric computation requires computing modulo-pi to an accuracy of 1 ulp

For the trigonometric function itself, sure. For any reasonable algorithm which uses the trigonometric function, no. If find yourself computing sin(10^6), you're not really trying to compute sin(10^6); you're trying to compute sin(x) for some value of x which you know lies between 10^6(1 - epsilon) and 10^6(1 + epsilon). So the extent to which trigonometric calculations can lose precision by not doing extra-precision argument reduction, that precision was already lost in computing the unreduced argument. "

So how does the GC scan a stack to find 'managed pointers'?

Here's how CLR does it:

https://mattwarren.org/2019/01/21/Stackwalking-in-the-.NET-Runtime/ https://github.com/dotnet/coreclr/blob/master/Documentation/botr/stackwalking.md#managed-frames

" Managed Frames

Because the runtime owns and controls the JIT (Just-in-Time compiler) it can arrange for managed methods to always leave a crawlable frame. One solution here would be to utilize a rigid frame format for all methods (e.g. the x86 EBP frame format). In practice, however, this can be inefficient, especially for small leaf methods (such as typical property accessors).

Since methods are typically called more times than their frames are crawled (stack crawls are relatively rare in the runtime, at least with respect to the rate at which methods are typically called) it makes sense to trade method call performance for some additional crawl time processing. As a result the JIT generates additional metadata for each method it compiles that includes sufficient information for the stack crawler to decode a stack frame belonging to that method.

This metadata can be found via a hash-table lookup with an instruction pointer somewhere within the method as the key. The JIT utilizes compression techniques in order to minimize the impact of this additional per-method metadata.

Given initial values for a few important registers (e.g. EIP, ESP and EBP on x86 based systems) the stack crawler can locate a managed method and its associated JIT metadata and use this information to roll back the register values to those current in the method's caller. In this fashion a sequence of managed method frames can be traversed from the most recent to the oldest caller. This operation is sometimes referred to as a virtual unwind (virtual because we're not actually updating the real values of ESP etc., leaving the stack intact). "

"All these examples end up calling into the Thread::StackWalkFrames?(..) method here"

"It’s worth pointing out that the only way you can access it from C#/F#/VB.NET code is via the StackTrace? class, only the runtime itself can call into Thread::StackWalkFrames?(..) directly."

" ProcessIp?(..) here has the job of looking up the current managed method (if any) based on the current instruction pointer (IP). It does this by calling into EECodeInfo::Init(..) here and then ends up in one of:

    EEJitManager::JitCodeToMethodInfo(..) here, that uses a very cool looking data structure refereed to as a ‘[https://github.com/dotnet/coreclr/blob/release/2.2/src/inc/nibblemapmacros.h#L12-L26 nibble map]’
    NativeImageJitManager::JitCodeToMethodInfo(..) here
    ReadyToRunJitManager::JitCodeToMethodInfo(..) here"

and it looks to me like in the first case (EEJitManager?) we end up here in RangeSection?* ExecutionManager::GetRangeSection?(TADDR addr):

https://github.com/dotnet/coreclr/blob/release/2.2/src/vm/codeman.cpp#L4378-L4459

So, uh, yeah, it looks to me like it's exactly what the above blog post implies: there is some sort of simple list of code ranges and we are simply searching this list to identify which range the current frame's instruction pointer is inside!

not sure exactly what this stuff means: " Unwinding ‘JITted’ Code

Finally, we’re going to look at what happens with ‘managed code’, i.e. code that started off as C#/F#/VB.NET, was turned into IL and then compiled into native code by the ‘JIT Compiler’. This is the code that you generally want to see in your ‘stack trace’, because it’s code you wrote yourself! Help from the ‘JIT Compiler’

Simply, what happens is that when the code is ‘JITted’, the compiler also emits some extra information, stored via the EECodeInfo? class, which is defined here. Also see the ‘Unwind Info’ section in the JIT Compiler <-> Runtime interface, note how it features seperate sections for TARGET_ARM, TARGET_ARM64, TARGET_X86 and TARGET_UNIX.

In addition, in CodeGen::genFnProlog() here the JIT emits a function ‘prologue’ that contains several pieces of ‘unwind’ related data. This is also imlemented in CEEJitInfo::allocUnwindInfo(..) in this piece of code, which behaves differently for each CPU architecture: "

If it finds the code then for each frame it returns a MethodDesc? to the caller (the code that wanted to walk the stack) " Managed code, well technically ‘managed code’ that was JITted to ‘native code’, so more accurately a managed stack frame. In this situation the MethodDesc? class defined here is provided, you can read more about this key CLR data-structure in the corresponding BotR? chapter. "

---

So the upshot from the previous section is, for each frame in the stack:

there is some sort of simple list of code ranges and we are simply searching this list to identify which range the current frame's instruction pointer is inside!

what does the JVM do? "The VM identifies the method that owns the stack frame by looking up the instruction pointer value in the method code block tables. " [51]

" When an exception is thrown through a PC, we first examine which try block covers the program range including that PC. For the table- driven exception handling [55, 20], which is commonly used to im- plement exception handling, as in the Java VM [48], the compiler constructs the table that maps a range of PCs to a try block. With the table, the PC is searched for and is mapped to a try block. If a try block is found, we then examine whether the handler can catch " -- [52]

Recall that the Book of the Runtime said:

" Because the runtime owns and controls the JIT (Just-in-Time compiler) it can arrange for managed methods to always leave a crawlable frame. One solution here would be to utilize a rigid frame format for all methods (e.g. the x86 EBP frame format). In practice, however, this can be inefficient, especially for small leaf methods (such as typical property accessors).

proj-plbook-plChInternalRepresentationsImpl