This is not a book about how to implement a programming language. However, even at the earliest stages of language design, it is valuable to have some idea of how certain choices in the design of the language could make the implementation much less efficient or much more difficult to write. For this purpose, we'll provide a cursory overview of implementation topics, touching as we go on some of the language design choices that may have a large impact.
when do we convert to normal forms?
todo
" The first big phase of the compilation pipeline is parsing. You need to take your input and turn it into a tree. So you go through preprocessing, lexical analysis (aka tokenization), and then syntax analysis and IR generation. Lexical analysis is usually done with regexps. Syntax analysis is usually done with grammars. You can use recursive descent (most common), or a parser generator (common for smaller languages), or with fancier algorithms that are correspondingly slower to execute. But the output of this pipeline stage is usually a parse tree of some sort.
The next big phase is Type Checking. ...
The third camp, who tends to be the most isolated, is the code generation camp. Code generation is pretty straightforward, assuming you know enough recursion to realize your grandparents weren't Adam and Eve. So I'm really talking about Optimization...
-- http://steve-yegge.blogspot.ca/2007/06/rich-programmer-food.html
modified from Python's docs:
" The usual steps for compilation are:
Parse source code into a parse tree
Transform parse tree into an Abstract Syntax Tree
Transform AST into a Control Flow Graph
Emit bytecode based on the Control Flow Graph" -- https://docs.python.org/devguide/compiler.html#abstract

discussions on the benefits of the introduction of a 'Core' or 'middle' language in between the HLL AST and a lower-level, LLVM-ish language:
some advantages of a core language:
Links:
todo: somewhere put my observations on LLLs and target languages:
'target' languages include both 'Core languages' and LLLs
goal for:
Core languages tend to be similar to the HLL, except that they:
* are de-sugared (eg apply various local 'desugar' transformations)
* have fewer primitives than the HLL (eg de-sugaring various control flow constructs to GOTO; see the sketch after this list)
* in safe or prescriptive languages, sometimes have Core primitives that are more powerful than the HLL's (eg in Rust the MIR core language has GOTO but Rust does not)
* still retain function calling
* have explicit types
* have a bunch of other things that are explicit, so that the result is far too verbose for human use, but with control flow similar to the original HLL
* like the HLL but unlike an LLL, are still nonlinear (parenthesized subexpressions are still present)
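To make the 'de-sugar control flow to GOTO' idea concrete, here is a hypothetical sketch (in C, not taken from any real compiler) of a while loop and the GOTO-only form a Core language might reduce it to:

    /* Hypothetical illustration: de-sugaring a structured loop to GOTO form. */
    #include <stdio.h>

    /* Original (HLL-style) loop: */
    int sum_to_n(int n) {
        int sum = 0;
        int i = 0;
        while (i < n) {
            sum += i;
            i++;
        }
        return sum;
    }

    /* De-sugared (Core-style) equivalent: only tests, assignments, and gotos. */
    int sum_to_n_desugared(int n) {
        int sum = 0;
        int i = 0;
    loop_head:
        if (!(i < n)) goto loop_exit;   /* negated loop condition */
        sum += i;
        i++;
        goto loop_head;
    loop_exit:
        return sum;
    }

    int main(void) {
        printf("%d %d\n", sum_to_n(5), sum_to_n_desugared(5));   /* both print 10 */
        return 0;
    }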
LLLs and 'assembly' languages are more similar to each other than you might expect.
LLL general properties:
Main families of LLLs:
* imperative
  * stack vs. register machines
    * stack machines (eg JVM, Python's, mb Forth)
    * register machines
  * 3-operand vs 2- or 1-operand register machines
  * various calling conventions
  * addressing modes
    * LOAD/STORE vs addressing modes vs LOAD/STORE with constant argument but self-modifying code
    * explicit constant instructions (eg ADDC) vs constant addressing mode vs LOADK
  * are registers and memory locations actual memory locations upon which address arithmetic can be done (like in assembly), or just variable identifiers (like in Lua)?
* functional
  * eg https://en.wikipedia.org/wiki/SECD_machine
* combinator
  * eg Nock
Also, instruction encodings might be fixed-width or variable (most are variable).
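To make the stack-machine vs. register-machine distinction concrete, here is a hypothetical sketch of the same assignment x = (a + b) * c encoded both ways; the opcode names and instruction layouts are invented for illustration, not those of any real VM:

    /* Hypothetical encodings of x = (a + b) * c in two LLL styles.
       The opcodes and layouts are invented for illustration only. */
    #include <stdio.h>

    /* Stack machine: instructions have at most one operand; values flow
       through an implicit operand stack (JVM / CPython style). */
    typedef enum { S_LOAD, S_ADD, S_MUL, S_STORE } StackOp;
    typedef struct { StackOp op; int slot; } StackInsn;

    static const StackInsn stack_code[] = {
        { S_LOAD, 0 },   /* push a (local slot 0)      */
        { S_LOAD, 1 },   /* push b (local slot 1)      */
        { S_ADD,  0 },   /* pop two, push a+b          */
        { S_LOAD, 2 },   /* push c (local slot 2)      */
        { S_MUL,  0 },   /* pop two, push (a+b)*c      */
        { S_STORE, 3 },  /* pop result into x (slot 3) */
    };

    /* 3-operand register machine: every instruction names its destination
       and both sources explicitly (Lua / Dalvik style). */
    typedef enum { R_ADD, R_MUL } RegOp;
    typedef struct { RegOp op; int dst, src1, src2; } RegInsn;

    static const RegInsn reg_code[] = {
        { R_ADD, 3, 0, 1 },  /* r3 = r0 + r1   (a + b)     */
        { R_MUL, 4, 3, 2 },  /* r4 = r3 * r2   ((a+b) * c) */
    };

    int main(void) {
        printf("stack version: %zu instructions, register version: %zu\n",
               sizeof stack_code / sizeof stack_code[0],
               sizeof reg_code  / sizeof reg_code[0]);
        return 0;
    }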
Primitive operations in LLLs: not as much variance as you might expect. Generally:
Links regarding stack machine vs register machines:
---
Someone's response to the question "Does anyone know if there is research into the effects of register counts on VMs?":
" Someone 1819 days ago [-]
If you can map all VM registers to CPU registers, the interpreter will be way simpler.
If you have more VM registers than CPU registers, you have to write code to load and store your VM registers.
If you have more CPU registers than VM registers, you will have to write code to undo some of the register spilling done by the compiler (if you want your VM to use the hardware to the fullest.)
So, the optimum (from a viewpoint of 'makes implementation easier') depends on the target architecture.
Of course, a generic VM cannot know how many registers (and what kinds; uniform register files are not universally present) the target CPU will have.
That is a reason that choosing either zero (i.e. a stack machine) or infinitely many (i.e. SSA) are popular choices fo VMs: for the first, you know for sure your target will not have fewer registers than your VM, for the second, that it will not have more.
If you choose any other number of VM registers, a fast generic VM will have to handle both cases.
Alternatively, of course, one can target a specific architecture, and have the VM be fairly close to the target; Google's NaCl is (was? I think they changed things a bit) an extreme example. I have not checked the code, but this, I think, is similar. " -- https://news.ycombinator.com/item?id=2930109
---
whole program analysis
if modules are separately compiled and types are allowed to remain completely abstract (e.g. when a module is compiled, the in-memory representation of the values it manipulates may not yet be known), then a host of optimizations cannot be performed. Alternatives:
see section "compilation model" in "Retrospective Thoughts on BitC?" http://www.coyotos.org/pipermail/bitc-dev/2012-March/003300.html
https://news.ycombinator.com/item?id=5422094
" Some common intermediate representations:
Examples:
-- [5]
[6] also describes and motivates SSA and CPS in more detail. It defines SSA and explains why you need phi in SSA, and refers to an algorithm to determine where to place the phis.
Abstract machines and the compilers that love/hate them
Some target languages, such as WASM and SPIR-V, require some amount of structure in control flow. If the source language doesn't provide this structure, it must be introduced.
Links:
check to see if there are any reads of uninitialized variables; this could be implemented via types
issue in C:
in C, according to one of the comments on http://www.slideshare.net/olvemaudal/deep-c , it was claimed (I didn't check) that if you declare your own printf with the wrong signature, the call will still be linked to the printf in the standard library, but will crash at runtime; e.g. "void printf( int x, int y); main() {int a=42, b=99; printf( a, b);}" will apparently crash.
-- A new programming language might want to throw a compile-time error in such a case (as C++ apparently does, according to the slides).
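For reference, the claimed crash scenario from that comment, written out as a complete program (as claimed there; not verified here):

    /* Mis-declaring printf: the linker still resolves this call to the libc
       printf, but the first argument is not a format string, so the behavior
       is undefined and typically crashes at runtime. */
    void printf(int x, int y);   /* wrong signature, no #include <stdio.h> */

    int main(void) {
        int a = 42, b = 99;
        printf(a, b);            /* libc printf treats 42 as a char* format string */
        return 0;
    }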
Links:
atomics
SIMD, MIMD
GPU
"Instructions should not bind together operations which an optimizing compiler might otherwise choose to separate in order to produce a more efficient program."
tags
descriptors
stacks
cactus/saguaro stacks
random book on stack-based hardware architectures: http://www.ece.cmu.edu/~koopman/stack_computers/contents.html
trampolines
vtables (C++)
interpreter vs spec:
" The technique that we had for Smalltalk was to write the VM in itself, so there’s a Smalltalk simulator of the VM that was essentially the only specification of the VM. You could debug and you could answer any question about what the VM would do by submitting stuff to it, and you made every change that you were going to make to the VM by changing the simulator. After you had gotten everything debugged the way you wanted, you pushed the button and it would generate, without human hands touching it, a mathematically correct version of C that would go on whatever platform you were trying to get onto." -- Alan Kay, http://queue.acm.org/detail.cfm?id=1039523
metacircular interpreters
Futamura projections https://news.ycombinator.com/item?id=7061913
"
nostrademons 1 day ago
IIUC PyPy is all 3 Futamura projections.
I'm a little rusty on the details, but IIUC the core of PyPy is an "abstract interpreter" for a restricted subset of Python known as RPython. Abstract interpretation is essentially a fold (in the functional programming sense) of the IR for a language, where you can parameterize the particular operation applied to each node. When the operation specified is "evaluate", you get partial evaluation, as described in the Wikipedia article.
The interesting part of PyPy happens when the RPython program to be evaluated is itself a Python interpreter, which it is. In this case, 'prog' is the Python interpreter written in RPython, and Istatic is your Python program. The first Futamura projection will give you a JIT; you specialize the RPython interpreter with your particular program, and it evaluates everything statically known at compile time (namely, your program) and produces an executable.
The second Futamura projection will give you a JIT compiler. Remember, since RPython is a subset of Python, an interpreter for Python written in RPython could be interpreted by itself. Moreover, it could be compiled by itself by the process described in the paragraph above. When you compile your JIT through the same process, you get a program that means the same thing (i.e. it will translate a Python program into an executable) but runs faster. The PyPy JIT that everybody's excited about for running Python scripts faster is this executable.
The third Futamura projection involves running PyPy on the toolchain that generated PyPy. Remember, this whole specializer machinery is all written in RPython, which can be interpreted by itself. So when your run the specializer machinery through itself, you get - an optimized toolset for building JIT compilers. That's what made programming language theorists excited about PyPy long before the Python community was. It's not just "faster Python", but it's a toolset for building a JIT out of anything you can build an interpreter for. Write an interpreter in RPython - for anything, Ruby, PHP, Arc, whatever - and run it through the PyPy toolchain, and it will give you a JIT compiler.
reply "
https://www.semipublic.comp-arch.net/wiki/ROMmability
http://www.quora.com/What-is-the-difference-between-PyPy-Parrot-and-LLVM
http://lambda-the-ultimate.org/node/2634
Links:
"Basic block": "a portion of the code within a program with only one entry point and only one exit point." (wikipedia)
"in CPS, as in SSA or ANF, expressions don't have subexpressions." -- http://wingolog.org/archives/2014/05/18/effects-analysis-in-guile
"If an intermediate language is in SSA form, then every variable has a single definition site. Single-assignment is a static property because in a general control flow graph the assignment might be in a loop. SSA form was developed by Wegman, Zadeck, Alpern, and Rosen [4, 37] for efficient computation of data flow problems, such as global value numbering and detecting equality of variables." -- https://synrc.com/publications/cat/Functional%20Languages/AliceML/christian_mueller_diplom.pdf , Chapter 3
For example (examples taken from https://synrc.com/publications/cat/Functional%20Languages/AliceML/christian_mueller_diplom.pdf ), if the source code is: x = a + b; y = x + 1; x = a + 1;
then this is equivalent to:
x1 = a + b; y1 = x1 + 1; x2 = a + 1;
When two control flow edges join, we have to do something else. Consider
if x < 5 then v = 1 else v = 2; w = v + 1;
Here, v has two possible definition sites, violating SSA, and we don't know statically which one will be used. As a control flow graph, we have a diamond:
(if x < 5) / \ / \(v1 = 1) (v2 = 2) \ / \ / (w = v? + 1)
SSA deals with this by introducing a "magic" operator, Phi. Phi is only allowed to occur at the beginning of a basic block. Its semantics are that Phi "knows", when it executes, from which basic block control most recently arrived; it is given multiple arguments and simply returns one of them, choosing which argument to return based on which basic block control arrived from. For example, the above diamond can be replaced by:
(if x < 5) / \ / \(v1 = 1) (v2 = 2) \ / \ / (v3 = Phi(v1,v2)
(w = v3 + 1)
here, Phi will return v1 if control came to it by the left path, or v2 if control came by the right path, meaning that v3 ends up equal to v1 if the left path is taken and equal to v2 if the right path is taken. Note that the SSA property is preserved: each variable is assigned to exactly once.
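One standard way to think about (and eventually get rid of) Phi is that it stands for copies placed at the end of each predecessor block; this "phi removal" step also comes up in the Stack Overflow quote further below. A hypothetical sketch of the diamond above, written in C with explicit gotos and with the Phi replaced by such copies:

    /* The diamond above, written with explicit gotos; the Phi is simulated
       by copies placed at the end of each predecessor block, which is how
       phi nodes are typically eliminated before code generation.
       A hypothetical sketch. */
    int diamond(int x) {
        int v1, v2, v3, w;
        if (x < 5) goto left; else goto right;
    left:
        v1 = 1;
        v3 = v1;        /* copy that replaces Phi's left argument  */
        goto join;
    right:
        v2 = 2;
        v3 = v2;        /* copy that replaces Phi's right argument */
        goto join;
    join:
        w = v3 + 1;     /* in SSA this was: v3 = Phi(v1, v2); w = v3 + 1 */
        return w;
    }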
"SSA form is typically used as intermediate representation for imperative languages. The functional programming community prefers the λ-calculus and continuations as intermediate language. Andrew Appel pointed out the close relationship between the two representations in his article ”SSA is Functional Programming”" -- https://synrc.com/publications/cat/Functional%20Languages/AliceML/christian_mueller_diplom.pdf
"AliceML? uses a weakening of SSA called Executable SSA Form, which works only on acyclic (control?) graphs, and which does not require Phi nodes. See https://synrc.com/publications/cat/Functional%20Languages/AliceML/christian_mueller_diplom.pdf Chapter 3, section 3.1.1. One disadvantage of that is that "the removal of Phi-functions in the abstract code might artificially extend the liveness of a variable across branches. Suppose a left branch sets a variable at the beginning and the right branch synchronizes this variable at the end. Then, the variable is declared to be live over the whole right branch. This increases register pressure and decreases code quality for register-poor architectures."
A good, simple description of phi nodes: https://capra.cs.cornell.edu/bril/lang/ssa.html
" A program is in SSA form if: 1. each definition has a distinct name 2. each use refers to a single definition ... Main interest: allows to implement several code optimizations " -- http://www.montefiore.ulg.ac.be/~geurts/Cours/compil/2014/05-intermediatecode-2014-2015.pdf
"
Transforming a 3-address representation into stack is easier than a stack one into 3-address.
Your sequence should be the following:
Form basic blocks
Perform an SSA-transform
Build expression trees within the basic blocks
Perform a register schedulling (and phi- removal simultaneously) to allocate local variables for the registers not eliminated by the previous step
Emit a JVM code - registers goes into variables, expression trees are trivially expanded into stack operations -- answered Dec 8 '11 at 8:18, SK-logic
Wow! Thanks this is just what I was looking for. Questions: do I need to do the SSA transform within each basic block or across the whole procedure? Do you have any pointers to tutorials, textbooks or other resources? – akbertram Dec 8 '11 at 9:09 SSA transform is aways procedure-wide: en.wikipedia.org/wiki/Single_static_assignment You'll just need to find a dominance frontier for each basic block where you're assigning a variable (with multiple assignment locations), insert the phi nodes there and then get rid of the redundant phis (n.b.: some may have circular dependencies). – SK-logic Dec 8 '11 at 9:13 @akbertram, LLVM can be a useful source of inspiration here, you can safely model your intermediate representation after it. Some important design decisions from there: do not allow to assign one register to another, and do not allow to assign a constant to a register, always substitute it in place instead. – SK-logic Dec 8 '11 at 9:16 The expression tree looks alot like the original AST -- is it worth making a round-trip through a TAC-like IR or does another sort of IR make sense when the only target is the JVM? – akbertram Dec 8 '11 at 9:17 @akbertram, if you can stuff your JVM compilation before making TAC - then yes, you do not need it, a direct stack code generation is easier. Otherwise, if you can only stack on top of the existing compiler infrastructure, you'll need to re-construct the expression trees. Funny bit is that the JIT will do it again too. – SK-logic Dec 8 '11 at 9:22 " -- [7]
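The "expression trees are trivially expanded into stack operations" step is just a post-order walk of each tree. A minimal hypothetical sketch (the node layout and opcode names are invented, and real JVM code generation has many more cases):

    /* Minimal sketch: emitting stack-machine code from an expression tree
       via a post-order traversal. Node layout and opcode names are invented. */
    #include <stdio.h>

    typedef enum { N_VAR, N_ADD, N_MUL } NodeKind;
    typedef struct Node {
        NodeKind kind;
        int slot;                     /* for N_VAR: local variable slot   */
        struct Node *left, *right;    /* for N_ADD / N_MUL                */
    } Node;

    static void emit(const Node *n) {
        switch (n->kind) {
        case N_VAR:
            printf("LOAD %d\n", n->slot);   /* push the variable          */
            break;
        case N_ADD:
            emit(n->left);                  /* code for left operand      */
            emit(n->right);                 /* code for right operand     */
            printf("ADD\n");                /* pop two, push the sum      */
            break;
        case N_MUL:
            emit(n->left);
            emit(n->right);
            printf("MUL\n");
            break;
        }
    }

    int main(void) {
        /* (a + b) * c, with a, b, c in slots 0, 1, 2 */
        Node a = { N_VAR, 0, 0, 0 }, b = { N_VAR, 1, 0, 0 }, c = { N_VAR, 2, 0, 0 };
        Node sum  = { N_ADD, 0, &a, &b };
        Node prod = { N_MUL, 0, &sum, &c };
        emit(&prod);   /* prints LOAD 0, LOAD 1, ADD, LOAD 2, MUL */
        return 0;
    }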
Alternatives to Phi nodes in SSA:
Links:
See e.g.
CPS vs. SSA: http://wingolog.org/archives/2011/07/12/static-single-assignment-for-functional-programmers
"in CPS, as in SSA or ANF, expressions don't have subexpressions." -- http://wingolog.org/archives/2014/05/18/effects-analysis-in-guile
" There are a couple of things going on with CPS as implemented in this style of compilers that make it particularly nice for writing optimization passes:
1) There are only function calls ("throws") and no returns. Though there's no theoretical reduction in the stuff you have to track (since what is a direct-style return is now a throw to the rest of the program starting at the return point), there's a huge reduction in in the complexity of your optimizations because you're not tracking the extra thing. For some examples, you can look in the Manticore codebase where even simple optimizations like contraction and let-floating are implemented in both our early direct-style IR (BOM) and our CPS-style IR (CPS). The latter is gobs more readable, and there's no way at all I'd be willing to port most of the harder optimizations like reflow-informed higher-order inlining or useless variable elimination to BOM.
2) IRs are cheap in languages like lisp and ML. So you write optimizations as a tree-to-tree transformation (micro-optimization passes). This style makes it much easier to enforce invariants. If you look at the internals of most compilers written in C++, you'll see far fewer copies of the IR made and a whole bunch of staged state in the object that's only valid at certain points in the compilation process (e.g., symbols fully resolved only after phase 2b, but possibly invalid for a short time during optimization 3d unless you call function F3d_resolve_symbol...). Just CPS'ing the representation of a C++ compiler without also making it easy to write your optimization passes as efficient tree-to-tree transformations will not buy you much, IMO. " -- [10]
" So the lineage of the CPS-as-compiler-IR thesis goes from Steele's Rabbit compiler through T's Orbit to SML/NJ. At which point Sabry & Felleisen at Rice published a series of very heavy-duty papers dumping on CPS as a representation and proposing an alternate called A-Normal Form. ANF has been the fashionable representation for about ten years now; CPS is out of favor. This thread then sort of jumps tracks over to the CMU ML community, where it picks up the important typed-intermediate-language track and heads to Cornell, and Yale, but I'm not going to follow that now. " -- http://www.paulgraham.com/thist.html
?
"in CPS, as in SSA or ANF, expressions don't have subexpressions." -- http://wingolog.org/archives/2014/05/18/effects-analysis-in-guile
http://lambda-the-ultimate.org/node/1617#comment-19700
http://lambda-the-ultimate.org/node/69
see Optimizing compilers for structured programming languages by Marc Michael Brandis
https://github.com/cfallin/rfcs/blob/cranelift-egraphs/accepted/cranelift-egraph.md
Many large books have been filled with the details of writing efficient compilers and interpreters, so this chapter will only provide an overview of selected techniques.
Calling a function typically involves manipulating the stack: upon entry, you push the return address and the caller's frame pointer onto the stack, push the arguments, adjust the top-of-the-stack pointer register, and adjust the frame pointer register to point to the new current stack position.
You can save this overhead by inlining.
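A hypothetical sketch of the per-call bookkeeping described above, as a toy VM might do it (all names are invented for illustration):

    /* Toy sketch of what a call does to the stack: save the return address
       and the caller's frame pointer, copy the arguments, bump the pointers.
       All names here are invented for illustration. */
    #include <string.h>

    typedef struct {
        int stack[1024];
        int sp;   /* index of next free slot (stack grows upward here) */
        int fp;   /* base of the current frame                         */
    } VM;

    static void vm_call(VM *vm, int return_addr, const int *args, int nargs) {
        vm->stack[vm->sp++] = return_addr;      /* push return address       */
        vm->stack[vm->sp++] = vm->fp;           /* push caller's frame ptr   */
        memcpy(&vm->stack[vm->sp], args, nargs * sizeof(int));  /* push args */
        vm->sp += nargs;
        vm->fp = vm->sp - nargs;                /* new frame starts at the args */
    }

    static int vm_return(VM *vm) {
        vm->sp = vm->fp - 2;                    /* drop args and saved slots  */
        vm->fp = vm->stack[vm->sp + 1];         /* restore caller's frame ptr */
        return vm->stack[vm->sp];               /* resume at the return address */
    }

    int main(void) {
        VM vm = { {0}, 0, 0 };
        int args[2] = { 42, 99 };
        vm_call(&vm, 123, args, 2);        /* "call" with return address 123 */
        int resume_at = vm_return(&vm);    /* unwinds the frame              */
        return resume_at == 123 ? 0 : 1;
    }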
Some languages present a computation model in which there is only a stack, no registers. In this case, assuming that the underlying hardware has registers, it may speed things up to use some of the registers to hold the top few stack positions.
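A minimal sketch of that idea (a hypothetical interpreter fragment, not taken from any real VM): the top of the operand stack is cached in a local variable, which the C compiler will usually keep in a machine register, so only deeper stack entries touch memory.

    /* Hypothetical interpreter fragment: the top-of-stack value is cached in
       the local 'tos', which the C compiler can keep in a machine register;
       only the second-from-top and deeper entries live in memory. */
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

    static void run(const int *code) {
        int stack[256];
        int sp = 0;        /* entries in memory, below the cached top */
        int tos = 0;       /* cached top-of-stack                     */
        for (;;) {
            switch (*code++) {
            case OP_PUSH:  stack[sp++] = tos; tos = *code++; break; /* spill old top  */
            case OP_ADD:   tos = stack[--sp] + tos; break;          /* no memory write */
            case OP_MUL:   tos = stack[--sp] * tos; break;
            case OP_PRINT: printf("%d\n", tos); break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void) {
        /* (2 + 3) * 4 */
        int code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL,
                       OP_PRINT, OP_HALT };
        run(code);   /* prints 20 */
        return 0;
    }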
Dynamic loading: Languages with dynamic loading can experience slow startup times if they:
See also relevant section of [11].
Links:
Links:
Languages without a canonical implementation tend to:
Scheme started out as an implementation, was described in a series of published papers (the Lambda Papers), turned into a standard, however
Haskell started out as a standard (todo confirm), then many implementations sprang up, then one won out (GHC). As of this writing, GHC has stopped maintaining conformance with the language standard (except while running in a special mode that is not interoperable with most libraries in the wild; see [12] ), and discussions regarding changing the language appear to revolve around changing GHC first, with updates to the standard to bring it into conformance with GHC treated as an afterthought [13].
benefits, costs (lots of popular languages don't)
portability
kernel approach
There is always the 'chain solution': every self-hosting language had to be bootstrapped from another language at the beginning, and each subsequent self-hosting version of a self-hosting language had to be compilable by the previous version. So, to bootstrap the language onto a new platform, one could simply re-bootstrap that same early version of the language, and then re-compile each version of the language's compiler in sequence (see eg [14]). Disadvantages to this include: (a) if you are worried about Thompson 'trusting trust' attacks (see section below), then you must audit the source code of each version of the language implementation in this chain; and (b) this seems like a lot more compilation steps than should be necessary.
Another option is for the language to deliberately restrict its own implementation to using code compatible with an earlier version of itself (eg Golang [15]). This removes some of the benefits of a self-hosting language in terms of using the compiler as a way for the language designers to eat their own dogfood.
various stories of standards processes and advice on what to do if you find yourself involved in a standards process:
Ken Thompson wrote a popular essay in which he pointed out that a compiler could be subverted to introduce attacker-desired behavior into the programs it compiles; if the subversion is clever, then such a compiler, when compiling itself from source, would continually re-introduce the subversion.
Links:
" 1.4 WHY ARE STACKS USED IN COMPUTERS?
Both hardware and software stacks have been used to support four major computing areas in computing requirements: expression evaluation, subroutine return address storage, dynamically allocated local variable storage, and subroutine parameter passing.
1.4.1 Expression evaluation stack
Expression evaluation stacks were the first kind of stacks to be widely supported by special hardware. As a compiler interprets an arithmetic expression, it must keep track of intermediate stages and precedence of operations using an evaluation stack. In the case of an interpreted language, two stacks are kept. One stack contains the pending operations that await completion of higher precedence operations. The other stack contains the intermediate inputs that are associated with the pending operations. In a compiled language, the compiler keeps track of the pending operations during its instruction generation, and the hardware uses a single expression evaluation stack to hold intermediate results.
To see why stacks are well suited to expression evaluation, consider how the following arithmetic expression would be computed:
X = (A + B) * (C + D)
First, A and B would be added together. Then, this intermediate results must be saved somewhere. Let us say that it is pushed onto the expression evaluation stack. Next, C and D are added and the result is also pushed onto the expression evaluation stack. Finally, the top two stack elements (A+B and C+D) are multiplied and the result is stored in X. The expression evaluation stack provides automatic management of intermediate results of expressions, and allows as many levels of precedence in the expression as there are available stack elements. Those readers who have used Hewlett Packard calculators, which use Reverse Polish Notation, have direct experience with an expression evaluation stack.
The use of an expression evaluation stack is so basic to the evaluation of expressions that even register-based machine compilers often allocate registers as if they formed an expression evaluation stack.
1.4.2 The return address stack
With the introduction of recursion as a desirable language feature in the late 1950s, a means of storing the return address of a subroutine in dynamically allocated storage was required. The problem was that a common method for storing subroutine return addresses in non-recursive languages like FORTRAN was to allocate a space within the body of the subroutine for saving the return address. This, of course, prevented a subroutine from directly or indirectly calling itself, since the previously saved return address would be lost.
The solution to the recursion problem is to use a stack for storing the subroutine return address. As each subroutine is called the machine saves the return address of the calling program on a stack. This ensures that subroutine returns are processed in the reverse order of subroutine calls, which is the desired operation. Since new elements are allocated on the stack automatically at each subroutine call, recursive routines may call themselves without any problems.
Modern machines usually have some sort of hardware support for a return address stack. In conventional machines, this support is often a stack pointer register and instructions for performing subroutine calls and subroutine returns. This return address stack is usually kept in an otherwise unused portion of program memory.
1.4.3 The local variable stack
Another problem that arises when using recursion, and especially when also allowing reentrancy (the possibility of multiple uses of the same code by different threads of control) is the management of local variables. Once again, in older languages like FORTRAN, management of information for a subroutine was handled simply by allocating storage assigned permanently to the subroutine code. This kind of statically allocated storage is fine for programs which are neither reentrant nor recursive.
However, as soon as it is possible for a subroutine to be used by multiple threads of control simultaneously or to be recursively called, statically defined local variables within the procedure become almost impossible to maintain properly. The values of the variables for one thread of execution can be easily corrupted by another competing thread. The solution that is most frequently used is to allocate the space on a local variable stack. New blocks of memory are allocated on the local variable stack with each subroutine call, creating working storage for the subroutine. Even if only registers are used to hold temporary values within the subroutine, a local variable stack of some sort is required to save register values of the calling routine before they are destroyed.
The local variable stack not only allows reentrancy and recursion, but it can also save memory. In subroutines with statically allocated local variables, the variables take up space whether the subroutine is active or not. With a local variable stack, space on the stack is reused as subroutines are called and the stack depth increases and decreases.
1.4.4 The parameter stack
The final common use for a stack in computing is as a subroutine parameter stack. Whenever a subroutine is called it must usually be given a set of parameters upon which to act. Those parameters may be passed by placing values in registers, which has the disadvantage of limiting the possible number of parameters. The parameters may also be passed by copying them or pointers to them into a list in the calling routine's memory. In this case, reentrancy and recursion may not be possible. The most flexible method is to simply copy the elements onto a parameter stack before performing a procedure call. The parameter stack allows both recursion and reentrancy in programs.
1.4.5 Combination stacks
Real machines combine the various stack types. It is common in register-based machines to see the local variable stack, parameter stack, and return address stack combined into a single stack of activation records, or "frames." In these machines, expression evaluation stacks are eliminated by the compiler, and instead registers are allocated to perform expression evaluation.
The approach taken by the stack machines described later in this book is to have separate hardware expression evaluation and return stacks. The expression evaluation stacks are also used for parameter passing and local variable storage. Sometimes, especially when conventional languages such as C or Pascal are being executed, a frame pointer register is used to store local variables in an area of program memory. " -- http://users.ece.cmu.edu/~koopman/stack_computers/sec1_4.html
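The X = (A + B) * (C + D) example from section 1.4.1 above, evaluated against an explicit expression evaluation stack in RPN order (a hypothetical sketch):

    /* Evaluating X = (A + B) * (C + D) on an explicit expression stack,
       i.e. the RPN sequence  A B + C D + *  . A hypothetical sketch. */
    #include <stdio.h>

    static double stack[64];
    static int sp = 0;

    static void push(double v) { stack[sp++] = v; }
    static double pop(void)    { return stack[--sp]; }

    int main(void) {
        double A = 1, B = 2, C = 3, D = 4;

        push(A); push(B); push(pop() + pop());   /* A+B now on the stack */
        push(C); push(D); push(pop() + pop());   /* C+D now on the stack */
        push(pop() * pop());                     /* (A+B)*(C+D)          */

        double X = pop();
        printf("X = %g\n", X);                   /* prints X = 21 */
        return 0;
    }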
" ...nesting of tasks or threads. The task and its creator share the stack frames that existed at the time of task creation, but not the creator's subsequent frames nor the task's own frames. This was supported by a cactus stack, whose layout diagram resembled the trunk and arms of a Saguaro cactus. Each task had its own memory segment holding its stack and the frames that it owns. The base of this stack is linked to the middle of its creator's stack. " -- https://en.wikipedia.org/wiki/Stack_machine (see also https://en.wikipedia.org/wiki/Spaghetti_stack , another name for the same concept)
" Use in programming language runtimes
The term spaghetti stack is closely associated with implementations of programming languages that support continuations. Spaghetti stacks are used to implement the actual run-time stack containing variable bindings and other environmental features. When continuations must be supported, a function's local variables cannot be destroyed when that function returns: a saved continuation may later re-enter into that function, and will expect not only the variables there to be intact, but it will also expect the entire stack to be present so the function is able to return again. To resolve this problem, stack frames can be dynamically allocated in a spaghetti stack structure, and simply left behind to be garbage collected when no continuations refer to them any longer. This type of structure also solves both the upward and downward funarg problems, so first-class lexical closures are readily implemented in that substrate also. " -- https://en.wikipedia.org/wiki/Spaghetti_stack
Some VMs with stack-based (as opposed to register-based) instructions have a value stack (for the VM instructions), a call stack, and a block stack (to keep track of eg that you are in a FOR loop nested inside another FOR loop; a BREAK would pop this stack); for example, Python [16].
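A hypothetical sketch of such a block stack (the names are modeled loosely on older CPython bytecodes, but the code itself is invented): entering a loop records where a BREAK should jump; BREAK pops that record and jumps there; leaving the loop normally just pops it.

    /* Hypothetical sketch of a VM block stack: setup_loop records where a
       BREAK should jump; break_loop pops that record and jumps there;
       pop_block is the normal exit from a loop. */
    typedef struct { int break_target; } Block;

    typedef struct {
        Block blocks[32];
        int nblocks;
        int pc;
    } Frame;

    static void setup_loop(Frame *f, int break_target) {
        f->blocks[f->nblocks++].break_target = break_target;   /* enter a loop */
    }

    static void pop_block(Frame *f) {
        f->nblocks--;                                   /* normal exit from the loop */
    }

    static void break_loop(Frame *f) {
        f->pc = f->blocks[--f->nblocks].break_target;   /* BREAK: pop and jump */
    }

    int main(void) {
        Frame f = { {{0}}, 0, 0 };
        setup_loop(&f, 100);   /* enter an outer loop: BREAK jumps to offset 100 */
        setup_loop(&f, 50);    /* enter a nested loop: BREAK jumps to offset 50  */
        break_loop(&f);        /* a BREAK in the inner loop jumps to 50          */
        pop_block(&f);         /* normal exit from the outer loop                */
        return (f.pc == 50 && f.nblocks == 0) ? 0 : 1;
    }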
In many architectures the stack grows downwards [17] [18] [19], eg x86, MOS 6502, PDP-11, most MIPS ABIs [20] [21] [22].
In early memory-constrained devices, there is often just a memory section for program code, a section for data, and a stack; the code is placed near location 0, followed by the data section, while the stack is placed in high memory (near the largest addresses) and grows down [23].
Memory-constrained devices often don't have a heap that grows (only a fixed-size data section, which can be thought of as a fixed-size heap), which means that often the only memory section with a dynamically changing size is the stack. [24]
This arrangement would also work if the fixed-size data were placed in high memory and the stack were placed in low memory and grew up. This is seen sometimes (eg HP PA-RISC, Multics [25]) but less frequently.
One reason given for the stack being placed in high memory and growing downwards is that if the ISA makes it more efficient to add only UNSIGNED (always positives) offsets to memory addresses, then since a common calculation is to add an offset to the stack pointer to locate something in the stack, this will be most efficient if that offset is always positive, that is, if the stack pointer points to the lowest memory location in the stack, which occurs if the stack grows downwards [26] [27]. In addition, alignment calculations are simpler; eg "If you place a local variable on the stack which must be placed on a 4-byte boundary, you can simply subtract the size of the object from the stack pointer, and then zero out the two lower bits to get a properly aligned address. If the stack grows upwards, ensuring alignment becomes a bit trickier." [28].
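The alignment trick from that last quote, written out (a hypothetical sketch in which a plain integer stands in for the stack pointer):

    /* With a downward-growing stack, allocating 'size' bytes aligned to a
       4-byte boundary is just: subtract, then clear the two low bits.
       A hypothetical sketch; 'sp' is just an integer standing in for the
       stack pointer. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uintptr_t sp = 0x7fff0013;            /* current (downward-growing) stack ptr */
        uintptr_t size = 10;                  /* object to place on the stack         */

        sp = (sp - size) & ~(uintptr_t)3;     /* subtract, then zero the two low bits */

        printf("0x%lx\n", (unsigned long)sp); /* 0x7fff0008: 4-byte aligned, at least
                                                 10 bytes below the old pointer       */
        return 0;
    }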
Elsewhere, we treat (todo):
Links:
There are three basic ways to implement regular expressions:
It is also possible to try to use one of these techniques and then fall back to another one in certain cases, or to partially combine the first two techniques via simulating the NFA but with caching [32].
grep, awk, tcl use combinations of the first two techniques. Perl, PCRE, Python, Ruby, and Java use the third technique [33]
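Perl-style engines (the "third technique" here) are backtracking matchers. A minimal sketch of a backtracking matcher, essentially the classic Kernighan/Pike code, supporting only literal characters, '.', '*', '^', and '$'; real engines are far more elaborate:

    /* A tiny backtracking matcher in the style of the third technique,
       after Kernighan & Pike ("The Practice of Programming"). Supports
       c, '.', '^', '$' and 'c*' only. */
    #include <stdio.h>

    static int matchhere(const char *re, const char *text);

    static int matchstar(int c, const char *re, const char *text) {
        do {                                  /* try each possible match length,  */
            if (matchhere(re, text))          /* backtracking on failure          */
                return 1;
        } while (*text != '\0' && (*text++ == c || c == '.'));
        return 0;
    }

    static int matchhere(const char *re, const char *text) {
        if (re[0] == '\0')
            return 1;
        if (re[1] == '*')
            return matchstar(re[0], re + 2, text);
        if (re[0] == '$' && re[1] == '\0')
            return *text == '\0';
        if (*text != '\0' && (re[0] == '.' || re[0] == *text))
            return matchhere(re + 1, text + 1);
        return 0;
    }

    static int match(const char *re, const char *text) {
        if (re[0] == '^')
            return matchhere(re + 1, text);
        do {                                  /* try matching at each start position */
            if (matchhere(re, text))
                return 1;
        } while (*text++ != '\0');
        return 0;
    }

    int main(void) {
        printf("%d\n", match("ab*c", "xyzac"));   /* 1: "ac" matches, b* matches empty */
        printf("%d\n", match("^ab*c$", "abbbc")); /* 1 */
        printf("%d\n", match("^ab*c$", "abd"));   /* 0 */
        return 0;
    }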