Table of Contents for Programming Languages: a survey

Chapter : control

note: looping control constructs are found in [1]

"It's easy to forget sometimes just how much work a modern language like Python does to guide and constrain how you put together a program. Like, just for the most basic example, consider how simply juxtaposing two statements f(); g() expresses ordering: you know that g won't start executing until f has finished. Another example – the call stack tracks relationships between callers and callees, allowing us to decompose our program into loosely-coupled subroutines: a function doesn't need to keep track of who called it, it can just say return to fire a value into the void, and the language runtime makes some magic happen so the value and the control flow are delivered simultaneously to the right place. Exception handling defines a systematic, structured way to unwind these decoupled subroutines when something goes wrong; if you think about it this trick of taking the call stack and reusing it as the unwind stack is really quite clever. with blocks build on that by giving us an ergonomic way to pin the lifetime of resources – file handles and so forth – to the dynamic lifetime of these call stacks. Iterators track the state needed to bind control flow to the shape of data.

These tools are so fundamental that they disappear into the background – when we're typing out code we don't usually stop to think about all the wacky ways that things could be done differently. Yet these all had to be invented. In functional languages, f(); g() doesn't express ordering. Back in the 1970s, the idea of limiting yourself to using just function calls, loops, and if statements – goto's hamstrung siblings – was *incredibly controversial*. There are great living languages that disagree with Python about lots of the points above. " --


todo generality of goto

todo uses of fwds and backwards gotos

todo some uses of gotos:

in many comments say that thee only place they need goto are:

note: reducible control flow, todo cite that stuff in the thread

why not goto?

" In the late 1960s, Edsger W. Dijkstra wrote a pair of now-famous papers that helped make this much clearer: Go to statement considered harmful, and Notes on structured programming...

But what Dijkstra pointed out is that..., there's a big difference between goto and ((if/then blocks, loop blocks, function calls)). For everything except goto, flow control comes in the top -> [stuff happens] -> flow control comes out the bottom. We might call this the "black box rule": if a control structure has this shape, then in contexts where you don't care about the details of what happens internally, you can ignore the [stuff happens] part, and treat the whole thing as regular sequential flow ... if you have a language with goto – a language where functions and everything else are built on top of goto, and goto can jump anywhere, at any time – then these control structures aren't black boxes at all! If you have a function, and inside the function there's a loop, and inside the loop there's an if/else, and inside the if/else there's a goto... then that goto could send the control anywhere it wants. Maybe control will suddenly return from another function entirely, one you haven't even called yet. ... And now that Dijkstra understood the problem, he was able to solve it. Here's his revolutionary proposal: we should stop thinking of if/loops/function calls as shorthands for goto, but rather as fundamental primitives in their own rights – and we should remove goto entirely from our languages. ... In the end, modern languages are a bit less strict about this than Dijkstra's original formulation. They'll let you break out of multiple nested structures at once using constructs like break, continue, or return. But fundamentally, they're all designed around Dijkstra's idea; even these constructs that push the boundaries do so only in strictly limited ways. In particular, functions – which are the fundamental tool for wrapping up control flow inside a black box – are considered inviolate. You can't break out of one function and into another, and a return can take you out of the current function, but no further. Whatever control flow shenanigans a function gets up to internally, other functions don't have to care.

This even extends to goto itself. You'll find a few languages that still have something they call goto, like C, C#, Golang, ... but they've added heavy restrictions. At the very least, they won't let you jump out of one function body and into another. Unless you're working in assembly [2], the classic, unrestricted goto is gone. Dijkstra won. A surprise benefit: removing goto statements enables new features

And once goto disappeared, something interesting happened: language designers were able to start adding features that depend on control flow being structured.

For example, Python has some nice syntax for resource cleanup: the with statement. You can write things like:

  1. Python with open("my-file") as file_handle: ...

and it guarantees that the file will be open during the ... code, but then closed immediately afterward. Most modern languages have some equivalent (RAII, using, try-with-resource, defer, ...). And they all assume that control flows in an orderly, structured way. If we used goto to jump into the middle of our with block... what would that even do? Is the file open or not? What if we jumped out again, instead of exiting normally? Would the file get closed? This feature just doesn't work in any coherent way if your language has goto in it.

Error handling has a similar problem: when something goes wrong, what should your code do? Often the answer is to pass the buck up the stack to your code's caller, let them figure out how to deal with it. Modern languages have constructs specifically to make this easier, like exceptions, or other forms of automatic error propagation. But your language can only provide this help if it has a stack, and a reliable concept of "caller". ...

goto statements: not even once

So goto – the traditional kind that ignores function boundaries – isn't just the regular kind of bad feature, the kind that's hard to use correctly. If it were, it might have survived – lots of bad features have. But it's much worse.

Even if you don't use goto yourself, merely having it as an option in your language makes everything harder to use. Whenever you start using a third-party library, you can't treat it as a black box – you have to go read through it all to find out which functions are regular functions, and which ones are idiosyncratic flow control constructs in disguise. This is a serious obstacle to local reasoning. And you lose powerful language features like reliable resource cleanup and automatic error propagation. Better to remove goto entirely, in favor of control flow constructs that follow the "black box" rule.

" -- [2]

a little bit on relooper and reducible control flow:

"Structured control flow provides simple and size-efficient binary encoding and compilation. Any control flow--even irreducible--can be transformed into structured control flow with the Relooper algorithm, with guaranteed low code size overhead, and typically minimal throughput overhead (except for pathological cases of irreducible control flow). Alternative approaches can generate reducible control flow via node splitting, which can reduce throughput overhead, at the cost of increasing code size (potentially very significantly in pathological cases). Also, more expressive control flow constructs unicorn may be added in the future." [3] (accessed on 180425)


structured programming

" The structured program theorem, also called Böhm-Jacopini theorem,[1][2] is a result in programming language theory. It states that a given class of algorithms can compute any computable function if those algorithms combine subprograms in only three specific ways. In other words, any algorithm can be expressed using only three control structures. They are

    Executing one subprogram, and then another subprogram (sequence)
    Executing one of two subprograms according to the value of a boolean expression (selection)
    Executing a subprogram until a boolean expression is true (iteration)

Note that these can be represented, respectively, by the concatenation, union, and star operations of a regular expression. "

the construction to transform a procedure into one using only these 3 is:

(note: i think a 'while' fulfills both of the last two conditions; you can do 'if' with 'while' like this:

if condition: stuff

replaced by:

x = condition while x: stuff x = 0


goto considered harmful



escape continuations

finalizers etc during unwind?

first-class continuations

delimited continuations

here's something delimited continuations are good for:

links about continuations

The zoo of kinds of continuations:


dynamic-wind in-guard thunk out-guard



tail call optimization

Also called TCO or tail call elimination.

Poor man's substitutions for TCO

clojure recur: Clojure does not do implicit tail call optimization, but for direct recursion (where a function calls itself recursively; as opposed to mutual recursion, where two or more functions call each other recursively), one can get the same effect with 'recur'. "Evaluates the exprs in order, then, in parallel, rebinds the bindings of the recursion point to the values of the exprs. If the recursion point was a fn method, then it rebinds the params. If the recursion point was a loop, then it rebinds the loop bindings. Execution then jumps back to the recursion point" -- What that means is that (todo: i think) is that when the compiler sees 'recur', it traverses up the chain of containing lexical scopes until it hits either a loop or a function definition; this becomes the 'recursion point'. The arguments to 'recur' are expressions for updating the local variables defined at the recursion point. When recur is hit the current stack frame is recycled; this is an optimized tail call, or 'non-stack-consuming', as Clojure puts it.

clojure trampoline: for mutual recursion in clojure, instead of calling another function directly, you return an anonymous function that calls that function. then you invoke the initial function with 'trampoline'. trampoline calls the initial function, and if it returns a non-function, terminates and returns that, but if it returns a function, trampoline then tail-recursively iterates, calling the function that was returned. In other words, it's implementing CPS (continuation-passing style), with the restriction that there is no way to 'really' return a function as the final return value without escaping it. See also

Here is the clojure.core implementation of `trampoline':

(defn trampoline ([f] (let [ret (f)] (if (fn? ret) (recur ret) ret))) ([f & args] (trampoline #(apply f args))))

some motivations for tail calls

tail calls as a better goto: " In a paper delivered to the ACM conference in Seattle in 1977, Guy L. Steele summarized the debate over the GOTO and structured programming, and observed that procedure calls in the tail position of a procedure can be most optimally treated as a direct transfer of control to the called procedure, typically eliminating unnecessary stack manipulation operations.[38] Since such "tail calls" are very common in Lisp, a language where procedure calls are ubiquitous, this form of optimization considerably reduces the cost of a procedure call compared to the GOTO used in other languages. Steele argued that poorly implemented procedure calls had led to an artificial perception that the GOTO was cheap compared to the procedure call. Steele further argued that "in general procedure calls may be usefully thought of as GOTO statements which also pass parameters, and can be uniformly coded as machine code JUMP instructions", with the machine code stack manipulation instructions "considered an optimization (rather than vice versa!)".[38] Steele cited evidence that well optimized numerical algorithms in Lisp could execute faster than code produced by then-available commercial Fortran compilers because the cost of a procedure call in Lisp was much lower. In Scheme, a Lisp dialect developed by Steele with Gerald Jay Sussman, tail call optimization is mandatory.[39] " --


There are various ways to conditionally execute code, or to define an expression by cases:


three alternatives to 'if':

what evaluates to false

and typing gotcha in haskell

ternary if

e.g. in C: condition ? do-if-true : do-if-false

note: in most languages (but not in PHP), the ternary if operator has right associativity [5]


note: C switch and fall-thru; many languages don't have fall-thru because it's more verbose and has the potential for bugs if you forget a 'break'

short circuit boolean ops

combined conditionals and branching

SPIN Prolema 'do' (do-od): a looping construct. Presumably loops until the command 'break' is executed. Like some forms of 'switch', it has multiple branches (delimited by '::'), and each branch has a guard (separated from the branch code by '->'). I think if multiple branch guards are True, then a non-deterministic choice is made. I think if all other branch guards are false, the 'else' guard is True.


alteratives to while:

C# delegates: late-binding, method choice without object choice, chaining

In C#, instead of first-class functions, instead appears (I don't know C# as of this writing) something called 'delegates'.

Delegates aren't exactly functions, rather they are a choice of an object and a method on that object. Presumably the method on that object could be reassigned in the interim between when a delegate is created and called, making this different from a function, although i'm not sure if that is actually possible in C#.

Delegates have two more special features.

First, a delegate can, instead of specifing both a target object and a method on that object, only specify a method name, leaving the target object to be specified at the time they are called.

Second, a delegate can be 'chained', meaning that when a delegate is invoked, it calls its target and then invokes another delegate.

Chapter: control: dispatch


polymorphic (static and dynamic)

when you call a function/method, which code actually gets executed?


todo isn't this whole section 'polymorphism'? rename

traditional non-polymorphic functions

('one dimensional dispatch', according to ):

1-1 relationship between function names and the code

generic polymorphism

same source code in all cases, but possibly different object code

can be implemented via templates

ad-hoc polymorphism

you can have different source code executed for the same function/method name depending on context

OOP-style receiver type based dispatch

Choose the code to be executed based on both the method name, as also the type of 'receiver'. This terminology comes from considering a function call as a message to an object instance, the object instance being the 'receiver' of the message.

This way of doing things is called 'single dispatch polymorphism' (to contrast it with 'multiple dispatch', see below).

The method implementation executes within the context of its receiver, by which we mean that the private fields and methods of the instance are available during the execution of the method. In some languages, such as C++, the fields and methods of the receiver instance become in-scope for variable lookup, as if they were local variables.

OOP type systems often have subtypes and inheritance. This means that, if an object is of some type T, and T is a subtype of another type Tp, and Tp implements method name M, then objects of type T can also handle calls to method name M even without writing a separate implementation of method name M for type T. We say that type T inherits Tp's implementation of method M.

Now consider if type Tp is itself a subtype of another Tpp. Since subtype-ness is transitive, this means that T is also a subtype of Tpp. Perhaps Tpp also has its own implementation of method M. In this case, we have to have a procedure to determine which implementation of method M is used by T. The rule 'if T is a subtype of T2, and T has no implementation of M and T2 does, then T uses T2's implementation of M' is ambiguous, since either Tp or Tpp could fill the role of T2 in that rule.

If you have single inheritance, then there's an obvious solution. You recursively climb up the inheritance chain looking for an implementation of method M and stop at the first one you find. So, first you look at T, and it doesn't have an implementation of M, so you keep going. Then you look at Tp, and it does have an implementation of M, so you stop. The principal is that you want to use the 'most specific' implementation of M (in that a subtype is 'more specific' than its parent type).

If you have multiple inheritance, though, then this simple procedure does not suffice to eliminate ambiguity.

Note that, if one ignores access visibility restrictions on private fields and methods, than rather than thinking of the method as executing with the context of its receiver, you can model the 'receiver' as an actual argument passed to a function. Python makes this somewhat explicit; when you define a method inside a class, when that method is called, a value containing the object instance is prepended to the arguments which are passed by the caller. Traditionally in Python, this first argument is called 'self'. For example:

class A: def __init__(self, some_field_of_A): self.some_field_of_A = some_field_of_A def b(self, x): print repr(self) print repr(x) print repr(self.some_field_of_A)

a = A(1) a.b(2) print; print A.b(a,2)

Results in:

<__main__.A instance at 0x7f688028e5f0> 2 1

<__main__.A instance at 0x7f688028e5f0> 2 1

As you can see, the result of calling the 'b' method on instance 'a' was equivalent to calling the function A.b with the argument 'self' bound to the instance a, and the argument 'x' bound to 2. Also, even within the methods of the object instance a, the fields of a must be referenced through the 'self' variable.

determining the 'most specific' implementation in the case of multiple inheritance

todo the diamond problem

Subjective programming

Now we consider all three of the method name, the type of the receiver, and the type of the 'sender' of the message in the selection of which code to actually execute.

See 'Three-dimensional dispatch' is , and R. B. Smith and D. Ungar. A simple and unifying approach to subjective objects. TAPOS special issue on Subjectivity in Object-Oriented Systems, 2(3):161-178, 1996.


Also called 'multiple dispatch'.

Above, in section "OOP-style receiver type based dispatch", we saw that one can think of receiver-based dispatch equivalently as if the method definition is executed inside the context of its instance; equivalently (ignoring access restrictions), one can think of it as if a receiver is actually passed as an argument to a function. Generalizing the latter concept, we ask, if we can dispatch the method based on the type of the receiver argument, why not dispatch based on other arguments as well?

As with multiple inheritance, in this setting it is possible to have multiple implementations that match, so we need a way to resolve ambiguity. Often people try to come up with appropriate generalizations of 'most specific' and then to choose the 'most specific' method implementation when there is more than one to choose from. For example, a method implementation with more dependencies on the types of the arguments would be chosen over an otherwise identical method implementation which doesn't care about the types of some arguments.

Muschevici, Radu; Potanin, Alex; Tempero, Ewan; Noble, James (2008). "Multiple dispatch in practice". Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications. OOPSLA '08 (Nashville, TN, USA: ACM): 563–582. doi:10.1145/1449764.1449808. surveyed some programs in languages with multimethods and found that approximately 20% of polymorphic methods were multimethods (i.e. "13%-32% of generic functions utilize the dynamic type of a single argument, while 2.7%-6.5% of them utilize the dynamic type of multiple arguments", to quote Wikipedia).

An example of where you might want multimethods is when you are defining an operator such as "addition", which takes two arguments each of which may be various types such as integer or float.

See for a discussion of how multimethods can be used in some places where you might have used the Visitor pattern.

See also .


In the above, we dispatched based on the method name and possibly also on the types of receivers, senders, and/or method arguments.

Guards enable dispatching to depend also on the values of method arguments.

As with multiple inheritance, in this setting it is possible to have multiple implementations that match, so we need a way to resolve ambiguity. Often people try to come up with appropriate generalizations of 'most specific' and then to choose the 'most specific' method implementation when there is more than one to choose from. For example, a method implementation with a matching guard would be chosen over an otherwise identical method implementation but with no guard.

Context-oriented programming

Now we consider four things when determining dispatch: the method name, the type of the receiver, the type of the sender of the message, and also the 'context', which is a dynamically-determined value.

The difference between "context-oriented programming" and guards is that in context-oriented programming (see ), the "context" is specified as a sequence of 'layers'. Each 'layer' may contain one or more method implementations.

Note that elsewhere in this book we use the word 'context' in an informal sense to mean various different things; don't assume that we are talking about 'context-oriented programming' when you see the word 'context'.

Type classes

If, instead of grouping functions together as methods on objects, we just have unattached polymorphic functions, one thing we miss from receiver-oriented dispatch is the grouping together of various methods into an object. This was key to the idea of OOP encapsulation, that is, the idea that internal representations can be encapsulated by encapsulating data representations within objects with various methods by which their data can be accessed. As long as the rest of the program only accessed the data through these methods, you could change the representation of the data without breaking the rest of the program as long as you changed the methods at the same time. So every object's methods form a little API for accessing the data in that object.

If you just have unattached functions, even if those functions can be ad-hoc polymorphic depending upon the types of their arguments, you would still like a way to take a group of functions and put them together into a grouping, analogous to your object API (set of OOP methods for a class). You want to be able to say, e.g., "If something is a number, it should support addition, subtraction, multiplication, and division." Such a grouping is called a 'type class'. The signature of methods within a type class is polymorphic depending on the type. An implementation of a type class for a particular type is called a 'type class instance'.

Multimethods correspond to multi-parameter type classes. For example, an addition function which is ad-hoc polymorphic in both of its arguments could be contained within an 'arithmetic number' type class of addition, subtraction, multiplication, and division with two type arguments. There might be an instance of such a type class for the tuple of types (int, int), and another instance for (int, float) and another one for (float, int) and another one for (float, float).

Conditionals and boolean logic

Short-circuit connectives

Short-circuit connectives are forms of the operators 'or' and 'and' which don't evaluate their right-hand argument if the value of the left-hand argument is sufficient for determining their result.

For example, if we have three functions "returns_false" and "returns_true" and "f", and if "returns_false() == False", and "returns_true() == True", then in both of the following examples, if the language has short-circuit connectives then f() will never be called: returns_true() or f() returns_false() and f()

whereas in a language without short-circuit connectives, f() would have been called in both of those two examples.


Truthiness is a form of syntactic sugar that allows you to avoid the common cases of writing for example "x > 0" or "not x.isEmpty" in conditionals.

Many languages allow values with types other than boolean to be used in the predicate inside a conditional statement. This can be handled in various ways:

Values which act as the boolean True when inside a conditional are sometimes called 'truthy', and other values 'falsy'.

For example, here's how truthiness works in Python

"In Ruby only false and nil are falsey. Everything else is truthy (yes, even 0 is truthy).

In some languages, like C and Python, 0 counts as false. In fact, in Python, empty arrays, strings, and hashes all count as false. In JavaScript?, empty strings count as false, but empty arrays count as true. " --

In the language Pike, 0 is falsy, and everything else is truthy [7].

In some systems (eg R's falsy library), exception objects are also falsy.

Some languages with booleans:

Some languages without (dedicated) booleans:

Smalltalk does not have truthiness ([8]).

What is truthy in various languages, out of empty list, 0, empty string:

"0" is truthy in most languages with truthiness, but in PHP and Perl, "0" is falsy; (Taw says that falsy "0" is a [9] "huge source of bugs"])

In languages with nullable types, the 'nil' element is typically falsy (in fact, i know of no languages in which it is not).


Connectives with passthru return values

Some languages with truthiness also have syntactic sugar for the common operation of testing multiple values in a short-circuit fashion, and returning the first one to pass the test, or the last value if none pass. For example, in Python:

 ("hi" or "default") == "hi"
 ("" or "default") == "default"
 (False or "") == ""

What is happening is that the short-circuit 'or' operator is returning the last value that it tested. Because of the way that boolean 'or' works, the last value that is tested in short-circuit 'or' is always equal to the result (to see this, just consider all cases; False or False, False or True, True or False, True or True); the same is true for short-cirucit 'and'. Because the truthiness convention is being used, returning a value that is falsy or truthy is just as good as returning False or True itself.

Short-circuit 'and' can also be used to test multiple values in a short-circuit fashion, and return the first one to fail the test, or the last value if all pass. This is useful when the application of a test that you want to apply to a value will itself cause an error if the value is not of the right form; for example, in Python, if you have a list which might be empty, and you want to return that list but only if its first element is itself truthy, and otherwise you want a default: (maybeAnEmptyList and maybeAnEmptyList[0]) or default

'and' is also useful to test conjunctions within larger boolean expressions. In Python, if you want a value whenever both otherProperty is truthy, and value is truthy, and otherwise you want a default: (otherProperty and value) or default

It is also useful in some special cases where you want a sentinel value for some special case, and then sentinel you want happens to be the value that causes the special case, eg. if you want the average of a list of numbers, or 0 if the list is empty: count and total/count

Thanks [10].


Some languages with truthiness also have syntactic sugar for the common operation of testing a value for truthiness and binding it to a variable.

In Perl6, the following construct tests a value for truthiness and if it is truthy, binds that value to a variable and executes a block of code. For eg. (todo do i understand this correctly?)

  if (0..5).pick -> $number { say $number }
  if (0..5).pick { say $^number } # ditto

(thanks [11])

todo / links

dynamic wind/unwind (eg in Guile Scheme)

some complicated kinds of functions calling async: the fn returns a future (and, do you start the fn right away, or lazily?) or a future's generalization, a stream or the fn can callback is the fn an introspectable AST or just a fn? partial failure; is it possible that the function never got the message? is it possible that the function completed successfully but you never got its completion message? (are exceptions possibly raised, or is there an error return argument, or are nullable types used?) could the function not terminate? is there a timeout?

polymorphism dispatch based on first argument, or on multiple arguments? dispatch chosen at compiletime or runtime? argues for tail recursion.

The argument is that to really achieve OOP encapsulation, you want fully abstract interfaces. This means that an instance of an object implementing an interface cannot make any assumptions about the internals of other instances implementing the same interface (because they may be different classes, or at least different subclasses, that implement the interface in very different ways). Yet, you sometimes want to compose these objects together into larger objects (such as collections). If you have a very large collection, sometimes you may want to apply an operation that requires the entire collection to be traversed, and returns an answer that depends on arbitrarily many items in the collection. But, various subsets of the collection might be within other objects (consider a list built by instantiating a Nil object instance, then a Cons object instance which contains that Nil object, then another Cons instance which contains the first Cons, etc). So there might be no single objects which has a list of all of the objects that must be traversed. This means that each of these objects must be called, and the depth of this call chain might reach the number of items in the collection.

Another example he gives is the "implementation of transitions in an object-oriented finite-state machine of indefinitely long lifetime."

(note: [12] gives an example to show that tail calls which are not strictly necessary here; other language constructs, such as thunks, could substitute for tail calls)


if let and while let

Rust and Swift have some control flow that is conditional upon pattern matching:

" if let

if let allows you to combine if and let together to reduce the overhead of certain kinds of pattern matches.

For example, let’s say we have some sort of Option<T>. We want to call a function on it if it’s Some<T>, but do nothing if it’s None. That looks like this:

match option { Some(x) => { foo(x) }, None => {}, }

We don’t have to use match here, for example, we could use if:

if option.is_some() { let x = option.unwrap(); foo(x); }

Neither of these options is particularly appealing. We can use if let to do the same thing in a nicer way:

if let Some(x) = option { foo(x); }

If a pattern matches successfully, it binds any appropriate parts of the value to the identifiers in the pattern, then evaluates the expression. If the pattern doesn’t match, nothing happens.

If you want to do something else when the pattern does not match, you can use else:

if let Some(x) = option { foo(x); } else { bar(); }

while let

In a similar fashion, while let can be used when you want to conditionally loop as long as a value matches a certain pattern. It turns code like this:

let mut v = vec![1, 3, 5, 7, 11]; loop { match v.pop() { Some(x) => println!("{}", x), None => break, } }

Into code like this:

let mut v = vec![1, 3, 5, 7, 11]; while let Some(x) = v.pop() { println!("{}", x); } " --


an introductory example of tail recursion:


chained conditionals:



'unless' is used to create one-liner control structures that execute something on the left, UNLESS the condition on the right is true. Eg in Ruby:

  list_cart unless @cart.include? 'pineapple'