Table of Contents for Programming Languages: a survey

Chapter : control

note: looping control constructs are found in [1]

"It's easy to forget sometimes just how much work a modern language like Python does to guide and constrain how you put together a program. Like, just for the most basic example, consider how simply juxtaposing two statements f(); g() expresses ordering: you know that g won't start executing until f has finished. Another example – the call stack tracks relationships between callers and callees, allowing us to decompose our program into loosely-coupled subroutines: a function doesn't need to keep track of who called it, it can just say return to fire a value into the void, and the language runtime makes some magic happen so the value and the control flow are delivered simultaneously to the right place. Exception handling defines a systematic, structured way to unwind these decoupled subroutines when something goes wrong; if you think about it this trick of taking the call stack and reusing it as the unwind stack is really quite clever. with blocks build on that by giving us an ergonomic way to pin the lifetime of resources – file handles and so forth – to the dynamic lifetime of these call stacks. Iterators track the state needed to bind control flow to the shape of data.

These tools are so fundamental that they disappear into the background – when we're typing out code we don't usually stop to think about all the wacky ways that things could be done differently. Yet these all had to be invented. In functional languages, f(); g() doesn't express ordering. Back in the 1970s, the idea of limiting yourself to using just function calls, loops, and if statements – goto's hamstrung siblings – was *incredibly controversial*. There are great living languages that disagree with Python about lots of the points above. " -- https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/

goto

todo generality of goto

todo uses of fwds and backwards gotos

todo some uses of gotos:

simulate named break/continue
single exit style (instead of finally, i think: https://news.ycombinator.com/item?id=9827051 )

in https://news.ycombinator.com/item?id=9827051 many comments say that thee only place they need goto are:

as named break/continue
as replacement for try/except/finally (without uncaught exception propagation)
a few guys also mention as a replacement for switch {} dispatch in interpreters

note: reducible control flow, todo cite that stuff in the thread https://news.ycombinator.com/item?id=9827305

why not goto?

" In the late 1960s, Edsger W. Dijkstra wrote a pair of now-famous papers that helped make this much clearer: Go to statement considered harmful, and Notes on structured programming...

But what Dijkstra pointed out is that..., there's a big difference between goto and ((if/then blocks, loop blocks, function calls)). For everything except goto, flow control comes in the top -> [stuff happens] -> flow control comes out the bottom. We might call this the "black box rule": if a control structure has this shape, then in contexts where you don't care about the details of what happens internally, you can ignore the [stuff happens] part, and treat the whole thing as regular sequential flow ... if you have a language with goto – a language where functions and everything else are built on top of goto, and goto can jump anywhere, at any time – then these control structures aren't black boxes at all! If you have a function, and inside the function there's a loop, and inside the loop there's an if/else, and inside the if/else there's a goto... then that goto could send the control anywhere it wants. Maybe control will suddenly return from another function entirely, one you haven't even called yet. ... And now that Dijkstra understood the problem, he was able to solve it. Here's his revolutionary proposal: we should stop thinking of if/loops/function calls as shorthands for goto, but rather as fundamental primitives in their own rights – and we should remove goto entirely from our languages. ... In the end, modern languages are a bit less strict about this than Dijkstra's original formulation. They'll let you break out of multiple nested structures at once using constructs like break, continue, or return. But fundamentally, they're all designed around Dijkstra's idea; even these constructs that push the boundaries do so only in strictly limited ways. In particular, functions – which are the fundamental tool for wrapping up control flow inside a black box – are considered inviolate. You can't break out of one function and into another, and a return can take you out of the current function, but no further. Whatever control flow shenanigans a function gets up to internally, other functions don't have to care.

This even extends to goto itself. You'll find a few languages that still have something they call goto, like C, C#, Golang, ... but they've added heavy restrictions. At the very least, they won't let you jump out of one function body and into another. Unless you're working in assembly [2], the classic, unrestricted goto is gone. Dijkstra won. A surprise benefit: removing goto statements enables new features

And once goto disappeared, something interesting happened: language designers were able to start adding features that depend on control flow being structured.

For example, Python has some nice syntax for resource cleanup: the with statement. You can write things like:

Python with open("my-file") as file_handle: ...

and it guarantees that the file will be open during the ... code, but then closed immediately afterward. Most modern languages have some equivalent (RAII, using, try-with-resource, defer, ...). And they all assume that control flows in an orderly, structured way. If we used goto to jump into the middle of our with block... what would that even do? Is the file open or not? What if we jumped out again, instead of exiting normally? Would the file get closed? This feature just doesn't work in any coherent way if your language has goto in it.

Error handling has a similar problem: when something goes wrong, what should your code do? Often the answer is to pass the buck up the stack to your code's caller, let them figure out how to deal with it. Modern languages have constructs specifically to make this easier, like exceptions, or other forms of automatic error propagation. But your language can only provide this help if it has a stack, and a reliable concept of "caller". ...

goto statements: not even once

So goto – the traditional kind that ignores function boundaries – isn't just the regular kind of bad feature, the kind that's hard to use correctly. If it were, it might have survived – lots of bad features have. But it's much worse.

Even if you don't use goto yourself, merely having it as an option in your language makes everything harder to use. Whenever you start using a third-party library, you can't treat it as a black box – you have to go read through it all to find out which functions are regular functions, and which ones are idiosyncratic flow control constructs in disguise. This is a serious obstacle to local reasoning. And you lose powerful language features like reliable resource cleanup and automatic error propagation. Better to remove goto entirely, in favor of control flow constructs that follow the "black box" rule.

" -- [2]

a little bit on relooper and reducible control flow:

"Structured control flow provides simple and size-efficient binary encoding and compilation. Any control flow--even irreducible--can be transformed into structured control flow with the Relooper algorithm, with guaranteed low code size overhead, and typically minimal throughput overhead (except for pathological cases of irreducible control flow). Alternative approaches can generate reducible control flow via node splitting, which can reduce throughput overhead, at the cost of increasing code size (potentially very significantly in pathological cases). Also, more expressive control flow constructs unicorn may be added in the future." [3] (accessed on 180425)

Links:

https://news.ycombinator.com/item?id=9827276

Resource cleanup

RAII vs Defer:

https://kevincox.ca/2022/05/13/wrong-by-default/
- https://lobste.rs/s/wiva0m/wrong_by_default

structured programming

https://en.wikipedia.org/wiki/Structured_program_theorem

" The structured program theorem, also called Böhm-Jacopini theorem,[1][2] is a result in programming language theory. It states that a given class of algorithms can compute any computable function if those algorithms combine subprograms in only three specific ways. In other words, any algorithm can be expressed using only three control structures. They are

    Executing one subprogram, and then another subprogram (sequence)
    Executing one of two subprograms according to the value of a boolean expression (selection)
    Executing a subprogram until a boolean expression is true (iteration)

Note that these can be represented, respectively, by the concatenation, union, and star operations of a regular expression. "

the construction to transform a procedure into one using only these 3 is:

break the procedure into basic blocks (segments with only one entry and one exit point)
label each block with an integer; label the end of the the procedure with 0;
create a variable L. put the whole procedure in a while loop 'while(L != 0)'. inside the procedure, use 'if' statements to execute the block identified by L. At the end of each block, change L accordingly.

(note: i think a 'while' fulfills both of the last two conditions; you can do 'if' with 'while' like this:

if condition: stuff

replaced by:

x = condition while x: stuff x = 0

)

goto considered harmful

continuations

Links:

escape continuations

https://en.wikipedia.org/wiki/Continuation#Kinds

finalizers etc during unwind?

first-class continuations

delimited continuations

here's something delimited continuations are good for:

" Continuations are copies of a stack that allow one to pause and resume a computation. Mu provides the ability to create delimited continuations using two primitives: ‘call-with-continuation-mark’ and ‘return-continuation-until-mark’. The ‘call-with-continuation-mark’ operator is like ‘call’ (which calls a dynamic function pointer), but first bookmarks (marks) the current point on the call stack. The ‘return-continuation-until-mark’ operator copies the current call stack until the mark, unwinds the call stack until the mark, and returns this stack segment or continuation as the result of the corresponding ‘call-with-continuation-mark’. Intervening stack frames don't return; the computation they represent is effectively paused. To resume it, just ‘call’ the continuation. That will cause the segment of stack copied earlier to be pushed to the top of the stack, resuming the computation from where it was paused. (Example program, worth playing with for a few minutes if you're new to continuations.)

Delimited continuations are a powerful mechanism, well-studied by others. Some useful properties:

    Once you have that segment of a call stack, you can call it any number of times. (Example program.) The data you call it with can be different each time; any arguments passed to ‘call’ are returned as results of ‘return-continuation-until-mark’. (Example program.)
    A single function may be paused at multiple places; if you ‘call’ a continuation and running it encounters a second ‘return-continuation-until-mark’, you get a new continuation. (Example program.) Any arguments passed to ‘return-continuation-until-mark’ are returned along with the continuation as results of the ‘call’, just like a regular ‘return’. (Example program.)" -- http://akkartik.name/coroutines-in-mu/

I think the problem with call/cc is that it’s a powerful side effect, but doesn’t give you a way to sandbox that power. Sometimes you want to call some code without giving it permission to grab your continuation. For example, if you have a Scheme REPL written in Scheme, when you (eval user-expression) you might want a guarantee that it only returns once.

So there’s a variant called composable continuations, or shift/reset. shift grabs the continuation, like call/cc, and reset puts a boundary on the stack that prevents shift from grabbing any frames beyond the boundary. It seems similar to how catch lets you sandbox throw. (You could look at An argument against call/cc and the discussion on Lambda the Ultimate for more.)

I think algebraic effects are a bit like shift/reset. Or a bit like resumable exceptions. If you call some code that tries to write to a file, you can intercept the write, handle it however you like, then resume.

links about continuations

The zoo of kinds of continuations:

algebraic effects

Links:

https://www.cl.cam.ac.uk/~jdy22/papers/effective-concurrency-through-algebraic-effects.pdf
- "They’re essentially macro-expressible with delimited continuations. That means that you could do a totally local transformation. So, you could take a program written in delimited continuations and turn it into one written with algebraic effects and vice versa, entirely locally. You don’t have to look at the whole program to see how to do this. You can just do it locally, which means, it’s a good way of saying that they are equally expressive. But that translation is not type preserving. So, essentially, they are, from a dynamic semantics perspective, in terms of like what happens at run time that they’re the same language feature, basically, but one is easy to type and the other is not. And so algebraic effects are kind of that, that’s one way of thinking about them." -- Leo White
- "Dynamically-bound GOTO" -- [4]
https://antelang.org/docs/language/#algebraic-effects
https://kcsrk.info/webman/manual/effects.html
- https://lobste.rs/s/iyor14/ocaml_5_0_prerelease_manual_effect
https://eyg.run/documentation/effects

coroutines

http://akkartik.name/coroutines-in-mu/
- "... In The Art of Computer Programming, Donald Knuth presents a solution to this sort of problem. His answer is to throw away the stack concept completely. Stop thinking of one process as the caller and the other as the callee, and start thinking of them as cooperating equals. In practical terms: replace the traditional "call" primitive with a slightly different one. The new "call" will save the return value somewhere other than on the stack, and will then jump to a location specified in another saved return value. So each time the decompressor emits another character, it saves its program counter and jumps to the last known location within the parser - and each time the parser needs another character, it saves its own program counter and jumps to the location saved by the decompressor. Control shuttles back and forth between the two routines exactly as often as necessary. This is very nice in theory, but in practice no mainstream language supports the coroutine call primitive. Languages like C depend utterly on their stack-based structure, so whenever control passes from any function to any other, one must be the caller and the other must be the callee. ..."

dynamic-wind

dynamic-wind in-guard thunk out-guard

Links:

http://www.gnu.org/software/guile/manual/html_node/Dynamic-Wind.html#Dynamic-Wind

monads

tail call optimization

Also called TCO or tail call elimination.

Poor man's substitutions for TCO

clojure recur: Clojure does not do implicit tail call optimization, but for direct recursion (where a function calls itself recursively; as opposed to mutual recursion, where two or more functions call each other recursively), one can get the same effect with 'recur'. "Evaluates the exprs in order, then, in parallel, rebinds the bindings of the recursion point to the values of the exprs. If the recursion point was a fn method, then it rebinds the params. If the recursion point was a loop, then it rebinds the loop bindings. Execution then jumps back to the recursion point" -- http://clojure.org/special_forms#recur. What that means is that (todo: i think) is that when the compiler sees 'recur', it traverses up the chain of containing lexical scopes until it hits either a loop or a function definition; this becomes the 'recursion point'. The arguments to 'recur' are expressions for updating the local variables defined at the recursion point. When recur is hit the current stack frame is recycled; this is an optimized tail call, or 'non-stack-consuming', as Clojure puts it.

clojure trampoline: for mutual recursion in clojure, instead of calling another function directly, you return an anonymous function that calls that function. then you invoke the initial function with 'trampoline'. trampoline calls the initial function, and if it returns a non-function, terminates and returns that, but if it returns a function, trampoline then tail-recursively iterates, calling the function that was returned. In other words, it's implementing CPS (continuation-passing style), with the restriction that there is no way to 'really' return a function as the final return value without escaping it. See also http://pramode.net/clojure/2010/05/08/clojure-trampoline/

Here is the clojure.core implementation of `trampoline':

(defn trampoline ([f] (let [ret (f)] (if (fn? ret) (recur ret) ret))) ([f & args] (trampoline #(apply f args))))

some motivations for tail calls

Computed gotos can be implemented via tail calls
- tail calls as a better goto: todo subsection link https://en.m.wikipedia.org/wiki/Goto
trampolines reduce performance

tail calls as a better goto: " In a paper delivered to the ACM conference in Seattle in 1977, Guy L. Steele summarized the debate over the GOTO and structured programming, and observed that procedure calls in the tail position of a procedure can be most optimally treated as a direct transfer of control to the called procedure, typically eliminating unnecessary stack manipulation operations.[38] Since such "tail calls" are very common in Lisp, a language where procedure calls are ubiquitous, this form of optimization considerably reduces the cost of a procedure call compared to the GOTO used in other languages. Steele argued that poorly implemented procedure calls had led to an artificial perception that the GOTO was cheap compared to the procedure call. Steele further argued that "in general procedure calls may be usefully thought of as GOTO statements which also pass parameters, and can be uniformly coded as machine code JUMP instructions", with the machine code stack manipulation instructions "considered an optimization (rather than vice versa!)".[38] Steele cited evidence that well optimized numerical algorithms in Lisp could execute faster than code produced by then-available commercial Fortran compilers because the cost of a procedure call in Lisp was much lower. In Scheme, a Lisp dialect developed by Steele with Gerald Jay Sussman, tail call optimization is mandatory.[39] " -- https://en.m.wikipedia.org/wiki/Goto#Tail_calls

Conditionals/selection

There are various ways to conditionally execute code, or to define an expression by cases:

conditional goto (eg assembly-language conditional branching)
- branch on condition code flag in status register vs conditional test in the branching instruction; skip vs branching destination operand (often PC-relative; typically signed; often limited to a relatively small number, sometimes as small as +-128 (eg one byte to specify a signed PC-relative branch destination))
  - eg Lua 5.1 bytecode: EQ, LT, LE, TEST, TESTSET combine a conditional test in the instruction with a skip
if (note: first came from Lisp [5] ) (note: can be thought of a structured programming special case of GOTO using a structure with two forward GOTOs; the two blocks implicitly defined by the GOTO are explicitly represented with 'if')
- if with else
  - (compare else to finally; finally is about non-local exits)
switch (can be thought of as shorthand for nested ifs; or, an if can be thought of as a special case of a switch) (also sometimes called 'case')
pythonic switch (put first-class functions in a table or hash, then index into it with the switch variable, then execute the result)
nock if: primitive is pythonic switch (although 'table' is more like a binary tree with fixed indices) and then use True/False on it; note that here your primitive operator is selection, not a conditional, which is why this section is called Conditionals/selection
- does relational computation have something like this?
if via dispatch (and possibly guards) (see Chapter on control: dispatch)
- smalltalk if (todo doublecheck)
lambda calculus if (this is one of the previous two, right? todo)
predicated operations (eg ARM assembly); IF without ELSE; instead of having a GOTO at the beginning of the block, you have predicated (conditionally executed) instructions within the block; differences from if include implementation (change PC vs just go along), intermixing of conditional and non-conditional code, nestability (not nestable), and optimization (ARM implementation computes both branches at once, rather than speculatively guessing one branch (todo cite))

if

three alternatives to 'if':

'branch if'/'conditional jump'/'conditional goto': Often found in assembly language, this tests if some primitive condition holds and if it does, execution jumps ('goto's) a new location.
select+eval: A dynamic selection operator (such as indexing into an array) is applied to select one item out of a data structure holding many items. In this case, the items are blocks of code, and the result of the selection is one block of code. This block of code is then somehow evaluated. This is found in Nock ('macro instruction' 6), and in Python (as an idiomatic alternative to 'switch' statements)
smalltalk style

what evaluates to false

and typing gotcha in haskell

ternary if

e.g. in C: condition ? do-if-true : do-if-false

note: in most languages (but not in PHP), the ternary if operator has right associativity [6]

switch

note: C switch and fall-thru; many languages don't have fall-thru because it's more verbose and has the potential for bugs if you forget a 'break'

short circuit boolean ops

combined conditionals and branching

SPIN Prolema 'do' (do-od): a looping construct. Presumably loops until the command 'break' is executed. Like some forms of 'switch', it has multiple branches (delimited by '::'), and each branch has a guard (separated from the branch code by '->'). I think if multiple branch guards are True, then a non-deterministic choice is made. I think if all other branch guards are false, the 'else' guard is True.

while

alteratives to while:

(backwards) conditional branch
nock's 'nock' function used to implement a loop
the unconstrained minimization operator ('mu') in the https://en.wikipedia.org/wiki/%CE%9C-recursive_function

expression while with 'else'

e.g. in Pony: " Pony while loops are very similar to those in other languages ... But what if the condition evaluates to false the first time we try, then we don’t go round the loop at all? In Pony while expressions can also have an else block. In general, Pony else blocks provide a value when the expression they are attached to doesn’t. A while doesn’t have a value to give if the condition evaluates to false the first time, so the else provides it instead.

So is this like an else block on a while loop in Python? No, this is very different. In Python, the else is run when the while completes. In Pony the else is only run when the expression in the while isn’t.

...

break immediately exits from the innermost loop it’s in. Since the loop has to return a value break can take an expression. This is optional, and if it’s left out, the value from the else block is returned.

...

if no else expression is provided, None is returned. " -- https://tutorial.ponylang.io/expressions/control-structures.html

C# delegates: late-binding, method choice without object choice, chaining

In C#, instead of first-class functions, instead appears (I don't know C# as of this writing) something called 'delegates'.

Delegates aren't exactly functions, rather they are a choice of an object and a method on that object. Presumably the method on that object could be reassigned in the interim between when a delegate is created and called, making this different from a function, although i'm not sure if that is actually possible in C#.

Delegates have two more special features.

First, a delegate can, instead of specifing both a target object and a method on that object, only specify a method name, leaving the target object to be specified at the time they are called.

Second, a delegate can be 'chained', meaning that when a delegate is invoked, it calls its target and then invokes another delegate.

Chapter: control: dispatch

assembly:

subroutine CALL and RETURN instructions
sometimes special 'link register' for 1-level calls
unconditional jumps (absolute and PC-relative)
'calling convention'; how are subroutine args passed (stack? regs?); who is responsible for saving/restoring registers that the caller was using (caller, callee? which regs need to be saved, if any, and which are assumed to be overwritten by the callee and lost, if any; 'caller saved regs' vs 'callee saved regs', see [7])?
- give some example calling conventions (todo mv to the Assembly chapter);
  - win32 and win64 assembly and calling conventions: "Calling convention. Win32 uses the STDCALL convention whereas Win64 uses the FASTCALL convention. In STDCALL all parameters which are sent to an API are PUSHed on the stack. In Win32 the stack pointer (ESP) is reduced by 4 bytes for each PUSH. In STDCALL it is the responsibility of the API to restore the stack to equilibrium. In FASTCALL, the first four parameters are sent to the API in registers (in this order: RCX,RDX,R8 and R9), but the fifth and subsequent parameters are PUSHed on the stack. In Win64, the stack pointer (RSP) is reduced by 8 bytes for each PUSH. Unlike STDCALL, it is not the responsibility of the API to clear up the stack. Instead this must be done by the caller to the API. The caller must also ensure that there is space on the stack for the API to store the parameters which are passed in registers. In practice this is achieved by reducing the stack pointer by 32 bytes just before the call. Note than in GoAsm? all the work required by the FASTCALL calling convention is done automatically if you use INVOKE or ARG followed by INVOKE. See coding to comply with FASTCALL calling convention. The use of ARG and INVOKE is described in the relevant part of the GoAsm? manual. ... Windows uses the FASTCALL convention to call the window procedures and other callback procedures in your application. This means that your window procedures will pick up the parameters in a different way under Win64. Also the window procedures no longer have to restore the stack to equilibrium. Note that GoAsm? will implement these things automatically if you use FRAME...ENDF. The use of FRAME...ENDF is described in the relevant part of the GoAsm? manual. " -- http://www.godevtool.com/GoasmHelp/64bits.htm
  - http://llvm.org/docs/LangRef.html#calling-conventions
- optimization: register windows; so that you don't have to go to main memory to store/fetch subroutine arguments and saved registers by providing a large number of registers in a CPU and a command to quickly switch the currently active set of registers (the 'window') between these; kinda like a cache for the stack (but not); however you often have to 'spill' these 'cached' stack items to the real, in-memory stack when call depth is changing a lot, negating much of its benefit

polymorphic (static and dynamic)

when you call a function/method, which code actually gets executed?

polymorphism

todo isn't this whole section 'polymorphism'? rename

traditional non-polymorphic functions

('one dimensional dispatch', according to http://www.jot.fm/issues/issue_2008_03/article4/ ):

1-1 relationship between function names and the code

generic polymorphism

same source code in all cases, but possibly different object code

can be implemented via templates

ad-hoc polymorphism

you can have different source code executed for the same function/method name depending on context

OOP-style receiver type based dispatch

Choose the code to be executed based on both the method name, as also the type of 'receiver'. This terminology comes from considering a function call as a message to an object instance, the object instance being the 'receiver' of the message.

This way of doing things is called 'single dispatch polymorphism' (to contrast it with 'multiple dispatch', see below).

The method implementation executes within the context of its receiver, by which we mean that the private fields and methods of the instance are available during the execution of the method. In some languages, such as C++, the fields and methods of the receiver instance become in-scope for variable lookup, as if they were local variables.

OOP type systems often have subtypes and inheritance. This means that, if an object is of some type T, and T is a subtype of another type Tp, and Tp implements method name M, then objects of type T can also handle calls to method name M even without writing a separate implementation of method name M for type T. We say that type T inherits Tp's implementation of method M.

Now consider if type Tp is itself a subtype of another Tpp. Since subtype-ness is transitive, this means that T is also a subtype of Tpp. Perhaps Tpp also has its own implementation of method M. In this case, we have to have a procedure to determine which implementation of method M is used by T. The rule 'if T is a subtype of T2, and T has no implementation of M and T2 does, then T uses T2's implementation of M' is ambiguous, since either Tp or Tpp could fill the role of T2 in that rule.

If you have single inheritance, then there's an obvious solution. You recursively climb up the inheritance chain looking for an implementation of method M and stop at the first one you find. So, first you look at T, and it doesn't have an implementation of M, so you keep going. Then you look at Tp, and it does have an implementation of M, so you stop. The principal is that you want to use the 'most specific' implementation of M (in that a subtype is 'more specific' than its parent type).

If you have multiple inheritance, though, then this simple procedure does not suffice to eliminate ambiguity.

Note that, if one ignores access visibility restrictions on private fields and methods, than rather than thinking of the method as executing with the context of its receiver, you can model the 'receiver' as an actual argument passed to a function. Python makes this somewhat explicit; when you define a method inside a class, when that method is called, a value containing the object instance is prepended to the arguments which are passed by the caller. Traditionally in Python, this first argument is called 'self'. For example:

class A: def __init__(self, some_field_of_A): self.some_field_of_A = some_field_of_A def b(self, x): print repr(self) print repr(x) print repr(self.some_field_of_A)

a = A(1) a.b(2) print; print A.b(a,2)

Results in:

<__main__.A instance at 0x7f688028e5f0> 2 1

As you can see, the result of calling the 'b' method on instance 'a' was equivalent to calling the function A.b with the argument 'self' bound to the instance a, and the argument 'x' bound to 2. Also, even within the methods of the object instance a, the fields of a must be referenced through the 'self' variable.

determining the 'most specific' implementation in the case of multiple inheritance

todo the diamond problem

Subjective programming

Now we consider all three of the method name, the type of the receiver, and the type of the 'sender' of the message in the selection of which code to actually execute.

See 'Three-dimensional dispatch' is http://www.jot.fm/issues/issue_2008_03/article4/ , and R. B. Smith and D. Ungar. A simple and unifying approach to subjective objects. TAPOS special issue on Subjectivity in Object-Oriented Systems, 2(3):161-178, 1996.

multimethods

Also called 'multiple dispatch'.

Above, in section "OOP-style receiver type based dispatch", we saw that one can think of receiver-based dispatch equivalently as if the method definition is executed inside the context of its instance; equivalently (ignoring access restrictions), one can think of it as if a receiver is actually passed as an argument to a function. Generalizing the latter concept, we ask, if we can dispatch the method based on the type of the receiver argument, why not dispatch based on other arguments as well?

As with multiple inheritance, in this setting it is possible to have multiple implementations that match, so we need a way to resolve ambiguity. Often people try to come up with appropriate generalizations of 'most specific' and then to choose the 'most specific' method implementation when there is more than one to choose from. For example, a method implementation with more dependencies on the types of the arguments would be chosen over an otherwise identical method implementation which doesn't care about the types of some arguments.

Muschevici, Radu; Potanin, Alex; Tempero, Ewan; Noble, James (2008). "Multiple dispatch in practice". Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications. OOPSLA '08 (Nashville, TN, USA: ACM): 563–582. doi:10.1145/1449764.1449808. surveyed some programs in languages with multimethods and found that approximately 20% of polymorphic methods were multimethods (i.e. "13%-32% of generic functions utilize the dynamic type of a single argument, while 2.7%-6.5% of them utilize the dynamic type of multiple arguments", to quote Wikipedia).

An example of where you might want multimethods is when you are defining an operator such as "addition", which takes two arguments each of which may be various types such as integer or float.

See http://nice.sourceforge.net/visitor.html for a discussion of how multimethods can be used in some places where you might have used the Visitor pattern.

guards

In the above, we dispatched based on the method name and possibly also on the types of receivers, senders, and/or method arguments.

Guards enable dispatching to depend also on the values of method arguments.

As with multiple inheritance, in this setting it is possible to have multiple implementations that match, so we need a way to resolve ambiguity. Often people try to come up with appropriate generalizations of 'most specific' and then to choose the 'most specific' method implementation when there is more than one to choose from. For example, a method implementation with a matching guard would be chosen over an otherwise identical method implementation but with no guard.

http://www.p-cos.net/documents/filtered-dispatch.pdf Filtered Dispatch by Pascal Costanza, Charlotte Herzeel, Jorge Vallejos, Theo D’Hondt? presents one way to use guards and to allow the programmer to specify specificity of guards (so the compiler can resolve ambiguity during multimethod dispatch by choosing the 'most specific' method)
- that paper also has a good review of multimethod stuff in general, e.g. "A central contribution of object-oriented programming is ad-hoc polymorphism via dynamic method dispatch, which enables behavioral variations based on typically one receiver argument in object-centric languages, or potentially multi- ple arguments in languages based on generic functions. In object-centric languages, methods and their overriding rela- tionships are de ned along the inheritance chains of classes or objects, but even in the case of generic functions, method selection and combination is driven by the inheritance hier- archies in which the received arguments are involved."

Context-oriented programming

Now we consider four things when determining dispatch: the method name, the type of the receiver, the type of the sender of the message, and also the 'context', which is a dynamically-determined value.

The difference between "context-oriented programming" and guards is that in context-oriented programming (see http://www.jot.fm/issues/issue_2008_03/article4/ ), the "context" is specified as a sequence of 'layers'. Each 'layer' may contain one or more method implementations.

Note that elsewhere in this book we use the word 'context' in an informal sense to mean various different things; don't assume that we are talking about 'context-oriented programming' when you see the word 'context'.

Type classes

If, instead of grouping functions together as methods on objects, we just have unattached polymorphic functions, one thing we miss from receiver-oriented dispatch is the grouping together of various methods into an object. This was key to the idea of OOP encapsulation, that is, the idea that internal representations can be encapsulated by encapsulating data representations within objects with various methods by which their data can be accessed. As long as the rest of the program only accessed the data through these methods, you could change the representation of the data without breaking the rest of the program as long as you changed the methods at the same time. So every object's methods form a little API for accessing the data in that object.

If you just have unattached functions, even if those functions can be ad-hoc polymorphic depending upon the types of their arguments, you would still like a way to take a group of functions and put them together into a grouping, analogous to your object API (set of OOP methods for a class). You want to be able to say, e.g., "If something is a number, it should support addition, subtraction, multiplication, and division." Such a grouping is called a 'type class'. The signature of methods within a type class is polymorphic depending on the type. An implementation of a type class for a particular type is called a 'type class instance'.

Multimethods correspond to multi-parameter type classes. For example, an addition function which is ad-hoc polymorphic in both of its arguments could be contained within an 'arithmetic number' type class of addition, subtraction, multiplication, and division with two type arguments. There might be an instance of such a type class for the tuple of types (int, int), and another instance for (int, float) and another one for (float, int) and another one for (float, float).

Conditionals and boolean logic

Short-circuit connectives

Short-circuit connectives are forms of the operators 'or' and 'and' which don't evaluate their right-hand argument if the value of the left-hand argument is sufficient for determining their result.

For example, if we have three functions "returns_false" and "returns_true" and "f", and if "returns_false() == False", and "returns_true() == True", then in both of the following examples, if the language has short-circuit connectives then f() will never be called: returns_true() or f() returns_false() and f()

whereas in a language without short-circuit connectives, f() would have been called in both of those two examples.

Truthiness

Truthiness is a form of syntactic sugar that allows you to avoid the common cases of writing for example "x > 0" or "not x.isEmpty" in conditionals.

Many languages allow values with types other than boolean to be used in the predicate inside a conditional statement. This can be handled in various ways:

if the language supports implicit type coercion, the type can be coerced to a boolean
if the language supports interfaces/protocols/typeclasses, an interface can be defined for truthiness (for example, in Python, the __nonzero__ method)

Values which act as the boolean True when inside a conditional are sometimes called 'truthy', and other values 'falsy'.

For example, here's how truthiness works in Python https://docs.python.org/2/library/stdtypes.html#truth-value-testing

"In Ruby only false and nil are falsey. Everything else is truthy (yes, even 0 is truthy).

In some languages, like C and Python, 0 counts as false. In fact, in Python, empty arrays, strings, and hashes all count as false. In JavaScript?, empty strings count as false, but empty arrays count as true. " -- https://gist.github.com/jfarmer/2647362

In the language Pike, 0 is falsy, and everything else is truthy [8].

In some systems (eg R's falsy library), exception objects are also falsy.

Some languages with booleans:

Ruby, Lua, Smalltalk, JavaScript?, Python, Scheme (named variously eg true, True, #t) (note: PHP has booleans too, but Taw notes that "their behavior is full of hacks - print true prints 1, print false prints nothing, false == 0, false == NULL, true == 1, even true == 42."

Some languages without (dedicated) booleans:

Common Lisp (t is true but uses empty list (nil) for false)
Perl (uses 1, undef for true, false)

Smalltalk does not have truthiness ([9]).

What is truthy in various languages, out of empty list, 0, empty string:

Ruby [10], Scheme, Lua, Wren [11]: everything is truthy except False and null (todo, is this right?)
PHP, Perl, Python: empty list, 0, and empty string are falsy
JavaScript?: 0 and empty string are falsy but empty list is truthy
Common Lisp: empty list is false, 0 and empty string are truthy

"0" is truthy in most languages with truthiness, but in PHP and Perl, "0" is falsy; (Taw says that falsy "0" is a [12] "huge source of bugs"])

In languages with nullable types, the 'nil' element is typically falsy (in fact, i know of no languages in which it is not).

Links:

http://programmers.stackexchange.com/questions/199425/the-rationale-behind-falsy-values
http://t-a-w.blogspot.com/2007/07/truth-falsehood-and-voidness-in-dynamic.html (much of the information above was taken from this link) (this link also contains Taw's and others' opinions on how this should work). One thing Taw points out is that unless you have a way for user-defined types to define truthiness, setting 0 to be falsy is unwise because user-defined numeric-ish types won't be able to do the same.

Connectives with passthru return values

Some languages with truthiness also have syntactic sugar for the common operation of testing multiple values in a short-circuit fashion, and returning the first one to pass the test, or the last value if none pass. For example, in Python:

 ("hi" or "default") == "hi"
 ("" or "default") == "default"
 (False or "") == ""

What is happening is that the short-circuit 'or' operator is returning the last value that it tested. Because of the way that boolean 'or' works, the last value that is tested in short-circuit 'or' is always equal to the result (to see this, just consider all cases; False or False, False or True, True or False, True or True); the same is true for short-cirucit 'and'. Because the truthiness convention is being used, returning a value that is falsy or truthy is just as good as returning False or True itself.

Short-circuit 'and' can also be used to test multiple values in a short-circuit fashion, and return the first one to fail the test, or the last value if all pass. This is useful when the application of a test that you want to apply to a value will itself cause an error if the value is not of the right form; for example, in Python, if you have a list which might be empty, and you want to return that list but only if its first element is itself truthy, and otherwise you want a default: (maybeAnEmptyList and maybeAnEmptyList[0]) or default

'and' is also useful to test conjunctions within larger boolean expressions. In Python, if you want a value whenever both otherProperty is truthy, and value is truthy, and otherwise you want a default: (otherProperty and value) or default

It is also useful in some special cases where you want a sentinel value for some special case, and then sentinel you want happens to be the value that causes the special case, eg. if you want the average of a list of numbers, or 0 if the list is empty: count and total/count

Thanks [13].

Test-and-bind

Some languages with truthiness also have syntactic sugar for the common operation of testing a value for truthiness and binding it to a variable.

In Perl6, the following construct tests a value for truthiness and if it is truthy, binds that value to a variable and executes a block of code. For eg. (todo do i understand this correctly?)

  if (0..5).pick -> $number { say $number }
  if (0..5).pick { say $^number } # ditto

(thanks [14])

todo / links

https://en.wikipedia.org/wiki/Control_flow

dynamic wind/unwind (eg in Guile Scheme)

some complicated kinds of functions calling async: the fn returns a future (and, do you start the fn right away, or lazily?) or a future's generalization, a stream or the fn can callback is the fn an introspectable AST or just a fn? partial failure; is it possible that the function never got the message? is it possible that the function completed successfully but you never got its completion message? (are exceptions possibly raised, or is there an error return argument, or are nullable types used?) could the function not terminate? is there a timeout?

polymorphism dispatch based on first argument, or on multiple arguments? dispatch chosen at compiletime or runtime?

http://wayback.archive.org/web/20110716163344/http://projectfortress.sun.com/Projects/Community/blog/ObjectOrientedTailRecursion argues for tail recursion.

The argument is that to really achieve OOP encapsulation, you want fully abstract interfaces. This means that an instance of an object implementing an interface cannot make any assumptions about the internals of other instances implementing the same interface (because they may be different classes, or at least different subclasses, that implement the interface in very different ways). Yet, you sometimes want to compose these objects together into larger objects (such as collections). If you have a very large collection, sometimes you may want to apply an operation that requires the entire collection to be traversed, and returns an answer that depends on arbitrarily many items in the collection. But, various subsets of the collection might be within other objects (consider a list built by instantiating a Nil object instance, then a Cons object instance which contains that Nil object, then another Cons instance which contains the first Cons, etc). So there might be no single objects which has a list of all of the objects that must be traversed. This means that each of these objects must be called, and the depth of this call chain might reach the number of items in the collection.

Another example he gives is the "implementation of transitions in an object-oriented finite-state machine of indefinitely long lifetime."

(note: [15] gives an example to show that tail calls which are not strictly necessary here; other language constructs, such as thunks, could substitute for tail calls)

---

pattern matching

Python pattern matching (and a critique of it)
- discussion: https://lobste.rs/s/dpdmcg/structural_pattern_matching_python_3_10

"Pattern matching has been available in functional programming languages for decades now, it was introduced in the 70s. (Logic programming languages expose even more expressive forms, at higher runtime cost.) It obviously improves readability of code manipulating symbolic expressions/trees, and there is a lot of code like this...pattern-matching was developed mostly in dynamically-typed languages before being adopted in statically-typed languages, and it works just as well in a dynamically-typed world. (In the ML-family world, sum types and pattern-matching were introduced by Hope, an experimental dynamically-typed language; in the logic world, they are basic constructs of Prolog, which is also dynamically-typed – although some more-typed dialects exist.)" [16]

"...it’s used to pervasively in Erlang, and later Elixir. Clojure has core.match, Racket has match, as does Guile. It’s now in Ruby as well!" [17]

"It is an important feature of OCaml." [18]

if let and while let

Rust and Swift have some control flow that is conditional upon pattern matching:

" if let

if let allows you to combine if and let together to reduce the overhead of certain kinds of pattern matches.

For example, let’s say we have some sort of Option<T>. We want to call a function on it if it’s Some<T>, but do nothing if it’s None. That looks like this:

match option { Some(x) => { foo(x) }, None => {}, }

We don’t have to use match here, for example, we could use if:

if option.is_some() { let x = option.unwrap(); foo(x); }

Neither of these options is particularly appealing. We can use if let to do the same thing in a nicer way:

if let Some(x) = option { foo(x); }

If a pattern matches successfully, it binds any appropriate parts of the value to the identifiers in the pattern, then evaluates the expression. If the pattern doesn’t match, nothing happens.

If you want to do something else when the pattern does not match, you can use else:

if let Some(x) = option { foo(x); } else { bar(); }

while let

In a similar fashion, while let can be used when you want to conditionally loop as long as a value matches a certain pattern. It turns code like this:

let mut v = vec![1, 3, 5, 7, 11]; loop { match v.pop() { Some(x) => println!("{}", x), None => break, } }

Into code like this:

let mut v = vec![1, 3, 5, 7, 11]; while let Some(x) = v.pop() { println!("{}", x); } " -- https://doc.rust-lang.org/book/if-let.html

---

an introductory example of tail recursion:

https://funcall.blogspot.com/2009/03/not-lisp-again.html

---

chained conditionals: http://interactivepython.org/runestone/static/CS152f17/Selection/Chainedconditionals.html

---

unless

'unless' is used to create one-liner control structures that execute something on the left, UNLESS the condition on the right is true. Eg in Ruby:

  list_cart unless @cart.include? 'pineapple'

---

dataflow/reactive/spreadsheet-like control

---

effect handlers

Links:

Retrofitting Effect Handlers onto OCaml

---

https://beej.us/guide/bgc/html/split/goto.html discussion: https://lobste.rs/s/sgn2dk/goto

20 david_chisnall 3 days ago

link

flag

It’s worth remembering that the original form of GOTO that Dijkstra objected to was very different from the C one. His article was written at a time when structured programming was only starting to gain traction. The C goto is a purely local construct that affects flow control within a subroutine, whereas older languages used goto in place of explicit subroutines with strict call-return semantics, scoping, and all of the things that are so ingrained in modern programming languages that it’s easy to forget that they’re optional. This made it incredibly hard to understand the control flow of a non-trivial program because it would jump all over the place with ad-hoc policies for how it selected the next code fragment to jump to.

The C goto eliminates most of these problems because it’s closely tied to the scoping rules of the language. You can’t goto x if x is a label in another function. You can, as the article says, jump over variable initialisation, but you’re implicitly jumping through the allocation of the variable, so the allocation / deallocation of automatic-storage variables is still well defined.

" [19]

" FeepingCreature? edited 4 days ago

link

flag

It seems that all these cases, possibly except the retry loop, can be handled cleanly with these two language features: as the article notes, labelled loops

label: for (int i ... ) { continue label; break label; }

and scope guards.

void *mem = malloc(16); scope(exit) free(mem); "free" will be called if we return from here.

These also avoid the unclear interaction of goto with variable declarations.

(Of course, they aren’t in C. This is why you should use D in -betterC mode :p)

" -- [20]

---

Pipe/threading/arrow/composition

---

Two-armed while

~ thequux 41 minutes ago

link

flag

I’ve wanted to see a language with a 2-armed while statement for a while.

In C-ish syntax, we have the classic while loop: while (condition) { things(); } and the do-while loop do { things(); } while(condition);, depending on whether to check the condition at the beginning or the end.

However, I’d love to see a combination of the two: do { some_things(); } while(condition) { other_things();}, which would have similar semantics to while(true) { some_things(); if(!condition) break; other_things();}.

The two-armed while has the benefit of making it clearer where the loop condition is tested and satisfying colleagues that are picky about while loops that appear infinite, while nicely unifying the two types of while loop and being completely backwards compatible with traditional C syntax. The only cost is that you’d lose the ability to emit a diagnostic if you forget the ; after a do-while loop.

(for the nitpickers, yes, Forth has this in the form of its BEGIN WHILE REPEAT construct. However, Forth tends to be a difficult sell in most development shops)

    ~
    Moonchild 21 minutes ago | link | flag |

        Common lisp has this. E.G.:

        (loop
          with x = 3
          do (format t "a~%")
          until (zerop (decf x))
          do (format t "b~%"))

        This prints a b a b a.

Multi-sequence iteration

https://kristoff.it/blog/zig-multi-sequence-for-loops/

proj-plbook-plChControl

Chapter : control

goto

Resource cleanup

structured programming

continuations

escape continuations

first-class continuations

delimited continuations

links about continuations

algebraic effects

coroutines

dynamic-wind

monads

tail call optimization

Poor man's substitutions for TCO

some motivations for tail calls

Conditionals/selection

if

what evaluates to false

and typing gotcha in haskell

ternary if

switch

short circuit boolean ops

combined conditionals and branching

while

expression while with 'else'

C# delegates: late-binding, method choice without object choice, chaining

Chapter: control: dispatch

polymorphic (static and dynamic)

polymorphism

traditional non-polymorphic functions

generic polymorphism

ad-hoc polymorphism

OOP-style receiver type based dispatch

determining the 'most specific' implementation in the case of multiple inheritance

Subjective programming

multimethods

guards

Context-oriented programming

Type classes

Conditionals and boolean logic

Short-circuit connectives

Truthiness

Connectives with passthru return values

Test-and-bind

todo / links

pattern matching

if let and while let

unless

dataflow/reactive/spreadsheet-like control

effect handlers

Pipe/threading/arrow/composition

Two-armed while

Multi-sequence iteration