Table of Contents for Programming Languages: a survey

Chapter : History and families of programming languages

spectrums:

ctm (close-to-the-metal) (low-level vs high-level)

What is a low-level language? Two definitions are:

A low-level language gives the programmer more control over exactly what the computer does.
A low-level language maps more directly to the underlying hardware architecture of the computer (for most present-day computers, this means register machines with RAM)

More specifically, a low-level language:

does not introduce large CPU-time overhead (in practice, the gold standard for CPU-time overhead is taken to be C, so if a language claims to be 'zero-overhead' for some operation, this usually means compared to C)
does not introduce large memory overhead
allows the programmer to control the representation of data in memory
allows the programmer sufficient control over what the CPU does and what is in memory, and where, so as to permit interoperation with other programs in arbitrary other languages (without the use of another language, e.g. in Python you can write a C module using an 'extension API', but that doesn't count because you had to use C, not just Python; an exception is that use of another language (such as assembly) might be required in order to meet a platform-specific calling convention)

One common class of CPU-time overhead is automatic memory management. In languages with garbage collection, a call to the garbage collector sometimes takes a long time, and in languages with reference counting, a single reference decrement could in the worst case lead to an arbitrarily long cascade of memory deallocations. In addition, automatic memory management may not deallocate memory as soon as the programmer knows that it could be deallocated, so this often adds memory overhead. In addition, in languages with automatic memory management, the programmer must interact closely with automatic memory management if they want to pass a pointer to memory allocated by their program to an outside process, and in many languages with automatic memory management, the necessary primitives do not exist (in the base language; most languages have an extension API that includes these facilities, but utilizing the extension API often requires use of another language, such as C). For these reasons, 'lack of mandatory automatic memory management' is considered by many to be another criterion for a language to be considered 'low-level'.

If there are two languages A and B, and A compiles to B, then according to our definition, B <= A, because if B does not give you the capability to control CPU or memory in some way, then neither can A. One might think that ordering is strict, that is, B < A, that is, A must be at least a little higher level than its compilation target language, but this is not true; for example, you can write a 'compiler' that compiles a language to itself by merely returning as output the same source code that it was given. One can imagine less trivial examples, too, however, in which there are two languages which are equally able to express low-level constructs but which express them in differently ways; such a pair of languages could compile to each other.

Some fans of high-level languages might define things a little differently; instead of enumerating abilities of low-level languages, they might enumerate lists of things that their favorite high-level languages 'take care of for you' so that you 'don't have to worry about' them.

todo leaky abstractions

see also Part Implementation, Chapter Efficiency.

continuing on..

discipline (safe vs unsafe; tradeoff between programming time and debugging time (part of this: protect programmers from themselves), also flexibility vs self-documenting architecture for programming-in-the-large, also making others's code easy to read) (Yegge's liberal vs. conservative: https://plus.google.com/110981030061712822816/posts/KaSKeg4vQtz) Yegge gives some specific features that are polarized in this spectrum:

static typing (and schema-ful data), and in general maximizing statically knowable knowledge in order to maximize what the compiler can check for you (maximizing provable correctness), is disciplined
metaprogrammy stuff is undisciplined
- (re)defining syntax
- operator overloading (only a little bit undisciplined)
- eval
- dynamic invocation
- macros
- reflection, RTTI
explicit is disciplined, implicit is undisciplined
- implicit type conversion, esp. between unrelated types (e.g. implicit conversion to string) is undisciplined
- checked exceptions are disciplined
stuff that lets programmers constraint how others will use their code
- private fields/methods are disciplined
dynamic stuff
- dynamic scoping
- prototype inheritance
unsafe casting is undisciplined
nullable types
Whole-namespace imports (e.g. 'import * from' in Python) is undisciplined
Arity-based function overloading
heterogenous (mixed type) collections

big/small

performant/not (fast/slow)

lazy/eager

functional/not note: the word 'functional' is used in multiple ways when talking about programming languages:

there are "functional requirement" as opposed to "non-functional requirements" when listing the requirements of a particular program; the functional requirements relate to things the program is supposed to do, and the non-functional requirements are other things like not crashing, speed, etc.
a 'functional programming language' is sometimes used to mean a '(pure function)-centric programming language', that is, one oriented around the mathematical concept of pure functions, which is a fancy way of saying that functions in this language just behave like math functions; they are given an input and they deterministically calculate an output, and they don't otherwise rely upon or affect the state of the caller, or interact with the user (e.g. they have no side effects). Such a language tends to make a lot of use of expressions (as opposed to lists of imperative commands).
a 'functional programming language' is also sometimes used to mean a '(first class function)-centric programming language', that is, a language in which functions are first class objects (that is to say, they are 'reified'; that is to say, functions can be assigned to variables and passed around). Functions which operate on functions are called higher-order functions, and such a language is often oriented around expressive higher-order functions, often including partial functions.
"functional programming language" can mean: referential transparency (lambda calculus-like); firstclass functions; higher order functions.
- some languages with higher-order functions have currying, which may be considered 'more functional' because it eliminates the distinction between higher-order functions and and multiargument fns

Often, the latter two concepts coincide; because their incidence is highly correlated, some confusion has developed between them, and the word 'functional' has come to mean either or both. Languages which are both (pure function)-centric and (first class function)-centric often are based on a lambda calculus-like fundamental model.

An example of how one tends to compute by applying higher-order functions to other functions, instead of by iteration, can be found in http://bartoszmilewski.com/2014/11/12/formula-1-go-functional/, which shows how to calculate one could calculate a moving average in Haskell:

delay n lst = replicate n 0 ++ lst deltas n lst = map (uncurry (-)) (zip lst (delay n lst)) slidingSums n lst = scanl1 (+) (deltas n lst) movingAverage n list = map (/ (fromIntegral n)) (slidingSums n list)

some make the point that state cannot be avoided completely; the point of functional languages is to avoid state as a default, to make state more explicit, not elimination of state completely

(supposed) advantages of the functional style (deriving from eschewing implicit state):

less bugs
easier multi-threading
easier testing
easier maintanance
smaller, easier to compose functions

statically typed/dynamically typed/gradually typed/mostly untyped http://ecee.colorado.edu/~siek/gradualtyping.html

should we list models of computation here? maybe just the most 'important' ones in terms of programming language families:

turing, assembly, imperative lambda calc, ? , haskell/purely functional concatenative, ?, forth?, joy relational, ?, SQL logic programming, ?, prolog combinatorial, Nock, Hoon mu-recursive, ?, ? grammar-based, O-meta, ? (see also the list of Turing-equivalent systems in the formal chapter)

early binding/late binding (static/dynamic)

strictly typed/not

scripting/systems/application/end-user/embedded scripting/glue scripting: one liners from command line?

??? size types/design goals ???

typical practical languages

a language that is invented for people to write programs in.

(note: one might say 'general purpose' but this is more often used to talk about application domain; e.g. general purpose vs. systems vs. scripting)

agglomerative vs. spartan

agglomerative: 'offer all features that programmers find useful'

spartan: 'be opinionated.'

motivations for spartan can be:

an expression of an idea for a programming style
to keep things simple

agglomerative: non-orthogonal features are okay, because they save programmers time

spartan: minimal orthogonal subset of features

the defining characteristic of a spartan language design is to say, "Yeah, we agree that this feature would be commonly and practically useful, but it is not strictly necessary and we're not going to include it ever". (note the 'ever'; all programming languages have to initially omit some desired features because they don't have time to implement them, or because they are as of yet unsure what the best design is for the feature) (note the 'commonly': all programming languages omit features because they think they would only be of niche usefulness, therefore their complexity cost outweighs their benefit) (note the 'agree': often a language designer feels that some proposed feature would not really serve any significant use, and that existing mechanisms are just as good) (that being said, these criteria are rarely absolutely met; there is almost always some arguent that adding feature X has disadvantage Y). The usual reason given for the omission is that some other way of accomplishing the same thing is more in keeping with the desired style.

'spartan' is similar to the term 'Fluchtpunkt' as used in http://replove.herokuapp.com/sakura/index.html

examples: Python's OOWTDI, e.g. iteration over recursion, disinclusion of tail call optimization

examples: Golang's dislike of exceptions, possibly golang's exclusion of generics

examples of some agglomerative languages: 'practical lisps' from http://replove.herokuapp.com/sakura/index.html

core/kernel/intermediate/VM

a language with a very heavy focus on simplicity.

usually because it is an intermediate target of compilers and/or interpreters, for interoperability or for other reasons. Sometimes just for the sake of simplicity (e.g. Nock).

what sets this apart from 'spartan' languages is that a 'spartan' language will still include non-orthogonal features if their exclusion would make it just too annoying to program in the language; but a kernel language isn't expected to be a sensible way for humans to program in, or even to read; even the language designer prefers to program in something higher-level.

a kernel language is not necessarily completely orthogonal.

A common case of non-orthogonality is the inclusion of multiple ways to do the same thing for reasons of efficiency, e.g. a kernel language might provide an integer addition operator even though Successor (+1), along with basic control constructs, is theoretically sufficient to express addition, because on most platforms integer addition can be accomplished in O(1) (constant time) by an optimized primitive, whereas calling Successor n times to add n to a number is an O(n) operation. Similarly, a kernel language might provide floating-point data types and arithmetic even though it could theoretically only provide bitstreams or only provide integers, and expect a library to handle floating point, because by providing floating-point primitives it can map these to optimized implementations on the underlying platform.

examples:

interoperable common targets:

asm.js
JVM
CLR (CIL)
LLVM
Nock
Dis

intermediate targets and cross-compilation bootstrap subsets of various languages:

KLambda (Shen core)
RPython
Haskell Core, C-

languages which have spartan motivations to be kernel languages:

scheme R7RS
see 'kernel lisps' from http://replove.herokuapp.com/sakura/index.html

languages which have pedagogical motivations to be a kernel language:

see 'pedagogical lisps' in http://replove.herokuapp.com/sakura/index.html

embeddable and scripting languages

languages which are designed to be used embedded within or as scipting for other applications or devices.

such languages are sometimes very small in order to make them efficient and portable.

sometimes, like DSLs, they have features for specific niche application domains.

examples:

Lua
see 'exo-lisps' from http://replove.herokuapp.com/sakura/index.html

calculus

a language with an even more extreme focus on simplicity than kernel languages. Generally the calculii are absolutely orthogonal.

The purpose of the calculii is usually not to program in or to compile to, but rather as a mathematical abstraction about which theorems may be proved. The smallness of the language is crucial because often proofs will have to treat each part of the language by cases. The length, readability, and often the efficiency of the resulting programs is immaterial.

examples: Turing machines, lambda calculus, SKI combinator calculus

Turing-level DSL

a "domain-specific language" is a term for a language which, rather than being "general purpose" or even with a tilt towards a domain such as 'systems programming', is narrowly targeted at a niche domain.

todo examples:

sub-Turing DSL

a sub-Turing DSL is a DSL that is not Turing complete

todo examples:

data language

a data language is a language intended to express data, but not computation on that data

examples: HTML, XML, the literal-defining subset of any other language

Paradigms

todo: merge with above?

(big ideas; constraints)

these are not mutually exclusive

Structured/procedural programming

block-structured

GOTO considered harmful ; ban (discipline) on GOTO; flow control via primitives for selection (if) and iteration (e.g. while) and subroutine call and return

call stack

Object-oriented

differing opinions on what it is

http://c2.com/cgi/wiki?DefinitionsForOo

encapsulation of data with methods

forced encapsulation (access modifiers) vs. consenting adults

object scope; self object

message passing

polymorphism

Bob Martin claims that OOP also involves a ban (discipline) on calling functions via pointers; he claims that, because polymorphism covers the main reason to do this, along with the polymorphism in OOP tends to come a removal of the ability (or cultural encouragement) to do this at all, improving discipline (http://blog.8thlight.com/uncle-bob/2012/12/19/Three-Paradigms.html)

prototype vs. class-based

single vs. multiple inheritance

inheritance vs. composition

functions of OOP:

encapsulates behavior and state: https://news.ycombinator.com/item?id=8751227
inheritance (reuse and override code, mixins allowing a data API to be built from parts
encapsulation (keep data format and functions for using data together)
subtyping as a way to extend optimized code that knows about the internal format of the data
package mutable state: make it convenient to pass around related variables containing state together
scoping: make it convenient for a bunch of related functions and variables to refer to each other
object lifecycle: constructors and destructors; separation of the "uninitialized state" of when you have a class or factory to the "initialized state" of when you construct an object instance from that
typing and subtyping: inheritance type system for data types
the concept of class vs. instance, essential vs. accidental properties, adding state to a class to make an instance
?? what else

"OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I’m not aware of them." -- Alan Kay (E-mail to Stefan Ram, 2003, http://www.purl.org/stefan_ram/pub/doc_kay_oop_en)

more on how Kay defines OOP: " Which didn’t even require inheritance, which is not like we know it today:

    I’m sorry that I long ago coined the term “objects” for this topic because it gets many people to focus on the lesser idea.

Where the big missing piece lacking in mainstream typed OO languages today is:

    The big idea is “messaging”

He advocates focus should instead be on messaging and the loose-coupling and interactions of modules rather than their internal object composition:

    The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be." -- http://mythz.servicestack.net/blog/2013/02/27/the-deep-insights-of-alan-kay/

more Kay: http://lists.squeakfoundation.org/pipermail/squeak-dev/1998-October/017019.html

The concepts of instances of classes, of objects with properties, and of different types of things which act autonomously according to their nature, go back at least to Plato and Aristotle. See An Aristotelian Understanding of Object-Oriented Programming, Object-Oriented Programming and Objectivist Epistemology: Parallels and Implications.

"This approach recognises that programming is neither the breaking down of calculations (imperative approach) nor the definition of functions (functional approach) but the definition and manipulation of domain objects (like numbers, strings, arrays, files, graphs)." -- Sergi Valverde and Ricard Solé, Punctuated equilibrium in the large scale evolution of programming languages, SFI working paper 2014-09-030

" SIMULA contains the core of the concepts now available in mainstream object-oriented languages such as C++, Eiffel, Java, and C#:

Class and object. The class concept as a template for creating instances (objects).
Subclass. Classes may be organized in a classification hierarchy by means of subclasses.
Virtual methods. A SIMULA class may define virtual methods that can be redefined in subclasses.
Active objects. An object in SIMULA may be the head of an active thread; technically it is a co-routine.
Action combination. SIMULA has an inner-construct for combining the action-parts of a class and its subclass.
Processes and schedulers. SIMULA makes it easy to write new concurrency abstractions, including schedulers.
Frameworks. SIMULA provided the first object-oriented framework in the form of Class Simulation—the mechanism it used to implement its simulation.
Automatic memory management, including garbage collection.

One exception to the broad adoption of SIMULA concepts is the notion of an active object with its own action sequence, which, strangely enough, has not been adapted to other languages. For Dahl & Nygaard, having active objects was an essential facility to be able to simulate concurrent processes from the real world. " -- [1]

encapsulation vs. generic data structures: "More generic data types generally have well-known and useful functions associated with them. We could enumerate the keys or values on a map, filter an array by a function, or reduce across it. Alan Perlis [allegedly] said “It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures,” and I tend to agree. One benefit of using a more generic data structure instead of hiding that data behind a class is that it removes an extra step in applying well-known functions to that data." [2]

some constructs substituting for OOP functionality in non-OOP languages: https://8thlight.com/blog/colin-jones/2012/06/05/on-obsessions-primitive-and-otherwise.html

an argument against encapsulation:

" Objects bind functions and data structures together in indivisible units. I think this is a fundamental error since functions and data structures belong in totally different worlds. Why is this?

Functions do things. They have inputs and outputs. The inputs and outputs are data structures, which get changed by the functions. In most languages functions are built from sequences of imperatives: “Do this and then that …” to understand functions you have to understand the order in which things get done (In lazy functional programming languages (FPLs) and logical languages this restriction is relaxed).

    Data structures just are. They don’t do anything. They are intrinsically declarative. “Understanding” a data structure is a lot easier than “understanding” a function.

    Functions are understood as black boxes that transform inputs to outputs. If I understand the input and the output then I have understood the function. This does not mean to say that I could have written the function.

    Functions are usually “understood” by observing that they are the things in a computational system whose job is to transfer data structure of type T1 into data structure of type T2.

    Since functions and data structures are completely different types of animal it is fundamentally incorrect to lock them up in the same cage." -- [3]

another one:

" And encapsulation is usually not what you want either. Usually the problem is you're writing code in one part of the program, and suddenly realize it needs data from a different part of the program. If you made your data structures separate from your code, you can just go get the data you need. Putting it behind an "encapsulation" wall and making you go through object methods usually just results in unneeded complexity in the form of object instantiations and calling getter/setter functions and so fourth. Now I know, people will argue about encapsulation in distributed programs, but process boundaries and machine boundaries impose encapsulation on their own and usually in places where it makes sense. Where it doesn't make sense is within the same program, hiding the data of one part of the program from another parts of the same program. " -- [4]

some arguments against OOP: http://harmful.cat-v.org/software/OO_programming/why_oo_sucks

the "Ingalls test" for OOP-ness of a language:

"Dan Ingalls, one of the designers of Smalltalk, proposed a test for whether a language is object-oriented. In an object-oriented language, a user should be able to create a new type of integer and use it to specify the coordinates for drawing in a system-provided window class." [5]

Imperative programming

based on Turing machine

Functional programming

based on expression evaluation

referential transparency

  don't use variable assignment

  discipline/ban on (most) side-effects

higher-order functions function composition maps

tail call optimization

partial function application

closures

anonymous functions

based on lambda calculus

Links:

http://maryrosecook.com/blog/post/a-practical-introduction-to-functional-programming (but see also https://news.ycombinator.com/item?id=8943356 eg https://news.ycombinator.com/item?id=8944632 )

Logic programming

Dataflow programming

The word 'dataflow' means at least four things.

In the context of computer hardware, "dataflow" often means a "dataflow architecture", where "When all of the tagged operands of an instruction become available (that is, output from previous instructions and/or user input), the instruction is marked as ready for execution by an execution unit" [6].

In software, "dataflow" can be used to mean "the idea that changing the value of a variable should automatically force recalculation of the values of variables which depend on its value", like a spreadsheet [7]. This idea also called reactive programming.

So, dataflow, the hardware architecture, seems to be about how do you sequence a set of steps in a computation which may only be run once, whereas dataflow, in software, seems to be about how do you implicitly maintain invariants relating multiple pieces of state which may presumably lead to the need to redo a calculation many times.

In the context of programming language families, the term "dataflow programming" "is a programming paradigm that models a program as a directed graph of the data flowing between operations" [8].

In the context of compiler implementation, data-flow analysis refers to analysis of the propagation of data values through parts of a computer program.

Links:

http://en.wikipedia.org/wiki/Dataflow_architecture (a computer hardware architecture)
https://en.wikipedia.org/wiki/Dataflow#Software_architecture
https://en.wikipedia.org/wiki/Dataflow_programming (a programming language paradigm)
https://en.wikipedia.org/wiki/Data-flow_analysis (a type of program analysis)

Active data

Declarative

"Here's the gist: in most mainstream languages, you describe how to solve a particular problem; in declarative languages, you merely describe the result you want, and the language itself figures out how to get there." -- http://brikis98.blogspot.com/2014/04/six-programming-paradigms-that-will.html

examples: prolog, SQL

disadvantages: if the way that the language figures out how to solve the problem is inefficient, you end up having to learn about what it's doing and optimizing it anyway, in which case the resulting program is even harder to work with than imperative languages because of the additional layer of abstraction.

some question the usage of the term 'declarative': http://existentialtype.wordpress.com/2013/07/18/what-if-anything-is-a-declarative-language/

Concatenative

advantages: less syntax, opportunities for metaprogramming, close to the metal

disadvantages: programmer has to keep the contents of the stack in mind at all times. Named variables (and keyword arguments) can be seen as an invention to avoid having to do this.

Combinatory

SK combinators Y combinator

Nock

??? what family is Nock in? is it combinatory? is it its own thing?

Links about paradigms

http://people.ku.edu/~nkinners/LangList/Extras/classif.htm

systems programming

space efficient: possibility of manual memory management
can call and be called by C
can precisely specify in-memory representations
can cast, e.g. can take a memory representation passed in from outside and tell the language to assume it holds something of an in-language type
efficient (e.g. C++'s 'zero cost abstractions')
often unsafe (e.g. can crash)

types of type systems

dependent typing: can capture properties of values, e.g. you can say 'a list with 3 elements' or 'a list with at least one element' or 'a number less than 255'. fully generalized dependent typing can capture arbitrary computable properties of values. Inference with fully generalized dependent typing is intractable.

http://www.infoq.com/presentations/data-types-issues

slide 12

classify type systems into dimensions static/dynamic, weak/strong:

dynamic, weak: assembly dynamic, moderate: js, ruby dynamic, strong: python, clojure dynamic-ish but somewhat static, moderate: typescript, dart static-ish but somewhat dynamic, moderate: java, C# static-ish but somewhat dynamic, strong: scala static, weak: C static, strong: OCaml, Haskell

slide 13

classify type systems into dimensions coarse/detailed, weak/strong:

coarse, weak: C coarse, strong ("my way or the highway"): Go, Java 4, Pascal detailed, weak ("cutting corners"): eiffel, typescript, dart detailed, strong ("type it to the max"): scala, F#, haskell, ocaml moderate, moderate: C#, java 5+

My Way or The Highway (coarse, strong):

" Simple type systems No generics Not that extensible by users

Simpler tooling
Highly normative "

Type it to the Max: " Rich* language to write types Type combination forms, including generics. Type systems often inspired by logic.

Often, turing complete

Where dynamic languages had the upper hand: – No type-imposed limits to expressiveness  Rich type system + escape hatches such as casts – No boilerplate  Type Inference – Easier for exploration  Bottom type Nothing, ???

Other Strengths of Dynamic • Simpler languages − Rich types add complexity • Fewer puzzling compiler errors

Cutting corners: Appeal to user’s intuitions (covariant generics). – Employee are Persons – So functions from Employees to Employers are also functions from Persons to Employers, right? • Embrace unsoundness. • Easy, and superficially simple. • But, fundamentally, a hack. • Can we build great designs on false theories?

Precision Soundness Simplicity Take Any Two?

hmm what "take any two" would i say? maybe:

permissive of power (e.g. dynamic languages permit generics, even though their type system doesn't 'understand' this concept) simple/easy to learn and use precise (e.g. you can have generics in dynamic languages, but they aren't typechecked; it's more precise to have a language whose type system knows about generics specifically)

i guess that's the same as what he said; he's replacing my 'precise' with 'sound', and my 'precision' with 'permissive of power'

http://lambda-the-ultimate.org/node/3166

todo

"What real functional programmers do is “multi-paradigm”– mostly functional, but with imperative techniques used when appropriate. What the debate comes down to is the question of what should be the primary, default “building block” of a program. To a functional programmer, it’s a referentially-transparent (i.e. returning the same output every time per input, like a mathematical function) function. In imperative programming, it’s a stateful action. In object-oriented programming, it’s an object, a more general construct that might be a referentially-transparent function, might represent an action in a hand-rolled domain-specific language (DSL) or might be something else entirely. " -- https://michaelochurch.wordpress.com/2012/04/24/functional-programming-is-a-ghetto/

" In general, functional programming is right. The functional approach is not right for every problem, but there is a right answer regarding the default abstractions for building most high-level programs: immutable data, and referentially transparent functions should be the default, except in special cases where something else is clearly more appropriate. Why? A computational action is, without more knowledge, impossible to test or reason about, because one cannot control the environment in which it exists. One needs to be able to know (and usually, to control) the environment in which the action occurs in order to know if it’s being done right. Usually, the tester wants to be able to cover all special cases of this action, in which case knowing the environment isn’t enough; controlling it is also necessary. But at this point, the state of the environment in which the test happens is an implicit parameter to the action, and making it explicit would make it a referentially-transparent function. In many cases, it’s better to do so– when possible. It might not be. For example, the environment (e.g. the state in a computer cluster database) might be too large, complex, or volatile to explicitly thread into a function. Much of real-world functional programming is not about eliminating all state, but about managing what state is intrinsic, necessary or appropriate. " -- https://michaelochurch.wordpress.com/2012/04/24/functional-programming-is-a-ghetto/

should i maybe have stereotypes of different types of programmers, or at least, dimensions along which they stereotypically vary?

the person who likes fancy IDEs vs fancy languages?

the person who likes new languages vs old?

the person who likes unix-y plaintext vs binary formats?

the person who likes static typing vs dynamic typing?

the person who likes extensive tests vs not?

" The answer is that we tend to insist on good design, to such a degree that we avoid taking jobs where we’re at risk of having to deal with bad designs. This isn’t a vice on our part; it’s a learned necessity not to waste one’s time or risk one’s career trying to “fix” hopeless systems or collapsing companies. Generally, we’ve come to know the signs of necrosis. We like the JVM languages Clojure and Scala, and we might use Java-the-language when needed, but we hate “Java shops” (i.e. Java-the-culture) with a passion, because we know that they generate intractable legacy messes in spite of their best efforts. Say “POJO” or “Visitor pattern”, and you’ve lost us. This seems like arrogance, but for a person with a long-term view, it’s necessary. There are a million “new new things” being developed at any given time, and 995,000 of them are reincarnations of failed old things that are going to crash and burn. If I could characterize the mindset of a functional programmer, it’s that we’re conservative. We don’t trust methodologies or “design patterns” or all-purpose frameworks promising to save the world, we don’t believe in “silver bullets” because we know that software is intrinsically difficult, and we generally don’t believe that the shiniest IDE provides us enough to compensate for its shortfalls and false comforts. For example, an IDE is useless for resolving a production crisis occurring on a server 3,000 miles away. We’d rather use vim or emacs, and the command line, because we know they work pretty much everywhere, and because they give us enough power in editing to be productive." -- https://michaelochurch.wordpress.com/2012/04/24/functional-programming-is-a-ghetto/

---

Links:

Sergi Valverde and Ricard Solé, [http://www.santafe.edu/media/workingpapers/14-09-030.pdf Punctuated equilibrium in the large scale evolution of programming languages], SFI working paper 2014-09-030

The authors start with a dataset of which programming languages are said to have been influenced by which, and from this construct a network model of programming language evolution which matches various statistics from the data. However, actual programming language evolution is found to be more 'bursty' than the model, which is taken as evidence for a quality of 'punctuated equilibrium' in programming language evolution.

Supplement 1 contains lists of language families and some descriptions of some of them:

Figure 1 lists 20 connected components (although they say there is only 18). Table 1 summarizes them, in order from smallest number of languages in the cluster to largest. I will reproduce part of Table 1, except instead of giving the earliest language (the root) as the example of each component, i'll give what i believe to be the most well-known one:

representative language (or the earliest one, if i don't think any of them are well-known), number of languages in the cluster, timespan, paradigm name
brainf*ck, 2, ??, esoteric (not in table 1)
npl, 2, 1977-1980, Pattern Matching
general algebraic modeling system (GAMS), 2, 1978-1993, Mathematical Optimization
inform-sis, 2, 1972-1986, Fourth-generation language
cms-exec, 2, 1966-1972, Command Procedure Control
crt-rps, 2, 1967-1975, Report processing
metafont, 2, 1979-1994, Vector graphics
schoonship, 3, 1967-1984, Computer Algebra Systems
music-n, 3, 1957-2003, Music processing
hypertalk, 4, 1987-1993, Scripting (Multimedia)
yacc, 4, 1970-1996, Parsing
fargo, 4, 1959-1978, Report processing
m4, 4, 1965-1977, Macro processing
cobol, 5, 1955-1996, Business languages
sql, 6, ??, query (not in table 1)
prolog, 10, 1972-2006, Declarative (Logic)
apl, 11, 1964-2006, Array programming
xml, 14, 1969-2005, Data Definition languages
Lisp, 67, 1954-2011, Functional languages
Fortran, 198, 1953-2012, Imperative (multi-paradigm)

They then describe some of these clusters ("language families"). Here, i'll just note the subfamilies and languages they mention (again, sometimes renaming families from their earliest member to a more well-known member):

"functional" family (Lisp family)
- Logo
- Forth
- Smalltalk
- Scheme and Common Lisp
- Ruby
Array languages (APL family) (J is also given as a 'canonical example')
Business languages (COBOL family)
Logic languages (Prolog family)
Definition languages (XML family)
Imperative/multi-paradigm family (Fortran family)
- ML (also includes Clojure) family
- BASIC family
- Fortran and ALGOL family
- Pascal family
- C family

They give cladograms for many of these.

In Figure 1, the graph, the two largest clusters have multiple subclusters with a major language in larger type. I don't know if the larger font size just means that the authors subjectively decided to emphasize these, or if it means something objective. These two clusters and their larger-type children are:

lisp:

forth
ruby
logo
smalltalk
scheme

fortran:

algol (various variants of algol)
simula
pl/i
pascal
ada
iswim
ml
haskell
basic
speedcoding (the earliest member of what i am calling the fortran cluster)
C
objective-c
python
javascript
java
C#

In their dynamic "backbone explorer" at http://svalver.github.io/Proglang/paradigms.html , they give another list of families:

Functional Languages
Logo family
Forth family
Smalltalk family
Scheme family
Ruby family
Imperative Languages
ISWIM family
JOSS and BASIC family
Algol family
Pascal family
C family
Array Languages
Business Languages
Logic Languages
Definition Languages

They assert (presumably from other sources) that "There are four major PLs categories: imperative, functional, logic and object-oriented (see below)."; however, they note that their method does not produce a separate cluster for OOP. They say that "The concept of programming paradigm was rst introduced by Robert W. Floyd during his Turing Award lecture in 1978....In his lecture `The Paradigms of Programming', Floyd urged software practitioners and teachers of programming to identify what styles (or schools) of programming are they using (the so-called programming paradigms)....Unfortunately, there has been no systematic attempt to classify PLs according to programming paradigms."

They note that "Compatibility, flexibility, memory requirements and third-party support are key factors of PL success and main drivers of technological innovation".

Compare to https://en.wikipedia.org/wiki/Generational_list_of_programming_languages, which has families:

ALGOL
APL
BASIC
Batch
C (includes Perl, Python, C++, and under C++, Java, C#)
COBOL
COMIT (includes SNOBOL, and under that, Icon, and Lua (Lua is also under Modula and Scheme))
DCL (includes Windows PowerShell? (which is also under C#, ksh, Perl)
ed (includes Perl, which is also under C)
Eiffel
Forth
Fortran
FP
HyperTalk?
Java
JOSS
Lisp
ML
PL/I
Prolog
SASL (part of the ISWIM-based cluster in [9]'s clustering; includes Miranda and Haskell)
SETL (includes Python, which is also under C)
sh
Simula (includes Smalltalk, and under Smalltalk: Ruby (also under Perl), Javascript)
Tcl

todo: cover 'term rewriting'

eg Nock. 'Computation via macro substitution' (combinators)

term rewriting vs. lambda calculus:

http://stackoverflow.com/questions/24330902/how-does-term-rewriting-based-evaluation-work :

" How does term-rewriting based evaluation work?

The Pure programming language is apparently based on term rewriting, instead of the lambda-calculus that traditionally underlies similar-looking languages. ...

The matching of patterns, and substitution into output expressions, superficially looks a bit like syntax-rules to me (or even the humble #define), but the main feature of that is obviously that it happens before rather than during evaluation, whereas Pure is fully dynamic and there is no obvious phase separation in its evaluation system (and in fact otherwise Lisp macro systems have always made a big noise about how they are not different from function application). Being able to manipulate symbolic expression values is cool'n'all, but also seems like an artifact of the dynamic type system rather than something core to the evaluation strategy (pretty sure you could overload operators in Scheme to work on symbolic values; in fact you can even do it in C++ with expression templates).

So what is the mechanical/operational difference between term rewriting (as used by Pure) and traditional function application, as the underlying model of evaluation, when substitution happens in both?

1 Answer

Term rewriting doesn't have to look anything like function application, but languages like Pure emphasise this style because a) beta-reduction is simple to define as a rewrite rule and b) functional programming is a well-understood paradigm.

A counter-example would be a blackboard or tuple-space paradigm, which term-rewriting is also well-suited for.

One practical difference between beta-reduction and full term-rewriting is that rewrite rules can operate on the definition of an expression, rather than just its value. This includes pattern-matching on reducible expressions:

-- Functional style map f nil = nil map f (cons x xs) = cons (f x) (map f xs)

-- Compose f and g before mapping, to prevent traversing xs twice result = map (compose f g) xs

-- Term-rewriting style: spot double-maps before they're reduced map f (map g xs) = map (compose f g) xs map f nil = nil map f (cons x xs) = cons (f x) (map f xs)

-- All double maps are now automatically fused result = map f (map g xs)

Notice that we can do this with LISP macros (or C++ templates), since they are a term-rewriting system, but this style blurs LISP's crisp distinction between macros and functions.

CPP's #define isn't equivalent, since it's not safe or hygenic (sytactically-valid programs can become invalid after pre-processing).

...

Another practical consideration is that rewrite rules must be confluent if we want deterministic results, ie. we get the same result regardless of which order we apply the rules in. No algorithm can check this for us (it's undecidable in general) and the search space is far too large for individual tests to tell us much. Instead we must convince ourselves that our system is confluent by some formal or informal proof; one way would be to follow systems which are already known to be confluent.

For example, beta-reduction is known to be confluent (via the Church-Rosser Theorem), so if we write all of our rules in the style of beta-reductions then we can be confident that our rules are confluent. Of course, that's exactly what functional programming languages do!

btw the first few pages to this paper describe Aspect Oriented Programming more clearly than most:

http://homepages.cwi.nl/~paulk/publications/trs-aop.pdf

Links:

http://hopl.info/

todo

dynamism/late-binding

one example of when dynamism/late binding is useful:

" Bill Venners: In Ruby, I can add methods and variables to objects at runtime. I have certainly added methods and variables to classes, which become objects at runtime, in other languages. But in Java, for example, once a class is loaded or an object is instantiated, its interface stays the same. Allowing the interface to change at runtime seems a bit scary to me. I’m curious how I would use that feature in Ruby. What’s the benefit of being able to add methods at runtime?

    Yukihiro Matsumoto: One example is a proxy. Instead of designing individual proxy classes for each particular class, in Ruby you can create an all purpose proxy that can wrap any object. The proxy can probe the object inside of it and just morph into the proxy for that object. The proxy can add methods to itself so it has the same interface as the wrapped object, and each of those methods can delegate to the corresponding method in the wrapped object. So an all-purpose proxy, which can be used to wrap any object, is an example of how a library class can adapt to the environment. " -- [10]

---

https://www.hillelwayne.com/post/equals-as-assignment/

discussion: https://news.ycombinator.com/item?id=16803874

---

https://web.archive.org/web/20030217210838/http://www.byte.com/art/9509/sec7/art19.htm

---

Professor Frisby's Mostly Adequate Guide to Functional Programming (2015)
- https://news.ycombinator.com/item?id=17072662

---

todo: this blog post clearly separates the three senses of 'inheritance' and also touches upon two other things that this book should mention the 'square is not a subclass of rectangle' paradox of OOP, and the diamond problem of OOP inheritance:

" Why inheritance never made any sense Graham / March 16, 2018 / OOP

There are three different types of inheritance going on.

    Ontological inheritance is about specialisation: this thing is a specific variety of that thing (a football is a sphere and it has this radius)
    Abstract data type inheritance is about substitution: this thing behaves in all the ways that thing does and has this behaviour (this is the Liskov substitution principle)
    Implementation inheritance is about code sharing: this thing takes some of the properties of that thing and overrides or augments them in this way. The inheritance in my post On Inheritance is this type and only this type of inheritance.

These are three different, and frequently irreconcilable, relationships. Requiring any, or even all, of them, presents no difficulty. However, requiring one mechanism support any two or more of them is asking for trouble.

A common counterexample to OO inheritance is the relationship between a square and a rectangle. Geometrically, a square is a specialisation of a rectangle: every square is a rectangle, not every rectangle is a square. For all s in Squares, s is a Rectangle and width of s is equal to height of s. As a type, this relationship is reversed: you can use a rectangle everywhere you can use a square (by having a rectangle with the same width and height), but you cannot use a square everywhere you can use a rectangle (for example, you can’t give it a different width and height).

Notice that this is incompatibility between the inheritance directions of the geometric properties and the abstract data type properties of squares and rectangles; two dimensions which are completely unrelated to each other and indeed to any form of software implementation. We have so far said nothing about implementation inheritance, so haven’t even considered writing software.

Smalltalk and many later languages use single inheritance for implementation inheritance, because multiple inheritance is incompatible with the goal of implementation inheritance due to the diamond problem (traits provide a reliable way for the incompatibility to manifest, and leave resolution as an exercise to the reader). On the other hand, single inheritance is incompatible with ontological inheritance, as a square is both a rectangle and an equilateral polygon. " -- [11]

---

http://hopl.info/

---

Humourous looks at programming languages

---

https://people.inf.ethz.ch/wirth/Miscellaneous/Styles.pdf

(on procedural and functional families)

---

" Functional Programming is simply the same thing as imperative programming only without the power of mutability. No variable reassignment. No looping. " [12]

---

if...then...else had to be invented

---

https://cs.lmu.edu/~ray/notes/pltypes/

---

Great summary of some historically important languages:

https://hillelwayne.com/post/influential-dead-languages/

discussion: https://news.ycombinator.com/item?id=22690229 (some top comments there mention Self, Perl, Forth)

---

The next 700 programming languages (1966)

" This is my favourite PLT paper.

Although the term “functional programming” was already in use (to describe Lisp), this paper is actually the original functional programming manifesto that led to modern functional programming languages like SML, OCaml, F#, Haskell and Idris. The language it describes (ISWIM) is the ancestor of all the languages I listed, via ML. Note that Landin had previously invented lexically scoped closures (in 1964), a key functional programming feature that Lisp didn’t adopt until much later (some time in the 1970s). " -- doug-moen

---

https://people.kernel.org/linusw/rust-in-perspective

(note: some people found errors in this though: https://www.reddit.com/r/rust/comments/wpbe80/comment/ikg3r2p/?context=3 https://lobste.rs/s/kghptx/rust_perspective )

---

A Brief History of Software Engineering Niklaus Wirth (25.2.2008)

-- https://docslib.org/doc/842499/a-brief-history-of-software-engineering-niklaus-wirth-wirth-inf-ethz-ch-25-2-2008

--- https://thenewstack.io/rust-creator-graydon-hoare-recounts-the-history-of-compilers/

---

"...declarative languages are characterized as having no implicit state, and thus the emphasis is placed entirely on programming with expressions (or terms). In particular, functional languages are dec- larative languages whose underlying model of computation is the function (in contrast to, for example, the relation that forms the basis for logic programming languages). " -- P. Hudak. Conception, evolution, and application of functional programming languages. ACM Computing Surveys, 21(3), 1989

---

" 2. Java Java was released in 1996, and its immediate success had a critical impact in popularising garbage collection. Before Java, using languages with garbage collection for application programming was an extremely controversial idea among working programmers. Guy Steele (who co-authored the Java spec) remarked about this on the ll1-discuss mailing list:

    We were not out to win over the Lisp programmers; we were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp.

Java also ended up dragging programmers halfway to ML! In 1998, Gilad Bracha, Martin Odersky, David Stoutamire and Phil Wadler proposed adding parametric polymorphism to Java, and in 2004 after 5 years of arguments it was added in Java 5. This also made a big impact in popularising advanced type systems features.

3. Dynamic Languages Languages like Perl, Python, Ruby, and Javascript (in roughly that order) began achieving serious popularity in the late 90s/early 2000s, primarily as tools for building websites. Like Java, all of these languages had garbage collection (even if only reference counting), which helped normalise it.

But in addition to automatic memory management, these languages all supported first-class functions of one variety or another, and so gave literally millions of programmers their first introduction to functional programming.

Younger programmers will be surprised by just how controversial an addition first-class functions were! Java omitted it initially, despite its designers being expert Lisp and Smalltalk programmers. Even C#, which was released 6 years after Java, did not support them at first. IMO it was only because of the experience people had with dynamic languages that created the pressure to add them.

Nowadays, thanks to the influence of dynamic languages on the design zeitgeist, even languges whose design aesthetic is deliberately conservative, such as Go, include first-class functions without a second thought. " -- https://semantic-domain.blogspot.com/2022/09/the-golden-age-of-pl-research.html?m=1

Classification of the principal programming paradigms by Peter Van Roy

---

https://madhadron-a-programmers-miscellany.simplecast.com/episodes/the-trail-of-algol/transcript

---

proj-plbook-plChHistory

Chapter : History and families of programming languages

ctm (close-to-the-metal) (low-level vs high-level)

continuing on..

??? size types/design goals ???

typical practical languages

core/kernel/intermediate/VM

embeddable and scripting languages

calculus

Turing-level DSL

sub-Turing DSL

data language

Paradigms

Structured/procedural programming

Object-oriented

Imperative programming

Functional programming

Logic programming

Dataflow programming

Active data

Declarative

Concatenative

Combinatory

Nock

Links about paradigms

systems programming

types of type systems

todo

Sergi Valverde and Ricard Solé, [http://www.santafe.edu/media/workingpapers/14-09-030.pdf Punctuated equilibrium in the large scale evolution of programming languages], SFI working paper 2014-09-030

todo

dynamism/late-binding