proj-plbook-plChHaskell

Difference between revision 58 and current revision

No diff available.

Table of Contents for Programming Languages: a survey

Haskell

Because it is so well-liked, Haskell gets its own chapter.

"In Haskell’s case, the Big Idea was purely functional lazy evaluation (or, if you want to be pedantic, call it “non-strict” instead of lazy)." [1]

Attributes:

Pros:

Syntax:

Library highlights:

Extension recommendations:

The following extensions are within the top 18 items and are >= net 33% in the 2020 Haskell survey poll, and are in the top 22 items and are >= 15% in the 2019 Haskell survey poll, and are rated as Basic in https://limperg.de/ghc-extensions/, and are rated as Benign or General in http://dev.stephendiehl.com/hask/#language-extensions, and are rated as 'Use at will' on https://gist.github.com/mightybyte/6c469c125eb50e0c2ebf4ae26b5adfff , and are on the list of default extensions in https://lexi-lambda.github.io/blog/2018/02/10/an-opinionated-guide-to-haskell-in-2018/ , and are in at least one of https://www.schoolofhaskell.com/user/PthariensFlame/guide-to-ghc-extensions, https://ocharles.org.uk/pages/2014-12-01-24-days-of-ghc-extensions.html:

The following extensions are within the top 18 items and are >= net 33% in the 2020 Haskell survey poll, and are in the top 22 items and are >= 15% in the 2019 Haskell survey poll, and are rated as Basic or Advanced in in https://limperg.de/ghc-extensions/, and are rated as Benign or General or Advanced or Lowlevel in http://dev.stephendiehl.com/hask/#language-extensions, and are rated as 'Use at will' or "More complex, but well understood" on https://gist.github.com/mightybyte/6c469c125eb50e0c2ebf4ae26b5adfff , and are on the list of default extensions in https://lexi-lambda.github.io/blog/2018/02/10/an-opinionated-guide-to-haskell-in-2018/ , and are in at least one of https://www.schoolofhaskell.com/user/PthariensFlame/guide-to-ghc-extensions, https://ocharles.org.uk/pages/2014-12-01-24-days-of-ghc-extensions.html except as noted:

The union of the previous two lists is 17 items. The only extension in the top 18 items or with >= net 33% of the 2020 Haskell survey poll but not on either of these two lists is DerivingVia?.

Here's another site that makes recommendations:

https://github.com/commercialhaskell/rio#language-extensions

This list contains all of the first 12 extensions noted above, and all of the second 5 extensions noted above, except for DerivingStrategies? and TypeApplications?; and this list also has many others not on either of those lists.

Another list is GHC's GHC2021;

This list contains all of the first 12 extensions noted above except for LambdaCase? and OverloadedStrings?, and all of the second 5 extensions noted above, except for DerivingStrategies? and GADTs (but it has GADTSyntax, what is the relation between these?), and this list also has many others not on either of those lists.

GHC has an older list, glasgow-exts. glasgow-exts contains all of the first 12 extensions noted above except for LambdaCase? and OverloadedStrings? and TupleSections?, and out of the second 5 extensions noted above it only contains RankNTypes?.

According to [8], Standard Chartered's strict Haskell variant Mu has MultiParamTypeClasses?, FunctionalDependencies?, PolyKinds?, DataKinds?, TypeFamilies? (only closed!), among others, and does not have ExistentialQuantification?, Rank2Types (and higher ranks), GADTs, CPP, TemplateHaskell?.

More data on Haskell language extensions at Standard Chartered from those presentation slides:

" In our 1,794 Haskell modules, we use 435 : DeriveDataTypeable? 410 : RecordWildCards? 284 : CPP 275 : OverloadedStrings? 187 : ScopedTypeVariables? 115 : MultiParamTypeClasses? 110 : FlexibleInstances? 89 : GeneralizedNewtypeDeriving? 77 : PatternGuards? 66 : FlexibleContexts? 59 : ViewPatterns? 59 : TypeSynonymInstances? 58 : TypeOperators? 51 : UndecidableInstances? 46 : StandaloneDeriving? 44 : DataKinds? 35 : TemplateHaskell? 32 : NamedFieldPuns? 31 : ExistentialQuantification? 30 : TypeFamilies? 30 : DeriveGeneric? 29 : ForeignFunctionInterface? 21 : TupleSections? 21 : FunctionalDependencies?

In our ~26,000 Mu modules, we use 7975 : RecordWildCards? 2342 : MultiParamTypeClasses? 1812 : DataKinds? 1527 : ViewPatterns? 975 : Recursion 787 : LambdaCase? 651 : GeneralizedNewtypeDeriving? 552 : NamedFieldPuns? 359 : KindSignatures? 255 : TypeFamilies? 252 : OverloadedLabels? 195 : PartialTypeSignatures? 188 : DeriveDataTypeable? 178 : FunctionalDependencies? 72 : PolyKinds? 66 : DeriveDefault? 60 : SignatureSections? 52 : TypeOperators? 44 : OverloadedStrings? 18 : FlexibleContexts? 17 : ScopedTypeVariables? 17 : DuplicateRecordFields? 13 : OverloadedRecordFields? 13 : ForeignFunctionInterface? "

Retrospectives:

Best practices, libraries, tooling:

Respected exemplar code:

Popularity:

Comparisons:

    Typeclasses (in Rust they are Traits)
    Sum types (you may take this for granted, but a lot of languages don’t have them….)
    Pervasive pattern matching
    Hindley-Mindler type inference (automatic type inference for variables)
    Pervasiveness of things being expressions rather than statements
    Parametric Polymorphism
    The feeling that once your program compiles, it will run" -- [21] 

Opinions

Gotchas

Misc

Features

Composition (.) and dollar sign ($) in std library

http://stackoverflow.com/questions/940382/haskell-difference-between-dot-and-dollar-sign

Tutorials

Haskell tutorials: foldr, foldl, foldl'

Books

expressive typechecking

People in Haskell often claim that the expressiveness of Haskell's type system allows them to use it to represent safety constraints that other languages cannot represent. (ppl using eg ML or F# often make similar claims).

I know a bit of Haskell, but not enough to fully explain or assess these claims.

Some claims i've seen along these lines:

Typechecking impurity

One thing that Haskell does is put monads into the type signature. This allows you to typecheck impurity (side-effects and non-determinism), where by 'impurity' i mean a situation where a function either reads state other than through its input parameters (for example, reading a line of input from the user, or getting a random number from a random number generator, or using a global variable), or alters state other than through its return value (for example, printing out a line of output, writing to a logfile, getting a random number from a random number generator, or changing a global variable). In essence, you can look at impurity as out-of-band reading-from or writing-to global state. Haskell allows you to divide this 'global state' into named pieces, and then to characterize in the type signature exactly which pieces of state each function is allowed to access (and prohibiting any impurity which is not explicitly called out in the type signature). For example:

http://learnyouahaskell.com/for-a-few-monads-more#state

This is said to make programs less error-prone by making it absolutely clear to someone reading (or debugging) a piece of code all the ways in which one function could possibly affect the rest of the program.

Tooling

Internals and implementations

Core data structures: todo

Number representations

Integers

Floating points

note: "what looks like a floating point literal is actually a rational number" -- http://augustss.blogspot.com/2007/04/overloading-haskell-numbers-part-3.html

todo

array representation

variable-length lists: todo

multidimensional arrays: todo

limits on sizes of the above

string representation

Representation of structures with fields


todos

here's a long excerpt from an email i wrote to a friend:

I'd say people mean at least three different things when they talk about functional languages. Sorry if the following is stuff you already know.

First, functional languages are '(pure function)-centric'. that is, one oriented around the mathematical concept of pure functions, which is a fancy way of saying that functions in this language just behave like math functions; they are given an input and they deterministically calculate an output, and they don't otherwise rely upon or affect the state of the caller, or interact with the user (e.g. they have no side effects). Another word for this is 'referential transparency', which is when you can substitute any function call with the value returned from the function without changing semantics. Another term used for this is 'purely functional code'. Purely functional code is 'stateless' in the sense that no external state is being read or written to; all you need to know to determine what a pure function will return is the values you pass in to it as function parameters; and all that a pure function does is give you a return value. Languages that encourage pure functions/referential transparency also tend to encourage immutable data, because mutating a variable is a side-effect.

Second, 'functional programming language' is also sometimes used to mean a '(first class function)-centric programming language', that is, a language in which functions are first class objects. Functions which operate on functions are called higher-order functions, and such a language is often oriented around expressive higher-order functions, often including partial functions. Some languages with higher-order functions have currying, which may be considered 'more functional' because it eliminates the distinction between higher-order functions and multiargument fns.

F# seems to meet all parts of both definitions above. The different uses are probably because functional languages are historically based on lambda calculus, as opposed to Turing machines, and lambda calculus satisfies both of those.

The third definition is archaic and is no longer used (but if you're curious, John Backus created some languages (FP, FL) that he called 'function-level languages', which appears to mean restricting oneself to programs generated by combining a set of primitive functions in a point-free style (I think this is slightly more restrictive than modern functional languages but i don't really understand so i'm not sure); i think those languages are dead but the later Iverson array languages, such as K, are successors).

The main advantage of functional languages comes from the emphasis on pure functions/referential transparency. It is claimed that many bugs come from overuse of side-effects because the programmer has to think about how state evolves as the program executes. It is claimed that because side-effects cause 'hidden state' (eg you can't tell just from looking at a function's signature which external state it reads from or writes to), there is easily-forgotten coupling between different functions.

For example, if function f(x) has the side effect of mutating a global variable, and if the output of function g() depends upon that variables, then in the code "a = f(x); b = g(y)", the value assigned to 'b' depends on what f(x) did, even though you can't see this dependency by looking at the function calls to f() and g() (just looking at the calls, you might assume that the value of 'b' will have nothing to do with f()). In this case it's easy to remember that f() might influence g() because they are right next to each other, but you can imagine that in a large program written by a large team, f() could have been called from a very distant part of the code, making it easier to forget.

You might get the idea that functional programming languages are completely purely functional, but this is not correct; taking input from a user, or printing output to the screen, are 'impure' operations (the first reads from state, and the second writes to state), yet all serious functional programming languages support these operations. The difference is just that functional programming languages encourage purely functional code where possible. As a guy once said, "Purely Functional is the right default" ( https://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/sweeny.pdf ).

Another way to put it is that functional languages attempt to make state explicit rather than hidden, and to provide various mechanisms for the programmer to 'control' (as in, to limit) state.

This is one of the practical uses of Haskell's infamous monads. In Haskell, a function that reads from or writes to state must say as much in its type signature. State can be divided up into different regions (or 'monadic values'), and function type signatures can indicate which region of state they access. The advantage is not only making state explicit, but also enabling the static type checker to verify that there isn't any undeclared access to state.

It is claimed that this is good for concurrent programming, because it is claimed that a lot of concurrency bugs are from the programmer forgetting or getting confused about state. In fact, if there is no state at all, that is, if everything is purely functional, then i believe the program can be parallelized while guaranteeing correctness. See https://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/sweeny.pdf for a little about this (and a lot about many other things).

Some people also say that maybe the compiler can auto-parallelize purely functional code, because of the above guarantee. However, because forking and merging and communicating between threads is expensive, if you parallelize short or densely communicating operations it can actually be slower than serial. i think the reality is that, the today's compilers can't guess which operations will experience a net gain in speed by parallelizing and which won't. So the programmer still has to get involved.

I hear that F# allows undeclared side effects. Which means that you'd lose the benefits of this static typing discipline.

Haskell actually goes a little bit further than this. One way of thinking of it is that Haskell programs are written as if they are actually, completely, purely functional; stateful interactions are accomplished by having functions return 'actions' (my term, they would say 'values in the IO monad') which are data values that represent a stateful interaction. These 'actions' are in essence impure computer programs. The main() function of Haskell programs builds up, in a completely pure manner, a very complicated 'action', and returns it. This action represents the actual program which is executed. (however, due to lazy execution, this action is actually not pre-built, but rather built 'on-demand'). I found that to be a bit backwards and confusing, but good for learning about purely functional programming.

Another thing Haskell offers that F# may or may not is (pervasive) laziness. Lazy evaluation means that the computer doesn't evaluate a function when it is called, but rather, only when it's return value is needed. Sections 4 and 5 of this paper give detailed examples where laziness is useful: https://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pdf . To summarize section 5, imagine you are writing a chess-playing program. You're going to make the program lookahead and consider some possible moves in the tree of possibilities. Mathematically, the entire tree of possibilities is a well-defined object, but but you aren't going to enumerate the whole tree because that would take too much time and memory. So you need to have a 'search strategy' to decide which possibilities to consider. The way you would ordinarily write this program (without laziness), the search strategy code would be intertwined with the code defining the tree in a mathematical sense. With laziness, you can define the whole tree upfront the same way you do in math, but the tree will not be materialized at the time of its definition. Then you can keep your search strategy code separate, just passing your search strategy as in input the (unmaterialized) return value of the tree-construction code.

Laziness also lets you simply deal with infinite structures by defining them the way you would in math, secure in the knowledge that the runtime won't try to actually compute all of them. Eg the fibonacchi sequence can be defined:

fib 0 = 0 fib 1 = 1 fib n = fib (n-1) + fib (n-2)

but it could also be defined:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

without laziness, 'tail fibs' would cause a problem since fibs is an infinite sequence.

There are various languages which offer the ability to create lazy lists, but without laziness being pervasive, you are afraid to pass your large lazy list to arbitrary library functions from libraries whose source code you haven't read, because that library might do something which would materialize your big (or infinite) lazy list. In Haskell that would be considered a bug, so it's less likely that a library will do that to you.

Laziness, at least the way Haskell does it, has some big disadvantages, too, though. It becomes harder to reason about memory performance. And, at least initially, you have to get into the habit of not unintentionally materializing any lazy lists you are passed, which means learning a lot of little patterns.

Haskell also offers some other things that i dunno if F# does or doesn't have:

Here's what some other ppl think. I havent read these:

People claim that the expressive type system in Haskell lets many bugs be avoided statically by "making illegal states unrepresentable":

Variants and variant implementations

GHC

The most popular Haskell implementation.

Eta

http://eta-lang.org/

"Eta is a dialect of Haskell that aims to bring the benefits of Haskell to the JVM, while supporting the vast majority of GHC extensions so that packages on Hackage can be used with little modification."

Gofer

http://web.cecs.pdx.edu/~mpj/goferarc/index.html

"Gofer (GOod For Equational Reasoning) is a subset of Haskell 1.2. Very small, very fast, but no GC." [53]

Mu

A strict Haskell variant at Standard Chartered

https://icfp21.sigplan.org/details/hiw-2021-papers/14/Haskell-reinterpreted-large-scale-real-world-experience-with-the-Mu-compiler-in-Fin


Misc Links