tutorials-haskell

Haskell tutorial

by Bayle Shanks

some code samples are from __Yet Another Haskell Tutorial__ which is by Hal Daume III

I'm writing this after I read some tutorials myself but before actually programming anything in Haskell -- it's a "blind leading the blind" situation. So this tutorial is probably filled with errors and misunderstandings. After I get some experience with Haskell I might come back and check it over and then remove this warning, if I ever get around to it

Preface

This (currently unfinished) tutorial aims to be short yet to cover topics through basic monads.

Much of the bulk of the text consists of short code examples and interpreter traces.

The intended reader has programming experience, but only in imperative languages, and wants a quick introductory tour of Haskell. The tutorial is not intended to be difficult, however in the interest of brevity many things are demonstrated without much explanation.

It does not aim to be comprehensive, so after reading it, interested parties may want to read a longer tutorial to get the details that are skipped here. Hopefully, after reading this tutorial, you will be able to breeze through a longer one.

Copyright

Copyright 2008 Bayle Shanks. You may copy this document under the terms of either:

whichever you prefer (but not in any other circumstances without prior permission).

Note to people reading this on the wiki

(ignore this if you are reading the PDF). I'm using EasyLatex? to compile the wiki source of this page into a PDF. So don't mind the occasional LaTeX? commands. For example:

\usepackage{times} \usepackage[margin=3.7cm]{geometry}

\newpage


Hello World


As of this writing, the most popular implementation of Haskell seems to be ghc, the Glasgow Haskell Compiler. The GHC project also produces ghci, an interpreter. In this tutorial, we'll use ghci (although of course most of the things that we'll talk about are general features of the language that will be in any implementation). So go install ghci now.

When you start ghci, you get an intro banner and then a prompt:

Prelude> 

To quit, type Cntl-D.

Start ghci again. You can enter Haskell directly in the interpreter, for example:

Prelude> print "Hello World"
"Hello World"
Prelude> 

If you want to put this in a file, here's how. Each file is a module. The module name should be the same as the file name, except that the file name should have an .hs at the end and the module name should not.

So, create a file called Test.hs and put this into it:

module Test where
helloWorldFunction = print "Hello world"

To load a module in ghci, you say ":l modulename". So, to load module Test and then evaluate helloWorldFunction, you do:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Test.
*Test> helloWorldFunction
"Hello world"
*Test> 

Note that prompt changes from "Prelude" to "*Test". If you later change the sourcecode file and want to reload it, you can use :r, which reloads the current module.

In the previous example, the reason that there are quotes around "Hello world" is that the "print" command prints a representation of a value that is useful for debugging. If you just have a string that you want to send to the console directly, you use putStrLn. For example:

Prelude> print "Hello world"
"Hello world"
Prelude> putStrLn "Hello world"
Hello world
Prelude> 

If you want to compile a standalone executable, you need to make a Main module. Put the following into Main.hs:

module Main where
main = putStrLn "Hello world"

Now, at your operating system's command line, use ghc to compile:

$ ghc --make Main.hs -o main
$ ./main
Hello world

\newpage


Basic syntax


Haskell is **case-sensitive**

In addition,

  1. The names of types, typeclasses, and modules must begin with an upper-case letter.
  2. The names of infix functions must be composed of punctuation symbols.
  3. The names of almost1 everything else must begin with a lower-case letter.

Types

In Haskell, every expression has a type2. Various functions and operators have various restrictions on what sort of types they can operate on. If you try to use incompatible types, you'll get a compile time error.

If you are curious about the type of some expression in Haskell, you can use ghci's :t command to find out. For example:

Prelude> :t 'c'
'c' :: Char
Prelude> :t "a string"
"a string" :: [Char]

Later on we'll talk about how to read the notation for expressing the types of things in Haskell. Until then, don't worry about it.

At this point you might be worried that you'll spend a lot of time declaring the types of things, so I should mention that Haskell doesn't usually require you to declare the types of your variables and functions. Haskell has a powerful system of type inference that guesses the types of almost everything automatically.

A disadvantage of using a powerful type inference system is that it makes type error messages harder to interpret. For example, let's say that you have three expressions, call them A,B,C. Let's say that you give expression A the wrong type. Let's say that you construct expression B out of expression A, and then in a very different part of the program, you refer to expression B in expression C. Because you made a mistake with expression A, Haskell might infer the wrong type for expressions B and C. Perhaps the error will only surface when it gets to expression C. In this case, the error message will refer only to expression C, and you'll have to figure out that the root of the problem was really with expression A.

Defining functions

To define a function in a source code file, write something like:

functionName argument1 argument2 = expression

For example:

plusFive a = a+5

Inside ghci, you have to put the word "let" at the beginning of the function definition:

Prelude> let plusFive a = a+5
Prelude> plusFive(10)
15

Functions you define are prefix by default. We'll talk about how to define infix functions later.

Calling functions

To call a prefix function, just write the function, and then a space, and then the argument. If there are multiple arguments, just separate them by spaces. For example:

Prelude> let addition a b = a+b
Prelude> addition 2 3
5

__do__ blocks

In Haskell, if you want to execute a sequence of instructions, you can't just put them one after another, unless they are within a do. For example, try putting this incorrect code into Test.hs:

module Test where

test = print "First line."
       print "Second line."

It won't compile in ghci (don't worry about trying to understand this error message3):

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )

Test.hs:2:7:
    Couldn't match expected type `(a -> IO ()) -> [Char] -> t'
	   against inferred type `IO ()'
    In the expression: print "First line." print "Second line."
    In the definition of `test':
	test = print "First line." print "Second line."
Failed, modules loaded: none.

The problem is that in Haskell, a function is just __a single expression__. So in a sense, a whole Haskell function is analogous to just a single line of code in other languages.

To execute a sequence of instructions, you have to wrap them in a do block. Example in ghci:

Prelude> do {putStrLn "First line."; putStrLn "Second line."}
First line.
Second line.

Example as a source code file (to be placed in Test.hs):

module Test where
test = do {putStrLn "First line."; putStrLn "Second line."}

A complete do block is itself a single expression that returns a single value (which is why you can have a whole do block inside a function, even though functions must be single expressions).

Under the hood, do blocks are actually syntactic sugar for something pretty complicated involving monads. We'll talk more about the deeper meaning of do blocks later. For now, one more thing to note is that a do block that contains I/O has a return value of type 'IO something' (where the 'something' varies).

Lack of destructive variable updates

You don't have mutable variables in Haskell.

Put the following incorrect code into Test.hs:

module Test where
x = 1
x = x+1

Now try loading it in ghci:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )

Test.hs:3:0:
    Multiple declarations of `Test.x'
    Declared at: Test.hs:2:0
		 Test.hs:3:0
Failed, modules loaded: none.

To accomplish the same effect, you have to have a different name for the so-called variable at each stage. Put this into Test.hs:

module Test where
x = 1
x2 = x+1

Now it works:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Test.
*Test> x2
2

You can see that what you are really doing here is not declaring "variables", but defining functions, just like the "addition" function in the example above. There are still variables in Haskell, in the form of function arguments; but these are just like the variables found in mathematics -- their value doesn't change over the course of the computation.

However, for me, the word "variable" in the context of a programming language is so bound up the idea of mutable variables that I like to tell myself this: in Haskell, there are no "variables", because nothing "varies"; there are just functions, each with a fixed4 definition.

I'll say that again. In the above code, x and x2 do not correspond to memory locations which you can read and write. x and x2 are names of functions that you have defined.

How can this be?

Now, you're thinking, how are we going to program without mutable variables? Well, as far as I know (todo), there are 3 ways in which variables are used in typical imperative programming languages:

  1. Cases in which the same variable is assigned different values on different lines of code.
  2. Function arguments.
  3. Things that change as you progress through loop iterations.

I'll treat each case in turn.

In the first case, these variables don't really have to be mutable -- you could always5 give the variable a new name on each line of code that assigns to it6.

An example of the first case, in Python:

import time

toOutput = "The time is "
toOutput = toOutput + time.strftime('%H:%M')
toOutput = toOutput + " right now"
print toOutput

In this situation, toOutput can be replaced by three different variables:

import time

toOutput1 = "The time is "
toOutput2 = toOutput + time.strftime('%H:%M')
toOutput3 = toOutput + " right now"
print toOutput3

The second case is function arguments. Function arguments aren't usually considered "mutable variables", but they do take on different values each time the function is called. Well, we do have function arguments in Haskell, so there's no problem here.

The third case are things that change as you progress through loop iterations. Well, that's easily taken care of. In Haskell, you're not allowed to loop. Feel better?

That's a half-truth. In Haskell, you replace loop constructs with recursion. Each iteration is replaced with a new call to the function. So, you can still go through the same section of code over and over again while incrementing the equivalent of a loop counter -- it's just that the loop counter is a function argument, rather than a local variable, and that each iteration is a separate function call. I'll give an example later.

By the way, you can put some nonalphanumeric characters into your function names. Some people like to use the apostrophe (for example, "x'" -- read "x prime") to indicate to the reader that a bunch of functions should be considered to form sequence. Example:

module Test where
x = 1
x' = x+1

You can redefine functions within __do__ blocks

To reassign a name within a do block, use something like "let {x=1}". Example:

Prelude> do {let {s="Line 1"}; putStrLn s; let {s="Line 2"}; putStrLn s;}
Line 1
Line 2

However, this doesn't mean that we have mutable variables. The following code enters an infinite loop when Haskell tries to evaluate "x=x+1":

Prelude> do {let {x=1}; print x; let {x=x+1}; print x;}

You can redefine functions within ghci

You can also reassign things within the ghci interpreter using let. You might think of the things that you enter in the interpreter as being within an implicit do block7. Example:

Prelude> let x = 3
Prelude> x
3
Prelude> let x = 4
Prelude> x
4
Prelude> let f a b = a+b
Prelude> f 2 3
5
Prelude> let f a b = a*b
Prelude> f 2 3
6

Using whitespace as shorthand for \{\} and ;

You can either enclose blocks in curly brackets and delimit lines with semicolons, or you can use whitespace. If you use whitespace, just indent things that you would have put into curly brackets.

In the ghci interpreter, you cannot use whitespace, you must use only "{}" and ";". You can only use whitespace in source code files.

Here is an example using "{}" and ";":

module Test where
test = do {putStrLn "First line."; putStrLn "Second line."}

And here is the equivalent using whitespace:

module Test where
test = do 
  putStrLn "First line."
  putStrLn "Second line."

The rules for interpretation of whitespace in Haskell are sometimes called the rules of "layout".

Generally for the rest of this tutorial when I present code samples, I'll use the whitespace form, as if it was sitting inside a source code file. Sometimes, however, when I want to show you what happens when you evaluate something, I'll show you a ghci trace and put in some commands in the form that uses {} and ;.

Comments

Examples:

x = 3    -- this is a single line comment

    {- this is a 
          multiline
  comment -}

Module import

To import a module in a source code file, use the import statement. For example, to import module Random:

import Random

To import a module within the ghci interpreter, use the :m command. For example, to import module Random:

Prelude> :m Random
Prelude Random> 

Debugging within ghci

You can set breakpoints within do blocks by using the ghci along with the breakpoint command, which is in the GHC.Base module:

module Test where
import GHC.Base

main = do let a = 3
          let b = a*2
          breakpoint $ do
          print a+b

If you are using ghci 6.8.1 or later, you can use the more powerful debugging facilities included in ghci. See http://haskell.org/ghc/docs/6.8.1/html/users_guide/ghci-debugger.html for details.

Some special values: True, False, ()

True and False are the result of boolean expressions.

Another special value in Haskell is the "unit" value, written:

()

This is sometimes used to mean something like what is called "null" in other languages (in Haskell, null is used for something else; it's a boolean function that tells you if a list is empty).

Quick note on numerical negation and order of operations

By the way, sometimes you run into trouble because you expect the negation function ('-') to bind very tightly to numbers, but it doesn't. Just use parenthesis. Example:

Prelude> abs 3
3
Prelude> abs -3

<interactive>:1:5:
    No instance for (Num (a -> a))
      arising from the literal `3' at <interactive>:1:5
    Possible fix: add an instance declaration for (Num (a -> a))
    In the second argument of `(-)', namely `3'
    In the expression: abs - 3
    In the definition of `it': it = abs - 3
Prelude> abs (-3)
3

When we tried to do abs -3, what happened was that it was parsed as (abs -) 3. Oh well.

if/then/else

Example8:

sgn x =
    if x < 0
      then -1
      else if x > 0
        then 1
        else 0

An else is always required in Haskell.

Both the then expression and the else expression have to return compatibly-typed values (so that the overall expression has a sensible return type).

For example, if you put this incorrect code into Test.hs, then it won't compile:

badIf x = if x < 0
          then 1
          else "this is a string"

Here's the error you get:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )

Test.hs:10:15:
    No instance for (Num [Char])
      arising from the literal `1' at Test.hs:10:15
    Possible fix: add an instance declaration for (Num [Char])
    In the expression: if x < 0 then 1 else "this is a string"
    In the definition of `badIf':
	badIf x = if x < 0 then 1 else "this is a string"
Failed, modules loaded: none.

What the compiler is basically saying in this error is:

"I noticed that the type of the then clause was a number ('Num'), so I assumed that the type of the whole if/then/else clause was a number. Then I noticed that the type of the else clause was a string ('[Char]'). But a string is not a type of number ('No instance for (Num [Char])'). So the type of the else clause is not compatible with the inferred type of the whole if/then/else clause."

if/then/elses and dos

If you do an if/then/else inside a do block, and you want to do a sequence of actions inside the if/then/else, then you have to open a new do block.

For example, the following code is incorrect:

main = do
         let a = 3
         putStrLn "Testing if a == 3..."
         if a == 3
           then 
             putStrLn "Yep, it is 3."
             putStrLn "Good."
           else 
             putStrLn "Nope, a is not 3."
             putStrLn "That's impossible."

This is what you have to do:

main = do
         let a = 3
         putStrLn "Testing if a == 3..."
         if a == 3
           then do
             putStrLn "Yep, it is 3."
             putStrLn "Good."
           else do
             putStrLn "Nope, a is not 3."
             putStrLn "That's impossible."

\newpage


User input


To get a line of input from the console and temporarily assign it to the symbol name "line", do this within a do block:

line <- getLine

For example:

Prelude> do {putStrLn "Enter a name"; name <- getLine; putStrLn ("Hi " ++ name)}
Enter a name
Bayle
Hi Bayle

(note: ++ is how you do string concatenation)

Introduction to the arrow operator inside a __do__

At this point you are probably wondering, "what's that <- operator for?"

The full answer involves a theoretical issue with I/O, the deeper meaning of do blocks, and monads. So I'll put off explaining it until later. For now, just pretend that name is a mutable variable, and that within the context of a do block, the '<-' operator means "assign the result of this function to that variable." Note, however, that the '<-' operator can only be used with certain functions. Functions which receive transient information from 'the outside world' are some of these functions.

The __read__ function

When you use getLine, you get back strings. If the user is actually entering something else type of thing, like a number, you'll have to explicitly convert the string to that other type. So, the following is incorrect code:

isUserInputGreaterThan3_bad = 
    do
      input <- getLine;
      if input > 3
        then putStrLn (input ++ " is greater than 3")
        else putStrLn (input ++ " is NOT greater than 3")

The error you get is:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )

Test.hs:6:17:
    No instance for (Num String)
      arising from the literal `3' at Test.hs:6:17
    Possible fix: add an instance declaration for (Num String)
    In the second argument of `(>)', namely `3'
    In the predicate expression: input > 3
    In the expression:
	if input > 3 then
	    putStrLn (input ++ " is greater than 3")
	else
	    putStrLn (input ++ " is NOT greater than 3")
Failed, modules loaded: none.

A function that parses user input and turns strings into other types of values is (for example, numbers) is read. This is the thing you need to fix the above example. For example:

isUserInputGreaterThan3 = 
    do
      input <- getLine;
      let number = read(input)
      if number > 3.0
        then putStrLn (input ++ " is greater than 3")
        else putStrLn (input ++ " is NOT greater than 3")

If the user enters something that isn't a number, you'll get an exception that looks like:

*Test> isUserInputGreaterThan3
woah
*** Exception: Prelude.read: no parse

IMPORTANT NOTE: Note that we used let = to assign the result from read, not <-. Remember that <- can only be used with certain special functions. read is not one of them.

Another note. In the above code, it is important that we said number > 3.0 rather than number > 3. If we hadn't done that, then the function would work find when the user enters and integer, but it would crash when the user tries to enter a number with a decimal point. The reason is that doing this caused Haskell's automatic type inference system to infer that number supported 'Fractional' operations. And this in turn caused Haskell to choose a variant of the read function that returns something that supports 'Fractional' operations, which is necessary for read to be able to parse reals.

Here's the error you get if you put 3 instead of 3.0:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Test.
*Test> isUserInputGreaterThan3
4
4 is greater than 3
*Test> isUserInputGreaterThan3
3.5
*** Exception: Prelude.read: no parse

How are you supposed to figure out the root cause of the problem for something like this? We haven't yet talked enough about types to be able to explain how to troubleshoot this type of problem, but we'll come back to it later.

The __show__ function

The show function is sort of the opposite of the read function. The show converts various non-string data types to strings:

Prelude> 2
2
Prelude> show 2
"2"
Prelude> "string" ++ 2

<interactive>:1:12:
    No instance for (Num [Char])
      arising from the literal `2' at <interactive>:1:12
    Possible fix: add an instance declaration for (Num [Char])
    In the second argument of `(++)', namely `2'
    In the expression: "string" ++ 2
    In the definition of `it': it = "string" ++ 2
Prelude> "string" ++ (show 2)
"string2"

\newpage


Conditional I/O: a little trick


There's something tricky about doing conditional I/O in Haskell. We haven't run into this yet in the examples, but in case you try to do it yourself I wanted to mention it now. You can skip over this section and come back to it later if you want.

As previously noted, a do block that contains I/O has a return value of type 'IO something' (where the 'something' varies). Since the return type of both the then and the else must be compatible, this means that if you do I/O in one condition and not in the other, you'll get an error about incompatible types.

Symptoms of the problem

For instance, the following incorrect code causes an error:

conditionalPrint bool str =
    if bool
       then print str
       else ()

Here's the error that you get:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )

Test.hs:12:12:
    Couldn't match expected type `IO ()' against inferred type `()'
    In the expression: if bool then print str else ()
    In the definition of `conditionalPrint':
	conditionalPrint bool str = if bool then print str else ()
Failed, modules loaded: none.

What the compiler is saying is something like:

"I noticed that the type of the then clause was IO (), so I assumed that the type of the whole if/then/else clause was IO (). So I expected that the type of the else clause would be IO (), too. But when I calculated the type of the else clause, it turned out to be (). And these two types don't match."

Solution

To fix this (at least in the case of print, which returns type IO ()), you can use the following weird-looking expression, which generates a type of IO () without actually doing any I/O:

( (do {return ()})::IO () )

We'll explain this... later. Here's an example of it in use:

conditionalPrint bool str =
    if bool
       then print str
       else ( (do {return ()})::IO () )

And here's how it looks when you run it (assuming you put it into Test.hs below an appropriate module header):

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Test.
*Test> conditionalPrint True "hi"
"hi"
*Test> conditionalPrint False "hi"
*Test> 

\newpage


How to loop using recursion


Here's the analog of a for loop. The following function prints out a count from a to b:

countFromAtoB a b = 
		if a<b
		   then do
		   	print a
		   	countFromAtoB (a+1) b
		   else 
		   	print a

Here's how it looks (assuming that you put the above code into Test.hs, following an appropriate module declaration):

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Test.
*Test> countFromAtoB 3 5
3
4
5

You might be worried that if you did a loop with many iterations this way, the stack would overflow, because as the number of iterations increased, the recursion depth would increase, and the call stack would have to hold more and more stack frames. However, compilers for functional languages notice when the very last operation that a function does is a recursive call to itself, and they convert these structures to iteration in the compiled code. This is called "tail call optimization"9. So don't worry.

Here's the analog of a while loop:

whileUserDoesntEnterQ = 
                do
                  putStrLn "Enter Q to quit"
                  input <- getLine;

                  if (input == "Q")
                    then print "Bye"
                    else whileUserDoesntEnterQ

\newpage


Example: What number am I thinking of?


Now that we know how to do if/then/else, basic console input and output, and loops using recursion, we can do this example10:

module Main where

import Random

main = do
  num <- randomRIO (1, 100)
  putStrLn "I'm thinking of a number between 1 and 100"
  doGuessing num


doGuessing num = do
  putStrLn "Enter your guess:"
  guess <- getLine
  let guessNum = read guess
  if guessNum < num
    then do 
          putStrLn "Too low!"
          doGuessing num
    else if read guess > num
            then do 
                  putStrLn "Too high!"
                  doGuessing num
            else do 
                  putStrLn "You Win!"

Note that randomRIO is one of those functions that uses the <- operator. Since the result of this function will be different each time you call it, you can think of it as a function which returns transient information about some 'outside world'.

\newpage


Basic data structures and functions


Tuples

Tuples are fixed-length, ordered collections of hetrogeneous elements.

To group two values together into an ordered pair, surround them with parentheses, for example11:

Prelude> (5,3)
(5,3)

The values in a pair don't have to be the same type:

Prelude> (5, "hello")
(5,"hello")

To extract the first and the second values from a pair, you could use the predefined functions fst and snd12:

Prelude> fst (5, "hello")
5
Prelude> snd (5, "hello")
"hello"

In addition to pairs, you can construct triples, quadruples, etc13:

Prelude> (1,2,3)
(1,2,3)
Prelude> (1,2,3,4)
(1,2,3,4)

In general, pairs, triples, quadruples, etc are called tuples.

Tuples are flexible because they can put together different data types. They are not equipped to (easily) change their length, however; if you want a variable-length collection of elements, you may be better off with a list.

Note: the functions fst and snd only work on pairs, not triples, etc. We'll see how to extract values from longer tuples later, when we talk about "pattern matching".

Lists

Lists are variable-length, ordered collections of homogenous elements.

To construct a list, surround the elements of the list with square brackets:

Prelude> [5,3,2]
[5,3,2]

To add an element to the front of a list, use a colon (called a "cons" operator, short for "construct" as in "we are constructing a list"):

Prelude> 7:[5,3,2]
[7,5,3,2]

You can build any list by starting with the empty list (written []) and then adding in the elements one by one:

Prelude> 7:5:3:2:[]
[7,5,3,2]

You can make lists of tuples, lists of lists, lists of functions, or whatever. However, the elements of a list must all be of the same type.

Strings are just lists of characters14:

Prelude> 'H':'e':'l':'l':'o':[]
"Hello"

Some basic list functions are:

take takes 2 arguments, an integer and a list. It returns a new list with the first n elements of the list. Example:

Prelude> take 3 [3,6,2,5]
[3,6,2]

filter takes two arguments, a boolean function and a list. It returns another list. It applies the boolean function to each element of the list and only those elements which pass the test (that is, those elements for which the function returns True) get added to the return list. For instance15:

Prelude> filter Char.isLower "Hello World"
"elloorld"

map takes two arguments, a function and a list, and returns another list. It applies the function to each element of the input list, and then adds the result of the function to the output list. For instance16:

Prelude> map Char.toUpper "Hello World"
"HELLO WORLD"

If the elements of a list are a type that is __enumerable__ (that is, if their type is a member of the Enum typeclass.. which will be introduced later), then you can use .. as a shorthand when constructing a list, as in the following example. Int and Char (integers and characters) are examples of enumerable types.

Prelude> [1..10]
[1,2,3,4,5,6,7,8,9,10]

To specify a step other than 1:

Prelude> [1,3..10]
[1,3,5,7,9]
Prelude> [1,4..10]
[1,4,7,10]

Example of zip:

Prelude> zip [1,2,3,4,5] ['a','b','c','d','e']
[(1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')]

There is some syntactic sugar for using map and filter on lists. This construct is called a list comprehension. It looks like this:

[expression containing x | x <- inputList, g x]

(except you can use any name in place of x) and it evaluates to:

map (a function representing 'expression containing x') (filter g inputList)

For example17:

Prelude> map toLower (filter isUpper "Hello World")
"hw"
Prelude> [toLower x | x <- "Hello World", isUpper x]
"hw"

You can put more than one filter condition in the list comprehension:

Prelude> let li = [(0,0), (0,1), (1,0), (1,1)]
Prelude> [(x, "hello") | x <- li, fst x == snd x, fst x < 1]
[((0,0),"hello")]

List comprehensions can also be used to combine multiple lists in complex ways. Suppose18 you want to create a list of pairs, one for each point where 0 < x < 2, 0 < y < 3, and x <= y.

Prelude> [(x,y) | y <- [0..5], x <- [0..5], x <= 2, y <= 3, x <= y]
[(0,0),(0,1),(1,1),(0,2),(1,2),(2,2),(0,3),(1,3),(2,3)]

Latter comma-separated clauses can use the names assigned in former clauses:

Prelude> [(x,y) | x <- [0..2], y <- [x..3]]
[(0,0),(0,1),(0,2),(0,3),(1,1),(1,2),(1,3),(2,2),(2,3)]

The order of the comma-separated clauses in the second part matters:

Prelude> [(x,y) | y <- [x..3], x <- [0..2]]

<interactive>:1:15: Not in scope: `x'
Prelude> [(x,y) | y <- [0..3], x <- [0..y], x <= 2]
[(0,0),(0,1),(1,1),(0,2),(1,2),(2,2),(0,3),(1,3),(2,3)]

Note that the order of the elements in the resulting list is different from the way it was when x <- came before y <-.

Exercises

Exercise 119 Use map to convert a string into a list of booleans, each element in the new list representing whether or not the original element was a lower-case character. That is, it should take the string "aBCde" and return [True,False,False,True,True].

Exercise 220 Use the functions mentioned in this section (you will need two of them, plus Char.isLower) to compute the number of lower-case letters in a string. For instance, on "aBCde" it should return 3.

Exercise 321 Write a function that takes a list of pairs of length at least 2 and returns the first component of the second element in the list. So, when provided with [(5,'b'),(1,'c'),(6,'a')], it will return 1.

Some simple functions

\newpage


More syntax


Temporary function definitions with __let__ and __where__

You can temporarily define a symbol using let ... in. Example22:

roots a b c =
    let det = sqrt (b*b - 4*a*c)
    in ((-b + det) / (2*a),
         (-b - det) / (2*a))

You can define multiple symbols at once like this23:

roots a b c =
    let det = sqrt (b*b - 4*a*c)
        twice_a = 2*a
    in ((-b + det) / twice_a,
         (-b - det) / twice_a)

Another construct which is like let except that it goes in different places is where. where goes after the block of code rather than before it24 25:

roots a b c =
    ((-b + det) / (2*a), (-b - det) / (2*a))
    where det = sqrt(b*b-4*a*c)

You can use let or where to temporarily redefine something which was already defined outside of the code block. Outside of the code block, the symbol goes back to its previous definition:

x = 1

y = let x = 2 in
    x*2

z = x*2

In that piece of code, you end up with x=1, y=4, z=2:

Prelude> let x=1
Prelude> let y=let {x = 2} in x*2
Prelude> let z=x*2
Prelude> x
1
Prelude> y
4
Prelude> z
2

You can have both a let and a where surrounding the same piece of code:

z = 
    let x = 2 in
    x*y
    where y = 3

In that example, z=6.

If you tried to define the same symbol in both the let and the where clause, the let clause takes precedence. But don't do that.

Another name for this sort of "temporary definition" is "local binding". Another name is "temporary declaration". When a local definition takes precedence over a definition in an outer scope, it is said that the local definition "shadows" the other definition.

Since (outside of a do block, Haskell doesn't have26 sequential statements, let and where often substitute for what in other languages would be a list of sequential variables assignments. Example:

vectorLength x y z =
  let
    xSq = x^2
    ySq = y^2
    zSq = z^2
    sumOfSquares = xSq+ySq+zSq
    ans = sqrt(sumOfSquares)
  in ans

\newpage


Example: Search for a path between two nodes in a graph


In this example, edgeList represents the list of edges in a directed graph, represented as a list of pairs, where the first element of each pair is the edge source, and the second element is the target.

For example, [(0,1), (0,2), (2,3), (3,4)] represents the graph with vertices 0,1,2,3,4 which looks like:

\begin{graph} rankdir = "LR"; size="2,2" 0 -> 1 0 -> 2 2 -> 3 3 -> 4 \end{graph}

Here's the source code:

search edgeList src dst =
  if src == dst
    then
       [src]
    else
       searchEdgesOriginatingAtSrc edgeList src dst

searchEdgesOriginatingAtSrc edgeList src dst =
  if (null edgeList)
    then []
    else let 
      currentEdge = head edgeList
      currentEdgeSrc = fst currentEdge
      currentEdgeDst = snd currentEdge
    in
     if src == currentEdgeSrc
        then let searchResult = search edgeList currentEdgeDst dst in
           if not (null searchResult)
              then src:searchResult
              else searchEdgesOriginatingAtSrc (tail edgeList) src dst
        else searchEdgesOriginatingAtSrc (tail edgeList) src dst

Here's how it looks:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Test.
*Test> let el = [(0,1), (0,2), (2,3), (3,4)]
*Test> search el 0 3
[0,2,3]
*Test> search el 0 4
[0,2,3,4]
*Test> search el 4 0
[]

We'll come back to this example27 a few times and rewrite it in various ways.

One alternative way that we can write this is to use a where clause to define searchEdgesOriginatingAtSrc within the scope of search, in which case we don't have to pass in the arguments src and dst:

search edgeList src dst =
  if src == dst
    then
       [src]
    else
       searchEdgesOriginatingAtSrc edgeList

  where 
    searchEdgesOriginatingAtSrc edgeList =
      if (null edgeList)
        then []
        else let 
          currentEdge = head edgeList
          currentEdgeSrc = fst currentEdge
          currentEdgeDst = snd currentEdge
         in
           if src == currentEdgeSrc
               then let searchResult = search edgeList currentEdgeDst dst in
                 if not (null searchResult)
                    then src:searchResult
                    else searchEdgesOriginatingAtSrc (tail edgeList)
               else searchEdgesOriginatingAtSrc (tail edgeList)

\newpage


Even more syntax


case expressions

Example:

dayStrExpand abbrev =
    case abbrev of
      "M" -> "Monday"
      "T" -> "Tuesday"
      "W" -> "Wednesday"
      "Th" -> "Thursday"
      "F" -> "Friday"
      "Sat" -> "Saturday"
      "Sun" -> "Sunday"
      _ -> "(INVALID DAY ABBREVIATION!)"

The underscore is the "default" rule that fires if none of the others do. In Haskell the underscore represents a wildcard that matches anything.

Just like with then and else, each expression of the case has to return compatibly-typed values (so that the overall case expression has a sensible return type).

"undefined"

There is a special value in Haskell called undefined. Example:

dayStrExpand abbrev =
    case abbrev of
      "M" -> "Monday"
      "T" -> "Tuesday"
      "W" -> "Wednesday"
      "Th" -> "Thursday"
      "F" -> "Friday"
      "Sat" -> "Saturday"
      "Sun" -> "Sunday"
      _ -> undefined

If you try to print out an undefined value, it results in an exception. Example:

Prelude> let x = undefined
Prelude> let y = x + 1
Prelude> print y
*** Exception: Prelude.undefined

Sometimes people use the symbol \bot (read "bottom"; if you are reading this in HTML, "bottom" looks like an upside-down 'T') to refer to Haskell's "undefined".

Piecewise function definitions

Like in math, you can define functions piecewise. This is much like a case statement. Example:

f 0 = 0
f 1 = 2
f 2 = 5

Note that in that example, the piecewise definition doesn't cover all of the possible cases. If you try to evaluate the function on an undefined case, you'll get an exception:

Prelude> let {f 0 = 0; f 1 = 2; f 2 = 5} 
Prelude> f 0
0
Prelude> f 1
2
Prelude> f 2
5
Prelude> f 3
*** Exception: <interactive>:1:5-29: Non-exhaustive patterns in function f

In a source code file, the different parts of the function definition don't even have to be next to each other. If you are doing this interactively in ghci, however, they all have to be in the same let, because otherwise the latter definitions overwrite the previous ones:

Prelude> let f 0 = 0
Prelude> let f 1 = 2
Prelude> f 1
2
Prelude> f 0
*** Exception: <interactive>:1:4-10: Non-exhaustive patterns in function f

Prelude> let {f 0 = 0;f 1 = 2;}
Prelude> f 1
2
Prelude> f 0
0

If multiple definitions match the same situation, the first one takes precedence:

Prelude> let {f 0 = 0; f 0 = 1}

<interactive>:1:5:
    Warning: Pattern match(es) are overlapped
	     In the definition of `f': f 0 = ...
Prelude> f 0
0

A more interesting example:

Prelude> let {f 0 = 0; f x = 1/(-x)}
Prelude> f (-10)
0.1
Prelude> f (-0.000001)
1000000.0
Prelude> f (0.000001)
-1000000.0
Prelude> f (10)
-0.1
Prelude> f 0
0.0

Guards

You can associate arbitrary conditions with each "piece" of a piecewise defined function using "guards". To do this, you write something like

functionName argument
     | condition1 = (function definition when condition1 is true)
     | condition2 = (function definition when condition2 is true)
     ...
     | otherwise = (function definition when none of the conditions are true)

where the condition1, condition2, etc. are boolean expressions. When the function is evaluated, all of the guards are tried in order until one matches.

For example:

abs x
    | x >= 0 = x
    | x < 0 = -x

or equivalently:

abs x
    | x < 0 = -x
    | otherwise = x

Pattern matching

Rather than using fst and snd to extract values from pairs, you can use "pattern matching" in the function definition instead:

Prelude> let f (a,b) = 2*a+b
Prelude> f (5,7)
17

You can do a similar thing with lists, in place of using head and tail:

Prelude> let f (x:xs) = do print x
Prelude> let g (x:xs) = do print xs
Prelude> f [5,7,3]
5
Prelude> g [5,7,3]
[7,3]

In fact, you can do this for any data type for which you have a constructor28; you simply write the constructor in the function arguments section, and when the function is called Haskell will bind the symbols used in the constructor expression to the appropriate values in the data structure.

You can combine this with piecewise function definition as a substitute for if/then/else expressions on parts of the data structure:

pairIs01Or10 (0,1) = True
pairIs01Or10 (1,0) = True
pairIs01Or10 (a,b) = False

In ghci:

Prelude> let pairTest (0,1) = True; pairTest (1,0) = True; pairTest (a,b) = False
Prelude> pairTest (0,0)
False
Prelude> pairTest (0,1)
True
Prelude> pairTest (1,0)
True
Prelude> pairTest (1,1)
False
Prelude> pairTest (1,2)
False

However, all of the possible inputs to the piecewise-defined function must share some type (later on we'll talk about how to get around this restriction using "typeclasses"29). The following code is illegal because it implies that function f should accept both pairs and lists:

f (a,b) = do print "pair"
f (x:xs) = do print "list"

Here's the error you get in ghci:

Prelude> let f (a,b) = do {print "pair"}; f (x:xs) = do {print "list"};

<interactive>:1:36:
    Couldn't match expected type `(t, t1)' against inferred type `[a]'
    In the pattern: x : xs
    In the definition of `f': f (x : xs) = do print "list"

You can also use the wildcard _ in patterns (instead of using "dummy variables" for values that you don't need; this makes the code easier to read):

fstIs10 (10,_) = True
fstIs10 _ = False

In ghci:

Prelude> let fstIs10 (10,_) = True; fstIs10 _ = False
Prelude> fstIs10 (10,0)
True
Prelude> fstIs10 (9,0)
False

\newpage


Example: Quicksort


qsort []     = []
qsort (x:xs) = qsort (filter (< x) xs) ++ 
               [x]                     ++ 
               qsort (filter (>= x) xs)

Or, equivalently, using list comprehensions instead of filter:

qsort []     = []
qsort (x:xs) = qsort [e | e<-xs, e<x] ++ 
               [x]                    ++
               qsort [e | e<-xs, e>=x] 

I must say; out of the programming languages that I know so far, Haskell is definitely the best one for expressing quicksort.


Example: Revisiting path search in a graph


With the new syntax that we just covered, we can rewrite the path search algorithm much more concisely:

search edgeList src dst
  | src == dst = [src]
  | otherwise = searchEdgesOriginatingAtSrc edgeList

  where 
    searchEdgesOriginatingAtSrc [] = []
    searchEdgesOriginatingAtSrc ((currentEdgeSrc, currentEdgeDst):restOfEdges)
      | src == currentEdgeSrc =
        case search edgeList currentEdgeDst dst of
          []          -> searchEdgesOriginatingAtSrc restOfEdges
          successList -> src:successList
      | otherwise = searchEdgesOriginatingAtSrc restOfEdges

\newpage


Basic operations on functions


Anonymous function definitions

Sometimes you want to define a once-off function, that is, a function that will only be used once. In this case, for clarity, you may want to just define the function right where it will be used rather than defining it elsewhere in the source code. Also, you might not want to bother to give the function a name.

To do this, you write something like:

(\argument1 argument2 -> (the definition of the function))

For example, recall the addition example:

addition a b = a+b

The equivalent anonymous function would be written like this:

(\a b -> a + b)

This whole expression can then be used in place of whereever you would have used the word addition before. For example:

Prelude> (\a b -> a+b) 2 3
5
Prelude> let addition a b = a+b
Prelude> addition 2 3
5

Such nameless, once-off functions are called "anonymous" functions.

The example created an anonymous function with 2 arguments, but you can make anonymous function with just 1 argument, or with more than 2 arguments, in the same way.

They are useful as arguments to functions that take other functions as arguments, for example, map:

Prelude> map (\x -> x + 1) [1,2,3]
[2,3,4]

Converting prefix functions to infix and vice versa

You can use any prefix function as if it were an infix function, and vice versa.

To use a prefix function as an infix function, put backquotes around it. To use an infix function as a prefix function, put parentheses around it. Example:

Prelude> let addition a b = a+b
Prelude> addition 2 3
5
Prelude> 2 + 3
5
Prelude> 2 `addition` 3
5
Prelude> (+) 2 3
5

We'll be seeing a lot of this because infix functions have to be surrounded by parentheses if you are giving them as inputs to higher-order functions.

Higher-order functions

Higher-order functions are functions that operate on other functions. We've already seen some functions that take other functions as input; for instance, map and filter. Now I'll give some examples of functions that yield other functions as output.

Function composition

'.' is the function composition operator. It takes two functions as input and yields a new function as output. Specifically, the expression f . g yields a function which first does function g, and then does function f on the output of g. For example, if you have a list of numbers, and you want to first take their absolute value and then compute their square roots, you could either do it like this (without function composition):

Prelude> let li = [4.0,-9.0,-16.0,4.0]
Prelude> let li2 = map abs li
Prelude> let result = map sqrt li2
Prelude> li2
[4.0,9.0,16.0,4.0]
Prelude> result
[2.0,3.0,4.0,2.0]

Or you could do it like this (using function composition):

Prelude> let li = [4.0,-9.0,-16.0,4.0]
Prelude> let result = map (sqrt . abs) li
Prelude> result
[2.0,3.0,4.0,2.0]

The two input functions to '.' must each take a single argument.

flip

flip take a function as input and returns a new function as output. Specifically, it takes as input some function f, where f takes two arguments, and it returns a new function f' with the order of the arguments reversed (that is, (flip f) a b = f' a b = f b a). For example:

Prelude> (/) 5 10
0.5
Prelude> (/) 10 5
2.0
Prelude> (flip (/)) 5 10
2.0
Prelude> (flip (/)) 10 5
0.5

Note: for this to work, you have to put parentheses around '/'. (flip /) doesn't work30.

Partial function application

OK, this may seem a bit weird.

If a function takes a bunch of arguments, you can create a new function by giving values to some of those arguments, and leaving the rest unbound.

For example, addition takes two arguments. If we fix one of those arguments permanently at 1, and leave the argument open, then we have created a function (call it plusOne) that takes one argument, and adds 1 to that argument. Here's how to do it:

Prelude> ((+) 1) 2
3
Prelude> ((+) 1) 5
6
Prelude> let plusOne x = ((+) 1) x
Prelude> plusOne 2
3
Prelude> plusOne 5
6

(in other words, the function plusOne has the same effect as the function (\x -> x + 1) which we saw in an earlier example)

(note: remember that putting parentheses around an infix function makes it a prefix function. So, (+) is the prefix version of the + operator.)

Here's another example. In this example, we take map and fix its first argument (the mapping function) to plusOne -- but we leave the second argument open. The result is a specialized version of map -- a function which takes a list and returns the result of incrementing each element in the list by 1:

Prelude> let plusOne x = ((+) 1) x
Prelude> let incrementListElements = map plusOne
Prelude> incrementListElements [1,2,3]
[2,3,4]

We could have just used ((+) 1) instead of plusOne, since those expressions are equivalent (both of them represent a function which takes one argument and returns that argument plus one):

Prelude> let incrementListElements = map ((+) 1)
Prelude> incrementListElements [1,2,3]
[2,3,4]

You can do this with functions that take more than 2 arguments, too. Here's a function which tests to see if its third argument lies in between its first and second arguments:

Prelude> let isZInBetweenXandY x y z = (x < z) && (z < y)
Prelude> isZInBetweenXandY 0 10 5
True
Prelude> isZInBetweenXandY 0 10 15
False
Prelude> isZInBetweenXandY 0 10 (-1)
False

If we fix the first argument to 5, we get a function (call it isZInBetween5andY) which takes two arguments. The first argument of the new function is like the second argument of the original function, and the second argument of the new function is like the third argument of the original function:

Prelude> let isZInBetween5andY = (isZInBetweenXandY 5)
Prelude> isZInBetween5andY 10 7
True
Prelude> isZInBetween5andY 10 15
False
Prelude> isZInBetween5andY 10 4
False

We could also fix both of the first two arguments of isZInBetweenXandY to 5 and 10, respectively, yielding a function waiting for just one more argument (call it isZInBetween5and10):

Prelude> let isZInBetween5and10 = (isZInBetweenXandY 5 10)
Prelude> isZInBetween5and10 7
True
Prelude> isZInBetween5and10 15
False
Prelude> isZInBetween5and10 4
False

We can achieve the same result by starting with isZInBetween5andY and fixing its first argument:

Prelude> let isZInBetween5andY = (isZInBetweenXandY 5)
Prelude> let isZInBetween5and10 = (isZInBetween5andY 10)
Prelude> isZInBetween5and10 7
True
Prelude> isZInBetween5and10 15
False
Prelude> isZInBetween5and10 4
False

And there's no need to actually name the partially applied function (we just did that for readability):

Prelude> (isZInBetweenXandY 5 10) 7
True
Prelude> (isZInBetweenXandY 5 10) 15
False
Prelude> (isZInBetweenXandY 5 10) 4
False

In general, if there is a function f which takes __n__ arguments, you can give it just the first __i__ arguments, enclose the whole things in parenthesis, and you have a function which is like f but with the first __i__ arguments fixed, and which is still waiting to accept the other (__n__ - __i__) arguments.

Note that the syntax for normal ("total") function evaluation is just a special case of partial evaluation --- if you "fix" ALL of the arguments of a function, then it becomes a constant function, i.e. a value.

There's some syntactic sugar for partially applying infix functions. Example:

Prelude> (/2) 10
5.0
Prelude> (2/) 10
0.2

Thinking about the anonymous function syntax in terms of partial function application

Partial function application provides a neat way to think of the anonymous function syntax. If you know about "lambda calculus" from theoretical computer science, that's what we're about to discuss (if you haven't heard of lambda calculus, don't worry about it, we're going to explain the relevant part).

Let's say you define addition as an anonymous function:

(\x y -> x + y)

You can think of what happens when this expression encounters a value in the following way. The x y on the left (in between the backslash and the arrow) means that this expression "wants to eat" two values. When you feed it a value, it takes the leftmost variable in its "want-to-eat list", removes it from the list, and substitutes the value you fed it for that variable in the expression on the right. If the "want-to-eat list" is empty, then we remove the symbols \ -> also.

The way that you feed it a value is to put a value on its right. When it eats the value, the value disappears from the right. For instance, if you feed it a 3, here's what happens:

(\x y -> x + y) 3
= (\y -> 3 + y)

Even when we fully apply the function, we can look at it as a succession of partial applications:

(\x y -> x + y) 3 5
= (\y -> 3 + y) 5
= (\ -> 3 + 5)
= (3 + 5)

So, you see that the core of applying a function is really the substitution operation, where we substitute the arguments being given to the function for the argument symbols in the function body. Partial application just means that we substitute in for of the function arguments in the function body but also leave some other function arguments the way they were.

In this way we can see how normal function application can always be restated in terms of a succession of partial function applications. Keeping this in mind will make the type notation for functions make more sense.

\newpage


Type basics


Every expression in Haskell is assigned a type.

As we mentioned before, you can use ghci's :t command to find out the type of a given expression. The results is called a "type signature". For example:

Prelude> :t 'c'
'c' :: Char
Prelude> :t "a string"
"a string" :: [Char]

This means that the type signature of the expression 'c' is Char and the type signature of the expression "a string" is [Char].

[Char] means a list whose elements are of type Char.

The type signature of tuples is what you'd expect:

Prelude> :t ('a',"hi")
('a',"hi") :: (Char, [Char])

Types can be nested in all sorts of crazy ways:

Prelude> :t [("hello", ["goodbye"])]
[("hello", ["goodbye"])] :: [([Char], [[Char]])]

Type classes

If you ask for the type of a number, you get something unexpected:

Prelude> :t 1
1 :: (Num t) => t

This is our first encounter with a type class. Num is a type class. A number of types can be grouped together into a "type class" (the word "class" here is being used similar to the word "set" in mathematics). For any given type class, some types are part of the class, and the rest aren't.

The motivation for the invention of type classes is that a type class can be associated with a list of functions that must be implemented for each type in the class. So, type classes are a lot like interfaces in Java. If you know that a given type is part of a given type class, then you know that certain specific functions are defined with respect to data structures of that type. To give a concrete example, any type in the type class Num must support the addition operator (+). Num includes the types Int (integer) and Float, among other types. So, if you find yourself with two expressions of type Num, then you are guaranteed31 to be able to add them together.

Type classes are sort of related to "classes" in object-oriented languages, although not as closely as to interfaces. In object-oriented languages, "objects" are individual instantiations of data structures, and objects are instances of classes. By contrast, the instances of a type class are __types__.

Type classes often appear in type signatures. Here's what they mean. We saw that the type signature of 1 is (Num t) => t. The t here is just a variable; it may as well have been a or x.

1 :: (Num t) => t might be read as saying something like, "The type of 1 is some type t, where t belongs to the type class Num". What that means is that the type of 1 is undetermined except that it is constrained to be some type within the Num typeclass.

So, when you enter a numerical constant in Haskell, it enters the world not as an Int or a Float or anything else; it starts out being of undetermined type. Strange but true. However, it is constrained to be some type within the Num typeclass, which is why you are allowed to apply operations like addition to it.

Type signatures of functions

Example:

Prelude> :t not
not :: Bool -> Bool

We'll go through this piece by piece.

Next we'll look at a function that returns an output of a different type than its input:

Prelude> :t length
length :: [a] -> Int

This says that length is a function whose input must be a list of type a, where a can be any type, and whose output has type Int. So, in other words, the input to length must be a list, but it doesn't matter what type of element is in the list, and the output of length will have type Int.

Now we'll look at a function whose type signature involves a type class.

Prelude> :t abs
abs :: (Num a) => a -> a

We'll go through this piece by piece.

Putting it together, this says that abs will accept as input a single argument of any type, provided that that type is in the Num type class, and that the output of abs will be of the same type as the input.

Note that a -> a doesn't mean that abs returns the same __value__ that it was given! It just means that whatever it returns will have the same __type__ as the thing that it was given.

Type signatures of functions which take multiple inputs

Now let's look at boolean AND (&&), which is a function which takes two inputs of type Bool, and produces one output of type Bool

Prelude> :t (&&)
(&&) :: Bool -> Bool -> Bool

Weird, huh? The first thing to say about this is that there are implicit parentheses which are left out of this expression. It is assumed for type signatures that the arrows associate to the right unless there are parentheses that say otherwise. If we write the parentheses explicitly it would be:

(&&) :: Bool -> (Bool -> Bool)

Now, the key to understanding this is to think of partial function application. Imagine that we had the expression:

(&&) True False

As noted before, when a function takes multiple arguments, normal function application can always be thought of as a successive series of partial function applications. So, the above expression is equivalent to:

((&&) True) False

where ((&&) True) means the function that you get if you start with (&&) and then fix its first argument to True, leaving the second argument unbound.

So, think of how (&&) behaved right there. It started out as a function that took two arguments. But then we fed it an argument, and as a result we got a function that took only one argument.

So, (&&) can equivalently be thought of as a function that takes __one__ argument as input, and then produces another function as output.

What is the type of the function ((&&) True)? Well, this function takes one argument of type Bool, and returns something of type Bool. So, it must have type Bool -> Bool. And indeed we see that this is right:

Prelude> :t ((&&) True)
((&&) True) :: Bool -> Bool

So, going back to the original function (&&), what type is it (if we consider what happens when we partially apply just a single argument)? Well, it takes as input something of type Bool, and then it gives us back a new function which has type Bool -> Bool. So, (&&) is a function that goes from type Bool to type Bool -> Bool. So, its type is therefore:

Bool -> (Bool -> Bool)

And, remembering that the arrows are assumed to associate to the right, we see that this is equivalent to the reported type signature for (&&):

Bool -> Bool -> Bool

Generalizing our reasoning

In general, the procedure for figuring out the type of something like a function is:

  1. Figure out the type of the first argument that it takes.
  2. Say that that type was A. Now you know that the type of the whole thing is A -> X, where X is a placeholder for the type of the thing that you get when you apply the first argument.
  3. Now recurse; use the same procedure to figure out the type of X.

In the previous example, our reasoning followed that procedure:

Types of functions that take functions as arguments

Example:

Prelude> :t map
map :: (a -> b) -> [a] -> [b]

Including the implicit parentheses, this type signature could be written:

(a -> b) -> ([a] -> [b])

For example, consider

Prelude> let showInt :: Int -> String; showInt i = show i
Prelude> :t showInt
showInt :: Int -> String
Prelude> map showInt [2,5,7] 
["2","5","7"]
Prelude> (map showInt) [2,5,7]
["2","5","7"]
Prelude> :t (map showInt)
(map showInt) :: [Int] -> [String]

We can see that map takes one argument, which is a function of type a -> b (in our case a is Int and b is String), and yields another function with type [a] -> [b] (in our case, this other function is (map showInt)). This explains why map's type signature is:

(a -> b) -> ([a] -> [b])

Explicitly specifying types

In order to explicitly specify the type of some expression, write something like:

expression :: typeName

for example:

1 :: Int

is an expression for 1, with type Int.

In order to explicitly specify the type of some named function, write something like this:

square :: Int -> Int
square x = x*x

When you explicitly specify the type for some expression, it must match (i.e. be more specific than) the type that Haskell would infer for the same expression.

Infix operators must be surrounded by parentheses (i.e. converted to prefix form) in type declarations.

It is recommended32 that you explicitly specify the types of the outermost functions in your programs.

Type synonyms

The type command lets you define syntactic sugar for oft-used types. For example, if you were writing a program in which you often passed around lists of 3D coordinates, where each coordinate was a Double, then you might get tired of typing [(Double,Double,Double)]. So you could do:

type List3D = [(Double,Double,Double)]

An oft-used type synonym is:

type String = [Char]

\newpage


The __IO__ types


When you take the type of a function which does input or output, it yields something of type IO something, where "something" is the type that you would "expect" that function to yield. Examples:

Prelude> :t putStrLn
putStrLn :: String -> IO ()
Prelude> :t getLine
getLine :: IO String

You'd expect that putStrLn wouldn't return anything (in some languages you might call this a return type of 'void', but in Haskell we'd say it's '())'). You'd expect that getLine would return a string.

Even some functions which don't seem to actually do IO have this in their type:

Prelude> :m Random
Prelude Random> :t randomRIO
randomRIO :: (Random a) => (a, a) -> IO a

What's happening here? Isn't the result of getLine a string? Isn't the result of randomRIO a number?

No, it isn't. Note that whenever we actually used these functions in examples, we used them within do blocks, and we used the <- operator to assign their result. For example:

Prelude> do {putStrLn "Enter a name"; name <- getLine; putStrLn ("Hi " ++ name)}
Enter a name
Bayle
Hi Bayle

You can think of the <- operator as removing the "IO" from the type of a value. Whenever you want to use a function with an "IO" in its return type, you'll need to use the <- operator if you want to get at the "actual" return value.

Passing values outside of a do block

The <- operator only works within do blocks. How do you pass a value out of the do block? That is, how do you send a value from somewhere inside of a do block to somewhere outside of the do block, so that that value can be used somewhere else?

Well, you can send values out of do blocks containing I/O, but those values are required to be one of those IO something types! You're not allowed to smuggle any "naked" values out of an I/O-containing do block.

A do block returns the value of the expression on its last line33. If the last expression itself yields an IO something type, all is well:

Prelude> :t do {getLine}
do {getLine} :: IO String

If you try to return something of a "naked type" out of a do block containing I/O, however, there's a type error34:

Prelude> :t do {putStrLn "hi"; True}

<interactive>:1:19:
    Couldn't match expected type `IO t' against inferred type `Bool'
    In the expression: True

Incidentally, it is illegal for the last line to be a <- expression; these things are so weird that they're technically not even considered to be expressions:

Prelude> :t do {name <- getLine}
<interactive>:1:4: The last statement in a 'do' construct must be an expression

__return__

If you want to return something from a do block containing I/O, and you are willing to submit to the indignity of changing its type from something to IO something, you can use the return construct:

Prelude> :t do {putStrLn "hi"; return True}
do {putStrLn "hi"; return True} :: IO Bool

return is kinda the reverse of <-; it changes a value of type something into a value of type IO something.

Unlike in some other programming languages, return does not mean anything like "exit this function and return this value".

Is there any way to get the IO out of the type permanently?

No. The reason is that Haskell wants to clearly segregate all computation which is causally linked to the outside world35. This is related to monads.

We'll explain why soon, but in order to have the background to do that we need to explain "referential transparency".

\newpage


Referential transparency


A referentially transparent expression is one that can be replaced with its value without changing the behavior of the program. So, referential transparency means that, if you compute the function f(x) and find that f(x) = y, then every time you encounter f(x) in the future, you can just replace it with y.

This may sound like it is always the case, but here's some examples where it's not.

Failure of referential transparency: mutable variables

Consider the Python program:

a=1;
c=a*5;

a=5;
d=a*5;

In Python, this results in c=5 and d=25. But if we assumed that we had referential transparency, then as soon as the interpreter saw "a=1", it would be allowed to substitute "1" for every occurence of the expression "a" from then on. And then at the end we'd have d=1*5 = 5.

So, in this case, if (after the first assignment to a) we had replaced every instance of the expression "a" with its value, then we would have changed the behavior of the program.

In general, if you ever find an expression that can return different things at different times, then that expression is not referentially transparent (in this case, "a" was shown to be such an expression).

Failure of referential transparency: hidden state

Consider this Perl program:

srand(0);
print "\n";
print rand();
print "\n";
print rand();

The function rand(), which calls a pseudorandom number generator, returns a different result upon each call, even though it was passed the same arguments each time. This is because the result of the rand() function doesn't just depend on the arguments that you pass it; it also depends on some __hidden state__ which is altered upon each call.

Failure of referential transparency: input

In Python, the input() function reads input from the user. If the interpreter evaluated input() once, and then after that, whenever it saw input(), it just substitued the return value it got the first time without actually calling input() again, that would change the program's behavior.

Failure of referential transparency: output

In Python, the print command doesn't return anything. So, it doesn't return different things at different times. Still, it is not referentially transparent. If the interpreter just replaced every print command with nothing, that would change the program's behavior.

Side effects

In the last example, referential transparency failed not because the return value of an expression was not constant, but because print has a side-effect. Side-effects are when the evaluation of a function has a "side-effect" beyond the computation of the return value. For instance, in Perl, consider the statement:

$a = (print "hi");

This statement does two kinds of things. First, it calls the print function, which computes a return value, which gets put into variable a. But the call to print also has a __side-effect__: the argument is printed to the console.

Referential transparency: summary

The essence of referential transparency is that functions don't care about anything but their arguments and their return value. The point is that referentially transparent expressions and functions are just normal expressions and functions like you find in mathematics; they are fixed mappings from inputs to outputs. None of this funny business that you see in computer programming like x=x+1 or expressions like print that __do__ things when you evaluate them; if they're referentially transparent, they're just mathematical expressions.

\newpage


I/O, do blocks, and monads


The IO types segregates non-referentially transparent computations

Haskell tries to keep referentially transparent expressions segregated from other code. In this way, it's clear which expressions can be relied upon to be referentially transparent, and which cannot. In Haskell, we call referentially transparent functions "pure functions", and we call other functions "actions".

This segregation is achieved by marking actions as type IO something.

Actions are a type of value; they don't "do" anything right away

In Haskell, just as pure functions are first-class objects, so are actions. For example, the type of putStrLn is String -> IO (). This means that putStrLn takes a String as input and returns something of type IO () as output. IO () is an example of an "IO action". This should be thought of as a representation of the action of printing out that string to the console.

This action can be combined with other actions to make a sequence of actions. For instance, you might combine putStrLn "Line 1" with putStrLn "Line 2" to make a composite action that prints out two lines to the console. The way that you combine these two actions into a third action in Haskell is:

myCompositeAction = do
                      putStrLn "Line 1"
                      putStrLn "Line 2"

In this example, the type of myCompositeAction is also IO (). So, the function myCompositeAction doesn't "do" anything; it merely returns an IO action (which, in this case, happens to be a combination of two smaller actions).

How does anything get done, then? The function main is special. main returns an IO action, and then Haskell actually executes that IO action.

So, you should think of all of your functions which use IO as pure functions which operate on actions. Their job is to prepare the final IO action that will be returned by main and then36 executed by Haskell.

IO actions aren't just predetermined sequences of input and output. Later subactions can conditionally depend on earlier subactions. For example, recall the doGuessing subroutine in our random number guessing program:

doGuessing num = do
  putStrLn "Enter your guess:"
  guess <- getLine
  let guessNum = read guess
  if guessNum < num
    then do 
          putStrLn "Too low!"
          doGuessing num
    else if read guess > num
            then do 
                  putStrLn "Too high!"
                  doGuessing num
            else do 
                  putStrLn "You Win!"

The type of doGuessing is (Ord a, Read a) => a -> IO (). This means essentially that this is a subroutine that takes a single input (a number37), and produces an IO action as an output. So, IO actions can be entire programs.

And main returns an IO action, which is then run. In other words, your program, considered as a whole, IS an IO action.

IO and monads as syntactic sugar for explicit labeling of OutsideWorld influence

We aren't going to explain yet what monads are, but here we will explain what monads are used for in the context of IO38:

I/O operations aren't referentially transparent because they interact with the outside world, yet this interaction isn't reflected in their formal inputs and outputs. For instance, a putStrLn statement doesn't return anything or change any variables, yet it changes the outside world. Similarly, a getLine statement both changes and is changed by the outside world, yet the outside world isn't reflected in its input arguments. So we could substantially improve the situation if we could explicitly pass around a symbol that represents the outside world. Maybe the signatures of putStrLn and getLine should be the following:

putStrLn :: OutsideWorld -> String -> OutsideWorld
getLine :: OutsideWorld -> (OutsideWorld, String)

Instead of writing:

putStrLn "What's your name?"
name <- getLine
putStrLn ("Hello, " ++ name)

we'd write:

main outside = 
  let outside2 = putStrLn outside "What's your name?"
      (outside3, name) = getLine outside2
  in putStrLn outside3 ("Hello, " ++ name)

This correctly models the dependencies of the computation on the various interactions with the outside world. However, it is hard to read. Also, what should the computer do if you give it this?:

main outside = 
  let outside2 = putStrLn outside "What's your name?"
      (outside3, name) = getLine outside2
  in putStrLn outside2 ("Hello, " ++ name)

In that example, we passed outside2 in the last call to putStrLn; but we should have passed outside3.

Haskell address these problems by expressing program I/O using a weird formalism that allows them to essentially follow the idea embodied in these examples (they pass along a symbol representing the outside world) without the programmer having to write it that way. This solution is called monads. Monads can be implemented within Haskell with no special handling from the compiler39 (although it's nice if the compiler optimizes them).

Monads can also be used for other things besides I/O.

The do block notation is syntactic sugar for monads. do notation allows us to write code that looks like a sequence of imperative I/O commands, and then internally transforms it into a set of interdependent expressions which represents dependencies on the outside world by passing along an I/O symbol40.

So, in summary: programs that communicate with the outside world are first-class objects called "IO actions". When you write a function that does I/O, its return type will be an IO action. do blocks are the syntax that Haskell uses for composing actions together into larger actions. The return type of a do block is an action. Under the hood, do blocks are syntactic sugar for a construct called "monads", which we haven't explained yet. But, in the case of IO actions, monads can be thought of as a pattern for writing code so that, technically, a symbol representing the "outside world" is passed to and from our IO functions, without cluttering up our function calls by actually writing that symbol.

\newpage


Detour<<This section and the following section on referential transparency were originally placed at the beginning of the tutorial, but then I decided that, although it could be understood at the beginning, it would be better appreciated later on. Also, I didn't want to bore the reader with abstractions before showing them some code. But now I want to explain about why I/O is weird in Haskell, which requires an understanding of referential transparency.>>: What is Haskell?

(or: words that programming language nerds use to describe Haskell)


Haskell is a functional programming language.

When I heard that, I assumed that that was because Haskell lets functions be "first-class objects", and because Haskell provides you with cool higher-order operators to manipulate functions with. Both of these things are true, and I think both tend to go together with functional languages, but they don't get to the heart of the meaning of the word 'functional'.

Functional languages are often contrasted with imperative languages. Imperative languages tell the computer what to do. They tend to be step-by-step lists of instructions. To give an example in Python,

def factorial(k):
   x = 1
   i = 1
   while (i <= k): 
      x = x*i
      i = i+1
   return x


print(factorial(5))

The function factorial says something like:

__Set the contents of memory location "x" to 1. Then set the contents of memory location "i" to 1. Then, as long as the contents of memory location "i" is less than or equal to "k", do the following: multiply the contents of memory location "x" by the contents of memory location "i", then increment the contents of memory location "i".__

}}}By contrast, in Haskell, the focus is not on telling the computer a sequence of instructions, but rather on defining the functions that you want computed. Functions are defined much as they are in mathematics. Here's how you might define a function <code>factorial</code> in Haskell:

{{{
factorial 0  = 1
factorial n = n*(factorial (n-1))

This sort of thing is Haskell's focus, and that is the reason that Haskell is called "functional". In Haskell, running a program corresponds to evaluating a function.

But Haskell has an imperative side too. It has to; a mathematical function just defines a mapping from inputs to outputs. You can define all the functions you want, and the computer might then be able to compute the output of any of those functions for any particular input, but if that was all you did, the computer would just sit there, because you haven't told it to __actually__ evaluate any particular function on any particular input (like a C program with lots of function definitions but an empty main()). Also, mapping inputs to outputs doesn't provide a notion of __actions__ (example: the action of printing something to the screen).

So, just like "imperative" languages, Haskell does provide a facility for giving the computer a sequence of instructions. But the code that does this is clearly separated from the other code41. The imperative commands are found only within do blocks42. Here's a program that is equivalent to the Python program above:

factorial 0 = 1
factorial n = n*(factorial (n-1))

main = do print(factorial 5)

Haskell is a lazy, pure functional language with static, strong typing

Lazy means that expressions don't get evaluated until (and __unless__) they have to be. Contrast with C, a "strict" language; in C, each expression is evaluated as soon as it is encountered.

One neat thing about a lazy language is that you can create expressions which would take up an infinite amount of space or time to completely evaluate. For instance,

Prelude> do {print (take 5 [1..])}
[1,2,3,4,5]

is Haskell for "print out the list consisting of the first five elements of the list [1..]", where [1..] is an infinite list consisting of all of the integers. It runs fine, because Haskell never actually tries to construct the infinite list.

When you define an expression in Haskell, Haskell doesn't actually try to evaluate it; it just remembers, "ok, if I ever need the definition of that expression, here's where to find it". Only when the expression is needed for I/O (or is needed in order to evaluate some other expression that is needed for I/O) does it get evaluated.

Laziness allows you to define infinite data structures, which are often cleaner to define and work with (see section Circular function definitions for infinite data structures, below). L

aziness also allows you to do things like make a lazy database query that, if not lazy, would return a result too large to fit into memory; and then to retrieve the results one-by-one (or in small sets) (note: Haskell's laziness won't automatically make an eager database query into a lazy one, your database framework would have to do that; but this is a good example of a situation in which laziness can be helpful).

In some cases, laziness can make it easier to reuse modular code; for example, if you are doing a search through state space, you can construct the state space lazily and pass it to some other routine, and the other routine can decide to do breadth-first search or depth-first search or what have you. This is more reusable because the construction of the state space is completely modularized from its exploration. This separation also tends to make the code clearer.

See this essay for more on the value of laziness: http://www.cs.chalmers.se/~rjmh/Papers/whyfp.html

Laziness has a price. When you pass around a variable with a lazy value that has not been fully evaluated, it has to keep track of the state of the partial evaluation, which can incur considerable memory overhead. In some cases the programmer must be aware of this sort of thing and plan carefully in order to avoid out of memory errors (see http://www.haskell.org/haskellwiki/Foldr_Foldl_Foldl%27 for an example). There is also some extra CPU processing needed to keep track of this stuff.

Pure means that code in Haskell __(except within "do" blocks43!)__ satisfies __referential transparency__. We'll define44 that in the next section.

Static typing means that type-checking is computed at compile-time rather than at run-time.

Strong typing is used in different ways by different people, but what I mean here is that you can't do something like a typecast in C; that is, you can't tell Haskell to let you get away with type violations.

Pros and cons of Haskell

Since I haven't actually programmed in Haskell much yet (todo), all I can do is repeat what I've heard from others.

Pros