Bayle Shanks's website: tutorials-haskellTutorial

Haskell tutorial

by Bayle Shanks

some code samples are from __Yet Another Haskell Tutorial__ which is by Hal Daume III

I'm writing this after I read some tutorials myself but before actually programming anything in Haskell -- it's a "blind leading the blind" situation. So this tutorial is probably filled with errors and misunderstandings. After I get some experience with Haskell I might come back and check it over and then remove this warning, if I ever get around to it

Preface

This (currently unfinished) tutorial aims to be short yet to cover topics through basic monads.

Much of the bulk of the text consists of short code examples and interpreter traces.

The intended reader has programming experience, but only in imperative languages, and wants a quick introductory tour of Haskell. The tutorial is not intended to be difficult, however in the interest of brevity many things are demonstrated without much explanation.

It does not aim to be comprehensive, so after reading it, interested parties may want to read a longer tutorial to get the details that are skipped here. Hopefully, after reading this tutorial, you will be able to breeze through a longer one.

Copyright

the Gnu Free Document License, or
the Creative Commons Sharealike license, or
the Creative Commons Sharealike+Non-Commerical license,

whichever you prefer (but not in any other circumstances without prior permission).

Note to people reading this on the wiki

(ignore this if you are reading the PDF). I'm using EasyLatex? to compile the wiki source of this page into a PDF. So don't mind the occasional LaTeX? commands. For example:

\usepackage{times} \usepackage[margin=3.7cm]{geometry}

\newpage

Hello World

As of this writing, the most popular implementation of Haskell seems to be ghc, the Glasgow Haskell Compiler. The GHC project also produces ghci, an interpreter. In this tutorial, we'll use ghci (although of course most of the things that we'll talk about are general features of the language that will be in any implementation). So go install ghci now.

When you start ghci, you get an intro banner and then a prompt:

Prelude>

To quit, type Cntl-D.

Start ghci again. You can enter Haskell directly in the interpreter, for example:

Prelude> print "Hello World"
"Hello World"
Prelude>

If you want to put this in a file, here's how. Each file is a module. The module name should be the same as the file name, except that the file name should have an .hs at the end and the module name should not.

So, create a file called Test.hs and put this into it:

module Test where
helloWorldFunction = print "Hello world"

To load a module in ghci, you say ":l modulename". So, to load module Test and then evaluate helloWorldFunction, you do:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Test.
*Test> helloWorldFunction
"Hello world"
*Test>

Note that prompt changes from "Prelude" to "*Test". If you later change the sourcecode file and want to reload it, you can use :r, which reloads the current module.

In the previous example, the reason that there are quotes around "Hello world" is that the "print" command prints a representation of a value that is useful for debugging. If you just have a string that you want to send to the console directly, you use putStrLn. For example:

Prelude> print "Hello world"
"Hello world"
Prelude> putStrLn "Hello world"
Hello world
Prelude>

If you want to compile a standalone executable, you need to make a Main module. Put the following into Main.hs:

module Main where
main = putStrLn "Hello world"

Now, at your operating system's command line, use ghc to compile:

$ ghc --make Main.hs -o main
$ ./main
Hello world

\newpage

Basic syntax

Haskell is case-sensitive

In addition,

The names of types, typeclasses, and modules must begin with an upper-case letter.
The names of infix functions must be composed of punctuation symbols.
The names of almost1 everything else must begin with a lower-case letter.

Types

In Haskell, every expression has a type2. Various functions and operators have various restrictions on what sort of types they can operate on. If you try to use incompatible types, you'll get a compile time error.

If you are curious about the type of some expression in Haskell, you can use ghci's :t command to find out. For example:

Prelude> :t 'c'
'c' :: Char
Prelude> :t "a string"
"a string" :: [Char]

Later on we'll talk about how to read the notation for expressing the types of things in Haskell. Until then, don't worry about it.

At this point you might be worried that you'll spend a lot of time declaring the types of things, so I should mention that Haskell doesn't usually require you to declare the types of your variables and functions. Haskell has a powerful system of type inference that guesses the types of almost everything automatically.

A disadvantage of using a powerful type inference system is that it makes type error messages harder to interpret. For example, let's say that you have three expressions, call them A,B,C. Let's say that you give expression A the wrong type. Let's say that you construct expression B out of expression A, and then in a very different part of the program, you refer to expression B in expression C. Because you made a mistake with expression A, Haskell might infer the wrong type for expressions B and C. Perhaps the error will only surface when it gets to expression C. In this case, the error message will refer only to expression C, and you'll have to figure out that the root of the problem was really with expression A.

Defining functions

To define a function in a source code file, write something like:

functionName argument1 argument2 = expression

For example:

plusFive a = a+5

Inside ghci, you have to put the word "let" at the beginning of the function definition:

Prelude> let plusFive a = a+5
Prelude> plusFive(10)
15

Functions you define are prefix by default. We'll talk about how to define infix functions later.

Calling functions

To call a prefix function, just write the function, and then a space, and then the argument. If there are multiple arguments, just separate them by spaces. For example:

Prelude> let addition a b = a+b
Prelude> addition 2 3
5

do blocks

In Haskell, if you want to execute a sequence of instructions, you can't just put them one after another, unless they are within a do. For example, try putting this incorrect code into Test.hs:

module Test where

test = print "First line."
       print "Second line."

It won't compile in ghci (don't worry about trying to understand this error message3):

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )

Test.hs:2:7:
    Couldn't match expected type `(a -> IO ()) -> [Char] -> t'
	   against inferred type `IO ()'
    In the expression: print "First line." print "Second line."
    In the definition of `test':
	test = print "First line." print "Second line."
Failed, modules loaded: none.

The problem is that in Haskell, a function is just __a single expression__. So in a sense, a whole Haskell function is analogous to just a single line of code in other languages.

To execute a sequence of instructions, you have to wrap them in a do block. Example in ghci:

Prelude> do {putStrLn "First line."; putStrLn "Second line."}
First line.
Second line.

Example as a source code file (to be placed in Test.hs):

module Test where
test = do {putStrLn "First line."; putStrLn "Second line."}

A complete do block is itself a single expression that returns a single value (which is why you can have a whole do block inside a function, even though functions must be single expressions).

Under the hood, do blocks are actually syntactic sugar for something pretty complicated involving monads. We'll talk more about the deeper meaning of do blocks later. For now, one more thing to note is that a do block that contains I/O has a return value of type 'IO something' (where the 'something' varies).

Lack of destructive variable updates

You don't have mutable variables in Haskell.

Put the following incorrect code into Test.hs:

module Test where
x = 1
x = x+1

Now try loading it in ghci:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )

Test.hs:3:0:
    Multiple declarations of `Test.x'
    Declared at: Test.hs:2:0
		 Test.hs:3:0
Failed, modules loaded: none.

To accomplish the same effect, you have to have a different name for the so-called variable at each stage. Put this into Test.hs:

module Test where
x = 1
x2 = x+1

Now it works:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Test.
*Test> x2
2

You can see that what you are really doing here is not declaring "variables", but defining functions, just like the "addition" function in the example above. There are still variables in Haskell, in the form of function arguments; but these are just like the variables found in mathematics -- their value doesn't change over the course of the computation.

However, for me, the word "variable" in the context of a programming language is so bound up the idea of mutable variables that I like to tell myself this: in Haskell, there are no "variables", because nothing "varies"; there are just functions, each with a fixed4 definition.

I'll say that again. In the above code, x and x2 do not correspond to memory locations which you can read and write. x and x2 are names of functions that you have defined.

How can this be?

Now, you're thinking, how are we going to program without mutable variables? Well, as far as I know (todo), there are 3 ways in which variables are used in typical imperative programming languages:

Cases in which the same variable is assigned different values on different lines of code.
Function arguments.
Things that change as you progress through loop iterations.

I'll treat each case in turn.

In the first case, these variables don't really have to be mutable -- you could always5 give the variable a new name on each line of code that assigns to it6.

An example of the first case, in Python:

import time

toOutput = "The time is "
toOutput = toOutput + time.strftime('%H:%M')
toOutput = toOutput + " right now"
print toOutput

In this situation, toOutput can be replaced by three different variables:

import time

toOutput1 = "The time is "
toOutput2 = toOutput + time.strftime('%H:%M')
toOutput3 = toOutput + " right now"
print toOutput3

The second case is function arguments. Function arguments aren't usually considered "mutable variables", but they do take on different values each time the function is called. Well, we do have function arguments in Haskell, so there's no problem here.

The third case are things that change as you progress through loop iterations. Well, that's easily taken care of. In Haskell, you're not allowed to loop. Feel better?

That's a half-truth. In Haskell, you replace loop constructs with recursion. Each iteration is replaced with a new call to the function. So, you can still go through the same section of code over and over again while incrementing the equivalent of a loop counter -- it's just that the loop counter is a function argument, rather than a local variable, and that each iteration is a separate function call. I'll give an example later.

By the way, you can put some nonalphanumeric characters into your function names. Some people like to use the apostrophe (for example, "x'" -- read "x prime") to indicate to the reader that a bunch of functions should be considered to form sequence. Example:

module Test where
x = 1
x' = x+1

You can redefine functions within do blocks

To reassign a name within a do block, use something like "let {x=1}". Example:

Prelude> do {let {s="Line 1"}; putStrLn s; let {s="Line 2"}; putStrLn s;}
Line 1
Line 2

However, this doesn't mean that we have mutable variables. The following code enters an infinite loop when Haskell tries to evaluate "x=x+1":

Prelude> do {let {x=1}; print x; let {x=x+1}; print x;}

You can redefine functions within ghci

You can also reassign things within the ghci interpreter using let. You might think of the things that you enter in the interpreter as being within an implicit do block7. Example:

Prelude> let x = 3
Prelude> x
3
Prelude> let x = 4
Prelude> x
4
Prelude> let f a b = a+b
Prelude> f 2 3
5
Prelude> let f a b = a*b
Prelude> f 2 3
6

Using whitespace as shorthand for \{\} and ;

You can either enclose blocks in curly brackets and delimit lines with semicolons, or you can use whitespace. If you use whitespace, just indent things that you would have put into curly brackets.

In the ghci interpreter, you cannot use whitespace, you must use only "{}" and ";". You can only use whitespace in source code files.

Here is an example using "{}" and ";":

module Test where
test = do {putStrLn "First line."; putStrLn "Second line."}

And here is the equivalent using whitespace:

module Test where
test = do 
  putStrLn "First line."
  putStrLn "Second line."

The rules for interpretation of whitespace in Haskell are sometimes called the rules of "layout".

Generally for the rest of this tutorial when I present code samples, I'll use the whitespace form, as if it was sitting inside a source code file. Sometimes, however, when I want to show you what happens when you evaluate something, I'll show you a ghci trace and put in some commands in the form that uses {} and ;.

Comments

Examples:

x = 3    -- this is a single line comment

    {- this is a 
          multiline
  comment -}

Module import

To import a module in a source code file, use the import statement. For example, to import module Random:

import Random

To import a module within the ghci interpreter, use the :m command. For example, to import module Random:

Prelude> :m Random
Prelude Random>

Debugging within ghci

You can set breakpoints within do blocks by using the ghci along with the breakpoint command, which is in the GHC.Base module:

module Test where
import GHC.Base

main = do let a = 3
          let b = a*2
          breakpoint $ do
          print a+b

If you are using ghci 6.8.1 or later, you can use the more powerful debugging facilities included in ghci. See http://haskell.org/ghc/docs/6.8.1/html/users_guide/ghci-debugger.html for details.

Some special values: True, False, ()

True and False are the result of boolean expressions.

Another special value in Haskell is the "unit" value, written:

()

This is sometimes used to mean something like what is called "null" in other languages (in Haskell, null is used for something else; it's a boolean function that tells you if a list is empty).

Quick note on numerical negation and order of operations

By the way, sometimes you run into trouble because you expect the negation function ('-') to bind very tightly to numbers, but it doesn't. Just use parenthesis. Example:

Prelude> abs 3
3
Prelude> abs -3

<interactive>:1:5:
    No instance for (Num (a -> a))
      arising from the literal `3' at <interactive>:1:5
    Possible fix: add an instance declaration for (Num (a -> a))
    In the second argument of `(-)', namely `3'
    In the expression: abs - 3
    In the definition of `it': it = abs - 3
Prelude> abs (-3)
3

When we tried to do abs -3, what happened was that it was parsed as (abs -) 3. Oh well.

if/then/else

Example8:

sgn x =
    if x < 0
      then -1
      else if x > 0
        then 1
        else 0

An else is always required in Haskell.

Both the then expression and the else expression have to return compatibly-typed values (so that the overall expression has a sensible return type).

For example, if you put this incorrect code into Test.hs, then it won't compile:

badIf x = if x < 0
          then 1
          else "this is a string"

Here's the error you get:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )

Test.hs:10:15:
    No instance for (Num [Char])
      arising from the literal `1' at Test.hs:10:15
    Possible fix: add an instance declaration for (Num [Char])
    In the expression: if x < 0 then 1 else "this is a string"
    In the definition of `badIf':
	badIf x = if x < 0 then 1 else "this is a string"
Failed, modules loaded: none.

What the compiler is basically saying in this error is:

"I noticed that the type of the then clause was a number ('Num'), so I assumed that the type of the whole if/then/else clause was a number. Then I noticed that the type of the else clause was a string ('[Char]'). But a string is not a type of number ('No instance for (Num [Char])'). So the type of the else clause is not compatible with the inferred type of the whole if/then/else clause."

if/then/elses and dos

If you do an if/then/else inside a do block, and you want to do a sequence of actions inside the if/then/else, then you have to open a new do block.

For example, the following code is incorrect:

main = do
         let a = 3
         putStrLn "Testing if a == 3..."
         if a == 3
           then 
             putStrLn "Yep, it is 3."
             putStrLn "Good."
           else 
             putStrLn "Nope, a is not 3."
             putStrLn "That's impossible."

This is what you have to do:

main = do
         let a = 3
         putStrLn "Testing if a == 3..."
         if a == 3
           then do
             putStrLn "Yep, it is 3."
             putStrLn "Good."
           else do
             putStrLn "Nope, a is not 3."
             putStrLn "That's impossible."

\newpage

User input

To get a line of input from the console and temporarily assign it to the symbol name "line", do this within a do block:

line <- getLine

For example:

Prelude> do {putStrLn "Enter a name"; name <- getLine; putStrLn ("Hi " ++ name)}
Enter a name
Bayle
Hi Bayle

(note: ++ is how you do string concatenation)

Introduction to the arrow operator inside a do

At this point you are probably wondering, "what's that <- operator for?"

The full answer involves a theoretical issue with I/O, the deeper meaning of do blocks, and monads. So I'll put off explaining it until later. For now, just pretend that name is a mutable variable, and that within the context of a do block, the '<-' operator means "assign the result of this function to that variable." Note, however, that the '<-' operator can only be used with certain functions. Functions which receive transient information from 'the outside world' are some of these functions.

The read function

When you use getLine, you get back strings. If the user is actually entering something else type of thing, like a number, you'll have to explicitly convert the string to that other type. So, the following is incorrect code:

isUserInputGreaterThan3_bad = 
    do
      input <- getLine;
      if input > 3
        then putStrLn (input ++ " is greater than 3")
        else putStrLn (input ++ " is NOT greater than 3")

The error you get is:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )

Test.hs:6:17:
    No instance for (Num String)
      arising from the literal `3' at Test.hs:6:17
    Possible fix: add an instance declaration for (Num String)
    In the second argument of `(>)', namely `3'
    In the predicate expression: input > 3
    In the expression:
	if input > 3 then
	    putStrLn (input ++ " is greater than 3")
	else
	    putStrLn (input ++ " is NOT greater than 3")
Failed, modules loaded: none.

A function that parses user input and turns strings into other types of values is (for example, numbers) is read. This is the thing you need to fix the above example. For example:

isUserInputGreaterThan3 = 
    do
      input <- getLine;
      let number = read(input)
      if number > 3.0
        then putStrLn (input ++ " is greater than 3")
        else putStrLn (input ++ " is NOT greater than 3")

If the user enters something that isn't a number, you'll get an exception that looks like:

*Test> isUserInputGreaterThan3
woah
*** Exception: Prelude.read: no parse

IMPORTANT NOTE: Note that we used let = to assign the result from read, not <-. Remember that <- can only be used with certain special functions. read is not one of them.

Another note. In the above code, it is important that we said number > 3.0 rather than number > 3. If we hadn't done that, then the function would work find when the user enters and integer, but it would crash when the user tries to enter a number with a decimal point. The reason is that doing this caused Haskell's automatic type inference system to infer that number supported 'Fractional' operations. And this in turn caused Haskell to choose a variant of the read function that returns something that supports 'Fractional' operations, which is necessary for read to be able to parse reals.

Here's the error you get if you put 3 instead of 3.0:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Test.
*Test> isUserInputGreaterThan3
4
4 is greater than 3
*Test> isUserInputGreaterThan3
3.5
*** Exception: Prelude.read: no parse

How are you supposed to figure out the root cause of the problem for something like this? We haven't yet talked enough about types to be able to explain how to troubleshoot this type of problem, but we'll come back to it later.

The show function

The show function is sort of the opposite of the read function. The show converts various non-string data types to strings:

Prelude> 2
2
Prelude> show 2
"2"
Prelude> "string" ++ 2

<interactive>:1:12:
    No instance for (Num [Char])
      arising from the literal `2' at <interactive>:1:12
    Possible fix: add an instance declaration for (Num [Char])
    In the second argument of `(++)', namely `2'
    In the expression: "string" ++ 2
    In the definition of `it': it = "string" ++ 2
Prelude> "string" ++ (show 2)
"string2"

\newpage

Conditional I/O: a little trick

There's something tricky about doing conditional I/O in Haskell. We haven't run into this yet in the examples, but in case you try to do it yourself I wanted to mention it now. You can skip over this section and come back to it later if you want.

As previously noted, a do block that contains I/O has a return value of type 'IO something' (where the 'something' varies). Since the return type of both the then and the else must be compatible, this means that if you do I/O in one condition and not in the other, you'll get an error about incompatible types.

Symptoms of the problem

For instance, the following incorrect code causes an error:

conditionalPrint bool str =
    if bool
       then print str
       else ()

Here's the error that you get:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )

Test.hs:12:12:
    Couldn't match expected type `IO ()' against inferred type `()'
    In the expression: if bool then print str else ()
    In the definition of `conditionalPrint':
	conditionalPrint bool str = if bool then print str else ()
Failed, modules loaded: none.

What the compiler is saying is something like:

"I noticed that the type of the then clause was IO (), so I assumed that the type of the whole if/then/else clause was IO (). So I expected that the type of the else clause would be IO (), too. But when I calculated the type of the else clause, it turned out to be (). And these two types don't match."

Solution

To fix this (at least in the case of print, which returns type IO ()), you can use the following weird-looking expression, which generates a type of IO () without actually doing any I/O:

( (do {return ()})::IO () )

We'll explain this... later. Here's an example of it in use:

conditionalPrint bool str =
    if bool
       then print str
       else ( (do {return ()})::IO () )

And here's how it looks when you run it (assuming you put it into Test.hs below an appropriate module header):

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Test.
*Test> conditionalPrint True "hi"
"hi"
*Test> conditionalPrint False "hi"
*Test>

\newpage

How to loop using recursion

Here's the analog of a for loop. The following function prints out a count from a to b:

countFromAtoB a b = 
		if a<b
		   then do
		   	print a
		   	countFromAtoB (a+1) b
		   else 
		   	print a

Here's how it looks (assuming that you put the above code into Test.hs, following an appropriate module declaration):

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Test.
*Test> countFromAtoB 3 5
3
4
5

You might be worried that if you did a loop with many iterations this way, the stack would overflow, because as the number of iterations increased, the recursion depth would increase, and the call stack would have to hold more and more stack frames. However, compilers for functional languages notice when the very last operation that a function does is a recursive call to itself, and they convert these structures to iteration in the compiled code. This is called "tail call optimization"9. So don't worry.

Here's the analog of a while loop:

whileUserDoesntEnterQ = 
                do
                  putStrLn "Enter Q to quit"
                  input <- getLine;

                  if (input == "Q")
                    then print "Bye"
                    else whileUserDoesntEnterQ

\newpage

Example: What number am I thinking of?

Now that we know how to do if/then/else, basic console input and output, and loops using recursion, we can do this example10:

module Main where

import Random

main = do
  num <- randomRIO (1, 100)
  putStrLn "I'm thinking of a number between 1 and 100"
  doGuessing num


doGuessing num = do
  putStrLn "Enter your guess:"
  guess <- getLine
  let guessNum = read guess
  if guessNum < num
    then do 
          putStrLn "Too low!"
          doGuessing num
    else if read guess > num
            then do 
                  putStrLn "Too high!"
                  doGuessing num
            else do 
                  putStrLn "You Win!"

Note that randomRIO is one of those functions that uses the <- operator. Since the result of this function will be different each time you call it, you can think of it as a function which returns transient information about some 'outside world'.

\newpage

Basic data structures and functions

Tuples

Tuples are fixed-length, ordered collections of hetrogeneous elements.

To group two values together into an ordered pair, surround them with parentheses, for example11:

Prelude> (5,3)
(5,3)

The values in a pair don't have to be the same type:

Prelude> (5, "hello")
(5,"hello")

To extract the first and the second values from a pair, you could use the predefined functions fst and snd12:

Prelude> fst (5, "hello")
5
Prelude> snd (5, "hello")
"hello"

In addition to pairs, you can construct triples, quadruples, etc13:

Prelude> (1,2,3)
(1,2,3)
Prelude> (1,2,3,4)
(1,2,3,4)

In general, pairs, triples, quadruples, etc are called tuples.

Tuples are flexible because they can put together different data types. They are not equipped to (easily) change their length, however; if you want a variable-length collection of elements, you may be better off with a list.

Note: the functions fst and snd only work on pairs, not triples, etc. We'll see how to extract values from longer tuples later, when we talk about "pattern matching".

Lists

Lists are variable-length, ordered collections of homogenous elements.

To construct a list, surround the elements of the list with square brackets:

Prelude> [5,3,2]
[5,3,2]

To add an element to the front of a list, use a colon (called a "cons" operator, short for "construct" as in "we are constructing a list"):

Prelude> 7:[5,3,2]
[7,5,3,2]

You can build any list by starting with the empty list (written []) and then adding in the elements one by one:

Prelude> 7:5:3:2:[]
[7,5,3,2]

You can make lists of tuples, lists of lists, lists of functions, or whatever. However, the elements of a list must all be of the same type.

Strings are just lists of characters14:

Prelude> 'H':'e':'l':'l':'o':[]
"Hello"

Some basic list functions are:

head: the first element of a non-empty list
tail: all elements __but__ the first element of a non-empty list
length: length of a list
null: True iff the list is empty
++: concatenate two lists
take: make a new list with the first n elements of the input list (see below)
filter: elementwise filter elements out of a list (see below)
map: elementwise transform a list (see below)
..: enumerate from x to y (enumerable lists only!) (see below)
zip: "zip" two lists together into a single list with pairs (see below)

take takes 2 arguments, an integer and a list. It returns a new list with the first n elements of the list. Example:

Prelude> take 3 [3,6,2,5]
[3,6,2]

filter takes two arguments, a boolean function and a list. It returns another list. It applies the boolean function to each element of the list and only those elements which pass the test (that is, those elements for which the function returns True) get added to the return list. For instance15:

Prelude> filter Char.isLower "Hello World"
"elloorld"

map takes two arguments, a function and a list, and returns another list. It applies the function to each element of the input list, and then adds the result of the function to the output list. For instance16:

Prelude> map Char.toUpper "Hello World"
"HELLO WORLD"

If the elements of a list are a type that is __enumerable__ (that is, if their type is a member of the Enum typeclass.. which will be introduced later), then you can use .. as a shorthand when constructing a list, as in the following example. Int and Char (integers and characters) are examples of enumerable types.

Prelude> [1..10]
[1,2,3,4,5,6,7,8,9,10]

To specify a step other than 1:

Prelude> [1,3..10]
[1,3,5,7,9]
Prelude> [1,4..10]
[1,4,7,10]

Example of zip:

Prelude> zip [1,2,3,4,5] ['a','b','c','d','e']
[(1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')]

There is some syntactic sugar for using map and filter on lists. This construct is called a list comprehension. It looks like this:

[expression containing x | x <- inputList, g x]

(except you can use any name in place of x) and it evaluates to:

map (a function representing 'expression containing x') (filter g inputList)

For example17:

Prelude> map toLower (filter isUpper "Hello World")
"hw"
Prelude> [toLower x | x <- "Hello World", isUpper x]
"hw"

You can put more than one filter condition in the list comprehension:

Prelude> let li = [(0,0), (0,1), (1,0), (1,1)]
Prelude> [(x, "hello") | x <- li, fst x == snd x, fst x < 1]
[((0,0),"hello")]

List comprehensions can also be used to combine multiple lists in complex ways. Suppose18 you want to create a list of pairs, one for each point where 0 < x < 2, 0 < y < 3, and x <= y.

Prelude> [(x,y) | y <- [0..5], x <- [0..5], x <= 2, y <= 3, x <= y]
[(0,0),(0,1),(1,1),(0,2),(1,2),(2,2),(0,3),(1,3),(2,3)]

Latter comma-separated clauses can use the names assigned in former clauses:

Prelude> [(x,y) | x <- [0..2], y <- [x..3]]
[(0,0),(0,1),(0,2),(0,3),(1,1),(1,2),(1,3),(2,2),(2,3)]

The order of the comma-separated clauses in the second part matters:

Prelude> [(x,y) | y <- [x..3], x <- [0..2]]

<interactive>:1:15: Not in scope: `x'
Prelude> [(x,y) | y <- [0..3], x <- [0..y], x <= 2]
[(0,0),(0,1),(1,1),(0,2),(1,2),(2,2),(0,3),(1,3),(2,3)]

Note that the order of the elements in the resulting list is different from the way it was when x <- came before y <-.

Exercises

Exercise 119 Use map to convert a string into a list of booleans, each element in the new list representing whether or not the original element was a lower-case character. That is, it should take the string "aBCde" and return [True,False,False,True,True].

Exercise 220 Use the functions mentioned in this section (you will need two of them, plus Char.isLower) to compute the number of lower-case letters in a string. For instance, on "aBCde" it should return 3.

Exercise 321 Write a function that takes a list of pairs of length at least 2 and returns the first component of the second element in the list. So, when provided with [(5,'b'),(1,'c'),(6,'a')], it will return 1.

Some simple functions

id: the identity function; id x = x
==: equality
/=: inequality
not: logical negation
&&: logical AND
||: logical OR
and: input: a list of booleans. output: True iff they are all True
or: input: a list of booleans. output: True iff any of them are True

\newpage

More syntax

Temporary function definitions with let and where

You can temporarily define a symbol using let ... in. Example22:

roots a b c =
    let det = sqrt (b*b - 4*a*c)
    in ((-b + det) / (2*a),
         (-b - det) / (2*a))

You can define multiple symbols at once like this23:

roots a b c =
    let det = sqrt (b*b - 4*a*c)
        twice_a = 2*a
    in ((-b + det) / twice_a,
         (-b - det) / twice_a)

Another construct which is like let except that it goes in different places is where. where goes after the block of code rather than before it24 25:

roots a b c =
    ((-b + det) / (2*a), (-b - det) / (2*a))
    where det = sqrt(b*b-4*a*c)

You can use let or where to temporarily redefine something which was already defined outside of the code block. Outside of the code block, the symbol goes back to its previous definition:

x = 1

y = let x = 2 in
    x*2

z = x*2

In that piece of code, you end up with x=1, y=4, z=2:

Prelude> let x=1
Prelude> let y=let {x = 2} in x*2
Prelude> let z=x*2
Prelude> x
1
Prelude> y
4
Prelude> z
2

You can have both a let and a where surrounding the same piece of code:

z = 
    let x = 2 in
    x*y
    where y = 3

In that example, z=6.

If you tried to define the same symbol in both the let and the where clause, the let clause takes precedence. But don't do that.

Another name for this sort of "temporary definition" is "local binding". Another name is "temporary declaration". When a local definition takes precedence over a definition in an outer scope, it is said that the local definition "shadows" the other definition.

Since (outside of a do block, Haskell doesn't have26 sequential statements, let and where often substitute for what in other languages would be a list of sequential variables assignments. Example:

vectorLength x y z =
  let
    xSq = x^2
    ySq = y^2
    zSq = z^2
    sumOfSquares = xSq+ySq+zSq
    ans = sqrt(sumOfSquares)
  in ans

\newpage

Example: Search for a path between two nodes in a graph

In this example, edgeList represents the list of edges in a directed graph, represented as a list of pairs, where the first element of each pair is the edge source, and the second element is the target.

For example, [(0,1), (0,2), (2,3), (3,4)] represents the graph with vertices 0,1,2,3,4 which looks like:

\begin{graph} rankdir = "LR"; size="2,2" 0 -> 1 0 -> 2 2 -> 3 3 -> 4 \end{graph}

Here's the source code:

search edgeList src dst =
  if src == dst
    then
       [src]
    else
       searchEdgesOriginatingAtSrc edgeList src dst

searchEdgesOriginatingAtSrc edgeList src dst =
  if (null edgeList)
    then []
    else let 
      currentEdge = head edgeList
      currentEdgeSrc = fst currentEdge
      currentEdgeDst = snd currentEdge
    in
     if src == currentEdgeSrc
        then let searchResult = search edgeList currentEdgeDst dst in
           if not (null searchResult)
              then src:searchResult
              else searchEdgesOriginatingAtSrc (tail edgeList) src dst
        else searchEdgesOriginatingAtSrc (tail edgeList) src dst

Here's how it looks:

Prelude> :l Test
[1 of 1] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Test.
*Test> let el = [(0,1), (0,2), (2,3), (3,4)]
*Test> search el 0 3
[0,2,3]
*Test> search el 0 4
[0,2,3,4]
*Test> search el 4 0
[]

We'll come back to this example27 a few times and rewrite it in various ways.

One alternative way that we can write this is to use a where clause to define searchEdgesOriginatingAtSrc within the scope of search, in which case we don't have to pass in the arguments src and dst:

search edgeList src dst =
  if src == dst
    then
       [src]
    else
       searchEdgesOriginatingAtSrc edgeList

  where 
    searchEdgesOriginatingAtSrc edgeList =
      if (null edgeList)
        then []
        else let 
          currentEdge = head edgeList
          currentEdgeSrc = fst currentEdge
          currentEdgeDst = snd currentEdge
         in
           if src == currentEdgeSrc
               then let searchResult = search edgeList currentEdgeDst dst in
                 if not (null searchResult)
                    then src:searchResult
                    else searchEdgesOriginatingAtSrc (tail edgeList)
               else searchEdgesOriginatingAtSrc (tail edgeList)

\newpage

Even more syntax

case expressions

Example:

dayStrExpand abbrev =
    case abbrev of
      "M" -> "Monday"
      "T" -> "Tuesday"
      "W" -> "Wednesday"
      "Th" -> "Thursday"
      "F" -> "Friday"
      "Sat" -> "Saturday"
      "Sun" -> "Sunday"
      _ -> "(INVALID DAY ABBREVIATION!)"

The underscore is the "default" rule that fires if none of the others do. In Haskell the underscore represents a wildcard that matches anything.

Just like with then and else, each expression of the case has to return compatibly-typed values (so that the overall case expression has a sensible return type).

"undefined"

There is a special value in Haskell called undefined. Example:

dayStrExpand abbrev =
    case abbrev of
      "M" -> "Monday"
      "T" -> "Tuesday"
      "W" -> "Wednesday"
      "Th" -> "Thursday"
      "F" -> "Friday"
      "Sat" -> "Saturday"
      "Sun" -> "Sunday"
      _ -> undefined

If you try to print out an undefined value, it results in an exception. Example:

Prelude> let x = undefined
Prelude> let y = x + 1
Prelude> print y
*** Exception: Prelude.undefined

Sometimes people use the symbol \bot (read "bottom"; if you are reading this in HTML, "bottom" looks like an upside-down 'T') to refer to Haskell's "undefined".

Piecewise function definitions

Like in math, you can define functions piecewise. This is much like a case statement. Example:

f 0 = 0
f 1 = 2
f 2 = 5

Note that in that example, the piecewise definition doesn't cover all of the possible cases. If you try to evaluate the function on an undefined case, you'll get an exception:

Prelude> let {f 0 = 0; f 1 = 2; f 2 = 5} 
Prelude> f 0
0
Prelude> f 1
2
Prelude> f 2
5
Prelude> f 3
*** Exception: <interactive>:1:5-29: Non-exhaustive patterns in function f

In a source code file, the different parts of the function definition don't even have to be next to each other. If you are doing this interactively in ghci, however, they all have to be in the same let, because otherwise the latter definitions overwrite the previous ones:

Prelude> let f 0 = 0
Prelude> let f 1 = 2
Prelude> f 1
2
Prelude> f 0
*** Exception: <interactive>:1:4-10: Non-exhaustive patterns in function f

Prelude> let {f 0 = 0;f 1 = 2;}
Prelude> f 1
2
Prelude> f 0
0

If multiple definitions match the same situation, the first one takes precedence:

Prelude> let {f 0 = 0; f 0 = 1}

<interactive>:1:5:
    Warning: Pattern match(es) are overlapped
	     In the definition of `f': f 0 = ...
Prelude> f 0
0

A more interesting example:

Prelude> let {f 0 = 0; f x = 1/(-x)}
Prelude> f (-10)
0.1
Prelude> f (-0.000001)
1000000.0
Prelude> f (0.000001)
-1000000.0
Prelude> f (10)
-0.1
Prelude> f 0
0.0

Guards

You can associate arbitrary conditions with each "piece" of a piecewise defined function using "guards". To do this, you write something like

functionName argument
     | condition1 = (function definition when condition1 is true)
     | condition2 = (function definition when condition2 is true)
     ...
     | otherwise = (function definition when none of the conditions are true)

where the condition1, condition2, etc. are boolean expressions. When the function is evaluated, all of the guards are tried in order until one matches.

For example:

abs x
    | x >= 0 = x
    | x < 0 = -x

or equivalently:

abs x
    | x < 0 = -x
    | otherwise = x

Pattern matching

Rather than using fst and snd to extract values from pairs, you can use "pattern matching" in the function definition instead:

Prelude> let f (a,b) = 2*a+b
Prelude> f (5,7)
17

You can do a similar thing with lists, in place of using head and tail:

Prelude> let f (x:xs) = do print x
Prelude> let g (x:xs) = do print xs
Prelude> f [5,7,3]
5
Prelude> g [5,7,3]
[7,3]

In fact, you can do this for any data type for which you have a constructor28; you simply write the constructor in the function arguments section, and when the function is called Haskell will bind the symbols used in the constructor expression to the appropriate values in the data structure.

You can combine this with piecewise function definition as a substitute for if/then/else expressions on parts of the data structure:

pairIs01Or10 (0,1) = True
pairIs01Or10 (1,0) = True
pairIs01Or10 (a,b) = False

In ghci:

Prelude> let pairTest (0,1) = True; pairTest (1,0) = True; pairTest (a,b) = False
Prelude> pairTest (0,0)
False
Prelude> pairTest (0,1)
True
Prelude> pairTest (1,0)
True
Prelude> pairTest (1,1)
False
Prelude> pairTest (1,2)
False

However, all of the possible inputs to the piecewise-defined function must share some type (later on we'll talk about how to get around this restriction using "typeclasses"29). The following code is illegal because it implies that function f should accept both pairs and lists:

f (a,b) = do print "pair"
f (x:xs) = do print "list"

Here's the error you get in ghci:

Prelude> let f (a,b) = do {print "pair"}; f (x:xs) = do {print "list"};

<interactive>:1:36:
    Couldn't match expected type `(t, t1)' against inferred type `[a]'
    In the pattern: x : xs
    In the definition of `f': f (x : xs) = do print "list"

You can also use the wildcard _ in patterns (instead of using "dummy variables" for values that you don't need; this makes the code easier to read):

fstIs10 (10,_) = True
fstIs10 _ = False

In ghci:

Prelude> let fstIs10 (10,_) = True; fstIs10 _ = False
Prelude> fstIs10 (10,0)
True
Prelude> fstIs10 (9,0)
False

\newpage

Example: Quicksort

qsort []     = []
qsort (x:xs) = qsort (filter (< x) xs) ++ 
               [x]                     ++ 
               qsort (filter (>= x) xs)

Or, equivalently, using list comprehensions instead of filter:

qsort []     = []
qsort (x:xs) = qsort [e | e<-xs, e<x] ++ 
               [x]                    ++
               qsort [e | e<-xs, e>=x]

I must say; out of the programming languages that I know so far, Haskell is definitely the best one for expressing quicksort.

Example: Revisiting path search in a graph

With the new syntax that we just covered, we can rewrite the path search algorithm much more concisely:

search edgeList src dst
  | src == dst = [src]
  | otherwise = searchEdgesOriginatingAtSrc edgeList

  where 
    searchEdgesOriginatingAtSrc [] = []
    searchEdgesOriginatingAtSrc ((currentEdgeSrc, currentEdgeDst):restOfEdges)
      | src == currentEdgeSrc =
        case search edgeList currentEdgeDst dst of
          []          -> searchEdgesOriginatingAtSrc restOfEdges
          successList -> src:successList
      | otherwise = searchEdgesOriginatingAtSrc restOfEdges

\newpage

Basic operations on functions

Anonymous function definitions

Sometimes you want to define a once-off function, that is, a function that will only be used once. In this case, for clarity, you may want to just define the function right where it will be used rather than defining it elsewhere in the source code. Also, you might not want to bother to give the function a name.

To do this, you write something like:

(\argument1 argument2 -> (the definition of the function))

For example, recall the addition example:

addition a b = a+b

The equivalent anonymous function would be written like this:

(\a b -> a + b)

This whole expression can then be used in place of whereever you would have used the word addition before. For example:

Prelude> (\a b -> a+b) 2 3
5
Prelude> let addition a b = a+b
Prelude> addition 2 3
5

Such nameless, once-off functions are called "anonymous" functions.

The example created an anonymous function with 2 arguments, but you can make anonymous function with just 1 argument, or with more than 2 arguments, in the same way.

They are useful as arguments to functions that take other functions as arguments, for example, map:

Prelude> map (\x -> x + 1) [1,2,3]
[2,3,4]

Converting prefix functions to infix and vice versa

You can use any prefix function as if it were an infix function, and vice versa.

To use a prefix function as an infix function, put backquotes around it. To use an infix function as a prefix function, put parentheses around it. Example:

Prelude> let addition a b = a+b
Prelude> addition 2 3
5
Prelude> 2 + 3
5
Prelude> 2 `addition` 3
5
Prelude> (+) 2 3
5

We'll be seeing a lot of this because infix functions have to be surrounded by parentheses if you are giving them as inputs to higher-order functions.

Higher-order functions

Higher-order functions are functions that operate on other functions. We've already seen some functions that take other functions as input; for instance, map and filter. Now I'll give some examples of functions that yield other functions as output.

Function composition

'.' is the function composition operator. It takes two functions as input and yields a new function as output. Specifically, the expression f . g yields a function which first does function g, and then does function f on the output of g. For example, if you have a list of numbers, and you want to first take their absolute value and then compute their square roots, you could either do it like this (without function composition):

Prelude> let li = [4.0,-9.0,-16.0,4.0]
Prelude> let li2 = map abs li
Prelude> let result = map sqrt li2
Prelude> li2
[4.0,9.0,16.0,4.0]
Prelude> result
[2.0,3.0,4.0,2.0]

Or you could do it like this (using function composition):

Prelude> let li = [4.0,-9.0,-16.0,4.0]
Prelude> let result = map (sqrt . abs) li
Prelude> result
[2.0,3.0,4.0,2.0]

The two input functions to '.' must each take a single argument.

flip

flip take a function as input and returns a new function as output. Specifically, it takes as input some function f, where f takes two arguments, and it returns a new function f' with the order of the arguments reversed (that is, (flip f) a b = f' a b = f b a). For example:

Prelude> (/) 5 10
0.5
Prelude> (/) 10 5
2.0
Prelude> (flip (/)) 5 10
2.0
Prelude> (flip (/)) 10 5
0.5

Note: for this to work, you have to put parentheses around '/'. (flip /) doesn't work30.

Partial function application

OK, this may seem a bit weird.

If a function takes a bunch of arguments, you can create a new function by giving values to some of those arguments, and leaving the rest unbound.

For example, addition takes two arguments. If we fix one of those arguments permanently at 1, and leave the argument open, then we have created a function (call it plusOne) that takes one argument, and adds 1 to that argument. Here's how to do it:

Prelude> ((+) 1) 2
3
Prelude> ((+) 1) 5
6
Prelude> let plusOne x = ((+) 1) x
Prelude> plusOne 2
3
Prelude> plusOne 5
6

(in other words, the function plusOne has the same effect as the function (\x -> x + 1) which we saw in an earlier example)

(note: remember that putting parentheses around an infix function makes it a prefix function. So, (+) is the prefix version of the + operator.)

Here's another example. In this example, we take map and fix its first argument (the mapping function) to plusOne -- but we leave the second argument open. The result is a specialized version of map -- a function which takes a list and returns the result of incrementing each element in the list by 1:

Prelude> let plusOne x = ((+) 1) x
Prelude> let incrementListElements = map plusOne
Prelude> incrementListElements [1,2,3]
[2,3,4]

We could have just used ((+) 1) instead of plusOne, since those expressions are equivalent (both of them represent a function which takes one argument and returns that argument plus one):

Prelude> let incrementListElements = map ((+) 1)
Prelude> incrementListElements [1,2,3]
[2,3,4]

You can do this with functions that take more than 2 arguments, too. Here's a function which tests to see if its third argument lies in between its first and second arguments:

Prelude> let isZInBetweenXandY x y z = (x < z) && (z < y)
Prelude> isZInBetweenXandY 0 10 5
True
Prelude> isZInBetweenXandY 0 10 15
False
Prelude> isZInBetweenXandY 0 10 (-1)
False

If we fix the first argument to 5, we get a function (call it isZInBetween5andY) which takes two arguments. The first argument of the new function is like the second argument of the original function, and the second argument of the new function is like the third argument of the original function:

Prelude> let isZInBetween5andY = (isZInBetweenXandY 5)
Prelude> isZInBetween5andY 10 7
True
Prelude> isZInBetween5andY 10 15
False
Prelude> isZInBetween5andY 10 4
False

We could also fix both of the first two arguments of isZInBetweenXandY to 5 and 10, respectively, yielding a function waiting for just one more argument (call it isZInBetween5and10):

Prelude> let isZInBetween5and10 = (isZInBetweenXandY 5 10)
Prelude> isZInBetween5and10 7
True
Prelude> isZInBetween5and10 15
False
Prelude> isZInBetween5and10 4
False

We can achieve the same result by starting with isZInBetween5andY and fixing its first argument:

Prelude> let isZInBetween5andY = (isZInBetweenXandY 5)
Prelude> let isZInBetween5and10 = (isZInBetween5andY 10)
Prelude> isZInBetween5and10 7
True
Prelude> isZInBetween5and10 15
False
Prelude> isZInBetween5and10 4
False

And there's no need to actually name the partially applied function (we just did that for readability):

Prelude> (isZInBetweenXandY 5 10) 7
True
Prelude> (isZInBetweenXandY 5 10) 15
False
Prelude> (isZInBetweenXandY 5 10) 4
False

In general, if there is a function f which takes __n__ arguments, you can give it just the first __i__ arguments, enclose the whole things in parenthesis, and you have a function which is like f but with the first __i__ arguments fixed, and which is still waiting to accept the other (__n__ - __i__) arguments.

Note that the syntax for normal ("total") function evaluation is just a special case of partial evaluation --- if you "fix" ALL of the arguments of a function, then it becomes a constant function, i.e. a value.

There's some syntactic sugar for partially applying infix functions. Example:

Prelude> (/2) 10
5.0
Prelude> (2/) 10
0.2

Thinking about the anonymous function syntax in terms of partial function application

Partial function application provides a neat way to think of the anonymous function syntax. If you know about "lambda calculus" from theoretical computer science, that's what we're about to discuss (if you haven't heard of lambda calculus, don't worry about it, we're going to explain the relevant part).

Let's say you define addition as an anonymous function:

(\x y -> x + y)

You can think of what happens when this expression encounters a value in the following way. The x y on the left (in between the backslash and the arrow) means that this expression "wants to eat" two values. When you feed it a value, it takes the leftmost variable in its "want-to-eat list", removes it from the list, and substitutes the value you fed it for that variable in the expression on the right. If the "want-to-eat list" is empty, then we remove the symbols \ -> also.

The way that you feed it a value is to put a value on its right. When it eats the value, the value disappears from the right. For instance, if you feed it a 3, here's what happens:

(\x y -> x + y) 3
= (\y -> 3 + y)

Even when we fully apply the function, we can look at it as a succession of partial applications:

(\x y -> x + y) 3 5
= (\y -> 3 + y) 5
= (\ -> 3 + 5)
= (3 + 5)

So, you see that the core of applying a function is really the substitution operation, where we substitute the arguments being given to the function for the argument symbols in the function body. Partial application just means that we substitute in for of the function arguments in the function body but also leave some other function arguments the way they were.

In this way we can see how normal function application can always be restated in terms of a succession of partial function applications. Keeping this in mind will make the type notation for functions make more sense.

\newpage

Type basics

Every expression in Haskell is assigned a type.

As we mentioned before, you can use ghci's :t command to find out the type of a given expression. The results is called a "type signature". For example:

Prelude> :t 'c'
'c' :: Char
Prelude> :t "a string"
"a string" :: [Char]

This means that the type signature of the expression 'c' is Char and the type signature of the expression "a string" is [Char].

[Char] means a list whose elements are of type Char.

The type signature of tuples is what you'd expect:

Prelude> :t ('a',"hi")
('a',"hi") :: (Char, [Char])

Types can be nested in all sorts of crazy ways:

Prelude> :t [("hello", ["goodbye"])]
[("hello", ["goodbye"])] :: [([Char], [[Char]])]

Type classes

If you ask for the type of a number, you get something unexpected:

Prelude> :t 1
1 :: (Num t) => t

This is our first encounter with a type class. Num is a type class. A number of types can be grouped together into a "type class" (the word "class" here is being used similar to the word "set" in mathematics). For any given type class, some types are part of the class, and the rest aren't.

The motivation for the invention of type classes is that a type class can be associated with a list of functions that must be implemented for each type in the class. So, type classes are a lot like interfaces in Java. If you know that a given type is part of a given type class, then you know that certain specific functions are defined with respect to data structures of that type. To give a concrete example, any type in the type class Num must support the addition operator (+). Num includes the types Int (integer) and Float, among other types. So, if you find yourself with two expressions of type Num, then you are guaranteed31 to be able to add them together.

Type classes are sort of related to "classes" in object-oriented languages, although not as closely as to interfaces. In object-oriented languages, "objects" are individual instantiations of data structures, and objects are instances of classes. By contrast, the instances of a type class are __types__.

Type classes often appear in type signatures. Here's what they mean. We saw that the type signature of 1 is (Num t) => t. The t here is just a variable; it may as well have been a or x.

1 :: (Num t) => t might be read as saying something like, "The type of 1 is some type t, where t belongs to the type class Num". What that means is that the type of 1 is undetermined except that it is constrained to be some type within the Num typeclass.

So, when you enter a numerical constant in Haskell, it enters the world not as an Int or a Float or anything else; it starts out being of undetermined type. Strange but true. However, it is constrained to be some type within the Num typeclass, which is why you are allowed to apply operations like addition to it.

Type signatures of functions

Example:

Prelude> :t not
not :: Bool -> Bool

We'll go through this piece by piece.

The not :: means that the expression not is the thing whose type we are examining.
The Bool -> Bool means a function which takes something of type Bool as input, and returns something of type Bool.

Next we'll look at a function that returns an output of a different type than its input:

Prelude> :t length
length :: [a] -> Int

This says that length is a function whose input must be a list of type a, where a can be any type, and whose output has type Int. So, in other words, the input to length must be a list, but it doesn't matter what type of element is in the list, and the output of length will have type Int.

Now we'll look at a function whose type signature involves a type class.

Prelude> :t abs
abs :: (Num a) => a -> a

We'll go through this piece by piece.

The (Num a) => means, "in the following type signature, __a__ must be in the type class Num".
The a -> a means a function which takes something with type a as input, and returns something of type a.

Putting it together, this says that abs will accept as input a single argument of any type, provided that that type is in the Num type class, and that the output of abs will be of the same type as the input.

Note that a -> a doesn't mean that abs returns the same __value__ that it was given! It just means that whatever it returns will have the same __type__ as the thing that it was given.

Type signatures of functions which take multiple inputs

Now let's look at boolean AND (&&), which is a function which takes two inputs of type Bool, and produces one output of type Bool

Prelude> :t (&&)
(&&) :: Bool -> Bool -> Bool

Weird, huh? The first thing to say about this is that there are implicit parentheses which are left out of this expression. It is assumed for type signatures that the arrows associate to the right unless there are parentheses that say otherwise. If we write the parentheses explicitly it would be:

(&&) :: Bool -> (Bool -> Bool)

Now, the key to understanding this is to think of partial function application. Imagine that we had the expression:

(&&) True False

As noted before, when a function takes multiple arguments, normal function application can always be thought of as a successive series of partial function applications. So, the above expression is equivalent to:

((&&) True) False

where ((&&) True) means the function that you get if you start with (&&) and then fix its first argument to True, leaving the second argument unbound.

So, think of how (&&) behaved right there. It started out as a function that took two arguments. But then we fed it an argument, and as a result we got a function that took only one argument.

So, (&&) can equivalently be thought of as a function that takes __one__ argument as input, and then produces another function as output.

What is the type of the function ((&&) True)? Well, this function takes one argument of type Bool, and returns something of type Bool. So, it must have type Bool -> Bool. And indeed we see that this is right:

Prelude> :t ((&&) True)
((&&) True) :: Bool -> Bool

So, going back to the original function (&&), what type is it (if we consider what happens when we partially apply just a single argument)? Well, it takes as input something of type Bool, and then it gives us back a new function which has type Bool -> Bool. So, (&&) is a function that goes from type Bool to type Bool -> Bool. So, its type is therefore:

Bool -> (Bool -> Bool)

And, remembering that the arrows are assumed to associate to the right, we see that this is equivalent to the reported type signature for (&&):

Bool -> Bool -> Bool

Generalizing our reasoning

In general, the procedure for figuring out the type of something like a function is:

Figure out the type of the first argument that it takes.
Say that that type was A. Now you know that the type of the whole thing is A -> X, where X is a placeholder for the type of the thing that you get when you apply the first argument.
Now recurse; use the same procedure to figure out the type of X.

In the previous example, our reasoning followed that procedure:

The first argument that (&&) takes is type Bool
Therefore, the type of (&&) must be Bool -> X, where X is the type of an expression that you get from applying an argument to (&&) (an example of such an expression is ((&&) True)).
Now we recurse. What is the type of ((&&) True)?
Well, the first argument that ((&&) True) takes is type Bool, and it returns Bool, so it has type Bool -> Bool
Substituting X = Bool -> Bool into Bool -> X, we see that the type of (&&) is Bool -> (Bool -> Bool)

Types of functions that take functions as arguments

Example:

Prelude> :t map
map :: (a -> b) -> [a] -> [b]

Including the implicit parentheses, this type signature could be written:

(a -> b) -> ([a] -> [b])

For example, consider

Prelude> let showInt :: Int -> String; showInt i = show i
Prelude> :t showInt
showInt :: Int -> String
Prelude> map showInt [2,5,7] 
["2","5","7"]
Prelude> (map showInt) [2,5,7]
["2","5","7"]
Prelude> :t (map showInt)
(map showInt) :: [Int] -> [String]

We can see that map takes one argument, which is a function of type a -> b (in our case a is Int and b is String), and yields another function with type [a] -> [b] (in our case, this other function is (map showInt)). This explains why map's type signature is:

(a -> b) -> ([a] -> [b])

Explicitly specifying types

In order to explicitly specify the type of some expression, write something like:

expression :: typeName

for example:

1 :: Int

is an expression for 1, with type Int.

In order to explicitly specify the type of some named function, write something like this:

square :: Int -> Int
square x = x*x

When you explicitly specify the type for some expression, it must match (i.e. be more specific than) the type that Haskell would infer for the same expression.

Infix operators must be surrounded by parentheses (i.e. converted to prefix form) in type declarations.

It is recommended32 that you explicitly specify the types of the outermost functions in your programs.

Type synonyms

The type command lets you define syntactic sugar for oft-used types. For example, if you were writing a program in which you often passed around lists of 3D coordinates, where each coordinate was a Double, then you might get tired of typing [(Double,Double,Double)]. So you could do:

type List3D = [(Double,Double,Double)]

An oft-used type synonym is:

type String = [Char]

\newpage

The IO types

When you take the type of a function which does input or output, it yields something of type IO something, where "something" is the type that you would "expect" that function to yield. Examples:

Prelude> :t putStrLn
putStrLn :: String -> IO ()
Prelude> :t getLine
getLine :: IO String

You'd expect that putStrLn wouldn't return anything (in some languages you might call this a return type of 'void', but in Haskell we'd say it's '())'). You'd expect that getLine would return a string.

Even some functions which don't seem to actually do IO have this in their type:

Prelude> :m Random
Prelude Random> :t randomRIO
randomRIO :: (Random a) => (a, a) -> IO a

What's happening here? Isn't the result of getLine a string? Isn't the result of randomRIO a number?

No, it isn't. Note that whenever we actually used these functions in examples, we used them within do blocks, and we used the <- operator to assign their result. For example:

Prelude> do {putStrLn "Enter a name"; name <- getLine; putStrLn ("Hi " ++ name)}
Enter a name
Bayle
Hi Bayle

You can think of the <- operator as removing the "IO" from the type of a value. Whenever you want to use a function with an "IO" in its return type, you'll need to use the <- operator if you want to get at the "actual" return value.

Passing values outside of a do block

The <- operator only works within do blocks. How do you pass a value out of the do block? That is, how do you send a value from somewhere inside of a do block to somewhere outside of the do block, so that that value can be used somewhere else?

Well, you can send values out of do blocks containing I/O, but those values are required to be one of those IO something types! You're not allowed to smuggle any "naked" values out of an I/O-containing do block.

A do block returns the value of the expression on its last line33. If the last expression itself yields an IO something type, all is well:

Prelude> :t do {getLine}
do {getLine} :: IO String

If you try to return something of a "naked type" out of a do block containing I/O, however, there's a type error34:

Prelude> :t do {putStrLn "hi"; True}

<interactive>:1:19:
    Couldn't match expected type `IO t' against inferred type `Bool'
    In the expression: True

Incidentally, it is illegal for the last line to be a <- expression; these things are so weird that they're technically not even considered to be expressions:

Prelude> :t do {name <- getLine}
<interactive>:1:4: The last statement in a 'do' construct must be an expression

return

If you want to return something from a do block containing I/O, and you are willing to submit to the indignity of changing its type from something to IO something, you can use the return construct:

Prelude> :t do {putStrLn "hi"; return True}
do {putStrLn "hi"; return True} :: IO Bool

return is kinda the reverse of <-; it changes a value of type something into a value of type IO something.

Unlike in some other programming languages, return does not mean anything like "exit this function and return this value".

Is there any way to get the IO out of the type permanently?

No. The reason is that Haskell wants to clearly segregate all computation which is causally linked to the outside world35. This is related to monads.

We'll explain why soon, but in order to have the background to do that we need to explain "referential transparency".

\newpage

Referential transparency

A referentially transparent expression is one that can be replaced with its value without changing the behavior of the program. So, referential transparency means that, if you compute the function f(x) and find that f(x) = y, then every time you encounter f(x) in the future, you can just replace it with y.

This may sound like it is always the case, but here's some examples where it's not.

Failure of referential transparency: mutable variables

Consider the Python program:

a=1;
c=a*5;

a=5;
d=a*5;

In Python, this results in c=5 and d=25. But if we assumed that we had referential transparency, then as soon as the interpreter saw "a=1", it would be allowed to substitute "1" for every occurence of the expression "a" from then on. And then at the end we'd have d=1*5 = 5.

So, in this case, if (after the first assignment to a) we had replaced every instance of the expression "a" with its value, then we would have changed the behavior of the program.

In general, if you ever find an expression that can return different things at different times, then that expression is not referentially transparent (in this case, "a" was shown to be such an expression).

Failure of referential transparency: hidden state

Consider this Perl program:

srand(0);
print "\n";
print rand();
print "\n";
print rand();

The function rand(), which calls a pseudorandom number generator, returns a different result upon each call, even though it was passed the same arguments each time. This is because the result of the rand() function doesn't just depend on the arguments that you pass it; it also depends on some __hidden state__ which is altered upon each call.

Failure of referential transparency: input

In Python, the input() function reads input from the user. If the interpreter evaluated input() once, and then after that, whenever it saw input(), it just substitued the return value it got the first time without actually calling input() again, that would change the program's behavior.

Failure of referential transparency: output

In Python, the print command doesn't return anything. So, it doesn't return different things at different times. Still, it is not referentially transparent. If the interpreter just replaced every print command with nothing, that would change the program's behavior.

Side effects

In the last example, referential transparency failed not because the return value of an expression was not constant, but because print has a side-effect. Side-effects are when the evaluation of a function has a "side-effect" beyond the computation of the return value. For instance, in Perl, consider the statement:

$a = (print "hi");

This statement does two kinds of things. First, it calls the print function, which computes a return value, which gets put into variable a. But the call to print also has a __side-effect__: the argument is printed to the console.

Referential transparency: summary

The essence of referential transparency is that functions don't care about anything but their arguments and their return value. The point is that referentially transparent expressions and functions are just normal expressions and functions like you find in mathematics; they are fixed mappings from inputs to outputs. None of this funny business that you see in computer programming like x=x+1 or expressions like print that __do__ things when you evaluate them; if they're referentially transparent, they're just mathematical expressions.

\newpage

I/O, do blocks, and monads

The IO types segregates non-referentially transparent computations

Haskell tries to keep referentially transparent expressions segregated from other code. In this way, it's clear which expressions can be relied upon to be referentially transparent, and which cannot. In Haskell, we call referentially transparent functions "pure functions", and we call other functions "actions".

This segregation is achieved by marking actions as type IO something.

Actions are a type of value; they don't "do" anything right away

In Haskell, just as pure functions are first-class objects, so are actions. For example, the type of putStrLn is String -> IO (). This means that putStrLn takes a String as input and returns something of type IO () as output. IO () is an example of an "IO action". This should be thought of as a representation of the action of printing out that string to the console.

This action can be combined with other actions to make a sequence of actions. For instance, you might combine putStrLn "Line 1" with putStrLn "Line 2" to make a composite action that prints out two lines to the console. The way that you combine these two actions into a third action in Haskell is:

myCompositeAction = do
                      putStrLn "Line 1"
                      putStrLn "Line 2"

In this example, the type of myCompositeAction is also IO (). So, the function myCompositeAction doesn't "do" anything; it merely returns an IO action (which, in this case, happens to be a combination of two smaller actions).

How does anything get done, then? The function main is special. main returns an IO action, and then Haskell actually executes that IO action.

So, you should think of all of your functions which use IO as pure functions which operate on actions. Their job is to prepare the final IO action that will be returned by main and then36 executed by Haskell.

IO actions aren't just predetermined sequences of input and output. Later subactions can conditionally depend on earlier subactions. For example, recall the doGuessing subroutine in our random number guessing program:

doGuessing num = do
  putStrLn "Enter your guess:"
  guess <- getLine
  let guessNum = read guess
  if guessNum < num
    then do 
          putStrLn "Too low!"
          doGuessing num
    else if read guess > num
            then do 
                  putStrLn "Too high!"
                  doGuessing num
            else do 
                  putStrLn "You Win!"

The type of doGuessing is (Ord a, Read a) => a -> IO (). This means essentially that this is a subroutine that takes a single input (a number37), and produces an IO action as an output. So, IO actions can be entire programs.

And main returns an IO action, which is then run. In other words, your program, considered as a whole, IS an IO action.

IO and monads as syntactic sugar for explicit labeling of OutsideWorld influence

We aren't going to explain yet what monads are, but here we will explain what monads are used for in the context of IO38:

I/O operations aren't referentially transparent because they interact with the outside world, yet this interaction isn't reflected in their formal inputs and outputs. For instance, a putStrLn statement doesn't return anything or change any variables, yet it changes the outside world. Similarly, a getLine statement both changes and is changed by the outside world, yet the outside world isn't reflected in its input arguments. So we could substantially improve the situation if we could explicitly pass around a symbol that represents the outside world. Maybe the signatures of putStrLn and getLine should be the following:

putStrLn :: OutsideWorld -> String -> OutsideWorld
getLine :: OutsideWorld -> (OutsideWorld, String)

Instead of writing:

putStrLn "What's your name?"
name <- getLine
putStrLn ("Hello, " ++ name)

we'd write:

main outside = 
  let outside2 = putStrLn outside "What's your name?"
      (outside3, name) = getLine outside2
  in putStrLn outside3 ("Hello, " ++ name)

This correctly models the dependencies of the computation on the various interactions with the outside world. However, it is hard to read. Also, what should the computer do if you give it this?:

main outside = 
  let outside2 = putStrLn outside "What's your name?"
      (outside3, name) = getLine outside2
  in putStrLn outside2 ("Hello, " ++ name)

In that example, we passed outside2 in the last call to putStrLn; but we should have passed outside3.

Haskell address these problems by expressing program I/O using a weird formalism that allows them to essentially follow the idea embodied in these examples (they pass along a symbol representing the outside world) without the programmer having to write it that way. This solution is called monads. Monads can be implemented within Haskell with no special handling from the compiler39 (although it's nice if the compiler optimizes them).

Monads can also be used for other things besides I/O.

The do block notation is syntactic sugar for monads. do notation allows us to write code that looks like a sequence of imperative I/O commands, and then internally transforms it into a set of interdependent expressions which represents dependencies on the outside world by passing along an I/O symbol40.

So, in summary: programs that communicate with the outside world are first-class objects called "IO actions". When you write a function that does I/O, its return type will be an IO action. do blocks are the syntax that Haskell uses for composing actions together into larger actions. The return type of a do block is an action. Under the hood, do blocks are syntactic sugar for a construct called "monads", which we haven't explained yet. But, in the case of IO actions, monads can be thought of as a pattern for writing code so that, technically, a symbol representing the "outside world" is passed to and from our IO functions, without cluttering up our function calls by actually writing that symbol.

\newpage

Detour: What is Haskell?

(or: words that programming language nerds use to describe Haskell)

Haskell is a functional programming language.

When I heard that, I assumed that that was because Haskell lets functions be "first-class objects", and because Haskell provides you with cool higher-order operators to manipulate functions with. Both of these things are true, and I think both tend to go together with functional languages, but they don't get to the heart of the meaning of the word 'functional'.

Functional languages are often contrasted with imperative languages. Imperative languages tell the computer what to do. They tend to be step-by-step lists of instructions. To give an example in Python,

def factorial(k):
   x = 1
   i = 1
   while (i <= k): 
      x = x*i
      i = i+1
   return x


print(factorial(5))

The function factorial says something like:

__Set the contents of memory location "x" to 1. Then set the contents of memory location "i" to 1. Then, as long as the contents of memory location "i" is less than or equal to "k", do the following: multiply the contents of memory location "x" by the contents of memory location "i", then increment the contents of memory location "i".__

}}}By contrast, in Haskell, the focus is not on telling the computer a sequence of instructions, but rather on defining the functions that you want computed. Functions are defined much as they are in mathematics. Here's how you might define a function <code>factorial</code> in Haskell:

{{{
factorial 0  = 1
factorial n = n*(factorial (n-1))

This sort of thing is Haskell's focus, and that is the reason that Haskell is called "functional". In Haskell, running a program corresponds to evaluating a function.

But Haskell has an imperative side too. It has to; a mathematical function just defines a mapping from inputs to outputs. You can define all the functions you want, and the computer might then be able to compute the output of any of those functions for any particular input, but if that was all you did, the computer would just sit there, because you haven't told it to __actually__ evaluate any particular function on any particular input (like a C program with lots of function definitions but an empty main()). Also, mapping inputs to outputs doesn't provide a notion of __actions__ (example: the action of printing something to the screen).

So, just like "imperative" languages, Haskell does provide a facility for giving the computer a sequence of instructions. But the code that does this is clearly separated from the other code42. The imperative commands are found only within do blocks43. Here's a program that is equivalent to the Python program above:

factorial 0 = 1
factorial n = n*(factorial (n-1))

main = do print(factorial 5)

Haskell is a lazy, pure functional language with static, strong typing

Lazy means that expressions don't get evaluated until (and __unless__) they have to be. Contrast with C, a "strict" language; in C, each expression is evaluated as soon as it is encountered.

One neat thing about a lazy language is that you can create expressions which would take up an infinite amount of space or time to completely evaluate. For instance,

Prelude> do {print (take 5 [1..])}
[1,2,3,4,5]

is Haskell for "print out the list consisting of the first five elements of the list [1..]", where [1..] is an infinite list consisting of all of the integers. It runs fine, because Haskell never actually tries to construct the infinite list.

When you define an expression in Haskell, Haskell doesn't actually try to evaluate it; it just remembers, "ok, if I ever need the definition of that expression, here's where to find it". Only when the expression is needed for I/O (or is needed in order to evaluate some other expression that is needed for I/O) does it get evaluated.

Laziness allows you to define infinite data structures, which are often cleaner to define and work with (see section Circular function definitions for infinite data structures, below). L

aziness also allows you to do things like make a lazy database query that, if not lazy, would return a result too large to fit into memory; and then to retrieve the results one-by-one (or in small sets) (note: Haskell's laziness won't automatically make an eager database query into a lazy one, your database framework would have to do that; but this is a good example of a situation in which laziness can be helpful).

In some cases, laziness can make it easier to reuse modular code; for example, if you are doing a search through state space, you can construct the state space lazily and pass it to some other routine, and the other routine can decide to do breadth-first search or depth-first search or what have you. This is more reusable because the construction of the state space is completely modularized from its exploration. This separation also tends to make the code clearer.

See this essay for more on the value of laziness: http://www.cs.chalmers.se/~rjmh/Papers/whyfp.html

Laziness has a price. When you pass around a variable with a lazy value that has not been fully evaluated, it has to keep track of the state of the partial evaluation, which can incur considerable memory overhead. In some cases the programmer must be aware of this sort of thing and plan carefully in order to avoid out of memory errors (see http://www.haskell.org/haskellwiki/Foldr_Foldl_Foldl%27 for an example). There is also some extra CPU processing needed to keep track of this stuff.

Pure means that code in Haskell __(except within "do" blocks44!)__ satisfies __referential transparency__. We'll define45 that in the next section.

Static typing means that type-checking is computed at compile-time rather than at run-time.

Strong typing is used in different ways by different people, but what I mean here is that you can't do something like a typecast in C; that is, you can't tell Haskell to let you get away with type violations.

Pros and cons of Haskell

Since I haven't actually programmed in Haskell much yet (todo), all I can do is repeat what I've heard from others.

Pros

Can change the way you think about programming46
Concise syntax
Higher-order functions combined with lazy evaluation allows one to modularize and reuse code in new ways47
Faster to program48
Extensible49
Good community50
Good for Domain Specific Embedded Languages51
Referential transparency makes program correctness proofs easier52.
Referential transparency means that all control flow dependencies are explicitly represented in the code. In theory I think this means that the compiler might be able to more easily parallelize the code53.

Cons

Hard to debug54
Slower than C55
Hard to optimize56
Hard to learn57

\newpage

Circular function definitions for infinite data structures

Because Haskell has a lazy evaluation strategy, you can use circular definitions in functions to create infinite data structures. Since Haskell is lazy, it will only construct as much of the data structure as it immediately needs. Therefore, Haskell will only go through as many "loops" around the circular definition as it needs to construct the part of the data structure that it immediately needs.

Example: infinite list of ones:

Prelude> let ones = 1 : ones
Prelude> take 5 ones
[1,1,1,1,1]

Example: positive integers:

Prelude> let intsStartingWith n = n :  (intsStartingWith (n+1))
Prelude> let positiveInts = intsStartingWith 1
Prelude> take 5 positiveInts
[1,2,3,4,5]

Example: fibonacci sequence:

Prelude> let fib = 1:1:[a+b | (a,b) <- zip fib (tail fib)]
Prelude> take 7 fib
[1,1,2,3,5,8,13]

Let's talk about how fib works. First, we note that the fib defined above is equivalent to this:

fib = 1:1:fib';
fib' = [a+b | (a,b) <- fib'']
fib'' = zip fib (tail fib)

fib is a list. When Haskell needs the first element of the list, it returns 1. When Haskell needs the second element of the list, it returns 1. When Haskell needs the second element of the list, it notes that the third element of fib is the first element of fib'.

So, now Haskell tries to compute the first element of fib'. The first element of fib' is a+b where (a,b) is the first element of fib''.

We see that the first element of fib'' is the first two elements of fib, wrapped up into a pair. And we already know what those are (1 and 1). So we can compute that the first element of fib'' is (1,1).

And now we can compute that the first element of fib' is 2. And now we can compute that the third element of fib'' is 2.

The process of computating later elements of fib is similar.

\newpage

Custom data types

Custom data structures can be defined using the data keyword. Here's an example that holds a pair of integers:

data PairOfIntsType = PairOfInts Int Int

Here's what this means. First, we have the keyword data. Now, we have the name of a new type. Then an equals sign. Now we define a constructor, which is both a declaration of a data structure, and the name of the function used to construct values of this structure.

This constructor is named "PairOfInts?" and it holds two Ints. It has type "PairOfIntsType?".

If you're familiar with constructors from object-oriented programming, these are not the same sort of thing -- the word is being used in a different way. In object-oriented languages, object constructors are functions that not only create a data structure, but can also contain code that performs initialization on the new data structure. Haskell constructors contain no code. It's easier to think of them as cupboards58 rather than normal functions -- all that you do with them is put data in them, and then take it back out later.

The part in between data and = is sometimes called a "type constructor" -- in this case the part to the right of the = is called a "data constructor" to distinguish them.

As far as I know59, you can't use the data keyword in ghci; you have to use it in a source code file and then load the file.

Here's how you store values in the new data structure and retrieve them later:

*Test> :t PairOfInts
PairOfInts :: Int -> Int -> PairOfIntsType
*Test> let a = PairOfInts 2 3
*Test> :t a
a :: PairOfIntsType
*Test> let firstIntOfPair (PairOfInts x y) = x
*Test> let secondIntOfPair (PairOfInts x y) = y
*Test> firstIntOfPair a
2
*Test> secondIntOfPair a
3
*Test> let b = PairOfInts 5 2
*Test> firstIntOfPair b
5
*Test> secondIntOfPair b
2

As you can see, PairOfInts? is the name of a function which takes two Ints and returns a value of type PairOfIntsType. We use this function to store pairs of integers in the form of a PairOfIntsType value. Then we use pattern matching to define functions that extract the integers out of a.

Note that Haskell doesn't know how to print the contents of a user-defined data type, and therefore the interpreter doesn't know how to display it:

*Test> a

<interactive>:1:0:
    No instance for (Show PairOfIntsType)
      arising from use of `print' at <interactive>:1:0
    Possible fix: add an instance declaration for (Show PairOfIntsType)
    In the expression: print it
    In a 'do' expression: print it

(if you are wondering, Show is a type class; types must be members of the Show class to be able to be printed. In Java, Show might have been named Showable. We'll discuss how to make a user-defined type a member of the Show class later.)

Enumerated sets

Examples in this section are some of the text is taken from YAHT.

You can use datatypes to define things like enumerated sets, for instance, a type which can only have a constrained number of values. We could define a color type:

data Color
    = Red
    | Orange
    | Yellow
    | Green
    | Blue
    | Purple
    | White
    | Black

Instead of having a single data constructor on the right side of the equals sign, we have a bunch of them, separated by the | symbol. And instead of these data constructors taking argument, as with

We could then write a function to convert between a Color and a RGB triple. For example:

colorToRGB Red    = (255,0,0)
colorToRGB Orange = (255,128,0)
colorToRGB Yellow = (255,255,0)
colorToRGB Green  = (0,255,0)
colorToRGB Blue   = (0,0,255)
colorToRGB Purple = (255,0,255)
colorToRGB White = (255,255,255)
colorToRGB Black = (0,0,0)

Now we can do things like:

*Test> let c = Orange
*Test> colorToRGB c
(255,128,0)

If you are still stuck on thinking of constructors as in objected-oriented languagees, where a "constructor" is an initialization function, you'll be surprised to see that in Haskell, a value of a user-defined type "remembers" which constructor it was constructed with.

A better way of looking at it is that the constructor, along with any arguments that it takes, __is__ the value. There are only 8 different values which are of Color; the values of type Color are Red, Orange, Yellow, Green, Blue, Purple, White, Black. With other types, such as PairOfIntsType, two example values are (PairOfInts 5 2) and (PairOfInts 2 3).

In fact, the data type Bool is defined like this in the Prelude:

data Bool = False | True

Constructors can have different type signatures

In the previous examples, PairOfIntsType had a constructor that took two Ints, and Color had 8 different constructors, none of which took any arguments. But you can mix and match, too. For example60,

data Color
    = Red
    | Orange
    | Yellow
    | Green
    | Blue
    | Purple
    | White
    | Black
    | Custom Int Int Int -- R G B components


colorToRGB Red    = (255,0,0)
colorToRGB Orange = (255,128,0)
colorToRGB Yellow = (255,255,0)
colorToRGB Green  = (0,255,0)
colorToRGB Blue   = (0,0,255)
colorToRGB Purple = (255,0,255)
colorToRGB White = (255,255,255)
colorToRGB Black = (0,0,0)
colorToRGB (Custom r g b) = (r,g,b)   -- this one is new

To use this,

*Test> let a = Red
*Test> let b = Custom 120 240 100
*Test> colorToRGB a
(255,0,0)
*Test> colorToRGB b
(120,240,100)

Recursive data types

Example: binary tree:

data BinaryTreeOfIntsType = Leaf Int
                            | Branch BinaryTreeOfIntsType BinaryTreeOfIntsType

Example function on the new type:

binaryTreeHeight Leaf _ = 1
binaryTreeHeight Branch x y = max (binaryTreeHeight x) (binaryTreeHeight y)

Usage example:

*Test> let a = Leaf 5; b = Leaf 6; x = Branch a b
*Test> binaryTreeHeight x
2

Example: alternative definition of [Int]:

data MyListOfInts = EmptyList | Cons Int MyListOfInts

Example function:

listLength EmptyList = 0
listLength (Cons x xs) = listLength xs + 1

Usage example:

*Test> let l = Cons 3 (Cons 2 (Cons 1 EmptyList))
*Test> listLength l
3

\newpage

Type class basics

Recall that a type class is a set of types that share the property that certain functions are defined for every type in the type class.

How to declare a type class

The class keyword defines a type class.

Example: say you were writing a game and you had a bunch of characters and objects in the game, each of which had a position, a color, and a velocity. You might define the following class61:

class PhysicalObject a where
    getPosition :: a -> (Int,Int)
    getVelocity :: a -> (Int,Int)
    getColor :: a -> Color

We can read this type class definition as "There is a type class 'PhysicalObject?; a type 'a' is an instance of PhysicalObject? if it provides the following three functions: . . . ".

Classes can also provide default functions. For example, perhaps many of the PhysicalObjects in the game are immobile; in which case the class declaration might include:

class PhysicalObject a where
    getPosition :: a -> (Int,Int)
    getVelocity :: a -> (Int,Int)
    getColor :: a -> Color

    getVelocity a = (0,0)

Now, for any type in the class, the function getVelocity always returns 0 -- unless getVelocity is redefined for that type.

Classes have something called "minimally complete definitions". Since some functions might be defined by default in the class, if you want to make a type a member of a given type class, you don't necessarily need to define all of the functions of that type class -- only the ones which are not defined by default62.

In English-language discussions about Haskell programs, one way to express what the functions of a type class are is just to write the part of the type class declaration that gives the function signatures. For example, here is the type class Show:

class Show a where
  showsPrec :: Int -> a -> ShowS
  show :: a -> String
  showList :: [a] -> ShowS

How to make a type a member of a type class

The instance keyword asserts that a type is a member of a type class.

We'll use the type class Show as an example. Note that the function print requires its input to be of some type in the type class Show:

Prelude> :t print
print :: (Show a) => a -> IO ()

So it will be useful for debugging purposes to make user-defined data types members of the Show type class whenever possible. We will equip our user-defined data type from a previous example, PairOfIntsType, with the Show type class.

The minimally complete definitions for Show is to define either showsPrec or show. We'll define show in this example:

data PairOfIntsType = PairOfInts Int Int

instance Show PairOfIntsType where
    show (PairOfInts a b) = "PairOfInts(" ++ show(a) ++ ", " ++ show(b) ++ ")"

Here's how it looks:

*Test> print (PairOfInts 3 4)
PairOfInts(3, 4)

shortcut: derives

For "sufficiently simple" data types, the compiler can automatically figure out how to make that data type a member of certain type classes, saving you the effort of writing the class functions and the instance declaration. You do this using the deriving keyword in the data declaration. Here's how to tell the compiler to "derive" class Show for our PairOfIntsType:

data PairOfIntsType = PairOfInts Int Int
       deriving Show

Here's how it looks:

*Test> print (PairOfInts 3 4)
PairOfInts 3 4

\newpage

Monads

Formal definition

I don't expect this to mean very much yet, but for future reference, I'm going to define monads straight away63. You can skip this subsection and come back to it later; just remember that there is a data structure that we'll call m, that there are two functions return and >>=, and that there are some laws that constrain how these functions should interact.

A monad consists of any data type M (it doesn't have to be named "M", that's just a placeholder) which is a member of the Monad type class, an abbreviated version of which reads (note that the >>= function is infix, by the way):

class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b

and where, furthermore, the effects of the functions return and >>= satisfy the following "laws" (or rules):

The left unit law

(return x) >>= f
            =
f x

The right unit law

m >>= return
    =
m

The associativity law

(m >>= f) >>= g
       =
m >>= (\x -> (f x >= g))

The >>= function is pronounced "bind".

You might be confused by the appearance of types such as "m a" and "m b" in the Monad typeclass function signatures. These are "polymorphic types", which we're not going to explain until after this chapter on monads. For now, you can just pretend that the function signatures look like this:

return :: a -> M
(>>=)  :: M -> (a -> M) -> M

For notational convenience, we'll often use the placeholder __m__ to represent some expression or value of type M, where M is a member of the Monad type class.

The intuition behind monads

Monads can be used to do various things, and in each case, the two functions have a slightly different intuition. But the common intuition is basically the following.

Monads take an ordered chain of expressions and combine them in some way. It might combine them by executing the chain in order as a sequence of commands, or it might do something different with them. The data structure __m__, whose type is a Monad, can be used by the combining machinery to store its state while the individual expressions are being evaluated.

The bulk of the combining machinery is often in the function >>= ("bind"). Intuitively, this function takes two expressions as input, and "combines" them in some way. Formally, however, it takes two inputs of different types; the input on the left is an expression that doesn't take any inputs, and that returns an __m__, whereas the input on the right is a function that takes one ordinary value as input and then returns an __m__.

The return function converts an ordinary value to a value of the M type.

Intuitions for use-cases of monads:

(a) "interceptors": a way for a metaprogrammer to extend the language by arbitrarily doing any computation, even a stateful one, interspersed between every two step of the programmer's code

the >>= operator, which the monad author (the metaprogrammer) gets to define, is applied between each two lines within a "do". The monadic value can be whatever the metaprogrammer wants; the programmer isn't allowed to look inside it or change it -- so the metaprogrammer can save arbitrary state in between lines of the 'do'. In order to get the final result of the monad (since Haskell is based on every function returning a value), the programmer puts a "return" at the end of the "do" -- the metaprogrammer can define this however he likes, the value returned by "return" can be an arbitrary transformation of the monadic value.

(b) "contamination markers": a flexible way to mark certain values/certain parts of code (especially impure code) as "contaminated" via the type system -- e.g. I/O is impure so it is "contaminated" and we mark that with the IO monad. We can do whatever we like inside the I/O-flavored 'do' s (that is, we can do whatever we like with an IO monadic value), and we can do whatever we like with pure functions outside the I/O section, but the I/O monadic type allows us to use to type system to track which code is pure and which is contaminated with I/O -- e.g. no function can access IO input, or the results of any computation that accessed IO input, unless the type of the return value of that function is IO monadic.

(c) a way to lift types and functions on those types into a larger universe of discourse. E.g. consider the Maybe monad. If we have the type Int, now we also have the value Maybe Int. And, using bind, we have a way to translate any function that operates on Int into a function that operates on Maybe Int.

But now we can also write functions that take two arguments, one of them an Int and one of them a Maybe Int, so we have a larger universe of discourse.

(d) "side-effect domains": another way to look at the "contamination markers" intuition; each monadic value is like a subdomain within which side-effects may be permitted in a controlled fashion

(e) "special containers": monads are simply encapsulated containers for values

Monads for passing state

Recall that the reason that monads are related to I/O is that we want to pretend that we are passing around a symbol like RealWorld which represents the state of the world -- without actually having to put the symbol in the function arguments.

In this section we'll show how a state can be passed around, without giving the state as an argument to all of the functions, by using the functions from the Monad typeclass64. Rather than using I/O in the example, however, we'll talk about a function that visits each node in a binary tree, and retains state across its visits.

Since there are various bits of code throughout this section, a copy of the final version of the code may be found at the end of the section.

Here's a binary tree datatype, along with an example function that visits (and modifies) each node in the tree -- but this example function does not retain state in between nodes:

data Tree
  = Leaf Int
  | Branch (Tree) (Tree)

type FunctionOnTreeLeaves = Int -> Int

mapTree :: FunctionOnTreeLeaves -> Tree -> Tree
mapTree f (Leaf a) = Leaf (f a)

Now, what if we want to retain state in between nodes? For instance, what if we want to assign unique numbers to each of the leaves of the tree, ordered as we traverse the tree from left to right? The simplest thing to do is just to pass the state around.

So instead of using the stateless FunctionOnTreeLeaves type, we need to use something like a "StatefulFunctionOnTreeLeaves?" type. But that's a mouthful, so we'll call this type TransformationFunction instead.

For convenience, we'll define a type "State" for expressions concerning the current state. Since we're just counting, the State is just an integer:

type State = Int

type TransformationFunction = (Int -> State -> (State, Int))

mapTreeState :: TransformationFunction ->
                Tree -> State -> (State, Tree)
mapTreeState f (Leaf oldValue) state =
    let (state', newValue) = f oldValue state
    in (state', Leaf newValue)
mapTreeState f (Branch lhs rhs) state =
    let (state' , lhs') = mapTreeState f lhs state
        (state'', rhs') = mapTreeState f rhs state'
    in (state'', Branch lhs' rhs')

As you can see, the function mapTreeState takes three inputs; a transforming function, a tree, and an initial state. It returns a tuple containing two output: the final state, and the transformed tree.

The transforming function takes two inputs and returns a tuple containing two outputs. The inputs are the old value of a leaf and also the state, and the outputs are the new state and the new value of the leaf.

When mapTreeState encounters a branch, it calls itself for each side of the branch. When mapTreeState encounters a leaf, it uses the transforming function to change the value of the leaf.

Now if we want to number the leaves by counting, we just have to write a transforming function:

numberLeaf :: TransformationFunction
numberLeaf oldValue oldState = (oldState+1, oldState)

Let's test it out:

*Test> let leaf_a = Leaf 5; leaf_b = Leaf 6; 
*Test> let branch_x = Branch leaf_a leaf_b; leaf_c = Leaf 3;
*Test> let branch_y = Branch branch_x leaf_c; tree1 = branch_y;
*Test> tree1
Branch (Branch (Leaf 5) (Leaf 6)) (Leaf 3)
*Test> let (finalState, tree2) = mapTreeState numberLeaf tree1 0
*Test> tree2
Branch (Branch (Leaf 0) (Leaf 1)) (Leaf 2)
*Test> finalState
3

So, it works. But man, passing around all those states is ugly.

We're going to define a bunch of machinery that will eventually make it prettier, but be warned: it's going to get a lot worse before it gets better.

todo:make first version this version

First, we rewrite 'type TransformationFunction', 'mapTreeState' and 'numberLeaf' so that they numberLeaf operates on leaf data structures, rather than on the integers contained in the leaves:

type TransformationFunction = (Tree -> State -> (State, Tree))

mapTreeState :: TransformationFunction ->
                Tree -> State -> (State, Tree)

mapTreeState f (Leaf x) state = f (Leaf x) state

mapTreeState f (Branch lhs rhs) state =
    let (state' , lhs') = mapTreeState f lhs state
        (state'', rhs') = mapTreeState f rhs state'
    in (state'', Branch lhs' rhs')

numberLeaf :: TransformationFunction
numberLeaf (Leaf x) oldState = (oldState+1, Leaf oldState)
numberLeaf (Branch lhs rhs) oldState = undefined

Next, we introduce this type:

type StateWanter = State -> (State, Tree)

We rewrite our function signatures in terms of this new type:

type TransformationFunction = (Tree -> StateWanter)

mapTreeState :: TransformationFunction ->
                Tree -> StateWanter

The intuition here is: if we partially apply a function of type TransformationFunction by giving it a Tree as input, but not telling it what the current State is yet, then that partial application yields a function of type StateWanter. You can think of a StateWanter as saying, "OK, you figure out what the current State of the world is, and tell me. Then I'll do my thing, and I'll return to you the modified Tree, and I'll also tell you what the new State of the world is."

And we can now think of mapTreeState like this, too. What if we knew what the transformation function was, and we knew what the initial Tree looked like, but we didn't yet know what the initial State was? In this case, we could partially apply the first two arguments of mapTreeState, and then mapTreeState would turn into a StateWanter, just like the TransformationFunction did.

Now we'll implement the two monad functions (for the moment, without tying them to the Monad typeclass). Unfortunately we'll get an error if we just create two functions named return and >>=, because those names are already in use by the Monad typeclass. So we'll use myReturn instead of return and myBind instead of >>=.

myReturn :: Tree -> StateWanter
myReturn tree = (\currentState -> (currentState, tree))

myBind :: StateWanter -> (Tree -> StateWanter) -> StateWanter
m `myBind` f = (\currentState -> 
          let 
            (state', t) = m currentState
            m' = f t
          in 
            m' state'
          )

Using65 myReturn and myBind, we can rewrite mapTreeState without ever having to explicitly talk about the state in the function body:

mapTreeState_myfuncs :: TransformationFunction -> Tree -> StateWanter

mapTreeState_myfuncs f (Leaf x) = f (Leaf x)

mapTreeState_myfuncs f (Branch lhs rhs) =
    (mapTreeState_myfuncs f lhs) `myBind` (\ lhs' ->
     (mapTreeState_myfuncs f rhs) `myBind` (\ rhs'  ->
      myReturn (Branch lhs' rhs')
    ))

In this example, the type StateWanter is the monad type (even though it isn't, yet, formally an instance of the Monad type class).

It works just like the first version:

*Test> let leaf_a = Leaf 5; leaf_b = Leaf 6; 
*Test> let branch_x = Branch leaf_a leaf_b; leaf_c = Leaf 3;
*Test> let branch_y = Branch branch_x leaf_c; tree1 = branch_y;
*Test> mapTreeState_myfuncs numberLeaf tree1 0
(3,Branch (Branch (Leaf 0) (Leaf 1)) (Leaf 2))

Making it into a real Monad

OK, now we will turn the StateWanter in this example into an actual full-fledged instance of the Monad type class.

Unfortunately, type synonyms for functions can't be instances of the Monad type class; only "actual" data types can be66. So, since we can't make StateWanter itself an instance of Monad, we're going to create a new data type, Wrapped_StateWanter, which is a wrapper that exists only to hold a StateWanter value.

This isn't going to change things very much conceptually, and it's not going to change the body of the function mapTreeState, but it's going to change the type signatures of everything (since everything will now work with Wrapped_StateWanter instead of StateWanter), and it will unfortunatly add a LOT of cruft to our backend machinery.

The payoff is that when we are done, we will be able to use do blocks as syntactic sugar for mapTreeState. This seems like it is adding a lot of cruft for only a little gain; but in the long run it makes sense because the nasty, crufty backend machinery only has to be written once and then can be used with any function with the same type signature as mapTreeState.

The first thing we're going to do is a little bit of black magic. This black magic involves "polymorphism", which will be explained in the next chapter. Don't try to understand the following code:

newtype Wrapped_StateWanter_poly a = Wrapped_StateWanter (State -> (State, a))
type Wrapped_StateWanter = Wrapped_StateWanter_poly Tree

until you read the next chapter, you can pretend that we had done this instead:

data Wrapped_StateWanter = Wrapped_StateWanter StateWanter

Next, instead of using the transformation functions of the type TransformationFunction, which involves StateWanter, we'll replace this type with the type TransformationFunction2, which uses Wrapped_StateWanter instead.

type TransformationFunction2 = (Tree -> Wrapped_StateWanter)

Next, here is the instance declaration that defines the functions return and >>=, and makes the Wrapped_StateWanter type into an instance of the Monad typeclass:

instance Monad Wrapped_StateWanter_poly where

  return tree =  Wrapped_StateWanter (\currentState -> (currentState, tree))

  (Wrapped_StateWanter m) >>= f = Wrapped_StateWanter (
     \currentState -> 
          let 
            (state', t) = m currentState
            Wrapped_StateWanter m' = f t
          in 
            m' state'
          )

Last piece of black magic: ignore the Wrapped_StateWanter_poly in the first line of that chunk of code; pretend that it had said Wrapped_StateWanter instead.

You can see that our return is just like our myReturn, except that we wrap the result in a Wrapped_StateWanter.

Our >>= is just like our myBind, except that:

The left-hand input to >>= is of type Wrapped_StateWanter, whereas the left-hand input to myBind was of type StateWanter. So, we unwrap it, and assign m to the StateWanter which is inside the Wrapped_StateWanter.
The type of our right-hand input, f, used to be a function of type Tree -> StateWanter, but now it is a function of type Tree -> Wrapped_StateWanter. So, we unwrap the result when we evaluate f t, and assign the unwrapped result to m'.
The output type is now Wrapped_StateWanter, where it used to be type StateWanter. So, we wrap our output.

Next, here's our new version of mapTreeState_myfuncs, which we're going to call mapTreeStateM (M is for Monad). Nothing substantial has changed; we just changed the type signature to use TransformationFunction2 instead of TransformationFunction, and changed the code to use return and >>= instead of myReturn and myBind.

mapTreeStateM :: TransformationFunction2 -> Tree -> Wrapped_StateWanter

mapTreeStateM f (Leaf x) = f (Leaf x)

mapTreeStateM f (Branch lhs rhs) =
    (mapTreeStateM f lhs) >>= (\ lhs' ->
     (mapTreeStateM f rhs) >>= (\ rhs'  ->
      return (Branch lhs' rhs')
    ))

Next, rather than re-write numberLeaf to deal with the wrapper around StateWanter?, we'll write a generic function that wraps transformation functions:

wrapTransformationFunction :: TransformationFunction -> TransformationFunction2
wrapTransformationFunction f = (\ tree -> Wrapped_StateWanter (f tree) )

With our new version, mapTreeStateM, we can't use the same calling syntax as we did before; remember that we before, we did:

mapTreeState_myfuncs numberLeaf tree1 0

This worked because mapTreeState_myfuncs numberLeaf tree1 returned a StateWanter. Then that StateWanter consumed the 0 at the right, and returned a Tree.

But now, our mapTreeStateM is going to return a Wrapped_StateWanter. This isn't a function, and so it will be unable to directly consume the inital state.

Rather than make the caller change, however, we can just wrap mapTreeStateM to produce a wrapped function, mapTreeState2, with the same calling conventions as before:

runStateM :: Wrapped_StateWanter -> State -> Tree
runStateM (Wrapped_StateWanter statewanter) state = snd (statewanter state)

wrapTreeFunction :: (TransformationFunction2 -> Tree -> Wrapped_StateWanter) -> (TransformationFunction -> Tree -> State -> Tree)
wrapTreeFunction treeFnM = (\ f tree init_state ->
         let
       fM = wrapTransformationFunction f
       wsw_answer = treeFnM fM tree
         in
       runStateM wsw_answer init_state
                           )

mapTreeState2 = wrapTreeFunction mapTreeStateM

Here's an imaginary "debugging trace" showing how the wrapping works:

mapTreeState2 numberLeaf tree1 0
= (mapTreeState2) numberLeaf tree1 0
= (wrapTreeFunction mapTreeStateM) numberLeaf tree1 0
= (\f tree init_state -> 
         let 
             fM = wrapTransformationFunction f
             wsw_answer = mapTreeStateM fM tree
         in
       runStateM wsw_answer init_state
  ) numberLeaf tree1 0
= (      let 
             fM = wrapTransformationFunction numberLeaf
             wsw_answer = mapTreeStateM fM tree1
         in
       runStateM wsw_answer 0
  )
= (      let 
             wsw_answer = (..a Wrapped_StateWanter containing 
                             a StateWanter, call it statewanter..)
         in
       runStateM wsw_answer 0
  )
= snd (statewanter 0)

And that's it. Here's the demo:

*Test> let leaf_a = Leaf 5; leaf_b = Leaf 6; 
*Test> let branch_x = Branch leaf_a leaf_b; leaf_c = Leaf 3;
*Test> let branch_y = Branch branch_x leaf_c; tree1 = branch_y;
*Test> mapTreeState2 numberLeaf tree1 0
Branch (Branch (Leaf 0) (Leaf 1)) (Leaf 2)
*Test> mapTreeState2 numberLeaf tree1 5
Branch (Branch (Leaf 5) (Leaf 6)) (Leaf 7)

do block notation

Finally, we get to use the do block syntactic sugar.

todo: define it

So, the following mapTreeStateM is equivalent to the previous one:

 mapTreeStateM f (Leaf x) = 
  do
     result <- f (Leaf x)
     return result

mapTreeStateM f (Branch lhs rhs) =
  do
     lhs' <- (mapTreeStateM f lhs)
     rhs' <- (mapTreeStateM f rhs)
     return (Branch lhs' rhs')

So, there you have it. At the expense of a crufty backend that only has to be written once, you have a clean notation for a stateful algorithm, but "really" you have explicitly represented the state in a referentially transparent way throughout.

The backend code will only work for algorithms whose output is a tree. In the next chapter we'll see how to generalize it to remove that restriction.

Here's the final code:

module Test where





-- begin black magic
newtype Wrapped_StateWanter_poly a = Wrapped_StateWanter (State -> (State, a))
type Wrapped_StateWanter = Wrapped_StateWanter_poly Tree
-- end black magic
--
-- You can pretend that the above black magic said this instead:
--
--    data Wrapped_StateWanter = Wrapped_StateWanter StateWanter
--


-- set up the Monad machinery:
instance Monad Wrapped_StateWanter_poly where

  return tree =  Wrapped_StateWanter (\currentState -> (currentState, tree))

  (Wrapped_StateWanter m) >>= f = Wrapped_StateWanter (
     \currentState -> 
          let 
            (state', t) = m currentState
            Wrapped_StateWanter m' = f t
          in 
            m' state'
          )

type TransformationFunction2 = (Tree -> Wrapped_StateWanter)




-- the "actual" code:

type State = Int
type StateWanter = State -> (State, Tree)
type TransformationFunction = (Tree -> StateWanter)

data Tree
  = Leaf Int
  | Branch (Tree) (Tree)
   deriving Show

numberLeaf :: TransformationFunction
numberLeaf (Leaf x) oldState = (oldState+1, Leaf oldState)
numberLeaf (Branch lhs rhs) oldState = undefined


mapTreeStateM :: TransformationFunction2 -> Tree -> Wrapped_StateWanter

mapTreeStateM f (Leaf x) = 
  do
     result <- f (Leaf x)
     return result

mapTreeStateM f (Branch lhs rhs) =
  do
     lhs' <- (mapTreeStateM f lhs)
     rhs' <- (mapTreeStateM f rhs)
     return (Branch lhs' rhs')




-- wrapping code:
wrapTransformationFunction :: TransformationFunction -> TransformationFunction2
wrapTransformationFunction f = (\ tree -> Wrapped_StateWanter (f tree) )

runStateM :: Wrapped_StateWanter -> State -> Tree
runStateM (Wrapped_StateWanter statewanter) state = snd (statewanter state)

wrapTreeFunction :: (TransformationFunction2 -> Tree -> Wrapped_StateWanter) -> (TransformationFunction -> Tree -> State -> Tree)
wrapTreeFunction treeFnM = (\ f tree init_state ->
         let
       fM = wrapTransformationFunction f
       wsw_answer = treeFnM fM tree
         in
       runStateM wsw_answer init_state
                           )

mapTreeState2 = wrapTreeFunction mapTreeStateM

\newpage

More on types: type classes etc

(and polymorphic types, kinds, strict fields, etc)

\newpage

Module syntax

\newpage

Common library functions

\newpage

Solutions to exercises

Basic data structures

Lists

1. map Char.isLower "aBCde"

2. length (filter Char.isLower "aBCde")

3. fst (head (tail [(5,'b'),(1,'c'),(6,'a')]))

\newpage

Todo

Why didn't we have to put this inside a do block? Actually, you can execute single I/O operations outside of a do block. But there is no way67 outside of do blocks to specify multiple commands in sequence.

But we can still put it in a do block if we want:

Prelude> do {print "Hello World"}
"Hello World"
Prelude>

pairs, tuples

lists

strings

sgn -33

<interactive>:1:5: No instance for (Num (a -> a1)) arising from the literal `33' at <interactive>:1:5-6 Possible fix: add an instance declaration for (Num (a -> a1)) In the second argument of `(-)', namely `33' In the expression: sgn - 33 In the definition of `it': it = sgn - 33

Test> sgn (-33) -1

errors:

Test> weekdayStrExpand "sdf"

<interactive>:1:0: Not in scope: `weekdayStrExpand'

*Test> :t (\x -> let r = read(x) in (r,r > 3))
(\x -> let r = read(x) in (r,r > 3)) :: (Read a, Num a, Ord a) => String -> (a, Bool)
*Test> :t (\x -> let r = read(x) in (r,r > 3.0))
(\x -> let r = read(x) in (r,r > 3.0)) :: (Read a, Fractional a, Ord a) => String -> (a, Bool)

---

foldr

---

as-patterns (@)

error()

fixity declarations

type synonyms

sub ----

Example: the Maybe data type

Often you have a computation which can either succeed (in which case it returns an answer) or fail (in which case there is no answer, besides "I failed"). An example would be a function which searches through a list for an element satisfying a given predicate and then returns the first element satisfying that predicate68.

A common way to deal with this issue in Haskell is to create a special return value, call it Nothing, which indicates when the computation was unsuccessful. We can do that by wrapping the actual return type, whatever it is, in a new type that contains a wrapped version of each of the values of the previous type, and that also

data Maybe a = Just a
               | Nothing

todo: this is parametric poly!

todo: the "trivial monad"

todo: field names

todo: newtype?? if no, rmove from monad example

Monad t =>

give the example of the readsTree parser from the Gentle Introduction

write multiple "deriving"s like this: (,,,)

Prelude> let f x = if x then do {print "true"; return True;} else do {print "false"; return False;}
Prelude> f True
"true"
True
Prelude> f False
"false"
False
Prelude> mapM f [True, False, True, False]
"true"
"false"
"true"
"false"
[True,False,True,False]
Prelude> filterM f [True, False, True, False]

<interactive>:1:0: Not in scope: `filterM'
Prelude> :m Control.Monad
Prelude Control.Monad> filterM f [True, False, True, False]
"true"
"falfor se"
"true"
"false"
[True,True]

-- Example of putting actions in a tree and manipulating them.

-- Example mostly from cgibbard at http://reddit.com/r/programming/info/2bxc1/comments/c2dkol

module Test where

import Control.Monad

data Tree a = Tip | Branch a (Tree a) (Tree a) 


foldTree :: b -> (a -> b -> b -> b) -> Tree a -> b
foldTree t b Tip = t
foldTree t b (Branch x l r) = b x (foldTree t b l)
                                  (foldTree t b r)


runTree :: (Monad m) => Tree (m (Either a b)) -> m [Either a b]
runTree = foldTree (return [])
                   (\x l r -> do v <- x
                                 case v of
                                   Left u  -> liftM (v:) l
                                   Right u -> liftM (v:) r)


a = liftM Left (putStrLn "Why ")
b = liftM Right (putStrLn "hi ")
c = liftM Right (putStrLn "hello ")
d = liftM Left (putStrLn "there man")
e = liftM Left (putStrLn "there buddy")

testTree = (Branch a (Branch b (Branch d Tip Tip) (Branch e Tip Tip)) (Branch c (Branch e Tip Tip) (Branch d Tip Tip)))

-- todo: write (for conventional trees with leaves) examples of: tree traversal, summing, traversal w/ functions (for speed, he says), tree reversing

explain dollarsign operator

functions only take one argument

functions bind strongest of all

	  -- both from http://www.idt.mdh.se/kurser/cd5100/ht01/haskellf-ht01.pdf

use "take" evaluation on pages 29, 42 http://www.idt.mdh.se/kurser/cd5100/ht01/haskellf-ht01.pdf

mb polymorphic expln from here? but prob want simpler example; yaht? http://www.idt.mdh.se/kurser/cd5100/ht01/haskellf-ht01.pdf

"http://www.haskell.org/tutorial/functions.html": "Function application has higher precedence than any infix operator"

Functions are values. A function can accept the name of another function as its argument. The functions associate to the left.

Recall the 'const' function, which takes one argument, and returns a function that takes another argument and ignores it (alternately, you can think of const as a function that takes two arguments, ignoring the second, and returning the first).

Consider two functions, "(const 'a')" and "(const 'b')". Both of these take one argument, but both of them ignore their argument. The first one always returns 'a', and the second one always returns 'b'.

In the following example, we feed "(const 'b')" as the argument to "(const 'a')". "(const 'a')" eats its argument and then ignores it, and returns 'a':

Prelude> (const 'a') (const 'b') 'a'

Since the function "(const 'b')" was ignored, it doesn't matter if we replace it with something else, for example, the function "const":

Prelude> ((const 'a') const)) 'a'

Because functions associate to the left, this is equivalent to:

Prelude> (const 'a') const 'a' Prelude> const 'a' const 'a'

Note that the second "const" never gets to "eat" an argument. The second "const" is treated as a value which is fed to the first "const" and then ignored. The second "const" is never evaluated, so it doesn't matter that it is never supplied with an argument. In fact, if you DID put an argument over on the right, it wouldn't work:

> const 'a' const 'b'

<interactive>:1:0: Couldn't match expected type `Char -> t' against inferred type `Char' In the expression: const 'a' const 'b' In the definition of `it': it = const 'a' const 'b'

That's because this is happening\footnote{although the error message is different, so maybe i am misunderstanding? todo}:

> const 'a' const 'b' > ((const 'a') const) 'b' > ('a') 'b'

So, when you read an expression like: const 'a' const 'b' you may be tempted to say, 'well, what we have are two functions, so what are the arguments being given to each of those functions', and then in your mind parse it as if it were: (const 'a') (const 'b')

But this isn't right. Instead, you should say, 'What is on the left? The function 'const'. What happens after we give 'const' its first argument? We get back a function of one argument which always returns 'a'. Now this function is leftmost. So we give it its argument -- now it eats the second 'const', and returns 'a'. Now we have 'a' leftmost, and 'b' next to it. Oops, 'a' isn't a function, so it can't eat 'b', so we have an error.

This procedure works as long as we don't have any infix operators (i think; todo). If we have infix operators, then you should imagine that before doing this procedure, you put parentheses around the arguments of the infix operators; and then you should convert the infix operators into normal prefix functions using (.), and move them in front of their arguments (i think; todo).

Gotcha:

As noted, infix operators are different. The names of functions are themselves values; "const" is a value. But unlike the names of functions, a naked infix operator IS NOT A VALUE! a function CANNOT EAT A NAKED INFIX OPERATOR. To turn an infix operator into a value, you HAVE TO surround it with parentheses.

Prelude> :t (++) (++) :: [a] -> [a] -> [a]

Prelude> :t ++ <interactive>:1:0: parse error on input `++'

> (const 'a') (++) 'a'

> (const 'a') ++ <interactive>:1:14: parse error (possibly incorrect indentation)

Since infix operators have lower precedence than functions, if you have an expression with a single infix operator and no parentheses around it, then everything in the expression becomes parts of the arguments of the infix operator. 

For example, let's use the infix operator "." to make an expression out of the composition of two functions. Consider first the function "(\y -> const 3)"; this will then take two arguments, ignore them, and return "3".

Prelude> :t (\y -> const 3)
(\y -> const 3) :: (Num t1) => t -> b -> t1
Prelude> (\y -> const 3) 5 7
3


We will make that function by composing "(\x y -> x)" and "(const 3)". So, this means that we will feed "(const 3)" as the first argument (x) to "(\x y -> x)". This will produce the function "(\y -> const 3)". 

The composition of those two functions is "(\x y -> x) . (const 3)". We see that this has the correct type:

Prelude> :t (\x y -> x) . (const 3)
(\x y -> x) . (const 3) :: (Num b) => a -> t -> b

So, it wants two values and then it'll return one value (which will always be 3). So, you might try just adding two values to the right, and expect to get 3. But:

Prelude> (\x y -> x) . (const 3) 5 7

<interactive>:1:21:
    No instance for (Num (t -> a -> b))
      arising from the literal `3' at <interactive>:1:21
    Possible fix: add an instance declaration for (Num (t -> a -> b))
    In the first argument of `const', namely `3'
    In the second argument of `(.)', namely `(const 3) 5 7'
    In the expression: (\ x y -> x) . ((const 3) 5 7)

Let's look at the types:

Prelude> :t (\x y -> x) . (const 3) 5 7
(\x y -> x) . (const 3) 5 7 :: (Num (Integer -> a -> b)) => a -> t -> b

That's funny, even though we added in '5 7', it still wants two values. What if we add another value:

Prelude> :t (\x y -> x) . (const 3) 5 7 11
(\x y -> x) . (const 3) 5 7 11 :: (Num (Integer -> Integer -> a -> b)) => a -> t -> b

It STILL demands two more values. Why isn't it accepting the values we gave it? It keeps wanting more!! You could keep adding stuff to the right forever and it won't be satisfied. The problem is that, since function application has higher precedence than '.', an infix operator, these things are being parsed like this:


((\x y -> x)) . ((const 3) 5 7 11)

which results in a single function.

In this example, the error messages are being obscured by the use of numbers, which allows Haskell's type inference system to hope against hope that "5" can be a function that operates on arguments "7 11". This happens because "5" isn't a specific type, just an instance of a typeclass. This means that the non-sensical expression (5 7 11) doesn't immediately generate a type error when it is first encountered:

Prelude> :t (5 7 11)
(5 7 11) :: (Num (Integer -> Integer -> t)) => t

Even though it can't be evaluated:

Prelude> (5 7 11)
<interactive>:1:1:
    No instance for (Num (t -> t1 -> t2))
      arising from the literal `5' at <interactive>:1:1-6
    Possible fix: add an instance declaration for (Num (t -> t1 -> t2))
    In the expression: (5 7 11)
    In the definition of `it': it = (5 7 11)

This doesn't happen with characters (which ARE an instance of a specific type):

Prelude> :t ('a' 'b' 'c')

<interactive>:1:1:
    Couldn't match expected type `t1 -> t2 -> t'
	   against inferred type `Char'

Which means that the error messages in the larger expression are more interpretable:

Prelude> :t (\x y -> x) . (const 'a') 'b' 'c'
 
<interactive>:1:15:
    Couldn't match expected type `t -> a -> b'
	   against inferred type `Char'
    Probable cause: `const' is applied to too many arguments
    In the second argument of `(.)', namely `(const 'a') 'b' 'c''

Since the above is parsed like this, the same error results if you put the parentheses in:


Prelude> :t ((\x y -> x)) . ((const 'a') 'b' 'c')

<interactive>:1:18:
    Couldn't match expected type `t -> a -> b'
	   against inferred type `Char'
    Probable cause: `const' is applied to too many arguments
    In the second argument of `(.)', namely `((const 'a') 'b' 'c')'


The solution is to explicitly put the parentheses the way you want them:


Prelude> ((\x y -> x) . (const 'a')) 'b' 'c'
a
Prelude> ((\x y -> y) . (const 'a')) 'b' 'c'
c


Gotcha 2:

This becomes especially weird when you pass around an expression containing a partially applied infix operator. For example, this generates a syntax error:

Prelude> const .
<interactive>:1:7: parse error (possibly incorrect indentation)

But this doesn't:

Prelude> (const .) (+1) 3 5
4

You might thing that the subexpression (const .) would generate a syntax error, since there is no right argument. But instead, it just does partial evaluation. The previous expression is equivalent to (i think, todo):

Prelude> ((.) const) (+1) 3 5
4


Inside the parentheses that surround the infix operator, it still behaves like an infix operator, having lower precedence than functions. For example:
 
Prelude> ((-) 2   .) (*3) 4
-10

This is equivalent to:

> ((.) ((-) 2)) (*3) 4

Note how all the stuff to the left of the infix operator (until we hit its enclosing parens) got grouped together.

Similarly,

> (. (*2)) (+1) 3
7

how to debug typeclasses (which file an instance decl is from)

import Control.Arrow
import Control.Monad

main = print (x,y)
       where (x,y) = g  ("hi", "there")

g = join (***) f

f = (\x -> 3)

p2 f k = \ r -> k (f r) r


data Dumbtype a b = Dumbtype a b | Nothin a b


instance Monad (Dumbtype a) where
	return r = Dumbtype undefined r
	Dumbtype a f >>= k = k f
        Nothin a f >>= k = k f


instance Monad (Dumbtype Char) where
	return r = Nothin 'a' r
	Dumbtype a f >>= k = k f
        Nothin a f >>= k = k f

from http://homepages.cwi.nl/~ralf/gpce06/:

-- A closed datatype for expression forms data Exp = Lit Int

-- An operation for evaluation eval :: Exp -> Int eval ( Lit i ) =i eval (Add l r) = eval l + eval r -- Another operation - for printing print :: Exp -> IO () print ( Lit i ) = putStr (show i) print (Add l r) = do print l ; putStr " + " ; print r

Add Exp Exp

from http://homepages.cwi.nl/~ralf/gpce06/: replace data type with class type to get extensibility:

-- An open type class for expression forms class Exp x -- Concrete expression forms data Lit = Lit Int data (Exp x, Exp y) => Add x y = Add x y instance Exp Lit instance (Exp x, Exp y) => Exp (Add x y) -- An evaluation operation class Exp x => Eval x where eval :: x -> Int instance Eval Lit where eval ( Lit i ) = i instance (Eval x, Eval y) => Eval (Add x y) where eval (Add x y) = eval x + eval y

gotcha/note: with polymorphic data types, some fully instantiated types might be members of a typeclass, while others might not, depending on the type arguments.

example; in the previous example, you might expect that either "Add" is a member of typeclass "Eval", or it is not. But "Add" is not a single data type; it is a polymorphic data type, meaning that it defines a set of possible data types. To specify an individual type (todo: am i using the right terminology here?), you must choose types to bind to the type arguments x and y in data (Exp x, Exp y) => Add x y = Add x y.

Depending on what you choose for x and y, Add x y may or may not be a member of typeclass Eval. For example, Add Lit Lit is a member of Eval. But consider this additional code:

data LitC = LitC Char
instance Exp LitC

e = Add (Lit 5) (LitC 'a')

f = eval e

Note that Lit was a member of Eval, but LitC? is not. So, although the type Add Lit Lit is an instance of Eval, we see that the type Add Lit LitC is __not__ an instance of typeclass Eval:

Main> :r [1 of 1] Compiling Main ( tst.hs, interpreted )

tst.hs:20:4: No instance for (Eval LitC?) arising from a use of `eval' at tst.hs:20:4-9 Possible fix: add an instance declaration for (Eval LitC?) In the expression: eval e In the definition of `f': f = eval e Failed, modules loaded: none.

overlapping instances example: adapted from http://homepages.cwi.nl/~ralf/gpce06/

-- Datatypes for different kinds of shapes
data Rectangle = Rectangle Int Int Int Int
data Circle = Circle Int Int Int
-- A type class that comprehends all shapes
class Shape x
instance Shape Rectangle
instance Shape Circle
-- A type class for a multiple dispatch
class (Shape x, Shape y) => Intersect x y where
   intersect :: x -> y -> Bool
-- A special case for RxR ( others omitted for brevity )
instance Intersect Rectangle Rectangle where
   intersect r r' = True

instance Shape x => Intersect Rectangle x where
   intersect r r' = False

t = intersect (Rectangle 0 0 0 0) (Rectangle 0 0 0 0)

you have to use ghci -XMultiParamTypeClasses? -XFlexibleInstances? -XOverlappingInstances? for this to be permitted. When you use overlapping instances, the most specific instance is the one chosen; so t = True.

Gotcha: it's possible for two instance definitions to be "equally specific", which causes an 'overlapping instances' error even though we have '-XOverlappingInstances?' enabled. Add this code:

instance Shape x => Intersect Circle x where
   intersect r r' = False

instance Shape x => Intersect x Circle where
   intersect r r' = False

t2 = intersect (Circle 0 0 0 ) (Circle 0 0 0)

Now we have a problem:

*Main> :r
[1 of 1] Compiling Main             ( tst.hs, interpreted )

tst.hs:28:5:
    Overlapping instances for Intersect Circle Circle
      arising from a use of `intersect' at tst.hs:28:5-44
    Matching instances:
      instance [overlap ok] (Shape x) => Intersect Circle x
        -- Defined at tst.hs:(22,0)-(23,24)
      instance [overlap ok] (Shape x) => Intersect x Circle
        -- Defined at tst.hs:(25,0)-(26,24)
    In the expression: intersect (Circle 0 0 0) (Circle 0 0 0)
    In the definition of `t2':
        t2 = intersect (Circle 0 0 0) (Circle 0 0 0)
Failed, modules loaded: none.

spellchecker examples

http://reddit.com/r/programming/info/1kfhu/comments/c1kg86

http://pitekus.blogspot.com/2007/04/norvigs-spelling-corrector-in-haskell.html

http://groups.google.com/group/fa.haskell/browse_thread/thread/0821d82cf27e5a2e/5eb08166a5e1fff6?lnk=raot#5eb08166a5e1fff6

http://groups.google.com/group/fa.haskell/attach/7c043e40b1251efc/spell2.hs?part=2

https://www.blogger.com/comment.g?blogID=11788780&postID=5131957265387550855:

(cale gibbard's comment)

Regarding mixing (.) and ($), it helps to remember that (.) binds as strongly as possible to its parameters (only function application has precedence over it), whereas ($) binds as weakly as possible. So if I write some expression like:

f . g a b . h c $ x . y

it means:

(f . ((g a) b) . (h c)) $ (x . y)

http://www.randomhacks.net/articles/2007/03/12/monads-in-15-minutes

Monads in 15 minutes: Backtracking and Maybe

Posted by Eric Kidd on 03/12/2007

This morning, a programmer visited #haskell and asked how to implement backtracking. Not surprisingly, most of the answers involved monads. After all, monads are ubiquitous in Haskell: They’re used for IO, for probability, for error reporting, and even for quantum mechanics. If you program in Haskell, you’ll probably want to understand monads. So where’s the best place to start?

A friend of mine claims he didn’t truly understand monads until he understood join. But once he figured that out, everything was suddenly obvious. That’s the way it worked for me, too. But relatively few monad tutorials are based on join, so there’s an open niche in a crowded market.

This monad tutorial uses join. Even better, it attempts to cram everything you need to know about monads into 15 minutes. (Hey, everybody needs a gimmick, right?) Backtracking: The lazy way to code

We begin with a backtracking constraint solver. The idea: Given possible values for x and y, we want to pick those values which have a product of 8:

solveConstraint = do x <- choose [1,2,3] y <- choose [4,5,6] guard (x*y == 8) return (x,y)

Every time choose is called, we save the current program state. And every time guard fails, we backtrack to a saved state and try again. Eventually, we’ll hit the right answer:

> take 1 solveConstraint [(2,4)]

Let’s build this program step-by-step in Haskell. When we’re done, we’ll have a monad. Implementing choose

How can we implement choose in Haskell? The obvious version hits a dead-end quickly:

-- Pick one element from the list, saving -- a backtracking point for later on. choose :: [a] -> a choose xs = ...

We could be slightly sneakier, and return all the possible choices as a list. We’ll use Choice whenever we talk about these lists, just to keep things clear:

type Choice a = [a]

choose :: [a] -> Choice a choose xs = xs

Running this program returns all the possible answers:

> choose [1,2,3] [1,2,3]

Now, since Haskell is a lazy language, we can work with infinite numbers of choices, and only compute those we actually need:

> take 3 (choose [1..]) [1,2,3]

Because Haskell doesn’t compute answers until we ask for them, we get the actual backtracking for free! Combining several choices

Now we have the list [1,2,3] from our example. But what about the list [4,5,6]? Let’s ignore guard for a minute, and work on getting the final pairs of numbers, unfiltered by any constraint.

For each item in the first list, we need to pair it with every item in the second list. We can do that using map and the following helper function:

pair456 :: Int -> Choice (Int,Int) pair456 x = choose [(x,4), (x,5), (x,6)]

Sure enough, this gives us all 9 combinations:

> map pair456 (choose [1,2,3]) [[(1,4),(1,5),(1,6)], [(2,4),(2,5),(2,6)], [(3,4),(3,5),(3,6)]]

But now we have two layers of lists. We can fix that using join:

join :: Choice (Choice a) -> Choice a join choices = concat choices

This collapses the two layers into one:

> join (map pair456 (choose [1,2,3])) [(1,4),(1,5),(1,6), (2,4),(2,5),(2,6), (3,4),(3,5),(3,6)]

Now that we have join and map, we have two-thirds of a monad! (Math trivia: In category theory, join is usually written μ.)

In Haskell, join and map are usually combined into a single operator:

-- Hide the standard versions so we can -- reimplement them. import Prelude hiding ((>>=), return)

(>>=) :: Choice a -> (a -> Choice b) -> Choice b choices >>= f = join (map f choices)

This allows us to simplify our example even further:

> choose [1,2,3] >>= pair456 [(1,4),(1,5),(1,6), (2,4),(2,5),(2,6), (3,4),(3,5),(3,6)]

Completing our monad: return

We’re getting close! We only need to define the third monad function (and then figure out what to do about guard).

The missing function is almost too trivial to mention: Given a single value of type a, we need a convenient way to construct a value of type Choice a:

return :: a -> Choice a return x = choose [x]

(More math trivia: return is also known as unit and η. That’s a lot of names for a very simple idea.)

Let’s start assembling the pieces. In the code below, (\x -> ...) creates a function with a single argument x. Pay careful attention to the parentheses:

makePairs :: Choice (Int,Int) makePairs = choose [1,2,3] >>= (\x -> choose [4,5,6] >>= (\y -> return (x,y)))

When run, this gives us a list of all possible combinations of x and y:

> makePairs [(1,4),(1,5),(1,6), (2,4),(2,5),(2,6), (3,4),(3,5),(3,6)]

As it turns out, this is a really common idiom, so Haskell provides some nice syntactic sugar for us:

makePairs' :: Choice (Int,Int) makePairs' = do x <- choose [1,2,3] y <- choose [4,5,6] return (x,y)

This is equivalent to our previous implementation:

> makePairs' [(1,4),(1,5),(1,6), (2,4),(2,5),(2,6), (3,4),(3,5),(3,6)]

The final piece: guard

In our backtracking monad, we can represent failure as a choice between zero options. (And indeed, this is known as the “zero” for our monad. Not all useful monads have zeros, but you’ll see them occasionally.)

-- Define a "zero" for our monad. This -- represents failure. mzero :: Choice a mzero = choose []

-- Either fail, or return something -- useless and continue the computation. guard :: Bool -> Choice () guard True = return () guard False = mzero

And now we’re in business:

solveConstraint = do x <- choose [1,2,3] y <- choose [4,5,6] guard (x*y == 8) return (x,y)

Note that since the return value of guard is boring, we don’t actually bind it to any variable. Haskell treats this as if we had written:

  -- "_" is an anonymous variable.
  _ <- guard (x*y == 8)

That’s it!

> take 1 solveConstraint [(2,4)]

Another monad: Maybe

Every monad has three pieces: return, map and join. This pattern crops up everywhere. For example, we can represent a computation which might fail using the Maybe monad:

returnMaybe :: a -> Maybe a returnMaybe x = Just x

mapMaybe :: (a -> b) -> Maybe a -> Maybe b mapMaybe f Nothing = Nothing mapMaybe f (Just x) = Just (f x)

joinMaybe :: Maybe (Maybe a) -> Maybe a joinMaybe Nothing = Nothing joinMaybe (Just x) = x

Once again, we can use do to string together individual steps which might fail:

tryToComputeX :: Maybe Int tryToComputeX = ...

maybeExample :: Maybe (Int, Int) maybeExample = do x <- tryToComputeX y <- tryToComputeY x return (x,y)

Once you can explain how this works, you understand monads. And you’ll start to see this pattern everywhere. There’s something deep about monads and abstract algebra that I don’t understand, but which keeps cropping up over and over again. Miscellaneous notes

In Haskell, monads are normally defined using the Monad type class. This requires you to define two functions: return and >>=. The map function for monads is actually named fmap, and you can find it in the Functor type class.

Also, every monad should obey three fairly reasonable rules if you don’t want bad things to happen:

-- Adding and collapsing an outer layer -- leaves a value unchanged. join (return xs) == xs

-- Adding and collapsing an inner layer -- leaves a value unchanged. join (fmap return xs) == xs

-- Join order doesn't matter. join (join xs) == join (fmap join xs)

That’s it! For more information, see Wadler’s Monads for Functional Programming and the excellent All About Monads tutorial. And if you liked the backtracking monad, you’ll also like The Reasoned Schemer.

(This tutorial is dedicated to “ivanm” on #haskell. I hope this helps! And many thanks to Reg Braithwaite for commenting on an early draft.)

Update: Replaced mult456 with pair456, and explained what’s going on with guard in the final backtracking example.

http://blog.moertel.com/articles/2004/03/13/concurrent-port-scanner-in-haskell

Concurrent port scanner in Haskell

by Tom Moertel

module Main (main) where

import Control.Concurrent
import Control.Exception
import Data.Maybe
import Network
import Network.BSD
import System.Environment
import System.Exit
import System.IO

main :: IO ()
main = do
    args <- getArgs
    case args of
        [host, from, to] -> withSocketsDo $
                            scanRange host [read from .. read to]
        _                -> usage

usage = do
    hPutStrLn stderr "Usage: Portscan host from_port to_port"
    exitFailure

scanRange host ports =
    mapM (threadWithChannel . scanPort host . fromIntegral) ports >>=
    mapM_ hitCheck
  where
    hitCheck mvar = takeMVar mvar >>= maybe (return ()) printHit
    printHit port = putStrLn =<< showService port

threadWithChannel action = do
    mvar <- newEmptyMVar
    forkIO (action >>= putMVar mvar)
    return mvar

scanPort host port =
    withDefault Nothing (tryPort >> return (Just port))
  where
    tryPort = connectTo host (PortNumber port) >>= hClose

showService port =
    withDefault (show port) $ do
        service <- getServiceByPort port "tcp" 
        return (show port ++ " " ++ serviceName service)

withDefault defaultVal action =
    handle (const $ return defaultVal) action

-- Local Variables:  ***
-- compile-command: "ghc -o Portscan --make Portscan.hs" ***
-- End: ***

Example usage:

$ ./Portscan internalhost.moertel.com 1 1000
21 ftp
22 ssh
80 http
111 sunrpc
139 netbios-ssn
443 https
445 microsoft-ds
631 ipp
709
720

http://community.moertel.com/~thor/talks/pgh-pm-talk-haskell.pdf

<EatenByGrues?> /join #dzen <bshanks> so, " pass <- look "password" `mplus` return "nopassword"" === pass <- (look "password" `mplus` return "nopassword") <EatenByGrues?> wow <lispy> bshanks: yeah, the brakets won't be needed <bshanks> for some reason i thought i wasn't allowed to operate on "incoming" monads to the right of the <- operator. but i guess you can, thx

totimkopf is now known as ruddel <lispy> bshanks: the key here is that mplus returns a monad <lispy> :t mplus <lambdabot> forall (m :: * -> *) a. (MonadPlus? m) => m a -> m a -> m a
flw (n=chatzill@61.135.207.196) has joined #haskell
ruddel has quit () <hackage> Uploaded to hackage: dnsrbl 0.0.3
ruddel (n=noneya@c-98-198-217-142.hsd1.tx.comcast.net) has joined #haskell
ruddel is now known as ruddle
ruddle is now known as ruddel <lispy> it's too bad mplus isn't something like

<lispy> pass <- look "password"<lispy> that would look a bit more natural I think <lispy> > let (<lambdabot> Just 1 <lispy> :t (<lambdabot> Bool -> Bool -> Bool <lispy> But, not all instances of mplus would look right with (<bshanks> yeah, i had understood "mplus" but not <-. i had thought that "<- x" was extra-special magic sauce, and that "x" wasn't allow to be an expression (of course, since everything is an expression that's a silly thing to think in itself) <lispy> sounds like you have it all figured out now though <lispy> you could even have a case on the othre side <bshanks> my lord <lispy> > do x <- case 1 of 1 -> Just 2; return x :: Maybe Int <lambdabot> Parse error in pattern at "::" (column 41) <heatsink> >let (<lispy> well, hard to do that in one line <heatsink> > let (<lambdabot> [1,2,3,4,5,6] <lispy> > do x <- {case 1 of 1 -> Just 2}; return x :: Maybe Int <lambdabot> Parse error at "{case" (column 9) <lispy> bshanks: and on the LHS of the arrow you can do patterns <bshanks> huh, i didn't know that either


return "nopassword"
) = mplus in Just 1		Just 2
)
)
) = mplus in do x <- [1,2,3]		[4,5,6]; return x
) = mplus in do x <- [1,2,3]		[4,5,6]; return x

ruddel has quit (Client Quit) <lispy> > do Just x <- return (Just 1); return 1 :: [Int] <lambdabot> [1] <lispy> > do Just x <- return (Just 1); return x :: [Int] -- that's what I meant to type <lambdabot> [1]
laddie (n=laddie@c-98-193-51-27.hsd1.il.comcast.net) has left #haskell <lispy> bshanks: it's because haskell uses the monad laws to transfrom the do-syntax into (>>=) and return
xinming_ (n=hyy@218.73.141.172) has joined #haskell <lispy> oh and lambdas <lispy> The patterns are where the lambda abstraction starts
lurkymclurkleton (n=ajsmith@ip72-209-220-180.dc.dc.cox.net) has joined #haskell
ruddel (n=thrashed@c-98-198-217-142.hsd1.tx.comcast.net) has joined #haskell <lispy> > do Just x <- return Nothing; return x :: [Int] <lambdabot> [] <lispy> Notice that the pattern match failed <lispy> So we get the empty list instead of x

Footnotes:

1. Maybe "everything else", but I'm putting "almost" in case there are exceptions that I don't know about.

2. Types aren't actually "syntax", but I'm talking about them in this section anyway because if you mess them up in Haskell, you get compile-time errors.

3. the example given is the sort of thing usually be considered syntactically wrong at a very basic level, analogous to forgetting a semicolon in some languages -- but Haskell's syntax is so flexible that in Haskell, the example yields a type error rather than a syntax error

4. there are ways that you can temporarily redefine a previously defined functions; for example, __let__ and __where__ clauses

5. At least, I suspect that you can... I haven't proved this though.

6. In the case of an if/then/else, you can give it the same name in the 'then' and the 'else'.

7. todo: is this accurate?

8. from Yet Another Haskell Tutorial

9. Look up "tail call optimization" in Wikipedia for more details.

10. This (slightly modified) example is from YAHT

11. example from YAHT

12. example from YAHT

13. example from YAHT

14. example from YAHT

15. example from YAHT

16. example from YAHT

17. example from YAHT

18. example modified from YAHT

19. From YAHT

20. From YAHT

21. From YAHT

22. example from Yet Another Haskell Tutorial

23. example from Yet Another Haskell Tutorial

24. There are other differences in the placement of let and where, relating to case expressions and guards, which will be introduced later. A where clause is only allowed at the top level of a set of expressions or case expression (sentence from A Gentle Introduction). However, where clauses can scope over guards, which let clauses cannot.

25. example from Yet Another Haskell Tutorial

26. unless you manually do the monad stuff that do blocks are syntactic sugar for

27. modified example from YAHT

28. The constructor for pairs is a comma-separated expression within parentheses; the constructor for lists is the colon.

29. is there another way to do it? todo

30. more precisely, '(flip /)' isn't syntactically incorrect, but it means something different from what we want here

31. well, you're guaranteed to be able to try; membership in a type class doesn't prevent functions from throwing exceptions or from returning __undefined__. todo

32. By Hal Daume III, at least, in Chapter 4 of YAHT. When I am more experienced in Haskell I will say whether or not I recommend it also.

33. if it ends in an __if/then/else__, then the 'last line' is either the last line of the __then__ or the last line of the __else__, depending on which fork is chosen

34. actually, there's not an error if you try to return "3", but there is for "[3]". Dunno why yet. todo.

35. and to things like random number generation, which can be thought of as a transient event in the outside world

36. in the next section, we will explain that Haskell uses "lazy evaluation"; which means that it's not actually the case that all of the other functions are evaluated before the program starts to execute; actually, these two things happen in concert

37. actually, the function is more general than numbers; the argument "num" can be any type that is a member of both the Ord and the Read typeclasses

38. examples from YAHT

39. this sentence from YAHT

40. actually i guess you don't know for sure that an I/O symbol is being passed along, but it may as well be. todo.

41. This section and the following section on referential transparency were originally placed at the beginning of the tutorial, but then I decided that, although it could be understood at the beginning, it would be better appreciated later on. Also, I didn't want to bore the reader with abstractions before showing them some code. But now I want to explain about why I/O is weird in Haskell, which requires an understanding of referential transparency.

42. if you wanted to, actually, you COULD put all of your code in the "imperative" sections. But you usually don't.

43. This is a half-truth. First, you CAN put single I/O functions outside of __do__ blocks, you just can't give a sequence of them, AFAIK; useful programs with only one I/O command in them are rare, though, so you won't usually do that. Second, you should be aware that __do__ blocks can mean other things besides "imperative section"; we'll get to that when we get to monads.

44. actually, you can execute non-referentially transparent code outside of DO blocks, but AFAIK you can't execute more than one of them in sequence, so you usually won't. In any case, non-referentially transparent functions in Haskell are essentially marked as 'contaminated' in their type signature, enabling you to keep them separate from the rest.

45. i've been told that different people have different formalizations of this, so I'll give you the one I know.

46. source: http://www.oreillynet.com/mac/blog/2006/03/haskell_vs_ocamlwhich_do_you_p.html#comment-34114

47. see this essay: http://www.cs.chalmers.se/~rjmh/Papers/whyfp.html

48. Hal Daume III says in Yet Another Haskell Tutorial: "I have found that I write more bug-free code in less time using Haskell than any other language. I also find it very readable and extensible. Perhaps most importantly, however, I have consistently found the Haskell community to be incredibly helpful."

49. source: Hal Daume III, see above

50. source: Hal Daume III, see above

51. http://citeseer.ist.psu.edu/hudak98modular.html

52. source: http://aszt.inf.elte.hu/~fun_ver/index.html.en

53. source: http://morpheus.cs.ucdavis.edu/papers/sweeny.pdf slide 46

54. source: Hal Daume III, see above

55. source: Hal Daume III, see above. Daume suggests O'Caml if you have a need for speed.

56. source: "...From that point on the main struggle was squeezing enough performance out of GHC. It *IS* possible, but it is *NOT* trivial. GHC can be as fast as Caml but while in Caml naively written code will often still be reasonably efficient, in Haskell more often than not that won't be the case and you will have to give various hints to the compiler to get it to generate efficient code." -- http://www.oreillynet.com/mac/blog/2006/03/haskell_vs_ocamlwhich_do_you_p.html#comment-34114

57. source: http://www.oreillynet.com/mac/blog/2006/03/haskell_vs_ocamlwhich_do_you_p.html#comment-34114

58. by which I don't mean anything technical; I literally imagine them as a set of small wooden cabinets

59. todo

60. example from YAHT

61. example from YAHT, including the following sentence

62. in some classes, all of the functions are defined by default, but in a circular manner -- to break the circularity you need to redefine some of the functions, but you have some choice in which ones to redefine. In these cases the "minimally complete definition" for the type class may be a boolean expression with ANDs and ORs, for instance, "you must define either 'showsPrec' or 'show', but you don't have to define both"

63. definition from Haskell Wikibook at http://en.wikibooks.org/wiki/Haskell/Understanding_monads

64. example modified from YAHT

65. this sentence modified from YAHT

66. at least, I think this is true; but I'm not sure

67. that I know of..

68. example from YAHT

tutorials-haskellTutorial

Haskell tutorial

Preface

Copyright

Note to people reading this on the wiki

Hello World

Basic syntax

Haskell is **case-sensitive**

Types

Defining functions

Calling functions

__do__ blocks

Lack of destructive variable updates

How can this be?

You can redefine functions within __do__ blocks

You can redefine functions within ghci

Using whitespace as shorthand for \{\} and ;

Comments

Module import

Debugging within ghci

Some special values: True, False, ()

Quick note on numerical negation and order of operations

if/then/else

if/then/elses and dos

User input

Introduction to the arrow operator inside a __do__

The __read__ function

The __show__ function

Conditional I/O: a little trick

Symptoms of the problem

Solution

How to loop using recursion

Example: What number am I thinking of?

Basic data structures and functions

Tuples

Lists

Exercises

Some simple functions

More syntax

Temporary function definitions with __let__ and __where__

Example: Search for a path between two nodes in a graph

Even more syntax

case expressions

"undefined"

Piecewise function definitions

Guards

Pattern matching

Example: Quicksort

Example: Revisiting path search in a graph

Basic operations on functions

Anonymous function definitions

Converting prefix functions to infix and vice versa

Higher-order functions

Function composition

flip

Partial function application

Thinking about the anonymous function syntax in terms of partial function application

Type basics

Type classes

Type signatures of functions

Type signatures of functions which take multiple inputs

Generalizing our reasoning

Types of functions that take functions as arguments

Explicitly specifying types

Type synonyms

The __IO__ types

Passing values outside of a do block

__return__

Is there any way to get the IO out of the type permanently?

Referential transparency

Failure of referential transparency: mutable variables

Failure of referential transparency: hidden state

Failure of referential transparency: input

Failure of referential transparency: output

Side effects

Referential transparency: summary

I/O, do blocks, and monads

The IO types segregates non-referentially transparent computations

Actions are a type of value; they don't "do" anything right away

IO and monads as syntactic sugar for explicit labeling of OutsideWorld influence

Haskell is case-sensitive

do blocks

You can redefine functions within do blocks

Introduction to the arrow operator inside a do

The read function

The show function

Temporary function definitions with let and where

The IO types

return

shortcut: derives