notes-computer-jasper-jasperLazyNotes1

--

consider a strictness analyzer which is conservative in that it marks functions as 'maybe strict in argument 1' when it's not sure, and values as 'maybe infinite' when not sure. these can be overridden by declarations

--

also maybe it notices when it is recursing into a function and if so, strictifies that.

sort of like that haskell library that provides lists that evaluate the whole list when you evaluate any element of it.

but not quite. b/c we still want to allow infinite lists. more like codata: if you find yourself recursing WITHOUT producing any output, THEN strictify.

note that 'recursing' has to include MUTUAL recursion, so that the programmer can choose to divide a big recursive function into multiple functions that call each other

so, in the example with foldl in http://www.haskell.org/haskellwiki/Stack_overflow , we would notice that we are building up a thunk by recursing between + and foldl, without producing any output (this is not codata-like guarded recursion). so we strictify the thunk.
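concretely, in Haskell (using GHC's Data.List.foldl' as the already-strictified version; the function names here are just illustrative):

```haskell
import Data.List (foldl')

-- foldl builds the thunk ((0 + 1) + 2) + ... without producing any
-- output between recursive calls; without optimization, forcing that
-- thunk on a long enough list can overflow the stack.
lazySum :: [Int] -> Int
lazySum = foldl (+) 0

-- foldl' forces the accumulator at each step, which is exactly the
-- strictification the proposed analysis would perform here.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0

main :: IO ()
main = print (strictSum [1..1000000])  -- 500000500000
```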

the example with foldr is tougher. looking at the expansion which is explicitly shown in http://www.haskell.org/haskellwiki/Foldr_Foldl_Foldl%27, we see that we actually can't reduce this without knowing that + is associative. So this really falls into the 'almost tail recursive' detection scenario.

however, the strictness analysis should be able to be given "sum1 = foldr (+) 0", and the knowledge that + is associative, and from this derive that sum1 will hang if applied to codata.
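for reference, the difference between a foldr that hangs on codata and one that doesn't is just whether the combining function is guarded (this is plain Haskell, not the proposed analysis):

```haskell
-- sum1 cannot produce any output until it has consumed the whole
-- list, because + is strict in both arguments; so it hangs on codata.
sum1 :: [Int] -> Int
sum1 = foldr (+) 0

-- by contrast, a foldr whose combining function is a constructor is
-- guarded: each step produces output, so infinite input is fine.
copy :: [a] -> [a]
copy = foldr (:) []

main :: IO ()
main = do
  print (sum1 [1..10])         -- 55
  print (take 3 (copy [1..]))  -- [1,2,3]
  -- print (sum1 [1..])        -- would hang forever
```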

--

so it seems that data and codata might be just the language we need to talk about the sort of strictness analysis that i want.

btw the terms 'data' and 'codata' are confusing because 'data' already has another definition. when i write about 'data', you don't know if i mean the usual definition of the term, which includes all codata, or the data/codata definition, which excludes some codata. so i'd like to call it finite data vs. infinite data instead of data vs. codata.

so, note that finite data is a value for which, if a substructurally recursive function is strictly (or lazily) applied to it, that function will terminate. infinite data is a value for which, if a guarded recursive function is lazily applied to it, that function will terminate and produce at least one next element.
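in Haskell terms, the two recursion shapes look like this (a sketch; 'len' and 'nats' are just illustrative names):

```haskell
-- substructurally recursive: each call recurses on a strict sub-part
-- of its argument, so it terminates on any finite list.
len :: [a] -> Int
len []     = 0
len (_:xs) = 1 + len xs

-- guarded recursive: each recursive call sits under a constructor, so
-- lazily demanding the next element always terminates.
nats :: [Integer]
nats = go 0 where go n = n : go (n + 1)

main :: IO ()
main = do
  print (len "abc")    -- 3
  print (take 4 nats)  -- [0,1,2,3]
```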

the traditional use seems to be to restrict the language in order to prove termination. but in this case we want the compiler to hypothetically consider strictification and decide if there's any chance that strictification would convert a terminating program into a non-terminating one.

if the program was already non-terminating, then we don't care (it's the programmer's fault). so if a programmer calls a non-substructurally recursive function on finite data, that's fine, and if they lazily call a non-guarded recursive function on infinite data, that's fine.

what we are considering is strictification. if you strictify something which is applied to finite data, it'll still terminate, it might just be less efficient. but if you strictify something which is applied to infinite data which is not necessarily finite data, you may turn something terminating into something nonterminating. so that's what we want to watch out for.

so, we want to identify things which may not be finite data and which may be infinite data, and avoid strictifying them.

the conservative approach is to assume that anything might be infinite data, and so not strictify unless the programmer declares something as finite data, or unless we infer that. This is similar to my idea that laziness is the default but that optionally 'eagerness' could be attached to values.

however, we could be more aggressive. as noted above, if you have a function that is not terminating on finite data, and which, on infinite data, does not even produce a single lazy step, then we don't care, because the programmer made a mistake. so if we have an arbitrary function being applied to arbitrary data, the compiler can reason:

so the only case in which strictification is forbidden is when (a) the argument is infinite data, and (b) the function operating on it lazily terminates on infinite data.

the conservative thing to do is to assume that any data that we don't understand might be infinite data, and that any function that we don't understand lazily terminates on infinite data.

but we could get more aggressive by prohibiting either undeclared infinite data (e.g. forcing possibly lazy values to be explicitly declared as 'lazy', except when this is inferred), or undeclared functions that terminate on infinite data (e.g. forcing functions that can safely operate on lazy data structures to be declared as such, except when this is inferred). if we did either of these, then the compiler need only check if it can infer either that the inputs are infinite data or that the function is safe on infinite data, and if neither are true, then it can strictify.

(and by strictify, we might mean strictify inputs, or we might mean only strictify if the function finds that it is (possibly mutually) recursively calling itself before producing any output)

note that if a function finds that it is (possibly mutually) recursively calling itself before producing any output, then it must not be guarded recursive.

so how about this. if a function finds that it is (possibly mutually) recursively calling itself before producing any output, then unless it is specially declared as safe for infinite data, we strictify.
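a hypothetical sketch of this rule, specialized to left folds: run lazily for a fixed budget of output-free recursive calls, then strictify the accumulator. the name 'foldlAdaptive' and the explicit budget are invented for illustration; a real runtime would detect the loop itself.

```haskell
-- lazily fold until 'budget' output-free recursive calls have
-- happened, then force the accumulated thunk and continue strictly.
foldlAdaptive :: Int -> (b -> a -> b) -> b -> [a] -> b
foldlAdaptive budget f = lazyGo budget
  where
    lazyGo _ acc []     = acc
    lazyGo 0 acc xs     = acc `seq` strictGo acc xs  -- budget spent: strictify
    lazyGo n acc (x:xs) = lazyGo (n - 1) (f acc x) xs
    strictGo acc []     = acc
    strictGo acc (x:xs) = let acc' = f acc x in acc' `seq` strictGo acc' xs

main :: IO ()
main = print (foldlAdaptive 100 (+) 0 [1..1000000 :: Int])  -- 500000500000
```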

the only time this causes a problem is if we have some weird function which recurses on itself more than once before producing output. this is the case in which the programmer is required to declare.

this could be annoying because if the programmer forgets to declare this for such a function, and that function is run on finite data, it will be strictified and everything will seem fine, but then a bug will only crop up when it is run on infinite data. Worse, if that function is in library code, now the library can't be used with infinite data.

so maybe we should require the programmer to declare for EVERY function either 'safe for infinite data' or 'unsafe for infinite data', except when we can infer this at compile time. this would mean automatically identifying guarded recursion, and forcing a declaration in all other cases.

another approach would be to provide both a facility to declare infinite data, and a facility to declare functions safe for infinite data (and therefore unsafe for strictification on infinite data). now if the library writer forgets to declare their function safe for infinite data, the user can prevent autostrictification by passing in something declared as infinite data.

so, the proposal is as follows:

1) provide a facility to declare that certain inputs to certain functions are cool with infinite data, even though the function may not be guarded recursive
2) provide a facility to declare that certain outputs from certain functions will only yield infinite data if the function is given infinite data input
3) provide a facility to declare that certain outputs from certain functions will always yield finite data, even when the function's inputs are infinite data (e.g. the Haskell function 'take'). This implies (2).
4) provide a facility to declare that certain values are eager data, or that they are lazy data

and then during compile time:

1) infer substructural recursion. Mark such functions as 'only yields infinite data if given infinite data'. (this covers 'sum')
2) infer which functions will only recur on some of their arguments, and mark them as 'always yields finite data' and 'only yields infinite data if given infinite data' in the other arguments (this covers 'take')
3) transitively dataflow trace eager data through functions that 'only yield infinite data if given infinite data', to mark their outputs as eager when only eager data is provided
4) transitively dataflow trace lazy data through functions that are not 'always yields finite data', to mark their outputs as lazy when lazy data is provided
5) if any value is marked both 'eager' and 'lazy' then raise a compile-time error
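the 'sum' and 'take' inferences can be sanity-checked in plain Haskell (illustrative names only):

```haskell
-- 'take' only recurses while its counter argument is positive, so its
-- output is always finite data, even from infinite input:
firstTwo :: [a] -> [a]
firstTwo = take 2

-- 'sum' is substructurally recursive, so it only diverges (only
-- "yields infinite data") if its input list was infinite:
total :: [Int] -> Int
total = sum

main :: IO ()
main = do
  print (firstTwo (repeat 'x'))  -- "xx"
  print (total [1,2,3])          -- 6
```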

and then during runtime:

1) upon exiting a function or calling into a function, for those outputs or parameter values that are marked eager, strictify
2) upon exiting a function, check to see if we are stuck in a mutual recursion loop that has not produced any lazy output in an entire pass. If so, then unless either (a) the data is marked 'lazy', or (b) the data is passing into function inputs marked 'cool with infinite data', strictify.

i think this would allow the runtime to convert foldl to foldl' automatically, while still permitting codata-safe but non-guarded recursive functions (like interpreters), still allowing a client to make such a library function work with infinite data if the original author forgot to declare it as such (by passing in 'lazy' values), and still allowing stream fusion (because the auto-strictification only occurs when the runtime detects a loop that is not producing lazy output).

now, we don't really need declarations (2) and (3), because if we miss some of these, we only under-propagate the eager attribute and over-propagate the lazy attribute. This just reduces the power of the user to use eager data to overcome laziness (because we're being overly 'safe' about not letting the user force strictness through things that we worry may be like 'iterate'), and increases the clumsiness of the 'lazy' attribute (because it will not be removed through some things that are undetectably like 'take').

also, the 'cool with infinite data' declaration is really about the path between certain inputs and certain arguments to recursive calls, but we might want to attach it just to the function as a whole, b/c that might seem simpler.

also, i guess 'cool with infinite data' applies to some calls of foldr but not to the foldr inside 'sum1 = foldr (+) 0', so we need to be able to add that modifier on a per-call basis.

so the streamlined proposal is:

1) you can declare functions as 'lazyrec', meaning that you don't want autostrictification within them
2) when calling functions, you can call them thru the functor 'lazyrec', which is like declaring them 'lazyrec' just for that call
3) you can declare values as 'lazy', meaning that you don't want autostrictification applied to them or their heirs
4) you can declare values as 'eager', meaning that you want strictification applied to them and their heirs

during compile time:

1) infer substructural recursion. Mark such functions as 'only yields infinite data if given infinite data'. (this covers 'sum')
2) infer which functions will only recur on some of their arguments, and mark them as 'always yields finite data' and 'only yields infinite data if given infinite data' in the other arguments (this covers 'take')
3) transitively dataflow trace eager data through functions that 'only yield infinite data if given infinite data', to mark their outputs as eager when only eager data is provided
4) transitively dataflow trace lazy data through functions that are not 'always yields finite data', to mark their outputs as lazy when lazy data is provided
5) if any value is marked both 'eager' and 'lazy' then raise a compile-time error

and then during runtime:

1) upon exiting a function or calling into a function, for those outputs or parameter values that are marked eager, strictify
2) upon exiting a function, check to see if we are stuck in a mutual recursion loop that has not produced any lazy output in an entire pass. If so, then unless either (a) the data is marked 'lazy', or (b) the function at the top of the loop is marked 'lazyrec', strictify.

---

http://www.cs.cmu.edu/~me/150/resources/lectures/23/LazyEvaluation.pdf

---


If you haven't read it, check out Why Functional Programming Matters. By Ehud Lamm at Wed, 2007-05-30 18:05


Why Functional Programming Matters by John Hughes (1984)

Let me reemphasize this. This is one of the best motivating papers for lazy evaluation and higher order functions even 23 years after its creation. The most significant feature missing from its treatment of laziness is circular programming, a.k.a. tying the knot. By Derek Elkins at Wed, 2007-05-30 22:15


http://www.haskell.org/haskellwiki/Tying_the_Knot

--

i think this is probably not a good idea, but just putting it out there: should we be lazy only across fn boundaries?

--

mb have laziness only for referentially transparent things, and have strictness for non-referentially transparent things, including not only I/O to the outside world, but also accesses to aliased reference variables.

(purely stochastic functions are technically non-referentially transparent but we'll treat them as if they were; what we are really concerned about is spooky action at a distance, e.g. aliasing)

if you do this, though, then what about accesses to aliased reference variables which are 'within one of the current monads'? those should logically be done lazily, because accesses to something 'within one of the current monads' shouldn't 'count' as aliased references, right? but in that case, since we may have multiple (commutative) monads, this doesn't jibe with the 'one receiver' for non-referentially transparent things (although maybe it does, since each such thing's 'receiver' just means either that the thing itself is accessed only via an API/set of methods, or that accesses to it are syntactic sugar for calls to methods of its monad?).

--

if we do something like that, must make sure that stuff like backtracking monads ( http://www.randomhacks.net/2007/03/12/monads-in-15-minutes/ ) still exist/work

--