Bayle Shanks's website: ideas-computer-jasper-jasperTypeNotes1

things that make type systems confusing

the Haskell type class defn syntax

"class Eq a where " looks like a is a type parameter of Eq. (btw does haskell allow a destructuring bind on a in that statement?)

http://stackoverflow.com/questions/3467279/how-to-create-a-polyvariadic-haskell-function

you are using the type system's type inference combined with polymorphism to decide something which feels syntactic
in unpacking this, you switch from reading things from left-to-right and right-to-left three times
- when you first see 'sumOf 5 8 22 :: Integer', you read it left to right. And indeed, that's how it parses: (((sumOf 5) 8) 22) :: Integer
- but now, you see that you need to determine the type of 'sumOf 5' in order to compute the first term. You know it takes an Int as integer. But you don't know what type it yields as output.
- So you look further to the right. 8 is an Int, but you still don't know what you need. So you look further to the right. 22 is an Int and you gotta have an Int at the end. So the type of ((sumOf 5) 8) 22 must be Int -> Int.
- And now you have to switch and go right to left. Since ((sumOf 5) 8) 22 must be Int -> Int, ((sumOf 5) 8) must be Int -> Int -> Int, and sumOf 5 must be Int -> Int -> Int -> Int.
- Now you start substituting the fn defn. Int -> Int -> Int -> Int matches (Int -> (*)) so (sumOf 5) is replaced by (sumOf . (5 +) . toInteger).
- But now you have to go right again. (((sumOf . (5 +) . toInteger) 8) 22). And now you have to go leftwards from the 8 (not sure if i got this step right, i forgot how . work w/ grouping): ((sumOf . (5 +) . (toInteger 8)) 22) -> ((sumOf . (5 +) 8) 22) -> ((sumOf (5 + 8)) 22).
- Now you have to go right to left again (or just remember your earlier going-left work); ((sumOf (5 + 8)) 22) :: Int so (sumOf (5 + 8)) :: Int -> Int so sumOf :: Int -> Int -> Int. Then go left to right again to expand: (sumOf (5 + 8)) 22 -> (sumOf . ((5 + 8) +) . toInteger) 22. Then go right to left again to apply: (sumOf . ((5 + 8) +) . toInteger) 22 -> (sumOf . ((5 + 8) +)) 22 -> sumOf . ((5 + 8 + 22))
the types of parts of the expression are unexpected and context-dependent and seemingly inconsistent with each other. You wouldn't expect the type of 'sumOf 5' to be 'Int -> Int -> Int', you'd expect it to be 'Int', because if you said (sumOf 5) + 3 you'd get 8. It's only 'Int -> Int -> Int -> Int' in this particular context. Indeed, right in this derivation, we see that (sumOf 5) is of type Int -> Int -> Int -> Int, but (sumOf (5+8)) is of type Int -> Int, and (sumOf (5+8+22)) is of type Int->Int->Int.
in fact, the compiler is using its knowledge of how many times the function will be applied in the future to alter the function's behavior. In theory, you could make a function which does one thing if it will be applied less than 143 times before a scalar results, and another thing if it will be applied 143 times before a scaler results, and another thing if it will be applied 144-251 times... etc. The type logic counts via type-level inductive hooey. This seems like an abomination because it seems like something syntactic far away from the location of the function in the source code, and in the future from the perspective of data flow, is affecting the semantics of the function.

Hmm, the more i think about it, the more i think that we should ban type inference determining ad-hoc polymorphism which is polymorphic only in the return type (see [1] for more reasoning on that). This won't make the language any less powerful than, say, Python (in Python there is no type inferencing so in order to make an ad-hoc polymorphic choice, the class of the value that the object that you are dispatching on must differ (which would correspond to an input value in Haskell and probably in Jasper). So i'm proposing that in Jasper, you can have all the type inferencing you want, it's just that type inferencing won't be allowed to drive ad-hoc polymorphism from the "future" (from the types of the outputs).

Or, try THIS one on for size: the statically known (at the time of compiling the typeclass) kinds of the return types of different ad-hoc poly fns of the same inputs may not vary. Hmm.. but that sounds like you cant have two instances of the same kind of two different typeclasses. For example, it would prevent the empty homogeneous list from being both an empty list of ints and an empty list of fns, at different points in the program. No... it just prevents the existence of a fn, 'empty', that returns that. You can still have Int.empty and Fn.empty; or even, empty(Int) and empty(Fn). So, you can't have a single fn, 'empty', that sometimes returns statically known lists of ints and sometimes statically known lists of Fns, on the same inputs. This should prevent any type-level inductive hooey like Haskell's variadic fns, which uses the static knowledge of how many times the result of a fn will be applied in the future, which seems like a 'syntactic' thing, to alter the behavior of the function over the various applications. It still allows 'messages from the future', that is, it allows the compiler to automatically select among multiple instantiations of return-type-only-ad-hoc-polymorphic functions based on the expected return value, but it guarantees that all of the possibilities will be of the same kind (and kind incorporates the notion of 'syntactic' difference here)

but you could get around this by giving a dummy argument to the fun (empty(dummy)) so that the compiler can infer 'dummy' to be the desired return type

example from http://kawagner.blogspot.com/2006/12/my-haskell-experience.html

" Lets have two simple, small functions:

giveElements :: (a -> Bool) -> [[a?]] -> [[a?]] giveElements condition = filter (detectElement condition)

detectElement :: (a -> Bool) -> [a] -> Bool detectElement = any

The above seems a bit useless but I've extracted from it's surrounding code to show the problem. The 'giveElements' function is able to extract certain elements from a list and returns the filtered list. The 'detectElement' function does the job of deciding which elements are in and which are not. It does this by checking if any element of a list fulfills a certain condition.

So short so nice, but after some work on your program you discover that your 'condition' function required access to outside state to do it's job. Maybe the function has to do some a small modification of some data or should simply write a short message into a log. To allow this, you have to rewrite the above functions to use a monad. For this example lets simply use the IO monad here. Now the functions become:

giveElements :: (a -> IO Bool) -> [[a?]] -> IO [[a?]] giveElements condition = filterM (detectElement condition)

detectElement :: (a -> IO Bool) -> [a] -> IO Bool detectElement c els = do r <- mapM c els return $ or r

The first function still looks quite similar, only the 'filter' function is replaced by 'filterM' now. 'filterM' is the monadic version of filter and gives the monad back from the condition. So without filterM the state won't get up from the detectElement function (and of course there would be a type error).

detectElement looks a bit more different. The reason is that there is no monadic version for 'any'. So I use a monadic map (the mapM function) to create a list of Bools by applying the condition to each element and then check if at least one of the results it true. This doesn't requires more evaluations of c than the first version because of lazy evaluation, but it still required a complete rewrite of my code. It's still readable, but not as much as before.

...

And if you have more complex functions the above translation process from a non-monadic into a monadic-one is much more difficult and IMO it also destroy the clarity of the code. Also state monads have bad runtime characteristics because of their 'inside-out' evaluation which can fill up the stack quickly. You can use IORefs instead - but then you can't simply transform your program to using continuations anymore for example. And if you put a IO monad into a state-monad, you have to lift all IO-accessing functions via liftIO.

And whats the point of having a pure functional language, going thru lots of trouble to maintain the purity, only to get things like IORef which you tried to get rid of in the first place? If you design a program to use IO monads from the beginning everywhere, then you have fewer problem maintaining and expanding it - but then you're using Haskell like a normal imperative language. Whats the point of Haskell then? ...

Its not a problem of functional programming in itself, its a Haskell thing: While monads are quite powerful in principle IMO they are a pain to use in practice. Because monads show up in the return type of functions, the easy composition of functions which is the prime reason for the power of functional programming only works as long as all those functions use (compatible) monads, and sadly many functions simply don't exists in this form. "

hopefully Jasper's attribute types will allow the monad-ness to be treated as orthogonal to everything else.

---

ok, whether or not we have principal types, i think languages to date have focused too much attention on them. Martin Odersky says http://www.codecommit.com/blog/scala/is-scala-not-functional-enough#comment-4162 "As to combining Hindley/Milner and subtyping: In principle it’s possible, but in practice it does not work so well. The problem is not so much that you end up too often with Any, but that inferred types become impractically large.". i'll have to look at scala to see what he came up with instead.

anyhow, my point is that some extant powerful type systems, like Haskell's, force the programmer to raise the level of abstraction but still to be very precise. You end up with long type expressions because of this. Also, you have to take all the facts about the type of an object and put them together into one long type expression, making sure that they are composed together in the proper fashion. For example, monads and monad transformers; the programmer things "this is a list of ints, and it is in the IO monad". To the programmer, these are two orthogonal pieces of information about the type of the object, but in the type system, one wraps around the other, and in a certain order (and it gets worse with monad transformers).

btw i'm sure someone has used the name "attribute types" before but it's probably something different from what i mean.

with attribute types, we never try to make the programmer interact with a single expression containing all the type information about a value (although they can query the compiler for that during debugging if they want). For example, this value is a list of something; this value is in the IO monad; this value is a list of ints. The programmer can look at either of these three as the type of the value in different contexts (note that the third implies the first).

---

in jasper, what haskell calls 'existential types', which means expressions which dont completely determine a type, e.g. 'some type with a Show method' will be the rule, not the exception. this allows things like heterogenous lists. i think the usual problem with this is (esp. if you feed such a list through a function which reshapes it), then you lose the typing information on what individual items were, and all the type system knows is that they are elements of the heterogenous list, with only whatever properties are common to all items in that list. then how do you then polymorph on individual elements later?

e.g. if you construct a structure where you know that the first item in each list is a function reference, and all the rest of the items are integers, how do you get the type system to let you apply the function reference later? all it knows is that these are each members of a heterogenous list. i think the Right Thing for this case according to languages like Haskell is to create a data type (ADT) that specifically has a fn in one field and then a list of ints in another. But this imposes the type system on the programmer; in addition to the task of writing the program, they have the new task of explaining it to the compiler. And what if your knowledge about the structure of the list is hard to express; what if you know that the list is a valid AST for some language; do you want to translate the entire semantic grammar of that language to Haskell's type system?

in jasper we want to make the task of explaining the program to the compiler optional. the 'promise' assertion will help here -- 'promise' tells the compiler to take something as a given (and to check it at runtime, unless runtime promise checks are disabled). This helps because you can tie promises to positions in code, rather than have them be universal truths about data types -- 'if we are here in the code, then you may assume this'. This saves you from having to construct a new data type for every state of knowledge that you have about your data (e.g. 'if this node of the AST is under a FOR-LOOP node, then the second child is an iteration variable symbol'), and from having to encode all your knowledge about the valid state transitions into the type system (e.g. an AST node of type AST-FOR-LOOP's second child is a node of type AST-UNDER-FOR-LOOP-ITERATION-VAR). Instead at the relevant point in your code you say 'we must be in an AST node corresponding to a for loop iteration symbol; i promise that this node is of type Symbol' This makes the program harder to reason about but easier to write, and even easier to read if the reader isn't an expert at the type system. It is equivalent to having weak typing/allowing unsafe type coercion like in C.

i think the other usual problem is, if you pass elements through a function that only demands an existential type, and creates new elements, you must distinguish between functions which guarantee to create new elements of the same type it was given, or functions which guarantee only that the new elements will satisfy the existential type.

---

i guess our type system would be described as

a type and effect system, optionally static, weak, inferred, structural (with optional nominal types)

---

in Haskell, existential types let you define a type which doesnt say 'this is a list of Ints', but rather, something like, 'this is a list of objects whose types are in the Show typeclass'. In Jasper all types are existential because 'everything is an interface'; you describe the properties that a type in a certain slot has, e.e. the interfaces that types belong to, not the types themselves.

although it should not be mandatory to state which exceptions your code might throw (that's the compiler's job), it should be possible for a caller to insist that you throw only a subset of a given set of exceptions, and that anything else is a type error

could use Coq for type assertions and proving stuff about Jasper programs, in the same way than Shen uses Prolog

would eventually have to port Coq into Jasper. note that Coq's not-Turing-completeness (via requiring 'obviously decreasing' structural recursion) is key.

hmm.. in addition to the restrictions contemplated above to prevent http://stackoverflow.com/questions/3467279/how-to-create-a-polyvariadic-haskell-function , might want to have more restrictions on global type inference than on local (see http://lambda-the-ultimate.org/node/2862 ).

taints and dataflow analysis (?)

the type system can be extended to detect 'taints' or 'untaints'. an untaint is the negation of a taint. a taint can be defined either as a whitelist (anything using only operations in the whitelist is untainted; anything built only from untainted operations is untainted; everything else is tainted) or a blacklist (anything using any operations in the blacklist is untainted; anything built only from tainted operations is tainted; everything else is untainted).

Note that the taint system has negation at the top (taint = not untaint) and at the bottom (the base case can be a whitelist or a blacklist), but that the bottom must be all one polarity or all the other, not both. This should simplify things by making it unable to express all boolean functions. I'm hoping that most of the type inference system can use positive logic, perhaps with a few extensions to deal with stuff like this.

note that the necessity of making a blacklist complete would require Jasper to fix a set of primitive operations that everything is compiled to. An alternative is to first design the system to be fully expressive, but then to only support a published list of primitive properties that are guaranteed to be defined correctly (e.g. blacklisted if necessary) for every language primitive (e.g. stuff like side effect-less/side effect-ful, deterministic (the other part of referential transparency), reads, writes, moves, null). This achieves the goal of giving the language the right sort of machinery that can be repurposed to other tasks.

a taint can be calculated based on the operations used to produce a value (e.g. 'side-effect free' is an untaint of this sort; a function built out of side-effect free parts is side-effect free), or based on all operations applied to a VARIABLE (e.g. 'unique' is an untaint of this sort; the unique value could be 3 or it could be a function, but what matters is that if it ever read into another variable, then the original value is never used anymore; another one is 'const'). these two are related; e.g. if we want a variable containing a reference to be immutable, then we can only apply immutability-preserving ('const' in that variable) functions to it; but a const function if a taint of the first kind (defined inductively because a function that only applies const (in that variable) functions to that reference, and doesnt use any primitive mutation ops on it, is itself const).

we need ways to express: (a) a value or variable has the property (e.g. immutability) and (b) that an operation preserves or breaks the property (is whitelisted or blacklisted) (e.g. a function that does not write to a reference is 'const' in the D language, meaning it doesnt break the immutability)

perhaps this is not yet enough. We should be able to express D's transitive const (http://dlang.org/const-faq.html). We should be able to express Rust's pointer type system ( http://static.rust-lang.org/doc/tutorial.html#boxes-and-pointers ). We should be able to express [2].

---

swkane had a good idea that i summarized as:

" that's an interesting perspective that i hadn't considered; let the types stay complicated but try to find a way to make readable labels for them so you only have to dig into the complexity when you need to. "

--- http://www.haskell.org/haskellwiki/GADTs_for_dummies ( http://web.archive.org/web/20130702221947/http://www.haskell.org/haskellwiki/GADTs_for_dummies )brings up an excellent point: type classes are like arbitrary functions on types, with normal Haskell stuff like pattern matching, algebraic data types (multiple constructors), guards, etc, except with a confusing relational (rather than functional) syntax.

this brings up the obvious point: Jasper could be like this but use normal syntax

seems like there is really a lot of mileage to be had just by delivering a uniform notation for data, code, patterns, types, typeclasses

remember that since type signatures are optional and principal types are not the core concept, you need a way to add type signatures to someone else's library code (without modifying their source code) ---

i guess maybe the reason why e.g. Haskell has an idiosyncractic syntax for typing is that they want to syntactically prohibit things they dont know how to automatically typecheck in a decidable, relatively complete way (relative to some nicely characterizable subset of dynamically valid programs)

GHC's typechecking algorithm has expanded over the years as progress was made

i guess i think it would be better to have a very simple, uniform syntax, assumptions, and compiler type errors that are simple enough to use in debugging, then allow for the case that sometimes the typechecker simply wont be up for the task, and the programmer must use assumptions to bridge the gap

see also PrincipalTypingConsideredHarmful?