Bayle Shanks's website: proj-oot-ootTypeNotes1

if a variable can have many interface types (predicates? attribute types?), then need to decide at some point how much inference the compiler will do for you. right now i'm thinking that maybe CNF should be available but not DNF. So, for example, if you say that variable X is either of type A or B, and A implies Meters and Int, and B implies Meters and Float, then the compiler should allow you to pass X to something that demands an input of Meters -- that is, it performs inference to convert "(Meters and Int) or (Meters and Float)" to CNF form, namely, "Meters and (Int or Float)". And you could provide a case statement with "(Meters and Int) or (Meters and Float)", or a case statement with "(Int or Float)".

But not the reverse; when you provide a case statement on types, the compiler has to prove that your cases are exhausive, but if you say that X is either Meters or Feet, and then separately say that X is either Int or Float, then you cannot provide four cases ("(Meters and Int) or (Meters and Float) or (Feet and Float) or (Feet and Int)") -- the DNF of what is known about X is not computed. However, you could have a case statement with Meters or Feet, and then a nested case statement with Int or Float.

To give examples with more inference, you could say that X is "(Meters and Int) or (Meters and Float) or (Feet and Float)". Now you could have a case with "Meters or Feet", and in the Feet branch, you could assume it was Float.

so, the verification algorithm for verifying predicate P given known information X: if X is a conjunction, then for each conjunct C in X, search for syntactic matches of C (or weaker; so if C is "A or B", then "A or B or D" matches) in P (including subexpressions of P), and replace them with True (and simplify by removing the Trues, ie. "A and True" -> "A"). (to save time later, you might want to remove atomic conjuncts from X after doing this, assuming you won't expand some named type later and find one of those inside of it). if P becomes True, return Match

  if P is atomic, and wasn't changed to True, return False

  if P is a conjunction, then recursively test each conjunct, and return Match iff they all return Match

  else, P must be a disjunction. return Match if, for any conjunct C of X (C must be a disjunction) (if X is a disjunction, then you have only one C, namely X itself), this returns Match:
       
       test every disjunct Dx in C as follows, and return Match iff they all return Match:
         if Dx is syntactically weaker than P (unnecessary; will be replaced in step 1 of recurse), or some disjunct of P (is that step neccessary? i guess not, for the same reason), return Match
         o/w, recurse, matching Dx against P

todo: add XORs, negation

some examples:

if you say X is "(Meters and Int and Red and Tall) or (Meters and Float and Red and Tall) or (Feet and Float and Red and Tall) or (Feet and Int and Blue and Tall)"

case on "Meters or Feet" and then if it's meters, can you assume it's Red? yes

case on "Red or Blue" and then if it's Red, case on "(Meters and Int and Tall) or (Meters and Float and Tall) or (Feet and Float and Tall)"

case on "Red or Blue" and then if it's Red, case on "(Meters and Int) or (Meters and Float) or (Feet and Float)", and then assume Tall

If you said X is "(Meters and Int and C) or (Meters and Float and D) or (Feet and Float or Red)", and C was "(Int and Red) or (Float and Blue)", and D was "(Float and Red) or (Int and Blue)", you cannot assume that X is Red, because you cannot express that Int and Float are mutually exclusive, because the system is monotonic (ppl also say "monotone"). or mb i should say "positive", b/c not only are the results of deductions in the system monotonic, but also all of the expressions comprehended by the system involve only positive literals.

for example, if you say X is "(Meters and ((Int and Red) or (Float and Red)) and Tall) or (Feet and ((Float and Red) or (Int and Blue)) and Tall)", then you can do

cases "(Tall and Red) or Feet" cases "Red or (Feet and ((Float and Red) or (Int and Blue)))" cases "Red or Blue" cases "Red or (Int and Blue)"

what you cant do is X = "(A or B) and (C or D)", cases "(A and C) or (A and D) or (B and C) or (B and D)"

the reason you cant do X = "(A or B) and (C or D)", cases P="(A and C) or (A and D) or (B and C) or (B and D)" is that the information from the different disjunctions in X (the two different conjuncts, that is) is never combined; the algorithm iterates over ["(A or B)", "(C or D)"] in the line "for any conjunct C of X". Each of the tests will fail, because "A or B" is not strong enough to imply P, and neither is "C or D". in other words, looking at this as algebraic simplification, "and" is never distributed over "or".

for the same reason, you cant do

X = "(A or C) and (A or D) and (B or C) and (B or D)" (cases) P = "(A and B) or (C and D)"

the reverse of both of these matches

however, note that if

X = "(Meters and ((Int and Red) or (Float and Red)) and Tall) or (Feet and ((Float and Red) or (Int and Blue)) and Tall)", then you can do P = "Meters or Feet"

then within the both case branches, you are permitted to assume Tall, and within the first branch, you may assume Red. This is because the compiler transparently propagates all assumptions back up to the case statement (or fn defn), and so effectively does

P = "(Meters and Tall and Red) or (Feet and Tall)"

perhaps that is misleading, tho? what if you did

X = "(Meters and ((Int and Red) or (Float and Red)) and Tall) or (Feet and ((Float and Red) or (Int and Blue)) and Tall)", then you can do P = "Red or Feet"

and then assumed Meters in the first branch? If this is transparently added to the condition, then things which are "Feet and Float and Red" will always go through the second branch, whereas just from looking at the case statement as written, it is ambiguous which branch these examples go thru (they match both conditions). i guess the easiest way to deal with this is to leave it "undefined" which branch of a case statement things go thru in case of ambiguity (i.e. when they fit the conditions for multiple branches).

but what about when someone wants to have a sort of case statement where instances go thru EVERY applicable branch, rather than ANY applicable branch? well, in that case, i guess you'd not automatically add other assumed things; the code in the branches of such a conditional should assume that the ONLY type information that can be assumed is what is in the conditional. but, anyway, the most common time when P will be a disjunct is not when there is an explicit case statement, but rather when some variable is required to be a member of a disjunctive type

(note that, in general, the facility for proving that case statements are exhaustive removes the need for defaults in such cases -- maybe a syntactic sugar transformation that transforms case statements without defaults into something else? all cases would have to be by type for this to work)

examples: cases "(Tall and Red) or Feet": P = (Tall and Red) or Feet X = "Tall and ((Red and (Int or Float)) or (Feet and ((Float and Red) or (Int and Blue))))"

P = Red or Feet X = "(Red and (Int or Float)) or (Feet and ((Float and Red) or (Int and Blue))"

dx1 = (Red and (Int or Float)) P = Red or Feet

dx1 = (Int or Float) P = True or Feet dx1 matches

dx2 = (Feet and ((Float and Red) or (Int and Blue)) P = Red or Feet

dx2 = (Float and Red) or (Int and Blue) P = Red or True dx2 matches

both disjuncts match, Match

cases "Red or (Feet and ((Float and Red) or (Int and Blue)))" P = Red or (Feet and ((Float and Red) or (Int and Blue))) X = Tall and ((Red and (Int or Float)) or (Feet and ((Float and Red) or (Int and Blue))))

P = Red or (Feet and ((Float and Red) or (Int and Blue))) X = (Red and (Int or Float)) or (Feet and ((Float and Red) or (Int and Blue)))

dx1 = Red and (Int or Float) P = Red or (Feet and ((Float and Red) or (Int and Blue)))

dx1 = Int or Float P = True or (Feet and ((Float and Red) or (Int and Blue))) dx1 matches

dx2 = Feet and ((Float and Red) or (Int and Blue)) P = Red or (Feet and ((Float and Red) or (Int and Blue)))

dx2 = (Float and Red) or (Int and Blue) P = Red or (True and ((Float and Red) or (Int and Blue)))

dx2 = (Float and Red) or (Int and Blue) P = Red or ((Float and Red) or (Int and Blue))

dx2 = (Float and Red) or (Int and Blue) {still in step 1} P = Red or True dx2 matches

both disjuncts match, Match

cases "Red or Blue" ? X = Tall and ((Red and (Int or Float)) or (Feet and ((Float and Red) or (Int and Blue)))) P = Red or Blue

X = (Red and (Int or Float)) or (Feet and ((Float and Red) or (Int and Blue))) P = Red or Blue

dx1 = Red and (Int or Float) P = Red or Blue

dx1 = Int or Float P = True or Blue dx 1 matches

dx2 = Feet and ((Float and Red) or (Int and Blue)) P = Red or Blue

dx2 = (Float and Red) or (Int and Blue) P = Red or Blue

dx21 = Float and Red P = Red or Blue

dx21 = (True) P = True or Blue dx21 matches

dx22 = Int and Blue P = Red or Blue dx22 matches

both disjuncts match, dx2 matches both disjuncts match, Match

cases "Red or (Int and Blue)" this'll work too -- it'll be the same as the previous until dx22, which will match P directly rather than one of its conjuncts matching

(note: we should not expand atoms until we get to them (or rather, until we get just above them; if we are in an OR, we must expand the atoms now in case they are ORs too, and then for those atoms which turn out to be conjuctions, we can leave them that way, for those that turn out to be ORs, we must expand each of their atomic disjuncts), and we should keep a running list of what has been set to True)

X = "(A and B) or (C and D)" (cases) P = "(A or C) and (A or D) and (B or C) and (B or D)"

matches

X = (A and C) or (A and D) or (B and C) or (B and D)" (cases) P="(A or B) and (C or D)"

matches

as per doug's idea, if something doesn't typecheck, optionally the compiler can run an NP-complete algorithm to try and make it typecheck (equivalently, compiler option to use NP-complete alg from the get-go)

each symbol in the program is associated with a node. the arcs from that node are various properties possessed by the symbol (e.g. the symbol possesses each of these properties simultaineously).

---

the type system must:

be extensible enough to easily support milewski's race-free type system (see below) as an extension (we'll include this in the std language, but it's a good example of the kind of extensibility that we want)
- this probably requires an extensible form of data flow analysis in order to support deduction of e.g. whether a function preserves uniqueness
a way to express "for function f(x,y), if argument x is const, then we return a const" (e.g. you shouldn't have to define the fn twice like in C++, see http://bartoszmilewski.com/2009/06/15/race-free-multithreading-owner-polymorphism/ )
a way to express a type demand vs. a type promise (e.g. D's immutable vs. const)
support typeclasses (and through them, ad-hoc polymorphism)
support parametric polymorphism
support checking arbitrary preconditions and postconditions and propagating the relevant assertions
support polymorphic dispatch based on type
be optional (equivalently, support an 'anything' type); for untyped things, the effect is that polymorphic dispatch based on type will be done at runtime
deduce the purity of functions (assuming that impure primitives are annotated as such)
not be terribly complicated

---

milewski's race-free type system:

http://bartoszmilewski.com/2009/05/21/unique_ptr-how-unique-is-it/ :

some type properties:

immutable "compiler, i demand that this cannot be mutated"
mutable: not immutable
const "compiler, i promise (and you must verify) that in this function i will not mutate this"
unique (relevant to pointer only): "compiler, i promise that at any one time there will be only one copy of this pointer)
lent "compiler, i promise (and you must verify) that in this function i will not alias (make an additional copy of without deleting the original) this pointer"
owned: not lent
owner: each object has an 'owner' object
- if an object is self-owned, it can be accessed by multiple threads (he calls this a "monitor")
- if an object is self-owned, all of its methods are implicitly synchronized
- if an object is self-owned, it cannot have a thread-local member.
by default, an object's owner is 'thread', meaning it is a threadlocal
unique and immutable objects' owners are 'unique' and 'immutable' respectively
instance variables are owned by their object, by default (another owner may be specified if desired, usually self-ownership e.g. monitors)
the root of each ownership tree (except for unique, immutable, and threadlocals) is associated with a lock. this lock must be held in order to access any owned object (not just write, but also reading, esp b/c of sequential non-concurrency)
except if an object is declared as 'lockfree', in which case no lock is necessary (but the compiler does ensure sequential concurrency; analogous to 'volatile' in Java or strong (not weak) atomics in C++). lockfree objects are not protected by the system, e.g. this is the way to escape from the system
you cannot embed a threadlocal inside a monitor
in his implementation, if you try to access an instance variable of a monitor without going thru its methods (which are implicitly synchronized), and without being within a synchronization block for that monitor (i.e. acquiring the lock for that monitor) you'll get a compile-time error. but for oot i think the compiler would just automatically enclose the statement within a synchronize block.
if one object was owned by another object without the former being embedded in the latter, then the owner must be declared 'final'. This is because the ownership relation must be unchangable throughout the life of the program.
Don’t share local (stack) variables—they might disappear
note: 'immutable' and 'unique' must be deep (recursive), not shallow
if immutable objects may be mutated during creation, make sure they aren't shared yet
note that value types (like int) are by definition immutable

i believe the previous is a complete description of milewsky's system, (except for owner polymorphism, which i think can be done in a simple flexible way in Oot with annotations anyway), but should ask milewsky to be sure.

in oot:

i think the 'lent' and 'const' could be inferred even if not declared
we probably want to make 'shared' the default rather than 'threadlocal'

additional operation: ':=', called "move": x := y == x = y; y = null;. The idea is that x gets the pointer and y doesn't have it anymore; this can be applied even to unique pointers, whereas normal =, which means copy, cannot.

you can declare a pointer with the type propery (other examples of type properties are , meaning "compiler, i demand that this cannot be mutated" and const, meaning)

his type system apparently provably avoids race condititions (both 'low level' and 'high level' race conditions, whatever that means). it does not prevent deadlocks.

Lessons learned from previous work

There are many commonalities in various approaches to race freedom.

    Ownership should be built into the type system. In OO languages each object must have an owner. In C-like languages each piece of data must have an associated lock.
    There should be efficient mechanisms for passing data between threads
        By value
        By reference. (The object being passed must be a monitor.)
        As immutable data, by const reference
        By unique reference using move semantics

Most of those ideas are expressible through type qualifiers. "

other sources: http://bartoszmilewski.com/2009/11/30/unique-objects/

http://bartoszmilewski.com/2009/09/22/ownership-systems-against-data-races/ http://bartoszmilewski.com/2009/06/15/race-free-multithreading-owner-polymorphism/ http://bartoszmilewski.com/2009/06/02/race-free-multithreading-ownership/ http://bartoszmilewski.com/2009/09/22/ownership-systems-against-data-races/

" The D programming language with my proposed race-free type system could dramatically improve the safety of message passing. Race-free type system distinguishes between various types of sharing and enforces synchronization when necessary. For instance, since an Actor would be shared between threads, it would have to be declared shared. All objects inside a shared actor, including the mailbox, would automatically inherit the shared property. A shared message queue inside the mailbox could only store value types, unique types with move semantics, or reference types that are either immutable or are monitors (provide their own synchronization). These are exactly the types of messages that may be safely passed between actors. Notice that this is more than is allowed in Erlang (value types only) or Kilim (unique types only), but doesn’t include “dangerous” types that even Scala accepts (not to mention Java or C++). "

---

note: i guess that in general, attributes that you want the type system to keep track of have a polarity; if they are absent, that says nothing about the system, but if they are present, some proposition is true. e.g. if a variable is immutable, that constrains something, but if it is not immutable, that constrains nothing.

however, sometimes you may need to assert that something is immutable, and sometimes you may need to assert that it is not immutable, right? not exactly... you can look at it as, sometimes (e.g. for milewski race-freedom) you need to prove that something is immutable. functions that mutate it in this viewpoint don't demand that it be mutable, they just mutate it and then the compiler infers that the immutability condition isn't true for that variable, and then the other part of the program, that demands immutability, doesn't compile.

and with structural typing, you might demand that something has such-and-such a member or supports such-and-such an operation, but you wouldn't ever need to demand that such-and-such a member is absent.

otoh immutable types cannot have a (usable) __set function. but i think we can neglect that level of expressiveness for tractability.

immutability and the like, then, are calcuated by specifying, for a basis set of primitive values, whether or not they are immutable, and then for every function, specifying upon which inputs/outputs some forbidden operation has occured. this idea also works for 'unique'.

e.g. for 'f(x) = y := x; y', where := is the move operator (i'd use ->), the compiler would infer, 'does not violate uniqueness for x; does not violate uniqueness for y', based on its knowledge that the primitive := does not violate the uniquness of either its input nor its output.