Bayle Shanks's website: proj-oot-ootNotes3

http://bartoszmilewski.com/2009/05/21/unique_ptr-how-unique-is-it/ :

some type properties:

immutable "compiler, i demand that this cannot be mutated"
mutable: not immutable
const "compiler, i promise (and you must verify) that in this function i will not mutate this"
unique (relevant to pointer only): "compiler, i promise that at any one time there will be only one copy of this pointer)
lent "compiler, i promise (and you must verify) that in this function i will not alias (make an additional copy of without deleting the original) this pointer"
owned: not lent
owner: each object has an 'owner' object
- if an object is self-owned, it can be accessed by multiple threads (he calls this a "monitor")
- if an object is self-owned, all of its methods are implicitly synchronized
- if an object is self-owned, it cannot have a thread-local member.
by default, an object's owner is 'thread', meaning it is a threadlocal
unique and immutable objects' owners are 'unique' and 'immutable' respectively
instance variables are owned by their object, by default (another owner may be specified if desired, usually self-ownership e.g. monitors)
the root of each ownership tree (except for unique, immutable, and threadlocals) is associated with a lock. this lock must be held in order to access any owned object (not just write, but also reading, esp b/c of sequential non-concurrency)
except if an object is declared as 'lockfree', in which case no lock is necessary (but the compiler does ensure sequential concurrency; analogous to 'volatile' in Java or strong (not weak) atomics in C++). lockfree objects are not protected by the system, e.g. this is the way to escape from the system
you cannot embed a threadlocal inside a monitor
in his implementation, if you try to access an instance variable of a monitor without going thru its methods (which are implicitly synchronized), and without being within a synchronization block for that monitor (i.e. acquiring the lock for that monitor) you'll get a compile-time error. but for oot i think the compiler would just automatically enclose the statement within a synchronize block.
if one object was owned by another object without the former being embedded in the latter, then the owner must be declared 'final'. This is because the ownership relation must be unchangable throughout the life of the program.
Don’t share local (stack) variables—they might disappear
note: 'immutable' and 'unique' must be deep (recursive), not shallow
if immutable objects may be mutated during creation, make sure they aren't shared yet
note that value types (like int) are by definition immutable

i believe the previous is a complete description of milewsky's system, (except for owner polymorphism, which i think can be done in a simple flexible way in Oot with annotations anyway), but should ask milewsky to be sure. according to http://fpcomplete.com/the-downfall-of-imperative-programming/ , he's since embraced functional programming to avoid data races, but i'm guessing he still thinks the above methodology would do the trick.

in oot:

i think the 'lent' and 'const' could be inferred even if not declared
we probably want to make 'shared' the default rather than 'threadlocal'

additional operation: ':=', called "move": x := y == x = y; y = null;. The idea is that x gets the pointer and y doesn't have it anymore; this can be applied even to unique pointers, whereas normal =, which means copy, cannot.

you can declare a pointer with the type propery (other examples of type properties are , meaning "compiler, i demand that this cannot be mutated" and const, meaning)

his type system apparently provably avoids race condititions (both 'low level' and 'high level' race conditions, whatever that means). it does not prevent deadlocks.

he also says "Such system would be incomplete without some support for lock-free programming. There are very good reasons for the language to enforce sequential consistency. For comparison, C++ committee, after a long struggle, decided to allow non-SC primitives (weak atomics). Java only enforces SC for volatile variables. " then goes on to introduce the lockfree qualifier, which means "A lockfree variable is guaranteed to be always atomically accessed in a sequentially consistent way." He has noted elsewhere that 'low level' race conditions defeat even lockfree algorithms.

not quite sure what this means. i guess sequential consistency is a guarantee that sequential code will always execute sequentially? is he saying this is good and that C++ has gone too far? or is he saying that you only need it when you are writing lockfree algorithms, which is why it seems his language has SC off by default but on when the lockfree qualifier is used? i think the latter. he notes this example which i don't entirely understand:

Speaking of lock-free programming, here’s the implementation of the infamous Double-Checked Locking Pattern:

class Singleton<T> { private: lockfree T _value; public: T get() lockfree { if (_value is null) { synchronized(this) { if (_value is null) _value = new T; } return _value; } } }

A lockfree variable is guaranteed to be always atomically accessed in a sequentially consistent way. A lockfree method is not synchronized, but it can only operate on lockfree variables (or use a synchronized section). Even though the use of lockfree doesn’t introduce low-level races, it may create high-level races when accessing more that one lockfree variable at a time. But we all know that lock free programming is only for desperadoes. "

Lessons learned from previous work

There are many commonalities in various approaches to race freedom.

    Ownership should be built into the type system. In OO languages each object must have an owner. In C-like languages each piece of data must have an associated lock.
    There should be efficient mechanisms for passing data between threads
        By value
        By reference. (The object being passed must be a monitor.)
        As immutable data, by const reference
        By unique reference using move semantics

Most of those ideas are expressible through type qualifiers. "

other sources: http://bartoszmilewski.com/2009/11/30/unique-objects/

http://bartoszmilewski.com/2009/09/22/ownership-systems-against-data-races/ http://bartoszmilewski.com/2009/06/15/race-free-multithreading-owner-polymorphism/ http://bartoszmilewski.com/2009/06/02/race-free-multithreading-ownership/ http://bartoszmilewski.com/2009/09/22/ownership-systems-against-data-races/

---

 "Andrei insists that "shared" should guarantee the absence of low level data races. Whenever I give examples of how this makes performance unpredictable, I hear the answer: "cast shared away". Currently nobody complains about it because this feature hasn't even been implemented. "

" D1 to be discontinued on December 31, 2012 (after the end of the world) by andralexin programming

[–]DrBartosz? 7 points 8 months ago

I'd look at existing solutions (Chapel, C++AMP, and even Haskell) as a starting point for task-based parallelism and GPU. I like the ownership system and it would be nice to have it in the language, but that's not the highest priority. What I'd like to see is better support for move semantics/uniqueness, which is essential for thread/task safety. I'm also not sure what to think about the "shared" experiment. The idea seemed good at the time, but we could never agree (or I could never agree with the rest of the team) about the details. So now it seems like a half baked idea (essentially, in order to do anything interesting, you have to cast "shared" away). "

much earlier (i am writing this in aug 2012) he talks about http://bartoszmilewski.com/2008/07/30/sharing-in-d/ and http://bartoszmilewski.com/2008/08/11/sharing-and-unsharing-of-data-between-threads/ . if his objections are unchanged since then, the performance objections are:

" The first is performance–not for shared objects, mind you, but for the non-shared ones. Walter tells us that accessing a thread-local static variable adds between 1 to 3 instructions of overhead. That seems quite reasonable. Especially considering that in multi-threaded environment the use of global non-shared variables is rarely correct.

There is also a performance penalty when starting a new thread–all static variables it has access to have to be default-initialized, plus all module constructors have to be called. This might amount to quite a bit. We will recommend not to overuse global variables and module constructors. The way to amortize this cost is to create thread pools. "

another objection seems to be the difficulty of 'unsharing' objections, and of resharing them after unsharing them. this sort of thing is necessary if you want to get a shared object from another thread and then pass it into an unknown library.

also " Note to D programmers: The current semantics of D “shared” is slightly different from my proposal. For instance, it forces all embedded objects to be monitors (their methods must be synchronized by their own lock), requires explicit use of the synchronized keyword, and forces all access in non-synchronized methods to be sequentially consistent. (And it doesn’t guarantee freedom from races.)"

dunno if this is still his latest.

btw the 'promise' vs. 'demand' thing w/r/t const vs. immutable and lent vs. unique (lent should be called noalias or somesuch) seems pretty common (wasn't this also the deal with Haskell's existentially quantified types or maybe with Haskell typeclasses vs. Java interfaces? or were those other things?). should have syntax for it, so you can just say 'i promise it will be immutable' and 'i demand it be immutable' without learning two keywords for each of these cases and remembering which one is which (or at least, have a keyword naming convention like promiseImmutable demandImmutable)

---

toread: http://bartoszmilewski.com/2011/01/05/using-f-sequences-and-pipelines/ http://bartoszmilewski.com/2010/09/11/beyond-locks-software-transactional-memory/

--- http://bartoszmilewski.com/2010/05/11/parallel-programming-with-hints/

---

" The standard concern is that new type qualifiers introduce code duplication. The classic example is the duplication of getters required by the introduction of the const modifier:

class Foo { private Bar _bar; public: Bar get() { return _bar; } const Bar get() const { return _bar; } } "

in Oot, Oot should be able to infer that the const-ness holds iff certain of the args passed in are const

---

STM

" Back to shared memory, this time without locks, but with transactional support. STM, or Software Transactional Memory, is a relatively new paradigm that’s been successfully implemented in Haskell, where the type system makes it virtually fool-proof. So far, implementations of STM in other languages haven’t been as successful, mostly because of problems with performance, isolation, and I/O. But that might change in the future–at least that’s the hope. "

" In Chapel, you use the forall statement for loops or begin to start a task. "

" All three HPCS languages chose fine-grained tasks (activities in X10, implicit threads in Fortress), not threads, as their unit of parallel execution. A task may be mapped to a thread by the runtime system but, in general, this is not a strict requirement. Bunches of small tasks may be bundled for execution within a single system thread. "

" To make things more concrete, let’s say you want to distribute a (not so big) 4×8 matrix among currently available locales by splitting it into blocks and mapping each block to a locale. First you want to define a distribution–the prescription of how to distribute a block of data between locales. Here’s the relevant code in Chapel:

const Dist = new dmap(new Block(boundingBox=[1..4, 1..8]));

A Block distribution is created with a bounding rectangle of dimension 4×8. This block is passed as an argument to the constructor of a domain map. If, for instance, the program is run on 8 locales, the block will be mapped into 8 2×2 regions, each assigned to a different locale. Libraries are supposed to provide many different distributions–block distribution being the simplest and the most useful of them.

When you apply the above map to a domain, you get a mapped domain:

var Dom: domain(2) dmapped Dist = [1..4, 1..8];

Here the variable Dom is a 2-D domain (a set of indices) mapped using the distribution Dist that was defined above. Compare this with a regular local domain–a set of indices (i, j), where i is between 1 and 4 (inclusive) and j is between 1 and 8.

var Dom: domain(2) = [1..4, 1..8];

Domains are used, for instance, for iteration. When you iterate over an unmapped domain, all calculations are done within the current locale (possibly in parallel, depending on the type of iteration). But if you do the same over a mapped domain, the calculations will automatically migrate to different locales.

This model of programming is characterized by a very important property: separation of algorithm from implementation. You separately describe implementation details, such as the distribution of your data structure between threads and machines; but the actual calculation is coded as if it were local and performed over monolithic data structures. That’s a tremendous simplification and a clear productivity gain. "

" The combination of STM and PGAS in Chapel necessitates the use of distributed STM, an area of active research (see, for instance, Software Transactional Memory for Large Scale Clusters).

In Chapel, you not only have access to atomic statements (which are still in the implementation phase) and barriers, but also to low level synchronization primitives such as sync and single variables–somewhat similar to locks and condition variables. The reasoning is that Chapel wants to provide multi-resolution concurrency support. The low level primitives let you implement concurrency in the traditional style, which might come in handy, for instance, in MPMD situations. The high level primitives enable global view programming that boosts programmer productivity.

However, no matter what synchronization mechanism are used (including STM), if the language doesn’t enforce their use, programmers end up with data races–the bane of concurrent programming. The time spent debugging racy programs may significantly cut into, or even nullify, potential productivity gains. Fortress is the only language that attempted to keep track of which data is shared (and, therefore, requires synchronization), and which is local. None of the HPCS languages tried to tie sharing and synchronization to the type system in the way it is done, for instance, in the D programming language (see also my posts about race-free multithreading). "

" Here’s the tongue-in-cheek summary of the trends which, if you believe that the HPCS effort provides a glimpse of the future, will soon be entering the mainstream:

    Threads are out (demoted to latency controlling status), tasks (and semi-implicit parallelism) are in.
    Message passing is out (demoted to implementation detail), shared address space is in.
    Locks are out (demoted to low-level status), transactional memory is in."

" Message passing’s major flaw is the inversion of control–it is a moral equivalent of gotos in un-structured programming (it’s about time somebody said that message passing is considered harmful). MP still has its applications and, used in moderation, can be quite handy; but PGAS offers a much more straightforward programming model–its essence being the separation of implementation from algorithm. The Platonic ideal would be for the language to figure out the best parallel implementation for a particular algorithm. Since this is still a dream, the next best thing is getting rid of the interleaving of the two in the same piece of code. "

" But if you send a message to one thread and set up a handler for the result in another, or have one big receive or select statement at the top to process heterogeneous messages from different sources, you are heading towards the land of spaghetti. If you’re not careful, your program turns into a collection of handlers that keep firing at random times. You are no longer controlling the flow of execution; it’s the flow of messages that’s controlling you. Again, programmer productivity suffers. (Some research shows that the total effort to write an MPI application is significantly higher than that required to write a shared-memory version of it.)"

STM:

" (Transactions unfortunately do not address one other issue, which turns out to be the most fundamental of all: sharing. Indeed, TM is insufficient – indeed, even dangerous – on its own is because it makes it very easy to share data and access it from multiple threads; first class isolation is far more important to achieve than faux isolation. This is perhaps one major difference between client and server transactions. Most server sessions are naturally isolated, and so transactions only interfere around the edges. I’m showing my cards too early, and will return to this point much, much later in this essay.) '" -- http://www.bluebytesoftware.com/blog/2010/01/03/ABriefRetrospectiveOnTransactionalMemory.aspx

---

" Everything I described so far is common to CSP (Communicating Sequential Processes) and the Actor model. Here’s what makes actors more general:

Connections between actors are dynamic. Unlike processes in CSP, actors may establish communication channels dynamically. They may pass messages containing references to actors (or mailboxes). They can then send messages to those actors. Here’s a Scala example:

receive { case (name: String, actor: Actor) => actor ! lookup(name) }

The original message is a tuple combining a string and an actor object. The receiver sends the result of lookup(name) to the actor it has just learned about. Thus a new communication channel between the receiver and the unknown actor can be established at runtime. (In Kilim the same is possible by passing mailboxes via messages.) " -- http://bartoszmilewski.com/2009/07/16/on-actors-and-casting/

" The D programming language with my proposed race-free type system could dramatically improve the safety of message passing. Race-free type system distinguishes between various types of sharing and enforces synchronization when necessary. For instance, since an Actor would be shared between threads, it would have to be declared shared. All objects inside a shared actor, including the mailbox, would automatically inherit the shared property. A shared message queue inside the mailbox could only store value types, unique types with move semantics, or reference types that are either immutable or are monitors (provide their own synchronization). These are exactly the types of messages that may be safely passed between actors. Notice that this is more than is allowed in Erlang (value types only) or Kilim (unique types only), but doesn’t include “dangerous” types that even Scala accepts (not to mention Java or C++). "

" The two topics that stood out were: functional programming and DSLs. It seems like every language tries to assimilate some functional elements in order to deal with parallelism. Still, the problem is that in most languages the added safety features for concurrency are not enforceable and don’t compose well. A compiler can potentially check whether a lambda passed to a parallel loop doesn’t modify shared state, but it can’t extend this check to functions called from that lambda. The solution would be an effect system, which includes side effects of a function in its signature, but effect systems are extremely hard to program with. Compare this with Haskell monads, which serve as a simple but safe effect system. Unfortunately many imperative programmers have fits when they hear the M word. (I remember C programmers having the same reaction to the word “object.”) "

---

" The especially interesting Microsoft talk was about support for asynchronous programming in C# and VB. The speaker was Mads Torgersen, the organizer of the conference. Mads described a coroutine-style programming model where you can block in any part of your code using await and, when an event is triggered, resume execution right where you stopped. In the meanwhile, your program can continue executing other code — on the same thread. (You might recognize here a continuation monad in disguise.) This is a perfect model for GUI applications that call asynchronous operations. I talked to Herb Sutter about it later and he said that a very similar thing is being proposed for C++ (I’m going to be on the C++ sub-committee for parallelism with him, and I’ll report more details). " --- " Herb also had a talk about C++11 and how different it feels from the previous versions. For me the really big deal is move semantics that tips the scales from passing by reference to passing by value — where passing by value is really passing by reference behind the scenes."

---

" The newest addition to C# is support for asynchronous interfaces (somewhat similar to Boost::ASIO). This is a hot topic at Microsoft because the new Windows 8 runtime is based on asynchronous API — any call that might take more than 50ms is implemented as an asynchronous API. Of course you can program to asynchronous API by writing completion handlers, but it’s a very tedious and error-prone method. Microsoft’s Mads Torgersen described how C# offers several layers of support for asynchronous programming.

But what caught my interest was how C# deals with composition of futures (they call them task objects). They have the analog of an aggregate join called WhenAll? and an equivalent of “select” called WhenAny?. However these combinators do not block; instead they return new futures. There is another important combinator, ContinueWith?. You give it a function (usually a lambda) that will be called when the task completes. And again, ContinueWith?