proj-oot-concurrency

Concurrency ideas

transactions

futures

 (see http://en.wikipedia.org/wiki/Future_%28programming%29)

when a future is read, you just keep going asynchronously until its value is needed, i.e. it has lazy semantics (see Promise pipelining and read-only views on http://en.wikipedia.org/wiki/Future_%28programming%29)

afaict, most of the complexity of futures is just b/c you are implementing a bit of lazy semantics within a non-lazy world -- in a lazy language they would be simpler (afaict, the only trick is to easily send the message when the future is created, although the actual value is lazily returned)

 4 kinds (see http://en.wikipedia.org/wiki/Future_%28programming%29, Blocking vs non-blocking semantics)
  futures can just be resolved normally via referentially transparent lazy evaluation
   ordinary futures cannot be synchronized (compiler error)
   blocking futures are the same, but block if synched
     either kind can support status reading, i.e. reading if the future has been resolved yet. this introduces nondeterminism, tho!! must be marked with &
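Python's concurrent.futures behaves like the blocking, status-readable kind -- a quick sketch (the 6 * 7 task is just a placeholder):

```python
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(lambda: 6 * 7)   # "eager step": work starts now
    pending_check = fut.done()         # status read; answer is nondeterministic
    answer = fut.result()              # "lazy step": blocks until resolved
```

note that fut.done() is exactly the nondeterministic status read discussed above: its result depends on scheduling.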

note: an example of blocking futures that don't support status reading is Oz's dataflow variables, accd. to wikipedia

note: an example of unsynchronizable futures that don't support status reading is E's remote promises, accd. to wikipedia

note: you want to tell the future to start doing its thing before its value is needed (i.e. what if obtaining its value involves a high-latency roundtrip to another machine on the network? you want to send the message asap, and then when you get the return message, that's the value; if the value is needed before that, you block); "eager futures"; so although the value of the future can be resolved lazily, you want it to start doing its thing before that

note: constraint variables may act like futures in this regard, that you might want to start solving the constraint before you need the solution

the section "Semantics of futures in the Actor model" in the wikipedia page is instructive. it highlights that although, above, i was treating futures as a generic thingee whose evaluation's "eager step" and "lazy step" can be anything (btw, i think this conception of futures may be novel/my own), part of the point of futures is to be a (syntactically lightweight?) way for one concurrent process to ask for some information from another concurrent process (also providing a way for exceptions to be thrown across process boundaries). so make sure this use case is ez and concise.

note: how do neurons deal with this? could neural assemblies possibly "block"? you'd think they'd have to -- this is like the asynchronous replacement for an IC's clock signal. perhaps neural oscillations are not just clock signals but also "blocking signals" for some local domain?

synchronizing variables (m-vars)

http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent-MVar.html

wikipedia traces these to the Id language, and says "M-vars support atomic operations to "take" or "put" the current value, where taking the value also sets the M-var back to its initial "empty" state"
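a minimal MVar sketch in Python, assuming the take/put semantics quoted above (a Condition guards the full/empty state):

```python
import threading

class MVar:
    """take/put in the Id/Haskell MVar style: take empties the var, put fills it."""
    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def put(self, value):                  # blocks while the MVar is full
        with self._cond:
            while self._full:
                self._cond.wait()
            self._value, self._full = value, True
            self._cond.notify_all()

    def take(self):                        # blocks while empty; resets to empty
        with self._cond:
            while not self._full:
                self._cond.wait()
            self._full = False
            self._cond.notify_all()
            return self._value
```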

synchronization barriers

all processes in the barrier group must wait for each other to arrive at the barrier before they cross the barrier
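Python's threading.Barrier implements exactly this; a small sketch (the worker function and the "arrived"/"crossed" tags are made up for illustration):

```python
import threading

barrier = threading.Barrier(3)        # a barrier group of three processes
results = []                          # list.append is atomic in CPython

def worker(i):
    results.append(("arrived", i))
    barrier.wait()                    # block until all three have arrived
    results.append(("crossed", i))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

since the barrier only releases after every thread has appended "arrived", all "arrived" entries precede all "crossed" entries.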

monitors

an object with methods, only one of which may be operating on a given instance of an object at a time

also has "condition variables", which are boolean expressions such that a client can call a monitor method conditionally upon the condition variable becoming true. if the C.V. is false, the caller is added to a queue, and is executed when the condition variable becomes true (after the others preceding it in the queue). instead of using expressions, the C.V. can be managed manually, i.e. there is a "wait" operation that blocks until the C.V. is true, and a "signal" operation that sets the value of the C.V.
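a sketch of the manual wait/signal style using Python's threading.Condition (QueueMonitor is a made-up example; the lock plays the monitor's one-method-at-a-time role):

```python
import collections
import threading

class QueueMonitor:
    """Monitor sketch: the lock admits one method at a time; not_empty is a
    condition variable used in the manual wait/signal style."""
    def __init__(self):
        self._lock = threading.Lock()
        self._not_empty = threading.Condition(self._lock)
        self._items = collections.deque()

    def put(self, x):
        with self._lock:
            self._items.append(x)
            self._not_empty.notify()      # "signal": wake one queued waiter

    def get(self):
        with self._lock:
            while not self._items:        # re-check after every wakeup
                self._not_empty.wait()    # "wait" until the condition holds
            return self._items.popleft()
```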

in oot, i guess a monitor would be specified as a value along with an associated list of functions that take that value as an argument (and a specification, for ea. fn, which argument it is)

threads vs processes

shared memory vs not

fork

fork on list, i.e.

 x = [1 "a" 3] fork

creates 3 processes, each with x equal to a different thing

(probably better just to do this with the usual looping constructs, i.e. forkmap, forkfold, etc)
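a forkmap sketch in Python (using threads rather than real forked processes; the name forkmap comes from the note above):

```python
from concurrent.futures import ThreadPoolExecutor

def forkmap(f, xs):
    """Run f once per element, each call in its own worker; collect results in order."""
    with ThreadPoolExecutor(max_workers=len(xs)) as pool:
        return list(pool.map(f, xs))
```

so `forkmap(str, [1, "a", 3])` runs three workers, one per element, mirroring the `x = [1 "a" 3] fork` idea.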

functional reactive

w/ discrete and continuous time. continuous: Yampa, FrTime. discrete: Esterel, Lustre, Signal

semaphores

any/all to fork

channels (see occam, csp, pi-calculus)

is this just like a syntax-supported version of arbitrary streams, e.g. stdin, stdout?
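a channel sketch using Python's queue.Queue as the CSP-style channel (the producer/consumer split and the None close-sentinel are illustrative choices):

```python
import queue
import threading

def producer(ch):
    for i in range(3):
        ch.put(i)        # send (the occam/CSP "!")
    ch.put(None)         # close sentinel

def consumer(ch, out):
    while True:
        v = ch.get()     # receive (the occam/CSP "?"); blocks when empty
        if v is None:
            break
        out.append(v)

ch, out = queue.Queue(), []
t1 = threading.Thread(target=producer, args=(ch,))
t2 = threading.Thread(target=consumer, args=(ch, out))
t1.start(); t2.start()
t1.join(); t2.join()
```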

todo: get ideas from https://www.cs.kent.ac.uk/research/groups/plas/wiki/OccamPiReference

atomic test-and-set

oot should allow atomic sections of code, and on platforms in which atomic operations are available, oot should allow access to them using atomic sections

and atomic compare-and-swap ("compares the contents of a memory location to a given value and, if they are the same, modifies the contents of that memory location to a given new value. The result of the operation must indicate whether it performed the substitution; this can be done either with a simple boolean response (this variant is often called compare-and-set), or by returning the value read from the memory location (not the value written to it). ...Algorithms built around CAS typically read some key memory location and remember the old value. Based on that old value, they compute some new value. Then they try to swap in the new value using CAS, where the comparison checks for the location still being equal to the old value. ... ABA Problem:

Some of these algorithms are affected by and must handle the problem of a false positive match, or the ABA problem. It's possible that between the time the old value is read and the time CAS is attempted, some other processors or threads change the memory location two or more times such that it acquires a bit pattern which matches the old value. The problem arises if this new bit pattern, which looks exactly like the old value, has a different meaning: for instance, it could be a recycled address, or a wrapped version counter.

A general solution to this is to use a double-length CAS (e.g. on a 32 bit system, a 64 bit CAS). The second half is used to hold a counter. The compare part of the operation compares the previously read value of the pointer *and* the counter, to the current pointer and counter. If they match, the swap occurs - the new value is written - but the new value has an incremented counter. This means that if ABA has occurred, although the pointer value will be the same, the counter is exceedingly unlikely to be the same (for a 32 bit value, a multiple of 2^32 operations would have had to occurred, causing the counter to wrap and at that moment, the pointer value would have to also by chance be the same). " -- http://en.wikipedia.org/wiki/Compare-and-swap)
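a sketch of the read/compute/CAS retry loop described in the quote; Python has no hardware CAS, so AtomicCell emulates it with a lock (and since values here aren't recycled pointers, this toy can't exhibit ABA):

```python
import threading

class AtomicCell:
    """CAS emulated with a lock; real hardware does this in one instruction."""
    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value

    def load(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def atomic_increment(cell):
    while True:                        # the read / compute / CAS retry loop
        old = cell.load()              # read and remember the old value
        if cell.compare_and_set(old, old + 1):
            return old + 1             # nobody interfered; done

cell = AtomicCell(0)
workers = [threading.Thread(target=lambda: [atomic_increment(cell) for _ in range(1000)])
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```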

also ll/sc: "Load-Link/Store-Conditional...Load-link returns the current value of a memory location. A subsequent store-conditional to the same memory location will store a new value only if no updates have occurred to that location since the load-link. If any updates have occurred, the store-conditional is guaranteed to fail, even if the value read by the load-link has since been restored. As such, an LL/SC pair is stronger than a read followed by a compare-and-swap (CAS), which will not detect updates if the old value has been restored (see ABA problem)....Real implementations of LL/SC do not always succeed if there are no concurrent updates to the memory location in question. Any exceptional events between the two operations, such as a context switch, another load-link, or even (on many platforms) another load or store operation, will cause the store-conditional to spuriously fail. Older implementations will fail if there are any updates broadcast over the memory bus.

This is often called weak LL/SC by researchers, as it breaks many theoretical LL/SC algorithms. Weakness is relative, and some weak implementations can be used for some algorithms.

Some limitations of weak LL/SC versus CAS are that:
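a toy model of the LL/SC contract: a version counter stands in for the hardware reservation, so store-conditional fails after any intervening store even if the old value was restored (which is exactly what defeats ABA):

```python
import threading

class LLSCCell:
    """Toy LL/SC: a version counter stands in for the hardware reservation."""
    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value
        self._version = 0              # bumped on every successful store

    def load_link(self):
        with self._lock:
            return self._value, self._version

    def store_conditional(self, link, new):
        with self._lock:
            if self._version != link:  # any store since the load-link: fail,
                return False           # even if the old value was restored
            self._value = new
            self._version += 1
            return True

cell = LLSCCell("A")
_, link = cell.load_link()
first_sc = cell.store_conditional(link, "B")   # no intervening store: succeeds
stale_sc = cell.store_conditional(link, "C")   # reservation broken: fails
```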

stm

http://en.wikipedia.org/wiki/Software_transactional_memory http://en.wikipedia.org/wiki/Concurrent_Haskell

"message passing"

general term for a paradigm, not a feature. some features:

tuple spaces

see linda, ease http://en.wikipedia.org/wiki/Ease_programming_language, pylinda

atomic n-process consensus

n-process consensus is when n different processes submit values, and one of those values is chosen, and all of the processes agree which value was chosen.

http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=62593 shows that n-process consensus is a universal atomic operation for transforming sequential algorithms into wait-free concurrent ones.

however, http://en.wikipedia.org/wiki/Non-blocking_synchronization says "It was shown in the 1980s[2] that all algorithms can be implemented wait-free, and many transformations from serial code, called universal constructions, have been demonstrated. However, the resulting performance does not in general match even naive blocking designs. It has also been shown[3] that the widely-available atomic conditional primitives, CAS and LL/SC, cannot provide starvation-free implementations of many common data structures without memory costs growing linearly in the number of threads. Wait-free algorithms are therefore rare, both in research and in practice."

atomic fetch-and-cons

the paper "Impossibility and universality results for wait-free synchronization" http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=6259 also shows that fetch-and-cons is universal (do they really mean this? can fetch-and-cons simulate n-process consensus?). fetch-and-cons atomically conses an item onto the front of a list and returns the cdr of the new list (which is identical to the old list before this cons).
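a lock-based sketch of fetch-and-cons, representing the cons list as nested (head, tail) tuples with None as nil:

```python
import threading

class FetchAndCons:
    """Cons lists as nested (head, tail) tuples; None is nil."""
    def __init__(self):
        self._lock = threading.Lock()
        self._list = None

    def fetch_and_cons(self, item):
        with self._lock:
            old = self._list           # the cdr of the new list...
            self._list = (item, old)   # ...after consing item onto the front
            return old

lst = FetchAndCons()
first = lst.fetch_and_cons(1)          # nil -> (1 . nil)
second = lst.fetch_and_cons(2)         # (1 . nil) -> (2 1 . nil)
```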

atomic multiple assignment

"Impossibility and universality results for wait-free synchronization" http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=6259 shows that atomic multiple assignment to n registers can solve n-process consensus, and (2n-2)-process consensus, but not (2n-1)-process consensus.

atomic memory-to-memory move

"Impossibility and universality results for wait-free synchronization" http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=6259 shows that atomic memory-to-memory move solves n-process consensus for arbitrary n. Furthermore,

" Corollary 17: It is impossible to construct a wait-free implementation of memory-to-memory move or swap from a set of registers that support any combination of read, write, test-and-set, swap, or fetch-and-add operations. Corollary 18: It is impossible to construct a wait-free implementation of memory-to-memory move or swap from a set of FIFO queues. "

atomic memory-to-memory swap

"Impossibility and universality results for wait-free synchronization" http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=6259 shows that atomic memory-to-memory swap solves n-process consensus for arbitrary n. (but this can't be right, b/c of corollary 17.... perhaps the corollary refers to local memory to shared memory swap, whereas shared memory to shared memory swap is stronger?)

atomic augmented queues

Augmented queues also have the operation peek, which returns the first item in the queue without changing the queue. "Impossibility and universality results for wait-free synchronization" http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=6259 shows that augmented queues solve n-process consensus for arbitrary n. Furthermore,

" Corollary 13: It is impossible to construct a wait-free implementation of the augmented queue from a set of registers that support any combination of read, write, test-and-set, swap, or fetch-and-add operations. Corollary 14: It is impossible to construct a wait-free implementation of the augmented queue from a set of regular queues. "

atomic queues

A FIFO queue which only has two operations, enqueue (enq) and dequeue (deq) can solve 2-process consensus but not (wait-free) 3-process consensus, cite "Impossibility and universality results for wait-free synchronization" http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=6259 , which also shows that

" Theorem 11: There is no wait-free solution to three- process consensus using FIFO queues. "
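a sketch of the standard 2-process-consensus-from-a-queue construction: the queue is pre-loaded with one token, and each process announces its proposal before dequeuing; the token-getter wins, and the loser reads the winner's announcement (the names here are made up):

```python
import queue
import threading

def make_consensus():
    q = queue.Queue()
    q.put("WIN")                        # single pre-loaded token
    proposed = [None, None]

    def decide(i, value):
        proposed[i] = value             # announce the proposal first
        try:
            q.get_nowait()              # got the token: my value is chosen
            return value
        except queue.Empty:
            return proposed[1 - i]      # winner dequeued (and announced) first

    return decide

decide = make_consensus()
decisions = [None, None]
ts = [threading.Thread(target=lambda i=i: decisions.__setitem__(i, decide(i, "v%d" % i)))
      for i in range(2)]
for t in ts:
    t.start()
for t in ts:
    t.join()
```

correctness hinges on ordering: the loser's dequeue happens after the winner's, which happens after the winner's announcement, so the loser always sees the winner's proposal.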

weakness of read, write, test-and-set, swap, and fetch-and-add

"Impossibility and universality results for wait-free synchronization" http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=6259

" Although read-modify-write registers are more powerful than read/write registers, many common read-modify-write operations are still computationally weak. In particular, one cannot construct a wait-free solution to three-process consensus using registers that support any combination of read, write, test-and-set, swap, and fetch-and-add operations. "

atomic compare-and-swap

todo combine with above. "Impossibility and universality results for wait-free synchronization" http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=6259 shows that

"Theorem 7: compare-and-swap solves n-process consensus for arbitrary n."

and

" Corollary 8: It is impossible to construct a wait-free implementation of a compare-and-swap register from a set of registers that support any combination of read, write, test-and-set, swap, or fetch-and-add operations. "
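a sketch of Theorem 7's construction: the first successful CAS from an "undecided" sentinel fixes the decision for all n proposers (the CAS itself is emulated with a lock, since Python exposes no hardware CAS):

```python
import threading

class CASConsensus:
    """First successful compare-and-swap from UNDECIDED fixes the decision."""
    UNDECIDED = object()

    def __init__(self):
        self._lock = threading.Lock()     # the lock emulates an atomic CAS
        self._value = self.UNDECIDED

    def _cas(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

    def propose(self, value):
        self._cas(self.UNDECIDED, value)  # only one proposer's CAS succeeds
        return self._value                # everyone reads the chosen value

c = CASConsensus()
results = []
ts = [threading.Thread(target=lambda v=v: results.append(c.propose(v))) for v in range(8)]
for t in ts:
    t.start()
for t in ts:
    t.join()
```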

oz thread directive

in oz, you can wrap any expression with the directive "thread", which means that that expression is supposed to be calculated in a separate thread.

async functions

or maybe just "spawn and return" in normal funcs?

chords with pattern matching, return-to, timeouts and priorities

see http://lucacardelli.name/Papers/Polyphony%20(TOPLAS).pdf

a chord is a list of "header" function calls and one function body; each header may be synch or asynch; if any of the synch headers are called, it blocks until all of the other headers on that object are called; similarly, if any of the asynch headers are called, the call is saved until all of the other headers are called; when all of the headers are called, the body is executed.

return-to means that each header function can return a value (i.e. in the body, you say something like "return x to f" to return x as the result of header function f). note that the authors don't like return-tos; but you can just select the first thread in the chord to run the body in.

the idea is that the runtime doesn't actually have to spawn threads all the time to run chords; the async calls can just be queued up.

note that this idea is inherently OOPish as the object being called gets a state associated with it (namely, which chord method calls are pending).
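a minimal sketch of the queuing idea: async Put calls are buffered as pending messages rather than spawning threads, and the sync Get blocks until a matching Put is available, at which point the chord body runs (Buffer is a made-up single-chord example):

```python
import collections
import threading

class Buffer:
    """One chord: sync Get() & async Put(x). Puts are queued, not threaded;
    Get blocks until a matching Put is pending, then the chord body runs."""
    def __init__(self):
        self._cond = threading.Condition()
        self._pending_puts = collections.deque()

    def put(self, x):                  # async header: returns immediately
        with self._cond:
            self._pending_puts.append(x)
            self._cond.notify()

    def get(self):                     # sync header: blocks until the chord matches
        with self._cond:
            while not self._pending_puts:
                self._cond.wait()
            return self._pending_puts.popleft()

buf = Buffer()
buf.put(1)
buf.put(2)
got = [buf.get(), buf.get()]
```

the pending-puts deque is exactly the per-object state mentioned above: which chord method calls are pending.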

use the alternative syntax in section 6.2 of the paper:

" There is some redundancy in the concrete syntax of Polyphonic C# as presented here: the attributes, modifiers and return type information of each method have to be repeated (consistently) for each chord in which the method appears. One alternative approach (essentially that of Funnel [Odersky 2000]) would be to allow synchronous method definitions to have more than one body, each of which is guarded by a purely asynchronous pattern, and to specify modifiers and attributes of asynchronous methods in separate declarations. In this style, the reader-writer lock of Section 4.2 could look something like this:

   class ReaderWriter {
     async idle ();   // Just signatures. Any modifiers or
     async s(int);    // attributes would occur here too
     ReaderWriter () { idle (); }
     public void Shared ()
       when idle() { s(1); }
       when s(int n) { s(n+1); }
     public void ReleaseShared ()
       when s(int n) { if (n == 1) idle (); else s(n-1); }
     public void Exclusive()
       when idle() {}
     public void ReleaseExclusive() { idle (); }
   }

This alternative syntax is more compact in some cases (e.g. in subclasses of ActiveObject), but is also less flexible: one must group chords by synchronous methods (rather than, for example, by asynchronous state) and it is awkward to turn void into async or vice-versa. "

since we are dropping the one-synch-call-per-chord limit and using return-tos, there's no need to group the chords by synch call anymore and the disadvantages go away.

cells, rendezvous, reader/writer locks, n-wait

see http://lucacardelli.name/Papers/Polyphony%20(TOPLAS).pdf

a cell is something that starts out empty, then someone puts a value in it, and then someone else gets a value out of it, and when the value is gotten, the cell goes back to being empty. e.g. in the polyphonic c# chord syntax (& is for chording, not AND):

(from the paper)

   public class OneCell {
     public OneCell () { empty(); }
     public void Put(object o) & private async empty() {
       contains(o);
     }
     public object Get() & private async contains(object o) {
       empty();
       return o;
     }
   }

a rendezvous is a set of functions that make the values they are passed available as the return values of other functions in the set:

(from the paper)

   class RendezVous {
     public int f (int i) & public int g(int j) {
       return j to f;
       return i to g;
     }
   }

a reader/writer lock lets people acquire reader locks (non-exclusive with other readers) or writer locks (which exclude other readers and writers). a fair one, among other things, guarantees that when someone requests a writer lock, no new readers are given locks until the writer gets and releases theirs (but you do have to wait for readers who already hold a lock to finish).

(from the paper)

   class ReaderWriterPrivate
   {
     ReaderWriter () { idle (); }
     private int n = 0; // protected by s()
     public void Shared () & async idle() { n=1; s(); }
     public void Shared () & async s() { n++; s(); }
     public void ReleaseShared () & async s() {
         if (--n == 0) idle(); else s();
     }
     public void Exclusive() & async idle() {}
     public void ReleaseExclusive() { idle (); }
   }
   class ReaderWriterFair
   {
      ... // same content as in ReaderWriterPrivate, plus:
     public void ReleaseShared () & async t() {
         if (--n == 0) idleExclusive(); else t ();
     }
     public void Exclusive() & async s() { t(); wait(); }
     void wait() & async idleExclusive() {}
   }

note that the chord implementations of this seem to be a weird way of specifying a state machine -- is there a syntactically better way? hmm, yes, as the paper explains, some of the internal method calls are being used as a way to remember the state of some FSM (and wake other code paths when the state changes), with an invariant that multiple state-label calls won't be pending at once, e.g. empty and contains in the cell, or idle, s, t, and idleExclusive in ReaderWriterFair. if we have patterns with guards on the chord fns, then we can represent the state explicitly as a field, e.g.

(i think; i made this up based on OneCell above)

   public class OneCell {
     private state;

     public OneCell () {
       state = nil;
     }
     public void Put(object o) & state == nil {
       state = o;
     }
     public object Get() & state == o {
       state = nil;
       return o;
     }
   }

(note this only works if the implementation rechecks the guards involving some field like "state" whenever "state" changes; probably should not allow any globals or other external refs to be involved in the guards for this reason)

(note: in oot, should use && for chording, as & is for AND)

a 2-wait is this:

(from the paper)

   class Join2 {
     public IntCallback firstcb;
     public IntCallback secondcb;
     public Join2 () {
       firstcb = new IntCallback (first);
       secondcb = new IntCallback (second);
     }
     public void wait(out int i, out int j)
         & async first(int fst) & async second(int snd) {
       i = fst;
       j = snd;
     }
   }

the idea is that when a client calls a bunch of services and gives them callbacks, sometimes it then wants to wait until all of those callbacks have been called back.
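a Python sketch of the same join pattern, using a Condition instead of chords (the Join2/first/second names mirror the paper's example):

```python
import threading

class Join2:
    """wait() joins with both callbacks: it blocks until first and second
    have each been called, then returns both values."""
    def __init__(self):
        self._cond = threading.Condition()
        self._values = {}

    def first(self, value):
        with self._cond:
            self._values["first"] = value
            self._cond.notify_all()

    def second(self, value):
        with self._cond:
            self._values["second"] = value
            self._cond.notify_all()

    def wait(self):
        with self._cond:
            while len(self._values) < 2:   # block until both callbacks fired
                self._cond.wait()
            return self._values["first"], self._values["second"]

j = Join2()
threading.Thread(target=lambda: j.first(10)).start()
threading.Thread(target=lambda: j.second(20)).start()
pair = j.wait()
```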

todo: read about event filters in http://www.springerlink.com/content/f01r358705v11502/ Concurrent ML: Design, application and semantics and read the other refs in the chord paper

actors-style objects ("active objects")

objects where every method is synch'd, i.e. a monitor w/ state

how to specify timers, timeouts, and priorities?