Should we separate this into a deterministic concurrency part and a non-deterministic concurrency part?

We are reaching the need to break things down into sections, so that we can have level 4 sections for individual proposals, eg Glitch should have its own section

need to go thru Java and Python concurrency builtin packages and add their stuff. But in general, shouldn't we cover each language's default/recommended approach to concurrency? Perhaps a "concurrency by language" chapter?

that pulsar/quasar or something thing

c11 atomics

polyphonic C

asynchrony vs. parallelism?

are distributed systems covered in this part?

Multi-core architectures and their software landscape Raphael `kena' Poss Note: The following text will be published in the Computing Science Hand- book , Volume 1. This document is a draft copy. Do not distribute


kazinator 2 days ago


Yikes! I'm appalled by the one-sided justification for semaphores in this book. (2.3, "Why semaphores?")

Semaphores are actually a horrible primitive for building concurrency. I sometimes call them "the goto of synchronization", which is deliberately ironic since they are a gift to the world from the "goto considered harmful" thinker, Dijkstra.

Debugging a problem with semaphores is difficult because they have no useful state beyond their count. You don't know how a set of semaphores might have gotten into a given state. (Vaguely analogous to the the argument that in a spaghetti program with backward gotos, it's difficult to have an idea of how control arrived at a particular node in the program.)

For instance, when we (ab)use a semaphore to simulate a mutex, that semaphore does not actually track the ownership: who is locking it now? It can be signaled by any task at all, whereas a proper mutex can diagnose the fact that it is unlocked by its owner.

Semaphores most certainly do not impose deliberate constraints that avoid programmers avoid errors, whatever that is supposed to mean, ouch!

When higher level primitives are correctly built on semaphores (condition variables, read-write locks, you name it), the solutions look like Rube Goldberg mousetraps. And that's before additional requirements are added like error checking, or handling priority inversion and whatnot.

Semaphores have one very good application and that is a situation in which a very fragile context (such as an interrupt service routine) needs to generate a wakeup signal. The semaphore signal operation can be made reentrant fairly easily. This is not only true in OS kernels. Note that in POSIX, for instance, the sem_post operation is noteworthy for being async-signal-safe, meaning that it can be called from a signal handler. By contrast, other things like pthread_cond_signal cannot be.

reply "


have we covered everything in the previous article? locks, atomicity, critical sections, worker pools working on a queue of tasks, forking also channels, IPC

Norman Ramsey and Simon Peyton Jones. Featherweight concurrency in a portable assembly language

Channels Are Not Enough... or Why Pipelining Is Not That Easy


here's a readable description of non-blocking asynch I/O:

" Advantages

One advantage of non-blocking, asynchronous operations is that you can maximize the usage of a single CPU as well as memory. Synchronous, blocking example

An example of synchronous, blocking operations is how some web servers like ones in Java or PHP handle IO or network requests. If your code reads from a file or the database, your code "blocks" everything after it from executing. In that period, your machine is holding onto memory and processing time for a thread that isn't doing anything.

In order to cater other requests while that thread has stalled depends on your software. What most server software do is spawn more threads to cater the additional requests. This requires more memory consumed and more processing. Asynchronous, non-blocking example

Asynchronous, non-blocking servers like ones ones made in Node, only use one thread to service all requests. This means an instance of Node makes the most out of a single thread. The creators designed it with the premise that the I/O and network operations are the bottleneck.

When requests arrive at the server, they are serviced one at a time. However, when the code serviced needs to query the DB for example, it sends the callback to a second queue and the main thread will continue running (it doesn't wait). Now when the DB operation completes and returns, the corresponding callback pulled out of the second queue and queued in a third queue where they are pending execution. When the engine gets a chance to execute something else (like when the execution stack is emptied), it picks up a callback from the third queue and executes it. "



which concurrency methods are deterministic and which are non-deterministic?

which are for distributed systems and which are for single computers with an underlying shared memory?

the founder of Clojure explains how the message-passing model is designed for distributed systems, and how that makes it less suited for shared memory:

" It is important to understand that the actor model was designed to address the problems of distributed programs. And the problems of distributed programs are much harder - there are multiple worlds (address spaces), direct observation is not possible, interaction occurs over possibly unreliable channels, etc. The actor model supports transparent distribution. If you write all of your code this way, you are not bound to the actual location of the other actors, allowing a system to be spread over multiple processes/machines without changing the code.

I chose not to use the Erlang-style actor model for same-process state management in Clojure for several reasons:

    It is a much more complex programming model, requiring 2-message conversations for the simplest data reads, and forcing the use of blocking message receives, which introduce the potential for deadlock. Programming for the failure modes of distribution means utilizing timeouts etc. It causes a bifurcation of the program protocols, some of which are represented by functions and others by the values of messages.
    It doesn’t let you fully leverage the efficiencies of being in the same process. It is quite possible to efficiently directly share a large immutable data structure between threads, but the actor model forces intervening conversations and, potentially, copying. Reads and writes get serialized and block each other, etc.
    It reduces your flexibility in modeling - this is a world in which everyone sits in a windowless room and communicates only by mail. Programs are decomposed as piles of blocking switch statements. You can only handle messages you anticipated receiving. Coordinating activities involving multiple actors is very difficult. You can’t observe anything without its cooperation/coordination - making ad-hoc reporting or analysis impossible, instead forcing every actor to participate in each protocol.
    It is often the case that taking something that works well locally and transparently distributing it doesn’t work out - the conversation granularity is too chatty or the message payloads are too large or the failure modes change the optimal work partitioning, i.e. transparent distribution isn’t transparent and the code has to change anyway.

Clojure may eventually support the actor model for distributed programming, paying the price only when distribution is required, but I think it is quite cumbersome for same-process programming. YMMV of course. " --


Some examples of nondeterministic concurrency (i think?):

" However, new programming models transcend sequential-execution programming:

    When writing a multi-threaded program, the programmer may write each thread as a sequence of instructions without specifying the timing of any instruction relative to instructions in other threads.
    In event-driven programming, the programmer may write sequences of instructions to respond to events without specifying an overall sequence for the program.
    In dataflow programming, the programmer may write each section of a computing pipeline without specifying the timing relative to other sections." --


" Classifying models of concurrency A model of concurrency usually consist of a set (domain) whose elements denote concurrent systems, together with some structure. The structure takes the shape of a collection of operators, turning the domain into an algebra (a process algebra), optionally in combination with a collection of predicates, or a collection of morphisms between the elements, making the domain into a category.

Classifying models of concurrency with respect to the kind of mathematical objects that are used to represent processes, I find it convenient to distinguish 5 types of models.

    GRAPH oriented models. Here a concurrent system is represented by a process graph, or state-transition diagram, also called automaton. Or by a richer graph-like object. A variant are labelled transition systems, in which a concurrent system is not denoted by a whole graph, but by a vertex in a graph; the entire domain is then one large graph.
    NET oriented models, in which a concurrent system is represented by a Petri net, or a net-like object.
    EVENT orient models, in which a concurrent system is represented by a set of events (action occurrences) together with some structure on this set, determining the relations between the events. This class of models includes the various brands of event structures.
    EXPLICIT models. In the models mentioned above, a concurrent system is not really modeled as a single graph/net/event structure, but actually by an equivalence class of such objects. Thus different graphs/nets/etc. represent the same concurrent system. Which equivalence relation is divided out is partly determined by which properties of concurrent systems are considered to be relevant. The choice of equivalence is important in proving the correctness of specifications w.r.t. implementations.
    The explicit models on the other hand are not quotient models, but offer one `explicit' object to represent a system. This object is usually a mathematically coded set of the relevant properties, or of the possible executions, of the represented system.
    TERM models. Here a concurrent system is represented as a term in a system description language. Well known system description languages are CCS and CSP. Instead of speaking of term models as type of model, one could say that systems can be denoted either syntactically (by means of an expression in a language) or semantically (in a model of type 1-4). The notion of a term model is the result of unifying these views. Usually, the meaning of an expression is given by means of a mapping from the used set of terms into another model. This mapping constitutes the semantics of the used language. 

Besides w.r.t. type (as above), models of concurrency can be classified w.r.t.

    the equivalence relation that is divided out (explicitly in term, graph, net and event oriented models; implicitly in explicit models),
    the scope of the model (which systems can be represented and which cannot)
    and the chosen structure (the selection of operators, relations and morphisms defined on the model). 

As there are canonical translations between the various types of models, models of different type can be compared w.r.t. scope, structure and level of identification. Rob van Glabbeek " --


Rob van Glabbeek courses:

Comparative Concurrency Semantics

Modelling Concurrent Systems

Foundations of Control Structures


great thread on (async/await vs implicit cooperative coroutines) and single-threadedness of javascript, at:


tanto 14 hours ago

The moment I started using async await (with babel) combined with the new fetch API so many libraries got obsolete.

Getting data is as easy as:

  async function main () {
    try {
      const res = await fetch('');
      const json = await res.json();
    } catch (e) {
      // handle error

So I am quite happy when this lands in modern browsers asap.


tomp 7 hours ago

Can anyone point to a good explanation why this is preferable to cooperative threads (coroutines)? I.e. the above code could easily be written as

  function main() {
    try {
      const res = fetch('');    // yield here
      const json = res.json();                                      // yield here
    } catch (e) {
      // handle error

and the runtime would automatically yield this coroutine and let other coroutines run whenever some call blocks.

I guess the only realy difference would be that I would make `await` (and `async`) implicit, similar to how exceptions are handled implicitly in the above snippet (i.e. we don't annotate functions as `throwing` and we don't use an `attempt` statement (analogous to `await`) when calling `throwing` functions).

Edit: Reading various articles, the best explanation is that since the single-threaded nature of JavaScript? makes synchronisity implicit (i.e. all code is run in an implicit transaction), it has to make asynchronisity explicit. The alternative would be to have coroutines/fibres (implicit asynchronisity), but with explicit `atomic` blocks (for synchronisity).

C# doesn't provide any synchronisity guarantees, so I guess the motivations there were different (it's really hard (impossible?) to implement fibres efficiently, and even harder to allow for native code interoperability).


xg15 4 hours ago

There is also the the point of flexibility. Async/await works on promises, however promises themself are just perfectly un-magical JS objects that accept a callback. This results in two effects:

1) Yielding is decoupled from the asynchronous action itself, giving the caller fine-grained control when to yield. For example, with async/await, management of parallel operations is straight-forward:

  var promise = fetch('');
  // ... do something useful while the fetch runs ...
  var result = yield promise;

Same goes for concurrent fetches:

  var promise1 = fetch('');
  var promise2 = fetch('');
  [result1, result2] = yield Promise.all(promise1, promise2 );

2) Promises + async/await allow you to switch back-and-forth very easily between callback-style and coroutine-style APIs, which makes integrating old APIs very easy:

  async function f() {
   var p = fetch('');
   p = p.then(function(inbetweenResult) {return new Promise(function(resolve) {
    oldapi.onresult = resolve;
   return await p;

... and so on.


cwmma 2 hours ago

The explicitness is the reason currently you can assume* that if nobody will ever modify variables unless you explicitly give up control somehow, if you were to have implicit await then that would no longer be the case and any function call could theoretically pause the function and give up control to other functions.


olau 29 minutes ago

Just to expand a bit upon this: That you can assume a single thread simplifies the code immensely, e.g. when modifying state. It's important to be explicit about when this assumption is broken.


SomeCallMeTim? 10 hours ago

I held my nose when I was forced to write JavaScript? until I learned about async/await.

I came from a language that can do this kind of thing implicitly using coroutines (i.e., I could write imperative code that would pause on a network request and resume afterwards). Have to wade through callback hell -- even with Promises -- was a serious step down.

With async/await being explicit, combined with functionality like Promise.all(), this is actually a better solution than I had before, because I can spawn a dozen queries at once and wait for them all to complete before I continue. I've used that to good advantage already once with an await Promise.all(...) call, and the patterns are so easy to follow that it brings tears to my eyes.

Or I can inject more traditional callbacks when the logic would be easier (or when it would allow several simultaneous calls to complete), which does happen from time to time.

As to whether I'd learn a new language because of it: I'm considering learning Elixir because of even stronger guarantees [1], actually, so yes, I would.

[1] Apparently in addition to being able to write code using light threads like this, you can also put a cap on how long the VM will run code, so if some idiot developer puts a loop in the code that doesn't return control enough, instead of destroying your UI experience it will just pause that loop and resume the main loop. Still have yet to try it though.


ch49 7 hours ago

Why is "async" keyword needed? Can't JS engine infer from the use of "await" in a function that this function need to be async? I'm using async/await for a while now, and so many times I've introduced bugs in my code because i forget to put "async" in front of the function, or put "async" in front of wrong function. It's simply annoying to go back and put "async" when in middle of writing a function I realise I need to use await.


ilkkao 7 hours ago

Reason is mentioned here:

reply says: zenparsing commented on Feb 8

Keep in mind that await is not a keyword outside of async functions. So this is actually a valid function call:

function f() { await (something); }

Among other reasons, we need the special form so that we can parse "await" as a keyword inside of it.


(confusing subthread:


amelius 7 hours ago

Perhaps this is the reason:

Is it possible to have a yield inside a called function (i.e., not the generator function itself)? Last time I checked this was not allowed (?)

Anyway, if so, I would strongly prefer this over a async/await construct, which is less general.


masklinn 4 hours ago

> Is it possible to have a yield inside a called function (i.e., not the generator function itself)?

Call a generator function from a generator function? Sure, you just need to transitively yield its content using `yield* [[expr?]]` (which delegates part of the iteration to the `expr` generator)


amelius 4 hours ago

So what tomp is doing:

    const res = fetch('');    // yield here

actually requires a "yield" in front of the "fetch"? But what if I want the fetch() function to decide whether to yield or not?

Of course, fetch(), could yield a status specifying whether its calling-ancestors should yield, but this can become unwieldy very quickly, and might require an exception mechanism of its own. Better to just let called functions yield.


masklinn 4 hours ago

In javascript it would yes (actually a `yield*`, or an `await` in async/await). With native coroutines it wouldn't (the runtime would implicitly yield on IO).

> But what if I want the fetch() function to decide whether to yield or not?

In javascript? The question doesn't really make sense, a function is either sync or async.


amelius 2 hours ago

> The question doesn't really make sense, a function is either sync or async.

Well, either fetch() blocks, or not. If it blocks, then it should yield, its caller should yield, etc. If it doesn't block, then it can just run to completion.


masklinn 1 hour ago

Don't elide important parts of the quote. In javascript your question doesn't make sense.

> Well, either fetch() blocks, or not.

Javascript functions always block.

> If it blocks, then it should yield, its caller should yield, etc. If it doesn't block, then it can just run to completion.

If a function is async, you must yield[0]. It may not need* that capability (conditionally) and doesn't have to yield at all e.g.

    function*() {
        return 4;

is valid and will not suspend. You must still yield*[0] on it so that it is properly resolved as a generator.

[0] I think that's not an issue for async/await as it's syntactic sugar for then chains, the code itself may be converted to a generator or whatever but it will embed the driver

reply " )


" startling 14 hours ago

Is the syntax composable? Can I do

    const json = await (await fetch(')).json();

or do I have to name it?


dvlsg 14 hours ago

You probably can, but you may as well just do this instead:

    const json = await fetch('').then(res => res.json());

reply "


bigtones 12 hours ago

Thank you to Microsoft and Anders Hejlsberg for inventing Async/Await (and to F# for the original inspiration)


Lx1oG-AWb6h_ZG0 12 hours ago

async/await as we know it actually came from Midori's C# dialect, which indeed was inspired from F#:

(Although, just to be pedantic, the concept of futures is way older than F#. Alice's syntax, for example, is pretty close)


MichaelGG? 10 hours ago

The nice part about F#'s async implementation is that it was purely a library. I wish languages aimed at allowing developers to build things for the language, rather than providing more compiler built-ins. Making it extensible is more elegant (though I understand it might be easier to optimize perf).


barrkel 9 hours ago

Integrated features are easier to tool and provide good diagnostics for. It's nice to have enough expressiveness to model monadic computations without too much syntactic overhead - and monads are usually enough, since they hand blobs of user code to the library to be composed as necessary - but debugging a reshuffled graph of closures is no picnic.

reply "


one design choice when dealing with Observables (event streams) or similar is: If an event stream is created but unobserved, does it still execute?

In Reactive Cocoa 3 for example, both Signal and SignalProducer? are provided to differentiate based on this; an unobserved Signal still produces events, but an unobserved SignalProducer? does not [1]. This is also called 'hot' and 'cold' in some communities [2].


another design choice when dealing with Observables and similar is: How is backpressure handled, that is, how are finite buffers enforced? We could just block but that defeats some of the purposes of 'reactive' Observables. So, we need to set a limit on buffer size, and communicate it, and either the producer or the consumer or both must be responsible for maintaining that limit; which one is responsible? How is the limit communicated and what happens if the responsible party tries to ignore it?

Quoting Erik Meijer who invented Rx Observables:

> My opinion about back pressure is that you either have a truly reactive system where the producer is fully in charge and the onus is on the consumer to keep up; or a truly interactive the consumer is fully in charge and the onus is on the producer to keep up. This is one of the cases where having different types of collections makes sense (just like we have different collections like lists, maps, sets, ... with different performance characteristics and different operations)


the 'reactive streams' standard for event streams with finite buffers and backpressure (via the consumer choosing a buffer size):


summary of

asynch I/O in Javascript: callbacks vs promises vs async/await vs observables

Javascript esp. Node.js typically uses callbacks for asynchrony.

Problems with callbacks for async I/O:

for async I/O, something better than callbacks is Promises (which are standard in Javascript ES6). Promises clearly specify error handling and scheduling and replace nested callback syntax with chained method invocation syntax. Promises are callbacks under the hood. The callbacks in Promises are always scheduled as microtasks (even if the promise is already settled when the callback is added).

Promises can be unsettled, or in two 'settled' states, 'resolved' or 'rejected'. You can setup a different callback for each settled state. Unlike callbacks, the callbacks setup by a Promise are guaranteed to only be called once.

"You can think of resolve as the Promise equivalent of return and reject as throw. As we'll see later on, this semantic equivalency is syntactically realized by the async and await keywords."

With promises, there is still some syntactic cruft around error-handling, however, because you must explicitly chain the error handler (in comparision to language-supported exceptions, which are lexically scoped).

Async/await can be thought of as syntactic sugar for promises. It is even better because it eliminates the chaining syntax altogether, and handles errors the same way the language does. In async/await, you declare functions 'async' to indicate that they return a Promise, and then you put 'await' in front of function calls to indicate that any following statements are actually chained to the result of the function call in a callback (eg they are done after the promise settles). However async/await is not as expressive as Promise; for example, with Promise (but not with 'async/await'), there is a way (the 'all' method) to, for each item in a list, start an (micro)task, with all these tasks running concurrently.

Some issues with promises:

Observables (event streams) are generalizations of Promises; instead of representing a single event, an Observable represents a stream of events. Observers 'subscribe' to the event stream. Subscriptions are first-class objects, which are held by the subscriber and used to unsubscribe later.

Javascript Observables might return immediately (synchronously) or asynchronously (unlike Promises, which always schedule their callbacks as microtasks (async).

Observables have a 'cleanup' function which is called when all of the subscribers have unsubscribed. This allows for cancellation, (currently) unlike Promises.

Note that, although Observables are more general than Promises, typically only Promises are needed.

The async/await syntax could in theory be used with Observables as well as Promises (but this is not yet implemeted in core Javascript).

Lots of good Javascript examples are given (but a few comments in think the callback example is unnecessarily verbose).



" The AsyncIterator? interface is identical to the Iterator interface, except that each of the iterator methods returns a promise for an iterator result pair. "


summary of :

Javascript has 'tasks' and also 'microtasks' (or 'jobs'). Both are queues (FIFO stacks). The microtasks queue is processed at the end of each task; and ALSO at the end of each callback within a task, IF the Javascript is empty (eg we're not in the middle of a function call; note that this means that microtasks are executed after handling a callback for actual user clicks, programmatic but NOT after handling the same callback for program-generated clicks).

Browser updates happen only in between tasks.

Observer callbacks and promise callbacks are scheduled as a new microtask (eg within the current task). Calling UI callbacks (eg mouse click handler) and timeouts occur are scheduled as new tasks.

So, if a timeout is set for 0 time and then a promise is set to 'settled' (completed) but the promise has an attached callback, the promise's callback will be called first, not the timeout's, because the promise callback is scheduled as a microtask in the current task, whereas the timeout's callback is scheduled as a new task.


" Atomic instructions on CUDA atomic{Add, Sub, Exch, Min, Max, Inc, Dec, CAS, And, Or, Xor} " --


distinction between futures (which start executing as soon as they are created) and by-needs ('lazy'; they don't start executing unless their result is demanded, at which point they start executing and convert into a future).

See SEAM transients: A Virtual Machine for Multi-Language Execution section 3.

There are apparently other types of transients as well:

" A transient is a placeholder for a value which is not yet known. When the value becomes known, the transient is re- placed by the value. Computations that find a transient in place of a value block until the value becomes known. Tran- sients come in several flavors, of which we describe two in this paper. A future is a transient which can be explicitly re- placed by its value. Every future stores a queue of blocked threads to reschedule when it is replaced. A by-need is a transient associated with a first-class computation (see Sec- tion 2.3). When a thread requires the value of a by-need, the by-need turns into a future (on which the requesting thread blocks), while simultaneously a new runnable thread is cre- ated with two tasks on its stack: one to explicitly replace the future, and on top of it a task created from the first-class computation. In other words, the first-class computation is executed at most once. By-needs are used to implement lazy (that is, demand-driven) computations.

...A variant of by-needs is imaginable which would simply push the two tasks on the requesting thread, to dispense with the overhead of explicit thread creation. The difference is observable only in languages which have a notion of a first- class thread with identity...

Further kinds of transients are supported by the implementation, which we will not present here



the Erlang guy gives a good example of why the concurrency model of Erlang is easier to reason about than in JS:

" Here's a example of a green callback in Erlang: loop(F) -> receive {new_callback, F1} -> loop(F1); Msg -> F(Msg), loop(F) end.

When the processes running this code receives a message Msg it evaluates the function F(Msg). There is no uncertainty about this, we know _exactly_ when the callback is triggered. It is triggered immediately after receiving the message Msg.

This wee code fragment is doubly beautiful, if you send the process a message {new_callback, F1} then it will change its behavior using the new callback on the next invocation.

I don't know how you would write this in Javascript. I've written quite a lot of jQuery and know how to setup and remove callbacks. But what happens if an event is triggered in the time interval between removing an event handler and adding a new one. I have no idea, and life is too short to find out. "



thread vs process:

a process has its own memory; multiple threads within a process share the process's memory


" process:

    process id
    file descriptor
    shared libraries
    instruments of interprocess communications (pipes, semaphores, queues, shared memory, etc.)
    specific OS sources


    attributes (for sheduler, like priority, policy, etc.)
    specific thread data
    specific OS sources" -- [4]


Ryan Cheu, Occasional Golang user 377 Views · Ryan has 60+ answers in Computer Programming

This article is a good description of how Goroutines work, quoted below:

    [T]he runtime manages the goroutines throughout from creation to scheduling to teardown. The runtime is allocated a few threads on which all the goroutines are multiplexed.

Whether they're green threads depends on your definition of a "green thread". The execution of the code is run on native threads, but the go runtime is managing the execution of the goroutines on different native threads.


" Since that paper was written, conventional wisdom has suggested that high performance servers require native threads, or more recently, event loops.

Threads carry a high overhead in terms of scheduling cost and memory footprint. Event loops ameliorate those costs, but introduce their own requirements for a complex, callback driven style.

Go provides programmers the best of both worlds. " -- [5]


OpenACC?: compiler directives for GPU accelerators intro: cheat sheet (appears to be missing section Data Clauses though): programming guide: spec:

example: the lines starting with "#pragma acc" in:

#pragma acc data copy(A) create(Anew)
while ( error > tol  &&  iter  <  iter_max )  {
  error = 0.0;
  #pragma acc kernels
  #pragma acc loop
  for (  int  j = 1; j < n-1;  j++ )  {
    for (  int  i = 1; i < m-1; i++ )  {
       Anew [j] [i] = 0.25 * ( A [j] [i+1] + A [j] [i-1] +
                                      A [j-1] [i] + A [j+1] [i];
       error = fmax ( error, fabs (Anew [j] [i] - A [j] [i];
  #pragma acc loop
  for (  int j = 1; j < n-1; j++) {
     for (int = i; i < m-1; i++ )  {
       A [j] [i] = Anew [j] [i];
   if (iter % 100 == 0) printf ("%5d, %0.6f\n", iter, error);

the most important pragmas appear to be:

builtin reduction operators (for the 'parallel reduction' clause):

, ^, &&,

" OpenACC? defines three levels of parallelism: gang, worker, and vector. Additionally execution may be marked as being sequential (seq).

Vector parallelism has the finest granularity, with an individual instruction operating on multiple pieces of data (much like SIMD parallelism on a modern CPU or SIMT parallelism on a modern GPU). ... Gang parallelism is coarse-grained parallelism, where gangs work independently of each other and may not synchronize.

Worker parallelism sits between vector and gang levels. A gang consists of 1 or more workers, each of which operates on a vector of some length. Within a gang the OpenACC? model exposes a cache memory, which can be used by all workers and vectors within the gang, and it is legal to synchronize within a gang, although OpenACC? does not expose synchronization to the user. "


to summarize Python Copperhead (details in Self:proj-oot-ootConcurrencyNotes3):

Python Copperhead compiles a pure subset of Python (to C++, where it uses the Thrust library to run it on CUDA, TBB, or OpenMP?). Functions to be compiled to Copperhead code are marked with '@cu'. The primitive functions are given above; in addition, list comprehensions can be used as syntactic sugar. Copperhead is statically typed (and type inferred). The types are (mostly):

Python Copperhead executes async and returns futures (but "successive calls to Copperhead functions always wait for previous calls to finish before beginning").

Can use 'with places.gpu0' 'with places.openmp', etc, to control what is run on the GPU.


from copperhead import *
import numpy as np

def saxpy(a, x, y):
  return [a * xi + yi for xi, yi in zip(x, y)]

x = np.arange(2**20, dtype=np.float32)
y = np.arange(2**20, dtype=np.float32)

with places.gpu0:
  gpu_result = saxpy(2.0, x, y)

with places.openmp:
  cpu_result = saxpy(2.0, x, y)


Python Copperhead compiles to Thrust (created by Nvidia, i think), and Copperhead/Thrust work on CUDA, TBB (Threading Building Blocks), and OpenMP?.


axpy is 'the "hello world" of parallel programs. (axpy is the type-generic form of saxpy...' [8]


pypy-stm's TransactionQueue? construct:

that document also explains a more blunt, lower-level construct called atomic sections: "Atomic sections are similar to re-entrant locks (they can be nested), but additionally they protect against the concurrent execution of any code instead of just code that happens to be protected by the same lock in other threads." -- [9]


implementation of various atomics from others:

Universal Constructions for Multi-Object Operations (Extended Abstract) by James H. Anderson and Mark Moir



more stuff on memory order consistency models:


Milewski's example of how even making everything acquire-release leads to counterintuitive results; really only sequential consistency is intuitive:

" Now here the funny story begins. Without much thinking, I decided that just slapping memory_order_acq_rel on any of the writes would fix the problem. Here’s what I originally wrote:

[begin quote] The correct C++ code in this case is:

Peterson::Peterson() {, memory_order_release); _interested[0].store(false, memory_order_release); _interested[1].store(false, memory_order_release); } void Peterson::lock() { int me = threadID; either 0 or 1 int he = 1 – me; the other thread _interested[me].exchange(true, memory_order_acq_rel);, memory_order_release); while (_interested[he].load(memory_order_acquire) && _victim.load(memory_order_acquire) == me) continue; spin }

The ordering memory_order_acq_rel produces an mfence instruction on the x86. This fence will prevent the movement of the load of _interested[he] before the store to _interested[me], which could otherwise happen (and break the algorithm).[end quote]

You can read comments to this post–especially the ones by Anthony and Dmitriy, who made me eat crow. In short, the point is that the “acquire” part of memory_order_acq_rel has no meaning unless another thread writes to (“releases”) the same variable. Since only thread 0 writes to _interested[0] and only thread 1 one writes to _interested[1], this synchronization accomplished nothing (outside of the x86). Dmitriy’s implementation, which synchronizes on _victim, is correct (but I had to ask many questions before understanding Anthony’s proof of it). " ("Dmitriy's implementation" is in a comment on this post, )

here's another example:

from that post: "I had no idea what I was getting myself into when attempting to reason about C++ weak atomics. The theory behind them is so complex that it’s borderline unusable. It took three people (Anthony, Hans, and me) and a modification to the Standard to complete the proof of a relatively simple algorithm. Imagine doing the same for a lock-free queue based on weak atomics! "

--- explains the difference between linearizability and sequential consistency talks about this too



Seven Concurrency Models in Seven Weeks



gpu.js relies on the assumption that the kernel function is using only a subset of legal JavaScript? syntax:

    1D, 2D, 3D array of numbers or just numbers as kernel input
    1D, 2D, 3D array of numbers as kernel output
    Number variables
    Custom and custom native functions
    Arithmetic operations (+, +=, -, *, /, %)
    Javascript Math functions (Math.floor() and etc.)
    if and else statements
    const and let
    No variables captured by a closure