Bayle Shanks's website: proj-oot-ootConcurrencyNotes2

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.4173 Four Concurrency Primitives for Haskell (1995) by Enno Scholz

https://www.google.com/search?q=small+set+of+concurrency+primitives

http://research.microsoft.com/en-us/um/people/simonpj/papers/lw-conc/

http://julia.readthedocs.org/en/latest/manual/parallel-computing/

" Parallel programming in Julia is built on two primitives: remote references and remote calls. A remote reference is an object that can be used from any process to refer to an object stored on a particular process. A remote call is a request by one process to call a certain function on certain arguments on another (possibly the same) process. A remote call returns a remote reference to its result. Remote calls return immediately; the process that made the call proceeds to its next operation while the remote call happens somewhere else. You can wait for a remote call to finish by calling wait on its remote reference, and you can obtain the full value of the result using fetch. You can store a value to a remote reference using put. "

http://davidad.github.io/blog/2014/03/23/concurrency-primitives-in-intel-64-assembly/

---

http://berb.github.io/diploma-thesis/community/050_index.html

---

http://blog.ezyang.com/2014/01/so-you-want-to-add-a-new-concurrency-primitive-to-ghc/

---

http://www.cl.cam.ac.uk/~so294/documents/ec209.pdf

Relaxed memory models must be rigorous

---

http://www.cs-lab.org/historical_beam_instruction_set.html

---

http://www.cs-lab.org/historical_beam_instruction_set.html#2.9%20Inter-process%20Communication

---

" Concurrency is concerned with controlling non-determinism, which can arise in all sorts of situations having nothing to do with parallelism. Process calculi, for example, are best viewed as expressing substructural composition of programs, and have very little to do with parallel computing. (See my PFPL and Rob Simmons’ forthcoming Ph.D. dissertation for more on this perspective.) " -- http://existentialtype.wordpress.com/2012/08/26/yet-another-reason-not-to-be-lazy-or-imperative/

---

" ppgallardo says: May 2, 2011 at 6:37 am

Rate This

Dear Prof. Harper,

According to your own comments:

I simply meant that it is not clear whether monadic isolation of effects (esp store effects) is a good idea, because it precludes benign effects.

… for the sake of parallelism: you can’t evaluate a Sequence.map in parallel unless you know that the function being mapped is effect-free.

It seems that there is a conflict here. You can either have effects isolated by default (Haskell), which is good for parallelism, but requires some mechanism like unsafePerformIO to use begin effects, or you can have unsafe free access to effects by default (ML), which is good for begin(safe) effects, but you would need your own effect system to use parallelism. I don’t see clearly that this second approach is a better one. Reply

    ppgallardo says:	
    May 2, 2011 at 10:04 am	
     
    0
     
    0
     
    Rate This

    Sorry, I meant benign effects
    Abstract Type says:	
    May 2, 2011 at 11:30 am	
     
    5
     
    0
     
    Rate This

    Exceptions are perfectly compatible with parallelism; it is only storage effects (which includes I/O that creates a problem. If we isolate storage effects using a re-organized library in the manner I described, we get what we need for parallelism. I consider this significant, but you may not. The concept of purity is not entirely clear, it seems to me.
    neelk says:	
    May 3, 2011 at 4:00 am	
     
    0
     
    0
     
    Rate This

    When you say exceptions are compatible with parallelism, do you mean exceptions with or without handlers?

    In the absence of handlers you’re obviously correct (and it synergizes beautifully with strictness), but if there’s a catch-form then it seems to me that the naive parallel semantics becomes nondeterministic.

    Using par(e,e') to indicate a parallel tuple, consider the expression:

      par(raise (Error 1), raise (Error 2))
      handle 
        Error n => n

    This seems like it may give different answers depending on the scheduling of the left and right components of the pair.
    Abstract Type says:	
    May 3, 2011 at 3:48 pm	
     
    3
     
    0
     
    Rate This

    With handlers, of course, otherwise they wouldn’t be called exceptions! Your example shows only that the join-point logic is more subtle than you are imagining it to be. The crucial point is to ensure that parallelism does not change the semantics of the program. So, in this case, we must ensure that the leftmost exception is the one that is propagated, if any, because that is what would happen in the sequential case. The join point logic ensures that this is the case; I won’t bother to describe here how that is done, it’s not hard to imagine.

    Interestingly, we can also admit other effects, such as printing, by the same means: suspend the output until the join-point is reached, then print in left-to-right order. Input doesn’t seem susceptible to this kind of treatment, but random seeds easily are, because they can be split. We rely on this to implement treaps in parallel (though there are other means based on fingerprinting that avoid the need for random numbers).

    Crucially, the forms of parallelism we consider always have a well-defined join point. Without that, I cannot envision a way to admit effects, even exceptions, in parallel without affecting the sequential meaning. So, for example, parallel futures are out of the question because there is no well-defined join point. Indeed, this is precisely the same reason why it is so awkward to manage exceptions in a lazy language, among other things.

    It seems that I ought to have explained my motivations for this post more thoroughly up front. It seemed that I could just break out the one point, re-defining the interfaces of some of the basis libraries, to get the isolation of effects that I want to ensure absence of race conditions in parallel. My mistake."

---

http://lambda-the-ultimate.org/node/1615 Event-Based Programming without Inversion of Control. Philipp Haller and Martin Odersky.

http://lambda-the-ultimate.org/node/1617#comment-19873

Still has limitations

The Scala paper acknowledges some limitations:

    To conclude, managing information about statements following a call to receive would require changes either to the compiler or the VM. Following our rationale for a library-based approach, we want to avoid those changes.

If you want statements to follow a receive, you have to manually-CPS them yourself, possibly duplicating code between different branches of the receive. Moreover, any function that calls a function that uses receive also cannot contains statements after the receive, and so on, recursively. Scala's type system will enforce that for you, but that doesn't stop it from being annoying.

For me, that's a dealbreaker. I could just as easily use Java and the Proactor pattern. I use threads so that I can write my control flow in direct-style; if I wanted to CPS things I can do so in other languages.

Now, it's certainly possible to get around this: just write a VM with first-class continuations. But then you start running into other problems, like first-class continuations not playing nicely with a C# or Java FFI (which is why Scala doesn't support them). Beware the Turing Tarpit. Once you've generalized your VM to fit every conceivable language, its functionality is probably so general that it becomes faster for a language designer to write his own VM than to target yours. By Jonathan Tang at Wed, 2006-07-19 00:50

---

[–]dons [F] 13 points 7 years ago

Right! Here's what's happening:

    Data Parallel Haskell on the GPU
    SSE for GHC

So that gives us 4 kinds of parallelism to play with:

    explicit threads (forkIO)
    implicit threads (par)
    transparent parallelism on multicore (DPH-based nested parallel arrays)
    GPU-based (nested) parallel arrays (Sean's DPH + GPU) ongoing

The GPU is really the "FP chip that never was", with its hardware supported folds, scans and maps.

Btw, Microsoft is helping fund the Data Parallel Haskell work. I guess they really need a solution for massively multicore programming :-)

    permalink
    save
    parent

'join' functions with arguments from multiple threads is natural if you allow 'shared memory' partially applied functions and closures

kyrra 1 day ago

link

Single-threaded Go programs also run faster for some (many?) cases. The runtime is able to bypass a number of locks when running in a single-threaded mode. And depending on the profile of your code, turning on multiple threads for a Go program can slow it down even if you use channels (as moving data from one CPU core to another can be much slower than just doing a context switch in a single core).

---

BKuhn likes http://vertx.io/manual.html

http://lwn.net/Articles/615834/

---

On the Connection Between Memory Management and Data-race Freedom

As I alluded in the previous post, I have noticed an interesting connection between memory management and data-race freedom. I want to take a moment to elaborate on this, becaause the connection was not obvious to me at first, but it ultimately drives a lot of the Rust design decisions.

First, I believe that if you want to guarantee data-race freedom, and you want to support the cheap transfer of mutable state between tasks, then you must have a garbage-collector-free subset of your language. To see what I mean by “cheap transfer of mutable state”, consider something like double-buffering: you have one drawing and one display task exchanging buffers (so there are only two buffers in total). While the drawing task is preparing the next frame, the display task is busy displaying the current one. At the end, they exchange buffers. In order to prevent data races in a scenario like this, it is vital that we be able to guarantee that when the buffers are exchanged, neither task has any remaining references. Otherwise, the display task would be able to read or write from the buffer that the drawing task is currently writing on.

Interestingly, if we wanted to free one of those buffers, rather than send it to another task, the necessary safety guaranty would be precisely the same: we must be able to guarantee that there are no existing aliases. Therefore, if you plan to support a scenario like double buffering and guarantee data-race freedom, then you have exactly the same set of problems to solve that you would have if you wanted to make GC optional. Of course, you could still use a GC to actually free the memory, but there is no reason to, you’re just giving up performance. Most languages opt to give up on data-race freedom at this point. Rust does not.

But there is a deeper connection than this. I’ve often thought that while data-races in a technical sense can only occur in a parallel system, problems that feel a lot like data races crop up all the time in sequential systems. One example would be what C++ folk call iterator invalidation—basically, if you are iterating over a hashtable and you try to modify the hashtable during that iteration, you get undefined behavior. Sometimes your iteration skips keys or values, sometimes it shows you the new key, sometimes it doesn’t, etc. In C++, this leads to crashes. In Java, this (hopefully) leads to an exception.

But whatever the outcome, iterator invalidation feels very similar to a data race. The problem often arises because you have one piece of code iterating over a hashtable and then calling a subroutine defined over in some other module. This other module then writes to the same hashtable. Both modules look fine on their own, it’s only the combination of the two that causes the issue. And because of the undefined nature of the result, it often happens that the code works fine for a long time—until it doesn’t.

Rust’s type system prevents iterator invalidation. Often this can be done statically. But if you use @mut types, that is, mutable managed data, we do the detection dynamically. Even in the dynamic case, the guarantee that you get is much stronger than what Java gives with fail-fast iteration: Rust guarantees failure, and in fact it even points at the two pieces of code that are conflicting (though if you build optimized, we can only provide you with one of those locations, since tracking the other causes runtime overhead right now).

One reason that we are so intolerant towards iterator invalidation is because we wish to guarantee memory safety, and we wish to do so without the use of universal garbage collection (since, as I just argued before, it is basically unnecessary if you also guarantee data-race freedom). Without a garbage collector, iterator invalidation can lead to dangling pointers or other similar problems. But even with a garbage collector, iterator invalidation leads to undefined behavior, which can in turn imply that your browser can be compromised by code that can exploit that behavior. So it’s an all-around bad thing.

Therefore, I think it is no accident that the same type-system tools that combat iterator invalidation wind up being useful to fight data-races. Essentially I believe these are two manifestations of the same problem—unexpected aliasing—in one case, expressed in a parallel setting, and in the other, in a sequential setting. The sequential case is mildly simpler, in that in a sequential setting you at least have a happens-before relationship between any two pairs of accesses, which does not hold in a parallel setting. This is why we tolerate &const and &mut aliases in sequential code, but forbid them with closure bounds in parallel code.

In some way this observation is sort of banal. Of course mutating without knowledge of possible aliases gets you into trouble. But I think it’s also profound, in a way, because it suggests that these two seemingly unrelated issues, memory management and data races, cannot be truly separated (except by sacrificing mutability).

Most languages make no effort to control aliasing; if anything, they use universal garbage collection to prevent the end user from having to reason about when aliasing might exist to a given piece of data. This works well for guaranteeing memory is not freed, but as I’ve suggested, it can lead to a variety of incorrect behaviors if mutation is permitted.

This observation has motivated a lot of the research into ownership types and linear types, research which Rust draws on. Of other recent non-research languages, the only other that I know which takes the approach of controlling aliasing is Parasail. Not coincidentally, I would argue, both Rust and Parasail guarantee data-race freedom, while most langauges do not.

Posted by Nicholas D. Matsakis rust " -- http://smallcultfollowing.com/babysteps/blog/2013/06/11/on-the-connection-between-memory-management-and-data-race-freedom/

---

https://github.com/mozilla/rust/blob/master/AUTHORS.txt

---

the correct soln is, not to have blocking versions of every call, but to make eveery call nonblocking, and to prlovidee a construct like await to turn a nonblocking call with a callback (or a stream) into a blocking call

---

a notation of 'adjacency' for threads (a graph topology); and maybe also a geometry

---

my friend PR was saying how he just does REST for most IPC instead of using a message bus, which is more complex (and can be harder to debug, if 'debuggability' was not priority for the implementation of the bus). but what about 'simple' IPC like i have in one of atr's drivers (a separate process periodically polls an external webpage, and then puts any new information it finds into update messages, which are read by the primary process and used to update in-memory data structures representing the current state of the world)? he said that's fine. We didn't talk about this but i bet the key to its simplicity is probably that (a) it is spreading information, not commands or state changes, (b) it is 'addressless', ie the semantics of the state-of-the-world update messages don't rely on them being 'sent to' any particular process. tuple spaces?

---

doesn't look too informative, but there are some lists of features:

http://stackoverflow.com/questions/134867/which-programming-language-makes-concurrent-programming-as-easy-as-possible

---

here's how i bet dataflow/the 100-step rule in the brain works:

every time a value is assigned to a register, all later instructions that depend on that value immediately start executing (or re-executing, see below)
- in this way, the number of serial 'steps' in the hundred step rule really are only the serial dependencies, the longest number of hops between the input value and the output value
a different node (neuron, CPU) is assigned to each instruction (source code location)
branches: for non-side-effectful fns, both sides of a branch are 'speculatively' executed before the branch choice is known; the branch choice then chooses which of these results is used (like a gate on a river)
- this can't be done too deeply, however, b/c the number of paths increases geometrically with the number of branch points speculatively crossed
when some but not all dependencies (register or memory locations used as inputs) of an instruction are known, the instruction executes anyways, using the 'old' values of the dependencies -- this is a 'dataflow' system after all. In this fashion, the system is continually 'guessing' even in the face of a lack of complete information (eg it's robust to partial information and to temporary cutoffs from upstream inputs)
in place of moving data values into registers and memory addresses, they are here moved into 'nodes' (neurons, CPUs); an individual node holds one data value
just as move and control instructions are the most used in machine CPUs [1], here 'movement' between nodes is frequent
- so this architecture is good for routing, too; in this way, routing and computation are combined
- and control instructions are also like routing; instead of controlling a jump of a PC (program counter) from one source code location to another within one node (CPU), since nodes correspond to source code locations (individual instruction instances), we select data from one or another upstream node depending on the data coming from a different node (like a transistor does, if i understand correctly)
- using the RISC principal of focusing on optimizing the most common instructions, here the 'movement' and 'routing' operations would be optimized (data transfer from one node to another, and choosing which data to select as input from among various sources)
the control flow graph (before edge contraction into basic blocks) is actually spatially layed out into different nodes/neurons, and data actually spatially flows along these paths
continuation/backwards control: note that when branches are taken speculatively, it's sort of like each branch is represented by a continuation, and the output of the branch instruction is a continuation; it reminds me of the monadic 'bind' logic in Haskell. Said another way, the actual decision is being made at the end of the branch, not at the beginning, and is manifested not as a switching element that chooses its output based on a logical condition, but instead as a switching element that chooses its INPUT based on a logical condition, but instead
since the above doesn't deal with side effects, and since the geometric cost of nested speculative branching does not scale, there does also have to be some mechanisms to actually do 'forward braches' too, where only one branch executes
also, for many cases of iteration, ie those running iterations of some convergent algorithm, a continually-running iterative process will suffice
for eg foreach loops, we require at least one node/neuron per element

---

so far the best way i can think of to implement that in the Oot language is:

still allow sequential writing of algorithms, e.g.: x = 2 + param1 y = x*4 z = x + y

however, allow variables and functions to be declared as 'dataflow persistent', which means that they update automatically. After they have been executed once, as updates become available, execute each statement in parallel, eg after an update comes in for 'param1', x changes, and then y and z start recalculating, and then after y changes z recalculates again.

have a sentinel representing unavailable data. If this value comes in on an update, then it is discarded, and the old value is kept. It is possible, however, as with promises, to use a metaconstruct to check if something is currently available or not.

except for 'dataflow persistent' updates, the parallel semantics between steps don't matter when there is no I/O (input or side effects). When there is side effects, we may want to do something else anyways, as the paradigm in the previous section didn't specify exactly how side effects would work.

---

i still haven't got a straight answer for what region typing and region and effect typing is from searching the web (i think if i spent more time reading and thinking about the search results it would become clear), but here's one simple idea for something we would probably like:

haskell has the idea of segregating functions by type into 'pure' or 'impure'. Pure functions can only read/write their formal input and output parameters, and impure ones can read/write other things too. Haskell lets you (actually, forces you to) say in the type signature whether something is pure or impure. (Part of) the idea is that programmers can look at the type signature of a pure function and know that they don't have to worry about side-effects, making the program simpler to reason about and debug.

But sometimes, deep in a 'pure' call chain, you want to have some impurity. Perhaps you want to temporarily print out debug information to the console, or perhaps you want a logging system, or a profiling system, or a cache. So sometimes you want to bury an impure function inside a pure one without changing all of your type signatures for all of the many pure functions that call (directly or indirectly) that function. How to represent this?

Haskell provides escape hatches such as 'impureIO' which typecheck as pure but are actually impure. This is like promising 'OK this is impure but it acts just as if it were pure'. In fact, some authors define impurity merely in terms of 'visible' side-effects/non-determinism, that is they would allow something 'that has side-effects but acts as if pure' to be counted as pure, but imo these things are leakly abstractions and what is 'invisible' in one context might become visible in another (eg if the program crashes becauset he disk is full and the caching or logging system couldn't handle that).

So imo the thing to do is to have the type signature say 'This is impure but only with respect to the following variables' (or, since I/O is included too, parts of the world state ('bits of state')). Then you group variables and bits of state together into named 'regions' of state, for instance 'Caching subsystem state'. Then the type signature for various things says essentially "Pure except for caching subsystem state".

"effects" would mean just side effects, and both variable state and "world state" as perturbed by effects would be part of these 'regions'.

i think haskell actually allows you to do stuff like this with its various state monads. But i think the type signature notation might get verbose and hard to understand for beginners (also, this monad would commute with most others, but haskell doesn't understand the concept of monad commutivity and assumes that any monad might be non-commutative to be safe, so you have to write monad transformers etc, cluttering things up even more).

note: searching for this kind of thing often turns up the programming language "Disciple"

this sort of idea of 'regions' also fits in with my desire for concurrency because i want to consider main memory not as something that is immediately accessible to the CPU, but rather as a remote device that can be read or written to with a variable delay and with the possibility of non-atomicity and contention with the other CPUs. Otoh i want to abstract the various synchronization protocols away into 'memory devices' that provide/guarantee various memory consistency models (eg maybe the physical memory implementation provides very few guarantees but on top of that is implemented a memory abstraction guaranteeing sequential consistency). These 'memory devices' have a lot in common with 'regions'.

and of course regions also have something in common with the division of state into bits of state carried by particular object instances in OOP, and into particular closures in various languages supporting closures.

furthermore in the above, we see the idea of updates to 'dataflow persistent' variables; but since these are immediately updated, what do you do when you need consistency/synchronization between multiple different 'dataflow persistent' variables? This is the same problem as getting synch between multiple variables inside a memory region being concurrently accessed by multiple CPUs. Such synchronization protocols are often expressed in terms of regions of code (eg transactions, eg critical regions), but another way to look at it is in terms of regions of variables, eg variable A and variable B are (defined to always be) in a 'synchronization region' which guarantees sequential consistency. Again, these 'synchronization regions' appear to be similar or identical to the 'memory devices' above, and hence to the 'regions' described above.

note: one solution to some of these issues that i think is commonly used in today's digital circuits (altho mb i'm misunderstanding) is 'clocks', namely: (a) you have a 'clock' which emits a pulse at regular intervals ('cycles'), (b) each 'dataflow persistent' variable can only update at most once per clock cycle, (c) by default, a read to a 'dataflow persistent' variable yields its value at the end of the PREVIOUS clock cycle (d) accept the design constraints that any chain of operations which updates multiple 'dataflow persistent' variables in a way such that all of these updates should be 'atomic', that is either an outsider would see all of them, or none of them, must complete within one clock cycle; these operations can read the transient value of variables within the same clock cycle. This allows you to guarantee that all 'outsider' reads will see only consistent variable values, eg you have (pseudo)atomic updates to sets of variables, from the perspective of outsiders. The cost is the design constraint in (d), that the computations of these updates must complete within one cycle; this means that the design is not composable in the sense that you might not be able to eg take two operations F and G, where both F and G 'synchronously' update multiple variables in a (pseudo)atomic/consistent fashion, and create a new operation H which is the composition "first do F, then do G", because maybe F and G are each fast enough to fit within one clock cycle, but their sum is not.

note that the solution in the previous paragraph is closely related to the one of Glitch 'managed time' epochs and Ex Tempore; see [Self-proj-oot-ootConcurrencyNotes1], and possibly to Trellis. Also, possibly propagators, and also Alan Kay is quite enamored of it and also says Christopher Strachey thought of it: https://news.ycombinator.com/item?id=11812631

the classic example of a problem with concurrency in computers is:

there is a variable S in main memory which is supposed to represent a counter
S is computed by observing a bunch of things to be counted, and each time an observation is made, S is incremented
but these observations are done independently by a bunch of different CPUs in parallel
the way each CPU does an increment of S is: (1) read the current value (call it 'c') of S from main memory, (2) compute c+1, (3) write c+1 into S
but what if processor #1 does step (1), then processor #2 does steps (1), (2), and (3), then processor #1 does steps (2), (3)? The result will be S=1, not S=2 as desired.
generalizing this, even if there are 1000 different processors all observing a count at about the same time, we could end up with S=1!

note also that the serial way of doing this is just to loop over all of the potential observations.

in computers, we solve this sort of thing in a variety of ways (such as transactions) that imo tend to be implemented via atomic primitives and/or locking and/or synchronization and/or not sharing memory (message-passing). Imo this boils down to preventing any CPU from (reading values from shared memory and then making changes to shared memory based on what it read) while another CPU is 'in the middle of' a similar operation upon 'related' variables in shared memory. This means that one way or another the CPUs must coordinate (synchronize) the timing of their accesses to shared memory. If there are so many CPUs that they becomes infeasible, then you have to stop sharing memory globally; for example, instead of having one variable S in main memory which is accessed by every counting CPU, you have to arrange all of the counting CPUs into eg a tree, where each counting CPU reports its results to a summing CPU 'above' it in the tree, and then the summing CPUs report THEIR results to other summing CPUs above them, etc, until you get to the top of the tree; each summing CPU has the same problem as above w/r/t updating its sum in response to the various inputs from the CPUs below it, but it only has a fixed number of CPUs below it so the synchronization overhead is manageable.

how would the brain solve this? my guess is that it wouldn't do very much locking or synchronization, nor would it have a compare-and-swap (CAS) atomic primitive, but rather: (1) mostly message-passing (2) atomic primitives for common operations such as summation, but limited in fan-in and dynamic range (in contrast to data-movement atomic primitives like compare-and-swap) (eg the frequency of an output neuron could be the sum of the frequencies of its inputs over some regime) (3) maybe a little synchronization at high levels (eg serial conscious thought)

the brain probably would not use the serial method of looping over potential observations.

---

note the similarities between the central clock solution and the pure message-passing solution and the putative brain solution:

in the central clock solution, we have hardcoded upper limits on the longest amount of time that can be taken by any computation that place in between visible writes to variables
in the pure message-passing solution, we have hardcoded upper limits on the number of data elements we can operate upon (because we need another summing CPU for every few counting processes, given our hard limits on how many things we can synchronize at once, and if we assume that we don't want to do any looping because that would turn this into a 'serial solution')
in the putative brain solution, we have hardcoded upper limits on the number of data elements we can operate upon, both because we are using message-passing, and because the atomic primitive we have for eg summation have a hard upper limit on fan-in

---

http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/

" Synchronous functions return values, async ones do not and instead invoke callbacks.

    Synchronous functions give their result as a return value, async functions give it by invoking a callback you pass to it.

    You can’t call an async function from a synchronous one because you won’t be able to determine the result until the async one completes later.

    Async functions don’t compose in expressions because of the callbacks, have different error-handling, and can’t be used with try/catch or inside a lot of other control flow statements.

    Node’s whole shtick is that the core libs are all asynchronous. (Though they did dial that back and start adding ___Sync() versions of a lot of things.)

Promises do make async code a little easier to write. They compose a bit better, so rule #4 isn’t quite so onerous.

But, honestly, it’s like the difference between being punched in the gut versus punched in the privates. Less painful, yes, but I don’t think anyone should really get thrilled about the value proposition.

You still can’t use them with exception handling or other control flow statements. You still can’t call a function that returns a future from synchronous code. (Well, you can, but if you do, the person who later maintains your code will invent a time machine, travel back in time to the moment that you did this and stab you in the face with a #2 pencil.)

You’ve still divided your entire world into asynchronous and synchronous halves and all of the misery that entails. So, even if your language features promises or futures, its face looks an awful lot like the one on my strawman.

(Yes, that means even Dart, the language I work on. That’s why I’m so excited some of the team are experimenting with other concurrency models.)

https://github.com/dart-lang/fletch

...

Async-await is nice, which is why we’re adding it to Dart. It makes it a lot easier to write asynchronous code. You know a “but” is coming. It is. But… you still have divided the world in two. Those async functions are easier to write, but they’re still async functions.

You’ve still got two colors. Async-await solves annoying rule #4: they make red functions not much worse to call than blue ones. But all of the other rules are still there:

    Synchronous functions return values, async ones return Task<T> (or Future<T> in Dart) wrappers around the value.

    Sync functions are just called, async ones need an await.

    If you call an async function you’ve got this wrapper object when you actually want the T. You can’t unwrap it unless you make your function async and await it. (But see below.)

    Aside from a liberal garnish of await, we did at least fix this.

    C#’s core library is actually older than async so I guess they never had this problem.

It is better. I will take async-await over bare callbacks or futures any day of the week. But we’re lying to ourselves if we think all of our troubles are gone. As soon as you start trying to write higher-order functions, or reuse code, you’re right back to realizing color is still there, bleeding all over your codebase.

...

Three more languages that don’t have this problem: Go, Lua, and Ruby.

Any guess what they have in common?

Threads. Or, more precisely: multiple independent callstacks that can be switched between. It isn’t strictly necessary for them to be operating system threads. Goroutines in Go, coroutines in Lua, and fibers in Ruby are perfectly adequate.

...

Node with its ever-marching-to-the-right callbacks stuffs all of those callframes in closures. When you do:

function makeSundae(callback) { scoopIceCream(function (iceCream) { warmUpCaramel(function (caramel) { callback(pourOnIceCream(iceCream, caramel)); }); }); }

Each of those function expressions closes over all of its surrounding context. That moves parameters like iceCream and caramel off the callstack and onto the heap. When the outer function returns and the callstack is trashed, it’s cool. That data is still floating around the heap.

The problem is you have to manually reify every damn one of these steps. There’s actually a name for this transformation: continuation-passing style. It was invented by language hackers in the 70s as an intermediate representation to use in the guts of their compilers. It’s a really bizarro way to represent code that happens to make some compiler optimizations easier to do.

No one ever for a second thought that a programmer would write actual code like that. And then Node came along and all of the sudden here we are pretending to be compiler back-ends. Where did we go wrong?

Note that promises and futures don’t actually buy you anything, either. If you’ve used them, you know you’re still hand-creating giant piles of function literals. You’re just passing them to .then() instead of to the asynchronous function itself.

...

Awaiting a generated solution

Async-await does help. If you peel back your compiler’s skull and see what it’s doing when it hits an await call you’d see it actually doing the CPS-transform. That’s why you need to use await in C#: it’s a clue to the compiler to say, “break the function in half here”. Everything after the await gets hoisted into a new function that it synthesizes on your behalf.

This is why async-await didn’t need any runtime support in the .NET framework. The compiler compiles it away to a series of chained closures that it can already handle. (Interestingly, closures themselves also don’t need runtime support. They get compiled to anonymous classes. In C#, closures really are a poor man’s objects.)

You might be wondering when I’m going to bring up generators. Does your language have a yield keyword? Then it can do something very similar.

(In fact, I believe generators and async-await are isomorphic. I’ve got a bit of code floating around in some dark corner of my hard disc that implements a generator-style game loop using only async-await.)

Where was I? Oh, right. So with callbacks, promises, async-await, and generators, you ultimately end up taking your asynchronous function and smearing it out into a bunch of closures that live over in the heap.

Your function passes the outermost one into the runtime. When the event loop or IO operation is done, it invokes that function and you pick up where you left off. But that means everything above you also has to return. You still have to unwind the whole stack.

This is where the “red functions can only be called by red functions” rule comes from. You have to closurify the entire callstack all the way back to main() or the event handler.

...

Reified callstacks

But if you have threads (green- or OS-level), you don’t need to do that. You can just suspend the entire thread and hop straight back to the OS or event loop without having to return from all of those functions.

Go is the language that does this most beautifully in my opinion. As soon as you do any IO operation, it just parks that goroutine and resumes any other ones that aren’t blocked on IO.

If you look at the IO operations in the standard library, they seem synchronous. In other words, they just do work and then return a result when they are done. But it’s not that they’re synchronous in the sense that it would mean in JavaScript?. Other Go code can run while one of these operations is pending. It’s that Go has eliminated the distinction between synchronous and asynchronous code.

Concurrency in Go is a facet of how you choose to model your program, and not a color seared into each function in the standard library. This means all of the pain of the five rules I mentioned above is completely and totally eliminated.

So, the next time you start telling me about some new hot language and how awesome its concurrency story is because it has asynchronous APIs, now you’ll know why I start grinding my teeth. Because it means you’re right back to red functions and blue ones.

interestingly, when i read the first part of the article, talking about 'red' and 'blue' functions where only red fns can call red fns, and the syntax for red is more annoying, i thought of Haskell's pure (blue) vs. monadic (red) functions.

later: interestingly, everyone on HN thought the same:

https://news.ycombinator.com/item?id=8984648

---

jacques_chester 1 day ago

The problem is that microservices are not objects. They leak reality into your problem domain in a way that simply cannot be made to go away.

If regular object oriented programming languages had method calls that randomly failed, were delayed, sent multiple copies of a response, changed how they behaved without warning, sent half-formed responses ... then yes it would be the same.

Distributed systems are hard, because you cannot change things in two places simultaneously. All synchronisation is limited by the bits you can push down a channel up to, but not exceeding, the speed of light. In a single computer system this problem can be hidden from the programmer. In a distributed system, it cannot.

Probably the most devastating critique of the position that "it's just OO modeling!" came in A Note on Distributed Computing, published in 1994 by Waldo, Wyant, Wollrath and Kendall[0]:

"We look at a number of distributed systems that have attempted to paper over the distinction between local and remote objects, and show that such systems fail to support basic requirements of robustness and reliability. These failures have been masked in the past by the small size of the distributed systems that have been built. In the enterprise-wide distributed systems foreseen in the near future, however, such a masking will be impossible."

[0] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.7...

reply "

https://blog.mozilla.org/javascript/2015/02/26/the-path-to-parallel-javascript/

https://news.ycombinator.com/item?id=9115231

---

ChuckMcM? 13 hours ago

In my experience there are three things that will break here;

1) At-most-once is a bridge to an elementary school which has an inter-dimensional connection to a universe filled with pit vipers. Kids will die, and there is nothing you can do to stop it.

2) Messages are removed when acknowledged or memory pressure forces them to be kicked out. Black Perl messages, those that sail in from out of nowhere, and lonely widows (processes that never find out their loved ones are dead) will result.

3) Messages are ordered using wall clock millisecond time. This will leave your messages struggling to find their place in line and messages that should be dead, not be dead (missing fragment problem).

Obviously all these are simply probabilistic trade-offs based on most likely scenarios which result in arbitrarily small windows of vulnerability. No window is small enough at scale over time.

Often when these things have bitten me it has been non-programming stuff. For example a clock that wouldn't follow NTP because it was too far ahead of what NTP thought the time was, an operator fixed that by turning time back 8 seconds. A client library that was told messages arrive at most one time, and so made a file deletion call on the arrival of a message, a restored node holding that message managed to shoot it out before the operator could tell it that it was coming back from a crash, poof damaged file. And one of my favorites in ordering, a system that rebooted after an initial crash (resetting its sequence count) and getting messages back into flight with the wrong sequence number but with legitimate sequence values. FWIW, these sorts of things are especially challenging for distributed storage systems because files are, at their most abstract, little finite state machines that walk through a very specific sequence of mutations the order of which is critical for correct operation.

My advice for folks building such systems are never depend on the 'time', always assume at-least-once, and build in-band error detection and correction to allow for computing the correct result from message stream 'n' where two or more invariants in your message protocol have been violated.

Good luck!

-- https://news.ycombinator.com/item?id=9208501

---

" Python has great concurrency primitives, including generators, greenlets, Deferreds, and futures. Python has great concurrency frameworks, including eventlet, gevent, and Twisted. Python has had some amazing work put into customizing runtimes for concurrency, including Stackless and PyPy?. All "

---

" One technical aspect that is also different is that Backbone is heavily event-oriented. Mithril, on the other hand, purposely avoids the observer pattern in an attempt to abolish "come-from hell", a class of debugging problems where you don't know what triggers some code because of a long chain of events triggering other events.

A particularly nasty instance of this problem that sometimes occurs in "real-time" applications is when event triggering chains become circular due to a conditional statement bug, causing infinite loops and browser crashes. " -- http://lhorie.github.io/mithril/comparison.html

---

PEP 492 - Coroutines with async and await syntax

HN discussion: https://news.ycombinator.com/item?id=9397320

notes on that discussion (havent read all of it yet, just some) " rdtsc 4 days ago

parent

flag

Looks good, but coming from someone who used Twisted for a few years, I had found deferreds to be messy and switched to yield-type co-routines (http://twistedmatrix.com/documents/10.2.0/api/twisted.intern...), but eventually I found those pretty verbose as well. Small demo examples always look clean and fun, but large applications at the top level will end up looking basically like this:

    y = yield f(x)
    z = yield g(y)
    w = yield h(z)
    ...

In this case it would be async/await instead of pure yields.

The worst thing was having to hunt for Twisted version of libraries. "Oh you want to talk XMPP? Nah, can't use this Python library, have to find the Twisted version of it". It basically split the library ecosystem. Now presumably it will be having to look for async/await version of libraries that do IO. " -- https://news.ycombinator.com/item?id=9397943

" jonesetc 4 days ago

parent

flag

> but eventually I found those pretty verbose as well...

It is definitely a bit verbose, but I decided that the clarity for the rest of the code is worth putting a yield before each function call. Also I've found a few projects (for Tornado at least) that cut down on this boiler plate and make the yield only required at the lowest level where the async really happens. [0]

> The worst thing was having to hunt for Twisted version of libraries. "Oh you want to talk XMPP? Nah, can't use this Python library, have to find the Twisted version of it". It basically split the library ecosystem. Now presumably it will be having to look for async/await version of libraries that do IO.

I work with Tornado and this is absolutely the worst part. At least with Tornado the newest version is embracing interoperability with python 3.4+ native AsynIO?.

[0] https://github.com/virtuald/greenado

https://docs.python.org/3/library/asyncio.html

https://news.ycombinator.com/item?id=9582980

 trentnelson 6 hours ago

    > To make things worse, non-blocking I/O is done completely differently
    > under Unix and under Win32.  I'm not even sure Win32 provides enough
    > support for async I/O to write a real user-level scheduler.

sigh, VMS got the link between processes, threads, I/O and waitable events (specifically, the link between tying the completion of future I/O to subsequent computation) right from day one. And by virtue of Cutler, therefore, so did NT, and thus, Windows.

UNIX did not. The core concept of separating the work (computation to be done after an event occurs) from the worker[1] (the thread that performs the work) is absent; the manifestation of that is the lack of good, completion-oriented asynchronous I/O primitives. Instead of being able to say to the kernel "here, do this, then let me know when you're done"[2] and moving on to the next piece of work in the queue, you have to do the elaborate non-blocking multiplex dance for socket I/O, palm file I/O off onto a separate set of threads that can block (or do AIO) and generally manage all threading and concurrency primitives yourself.

It took me ten years of UNIX systems programming to suddenly grasp the elegance of the VMS/NT/Windows approach a few years ago. It provides you with everything you need to optimally exploit all your cores for work that is both heavily compute bound and I/O bound.

It has been fascinating to see the difference in performance between Linux and Windows in practice with PyParallel? when Windows kernel primitives are exploited properly:

https://speakerdeck.com/trent/pyparallel-pycon-2015-language....

And more recently, with 10Gbe hardware at home:

Linux lwan (the top performer on Techempower Framework Benchmark):

    [trent@zebra/ttypts/1(~s/wrk)%] time ./wrk --timeout 120 --latency -c 256 -t 12 -d 30 http://10.0.0.2:8080/plaintext
    Running 30s test @ http://10.0.0.2:8080/plaintext
      12 threads and 256 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     5.34ms    7.46ms 197.13ms   82.40%
        Req/Sec    14.41k   364.49    18.82k    76.61%
      Latency Distribution
         50%  398.00us
         75%    9.01ms
         90%   17.50ms
         99%   28.03ms
      5178617 requests in 30.10s, 0.93GB read
    Requests/sec: 172048.49
    Transfer/sec:     31.67MB

Windows PyParallel?:

    [trent@zebra/ttypts/1(~s/wrk)%] time ./wrk --timeout 120 --latency -c 256 -t 12 -d 30 http://10.0.0.2:8080/plaintext
    Running 30s test @ http://10.0.0.2:8080/plaintext
      12 threads and 256 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     1.52ms    9.38ms 492.43ms   99.33%
        Req/Sec    18.37k     1.01k   22.75k    73.50%
      Latency Distribution
         50%    1.09ms
         75%    1.28ms
         90%    1.56ms
         99%    5.18ms
      6598900 requests in 30.10s, 1.03GB read
    Requests/sec: 219236.69
    Transfer/sec:     34.92MB
    ./wrk --timeout 120 --latency -c 256 -t 12 -d 30   106.30s user 138.87s system 814% cpu 30.114 total

[1]: https://speakerdeck.com/trent/parallelism-and-concurrency-wi...

[2]: https://speakerdeck.com/trent/pyparallel-how-we-removed-the-...

pascal_cuoq 3 hours ago

Xavier's first sentence states that they two operating systems have a visibly different philosophy, not that one is better than the other. The second sentence should be interpreted in the context of this first sentence: if you try to emulate Unix's primitive with Windows', and especially if you want to do this and write a user-level scheduler that does not occasionally deadlock without reason, you will get stuck in a couple of places.

This doesn't mean that Windows' philosophy does not give you optimal performance in PyParallel?. It simply means that OCaml had chosen for its low-level system primitives a Unix model and that it was difficult to make a Windows version of the same primitives so that OCaml programmers could write this kind of program portably between Windows and Unix.

NOTE: without, at the time it is in my timezone, looking up the full post, I have to say that I don't think that the quoted two sentences have anything to do with the discussion. It seems to me that the two sentences assume that a multicore (multiprocessor, at the time the post was written) OCaml runtime is not available, and discusses the options to still provide threads. A user-level scheduler is one option to provide threads to OCaml programs without a concurrent OCaml runtime. Another option is to use Windows' native threads and superior philosophy for blocking primitives to run each OCaml thread as a native thread (although at most one of these will be running at any given time. All the others will be waiting on the heap mutex).

OCaml ended up providing threads under Windows and a Unix-like “Unix” module around 1996-ish, way before the linked discussion. So thanks for the explanation about VMS, but I think it is off-topic, too.

NOTE 2: I have now read the original post. You should, too. It starts with:

> Threads have at least three different purposes:

> 1- Parallelism on shared-memory multiprocessors.

> 2- Overlapping I/O and computation (while a thread is blocked on a network

> read, other threads may proceed).

>3- Supporting the "coroutine" programming style

> (e.g. if a program has a GUI but performs long computations,

> using threads is a nicer way to structure the program than

> trying to wrap the long computation around the GUI event loop).

> The goals of OCaml threads are (2) and (3) but not (1) (for reasons

> that I'll get into later)

What makes it relevant to the current discussion is (1), but Xavier is discussing (2) and (3) at the time of the quote you chose to take out of context.

trentnelson 2 hours ago

Oh, no, that's what the sigh was for; Windows has the best model, but there's no equivalent on UNIX, so, you end up having to code to the lowest common denominator (the UNIX model) if you want your software to run somewhere else other than Windows (i.e. almost all open source software).

I'm not disputing any of the technical things he's saying; just ranting about the unfortunate nature of two vastly different kernel models, and the fact that no open source stuff properly exploits Windows facilities, despite them being technically superior.

 tangled 7 hours ago

It's interesting to read Xavier's annual statement on why there will never be multi core support in OCaml: http://mirror.ocamlcore.org/caml.inria.fr/pub/ml-archives/ca...

http://mirror.ocamlcore.org/caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html

avsm 5 hours ago

If you'd like to see the approach we're taking at OCaml Labs in order to build multicore, read KC's blog post here:

http://kcsrk.info/ocaml/multicore/2015/05/20/effects-multico...

The core idea is incredibly exciting (to us, anyway). Rather than baking in a specific multicore scheduler, we're allowing pluggable schedulers written in OCaml. They use algebraic effects to allow an independent scheduler to compose concurrency among OCaml threads. This will ensure that the OCaml runtime remains lean, and even allow applications to define their own strategies for concurrent scheduling.

DonPellegrino? 9 hours ago

More information is available

in my original post in r/ocaml: https://www.reddit.com/r/ocaml/comments/36ninh/403_scheduled...

in the repost in r/programming: https://www.reddit.com/r/programming/comments/36ppx0/ocaml_4...

DonPellegrino? 6 hours ago

For those asking "How the hell does OCaml not support multicore in 2015????", this is my reply, crossposted from /r/ocaml:

You can make OS level threads, but they can't be both running at the same time due to the GIL (Global Interpreter Lock). Then why are they even there you might ask? Because it allows you to do a blocking call on a thread and to keep executing other stuff in the main thread. Other languages that have a GIL (and the same restriction) are Javascript (including Node.js), Ruby and Python.

Now, IN PRACTICE, things are a bit different. You're never gonna make your own thread to block on things. You're gonna use Lwt to manage all your concurrency so you can do tons of blocking stuff at the same time and combine the tasks nicely without ending up in a Node.js-style "callback hell".

But still, even with tons of concurrency, you don't have parallelism. It's all you need for 98% of your programs, but if you then need to do heavy number-crunching it won't be enough. This is the exact same situation that happens in Node.js, Python, etc, except that OCaml is massively faster than those languages, so even some CPU-bound work is acceptable because OCaml is really performant.

Currently, there's 2 options if you wanna do CPU-bound work: you can use ctypes to call C code easily (from Lwt_preemptive) and then release the lock from within C with caml_release_runtime_system(), so your C code will be truly parallel (and running in the thread pool automatically managed by Lwt_preemptive), and you can call caml_acquire_runtime_system() before returning the result back to OCaml to get the lock back and merge back with the normal code.

The second option is to do an oldschool fork() and communicate with message-passing. Or have a master that manages workers and communicates with ZMQ, HTTP, TCP, IPC, etc. Or use a library that does it all for you like parmap, Async Parallel, etc etc.

What this "multicore support" means is that you'll be able to have threads in the same process that run in parallel because the GIL is going away. In practice it'll probably be implemented directly into Lwt so you'll be able to do something with Lwt_preemptive and just tell it to run some function in a separate thread and then use >>= to handle its result. It's gonna be simpler than both options I described above.

Again, more technical information is available in my r/ocaml post

jwatzman 4 hours ago

> The second option is to do an oldschool fork() and communicate with message-passing. Or have a master that manages workers and communicates with ZMQ, HTTP, TCP, IPC, etc. Or use a library that does it all for you like parmap, Async Parallel, etc etc.

I work on the Hack language typechecker at Facebook. The typechecker is written in OCaml, and since it needs to operate on the scale of Facebook's codebase (tens of millions of lines of code), it's a pretty performance-sensitive program. We needed real parallelism, but doing it with fork() and IPC was too costly for us, both in terms of storage (if you aren't careful you end up duplicating a bunch of data) and CPU (serializing/deserializing OCaml data structures to send over IPC is CPU-intensive).

We ended up doing something somewhat more interesting. Before we fork(), we mmap a MAP_ANON

MAP_SHARED region of memory -- that region will be backed by the same physical frames in each child after we fork, so writes to it in one child process will be visible in the others. We use a little bit of C code to safely manage the shared-memory concurrency here.

The code for this all open source (along with the rest of the typechecker, HHVM runtime, etc) if you want to take a look: https://github.com/facebook/hhvm/blob/master/hphp/hack/src/h...

I also gave a tech talk a while ago on internals of the type system and typechecker; the latter part starts here: https://www.youtube.com/watch?v=aN22-V-b8RM&feature=youtu.be...

AceJohnny?2 4 hours ago

> We ended up doing something somewhat more interesting. Before we fork(), we mmap a MAP_ANON

Isn't that similar to how Linux implemented threads for a long time (before NPTL [1]) ?

I vaguely recall that for a long time people were complaining about the cost of starting threads in Linux, because it basically amounted to fork()+shared memory.

[1] http://en.wikipedia.org/wiki/Native_POSIX_Thread_Library

jwatzman 1 hour ago

I don't know the history of threads/NPTL on Linux. However, the distinction between "thread" and "process" in the Linux kernel is mostly a human one, not a technical one. Take a look at the clone() syscall -- spawning a thread vs. forking a process amount just to different flags to that call, to tell it whether to copy pages or not, how to assign a new ID to the new thread/process, etc. (Not sure if that's how fork() and friends are actually implemented under the hood.)

eurleif 2 hours ago

>Other languages that have a GIL (and the same restriction) are Javascript (including Node.js)

Not quite true; JS just doesn't support threads at all. It's asynchronous and single-threaded. In node.js's case, an event loop uses a system call like epoll or kqueue to wait for many events at a time, and dispatches those events to the correct callbacks.

You can do parallelism in JS with Web Workers, and they do use native OS threads, but they lack shared memory, and can only communicate using message passing. So from the perspective of the JS code, they behave more like processes than threads. No GIL, in any case.

sampo 5 hours ago

> Node.js, Python, etc, except that OCaml is massively faster than those languages

The numerical benchmark table in http://julialang.org/ suggests that JavaScript? is quite a number crunching beast, within 2x-3x of C performance.

DonPellegrino? 4 hours ago

The thing is that numbers are usually wrapped in other things, like objects, hashtables, arrays, etc, and OCaml is a beast at dealing with that kind of code.

From a purely numbers perspective, its operations on integers have to use the LEA instruction instead of ADD (for example) because of the 1-bit tag, which slows things down a bit, but the speed at dealing with symblic code as I explained above more than makes up for it.

---

"Python's multiprocessing library...gives IPC the same API as interacting with threads,"

---

" Concurrency and Paralellism

Rust as a language doesn't really have an opinion on how to do concurrency or parallelism. The standard library exposes OS threads and blocking sys-calls because everyone has those, and they're uniform enough that you can provide an abstraction over them in a relatively uncontroversial way. Message passing, green threads, and async APIs are all diverse enough that any abstraction over them tends to involve trade-offs that we weren't willing to commit to for 1.0.

However the way Rust models concurrency makes it relatively easy to design your own concurrency paradigm as a library and have everyone else's code Just Work with yours. Just require the right lifetimes and Send and Sync where appropriate and you're off to the races. Or rather, off to the... not... having... races. " -- https://doc.rust-lang.org/nightly/nomicon/concurrency.html

---

HLSL shaders have a bunch of stuff that bears on stuff i've been thinking about:

'branch' vs 'flatten' annotations tells whether conditions should actually branch or should just use conditional moves (apparently the hardware sometimes can't handle deep nesting in one mode or another):

https://msdn.microsoft.com/en-us/library/windows/desktop/bb509610%28v=vs.85%29.aspx https://msdn.microsoft.com/en-us/library/Bb313974%28v=XNAGameStudio.31%29.aspx http://www.gamedev.net/topic/594051-hlsl-flow-control-attributes/

nointerpolation:

apparently vertex shaders by default etravel down all code paths and interpolate the results? https://msdn.microsoft.com/en-us/library/windows/desktop/bb509706%28v=vs.85%29.aspx http://www.gamedev.net/topic/488524-hlsl-weird-behavior/

---

'signals' as messages that are also control flow constructs for concurrency

---

" Channels give you a multiplexed connection to the server for bidirectional communication. Phoenix also abstracts the transport layer, so you no longer have to be concerned with how the user has connected. Whether WebSocket?, Long-polling, or a custom transport, your channel code remains the same. You write code against an abstracted "socket", and Phoenix takes care of the rest. Even on a cluster of machines, your messages are broadcasted across the nodes automatically...As a "web framework", Phoenix targets traditional browser applications, but the so-called "web" is evolving. And we need a framework to evolve with it. Phoenix transcends the browser by connecting not only browsers, but iPhones, Android handsets, and smart devices alike. Justin Schneck, Eoin Shanaghy, and David Stump helped Phoenix realize this goal by writing channel clients for objC, Swift, C#, and Java. To appreciate what this enables, Justin demo'd a Phoenix chat application running on an Apple Watch, iPhone, and web browser all powered by native phoenix channel clients: " -- http://www.phoenixframework.org/blog

---

"Go is a fun imperative language and is great for command line tool projects. It's a small language, pretty quick, easy to learn, with mailbox-style concurrency. Since it all gets compiled into a tight 'little' binary, it's easy to ship around (as long as it's been compiled for the arch/os you are shipping it around on). You still run into problems since it's totally easy to share mutable variables across goroutines which can lead to some funny stuff where you probably will end up just throwing a mutex on accessing that variable since it's easier than rearchitecting your program to just pass the value of that variable around via channels.

But oh! you say, that's just lazy programming and totally fixable by not being a dumbshit. And you'd be right. However go has other problems for long running, large stuff, like if one goroutine goes horribly wrong, it's probably going to take the whole damn processes with it which Erlang/Elixir will not do. ...It doesn't have the "startup cost" of a VM based language like Erlang/Elixir, so that's why I say it's great for CLI stuff, when we're comparing the two. The BEAM (Erlang VM) definitely takes a noticeable amount of time to start up, so it's much more suited toward long running stuff. "

---

" This is a really broad question but I'l provide a couple of points why I personally regard Go inferior to Erlang when it comes to building complex systems that must run continuously:

    A crash (unhandled panic) of goroutine crashes the entire system. A whole system may halt due to an individual bug.
    There is shared memory in Go. Hence, if one goroutine crashes, even if you catch the panic, it might leave some permanent in-memory junk i.e. corrupt data that will compromise other parts of the system.
    It's impossible to unconditionally terminate a goroutine. You can do something manually, but this amounts to sending a message "please kill yourself", and hoping this will happen. If due to some bug, you have a goroutine that ends up in a tight CPU bound loop, you can't possibly stop it, and it will waste your resources forever....

Yes, this is how I think about gen_servers. They are kind of like concurrent objects. They have identity, they have interface, and they encapsulate state. Unlike typical OO style, most of the time you'll actually want to have some registration mechanism and refer to processes via aliases. This is mostly needed for fault-tolerance reasons: a process may fail, and be restarted, in which case the new process will get a different pid. Thus, rather than keeping and passing pids around, it's frequently better to use registration/discovery mechanism. A server process registers itself under some name, and clients discover it through this name. If a server process is restarted, the new one will register itself under the same name, and system will continue to work normally. There are various approaches to registration/discovery. Local registration (e.g. via :name option), and gproc (https://github.com/uwiger/gproc) are the ones I use the most. "

" This part is hard to adjust to. I think I like it, but it creates some strange parallels. In OO code, you care about a class, and you instantiated it into an object that you can carry around. Gen servers feel like this; you initialize them and get a processid back that you can pass in as a first parameter to further method calls. So even though the underlying code is very different, I'm running into strange similarities. Can't tell if that's just me over-translating into an OO background. "

---

" The key difference to an Erlang'er is that when something goes wrong you can terminate the process and restart it sanely (which others might argue is why you used an architecture with a channels design in the first place to have an arms length relationship between the code...). With Go all bets are off, you can have a pool of processes watching a channel in case one gets stuck, but even then all processes could get stuck or one can die and take down all the others (shared state...), there is no reliable way to deal with failure on the other end of the channel. "

---

" Prediction #3: Multi-threaded programming will fall out of favor by 2012

Hard to say if this is right or not. Depends on who you ask. This seems basically right for applications that don’t need the absolute best levels for performance, though.

    In the past, oh, 20 years since they invented threads, lots of new, safer models have arrived on the scene. Since 98% of programmers consider safety to be unmanly, the alternative models (e.g. CSP, fork/join tasks and lightweight threads, coroutines, Erlang-style message-passing, and other event-based programming models) have largely been ignored by the masses, including me.

Shared memory concurrency is still where it’s at for really high performance programs, but Go has popularized CSP; actors and futures are both “popular” on the JVM; etc. "

---

should be able to request a shared memory construct with specified memory consistency model, and specified persistence etc guarantees (ACID (Atomicity, Consistency, Isolation, Durability), BASE (basically available, soft state, eventually consistent (eventually consistenct vs strong eventually consistent, see [2]), monotonicity/no rollbacks(see [3]), availability, etc)

---

nailer 7 hours ago

For anyone unfamiliar with JS, async/await is about the best JavaScript? will ever get:

    var orders = await getJSON('/users/joe/orders');

No callbacks, promises etc (Edit: that you have to look at, as other posters note the implementation uses promises in the background).

You still have to explicitly say you want to be async, so it's not quite as good as non-blocking IO in something like Elixir, but it's the future as far as JS goes.

jlongster 6 hours ago

There are promises there, it just hides it. All async functions return a promise, and `await` works with promises builtin. It's basically the same thing as generators, without the `function*` syntax.

Which is unfortunate because async/await only works with promises, and if you want any other async model (like one that doesn't eat errors all the time) you can't use it. Luckily, generators are fine.

PopsiclePete? 1 hour ago

I'm a little sad the async/await model is winning over. I really prefer Go's CSP. It can do everything the async/await model can, plus "streaming" API's. I dig (sorta) C#'s async/await, but I still don't see why it's "winning" over other, better models...

idibidiart 1 hour ago

CSP can be implemented on top of async/await, which itself is implemented on top of promise and generators.

What does streaming APIs have to do with CSP per se?

findjashua 1 hour ago

for consuming 'streaming APIs', why not just use observables (rx.js etc)?

jlongster 3 hours ago

> What do you mean by "if you want any other async model you can't use it"?

Promises are not the only way to do async work in JS, as much as some people like to say that. You have observables and channels, which are more powerful because they handle streaming.

There are proposals to make async/await more composable: https://github.com/jhusain/compositional-functions. TC39 is aware of this but I'm not sure if they are going to fix it or not.

> Also, promises don't eat errors. It's the code using those promises that forget to handle them.

They literally do. They run your code in a try/catch block and store the error away, and you have to remember yourself to manually throw it later. You shouldn't be forced to run code inside a try/catch. http://jlongster.com/Stop-Trying-to-Catch-Me

my note: i looked at https://github.com/jhusain/compositional-functions . I don't understand it, starting with the code after 'Here is the definition of the Composition Function:'. In general, going off both the HN comments and the proposal, it seems that the idea is that present-day js async/await desugars to generators and uses them to produce Promises, whereas in the proposal, async/await desugars to generators but allows the user to provide a custom wrapper that turns the generators into anything, eg a Promise, a Task, an Observable. What i dont get in the proposal Composition Function (for promises) code:

why is gen.next being called with 'undefined'?
i dont get the not finished, chain off the yielded promise and `step` again part

---

examples of callbacks vs promises vs asynch/await:

http://blogs.msdn.com/b/eternalcoding/archive/2015/09/30/javascript-goes-to-asynchronous-city.aspx

my comment: looks great! yeah! i am concerned with the 'eat errors' thing that the guy mentions above. I find this quite annoying when it occurs in Python processes (which is understandable though; but i think this may even happen in Python threads, which is less understandable, but i dont remember).

also, in oot this stuff will be by default; you'll add a special annotation to make something block, or to explicitly see the implicit Promise (or whatever; see below) that everything actually is

also, i guess Observable is more general than Promise, so i guess we'll use that

(btw recall The introduction to Reactive Programming you've been missing)

btw what's the diff between Observables and Channels (eg Go channels), Erlang mailboxes?

this guy seems to think that observables are in essence callbacks (that probably occur in the same thread as the caller), whereas channels are (potentially, in the case of Go) async queues between threads: http://stackoverflow.com/questions/20632512/comparing-core-async-and-functional-reactive-programming-rx

---

random note: interesting use of the word 'complect':

"Callbacks are code that is executed. And that execution has to happen in some thread at some point in time. Often, the time is when the event happens, and the thread is the thread that notices/produces the event. If the producer instead put a message on a channel, you are free to consume the message when you want, in whatever thread you want. So callbacks, which are essentially a form of communication, complect the communication with flow of control by dictating when and where the callback code is being executed."

---

notes on /quotes from http://www.slideshare.net/deanwampler/reactive-design-languages-and-paradigms

reactivemanifesto.org

scalable: loosely coupled, composable, distributed. network problems first-class.
responsive: must respond, even when errors occur
resilient: Failures ﬁrst-‐class, isolated. Errors/ recovery are just other events.
event-driven: Asynchronous, non-‐blocking. Facts as events are pushed.

OOP: state and behavior are joined

Joined in objects. Contrast with FP that separates state (values) and behavior (functions).

Event-driven: A beneﬁt because events make natural objects, but event-handling logic can be obscured by object boundaries.
Scalability: Bad, due to the tendency to over-engineer the implementation by implementing more of the domain model than absolute necessary. This makes it harder to partition the program into “microservices”, limiting scalability. For high-throughput systems, instantiating objects for each “record” can be expensive. Arrays of “columns” are better if a lot of “records” are involved per event (or batches of events).
Responsive: Any code bloat and implementation logic scattered across class boundaries slows down the performance, possibly obscures bugs, and thereby harms responsiveness.
Resilient: Harder to reify Error handling, since ...cuts across domain object boundaries

Rich domain models in code that can’t be teased apart easily into focused, lightweight, fast services. For example, if a fat “customer” object is needed for lots of user stories, the tendency is to force all code paths through “Customer”, rather than having separate implementations, with some code reuse, ...

Example: What should be in a Customer class? Tuesday, May 13, 14 What ﬁelds should be in this class? What if you and the team next door need different ﬁelds and methods? Should you have a Frankenstein class, the superset of required members? Should you have separate Customer classes and abandon a uniform model for the whole organization? Or, since each team is actually getting the Customer ﬁelds from a DB result set, should each team just use a tuple for the ﬁeld values returned (and not return the whole record!), do the work required, then output new tuples (=> records) to the database, report, or whatever? Do the last option...

Claim: OOP’s biggest mistake: believing you should implement your domain model. Tuesday, May 13, 14 This leads to ad-hoc classes in your code that do very little beyond wrap more fundamental types, primitives and collections. They spread the logic of each user story (or use case, if you prefer) across class boundaries, rather than put it one place, where it’s easier to read, analyze, and refactor. They put too much information in the code, beyond the “need to know” amount of code. This leads to bloated applications that are hard to refactor in to separate, microservices. They take up more space in memory, etc. The ad-hoc classes also undermined reuse, paradoxically, because each invents its own “standards”. More fundamental protocols are needed.

todo: on slide 17

http://www.slideshare.net/deanwampler/reactive-design-languages-and-paradigms

---

this isn't really the best place to put this but i dunno where else to put it so i'll put it here for now.

Consider the following analogy between people and nodes in a scalable computing system.

First, let's talk about people.

As a person, let's say you are ambitious in your career. You want to be "one of the top people in your field". But do you mean "in the top n-th percentile" or "in the list of the top n people"? For example, you might want to be in the top 1%, or you might want to be in the top 10 people. Call the former the top-n%-ers, and the latter the top-n-ers. The percentile ambition is scalable, the absolute ambition is not. For example, consider scaling up the system to a galaxy-spanning civilization; assuming about 1-billion Earths, and 10 billion people on each Earth, that's about 1e19 people. Clearly, the percentile ambition scales whereas the absolute ambition becomes harder to achieve as the population increases; as population tends to infinity, the possiblity of achieving the absolute target tends to zero.

So, are we content with the percentile ambition? Probably not; since things tend to be follow power-law distributions, those few people who have achieved the absolute target reap much, much larger rewards than those who achieve slightly lower percentile. Eg to consider wealth, Bill Gates is much, much richer than the poorest person in the top 1% of wealth; the difference in wealth between Bill Gates and the poorest 1%er is much much greater than the difference between the poorest 1%er and the poorest 2%er, even though the difference in percentile is about 1% in each case.

Can this be remedied? Well, in theory perhaps we could change our taxation system to essentially eliminate extreme wealth (i'm not here taking a position on whether this would be a good idea, this is just an example; i'm trying to make a different, more technical point). But there is another reward (or curse, depending on your point of view) that the absolute top-n-ers get: fame. Assuming that the capacity of a human mind is unchanging, and that the social dynamics which lead to mass fame are unchanging, there's nothing that government can do about that; people have a fixed memory capacity, so while many people might recognize the name "Bill Gates" as a well-known rich guy, if the total human population increases a billion-fold to a galaxy-spanning civilization, then it will not be possible for each person to remember the names of the top billion rich people. This is similar to what happens with top universities; although the total number of universities meeting a given educational quality bar has almost certainly expanded greatly in the US over the last few decades, the number of brand names that most people remember has not, and so the repute associated with attending, say, the 30th-best university probably hasn't grown as much as it 'should have'.

So, it seems that we must content ourselves with accepting that the top-n-ers will always be much more powerful than others, at least to the extent that their fame alone gives them power. Which means that, if you are doing business, you'd like these people to be your partners, right? But now we come to another scaling dynamic; just as memory capacity per person stays constant even as the total number of people scales up, so does the number of hours in a day, and therefore the total minutes spent on business conversations by each top-n-er. But, the total number of people who want to talk to the top-n-er increases (to first order) with the total number of people. Therefore, although you want to talk to the top-n-er, they don't have time to talk to you. How they respond to this is up to them; they might simply ignore this bottleneck, and turn down all requests to talk from strangers, with no path for a stranger to get in touch with them; or they might create an organization around themselves, where other people filter requests to speak to them and possibly even enter into business partnerships on their behalf (in practice the human social system will automatically do a little bit of the latter, even if they don't do anything different themselves; people who want to partner with a top-n-er will try to find a 3rd degree connection and then get introduced to a 2nd-degree connection, etc).

Finally, let's transpose this to computing systems. You have a bunch of nodes. Each node is of a fixed (or relatively slowly increasing) size (in terms of its own computational and communications capacity). But you want to scale up the total number of nodes. Typically such systems treat no nodes as special, or maybe introduce a class of 'supernodes' which is some small proportion of the total (this are like top-n-%ers); imperfectly apeing the small-world-network construction recipe of starting with a lattice and then adding a few long-range connections. But if we transpose human power-law dynamics, then we can let ALL nodes (or at least 'most' nodes) remember a small, fixed-size list of 'famous' nodes, the top-n-ers (although, if we transpose human dynamics, then different nodes know slightly different lists of famous nodes, even though there is in theory a single, objective ranking of fame). These 'famous' nodes, however, have the same communications capacity of all the other nodes, and so can't handle any more connections than usual; however nearby nodes serve as 'organizations' that composed of agents that transact business on behalf of the famous nodes. These 'organizations' need not be formal groups with binary membership; in some designs, the 'famous nodes' might not have any actual 'business' which is transacted by proxies, but rather might just serve as sort of routing reference points in terms of which directionality is assessed (ie the structure of a network is high-dimensional, but we might reduce it to a lower dimensionality in which the cardinal directions are proximity to most famous node, proximity to next-most-famous node, etc; one problem with this design is that if the 'famous nodes' are all next to each other then such a dimensionality reduction doesn't help us much in navigating the network very far from that central cluster, in the same way that in very poor rural area where no one is socially proximate to any of the richest people in the world, the ratio of social proximity to various rich people will not change very much between inhabitants of this region).

---

http://smallcultfollowing.com/babysteps/blog/2015/12/18/rayon-data-parallelism-in-rust/

---

so if observables (event streams) are like lists, and subscribing to an observable means to schedule a callback to be called when an event is emitted (note: observables can have first-class subscriptions, see eg https://zeit.co/blog/async-and-await#observables ), then is 'subscribe' like 'map' on something lazy and async, except with a reified, cancellable 'subscription'?

can Oot just have lists ~= iterators = Observables?

and can we also get rid of the annoying distinction between lists and iterators?

(making iterators = lazy dicts = stacks is not any harder; just have the value associated with key 0 = top-of-stack = next item to be iterated)

---

a JS proposal to add Observables:

https://github.com/zenparsing/es-observable

---

[4] contrasts callbacks vs promises vs async/await vs observables for async I/O in Javascript. My summary is in proj-plbook-plPartConcurrencyTodos.

The upshot for Oot is that Observables are generalizations of Promises which are OOP wrappers for one pattern of using callbacks, and that async/await is useful syntactic sugar for Promises and Observables. So therefore it seems like Oot should offer at least:

callbacks
first-class event streams (Observables)
- and (async) iterators should be these, or at least based on them
async/await-like syntax for Observables

as noted above, it would be nice if we could unify lists (and therefore dicts, b/c a list is just a graph) and async iterators and observables.

i'll have to think of an 'async/await-like syntax for Observables' which fits in with our ML-ish syntax. I'm guessing just using Capitalized Annotations, eg Async Await?

---