Bayle Shanks's website: proj-oot-ootConcurrencyNotes4

" Downey; The Little Book of Semaphores

Takes a topic that's normally one or two sections in an operating systems textbook and turns it into its own 300 page book. The book is a series of exercises, a bit like The Little Schemer, but with more exposition. It starts by explaining what semaphore is, and then has a series of exercises that builds up higher level concurrency primitives.

This book was very helpful when I first started to write threading/concurrency code. I subscribe to the Butler Lampson school of concurrency, which is to say that I prefer to have all the concurrency-related code stuffed into a black box that someone else writes. But sometimes you're stuck writing the black box, and if so, this book has a nice introduction to the style of thinking required to write maybe possibly not totally wrong concurrent code.

I wish someone would write a book in this style, but both lower level and higher level. I'd love to see exercises like this, but starting with instruction-level primitives for a couple different architectures with different memory models (say, x86 and Alpha) instead of semaphores. If I'm writing grungy low-level threading code today, I'm overwhelmingly like to be using c++11 threading primitives, so I'd like something that uses those instead of semaphores, which I might have used if I was writing threading code against the Win32 API. But since that book doesn't exist, this seems like the next best thing.

I've heard that Doug Lea's Concurrent Programming in Java is also quite good, but I've only taken a quick look at it. " -- http://danluu.com/programming-books

---

http://lucumr.pocoo.org/2016/10/30/i-dont-understand-asyncio/ says that Python asyncio is too complicated but then also says:

"What landed in 3.5 (the actual new coroutine objects) is great. In particular with the changes that will come up there is a sensible base that I wish would have been in earlier versions. The entire mess with overloading generators to be coroutines was a mistake in my mind."

https://news.ycombinator.com/item?id=12831989 concurs that Python 3.5's concurrency is better than 3.4's

---

some comments in https://news.ycombinator.com/item?id=12829759 say that greenlets/greenthreads is better than asyncio. But https://news.ycombinator.com/item?id=12831989 and children point out that (a) greenthreads eg preemtive multitasking give you no guarantees about when control will be switched out, so you have to use locks all over the place, in comparison to cooperative multitasking, where, because you know when control might switch, you only have to worry about protecting shared data at those points, and (b) it's easier to do promises and cooperative multitasking across interoperability boundaries to libraries in other languages, in comparison to greenthreads.

---

note: Python 3's asyncio event loops even have TCP facilities:

https://docs.python.org/3/library/asyncio-eventloop.html#creating-connections https://docs.python.org/3/library/asyncio-protocol.html#asyncio.Protocol

---

https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/ points out issues with the following sort of system:

(asyncio-type) when you want to do some blocking action (such as streams-based network I/O in asyncio.start_server), instead of doing it, you schedule it to be done in the future, and then without yielding control you go right on in the same thread that scheduled the action and schedule some more blocking actions

Namely, the problems are that it's more difficult to reason about your code, because if the code says "f(); g();", but f() schedules things for the future, then the code reads as if f() has been completed before g() but actually the final effects of f() are not yet present at the time when g() begins executing.

Instead, e recommends systems like this:

(curio-type) when you want to do some blocking action (such as network I/O using curio.tcp_server), you schedule it to be done in the future, then don't return yet; yield, so that other unrelated threads can progress, but only return after the blocking action is done (he calls this "respecting casuality")

Futhermore, curio has the convention that stuff that doesn't do all of its work synchronously is always labeled with 'async', so that you know that calls into non-async curio library functions don't put off doing their work until later. By contrast, asyncio's stream-based network I/O functions, eg StreamWriter?.write, aren't labeled 'async' yet don't complete all of their work immediately.

Furthermore, "In asyncio, there are many different representations of logically concurrent threads of execution – loop.add_reader callbacks, asyncio.Task callstacks, Future callbacks, etc. In curio, there is only one kind of object that can represent a logical thread – curio.Task – and this allows us to handle them in a uniform way".

Futhermore, in curio you can configure a timeout using a context manager, and it will be propagated downwards.

Futhermore, e notes that whenever you have futures involved, you may want to cancel a task, but then what happens if that task is blocked awaiting some future -- do you cancel that future too? But if multiple tasks are using the same future and you cancel it, then by cancelling one task you might erroneously affect an unrelated task. E says that "...we can't avoid this by checking for multiple waiters when propagating cancellations from tasks->futures" but i didn't understand why not.

Furthermore, e notes that when callbacks are involved, frameworks can't (easily) statically projected the future path of control across callbacks, so it's harder for a framework to assist with things like 'finally' cleanup actions.

Furthermore, implicit dynamic thread-local context is harder to do across callbacks, because even if a framework/language provides a way to do this ordinarily, that way might not work across callbacks.

( " In a curio-style framework, the problem is almost trivial, because all code runs in the context of a Task, so we can store our task-local data there and immediately cover all uses cases. And if we want to propagate context to sub-tasks, then as described above, sub-task spawning goes through a single bottleneck inside the curio library, so this is also easy. I actually started writing a simple example here of how to implement this on curio to show how easy it was... but then I decided that probably it made more sense as a pull request, so now I don't have to argue that curio could easily support task-local storage, because it actually does! It took ~15 lines of code for the core functionality, and the rest is tests, comments, and glue to present a convenient threading.Local-style API on top; there's a concrete example to give a sense of what it looks like in action.

I also recommend this interesting review of async context propagation mechanisms written by two developers at Google. A somewhat irreverant but (I think) fair summary would be (a) Dart baked a solution into the language, so that works great, (b) in Go, Google just forces everyone to pass around explicit context objects everywhere as part of their style guide, and they have enough leverage that everyone mostly goes along with it, (c) in C# they have the same system I implemented in curio (as I learned after implementing it!) and it works great because no-one uses callbacks, but (d) context propagation in Javascript is an ongoing disaster because Javascript uses callbacks, and no-one can get all the third-party libraries to agree on a single context-passing solution... partly because even the core packages like node.js can't decide on one. " -- [1] )

" What makes a Python app "async/await-native"? Here's a first attempt at codifying them:

    An async/await-native application consists of a set of cooperative threads (a.k.a. Tasks), each of which consists of some metadata plus an async callstack. Furthermore, this set is complete: all code must run on one of these threads.
    These threads are supervised: it's guaranteed that every callstack will run to completion – either organically, or after the injection of a cancellation exception.
    Thread spawning is always explicit, not implicit.
    Each frame in our callstacks is a regular sync- or async-colored Python function, executing regular imperative code from top to bottom. This requires that both API primitives and higher-level functions *respect causality* whenever possible.
    Errors, including cancellation and timeouts, are signaled via exceptions, which propagate through Python's regular callstack unwinding.
    Resource cleanup and error-handling is managed via exception handlers (with or try)." -- [2]

Open questions:

" a common pattern I've run into is where I want to spawn several worker tasks that act like "part of" the parent task: if any of them raises an exception then all of them should be cancelled + the parent raise an exception; if the parent is cancelled then they should be cancelled too. We need ergonomic tools for handling these kinds of patterns robustly.

Fortunately, this is something that's easy to experiment with, and there's lots of inspiration we can draw from existing systems: Erlang certainly has some good ideas here. Or, curio makes much of the analogy between its event loop and an OS kernel; maybe there should be a way to let certain tasks sign up to act as PID 1 and catch failures in orphan tasks? " -- [3]

" Cleanup in generators and async generators ... the __del__ method. If we have a generator with some sort of cleanup code...and we iterate it, but stop before reaching the end...then eventually that finally block will be executed by the generator's __del__ method (see PEP 342 for details).

And if we think about how __del__ works, we realize: it's another sneaky, non-causal implicit-threading API! __del__ does not get executed in the context of the callstack that's using the generator – it happens at some arbitrary time and place...in a special context where exceptions are discarded ... Note: What about __del__ methods on other objects, besides generators? In theory they have the same problems, but (a) for most objects, like ints or whatever, we don't care when the object is collected, and (b) objects that do have non-trivial cleanup associated with them are mostly obvious "resources" like files or sockets or thread-pools, so it's easy to remember to stick them in a with block. Plus, when we write a class with a __del_ method we're usually very aware of what we're doing. Generators are special because they're just as easy to write as regular functions, and in some programming styles just as common. It's very very easy to throw a with or try inside some generator code and suddenly you've defined a __del__ method without even realizing it, and it feels like a function call, not the creation of a new resource type that needs managing. ... This one worries me, because it's basically the one remaining hole in the lovely interlocking set of rules described above – and here it's the Python language itself that's fighting us. For now, the only solution seems to be to make sure that you never, ever call a generator without explicitly pinning its lifetime with a with block...PEP 533 is one possible proposal for fixing this at the language level, by adding an explicit __iterclose__ method to the iterator protocol and adapting Python's iteration constructs like for accordingly."

For now, the only solution seems to be to make sure that you never, ever call a generator without explicitly pinning its lifetime with a with block. For synchronous generators, this looks like:

def some_sync_generator(path): with open(path) as ...: yield ...

DON'T do this for obj in some_sync_generator(path): ...
DO do this from contextlib import closing with closing(some_sync_generator(path)) as tmp: for obj in tmp: ...

And for async generators, this looks like:

async def some_async_generator(hostname, port): async with open_connection(hostname, port) as ...: yield ...

DON'T do this async for obj in some_async_generator(hostname, port): ...
DO do this class aclosing: def __init__(self, agen): self._agen = agen

    def __aenter__(self):
        return self._agen

    def __aclose__(self, *args):
        await self._agen.aclose()

async with aclosing(some_async_generator(hostname, port)) as tmp: async for obj in tmp: ...

It might be possible for curio to subvert the PEP 525 __del__ hooks to at least catch cases where async generators are accidentally used without with blocks and signal some kind of error.

PEP 533 is one possible proposal for fixing this at the language level, by adding an explicit __iterclose__ method to the iterator protocol and adapting Python's iteration constructs like for accordingly.

---

timeouts with dynamic context managers in Python curio:

Imposing a timeout on it from outside async def main(): sock = ... async with curio.timeout_after(60): # 1 minute await upload_big_file_over_http(sock) "

more ideas for timeout features:

https://github.com/dabeaz/curio/issues/82#issuecomment-257078638

---

" The fundamental problem here is that Futures often have a unique consumer but might have arbitrarily many, and that Futures are stuck half-way between being an abstraction representing communication and being an abstraction representing computation. The end result is that when a task is blocked on a Future, Task.cancel simply has no way to know whether that future should be considered to be "part of" the task. So it has to guess, and inevitably its guess will sometimes be wrong. (An interesting case where this could arise in real code would be two asyncio.Tasks that both call await writer.drain() on the same StreamWriter?; under the covers, they end up blocked on the same Future.) In curio, there are no Futures or callback chains, so this ambiguity never arises in the first place. "

---

" asyncio's global event loop fetching API is going to be reworked in 3.6 and backported to 3.5.3. If I understand correctly (which is not 100% certain, and I don't think the actual code has been written yet [edit2: here it is]), the new system will be: asyncio.get_event_loop(), instead of directly calling the currently-registered AbstractEventLoopPolicy?'s get_event_loop() method, will first check some thread-local global to see if a Task is currently executing, and if so it will immediately return the event loop associated with that Task (and otherwise it will continue to fall back on the AbstractEventLoopPolicy?. This means that inside async functions it should now be guaranteed (via somewhat indirect means) that asyncio.get_event_loop() gives you the same event loop that you'd get by doing an await. And, more importantly, since asyncio.get_event_loop() is what the callback-level APIs use to pick a default event loop when one isn't specified, this also means that async/await code should be able to safely use callback-layer functions without explicitly specifying an event loop, which is a neat improvement over my suggestion above. " -- [4]

---

steveklabnik 8 days ago [-]

Have you seen the tokio stuff in Rust? It's also an interesting take on the "spin up a zillion threads" problem that's not exactly green threads.

---

"The best of all worlds is to write threaded code in languages that structurally eliminate the majority of issues threads have. The downside is that the "languages that structurally tame threads" is a pretty short list, even now; Rust, Haskell, Erlang/Elixir,"

---

quotemstr 9 days ago [-]

The idea espoused in this blog post, that

> if you have N logical threads concurrently executing a routine with Y yield points, then there are NY possible execution orders that you have to hold in your head

is actively harmful to software maintainability. Concurrency problems don't disappear when you make your yield points explicit.

Look: in traditional multi-threaded programs, we protect shared data using locks. If you avoid explicit locks and instead rely on complete knowledge of all yield points (i.e., all possible execution orders) to ensure that data races do not happen, then you've just created a ticking time-bomb: as soon as you add a new yield point, you invalidate your safety assumptions.

Traditional lock-based preemptive multi-threaded code isn't susceptible to this problem: it already embeds maximally pessimistic assumptions about execution order, so adding a new preemption point cannot hurt anything.

Of course, you can use mutexes with explicit yield points too, but nobody does: the perception is that cooperative multitasking (or promises or whatever) frees you from having to worry about all that hard, nasty multi-threaded stuff you hated in your CS classes. But you haven't really escaped. Those dining philosophers are still there, and now they're angry.

The article claims that yield-based programming is easier because the fewer the total number of yield points, the less mental state a programmer needs to maintain. I don't think this argument is correct: in lock-based programming, we need to keep _zero_ preemption points in mind, because we assume every instruction is a yield point. Instead of thinking about NY program interleavings, we think about how many locks we hold. I bet we have fewer locks than you have yields.

To put it another way, the composition properties of locks are much saner than the composition properties of safety-through-controlling-yield.

I believe that we got multithreaded programming basically right a long time ago, and that improvement now rests on approaches like reducing mutable shared state, automated thread-safety analysis, and software transactional memory. Encouraging developers to sprinkle "async" and "await" everywhere is a step backward in performance, readability, and robustness.

...

quotemstr 9 days ago [-]

Those options aren't as distinct as you might imagine. Would calling it fiber-per-request make you happy?

(By the way: most of the time, a plain-old-boring thread-per-request is just fine, because most of the time, you're not writing high-scale software. If you have at most two dozen concurrent tasks, you're wasting your time worrying about the overhead of plain old pthread_t.)

I'm using a much more expansive definition of "thread" than you are. Sure, in the right situation, maybe M:N threading, or full green threads, or whatever is the right implementation strategy. There's no reason that green threading has to involve the use of explicit "async" and "await" keywords, and it's these keywords that I consider silly.

vomjom 9 days ago [-]

(I agree that thread-per-request works just fine in the majority of cases, but it's still worthwhile to write about the cases where it doesn't work.)

Responding to your original post: you argue that async/await intends to solve the problem of data races. That's not why people use it, nor does it tackle that problem at all (you still need locks around shared data).

It only tries to solve the issue of highly-concurrent servers, where requests are bound by some resource that a request-handling threads have to wait for the result of (typically I/O).

Coroutines/fibers are not an alternative to async servers, because they need primitives that are either baked into the language or the OS itself to work well.

(bayle: i don't understand the next 3 posts very well:)

gpderetta 8 days ago [-]

Coroutines/fibers are completely orthogonal to async anything. The OP is arguing against poor-man coroutines, aka stackless coroutines aka top-level yield only, which are significantly less expressive and composable than proper stackfull coroutines (i.e. first class one shot continuations).

An alleged benefit of stackless coroutines is that yield point are explicit, so you know when your state can change. The OP is arguing that this is not really a benefit because it yield to fragile code. I happen to strongly agree.

barrkel 8 days ago [-]

Green threads / coroutines / fibers are isomorphic with async keyword transparently implemented as a continuation passing style transform, which is how async callbacks usually work. Actual CPU-style stacks in a green thread scenario are nested closure activation records in an explicit continuation passing style scenario, and are implicit closure activation records (but look like stacks) when using an 'async' compiler-implemented CPS.

Properly composed awaits (where each function entered is entered via an await) build a linked list of activation records in the continuations as they drill down. This linked list is the same as the stack (i.e. serves the same purpose and contains the same data in slightly different layout) in a green threads scenario.

What makes all these things different is how much they expose the underlying mechanics, and the metaphors they use in that exposition. But they're not orthogonal.

(If you meant 'async' as in async IO explicitly, rather than the async / await keyword with CPS transform as implemented in C#, Python, Javascript, etc., then apologies.)

gpderetta 8 days ago [-]

I do mean async as in generic async IO.

As you said, you can of course recover stackful behaviour by using yield/await/async/wathever at every level of the call stack, but in addition to being a performance pitfall (you are in practice heap allocating each frame separately and yield is now O(N): your iterpreter/compiler/jit will need to work hard to remove the abstraction overhead), it leads to the green/red function problem.

cderwin 9 days ago [-]

Please correct me if I'm wrong, but doesn't asyncio in the form of async/await (or any other ways to explicitly denote context switches) solve the problem of data races in that per-thread data structures can be operated on atomically by different coroutines? My understanding is that unless data structures are shared with another thread, you don't usually need locks for shared data.

omribahumi 9 days ago [-]

I think that the biggest argument against it is code changes. Think about a code change that adds an additional yield point without proper locking.

Has any language tackled this with lazy locking? i.e. lock only on yield. Maybe this could even be done in compile time

lmm 9 days ago [-]

> Look: in traditional multi-threaded programs, we protect shared data using locks. If you avoid explicit locks and instead rely on complete knowledge of all yield points (i.e., all possible execution orders) to ensure that data races do not happen, then you've just created a ticking time-bomb: as soon as you add a new yield point, you invalidate your safety assumptions. > Traditional lock-based preemptive multi-threaded code isn't susceptible to this problem: it already embeds maximally pessimistic assumptions about execution order, so adding a new preemption point cannot hurt anything.

You get an equal and opposite problem: whenever you add one more lock, you invalidate your liveness assumptions.

hueving 8 days ago [-]

> as soon as you add a new yield point, you invalidate your safety assumptions.

While true, locks aren't free from this problem. They have the inverse. If someone adds code that accesses a data structure that should be protected by a lock and they forget to add the lock, you also lose all of your safety assumptions.

---

Animats 8 days ago [-]

Part of the problem is that object-oriented programming is now out of fashion. If objects only allow one active thread inside the object at a time, you have a conceptual model of how to deal with concurrency. Rust takes this route, and Java has "synchronized". It's done formally, with object invariants, in Spec#. Objects in C++ are often used this way in multi-thread programs.

If you don't have some organized way of managing concurrency, you're going to have problems. Without OOP, what? "Critical sections" lock relative to the code, not the data. "Which lock covers what data?" is a big issue, and the cause of many race conditions.

(The dislike of OOP seems to stem from the problems of getting objects into and out of databases in web services. One anti-OOP article suggests stored procedures as an alternative. Many database-oriented programs effectively use the database as their concurrency management tool. Nothing wrong with that, but it doesn't help if your problem isn't database driven.)

Python has the threading model of C - no language constructs for threads. It's all done in libraries. There's no protection against race conditions in user code. The underlying memory model is protected, by making operations that could break the memory model atomic, but that's all. CPython also has some major thread performance problems due to the Global Interpreter Lock. Having more CPUs doesn't speed things up; it makes programs slower, due to lock contention inefficiencies. So the use of real threads is discouraged in Python.

There's a suggested workaround with the "multiprocessing" module. This creates ultra-heavyweight threads, with a process for each thread, and talks to them with inefficient message passing. It's used mostly to run other programs from Python programs, and doesn't scale well.

So Python needed something to be competitive. There are armies of Javascript programmers with no experience in locking, but familiarity with a callback model. This seems to be the source of the push to put it in Python. Like many language retrofits, it's painful.

Does this imply that the major libraries will all have to be overhauled to make them async-compatible?

zzzeek 7 days ago [-]

> Does this imply that the major libraries will all have to be overhauled to make them async-compatible?

well, "have to" implies that the community accepts this system as the One True Way to program. Which is why I like to point out that this is unwarranted (but yes, because the explicit async model is what I like to call "viral", in that anything that calls async IO must itself be async, so must the caller of that method be async, and turtles all the way out, it means an enormous amount of code has to be thrown out and written in the explicit async style which also adds significant function call / generator overhead to everything - it's basically a disaster).

It's very interesting that you refer to database driven programming as the reason OOP is out of fashion, since IMO one of the biggest misconceptions about async programming is that it is at all appropriate for communciation with a locally available relational database. I wrote in depth on this topic here: http://techspot.zzzeek.org/2015/02/15/asynchronous-python-an... with the goal of the post being, this is the one time I'm going to have to talk about async :)

---

so someday i gotta decide:

do we want async/await by default?

and, do we want syntax for async/await so that we aren't always typing 'async/await'?

---

tschellenbach 9 days ago [-]

Are there any languages that have really nailed this? I've used gevent, eventlet, (both python), promises, callbacks (node) and none of them come close to being as productive as synchronous code.

I'd like to try out Akka and Elixer in the future.

ezyang 9 days ago [-]

I like to tell people that the killer app for Haskell is writing IO bound, asynchronous code. The secret weapon is do-notation, which lets you write code as if it were sequential, but have it desugar into what is (essentially) a series of chained callbacks.

I like to point at Facebook's use of Haskell as a good example of being successful in this space http://community.haskell.org/~simonmar/papers/haxl-icfp14.pd... It would be disingenuous to suggest that Haskell is good in all situations, but if there was one place where it should be used, this it.

...

ezyang 9 days ago [-]

...

The second is that do-notation is distinct from the IO monad. Even if Haskell didn't have green threads in the runtime, I could still write an async/callback library that looked just as natural as sequential code. Why? It has nothing to do with the IO monad: it has to do with the fact that "do x <- e; m" desugars to, in JavaScript? notation, bind(e, function(x) { m }); it's been "callbackified automatically".

...

marcosdumay 8 days ago [-]

Haskell does not even allow one to write sequential code.

The IO monad enforces sequence on IO operations, and when you fork it, you get a new, independent sequence of IO operations to play with, not a new thread.

Haskell is really great for concurrent programming. Not only because of green threads (the mainstream concept that is nearest to the IO monad), but because of the "everything is immutable" rule, and very powerful primitives available.

...

runeks 8 days ago [-]

I would argue that the Haskell language itself, through lazy evaluation, basically has built-in async/await support. Due to lazy evaluation, everything is a async/await - every time an expression is evaluated. In Python, you pass values around. In Haskell, you pass around descriptions of how to fetch a particular value, and then the runtime system makes sure it happens when/if it needs to.

It's a bit like Excel. Every cell is a variable that contains an expression, which defines what this cell evaluates to. With that description in hand, it's a simple matter of not evaluating cells that are not in view, and marking an exception in the evaluation with #######. If it were Python, each cell could contain code that modifies other cells, and it would be impossible to make sense of anything.

retrogradeorbit 9 days ago [-]

Erlang (and by extension Elixir and LFE) has "nailed" it by making the actor pattern first class. Go's channels are great, but Go itself is quite low level. Also you should checkout Clojure's core.async to see what improved channel constructs on top of a high level, lock-free, multithreaded language core looks like.

Part of the problem with Python ecosystem is the insular mind set of its proponents. Python fanboys have no interest in going and seeing whats on the other side. So the platform has become a bit of an echo chamber with Pythonistas declaring their clunky approaches the industry best.

You can see this by looking at how little love a CSP solution for python gets [5] verses the enormous buy-in it's more popular frameworks receive.

venantius 8 days ago [-]

core.async is using locks under the hood - it's just hiding that from you as an implementation detail.

retrogradeorbit 8 days ago [-]

How is it possible then that core.async works on javascript platform, a platform that has no mutexes?

Maybe there is a lock to implement the thread macro (clojure only), but then that uses native threads. How would you propose to handle access to channels between native threads without locks?

As far as I know there is no locking performed in asynchronous code implemented using the go macro. The go macro is a macro that turns your code inside out into a state machine, is it not? Each <! and >! point becomes and entry/exit into that state machine. There are no locks here because the go macro can essentially "rewrite" your source code for you and there is only a single thread of execution through the interconnected state machines.

Matthias247 9 days ago [-]

I've written quite a lot of concurrent code through the last years (network servers, protocol, ...) and overall I now like Go most.

The biggest reason for this is not that necessarily that I think it has absolutely the best concurrency model, but that it's the most consistent one. Nearly all libraries are written for the model, which means they assume multithreaded access, blocking IO (reads/writes) and no callbacks. As a result most libraries are interoperable without problems.

Erlang/Elixir should have similar properties - however I haven't used it.

Javascript has a similar property because at least everything assumes the singlethreaded environment and concurrency through callbacks (or abstraction of them like promises and async/await on promises). I also like the interoperability and predictability here. But sometimes nested callbacks (even with promises) lead to quite a big of ugly code. And calling "async methods" is not possible from "sync methods" without converting them to async first (which could mean some big refactoring). So I prefer the Go style in general.

The worst thing from my point of view are all the languages that do not have a standard concurrency model, e.g. C++, Java, C#, and according to this article also Python. Most of them have several libraries for (async) IO which can be beautiful by themselves but won't integrate into remaining parts of the application without lots of glue code. E.g. boost asio is nice, but you need a thread with an EventLoop?. If your main thread is already built around QT/gtk you now need another thread and then have 2 eventloops which need to interact. Some question for Java frameworks, e.g. integrating a Netty EventLoop? in another environment (Android, ...). In these languages we then often get libraries which are not generic for the whole language but specific to a parent IO library (works with asio, works with asyncio, ...) and thereby some fragmented ecosystems.

A standard question that also always arises in these "mixed-threaded" languages when you have an API which takes a callback is: From which thread will this callback be invoked? And if I cancel the operation from a thread, will it guarantee that the callback is not invoked. If you don't think about these you are often already in bug/race-condition land.

quotemstr 9 days ago [-]

C++? Java? Python? The traditional thread model isn't bad merely because it's traditional. I much prefer it to promise hell and to async-everything. About the only thing that beats it is CSP, which you can also represent sequentially without funky new keywords and which you can implement as a library for C++, Java, or Python.

I never understood why people tout Go's goroutine feature so much. You can have it in literally any systems language.

reality_czech 9 days ago [-]

The whole point of Golang is that every library and every project that uses Go will support coroutines and channels. Sure you can write a toy project in a language like C that has these concepts, but your toy library will effectively be usable with all of the other libraries that have ever been written for C. Any library that calls a blocking function will break your coroutine abstraction.

It's like saying that indoor plumbing is no big deal-- it's just liquid moving through a pipe. Well yes. Yes, it is. But if you don't have plumbing in your neighborhood, or a sewage treatment plant in your city, you can't fake it by fooling around in your garage. And frankly, it's not going to smell like a rose.

btrask 9 days ago [-]

I wrote such a library in C[1] and in practice it's been no problem. Most libraries that do IO provide hooks (for example I made SQLite fully async[2], with no changes to their code). For cases where that isn't possible (or desirable), there's also an easy way to temporarily move the entire fiber to a thread pool.[3] That's actually much faster than moving back and forth for every call (which is what AIO emulation normally entails).

[1] https://github.com/btrask/libasync [2] https://github.com/btrask/stronglink/blob/master/res/async_s... [3] https://github.com/btrask/libasync/blob/master/src/async.h#L...

Disclaimer: not production ready, for most values of "production"

Edit: stacks don't grow dynamically, of course. But that's also a problem in Go if you want to efficiently call C libraries. If you really need efficiency, you can use raw callbacks for that particular section.

int_19h 9 days ago [-]

> The whole point of Golang is that every library and every project that uses Go will support coroutines and channels.

Of course, this also means that Go is making it hard for its libraries to be used by other languages. So it's probably a bad candidate to write something like a cross-plat UI toolkit, if you hope for its wide use.

In contrast, threads and callbacks are both well-supported in existing languages; so if you write a library in C using either, pretty much any language will be able to consume it.

LukeShu? 9 days ago [-]

(I'm not terribly familiar with Python's threading, so I'm not going to talk about it)

I never understood why people tout Go's goroutine feature so much. You can have it in literally any systems language.

There are two big reasons for it.

Firstly, goroutines are extremely lightweight. "Traditional" threading in C, C++, and Java means native OS threads, which are comparatively expensive. Sure, fiber/coroutine libraries exist for these languages, but they are far from common (and, the only fiber library for Java that I know of, Quasar, came after Go).

Secondly, Go's ecosystem encourages CSP-style message-passing, rather than "traditional" memory-sharing. This is channels, not goroutines, but they make working with goroutines very nice. This is less concrete than the first reason; you certainly can implement message-passing in any of the other languages' threading styles. But empirically, it doesn't happen as often. A factor in this is also that, unfortunately, many CS curricula don't discuss CSP, which means that Go's use of this is the first exposure many programmers have to it.

quotemstr 9 days ago [-]

> But empirically, it doesn't happen as often.

It's sad that people use choice-of-language as a proxy for choice-of-execution-strategy (interpreted? JITed?), choice-of-allocation-strategy, choice-of-linking-strategy, choice-of-packaging, and so on. All of these factors should be orthogonal. By linking them, we create a lot of inefficiency by fragmenting our efforts.

AFAICT, C++ is the only language that's really been successful at being multi-paradigm.

jerf 8 days ago [-]

C++ still drives you very strongly in certain directions for the things you mentioned.

Languages have to hand you very strong default choices for those things, because only the people with the hardest problems and the most time to solve them can afford to pick up a toolbox-box and build their own toolbox to solve a problem. Even the languages that arguably want to be that low of a level like Rust or D still have to offer a much more batteries-included standard library that will make more of those choices for you, and which will be for the vast majority of users the "real" version of that language.

lmm 9 days ago [-]

I use Scala without Akka. Just straightforward Futures and for/yield. It's great: the distinction between "=" and "<-" is minimal overhead when writing, but enough to be visible when reading code. You have to learn the tools for gathering effects (e.g. "traverse"), but you only have to learn them once (and they're generic for any effect, rather than being specific to Future, you can use the exact same functions to do error handling, audit logs and the like).

mi100hael 8 days ago [-]

After using Akka-HTTP, I never want to write a HTTP service with anything else.

lmm 8 days ago [-]

akka-http is nice. akka-actor (i.e. the project that was originally called "akka") is awful. The name overlap is unfortunate.

jscholes 7 days ago [-]

In your opinion, what's wrong with akka-actor?

lmm 6 days ago [-]

It sacrifices type safety without offering enough value to be worth that - especially given that the model also eliminates useful stack traces. It forces non-distributed applications to pay the price of an API designed for distributed ones. Its FSM model doesn't offer the conciseness it should.

jackweirdy 9 days ago [-]

How do you define "Productive"?

Aside from that, personally I've used both Akka and plain Scala with Futures, as well as node with Promises, bare callbacks and async (though I've not tried fibers). I find Promises and Futures are the perfect balance between simplicity of use and the benefits of using the Async model. There's no need to reason about threads, as they abstract away the actual async implementation, and the interface they expose is very easy to reason about.

dhd415 8 days ago [-]

I'm surprised there aren't more mentions of Tasks in C# or F# on the .NET platform as examples of asynchronicity done well.

From the perspective of uniformity and availability, while C# provided asynchronicity via callbacks before the introduction of Tasks in the 4.5 release of the .NET Framework, all the core libraries that used callback-style async (as well as some that had been strictly synchronous-only) were updated with Task-based overloads, so there are no problems with Task-based async being inconsistently available. Additionally, adoption of Task-based async in third-party libraries has been high, so it's relatively uncommon to encounter code that does not support it.

From the perspective of code productivity, it's hard to get much better than simply adding the async and await keywords where necessary. As a very simple example, consider a typical server application that receives requests via HTTP, processes them via an HTTP call to another service as well as a database call, and then returns an HTTP response. The sync code (blocking with a thread-per-request model) might look something like this:

    void handleRequest(HttpRequest request) {
        var serviceResult = makeServiceCallForRequest(request);
        var databaseResult = makeDatabaseCallForRequest(request);
        sendResponse(constructResponse(request, serviceResult, databaseResult));
    }

In order to make that same process async (non-blocking with a dynamically-sized thread pool handling all requests), the code would look like this:

    async Task handleRequestAsync(HttpRequest request) {
        var serviceResult = await makeServiceCallForRequestAsync(request);
        var databaseResult = await makeDatabaseCallForRequestAsync(request);
        await sendResponseAsync(constructResponse(request, serviceResult, databaseResult));
    }

It could even be taken one step further to make the service request and database call concurrently if there were no dependencies between the two which would reduce processing latency for individual requests:

    async Task handleRequestAsync(HttpRequest request) {
        var serviceResultTask = makeServiceCallForRequestAsync(request);
        var databaseResultTask = makeDatabaseCallForRequestAsync(request);
        await sendResponseAsync(constructResponse(request, await serviceResultTask, await databaseResultTask));
    }

I've added asynchronicity into a C# server application as above with substantial improvements in both individual request latency and overall scalability. I'm now working on a Java8 system and bemoaning the comparatively primitive and inconsistent async capabilities in Java8.

HeyImAlex? 9 days ago [-]

Writing concurrent code in go takes a lot less thinking than js. Or... a different kind of thinking? But holistically I greatly prefer it for complex asynchronous code.

Lack of generics on channels really hurts the library ecosystem though. Many things you need to write yourself.

daurnimator 9 days ago [-]

Try lua with cqueues http://25thandclement.com/~william/projects/cqueues.html

qznc 8 days ago [-]

Concurrent ML according to Andy Wingo. He recently wrote a good series on concurrency in programming languages: https://wingolog.org/tags/concurrency

mentat 9 days ago [-]

goroutines with channels are well loved for concurrency in Go

junke 8 days ago [-]

For Common Lisp, see lparallel and lfarm.

https://lparallel.org/overview/

xamlhacker 9 days ago [-]

Try await/async in F#.

RX14 8 days ago [-]

Seriously check out crystal. Go's goroutines seem to do quite well, and crystal is pretty close to go in terms of concurrency, but is a higher-level language overall.

---

my summary on what the Midori/Joe Duffy stuff that i've read in the past few days (see duffyRelatedNotes) means for us:

not too different from what i'd previously imagined
message-passing, channels, actors
Duffy likes their Task<T> abstraction (i think it's this: http://wayback.archive.org/web/20160502075105/https://msdn.microsoft.com/en-us/library/dd321424(v=vs.110).aspx ), and the data parallel stuff with it: http://wayback.archive.org/web/20160614021041/https://msdn.microsoft.com/en-us/library/system.threading.tasks.parallel(v=vs.110).aspx
data parallelism (and/or concurrent collections); parallel for, foreach/map, filter, reduce (aggregations)
type system annotations
- immutable, noalias, readonly, mutable (i guess the default would be 'immutable' for us, although he suggests immutable for primitive/atomic data types and 'readonly' for composite data types), 'init' (currently being initialized)
a noalias "graph has a single “in-reference” to the root object in the object graph, and no “out-references” (except for immutable object references, which are permitted).". Although i guess i would permit 'noalias' to contain other 'noalias' references, too.
- Casting between these type attributes (except for 'init', which is weird): you can turn a noalias into anything (consuming the noalias, though; 'move semantics'), and you can turn anything into a readonly
- any expression consuming only "isolated and/or immutable inputs and evaluating to a readonly type was implicitly upgradeable to immutable"
- because of the previous bullet point, and because you can turn a noalias into anything, "making new isolated and immutable things was straightforward using ordinary expressions"
- should be some way to turn something back into a noalias at the end of a function or task, if it really is
- when the rest of the thread is blocked, things that are only aliased by other stuff in this thread should be treated as noalias
- query functions cannot mutate anything (so library-provided queries are readonly, and lambda functions passed into queries are readonly (i think?))
- a function must itself be noalias in order to be allowed to mess with noalias data (i think?)
- he calls these permissions, and thinks of them as somewhat like security permissions (and sometimes thinks about capability security permissions, although i don't think these are that, because the permissions aren't first-class objects that you can pass around by themselves afaict)
- 'immutable' should be 'deep' by default, not shallow
- in generics, type system annotations can be at the top level or the level of captured types, eg "mutable List<int>" vs "List<mutable int>". Just 'shallow' vs 'deep' isn't expressive enough.
- parameterization over these annotations
- side-effect annotations, eg "eliminated all ambient side-effectful operations, leveraging capability objects instead. This obviously covered I/O operations – all of which were asynchronous RPC in our system – but also even – somewhat radically – meant that even just getting the current time, or generating a random number, required a capability object. This let us model side-effects in a way the type-system could see, in addition to reaping the other benefits of capabilities."."
- annotate a method to indicate what it does to the 'self' object
- in order to call a method that mutates 'self', your reference to the object must be mutable (eg you can't call mutating methods on a readonly object)
- annotate a lambda function to indicate what it does to stuff in its closure; eg an 'immutable' annotation prevents it from mutating anything in its closure. He gives the example of an interface defining the type of lambda functions intended to be passed into data-parallel stuff like parallel 'map', where you don't want the lambda function to be mutating stuff in its closure (because you may be applying it to each element in the collection simultaneously; so it shouldn't rely on having state that persists over calls) (note: the 'map' example shows that we may want to use our 'state domain mask' stuff here, eg it's okay for our supposedly-immutable-closure lambda function to use caches, right?)
- 'init' is a special case: init "meant fields weren’t yet guaranteed to be assigned to, non-nullability hadn’t yet been established, and that the reference could not convert to the so-called “top” permission, readonly. Any constructor got this permission by default and you couldn’t override it. We also automatically used init in select areas where it made the language work more seamlessly, like in object initializers. This had one unfortunate consequence: by default, you couldn’t invoke other instance methods from inside a constructor. (To be honest, this was actually a plus in my opinion, since it meant you couldn’t suffer from partially constructed objects, couldn’t accidentally invoke virtuals from a constructor, and so on.) In most cases, this was trivially worked around. However, for those cases where you really needed to call an instance method from a constructor, we let you mark methods as init and they would take on that permission." and "`init` was indeed treated like `mutable` in the sense that you could write to it, however you could not convert one to `readonly`. There were other restrictions to prevent you from storing `this` into it, reading it out, and then aliasing *that*, for example."
- at some points the compiler will have to reason about stuff like 'okay, here are two lambda functions with readonly access to array 'arr', but it is aliased and other stuff elsewhere might write to it via the alias, BUT for the duration of this data-parallel loop those other things can't execute because that other stuff is in other tasks in the same thread and those are blocked right now, so therefore there won't be any race conditions if both of these lambda functions want to read from 'arr' during the loop
- might want to use separation logic for the compiler to reason about immutable subgraphs of data; see link to concurrent version of separation logic, and also link to Midori's 'view' (variant of separation logic?): Separation Logic... "is a formal technique for proving the disjointness of different parts of the heap" [6]. Concurrent form: http://www.cs.cmu.edu/~brookes/papers/seplogicrevisedfinal.pdf . Midori team's proof technique: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/views.pdf
promises, async/await
STM (transactions)
- unbounded (eg no limits on how many things they can access per transaction or anything like that)
- when an exception occurs in a transaction keep a copy of the stack trace, but rollback the transaction
- the compiler, the debugger, and the GC (to keep around before- and after- versions of variables during a transaction, in case they must be rolled back) must know about transactions
- "Optimistic reads mandate that there is a version number for each location somewhere, and pessimistic writes mandate that there is a lock for each location somewhere."
- what Duffy calls Harris-style implementation policy: "optimistic reads, in-place pessimistic writes, and automatic retry."
- "You’re in one atomic block and then enter into another one. What happens if that inner transaction commits or rolls back, before the fate of the outer transaction is known? Intuition guided us to the following answer: If the inner transaction rolls back, the outer transaction does not necessarily do so. However, no matter what the outer transaction does, the effects of the inner will not be seen. If the inner transaction commits, the effects remain isolated in the outer transaction. It “commits into” the outer transaction, we took to saying. Only if the outer transaction subsequently commits will the inner’s effects be visible; if it rolls back, they are not."
- "Some TM variants also provide for “condition variable”-like facilities for coordination among threads. I think Haskell was the first such TM to provide a ‘retry’ and ‘orElse’ capability. When a ‘retry’ is encountered, the current transaction is rolled back, and restarted once the condition being sought becomes true. How does the TM subsystem know when that might be? This is an implementation detail, but one obvious choice is to monitor the reads that occurred leading up to the ‘retry’ – those involved in the evaluation of the predicate – and once any of them changes, to reschedule that transaction to run. Of course, it will reevaluate the predicate and, if it has become false, the transaction will ‘retry’ again."
- "The ‘orElse’ feature is a bit less obvious, though still rather useful. It enables choice among multiple transactions, each of which may end up issuing a ‘retry’. I don’t think I’ve seen it in any TMs except for ours and Haskell’s. To illustrate, imagine we’ve got 3 blocking queues like the one above. Now imagine we’d like to take from the first of those three that becomes non-empty. ‘orElse’ makes this simple: BlockingQueue? bq1 = ..., bq2 = ..., bq3 = ...; atomic {object obj = orElse {bq1.TakeOne?(), bq2.TakeOne?(), bq3.TakeOne?()};}
- "I waved my hands a bit above perhaps without you even knowing it. When I talk about optimistic, pessimistic, and automatic retry, I am baking in a whole lot of policy. ...There was a similar problem when deciding to back off outer layers of nesting, and in fact this becomes more complicated when deadlocks are involved. Imagine:

atomic { atomic { x++; y++; atomic { atomic { y++; x++; } } } }

This deadlock-prone example is tricky because rolling back the inner-most transactions won’t be sufficient to break the deadlock that may occur. Instead the TM policy manager needs to detect that multiple levels of nesting are involved and must be blown away in order to unstick forward progress. " "Disillusionment Part I: the I/O Problem...what about a read or write from a single block or entire file on the filesystem? Or output to the console? Or an entry in the Event Log? What about a web service call across the LAN? Allocation of some native memory?...The answer seemed clear. At least in theory. The transaction literature, including Reuter and Gray’s classic, had a simple answer for such things: on-commit and on-rollback actions, to perform or compensate the logical action being performed, respectively. (Around this same time, the Haskell folks were creating just this capability in their STM, where ‘onCommit’ and ‘onRollback’ would take arbitrary lambdas to execute the said operation at the proper time.)

no-tear memory manipulation (eg the language ensures that updates to a data structure are atomic at the level of, eg, updating the length of a string and updating the string, or updating all bytes of an 8-byte float). Multi-word slice and interface representations like Go has should just work, without worrying about races due to tearing
- and atomically initialize objects
think about ACID
think about commuting updates to shared data structures
think about fork/join
at the low level, locks, barriers, threads, condition variables, events, and lock-free algorithms, thread-pool with work-stealing scheduler (?),
unified resource management within a process
at the low level, of course we need memory fences, but i don't think we'll expose this to oot programmers
sequential consistency, and "single-threaded programming models connected through message passing, optionally with provably race-free fine-grained parallelism inside of those single-threaded worlds.""
think about model checking. Mb look into MS Zing, MS's "P" language.
effect typing
- paper: Type and Effect Systems.
at first he tried to do this using custom extensions rather than messing with the core type system and the compiler, but that failed, because stuff needs to be "deeply integrated into the compiler and its type system" in order to be "usable" and because "I should note that I experimented with the attribute approach before Roslyn analyzers came on the scene, so I was left with post-build IL analysis. The diagnostics for this were pretty terrible: a different tool, error messages at too low a semantic level, ..., all the usual problems, most of which analyzers solve. There are places you simply can't place attributes today. It's annoying that this is the limiting factor because it's so "easy" to fix conceptually. However, it's more subtle than it appears, due to the absence of attribute "slots" in various places in the MSIL metadata. (Local variables and generic instantiations are the ones you are left wanting most.) There's a proposal up on GitHub? to solve this: https://github.com/dotnet/rosl.... If that was done, we could get much closer."
stuff can be either mutable or aliased, but not both (without locking etc)
he doesn't like the syntax G<%P> for generic permission attributes: "Unfortunately, I originally picked the % character to indicate that a generic type was a permission, which was quite odd; e.g., G<%P> versus G<T>. We renamed this to permission; e.g., G<permission P> versus G<T>."
generic permissions were used frequently and this made code too verbose, but syntactic sugar could be confusing: "Generic permissions were needed in way more places than we expected, like most property getters. We experimented with various “shortcuts” in an attempt to avoid developers needing to know about generic permissions. This hatched the readable+ annotation, which was a shortcut for “flow the this parameter’s permission.” This concept never really left the system, although (as we will see shortly), we fixed generics and eventually this concept became much easier to swallow, syntax-wise (especially with smart defaults like auto-properties)."
in the end, they tried to bring Midori into C#, but they couldn't make it work (given the need to preserve much backwards compatibility in C#)
async/await
whenever you block, you are paused, and other coroutines get to execute; you can never block and assume that no one else is running
variables that are accessible within transactions maybe should have a special type, and can ONLY be accessed in transactions (or at least, out-of-transaction accesses are syntactic sugar for a one-statement transaction)
think about TOCTOU bugs
zero-copy of immutable and noalias objects over message-passing
something can be noalias AND immutable/readonly; noalias is 'ownership' and readonly is 'permission' (and immutable is permission+knowledge about others' permissions)
globals must be immutable (?)
think about: "It’s interesting to note that we were noodling on this around the same time Node.js was under development. The core idea of an asynchronous, non-blocking, single process-wide event loop, is remarkably similar. Perhaps something tasty was in the water during 2007-2009. In truth, many of these traits are common to event-loop concurrency."
for message-passing in .NET, Duffy recommends checking out Orleans https://github.com/dotnet/orleans – built in part by some ex-Midori members – TPL Dataflow https://msdn.microsoft.com/en-us/library/hh228603(v=vs.110).aspx , and Akka.NET http://getakka.net/
strongly-typed message passing protocols; typed channels: "...each channel to have a user-defined protocol for communication. The protocol is defined as a state machine.[5] The data sent over a channel can be optionally restricted to conform to a certain pre-defined schema. The compiler and runtime will enforce the conformance with the schema.[6] Under the hood, a schema is translated into a serializable ... class that contains only properties and side effect-free methods.[7]"
- make sure every possible message in the protocol has a handler (or a CASE statement or whatever)
- perhaps mandate that all previous messages (perhaps only from the same source) are received before any message is sent? This helps with backpressure.. i'm skeptical though, it seems like there have gotta be some 'server'-like use cases that this wouldn't work for
consider: "Singularity also addresses robustness by seriously limiting what processes can do. Singularity processes are sealed. They cannot load libraries dynamically, modify their own code, or share memory with other processes. Some serious limitations: just-in-time compilation is impossible in sealed processes, for example."
bounds checks, null pointer checks
consider an 'exchange heap' which would be the only place where inter-thread shareable data could be stored?
consider limiting intra-thread shared data to non-recursive data structures, to make garbage collection by reference counting possible there. Also, everything in the exchange heap would be noalias.
- could this help us reduce the need for stop-the-world-across-all-threads garbage collection pauses?
for Consistency, a way to specify assertions that should run at the end of each transaction, and roll it back if they aren't met. Duffy suggests checking out "Eiffel- or Spec#-style contracts and assertions like those in .NET 4.0, run at the end of each transaction, and you’re well on your way to verified consistency also. The ‘check E’ work in Haskell was right along these lines.)"
to deal with Duffy's "Disillusionment Part II: Weak or Strong Atomicity?", where a variable that is being modified within a transaction is also modified outside of a transaction, any variable accessable (transactionally) within a transaction cannot be accessed outside of a transaction at all; any attempt to do so is considered syntactic sugar for an implicit transaction. Duffy says, "This had the unfortunate consequence of making reusable data structures less, well, reusable. Collections for example presumably need to be usable from within and outside transactions alike." I think we can fix this since we're designing the language from scratch and don't care about performance.
to deal with Duffy's "Disillusionment Part III: the Privatization Problem", where intra-transaction writes are visible outside the transaction before it commits, just have your own copy of variables and have all writes only affect this copy, and only when the transaction commits do you update shared memory. Duffy says, "This, however, led to still other problems, like the granular loss of atomicity problem. Depending on the granularity of your buffered writes – we chose object-level – you can end up with false sharing of memory locations between transactional and non-transactional code. Imagine you update two separate fields of an object from within and outside a transaction, respectively, concurrently. Is this legal? Perhaps not. The transaction may bundle state updates to the whole object, rather than just one field." Since we've already decided that transactional variables can only be accessed within a transaction, i think we're good here.

---

so, there are lots of different paradigms and constructs in concurrency. There seem to be a few ways to organize them:

different ways to organize thinking on concurrency:

process vs subprocess (two examples of subprocess: tasks; data parallelism)
by ACID letter
deterministic vs non-
by level of abstraction (eg close to the metal vs. higher-level language stuff)
- eg regarding locks: "And if you care about performance, you are also going to need to think about hardware esoterica such as CMPXCHG, spin waiting, cache contention, optimistic techniques with version counters and memory models, ABA, and so on." -- http://joeduffyblog.com/2010/01/03/a-brief-retrospective-on-transactional-memory/
data-parallel vs not

---

"wasn't it hans-boehm, who famously posited (?) that concurrency needs to be part of the language, rather than bolted on after the fact ? ... http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf " -- [7]

summary of http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf :

The article argues that implementing concurrency as a library, eg pthreads, is flawed because:

1) "whether or not a race exists depends on the semantics of the programming language"

eg consider:

if (x == 1) ++y;
if (y == 1) ++x;

is there a race?

Upon first glance, no, the only outcome of this program is 0,0

But if the language semantics specifies that "our compiler is allowed to transform sequential code not containing calls to pthread operations in any way that preserves sequential correctness", then this could be transformed to:

++y; if (x != 1) --y;
++x; if (y != 1) --x;

in which case, yes, there is a race.

2) storing multiple variables into "one" memory location. For example, a bitfield; you might implement writing to bits in the bitfield by rewriting a whole word, but this would mean that two threads that are writing to different bits in the bitfield (different variables) might still cause a race

3) storing variables in registers; the compiler might decide to store some variables in registers, which may lead to concurrency bugs if the compiler moves data to and from shared memory (out of and into the corersponding register) at the wrong time

So, the compiler needs to be aware of concurrency; it needs to know that, eg, it can't reorder memory accesses around calls to "pthread_mutex_lock"

---

Cancelable Promises: Why was this proposal withdrawn? (github.com)

https://news.ycombinator.com/item?id=13210849 https://news.ycombinator.com/item?id=13211852 https://news.ycombinator.com/item?id=13214487

---

"Very few languages have followed the “Message passing concurrency road”; some others that did so ((besides Erlang, the topic of the article)) were Oz and Occam."

---

from a paper on Erlang:

" Isolation has several consequences:

1. Processes have “share nothing” semantics. This is obvious since they are imagined to run on physically separated machines.

2. Message passing is the only way to pass data between processes. Again since nothing is shared this is the only means possible to exchange data.

3. Isolation implies that message passing is asynchronous. If process communication is synchronous then a software error in the receiver of a message could indefinitely block the sender of the message destroying the property of isolation.

4. Since nothing is shared, everything necessary to perform a distributed computation must be copied. Since nothing is shared, and the only way to communicate between processes is by message passing, then we will never know if our messages arrive (remember we said that message passing is inherently unreliable.) The only way to know if a message has been correctly sent is to send a confirmation message back

Programming a system of processes subject to the above rules may appear at first sight to be difficult — after all most concurrency extensions to sequential programming languages provide facilities for almost exactly the opposite, providing things like locks, and semaphores, and provision for shared data, and reliable message passing. Fortunately, the opposite turns out to be true — programming such a system turns out to be surprisingly easy, and the programs you write can be made scalable, and fault-tolerant, with very little effort. ... Languages which support this style of programming (parallel processes, no shared data, pure message passing) are what Andrews and Schneider refer to as a “Message oriented languages.” ...

1. Processes are the units of error encapsulation — errors occurring in a process will not affect other processes in the system. We call this property strong isolation.

2. Processes do what they are supposed to do or fail as soon as possible.

3. Failure, and the reason for failure, can be detected by remote processes.

4. Processes share no state, but communicate by message passing.

" -- [8]

---

"Channels and goroutines have made our job so much easier. It's not just the fact that channels and goroutines are cheaper in terms of resources, compared to threadpool-based Futures and Promises, resources being memory and CPU. They are also easier to reason about when coding." -- [9]

(but a comment on that post:

 haspok • 2 days ago

I don't really understand your Actors vs Channels example: there is no concurrency there, only a single pipeline. Actors/Processes (Erlang) shine if there are concurrent pipelines (a lot of them), which might need to be created/destroyed dynamically (supervisors FTW!), even better if you need to group messages in those according to some criteria, or prioritize them, etc. For this task they are certainly an overkill.

)

---

" Case study ((of Golang being simpler than Scala/channels and goroutines being smipler than Futures and Promises))

Recently, we've been struggling with an issue where we had to process some billing information.

The data came through a stream, and had to be persisted to a MariaDB? database. As persisting directly was impractical due to the high rate of data consumption, we had to buffer and aggregate, and persist on buffer full or after a timeout.

Kafka, MariaDB?, buf

First, we made the mistake of making the `persist` function synchronized. This guaranteed that buffer-full-based invocations would not run concurrently with timeout-based invocations. However, because the stream digest and the `persist` functions did run concurrently and manipulated the buffer, we had to further synchronize those functions to each other!

In the end, we resorted to the Actor system, as we had Akka in the module's dependencies anyway, and it did the job. We just had to ensure that adding to the buffer and clearing the buffer were messages processed by the same Actor, and would never run concurrently. This is just fine, but to get there we needed to; learn the Actor System, teach it to the newcomers, import those dependencies, have Akka properly configured in the code and in the configuration files, etc. Furthermore, the stream came from a Kafka Consumer, and in our wrapper we needed to provide a `digest` function for each consumed message that ran in a `Future`. Circumventing the issue of mixing Futures and Actors required extra head scratching time.

Enter channels.

buffer := []kafkaMsg{} bufferSize := 100 timeout := 100 * time.Millisecond

for { select { case kafkaMsg := <-channel: buffer = append(buffer, kafkaMsg) if len(buffer) >= bufferSize { persist() } case<-time.After(timeout): persist() } }

func persist() { insert(buffer) buffer = buffer[:0] }

Done; Kafka sends to a channel. Consuming the stream and persisting the buffer never run concurrently, and a timer is reset to timeout 100 milliseconds after no messages received. '

---

" The main difference between what’s happening here is that the go operation ((goroutines in Golang) returns nothing while the spawn operation ((in Elixir)) returns an id for the process.

Both systems utilize a similar communication style with these routines via messages queues. Go calls them channels, while Elixir has inboxes.

With Go, a channel can be defined so that anything can pass messages to if it has the channel name. With Elixir, messages are sent to a process either via the process id or a process name. Channels in Go are defined with types for the messages, while inboxes in Elixir utilize pattern matching. "

---

"goroutines are 2KB each, while Elixir processes are 0.5KB each. Elixir processes have their own isolated heap spaces which are individually reclaimed when the process finishes, while goroutines utilize shared memory and an application-wide garbage collector to reclaim resources."

---

" Erlang defines a set of patterns for best practices when utilizing concurrency and distribution logic all bundled under “OTP”. In most cases with Elixir code, you’ll never touch the raw spawn and send/receive functions, deferring to the abstractions for this functionality.

Wrappers include Task for simple async/await style calls; Agent for concurrent processes which maintain and update a shared state; GenServer? for more complex custom logic.

In order to constrain maximum concurrency to a particular queue, Go channels implement buffers which can either receive a a certain number of messages (blocking the sender if at the limit) or unlimited messages which could have a memory constraint.

Elixir inboxes default to unlimited messages but can utilize Task.async_stream to define max concurrency for an operation and blocking senders in the same way as limited buffers on a channel. "

---

" Just as you can manually call the scheduler with Go, you can also manually implement a version of supervisors in Go via the suture library.

Note: Within handlers that you’ll implement with Go for most servers, panics are already addressed. Therefore, a crash within a web request that wasn’t critical enough to kill the entire application won’t. You’ll still have to address it on your own goroutines though. Don’t let this explanation imply that Go is fragile, because it is not. "

---

kyrra 2 days ago [-]

As linked elsewhere here, tight loops that never preempt are being fixed in Go 1.8/1.9[0]. Looks like a flag may been added to Go 1.8 called "GOEXPERIMENT=preemptibleloops" that adds a preemptible point at the end of a loop. It's behind a flag for performance/testing reasons, but they are working on it.

[0] https://github.com/golang/go/issues/10958

---

derefr 2 days ago [-]

The way I find it simplest to "enlighten" people about Erlang's peculiar philosophy is its approach to scheduling:

The Erlang VM is "reduction"-scheduled. This means that the given Erlang process currently running on a scheduler thread can get pre-empted, but only as a result of executing a call/return instruction. (Effectively, the pre-emption is a check inside the implementation of the call/return opcode.) As long as you don't execute any call/returns (don't call functions and don't return from your own function), your function body can run as long as it likes.

This is a design choice: because processes won't be pre-empted "in the middle" of a function, any Erlang process can feel safe executing an instruction that calls into native code, while not having to worry that that native code could itself be pre-empted and leave dirty state in the Erlang process's heap while some other process gets scheduled and tries to then message or introspect that process. It gives you a lot of leeway "for free."

So how does Erlang ensure that processes don't hog a core forever, given that you could theoretically just write a loop that spins forever? Well, in Erlang, you can't write a loop. Instead of loops, you have tail-calls with explicit accumulators, ala Lisp. Not because they make Erlang a better language to write in. Not at all. Instead, because they allow for the operational/architectural decision of reduction-scheduling. Without loops in the language, every function body will execute for only a finite amount of time before hitting one of those call/return instructions, and thus activating the reduction-checker.

The Erlang "platform" has been shaped around the choices of how to best construct a production runtime that gives you "hard things" (like calling into native-code libraries while maintaining thread-safety) for free. Or rather, you could say that where everyone else pays these costs when they hit the particular problem, Erlang pays the cost up-front in the design of the language+platform and how you're forced to code at all times, in order to make these hard things easy.

The same is true of so many other Erlang things:

how synchronous messaging has to be implemented on top of asynchronous messaging with expected reply-refs and timeouts, so as to make the sender process, rather than the receiver process, be the thing that defaults to crashing if the receiver doesn't recognize the message;
how OTP-framework code has to be structured as delegate functions that return to the framework, so that the framework can "be there" in each process to handle hot code upgrades and process hibernation;
how sockets either block (when {active, once}), or will saturate a process with packet messages (if just active) until that process crashes on overload--because the network listener is a separate part of the runtime that lives in a hot loop and wants to just be given a place to stuff packets into, and isn't allowed to do anything that's not an O(1) operation, like expanding the size of a process's message inbox;

etc.

---

giovannibajo1 3 days ago [-]

I think the part about cooperative/preemptive multitasking isn't saying it all.

Go multitasking is based on the compiler inserting switchpoints on function calls and syscall boundaries. But this affects the scheduling of a single OS-level threads executing that specific goroutine. The number of OS-level threads that the Go scheduler uses can arbitrarily grow, and OS-level threads are preemptively multitasked.

So I think the description is focusing a narrow view of the problem. What is usually required by applications is low latency in reply to system events (e.g.: data available on network sockets), and Go performs very well in this context. For instance, the fact that Go is transparently using a epoll/kqueue based architecture under the hood is probably affecting latency much more than the whole "cooperative" issue as depicted.

eternalban 2 days ago [-]

> I think the part about cooperative/preemptive multitasking isn't saying it all.

That's still not the entire story:

GC & tight loops in Go: https://github.com/golang/go/issues/10958

Per process vs per runtime GC: https://news.ycombinator.com/item?id=12043088

eternalban 2 days ago [-]

Can't update parent.

Please /do not/ read my OP as a dig/boost at any level. Just pure geek interest in language architectures and sharing info.

---

bad_user 5 days ago [-]

Some notes:

the actor model, along with the Akka implementation, has nothing to do with functional programming; and it isn't orthogonal either, since an actor's mailbox interactions are definitely not pure, with actors being stateful and at the same time non-deterministic; in general if you place those in the same article, there's a high probability that you never did functional programming; and if you did actual functional programming (as in programming with mathematical functions), then you wouldn't want to go back to a language that makes that impossible ;-)
Akka actors are not the only game in town for processing data, you can also use Finagle, FS2, my own Monix and even Akka Streams; And yes, concurrency often requires multiple solutions because there's no silver bullet and Go's channels suck compared with what you can do with a well grown streaming solution

16bytes 4 days ago [-]

I don't understand the claim that Akka actors are both impure and non-deterministic.

If your actor is communicating only through its inbox, then it is both pure and deterministic. Given the same set of messages in the same order, you arrive at the same actor state. Sure, you can do wacky things with side-effects, but that's not akka's fault.

...

bad_user 4 days ago [-]

> Given the same set of messages in the same order, you arrive at the same actor state

You just described object identity (see [1]), which are objects whose state is determined by the history of the messages received. An object with identity is stateful, side-effectful and impure by definition.

So no, an actor is almost never pure or deterministic. I'd also like to emphasize determinism here, because you can never rely on a message ordering, given their completely asynchronous nature, so you get something much worse than OOP objects with identity.

> This sounds a lot like the "no true Scotsman" fallacy

I'm talking about my experience. Given that I'm currently a consultant / contractor, I have had a lot of experience with commercial Scala projects initiated by other companies. And in general the projects that use Akka actors are projects that have nothing to do with functional programming.

This happens for a lot of reasons, the first reason being that most people are not in any way familiar with functional programming, or what FP even is for that matter. Not really surprising, given that most Scala developers tend to be former Java/Python/Ruby developers that get lured by Akka actors and once an application grows, it's hard to change it later. Evolution to functional programming happens in a web service model where new components get built from scratch.

But the second reason is more subtle. Functional programming is all about pushing the side-effects at the edges of your program. And if you want to combine the actor model with functional programming, you have to model your actors in such a way as to not contain any business logic at all (e.g. all business logic to be modeled by pure functions, immutable data-structures and FP-ish streaming libraries), evolved only with `context.become` (see [2]). So such actors should be in charge only with communications, preferably only with external systems. This doesn't happen because it's hard to do, because developers don't have the knowledge or the discipline for it and because it then raises the question: why use actors at all?

Because truth be told, while actors are really good at bi-directional communications, they suck for unidirectional communications, being too low level. And if we're talking about communicating over address spaces, for remoting many end up with other solutions, like Apache Kafka, Zookeeper, etc.

On combining Akka actors with functional programming, I made a presentation about it if interested (see [3]).

[1] https://en.wikipedia.org/wiki/Identity_(object-oriented_programming)

[2] https://github.com/alexandru/scala-best-practices/blob/master/sections/5-actors.md#52-should-mutate-state-in-actors-only-with-contextbecome

[3] https://alexn.org/blog/2016/05/15/monix-observable.html

---

unoti 5 days ago [-]

I have a simple question. In the article they mentioned they had a concurrency issue with a timed buffer that they later neatly solved with go channels and goroutines. They said that they solved the problem in Scala by moving to the actor model, but that required importing Akka into their project and training everyone how to use Akka.

My simple question is: couldn't they have achieved the heart and soul of the actor model by just making an object on its own thread, and talking to that object on a simple synchronized message queue? It's a handful of easy to understand lines of code, and nobody needs to delve into the sea of madness that is learning and configuring Akka and its actor model.

In more general terms, it's possible to use Scala as Java that plays well with immutability and functional programming techniques without turning your codebase into an overly complex difficult-to-understand mess. But for some reason people just can't stop themselves.

For what it's worth, Elixir hits a real sweet spot of functional goodness, combined with awesome concurrency without getting too deep into bizarre complexity.

joshlemer 5 days ago [-]

I think it's becoming a more commonly held opinion in the Scala community that people often tend to go off the deep end with Akka and I tend to agree with that. In particular, I think that most of what people use Actors for can be done with Futures, and what can't be done with Futures can most of the time be done with Akka Agents (http://doc.akka.io/docs/akka/current/scala/agents.html).

And when I use Actors, I tend to want o wall them off in their own place in the codebase, instead of letting the actor-ness touch multiple parts of the code.

ktoso 4 days ago [-]

(disclaimer: Akka team here).

The post, somewhat surprisingly, omits to mention Akka Streams (which is a perfect fit, since, as the post mentions 'The data came through a stream [...]'), and Reactive Kafka (which is an official Akka project https://github.com/akka/reactive-kafka ) which solve this exact use case.

These projects/modules have been around since 2014, and we've been presenting / sharing information about them a lot since then. Perhaps this post can be interpreted that we need to put even more effort into the discoverability of them (in fact, we are right now reworking documentation for this and more). Using Akka Streams, a complete re-implementation of the example use-case mentioned in the post would look like this:

`Consumer.plainSource(kafkaConsumerSettings, kafkaSubscription).groupedWithin(maxItemsInBatch, 100.milliseconds).runForeach(batch => persist(batch))`

Too bad the team missed these... I would have loved to point the team to the right (existing) abstraction/library in one of our many communities (github, gitter chat and the mailing lists - we're pretty nice people, come around some time), rather than posting like this. What we've learnt here though is that we need to work even harder on the discoverability of those libraries - and indeed it is one of the things we focus on nowadays (with docs re-designs, cross links, better search and more).

Anyway, just wanted to let you all know what Akka has in store for such use cases, Streaming is definitely a first class citizen in the toolkit.

---

" [swift-evolution] Proposal: Add generator functions to the language ... >> I'd love to have first-class support for generators like this, but it's a lot of work. It's a lot easier to do this sort of thing in a scripting language like Python than it is to do in a language like Swift, because it requires reifying the stack into a data structure that can be passed around. And I suspect there's a lot of non-trivial questions that have to get answered before you can even propose an implementation for this. "

---

ongoing discussion on co-routines in Rust

https://users.rust-lang.org/t/coroutines-and-rust/9058/13

---

 wahern 1 day ago [-]

Go-style coroutines will still be more efficient and more elegant than C++ coroutines. Goroutines are stackful, whereas in C++ you'll need to manually chain coroutines. That means a dynamic allocation and deallocation for _each_ call frame in the chain. That's more efficient than JavaScript?-style continuation closures, but in many situations still far more work than would be required for stackful coroutines.

Everything described in Gor Nishanov's video applies equally to stackful vs non-stackful coroutines. That is, the code conciseness, composability, and performance advantages of non-stackful coroutines are even greater with stackful coroutines.

Nishanov dismisses stackful coroutines out-of-hand because to be memory efficient one would need relocatable (i.e. growable) stacks. But that begs the question of how costly it would be to actually make the stack relocatable. I would assume that in a language like C++ where stack layout must already be recorded in detail to support exceptions, that efficiently relocatable stacks wouldn't be too difficult to implement.

At the end of the day, without stackful coroutines networking still won't be as elegant as in Go. And for maximum performance, state machines (e.g. using computed gotos) will still be useful. You'll either need to sacrifice code clarity and composition by explicitly minimizing chains, or you'll sacrifice performance by having deep coroutine chains.

pcwalton 1 day ago [-]

> Nishanov dismisses stackful coroutines out-of-hand because to be memory efficient one would need relocatable (i.e. growable) stacks. But that begs the question of how costly it would be to actually make the stack relocatable. I would assume that in a language like C++ where stack layout must already be recorded in detail to support exceptions, that efficiently relocatable stacks wouldn't be too difficult to implement.

No, this is a common misconception. Unwind tables only store enough information to be able to locate each object that has to be destroyed. It's an entirely different matter to move those objects, because a system that does that has to be able to find all outstanding pointers to an moved object in order to update them. It's legal, and ubiquitous, in C++ to hide pointers to objects in places that the compiler has no knowledge of. Thus moving GC as it's usually implemented is generally impossible (outside of very fragile and application-specific domains like Chrome's Oilpan).

The best you can do in an uncooperative environment like that of C++ is to allocate large regions of a 64-bit address space and page your stacks in on demand. (Note that this setup is gets awfully close to threads, which is not a coincidence—I think stackful coroutines and threads are essentially the same concept.)

---

LLVM coroutine intrinsics:

http://llvm.org/docs/Coroutines.html

---

DonbunEf?7 84 days ago [-]

The original smart-contract language, E, is almost nothing like Solidity. To focus on Turing-completeness, it's true that E is Turing-complete and has an `eval()` primitive, which normally would be dangerous. However, E both comes with sufficient tools to prove that any given instance of `eval()` is safe, and also to limit Turing-complete behavior when needed.

Specifically, in E and Monte, we can write auditors, which are objects that can structurally prove facts about other objects. A common auditor in both languages is `DeepFrozen?`; writing `as DeepFrozen?` on an E or Monte object causes the `DeepFrozen?` auditor to examine the AST of that object and prove facts.

There's a Monte community member working on an auditor for primitive recursive arithmetic, inspired IIUC primarily by the EVM's failings.

The DAO hack happened because of a bug class known as "plan interference" in the object-capability world; these bugs happen because two different "plans", or composite actions of code flow, crossed paths. In particular, a plan was allowed to recursively call into itself without resetting its context first. EVM makes this extremely hard to get right. E and Monte have builtin syntax for separating elements of plans with concurrency; if you write `obj.doStuff()` then it happens now, but `obj<-doStuff()` happens later.

So, uh, yeah. Smart contracts aren't a bad idea, but Ethereum's not very good.

Animats 84 days ago [-]

That's a classic form of GUI bug. Some widget calls something which calls something else which eventually calls the original widget, which is not in a stable state. Classic bugs in this area involve using "save file", and then "new folder", and then renaming some folders in a way which invalidates something upstream.

socrates1024 83 days ago [-]

Do you think it would be possible to improve the EVM by adding E's notion of concurrency? One constraint would be the need to have deterministic scheduling, since every execution would need to be run identically by all validating nodes.

[edit] Incidentally, we pointed out several lessons from the ocap community in a commissioned report from Ethereum foundation back in 2015. Few of those suggestions were adopted at the EVM level or the higher levels though. https://github.com/LeastAuthority/ethereum-analyses/blob/mas...

drdeca 83 days ago [-]

Is this the language E you are referring to : https://en.wikipedia.org/wiki/E_(programming_language) ?

I hadn't heard of it. It sounds neat

---