proj-oot-ootConcurrencyNotes4


" Downey; The Little Book of Semaphores

Takes a topic that's normally one or two sections in an operating systems textbook and turns it into its own 300 page book. The book is a series of exercises, a bit like The Little Schemer, but with more exposition. It starts by explaining what a semaphore is, and then has a series of exercises that build up higher level concurrency primitives.

This book was very helpful when I first started to write threading/concurrency code. I subscribe to the Butler Lampson school of concurrency, which is to say that I prefer to have all the concurrency-related code stuffed into a black box that someone else writes. But sometimes you're stuck writing the black box, and if so, this book has a nice introduction to the style of thinking required to write maybe possibly not totally wrong concurrent code.

I wish someone would write a book in this style, but both lower level and higher level. I'd love to see exercises like this, but starting with instruction-level primitives for a couple different architectures with different memory models (say, x86 and Alpha) instead of semaphores. If I'm writing grungy low-level threading code today, I'm overwhelmingly likely to be using c++11 threading primitives, so I'd like something that uses those instead of semaphores, which I might have used if I was writing threading code against the Win32 API. But since that book doesn't exist, this seems like the next best thing.

I've heard that Doug Lea's Concurrent Programming in Java is also quite good, but I've only taken a quick look at it. " -- http://danluu.com/programming-books
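
As an illustration of the kind of exercise the book works through, here is a minimal sketch (not taken from the book; names are just illustrative) of a two-thread rendezvous built from two semaphores, using Python's threading module:

    import threading

    # Two-thread rendezvous built from two semaphores (a classic exercise in
    # this style): each thread signals its own arrival, then waits for the
    # other, so neither proceeds past the rendezvous until both have arrived.
    a_arrived = threading.Semaphore(0)
    b_arrived = threading.Semaphore(0)

    def thread_a():
        print("a: before rendezvous")
        a_arrived.release()   # signal that A has arrived
        b_arrived.acquire()   # wait until B has arrived
        print("a: after rendezvous")

    def thread_b():
        print("b: before rendezvous")
        b_arrived.release()   # signal that B has arrived
        a_arrived.acquire()   # wait until A has arrived
        print("b: after rendezvous")

    ta = threading.Thread(target=thread_a)
    tb = threading.Thread(target=thread_b)
    ta.start(); tb.start()
    ta.join(); tb.join()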

---

http://lucumr.pocoo.org/2016/10/30/i-dont-understand-asyncio/ says that Python asyncio is too complicated but then also says:

"What landed in 3.5 (the actual new coroutine objects) is great. In particular with the changes that will come up there is a sensible base that I wish would have been in earlier versions. The entire mess with overloading generators to be coroutines was a mistake in my mind."

https://news.ycombinator.com/item?id=12831989 concurs that Python 3.5's concurrency is better than 3.4's

---

some comments in https://news.ycombinator.com/item?id=12829759 say that greenlets/green threads are better than asyncio. But https://news.ycombinator.com/item?id=12831989 and children point out that (a) green threads, like preemptive multitasking, give you no guarantees about when control will be switched out, so you have to use locks all over the place; with cooperative multitasking, by contrast, you know when control might switch, so you only have to worry about protecting shared data at those points (see the sketch below); and (b) it's easier to make promises and cooperative multitasking interoperate across boundaries with libraries in other languages than it is with green threads.
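
A minimal sketch of point (a), using Python threads vs. an asyncio coroutine (names are illustrative): with preemptive threads a switch can happen between any two operations, so even a shared-counter increment needs a lock, whereas under cooperative multitasking control can only switch at an 'await', so code between awaits is effectively atomic with respect to other tasks:

    import asyncio
    import threading

    counter = 0
    counter_lock = threading.Lock()

    def preemptive_increment():
        # With preemptive threads a switch can happen between the read and
        # the write of 'counter', so even this one-line update needs a lock.
        global counter
        with counter_lock:
            counter += 1

    async def cooperative_increment():
        # Under cooperative multitasking, other tasks can only run at an
        # 'await'; the increment below cannot be interleaved with them, so
        # no lock is needed -- but shared state must be consistent whenever
        # we do await.
        global counter
        counter += 1
        await asyncio.sleep(0)  # the only point where control may switch

    async def demo():
        await asyncio.gather(*(cooperative_increment() for _ in range(1000)))
        print("counter after cooperative increments:", counter)

    asyncio.run(demo())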

---

note: Python 3's asyncio event loops even have TCP facilities:

https://docs.python.org/3/library/asyncio-eventloop.html#creating-connections https://docs.python.org/3/library/asyncio-protocol.html#asyncio.Protocol
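
For example, here is a minimal client sketch using loop.create_connection together with an asyncio.Protocol subclass (the host, port, and payload are placeholders):

    import asyncio

    class EchoClient(asyncio.Protocol):
        def connection_made(self, transport):
            # Called by the event loop once the TCP connection is established.
            transport.write(b"hello\r\n")

        def data_received(self, data):
            print("received:", data)

        def connection_lost(self, exc):
            print("connection closed")

    async def main():
        loop = asyncio.get_running_loop()
        # The event loop itself creates the TCP connection and drives the
        # protocol's callbacks; create_connection returns (transport, protocol).
        transport, protocol = await loop.create_connection(
            EchoClient, "example.com", 80)
        await asyncio.sleep(1)   # give the server a moment to respond
        transport.close()

    asyncio.run(main())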

---

https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/ points out issues with callback/future-based systems in which a call can return before its work is actually done:

Namely, the problem is that it's harder to reason about your code: if the code says "f(); g();" but f() only schedules work to happen later, then the code reads as if f() has completed before g(), yet the final effects of f() may not be present when g() begins executing.
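
A minimal sketch of the problem, assuming an asyncio-style API where f() schedules its real work as a background task (all names here are illustrative):

    import asyncio

    async def do_the_real_work():
        await asyncio.sleep(0.1)
        print("f()'s effects happen here, possibly after g() has already run")

    async def f():
        # f() returns immediately; its real work is merely *scheduled* as a
        # background task, which may not have run yet when f() returns.
        asyncio.ensure_future(do_the_real_work())

    async def g():
        print("g() runs, written as if f() were already finished")

    async def main():
        await f()                 # reads as if f completed here...
        await g()                 # ...but g may see a world without f's effects
        await asyncio.sleep(0.2)  # let the scheduled work finish

    asyncio.run(main())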

Instead, e recommends curio-style systems, in which (among other things) calls respect causality: their effects are complete by the time they return, unless they are explicitly marked 'async'.

Furthermore, curio has the convention that anything that doesn't do all of its work synchronously is labeled 'async', so you know that calls to non-async curio library functions don't put off doing their work until later. By contrast, asyncio's stream-based network I/O functions, e.g. StreamWriter.write, aren't labeled 'async' yet don't complete all of their work immediately (see the sketch below).
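
To illustrate the asyncio side of that contrast, a minimal sketch (names and endpoint are placeholders): StreamWriter.write is a plain synchronous call that only buffers the data, and completing the work requires a separate await of drain():

    import asyncio

    async def send_payload(host: str, port: int, payload: bytes) -> None:
        reader, writer = await asyncio.open_connection(host, port)
        # write() is not async-colored: it only appends 'payload' to an
        # internal buffer and returns immediately; the actual send may
        # happen later.
        writer.write(payload)
        # Finishing the work (and getting backpressure) needs a separate await.
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    # e.g. asyncio.run(send_payload("example.com", 80, b"hello\r\n"))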

Furthermore, "In asyncio, there are many different representations of logically concurrent threads of execution – loop.add_reader callbacks, asyncio.Task callstacks, Future callbacks, etc. In curio, there is only one kind of object that can represent a logical thread – curio.Task – and this allows us to handle them in a uniform way".

Furthermore, in curio you can configure a timeout using a context manager, and it is propagated downwards to everything awaited inside it (see the sketch below).
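
A minimal sketch of that pattern using curio's documented timeout_after context manager (exact names and behavior may differ across curio versions):

    import curio

    async def child():
        # child() never mentions a timeout, but any blocking curio operation
        # inside it is still subject to the timeout set by its caller.
        await curio.sleep(10)

    async def parent():
        try:
            # Set the timeout once here; it propagates down into child().
            async with curio.timeout_after(1):
                await child()
        except curio.TaskTimeout:
            print("timed out")

    curio.run(parent)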

Furthermore, e notes that whenever futures are involved, you may want to cancel a task, but then what happens if that task is blocked awaiting some future -- do you cancel that future too? But if multiple tasks are waiting on the same future and you cancel it, then by cancelling one task you might erroneously affect an unrelated task. E says that "...we can't avoid this by checking for multiple waiters when propagating cancellations from tasks->futures", but i didn't understand why not.

Furthermore, e notes that when callbacks are involved, frameworks can't (easily) statically project the future path of control across callbacks, so it's harder for a framework to assist with things like 'finally' cleanup actions.

Furthermore, implicit dynamic thread-local context is harder to do across callbacks, because even if a framework/language provides a way to do this ordinarily, that way might not work across callbacks.

( " In a curio-style framework, the problem is almost trivial, because all code runs in the context of a Task, so we can store our task-local data there and immediately cover all uses cases. And if we want to propagate context to sub-tasks, then as described above, sub-task spawning goes through a single bottleneck inside the curio library, so this is also easy. I actually started writing a simple example here of how to implement this on curio to show how easy it was... but then I decided that probably it made more sense as a pull request, so now I don't have to argue that curio could easily support task-local storage, because it actually does! It took ~15 lines of code for the core functionality, and the rest is tests, comments, and glue to present a convenient threading.Local-style API on top; there's a concrete example to give a sense of what it looks like in action.

I also recommend this interesting review of async context propagation mechanisms written by two developers at Google. A somewhat irreverent but (I think) fair summary would be (a) Dart baked a solution into the language, so that works great, (b) in Go, Google just forces everyone to pass around explicit context objects everywhere as part of their style guide, and they have enough leverage that everyone mostly goes along with it, (c) in C# they have the same system I implemented in curio (as I learned after implementing it!) and it works great because no-one uses callbacks, but (d) context propagation in Javascript is an ongoing disaster because Javascript uses callbacks, and no-one can get all the third-party libraries to agree on a single context-passing solution... partly because even the core packages like node.js can't decide on one. " -- [1] )
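
As a point of comparison with the context-propagation mechanisms surveyed in the quote above (and not something discussed in the post itself), Python later standardized contextvars (3.7+), where each asyncio task gets its own copy of the context; a minimal sketch with illustrative names:

    import asyncio
    import contextvars

    request_id = contextvars.ContextVar("request_id", default=None)

    async def do_work():
        # The value set in handler() is visible here without being passed
        # explicitly; each task has its own copy of the context.
        print("working on request", request_id.get())

    async def handler(rid):
        request_id.set(rid)
        await do_work()

    async def main():
        # gather() wraps each coroutine in a Task; each Task copies the
        # current context at creation, so the handlers don't clobber
        # each other's request_id.
        await asyncio.gather(handler("a"), handler("b"))

    asyncio.run(main())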

" What makes a Python app "async/await-native"? Here's a first attempt at codifying them:

    An async/await-native application consists of a set of cooperative threads (a.k.a. Tasks), each of which consists of some metadata plus an async callstack. Furthermore, this set is complete: all code must run on one of these threads.
    These threads are supervised: it's guaranteed that every callstack will run to completion – either organically, or after the injection of a cancellation exception.
    Thread spawning is always explicit, not implicit.
    Each frame in our callstacks is a regular sync- or async-colored Python function, executing regular imperative code from top to bottom. This requires that both API primitives and higher-level functions *respect causality* whenever possible.
    Errors, including cancellation and timeouts, are signaled via exceptions, which propagate through Python's regular callstack unwinding.
    Resource cleanup and error-handling is managed via exception handlers (with or try)." -- [2]

Open questions:

" a common pattern I've run into is where I want to spawn several worker tasks that act like "part of" the parent task: if any of them raises an exception then all of them should be cancelled + the parent raise an exception; if the parent is cancelled then they should be cancelled too. We need ergonomic tools for handling these kinds of patterns robustly.

Fortunately, this is something that's easy to experiment with, and there's lots of inspiration we can draw from existing systems: Erlang certainly has some good ideas here. Or, curio makes much of the analogy between its event loop and an OS kernel; maybe there should be a way to let certain tasks sign up to act as PID 1 and catch failures in orphan tasks? " -- [3]
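
One concrete shape this pattern later took in the standard library (not something proposed in the quoted post) is asyncio.TaskGroup in Python 3.11: if any child task raises, its siblings are cancelled and the exception propagates to the parent, and cancelling the parent cancels the children. A minimal sketch with illustrative names:

    import asyncio

    async def worker(i):
        if i == 2:
            raise RuntimeError(f"worker {i} failed")
        await asyncio.sleep(1)

    async def parent():
        # If any child raises, the remaining children are cancelled and the
        # exception propagates out of the 'async with' block to the parent;
        # cancelling the parent cancels the children as well.
        async with asyncio.TaskGroup() as tg:
            for i in range(3):
                tg.create_task(worker(i))

    try:
        asyncio.run(parent())
    except* RuntimeError as excgroup:
        print("caught:", excgroup.exceptions)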