Bayle Shanks's website: proj-oot-ootConcurrencyNotes5

Even if you have non-blocking I/O (async/await) or n:m 'goroutines', need a way to assign priorities to these different 'threads'. Because consider if you have one thread receiving state updates from the outside world, and another thread in an infinite loop of looking at the current state and then doing something; you want the top priority to be to drain the state updates, because if you can't do that, then you'll get an ever-growing backlog of state updates and eventually the 'do something based on state' thread will be acting on very old information.

---

" Our experience shows that cooperative preemption offers several performance advantages over traditional preemptive scheduling in multi-core platforms [3]. The Pillar compiler is expected to generate a yield check at each method entry and backward branch, a well-known technique that ensures yielding within a bounded time. "

of course you also have to do something special when calling unsafe native code, lest it not reliquish control

---

we want pmap, preduce, but also: pscan, pfilter, pmapconcat (or mb just concat; fundamental b/c 'map' alone cannot create more than one new item when looking at an old item) gather, scatter

these will not be prefixed with 'p' because they will be the default.

---

some studies suggest that out of OpenACC?, OpenMP?, OpenCL?, CUDA, that OpenACC? and OpenML? require the least lines of code [1]

this article briefly compares OpenMP? and OpenACC? near the end: [2]

---

generalize limit order books and trading bots to a cognitive architecture.

generalize tuple spaces to contain not just data but also executable programs which can react to incoming events.

The idea is that there is an 'order book' which is a list/database of 'resting orders'. These 'resting orders' serve as both storage for state (like rows in a database), as events (an incoming order is like an event), and as event listeners (an order resting in the book can react to incoming orders by executing some program).

Orders include queries that determine against which other orders they can 'match'. When a new order arrives, it queries the existing orders, and the query might 'match' against one or more existing orders. If it matches an existing order, a handler in the new order is called, with the existing order provided as an argument, and has a change to execute some program.

If the new order does not cease to exist after it matches (or fails to match), and is added as a resting order to the order book, then the other orders in the book run their query against the new order to see if they match against it. If they do, then their handler(s) are called.

The queries are in an O(n) query language (meaning that the time and space complexity of executing the query is linear in the size of the other order against which it is testing for a match). This query language can be a graph regex, or more generally, it could just be a program defining the function doesOtherOrderMatchQuery, where the program must not contain backwards jumps, however the language does include primitives map/reduce/scan to iterate over parts of the other order.

The correspondant of limit order book 'last price' is the last matched pair of orders. The correspondant of limit order book 'bid/ask' is a mechanism to compute a reduction ('reduce') over all resting orders and present the result as a piece of state data (which is itself just a form of order).

---

in Go, channels don't have 'sides'; they are shared mediums and anyone with the channel pointer can read or write to the channel (eg the code below "Let’s say, each handler wants to write asynchronously to the logger" at [3]). When one person reads from the channel, that value is consumed, and others who subsequently read from the same channel won't see that value (perhaps there should also be 'broadcast channels' -- does Go have these?).

---

Posted on February 12, 2007 by Brendan Eich Threads suck

This is not an original thought, but I write with some authority here.

I hacked Unix kernel code out of grad school at SGI, in SGI’s “good old days” (1985-1992). Among other things, we took single-threaded (ST) kernel code and multi-threaded (MT) it on SGI’s SMP boxes. I won a free trip to New Zealand and Australia in 1990 along with Bent Hagemark, with kernel source on magtape in our hot little hands, on account of others’ bugs in adapting some single-threaded (ignoring interrupts) AT&T “Streams” code to SGI’s SMP kernel, and made fixes in the field (thanks to the SGI sales guys in Brisbane, we even got two nights on the Gold Coast in compensation — not long enough!).

You must be this tall to hack on threaded systems, and that means most programmers should run away crying. But they don’t. Instead, as with most other sharp tools, the temptation is to show how big one is by picking up the nearest ST code and jamming it into a MT embedding, or tempting race-condition fate otherwise. Occasionally the results are infamous, but too often, with only virtual fingers and limbs lost, no one learns.

Threads violate abstractions six ways to Sunday. Mainly by creating race conditions, deadlock hazards, and pessimistic locking overhead. And still they don’t scale up to handle the megacore teraflop future.

We can hope for better static analyses to find all races. In the real world, the code is C or C++ and there’s no hope for static salvation. Sure, some languages try to put deadlocks in a syntactic cage, but that walks right into the overhead problem, in spite of heroic VM-based optimizations. Unexpected costs, even if constant or linear, can sink any abstraction. For example (still piquant to Mozilla hackers busy deCOMtaminating), virtual method calls cost; they should be avoided where you’re writing hardware. The same goes for locks: not all abstractions must be MT-safe; some must be ST and fast.

So my default answer to questions such as the one I got at last May’s Ajax Experience, “When will you add threads to JavaScript??” is: “over your dead body!”

There are better ways. Clueful hackers keep rediscovering Erlang. Then there is STM. One retro stylist I know points to an old language-based solution, Hermes.

A requirement for JS3 (along with hygienic macros) is to do something along these more implicit lines of concurrency support. In all the fast yet maintainable MT systems I’ve built or worked on, the key idea (which Will Clinger stated clearly to me over lunch last fall) is to separate the mutable unshared data from the immutable shared data. Do that well, with language and VM support, and threads become what they should be: not an abstraction violator from hell, but a scaling device that can be composed with existing abstractions.

So here’s a promise about threads, one that I will keep or else buy someone a trip to New Zealand and Australia (or should I say, a trip here for those over there): JS3 will be ready for the multicore desktop workload.

Does this mean we won’t be hand-coding MT Gecko? I think Graydon said it best: “I’d rather eat glass.” Most high-level concurrent Mozilla programming will be in JS3 on the back of an evolved Tamarin.

---

nickbauman 2 hours ago [-]

This is how I understand the two approaches to concurrency:

1) Use preemptive system threads that can execute in parallel. A task requiring simultaneous waiting is given an operating system thread of its own so it can block without stopping the entire program. But threads require significant memory and other resources per thread. Also, the operating system can arbitrarily interleave the execution of system threads, requiring the programmer to carefully protect shared resources with locks and condition variables, which is exceedingly error-prone.

2) Have a single-threaded program, where that single thread runs an event loop whose job is to react to external events by invoking a callback function that has been registered. While it doesn't require the same kind of complex synchronization that preemptive threads do, the inverted control structure of this approach requires your own control flow to thread awkwardly through the system's event loop, leading to a maze of event callbacks.

Coroutines work with the latter style in an attempt to tame its complexity. They are themselves complex, however, and, in my opinion, that complexity doesn't pull its weight when you consider in the end you only have one thread.

Can anyone tell me what I'm missing or how these problems with these approaches have been solved in places?

missblit 1 hour ago [-]

I found them useful when using the Boost Asio C++ networking library.

I needed to read some bytes, make a decision, read some more bytes, make another decision, etc. This started out as callback hell since every read was asynchronous, which was very painful, but after using the coroutine support I was able to write the code as if it was synchronous and the logic became a lot easier to follow.

And the coroutines weren't that hard to use, even lacking proper language support. I could basically write something like async_read(socket, buffer, yield_context);. The coroutine yielded behind the scenes and the buffer was populated once control flow was returned to the next line (from who knows where!)

KayEss? 1 hour ago [-]

The implementation of the coroutines might be complex, but their use is very straightforward -- the code looks almost exactly the same as the blocking code would, just with the awaits in there.

As for the number of threads, when I use Boost ASIO with coroutines I often end up with multiple threads servicing a pool of coroutines, so if there is shared state between them then there is still synchronisation. I use channels implemented on top of eventfd to help with that.

When I converted some network code from callbacks to coroutines the code ended up about 1/3 as long and was far simpler and easier to understand. It also fixed several bugs I simply couldn't find.

The reality is that the callbacks are far more complex to use than the coroutines.

CupOfJava? 1 hour ago [-]

The fundamental difference between single threading, coroutines, and multi threading:

Single Thread: 1 thread of execution, 1 execution stack

Coroutine: 1 thread of execution, multiple execution stacks

Multi threading: multiple threads of execution, multiple execution stacks

strictfp 1 hour ago [-]

Or 3) you do like golang and erlang and build really lightweight green threads so you get the programming model of approach 1 and nearly the same speed as number 2.

This of course requires extra effort to prevent blocking of OS threads and in the case where the OS expects thread locality. But IMHO it's the superior approach.

The best solution would be if OSes themselves would provide really lightweight threads. And I suspect that they will, eventually.

nickbauman 1 hour ago [-]

I haven't done Erlang, but in my experience Golang has much the same problems as 1) so I'm not sure I'd agree. +1 to lightweight threads, tho.

maxpolun 1 hour ago [-]

You can have multiple threads with coroutines. However you generally only have one thread per CPU/hardware thread, and use async io within that thread.

This is the general architecture of nginx for example (though it doesn't use coroutines, just callback based async io).

zackmorris 55 minutes ago [-]

I'll take a crack at this - coroutines are not about performance. They are for converting callbacks to a single thread of execution that's easier to reason about (which could themselves be embedded in other threads).

Imperative logic is equivalent to a state machine but again much easier for humans to reason about, that's why developers got so much more done in the past with synchronous execution and macros than they do today as we struggle to build even the simplest user interfaces. I'm hopeful that we can get all of the benefits of 1980s cooperative threading but that the decades of experience since allow us to avoid the pitfalls.

butterisgood 1 hour ago [-]

Concurrency is about how you organize a solution really more than anything. Whether those concurrent tasks execute in parallel or not is a different consideration. Many authors and experts conflate these terms.

You can write concurrent code with pthreads. It may happen to execute in parallel but it doesn't have to. In fact, I believe the early SunOS?/Solaris pthreads had a 'green' scheduler where it could all be done in userspace (this is from my own memory nearly 20 years ago). This made them basically "stack switchers" which is what some lightweight coroutine libraries end up doing.

Having written heavily callback-driven code. Threading contexts manually, thinking about object lifetimes, I can say that coroutines have a much cleaner flow to them and can make it easier to write and later read and debug programs than crazy callbacks and contexts.

Object lifetimes are one thing coroutines help with in a big way. I think it's why Rust is so attractive to so many.

Look at plan 9's libthread for example: https://github.com/9fans/plan9port/tree/master/src/libthread

It allows one to take advantage of nonblocking sockets as a single "kernel execution entity" mechanism across many userspace threads.

This approach works pretty well. The Mongrel2 webserver clearly borrows this approach in their task library:

https://github.com/9fans/plan9port/blob/master/src/libthread...

https://github.com/mongrel2/mongrel2/blob/master/src/task/38...

---

boredandroid 819 days ago [-]

I really think there are a couple of levels of immutability that it is easy to conflate.

Specifically immutability for

1. In memory data structures...this is the contention of the functional programming people.

2. Persistent data stores. This is the lsm style of data structure that substitutes linear writes and compaction for buffered in-place mutation.

3. Distributed system internals--this is a log-centric, "state machine replication" style of data flow between nodes. This is a classic approach in distributed databases, and present in systems like PNUTs.

4. Company-wide data integration and processing around streams of immutable records between systems. This is what I have argued for (http://engineering.linkedin.com/distributed-systems/log-what...) and I think Martin is mostly talking about.

There are a lot of analogies between these but they aren't the same. Success of one of these things doesn't really imply success for any of the others. Functional programming could lose and log-structured data stores could win or vice versa. Pat Helland has made an across the board call for immutability (http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf), but that remains a pretty strong assertion. So it is worth being specific about which level you are thinking about.

For my part I am pretty bullish about stream processing and data flow between systems being built around a log or stream of immutable records as the foundational abstraction. But whether those systems internally are built in functional languages, use lsm style data layout on disk is kind of an implementation detail. From my point of view immutability is a lot more helpful in the large than in the small--I have never found small imperative for loops particularly hard to read, but process-wide mutable state is a big pain, and undisciplined dataflow between disparate systems, caches, and applications at the company level can be a real disaster.

eloff 819 days ago [-]

Excellent points, yes it's important to clarify what we're talking about here. Samza sounds like an event-sourcing style immutable event log. You could think of it like the transaction or replication log of a traditional database. Having that be immutable is very sensible! But you can't always query that in "real-time".

On the other hand, the data structures you query in real-time, making that immutable is problematic, because then you'll need a LevelDB? style compaction step. That doesn't mean to say that it can't be done well, but that it's hard to do well.

hyc_symas 819 days ago [-]

LMDB does ACID MVCC using copy-on-write with no garbage collection or compaction needed. It delivers consistent, deterministic write performance with no pauses. It is actually now in use in a number of soft realtime systems.

eloff 819 days ago [-]

I was specifically thinking of LMDB as a counter-example when I wrote that it's not impossible, just hard to do well. A much more sensible set of tradeoffs than LevelDB?.

---

Map, FlatMap?, Fold, and HashReduce?

" Map creates a single output element per index using the function f , where each execution of f is guaranteed to be independent. The number of output elements from Map is the same as the size of the input iteration domain. Based on the number of collections read in f and the access patterns of each read, Map can capture the behavior of a gather, a standard element-wise map, a zip, a windowed filter, or any combination thereof. FlatMap? produces an arbitrary number of elements per index using function g , where again function execution is independent. The produced elements are concatenated into a flat output. Conditional data selection (e.g. WHERE in SQL, filter in Haskell or Scala) is a special case of FlatMap? where g produces zero or one elements. Fold first acts as a Map, producing a single element per index using the function f , then reduces these elements using an associative combine function r . HashReduce? generates a hash key and a value for every index using functions k and v , respectively. Values with the same corre- sponding key are reduced on the fly into a single accumulator using an associative combine function r . HashReduce? may either be dense, where the space of keys is known ahead of time and all accumulators can be statically allocated, or sparse, where the pattern may gener- ate an arbitrary number of keys at runtime. Histogram creation is a common, simple example of HashReduce? where the key function gives the histogram bin, the value function is defined to always be "1", and the combine function is integer addition. " -- Plasticine: A Reconfigurable Architecture For Parallel Patterns

---

" My own choice for writing fast programs in Erlang is an alternative technology which affords all the benefits of C code without compromising the integrity of the Erlang run-time. It is OpenCL?, a C-like language introduced by Apple in 2008. " -- [4]

---

" Concurrency in Rust is straightforward: you get a choice between OS threads, OS threads, and OS threads. When Rust was originally announced, the team had ambitions to pursue the multi-core MacGuffin? with a green-threaded actor model, but they found out that it’s very hard to do green threads with native code compilation. (Go, for example, does native code compilation, but switches contexts only on I/O; Erlang can context-switch during a long-running computation, as well as on I/O, but only because it’s running in a virtual machine that can count the number of byte-code ops.) "

note: Golang also switches context on function calls

---

" But as with the match keyword, which as in the example above can possess impossible beliefs about integers, the Rust compiler is at times more cautious than intelligent. A common pattern in parallel computing, for example, is to concurrently perform an operation over non-overlapping parts of a shared data array. As a simple (single-threaded) example, the following code will not compile in Rust:

fn main() { let mut v = vec![1, 2]; a mutable vector

    let a = &mut v[0]; // mutable reference to 1st element
    let b = &mut v[1]; // mutable reference to 2nd element}

It’s a perfectly valid code, and accessing non-overlapping slices of an array is obviously safe. But the above program will cause the Rust compiler to hem and haw about competing reference to v.

However, Rust’s parallel-computation cabal has a kind of secret handshake called split_at_mut. Although the Rust compiler is unable to reason about whether specific index values overlap, it is willing to believe that split_at_mut produces non-overlapping slices, so this equivalent program compiles just fine:

fn main() { let mut v = vec![1, 2];

    let (a_slice, b_slice) = v.split_at_mut(1);
    let (a, b) = (&mut a_slice[0], &mut b_slice[0]);}

Most parallel computations end up using split_at_mut. In the interest of code clarity, it would be nice if the Rust compiler got rid of the split_at_mut secret password and could reason sanely about slice literals and array indexes. For the record, the Nim language manages to get this right, and the Rust folks might look to it for inspiration.

(It is also worth noting that processing non-overlapping slices in parallel is destined to come into mortal conflict with The Iterator, which is by its nature sequential. In that sense parallel processing is another victim of Rust’s multi-paradigm nature; the language would be easier to parallelize if it were more aggressively imperative, or more stringently functional, but by occupying the middle ground between The Array and The Function, it loses out on well-known techniques from both camps.) "

---

" One of Go’s major selling points is its concurrency support. I have not yet played with its concurrency features, cutely called goroutines. My impression from the description is that while goroutines are an advancement over vanilla C and C++, Go lacks a good story for handling programmer errors in a concurrent environment. Normal errors are bubbled up as values, but if there’s a programmer error (e.g., index out of range), the program panics and shuts down.

For single-threaded programs, this is a reasonable strategy, but it doesn’t play very well with Go’s concurrency model. If a goroutine panics, either you take down the whole program, or you recover — but then your shared memory may be left in an inconsistent state. That is, Go assumes programmers will not make any mistakes in the recovery process — which is not a very good assumption, since it was programmer error that brought about the panic in the first place. As far as I know, the only language that really gets this right is Erlang, which is designed around shared-nothing processes, and thus programmer errors are properly contained inside the processes where they occur. "

---

" (It’s also worth mentioning that you can get Go-style M:N concurrency model in C by using Apple’s libdispatch. In conjunction with block syntax, it’s a fairly nice solution, though like Go, it’s not robust to programmer error.) "

---

" Coroutines, async/await, "user-visible" asynchronicity

It's in vogue at the moment for new languages to have something like async/await. This does not mean it's a done deal: lots has been done, but lots is still messy. The boundary between synchronous-world and asynchronous world -- in terms of types, control flow, correctness, errors, modularity, composition -- is still very awkward. Whether and how to mitigate between different synchronicity regimes, especially across FFIs or differing runtimes, is hard. Integration with effects is hard. Integration with parallelism is hard. Which parts need to be supported by the language and which parts surface to the user is hard. Cognitive load is still very high. "

---

Heterogeneous memory and parallelism

These are languages that try to provide abstract "levels" of control flow and data batching/locality, into which a program can cast itself, to permit exploitation of heterogeneous computers (systems with multiple CPUs, or mixed CPU/GPUs, or coprocessors, clusters, etc.)

Languages in this space -- Chapel, Manticore, Legion -- haven't caught on much yet, and seem to be largely overshadowed by manual, not-as-abstract or not-as-language-integrated systems: either cluster-specific tech (like MPI) or GPU-specific tech like OpenCL?/CUDA. But these still feel clunky, and I think there's a potential for the language-supported approaches to come out ahead in the long run. "

---

Animats 6 days ago [-]

We rather consider a bus a set of distinct peers with no global state.

If they've gone that far, they may as well implement QNX messaging, which is known to work well. QNX has an entire POSIX implementation based on QNX's messaging system, so it's known to work. Plus it does hard real time.

The basic primitives work like a subroutine call. There's MsgSend? (send and wait for reply), MsgReceive? (wait for a request), and MsgReply? (reply to a request). There's also MsgSendPulse? (send a message, no reply, no wait) but it's seldom used. Messages are just arrays of bytes; the messaging system has no interest in content. Receivers can tell the process ID of the sender, so they can do security checks. All I/O is done through this mechanism; when you call "write()", the library does a MsgSend?.

Services can give their endpoint a pathname, so callers can find them.

The call/reply approach makes the hard cases work right. If the receiver isn't there or has exited, the sender gets an error return. There's a timeout mechanism for sending; in QNX, anything that blocks can have a timeout. If a sender exits while waiting for a reply, that doesn't hurt the receiver. So the "cancellation" problem is solved. If you wan to do something else in a process while waiting for a reply, you can use more threads in the sender. On the receive side, you can have multiple threads taking requests via MsgReceive?, handling the requests, and replying via MsgReply?, so the system scales.

CPU scheduling is integrated with messaging. On a MsgSend?, CPU control is usually transferred from sender to receiver immediately, without a pass through the scheduler. The sending thread blocks and the receiving thread unblocks.

With unidirectional messaging (Mach, etc.) and async systems, it's usually necessary to build some protocol on top of messaging to handle errors. It's easy to get stall situations. ("He didn't call back! He said he'd call back! He promised he'd call back!") There's also a scheduling problem - A sends to B but doesn't block, B unblocks, A waits on a pipe/queue for B and blocks, B sends to A and doesn't block, A unblocks. This usually results in several trips through the scheduler and bad scheduling behavior when there's heavy traffic.

There's years (decades, even) of success behind QNX messaging, yet people keep re-inventing the wheel and coming up with inferior designs.

AceJohnny?2 6 days ago [-]

So, SIMPL?

Synchronous Interprocess Messaging Project for LINUX (SIMPL) is a free and open-source project that allows QNX-style synchronous message passing by adding a Linux library using user space techniques like shared memory and Unix pipes[3] to implement SendMssg?/ReceiveMssg?/ReplyMssg? inter-process messaging mechanisms.

https://en.wikipedia.org/wiki/SIMPL

http://icanprogram.com/simpl/

twoodfin 5 days ago [-]

Receivers can tell the process ID of the sender, so they can do security checks.

How do they implement this securely? I can't immediately think of a POSIX-y way for Process A to prove its pid to Process B without involving the kernel.

wmf 5 days ago [-]

Messaging should be implemented by the kernel.

eschaton 6 days ago [-]

This is basically Mach.

Animats 6 days ago [-]

No, Mach used unidirectional messaging. There was an RPC system, but it came with some formatting and marshalling stuff. Not sure about MacOS?. Apple has an explanation of the five or so IPC mechanisms they ended up with.[1]

[1] https://developer.apple.com/library/content/documentation/Darwin/Conceptual/KernelProgramming/boundaries/boundaries.html#//apple_ref/doc/uid/TP30000905-CH217-BABDECEG

fusiongyro 6 days ago [-]

We actually rely on multicast here at the NRAO for monitor and control of the VLA. I admit it's the only place I've heard of it being used.

ZeroMQ? is getting used more for those kinds of purposes; the Greenbank Telescope uses it for one of their instrument backends and we are now using it for VLITE and REALfast. The new archive system I'm helping build uses AMQP.

jpm_sd 6 days ago [-]

I have also seen ZeroMQ? used for mobile robot control; it was considered for ROS 2.0 [1] before they settled on DDS [2]

[1] http://design.ros2.org/articles/ros_with_zeromq.html [2] http://design.ros2.org/articles/ros_on_dds.html

---

atemerev 6 days ago [-]

Dbus is bloated hell. Whoever came with the idea "let's cram all communications from all sources into the single unified data stream, and let the clients fish what they need out of it" had the strange mapping of mental processes, to say the least. Most other forms of IPC are better (more scalable, more elegant, more comprehensible) — "everything is a file" is better, actor model is better, and I nearly think that even plain shared memory is better than a common bus.

There is a reason there is no "shared bus" in Internet communications.

hp 6 days ago [-]

> There is a reason there is no "shared bus" in Internet communications.

Yes, but dbus isn't _for_ Internet communications. It was designed to wire together the multiple processes that act more-or-less as a whole to implement a desktop environment.

"Better" is contextual. The main problems dbus solves aren't "IPC" at all - they are things like lifecycle tracking, service discovery, and getting events across the system/user-session security boundary.

dbus-broker looks interesting!

coldtea 6 days ago [-]

>Dbus is bloated hell. Whoever came with the idea "let's cram all communications from all sources into the single unified data stream, and let the clients fish what they need out of it" had the strange mapping of mental processes, to say the least.

Yeah, god forbid anyone attempts to unify similar concerns and do away with the mess of ad-hoc solutions that is POSIX/Linux.

bitwize 6 days ago [-]

Read the actual rationale behind dbus by the author before you make such comments on his sanity:

https://news.ycombinator.com/item?id=8648995

Dbus solves problems that the IPC methods you discussed do not. If there were a better solution, it would probably have been adopted by now.

hp 1010 days ago

parent

favorite

on: DBus, FreeDesktop?, and lots of madness

In creating dbus, I talked to a bunch of developers at KDE, specifically those who worked on Qt and DCOP, about their requirements. Then I met the requirements they said I had to meet for them to use dbus. I did the same for developers at GNOME. Because dbus met the requirements that the actual decision-making developers had, and allowed them to do useful things, it was adopted.

There was no mechanism to strongarm anybody into anything. KDE and GNOME devs told many ideas and many people to take a hike over the years. dbus was simply a matter of figuring out what those developers wanted and focusing on solving the actual problem, rather than hypothetical or philosophical problems.

When dbus was adopted, remember, people had already been down many roads; ad hoc IPC mechanisms, ad hoc communication through files and timestamps, hacks over X11 protocol, DCOP, multiple implementations of CORBA, SOAP, ICE (the old X-associated one), etc. People had wrestled with this problem space a lot and they had some pretty developed ideas about how to do things ideally. dbus was about coalescing those ideas into running code, and that was successful and stuck for a decade-plus now.

People sometimes have a "wtf" reaction coming from Internet protocols or kernel concerns, and while there are some wtf-worthy details in any piece of software, lots of the time people just don't understand the problem. Just as GNOME and KDE both took years to understand it and flailed around with all those protocols that didn't work out well.

toyg 1010 days ago [-]

The chronology I remember is a bit different (GNOME needed a replacement for CORBA and came up with DBUS; KDE already had DCOP; then Freedesktop happened, with big players pushing to reduce mismatches so that they could ship both environments with less hassle; and KDE scored some and GNOME scored some (more), with DBUS being one of them), but I was just a random observer so I might have got it wrong.

Still, it's a fact that DCOP back then felt much nicer and more humane (I wouldn't know about speed). I haven't used Linux in a while so I honestly don't know now.

atemerev 6 days ago [-]

The actual reasoning was "we need to get something to make desktop guys happy. Something is better than nothing. Dbus is something". And it is a valid reasoning, right. However, it left some legacy and can now be rethought.

FooBarWidget? 6 days ago [-]

You make it sound like D-Bus is a first attempt and even a hastily put together attempt to solve its target problem. It's not.[1] D-Bus is heavily influenced by (and intended to replace) DCOP, the communication system used by KDE 1 and KDE 2. DCOP was widely lauded as an extremely well-designed system.

[1] https://en.wikipedia.org/wiki/D-Bus#History_and_adoption

hp 6 days ago [-]

yep. as noted in https://news.ycombinator.com/item?id=8648995 it wasn't even just dcop - KDE had tried CORBA before switching to DCOP, and GNOME of course tried CORBA (two different ORBs), then tried Bonobo-on-top-of-CORBA, and SOAP. There were tons of documented protocols on top of X11 (many still in use today). And that's ignoring the countless ad-hoc solutions that various apps used...

Linux desktops are implemented as process swarms and communication among processes is one of the central things they have to deal with.

cjhanks 6 days ago [-]

It largely depends on the use case. dbus is functionally just a transport layer - it can very easily be used to implement an actor model if you choose to do-so (services can dynamically register and unregister channels).

There are quite a few cases where reliable 1-to-many and many-to-many communications need to occur. This is particularly the case when you have many loosely affiliated independent applications with optional communication paths. d-bus, for all of its flaws... does that well enough that I rarely notice it's running on my system.

---

"For example, I run murmur(mumble) servers sometimes, and they deprecated d-bus support for ZeroC? ICE (gplv2 or proprietary), but it seems almost as bloated if not more so. "

rleigh 6 days ago [-]

Ice is doing a lot more stuff than dbus, it's mainly features rather than bloat.

arca_vorago 5 days ago [-]

I def see a lot more features that actually work in ice, that's for sure.

baybal2 6 days ago [-]

As I remember from more than a decade ago, the selling point of DBUS was that they were not trying to design a high performance message bus with sophisticated work mechanisms in spirit of Corba and Bonobo, but a small, flexible, and utilitarian one.

Things like implicit message buffering were deliberate design decisions.

ajross 6 days ago [-]

IMHO the problem with D-Bus was that it was never small and utilitarian. They decided (correctly) to ignore all the engineering effort involved in performance and scalability, and put all that overengineering into the API instead.

D-Bus code is basically unreadable, as not only are the bus names heavily scoped (java-style) to avoid collisions, but also the interface and method names. A tiny python (or whatever) script to invoke a single method on a well-known object should be a one-liner but in practice lives over 6-7 lines just due to verbosity.

D-Bus types are inexplicably low-level for a "utilitarian" IPC mechanism, leading to a bunch of type conversion to do simple things, and a ton of marshalling code in the core. Javascript has shown us how far you can get with just IEEE doubles and UTF-8 strings, yet D-Bus suffers with a type model that looks more like C.

fiddlerwoaroof 6 days ago [-]

Yeah, I used to have a whole bunch of shell scripts to automate KDE 3 apps via dcop and then when dcop was dropped in favor of dbus, the complexity of the latter system discouraged me from porting the scripts.

Whatever technical limitations dcop may have had, its command line was amazing: space separated words and a emphasis on discoverability made it a joy to use

simcop2387 6 days ago [-]

qdbus is an almost adequate replacement for that. It's still more verbose and a bit more difficult to pass some arguments (this is all from memory) than dcop was, but it's servicable.

lima 6 days ago [-]

pydbus nicely abstracts away much of that: https://github.com/LEW21/pydbus

hp 6 days ago [-]

right, nobody was ever intended to use the raw protocol details directly... those details were intended to support an API that looked like using in-process objects (well, perhaps in-process objects with methods declared as 'async').

vidarh 6 days ago [-]

I wish they'd just build something simple, like AREXX. There's a reason AREXX ports were in almost every AmigaOS? app: It was trivially simple to get started. The network effects were huge - pretty much any app was simple to automate. People to some extent built their apps as message pumps where input events (mouse, keyboard) triggered the same command hooks as AREXX messages, so every little piece of the app was scriptable.

If you wanted a more advanced API than AREXX could reasonably accommodate, it was easy enough to layer the more complex bits next to it.

The threshold for people to take full advantage of DBus is still too high. Maybe there's a need for something that complex for inter-application communication, but if so we'd also benefit from something simpler.

Maybe it's just a documentation failure... I don't know.

---

bit_logic 23 hours ago [-]

Many are forgetting the initial reason node.js became popular. Consider the popular server-side landscape before node.js. It was dominated by Java, Python, etc. The primary way these ecosystems handle concurrency is OS-level threading. There was nothing else in the popular languages. Each language had some niche library that did non-blocking I/O (Twisted for Python, Netty for Java), but these all had a critical flaw, which is the rest of the ecosystem didn't support it. Basically every library, almost all existing code used the threading model. And when you mix threaded libraries with a non-blocking I/O server, it completely falls apart because the threaded code blocks.

---

"look into the actor model. Shared mutable state quickly becomes unmaintainanble. It used to be slow to do shared immutable state but that's no longer the case. Today there are many better ways to do shared memory with copy-on-write so data is only copied if it's changed. Think back on history - there are only a handful of computation approaches that have stood the test of time. Pipes in unix, spreadsheets, data management systems like MS Access or FileMaker?. Had concepts from those programs made it into programming, our lives would be a lot simpler today. We don't need to torture ourselves for imagined efficiency by copying the C threading model that quickly dissolves into spaghetti."

lomnakkus 1 hour ago [-]

The actor model is not the panacea it's often made out to be. (Though it's obviously better than shared-memory models for almost everything.). I'm sure you already know this, I'm just pointing it out explicitly.

For one, AFAICT very few implementations provide built-in back-pressure which is crucial if you want to avoid runaway resource consumption or having to just drop messages on the floor. This may not be critical for one's particular scenario, but it very often is for applications where you'd used threading and blocking queues (redezvous being a special case).

---

Animats 20 hours ago [-]

...

They have some good ideas. One is that variables can be marked as restricted to one thread. That should be the default. Other languages could benefit from that feature. Python, for example. If you want to get rid of the Global Interpreter Lock, knowing which variables can't be shared between threads is a big help.

Variables should be owned by a thread or owned by a lock. Rust went that way, and it works well. The proposal here is old C-style locking, where there are locks and variables, but the language doesn't know which locks are protecting which variables.

---

dbaupp 15 hours ago [-]

Rust is a relatively hands-off style of thread-safety: it protects against data races but doesn't try to get things like deadlock freedom that other more restrictive systems require. The benefits apply when using locks, or message passing or even raw atomics (and similarly to higher level libraries like data parallelism in rayon).

However, Rust is also explicitly designed to allow access to the "raw rope" when abstractions don't cut it (or when building the abstractions): that is what `unsafe` allows.

---

ender7 19 hours ago [-]

Please, no. Shared-by-default heap memory is one of the greatest mistakes ever made in language design. We need a better design justification than it's the easiest thing to implement right now.

imtringued 19 hours ago [-]

The justification is that when you have an array of 100000 elements and split it into into isolated chunks that can be processed without any synchronisation

...

Klathmon 12 hours ago [-]

Transferrable objects fit the bill for that perfectly. They are a 0-copy way to transfer data (currently only typed arrays are supported i believe) to and from web workers without the overhead of serialization.

No need for the mess that is (in my opinion) true shared memory.

spankalee 18 hours ago [-]

Fork/join on arrays and trees would be an easier to manage model, IMO.

kllrnohj 20 hours ago [-]

It'd also be nice if immutable objects could just be shared instead of copied or transferred instead of the current situation where they are copied and then made mutable.

alayek 19 hours ago [-]

> instead of the current situation where they are copied and then made mutable.

Could you share an example? As far as I understand, Immutable JS uses something like a trie for structural sharing and avoids memory leaks.

kllrnohj 18 hours ago [-]

When you send the result of Object.freeze() to or from a WebWorker? the object is still copied and comes out the other side as fully mutable (aka, not frozen).

Immutable JS is orthogonal in this case. That's "just" a fancy wrapper on top of a bunch of regular, mutable JS objects that makes it look immutable. It's not truly immutable to the runtime, and has no special interaction with webworkers as a result. It gets deep copied just like any other object.

amelius 20 hours ago [-]

Sorry, but we need shared memory for fast sharing of large immutable data structures.

spankalee 20 hours ago [-]

Then add immutable objects to JS, which enable a host of other optimizations and are useful in single-threaded contexts too.

---

stupidcar 19 hours ago [-]

You've got to love computer scientists:

"Hey, there's this word 'concurrent', which means to happen at the same time, what should it mean in our field?"

"How about, 'happening in independent processes'?"

"Great! There's this other word, 'parallel', that means coexist alongside but be structurally independent. What should that mean?"

"How about, 'happening at the same time'?"

"Fantastic! OK, let's go to the pub."

Hard to believe that people get confused.

---

skimmed this. i think i know all this and don't have to read it again, just putting it here just in case:

[5]

some useful comments on that article from HN:

 galaxyLogic 2 days ago [-]

One thing to note is that async programming like in JavaScript? is "threadless" programming which means all your data-structures are by definition thread-safe.

It's one less complication in your logic when you don't have to think "what if some other thread modifies this data while I am doing it?". And you never get hard-to-detect and hard-to-fix errors caused by that nor do you need to try to set up locks that might cause performance problems.

So some might say that threads and locks and synchronization primitives are the solution while others might say they are the problem.

mistercow 2 days ago [-]

Note though, that many of the problems that come up with threads can still come up with single-threaded async, especially when building a server. As soon as you await, any object visible to another promise chain can potentially change out from under you, and this can cause analogous problems to multithreading. One example where this can get hairy is if you have a database connection shared between two promise chains running "in parallel". You'll get undefined query ordering, which can cause very bizarre behavior.

Don't get me wrong: it's still easier. But there's an interesting property I've noticed of statements of the form "by definition, problem X is impossible when you have Y", which is that X-analogous problems absolutely can happen, and they are often more complicated to solve. For another example: "by definition, memory leaks are impossible when you have a garbage collector".

 he0001 1 day ago [-]

The first example in this article is misleading, imo, since the actual execution is synchronous, not asynchronous. The "programming style" is asynchronous, yes, but the execution will ALWAYS be the way its described since there are no other way it could be executed and therefore synchronous. Unless there are any way of things to happen out of order in relation to the execution flow, its effectively synchronous. If there is an possibility something can be executed out of order, its asynchronous. JS doesn't provide that because the event loop. It will always be executed in the order of the loop and that will be synchronous (even if and event re-queued because it's not done yet).

---

probably not too relevant as this is a networking server library, but:

https://github.com/tidwall/evio

"evio is an event loop networking framework that is fast and small. It makes direct epoll and kqueue syscalls rather than using the standard Go net package, and works in a similar manner as libuv and libevent."

" What this does instead is give a Go program direct access to the event loop. The benefit is that it bypasses all of the stuff that Go wraps around the internal event loop call that allows it to implement the way it offers a thread-like interface for you, and integrates with the channel and concurrency primitives, and maintains your position in the call stack between events, etc. The penalty is... the exact same thing, that you lose all the nice stuff that the Go runtime offers to you to implement the thread-like interface, etc., and are back to a lower-level interface that offers less services.

The performance of the Go runtime is "pretty good", especially by scripting language standards, but if you have sufficiently high performance requirements, you will not want to pay the overhead. The pathological case for all of these nice high-level abstractions is a server that handles a ton of network traffic of some sort and needs to do a little something to every request, maybe just a couple dozen cycle's worth of something, at which point paying what could be a few hundred cycles for all this runtime nice stuff that you're not using becomes a significant performance drain. Most people are not doing things where they can service a network request in a few dozen cycles, and the longer it takes to service a single request the more sense it makes to have a nice runtime layer providing you useful services, as it drops in the percentage of CPU time consumed by your program. For the most part, if you are so much as hitting a database over a network connection, even a local one, in your request, you've already greatly exceeded the amount of time you're paying to the runtime, for instance.

It does seem to me that a lot of people are a bit bedazzled by the top-level stuff that various languages offer, and forget that under the hood, everyone's using the event-based interfaces. What differs between Node and Twisted and all of the dozens or hundreds of other viable wrappers over these calls is the services automatically provided, not whether or not they are "event loops". Go is an event loop at the kernel level. Node is an event loop at the kernel level. Erlang is an event loop at the kernel level. They aren't all the same, but "event-based" vs. "not event-based" is not the distinction; it's a question of what they lay on top of the underlying event loop, not whether they use it. Even pure OS threads are, ultimately, event loops under the hood, just in the kernel rather than the user space. "

---

https://github.com/crossbeam-rs/crossbeam

---

https://github.com/rayon-rs/rayon

" You can use Rayon in two ways. Which way you will want will depend on what you are doing:

    Parallel iterators: convert iterator chains to execute in parallel.
    The join method: convert recursive, divide-and-conquer style problems to execute in parallel.

...

Parallel Iterators

Rayon supports an experimental API called "parallel iterators". These let you write iterator-like chains that execute in parallel. For example, to compute the sum of the squares of a sequence of integers, one might write:

use rayon::prelude::*; fn sum_of_squares(input: &[i32]) -> i32 { input.par_iter() .map(

i * i)

         .sum()}

Or, to increment all the integers in a slice, you could write:

use rayon::prelude::*; fn increment_all(input: &mut [i32]) { input.par_iter_mut() .for_each(

}

*p += 1);

...

Using join for recursive, divide-and-conquer problems

Parallel iterators are actually implemented in terms of a more primitive method called join. join simply takes two closures and potentially runs them in parallel. For example, we could rewrite the increment_all function we saw for parallel iterators as follows (this function increments all the integers in a slice):

/ Increment all values in slice. fn increment_all(slice: &mut [i32]) { if slice.len() < 1000 { for p in slice { *p += 1; } } else { let mid_point = slice.len() / 2; let (left, right) = slice.split_at_mut(mid_point); rayon::join(

increment_all(left),

increment_all(right));

}}

Perhaps a more interesting example is this parallel quicksort:

fn quick_sort<T:PartialOrd?+Send>(v: &mut [T]) { if v.len() <= 1 { return; }

    let mid = partition(v);
    let (lo, hi) = v.split_at_mut(mid);
    rayon::join(|| quick_sort(lo), || quick_sort(hi));}

Note though that calling join is very different from just spawning two threads in terms of performance. This is because join does not guarantee that the two closures will run in parallel. If all of your CPUs are already busy with other work, Rayon will instead opt to run them sequentially. The call to join is designed to have very low overhead in that case, so that you can safely call it even with very small workloads (as in the example above).

...

However, this safety does have some implications. You will not be able to use types which are not thread-safe (i.e., do not implement Send) from inside a join closure. Note that almost all types are in fact thread-safe in Rust; the only exception is those types that employ "interior mutability" without some form of synchronization, such as RefCell? or Rc. Here is a list of the most common types in the standard library that are not Send, along with an alternative that you can use instead which is Send (but which also has higher overhead, because it must work across threads):

    Cell -- replacement: AtomicUsize, AtomicBool, etc (but see warning below)
    RefCell -- replacement: RwLock, or perhaps Mutex (but see warning below)
    Rc -- replacement: Arc

However, if you are converting uses of Cell or RefCell?, you must be prepared for other threads to interject changes. For more information, read the section on atomicity below.

How it works: Work stealing

Behind the scenes, Rayon uses a technique called work stealing to try and dynamically ascertain how much parallelism is available and exploit it. The idea is very simple: we always have a pool of worker threads available, waiting for some work to do. When you call join the first time, we shift over into that pool of threads. But if you call join(a, b) from a worker thread W, then W will place b into its work queue, advertising that this is work that other worker threads might help out with. W will then start executing a. "

"Rayon is pretty cool. It's about as powerful as OpenMP?, and a bit easier to use." [6]

---

from HN discussion on https://www.robinwieruch.de/neural-networks-deeplearnjs-javascript/

nerfhammer 21 hours ago [-]

I was going to write a cynical complaint about how it's hardly useful without GPU support... but it's using WebGL? to hit the GPU. Of course. And it's probably a million times easier than trying install a TensorFlow? stack locally on your desktop.

rawnlq 21 hours ago [-]

Dumb question, but can someone give me a summary of how you can implement this in webgl?

I thought there are only vertex/fragment shaders and compute shaders aren't supported yet? Do you just pretend everything is pixel data?

nsthorat 20 hours ago [-]

Author of deeplearn.js here. A quick summary:

We store NDArrays as floating point WebGLTextures? (in rgba channels). Mathematical operations are defined as fragment shaders that operate on WebGLTextures? and produce new WebGLTextures?.

The fragment shaders we write operate in the context of a single output value of our result NDArray, which gets parallelized by the WebGL? stack. This is how we get the performance that we do.

---

some interesting stuff about what coroutines do in Python that we should offer analogs for in Oot:

[7]

---

"Feature 7: yield from

Instead of writing

for i in gen(): yield i

Just write

yield from gen()

Easily refactor generators into subgenerators.

" For simple iterators, yield from iterable is essentially just a shortened form of for item in iterable: yield item: >>>

>>> def g(x): ... yield from range(x, 0, -1) ... yield from range(x) ... >>> list(g(5)) [5, 4, 3, 2, 1, 0, 1, 2, 3, 4]

However, unlike an ordinary loop, yield from allows subgenerators to receive sent and thrown values directly from the calling scope, and return a final value to the outer generator: " [www.asmeurer.com/python3-presentation/slides.html]

---

lists of IPC mechanisms:

http://kamalmarhubi.com/blog/2015/06/01/a-list-of-linux-ipc-mechanisms/

    pipes
    FIFOs
    POSIX IPC (semaphores, message queues, shared memory)
    memfd
    UNIX domain sockets
    TCP/UDP sockets on loopback interface
    eventfd
    splice and friends
    signals
    the filesystem, including mmap‘ed files
    process_vm_readv and process_vm_writev
    ptrace with PTRACE_PEEK{TEXT,DATA,USER} and PTRACE_POKE{TEXT,DATA,USER}

(that link does not contain further explanation of the mechanisms)

http://www.chandrashekar.info/articles/linux-system-programming/introduction-to-linux-ipc-mechanims.html

    Signals
    Anonymous Pipes
    Named Pipes or FIFOs
    SysV Message Queues
    POSIX Message Queues
    SysV Shared memory
    POSIX Shared memory
    SysV semaphores
    POSIX semaphores
    FUTEX locks
    File-backed and anonymous shared memory using mmap
    UNIX Domain Sockets
    Netlink Sockets
    Network Sockets
    Inotify mechanisms
    FUSE subsystem
    D-Bus subsystem

(that link contains further explanation of each mechanism)

http://tldp.org/LDP/lpg/node7.html

    6.2 Half-duplex UNIX Pipes
    6.3 Named Pipes (FIFOs - First In First Out)
    6.4 System V IPC
        6.4.2 Message Queues
        6.4.3 Semaphores
        6.4.4 Shared Memory

(that link contains further explanation of each mechanism)

https://www.tldp.org/LDP/tlk/ipc/ipc.html

5.1 Signals 5.2 Pipes 5.3 Sockets 5.3.1 System V IPC Mechanisms 5.3.2 Message Queues 5.3.3 Semaphores 5.3.4 Shared Memory

(that link contains further explanation of each mechanism)

http://man7.org/conf/lca2013/IPC_Overview-LCA-2013-printable.pdf

communication:

Pipes
FIFOs
Pseudoterminals
Sockets
Stream vs Datagram (vs Seq. packet)
UNIX vs Internet domain
POSIX message queues
POSIX shared memory
POSIX semaphores
Named, Unnamed
System V message queues
System V shared memory
System V semaphores
Shared memory mappings
File vs Anonymous
Cross-memory attach
proc_vm_readv() / proc_vm_writev()

signals:

Signals
- Standard, Realtime

synchronization:

Eventfd
Futexes
Record locks
File locks
Mutexes
Condition variables
Barriers
Read-write locks

(that link contains further explanation of some mechanisms)

macos (macintosh) ipc mechanisms

https://books.google.com/books?id=K8vUkpOXhN4C&pg=PA1024&lpg=PA1024&dq=mac+ipc+mechanisms&source=bl&ots=OMinY_ZwYB&sig=zsWQPuAEROqMLvs726Ea_Y0U3pQ&hl=en&sa=X&ved=0ahUKEwiXl82FsrDZAhVQ52MKHeJoBUYQ6AEISDAE#v=onepage&q=mac%20ipc%20mechanisms&f=false

Mach IPC -- the lowest level IPC mechanism and the direct basis for many higher-level mechanisms
Mach exceptions
Unix signals
Unnamed pipes
Named pipes (fifos)
XSI/System V IPC
POSIX IPC
Distributed Objects
Apple Events
Various interfaces for sending and receiving notifications, such as notify(3) and kqueue(2)
Core Foundation IPC mechanisms

(that link contains further explanation of each mechanism)

https://www.slideshare.net/Hem_Dutt/ipc-on-mac-osx

message passing
synchronization
shared memory
remote procedure calls (RPC) ...
Shared File
Shared Memory
Mach Ports
Sockets
Apple Events
Distributed Notifications (NSNotification)
Pasteboard (copy-paste between applications)
Distributed Objects (Cocoa?)

(that link contains only a little further explanation of some mechanisms)

http://nshipster.com/inter-process-communication/

" ...a handful of overlapping, mutually-incompatible IPC technologies are scattered across various abstraction layers. Whereas all of these are available on OS X, only Grand Central Dispatch and Pasteboard (albeit to a lesser extent) can be used on iOS.[1]

Mach Ports
Distributed Notifications
Distributed Objects
AppleEvents? & AppleScript?
Pasteboard
XPC

(that link contains only a little further explanation of some mechanisms)

---

http://blog.bfitz.us/?p=2252

Interprocess Communication – Mac, Windows, Linux

" IPC on Mac

...

Sockets
Mach ports
Pipes
FIFOs
Shared memory

There is also XPC, but this requires you to refactor your code into XPC services, so this isn’t really IPC as generally talked about.

IPC on Windows

Data Copy (WM_COPYDATA)
File mapping
Named shared memory
Pipes
RPC
Windows Sockets

This ignores several older techniques such as Clipboard, COM, or DDE. Mailslots are technically IPC, but generally used for broadcast and best-effort instead of point-to-point guaranteed communication. "

(that link contains further explanation of each mechanism in the Windows section only)

IPC on windows:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365574(v=vs.85).aspx

Clipboard
COM
Data Copy
DDE
File Mapping
Mailslots
Pipes
RPC
Windows Sockets

(that link contains further explanation of each mechanism)

https://www.slideshare.net/mfsi_vinothr/ipc-mechanisms-in-windows

(discusses the same mechanisms as previous link, providing more concise further explanation of each mechanism)

https://www.codeproject.com/Articles/13724/Windows-IPC

message queues with WM_COPYDATA (not further discussed)
Clipboard (not further discussed)
COM/DCOM/RPC (not further discussed)
Shared Memory (memory mapped object).
Named Pipe.
WinSock?.
Mailslot.

synch objects:

Mutex
Semaphore
Events

(that link contains further explanation of each mechanism)

POSIX IPC: variant of System V IPC: messages, semphores, shared memory

https://stackoverflow.com/questions/4582968/system-v-ipc-vs-posix-ipc

https://en.wikipedia.org/wiki/Inter-process_communication

File
Signal
Socket
Unix domain socket
Message queue
Pipe
Named pipe
Shared memory
Message passing
Memory-mapped file

Examples of synchronization primitives:

Semaphore
Spinlock
Barrier
Mutual exclusion

Applications:

Java's Remote Method Invocation (RMI)
XML-RPC or SOAP
JSON-RPC
https://en.wikipedia.org/wiki/Windows_Communication_Foundation

Platform communication stack:

Operating system communication stack:

https://en.wikipedia.org/wiki/Apple_event
The Mach kernel's Mach Ports
Microsoft's ActiveX?, Component Object Model (COM), Microsoft Transaction Server (COM+), Distributed Component Object Model (DCOM), Dynamic Data Exchange (DDE), Object Linking and Embedding (OLE), anonymous pipes, named pipes, Local Procedure Call, MailSlots?, Message loop, MSRPC, .NET Remoting, and Windows Communication Foundation (WCF)
System V's message queues, semaphores, and shared memory
POSIX mmap, message queues, semaphores,[2] and shared memory
QNX's PPS (Persistent Publish/Subscribe) service
https://en.wikipedia.org/wiki/OpenBinder

https://en.wikipedia.org/wiki/Template:IPC

Methods: File

Memory-mapped file
Message passing
Message queue and mailbox
Named pipe
Anonymous pipe
Pipe
Semaphore
Shared memory
Signal
Sockets (Internet, Unix)

Protocols and standards:

Apple events
COM+
CORBA
D-Bus
DDS
DCE
ICE
OpenBinder?
ONC RPC
POSIX (various methods) SOAP REST
Thrift
TIPC
XML-RPC
D-bus
https://en.wikipedia.org/wiki/Libevent
SIMPL
LINX

ios ipc:

http://iphonedevwiki.net/index.php/IPC

CPDistributedMessagingCenter?
LightMessaging?
Unix sockets
XPC
Notifications
libobjcipc

http://nshipster.com/inter-process-communication/ "Whereas all of these are available on OS X, only Grand Central Dispatch and Pasteboard (albeit to a lesser extent) can be used on iOS.[1]"

https://stackoverflow.com/questions/26373297/is-inter-process-communication-possible-between-ios-applications-using-sockets

" iOS8 introduced IPC support by exposing mach ports for so called "application groups". Check out this great tutorial:

http://ddeville.me/2015/02/interprocess-communication-on-ios-with-mach-messages "

http://ddeville.me/2015/02/interprocess-communication-on-ios-with-berkeley-sockets "With iOS 8, Apple introduced App Extensions. App Extensions are self-contained apps that developers can ship along with their main application"

https://developer.apple.com/library/content/documentation/iPhone/Conceptual/iPhoneOSProgrammingGuide/Inter-AppCommunication/Inter-AppCommunication.html

"Apps communicate only indirectly with other apps on a device. You can use AirDrop? to share files and data with other apps. You can also define a custom URL scheme so that apps can send information to your app using URLs. Note: You can also send files between apps using a UIDocumentInteractionController? object or a document picker"

https://academy.realm.io/posts/thomas-goyne-fast-inter-process-communication/

https://www.safaribooksonline.com/library/view/ios-application-security/9781457198830/ch08.html "

8INTERPROCESS COMMUNICATION

Interprocess communication (IPC) on iOS is, depending on your perspective, refreshingly simple or horribly limiting. I mostly consider it to be the former. While Android has flexible IPC mechanisms such as Intents, Content Providers, and Binder, iOS has a simple system based on two components: message passing via URLs and application extensions. The message passing helps other applications and web pages invoke your application with externally supplied parameters. Application extensions are intended to extend the functionality of the base system, providing services such as sharing, storage, and the ability to alter the functionality of the Today screen or keyboard. "

android ipc:

https://stackoverflow.com/questions/5740324/what-are-the-ipc-mechanisms-available-in-the-android-os

binders or Messengers with a Service
intents (with Bundles included)
ASHMEM (Anonymous Shared Memory) - "The main difference between Linux shared memory and this shared memory is, in Linux other processes can't free the shared memory but here if other processes require memory this memory can be freed by Android OS." "Only uses ASHMEM if you know what you're doing. The compatibility between different android versions is not"
UNIX sockets and pipes
BroadcastReceivers?
ContentProviders?

QNX: http://www.qnx.com/developers/docs/6.5.0/index.jsp?topic=%2Fcom.qnx.doc.neutrino_sys_arch%2Fipc.html

http://www.qnx.com/developers/docs/6.5.0/index.jsp?topic=%2Fcom.qnx.doc.neutrino_lib_ref%2Fm%2Fmsgsend.html

Synchronous message passing
Message copying
Simple messages
Channels and connections
Message-passing API
Robust implementations with Send/Receive/Reply
Events
Signals
POSIX message queues
Shared memory
Typed memory
Pipes and FIFOs "

" QNX Neutrino offers at least the following forms of IPC: Service:

Message-passing
Signals
POSIX message queues
Shared memory
Pipes
FIFOs "

---

the (sorta) union of methods mentioned in the previous section:

ios message passing via URLs
android binders or Messengers with a Service
android intents (with Bundles included)
android ASHMEM (Anonymous Shared Memory) - "The main difference between Linux shared memory and this shared memory is, in Linux other processes can't free the shared memory but here if other processes require memory this memory can be freed by Android OS." "Only uses ASHMEM if you know what you're doing. The compatibility between different android versions is not"
UNIX sockets
UNIX pipes
android BroadcastReceivers?
android ContentProviders?