notes-computer-programming-programmingLanguageDesign-prosAndCons-dataParallelism

seanmcdirmid 1 day ago

link

Erlang is not designed for parallel programming; it is designed for concurrent programming. These are two very different programming domains with different problems.

Every time someone conflates parallelism with concurrency...everyone gets very confused.

reply

unoti 23 hours ago

link

Isn't it fair to say that it's designed for both? The way it uses immutable state and something-similar-to-s-expressions to express data makes it very straightforward (or even transparent) to distribute work between multiple processes and separate computers, in addition to making it practical and simple to break work into small chunks that can be interleaved easily within the same thread. It's really designed for doing both very well, wouldn't you say?

reply

seanmcdirmid 23 hours ago

link

Not at all. Erlang isn't useful for modern parallel computing as we know it, which is usually done as some kind of SIMD program, say MapReduce or GPGPU using something like CUDA. The benefit doesn't come just from operating on all the data at once; these systems (or the programmer) also do a lot of work to optimize the I/O and cache characteristics of the computation.

Actor architectures are only useful for task parallelism, which no one really knows how to get much out of; definitely not the close-to-linear performance benefits we can get from data parallelism. Task parallelism is much better suited to doing multiple things at once (more efficient concurrency), not to making a sequential task faster.

Maybe this will help

http://jlouisramblings.blogspot.com/2011/07/erlangs-parallel...

and

https://news.ycombinator.com/item?id=2726661

reply
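
(Note to self: a minimal Scala sketch of the distinction being drawn above, data parallelism vs. task parallelism. Assumes Scala 2.12-era collections where .par is built in; on 2.13+ it needs the separate scala-parallel-collections module. The pixel/count names are made up for illustration.)

    // Data parallelism: the SAME operation applied across a large dataset.
    // Work splits evenly across cores and can scale close to linearly,
    // as long as memory bandwidth keeps up.
    val pixels: Array[Double] = Array.fill(1 << 20)(scala.util.Random.nextDouble())
    val brightened = pixels.par.map(p => math.min(1.0, p * 1.2))

    // Task parallelism (actor/future style): DIFFERENT activities overlapped.
    // Good for doing several things at once, not for making one job N times faster.
    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration._
    import scala.concurrent.ExecutionContext.Implicits.global

    val countBright = Future { pixels.count(_ > 0.5) } // one kind of work
    val total       = Future { pixels.sum }            // another kind of work
    val (hits, sum) = Await.result(countBright.zip(total), 1.minute)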

reeses 7 hours ago

link

SIMD is a specialized form of parallelism. It is not the only definition of the term.

It should also be clear that task parallelism (or concurrency from your perspective) has not had the benefit of billions of engineer-hours focused on improving its performance. It is within recent memory that if you wanted 20+ CPUs at your disposal, you'd have to build a cluster with explicit job management, topologically-optimized communications, and a fair amount of physical redundancy.

As many of the applications requiring low-end clusters tended to involve random numbers or floating-point calculations, we also had the annoyance of minor discrepancies such as clock drift affecting the final output. This would show up, for example, as a proportional percentage of video frames with conspicuously different coloration.

seanmcdirmid 4 hours ago

link

Task parallelism was something we used to work on 20 years ago, when we thought it was the solution to scaling. But then we found that the supercomputer people were right all along: the only thing that really scales well is data parallelism. So the focus over the last 5-10 years has been on finding data-parallel solutions to the problems we care about (say, deep neural network training) and then mapping them to either a distributed pipeline (MapReduce) or a GPU solution.

> It is within recent memory that if you wanted 20+ CPUs at your disposal, you'd have to build a cluster with explicit job management, topologically-optimized communications, and a fair amount of physical redundancy.

You are still thinking about concurrency, not parallelism. Yes, the cluster people had to think this way; they were interested in performance when processing many jobs. No, the HPC people who needed performance never thought like this; they were only interested in the performance of a single job.

> As many of the applications requiring low-end clusters tended to involve random numbers or floating point calculations, we also had the annoyance of minor discrepancies such as clock drift affecting the final output.

Part of the problem, I think, is that we've been confused for a long time. Our PHBs saw problems (say, massive video frame processing) and reached for solutions that were completely inappropriate for them (cluster computing). It's only recently that we've realized there are often other/better options (like running MapReduce on that cluster).

reply
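
(Note to self: a toy, in-memory sketch of the MapReduce shape referenced above. Not Hadoop or Google's system, just the data-parallel structure that lets the same word-count program scale from one core to a cluster. The input strings are made up.)

    // map: each line is processed independently -> (word, 1) pairs
    // shuffle: group the pairs by key
    // reduce: each key is summed independently of the others
    val lines  = Seq("a b a", "b c", "a")
    val mapped = lines.flatMap(_.split("\\s+").map(word => (word, 1)))
    val counts = mapped.groupBy(_._1).map { case (word, pairs) =>
      (word, pairs.map(_._2).sum)
    }
    // counts == Map("a" -> 3, "b" -> 2, "c" -> 1)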

seanmcdirmid 15 hours ago

link

Yes, Erlang is great for concurrency, and GPUs are great for significant, scalable parallelism. They solve different problems, I agree, and that's my point.

reply

seanmcdirmid 4 hours ago

link

> but MIMD is more easily applied. Almost any application can be refactored to spawn lightweight threads for many calculations without any explicit knowledge of the π-calculus.

MIMD hasn't been shown to scale, and it's not locking that is the problem, but I/O and memory.

> To your point and for a well-understood example, make -j doesn't always result in faster compilations. It may if you have the ability to leverage imbalances in CPU and storage hierarchies, but you can also kill your performance with context switches (including disk seeking).

When Martin Odersky began pushing actors as the solution for Scala and multi-core, this was my immediate thought: the Scala compiler is slow; can this make it go faster? It was pretty obvious after some thought that the answer was no. But we have no way yet of casting compilation as a data-parallel task (a point in favor of task parallelism, though it doesn't help us much).

reply
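
(Note to self: a sketch, in Scala futures rather than actors, of why "just spawn tasks" doesn't automatically make a compiler-shaped job faster: the per-file passes can run in parallel, but a global pass still serializes things, and in practice I/O and memory traffic, not locks, set the ceiling. All names here are hypothetical.)

    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration._
    import scala.concurrent.ExecutionContext.Implicits.global

    case class Parsed(file: String)
    def parse(file: String): Parsed            = Parsed(file) // per-file, parallelizable
    def resolve(all: Seq[Parsed]): Seq[Parsed] = all          // global, effectively serial
    def emit(p: Parsed): Unit                  = ()           // per-file, parallelizable

    val files    = Seq("A.scala", "B.scala", "C.scala")
    val parsed   = Await.result(Future.traverse(files)(f => Future(parse(f))), 1.minute)
    val resolved = resolve(parsed)                            // the serial bottleneck
    Await.result(Future.traverse(resolved)(p => Future(emit(p))), 1.minute)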

(after a post in which seanmcdirmid asserts that MIMD (task parallelism) doesn't scale and SIMD (data parallelism) does.)

seanmcdirmid 4 hours ago

link

My main issue here is that when people hear "parallelism" and "a lot faster", they automatically think "scaling." But hardware threading alone doesn't get us anywhere near that goal, even if we write our C-style multi-threaded code by hand very carefully.

The PL community is still not having honest up-to-date conversations about parallelism; they are about 20 years behind other fields.

reply

willvarfar 3 hours ago

link

Ok, you have to describe what you think the honest and up-to-date conversation about parallelism is then.

reply

seanmcdirmid 3 hours ago

link

Well, look at how people like Google's Jeff Dean, originally a PL person, became a systems person to attack parallelism problems head on. That is, look at the problems that NEED parallel computing; don't think of parallelism as a transparent benefit that is nice to have if it happens and no great loss if it doesn't.

Once you accept that parallelism is needed, you realize that it is much more complex than just dividing things up onto multiple cores. Locking is never really the big problem (that is really a concurrency problem); instead, the problem becomes all about pumping data to the right place at the right time.

reply
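
(Note to self: a small Scala sketch of "pumping data to the right place at the right time". The same arithmetic runs at very different speeds depending only on whether the traversal order matches the memory layout; the size and the actual speed gap depend on your machine's caches, so measure rather than trusting the comments.)

    val n = 4096
    val grid = Array.ofDim[Double](n, n) // array of rows; each grid(i) is contiguous

    def sumRowMajor(): Double = {        // walks memory sequentially: cache-friendly
      var s = 0.0; var i = 0
      while (i < n) { var j = 0; while (j < n) { s += grid(i)(j); j += 1 }; i += 1 }
      s
    }

    def sumColMajor(): Double = {        // strides across rows: cache-hostile
      var s = 0.0; var j = 0
      while (j < n) { var i = 0; while (i < n) { s += grid(i)(j); i += 1 }; j += 1 }
      s
    }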