proj-oot-ootConcurrencyNotes7

great article on Julia GPU programming:

[1]

some notes:

  1. y's 1st dimension gets repeated along x's 2nd dimension
  2. and the scalar z gets repeated across all dimensions
  3. `x .+ y .+ z` is equivalent to `broadcast(+, broadcast(+, x, y), z)`
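The same expansion rules hold in NumPy, so a quick CPU-side sketch of the behavior described above (the array values here are just illustrative):

```python
import numpy as np

x = np.arange(6).reshape(3, 2)    # shape (3, 2)
y = np.array([[10], [20], [30]])  # shape (3, 1): its single column is
                                  # repeated along x's 2nd dimension
z = 100                           # scalar: repeated for all elements

# Analogue of Julia's `x .+ y .+ z`, i.e.
# broadcast(+, broadcast(+, x, y), z)
result = x + y + z
print(result)
# [[110 111]
#  [122 123]
#  [134 135]]
```

(NumPy evaluates each `+` eagerly, producing intermediates; Julia's dot syntax fuses the whole expression into one loop, which is the point of the blog post linked below.)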

more: https://julialang.org/blog/2018/05/extensible-broadcast-fusion

https://julia.guide/broadcasting

    Conversions and copy! to CPU arrays
    multi-dimensional indexing and slicing (xs[1:2, 5, :])
    permutedims
    Concatenation (vcat(x, y), cat(3, xs, ys, zs))
    map, fused broadcast (zs .= xs.^2 .+ ys .* 2)
    fill(CuArray, 0f0, dims), fill!(gpu_array, 0)
    Reduction over dimensions (reduce(+, xs, dims = 3), sum(x -> x^2, xs, dims = 1))
    Reduction to scalar (reduce(*, xs), sum(xs), prod(xs))
    Various BLAS operations (matrix*matrix, matrix*vector)
    FFTs, using the same API as Julia's FFT " (note: lots of hyperlinks in there in the original)
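For reference, NumPy analogues of a few of the list's reduction and fused-broadcast entries (the Julia spellings are from the quote; the NumPy ones are just the closest CPU-side equivalents, with illustrative data):

```python
import numpy as np

xs = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
ys = np.ones_like(xs)

# Reduction over a dimension, cf. reduce(+, xs, dims = 3):
dim_sums = xs.sum(axis=2)        # shape (2, 3)

# Sum of a function's values over a dimension,
# cf. sum(x -> x^2, xs, dims = 1):
sq_sums = (xs ** 2).sum(axis=0)  # shape (3, 4)

# Reduction to scalar, cf. sum(xs):
total = xs.sum()                 # 276.0 (= 0 + 1 + ... + 23)

# Broadcast analogue of zs .= xs.^2 .+ ys .* 2
# (again, NumPy materializes intermediates where Julia fuses):
zs = xs ** 2 + ys * 2
```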

---

" I had some extended notes here about "less-mainstream paradigms" and/or "things I wouldn't even recommend pursuing", but on reflection, I think it's kinda a bummer to draw too much attention to them. So I'll just leave it at a short list: actors, software transactional memory, lazy evaluation, backtracking, memoizing, "graphical" and/or two-dimensional languages, and user-extensible syntax. If someone's considering basing a language on those, I'd .. somewhat warn against it. Not because I didn't want them to work -- heck, I've tried to make a few work quite hard! -- but in practice, the cost:benefit ratio doesn't seem to turn out really well. Or hasn't when I've tried, or in (most) languages I've seen. " [2]

---

" Heterogeneous memory and parallelism

These are languages that try to provide abstract "levels" of control flow and data batching/locality, into which a program can cast itself, to permit exploitation of heterogeneous computers (systems with multiple CPUs, or mixed CPU/GPUs, or coprocessors, clusters, etc.)

Languages in this space -- Chapel, Manticore, Legion -- haven't caught on much yet, and seem to be largely overshadowed by manual, not-as-abstract or not-as-language-integrated systems: either cluster-specific tech (like MPI) or GPU-specific tech like OpenCL/CUDA. But these still feel clunky, and I think there's a potential for the language-supported approaches to come out ahead in the long run. " [3]