I'm happy that Julia supports GPU programming for simple code, but I don't see how you would implement algorithms that need inter-thread communication.
https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/lin...
Or directly with CUDA shfl intrinsics: https://github.com/JuliaGPU/CUDAnative.jl/blob/b249dfd145501...
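For anyone unfamiliar with shuffles, here is a rough CPU simulation in Python of the idea. It is not the CUDAnative.jl API. It shows the classic warp-level sum reduction built on "shuffle down" semantics, where every lane reads a register from the lane `delta` positions above it in a single step, with no shared memory involved. It assumes a 32-lane warp. It also assumes that lanes whose source would fall out of range keep their own value, which is how CUDA's `__shfl_down_sync` behaves. The names `WARP_SIZE`, `shfl_down` and `warp_reduce_sum` are hypothetical helpers made up for this illustration.

```python
# CPU simulation (hypothetical helpers) of a warp-level sum reduction
# using "shuffle down" semantics. Assumes a 32-lane warp.

WARP_SIZE = 32

def shfl_down(vals, delta):
    """Simulate one shuffle step: lane i reads lane i + delta's value.

    Lanes whose source lane is out of range keep their own value,
    mirroring CUDA's __shfl_down_sync (no wrap-around).
    """
    n = len(vals)
    return [vals[i + delta] if i + delta < n else vals[i] for i in range(n)]

def warp_reduce_sum(vals):
    """Tree reduction: after log2(WARP_SIZE) shuffle steps, lane 0 holds the sum."""
    assert len(vals) == WARP_SIZE
    vals = list(vals)
    offset = WARP_SIZE // 2
    while offset > 0:
        # All lanes exchange values "at once" (one simulated shuffle instruction).
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    # Only lane 0 is guaranteed to hold the full total; other lanes hold partials.
    return vals[0]

print(warp_reduce_sum(list(range(WARP_SIZE))))  # 0 + 1 + ... + 31 = 496
```

Only five shuffle steps (offsets 16, 8, 4, 2, 1) are needed for 32 lanes. That is why shuffle-based reductions avoid round-trips through shared memory and explicit barriers within a warp.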