I'll describe what we've got, but fair warning that I don't know how the write-pixels-to-the-screen side of GPUs works. There are some instructions with weird names that I assume make sense in that context. Presumably one allocates memory and writes to it in some fashion.
LLVM libc is picking up capability over time, implemented similarly to the non-GPU architectures. The same tests run on x64 or the GPU, printing to stdout as they go. Hopefully standing up libc++ on top will work smoothly. It's encouraging that I sometimes struggle to remember whether a test is currently running on the host or the GPU.
The data structure that libc uses to have x64 call a function on amdgpu, or to have amdgpu call a function on x64, is mostly a blob of shared memory and careful atomic operations. That was originally general purpose and lived in a prototype-quality GitHub repository. It's currently specialised to libc. It should end up in an under-debate llvm/offload project, which will make it easily reusable again.
This isn't quite decoupled from vendor stuff. The GPU driver needs to be running in the kernel somewhere. On nvptx, we make a couple of calls into libcuda to launch main(). On amdgpu, it's a couple of calls into libhsa. I did have an OpenCL loader implementation as well, but that has probably rotted; Intel seems to be on that stack but isn't in LLVM upstream.
A few GPU projects have noticed that implementing a CUDA layer and a SPIR-V layer and an HSA or HIP layer and whatever others is quite annoying. Possibly all GPU projects have noticed that. We may get an llvm/offload library that successfully abstracts over those, which would let people allocate memory, launch kernels, use arbitrary libc stuff and so forth running against that library.
That's all from the compute perspective. It's possible I should look up what sending numbers over HDMI actually involves. I believe the GPU is happy interleaving compute and graphics kernels, and I suspect they're very similar things in the implementation.