1. Output framebuffer.
2. Polygon rasterization (often limited to points and triangles).
3. Texture sampling. This is accessible in CUDA (I have ~zero experience with other GPGPU systems).
4. Blending too, afaik, though this might no longer be the case.
5. Video codec decoding (MPEG-2, H.263, H.264, H.265, VP8, VP9, soon AV1), and also encoding for some of them.
Nvidia RTX cards also include ray tracing hardware that handles that task more efficiently (I presume by using fixed logic for dispatching memory/cache-aware computations, e.g. content-addressable memory and such).
Most other things are handled by the shader cores. On Nvidia these are 1024-bit SIMD units with lane masking until Volta, and a more flexible/arbitrary fork/join model since Turing (not all Turing chips have the ray tracing hardware). Turing also brought a scalar execution port with it (as if amd64 had gained the traditional RAX/RDX/etc. registers and their opcodes after starting out with only AVX instructions). AMD GCN, afaik, has a quite explicit SIMD architecture and has had a scalar execution port since its inception. Also 1024-bit, iirc.
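
To make "SIMD with lane masking" a bit more concrete, here is a toy model in Python. It is only a sketch of the general idea, not how any particular GPU implements it: I'm assuming a vector of 32 lanes (32 lanes of 32-bit values gives the 1024 bits mentioned above), and `WARP_SIZE` and `run_branch_masked` are names I made up for illustration. A branch is handled by issuing each side once for the whole vector, with a per-lane mask deciding which lanes actually commit a result.

```python
WARP_SIZE = 32  # assumption: 32 lanes x 32-bit values = 1024 bits per vector


def run_branch_masked(values):
    """Toy model of `if v is odd: v = 3*v + 1 else: v = v // 2` on one vector.

    Returns the per-lane results and the number of passes issued.
    """
    assert len(values) == WARP_SIZE
    out = list(values)

    # The condition is evaluated for all lanes and yields one mask bit per lane.
    mask = [v % 2 == 1 for v in values]
    passes = 0

    # "Then" side: issued once if at least one lane needs it.
    # Lanes whose mask bit is clear sit idle and keep their old value.
    if any(mask):
        passes += 1
        for lane in range(WARP_SIZE):
            if mask[lane]:
                out[lane] = 3 * values[lane] + 1

    # "Else" side: issued once with the inverted mask, if any lane needs it.
    if not all(mask):
        passes += 1
        for lane in range(WARP_SIZE):
            if not mask[lane]:
                out[lane] = values[lane] // 2

    return out, passes


# Mixed odd/even inputs: both sides of the branch are issued.
divergent_out, divergent_passes = run_branch_masked(list(range(WARP_SIZE)))

# All-even inputs: every lane agrees, so only the "else" side is issued.
uniform_out, uniform_passes = run_branch_masked([8] * WARP_SIZE)
```

In this model a vector whose lanes disagree on the branch pays for both sides, while a vector whose lanes all agree pays for one. That is the cost lane masking trades for keeping a single instruction stream.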