There's one for PyTorch; I tested it about a year ago. You have to compile it from source, and IIRC it translates/compiles CUDA to ROCm at runtime, which causes noticeable pauses on the first run. There may be other tweaks you have to do too. Once set up, it performs decently, though.
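If you want to see whether you're hitting that first-run pause, a rough sketch like the one below should show it: time the same GPU op a few times and compare the first call to the rest. The helper names (`time_calls`, `describe_backend`, `gpu_matmul`) and the 1024x1024 matrix size are just my picks for illustration, not anything from the ROCm port itself. As far as I know, ROCm builds of PyTorch set `torch.version.hip` and still use the `"cuda"` device name.

```python
import time


def time_calls(fn, repeats=3):
    """Return wall-clock seconds for each of `repeats` consecutive calls to fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def describe_backend():
    """Report which GPU backend the installed PyTorch build targets, if any."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip = getattr(torch.version, "hip", None)
    if hip:
        return f"ROCm/HIP {hip}"
    cuda = getattr(torch.version, "cuda", None)
    if cuda:
        return f"CUDA {cuda}"
    return "CPU-only build"


def gpu_matmul():
    """Hypothetical workload: one matrix multiply on the GPU."""
    import torch
    # ROCm builds also expose the GPU under the "cuda" device name.
    x = torch.randn(1024, 1024, device="cuda")
    # .item() copies the result to the host, so the GPU work must finish
    # before the timer stops.
    (x @ x).sum().item()


if __name__ == "__main__":
    print(describe_backend())
    try:
        import torch
        has_gpu = torch.cuda.is_available()
    except ImportError:
        has_gpu = False
    if has_gpu:
        for i, seconds in enumerate(time_calls(gpu_matmul)):
            print(f"call {i}: {seconds:.3f}s")
```

If the first call is much slower than the later ones, that's the warm-up cost; the later numbers are closer to what you'd actually get in steady use.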