That said, the last time I had to do this in software, I used a Taylor series. It might not have been the optimal solution.
EDIT:
AMD's Zen 4 takes 50-200 cycles of latency to compute sine (the x87 FSIN instruction). I think that wide, input-dependent range strongly suggests AMD uses CORDIC. https://www.agner.org/optimize/instruction_tables.pdf page 130.
Same for Intel: Tiger Lake (11th gen) has 60-120 cycles of latency. Page 353.
I'd guess it's usually ~50 cycles on Zen 4 (and ~60 on Intel), regardless of whether the operand is float32, float64, or float80. Denormals might also cost extra cycles.