These compute units are usually sliced: they can perform either four FP32 multiplies or one FP64 multiply on the same part of the die. The trick is old, going back at least to when PA-RISC was developed; from what I remember, it was HP who introduced the sliced ALU, capable of doing one large operation or several smaller ones on the same hardware.
I may be wrong about who did it first, but most FPUs are built that way now.
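To make the slicing idea concrete, here is a rough sketch in Python of why one wide multiply and four narrow multiplies can share the same hardware: a wide product decomposes into four narrow partial products. This is only an integer analogy, not how any particular FPU is laid out. A real FP unit works on mantissas and also handles exponents and rounding, and the names `mul32`, `mul64_from_slices` and `four_narrow_mults` are made up for the illustration.

```python
MASK32 = (1 << 32) - 1  # low 32 bits


def mul32(a, b):
    """One hypothetical 'narrow' slice: a 32x32 -> 64-bit multiplier."""
    assert 0 <= a <= MASK32 and 0 <= b <= MASK32
    return a * b


def mul64_from_slices(x, y):
    """One wide 64x64 multiply built from four narrow slices.

    Each operand is split into a high and a low 32-bit half; the four
    partial products are shifted into place and added.
    """
    x_lo, x_hi = x & MASK32, x >> 32
    y_lo, y_hi = y & MASK32, y >> 32
    ll = mul32(x_lo, y_lo)
    lh = mul32(x_lo, y_hi)
    hl = mul32(x_hi, y_lo)
    hh = mul32(x_hi, y_hi)
    return ll + ((lh + hl) << 32) + (hh << 64)


def four_narrow_mults(pairs):
    """The same four slices used independently: four unrelated narrow products."""
    assert len(pairs) == 4
    return [mul32(a, b) for a, b in pairs]


if __name__ == "__main__":
    x, y = 0x123456789ABCDEF0, 0x0FEDCBA987654321
    print(mul64_from_slices(x, y) == x * y)
    print(four_narrow_mults([(1, 2), (3, 4), (5, 6), (7, 8)]))
```

In the wide mode the four slice outputs are combined by the shift-and-add step; in the narrow mode that combining step is simply bypassed and each slice delivers its own result.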