Yes, good point, I did wonder about this. It's easy to look at the individual operations and think "that's cheap", but taken as a whole the GCD, which I assume is expensive, could cancel out any performance benefit. It might be possible to optimise it so the GCD isn't run unnecessarily after _every_ operation, though?
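To make the "not after every operation" idea concrete, here is a rough sketch of what I mean, assuming fractions are plain `(numerator, denominator)` tuples with a positive denominator. The names `add`, `reduce_fraction` and `LIMIT` are made up for illustration, and the threshold is an arbitrary stand-in for "about to overflow a fixed-width int":

```python
from math import gcd

# Hypothetical threshold: only pay for the GCD once a term gets this large.
LIMIT = 1 << 31


def reduce_fraction(n, d):
    """Divide numerator and denominator by their GCD."""
    g = gcd(n, d)
    return n // g, d // g


def add(a, b, limit=LIMIT):
    """Add two fractions, reducing only when a term reaches the limit."""
    n = a[0] * b[1] + b[0] * a[1]
    d = a[1] * b[1]
    if abs(n) >= limit or d >= limit:
        n, d = reduce_fraction(n, d)
    return n, d


print(add((1, 2), (1, 2)))           # left unreduced: (4, 4)
print(add((1, 2), (1, 2), limit=1))  # forced reduction: (1, 1)
```

The catch is that unreduced values are no longer canonical, so comparing two fractions would need cross-multiplication (or a reduce first), and whether this actually wins overall probably depends on the workload.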
Perhaps then the only advantage in terms of hardware is that it can be implemented on a CPU with only integer arithmetic, e.g. small microcontrollers with no FPU or integer divide. I think implementing a soft FPU would be impractical by comparison, though that's still a niche case I guess.
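For what it's worth, the GCD itself doesn't need a divide instruction. A minimal sketch of the binary GCD (Stein's algorithm), which uses only shifts, subtraction and comparisons, might look like this. It's in Python purely for illustration (on a real microcontroller it would presumably be C or assembly), and `binary_gcd` is just my name for it:

```python
def binary_gcd(a, b):
    """GCD of two non-negative ints using only shifts, subtraction and compares."""
    if a == 0:
        return b
    if b == 0:
        return a

    # Strip the factors of two common to both values and remember how many.
    shift = 0
    while ((a | b) & 1) == 0:
        a >>= 1
        b >>= 1
        shift += 1

    # Make a odd; any remaining factors of two in a are not shared with b.
    while (a & 1) == 0:
        a >>= 1

    while b != 0:
        # Remove factors of two from b; a is odd, so they cannot be common.
        while (b & 1) == 0:
            b >>= 1
        # Keep a <= b so the subtraction stays non-negative.
        if a > b:
            a, b = b, a
        b -= a

    # Restore the common factors of two.
    return a << shift


print(binary_gcd(48, 18))  # 6
```

This only covers finding the GCD. Actually reducing the fraction still means dividing the numerator and denominator by the result, so a software divide routine of some kind would still be needed for that step.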