Perhaps the rule in the standard is simple (the compiler may evaluate intermediate results at finer precision than the IEEE type calls for), but in practice it's complicated: the same code can behave quite differently depending on which chip it's compiled for, the optimization level, and other factors. If you want to control it, i.e. model it as something other than nondeterminism, figuring out the right combination of compiler flags and so on is tricky.
I'll also point out that FMA is relatively new, so it's pretty easy to write code that works fine when compiled for default x86_64/SSE2 but breaks when compiled for a more recent target CPU, where the compiler may fuse a multiply and an add into a single operation with only one rounding.