Plus, x86-64 basically disables both of the things you list. Not that it matters: what is a flat FS/GS base register next to a whole extra level of page tables for the hypervisor? In other words, you do the translation once and cache it in a TLB. If you really want to compare this, time how long a modern x86 takes on TLB misses, or for that matter how fast its TLBs are. I think you will find that they are industry leading...
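For a rough sense of scale on the hypervisor page-table point, here is a back-of-the-envelope sketch of the worst-case memory references for a single TLB miss, with and without nested paging. It assumes a plain radix walk with no page-walk caches and no large pages, which real parts do not match exactly, and the function names are made up for illustration.

```python
def native_walk_refs(levels: int) -> int:
    """Memory references for one TLB miss with a plain radix page table."""
    return levels


def nested_walk_refs(guest_levels: int, host_levels: int) -> int:
    """Worst-case memory references for one TLB miss under nested paging.

    Assumption: no page-walk caches, no large pages.
    Every guest-physical address touched during the guest walk (one per
    guest level, plus the final data address) needs its own full host walk,
    and each guest level also costs one read of the guest entry itself.
    """
    host_walks = guest_levels + 1
    return host_walks * host_levels + guest_levels


if __name__ == "__main__":
    print("native 4-level:", native_walk_refs(4))
    print("nested 4x4:    ", nested_walk_refs(4, 4))
```

With 4-level guest and host tables that works out to 24 references against 4 natively, which is why a flat segment base is noise by comparison.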
Same basic thing for the x87: it's likely mostly powered down, and when active it is probably feeding micro-ops through an SSE functional unit...
So the original poster's comment is likely correct, and that has been known for a decade or more. If anything, x86 has a few accidental advantages, and the idea that it's somehow "worse" than the alternatives is provably wrong.
You should think about ISA implementations as verification problems rather than as problems of building silicon for the ISA. From that perspective it should be obvious that Intel has the best, richest, deepest verification set that exists in the CPU space, and since it is tied to x86, that is an advantage for the two x86 vendors. Verification is incredibly hard for complex CPUs, and building up a verification suite is time-intensive. This is part of the reason that the brainiac end of the design spectrum has pretty much become just x86 and POWER, which has nearly the same history.