Of course, how accesses on the abstract machine map to the actual hardware is implementation-defined, although most implementations document that they translate them to plain movs.
And implementations have bugs, of course: in [1], gcc removing a volatile access to an otherwise unused volatile parameter is considered a bug, even when the parameter is actually passed in a register (in this case I would say the standard is underspecified).
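To make the volatile-parameter case concrete, here is a minimal sketch of the kind of code I mean. It is not the test case from [1]; the function names are made up for illustration, and what the compiler emits for each access is implementation-defined, so the comments only describe what the abstract machine requires.

```c
/* Hypothetical names, for illustration only; not the test case from [1]. */

/* The only use of x is the discarded read below. Because x is
   volatile-qualified, that read is an access the abstract machine
   performs, even if the argument arrived in a register. */
int ignore_arg(volatile int x)
{
    (void)x;        /* volatile read, result discarded */
    return 0;
}

/* The same idea through a pointer: one volatile load, one volatile store. */
int bump(volatile int *p)
{
    int old = *p;   /* volatile load */
    *p = old + 1;   /* volatile store */
    return old;
}
```

Compiling something like this with `gcc -O2 -S` and reading the assembly is the easiest way to see whether the accesses survive; for `ignore_arg` the interesting question is whether the compiler still spills the argument and reloads it.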
You are absolutely correct regarding non-volatile atomics though.