Because you wait, but you don't lock. And the wait doesn't cost 100x an instruction; normally it takes about as long as one single instruction. Writes are fast and atomic because there's no lock.
Only the writer has to wait a bit; no other thread does.
Google "lock-free vs wait-free" for more.
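
To make "wait but don't lock" concrete, here is a minimal sketch of a lock-free write in C++, assuming a compare-and-swap retry loop on a `std::atomic<int>`. The names `counter`, `lock_free_increment`, and `run_writers` are made up for illustration; this is one common way to do it, not necessarily what the code under discussion does.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical shared counter, used only for illustration.
std::atomic<int> counter{0};

// Lock-free increment: read the current value, then try to publish value + 1.
// If another thread changed the counter in between, compare_exchange_weak
// fails and reloads `expected` with the latest value, so we simply retry.
// No mutex is ever held, so a stalled writer cannot block anyone else.
void lock_free_increment() {
    int expected = counter.load(std::memory_order_relaxed);
    while (!counter.compare_exchange_weak(expected, expected + 1,
                                          std::memory_order_release,
                                          std::memory_order_relaxed)) {
        // The "wait" is just this retry; only the writing thread pays for it.
    }
}

// Hypothetical helper: start `threads` writers, each doing `per_thread`
// increments, and return the final counter value.
int run_writers(int threads, int per_thread) {
    counter.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) lock_free_increment();
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Calling `run_writers(4, 10000)` should return 40000, since no increment is lost. Note that this loop is lock-free but not wait-free: some thread always makes progress, but an individual writer may have to retry more than once under contention.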