You should write code to minimize the reference count bumps; they are a waste of time whether atomic or not.
If the code spends 0.5% of its time bumping references, and you magically reduce that to zero using alien optimization technology, that only gives you a 0.5% improvement.
If the code spends 10% of its time bumping references up and down, something is wrong.
It's even possible to share an Rc-managed value across threads without switching to Arc, as long as the other threads never need to change the reference count and can be "scoped" (https://doc.rust-lang.org/stable/std/thread/fn.scope.html) to some lifetime that some particular Rc outlives. The threads borrow the value inside the Rc (which has to be Sync), never the Rc itself.
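A minimal sketch of what that can look like, assuming a plain Vec<u64> as the shared value. The function name sum_in_scoped_threads is made up for illustration. The point is that only a borrow of the inner Vec crosses into the scoped threads, so the reference count is never touched from another thread.

```rust
use std::rc::Rc;
use std::thread;

// Hypothetical helper: sums the contents of an Rc-managed Vec on two scoped threads.
fn sum_in_scoped_threads(data: &Rc<Vec<u64>>) -> u64 {
    // Borrow the inner value. The Rc itself (neither Send nor Sync) stays on
    // this thread; only `&[u64]` slices are handed to the workers.
    let shared: &Vec<u64> = data;
    let (left, right) = shared.split_at(shared.len() / 2);

    // thread::scope joins every spawned thread before it returns, so the
    // borrows cannot outlive the Rc that the caller is holding.
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<u64>());
        let b = s.spawn(|| right.iter().sum::<u64>());
        a.join().unwrap() + b.join().unwrap()
    })
}

// Usage (illustrative):
//     let data = Rc::new(vec![1, 2, 3, 4]);
//     let total = sum_in_scoped_threads(&data);
```

If a worker closure tried to capture or clone the Rc itself instead of the borrowed slice, the compiler would reject it, because Rc is not Send or Sync. That's the signposted roadblock.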
In C++, equivalent roadblocks are not signposted. You may not even realise you're in trouble until some very strange errors begin to happen.
For the second paragraph, that depends a great deal on (a) what the mechanism used to "send a (shared, ref-counted reference thing) to another thread" actually means and (b) what objects are used to accomplish this. Certainly, simply writing the address of a shared_ptr<T> in C++ will work out as you indicate. But that's not the only way to do it. Rust's benefit comes from you being "unable" to do it an unsafe way; C++'s benefit comes from the fact that somebody has probably implemented the safe way in C++ already :)