> whatever the reference counting algorithm is doing can be emulated by the hand-optimised C++ code so that's just literally impossible.
shared_ptr and unique_ptr have pretty significant overhead and are common practice, even for optimized codebases, so I wouldn't say it's impossible at all.
You are 100% right. At some point I measured smart pointers to be 25x slower than raw pointers. The compiler I am maintaining does not use C++-style smart pointers. Instead, a global whole-program optimisation pass reduces reference-count manipulation to a minimum. It does basically what a world-class C++ programmer with years of experience optimising performance would do, just automatically.
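unique_ptr is nearly overhead-free compared to a bare pointer, except when it is passed by value as a function argument: because it has a non-trivial destructor, ABIs such as the Itanium C++ ABI pass it through memory rather than in a register.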
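shared_ptr is expensive and it's easy to leak memory with it (for example through reference cycles), so a lot of codebases avoid it where possible. It's only expensive when you copy it, though, since each copy does an atomic reference-count increment (and a matching decrement when it is destroyed); moving it is ~free.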