And yes, memcpy is fast, but I would not use a little program to convince myself. In a small test everything stays hot in the CPU caches, so you end up measuring a best case, which will give you a very incorrect intuition.
Better to take a large program where there is a base factory, add some copies there, and measure how it affects the whole thing.
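To make the cache point concrete, here is a rough sketch of the kind of little program I mean, and why it misleads. It copies the same total number of bytes, but walks through a working set of a size you choose. The name `time_copies` and its parameters are made up for illustration, and the timing uses the standard `clock()`, which is coarse. Even the "cold" case here has a perfectly predictable access pattern, so it still flatters memcpy compared with a real program.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy `total` bytes in `chunk`-byte pieces, cycling
   through two buffers of `span` bytes each. A small span stays in cache,
   a span far larger than the last-level cache does not.
   Returns elapsed CPU seconds, or -1.0 if allocation fails.
   `checksum` receives the byte sum of the destination, so the copies
   have an observable result the compiler cannot simply discard. */
static double time_copies(size_t span, size_t chunk, size_t total,
                          unsigned long *checksum)
{
    assert(chunk > 0 && chunk <= span);

    unsigned char *src = malloc(span);
    unsigned char *dst = malloc(span);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 1, span);   /* touch the pages before timing */
    memset(dst, 0, span);

    size_t off = 0;
    clock_t t0 = clock();
    for (size_t done = 0; done < total; done += chunk) {
        memcpy(dst + off, src + off, chunk);
        off += chunk;
        if (off + chunk > span)
            off = 0;        /* wrap around inside the working set */
    }
    clock_t t1 = clock();

    unsigned long sum = 0;
    for (size_t i = 0; i < span; i++)
        sum += dst[i];
    *checksum = sum;

    free(src);
    free(dst);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Call it once with a span of a few tens of kilobytes and once with a span of several hundred megabytes, keeping `chunk` and `total` the same, and compare the two times. The gap depends entirely on the machine, which is sort of the point: neither number tells you what a copy costs in the middle of a large program.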
That said… for most businesses these days, developer time is more expensive than compute time, so if you’re not shipping an operating system or similar, it simply doesn’t matter.
And an optimizing compiler could do something like copy on write, and make much of the issue moot.
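For anyone who has not seen copy on write, here is a minimal sketch of the idea, done by hand in library code rather than by a compiler. All the names (`cow_buf`, `cow_copy`, `cow_write` and so on) are hypothetical, and it is not thread-safe. A "copy" only bumps a reference count, and the real memcpy is deferred until somebody writes to shared storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared storage: a reference count plus the bytes. */
typedef struct {
    size_t refs;
    size_t len;
    unsigned char data[];   /* flexible array member */
} cow_store;

/* A handle; several handles may point at the same storage. */
typedef struct {
    cow_store *store;
} cow_buf;

/* Allocate fresh storage holding a copy of `bytes` (must be non-NULL).
   On allocation failure the returned handle has store == NULL. */
static cow_buf cow_new(const void *bytes, size_t len)
{
    cow_buf b = { malloc(sizeof(cow_store) + len) };
    if (b.store) {
        b.store->refs = 1;
        b.store->len = len;
        memcpy(b.store->data, bytes, len);
    }
    return b;
}

/* "Copying" is O(1): share the storage and bump the count. */
static cow_buf cow_copy(cow_buf b)
{
    assert(b.store && b.store->refs > 0);
    b.store->refs++;
    return b;
}

/* The real copy happens here, and only if the storage is shared.
   Returns 0 on success, -1 if the private copy could not be allocated. */
static int cow_write(cow_buf *b, size_t i, unsigned char v)
{
    assert(b->store && i < b->store->len);
    if (b->store->refs > 1) {
        cow_buf priv = cow_new(b->store->data, b->store->len);
        if (!priv.store)
            return -1;
        b->store->refs--;   /* detach from the shared storage */
        *b = priv;
    }
    b->store->data[i] = v;
    return 0;
}

/* Drop one reference; free the storage when the last one goes. */
static void cow_free(cow_buf *b)
{
    if (b->store && --b->store->refs == 0)
        free(b->store);
    b->store = NULL;
}
```

If a copy is never written to, the bytes are never duplicated at all, which is how this kind of scheme can make the cost of copying mostly disappear.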
I spent a brief period designing a simple CPU, and it has made everything since turn my stomach a little bit.