On a Zen 3 CPU, "rep movsb" is at least as fast as any other copy method once the length is slightly greater than 2 kB.
However, there is a range of multi-megabyte lengths, roughly those that exceed the L2 cache but still fit in the L3 cache, where for some weird reason "rep movsb" becomes slower than SIMD non-temporal stores.
At lengths exceeding the L3 size, "rep movsb" is again the fastest copy method.
Intel CPUs behave differently.
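For anyone who wants to try this, here is a rough sketch of the two copy loops being compared, plus a size-based selector that mirrors the Zen 3 behavior described above. It assumes x86-64 with GCC or Clang. The names copy_rep_movsb, copy_nt and pick_method are made up for illustration, and the thresholds are parameters you would have to measure on your own machine, not values taken from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */

/* Copy n bytes with "rep movsb" (x86-64, GCC/Clang inline asm).
 * RDI = destination, RSI = source, RCX = byte count. The ABI guarantees
 * that the direction flag is clear, so the copy runs forward. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
}

/* Copy n bytes with SSE2 non-temporal (streaming) stores. */
static void copy_nt(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: plain copy until d is 16-byte aligned, which
     * _mm_stream_si128 requires for its destination. */
    size_t head = (size_t)(-(uintptr_t)d & 15);
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Body: unaligned load, aligned streaming store, 16 bytes at a time. */
    for (; n >= 16; n -= 16, d += 16, s += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
    }
    _mm_sfence(); /* order the streaming stores before later stores */

    /* Tail: whatever is left (fewer than 16 bytes). */
    memcpy(d, s, n);
}

enum copy_method { COPY_LIBC, COPY_REP_MOVSB, COPY_NT };

/* Hypothetical selector. small_max, l2_bytes and l3_bytes are
 * assumptions supplied by the caller, to be tuned by benchmarking. */
static enum copy_method pick_method(size_t n, size_t small_max,
                                    size_t l2_bytes, size_t l3_bytes)
{
    if (n <= small_max)
        return COPY_LIBC;      /* short copies: whatever libc does */
    if (n > l2_bytes && n < l3_bytes)
        return COPY_NT;        /* between L2 and L3 size */
    return COPY_REP_MOVSB;     /* everything else */
}
```

On a Zen 3 part you might start with something like pick_method(n, 2048, 512 * 1024, 32 * 1024 * 1024) and adjust from there. A real copy loop would likely use wider (AVX) stores and unroll; this is only meant to show the shape of the comparison.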