Quite often, that doesn't end up very efficient, because without "restrict" the result has to be identical to that of a byte-by-byte copy, for every possible overlap of the two inputs.
Lots of memcpy() implementations are still more efficient than a dumb byte-by-byte copy. They'll copy the (unaligned) head and the tail byte by byte, but move the bulk of the data using whatever data type and method is fastest.
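
To make the head/bulk/tail idea concrete, here's a rough sketch of what such a copy could look like. It is not taken from any particular libc. The name sketch_memcpy and the word_t typedef are made up for illustration, the may_alias attribute is a GCC/Clang extension, and the sketch takes the simple way out by falling back to plain byte copies whenever source and destination aren't misaligned by the same amount (real implementations typically handle that case with shifts or unaligned loads, and often use SIMD for the bulk).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word type for the bulk copy. The may_alias attribute is a
   GCC/Clang extension that lets it legally alias objects of any type. */
typedef uintptr_t __attribute__((may_alias)) word_t;

/* Illustrative only: same contract as memcpy(), so the buffers must not overlap. */
void *sketch_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Simplifying assumption: only use word copies when both pointers have
       the same misalignment, so aligning one also aligns the other. */
    if ((uintptr_t)d % sizeof(word_t) == (uintptr_t)s % sizeof(word_t)) {
        /* Head: copy single bytes until the pointers are word-aligned. */
        while (n > 0 && (uintptr_t)d % sizeof(word_t) != 0) {
            *d++ = *s++;
            n--;
        }

        /* Bulk: copy one machine word at a time. */
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;
        while (n >= sizeof(word_t)) {
            *dw++ = *sw++;
            n -= sizeof(word_t);
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* Tail: whatever is left (or everything, if the alignments differed). */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The restrict qualifiers match the standard memcpy() prototype: the caller promises the buffers don't overlap, which is exactly what makes the word-at-a-time bulk loop safe.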