Memory and string functions in libc have poor performance because they don't use XMM registers and have no efficient way of dealing with unaligned data. The most efficient way to copy data when the source and destination have different alignments is to read aligned 16-byte blocks into XMM registers, shift and combine consecutive reads so that they match the alignment of the destination, and then write aligned.
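Here is a rough sketch of the aligned-read / shift-and-combine / aligned-write idea. To keep it portable it uses 64-bit general-purpose words instead of 16-byte XMM registers (with SSE you would do the combine step on two registers, e.g. with SSSE3's `_mm_alignr_epi8`). The function name `shift_combine_copy` is hypothetical. It assumes non-overlapping buffers and takes the fast path only on little-endian machines, falling back to a byte copy otherwise. The first partial source word is assembled from bytes rather than loaded, because portable C may not read before the start of a buffer. Real implementations do that aligned over-read anyway, since an aligned load cannot cross a page boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* memcpy of a fixed 8 bytes avoids strict-aliasing issues and compiles
   to a single move; callers only pass 8-byte-aligned addresses. */
static uint64_t load_word(const unsigned char *p) { uint64_t w; memcpy(&w, p, sizeof w); return w; }
static void store_word(unsigned char *p, uint64_t w) { memcpy(p, &w, sizeof w); }

static int is_little_endian(void) {
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Hypothetical helper: copy n bytes from src to dst (no overlap allowed). */
void *shift_combine_copy(void *dst_v, const void *src_v, size_t n) {
    unsigned char *dst = dst_v;
    const unsigned char *src = src_v;

    /* 1. Byte-copy until the destination is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)dst & 7) != 0) {
        *dst++ = *src++;
        n--;
    }

    size_t off = (uintptr_t)src & 7; /* source misalignment in bytes */

    if (off == 0) {
        /* 2a. Same alignment: plain aligned word copy. */
        while (n >= 8) {
            store_word(dst, load_word(src));
            dst += 8; src += 8; n -= 8;
        }
    } else if (is_little_endian() && n >= 16 - off) {
        /* 2b. Different alignment. 'prev' models the aligned source word
           that contains *src; only its top (8 - off) bytes are meaningful. */
        uint64_t prev = 0;
        memcpy((unsigned char *)&prev + off, src, 8 - off);
        unsigned lo = (unsigned)(8 * off);  /* bits to drop from prev */
        unsigned hi = 64u - lo;             /* bits to shift next up by */
        /* Loop while the next aligned word lies entirely inside src[0..n). */
        while (n >= 16 - off) {
            uint64_t next = load_word(src + (8 - off)); /* aligned load */
            /* Shift and combine so the word matches dst's alignment. */
            store_word(dst, (prev >> lo) | (next << hi)); /* aligned store */
            prev = next;
            dst += 8; src += 8; n -= 8;
        }
    }

    /* 3. Tail bytes (and the big-endian fallback): byte copy. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
    return dst_v;
}
```

A byte copy handles the head and tail, so only the middle of the buffer goes through the word loop. Each iteration does exactly one aligned load and one aligned store.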
A lot of the big distros dropped glibc and switched to eglibc because they no longer wanted to deal with the glibc maintainers and their refusal to fix things: http://blog.aurel32.net/47
Besides, the glibc developers did optimize the memory and string operations, and the result was that people moaned when their buggy Adobe Flash players crashed. The player relied on how earlier, unoptimized memcpy implementations happened to handle overlapping buffers, behavior that C99 explicitly leaves undefined.
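The implementation detail in question was copying between overlapping buffers. C99 says memcpy's behavior is undefined when the source and destination overlap, and memmove is the function defined for that case. Here is a minimal sketch of the correct pattern. The helper name `drop_prefix` is hypothetical and only illustrates an in-place, overlapping move.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove the first `by` bytes of buf (length len)
   by moving the remainder to the front. Source and destination overlap,
   so memmove is required; memcpy here would be undefined behavior. */
size_t drop_prefix(unsigned char *buf, size_t len, size_t by) {
    if (by >= len) {
        return 0;
    }
    memmove(buf, buf + by, len - by);
    return len - by;
}
```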
Such free software!