memcpy again
Aug. 23rd, 2012 08:10 pm
This is the fourth time this year I have had to re-benchmark memory copy for a customer project. Each time, a customer told me: we need a memcpy that is better than the default in some specific configuration (CPU architecture, OS, data size, alignment, data location).
And it was usually possible to select the right one. The performance of different memcpy implementations, including the one from the compiler library/libc, differs significantly.
Except for my current case. I have three implementations: naive, SSE3, and REP MOVS, and the performance is exactly the same! Of course, it is evident what kind of corner case I am exploring now :)
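For readers who want to try this kind of comparison, here is a minimal sketch of a harness, not the one used for the customer project. The names `copy_fn`, `naive_copy`, `rep_movs_copy`, and `bench_copy` are made up for illustration, the REP MOVSB variant assumes GCC or Clang on x86 (it falls back to `memcpy` elsewhere), and the SSE3 variant is left out to keep it short. Timing uses the standard `clock()`, which is coarse, so a real measurement would need many repetitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical common signature for every copy routine under test. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Naive byte-by-byte copy, used as the baseline. */
static void *naive_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Copy using the x86 string instruction REP MOVSB.
 * Assumption: GCC/Clang inline asm on x86; other targets use memcpy. */
static void *rep_movs_copy(void *dst, const void *src, size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    void *d = dst;
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}

/* Runs `reps` copies of `n` bytes and returns throughput in bytes per
 * second, or 0.0 if the run was too short for clock() to resolve. */
static double bench_copy(copy_fn fn, void *dst, const void *src,
                         size_t n, int reps)
{
    assert(reps > 0);
    /* Calling through a volatile pointer keeps the compiler from
     * inlining the routine and dropping repeated copies. */
    volatile copy_fn f = fn;

    clock_t t0 = clock();
    for (int i = 0; i < reps; i++)
        f(dst, src, n);
    clock_t t1 = clock();

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    return secs > 0.0 ? (double)n * (double)reps / secs : 0.0;
}
```

Call `bench_copy(naive_copy, dst, src, n, reps)`, then the same with `rep_movs_copy` and `memcpy`, on buffers that match the customer's size, alignment, and memory location. Those parameters decide whether the variants differ at all.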