Haha :) Maybe they shipped a better product, but management said "No, it's not possible that this could run that fast. Something must be wrong.", so they put in some "waiting".
Because the CPU (ARM) had only 32 KB of cache in this case, the memcpy was evicting the entire cache several times per iteration as a side effect of doing the work. The loop's actual work had good cache locality: the data was six stack variables totalling about 8 KB.
So? Copying a megabyte inside a loop is really expensive even ignoring caches. (At a memory bandwidth of 24 GB/s, a full-speed 1 MB memcpy takes about 40 microseconds, which is a long time.)
I think it was covered by "naive memory management" and "shitty outsourcing". I'm paid to fix their stuff.