
On 5/2/13 3:58 PM, Nathaniel Smith wrote:
> callgrind has the *fabulous* kcachegrind front-end, but it only measures memory access performance on a simulated machine, which is very useful sometimes (if you're trying to optimize cache locality), but there's no guarantee that the bottlenecks on its simulated machine are the same as the bottlenecks on your real machine.
Agreed, there is no guarantee, but in my experience kcachegrind normally gives you a pretty decent view of cache misses, and hence a pretty good prediction of how they affect your computations. I have used this feature extensively for optimizing parts of the Blosc compressor, and I could not be happier with it (to the point that, were it not for Valgrind, I could not have figured out many interesting memory access optimizations).

-- Francesc Alted