
Note that optimizing compilers use a pile of linear-time heuristics to attempt to solve exponential-time optimization problems (from optimal register assignment to optimal instruction scheduling, they're all formally intractable even in isolation). When code gets non-trivial, not even a compiler's chief designer can reliably outguess what optimization may do. It's really not unusual for a higher optimization level to yield slower code, especially when the source code is pushing or exceeding machine limits (# of registers; # of instruction pipes; size of branch-prediction buffers; I-cache structure; dynamic restrictions on execution units; ...). [Jeremy]
... One of the differences between -O2 and -O3, according to the man page, is that -O3 will perform optimizations that involve a space-speed tradeoff. It also includes -finline-functions. I can imagine that some of these optimizations hurt memory performance enough to make a difference.
One of the time-consuming ongoing tasks at my last employer was running profiles and using them to override counterproductive compiler inlining decisions (in both directions). It's not just memory that excessive inlining can screw up, but also things like running out of registers and so inserting gobs of register spill/restore code, and inlining so much code that the instruction scheduler effectively gives up (under many compilers, a sure sign of this is when you look at the generated code for a function, and it looks beautiful "at the top" but terrible "at the bottom"; some clever optimizers tried to get around that by optimizing "bottom-up", and then it looks beautiful at the bottom but terrible at the top <0.5 wink>; others work middle-out or burn the candle at both ends, with visible consequences you should be able to recognize now!).

optimization-is-easier-than-speech-recog-but-the-latter-doesn't-work-all-that-well-either-ly y'rs - tim