At 11:56 AM 4/22/04 -0400, Jeremy Hylton wrote:
>On Wed, 2004-04-21 at 10:50, Phillip J. Eby wrote:
>> I could be wrong, but it seems to me that globals shouldn't be nearly
>> as bad for performance as builtins.  A global only does one dict
>> lookup, while builtins do two.  Also, builtins can potentially be
>> optimized away altogether (e.g. 'while True:') or converted to fast
>> LOAD_CONST, or perhaps even a new CALL_BUILTIN opcode, assuming that
>> adding the opcode doesn't blow the cacheability of the eval loop.
>
>The coarse measurements I made a couple of years ago suggest that
>LOAD_GLOBAL is still substantially slower than LOAD_FAST.  Less than
>100 cycles for LOAD_FAST and about 400 cycles for LOAD_GLOBAL.
>
>http://zope.org/Members/jeremy/CurrentAndFutureProjects/PerformanceMeasureme...
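(For anyone following along: the compiler emits the same LOAD_GLOBAL opcode whether the name ultimately resolves in the module's globals or in builtins, which is where the "one or two dictionary lookups" comes from. A quick sketch with the dis module shows this; exact instruction sets vary between CPython versions:)

```python
import dis

def f(seq):
    # 'len' is neither a local nor assigned in this module's globals,
    # so at runtime LOAD_GLOBAL misses the module dict and falls
    # through to the builtins dict -- two lookups.
    return len(seq)

# Collect the opcode names actually emitted for f.
ops = [ins.opname for ins in dis.get_instructions(f)]
print(ops)
```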
I notice the page says 400 cycles "on average" for LOAD_GLOBAL doing "one or two dictionary lookups", so I'm curious how many of those were for builtins, which in the current scheme always take two lookups. If it was half globals and half builtins, and the dictionary lookups account for half the time, then having opcodes that know whether to look in globals or in builtins would drop the time to 266 cycles, which isn't spectacular but is still good at only about 3.5 times the bytecode fetch overhead. If builtins are used more frequently than globals, the picture improves still further.

Still, it's very interesting to see that loading a global takes almost as much time as calling a function! That's pretty surprising to me. I guess that's why doing e.g. '_len=len' for code with a tight loop makes such a big difference to performance. I tend to do that with attribute lookups before a tight loop, e.g. 'bar = foo.bar', but I didn't realize that global and builtin lookups were almost as slow.
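(To make the '_len=len' trick concrete, here's a minimal sketch; the function names and test data are mine, not from the thread. Binding the builtin as a default-argument value makes it a local inside the function, so the loop body uses LOAD_FAST instead of LOAD_GLOBAL:)

```python
def count_nonempty(items):
    n = 0
    for item in items:
        if len(item) > 0:   # 'len' re-looked-up as a builtin each iteration
            n += 1
    return n

def count_nonempty_local(items, _len=len):
    # '_len' is bound once at function-definition time; inside the
    # loop it's a plain local, fetched with the cheaper LOAD_FAST.
    n = 0
    for item in items:
        if _len(item) > 0:
            n += 1
    return n

data = [""] * 500 + ["x"] * 500
assert count_nonempty(data) == count_nonempty_local(data) == 500
```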