[Python-Dev] Re: [Python-checkins] python/nondist/peps pep-0329.txt, 1.2, 1.3
Adopt Jack Diederich's suggested module name.
I think pragma.py is a poor name for this, because (a) pragma is a candidate keyword (it has keyword status in most languages that have it) and (b) the word pragma implies compiler directives of any kind, not just the specific function proposed in this PEP.

Also, a heads up: unless this PEP gets a lot more support from folks whose first name isn't Raymond, I'm going to reject it.

--Guido van Rossum (home page: http://www.python.org/~guido/)
At 08:31 AM 4/20/04 -0700, Guido van Rossum wrote:
Adopt Jack Diederich's suggested module name.
I think pragma.py is a poor name for this, because (a) pragma is a candidate keyword (it has keyword status in most languages that have it) and (b) the word pragma implies compiler directives of any kind, not just the specific function proposed in this PEP.
Also, a heads up: unless this PEP gets a lot more support from folks whose first name isn't Raymond, I'm going to reject it.
Would it be salvageable if it were changed to:

* Get rid of bytecode hacking, in favor of a change to the compiler
* Optimize builtins *only*, and only those that are never assigned to by the module
* Use a __future__ statement to enable the behavior initially, before making it the default in a future release
* Have module.__setattr__ warn when shadowing a previously unshadowed builtin (unless the module uses the __future__ statement, in which case it's an error)

Would this be acceptable? It seems to me that this approach would allow Jython and IronPython the option in future of replacing lookups of builtins with static field accesses and/or method calls, which would give them quite a potential performance boost.
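The fourth bullet can be sketched in pure Python. This is only an illustration of the proposed behavior, not how a compiler-level implementation would work; the GuardedModule class and its warning text are hypothetical names chosen for this example.

```python
import builtins
import types
import warnings

class GuardedModule(types.ModuleType):
    """Sketch of the proposed module.__setattr__ semantics: warn when a
    module-level assignment shadows a builtin that was not already shadowed."""

    def __setattr__(self, name, value):
        # Only warn on the *first* shadowing of a real builtin; dunder and
        # private names are skipped so normal module setup stays quiet.
        if (not name.startswith("_")
                and hasattr(builtins, name)
                and name not in self.__dict__):
            warnings.warn(f"assignment to {name!r} shadows a builtin",
                          RuntimeWarning, stacklevel=2)
        super().__setattr__(name, value)

# Usage: shadowing 'len' warns once; rebinding an already-shadowed name doesn't.
mod = GuardedModule("demo")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mod.len = lambda x: 42      # first binding of 'len' -> warning
    mod.len = lambda x: 43      # 'len' is already shadowed -> silent
print(len(caught))  # 1
```

Under the __future__ statement the warn call would presumably become a raise instead, per the bullet above.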
Would it be salvageable if it were changed to:
* Get rid of bytecode hacking, in favor of a change to the compiler
* Optimize builtins *only*, and only those that are never assigned to by the module
* use a __future__ statement to enable the behavior initially, before making it the default in a future release
* have module.__setattr__ warn when shadowing a previously unshadowed builtin (unless the module uses the __future__ statement, in which case it's an error)
Would this be acceptable? It seems to me that this approach would allow Jython and IronPython the option in future of replacing lookups of builtins with static field accesses and/or method calls, which would give them quite a potential performance boost.
It is quite the opposite of the PEP! The PEP proposes a quick, very visible hack that works only for one implementation; your proposal here lays the foundation for changing the language to enable the same kind of optimizations.

I like that much better, but I doubt that it is doable in the timeframe for 2.4, nor do I think it is needed. Also, your 4th bullet proposes exactly (except for the __future__ statement) what was implemented in moduleobject.c in rev 2.46 and then withdrawn in rev 2.47; it is not feasible for a number of reasons (see python-dev for the gory details; I don't recall what they were, just that they were convincing).

The __future__ statement sounds like an excellent idea to me, as it enables experimentation with the new feature. One thing: we need to specify the future behavior very carefully so that other Python implementations will be able to do the right thing without having to reverse-engineer CPython.

--Guido van Rossum (home page: http://www.python.org/~guido/)
At 09:50 AM 4/20/04 -0700, Guido van Rossum wrote:
It is quite the opposite of the PEP! The PEP proposes a quick, very visible hack that works only for one implementation; your proposal here lays the foundation for changing the language to enable the same kind of optimizations.
I like that much better, but I doubt that it is doable in the timeframe for 2.4, nor do I think it is needed. Also, your 4th bullet proposes exactly (except for the __future__ statement) what was implemented in moduleobject.c in rev 2.46 and then withdrawn in rev 2.47; it is not feasible for a number of reasons (see python-dev for the gory details; I don't recall what they were, just that they were convincing).
Reviewing the problems, I see that the issues are with:

1) extension modules have a problem during initialization if they use setattr
2) modules and subpackages of a package that have names shadowing builtins

#1 is fixable, as long as there is some kind of flag for the module's state, which is needed in order to support the __future__ statement anyway. I don't know how this would affect Jython or IronPython, but I don't think they really have "extension modules" as such.

#2 is harder, because it needs sane rules. I don't think the parser should have to know about other modules. But banning modules named after builtins isn't appropriate either. OTOH, the only time this causes ambiguity is when the following conditions *all* apply:

* The module is a package __init__
* The module contains functions or methods that reference the builtin name
* The module does not directly import the name

I'm tempted to say that this is broken code, except that it's possible for the module to import a module that then imports the module that has the conflicting name. But I believe I have a solution. See below.
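Problem #2 is easy to check for mechanically. A minimal sketch, assuming only the standard builtins namespace (the helper name `shadowed_submodules` is made up for this example):

```python
import builtins

def shadowed_submodules(submodule_names):
    """Hypothetical helper: given the submodule names of a package, report
    which ones collide with builtin names -- the ambiguity described above."""
    return sorted(set(submodule_names) & set(dir(builtins)))

# A package containing modules named 'list' and 'compile' shadows two
# builtins inside its own __init__; 'util' is harmless.
print(shadowed_submodules(["list", "compile", "util"]))  # ['compile', 'list']
```

The collision only bites when the package __init__ also *uses* the builtin name without importing it, per the three conditions above.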
The __future__ statement sounds like an excellent idea to me, as it enables experimentation with the new feature. One thing: we need to specify the future behavior very carefully so that other Python implementations will be able to do the right thing without having to reverse-engineer CPython.
Here is my proposal for the semantics of "optimized builtins":

* The compiler identifies names in a module that are builtin names (as defined by the language version), but are never assigned to (or otherwise declared global) in the module. It adds code at the beginning of the module that sets a module-level variable, let's say '__builtins_used__', to list those names whose use it *may* be able to optimize. (Note that this step applies to all modules compiled, not just those with the __future__ statement, and there is some additional complexity needed in the generated code to correctly handle being 'exec'-d in an existing namespace.)

* If the __future__ statement is in effect, also set a '__builtins_optimized__' flag in the module dictionary, and actually implement any desired optimizations.

* module.__setattr__ either warns or issues an error when setting a name listed in '__builtins_used__', depending on the status of '__builtins_optimized__'. If either is missing, the current (backward-compatible) semantics of setattr should apply.

Note that if an extension module uses setattr to initialize itself, it will not break, because it does not have a '__builtins_used__' attribute. Also note that mere container packages will not break because they contain modules or packages named after builtins. Only packages which actually do something with the contained module, while also failing to bind that name, will receive warnings. Such modules can then simply add an explicit e.g. 'import .list' or 'global list' in the appropriate function(s), or use some similar approach to clarify that the named item is a module.

There would be some potential pain here when new builtins are added, however, since previously-working code could break. There's a way to fix that too, but it may be a bit harsh: issue a warning for ambiguous use of global names that are *not* builtin, but are not explicitly bound by the module.
That is, if I use the name 'foo' in a function, and it is not a local, and is not declared 'global' or explicitly bound by module-level code (i.e. because I am hacking globals() or because the name is a submodule), I should be warned that I should explicitly declare my intended usage. E.g. "Name 'foo' is never assigned a value: perhaps you're missing a 'global' declaration or 'import' statement?"

The warning could be introduced as a PendingDeprecationWarning, upgraded to a warning for modules using the __future__ statement. This would then discourage writing such ambiguous code in future.

(Oh, and speaking of ambiguity, use of 'import *' would either have to be forbidden in optimized modules, disable all optimization, or else use setattr and thus break at runtime if there's a conflict with an optimized name.)

Whew. That's quite a list of things that would have to be done, but presumably we'll have to pay the piper (pyper?) sometime if we want to get to optimized builtin land someday.
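The analysis step in the first bullet — builtin names a module loads but never binds — can be approximated today with the ast module. A rough sketch, not the proposed compiler change itself (the function name `builtins_used` echoes the proposed '__builtins_used__' variable but is otherwise made up):

```python
import ast
import builtins

def builtins_used(source):
    """Sketch of the proposed compiler analysis: collect builtin names the
    module loads but never assigns -- candidates for '__builtins_used__'."""
    tree = ast.parse(source)
    loaded, bound = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:                      # Store or Del context binds the name
                bound.add(node.id)
        elif isinstance(node, ast.Global):
            bound.update(node.names)   # 'global' declarations count as bound
    return sorted((loaded - bound) & set(dir(builtins)))

src = """
len = my_len          # shadows len: not optimizable
def f(seq):
    return max(seq)   # max is never assigned: optimizable
"""
print(builtins_used(src))  # ['max']
```

A real implementation would also have to respect lexical scopes (a local `max` inside one function shouldn't disqualify the module-level name), which this flat walk deliberately ignores.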
On Tue, 2004-04-20 at 12:16, Phillip J. Eby wrote:
Would this be acceptable? It seems to me that this approach would allow Jython and IronPython the option in future of replacing lookups of builtins with static field accesses and/or method calls, which would give them quite a potential performance boost.
This is basically what IronPython does already. A sufficiently clever implementation can make the current semantics go fast enough. Thus, we'd only need changes to the compiler, not even changes to the language. This is the most attractive option to me (see PEP 267).

I don't like the PEP 329 approach because it adds extra complexity to each module to work around a limitation of the current implementation that will almost surely disappear one day.

Jeremy
At 04:50 PM 4/20/04 -0400, Jeremy Hylton wrote:
On Tue, 2004-04-20 at 12:16, Phillip J. Eby wrote:
Would this be acceptable? It seems to me that this approach would allow Jython and IronPython the option in future of replacing lookups of builtins with static field accesses and/or method calls, which would give them quite a potential performance boost.
This is basically what IronPython does already. A sufficiently clever implementation can make the current semantics go fast enough.
Actually, I suspect the cleverness of implementation required is what has kept anybody from actually implementing it for CPython at least. :)
Thus, we'd only need changes to the compiler, not even changes to the language. This is the most attractive option to me (see PEP 267).
PEP 267 isn't backwards compatible either, since it requires that "Each code object would then be bound irrevocably to the module it was defined in." (And yes, there are lots of not-so-subtle problems with this, such as 'exec codeobject in dict'.) I think it would be helpful for the language spec to include the rules for potential optimizing of built-ins, and cover what happens if you attempt to replace a builtin whose dynamic nature has been optimized away.
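The 'exec codeobject in dict' problem is easy to demonstrate: today a code object is not tied to any module, so the same object can run against different namespaces — exactly what irrevocably binding it to its defining module would break. A minimal sketch:

```python
# Compile once; the code object carries no reference to any module.
code = compile("result = greeting.upper()", "<demo>", "exec")

# Execute the *same* code object against two unrelated namespaces.
ns1 = {"greeting": "hello"}
ns2 = {"greeting": "goodbye"}
exec(code, ns1)
exec(code, ns2)
print(ns1["result"], ns2["result"])  # HELLO GOODBYE
```

Under PEP 267's irrevocable binding, the global lookup for 'greeting' would be resolved against the defining module, and this idiom would stop working.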
I don't like the PEP 329 approach because it adds extra complexity to each module to work around a limitation of the current implementation that will almost surely disappear one day.
Agreed. The thing I'm proposing really needs another PEP. Unfortunately, I need authorship or co-authorship of *another* PEP like I need another hole in my head. :) (I've been asked to help w/246 and 318 already, and have had a web container API pre-PEP on the back burner for a few months now.)
On Tue, 2004-04-20 at 17:21, Phillip J. Eby wrote:
At 04:50 PM 4/20/04 -0400, Jeremy Hylton wrote:
On Tue, 2004-04-20 at 12:16, Phillip J. Eby wrote:
Would this be acceptable? It seems to me that this approach would allow Jython and IronPython the option in future of replacing lookups of builtins with static field accesses and/or method calls, which would give them quite a potential performance boost.
This is basically what IronPython does already. A sufficiently clever implementation can make the current semantics go fast enough.
Actually, I suspect the cleverness of implementation required is what has kept anybody from actually implementing it for CPython at least. :)
I'll grant you it's easier in a language like C#, but I don't think such code would be excessively complicated. The implementation needs a few levels of indirection, but not much more.
Thus, we'd only need changes to the compiler, not even changes to the language. This is the most attractive option to me (see PEP 267).
PEP 267 isn't backwards compatible either, since it requires that "Each code object would then be bound irrevocably to the module it was defined in." (And yes, there are lots of not-so-subtle problems with this, such as 'exec codeobject in dict'.)
I haven't read the text of PEP 267 in quite a while ;-). At some point, we worked out a scheme that was completely backwards compatible.

There are a couple of realistic options. One is to update the code at function definition time to insert the offsets for globals that are appropriate for the module. Another option is to initialize a table in the frame that is used for globals bindings.
I think it would be helpful for the language spec to include the rules for potential optimizing of built-ins, and cover what happens if you attempt to replace a builtin whose dynamic nature has been optimized away.
I'm not opposed to that, but it isn't required to get good performance for globals.
I don't like the PEP 329 approach because it adds extra complexity to each module to work around a limitation of the current implementation that will almost surely disappear one day.
Agreed. The thing I'm proposing really needs another PEP. Unfortunately, I need authorship or co-authorship of *another* PEP like I need another hole in my head. :) (I've been asked to help w/246 and 318 already, and have had a web container API pre-PEP on the back burner for a few months now.)
I know how you feel, even though I've only got one PEP that I'm currently responsible for.

Jeremy
At 09:31 PM 4/20/04 -0400, Jeremy Hylton wrote:
On Tue, 2004-04-20 at 17:21, Phillip J. Eby wrote:
I think it would be helpful for the language spec to include the rules for potential optimizing of built-ins, and cover what happens if you attempt to replace a builtin whose dynamic nature has been optimized away.
I'm not opposed to that, but it isn't required to get good performance for globals.
I could be wrong, but it seems to me that globals shouldn't be nearly as bad for performance as builtins. A global only does one dict lookup, while builtins do two. Also, builtins can potentially be optimized away altogether (e.g. 'while True:') or converted to fast LOAD_CONST, or perhaps even a new CALL_BUILTIN opcode, assuming that adding the opcode doesn't blow the cacheability of the eval loop.
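The two-lookups-vs-one claim can be probed with timeit: a global name hits the module dict once, while a builtin misses the module dict first and falls through to the builtins dict. A rough, machine-dependent sketch (on modern CPython, opcode-level caching narrows the gap considerably, so treat any difference as illustrative only):

```python
import timeit

# 'g' is a module-level global in timeit's namespace; 'len' is a builtin
# reached only after the globals dict lookup fails.
global_lookup  = timeit.timeit("g", setup="g = len", number=10**6)
builtin_lookup = timeit.timeit("len", number=10**6)

print(f"global:  {global_lookup:.4f}s")
print(f"builtin: {builtin_lookup:.4f}s")
```

This measures only the lookup chain, not the hypothetical CALL_BUILTIN or LOAD_CONST rewrites discussed above.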
On Wed, 2004-04-21 at 10:50, Phillip J. Eby wrote:
I could be wrong, but it seems to me that globals shouldn't be nearly as bad for performance as builtins. A global only does one dict lookup, while builtins do two. Also, builtins can potentially be optimized away altogether (e.g. 'while True:') or converted to fast LOAD_CONST, or perhaps even a new CALL_BUILTIN opcode, assuming that adding the opcode doesn't blow the cacheability of the eval loop.
The coarse measurements I made a couple of years ago suggest that LOAD_GLOBAL is still substantially slower than LOAD_FAST: less than 100 cycles for LOAD_FAST and about 400 cycles for LOAD_GLOBAL.

http://zope.org/Members/jeremy/CurrentAndFutureProjects/PerformanceMeasureme...

It would be good to repeat the measurements with current Python. I suspect it's a lot harder to figure out where to measure the start and stop times. The timestamps would need to be integrated with PREDICT and fast_next_opcode, for example.

Jeremy
At 11:56 AM 4/22/04 -0400, Jeremy Hylton wrote:
On Wed, 2004-04-21 at 10:50, Phillip J. Eby wrote:
I could be wrong, but it seems to me that globals shouldn't be nearly as bad for performance as builtins. A global only does one dict lookup, while builtins do two. Also, builtins can potentially be optimized away altogether (e.g. 'while True:') or converted to fast LOAD_CONST, or perhaps even a new CALL_BUILTIN opcode, assuming that adding the opcode doesn't blow the cacheability of the eval loop.
The coarse measurements I made a couple of years ago suggest that LOAD_GLOBAL is still substantially slower than LOAD_FAST. Less than 100 cycles for LOAD_FAST and about 400 cycles for LOAD_GLOBAL.
http://zope.org/Members/jeremy/CurrentAndFutureProjects/PerformanceMeasureme...
I notice the page says 400 cycles "on average" for LOAD_GLOBAL doing "one or two dictionary lookups", so I'm curious how many of those were for builtins, which in the current scheme are always two lookups. If it was half globals and half builtins, and the dictionary lookup is half the time, then having opcodes that know whether to look in globals or builtins would drop the time to 266 cycles, which isn't spectacular but is still good at only about 3.5 times the bytecode fetch overhead. If builtins are used more frequently than globals, the picture improves still further.

Still, it's very interesting to see that loading a global takes almost as much time as calling a function! That's pretty surprising to me. I guess that's why doing e.g. '_len=len' for code that does a tight loop makes such a big difference to performance. I tend to do that with attribute lookups before a tight loop, e.g. 'bar = foo.bar', but I didn't realize that global and builtin lookups were almost as slow.
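The '_len=len' trick mentioned above works by binding the builtin once, outside the loop, so each iteration does a fast local load instead of the globals-then-builtins dict chain. A small sketch (the function names are made up; the default-argument form is the classic idiom from this era):

```python
import timeit

def total_slow(items):
    n = 0
    for item in items:
        n += len(item)            # looks up 'len' on every iteration
    return n

def total_fast(items, _len=len):  # builtin bound once, as a default arg
    n = 0
    for item in items:
        n += _len(item)           # plain local load each iteration
    return n

data = [[0] * i for i in range(100)]
assert total_slow(data) == total_fast(data) == 4950
print(timeit.timeit(lambda: total_slow(data), number=2000))
print(timeit.timeit(lambda: total_fast(data), number=2000))
```

The same pattern applies to the 'bar = foo.bar' attribute pre-binding Phillip describes; how much it saves on any given interpreter version is an empirical question.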
participants (3): Guido van Rossum, Jeremy Hylton, Phillip J. Eby