LOAD_NAME/LOAD_GLOBAL should use getattr()
This is my idea of making module properties work. It is necessary for various lazy-loading module ideas and it cleans up the language IMHO. I think it may be possible to do it with minimal backwards compatibility problems and performance regression.

To me, the main issue with module properties (or module __getattr__) is that you introduce another level of indirection on global variable access. Anywhere the module.__dict__ is used as the globals for code execution, changing LOAD_NAME/LOAD_GLOBAL to have another level of indirection is necessary. That seems inescapable.

Introducing another special feature of modules to make this work is not the solution, IMHO. We should make module namespaces be more like instance namespaces. We already have a mechanism and it is getattr on objects.

I have a very early prototype of this idea. See:

https://github.com/nascheme/cpython/tree/exec_mod

Issues to be resolved:

- __namespace__ entry in the __dict__ creates a reference cycle. Maybe could use a weakref somehow to avoid it. Maybe we just explicitly break it.
- getattr() on the module may return things that LOAD_NAME and LOAD_GLOBAL don't expect (e.g. things from the module type). I need to investigate that.
- Need to fix STORE_* opcodes to do setattr() rather than __setitem__.
- Need to optimize the implementation. Maybe the module instance can know if any properties or __getattr__ are defined. If no, have __getattribute__ grab the variable directly from md_dict.
- Need to fix eval() to allow module as well as dict.
- Need to change logic where global dict is passed around. Pass the module instead so we don't have to keep retrieving __namespace__. For backwards compatibility, need to keep functions that take 'globals' as dict and use PyModule_GetDict() on public APIs that return globals as a dict.
- interp->builtins should be a module, not a dict.
- module shutdown procedure needs to be investigated and fixed. I think it may get simpler.
- importlib needs to be fixed to pass modules to exec() and not dicts. From my initial experiments, it looks like importlib gets a lot simpler. Right now we pass around dicts in a lot of places and then have to grub around in sys.modules to get the module object, which is what importlib usually wants.

I have requested help in writing a PEP for this idea but so far no one is foolish enough to join my crazy endeavor. ;-)

Regards,

Neil
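The indirection issue described in this message can be observed in current CPython with a small experiment. The sketch below is illustrative only and is not taken from the exec_mod prototype; the names `PropModule`, `answer` and `read_global` are made up for the example. A property defined on a module subclass is honoured by getattr() on the module, but code that runs with the module's __dict__ as its globals does not see it, because name lookup reads the dict directly.

```python
import types


class PropModule(types.ModuleType):
    """Hypothetical module subclass that exposes a computed attribute."""

    @property
    def answer(self):
        return 42


mod = PropModule("demo")

# Attribute access goes through the type, so the property is found.
via_getattr = getattr(mod, "answer")

# Code run with the module dict as globals looks names up in that dict
# (then in builtins) and never consults the module type.
exec("def read_global():\n    return answer\n", mod.__dict__)
try:
    mod.read_global()
    via_global = "found"
except NameError:
    via_global = "NameError"

print(via_getattr, via_global)
```

Under these assumptions the attribute lookup succeeds while the global lookup raises NameError, which is the gap the proposal aims to close.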
I'm very worried about performance.

Dict has had ma_version since Python 3.6, to be used for future optimizations including global caching. Adding another abstraction layer may make that difficult.

When considering lazy loading, the big problem is backward compatibility. For example, see https://github.com/python/cpython/blob/master/Lib/concurrent/futures/__init_...

    from concurrent.futures._base import (FIRST_COMPLETED,
                                          FIRST_EXCEPTION,
                                          ALL_COMPLETED,
                                          CancelledError,
                                          TimeoutError,
                                          Future,
                                          Executor,
                                          wait,
                                          as_completed)
    from concurrent.futures.process import ProcessPoolExecutor
    from concurrent.futures.thread import ThreadPoolExecutor

Asyncio must import concurrent.futures.Future for compatibility between asyncio.Future and concurrent.futures.Future.

But not all asyncio applications need ProcessPoolExecutor. They may use only ThreadPoolExecutor.

Currently, they are forced to import concurrent.futures.process, which imports multiprocessing. That makes a large import dependency tree.

To solve such a problem, hooking LOAD_GLOBAL is not necessary.

    # in concurrent/futures/__init__.py

    def __getattr__(name):
        if name == 'ProcessPoolExecutor':
            global ProcessPoolExecutor
            from .process import ProcessPoolExecutor
            return ProcessPoolExecutor
        # any other missing name is a normal attribute error
        raise AttributeError(name)

    # Following code should call __getattr__
    from concurrent.futures import ProcessPoolExecutor  # eager loading

    import concurrent.futures as futures
    executor = futures.ProcessPoolExecutor()  # lazy loading

On the other hand, lazy loading a global is easier than the above. For example, linecache imports tokenize, and tokenize is relatively heavy.

https://github.com/python/cpython/blob/master/Lib/linecache.py#L11

tokenize is used from only one place (in linecache.updatecache()). So lazy importing it is just a matter of moving `import tokenize` into the function.

        try:
            import tokenize
            with tokenize.open(fullname) as fp:
                lines = fp.readlines()

I want lazy loading only for heavy and rarely used modules. Lazy loading many modules may make execution order unpredictable.
So the manual lazy loading technique is almost enough for me. Then, what is the real-world requirement for an abstraction layer on LOAD_GLOBAL?

Regards,

INADA Naoki <songofacandy@gmail.com>

On Wed, Sep 13, 2017 at 1:17 AM, Neil Schemenauer <nas-python-ideas@arctrix.com> wrote:
> [...]
> _______________________________________________
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
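The module-level __getattr__ technique from the message above can be tried in isolation. Module-level __getattr__ was later specified by PEP 562 and is available from Python 3.7, so the sketch below assumes that version or newer. The module name `lazy_demo`, the attribute `heavy` and the counter `loads_performed` are made up for the example, and `json` only stands in for a heavy dependency.

```python
import types

# Source of a hypothetical module that defers a heavy import.
SOURCE = '''
loads_performed = 0

def __getattr__(name):
    global loads_performed
    if name == "heavy":
        import json as heavy          # the deferred import happens here
        loads_performed += 1
        globals()["heavy"] = heavy    # cache so __getattr__ is not hit again
        return heavy
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

lazy_mod = types.ModuleType("lazy_demo")
exec(SOURCE, lazy_mod.__dict__)

first = lazy_mod.heavy    # not in the module dict yet: calls __getattr__
second = lazy_mod.heavy   # now found in the module dict directly

print(first is second, lazy_mod.loads_performed)
```

The hook only runs on the first access; after the name is cached in the module dict, ordinary attribute lookup finds it and the hook adds no further cost.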
On Wed, Sep 13, 2017 at 12:24:31PM +0900, INADA Naoki wrote:
> I'm very worried about performance.
>
> Dict has had ma_version since Python 3.6, to be used for future optimizations including global caching. Adding another abstraction layer may make that difficult.

Can we make it opt-in, by replacing the module __dict__ when and only if needed? Perhaps we could replace it on the fly with a dict subclass that defines __missing__? That's virtually the same as __getattr__.

Then modules which haven't replaced their __dict__ would not see any slow down at all.

Does any of this make sense, or am I talking nonsense on stilts?

--
Steve
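The __missing__ idea can be approximated today with exec(), which accepts a dict subclass as its globals. The following is a minimal sketch, assuming current CPython behaviour where name lookup falls back to the generic mapping protocol when globals is not an exact dict; `LazyGlobals`, `answer` and `double` are made-up names. It does not replace the __dict__ of a real module, which is a read-only attribute, so it only illustrates the lookup side of the suggestion.

```python
class LazyGlobals(dict):
    """Hypothetical globals mapping that computes some names on first use."""

    def __missing__(self, name):
        if name == "answer":
            value = 42           # stand-in for an expensive computation
            self[name] = value   # cache: later lookups never reach here
            return value
        # Unknown names must raise KeyError so that the interpreter
        # falls through to builtins (print, len, ...).
        raise KeyError(name)


namespace = LazyGlobals()
exec(
    "def double():\n"
    "    return answer * 2\n"
    "result = double()\n"
    "length = len('abc')\n",
    namespace,
)
print(namespace["result"], namespace["length"])
```

Note that __missing__ is also consulted for builtin names before the builtins fallback happens, which is one place where the extra cost mentioned in this thread would show up.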
On Thu, Sep 14, 2017 at 8:07 AM, Steven D'Aprano <steve@pearwood.info> wrote:
> On Wed, Sep 13, 2017 at 12:24:31PM +0900, INADA Naoki wrote:
>> I'm very worried about performance.
>>
>> Dict has had ma_version since Python 3.6, to be used for future optimizations including global caching. Adding another abstraction layer may make that difficult.
>
> Can we make it opt-in, by replacing the module __dict__ when and only if needed? Perhaps we could replace it on the fly with a dict subclass that defines __missing__? That's virtually the same as __getattr__.
>
> Then modules which haven't replaced their __dict__ would not see any slow down at all.
>
> Does any of this make sense, or am I talking nonsense on stilts?
This is more or less what I was describing here:

https://mail.python.org/pipermail/python-ideas/2017-September/047034.html

I am also looking at Neil's approach this weekend though.

I would be happy with a __future__ that enacted whatever concessions are necessary to define a module as if it were a class body, with import statements maybe being implicitly global. This "new-style" module would preferably avoid the need to populate `sys.modules` with something that can't possibly exist yet (since it's being defined!). Maybe we allow module bodies to contain a `return` or `yield`, making them a simple function or generator? The presence of either would activate this "new-style" module loading:

* Modules that call `return` should return the completed module. Importing yourself indirectly would likely cause recursion or be an error (lazy importing would really help here!). Could conceptually expand to something like:

```
global __class__
global __self__

class __class__:
    def __new__(... namespace-dunders-and-builtins-passed-as-kwds ...):
        # ... module code ...
        # ... closures may access __self__ and __class__ ...
        return FancyModule(__name__)

__self__ = __class__(__builtins__={...}, __name__='fancy', ...)
sys.modules[__self__.__name__] = __self__
```

* Modules that call `yield` should yield modules. This could allow defining zero modules, multiple modules, overwriting the same module multiple times. Module-level code may then yield an initial object so self-referential imports, in lieu of deferred loading, work better. They might decide to later upgrade the initial module's __class__ (similar to today) or replace outright. Could conceptually expand to something like:

```
global __class__
global __self__

def __hidden_TOS(... namespace-dunders-and-builtins-passed-as-kwds ...):
    # ... initial module code ...
    # ... closures may access __self__ and __class__ ...
    module = yield FancyModuleInitialThatMightRaiseIfUsed(__name__)
    # ... more module code ...
    module.__class__ = FancyModule

for __self__ in __hidden_TOS(__builtins__={...}, __name__='fancy', ...):
    __class__ = __self__.__class__
    sys.modules[__self__.__name__] = __self__
```

Otherwise I still have a few ideas around using what we've got, possibly in a backwards compatible way:

```
global __builtins__ = {...}
global __class__
global __self__

# Loader dunders.
__name__ = 'fancy'

# Deferred loading could likely stop this from raising in most cases.
# globals is a deferred import dict using __missing__.
# possibly sys.modules itself does deferred imports using __missing__.
sys.modules[__name__] = RaiseIfTouchedElseReplaceAllRefs(globals())

class __class__:
    [global] import current_module  # ref in cells replaced with __self__
    [global] import other_module

    def bound_module_function(...):
        pass

    [global] def simple_module_function(...):
        pass

    # ... end module body ...

    # Likely still a descriptor.
    __dict__ = globals()

__self__ = __class__()
sys.modules[__self__.__name__] = __self__
```

Something to think about.

Thanks,

--
C Anthony
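The "upgrade the initial module's __class__ (similar to today)" step mentioned in this message is something CPython already supports for subclasses of types.ModuleType. The sketch below shows only that existing mechanism, not the proposed new-style loading; `FancyModule` is reused from the message as a name, while `version` and `version_tuple` are made up for the example.

```python
import types


class FancyModule(types.ModuleType):
    """Hypothetical module type adding a read-only computed attribute."""

    @property
    def version_tuple(self):
        return tuple(int(part) for part in self.version.split("."))


# Start with a plain module, as the import system would create it.
plain = types.ModuleType("fancy")
plain.version = "1.4.2"

# Upgrade it in place; existing references to the module stay valid.
plain.__class__ = FancyModule

print(type(plain).__name__, plain.version_tuple)
```

Inside a real module the same assignment would typically be written against `sys.modules[__name__]`. The property is then visible to importers of the module, though not to global name lookups inside the module itself.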
On 13 September 2017 at 02:17, Neil Schemenauer <nas-python-ideas@arctrix.com> wrote:
> Introducing another special feature of modules to make this work is not the solution, IMHO. We should make module namespaces be more like instance namespaces. We already have a mechanism and it is getattr on objects.

One thing to keep in mind is that class instances *also* allow their attribute access machinery to be bypassed by writing to the instance.__dict__ directly - it's just that the instance dict may be bypassed on lookup for data descriptors.

So that means we wouldn't need to change the way globals() works - we'd just add the caveat that amendments made that way may be ignored for things defined as properties.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
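The instance behaviour referred to above can be checked with an ordinary class. This is a small sketch with made-up names (`Config`, `mode`, `level`); it shows that a direct write to the instance __dict__ is accepted, but is ignored on lookup when the name is a property, since a property is a data descriptor.

```python
class Config:
    """Hypothetical class with one property and no other attributes."""

    @property
    def mode(self):
        return "from property"


cfg = Config()

# Writing to the instance dict bypasses the attribute machinery...
cfg.__dict__["mode"] = "from dict"
cfg.__dict__["level"] = "from dict"

# ...but a property is a data descriptor, so it wins on lookup,
# while an ordinary name is read back from the dict as usual.
print(cfg.mode, "|", cfg.level)
```

Applied to modules, this is the caveat described in the message: a value stored through globals() under the name of a module property would sit in the dict without being seen by attribute access.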
12.09.17 19:17, Neil Schemenauer wrote:
> This is my idea of making module properties work. It is necessary for various lazy-loading module ideas and it cleans up the language IMHO. I think it may be possible to do it with minimal backwards compatibility problems and performance regression.
>
> To me, the main issue with module properties (or module __getattr__) is that you introduce another level of indirection on global variable access. Anywhere the module.__dict__ is used as the globals for code execution, changing LOAD_NAME/LOAD_GLOBAL to have another level of indirection is necessary. That seems inescapable.
>
> Introducing another special feature of modules to make this work is not the solution, IMHO. We should make module namespaces be more like instance namespaces. We already have a mechanism and it is getattr on objects.
There is a difference between module namespaces and instance namespaces. LOAD_NAME/LOAD_GLOBAL fall back to builtins if the name is not found in the globals dictionary. Calling __getattr__() will slow down the access to builtins. And there is a recursion problem if module's __getattr__() uses builtins.
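The two-step lookup described here (globals first, then builtins) is easy to observe with exec(). The sketch below uses a plain dict as globals and made-up result names; it only shows the existing fallback order, which is the path that an extra __getattr__ call on every miss would slow down.

```python
namespace = {}

# 'len' is not in the globals dict, so the lookup falls back to builtins.
exec("before = len('abc')", namespace)

# Once a global of the same name exists, it shadows the builtin.
namespace["len"] = lambda obj: -1
exec("after = len('abc')", namespace)

print(namespace["before"], namespace["after"])
```

Every use of a builtin such as len or print starts with a miss in the globals namespace, so whatever hook runs on a miss would run on each of those accesses.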
On Wed, Sep 13, 2017 at 11:55 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
> [...] Calling __getattr__() will slow down the access to builtins. And there is a recursion problem if module's __getattr__() uses builtins.

The first point is totally valid, but the recursion problem doesn't seem like a strong argument. There are already lots of recursion problems when defining custom __getattr__ or __getattribute__ methods, but on balance they're a very useful part of the language.

- Lucas
13.09.17 23:07, Lucas Wiman wrote:
> On Wed, Sep 13, 2017 at 11:55 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
>> [...] Calling __getattr__() will slow down the access to builtins. And there is a recursion problem if module's __getattr__() uses builtins.
>
> The first point is totally valid, but the recursion problem doesn't seem like a strong argument. There are already lots of recursion problems when defining custom __getattr__ or __getattribute__ methods, but on balance they're a very useful part of the language.
In normal classes we have the recursion problem in __getattr__() only when accessing instance attributes. Builtins (like isinstance, getattr, AttributeError) can be used without problems. In a module's __getattr__() all of this is a problem.

Module attribute access can be implicit. For example, comparing a string with a bytes object in __getattr__() can trigger the lookup of __warningregistry__ and cause infinite recursion.
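For comparison, the class-level situation described here looks as follows. This is a sketch with made-up names (`Proxy`, `_target`): inside an instance's __getattr__, builtins are safe to use, and only a missing instance attribute re-enters the hook.

```python
class Proxy:
    """Hypothetical proxy that forgot to create self._target in __init__."""

    def __getattr__(self, name):
        # Builtins such as isinstance() are resolved through the function's
        # globals and builtins, not through this instance: no recursion.
        if not isinstance(name, str):
            raise AttributeError(name)
        # self._target is missing, so this lookup re-enters __getattr__
        # with name == "_target", again and again.
        return getattr(self._target, name)


try:
    Proxy().anything
    outcome = "no error"
except RecursionError:
    outcome = "RecursionError"

print(outcome)
```

If global name lookup itself went through a module's __getattr__, the isinstance and getattr lookups in such a hook would become re-entry points as well, which is the concern raised in this message.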
On Fri, Sep 15, 2017 at 12:08 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
> 13.09.17 23:07, Lucas Wiman wrote:
>> On Wed, Sep 13, 2017 at 11:55 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
>>> [...] Calling __getattr__() will slow down the access to builtins. And there is a recursion problem if module's __getattr__() uses builtins.
>>
>> The first point is totally valid, but the recursion problem doesn't seem like a strong argument. There are already lots of recursion problems when defining custom __getattr__ or __getattribute__ methods, but on balance they're a very useful part of the language.
>
> In normal classes we have the recursion problem in __getattr__() only when accessing instance attributes. Builtins (like isinstance, getattr, AttributeError) can be used without problems. In a module's __getattr__() all of this is a problem.
>
> Module attribute access can be implicit. For example, comparing a string with a bytes object in __getattr__() can trigger the lookup of __warningregistry__ and cause infinite recursion.
Crazy idea: Can we just isolate that function from its module?

    def isolate(func):
        return type(func)(func.__code__, {"__builtins__": __builtins__}, func.__name__)

    @isolate
    def __getattr__(name):
        print("Looking up", name)  # the lookup of 'print' will skip this module

ChrisA
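The isolate() idea above can be exercised on an ordinary function to see what it does to name lookup. The sketch below is a variation under stated assumptions: it passes the `builtins` module explicitly instead of `__builtins__`, and the names `hidden` and `probe` are made up for the example. The rebuilt function drops defaults and closures, so it is not a general-purpose decorator.

```python
import builtins
import types

hidden = "module-level value"


def isolate(func):
    # Rebuild the function with a fresh globals dict that holds only the
    # builtins. Defaults and closures are not carried over in this sketch.
    return types.FunctionType(
        func.__code__, {"__builtins__": builtins}, func.__name__
    )


@isolate
def probe():
    length = len("abc")   # builtins still resolve
    try:
        hidden            # module globals do not
        seen = True
    except NameError:
        seen = False
    return length, seen


print(probe())
```

Because the isolated function never consults the defining module's namespace, a module-level hook written this way could not trigger itself through its own global lookups.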
participants (8)

- C Anthony Risinger
- Chris Angelico
- INADA Naoki
- Lucas Wiman
- Neil Schemenauer
- Nick Coghlan
- Serhiy Storchaka
- Steven D'Aprano