Idea to support lazy-loaded names.

When developing Python libraries (and in any other language, for that matter), items are separated into modules for organization and other purposes. The following fake framework illustrates this:

    miniframework/
        __init__.py
        app.py
            class BaseApplication
            class Application
            class ProxyApplication
        config.py
            class Config
            class ConfigLoader
        template.py
            class Template
            class TemplateCompiler
            class TemplateManager

In order to be used externally, one must reference the module:

    from miniframework.app import Application
    from miniframework.template import TemplateManager

This can be solved by importing these values directly in __init__.py. However, this requires fully loading those imported modules:

    # __init__.py
    from .app import BaseApplication, Application, ProxyApplication
    from .config import Config, ConfigLoader
    from .template import Template, TemplateCompiler, TemplateManager

One idea I had was to support lazily imported names. I've seen some frameworks implement this in various ways, and figured the idea is good enough to be part of Python. The idea is for a new module attribute to exist: __lazyload__. During access of any attribute of a module, the following would happen:

* If the named attribute already exists, use it
* If the named attribute does not already exist:
  * If a lazy load of the name has already been attempted, result in a NameError
  * If a lazy load of the name has not yet been attempted:
    * Check the __lazyload__ module attribute for the name; perform the loading operation and assign the value to the module attribute if found, or result in a NameError
    * Even if not found, set a flag that the lazy load has been attempted, so it will not be attempted again for the same name

The __lazyload__ attribute is intended to be a dictionary. The key of the dictionary is the name of the attribute that would be set/tested for in the module. The value of the dictionary is a string that represents either the module name to load, or the module name and attribute to load. If the value starts with a dot, it is treated as a relative import, relative to the module/package containing the __lazyload__ value. With this idea, the package's __init__.py file could look like this:

    __lazyload__ = {
        'BaseApplication': '.app.BaseApplication',
        'Application': '.app.Application',
        ...
    }

The end user of the package (and even the developer) can then perform an import as follows:

    from miniframework import Application

instead of:

    from miniframework.app import Application

This allows the public API to be cleaner, while still being efficient by not loading all the modules in __init__.py until a value is actually accessed.

Brian Allen Vanderburg II
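To make the proposed lookup concrete, here is a minimal sketch of how it could be emulated today with a ModuleType subclass. The LazyModule name is invented, the sketch assumes the "module.attribute" form for values, and (unlike the proposal above) it does not negatively cache failed loads:

    import importlib
    import types

    class LazyModule(types.ModuleType):
        """Resolve missing attributes via a __lazyload__ mapping (sketch)."""

        def __getattr__(self, name):
            # __getattr__ only runs when normal lookup fails, so anything
            # already in the module __dict__ is used directly.
            try:
                target = self.__dict__['__lazyload__'][name]
            except KeyError:
                raise AttributeError(name) from None
            # '.app.Application' -> ('.app', 'Application'); a leading dot
            # makes importlib resolve the module relative to this package.
            modname, _, attr = target.rpartition('.')
            module = importlib.import_module(modname, self.__package__)
            value = getattr(module, attr)
            setattr(self, name, value)  # cache so later lookups skip __getattr__
            return value

    # miniframework/__init__.py could then opt in (CPython 3.5+ permits
    # re-classing a live module object):
    #     import sys
    #     __lazyload__ = {'Application': '.app.Application'}
    #     sys.modules[__name__].__class__ = LazyModule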

On Mon, Oct 06, 2014 at 09:34:16PM -0400, Brian Allen Vanderburg II wrote:
[...]
This is a specific example of a more general technique, namely computed attributes. Rather than a solution to the specific example, I would rather have a general solution to the problem of having computed attributes in modules.

We have half a solution already: the descriptor protocol, the most obvious example being property(). Well, perhaps a quarter of a solution. Assuming property(), or some equivalent, worked in modules, the solution to lazy loading is simple: have a getter that returns the attribute if it already exists, otherwise initialise it (importing it from elsewhere) then return it.

The problem then becomes: how do we get descriptors to be run when accessing them via module attributes? The normal process is that if a name is found in the instance __dict__, the descriptor protocol is not followed:

    py> class Test(object):
    ...     spam = property(lambda self: 23)
    ...     def __init__(self):
    ...         self.eggs = property(lambda self: 42)
    ...
    py> x = Test()
    py> x.spam
    23
    py> x.eggs
    <property object at 0xb7168c5c>

This is relevant because modules are instances, and module attributes live in the instance __dict__.
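For concreteness, here is the kind of computed attribute that already works on classes, written as a non-data descriptor so the first access caches the value in the instance __dict__ (a sketch; lazy_import and the attribute names are invented):

    import importlib

    class lazy_import:
        """Non-data descriptor: import on first access, then cache the
        value in the instance __dict__ so __get__ is never called again."""
        def __init__(self, name, modname, attr):
            self.name, self.modname, self.attr = name, modname, attr
        def __get__(self, obj, objtype=None):
            value = getattr(importlib.import_module(self.modname), self.attr)
            setattr(obj, self.name, value)  # instance __dict__ now wins
            return value

    class API:
        # Works here because API is a class; the open question above is
        # how to get the same behaviour for attributes of a module.
        dumps = lazy_import('dumps', 'json', 'dumps')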
I don't think that you should record that the import has been tried and failed. Regular imports cache successes, not failures, and lazy imports should do the same: just because an import failed a minute ago doesn't mean that it will fail now. -- Steven
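Compare the regular import machinery, which leaves nothing behind on failure (the module name below is hypothetical):

    import sys

    try:
        import some_missing_module  # hypothetical; not importable right now
    except ImportError:
        pass

    # The failure was not cached:
    assert 'some_missing_module' not in sys.modules
    # so a later attempt starts from scratch and succeeds if the module
    # has become available in the meantime.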

On 10/6/2014 11:30 PM, Steven D'Aprano wrote:
So a 'solution' might be to make modules be instances (but with no __new__ or __init__) of a module metaclass, so that module dicts could act like class dicts with respect to descriptors. I have no idea how much code this would break ;-). -- Terry Jan Reedy

On Mon, Oct 06, 2014 at 11:51:45PM -0400, Terry Reedy wrote:
Probably lots :-)

It's not common to write something like this outside of a class:

    x = property(some_function)

but if you did, you would normally expect lookups on x to return the property object itself, not call the __get__ method. That's what happens now, and so by backward compatibility that can't change.

More importantly, defining top-level functions is ubiquitous in Python code. And functions obey the descriptor protocol! That's how functions magically turn into methods when put inside a class. So if modules called __get__ by default, every single function lookup would break. That would be bad. So any change to the behaviour of module lookups would need to be opt-in. -- Steven
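A quick demonstration of that point (names invented):

    def f(self):
        return self

    class C:
        method = f  # class lookup invokes f.__get__, producing a bound method

    c = C()
    print(f.__get__(c, C))  # <bound method f of <__main__.C object at 0x...>>
    print(c.method() is c)  # True: plain functions are (non-data) descriptors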

On Oct 6, 2014, at 20:51, Terry Reedy <tjreedy@udel.edu> wrote:
Didn't we just have this discussion a few weeks ago, in the context of making lazy loading of subpackages easier to implement? IIRC, the not-obviously-unreasonable options suggested were:

1) Analogy with __new__: For packages only, if there's a __new__.py, that gets executed first. If it "returns" (not sure how that was defined) an instance of a subclass of ModuleType, that instance is used to run __init__.py instead of a normal module instance.

2) Analogy with metaclass= (or with 2.x __metaclass__): If a module (or a package's __init__.py) does some new syntax or magic comment before any non-comment code, it can specify a custom type in place of ModuleType (not sure how that type gets imported and made available).

3) Analogy with re-classing object instances: Just allow modules to set __class__ during execution (or after, if you want). Variations on this include allowing that for all non-heap types, or even getting rid of the concept of non-heap types.

4) Make it easier to write import hooks for this special purpose.

5) Make it easier for a module to construct a ModuleType-subclass instance with a copy of the same dict and replace itself in sys.modules.

It seems like any of these would allow defining properties, other descriptors, and special methods on modules.
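A minimal sketch of option 3, assuming an interpreter where module __class__ assignment is allowed (CPython has permitted this since 3.5; the module and attribute names are illustrative):

    # somewhere in mymodule.py
    import sys
    import types

    class _PropertyModule(types.ModuleType):
        @property
        def answer(self):
            # computed on every access, like a property on any instance
            return 6 * 7

    # Re-class the live module object so descriptors defined on the
    # subclass take effect for ordinary attribute access:
    sys.modules[__name__].__class__ = _PropertyModule

    # elsewhere:  import mymodule; mymodule.answer  ->  42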

On 7 October 2014 13:51, Terry Reedy <tjreedy@udel.edu> wrote:
As Steven noted, making such a change by default would actually break the world, since every module level function would break when it started being treated as a method instead:

    >>> def f(): pass
    ...
    >>> hasattr(f, "__get__")
    True

Anything related to changing module attribute lookup also needs to deal with the fact that manipulating the module namespace via globals() and the "global" directive is a fully supported feature.

This is why models which put something that *isn't* a normal module (such as an ordinary class object) in sys.modules tend to be preferred. That approach is already fully supported, since the import system *retrieves the result from sys.modules*, rather than trusting what the import hooks return - that gives the module the ability to replace itself in sys.modules as a side effect of the import process.

For example, here's a rough (untested) sketch of one possible way to do a lazily calculated module attribute today (note that a class __dict__ is a read-only mappingproxy, so the globals have to be copied over with setattr):

    # rest of the module ...

    class _ThisModule:
        @property
        def calculated_attribute(self):
            return 42

    for k, v in globals().items():
        if callable(v):
            # wrap functions so attribute access doesn't bind them as methods
            v = staticmethod(v)
        setattr(_ThisModule, k, v)

    import sys
    sys.modules[__name__] = _ThisModule()

You could potentially wrap most of that dance up in a "replace_module" helper function and stick it somewhere in importlib.

Cheers,
Nick.

P.S. This is a variant of approach #5 from Andrew's list. Approach #3 would allow code along the lines of "__class__ = _ThisModule", potentially simplifying the situation even further than a helper function could.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
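The "replace_module" helper described above might look roughly like this (hypothetical; no such function exists in importlib):

    import sys

    def replace_module(module_name, cls):
        """Copy a module's globals onto cls, then swap an instance of cls
        into sys.modules in the module's place (sketch)."""
        module = sys.modules[module_name]
        for k, v in vars(module).items():
            if callable(v) and not isinstance(v, type):
                # plain functions would otherwise become bound methods
                v = staticmethod(v)
            setattr(cls, k, v)
        sys.modules[module_name] = cls()

    # usage at the bottom of a module:
    #     replace_module(__name__, _ThisModule)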

participants (5)
  - Andrew Barnert
  - Brian Allen Vanderburg II
  - Nick Coghlan
  - Steven D'Aprano
  - Terry Reedy