Refactoring the import system to be object-oriented at the top level
(Disclaimer: this is complicated Py-in-the-sky stuff, and I'm handwaving away a lot of major problems with the concept, not least of which is the sheer amount of work involved. I just wanted to get the idea published somewhere while I was thinking about it)

I'm in the process of implementing a runpy.run_path function for 2.7/3.2 to allow Python code to use the zipfile and directory execution feature provided by the CPython command line in 2.6/3.1. It turns out the global state used for the import system is causing some major pain in the implementation. It's solvable, but it will probably involve a couple of rather ugly hacks and the result sure as hell isn't going to be thread-safe.

Anyway, the gist of the idea in the subject line is to take all of the PEP 302 data stores and make them attributes of an ImportEngine class. This would affect at least:

  sys.modules
  sys.path
  sys.path_hooks
  sys.path_importer_cache
  sys.meta_path
  sys.dont_write_bytecode

The underlying import machinery would migrate to instance methods of the new class.

The standard import engine instance would be stored in a new sys module attribute (e.g. 'sys.import_engine'). For backwards compatibility, the existing sys attributes would remain as references to the relevant instance attributes of the standard engine.

Modules would get a new special attribute (e.g. '__import_engine__') identifying the import engine that was used to import that module. __import__ would be modified to take the new special attribute into account.

The main immediate benefit from my point of view would be to allow runpy to create *copies* of the standard import engine so that runpy.run_module and runpy.run_path could go do their thing without impacting the rest of the interpreter. At the moment that really isn't feasible, hence the lack of thread safety in that module.
I suspect such a change would greatly simplify experimentation with Python security models as well: restricted code could be given a restricted import engine rather than the restrictions having to be baked in to the standard import engine.
From an OO design point of view, it's a classic migration of global state and multiple functions to manipulate that state into a single global instance of a new class.
Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
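The state migration described in this first post can be sketched roughly as follows. This is only an illustration, not code from the thread: the class name ImportEngine comes from the post, but the from_sys() and copy() helpers are hypothetical, and the sketch covers only the data stores, not the import machinery itself.

```python
import sys


class ImportEngine:
    """Hypothetical holder for the import state listed in the post."""

    def __init__(self, modules, path, path_hooks, path_importer_cache,
                 meta_path, dont_write_bytecode):
        self.modules = modules
        self.path = path
        self.path_hooks = path_hooks
        self.path_importer_cache = path_importer_cache
        self.meta_path = meta_path
        self.dont_write_bytecode = dont_write_bytecode

    @classmethod
    def from_sys(cls):
        """Snapshot the interpreter's current import state (assumed helper)."""
        return cls(
            modules=dict(sys.modules),
            path=list(sys.path),
            path_hooks=list(sys.path_hooks),
            path_importer_cache=dict(sys.path_importer_cache),
            meta_path=list(sys.meta_path),
            dont_write_bytecode=sys.dont_write_bytecode,
        )

    def copy(self):
        """Return an engine whose containers can be mutated independently."""
        return type(self)(
            modules=dict(self.modules),
            path=list(self.path),
            path_hooks=list(self.path_hooks),
            path_importer_cache=dict(self.path_importer_cache),
            meta_path=list(self.meta_path),
            dont_write_bytecode=self.dont_write_bytecode,
        )


# A runpy-style caller could work on a private copy of the state.
standard = ImportEngine.from_sys()
scratch = standard.copy()
scratch.path.insert(0, "/hypothetical/script/dir")
```

Changes made to the copy (here, the extra path entry) leave both the original engine and the real sys.path untouched, which is the property the post wants for runpy.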
Sounds like the ClassLoader in Java. I think it would be a good idea.

On 11/08/2009 05:24 AM, Nick Coghlan wrote:
> ... Anyway, the gist of the idea in the subject line is to take all of the PEP 302 data stores and make them attributes of an ImportEngine class. This would affect at least:
>
>   sys.modules
>   sys.path
>   sys.path_hooks
>   sys.path_importer_cache
>   sys.meta_path
>   sys.dont_write_bytecode
>
> The underlying import machinery would migrate to instance methods of the new class.
>
> The standard import engine instance would be stored in a new sys module attribute (e.g. 'sys.import_engine'). For backwards compatibility, the existing sys attributes would remain as references to the relevant instance attributes of the standard engine.
>
> Modules would get a new special attribute (e.g. '__import_engine__') identifying the import engine that was used to import that module. __import__ would be modified to take the new special attribute into account. ...
Mathias Panzenböck wrote:
> Sounds like the ClassLoader in Java. I think it would be a good idea.

On further reflection, it occurred to me that it should be possible to do something along these lines with importlib, *without* necessarily replacing the builtin import machinery (i.e. having a special instance of the class that mapped its instance attributes to the appropriate sys module attributes).

Unfortunately, I doubt I'll have the cycles any time soon to pursue the idea further :P

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
On Sat, Nov 7, 2009 at 20:24, Nick Coghlan <ncoghlan@gmail.com> wrote:
> (Disclaimer: this is complicated Py-in-the-sky stuff, and I'm handwaving away a lot of major problems with the concept, not least of which is the sheer amount of work involved. I just wanted to get the idea published somewhere while I was thinking about it)
>
> I'm in the process of implementing a runpy.run_path function for 2.7/3.2 to allow Python code to use the zipfile and directory execution feature provided by the CPython command line in 2.6/3.1. It turns out the global state used for the import system is causing some major pain in the implementation. It's solvable, but it will probably involve a couple of rather ugly hacks and the result sure as hell isn't going to be thread-safe.
>
> Anyway, the gist of the idea in the subject line is to take all of the PEP 302 data stores and make them attributes of an ImportEngine class. This would affect at least:
>
>   sys.modules
>   sys.path
>   sys.path_hooks
>   sys.path_importer_cache
>   sys.meta_path
>   sys.dont_write_bytecode
>
> The underlying import machinery would migrate to instance methods of the new class.

Do you really mean methods or just instance attributes? I personally don't care, but it does require more of an API design otherwise.

> The standard import engine instance would be stored in a new sys module attribute (e.g. 'sys.import_engine'). For backwards compatibility, the existing sys attributes would remain as references to the relevant instance attributes of the standard engine.

How would that work? Because they are module attributes there is no way to use a property to have them return what the current sys.import_engine uses.

> Modules would get a new special attribute (e.g. '__import_engine__') identifying the import engine that was used to import that module. __import__ would be modified to take the new special attribute into account.

Take into account how? As in when importing a package to always use the import engine used for the parent module in the package?

> The main immediate benefit from my point of view would be to allow runpy to create *copies* of the standard import engine so that runpy.run_module and runpy.run_path could go do their thing without impacting the rest of the interpreter. At the moment that really isn't feasible, hence the lack of thread safety in that module.
>
> I suspect such a change would greatly simplify experimentation with Python security models as well: restricted code could be given a restricted import engine rather than the restrictions having to be baked in to the standard import engine.

Huh, I wonder what made you think about that example? =)

> From an OO design point of view, it's a classic migration of global state and multiple functions to manipulate that state into a single global instance of a new class.

If anything it makes it easier to discover everything that affects importing instead of having to crawl through the sys docs to find every attribute that happens to mention the word "import".

-Brett
Brett Cannon wrote:
>> The underlying import machinery would migrate to instance methods of the new class.
>
> Do you really mean methods or just instance attributes? I personally don't care, but it does require more of an API design otherwise.

I did mean methods, but I also realise how much work would be involved in actually following up on this idea. (As the saying goes, real innovation is 1% inspiration, 99% perspiration!)

If you don't move the machinery itself into instance methods then you just end up having to pass the storage object around to various functions. Might as well make that parameter 'self' and use methods.

>> The standard import engine instance would be stored in a new sys module attribute (e.g. 'sys.import_engine'). For backwards compatibility, the existing sys attributes would remain as references to the relevant instance attributes of the standard engine.
>
> How would that work? Because they are module attributes there is no way to use a property to have them return what the current sys.import_engine uses.

Yes, I eventually realised it would be better to turn the dependency around the other way (i.e. have an engine subclass that used properties to refer to the sys module attributes)

>> Modules would get a new special attribute (e.g. '__import_engine__') identifying the import engine that was used to import that module. __import__ would be modified to take the new special attribute into account.
>
> Take into account how? As in when importing a package to always use the import engine used for the parent module in the package?

Yes, that's what I was thinking. That would be necessary to allow operations like the runpy methods to execute without having side effects on the main import machinery the way they do now.

Having run_path and run_module functions that were as side-effect free (and hence thread-safe) as exec and execfile would be kind of cool.

>> I suspect such a change would greatly simplify experimentation with Python security models as well: restricted code could be given a restricted import engine rather than the restrictions having to be baked in to the standard import engine.
>
> Huh, I wonder what made you think about that example? =)

Not *just* your efforts over the last few years, although those were definitely a major inspiration :)

>> From an OO design point of view, it's a classic migration of global state and multiple functions to manipulate that state into a single global instance of a new class.
>
> If anything it makes it easier to discover everything that affects importing instead of having to crawl through the sys docs to find every attribute that happens to mention the word "import".

Heck, *I* had to stare at dir(sys) for a while to make the list of import-related attributes in my post and I've been working on import-related code for years. Even then I almost missed 'dont_write_bytecode' and wouldn't be the least surprised if someone pointed out that I actually had missed something else :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
On Sun, Nov 8, 2009 at 13:20, Nick Coghlan <ncoghlan@gmail.com> wrote:
> Brett Cannon wrote:
>>> The underlying import machinery would migrate to instance methods of the new class.
>>
>> Do you really mean methods or just instance attributes? I personally don't care, but it does require more of an API design otherwise.
>
> I did mean methods, but I also realise how much work would be involved in actually following up on this idea. (As the saying goes, real innovation is 1% inspiration, 99% perspiration!)
>
> If you don't move the machinery itself into instance methods then you just end up having to pass the storage object around to various functions. Might as well make that parameter 'self' and use methods.

I don't quite follow. What difference does it make if they are instance attributes compared to methods? The data still needs to be stored somewhere that is unique per instance to get the semantics you want.

The other thing you could do with this is provide import_module() on the object so it is a fully self-contained object that can do an entire import on its own without having to touch anything else (heck, you could even go so far as to have their own module cache, but that might be too far as all loaders currently are expected to work with sys.modules).

>>> The standard import engine instance would be stored in a new sys module attribute (e.g. 'sys.import_engine'). For backwards compatibility, the existing sys attributes would remain as references to the relevant instance attributes of the standard engine.
>>
>> How would that work? Because they are module attributes there is no way to use a property to have them return what the current sys.import_engine uses.
>
> Yes, I eventually realised it would be better to turn the dependency around the other way (i.e. have an engine subclass that used properties to refer to the sys module attributes)

Yeah, you could have it default to the attributes on the sys module if no instance attributes are set.

>>> Modules would get a new special attribute (e.g. '__import_engine__') identifying the import engine that was used to import that module. __import__ would be modified to take the new special attribute into account.
>>
>> Take into account how? As in when importing a package to always use the import engine used for the parent module in the package?
>
> Yes, that's what I was thinking. That would be necessary to allow operations like the runpy methods to execute without having side effects on the main import machinery the way they do now.
>
> Having run_path and run_module functions that were as side-effect free (and hence thread-safe) as exec and execfile would be kind of cool.

That would be nice to have.

>>> I suspect such a change would greatly simplify experimentation with Python security models as well: restricted code could be given a restricted import engine rather than the restrictions having to be baked in to the standard import engine.
>>
>> Huh, I wonder what made you think about that example? =)
>
> Not *just* your efforts over the last few years, although those were definitely a major inspiration :)
>
>>> From an OO design point of view, it's a classic migration of global state and multiple functions to manipulate that state into a single global instance of a new class.
>>
>> If anything it makes it easier to discover everything that affects importing instead of having to crawl through the sys docs to find every attribute that happens to mention the word "import".
>
> Heck, *I* had to stare at dir(sys) for a while to make the list of import-related attributes in my post and I've been working on import-related code for years. Even then I almost missed 'dont_write_bytecode' and wouldn't be the least surprised if someone pointed out that I actually had missed something else :)

Yeah, there are a lot of them.
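The import_module() suggestion in this message, an engine that completes an import using only its own state and its own module cache, might look something like the sketch below. It is written against the present-day importlib API (find_spec, module_from_spec), which postdates this thread, and the names SelfContainedEngine and DictFinder are invented for the example. It handles top-level modules only, and any import statements executed inside the loaded module would still go through the normal global machinery.

```python
import importlib.abc
import importlib.util
import sys


class DictFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader that serves module source from a dict."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def exec_module(self, module):
        exec(self.sources[module.__name__], module.__dict__)


class SelfContainedEngine:
    """Hypothetical engine with its own finder list and module cache."""

    def __init__(self, meta_path):
        self.meta_path = list(meta_path)
        self.modules = {}  # per-engine cache, separate from sys.modules

    def import_module(self, name):
        """Import a top-level module without touching sys.modules."""
        if name in self.modules:
            return self.modules[name]
        for finder in self.meta_path:
            spec = finder.find_spec(name, None)
            if spec is not None:
                break
        else:
            raise ImportError("no module named %r" % name, name=name)
        module = importlib.util.module_from_spec(spec)
        self.modules[name] = module
        try:
            spec.loader.exec_module(module)
        except BaseException:
            # Do not leave a half-initialised module in the cache.
            del self.modules[name]
            raise
        return module


engine = SelfContainedEngine([DictFinder({"demo_mod": "VALUE = 42"})])
demo = engine.import_module("demo_mod")
```

Because the engine does the caching itself, the loader here never needs to know about sys.modules, which sidesteps the concern raised above only for loaders written this way; existing loaders that write to sys.modules directly would still do so.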
Brett Cannon wrote:
> On Sun, Nov 8, 2009 at 13:20, Nick Coghlan <ncoghlan@gmail.com> wrote:
>> Brett Cannon wrote:
>>>> The underlying import machinery would migrate to instance methods of the new class.
>>>
>>> Do you really mean methods or just instance attributes? I personally don't care, but it does require more of an API design otherwise.
>>
>> I did mean methods, but I also realise how much work would be involved in actually following up on this idea. (As the saying goes, real innovation is 1% inspiration, 99% perspiration!)
>>
>> If you don't move the machinery itself into instance methods then you just end up having to pass the storage object around to various functions. Might as well make that parameter 'self' and use methods.
>
> I don't quite follow. What difference does it make if they are instance attributes compared to methods? The data still needs to be stored somewhere that is unique per instance to get the semantics you want.
>
> The other thing you could do with this is provide import_module() on the object so it is a fully self-contained object that can do an entire import on its own without having to touch anything else (heck, you could even go so far as to have their own module cache, but that might be too far as all loaders currently are expected to work with sys.modules).
Slight miscommunication there: by "underlying import machinery" I meant the functions that currently do the heavy lifting for imports (i.e. most of the code in import.c), along with their equivalents in importlib. The sys attribute equivalents would indeed just be normal attributes on the as-yet-hypothetical ImportEngine instances.

I suspect you're right that there would be problems with the PEP 302 design currently encouraging loader and importer implementations to work with the sys attributes directly - backwards compatibility on that front is one of the big issues I was handwaving away in the original post.

A PEP 3115 inspired thought is it may make sense to allow loaders to split load_module() into two distinct steps (prepare_module() and exec_module()) and leave the sys.modules manipulation to the import engine.

That is (using the sample load_module() implementation from PEP 302), something along the lines of:

    def prepare_module(self, fullname, mod=None):
        if mod is None:
            mod = imp.new_module(fullname)
        mod.__file__ = "<%s>" % self.__class__.__name__
        mod.__loader__ = self
        if self._is_package(fullname):
            mod.__path__ = []
        return mod

    def exec_module(self, fullname, mod):
        exec self._get_code(fullname) in mod.__dict__

The key difference here is that module caching becomes entirely the responsibility of the import engine rather than relying on each loader to do it correctly. It would also give the import engine a chance to monkey with the module globals before the module code is executed (e.g. ensuring __package__ is set, setting a new __import_engine__ variable, overriding __import__ to play nicely with the current import engine)

If a non-global import system adopted such an alternate loader protocol it could easily avoid invoking standard loaders that directly manipulated the sys attributes.
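To make the division of labour concrete, here is a rough Python 3 sketch of how an engine might drive a loader that follows the two-step protocol proposed above. Only prepare_module(), exec_module() and __import_engine__ come from this message; the InMemoryLoader class, the engine's load() method and the use of types.ModuleType in place of imp.new_module are assumptions made for the illustration, and package handling is left out.

```python
import types


class InMemoryLoader:
    """Hypothetical loader implementing the proposed two-step protocol."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    def prepare_module(self, fullname, mod=None):
        # Step 1: create and decorate the module, but do not cache or run it.
        if mod is None:
            mod = types.ModuleType(fullname)
        mod.__file__ = "<%s>" % self.__class__.__name__
        mod.__loader__ = self
        return mod

    def exec_module(self, fullname, mod):
        # Step 2: run the module's code in the prepared namespace.
        exec(self.sources[fullname], mod.__dict__)


class ImportEngine:
    """Hypothetical engine that owns module caching for such loaders."""

    def __init__(self):
        self.modules = {}

    def load(self, loader, fullname):
        if fullname in self.modules:
            return self.modules[fullname]
        mod = loader.prepare_module(fullname)
        # The engine adjusts the module globals before any code runs.
        mod.__package__ = fullname.rpartition(".")[0]
        mod.__import_engine__ = self
        # Caching is the engine's job, not the loader's.
        self.modules[fullname] = mod
        try:
            loader.exec_module(fullname, mod)
        except BaseException:
            del self.modules[fullname]
            raise
        return mod


engine = ImportEngine()
loader = InMemoryLoader({"demo": "seen_engine = __import_engine__"})
demo = engine.load(loader, "demo")
```

The module body can see __import_engine__ while it executes because the engine sets it between the two steps, which is the hook a single load_module() call does not offer.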
>>>> The standard import engine instance would be stored in a new sys module attribute (e.g. 'sys.import_engine'). For backwards compatibility, the existing sys attributes would remain as references to the relevant instance attributes of the standard engine.
>>>
>>> How would that work? Because they are module attributes there is no way to use a property to have them return what the current sys.import_engine uses.
>>
>> Yes, I eventually realised it would be better to turn the dependency around the other way (i.e. have an engine subclass that used properties to refer to the sys module attributes)
>
> Yeah, you could have it default to the attributes on the sys module if no instance attributes are set.

I was actually thinking of a SysImportEngine subclass that turned them all into properties that referenced the appropriate objects in sys.

I'm starting to convince myself that I should *find* the time to experiment with this in the sandbox... then again, I wouldn't be entirely surprised if Guido deemed all this outright abuse of the import system :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
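One possible shape for the SysImportEngine subclass mentioned above is sketched below. The base class layout and the _sys_attribute() helper are assumptions made for the example; the thread only establishes that the subclass would expose the sys attributes through properties.

```python
import sys


class ImportEngine:
    """Hypothetical engine that keeps its import state on the instance."""

    def __init__(self):
        self.modules = {}
        self.path = []
        self.path_hooks = []
        self.path_importer_cache = {}
        self.meta_path = []
        self.dont_write_bytecode = False


def _sys_attribute(name):
    """Build a property that reads and writes the named sys attribute."""

    def getter(self):
        return getattr(sys, name)

    def setter(self, value):
        setattr(sys, name, value)

    return property(getter, setter)


class SysImportEngine(ImportEngine):
    """Hypothetical engine whose state is a live view of the sys module."""

    def __init__(self):
        # Deliberately skip ImportEngine.__init__: its assignments would go
        # through the property setters below and overwrite the sys attributes.
        pass

    modules = _sys_attribute("modules")
    path = _sys_attribute("path")
    path_hooks = _sys_attribute("path_hooks")
    path_importer_cache = _sys_attribute("path_importer_cache")
    meta_path = _sys_attribute("meta_path")
    dont_write_bytecode = _sys_attribute("dont_write_bytecode")


standard_engine = SysImportEngine()
```

Since the properties look up sys on every access, the engine keeps tracking the interpreter state even if code rebinds sys.path to a new list, which a plain reference stored at construction time would not do.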
participants (3)
- Brett Cannon
- Mathias Panzenböck
- Nick Coghlan