
I've had some offline discussion with Brett and Nick about PEP 451 which has led to some meaningful clarifications in the PEP. In the interest of pulling further discussions back onto this (archived/public) list, here's an update of what we'd discussed and where things are at. :) * path entry finders indicate that they found part of a possible namespace package by returning a spec with no loader set (but with submodule_search_locations set). Brett wanted some clarification on this. * The name/path signature and attributes of file-based finders in importlib will no longer be changing. Brett had some suggestions on the proposed change and it became clear that the the change was actually pointless. * I've asserted that there shouldn't be much difficulty in adjusting pkgutil and other modules to work with ModuleSpec. * Brett asked for clarification on whether the "load()" example from the PEP would be realized implicitly by the import machinery or explicitly as a method on ModuleSpec. This has bearing on the ability of finders to return instances of ModuleSpec subclasses or even ModuleSpec-like objects (a la duck typing). The answer is the it will not be a method on ModuleSpec, so it is effectively just part of the general import system implementation. Finders may return any object that provides the attributes of ModuleSpec. I will be updating the PEP to make these points clear. * Nick suggested writing a draft patch for the language reference changes (the import page). Such a patch will be a pretty good indicator of the impact of PEP 451 on the import system and should highlight any design flaws in the API. This is on my to-do list (hopefully by tomorrow). * Nick also suggested moving all ModuleSpec methods to a separate class that will simply make use of a separate, existing ModuleSpec instance. This will help address several issues, particularly by relaxing the constraints on what finders can return, but also by avoiding the unnecessary exposure of the methods via every module.__spec__. I plan on going with this, but currently am trying out the change to see if there are any problems I've missed. Once I feel good about it I'll update the PEP. That about sums up our discussions. I have a couple of outstanding updates to the PEP to make when I get a chance, as well as putting up a language reference patch for review. -eric

On 24 October 2013 16:05, Eric Snow <ericsnowcurrently@gmail.com> wrote:
There's also the fact Guido asked Brett and I to be co-delegates for this PEP, and we accepted (we both already agree with the overall concept, so it's just a matter of hammering out the final API details).
More specifically: importlib finders will still expose the previous import plugin API for backwards compatibility, so the worst case scenario is that we miss something and there's an API somewhere that doesn't accept import plugins that only provide the new API and not the old one. However, the PEP should explicitly state that any such omissions will be treated as bugs in the 3.4.x series (although we'll aim to handle them all in the initial implementation). Thanks for recording the details of the earlier off-list discussion :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Oct 24, 2013 at 12:05 AM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Okay, I've posted the patch to http://bugs.python.org/issue18864. I'll get to the other open issues in the next couple days. -eric

On Thu, Oct 24, 2013 at 2:05 AM, Eric Snow <ericsnowcurrently@gmail.com>wrote:
Eric's got an initial patch for this up on http://bugs.python.org/issue18864.
After reading Eric's doc patch, I realized there is one change I want to make to the current semantics and that's not to backfill __package__ when set to None. Since import is now going to take over the job of setting __package__ (along with other attributes), this seems like a slight waste of effort. It also kills (or at least complicates) having a loader which does lazy loading since reading the attribute to see if it is None would trigger the load before leaving the import code, thus killing any postponed loading.

On Fri, Oct 25, 2013 at 11:44 AM, Eric Snow <ericsnowcurrently@gmail.com>wrote:
I should also mention this applies to __loader__ as well. Basically everything from http://hg.python.org/cpython/file/eb1edc9e3722/Lib/importlib/_bootstrap.py#l... down.

I've not really had time to review this PEP yet, but from skimming discussion to date, the only thing I'm still worried about is whether this will break lazy import schemes that use a module subclass that hooks __getattribute__ and calls reload() in order to perform what's actually an *initial* load. IOW, does anything in this proposal rely on a module object having *any* attributes besides __name__ set at reload() time? That is, is there an assumption that a module being reloaded has 1. Been loaded, and 2. Is being reloaded via the same location, __loader__, etc. as before? At least through all 2.x, reload() just uses module.__name__ to restart the module find-and-load process, and does not assume that __loader__ is valid in advance. (Also, if this has changed in recent Python versions independent of this PEP, it's a backwards-compatibility break that should be documented somewhere.) On Thu, Oct 24, 2013 at 2:05 AM, Eric Snow <ericsnowcurrently@gmail.com> wrote:

On Fri, Oct 25, 2013 at 12:24 PM, PJ Eby <pje@telecommunity.com> wrote:
Depends on how you implement it probably. I have a scheme in my head using __getattribute__ and loaders which will work with this PEP if __package__ and __loader__ are not explicitly checked after execute_module() is called, so lazy imports are not going to be fundamentally broken. Since Python 3.3 lazy loaders relying on __getattribute__ have been broken due to those __package__/__loader__ checks so this is actually going to be an improvement.
IOW, does anything in this proposal rely on a module object having *any* attributes besides __name__ set at reload() time?
As it stands in Python 3.3 it requires __name__ and __loader__ be defined: http://hg.python.org/cpython/file/e2c3f638c3d0/Lib/importlib/__init__.py#l10...
Just 2 in terms of __loader__ since loaders have to work with or without the module already existing.
That doesn't make much sense in a post-importlib world where import makes sure that __loader__ is set (which it has since Python 3.3). Otherwise you are asking for not just a reload but a re-find as well. As for the PEP, it will probably keep the name/loader requirement but shift it to having to be set on the spec object at __spec__. I guess you could make the argument a reload should include re-running the finder step as well, but I don't think it's worth the code tweaking to make it happen when the finder/loader steps are divided as they are.

On Fri, Oct 25, 2013 at 1:15 PM, Brett Cannon <brett@python.org> wrote:
That's a feature, not a bug. A reload() after changing sys.path *should* take into account the change, not to mention any changes to meta_path, path hooks, etc. (And it's how reload() worked before importlib.) I suppose it's not really documented all that well, but way way back in the 2.3 timeframe I asked for a tweak to PEP 302 to make sure that reload() (in the re-find sense) would work properly with PEP 302 loaders like zipimport -- some of the language still in the PEP is there specifically to support this use case. (Specifically, the bit that says loaders *must* use the existing module object in sys.modules if there's one there already, so that reload() will work. It was actually in part to ensure that reload() would work in the case of a re-find.) It appears that since then, the PEP has been changed in a way that invalidates part of the purpose of the prior change; I guess I missed the discussion of that change last year. :-( ISTM there should've been such a discussion, since IIRC importlib wasn't supposed to change any Python semantics, and this is a non-trivial change to the semantics of reload() even in cases that aren't doing lazy imports or other such munging. reload() used to take sys.* and __path__ changes into account, and IMO should continue to do so. If this is an intentional change in reload() semantics, other Python implementations need to know about this too! (That being said, I'm not saying I shouldn't or couldn't have tested this in 3.3 and found out about it that way. And the existence of issue18698 suggests that nobody's relying yet on even the *fundamental* semantics of PEP 302 reload() working properly in 3.3, since that was an even bigger change that nobody spotted till a couple of months ago.)

On Fri, Oct 25, 2013 at 2:10 PM, PJ Eby <pje@telecommunity.com> wrote:
Fair enough, but in my mind that obviously doesn't really click for what I view as a reload in an importlib world where secret import code no longer exists. When I think re-load I think "load again", not "find the module again and execute a load with a possibly new loader". It's not worth changing in Python 3.3, but if you want to propose to redefine importlib.reload() to use importlib.find_spec() to reset the __spec__ on a module then I'm not going to argue against it (but I'm not going to fight for it either). And in a PEP 451 world it should be dead-simple to make this work the way you want in your own code even if this doesn't go the way you want:: spec = importlib.find_spec(name) module.__spec__ = spec importlib.reload(module) # Which in itself is essentially init_module_attrs(spec, module); spec.loader.exec_module(module) Heck, you can do this in Python 3.3 right now:: loader = importlib.find_loader(name) module = sys.modules[name] module.__loader__ = loader importlib.reload(module)
I suppose it's not really documented all that well,
Nope =)
Ah, okay. That is not explicit in the PEP beyond coming off a total nuisance in order to support reloading by the loader, not an explicit finder + loader use-case.
Well, not change them to the best of its abilities. I'm sure there are edge cases that I couldn't match properly since I could only go by the test suite and bug reports filed since Python 3.1.
Unfortunately this was never explicitly documented (at least that I noticed) nor was there a test in the stdlib to enforce compliance with it, else this never would have happened.
If this is an intentional change in reload() semantics, other Python implementations need to know about this too!
Not really. As of Python 3.3 the semantic shift included re-implementing the function in pure Python so they pick up the change for free; remember the function is not a builtin anymore to somewhat de-emphasize its use since it's kind of a hack and really only used typically at the interpreter prompt as an easy part of a load-edit-reload dev cycle, not some fundamental operation.
Yep, never got a bug report on this (although there have been reports of typos in the docs as well as not returning what's in sys.modules so it is being used by people).

On Fri, Oct 25, 2013 at 3:15 PM, Brett Cannon <brett@python.org> wrote:
Sure, and the reference manual is rather vague on this point. However, I would guess that at least some web frameworks with automatic reload support are going to barf on this change in at least some edge cases. (OTOH, it's unlikely the bugs will ever be reported, because the problem will mysteriously go away once the process is restarted, probably never to occur again.) Mostly, this just seems like an ugly wart -- Python should be dynamic by default, and that includes reloading. While the import machinery has lots of ugly caching under the hood, a user-level function like reload() should not require you to do the equivalent of saying, "no, really... I want you to *really* reload, not just pull in whatever exists where you found it last time, while ignoring whether I switched from module to package or vice versa, or just fixed my sys.path so I can load the right version of the module." It is a really tiny thing in the overall scheme of things, because reload() is not used all that often, but it's still a thing. If this isn't treated as a bug, then the docs for reload() at least need to include a forward-supported workaround so you can say "no, really... *really* reload" in an approved fashion. (ISTM that any production code out there that currently uses reload() would want to perform the "really reload" incantation in order to avoid the edge cases, even if they haven't actually run into any of them yet.)
And will that later version still work correctly in a PEP 451 world, or will you have to detect which world you live in before waving this particular dead chicken? ;-)
Yeah, it actually was to ensure that you could reload a module using a different loader than the one that originally loaded it, e.g. due to a change in path hooks, etc.

On 26 October 2013 08:51, PJ Eby <pje@telecommunity.com> wrote:
Having imp.reload() kick off the full reload cycle makes sense to me. The only reason it doesn't currently do that in importlib is because there was no test for it in the regression test suite, and PEP 302 and the library reference are entirely vague on how module reloading works (aside from the requirement that it reuse the same module namespace).
Yep, we already tweaked PEP 451 to say that it has to respect post-import modifications made to the module attributes.
Yeah, the rationale makes sense, we only missed it due to the lack of a regression test for the behaviour. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Oct 26, 2013 at 11:27 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I put up a patch that should fix this without a lot of work, including a test that would have caught this. http://bugs.python.org/issue19413 -eric

On 27 October 2013 03:27, Nick Coghlan <ncoghlan@gmail.com> wrote:
I've been thinking about this, and I've come to the conclusion that: 1. The old __name__ based behaviour was broken (since __name__ may not be the actual module name, as in "__main__" or when a pseudo-module is lying about its identity) 2. The 3.3 behaviour is still broken for similar reasons *and* added the dependency on __loader__ 3. Even PEP 451 still underspecifies the semantics of reloading (e.g. it's implied create_module isn't called for reload(), but not stated explicitly) Accordingly, I think we should add a "How Reloading Will Work" section, akin to the existing "How Loading Will Work" (http://www.python.org/dev/peps/pep-0451/#how-loading-will-work). We may also want to spit out some import warnings for edge cases that are likely to do the wrong thing. Side note: in "How Loading Will Work", _init_module_attrs needs to be passed both the module *and* the spec (current text only passes it the module). It also needs to be updated to handle the namespace package case (as shown below) And a sketch of some possible fully specified reload semantics: def reload(module): # Get the name to reload from the spec if it is available, otherwise __name__ current_spec = getattr(module, "__spec__") current_name = module.__name__ name_to_reload = current_name if current_spec is None else current_spec.name # Check this module is properly imported before trying to reload it loaded_module = sys.modules.get(name_to_reload) if loaded_module is not module: msg = "module {} not in sys.modules" raise ImportError(msg.format(name_to_reload), name=name_to_reload) parent_name = name_to_reload.rpartition('.')[0] while parent_name: if parent_name not in sys.modules: msg = "parent {!r} not in sys.modules" raise ImportError(msg.format(parent_name), name=parent_name) parent_name = parent_name.rpartition('.')[0] # Check for a modified spec (e.g. if the import hooks have changed) updated_spec = importlib.find_spec(name_to_reload) # The import-related module attributes get updated here # We leave __name__ alone, so reloading __main__ has some hope of working correctly _init_module_attrs(loaded_module, updated_spec, set_name=False) # Now re-execute the module loader = updated_spec.loader if loader is None: # Namespace package, already reinitialised above return loaded_module elif hasattr(loader, 'exec_module'): loader.exec_module(module) else: loader.load_module(name_to_reload) # Reloading modules that replace themselves in sys.modules doesn't work reloaded_module = sys.modules[name_to_reload] if reloaded_module is not loaded_module: # Perhaps this should initially be a DeprecationWarning? # We've never actively *prevented* this case before, even though it is # almost guaranteed to break things if you do it. Alternatively, we could # leave "imp.reload()" as the more permissive version, and include this # check in the new importlib.reload() (with imp.reload calling an internal # importlib._unchecked_reload() instead) msg = "Module {!r} was replaced in sys.modules during reload" raise ImportError(msg.format(name_to_reload), name=name_to_reload) return reloaded_module -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Oct 26, 2013 at 9:44 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm tempted to just say reload should not be a blocker for PEP 451. The premise is that the PEP mostly maintains the status quo, just shifting reload-based-on-loader to reload-based-on-spec (it's still "*load* the same way you did the first time"). I agree that it would be worth getting reload working right, but I'm not convinced it's worth it to delay PEP 451 more for the sake of reload.
Good catch. Fixed!
It also needs to be updated to handle the namespace package case (as shown below)
I knew I missed something. I'll fix this, but it may be a couple days before I get a minute. :( -eric

On 27 October 2013 14:29, Eric Snow <ericsnowcurrently@gmail.com> wrote:
I don't think we can postpone it to later, as we need to be clear on: 1. Does reload use __name__ or __spec__.name when both are available? 2. Does __name__ get restored to its original value if reloading via __spec__.name? 3. Do other import related attributes get restored to their original values? 4. Does create_module get called if the loader has an exec_module method? 5. Does the loader have access to the previous spec when reloading a module? My answers to these questions (note that my answer to 2 differs from what I had in my initial sketch): 1. __spec__.name 2. Yes, __name__ is updated to match __spec__.name, *expect* if it is currently "__main__" 3. Yes, other import related attributes are restored to their baseline values 4. No, create_module is never called when reloading 5. Currently no, but I'm thinking that may be worth changing (more on that below) The reload() implementation in my message is actually based on the current importlib.reload implementation. The only PEP 451 related changes were: - using __spec__.name (if available) instead of __name__ - checking all parent modules exist rather than just the immediate parent module - calling import.find_spec() rather than using the __loader__ attribute directly - being explicit that __name__ is left at the value it had prior to the reload - handling the namespace package, exec_module and no exec_module cases I also added an explicit check that the module was re-used properly, but I agree *that* change is out of scope for the PEP and should be considered as a separate question. Now, regarding the signature of exec_module(): I'm back to believing that loaders should receive a clear indication that a reload is taking place. Legacy loaders have to figure that out for themselves (by seeing that the module already exists in sys.modules), but we can do better for the new API by making the exec_module signature look like: def exec_module(self, module, previous_spec=None): # module is as per the current PEP 451 text # previous_spec would be set *only* in the reload() case # loaders that don't care still need to accept it, but can just ignore it ... So, with those revisions, the reload() semantics would be defined as: def reload(module): # Get the name to reload from the spec if it is available, # otherwise use __name__ directly current_spec = getattr(module, "__spec__") current_name = module.__name__ name_to_reload = current_name if current_spec is None else current_spec.name # Check this module is properly imported before trying to reload it loaded_module = sys.modules.get(name_to_reload) if loaded_module is not module: msg = "module {} not in sys.modules" raise ImportError(msg.format(name_to_reload), name=name_to_reload) parent_name = name_to_reload.rpartition('.')[0] while parent_name: if parent_name not in sys.modules: msg = "parent {!r} not in sys.modules" raise ImportError(msg.format(parent_name), name=parent_name) parent_name = parent_name.rpartition('.')[0] # Check for a modified spec (e.g. if the import hooks have changed) updated_spec = importlib.find_spec(name_to_reload) # The import-related module attributes get updated here _init_module_attrs(loaded_module, updated_spec) if current_name == "__main__": loaded_module.__name__ = "__main__" # Now re-execute the module loader = updated_spec.loader if loader is None: # Namespace package, already reinitialised above return loaded_module elif hasattr(loader, 'exec_module'): loader.exec_module(module, current_spec) else: loader.load_module(name_to_reload) # Allow for the module to be replaced in sys.modules # (even though that likely won't work very well) return sys.modules[name_to_reload] Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Oct 27, 2013 at 1:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Just to be clear, this means that a lazy import implementation that creates a module object without a __spec__ in the first place will look like an initial import? Or will that crash importlib because of a missing __spec__ attribute? That is, is reload()'s contract adding a new prerequisite for the object passed to it? (The specific use case is creating a ModuleType subclass instance for lazy importing upon attribute access. Pre-importlib, all that was needed was a working __name__ attribute on the module.)

On 28 Oct 2013 02:37, "PJ Eby" <pje@telecommunity.com> wrote:
For custom loaders, that's part of the contract for create_module() (since you'll get an ordinary module otherwise), and so long as *setting* the special module attributes doesn't cause the module to be imported during the initial load operation, attribute access based lazy loading will work fine (and you don't even have to set __name__, since the import machinery will take care of that). For module level lazy loading that injects a partially initialised module object into sys.modules rather than using a custom loader or setting a __spec__ attribute, yes, the exec_module invocation on reloading would always look like a fresh load operation (aside from the fact that the custom instance would already be in sys.modules from the first load operation). It *will* still work, though (at least, it won't break any worse than such code does today, since injecting a replacement into sys.modules really isn't reload friendly in the first place). Cheers, Nick.

On Sun, Oct 27, 2013 at 4:59 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Huh? I don't understand where custom loaders come into it. For that matter, I don't understand what "get an ordinary module object" means here, either. I'm talking about userspace code that implements lazy importing features, like the lazyModule() function in this module: http://svn.eby-sarna.com/Importing/peak/util/imports.py?view=markup Specifically, I'm trying to get an idea of how much that code will need to change under the PEP (and apparently under importlib in general).
There's no "initial load operation", just creation of a dummy module and stuffing it into sys.modules. The way it works is that in, say, foo/__init__.py, one uses: bar = lazyModule('foo.bar') baz = lazyModule('foo.baz') Then anybody importing 'foo.bar' or 'foo.baz' (or using "from foo import bar", etc.) ends up with the lazy module. That is, it's for lazily exposing APIs, not something used as an import hook.
Right.
Wait, what? Who's injecting a replacement into sys.modules? A replacement of what? Or do you mean that loaders aren't supposed to create new modules, but use the one in sys.modules? Honestly, I'm finding all this stuff *really* confusing, which is kind of worrying. I mean, I gather I'm one of the handful of people who really understood how importing *used to work*, and I'm having a lot of trouble wrapping my brain around the new world. (Granted, I think that may be because I understand how a lot of old corner cases work, but what's bugging me is that I no longer understand how those old corners work under the new regime, nor do I feel I understand what the *new* corners will be. This may also just be communication problems, and the fact that it's been months since I really walked through importlib line by line, and have never really walked through it (or PEP 451) quite as thoroughly as I have import.c. I also seem to be having trouble grokking why the motivating use cases for PEP 451 can't be solved by just providing people with good base classes to use for writing loaders -- i.e., I don't get why the core protocol has to change to address the use case of writing loaders more easily. The new protocol seems way more complex than PEP 302, and ISTM the complexity could just be pushed off to the loader side of the protocol without creating more interdependency between importlib and the loaders.)

On 28 Oct 2013 08:41, "PJ Eby" <pje@telecommunity.com> wrote:
wrote:
If the lazy import is injected by a *different* module, then nothing changes.
I was thinking of the more complex case where a module causes *itself* to be loaded lazily. Nothing changes for the simpler case where the injection occurs in a different module.
Provide test cases that exercise the situations you're concerned about supporting and then you don't need to worry about them breaking.
Bad loader implementations have too much power to break the import system, and loaders don't currently expose enough information about modules that *could* be imported. We also take "inherit from this class to implement the protocol correctly, import will break if you don't" as a sign that the *protocol is broken* by pushing too much of the work onto the plugins. New style loaders can be much simpler, because the import system takes care of the complexity, without relying on users inheriting from a particular base class. The complexity comes from the fact that we're now breaking down what the *real* expectations on a loader actually are, and making those part of the import system itself. Cheers, Nick.

On Sun, Oct 27, 2013 at 4:41 PM, PJ Eby <pje@telecommunity.com> wrote:
Depending on the outcome of issue19413, you shouldn't have to change anything. PEP 451 is aiming for strict backward compatibility.
Like Nick, I would love more tests that cover these corner cases. It would have saved us (Brett especially) a lot of headache. At the least any description you can offer would be great.
What new protocol specifically? Finder.find_module() now returns a module spec instead of a loader. Loader.exec_module() gets used now rather than load_module(). Loaders don't have to worry about all the boilerplate stuff anymore (managing sys.modules and import-related module attributes). From my perspective PEP 451 is simplifying protocols. -eric

On Sat, Oct 26, 2013 at 11:03 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I agree on all of these. I'm adding a "How reloading will work" section to the PEP.
I'd rather give loaders another optional method: supports_reload(name). Complicating the spec methods signatures (to support passing the old spec through) just for the sake of reload seems less than optimal. I don't see any benefit to exposing the old spec to loader.exec_module().
This is pretty much the same thing I've implemented. :) -eric

On 29 Oct 2013 14:45, "Eric Snow" <ericsnowcurrently@gmail.com> wrote:
- calling import.find_spec() rather than using the __loader__ attribute
values? parent module directly
I do: it lets the loader tell if anything changed from the previous load operation, allowing it to make an informed decision on whether or not to permit the reload operation or throw an exception. However, I accept such a check would be better implemented as a separate yes/no method, so it makes sense to postpone this aspect to 3.5 at the earliest. Cheers, Nick.
reload it

On Tue, Oct 29, 2013 at 3:32 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
With the PEP, loaders are no longer in charge of the module-related boilerplate. This means that by the time exec_module() gets called, they won't be able to determine for themselves if it's a reload by checking sys.modules. That's the point of adding loader.supports_reload(). (I haven't added that rationale to the PEP yet). If it makes a difference to loaders to also look at the previous spec during supports_reload(), then I think we should add that parameter now. Otherwise if we add the parameter later it makes it messier. The signature would now be: Loader.supports_reload(name, old_spec) -eric

On 30 October 2013 09:02, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Ah, good point.
OK, time for me to stop trying to remember the details of the problem I'm trying to solve and go look them up in the source code :) One of my goals here is to be able to migrate extension loading from the old API to the new plugin API. That means being able to break up the existing load_module implementation: http://hg.python.org/cpython/file/1787277915e9/Python/importdl.c#l23 For loading, that's a fairly straightforward create_module/exec_module split, but reloading gets a bit more interesting. Specifically, I'd like to be able to implement the relevant parts of _PyImport_FindExtensionObject as a precheck for reloading: http://hg.python.org/cpython/file/1787277915e9/Python/import.c#l533 That means just having access to the module name isn't enough: the extensions dictionary is keyed by a (name, filename) 2-tuple rather than just by the module name. Using the find_spec API, that filename would be stored in the loader state on the spec object rather than being looked up anew during the load operation. However, rereading this method also indicates that I really want to know *in exec_module* whether this is a reload or not, since extension loading needs to handle reloads differently from initial execution. So I'm back to my original preference: I'd like the previous spec to be passed to exec_module in the reloading case. If reloading is not supported at all, it's up to the loader to throw an appropriate exception when the the previous spec is not None. If loading and reloading doesn't make any difference, then they can just ignore it. But when both are supported, but handled differently (as for extension modules), then that can be detected, and the information from the old spec (including the original loader and loader state) is available if needed. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Oct 29, 2013 at 7:29 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Our recent discovery about reloading should probably be reflected in the signature of finder.find_spec(): MetaPathFinder.find_spec(name, path=None, existing=None) PathEntryFinder.find_spec(name, existing=None) This way the finder has an opportunity to incorporate information from an existing spec into the spec it returns. reload() would make use of this by passing module.__spec__ (or None if the module has no __spec__) to _bootstrap._find_spec(). This approach should also address what you are looking for. I'd prefer it over passing the existing spec to exec_module(). The module (and its __spec__) should have everything exec_module() needs to do its job. We would still need to use loader.supports_reload() in reload(). However, it may make more sense to pass in the module-to-be-reloaded (after updating its __spec__ to the one found by _bootstrap._find_spec()). -eric

On 31 Oct 2013 03:41, "Eric Snow" <ericsnowcurrently@gmail.com> wrote:
Yes, that should work.
We would still need to use loader.supports_reload() in reload().
Why? If the reload isn't supported, exec_module can just throw an exception based on the loader state in the spec.
From the import system's point of view "reload not permitted" is no different from any other exec time failure.
Cheers, Nick.

On Wed, Oct 30, 2013 at 4:09 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Cool. I'll update the PEP.
At the point that exec_module() gets called, the loader can't check sys.modules to see if it's a reload or not. As a workaround, the finder could set up some loader state to indicate to the loader that it's a reload and then the loader, during exec_module(), would check that and act accordingly. However, that's the sort of boilerplate that PEP 451 is trying to offload onto the import machinery. With Loader.supports_reload() it's a lot cleaner. -eric

On 31 Oct 2013 08:54, "Eric Snow" <ericsnowcurrently@gmail.com> wrote:
exception
There's also the option of implementing the constraint directly in the finder, which *does* have the necessary info (with the change to pass the previous spec to find_spec). I still think it makes more sense to leave this out for the moment - it's not at all clear we need the extra method, and adding it later would be a straightforward protocol update. Cheers, Nick.

On Wed, Oct 30, 2013 at 10:24 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Yeah, I thought of that. I just prefer the more explicit supports_reload(). That said...
...I agree that makes the most sense for now. :) BTW, thanks for pushing these issues. I think the API has gotten pretty solid. I just need to make sure the PEP covers the cases and conclusions we're discussing. -eric

On Thu, Oct 31, 2013 at 5:52 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
You're welcome. ;-) But speaking of handwaving, I also want to be sure that loader developers know that "reloading" is only really "reloading" if there's a previous existing spec, or the module type is... Hm. Actually, I think I now know how to state what's bugging me every time I see this "supports_reload()" or "reload=True" or other reloading flags in this process. I think that references to reloading should be replaced with references to what's *actually* at issue, because "reloading" itself is vague and carries too many assumptions for a loader author to understand or get right. (Look how hard it is for *us*!) That is, I think we should clarify what use cases there are for knowing whether a "reload" is happening, and address those use cases explicitly rather than lumping them under a general heading. For example, if the reason a loader cares about reloading is because it's a C extension using a custom module type, and the existing module isn't of the right type, then we should just spell out how to handle it. (e.g. raise an exception) If the reason a loader cares about reloading is because of some sort of caching or reuse, then we should just spell out how to handle that, too. Lumping these cases together under a "reloading" flag or a check for "reloading" support is a nasty code smell, because it requires a loader developer to have the *same* vaguely-defined idea of "reloading" as the PEP authors. ;-) I also suspect, that if properly spelled out, those use cases are going to boil down to: 1. Throwing errors if you have an existing module object you can't load into, and 2. Passing in a previous spec object, if available In other words, loaders should not really have any responsibility for or concept of "reloading" -- they always load into a module object (that they may or may not have created), and they may get given a spec from a previous load. They should deal only in "module reuse" and "spec reuse". While a typical reload() might involve both reuses, there are cases where one sort of reuse could occur independently, and not all loaders care about both (or even either) condition. At any rate, it means a loader author doesn't have to figure out how to handle "reloading", all they have to figure out is whether they can load into a particular module object, and whether they can do something useful with a spec that was previously used to load a module with the same name -- a spec that may or may not refer to a similar previous loader. These are rather more well-defined endeavors than trying to determine in the abstract whether one "supports reload". ;-)

On 1 Nov 2013 01:37, "PJ Eby" <pje@telecommunity.com> wrote: . ;-)
It may not have been clear from the email exchange, but that's basically where we ended up :) The change Eric is currently going to make to the PEP is to add an optional "previous" parameter to the various find_spec APIs. At the time when find_spec runs, sys.modules hasn't been touched yet, so the old module state is still available when reloading. Passing the spec in lets the loader decide whether or not it can actually load that module given the information about the previous load operation. However, perhaps it makes more sense to pass the entire existing module, rather than just the spec? I'd like to minimise the need for new-style loader authors to retrieve things directly from the sys module, and you're right that "can I reload into this particular module?" is a slightly different question from "can I reload a module previously loaded using this particular spec?". A spec-based API could still be used to answer the first question by looking up sys.modules, but so could the name based API. Passing in the module reference, on the other hand, should let loaders answer both questions without needing to touch the sys module. I also now think this pass-the-module approach when finding the spec approach could also clean up some of the messiness that is __main__ initialisation, where we repeatedly load things into the same module object. Let's say we be completely explicit and call the new parameter something like "load_target". If we do that then I would make the *same* change to runpy.run_path and runpy.run_module: let you pass in an existing module object under that name to say "hey, run in this namespace, rather than a fresh one". (Those APIs currently only support pre-populating a *fresh* namespace, rather than allowing you to directly specify an existing one) Most loaders won't care, but those that do will have all the info they need to throw an exception saying "I could provide a spec for this, but it's not compatible with that load target". Cheers, Nick.

On Thu, Oct 31, 2013 at 6:10 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Do you mean loader or finder? You say "find_spec" which is for finders, but then you say "loader".
See you mention "finding" but before now everything has been "loader".
Exactly what APIs -- new or changed -- are you proposing? Method signatures only please to avoid ambiguity. =)

On Fri, Nov 1, 2013 at 7:35 AM, Brett Cannon <brett@python.org> wrote:
I'm about to update the PEP but I'll clarify here. The signature of finder.find_spec() will grow an extra "existing" parameter (default to None) that is an existing module. During reload the module from sys.modules will be passed in. Having this be part of find_spec() is consistent with the job of the finder to identify a loader that should be used to load the module (reload or not). As to the inconsistency in what Nick said, it still fits in with the way I see it. I'm sure he'll correct me if he had something else in mind. :-) finder.find_spec() will be making a decision on what loader (if any) to return. As part of that, when it finds a loader it can further decide on whether or not the loader supports loading into an existing module (if one was provided). As part of that it may or may not consult with the loader. Either way it stands as proxy for the loader in the latter decision. In lieu of an explicit loader.supports_reload() method, a loader will rely on its associated finder to be a proxy for reload-related decisions, likely communicated via spec.loader_state. Perhaps Nick meant something else, but my understanding is that he was referring to this -eric

On Fri, Nov 1, 2013 at 11:59 AM, Brett Cannon <brett@python.org> wrote:
Thanks for the clarification. It all SGTM.
I've updated the PEP and the reference implementation. If I've missed anything please let me know. There is only one thing I thought of that I'd like to get an opinion on: In the Finders section, the PEP specifies returning None (or using a different loader) when the found loader does not support loading into an existing module (e.g during reload). An alternative to returning None would be for the finder to raise ImportError with a message like "the loader does not support reloading the module". This may actually be a better approach since "could not find a loader" and "the found loader won't work" are different situations that a single return value (None) may not sufficiently represent. Other than that, I'm not aware of any blockers for the PEP. Once I settle that last question, I'll ask for pronouncement. (Anyone may feel free to pronounce before then. <wink>) -eric

On 2 November 2013 08:44, Eric Snow <ericsnowcurrently@gmail.com> wrote:
While I was saying loader when I should have been saying finder, Eric correctly divined my intent :)
Throwing ImportError for "can load, but cannot load into that target module" sounds good to me. We should also be explicit that throwing an exception from find_spec *stops* the search (unlike returning None).
Other than that, I'm not aware of any blockers for the PEP.
The one slight quibble I have with the PEP update is the wording regarding the "existing" parameter. I think runpy should pass the new parameter as well when calling find_spec, but in that case it *won't* be an existing copy of the named module, it will be sys.modules["__main__"]. So we shouldn't set an expectation that the existing module passed in will necessarily be a previous copy of the module being loaded, it's just an indication to the finder that the target namespace for execution will be a pre-existing module rather than a new one. For the same reason, I also have a mild preference for "target" (or the more explicit "load_target") as the name, although I won't object if you and Brett prefer "existing". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Nov 2, 2013 at 5:46 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
SGTM as well.
We should also be explicit that throwing an exception from find_spec *stops* the search (unlike returning None).
I thought that was obvious but more docs doesn't hurt. =)
I would go with "module" or "target" and that is where I shall end the bikeshedding.

On 3 November 2013 01:21, Brett Cannon <brett@python.org> wrote:
I had a quick reread of the runpy docs: http://docs.python.org/dev/library/runpy#runpy.run_module I can definitely make "target" work (that's why I suggested it originally, re-reading the docs just confirmed that), but I don't see any non-confusing way to use "module". Eric: since "target" was on both Brett's list and mine, it looks like s/existing/target/ and clarifying that "incompatible target module" should be reported by throwing ImportError from find_spec() will get you the pronouncement you're looking for :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Oct 25, 2013 at 10:24 AM, PJ Eby <pje@telecommunity.com> wrote:
Yeah, this actually changed in 3.3 IIRC, where reload now calls module.__loader__.load_module(module.__name__). I'll double-check. PEP 451 simply changes that to be (basically) module.__loader__.exec_module(module). -eric

On 24 October 2013 16:05, Eric Snow <ericsnowcurrently@gmail.com> wrote:
There's also the fact Guido asked Brett and I to be co-delegates for this PEP, and we accepted (we both already agree with the overall concept, so it's just a matter of hammering out the final API details).
More specifically: importlib finders will still expose the previous import plugin API for backwards compatibility, so the worst case scenario is that we miss something and there's an API somewhere that doesn't accept import plugins that only provide the new API and not the old one. However, the PEP should explicitly state that any such omissions will be treated as bugs in the 3.4.x series (although we'll aim to handle them all in the initial implementation). Thanks for recording the details of the earlier off-list discussion :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Oct 24, 2013 at 12:05 AM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Okay, I've posted the patch to http://bugs.python.org/issue18864. I'll get to the other open issues in the next couple days. -eric

On Thu, Oct 24, 2013 at 2:05 AM, Eric Snow <ericsnowcurrently@gmail.com>wrote:
Eric's got an initial patch for this up on http://bugs.python.org/issue18864.
After reading Eric's doc patch, I realized there is one change I want to make to the current semantics and that's not to backfill __package__ when set to None. Since import is now going to take over the job of setting __package__ (along with other attributes), this seems like a slight waste of effort. It also kills (or at least complicates) having a loader which does lazy loading since reading the attribute to see if it is None would trigger the load before leaving the import code, thus killing any postponed loading.

On Fri, Oct 25, 2013 at 11:44 AM, Eric Snow <ericsnowcurrently@gmail.com>wrote:
I should also mention this applies to __loader__ as well. Basically everything from http://hg.python.org/cpython/file/eb1edc9e3722/Lib/importlib/_bootstrap.py#l... down.

I've not really had time to review this PEP yet, but from skimming discussion to date, the only thing I'm still worried about is whether this will break lazy import schemes that use a module subclass that hooks __getattribute__ and calls reload() in order to perform what's actually an *initial* load. IOW, does anything in this proposal rely on a module object having *any* attributes besides __name__ set at reload() time? That is, is there an assumption that a module being reloaded has 1. Been loaded, and 2. Is being reloaded via the same location, __loader__, etc. as before? At least through all 2.x, reload() just uses module.__name__ to restart the module find-and-load process, and does not assume that __loader__ is valid in advance. (Also, if this has changed in recent Python versions independent of this PEP, it's a backwards-compatibility break that should be documented somewhere.) On Thu, Oct 24, 2013 at 2:05 AM, Eric Snow <ericsnowcurrently@gmail.com> wrote:

On Fri, Oct 25, 2013 at 12:24 PM, PJ Eby <pje@telecommunity.com> wrote:
Depends on how you implement it probably. I have a scheme in my head using __getattribute__ and loaders which will work with this PEP if __package__ and __loader__ are not explicitly checked after execute_module() is called, so lazy imports are not going to be fundamentally broken. Since Python 3.3 lazy loaders relying on __getattribute__ have been broken due to those __package__/__loader__ checks so this is actually going to be an improvement.
IOW, does anything in this proposal rely on a module object having *any* attributes besides __name__ set at reload() time?
As it stands in Python 3.3 it requires __name__ and __loader__ be defined: http://hg.python.org/cpython/file/e2c3f638c3d0/Lib/importlib/__init__.py#l10...
Just 2 in terms of __loader__ since loaders have to work with or without the module already existing.
That doesn't make much sense in a post-importlib world where import makes sure that __loader__ is set (which it has since Python 3.3). Otherwise you are asking for not just a reload but a re-find as well. As for the PEP, it will probably keep the name/loader requirement but shift it to having to be set on the spec object at __spec__. I guess you could make the argument a reload should include re-running the finder step as well, but I don't think it's worth the code tweaking to make it happen when the finder/loader steps are divided as they are.

On Fri, Oct 25, 2013 at 1:15 PM, Brett Cannon <brett@python.org> wrote:
That's a feature, not a bug. A reload() after changing sys.path *should* take into account the change, not to mention any changes to meta_path, path hooks, etc. (And it's how reload() worked before importlib.) I suppose it's not really documented all that well, but way way back in the 2.3 timeframe I asked for a tweak to PEP 302 to make sure that reload() (in the re-find sense) would work properly with PEP 302 loaders like zipimport -- some of the language still in the PEP is there specifically to support this use case. (Specifically, the bit that says loaders *must* use the existing module object in sys.modules if there's one there already, so that reload() will work. It was actually in part to ensure that reload() would work in the case of a re-find.) It appears that since then, the PEP has been changed in a way that invalidates part of the purpose of the prior change; I guess I missed the discussion of that change last year. :-( ISTM there should've been such a discussion, since IIRC importlib wasn't supposed to change any Python semantics, and this is a non-trivial change to the semantics of reload() even in cases that aren't doing lazy imports or other such munging. reload() used to take sys.* and __path__ changes into account, and IMO should continue to do so. If this is an intentional change in reload() semantics, other Python implementations need to know about this too! (That being said, I'm not saying I shouldn't or couldn't have tested this in 3.3 and found out about it that way. And the existence of issue18698 suggests that nobody's relying yet on even the *fundamental* semantics of PEP 302 reload() working properly in 3.3, since that was an even bigger change that nobody spotted till a couple of months ago.)

On Fri, Oct 25, 2013 at 2:10 PM, PJ Eby <pje@telecommunity.com> wrote:
Fair enough, but in my mind that obviously doesn't really click for what I view as a reload in an importlib world where secret import code no longer exists. When I think re-load I think "load again", not "find the module again and execute a load with a possibly new loader". It's not worth changing in Python 3.3, but if you want to propose to redefine importlib.reload() to use importlib.find_spec() to reset the __spec__ on a module then I'm not going to argue against it (but I'm not going to fight for it either). And in a PEP 451 world it should be dead-simple to make this work the way you want in your own code even if this doesn't go the way you want:: spec = importlib.find_spec(name) module.__spec__ = spec importlib.reload(module) # Which in itself is essentially init_module_attrs(spec, module); spec.loader.exec_module(module) Heck, you can do this in Python 3.3 right now:: loader = importlib.find_loader(name) module = sys.modules[name] module.__loader__ = loader importlib.reload(module)
I suppose it's not really documented all that well,
Nope =)
Ah, okay. That is not explicit in the PEP beyond coming off a total nuisance in order to support reloading by the loader, not an explicit finder + loader use-case.
Well, not change them to the best of its abilities. I'm sure there are edge cases that I couldn't match properly since I could only go by the test suite and bug reports filed since Python 3.1.
Unfortunately this was never explicitly documented (at least that I noticed) nor was there a test in the stdlib to enforce compliance with it, else this never would have happened.
If this is an intentional change in reload() semantics, other Python implementations need to know about this too!
Not really. As of Python 3.3 the semantic shift included re-implementing the function in pure Python so they pick up the change for free; remember the function is not a builtin anymore to somewhat de-emphasize its use since it's kind of a hack and really only used typically at the interpreter prompt as an easy part of a load-edit-reload dev cycle, not some fundamental operation.
Yep, never got a bug report on this (although there have been reports of typos in the docs as well as not returning what's in sys.modules so it is being used by people).

On Fri, Oct 25, 2013 at 3:15 PM, Brett Cannon <brett@python.org> wrote:
Sure, and the reference manual is rather vague on this point. However, I would guess that at least some web frameworks with automatic reload support are going to barf on this change in at least some edge cases. (OTOH, it's unlikely the bugs will ever be reported, because the problem will mysteriously go away once the process is restarted, probably never to occur again.) Mostly, this just seems like an ugly wart -- Python should be dynamic by default, and that includes reloading. While the import machinery has lots of ugly caching under the hood, a user-level function like reload() should not require you to do the equivalent of saying, "no, really... I want you to *really* reload, not just pull in whatever exists where you found it last time, while ignoring whether I switched from module to package or vice versa, or just fixed my sys.path so I can load the right version of the module." It is a really tiny thing in the overall scheme of things, because reload() is not used all that often, but it's still a thing. If this isn't treated as a bug, then the docs for reload() at least need to include a forward-supported workaround so you can say "no, really... *really* reload" in an approved fashion. (ISTM that any production code out there that currently uses reload() would want to perform the "really reload" incantation in order to avoid the edge cases, even if they haven't actually run into any of them yet.)
And will that later version still work correctly in a PEP 451 world, or will you have to detect which world you live in before waving this particular dead chicken? ;-)
Yeah, it actually was to ensure that you could reload a module using a different loader than the one that originally loaded it, e.g. due to a change in path hooks, etc.

On 26 October 2013 08:51, PJ Eby <pje@telecommunity.com> wrote:
Having imp.reload() kick off the full reload cycle makes sense to me. The only reason it doesn't currently do that in importlib is because there was no test for it in the regression test suite, and PEP 302 and the library reference are entirely vague on how module reloading works (aside from the requirement that it reuse the same module namespace).
Yep, we already tweaked PEP 451 to say that it has to respect post-import modifications made to the module attributes.
Yeah, the rationale makes sense, we only missed it due to the lack of a regression test for the behaviour. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Oct 26, 2013 at 11:27 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I put up a patch that should fix this without a lot of work, including a test that would have caught this. http://bugs.python.org/issue19413 -eric

On 27 October 2013 03:27, Nick Coghlan <ncoghlan@gmail.com> wrote:
I've been thinking about this, and I've come to the conclusion that: 1. The old __name__ based behaviour was broken (since __name__ may not be the actual module name, as in "__main__" or when a pseudo-module is lying about its identity) 2. The 3.3 behaviour is still broken for similar reasons *and* added the dependency on __loader__ 3. Even PEP 451 still underspecifies the semantics of reloading (e.g. it's implied create_module isn't called for reload(), but not stated explicitly) Accordingly, I think we should add a "How Reloading Will Work" section, akin to the existing "How Loading Will Work" (http://www.python.org/dev/peps/pep-0451/#how-loading-will-work). We may also want to spit out some import warnings for edge cases that are likely to do the wrong thing. Side note: in "How Loading Will Work", _init_module_attrs needs to be passed both the module *and* the spec (current text only passes it the module). It also needs to be updated to handle the namespace package case (as shown below) And a sketch of some possible fully specified reload semantics: def reload(module): # Get the name to reload from the spec if it is available, otherwise __name__ current_spec = getattr(module, "__spec__") current_name = module.__name__ name_to_reload = current_name if current_spec is None else current_spec.name # Check this module is properly imported before trying to reload it loaded_module = sys.modules.get(name_to_reload) if loaded_module is not module: msg = "module {} not in sys.modules" raise ImportError(msg.format(name_to_reload), name=name_to_reload) parent_name = name_to_reload.rpartition('.')[0] while parent_name: if parent_name not in sys.modules: msg = "parent {!r} not in sys.modules" raise ImportError(msg.format(parent_name), name=parent_name) parent_name = parent_name.rpartition('.')[0] # Check for a modified spec (e.g. if the import hooks have changed) updated_spec = importlib.find_spec(name_to_reload) # The import-related module attributes get updated here # We leave __name__ alone, so reloading __main__ has some hope of working correctly _init_module_attrs(loaded_module, updated_spec, set_name=False) # Now re-execute the module loader = updated_spec.loader if loader is None: # Namespace package, already reinitialised above return loaded_module elif hasattr(loader, 'exec_module'): loader.exec_module(module) else: loader.load_module(name_to_reload) # Reloading modules that replace themselves in sys.modules doesn't work reloaded_module = sys.modules[name_to_reload] if reloaded_module is not loaded_module: # Perhaps this should initially be a DeprecationWarning? # We've never actively *prevented* this case before, even though it is # almost guaranteed to break things if you do it. Alternatively, we could # leave "imp.reload()" as the more permissive version, and include this # check in the new importlib.reload() (with imp.reload calling an internal # importlib._unchecked_reload() instead) msg = "Module {!r} was replaced in sys.modules during reload" raise ImportError(msg.format(name_to_reload), name=name_to_reload) return reloaded_module -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Oct 26, 2013 at 9:44 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm tempted to just say reload should not be a blocker for PEP 451. The premise is that the PEP mostly maintains the status quo, just shifting reload-based-on-loader to reload-based-on-spec (it's still "*load* the same way you did the first time"). I agree that it would be worth getting reload working right, but I'm not convinced it's worth it to delay PEP 451 more for the sake of reload.
Good catch. Fixed!
It also needs to be updated to handle the namespace package case (as shown below)
I knew I missed something. I'll fix this, but it may be a couple days before I get a minute. :( -eric

On 27 October 2013 14:29, Eric Snow <ericsnowcurrently@gmail.com> wrote:
I don't think we can postpone it to later, as we need to be clear on: 1. Does reload use __name__ or __spec__.name when both are available? 2. Does __name__ get restored to its original value if reloading via __spec__.name? 3. Do other import related attributes get restored to their original values? 4. Does create_module get called if the loader has an exec_module method? 5. Does the loader have access to the previous spec when reloading a module? My answers to these questions (note that my answer to 2 differs from what I had in my initial sketch): 1. __spec__.name 2. Yes, __name__ is updated to match __spec__.name, *expect* if it is currently "__main__" 3. Yes, other import related attributes are restored to their baseline values 4. No, create_module is never called when reloading 5. Currently no, but I'm thinking that may be worth changing (more on that below) The reload() implementation in my message is actually based on the current importlib.reload implementation. The only PEP 451 related changes were: - using __spec__.name (if available) instead of __name__ - checking all parent modules exist rather than just the immediate parent module - calling import.find_spec() rather than using the __loader__ attribute directly - being explicit that __name__ is left at the value it had prior to the reload - handling the namespace package, exec_module and no exec_module cases I also added an explicit check that the module was re-used properly, but I agree *that* change is out of scope for the PEP and should be considered as a separate question. Now, regarding the signature of exec_module(): I'm back to believing that loaders should receive a clear indication that a reload is taking place. Legacy loaders have to figure that out for themselves (by seeing that the module already exists in sys.modules), but we can do better for the new API by making the exec_module signature look like: def exec_module(self, module, previous_spec=None): # module is as per the current PEP 451 text # previous_spec would be set *only* in the reload() case # loaders that don't care still need to accept it, but can just ignore it ... So, with those revisions, the reload() semantics would be defined as: def reload(module): # Get the name to reload from the spec if it is available, # otherwise use __name__ directly current_spec = getattr(module, "__spec__") current_name = module.__name__ name_to_reload = current_name if current_spec is None else current_spec.name # Check this module is properly imported before trying to reload it loaded_module = sys.modules.get(name_to_reload) if loaded_module is not module: msg = "module {} not in sys.modules" raise ImportError(msg.format(name_to_reload), name=name_to_reload) parent_name = name_to_reload.rpartition('.')[0] while parent_name: if parent_name not in sys.modules: msg = "parent {!r} not in sys.modules" raise ImportError(msg.format(parent_name), name=parent_name) parent_name = parent_name.rpartition('.')[0] # Check for a modified spec (e.g. if the import hooks have changed) updated_spec = importlib.find_spec(name_to_reload) # The import-related module attributes get updated here _init_module_attrs(loaded_module, updated_spec) if current_name == "__main__": loaded_module.__name__ = "__main__" # Now re-execute the module loader = updated_spec.loader if loader is None: # Namespace package, already reinitialised above return loaded_module elif hasattr(loader, 'exec_module'): loader.exec_module(module, current_spec) else: loader.load_module(name_to_reload) # Allow for the module to be replaced in sys.modules # (even though that likely won't work very well) return sys.modules[name_to_reload] Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Oct 27, 2013 at 1:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Just to be clear, this means that a lazy import implementation that creates a module object without a __spec__ in the first place will look like an initial import? Or will that crash importlib because of a missing __spec__ attribute? That is, is reload()'s contract adding a new prerequisite for the object passed to it? (The specific use case is creating a ModuleType subclass instance for lazy importing upon attribute access. Pre-importlib, all that was needed was a working __name__ attribute on the module.)

On 28 Oct 2013 02:37, "PJ Eby" <pje@telecommunity.com> wrote:
For custom loaders, that's part of the contract for create_module() (since you'll get an ordinary module otherwise), and so long as *setting* the special module attributes doesn't cause the module to be imported during the initial load operation, attribute access based lazy loading will work fine (and you don't even have to set __name__, since the import machinery will take care of that). For module level lazy loading that injects a partially initialised module object into sys.modules rather than using a custom loader or setting a __spec__ attribute, yes, the exec_module invocation on reloading would always look like a fresh load operation (aside from the fact that the custom instance would already be in sys.modules from the first load operation). It *will* still work, though (at least, it won't break any worse than such code does today, since injecting a replacement into sys.modules really isn't reload friendly in the first place). Cheers, Nick.

On Sun, Oct 27, 2013 at 4:59 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Huh? I don't understand where custom loaders come into it. For that matter, I don't understand what "get an ordinary module object" means here, either. I'm talking about userspace code that implements lazy importing features, like the lazyModule() function in this module: http://svn.eby-sarna.com/Importing/peak/util/imports.py?view=markup Specifically, I'm trying to get an idea of how much that code will need to change under the PEP (and apparently under importlib in general).
There's no "initial load operation", just creation of a dummy module and stuffing it into sys.modules. The way it works is that in, say, foo/__init__.py, one uses: bar = lazyModule('foo.bar') baz = lazyModule('foo.baz') Then anybody importing 'foo.bar' or 'foo.baz' (or using "from foo import bar", etc.) ends up with the lazy module. That is, it's for lazily exposing APIs, not something used as an import hook.
Right.
Wait, what? Who's injecting a replacement into sys.modules? A replacement of what? Or do you mean that loaders aren't supposed to create new modules, but use the one in sys.modules? Honestly, I'm finding all this stuff *really* confusing, which is kind of worrying. I mean, I gather I'm one of the handful of people who really understood how importing *used to work*, and I'm having a lot of trouble wrapping my brain around the new world. (Granted, I think that may be because I understand how a lot of old corner cases work, but what's bugging me is that I no longer understand how those old corners work under the new regime, nor do I feel I understand what the *new* corners will be. This may also just be communication problems, and the fact that it's been months since I really walked through importlib line by line, and have never really walked through it (or PEP 451) quite as thoroughly as I have import.c. I also seem to be having trouble grokking why the motivating use cases for PEP 451 can't be solved by just providing people with good base classes to use for writing loaders -- i.e., I don't get why the core protocol has to change to address the use case of writing loaders more easily. The new protocol seems way more complex than PEP 302, and ISTM the complexity could just be pushed off to the loader side of the protocol without creating more interdependency between importlib and the loaders.)

On 28 Oct 2013 08:41, "PJ Eby" <pje@telecommunity.com> wrote:
wrote:
If the lazy import is injected by a *different* module, then nothing changes.
I was thinking of the more complex case where a module causes *itself* to be loaded lazily. Nothing changes for the simpler case where the injection occurs in a different module.
Provide test cases that exercise the situations you're concerned about supporting and then you don't need to worry about them breaking.
Bad loader implementations have too much power to break the import system, and loaders don't currently expose enough information about modules that *could* be imported. We also take "inherit from this class to implement the protocol correctly, import will break if you don't" as a sign that the *protocol is broken* by pushing too much of the work onto the plugins. New style loaders can be much simpler, because the import system takes care of the complexity, without relying on users inheriting from a particular base class. The complexity comes from the fact that we're now breaking down what the *real* expectations on a loader actually are, and making those part of the import system itself. Cheers, Nick.

On Sun, Oct 27, 2013 at 4:41 PM, PJ Eby <pje@telecommunity.com> wrote:
Depending on the outcome of issue19413, you shouldn't have to change anything. PEP 451 is aiming for strict backward compatibility.
Like Nick, I would love more tests that cover these corner cases. It would have saved us (Brett especially) a lot of headache. At the least any description you can offer would be great.
What new protocol specifically? Finder.find_module() now returns a module spec instead of a loader. Loader.exec_module() gets used now rather than load_module(). Loaders don't have to worry about all the boilerplate stuff anymore (managing sys.modules and import-related module attributes). From my perspective PEP 451 is simplifying protocols. -eric

On Sat, Oct 26, 2013 at 11:03 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I agree on all of these. I'm adding a "How reloading will work" section to the PEP.
I'd rather give loaders another optional method: supports_reload(name). Complicating the spec methods signatures (to support passing the old spec through) just for the sake of reload seems less than optimal. I don't see any benefit to exposing the old spec to loader.exec_module().
This is pretty much the same thing I've implemented. :) -eric

On 29 Oct 2013 14:45, "Eric Snow" <ericsnowcurrently@gmail.com> wrote:
- calling import.find_spec() rather than using the __loader__ attribute
values? parent module directly
I do: it lets the loader tell if anything changed from the previous load operation, allowing it to make an informed decision on whether or not to permit the reload operation or throw an exception. However, I accept such a check would be better implemented as a separate yes/no method, so it makes sense to postpone this aspect to 3.5 at the earliest. Cheers, Nick.
reload it

On Tue, Oct 29, 2013 at 3:32 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
With the PEP, loaders are no longer in charge of the module-related boilerplate. This means that by the time exec_module() gets called, they won't be able to determine for themselves if it's a reload by checking sys.modules. That's the point of adding loader.supports_reload(). (I haven't added that rationale to the PEP yet). If it makes a difference to loaders to also look at the previous spec during supports_reload(), then I think we should add that parameter now. Otherwise if we add the parameter later it makes it messier. The signature would now be: Loader.supports_reload(name, old_spec) -eric

On 30 October 2013 09:02, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Ah, good point.
OK, time for me to stop trying to remember the details of the problem I'm trying to solve and go look them up in the source code :) One of my goals here is to be able to migrate extension loading from the old API to the new plugin API. That means being able to break up the existing load_module implementation: http://hg.python.org/cpython/file/1787277915e9/Python/importdl.c#l23 For loading, that's a fairly straightforward create_module/exec_module split, but reloading gets a bit more interesting. Specifically, I'd like to be able to implement the relevant parts of _PyImport_FindExtensionObject as a precheck for reloading: http://hg.python.org/cpython/file/1787277915e9/Python/import.c#l533 That means just having access to the module name isn't enough: the extensions dictionary is keyed by a (name, filename) 2-tuple rather than just by the module name. Using the find_spec API, that filename would be stored in the loader state on the spec object rather than being looked up anew during the load operation. However, rereading this method also indicates that I really want to know *in exec_module* whether this is a reload or not, since extension loading needs to handle reloads differently from initial execution. So I'm back to my original preference: I'd like the previous spec to be passed to exec_module in the reloading case. If reloading is not supported at all, it's up to the loader to throw an appropriate exception when the the previous spec is not None. If loading and reloading doesn't make any difference, then they can just ignore it. But when both are supported, but handled differently (as for extension modules), then that can be detected, and the information from the old spec (including the original loader and loader state) is available if needed. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Oct 29, 2013 at 7:29 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Our recent discovery about reloading should probably be reflected in the signature of finder.find_spec(): MetaPathFinder.find_spec(name, path=None, existing=None) PathEntryFinder.find_spec(name, existing=None) This way the finder has an opportunity to incorporate information from an existing spec into the spec it returns. reload() would make use of this by passing module.__spec__ (or None if the module has no __spec__) to _bootstrap._find_spec(). This approach should also address what you are looking for. I'd prefer it over passing the existing spec to exec_module(). The module (and its __spec__) should have everything exec_module() needs to do its job. We would still need to use loader.supports_reload() in reload(). However, it may make more sense to pass in the module-to-be-reloaded (after updating its __spec__ to the one found by _bootstrap._find_spec()). -eric

On 31 Oct 2013 03:41, "Eric Snow" <ericsnowcurrently@gmail.com> wrote:
Yes, that should work.
We would still need to use loader.supports_reload() in reload().
Why? If the reload isn't supported, exec_module can just throw an exception based on the loader state in the spec.
From the import system's point of view "reload not permitted" is no different from any other exec time failure.
Cheers, Nick.

On Wed, Oct 30, 2013 at 4:09 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Cool. I'll update the PEP.
At the point that exec_module() gets called, the loader can't check sys.modules to see if it's a reload or not. As a workaround, the finder could set up some loader state to indicate to the loader that it's a reload and then the loader, during exec_module(), would check that and act accordingly. However, that's the sort of boilerplate that PEP 451 is trying to offload onto the import machinery. With Loader.supports_reload() it's a lot cleaner. -eric

On 31 Oct 2013 08:54, "Eric Snow" <ericsnowcurrently@gmail.com> wrote:
exception
There's also the option of implementing the constraint directly in the finder, which *does* have the necessary info (with the change to pass the previous spec to find_spec). I still think it makes more sense to leave this out for the moment - it's not at all clear we need the extra method, and adding it later would be a straightforward protocol update. Cheers, Nick.

On Wed, Oct 30, 2013 at 10:24 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Yeah, I thought of that. I just prefer the more explicit supports_reload(). That said...
...I agree that makes the most sense for now. :) BTW, thanks for pushing these issues. I think the API has gotten pretty solid. I just need to make sure the PEP covers the cases and conclusions we're discussing. -eric

On Thu, Oct 31, 2013 at 5:52 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
You're welcome. ;-) But speaking of handwaving, I also want to be sure that loader developers know that "reloading" is only really "reloading" if there's a previous existing spec, or the module type is... Hm. Actually, I think I now know how to state what's bugging me every time I see this "supports_reload()" or "reload=True" or other reloading flags in this process. I think that references to reloading should be replaced with references to what's *actually* at issue, because "reloading" itself is vague and carries too many assumptions for a loader author to understand or get right. (Look how hard it is for *us*!) That is, I think we should clarify what use cases there are for knowing whether a "reload" is happening, and address those use cases explicitly rather than lumping them under a general heading. For example, if the reason a loader cares about reloading is because it's a C extension using a custom module type, and the existing module isn't of the right type, then we should just spell out how to handle it. (e.g. raise an exception) If the reason a loader cares about reloading is because of some sort of caching or reuse, then we should just spell out how to handle that, too. Lumping these cases together under a "reloading" flag or a check for "reloading" support is a nasty code smell, because it requires a loader developer to have the *same* vaguely-defined idea of "reloading" as the PEP authors. ;-) I also suspect, that if properly spelled out, those use cases are going to boil down to: 1. Throwing errors if you have an existing module object you can't load into, and 2. Passing in a previous spec object, if available In other words, loaders should not really have any responsibility for or concept of "reloading" -- they always load into a module object (that they may or may not have created), and they may get given a spec from a previous load. They should deal only in "module reuse" and "spec reuse". While a typical reload() might involve both reuses, there are cases where one sort of reuse could occur independently, and not all loaders care about both (or even either) condition. At any rate, it means a loader author doesn't have to figure out how to handle "reloading", all they have to figure out is whether they can load into a particular module object, and whether they can do something useful with a spec that was previously used to load a module with the same name -- a spec that may or may not refer to a similar previous loader. These are rather more well-defined endeavors than trying to determine in the abstract whether one "supports reload". ;-)

On 1 Nov 2013 01:37, "PJ Eby" <pje@telecommunity.com> wrote: . ;-)
It may not have been clear from the email exchange, but that's basically where we ended up :) The change Eric is currently going to make to the PEP is to add an optional "previous" parameter to the various find_spec APIs. At the time when find_spec runs, sys.modules hasn't been touched yet, so the old module state is still available when reloading. Passing the spec in lets the loader decide whether or not it can actually load that module given the information about the previous load operation. However, perhaps it makes more sense to pass the entire existing module, rather than just the spec? I'd like to minimise the need for new-style loader authors to retrieve things directly from the sys module, and you're right that "can I reload into this particular module?" is a slightly different question from "can I reload a module previously loaded using this particular spec?". A spec-based API could still be used to answer the first question by looking up sys.modules, but so could the name based API. Passing in the module reference, on the other hand, should let loaders answer both questions without needing to touch the sys module. I also now think this pass-the-module approach when finding the spec approach could also clean up some of the messiness that is __main__ initialisation, where we repeatedly load things into the same module object. Let's say we be completely explicit and call the new parameter something like "load_target". If we do that then I would make the *same* change to runpy.run_path and runpy.run_module: let you pass in an existing module object under that name to say "hey, run in this namespace, rather than a fresh one". (Those APIs currently only support pre-populating a *fresh* namespace, rather than allowing you to directly specify an existing one) Most loaders won't care, but those that do will have all the info they need to throw an exception saying "I could provide a spec for this, but it's not compatible with that load target". Cheers, Nick.

On Thu, Oct 31, 2013 at 6:10 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Do you mean loader or finder? You say "find_spec" which is for finders, but then you say "loader".
See you mention "finding" but before now everything has been "loader".
Exactly what APIs -- new or changed -- are you proposing? Method signatures only please to avoid ambiguity. =)

On Fri, Nov 1, 2013 at 7:35 AM, Brett Cannon <brett@python.org> wrote:
I'm about to update the PEP but I'll clarify here. The signature of finder.find_spec() will grow an extra "existing" parameter (default to None) that is an existing module. During reload the module from sys.modules will be passed in. Having this be part of find_spec() is consistent with the job of the finder to identify a loader that should be used to load the module (reload or not). As to the inconsistency in what Nick said, it still fits in with the way I see it. I'm sure he'll correct me if he had something else in mind. :-) finder.find_spec() will be making a decision on what loader (if any) to return. As part of that, when it finds a loader it can further decide on whether or not the loader supports loading into an existing module (if one was provided). As part of that it may or may not consult with the loader. Either way it stands as proxy for the loader in the latter decision. In lieu of an explicit loader.supports_reload() method, a loader will rely on its associated finder to be a proxy for reload-related decisions, likely communicated via spec.loader_state. Perhaps Nick meant something else, but my understanding is that he was referring to this -eric

On Fri, Nov 1, 2013 at 11:59 AM, Brett Cannon <brett@python.org> wrote:
Thanks for the clarification. It all SGTM.
I've updated the PEP and the reference implementation. If I've missed anything please let me know. There is only one thing I thought of that I'd like to get an opinion on: In the Finders section, the PEP specifies returning None (or using a different loader) when the found loader does not support loading into an existing module (e.g during reload). An alternative to returning None would be for the finder to raise ImportError with a message like "the loader does not support reloading the module". This may actually be a better approach since "could not find a loader" and "the found loader won't work" are different situations that a single return value (None) may not sufficiently represent. Other than that, I'm not aware of any blockers for the PEP. Once I settle that last question, I'll ask for pronouncement. (Anyone may feel free to pronounce before then. <wink>) -eric

On 2 November 2013 08:44, Eric Snow <ericsnowcurrently@gmail.com> wrote:
While I was saying loader when I should have been saying finder, Eric correctly divined my intent :)
Throwing ImportError for "can load, but cannot load into that target module" sounds good to me. We should also be explicit that throwing an exception from find_spec *stops* the search (unlike returning None).
Other than that, I'm not aware of any blockers for the PEP.
The one slight quibble I have with the PEP update is the wording regarding the "existing" parameter. I think runpy should pass the new parameter as well when calling find_spec, but in that case it *won't* be an existing copy of the named module, it will be sys.modules["__main__"]. So we shouldn't set an expectation that the existing module passed in will necessarily be a previous copy of the module being loaded, it's just an indication to the finder that the target namespace for execution will be a pre-existing module rather than a new one. For the same reason, I also have a mild preference for "target" (or the more explicit "load_target") as the name, although I won't object if you and Brett prefer "existing". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Nov 2, 2013 at 5:46 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
SGTM as well.
We should also be explicit that throwing an exception from find_spec *stops* the search (unlike returning None).
I thought that was obvious but more docs doesn't hurt. =)
I would go with "module" or "target" and that is where I shall end the bikeshedding.

On 3 November 2013 01:21, Brett Cannon <brett@python.org> wrote:
I had a quick reread of the runpy docs: http://docs.python.org/dev/library/runpy#runpy.run_module I can definitely make "target" work (that's why I suggested it originally, re-reading the docs just confirmed that), but I don't see any non-confusing way to use "module". Eric: since "target" was on both Brett's list and mine, it looks like s/existing/target/ and clarifying that "incompatible target module" should be reported by throwing ImportError from find_spec() will get you the pronouncement you're looking for :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Oct 25, 2013 at 10:24 AM, PJ Eby <pje@telecommunity.com> wrote:
Yeah, this actually changed in 3.3 IIRC, where reload now calls module.__loader__.load_module(module.__name__). I'll double-check. PEP 451 simply changes that to be (basically) module.__loader__.exec_module(module). -eric
participants (5)
-
Brett Cannon
-
Eric Snow
-
Nick Coghlan
-
PJ Eby
-
Stephen J. Turnbull