[Import-SIG] Import engine PEP up on python.org

Nick Coghlan ncoghlan at gmail.com
Mon Nov 14 02:27:11 CET 2011


On Sun, Nov 13, 2011 at 9:36 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> It's desirable for the same reason *any* form of object-oriented
>> encapsulation is desirable: because it makes it easier to *think*
>> about the problem and manage interdependencies between state updates.
>
> I guess I'm -1 on that PEP then. If it introduces complications just
> for the sake of some presumed simplification, it's not worth it.

I think you're right on the PEP *as it stands* - I don't think it's an
improvement on the status quo yet. However, I think it's a useful
starting point for using the tools we have available (i.e. classes and
context managers) to make further progress in bringing the complexity
under control and making the import system as a whole less
intimidating and magical.

>> I didn't realise the merits of OO designs needed to be justified - I
>> figured the list of 6 pieces of interdependent process global state
>> spoke for itself.
>
> Perhaps I'm challenging the specific choice of classes then: I would
> find it completely reasonable to move all of this into the interpreter
> state, as I think it's fine that this "global" state is unique to the
> interpreter. There is only a single __import__ builtin, and the
> objective of the import statement is to make a change to the "global"
> state (scoped with the interpreter).

Yes, I think that's a reasonable way of looking at it. I believe
there's merit in partitioning off the import state from everything
else, but the basic idea would also be served just by moving
everything to the interpreter state without adding a new class level
to the mix (in fact, as it turns out, 'sys' as a whole is effectively
part of the interpreter state, so this is already true to some
degree).

The analogy that occurred to me after reading your reply was the past
migration from the separate "last_traceback", "last_type",
"last_value" attributes in the sys module to the consolidated triple
returned by sys.exc_info().

>>> I notice that there is overlap both with multiple subinterpreters,
>>> and with restricted execution. It hints at providing both, but actually
>>> provides neither.
>>
>> It doesn't claim to provide either - it's sole aim is to provide a
>> relatively lightweight mechanism to selectively adjust elements of the
>> import system (e.g. adding to sys.path when importing plugins, but
>> leaving it alone otherwise).
>
> Ok - that might be a use case. However, I'm skeptical that this PEP is
> good at achieving that objective - as you notice, there is the challenge
> of recursive imports.
>
> I would rather prefer to make such variables per-thread, or, rather
> "per context". Something like
>
> with sys.extended_path(directory):
>  load_plugin()
>
> This would extend the path for all code run within the context of the
> with statement, but not elsewhere. As an implementation strategy, the
> thread state would be able to override the global variables, in a
> stacked (nested) manner. The exact list of variables that can be
> overridden needs to be carefully considered - for example, I would
> still view a single sys.modules as important in that use case.

One advantage of the OO model is that it allows such decisions to be
made on a case-by-case basis - as the PEP shows, you can use
properties to control which attributes are isolated from the process
global state and which still perform global modifications.

>> But having the import state better encapsulated would make it easier
>> to improve the isolation of subinterpreters so that they aren't
>> sharing Python modules, even if they still share extension modules
>
> That already works, no? Subinterpreters don't share Python modules
> (that's about the only feature about subinterpreters that actually
> works).

You're quite right - I forgot that the subinterpreter initialisation
already takes care to ensure that the subinterpreter gets a new copy
of the sys module, and then reinitialises the import state for that
new copy. So I guess this proposal can be seen as an intermediate
level of isolation that is accessible from Python code, without
requiring actually swapping out the interpreter state the way
Py_NewInterpreter() does.

>> No, they're not. Yes, the hooks are *usable*, but they're damn hard to
>> comprehend. When even the *experts* hate messing with a subsystem,
>> it's a hint that there's something wrong with the way it is set up. In
>> this case, I firmly believe a big part of the problem is that the
>> import system is a complex, interdependent mechanism, but there's no
>> coherence to the architecture. It's as if the whole thing was written
>> in C from an architectural point of view, but without even bothering
>> to create a dedicated structure to hold the state.
>
> I agree that the import system is difficult to understand, and the
> PEP 302 hooks in particular. I mightily disagree that the cause of
> these difficulties is the global state used in the implementation.
> It's rather the order in which things are called, and how they interact,
> which makes it difficult to understand.

I don't think there's any one thing that makes it so hard to
understand - I think it's a lot of smaller things stacking on top of
each other. Off the top of my head:
- 6 pieces of interdependent global state in 'sys' and 'imp'
- distributed state in package '__path__' attributes
- lack of builtin PEP 302 support for the standard filesystem import
mechanism (hence the undocumented emulation inside pkgutil)
- difficulty of tweaking pieces of the import algorithm while
preserving the rest without copying a lot of code
- scattered APIs (across imp, importlib and pkgutil) for correctly
handling import state updates and data driven imports

importlib is a big step forward on several of those fronts - if Brett
wants/needs help bootstrapping it in as the main import
implementation, then that's a far more important task than adding a
top-level object-oriented API.

However, a top level OO API will still be beneficial in at least a
couple of respects:
  - it becomes significantly easier to replace *elements* of the
import mechanism, beyond the hooks provided by PEP 302. Specifically,
you can *subclass* the engine implementation and only replace the
parts you want to change.
  - you can provide convenience functions that handle multi-stage
updates to the import state in a consistent fashion (cf. many of the
details in PEP 402 regarding correctly updating package paths as
sys.path changes).

> Adding an optional keyword
> argument to some of the function is surely no simplification.

Yeah, that's by far the weakest part of the idea so far - figuring out
how to integrate it with the existing PEP 302 mechanisms. As an
initial step, I'm now thinking we may be able to do something based on
context management and the import lock that is even simpler than going
to thread-local storage: offer a context manager as part of the engine
API that acquires the import lock, swaps out all of the state in
sys.modules for the engine's own state, then reverses the process at
the end. Something like:

    IMPORT_STATE_ATTRS = ("path", "modules", "path_importer_cache",
"meta_path", "path_hooks")

    @contextmanager
    def import_context(self):
        imp.acquire_lock()
        try:
            orig_state = tuple(getattr(sys, attr) for attr in
IMPORT_STATE_ATTRS)
            new_state = tuple(getattr(self, attr) for attr in
IMPORT_STATE_ATTRS)
            restore_state = []
            try:
                for attr, new_value, orig_value in zip(state_attrs,
new_state, orig_state):
                    setattr(sys, attr, new_value)
                    restore_state.append((attr, orig_value))
                yield self
            finally:
                for attr, orig_value in restore_state:
                    setattr(sys, attr, orig_value)
        finally:
            imp.release_lock()

We would have to go through the interpreter and eliminate all of the
current locations where 'sys' gets bypassed to make that work, though
(e.g. most direct access to interp->modules from C code would need to
be updated to go through 'sys' instead).

The bar for the PEP really needs to be set at "existing importers and
loaders work without modification" (so long as they're not caching sys
attributes when they really shouldn't be)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


More information about the Import-SIG mailing list