[Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning"

Nick Coghlan ncoghlan at gmail.com
Tue Aug 16 01:28:58 CEST 2011


On Tue, Aug 16, 2011 at 6:15 AM, Greg Slodkowicz <jergosh at gmail.com> wrote:
> I've run into the following problem: prior to PEP 402, if we import
> Foo.Bar.Baz, first Foo and then Foo.Bar are imported recursively and
> finally, if these succeed, Foo.Bar.Baz as well. Then __import__ returns
> the root module, in this case Foo. But suppose Foo and Bar are virtual
> packages, i.e. there is a file Foo/Bar/Baz.py somewhere on the path,
> but neither Foo/ nor Foo/Bar/ contains an __init__.py. As far as I
> understand the proposal, there would then be no Foo to return.
>
> One way to get around this would be to create an empty module on each
> import level (Foo and Foo.Bar in this case) and set appropriate
> __path__ on each of them. To my mind, this would require changing the
> way get_subpath() works.
>
> Currently, the requirement is that
> "Each importer is checked for a get_subpath() method, and if present,
> the method is called with the *full name* of the module/package the
> path is being constructed for. The return value is either a string
> representing a subdirectory for the requested package, or None if no
> such subdirectory exists."

However, if you go back up to the Specification section, you'll find this bit:

"Note, by the way, that this change must be applied recursively: that
is, if foo and foo.bar are pure virtual packages, then import
foo.bar.baz must wait until foo.bar.baz is found before creating
module objects for both foo and foo.bar, and then create both of them
together, properly setting the foo module's .bar attribute to point to
the foo.bar module.

In this way, pure virtual packages are never directly importable: an
import foo or import foo.bar by itself will fail, and the
corresponding modules will not appear in sys.modules until they are
needed to point to a successfully imported submodule or self-contained
subpackage."

I'm actually not sure this is a viable approach as currently described
in the PEP - most of the existing import machinery assumes that parent
modules will exist in sys.modules before child modules are imported.
In addition, it would create some confusing inconsistencies in cases
where, for example, 'foo' was pure virtual but 'foo.bar' was either
self-contained or included a module alongside the subpackage
directories: whether 'foo' remained around after a failed
'foo.bar.baz' lookup would depend on whether 'foo.bar' was pure
virtual. Given that, I think once a pure virtual package has
been created while hunting for subpackages, there's no point in
trying to remove it even if the subpackage search ultimately fails.

This would be most consistent with current import behaviour:

>>> import sys
>>> 'logging' in sys.modules
False
>>> 'logging.handlers' in sys.modules
False
>>> import logging.handlers.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named foo
>>> 'logging' in sys.modules
True
>>> 'logging.handlers' in sys.modules
True
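
The same behaviour can be checked without touching the standard
library. What follows is a minimal sketch, assuming Python 3 and two
throwaway package names (demo_outer and demo_inner, both hypothetical)
written to a temporary directory. It records whether the parents are in
sys.modules before and after an import of a submodule that does not
exist.

```python
import importlib
import os
import sys
import tempfile

NAMES = ("demo_outer", "demo_outer.demo_inner")  # hypothetical package names


def parents_after_failed_import():
    """Return (before, failed, after) for a failed submodule import."""
    for name in NAMES:
        sys.modules.pop(name, None)  # start from a clean slate
    with tempfile.TemporaryDirectory() as root:
        # Build demo_outer/demo_inner/, each with an empty __init__.py.
        inner_dir = os.path.join(root, "demo_outer", "demo_inner")
        os.makedirs(inner_dir)
        for directory in (os.path.dirname(inner_dir), inner_dir):
            open(os.path.join(directory, "__init__.py"), "w").close()
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            before = [name in sys.modules for name in NAMES]
            try:
                importlib.import_module("demo_outer.demo_inner.missing")
                failed = False
            except ImportError:
                failed = True
            after = [name in sys.modules for name in NAMES]
        finally:
            sys.path.remove(root)
    return before, failed, after
```

Calling parents_after_failed_import() should report both parents as
absent beforehand and present afterwards, even though the import
itself failed.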

> If instead we allow passing a 'partial' name (Foo and Foo.Bar) of a
> virtual package to get_subpath() and indicate (by an extra parameter)
> that we are looking for a purely virtual module, we could create
> __path__ entries for each of these parent modules.
>
> For that reason, in my implementation I introduced an extra parameter
> 'as_parent' to _gcd_import(), indicating that if a module or package
> with a given name is not found, a subdirectory (e.g. Foo/Bar) on the
> import path should be accepted as well, and an analogous extra
> parameter in get_subpath().
>
> Hope this makes sense. I'd appreciate any feedback.

I don't think you need the extra argument to get_subpath() - virtual
path construction should be the same regardless of whether you're
creating it for a pure virtual module or not. However, I suspect
you're right about needing the flag argument to _gcd_import().
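
To make the point about get_subpath() concrete, here is a rough sketch
of virtual path construction that takes only the full module name and
the parent's path, with no flag for the pure virtual case. It assumes
plain filesystem directories. DirectoryImporter, the importer_for
parameter and this particular get_virtual_path() body are hypothetical
names for illustration, not the PEP's reference implementation.

```python
import os
import tempfile


class DirectoryImporter:
    """Hypothetical importer for one filesystem path entry."""

    def __init__(self, path_entry):
        self.path_entry = path_entry

    def get_subpath(self, fullname):
        # Receives the full dotted name; only the last component
        # names the subdirectory to look for.
        candidate = os.path.join(self.path_entry, fullname.rpartition(".")[2])
        return candidate if os.path.isdir(candidate) else None


def get_virtual_path(fullname, parent_path, importer_for=DirectoryImporter):
    """Collect the subdirectories that would form fullname's __path__."""
    subpaths = []
    for entry in parent_path:
        importer = importer_for(entry)
        get_subpath = getattr(importer, "get_subpath", None)
        if get_subpath is None:
            continue  # this importer does not support virtual packages
        subdir = get_subpath(fullname)
        if subdir is not None:
            subpaths.append(subdir)
    if not subpaths:
        raise ImportError("No module named " + fullname)
    return subpaths


def demo():
    """Build a small directory tree and compute two virtual paths."""
    with tempfile.TemporaryDirectory() as root:
        first, second, third = (os.path.join(root, n) for n in ("a", "b", "c"))
        os.makedirs(os.path.join(first, "foo", "bar"))
        os.makedirs(os.path.join(second, "foo"))
        os.makedirs(third)
        foo_path = get_virtual_path("foo", [first, second, third])
        bar_path = get_virtual_path("foo.bar", foo_path)
        relative = lambda paths: [os.path.relpath(p, root) for p in paths]
        return relative(foo_path), relative(bar_path)
```

The same function serves both callers: the path for 'foo' is built from
the top-level search path, and the path for 'foo.bar' is built from the
result of the first call.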

For the foo.bar.baz case, where there is no 'foo' instance in
sys.modules, you want to end up doing something along the lines of the
following (obviously deriving the names from the passed in dotted
string rather than hard coding anything):

  # Assume _std_import and get_virtual_path utility functions are available
  import imp
  import sys

  path = sys.path
  for name in ('foo', 'foo.bar'):
    # Virtual packages may be created while hunting for subpackages
    try:
      pkg = _std_import(name)
    except ImportError:
      # Try the pure virtual case
      path = get_virtual_path(name, path) # May throw ImportError
      pkg = imp.new_module(name)
      pkg.__path__ = path
      # Register the new package so child imports can find their parent
      sys.modules[name] = pkg
    else:
      try:
        path = pkg.__path__
      except AttributeError:
        # Try the initialised virtual case
        path = get_virtual_path(name, path) # May throw ImportError
        pkg.__path__ = path
  else:
    # All parents now exist, so fall through to the standard import
    # Direct initial import of virtual packages is not allowed
    result = _std_import('foo.bar.baz')
  # Hook up imported modules in relevant namespaces
  sys.modules['foo'].bar = sys.modules['foo.bar']
  sys.modules['foo.bar'].baz = sys.modules['foo.bar.baz']

I haven't looked at the code at this point, but the body of that loop
presumably corresponds to your 'as_parent' case in _gcd_import(),
while the else clause on the loop would be the standard import case.
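
For anyone who wants to experiment, the loop can be written so that it
derives the parent names from the dotted string. This is only a sketch
under several assumptions: it targets Python 3, uses
importlib.import_module() in place of _std_import, uses
types.ModuleType in place of imp.new_module, and takes a search_path
argument that is not part of the outline above. The helper
get_virtual_path() and the names vfoo, vbar and vbaz are hypothetical.
Python 3.3 and later already treat directories without __init__.py as
namespace packages (PEP 420), so the demo keeps its root directory off
sys.path in order to exercise the pure virtual branch.

```python
import importlib
import os
import sys
import tempfile
import types


def get_virtual_path(name, parent_path):
    """Hypothetical helper: directories in parent_path named after name's leaf."""
    leaf = name.rpartition(".")[2]
    found = [os.path.join(entry, leaf) for entry in parent_path
             if os.path.isdir(os.path.join(entry, leaf))]
    if not found:
        raise ImportError("No module named " + name)
    return found


def import_with_virtual_parents(dotted_name, search_path):
    """Import dotted_name, creating virtual parent packages where needed."""
    parts = dotted_name.split(".")
    path = list(search_path)
    for depth in range(1, len(parts)):
        name = ".".join(parts[:depth])
        try:
            pkg = importlib.import_module(name)
        except ImportError:
            # Pure virtual case: build an empty package from directories.
            path = get_virtual_path(name, path)  # may raise ImportError
            pkg = types.ModuleType(name)
            pkg.__path__ = path
            sys.modules[name] = pkg
            # Hook the new package into its parent's namespace.
            parent_name, _, leaf = name.rpartition(".")
            if parent_name:
                setattr(sys.modules[parent_name], leaf, pkg)
        else:
            try:
                path = list(pkg.__path__)
            except AttributeError:
                # Initialised virtual case: give the plain module a __path__.
                path = get_virtual_path(name, path)
                pkg.__path__ = path
    # Every parent now exists, so the standard import can take over.
    return importlib.import_module(dotted_name)


def demo():
    """Import vfoo.vbar.vbaz from a directory tree with no __init__.py files."""
    for name in ("vfoo", "vfoo.vbar", "vfoo.vbar.vbaz"):
        sys.modules.pop(name, None)
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "vfoo", "vbar"))
        with open(os.path.join(root, "vfoo", "vbar", "vbaz.py"), "w") as f:
            f.write("VALUE = 42\n")
        importlib.invalidate_caches()
        module = import_with_virtual_parents("vfoo.vbar.vbaz", [root])
        hooked_up = sys.modules["vfoo"].vbar.vbaz is module
        return module.VALUE, hooked_up
```

Unlike __import__, this sketch returns the leaf module rather than the
root package. The root stays reachable through sys.modules once the
call succeeds.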

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

