[Import-SIG] New PEP draft: "Simplified Package Layout and Partitioning"

Nick Coghlan ncoghlan at gmail.com
Tue Aug 16 01:28:58 CEST 2011

On Tue, Aug 16, 2011 at 6:15 AM, Greg Slodkowicz <jergosh at gmail.com> wrote:
> I've run into the following problem: prior to PEP-402, if we import
> Foo.Bar.Baz, first Foo and then Foo.Bar are imported by recursion and
> finally, if these succeed, also Foo.Bar.Baz. Then __import__ returns
> the root module, in this case Foo. But suppose Foo and Bar are virtual
> packages, i.e. there is a file Foo/Bar/Baz.py somewhere on the path,
> lacking any __init__.py files. As far as I understand the proposal,
> then there would be no Foo to return.
> One way to get around this would be to create an empty module on each
> import level (Foo and Foo.Bar in this case) and set appropriate
> __path__ on each of them. To my mind, this would require changing the
> way get_subpath() works.
> Currently, the requirement is that
> "Each importer is checked for a get_subpath() method, and if present,
> the method is called with the *full name* of the module/package the
> path is being constructed for. The return value is either a string
> representing a subdirectory for the requested package, or None if no
> such subdirectory exists."

However, if you go back up to the Specification section, you'll find this bit:

"Note, by the way, that this change must be applied recursively: that
is, if foo and foo.bar are pure virtual packages, then import
foo.bar.baz must wait until foo.bar.baz is found before creating
module objects for both foo and foo.bar, and then create both of them
together, properly setting the foo module's .bar attribute to point to
the foo.bar module.

In this way, pure virtual packages are never directly importable: an
import foo or import foo.bar by itself will fail, and the
corresponding modules will not appear in sys.modules until they are
needed to point to a successfully imported submodule or self-contained
subpackage."

I'm actually not sure this is a viable approach as currently described
in the PEP - most of the existing import machinery assumes that parent
modules will exist in sys.modules before child modules are imported.
In addition, it would create some confusing inconsistencies in cases
where, for example, 'foo' was pure virtual, but 'bar' was either
self-contained or included a module alongside the subpackage
directories (whether 'foo' remained around after a failed
'foo.bar.baz' lookup would depend on whether 'foo.bar' was pure
virtual). Given that, I think once a pure virtual package has
been created while hunting for subpackages, there's no percentage in
trying to remove it even if the subpackage search ultimately fails.

This would be most consistent with current import behaviour:

>>> import sys
>>> 'logging' in sys.modules
False
>>> 'logging.handlers' in sys.modules
False
>>> import logging.handlers.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named foo
>>> 'logging' in sys.modules
True
>>> 'logging.handlers' in sys.modules
True

> If instead we allow passing a 'partial' name (Foo and Foo.Bar) of a
> virtual package to get_subpath() and indicate (by an extra parameter)
> that we are looking for a purely virtual module, we could create
> __path__ entries for each of these parent modules.
> For that reason, in my implementation I introduced an extra parameter
> 'as_parent' to _gcd_import(), indicating that if a module or package
> with a given name is not found, a subdirectory (e.g. Foo/Bar) on the
> import path should be accepted as well, and an analogous extra
> parameter in get_subpath().
> Hope this makes sense. I'd appreciate any feedback.

I don't think you need the extra argument to get_subpath() - virtual
path construction should be the same regardless of whether you're
creating it for a pure virtual module or not. However, I suspect
you're right about needing the flag argument to _gcd_import().
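
As a rough illustration of why no extra argument should be needed, here
is a filesystem-only sketch of virtual path construction. The name
virtual_path() and its signature are hypothetical stand-ins, and a real
implementation would presumably ask each importer's get_subpath()
rather than call os.path directly:

```python
import os

def virtual_path(fullname, parent_path):
    """Return the directories named after the last component of
    `fullname` that exist under the entries of `parent_path`.

    Hypothetical helper: it only handles plain filesystem entries.
    """
    leaf = fullname.rpartition('.')[2]
    candidates = [os.path.join(entry, leaf) for entry in parent_path]
    found = [d for d in candidates if os.path.isdir(d)]
    if not found:
        raise ImportError('No virtual package named %s' % fullname)
    return found
```

The same call serves both cases: the result can be assigned to the
__path__ of a freshly created empty module or of a module that was
already imported, so the function itself does not need to know which
one it is building the path for.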

For the foo.bar.baz case, where there is no 'foo' instance in
sys.modules, you want to end up doing something along the lines of the
following (obviously deriving the names from the passed in dotted
string rather than hard coding anything):

  # Assume a _std_import utility function is available
  path = sys.path
  for name in ('foo', 'foo.bar'):
    # Virtual packages may be created while hunting for subpackages
    try:
      pkg = _std_import(name)
    except ImportError:
      # Try the pure virtual case
      path = get_virtual_path(name, path) # May throw ImportError
      pkg = imp.new_module(name)
      pkg.__path__ = path
    else:
      try:
        path = pkg.__path__
      except AttributeError:
        # Try the initialised virtual case
        path = get_virtual_path(name, path) # May throw ImportError
        pkg.__path__ = path
  else:
    # Direct initial import of virtual packages is not allowed
    result = _std_import('foo.bar.baz')
  # Hook up imported modules in relevant namespaces
  foo.bar = sys.modules['foo.bar']
  foo.bar.baz = sys.modules['foo.bar.baz']

I haven't looked at the code at this point, but the body of that loop
presumably corresponds to your 'as_parent' case in _gcd_import(),
while the else clause on the loop would be the standard import case.
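
For anyone who wants to experiment with the parent-handling part of
that loop in isolation, the following is a minimal sketch under some
loud assumptions: build_virtual_parents() is a hypothetical name, a
plain dict stands in for sys.modules so nothing in the running
interpreter is touched, only filesystem directories are considered,
and the final import of the leaf module is left out entirely. It also
expects any module already present in the dict to be a package with a
__path__.

```python
import os
import types

def build_virtual_parents(dotted_name, search_path, modules):
    """Create or reuse a module object for every parent of `dotted_name`.

    `modules` plays the role of sys.modules. Returns the root module.
    """
    parts = dotted_name.split('.')
    path = list(search_path)
    parent = None
    for i in range(1, len(parts)):
        name = '.'.join(parts[:i])
        pkg = modules.get(name)
        if pkg is None:
            # Pure virtual case: collect the matching subdirectories.
            path = [os.path.join(p, parts[i - 1]) for p in path]
            path = [p for p in path if os.path.isdir(p)]
            if not path:
                raise ImportError('No module named %s' % name)
            pkg = types.ModuleType(name)
            pkg.__path__ = path
            modules[name] = pkg
        else:
            path = list(pkg.__path__)
        if parent is not None:
            # Hook the child up in its parent's namespace.
            setattr(parent, parts[i - 1], pkg)
        parent = pkg
    return modules[parts[0]]
```

Note that parents created before a failure are deliberately left in
the dict, in line with the suggestion above not to bother removing a
pure virtual package once it has been created.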


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
