From ericsnowcurrently at gmail.com  Thu Jul  7 06:35:06 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 6 Jul 2011 22:35:06 -0600
Subject: [Import-SIG] PEP 382 update? and implementation feedback
Message-ID: <CALFfu7AnVJ3S1THyXGQVWA2+Rr4HJ752PyjA6GkRoDnwYDAxHg@mail.gmail.com>

Any feedback on PJE's proposal [1] regarding PEP 382?  I have some
free time to work on a reference implementation and want to make sure
I am targeting an up-to-date spec.

My first goal is to help get a proof-of-concept implementation out
there for the PEP, for 3.3, regardless of the ultimate implementation.
 However, my end goal is to leverage that effort into a backported
implementation for 2.x.  How far back should I go with that?  I was
thinking 2.4 [2].

The two approaches I've considered to meet these goals are a heavy
import hook and changes to importlib.  For what I have in mind, both
would require backporting the full importlib (for my end goal);
currently only a simple port of import_module is backported and
released on PyPI.

The import hook approach would not be helpful for 3.3 except as a
proof-of-concept.  However, the importlib approach could also work as
the 3.3 implementation if Brett realizes his intentions for
importlib.__import__ [3].

Thoughts?

-eric


[1] http://mail.python.org/pipermail/import-sig/2011-June/000208.html
[2] the version depends partly on use cases, like google app engine
(2.5) and the various distros (no idea).  I'm personally stuck on 2.4
at work for the next while, hence my choice.  :)
[3] http://bugs.python.org/issue2377

From eric at trueblade.com  Thu Jul  7 11:39:38 2011
From: eric at trueblade.com (Eric Smith)
Date: Thu, 07 Jul 2011 05:39:38 -0400
Subject: [Import-SIG] PEP 382 update? and implementation feedback
In-Reply-To: <CALFfu7AnVJ3S1THyXGQVWA2+Rr4HJ752PyjA6GkRoDnwYDAxHg@mail.gmail.com>
References: <CALFfu7AnVJ3S1THyXGQVWA2+Rr4HJ752PyjA6GkRoDnwYDAxHg@mail.gmail.com>
Message-ID: <4E157EDA.6000606@trueblade.com>

On 7/7/2011 12:35 AM, Eric Snow wrote:
> Any feedback on PJE's proposal [1] regarding PEP 382?  I have some
> free time to work on a reference implementation and want to make sure
> I am targeting an up-to-date spec.

I've been working on a response, but haven't had time to post it yet.
Maybe in the next few days. I agree with most of it (and maybe all of
it, I'm still reading through it).

> My first goal is to help get a proof-of-concept implementation out
> there for the PEP, for 3.3, regardless of the ultimate implementation.
>  However, my end goal is to leverage that effort into a backported
> implementation for 2.x.  How far back should I go with that?  I was
> thinking 2.4 [2].

We (python-dev) can't release a new version of 2.x. That said, I'd love
it if I could compile a version of 2.5 for my own uses that had this
feature, or if it could be done as an import hook.

> The import hook approach would not be helpful for 3.3 except as a
> proof-of-concept.  However, the importlib approach could also work as
> the 3.3 implementation if Brett realizes his intentions for
> importlib.__import__ [3].

Are you thinking of doing the import hook version in C?

From ericsnowcurrently at gmail.com  Thu Jul  7 16:36:30 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 7 Jul 2011 08:36:30 -0600
Subject: [Import-SIG] PEP 382 update? and implementation feedback
In-Reply-To: <4E157EDA.6000606@trueblade.com>
References: <CALFfu7AnVJ3S1THyXGQVWA2+Rr4HJ752PyjA6GkRoDnwYDAxHg@mail.gmail.com>
	<4E157EDA.6000606@trueblade.com>
Message-ID: <CALFfu7AEFxG488V4EEBCx95WvtoNOxwNhLsnp-urohYNNAyVQg@mail.gmail.com>

On Thu, Jul 7, 2011 at 3:39 AM, Eric Smith <eric at trueblade.com> wrote:
> On 7/7/2011 12:35 AM, Eric Snow wrote:
>> My first goal is to help get a proof-of-concept implementation out
>> there for the PEP, for 3.3, regardless of the ultimate implementation.
>>  However, my end goal is to leverage that effort into a backported
>> implementation for 2.x.  How far back should I go with that?  I was
>> thinking 2.4 [2].
>
> We (python-dev) can't release a new version of 2.x. That said, I'd love
> it if I could compile a version of 2.5 for my own uses that had this
> feature, or if it could be done as an import hook.
>

Yeah, any backport that I do would be released on PyPI, as has been
done with things like importlib and distutils2.

>> The import hook approach would not be helpful for 3.3 except as a
>> proof-of-concept.  However, the importlib approach could also work as
>> the 3.3 implementation if Brett realizes his intentions for
>> importlib.__import__ [3].
>
> Are you thinking of doing the import hook version in C?

Nope.  I'm just planning on extending (and backporting) importlib
either indirectly (for the import hook) or directly.  With the import
hook it should be easy enough to add it onto sys.meta_path early on.
Same with explicitly changing __import__ to be importlib.__import__.

If I keep the implementation pure Python it would be usable with
Jython, PyPy, and the rest.  Also, my Python-fu is much stronger than
my C.  Finally, my understanding is that performance is the only gain
for a C version, which does not seem to matter much for imports (hence
importlib).

Keep in mind that I don't have a vested interest in PEP 382, just in
import features.  The PEP and ensuing discussion seem clear enough
that that should not get in the way.  However, if the actual use cases
dictate a different approach I'd be glad to reassess.

-eric



From pje at telecommunity.com  Thu Jul  7 20:43:18 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Thu, 07 Jul 2011 14:43:18 -0400
Subject: [Import-SIG] PEP 382 update? and implementation feedback
In-Reply-To: <4E157EDA.6000606@trueblade.com>
References: <CALFfu7AnVJ3S1THyXGQVWA2+Rr4HJ752PyjA6GkRoDnwYDAxHg@mail.gmail.com>
	<4E157EDA.6000606@trueblade.com>
Message-ID: <20110707184337.BEEF13A4108@sparrow.telecommunity.com>

At 05:39 AM 7/7/2011 -0400, Eric Smith wrote:
>We (python-dev) can't release a new version of 2.x. That said, I'd love
>it if I could compile a version of 2.5 for my own uses that had this
>feature, or if it could be done as an import hook.

FYI, I have a draft import hook for 2.x that complies with the spec I proposed:

    http://pastebin.com/uFQ9iwXQ

In fact, the spec proposal is a retrofit based on my import hook 
being (AFAICT) the Simplest Thing That Could Possibly Work for a 2.x 
implementation.

That code hasn't actually been tested yet; I was starting to port the 
PEP 382 branch's test suite when I noticed the discrepancy between 
what I was doing and what the tests were looking for.

That's why I ended up proposing a change, as the tests check for 
something that seems like another unneeded feature (i.e., the ability 
to sandwich undeclared namespace directories between declared ones).

It probably would be a good idea to revise the PEP itself, assuming 
Martin is amenable.  One thing I'd also like to clean up, for 
example, is the idea that there's a '*' in __path__ lists.  If we are 
no longer using '*' in .pth files to denote namespaces, then the '*' 
in __path__ is kind of pointless.  So, sys.namespace_packages should 
be the sole arbiter of what constitutes a namespace package.  (It 
should also be clarified that sys.namespace_packages may name 
packages which are not as yet imported, although the implied 
semantics are undefined.)

Anyway...  still looking for some feedback here.  I'd like to know if 
there's general support before taking the time to revise the tests, 
draft an updated spec, etc.


From pje at telecommunity.com  Fri Jul  8 21:51:39 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 08 Jul 2011 15:51:39 -0400
Subject: [Import-SIG] New draft revision for PEP 382
Message-ID: <20110708195157.335043A404D@sparrow.telecommunity.com>

The following is my attempt at an updated draft of PEP 382, based on 
the recently-discussed changes.

To address the questions and criticisms raised on Python-Dev when the 
PEP was introduced, I added an extended "Motivation" section that 
explains issues with the current approaches, and states the case for 
the PEP in more detail, including info about why anyone should care 
about namespace packages in the first place.  ;-)

I've also added a "Rejected Alternatives" section to document the 
other proposed approaches and the rationale for rejecting them in 
favor of the current proposal.

In addition, I've specified in a bit more detail the necessary 
changes to e.g. the pkgutil module.  (At least one open issue 
remains, however, and that is the question of what, if anything, 
should happen to the existing extend_path() function.  A second 
possible open question regards the API of the path fixup functions I 
propose in pkgutil.)

Anyway, your questions and comments, please!  The draft follows below:


PEP: 382
Title: Namespace Package Declarations
Version: $Revision$
Last-Modified: $Date$
Author: Martin v. Löwis <martin at v.loewis.de>, PJ Eby
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 02-Apr-2009
Python-Version: 3.2
Post-History:

Abstract
========

This PEP proposes an enhancement to Python's import machinery to
replace existing uses of the standard library's
``pkgutil.extend_path()`` API, and similar third-party APIs such as
``pkg_resources.declare_namespace()``.

The proposed enhancement will improve the reliability of existing
namespace package implementations, while providing "One Obvious Way"
to produce and consume namespace packages.


Terminology
===========

Within this PEP, the following terms are used as follows:

Package
     Python packages as defined by Python's import statement.

Distribution
     A separately installable set of Python modules, as registered in
     the Python package index, and installed by distutils, setuptools,
     etc.

Vendor Package
     A group of files installed by an operating system's packaging
     mechanism (e.g. Debian or Redhat packages installed on Linux
     systems).

Portion
     A set of files in a single directory (possibly inside a zip file
     or other storage mechanism) that contribute modules or subpackages
     to a namespace package.  The contents of each portion ``sys.path``

Namespace Package
     A package whose subpackages and modules can be split into portions
     that can be distributed or installed separately (via separate
     distributions and/or vendor packages), in shared or separate
     installation locations.

     Unlike a regular package, however, which only allows submodule
     and subpackage imports from a single location, a namespace
     package's ``__path__`` is configured so that submodules and
     subpackages can be imported from each of its installed portions,
     regardless of their relative positions in ``sys.path``.


Motivation
==========

.. epigraph::

     "Most packages are like modules.  Their contents are highly
     interdependent and can't be pulled apart.  [However,] some
     packages exist to provide a separate namespace. ...  It should
     be possible to distribute sub-packages or submodules of these
     [namespace packages] independently."

     -- Jim Fulton, shortly before the release of Python 2.3 [1]_


The Current Approach
--------------------

First introduced in Python 2.3, namespace packages are a mechanism
for splitting a single Python package across multiple directories
on disk.  This splitting has two main benefits:

1. It allows different parts of a large package or framework to be
    distributed and installed independently.  For example, installing
    the ``zope.interface`` package without having to install every
    package in the ``zope.*`` namespace.

    (This is somewhat similar to the way Perl's package system allows
    authors to separately distribute subpackages of ``File::`` or
    ``Email::``.)

2. As a side-effect of benefit 1, it reduces package naming collisions
    across multiple authors or organizations, by encouraging them to
    use distinguishing prefixes.  Instead of, say, Zope and Twisted both
    offering a top-level ``interface`` package (in which case, both
    could not be installed to the same directory), they can use
    ``zope.interface`` and ``twisted.interface``, while still being
    able to distribute these subpackages separately from other ``zope``
    or ``twisted`` subpackages.

    (This is somewhat similar to the way Java uses names like
    ``org.apache.foobar`` or ``com.sun.thingy`` to prevent collisions,
    only flatter.)

In current Python versions, however, a registration function (such as
``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``)
must be explicitly invoked in order to set up the package's
``__path__``.
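
For illustration, the registration call that each portion's ``__init__``
module must currently carry can be sketched as follows.  This is a minimal
demonstration, assuming two throwaway directories that stand in for
separately installed distributions; the package name ``demo_ns`` and the
helper ``make_portion`` are hypothetical and exist only for this example.
``pkgutil.extend_path()`` is the standard library call named above.

```python
import importlib
import os
import sys
import tempfile

# The boilerplate that every portion has to ship in its __init__.py.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, package, module):
    """Create root/package/ with the shared __init__.py and one module."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write("NAME = %r\n" % module)


# Two separate install locations, each holding one portion of demo_ns.
base = tempfile.mkdtemp()
roots = [os.path.join(base, "dist_a"), os.path.join(base, "dist_b")]
make_portion(roots[0], "demo_ns", "alpha")
make_portion(roots[1], "demo_ns", "beta")
sys.path[:0] = roots
importlib.invalidate_caches()

# Both submodules import, because extend_path() added the second
# directory to demo_ns.__path__ when the first __init__.py ran.
import demo_ns.alpha
import demo_ns.beta
```

Note that both portions ship an identical ``__init__.py``; only the copy
found first on ``sys.path`` is ever executed.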

There are two problems with this approach.


Problems With The Current Approach
----------------------------------

The first (and lesser) problem is that there is no One Obvious Way to
either declare that a package is a "namespace" or "module" package,
or to tell which kind of package a given directory on disk is.

Instead, you must choose one of the various APIs to use, each of
which is slightly incompatible with the others.  (For example,
``pkgutil`` supports ``*.pkg`` files; setuptools doesn't.  Likewise,
setuptools supports package portions living in zip files, and adding
new path components to already-imported namespaces, whereas
``pkgutil`` doesn't.)

Similarly, to tell whether a given directory is a "namespace" or
"module" package, you must read its documentation or inspect its code
in detail, and be able to recognize the various API calls mentioned
above.

The second -- and much larger -- issue is that whichever API is used
to declare the namespace, the declaration has to be invoked from a
namespace package's ``__init__`` module in order to work.  (Otherwise,
only the first part of the package found on ``sys.path`` would be
importable.)

This clashes with the goal of separately installing portions of a
namespace, because then each distributed piece must include a copy
of the same ``__init__.py``.  (Otherwise, each piece would not be
importable on its own, as Python currently requires the existence
of an ``__init__`` module in order to import the package at all, let
alone set up the namespace!)

In addition to the developer inconvenience of creating, synchronizing,
and distributing these duplicated ``__init__`` modules, there is a
further problem created for operating system vendors.

Vendor packages typically must not provide overlapping files, and an
attempt to install a vendor package that has a file already on disk
will fail or cause unpredictable behavior.  As vendors might choose to
package distributions such that they will end up all in a single
directory for the namespace package, all portions would contribute
conflicting ``__init__.py`` files.

This issue has led to various fragile and complex workarounds in
practice, such as ``.pth`` file abuse by setuptools, and the shipping
of broken partial packages with distutils.

With the enhancement proposed here, however, all of the above problems
can be readily resolved.


Specification
=============

Instead of an API call buried inside a series of duplicated and
potentially-clashing ``__init__`` modules (which mostly exist only
to make the package importable and declare its namespace-ness), this
PEP proposes that Python's import machinery be modified to include
direct support for namespace packages.

This support would work by adding a new way to designate a directory
as containing a namespace package portion: by including one or more
``*.ns`` files in it.

This approach removes the need for an ``__init__`` module to be
duplicated across namespace package portions.  Instead, each portion
can simply include a uniquely-named ``*.ns`` file, thereby avoiding
filename clashes in vendor packages.

And, since the import machinery knows that these directories are
portions of a namespace package, it can automatically initialize
the package's ``__path__`` to include portions located on different
parts of ``sys.path``.  (Thus avoiding the need for special code
to be called in the ``__init__`` module.)

In addition to doing this path setup, the import machinery will also
add any imported namespace packages to ``sys.namespace_packages``
(initially an empty set), so that namespace packages can be identified
or iterated over.


PEP \302 Extension
------------------

The existing PEP 302 protocol is to be extended to handle namespace
package portion directories, by adding a new importer method,
``namespace_subpath(fullname)``.  An implementation of this method
will be added to all applicable importer classes distributed with
Python (including those in ``pkgutil`` and ``zipimport``).

(Note: any other importer wishing to support namespace packages must
provide its own implementation of this method as well.  If an importer
does not have a ``namespace_subpath()`` method, it will be treated as
if it *did* have the method, but it returned ``None`` when called.)

This new method is called just before the importer's ``find_module()``
is normally invoked.  If the importer determines that `fullname` is
a namespace package portion under its jurisdiction, then the importer
returns an importer-specific path to that namespace portion.

For example, if a standard filesystem path importer for the path
``/usr/lib/site-packages`` is about to be asked to import ``zope``,
and there is a ``/usr/lib/site-packages/zope`` directory containing
any files ending with ``.ns``, a call to ``namespace_subpath("zope")``
on that importer should return ``"/usr/lib/site-packages/zope"``.

However, if there is no such subdirectory, or it does *not* contain
any files whose names end with ``.ns``, that importer would return
``None`` instead.
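
As a rough sketch of the filesystem case just described (not the actual
implementation), the check might look like the following.  The class name
``DirectoryImporterSketch`` and the demo directory layout are assumptions
made for this example; only the method name ``namespace_subpath`` and the
``.ns`` suffix come from the text above.

```python
import os
import tempfile


class DirectoryImporterSketch:
    """Hypothetical path importer covering only namespace_subpath()."""

    def __init__(self, path_entry):
        self.path_entry = path_entry

    def namespace_subpath(self, fullname):
        # Only the last dotted component names the directory to look for.
        subdir = os.path.join(self.path_entry, fullname.rpartition(".")[2])
        if not os.path.isdir(subdir):
            return None
        for filename in os.listdir(subdir):
            if filename.endswith(".ns"):
                return subdir
        return None


# Demo layout: "zope" holds a .ns file, "plain" is an ordinary package.
site = tempfile.mkdtemp()
os.makedirs(os.path.join(site, "zope"))
open(os.path.join(site, "zope", "zope.interface.ns"), "w").close()
os.makedirs(os.path.join(site, "plain"))
open(os.path.join(site, "plain", "__init__.py"), "w").close()

importer = DirectoryImporterSketch(site)
zope_portion = importer.namespace_subpath("zope")    # the zope directory
plain_portion = importer.namespace_subpath("plain")  # None: no .ns file
missing_portion = importer.namespace_subpath("nope")  # None: no directory
```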

The Python import machinery will call this method on each importer
corresponding to a path entry in ``sys.path`` (for top-level imports)
or in a parent package ``__path__`` (for subpackage imports).

If a normal package or module is found before a namespace package,
importing proceeds according to the normal PEP 302 protocol.  (That
is, a loader object is simply asked to load the located module or
package.)

However, if a namespace package portion is found (i.e., an importer's
``namespace_subpath()`` returns a string), then the normal import
search stops, and a namespace package is created instead.

The import machinery continues iterating over importers and calling
``namespace_subpath()`` on them, but it does **not** continue calling
``find_module()`` on them.  Instead, it accumulates any strings
returned by the subpath calls, in order to assemble a ``__path__``
for the package being imported.

(Note that this implies that any non-namespace packages with the same
name are skipped, and not included in the resulting package's
``__path__``.  In other words, a namespace package's initial
``__path__`` only includes namespace portions, never non-namespace
package directories.)

Once this ``__path__`` has been assembled, a module is created, and
its ``__path__`` attribute is set.  The package's name is then added
to ``sys.namespace_packages`` -- a set of package names.

Finally, the ``__init__`` module code for the package (if it exists)
is located and executed in the new module's namespace.

Each importer that returns a ``namespace_subpath()`` for the package
is asked to perform a standard ``find_module()`` for the package.
Since by the normal import rules, a directory containing an
``__init__`` module is a package, this call should succeed if the
namespace package portion contains an ``__init__`` module, and the
importing can proceed normally from that point.
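
The search described above can be sketched roughly as follows.  This is
an illustration under stated assumptions, not the proposed
implementation: ``StubImporter``, ``LegacyImporter`` and ``locate`` are
hypothetical names, the importers are in-memory stand-ins, and the final
step of locating and executing an ``__init__`` module is left out.

```python
class StubImporter:
    """In-memory stand-in for a sys.path importer (hypothetical)."""

    def __init__(self, portions=None, modules=()):
        self.portions = dict(portions or {})
        self.modules = set(modules)

    def namespace_subpath(self, fullname):
        return self.portions.get(fullname)

    def find_module(self, fullname):
        return self if fullname in self.modules else None


class LegacyImporter:
    """An importer that predates namespace_subpath() (hypothetical)."""

    def __init__(self, modules=()):
        self.modules = set(modules)

    def find_module(self, fullname):
        return self if fullname in self.modules else None


def locate(importers, fullname):
    """Return ("module", loader), ("namespace", path_list), or None."""
    portions = []
    for importer in importers:
        # A missing method is treated as one that returned None.
        method = getattr(importer, "namespace_subpath", None)
        subpath = method(fullname) if method is not None else None
        if subpath is not None:
            portions.append(subpath)
        elif not portions:
            # No portion seen yet, so the normal search still applies.
            loader = importer.find_module(fullname)
            if loader is not None:
                return ("module", loader)
        # Once a portion has been seen, find_module() is no longer
        # called, so same-named regular packages are skipped.
    if portions:
        return ("namespace", portions)
    return None


importers = [
    LegacyImporter(modules={"other"}),
    StubImporter(portions={"zope": "/a/zope"}),
    StubImporter(modules={"zope"}),  # regular package, skipped
    StubImporter(portions={"zope": "/b/zope"}),
]
result = locate(importers, "zope")
```

Here ``result`` describes a namespace package whose ``__path__`` holds
the two portions, in ``sys.path`` order.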

There is one caveat, however.  The importers currently distributed
with Python expect that *they* will be the ones to initialize the
``__path__`` attribute, which means that they must be changed to
either recognize that ``__path__`` has already been set and not
change it, or to handle namespace packages specially (e.g., via an
internal flag or checking ``sys.namespace_packages``).

Similarly, any third-party importers wishing to support namespace
packages must make similar changes.

(NOTE: in general, it goes against the design of PEP 302 for a loader
object to assume that it is always creating the module object or that
the module it is operating on is empty.  Making this assumption can
result in code that breaks the normal operation of the ``reload()``
builtin and any specialized tools that rely on it, such as lazy
importers, automatic reloaders, and so on.)


Standard Library Changes/Additions
----------------------------------

The ``pkgutil`` module should be updated to handle this
specification appropriately, including any necessary changes to
``extend_path()``, ``iter_modules()``, etc.  A new generic API for
calling ``namespace_subpath()`` on importers should be added as well.

Specifically the proposed changes and additions are:

* A new ``namespace_subpath(importer, fullname)`` generic, allowing
   implementations to be registered for existing importers.

* A new ``extend_namespaces(path_entry)`` function, to extend existing
   and already-imported namespace packages' ``__path__`` attributes to
   include any portions found in a new ``sys.path`` entry.  This
   function should be called by applications extending ``sys.path``
   at runtime, e.g. to include a plugin directory or add an egg to the
   path.

   The implementation of this function does a simple breadth-first walk
   of ``sys.namespace_packages``, and performs any necessary
   ``namespace_subpath()`` calls to identify what path entries need to
   be added to each package's ``__path__``, given that `path_entry`
   has been added to ``sys.path``.

* A new ``iter_namespaces(parent='')`` function to allow breadth-first
   traversal of namespaces in ``sys.namespace_packages``, by yielding
   the child namespace packages of `parent`.  For example, calling
   ``iter_namespaces("zope")`` might yield ``zope.app`` and
   ``zope.products`` (if they are namespace packages registered in
   ``sys.namespace_packages``), but **not** ``zope.foo.bar``.
   This function is needed to implement ``extend_namespaces()``, but
   is potentially useful to others.

* ``ImpImporter.iter_modules()`` should be changed to also detect and
   yield the names of namespace package portions.

In addition to the above changes, the ``zipimport`` importer should
have its ``iter_modules()`` implementation similarly changed.  (Note:
current versions of Python implement this via a shim in ``pkgutil``,
so technically this is also a change to ``pkgutil``.)
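
The two traversal helpers proposed above might be sketched along these
lines.  This is only an illustration: the signatures differ from the
proposed ones because ``sys.namespace_packages`` does not exist, so a
plain ``registry`` set and a ``modules`` dict are passed in as stand-ins
for it and for ``sys.modules``, and ``namespace_subpath`` here is a
simplified filesystem-only function rather than an importer method.

```python
import os
import tempfile
import types
from collections import deque


def namespace_subpath(path_entry, fullname):
    """Filesystem stand-in for the proposed importer method."""
    subdir = os.path.join(path_entry, fullname.rpartition(".")[2])
    if os.path.isdir(subdir) and any(
        name.endswith(".ns") for name in os.listdir(subdir)
    ):
        return subdir
    return None


def iter_namespaces(registry, parent=""):
    """Yield the direct child namespace packages of `parent`."""
    prefix = parent + "." if parent else ""
    for name in sorted(registry):
        rest = name[len(prefix):]
        if name.startswith(prefix) and rest and "." not in rest:
            yield name


def extend_namespaces(path_entry, registry, modules):
    """Breadth-first: add portions found under a new sys.path entry."""
    queue = deque([("", [path_entry])])
    while queue:
        parent, new_entries = queue.popleft()
        for name in iter_namespaces(registry, parent):
            module = modules.get(name)
            if module is None:
                continue  # registered but not yet imported
            added = []
            for entry in new_entries:
                portion = namespace_subpath(entry, name)
                if portion is not None and portion not in module.__path__:
                    module.__path__.append(portion)
                    added.append(portion)
            if added:
                # Children are searched under the newly added portions.
                queue.append((name, added))


# Demo: a plugin directory contributing to zope and zope.app.
plugins = os.path.join(tempfile.mkdtemp(), "plugins")
os.makedirs(os.path.join(plugins, "zope", "app"))
open(os.path.join(plugins, "zope", "plugin.ns"), "w").close()
open(os.path.join(plugins, "zope", "app", "plugin.ns"), "w").close()

registry = {"zope", "zope.app", "zope.foo.bar"}
modules = {}
for pkg_name in ("zope", "zope.app"):
    pkg = types.ModuleType(pkg_name)
    pkg.__path__ = []
    modules[pkg_name] = pkg

extend_namespaces(plugins, registry, modules)
```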


Implementation Notes
--------------------

For users, developers, and distributors of namespace packages:

* ``sys.namespace_packages`` is allowed to contain non-existent or
   not-yet-imported package names; code that uses its contents should
   not assume that every name in this set is also present in
   ``sys.modules`` or that importing the name will necessarily succeed.

* ``*.ns`` files must be empty or contain only ASCII whitespace
   characters.  This leaves open the possibility for future extension
   to the format.

* Files contained within a namespace package portion directory must
   be *unique* to that portion, so that the portion can be distributed
   as a vendor package without any filename overlap.  This applies to
   modules and data files as well as ``*.ns`` files.

   (For ``*.ns`` files themselves, uniqueness can be achieved simply by
   giving them a name based on the distribution that contains the file,
   and it is recommended that packaging tools support doing this
   automatically.)

* Although this PEP supports the use of non-empty ``__init__`` modules
   in namespace packages, their usage is controversial.  If more than
   one package portion contains an ``__init__`` module, at most one of
   them will be executed, possibly leading to silent errors.

   Therefore, if you must include an ``__init__`` module in your
   namespace package, make sure that it is provided by exactly **one**
   distribution, and that all other distributions using that module's
   contents are defined so as to have an installation dependency on
   the distribution containing the ``__init__`` module.  Otherwise,
   it may not be present in some installations.

   (Note: for historical reasons, existing namespace packages nearly
   always include ``__init__`` modules, but they are usually empty
   except for code to declare the package a namespace.  Under this
   proposal, these nearly-empty modules could and should be replaced
   by an empty ``*.ns`` file in the package directory.)

For those implementing PEP 302 importer objects:

* Importers that support the ``iter_modules()`` method and want to add
   namespace support should modify their ``iter_modules()``
   method so that it discovers and lists namespace packages as well as
   standard modules and packages.

* For implementation efficiency, an importer is allowed to cache
   information (such as whether a directory exists and whether an
   ``__init__`` module is present in it) between the invocation of a
   ``namespace_subpath()`` call and a subsequent ``find_module()`` call
   for the same name.

   It should, however, avoid retaining such cached information for any
   longer than the next method call, and it should also verify that the
   request is in fact for the same module/package name, as it is not
   guaranteed that a ``namespace_subpath()`` call will always be
   followed by a matching ``find_module()`` call.  (After all, an
   ``__init__`` module may already have been supplied by an earlier
   importer on the path.)

* "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
   not need to implement ``namespace_subpath()``, because the method
   is only called on importers corresponding to ``sys.path`` entries.
   If a meta importer wishes to support namespace packages, it must
   do so entirely within its ``find_module()`` implementation.

   Unfortunately, it is unlikely that any such implementation will be
   able to merge its namespace portions with those of other meta
   importers or ``sys.path`` importers, so the meaning of "supporting
   namespace packages" for a meta importer is currently undefined.

   However, since the intended use case for meta importers is to
   replace Python's normal import process entirely for some subset of
   modules, and the number of such importers currently implemented is
   quite small, this seems unlikely to be a big issue in practice.


Rejected Alternatives
=====================

* The original version of this PEP used ``.pkg`` or ``.pth`` files
   that contained either explicit directories to be added to a
   package's ``__path__``, or ``*`` to indicate that a package was
   a namespace.

   But this approach required a more complex change to the importer
   protocol, the files had to actually be opened and read, and there
   were no concrete use cases proposed for the additional flexibility
   of specifying explicit paths.

* On Python-Dev, M.A. Lemburg proposed [2]_ that instead of using
   extra files, namespace packages use a ``__pkg__.py`` file to
   indicate their namespace-ness, in addition to a (required)
   ``__init__.py``.

   Unfortunately, this approach solves only one of the `problems with
   the current approach`_: i.e., having a standard way of declaring and
   identifying namespace packages.  It does not address the necessity
   of distributing duplicated files, or filename overlap between
   distributions.  Further, it does not allow truly-independent
   namespace portions to exist, since it requires a "defining" portion
   (the portion containing the single ``__init__`` module) to exist.

* Another approach considered during revisions to this PEP was to
   simply rename package directories to add a suffix like ``.ns``
   or ``-ns``, to indicate their namespaced nature.  This would effect
   a small performance improvement for the initial import of a
   namespace package, avoid the need to create empty ``*.ns`` files,
   and even make it clearer that the directory involved is a namespace
   portion.

   The downsides, however, are also plentiful.  If a package starts
   its life as a normal package, it must be renamed when it becomes
   a namespace, with the implied consequences for revision control
   tools.

   Further, there is an immense body of existing code (including the
   distutils and many other packaging tools) that expects a package
   directory's name to be the same as the package name.  And porting
   existing Python 2.x namespace packages to Python 3 would require
   widespread directory renaming as well.

   In short, this approach would require a vastly larger number of
   changes to both the standard library and third-party code, for
   a tiny potential performance improvement and a small increase in
   clarity.  It was therefore rejected on "practicality vs. purity"
   grounds.



References
==========

.. [1] "namespace" vs "module" packages (mailing list thread)
    (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html)

.. [2] "PEP \382: Namespace Packages" (mailing list thread)
    (http://mail.python.org/pipermail/python-dev/2009-April/088087.html)

Copyright
=========

This document has been placed in the public domain.


..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End:


From ericsnowcurrently at gmail.com  Fri Jul  8 23:52:49 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Fri, 8 Jul 2011 15:52:49 -0600
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <20110708195157.335043A404D@sparrow.telecommunity.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
Message-ID: <CALFfu7CFJqK+2cB+HbFZR5PyMfeesm1s-+_CcrqGKK9JQC7dHw@mail.gmail.com>

On Fri, Jul 8, 2011 at 1:51 PM, P.J. Eby <pje at telecommunity.com> wrote:
> The following is my attempt at an updated draft of PEP 382, based on the
> recently-discussed changes.
>
> To address the questions and criticisms raised on Python-Dev when the PEP was
> introduced, I added an extended "Motivation" section that explains issues
> with the current approaches, and states the case for the PEP in more detail,
> including info about why anyone should care about namespace packages in the
> first place.  ;-)
>
> I've also added a "Rejected Alternatives" section to document the other
> proposed approaches and the rationale for rejecting them in favor of the
> current proposal.
>
> In addition, I've specified in a bit more detail the necessary changes to
> e.g. the pkgutil module.  (At least one open issue remains, however, and
> that is the question of what, if anything, should happen to the existing
> extend_path() function.  A second possible open question regards the API of
> the path fixup functions I propose in pkgutil.)
>
> Anyway, your questions and comments, please!  The draft follows below:
>
>
> PEP: 382
> Title: Namespace Package Declarations
> Version: $Revision$
> Last-Modified: $Date$
> Author: Martin v. Löwis <martin at v.loewis.de>, PJ Eby
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 02-Apr-2009
> Python-Version: 3.2
> Post-History:
>
> Abstract
> ========
>
> This PEP proposes an enhancement to Python's import machinery to
> replace existing uses of the standard library's
> ``pkgutil.extend_path()`` API, and similar third-party APIs such as
> ``pkg_resources.declare_namespace()``.
>
> The proposed enhancement will improve the reliability of existing
> namespace package implementations, while providing "One Obvious Way"
> to produce and consume namespace packages.
>
>
> Terminology
> ===========
>
> Within this PEP, the following terms are used as follows:
>
> Package
>    Python packages as defined by Python's import statement.
>
> Distribution
>    A separately installable set of Python modules, as registered in
>    the Python package index, and installed by distutils, setuptools,
>    etc.
>
> Vendor Package
>    A group of files installed by an operating system's packaging
>    mechanism (e.g. Debian or Redhat packages installed on Linux
>    systems).
>
> Portion
>    A set of files in a single directory (possibly inside a zip file
>    or other storage mechanism) that contribute modules or subpackages
>    to a namespace package.  The contents of each portion ``sys.path``
>
> Namespace Package
>    A package whose subpackages and modules can be split into portions
>    that can be distributed or installed separately (via separate
>    distributions and/or vendor packages), in shared or separate
>    installation locations.
>
>    Unlike a regular package, however, which only allows submodule
>    and subpackage imports from a single location, a namespace
>    package's ``__path__`` is configured so that submodules and
>    subpackages can be imported from each of its installed portions,
>    regardless of their relative positions in ``sys.path``.
>
>
> Motivation
> ==========
>
> .. epigraph::
>
>    "Most packages are like modules.  Their contents are highly
>    interdependent and can't be pulled apart.  [However,] some
>    packages exist to provide a separate namespace. ...  It should
>    be possible to distribute sub-packages or submodules of these
>    [namespace packages] independently."
>
>    -- Jim Fulton, shortly before the release of Python 2.3 [1]_
>

This is a really helpful addition.

>
> The Current Approach
> --------------------
>
> First introduced in Python 2.3, namespace packages are a mechanism
> for splitting a single Python package across multiple directories
> on disk.  This splitting has two main benefits:
>
> 1. It allows different parts of a large package or framework to be
>   distributed and installed independently.  For example, installing
>   the ``zope.interface`` package without having to install every
>   package in the ``zope.*`` namespace.
>
>   (This is somewhat similar to the way Perl's package system allows
>   authors to separately distribute subpackages of ``File::`` or
>   ``Email::``.)
>
> 2. As a side-effect of benefit 1, it reduces package naming collisions
>   across multiple authors or organizations, by encouraging them to
>   use distinguishing prefixes.  Instead of say, Zope and Twisted both
>   offering a top-level ``interface`` package (in which case, both
>   could not be installed to the same directory), they can use
>   ``zope.interface`` and ``twisted.interface``, while still being
>   able to distribute these subpackages separately from other ``zope``
>   or ``twisted`` subpackages.
>
>   (This is somewhat similar to the way Java uses names like
>   ``org.apache.foobar`` or ``com.sun.thingy`` to prevent collisions,
>   only flatter.)
>
> In current Python versions, however, a registration function (such as
> ``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``)
> must be explicitly invoked in order to set up the package's
> ``__path__``.
>
> There are two problems with this approach, however.
>
>
> Problems With The Current Approach
> ----------------------------------
>
> The first (and lesser) problem is that there is no One Obvious Way to
> either declare that a package is a "namespace" or "module" package,
> or to tell which kind of package a given directory on disk is.
>
> Instead, you must choose one of the various APIs to use, each of
> which is slightly-incompatible with the others.  (For example,
> ``pkgutil`` supports ``*.pkg`` files; setuptools doesn't.  Likewise,
> setuptools supports package portions living in zip files, and adding
> new path components to already-imported namespaces, whereas
> ``pkgutil`` doesn't.)
>
> Similarly, to tell whether a given directory is a "namespace" or
> "module" package, you must read its documentation or inspect its code
> in detail, and be able to recognize the various API calls mentioned
> above.
>
> The second -- and much larger -- issue is that whichever API is used
> to declare the namespace, the declaration has to be invoked from a
> namespace package's ``__init__`` module in order to work.  (Otherwise,
> only the first part of the package found on ``sys.path`` would be
> importable.)
>
> This clashes with the goal of separately installing portions of a
> namespace, because then each distributed piece must include a copy
> of the same ``__init__.py``.  (Otherwise, each piece would not be
> importable on its own, as Python currently requires the existence
> of an ``__init__`` module in order to import the package at all, let
> alone set up the namespace!)
>
> In addition to the developer inconvenience of creating, synchronizing,
> and distributing these duplicated ``__init__`` modules, there is a
> further problem created for operating system vendors.
>
> Vendor packages typically must not provide overlapping files, and an
> attempt to install a vendor package that has a file already on disk
> will fail or cause unpredictable behavior.  As vendors might choose to
> package distributions such that they will end up all in a single
> directory for the namespace package, all portions would contribute
> conflicting ``__init__.py`` files.
>
> This issue has led to various fragile and complex workarounds in
> practice, such as ``.pth`` file abuse by setuptools, and the shipping
> of broken partial packages with distutils.
>
> With the enhancement proposed here, however, all of the above problems
> can be readily resolved.
>
>
> Specification
> =============
>
> Instead of an API call buried inside a series of duplicated and
> potentially-clashing ``__init__`` modules (which mostly exist only
> to make the package importable and declare its namespace-ness), this
> PEP proposes that Python's import machinery be modified to include
> direct support for namespace packages.
>
> This support would work by adding a new way to designate a directory
> as containing a namespace package portion: by including one or more
> ``*.ns`` files in it.
>
> This approach removes the need for an ``__init__`` module to be
> duplicated across namespace package portions.  Instead, each portion
> can simply include a uniquely-named ``*.ns`` file, thereby avoiding
> filename clashes in vendor packages.
>
> And, since the import machinery knows that these directories are
> portions of a namespace package, it can automatically initialize
> the package's ``__path__`` to include portions located on different
> parts of ``sys.path``.  (Thus avoiding the need for special code
> to be called in the ``__init__`` module.)
>
> In addition to doing this path setup, the import machinery will also
> add any imported namespace packages to ``sys.namespace_packages``
> (initially an empty set), so that namespace packages can be identified
> or iterated over.
>
>
> PEP \302 Extension
> ------------------
>
> The existing PEP 302 protocol is to be extended to handle namespace
> package portion directories, by adding a new importer method,
> ``namespace_subpath(fullname)``.  An implementation of this method
> will be added to all applicable importer classes distributed with
> Python, including those in ``pkgutil`` and ``zipimport``.
>
> (Note: any other importer wishing to support namespace packages must
> provide its own implementation of this method as well.  If an importer
> does not have a ``namespace_subpath()`` method, it will be treated as
> if it *did* have the method, but it returned ``None`` when called.)
>
> This new method is called just before the importer's ``find_module()``
> is normally invoked.  If the importer determines that `fullname` is
> a namespace package portion under its jurisdiction, then the importer
> returns an importer-specific path to that namespace portion.
>
> For example, if a standard filesystem path importer for the path
> ``/usr/lib/site-packages`` is about to be asked to import ``zope``,
> and there is a ``/usr/lib/site-packages/zope`` directory containing
> any files ending with ``.ns``, a call to ``namespace_subpath("zope")``
> on that importer should return ``"/usr/lib/site-packages/zope"``.
>
> However, if there is no such subdirectory, or it does *not* contain
> any files whose names end with ``.ns``, that importer would return
> ``None`` instead.
>
> The Python import machinery will call this method on each importer
> corresponding to a path entry in ``sys.path`` (for top-level imports)
> or in a parent package ``__path__`` (for subpackage imports).
>
> If a normal package or module is found before a namespace package,
> importing proceeds according to the normal PEP 302 protocol.  (That
> is, a loader object is simply asked to load the located module or
> package.)
>
> However, if a namespace package portion is found (i.e., an importer's
> ``namespace_subpath()`` returns a string), then the normal import
> search stops, and a namespace package is created instead.
>
> The import machinery continues iterating over importers and calling
> ``namespace_subpath()`` on them, but it does **not** continue calling
> ``find_module()`` on them.  Instead, it accumulates any strings
> returned by the subpath calls, in order to assemble a ``__path__``
> for the package being imported.
>
> (Note that this implies that any non-namespace packages with the same
> name are skipped, and not included in the resulting package's
> ``__path__``.  In other words, a namespace package's initial
> ``__path__`` only includes namespace portions, never non-namespace
> package directories.)
>
> Once this ``__path__`` has been assembled, a module is created, and
> its ``__path__`` attribute is set.  The package's name is then added
> to ``sys.namespace_packages`` -- a set of package names.
>
> Finally, the ``__init__`` module code for the package (if it exists)
> is located and executed in the new module's namespace.
>
> Each importer that returns a ``namespace_subpath()`` for the package
> is asked to perform a standard ``find_module()`` for the package.
> Since by the normal import rules, a directory containing an
> ``__init__`` module is a package, this call should succeed if the
> namespace package portion contains an ``__init__`` module, and the
> importing can proceed normally from that point.
>
> There is one caveat, however.  The importers currently distributed
> with Python expect that *they* will be the ones to initialize the
> ``__path__`` attribute, which means that they must be changed to
> either recognize that ``__path__`` has already been set and not
> change it, or to handle namespace packages specially (e.g., via an
> internal flag or checking ``sys.namespace_packages``).
>
> Similarly, any third-party importers wishing to support namespace
> packages must make similar changes.
>
> (NOTE: in general, it goes against the design of PEP 302 for a loader
> object to assume that it is always creating the module object or that
> the module it is operating on is empty.  Making this assumption can
> result in code that breaks the normal operation of the ``reload()``
> builtin and any specialized tools that rely on it, such as lazy
> importers, automatic reloaders, and so on.)
>
>
> Standard Library Changes/Additions
> ----------------------------------
>
> The ``pkgutil`` module should be updated to handle this
> specification appropriately, including any necessary changes to
> ``extend_path()``, ``iter_modules()``, etc.  A new generic API for
> calling ``namespace_subpath()`` on importers should be added as well.
>
> Specifically the proposed changes and additions are:
>
> * A new ``namespace_subpath(importer, fullname)`` generic, allowing
>  implementations to be registered for existing importers.
>
> * A new ``extend_namespaces(path_entry)`` function, to extend existing
>  and already-imported namespace packages' ``__path__`` attributes to
>  include any portions found in a new ``sys.path`` entry.  This
>  function should be called by applications extending ``sys.path``
>  at runtime, e.g. to include a plugin directory or add an egg to the
>  path.
>
>  The implementation of this function does a simple breadth-first walk
>  of ``sys.namespace_packages``, and performs any necessary
>  ``namespace_subpath()`` calls to identify what path entries need to
>  be added to each package's ``__path__``, given that `path_entry`
>  has been added to ``sys.path``.
>
> * A new ``iter_namespaces(parent='')`` function to allow breadth-first
>  traversal of namespaces in ``sys.namespace_packages``, by yielding
>  the child namespace packages of `parent`.  For example, calling
>  ``iter_namespaces("zope")`` might yield ``zope.app`` and
>  ``zope.products`` (if they are namespace packages registered in
>  ``sys.namespace_packages``), but **not** ``zope.foo.bar``.
>  This function is needed to implement ``extend_namespaces()``, but
>  is potentially useful to others.
>
> * ``ImpImporter.iter_modules()`` should be changed to also detect and
>  yield the names of namespace package portions.
>
> In addition to the above changes, the ``zipimport`` importer should
> have its ``iter_modules()`` implementation similarly changed.  (Note:
> current versions of Python implement this via a shim in ``pkgutil``,
> so technically this is also a change to ``pkgutil``.)
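
To check my own reading of ``iter_namespaces()``, here is a rough sketch of
the traversal as I understand it.  It is not PJE's prototype.  Since
``sys.namespace_packages`` does not exist in any released Python, the set is
passed in explicitly; the ``walk_namespaces()`` helper, the sorted ordering
and the sample package names are assumptions of mine for illustration.

```python
def iter_namespaces(namespace_packages, parent=""):
    """Yield the direct child namespace packages of `parent`.

    The draft's function would read ``sys.namespace_packages``; here the set
    of names is an explicit argument (an assumption for this sketch).
    """
    # Sorting is only for a deterministic order; the draft does not ask for it.
    for name in sorted(namespace_packages):
        if name.rpartition(".")[0] == parent:
            yield name


def walk_namespaces(namespace_packages):
    """Breadth-first walk starting from the top-level namespace packages.

    Hypothetical helper: roughly the loop extend_namespaces() would need.
    """
    queue = list(iter_namespaces(namespace_packages))
    while queue:
        name = queue.pop(0)
        yield name
        queue.extend(iter_namespaces(namespace_packages, name))


# Made-up registry contents, mirroring the example in the draft.
registered = {"zope", "zope.app", "zope.products", "zope.foo.bar", "peak"}
children = list(iter_namespaces(registered, "zope"))
order = list(walk_namespaces(registered))
```

With this reading, ``zope.foo.bar`` is never reached by the walk unless
``zope.foo`` is itself registered as a namespace package.
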
>
>
> Implementation Notes
> --------------------
>
> For users, developers, and distributors of namespace packages:
>
> * ``sys.namespace_packages`` is allowed to contain non-existent or
>  not-yet-imported package names; code that uses its contents should
>  not assume that every name in this set is also present in
>  ``sys.modules`` or that importing the name will necessarily succeed.
>
> * ``*.ns`` files must be empty or contain only ASCII whitespace
>  characters.  This leaves open the possibility for future extension
>  to the format.
>
> * Files contained within a namespace package portion directory must
>  be *unique* to that portion, so that the portion can be distributed
>  as a vendor package without any filename overlap.  This applies to
>  modules and data files as well as ``*.ns`` files.
>
>  (For ``*.ns`` files themselves, uniqueness can be achieved simply by
>  giving them a name based on the distribution that contains the file,
>  and it is recommended that packaging tools support doing this
>  automatically.)
>
> * Although this PEP supports the use of non-empty ``__init__`` modules
>  in namespace packages, their usage is controversial.  If more than
>  one package portion contains an ``__init__`` module, at most one of
>  them will be executed, possibly leading to silent errors.
>
>  Therefore, if you must include an ``__init__`` module in your
>  namespace package, make sure that it is provided by exactly **one**
>  distribution, and that all other distributions using that module's
>  contents are defined so as to have an installation dependency on
>  the distribution containing the ``__init__`` module.  Otherwise,
>  it may not be present in some installations.
>
>  (Note: for historical reasons, existing namespace packages nearly
>  always include ``__init__`` modules, but they are usually empty
>  except for code to declare the package a namespace.  Under this
>  proposal, these nearly-empty modules could and should be replaced
>  by an empty ``*.ns`` file in the package directory.)
>
> For those implementing PEP 302 importer objects:
>
> * Importers that support the ``iter_modules()`` method and want to add
>  namespace support should modify their ``iter_modules()``
>  method so that it discovers and lists namespace packages as well as
>  standard modules and packages.
>
> * For implementation efficiency, an importer is allowed to cache
>  information (such as whether a directory exists and whether an
>  ``__init__`` module is present in it) between the invocation of a
>  ``namespace_subpath()`` call and a subsequent ``find_module()`` call
>  for the same name.
>
>  It should, however, avoid retaining such cached information for any
>  longer than the next method call, and it should also verify that the
>  request is in fact for the same module/package name, as it is not
>  guaranteed that a ``namespace_subpath()`` call will always be
>  followed by a matching ``find_module()`` call.  (After all, an
>  ``__init__`` module may already have been supplied by an earlier
>  importer on the path.)
>
> * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
>  not need to implement ``namespace_subpath()``, because the method
>  is only called on importers corresponding to ``sys.path`` entries.
>  If a meta importer wishes to support namespace packages, it must
>  do so entirely within its ``find_module()`` implementation.
>
>  Unfortunately, it is unlikely that any such implementation will be
>  able to merge its namespace portions with those of other meta
>  importers or ``sys.path`` importers, so the meaning of "supporting
>  namespace packages" for a meta importer is currently undefined.
>
>  However, since the intended use case for meta importers is to
>  replace Python's normal import process entirely for some subset of
>  modules, and the number of such importers currently implemented is
>  quite small, this seems unlikely to be a big issue in practice.
>
>
> Rejected Alternatives
> =====================
>
> * The original version of this PEP used ``.pkg`` or ``.pth`` files
>  that contained either explicit directories to be added to a
>  package's ``__path__``, or ``*`` to indicate that a package was
>  a namespace.
>
>  But this approach required a more complex change to the importer
>  protocol, the files had to actually be opened and read, and there
>  were no concrete use cases proposed for the additional flexibility
>  specifying explicit paths.
>
> * On Python-Dev, M.A. Lemburg proposed [2]_ that instead of using
>  extra files, namespace packages use a ``__pkg__.py`` file to
>  indicate their namespace-ness, in addition to a (required)
>  ``__init__.py``.
>
>  Unfortunately, this approach solves only one of the `problems with
>  the current approach`_: i.e., having a standard way of declaring and
>  identifying namespace packages.  It does not address the necessity
>  of distributing duplicated files, or filename overlap between
>  distributions.  Further, it does not allow truly-independent
>  namespace portions to exist, since it requires a "defining" portion
>  (the portion containing the single ``__init__`` module) to exist.
>
> * Another approach considered during revisions to this PEP was to
>  simply rename package directories to add a suffix like ``.ns``
>  or ``-ns``, to indicate their namespaced nature.  This would effect
>  a small performance improvement for the initial import of a
>  namespace package, avoid the need to create empty ``*.ns`` files,
>  and even make it clearer that the directory involved is a namespace
>  portion.
>
>  The downsides, however, are also plentiful.  If a package starts
>  its life as a normal package, it must be renamed when it becomes
>  a namespace, with the implied consequences for revision control
>  tools.
>
>  Further, there is an immense body of existing code (including the
>  distutils and many other packaging tools) that expect a package
>  directory's name to be the same as the package name.  And porting
>  existing Python 2.x namespace packages to Python 3 would require
>  widespread directory renaming as well.
>
>  In short, this approach would require a vastly larger number of
>  changes to both the standard library and third-party code, for
>  a tiny potential performance improvement and a small increase in
>  clarity.  It was therefore rejected on "practicality vs. purity"
>  grounds.
>
>
>
> References
> ==========
>
> .. [1] "namespace" vs "module" packages (mailing list thread)
>   (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html)
>
> .. [2] "PEP \382: Namespace Packages" (mailing list thread)
>   (http://mail.python.org/pipermail/python-dev/2009-April/088087.html)
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>   Local Variables:
>   mode: indented-text
>   indent-tabs-mode: nil
>   sentence-end-double-space: t
>   fill-column: 70
>   coding: utf-8
>   End:
>

I have some separate comments on this draft that I'll have to
postpone.  In the meantime I have a couple of questions:

1. Should this PEP wait until importlib.__import__ replaces the
builtin __import__?  That will have bearing on where the
implementation takes place.  I'm not sure of the status of that
effort, other than what Brett has reported in the tracker issue
(http://bugs.python.org/issue2377), nor of the timeframe.

2. Should it wait for the work on the import engine (a GSoC project)?
It sounds like a PEP for that is in the works right now, and it may also
affect the implementation of this PEP.

-eric


From barry at python.org  Sat Jul  9 00:31:35 2011
From: barry at python.org (Barry Warsaw)
Date: Fri, 8 Jul 2011 18:31:35 -0400
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <20110708195157.335043A404D@sparrow.telecommunity.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
Message-ID: <20110708183135.7c9fa5d5@limelight.wooz.org>

On Jul 08, 2011, at 03:51 PM, P.J. Eby wrote:

>The following is my attempt at an updated draft of PEP 382, based on the
>recently-discussed changes.

Thanks!  I've been trying to catch up on the mailing list traffic today, and
grabbed your prototype code.  I plan on committing it to MvL's pep382 hg
branch so we have a place to play with it.

Comments inlined.

>PEP: 382
>Title: Namespace Package Declarations
>Version: $Revision$
>Last-Modified: $Date$
>Author: Martin v. Löwis <martin at v.loewis.de>, PJ Eby
>Status: Draft
>Type: Standards Track
>Content-Type: text/x-rst
>Created: 02-Apr-2009
>Python-Version: 3.2
>Post-History:
>
>Abstract
>========
>
>This PEP proposes an enhancement to Python's import machinery to
>replace existing uses of the standard library's
>``pkgutil.extend_path()`` API, and similar third-party APIs such as
>``pkg_resources.declare_namespace()``.
>
>The proposed enhancement will improve the reliability of existing
>namespace package implementations, while providing "One Obvious Way"
>to produce and consume namespace packages.
>
>
>Terminology
>===========
>
>Within this PEP, the following terms are used as follows:
>
>Package
>     Python packages as defined by Python's import statement.
>
>Distribution
>     A separately installable set of Python modules, as registered in
>     the Python package index, and installed by distutils, setuptools,
>     etc.
>
>Vendor Package
>     A group of files installed by an operating system's packaging
>     mechanism (e.g. Debian or Redhat packages installed on Linux
>     systems).
>
>Portion
>     A set of files in a single directory (possibly inside a zip file
>     or other storage mechanism) that contribute modules or subpackages
>     to a namespace package.  The contents of each portion ``sys.path``

This one got cut off.

>Namespace Package
>     A package whose subpackages and modules can be split into portions
>     that can be distributed or installed separately (via separate
>     distributions and/or vendor packages), in shared or separate
>     installation locations.
>
>     Unlike a regular package, however, which only allows submodule
>     and subpackage imports from a single location, a namespace
>     package's ``__path__`` is configured so that submodules and
>     subpackages can be imported from each of its installed portions,
>     regardless of their relative positions in ``sys.path``.
>
>
>Motivation
>==========
>
>.. epigraph::
>
>     "Most packages are like modules.  Their contents are highly
>     interdependent and can't be pulled apart.  [However,] some
>     packages exist to provide a separate namespace. ...  It should
>     be possible to distribute sub-packages or submodules of these
>     [namespace packages] independently."
>
>     -- Jim Fulton, shortly before the release of Python 2.3 [1]_

Nice find!

>The Current Approach
>--------------------
>
>First introduced in Python 2.3, namespace packages are a mechanism
>for splitting a single Python package across multiple directories
>on disk.  This splitting has two main benefits:
>
>1. It allows different parts of a large package or framework to be
>    distributed and installed independently.  For example, installing
>    the ``zope.interface`` package without having to install every
>    package in the ``zope.*`` namespace.
>
>    (This is somewhat similar to the way Perl's package system allows
>    authors to separately distribute subpackages of ``File::`` or
>    ``Email::``.)
>
>2. As a side-effect of benefit 1, it reduces package naming collisions
>    across multiple authors or organizations, by encouraging them to
>    use distinguishing prefixes.  Instead of say, Zope and Twisted both
>    offering a top-level ``interface`` package (in which case, both
>    could not be installed to the same directory), they can use
>    ``zope.interface`` and ``twisted.interface``, while still being
>    able to distribute these subpackages separately from other ``zope``
>    or ``twisted`` subpackages.
>
>    (This is somewhat similar to the way Java uses names like
>    ``org.apache.foobar`` or ``com.sun.thingy`` to prevent collisions,
>    only flatter.)
>
>In current Python versions, however, a registration function (such as
>``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``)
>must be explicitly invoked in order to set up the package's
>``__path__``.

Do you need to explain a little more why __path__ is significant, and why the
registration function is required?
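
For anyone following along who hasn't seen it, the registration in question
looks roughly like the sketch below.  It builds two throwaway portions of a
made-up package in temp directories, each with the same ``__init__.py``
calling the real ``pkgutil.extend_path()``.  The names ``demo_nspkg``,
``alpha`` and ``beta`` and the ``make_portion()`` helper are mine, purely for
illustration.

```python
import importlib
import os
import sys
import tempfile

# The registration that every portion's __init__.py has to repeat today.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, package, module, body):
    """Create root/package/ with the shared __init__.py and one module."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(body)
    return pkg_dir


# Two separate install locations, each holding one portion of "demo_nspkg".
first = tempfile.mkdtemp()
second = tempfile.mkdtemp()
dir_a = make_portion(first, "demo_nspkg", "alpha", "VALUE = 'a'\n")
dir_b = make_portion(second, "demo_nspkg", "beta", "VALUE = 'b'\n")
sys.path[:0] = [first, second]
importlib.invalidate_caches()

import demo_nspkg.alpha
import demo_nspkg.beta  # importable only because __path__ was extended

package_path = list(demo_nspkg.__path__)
```

Submodule imports search the package's ``__path__``, not ``sys.path``, so
without the ``extend_path()`` call only the first directory would be searched
and ``demo_nspkg.beta`` would not be found.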

>There are two problems with this approach, however.
>
>
>Problems With The Current Approach
>----------------------------------
>
>The first (and lesser) problem is that there is no One Obvious Way to
>either declare that a package is a "namespace" or "module" package,
>or to tell which kind of package a given directory on disk is.
>
>Instead, you must choose one of the various APIs to use, each of
>which is slightly-incompatible with the others.  (For example,
>``pkgutil`` supports ``*.pkg`` files; setuptools doesn't.  Likewise,
>setuptools supports package portions living in zip files, and adding
>new path components to already-imported namespaces, whereas
>``pkgutil`` doesn't.)
>
>Similarly, to tell whether a given directory is a "namespace" or
>"module" package, you must read its documentation or inspect its code
>in detail, and be able to recognize the various API calls mentioned
>above.
>
>The second -- and much larger -- issue is that whichever API is used
>to declare the namespace, the declaration has to be invoked from a
>namespace package's ``__init__`` module in order to work.  (Otherwise,
>only the first part of the package found on ``sys.path`` would be
>importable.)
>
>This clashes with the goal of separately installing portions of a
>namespace, because then each distributed piece must include a copy
>of the same ``__init__.py``.  (Otherwise, each piece would not be
>importable on its own, as Python currently requires the existence
>of an ``__init__`` module in order to import the package at all, let
>alone set up the namespace!)
>
>In addition to the developer inconvenience of creating, synchronizing,
>and distributing these duplicated ``__init__`` modules, there is a
>further problem created for operating system vendors.
>
>Vendor packages typically must not provide overlapping files, and an
>attempt to install a vendor package that has a file already on disk
>will fail or cause unpredictable behavior.  As vendors might choose to
>package distributions such that they will end up all in a single
>directory for the namespace package, all portions would contribute
>conflicting ``__init__.py`` files.

I might word this a little differently.  Perhaps:

Vendor packaging standards require every file on disk to be owned by exactly
one vendor package.  But because each portion of a namespace package may be
contained in a separate vendor package, multiple vendor packages would have to
own the namespace package's __init__.py file.  For example, would the
``zope.interface`` vendor package own ``zope/__init__.py`` or would the
``zope.component`` vendor package own it?  Different vendors handle this
conflict differently, and in fact, different packaging tools from the same
vendor can handle this differently, which can cause consistency problems.

>This issue has led to various fragile and complex workarounds in
>practice, such as ``.pth`` file abuse by setuptools, and the shipping
>of broken partial packages with distutils.
>
>With the enhancement proposed here, however, all of the above problems
>can be readily resolved.
>
>
>Specification
>=============
>
>Instead of an API call buried inside a series of duplicated and
>potentially-clashing ``__init__`` modules (which mostly exist only
>to make the package importable and declare its namespace-ness), this
>PEP proposes that Python's import machinery be modified to include
>direct support for namespace packages.
>
>This support would work by adding a new way to desginate a directory

s/desginate/designate/

>as containing a namespace package portion: by including one or more
>``*.ns`` files in it.
>
>This approach removes the need for an ``__init__`` module to be
>duplicated across namespace package portions.  Instead, each portion
>can simply include a uniquely-named ``*.ns`` file, thereby avoiding
>filename clashes in vendor packages.

I think a concrete example would really help here.  E.g.:

For example, the ``zope.interface`` portion would include a
``zope/zope.interface.ns`` file, while the ``zope.component`` portion would
include a ``zope/zope.component.ns`` file.  The very presence of any ``.ns``
files inside the ``zope`` directory is enough to designate ``zope`` as a
namespace package.  No conflicting ``zope/__init__.py`` file is necessary.
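
To make that layout concrete, here is a small sketch of the check as I read
the draft, applied to throwaway directories.  It is not the prototype code.
The draft's ``namespace_subpath(fullname)`` is an importer method; the
free function below, with its explicit ``path_entry`` argument, is my own
simplification for a plain filesystem importer.

```python
import os
import tempfile


def namespace_subpath(path_entry, fullname):
    """Hypothetical filesystem version of the draft's importer method.

    Returns the portion directory if it holds at least one ``*.ns`` file,
    otherwise None.  Only the last component of `fullname` is looked up,
    since `path_entry` is assumed to be the parent's search directory.
    """
    candidate = os.path.join(path_entry, fullname.rpartition(".")[2])
    if not os.path.isdir(candidate):
        return None
    if any(name.endswith(".ns") for name in os.listdir(candidate)):
        return candidate
    return None


# Two separately installed portions of "zope", plus an ordinary package.
site_a = tempfile.mkdtemp()
site_b = tempfile.mkdtemp()
site_c = tempfile.mkdtemp()
for site, marker in [(site_a, "zope.interface.ns"),
                     (site_b, "zope.component.ns"),
                     (site_c, "__init__.py")]:
    os.makedirs(os.path.join(site, "zope"))
    open(os.path.join(site, "zope", marker), "w").close()

found = [namespace_subpath(site, "zope") for site in (site_a, site_b, site_c)]
```

The first two entries of ``found`` are the two ``zope`` directories and the
third is ``None``, because an ``__init__.py`` alone does not mark a portion.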

>And, since the import machinery knows that these directories are
>portions of a namespace package, it can automatically initialize
>the package's ``__path__`` to include portions located on different
>parts of ``sys.path``.  (Thus avoiding the need for special code
>to be called in the ``__init__`` module.)
>
>In addition to doing this path setup, the import machinery will also
>add any imported namespace packages to ``sys.namespace_packages``
>(initially an empty set), so that namespace packages can be identified
>or iterated over.
>
>
>PEP \302 Extension
>------------------
>
>The existing PEP 302 protocol is to be extended to handle namespace
>package portion directories, by adding a new importer method,
>``namespace_subpath(fullname)``.  An implementation of this method
>will be added to all applicable importer classes distributed with
>Python, including those in ``pkgutil`` and ``zipimport``.
>
>(Note: any other importer wishing to support namespace packages must
>provide its own implementation of this method as well.  If an importer
>does not have a ``namespace_subpath()`` method, it will be treated as
>if it *did* have the method, but it returned ``None`` when called.)
>
>This new method is called just before the importer's ``find_module()``
>is normally invoked.  If the importer determines that `fullname` is
>a namespace package portion under its jurisdiction, then the importer
>returns an importer-specific path to that namespace portion.

Please define exactly what ``fullname`` is.

>For example, if a standard filesystem path importer for the path
>``/usr/lib/site-packages`` is about to be asked to import ``zope``,
>and there is a ``/usr/lib/site-packages/zope`` directory containing
>any files ending with ``.ns``, a call to ``namespace_subpath("zope")``
>on that importer should return ``"/usr/lib/site-packages/zope"``.
>
>However, if there is no such subdirectory, or it does *not* contain
>any files whose names end with ``.ns``, that importer would return
>``None`` instead.
>
>The Python import machinery will call this method on each importer
>corresponding to a path entry in ``sys.path`` (for top-level imports)
>or in a parent package ``__path__`` (for subpackage imports).
>
>If a normal package or module is found before a namespace package,
>importing proceeds according to the normal PEP 302 protocol.  (That
>is, a loader object is simply asked to load the located module or
>package.)
>
>However, if a namespace package portion is found (i.e., an importer's
>``namespace_subpath()`` returns a string), then the normal import
>search stops, and a namespace package is created instead.
>
>The import machinery continues iterating over importers and calling
>``namespace_subpath()`` on them, but it does **not** continue calling
>``find_module()`` on them.  Instead, it accumulates any strings
>returned by the subpath calls, in order to assemble a ``__path__``
>for the package being imported.
>
>(Note that this implies that any non-namespace packages with the same
>name are skipped, and not included in the resulting package's
>``__path__``.  In other words, a namespace package's initial
>``__path__`` only includes namespace portions, never non-namespace
>package directories.)

Would you expect this to be common?  Did you have any examples in mind, or was
it just covering-the-bases?

>Once this ``__path__`` has been assembled, a module is created, and
>its ``__path__`` attribute is set.  The package's name is then added
>to ``sys.namespace_packages`` -- a set of package names.
>
>Finally, the ``__init__`` module code for the package (if it exists)
>is located and executed in the new module's namespace.
>
>Each importer that returns a ``namespace_subpath()`` for the package
>is asked to perform a standard ``find_module()`` for the package.
>Since by the normal import rules, a directory containing an
>``__init__`` module is a package, this call should succeed if the
>namespace package portion contains an ``__init__`` module, and the
>importing can proceed normally from that point.
>
>There is one caveat, however.  The importers currently distributed
>with Python expect that *they* will be the ones to initialize the
>``__path__`` attribute, which means that they must be changed to
>either recognize that ``__path__`` has already been set and not
>change it, or to handle namespace packages specially (e.g., via an
>internal flag or checking ``sys.namespace_packages``).
>
>Similarly, any third-party importers wishing to support namespace
>packages must make similar changes.
>
>(NOTE: in general, it goes against the design of PEP 302 for a loader
>object to assume that it is always creating the module object or that
>the module it is operating on is empty.  Making this assumption can
>result in code that breaks the normal operation of the ``reload()``
>builtin and any specialized tools that rely on it, such as lazy
>importers, automatic reloaders, and so on.)
>
>
>Standard Library Changes/Additions
>----------------------------------
>
>The ``pkgutil`` module should be updated to handle this
>specification appropriately, including any necessary changes to
>``extend_path()``, ``iter_modules()``, etc.  A new generic API for
>calling ``namespace_subpath()`` on importers should be added as well.

Is there any reason not to put extend_path() on the road to deprecation?

>Specifically the proposed changes and additions are:
>
>* A new ``namespace_subpath(importer, fullname)`` generic, allowing
>   implementations to be registered for existing importers.

Is this the registration mechanism?

>* A new ``extend_namespaces(path_entry)`` function, to extend existing
>   and already-imported namespace packages' ``__path__`` attributes to
>   include any portions found in a new ``sys.path`` entry.  This
>   function should be called by applications extending ``sys.path``
>   at runtime, e.g. to include a plugin directory or add an egg to the
>   path.
>
>   The implementation of this function does a simple breadth-first walk
>   of ``sys.namespace_packages``, and performs any necessary
>   ``namespace_subpath()`` calls to identify what path entries need to
>   be added to each package's ``__path__``, given that `path_entry`
>   has been added to ``sys.path``.
>
>* A new ``iter_namespaces(parent='')`` function to allow breadth-first
>   traversal of namespaces in ``sys.namespace_packages``, by yielding
>   the child namespace packages of `parent`.  For example, calling
>   ``iter_namespaces("zope")`` might yield ``zope.app`` and
>   ``zope.products`` (if they are namespace packages registered in
>   ``sys.namespace_packagess``), but **not** ``zope.foo.bar``.

s/packagess/packages/

>   This function is needed to implement ``extend_namespaces()``, but
>   is potentially useful to others.
>
>* ``ImpImporter.iter_modules()`` should be changed to also detect and
>   yield the names of namespace package portions.
>
>In addition to the above changes, the ``zipimport`` importer should
>have its ``iter_modules()`` implementation similarly changed.  (Note:
>current versions of Python implement this via a shim in ``pkgutil``,
>so technically this is also a change to ``pkgutil``.)
>
>
>Implementation Notes
>--------------------
>
>For users, developers, and distributors of namespace packages:
>
>* ``sys.namespace_packages`` is allowed to contain non-existent or
>   not-yet-imported package names; code that uses its contents should
>   not assume that every name in this set is also present in
>   sys.packages or that importing the name will necessarily succeed.
>
>* ``*.ns`` files must be empty or contain only ASCII whitespace
>   characters.  This leaves open the possibility for future extension
>   to the format.

Getting back to our previous discussion on this, I might also add a comment
format, e.g. lines starting with `#`.  Almost any extension we can come up
with will probably need to include comments, so we might as well add them here
now.  This will also allow folks to add copyright, or other textual
information into .ns files as their coding conventions may dictate.

Do you expect to ignore everything else, or throw an exception?  Let's be
explicit about that.

>* Files contained within a namespace package portion directory must
>   be *unique* to that portion, so that the portion can be distributed
>   as a vendor package without any filename overlap.  This applies to
>   modules and data files as well as ``*.ns`` files.
>
>   (For ``*.ns`` files themselves, uniqueness can be achieved simply by
>   giving them a name based on the distribution that contains the file,
>   and it is recommended that packaging tools support doing this
>   automatically.)
>
>* Although this PEP supports the use of non-empty ``__init__`` modules
>   in namespace packages, their usage is controversial.  If more than
>   one package portion contains an ``__init__`` module, at most one of
>   them will be executed, possibly leading to silent errors.
>
>   Therefore, if you must include an ``__init__`` module in your
>   namespace package, make sure that it is provided by exactly **one**
>   distribution, and that all other distributions using that module's
>   contents are defined so as to have an installation dependency on
>   the distribution containing the ``__init__`` module.  Otherwise,
>   it may not be present in some installations.
>
>   (Note: for historical reasons, existing namespace packages nearly
>   always include ``__init__`` modules, but they are usually empty
>   except for code to declare the package a namespace.  Under this
>   proposal, these nearly-empty modules could and should be replaced
>   by an empty ``*.ns`` file in the package directory.)

I'd be a little more forceful; the PEP should strongly recommend against
including namespace package __init__.py files.

>For those implementing PEP 302 importer objects:
>
>* Importers that support the ``iter_modules()`` method and want to add
>   namespace support should modify their ``iter_modules()``
>   method so that it discovers and lists namespace packages as well as
>   standard modules and packages.
>
>* For implementation efficiency, an importer is allowed to cache
>   information (such as whether a directory exists and whether an
>   ``__init__`` module is present in it) between the invocation of a
>   ``namespace_subpath()`` call and a subsequent ``find_module()`` call
>   for the same name.
>
>   It should, however, avoid retaining such cached information for any
>   longer than the next method call, and it should also verify that the
>   request is in fact for the same module/package name, as it is not
>   guaranteed that a ``namespace_subpath()`` call will always be
>   followed by a matching ``find_module()`` call.  (After all, an
>   ``__init__`` module may already have been supplied by an earlier
>   importer on the path.)
>
>* "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
>   not need to implement ``namespace_subpath()``, because the method
>   is only called on importers corresponding to ``sys.path`` entries.
>   If a meta importer wishes to support namespace packages, it must
>   do so entirely within its ``find_module()`` implementation.
>
>   Unfortunately, it is unlikely that any such implementation will be
>   able to merge its namespace portions with those of other meta
>   importers or ``sys.path`` importers, so the meaning of "supporting
>   namespace packages" for a meta importer is currently undefined.
>
>   However, since the intended use case for meta importers is to
>   replace Python's normal import process entirely for some subset of
>   modules, and the number of such importers currently implemented is
>   quite small, this seems unlikely to be a big issue in practice.
>
>
>Rejected Alternatives
>=====================
>
>* The original version of this PEP used ``.pkg`` or ``.pth`` files
>   that contained either explicit directories to be added to a
>   package's ``__path__``, or ``*`` to indicate that a package was
>   a namespace.
>
>   But this approach required a more complex change to the importer
>   protocol, the files had to actually be opened and read, and there
>   were no concrete use cases proposed for the additional flexibility
>   of specifying explicit paths.
>
>* On Python-Dev, M.A. Lemburg proposed [2]_ that instead of using
>   extra files, namespace packages use a ``__pkg__.py`` file to
>   indicate their namespace-ness, in addition to a (required)
>   ``__init__.py``.
>
>   Unfortunately, this approach solves only one of the `problems with
>   the current approach`_: i.e., having a standard way of declaring and
>   identifying namespace packages.  It does not address the necessity
>   of distributing duplicated files, or filename overlap between
>   distributions.  Further, it does not allow truly-independent
>   namespace portions to exist, since it requires a "defining" portion
>   (the portion containing the single ``__init__`` module) to exist.
>
>* Another approach considered during revisions to this PEP was to
>   simply rename package directories to add a suffix like ``.ns``
>   or ``-ns``, to indicate their namespaced nature.  This would effect
>   a small performance improvement for the initial import of a
>   namespace package, avoid the need to create empty ``*.ns`` files,
>   and even make it clearer that the directory involved is a namespace
>   portion.
>
>   The downsides, however, are also plentiful.  If a package starts
>   its life as a normal package, it must be renamed when it becomes
>   a namespace, with the implied consequences for revision control
>   tools.
>
>   Further, there is an immense body of existing code (including the
>   distutils and many other packaging tools) that expects a package
>   directory's name to be the same as the package name.  And porting
>   existing Python 2.x namespace packages to Python 3 would require
>   widespread directory renaming as well.
>
>   In short, this approach would require a vastly larger number of
>   changes to both the standard library and third-party code, for
>   a tiny potential performance improvement and a small increase in
>   clarity.  It was therefore rejected on "practicality vs. purity"
>   grounds.
>
>
>
>References
>==========
>
>.. [1] "namespace" vs "module" packages (mailing list thread)
>    (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html)
>
>.. [2] "PEP \382: Namespace Packages" (mailing list thread)
>    (http://mail.python.org/pipermail/python-dev/2009-April/088087.html)
>
>Copyright
>=========
>
>This document has been placed in the public domain.
>
>
>..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:

You've done a really excellent job at both simplifying the specification, and
providing a clear explanation of the issues and mechanisms involved.  Kudos!
I really like this a lot, and wholeheartedly support its adoption.  I hope MvL
will agree.

I'm going to have a look at your prototype now and will commit it, and any
updates, to the hg repo.

Cheers,
-Barry

From pje at telecommunity.com  Sat Jul  9 00:33:23 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 08 Jul 2011 18:33:23 -0400
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <CALFfu7CFJqK+2cB+HbFZR5PyMfeesm1s-+_CcrqGKK9JQC7dHw@mail.gmail.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
	<CALFfu7CFJqK+2cB+HbFZR5PyMfeesm1s-+_CcrqGKK9JQC7dHw@mail.gmail.com>
Message-ID: <20110708223346.3FD4F3A404D@sparrow.telecommunity.com>

At 03:52 PM 7/8/2011 -0600, Eric Snow wrote:
>1. Should this PEP wait until importlib.__import__ replaces the
>builtin __import__?  That will have bearing on where the
>implementation takes place.  I'm not sure of the status of that
>effort, other than what Brett has reported in the tracker issue
>(http://bugs.python.org/issue2377), nor of the timeframe.
>
>2. Should it wait for the work on the import engine (a GSOC project).
>It sounds like a PEP is in the works right now.  It may also impact
>the implementation of this PEP.

Honestly, since I've done very little with Python 3.x and don't 
expect to be involved in the implementation there, I would leave 
answering those questions to the folks involved.

I will say, though, that this really doesn't modify the main import 
processing loop much; it's just an extra method call at the point 
where you have a finder, and a few extra local variables.  So I don't 
see any insurmountable obstacles to adding it to import.c, at least 
given what I remember of how the 2.x version works.

But again, I'm not the one doing the work, so take that with a grain 
of salt.  ;-)
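
To make the "extra method call and a few extra local variables" concrete, here is a rough sketch of the lookup loop as the draft describes it.  It is not the prototype code: ``find_namespace_or_loader`` and ``FakeImporter`` are made-up names for illustration, and the later step of locating an ``__init__`` module for the namespace package is left out.

```python
def find_namespace_or_loader(importers, fullname):
    """Walk the importers for one path, as the draft describes.

    Returns ("normal", loader), ("namespace", path_list) or (None, None).
    Hypothetical helper, not part of any real import machinery.
    """
    path = []  # accumulated namespace portion paths
    for importer in importers:
        # An importer without the method is treated as if it returned None.
        method = getattr(importer, "namespace_subpath", None)
        subpath = method(fullname) if method is not None else None
        if subpath is not None:
            path.append(subpath)
        elif not path:
            # No namespace portion seen yet: normal PEP 302 lookup.
            loader = importer.find_module(fullname)
            if loader is not None:
                return "normal", loader
        # Once a portion has been found, find_module() is no longer called.
    if path:
        return "namespace", path
    return None, None


class FakeImporter:
    """Stand-in importer used only to exercise the loop above."""

    def __init__(self, portions=(), modules=()):
        self.portions = dict(portions)
        self.modules = set(modules)
        self.find_calls = []

    def namespace_subpath(self, fullname):
        return self.portions.get(fullname)

    def find_module(self, fullname):
        self.find_calls.append(fullname)
        return self if fullname in self.modules else None


a = FakeImporter(portions={"zope": "/a/zope"})
b = FakeImporter(modules={"zope"})
c = FakeImporter(portions={"zope": "/c/zope"})

kind, result = find_namespace_or_loader([a, b, c], "zope")
print(kind, result)

kind2, result2 = find_namespace_or_loader([b, a], "zope")
print(kind2, result2 is b)
```

In the first call the regular package offered by ``b`` is skipped because a portion was already found on ``a``; in the second call ``b`` comes first, so the normal protocol wins.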


From pje at telecommunity.com  Sat Jul  9 02:01:28 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 08 Jul 2011 20:01:28 -0400
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <20110708183135.7c9fa5d5@limelight.wooz.org>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
	<20110708183135.7c9fa5d5@limelight.wooz.org>
Message-ID: <20110709000146.06C313A404D@sparrow.telecommunity.com>

At 06:31 PM 7/8/2011 -0400, Barry Warsaw wrote:
>On Jul 08, 2011, at 03:51 PM, P.J. Eby wrote:
>
> >The following is my attempt at an updated draft of PEP 382, based on the
> >recently-discussed changes.
>
>Thanks!  I've been trying to catch up on the mailing list traffic today, and
>grabbed your prototype code.  I plan on committing it to MvL's pep382 hg
>branch so we have a place to play with it.

You should probably start from this version instead:

   http://pastebin.com/Wv77WYyb

It's got some work on other things like iter_modules, extend_namespaces, etc.


> >Portion
> >     A set of files in a single directory (possibly inside a zip file
> >     or other storage mechanism) that contribute modules or subpackages
> >     to a namespace package.  The contents of each portion ``sys.path``
>
>This one got cut off.

Oops.  A bad edit; ignore that sentence fragment, it was replaced by 
language in the definition that followed it.


> >Motivation
> >==========
> >
> >.. epigraph::
> >
> >     "Most packages are like modules.  Their contents are highly
> >     interdependent and can't be pulled apart.  [However,] some
> >     packages exist to provide a separate namespace. ...  It should
> >     be possible to distribute sub-packages or submodules of these
> >     [namespace packages] independently."
> >
> >     -- Jim Fulton, shortly before the release of Python 2.3 [1]_
>
>Nice find!

That was where Jim coined the term in the first place.  I went back 
looking because I remembered at least Jim, Guido and I hashing this 
out back then on a zope related mailing list.  Took a few minutes to 
find, but I think it was worth it.


>Do you need to explain a little more why __path__ is significant, and why the
>registration function is required?

Revised paragraph:

====
In current Python versions, however, a registration function (such as
``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``)
must be explicitly invoked in order to set up the package's
``__path__``.  (By default, a package's ``__path__`` lists only one
directory, so to allow imports from more than one directory, the
``__path__`` must be explicitly extended in code.)
====
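
For readers who have not seen the current idiom, the following is a small runnable sketch of what the revised paragraph refers to.  ``pkgutil.extend_path()`` is the real standard library function; the ``demo_ns`` package, its modules, and the temporary directories are invented purely for the demonstration.

```python
import os
import sys
import tempfile
import textwrap

# The boilerplate every portion currently has to ship in its __init__.py.
INIT_SOURCE = textwrap.dedent("""\
    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)
""")


def make_portion(root, package, module, body):
    """Create root/package/__init__.py and root/package/<module>.py."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(body)


# Two separately installed portions of the made-up "demo_ns" package.
first = tempfile.mkdtemp()
second = tempfile.mkdtemp()
make_portion(first, "demo_ns", "alpha", "VALUE = 'alpha'\n")
make_portion(second, "demo_ns", "beta", "VALUE = 'beta'\n")
sys.path[:0] = [first, second]

import demo_ns.alpha
import demo_ns.beta

# Without the extend_path() call, only the first directory would be listed
# and "import demo_ns.beta" would fail.
print(demo_ns.__path__)
```

Note that both portions carry an identical ``demo_ns/__init__.py``, which is exactly the duplicated file that causes trouble for vendor packages.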




> >Vendor packages typically must not provide overlapping files, and an
> >attempt to install a vendor package that has a file already on disk
> >will fail or cause unpredictable behavior.  As vendors might choose to
> >package distributions such that they will end up all in a single
> >directory for the namespace package, all portions would contribute
> >conflicting ``__init__.py`` files.
>
>I might word this a little differently.  Perhaps:
>
>Vendor packaging standards require every file on disk to be owned by exactly
>one vendor package.  But because each portion of a namespace package may be
>contained in a separate vendor package, multiple vendor packages would have to
>own the namespace package's __init__.py file.  For example, would the
>``zope.interface`` vendor package own ``zope/__init__.py`` or would the
>``zope.component`` vendor package own it?  Different vendors handle this
>conflict differently, and in fact, different packaging tools from the same
>vendor can handle this differently, which can cause consistency problems.

I took the original wording as directly as practical from MvL's, but I
agree yours is clearer.  OTOH, I think "fail or cause unpredictable
behavior" is a much stronger motivator than "it's nonstandard and
confusing".  ;-)

Did you have a specific rationale for your choice?  I mean, what did 
you want to gain or avoid by the change?


> >This support would work by adding a new way to desginate a directory
>
>s/desginate/designate/

Got it, thanks for the careful read!


> >as containing a namespace package portion: by including one or more
> >``*.ns`` files in it.
> >
> >This approach removes the need for an ``__init__`` module to be
> >duplicated across namespace package portions.  Instead, each portion
> >can simply include a uniquely-named ``*.ns`` file, thereby avoiding
> >filename clashes in vendor packages.
>
>I think a concrete example would really help here.  E.g.:
>
>For example, the ``zope.interface`` portion would include a
>``zope/zope.interface.ns`` file, while the ``zope.component`` portion would
>include a ``zope/zope.component.ns`` file.  The very presence of any ``.ns``
>files inside the ``zope`` directory is enough to designate ``zope`` as a
>namespace package.  No conflicting ``zope/__init__.py`` file is necessary.

The problem with this example is that it gives the impression that 
.ns files are named for packages, instead of being named for 
distributions.  So, I went with a more detailed and explicit 
example.  Here's my revised version:

====

For example, if two distributions, ``Importing`` and ``ProxyTypes``
wish to contribute the modules ``peak.util.imports`` and
``peak.util.proxies`` to the ``peak.util`` namespace package, then
their source distribution directory layouts would look like this::

     ProxyTypes-0.9.tgz:
         peak/
             ProxyTypes.ns   <- 'peak' is a namespace package
             util/
                 ProxyTypes.ns   <- 'peak.util' is a namespace package
                 proxies.py

     Importing-1.10.tgz:
         peak/
             Importing.ns   <- 'peak' is a namespace package
             util/
                 Importing.ns   <- 'peak.util' is a namespace package
                 imports.py

If installed separately (e.g. one via system package, another via
a user's home directory), then the ``__path__`` of the ``peak``
main package will include both ``peak`` subdirectories, and the
``__path__`` of the ``peak.util`` namespace package will include both
``peak/util`` subdirectories.  Thus, both ``peak.util.proxies``
and ``peak.util.imports`` will be importable, despite the physical
separation of the modules.

On the other hand, if these portions are both installed to the *same*
directory, the layout will look like this::

     site-packages/   (or wherever)
         peak/
             Importing.ns
             ProxyTypes.ns   <- both portions' .ns files appear
             util/
                 Importing.ns   <- at both levels
                 ProxyTypes.ns
                 imports.py
                 proxies.py

And the ``__path__`` of the ``peak`` and ``peak.util`` packages will
only contain a single directory each.  (Assuming these are the only
contributions to ``peak`` and ``peak.util`` on ``sys.path``, of
course!)

Either way, the mere presence of the ``.ns`` files tells the import
machinery that the directory is a namespace package portion and is
importable; there is no need for any ``__init__.py`` files that would
cause installation conflicts, when both portions are installed to the
same target location.

In addition to detecting namespace portions and adding them to the
package's ``__path__``, the import machinery will also add any
imported namespace packages to ``sys.namespace_packages`` (initially
an empty set), so that namespace packages can be identified or
iterated over.

====
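
The ``__path__`` results claimed in the example above can be checked with a short sketch.  This is only an illustration under assumptions: ``fs_namespace_subpath`` and ``assemble_path`` are hypothetical helpers for a plain filesystem path entry, not the proposed importer method itself, and the temporary directories stand in for real install locations.

```python
import os
import tempfile


def fs_namespace_subpath(path_entry, fullname):
    """Return the portion directory for `fullname` under `path_entry`.

    Returns None if the directory is missing or holds no ``.ns`` file.
    The files are only listed, never opened.  Hypothetical helper.
    """
    candidate = os.path.join(path_entry, fullname.rpartition(".")[2])
    if not os.path.isdir(candidate):
        return None
    if any(name.endswith(".ns") for name in os.listdir(candidate)):
        return candidate
    return None


def assemble_path(search_path, fullname):
    """Collect portions from sys.path entries (top level) or from the
    parent package's __path__ (subpackages)."""
    found = (fs_namespace_subpath(entry, fullname) for entry in search_path)
    return [portion for portion in found if portion is not None]


def touch(*parts):
    """Create an empty file, making parent directories as needed."""
    path = os.path.join(*parts)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


def install(root, dist, module):
    """Lay out one distribution's portion of peak.util under `root`."""
    touch(root, "peak", dist + ".ns")
    touch(root, "peak", "util", dist + ".ns")
    touch(root, "peak", "util", module)


# Installed separately: one portion per location.
system = tempfile.mkdtemp()
home = tempfile.mkdtemp()
install(system, "ProxyTypes", "proxies.py")
install(home, "Importing", "imports.py")
peak_path = assemble_path([system, home], "peak")
util_path = assemble_path(peak_path, "peak.util")

# Installed to the same directory: a single entry at each level.
shared = tempfile.mkdtemp()
install(shared, "ProxyTypes", "proxies.py")
install(shared, "Importing", "imports.py")
shared_peak_path = assemble_path([shared], "peak")
shared_util_path = assemble_path(shared_peak_path, "peak.util")

print(peak_path, util_path)
print(shared_peak_path, shared_util_path)
```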

I think this also gets more of the clarity about __path__ that you 
asked for, too.


> >This new method is called just before the importer's ``find_module()``
> >is normally invoked.  If the importer determines that `fullname` is
> >a namespace package portion under its jurisdiction, then the importer
> >returns an importer-specific path to that namespace portion.
>
>Please define exactly what ``fullname`` is.

Ugh.  Do I have to?  ;-)

Will it work if I just change that to "just before the importer's 
``find_module(fullname)`` is normally invoked", so it's more clearly implied?


> >(Note that this implies that any non-namespace packages with the same
> >name are skipped, and not included in the resulting package's
> >``__path__``.  In other words, a namespace package's initial
> >``__path__`` only includes namespace portions, never non-namespace
> >package directories.)
>
>Would you expect this to be common?  Did you have any examples in mind, or was
>it just covering-the-bases?

Just covering the bases.


> >Standard Library Changes/Additions
> >----------------------------------
> >
> >The ``pkgutil`` module should be updated to handle this
> >specification appropriately, including any necessary changes to
> >``extend_path()``, ``iter_modules()``, etc.  A new generic API for
> >calling ``namespace_subpath()`` on importers should be added as well.
>
>Is there any reason not to put extend_path() on the road to deprecation?

I don't know.  Is there?  As I said, I considered that an open question.



> >Specifically the proposed changes and additions are:
> >
> >* A new ``namespace_subpath(importer, fullname)`` generic, allowing
> >   implementations to be registered for existing importers.
>
>Is this the registration mechanism?

Registration for what?  I meant that this is analogous to other 
pkgutil generic functions that let you call a PEP 302 extension 
protocol on an importer, whether or not the importer directly 
implements that protocol.  For example, 
pkgutil.iter_importer_modules() is a generic function that lets you 
ask an importer to iterate over available modules, whether it 
actually implements its own "iter_modules()" method or not.  The 
pkgutil.namespace_subpath() function would do the same for the 
(possibly-absent) namespace_subpath() method on existing importers, 
and allow third parties to register namespace support for custom 
importers that can't be directly modified to support namespace packages.

Any thoughts on how better to word that bit, without necessarily 
going into that much detail?  ;-)
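
In case it helps the wording, here is roughly what I mean, as a sketch.  ``pkgutil`` has its own small helper for its generic functions; ``functools.singledispatch`` from newer Python versions is swapped in here only to keep the example self-contained, and ``LegacyImporter`` with its lookup rule is entirely made up.

```python
from functools import singledispatch


@singledispatch
def namespace_subpath(importer, fullname):
    """Default: defer to the importer's own method, if it has one."""
    method = getattr(importer, "namespace_subpath", None)
    if method is None:
        # Same as an importer whose method returned None.
        return None
    return method(fullname)


class LegacyImporter:
    """Stand-in for a third-party importer that cannot be modified."""

    def __init__(self, prefix):
        self.prefix = prefix


@namespace_subpath.register(LegacyImporter)
def _legacy_subpath(importer, fullname):
    # Made-up rule for illustration: this importer only knows "zope".
    if fullname == "zope":
        return importer.prefix + "/zope"
    return None


class NativeImporter:
    """An importer that implements the new method directly."""

    def namespace_subpath(self, fullname):
        return "/native/" + fullname


print(namespace_subpath(LegacyImporter("/opt"), "zope"))
print(namespace_subpath(NativeImporter(), "zope"))
print(namespace_subpath(object(), "zope"))
```

Callers always go through the generic function, so they never need to know whether support came from the importer itself or from a third-party registration.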


>s/packagess/packages/

Got it.


> >* ``*.ns`` files must be empty or contain only ASCII whitespace
> >   characters.  This leaves open the possibility for future extension
> >   to the format.
>
>Getting back to our previous discussion on this, I might also add a comment
>format, e.g. lines starting with `#`.  Almost any extension we can come up
>with will probably need to include comments, so we might as well add them here
>now.  This will also allow folks to add copyright, or other textual
>information into .ns files as their coding conventions may dictate.
>
>Do you expect to ignore everything else, or throw an exception?  Let's be
>explicit about that.

We won't be opening the files at all, so the contents will be ignored.


>I'd be a little more forceful; the PEP should strongly recommend against
>including namespace package __init__.py files.

As I said, it's controversial.  Some people really want those 
__init__ modules, and setuptools sort-of supports them now.  I can 
make it a bit more forceful, though.


>You've done a really excellent job at both simplifying the specification, and
>providing a clear explanation of the issues and mechanisms involved.  Kudos!
>I really like this a lot, and wholeheartedly support its adoption.  I hope MvL
>will agree.

Thanks.


From ncoghlan at gmail.com  Sat Jul  9 10:00:18 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 9 Jul 2011 18:00:18 +1000
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <CALFfu7CFJqK+2cB+HbFZR5PyMfeesm1s-+_CcrqGKK9JQC7dHw@mail.gmail.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
	<CALFfu7CFJqK+2cB+HbFZR5PyMfeesm1s-+_CcrqGKK9JQC7dHw@mail.gmail.com>
Message-ID: <CADiSq7ciz+kWAg1uyNj2Dyhr=yB9-2B3DNVpZ6tkLqJKEiAsJA@mail.gmail.com>

On Sat, Jul 9, 2011 at 7:52 AM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> I have some separate comments on this draft that I'll have to
> postpone.  In the meantime I have a couple of questions:
>
> 1. Should this PEP wait until importlib.__import__ replaces the
> builtin __import__?  That will have bearing on where the
> implementation takes place.  I'm not sure of the status of that
> effort, other than what Brett has reported in the tracker issue
> (http://bugs.python.org/issue2377), nor of the timeframe.

Up to the people implementing it. They can either do the work twice
(once for import.c and once for importlib) in the knowledge that the
intent is to nuke (most of) import.c before 3.3 is released or else
they can just do the importlib implementation and make issue 2377 a
dependency of the PEP 382 support becoming available in the default
interpreter.

The only approach I would actively oppose is checking in a PEP 382
implementation that *didn't* include the necessary importlib updates.

> 2. Should it wait for the work on the import engine (a GSOC project).
> It sounds like a PEP is in the works right now.  It may also impact
> the implementation of this PEP.

PEP 382 is much further along (and more significant from a practical
point of view) than the import engine work, so it shouldn't be delayed
for the latter. If PEP 382 goes in first the appropriate changes to
the engine code to account for sys.namespace_packages and the importer
protocol changes can be adopted from the importlib modifications.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Sat Jul  9 10:13:47 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 9 Jul 2011 18:13:47 +1000
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <20110708195157.335043A404D@sparrow.telecommunity.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
Message-ID: <CADiSq7cAnaPwo8+uK3tza6RnLp8Hsf0FOP8Ej=zwRAw9ugZqfA@mail.gmail.com>

Nice write up!

Barry covered most things, just a few minor comments below.

On Sat, Jul 9, 2011 at 5:51 AM, P.J. Eby <pje at telecommunity.com> wrote:
> Vendor Package
>    A group of files installed by an operating system's packaging
>    mechanism (e.g. Debian or Redhat packages installed on Linux
>    systems).

s/Redhat/RPM/

(or Red Hat. Either works, but Redhat is wrong)

> * A new ``extend_namespaces(path_entry)`` function, to extend existing
>  and already-imported namespace packages' ``__path__`` attributes to
>  include any portions found in a new ``sys.path`` entry.  This
>  function should be called by applications extending ``sys.path``
>  at runtime, e.g. to include a plugin directory or add an egg to the
>  path.
>
>  The implementation of this function does a simple breadth-first walk
>  of ``sys.namespace_packages``, and performs any necessary
>  ``namespace_subpath()`` calls to identify what path entries need to
>  be added to each package's ``__path__``, given that `path_entry`
>  has been added to ``sys.path``.

I believe this may need a "parent=''" argument so it can also be used
to extend a package path.
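
As a rough sketch of the traversal with such a ``parent`` argument (an assumption on my part, not what the draft specifies): ``sys.namespace_packages`` does not exist yet, so the set is passed in explicitly, and ``walk_namespaces`` is a made-up name for the breadth-first walk that ``extend_namespaces()`` would need.

```python
from collections import deque


def iter_namespaces(namespace_packages, parent=""):
    """Yield the direct child namespace packages of `parent`.

    `namespace_packages` stands in for the proposed
    ``sys.namespace_packages`` set.
    """
    prefix = parent + "." if parent else ""
    for name in sorted(namespace_packages):
        rest = name[len(prefix):]
        if name.startswith(prefix) and rest and "." not in rest:
            yield name


def walk_namespaces(namespace_packages, parent=""):
    """Breadth-first walk starting below `parent` (hypothetical helper)."""
    queue = deque(iter_namespaces(namespace_packages, parent))
    while queue:
        name = queue.popleft()
        yield name
        queue.extend(iter_namespaces(namespace_packages, name))


registered = {"peak", "zope", "zope.app", "zope.products", "zope.foo.bar"}

children = list(iter_namespaces(registered, "zope"))
print(children)

walked = list(walk_namespaces(registered))
print(walked)
```

In this sketch the walk only reaches names whose parents are also registered, so ``zope.foo.bar`` is never visited because ``zope.foo`` is not in the set.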

> For users, developers, and distributors of namespace packages:
>
> * ``sys.namespace_packages`` is allowed to contain non-existent or
>  not-yet-imported package names; code that uses its contents should
>  not assume that every name in this set is also present in
>  sys.packages or that importing the name will necessarily succeed.

s/sys.packages/sys.modules/

>
> * ``*.ns`` files must be empty or contain only ASCII whitespace
>  characters.  This leaves open the possibility for future extension
>  to the format.

+1 for Barry's suggestion to mandate # as a comment prefix and
disallow any other contents (even though the interpreter itself won't
enforce that).
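
Since the interpreter would not check the contents, any enforcement would fall to packaging tools or linters.  A minimal sketch of such a check follows; ``check_ns_text`` is a hypothetical name, and allowing leading whitespace before the ``#`` is my assumption rather than something settled in this thread.

```python
ASCII_WHITESPACE = " \t\r\n\f\v"


def check_ns_text(text):
    """Return (line_number, line) pairs that are neither blank nor comments.

    Hypothetical lint helper; the import machinery itself would never
    open the file.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip(ASCII_WHITESPACE)
        if stripped and not stripped.startswith("#"):
            problems.append((number, line))
    return problems


print(check_ns_text(""))
print(check_ns_text("# Copyright notice\n\n   \n"))
print(check_ns_text("# ok\nimport os\n"))
```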

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ericsnowcurrently at gmail.com  Sat Jul  9 10:42:13 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sat, 9 Jul 2011 02:42:13 -0600
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <20110708195157.335043A404D@sparrow.telecommunity.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
Message-ID: <CALFfu7CPh9o07FBM-FcYZ4sK+c7erkCPC8CgJatVWABkfZR46Q@mail.gmail.com>

Thanks for working on this.  It's looking good.

A couple of questions inline.  Apologies ahead of time if my ignorance
shows too loudly. :)


On Fri, Jul 8, 2011 at 1:51 PM, P.J. Eby <pje at telecommunity.com> wrote:
>
> PEP \302 Extension
> ------------------
>
> The existing PEP 302 protocol is to be extended to handle namespace
> package portion directories, by adding a new importer method,
> ``namespace_subpath(fullname)``.  An implementation of this method
> will be added to all applicable importer classes distributed with
> Python, including those in ``pkgutil`` and ``zipimport``).
>
> (Note: any other importer wishing to support namespace packages must
> provide its own implementation of this method as well.  If an importer
> does not have a ``namespace_subpath()`` method, it will be treated as
> if it *did* have the method, but it returned ``None`` when called.)
>
> This new method is called just before the importer's ``find_module()``
> is normally invoked.  If the importer determines that `fullname` is
> a namespace package portion under its jurisdiction, then the importer
> returns an importer-specific path to that namespace portion.
>
> For example, if a standard filesystem path importer for the path
> ``/usr/lib/site-packages`` is about to be asked to import ``zope``,
> and there is a ``/usr/lib/site-packages/zope`` directory containing
> any files ending with ``.ns``, a call to ``namespace_subpath("zope")``
> on that importer should return ``"/usr/lib/site-packages/zope"``.
>

And if there were a "zope_part1" and a "zope_part2" directory, both
with a zope.ns file in them, that namespace_subpath("zope") call would
return ["/usr/lib/site-packages/zope_part1",
"/usr/lib/site-packages/zope_part2"], right?  And if both also had a
foo.ns file in them, the same would be returned for
namespace_subpath("foo").

> However, if there is no such subdirectory, or it does *not* contain
> any files whose names end with ``.ns``, that importer would return
> ``None`` instead.
>
> The Python import machinery will call this method on each importer
> corresponding to a path entry in ``sys.path`` (for top-level imports)
> or in a parent package ``__path__`` (for subpackage imports).
>
> If a normal package or module is found before a namespace package,
> importing proceeds according to the normal PEP 302 protocol.  (That
> is, a loader object is simply asked to load the located module or
> package.)
>
> However, if a namespace package portion is found (i.e., an importer's
> ``namespace_subpath()`` returns a string), then the normal import
> search stops, and a namespace package is created instead.
>
> The import machinery continues iterating over importers and calling
> ``namespace_subpath()`` on them, but it does **not** continue calling
> ``find_module()`` on them.  Instead, it accumulates any strings
> returned by the subpath calls, in order to assemble a ``__path__``
> for the package being imported.
>
> (Note that this implies that any non-namespace packages with the same
> name are skipped, and not included in the resulting package's
> ``__path__``.  In other words, a namespace package's initial
> ``__path__`` only includes namespace portions, never non-namespace
> package directories.)
>
> Once this ``__path__`` has been assembled, a module is created, and
> its ``__path__`` attribute is set.  The package's name is then added
> to ``sys.namespace_packages`` -- a set of package names.
>
> Finally, the ``__init__`` module code for the package (if it exists)
> is located and executed in the new module's namespace.
>
> Each importer that returns a ``namespace_subpath()`` for the package
> is asked to perform a standard ``find_module()`` for the package.
> Since by the normal import rules, a directory containing an
> ``__init__`` module is a package, this call should succeed if the
> namespace package portion contains an ``__init__`` module, and the
> importing can proceed normally from that point.
>

Is this last paragraph part of the finally?  If so, what does calling
find_module at this point accomplish?  Do you mean load_module is also
called for each that is found?  Will it be too easy (or conversely
very likely) to have __init__.py collisions?

> There is one caveat, however.  The importers currently distributed
> with Python expect that *they* will be the ones to initialize the
> ``__path__`` attribute, which means that they must be changed to
> either recognize that ``__path__`` has already been set and not
> change it, or to handle namespace packages specially (e.g., via an
> internal flag or checking ``sys.namespace_packages``).
>
> Similarly, any third-party importers wishing to support namespace
> packages must make similar changes.
>

Seems like the caveat is dependent on the above algorithm.  If the
module's __path__ were set with the namespace_subpath() results after
the namespace package's import was all over, would it still be an
issue?  Most of this section reads like "this is how people should
expect the implementation to look".  However, I'm fine with that if
this is how the implementation should look.  :)

> (NOTE: in general, it goes against the design of PEP 302 for a loader
> object to assume that it is always creating the module object or that
> the module it is operating on is empty.  Making this assumption can
> result in code that breaks the normal operation of the ``reload()``
> builtin and any specialized tools that rely on it, such as lazy
> importers, automatic reloaders, and so on.)
>
>
> Standard Library Changes/Additions
> ----------------------------------
>
> The ``pkgutil`` module should be updated to handle this
> specification appropriately, including any necessary changes to
> ``extend_path()``, ``iter_modules()``, etc.  A new generic API for
> calling ``namespace_subpath()`` on importers should be added as well.
>
> Specifically the proposed changes and additions are:

Maybe, "Specifically the proposed changes and additions to pkgutil
are:", to clarify the context?

>
> * A new ``namespace_subpath(importer, fullname)`` generic, allowing
>  implementations to be registered for existing importers.
>
> * A new ``extend_namespaces(path_entry)`` function, to extend existing
>  and already-imported namespace packages' ``__path__`` attributes to
>  include any portions found in a new ``sys.path`` entry.  This
>  function should be called by applications extending ``sys.path``
>  at runtime, e.g. to include a plugin directory or add an egg to the
>  path.
>
>  The implementation of this function does a simple breadth-first walk
>  of ``sys.namespace_packages``, and performs any necessary
>  ``namespace_subpath()`` calls to identify what path entries need to
>  be added to each package's ``__path__``, given that `path_entry`
>  has been added to ``sys.path``.
>

Does the same apply to namespace sub-packages where their parent
package has an updated __path__?  So a recursion would take place in
some cases.

> * A new ``iter_namespaces(parent='')`` function to allow breadth-first
>  traversal of namespaces in ``sys.namespace_packages``, by yielding
>  the child namespace packages of `parent`.  For example, calling
>  ``iter_namespaces("zope")`` might yield ``zope.app`` and
>  ``zope.products`` (if they are namespace packages registered in
>  ``sys.namespace_packages``), but **not** ``zope.foo.bar``.
>  This function is needed to implement ``extend_namespaces()``, but
>  is potentially useful to others.
>
> * ``ImpImporter.iter_modules()`` should be changed to also detect and
>  yield the names of namespace package portions.
>
> In addition to the above changes, the ``zipimport`` importer should
> have its ``iter_modules()`` implementation similarly changed.  (Note:
> current versions of Python implement this via a shim in ``pkgutil``,
> so technically this is also a change to ``pkgutil``.)
>
>
> Implementation Notes
> --------------------
>
> For users, developers, and distributors of namespace packages:
>
> * ``sys.namespace_packages`` is allowed to contain non-existent or
>  not-yet-imported package names; code that uses its contents should
>  not assume that every name in this set is also present in
>  sys.packages or that importing the name will necessarily succeed.
>
> * ``*.ns`` files must be empty or contain only ASCII whitespace
>  characters.  This leaves open the possibility for future extension
>  to the format.
>
> * Files contained within a namespace package portion directory must
>  be *unique* to that portion, so that the portion can be distributed
>  as a vendor package without any filename overlap.  This applies to
>  modules and data files as well as ``*.ns`` files.
>
>  (For ``*.ns`` files themselves, uniqueness can be achieved simply by
>  giving them a name based on the distribution that contains the file,
>  and it is recommended that packaging tools support doing this
>  automatically.)
>
> * Although this PEP supports the use of non-empty ``__init__`` modules
>  in namespace packages, their usage is controversial.  If more than
>  one package portion contains an ``__init__`` module, at most one of
>  them will be executed, possibly leading to silent errors.
>

As noted above, the implementation outlined in the "PEP \302
Extension" section seems ambiguous on this point.  However, I think
this bullet does a great job clarifying about __init__.py modules.

>  Therefore, if you must include an ``__init__`` module in your
>  namespace package, make sure that it is provided by exactly **one**
>  distribution, and that all other distributions using that module's
>  contents are defined so as to have an installation dependency on
>  the distribution containing the ``__init__`` module.  Otherwise,
>  it may not be present in some installations.
>
>  (Note: for historical reasons, existing namespace packages nearly
>  always include ``__init__`` modules, but they are usually empty
>  except for code to declare the package a namespace.  Under this
>  proposal, these nearly-empty modules could and should be replaced
>  by an empty ``*.ns`` file in the package directory.)
>
> For those implementing PEP 302 importer objects:
>
> * Importers that support the ``iter_modules()`` method and want to add
>  namespace support should modify their ``iter_modules()``
>  method so that it discovers and lists namespace packages as well as
>  standard modules and packages.
>

The iter_modules() method isn't part of PEP 302, is it?  Where can I
find out more about it?

> * For implementation efficiency, an importer is allowed to cache
>  information (such as whether a directory exists and whether an
>  ``__init__`` module is present in it) between the invocation of a
>  ``namespace_subpath()`` call and a subsequent ``find_module()`` call
>  for the same name.
>
>  It should, however, avoid retaining such cached information for any
>  longer than the next method call, and it should also verify that the
>  request is in fact for the same module/package name, as it is not
>  guaranteed that a ``namespace_subpath()`` call will always be
>  followed by a matching ``find_module()`` call.  (After all, an
>  ``__init__`` module may already have been supplied by an earlier
>  importer on the path.)
>
> * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
>  not need to implement ``namespace_subpath()``, because the method
>  is only called on importers corresponding to ``sys.path`` entries.

And parent.__path__ for namespace submodules?

>  If a meta importer wishes to support namespace packages, it must
>  do so entirely within its ``find_module()`` implementation.
>
>  Unfortunately, it is unlikely that any such implementation will be
>  able to merge its namespace portions with those of other meta
>  importers or ``sys.path`` importers, so the meaning of "supporting
>  namespace packages" for a meta importer is currently undefined.
>

While I'm not sure meta importers need to be left out, I suppose it
isn't critical since the work-around isn't that hard, nor widely
needed.  Thus the message here is that this PEP only applies to the
use of sys.path_hooks and sys.path_importer_cache.  It would be nice
for that to be clear up front.

>...
>
>  Further, there is an immense body of existing code (including the
>  distutils and many other packaging tools) that expect a package
>  directory's name to be the same as the package name.

Correct me if I'm wrong, but I have understood that for namespace
packages in the PEP, the directory name does not have to be the
package name.


Back to namespace subpackages, it's unclear how they should work.
Either a namespace package is at the top level of a sys.path entry, or
it's a module of a parent package, namespace or otherwise.  The
top-level case is pretty clear.  However, the subpackage case is not.
I don't see namespace subpackages as being too practical with
non-namespace parent packages, but I'm probably missing something.

In the case that a namespace subpackage has a namespace package
parent, how would that look?  In the email to Barry you gave an
example that covered this a little, but it's still pretty unclear.  In
any case, I think more examples with namespace subpackages would be
helpful.  Or maybe namespace subpackages are a corner case that
doesn't deserve the keystrokes I've given it.  ;)

Other than that, the PEP is pretty clear (coming from a less
experienced perspective).  Thanks again for working on this!

-eric

From ncoghlan at gmail.com  Sat Jul  9 11:07:14 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 9 Jul 2011 19:07:14 +1000
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <CALFfu7CPh9o07FBM-FcYZ4sK+c7erkCPC8CgJatVWABkfZR46Q@mail.gmail.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
	<CALFfu7CPh9o07FBM-FcYZ4sK+c7erkCPC8CgJatVWABkfZR46Q@mail.gmail.com>
Message-ID: <CADiSq7eMeZY31NnUyMHSW2GMhvTQLERtVb+TJtajOJJZ=bn1mg@mail.gmail.com>

On Sat, Jul 9, 2011 at 6:42 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> Correct me if I'm wrong, but I have understood that for namespace
> packages in the PEP, the directory name does not have to be the
> package name.

No, if the directory name doesn't match, the interpreter won't even
check it for __init__.py or .ns files, so there's no way for it to
satisfy an import request (remember, the .ns files don't live directly
in directories on sys.path - they live in *subdirectories* of those
directories)

> Back to namespace subpackages, it's unclear how they should work.
> Either a namespace package is at the top level of a sys.path entry, or
> it's a module of a parent package, namespace or otherwise.  The
> top-level case is pretty clear.  However, the subpackage case is not.
> I don't see namespace subpackages as being too practical with
> non-namespace parent packages, but I'm probably missing something.
>
> In the case that a namespace subpackage has a namespace package
> parent, how would that look?  In the email to Barry you gave an
> example that covered this a little, but it's still pretty unclear.  In
> any case, I think more examples with namespace subpackages would be
> helpful.  Or maybe namespace subpackages are a corner case that
> doesn't deserve the keystrokes I've given it.  ;)

The subpackage import is just a scaled down version of top-level
imports, with pkg.__path__ taking on the role of sys.path.

Now, in normal circumstances, it's a pretty degenerate case with
pkg.__path__ containing only a single directory (the directory where
the __init__.py file lives).

Namespace packages (either created via PEP 382 or one of the existing
namespace package systems) are one way to get multiple entries into
pkg.__path__, but not the only way (__init__.py can do it, as can
other application code).

Regardless of how it happens, the process of handling it under PEP 382
is the same as it is for top-level imports - once the import machinery
sees a .ns file in a directory that matches the current import inside
that package, it then scans the rest of the pkg.__path__ entries
looking for more directories that also contain .ns files, adding all
those directories to pkg.subpkg.__path__
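
As a rough illustration of the scan described above (a sketch only, not
code from the PEP or from any reference implementation; the helper names
has_marker and collect_portions are made up for this example), the same
routine can serve both cases: pass sys.path for a top-level import, or
pkg.__path__ for a subpackage import.

```python
import os


def has_marker(directory, suffix=".ns"):
    """Return True if `directory` contains at least one marker file."""
    return any(entry.endswith(suffix) for entry in os.listdir(directory))


def collect_portions(tail, search_path, suffix=".ns"):
    """Collect every `<entry>/<tail>` directory that holds a marker file.

    `search_path` plays the role of sys.path for a top-level import, or
    of the parent package's __path__ for a subpackage import.  `tail` is
    the last component of the dotted name being imported.
    """
    portions = []
    for entry in search_path:
        candidate = os.path.join(entry, tail)
        # Directories without a marker (e.g. regular packages) are skipped.
        if os.path.isdir(candidate) and has_marker(candidate, suffix):
            portions.append(candidate)
    return portions
```

Under these assumptions, pkg.subpkg.__path__ would be the result of
collect_portions("subpkg", pkg.__path__).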

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From pje at telecommunity.com  Sat Jul  9 20:52:49 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Sat, 09 Jul 2011 14:52:49 -0400
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <CADiSq7cAnaPwo8+uK3tza6RnLp8Hsf0FOP8Ej=zwRAw9ugZqfA@mail.gmail.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
	<CADiSq7cAnaPwo8+uK3tza6RnLp8Hsf0FOP8Ej=zwRAw9ugZqfA@mail.gmail.com>
Message-ID: <20110709185310.280C33A404D@sparrow.telecommunity.com>

At 06:13 PM 7/9/2011 +1000, Nick Coghlan wrote:
>Nice write up!

Thanks.

>Barry covered most things, just a few minor comments below.
>
>On Sat, Jul 9, 2011 at 5:51 AM, P.J. Eby <pje at telecommunity.com> wrote:
> > Vendor Package
> >    A group of files installed by an operating system's packaging
> >    mechanism (e.g. Debian or Redhat packages installed on Linux
> >    systems).
>
>s/Redhat/RPM/
>
>(or Red Hat. Either works, but Redhat is wrong)

Done.


> > * A new ``extend_namespaces(path_entry)`` function, to extend existing
> >  and already-imported namespace packages' ``__path__`` attributes to
> >  include any portions found in a new ``sys.path`` entry.  This
> >  function should be called by applications extending ``sys.path``
> >  at runtime, e.g. to include a plugin directory or add an egg to the
> >  path.
>
>I believe this may need a "parent=''" argument so it can also be used
>to extend a package path.

Yes, it does; see the sketch here:  http://pastebin.com/G7fdFG2V

I just left that bit out of the spec as an extra detail that would 
need explaining; I see it as really being internal to the API being 
provided in that case.



> > For users, developers, and distributors of namespace packages:
> >
> > * ``sys.namespace_packages`` is allowed to contain non-existent or
> >  not-yet-imported package names; code that uses its contents should
> >  not assume that every name in this set is also present in
> >  sys.packages or that importing the name will necessarily succeed.
>
>s/sys.packages/sys.modules/

Got it.


> > * ``*.ns`` files must be empty or contain only ASCII whitespace
> >  characters.  This leaves open the possibility for future extension
> >  to the format.
>
>+1 for Barry's suggestion to mandate # as a comment prefix and
>disallow any other contents (even though the interpreter itself won't
>enforce that).

Yeah...  Honestly, the more we talk about all that, the more inclined I 
am to say that it's a zero-length file, just to avoid more spec detail.  ;-)

I just don't know if there are any issues with packaging or revision 
control systems for zero length files, not to mention whether there 
are OSes where it's hard to make a zero-length file.


From pje at telecommunity.com  Sat Jul  9 23:20:30 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Sat, 09 Jul 2011 17:20:30 -0400
Subject: [Import-SIG] Is ".ns" really the right extension?
Message-ID: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>

Looking over the example code I added to the PEP draft (based on 
Barry's suggestion), it occurs to me that, like his example, mine is 
still confusing.

And, now that I look more closely at it, I see that the confusion in 
large part comes from the idea of naming something "ThisPart.ns" -- 
it implies that "ThisPart" is the namespace!

And it's not a namespace at all.  It's really a portion of the namespace.

It seems to me that the actual meaning of a foo.ns file is, "The 
'foo' portion of this namespace is installed here".  And thus, 
foo.portion or foo.part or foo.contribution or something like that 
would be more appropriate, given the PEP terminology.

I think that a change is needed here to make the PEP's narrative come 
together more cleanly.  I'm leaning towards calling them foo.contrib 
files, as in "The 'foo' distribution contributed to this portion of 
the enclosing package."

(Among other things, this makes the need for repeated files clearer; 
i.e., you add a contribution marker to each package directory you're 
putting files or subdirectories into.)

Overall, the narrative can then lose the constant references to *.ns 
files and instead talk about contribution markers -- i.e. a namespace 
package portion is a directory containing one or more contribution 
markers.  I think this will be clearer than the current text, and in 
particular it should make the example directory layout more meaningful to read.

Notice, too, that Eric Snow's confusion about how .ns files work 
seems to have been influenced by the terminology -- i.e., the 
expectation that a 'zope.ns' file was talking about a 'zope' 
namespace package and identifying the containing directory as part of 
the namespace, rather than the other way around.  Was that the case, 
Eric?  And if so, do you think that these layouts are any clearer?


     ProxyTypes-0.9.tgz:
         peak/
             ProxyTypes.contrib <- marks this as a namespace package portion
             util/
                 ProxyTypes.contrib   <- same for 'peak.util'
                 proxies.py

     Importing-1.10.tgz:
         peak/
             Importing.contrib   <- marks this as a namespace package portion
             util/
                 Importing.contrib   <- same for 'peak.util'
                 imports.py


     site-packages/   (or wherever)
         peak/
             Importing.contrib
             ProxyTypes.contrib   <- both distributions' contributions are merged
             util/
                 Importing.contrib   <- at both levels
                 ProxyTypes.contrib
                 imports.py
                 proxies.py

Any other thoughts?


From pje at telecommunity.com  Sat Jul  9 23:49:49 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Sat, 09 Jul 2011 17:49:49 -0400
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <CALFfu7CPh9o07FBM-FcYZ4sK+c7erkCPC8CgJatVWABkfZR46Q@mail.gmail.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
	<CALFfu7CPh9o07FBM-FcYZ4sK+c7erkCPC8CgJatVWABkfZR46Q@mail.gmail.com>
Message-ID: <20110709215010.43C7D3A404D@sparrow.telecommunity.com>

At 02:42 AM 7/9/2011 -0600, Eric Snow wrote:
>And if there were a "zope_part1" and a "zope_part2" directory, both
>with a zope.ns file in them, that namespace_subpath("zope") call would
>return ["/usr/lib/site-packages/zope_part1",
>"/usr/lib/site-packages/zope_part2"], right?  And if both also had a
>foo.ns file in them, the same would be returned for
>namespace_subpath("foo").

No; the directory is always named for the package, just like 
now.  We're just saying that we replace looking for __init__.py with 
looking for *.ns.
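
To make that concrete, here is a minimal sketch of what a filesystem
importer's namespace_subpath() might look like under the draft as
written.  The class name DirectoryImporterSketch and the helper
call_namespace_subpath are hypothetical names for illustration; nothing
here is taken from an actual implementation.

```python
import os


class DirectoryImporterSketch:
    """Hypothetical stand-in for a filesystem path importer."""

    def __init__(self, path_entry):
        self.path_entry = path_entry

    def namespace_subpath(self, fullname):
        # The directory is always named for the package itself, so only
        # the last component of the dotted name is looked up.
        tail = fullname.rpartition(".")[2]
        candidate = os.path.join(self.path_entry, tail)
        if not os.path.isdir(candidate):
            return None
        # Any file ending in ".ns" marks the directory as a portion.
        if any(name.endswith(".ns") for name in os.listdir(candidate)):
            return candidate
        return None


def call_namespace_subpath(importer, fullname):
    """Treat an importer lacking the method as if it returned None."""
    method = getattr(importer, "namespace_subpath", None)
    return method(fullname) if method is not None else None
```

With this sketch, a "zope_part1" directory containing a zope.ns file is
never consulted for "zope": only a directory literally named "zope" is.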


> > Finally, the ``__init__`` module code for the package (if it exists)
> > is located and executed in the new module's namespace.
> >
> > Each importer that returns a ``namespace_subpath()`` for the package
> > is asked to perform a standard ``find_module()`` for the package.
> > Since by the normal import rules, a directory containing an
> > ``__init__`` module is a package, this call should succeed if the
> > namespace package portion contains an ``__init__`` module, and the
> > importing can proceed normally from that point.
> >
>
>Is this last paragraph part of the finally?

Yes.

>   If so, what does calling
>find_module at this point accomplish?  Do you mean load_module is also
>called for each that is found?

I'm adding this sentence to the end of that paragraph for clarification:

"""(That is, with a ``load_module()`` call to execute the first 
``__init__`` module found on the package's ``__path__``.)"""

Does that make it clearer?


>Will it be too easy (or conversely
>very likely) to have __init__.py collisions?

__init__ collisions (and the recommendation to not use __init__ 
modules at all) are addressed later below in the implementation notes.



> > There is one caveat, however.  The importers currently distributed
> > with Python expect that *they* will be the ones to initialize the
> > ``__path__`` attribute, which means that they must be changed to
> > either recognize that ``__path__`` has already been set and not
> > change it, or to handle namespace packages specially (e.g., via an
> > internal flag or checking ``sys.namespace_packages``).
> >
> > Similarly, any third-party importers wishing to support namespace
> > packages must make similar changes.
> >
>
>Seems like the caveat is dependent on the above algorithm.  If the
>module's __path__ were set with the namespace_subpath() results after
>the namespace package's import was all over, would it still be an
>issue?

No, but then we couldn't support __init__ modules executing with the 
correct __path__ value; notably, this would prevent __init__ modules 
from manipulating their own __path__.
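
A small sketch of the ordering involved, assuming the __init__ code is
available as a source string (the function create_namespace_module and
its parameters are invented for this example, and the registry argument
merely stands in for the proposed sys.namespace_packages set):

```python
import types


def create_namespace_module(fullname, portions, init_source=None, registry=None):
    """Create a module whose __path__ is set before any __init__ code runs."""
    module = types.ModuleType(fullname)
    # __path__ is assembled first, from the accumulated subpath strings.
    module.__path__ = list(portions)
    if registry is not None:
        registry.add(fullname)
    if init_source is not None:
        # The __init__ code runs in the module's own namespace, so it
        # sees (and may modify) the already-assembled __path__.
        code = compile(init_source, "<%s.__init__>" % fullname, "exec")
        exec(code, module.__dict__)
    return module
```

Setting __path__ only after the __init__ code has run would mean that
code could neither read nor extend the final value.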

Honestly, throwing out __init__ support entirely would make a LOT of 
things easier and simpler here, especially in the 2.x version.  But 
there was a vocal contingent of support for them in the original 
Python-Dev discussion.


> > Specifically the proposed changes and additions are:
>
>Maybe, "Specifically the proposed changes and additions to pkgutil
>are:", to clarify the context?

Ok.

> >
> > * A new ``namespace_subpath(importer, fullname)`` generic, allowing
> >  implementations to be registered for existing importers.
> >
> > * A new ``extend_namespaces(path_entry)`` function, to extend existing
> >  and already-imported namespace packages' ``__path__`` attributes to
> >  include any portions found in a new ``sys.path`` entry.  This
> >  function should be called by applications extending ``sys.path``
> >  at runtime, e.g. to include a plugin directory or add an egg to the
> >  path.
> >
> >  The implementation of this function does a simple breadth-first walk
> >  of ``sys.namespace_packages``, and performs any necessary
> >  ``namespace_subpath()`` calls to identify what path entries need to
> >  be added to each package's ``__path__``, given that `path_entry`
> >  has been added to ``sys.path``.
> >
>
>Does the same apply to namespace sub-packages where their parent
>package has an updated __path__?  So a recursion would take place in
>some cases.

Yes, that's what "breadth-first" meant here; i.e., first top-level 
namespaces, then second-level namespaces, and so on.  Actually, I 
erred by saying breadth-first; what I meant is technically a 
"pre-order traversal", i.e., parent nodes are touched before their 
children.  I'll tweak that to "top-down traversal" instead of 
"breadth-first walk", and add:

"""(Or, in the case of sub-packages, adding a derived subpath entry, 
based on their parent namespace's ``__path__``.)"""
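
For illustration only, the top-down ordering could be sketched like
this.  This is not the pastebin sketch mentioned elsewhere in the
thread; the set of registered names is passed in explicitly because
sys.namespace_packages is only proposed, and walk_top_down is a
hypothetical helper name.

```python
def iter_namespaces(namespace_packages, parent=""):
    """Yield the direct child namespace packages of `parent`, sorted."""
    prefix = parent + "." if parent else ""
    for name in sorted(namespace_packages):
        rest = name[len(prefix):]
        # A direct child has exactly one more name component than parent.
        if name.startswith(prefix) and rest and "." not in rest:
            yield name


def walk_top_down(namespace_packages, parent=""):
    """Yield registered namespaces so that parents precede children."""
    for name in iter_namespaces(namespace_packages, parent):
        yield name
        for child in walk_top_down(namespace_packages, name):
            yield child
```

Note that in this sketch a name such as "zope.foo.bar" is only reached
if "zope.foo" is itself registered, since the walk descends one level
at a time.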


>The iter_modules() method isn't part of PEP 302, is it?  Where can I
>find out more about it?

See pkgutil; it's something I added in Python 2.5 to help tools like 
pydoc better support zipfiles and other exotic importers.


> > * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
> >  not need to implement ``namespace_subpath()``, because the method
> >  is only called on importers corresponding to ``sys.path`` entries.
>
>And parent.__path__ for namespace submodules?

Yes.  Fixed.


> >  If a meta importer wishes to support namespace packages, it must
> >  do so entirely within its ``find_module()`` implementation.
> >
> >  Unfortunately, it is unlikely that any such implementation will be
> >  able to merge its namespace portions with those of other meta
> >  importers or ``sys.path`` importers, so the meaning of "supporting
> >  namespace packages" for a meta importer is currently undefined.
> >
>
>While I'm not sure meta importers need to be left out, I suppose it
>isn't critical since the work-around isn't that hard, nor widely
>needed.  Thus the message here is that this PEP only applies to the
>use of sys.path_hooks and sys.path_importer_cache.  It would be nice
>for that to be clear up front.

Ok, I added this:

"""(Note: the import machinery will NOT invoke this method for importers
on ``sys.meta_path``, because there is no path string associated with
such importers, and so the idea of a "subpath" makes no sense in that
case.)"""

just after this bit:

"""The Python import machinery will call this method on each importer
corresponding to a path entry in ``sys.path`` (for top-level imports)
or in a parent package ``__path__`` (for subpackage imports)."""

in the PEP 302 protocol description.



> >  Further, there is an immense body of existing code (including the
> >  distutils and many other packaging tools) that expect a package
> >  directory's name to be the same as the package name.
>
>Correct me if I'm wrong, but I have understood that for namespace
>packages in the PEP, the directory name does not have to be the
>package name.

Consider yourself corrected.  ;-)


>Back to namespace subpackages, it's unclear how they should work.
>Either a namespace package is at the top level of a sys.path entry, or
>it's a module of a parent package, namespace or otherwise.  The
>top-level case is pretty clear.  However, the subpackage case is not.
>I don't see namespace subpackages as being too practical with
>non-namespace parent packages, but I'm probably missing something.

They aren't practical at all, no.  ;-)  I'll add an implementation 
note explaining that even though the spec doesn't require a namespace 
package's parent to also be a namespace, there isn't any practical 
use in putting one inside a non-namespace parent, as the child 
__path__ is a collection of subpaths derived from the parent 
__path__, and thus it wouldn't combine with any other contributions 
that weren't installed to the same location.

Here's the text:

* In general, a namespace subpackage (e.g. ``peak.util``, ``zope.app``,
   etc.) must be a child of another namespace package (e.g. ``peak``,
   ``zope``, etc.).  This is not required by the spec or enforced by
   the implementation, but in practice, it is useless to put a
   namespace package inside a non-namespace package, as the child
   package's ``__path__`` will be a subset of the parent's.

   In other words, it will only work correctly if all the contributions
   to that namespace package are installed to the same physical
   location.  So, if you intend to use a namespace subpackage, you
   should always make its parent package a namespace as well.



From ericsnowcurrently at gmail.com  Sun Jul 10 00:30:28 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sat, 9 Jul 2011 16:30:28 -0600
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <20110709215010.43C7D3A404D@sparrow.telecommunity.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
	<CALFfu7CPh9o07FBM-FcYZ4sK+c7erkCPC8CgJatVWABkfZR46Q@mail.gmail.com>
	<20110709215010.43C7D3A404D@sparrow.telecommunity.com>
Message-ID: <CALFfu7CTg9D7WrkajfvQ6_vxLXbpfUa_kvpKCoXqga+62smVCA@mail.gmail.com>

On Sat, Jul 9, 2011 at 3:49 PM, P.J. Eby <pje at telecommunity.com> wrote:
> At 02:42 AM 7/9/2011 -0600, Eric Snow wrote:
>>
>> And if there were a "zope_part1" and a "zope_part2" directory, both
>> with a zope.ns file in them, that namespace_subpath("zope") call would
>> return ["/usr/lib/site-packages/zope_part1",
>> "/usr/lib/site-packages/zope_part2"], right?  And if both also had a
>> foo.ns file in them, the same would be returned for
>> namespace_subpath("foo").
>
> No; the directory is always named for the package, just like now.  We're
> just saying that we replace looking for __init__.py with looking for *.ns.
>
>
>> > Finally, the ``__init__`` module code for the package (if it exists)
>> > is located and executed in the new module's namespace.
>> >
>> > Each importer that returns a ``namespace_subpath()`` for the package
>> > is asked to perform a standard ``find_module()`` for the package.
>> > Since by the normal import rules, a directory containing an
>> > ``__init__`` module is a package, this call should succeed if the
>> > namespace package portion contains an ``__init__`` module, and the
>> > importing can proceed normally from that point.
>> >
>>
>> Is this last paragraph part of the finally?
>
> Yes.
>
>>  If so, what does calling
>> find_module at this point accomplish?  Do you mean load_module is also
>> called for each that is found?
>
> I'm adding this sentence to the end of that paragraph for clarification:
>
> """(That is, with a ``load_module()`` call to execute the first ``__init__``
> module found on the package's ``__path__``.)"""
>
> Does that make it clearer?
>

Yeah, that's great.

>
>> Will it be too easy (or conversely
>> very likely) to have __init__.py collisions?
>
> __init__ collisions (and the recommendation to not use __init__ modules at
> all) are addressed later below in the implementation notes.
>
>
>
>> > There is one caveat, however.  The importers currently distributed
>> > with Python expect that *they* will be the ones to initialize the
>> > ``__path__`` attribute, which means that they must be changed to
>> > either recognize that ``__path__`` has already been set and not
>> > change it, or to handle namespace packages specially (e.g., via an
>> > internal flag or checking ``sys.namespace_packages``).
>> >
>> > Similarly, any third-party importers wishing to support namespace
>> > packages must make similar changes.
>> >
>>
>> Seems like the caveat is dependent on the above algorithm.  If the
>> module's __path__ were set with the namespace_subpath() results after
>> the namespace package's import was all over, would it still be an
>> issue?
>
> No, but then we couldn't support __init__ modules executing with the correct
> __path__ value; notably, this would prevent __init__ modules from
> manipulating their own __path__.
>

Good point.

> Honestly, throwing out __init__ support entirely would make a LOT of things
> easier and simpler here, especially in the 2.x version.  But there was a
> vocal contingent of support for them in the original Python-Dev discussion.
>
>
>> > Specifically the proposed changes and additions are:
>>
>> Maybe, "Specifically the proposed changes and additions to pkgutil
>> are:", to clarify the context?
>
> Ok.
>
>> >
>> > * A new ``namespace_subpath(importer, fullname)`` generic, allowing
>> >   implementations to be registered for existing importers.
>> >
>> > * A new ``extend_namespaces(path_entry)`` function, to extend existing
>> >   and already-imported namespace packages' ``__path__`` attributes to
>> >   include any portions found in a new ``sys.path`` entry.  This
>> >   function should be called by applications extending ``sys.path``
>> >   at runtime, e.g. to include a plugin directory or add an egg to the
>> >   path.
>> >
>> >   The implementation of this function does a simple breadth-first walk
>> >   of ``sys.namespace_packages``, and performs any necessary
>> >   ``namespace_subpath()`` calls to identify what path entries need to
>> >   be added to each package's ``__path__``, given that `path_entry`
>> >   has been added to ``sys.path``.
>> >
>>
>> Does the same apply to namespace sub-packages where their parent
>> package has an updated __path__?  So a recursion would take place in
>> some cases.
>
> Yes, that's what "breadth-first" meant here; i.e., first top-level
> namespaces, then second-level namespaces, and so on.  In actuality, I erred
> by saying breadth-first, though; what I actually meant is technically
> "pre-order traversal", i.e., parent nodes are touched before their children.
>  I'll tweak that to "top-down traversal" instead of "breadth-first walk",
> and add:
>
> """(Or, in the case of sub-packages, adding a derived subpath entry, based
> on their parent namespace's ``__path__``.)"""
>

That helps a lot.
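
For illustration, here is a minimal sketch of the "parents before
children" ordering and of deriving a child's candidate subpaths from its
parent's __path__.  Both helper names (top_down, derived_subpaths) are
made up for this example and are not part of the draft; a real
implementation would still ask each importer's namespace_subpath()
whether a portion actually exists at the derived location.

```python
import posixpath


def top_down(namespace_names):
    """Order dotted package names so every parent precedes its children."""
    # Comparing the split name parts yields a pre-order traversal:
    # 'peak' sorts before 'peak.util', which sorts before 'zope'.
    return sorted(namespace_names, key=lambda name: name.split("."))


def derived_subpaths(parent_path, child_name):
    """Derive candidate __path__ entries for a child from its parent's __path__.

    Hypothetical helper: it only builds the candidate locations and does
    not check whether a portion is really installed there.
    """
    leaf = child_name.rpartition(".")[2]
    return [posixpath.join(entry, leaf) for entry in parent_path]
```

With the names from this thread, top_down(["zope.app", "peak.util",
"zope", "peak"]) visits peak, then peak.util, then zope, then zope.app.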

>
>> The iter_modules() method isn't part of PEP 302, is it?  Where can I
>> find out more about it?
>
> See pkgutil; it's something I added in Python 2.5 to help tools like pydoc
> better support zipfiles and other exotic importers.
>

Yeah, I see iter_modules() in pkgutil, but was unaware of it on
importer objects.  However, my ignorance is irrelevant to the PEP, as
I certainly agree with the bullet in the case that importer objects
have iter_modules().  :)

>
>> > * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
>> >   not need to implement ``namespace_subpath()``, because the method
>> >   is only called on importers corresponding to ``sys.path`` entries.
>>
>> And parent.__path__ for namespace submodules?
>
> Yes.  Fixed.
>
>
>> >   If a meta importer wishes to support namespace packages, it must
>> >   do so entirely within its ``find_module()`` implementation.
>> >
>> >   Unfortunately, it is unlikely that any such implementation will be
>> >   able to merge its namespace portions with those of other meta
>> >   importers or ``sys.path`` importers, so the meaning of "supporting
>> >   namespace packages" for a meta importer is currently undefined.
>> >
>>
>> While I'm not sure meta importers need to be left out, I suppose it
>> isn't critical since the work-around isn't that hard, nor widely
>> needed.  Thus the message here is that this PEP only applies to the
>> use of sys.path_hooks and sys.path_importer_cache.  It would be nice
>> for that to be clear up front.
>
> Ok, I added this:
>
> """(Note: the import machinery will NOT invoke this method for importers
> on ``sys.meta_path``, because there is no path string associated with
> such importers, and so the idea of a "subpath" makes no sense in that
> case.)"""
>
> just after this bit:
>
> """The Python import machinery will call this method on each importer
> corresponding to a path entry in ``sys.path`` (for top-level imports)
> or in a parent package ``__path__`` (for subpackage imports)."""
>
> in the PEP 302 protocol description.
>

Nice!  Would it be worth pointing out that the focus is on
sys.path_hooks and sys.path_importer_cache?  Something like "Note: ...
Instead, this PEP is focused on the import machinery surrounding
sys.path_hooks.)"  I only bring this up because the specificity of what
the focus **is** helped me grasp what the implementation involves.
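
As a rough sketch of that machinery: the importer for each path entry
is asked for its subpath, and the non-empty answers become the
package's __path__.  Everything here is an assumption made for
illustration: FakeImporter stands in for a real sys.path importer,
compute_namespace_path is a made-up name, and returning None for "no
portion here" is a guess at the protocol rather than something the
draft states.

```python
class FakeImporter:
    """Stand-in for a sys.path importer; not a real PEP 302 importer."""

    def __init__(self, path_entry, portions):
        self.path_entry = path_entry
        self.portions = set(portions)  # package names with a portion here

    def namespace_subpath(self, fullname):
        # Assumed convention: return a path string, or None if this
        # importer holds no portion of `fullname`.
        if fullname in self.portions:
            return self.path_entry + "/" + fullname.rpartition(".")[2]
        return None


def compute_namespace_path(fullname, importers):
    """Collect the subpaths reported by each importer, in path order."""
    path = []
    for importer in importers:
        method = getattr(importer, "namespace_subpath", None)
        if method is None:
            continue  # importer predates the protocol; skip it
        subpath = method(fullname)
        if subpath is not None:
            path.append(subpath)
    return path
```

Importers on sys.meta_path never enter this loop, since no path entry
maps to them.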

>
>
>> >   Further, there is an immense body of existing code (including the
>> >   distutils and many other packaging tools) that expects a package
>> >   directory's name to be the same as the package name.
>>
>> Correct me if I'm wrong, but I have understood that for namespace
>> packages in the PEP, the directory name does not have to be the
>> package name.
>
> Consider yourself corrected.  ;-)
>
>
>> Back to namespace subpackages, it's unclear how they should work.
>> Either a namespace package is at the top level of a sys.path entry, or
>> it's a module of a parent package, namespace or otherwise.  The
>> top-level case is pretty clear.  However, the subpackage case is not.
>> I don't see namespace subpackages as being too practical with
>> non-namespace parent packages, but I'm probably missing something.
>
> They aren't practical at all, no.  ;-)  I'll add an implementation note
> explaining that even though the spec doesn't require a namespace package's
> parent to also be a namespace, there isn't any practical use in having a
> non-namespace parent, as the child __path__ is a collection of subpaths
> derived from the parent __path__, and thus it wouldn't combine with any
> other contributions that weren't installed to the same location.
>
> Here's the text:
>
> * In general, a namespace subpackage (e.g. ``peak.util``, ``zope.app``,
>   etc.) must be a child of another namespace package (e.g. ``peak``,
>   ``zope``, etc.).  This is not required by the spec or enforced by
>   the implementation, but in practice, it is useless to put a
>   namespace package inside a non-namespace package, as the child
>   package's ``__path__`` will be a subset of the parent's.
>
>   In other words, it will only work correctly if all the contributions
>   to that namespace package are installed to the same physical
>   location.  So, if you intend to use a namespace subpackage, you
>   should always make its parent package a namespace as well.
>
>
>

Sounds great.  Much appreciated.

-eric

From ericsnowcurrently at gmail.com  Sun Jul 10 00:58:14 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sat, 9 Jul 2011 16:58:14 -0600
Subject: [Import-SIG] Is ".ns" really the right extension?
In-Reply-To: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
Message-ID: <CALFfu7DAxVO_1=DR+QVSHefAovEjV9mNzzxOV=1vN1XqDbjSrQ@mail.gmail.com>

On Sat, Jul 9, 2011 at 3:20 PM, P.J. Eby <pje at telecommunity.com> wrote:
> Looking over the example code I added to the PEP draft (based on Barry's
> suggestion), it occurs to me that, like his example, mine is still
> confusing.
>
> And, now that I look more closely at it, I see that the confusion in large
> part comes from the idea of naming something "ThisPart.ns" -- it implies
> that "ThisPart" is the namespace!
>
> And it's not a namespace at all.  It's really a portion of the namespace.
>
> It seems to me that the actual meaning of a foo.ns file is, "The 'foo'
> portion of this namespace is installed here".  And that thus,
> foo.portion or foo.part or foo.contribution or something like that would be
> more appropriate, given the PEP terminology.
>
> I think that a change is needed here to make the PEP's narrative come
> together more cleanly.  I'm leaning towards calling them foo.contrib files,
> as in "The 'foo' distribution contributed to this portion of the enclosing
> package."
>
> (Among other things, this makes the need for repeated files clearer; i.e.,
> you add a contribution marker to each package directory you're putting files
> or subdirectories into.)
>
> Overall, the narrative can then lose the constant references to *.ns files
> and instead talk about contribution markers -- i.e. a namespace package
> portion is a directory containing one or more contribution markers.  I think
> this will be clearer than the current text, and in particular it should make
> the example directory layout more meaningful to read.
>
> Notice, too, that Eric Snow's confusion about how .ns files work seems to
> have been influenced by the terminology -- i.e., the expectation that a
> 'zope.ns' file was talking about a 'zope' namespace package and identifying
> the containing directory as part of the namespace, rather than the other way
> around.  Was that the case, Eric?  And if so, do you think that these
> layouts are any clearer?
>

Yeah, that is spot on.  I definitely had it backwards.  Those examples
make _much_ more sense now, particularly because of the different
extension, and partly from your and Nick's explanations.

>
>    ProxyTypes-0.9.tgz:
>        peak/
>            ProxyTypes.contrib   <- marks this as a namespace package portion
>            util/
>                ProxyTypes.contrib   <- same for 'peak.util'
>                proxies.py
>
>    Importing-1.10.tgz:
>        peak/
>            Importing.contrib   <- marks this as a namespace package portion
>            util/
>                Importing.contrib   <- same for 'peak.util'
>                imports.py
>
>
>    site-packages/   (or wherever)
>        peak/
>            Importing.contrib
>            ProxyTypes.contrib   <- both distributions' contributions are merged
>            util/
>                Importing.contrib   <- at both levels
>                ProxyTypes.contrib
>                imports.py
>                proxies.py
>
> Any other thoughts?
>
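
To make the layouts above concrete: a directory is a namespace package
portion when it contains one or more contribution markers.  A minimal
sketch, assuming the ".contrib" extension (still under discussion in
this thread) and two helper names invented for the example:

```python
import os

MARKER_SUFFIX = ".contrib"  # assumption: the extension is not yet settled


def contribution_markers(directory, suffix=MARKER_SUFFIX):
    """Return the sorted marker file names found directly in `directory`."""
    return sorted(
        name
        for name in os.listdir(directory)
        if name.endswith(suffix)
        and os.path.isfile(os.path.join(directory, name))
    )


def is_namespace_portion(directory, suffix=MARKER_SUFFIX):
    """A directory is a portion if it holds at least one marker file."""
    return bool(contribution_markers(directory, suffix))
```

Only the marker's presence matters here; the file is never opened, and
nothing in this sketch records which modules came from which
distribution.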

If two contributions are added into the same directory (a la that last
example) is there a way of telling programmatically what portions came
from which contribution?

Also, if two contributions are made to a namespace package on the same
sys.path entry, they must go into the same directory, right?  Is there
a way around that, like using zip files or something (might we find
all three above examples in site-packages)?  The idea of having them
in separate plain directories (without __init__.py) for the same
sys.path entry is part of what motivated my earlier confusion.

Finally, say a portion is "contributed" to an existing non-namespace
package [directory], turning it into a namespace package.  The package
is then impacted by PEP 382 (particularly regarding __init__.py) when
it may not have been developed for use as a namespace package.  Is
this case worth considering?

-eric

From pje at telecommunity.com  Sun Jul 10 05:50:37 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Sat, 09 Jul 2011 23:50:37 -0400
Subject: [Import-SIG] Is ".ns" really the right extension?
In-Reply-To: <CALFfu7DAxVO_1=DR+QVSHefAovEjV9mNzzxOV=1vN1XqDbjSrQ@mail.g
	mail.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<CALFfu7DAxVO_1=DR+QVSHefAovEjV9mNzzxOV=1vN1XqDbjSrQ@mail.gmail.com>
Message-ID: <20110710035101.3BD893A404D@sparrow.telecommunity.com>

At 04:58 PM 7/9/2011 -0600, Eric Snow wrote:
>If two contributions are added into the same directory (a la that last
>example) is there a way of telling programmatically what portions came
>from which contribution?

See PEP 376, which addresses that issue.


>Also, if two contributions are made to a namespace package on the same
>sys.path entry, they must go into the same directory, right?

Yes.


>   Is there
>a way around that, like using zip files or something (might we find
>all three above examples in site-packages)?  The idea of having them
>in separate plain directories (without __init__.py) for the same
>sys.path entry is part of what motivated my earlier confusion.

Where did you get that idea from?  Was there a particular part of the 
PEP I should change to avoid creating that idea, or did you have it 
before you read the new draft?


>Finally, say a portion is "contributed" to an existing non-namespace
>package [directory], turning it into a namespace package.  The package
>is then impacted by PEP 382 (particularly regarding __init__.py) when
>it may not have been developed for use as a namespace package.  Is
>this case worth considering?

The same thing would happen now if you installed two distributions 
containing files for the same package.  So no, I don't think it's 
worth elaborating on.  The PEP is starting to get kind of long as it 
is; I'm already a little worried about backlash when this goes back 
to Python-Dev, actually, *despite* the fact that it's more precisely 
specified, simpler, etc. than the previous shorter version.   :-(


From ericsnowcurrently at gmail.com  Sun Jul 10 06:11:57 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sat, 9 Jul 2011 22:11:57 -0600
Subject: [Import-SIG] Is ".ns" really the right extension?
In-Reply-To: <20110710035101.3BD893A404D@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<CALFfu7DAxVO_1=DR+QVSHefAovEjV9mNzzxOV=1vN1XqDbjSrQ@mail.gmail.com>
	<20110710035101.3BD893A404D@sparrow.telecommunity.com>
Message-ID: <CALFfu7BLdX9FN0VXrA3z=ApwepvEARQrtSzr0VV2780kNvD8EA@mail.gmail.com>

On Sat, Jul 9, 2011 at 9:50 PM, P.J. Eby <pje at telecommunity.com> wrote:
> At 04:58 PM 7/9/2011 -0600, Eric Snow wrote:
>>
>> If two contributions are added into the same directory (a la that last
>> example) is there a way of telling programmatically what portions came
>> from which contribution?
>
> See PEP 376, which addresses that issue.
>
>
>> Also, if two contributions are made to a namespace package on the same
>> sys.path entry, they must go into the same directory, right?
>
> Yes.
>
>
>>  Is there
>> a way around that, like using zip files or something (might we find
>> all three above examples in site-packages)?  The idea of having them
>> in separate plain directories (without __init__.py) for the same
>> sys.path entry is part of what motivated my earlier confusion.
>
> Where did you get that idea from?  Was there a particular part of the PEP I
> should change to avoid creating that idea, or did you have it before you
> read the new draft?
>

I wish I could pin crazy things like that on someone else, but I'm
afraid it's my own.  Not having used namespace packages before I was
trying to piece together the concept from bits and pieces when Barry
brought up their sprint last month.  It took this long to get through
to me that I was a little backwards.  :)

>
>> Finally, say a portion is "contributed" to an existing non-namespace
>> package [directory], turning it into a namespace package.  The package
>> is then impacted by PEP 382 (particularly regarding __init__.py) when
>> it may not have been developed for use as a namespace package.  Is
>> this case worth considering?
>
> The same thing would happen now if you installed two distributions
> containing files for the same package. ?So no, I don't think it's worth
> elaborating on. ?The PEP is starting to get kind of long as it is; I'm
> already a little worried about backlash when this goes back to Python-Dev,
> actually, *despite* the fact that it's more precisely specified, simpler,
> etc. than the previous shorter version. ? :-(
>
>

Yeah, I agree.  For what it's worth, I think the PEP is a lot clearer now.

-eric

From eric at trueblade.com  Sun Jul 10 18:33:00 2011
From: eric at trueblade.com (Eric V. Smith)
Date: Sun, 10 Jul 2011 12:33:00 -0400
Subject: [Import-SIG] Is ".ns" really the right extension?
In-Reply-To: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
Message-ID: <4E19D43C.2080902@trueblade.com>

On 7/9/2011 5:20 PM, P.J. Eby wrote:
...
> And it's not a namespace at all.  It's really a portion of the namespace.

Agreed.

> I think that a change is needed here to make the PEP's narrative come
> together more cleanly.  I'm leaning towards calling them foo.contrib
> files, as in "The 'foo' distribution contributed to this portion of the
> enclosing package."

I would paint this particular bikeshed "foo.nspart", since it's a part
of a namespace. "foo.contrib" sounds like a license to me.

Although since the PEP uses "portion" to describe what these are, I
guess I could live with "foo.portion" as well.

I do have some thoughts on the other emails in this thread, but no time
today to write them up.

Eric.

From martin at v.loewis.de  Sun Jul 10 19:27:29 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 10 Jul 2011 19:27:29 +0200
Subject: [Import-SIG] PEP 382: Partial packages (was: Is ".ns" really
 the right extension?)
In-Reply-To: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
Message-ID: <4E19E101.4020508@v.loewis.de>

I've been talking to people about how things should be named in PEP 382.

I think "namespace package" is the wrong name for the feature: every
package is a namespace, as is every class, object, and function. In
setuptools, there might have been a point in calling it "namespace
package" to indicate it is a *mere* namespace (i.e. can't contain
code on its own); this won't be the case for the PEP 382 feature.

Likewise, people had objections to the .ns extension:
- as Phillip points out, people may confuse the file with actually
  being a namespace
- the .ns extension does not indicate that it belongs to Python,
  which apparently is important to people (who otherwise don't
  know what piece of software is in charge of that file); this
  is also a flaw in Phillip's proposed '.contrib' file
- the extension asks to invoke Godwin's law

So here is my proposal:

- the feature defined in PEP 382 is called "partial package",
  indicating that the entire package may be more than that.
  "package portion" could work as well, as could "component
  package" or "package component"; "partial package" has the
  advantage of raising associations with C#'s "partial classes",
  which are essentially the same feature (but on a class level).
- the extension is ".pyp", for "Python Package"

What do you think?

Regards,
Martin

From eric at trueblade.com  Sun Jul 10 19:59:03 2011
From: eric at trueblade.com (Eric V. Smith)
Date: Sun, 10 Jul 2011 13:59:03 -0400
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <4E19E101.4020508@v.loewis.de>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<4E19E101.4020508@v.loewis.de>
Message-ID: <4E19E867.4020703@trueblade.com>

On 07/10/2011 01:27 PM, "Martin v. Löwis" wrote:
> I've been talking to people about how things should be named in PEP 382.
> 
> I think "namespace package" is the wrong name for the feature: every
> package is a namespace, as is every class, object, and function. In
> setuptools, there might have been a point in calling it "namespace
> package" to indicate it is a *mere* namespace (i.e. can't contain
> code on its own); this won't be the case for the PEP 382 feature.

I agree "namespace package" is not a great name. I don't see any
particular problem with changing to a different name, though.

> Likewise, people had objections to the .ns extension:
> - as Phillip points out, people may confuse the file with actually
>   being a namespace
> - the .ns extension does not indicate that it belongs to Python,
>   which apparently is important to people (who otherwise don't
>   know what piece of software is in charge of that file); this
>   is also a flaw in Phillip's proposed '.contrib' file
> - the extension asks to invoke Godwin's law

In my head I was thinking .pyns, but I thought people might pronounce
the "y" as a long "e". I hadn't thought of Godwin's law with .ns, but I
see your point.

> So here is my proposal:
> 
> - the feature defined in PEP 382 is called "partial package",
>   indicating that the entire package may be more than that.
>   "package portion" could work as well, as could "component
>   package" or "package component"; "partial package" has the
>   advantage of raising associations with C#'s "partial classes",
>   which are essentially the same feature (but on a class level).
> - the extension is ".pyp", for "Python Package"
> 
> What do you think?

Partial package works for me. I too like the association with partial
classes. ".pyp" is okay, although I'd avoid saying it stands for "Python
Package", since the presence of the file is not what makes this code a
package, it makes it a partial package.

Eric.

From barry at python.org  Sun Jul 10 22:32:10 2011
From: barry at python.org (Barry Warsaw)
Date: Sun, 10 Jul 2011 16:32:10 -0400
Subject: [Import-SIG] Is ".ns" really the right extension?
In-Reply-To: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
Message-ID: <20110710163210.03f62155@resist>

On Jul 09, 2011, at 05:20 PM, P.J. Eby wrote:

>It seems to me that the actual meaning of a foo.ns file is, "The 'foo'
>portion of this namespace is installed here".  And that thus, foo.portion
>or foo.part or foo.contribution or something like that would be more
>appropriate, given the PEP terminology.

+1 and your rewritten example makes a lot of sense.

I don't particularly like .contrib, and I saw in a followup that someone
proposed .pyp.  I'd be fine with that, but if you want something more
descriptive (i.e. longer), then .portion works for me.

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/import-sig/attachments/20110710/c2f926bf/attachment.pgp>

From barry at python.org  Sun Jul 10 22:34:21 2011
From: barry at python.org (Barry Warsaw)
Date: Sun, 10 Jul 2011 16:34:21 -0400
Subject: [Import-SIG] PEP 382: Partial packages (was: Is ".ns" really
 the right extension?)
In-Reply-To: <4E19E101.4020508@v.loewis.de>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<4E19E101.4020508@v.loewis.de>
Message-ID: <20110710163421.3b086452@resist>

On Jul 10, 2011, at 07:27 PM, Martin v. Löwis wrote:

>- the feature defined in PEP 382 is called "partial package",
>  indicating that the entire package may be more than that.
>  "package portion" could work as well, as could "component
>  package" or "package component"; "partial package" has the
>  advantage of raising associations with C#'s "partial classes",
>  which are essentially the same feature (but on a class level).
>- the extension is ".pyp", for "Python Package"

I like "package portions" and .pyp.  "partial packages" would be okay, but to
me it puts the emphasis in the wrong place (i.e., on what's missing rather
than what's present).

-Barry

From barry at python.org  Sun Jul 10 22:36:39 2011
From: barry at python.org (Barry Warsaw)
Date: Sun, 10 Jul 2011 16:36:39 -0400
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <4E19E867.4020703@trueblade.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com>
Message-ID: <20110710163639.5c3804c5@resist>

On Jul 10, 2011, at 01:59 PM, Eric V. Smith wrote:

>Partial package works for me. I too like the association with partial
>classes. ".pyp" is okay, although I'd avoid saying it stands for "Python
>Package", since the presence of the file is not what makes this code a
>package, it makes it a partial package.

I'd say it stands for "Python portion" which I guess isn't as descriptive as
"Python package portion", but it's close enough.   And at least according to

http://en.wikipedia.org/wiki/List_of_file_formats_(alphabetical)

.pyp is unused.

-Barry

From barry at python.org  Sun Jul 10 23:01:36 2011
From: barry at python.org (Barry Warsaw)
Date: Sun, 10 Jul 2011 17:01:36 -0400
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <20110709000146.06C313A404D@sparrow.telecommunity.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
	<20110708183135.7c9fa5d5@limelight.wooz.org>
	<20110709000146.06C313A404D@sparrow.telecommunity.com>
Message-ID: <20110710170136.62976255@resist>

On Jul 08, 2011, at 08:01 PM, P.J. Eby wrote:

>At 06:31 PM 7/8/2011 -0400, Barry Warsaw wrote:
>>Thanks!  I've been trying to catch up on the mailing list traffic today, and
>>grabbed your prototype code.  I plan on committing it to MvL's pep382 hg
>>branch so we have a place to play with it.
>
>You should probably start from this version instead:
>
>   http://pastebin.com/Wv77WYyb
>
>It's got some work on other things like iter_modules, extend_namespaces, etc.

Are you working from a publicly available repo?  If not, would you like to be?
<wink>.  It will make collaboration easier, and MvL's hg branch is already
available and I think entirely appropriate for this code, at least until it's
merged back into trunk.

>> >Portion
>> >     A set of files in a single directory (possibly inside a zip file
>> >     or other storage mechanism) that contribute modules or subpackages
>> >     to a namespace package.  The contents of each portion ``sys.path``
>>
>>This one got cut off.
>
>Oops.  A bad edit; ignore that sentence fragment, it was replaced by language
>in the definition that followed it.

Cool.

>>Do you need to explain a little more why __path__ is significant, and why the
>>registration function is required?
>
>Revised paragraph:
>
>====
>In current Python versions, however, a registration function (such as
>``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``)
>must be explicitly invoked in order to set up the package's
>``__path__``.  (By default, a package's ``__path__`` lists only one
>directory, so to allow imports from more than one directory, the
>``__path__`` must be explicitly extended in code.)
>====

I'd only add something like "Python searches a package's __path__ instead of
sys.path when it's looking for subpackages."  Yes, I know this is covered in
PEP 302, but I think a little extra text here couldn't hurt.  Your call
though.
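
For readers who have not used it, the explicit registration described
in that paragraph looks roughly like this today.  This is a sketch that
uses the real pkgutil.extend_path(); the package name demo_ns, the
make_portion helper, and the two temporary directories are made up for
illustration.

```python
import importlib
import os
import sys
import tempfile

# What every portion's __init__.py has to contain today.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, package, module, source):
    """Create root/package/__init__.py and root/package/module.py."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(source)
    return root


# Two separate sys.path entries, each holding one portion of 'demo_ns'.
first = make_portion(tempfile.mkdtemp(), "demo_ns", "alpha", "VALUE = 'alpha'\n")
second = make_portion(tempfile.mkdtemp(), "demo_ns", "beta", "VALUE = 'beta'\n")
sys.path[:0] = [first, second]
importlib.invalidate_caches()

# Without the extend_path() call, only the first directory would be
# searched and the second import below would fail.
import demo_ns.alpha
import demo_ns.beta
```

Note that both portions must ship an __init__.py here, which is the
file-ownership conflict discussed further down.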

>> >Vendor packages typically must not provide overlapping files, and an
>> >attempt to install a vendor package that has a file already on disk
>> >will fail or cause unpredictable behavior.  As vendors might choose to
>> >package distributions such that they will end up all in a single
>> >directory for the namespace package, all portions would contribute
>> >conflicting ``__init__.py`` files.
>>
>>I might word this a little differently.  Perhaps:
>>
>>Vendor packaging standards require every file on disk to be owned by exactly
>>one vendor package.  But because each portion of a namespace package may be
>>contained in a separate vendor package, multiple vendor packages would have to
>>own the namespace package's __init__.py file.  For example, would the
>>``zope.interface`` vendor package own ``zope/__init__.py`` or would the
>>``zope.component`` vendor package own it?  Different vendors handle this
>>conflict differently, and in fact, different packaging tools from the same
>>vendor can handle this differently, which can cause consistency problems.
>
>I took the original wording as directly as practical from MvL's, but I agree
>yours is clearer.  OTOH, I think the "fail or cause unpredictable behavior" is
>a much stronger motivator than, "it's nonstandard and confusing".  ;-)
>
>Did you have a specific rationale for your choice?  I mean, what did you want
>to gain or avoid by the change?

The confusion I had was on "overlapping files", since that doesn't have a
clear meaning to me.  I'm happy to use the stronger language you prefer; maybe
you can work both texts into something even better!

>The problem with this example is that it gives the impression that .ns files
>are named for packages, instead of being named for distributions.  So, I went
>with a more detailed and explicit example.

The example you posted in the other thread looks great.

>> >This new method is called just before the importer's ``find_module()``
>> >is normally invoked.  If the importer determines that `fullname` is
>> >a namespace package portion under its jurisdiction, then the importer
>> >returns an importer-specific path to that namespace portion.
>>
>>Please define exactly what ``fullname`` is.
>
>Ugh.  Do I have to?  ;-)
>
>Will it work if I just change that to "just before the importer's
>``find_module(fullname)`` is normally invoked", so it's more clearly implied?

Sure, that'll work.

>> >Standard Library Changes/Additions
>> >----------------------------------
>> >
>> >The ``pkgutil`` module should be updated to handle this
>> >specification appropriately, including any necessary changes to
>> >``extend_path()``, ``iter_modules()``, etc.  A new generic API for
>> >calling ``namespace_subpath()`` on importers should be added as well.
>>
>>Is there any reason not to put extend_path() on the road to deprecation?
>
>I don't know.  Is there?  As I said, I considered that an open question.

I think we should.

>> >Specifically the proposed changes and additions are:
>> >
>> >* A new ``namespace_subpath(importer, fullname)`` generic, allowing
>> >   implementations to be registered for existing importers.
>>
>>Is this the registration mechanism?
>
>Registration for what?  I meant that this is analogous to other pkgutil
>generic functions that let you call a PEP 302 extension protocol on an
>importer, whether or not the importer directly implements that protocol.  For
>example, pkgutil.iter_importer_modules() is a generic function that lets you
>ask an importer to iterate over available modules, whether it actually
>implements its own "iter_modules()" method or not.  The
>pkgutil.namespace_subpath() function would do the same for the
>(possibly-absent) namespace_subpath() method on existing importers, and allow
>third parties to register namespace support for custom importers that can't
>be directly modified to support namespace packages.
>
>Any thoughts on how better to word that bit, without necessarily going into
>that much detail?  ;-)

I guess part of the problem is that generics like iter_importer_modules()
aren't actually documented in pkgutil, nor (in Python 2.7) even included in
__all__.  So you can't just say something like:

* A new ``namespace_subpath(importer, fullname)`` generic, analogous to other
  existing generics in the pkgutil package.

or maybe you can when you file a bug to get the existing ones
documented. <wink>.
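
For readers unfamiliar with the undocumented generics, the pattern under
discussion can be sketched roughly as follows.  This is only an illustration:
it assumes functools.singledispatch as a stand-in for pkgutil's internal
dispatch helper, and the names namespace_subpath, LegacyImporter and its
"portions" mapping are hypothetical, not the PEP's or pkgutil's actual API.

```python
from functools import singledispatch


@singledispatch
def namespace_subpath(importer, fullname):
    """Default case: defer to the importer's own method, if it has one."""
    method = getattr(importer, "namespace_subpath", None)
    return method(fullname) if method is not None else None


class LegacyImporter:
    """Hypothetical third-party importer that cannot be modified directly."""

    def __init__(self, portions):
        # Maps a package name to an importer-specific path (assumption).
        self.portions = portions


@namespace_subpath.register(LegacyImporter)
def _legacy_namespace_subpath(importer, fullname):
    # Support registered from outside the importer class.
    return importer.portions.get(fullname)
```

Callers always go through the generic, so they need not know whether an
importer implements the method itself or had support registered for it.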

>We won't be opening the files at all, so the contents will be ignored.

Does your rewrite make that explicit?

I'd like to either have a strong recommendation for the file being empty, or
specify the syntax we'd likely support in any extension to PEP 382.  In all
likelihood that would be to ignore lines with only whitespace, or that begin
with a `#`.
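
A minimal sketch of that "likely" syntax, assuming the rule is exactly as
stated above (skip whitespace-only lines and lines beginning with "#").  The
function name is hypothetical, and treating an indented "#" as significant is
an assumption made here, since the text does not settle it.

```python
def significant_lines(text):
    """Return the lines that are neither blank nor '#' comments."""
    return [
        line
        for line in text.splitlines()
        # Assumption: only a '#' in the first column marks a comment.
        if line.strip() and not line.startswith("#")
    ]
```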

>>I'd be a little more forceful; the PEP should strongly recommend against
>>including namespace package __init__.py files.
>
>As I said, it's controversial.  Some people really want those __init__
>modules, and setuptools sort-of supports them now.  I can make it a bit more
>forceful, though.

I think your rewrite looked good here.

-Barry

From brett at python.org  Sun Jul 10 23:04:21 2011
From: brett at python.org (Brett Cannon)
Date: Sun, 10 Jul 2011 14:04:21 -0700
Subject: [Import-SIG] PEP 382: Partial packages (was: Is ".ns" really
 the right extension?)
In-Reply-To: <20110710163421.3b086452@resist>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<4E19E101.4020508@v.loewis.de> <20110710163421.3b086452@resist>
Message-ID: <CAP1=2W7JC36=Xx0S+fgCMPOirQ+Oks8mDdJjAf14g82UREjvpA@mail.gmail.com>

On Sun, Jul 10, 2011 at 13:34, Barry Warsaw <barry at python.org> wrote:

> On Jul 10, 2011, at 07:27 PM, Martin v. Löwis wrote:
>
> >- the feature defined in PEP 382 is called "partial package",
> >  indicating that the entire package may be more than that.
> >  "package portion" could work as well, as could "component
> >  package" or "package component"; "partial package" has the
> >  advantage of raising associations with C#'s "partial classes"
> >  which are essentially the same feature (but on a class level).
> >- the extension is ".pyp", for "Python Package"
>
> I like "package portions" and .pyp.  "partial packages" would be okay, but
> to me it puts the emphasis in the wrong place (i.e. in what's missing
> rather than what's present).


I agree with Barry on this one. "Partial package" makes me think that
something is explicitly missing from the package and that it may not work
until all the partial bits of the package are gathered.

From barry at python.org  Sun Jul 10 23:05:00 2011
From: barry at python.org (Barry Warsaw)
Date: Sun, 10 Jul 2011 17:05:00 -0400
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <20110709185310.280C33A404D@sparrow.telecommunity.com>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>
	<CADiSq7cAnaPwo8+uK3tza6RnLp8Hsf0FOP8Ej=zwRAw9ugZqfA@mail.gmail.com>
	<20110709185310.280C33A404D@sparrow.telecommunity.com>
Message-ID: <20110710170500.35d5b20f@resist>

On Jul 09, 2011, at 02:52 PM, P.J. Eby wrote:

>Yeah...  Honestly the more we talk about all that the more inclined I am to
>saying that it's a zero-length file, just to avoid more spec detail.  ;-)
>
>I just don't know if there are any issues with packaging or revision control
>systems for zero length files, not to mention whether there are OSes where
>it's hard to make a zero-length file.

Well, put it in the PEP and we'll find out! :)

-Barry

From eric at trueblade.com  Sun Jul 10 23:14:04 2011
From: eric at trueblade.com (Eric V. Smith)
Date: Sun, 10 Jul 2011 17:14:04 -0400
Subject: [Import-SIG] New draft revision for PEP 382
In-Reply-To: <20110710170500.35d5b20f@resist>
References: <20110708195157.335043A404D@sparrow.telecommunity.com>	<CADiSq7cAnaPwo8+uK3tza6RnLp8Hsf0FOP8Ej=zwRAw9ugZqfA@mail.gmail.com>	<20110709185310.280C33A404D@sparrow.telecommunity.com>
	<20110710170500.35d5b20f@resist>
Message-ID: <4E1A161C.8090001@trueblade.com>

On 7/10/2011 5:05 PM, Barry Warsaw wrote:
> On Jul 09, 2011, at 02:52 PM, P.J. Eby wrote:
>
>> Yeah...  Honestly the more we talk about all that the more inclined I am to
>> saying that it's a zero-length file, just to avoid more spec detail.  ;-)
>>
>> I just don't know if there are any issues with packaging or revision control
>> systems for zero length files, not to mention whether there are OSes where
>> it's hard to make a zero-length file.
>
> Well, put it in the PEP and we'll find out! :)

I definitely think it should be a zero length file. It's the one thing 
we can check having already done a stat call, and without opening the file.
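
The check described here can be sketched as below.  The function name is
hypothetical; the point is only that a single os.stat() call gives both the
file type and the size, so the file never has to be opened.

```python
import os
import stat


def is_zero_length_file(path):
    """True if path is a regular file of size 0, using one stat call."""
    try:
        info = os.stat(path)
    except OSError:
        return False  # missing or inaccessible: not a usable marker
    return stat.S_ISREG(info.st_mode) and info.st_size == 0
```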

Eric.

From martin at v.loewis.de  Sun Jul 10 23:55:22 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 10 Jul 2011 23:55:22 +0200
Subject: [Import-SIG] Is ".ns" really the right extension?
In-Reply-To: <20110710163210.03f62155@resist>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<20110710163210.03f62155@resist>
Message-ID: <4E1A1FCA.1010906@v.loewis.de>

> I don't particularly like .contrib, and I saw in a followup that someone
> proposed .pyp.  I'd be fine with that, but if you want something more
> descriptive (i.e. longer), then .portion works for me.

I'd rather avoid something longer. It's a long-time convention at least
on Windows to use no more than three characters for file extensions.

Regards,
Martin

From martin at v.loewis.de  Sun Jul 10 23:57:46 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 10 Jul 2011 23:57:46 +0200
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <4E19E867.4020703@trueblade.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>	<4E19E101.4020508@v.loewis.de>
	<4E19E867.4020703@trueblade.com>
Message-ID: <4E1A205A.3010004@v.loewis.de>

> Partial package works for me. I too like the association with partial
> classes. ".pyp" is okay, although I'd avoid saying it stands for "Python
> Package", since the presence of the file is not what makes this code a
> package, it makes it a partial package.

No - it is actually what makes it a package. There are two ways to
declare a package: either put an __init__.py into the directory, or
a .pyp file.

Regards,
Martin

From pje at telecommunity.com  Mon Jul 11 00:30:27 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Sun, 10 Jul 2011 18:30:27 -0400
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <4E1A205A.3010004@v.loewis.de>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com>
	<4E1A205A.3010004@v.loewis.de>
Message-ID: <20110710223044.E77943A4100@sparrow.telecommunity.com>

At 11:57 PM 7/10/2011 +0200, Martin v. Löwis wrote:
> > Partial package works for me. I too like the association with partial
> > classes. ".pyp" is okay, although I'd avoid saying it stands for "Python
> > Package", since the presence of the file is not what makes this code a
> > package, it makes it a partial package.
>
>No - it is actually what makes it a package. There are two ways to
>declare a package: either put an __init__.py into the directory, or
>a .pyp file.

It's too bad that (for backward compatibility reasons) we can't just 
use the presence of any importable file to signify this, as is the 
norm for Java, Perl, PHP, etc.  (AFAIK, all of them have namespacey 
packages by default.)

In any case, I agree with Barry and Brett that "partial packages" 
conveys the wrong impression, as it puts emphasis on what is missing 
rather than what is there.

I want to suggest alternatives such as "compilation package" or some 
such to indicate that the package is a compilation of contributions, 
but that sounds like it's going to be compiled to assembly code or 
something.  ;-)

Frankly, though, I have no strong motivation to change the name; I'd 
honestly rather drop __init__ support as it's technically difficult 
and an invitation to problems anyway.  ;-)

I'm okay with some bikeshedding on the file extension, but unless 
somebody really comes up with a truly *excellent* replacement for 
"namespace package", I don't see much point to changing it.

I will go ahead and throw in a few ideas, none of which I think are 
necessarily *excellent*, but which seem like they might work:

  * multipart packages (packages that can be divided into separately 
installed/distributed parts)
  * package families (a group of packages that share a "family name")
  * organization packages (package whose purpose is to organize other 
packages, and/or indicate organizational authorship)
  * partitioned packages (packages that can be divided into 
separately installed/distributed parts)

Thoughts?

(Oh, btw, I'm a long-time Windows user and I see zero technical or 
cultural problems with having a longer-than-three extension.  It's 
increasingly common to see apps using them; even Microsoft now has 
'.docx' files in Office.  So, for the first and last naming schemes 
above I would lean towards ".pypart" as the extension.)


From ncoghlan at gmail.com  Mon Jul 11 02:39:04 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 11 Jul 2011 10:39:04 +1000
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <20110710223044.E77943A4100@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com>
	<4E1A205A.3010004@v.loewis.de>
	<20110710223044.E77943A4100@sparrow.telecommunity.com>
Message-ID: <CADiSq7eC3NOZOTJ4x1LGtemCHLwuW4Farex2Eau+aeVq3oNrvw@mail.gmail.com>

On Mon, Jul 11, 2011 at 8:30 AM, P.J. Eby <pje at telecommunity.com> wrote:
> I'm okay with some bikeshedding on the file extension, but unless somebody
> really comes up with a truly *excellent* replacement for "namespace
> package", I don't see much point to changing it.
>
> I will go ahead and throw in a few ideas, none of which I think are
> necessarily *excellent*, but which seem like they might work:
>
>  * multipart packages (packages that can be divided into separately
> installed/distributed parts)
>  * package families (a group of packages that share a "family name")
>  * organization packages (package whose purpose is to organize other
> packages, and/or indicate organizational authorship)
>  * partitioned packages (packages that can be divided into separately
> installed/distributed parts)
>
> Thoughts?

FWIW, +1 on "partitioned packages" as the term and either .pyp or
.pypart as the extension.

Why do I like partitioned packages?

1. It correctly emphasises the real purpose of this kind of package:
allowing a single namespace at the Python level to be cleanly split
into multiple partitions at the file distribution level. "namespace
packages" fails on this count.
2. It makes it clear that any given *piece* of the package can only be
correctly provided by one partition, as anything else results in a
collision within the package namespace. This is the only suggested
term that really conveys this aspect at all.
3. It doesn't have the same connotations of incompleteness that
plague "partial packages"
4. It makes it clear that this is still just one package at the Python
level, which a term like "package families" would obscure.
5. It is agnostic as to the reasons *why* developers might want to
partition the namespace, whereas something like "organization
packages" assumes a great deal about how they will be used in
practice.

Regards,
Nick.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From pje at telecommunity.com  Mon Jul 11 04:34:45 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Sun, 10 Jul 2011 22:34:45 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
Message-ID: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>

I think one reason we're having trouble with naming and explaining 
this whole concept is that, really, the current Python import system 
is broken, compared to other languages.

In at least Perl, PHP, and Java, you don't have to do anything 
special to merge components in a single namespace from multiple parts 
of the class/include/autoload path.  We are thus having trouble 
trying to come up with a special name to describe these, when from a 
more objective perspective, what we are describing are "normal 
packages", with what Python has now being "restricted to a single 
directory packages".

It's for this reason that all packages being namespaces doesn't 
bother me for the term.  All packages *should* be namespace packages, 
pretty much.  It's the *non* namespaceyness of Python's default 
packages that's broken, not the term.  ;-)

If there really was a time machine, I like to think we'd go back and 
get Python's package import mechanism to just work this way from the 
outset (i.e. always combining shards across sys.path), and perhaps 
use the presence of .py[cod]/.so files as an indication of 
package-ness -- if indeed an indication is needed at all.

Actually...  here's an interesting idea.  Suppose that we define the 
rules so that any directory containing any file with an importable 
extension is a namespace package...  *but*, if one of those 
directories contains an __init__ module, that directory will be 
placed first on the package __path__.

See, the reason why dropping the need for __init__ was previously 
rejected was because it meant you could block the importing of a 
package later on the path.  *But*, if we always put the segment with 
__init__ first on the __path__, then any such blocking directories 
would not block the "real" package -- they'd just be accessible for imports.

If we did that, then there would be no need for any special flag 
files, and no need for special terminology.  The protocol in my draft 
would remain basically the same, except for moving the __init__ 
module's subpath to the front of __path__.  And instead of globbing 
for *.pypart or whatever, importers would just check whether there 
was a directory there at all.

The only backward compatibility that this can break is that you can 
import things you couldn't import before.  So, if you had a 
foo/bar.py, with 'foo' in a sys.path directory, and you also had a 
'foo' package, AND you relied on 'import foo.bar' raising an error, 
then it would no longer do so.  But, if you *had* a foo.bar module 
before, then under this scheme, 'import foo.bar' would still import 
the exact same file it did before, so nothing changes.

In other words, the first subdirectory with an __init__ gets to head 
up the new package's __path__, but ALL matching subdirectories will 
make up the tail.
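
The __path__ assembly described above could look roughly like this.  It is a
sketch under stated assumptions, not the draft's protocol: the function name
and the IMPORTABLE tuple are hypothetical, only "__init__.py" (not compiled
variants) is recognised, and every sys.path entry is scanned.

```python
import os

# Assumed set of "importable" extensions, after the .py[cod]/.so remark above.
IMPORTABLE = (".py", ".pyc", ".pyo", ".pyd", ".so")


def assemble_package_path(name, search_path):
    """Build a __path__ list: first __init__ directory first, the rest after."""
    head, tail = None, []
    for entry in search_path:
        candidate = os.path.join(entry, name)
        if not os.path.isdir(candidate):
            continue
        files = os.listdir(candidate)
        if not any(f.endswith(IMPORTABLE) for f in files):
            continue  # nothing importable inside: not a package directory
        if head is None and "__init__.py" in files:
            head = candidate  # first directory with an __init__ heads the list
        else:
            tail.append(candidate)
    return ([head] if head else []) + tail
```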

The big advantage of this approach is that it doesn't require us to 
have a special name - it's just, "Enhanced Package Imports" or some 
such.  No special marker files to name, either.  Just, "hey, people 
want to put their package contents in more than one directory, and 
they don't always need an __init__.py."

Thoughts, anyone?


From pje at telecommunity.com  Mon Jul 11 05:10:09 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Sun, 10 Jul 2011 23:10:09 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
Message-ID: <20110711031033.55D1F3A4100@sparrow.telecommunity.com>

At 10:34 PM 7/10/2011 -0400, P.J. Eby wrote:
>The big advantage of this approach is that it doesn't require us to 
>have a special name - it's just, "Enhanced Package Imports" or some 
>such.  No special marker files to name, either.  Just, "hey, people 
>want to put their package contents in more than one directory, and 
>they don't always need an __init__.py."
>
>Thoughts, anyone?

A quick follow-up; I found a thread where something vaguely similar 
was discussed before:

   http://mail.python.org/pipermail/python-dev/2006-April/064400.html

Various issues regarding tool support were brought up, mainly that 
existing tools would not detect such packages as packages, and that 
doing this at the top level was problematic because of the 
possibility of blocking a module like 'string' or 'time' or some such.

However, as it happens, with a slight adjustment to what I proposed, 
that latter issue can be addressed...  if *any* loadable module 
anywhere on sys.path (vs. just a directory with an __init__) simply 
gets all the subpaths appended to its __path__, then having a "time" 
directory just gets it added to time.__path__ -- and the plain old 
time module still gets loaded.
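
To illustrate why appending subpaths to an ordinary module is enough: in
current CPython, a module object that has a __path__ attribute is treated as
a package for the purpose of submodule imports.  The helper below is a
hypothetical sketch of the "append the subpath" step only; it is not part of
any proposed API.

```python
def add_subpath(module, directory):
    """Append directory to module.__path__, creating the attribute if needed."""
    path = list(getattr(module, "__path__", []))
    if directory not in path:
        path.append(directory)
    module.__path__ = path
    return module
```

Once a plain module has been given a __path__ this way, "import module.sub"
searches the listed directories for sub, while the module itself is unchanged.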

Tool support isn't actually as much affected by my revised approach 
either, since if you don't intend a directory to be a package, you're 
not importing it.  If you have a directory and your tool *doesn't* 
recognize it as a package, well, that's an issue of the tool adding 
support for namespace packages.  Likewise, if you have a module or 
package that's working today, all that happens is that it grows a 
__path__ and has sub-imports possible.

It does seem that the previous discussion was rather controversial, 
even though only sub-packages were being discussed.  OTOH, the change 
really *was* a change, and my proposal doesn't change the existing 
behavior (apart from some occasional __path__ attributes appearing 
where they didn't before), it only adds to it.


From ncoghlan at gmail.com  Mon Jul 11 05:16:51 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 11 Jul 2011 13:16:51 +1000
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
Message-ID: <CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>

On Mon, Jul 11, 2011 at 12:34 PM, P.J. Eby <pje at telecommunity.com> wrote:
> The big advantage of this approach is that it doesn't require us to have a
> special name - it's just, "Enhanced Package Imports" or some such.  No
> special marker files to name, either.  Just, "hey, people want to put their
> package contents in more than one directory, and they don't always need an
> __init__.py."

It does mean that the pkgutil changes to handle sys.path extensions
will need to scan sys.modules looking for packages (i.e. modules with
__path__ attributes) rather than the more limited subset that would
have been stored in sys.partitioned_packages (although not adding
extra global state is actually a win in my book).

Removing the need for __init__.py as a package marker would also
eliminate quite a lot of newbie confusion when it comes to using
packages.

However, I think the explicitly partitioned package approach is going
to be an easier sell, as it's *obvious* that it won't break existing
code. While examples of existing code that will break under a
"partitioned by default" model are going to be hypothetical and
contrived, they're also pretty easy to come up with.

There's also a performance impact on app startup time - currently most
package imports stop as soon as they hit a matching directory. Under a
"partitioned by default" scheme, all package imports (including things
like "logging" and "email" which currently get a hit in the first zip
file for the standard library) would have to scan the entirety of
sys.path just in case there are additional shards lying around. For
large applications, that additional overhead is going to add up.

So I don't think implicit partitioning is really going to fly at this
point. That said, I wouldn't oppose tweaking the partitioned package
design to eventually support dropping the requirement for explicit
".pyp(art)" files (i.e. by always placing the directory with
__init__.py at the head of the list).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From waterbug at pangalactic.us  Mon Jul 11 05:07:34 2011
From: waterbug at pangalactic.us (Stephen Waterbury)
Date: Sun, 10 Jul 2011 23:07:34 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
Message-ID: <4E1A68F6.9070004@pangalactic.us>

On 07/10/2011 10:34 PM, P.J. Eby wrote:
> I think one reason we're having trouble with naming and explaining this
> whole concept is that, really, the current Python import system is
> broken, compared to other languages.

That seems an important consideration, at least because of the
negative perception it presents to programmers coming from other
languages ...

> ... All packages *should* be namespace packages, pretty
> much. It's the *non* namespaceyness of Python's default packages that's
> broken, not the term. ;-)

Novel to a Python programmer, perhaps even "revolutionary",
but logical on the face of it.

> Actually... here's an interesting idea. Suppose that we define the rules
> so that any directory containing any file with an importable extension
> is a namespace package... *but*, if one of those directories contains an
> __init__ module, that directory will be placed first on the package
> __path__.
>
> If we did that, then there would be no need for any special flag files ...

I like losing the flag files -- nice!

> The only backward compatibility that this can break is that you can
> import things you couldn't import before. ...

... which seems like no breakage at all, really.

> In other words, the first subdirectory with an __init__ gets to head up
> the new package's __path__, but ALL matching subdirectories will make up
> the tail.
>
> The big advantage of this approach is that it doesn't require us to have
> a special name - it's just, "Enhanced Package Imports" or some such. No
> special marker files to name, either. Just, "hey, people want to put
> their package contents in more than one directory, and they don't always
> need an __init__.py."
>
> Thoughts, anyone?

I like it very much:  it seems elegant and minimalist.

To put my comments in context, I am a non-implementor and non-guru,
but also a Python old-timer, who wants to use "partitioned packages"
and would like to see this done right.  I.e., this is input from the
peanut gallery.  ;)

Cheers,
Steve

From pje at telecommunity.com  Mon Jul 11 05:57:06 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Sun, 10 Jul 2011 23:57:06 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
Message-ID: <20110711035731.C012E3A4100@sparrow.telecommunity.com>

At 01:16 PM 7/11/2011 +1000, Nick Coghlan wrote:
>There's also a performance impact on app startup time - currently most
>package imports stop as soon as they hit a matching directory. Under a
>"partitioned by default" scheme, all package imports (including things
>like "logging" and "email" which currently get a hit in the first zip
>file for the standard library) would have to scan the entirety of
>sys.path just in case there are additional shards lying around. For
>large applications, that additional overhead is going to add up.

Darn, I missed that.  That kills the idea pretty much dead right 
there, as it means ALL imports are massively slowed down.  Crap.


>So I don't think implicit partitioning is really going to fly at this
>point. That said, I wouldn't oppose tweaking the partitioned package
>design to eventually support dropping the requirement for explicit
>".pyp(art)" files (i.e. by always placing the directory with
>__init__.py at the head of the list).

Nah, I don't think there's much point to that.

I'm noticing, though, that the more I hear "partitioned package", the 
less I like it, and the more I wish I hadn't proposed it.  ;-)

It's fundamentally wrong, because (e.g.) peak.util is *not* a single 
thing that's been partitioned, *even though* it started out that way.

It's just a bunch of things with a common namespace, and ISTM the 
name *really* ought to reflect that.

Common namespace packages?  Shared namespace packages?  Surname packages?  ;-)


From ncoghlan at gmail.com  Mon Jul 11 06:22:54 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 11 Jul 2011 14:22:54 +1000
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711035731.C012E3A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
Message-ID: <CADiSq7f_KU=0N1Yme7CxZRimJL8FCwYkZtB3aOm2P1ej5N=05g@mail.gmail.com>

On Mon, Jul 11, 2011 at 1:57 PM, P.J. Eby <pje at telecommunity.com> wrote:
> I'm noticing, though, that the more I hear "partitioned package", the less I
> like it, and the more I wish I hadn't proposed it.  ;-)
>
> It's fundamentally wrong, because (e.g.) peak.util is *not* a single thing
> that's been partitioned, *even though* it started out that way.
>
> It's just a bunch of things with a common namespace, and ISTM the name
> *really* ought to reflect that.
>
> Common namespace packages?  Shared namespace packages?  Surname packages?  ;-)

As soon as you have a flat namespace, you need to be careful to
partition it correctly. We run into that fairly often with namespace
collisions at the top level - failures of partitioning because (for
example) a user decided to call their "experimenting with sockets"
file "socket.py".

So the "partitioned package" naming is a developer oriented view
pointing out that hey, you're putting these files into a namespace
shared with other people so think about the implications that may have
for the name you choose (just as you would for a top-level package or
module name or for a script symlink that is going to be placed into
/usr/bin).

From a *user* point of view, they shouldn't care whether a package is
partitioned or not - they'll just be treating it like an ordinary
package, since (in theory) you can't tell the difference without
poking around inside __path__.

There may be a slight implication that all the partitions came from a
single source that has been split up, but I don't think the single
source implications are strong enough to invalidate the term. Anything
that gets put into "peak.util" is going to relate to PEAK in *some*
fashion, even if it isn't distributed as part of PEAK itself.

Certainly, I haven't seen anything else suggested that comes close to
this one for accuracy and mnemonic value.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From pje at telecommunity.com  Mon Jul 11 06:39:04 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 11 Jul 2011 00:39:04 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711035731.C012E3A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
Message-ID: <20110711043932.22F8B3A4100@sparrow.telecommunity.com>

At 11:57 PM 7/10/2011 -0400, P.J. Eby wrote:
>At 01:16 PM 7/11/2011 +1000, Nick Coghlan wrote:
>>There's also a performance impact on app startup time - currently most
>>package imports stop as soon as they hit a matching directory. Under a
>>"partitioned by default" scheme, all package imports (including things
>>like "logging" and "email" which currently get a hit in the first zip
>>file for the standard library) would have to scan the entirety of
>>sys.path just in case there are additional shards lying around. For
>>large applications, that additional overhead is going to add up.
>
>Darn, I missed that.  That kills the idea pretty much dead right 
>there, as it means ALL imports are massively slowed down.  Crap.

Hrm.  I just realized WHY I missed it.  I was thinking that we'd only 
do that in the case where you *first* find a namespace.  IOW, I was 
proposing to only change the semantics in the case where a suitable 
directory is found on sys.path *before* the normal package or 
module.  IOW, the semantics I was thinking of were:

  * Scan sys.path, keeping track of any subpaths found
  * If you hit a module with no subpaths found before it, import and finish
  * Otherwise, if you hit a subpath first, accumulate all subpaths 
and tack them on a module or package
  * If the matching module was a package __init__, move its subpath 
to the beginning of the list
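
The four rules above can be sketched as a single scan.  This is one reading
of them, with assumptions labelled: the function name is hypothetical, only
"name.py" and "name/__init__.py" count as modules, and any same-named
directory counts as a subpath.

```python
import os


def find_with_subpaths(name, search_path):
    """Return (module_file, subpaths) following the four rules above."""
    subpaths = []
    module_file = None
    init_dir = None
    for entry in search_path:
        candidate = os.path.join(entry, name)
        is_dir = os.path.isdir(candidate)
        init = os.path.join(candidate, "__init__.py")
        has_init = is_dir and os.path.isfile(init)
        if module_file is None:
            if has_init:
                module_file, init_dir = init, candidate
            elif os.path.isfile(candidate + ".py"):
                module_file = candidate + ".py"
            if module_file is not None and not subpaths:
                # Module or package found before any subpath: stop scanning.
                return module_file, ([candidate] if has_init else [])
        if is_dir:
            subpaths.append(candidate)  # keep accumulating across sys.path
    if init_dir is not None:
        # The package's own directory moves to the front of the list.
        subpaths.remove(init_dir)
        subpaths.insert(0, init_dir)
    return module_file, subpaths
```

In the common case (the package is found first) the scan stops at the first
hit, which is the performance property this variant was meant to preserve.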

But I agree that it's an upward climb to sell this approach.  For 
example, it means that you can have code later on sys.path affect 
code that's earlier, which seems wrong and a tad unsafe.

I wish we had a way to do this that didn't require special files, and 
still allowed us to have package names be plain directory names, and 
didn't break distutils installation processes.  (Distutils can 
install submodules without a package __init__ being included, but 
apart from that it forces installed directory structure to match 
package name structure.)

Okay, I have an idea.

Suppose that we reserve a special directory name, like 'py-pkg'.  And, 
if a sys.path directory contains a 'py-pkg' subdirectory, then any 
directory in that directory (recursively) is a package following 
__path__-assembly semantics.

So, in order to enable new import semantics, you have to install your 
code to a 'py-pkg' directory under a regular sys.path 
directory...  that's the only catch.

*However*, because the distutils actually let you install packages 
without __init__ modules, you can trick them into installing your 
otherwise-normal package this way, by the simple expedient of telling 
the distutils your package name is 'py-pkg.foo' instead of 'foo'.

(Note: this is only a hack for 2.x, and setuptools will probably be 
doing the dirty work of making distutils do this anyway "under the 
hood".  For 3.x, we can hopefully assume that the 'packaging' folks 
will enable doing this in a somewhat saner way.)

Anyway, revising the ongoing example to add the directory and drop 
the flag files, we get:

     ProxyTypes-0.9.tgz:
         py-pkg/peak/util/proxies.py

     Importing-1.10.tgz:
         py-pkg/peak/util/imports.py

or (combined):

     site-packages/   (or wherever)
         py-pkg/
             peak/
                 util/
                     imports.py
                     proxies.py
             zope/
             ...

This approach solves several problems at once:

  1. No flag files
  2. Faster imports (stat instead of listdir)
  3. Directory clearly identified as containing python packages
  4. No need for a special name, these are just regular packages with 
enhanced import semantics
  5. Distutils can still install it

Minor downsides:

  * Flat is better than nested
  * Existing code has to move to take advantage (unless you're not 
going to import the code without installing it, in which case you can 
just tweak your setup.py and not actually move anything)

Thoughts?


From brett at python.org  Mon Jul 11 06:49:17 2011
From: brett at python.org (Brett Cannon)
Date: Sun, 10 Jul 2011 21:49:17 -0700
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711043932.22F8B3A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
	<20110711043932.22F8B3A4100@sparrow.telecommunity.com>
Message-ID: <CAP1=2W7BhGNwqfumUKWA0b4Pan5EUjAhNHvAeG3omnc483PPCg@mail.gmail.com>

On Sun, Jul 10, 2011 at 21:39, P.J. Eby <pje at telecommunity.com> wrote:

> At 11:57 PM 7/10/2011 -0400, P.J. Eby wrote:
>
>> At 01:16 PM 7/11/2011 +1000, Nick Coghlan wrote:
>>
>>> There's also a performance impact on app startup time - currently most
>>> package imports stop as soon as they hit a matching directory. Under a
>>> "partitioned by default" scheme, all package imports (including things
>>> like "logging" and "email" which currently get a hit in the first zip
>>> file for the standard library) would have to scan the entirety of
>>> sys.path just in case there are additional shards lying around. For
>>> large applications, that additional overhead is going to add up.
>>>
>>
>> Darn, I missed that.  That kills the idea pretty much dead right there, as
>> it means ALL imports are massively slowed down.  Crap.
>>
>
> Hrm.  I just realized WHY I missed it.  I was thinking that we'd only do
> that in the case where you *first* find a namespace.  IOW, I was proposing
> to only change the semantics in the case where a suitable directory is found
> on sys.path *before* the normal package or module.  IOW, the semantics I was
> thinking of were:
>
>  * Scan sys.path, keeping track of any subpaths found
>  * If you hit a module with no subpaths found before it, import and finish
>  * Otherwise, if you hit a subpath first, accumulate all subpaths and tack
> them on a module or package
>  * If the matching module was a package __init__, move its subpath to the
> beginning of the list
>
> But I agree that it's an upward climb to sell this approach.  For example,
> it means that you can have code later on sys.path affect code that's
> earlier, which seems wrong and a tad unsafe.
>
> I wish we had a way to do this that didn't require special files, and still
> allowed us to have package names be plain directory names, and didn't break
> distutils installation processes.  (Distutils can install submodules without
> a package __init__ being included, but apart from that it forces installed
> directory structure to match package name structure.)
>
> Okay, I have an idea.
>
> Suppose that we reserve a special directory name, like 'py-pkg'.  And, if a
> sys.path directory contains a 'py-pkg' subdirectory, then any directory in
> that directory (recursively) is a package following __path__-assembly
> semantics.
>
> So, in order to enable new import semantics, you have to install your code
> to a 'py-pkg' directory under a regular sys.path directory...  that's the
> only catch.
>
> *However*, because the distutils actually let you install packages without
> __init__ modules, you can trick them into installing your otherwise-normal
> package this way, by the simple expedient of telling the distutils your
> package name is 'py-pkg.foo' instead of 'foo'.
>
> (Note: this is only a hack for 2.x, and setuptools will probably be doing
> the dirty work of making distutils do this anyway "under the hood".  For
> 3.x, we can hopefully assume that the 'packaging' folks will enable doing
> this in a somewhat saner way.)
>
> Anyway, revising the ongoing example to add the directory and drop the flag
> files, we get:
>
>    ProxyTypes-0.9.tgz:
>        py-pkg/peak/util/proxies.py
>
>    Importing-1.10.tgz:
>        py-pkg/peak/util/imports.py
>
> or (combined):
>
>    site-packages/   (or wherever)
>        py-pkg/
>            peak/
>                util/
>                    imports.py
>                    proxies.py
>            zope/
>            ...
>
> This approach solves several problems at once:
>
>  1. No flag files
>  2. Faster imports (stat instead of listdir)
>  3. Directory clearly identified as containing python packages
>  4. No need for a special name, these are just regular packages with
> enhanced import semantics
>  5. Distutils can still install it
>
> Minor downsides:
>
>  * Flat is better than nested
>  * Existing code has to move to take advantage (unless you're not going to
> import the code without installing it, in which case you can just tweak your
> setup.py and not actually move anything)
>

I prefer going with a specifically named file, if for no other reason than
there will be fewer broken tools. By shifting everything into a subdirectory
you prevent any pre-existing code that scans sys.path from doing anything.
But with the special file approach you don't break those tools in the case
where you don't have some package fragment farther down sys.path. Plus
you can also use a specially named file instead of allowing for any file
name with a specific file ending to achieve the same result (e.g., py.pkg or
__init__.part).

From brett at python.org  Mon Jul 11 06:51:13 2011
From: brett at python.org (Brett Cannon)
Date: Sun, 10 Jul 2011 21:51:13 -0700
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711031033.55D1F3A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<20110711031033.55D1F3A4100@sparrow.telecommunity.com>
Message-ID: <CAP1=2W7Kkt8RkvPLNw9O5ngQtHWPOz_jZwxD-u22gv8yY3mGCA@mail.gmail.com>

On Sun, Jul 10, 2011 at 20:10, P.J. Eby <pje at telecommunity.com> wrote:

> At 10:34 PM 7/10/2011 -0400, P.J. Eby wrote:
>
>> The big advantage of this approach is that it doesn't require us to have a
>> special name - it's just, "Enhanced Package Imports" or some such.  No
>> special marker files to name, either.  Just, "hey, people want to put their
>> package contents in more than one directory, and they don't always need an
>> __init__.py."
>>
>> Thoughts, anyone?
>>
>
> A quick follow-up; I found a thread where something vaguely similar was
> discussed before:
>
>  http://mail.python.org/pipermail/python-dev/2006-April/064400.html
>
> Various issues regarding tool support were brought up, mainly that existing
> tools would not detect such packages as packages, and that doing this at the
> top level was problematic because of the possibility of blocking a module
> like 'string' or 'time' or some such.
>
> However, as it happens, with a slight adjustment to what I proposed, that
> latter issue can be addressed...  if *any* loadable module anywhere on
> sys.path (vs. just a directory with an __init__) simply gets all the
> subpaths appended to its __path__, then having a "time" directory just gets
> it added to time.__path__ -- and the plain old time module still gets
> loaded.
>

I didn't read the thread, but I don't get the worry here. A 'time' package
already will shadow a 'time' module if it is farther up sys.path, so this
proposal in any of its current forms won't change that.

From pje at telecommunity.com  Mon Jul 11 07:18:31 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 11 Jul 2011 01:18:31 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <CAP1=2W7BhGNwqfumUKWA0b4Pan5EUjAhNHvAeG3omnc483PPCg@mail.gmail.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
	<20110711043932.22F8B3A4100@sparrow.telecommunity.com>
	<CAP1=2W7BhGNwqfumUKWA0b4Pan5EUjAhNHvAeG3omnc483PPCg@mail.gmail.com>
Message-ID: <20110711051855.484273A4100@sparrow.telecommunity.com>

At 09:49 PM 7/10/2011 -0700, Brett Cannon wrote:
>I prefer going with a specifically named file, if for no other 
>reason than there will be fewer broken tools. By shifting everything 
>into a subdirectory you prevent any pre-existing code that scans 
>sys.path from doing anything. But with the special file approach you 
>don't break those tools in the case where you don't have some 
>package fragment farther down sys.path.

I'm not sure I follow you.  The approach we're explicitly 
recommending for new namespaces is to *not* use an __init__, so the 
tools will still fail unless they're updated.

What you're saying is that in some cases, these tools will 
accidentally *seem* to work under the flag-file proposal, but will 
only see the contents of the first portion on sys.path.  IOW, I don't 
think that you can claim that tools won't be broken by a flag file 
approach, or even that they'll really be *less* broken than by a 
subdirectory approach.

(Also, if tools are using pkgutil's module traversal API, they won't 
have a problem, as it will be updated to match the import semantics 
-- and this should provide tool authors an incentive to start using 
that API, if they're not already doing so.)
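
For reference, a small sketch of what using that API looks like today,
with pkgutil.iter_modules listing the importable items in one directory.
The helper name list_modules and the file names are illustrative only,
and the output reflects current behaviour, not the proposed semantics:

```python
import os
import pkgutil
import tempfile


def list_modules(directory):
    """Return (name, ispkg) pairs for importable items directly in directory."""
    return sorted(
        (info.name, info.ispkg) for info in pkgutil.iter_modules([directory])
    )


# Build a throwaway directory with one module and one regular package.
root = tempfile.mkdtemp()
open(os.path.join(root, "proxies.py"), "w").close()
os.makedirs(os.path.join(root, "util"))
open(os.path.join(root, "util", "__init__.py"), "w").close()

print(list_modules(root))
```

A tool written against this call would not need its own directory
scanning logic, so it would pick up whatever rules the import system
and pkgutil agree on.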


>  Plus you can also use a specially named file instead of allowing 
> for any file name with a specific file ending to achieve the same 
> result (e.g., py.pkg or __init__.part).

I'm not quite following you here; it sounds like you're talking about 
a single fixed filename, which won't work for the reasons described 
in the "rejected alternatives" section at the end of this draft:

    http://mail.python.org/pipermail/import-sig/2011-July/000213.html

That draft proposed "DistributionName.ns" as the flag file naming 
pattern, and recent discussion has proposed .pypart or .pyp as 
alternate extensions.

The present thread ("what if namespaces aren't special?") is an 
experiment to see if we could find a way to dispense with flag files 
altogether, thereby simplifying the terminology and usage, as well as 
saving us a listdir call or two.
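
To make the listdir cost concrete, here is a rough sketch of flag-file
detection.  It assumes the ".pyp" extension (".ns" and ".pypart" were
also floated), and find_portions is a hypothetical helper, not the
draft's actual protocol:

```python
import glob
import os
import tempfile

FLAG_PATTERN = "*.pyp"  # assumed extension; the final name was undecided


def find_portions(name, search_path):
    """Return directories on search_path that are flagged portions of name."""
    portions = []
    for entry in search_path:
        candidate = os.path.join(entry, name)
        # Matching a pattern means listing the directory, not just a stat.
        if glob.glob(os.path.join(glob.escape(candidate), FLAG_PATTERN)):
            portions.append(candidate)
    return portions


# Two flagged portions and one unflagged directory of the same name.
base = tempfile.mkdtemp()
for entry, flag in (("a", "ProxyTypes.pyp"), ("b", None), ("c", "Importing.pyp")):
    os.makedirs(os.path.join(base, entry, "peak"))
    if flag:
        open(os.path.join(base, entry, "peak", flag), "w").close()

search_path = [os.path.join(base, entry) for entry in ("a", "b", "c")]
print(find_portions("peak", search_path))
```

The unflagged directory under "b" is ignored, which is the behaviour the
flag-file approach buys at the price of the extra directory listing.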


From martin at v.loewis.de  Mon Jul 11 08:53:21 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 11 Jul 2011 08:53:21 +0200
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <20110710223044.E77943A4100@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com>
	<4E1A205A.3010004@v.loewis.de>
	<20110710223044.E77943A4100@sparrow.telecommunity.com>
Message-ID: <4E1A9DE1.1090809@v.loewis.de>

>> No - it is actually what makes it a package. There are two ways to
>> declare a package: either put an __init__.py into the directory, or
>> a .pyp file.
> 
> It's too bad that (for backward compatibility reasons) we can't just use
> the presence of any importable file to signify this, as is the norm for
> Java, Perl, PHP, etc.

I'm not sure I understand:
- in Java, a package is not an importable file, but a directory,
  just as in Python. The major differences are:
  * empty directories count as packages as well; they just have to
    be on the CLASSPATH
  * you can't import packages in Java - you can only import classes
- in PHP, namespaces and files are completely unrelated:
  http://php.net/manual/en/language.namespaces.php
  The files you want to use are passed to "include". include takes
  file names, not namespace names. Only after including the file does
  PHP find out which namespace the included code belongs to.
- in Perl, the parent package and the subpackages appear unrelated.
  The parent package is a file "foo.pm"; the subpackages are files
  in a folder "foo"; in addition, each module needs to declare
  its package (i.e. "package foo;" or "package foo::bar;"). This
  automatically makes "composite packages" possible (as the
  subpackages are just not considered "parts" of the parent package,
  AFAICT).

> (AFAIK, all of them have namespacey packages by default.)

Please stop calling this "composite" feature "namespacey".

http://en.wikipedia.org/wiki/Namespace

"In general, a namespace is a container that provides context for the
identifiers (names, or technical terms, or words) it holds, and allows
the disambiguation of homonym identifiers residing in different
namespaces[1]."

*All* Python packages are namespaces. What specific property of the
package mechanism do you mean when you say "namespacey"?

Regards,
Martin

From martin at v.loewis.de  Mon Jul 11 08:59:07 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 11 Jul 2011 08:59:07 +0200
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
Message-ID: <4E1A9F3B.9090208@v.loewis.de>

> In at least Perl, PHP, and Java, you don't have to do anything special
> to merge components in a single namespace from multiple parts of the
> class/include/autoload path.

Not true. In all three languages, you have to declare in the module what
package it belongs to. So there is something special to do.

> It's for this reason that all packages being namespaces doesn't bother
> me for the term.  All packages *should* be namespace packages, pretty
> much.  It's the *non* namespaceyness of Python's default packages that's
> broken, not the term.  ;-)

Python packages have been namespaces since day 1 (as are modules).

> Actually...  here's an interesting idea.  Suppose that we define the
> rules so that any directory containing any file with an importable
> extension is a namespace package...  *but*, if one of those directories
> contains an __init__ module, that directory will be placed first on the
> package __path__.

"... is a package" (not: "namespace package")

I'd go further: any directory with the package name could constitute
a portion of the package. With your approach, you'd need a file with
an importable extension in each portion of the "zope" package, right?
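
A rough sketch of the combined rule, taking the "any directory with the
package name" variant together with the quoted "__init__ portion goes
first" ordering.  The function name is hypothetical, and only
__init__.py is checked (a simplification; compiled or extension
__init__ modules are ignored):

```python
import os
import tempfile


def assemble_package_path(name, search_path):
    """Return a __path__ list: first __init__ portion first, the rest after."""
    portions = [
        os.path.join(entry, name)
        for entry in search_path
        if os.path.isdir(os.path.join(entry, name))
    ]
    for index, portion in enumerate(portions):
        if os.path.isfile(os.path.join(portion, "__init__.py")):
            # Move the first directory holding an __init__ to the front.
            return [portion] + portions[:index] + portions[index + 1:]
    return portions


# Three portions of "zope"; only the second one has an __init__.py.
base = tempfile.mkdtemp()
entries = [os.path.join(base, n) for n in ("first", "second", "third")]
for entry in entries:
    os.makedirs(os.path.join(entry, "zope"))
open(os.path.join(entries[1], "zope", "__init__.py"), "w").close()

print(assemble_package_path("zope", entries))
```

With this ordering a stray directory earlier on sys.path cannot block
the portion that carries the __init__ module; it only contributes
additional importable contents after it.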

Regards,
Martin

From ericsnowcurrently at gmail.com  Mon Jul 11 09:04:14 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 11 Jul 2011 01:04:14 -0600
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711051855.484273A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
	<20110711043932.22F8B3A4100@sparrow.telecommunity.com>
	<CAP1=2W7BhGNwqfumUKWA0b4Pan5EUjAhNHvAeG3omnc483PPCg@mail.gmail.com>
	<20110711051855.484273A4100@sparrow.telecommunity.com>
Message-ID: <CALFfu7BRHyNPSVFmz6xmtGKCGfuQW_2s=5SHVZPqEb6TWKFF=A@mail.gmail.com>

On Sun, Jul 10, 2011 at 11:18 PM, P.J. Eby <pje at telecommunity.com> wrote:
>
> The present thread ("what if namespaces aren't special?") is an experiment
> to see if we could find a way to dispense with flag files altogether,
> thereby simplifying the terminology and usage, as well as saving us a
> listdir call or two.
>

Ultimately there has to be something to indicate it is a package and
that it is a partition (or whatever it's called).  There would be fewer
surprises if it followed the current pattern of having a file to
indicate packageness (currently only __init__.py fills this role).

FWIW, I think the solution in the PEP is the clearest approach, if
"partitioned by default" is not an option.  And if that and the other
alternate solutions are not feasible, it would be nice to have them
added to the "rejected" section because they are still reasonable
ideas.  Still, it would be nice if we didn't have to add a new
packageness indicator.

-eric

> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> http://mail.python.org/mailman/listinfo/import-sig
>

From ericsnowcurrently at gmail.com  Mon Jul 11 09:07:49 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 11 Jul 2011 01:07:49 -0600
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
Message-ID: <CALFfu7A1V1u7RQK-NRj0gocPwH8bAv3NHSo48YmgBq3jyr59nw@mail.gmail.com>

On Sun, Jul 10, 2011 at 8:34 PM, P.J. Eby <pje at telecommunity.com> wrote:
> I think one reason we're having trouble with naming and explaining this
> whole concept is that, really, the current Python import system is broken,
> compared to other languages.
>
> In at least Perl, PHP, and Java, you don't have to do anything special to
> merge components in a single namespace from multiple parts of the
> class/include/autoload path.  We are thus having trouble trying to come up
> with a special name to describe these, when from a more objective
> perspective, what we are describing are "normal packages", with what Python
> has now being "restricted to a single directory packages".
>
> It's for this reason that all packages being namespaces doesn't bother me
> for the term.  All packages *should* be namespace packages, pretty much.
>  It's the *non* namespaceyness of Python's default packages that's broken,
> not the term.  ;-)
>
> If there really was a time machine, I like to think we'd go back and get
> Python's package import mechanism to just work this way from the outset
> (i.e. always combining shards across sys.path), and perhaps use the presence
> of .py[cod]/.so files as an indication of package-ness -- if indeed an
> indication is needed at all.
>
> Actually...  here's an interesting idea.  Suppose that we define the rules
> so that any directory containing any file with an importable extension is a
> namespace package...  *but*, if one of those directories contains an
> __init__ module, that directory will be placed first on the package
> __path__.
>
> See, the reason why dropping the need for __init__ was previously rejected
> was because it meant you could block the importing of a package later on the
> path.  *But*, if we always put the segment with __init__ first on the
> __path__, then any such blocking directories would not block the "real"
> package -- they'd just be accessible for imports.
>

> If we did that, then there would be no need for any special flag files, and
> no need for special terminology.

Would it be a problem to lose the filename that indicates where the
portion/partition came from?

-eric


>  The protocol in my draft would remain
> basically the same, except for moving the __init__ module's subpath to the
> front of __path__.  And instead of globbing for *.pypart or whatever,
> importers would just check whether there was a directory there at all.
>
> The only backward compatibility that this can break is that you can import
> things you couldn't import before.  So, if you had a foo/bar.py, with 'foo'
> in a sys.path directory, and you also had a 'foo' package, AND you relied on
> 'import foo.bar' raising an error, then it would no longer do so.  But, if
> you *had* a foo.bar module before, then under this scheme, 'import foo.bar'
> would still import the exact same file it did before, so nothing changes.
>
> In other words, the first subdirectory with an __init__ gets to head up the
> new package's __path__, but ALL matching subdirectories will make up the
> tail.
>
> The big advantage of this approach is that it doesn't require us to have a
> special name - it's just, "Enhanced Package Imports" or some such.  No
> special marker files to name, either.  Just, "hey, people want to put their
> package contents in more than one directory, and they don't always need an
> __init__.py."
>
> Thoughts, anyone?
>

From ncoghlan at gmail.com  Mon Jul 11 09:32:17 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 11 Jul 2011 17:32:17 +1000
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <CALFfu7BRHyNPSVFmz6xmtGKCGfuQW_2s=5SHVZPqEb6TWKFF=A@mail.gmail.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
	<20110711043932.22F8B3A4100@sparrow.telecommunity.com>
	<CAP1=2W7BhGNwqfumUKWA0b4Pan5EUjAhNHvAeG3omnc483PPCg@mail.gmail.com>
	<20110711051855.484273A4100@sparrow.telecommunity.com>
	<CALFfu7BRHyNPSVFmz6xmtGKCGfuQW_2s=5SHVZPqEb6TWKFF=A@mail.gmail.com>
Message-ID: <CADiSq7ejFhNrpuwoW7x287dUdEfemGmEcoyn96ga1W0Mqi2fiA@mail.gmail.com>

On Mon, Jul 11, 2011 at 5:04 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> FWIW, I think the solution in the PEP is the clearest approach, if
> "partitioned by default" is not an option.  And if that and the other
> alternate solutions are not feasible, it would be nice to have them
> added to the "rejected" section because they are still reasonable
> ideas.  Still, it would be nice if we didn't have to add a new
> packageness indicator.

The runtime performance impact kills "partitioned by default" (i.e. no
marker files needed to indicate partitioned packages). Java doesn't
suffer from it since the cost is incurred at compile time, and I
believe there are differences in the way Perl and PHP work that make
it less of an issue there as well.
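
A toy model of that cost, counting how many sys.path entries get probed
under each scheme.  The entry names and the in-memory "contents" mapping
are made up for illustration; real lookups hit the filesystem or a zip
directory:

```python
def count_probes(name, search_path, contents, scan_all):
    """Count the entries examined while looking up a top-level package name."""
    probes = 0
    for entry in search_path:
        probes += 1
        found = name in contents.get(entry, ())
        # Current behaviour: stop at the first match.  A partitioned-by-default
        # scheme has to keep going in case later entries hold more portions.
        if found and not scan_all:
            break
    return probes


search_path = ["stdlib.zip", "site-packages", "app", "plugins"]
contents = {"stdlib.zip": {"logging", "email"}}

print(count_probes("logging", search_path, contents, scan_all=False))  # 1
print(count_probes("logging", search_path, contents, scan_all=True))   # 4
```

The second number grows with the length of sys.path for every package
import, including ones that could otherwise stop at the first entry.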

PJE's latest PEP update clearly articulates the semantics for a
"non-conflicting marker file" approach (modulo a name change to
<contributing-distribution>.pyp or .pypart instead of
<contributing-distribution>.ns).

Allowing unmarked directories to count as packages has already been
rejected in the past due to the problem of hiding package directories
later on sys.path. Given the performance penalty that rules out
"partitioned by default", this rejection remains in force.

One question then is whether, given that a partitioned package has
already been identified, should unmarked directories later on sys.path
count as part of that package? My answer is no, as this is the only
answer that provides consistent behaviour. Otherwise, unmarked
directories may or may not be detected as part of the package
depending on whether or not a partitioned package directory was found
earlier on the path.

As far as the specific suggestion of using a "marker directory"
instead of marker files goes, I don't really see the benefit (and
plenty of downsides). I put it in the same category as using a special
extension on the directory name (since that's what it is, only using
"/" as the separator instead of ".") and reject it for the same
reasons.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From barry at python.org  Mon Jul 11 16:19:51 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 11 Jul 2011 10:19:51 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711031033.55D1F3A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<20110711031033.55D1F3A4100@sparrow.telecommunity.com>
Message-ID: <20110711101951.3a01f769@resist>

On Jul 10, 2011, at 11:10 PM, P.J. Eby wrote:

>However, as it happens, with a slight adjustment to what I proposed, that
>latter issue can be addressed...  if *any* loadable module anywhere on
>sys.path (vs. just a directory with an __init__) simply gets all the subpaths
>appended to its __path__, then having a "time" directory just gets it added
>to time.__path__ -- and the plain old time module still gets loaded.

Does that mean I could add subpackage bits to existing modules without their
"knowledge"?  IOW, could I manage to add a time.foo subpackage that would be
importable?  I'd find that FAST (fascinating and stomach turning, to revive an
old meme :).

-Barry

From barry at python.org  Mon Jul 11 16:23:55 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 11 Jul 2011 10:23:55 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711043932.22F8B3A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
	<20110711043932.22F8B3A4100@sparrow.telecommunity.com>
Message-ID: <20110711102355.7cbebacc@resist>

On Jul 11, 2011, at 12:39 AM, P.J. Eby wrote:

>Suppose that we reserve a special directory name, like 'py-pkg'.  And, if a
>sys.path directory contains a 'py-pkg' subdirectory, then any directory in
>that directory (recursively) is a package following __path__-assembly
>semantics.

I'm not entirely sold on the idea, but I do have some lovely bikeshed paint.
*If* this idea were to pan out, I think __package__ would be a good directory
name.  Okay, it's not as unimportable itself as py-pkg, but it's still special
enough to have its semantics controlled by Python.

-Barry

From barry at python.org  Mon Jul 11 16:30:23 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 11 Jul 2011 10:30:23 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <CADiSq7ejFhNrpuwoW7x287dUdEfemGmEcoyn96ga1W0Mqi2fiA@mail.gmail.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
	<20110711043932.22F8B3A4100@sparrow.telecommunity.com>
	<CAP1=2W7BhGNwqfumUKWA0b4Pan5EUjAhNHvAeG3omnc483PPCg@mail.gmail.com>
	<20110711051855.484273A4100@sparrow.telecommunity.com>
	<CALFfu7BRHyNPSVFmz6xmtGKCGfuQW_2s=5SHVZPqEb6TWKFF=A@mail.gmail.com>
	<CADiSq7ejFhNrpuwoW7x287dUdEfemGmEcoyn96ga1W0Mqi2fiA@mail.gmail.com>
Message-ID: <20110711103023.75ebfb75@resist>

On Jul 11, 2011, at 05:32 PM, Nick Coghlan wrote:

>One question then is whether, given that a partitioned package has
>already been identified, should unmarked directories later on sys.path
>count as part of that package? My answer is no, as this is the only
>answer that provides consistent behaviour. Otherwise, unmarked
>directories may or may not be detected as part of the package
>depending on whether or not a partitioned package directory was found
>earlier on the path.

This is my biggest concern.  While I think PJE's proposal has some appeal, I'm
worried that it will be very difficult to debug when things go wrong.  I'm
also concerned that introspection may not be possible without "going through
Python".

By this I mean, on a *nix system it would be very easy to identify all the
package portions on a file system with a simple `locate *.pyp`.  So I'm still
in favor of the marker files approach.
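
The same kind of check is also easy without locate; a minimal sketch
using pathlib, assuming the ".pyp" marker extension (find_marker_files
is a hypothetical helper name):

```python
import os
import tempfile
from pathlib import Path


def find_marker_files(root, suffix=".pyp"):
    """Return the sorted marker files found anywhere below root."""
    return sorted(p for p in Path(root).rglob("*" + suffix) if p.is_file())


# Two package portions with marker files, plus an ordinary module.
root = tempfile.mkdtemp()
for relative in ("site-a/zope/Dist1.pyp", "site-b/zope/Dist2.pyp",
                 "site-b/zope/interface.py"):
    target = os.path.join(root, *relative.split("/"))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    open(target, "w").close()

for marker in find_marker_files(root):
    print(marker.parent)  # each parent directory is one package portion
```

Either way, the portions can be found by looking at the filesystem
alone, without importing anything.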

-Barry

From barry at python.org  Mon Jul 11 16:32:13 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 11 Jul 2011 10:32:13 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711035731.C012E3A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
Message-ID: <20110711103213.761540a7@resist>

On Jul 10, 2011, at 11:57 PM, P.J. Eby wrote:

>I'm noticing, though, that the more I hear "partitioned package", the less I
>like it, and the more I wish I hadn't proposed it.  ;-)
>
>It's fundamentally wrong, because (e.g.) peak.util is *not* a single thing
>that's been partitioned, *even though* it started out that way.
>
>It's just a bunch of things with a common namespace, and ISTM the name
>*really* ought to reflect that.
>
>Common namespace packages?  Shared namespace packages?  Surname packages?  ;-)

Does 'package portions' not fit the bill?

-Barry

From barry at python.org  Mon Jul 11 16:44:52 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 11 Jul 2011 10:44:52 -0400
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <CADiSq7eC3NOZOTJ4x1LGtemCHLwuW4Farex2Eau+aeVq3oNrvw@mail.gmail.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com>
	<4E1A205A.3010004@v.loewis.de>
	<20110710223044.E77943A4100@sparrow.telecommunity.com>
	<CADiSq7eC3NOZOTJ4x1LGtemCHLwuW4Farex2Eau+aeVq3oNrvw@mail.gmail.com>
Message-ID: <20110711104452.7f296e91@resist>

Another thought: what about calling these "fusion packages"?   The dictionary
definition of "fusion" does seem like a pretty good match for what's going on
here.

http://en.wiktionary.org/wiki/fusion

-Barry

From pje at telecommunity.com  Mon Jul 11 17:12:50 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 11 Jul 2011 11:12:50 -0400
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <20110711104452.7f296e91@resist>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com>
	<4E1A205A.3010004@v.loewis.de>
	<20110710223044.E77943A4100@sparrow.telecommunity.com>
	<CADiSq7eC3NOZOTJ4x1LGtemCHLwuW4Farex2Eau+aeVq3oNrvw@mail.gmail.com>
	<20110711104452.7f296e91@resist>
Message-ID: <20110711151322.122063A414B@sparrow.telecommunity.com>

At 10:44 AM 7/11/2011 -0400, Barry Warsaw wrote:
>Another thought: what about calling these "fusion packages"?   The dictionary
>definition of "fusion" does seem like a pretty good match for what's going on
>here.
>
>http://en.wiktionary.org/wiki/fusion

Hm.  The first definition on that page says, "The merging of similar or 
different elements into a union"...

So how about "union packages"?  ;-)


>-Barry
>
>


From barry at python.org  Mon Jul 11 17:22:25 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 11 Jul 2011 11:22:25 -0400
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <20110711151322.122063A414B@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com>
	<4E1A205A.3010004@v.loewis.de>
	<20110710223044.E77943A4100@sparrow.telecommunity.com>
	<CADiSq7eC3NOZOTJ4x1LGtemCHLwuW4Farex2Eau+aeVq3oNrvw@mail.gmail.com>
	<20110711104452.7f296e91@resist>
	<20110711151322.122063A414B@sparrow.telecommunity.com>
Message-ID: <20110711112225.0c02daf0@resist>

On Jul 11, 2011, at 11:12 AM, P.J. Eby wrote:

>At 10:44 AM 7/11/2011 -0400, Barry Warsaw wrote:
>>Another thought: what about calling these "fusion packages"?   The dictionary
>>definition of "fusion" does seem like a pretty good match for what's going on
>>here.
>>
>>http://en.wiktionary.org/wiki/fusion
>
>Hm.  The first definition on that page says, "The merging of similar or different elements into a union"...
>
>So how about "union packages"?  ;-)

I thought about that too, but I liked the sound of "fusion" better. :)

-Barry

From pje at telecommunity.com  Mon Jul 11 17:25:40 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 11 Jul 2011 11:25:40 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <CADiSq7ejFhNrpuwoW7x287dUdEfemGmEcoyn96ga1W0Mqi2fiA@mail.gmail.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
	<20110711043932.22F8B3A4100@sparrow.telecommunity.com>
	<CAP1=2W7BhGNwqfumUKWA0b4Pan5EUjAhNHvAeG3omnc483PPCg@mail.gmail.com>
	<20110711051855.484273A4100@sparrow.telecommunity.com>
	<CALFfu7BRHyNPSVFmz6xmtGKCGfuQW_2s=5SHVZPqEb6TWKFF=A@mail.gmail.com>
	<CADiSq7ejFhNrpuwoW7x287dUdEfemGmEcoyn96ga1W0Mqi2fiA@mail.gmail.com>
Message-ID: <20110711152605.8A3083A4100@sparrow.telecommunity.com>

At 05:32 PM 7/11/2011 +1000, Nick Coghlan wrote:
>On Mon, Jul 11, 2011 at 5:04 PM, Eric Snow 
><ericsnowcurrently at gmail.com> wrote:
> > FWIW, I think the solution in the PEP is the clearest approach, if
> > "partitioned by default" is not an option.  And if that and the other
> > alternate solutions are not feasible, it would be nice to have them
> > added to the "rejected" section because they are still reasonable
> > ideas.  Still, it would be nice if we didn't have to add a new
> > packageness indicator.
>
>The runtime performance impact kills "partitioned by default" (i.e. no
>marker files needed to indicate partitioned packages).

Actually, partitioned by default is the *best* performance option we 
have for implementing this PEP, because it only uses a stat rather 
than a listdir.  Backward compatibility is the thing that kills it.

That's why I made the more recent "py-pkg/" proposal -- it has the 
same degree of backward compatibility as flag files do, but keeps
the improved performance of partitioning by default.
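
As a rough illustration of the stat-versus-listdir point above, here is a minimal sketch (not code from any of the proposals). The helper names are made up for this example, and the ".pyp" marker extension is assumed purely for the demonstration. A markerless check needs one stat-style call per path entry, whereas a marker-file check has to list the directory to see whether any marker is present.

```python
import os
import tempfile


def found_by_stat(path_entry, name):
    """Markerless check: a single stat-style call decides the question."""
    return os.path.isdir(os.path.join(path_entry, name))


def found_by_marker(path_entry, name, marker_ext=".pyp"):
    """Marker-file check: the directory has to be listed to spot a marker."""
    try:
        entries = os.listdir(os.path.join(path_entry, name))
    except OSError:
        return False  # directory missing or unreadable
    return any(entry.endswith(marker_ext) for entry in entries)


def demo():
    """Build a throwaway zope/ directory and run both checks on it."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "zope"))
        before = (found_by_stat(root, "zope"), found_by_marker(root, "zope"))
        # Drop a (hypothetical) marker file in and check again.
        open(os.path.join(root, "zope", "zope.pyp"), "w").close()
        after = (found_by_stat(root, "zope"), found_by_marker(root, "zope"))
    return before, after
```

Calling demo() runs both checks before and after the marker file is created; only the marker-based check changes its answer.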


>One question then is whether, given that a partitioned package has
>already been identified, should unmarked directories later on sys.path
>count as part of that package? My answer is no, as this is the only
>answer that provides consistent behaviour. Otherwise, unmarked
>directories may or may not be detected as part of the package
>depending on whether or not a partitioned package directory was found
>earlier on the path.

This is already in the PEP draft I wrote, and it's definitely the 
correct semantics for the marker files approach.  The py-pkg approach of
course works similarly, since the py-pkg directory is the "marker" in 
that case.


>As far as the specific suggestion of using a "marker directory"
>instead of marker files goes, I don't really see the benefit (and
>plenty of downsides). I put it in the same category as using a special
>extension on the directory name (since that's what it is, only using
>"/" as the separator instead of ".") and reject it for the same
>reasons.

What are the downsides, exactly?  Special extensions don't work with 
the distutils; a prefix does.  (I've tested it.)  Most tools that 
look for code can be given a prefix to look for the code, but not an 
extension.  It's *quite* a different proposition than specially-named 
directories -- especially since only the package root is affected, 
not every subpackage directory.


From martin at v.loewis.de  Tue Jul 12 00:24:35 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 12 Jul 2011 00:24:35 +0200
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <20110711151810.260883A4100@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com>
	<4E1A205A.3010004@v.loewis.de>
	<20110710223044.E77943A4100@sparrow.telecommunity.com>
	<4E1A9DE1.1090809@v.loewis.de>
	<20110711151810.260883A4100@sparrow.telecommunity.com>
Message-ID: <4E1B7823.9000901@v.loewis.de>

Am 11.07.2011 17:17, schrieb P.J. Eby:
> At 08:53 AM 7/11/2011 +0200, Martin v. Löwis wrote:
>> - in PHP, namespaces and files are completely unrelated:
>>   http://php.net/manual/en/language.namespaces.php
>>   The files you want to use are passed to "include". include takes
>>   file names, not namespace names. Only after including the file,
>>   PHP finds out what namespace the stuff it imported is in.
> 
> I mean that in PHP, when you 'include "foo/bar"', the entire include
> path is searched for foo/bar.  PHP namespaces are a new feature.

As you say, namespaces are new. IIUC, before that, there was a single
flat namespace, and file names had no relationship to identifiers.
So I don't see why the PHP include mechanism is related to "namespace
packages" at all. It's more like Python's import before the introduction
of packages (but even then, the modules formed namespaces, which they
don't in PHP).

>> *All* Python packages are namespaces. What specific property of the
>> package mechanism do you mean when you say "namespacey"?
> 
> The feature that allows a "package" to be merely an agglomeration of
> child elements, rather than an entity in itself.

I still think "namespace package" is a misnomer for that. In addition,
even a namespace package is "an entity in itself". "import zope" will
give me a proper module object bound to the name zope, with reflection,
and all. I can do

zope.foo = 1

if I want to. It's *technically* the case that you shouldn't have
any code in it, although, also technically, it would be possible to
put more stuff into __init__.py, as long as you do so for all portions
of the package.

Regards,
Martin

From pje at telecommunity.com  Tue Jul 12 01:07:09 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 11 Jul 2011 19:07:09 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
Message-ID: <20110711230729.82A813A414B@sparrow.telecommunity.com>

At 08:59 AM 7/11/2011 +0200, Martin v. Löwis wrote:
> > In at least Perl, PHP, and Java, you don't have to do anything special
> > to merge components in a single namespace from multiple parts of the
> > class/include/autoload path.
>
>Not true. In all three languages, you have to declare in the module what
>package it belongs to. So there is something special to do.

But you have to do that for *every* package, so it's not 
special.  (i.e., by special, I meant, "in addition to what you do 
normally to make a package".)


>I'd go further: any directory with the package name could constitute
>a portion of the package. With your approach, you'd need a file with
>an importable extension in each portion of the "zope" package, right?

For that version, yes.  For the performance and compatibility reasons 
discussed elsewhere in this thread, though, that particular variation 
isn't really workable.


From pje at telecommunity.com  Tue Jul 12 01:06:58 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 11 Jul 2011 19:06:58 -0400
Subject: [Import-SIG] PEP 382: Partial packages
Message-ID: <20110711230728.A21C73A4100@sparrow.telecommunity.com>

At 08:53 AM 7/11/2011 +0200, Martin v. Löwis wrote:
>- in PHP, namespaces and files are completely unrelated:
>   http://php.net/manual/en/language.namespaces.php
>   The files you want to use are passed to "include". include takes
>   file names, not namespace names. Only after including the file,
>   PHP finds out what namespace the stuff it imported is in.

I mean that in PHP, when you 'include "foo/bar"', the entire include 
path is searched for foo/bar.  PHP namespaces are a new feature.


>*All* Python packages are namespaces. What specific property of the
>package mechanism do you mean when you say "namespacey"?

The feature that allows a "package" to be merely an agglomeration of 
child elements, rather than an entity in itself.  If you read my 
draft proposal, it quotes Jim Fulton's original coining of the term 
"namespace" package, as a contrast to what he called a "module" package.

That is, some packages are self-contained entities, and others merely 
serve as a gathering place (namespace) for distinct entities.

This is not a property of packages themselves, but of the user's 
intention in organizing the package.  The other languages I mention 
all support the "namespace-only" use case better by allowing segments 
to be merged along their include/import paths.


From pje at telecommunity.com  Tue Jul 12 01:26:41 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 11 Jul 2011 19:26:41 -0400
Subject: [Import-SIG] PEP 382: Partial packages
Message-ID: <20110711232701.C54033A4100@sparrow.telecommunity.com>

At 12:24 AM 7/12/2011 +0200, Martin v. Löwis wrote:
>So I don't see why the PHP include mechanism is related to "namespace
>packages" at all.

Because only in Python does a search for "foo/bar" (whatever the 
separator) *stop* the path search when there is a match for
"foo/".  In PHP, Perl, and Java, searching continues along the path 
until the entire target is matched, regardless of whether the name 
parts are separated by slashes (PHP), dots (Java), or double-colons (Perl).

That's why Python's behavior here is arguably a misfeature.  In these 
other languages, there is a distinction between an entity named "x" 
and a *namespace* named "x::" (or "x/" or "x."), in their on-disk 
representations.

For example, in Java, the class org.Foo is distinct from the 
namespace org.Foo.* in on-disk representation, as you have 
org/Foo.java (or .class) sitting outside the directory org/Foo/ 
(where any contents of org.Foo.* would be located).

Similarly in Perl, Foo.pm sits outside the Foo/ directory, thereby 
distinguishing Foo and Foo::.

Python, however, in the case where both a Foo module and Foo package 
exist, places the module *inside* the package.

If Python were following the model of these other languages, then 
instead of using zope/__init__.py, we would place a zope.py in the 
parent directory, and when importing zope.interface, we would search 
the entire path for zope/ subdirectories containing an 
interface.py...  but we wouldn't look for interface/ directories 
until/unless we tried to import zope.interface.foo.
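
The difference between the two search styles can be sketched as follows. This is a deliberately simplified model, not the real import machinery: it ignores the __init__.py requirement and only looks at directory and file names, and both function names are invented for the illustration.

```python
import os
import tempfile


def first_directory_wins(search_path, package, module):
    """Stop at the first directory named `package`, even if `module` is absent."""
    for entry in search_path:
        pkg_dir = os.path.join(entry, package)
        if os.path.isdir(pkg_dir):
            target = os.path.join(pkg_dir, module + ".py")
            return target if os.path.isfile(target) else None
    return None


def whole_target_search(search_path, package, module):
    """Keep walking the path until the entire package/module target matches."""
    for entry in search_path:
        target = os.path.join(entry, package, module + ".py")
        if os.path.isfile(target):
            return target
    return None


def demo():
    """Two path entries: a/zope/ is empty, b/zope/ holds interface.py."""
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "a")
        second = os.path.join(root, "b")
        os.makedirs(os.path.join(first, "zope"))
        os.makedirs(os.path.join(second, "zope"))
        open(os.path.join(second, "zope", "interface.py"), "w").close()
        path = [first, second]
        stopped = first_directory_wins(path, "zope", "interface")
        found = whole_target_search(path, "zope", "interface")
        return stopped, os.path.relpath(found, root)
```

In the demo, the first style gives up at a/zope/ and reports nothing, while the second carries on to the next path entry and finds the file under b/zope/.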


From pje at telecommunity.com  Tue Jul 12 03:01:52 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 11 Jul 2011 21:01:52 -0400
Subject: [Import-SIG] One last try: "virtual packages"
Message-ID: <20110712010218.5E0873A4100@sparrow.telecommunity.com>

Ok, so based on the last round of discussions about terminology, and 
how other languages process their path, I got to doing some thinking, 
and here is one last try at a high-performing, markerless, 
ultra-backwards-compatible approach to this thing.  I call it,
"virtual packages".

The implementation consists of two small *additions* to today's 
import semantics.  These additions don't affect the performance or 
behavior of "standard" imports (i.e., the ones we have today), but 
enable certain imports that would currently fail, to succeed instead.

Overall, the goal is to make package imports work more like a user 
coming over from languages like Perl, Java, and PHP would expect with 
respect to subpath searching, and creation/expansion of 
packages.  (For instance, this proposal does away with the need to 
move 'foo.py' to 'foo/__init__.py' when turning a module into a package.)

Anyway, this'll be my last attempt at coming up with a markerless 
approach, but I do hope you'll take the time to read it carefully, as 
it has very different semantics and performance impacts from my 
previous proposals, even though it may sound quite similar on the surface.

In particular, this proposal is the ONLY implementation ever proposed 
for this PEP that has *zero* filesystem-level performance overhead 
for normal imports.

That's right.  Zip.  Zero.  Nada.  None.  Nil.

The *only* cases where this proposal adds filesystem access overhead
are those where, without this proposal, an ImportError would've
happened under present-day import semantics.

So, read it and weep... or smile, or whatever.  ;-)


The First Addition - "Virtual" Packages
---------------------------------------

The first addition to existing import semantics is that if you try to 
import a submodule of a module with no __path__, then instead of 
treating it as a missing module, a __path__ is dynamically 
constructed, using namespace_subpath() calls on the parent path or sys.path.

If the resulting __path__ is empty, it's an import error.  Otherwise, 
the module's __path__ attribute is set, and the import goes ahead as 
if the module had been a package all along.

In other words, every module is a "virtual package".  If you treat it 
as a package, it'll become/act like one.  Otherwise, it's still a module.

This means that if, say, you have a bunch of directories named 
'random' on sys.path (without any __init__ modules in them), 
importing 'random' still imports the stdlib random.py.

However, if you try to import 'random.myspecialrandom', a __path__ 
will be constructed and used -- and if the submodule exists, it'll be 
imported.  (And if you later add a random/myspecialrandom/ directory 
somewhere on sys.path, you'll be able to import 
random.myspecialrandom.whatever out of it, by recursive application 
of this "virtual package" rule.)
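
A minimal sketch of this first rule, under stated assumptions: the text above names namespace_subpath() but does not spell out its signature, so the two helpers below (build_virtual_path and make_virtual_package) are hypothetical stand-ins that simply scan a list of path entries for matching subdirectories. Real import machinery would also have to deal with importers other than the filesystem.

```python
import os
import tempfile
import types


def build_virtual_path(name, parent_path):
    """Collect every matching subdirectory found along parent_path."""
    leaf = name.rpartition(".")[2]  # "zope.interface" -> "interface"
    return [os.path.join(entry, leaf) for entry in parent_path
            if os.path.isdir(os.path.join(entry, leaf))]


def make_virtual_package(module, parent_path):
    """Give a plain module a __path__ on first submodule import, or fail."""
    if hasattr(module, "__path__"):
        return module  # already a package; present-day semantics apply
    virtual_path = build_virtual_path(module.__name__, parent_path)
    if not virtual_path:
        raise ImportError("No module named %s.*" % module.__name__)
    module.__path__ = virtual_path
    return module


def demo():
    """Three path entries, two of which contain a random/ directory."""
    with tempfile.TemporaryDirectory() as root:
        entries = [os.path.join(root, d) for d in ("a", "b", "c")]
        for entry in entries:
            os.mkdir(entry)
        os.mkdir(os.path.join(entries[0], "random"))
        os.mkdir(os.path.join(entries[2], "random"))
        module = types.ModuleType("random")  # stands in for stdlib random.py
        make_virtual_package(module, entries)
        return [os.path.relpath(p, root) for p in module.__path__]
```

Note that nothing here runs at 'import random' time; the path scan only happens when something asks for a submodule, which is the point of the proposal.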

Notice that this is very different from my previous attempt at a 
similar scheme.  First, it doesn't introduce any performance overhead 
on 'import random', as the extra lookups aren't done until and unless 
you try to 'import random.foo'...  which existing code of course will 
not be doing.

(Second, but also important, it doesn't distort the __path__ of 
packages with an __init__ module, because such packages are *not* 
virtual packages; they retain their present day semantics.)

Anyway, with this one addition, imports will now behave in a way 
that's friendly to users of e.g. Perl and Java, who expect the code 
for a module 'foo' to lie *outside* the foo/ directory, and for 
lookups of foo.bar or foo::bar to be searched for in foo/ 
subdirectories all along the respective equivalents of sys.path.

You now can simply ship a single zope.py to create a virtual "zope" 
package -- a namespace shared by code from multiple distributions.

But wait...  how does that fix the filename collision 
problem?  Aren't we still going to collide on the zope.py 
file?  Well, that's where the second addition comes in.


The Second Addition - "Pure Virtual" Packages
---------------------------------------------

The second addition is that, if an import fails to find a module 
entirely, then once again, a virtual __path__ is assembled using 
namespace_subpath() calls.  If the path is empty, the import 
fails.  But if it's non-empty, an empty module is created and its 
__path__ is set.

Voila...  we now have a "pure" virtual package.  (i.e. a package with 
no corresponding "defining" module).  So, if you have a bunch of 
__init__-free zope/ directories on sys.path, you can freely import from them.

But what happens if you DO have an __init__ module somewhere?  Well, 
because we haven't changed the normal import semantics, the first 
__init__ module ends up being a defining module, and by default, its 
__path__ is set in the normal way, just like today.  So, it's not a 
virtual package, it's a standard package.  If you must have a 
defining module, you'll have to move it from zope/__init__.py to zope.py.

(Either that, or use some sort of API call to explicitly request a 
virtual package __path__ to be set up.  But the recommended way to do 
it would be just to move the file up a level.)
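
The second rule can be sketched in the same spirit. Again this is only an illustration under assumptions: import_pure_virtual is a made-up name, it handles top-level names only, and it records the result in a dictionary passed by the caller rather than touching sys.modules.

```python
import os
import tempfile
import types


def import_pure_virtual(name, search_path, modules):
    """Fallback for when no module called `name` was found anywhere."""
    virtual_path = [os.path.join(entry, name) for entry in search_path
                    if os.path.isdir(os.path.join(entry, name))]
    if not virtual_path:
        raise ImportError("No module named %r" % name)
    module = types.ModuleType(name)  # empty: there is no defining module
    module.__path__ = virtual_path
    modules[name] = module           # a real importer would use sys.modules
    return module


def demo():
    """Two distributions, each shipping an __init__-free zope/ directory."""
    with tempfile.TemporaryDirectory() as root:
        search_path = [os.path.join(root, d) for d in ("dist1", "dist2")]
        for entry in search_path:
            os.makedirs(os.path.join(entry, "zope"))
        modules = {}
        zope = import_pure_virtual("zope", search_path, modules)
        return modules["zope"] is zope, len(zope.__path__)
```

Because the fallback only runs after the normal search has failed, a zope/__init__.py found earlier would still win and produce a standard package, as described above.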


Impact
------

This proposal doesn't affect performance of imports that don't ever 
*use* a virtual package __path__, because the setup is delayed until then.

It doesn't break installation tools: distutils and setuptools both 
handle this without blinking.  You just list your defining module (if 
you have one) in 'py_modules', along with any individual submodules, 
and you list the subpackages in 'packages'.
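
For concreteness, the setup arguments described above might look roughly like this. All project, module, and package names are hypothetical, and whether a given distutils or setuptools version accepts this layout without warnings has not been checked here; it is only a sketch of which names go under which keyword.

```python
# All project and module names below are hypothetical.
SETUP_KWARGS = dict(
    name="example.zope.distribution",
    version="0.1",
    # The defining module (zope.py) plus an individual submodule
    # (zope/helpers.py), both listed as plain modules.
    py_modules=["zope", "zope.helpers"],
    # Subpackages living under the zope/ directory.
    packages=["zope.interface"],
)

# In a real setup.py this would be passed on as: setup(**SETUP_KWARGS)
```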

It doesn't break code-finding tools in any way that other 
implementation proposals don't.  (That is, ALL our proposals allow 
__init__ to go away, so tools are definitely going to require 
updating; all that differs between proposals is precisely what sort 
of updating is required.)

Really, about the only downside I can see is that this proposal can't 
be implemented purely via a PEP 302 meta-importer in Python 2.x.  The 
builtin __import__ function bails out when __path__ is missing on a 
parent, so it would actually require replacing __import__ in order to 
implement true virtual package support.

(For my own personal use case for this PEP in 2.x (i.e., replacing 
setuptools' current mechanism with a PEP-compliant one), it's not too 
big a deal, though, because I was still going to need explicit 
registration in .pth files: no matter the mechanism used, it isn't 
built into the interpreter, so it still has to be bootstrapped somehow!)

Anyway.  Thoughts?


From ericsnowcurrently at gmail.com  Tue Jul 12 04:19:46 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 11 Jul 2011 20:19:46 -0600
Subject: [Import-SIG] One last try: "virtual packages"
In-Reply-To: <20110712010218.5E0873A4100@sparrow.telecommunity.com>
References: <20110712010218.5E0873A4100@sparrow.telecommunity.com>
Message-ID: <CALFfu7BosWnHzP_og2cX1B-wFAOBRTQXD8BzJG4Ko6Sim6Qn+Q@mail.gmail.com>

On Mon, Jul 11, 2011 at 7:01 PM, P.J. Eby <pje at telecommunity.com> wrote:
> Ok, so based on the last round of discussions about terminology, and how
> other languages process their path, I got to doing some thinking, and here
> is one last try at a high-performing, markerless,
> ultra-backwards-compatible approach to this thing.  I call it, "virtual
> packages".
>
> The implementation consists of two small *additions* to today's import
> semantics.  These additions don't affect the performance or behavior of
> "standard" imports (i.e., the ones we have today), but enable certain
> imports that would currently fail, to succeed instead.
>
> Overall, the goal is to make package imports work more like a user coming
> over from languages like Perl, Java, and PHP would expect with respect to
> subpath searching, and creation/expansion of packages.  (For instance, this
> proposal does away with the need to move 'foo.py' to 'foo/__init__.py' when
> turning a module into a package.)
>
> Anyway, this'll be my last attempt at coming up with a markerless approach,
> but I do hope you'll take the time to read it carefully, as it has very
> different semantics and performance impacts from my previous proposals, even
> though it may sound quite similar on the surface.
>
> In particular, this proposal is the ONLY implementation ever proposed for
> this PEP that has *zero* filesystem-level performance overhead for normal
> imports.
>
> That's right.  Zip.  Zero.  Nada.  None.  Nil.
>
> The *only* cases where this proposal adds filesystem access overhead are
> those where, without this proposal, an ImportError would've happened under
> present-day import semantics.
>
> So, read it and weep... or smile, or whatever.  ;-)
>
>
> The First Addition - "Virtual" Packages
> ---------------------------------------
>
> The first addition to existing import semantics is that if you try to import
> a submodule of a module with no __path__, then instead of treating it as a
> missing module, a __path__ is dynamically constructed, using
> namespace_subpath() calls on the parent path or sys.path.
>
> If the resulting __path__ is empty, it's an import error.  Otherwise, the
> module's __path__ attribute is set, and the import goes ahead as if the
> module had been a package all along.
>
> In other words, every module is a "virtual package".  If you treat it as a
> package, it'll become/act like one.  Otherwise, it's still a module.
>
> This means that if, say, you have a bunch of directories named 'random' on
> sys.path (without any __init__ modules in them), importing 'random' still
> imports the stdlib random.py.
>
> However, if you try to import 'random.myspecialrandom', a __path__ will be
> constructed and used -- and if the submodule exists, it'll be imported.
> (And if you later add a random/myspecialrandom/ directory somewhere on
> sys.path, you'll be able to import random.myspecialrandom.whatever out of
> it, by recursive application of this "virtual package" rule.)
>
> Notice that this is very different from my previous attempt at a similar
> scheme.  First, it doesn't introduce any performance overhead on 'import
> random', as the extra lookups aren't done until and unless you try to
> 'import random.foo'...  which existing code of course will not be doing.
>
> (Second, but also important, it doesn't distort the __path__ of packages
> with an __init__ module, because such packages are *not* virtual packages;
> they retain their present day semantics.)
>
> Anyway, with this one addition, imports will now behave in a way that's
> friendly to users of e.g. Perl and Java, who expect the code for a module
> 'foo' to lie *outside* the foo/ directory, and for lookups of foo.bar or
> foo::bar to be searched for in foo/ subdirectories all along the respective
> equivalents of sys.path.
>
> You now can simply ship a single zope.py to create a virtual "zope" package
> -- a namespace shared by code from multiple distributions.
>
> But wait...  how does that fix the filename collision problem?  Aren't we
> still going to collide on the zope.py file?  Well, that's where the second
> addition comes in.
>
>
> The Second Addition - "Pure Virtual" Packages
> ---------------------------------------------
>
> The second addition is that, if an import fails to find a module entirely,
> then once again, a virtual __path__ is assembled using namespace_subpath()
> calls.  If the path is empty, the import fails.  But if it's non-empty, an
> empty module is created and its __path__ is set.
>
> Voila...  we now have a "pure" virtual package.  (i.e. a package with no
> corresponding "defining" module).  So, if you have a bunch of __init__-free
> zope/ directories on sys.path, you can freely import from them.
>
> But what happens if you DO have an __init__ module somewhere?  Well, because
> we haven't changed the normal import semantics, the first __init__ module
> ends up being a defining module, and by default, its __path__ is set in the
> normal way, just like today.  So, it's not a virtual package, it's a
> standard package.  If you must have a defining module, you'll have to move
> it from zope/__init__.py to zope.py.
>
> (Either that, or use some sort of API call to explicitly request a virtual
> package __path__ to be set up.  But the recommended way to do it would be
> just to move the file up a level.)
>
>
> Impact
> ------
>
> This proposal doesn't affect performance of imports that don't ever *use* a
> virtual package __path__, because the setup is delayed until then.
>
> It doesn't break installation tools: distutils and setuptools both handle
> this without blinking.  You just list your defining module (if you have one)
> in 'py_modules', along with any individual submodules, and you list the
> subpackages in 'packages'.
>
> It doesn't break code-finding tools in any way that other implementation
> proposals don't.  (That is, ALL our proposals allow __init__ to go away, so
> tools are definitely going to require updating; all that differs between
> proposals is precisely what sort of updating is required.)
>
> Really, about the only downside I can see is that this proposal can't be
> implemented purely via a PEP 302 meta-importer in Python 2.x.  The builtin
> __import__ function bails out when __path__ is missing on a parent, so it
> would actually require replacing __import__ in order to implement true
> virtual package support.
>

I have been considering porting the 3.3 importlib to 2.x, for a
variety of reasons.  If the implementation for "virtual namespace
package portions" is done there then this shouldn't be a big deal.

> (For my own personal use case for this PEP in 2.x (i.e., replacing
> setuptools' current mechanism with a PEP-compliant one), it's not too big a
> deal, though, because I was still going to need explicit registration in
> .pth files: no matter the mechanism used, it isn't built into the
> interpreter, so it still has to be bootstrapped somehow!)
>
> Anyway.  Thoughts?
>

Cool idea.  So for users the only difference is that suddenly foo.py
and a foo directory (without __init__.py) can coexist/cooperate, and
__init__.py becomes optional?

-eric


From martin at v.loewis.de  Tue Jul 12 08:03:59 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 12 Jul 2011 08:03:59 +0200
Subject: [Import-SIG] PEP 382: Partial packages
In-Reply-To: <20110711232645.7B60D3A4100@sparrow.telecommunity.com>
References: <20110709212103.00BFC3A404D@sparrow.telecommunity.com>
	<4E19E101.4020508@v.loewis.de> <4E19E867.4020703@trueblade.com>
	<4E1A205A.3010004@v.loewis.de>
	<20110710223044.E77943A4100@sparrow.telecommunity.com>
	<4E1A9DE1.1090809@v.loewis.de>
	<20110711151810.260883A4100@sparrow.telecommunity.com>
	<4E1B7823.9000901@v.loewis.de>
	<20110711232645.7B60D3A4100@sparrow.telecommunity.com>
Message-ID: <4E1BE3CF.1070201@v.loewis.de>

> Because only in Python does a search for "foo/bar" (whatever the
> separator) *stop* the path search when there is a match for "foo/".  In
> PHP, Perl, and Java, searching continues along the path until the entire
> target is matched, regardless of whether the name parts are separated by
> slashes (PHP), dots (Java), or double-colons (Perl).
> 
> That's why Python's behavior here is arguably a misfeature.

Hmm. For PHP, I don't think it's better, just different - you can
*never* include a directory, so the directory is not a recognized entity
in the include mechanism at all. It's the file system that is
hierarchical, not the PHP namespace concept (except for new-style
namespaces, which we seem to agree are unrelated).

> For example, in Java, the class org.Foo is distinct from the namespace
> org.Foo.* in on-disk representation, as you have org/Foo.java (or
> .class) sitting outside the directory org/Foo/ (where any contents of
> org.Foo.* would be located).

Not true; see the attached example. Compiling foo/baz.java gives

foo/baz.java:6: cannot find symbol
symbol  : variable foobar
location: class foo.bar
                System.out.println(foo.bar.foobar.V);

It decides that foo.bar is a class (from foo/bar.java), so
foo.bar.foobar should be something inside the class (such as a nested
class), and the package foo/bar is not considered anymore. If you delete
bar.java, and refer to foo.bar1.V instead in baz.java, it compiles.

IOW, you can't have a class and a package with the same name in Java.

> Similarly in Perl, Foo.pm sits outside the Foo/ directory, thereby
> distinguishing Foo and Foo::.

I agree that this is better.

> If Python were following the model of these other languages, then
> instead of using zope/__init__.py, we would place a zope.py in the
> parent directory, and when importing zope.interface, we would search the
> entire path for zope/ subdirectories containing an interface.py...  but
> we wouldn't look for interface/ directories until/unless we tried to
> import zope.interface.foo.

I think it can actually work, and will propose a PEP (wording) in that
direction shortly.

Regards,
Martin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo.tar
Type: application/x-tar
Size: 10240 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/import-sig/attachments/20110712/77c1a21a/attachment-0001.tar>

From ncoghlan at gmail.com  Tue Jul 12 09:53:13 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 12 Jul 2011 17:53:13 +1000
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110711152605.8A3083A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
	<20110711043932.22F8B3A4100@sparrow.telecommunity.com>
	<CAP1=2W7BhGNwqfumUKWA0b4Pan5EUjAhNHvAeG3omnc483PPCg@mail.gmail.com>
	<20110711051855.484273A4100@sparrow.telecommunity.com>
	<CALFfu7BRHyNPSVFmz6xmtGKCGfuQW_2s=5SHVZPqEb6TWKFF=A@mail.gmail.com>
	<CADiSq7ejFhNrpuwoW7x287dUdEfemGmEcoyn96ga1W0Mqi2fiA@mail.gmail.com>
	<20110711152605.8A3083A4100@sparrow.telecommunity.com>
Message-ID: <CADiSq7eB4UaMdvJfqmxeVo7871Ax38amw0EgLrpf5TY2y0Y7ww@mail.gmail.com>

On Tue, Jul 12, 2011 at 1:25 AM, P.J. Eby <pje at telecommunity.com> wrote:
> At 05:32 PM 7/11/2011 +1000, Nick Coghlan wrote:
>>
>> On Mon, Jul 11, 2011 at 5:04 PM, Eric Snow <ericsnowcurrently at gmail.com>
>> wrote:
>> > FWIW, I think the solution in the PEP is the clearest approach, if
>> > "partitioned by default" is not an option.  And if that and the other
>> > alternate solutions are not feasible, it would be nice to have them
>> > added to the "rejected" section because they are still reasonable
>> > ideas.  Still, it would be nice if we didn't have to add a new
>> > packageness indicator.
>>
>> The runtime performance impact kills "partitioned by default" (i.e. no
>> marker files needed to indicate partitioned packages).
>
> Actually, partitioned by default is the *best* performance option we have
> for implementing this PEP, because it only uses a stat rather than a
> listdir.  Backward compatibility is the thing that kills it.

By "partitioned by default" I meant the prospect of continuing to
search sys.path after finding the email (etc.) directory in the stdlib
zipfile. Slowing down everything in order to speed up a new feature
isn't a good trade-off.

>> As far as the specific suggestion of using a "marker directory"
>> instead of marker files goes, I don't really see the benefit (and
>> plenty of downsides). I put it in the same category as using a special
>> extension on the directory name (since that's what it is, only using
>> "/" as the separator instead of ".") and reject it for the same
>> reasons.
>
> What are the downsides, exactly?  Special extensions don't work with the
> distutils; a prefix does.  (I've tested it.)  Most tools that look for code
> can be given a prefix to look for the code, but not an extension.  It's
> *quite* a different proposition than specially-named directories --
> especially since only the package root is affected, not every subpackage
> directory.

From the revised PEP draft [1] re. a directory suffix:

"""   The downsides, however, are also plentiful.  If a package starts
   its life as a normal package, it must be renamed when it becomes
   a namespace, with the implied consequences for revision control
   tools.

   Further, there is an immense body of existing code (including the
   distutils and many other packaging tools) that expect a package
   directory's name to be the same as the package name.  And porting
   existing Python 2.x namespace packages to Python 3 would require
   widespread directory renaming as well.

   In short, this approach would require a vastly larger number of
   changes to both the standard library and third-party code, for
   a tiny potential performance improvement and a small increase in
   clarity.  It was therefore rejected on "practicality vs. purity"
   grounds."""

[1] http://mail.python.org/pipermail/import-sig/2011-July/000213.html

There are plenty of practical objections to having to move files
around and rename directories in order to turn an ordinary package
into a partitioned package. Those objections are just as valid for the
subdirectory approach as they are for a directory suffix. Dropping a
marker file into the directory is simple by contrast.

As someone that uses a dir tree+file list view to manage my file
system, I also think the subdirectory approach would be absolutely
hideous to navigate and manage. It works for __pycache__ because I
don't care what's in those (most of the time) and they don't have any
subdirectories. But for the actual package source code? And
potentially nested for subpackages? Yuck. Awful UI design.

*ding* <--- lightbulb

However, the __pycache__ example did just trigger an idea that may
give us the best of both worlds.

1. We use a shared marker *directory* called __package__ to indicate
partitioned packages. The import system just does a stat check for
__init__.py and a __package__ subdir to see if a directory is a Python
package directory.

2. All the .pyp files go inside the __package__ subdir rather than
being placed directly in the same directory as the package source
code.

No os.listdir() calls, no need to move files around to create a
partitioned package, no cluttering of the main package directories
with *.pyp files and distro packaging utilities are quite happy with
the idea of multiple packages writing to the same directory.
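
A minimal sketch of the stat-only check described in points 1 and 2.
It assumes the marker directory keeps the provisional name __package__
and that __init__.py wins when both are present (precedence is not
specified above). The function name classify_package_dir is
hypothetical.

```python
import os

MARKER_DIR = "__package__"  # provisional name from the proposal above


def classify_package_dir(path):
    """Classify a directory using existence checks only (no os.listdir()).

    Returns "regular" for a directory with __init__.py, "partitioned" for a
    directory with the marker subdirectory, and None otherwise.
    """
    # Assumption: an __init__.py takes precedence over the marker directory.
    if os.path.isfile(os.path.join(path, "__init__.py")):
        return "regular"
    if os.path.isdir(os.path.join(path, MARKER_DIR)):
        return "partitioned"
    return None
```

Both checks are single stat-style lookups, so the cost per candidate
directory stays constant regardless of how many files it contains.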

Thoughts?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From eric at trueblade.com  Tue Jul 12 09:57:59 2011
From: eric at trueblade.com (Eric V. Smith)
Date: Tue, 12 Jul 2011 03:57:59 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <CADiSq7eB4UaMdvJfqmxeVo7871Ax38amw0EgLrpf5TY2y0Y7ww@mail.gmail.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
	<20110711043932.22F8B3A4100@sparrow.telecommunity.com>
	<CAP1=2W7BhGNwqfumUKWA0b4Pan5EUjAhNHvAeG3omnc483PPCg@mail.gmail.com>
	<20110711051855.484273A4100@sparrow.telecommunity.com>
	<CALFfu7BRHyNPSVFmz6xmtGKCGfuQW_2s=5SHVZPqEb6TWKFF=A@mail.gmail.com>
	<CADiSq7ejFhNrpuwoW7x287dUdEfemGmEcoyn96ga1W0Mqi2fiA@mail.gmail.com>
	<20110711152605.8A3083A4100@sparrow.telecommunity.com>
	<CADiSq7eB4UaMdvJfqmxeVo7871Ax38amw0EgLrpf5TY2y0Y7ww@mail.gmail.com>
Message-ID: <4E1BFE87.6030400@trueblade.com>

On 7/12/2011 3:53 AM, Nick Coghlan wrote:

> *ding* <--- lightbulb
> 
> However, the __pycache__ example did just trigger an idea that may
> give us the best of both worlds.
> 
> 1. We use a shared marker *directory* called __package__ to indicate
> partitioned packages. The import system just does a stat check for
> __init__.py and a __package__ subdir to see if a directory is a Python
> package directory.
> 
> 2. All the .pyp files go inside the __package__ subdir rather than
> being placed directly in the same directory as the package source
> code.

Why would we need the .pyp files, if we already have the __package__
subdir? Isn't the existence of the subdir enough?

The only reason I can think of is for mercurial, which doesn't like
empty directories. But then the file could be anything, and python would
never look for it. For tools like RPM the files in the subdir would need
to be unique per-RPM, but I don't think that's Python's concern.

Eric.


From ncoghlan at gmail.com  Tue Jul 12 10:07:31 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 12 Jul 2011 18:07:31 +1000
Subject: [Import-SIG] One last try: "virtual packages"
In-Reply-To: <20110712010218.5E0873A4100@sparrow.telecommunity.com>
References: <20110712010218.5E0873A4100@sparrow.telecommunity.com>
Message-ID: <CADiSq7fNosrvy7G4tj6CvNgzsR40UdUuBFrScknYgCr+iT5phA@mail.gmail.com>

On Tue, Jul 12, 2011 at 11:01 AM, P.J. Eby <pje at telecommunity.com> wrote:
> Anyway, this'll be my last attempt at coming up with a markerless approach,
> but I do hope you'll take the time to read it carefully, as it has very
> different semantics and performance impacts from my previous proposals, even
> though it may sound quite similar on the surface.

My first reaction is "I like it". It's the only one of the proposals
put forward that will make "Why aren't my packages working?" questions
on Stack Overflow go away. Boilerplate is bad, empty __init__.py files
are boilerplate, and this change would let them die off gracefully.

__init__.py would essentially become the package equivalent of
__slots__ (i.e. declaring that the package was limited to that one
directory).

My second reaction is a work in progress. Going to need to think about
this one for a while :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Tue Jul 12 14:58:08 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 12 Jul 2011 22:58:08 +1000
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <4E1BFE87.6030400@trueblade.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
	<20110711043932.22F8B3A4100@sparrow.telecommunity.com>
	<CAP1=2W7BhGNwqfumUKWA0b4Pan5EUjAhNHvAeG3omnc483PPCg@mail.gmail.com>
	<20110711051855.484273A4100@sparrow.telecommunity.com>
	<CALFfu7BRHyNPSVFmz6xmtGKCGfuQW_2s=5SHVZPqEb6TWKFF=A@mail.gmail.com>
	<CADiSq7ejFhNrpuwoW7x287dUdEfemGmEcoyn96ga1W0Mqi2fiA@mail.gmail.com>
	<20110711152605.8A3083A4100@sparrow.telecommunity.com>
	<CADiSq7eB4UaMdvJfqmxeVo7871Ax38amw0EgLrpf5TY2y0Y7ww@mail.gmail.com>
	<4E1BFE87.6030400@trueblade.com>
Message-ID: <CADiSq7f0PuiEkZUO7gM11jWYmUcb4+KJjLSHg7vKPE6U4Nwjjg@mail.gmail.com>

On Tue, Jul 12, 2011 at 5:57 PM, Eric V. Smith <eric at trueblade.com> wrote:
> Why would we need the .pyp files, if we already have the __package__
> subdir? Isn't the existence of the subdir enough?
>
> The only reason I can think of is for mercurial, which doesn't like
> empty directories. But then the file could be anything, and python would
> never look for it. For tools like RPM the files in the subdir would need
> to be unique per-RPM, but I don't think that's Python's concern.

For the reasons you say - empty directories aren't handled well by
many tools and if the directory is going to have content, then
*somebody* has to define the rules for playing well with others, so it
may as well be us.

However, I wrote this before reading PJE's last piece about virtual
packages. If that idea pans out (and I personally haven't spotted any
problems with it as yet) then we won't need a marker system at all, so
the point will become moot.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From barry at python.org  Tue Jul 12 17:03:29 2011
From: barry at python.org (Barry Warsaw)
Date: Tue, 12 Jul 2011 11:03:29 -0400
Subject: [Import-SIG] One last try: "virtual packages"
In-Reply-To: <20110712010218.5E0873A4100@sparrow.telecommunity.com>
References: <20110712010218.5E0873A4100@sparrow.telecommunity.com>
Message-ID: <20110712110329.337e5d17@resist.wooz.org>

It's a very interesting idea that is worth exploring.  A few things come to
mind:

- Under this scheme it's possible for names in a module to "suddenly" appear.
  E.g. I could install packages that extend existing top level modules like
  `time` or `string`.  This might be a good thing in that it gives 3rd party
  folks a more natural place to add things, but it could also open up a
  land-grab type collision if lots of people want to publish their packages as
  subpackage extensions to existing modules.

- It's unfortunate that this will be more difficult to back port to Python 2.

- It sounds like it will be more difficult to have a single code base that
  supports Python 2, Python 3 <= 3.2, and Python 3.3.  This is because
  __init__.py is required in the first two, but does the wrong thing (I think
  ;) in a post-PEP 382 Python 3.3.  Adding a .pyp file that's ignored in
  anything that doesn't support PEP 382 would make it easier to support
  multiple Pythons.

- This should make vendor packaging tools happy because it does seem to
  eliminate file collisions (duplicate directories don't matter).

Let's see the PEP!
-Barry

From pje at telecommunity.com  Tue Jul 12 17:34:49 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Tue, 12 Jul 2011 11:34:49 -0400
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <CADiSq7f0PuiEkZUO7gM11jWYmUcb4+KJjLSHg7vKPE6U4Nwjjg@mail.g
	mail.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
	<20110711043932.22F8B3A4100@sparrow.telecommunity.com>
	<CAP1=2W7BhGNwqfumUKWA0b4Pan5EUjAhNHvAeG3omnc483PPCg@mail.gmail.com>
	<20110711051855.484273A4100@sparrow.telecommunity.com>
	<CALFfu7BRHyNPSVFmz6xmtGKCGfuQW_2s=5SHVZPqEb6TWKFF=A@mail.gmail.com>
	<CADiSq7ejFhNrpuwoW7x287dUdEfemGmEcoyn96ga1W0Mqi2fiA@mail.gmail.com>
	<20110711152605.8A3083A4100@sparrow.telecommunity.com>
	<CADiSq7eB4UaMdvJfqmxeVo7871Ax38amw0EgLrpf5TY2y0Y7ww@mail.gmail.com>
	<4E1BFE87.6030400@trueblade.com>
	<CADiSq7f0PuiEkZUO7gM11jWYmUcb4+KJjLSHg7vKPE6U4Nwjjg@mail.gmail.com>
Message-ID: <20110712153521.5F4293A4100@sparrow.telecommunity.com>

At 10:58 PM 7/12/2011 +1000, Nick Coghlan wrote:
>For the reasons you say - empty directories aren't handled well by
>many tools and if the directory is going to have content, then
>*somebody* has to define the rules for playing well with others, so it
>may as well be us.
>
>However, I wrote this before reading PJE's last piece about virtual
>packages. If that idea pans out (and I personally haven't spotted any
>problems with it as yet) then we won't need a marker system at all, so
>the point will become moot.

True enough, but for the record, I like the idea.  I had previously 
thought of using a marker directory, but discarded it due to the fact 
that it seemed to make things more complicated to set up a 
package.  However, it occurs to me now that packaging tools can take 
responsibility for adding marker files to the directory, so for the 
end user, you just 'mkdir -p mypkg/py-pkg' or some such.  (I'm not 
keen on __package__ as the name; I'd rather something 
non-importable.  But that's a bikeshed for another time.)

I think one other thing that we can and should do with whatever 
approach we end up with, is to only require one level of 
marker.  There's virtually no benefit to restricting subpackage 
partitioning, because a subpackage's __path__ is always a subset of 
its parent's __path__.  So, as soon as you get down to something that 
only lives in a single directory, it'll be the same as if you'd 
restricted it.  Therefore, any drafts we do from this point forward 
should only require top-level markers.


From pje at telecommunity.com  Tue Jul 12 18:02:40 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Tue, 12 Jul 2011 12:02:40 -0400
Subject: [Import-SIG] One last try: "virtual packages"
In-Reply-To: <20110712110329.337e5d17@resist.wooz.org>
References: <20110712010218.5E0873A4100@sparrow.telecommunity.com>
	<20110712110329.337e5d17@resist.wooz.org>
Message-ID: <20110712160304.A84103A4100@sparrow.telecommunity.com>

At 11:03 AM 7/12/2011 -0400, Barry Warsaw wrote:
>It's a very interesting idea that is worth exploring.  A few things come to
>mind:
>
>- Under this scheme it's possible for names in a module to "suddenly" appear.

Bear in mind that you still have to actually *import* those names, so 
it's not like they really "suddenly" appear.  And when you do import 
them, they'll be *modules*, not functions or classes or constants or anything.

>   E.g. I could install packages that extend existing top level modules like
>   `time` or `string`.  This might be a good thing in that it gives 3rd party
>   folks a more natural place to add things, but it could also open up a
>   land-grab type collision if lots of people want to publish their
>   packages as subpackage extensions to existing modules.

True -- an ironic side-effect, given our intent to make it easier to 
*avoid* such collisions.  ;-)  However, given that this feature will 
probably NOT be available on versions <3.3 by default (see discussion 
below), it probably won't get *too* far out of hand.

Also, because you can't add new module *contents*, there's little 
benefit to doing this anyway.  Your users would have to do "from 
string.foobar import bizbaz" or "import string.foobar as foobar", 
anyway, so why not just make a "foobar.string" module and call it a day?

I also don't think we should really advertise the ability to extend 
other people's packages, except maybe to say, "don't do it."

We could also shut down the capability by requiring virtual packages 
to be declared in the module, if there is a defining module.  That 
would actually work well with cross-version compatibility (see below) 
but would add an extra step when turning a module into a package.


>- It's unfortunate that this will be more difficult to back port to Python 2.

Well, I'm not that bothered by it.  Python 2 still has its two 
existing ways to do this, and it's not *that* terribly hard to make 
an __import__ wrapper.  But there are some things that can be done to 
make it easier.


>- It sounds like it will be more difficult to have a single code base that
>   supports Python 2, Python 3 <= 3.2, and Python 3.3.  This is because
>   __init__.py is required in the first two, but does the wrong thing (I think
>   ;) in a post-PEP 382 Python 3.3.  Adding a .pyp file that's ignored in
>   anything that doesn't support PEP 382 would make it easier to support
>   multiple Pythons.

There's a straightforward way to solve this.  Suppose we have a 
module called 'pep382', with a function 
'make_virtual(packagename)'.  In Python 2.x, setuptools will make 
"distributionname-version-nspkg.pth" files that just say 'import 
pep382; pep382.make_virtual("toplevelnamespace")', and the same 
solution would work for Python 3 through 3.2.  (In the .egg based 
install case, __init__.py gets used and the older API is called, but 
in future setuptools that'll be a wrapper over the pep382 module.)

For Python 3.3, these APIs don't need to be used, but they'll still 
work.  They just won't be doing anything significant.  You can drop 
use of the APIs as you drop support for older Pythons, and code 
targeted to 3.3+ can just do whatever.

For Python < 3.3, you have to get the pep382 module installed and 
activated somehow in order to use the feature.  However, once you do, 
you can use "pure virtual" packages without an __import__ hook, 
because a meta_path importer can catch an otherwise-failed import and 
set up an empty module with a __path__.

IOW, the difficult part of implementing this on 2.x is only the part 
where you allow transitioning from a 'foo' module to a 'foo' package 
without changing the module.  If you're using namespaces the way 
people mostly do now on 2.x, it works without an __import__ hook.

For this reason, I suggest that the default for the 
backwards-compatibility module be to only handle pure-virtual and 
declared-virtual packages, not module-extension virtual 
packages.  That way, the overhead remains low.  (Writing __import__ 
in Python adds overhead to *every* import statement, vs. the 
relatively small and infrequent overheads added by PEP 302 hooks.)


>Let's see the PEP!

Martin said something about working one up along similar lines 
himself; I'm curious to see what his proposal is.


From stephen.c.waterbury at nasa.gov  Tue Jul 12 17:38:44 2011
From: stephen.c.waterbury at nasa.gov (Stephen Waterbury)
Date: Tue, 12 Jul 2011 11:38:44 -0400
Subject: [Import-SIG] One last try: "virtual packages"
In-Reply-To: <20110712110329.337e5d17@resist.wooz.org>
References: <20110712010218.5E0873A4100@sparrow.telecommunity.com>
	<20110712110329.337e5d17@resist.wooz.org>
Message-ID: <4E1C6A84.6020002@nasa.gov>

On 07/12/2011 11:03 AM, Barry Warsaw wrote:
> It's a very interesting idea that is worth exploring.  A few
> things come to mind:
>
> - Under this scheme it's possible for names in a module to
>   "suddenly" appear.  E.g. I could install packages that extend
>   existing top level modules like `time` or `string`.  This
>   might be a good thing in that it gives 3rd party folks a more
>   natural place to add things, but it could also open up a
>   land-grab type collision if lots of people want to publish
>   their packages as subpackage extensions to existing modules.

Names can suddenly appear only if installed and imported,
so that doesn't seem too scary to me.  As to the land-grab
type collision, there are similar dangers today -- we're all
consenting adults here ... ;)

> - It's unfortunate that this will be more difficult to back
>   port to Python 2.

To me the elegance seems worth the price (assuming no big gotchas
that haven't been noticed yet) ... OTOH, I'm not the one doing
the back porting ... "for the man who doesn't have to do it,
nothing is impossible ..." ;)

> - It sounds like it will be more difficult to have a single
>   code base that supports Python 2, Python 3 <= 3.2, and Python
>   3.3.  This is because __init__.py is required in the first two,
>   but does the wrong thing (I think ;) in a post-PEP 382 Python
>   3.3.  Adding a .pyp file that's ignored in anything that
>   doesn't support PEP 382 would make it easier to support
>   multiple Pythons.

That's a consideration, but it seems a fairly simple script could
add the __init__.py files to create a version of the codebase for
the Python versions that require them.  I might be
over-simplifying, though.
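
A sketch of what such a script might look like, assuming every
directory below the given root that holds .py files (directly or in a
subpackage) should become a package. The helper name add_init_files is
made up for illustration.

```python
import os


def add_init_files(root):
    """Create empty __init__.py files below root where older Pythons need them.

    Returns the list of files created. The root itself is treated as a
    sys.path entry, not as a package, so it never gets an __init__.py.
    """
    created = []
    package_dirs = set()
    # Walk bottom-up so a parent can see which children became packages.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if os.path.abspath(dirpath) == os.path.abspath(root):
            continue
        has_python = any(name.endswith(".py") for name in filenames)
        has_subpackage = any(
            os.path.join(dirpath, name) in package_dirs for name in dirnames
        )
        if not (has_python or has_subpackage):
            continue
        package_dirs.add(dirpath)
        init_path = os.path.join(dirpath, "__init__.py")
        if not os.path.exists(init_path):
            with open(init_path, "w"):
                pass  # an empty file is enough to mark the package
            created.append(init_path)
    return created
```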

> - This should make vendor packaging tools happy because it does
>   seem to eliminate file collisions (duplicate directories don't
>   matter).

Right.

> Let's see the PEP!

The peanut gallery is riveted ... :)

Steve


From pje at telecommunity.com  Tue Jul 12 05:17:15 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 11 Jul 2011 23:17:15 -0400
Subject: [Import-SIG] One last try: "virtual packages"
In-Reply-To: <CALFfu7BosWnHzP_og2cX1B-wFAOBRTQXD8BzJG4Ko6Sim6Qn+Q@mail.g
	mail.com>
References: <20110712010218.5E0873A4100@sparrow.telecommunity.com>
	<CALFfu7BosWnHzP_og2cX1B-wFAOBRTQXD8BzJG4Ko6Sim6Qn+Q@mail.gmail.com>
Message-ID: <20110712174734.CE4493A4116@sparrow.telecommunity.com>

At 08:19 PM 7/11/2011 -0600, Eric Snow wrote:
>Cool idea.  So for users the only difference is that suddenly foo.py
>and a foo directory (without __init__.py) can coexist/cooperate, and
>__init__.py becomes optional?

That, and if *no* directory of the given name has an __init__.py, 
then the directory is a virtual package that combines portions spread 
across sys.path.
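
A sketch of that rule, assuming plain filesystem directories only
(zipfiles and other importers are ignored). The helper name
virtual_package_path is hypothetical, and this illustrates the rule as
stated here rather than a final algorithm.

```python
import os
import sys


def virtual_package_path(name, search_path=None):
    """Return the combined __path__ for a virtual package, or None.

    None means some directory called `name` has an __init__.py, so the
    package is an ordinary self-contained one. An empty list means no
    directory of that name exists on the search path at all.
    """
    if search_path is None:
        search_path = sys.path
    portions = []
    for entry in search_path:
        candidate = os.path.join(entry, name)
        if not os.path.isdir(candidate):
            continue
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return None  # a regular package: not virtual
        portions.append(candidate)
    return portions
```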


From barry at python.org  Tue Jul 12 22:03:58 2011
From: barry at python.org (Barry Warsaw)
Date: Tue, 12 Jul 2011 16:03:58 -0400
Subject: [Import-SIG] One last try: "virtual packages"
In-Reply-To: <20110712160304.A84103A4100@sparrow.telecommunity.com>
References: <20110712010218.5E0873A4100@sparrow.telecommunity.com>
	<20110712110329.337e5d17@resist.wooz.org>
	<20110712160304.A84103A4100@sparrow.telecommunity.com>
Message-ID: <20110712160358.4f4f5398@resist.wooz.org>

On Jul 12, 2011, at 12:02 PM, P.J. Eby wrote:

>At 11:03 AM 7/12/2011 -0400, Barry Warsaw wrote:
>>It's a very interesting idea that is worth exploring.  A few things come to
>>mind:
>>
>>- Under this scheme it's possible for names in a module to "suddenly" appear.
>
>Bear in mind that you still have to actually *import* those names, so it's
>not like they really "suddenly" appear.  And when you do import them, they'll
>be *modules*, not functions or classes or constants or anything.

Yeah, I was just thinking about something dumb like a typo in an import
statement, but I think that's nothing realistic to be worried about.

>>   E.g. I could install packages that extend existing top level modules like
>>   `time` or `string`.  This might be a good thing in that it gives 3rd party
>>   folks a more natural place to add things, but it could also open up a
>>   land-grab type collision if lots of people want to publish their packages as
>>   subpackage extensions to existing modules.
>
>True -- an ironic side-effect, given our intent to make it easier to *avoid*
>such collisions.  ;-) However, given that this feature will probably NOT be
>available on versions <3.3 by default (see discussion below), it probably
>won't get *too* far out of hand.

We'll let you eat those words in 15 years when Python 4.7 comes out. :)

>Also, because you can't add new module *contents*, there's little benefit to
>doing this anyway.  Your users would have to do "from string.foobar import
>bizbaz" or "import string.foobar as foobar", anyway, so why not just make a
>"foobar.string" module and call it a day?
>
>I also don't think we should really advertise the ability to extend other
>people's packages, except maybe to say, "don't do it."

Agreed.  I did want to bring this up as a side-effect of the feature.

>We could also shut down the capability by requiring virtual packages to be
>declared in the module, if there is a defining module.  That would actually
>work well with cross-version compatibility (see below) but would add an extra
>step when turning a module into a package.

I'd rather go the other way.  IOW, leave it open by default but perhaps
provide an API that allows a module to declare itself closed to submodules.  I
don't actually expect that to be used much, so I'm happy to call YAGNI on it.
But I don't want to require a defining module for virtual packages, because
that makes it less useful for vendor packagers.

Generally, I think we'd prefer not to have defining modules, but when we do,
we can have the defmod.py owned by exactly one vendor package, and then
submodules would add dependencies on that defining module.  This is actually
one way we currently handle colliding __init__.py files, but it kind of sucks
because it makes packaging submodules more complicated.

>>- It's unfortunate that this will be more difficult to back port to Python 2.
>
>Well, I'm not that bothered by it.  Python 2 still has its two existing ways
>to do this, and it's not *that* terribly hard to make an __import__ wrapper.
>But there are some things that can be done to make it easier.

I'm also not entirely sure I'd want to back port this into our Python 2
versions anyway, at least not without fully understanding the performance and
other implications.  I'd rather spend the effort to get people switched to
Python 3. :)

>>- It sounds like it will be more difficult to have a single code base that
>>   supports Python 2, Python 3 <= 3.2, and Python 3.3.  This is because
>>   __init__.py is required in the first two, but does the wrong thing (I
>>   think ;) in a post-PEP 382 Python 3.3.  Adding a .pyp file that's ignored
>>   in anything that doesn't support PEP 382 would make it easier to support
>>   multiple Pythons.
>
>There's a straightforward way to solve this.  Suppose we have a module called
>'pep382', with a function 'make_virtual(packagename)'.  In Python 2.x,
>setuptools will make "distributionname-version-nspkg.pth" files that just say
>'import pep382; pep382.make_virtual("toplevelnamespace")', and the same
>solution would work for Python 3 through 3.2.  (In the .egg based install
>case, __init__.py gets used and the older API is called, but in future
>setuptools that'll be a wrapper over the pep382 module.)
>
>For Python 3.3, these APIs don't need to be used, but they'll still work.
>They just won't be doing anything significant.  You can drop use of the APIs
>as you drop support for older Pythons, and code targeted to 3.3+ can just do
>whatever.
>
>For Python < 3.3, you have to get the pep382 module installed and activated
>somehow in order to use the feature.  However, once you do, you can use "pure
>virtual" packages without an __import__ hook, because a meta_path importer
>can catch an otherwise-failed import and set up an empty module with a
>__path__.
>
>IOW, the difficult part of implementing this on 2.x is only the part where
>you allow transitioning from a 'foo' module to a 'foo' package without
>changing the module.  If you're using namespaces the way people mostly do now
>on 2.x, it works without an __import__ hook.
>
>For this reason, I suggest that the default for the backwards-compatibility
>module be to only handle pure-virtual and declared-virtual packages, not
>module-extension virtual packages.  That way, the overhead remains low.
>(Writing __import__ in Python adds overhead to *every* import statement,
>vs. the relatively small and infrequent overheads added by PEP 302 hooks.)

I'm less concerned about the foo-module-to-foo-package case, so I'm okay with
that being more difficult in Python < 3.3.

>>Let's see the PEP!
>
>Martin said something about working one up along similar lines himself; I'm
>curious to see what his proposal is.

+1
-Barry

From ncoghlan at gmail.com  Wed Jul 13 03:17:12 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 13 Jul 2011 11:17:12 +1000
Subject: [Import-SIG] What if namespace imports weren't special?
In-Reply-To: <20110712153521.5F4293A4100@sparrow.telecommunity.com>
References: <20110711023503.DEDD63A4100@sparrow.telecommunity.com>
	<CADiSq7cBJb5s=7YP6ze5Mvxry6kze5_BKGtO7NUBCbKzSrL0Ug@mail.gmail.com>
	<20110711035731.C012E3A4100@sparrow.telecommunity.com>
	<20110711043932.22F8B3A4100@sparrow.telecommunity.com>
	<CAP1=2W7BhGNwqfumUKWA0b4Pan5EUjAhNHvAeG3omnc483PPCg@mail.gmail.com>
	<20110711051855.484273A4100@sparrow.telecommunity.com>
	<CALFfu7BRHyNPSVFmz6xmtGKCGfuQW_2s=5SHVZPqEb6TWKFF=A@mail.gmail.com>
	<CADiSq7ejFhNrpuwoW7x287dUdEfemGmEcoyn96ga1W0Mqi2fiA@mail.gmail.com>
	<20110711152605.8A3083A4100@sparrow.telecommunity.com>
	<CADiSq7eB4UaMdvJfqmxeVo7871Ax38amw0EgLrpf5TY2y0Y7ww@mail.gmail.com>
	<4E1BFE87.6030400@trueblade.com>
	<CADiSq7f0PuiEkZUO7gM11jWYmUcb4+KJjLSHg7vKPE6U4Nwjjg@mail.gmail.com>
	<20110712153521.5F4293A4100@sparrow.telecommunity.com>
Message-ID: <CADiSq7dyjKG28WTPzbRG74k_YL2QP+uzE_WF4Kh2QZ9A1zxi-g@mail.gmail.com>

On Wed, Jul 13, 2011 at 1:34 AM, P.J. Eby <pje at telecommunity.com> wrote:
> At 10:58 PM 7/12/2011 +1000, Nick Coghlan wrote:
>>
>> For the reasons you say - empty directories aren't handled well by
>> many tools and if the directory is going to have content, then
>> *somebody* has to define the rules for playing well with others, so it
>> may as well be us.
>>
>> However, I wrote this before reading PJE's last piece about virtual
>> packages. If that idea pans out (and I personally haven't spotted any
>> problems with it as yet) then we won't need a marker system at all, so
>> the point will become moot.
>
> True enough, but for the record, I like the idea.  I had previously thought
> of using a marker directory, but discarded it due to the fact that it seemed
> to make things more complicated to set up a package.  However, it occurs to
> me now that packaging tools can take responsibility for adding marker files
> to the directory, so for the end user, you just 'mkdir -p mypkg/py-pkg' or
> some such.  (I'm not keen on __package__ as the name; I'd rather something
> non-importable.  But that's a bikeshed for another time.)

I think we chose the colour of that particular bikeshed back when
__pycache__ was added :)

> I think one other thing that we can and should do with whatever approach we
> end up with, is to only require one level of marker.  There's virtually no
> benefit to restricting subpackage partitioning, because a subpackage's
> __path__ is always a subset of its parent's __path__.  So, as soon as you
> get down to something that only lives in a single directory, it'll be the
> same as if you'd restricted it.  Therefore, any drafts we do from this point
> forward should only require top-level markers.

+1 on having a multi-path parent imply multi-path support in subpackages.

Given the significant differences between the two approaches, perhaps
the marker directory idea should be written up as the "best of breed"
version of PEP 382 (probably under the name "partitioned packages"),
with a new PEP for the radically different "virtual packages"
alternative? I think publishing the two side-by-side will actually
help sell the virtual packages idea (Option A: Choose which flavour of
boilerplate you want to use to make your packages work; Option B: What
boilerplate?).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From pje at telecommunity.com  Wed Jul 13 19:11:48 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 13 Jul 2011 13:11:48 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
	Partitioning"
Message-ID: <20110713171345.4E0673A4100@sparrow.telecommunity.com>

I'd appreciate any questions, problems, clarifications, concerns, 
etc. so we can clean this up before we run it past Python-Dev.  There 
are also a couple of "XXX" comments down in the "Implementation 
Notes" section, with open questions we need to nail down.  Mostly, 
though, this is looking...  pretty doable, actually.

Thanks!


PEP: XXX
Title: Simplified Package Layout and Partitioning
Version: $Revision$
Last-Modified: $Date$
Author: P.J. Eby
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Jul-2011
Python-Version: 3.3
Post-History:
Replaces: 382

Abstract
========

This PEP proposes an enhancement to Python's package importing
to:

* Surprise users of other languages less,
* Make it easier to convert a module into a package, and
* Support dividing packages into separately installed components
   (a la "namespace packages", as described in PEP 382)

The proposed enhancements do not change the semantics of any
currently-importable directory layouts, but make it possible for
packages to use a simplified directory layout (that is not importable
currently).

However, the proposed changes do NOT add any performance overhead to
the importing of existing modules or packages, and performance for the
new directory layout should be about the same as that of previous
"namespace package" solutions (such as ``pkgutil.extend_path()``).


The Problem
===========

.. epigraph::

     "Most packages are like modules.  Their contents are highly
     interdependent and can't be pulled apart.  [However,] some
     packages exist to provide a separate namespace. ...  It should
     be possible to distribute sub-packages or submodules of these
     [namespace packages] independently."

     -- Jim Fulton, shortly before the release of Python 2.3 [1]_


When new users come to Python from other languages, they are often
confused by Python's packaging semantics.  At Google, for example,
Guido received complaints from "a large crowd with pitchforks" [2]_
that the requirement for packages to contain an ``__init__`` module
was a "misfeature", and should be dropped.

In addition, users coming from languages like Java or Perl are
sometimes confused by a difference in Python's import path searching.

In most other languages that have a path mechanism similar to Python's
``sys.path``, a package is merely a namespace that contains modules
or classes, and can thus be spread across multiple directories in
the language's path.  In Perl, for instance, a ``Foo::Bar`` module
will be searched for in ``Foo/`` subdirectories all along the module
include path, not just in the first such subdirectory found.

Worse, this is not just a problem for new users: it prevents *anyone*
from easily splitting a package into separately-installable
components.  In Perl terms, it would be as if every possible ``Net::``
module on CPAN had to be bundled up and shipped in a single tarball!

For that reason, various workarounds for this latter limitation exist,
circulated under the term "namespace packages".  The Python standard
library has provided one such workaround since Python 2.3 (via the
``pkgutil.extend_path()`` function), and the "setuptools" package
provides another (via ``pkg_resources.declare_namespace()``).
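
As a rough illustration of the standard library workaround, the sketch
below builds two separately "installed" portions of one package, each
carrying the same ``extend_path()`` boilerplate in its ``__init__.py``.
The package name ``nsdemo_extend_path``, the directory names, and the
helper functions are made up for this example; only
``pkgutil.extend_path()`` itself is an existing API.

```python
import importlib
import os
import sys
import tempfile

# The two lines every portion must ship in its own __init__.py
BOILERPLATE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def build_distribution(root, package, submodule, value):
    """Create root/package/__init__.py and root/package/<submodule>.py."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(BOILERPLATE)
    with open(os.path.join(pkg_dir, submodule + ".py"), "w") as f:
        f.write("VALUE = %r\n" % value)
    return root


def demo():
    package = "nsdemo_extend_path"  # hypothetical package name
    with tempfile.TemporaryDirectory() as tmp:
        roots = [
            build_distribution(os.path.join(tmp, "dist_a"), package, "alpha", "a"),
            build_distribution(os.path.join(tmp, "dist_b"), package, "beta", "b"),
        ]
        sys.path[:0] = roots
        importlib.invalidate_caches()
        try:
            # Both submodules import, although they live in different directories
            alpha = importlib.import_module(package + ".alpha")
            beta = importlib.import_module(package + ".beta")
            return alpha.VALUE, beta.VALUE
        finally:
            # Undo the changes to global import state
            for root in roots:
                sys.path.remove(root)
            stale = [n for n in sys.modules
                     if n == package or n.startswith(package + ".")]
            for name in stale:
                del sys.modules[name]
```

Note that both portions contain an identical ``__init__.py``, which is
exactly the duplication discussed next.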

The workarounds themselves, however, fall prey to a *third* issue with
Python's way of laying out packages in the filesystem.

Because a package *must* contain an ``__init__`` module, any attempt
to distribute modules for that package must necessarily include that
``__init__`` module, if those modules are to be importable.

However, the very fact that each distribution of modules for a package
must contain this (duplicated) ``__init__`` module, means that OS
vendors who package up these module distributions must somehow handle
the conflict caused by several distributions installing that
``__init__`` module to the same location in the filesystem.

This led to the proposal of PEP 382 ("Namespace Packages") - a way
to signal to Python's import machinery that a directory was
importable, using unique filenames per module distribution.

However, there was more than one downside to this approach.
Performance for all import operations would be affected, and the
process of designating a package became even more complex.  New
terminology had to be invented to explain the solution, and so on.

As terminology discussions continued on the Import-SIG, it soon became
apparent that the main reason it was so difficult to explain the
concepts related to "namespace packages" was because Python's
current way of handling packages is somewhat underpowered, when
compared to other languages.

That is, in other popular languages with package systems, no special
term is needed to describe "namespace packages", because *all*
packages generally behave in the desired fashion.

Rather than being an isolated single directory with a special marker
module (as in Python), packages in other languages are typically just
a *union of appropriately-named directories* across the *entire*
import or inclusion path.

In Perl, for example, the module ``Foo`` is always found in a
``Foo.pm`` file, and a module ``Foo::Bar`` is always found in a
``Foo/Bar.pm`` file.  (In other words, there is One Obvious Way to
find the location of a particular module.)

This is because Perl considers a module to be *different* from a
package: the package is purely a *namespace* in which other modules
may reside, and is only *coincidentally* the name of a module as well.

In current versions of Python, however, the module and the package are
more tightly bound together.  ``Foo`` is always a module -- whether it
is found in ``Foo.py`` or ``Foo/__init__.py`` -- and it is tightly
linked to its submodules (if any), which *must* reside in the exact
same directory where the ``__init__.py`` was found.

On the positive side, this design choice means that a package is quite
self-contained, and can be installed, copied, etc. as a unit just by
performing an operation on the package's root directory.

On the negative side, however, it is non-intuitive for beginners, and
requires a more complex step to turn a module into a package.  If
``Foo`` begins its life as ``Foo.py``, then it must be moved and
renamed to ``Foo/__init__.py``.

Conversely, if you intend to create a ``Foo.Bar`` module from the
start, but have no particular module contents to put in ``Foo``
itself, then you have to create an empty and seemingly-irrelevant
``Foo/__init__.py`` file, just so that ``Foo.Bar`` can be imported.

(And these issues don't just confuse newcomers to the language,
either: they annoy many experienced developers as well.)

So, after some discussion on the Import-SIG, this PEP was created
as an alternative to PEP \382, in an attempt to solve *all* of the
above problems, not just the "namespace package" use cases.

And, as a delightful side effect, the solution proposed in this PEP
does not affect the import performance of ordinary modules or
self-contained (i.e. ``__init__``-based) packages.


The Solution
============

In the past, various proposals have been made to allow more intuitive
approaches to package directory layout.  However, most of them failed
because of an apparent backward-compatibility problem.

That is, if the requirement for an ``__init__`` module were simply
dropped, it would open up the possibility for a directory named, say,
``string`` on ``sys.path``, to block importing of the standard library
``string`` module.

Paradoxically, however, the failure of this approach does *not* arise
from the elimination of the ``__init__`` requirement!

Rather, the failure arises because the underlying approach takes for
granted that a package is just ONE thing, instead of two.

In truth, a package comprises two separate, but related entities: a
module (with its own, optional contents), and a *namespace* where
*other* modules or packages can be found.

In current versions of Python, however, the module part (found in
``__init__``) and the namespace for submodule imports (represented
by the ``__path__`` attribute) are both initialized at the same time,
when the package is first imported.

And, if you assume this is the *only* way to initialize these two
things, then there is no way to drop the need for an ``__init__``
module, while still being backwards-compatible with existing directory
layouts.

After all, as soon as you encounter a directory on ``sys.path``
matching the desired name, that means you've "found" the package, and
must stop searching, right?

Well, not quite.


A Thought Experiment
--------------------

Let's hop into the time machine for a moment, and pretend we're back
in the early 1990s, shortly before Python packages and ``__init__.py``
have been invented.  But, imagine that we *are* familiar with
Perl-like package imports, and we want to implement a similar system
in Python.

We'd still have Python's *module* imports to build on, so we could
certainly conceive of having ``Foo.py`` as a parent ``Foo`` module
for a ``Foo`` package.  But how would we implement submodule and
subpackage imports?

Well, if we didn't have the idea of ``__path__`` attributes yet,
we'd probably just search ``sys.path`` looking for ``Foo/Bar.py``.

But we'd *only* do it when someone actually tried to *import*
``Foo.Bar``.

NOT when they imported ``Foo``.

And *that* lets us get rid of the backwards-compatibility problem
of dropping the ``__init__`` requirement, back here in 2011.

How?

Well, when we ``import Foo``, we're not even *looking* for ``Foo/``
directories on ``sys.path``, because we don't *care* yet.  The only
point at which we care, is the point when somebody tries to actually
import a submodule or subpackage of ``Foo``.

That means that if ``Foo`` is a standard library module (for example),
and I happen to have a ``Foo`` directory on ``sys.path`` (without
an ``__init__.py``, of course), then *nothing breaks*.  The ``Foo``
module is still just a module, and it's still imported normally.


Self-Contained vs. "Virtual" Packages
-------------------------------------

Of course, in today's Python, trying to ``import Foo.Bar`` will
fail if ``Foo`` is just a ``Foo.py`` module (and thus lacks a
``__path__`` attribute).

So, this PEP proposes to *dynamically* create a ``__path__``, in the
case where one is missing.

That is, if I try to ``import Foo.Bar`` the proposed change to the
import machinery will notice that the ``Foo`` module lacks a
``__path__``, and will therefore try to *build* one before proceeding.

And it will do this by making a list of all the existing ``Foo/``
subdirectories of the directories listed in ``sys.path``.

If the list is empty, the import will fail with ``ImportError``, just
like today.  But if the list is *not* empty, then it is saved in
a new ``Foo.__path__`` attribute, making the module a "virtual
package".

That is, because it now has a valid ``__path__``, we can proceed
to import submodules or subpackages in the normal way.
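
A minimal sketch of that step, assuming a plain filesystem search (the
actual specification below goes through importer objects instead).  The
function name ``ensure_virtual_path`` and its explicit ``search_path``
argument are illustrative only, not part of the proposal:

```python
import os


def ensure_virtual_path(module, search_path):
    """Build a __path__ for `module` on demand, if it does not have one."""
    if hasattr(module, "__path__"):
        # Classic, self-contained package: leave it exactly as it is
        return module.__path__

    # Only the last dotted component names the directory to look for
    name = module.__name__.rpartition(".")[2]
    found = [
        os.path.join(entry, name)
        for entry in search_path
        if os.path.isdir(os.path.join(entry, name))
    ]

    if not found:
        # Same outcome as today: the module is simply not a package
        raise ImportError("%s is not a package" % module.__name__)

    # The module now becomes a "virtual package"
    module.__path__ = found
    return found
```

For a top-level module, ``search_path`` would be ``sys.path``; the
module is modified only when at least one matching directory exists.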

Now, notice that this change does not affect "classic", self-contained
packages that have an ``__init__`` module in them.  Such packages
already *have* a ``__path__`` attribute (initialized at import time)
so the import machinery won't try to create another one later.

This means that (for example) the standard library ``email`` package
will not be affected in any way by you having a bunch of unrelated
directories named ``email`` on ``sys.path``.

But it *does* mean that if you want to turn your ``Foo`` module into
a ``Foo`` package, all you have to do is add a ``Foo/`` directory
somewhere on ``sys.path``, and start adding modules to it.

But what if you only want a "namespace package"?  That is, a package
that is *only* a namespace for various separately-distributed
submodules and subpackages?

For example, if you're Zope Corporation, distributing dozens of
separate tools like ``zc.buildout``, each in packages under the ``zc``
namespace, you don't want to have to make and include an empty
``zc.py`` in every tool you ship.  (And, if you're a Linux or other
OS vendor, you don't want to deal with the package conflicts created
by trying to install ten copies of ``zc.py`` to the same location!)

No problem.  All we have to do is make one more minor tweak to the
import process: if the "classic" import process fails to find a
self-contained module or package (e.g., if ``import zc`` fails to find
a ``zc.py`` or ``zc/__init__.py``), then we once more try to build a
``__path__`` by searching for all the ``zc/`` directories on
``sys.path``, and putting them in a list.

If this list is empty, we raise ``ImportError``.  But if it's
non-empty, we create an empty ``zc`` module, and put the list in
``zc.__path__``.  Congratulations: ``zc`` is now a namespace-only,
"pure virtual" package!  It has no module contents, but you can still
import submodules and subpackages from it, regardless of where they're
located on ``sys.path``.
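
The following sketch imitates that fallback by hand, to show that an
empty module object with a ``__path__`` is all the existing machinery
needs in order to import submodules.  The name ``zc_demo_virtual``, the
helper ``make_pure_virtual()``, and the explicit registration in
``sys.modules`` are assumptions made for the example; under this
proposal the import system would do the equivalent work itself:

```python
import importlib
import os
import sys
import tempfile
import types


def make_pure_virtual(name, search_path):
    """Create an empty module whose __path__ lists every name/ directory."""
    found = [
        os.path.join(entry, name)
        for entry in search_path
        if os.path.isdir(os.path.join(entry, name))
    ]
    if not found:
        raise ImportError("No module named %s" % name)
    module = types.ModuleType(name)
    module.__path__ = found
    return module


def demo():
    name = "zc_demo_virtual"  # hypothetical namespace-only package
    with tempfile.TemporaryDirectory() as tmp:
        roots = []
        # Two separately installed portions; neither has an __init__.py
        for dist, sub in (("dist_a", "buildout"), ("dist_b", "recipe")):
            pkg_dir = os.path.join(tmp, dist, name)
            os.makedirs(pkg_dir)
            with open(os.path.join(pkg_dir, sub + ".py"), "w") as f:
                f.write("NAME = %r\n" % sub)
            roots.append(os.path.join(tmp, dist))

        sys.modules[name] = make_pure_virtual(name, roots)
        importlib.invalidate_caches()
        try:
            # Submodule imports are resolved through the parent's __path__
            first = importlib.import_module(name + ".buildout")
            second = importlib.import_module(name + ".recipe")
            return first.NAME, second.NAME
        finally:
            stale = [n for n in sys.modules
                     if n == name or n.startswith(name + ".")]
            for loaded in stale:
                del sys.modules[loaded]
```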

(By the way, both of these additions to the import protocol (i.e. the
dynamically-added ``__path__``, and dynamically-created modules)
apply recursively to child packages, using the parent package's
``__path__`` in place of ``sys.path`` as a basis for generating a
child ``__path__``.  This means that self-contained and virtual
packages can contain each other without limitation, with the caveat
that if you put a virtual package inside a self-contained one, it's
gonna have a really short ``__path__``!)


Backwards Compatibility and Performance
---------------------------------------

Notice that these two changes *only* affect import operations that
today would result in ``ImportError``.  As a result, the performance
of imports that do not involve virtual packages is unaffected, and
potential backward compatibility issues are very restricted.

Today, if you try to import submodules or subpackages from a module
with no ``__path__``, it's an immediate error.  And of course, if you
don't have a ``zc.py`` or ``zc/__init__.py`` somewhere on ``sys.path``
today, ``import zc`` would likewise fail.

Thus, the only potential backwards-compatibility issues are:

1. Tools that expect package directories to have an ``__init__``
    module, that expect directories without an ``__init__`` module
    to be unimportable, or that expect ``__path__`` attributes to be
    static, will not recognize virtual packages as packages.

    (In practice, this just means that tools will need updating to
    support virtual packages, e.g. by using ``pkgutil.walk_packages()``
    instead of using hardcoded filesystem searches.)

2. Code that *expects* certain imports to fail may now do something
    unexpected.  This should be fairly rare in practice, as most sane,
    non-test code does not import things that are expected not to
    exist!

The biggest likely exception to the above would be when a piece of
code tries to check whether some package is installed by importing
it.  If this is done *only* by importing a top-level module (i.e., not
checking for a ``__version__`` or some other attribute), *and* there
is a directory of the same name as the sought-for package on
``sys.path`` somewhere, *and* the package is not actually installed,
then such code could perhaps be fooled into thinking a package is
installed that really isn't.

However, even in this case, the failure is more likely to be annoying
than damaging; in most cases, the code will simply fail a little later
on, when it actually tries to DO something with the imported (but
empty) module.  (And code that checks for a ``__version__`` attribute
or the presence of some desired function, class, or module
in the package will not see such a false positive result in the
first place.)
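
For instance, an "is it installed?" check that looks for an expected
attribute, rather than relying on the import alone, might be sketched
like this.  The helper name ``is_really_installed`` is made up for the
example:

```python
import importlib


def is_really_installed(name, required_attr):
    """Return True only if `name` imports AND provides `required_attr`."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return False
    # An empty virtual package would import fine, but fail this test
    return hasattr(module, required_attr)
```

Calling ``is_really_installed("json", "dumps")`` succeeds for the real
standard library module, while a module object with no contents would
be rejected regardless of how it came to be importable.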

Meanwhile, tools that expect to locate packages and modules by
walking a directory tree can be updated to use the existing
``pkgutil.walk_packages()`` API, and tools that need to inspect
packages in memory should use the other APIs described in the
`Standard Library Changes/Additions`_ section below.


Specification
=============

Two changes are made to the existing import process.

First, the built-in ``__import__`` function must not raise an
``ImportError`` when importing a submodule of a module with no
``__path__``.  Instead, it must attempt to *create* a ``__path__``
attribute for the parent module, as described in `__path__ creation`_
below.

Second, if searching ``sys.meta_path`` and ``sys.path`` (or a parent
package ``__path__``) fails to find a module, the import process must
also attempt to create a ``__path__`` attribute for the non-existent
module.  If the attempt succeeds, an empty module is created and its
``__path__`` is set.  Otherwise, importing fails.

In both of the above cases, if a non-empty ``__path__`` is created,
the name of the module whose ``__path__`` was created is added to
``sys.virtual_packages`` -- an initially-empty set of package names.

Conversely, if an empty ``__path__`` results, an ``ImportError``
is immediately raised, and the module is not created or changed, nor
is its name added to ``sys.virtual_packages``.

(This way, code that extends ``sys.path`` at runtime can find out
what virtual packages are currently imported, and thereby add any
new subdirectories to those packages' ``__path__`` attributes.  See
`Standard Library Changes/Additions`_ below for more details.)


``__path__`` Creation
---------------------

A virtual ``__path__`` is created by obtaining a PEP 302 "importer"
object for each of the path entries found in ``sys.path`` (for a
top-level module) or the parent ``__path__`` (for a submodule).

(Note: because ``sys.meta_path`` importers are not associated with
``sys.path`` or ``__path__`` entry strings, such importers do *not*
participate in this process.)

Each importer is checked for a ``get_subpath()`` method, and if
present, the method is called with the full name of the module the
``__path__`` is being constructed for.  The return value is either
a string representing a package subdirectory, or ``None`` if no such
subdirectory exists.

The strings returned by each importer are added to the ``__path__``
being built, in the same order as they are found.  (``None`` values
and missing ``get_subpath()`` methods are simply skipped.)

In Python code, the algorithm would look something like this::

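     import pkgutil
     import sys

     def get_virtual_path(modulename, parent_path=None):
         """Return the list of subdirectories making up a virtual __path__."""

         # Top-level modules are searched for along sys.path
         if parent_path is None:
             parent_path = sys.path

         path = []

         for entry in parent_path:
             # Obtain a PEP 302 importer object - see pkgutil module
             importer = pkgutil.get_importer(entry)

             # Importers without get_subpath() are simply skipped
             if hasattr(importer, 'get_subpath'):
                 subpath = importer.get_subpath(modulename)
                 if subpath is not None:
                     path.append(subpath)

         return path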

And a function like this one should be exposed in the standard
library as ``imp.get_virtual_path()``, so that people creating
``__import__`` replacements or ``sys.meta_path`` hooks can reuse it.
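
Since no existing importer implements ``get_subpath()`` yet, the
self-contained sketch below pairs the same loop with a hypothetical
``DirectoryImporter`` class, to show how a child package's ``__path__``
would be derived from its parent's.  The class, the ``get_importer``
parameter, and the interpretation that only the last dotted component
names the subdirectory are all assumptions of this example:

```python
import os
import tempfile


class DirectoryImporter:
    """Hypothetical importer for one filesystem path entry."""

    def __init__(self, path_entry):
        self.path_entry = path_entry

    def get_subpath(self, fullname):
        # The parent components are already reflected in the path entry,
        # so only the final component names the subdirectory
        candidate = os.path.join(self.path_entry, fullname.rpartition(".")[2])
        return candidate if os.path.isdir(candidate) else None


def get_virtual_path(modulename, parent_path, get_importer=DirectoryImporter):
    """Same loop as above, with the importer lookup made pluggable."""
    path = []
    for entry in parent_path:
        importer = get_importer(entry)
        if hasattr(importer, "get_subpath"):
            subpath = importer.get_subpath(modulename)
            if subpath is not None:
                path.append(subpath)
    return path


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        roots = [os.path.join(tmp, d) for d in ("a", "b", "c")]
        os.makedirs(os.path.join(roots[0], "zope", "app"))
        os.makedirs(os.path.join(roots[1], "zope"))
        os.makedirs(roots[2])

        # Top-level package: search the (stand-in) sys.path
        top = get_virtual_path("zope", roots)
        # Child package: search the parent's freshly built __path__
        child = get_virtual_path("zope.app", top)

        return (
            [os.path.relpath(p, tmp) for p in top],
            [os.path.relpath(p, tmp) for p in child],
        )
```

Here ``zope`` is found in two of the three path entries, while
``zope.app`` exists in only one of the ``zope/`` directories.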


Standard Library Changes/Additions
----------------------------------

The ``pkgutil`` module should be updated to handle this
specification appropriately, including any necessary changes to
``extend_path()``, ``iter_modules()``, etc.  A new generic API for
calling ``get_subpath()`` on importers should be added as well.

Specifically the proposed changes and additions to ``pkgutil`` are:

* A new ``get_subpath(importer, fullname)`` generic function, allowing
   implementations to be registered for existing importers.

* A new ``extend_virtual_paths(path_entry)`` function, to extend
   existing, already-imported virtual packages' ``__path__`` attributes
   to include any portions found in a new ``sys.path`` entry.  This
   function should be called by applications extending ``sys.path``
   at runtime, e.g. when adding a plugin directory or an egg to the
   path.

   The implementation of this function does a simple top-down traversal
   of ``sys.virtual_packages``, and performs any necessary
   ``get_subpath()`` calls to identify what path entries need to
   be added to each package's ``__path__``, given that `path_entry`
   has been added to ``sys.path``.  (Or, in the case of sub-packages,
   adding a derived subpath entry, based on their parent namespace's
   ``__path__``.)

* A new ``iter_virtual_packages(parent='')`` function to allow
   top-down traversal of virtual packages in ``sys.virtual_packages``,
   by yielding the child virtual packages of `parent`.  For example,
   calling ``iter_virtual_packages("zope")`` might yield ``zope.app``
   and ``zope.products`` (if they are imported virtual packages listed
   in ``sys.virtual_packages``), but **not** ``zope.foo.bar``.
   (This function is needed to implement ``extend_virtual_paths()``,
   but is also potentially useful for other code that needs to inspect
   imported virtual packages.)

* ``ImpImporter.iter_modules()`` should be changed to also detect and
   yield the names of modules found in virtual packages.
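
The parent/child filtering described for ``iter_virtual_packages()``
above might be sketched as follows.  Because ``sys.virtual_packages``
does not exist yet, this version takes the set of names as an explicit
first argument; the sorted iteration order is also an assumption, added
only to make the output predictable:

```python
def iter_virtual_packages(virtual_packages, parent=""):
    """Yield the names in `virtual_packages` that are direct children of `parent`."""
    prefix = parent + "." if parent else ""
    for name in sorted(virtual_packages):
        if not name.startswith(prefix):
            continue
        remainder = name[len(prefix):]
        # Direct children have exactly one more component than the parent
        if remainder and "." not in remainder:
            yield name
```

With the names ``zope``, ``zope.app``, ``zope.products`` and
``zope.foo.bar`` in the set, asking for the children of ``zope`` yields
``zope.app`` and ``zope.products``, but not ``zope.foo.bar``.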

In addition to the above changes, the ``zipimport`` importer should
have its ``iter_modules()`` implementation similarly changed.  (Note:
current versions of Python implement this via a shim in ``pkgutil``,
so technically this is also a change to ``pkgutil``.)

Last, but not least, the ``imp`` module should expose the algorithm
described in the `__path__ creation`_ section above, as a
``get_virtual_path(modulename, parent_path=None)`` function, so that
creators of ``__import__`` replacements can use it.


Implementation Notes
--------------------

For users, developers, and distributors of virtual packages:

* ``sys.virtual_packages`` is allowed to contain non-existent or
   not-yet-imported package names; code that uses its contents should
   not assume that every name in this set is also present in
   ``sys.modules`` or that importing the name will necessarily succeed.

* If you are changing a currently self-contained package into a
   virtual one, it's important to note that you can no longer use its
   ``__file__`` attribute to locate data files stored in a package
   directory.  Instead, you must search ``__path__`` or use the
   ``__file__`` of a submodule adjacent to the desired files, or
   of a self-contained subpackage that contains the desired files.

* XXX what is the __file__ of a "pure virtual" package?  ``None``?
   Some arbitrary string?  The path of the first directory with a
   trailing separator?  No matter what we put, *some* code is
   going to break, but the last choice might allow some code to
   accidentally work.  Is that good or bad?
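
Regarding the data file note above, searching ``__path__`` instead of
relying on ``__file__`` could look roughly like this.  The helper name
``find_data_file`` is invented for the example, and it assumes that
every ``__path__`` entry is an ordinary filesystem directory:

```python
import os


def find_data_file(package, filename):
    """Return the first path/filename that exists in package.__path__, or None."""
    for directory in getattr(package, "__path__", []):
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Because a virtual package may span several directories, the first match
wins here; code that needs every copy would collect all candidates
instead of returning early.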


For those implementing PEP \302 importer objects:

* Importers that support the ``iter_modules()`` method (used by
   ``pkgutil`` to locate importable modules and packages) and want to
   add virtual package support should modify their ``iter_modules()``
   method so that it discovers and lists virtual packages as well as
   standard modules and packages.  To do this, the importer should
   simply list all immediate subdirectory names in its jurisdiction
   that are valid Python identifiers.

   XXX This might list a lot of not-really-packages.  Should we
   require importable contents to exist?  If so, how deep do we
   search, and how do we prevent e.g. link loops, or traversing onto
   different filesystems, etc.?  Ick.

* "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
   not need to implement ``get_subpath()``, because the method
   is only called on importers corresponding to ``sys.path`` entries
   and ``__path__`` entries.  If a meta importer wishes to support
   virtual packages, it must do so entirely within its own
   ``find_module()`` implementation.

   Unfortunately, it is unlikely that any such implementation will be
   able to merge its package subpaths with those of other meta
   importers or ``sys.path`` importers, so the meaning of "supporting
   virtual packages" for a meta importer is currently undefined!

   (However, since the intended use case for meta importers is to
   replace Python's normal import process entirely for some subset of
   modules, and the number of such importers currently implemented is
   quite small, this seems unlikely to be a big issue in practice.)


References
==========

.. [1] "namespace" vs "module" packages (mailing list thread)
    (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html)

.. [2] "Dropping __init__.py requirement for subpackages"
    (http://mail.python.org/pipermail/python-dev/2006-April/064400.html)


Copyright
=========

This document has been placed in the public domain.


..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End:


From ericsnowcurrently at gmail.com  Thu Jul 14 00:27:01 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 13 Jul 2011 16:27:01 -0600
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
	Partitioning"
In-Reply-To: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
Message-ID: <CALFfu7Ct4Fzv0TZgeJhC0Xnh7aPASbidOh5Hd1DnfCDk7oSVEg@mail.gmail.com>

On Wed, Jul 13, 2011 at 11:11 AM, P.J. Eby <pje at telecommunity.com> wrote:
> I'd appreciate any questions, problems, clarifications, concerns, etc. so we
> can clean this up before we run it past Python-Dev.  There are also a couple
> of "XXX" comments down in the "Implementation Notes" section, with open
> questions we need to nail down.  Mostly, though, this is looking...  pretty
> doable, actually.
>

This is cool stuff.  And you have presented it really well.  I have
some (probably too much) feedback inline.

> Thanks!
>
>
> PEP: XXX
> Title: Simplified Package Layout and Partitioning
> Version: $Revision$
> Last-Modified: $Date$
> Author: P.J. Eby
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 12-Jul-2011
> Python-Version: 3.3
> Post-History:
> Replaces: 382
>
> Abstract
> ========
>
> This PEP proposes an enhancement to Python's package importing
> to:
>
> * Surprise users of other languages less,
> * Make it easier to convert a module into a package, and
> * Support dividing packages into separately installed components
>   (ala "namespace packages", as described in PEP 382)
>
> The proposed enhancements do not change the semantics of any
> currently-importable directory layouts, but make it possible for
> packages to use a simplified directory layout (that is not importable
> currently).
>
> However, the proposed changes do NOT add any performance overhead to
> the importing of existing modules or packages, and performance for the
> new directory layout should be about the same as that of previous
> "namespace package" solutions (such as ``pkgutil.extend_path()``).
>
>
> The Problem
> ===========
>
> .. epigraph::
>
>    "Most packages are like modules.  Their contents are highly
>    interdependent and can't be pulled apart.  [However,] some
>    packages exist to provide a separate namespace. ...  It should
>    be possible to distribute sub-packages or submodules of these
>    [namespace packages] independently."
>
>    -- Jim Fulton, shortly before the release of Python 2.3 [1]_
>
>
> When new users come to Python from other languages, they are often
> confused by Python's packaging semantics.  At Google, for example,
> Guido received complaints from "a large crowd with pitchforks" [2]_
> that the requirement for packages to contain an ``__init__`` module
> was a "misfeature", and should be dropped.
>
> In addition, users coming from languages like Java or Perl are
> sometimes confused by a difference in Python's import path searching.
>
> In most other languages that have a path mechanism to Python's

... mechanism similar to Python's

> ``sys.path``, a package is merely a namespace that contains modules
> or classes, and can thus be spread across multiple directories in
> the language's path.  In Perl, for instance, a ``Foo::Bar`` module
> will be searched for in ``Foo/`` subdirectories all along the module
> include path, not just in the first such subdirectory found.
>
> Worse, this is not just a problem for new users: it prevents *anyone*
> from easily splitting a package into separately-installable
> components.  In Perl terms, it would be as if every possible ``Net::``
> module on CPAN had to be bundled up and shipped in a single tarball!
>
> For that reason, various workarounds for this latter limitation exist,
> circulated under the term "namespace packages".  The Python standard
> library has provided one such workaround since Python 2.3 (via the
> ``pkgutil.extend_path()`` function), and the "setuptools" package
> provides another (via ``pkg_resources.declare_namespace()``).
>
> The workarounds themselves, however, fall prey to a *third* issue with
> Python's way of laying out packages in the filesystem.
>
> Because a package *must* contain an ``__init__`` module, any attempt
> to distribute modules for that package must necessarily include that
> ``__init__`` module, if those modules are to be importable.
>
> However, the very fact that each distribution of modules for a package
> must contain this (duplicated) ``__init__`` module, means that OS
> vendors who package up these module distributions must somehow handle
> the conflict caused by several distributions installing that
> ``__init__`` module to the same location in the filesystem.
>
> This led to the proposing of PEP 382 ("Namespace Packages") - a way
> to signal to Python's import machinery that a directory was
> importable, using unique filenames per module distribution.
>
> However, there was more than one downside to this approach.
> Performance for all import operations would be affected, and the
> process of designating a package became even more complex.  New
> terminology had to be invented to explain the solution, and so on.
>
> As terminology discussions continued on the Import-SIG, it soon became
> apparent that the main reason it was so difficult to explain the
> concepts related to "namespace packages" was because Python's
> current way of handling packages is somewhat underpowered, when
> compared to other languages.
>
> That is, in other popular languages with package systems, no special
> term is needed to describe "namespace packages", because *all*
> packages generally behave in the desired fashion.
>
> Rather than being an isolated single directory with a special marker
> module (as in Python), packages in other languages are typically just
> a *union of appropriately-named directories* across the *entire*
> import or inclusion path.
>
> In Perl, for example, the module ``Foo`` is always found in a
> ``Foo.pm`` file, and a module ``Foo::Bar`` is always found in a
> ``Foo/Bar.pm`` file.  (In other words, there is One Obvious Way to
> find the location of a particular module.)
>
> This is because Perl considers a module to be *different* from a
> package: the package is purely a *namespace* in which other modules
> may reside, and is only *coincidentally* the name of a module as well.
>
> In current versions of Python, however, the module and the package are
> more tightly bound together.  ``Foo`` is always a module -- whether it
> is found in ``Foo.py`` or ``Foo/__init__.py`` -- and it is tightly
> linked to its submodules (if any), which *must* reside in the exact
> same directory where the ``__init__.py`` was found.
>
> On the positive side, this design choice means that a package is quite
> self-contained, and can be installed, copied, etc. as a unit just by
> performing an operation on the package's root directory.
>
> On the negative side, however, it is non-intuitive for beginners, and
> requires a more complex step to turn a module into a package.  If
> ``Foo`` begins its life as ``Foo.py``, then it must be moved and
> renamed to ``Foo/__init__.py``.
>
> Conversely, if you intend to create a ``Foo.Bar`` module from the
> start, but have no particular module contents to put in ``Foo``
> itself, then you have to create an empty and seemingly-irrelevant
> ``Foo/__init__.py`` file, just so that ``Foo.Bar`` can be imported.
>
> (And these issues don't just confuse newcomers to the language,
> either: they annoy many experienced developers as well.)
>
> So, after some discussion on the Import-SIG, this PEP was created
> as an alternative to PEP \382, in an attempt to solve *all* of the
> above problems, not just the "namespace package" use cases.
>
> And, as a delightful side effect, the solution proposed in this PEP
> does not affect the import performance of ordinary modules or
> self-contained (i.e. ``__init__``-based) packages.
>
>
> The Solution
> ============
>
> In the past, various proposals have been made to allow more intuitive
> approaches to package directory layout.  However, most of them failed
> because of an apparent backward-compatibility problem.
>
> That is, if the requirement for an ``__init__`` module were simply
> dropped, it would open up the possibility for a directory named, say,
> ``string`` on ``sys.path``, to block importing of the standard library
> ``string`` module.
>
> Paradoxically, however, the failure of this approach does *not* arise
> from the elimination of the ``__init__`` requirement!
>
> Rather, the failure arises because the underlying approach takes for
> granted that a package is just ONE thing, instead of two.
>
> In truth, a package comprises two separate, but related entities: a
> module (with its own, optional contents), and a *namespace* where
> *other* modules or packages can be found.
>
> In current versions of Python, however, the module part (found in
> ``__init__``) and the namespace for submodule imports (represented
> by the ``__path__`` attribute) are both initialized at the same time,
> when the package is first imported.
>
> And, if you assume this is the *only* way to initialize these two
> things, then there is no way to drop the need for an ``__init__``
> module, while still being backwards-compatible with existing directory
> layouts.
>
> After all, as soon as you encounter a directory on ``sys.path``
> matching the desired name, that means you've "found" the package, and
> must stop searching, right?
>
> Well, not quite.
>
>
> A Thought Experiment
> --------------------
>
> Let's hop into the time machine for a moment, and pretend we're back
> in the early 1990s, shortly before Python packages and ``__init__.py``
> have been invented.  But, imagine that we *are* familiar with
> Perl-like package imports, and we want to implement a similar system
> in Python.
>
> We'd still have Python's *module* imports to build on, so we could
> certainly conceive of having ``Foo.py`` as a parent ``Foo`` module
> for a ``Foo`` package.  But how would we implement submodule and
> subpackage imports?
>
> Well, if we didn't have the idea of ``__path__`` attributes yet,
> we'd probably just search ``sys.path`` looking for ``Foo/Bar.py``.
>
> But we'd *only* do it when someone actually tried to *import*
> ``Foo.Bar``.
>
> NOT when they imported ``Foo``.
>
> And *that* lets us get rid of the backwards-compatibility problem
> of dropping the ``__init__`` requirement, back here in 2011.
>
> How?
>
> Well, when we ``import Foo``, we're not even *looking* for ``Foo/``
> directories on ``sys.path``, because we don't *care* yet.  The only
> point at which we care, is the point when somebody tries to actually
> import a submodule or subpackage of ``Foo``.
>
> That means that if ``Foo`` is a standard library module (for example),
> and I happen to have a ``Foo`` directory on ``sys.path`` (without
> an ``__init__.py``, of course), then *nothing breaks*.  The ``Foo``
> module is still just a module, and it's still imported normally.
>
>
> Self-Contained vs. "Virtual" Packages
> -------------------------------------
>
> Of course, in today's Python, trying to ``import Foo.Bar`` will
> fail if ``Foo`` is just a ``Foo.py`` module (and thus lacks a
> ``__path__`` attribute).
>
> So, this PEP proposes to *dynamically* create a ``__path__``, in the
> case where one is missing.
>
> That is, if I try to ``import Foo.Bar`` the proposed change to the
> import machinery will notice that the ``Foo`` module lacks a
> ``__path__``, and will therefore try to *build* one before proceeding.
>
> And it will do this by making a list of all the existing ``Foo/``
> subdirectories of the directories listed in ``sys.path``.
>
> If the list is empty, the import will fail with ``ImportError``, just
> like today.  But if the list is *not* empty, then it is saved in
> a new ``Foo.__path__`` attribute, making the module a "virtual
> package".
>
> That is, because it now has a valid ``__path__``, we can proceed
> to import submodules or subpackages in the normal way.
>
> Now, notice that this change does not affect "classic", self-contained
> packages that have an ``__init__`` module in them.  Such packages
> already *have* a ``__path__`` attribute (initialized at import time)
> so the import machinery won't try to create another one later.
>
> This means that (for example) the standard library ``email`` package
> will not be affected in any way by you having a bunch of unrelated
> directories named ``email`` on ``sys.path``.
>
> But it *does* mean that if you want to turn your ``Foo`` module into
> a ``Foo`` package, all you have to do is add a ``Foo/`` directory
> somewhere on ``sys.path``, and start adding modules to it.
>
> But what if you only want a "namespace package"?  That is, a package
> that is *only* a namespace for various separately-distributed
> submodules and subpackages?
>
> For example, if you're Zope Corporation, distributing dozens of
> separate tools like ``zc.buildout``, each in packages under the ``zc``
> namespace, you don't want to have to make and include an empty
> ``zc.py`` in every tool you ship.  (And, if you're a Linux or other
> OS vendor, you don't want to deal with the package conflicts created
> by trying to install ten copies of ``zc.py`` to the same location!)
>
> No problem.  All we have to do is make one more minor tweak to the
> import process: if the "classic" import process fails to find a
> self-contained module or package (e.g., if ``import zc`` fails to find
> a ``zc.py`` or ``zc/__init__.py``), then we once more try to build a
> ``__path__`` by searching for all the ``zc/`` directories on
> ``sys.path``, and putting them in a list.
>
> If this list is empty, we raise ``ImportError``.  But if it's
> non-empty, we create an empty ``zc`` module, and put the list in
> ``zc.__path__``.  Congratulations: ``zc`` is now a namespace-only,
> "pure virtual" package!  It has no module contents, but you can still
> import submodules and subpackages from it, regardless of where they're
> located on ``sys.path``.
>
> (By the way, both of these additions to the import protocol (i.e. the
> dynamically-added ``__path__``, and dynamically-created modules)
> apply recursively to child packages, using the parent package's
> ``__path__`` in place of ``sys.path`` as a basis for generating a
> child ``__path__``.  This means that self-contained and virtual
> packages can contain each other without limitation, with the caveat
> that if you put a virtual package inside a self-contained one, it's
> gonna have a really short ``__path__``!)

Nice.

>
>
> Backwards Compatibility and Performance
> ---------------------------------------
>
> Notice that these two changes *only* affect import operations that
> today would result in ``ImportError``.  As a result, the performance
> of imports that do not involve virtual packages is unaffected, and
> potential backward compatibility issues are very restricted.
>
> Today, if you try to import submodules or subpackages from a module
> with no ``__path__``, it's an immediate error.  And of course, if you
> don't have a ``zc.py`` or ``zc/__init__.py`` somewhere on ``sys.path``
> today, ``import zc`` would likewise fail.
>
> Thus, the only potential backwards-compatibility issues are:
>
> 1. Tools that expect package directories to have an ``__init__``
>    module, that expect directories without an ``__init__`` module
>    to be unimportable, or that expect ``__path__`` attributes to be
>    static, will not recognize virtual packages as packages.
>

Should there be a way to indicate that you do not want a directory to
be considered for a package (an opt-out)?  Currently I can move the
__init__.py out of the way and it gets ignored by import.

>    (In practice, this just means that tools will need updating to
>    support virtual packages, e.g. by using ``pkgutil.walk_packages()``
>    instead of using hardcoded filesystem searches.)
>
> 2. Code that *expects* certain imports to fail may now do something
>    unexpected.  This should be fairly rare in practice, as most sane,
>    non-test code does not import things that are expected not to
>    exist!
>
> The biggest likely exception to the above would be when a piece of
> code tries to check whether some package is installed by importing
> it.  If this is done *only* by importing a top-level module (i.e., not
> checking for a ``__version__`` or some other attribute), *and* there
> is a directory of the same name as the sought-for package on
> ``sys.path`` somewhere, *and* the package is not actually installed,
> then such code could perhaps be fooled into thinking a package is
> installed that really isn't.
>
> However, even in this case, the failure is more likely to be annoying
> than damaging; in most cases, the code will simply fail a little later
> on, when it actually tries to DO something with the imported (but
> empty) module.  (And code that checks for a ``__version__`` attribute
> or the presence of some desired function, class, or module
> in the package will not see such a false positive result in the
> first place.)

Good point.
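
For what it's worth, the more robust check described above might look
something like the sketch below.  The helper name ``looks_installed``
and the idea of passing in a required attribute are just my own
illustration, not anything the PEP specifies:

```python
import importlib


def looks_installed(name, required_attr):
    """Return True only if `name` imports AND has the expected attribute.

    Hypothetical helper: a bare import would also succeed for an empty
    virtual package, so we additionally look for something the real
    package is known to provide.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return False
    return hasattr(module, required_attr)


# The stdlib json module really does provide dumps().
json_ok = looks_installed("json", "dumps")
# Same module, but an attribute it does not have.
json_missing_attr = looks_installed("json", "no_such_attribute_402")
# A name that (presumably) cannot be imported at all.
absent = looks_installed("no_such_package_for_pep_402", "anything")
```

With a check like this, an empty module created for a stray directory
would be reported as not installed, because it lacks the attribute.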

>
> Meanwhile, tools that expect to locate packages and modules by
> walking a directory tree can be updated to use the existing
> ``pkgutil.walk_packages()`` API, and tools that need to inspect
> packages in memory should use the other APIs described in the
> `Standard Library Changes/Additions`_ section below.
>
>
> Specification
> =============
>
> Two changes are made to the existing import process.
>
> First, the built-in ``__import__`` function must not raise an
> ``ImportError`` when importing a submodule of a module with no
> ``__path__``.  Instead, it must attempt to *create* a ``__path__``
> attribute for the parent module, as described in `__path__ creation`_
> below.
>
> Second, if searching ``sys.meta_path`` and ``sys.path`` (or a parent
> package ``__path__``) fails to find a module, the import process must
> also attempt to create a ``__path__`` attribute for the non-existent
> module.  If the attempt succeeds, an empty module is created and its
> ``__path__`` is set.  Otherwise, importing fails.
>

Nice summary.

> In both of the above cases, if a non-empty ``__path__`` is created,
> the name of the module whose ``__path__`` was created is added to
> ``sys.virtual_packages`` -- an initially-empty set of package names.

<warning>
I am looking at this PEP from the perspective that it may be useful,
and not terribly difficult, to factor in meta importers.  So if that
viewpoint is invalid a good chunk of my remaining comments may be
irrelevant.  Also, I have been knee deep in importlib in the last few
weeks, which will be painfully obvious in my feedback.  I apologize in
advance.  <wink>
</warning>

Perhaps it should be a mapping from the module name to the meta
importer which generated the __path__ entry for the module.  If meta
importers are factored in, the matching importer would be the one to
determine how __path__ should change (like in the situation described
for extend_virtual_paths() below).

>
> Conversely, if an empty ``__path__`` results, an ``ImportError``
> is immediately raised, and the module is not created or changed, nor
> is its name added to ``sys.virtual_packages``.
>
> (This way, code that extends ``sys.path`` at runtime can find out
> what virtual packages are currently imported, and thereby add any
> new subdirectories to those packages' ``__path__`` attributes.  See
> `Standard Library Changes/Additions`_ below for more details.)

Clear and straightforward.

>
>
> ``__path__`` Creation
> ---------------------
>
> A virtual ``__path__`` is created by obtaining a PEP 302 "importer"
> object for each of the path entries found in ``sys.path`` (for a
> top-level module) or the parent ``__path__`` (for a submodule).
>
> (Note: because ``sys.meta_path`` importers are not associated with
> ``sys.path`` or ``__path__`` entry strings, such importers do *not*
> participate in this process.)
>

Nice.  The context for this note here makes more sense than in the
other versions (of the other PEP).

Could the importers on sys.meta_path  be given the opportunity to take
control of the process, just as they get tried first when "finding"
modules?  Otherwise we'd be missing the means of customizing the
__path__ creation process, if that is important.  I don't think it
would add much complexity to the implementation and would parallel the
"finding" part of the import process.

In importlib, the _DefaultPathFinder class handles the search across
sys.path, corresponding to the default import behavior for files.  It
is implicitly added to the end of sys.meta_path for
importlib.__import__, along with the builtin and frozen importers.
For virtual __path__ creation, it would perform the process described
in this section.

Thus, _DefaultPathFinder would return the list of __path__ entry
strings resulting when no other meta importer matches the fullname.
However, if another (on sys.meta_path) matched, wouldn't the __path__
coming from  _DefaultPathFinder be potentially wrong?  If so, it would
pay to ask each importer on sys.meta_path for the virtual __path__ and
stop on the first hit.

> Each importer is checked for a ``get_subpath()`` method, and if
> present, the method is called with the full name of the module the
> ``__path__`` is being constructed for.  The return value is either
> a string representing a package subdirectory, or ``None`` if no such
> subdirectory exists.

Should it return a list of strings rather than a single string?  Your
use of "strings" in the next sentence implies that it would.  If
get_path() is called at the meta_path level it would need to return a
list of strings.  I am guessing that importers on sys.path_hooks could
too.

>
> The strings returned by each importer are added to the ``__path__``
> being built, in the same order as they are found.  (``None`` values
> and missing ``get_subpath()`` methods are simply skipped.)
>
> In Python code, the algorithm would look something like this::
>
>     def get_virtual_path(modulename, parent_path=None):
>
>         if parent_path is None:
>             parent_path = sys.path

sys.path is used here instead of as the default arg so that it gets
evaluated each time?

>
>         path = []
>
>         for entry in parent_path:
>             # Obtain a PEP 302 importer object - see pkgutil module
>             importer = pkgutil.get_importer(entry)
>
>             if hasattr(importer, 'get_subpath'):
>                 subpath = importer.get_subpath(modulename)
>                 if subpath is not None:
>                     path.append(subpath)
>
>         return path
>
> And a function like this one should be exposed in the standard
> library as ``imp.get_virtual_path()``, so that people creating

Or in importlib...

> ``__import__`` replacements or ``sys.meta_path`` hooks can reuse it.
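
To convince myself the algorithm behaves as described, here is a
self-contained variant that can actually be run.  No existing importer
has a ``get_subpath()`` method, so ``DirImporter`` below is a made-up
stand-in for a plain-directory importer, and the ``get_importer``
parameter is my own addition so the sketch does not depend on what
``pkgutil.get_importer()`` returns:

```python
import os
import sys
import tempfile


class DirImporter:
    """Hypothetical path entry importer for ordinary directories."""

    def __init__(self, entry):
        self.entry = entry

    def get_subpath(self, fullname):
        # Only the last component of the dotted name is a directory name.
        candidate = os.path.join(self.entry, fullname.rpartition(".")[2])
        return candidate if os.path.isdir(candidate) else None


def get_virtual_path(modulename, parent_path=None, get_importer=DirImporter):
    if parent_path is None:
        parent_path = sys.path  # looked up at call time, not definition time
    path = []
    for entry in parent_path:
        importer = get_importer(entry)
        if hasattr(importer, "get_subpath"):
            subpath = importer.get_subpath(modulename)
            if subpath is not None:
                path.append(subpath)
    return path


with tempfile.TemporaryDirectory() as root:
    site_a = os.path.join(root, "site_a")
    site_b = os.path.join(root, "site_b")
    site_c = os.path.join(root, "site_c")
    os.makedirs(os.path.join(site_a, "zc"))
    os.makedirs(os.path.join(site_b, "zc"))
    os.makedirs(site_c)  # no zc/ portion here
    demo_path = get_virtual_path("zc", [site_a, site_c, site_b])
    expected_path = [os.path.join(site_a, "zc"), os.path.join(site_b, "zc")]
    missing_path = get_virtual_path("zope", [site_a, site_b, site_c])
```

The resulting ``__path__`` keeps the order of the parent path and skips
entries with no matching subdirectory; an empty list is the case where
the import would fail with ``ImportError``.
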
>
>
> Standard Library Changes/Additions
> ----------------------------------
>
> The ``pkgutil`` module should be updated to handle this
> specification appropriately, including any necessary changes to
> ``extend_path()``, ``iter_modules()``, etc.  A new generic API for
> calling ``get_subpath()`` on importers should be added as well.
>
> Specifically the proposed changes and additions to ``pkgutil`` are:
>
> * A new ``get_subpath(importer, fullname)`` generic function, allowing
>   implementations to be registered for existing importers.

Not that it necessarily impacts this PEP, but I'm not sure what you
mean by "registered for existing importers".  I am guessing that
pkgutil is used to facilitate behaviors in packaging libraries, like
setuptools, and that this registration is one of those behaviors.
Then again I am a little dense sometimes <wink>.

Don't sweat responding with an explanation.  I just wanted to point
out that the context of some of the pkgutil related stuff may not be
obvious; and that the documentation for pkgutil doesn't help a ton to
clarify that context.  This may not matter for the PEP and its
expected audience.

>
> * A new ``extend_virtual_paths(path_entry)`` function, to extend
>   existing, already-imported virtual packages' ``__path__`` attributes
>   to include any portions found in a new ``sys.path`` entry.  This
>   function should be called by applications extending ``sys.path``
>   at runtime, e.g. when adding a plugin directory or an egg to the
>   path.
>
>   The implementation of this function does a simple top-down traversal
>   of ``sys.virtual_packages``, and performs any necessary
>   ``get_subpath()`` calls to identify what path entries need to
>   be added to each package's ``__path__``, given that `path_entry`
>   has been added to ``sys.path``.  (Or, in the case of sub-packages,
>   adding a derived subpath entry, based on their parent namespace's
>   ``__path__``.)
>

As I already noted, this is pretty specific to the default file import
mechanism rather than the more general meta import process.  Maybe
that's all that is needed?  My sense of extending virtual paths is
pretty fuzzy.
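
To make it less fuzzy for myself, here is how I read the description of
``extend_virtual_paths()``.  This is only a sketch under my own
assumptions: the set of virtual packages and the module table are
passed in explicitly (there is no ``sys.virtual_packages`` yet), and
``get_subpath()`` is a stand-in that only knows about plain
directories:

```python
import os
import tempfile
import types


def get_subpath(entry, fullname):
    """Stand-in for importer.get_subpath(): plain directories only."""
    candidate = os.path.join(entry, fullname.rpartition(".")[2])
    return candidate if os.path.isdir(candidate) else None


def extend_virtual_paths(path_entry, virtual_packages, modules):
    """Add portions found under path_entry to imported virtual packages."""
    # Entries newly added at each level; "" stands for sys.path itself.
    added = {"": [path_entry]}
    # Fewer dots first, so parents are handled before their children.
    for name in sorted(virtual_packages, key=lambda n: n.count(".")):
        module = modules.get(name)
        if module is None:
            continue  # listed, but not imported (yet)
        parent = name.rpartition(".")[0]
        added[name] = []
        for entry in added.get(parent, []):
            subpath = get_subpath(entry, name)
            if subpath is not None and subpath not in module.__path__:
                module.__path__.append(subpath)
                added[name].append(subpath)


with tempfile.TemporaryDirectory() as plugin_dir:
    os.makedirs(os.path.join(plugin_dir, "zc", "buildout"))
    zc = types.ModuleType("zc")
    zc.__path__ = []
    zc_buildout = types.ModuleType("zc.buildout")
    zc_buildout.__path__ = []
    fake_modules = {"zc": zc, "zc.buildout": zc_buildout}
    virtual = {"zc", "zc.buildout", "zope"}  # "zope" is not imported
    extend_virtual_paths(plugin_dir, virtual, fake_modules)
    expected_zc = [os.path.join(plugin_dir, "zc")]
    expected_buildout = [os.path.join(plugin_dir, "zc", "buildout")]
```

The child package only picks up a new entry derived from what was just
added to its parent, which is how I understand the "derived subpath
entry" remark.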

> * A new ``iter_virtual_packages(parent='')`` function to allow
>   top-down traversal of virtual packages in ``sys.virtual_packages``,
>   by yielding the child virtual packages of `parent`.  For example,
>   calling ``iter_virtual_packages("zope")`` might yield ``zope.app``
>   and ``zope.products`` (if they are imported virtual packages listed
>   in ``sys.virtual_packages``), but **not** ``zope.foo.bar``.
>   (This function is needed to implement ``extend_virtual_paths()``,
>   but is also potentially useful for other code that needs to inspect
>   imported virtual packages.)
>
> * ``ImpImporter.iter_modules()`` should be changed to also detect and
>   yield the names of modules found in virtual packages.
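
The ``iter_virtual_packages()`` bullet above seems simple enough to
sketch.  Since ``sys.virtual_packages`` does not exist yet, the set is
passed in as a second argument here, which is purely my assumption for
the sake of a runnable example:

```python
def iter_virtual_packages(parent="", virtual_packages=frozenset()):
    """Yield the direct child virtual packages of `parent`.

    `virtual_packages` stands in for the proposed sys.virtual_packages.
    """
    prefix = parent + "." if parent else ""
    for name in sorted(virtual_packages):
        if not name.startswith(prefix):
            continue
        remainder = name[len(prefix):]
        # Direct children only: no further dots after the prefix.
        if remainder and "." not in remainder:
            yield name


known = {"zc", "zope", "zope.app", "zope.products", "zope.foo.bar"}
zope_children = list(iter_virtual_packages("zope", known))
top_level = list(iter_virtual_packages("", known))
```

Note that ``zope.foo.bar`` is not yielded for ``"zope"``, matching the
example in the bullet.
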
>
> In addition to the above changes, the ``zipimport`` importer should
> have its ``iter_modules()`` implementation similarly changed.  (Note:
> current versions of Python implement this via a shim in ``pkgutil``,
> so technically this is also a change to ``pkgutil``.)
>
> Last, but not least, the ``imp`` module should expose the algorithm
> described in the `__path__ creation`_ section above, as a
> ``get_virtual_path(modulename, parent_path=None)`` function, so that
> creators of ``__import__`` replacements can use it.

Or this could go in importlib?  I guess it depends on where the
implementation happens.

>
>
> Implementation Notes
> --------------------
>
> For users, developers, and distributors of virtual packages:
>
> * ``sys.virtual_packages`` is allowed to contain non-existent or
>   not-yet-imported package names; code that uses its contents should

If it were a dict the module name could point to None, rather than to
the responsible meta importer.

>   not assume that every name in this set is also present in
>   ``sys.modules`` or that importing the name will necessarily succeed.

Good point.

>
> * If you are changing a currently self-contained package into a
>   virtual one, it's important to note that you can no longer use its
>   ``__file__`` attribute to locate data files stored in a package
>   directory.  Instead, you must search ``__path__`` or use the
>   ``__file__`` of a submodule adjacent to the desired files, or
>   of a self-contained subpackage that contains the desired files.

Nice catch.

The "optional extensions" section of PEP 302 has a bit about a
get_data() method for importers.  Using get_data() instead of __file__
or __path__ seems like a safer operation, much as you recommended
using pkgutil.walk_packages() above.

In the case of importlib (yes, it's on my mind), get_data() is already
implemented for the finders surrounding _DefaultPathFinder.  I am not
familiar with the importers that are currently used on
sys.path_importer_cache, but maybe they provide get_data() too?  (a
cursory look makes me think so)
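
As an illustration of what I mean, ``pkgutil.get_data()`` already wraps
the loader's ``get_data()`` for you.  The package name and file below
are made up for the demo, and I am using a self-contained package
(with an ``__init__.py``) since that is the case I know works:

```python
import importlib
import os
import pkgutil
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a throwaway self-contained package with one data file.
    pkg_dir = os.path.join(root, "demo_pkg_with_data")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as init_file:
        init_file.write("")
    with open(os.path.join(pkg_dir, "greeting.txt"), "wb") as data_file:
        data_file.write(b"hello")

    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        # Asks the package's loader for the bytes; no __file__ arithmetic.
        payload = pkgutil.get_data("demo_pkg_with_data", "greeting.txt")
    finally:
        sys.path.remove(root)
        sys.modules.pop("demo_pkg_with_data", None)
```

What this should return for a "pure virtual" package with no
``__file__`` is a separate question that the sketch does not answer.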

>
> * XXX what is the __file__ of a "pure virtual" package?  ``None``?
>   Some arbitrary string?  The path of the first directory with a
>   trailing separator?  No matter what we put, *some* code is
>   going to break, but the last choice might allow some code to
>   accidentally work.  Is that good or bad?
>
>
> For those implementing PEP \302 importer objects:
>
> * Importers that support the ``iter_modules()`` method (used by
>   ``pkgutil`` to locate importable modules and pacakges) and want to

s/pacakges/packages/

>   add virtual package support should modify their ``iter_modules()``
>   method so that it discovers and lists virtual packages as well as
>   standard modules and packages.  To do this, the importer should
>   simply list all immediate subdirectory names in its jurisdiction
>   that are valid Python identifiers.
>
>   XXX This might list a lot of not-really-packages.  Should we
>   require importable contents to exist?  If so, how deep do we
>   search, and how do we prevent e.g. link loops, or traversing onto
>   different filesystems, etc.?  Ick.
>
> * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
>   not need to implement ``get_subpath()``, because the method
>   is only called on importers corresponding to ``sys.path`` entries
>   and ``__path__`` entries.  If a meta importer wishes to support
>   virtual packages, it must do so entirely within its own
>   ``find_module()`` implementation.

Certainly that is a simpler approach, but it seems like each
find_module() implementation would end up doing it pretty much the
same way, following the pattern used by the sys.path handler.
However, you are probably right that handling just the sys.path stuff
is good enough.

>
>   Unfortunately, it is unlikely that any such implementation will be
>   able to merge its package subpaths with those of other meta
>   importers or ``sys.path`` importers, so the meaning of "supporting
>   virtual packages" for a meta importer is currently undefined!
>
>   (However, since the intended use case for meta importers is to
>   replace Python's normal import process entirely for some subset of
>   modules, and the number of such importers currently implemented is
>   quite small, this seems unlikely to be a big issue in practice.)

And that is why I wonder if all my blathering is relevant.  Still, I'm
just not sure that it would be difficult for an implementation of this
PEP to handle meta importers intelligently.  I would hate to discount
them unnecessarily.  If I'm just a vocal minority on this point I'll
let it go.  :)

Meta importers could always be addressed in a later addition, if
needed.  Only a couple of things would impact that later effort:

* sys.virtual_packages being a list vs. a dictionary
* get_path() returning a string vs. a list

And only one thing seems ambiguous when meta importers are left for
later.  If a module is loaded through a meta importer, which importer
handles a get_path() call?  When extend_virtual_paths is called, how
are meta-imported modules addressed?

>
>
> References
> ==========
>
> .. [1] "namespace" vs "module" packages (mailing list thread)
>    (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html)
>
> .. [2] "Dropping __init__.py requirement for subpackages"
>    (http://mail.python.org/pipermail/python-dev/2006-April/064400.html)
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
>

One last point:  This PEP results in two ways to provide a module for
a package (<NAME>.py in addition to <NAME>/__init__.py).  However, you
do offer a good distinction; __init__.py is for "self-contained"
packages.  Is it clear when to use which?  Will __init__.py go away
after a while?  Will we have to start looking in two places for a
package's code?

Again, this is much clearer to me than the PEP 382 proposals were.
And your extensive experience with packaging really shows.  Sorry if
any of my feedback displays my ignorance in that area too painfully.
I most wholeheartedly defer to you and the rest on this list regarding
most of the stuff I have said.  :)

Thanks for working on this.

-eric

p.s. if you hurry maybe you can pick up PEP 402.  It's funny how those
PEP numbers line up sometimes.



From pje at telecommunity.com  Thu Jul 14 01:14:18 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 13 Jul 2011 19:14:18 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
 Partitioning"
In-Reply-To: <CALFfu7Ct4Fzv0TZgeJhC0Xnh7aPASbidOh5Hd1DnfCDk7oSVEg@mail.g
	mail.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
	<CALFfu7Ct4Fzv0TZgeJhC0Xnh7aPASbidOh5Hd1DnfCDk7oSVEg@mail.gmail.com>
Message-ID: <20110713231448.1CBB03A4100@sparrow.telecommunity.com>

At 04:27 PM 7/13/2011 -0600, Eric Snow wrote:
>This is cool stuff.  And you have presented it really well.  I have
>some (probably too much) feedback inline.

Not at all too much; I've gone ahead and taken care of the typos you 
mentioned.  Other comments follow:


>Should there be a way to indicate that you do not want a directory to
>be considered for a package (an opt-out)?  Currently I can move the
>__init__.py out of the way and it gets ignored by import.

Renaming the directory is the quick solution.  If you have a tool 
that's looking for anything that's a package, then it'll need an 
exclusion option, or you'll have to rename the directory to something 
the tool will skip.  (Ideally, tools should skip directories that 
aren't valid Python identifiers.)


>I am looking at this PEP from the perspective that it may be useful,
>and not terribly difficult, to factor in meta importers.  So if that
>viewpoint is invalid a good chunk of my remaining comments may be
>irrelevant.  Also, I have been knee deep in importlib in the last few
>weeks, which will be painfully obvious in my feedback.  I apologize in
>advance.  <wink>

If you can provide a *use case* for explicitly making meta importers 
part of the process, then great.

However, even if they are, the hooks would probably be in the form of 
a *different* API for meta importers, that's called with a parent 
path as well as a module name, that would return a list of strings 
rather than an individual string.  The virtual path creation process 
would then walk the meta importers first, calling that method, until 
it got a non-empty list, or until it had to fall back to doing it 
itself (in the way described by the PEP).

In the importlib case, then, you could just implement that method 
(say, "build_virtual_path()") on the default meta importer.  (Which 
would also implement the virtual package fallback, or leave it to 
another meta-importer later on the path.)
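
Roughly, and purely as an illustration of the shape I have in mind,
that walk could look like the following.  The ``build_virtual_path()``
method name is the placeholder from the paragraph above; the
``meta_importers`` and ``fallback`` parameters and the toy importer
classes are invented for the example and are not part of the PEP:

```python
def build_virtual_path(fullname, parent_path, meta_importers, fallback):
    """Ask each meta importer in turn; fall back to the PEP's algorithm."""
    for importer in meta_importers:
        hook = getattr(importer, "build_virtual_path", None)
        if hook is None:
            continue  # this importer does not take part
        path = hook(fullname, parent_path)
        if path:
            return list(path)  # first non-empty answer wins
    return fallback(fullname, parent_path)


class SilentImporter:
    """Takes part, but never has an opinion."""

    def build_virtual_path(self, fullname, parent_path):
        return []


class PluginImporter:
    """Claims exactly one (made-up) package name."""

    def build_virtual_path(self, fullname, parent_path):
        if fullname == "plugins":
            return ["virtual://plugins"]
        return []


def default_fallback(fullname, parent_path):
    # Toy stand-in for the path-entry-based algorithm in the PEP.
    return [entry + "/" + fullname for entry in parent_path]


importers = [object(), SilentImporter(), PluginImporter()]
from_meta = build_virtual_path(
    "plugins", ["/a"], importers, default_fallback)
from_fallback = build_virtual_path(
    "zc", ["/a", "/b"], importers, default_fallback)
```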

Anyway, that, as far as I can tell, is the only sane way to make meta 
importers participate in the virtual path building process, and IMO 
it's an extension that isn't really needed at the moment, and would 
complicate the specification in the PEP.

That being said, if somebody wanted to implement the additional 
feature in importlib "off the books", it's not going to break 
anything.  ;-)  We can always update the PEP afterwards.

Seriously, though, I suppose we could add a note saying it could be 
done, and should be done if anybody has use cases, but we're not 
spelling it out at the moment.


>sys.path is used here instead of as the default arg so that it gets
>evaluated each time?

Yes.  That's normal for ``imp`` APIs.
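
The difference only shows up when something *rebinds* ``sys.path``
rather than mutating it in place.  A small demonstration (the function
names and the extra path entry are just for illustration):

```python
import sys


def path_bound_at_definition(path=sys.path):
    # The default was captured once, when the def statement ran.
    return path


def path_looked_up_per_call(path=None):
    if path is None:
        path = sys.path  # evaluated on every call
    return path


original = sys.path
replacement = list(original) + ["/hypothetical/plugins"]
sys.path = replacement  # rebind, as some applications do
try:
    early_sees_replacement = path_bound_at_definition() is replacement
    late_sees_replacement = path_looked_up_per_call() is replacement
finally:
    sys.path = original  # put things back
```

Only the ``None``-sentinel version notices the new list.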


>Or in importlib...

Well, I don't really want to tie the PEP to importlib right now, and 
``imp`` is the established point for exposing the machinery Python is 
actually using.  But of course, I'm not the one doing the work.  ;-)


> > * A new ``get_subpath(importer, fullname)`` generic function, allowing
> >  implementations to be registered for existing importers.
>
>Not that it necessarily impacts this PEP, but I'm not sure what you
>mean by "registered for existing importers".  I am guessing that
>pkgutil is used to facilitate behaviors in packaging libraries, like
>setuptools, and that this registration is one of those behaviors.
>Then again I am a little dense sometimes <wink>.

I just killed that entire bullet.  The truth is, it really only 
mattered for 2.x, where it can't really help anyway.  So, I've 
dropped it from the spec.


>As I already noted, this is pretty specific to the default file import
>mechanism rather than the more general meta import process.  Maybe
>that's all that is needed?  My sense of extending virtual paths is
>pretty fuzzy.

Meta importers are for implementing alternative import strategies, 
rather than being one more step along the way in a standard 
import.  You could, for example, implement "pure virtual" lookup as a 
meta importer that sits *after* the one that does Python's normal 
sys.path/__path__ searching.  (And that might well be the way to do 
it in importlib.)


> > * ``sys.virtual_packages`` is allowed to contain non-existent or
> >  not-yet-imported package names; code that uses its contents should
>
>If it were a dict the module name could point to None, rather than to
>the responsible meta importer.

Let's see if there are any use cases for meta importer participation 
before we go down that route.  Outside of importlib and my sketch of 
a 2.x implementation for PEP 382, just how many meta importers 
*exist* in the outside world, after nearly nine years of PEP 302 
being in existence?


>The "optional extensions" section of PEP 302 has a bit about a
>get_data() method for importers.  Using get_data() instead of __file__
>or __path__ seems like a safer operation, much as you recommended
>using pkgutil.walk_packages() above.
>
>In the case of importlib (yes, it's on my mind), get_data() is already
>implemented for the finders surrounding _DefaultPathFinder.  I am not
>familiar with the importers that are currently used on
>sys.path_importer_cache, but maybe they provide get_data() too?  (a
>cursory look makes me think so)

I didn't bother with explaining this much because the 
``pkg_resources`` module provided by setuptools takes care of 
interfacing with these things to give you a friendly API for 
retrieving strings, streams, or filenames for module-adjacent data files.


>Certainly that is a simpler approach, but it seems like each
>find_module() implementation would end up doing it pretty much the
>same way, following the pattern used by the sys.path handler.
>However, you are probably right that handling just the sys.path stuff
>is good enough.

Again, if somebody can point to a meta importer that's *not* part of 
importlib, we can take a look at that.  ;-)


>* sys.virtual_packages being a list vs. a dictionary

Er, it's a set, not a list.  I'll change the bit that says that to 
highlight ``set()`` as a built-in type, vs. just the word "set".


>And only one thing seems ambiguous when meta importers are left for
>later.  If a module is loaded through a meta importer, which importer
>handles a get_path() call?  When extend_virtual_paths is called, how
>are meta-imported modules addressed?

That's really up to the meta-importer.  You're really not supposed to 
use meta-importers to represent import *locations*; they're for 
extending or replacing import *policies*.  If you need locations, you 
make up a string to represent the location and put it in sys.path, 
after adding a path hook that recognizes the corresponding string.

That's why the whole idea of treating a meta importer as if it were a 
regular path entry importer is bogus: if you wanted to just implement 
another search location, you should just use a path entry importer; 
you don't need a meta-importer at all.
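
For concreteness, here is a minimal sketch of the "made-up location
string plus path hook" approach.  It is written against the current
importlib finder protocol (``find_spec()``) rather than the
``find_module()`` protocol discussed in this thread, and the
``inmemory://`` prefix, the class names, and the module name are all
invented for the example:

```python
import importlib
import importlib.util
import sys

PREFIX = "inmemory://"  # made-up location scheme
SOURCES = {"hooked_demo_module": "ANSWER = 42\n"}


class InMemoryLoader:
    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        return None  # let the import system create the module object

    def exec_module(self, module):
        exec(self.source, module.__dict__)


class InMemoryFinder:
    """Path entry finder for one made-up location string."""

    def __init__(self, entry):
        self.entry = entry

    def find_spec(self, fullname, target=None):
        source = SOURCES.get(fullname)
        if source is None:
            return None  # not ours; the search moves on
        return importlib.util.spec_from_loader(
            fullname, InMemoryLoader(source), origin=self.entry)


def inmemory_hook(entry):
    # A path hook declines entries it does not recognize.
    if not (isinstance(entry, str) and entry.startswith(PREFIX)):
        raise ImportError("not an in-memory location")
    return InMemoryFinder(entry)


location = PREFIX + "demo"
sys.path_hooks.insert(0, inmemory_hook)
sys.path.append(location)
try:
    answer = importlib.import_module("hooked_demo_module").ANSWER
finally:
    # Undo every change made to the import state.
    sys.path.remove(location)
    sys.path_hooks.remove(inmemory_hook)
    sys.path_importer_cache.pop(location, None)
    sys.modules.pop("hooked_demo_module", None)
```

Nothing here touches ``sys.meta_path``: the location is just another
path entry, handled by its own importer.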

To put it another way, if you write a meta-importer, then you really do 
need to consider what way you'll do ``__path__`` building, and part 
of the point of doing so in a meta-importer would be so that you 
could *change* the way it was done.  So why would you want to be 
called as part of a protocol that you're probably going to replace, anyway?



>One last point:  This PEP results in two ways to provide a module for
>a package (<NAME>.py in addition to <NAME>/__init__.py).  However, you
>do offer a good distinction; __init__.py is for "self-contained"
>packages.  Is it clear when to use which?  Will __init__.py go away
>after a while?  Will we have to start looking in two places for a
>package's code?

I'll add something on that to the notes section:

* While virtual packages are easy to set up and use, there is still
   a time and place for using self-contained packages.  While it's not
   strictly necessary, adding an ``__init__`` module to your
   self-contained packages lets users of the package (and Python
   itself) know that *all* of the package's code will be found in
   that single subdirectory.  In addition, it lets you define
   ``__all__``, expose a public API, provide a package-level docstring,
   and do other things that make more sense for a self-contained
   project than for a mere "namespace" package.


From ericsnowcurrently at gmail.com  Thu Jul 14 01:52:47 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 13 Jul 2011 17:52:47 -0600
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
	Partitioning"
In-Reply-To: <20110713231448.1CBB03A4100@sparrow.telecommunity.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
	<CALFfu7Ct4Fzv0TZgeJhC0Xnh7aPASbidOh5Hd1DnfCDk7oSVEg@mail.gmail.com>
	<20110713231448.1CBB03A4100@sparrow.telecommunity.com>
Message-ID: <CALFfu7CHvVPnpyzP0xJm066kMWqtYwme5HPtWUwtaCdGQ5W6SA@mail.gmail.com>

On Wed, Jul 13, 2011 at 5:14 PM, P.J. Eby <pje at telecommunity.com> wrote:
> At 04:27 PM 7/13/2011 -0600, Eric Snow wrote:
> Outside of importlib and my sketch of a 2.x
> implementation for PEP 382, just how many meta importers *exist* in the
> outside world, after nearly nine years of PEP 302 being in existence?

So true.  I'm fine with taking the approach of "handling sys.path
importers is good enough".

Perhaps one reason I have been pressing this is because of a project I
am working on that makes extensive use of meta importers.  And I
expect that everyone will be using it heavily within a few months of
its completion <wink>.

>> * sys.virtual_packages being a list vs. a dictionary
>
> Er, it's a set, not a list.  I'll change the bit that says that to highlight
> ``set()`` as a built-in type, vs. just the word "set".

Yeah, should have been set vs. dictionary.  But in the reality of how
meta importers factor in here, a dictionary it need not be.

>> And only one thing seems ambiguous when meta importers are left for
>> later.  If a module is loaded through a meta importer, which importer
>> handles a get_path() call?  When extend_virtual_paths is called, how
>> are meta-imported modules addressed?
>
> That's really up to the meta-importer.  You're really not supposed to use
> meta-importers to represent import *locations*; they're for extending or
> replacing import *policies*.  If you need locations, you make up a string to
> represent the location and put it in sys.path, after adding a path hook that
> recognizes the corresponding string.

That is a great explanation.  I guess that just makes me wonder what
part of the import process meta importers should respect.  Is it
anything goes?  The onus seems to be on the meta importer to make its
new import behavior as unsurprising as possible.  Regardless, this
doesn't have much bearing on this PEP past what you have already
addressed. :)

>> One last point:  This PEP results in two ways to provide a module for
>> a package (<NAME>.py in addition to <NAME>/__init__.py).  However, you
>> do offer a good distinction; __init__.py is for "self-contained"
>> packages.  Is it clear when to use which?  Will __init__.py go away
>> after a while?  Will we have to start looking in two places for a
>> package's code?
>
> I'll add something on that to the notes section:
>
> * While virtual packages are easy to set up and use, there is still
>   a time and place for using self-contained packages.  While it's not
>   strictly necessary, adding an ``__init__`` module to your
>   self-contained packages lets users of the package (and Python
>   itself) know that *all* of the package's code will be found in
>   that single subdirectory.  In addition, it lets you define
>   ``__all__``, expose a public API, provide a package-level docstring,
>   and do other things that make more sense for a self-contained
>   project than for a mere "namespace" package.

Sounds good.  Thanks for taking the time to clarify.

-eric

From ncoghlan at gmail.com  Thu Jul 14 05:16:50 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 Jul 2011 13:16:50 +1000
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
	Partitioning"
In-Reply-To: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
Message-ID: <CADiSq7dDqdR2GHc_ZFSJ9drGFVMqAZ86j8oWvDOqYRHGghiB1g@mail.gmail.com>

Excellent write-up!

On Thu, Jul 14, 2011 at 3:11 AM, P.J. Eby <pje at telecommunity.com> wrote:
> Thus, the only potential backwards-compatibility issues are:
>
> 1. Tools that expect package directories to have an ``__init__``
>    module, that expect directories without an ``__init__`` module
>    to be unimportable, or that expect ``__path__`` attributes to be
>    static, will not recognize virtual packages as packages.
>
>    (In practice, this just means that tools will need updating to
>    support virtual packages, e.g. by using ``pkgutil.walk_modules()``
>    instead of using hardcoded filesystem searches.)

It's probably worth noting here that tools that do manual filesystem
searches often already break when confronted with PEP 302 importers
(including zipimport), so this would just be more incentive for them
to do the right thing.

We may also want to provide (probably in importlib) a way to walk the
*potentially* importable modules on a path entry without actually
importing them.

While I understand the desire to focus on an import.c/pkgutil.py based
implementation at this point, it's highly likely that builtin
__import__ will be importlib based for 3.3. I'd be a lot happier if we
stopped double-keying work and just wrote the importlib versions
rather than messing with the soon-to-die C code any further.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From pje at telecommunity.com  Thu Jul 14 20:13:52 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Thu, 14 Jul 2011 14:13:52 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
 Partitioning"
In-Reply-To: <CADiSq7dDqdR2GHc_ZFSJ9drGFVMqAZ86j8oWvDOqYRHGghiB1g@mail.g
	mail.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
	<CADiSq7dDqdR2GHc_ZFSJ9drGFVMqAZ86j8oWvDOqYRHGghiB1g@mail.gmail.com>
Message-ID: <20110714181426.C4F323A4100@sparrow.telecommunity.com>

At 01:16 PM 7/14/2011 +1000, Nick Coghlan wrote:
>We may also want to provide (probably in importlib) a way to walk the
>*potentially* importable modules on a path entry without actually
>importing them.

No problem.  Let me just set the time machine for 2006 and add it to 
pkgutil instead, so it'll be in Python 2.5+.  How does the name 
'iter_modules()' sound?  ;-)
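
For reference, a small sketch of what ``pkgutil.iter_modules()`` does with a path entry, run here under current Python 3.  The directory layout and the names ``spam_mod`` and ``eggs_pkg`` are made up for the example; the point is only that the names are discovered without anything being imported:

```python
import os
import pkgutil
import sys
import tempfile

# Build a throwaway path entry holding one module and one package.
root = tempfile.mkdtemp()
with open(os.path.join(root, "spam_mod.py"), "w") as f:
    f.write("raise RuntimeError('imported!')\n")  # would fail if imported
os.mkdir(os.path.join(root, "eggs_pkg"))
with open(os.path.join(root, "eggs_pkg", "__init__.py"), "w") as f:
    f.write("")

# iter_modules() yields (finder, name, ispkg) for each importable name
# found on the given path entries; it does not import any of them.
found = sorted(
    (name, ispkg) for _finder, name, ispkg in pkgutil.iter_modules([root])
)
print(found)

# The booby-trapped module was listed but never executed.
print("spam_mod" in sys.modules)
```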


>While I understand the desire to focus on an import.c/pkgutil.py based
>implementation at this point, it's highly likely that builtin
>__import__ will be importlib based for 3.3. I'd be a lot happier if we
>stopped double-keying work and just wrote the importlib versions
>rather than messing with the soon-to-die C code any further.

Since I'm not doing the actual work for 3.3, I don't really care how 
it gets done.  I just don't want to make the *specification* depend 
on that, which is why I'm saying "imp" for the API rather than 
importlib.  When importlib goes in, after all, imp will be importing 
lots of other things from it anyway.  ;-)

That all being said, if somebody Pronounces that importlib is the 
right place to expose it, that's fine too.

(Presumably pkgutil will need some refactoring as well, since it 
currently simulates some things that're probably also implemented in importlib.)



From ncoghlan at gmail.com  Fri Jul 15 06:23:34 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 15 Jul 2011 14:23:34 +1000
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
	Partitioning"
In-Reply-To: <20110714181426.C4F323A4100@sparrow.telecommunity.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
	<CADiSq7dDqdR2GHc_ZFSJ9drGFVMqAZ86j8oWvDOqYRHGghiB1g@mail.gmail.com>
	<20110714181426.C4F323A4100@sparrow.telecommunity.com>
Message-ID: <CADiSq7fc7WfyuQmfGFA5ZyTTTox0Sp3uspTeD1Oxr1tnpwmXpQ@mail.gmail.com>

On Fri, Jul 15, 2011 at 4:13 AM, P.J. Eby <pje at telecommunity.com> wrote:
> At 01:16 PM 7/14/2011 +1000, Nick Coghlan wrote:
>>
>> We may also want to provide (probably in importlib) a way to walk the
>> *potentially* importable modules on a path entry without actually
>> importing them.
>
> No problem.  Let me just set the time machine for 2006 and add it to pkgutil
> instead, so it'll be in Python 2.5+.  How does the name 'iter_modules()'
> sound?  ;-)

For some reason I was thinking that only iterated over already loaded
modules. No, I don't know why I thought that, given that sys.modules
already covers that use case :P

Fair enough on deferring the decision on how the importlib transition
affects the public API until after it actually happens.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ericsnowcurrently at gmail.com  Sat Jul 16 09:42:43 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sat, 16 Jul 2011 01:42:43 -0600
Subject: [Import-SIG] backport of importlib
Message-ID: <CALFfu7DVG+4cBh+fcjG3nDvQi+PzWZQSrtcHkWFfigULKQ7G4A@mail.gmail.com>

So I've gone ahead and written a (naive and probably incomplete)
script that backports importlib to 2.x[1].  There were a few syntax
differences, a couple of modules were new or had new functions, and I
had to reintroduce the old-style relative imports.

I'm hoping this will allow PEP 382 and the import engine to both be
backported simply by running this script on the implementation out of
3.3.  This was definitely a good exercise in getting familiar with the
importlib implementation.

-eric

[1] http://pypi.python.org/pypi?:action=display&name=backport_importlib

From ncoghlan at gmail.com  Sat Jul 16 10:22:39 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 16 Jul 2011 18:22:39 +1000
Subject: [Import-SIG] backport of importlib
In-Reply-To: <CALFfu7DVG+4cBh+fcjG3nDvQi+PzWZQSrtcHkWFfigULKQ7G4A@mail.gmail.com>
References: <CALFfu7DVG+4cBh+fcjG3nDvQi+PzWZQSrtcHkWFfigULKQ7G4A@mail.gmail.com>
Message-ID: <CADiSq7cG_ymiaiLk4ygCLtgnYQj16HDotjTp8QQHFiR5GD8gPQ@mail.gmail.com>

On Sat, Jul 16, 2011 at 5:42 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> So I've gone ahead and written a (naive and probably incomplete)
> script that backports importlib to 2.x[1].  There were a few syntax
> differences, a couple of modules were new or had new functions, and I
> had to reintroduce the old-style relative imports.

I suspect several of the transforms you're applying would be handled
natively by 3to2 - have you looked into using that at all?

> I'm hoping this will allow PEP 382 and the import engine to both be
> backported simply by running this script on the implementation out of
> 3.3.  This was definitely a good exercise in getting familiar with the
> importlib implementation.

Did I ever tell you about the (deliberately undocumented) standard
import emulation in pkgutil? That's what runpy and a couple of other
pieces of the 2.x stdlib use to get around the fact that importlib
didn't exist until recently (and still doesn't exist in its full form
in 2.x). (Although I guess relying on that would make it harder to use
importlib itself when forward porting to 3.x)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ericsnowcurrently at gmail.com  Sun Jul 17 00:24:43 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sat, 16 Jul 2011 16:24:43 -0600
Subject: [Import-SIG] backport of importlib
In-Reply-To: <CADiSq7cG_ymiaiLk4ygCLtgnYQj16HDotjTp8QQHFiR5GD8gPQ@mail.gmail.com>
References: <CALFfu7DVG+4cBh+fcjG3nDvQi+PzWZQSrtcHkWFfigULKQ7G4A@mail.gmail.com>
	<CADiSq7cG_ymiaiLk4ygCLtgnYQj16HDotjTp8QQHFiR5GD8gPQ@mail.gmail.com>
Message-ID: <CALFfu7CihYZBmY7Z96yaBwtD5P+5RpjgjH+5fw36Zcpn0PhxAw@mail.gmail.com>

On Sat, Jul 16, 2011 at 2:22 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> I suspect several of the transforms you're applying would be handled
> natively by 3to2 - have you looked into using that at all?

Yeah, I remembered it once I was mostly already done (it didn't take a
long time).  If the backport script has many omissions I may revisit
it with 3to2.

> Did I ever tell you about the (deliberately undocumented) standard
> import emulation in pkgutil? That's what runpy and a couple of other
> pieces of the 2.x stdlib use to get around the fact that importlib
> didn't exist until recently (and still doesn't exist in its full form
> in 2.x). (Although I guess relying on that would make it harder to use
> importlib itself when forward porting to 3.x)

That and I figure it will be easier to take advantage of things like
the import engine and PEP 382 if it is a scripted backport of
importlib.  If I remember right from pycon, the packaging folks were
looking at a similar strategy.

-eric


From pje at telecommunity.com  Mon Jul 18 16:50:28 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 18 Jul 2011 10:50:28 -0400
Subject: [Import-SIG] So...  should we do this thing?
Message-ID: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>

What do y'all think?  Should we submit the PEP, and run it by 
Python-Dev?  Anybody have any changes, questions, etc.?

Perhaps most important: are there any people willing and able to do 
the implementation for Python 3?  ;-)


From barry at python.org  Mon Jul 18 18:17:26 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 18 Jul 2011 12:17:26 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
 Partitioning"
In-Reply-To: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
Message-ID: <20110718121726.123e5b44@resist.wooz.org>

I finally had a chance to read this.  TL;DR: +1.

I have a few quibbles about typos and grammar, but let's ignore that for now.
I have two questions of substance at this point.

1. Sometimes, packages can have non-importable data directories,
   e.g. foo/test/data.  Where foo.test would be an importable subpackage,
   foo.test.data should not be.  Today we can just omit the __init__.py from
   foo/test/data.  Under the proposed regime there would, IIUC, be no way to
   prevent foo.test.data from being a subpackage.  It's entirely possible that
   foo/test/data would have .py files in it which would themselves be
   importable.  Is this a bad thing?  If so, do we need some mechanism to
   prevent recursion into some subdirectories?

2. The __file__ issue.  My gut tells me that pure virtual modules should have
   None as their __file__.  It seems wrong to use anything else, and your
   "accidentally work" observation is not calming. ;)

   The inability to use __file__ to find data files is somewhat troubling
   though.  Let's say we want to find the foo/test/data subdir above, and
   `foo` is pure-virtual, while `test` is an __init__.py-less package.

   I'm fine not being able to use foo.__file__, but I will probably want to
   use `os.path.join(foo.test.__file__, 'data')`.  Will that work?  What would
   foo.test's __file__ be?  The `foo/test` directory perhaps?  Of course there
   could be multiple `foo/test` directories, so this is probably why you're
   suggesting to search foo.test.__path__ instead.

   I'd actually be okay with that, *if* pkg_resources will be updated to
   handle this case.  In general, we've been recommending people use
   pkg_resources anyway (wasn't there a push to move part of this package into
   the stdlib?).

I'll read up on the rest of the thread now, but I think the PEP holds up well
and makes a convincing argument.  I think it's certainly worthy of posting to
python-dev to see if anybody else can shoot holes in it, or come up with
useful solutions to open questions.  I'll be very interested to see Guido's
reaction to it. :)

Thanks for taking this on PJE.
-Barry

From barry at python.org  Mon Jul 18 18:18:24 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 18 Jul 2011 12:18:24 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
 Partitioning"
In-Reply-To: <CADiSq7dDqdR2GHc_ZFSJ9drGFVMqAZ86j8oWvDOqYRHGghiB1g@mail.gmail.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
	<CADiSq7dDqdR2GHc_ZFSJ9drGFVMqAZ86j8oWvDOqYRHGghiB1g@mail.gmail.com>
Message-ID: <20110718121824.28db7f1e@resist.wooz.org>

On Jul 14, 2011, at 01:16 PM, Nick Coghlan wrote:

>While I understand the desire to focus on an import.c/pkgutil.py based
>implementation at this point, it's highly likely that builtin
>__import__ will be importlib based for 3.3. I'd be a lot happier if we
>stopped double-keying work and just wrote the importlib versions
>rather than messing with the soon-to-die C code any further.

Is that really true?  I keep hearing conflicting estimates about that.

-Barry

From barry at python.org  Mon Jul 18 18:29:31 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 18 Jul 2011 12:29:31 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
 Partitioning"
In-Reply-To: <CALFfu7Ct4Fzv0TZgeJhC0Xnh7aPASbidOh5Hd1DnfCDk7oSVEg@mail.gmail.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
	<CALFfu7Ct4Fzv0TZgeJhC0Xnh7aPASbidOh5Hd1DnfCDk7oSVEg@mail.gmail.com>
Message-ID: <20110718122931.6bb07aab@resist.wooz.org>

One other quick thought about __file__.

A common use case for it is for debugging purposes.  E.g. a user may say "I'm
getting a different foo package than I expected" and that's causing problems
with their application.   Commonly, we'll say to run this:

$ python -c "import foo; print foo.__file__"

to prove where they got it from.  While I still think this makes sense to
print None for pure-virtuals, I might still want to know something about where
on the file system these things live.  I suppose that if `foo` were a
pure-virtual, then this would be better diagnostics:

$ python -c "import foo; print foo.__path__"

since it would tell us what file system paths contributed to the creation of
`foo` as a pure virtual package.
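
A rough sketch of that diagnostic, with one loud assumption: it uses the namespace packages that current Python 3 provides (PEP 420) as a stand-in for the pure-virtual packages discussed here, so the details of the draft PEP may differ.  The package name ``vpkg_demo`` and its submodules are hypothetical:

```python
import importlib
import os
import sys
import tempfile

# Two separate sys.path entries, each contributing part of "vpkg_demo".
# Neither directory contains an __init__.py.
roots = [tempfile.mkdtemp(), tempfile.mkdtemp()]
for root, modname in zip(roots, ["part_a", "part_b"]):
    os.mkdir(os.path.join(root, "vpkg_demo"))
    with open(os.path.join(root, "vpkg_demo", modname + ".py"), "w") as f:
        f.write("NAME = %r\n" % modname)
sys.path.extend(roots)
importlib.invalidate_caches()

import vpkg_demo.part_a
import vpkg_demo.part_b

# __file__ says nothing useful for such a package ...
print(getattr(vpkg_demo, "__file__", None))

# ... while __path__ lists every directory that contributed to it.
contributing_dirs = list(vpkg_demo.__path__)
for directory in contributing_dirs:
    print(directory)
```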

Cheers,
-Barry

From barry at python.org  Mon Jul 18 18:32:29 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 18 Jul 2011 12:32:29 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
 Partitioning"
In-Reply-To: <20110713231448.1CBB03A4100@sparrow.telecommunity.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
	<CALFfu7Ct4Fzv0TZgeJhC0Xnh7aPASbidOh5Hd1DnfCDk7oSVEg@mail.gmail.com>
	<20110713231448.1CBB03A4100@sparrow.telecommunity.com>
Message-ID: <20110718123229.32add477@resist.wooz.org>

On Jul 13, 2011, at 07:14 PM, P.J. Eby wrote:

>At 04:27 PM 7/13/2011 -0600, Eric Snow wrote:

>>Should there be a way to indicate that you do not want a directory to
>>be considered for a package (an opt-out)?  Currently I can move the
>>__init__.py out of the way and it gets ignored by import.
>
>Renaming the directory is the quick solution.  If you have a tool that's
>looking for anything that's a package, then it'll need an exclusion option,
>or you'll have to rename the directory to something the tool will skip.
>(Ideally, tools should skip directories that aren't valid Python
>identifiers.)

I agree that tools should skip directories that aren't valid identifiers.
Maybe that's good enough, but I half suspect that the opt-out requirement will
come up often in future discussions.
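
As a sketch of the "skip directories that aren't valid identifiers" rule, here is one way a tool might filter candidate directory names in current Python 3.  The helper name ``looks_like_package_dir`` and the sample directory names are hypothetical; excluding keywords is an extra assumption on top of what was said above:

```python
import keyword


def looks_like_package_dir(dirname):
    """Return True if dirname could be used as a package name in an import."""
    # isidentifier() rejects names such as "data-files" or "2to3_fixtures";
    # keywords such as "class" are identifiers lexically but not importable.
    return dirname.isidentifier() and not keyword.iskeyword(dirname)


candidates = ["test", "data-files", "2to3_fixtures", "class", "_private"]
importable = [d for d in candidates if looks_like_package_dir(d)]
print(importable)
```

Under this rule, renaming ``data`` to something like ``data-files`` would be enough to opt a directory out.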

-Barry

From barry at python.org  Mon Jul 18 18:44:44 2011
From: barry at python.org (Barry Warsaw)
Date: Mon, 18 Jul 2011 12:44:44 -0400
Subject: [Import-SIG] So...  should we do this thing?
In-Reply-To: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
References: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
Message-ID: <20110718124444.0ca8b47b@resist.wooz.org>

On Jul 18, 2011, at 10:50 AM, P.J. Eby wrote:

>What do y'all think?  Should we submit the PEP, and run it by Python-Dev?
>Anybody have any changes, questions, etc.?

Yes, to all questions!  (See my other follow up).

>Perhaps most important: are there any people willing and able to do the
>implementation for Python 3?  ;-)

Possibly so; I might even get some Official Work Time for it.

-Barry

From ericsnowcurrently at gmail.com  Mon Jul 18 18:55:57 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 18 Jul 2011 10:55:57 -0600
Subject: [Import-SIG] So... should we do this thing?
In-Reply-To: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
References: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
Message-ID: <CALFfu7A2z_NyeJCPdXKZOXSwRkr3pOroZy7hUKKMjFLxthvKfQ@mail.gmail.com>

On Mon, Jul 18, 2011 at 8:50 AM, P.J. Eby <pje at telecommunity.com> wrote:
> What do y'all think?  Should we submit the PEP, and run it by Python-Dev?
> Anybody have any changes, questions, etc.?
>
> Perhaps most important: are there any people willing and able to do the
> implementation for Python 3?  ;-)

I could take a stab at an importlib version.

-eric


From brett at python.org  Mon Jul 18 19:01:36 2011
From: brett at python.org (Brett Cannon)
Date: Mon, 18 Jul 2011 10:01:36 -0700
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
	Partitioning"
In-Reply-To: <20110718121824.28db7f1e@resist.wooz.org>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
	<CADiSq7dDqdR2GHc_ZFSJ9drGFVMqAZ86j8oWvDOqYRHGghiB1g@mail.gmail.com>
	<20110718121824.28db7f1e@resist.wooz.org>
Message-ID: <CAP1=2W7VsonqJ9YJS5MoRXsiwQZz=4O+VHAiDP09mpnr63T=Zw@mail.gmail.com>

On Mon, Jul 18, 2011 at 09:18, Barry Warsaw <barry at python.org> wrote:

> On Jul 14, 2011, at 01:16 PM, Nick Coghlan wrote:
>
> >While I understand the desire to focus on an import.c/pkgutil.py based
> >implementation at this point, it's highly likely that builtin
> >__import__ will be importlib based for 3.3. I'd be a lot happier if we
> >stopped double-keying work and just wrote the importlib versions
> >rather than messing with the soon-to-die C code any further.
>
> Is that really true?  I keep hearing conflicting estimates about that.
>

It's as true as I make it. =) And it's my #1 Python 3.3 project so I am
going to do my damnedest to make it happen.

From ncoghlan at gmail.com  Tue Jul 19 00:07:13 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 19 Jul 2011 08:07:13 +1000
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
	Partitioning"
In-Reply-To: <20110718121726.123e5b44@resist.wooz.org>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
	<20110718121726.123e5b44@resist.wooz.org>
Message-ID: <CADiSq7c6uzJrbapPMnvn6R7CP6+_fus2DQ_C+rj0YB_zTioKOQ@mail.gmail.com>

On Tue, Jul 19, 2011 at 2:17 AM, Barry Warsaw <barry at python.org> wrote:
> 2. The __file__ issue.  My gut tells me that pure virtual modules should have
>    None as their __file__.  It seems wrong to use anything else, and your
>    "accidentally work" observation is not calming. ;)
>
>    The inability to use __file__ to find data files is somewhat troubling
>    though.  Let's say we want to find the foo/test/data subdir above, and
>    `foo` is pure-virtual, while `test` is an __init__.py-less package.
>
>    I'm fine not being able to use foo.__file__, but I will probably want to
>    use `os.path.join(foo.test.__file__, 'data')`.  Will that work?  What would
>    foo.test's __file__ be?  The `foo/test` directory perhaps?  Of course there
>    could be multiple `foo/test` directories, so this is probably why you're
>    suggesting to search foo.test.__path__ instead.
>
>    I'd actually be okay with that, *if* pkg_resources will be updated to
>    handle this case.  In general, we've been recommending people use
>    pkg_resources anyway (wasn't there a push to move part of this package into
>    the stdlib?).

pkgutil.get_data() needs to be updated to handle this case, so
retrieving the contents of a specific file in the directory above
could be written as either of the following:

pkgutil.get_data(foo, 'test/data/file.dat')
pkgutil.get_data(foo.test, 'data/file.dat')
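
For comparison, a sketch of how ``pkgutil.get_data()`` behaves today for an ordinary self-contained package.  Note that the existing stdlib function takes the package *name* as a string rather than the module object.  The package ``getdata_demo`` and its data file are hypothetical, created on the fly:

```python
import importlib
import os
import pkgutil
import sys
import tempfile

# Build a throwaway self-contained package with a data subdirectory.
root = tempfile.mkdtemp()
data_dir = os.path.join(root, "getdata_demo", "data")
os.makedirs(data_dir)
with open(os.path.join(root, "getdata_demo", "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(data_dir, "file.dat"), "wb") as f:
    f.write(b"payload")
sys.path.insert(0, root)
importlib.invalidate_caches()

# The resource path is '/'-separated and relative to the package,
# and the call goes through the package's loader rather than open().
contents = pkgutil.get_data("getdata_demo", "data/file.dat")
print(contents)
```

Because the lookup is delegated to the loader, the same call keeps working when the package lives inside a zipfile.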

The question of PEP 302 and listing *available* data files (and other
directory-style or lazy data access I/O operations) remains open
(independent of the changes in this PEP). Note that os.path.join based
approaches already break as soon as you put the package and data files
in a zipfile.

In reality, I believe people should be using the appropriate packaging
APIs so that source files and data files may be deployed to distinct
locations.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From pje at telecommunity.com  Tue Jul 19 00:49:23 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 18 Jul 2011 18:49:23 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
 Partitioning"
In-Reply-To: <20110718121726.123e5b44@resist.wooz.org>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
	<20110718121726.123e5b44@resist.wooz.org>
Message-ID: <20110718225006.5A3DE3A40AA@sparrow.telecommunity.com>

At 12:17 PM 7/18/2011 -0400, Barry Warsaw wrote:
>1. Sometimes, packages can have non-importable data directories,
>    e.g. foo/test/data.  Where foo.test would be an importable subpackage,
>    foo.test.data should not be.  Today we can just omit the __init__.py from
>    foo/test/data.  Under the proposed regime there would, IIUC, be no way to
>    prevent foo.test.data from being a subpackage.  It's entirely possible that
>    foo/test/data would have .py files in it which would themselves be
>    importable.  Is this a bad thing?

Why would it be?

>If so, do we need some mechanism to
>    prevent recursion into some subdirectories?

You could rename the subdirectory, I suppose.



>2. The __file__ issue.  My gut tells me that pure virtual modules should have
>    None as their __file__.  It seems wrong to use anything else, and your
>    "accidentally work" observation is not calming. ;)

Heh.  ;-)


>    The inability to use __file__ to find data files is somewhat troubling
>    though.  Let's say we want to find the foo/test/data subdir above, and
>    `foo` is pure-virtual, while `test` is an __init__.py-less package.
>
>    I'm fine not being able to use foo.__file__, but I will probably want to
>    use `os.path.join(foo.test.__file__, 'data')`.

Currently, you'd actually join to the dirname() of the __file__, not 
the plain file.  Thus, putting a directory name with a trailing '/' 
in __file__ would then make the current incantation work for that 
case, as long as you were fine with looking in the *first* directory 
where the file was.
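
The current incantation, sketched in Python 3 with the stdlib ``json`` package standing in for a self-contained package; the ``data`` subdirectory is hypothetical (``json`` does not ship one):

```python
import json  # stand-in for a regular package with an __init__ module
import os

# For a self-contained package, __file__ points at the __init__ file,
# so dirname() gives the package directory itself.
pkg_dir = os.path.dirname(json.__file__)

# Hypothetical data subdirectory next to the package's modules.
data_dir = os.path.join(pkg_dir, "data")

print(os.path.basename(pkg_dir))
print(os.path.basename(data_dir))
```

If ``__file__`` were instead a directory name ending in a separator, the same ``dirname()`` call would strip only the empty final component and still yield the package directory.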

However, I'm not as keen on that as a general solution, simply 
because if you add a 'foo/test.py', then the __file__ will change 
such that a different incantation is required to find the directory.


>   Will that work?  What would
>    foo.test's __file__ be?  The `foo/test` directory perhaps?  Of course there
>    could be multiple `foo/test` directories, so this is probably why you're
>    suggesting to search foo.test.__path__ instead.
>
>    I'd actually be okay with that, *if* pkg_resources will be updated to
>    handle this case.  In general, we've been recommending people use
>    pkg_resources anyway (wasn't there a push to move part of this package into
>    the stdlib?).

pkg_resources says not to use a namespace package as your target for 
a lookup, but instead to always use a self-contained package or a 
module that's adjacent to what you're looking for, for this very 
reason.  There's really no change here.


>I'll read up on the rest of the thread now, but I think the PEP holds up well
>and makes a convincing argument.  I think it's certainly worthy of posting to
>python-dev to see if anybody else can shoot holes in it, or come up with
>useful solutions to open questions.  I'll be very interested to see Guido's
>reaction to it. :)

Me too.  ;-)


From pje at telecommunity.com  Tue Jul 19 00:52:08 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 18 Jul 2011 18:52:08 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
 Partitioning"
In-Reply-To: <CADiSq7c6uzJrbapPMnvn6R7CP6+_fus2DQ_C+rj0YB_zTioKOQ@mail.g
	mail.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
	<20110718121726.123e5b44@resist.wooz.org>
	<CADiSq7c6uzJrbapPMnvn6R7CP6+_fus2DQ_C+rj0YB_zTioKOQ@mail.gmail.com>
Message-ID: <20110718225244.551E93A40AA@sparrow.telecommunity.com>

At 08:07 AM 7/19/2011 +1000, Nick Coghlan wrote:
>On Tue, Jul 19, 2011 at 2:17 AM, Barry Warsaw <barry at python.org> wrote:
> > 2. The __file__ issue.  My gut tells me that pure virtual modules should have
> >   None as their __file__.  It seems wrong to use anything else, and your
> >   "accidentally work" observation is not calming. ;)
> >
> >   The inability to use __file__ to find data files is somewhat troubling
> >   though.  Let's say we want to find the foo/test/data subdir above, and
> >   `foo` is pure-virtual, while `test` is an __init__.py-less package.
> >
> >   I'm fine not being able to use foo.__file__, but I will probably want to
> >   use `os.path.join(foo.test.__file__, 'data')`.  Will that work?  What would
> >   foo.test's __file__ be?  The `foo/test` directory perhaps?  Of course there
> >   could be multiple `foo/test` directories, so this is probably why you're
> >   suggesting to search foo.test.__path__ instead.
> >
> >   I'd actually be okay with that, *if* pkg_resources will be updated to
> >   handle this case.  In general, we've been recommending people use
> >   pkg_resources anyway (wasn't there a push to move part of this package into
> >   the stdlib?).
>
>pkgutil.get_data() needs to be updated to handle this case, so
>retrieving the contents of a specific file in the directory above
>could be written as either of the following:
>
>pkgutil.get_data(foo, 'test/data/file.dat')
>pkgutil.get_data(foo.test, 'data/file.dat')

Really, these should be done relative to either a module or a 
self-contained package, unless we want to modify these things to 
search the __path__ -- and I'm not entirely sure that we do.


>The question of PEP 302 and listing *available* data files (and other
>directory-style or lazy data access I/O operations) remains open
>(independent of the changes in this PEP). Note that os.path.join based
>approaches already break as soon as you put the package and data files
>in a zipfile.
>
>In reality, I believe people should be using the appropriate packaging
>APIs so that source files and data files may be deployed to distinct
>locations.

Indeed. 


From eric at trueblade.com  Wed Jul 20 02:46:26 2011
From: eric at trueblade.com (Eric V. Smith)
Date: Tue, 19 Jul 2011 20:46:26 -0400
Subject: [Import-SIG] So...  should we do this thing?
In-Reply-To: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
References: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
Message-ID: <4E262562.5070606@trueblade.com>

On 7/18/2011 10:50 AM, P.J. Eby wrote:
> What do y'all think?  Should we submit the PEP, and run it by
> Python-Dev?  Anybody have any changes, questions, etc.?

I think you should submit the PEP and run it by python-dev. I'm curious
to hear what Martin and others think.

I like the idea of doing something more radical that not only allows for
"namespace packages" or whatever term we settle on, but simplifies how
we explain packages. I think this proposal does that, at least for
people new to Python. For oldsters like me, it will take some time to
wrap my head around it.

> Perhaps most important: are there any people willing and able to do the
> implementation for Python 3?  ;-)

I'm willing.

Eric.

From ncoghlan at gmail.com  Wed Jul 20 04:03:48 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 20 Jul 2011 12:03:48 +1000
Subject: [Import-SIG] So... should we do this thing?
In-Reply-To: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
References: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
Message-ID: <CADiSq7ewx1fUKo_1Kk7MMtyrPiZ90XVVzzuG-KkODBD0HsSUkg@mail.gmail.com>

On Tue, Jul 19, 2011 at 12:50 AM, P.J. Eby <pje at telecommunity.com> wrote:
> What do y'all think?  Should we submit the PEP, and run it by Python-Dev?
>  Anybody have any changes, questions, etc.?

I think it's ready for wider distribution. I want to see how many
brains we can melt as people come to grips with the long term
implications :)

Some day virtual packages may even become the norm, with
self-contained package directories being an app startup time
optimisation.

> Perhaps most important: are there any people willing and able to do the
> implementation for Python 3?  ;-)

I have plenty on my plate for 3.3 already, but I'll definitely help
out with reviewing submitted patches.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ericsnowcurrently at gmail.com  Wed Jul 20 22:02:47 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 20 Jul 2011 14:02:47 -0600
Subject: [Import-SIG] PEP 402 implementation
Message-ID: <CALFfu7Db_YpE1T9wu4hsDx3wZdnKMFJ8nCs74aAZMPaiU0dybg@mail.gmail.com>

Last night I had a chance to get started on an implementation for the
PEP.  I'm taking the importlib route.  Before I do much more I wanted
to check on a couple of things with this group.

First of all, I don't want to go to much effort here if others are
already focused on the implementation, particularly since I'm sure all
of you would do a better job than I would.  I already feel like I have
butted in on the work Barry, Eric, and crew were getting started. <wink>
 If someone is already going to take care of the implementation please
let me know.

In case you haven't noticed, Python is my first foray into an
open-source project, and I've only been involved since the pycon
sprints (been using Python exclusively for 5 years though).   So, I am
still feeling out the mechanics of how people cooperate on this sort
of stuff.

Secondly, regardless of importlib or import.c or whatever, the sys
module will need to have "virtual_packages" added right?  I stuck that
code in import.c next to where sys.meta_path and others get
initialized [1].  Is that the right place to do it?  Should it go in
sysmodule.c instead?

Thanks,

-eric

[1] http://hg.python.org/cpython/file/default/Python/import.c#l204

From brett at python.org  Wed Jul 20 22:08:42 2011
From: brett at python.org (Brett Cannon)
Date: Wed, 20 Jul 2011 13:08:42 -0700
Subject: [Import-SIG] PEP 402 implementation
In-Reply-To: <CALFfu7Db_YpE1T9wu4hsDx3wZdnKMFJ8nCs74aAZMPaiU0dybg@mail.gmail.com>
References: <CALFfu7Db_YpE1T9wu4hsDx3wZdnKMFJ8nCs74aAZMPaiU0dybg@mail.gmail.com>
Message-ID: <CAP1=2W5T2fETQZ8rXjTuGJwiVUjdJX3ZpZNYXT7WGGvNedwHUg@mail.gmail.com>

On Wed, Jul 20, 2011 at 13:02, Eric Snow <ericsnowcurrently at gmail.com> wrote:

> Last night I had a chance to get started on an implementation for the
> PEP.  I'm taking the importlib route.  Before I do much more I wanted
> to check on a couple of things with this group.
>

Obviously feel free to ask me questions (publicly or privately) if anything
in the importlib code is an issue for you (I know its structure for
bootstrapping reasons is a bit odd).



>
> First of all, I don't want to go to much effort here if others are
> already focused on the implementation, particularly since I'm sure all
> of you would do a better job than I would.  I already feel like I have
> butted in on the work Barry, Eric, and crew were getting started. <wink>
>  If someone is already going to take care of the implementation please
> let me know.
>

I really doubt anyone has jumped into this as much as you have, Eric. =) You
can also always do it on bitbucket or somewhere so that others can
collaborate. I believe there is even a cpython mirror there so that should
make it easy to fork and pull in updates.


>
> In case you haven't noticed, Python is my first foray into an
> open-source project, and I've only been involved since the pycon
> sprints (been using Python exclusively for 5 years though).   So, I am
> still feeling out the mechanics of how people cooperate on this sort
> of stuff.
>
> Secondly, regardless of importlib or import.c or whatever, the sys
> module will need to have "virtual_packages" added right?  I stuck that
> code in import.c next to where sys.meta_path and others get
> initialized [1].  Is that the right place to do it?  Should it go in
> sysmodule.c instead?
>

I can understand populating those properties in import.c, but it is probably
better to initialize the empty data structures in sysmodule.c so that the
code to get the module in a basic state is centralized. But if sys.meta_path
and friends are elsewhere then you can start there and have a separate patch
(file a bug now, though) to possibly relocate the code later.

-Brett


>
> Thanks,
>
> -eric
>
> [1] http://hg.python.org/cpython/file/default/Python/import.c#l204

From pje at telecommunity.com  Wed Jul 20 23:15:55 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 20 Jul 2011 17:15:55 -0400
Subject: [Import-SIG] PEP 402 implementation
In-Reply-To: <CAP1=2W5T2fETQZ8rXjTuGJwiVUjdJX3ZpZNYXT7WGGvNedwHUg@mail.gmail.com>
References: <CALFfu7Db_YpE1T9wu4hsDx3wZdnKMFJ8nCs74aAZMPaiU0dybg@mail.gmail.com>
	<CAP1=2W5T2fETQZ8rXjTuGJwiVUjdJX3ZpZNYXT7WGGvNedwHUg@mail.gmail.com>
Message-ID: <20110720211636.8BC553A409B@sparrow.telecommunity.com>

At 01:08 PM 7/20/2011 -0700, Brett Cannon wrote:
>Obviously feel free to ask me questions (publicly or privately) if 
>anything in the importlib code is an issue for you (I know its 
>structure for bootstrapping reasons is a bit odd).

While we're on the topic, I was just browsing through importlib 
(while doing my sketch on how to support the "no pure virtual 
imports" change to PEP 402; see 
http://mail.python.org/pipermail/python-dev/2011-July/112385.html ) 
and I noticed that there are a few places in the implementation where 
it makes assumptions about objects' boolean values.

For example, PathFinder's find_module treats an empty path the same 
as sys.path, and will also fail if for some reason the bool() of a 
PEP 302 finder or loader object is False.  Also, module_for_loader() 
will create a new module object, if you have a False module subclass 
in sys.modules.

Is there any particular reason for these digressions from strict PEP 
302?  I can understand, say, Jython and IronPython not wanting to 
generate object id's, but I was under the impression that those 
languages can do identity checks (especially against None) without 
running into the general problem of generating object IDs in the 
presence of garbage collection.

These distinctions could be more problematic than they appear, as 
it's possible to inadvertently make your loader or your module 
subclass capable of being False (for example, if you subclassed a 
sequence type or implemented a __len__), and this could lead to some 
very subtle bugs, albeit very rare ones as well.  ;-)
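To make the pitfall concrete, here is a minimal sketch. The class and function names are made up for illustration and are not importlib code; the loader simply happens to subclass a sequence type:

```python
class ListBackedLoader(list):
    """Hypothetical PEP 302 style loader that also subclasses list."""

    def load_module(self, fullname):
        raise ImportError(fullname)


def found_by_truth(loader):
    # Fragile: an empty sequence subclass is falsy, so a perfectly
    # valid loader object is treated as "nothing found".
    return True if loader else False


def found_by_identity(loader):
    # Robust: only None means "nothing found".
    return loader is not None


loader = ListBackedLoader()
truth_result = found_by_truth(loader)        # False: loader is ignored
identity_result = found_by_identity(loader)  # True: loader is used
```

The same applies to a module subclass in sys.modules that defines __len__ or __bool__.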


From brett at python.org  Wed Jul 20 23:55:38 2011
From: brett at python.org (Brett Cannon)
Date: Wed, 20 Jul 2011 14:55:38 -0700
Subject: [Import-SIG] PEP 402 implementation
In-Reply-To: <20110720211636.8BC553A409B@sparrow.telecommunity.com>
References: <CALFfu7Db_YpE1T9wu4hsDx3wZdnKMFJ8nCs74aAZMPaiU0dybg@mail.gmail.com>
	<CAP1=2W5T2fETQZ8rXjTuGJwiVUjdJX3ZpZNYXT7WGGvNedwHUg@mail.gmail.com>
	<20110720211636.8BC553A409B@sparrow.telecommunity.com>
Message-ID: <CAP1=2W5NAH9OzK0nW2i2jiLY6vWQx5G_juAtnK26PojV4yxumA@mail.gmail.com>

No specific reason. Feel free to file a bug and assign it to me.

On Wed, Jul 20, 2011 at 14:15, P.J. Eby <pje at telecommunity.com> wrote:

> At 01:08 PM 7/20/2011 -0700, Brett Cannon wrote:
>
>> Obviously feel free to ask me questions (publicly or privately) if
>> anything in the importlib code is an issue for you (I know its structure for
>> bootstrapping reasons is a bit odd).
>>
>
> While we're on the topic, I was just browsing through importlib (while
> doing my sketch on how to support the "no pure virtual imports" change to
> PEP 402; see
> http://mail.python.org/pipermail/python-dev/2011-July/112385.html )
> and I noticed that there are a few places in the implementation where it
> makes assumptions about objects' boolean values.
>
> For example, PathFinder's find_module treats an empty path the same as
> sys.path, and will also fail if for some reason the bool() of a PEP 302
> finder or loader object is False.  Also, module_for_loader() will create a
> new module object, if you have a False module subclass in sys.modules.
>
> Is there any particular reason for these digressions from strict PEP 302?
>  I can understand, say, Jython and IronPython not wanting to generate object
> id's, but I was under the impression that those languages can do identity
> checks (especially against None) without running into the general problem of
> generating object IDs in the presence of garbage collection.
>
> These distinctions could be more problematic than they appear, as it's
> possible to inadvertently make your loader or your module subclass capable
> of being False (for example, if you subclassed a sequence type or
> implemented a __len__), and this could lead to some very subtle bugs, albeit
> very rare ones as well.  ;-)
>
>

From ncoghlan at gmail.com  Thu Jul 21 01:18:06 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 21 Jul 2011 09:18:06 +1000
Subject: [Import-SIG] PEP 402 implementation
In-Reply-To: <CAP1=2W5NAH9OzK0nW2i2jiLY6vWQx5G_juAtnK26PojV4yxumA@mail.gmail.com>
References: <CALFfu7Db_YpE1T9wu4hsDx3wZdnKMFJ8nCs74aAZMPaiU0dybg@mail.gmail.com>
	<CAP1=2W5T2fETQZ8rXjTuGJwiVUjdJX3ZpZNYXT7WGGvNedwHUg@mail.gmail.com>
	<20110720211636.8BC553A409B@sparrow.telecommunity.com>
	<CAP1=2W5NAH9OzK0nW2i2jiLY6vWQx5G_juAtnK26PojV4yxumA@mail.gmail.com>
Message-ID: <CADiSq7em1jH1Mq54jzXEU7d4+pPyxo8rZEhFYskdSOTcdosTHw@mail.gmail.com>

On Thu, Jul 21, 2011 at 7:55 AM, Brett Cannon <brett at python.org> wrote:
> No specific reason. Feel free to file a bug and assign it to me.

Yeah, it sounds like a few "is not None" snippets need to be sprinkled
around and some pathological cases added to the import test suite.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ericsnowcurrently at gmail.com  Thu Jul 21 01:26:18 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 20 Jul 2011 17:26:18 -0600
Subject: [Import-SIG] PEP 402 implementation
In-Reply-To: <CAP1=2W5T2fETQZ8rXjTuGJwiVUjdJX3ZpZNYXT7WGGvNedwHUg@mail.gmail.com>
References: <CALFfu7Db_YpE1T9wu4hsDx3wZdnKMFJ8nCs74aAZMPaiU0dybg@mail.gmail.com>
	<CAP1=2W5T2fETQZ8rXjTuGJwiVUjdJX3ZpZNYXT7WGGvNedwHUg@mail.gmail.com>
Message-ID: <CALFfu7BxkeYGFM_Ohm4ZCy2HLWwB65zUer2a95NM47cmn+DgGQ@mail.gmail.com>

On Wed, Jul 20, 2011 at 2:08 PM, Brett Cannon <brett at python.org> wrote:
> Obviously feel free to ask me questions (publicly or privately) if anything
> in the importlib code is an issue for you (I know its structure for
> bootstrapping reasons is a bit odd).

Thanks.  To be honest, with the time I have spent in importlib in the
last couple months I realize how much work you put into it, so thanks.
 It makes it really easy to hack the import mechanism.

> I really doubt anyone has jumped into this as much as you have, Eric. =) You
> can also always do it on bitbucket or somewhere so that others can
> collaborate. I believe there is even a cpython mirror there so that should
> make it easy to fork and pull in updates.

Yep, already have my bitbucket clone (haven't pushed commits to it
yet though).

>> Secondly, regardless of importlib or import.c or whatever, the sys
>> module will need to have "virtual_packages" added right?  I stuck that
>> code in import.c next to where sys.meta_path and others get
>> initialized [1].  Is that the right place to do it?  Should it go in
>> sysmodule.c instead?
>
> I can understand populating those properties in import.c, but it is probably
> better to initialize the empty data structures in sysmodule.c so that the
> code to get the module in a basic state is centralized. But if sys.meta_path
> and friends are elsewhere then you can start there and have a separate patch
> (file a bug now, though) to possibly relocate the code later.

Good idea.  I submitted issue 12598 along with a patch.

-eric

From ncoghlan at gmail.com  Thu Jul 21 01:27:13 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 21 Jul 2011 09:27:13 +1000
Subject: [Import-SIG] PEP 402 implementation
In-Reply-To: <CAP1=2W5T2fETQZ8rXjTuGJwiVUjdJX3ZpZNYXT7WGGvNedwHUg@mail.gmail.com>
References: <CALFfu7Db_YpE1T9wu4hsDx3wZdnKMFJ8nCs74aAZMPaiU0dybg@mail.gmail.com>
	<CAP1=2W5T2fETQZ8rXjTuGJwiVUjdJX3ZpZNYXT7WGGvNedwHUg@mail.gmail.com>
Message-ID: <CADiSq7fxLrD2kQ627ptkgKZF_TzCRVXyNn=Sp1aFYb-Q7VrEjQ@mail.gmail.com>

On Thu, Jul 21, 2011 at 6:08 AM, Brett Cannon <brett at python.org> wrote:
> I really doubt anyone has jumped into this as much as you have, Eric. =) You
> can also always do it on bitbucket or somewhere so that others can
> collaborate. I believe there is even a cpython mirror there so that should
> make it easy to fork and pull in updates.

+1 for publishing on bitbucket. I recently moved my own sandbox from
python.org to bitbucket in order to make collaboration easier and I
know Eric already has an account there (cf. the importlib 2.x backport
scripts).

The cpython mirror is at: https://bitbucket.org/mirror/cpython/overview

Also, take note of the refinement PJE described on python-dev:
sys.virtual_packages will be a dict mapping to __path__ contents and
directly importing pure virtual packages will only be permitted if a
child package has already been successfully imported.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu Jul 21 01:35:18 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 21 Jul 2011 09:35:18 +1000
Subject: [Import-SIG] PEP 402 implementation
In-Reply-To: <CADiSq7em1jH1Mq54jzXEU7d4+pPyxo8rZEhFYskdSOTcdosTHw@mail.gmail.com>
References: <CALFfu7Db_YpE1T9wu4hsDx3wZdnKMFJ8nCs74aAZMPaiU0dybg@mail.gmail.com>
	<CAP1=2W5T2fETQZ8rXjTuGJwiVUjdJX3ZpZNYXT7WGGvNedwHUg@mail.gmail.com>
	<20110720211636.8BC553A409B@sparrow.telecommunity.com>
	<CAP1=2W5NAH9OzK0nW2i2jiLY6vWQx5G_juAtnK26PojV4yxumA@mail.gmail.com>
	<CADiSq7em1jH1Mq54jzXEU7d4+pPyxo8rZEhFYskdSOTcdosTHw@mail.gmail.com>
Message-ID: <CADiSq7fp0XnytU8VgsFF=scrj37PwfyfAaQVGT5jD5AH989Rvw@mail.gmail.com>

On Thu, Jul 21, 2011 at 9:18 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Thu, Jul 21, 2011 at 7:55 AM, Brett Cannon <brett at python.org> wrote:
>> No specific reason. Feel free to file a bug and assign it to me.
>
> Yeah, it sounds like a few "is not None" snippets need to be sprinkled
> around and some pathological cases added to the import test suite.

Created as http://bugs.python.org/issue12599

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From martin at v.loewis.de  Thu Jul 21 22:42:37 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 21 Jul 2011 22:42:37 +0200
Subject: [Import-SIG] So...  should we do this thing?
In-Reply-To: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
References: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
Message-ID: <4E288F3D.4060602@v.loewis.de>

Am 18.07.2011 16:50, schrieb P.J. Eby:
> What do y'all think?  Should we submit the PEP, and run it by
> Python-Dev?  Anybody have any changes, questions, etc.?

I still plan to write my own version of it, so that would make it
three PEPs.

Regards,
Martin

From brett at python.org  Thu Jul 21 22:59:41 2011
From: brett at python.org (Brett Cannon)
Date: Thu, 21 Jul 2011 13:59:41 -0700
Subject: [Import-SIG] So... should we do this thing?
In-Reply-To: <4E288F3D.4060602@v.loewis.de>
References: <20110718145111.AC1583A40AA@sparrow.telecommunity.com>
	<4E288F3D.4060602@v.loewis.de>
Message-ID: <CAP1=2W5oZVuYa6qrf-me-VWdKm8B2dWhbQmo4DZWH8j5B_=6nA@mail.gmail.com>

On Thu, Jul 21, 2011 at 13:42, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> Am 18.07.2011 16:50, schrieb P.J. Eby:
> > What do y'all think?  Should we submit the PEP, and run it by
> > Python-Dev?  Anybody have any changes, questions, etc.?
>
> I still plan to write my own version of it, so that would make it
> three PEPs.
>

A trifecta! At least we have options to choose from. It's a tricky enough
topic to get right that I'm not surprised at the possibility of three PEPs
on it.

From barry at python.org  Thu Jul 21 22:37:00 2011
From: barry at python.org (Barry Warsaw)
Date: Thu, 21 Jul 2011 16:37:00 -0400
Subject: [Import-SIG] New PEP draft: "Simplified Package Layout and
 Partitioning"
In-Reply-To: <CADiSq7c6uzJrbapPMnvn6R7CP6+_fus2DQ_C+rj0YB_zTioKOQ@mail.gmail.com>
References: <20110713171345.4E0673A4100@sparrow.telecommunity.com>
	<20110718121726.123e5b44@resist.wooz.org>
	<CADiSq7c6uzJrbapPMnvn6R7CP6+_fus2DQ_C+rj0YB_zTioKOQ@mail.gmail.com>
Message-ID: <20110721163700.3daff988@resist.wooz.org>

On Jul 19, 2011, at 08:07 AM, Nick Coghlan wrote:

>pkgutil.get_data() needs to be updated to handle this case, so
>retrieving the contents of a specific file in the directory above
>could be written as either of the following:
>
>pkgutil.get_data(foo, 'test/data/file.dat')
>pkgutil.get_data(foo.test, 'data/file.dat')

The latter looks fine to me.

>The question of PEP 302 and listing *available* data files (and other
>directory-style or lazy data access I/O operations) remains open
>(independent of the changes in this PEP). Note that os.path.join based
>approaches already break as soon as you put the package and data files
>in a zipfile.

Yep.

>In reality, I believe people should be using the appropriate packaging
>APIs so that source files and data files may be deployed to distinct
>locations.

Completely agree.

-Barry

From jergosh at gmail.com  Sun Jul 31 14:05:15 2011
From: jergosh at gmail.com (Greg Slodkowicz)
Date: Sun, 31 Jul 2011 14:05:15 +0200
Subject: [Import-SIG] New PEP Draft: Import Engine
Message-ID: <CAGY-8BJHJZGJh8QWMZ+i25wLE4YamU1p3S5fGotTv371gM-kgw@mail.gmail.com>

Dear all,
The following is a result of a GSoC project I've been working on with
Nick and Brett. I wrote up a description of the proposed changes as a
short PEP draft. I'd appreciate any suggestions or criticism.

A slightly more readable version is also available at
http://wiki.python.org/moin/SummerOfCode/PythonImportEnginePlanning?action=edit&editor=text

PEP: XXX
Title: Python Import Engine
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan at gmail.com>, Greg Slodkowicz <jergosh at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-Jul-2011
Post-History: XXX

Abstract
========

This PEP proposes incorporating an 'import engine' class which would
encapsulate all state related to importing modules into a single
object and provide an alternative to the built-in implementation of
the import statement, which is syntactic sugar for the
``__import__()`` function.  Currently the bulk of importing work is done
by means of module finders and loaders, and their interfaces would
require a simple change in order to work with both the builtin import
functionality and importing via import engine objects.  In that sense,
this PEP constitutes a revision of finder and loader interfaces
described in PEP 302 [1]_.

Rationale
=========

Historically, any modification to the import functionality required
re-implementing ``__import__()`` entirely.  PEP 302 provides a major
improvement by introducing separation between imports of different
types of modules.  As a result, additional process-global state is
stored in the sys module.  This, along with earlier import-related
global state, comprises:

* sys.modules
* sys.path
* sys.path_hooks
* sys.meta_path
* sys.path_importer_cache
* the import lock (imp.lock_held()/acquire_lock()/release_lock())

Isolating this state would allow multiple import states to be
conveniently stored within a process. Placing the import functionality
in a self-contained object would allow subclassing to add additional
features (e.g. module import notifications or fine-grained control
over which modules can be imported).  The engine would also be
subclassed to make it possible to use the import engine API to
interact with the existing process-global state.
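As an illustration of the state listed above, the following sketch takes shallow copies of the ``sys`` attributes in question. The helper name ``snapshot_import_state`` and the extra directory are hypothetical, and the import lock is left out because it is not a plain attribute of ``sys``:

```python
import sys


def snapshot_import_state():
    """Return shallow copies of the process-global import state in sys."""
    return {
        "modules": dict(sys.modules),
        "path": list(sys.path),
        "path_hooks": list(sys.path_hooks),
        "meta_path": list(sys.meta_path),
        "path_importer_cache": dict(sys.path_importer_cache),
    }


state = snapshot_import_state()
# Changing the copy leaves the real sys.path untouched.
state["path"].append("/hypothetical/extra/dir")
```

An import engine would hold containers like these as instance attributes instead of reading them from ``sys``.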

Proposal
========

We propose introducing an ImportEngine class to encapsulate import
functionality. This includes the ``__import__()`` function which can
be used as an alternative to the built-in ``__import__()`` when
desired and also ``import_module()``, equivalent to
``importlib.import_module()`` [3]_.

Since the new style finders and loaders should also have the option to
modify the global import state, we introduce a ``GlobalImportEngine``
class with an interface identical to ``ImportEngine`` but taking
advantage of the global state. This can be easily implemented using
class properties.


Design and Implementation
=========================

API
~~~~

The proposed extension would consist of the following objects:

``engine.ImportEngine``

 ``__import__(self, name, globals={}, locals={}, fromlist=[], level=0)``
 Reimplementation of the builtin ``__import__()`` function.  The
import of a module will proceed using the state stored in the
ImportEngine instance rather than the global import state.  For full
documentation of ``__import__`` functionality, see [2]_ .
``__import__()`` from ``ImportEngine`` and its subclasses can be used
to customise the behaviour of the ``import`` statement by replacing
``__builtin__.__import__`` with ``ImportEngine.__import__``.

``import_module(name, package=None)``
 A reimplementation of ``importlib.import_module()`` which uses the
import state stored in the ImportEngine instance. See [3]_ for a full
reference.

``from_engine(self, other)``
  Create a new import object from another ImportEngine instance.  The
new object is initialised with a copy of the state in ``other``.  When
called on ``engine.sysengine`` as ``other``, ``from_engine()`` can be
used to create an ImportEngine object with a **copy** of the global
import state.

``GlobalImportEngine(ImportEngine)``
 Convenience class to provide engine-like access to the global state.
Provides ``__import__()``, ``import_module()`` and ``from_engine()``
methods like ``ImportEngine`` but writes through to the global state
in ``sys``.
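The state handling described above could be sketched roughly as follows. This covers only how the state is stored and copied, not the import algorithm; treating ``from_engine`` as a static constructor and using read-only properties for the write-through are assumptions of this sketch, not part of the proposal:

```python
import sys

_STATE_NAMES = ("modules", "path", "path_hooks", "meta_path",
                "path_importer_cache")


class ImportEngine:
    """State holder only; __import__() and import_module() are omitted."""

    def __init__(self):
        self.modules = {}
        self.path = []
        self.path_hooks = []
        self.meta_path = []
        self.path_importer_cache = {}

    @staticmethod
    def from_engine(other):
        # Shallow-copy each container so later changes stay private
        # to the new engine and do not leak back into `other`.
        new = ImportEngine()
        for name in _STATE_NAMES:
            setattr(new, name, getattr(other, name).copy())
        return new


class GlobalImportEngine(ImportEngine):
    """Engine-shaped view of the process-global state in sys."""

    def __init__(self):
        pass  # no private state; everything lives in sys

    # Each property hands back the live sys object, so mutations
    # made through the engine write through to the global state.
    modules = property(lambda self: sys.modules)
    path = property(lambda self: sys.path)
    path_hooks = property(lambda self: sys.path_hooks)
    meta_path = property(lambda self: sys.meta_path)
    path_importer_cache = property(lambda self: sys.path_importer_cache)


sysengine = GlobalImportEngine()
private = ImportEngine.from_engine(sysengine)
private.path.append("/hypothetical/plugins")  # sys.path is unaffected
```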


Global variables
~~~~~~~~~~~~~~~~

``engine.sysengine``
 Instance of GlobalImportEngine provided for convenience (e.g. for
use by module finders and loaders).

Necessary changes to finder/loader interfaces:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``find_module`` (cls, fullname, path=None, **engine=None**)

``load_module`` (cls, fullname, path=None, **engine=None**)

The only difference between 'new style' and PEP 302 compatible
finders/loaders is the presence of an additional ``engine`` parameter.
This is intended to specify an ImportEngine instance or subclass
thereof.  This parameter is optional so that the 'new style' finders and
loaders can be made backwards compatible by falling back on
engine.sysengine with the following simple pattern:

::

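 def find_module(cls, fullname, path=None, engine=None):
   if engine is None:
     # The "engine" argument shadows the module, so import the name directly.
     from engine import sysengine
     engine = sysengine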

   ...

An implementation based on Brett Cannon's importlib has been developed
by Greg Slodkowicz as part of the 2011 Google Summer of Code. The code
repository is located at
https://bitbucket.org/jergosh/gsoc_import_engine/.

Open Issues
~~~~~~~~~~~

The existing importlib implementation depends on several functions
from ``imp``, Python's builtin implementation of ``__import__``
located in *Python/import.c*. These functions are unaware of
ImportEngine and place the newly imported module in ``sys.modules``.
Naturally, this is a problem from the ImportEngine point of view.  The
offending methods are:

* imp.init_builtin()
* imp.load_dynamic()

However, since there can be only a single instance of each
builtin/dynamic module per process, they are essentially
process-global regardless of the way they are imported. Currently, the
simplest solution for supporting them in ImportEngine seems to be to have
new style loaders call the existing imp methods and then copy
appropriate references from ``sys.modules`` into the state inside the
import engine.
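A rough sketch of that copy step, under loud assumptions: ``importlib.import_module()`` stands in for the ``imp`` calls, ``engine_modules`` is a plain dict standing in for the engine's private module table, and the helper name is hypothetical:

```python
import importlib
import sys


def load_process_global(name, engine_modules):
    """Import a builtin/extension module, then mirror it into the engine."""
    module = importlib.import_module(name)    # lands in sys.modules
    engine_modules[name] = sys.modules[name]  # share the same object
    return module


engine_modules = {}
math_module = load_process_global("math", engine_modules)
```

The engine and ``sys.modules`` end up referring to the same module object, which matches the observation that such modules are process-global anyway.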

Similarly, ``imp.NullImporter`` implements a ``load_module`` method
which is incompatible with 'new style' loaders. Since the
``NullImporter`` class does next to nothing (i.e. always returns
None), it has been reimplemented in Python. The only way this could
cause problems would be explicitly checking if a module's importer is
an imp.NullImporter (which occurs only in some unittests).

References
==========

.. [1] PEP 302, New Import Hooks, J van Rossum, Moore
   (http://www.python.org/dev/peps/pep-0302)

.. [2] __import__() builtin function, The Python Standard Library documentation
   (http://docs.python.org/library/functions.html#__import__)

.. [3] Importlib documentation, Cannon
   (http://docs.python.org/dev/library/importlib)


Copyright
=========

This document has been placed in the public domain.


Best regards,
Greg

From pje at telecommunity.com  Sun Jul 31 16:51:30 2011
From: pje at telecommunity.com (P.J. Eby)
Date: Sun, 31 Jul 2011 10:51:30 -0400
Subject: [Import-SIG] New PEP Draft: Import Engine
In-Reply-To: <CAGY-8BJHJZGJh8QWMZ+i25wLE4YamU1p3S5fGotTv371gM-kgw@mail.gmail.com>
References: <CAGY-8BJHJZGJh8QWMZ+i25wLE4YamU1p3S5fGotTv371gM-kgw@mail.gmail.com>
Message-ID: <20110731145241.AEC433A409B@sparrow.telecommunity.com>

At 02:05 PM 7/31/2011 +0200, Greg Slodkowicz wrote:
>Necessary changes to finder/loader interfaces:
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>``find_module`` (cls, fullname, path=None, **engine=None**)
>
>``load_module`` (cls, fullname, path=None, **engine=None**)
>
>The only difference between 'new style' and PEP 302 compatible
>finders/loaders is the presence of an additional ``engine`` parameter.
>This is intended to specify an ImportEngine instance or subclass
>thereof.  This parameter is optional so that the 'new style' finders and
>loaders can be made backwards compatible by falling back on
>engine.sysengine with the following simple pattern:

I see how you can make new style loaders callable from the old 
system, but how do you make *old* loaders usable from the *new* 
system?  That is, I don't see how this proposal is backwards 
compatible with PEP 302.

For that, I think you'd have to define new, optional method names for 
the methods that accepted an engine parameter, with the engine 
falling back to calling the PEP 302 names if the new ones weren't available.
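Something along these lines, as a sketch only. The method name ``find_module_with_engine`` and the helper ``call_finder`` are invented here for illustration and are not proposed spellings:

```python
def call_finder(finder, fullname, path=None, engine=None):
    """Prefer an engine-aware method; fall back to the PEP 302 name."""
    engine_aware = getattr(finder, "find_module_with_engine", None)
    if engine_aware is not None:
        return engine_aware(fullname, path, engine=engine)
    # Plain PEP 302 finder: call it exactly as it is called today.
    return finder.find_module(fullname, path)


class LegacyFinder:
    """Unmodified PEP 302 finder, unaware of engines."""

    def find_module(self, fullname, path=None):
        return "legacy:" + fullname


class EngineAwareFinder(LegacyFinder):
    """Adds the optional method while keeping the PEP 302 one."""

    def find_module_with_engine(self, fullname, path=None, engine=None):
        return "engine:" + fullname


legacy_result = call_finder(LegacyFinder(), "spam")
aware_result = call_finder(EngineAwareFinder(), "spam", engine=object())
```

That way existing finders keep working under the new engine, and new finders keep working under the old machinery.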


>The existing importlib implementation depends on several functions
>from ``imp``, Python's builtin implementation of ``__import__``
>located in *Python/import.c*. These functions are unaware of
>ImportEngine and place the newly imported module in ``sys.modules``.
>Naturally, this is a problem from the ImportEngine point of view.

It's a general backwards compatibility problem, since importers in 
general are able to assume (and often do) that the loaded modules 
will be placed in sys.modules.


>Similarly, ``imp.NullImporter`` implements a ``load_module`` method
>which is incompatible with 'new style' loaders.

Again, if you use PEP 302 methods only as compatibility fallbacks, 
this won't be an issue.

The biggest problem I see with this as a PEP is that there isn't any 
discussion of backwards compatibility, in the sense that the PEP is 
all about how things *aren't* going to be backwards compatible, and 
the Rationale doesn't present any specific use cases that would 
justify the created incompatibilities.

It would be much better if you can reframe your proposal in terms of 
*additions* to the PEP 302 protocol, rather than *changes*.