[Import-SIG] PEP 420 issue: extend_path

Wed May 9 16:33:32 CEST 2012

On Wed, May 9, 2012 at 4:19 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Wed, May 9, 2012 at 4:48 PM,  <martin at v.loewis.de> wrote:
> >>> I'm not sure why we *need* a list of portions, but if we do, simple
> >>> return values seem like the way to go.  But the 2-element tuple wins
> >>> even in the single path portion case, and the tuple-return protocol is
> >>> extensible if we need more data returned in future anyway.
> >>
> >>
> >> Nick laid out a use case in a previous email. It makes sense to me. For
> >> example, a zip file could contain multiple portions from the same
> >> namespace package. You'd need a new path hook or mods to zipimport, but
> >> it's conceivable.
> >
> >
> > I must have missed Nick's message where he explained it, so I still need
> > to ask again: how exactly would such a zip file be structured?
> >
> > I fail to see the need to ever report both a loader and a portion,
> > as well as the need to report multiple portions, for a single sys.path
> > item. That sounds like an unnecessary complication.
>
> My actual objection is the same as Antoine's: that needing to
> introspect the result of find_loader() to handle the PEP 420 use case
> is a code smell that suggests the API design is flawed. The problem I
> had with it was that find_loader() needs to report on 3 different
> scenarios:
>
> 1. I am providing a loader to fully load this module, stop scanning
> the path hooks
> 2. I am contributing to a potential namespace package, keep scanning
> the path hooks
> 3. I have nothing to provide for that name, keep scanning the path hooks.
>
> Using the type of the return value (or whether or not it has a
> "load_module" attribute) to decide between scenario 1 and 2 just feels
> wrong.
>
> My proposed alternative was to treat the "portion_found" event as a
> callback rather than as something to be handled via the return value.
> Then loaders would be free to report as many portions as they wished,
> with the final "continue scanning or not" decision handled via the
> existing "loader or None" semantics.
>
> The example I happened to use to illustrate the difference was one
> where a loader actually internally implements its *own* path scan of
> multiple locations. I wasn't specifically thinking of zipfiles, but
> you could certainly use it that way. The core concept was that a
> single entry on the main path would be handed off to a finder that
> actually knew about *multiple* code locations, and hence may want to
> report multiple path portions.
>
> The 3 scenarios above would then correspond to:
>
> 1. Loader was returned (doesn't matter if callback was invoked)
> 2. None was returned, callback was invoked one or more times
> 3. None was returned, callback was never invoked
>
> Eric's counter-proposal is to handle the 3 scenarios as:
>
> 1. (<loader>, <don't care>)
> 2. (None, [<path entries>])
> 3. (None, [])
>
> Yet another option would be to pass a namespace_path list object
> directly into the find_loader() call, instead of passing
> namespace_path.append as a callback. Then the loader would append any
> portions it finds directly to the list, with the return value again
> left as the simple choice between a loader or None.
>

IOW a path accumulator where a loader can say "I can't find anything, but
these came *damn* close to working". Seems reasonable to me.
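
A minimal sketch of that accumulator variant, to make the three scenarios
concrete. DemoFinder, scan() and the two-argument find_loader() signature
are hypothetical names for illustration only, not an agreed API:

```python
class DemoFinder:
    """Hypothetical finder that knows about several code locations."""

    def __init__(self, modules, portions):
        self.modules = modules    # full module name -> loader object
        self.portions = portions  # package name -> list of portion paths

    def find_loader(self, fullname, namespace_path):
        loader = self.modules.get(fullname)
        if loader is not None:
            # Scenario 1: a real loader, the caller stops scanning.
            return loader
        # Scenario 2 (or 3 if nothing is known): append zero or more portions.
        namespace_path.extend(self.portions.get(fullname, []))
        return None


def scan(finders, fullname):
    """Return (loader, portions); portions are discarded if a loader wins."""
    namespace_path = []
    for finder in finders:
        loader = finder.find_loader(fullname, namespace_path)
        if loader is not None:
            return loader, []
    return None, namespace_path
```

The return value stays "loader or None", and an empty namespace_path after
the scan is what distinguishes scenario 3 from scenario 2.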

>
> One final option would be to add an optional "extend_namespace" method to
> *loader* objects. Then the logic would become, instead of type
> introspection, more like the following:
>
>    loader = find_loader(fullpath)
>    try:
>        extend_namespace = loader.extend_namespace
>    except AttributeError:
>        pass
>    else:
>        if extend_namespace(namespace_path):
>            # The loader contributed to the namespace package rather
>            # than loading the full module
>            continue
>    if loader is not None:
>        return loader
>

How does this avoid an unnecessary stat call on the directory (or in this
case namespace_path)? One of the reasons to have the detection of a
namespace directory in the finder was to avoid doing an extra stat call for
something the finder already noticed. You can obviously cache, but even
then you will need to do a stat call to verify the cache is not out of date
(unless we explicitly state that the extend_namespace() call is *only* made
immediately after find_loader/find_module and so the possible race
condition is small enough to ignore).
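
To illustrate the cost I mean, here is a rough sketch of a directory
cache that is revalidated by mtime. DirCache and classify() are
hypothetical helpers, not existing import machinery, and the ".py" check
stands in for real suffix handling:

```python
import os


class DirCache:
    """Hypothetical cache of one directory's entries, revalidated by mtime."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._entries = frozenset()

    def entries(self):
        # This stat call is the price of trusting the cache at all.
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            self._entries = frozenset(os.listdir(self.path))
            self._mtime = mtime
        return self._entries


def classify(cache, name):
    """Decide module vs. possible portion from a single cached listing."""
    entries = cache.entries()
    if name + ".py" in entries:
        return "module"
    if name in entries:
        # Assumption: a bare entry is a directory. Confirming that would
        # take yet another stat call.
        return "portion"
    return None
```

If the finder reports the portion in the same pass, one entries() call
covers both answers; a separate extend_namespace() call made later would
have to pay for the revalidating stat a second time.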

-Brett

>
> It's definitely the switch-statement feel of the proposed type checks
> that rubs me the wrong way, though. Supporting multiple portions from
> a single loader was just the most straightforward example I could
> think of for a limitation imposed by that mechanism.
>
> Regards,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> http://mail.python.org/mailman/listinfo/import-sig
>