[Python-Dev] Draft PEP: "Simplified Package Layout and Partitioning"

P.J. Eby pje at telecommunity.com
Thu Jul 21 01:03:58 CEST 2011

At 03:09 PM 7/20/2011 -0700, Glenn Linderman wrote:
>On 7/20/2011 6:05 AM, P.J. Eby wrote:
>>At 02:24 AM 7/20/2011 -0700, Glenn Linderman wrote:
>>>When I read about creating __path__ from sys.path, I immediately 
>>>thought of the issue of programs that extend sys.path, and the 
>>>above is the "workaround" for such programs.  But it requires 
>>>such programs to do work, and there are a lot of such programs (I, 
>>>a relative newbie, have had to write some).  As it turns out, I 
>>>can't think of a situation where I have extended sys.path that 
>>>would result in a problem for fancy namespace packages, because so 
>>>far I've only written modules, not packages, and only modules are 
>>>on the paths that I add to sys.path.  But that does not make 
>>>for a general solution.
>>Most programs extend sys.path in order to import things.  If those 
>>things aren't yet imported, they don't have a __path__ yet, and so 
>>don't need to be fixed.  Only programs that modify sys.path 
>>*after* importing something that has a dynamic __path__ would need 
>>to do anything about that.
>Sure.  But there are a lot of things already imported by Python 
>itself, and if this mechanism gets used in the stdlib, a program 
>wouldn't know whether it is safe to skip the 
>pkgutil.extend_virtual_paths() call.

I'm not sure I see how the mechanism could meaningfully be used in 
the stdlib, since IIUC we're not going for Perl-style package 
naming.  So, all stdlib packages would be self-contained.

>Plus, that requires importing pkgutil, which isn't necessarily done 
>by every program that extends the sys.path ("import sys" is 
>sufficient at present).
>Plus, if some 3rd party packages are imported before sys.path is 
>extended, the knowledge of how they are implemented is required to 
>make a choice about whether it is needed to import pkgutil and call 
>extend_virtual_paths or not.

I'd recommend *always* using it, outside of simple startup code.

>So I am still left with my original question:
>>>Is there some way to create a new __path__ that would reflect the 
>>>fact that it has been dynamically created, rather than set from 
>>>__init__.py, and then when it is referenced, calculate (and 
>>>cache?) a new value of __path__ to actually search?

Hm.  Yes, there is a way to do something like that, but it would 
complicate things a bit.  We'd need to:

1. Leave __path__ off of the modules, and always pull their paths from 
sys.virtual_package_paths, and

2. Before using a value in sys.virtual_package_paths, we'd need to 
check whether sys.path had changed since we last cached anything, and 
if so, clear sys.virtual_package_paths first, to force a refresh.
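
A rough sketch of those two steps, under stated assumptions: the 
module-level dict below is a hypothetical stand-in for the draft's 
sys.virtual_package_paths (no such attribute exists on sys today), 
and compute_virtual_path() and get_virtual_path() are names invented 
for illustration.  For simplicity it scans sys.path directly even for 
dotted names, which is not how the draft treats subpackages.

```python
import os
import sys

# Hypothetical stand-in for the draft's sys.virtual_package_paths;
# a plain module-level dict, not a real attribute of sys.
virtual_package_paths = {}
_last_sys_path = None  # snapshot of sys.path when the cache was last valid


def compute_virtual_path(name, search_path):
    """Return the existing subdirectories of search_path entries for `name`."""
    parts = name.split(".")
    return [
        os.path.join(entry, *parts)
        for entry in search_path
        if os.path.isdir(os.path.join(entry, *parts))
    ]


def get_virtual_path(name):
    """Look up a package's path, refreshing the cache if sys.path changed."""
    global _last_sys_path
    current = tuple(sys.path)
    if current != _last_sys_path:
        # Step 2: sys.path changed since we last cached, so force a refresh.
        virtual_package_paths.clear()
        _last_sys_path = current
    if name not in virtual_package_paths:
        virtual_package_paths[name] = compute_virtual_path(name, sys.path)
    # Step 1: callers always read from the cache, never from module.__path__.
    return virtual_package_paths[name]
```

Comparing a snapshot tuple catches in-place mutation and rebinding of 
sys.path alike, at the cost of one comparison per lookup.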

This doesn't sound particularly forbidding, but there are various 
unpleasant consequences, like being unable to tell whether a module 
is a package or not, and whether it's a virtual package or not.  We'd 
have to invent new ways to denote these things.

On the bright side, though, it *would* allow transparent live updates 
to virtual package paths, so it might be worth considering.

By the way, the reason we have to get rid of __path__ is that if we 
kept it, then code could change it, and then we wouldn't know if it 
was actually safe to change it automatically...  even if no code had 
actually changed it.

In principle, we could keep __path__ attributes around, and 
automatically update them in the case where sys.path has changed, so 
long as user code hasn't directly altered or replaced the 
__path__.  But it seems to me to be a dangerous corner case; I'd 
rather that code which touches __path__ be taking responsibility for 
that path's correctness from then on, rather than having it get 
updated (possibly incorrectly) behind its back.
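
To make the "keep __path__ but update it only if untouched" variant 
concrete, here is a minimal sketch.  refresh_if_untouched() is a 
hypothetical helper, not part of the draft, and the directory names 
are made up.

```python
import types


def refresh_if_untouched(module, last_generated, new_path):
    """Replace module.__path__ only if it still equals what we generated.

    Returns True when the update was applied.
    """
    if getattr(module, "__path__", None) == last_generated:
        module.__path__ = list(new_path)
        return True
    return False


pkg = types.ModuleType("foo")
generated = ["/site-a/foo"]  # what the import system computed (made up)
pkg.__path__ = list(generated)

# sys.path grew and nobody touched __path__: the refresh goes through.
refresh_if_untouched(pkg, generated, ["/site-a/foo", "/site-b/foo"])

# User code now takes over the path; later refreshes leave it alone.
pkg.__path__.append("/custom/foo")
refresh_if_untouched(pkg, ["/site-a/foo", "/site-b/foo"], ["/site-c/foo"])
```

An equality test like this cannot tell "never touched" apart from 
"deliberately set to the same value", which is the corner case in 
question.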

So, I'd say that for this approach, we'd have to actually leave 
__path__ off of virtual packages' parent modules.

Anyway, it seems worth considering.  We just need to sort out what 
the downsides are for any current tools thinking that such modules 
aren't packages.  (But hey, at least it'll be consistent with what 
such tools would think of the on-disk representation!  That is, a 
tool that thinks foo.py alongside a foo/ subdirectory is just a 
module with no package, will also think that 'foo', once imported, is 
a module with no package.)

>And, in the absence of knowing (because I didn't write them) whether 
>any of the packages I imported before extending sys.path are virtual 
>packages or not, I would have to do this every time I extend 
>sys.path.  And so it becomes a burden on writing programs.
>If the code is so boilerplate as you describe, should sys.path 
>become an object that acts like a list, instead of a list, and have 
>its append method automatically do the pkgutil.extend_virtual_paths 
>for me?  Then I wouldn't have to worry about whether any of the 
>packages I imported were virtual packages or not.

Well, then we'd have to worry about other mutation methods, and 
things like 'sys.path = [blah, blah]', as well.  So if we're going to 
ditch the need for extend_virtual_paths(), we should probably do it 
via the absence of __path__ attributes.
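
The gaps in a list-like sys.path are easy to demonstrate.  In this 
sketch, NotifyingPath is a hypothetical class that hooks only 
append(), and fake_sys is a stand-in namespace so that the real 
sys.path is left alone.

```python
import types


class NotifyingPath(list):
    """Hypothetical list subclass that reports appended entries."""

    def __init__(self, iterable=(), on_append=None):
        super().__init__(iterable)
        self.on_append = on_append

    def append(self, entry):
        super().append(entry)
        if self.on_append is not None:
            self.on_append(entry)


seen = []
path = NotifyingPath(["a"], on_append=seen.append)
path.append("b")     # hooked: the callback runs
path.insert(0, "c")  # not hooked
path.extend(["d"])   # not hooked
path += ["e"]        # not hooked

# Rebinding replaces the special object with a plain list entirely.
fake_sys = types.SimpleNamespace(path=path)
fake_sys.path = ["blah", "blah"]
```

Only the append() call reaches the callback, so every other mutation 
method would need the same override, and plain assignment bypasses the 
object altogether.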
