[Import-SIG] What if namespace imports weren't special?

Mon Jul 11 06:49:17 CEST 2011

On Sun, Jul 10, 2011 at 21:39, P.J. Eby <pje at telecommunity.com> wrote:

> At 11:57 PM 7/10/2011 -0400, P.J. Eby wrote:
>
>> At 01:16 PM 7/11/2011 +1000, Nick Coghlan wrote:
>>
>>> There's also a performance impact on app startup time - currently most
>>> package imports stop as soon as they hit a matching directory. Under a
>>> "partitioned by default" scheme, all package imports (including things
>>> like "logging" and "email" which currently get a hit in the first zip
>>> file for the standard library) would have to scan the entirety of
>>> sys.path just in case there are additional shards lying around. For
>>> large applications, that additional overhead is going to add up.
>>>
>>
>> Darn, I missed that.  That kills the idea pretty much dead right there, as
>> it means ALL imports are massively slowed down.  Crap.
>>
>
> Hrm.  I just realized WHY I missed it.  I was thinking that we'd only do
> that in the case where you *first* find a namespace.  IOW, I was proposing
> to only change the semantics in the case where a suitable directory is found
> on sys.path *before* the normal package or module.  IOW, the semantics I was
> thinking of were:
>
>  * Scan sys.path, keeping track of any subpaths found
>  * If you hit a module with no subpaths found before it, import and finish
>  * Otherwise, if you hit a subpath first, accumulate all subpaths and tack
> them on a module or package
>  * If the matching module was a package __init__, move its subpath to the
> beginning of the list
>
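
The four rules quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from any importer: the name scan_for is hypothetical, a "subpath" is taken to mean any directory named after the module, and only plain .py files and __init__.py packages are considered.

```python
import os

def scan_for(name, search_path):
    """Return (module_file, subpaths) for `name` under the quoted rules."""
    subpaths = []        # every directory called `name` seen so far
    module_file = None   # first matching module or package __init__
    is_package = False
    for entry in search_path:
        directory = os.path.join(entry, name)
        init_file = os.path.join(directory, "__init__.py")
        plain_file = os.path.join(entry, name + ".py")
        if module_file is None:
            if os.path.isfile(init_file):
                module_file, is_package = init_file, True
            elif os.path.isfile(plain_file):
                module_file = plain_file
            if module_file is not None and not subpaths:
                # Rule 2: a module with no subpaths before it; stop here.
                return module_file, ([directory] if is_package else [])
        # Rule 3: a subpath came first, so keep accumulating to the end.
        if os.path.isdir(directory):
            subpaths.append(directory)
    if is_package:
        # Rule 4: the package's own directory moves to the front.
        package_dir = os.path.dirname(module_file)
        subpaths.remove(package_dir)
        subpaths.insert(0, package_dir)
    return module_file, subpaths
```

Note that once a bare directory is seen first, the loop always runs to the end of the path, which is how a later sys.path entry ends up affecting an earlier one.
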
> But I agree that it's an uphill climb to sell this approach.  For example,
> it means that you can have code later on sys.path affect code that's
> earlier, which seems wrong and a tad unsafe.
>
> I wish we had a way to do this that didn't require special files, and still
> allowed us to have package names be plain directory names, and didn't break
> distutils installation processes.  (Distutils can install submodules without
> a package __init__ being included, but apart from that it forces installed
> directory structure to match package name structure.)
>
> Okay, I have an idea.
>
> Suppose that we reserve a special directory name, like 'py-pkg'.  And, if a
> sys.path directory contains a 'py-pkg' subdirectory, then any directory in
> that directory (recursively) is a package following __path__-assembly
> semantics.
>
> So, in order to enable new import semantics, you have to install your code
> to a 'py-pkg' directory under a regular sys.path directory...  that's the
> only catch.
>
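
A minimal sketch of the __path__ assembly this would imply, assuming the reserved name is 'py-pkg' and that every path entry is a plain filesystem directory. The function name assemble_path is hypothetical and is not part of any existing importer.

```python
import os

PKG_DIR = "py-pkg"  # reserved directory name from the proposal

def assemble_path(fullname, search_path):
    """Collect <entry>/py-pkg/<a>/<b>/... for each entry that has it."""
    parts = fullname.split(".")
    result = []
    for entry in search_path:
        candidate = os.path.join(entry, PKG_DIR, *parts)
        # One stat-style check per entry; no directory listing is needed.
        if os.path.isdir(candidate):
            result.append(candidate)
    return result
```

For instance, assemble_path("peak.util", sys.path) would return one directory for each sys.path entry that holds a py-pkg/peak/util subdirectory, in sys.path order.
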
> *However*, because the distutils actually let you install packages without
> __init__ modules, you can trick them into installing your otherwise-normal
> package this way, by the simple expedient of telling the distutils your
> package name is 'py-pkg.foo' instead of 'foo'.
>
> (Note: this is only a hack for 2.x, and setuptools will probably be doing
> the dirty work of making distutils do this anyway "under the hood".  For
> 3.x, we can hopefully assume that the 'packaging' folks will enable doing
> this in a somewhat saner way.)
>
> Anyway, revising the ongoing example to add the directory and drop the flag
> files, we get:
>
>    ProxyTypes-0.9.tgz:
>        py-pkg/peak/util/proxies.py
>
>    Importing-1.10.tgz:
>        py-pkg/peak/util/imports.py
>
> or (combined):
>
>    site-packages/   (or wherever)
>        py-pkg/
>            peak/
>                util/
>                    imports.py
>                    proxies.py
>            zope/
>            ...
>
> This approach solves several problems at once:
>
>  1. No flag files
>  2. Faster imports (stat instead of listdir)
>  3. Directory clearly identified as containing python packages
>  4. No need for a special name, these are just regular packages with
> enhanced import semantics
>  5. Distutils can still install it
>
> Minor downsides:
>
>  * Flat is better than nested
>  * Existing code has to move to take advantage (unless you're not going to
> import the code without installing it, in which case you can just tweak your
> setup.py and not actually move anything)
>

I prefer going with a specifically named file, if for no other reason than
that fewer tools will break. By shifting everything into a subdirectory
you prevent any pre-existing code that scans sys.path from finding those
packages. But with the special file approach you don't break those tools in
the case where there is no package fragment farther down sys.path. Plus
you can also use a single specially named file, instead of allowing any file
name with a specific file ending, to achieve the same result (e.g., py.pkg or
__init__.part).
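
To illustrate the tool-breakage concern, here is a sketch of the kind of naive scanner that exists in the wild. The function top_level_packages is a hypothetical stand-in, not any specific tool: it only looks one level below each path entry for an __init__.py.

```python
import os

def top_level_packages(search_path):
    """List directories directly under each entry that hold __init__.py."""
    found = set()
    for entry in search_path:
        if not os.path.isdir(entry):
            continue
        for child in os.listdir(entry):
            if os.path.isfile(os.path.join(entry, child, "__init__.py")):
                found.add(child)
    return sorted(found)
```

A scanner like this reports 'peak' for a conventional site-packages/peak/__init__.py layout, but reports nothing when the same code lives under site-packages/py-pkg/peak/.
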