[Import-SIG] What if namespace imports weren't special?
P.J. Eby
pje at telecommunity.com
Mon Jul 11 06:39:04 CEST 2011
At 11:57 PM 7/10/2011 -0400, P.J. Eby wrote:
>At 01:16 PM 7/11/2011 +1000, Nick Coghlan wrote:
>>There's also a performance impact on app startup time - currently most
>>package imports stop as soon as they hit a matching directory. Under a
>>"partitioned by default" scheme, all package imports (including things
>>like "logging" and "email" which currently get a hit in the first zip
>>file for the standard library) would have to scan the entirety of
>>sys.path just in case there are additional shards lying around. For
>>large applications, that additional overhead is going to add up.
>
>Darn, I missed that. That kills the idea pretty much dead right
>there, as it means ALL imports are massively slowed down. Crap.
Hrm. I just realized WHY I missed it. I was thinking that we'd only
do that in the case where you *first* find a namespace. IOW, I was
proposing to only change the semantics in the case where a suitable
directory is found on sys.path *before* the normal package or
module. IOW, the semantics I was thinking of were:
* Scan sys.path, keeping track of any subpaths found
* If you hit a module with no subpaths found before it, import and finish
* Otherwise, if you hit a subpath first, accumulate all subpaths
  and tack them on a module or package
* If the matching module was a package __init__, move its subpath
  to the beginning of the list
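A rough sketch of those four steps, for top-level names only. The function name scan_for and its return shape are made up for illustration, and it only locates files; it does not actually import anything or hook into the real import machinery:

```python
import os


def scan_for(name, search_path):
    """Return (module_file, subpaths) for a top-level name.

    Hypothetical helper, not a real import hook.
    module_file: the first name.py or name/__init__.py found, or None.
    subpaths: the directories that would end up on the module's __path__.
    """
    subpaths = []
    module_file = None
    init_dir = None

    for entry in search_path:
        candidate = os.path.join(entry, name)
        is_dir = os.path.isdir(candidate)
        init_py = os.path.join(candidate, "__init__.py")
        plain_py = candidate + ".py"

        if module_file is None:
            if is_dir and os.path.isfile(init_py):
                module_file, init_dir = init_py, candidate
            elif os.path.isfile(plain_py):
                module_file = plain_py
            if module_file is not None and not subpaths:
                # Ordinary hit with no subpath seen before it:
                # stop scanning, same as imports work today.
                return module_file, ([init_dir] if init_dir else [])

        if is_dir:
            # A subpath was found, so keep scanning the whole path.
            subpaths.append(candidate)

    if init_dir is not None:
        # The package __init__'s own directory moves to the front.
        subpaths.remove(init_dir)
        subpaths.insert(0, init_dir)
    return module_file, subpaths
```

With a bare peak/ directory early on the path and a peak/__init__.py later, the scan continues to the end and returns every peak/ directory, with the one holding __init__.py first. With the __init__.py entry first, it returns immediately and never looks further.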
But I agree that it's an upward climb to sell this approach. For
example, it means that you can have code later on sys.path affect
code that's earlier, which seems wrong and a tad unsafe.
I wish we had a way to do this that didn't require special files, and
still allowed us to have package names be plain directory names, and
didn't break distutils installation processes. (Distutils can
install submodules without a package __init__ being included, but
apart from that it forces installed directory structure to match
package name structure.)
Okay, I have an idea.
Suppose that we reserve a special directory name, like 'py-pkg'. And,
if a sys.path directory contains a 'py-pkg' subdirectory, then any
directory in that directory (recursively) is a package following
__path__-assembly semantics.
So, in order to enable new import semantics, you have to install your
code to a 'py-pkg' directory under a regular sys.path
directory... that's the only catch.
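As a minimal sketch of what the __path__ assembly could look like under this scheme: the helper name assemble_path is hypothetical, and this only computes the list of directories rather than plugging into the import system.

```python
import os

PKG_DIR = "py-pkg"  # the reserved directory name proposed above


def assemble_path(package, search_path):
    """Collect every py-pkg/<package> directory along search_path.

    Hypothetical helper: `package` is a dotted name such as "peak.util".
    """
    parts = package.split(".")
    found = []
    for entry in search_path:
        candidate = os.path.join(entry, PKG_DIR, *parts)
        # One isdir() check (a stat) per path entry; no listdir needed.
        if os.path.isdir(candidate):
            found.append(candidate)
    return found
```

Something like assemble_path("peak.util", sys.path) would then give the list to use as the package's __path__, one entry per sys.path directory that has a py-pkg/peak/util under it.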
*However*, because the distutils actually let you install packages
without __init__ modules, you can trick them into installing your
otherwise-normal package this way, by the simple expedient of telling
the distutils your package name is 'py-pkg.foo' instead of 'foo'.
(Note: this is only a hack for 2.x, and setuptools will probably be
doing the dirty work of making distutils do this anyway "under the
hood". For 3.x, we can hopefully assume that the 'packaging' folks
will enable doing this in a somewhat saner way.)
Anyway, revising the ongoing example to add the directory and drop
the flag files, we get:
ProxyTypes-0.9.tgz:
    py-pkg/peak/util/proxies.py
Importing-1.10.tgz:
    py-pkg/peak/util/imports.py
or (combined):
site-packages/ (or wherever)
    py-pkg/
        peak/
            util/
                imports.py
                proxies.py
        zope/
            ...
This approach solves several problems at once:
1. No flag files
2. Faster imports (stat instead of listdir)
3. Directory clearly identified as containing python packages
4. No need for a special name; these are just regular packages with
   enhanced import semantics
5. Distutils can still install it
Minor downsides:
* Flat is better than nested
* Existing code has to move to take advantage (unless you only ever
  import the code after installing it, in which case you can just
  tweak your setup.py and not actually move anything)
Thoughts?