[Import-SIG] What if namespace imports weren't special?

Mon Jul 11 06:39:04 CEST 2011

At 11:57 PM 7/10/2011 -0400, P.J. Eby wrote:
>At 01:16 PM 7/11/2011 +1000, Nick Coghlan wrote:
>>There's also a performance impact on app startup time - currently most
>>package imports stop as soon as they hit a matching directory. Under a
>>"partitioned by default" scheme, all package imports (including things
>>like "logging" and "email" which currently get a hit in the first zip
>>file for the standard library) would have to scan the entirety of
>>sys.path just in case there are additional shards lying around. For
>>large applications, that additional overhead is going to add up.
>
>Darn, I missed that.  That kills the idea pretty much dead right 
>there, as it means ALL imports are massively slowed down.  Crap.

Hrm.  I just realized WHY I missed it.  I was thinking that we'd only 
do that in the case where you *first* find a namespace.  IOW, I was 
proposing to only change the semantics in the case where a suitable 
directory is found on sys.path *before* the normal package or 
module.  IOW, the semantics I was thinking of were:

  * Scan sys.path, keeping track of any subpaths found
  * If you hit a module with no subpaths found before it, import and finish
  * Otherwise, if you hit a subpath first, accumulate all subpaths
    and tack them onto the module or package
  * If the matching module was a package __init__, move its subpath
    to the beginning of the list
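
The steps above could be sketched roughly as follows.  This is only an 
illustration, not code from any actual import implementation: the 
function name find_with_subpaths is made up, a "subpath" is assumed to 
mean a directory named after the module inside a sys.path entry, and 
only plain .py files and __init__.py packages are considered.

```python
import os


def find_with_subpaths(name, search_path):
    """Return (module_file, subpaths) for `name`, per the steps above.

    Hypothetical helper: it only locates files, it does not import them.
    """
    subpaths = []        # candidate __path__ entries, in sys.path order
    module_file = None   # first matching module or package __init__
    init_dir = None      # directory of that __init__, if it was a package

    for entry in search_path:
        pkg_dir = os.path.join(entry, name)
        init_py = os.path.join(pkg_dir, "__init__.py")
        mod_py = os.path.join(entry, name + ".py")

        if os.path.isfile(init_py):
            found, found_dir = init_py, pkg_dir
        elif os.path.isfile(mod_py):
            found, found_dir = mod_py, None
        else:
            found, found_dir = None, None

        if found is not None and module_file is None:
            if not subpaths:
                # Module hit before any subpath: ordinary import, stop here.
                return found, ([found_dir] if found_dir else None)
            module_file, init_dir = found, found_dir

        if os.path.isdir(pkg_dir):
            subpaths.append(pkg_dir)

    if init_dir is not None:
        # The package's own directory goes to the front of the list.
        subpaths.remove(init_dir)
        subpaths.insert(0, init_dir)
    return module_file, (subpaths or None)
```

Note that the early return is what keeps the common case cheap: the 
full scan of the path only happens when a bare directory turns up 
before the module does.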

But I agree that it's an uphill climb to sell this approach.  For 
example, it means that you can have code later on sys.path affect 
code that's earlier, which seems wrong and a tad unsafe.

I wish we had a way to do this that didn't require special files, and 
still allowed us to have package names be plain directory names, and 
didn't break distutils installation processes.  (Distutils can 
install submodules without a package __init__ being included, but 
apart from that it forces installed directory structure to match 
package name structure.)

Okay, I have an idea.

Suppose that we reserve a special directory name, like 'py-pkg'.  And, 
if a sys.path directory contains a 'py-pkg' subdirectory, then any 
directory in that directory (recursively) is a package following 
__path__-assembly semantics.

So, in order to enable new import semantics, you have to install your 
code to a 'py-pkg' directory under a regular sys.path 
directory...  that's the only catch.
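
As a rough sketch of what the lookup might look like (the function name 
assemble_package_path is hypothetical, and this only computes a 
candidate __path__ without hooking into the import machinery), each 
sys.path entry needs just one directory check under its 'py-pkg' 
subdirectory:

```python
import os

MARKER = "py-pkg"  # the reserved directory name proposed above


def assemble_package_path(fullname, search_path):
    """Collect every <entry>/py-pkg/<pkg>/<subpkg>/... directory that exists.

    Hypothetical helper: the result is what a package's __path__ could be
    under the proposed __path__-assembly semantics.
    """
    parts = fullname.split(".")
    result = []
    for entry in search_path:
        candidate = os.path.join(entry, MARKER, *parts)
        # One stat-style check per entry; no directory listing required.
        if os.path.isdir(candidate):
            result.append(candidate)
    return result
```

Entries without a 'py-pkg' subdirectory simply contribute nothing, so 
directories laid out the old way are not affected by this lookup.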

*However*, because the distutils actually let you install packages 
without __init__ modules, you can trick them into installing your 
otherwise-normal package this way, by the simple expedient of telling 
the distutils your package name is 'py-pkg.foo' instead of 'foo'.

(Note: this is only a hack for 2.x, and setuptools will probably be 
doing the dirty work of making distutils do this anyway "under the 
hood".  For 3.x, we can hopefully assume that the 'packaging' folks 
will enable doing this in a somewhat saner way.)

Anyway, revising the ongoing example to add the directory and drop 
the flag files, we get:

     ProxyTypes-0.9.tgz:
         py-pkg/peak/util/proxies.py

     Importing-1.10.tgz:
         py-pkg/peak/util/imports.py

or (combined):

     site-packages/   (or wherever)
         py-pkg/
             peak/
                 util/
                     imports.py
                     proxies.py
             zope/
             ...

This approach solves several problems at once:

  1. No flag files
  2. Faster imports (stat instead of listdir)
  3. Directory clearly identified as containing python packages
  4. No need for special package names; these are just regular 
packages with enhanced import semantics
  5. Distutils can still install it

Minor downsides:

  * Flat is better than nested
  * Existing code has to move to take advantage (unless you're not 
going to import the code without installing it, in which case you can 
just tweak your setup.py and not actually move anything)

Thoughts?