[Distutils] Bilingual namespace package conundrum

Robert Collins robertc at robertcollins.net
Tue Jan 13 02:39:45 CET 2015


On 2 January 2015 at 08:19, Barry Warsaw <barry at python.org> wrote:
> I hope the following makes sense; I've been a little under the weather. ;)
> Apologies in advance for the long data-dump, but I wanted to provide as
> complete information as possible.

Welcome to hell :)

> I have ported Mailman 3 to Python 3.4[*].  While working on various
> development tasks, I noticed something rather strange regarding
> (pseudo-)namespace packages which MM3 is dependent on.  These libraries work
> for both Python 2 and 3, but of course namespace packages work completely
> differently in those two versions (as of Python 3.3, i.e. PEP 420).  I've been
> thinking about how such bilingual packages can play better with PEP 420 when
> running under a compatible Python 3 version, and still *seem* to work well in
> Python 2.  More on that later.
>
> It turns out this isn't actually my problem.  Here's what's going wrong.
>
> My py3 branch has a fairly straightforward tox.ini.  When I run `tox -e py34`
> it builds the hidden .tox/py34 virtualenv, installs all the dependencies into
> this, runs the test suite, and all is happy.  Since I do most of my
> development this way, I never noticed any problems.
>
> But then I wanted to work with the code in a slightly different way, so I
> created a virtual environment, activated it, and ran `python setup.py develop`
> which appeared to work fine.  The problem though is that some packages are not
> importable by the venv's python.  Let's concentrate on three of these
> dependencies, lazr.config, lazr.delegates, and lazr.smtptest.  All three give
> ImportErrors when trying to import them from the interactive prompt of the
> venv's python.
>
> I tried something different to see if I could narrow down the problem, so I
> created another venv, activated it, and ran `pip install lazr.config
> lazr.delegates lazr.smtptest`.  Firing up the venv's interactive python, I can
> now import all three of them just fine.
>
> So this tells me that `python setup.py develop` is broken, or at the very
> least behaves differently than the `pip install` route.  Actually, `python
> setup.py install` exhibits the same problem.
>
> Just to be sure Debian's Python 3.4 package isn't introducing any weirdnesses,
> all of this work was done with an up-to-date build of Python from hg, using
> the 3.4 branch, and the virtualenvs were created with `./python -m venv`.
>
> Let's look more closely at part of the site-packages directories of the two
> virtualenvs.  The one that fails looks like this:
>
>   drwxrwxr-x  [...] lazr.config-2.0.1-py3.4.egg
>   drwxrwxr-x  [...] lazr.delegates-2.0.1-py3.4.egg
>   drwxrwxr-x  [...] lazr.smtptest-2.0.2-py3.4.egg
>
> All three directories exhibit the same pattern:
>
> lazr.config-2.0.1-py3.4.egg
>   EGG-INFO
>       namespace_packages.txt (contains one line saying "lazr")
>   lazr
>       config
>       __init__.py  (contains the pseudo-namespace boilerplate code[+])
>       __pycache__
>
> [+] that boilerplate code looks like:
>
> -----snip snip-----
> # This is a namespace package, however under >= Python 3.3, let it be a true
> # namespace package (i.e. this cruft isn't necessary).
> import sys
>
> if sys.hexversion < 0x30300f0:
>     try:
>         import pkg_resources
>         pkg_resources.declare_namespace(__name__)
>     except ImportError:
>         import pkgutil
>         __path__ = pkgutil.extend_path(__path__, __name__)
> -----snip snip-----
>
> but of course, this doesn't really play nicely with PEP 420 because it's the
> mere existence of the __init__.py in the namespace package that prevents PEP
> 420 from getting invoked.  As an aside, Debian's packaging tools will actually
> *remove* this __init__.py for declared namespace packages in Python 3, so
> under Debian-land, things do work properly.  The upstream tools have no such
> provision, so it's clear why we're getting the ImportErrors.  These three
> packages do not share a PEP 420-style namespace, and the sys.hexversion guard
> prevents the old-style pseudo-namespace hack from working.
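
A quick way to see which flavour of 'lazr' the interpreter actually
resolved (a minimal sketch, assuming the broken venv above is active;
the comments describe what I'd expect to see, not captured output):

-----snip snip-----
# Inspect how 'lazr' resolves.  With an __init__.py present in one of the
# egg directories, the spec is a regular package pinned to that single
# directory, so portions living in the other eggs are invisible.  A true
# PEP 420 namespace package would instead report a None/'namespace' origin
# and list every matching directory on sys.path.
import importlib.util

spec = importlib.util.find_spec("lazr")
print(spec.origin)                            # path to that lazr/__init__.py
print(list(spec.submodule_search_locations))  # just that one egg's lazr/ dir
-----snip snip-----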
>
> This could potentially be fixable in the upstream packages, but again, more on
> that later.
>
> Now let's look at the working venvs:
>
>   drwxrwxr-x  [...] lazr
>   drwxrwxr-x  [...] lazr.config-2.0.1-py3.4.egg-info
>   -rw-rw-r--  [...] lazr.config-2.0.1-py3.4-nspkg.pth
>   drwxrwxr-x  [...] lazr.delegates-2.0.1-py3.4.egg-info
>   -rw-rw-r--  [...] lazr.delegates-2.0.1-py3.4-nspkg.pth
>   drwxrwxr-x  [...] lazr.smtptest-2.0.2-py3.4.egg-info
>   -rw-rw-r--  [...] lazr.smtptest-2.0.2-py3.4-nspkg.pth
>
>
> This looks quite different, and digging into the 'lazr' directory shows:
>
> lazr
>    config
>    delegates
>    smtptest
>
> No __init__.py in 'lazr', and the sub-package directories contain just the
> code for the appropriate library.  The egg-info directories look similar, but
> now we also have -nspkg.pth files which contain something like the following:
>
> -----snip snip-----
> import sys, types, os;p = os.path.join(sys._getframe(1).f_locals['sitedir'], *('lazr',));ie = os.path.exists(os.path.join(p,'__init__.py'));m = not ie and sys.modules.setdefault('lazr', types.ModuleType('lazr'));mp = (m or []) and m.__dict__.setdefault('__path__',[]);(p not in mp) and mp.append(p)
> -----snip snip-----
>
> which look to me a lot like they are setting up something akin to the PEP
> 420-ish virtual `lazr` top-level package.  It all makes sense too;
> lazr.{config,delegates,smtptest} all import from the expected locations, and
> importing 'lazr' gives you a stripped-down module object with no __file__.
....
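
Unrolled for readability, that -nspkg.pth one-liner is roughly the
following (my own expansion of the generated line, not setuptools
source):

-----snip snip-----
import sys, types, os

# The site dir currently being processed by site.py when the .pth runs.
sitedir = sys._getframe(1).f_locals['sitedir']
p = os.path.join(sitedir, 'lazr')

# Only act if there is no real lazr/__init__.py in this site dir.
has_init = os.path.exists(os.path.join(p, '__init__.py'))
if not has_init:
    # Create (or reuse) a bare 'lazr' module and append this portion's
    # directory to its __path__, giving a namespace-package-like holder.
    m = sys.modules.setdefault('lazr', types.ModuleType('lazr'))
    mp = m.__dict__.setdefault('__path__', [])
    if p not in mp:
        mp.append(p)
-----snip snip-----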
> Stepping back a moment, what is it I want?  I want `python setup.py develop`
> to work properly in both Python 2 and Python 3 virtualenvs.  I want clear
> rules we can give library authors so that they can be sure their bilingual
> namespace packages will work properly in all supported venv environments.

> How to best accomplish this?
>
> - Should distutils/setuptools remove the namespace package __init__.py files
>   automatically?  (Something like what the Debian packaging tools do.)
>
> - Should some other boilerplate be added to the namespace-package __init__.py
>   files in each portion?  Maybe the sys.hexversion guards should be removed so
>   that it acts the same way in both Python 2 and Python 3.
>
> - Should we introduce some other protocol in Python 3.5 that would allow such
>   bilingual namespace packages to actually *be* PEP 420 namespace packages in
>   Python 3?  I took a quick look at importlib and it seems tricky, since
>   it's the mere existence of the top-level __init__.py files that prevents
>   PEP 420 namespace packages.  So code put in the __init__.py won't get
>   executed until the top-level package is actually imported.  Any
>   trigger for "hey, treat me as a namespace package" would have to be file
>   system based, since even the namespace_packages.txt file may not exist.
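
Taking the second option literally, each portion's __init__.py would
just be the boilerplate quoted earlier with the version guard dropped,
something like:

-----snip snip-----
# Same pseudo-namespace boilerplate as above, minus the sys.hexversion
# guard, so Python 2 and Python 3 take the identical (non-PEP-420) path.
try:
    import pkg_resources
    pkg_resources.declare_namespace(__name__)
except ImportError:
    import pkgutil
    __path__ = pkgutil.extend_path(__path__, __name__)
-----snip snip-----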
>
> I'll stop here.  Happy new year and let the games begin.

So I dug down ridiculously deep into this when investigating a very
similar issue in the OpenStack space. I have an alternative solution
to the ones mentioned already: use importlib2 and patch the importer
(e.g. in a local site.py) to allow genuine namespace packages to work.
(It's nearly there - just needs a little more love, I think.)

From memory, this is the set of issues:
 - setuptools 'namespace' packages are actually -nspkg.pth files which
   directly alter the state of the interpreter to create a module.
 - they *don't* fix up the path in that module to deal with virtualenvs
   (in particular sys.base_prefix will differ from sys.prefix, and you
   need to put every path at which portions of the namespace can be
   found into the holder module's __path__).
 - genuine (PEP 420) namespace packages don't work with older Pythons
   at all.
 - Venn-diagram style split-outs (a real common package with an
   __init__) don't work in a venv when some modules are in the base OS
   and some in the virtualenv, because the path isn't adjusted...
   unless the common package takes care to fix up its own path, which
   needs to include the OS site dir, the virtualenv site dir, and the
   local working directory (for pip install -e . support); see the
   sketch below.
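
To make that last point concrete, here is a rough sketch (illustrative
only, not the lazr packages' actual code) of the kind of fix-up a
shared __init__.py would need to see portions in both the base
interpreter and the virtualenv:

-----snip snip-----
# Sketch of a shared lazr/__init__.py that extends __path__ to cover the
# base interpreter's site-packages and the current checkout as well as
# the virtualenv.  The site-packages layout below is the POSIX one;
# adjust for other platforms.
import os
import sys
import pkgutil

__path__ = pkgutil.extend_path(__path__, __name__)

_extra_roots = []
_base = getattr(sys, 'base_prefix', sys.prefix)
if _base != sys.prefix:
    # Running inside a venv: the base OS install may hold other portions.
    _extra_roots.append(os.path.join(
        _base, 'lib', 'python%d.%d' % sys.version_info[:2],
        'site-packages'))
_extra_roots.append(os.getcwd())    # `pip install -e .` checkouts

for _root in _extra_roots:
    _candidate = os.path.join(_root, 'lazr')
    if os.path.isdir(_candidate) and _candidate not in __path__:
        __path__.append(_candidate)
-----snip snip-----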

I did in fact get a Venn-diagram style setup that would work for all
Pythons, in all modes, by taking advantage of pip not seeing the
non-virtualenv copies of things and simply installing duplicates when
that occurred. But we decided it was probably going to end up as
whack-a-mole if we'd missed any use cases along the way, so we opted
to migrate away from namespace packages entirely.

If it would help I can dig up my test code etc., though it's probably
all googleable by now anyhow.

-Rob





-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud

