[Distutils] Bilingual namespace package conundrum

Barry Warsaw barry at python.org
Thu Jan 1 20:19:16 CET 2015


I hope the following makes sense; I've been a little under the weather. ;)
Apologies in advance for the long data-dump, but I wanted to provide as
complete information as possible.

I have ported Mailman 3 to Python 3.4[*].  While working on various
development tasks, I noticed something rather strange regarding
(pseudo-)namespace packages which MM3 is dependent on.  These libraries work
for both Python 2 and 3, but of course namespace packages work completely
differently in those two versions (as of Python 3.3, i.e. PEP 420).  I've been
thinking about how such bilingual packages can play better with PEP 420 when
running under a compatible Python 3 version, and still *seem* to work well in
Python 2.  More on that later.

It turns out this isn't actually my problem.  Here's what's going wrong.

My py3 branch has a fairly straightforward tox.ini.  When I run `tox -e py34`
it builds the hidden .tox/py34 virtualenv, installs all the dependencies into
this, runs the test suite, and all is happy.  Since I do most of my
development this way, I never noticed any problems.

But then I wanted to work with the code in a slightly different way, so I
created a virtual environment, activated it, and ran `python setup.py develop`
which appeared to work fine.  The problem though is that some packages are not
importable by the venv's python.  Let's concentrate on three of these
dependencies, lazr.config, lazr.delegates, and lazr.smtptest.  All three give
ImportErrors when trying to import them from the interactive prompt of the
venv's python.

I tried something different to see if I could narrow down the problem, so I
created another venv, activated it, and ran `pip install lazr.config
lazr.delegates lazr.smtptest`.  Firing up the venv's interactive python, I can
now import all three of them just fine.

So this tells me that `python setup.py develop` is broken, or at the very
least behaves differently than the `pip install` route.  Actually, `python
setup.py install` exhibits the same problem.

Just to be sure Debian's Python 3.4 package isn't introducing any weirdnesses,
all of this work was done with an up-to-date build of Python from hg, using
the 3.4 branch, and the virtualenvs were created with `./python -m venv`.

Let's look more closely at part of the site-packages directories of the two
virtualenvs.  The one that fails looks like this:

  drwxrwxr-x  [...] lazr.config-2.0.1-py3.4.egg
  drwxrwxr-x  [...] lazr.delegates-2.0.1-py3.4.egg
  drwxrwxr-x  [...] lazr.smtptest-2.0.2-py3.4.egg

All three directories exhibit the same pattern:

lazr.config-2.0.1-py3.4.egg
  EGG-INFO
      namespace_packages.txt (contains one line saying "lazr")
  lazr
      config
      __init__.py  (contains the pseudo-namespace boilerplate code[+])
      __pycache__

[+] that boilerplate code looks like:

-----snip snip-----
# This is a namespace package, however under >= Python 3.3, let it be a true
# namespace package (i.e. this cruft isn't necessary).
import sys

if sys.hexversion < 0x30300f0:
    try:
        import pkg_resources
        pkg_resources.declare_namespace(__name__)
    except ImportError:
        import pkgutil
        __path__ = pkgutil.extend_path(__path__, __name__)
-----snip snip-----

but of course, this doesn't really play nicely with PEP 420 because it's the
mere existence of the __init__.py in the namespace package that prevents PEP
420 from getting invoked.  As an aside, Debian's packaging tools will actually
*remove* this __init__.py for declared namespace packages in Python 3, so
under Debian-land, things do work properly.  The upstream tools have no such
provision, so it's clear why we're getting the ImportErrors.  These three
packages do not share a PEP 420-style namespace, and the sys.hexversion guard
prevents the old-style pseudo-namespace hack from working.
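The failure mode described above can be reproduced outside of any real
package.  The following is a minimal sketch for Python 3.3 or later, not the
actual lazr code: the helper names (make_portion, importable) and the
throwaway namespace names are hypothetical, and an empty __init__.py stands in
for the guarded boilerplate, which does nothing on Python >= 3.3.

```python
import importlib
import os
import shutil
import sys
import tempfile


def make_portion(root, namespace, subpackage, with_init):
    """Create root/namespace/subpackage/, optionally with namespace/__init__.py."""
    ns_dir = os.path.join(root, namespace)
    sub_dir = os.path.join(ns_dir, subpackage)
    os.makedirs(sub_dir)
    # The sub-package itself is an ordinary package.
    open(os.path.join(sub_dir, "__init__.py"), "w").close()
    if with_init:
        # Empty stand-in for the hexversion-guarded boilerplate.
        open(os.path.join(ns_dir, "__init__.py"), "w").close()


def importable(namespace, subpackages, with_init):
    """Lay out one portion per directory; report which sub-packages import."""
    roots = []
    results = {}
    try:
        for sub in subpackages:
            root = tempfile.mkdtemp()
            roots.append(root)
            make_portion(root, namespace, sub, with_init)
        sys.path[:0] = roots
        importlib.invalidate_caches()
        for sub in subpackages:
            try:
                importlib.import_module(namespace + "." + sub)
                results[sub] = True
            except ImportError:
                results[sub] = False
    finally:
        # Undo the sys.path and sys.modules changes and delete the temp dirs.
        for root in roots:
            if root in sys.path:
                sys.path.remove(root)
            shutil.rmtree(root, ignore_errors=True)
        for name in list(sys.modules):
            if name == namespace or name.startswith(namespace + "."):
                del sys.modules[name]
    return results


if __name__ == "__main__":
    print(importable("demo_ns_a", ["config", "delegates"], with_init=True))
    print(importable("demo_ns_b", ["config", "delegates"], with_init=False))
```

With the __init__.py present, the first directory on sys.path claims the
whole namespace as a regular package and the second portion is never found;
without it, PEP 420 merges both portions.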

This could potentially be fixable in the upstream packages, but again, more on
that later.

Now let's look at the working venv:

  drwxrwxr-x  [...] lazr
  drwxrwxr-x  [...] lazr.config-2.0.1-py3.4.egg-info
  -rw-rw-r--  [...] lazr.config-2.0.1-py3.4-nspkg.pth
  drwxrwxr-x  [...] lazr.delegates-2.0.1-py3.4.egg-info
  -rw-rw-r--  [...] lazr.delegates-2.0.1-py3.4-nspkg.pth
  drwxrwxr-x  [...] lazr.smtptest-2.0.2-py3.4.egg-info
  -rw-rw-r--  [...] lazr.smtptest-2.0.2-py3.4-nspkg.pth


This looks quite different, and digging into the 'lazr' directory shows:

lazr
   config
   delegates
   smtptest

No __init__.py in 'lazr', and the sub-package directories contain just the
code for the appropriate library.  The egg-info directories look similar, but
now we also have -nspkg.pth files which contain something like the following:

-----snip snip-----
import sys, types, os;p = os.path.join(sys._getframe(1).f_locals['sitedir'], *('lazr',));ie = os.path.exists(os.path.join(p,'__init__.py'));m = not ie and sys.modules.setdefault('lazr', types.ModuleType('lazr'));mp = (m or []) and m.__dict__.setdefault('__path__',[]);(p not in mp) and mp.append(p)
-----snip snip-----

which look to me a lot like they are setting up something akin to a PEP
420-ish virtual `lazr` top-level package.  It all makes sense too;
lazr.{config,delegates,smtptest} are all imported from the expected locations,
and importing 'lazr' gives you a stripped down module object with no __file__.
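For readability, here is the -nspkg.pth one-liner unrolled into a function.
This is only a sketch of what the line appears to do: the function name and
the explicit sitedir parameter are hypothetical (the real .pth line digs
sitedir out of the calling frame in site.py).

```python
import os
import sys
import types


def register_namespace_portion(sitedir, name="lazr"):
    """Unrolled equivalent of the -nspkg.pth one-liner for a top-level name."""
    portion = os.path.join(sitedir, name)
    # If a real __init__.py exists, leave the package to the normal import
    # machinery and do nothing here.
    if os.path.exists(os.path.join(portion, "__init__.py")):
        return None
    # Reuse an already-registered module, or register a bare one (no __file__).
    module = sys.modules.setdefault(name, types.ModuleType(name))
    # Make sure the module has a __path__ list and that it includes this portion.
    path = module.__dict__.setdefault("__path__", [])
    if portion not in path:
        path.append(portion)
    return module
```

Because the module object is created by hand rather than loaded from a file,
it has a __path__ but no __file__, which matches what importing 'lazr' shows
in the working venv.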

Stepping back a moment, what is it I want?  I want `python setup.py develop`
to work properly in both Python 2 and Python 3 virtualenvs.  I want clear
rules we can give library authors so that they can be sure their bilingual
namespace packages will work properly in all supported venv environments.

How to best accomplish this?

- Should distutils/setuptools remove the namespace package __init__.py files
  automatically?  (Something like what the Debian packaging tools do.)

- Should some other boilerplate be added to the namespace-package __init__.py
  files in each portion?  Maybe the sys.hexversion guards should be removed so
  that it acts the same way in both Python 2 and Python 3.

- Should we introduce some other protocol in Python 3.5 that would allow such
  bilingual namespace packages to actually *be* PEP 420 namespace packages in
  Python 3?  I took a quick look at importlib and it seems tricky, since
  it's the mere absence of the top-level __init__.py files that triggers PEP
  420 namespace packages.  So any code put in the __init__.py won't get
  executed until an import of the top-level package is attempted.  Any
  trigger for "hey, treat me as a namespace package" would have to be file
  system based, since even the namespace_packages.txt file may not exist.
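As a rough illustration of the second option above (dropping the
sys.hexversion guard), the sketch below gives every portion an unguarded
__init__.py that uses only the pkgutil fallback.  It assumes Python 3, does
not exercise the pkg_resources.declare_namespace() branch, and uses
hypothetical helper and namespace names; it shows the mechanism, not whether
this is the right fix.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Unguarded pseudo-namespace boilerplate (pkgutil branch only).
UNGUARDED_INIT = (
    "import pkgutil\n"
    "__path__ = pkgutil.extend_path(__path__, __name__)\n"
)


def import_portions(namespace, subpackages):
    """Create one portion per temp directory and import every sub-package."""
    roots = []
    imported = []
    try:
        for sub in subpackages:
            root = tempfile.mkdtemp()
            roots.append(root)
            sub_dir = os.path.join(root, namespace, sub)
            os.makedirs(sub_dir)
            with open(os.path.join(root, namespace, "__init__.py"), "w") as f:
                f.write(UNGUARDED_INIT)
            open(os.path.join(sub_dir, "__init__.py"), "w").close()
        sys.path[:0] = roots
        importlib.invalidate_caches()
        for sub in subpackages:
            # Raises ImportError if a portion cannot be found.
            importlib.import_module(namespace + "." + sub)
            imported.append(sub)
    finally:
        # Undo the sys.path and sys.modules changes and delete the temp dirs.
        for root in roots:
            if root in sys.path:
                sys.path.remove(root)
            shutil.rmtree(root, ignore_errors=True)
        for name in list(sys.modules):
            if name == namespace or name.startswith(namespace + "."):
                del sys.modules[name]
    return imported
```

Here the first portion's __init__.py extends __path__ to cover every
same-named directory on sys.path, so all portions import even though the
namespace is a regular package rather than a PEP 420 one.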

I'll stop here.  Happy new year and let the games begin.

Cheers,
-Barry

[*] https://mail.python.org/pipermail/mailman-announce/2014-December/000197.html