[Distutils] nspkg.pth files break $PYTHONPATH overrides

Barry Warsaw barry at python.org
Mon Mar 24 22:48:07 CET 2014


Apologies for cross-posting, but this intersects setuptools and the import
system, and I wanted to be sure it reached the right audience.

A colleague asked me why a seemingly innocent and common use case for
developing local versions of system installed packages wasn't working, and I
was quite perplexed.  As I dug into the problem, more questions than answers
came up.  I finally (I think!) figured out what is happening, but not so much
why, or what can/should be done about it.

This person had a local checkout of a package's source, where the package was
also installed into the system Python.  He wanted to be able to set
$PYTHONPATH so that the local package wins when he tries to import it.  E.g.:

% PYTHONPATH=`pwd`/src python3

but this didn't work because despite the setting of PYTHONPATH, the system
version of the package was always found first.  The package in question is
lazr.uri, although other packages with similar layouts will suffer the same
problem.  Besides being a complete head-scratcher, this prevents easy local
development of a newer version of the package.

The lazr.uri package is intended to be a subpackage of the lazr namespace
package.  As such, its lazr/__init__.py uses the old-style way of declaring a
namespace package:

try:
    import pkg_resources
    pkg_resources.declare_namespace(__name__)
except ImportError:
    import pkgutil
    __path__ = pkgutil.extend_path(__path__, __name__)

and its setup.py declares a namespace package:

setup(
    name='lazr.uri',
    version=__version__,
    namespace_packages=['lazr'],
    ...

One of the things that the Debian "helper" program does when it builds a
package for the archive is call `$python setup.py install_egg_info`.  It's
this command that breaks $PYTHONPATH overriding.

install_egg_info looks at the lazr.uri.egg-info/namespace_packages.txt file,
in which it finds the string 'lazr', and it proceeds to write a
lazr-uri-1.0.3-py3.4-nspkg.pth file.  This causes other strange and unexpected
things to happen:

% python3
Python 3.4.0 (default, Mar 22 2014, 22:51:25) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.modules['lazr']  
<module 'lazr'>
>>> sys.modules['lazr'].__path__  
['/usr/lib/python3/dist-packages/lazr']

It's completely weird that sys.modules would contain a key for 'lazr' when
that package was never explicitly imported.  Even stranger, because a fake
module object is stuffed into sys.modules via the .pth file, tracing imports
with -v gives you no clue as to what's happening.  And while
sys.modules['lazr'] has an __path__, it has no other attributes.

I really don't understand what the purpose of the nspkg.pth file is,
especially for Python 3 namespace packages.

Here's what the nspkg.pth file contains:

import sys,types,os; p = os.path.join(sys._getframe(1).f_locals['sitedir'], *('lazr',)); ie = os.path.exists(os.path.join(p,'__init__.py')); m = not ie and sys.modules.setdefault('lazr',types.ModuleType('lazr')); mp = (m or []) and m.__dict__.setdefault('__path__',[]); (p not in mp) and mp.append(p)
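
Unrolled for readability, here's my reading of what that one line does (an
equivalent sketch, not code shipped by setuptools; it only makes sense when
exec'd from site.py, which is where the 'sitedir' local comes from):

import os
import sys
import types

# site.addpackage() execs each "import ..." line of a .pth file, so the
# calling frame's 'sitedir' local is the site directory currently being
# scanned, e.g. /usr/lib/python3/dist-packages.
sitedir = sys._getframe(1).f_locals['sitedir']
p = os.path.join(sitedir, 'lazr')

# Only act when there is no real lazr/__init__.py in that site directory.
if not os.path.exists(os.path.join(p, '__init__.py')):
    # Create (or reuse) a bare module object for 'lazr' ...
    m = sys.modules.setdefault('lazr', types.ModuleType('lazr'))
    # ... make sure it has a __path__ ...
    mp = m.__dict__.setdefault('__path__', [])
    # ... and pin the system lazr/ directory onto that __path__.
    if p not in mp:
        mp.append(p)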

The __path__ value is important here because even though you've never
explicitly imported 'lazr', when you *do* explicitly import 'lazr.uri', the
existing lazr module object's __path__ takes over, and thus the system
lazr.uri package is found even though both lazr/ and lazr/uri/ should have
been found earlier on sys.path (yes, sys.path looks exactly as expected).
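
To make the takeover concrete, here's a minimal sketch that reproduces the
same effect by hand (it assumes the system python3-lazr.uri is installed
under /usr/lib/python3/dist-packages):

import sys
import types

# Simulate what the .pth line has already done by the time you get a prompt.
fake = types.ModuleType('lazr')
fake.__path__ = ['/usr/lib/python3/dist-packages/lazr']
sys.modules['lazr'] = fake

# The search for the 'uri' submodule is now confined to fake.__path__, so
# the system copy is found no matter what $PYTHONPATH says.
import lazr.uri
print(lazr.uri.__file__)  # /usr/lib/python3/dist-packages/lazr/uri/__init__.py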

So the presence of the nspkg.pth file breaks $PYTHONPATH overriding.  That
seems bad. ;)

If you delete the nspkg.pth file, then things work as expected, but even this
is a little misleading!

I think the Debian helper is running install_egg_info as a way to determine
what namespace packages are defined, so that it can actually *remove* the
parent's __init__.py file and use PEP 420 style namespace packages.  In fact,
in the Debian python3-lazr.uri binary package, you find no system
lazr/__init__.py file.  This is why removing the nspkg.pth file works.
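
That's easy to confirm on such a system (paths as on my machine):

import os

base = '/usr/lib/python3/dist-packages/lazr'
# The Debian package ships no lazr/__init__.py; lazr/uri/ is a regular
# package with its own __init__.py.
print(os.path.exists(os.path.join(base, '__init__.py')))         # False
print(os.path.exists(os.path.join(base, 'uri', '__init__.py')))  # True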

So I thought, why not conditionally define setup(..., namespace_packages) only
for Python 2?  This doesn't work because the Debian helper will see that no
namespace packages are defined, and thus it will leave the original
lazr/__init__.py file in place.  This then breaks $PYTHONPATH overriding too,
because the pre-PEP 420 __path__ extension code only *appends* the local
development path.  IOW, the system import path ends up as the first element of
a 2-element lazr.__path__, with the local import path second, so in this case
too the local copy loses.
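
That append-only behavior is easy to see in isolation, at least for the
pkgutil fallback shown earlier (the starting __path__ here is hypothetical,
chosen to mirror what I observed):

import pkgutil

# extend_path() scans sys.path for additional 'lazr' portions and appends
# anything new to the copy it returns; whatever the __path__ already
# contained stays at the front.
system_path = ['/usr/lib/python3/dist-packages/lazr']
print(pkgutil.extend_path(system_path, 'lazr'))
# With `pwd`/src on $PYTHONPATH, expect the local src/lazr to show up
# *after* the system entry, e.g.:
# ['/usr/lib/python3/dist-packages/lazr', '/home/me/checkout/src/lazr']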

It seems like what you want for Python 3 (and we're talking >= 3.3 here,
since that's where PEP 420 landed) is
for there to be neither a nspkg.pth file, nor the lazr/__init__.py file, and
let PEP 420 do its thing.  In fact, if you set things up this way, $PYTHONPATH
overriding works exactly as expected.
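
For completeness, a quick way to check that claim once things are laid out
the PEP 420 way, i.e. no nspkg.pth and no lazr/__init__.py anywhere (paths
are from my machine and will differ on yours):

import sys

# Stand-in for PYTHONPATH=`pwd`/src
sys.path.insert(0, '/home/barry/projects/ubuntu/lazruri/trusty/src')

import lazr.uri

# With no __init__.py and no nspkg.pth, 'lazr' becomes a PEP 420 namespace
# package whose __path__ is computed from sys.path, so the local checkout
# comes first and wins.
print(lazr.__path__)      # local src/lazr ahead of the system directory
print(lazr.uri.__file__)  # points into the local checkout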

Because I don't know why install_egg_info is installing the nspkg.pth file, I
don't know which component needs to be changed:

 * Change the setuptools install_egg_info command to not install an nspkg.pth
   file for packages that declare namespace_packages, at least under Python 3.
   This behavior seems pretty nasty all by itself because it magically and
   untraceably installs stripped down module objects in sys.modules when
   Python first scans the import path.

 * Change the Debian helper to remove the nspkg.pth file (or not call
   install_egg_info in the first place), *and* continue to remove
   <nspkg>/__init__.py under Python 3 so as to take advantage of PEP 420.
   It's nice to know that PEP 420 actually represents something sane. :)

For added bonus, we have this additional oddity:

% PYTHONPATH=`pwd`/src python3
Python 3.4.0 (default, Mar 22 2014, 22:51:25) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.modules['lazr']
<module 'lazr'>
>>> sys.modules['lazr'].__path__
['/usr/lib/python3/dist-packages/lazr']
>>> import lazr.uri
>>> lazr.uri.__file__
'/usr/lib/python3/dist-packages/lazr/uri/__init__.py'
>>> sys.modules['lazr']
<module 'lazr' from '/home/barry/projects/ubuntu/lazruri/trusty/src/lazr/__init__.py'>
>>> sys.modules['lazr'].__path__
['/home/barry/projects/ubuntu/lazruri/trusty/src/lazr', '/usr/lib/python3/dist-packages/lazr']


Notice how importing lazr.uri *replaces* sys.modules['lazr'] with the local
development one, even though it still imports lazr.uri from the system path.
I'm not exactly sure how this happens, but I've traced it to
_LoaderBasics.exec_module()'s call to _call_with_frames_removed(), which
execs lazr.uri's code object into that module's __dict__.  Nothing in
lazr/uri/__init__.py should be doing that, afaict from both visual inspection
of the code and disassembling the compiled code object.

Hopefully I've explained the situation correctly and lucidly.  Below I'll
describe how to set up a reproducible environment on a Debian machine.
Thoughts and comments are welcome!

Cheers,
-Barry

% sudo apt-get install python3-lazr.uri
% cd tmp
% bzr branch lp:lazr.uri trunk
% cd trunk
% PYTHONPATH=`pwd`/src python3
(Then try things at the Python prompt from above.)