Path hacking [Long] (was Re: [Python-Dev] Relative Package Imports)
Barry A. Warsaw
bwarsaw@cnri.reston.va.us (Barry A. Warsaw)
Tue, 14 Sep 1999 12:19:34 -0400 (EDT)
Finally, something I can relate to. Although I have a goal of
packagizing everything I write these days, I haven't experienced any
of the problems that lead others to suggest relative imports. The
most complicated app that I hack on (continuously) is Mailman, which
has a main package and several subpackages off the main one. I always
use absolute paths in my import statements, so I don't see what the
fuss is about. But I'm perfectly willing to admit that I don't have
enough experience.
However...
>>>>> "JCA" == James C Ahlstrom <jim@interet.com> writes:
JCA> I find the PYTHONPATH mechanism totally unreliable for
JCA> commercial programs anyway. It is a global object, and an
JCA> installation of a second Python program can break the first
JCA> one. I don't think there is any solution to this other than
JCA> specify sys.path on a per-application basis. If this is
JCA> false, what is the other solution?
I completely agree with JimA here. It's been a pain with the Knowbot
stuff, a pain with Mailman, and a pain with other packages that I've
installed for shared use within CNRI. The .pth files solve part of
the problem nicely. They let me install, say, PIL or PCT in a shared
location, for access by all the Python users at my site, without the
users having to individually hack their dot-files, etc.
But this doesn't work so well for apps like Mailman or the Knowbot
stuff because we can't expect that the person installing those
applications will be able to install a .pth file in the right place.
Also, .pth files don't let you tightly control sys.path, e.g. you can
only add paths, not delete or reorder them.
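To make the "only add paths" point concrete, here is a small sketch using the standard site.addsitedir() call on a throwaway directory. The directory and file names ("shared", "shared.pth") are invented for the demo; the point is just that processing a .pth file appends entries to sys.path and leaves everything already on it alone.

```python
import os
import site
import sys
import tempfile


def demo_pth():
    """Create a throwaway site dir with one .pth file and process it."""
    sitedir = tempfile.mkdtemp()
    # "shared" stands in for a shared install such as PIL (made-up name).
    shared = os.path.join(sitedir, "shared")
    os.mkdir(shared)
    with open(os.path.join(sitedir, "shared.pth"), "w") as f:
        # One path per line, interpreted relative to sitedir.
        f.write("shared\n")
    before = list(sys.path)
    # Reads every *.pth file in sitedir; existing entries are untouched.
    site.addsitedir(sitedir)
    return before, shared
```

After calling demo_pth(), the old sys.path is still there in the same order, with the new directories tacked on at the end. There is no .pth syntax for deleting or reordering an entry.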
Plus you have a global naming problem. Mailman's top level package is
called "Mailman", so I can be fairly confident that I'm not going to
collide, but it means that I have an extra directory level within my
install that contains all the core importable modules. I don't think
that's a big deal, but it's a convention that other packaged app
writers should follow.
The problem is getting Mailman's (or the Knowbots') top level
directory on sys.path, and in exactly controlling the contents of
sys.path.
Our first approach with Knowbots was to do direct sys.path.insert()s,
which is quite ugly and error prone. Plus if you're adding many
paths, or adding and deleting, that's a lot of gibberish at the top of
your entry level executables. And now let's say that you have a dozen
or two dozen entry level executables that all have to perform the same
sys.path magic. That's a lot of cutting-and-pasting (and /highly/
error prone patching when directory structures change). It's a lose.
So for Knowbots we wrote a small module called pathhack that all entry
level executables imported. pathhack was good because it put all that
sys.path munging nonsense in one place so it was manageable from a s/w
engineering standpoint. But it sucked because those executables had
to /find/ pathhack.py! Bootstrap lossage (we've actually gone back to
sys.path.insert).
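A rough sketch of what a pathhack-style module might look like. This is not the actual Knowbots code: the function name hack_path, its arguments, and the subdirectory names are all made up for illustration. The idea is only that the sys.path munging lives in one function that every entry level executable calls.

```python
# pathhack.py (hypothetical sketch, not the real Knowbots module)
import os
import sys


def hack_path(root, subdirs=("lib", "pythonlib"), path=None):
    """Put root and the given subdirectories at the front of the search path.

    root and subdirs are illustrative names; path defaults to sys.path.
    """
    if path is None:
        path = sys.path
    wanted = [root] + [os.path.join(root, d) for d in subdirs]
    # Drop any existing copies so repeated calls don't pile up duplicates.
    path[:] = [p for p in path if p not in wanted]
    # Insert in reverse so the final order matches `wanted`.
    for entry in reversed(wanted):
        path.insert(0, entry)
    return path
```

Note that this does nothing for the bootstrap problem: a script still has to be able to import pathhack before it can call hack_path().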
With Mailman, I could solve that problem because I added a
configure/make phase. This let me write a module template called
paths.py.in which configure flippered into paths.py containing path
hackage based on --prefix. The next trick was that "make install"
copied that paths.py file into all the subdirectories that had top
level entry points into the Mailman system (e.g. the bin directory,
the cron directory, the cgi directory). So now, an executable need
only do
    import paths
    import Mailman.Utils
    import Mailman.Logging.Utils
and absolute paths work like a charm. I can even provide a
`pythonlib' directory that contains newer versions of standard modules
that have fixes for folks running older Pythons. Thus I do
    from Mailman.pythonlib import rfc822
and the rest of my code uses my special rfc822 module with no changes.
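For readers who haven't seen the configure trick, here is a hedged sketch of the idea. It is not Mailman's actual paths.py.in; the template text and the render() helper are assumptions made up for the example. The template carries an @prefix@ placeholder, and substituting the real install prefix yields a tiny module whose only job is to fix up sys.path when imported.

```python
# Hypothetical sketch of the paths.py.in idea; not Mailman's actual file.
TEMPLATE = '''\
import sys
prefix = '@prefix@'
# Put the install prefix first so "import Mailman" resolves there.
sys.path.insert(0, prefix)
'''


def render(template, prefix):
    """Do by hand what configure does: replace @prefix@ with the real path."""
    return template.replace('@prefix@', prefix)
```

Writing the result of render(TEMPLATE, '/usr/local/mailman') to paths.py next to each entry point gives those scripts something they can always find, since the script's own directory is on sys.path.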
I'm very happy with how this works for Mailman, however we can't use
the same approach (or let's say Guido doesn't want to use this
approach) for the Knowbots stuff because there /is/ no "make install"
step. You just unpack it and go. But it still has to play lots of
games searching the file system for various things.
What I've been thinking is that Python needs a registry <shudder>.
JPython's already got such a beast, and it integrates with Java's
system properties, so that things like the PYTHONPATH equivalent are
set in the registry and immediately available. But it's not very
flexible, and you still need an install step in order to bootstrap the
locating of the registry.
I think we can do a little bit better. Python already knows how to
find its sys module. We can add an object into sys, call it
sys.registry, which would contain things like sys.path definitions,
and all sorts of other application specific keys. This object would
be tied to a file (or files) which might be human readable, a
marshal/pickle, or both. Bootstrap location of this file(s) is an
issue, but see below.
This would let you do things like the following at the beginning of
every top level executable:
    import sys
    sys.application = 'zope'
    sys.registry.setpath(sys.application+'.pythonpath')
I'm sure all kinds of lengthy discussion will now ensue about the
exact interface of the registry object, but I'll make just a few
observations:
- There should be a system wide registry and a user specific
registry. This lets an admin install shared applications easily,
but also lets individual users have their own overrides.
- The system-wide registry can be located in say
sys.prefix/lib/python<version>/site-packages. The user registry
would reside somewhere in $HOME. This could all be platform
specific so that on Windows, maybe the Python registry is integrated
with the Windows registry, while in JPython it would be integrated
with the standard JPython registry mechanism.
- You should be able to specify registry entries on the command line.
- There need to be defined rules for resolving registry keys b/w
system, user, and command line specifications. JPython has some
experience here (although there have been requests to change
JPython's lookup order), and at the very least, JPython and CPython
should be as consistent as possible (CPython won't have to merge in
Java's system properties).
- The sys.registry object should be read/writable. This would let an
install script do something like:
    import sys
    sys.registry.lock()
    sys.registry.put('zope.pythonpath',
                     '@prefix@:@prefix@/matools:@prefix@/pythonlib')
    sys.registry.write()
    sys.registry.unlock()
which would write either the global system registry or the local
user registry, depending on permissions (or maybe that's spelled
explicitly in the API).
- In a sense you're pushing the namespace issue up a level into the
registry, but at least this is a domain we can completely control
from Python; it abstracts away the file system, and I don't think
there's any way to avoid requiring conventions and cooperation for
registry key naming. I also don't think it'll be a big problem in
practice. When I packagize and re-release my Zarathustra's Ocular
Python Experience virtual reality system, I'll try to think of a
non-colliding top level package name.
- (oh darn, I know I had more points, but Guido just popped in and I
lost my train of thought).
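The points above can be sketched as a toy registry object. Nearly everything here is hypothetical: the class name, the lookup order (command line over user over system), the layer argument to put(), and the os.pathsep-separated value format are assumptions. Only the setpath() and put() method names and the 'zope.pythonpath' style of key come from the examples above. Locking and writing to a file are left out entirely.

```python
import os
import sys


class Registry:
    """Toy layered registry: command line overrides user overrides system."""

    def __init__(self, system=None, user=None, cmdline=None):
        # Lowest priority first; later layers win on lookup.
        self.layers = [dict(system or {}), dict(user or {}), dict(cmdline or {})]

    def get(self, key, default=None):
        # Search from the highest-priority layer down.
        for layer in reversed(self.layers):
            if key in layer:
                return layer[key]
        return default

    def put(self, key, value, layer=1):
        # layer=1 writes to the user layer; 0 would be the system layer.
        self.layers[layer][key] = value

    def setpath(self, key, path=None):
        """Replace the contents of path (default sys.path) with the value of key."""
        if path is None:
            path = sys.path
        value = self.get(key)
        if value is None:
            raise KeyError(key)
        path[:] = value.split(os.pathsep)
        return path
```

Unlike a .pth file, setpath() replaces the whole list, so an application gets exactly the search path its registry entry names, in that order.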
Well, this has gone on long enough so I might as well let you guys
shoot this idea all to hell. Let me close by saying that while I
think the Windows registry is a mess, I also think that it might be
useful for Python. Does it solve the same problem that relative
imports are trying to solve? I dunno, but that's why I changed the
Subject: line above. :)
-Barry