Path hacking [Long] (was Re: [Python-Dev] Relative Package Imports)

Barry A. Warsaw <bwarsaw@cnri.reston.va.us>
Tue, 14 Sep 1999 12:19:34 -0400 (EDT)


Finally, something I can relate to.  Although I have a goal of
packagizing everything I write these days, I haven't experienced any
of the problems that lead others to suggest relative imports.  The
most complicated app that I hack on (continuously) is Mailman, which
has a main package and several subpackages off the main one.  I always
use absolute paths in my import statements, so I don't see what the
fuss is about.  But I'm perfectly willing to admit that I don't have
enough experience.

However...

>>>>> "JCA" == James C Ahlstrom <jim@interet.com> writes:

    JCA> I find the PYTHONPATH mechanism totally unreliable for
    JCA> commercial programs anyway.  It is a global object, and an
    JCA> installation of a second Python program can break the first
    JCA> one.  I don't think there is any solution to this other than
    JCA> specify sys.path on a per-application basis.  If this is
    JCA> false, what is the other solution?

I completely agree with JimA here.  It's been a pain with the Knowbot
stuff, a pain with Mailman, and a pain with other packages that I've
installed for shared use within CNRI.  The .pth files solve part of
the problem nicely.  They let me install, say, PIL or PCT in a shared
location, for access by all the Python users at my site, without the
users having to individually hack their dot-files, etc.

But this doesn't work so well for apps like Mailman or the Knowbot
stuff because we can't expect that the person installing those
applications will be able to install a .pth file in the right place.
Also, .pth files don't let you tightly control sys.path, e.g. you can
only add paths, not delete or reorder them.
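
To make the "add only" limitation concrete, here is a minimal sketch
using the standard site module.  The throwaway directory layout and the
file name shared.pth are made up for illustration; they are not taken
from any real installation.

```python
import os
import site
import sys
import tempfile

# Hypothetical layout: a scratch "site" directory holding one .pth file
# that names a shared library directory.
sitedir = tempfile.mkdtemp()
shared = os.path.join(sitedir, "shared-libs")
os.mkdir(shared)

with open(os.path.join(sitedir, "shared.pth"), "w") as f:
    f.write(shared + "\n")

before = list(sys.path)
site.addsitedir(sitedir)      # processes every *.pth file in sitedir

# Everything that was on sys.path before is still there, in the same
# order; the new entries were only appended at the end.
assert sys.path[:len(before)] == before
print(sys.path[len(before):])
```

Nothing in the .pth file can remove or reorder entries that were already
on sys.path, which is the limitation described above.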

Plus you have a global naming problem.  Mailman's top level package is 
called "Mailman", so I can be fairly confident that I'm not going to
collide, but it means that I have an extra directory level within my
install that contains all the core importable modules.  I don't think
that's a big deal, but it's a convention that other packaged app
writers should follow.

The problem is getting Mailman's (or the Knowbots') top level
directory on sys.path, and exactly controlling the contents of
sys.path.

Our first approach with Knowbots was to do direct sys.path.insert()s,
which is quite ugly and error prone.  Plus if you're adding many
paths, or adding and deleting, that's a lot of gibberish at the top of 
your entry level executables.  And now let's say that you have a dozen 
or two dozen entry level executables that all have to perform the same 
sys.path magic.  That's a lot of cutting-and-pasting (and /highly/
error prone patching when directory structures change).  It's a lose.
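
For reference, the kind of boilerplate being described looks roughly
like the sketch below.  The bin/lib/pythonlib layout and the helper name
add_app_dirs are assumptions for illustration, not the actual Knowbots
code.

```python
import os
import sys


def add_app_dirs(script_path, subdirs=("lib", "pythonlib")):
    """Put <root>/<subdir> on sys.path, where the script is <root>/bin/<name>.

    The directory layout is hypothetical.
    """
    root = os.path.dirname(os.path.dirname(os.path.abspath(script_path)))
    # Insert in reverse so that subdirs[0] ends up first on sys.path.
    for sub in reversed(subdirs):
        sys.path.insert(0, os.path.join(root, sub))
    return root


# Every entry level executable would need its own copy of the above,
# followed by a call along these lines (the path is made up):
root = add_app_dirs(os.path.join(os.sep, "opt", "knowbots", "bin", "tool"))
print(sys.path[:2])
```

Multiply this by a dozen executables and any change to the directory
structure means patching every copy by hand.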

So for Knowbots we wrote a small module called pathhack that all entry
level executables imported.  pathhack was good because it put all that
sys.path munging nonsense in one place so it was manageable from a s/w
engineering standpoint.  But it sucked because those executables had
to /find/ pathhack.py!  Bootstrap lossage (we've actually gone back to
sys.path.insert).
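
The bootstrap problem can be reproduced in a few lines.  This is only a
sketch: the module name pathhack_demo and the temporary directory layout
are invented here so that the example is self-contained.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical install tree: <root>/lib/pathhack_demo.py
root = tempfile.mkdtemp()
libdir = os.path.join(root, "lib")
os.mkdir(libdir)
with open(os.path.join(libdir, "pathhack_demo.py"), "w") as f:
    f.write("HACKED = True\n")


def try_import(name):
    """Return the imported module, or None if it cannot be found."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Nothing tells Python where lib/ is, so the helper cannot be found.
first = try_import("pathhack_demo")

# The only way out is the very sys.path hack the helper was meant to hide.
sys.path.insert(0, libdir)
importlib.invalidate_caches()
second = try_import("pathhack_demo")

print(first, second is not None)
```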

With Mailman, I could solve that problem because I added a
configure/make phase.  This let me write a module template called
paths.py.in which configure flippered into paths.py containing path
hackage based on --prefix.  The next trick was that "make install"
copied that paths.py file into all the subdirectories that had top
level entry points into the Mailman system (e.g. the bin directory,
the cron directory, the cgi directory).  So now, an executable need
only do

    import paths
    import Mailman.Utils
    import Mailman.Logging.Utils

and absolute paths work like a charm.  I can even provide a
`pythonlib' directory that contains newer versions of standard modules 
that have fixes for folks running older Pythons.  Thus I do

    from Mailman.pythonlib import rfc822

and the rest of my code uses my special rfc822 module with no changes.
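
The configure and "make install" steps can be approximated in Python as
follows.  This is a sketch under stated assumptions: the template body
is a guess at what a paths.py.in might contain, and the functions
configure and install are stand-ins for what the real build machinery
does, not Mailman's actual code.

```python
import os
import tempfile

# Hypothetical contents of paths.py.in; the real template may differ.
TEMPLATE = """\
import sys
prefix = '@prefix@'
sys.path.insert(0, prefix)
"""


def configure(template, prefix):
    """Stand-in for the @prefix@ substitution that configure performs."""
    return template.replace("@prefix@", prefix)


def install(generated, root, subdirs=("bin", "cron", "cgi")):
    """Stand-in for "make install": drop paths.py next to each entry point."""
    written = []
    for sub in subdirs:
        directory = os.path.join(root, sub)
        os.makedirs(directory, exist_ok=True)
        target = os.path.join(directory, "paths.py")
        with open(target, "w") as f:
            f.write(generated)
        written.append(target)
    return written


generated = configure(TEMPLATE, "/usr/local/mailman")
written = install(generated, tempfile.mkdtemp())
print(generated)
```

Because each copy of paths.py sits in the same directory as the scripts
that import it, Python finds it without any prior path setup, which is
what sidesteps the bootstrap problem.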

I'm very happy with how this works for Mailman; however, we can't use
the same approach (or let's say Guido doesn't want to use this
approach) for the Knowbots stuff because there /is/ no "make install"
step.  You just unpack it and go.  But it still has to play lots of
games searching the file system for various things.

What I've been thinking is that Python needs a registry <shudder>.
JPython's already got such a beast, and it integrates with Java's
system properties, so that things like the PYTHONPATH equivalent are
set in the registry and immediately available.  But it's not very
flexible, and you still need an install step in order to bootstrap the
locating of the registry.

I think we can do a little bit better.  Python already knows how to
find its sys module.  We can add an object into sys, call it
sys.registry, which would contain things like sys.path definitions,
and all sorts of other application specific keys.  This object would
be tied to a file (or files) which might be human readable, a
marshal/pickle, or both.  Bootstrap location of this file (or files) is an
issue, but see below.

This would let you do things like the following at the beginning of
every top level executable:

    import sys
    sys.application = 'zope'
    sys.registry.setpath(sys.application+'.pythonpath')
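
No such object exists in Python, so the following is purely a sketch of
what setpath might do.  The Registry class, the colon-separated value
format, and the optional target argument are all assumptions made for
illustration.

```python
import sys


class Registry:
    """Hypothetical stand-in for the proposed sys.registry object."""

    def __init__(self, entries=None):
        self._entries = dict(entries or {})

    def get(self, key):
        return self._entries[key]

    def setpath(self, key, target=None):
        # Assumption: the value is a list of directories joined by ':'.
        # The target list is replaced wholesale, which gives exact
        # control over its contents.
        if target is None:
            target = sys.path
        target[:] = self.get(key).split(":")
        return target


registry = Registry({"zope.pythonpath": "/opt/zope:/opt/zope/pythonlib"})
application = "zope"

# Use a scratch list here rather than clobbering the real sys.path.
scratch = ["/some/old/entry"]
registry.setpath(application + ".pythonpath", target=scratch)
print(scratch)
```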

I'm sure all kinds of lengthy discussion will now ensue about the
exact interface of the registry object, but I'll make just a few
observations:

- There should be a system-wide registry and a user-specific
  registry.  This lets an admin install shared applications easily,
  but also lets individual users have their own overrides.

- The system-wide registry can be located in say
  sys.prefix/lib/python<version>/site-packages.  The user registry
  would reside somewhere in $HOME.  This could all be platform
  specific so that on Windows, maybe the Python registry is integrated 
  with the Windows registry, while in JPython it would be integrated
  with the standard JPython registry mechanism.

- You should be able to specify registry entries on the command line.

- There need to be defined rules for resolving registry keys b/w
  system, user, and command line specifications.  JPython has some
  experience here (although there have been requests to change
  JPython's lookup order), and at the very least, JPython and CPython
  should be as consistent as possible (CPython won't have to merge in
  Java's system properties).

- The sys.registry object should be read/writable.  This would let an
  install script do something like:

  import sys
  sys.registry.lock()
  sys.registry.put('zope.pythonpath',
                   '@prefix@:@prefix@/matools:@prefix@/pythonlib')
  sys.registry.write()
  sys.registry.unlock()

  which would write either the global system registry or the local
  user registry, depending on permissions (or maybe that's spelled
  explicitly in the API).

- In a sense you're pushing the namespace issue up a level into the
  registry, but at least this is a domain we can completely control
  from Python; it abstracts away the file system, and I don't think
  there's any way to avoid requiring conventions and cooperation for
  registry key naming.  I also don't think it'll be a big problem in
  practice.  When I packagize and re-release my Zarathustra's Ocular
  Python Experience virtual reality system, I'll try to think of a
  non-colliding top level package name.

- (oh darn, I know I had more points, but Guido just popped in and I
  lost my train of thought).
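
The layering described in the points above might be sketched as below.
Everything here is an assumption: the class name LayeredRegistry, JSON
as the on-disk format, the explicit layer argument to put, and the
lookup order (command line, then user, then system).  Locking is left
out to keep the sketch short.

```python
import json
import os
import tempfile
from collections import ChainMap


class LayeredRegistry:
    """Hypothetical registry with system, user, and command line layers."""

    def __init__(self, system_file, user_file, overrides=None):
        self.system_file = system_file
        self.user_file = user_file
        self.system = self._load(system_file)
        self.user = self._load(user_file)
        self.overrides = dict(overrides or {})  # e.g. from the command line
        # ChainMap searches its maps left to right, so earlier layers win.
        self._view = ChainMap(self.overrides, self.user, self.system)

    @staticmethod
    def _load(filename):
        if os.path.exists(filename):
            with open(filename) as f:
                return json.load(f)
        return {}

    def get(self, key, default=None):
        return self._view.get(key, default)

    def put(self, key, value, layer="user"):
        # The layer is spelled explicitly rather than guessed from permissions.
        {"user": self.user, "system": self.system}[layer][key] = value

    def write(self):
        for filename, data in ((self.system_file, self.system),
                               (self.user_file, self.user)):
            with open(filename, "w") as f:
                json.dump(data, f, indent=2, sort_keys=True)


tmp = tempfile.mkdtemp()
files = (os.path.join(tmp, "system.json"), os.path.join(tmp, "user.json"))

reg = LayeredRegistry(*files, overrides={"zope.debug": "1"})
reg.put("zope.pythonpath", "/opt/zope:/opt/zope/pythonlib", layer="system")
reg.put("zope.pythonpath", "/home/me/zope", layer="user")
reg.write()

# A later run, without command line overrides, sees the persisted layers.
reloaded = LayeredRegistry(*files)
print(reloaded.get("zope.pythonpath"))
```

Whether the user layer should really beat the system layer is exactly
the kind of rule that would need to be agreed on.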

Well, this has gone on long enough so I might as well let you guys
shoot this idea all to hell.  Let me close by saying that while I
think the Windows registry is a mess, I also think that it might be
useful for Python.  Does it solve the same problem that relative
imports are trying to solve?  I dunno, but that's why I changed the
Subject: line above. :)

-Barry