[Python-Dev] New relative import issue

Josiah Carlson jcarlson at uci.edu
Fri Sep 22 21:42:17 CEST 2006


"Phillip J. Eby" <pje at telecommunity.com> wrote:
> At 12:08 AM 9/22/2006 -0700, Josiah Carlson wrote:
> >"Phillip J. Eby" <pje at telecommunity.com> wrote:
> > > At 08:44 PM 9/21/2006 -0700, Josiah Carlson wrote:
[snip]
> You misunderstood me: I mean that the per-user database must be able to 
> store information for *different Python versions*.  Having a single 
> per-user database without the ability to include configuration for more 
> than one Python version (analogous to the current situation with the 
> distutils per-user config file) is problematic.

Just as it makes sense to have different systemwide databases for each
Python version, why wouldn't we have different user databases for each
Python version?  Something like ~/.python_packages.2.6 and
~/.python_packages.3.0.

Also, by separating the files per Python version, we can guarantee
database compatibility for any fixed Python series (2.5.x, etc.).  I
don't know whether the internal organization of SQLite databases changes
between revisions in a backwards-compatible way, so this may not
actually be a concern (it is with bsddb).
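The per-version naming scheme above could be computed along these lines
(the function name and dotfile location are illustrative sketches, not a
concrete proposal detail):

```python
import os
import sys

def user_registry_path():
    """Per-user registry file, one per Python major.minor series,
    e.g. ~/.python_packages.2.6 -- so 2.5.x and 2.6.x installs
    never share (and never corrupt) one another's database."""
    version = "%d.%d" % sys.version_info[:2]
    return os.path.expanduser("~/.python_packages.%s" % version)
```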


> In truth, a per-user configuration is just a special case of the real need: 
> to have per-application environments.  In effect, a per-user environment is 
> a fallback for not having an application environment, and the system 
> environment is a fallback for not having a user environment.

I think you are mostly correct.  The reason you are not completely
correct is that if I were to install psyco, and I wanted all
applications that could use it to use it (they guard the psyco import
with a try/except), I would merely need to register the package in the
systemwide (or user) package registry.  There is no need to muck about
with each environment I (or my installed applications) have defined; it
just works.  Is it a "fallback"?  Sure, but I prefer to call them
"convenient defaults".
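The guarded import mentioned above is the usual idiom: an application
opts in to psyco when it is importable and silently continues otherwise,
so a single registry entry is enough to enable it everywhere.  A minimal
sketch:

```python
# Guarded-import idiom: use psyco if it can be found (i.e. if it has
# been registered/installed), otherwise fall back to the plain
# interpreter without complaint.
try:
    import psyco
    psyco.full()        # JIT-compile everything psyco can handle
except ImportError:
    psyco = None        # not registered anywhere; run unaccelerated
```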


> >I didn't mention the following because I thought it would be superfluous,
> >but it seems that I should have stated it right out.  My thoughts were
> >that on startup, Python would first query the 'system' database, caching
> >its results in a dictionary, then query the user's listing, updating the
> >dictionary as necessary, then unload the databases.  On demand, when
> >code runs packages.register(), if both persist and systemwide are False,
> >it just updates the dictionary. If either are true, it opens up and
> >updates the relevant database.
> 
> Using a database as the primary mechanism for managing import locations 
> simply isn't workable.

Why?  Remember that this database isn't anything other than a
persistence mechanism that has pre-built locking semantics for
multi-process opening, reading, writing, and closing.  Given proper
cross-platform locking, we could use any persistence mechanism as a
replacement: miniconf, pickle, marshal; whatever.


> You might as well suggest that each environment 
> consist of a single large zipfile containing the packages in question: this 
> would actually be *more* practical (and fast!) in terms of Python startup, 
> and is no different from having a database with respect to the need for 
> installation and uninstallation to modify a central file!

We should remember the sizes of databases that (I expect) will be
common: we are talking about maybe 30k even if a user has installed
every package in PyPI.  And after the initial query, everything will be
stored in a dictionary or dictionary-like object, offering faster query
times than even a zip file (though loading the module/package from disk
won't have its performance improved).


> I'm not proposing we do that -- I'm just pointing out why using an actual 
> database isn't really workable, considering that it has all of the 
> disadvantages of a big zipfile, and none of the advantages (like speed, 
> having code already written that supports it, etc.)

SQLite is pretty fast.  And at startup, we are really only performing a
single query per database: "SELECT * FROM package_registry".  It will end
up reading the entire database, but these databases will be generally
small, perhaps a few dozen rows, maybe a few thousand if we have set up
a bunch of installation-time application environments.
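The single-query startup could look roughly like this with the stdlib
sqlite3 module (the two-column name/path schema is my assumption; the
post only specifies the table name):

```python
import sqlite3

def load_packages(db_path):
    """Read the whole package_registry table in one query and cache
    it as a plain dict; the database is then closed and never touched
    again unless something calls packages.register() persistently."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT name, path FROM package_registry")
        return dict(rows)
    finally:
        conn.close()
```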


> >This is easily remedied with a proper 'packages' implementation:
> >
> >     python -Mpackages name path
> >
> >Note that Python could auto-insert standard library and site-packages
> >'packages' on startup (creating the initial dictionary, then the
> >systemwide, then the user, ...).
> 
> I presume here you're suggesting a way to select a runtime environment from 
> the command line, which would certainly be a good idea.

Actually, I'm offering a way of *registering* a package with the
repository from the command line.  I'm of the opinion that setting the
environment via command line for the subsequent Python runs is a bad
idea, but then again, I have been using wxPython's wxversion method for
a while to select which wxPython installation I want to use, and find
things like:

    import wxversion
    wxversion.ensureMinimal('2.6-unicode', optionsRequired=True)

to be exactly the amount of control I want, where I want it.

Further, a non-command-line mechanism for handling environments would
save people from mucking up their Python runtime environment if they
forget to switch it back to a 'default'.


With a package registry (perhaps as I have been describing, perhaps
something different), all of the disparate ways of choosing a version of
a library during import can be removed in favor of a single mechanism. 
This single mechanism could handle things like the wxPython
'ensureMinimal', perhaps even 'ensure exact' or 'use latest'.
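As a sketch of how one registry could subsume wxversion-style selection,
here is a hypothetical 'ensure minimal' built on a {(name, version):
path} mapping (all names, the registry contents, and the naive string
comparison of versions are illustrative assumptions):

```python
# Hypothetical registry mapping (package, version) -> install path.
_registry = {
    ("wx", "2.6-unicode"): "/usr/lib/wx-2.6u",
    ("wx", "2.8-unicode"): "/usr/lib/wx-2.8u",
}

def ensure_minimal(package, min_version):
    """Pick the lowest registered version >= min_version, mimicking
    wxversion.ensureMinimal under this registry model.  (A real
    implementation would compare parsed version tuples, not strings.)"""
    candidates = sorted(ver for (name, ver) in _registry
                        if name == package and ver >= min_version)
    if not candidates:
        raise ImportError("no %s >= %s registered" % (package, min_version))
    return candidates[0]
```

'Ensure exact' and 'use latest' would just be different selections over
the same candidate list.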


> > > These are just a few of the issues that come to mind.  Realistically
> > > speaking, .pth files are currently the most effective mechanism we have,
> > > and there actually isn't much that can be done to improve upon them.
> >
> >Except that .pth files are only usable in certain (likely) system paths,
> >that the user may not have write access to.  There have previously been
> >proposals to add support for .pth files in the path of the run .py file,
> >but they don't seem to have gotten any support.
> 
> Setuptools works around this by installing an enhancement for the 'site' 
> module that extends .pth support to include all PYTHONPATH 
> directories.  The enhancement delegates to the original site module after 
> recording data about sys.path that the site module destroys at startup.

But wasn't there a recent discussion describing how keeping persistent
environment variables is a PITA both during install and at runtime?
Extending .pth files to PYTHONPATH seems to me like a hack meant to work
around the fact that Python doesn't have a package registry.  And really,
all of the current sys.path + .pth + PYTHONPATH machinery could be
subsumed into a *single* mechanism.

I'm of the opinion that the current system of paths, etc., is a bit
cumbersome.  And I think that we can do better, either with the
mechanism I am describing or otherwise.


> >I believe that most of the concerns that you have brought up can be
> >addressed,
> 
> Well, as I said, I've already dealt with them, using .pth files, for the 
> use cases I care about.  Ian Bicking and Jim Fulton have also gone farther 
> with work on tools to create environments with greater isolation or more 
> fixed version linkages than what setuptools does.  (Setuptools-generated 
> environments dynamically select requirements based on available versions at 
> runtime, while Ian and Jim's tools create environments whose inter-package 
> linkages are frozen at installation time.)

All of these cases could be handled by a properly designed package
registry mechanism.


> >and I think that it could be far nicer to deal with than the
> >current sys.path hackery.
> 
> I'm not sure of that, since I don't yet know how your approach would deal 
> with namespace packages, which are distributed in pieces and assembled 
> later.  For example, many PEAK and Zope distributions live in the peak.* 
> and zope.* package namespaces, but are installed separately, and glued 
> together via __path__ changes (see the pkgutil docs).

    packages.register('zope', '/path/to/zope')

And if the installation path is different:

    packages.register('zope.subpackage', '/different/path/to/subpackage/')

Otherwise the importer will know where the zope (or peak) package exists
in the filesystem (or otherwise), and search it whenever 'from zope
import ...' is performed.
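The register() calls above could be as simple as appending to a list
per package name, which is what lets one name span several install
locations (a sketch; the internal representation is my assumption):

```python
# Sketch of packages.register(): one package name may map to several
# filesystem locations, which is exactly what namespace packages like
# zope.* and peak.* need.
registry = {}

def register(name, path):
    registry.setdefault(name, []).append(path)

register("zope", "/path/to/zope")
register("zope.subpackage", "/different/path/to/subpackage/")
```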


> Thus, if you are talking about a packagename->importer mapping, it has to 
> take into consideration the possibility of multiple import locations for 
> the same package.

Indeed.  But this is no different from the "multiple import
locations for any absolute import" in all Pythons.  Only now we don't
need to rely on sys.path, .pth, PYTHONPATH, or monkey patching site.py,
and we don't need to be adding packages to the root of the absolute
import hierarchy: I can add my own package/module to the email package
if I want, and I don't even need to bork the system install to do it.
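The email example would fall out of a longest-prefix lookup over the
registry: a user entry for a submodule shadows only that name, while
the rest of the package still resolves to the system install.  A
minimal sketch, assuming a flat {dotted_name: path} registry:

```python
def lookup(registry, dotted_name):
    """Return the location registered for the most specific prefix of
    dotted_name, so a user's 'email.mymod' entry wins over the stdlib
    'email' entry without touching the system install."""
    parts = dotted_name.split(".")
    for i in range(len(parts), 0, -1):
        prefix = ".".join(parts[:i])
        if prefix in registry:
            return registry[prefix]
    return None
```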


 - Josiah
