[Distutils] [buildout] path usage and proposal

Jim Fulton jim at zope.com
Mon Jul 6 17:48:25 CEST 2009


For buildout, there are a number of use cases relevant to using system  
Python:

U1
  Use a system Python with all of the goodies and changes it provides.
  This works now if a buildout is bootstrapped with a system Python.

U2
  Use a system Python w/o 3rd-party additions.

U3
  Use a system Python w/o 3rd-party additions, but cherry-pick some
  3rd-party additions.  Note that this is especially attractive on
  systems (like Ubuntu) that make 3rd-party packages available in
  separate locations.

Right now, buildout deals with paths in a somewhat ad-hoc manner. There
are two paths:

- The install path, which normally includes develop-eggs and eggs.  It
  may also include the buildout and setuptools distribution locations.

- The system path.

Buildout searches the install path for distributions that are already
installed. The intent is to skip searching external sources if a
distribution is already installed.  Unfortunately, buildout also
unwittingly searches sys.path for installable packages.  This is
because package indexes take a search_path argument that specifies a
path to search for installed distributions.  It defaults to sys.path,
and buildout doesn't pass it, so buildout mistakenly searches sys.path
when looking for new packages.  This, of course, is silly.
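The pitfall is easy to reproduce in miniature. In setuptools, the package index class derives from pkg_resources.Environment, whose search_path argument falls back to sys.path when it is omitted. The class below is a hypothetical stand-in that only mimics that defaulting rule; it is not the setuptools API.

```python
import sys
from typing import List, Optional, Sequence


class HypotheticalIndex:
    """Hypothetical stand-in for a package index that also scans a
    search path for already-installed distributions."""

    def __init__(self, search_path: Optional[Sequence[str]] = None) -> None:
        # Omitting the argument means "scan sys.path"; an explicit empty
        # list must mean "scan nothing", so test for None, not falsiness.
        if search_path is None:
            search_path = sys.path
        self.search_path: List[str] = list(search_path)


# What happens when the caller does not pass a search path:
implicit = HypotheticalIndex()
# What buildout should do when looking for uninstalled packages:
explicit = HypotheticalIndex(search_path=[])

print(implicit.search_path == sys.path)  # True: sys.path leaks into the search
print(explicit.search_path)              # []
```

A `search_path or sys.path` shortcut would quietly turn the explicit empty list back into sys.path, which is why the sketch checks for None.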

Paths are also used by running scripts to search for packages.  When
buildout generates a script, it generates a search path based on the
working set.  Note that it adds the working set locations to the base
path that the script starts with.
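A minimal sketch of such a preamble follows. It assumes the generated script prepends the working-set locations with a slice assignment; render_preamble is a hypothetical helper, not buildout's code, and the egg paths are made up. Because the slice assignment only prepends, whatever sys.path the interpreter started with (the base path) stays in place behind the eggs.

```python
import sys
from typing import Sequence


def render_preamble(ws_locations: Sequence[str]) -> str:
    """Return Python source that prepends working-set locations to sys.path."""
    lines = ["import sys", "sys.path[0:0] = ["]
    for location in ws_locations:
        lines.append("    %r," % location)  # repr() keeps quoting/escaping valid
    lines.append("    ]")
    return "\n".join(lines) + "\n"


preamble = render_preamble(["/buildout/eggs/foo-1.0-py2.6.egg",
                            "/buildout/develop-eggs/myapp"])
print(preamble)

# Run it to see that the starting path is kept, just pushed behind the eggs.
before = sys.path[:]
exec(compile(preamble, "<preamble>", "exec"), {})
after = sys.path[:]
sys.path[:] = before  # undo the demo's change to the real sys.path
```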

For existing behavior, we can work smarter by searching the base path
plus the current install paths for installed distributions, and by
passing an empty search path to the index used to look for uninstalled
distributions.
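The smarter lookup can be modeled with plain data. In this sketch, layout is a hypothetical map from path entries to the projects installed there, and fetch stands in for whatever consults external sources; both are assumptions made for illustration. The point is the split: installed distributions are looked up along the base path plus the install paths, and the fetcher is handed an empty search path so it never rescans sys.path.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set

Layout = Dict[str, Set[str]]             # path entry -> installed projects
Fetcher = Callable[[str, List[str]], str]  # (project, search_path) -> location


def find_installed(project: str, paths: Sequence[str],
                   layout: Layout) -> Optional[str]:
    """Return the first path entry where `project` is already installed."""
    for path in paths:
        if project in layout.get(path, set()):
            return path
    return None


def locate(project: str, base_path: Sequence[str],
           install_paths: Sequence[str], layout: Layout,
           fetch: Fetcher) -> str:
    # 1. Look for an installed distribution: base path + install paths.
    found = find_installed(project, list(base_path) + list(install_paths), layout)
    if found is not None:
        return found
    # 2. Only then go to external sources, with an *empty* search path.
    return fetch(project, [])


layout = {"/usr/lib/python2.6": {"setuptools"},
          "/buildout/eggs": {"zope.interface"}}
calls = []


def fake_fetch(project, search_path):
    calls.append((project, search_path))  # record what the "index" was given
    return "/buildout/eggs"


print(locate("zope.interface", ["/usr/lib/python2.6"], ["/buildout/eggs"], layout, fake_fetch))
print(locate("lxml", ["/usr/lib/python2.6"], ["/buildout/eggs"], layout, fake_fetch))
print(calls)  # [('lxml', [])]
```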

Note that not every distribution found on the base path is
importable.  In particular, an egg may be found on the base path, but
the egg itself must be added to the Python path to be importable.
The base path is used for 2 things:

- A place to look for installed distributions
- A place to import things from at run time.

It would be cleaner to separate these uses, although, for usability,
it might be simpler to keep them together.  There's probably little
harm in including a path in sys.path that *only* includes eggs.
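The difference between finding an egg on the base path and importing from it can be checked directly with a zipped egg. The sketch below builds a throwaway zip archive named like an egg (the file and module names are made up) and relies only on the standard library's zip-import support: putting the egg's directory on the path is not enough, while putting the egg itself there is.

```python
import importlib
import os
import sys
import tempfile
import zipfile
from typing import List, Tuple

MODULE = "eggdemo_mod"  # hypothetical module packaged inside the egg


def make_zipped_egg(directory: str) -> str:
    """Write a minimal zipped egg containing one module; return its path."""
    egg_path = os.path.join(directory, "eggdemo-1.0-py2.6.egg")
    with zipfile.ZipFile(egg_path, "w") as zf:
        zf.writestr(MODULE + ".py", "VALUE = 42\n")
    return egg_path


def importable(extra_path: List[str]) -> bool:
    """Try importing MODULE with extra_path prepended; restore state after."""
    saved = sys.path[:]
    sys.path[0:0] = extra_path
    importlib.invalidate_caches()
    try:
        importlib.import_module(MODULE)
        return True
    except ImportError:
        return False
    finally:
        sys.path[:] = saved
        sys.modules.pop(MODULE, None)


def demo() -> Tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as base_dir:
        egg = make_zipped_egg(base_dir)
        found_only = importable([base_dir])  # egg merely sits in a path entry
        on_path = importable([egg])          # egg itself is a path entry
    return found_only, on_path


print(demo())  # (False, True)
```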

So here's a proposal:

1. Add an include-site-packages option, defaulting to true.  If false:
   set the base path to those paths not added by running site.py.
   We'll get this by evaling the output of:

     python -Sc "import sys; print (repr(sys.path))"

   The base path will be used for searching for distributions *and* for
   computing the run-time path.  The script preamble will become
   something like::

      import sys
      sys.path[:] = ... # set sys.path to the base path
      for path in ...: # iterate over working set using expression we use now
          if path not in sys.path:
              sys.path.insert(0, path)

   Note:
   a. We completely replace sys.path
   b. If a distro's location is already on the base path, we don't
      insert it.  This helps avoid accidentally putting a base path
      ahead of egg-supplied paths.

2. Add a search-path option.  By default, it will be the base path.
   It can be manipulated in 2 ways:

   - It can be set.  If this is done, the base path will be ignored.

   - It can be incremented, for example, to cherry-pick the foo package
     on ubuntu:

       search-path += /usr/share/pycentral/foo/site-packages
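Proposal 1 can be sketched in two pieces, under stated assumptions: compute_base_path runs the target interpreter with or without -S and parses the printed list, and runtime_path mirrors the preamble's rule of not re-inserting locations already on the base path. Both function names are hypothetical. The sketch uses ast.literal_eval rather than eval, since the output is just a list literal.

```python
import ast
import subprocess
import sys
from typing import List, Sequence


def compute_base_path(python: str = sys.executable,
                      include_site_packages: bool = True) -> List[str]:
    """Return the sys.path that `python` starts with.

    With include_site_packages=False, -S stops site.py from running, so
    site-packages and .pth additions are left out.
    """
    flags = [] if include_site_packages else ["-S"]
    result = subprocess.run(
        [python, *flags, "-c", "import sys; print(repr(sys.path))"],
        capture_output=True, text=True, check=True)
    return ast.literal_eval(result.stdout.strip())


def runtime_path(base_path: Sequence[str],
                 ws_locations: Sequence[str]) -> List[str]:
    """Working-set locations first, then the base path, without duplicates."""
    new: List[str] = []
    for location in ws_locations:
        # Skip entries already on the base path: moving one to the front
        # would drag that whole directory ahead of the egg-supplied paths.
        if location not in base_path and location not in new:
            new.append(location)
    return new + list(base_path)


clean = compute_base_path(include_site_packages=False)
print(runtime_path(["/usr/lib/python2.6"],
                   ["/buildout/eggs/foo-1.0-py2.6.egg", "/usr/lib/python2.6"]))
# ['/buildout/eggs/foo-1.0-py2.6.egg', '/usr/lib/python2.6']
```

Collecting the new entries and prepending them as a block keeps them in working-set order; calling insert(0, ...) in a loop would reverse that order.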

So, in summary, we already handle U1.  Proposal 1 addresses U2.
Proposals 1 and 2 together address U3.

Note that proposal 2 would also make it easier to share prebuilt eggs
with clean Pythons.  So, for example, if you already have an lxml egg
built for a clean Python, you could add it to the search path to use
it in a buildout.
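Proposal 2's two forms, and the prebuilt-egg case, can be modeled as a small resolution rule. The resolve_search_path helper and its parameters are hypothetical: assigned holds an explicit search-path value (which replaces the base path) and added holds entries from search-path +=. How the two forms combine when both are used is an assumption here, and the egg path in the example is made up.

```python
from typing import List, Optional, Sequence


def resolve_search_path(base_path: Sequence[str],
                        assigned: Optional[Sequence[str]] = None,
                        added: Sequence[str] = ()) -> List[str]:
    """Effective search path: the base path unless explicitly set, plus additions."""
    start = base_path if assigned is None else assigned  # setting ignores base path
    return list(start) + list(added)


base = ["/usr/lib/python2.6"]

# Cherry-pick the foo package on Ubuntu (U3):
print(resolve_search_path(base, added=["/usr/share/pycentral/foo/site-packages"]))
# Reuse an lxml egg already built for a clean Python (hypothetical path):
print(resolve_search_path(base, added=["/home/me/eggs/lxml-2.2-py2.6-linux-i686.egg"]))
# Replace the base path entirely:
print(resolve_search_path(base, assigned=["/opt/shared-eggs"]))
```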

Thoughts? Questions?

Jim

--
Jim Fulton
Zope Corporation



