[Distutils] Install time prefixes and data files

Wolodja Wentland wentland at cl.uni-heidelberg.de
Wed Nov 11 16:24:14 CET 2009


Hi all,

I had a discussion with Tarek on IRC yesterday and would like to bring
some points to your attention and also write down some conclusions/ideas
we had yesterday.

I think that the distutils module has a serious flaw that needs to be
addressed before any further work is done on getting PyPI into better
shape. The main issue is that there are many ways to customise the
installation of a distribution, but absolutely no way to retrieve this
information within the setup script or within your programs/libraries.

tl;dr:

save prefixes in .egg-info/PREFIX and create methods for Distribution to
retrieve these and data files

Install prefixes / Installation schemes
---------------------------------------

There are basically five installation schemes:
            
1. Default
2. User
3. Home
4. Prefix
5. Custom

Taken together, these installation schemes enable the user to install
pure libraries, non-pure libraries, data files and scripts in exactly
the places s/he chooses. But this also means that it is impossible to
make any assumptions about these locations in your code.
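
To make the point concrete, here is a minimal sketch that lists where
each scheme would put things. It assumes the sysconfig module that later
Python versions ship (distutils itself offers nothing comparable), and
the helper name paths_by_scheme is made up for illustration:

```python
import sysconfig


def paths_by_scheme(keys=("purelib", "platlib", "scripts", "data")):
    """Return {scheme name: {path key: unexpanded path template}}."""
    table = {}
    for scheme in sysconfig.get_scheme_names():
        # expand=False leaves the placeholders unexpanded, so the
        # differences between the schemes are easy to compare.
        paths = sysconfig.get_paths(scheme, expand=False)
        table[scheme] = {key: paths[key] for key in keys if key in paths}
    return table


if __name__ == "__main__":
    for scheme, paths in sorted(paths_by_scheme().items()):
        print(scheme)
        for key, template in paths.items():
            print("    %-8s %s" % (key, template))
```

The output differs per platform and per scheme, which is exactly why
code cannot hard-code any of these locations.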

I think it is of utmost importance to work on an implementation that
enables users to query this information:

* at build time (within setup.py)
* at run time (within scripts/libraries)

The major problem today is that the information about where the files of
a given distribution are placed is available *only* at build time and is
not saved anywhere. And even at build time you have to get hold of the
finalised install command to query that information, which is suboptimal
and counter-intuitive.

Data files
----------

Data files are especially problematic in this respect because the FHS
mandates that they are *not* placed within the packages but at exactly
defined places in the file system. The API available in the *stdlib* to
retrieve data files, pkgutil.get_data(), cannot cope with this and is
therefore not usable.

This leads to the following problems:

* Linux package maintainers have to implement a lot of patches if they
  want to place data files in the correct places.

* Python developers have to either (i) save the various locations at
  install time in a place where they can retrieve that information later
  or (ii) accept that their library/program will break if one of the
  fancier installation schemes is used.

* Installed Python distributions end up violating the FHS
* ...

Proposal
--------

I propose three things:

1. Save installation prefixes for every installed distribution
2. Define standard infixes for typical data file classes
3. Implement an API that is able to query that information/retrieve data
  files

1 and 3. Discussion

The installation prefixes could, for example, be saved in a suitable
file in .egg-info/, which would require changes to PEP 376. I could think
of a PREFIX file and the following methods/attributes for, e.g., the
Distribution class:

* get_install_prefixes()    -> returns a dictionary with all prefixes for
                               this distribution

* get_data_file(type, name) -> get data file of given type and name (see
                               below)
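
A minimal sketch of how these two methods might behave. Nothing here is
an existing distutils API: the class name InstalledDistribution, the
demo() helper and the "name = path" line format of the PREFIX file are
all assumptions made for illustration.

```python
import os
import tempfile


class InstalledDistribution(object):
    """Hypothetical stand-in for a Distribution that has a PREFIX file."""

    def __init__(self, egg_info_dir):
        self.egg_info_dir = egg_info_dir

    def get_install_prefixes(self):
        """Return a dict with all prefixes recorded at install time."""
        prefixes = {}
        with open(os.path.join(self.egg_info_dir, "PREFIX")) as handle:
            for line in handle:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                # Assumed format: one 'name = path' entry per line.
                key, _, value = line.partition("=")
                prefixes[key.strip()] = value.strip()
        return prefixes

    def get_data_file(self, type, name):
        """Open data file `name` of class `type`, e.g. '$configuration'."""
        directory = self.get_install_prefixes()[type.lstrip("$")]
        return open(os.path.join(directory, name), "r")


def demo():
    """Fake an installation in a temporary directory and read from it."""
    with tempfile.TemporaryDirectory() as root:
        egg_info = os.path.join(root, "foo.egg-info")
        conf_dir = os.path.join(root, "etc")
        os.makedirs(egg_info)
        os.makedirs(conf_dir)
        with open(os.path.join(conf_dir, "my.conf"), "w") as handle:
            handle.write("answer = 42\n")
        with open(os.path.join(egg_info, "PREFIX"), "w") as handle:
            handle.write("configuration = %s\n" % conf_dir)
        dist = InstalledDistribution(egg_info)
        with dist.get_data_file("$configuration", "my.conf") as handle:
            return handle.read()
```

The program never needs to know which installation scheme was used; it
only asks the distribution for a file of a given class.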

2. Discussion

The FHS differentiates between various classes of files and defines
proper locations for them. We could define platform-dependent
standard infixes for the following data file classes for distribution
foo:

* configuration     etc
* shared data       usr/share/foo/
* readme            usr/share/foo/README        README, TODO, ... could
                                                be automatically
                                                discovered
* examples          usr/share/foo/examples
* documentation     usr/share/foo/doc
* man files         usr/share/man
* variable          var/lib/foo
* ...

There should be a set of platform-dependent prefixes for all classes we
agree on *and* it should be possible to change/set their default values
with, for example, environment variables or command line options.

I call them infixes and not prefixes here, because applications like
virtualenv might want to define a different root for these files than
the system default one.
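
A minimal sketch of how an infix, a root and an override could be
combined. The defaults are taken from the table above for a POSIX-like
layout; the key spellings, the function name resolve_location and the
PYDATA_* environment variables are hypothetical and only serve as an
illustration.

```python
import os
import posixpath

# Infix defaults for a POSIX-like layout; '{name}' stands for the
# distribution name (foo in the table above).
DEFAULT_INFIXES = {
    "configuration": "etc",
    "shared_data": "usr/share/{name}",
    "examples": "usr/share/{name}/examples",
    "documentation": "usr/share/{name}/doc",
    "man": "usr/share/man",
    "variable": "var/lib/{name}",
}


def resolve_location(kind, name, root="/", environ=None):
    """Join `root` with the infix for `kind`, honouring overrides."""
    environ = os.environ if environ is None else environ
    # Hypothetical override variable, e.g. PYDATA_CONFIGURATION.
    override = environ.get("PYDATA_" + kind.upper())
    infix = override if override is not None else DEFAULT_INFIXES[kind]
    return posixpath.join(root, infix.format(name=name))
```

With the system root, resolve_location("examples", "foo") gives
"/usr/share/foo/examples"; something like virtualenv would pass its own
root and get the same infix below it.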

Tarek suggested using this scheme like this:

setup(
    ...
    data_files = [
        ('$configuration', ['data/my.conf', 'data/default.conf']),
        ('$examples', ['doc/examples/do_foo.sample']),
        ('/i/want/file/here', ['custom_placed_file']),
        ...
        ]
    )

The placeholders will then be replaced with their current value at
installation time and the locations written to .egg-info/PREFIX.
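
A minimal sketch of that substitution step, using string.Template from
the stdlib. The function names and the "name = path" layout of the
PREFIX file are assumptions, not an agreed format.

```python
from string import Template


def expand_data_files(data_files, locations):
    """Replace $placeholders in data_files targets with real directories."""
    expanded = []
    for target, sources in data_files:
        # Targets without a placeholder (e.g. absolute paths) pass
        # through unchanged; an unknown placeholder raises KeyError.
        directory = Template(target).substitute(locations)
        expanded.append((directory, list(sources)))
    return expanded


def format_prefix_file(locations):
    """Render the locations as 'name = path' lines (assumed format)."""
    return "".join("%s = %s\n" % item for item in sorted(locations.items()))
```

Raising on unknown placeholders is a deliberate choice here: a typo in
setup.py should fail at installation time rather than silently create a
directory literally named '$exmaples'.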

This would allow/ease a couple of things, namely:

* automatic placement of typical files in the right locations
* much less work for packaging python applications on *nix distributions
  -> debhelper could take advantage of all these for example

* automatic tests/fixers for different file types
  
  I am thinking of automatic lintian-style tests on PyPI here that will
  reject a package if it does not conform to a standard, or at least warn
  the user about it

* retrieval of certain types of data easily at run time

  Distribution.get_data_file('$configuration', 'my.conf') -> file object

* ...
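
The lintian-style test mentioned above could start out as small as the
following sketch. The function name check_data_files and the two rules
it applies are assumptions for illustration; which classes count as
known would have to be agreed on first.

```python
import re

_PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def check_data_files(data_files, known_classes):
    """Return warnings for data_files targets that look non-standard."""
    warnings = []
    for target, _sources in data_files:
        names = _PLACEHOLDER.findall(target)
        for name in names:
            if name not in known_classes:
                warnings.append("unknown data file class: $%s" % name)
        # A target without any placeholder that is an absolute path
        # bypasses the scheme entirely.
        if not names and target.startswith("/"):
            warnings.append("hard-coded absolute path: %s" % target)
    return warnings
```

Whether such a finding rejects the upload or only warns the user is a
policy decision and not part of the check itself.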

thanks!

-- 
  .''`.     Wolodja Wentland    <wentland at cl.uni-heidelberg.de> 
 : :'  :    
 `. `'`     4096R/CAF14EFC 
   `-       081C B7CD FF04 2BA9 94EA  36B2 8B7F 7D30 CAF1 4EFC