[Distutils] "Python Package Management Sucks"

Thu Oct 2 00:53:19 CEST 2008

Phillip J. Eby wrote:
> At 09:40 PM 10/1/2008 +0200, Josselin Mouette wrote:
>> On Wednesday, 1 October 2008 at 14:39 -0400, Phillip J. Eby wrote:
>> > >   We need to be able to mark locale, config, and data files in
>> > >the metadata.
>> >
>> > Sure...  and having a standard for specifying that kind of
>> > application/system-level install stuff is great; it's just entirely
>> > outside the scope of what eggs are for.
>>
>> I don't follow you. If the library needs these files to work, you
>> definitely want to ship them, whether it is as their FHS locations in a
>> package, or in the egg.
>
> Egg files aren't an all-purpose distribution format; they were designed
> for application plugins, and for libraries needed to support application
> plugins.  As such, they're self-contained and weren't designed for
> application-level installation support, such as documentation,
> configuration or data files, icons, etc.
>
> As has been pointed out, these are deficiencies of .egg files wrt the
> full spectrum of library and application installation needs, which is
> why I'm pushing for us working on an installation metadata standard that
> can accommodate these other needs that the .egg layout isn't really
> suited for.
>
We need to get the list of problems up somewhere on the wiki so that
people can check that the evolving standard doesn't fall into the same
pitfalls.  After all, people are using the egg and pkg_resources API for
just this purpose today; some are happy about it and others not so much.

> My main point about the resources is simply that it's a needless
> complication to physically separate static data needed by a library at
> runtime, based solely on its file extension, in cases where only that
> library will be reading that file, and the file's contents are constant
> for that version of the library.
>
> To put it another way, if some interpretation of the FHS makes a
> distinction between two files encoding the same data, one named foo.bar
> and foo.py, where the only difference between the two is the internal
> encoding of the data, then that interpretation of the FHS is not based
> on any real requirement, AFAICT.
>
Actually, file encoding is one major criterion in the FHS.  However, it's
probably not in the manner you're thinking of :-)  Files which are
architecture dependent generally need to be separated from files which
are architecture independent.  Since text files and binary data which
has a standard byte-oriented format are generally what's used as data
these days, it's the major reason that data files usually go in
/usr/share while libraries/binaries go in /usr/lib and /usr/bin.  This
is due to the range of computers that architecture dependent vs
architecture independent data can be shared with.  Of course, part of
python's site-packages on Linux systems violates this rule as python can
split architecture dependent and architecture independent packages from
one another.  I know that some distributions have debated moving the
architecture independent portion of site-packages to /usr/share although
I don't know if any have (Josselin, has Debian done this?).  The idea of
moving is not straightforward because of 1) compatibility with
unpackaged software and 2) /usr/share is seen in two lights: the place
for architecture independent files and the place for data; /usr/lib is
seen in two lights: the place for architecture dependent non-executables
and the place for code whose instructions are run by executables.
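
As a rough illustration of that split (a sketch only, not something from
this thread): the stdlib sysconfig module, which postdates this message,
reports one install directory for pure-Python packages ("purelib") and
one for platform-specific packages ("platlib").  The helper name
site_packages_split is hypothetical.

```python
import sysconfig


def site_packages_split():
    """Return the install directories Python uses for pure-Python
    (architecture independent) and platform-specific (architecture
    dependent) packages under the default install scheme."""
    paths = sysconfig.get_paths()
    return {
        "purelib": paths["purelib"],  # architecture independent modules
        "platlib": paths["platlib"],  # compiled extension modules
    }


if __name__ == "__main__":
    for kind, location in site_packages_split().items():
        print(kind, "->", location)
```

On many installs the two entries are the same directory; whether they
differ depends on the platform and on how the distribution built Python.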

> Of course, for documentation, application icons, and suchlike, the data
> *will* be read by things other than the library itself, and so a
> standardized location is appropriate.  The .egg format was designed
> primarily to support resources read only by the package in question, and
> secondarily to support metadata needed by applications or libraries that
> the package "plugs in" to.  It was not originally intended to be a
> general-purpose system package installation format.
>
<nod>.  Despite this design, it's presently being used for that.  So we
need to figure out what to do about it.

>
>> > To be clear, I mean here that a "file" (as opposed to a resource) is
>> > something that the user is expected to be able to read or copy, or
>> > modify.  (Whereas a resource is something that is entirely internal
>> > to a library, and metadata is information *about* the library itself.)
>>
>> It's not as simple as that. Python is not the only thing out there, and
>> there are many times where your resources need to be shipped in existing
>> formats, in files that land at specific places. For example icons go
>> in /usr/share/icons, locale files in .mo format in /usr/share/locale,
>> etc.
>
> And docs need to go in /usr/share/doc, I presume.

Docs are special in the packaging world on several accounts.  Generally
the packager has to collect at least some of the docs themselves (as
things like LICENSE.txt aren't normally included in a doc install but
are important for distributions to package).  rpm, at least, provides a
macro to make it easy for the packager to mark files and directories
from the source tree as documentation which rpm will put in the
appropriate directory itself.  So packagers often use an upstream's
build scripts to build the docs, but usually install the docs using the
package tool's facilities.

Additionally, there's a difference between docs which the program uses
(for instance for online help) and docs which the end user would have to
navigate the filesystem and invoke a viewer themselves to read.  The
former is data, the latter is docs.

> But these aren't
> necessarily "resources" in the way I'm defining the term.  Some of them
> *could* be, perhaps.  Others aren't.
>
> To be clear, what I'm trying to say is that it is a perfectly valid use
> case for a Python package author to have static data contained within
> their Python package directory layout for purposes of accessing that
> data as if it were code, but without having to go to the trouble of
> converting it to a .py file (and possibly having to extract it back out
> at runtime).  This usage of "data" files isn't in conflict with the FHS,
> as I understand it.
>
> But I also understand that there are other kinds of "data" files which
> *don't* fall under that use case, and which it is desirable to install
> to shared locations.  We need to support both.
>
Possibly.  We could definitely throw out the first case (resources) and
just have a data category and the FHS would be fine.  Whether there's a
case for resources depends on their definition.  The test of "Could be
put in a python file and extracted" doesn't fly.  I could convert all my
images to .xpm and put them in python files.  But that's a lot of work.
And the moment I take them back out to separate .xpm files, they would
definitely belong in /usr/share.
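
For reference, the first case discussed above (static data kept inside
the package directory and read only by that library) can be sketched
with the stdlib pkgutil.get_data call.  This is a minimal sketch under
stated assumptions, not the pkg_resources API the thread refers to: the
package name demo_pkg, the file name table.dat, and the helper name
read_packaged_resource are all hypothetical, and the package is built in
a temporary directory only so the example is self-contained.

```python
import importlib
import os
import pkgutil
import shutil
import sys
import tempfile


def read_packaged_resource():
    """Create a throwaway package with one data file, then read that
    file through the package's loader instead of a filesystem path."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_pkg")  # hypothetical package name
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("")
    with open(os.path.join(pkg_dir, "table.dat"), "wb") as f:
        f.write(b"constant data for this version\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        # The caller names the package and a relative resource path;
        # where the package physically lives is the loader's concern.
        return pkgutil.get_data("demo_pkg", "table.dat")
    finally:
        # Undo the temporary import state and remove the scratch files.
        sys.path.remove(root)
        sys.modules.pop("demo_pkg", None)
        shutil.rmtree(root, ignore_errors=True)


if __name__ == "__main__":
    print(read_packaged_resource())
```

Data meant to be read by other programs (icons, .mo files, docs) is the
second case and would not be looked up this way.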

-Toshio
