[Distutils] "Python Package Management Sucks"

Toshio Kuratomi a.badger at gmail.com
Thu Oct 2 19:33:48 CEST 2008


Phillip J. Eby wrote:
> At 07:14 PM 10/1/2008 -0700, Toshio Kuratomi wrote:
>> In terms of implementation I'd much rather see something less centered
>> on the egg being the right way and the filesystem being a secondary
>> concern.
> 
> Eggs don't have anything to do with it; in Python, it's simply common
> sense to put static resources next to the code that uses them, if you
> want to "write once, run anywhere".  And given Python's strength as an
> interactive development language with no "build" step, having to
> *install* your data files somewhere else on the system to use them isn't
> a *feature* -- not for a developer, anyway.
> 
You're arguing about the developer's point of view on something that's
hidden behind an API.  You've already made it so that the developer
cannot just reference the file on the filesystem because the egg may be
zipped.  So for the developer there's no change here.
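
As a small illustration of that point, here is a minimal sketch that reads
a data file through the standard library's pkgutil.get_data() instead of a
filesystem path.  The package name demo_pkg, the archive name demo.zip and
the file contents are made up for the example; only pkgutil.get_data()
itself is a real API.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def build_zipped_package(directory):
    """Create a throwaway zip archive holding a package plus one data file."""
    archive = os.path.join(directory, "demo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_pkg/__init__.py", "")
        zf.writestr("demo_pkg/data/greeting.txt", "hello from inside the zip\n")
    return archive


# Put the archive itself on sys.path, the way a zipped egg would be.
tmpdir = tempfile.mkdtemp()
sys.path.insert(0, build_zipped_package(tmpdir))

# No real file named .../demo_pkg/data/greeting.txt exists on disk, so
# open() on a path cannot work; the loader-backed API still can.
greeting = pkgutil.get_data("demo_pkg", "data/greeting.txt").decode("utf-8")
print(greeting, end="")
```

The calling code is the same whether the package is zipped or unpacked,
which is why moving the data elsewhere behind the API costs the developer
nothing extra.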

I'm saying that there's no need to have a hardcoded path to look up the
information at and then make the install tool place "forwarding
information" there to send the package somewhere else.  We have
metadata.  We should use it.
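
A rough sketch of what using the metadata could look like, using the
file:///usr/share/locale and egg:locale forms from earlier in this thread.
Everything else here is an assumption for illustration: the two location
tables, the resolve_resource() helper and the /opt/app.egg root are
hypothetical and not part of setuptools or pkg_resources.  Paths are
treated as POSIX-style to keep the example short.

```python
import posixpath
from urllib.parse import unquote, urlparse

# Hypothetical metadata an install tool might write at install time.
FHS_LOCATIONS = {"locale": "file:///usr/share/locale"}
EGG_LOCATIONS = {"locale": "egg:locale"}


def resolve_resource(locations, resource_type, name, egg_root):
    """Map (resource type, name) to a path using the recorded location."""
    location = locations[resource_type]
    parsed = urlparse(location)
    if parsed.scheme == "file":
        # Absolute location chosen by the installer, e.g. an FHS directory.
        base = unquote(parsed.path)
    elif parsed.scheme == "egg":
        # Location relative to wherever the egg itself lives.
        base = posixpath.join(egg_root, parsed.path)
    else:
        raise ValueError("unsupported location: %r" % location)
    return posixpath.join(base, name)


catalog = "de/LC_MESSAGES/app.mo"
fhs_path = resolve_resource(FHS_LOCATIONS, "locale", catalog, "/opt/app.egg")
egg_path = resolve_resource(EGG_LOCATIONS, "locale", catalog, "/opt/app.egg")
print(fhs_path)
print(egg_path)
```

The lookup code never hardcodes a path; it only follows whatever the
install tool recorded, so no forwarding information is needed.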

> And our hypothetical de-jure standard won't replace the de-facto
> standard unless it's adopted by developers...  and it won't be adopted
> if it makes their lives harder without a compensating benefit.  For the
> developer, FHS support is a cost, not a benefit, and only relevant to a
> subset of platforms, so the spec should make it as transparent for them
> as possible, if they don't have an interest in explicit support for it. 
> By the STASCTAP principle (Simple Things Are Simple, Complex Things Are
> Possible), it should be possible for distros to relocate, and simple for
> developers not to care about it.
> 
It's both a cost and a benefit.  The cost is having to use an API which
they have to use anyway due to eggs possibly being zip files.  The
benefit is getting their code packaged by Linux distributors quicker and
getting more contributors as a result of the exposure.

> 
>>   We should have metadata that tells us where the types of
>> resources come from.  When a package is installed on Linux the metadata
>> could point locales at file:///usr/share/locale.  On Windows it could
>> point at egg:locale.  (Perhaps the uninstalled case would use this
>> too... that depends on how the egg structure and metadata evolve.)
>>
>> A question we'd have to decide is whether this particular metadata is
>> something that should be defined globally or per package.  Or globally
>> with a chance for packages to override it.
> 
> I think install tools should handle it and keep it out of developers'
> hair.  We should of course distinguish configuration and other writable
> data from static data, not to mention documentation.  Any other
> file-related info is going to have to be optional, if that.  I don't
> really think it's a good idea to ask developers to fill in information
> they don't understand.  A developer who works entirely on Windows, for
> example, is not going to have a clue what to specify for FHS stuff, and
> they absolutely shouldn't have to if all they're doing is including some
> static data.
> 
Needing to have some information about the files you ship is inevitable.
Documentation is a good example.  man pages, License.txt, gnome help
files, windows help files, API docs, sphinx docs, etc each have to be
installed in different places, some with requirements to register the
files so the system knows they exist.  All the knowledge about what to
do with these files should be placed in the tool.  But the knowledge of
what type to mark a given file with will have to lie with the developer.
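
To make the split concrete, here is a minimal sketch under stated
assumptions: the type names, the two policy tables and install_root() are
all hypothetical, and the "bundled" layout is invented.  Only the
/usr/share/man, /usr/share/doc and /usr/share/locale directories come from
the FHS.

```python
# Developer side: tag each shipped file with a type, nothing more.
FILE_TYPES = {
    "doc/app.1": "man",
    "License.txt": "license",
    "po/de.mo": "locale",
}

# Tool side: hypothetical placement policies, one per kind of install.
POLICIES = {
    "fhs": {
        "man": "/usr/share/man",
        "license": "/usr/share/doc/{name}",
        "locale": "/usr/share/locale",
    },
    "bundled": {
        "man": "{name}.egg/docs",
        "license": "{name}.egg/docs",
        "locale": "{name}.egg/locale",
    },
}


def install_root(path, policy, name):
    """Return the directory tree a tagged file belongs under for a policy."""
    file_type = FILE_TYPES[path]
    return POLICIES[policy][file_type].format(name=name)


for shipped in sorted(FILE_TYPES):
    print(shipped, "->", install_root(shipped, "fhs", "app"))
```

A developer who works only on Windows fills in FILE_TYPES and never sees
the FHS table; the distributor's tool picks the policy.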

> Even today, there exist Python developers who don't use the distutils to
> distribute their packages, so anything that makes it even more difficult
> than it is today, isn't going to be a viable standard.  The closer we
> can get in ease of use to just tarring up a directory, the more viable
> it'll be.  (That's one reason, btw, why setuptools offers revision
> control support and find_packages() for automating discovery of what to
> include.)
> 
Actually, as a person who distributes upstream packages which don't use
distutils and is exposed to others, I'd say that the shortcomings in
terms of where to install files and how to reference the files after
install are one of the reasons that distutils is not used.  Are there
other reasons?  Sure.  But this is definitely one of the reasons.

> 
>> > I'd have preferred to avoid that complexity, but if the two of us can't
>> > agree then there's no way on earth to get a community consensus.
>> >
>> > Btw, pkg_resources' concept of "metadata" would also need to be
>> > relocatable, since e.g. the "EggTranslations" package uses that
>> > metadata to store localizations of image resources and message
>> > catalogs.  (Other uses of the metadata files also include scripts,
>> > dependencies, version info, etc.)
>> >
>> Actually, we should decide whether we want to support that kind of thing
>> within the egg metadata at all.  The other things we've been talking
>> about belonging in the metadata are simple key value pairs.
>> EggTranslations uses the metadata area as a data store.  (Or in your
>> definition, a resource store).  This breaks with the definition of what
>> metadata is.  Translations don't store information about a package, they
>> store alternate views of data within the package.
> 
> I was actually somewhat incorrect in my statement about the distinction
> between pkg_resources "metadata" and "resources"; "metadata" is really
> "data that goes with the distribution, not with a specific package
> within the distribution".  Only some of this data is "about" the
> distribution; the rest is data "with" or "of" the distribution.  (This
> is a slight API wart, but the use case exists nonetheless.)
> 
> Meanwhile, regarding the proposed key-value pairs system, I don't see
> how that works; "extras" dependency information and entry points are a
> bit more structured than just key-value pairs; both are currently
> represented as .ini-like files with arbitrary section names.  I suppose
> you could squash those entire files into values in some sort of
> key-value system, but that seems a bit hairy to me.  In particular, the
> reason setuptools uses separate metadata files is that many of these
> things don't need to be loaded at the same time.  Also,
> PKG-INFO-style metadata can contain rather large blobs of text that
> aren't needed or useful at runtime.  Entry points and extras are mostly
> runtime metadata, with the occasional bit of build or install usage.
> 

Structured, yes.  Structure and optimizations to how you look up the data
are good.  But there is a difference between using metadata to save and
look up configuration and using metadata to save and look up data (like
locale files).  You wouldn't save data into gconf or the Windows
Registry for instance (at least, not if you don't expect people to make
fun of you *cough*evolution*cough*).
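
For reference, the .ini-like layout with arbitrary section names mentioned
above is still small, structured configuration rather than a data store.
A minimal sketch of reading such a file with the standard library's
configparser follows; the file contents, the app.plugins group and the
load_entry_points() helper are made up for the example.

```python
import configparser

# A made-up file in the style of an entry points listing.
ENTRY_POINTS_TXT = """\
[console_scripts]
app = app.cli:main

[app.plugins]
CSV = app.plugins.csv:Exporter
"""


def load_entry_points(text):
    """Parse the text into {section name: {entry name: target}}."""
    # Split only on "=" so the ":" inside "module:attr" stays in the value.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry names case-sensitive
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


entry_points = load_entry_points(ENTRY_POINTS_TXT)
print(entry_points["console_scripts"]["app"])
```

Each value is a short pointer to code, which is quite different from
storing message catalogs or images in the same area.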

OTOH if it's not really a metadata store vs a resource store but instead
a package store vs a distribution store we need to decide if we really
want to have both.  Someone pointed out earlier that

Side note: the fact that someone wrote EggTranslations speaks of a need
for people to be able to access the per-package data store across
packages.  Let's fix that and work with EggTranslations to rewrite its
backend to use proper storage.  (Looking at the EggTranslations
documentation, it might even be a proper place for getting ideas and
help with designing the API for a public data store.)

-Toshio
