[Distutils] Metadata fields

Amos Latteier amos@digicool.com
Mon Mar 12 16:13:01 2001


Andrew Kuchling wrote:
> 
> On Mon, Mar 12, 2001 at 12:24:02AM -0500, Amos Latteier wrote:
> >The distutils currently has both a description and a
> >long_description. I think that both are useful.
> 
> What's the distinction?  Maybe: description is a single line, and
> long_description is one or more full paragraphs?

Exactly. Having both is useful, since the name alone is often not enough
to identify a package. A one-line description is handy when browsing
lists of packages, etc. We could drop the separate description field if
there were a simple way to get a one-line description from a long
description. One solution is to just use the first line of the long
description. This sounds OK, but in practice I bet it wouldn't work well.
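
A minimal sketch of the "first line" idea, to show where it falls short.
The function name one_line_summary and the max_length parameter are
made up for illustration; nothing like this exists in the Distutils.

```python
def one_line_summary(long_description, max_length=80):
    """Return the first non-blank line of long_description.

    Hypothetical helper: lines longer than max_length are cut and
    marked with "..." so the result always fits on one line.
    """
    for line in long_description.splitlines():
        line = line.strip()
        if line:
            if len(line) > max_length:
                return line[:max_length - 3] + "..."
            return line
    # No usable text at all.
    return ""


if __name__ == "__main__":
    text = "Tools for parsing and generating\nRFC 822 messages.\n"
    # The first line is only a sentence fragment, which is the weakness
    # of this approach when the packager wraps the text by hand.
    print(one_line_summary(text))
```

With hand-wrapped text the result is whatever happened to fit on the
first line, which is why an explicit one-line field seems safer.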

> >It also has a couple "derived" fields - contact and
> >contact_email are set to either the maintainer (if
> >available) or the author. It also has a fullname field which
> >is name-version. I think that we can dispense with these
> >"derived" fields.
> 
> Hmm... if get_contact_email() is removed, then users of the
> DistributionMetadata class will have to essentially repeat the same
> 'if maintainer is not None: <use maintainer> else: <use author>'
> logic, so I think it's worth keeping.  A similar argument applies to
> fullname.  I don't believe you can say fullname='blah' in the setup()
> call, can you?

I don't believe that currently you can set any derived meta-data.
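
For reference, the derived fields under discussion amount to roughly
the following. This is a sketch only: PackageInfo is a hypothetical
stand-in, not the actual DistributionMetadata class, and the attribute
names are assumed from the field names mentioned above.

```python
class PackageInfo:
    """Hypothetical metadata holder with static and derived fields."""

    def __init__(self, name, version, author=None, author_email=None,
                 maintainer=None, maintainer_email=None):
        # Static fields: supplied by the packager.
        self.name = name
        self.version = version
        self.author = author
        self.author_email = author_email
        self.maintainer = maintainer
        self.maintainer_email = maintainer_email

    # Derived fields: computed from the static ones, never set directly.
    def get_contact(self):
        if self.maintainer is not None:
            return self.maintainer
        return self.author

    def get_contact_email(self):
        if self.maintainer_email is not None:
            return self.maintainer_email
        return self.author_email

    def get_fullname(self):
        return "%s-%s" % (self.name, self.version)
```

Keeping these as methods spares every client from repeating the
maintainer-or-author test, without letting a packager set them by hand.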

It feels to me that things like contact, download_url, last modified
date, size, etc. are more method-ish than meta-data-ish. Maybe this is an
artificial distinction. Meta-data feels to me like static data that a
packager provides, while these other pieces of information seem to
require calculation on the part of the catalog. Maybe we shouldn't foist
this distinction on the catalog client, though...
 
> >I wonder about the download link. The distribution packager
> >may not know what this is, assuming that the software can be
> >downloaded from the catalog. Maybe this field is set by the
> >catalog. Or maybe it doesn't belong in the meta-data.
> 
> <light bulb goes on> Good point!  It probably shouldn't, since the
> metadata tells you about the package that you have right here as a
> tarball/RPM/whatever and the package doesn't need to know where it's
> supposed to live on the Web.
> 
> If we make this decision, some consequences are that the multiple
> implementations permitted by PPD, and hence the need to use XML
> instead of a simple text file, both vanish from the Distutils' field
> of view.  (Not from the catalog's field of view, of course.)
> ActiveState's process would then look something like: grab a
> distribution, parse its metadata, convert the metadata to PPD adding
> extra things such as the location the package came from, and use the
> PPD from that point onward.

Great point. Using a format like PPD or other known XML formats makes
much more sense for communication between the catalog and its clients
than for communication between a package and the catalog.
 
> Here's another thing I'm not sure of: does
> METADATA go in both source and binary distributions?  If so, where?
> Strawman convention: metadata goes in a METADATA file in the top
> directory of source dists.  In binary dists... it should really drop
> into a Python package database, shouldn't it?

I'm not sure what you mean by Python package database. In general, it
will be hard to put meta-data in a binary distribution in a way that the
catalog can easily retrieve it. So I agree that metadata goes in source
distributions. The upshot is that if you want to distribute your binary
package through the catalog you'll need to also provide a source package,
or else provide the metadata manually somehow. I think that this is
fine for now.
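
On the catalog side, pulling the file out of a source distribution
could look something like the sketch below. It assumes the strawman
convention quoted above (a file named METADATA directly inside the
single top-level directory of a tar archive) and assumes the file is
UTF-8 text; the function name read_metadata_from_sdist is made up, and
the format of the file's contents is left open.

```python
import tarfile


def read_metadata_from_sdist(fileobj):
    """Return the text of <topdir>/METADATA from a tar archive, or None.

    fileobj is a seekable binary file object holding a (possibly
    compressed) tar archive.
    """
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive.getmembers():
            parts = member.name.strip("/").split("/")
            # Accept only "pkg-1.0/METADATA": one directory, then the file.
            if member.isfile() and len(parts) == 2 and parts[1] == "METADATA":
                return archive.extractfile(member).read().decode("utf-8")
    # The archive carries no METADATA file in the expected place.
    return None
```

A binary distribution with no such file would simply yield None, and
the catalog would have to ask for the metadata some other way.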

> >I think that it's worth having author information separate
> >from package information. I also think that email address is
> >a good author id. This probably means that the catalog
> >system will be in charge of managing author meta-data, while
> >package meta-data will be managed by distribution packagers.
> 
> So, should we adopt the convention that the e-mail address returned by
> get_contact_email() can be used as an author-unique ID by external
> cataloguers?  I'll put that in the PEP if the idea meets with
> approval.

This seems reasonable to me. Also, it doesn't matter if for some reason
your email address is different for different packages you release. So
long as you let the catalog know about all your email addresses, things
should be fine. The only problem is if two developers claim to have the
same email address.
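
A rough sketch of how a catalog might keep track of this. Everything
here is hypothetical (the AuthorRegistry class, its method names, and
the choice to compare addresses case-insensitively); it only
illustrates several addresses mapping to one author and the conflict
case.

```python
class AuthorRegistry:
    """Hypothetical catalog-side table mapping email addresses to authors."""

    def __init__(self):
        self._owner_by_email = {}

    @staticmethod
    def _normalize(email):
        # Assumption: addresses are compared ignoring case and whitespace.
        return email.strip().lower()

    def register(self, author_id, email):
        """Record that email belongs to author_id.

        Raises ValueError if a different author already claimed it.
        """
        key = self._normalize(email)
        owner = self._owner_by_email.setdefault(key, author_id)
        if owner != author_id:
            raise ValueError("%s is already claimed by %s" % (key, owner))

    def author_for(self, email):
        """Return the author id for email, or None if it is unknown."""
        return self._owner_by_email.get(self._normalize(email))
```

An author registers each address once; the catalog then resolves the
contact email of any package back to that author, and a second claim
on the same address is rejected rather than silently overwritten.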

-Amos

--
Amos Latteier         mailto:amos@digicool.com
Digital Creations     http://www.digicool.com