[Catalog-sig] metadata

Kapil Thangavelu k_vertigo@yahoo.com
Fri, 31 Aug 2001 11:39:38 -0700 (PDT)

I've started work on a fourth catalog implementation,
and I have a few questions and comments regarding
metadata that I'd like to resolve on the list.

I spent some time looking at the metadata of some
other packaging systems, namely Debian's apt, ACS's
apm, and the OSD format.

I see the overall goal of this project as creating a
Pythonic version of CPAN/apt that automates the
installation of new packages, including dependency
resolution. I think this goal is best achieved by
three separate pieces of software: the catalog
server, the distutils, and a catalog client sitting
on top of the distutils. The key to interoperability
among the three is shared package metadata.

Looking over PEP 241, I see several deficiencies that
I would like to address. While using RFC 822 for the
metadata definition does lower the burden on authors,
it is not extensible and leaves room for ambiguity in
the metadata, so I'd like to change this to an
XML-based format.
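To make the contrast concrete, here is a rough sketch that parses a PEP 241 style PKG-INFO block with Python's standard `email` parser and re-expresses it as nested XML with `xml.etree.ElementTree`. The element names (`package`, `authors`, `keyword`, ...) are hypothetical placeholders, not a proposed schema; the point is only that XML lets a field like Author or Keywords become a structured list instead of one flat string.

```python
from email.parser import Parser
import xml.etree.ElementTree as ET

# A PEP 241 style PKG-INFO file: flat RFC 822 headers, one value per field.
PKG_INFO = """\
Metadata-Version: 1.0
Name: example
Version: 0.1
Summary: An example package
Author: A. Developer
Author-email: dev@example.org
Keywords: xml parsing
"""


def parse_pkg_info(text):
    """Parse RFC 822 style headers into a flat dict of strings."""
    message = Parser().parsestr(text)
    return dict(message.items())


def to_xml(fields):
    """Convert the flat fields into a nested <package> element.

    Element and attribute names are hypothetical placeholders.
    """
    root = ET.Element("package", name=fields["Name"], version=fields["Version"])
    ET.SubElement(root, "summary").text = fields.get("Summary", "")
    # Authors become a list, so more than one could be recorded.
    authors = ET.SubElement(root, "authors")
    author = ET.SubElement(authors, "author")
    ET.SubElement(author, "name").text = fields.get("Author", "")
    ET.SubElement(author, "email").text = fields.get("Author-email", "")
    # Keywords become separate elements instead of one whitespace-split string.
    keywords = ET.SubElement(root, "keywords")
    for word in fields.get("Keywords", "").split():
        ET.SubElement(keywords, "keyword").text = word
    return root


if __name__ == "__main__":
    print(ET.tostring(to_xml(parse_pkg_info(PKG_INFO)), encoding="unicode"))
```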

Probably the biggest problem with adopting PEP 241 is
its lack of dependency information. Dependency
information should be version-specific and able to
vary by OS. Following Debian's apt, I'd like to add
multiple types of dependencies, basically mirroring
apt's, plus an additional type, 'EXTERNAL', to denote
non-Python dependencies.
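A minimal sketch of how a client might evaluate such typed dependencies, assuming dotted integer version strings, a single minimum-version constraint per dependency, and an optional platform tag. The kind names follow apt's Depends/Recommends/Suggests/Conflicts plus the proposed EXTERNAL; `Dependency` and `check` are hypothetical names, and apt's other relationships (Pre-Depends, Replaces, ...) are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# apt-like dependency kinds plus the proposed EXTERNAL kind (illustrative subset).
DEPENDENCY_KINDS = {"DEPENDS", "RECOMMENDS", "SUGGESTS", "CONFLICTS", "EXTERNAL"}


def parse_version(text):
    """Turn a dotted version like '0.6.6' into a comparable tuple (0, 6, 6)."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Dependency:
    kind: str
    name: str
    min_version: Optional[Tuple[int, ...]] = None
    platform: Optional[str] = None  # None means the dependency applies on every OS

    def __post_init__(self):
        if self.kind not in DEPENDENCY_KINDS:
            raise ValueError(f"unknown dependency kind: {self.kind}")

    def applies_to(self, platform):
        return self.platform is None or self.platform == platform


def check(dependencies, installed, platform):
    """Classify dependencies against `installed` (name -> version string).

    Returns (missing, conflicts, external) as lists of names.
    RECOMMENDS and SUGGESTS are advisory and are not reported here.
    """
    missing, conflicts, external = [], [], []
    for dep in dependencies:
        if not dep.applies_to(platform):
            continue
        have = installed.get(dep.name)
        if dep.kind == "DEPENDS":
            too_old = (have is not None and dep.min_version is not None
                       and parse_version(have) < dep.min_version)
            if have is None or too_old:
                missing.append(dep.name)
        elif dep.kind == "CONFLICTS" and have is not None:
            conflicts.append(dep.name)
        elif dep.kind == "EXTERNAL":
            # Non-Python dependencies cannot be installed by the catalog
            # client; they are only reported back to the user.
            external.append(dep.name)
    return missing, conflicts, external
```

Keeping EXTERNAL as report-only reflects the idea that the catalog client can resolve Python packages itself but can only tell the user about system libraries.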

One concept I've been debating is the introduction of
logical package keys for dependency tracking. This
would allow a single package to register itself as a
provider of multiple logical keys; say, Zope provides
ZPublisher, DTML, ZODB, ExtensionClass, etc. A package
would always have its own name as a logical key,
making the definition of other logical keys optional.
I'm not sure how useful this is for resolving
dependencies, though, since outside of some
standards-driven packages, packages with the same
logical keys will likely have different interfaces
and underlying implementation concerns. However, it
would allow for aggregate distributions like the current egenix
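A small sketch of the logical-key idea as a "provides" index, assuming each package lists its optional extra keys and implicitly provides its own name. The function names and the second package, `standalone-zodb`, are hypothetical. When a key has more than one provider, the client has to choose, which is exactly where the interface-compatibility concern above bites.

```python
from collections import defaultdict


def build_provides_index(packages):
    """Map each logical key to the set of package names providing it.

    `packages` maps a package name to its optional extra logical keys;
    every package implicitly provides its own name.
    """
    index = defaultdict(set)
    for name, extra_keys in packages.items():
        index[name].add(name)
        for key in extra_keys:
            index[key].add(name)
    return index


def providers(index, key):
    """Return the providers of `key`, sorted; more than one means a choice is needed."""
    return sorted(index.get(key, ()))


if __name__ == "__main__":
    index = build_provides_index({
        "zope": ["zpublisher", "dtml", "zodb", "extensionclass"],
        "standalone-zodb": ["zodb"],  # hypothetical separate distribution
    })
    print(providers(index, "zodb"))
```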

Another issue is the ability to define multiple
authors and a vendor. For example, the PyXML package
has multiple authors and a single vendor (XML-SIG).
The current description in PEP 241 is biased toward a
single author and introduces ambiguity regarding an
author's contact info.
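One possible shape for this, sketched as Python data classes rather than a schema: any number of authors, each with their own contact details, plus an optional vendor. The class names and the "prefer the vendor as contact" rule are assumptions for illustration; the author names in the test are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contact:
    name: str
    email: Optional[str] = None


@dataclass
class Credits:
    """Hypothetical metadata fragment: any number of authors plus an optional vendor."""
    authors: List[Contact] = field(default_factory=list)
    vendor: Optional[Contact] = None

    def primary_contact(self):
        """Prefer the vendor as the point of contact, else the first listed author."""
        if self.vendor is not None:
            return self.vendor
        return self.authors[0] if self.authors else None
```

Giving each author their own contact record removes the question of whose address a single Author-email field refers to.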

Using keywords for classification doesn't do an
adequate job, IMO, of allowing automated
classification within the catalog. While I'd still
like to keep keywords, I'd like to add hierarchical
categories.
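A minimal sketch of browsing by hierarchical category, assuming categories are written as slash-separated paths such as "Text Processing/Markup/XML" (the path syntax and category names are assumptions). Comparing whole path components, rather than string prefixes, keeps "Text" from accidentally matching "Text Processing".

```python
def split_path(path):
    """Split 'A/B/C' into ('A', 'B', 'C'), ignoring empty components."""
    return tuple(part.strip() for part in path.split("/") if part.strip())


def in_category(package_categories, category):
    """True if any of the package's category paths lies at or below `category`."""
    target = split_path(category)
    return any(split_path(p)[:len(target)] == target for p in package_categories)


def browse(catalog, category):
    """List package names (sorted) filed at or below `category`."""
    return sorted(name for name, cats in catalog.items() if in_category(cats, category))


if __name__ == "__main__":
    catalog = {
        "pyxml": ["Text Processing/Markup/XML"],
        "docutils": ["Text Processing/Markup"],
        "numeric": ["Scientific/Math"],
    }
    print(browse(catalog, "Text Processing"))
```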

There is also an assumption within PEPs 241 and 243
that I'd like to address, namely that the author of a
package will be the person who uploads it. At least
initially this is unlikely, especially during an
initial rush to fill up the repository via some
semi-automated extraction from the

Something else I was considering is some type of
globally unique identifier to allow information to be
replicated across different repositories. I was
thinking of something along the lines of a new URI
scheme that identifies a package on the basis of its
classification within the catalog... I'm a little
fuzzy on this.
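To pin the idea down a little, here is a sketch of a classification-based identifier, assuming a made-up, unregistered scheme name `pypkg` and a layout of catalog host, category path, package name, and version. Everything about the format is hypothetical; it only shows that such URIs can be built and parsed losslessly with the standard `urllib.parse` functions.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical, unregistered URI scheme used only for illustration.
SCHEME = "pypkg"


def make_package_uri(authority, category, name, version):
    """Build pypkg://authority/<category path>/<name>/<version>."""
    segments = category.split("/") + [name, version]
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{SCHEME}://{authority}/{path}"


def parse_package_uri(uri):
    """Inverse of make_package_uri: return (authority, category, name, version)."""
    parts = urlsplit(uri)
    if parts.scheme != SCHEME:
        raise ValueError(f"not a {SCHEME} URI: {uri}")
    *category, name, version = [unquote(s) for s in parts.path.lstrip("/").split("/")]
    return parts.netloc, "/".join(category), name, version
```

One drawback of tying identity to classification: if a package is recategorized, its URI changes, so replicating repositories would probably need to keep the old URIs as aliases.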

To facilitate software updates, I'd like to add a
release date to the metadata.

I'd also like to add the required Python version for
a given package version.
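As a sketch of how a client could use both fields together, assuming each release record carries a `release_date` and a dotted `requires_python` string (field names are hypothetical): filter out releases that need a newer Python than the one running, then pick the most recently released of the rest.

```python
import datetime


def parse_version(text):
    """Turn a dotted version like '1.5.2' into a comparable tuple (1, 5, 2)."""
    return tuple(int(part) for part in text.split("."))


def newest_compatible(releases, python_version):
    """Return the most recently released entry runnable on `python_version`.

    `releases` is a list of dicts with 'version', 'release_date'
    (a datetime.date) and 'requires_python' keys; returns None if none fit.
    """
    compatible = [r for r in releases
                  if parse_version(r["requires_python"]) <= python_version]
    if not compatible:
        return None
    return max(compatible, key=lambda r: r["release_date"])


if __name__ == "__main__":
    releases = [
        {"version": "1.0", "release_date": datetime.date(2001, 1, 15), "requires_python": "1.5.2"},
        {"version": "1.1", "release_date": datetime.date(2001, 8, 1), "requires_python": "2.1"},
    ]
    print(newest_compatible(releases, (2, 0))["version"])
```

Ordering by release date assumes the newest release is the preferred one; a maintenance release of an older branch would need the version number as a tiebreaker.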

I'll try to write up an XML schema that defines this
package-metadata XML format.


kapil thangavelu
