[Catalog-sig] metadata

Kapil Thangavelu k_vertigo@yahoo.com
Thu, 6 Sep 2001 22:24:42 -0700 (PDT)

--- "Martin v. Loewis"
<martin@loewis.home.cs.tu-berlin.de> wrote:
> > I've started work on a fourth catalog implementation.
> > I've had a few questions/comments, i wanted to send to
> > the list to resolve regarding metadata.
> While this is a laudable goal, I have a few procedural concerns with
> your message.


> > the key for interoperability among the three is having package
> > metadata.
> More specific, the key for interoperability is having *standard*
> metadata.


> > looking over pep 241, i can note several deficiencies
> > that i would like to address. While the use of rfc822
> > for metadata definition does lower the author burden
> > is unextensible and creates the opportunity for
> > ambiguity in the metadata, i'd like to change this to
> > an xml based format.
> This is where my procedural concerns start. PEP 241, as-is, is already
> implemented in distutils and currently available to users through
> Python 2.2aX. Any change at this point in time needs a *very* good
> reason for the change, such as the PEP being unimplementable, not
> achieving the goal it is intended to achieve, etc.

the pep is obviously implementable. the question is
rather whether it achieves its goal. the pep itself
doesn't define a goal, so that is hard to say.

as for achieving my goal, stated previously as
creating a pythonic version of cpan/apt to automate
installation of new packages with dependency

the pep as it currently stands makes such a goal, IMO,
practically unattainable and unmaintainable.

therefore the sooner changes are discussed and made,
the better.

> IOW, changing the format of the metadata now will significantly slow
> down progress on producing a catalog implementation, and getting
> packages registered with it. Thus, we might get the perfect system on
> paper; I'd rather prefer an incomplete system in reality.

two things. first, regarding slowing down catalog
implementations, i think changing the format of the
metadata so that it can achieve the above goal will
have just the opposite effect, namely to speed up
catalog development by providing the standard metadata
needed to provide the interfaces suggested in the
catalog-sigs, in addition to services like
subscriptions and syndication.

second, please understand that it is not my desire to
mandate any changes to existing peps or guidelines. i
want to create a real world extensible system that can
be used and tested before asking for pep revisions
and/or authoring new peps, as per my interpretation of
the pep guidelines from pep 1:

"The PEP should be reviewed and accepted before a
reference implementation is begun, unless a reference
implementation will aid people in studying the PEP."

i intend to conduct development openly in this forum
with key draft documents available for review. i
sincerely want to have a real world implementation
that can be adapted to existing standards and allow
for open implementations of a client (for example an
rdf based client). what i've stated in my metadata
comments reflects my thoughts on an information set
that i will be using internally in the server and
client. making this infoset standard will ease both
development and maintenance. if upon completion of
this catalog server these changes are ill-received, i
can hack up some conversion. the server should also
allow manual addition of such information via a web
interface. also, a real world implementation fulfills
the original comments in the recent 'status' thread
asking for an implementation as well.

> As for XML specifically: What problem does the mere switching to XML
> achieve? I believe your claim that the current format is unextensible
> is incorrect: The Metadata-Version was put in precisely to allow
> future extensions. I'd strongly discourage "proprietary" extensions at
> this time, so not being able to put in those is a good thing: Any
> extensions used ought to be published and documented, in a revision of
> PEP 241.

what does xml provide...
generic language independent processing tools for
hierarchical information that's amenable to
internationalization and is easily extensible. all of
which are standard "xml benefits" items ( i feel like
i'm preaching to the choir:). the real question is
what do rfc822 headers provide? very little imo:
simple processing via a standard module and low
developer overhead. xml parsing routines are also
standard in the library, and the format is also text
editable, albeit not quite as friendly as rfc822
headers, but with a developer tool/module this is

modeling some of these concepts in straight rfc822
headers yields some fairly ugly results that can be

the cumulative benefits for implementations of catalog
servers and clients seem overwhelming to me.
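
as a rough illustration of the trade-off, the sketch below reads the
same fields from a pep 241 style header block (using the standard email
module) and from an xml document (using xml.etree.ElementTree). the xml
element and attribute names (package, summary, platform) are made up for
this example and are not a proposed schema; the package name and values
are placeholders.

```python
from email import message_from_string
import xml.etree.ElementTree as ET

# PEP 241 style metadata: flat "Key: value" headers, repeated for lists.
PKG_INFO = """\
Metadata-Version: 1.0
Name: example
Version: 0.1
Platform: linux
Platform: win32
Summary: An example package
"""

msg = message_from_string(PKG_INFO)
rfc822_meta = {
    "name": msg["Name"],
    "version": msg["Version"],
    "platforms": msg.get_all("Platform"),  # repeated headers -> list
}

# Hypothetical XML equivalent; element names are assumptions for this sketch.
XML_INFO = """\
<package name="example" version="0.1">
  <summary xml:lang="en">An example package</summary>
  <platform>linux</platform>
  <platform>win32</platform>
</package>
"""

root = ET.fromstring(XML_INFO)
xml_meta = {
    "name": root.get("name"),
    "version": root.get("version"),
    "platforms": [p.text for p in root.findall("platform")],
}
```

both forms carry the same flat fields here; the difference shows up once
nested or per-language values are needed, which the xml form can hold as
child elements and attributes.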

> > probably the biggest problem with adoption of pep241
> > is the lack of dependency info. Dependency info should
> > be both version specific and capable of being os
> > dependent.
> Because package dependency is really hard, I believe it was
> deliberately left out from version 1.0 of the metadata. That means
> that any package author requiring prerequisite packages should put the
> prerequisite list into the Description, with the user of the catalog
> being responsible for fulfilling the prerequisites.
> So lack of dependency info is IMO a key to success, rather than a
> problem.

i think we might have different core goals. to me
dependency info is a must. leaving it out of the
standard and up to convention violates your stated
principle above of using 'standard metadata'. without
dependency info i think there is an undue burden on
client and server implementations for dependency
resolution (for install, upgrade, and removal).
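
to show what version specific and os dependent dependency info could
look like to a client, here is a minimal sketch. everything in it is
hypothetical: the Dependency record, its min_version and platform
fields, the unmet() helper, and the package names are illustrations
only, not part of pep 241 or of my schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dependency:
    """Hypothetical dependency record: name, minimum version, optional OS."""
    name: str
    min_version: tuple = ()          # e.g. (20, 0); () means any version
    platform: Optional[str] = None   # None means the dependency applies everywhere


def unmet(deps, installed, platform):
    """Return names of dependencies not satisfied on the given platform.

    `installed` maps package name -> installed version tuple.
    """
    missing = []
    for dep in deps:
        # Skip dependencies declared for a different operating system.
        if dep.platform is not None and dep.platform != platform:
            continue
        have = installed.get(dep.name)
        if have is None or have < dep.min_version:
            missing.append(dep.name)
    return missing


deps = [
    Dependency("Numeric", (20, 0)),
    Dependency("win32all", (1,), platform="win32"),
]
installed = {"Numeric": (19, 2)}
```

with this data, a client on linux only has to upgrade Numeric, while a
client on win32 also has to fetch win32all. none of that can be computed
if the prerequisites live as free text in the Description.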

in addition, the catalog itself should be maintainable.
shifting the burden from the masses of python
developers to a few maintainers of the catalog will
make a catalog implementation *much* harder to
maintain and populate. this info should properly be
designated by those who know the packages, namely
their maintainers and authors.

as for dependency info being hard, i'd really like some
feedback on my metadata schema and its
characterization of dependency info. sadly, emailing
this will have to wait till tomorrow when i get back
from vacation and have a real net connection.

> > there is also an assumption within the pep241 and 243
> > i'd like to address. namely that the author of a
> > package will be the person to upload a package. at
> > least initially this is likely to be unlikely,
> > especially during an initial rush to fill up the
> > repository via some semi-automated extraction from the
> > vaults.
> That's a good point. Should we support a Packager field in addition to
> the Author field (which, of course, requires a new Metadata-Version)?
> Alternatively, we could encourage uploaders to put their name into
> the Author field, and put the "true" author into the Description.  I
> doubt the true author would be happy to receive complaints about the
> packaging when she didn't even know somebody uploaded the package.

i would be much more in favor of a Packager field than
of introducing non-obvious semantics into common
metadata concepts.
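
a small sketch of how a catalog could use such a field, assuming it were
added: "Packager" is not a pep 241 field, the header block below is
invented, and packaging_contact() is a hypothetical helper. a real file
carrying the field would also need a new Metadata-Version value, which
is left out here.

```python
from email import message_from_string

# Hypothetical header block; "Packager" is an assumed field, not PEP 241.
PKG_INFO = """\
Name: example
Version: 0.1
Author: A. Author
Author-email: author@example.org
Packager: P. Packager <packager@example.org>
"""


def packaging_contact(text):
    """Return who should receive complaints about the packaging."""
    msg = message_from_string(text)
    # Prefer the uploader; fall back to the author if no Packager is given.
    return msg["Packager"] or msg["Author-email"]


contact = packaging_contact(PKG_INFO)
```

the Author field keeps its usual meaning, and packaging complaints go to
the person who actually uploaded the package.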


kapil thangavelu
