[PYTHON META-SIG] Python Locator work

Paul Everitt/Digital Creations Paul.Everitt@cminds.com
29 Aug 95 15:33:47


Programming notes:
1) This will be the last post in the meta-sig.  With some patience from 
Barry, I might just figure out this high-falutin' teknowleegie yet.
2) Check out a Perl implementation (rather old, I believe) of an IAFA search
engine at:
 http://www.ai.mit.edu/tools/site-index.html

Ken writes:

> First of all, it seems to me this discussion warrants a sig mailing-list -
> eg, locator-sig.  That'd leave the meta-sig for things that meta more...-)

Hargh hargh...many yucks.

> On Mon, 28 Aug 1995, Paul Everitt/Digital Creations wrote:
> 
> > Hi everyone.  I've condensed various conversation into a page at:
> >  http://ralph.digicool.com/psa/python.locator.html
> 
> (I am able to easily comment on paul's text because i use the emacs w3 
> web browser, so i have the text nicely formatted in a buffer, but web 
> URL's can be difficult to annotate for discussion.  I think postings to a 
> sig mailing list would be easier for discussion sake...)

My role will be to compile the discussed changes back into the 
original.
 
> > [...]
> > below so we all agree to the problem statement.  ~Don't get torqued if you
> > feel it is condescending! :^)~ Gosh, I'm just trying to be thorough! 
> 
> No need to apologize for "thorough" to me, at least - as you probably have
> noticed from my postings, i tend (for better or worse) to prefer erring on
> the side of thorough, etc... 

Yes, Ken, and that is the same type of fascination for classifying that 
has turned me into a Harvest junkie!

> > The Python community is growing too big to find things easily.  It is
> 
> First of all, nice executive summary.  I would include, at the top there,
> a comment that there is great potential benefit, all around, in making it
> easier to coordinate the communities' efforts, and to share the products
> of the efforts.  Which a good locator mechanism could significantly help 
> enable...

Good point...and worked in.

> > [...]
> > We also need to distribute control.  Centralizing the index means, as in the
> > case of whois, that no one ever maintains their entries, and the performance
> > becomes wildly unpredictable.  Thus, the distributed nature of the Python
> 
> (I don't mean to be dense, but the word "performance" confuses me, here. 
> Do you mean that the accuracy becomes unpredictable, as some entries
> become obsolete?)

No, that was implied in the first half.  I'll change the sentence to:
"Centralizing the index means, as in the case of whois, that no one ever 
maintains their entries, and the data becomes dirty, therefore rendering it
useless.  Also, the response time on a central index becomes wildly 
unpredictable.  Thus, ..."
 
> > Finally, any design choice we make should not be in a vacuum.  Bigger fish
> > than the PSA are working on this problem.  Thus, we should try and mirror the
> > standards groups and other development, where possible.  
> 
> This is great - provided the IAFA are doing a good job, then we should
> benefit from exploiting their work.  And i already am sold on harvest. 

It is open to debate whether the IAFA work is successful.  From my 
limited knowledge, though, I think they have struck a nice balance of structure
and flexibility.  Plus, with Bunyip (of Archie fame) on board, they seem to 
have a good pedigree.

> > strategy which is nearly comprehensive to the problem domain of the Python
> > Locator.  [...]
> 
> (Perhaps this would be a less convoluted way to phrase that:
> "... strategy which nearly encompasses the Python Locator problem.")

Agreed, and changed.
 
> > Looking at the IAFA work, we can start right off the bat with three top-level
> > groupings of ~resources~:
> 
> >  o SITE
> >  o USER, ORGANIZATION, SERVICE
> >  o DOCUMENT, IMAGE, SOFTWARE, MAILARCHIVE, USENET, SOUND, VIDEO, FAQ
> 
> The division is very similar to the two categories of resource that i
> suggested - institution and product - except you separated SITE from the
> other institutions.  I suppose you're considering the service/person/
> organization more as agencies, while a site is more of a resource for
> situating things out on the net - a repository.  So perhaps we can
> identify the meta-categories as institutions, products, and repositories. 

The reason SITE is separated on my list is that site connotes a 
place where you can actually get things.  It is more of a placeholder, 
and doesn't give you any specifics.

The other reason is that this is the way the IAFA docs are written.  I'd like
to avoid writing my own documentation, and I presume that these folks have 
some good reasons.  If we don't have SITE, then I am nearly positive that 
the existing IAFA collection software won't work.

So, as this is the first mandatory IAFA element, I move to keep it.

> > [...]
> > For instance, we may need to increase the above list to:
> > 
> >  o SITE
> >  o USER, ORGANIZATION, SERVICE
> >  o DOCUMENT, IMAGE, SOFTWARE, MAILARCHIVE, USENET, SOUND, VIDEO, FAQ
> >  o INITIATIVE,SIG
> 
> I ~could~ see initiative and sig fitting into my "institution" category. 
> Not sure, though.  In any case, if your point is that we need to allow for
> extension of the repertoire if/when people come up with resources that
> don't fit in the existing ones, then i agree. 

Sounds like agreement.  So, it looks like there are really just two 
types of templates for us: the IAFA ones, and our extended ones.
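
To make the two groups concrete, here is a minimal Python sketch that sorts a
template type into "iafa" or "extended".  The type names come from the lists
quoted above; the function name and the two set names are made up for
illustration and are not part of any IAFA software.

```python
# Template types from the IAFA groupings quoted above.
IAFA_TYPES = {
    "SITE",
    "USER", "ORGANIZATION", "SERVICE",
    "DOCUMENT", "IMAGE", "SOFTWARE", "MAILARCHIVE",
    "USENET", "SOUND", "VIDEO", "FAQ",
}

# Extensions proposed for the Python Locator (not defined by the IAFA).
EXTENDED_TYPES = {"INITIATIVE", "SIG"}


def classify_template_type(name):
    """Return 'iafa' or 'extended'; raise ValueError for an unknown type.

    Hypothetical helper: matching is case-insensitive and ignores
    surrounding whitespace, which is an assumption of this sketch.
    """
    key = name.strip().upper()
    if key in IAFA_TYPES:
        return "iafa"
    if key in EXTENDED_TYPES:
        return "extended"
    raise ValueError("unknown template type: %r" % name)
```

Keeping the extended set separate means a collector that only understands the
IAFA types could skip our additions rather than choke on them.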

> > Next, for each of the above template-types, we need to pick some ~fields~.
> > Fortunately, the IAFA does have some suggested fields for each template type.
> > We could start with them, and add new ones where appropriate.  For instance,
> > in the following, I show supplementary fields in ~italics~:
> 
> On quick glance, this all looks good.  Some questions:
> 
>  - Is the URI supposed to serve as the unique identifier for the resource?

Yep.  For now, it maps directly into a URL.
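
As a rough sketch of what "unique identifier" could mean in practice, the
Python below keys a toy in-memory index on each record's URI and insists that
the URI parse as a full URL.  The LocatorIndex class, its methods, and the
record layout are all assumptions for illustration, not an existing tool.

```python
from urllib.parse import urlsplit


class LocatorIndex:
    """Toy in-memory index keyed by each record's URI (hypothetical class)."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        """Store a record (a dict of field name -> value) under its URI."""
        uri = record.get("URI", "").strip()
        parts = urlsplit(uri)
        # "Maps directly into a URL": require at least a scheme and a host.
        if not (parts.scheme and parts.netloc):
            raise ValueError("URI must be a full URL: %r" % uri)
        # Uniqueness: refuse a second record with the same URI.
        if uri in self._records:
            raise KeyError("duplicate URI: %r" % uri)
        self._records[uri] = record

    def lookup(self, uri):
        """Return the record stored under this URI, or None."""
        return self._records.get(uri)
```

If URIs later stop mapping one-to-one onto URLs, only the check inside add()
would need to change.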

>  - It may be useful to be able to associate "related resources" with
>    many of the entry types - eg, USER could include mention of some of
>    the prominent projects and products with which the user is, or was,
>    involved...

In the IAFA RFC, there is a construct for this called "cluster":
"There are certain classes of data elements, such as contact
information, which occur every time an individual, group or
organization needs to be described. Such data as names,
telephone numbers, postal and email addresses etc. fall into
this category. To avoid repeating these common elements
explicitly in every template below, we define "clusters"
which can then be referred to in a shorthand manner in the
actual template definitions. "

So, we can think of these as "pointers" to other defined templates.
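
Here is one way the "pointer" idea might look in Python: a record refers to a
cluster by a handle, and a resolver swaps the pointer for the cluster's
fields.  The "-Handle" suffix convention, the field names, and both function
names are assumptions of this sketch; check the IAFA documents for the real
field syntax.

```python
def expand_cluster(prefix, cluster):
    """Prefix every field of a cluster, e.g. Name -> Author-Name."""
    return {"%s-%s" % (prefix, field): value
            for field, value in cluster.items()}


def resolve(record, clusters):
    """Return a copy of record with cluster pointers expanded.

    A field named '<Prefix>-Handle' whose value is a known handle is
    replaced by that cluster's fields; anything else is copied as-is.
    """
    suffix = "-Handle"
    resolved = {}
    for field, value in record.items():
        if field.endswith(suffix) and value in clusters:
            prefix = field[:-len(suffix)]
            resolved.update(expand_cluster(prefix, clusters[value]))
        else:
            resolved[field] = value
    return resolved


# Illustrative data only: one contact cluster shared by any number of records.
clusters = {
    "pe1": {"Name": "Paul Everitt", "Email": "Paul.Everitt@cminds.com"},
}
record = {
    "Template-Type": "DOCUMENT",
    "URI": "http://ralph.digicool.com/psa/python.locator.html",
    "Author-Handle": "pe1",
}
resolved = resolve(record, clusters)
```

The contact details live in one place, so a changed email address gets fixed
once rather than in every template that mentions the person.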

>  - In addition to the Publication-Statutes (i presume you meant
>    "statutes", not "statues") include a possibly optional "encumberment" 
>    field, to indicate licensing, copyright, etc encumberment status.

Actually, that was a typo: it is Publication-Status.  It is corrected.

There are many types of fields suggested in the IAFA examples.  Many 
are geared toward for-fee sites that may have restrictions.  I may try to put a 
list of suggested fields into the taxonomy.

> Altogether, this looks like a real good start - i particularly like
> capitalizing on substantial existing mechanisms, like the IAFA stuff
> and harvest.  And also providing for catalogue "browsing" as well as
> searching...

Thanks, I wanted to get it out, and find out if I was out to lunch or 
not.  There will be some groovy things coming out of the 
implementation, so stay tuned!

--Paul


=================
META-SIG  - SIG on Python.Org SIGs and Mailing Lists

send messages to: meta-sig@python.org
administrivia to: meta-sig-request@python.org
=================