[Catalog-sig] simple index and urls extracted from metadata text fields

P.J. Eby pje at telecommunity.com
Fri Sep 11 15:32:59 CEST 2009

At 03:13 PM 9/11/2009 +0200, Tarek Ziadé wrote:
>This leads to some problems when scripts like easy_install scan the 
>index page: they might try to visit urls the author just put there in 
>his description text with no particular intent of making them viewable.

Easy_install only visits pages marked as "home page" links or "download" links.

>  Plus, old urls that don't work anymore are not removed, leading to 
> easy_install timeouts. 1. what's the purpose of having them in there ?

To allow easy_install to find "dev" links and other identifiable 
direct-download links.

>2. if there's a purpose, what about adding an attribute to each <a> 
>tag to identify which metadata field it was extracted from ?

The attribute already exists: rel="download" and rel="homepage"; if 
there's no 'rel' it's from the description.
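
As a rough illustration of the rule above, here is a minimal sketch (not 
easy_install's actual code) of how a client could sort the links on an 
index page by their 'rel' attribute. The class name, the helper 
functions, and the example.com URLs are all hypothetical, and treating 
any other 'rel' value as "description" is an assumption of this sketch.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collect (href, source) pairs from the <a> tags of an index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        rel = (attrs.get("rel") or "").lower()
        # rel="homepage" and rel="download" mark the metadata fields;
        # a link without 'rel' came from the description text.
        # Assumption: any other rel value is treated like "description".
        source = rel if rel in ("homepage", "download") else "description"
        self.links.append((href, source))


def classify_links(html):
    """Return every link on the page with the field it came from."""
    parser = RelLinkParser()
    parser.feed(html)
    return parser.links


def links_to_follow(html):
    """Keep only the "home page" and "download" links."""
    return [href for href, source in classify_links(html)
            if source != "description"]


# Hypothetical index page fragment, for illustration only.
sample = (
    '<a href="http://example.com/home" rel="homepage">home</a>'
    '<a href="http://example.com/dl" rel="download">download</a>'
    '<a href="http://example.com/old">old link</a>'
)
```

With the sample fragment, links_to_follow(sample) keeps the first two 
URLs and skips the one that carries no 'rel'.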

I'm rather surprised you don't know these things already, since 
they're all rather prominently documented as part of easy_install's 
"index API" here:


