[Catalog-sig] Extra links on the PyPI /simple index package pages

P.J. Eby pje at telecommunity.com
Fri Jun 18 23:14:45 CEST 2010


At 11:10 AM 6/18/2010 +0200, M.-A. Lemburg wrote:
>"Martin v. Löwis" wrote:
> > Am 17.06.2010 15:16, schrieb M.-A. Lemburg:
> >> Benji York wrote:
> >>> On Thu, Jun 17, 2010 at 7:40 AM, M.-A. Lemburg<mal at egenix.com>  wrote:
> >>>> http://pypi.python.org/simple/zc.buildout/
> >>>>
> > >>>> BTW: what are all those bug links doing on the zc.buildout index page?
> >>>
> >>> PyPI scrapes all the links from the long description; for many projects
> >>> that includes a change log with links to fixed bugs.
> >>
> > >> Isn't that dangerous?
> >>
> >> AFAIK, setuptools would start opening all those URLs and might
> >> find download files which are not necessarily under full control of
> >> the author, e.g. anyone could add a comment to a bug report or
> >> wiki page with a link to an egg file on some rogue server.
> >
> > I think you misunderstand. Links originate *only* from the long
> > description. The package owner has full control over that.
>
>I was referring to the linked assets that the package owner
>may not have full control over, e.g. in the above case,
>you have links pointing to launchpad and one to "file://".
>
>Such links (except the file:// one) can be useful in the
>package description, e.g. to point to a bug tracking
>system, documentation or other resources, but they are
>not really needed to point setuptools to download locations.

This is a misunderstanding of what setuptools does.  Setuptools only 
retrieves URLs that are *specifically designated* as a "home page" or 
"download" link (using the "rel" attribute of the A tag on the PyPI 
/simple page), or which are a recognizable download URL supplied by 
way of the long_description.
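
To make the distinction concrete, here is a minimal Python sketch (not setuptools' actual code) that sorts the A tags of a /simple page into rel-designated links and everything else. It assumes the rel values are "homepage" and "download"; the class name and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

DESIGNATED_RELS = ("homepage", "download")  # assumed rel values on /simple pages


class SimplePageLinks(HTMLParser):
    """Split <a> links on a /simple page by their rel attribute (sketch)."""

    def __init__(self):
        super().__init__()
        self.designated = []  # (rel, href) pairs for homepage/download links
        self.other = []       # every other href, e.g. links from long_description

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens; keep the first known one.
        rels = (attrs.get("rel") or "").split()
        kind = next((r for r in rels if r in DESIGNATED_RELS), None)
        if kind:
            self.designated.append((kind, href))
        else:
            self.other.append(href)


# Hypothetical page fragment: one designated home page link, one bug link.
page = ('<a href="http://example.com/myproject" rel="homepage">home</a>'
        '<a href="https://bugs.launchpad.net/myproject/+bug/1">bug 1</a>')
links = SimplePageLinks()
links.feed(page)
links.close()
print(links.designated)  # [('homepage', 'http://example.com/myproject')]
print(links.other)       # ['https://bugs.launchpad.net/myproject/+bug/1']
```

Links that end up in `other`, such as the Launchpad bug links, are not fetched just for being on the page; they only matter if they visibly point to something downloadable.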

So, the risk you are describing does not actually exist.


> > If you think the package owner is opening up a security threat by
> > including the links in the first place - yes, that's indeed a risk.
>
>Is this feature still needed for setuptools ?

Yes.


>We have download URLs and homepage URLs which should be enough for
>setuptools to search and find the links to package download files.

No.  This would only be the case if the project's author had some 
other form of hosting.  For example, if you had a subversion 
repository for your development trunk, but didn't have any place to 
host an HTML page to link to it, the long_description would be the 
only way (AFAIK at present) for you to securely provide a link to 
that repository for setuptools (or humans) to use.
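
The setuptools convention for such a repository link is to append an "#egg=name-version" fragment (for example "#egg=myproject-dev") so the link is recognizable as something installable. A minimal sketch of reading that fragment; the function name and the svn.example.com URL are hypothetical:

```python
from urllib.parse import urlsplit


def egg_fragment(url):
    """Return the 'name-version' from a '#egg=' URL fragment, or None."""
    fragment = urlsplit(url).fragment
    if fragment.startswith("egg="):
        return fragment[len("egg="):] or None
    return None


# Hypothetical trunk link as it might appear in a long_description.
print(egg_fragment("http://svn.example.com/myproject/trunk#egg=myproject-dev"))
# prints: myproject-dev
```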

See also:

   http://peak.telecommunity.com/DevCenter/setuptools#making-your-package-available-for-easyinstall

and:

   http://peak.telecommunity.com/DevCenter/PackageIndexAPI

for more information on how the link parsing and retrieval works.

It is a common misconception that setuptools spiders pages for links; 
the truth is, it only reads the "home" and "download" URLs provided 
via the PyPI metadata, and those only if they're not obviously links 
to a package tarball (or zip, egg, etc.).  All other links must 
visibly point to something downloadable, or else they're ignored.

That means unless your bug tracking system's URL ends with 
"/myproject-1.2.tgz", it ain't gonna get downloaded.  And unless you 
used it as your "home page" link, it won't be searched for links, either.  ;-)
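
A simplified Python sketch of the rule above. It assumes an illustrative (not exhaustive) list of archive suffixes and ignores "#egg=" fragments; the function names are hypothetical and this is not setuptools' implementation:

```python
from urllib.parse import urlsplit

# Illustrative, not exhaustive, list of distribution file suffixes.
ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".tgz", ".zip", ".egg")


def looks_downloadable(url):
    """True if the URL path visibly ends in a distribution file name."""
    return urlsplit(url).path.lower().endswith(ARCHIVE_SUFFIXES)


def action_for(url, rel=None):
    """Decide what to do with one link; rel is the A tag's rel value or None."""
    if looks_downloadable(url):
        return "download"  # any link that visibly names a package file
    if rel in ("homepage", "download"):
        return "scan"      # designated pages are read for further links
    return "ignore"        # e.g. bug tracker or wiki links


print(action_for("https://bugs.launchpad.net/myproject/+bug/123"))  # ignore
print(action_for("http://example.com/dist/myproject-1.2.tgz"))      # download
print(action_for("http://example.com/", rel="homepage"))            # scan
```

Note the ordering: a "home" or "download" link that is itself a tarball is downloaded rather than scanned, matching the "only if they're not obviously links to a package tarball" rule.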


