[Catalog-sig] Extra links on the PyPI /simple index package pages

M.-A. Lemburg mal at egenix.com
Mon Jun 21 12:57:42 CEST 2010

P.J. Eby wrote:
> At 11:10 AM 6/18/2010 +0200, M.-A. Lemburg wrote:
>> "Martin v. Löwis" wrote:
>> > Am 17.06.2010 15:16, schrieb M.-A. Lemburg:
>> >> Benji York wrote:
>> >>> On Thu, Jun 17, 2010 at 7:40 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> >>>> http://pypi.python.org/simple/zc.buildout/
>> >>>>
>> >>>> BTW: what are all those bug links doing on the zc.buildout
>> >>>> index page ?
>> >>>
>> >>> PyPI scrapes all the links from the long description; for many
>> >>> projects that includes a change log with links to fixed bugs.
>> >>
>> >> Isn't that dangerous ?
>> >>
>> >> AFAIK, setuptools would start opening all those URLs and might
>> >> find download files which are not necessarily under full control of
>> >> the author, e.g. anyone could add a comment to a bug report or
>> >> wiki page with a link to an egg file on some rogue server.
>> >
>> > I think you misunderstand. Links originate *only* from the long
>> > description. The package owner has full control over that.
>>
>> I was referring to the linked assets that the package owner
>> may not have full control over, e.g. in the above case,
>> you have links pointing to launchpad and one to "file://".
>> Such links (except the file:// one) can be useful in the
>> package description, e.g. to point to a bug tracking
>> system, documentation or other resources, but they are
>> not really needed to point setuptools to download locations.
>
> This is a misunderstanding of what setuptools does.  Setuptools only
> retrieves URLs that are *specifically designated* as a "home page" or
> "download" link (using the "rel" field of the A tag on the PyPI /simple
> page), or which are a recognizable download URL supplied by way of the
> long_description.
>
> So, the risk you are describing does not actually exist.
>
>> > If you think the package owner is opening up a security threat by
>> > including the links in the first place - yes, that's indeed a risk.
>>
>> Is this feature still needed for setuptools ?
>
> Yes.
>
>> We have download URLs and homepage URLs which should be enough for
>> setuptools to search and find the links to package download files.
>
> No.  This would only be the case if the project's author had some other
> form of hosting.  For example, if you had a subversion repository for
> your development trunk, but didn't have any place to host an HTML page
> to link to it, the long_description would be the only way (AFAIK at
> present) for you to securely provide a link to that repository for
> setuptools (or humans) to use.

The author could set up the home page or download URL to point to
that repository (SVN makes the repos available as HTML pages as

> See also:
> http://peak.telecommunity.com/DevCenter/setuptools#making-your-package-available-for-easyinstall
> and:
>   http://peak.telecommunity.com/DevCenter/PackageIndexAPI
> for more information on how the link parsing and retrieval works.
> It is a common misconception that setuptools spiders pages for links;
> the truth is, it only reads the "home" and "download" URLs provided via
> the PyPI metadata, and those only if they're not obviously links to a
> package tarball (or zip, egg, etc.).  All other links must visibly point
> to something downloadable, or else they're ignored.

So in summary, the /simple index page doesn't need to include
any URLs from the long_description that do not have a rel
attribute set, or end with one of the fixed set of archive extensions
or with "#egg=...".
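
To make that rule concrete, here is a minimal sketch of such a filter
in Python. It is only an illustration of the summary above, not the
actual PyPI or setuptools code: the rel values ("homepage",
"download"), the list of archive extensions and the function names
are assumptions on my part and would have to be checked against what
setuptools really accepts.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed values for illustration only; the real sets used by
# setuptools and PyPI may differ.
ARCHIVE_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar", ".tgz", ".zip", ".egg")
FOLLOWED_RELS = {"homepage", "download"}


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from the <a> tags of an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href:
            # A missing or valueless rel attribute is treated as empty.
            self.links.append((href, attrs.get("rel") or ""))


def is_relevant(href, rel):
    """Return True if the link matches one of the three rules above."""
    parts = urlparse(href)
    # Rule 1: the link carries a rel attribute marking it as followable.
    if FOLLOWED_RELS & set(rel.lower().split()):
        return True
    # Rule 2: the URL ends with an "#egg=..." fragment.
    if parts.fragment.startswith("egg="):
        return True
    # Rule 3: the URL path ends with a known archive extension.
    return parts.path.lower().endswith(ARCHIVE_EXTENSIONS)


def relevant_links(html):
    """Return the hrefs from html that the index page would keep."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href, rel in collector.links if is_relevant(href, rel)]
```

With a filter along these lines, a bug tracker link taken from a
change log in the long_description would be dropped from the /simple
page, while the home page link, direct archive links and "#egg=..."
links would stay.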

Marc-Andre Lemburg

