[Catalog-sig] homepage/download metadata cleaning

PJ Eby pje at telecommunity.com
Fri Mar 1 23:42:34 CET 2013

On Fri, Mar 1, 2013 at 2:31 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> Hmm, then why not remove links that don't match the above from
> the /simple/ index pages ?

PyPI provides the links uninterpreted because the tools' interpretation
of them has evolved over time.

> Note that it's easily possible to make e.g. file:/// links
> have a fragment that matches what you described, so I guess the
> filters would have to be more careful about what to allow
> (e.g. only http/ftp schemes, perhaps even only https schemes)
> and what not.

file:// URLs are an intentionally supported feature of easy_install;
many users have local NFS-based or other shared repositories.  But
yes, it certainly would be reasonable to not include links to them on
PyPI.  ;-)
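
For illustration only, here is a minimal sketch of the kind of scheme
filter being discussed. It is not PyPI's actual code: the function name
filter_index_links, the ALLOWED_SCHEMES allow-list, and the example.com
URLs are all made up for the example, and keeping relative links is my
own assumption.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list, following the schemes suggested in the quoted text.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def filter_index_links(links, allowed=ALLOWED_SCHEMES):
    """Return only the links an index page would keep under this policy.

    Assumption: links with no scheme (relative links) are kept, since
    they resolve against the index page itself.
    """
    kept = []
    for url in links:
        try:
            scheme = urlsplit(url.strip()).scheme.lower()
        except ValueError:
            # Malformed URL: drop it rather than guess.
            continue
        if scheme == "" or scheme in allowed:
            kept.append(url)
    return kept


links = [
    "https://example.com/dist/foo-1.0.tar.gz#egg=foo-1.0",
    "file:///srv/packages/foo-1.0.tar.gz#egg=foo-1.0",
    "ftp://example.com/pub/foo-1.0.zip",
]
print(filter_index_links(links))
```

A filter like this would only affect what the index page shows. Someone
who points easy_install at a local file:// repository would be unaffected.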

> BTW: Are those links also shown as-is on the description page ?
> People could do nasty stuff by adding "javascript:" links which look
> like normal links to the descriptions.

That's true, but it's unrelated to the tools, since the tools can't
process javascript links.

It would probably be best, though, if PyPI filtered such URLs to
prevent script injection/CSRF attacks on logged-in PyPI users browsing
project descriptions.  I don't know if it already does this or not,
since I've never tried to inject a CSRF attack on PyPI.  ;-)

(I guess technically it would be a same-site request forgery rather
than a cross-site one, but you know what I mean.)
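
As a rough sketch of what such filtering could look like (again, not
PyPI's actual code; is_safe_description_link and SAFE_LINK_SCHEMES are
hypothetical names and the allow-list is an assumption), a check on
description links might be:

```python
import re

# Hypothetical allow-list of schemes permitted in rendered descriptions.
SAFE_LINK_SCHEMES = {"http", "https", "ftp", "mailto"}

# Browsers generally ignore tabs and newlines inside a URL, as well as
# leading spaces and control characters, so remove them before looking
# at the scheme.
_EMBEDDED_WS = re.compile(r"[\t\r\n]")
_LEADING_JUNK = "".join(chr(c) for c in range(0x21))
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")


def is_safe_description_link(href):
    """Return True if href is relative or uses an allow-listed scheme."""
    cleaned = _EMBEDDED_WS.sub("", href).lstrip(_LEADING_JUNK)
    match = _SCHEME.match(cleaned)
    if match is None:
        # No scheme: a relative link or a bare "#fragment".
        return True
    return match.group(1).lower() in SAFE_LINK_SCHEMES


for href in ("https://example.com/", "javascript:alert(1)",
             " JaVa\tScript:alert(1)", "docs/index.html"):
    print(repr(href), is_safe_description_link(href))
```

An allow-list is used rather than a blacklist of "javascript:" so that
other script-capable schemes such as "data:" are rejected as well.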
