[Catalog-sig] remove historic download/homepage links for a project (was: Re: Deprecate External Links)

holger krekel holger at merlinux.eu
Thu Feb 28 10:28:35 CET 2013

On Wed, Feb 27, 2013 at 16:16 -0700, Aaron Meurer wrote:
> And by the way, this hasn't been mentioned, but I really mean *all*
> mentions of Google Code on PyPI.  pip crawls Google Code not just
> because Google Code is listed as an official site for my package or
> because the latest release is there, but because a single old release
> points there.  So to get pip to not crawl there, I would have to go
> through and remove all old mentions of Google Code, even from releases
> that were made in 2006.  So you can see why the expired-domain
> scenario is a very real issue.  Combined with the fact, discussed on
> this list a while back, that everyone runs pip with sudo, you have a
> hacker's dream for installing malicious code on everyone's computers.
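
The point about a single old release can be made concrete with a small sketch: an installer that follows the home_page and download_url of every release ends up contacting the union of all their hosts, so one 2006 entry is enough to pull in code.google.com. This is not pip's actual code; the function name external_hosts and the input shape (one metadata dict per release, as the old PyPI XML-RPC release_data call returned) are assumptions for illustration.

```python
from urllib.parse import urlparse


def external_hosts(releases, index_host="pypi.python.org"):
    """Collect the hosts an old-style crawling installer would contact.

    `releases` is a list of metadata dicts, one per release, each possibly
    holding 'home_page' and 'download_url' (hypothetical input shape).
    """
    hosts = set()
    for meta in releases:
        for key in ("home_page", "download_url"):
            url = meta.get(key) or ""  # treat None as empty
            host = urlparse(url).hostname
            # Empty or placeholder values such as "UNKNOWN" have no hostname;
            # links back to the index itself are not third-party crawling.
            if host and host != index_host:
                hosts.add(host)
    return hosts
```

With one current release on example.org and one ancient release on Google Code, the result contains both hosts; only blanking the old release's links removes code.google.com from the set.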

I wrote a little command-line tool, "cleanpypi.py", for removing
_all_ download/homepage metadata from all releases of a project.
See it attached.  WARNING: it's a hack, but it worked for me.
It uses the XML-RPC interface to get the release data and then the
SUBMIT POST hook to register the new, cleaned-up metadata.  If you want
to play with it, you might comment out the final "req.post" request so
that no actual changes take place and you can see what it would do.
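
The attachment was scrubbed from the archive, so what follows is not cleanpypi.py but a minimal sketch of the same idea under stated assumptions. It reads every release through the legacy PyPI XML-RPC methods package_releases and release_data, blanks home_page and download_url, and builds the form for an ":action=submit" POST. The index URL, the ":action" field, and the assumption that release_data keys map one-to-one onto submit form fields all describe the 2013-era PyPI and are not verified here; that interface has since been retired. The helper names are hypothetical, and the POST itself is deliberately left out, which amounts to the dry run suggested above.

```python
import xmlrpc.client

# 2013-era index endpoint (assumption; this legacy API no longer exists).
INDEX_URL = "https://pypi.python.org/pypi"

# Metadata fields blanked in every release.
LINK_FIELDS = ("home_page", "download_url")


def cleaned_form(release_data):
    """Build a form dict for a hypothetical ':action=submit' POST.

    Copies string and list values from `release_data` (assumed to use the
    same keys as the submit form) and overwrites the link fields with "".
    The input dict is not modified.
    """
    form = {":action": "submit"}
    for key, value in release_data.items():
        if isinstance(value, (str, list)):  # skip None and other non-form values
            form[key] = value
    for key in LINK_FIELDS:
        form[key] = ""  # remove any external link
    return form


def fetch_all_releases(package, url=INDEX_URL):
    """Fetch metadata for every release, including hidden ones, via XML-RPC."""
    proxy = xmlrpc.client.ServerProxy(url)
    versions = proxy.package_releases(package, True)  # True: show hidden releases
    return [proxy.release_data(package, version) for version in versions]


def dry_run(package):
    """Print which links would be removed from each release; nothing is sent."""
    for data in fetch_all_releases(package):
        old_links = {key: data.get(key) for key in LINK_FIELDS}
        print(data.get("version"), "would drop:", old_links)
```

Actually applying the change would mean POSTing each cleaned_form result to the index with the maintainer's credentials. Keeping that step separate makes it easy to inspect the dry-run output first.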

Apart from preventing the hijacking of old domains referenced in
download/homepage links, it has the nice side effect of speeding up the
installation of your package, because no third-party crawling needs to
take place.  Given some streamlining, a tool like this could be advertised
on pypi.python.org or offered directly as an action in the server UI for
package authors.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: cleanpypi.py
Type: text/x-python
Size: 1903 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/catalog-sig/attachments/20130228/67548664/attachment.py>
