[Distutils] distlib updated - comments sought

Vinay Sajip vinay_sajip at yahoo.co.uk
Fri Oct 5 16:37:29 CEST 2012


Paul Moore <p.f.moore <at> gmail.com> writes:

> No-one could try to claim that the sort of web-scraping that
> easy_install/pip does is a "simple" reference implementation, either.
> If you take that viewpoint, I'd say the stdlib implementation should
> *only* use the XMLRPC interface to PyPI. Code to use the "simple"
> interface and trawl all those links looking for distribution files
> can't be justified in the stdlib for any *other* reason than to save
> anyone else ever having to write it again 
[...]
> PS If you want to start over-engineering the flexibility, users should
> have a way of choosing whether to use the webscraper or XMLRPC
> interfaces to PyPI. The former finds more packages (as I understand
> it) whereas the latter is much faster. As someone who's never needed a
> package that can't be found using both interfaces (or neither) I

Is that really the case? I'd assumed that the simple pages were generated from
the package database created from uploads to PyPI, so I would have expected
querying the XML-RPC interface to produce the same results as from scraping the
HTML (allowing for the possibility that, if the HTML pages are generated
periodically as static files from the database, they might be stale at times).

I thought that pip needed to scrape pages because people host distribution
archives on servers other than PyPI (e.g. Google Code, GitHub, Bitbucket or
their own servers), with the links to those archives reachable through e.g. the
"dependency_links" argument to setup(), or the URLs mentioned in the PyPI
metadata.
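
To make the distinction concrete, here is a rough sketch of what "scraping"
a simple-index page for archives could look like. It is not pip's or
easy_install's actual code. The page content, the host names (pypi.example,
downloads.example.org), the list of archive suffixes and the names
LinkCollector and split_links are all assumptions made up for illustration.
It separates links to archives on the index host from links to archives
hosted elsewhere, and ignores everything else.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed set of archive suffixes; a real tool would recognise more formats.
ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".tgz", ".zip", ".egg")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def split_links(html, base_url, index_host):
    """Return (hosted, external) lists of archive URLs found in html."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    hosted, external = [], []
    for url in collector.links:
        parts = urlparse(url)
        # Skip links that do not look like distribution archives.
        if not parts.path.endswith(ARCHIVE_SUFFIXES):
            continue
        if parts.netloc == index_host:
            hosted.append(url)
        else:
            external.append(url)
    return hosted, external


# Made-up page in the general shape of a "simple" index listing.
SAMPLE_PAGE = """
<html><body>
<a href="../../packages/source/e/example/example-0.1.tar.gz#md5=abc">example-0.1.tar.gz</a>
<a href="http://downloads.example.org/example-0.2.zip">example-0.2.zip</a>
<a href="http://example.org/">Home page</a>
</body></html>
"""

hosted, external = split_links(
    SAMPLE_PAGE, "http://pypi.example/simple/example/", "pypi.example"
)
print("hosted on index:", hosted)
print("hosted elsewhere:", external)
```

The sketch only looks at links that appear directly on the page. Following
a home-page or download link to look for archives on the page it points to
would be a further step, which is not shown here.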

Regards,

Vinay Sajip


