[Catalog-sig] [Distutils] EasyInstall 0.3a3 released; what about PyPI? (was Re: Initial auto-installation support)

Phillip J. Eby pje at telecommunity.com
Tue May 31 03:14:09 CEST 2005


At 04:10 PM 5/30/2005 -0500, Ian Bicking wrote:
>   But besides
> that, this should work now for any packages with a distutils install, so
> long as those packages are reasonably well behaved.  Hrm... except
> setuptools 0.3a2 doesn't have SourceForge download support, but 0.3a3
> does and I think PJE will release that soon.

0.3a3 is now released, with a new --build-dir option, sandboxing, more 
package workarounds, SourceForge mirror selection, and "installation 
reports".  See:

http://peak.telecommunity.com/DevCenter/EasyInstall#release-notes-change-history

for more details.

I'm thinking that adding automatic package location via PyPI is probably 
pretty doable now, by the way.  My plan is to create a PackageFinder class 
(subclassing AvailableDistributions) whose obtain() method searches for the 
desired package on PyPI, keeping a cache of URLs it has already seen.  (It 
would also accept a callback argument that it would use to create Installer 
objects when it needs to install packages.)
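
Very roughly, I'm picturing something like this (an untested sketch; the real 
class would subclass AvailableDistributions, and the names and signatures -- 
installer_factory in particular -- are just placeholders for now):

    class PackageFinder:
        """Sketch only; the real thing would subclass AvailableDistributions."""

        def __init__(self, index_url="http://www.python.org/pypi",
                     installer_factory=None):
            self.index_url = index_url                  # PyPI base URL
            self.installer_factory = installer_factory  # callback that creates Installer objects
            self.seen_urls = {}                         # cache of URLs already scanned

        def obtain(self, project_name):
            """Find a download for 'project_name' and hand it to an Installer."""
            page_url = self.index_url + "/" + project_name
            if page_url in self.seen_urls:
                return None                             # already looked here
            self.seen_urls[page_url] = True
            # ...scrape page_url for a suitable download URL here, then:
            # return self.installer_factory(download_url)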

The command-line tool (easy_install.main) would create a PackageFinder with 
an interactive installation callback, and in the main loop it would pass it 
to each new Installer instance.  The Installer would then use it to resolve() 
any command-line argument that isn't a file or a URL.
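
In outline, the driver would be something like this (the names here are 
placeholders, not the actual easy_install code):

    def main(args, installer_class, finder):
        """Placeholder sketch of the command-line driver loop."""
        for spec in args:
            installer = installer_class(finder)   # every Installer shares the same finder
            installer.install(spec)               # the Installer calls finder.obtain()
                                                  # itself when spec isn't a file or URL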

The PackageFinder.obtain() method would go to the PyPI base URL followed by 
the desired distribution name, e.g. 'http://www.python.org/pypi/SQLObject', 
and then scrape the page to see if it is a multi-version page, or a 
single-version page.  If it's multi-version, it would scrape the version 
links and select the highest-numbered version that meets all of your criteria.
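
For the multi-version case, the version-picking step could look roughly like 
this -- the href pattern is only a guess at what the PyPI page markup looks 
like, and is_acceptable() stands in for whatever criteria the user supplied:

    import re
    from pkg_resources import parse_version

    def pick_best_version(page_html, is_acceptable):
        """Return the highest acceptable version linked from a multi-version page."""
        # Assumed link shape: href="/pypi/SQLObject/0.6.1" -- purely illustrative.
        versions = re.findall(r'href="[^"]*/([0-9][^"/]*)"', page_html)
        versions = [v for v in versions if is_acceptable(v)]
        if not versions:
            return None
        versions.sort(key=parse_version)
        return versions[-1]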

Once it has a single-version page, it would look for a download URL, and 
see if its filename is that of an archive (.egg, .tar, .tgz, etc.) or if 
the URL is for subversion.  If so, we assume it's the right thing and 
invoke the callback to do the install.
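
The "does this URL point straight at something installable" test could start 
out as dumb as the following (the extension list and the subversion check are 
both guesses to be refined later):

    ARCHIVE_EXTS = (".egg", ".tar", ".tar.gz", ".tgz", ".tar.bz2", ".zip")

    def looks_installable(url):
        """Crude guess: is this URL itself an archive or a subversion checkout?"""
        path = url.split("#", 1)[0].split("?", 1)[0].lower()
        for ext in ARCHIVE_EXTS:
            if path.endswith(ext):
                return True
        return path.startswith("svn:") or path.rstrip("/").endswith("/trunk")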

If not, then we follow the link anyway, and scrape for links to archives, 
checking versions when we get there if possible.  If there's still nothing 
suitable (or there was no download URL), we apply the same procedure to the 
homepage URL.
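
Pulling candidate links out of a download page or homepage is then just more 
of the same, reusing the looks_installable() guess from above (and again, the 
href regex is only a sketch):

    import re

    HREF = re.compile(r'href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)

    def candidate_links(page_html):
        """Return every link on the page that looks like a direct download."""
        return [url for url in HREF.findall(page_html) if looks_installable(url)]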

This should suffice to make a significant number of packages available from 
PyPI with autodownload, and packages with dependencies would also be 
downloaded, built, and installed.

The hardest part of this isn't the screen-scraping per se; it's the 
heuristics for deciding whether a specific URL is suitable for download.  
Many PyPI download URLs are of the form "foopackage-latest.tgz", so it's not 
possible to determine a usable version number from the filename, unless 
I special-case "latest" in the version parser -- which I guess I could do.
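
Something along these lines, say, where "latest" gets special-cased instead of 
being parsed as a version number (pure sketch, and the extension list is the 
same guess as before):

    def version_from_filename(filename, project_name):
        """Sketch: recover a version string from names like 'foopackage-0.6.1.tgz'."""
        base = filename
        for ext in (".tar.gz", ".tar.bz2", ".tgz", ".tar", ".zip", ".egg"):
            if base.endswith(ext):
                base = base[:-len(ext)]
                break
        prefix = project_name + "-"
        if not base.startswith(prefix):
            return None
        version = base[len(prefix):]
        if version.lower() == "latest":
            return "latest"   # special-cased: caller would treat it as 'newest available'
        return version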

We also probably need some kind of heuristic to determine which URLs are 
"better" to try, as we don't want to just run through the links in order.

Hm.  You know, what if as an interim step we had the command-line tool just 
launch a web browser pointing you to PyPI?  Getting to a page for a suitable 
version is easy, so we could then let the user find the right download URL 
and then go back to paste it on the command line.  That could be a nice 
interim addition, although it isn't much of a solution for packages with a 
lot of not-yet-installed dependencies.  You'd keep getting kicked back to the 
web browser a lot, and more to the point you'd have to keep restarting the 
tool.  So, ultimately we really need a way to actually find the URLs.
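
The interim version really would just be a thin wrapper over the stdlib 
webbrowser module, something like:

    import webbrowser

    def browse_pypi(project_name, index_url="http://www.python.org/pypi"):
        """Interim hack: open the project's PyPI page so the user can pick a URL."""
        webbrowser.open(index_url + "/" + project_name)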

There are going to have to be new options for the tool, too, like a way to 
set the PyPI base URL, and a way to specify what sorts of package revisions 
are acceptable (e.g. no alphas, no betas, no snapshots).
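
For instance, with optparse (the option names here are placeholders; nothing 
is decided yet):

    from optparse import OptionParser

    parser = OptionParser()
    parser.add_option("--index-url", default="http://www.python.org/pypi",
                      help="base URL of the package index to search")
    parser.add_option("--exclude-prereleases", action="store_true", default=False,
                      help="skip alpha, beta, and snapshot revisions")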


