Attached is a patch so it can install from Sourceforge Download pages, choosing a random mirror.

Sadly I find the file I want to install from SF doesn't include version information in its setup file, so it gets the version 0.0.0. Bah. It's an abandoned package anyway (zpt.sf.net), but the up-to-date Zope version hasn't *quite* been separately distributed (though I guess it's possible to build the package from the Zope sources).

--
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org

--- orig/setuptools-0.3a2/easy_install.py  2005-05-29 18:01:42.000000000 -0500
+++ easy_install.py  2005-05-29 20:40:34.734190272 -0500
@@ -464,14 +464,20 @@
     def _download_html(self, url, headers, filename):
-        # Check if it is a subversion index page:
+        sf_url = url.startswith('http://prdownloads.')
         file = open(filename)
         for line in file:
             if line.strip():
+                # Check if it is a subversion index page:
                 if re.search(r'<title>Revision \d+:', line):
                     file.close()
                     os.unlink(filename)
                     return self._download_svn(url, filename)
+                elif sf_url and re.search(r'^<HTML><HEAD>', line, re.I):
+                    continue
+                elif sf_url and re.search(r'\s*<TITLE>Select a Mirror for File: ', line):
+                    # Sourceforge mirror page:
+                    return self._download_sourceforge(url, file.read())
                 else:
                     break   # not an index page
         file.close()
@@ -482,8 +488,33 @@
         os.system("svn checkout -q %s %s" % (url, filename))
         return filename
 
+    def _download_sourceforge(self, source_url, sf_page):
+        """
+        Return a (randomly-selected) (scheme URL) for downloading the
+        package, given the SourceForge mirror-selection page.
+        """
+        import random
+        import urlparse
+        import urllib
+        mirror_regex = re.compile(r'HREF=(/.*?\?use_mirror=[^>]*)')
+        urls = [m.group(1) for m in mirror_regex.finditer(sf_page)]
+        if not urls:
+            raise RuntimeError(
+                "URL appears to be a Sourceforge mirror page, but no URLs found")
+        url = urlparse.urljoin(source_url, random.choice(urls))
+        f = urllib.urlopen(url)
+        mirror_page = f.read()
+        f.close()
+        match = re.search(r'
At 08:45 PM 5/29/2005 -0500, Ian Bicking wrote:
> Attached is a patch so it can install from Sourceforge Download pages, choosing a random mirror.
Yay! I've incorporated more or less the same patch, after an unsuccessful experiment trying to pull the 'refresh:' from the HTTP headers. (Firefox showed it in the "View Page Info", so I thought it was there, and in fact it is only in the HTML. Oh well.)

So, have you changed your opinion at all about the value of screenscraping as a way to build a Python package tool? I notice you've sent two patches today that apply regexes to HTML. :)

I am a *little* concerned about the sourceforge support, given that they could change their download system any time, and if easy_install is distributed with Python that might make it harder to upgrade. But, at least people have the option of subclassing.
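For readers following the point about the refresh living in the HTML rather than in the HTTP headers, here is a minimal sketch of pulling a meta-refresh target out of a page body. It is not the code that went into easy_install; it is written for present-day Python 3, and the function name `find_meta_refresh`, the assumed attribute order (`http-equiv` before `content`), and the example hosts are all assumptions made for illustration.

```python
import re
from urllib.parse import urljoin

# Assumption: the page carries a tag of the form
#   <meta http-equiv="refresh" content="N; url=TARGET">
# with http-equiv before content. Real pages may order attributes differently.
META_REFRESH = re.compile(
    r'<meta\s+http-equiv=["\']?refresh["\']?\s+'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.I,
)

def find_meta_refresh(base_url, page):
    """Return the absolute refresh target found in page, or None."""
    m = META_REFRESH.search(page)
    if m is None:
        return None
    # Resolve relative targets against the URL the page was fetched from.
    return urljoin(base_url, m.group(1))
```

Returning None when no tag is found leaves it to the caller to decide whether a missing refresh is an error or just an ordinary HTML page.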
Phillip J. Eby wrote:
> I've incorporated more or less the same patch, after an unsuccessful experiment trying to pull the 'refresh:' from the HTTP headers. (Firefox showed it in the "View Page Info", so I thought it was there, and in fact it is only in the HTML. Oh well.)
>
> So, have you changed your opinion at all about the value of screenscraping as a way to build a Python package tool? I notice you've sent two patches today that apply regexes to HTML. :)
Yeah... now you're making me second-guess myself ;) I dunno... I just want stuff to work, and if a better solution comes along later that's fine too. I've always expected special code just for SF, since they are a big-and-annoying source of downloads. The whole thing tends to be stupid for Python code anyway, which usually isn't large enough to justify the complexity of mirroring systems (for example the zpt package is 35 KB, and I'm sure the mirroring system takes far more resources to provide). With PyPI getting file hosting, hopefully this kind of thing will go away -- which is why I don't think a general solution (outside of SF) is necessary, because SF is an anachronistic style of distribution.
> I am a *little* concerned about the sourceforge support, given that they could change their download system any time, and if easy_install is distributed with Python that might make it harder to upgrade. But, at least people have the option of subclassing.
Yeah, I thought about that too. In practice SF doesn't change much. A better set of regexes might look for a hostname of prdownloads.(sf|sourceforge).net, and then any href=(".*?\?use_mirror=[^"]*"|.*?\?use_mirror=[^ >]*), both case insensitive, which is probably a little less fragile. There's a good chance if they ever change it that they'll provide documented APIs, since I'm sure there are a lot of screen scrapers similar to this one out there.

--
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
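The looser matching suggested above could be sketched roughly as follows. This is an illustration in present-day Python 3, not the code from the patch (which targeted 2005-era Python 2); the names `is_sf_download_url` and `find_mirror_urls` are hypothetical, and the href pattern is slightly tightened from the one in the message so that a match cannot run across several attributes.

```python
import re
from urllib.parse import urljoin, urlsplit

# Host check: prdownloads.sf.net or prdownloads.sourceforge.net, any case.
SF_HOST = re.compile(r'^prdownloads\.(sf|sourceforge)\.net$', re.I)

# Either href="...?use_mirror=..." (quoted) or href=...?use_mirror=...
# (unquoted), matched case-insensitively.
MIRROR_HREF = re.compile(
    r'href=(?:"([^"]*?\?use_mirror=[^"]*)"|([^\s">]*?\?use_mirror=[^\s>]*))',
    re.I,
)

def is_sf_download_url(url):
    """True if url points at a SourceForge download host."""
    host = urlsplit(url).hostname or ''
    return bool(SF_HOST.match(host))

def find_mirror_urls(source_url, page):
    """Return absolute mirror URLs found in a mirror-selection page."""
    urls = []
    for m in MIRROR_HREF.finditer(page):
        # Exactly one of the two groups is set, depending on quoting.
        href = m.group(1) or m.group(2)
        urls.append(urljoin(source_url, href))
    return urls
```

A caller would check `is_sf_download_url(url)` first and then pick one entry from `find_mirror_urls(url, page)`, for example with `random.choice`, raising an error if the list is empty as the original patch does.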