[Distutils] EasyInstall: SF installs
Ian Bicking
ianb at colorstudy.com
Mon May 30 06:57:04 CEST 2005
Phillip J. Eby wrote:
> I've incorporated more or less the same patch, after an unsuccessful
> experiment trying to pull the 'refresh:' from the HTTP headers.
> (Firefox showed it in the "View Page Info", so I thought it was there,
> and in fact it is only in the HTML. Oh well.)
>
> So, have you changed your opinion at all about the value of
> screenscraping as a way to build a Python package tool? I notice you've
> sent two patches today that apply regexes to HTML. :)
Yeah... now you're making me second-guess myself ;) I dunno... I just
want stuff to work, and if a better solution comes along later that's
fine too. I've always expected special code just for SF, since they are
a big-and-annoying source of downloads. The whole thing tends to be
stupid for Python code anyway, which usually isn't large enough to
justify the complexity of mirroring systems (for example, the zpt package
is 35 KB, and I'm sure the mirroring system takes far more resources to
provide). With PyPI getting file hosting, hopefully this kind of thing
will go away -- which is why I don't think a general solution (outside
of SF) is necessary, because SF is an anachronistic style of distribution.
> I am a *little* concerned about the sourceforge support, given that they
> could change their download system any time, and if easy_install is
> distributed with Python that might make it harder to upgrade. But, at
> least people have the option of subclassing.
Yeah, I thought about that too. In practice SF doesn't change much. A
better set of regexes might look for a hostname of
prdownloads.(sf|sourceforge).net, and then any
href=(".*?\?use_mirror=[^"]*"|.*?\?use_mirror=[^ >]*), both case
insensitive, which is probably a little less fragile. If they ever
change it, there's a good chance they'll provide documented APIs,
since I'm sure there are a lot of screen scrapers similar to this one
out there.
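
Roughly, the two checks described above might look like the sketch below.
This is not the easy_install patch itself. The function names, the use of
Python 3's urllib.parse, and the sample URLs are all assumptions made for
illustration; only the two patterns come from this message.

```python
import re
from urllib.parse import urlparse

# Hostname check: prdownloads.sf.net or prdownloads.sourceforge.net.
SF_HOST = re.compile(r'^prdownloads\.(sf|sourceforge)\.net$', re.IGNORECASE)

# href check: a quoted or unquoted attribute value containing "?use_mirror=".
MIRROR_HREF = re.compile(
    r'href=(".*?\?use_mirror=[^"]*"|.*?\?use_mirror=[^ >]*)',
    re.IGNORECASE,
)


def is_sf_download(url):
    """Return True if the URL's host looks like an SF download host."""
    host = urlparse(url).hostname or ''
    return bool(SF_HOST.match(host))


def find_mirror_links(html):
    """Return every href value in the page that carries ?use_mirror=."""
    # Strip the surrounding quotes when the quoted alternative matched.
    return [m.group(1).strip('"') for m in MIRROR_HREF.finditer(html)]


if __name__ == '__main__':
    # Made-up page fragment, one link per line.
    page = (
        '<a href="/zpt/example.tar.gz?use_mirror=aaa">mirror A</a>\n'
        '<A HREF=/zpt/example.tar.gz?use_mirror=bbb>mirror B</A>\n'
    )
    print(is_sf_download('http://prdownloads.sourceforge.net/zpt/example.tar.gz'))
    print(find_mirror_links(page))
```

Note that the lazy ".*?" can still run past the end of one attribute into a
later one on the same line, so this is only less fragile, not robust.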
--
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org