[ann] CGI Link Checker 0.1
Christopher T King
squirrel at WPI.EDU
Tue Jul 13 11:26:01 EDT 2004
On Tue, 13 Jul 2004, Christopher T King wrote:
> > 2. Slow: I don't know how to make the script perform better. I've tried
> > to look into the code to make it run faster, but I couldn't do so.
>
> For the same reason as above (time is spent mostly checking the links) I
> don't think tweaking the code will help much in this case. I was going to
> suggest checking if urllib2 uses read-ahead buffering, but a quick check
> reveals it doesn't do any... perhaps the culprit is in the HTML parsing?
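One way to test the HTML-parsing theory is to time the fetch and the parse
separately. This is a minimal sketch, assuming Python 3, where urllib2
became urllib.request and the HTMLParser module became html.parser;
LinkExtractor, extract_links and time_fetch_and_parse are hypothetical
names, not part of the link checker itself.

```python
import time
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def time_fetch_and_parse(url, timeout=10.0):
    """Return (links, seconds spent fetching, seconds spent parsing)."""
    start = time.perf_counter()
    with urlopen(url, timeout=timeout) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    fetched = time.perf_counter()
    links = extract_links(html)
    parsed = time.perf_counter()
    return links, fetched - start, parsed - fetched
```

If the fetch time dwarfs the parse time on typical pages, the parser
isn't where the time goes.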
A further thought on the issue: the W3C's link checker might be
multithreaded, allowing it to check multiple links at the same time
rather than waiting for each server to respond in turn. This may or may
not help in Python; Python doesn't play well with multithreading (its
global interpreter lock lets only one thread execute Python code at a
time), so whether or not you see a speedup with this method depends on
whether the socket module is smart enough to release the interpreter
lock while it waits on the network (my guess is it is). Otherwise, to
get the same effect, you'd have to use the socket module directly for
link checking, in concert with the select module, which will likely get
quite messy.
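Here is a minimal sketch of the threaded approach, assuming Python 3's
concurrent.futures (which postdates this post). check_url and
check_links are hypothetical helpers, and sending HEAD requests instead
of GET is an assumption, not necessarily what the CGI Link Checker or
the W3C checker does. In CPython, blocking socket calls release the
interpreter lock while they wait, so the threads overlap their network
waits.

```python
import concurrent.futures
import urllib.error
import urllib.request


def check_url(url, timeout=10.0):
    """Return (url, status): an HTTP status code or an error string."""
    # HEAD asks for headers only, so no page body is downloaded.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return url, resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for error statuses such as 404.
        return url, exc.code
    except OSError as exc:
        # URLError, timeouts and socket errors are all OSError subclasses.
        return url, f"error: {exc}"


def check_links(urls, check=check_url, max_workers=8):
    """Check urls concurrently; results come back in input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(check, urls))
```

Some servers mishandle HEAD (answering 405, for instance), so a real
checker may want to retry those links with GET.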