urllib.urlretrieve never returns???
John Nagle
nagle at animats.com
Tue Mar 20 16:18:24 EDT 2012
On 3/17/2012 9:34 AM, Chris Angelico wrote:
> 2012/3/18 Laszlo Nagy<gandalf at shopzeus.com>:
>> In the latter case, "log.txt" only contains "#1" and nothing else. If I look
>> at pythonw.exe from task manager, then it shows +1 thread every time I
>> click the button, and "#1" is appended to the file.
Does it fail to retrieve on all URLs, or only on some of them?
Running a web crawler, I've seen some pathological cases.
A very few sites emit data very, very slowly but never trip a
timeout, because each read still makes a little progress. There
are also some sites where attempting to negotiate an SSL connection
results in the SSL protocol reaching a point where the host end
is supposed to finish the handshake, but it never does.
The odds are against this being the problem. I see problems
like that in maybe 1 in 100,000 URLs.
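
If you want to rule those two cases out, here is a rough sketch of one
way to guard against them. It is written against Python 3's
urllib.request rather than the urllib.urlretrieve in the original
question, and the names fetch, copy_with_deadline, DeadlineExceeded,
socket_timeout and total_deadline are made up for illustration. The
idea: a socket timeout bounds each blocking operation (which, as far as
I know, includes a stalled SSL handshake), and a separate overall
deadline catches the server that trickles data just fast enough to
never hit the socket timeout.

```python
import time
import urllib.request


class DeadlineExceeded(Exception):
    """Raised when the whole transfer takes longer than allowed."""


def copy_with_deadline(response, out, total_deadline,
                       chunk_size=8192, clock=time.monotonic):
    """Copy response to out in chunks; give up after total_deadline seconds.

    The deadline is only checked between reads, so a single blocked
    read can still overrun it by up to the socket timeout.
    """
    start = clock()
    copied = 0
    while True:
        if clock() - start > total_deadline:
            raise DeadlineExceeded(
                "transfer exceeded %.1f s" % total_deadline)
        chunk = response.read(chunk_size)
        if not chunk:
            return copied  # end of stream
        out.write(chunk)
        copied += len(chunk)


def fetch(url, dest_path, socket_timeout=10.0, total_deadline=60.0):
    """Download url to dest_path with both kinds of limit applied."""
    # timeout= applies to each blocking socket operation (connect,
    # each recv), not to the download as a whole.
    with urllib.request.urlopen(url, timeout=socket_timeout) as response:
        with open(dest_path, "wb") as out:
            return copy_with_deadline(response, out, total_deadline)
```

Calling fetch() from the worker thread then either returns the byte
count or raises (a socket timeout, a URLError, or DeadlineExceeded), so
the thread ends either way instead of hanging. The limits above are
arbitrary placeholders; pick values that suit the sites involved.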
John Nagle