A problem while using urllib
steve at holdenweb.com
Wed Oct 12 09:02:51 CEST 2005
Johnny Lee wrote:
> Alex Martelli wrote:
>>Johnny Lee <johnnyandfiona at hotmail.com> wrote:
>>> try:
>>>     webPage = urllib2.urlopen(url)
>>> except urllib2.URLError:
>>>     return True
>>> But every time the program gets to around the 70th to 75th URL (that
>>>is, after 70-75 URLs have been tested this way), it crashes, and all
>>>the remaining URLs raise urllib2.URLError until the program exits. I
>>>tried many ways to work around it: using urllib instead, and adding a
>>>sleep(1) in the filter (I thought the sheer number of URLs was crashing
>>>the program). But nothing works.
>>>BTW, if I set the base URL to the one at which the program crashed, the
>>>program still crashes at the 70th-75th URL. How can I solve this
>>>problem? Thanks for your help.
>>Sure looks like a resource leak somewhere (probably leaving a file open
>>until your program hits some wall of maximum simultaneously open files),
>>but I can't reproduce it here (MacOSX, tried both Python 2.3.5 and
>>2.4.1). What version of Python are you using, and on what platform?
>>Maybe a simple Python upgrade might fix your problem...
> Thanks for the info you provided. I'm using 2.4.1 on Cygwin on WinXP.
> If you want to reproduce the problem, I can send the source to you.
> This morning I found that this is caused by urllib2. When I use urllib
> instead of urllib2, it doesn't crash any more. But the problem is that I
> want to catch the HTTP 404 error, which is handled by FancyURLopener in
> urllib.urlopen(). So I can't catch it.
I'm using exactly that configuration, so if you let me have that source
I could take a look at it for you.
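
In case it helps while the real source is unavailable, here is a minimal
sketch of the two ideas raised above: closing every response so that no
file or socket is left open, and telling an HTTP 404 apart from other
failures. It is written against the Python 3 names (urllib.request and
urllib.error, which replaced urllib2), and the function name url_status
and its opener parameter are hypothetical, chosen only for illustration.
It is not known whether this addresses the crash described in the thread.

```python
import urllib.error
import urllib.request


def url_status(url, opener=urllib.request.urlopen):
    """Return 'ok', 'not found', or 'unreachable' for url.

    `opener` is a hypothetical hook (an assumption of this sketch) so the
    function can be exercised without network access; by default it is
    urllib.request.urlopen.
    """
    try:
        response = opener(url)
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        # The error object also wraps the server's response; release it.
        err.close()
        if err.code == 404:
            return "not found"
        return "unreachable"
    except urllib.error.URLError:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return "unreachable"
    # Close explicitly rather than waiting for garbage collection, so that
    # checking many URLs in a loop does not accumulate open sockets.
    response.close()
    return "ok"
```

Calling url_status(url) once per URL in the filter loop would keep at
most one response open at a time, and a 404 can then be handled
separately from a host that could not be reached.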
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/