How to test a URL request in a "while True" loop
MRAB
python at mrabarnett.plus.com
Wed Dec 30 20:08:04 EST 2009
Brian D wrote:
> Thanks MRAB as well. I've printed all of the replies to retain with my
> pile of essential documentation.
>
> To follow up with a complete response, I'm ripping out of my mechanize
> module the essential components of the solution I got to work.
>
> The main body of the code passes a URL to the scrape_records function.
> The function attempts to open the URL up to five times.
>
> If the URL is opened, a values dictionary is populated and returned to
> the calling statement. If the URL cannot be opened, a fatal error is
> printed and the module terminates. There's a little sleep call in the
> function to leave time for any errant connection problem to resolve
> itself.
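
For anyone reading this later, the retry pattern described above can be sketched without mechanize. This is only an illustration, written for Python 3 rather than the Python 2 used in the quoted code. The names open_with_retries and make_flaky_opener are hypothetical, and the opener argument stands in for a call such as br.open so that the example runs without a network connection.

```python
import time
import urllib.error


def open_with_retries(url, opener, max_attempts=5, delay=1.0):
    """Call opener(url) up to max_attempts times; return its result or None.

    opener is a hypothetical stand-in for something like br.open.
    """
    for attempt in range(max_attempts):
        try:
            response = opener(url)
            break  # success: this skips the for-loop's else clause
        except urllib.error.URLError as exc:
            print('URL error', attempt, exc)
            time.sleep(delay)  # give a transient problem time to clear
    else:
        # Runs only when the loop finished without reaching break.
        print('Fatal URL error. Giving up.')
        return None
    return response


def make_flaky_opener(failures):
    """Return a fake opener that raises URLError 'failures' times, then succeeds."""
    state = {'calls': 0}

    def opener(url):
        state['calls'] += 1
        if state['calls'] <= failures:
            raise urllib.error.URLError('simulated failure')
        return 'contents of ' + url

    return opener


# Two simulated failures, then success on the third attempt.
result = open_with_retries('http://example.invalid', make_flaky_opener(2), delay=0)

# Every attempt fails, so the else clause runs and None comes back.
always_fail = open_with_retries('http://example.invalid', make_flaky_opener(99),
                                max_attempts=3, delay=0)
```

The else clause belongs to the for loop, not to the try statement, which is why it only runs when every attempt has failed.
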
>
> Thanks to all for your replies. I hope this helps someone else:
>
> import urllib2, time
> from mechanize import Browser
>
> def scrape_records(url):
>     maxattempts = 5
>     br = Browser()
>     user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16) Gecko/2009120208 Firefox/3.0.16 (.NET CLR 3.5.30729)'
>     br.addheaders = [('User-agent', user_agent)]
>     for count in xrange(maxattempts):
>         try:
>             print url, count
>             br.open(url)
>             break
>         except urllib2.URLError:
>             print 'URL error', count
>             # Pretend a failed connection was fixed
>             if count == 2:
>                 url = 'http://www.google.com'
>             time.sleep(1)
>             pass
'pass' isn't necessary: the 'except' block already contains other statements.
>     else:
>         print 'Fatal URL error. Process terminated.'
>         return None
>     # Scrape page and populate valuesDict
>     valuesDict = {}
>     return valuesDict
>
> url = 'http://badurl'
> valuesDict = scrape_records(url)
> if valuesDict == None:
When comparing against a singleton such as None, use "is" or "is not"
instead of "==" or "!=".
>     print 'Failed to retrieve valuesDict'
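
A small sketch of why the distinction matters. "==" asks the object's __eq__ method, which a class can define however it likes, while "is" tests identity with the one None object. The AlwaysEqual class below is hypothetical and exists only for this demonstration.

```python
class AlwaysEqual(object):
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()
equals_none = (value == None)   # __eq__ decides, and it answers True
is_none = (value is None)       # identity test: value is not the None object

# An empty dict, which is what scrape_records returns on success in the
# quoted code, is falsy but is still not None.
empty = {}
empty_is_falsy = not empty
empty_is_none = (empty is None)
```

The last two lines also show why "if not valuesDict:" would be the wrong test here: it would treat a successful but empty result as a failure.
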