How to test a URL request in a "while True" loop
Brian D
briandenzer at gmail.com
Wed Dec 30 20:17:19 EST 2009
On Dec 30, 7:08 pm, MRAB <pyt... at mrabarnett.plus.com> wrote:
> Brian D wrote:
> > Thanks MRAB as well. I've printed all of the replies to retain with my
> > pile of essential documentation.
>
> > To follow up with a complete response, I'm ripping out of my mechanize
> > module the essential components of the solution I got to work.
>
> > The main body of the code passes a URL to the scrape_records function.
> > The function attempts to open the URL five times.
>
> > If the URL is opened, a values dictionary is populated and returned to
> > the calling statement. If the URL cannot be opened, a fatal error is
> > printed and the function returns None. There's a little sleep call in the
> > function to leave time for any errant connection problem to resolve
> > itself.
>
> > Thanks to all for your replies. I hope this helps someone else:
>
> > import urllib2, time
> > from mechanize import Browser
>
> > def scrape_records(url):
> >     maxattempts = 5
> >     br = Browser()
> >     user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16) Gecko/2009120208 Firefox/3.0.16 (.NET CLR 3.5.30729)'
> >     br.addheaders = [('User-agent', user_agent)]
> >     for count in xrange(maxattempts):
> >         try:
> >             print url, count
> >             br.open(url)
> >             break
> >         except urllib2.URLError:
> >             print 'URL error', count
> >             # Pretend a failed connection was fixed
> >             if count == 2:
> >                 url = 'http://www.google.com'
> >             time.sleep(1)
> >             pass
>
> 'pass' isn't necessary.
>
> >     else:
> >         print 'Fatal URL error. Process terminated.'
> >         return None
> >     # Scrape page and populate valuesDict
> >     valuesDict = {}
> >     return valuesDict
>
> > url = 'http://badurl'
> > valuesDict = scrape_records(url)
> > if valuesDict == None:
>
> When checking whether or not something is a singleton, such as None, use
> "is" or "is not" instead of "==" or "!=".
>
> >     print 'Failed to retrieve valuesDict'
>
>
I'm definitely acquiring some well-deserved schooling -- and it's
really appreciated. I'd seen the "is/is not" preference before, but it
just didn't stick.
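
A minimal sketch of why the "is" form is preferred for None. The class
AlwaysEqual is hypothetical and exists only for illustration: "==" can be
overridden by an object's __eq__ method, while "is" compares identity and
cannot be overridden.

```python
class AlwaysEqual(object):
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

# "==" consults obj.__eq__, so this comparison is misleadingly true.
equal_to_none = (obj == None)

# "is" checks identity only; obj is not the None singleton.
is_none = (obj is None)
```

Here equal_to_none ends up True and is_none ends up False, so only the
identity test reports correctly that obj is not None.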
I see now that "pass" is redundant -- thanks for catching that.
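
A rough sketch of the same for/else retry pattern with the redundant "pass"
dropped and the None check written with "is". It is written for Python 3,
unlike the Python 2 code quoted above, and does not touch the network: the
names open_with_retries, opener and flaky_opener are assumptions made up for
this example, with flaky_opener standing in for br.open. It catches OSError,
of which urllib.error.URLError is a subclass in Python 3.

```python
import time


def open_with_retries(opener, url, max_attempts=5, delay=0.0):
    """Call opener(url) up to max_attempts times.

    Returns the opener's result, or None if every attempt raised OSError.
    """
    for attempt in range(max_attempts):
        try:
            result = opener(url)
            break  # success: leaving via break skips the else clause
        except OSError:
            # The except body already has statements, so no 'pass' is needed.
            time.sleep(delay)
    else:
        # Reached only when the loop finished without a break.
        return None
    return result


# Hypothetical stand-in for br.open: fails twice, then succeeds.
calls = []


def flaky_opener(url):
    calls.append(url)
    if len(calls) < 3:
        raise OSError("simulated connection failure")
    return "page body"


page = open_with_retries(flaky_opener, "http://example.invalid")
if page is None:
    print("Failed to retrieve page")
```

The else clause belongs to the for loop, not to the try statement, which is
why it runs only after all attempts have failed.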
Cheers.