waiting for html to load: a followup
Josh
joshl at commenspace.org
Thu Aug 26 12:53:44 EDT 2004
Hi - A couple days ago I posted asking for help on how to download a
pushed file. I am trying to write a script to download a bunch of links
from a page that takes a while to load.
I managed to get just about everything done using python to load IE, but
aside from not really liking that style, I couldn't figure out how to
have python download the pushed file, or how to read IE headers into
python (the headers point to the download location).
Anyway, I decided to forget IE and I am now trying to use urllib2 to
open up the page, read it, etc. My problem is the page has a built-in
refresh and I don't know how to have python re-read the page until it's
ready to hand over the links.
An example of the page is:
http://edcw2ks23.cr.usgs.gov/Website/zipship/waiting.jsp?areaList=49.0,47.0,-122.0,-124.08&prodList=NED,
I believe I need to read the header, grab the cookie session id, and add
it back to the header. I can do all this, but I'm stuck on probably
very simple syntax to re-read the page rather than open a new
connection, if that makes sense (I'm new to http as well as python).
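The cookie step described above can be sketched without slicing off a fixed number of characters. This is only an illustration, assuming the Set-Cookie value has the usual "name=value; attribute; ..." shape; the helper name session_cookie and the example header value are made up and are not taken from the real server.

```python
def session_cookie(set_cookie_value):
    """Return the leading 'name=value' pair of a Set-Cookie header value."""
    # Attributes such as Path or Expires follow the first ';' and are not
    # sent back to the server, so only the part before it is kept.
    return set_cookie_value.split(';', 1)[0].strip()


# Hypothetical header value, made up for illustration only
example = 'JSESSIONID=abc123; Path=/'
print(session_cookie(example))
```

The returned string is what would go into the 'Cookie' request header on later requests.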
My code snippets:
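import urllib2
from time import sleep

# url holds the waiting.jsp address shown above
myreq = urllib2.Request(url)
myreq.add_header('User-Agent',
                 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')
opener = urllib2.build_opener()

# First request: pick up the session cookie from the response headers
feeddata = opener.open(myreq)
headers = feeddata.info()
cookie = headers['set-cookie']
cookie = cookie[:-8]    # drop the last 8 characters of the header value
myreq.add_header('Cookie', cookie)

# Request the page again, sending the cookie back each time
x = 0
while x < 10:
    feeddata = opener.open(myreq)
    data = feeddata.read()
    print data[1600:1650]
    print '\n\n\n\n*****************Using Cookie: %s' % cookie
    print '****************Header info: \n', headers
    sleep(3)
    x = x + 1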
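A browser that follows a page refresh sends a fresh request each time, so "re-reading" the page most likely comes down to requesting the same URL again with the same session cookie until the content changes. The sketch below shows that polling pattern. It is written for Python 3, where urllib2 lives on as urllib.request. The helper names (make_fetcher, poll_until_ready, looks_ready) are hypothetical, and the readiness test (a ".zip" link appearing in the page) is purely an assumption about what the finished page contains.

```python
import time
import urllib.request


def make_fetcher(url, cookie):
    """Build a function that requests url again with the session cookie."""
    def fetch():
        req = urllib.request.Request(url)
        req.add_header('User-Agent',
                       'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')
        req.add_header('Cookie', cookie)
        with urllib.request.urlopen(req) as resp:
            # latin-1 maps every byte to a character, so decoding cannot fail
            return resp.read().decode('latin-1')
    return fetch


def poll_until_ready(fetch, is_ready, attempts=10, delay=3):
    """Call fetch() until is_ready(page) is true; return the page or None."""
    for attempt in range(attempts):
        page = fetch()
        if is_ready(page):
            return page
        # No need to wait after the final attempt
        if attempt < attempts - 1:
            time.sleep(delay)
    return None


def looks_ready(page):
    # Assumption: the finished page links to a .zip file
    return '.zip' in page
```

With a url and cookie in hand, the call would look like poll_until_ready(make_fetcher(url, cookie), looks_ready). Keeping the fetch step separate from the loop means the loop can be tried out with a stand-in function before pointing it at the real server.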
Any help greatly appreciated. Thanks in advance, and when I know what
I'm doing I'll repay the favors.
-Josh