mangled attempt at using htmllib
ari_deja at ivritype.com
Tue Oct 17 20:36:44 CEST 2000
Wow! I got sick for a few days and missed this very, very useful
tutorial. As it happens, my goals were slightly different than was
apparent from the code:
>> 200 OK <a href="urlstatusgo.html?col=test&url= /
>Well, I think your first misapprehension is that you appear to be
>expecting HTTP back from the urllib readlines() call, when in fact the
>HTTP is stripped off, and what *you* see is just the HTML!
I knew that this particular page would yield such lines. The idea was
to evaluate each such line and grab the URL between the anchor_bgn and
anchor_end, in the example shown, a simple
This might have been done more simply with a regular expression, e.g.,
myUrl = re.search(r'<a href.*?>(.*?)</a>', line)
because, as I seem to be discovering, the "handle_data" stuff in my
>> def handle_data(self, data):
doesn't refer to the data inside the anchor tag, which is what I
wanted, but to something else (or my current modules aren't asking for
the right thing the right way), because printing the contents of
self.c_data gives me "None" as a result.
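
For comparison, here is a minimal sketch of both approaches on one made-up
line. It is not the code from this thread: htmllib no longer ships with
Python 3, so the sketch substitutes html.parser.HTMLParser, and the names
AnchorTextParser, anchor_texts_regex and the sample line are hypothetical.
The point it illustrates is that handle_data is called for all text in the
page, so the parser has to track for itself whether it is inside an anchor.

```python
import re
from html.parser import HTMLParser

# Regular-expression route: capture the text between <a href...> and </a>.
ANCHOR_RE = re.compile(r'<a\s+href.*?>(.*?)</a>', re.IGNORECASE | re.DOTALL)

def anchor_texts_regex(html):
    """Return the text found inside each anchor, using the regex above."""
    return ANCHOR_RE.findall(html)

class AnchorTextParser(HTMLParser):
    """Collect the text inside <a>...</a>, ignoring all other text."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False   # True while between <a> and </a>
        self.current = []        # pieces of text for the anchor being read
        self.texts = []          # finished anchor texts

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True
            self.current = []

    def handle_data(self, data):
        # Called for every run of text, inside or outside an anchor.
        if self.in_anchor:
            self.current.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            self.texts.append("".join(self.current))
            self.in_anchor = False

# Hypothetical input line, loosely modelled on the "200 OK <a href=..." output.
sample = '200 OK <a href="status.html?col=test">http://example.com/</a> trailing text'

parser = AnchorTextParser()
parser.feed(sample)
parser.close()

print(anchor_texts_regex(sample))
print(parser.texts)
```

The regex is shorter for lines known to hold one simple anchor; the parser
copes better with markup that spans lines or nests other tags.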
Anyway, getting straight on the idiosyncrasies of htmllib, and being
reminded that cutting and pasting Python code almost ALWAYS requires
attention to whitespace (tabs convert oddly, and the interpreter on my
machine treats them differently from spaces, regardless of how they
look), has moved me forward in very nice, useful ways.
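
The tab-versus-space trap can be shown with a small sketch, assuming a
current Python 3 interpreter (which rejects ambiguous mixtures outright
rather than guessing). The helper name indentation_is_consistent and the
two snippets are made up for illustration.

```python
def indentation_is_consistent(source):
    """Return True if `source` compiles without a tab/space indentation error."""
    try:
        compile(source, "<pasted>", "exec")
    except TabError:
        return False
    return True

# Two snippets that look identical in an editor with 8-column tab stops.
spaces_only = "if True:\n        x = 1\n        y = 2\n"
tab_then_spaces = "if True:\n\tx = 1\n        y = 2\n"

print(indentation_is_consistent(spaces_only))
print(indentation_is_consistent(tab_then_spaces))

# str.expandtabs() rewrites tabs as spaces, one way to normalise pasted code.
print(indentation_is_consistent(tab_then_spaces.expandtabs(8)))
```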
ari at ivritype.com