Urllib2, problems with a webserver
John J. Lee
jjl at pobox.com
Tue Aug 31 22:04:58 CEST 2004
Erling Ringen Elvsrud <ere.lists at killozapHALLO.com.invalid> writes:
> HTMLParser.HTMLParseError: malformed start tag, at line 2, column 1365
> How come I get this error?
Bad HTML. (OK, I haven't actually looked at the HTML, but it's 100/1
that HTMLParser is at fault.)
I hope eventually to rewrite mechanize to use htmllib.HTMLParser
everywhere, and not use HTMLParser.HTMLParser. The former is less
fussy. That just means rewriting pullparser to support both classes,
I think. Not too hard (see ClientForm for how to do it -- why not
write a patch?-).
In the meantime, the best thing to do is to pre-process the HTML.
Inconvenient, I know. Also a bit inconvenient is that the only way to
do this ATM with mechanize is to write a tiny urllib2 handler class
(.http_response() is the handler method you want, which only exists in
the as-yet-unreleased Python 2.4, and in ClientCookie, which has a
near-identical interface to urllib2; mechanize uses ClientCookie, not
urllib2). See posts to the wwwsearch-general mailing list for sample code.
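In today's Python 3, urllib2 lives on as urllib.request, and the handler mechanism described above still works the same way. A handler method named http_response(request, response) is called on every HTTP response and must return a response object. What follows is a minimal sketch under stated assumptions, not the method used with mechanize or ClientCookie. HTMLPreprocessor and fix_html are hypothetical names. The regex in fix_html repairs only one made-up defect (a doubled quote after an href value), so replace it with whatever clean-up your pages actually need.

```python
import io
import re
import urllib.request
import urllib.response


def fix_html(body):
    """Hypothetical clean-up: drop a doubled quote after an href value,
    e.g. <a href="x""> becomes <a href="x">. Adapt to your own pages."""
    return re.sub(rb'(href="[^"]*")"', rb"\1", body)


class HTMLPreprocessor(urllib.request.BaseHandler):
    """Response processor that rewrites HTML bodies before any parser sees them."""

    def __init__(self, fixup):
        self.fixup = fixup  # callable taking and returning bytes

    def http_response(self, request, response):
        headers = response.info()
        # Leave images, PDFs and other non-HTML responses alone.
        if headers.get_content_type() != "text/html":
            return response
        body = self.fixup(response.read())
        response.close()
        # The cleaned body may differ in length from the original.
        if "Content-Length" in headers:
            del headers["Content-Length"]
            headers["Content-Length"] = str(len(body))
        # Wrap the cleaned bytes in a fresh file-like response object.
        new = urllib.response.addinfourl(
            io.BytesIO(body), headers, response.url, response.getcode()
        )
        # HTTPErrorProcessor (handler_order 1000) runs after this handler
        # (default order 500) and reads .code and .msg, so copy the reason.
        new.msg = getattr(response, "msg", "")
        return new

    # HTTPS responses go through the same processing.
    https_response = http_response


# Usage (example.com is a placeholder URL):
#   opener = urllib.request.build_opener(HTMLPreprocessor(fix_html))
#   html = opener.open("http://example.com/").read()
```

Keeping the default handler_order means the body is cleaned before urllib's own error and redirect processing. A redirected page therefore goes through the fix-up exactly once, in its own trip through the handler chain.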
Don't mix urllib2 and ClientCookie, BTW (with the exception of classes
that exist in urllib2 but not in ClientCookie: you can use those
urllib2 classes with ClientCookie).