SGMLParser eats ä etc
Anders Eriksson
ameLista at telia.com
Mon Dec 1 13:03:35 EST 2003
On 30 Nov 2003 00:53:28 +0000, John J. Lee wrote:
> You probably want to use HTMLParser.HTMLParser instead (NOT the same
> thing as htmllib.HTMLParser, note). It knows about XHTML, sgmllib &
> htmllib don't.
&aring; etc. aren't XHTML, are they? AFAIK they are defined in HTML 4.
The strange thing is that the character entity reference (e.g. &aring;) is
stripped from the text. I don't want it converted, since I'm feeding the
output to a browser.
I will try HTMLParser instead, but it seems to me that there is a bug in
SGMLParser...
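
For reference, a minimal sketch of the pass-through behaviour described above: every entity reference is written back out unchanged, so the browser still receives &aring; rather than nothing. It assumes the Python 3 spelling of the module (html.parser.HTMLParser, which was HTMLParser.HTMLParser in Python 2). The names PassThroughParser and passthrough are made up for illustration.

```python
# Python 3 module name; in Python 2 this class lived in HTMLParser.HTMLParser.
from html.parser import HTMLParser


class PassThroughParser(HTMLParser):
    """Hypothetical helper: re-emits markup with entity references intact."""

    def __init__(self):
        # convert_charrefs=False makes the parser report entity and
        # character references through their own handlers instead of
        # decoding them into the text data.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Reuse the tag text exactly as it appeared in the input.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are also copied verbatim.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        # Named reference, e.g. name == "aring": write it back as &aring;
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        # Numeric reference, e.g. name == "229": write it back as &#229;
        self.out.append("&#%s;" % name)


def passthrough(html):
    parser = PassThroughParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(passthrough("<p>sm&aring; &auml;pplen &#229;</p>"))
```

The same idea should carry over to a subclass of sgmllib.SGMLParser by overriding its entity-reference handler, but that variant is not shown or tested here.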
// Anders