SGMLParser eats &auml; etc

Anders Eriksson ameLista at telia.com
Mon Dec 1 13:03:35 EST 2003


On 30 Nov 2003 00:53:28 +0000, John J. Lee wrote:

> You probably want to use HTMLParser.HTMLParser instead (NOT the same
> thing as htmllib.HTMLParser, note).  It knows about XHTML, sgmllib &
> htmllib don't.

&aring; etc. isn't specific to XHTML, is it? AFAIK it is defined in HTML 4.

The strange thing is that the character entity reference (e.g. &aring;) is
stripped from the text. I don't want to change it, since I'm feeding the
output to a browser.

I will try HTMLParser instead, but it seems to me that there is a bug in
SGMLParser...
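
For reference, here is a minimal sketch of the behaviour I am after: entity
references go through to the output untouched. It assumes the Python 3
html.parser module (the successor of the HTMLParser module mentioned above),
not sgmllib. The names PassThroughParser and extract_text are hypothetical
and only serve the illustration.

```python
from html.parser import HTMLParser


class PassThroughParser(HTMLParser):
    """Collect text and re-emit entity/character references verbatim."""

    def __init__(self):
        # convert_charrefs=False stops the parser from decoding references
        # itself, so the handle_entityref/handle_charref hooks are called.
        super().__init__(convert_charrefs=False)
        self.chunks = []

    def handle_data(self, data):
        # Plain text between tags and references.
        self.chunks.append(data)

    def handle_entityref(self, name):
        # Named reference such as &aring;: write it back unchanged.
        self.chunks.append("&%s;" % name)

    def handle_charref(self, name):
        # Numeric reference such as &#229; or &#xE5;: write it back unchanged.
        self.chunks.append("&#%s;" % name)

    def text(self):
        return "".join(self.chunks)


def extract_text(markup):
    """Hypothetical helper: return the text of markup, references intact."""
    parser = PassThroughParser()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

Tags are dropped here because no tag handlers are defined; a real filter
would presumably write them back out as well.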

// Anders



