Py 2.5: Bug in sgmllib

"Martin v. Löwis" martin at v.loewis.de
Sun Oct 22 08:54:15 EDT 2006


Michael Butscher wrote:
> Is this a bug or is SGMLParser not meant to be used for unicode strings 
> (it should be documented then)?

In a sense, SGML itself is not meant to be used for Unicode. In SGML,
the document character set is determined by the SGML application, so
which specific character a character reference refers to is also up to
the SGML application.

This entire issue is already documented; see the discussion of
convert_charref and convert_codepoint in

http://docs.python.org/lib/module-sgmllib.html
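The following is a minimal sketch for illustration only. sgmllib does not exist in Python 3, so the class below is a hypothetical standalone stand-in that borrows only the two hook names, convert_charref and convert_codepoint, from the sgmllib documentation. The accepted range (0-255), the resolve() helper, and the Cp1252Resolver subclass are assumptions of this sketch, not sgmllib's actual code. The sketch shows the point above: the same reference &#150; yields a different character depending on which document character set the application assumes.

```python
import re


class CharrefResolver:
    """Hypothetical stand-in mimicking the convert_charref / convert_codepoint
    split described in the sgmllib docs. This is not sgmllib itself."""

    def convert_charref(self, name):
        # Only decimal references are handled (an assumption of this sketch).
        try:
            n = int(name)
        except ValueError:
            return None
        # Assumed policy for this sketch: accept single-byte values only.
        if not 0 <= n <= 255:
            return None
        return self.convert_codepoint(n)

    def convert_codepoint(self, codepoint):
        # Default: treat the number as a Latin-1 / Unicode code point.
        return chr(codepoint)

    def resolve(self, text):
        # Replace each &#NNN; whose conversion succeeds; leave the rest as-is.
        def repl(match):
            replacement = self.convert_charref(match.group(1))
            return match.group(0) if replacement is None else replacement

        return re.sub(r"&#(\d+);", repl, text)


class Cp1252Resolver(CharrefResolver):
    """Hypothetical application whose document character set is cp1252."""

    def convert_codepoint(self, codepoint):
        # Interpret the number as a cp1252 byte value instead of a code point.
        try:
            return bytes([codepoint]).decode("cp1252")
        except UnicodeDecodeError:
            # Byte values undefined in cp1252: leave the reference unresolved.
            return None


print(repr(CharrefResolver().resolve("&#150;")))  # C1 control U+0096
print(repr(Cp1252Resolver().resolve("&#150;")))   # EN DASH U+2013
```

Overriding only convert_codepoint keeps the parsing of the reference separate from the decision about which character set it refers to. That decision is the part the application has to supply.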

Regards,
Martin



More information about the Python-list mailing list