[XML-SIG] "Character reference too large" error with HtmlLib.Reader()
Lars Marius Garshol
larsga@garshol.priv.no
31 Jul 2002 14:15:02 +0200
* Lars Marius Garshol
|
| This sounds like an obvious bug. I suggest you make the smallest
| document you can that reproduces the error, and then report this as
| a bug in the PyXML Sourceforge project (it seems to be in sgmlop,
| which I don't think is part of Python proper), attaching the file to
| it.
* Martin v. Loewis
|
| It turns out that the bug is not that obvious. sgmlop cannot return
| a Unicode string, since, in SGML mode, it would have to know what the
| character set for character references is.
That I understand, but it shouldn't just say that the reference is too
big. So the error message, at least, has to be improved.
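
To make the character set point concrete, here is a minimal sketch. It
assumes a parser that only knows the reference number and has to guess
the document character set. The function name resolve_charref is
hypothetical and is not part of sgmlop or PyXML.

```python
def resolve_charref(number, charset):
    """Interpret a numeric character reference in a given 8-bit charset.

    Hypothetical helper: in SGML mode the number refers to the document
    character set, so the same number can name different characters.
    """
    code = int(number)
    if code > 255:
        # A single-byte charset cannot represent this reference at all.
        raise ValueError("character reference too large: %s" % number)
    return bytes([code]).decode(charset)


# The same reference, &#233;, under two different document character sets.
latin = resolve_charref("233", "latin-1")
cyrillic = resolve_charref("233", "cp1251")
```

The two results differ, which is why the parser cannot pick a Unicode
character on its own without being told the character set.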
| Instead, this was a bug in xml.dom.reader.SgmlOp.HtmlParser, which
| failed to implement handle_charref (sgmlop only tries to interpret
| the character references itself if handle_charref is not
| implemented).
Sounds reasonable to me.
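
The dispatch Martin describes can be sketched roughly as follows. This
is a toy stand-in written for illustration, not the sgmlop or SgmlOp.py
code: feed(), PlainHandler, CharRefHandler and CharRefTooLarge are all
hypothetical names, and only the handle_charref and handle_data method
names follow the convention mentioned in the thread.

```python
import re

_CHARREF = re.compile(r"&#([0-9]+);")


class CharRefTooLarge(ValueError):
    """Hypothetical error standing in for the parser's complaint."""


def feed(text, handler):
    """Toy parser loop: split text into data and numeric character references."""
    pos = 0
    for match in _CHARREF.finditer(text):
        if match.start() > pos:
            handler.handle_data(text[pos:match.start()])
        number = match.group(1)
        if hasattr(handler, "handle_charref"):
            # The handler decodes the reference itself.
            handler.handle_charref(number)
        else:
            # Fallback: the parser can only produce single-byte characters.
            code = int(number)
            if code > 255:
                raise CharRefTooLarge("character reference too large: %s" % number)
            handler.handle_data(chr(code))
        pos = match.end()
    if pos < len(text):
        handler.handle_data(text[pos:])


class PlainHandler:
    """Handler without handle_charref, so the parser's fallback is used."""

    def __init__(self):
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


class CharRefHandler(PlainHandler):
    """Handler that implements handle_charref and maps the number to Unicode."""

    def handle_charref(self, number):
        self.chunks.append(chr(int(number)))
```

With a handler like CharRefHandler, feeding "a&#8364;b" goes through
handle_charref and no size check is hit; with PlainHandler the same
input reaches the fallback and raises the error.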
| This will be fixed in PyXML 0.8; the fix is in SgmlOp.py 1.10.
Good. :)
--
Lars Marius Garshol, Ontopian <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC <URL: http://www.garshol.priv.no >