Mysterious xml.sax Encoding Exception
JKPeck
JKPeck at gmail.com
Mon Feb 4 17:02:09 EST 2008
On Feb 2, 12:56 am, Jeroen Ruigrok van der Werven <asmo... at in-nomine.org> wrote:
> -On [20080201 19:06], JKPeck (JKP... at gmail.com) wrote:
>
> >In both of these cases, there are only plain, 7-bit ascii characters
> >in the xml, and it really is valid utf-16 as far as I can tell.
>
> Did you mean to say that the only characters they used in the UTF-16 encoded
> file are characters from the Basic Latin Unicode block?
>
> --
> Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
> イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/
> We have met the enemy and they are ours...
It appears that the root cause of this problem is indeed passing a
Unicode XML string to xml.sax.parseString when the XML itself declares
an encoding of utf-16. This works with the standard distribution on
Windows. It does not work with the ActiveState distribution on Windows,
even though both distributions report the same sys.maxunicode value
(65535, i.e. a 64K narrow build).
So I don't know why the results differ, but the problem is solved by
encoding the Unicode string to utf-16 bytes before passing it to the
parser.
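A minimal sketch of that workaround, assuming Python 3 (the post predates it, so this illustrates the fix rather than reproducing the ActiveState failure). The document is a str that declares utf-16, and it is encoded to UTF-16 bytes so that the bytes the parser sees actually match the declaration. TagCollector and parse_utf16 are hypothetical names used only for illustration.

```python
import xml.sax


class TagCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records element names and character data."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def startElement(self, name, attrs):
        self.tags.append(name)

    def characters(self, content):
        # May be called several times per text node, so collect pieces.
        self.text.append(content)


# A Unicode (str) document whose declaration says utf-16, as in the post.
xml_text = '<?xml version="1.0" encoding="utf-16"?><root><item>hello</item></root>'


def parse_utf16(text):
    """Encode to UTF-16 bytes (with BOM) before handing them to the parser."""
    handler = TagCollector()
    xml.sax.parseString(text.encode("utf-16"), handler)
    return handler


h = parse_utf16(xml_text)
print(h.tags)           # ['root', 'item']
print("".join(h.text))  # hello
```

Encoding with "utf-16" (rather than "utf-16-le" or "utf-16-be") writes a byte order mark, which lets expat detect the byte order on its own.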
Thanks to all for helping to track this down.
Regards,
Jon Peck
More information about the Python-list mailing list