Mysterious xml.sax Encoding Exception

John Machin sjmachin at lexicon.net
Tue Feb 5 00:09:48 CET 2008


On Feb 5, 9:02 am, JKPeck <JKP... at gmail.com> wrote:
> On Feb 2, 12:56 am, Jeroen Ruigrok van der Werven <asmo... at in-nomine.org> wrote:
> > -On [20080201 19:06], JKPeck (JKP... at gmail.com) wrote:
>
> > >In both of these cases, there are only plain, 7-bit ascii characters
> > >in the xml, and it really is valid utf-16 as far as I can tell.
>
> > Did you mean to say that the only characters they used in the UTF-16 encoded
> > file are characters from the Basic Latin Unicode block?
>
>
> It appears that the root cause of this problem is indeed passing a
> Unicode XML string to xml.sax.parseString with an encoding declaration
> in the XML of utf-16.  This works with the standard distribution on
> Windows.

It did NOT work for me with the standard Python 2.5.1 distribution on
Windows -- see the code and output that I posted.

>  It does not work with ActiveState on Windows even though both
> distributions report 64K for sys.maxunicode.
>
> So I don't know why the results are different, but the problem is
> solved by encoding the Unicode string into utf-16 before passing it to
> the parser.
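
For anyone finding this thread later, here is a minimal sketch of the
workaround described in the quoted paragraph: encode the Unicode string to
UTF-16 bytes first, then hand the bytes to xml.sax.parseString. It is
written in Python 3 syntax (the thread itself concerns Python 2.5, where
the call would be on a unicode object). The sample document and the
TextCollector handler are made up for illustration and are not taken from
the original poster's code.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces, so collect them.
        self.chunks.append(content)


# Illustrative document: plain ASCII content, but declared as utf-16.
xml_text = '<?xml version="1.0" encoding="utf-16"?><root>hello</root>'

# Encode before parsing so the bytes really are UTF-16 (the "utf-16"
# codec prepends a byte order mark), matching the XML declaration.
xml_bytes = xml_text.encode("utf-16")

handler = TextCollector()
xml.sax.parseString(xml_bytes, handler)
result = "".join(handler.chunks)
print(result)
```

The point of the encode step is that the declaration inside the document
and the actual byte encoding then agree, so the parser does not have to
guess how a Unicode string relates to the declared encoding.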


