XML can't read Unicode shock horror. News at 11.

Walter Dörwald walter at livinglogic.de
Wed Oct 31 12:17:27 EST 2001


Martin von Loewis wrote:

> Dale Strickland-Clark <dale at riverhall.NOTHANKS.co.uk> writes:
> 
> 
>>I see that this is probably the same as Python bug #216388 which has
>>been around for over a year and been given a low priority (3).
>>
> 
> It is not the same bug. Even if cStringIO supported Unicode objects,
> expat would still require byte strings.
> 
> 
>>Non-unicode XML is a bit restrictive. :-(
>>
> 
> Why do you think so? XML documents are byte sequences, not character
> strings. The *content* is Unicode; the document is not.


But xml.sax.xmlreader.InputSource provides the methods setCharacterStream
and getCharacterStream, so that input which is already a decoded Unicode
character stream can be parsed.

Do any of the available parsers support this?
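
To make the question concrete, here is a minimal sketch of what using
setCharacterStream would look like. It assumes a Python 3 release whose
bundled expat reader accepts a character stream on an InputSource; that is
an assumption about the interpreter in use, not something established in
this thread. The names TextCollector and parse_decoded_text are made up
for the illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class TextCollector(ContentHandler):
    """Hypothetical handler that gathers all character data it is given."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # SAX may deliver text in several pieces, so collect them all.
        self.chunks.append(content)


def parse_decoded_text(text):
    """Parse an already-decoded str through InputSource's character stream.

    Assumption: the default parser returned by make_parser() reads from
    getCharacterStream() when one is set.
    """
    source = InputSource()
    source.setCharacterStream(io.StringIO(text))

    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return "".join(handler.chunks)


if __name__ == "__main__":
    print(parse_decoded_text("<greeting>Gr\u00fc\u00dfe</greeting>"))
```

Where the parser only accepts bytes, the usual fallback would be to encode
the text (for example to UTF-8) and hand it over with setByteStream instead.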

Bye,
    Walter Dörwald




