Any reason why cStringIO in 2.5 behaves differently from 2.4?

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Fri Jul 27 08:59:26 CEST 2007


On Fri, 27 Jul 2007 06:47:48 +0200, Stefan Scholl wrote:

> Chris Mellon <arkanes at gmail.com> wrote:
>> XML is not a string. It's a specific type of bytestream. If you want
>> to work with XML, then generate well-formed XML in the correct
>> encoding. There's no reason you should have an XML document (as
>> opposed to values extracted from that document) in unicode objects at
>> all.
> 
> The affected method in xml.sax is called parseString()

Exactly.  It's *not* called `parseUnicode()`.

In Python there are two string types: `str`, which is a sequence of bytes,
and `unicode`, which is a character string that can hold Unicode
characters.  How those are represented internally is an implementation
detail.  You shouldn't know or depend on the internal representation, and
neither should other modules such as XML parsers.
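
A minimal sketch of that distinction.  The sample string is my own
assumption, not something from the thread.  It uses `u"..."` literals with
escapes so that it runs on Python 2 and on Python 3.3 or later, where the
byte type is named `bytes` rather than `str`:

```python
# Illustrative value only; not taken from the thread.
text = u"Gr\xfc\xdfe"                # decoded text: 5 characters

as_utf8 = text.encode("utf-8")       # bytes: the two non-ASCII characters take 2 bytes each
as_latin1 = text.encode("latin-1")   # bytes: every character takes 1 byte

char_count = len(text)               # counts characters
utf8_length = len(as_utf8)           # counts bytes
latin1_length = len(as_latin1)       # counts bytes
round_trip = as_utf8.decode("utf-8") # decoding gives the original text back
```

The same text yields different byte sequences depending on the encoding,
which is why the character string and its serialized bytes are kept apart.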

XML, the serialized form, is about bytes in some encoding, so it can only
be stored in `str` objects.  A `unicode` object is already decoded.  If you
want to feed `unicode` objects to an XML parser, simply encode them before
passing them in.
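
A rough sketch of encoding before parsing.  `TextCollector`, `parse_text`
and the sample document are hypothetical names and data made up for
illustration; only `xml.sax.parseString()` and `ContentHandler` come from
the standard library.  It assumes the encoding you pass matches the one the
document declares:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler that gathers all character data it sees."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # The parser hands back decoded text, possibly in several pieces.
        self.chunks.append(content)


def parse_text(document, encoding="utf-8"):
    """Encode an already decoded document, then give the bytes to the parser."""
    data = document.encode(encoding)   # assumption: matches the XML declaration
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return u"".join(handler.chunks)


# Made-up sample document held as decoded text.
document = u'<?xml version="1.0" encoding="utf-8"?><greeting>Gr\xfc\xdfe</greeting>'
result = parse_text(document)
```

The parser only ever sees bytes plus the declaration that says how to
decode them, and the handler receives decoded text again on the way out.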

The question remains why you have "serialized XML" as `unicode` in the
first place, since serialized XML is about bytes, not Unicode characters.

Ciao,
	Marc 'BlackJack' Rintsch


