Fred L. Drake, Jr.
Fri, 25 May 2001 16:39:52 -0400 (EDT)
Martin v. Loewis writes:
> One issue with reading UTF-8, whether from cStringIO or elsewhere: a
> read might break the result string inside a character (i.e. between
> character boundaries). So be careful when applying unicode() or
> .decode() to such a string - you may have to save some bytes for the
> next .read() call.
Correct -- the cStringIO object is just a stream of bytes, like a
file object. To read characters, you'll need to wrap it with a
decoder using the codecs module, or pass the bytes to a parser that
can handle them properly (like Expat).
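
As a rough sketch of the wrapping approach described above (not taken
from the original thread): the example below uses io.BytesIO as a
stand-in for cStringIO, and the sample text and the 3-byte chunk size are
made up purely to force a read to end in the middle of a character. It
shows both a codecs stream reader and an incremental decoder, which keeps
the leftover bytes of a partial character until the next chunk arrives.
The incremental decoder API appeared in later Python versions than the
one discussed here, so treat this as an illustration in modern Python.

```python
import codecs
import io

# Hypothetical sample data; "ï", "é" and "€" each take more than one byte in UTF-8.
expected = "naïve café €5"
data = expected.encode("utf-8")

# Option 1: wrap the byte stream in a codecs stream reader and read characters.
reader = codecs.getreader("utf-8")(io.BytesIO(data))
text_from_reader = reader.read()

# Option 2: read small byte chunks and feed them to an incremental decoder.
# The decoder buffers the trailing bytes of an incomplete character itself,
# so the caller does not have to save them for the next read() call.
raw = io.BytesIO(data)
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = []
while True:
    chunk = raw.read(3)  # deliberately small so chunks split characters
    if not chunk:
        # final=True makes the decoder raise if a partial character is left over
        pieces.append(decoder.decode(b"", final=True))
        break
    pieces.append(decoder.decode(chunk))
text_from_chunks = "".join(pieces)

# For contrast: decoding a chunk that ends inside a character fails outright.
try:
    data[:3].decode("utf-8")
    naive_decode_failed = False
except UnicodeDecodeError:
    naive_decode_failed = True
```

Either option returns the full text intact; the incremental decoder is
the closer match when the bytes arrive in pieces whose sizes you do not
control.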
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Digital Creations