[Python-3000] Draft PEP for New IO system
walter at livinglogic.de
Tue Feb 27 22:47:26 CET 2007
Guido van Rossum wrote:
> On 2/27/07, Walter Dörwald <walter at livinglogic.de> wrote:
>> The basic principle is that these codecs can encode strings and decode
>> bytes in multiple chunks. If you want to encode a unicode string u in
>> UTF-16 you can do it in one go:
>> s = u.encode("utf-16")
>> or character by character:
>> encoder = codecs.lookup("utf-16").incrementalencoder()
>> s = "".join(encoder.encode(c) for c in u) + encoder.encode(u"", True)
>> The incremental encoder makes sure that the result contains only one BOM.
>> Decoding works in the same way:
>> decoder = codecs.lookup("utf-16").incrementaldecoder()
>> u = u"".join(decoder.decode(c) for c in s) + decoder.decode("", True)
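As a rough illustration of the chunked encoding and decoding described above, here is a minimal sketch. It is adapted to Python 3 (the quoted code is Python 2), so it uses codecs.getincrementalencoder() and codecs.getincrementaldecoder() and joins bytes rather than str. The sample string is an assumption chosen for the example.

```python
import codecs

text = "h\u00e9llo"  # sample input, chosen only for illustration

# One-shot encoding: a BOM followed by the encoded characters.
one_shot = text.encode("utf-16")

# Incremental encoding, one character at a time.
encoder = codecs.getincrementalencoder("utf-16")()
chunks = [encoder.encode(ch) for ch in text]
chunks.append(encoder.encode("", final=True))  # flush any pending state
incremental = b"".join(chunks)

# Incremental decoding, one byte at a time. The decoder buffers
# incomplete code units until enough bytes have arrived.
decoder = codecs.getincrementaldecoder("utf-16")()
pieces = [decoder.decode(bytes([b])) for b in incremental]
pieces.append(decoder.decode(b"", final=True))
round_trip = "".join(pieces)
```

Encoding each character separately with text.encode("utf-16") would instead emit a BOM per character, which is the problem the incremental encoder avoids.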
> Thanks for the explanations, it is a little bit clearer now!
>> >> Should it be possible to change the error handling during the lifetime
>> >> of a stream? Then this change would have to be passed through to the
>> >> underlying codec.
>> > Not unless you have a really good use case handy...
>> Not for decoding, but for encoding: If you're outputting XML and use an
>> encoding that can't encode all unicode characters, then it makes sense
>> to switch to "xmlcharrefreplace" error handling during the output of
>> text nodes (and back to "strict" for element names etc.).
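The XML use case can be sketched roughly as follows. This is a simplified Python 3 illustration, not code from the thread: write_element is a hypothetical helper, it uses one-shot str.encode() calls rather than a stream, and it does not escape markup characters such as "&" or "<".

```python
def write_element(name, text, encoding="ascii"):
    """Hypothetical helper: serialize <name>text</name> as bytes."""
    # Element names must be representable as-is, so fail loudly.
    tag = name.encode(encoding, "strict")
    # Text nodes may fall back to numeric character references.
    body = text.encode(encoding, "xmlcharrefreplace")
    return b"<" + tag + b">" + body + b"</" + tag + b">"


ok = write_element("p", "caf\u00e9")

try:
    write_element("caf\u00e9", "x")
    name_rejected = False
except UnicodeEncodeError:
    name_rejected = True
```

Here the text node is written with a character reference, while the unencodable element name raises UnicodeEncodeError instead of being silently altered.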
> So do the incremental codecs allow this switching?
>>> import codecs
>>> ci = codecs.lookup("ascii")
>>> enc = ci.incrementalencoder(errors="xmlcharrefreplace")
>>> enc.errors = "strict"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.5/encodings/ascii.py", line 22, in encode
    return codecs.ascii_encode(input, self.errors)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)
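The traceback above is what a call such as enc.encode(u"\xff") would produce once errors has been set back to "strict". A minimal Python 3 sketch of the same switch follows. It assumes, as in the session, that the ascii incremental encoder reads self.errors on every call. The variable names replaced and strict_failed are only for illustration.

```python
import codecs

enc = codecs.getincrementalencoder("ascii")(errors="xmlcharrefreplace")

# With "xmlcharrefreplace", the unencodable character becomes a
# numeric character reference.
replaced = enc.encode("\xff")

# Switch the error handling on the live encoder object.
enc.errors = "strict"

try:
    enc.encode("\xff")
    strict_failed = False
except UnicodeEncodeError:
    # Same input, but the encoder now refuses it.
    strict_failed = True
```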
And it's documented that changing the errors attribute is allowed: