[Python-3000] Draft PEP for New IO system
Walter Dörwald
walter at livinglogic.de
Tue Feb 27 22:47:26 CET 2007
Guido van Rossum wrote:
> On 2/27/07, Walter Dörwald <walter at livinglogic.de> wrote:
> [...]
>> The basic principle is that these codecs can encode strings and decode
>> bytes in multiple chunks. If you want to encode a unicode string u in
>> UTF-16 you can do it in one go:
>> s = u.encode("utf-16")
>> or character by character:
>> encoder = codecs.lookup("utf-16").incrementalencoder()
>> s = "".join(encoder.encode(c) for c in u) + encoder.encode(u"", True)
>> The incremental encoder makes sure that the result contains only one
>> BOM.
>>
>> Decoding works in the same way:
>> decoder = codecs.lookup("utf-16").incrementaldecoder()
>> u = u"".join(decoder.decode(c) for c in s) + decoder.decode("", True)
>
> Thanks for the explanations, it is a little bit clearer now!
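For readers on Python 3, where text is `str` and encoded data is `bytes`, the same chunked round trip might look like the minimal sketch below. It is not part of the original exchange. It uses `codecs.getincrementalencoder()` and `codecs.getincrementaldecoder()` (equivalent to the `lookup(...).incrementalencoder` attributes), and the sample string is arbitrary.

```python
import codecs

text = "h\xe4llo"  # arbitrary sample string

# Encode one character at a time; only the first chunk carries the BOM.
encoder = codecs.getincrementalencoder("utf-16")()
chunks = [encoder.encode(c) for c in text]
chunks.append(encoder.encode("", final=True))  # flush any pending state
data = b"".join(chunks)

# The result matches encoding in one go: exactly one BOM at the start.
assert data == text.encode("utf-16")

# Decode one byte at a time; the decoder buffers the BOM and any
# incomplete 2-byte code unit until enough input has arrived.
decoder = codecs.getincrementaldecoder("utf-16")()
pieces = [decoder.decode(data[i:i + 1]) for i in range(len(data))]
pieces.append(decoder.decode(b"", final=True))
assert "".join(pieces) == text
```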
>
> [...]
>> >> Should it be possible to change the error handling during the lifetime
>> >> of a stream? Then this change would have to be passed through to the
>> >> underlying codec.
>> >
>> > Not unless you have a really good use case handy...
>>
>> Not for decoding, but for encoding: If you're outputting XML and use an
>> encoding that can't encode all unicode characters, then it makes sense
>> to switch to "xmlcharrefreplace" error handling during the output of
>> text nodes (and back to "strict" for element names etc.).
>
> So do the incremental codecs allow this switching?
Yes:
>>> import codecs
>>> ci = codecs.lookup("ascii")
>>> enc = ci.incrementalencoder(errors="xmlcharrefreplace")
>>> enc.encode(u"\xff")
'&#255;'
>>> enc.errors = "strict"
>>> enc.encode(u"\xff")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.5/encodings/ascii.py", line 22, in encode
    return codecs.ascii_encode(input, self.errors)[0]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)
And it's documented that changing the errors attribute is allowed:
http://docs.python.org/lib/incremental-encoder-objects.html
http://docs.python.org/lib/incremental-decoder-objects.html
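Applied to the XML use case mentioned above, switching the `errors` attribute on one encoder could look roughly like the Python 3 sketch below. `write_element` is a hypothetical helper for illustration only; the sole behaviour taken from the thread is that the incremental encoder's `errors` attribute may be changed between calls.

```python
import codecs
from xml.sax.saxutils import escape

def write_element(encoder, name, text):
    """Hypothetical helper: encode <name>text</name> with one encoder."""
    out = []
    # Element names must be representable in the target encoding.
    encoder.errors = "strict"
    out.append(encoder.encode("<%s>" % name))
    # Text nodes may fall back to numeric character references.
    encoder.errors = "xmlcharrefreplace"
    out.append(encoder.encode(escape(text)))
    # Back to strict for the closing tag.
    encoder.errors = "strict"
    out.append(encoder.encode("</%s>" % name))
    return b"".join(out)

encoder = codecs.getincrementalencoder("ascii")()
print(write_element(encoder, "p", "caf\xe9"))  # b'<p>caf&#233;</p>'
```

An element name such as `"\xe9"` would still raise `UnicodeEncodeError` here, because names are always encoded with `"strict"`.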
Servus,
Walter