[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
Marc-Andre Lemburg
report at bugs.python.org
Thu Apr 1 10:46:34 CEST 2010
Marc-Andre Lemburg <mal at egenix.com> added the comment:
Ezio Melotti wrote:
>
> Ezio Melotti <ezio.melotti at gmail.com> added the comment:
>
> Here is an incomplete patch. It seems to solve the problem but I still have to add more tests and check it better.
Thanks. Please also check whether it's worthwhile unrolling those
loops by hand.
> I also wonder if the sequences with the first byte in range F5-FD (start of 4/5/6-byte sequences, restricted by RFC 3629) should behave in the same way. Right now they just "eat" the following 4/5/6 chars without checking.
I think we need to do this all the way, even though 5- and 6-byte
sequences are not used at the moment.
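
For reference, a minimal sketch of the lead-byte ranges under discussion. The helper names legacy_sequence_length and is_rfc3629_lead are hypothetical and are not taken from the patch. The first maps a lead byte to the sequence length implied by the original UTF-8 scheme, which allowed up to 6 bytes. The second checks whether RFC 3629 still permits that byte as a lead byte. The last lines assume a decoder that no longer "eats" the characters following an invalid lead byte, which is the behavior requested here.

```python
def legacy_sequence_length(lead):
    """Sequence length implied by a lead byte in the original (pre-RFC 3629)
    UTF-8 scheme. Hypothetical helper for illustration only."""
    if lead < 0x80:
        return 1          # ASCII
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4          # F5-F7 are excluded by RFC 3629
    if 0xF8 <= lead <= 0xFB:
        return 5          # excluded by RFC 3629
    if 0xFC <= lead <= 0xFD:
        return 6          # excluded by RFC 3629
    return 0              # continuation byte (80-BF) or FE/FF


def is_rfc3629_lead(lead):
    """True if RFC 3629 allows this byte to start a sequence.
    C0 and C1 are excluded because they could only encode overlong forms."""
    return lead < 0x80 or 0xC2 <= lead <= 0xF4


# F5 announces a 4-byte sequence in the old scheme but is never valid
# under RFC 3629, so the decoder should replace it and keep what follows.
data = b"\xf5abc"
decoded = data.decode("utf-8", "replace")
print(repr(decoded))
```

With the fix applied, the three ASCII characters after the F5 byte should survive decoding instead of being consumed as if they were continuation bytes.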
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________