[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

Marc-Andre Lemburg report at bugs.python.org
Thu Apr 1 10:46:34 CEST 2010


Marc-Andre Lemburg <mal at egenix.com> added the comment:

Ezio Melotti wrote:
> 
> Ezio Melotti <ezio.melotti at gmail.com> added the comment:
> 
> Here is an incomplete patch. It seems to solve the problem, but I still have to add more tests and review it more carefully.

Thanks. Please also check whether it's worthwhile unrolling those
loops by hand.

> I also wonder whether sequences whose first byte is in the range F5-FD (the start of 4-, 5-, and 6-byte sequences, which RFC 3629 disallows) should behave the same way. Right now the decoder just "eats" the following 4/5/6 chars without checking.

I think we need to do this all the way, i.e. for every one of those
lead bytes, even though 5- and 6-byte sequences are not used at the
moment.
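
To illustrate the behaviour being asked for, here is a minimal sketch. It
is not the patch under discussion: expected_length() is a hypothetical
helper written only for this example, and the decode calls assume a
Python 3 interpreter, where the UTF-8 codec follows the RFC 3629 ranges.
The point is that a restricted lead byte such as F5 or FC should be
replaced on its own and should not swallow the bytes that follow it.

```python
def expected_length(lead: int) -> int:
    """Hypothetical helper: length of the UTF-8 sequence started by `lead`.

    Returns 0 when the byte cannot start a sequence under RFC 3629.
    """
    if lead <= 0x7F:
        return 1          # ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2          # C0 and C1 are excluded (always overlong)
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4          # F4 is the last lead byte allowed by RFC 3629
    return 0              # 80-BF continuation bytes, C0-C1, F5-FF


def decode_replace(data: bytes) -> str:
    """Decode with the 'replace' error handler (Python 3 behaviour assumed)."""
    return data.decode("utf-8", "replace")


if __name__ == "__main__":
    # A restricted lead byte followed by plain ASCII: only the lead byte
    # is replaced, the ASCII bytes survive.
    print(ascii(decode_replace(b"\xf5abc")))
    print(ascii(decode_replace(b"\xfcabcde")))
```

With this classification, F5-FD are treated like any other invalid start
byte, so the decoder resynchronises on the very next byte.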

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________

