[Python-Dev] lone surrogates in utf-8
Antoine Pitrou
solipsis at pitrou.net
Tue Apr 28 15:13:37 CEST 2009
Hrvoje Niksic <hrvoje.niksic <at> avl.com> writes:
>
> "Should be considered" or "will be considered"? Python 3.0's UTF-8
> decoder happily accepts it and returns u'\udcff':
>
> >>> b'\xed\xb3\xbf'.decode('utf-8')
> '\udcff'
Yes, there is already a bug entry for it:
http://bugs.python.org/issue3672
I think we could happily fix it for 3.1 (perhaps leaving 2.7 unchanged for
compatibility reasons, since I don't know whether some people rely on the
current behaviour).
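
For illustration, here is a minimal sketch of what the stricter behaviour looks like from the caller's side. It assumes a CPython 3 release in which the strict UTF-8 decoder rejects surrogates and the 'surrogatepass' error handler is available (both hold in current releases, not in 3.0). The helper name try_decode is hypothetical, not part of any proposed patch.

```python
# The three bytes from the quoted example: a UTF-8-style encoding of the
# lone low surrogate U+DCFF, which is not well-formed UTF-8.
data = b'\xed\xb3\xbf'


def try_decode(raw, errors='strict'):
    """Hypothetical helper: return the decoded str, or None if rejected."""
    try:
        return raw.decode('utf-8', errors)
    except UnicodeDecodeError:
        return None


# Assumption: a strict decoder that refuses surrogates, so this is None.
strict_result = try_decode(data)

# Callers who relied on the old behaviour can opt in explicitly.
lenient_result = try_decode(data, 'surrogatepass')

print(repr(strict_result))
print(repr(lenient_result))
```

Code that depended on the lenient 3.0 decoder would have to pass 'surrogatepass' explicitly, which is the kind of compatibility concern that argues for leaving 2.7 alone.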