[Python-ideas] Processing surrogates in

Serhiy Storchaka storchaka at gmail.com
Fri May 8 14:32:50 CEST 2015


On 08.05.15 15:28, Chris Angelico wrote:
> On Fri, May 8, 2015 at 9:18 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>>> Can you give a simple example of a Python 2 program that provides output
>>> that Python 3 will read as surrogates?
>>
>>
>> f.write(u'𝄞'[:1].encode('utf-8'))
>> json.dump(u'𝄞'[:1], f)
>> pickle.dump(u'𝄞'[:1], f)
>
> Not for me. In my Python 2, u'𝄞'[:1] == u'𝄞'. I suppose you're
> talking only about the (buggy) narrow builds, in which case you don't
> need to use string slicing at all. But in that case, all you're doing
> is using a single "\uNNNN" escape code to create an unmatched
> surrogate.

My point is that it is easy to unintentionally end up with data
containing an encoded lone surrogate in Python 2.
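
To show what such data looks like from the Python 3 side, here is a rough sketch. It runs on Python 3 and only simulates the narrow-build slice by splitting the UTF-16 code units by hand. The variable names are mine, and the byte sequence is what I would expect a narrow Python 2 build to have written, not output captured from one.

```python
import json

clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

# Simulate u'...'[:1] on a narrow build: keep only the first UTF-16 code unit.
units = clef.encode("utf-16-le")
lone = units[:2].decode("utf-16-le", "surrogatepass")  # lone high surrogate

# Bytes a writer gets if its UTF-8 encoder lets lone surrogates through.
raw = lone.encode("utf-8", "surrogatepass")

# Strict UTF-8 decoding in Python 3 rejects those bytes.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogatepass the lone surrogate comes back as a str element.
recovered = raw.decode("utf-8", "surrogatepass")

# JSON carries it as an ordinary \uXXXX escape, and loads() accepts it.
json_text = json.dumps(lone)
json_back = json.loads(json_text)

print(repr(lone), raw, strict_ok, json_text)
```

So the strict decoder refuses the file contents, while the JSON path hands the lone surrogate to the program silently.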



