[Python-ideas] Processing surrogates in

Fri May 8 14:41:01 CEST 2015

On Fri, May 8, 2015 at 10:32 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
> On 08.05.15 15:28, Chris Angelico wrote:
>>
>> On Fri, May 8, 2015 at 9:18 PM, Serhiy Storchaka <storchaka at gmail.com>
>> wrote:
>>>>
>>>> Can you give a simple example of a Python 2 program that provides output
>>>> that Python 3 will read as surrogates?
>>>
>>>
>>>
>>> f.write(u'𝄞'[:1].encode('utf-8'))
>>> json.dump(u'𝄞'[:1], f)
>>> pickle.dump(u'𝄞'[:1], f)
>>
>>
>> Not for me. In my Python 2, u'𝄞'[:1] == u'𝄞'. I suppose you're
>> talking only about the (buggy) narrow builds, in which case you don't
>> need to use string slicing at all. But in that case, all you're doing
>> is using a single "\uNNNN" escape code to create an unmatched
>> surrogate.
>
>
> I want to say that it is easy to unintentionally get data with an
> encoded lone surrogate in Python 2.
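Below is a minimal Python 3 sketch of the behaviour described above. It makes some assumptions. `narrow_code_units` and `narrow_slice` are hypothetical helpers that emulate a narrow build by indexing a string in UTF-16 code units. Under that model, slicing U+1D11E (𝄞) after one unit leaves a lone high surrogate. The rest of the sketch shows how Python 3 treats that surrogate once it has been written out as UTF-8 bytes or as JSON.

```python
import json

G_CLEF = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP


def narrow_code_units(s):
    """Hypothetical helper: split s into UTF-16 code units, as a narrow build stores it."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [
        chr(int.from_bytes(data[i:i + 2], "little"))
        for i in range(0, len(data), 2)
    ]


def narrow_slice(s, stop):
    """Hypothetical helper: emulate s[:stop] on a narrow build."""
    return "".join(narrow_code_units(s)[:stop])


lone = narrow_slice(G_CLEF, 1)
print(hex(ord(lone)))  # 0xd834: a lone high surrogate

# Python 2's UTF-8 codec encoded lone surrogates without complaint;
# Python 3 only does so with the explicit "surrogatepass" error handler.
raw = lone.encode("utf-8", "surrogatepass")
print(raw)  # b'\xed\xa0\xb4'

# Decoding those bytes in Python 3 with the default strict handler fails.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# JSON escapes the surrogate as \ud834, and Python 3's json module accepts
# that escape, producing a str that contains a lone surrogate.
text = json.dumps(lone)
print(text)  # "\ud834"
print(json.loads(text) == lone)  # True
```

The UTF-8 path fails loudly in Python 3, but the JSON path silently yields a `str` that holds a lone surrogate.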

Only on Windows, where the standard builds are narrow ones. (Also, how
hard and how bad would it be to change that, and have all python.org
installers produce wide builds?)
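Whether a given interpreter is a narrow or a wide build can be checked through `sys.maxunicode`. Narrow builds report 0xFFFF and wide builds report 0x10FFFF. Since PEP 393 (Python 3.3), every CPython build behaves as wide. The sketch below is meant to run on both Python 2 and 3. The helper names `is_narrow_build` and `non_bmp_length` are hypothetical and do not come from the thread.

```python
import sys


def is_narrow_build(maxunicode=sys.maxunicode):
    """Hypothetical helper: True on a narrow (UTF-16) build."""
    return maxunicode == 0xFFFF


def non_bmp_length():
    """Hypothetical helper: len() of one astral character.

    Returns 2 on a narrow build (a surrogate pair) and 1 on a wide build.
    """
    return len(u"\U0001D11E")


print(is_narrow_build())  # False on Python 3.3+
print(non_bmp_length())
```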

ChrisA