[Python-ideas] Type hints for text/binary data in Python 2+3 code

Wed Mar 23 03:37:57 EDT 2016

> On 2016-03-23, at 6:39, Guido van Rossum <guido at python.org> wrote:
> 
>>> * Do we really need _AsciiUnicode? I see the point of _AsciiStr,
>>> because Python 2 accepts 'x' + u'' but fails '\xff' + u'', so 'x'
>>> needs to be of type _AsciiStr while '\xff' should not (it should be
>>> just str).  However there's no difference in how u'x' is treated from
>>> how u'\u1234' or u'\xff' are treated -- none of them can be
>>> concatenated to '\xff' and all of them can be concatenated to 'x'.
>> 
>> I was concerned about UnicodeEncodeErrors in Python 2 during implicit conversions from unicode to bytes:
>> 
>>    getattr(obj, u'Non-ASCII-name')
>> 
>> There are several places in the Python 2 API where these ASCII-based unicode->bytes conversions take place, so the _AsciiUnicode type comes to mind.
> 
> OK, so you want the type of u'hello' to be _AsciiUnicode but the type
> of u'Здравствуйте' to be just unicode, right? And getattr()'s second
> argument would be typed as... What?

The second argument would be typed as str, the "native string" type. If people use from __future__ import unicode_literals, there are many places in Python 2 where str is expected but an ASCII-only unicode literal is passed. An internal _AsciiUnicode type that inherits from unicode while being compatible with str (and bytes) would solve this issue.
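
To make the idea concrete, here is a rough sketch of how a checker might classify text literals. It is only an illustration under my own assumptions: the helper names infer_text_literal_type and accepts_native_str are hypothetical and not part of any existing tool, and the type names are returned as plain strings purely for demonstration.

```python
def infer_text_literal_type(value):
    """Return the (hypothetical) internal type name for a text literal.

    ASCII-only literals get '_AsciiUnicode'; everything else is plain
    'unicode'.
    """
    try:
        # This mirrors the implicit ASCII encoding that Python 2 applies
        # when a unicode object is passed where str is expected.
        value.encode('ascii')
    except UnicodeEncodeError:
        return 'unicode'
    return '_AsciiUnicode'


def accepts_native_str(type_name):
    """Model the proposed rule: _AsciiUnicode is compatible with str."""
    # Assumption: only these two names are accepted for a str parameter.
    return type_name in ('str', '_AsciiUnicode')


# An ASCII-only literal would be accepted as the second argument of getattr().
ascii_name = infer_text_literal_type(u'hello')
# A non-ASCII literal stays plain unicode and would be flagged by the checker.
non_ascii_name = infer_text_literal_type(u'\u1234')
```

With this rule, getattr(obj, u'hello') would type-check under unicode_literals, while getattr(obj, u'\u1234') would be reported as an error instead of failing at run time.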

-- 
Andrey Vlasovskikh

Web: http://pirx.ru/