[Python-ideas] Type hints for text/binary data in Python 2+3 code
Guido van Rossum
guido at python.org
Tue Mar 22 23:39:10 EDT 2016
On Tue, Mar 22, 2016 at 3:18 PM, Andrey Vlasovskikh
<andrey.vlasovskikh at gmail.com> wrote:
>
>> On 2016-03-19, at 21:51, Guido van Rossum <guido at python.org> wrote:
>>
>> I like the way this is going. I think it needs to be a separate PEP;
>> PEP 484 is already too long and this topic deserves being written up
>> carefully (like you have done here).
>
> I would like to experiment with various text/binary types for Python 2 and 3 for some time before coming up with a PEP about it. And I would like everybody interested in 2/3 compatible type hints to join the discussion. My perspective (mostly PyCharm-specific) might be a bit narrow here.
As you wish!
>> * Do we really need _AsciiUnicode? I see the point of _AsciiStr,
>> because Python 2 accepts 'x' + u'' but fails '\xff' + u'', so 'x'
>> needs to be of type _AsciiStr while '\xff' should not (it should be
>> just str). However there's no difference in how u'x' is treated from
>> how u'\u1234' or u'\xff' are treated -- none of them can be
>> concatenated to '\xff' and all of them can be concatenated to 'x'.
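The Python 2 behavior described above follows from the implicit ASCII decode applied to the str operand when it meets unicode. Below is a minimal sketch that emulates that rule on Python 3. The helper name py2_concat is hypothetical, and the emulation is an assumption about the rule, not Python 2 itself.

```python
def py2_concat(left, right):
    """Emulate Python 2's str + unicode: decode the byte string as ASCII first."""
    # Assumption: Python 2 uses the default 'ascii' codec for this coercion.
    return left.decode("ascii") + right


# ASCII-only bytes combine with any text, whatever the text contains.
print(py2_concat(b"x", u"\u1234"))

# Bytes outside the ASCII range fail during the implicit decode.
try:
    py2_concat(b"\xff", u"")
except UnicodeDecodeError as exc:
    print("error:", exc.reason)
```

Under this reading, only the byte-string side needs an ASCII-only type for concatenation; the contents of the unicode side never matter.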
>
> I was concerned with UnicodeEncodeErrors in Python 2 during implicit conversions from unicode to bytes:
>
> getattr(obj, u'Non-ASCII-name')
>
> There are several places in the Python 2 API where these ASCII-based unicode->bytes conversions take place, so the _AsciiUnicode type comes to mind.
OK, so you want the type of u'hello' to be _AsciiUnicode but the type
of u'Здравствуйте' to be just unicode, right? And getattr()'s second
argument would be typed as... What?
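The distinction between the two literals can at least be checked at runtime. The sketch below uses a hypothetical helper, is_ascii_unicode, and assumes that "would be _AsciiUnicode" simply means "encodes to ASCII without error". It says nothing about how getattr() itself should be annotated.

```python
def is_ascii_unicode(text):
    """Return True if the text survives an ASCII encode, as Python 2 attempts implicitly."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# u'hello' would qualify as _AsciiUnicode under this assumption.
print(is_ascii_unicode(u"hello"))
# u'Здравствуйте' would stay plain unicode.
print(is_ascii_unicode(u"Здравствуйте"))
```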
>> * It would be helpful to spell out exactly what is and isn't allowed
>> when different core types (bytes, str, unicode, Text) meet in Python 2
>> and in Python 3. Something like a table with a row and a column for
>> each and the type of x+y (or "error") in each of the cells.
>
> Agreed. I'll try to come up with specific rules for handling text/binary types (bytes, str, unicode, Text, _Ascii*) in Python 2 and 3. For me the rules for dealing with _Ascii* look the most problematic at the moment as it's unclear how these types should propagate via text-handling functions.
You can try that out at runtime though.
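One way to try it out is a small script that fills in the x+y table by actually performing the concatenation. This is only a sketch: the names SAMPLES and concat_table are made up here, and the sample values cover just one ASCII value per type, so the _Ascii* distinctions would need extra rows.

```python
import itertools

# One sample value per type; under Python 2 these would be str and unicode.
SAMPLES = {"bytes": b"x", "str": u"x"}


def concat_table(samples):
    """Map each (left, right) type-name pair to the result type name of x + y, or "error"."""
    table = {}
    pairs = itertools.product(sorted(samples.items()), repeat=2)
    for (lname, lval), (rname, rval) in pairs:
        try:
            table[(lname, rname)] = type(lval + rval).__name__
        except (TypeError, UnicodeError):
            # TypeError on Python 3, UnicodeError for failed implicit conversions on Python 2.
            table[(lname, rname)] = "error"
    return table


for pair, result in sorted(concat_table(SAMPLES).items()):
    print(pair, "->", result)
```

Running the same script under each interpreter and comparing the two tables would show where the two versions disagree.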
>> * I propose that Python 2+3 mode is just the intersection of what
>> Python 2 and Python 3 mode allow. (In mypy, I don't think we'll
>> implement this -- users will just have to run mypy twice with and
>> without --py2. But for PyCharm it makes sense to be able to declare
>> this. Yet I think it would be good not to have to spell out separately
>> which rules it uses, defining it as the intersection of 2 and 3 is all
>> we need.)
>
> Yes, there is no need for a specific 2+3 mode, I was really referring to the intersection of the Python 2 and 3 APIs when the user accesses a text / binary method not available in both.
Cool.
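As a rough illustration of the intersection idea, here is a sketch using plain sets. The method lists are hand-written and deliberately incomplete, not generated from either interpreter, and the names PY2_STR_METHODS, PY3_STR_METHODS and allowed_in_2_and_3 are hypothetical.

```python
# Illustrative subsets of the methods on `str` in each version (assumed, incomplete).
PY2_STR_METHODS = {"encode", "decode", "upper", "startswith"}
PY3_STR_METHODS = {"encode", "upper", "startswith", "isidentifier"}

# 2+3 mode allows only what both versions allow.
COMMON_STR_METHODS = PY2_STR_METHODS & PY3_STR_METHODS


def allowed_in_2_and_3(method):
    """Return True if the method name is available on str in both sample sets."""
    return method in COMMON_STR_METHODS


print(allowed_in_2_and_3("upper"))         # present in both sets
print(allowed_in_2_and_3("decode"))        # only in the Python 2 set
print(allowed_in_2_and_3("isidentifier"))  # only in the Python 3 set
```

A checker working this way would need no separate rule set for 2+3 code, only the two per-version descriptions.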
--
--Guido van Rossum (python.org/~guido)