On Fri, Jan 22, 2016 at 12:58 PM, Paul Moore <p.f.moore@gmail.com> wrote:

On 22 January 2016 at 19:08, Guido van Rossum <guido@python.org> wrote:
> On Fri, Jan 22, 2016 at 10:37 AM, Brett Cannon <brett@python.org> wrote:
>>
>>
>>
>> On Thu, 21 Jan 2016 at 10:45 Guido van Rossum <guido@python.org> wrote:
>>>
>>> On Thu, Jan 21, 2016 at 10:14 AM, Agustín Herranz Cecilia
>>> <agustin.herranz@gmail.com> wrote:
>>> [...]
>>> Yes, this is not directly related to the choice of syntax for
>>> annotations. This is intended to help in the process of porting Python 2
>>> code to Python 3, and it's outside the PEP's scope but related to the
>>> original problem. What I have in mind is some type aliases so you could
>>> annotate a version-specific type to avoid ambiguity in code that is used
>>> on different versions. In the end, what I originally tried to say is that
>>> it's good to have a conventional way to name these type aliases.
>>>
>>> Yes, this is a useful thing to discuss.
>>>
>>> Maybe we can standardize on the types defined by the 'six' package, which
>>> is commonly used for 2-3 straddling code:
>>>
>>> six.text_type (unicode in PY2, str in PY3)
>>> six.binary_type (str in PY2, bytes in PY3)
>>>
>>> Actually for the latter we might as well use bytes.
>>
>>
>> I agree that `bytes` should cover str/bytes in Python 2 and `bytes` in
>> Python 3.
>
>
> OK, that's settled.
>
>>
>> As for the textual type, I say either `text` or `unicode` since they are
>> both unambiguous between Python 2 and 3 and get the point across.
>
>
> Then let's call it unicode. I suppose we can add this to typing.py. In PY2,
> typing.unicode is just the built-in unicode. In PY3, it's the built-in str.

This thread came to my attention just as I'd been thinking about a
related point.

For me, by far the worst Unicode-related porting issue I see is people
with a confused view of what type of data reading a file will give.
This is because open() returns a different type (byte stream or
character stream) depending on its arguments (specifically 'b' in the
mode) and it's frustratingly difficult to track this type across
function calls - especially in code originally written in a Python 2
environment where people *expect* to confuse bytes and strings in this
context. So, for example, I see a function read_one_byte which does
f.read(1), and works fine in real use when a data file (opened with
'b') is processed, but fails when sys.stdin is used (on Python 3, once
someone types a Unicode character).
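
A minimal sketch of that failure mode on Python 3, using in-memory streams
as stand-ins (io.BytesIO for a file opened with 'b', io.StringIO for a text
stream such as sys.stdin). The sample data is an assumption made purely for
illustration:

```python
import io


def read_one_byte(f):
    # Returns whatever f.read(1) yields: one byte from a binary stream,
    # but one *character* from a text stream.
    return f.read(1)


# Stand-in for open(path, 'rb'): a binary stream holding UTF-8 encoded text.
binary_stream = io.BytesIO("\u00e9abc".encode("utf-8"))
# Stand-in for sys.stdin in text mode: a stream of characters.
text_stream = io.StringIO("\u00e9abc")

binary_result = read_one_byte(binary_stream)  # a bytes object of length 1
text_result = read_one_byte(text_stream)      # a str of length 1

# The single character takes two bytes in UTF-8, so it is not "one byte".
encoded_length = len(text_result.encode("utf-8"))
```

The function body is identical in both calls; only the type of the stream
passed in decides whether the caller gets bytes or str back.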

As far as I know, there's no way for type annotations to capture this
distinction - either as they are at present in Python 3, or as being
discussed here. But what I'm not sure of is whether it's something
that *could* be tracked by a type checker. Of course I'm also not sure
I'm right when I say you can't do it right now :-)
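
For illustration only, here is a sketch of how the parameter might be
annotated, assuming the BinaryIO alias in the typing module and a checker
that understands it. Whether a given checker can actually follow the result
of open() through to such a function is exactly the open question above:

```python
import io
from typing import BinaryIO


def read_one_byte(f: BinaryIO) -> bytes:
    """Read a single byte; the annotation documents that f must be binary."""
    return f.read(1)


# io.BytesIO stands in for a file opened with 'b' in the mode.
first = read_one_byte(io.BytesIO(b"\x00\x01"))

# A checker that understands BinaryIO could, in principle, flag a call such
# as read_one_byte(sys.stdin), because sys.stdin is a text stream.
# Nothing is enforced at runtime.
```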

Is this something worth including in the discussion, or is it a
completely separate topic?
Paul

--Guido van Rossum (python.org/~guido)