On 22 January 2016 at 19:08, Guido van Rossum wrote:
On Fri, Jan 22, 2016 at 10:37 AM, Brett Cannon
wrote: On Thu, 21 Jan 2016 at 10:45 Guido van Rossum
wrote: On Thu, Jan 21, 2016 at 10:14 AM, Agustín Herranz Cecilia
wrote: [...] Yes, this is not directly related to the choice of syntax for annotations. It is intended to help with porting Python 2 code to Python 3; it is outside the PEP's scope but related to the original problem. What I have in mind is some type aliases, so that you could annotate a version-specific type and avoid ambiguity in code that is used on different versions. In the end, what I was originally trying to say is that it would be good to have a conventional way to name these type aliases.
Yes, this is a useful thing to discuss.
Maybe we can standardize on the types defined by the 'six' package, which is commonly used for 2-3 straddling code:
six.text_type (unicode in PY2, str in PY3)
six.binary_type (str in PY2, bytes in PY3)
Actually for the latter we might as well use bytes.
I agree that `bytes` should cover str/bytes in Python 2 and `bytes` in Python 3.
OK, that's settled.
As for the textual type, I say either `text` or `unicode` since they are both unambiguous between Python 2 and 3 and get the point across.
Then let's call it unicode. I suppose we can add this to typing.py. In PY2, typing.unicode is just the built-in unicode. In PY3, it's the built-in str.
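As a rough illustration of the alias being discussed, here is a minimal sketch. It does not assume that typing.py provides any such name; text_type and to_text are hypothetical names defined locally for the example, with the type comment in the Python 2 compatible style.

```python
import sys

# Hypothetical local alias (not an existing typing.py name):
# the built-in unicode on Python 2, the built-in str on Python 3.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821  (only evaluated on Python 2)


def to_text(value, encoding="utf-8"):
    # type: (object, str) -> text_type
    """Return value as text, decoding it first if it is a byte string."""
    # On Python 2, bytes is the same object as str, so plain str input is decoded.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


decoded = to_text(b"caf\xc3\xa9")  # UTF-8 bytes in, text out
```

With one name for "text" on both versions, an annotation written once means the same thing wherever the straddling code runs.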
This thread came to my attention just as I'd been thinking about a related point. For me, by far the worst Unicode-related porting issue I see is people with a confused view of what type of data reading a file will give. This is because open() returns a different type (byte stream or character stream) depending on its arguments (specifically 'b' in the mode) and it's frustratingly difficult to track this type across function calls - especially in code originally written in a Python 2 environment where people *expect* to confuse bytes and strings in this context.

So, for example, I see a function read_one_byte which does f.read(1), and works fine in real use when a data file (opened with 'b') is processed, but fails when sys.stdin is used (on Python 3, once someone types a Unicode character).

As far as I know, there's no way for type annotations to capture this distinction - neither as they are at present in Python 3, nor as being discussed here. But what I'm not sure of is whether it's something that *could* be tracked by a type checker. Of course I'm also not sure I'm right when I say you can't do it right now :-)

Is this something worth including in the discussion, or is it a completely separate topic?

Paul
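To make the read_one_byte case concrete, here is a small sketch of the failure mode on Python 3. The function body follows the description above; the in-memory streams are an assumption standing in for a file opened with 'b' and for sys.stdin.

```python
import io


def read_one_byte(f):
    # Mirrors the function described above: it just calls f.read(1).
    # What comes back depends entirely on how the stream was opened.
    return f.read(1)


# Stand-in for a data file opened with 'b' in the mode (a byte stream).
binary_stream = io.BytesIO(u"\u00e9".encode("utf-8"))
# Stand-in for sys.stdin on Python 3 (a character stream).
text_stream = io.StringIO(u"\u00e9")

from_binary = read_one_byte(binary_stream)  # one byte of a two-byte UTF-8 sequence
from_text = read_one_byte(text_stream)      # one character, not one byte
```

The same call returns a single byte in one case and a whole character in the other, and nothing in the signature of read_one_byte records which kind of stream it expects.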