Proposal to extend PEP 484 (gradual typing) to support Python 2.7

Hi! I came to this thread late and by coincidence, but after reading the whole thread I want to share some thoughts.

The main concern is adding some kind of gradual typing to Python 2 code. Because it is working Python 2 code, it cannot use annotations and they cannot be added to the code itself, so a type comment is the way to go (independent of the convenience, or not, of having type annotations available at runtime). Someone pointed out that using a comment with the Python 3 annotation signature is a good way to educate people on how to use annotations. I feel that's not the point; the point is for people to get used to type hints. For that, the same syntax must be usable across different Python versions, so 'function type comments' must also be available in Python 3 code. This would also let people who can't or won't use annotations use type hinting. For this reason I don't believe the section "Suggested syntax for Python 2.7 and straddling code" added to PEP 484 is the right way to go; the proper way is to add type comments for functions, as an extension of the "Type comments" section or perhaps in a new PEP.

Some concerns that must be taken into account to add function type comments:

- Using the 'type comments' syntax of PEP 484, a function signature would look like this:

      def func(arg1, arg2):  # type: Callable[[int, int], int]
          """ Do something """
          return arg1 + arg2

- This easily becomes a long line, which breaks PEP 8 and makes linters complain. So a way to put the type comment on another line needs to be defined. Should the type comment go on the line after or the line before? Would putting it on another line be allowed only for function type comments or for other type comments too?

- With some kinds of complex types the type comment surely becomes a long line too; how should type comments be broken across multiple lines?

- GvR's proposal includes some syntactic sugar for function type comments (" # type: (t_arg1, t_arg2) -> t_ret "). I think it's good, but it should be an alternative to the typing module syntax (PEP 484), not the preferred way (so that people get used to type hints). Is this syntactic sugar compatible with generators? Can the type analyzers differentiate between a Callable and a Generator?

More concerns on type comments:

- As this is intended to gradually type Python 2 code in order to port it to Python 3, I think it would be convenient to add some sort of import that is used only for type checking and is processed only by the type analyzer, not at runtime. This could be achieved by prepending "# type: " to the normal import statement, something like:

      # type: import module
      # type: from package import module

- It must also be addressed how this works in a Python 2 to Python 3 environment, since there are types with the same name, str for example, that behave differently in each Python version. If the code targets only one version, use the type names of that version. For 2/3 code, types could be defined with a "py2" prefix in a module that could be called "py2types", containing for example "py2str", to mark things that are of the Python 2 str type. Python 3 types would not have prefixes.

I hope this reasoning and these ideas are useful. I also hope I have expressed myself well enough; English is not my mother tongue.

Agustín Herranz.
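As a concrete side-by-side of the two comment styles under discussion, here is a minimal sketch (function and variable names are invented for the example; only the arrow form is what the PEP 484 addition actually specifies for Python 2.7):

    from typing import Callable

    def scale(value, factor=2):
        # type: (int, int) -> int
        """Signature written in the arrow style from the PEP 484 addition."""
        return value * factor

    # The 'explicit' alternative discussed above would reuse the ordinary
    # variable type comment with a Callable type instead:
    scale_alias = scale  # type: Callable[[int, int], int]

Both spellings convey the same signature; the arrow form keeps the parameter types visually aligned with the def line, which is why the thread treats it as the more readable option.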

On Jan 20, 2016, at 06:27, Agustín Herranz Cecilia <agustin.herranz@gmail.com> wrote:
- GVR proposal includes some kind of syntactic sugar for function type comments (" # type: (t_arg1, t_arg2) -> t_ret "). I think it's good but this must be an alternative over typing module syntax (PEP484), not the preferred way (for people get used to typehints). Is this syntactic sugar compatible with generators? The type analyzers could be differentiate between a Callable and a Generator?
I'm pretty sure Generator is not the type of a generator function, but of a generator object. So to type a generator function, you just write `(int, int) -> Generator[int]`. Or, the long way, `Function[[int, int], Generator[int]]`. (Of course you can use Callable instead of the more specific Function, or Iterator (or even Iterable) instead of the more specific Generator, if you want to be free to change the implementation to use an iterator class or something later, but normally you'd want the most specific type, I think.)
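For what it's worth, a minimal sketch (invented for this writeup, not from the thread) of a generator function in the Python 2 comment style; the return type simply names the type of the generator object it produces, using Iterator as the rest of the thread settles on:

    from typing import Iterator

    def countdown(n, step):
        # type: (int, int) -> Iterator[int]
        while n > 0:
            yield n
            n -= step

    list(countdown(10, 3))  # [10, 7, 4, 1]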
That sounds like a bad idea. If the typing module shadows some global, you won't get any errors, but your code will be misleading to a reader (and even worse if you do `from package.module import t`). If the cost of the import is too high for Python 2, surely it's also too high for Python 3. And what other reason do you have for skipping it?
- Also there must be addressed how it work on a python2 to python3 environment as there are types with the same name, str for example, that works differently on each python version. If the code is for only one version uses the type names of that version.
That's the same problem that exists at runtime, and people (and tools) already know how to deal with it: use bytes when you mean bytes, unicode when you mean unicode, and str when you mean whatever is "native" to the version you're running under and are willing to deal with it. So now you just have to do the same thing in type hints that you're already doing in constructors, isinstance checks, etc. Of course many people use libraries like six to help them deal with this, which means that those libraries have to be type-hinted appropriately for both versions (maybe using different stubs for py2 and py3, with the right one selected at pip install time?), but if that's taken care of, user code should just work.

On Wed, Jan 20, 2016 at 9:42 AM, Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
There is no 'Function' -- it existed in mypy before PEP 484 but was replaced by 'Callable'. And you don't annotate a function def with '-> Callable' (unless it returns another function). The Callable type is only needed in the signature of higher-order functions, i.e. functions that take functions for arguments or return a function. For example, a simple map function would be written like this:

    def map(f: Callable[[T], S], a: List[T]) -> List[S]: ...

As to generators, we just improved how mypy treats generators ( https://github.com/JukkaL/mypy/commit/d8f72279344f032e993a3518c667bba813ae04...). The Generator type has *three* parameters: the "yield" type (what's yielded), the "send" type (what you send() into the generator, and what's returned by yield), and the "return" type (what a return statement in the generator returns, i.e. the value for the StopIteration exception). You can also use Iterator if your generator doesn't expect its send() or throw() methods to be called and it isn't returning a value for the benefit of `yield from'. For example, here's a simple generator that iterates over a list of strings, skipping alternating values:

    def skipper(a: List[str]) -> Iterator[str]:
        for i, s in enumerate(a):
            if i % 2 == 0:
                yield s

and here's a coroutine returning a string (I know, it's pathetic, but it's an example :-):

    @asyncio.coroutine
    def readchar() -> Generator[Any, None, str]:
        # Implementation not shown

    @asyncio.coroutine
    def readline() -> Generator[Any, None, str]:
        buf = ''
        while True:
            c = yield from readchar()
            if not c:
                break
            buf += c
            if c == '\n':
                break
        return buf

Here, in Generator[Any, None, str], the first parameter ('Any') refers to the type yielded -- it actually yields Futures, but we don't care about that (it's an asyncio implementation detail). The second parameter ('None') is the type returned by yield -- again, it's an implementation detail and we might just as well say 'Any' here. The third parameter (here 'str') is the type actually returned by the 'return' statement. It's illustrative to observe that the signature of readchar() is exactly the same (since it also returns a string). OTOH the return type of e.g. asyncio.sleep() is Generator[Any, None, None], because it doesn't return a value. This business is clearly still suboptimal -- we would like to introduce a new type, perhaps named Coroutine, so that you can write Coroutine[T] instead of Generator[Any, None, T]. But that would just be a shorthand. The actual type of a generator object is always some parametrization of Generator. In any case, whatever we write after the -> (i.e., the return type) is still the type of the value you get when you call the function. If the function is a generator function, the value you get is a generator object, and that's what the return type designates.
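To make the three Generator parameters concrete, here is a small sketch (invented for illustration, not from the thread) of a generator whose send type actually matters:

    from typing import Generator

    def running_total() -> Generator[int, int, str]:
        """Yield type: int, send type: int, return type: str."""
        total = 0
        while True:
            increment = yield total          # what send() passes in arrives here
            if increment is None:
                return 'final total: %d' % total
            total += increment

    gen = running_total()
    next(gen)        # prime the generator; yields 0
    gen.send(3)      # yields 3
    gen.send(4)      # yields 7
    # next(gen) would finish it; StopIteration carries 'final total: 7'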
I don't know where you read about Callable vs. Function. Regarding using Iterator[T] instead of Generator[..., ..., T], you are correct. Note that you *cannot* define a generator function as returning a *subclass* of Iterator/Generator; there is no way to have a generator function instantiate some other class as its return value. Consider (ignoring generic types):

    class MyIterator:
        def __next__(self): ...
        def __iter__(self): ...
        def bar(self): ...

    def foo() -> MyIterator:
        yield

    x = foo()
    x.bar()  # Boom!

The type checker would assume that x has a method bar() based on the declared return type for foo(), but it doesn't. (There are a few other special cases, in addition to Generator and Iterator; declaring the return type to be Any or object is allowed.)
Exactly. Even though (when using Python 2) all type annotations are in comments, you still must write real imports. (This causes minor annoyances with linters that warn about unused imports, but there are ways to teach them.)
This is actually still a real problem. But it has no bearing on the choice of syntax for annotations in Python 2 or straddling code.
Yeah, we could use help. There are some very rudimentary stubs for a few things defined by six ( https://github.com/python/typeshed/tree/master/third_party/3/six, https://github.com/python/typeshed/tree/master/third_party/2.7/six) but we need more. There's a PR but it's of bewildering size ( https://github.com/python/typeshed/pull/21). PS. I have a hard time following the rest of Agustin's comments. The comment-based syntax I proposed for Python 2.7 does support exactly the same functionality as the official PEP 484 syntax; the only thing it doesn't allow is selectively leaving out types for some arguments -- you must use 'Any' to fill those positions. It's not a problem in practice, and it doesn't reduce functionality (omitted argument types are assumed to be Any in PEP 484 too). I should also remark that mypy supports the comment-based syntax in Python 2 mode as well as in Python 3 mode; but when writing Python 3 only code, the non-comment version is strongly preferred. (We plan to eventually produce a tool that converts the comments to standard PEP 484 syntax). -- --Guido van Rossum (python.org/~guido)

On Wednesday, January 20, 2016 4:11 PM, Guido van Rossum <guido@python.org> wrote:
There is no 'Function' -- it existed in mypy before PEP 484 but was replaced by 'Callable'. And you don't annotate a function def with '-> Callable' (unless it returns another function).
Sorry about getting `Function` from the initial proposal instead of the current PEP. Anyway, I don't think the OP was suggesting that. If I interpreted his question right: he was expecting that the comment `(int, int) -> int` is a way to annotate a function so it comes out as type `Callable[[int, int], int]`, which is correct. And he wanted to know how to instead write a comment for a generator function of type `GeneratorFunction[[int, int], int]`, and the answer is that you don't. There is no type needed for generator functions; they're just functions that return generators. You're right that he doesn't need to know the actual type; you're never going to write that, you're just going to annotate the arguments and return value, or use the 2.x comment style:

    def f(arg1: int, arg2: int) -> Iterator[int]: ...

    def f(arg1, arg2):
        # type: (int, int) -> Iterator[int]
        ...

Either way, the type checker will determine that the type of the function is `Callable[[int, int], Iterator[int]]`, and the only reason you'll ever care is if that type shows up in an error message.
Regarding using Iterator[T] instead of Generator[..., ..., T], you are correct.
Note that you *cannot* define a generator function as returning a *subclass* of Iterator/Generator;
But you could define it as returning the superclass `Iterable`, right? As I understand it, it's normal type variance, so any superclass will work; the only reason `Iterator` is special is that it happens to be simpler to specify than Generator and it's plausible that it isn't going to matter whether you've written a generator function or, say, a function that returns a list iterator.
there is no way to have a generator function instantiate some other class as its return value.
If you really want that, you could always write a wrapper that forwards __next__, and a decorator that applies the wrapper. Can mypy infer the type of the decorated function from the wrapped function and the decorator?

    # Can I leave this annotation off? And get one specialized to the actual
    # argument types of the wrapped function? That would be cool.
    def my_iterating(func: Callable[Any, Iterator]) -> Callable[Any, MyIterator]:
        @wraps(func)
        def wrapper(*args, **kw):
            return MyIterator(func(*args, **kw))
        return wrapper

    @my_iterating
    def foo() -> Iterator[int]:
        yield

    x = foo()
    x.bar()

On Wed, Jan 20, 2016 at 4:54 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Not really. I understand that you're saying that after:

    def foo(a, b):
        # type: (int, int) -> str
        return str(a+b)

the type of 'foo' is 'Callable[[int, int], str]'. But it really isn't. The type checker (e.g. mypy) knows more at this point: it knows that foo has arguments named 'a' and 'b' and that e.g. calls like 'foo(1, b=2)' are valid. There's no way to express that using Callable. Also Callable doesn't support argument defaults.
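A small sketch (invented here) of that distinction: a checker accepts keyword and defaulted calls against the def itself, which a plain Callable cannot express:

    from typing import Callable

    def foo(a, b=10):
        # type: (int, int) -> str
        return str(a + b)

    foo(1, b=2)   # accepted: the checker knows the parameter names
    foo(1)        # accepted: it also knows b has a default

    f = foo       # type: Callable[[int, int], str]
    f(1, b=2)     # typically rejected: Callable has no named arguments
    f(1)          # typically rejected: Callable[[int, int], str] needs both args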
Aha. No wonder I didn't get the question. :-(
I don't think you can expect the word 'Callable' to show up in an error message unless it's part of the type as written somewhere. A name defined with 'def' is special and it shows up differently. (And so is a lambda.)
Yes.
Right.
I think that's an open question. Your example below is complicated because of the *args, **kw pattern.
You can't -- mypy never infers a function's type from its inner workings. However, some Googlers are working on a tool that does infer types: https://github.com/google/pytype It's early days though (relatively speaking), and I don't think it handles this case yet.
def my_iterating(func: Callable[Any, Iterator]) -> Callable[Any, MyIterator]
Alas, PEP 484 is not powerful enough to describe the relationship between the input and output functions. You'd want to do something that uses a type variable to capture all arguments together, so you could write something like:

    T = TypeVar('T')
    S = TypeVar('S')

    def my_iterating(func: Callable[T, Iterator[S]]) -> Callable[T, MyIterator[S]]: ...
The only reasonable way to do something like this without adding more sophistication to PEP 484 would be to give up on the decorator and just hardcode it using a pair of functions:

    # API
    def foo() -> MyIterator[int]:
        return MyIterator(_foo())

    # Implementation
    def _foo() -> Iterator[int]:
        yield 0

-- --Guido van Rossum (python.org/~guido)

El 2016/01/21 a las 1:11, Guido van Rossum escribió:
These type comment 'imports' are not intended to shadow the current namespace; they are intended to tell the analyzer where it can find the types used in type comments that are not in the current namespace, without importing them into it. This surely complicates the analyzer's task, but it helps avoid namespace pollution and also saves memory at runtime. The typical case I've found is when using a third-party library (without type information) and creating objects through a factory. The class of those objects isn't needed anywhere else, so it isn't imported into the current namespace, but it is needed for type analysis and autocomplete.
Yes, this is not directly related to the choice of syntax for annotations. It is intended to help with the process of porting Python 2 code to Python 3, and it's outside the PEP's scope but related to the original problem. What I have in mind is some type aliases, so you can annotate a version-specific type to avoid ambiguity in code that is used on different versions. What I originally tried to say is that it would be good to have a conventional way to name these type aliases. They are intended for use during the porting process, to help automated tools, in a period of transition between versions. It's a way to tell the analyzer that a type has a behavior that is perhaps different from the same-named type on the running Python version. For example: you start with some working Python 2 code that you want to keep working. A code analysis tool can infer the types and annotate the code. It can also check which parts are py2/py3 compatible and which are not, and mark those types with the mentioned type aliases. With this, plus test suites, you could estimate how much code needs to be ported. Then you refactor to adapt the code to Python 3 while keeping it running on Python 2 (the compatibility code could be marked for automated deletion), and when that's done, drop all the Python 2 code.
I think the process of porting is different from the process of adapting code to work on Python 2/3. Code written in terms of bytes, unicode, and str (never mind which) is neither Python 2 code nor Python 3 code. Lots of libraries that are 2/3 compatible are just Python 2 code minimally adapted to run on Python 3 with six, and are still developed in a Python 2 style. When the time to drop Python 2 arrives, the refactoring needed will be huge. There is also a recent article claiming "Stop writing code that will break on Python 4" which shows code that treats Python 3 as the special case.
My original point is that if comment-based function annotations are going to be added, they should be added to Python 3 too, not only for the special case of "Python 2.7 and straddling code", even though on Python 3 regular annotations are preferred. I think having the alternative of defining the type of a function as a type comment is a good thing, because annotations can become a mess, especially with complex types and default parameters, and I don't feel that opting in to gradual typing should come at the cost of readability. Some examples of my own code:

    class Field:
        def __init__(self, name: str,
                     extract: Callable[[str], str],
                     validate: Callable[[str], bool]=bool_test,
                     transform: Callable[[str], Any]=identity) -> 'Field':

    class RepeatableField:
        def __init__(self, extract: Callable[[str], str],
                     size: int,
                     fields: List[Field],
                     index_label: str,
                     index_transform: Callable[[int], str]=lambda x: str(x)) -> 'RepeatableField':

    def filter_by(field_gen: Iterable[Dict[str, Any]],
                  **kwargs) -> Generator[Dict[str, Any], Any, Any]:

So, to define a comment-based function annotation, two kinds of syntax should be accepted:

- one 'explicit' syntax, marking the type of the function according to the PEP 484 syntax:

      def embezzle(self, account, funds=1000000, *fake_receipts):
          # type: Callable[[str, int, *str], None]
          """Embezzle funds from account using fake receipts."""
          <code goes here>

  as if it were a normal type comment:

      embezzle = get_embezzle_function()  # type: Callable[[str, int, *str], None]

- and another one that 'implicitly' defines the type of the function as a Callable:

      def embezzle(self, account, funds=1000000, *fake_receipts):
          # type: (str, int, *str) -> None
          """Embezzle funds from account using fake receipts."""
          <code goes here>

Both forms are easily translated back and forth into Python 3 annotations. Also, comment-based function annotations easily exceed one line's length, so the syntax used to break the line should be defined, as discussed at https://github.com/JukkaL/mypy/issues/1102. These things should be in a PEP as a standard way to implement this, not only for mypy but also for other tools. Accepting comment-based function annotations in Python 3 is good for migrating Python 2/3 code, as it helps with refactoring and use (better autocomplete); making it a Python 2 feature and not a Python 3 one increases the gap between versions. Hope I expressed myself better; if not, sorry about that.

Agustín Herranz

On Thu, Jan 21, 2016 at 10:14 AM, Agustín Herranz Cecilia < agustin.herranz@gmail.com> wrote:
You're describing a case I have also encountered: we have a module with a function foo:

    # foo_mod.py
    def foo(a):
        return a.bar()

and the intention is that a is an instance of a class A defined in another module, which is not imported. If we add annotations we have to add an import:

    from a_mod import A

    def foo(a: A) -> str:
        return a.bar()

But the code that calls foo() is already importing A from a_mod somewhere, so there's not really any time wasted -- the import is just done at a different time. At least, that's the theory. In practice, indeed there are some unpleasant cases. For example, adding the explicit import might create an import cycle, and A may not yet be defined when foo_mod is loaded. We can't use the usual subterfuge, since we really need the definition of A:

    import a_mod

    def foo(a: a_mod.A) -> str:
        return a.bar()

This will still fail if a_mod hasn't defined A yet because we reference a_mod.A at load time (annotations are evaluated when the function definition is executed). So we end up with this:

    import a_mod

    def foo(a: 'a_mod.A') -> str:
        return a.bar()

This is both hard to read and probably wastes a lot of developer time figuring out they have to do this. And there are other issues, e.g. some folks have tricks to work around their start-up time by importing modules late (e.g. do the import inside the function that needs that module). In mypy there's another hack possible: it doesn't care if an import is inside "if False". So you can write:

    if False:
        from a_mod import A

    def foo(a: 'A') -> str:
        return a.bar()

You still have to quote 'A' because A isn't actually defined at run time, but it's the best we can do. When using type comments you can skip the quotes:

    if False:
        from a_mod import A

    def foo(a):
        # type: (A) -> str
        return a.bar()

All of this is unpleasant but not unbearable -- the big constraint here is that we don't want to add extra syntax (beyond PEP 3107, i.e. function annotations), so that we can use mypy for Python 3.2 and up. And with the type comments we even support 2.7.
Yes, this is a useful thing to discuss. Maybe we can standardize on the types defined by the 'six' package, which is commonly used for 2-3 straddling code:

    six.text_type   (unicode in PY2, str in PY3)
    six.binary_type (str in PY2, bytes in PY3)

Actually for the latter we might as well use bytes.
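As a rough illustration of what that would look like in straddling code (a sketch, not from the thread; six.text_type and six.binary_type are real six attributes, but whether a checker resolves them per-version depends on its six stubs):

    import six

    def to_display(raw):
        # type: (six.binary_type) -> six.text_type
        """Decode bytes (PY2 str / PY3 bytes) into text (PY2 unicode / PY3 str)."""
        return raw.decode('utf-8')

    def to_wire(message):
        # type: (six.text_type) -> bytes
        # As noted above, plain 'bytes' already works for the binary side.
        return message.encode('utf-8')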
Yes, that's the kind of process we're trying to develop. It's still early days though -- people have gotten different workflows already using six and tests and the notion of straddling code, __future__ imports, and PyPI backports of some PY3 stdlib packages (e.g. contextlib2). There's also a healthy set of tools that converts PY2 code to straddling code, approximately (e.g. futurize and modernize). What's missing (as you point out) is tools that help automating a larger part of the conversion once PY2 code has been annotated. But first we need to agree on how to annotate PY2 code.
The text I added to the end of PEP 484 already says so: """ - Tools that support this syntax should support it regardless of the Python version being checked. This is necessary in order to support code that straddles Python 2 and Python 3. """
I don't see what adding support for

    # type: Callable[[str, int, *str], None]

adds. It's more verbose, and when the 'implicit' notation is used, the type checker already knows that embezzle is a function with that signature. You can already do this (except for the *str part):

    from typing import Callable

    def embezzle(account, funds=1000000):
        # type: (str, int) -> None
        """Embezzle funds from account using fake receipts."""
        pass

    f = None  # type: Callable[[str, int], None]
    f = embezzle
    f('a', 42)

However, note that no matter which notation you use, there's no way in PEP 484 to write the type of the original embezzle() function object using Callable -- Callable does not have support for varargs like *fake_receipts. If you want that, the best place to bring it up is the typehinting tracker ( https://github.com/ambv/typehinting/issues). But it's going to be a tough nut to crack, and the majority of places where Callable is needed (mostly higher-order functions like filter/map) don't need it -- their function arguments have purely positional arguments.
Consider it done. The time machine strikes again. :-)
Hope I expressed better, if not, sorry about that.
It's perfectly fine this time!
Agustín Herranz
-- --Guido van Rossum (python.org/~guido)

On 1/21/2016 1:44 PM, Guido van Rossum wrote: [Snip discussion of nitty-gritty issues of annotating code, especially 2.7 code.] I suspect that at this point making migration from 2.7 to 3.x *easier*, with annotations, will do more to encourage migration, overall, than yet another new module. So I support this work even if I will not directly use it. If you are looking for a PyCon talk topic, I think this, with your experiences up to that time, would be a good one. Only slightly off topic, I also think it worthwhile to reiterate that pydev support for 2.7 really really will end in 2020, possibly on the first day, as now documented in the nice new front-page devguide chart: https://docs.python.org/devguide/#status-of-python-branches I have read people saying (SO comments, I think) that there might or will be a security-patch-only phase of some number of years *after* that.
PEP 484 gives the motivation for 2.7 compatible type comments as "Some tools may want to support type annotations in code that must be compatible with Python 2.7. " To me, this just implies running a static analyzer over *existing* code. Using type hint comments to help automate conversion, if indeed possible, would be worth adding to the motivation.
But first we need to agree on how to annotate PY2 code.
Given the current addition to an accepted PEP, I thought we more or less had, at least provisionally. -- Terry Jan Reedy

On Thu, 21 Jan 2016 at 10:45 Guido van Rossum <guido@python.org> wrote:
I agree that `bytes` should cover str/bytes in Python 2 and `bytes` in Python 3. As for the textual type, I say either `text` or `unicode` since they are both unambiguous between Python 2 and 3 and get the point across. And does `str` represent the type for the specific version of Python mypy is running under, or is it pegged to a specific representation across Python 2 and 3? If it's the former then fine, else those people who use the "native string" concept might want a way to say "I want the `str` type as defined on the version of Python I'm running under" (personally I don't promote the "native string" concept, but I know it has been brought up in the past). -Brett

On Fri, Jan 22, 2016 at 10:37 AM, Brett Cannon <brett@python.org> wrote:
OK, that's settled.
As for the textual type, I say either `text` or `unicode` since they are both unambiguous between Python 2 and 3 and get the point across.
Then let's call it unicode. I suppose we can add this to typing.py. In PY2, typing.unicode is just the built-in unicode. In PY3, it's the built-in str.
In mypy (and in typeshed and in typing.py), 'str' refers to the type named str in the Python version for which you are checking -- i.e. by default mypy checks in PY3 mode and str will be the unicode type; but "mypy --py2" checks in PY2 mode and str will be the Python 2 8-bit string type. (This is actually the only thing that makes sense IMO.) There's one more thing that I wonder might be useful. In PY2 we have basestring as the supertype of str and unicode. As far as mypy is concerned it's almost the same as Union[str, unicode]. Maybe we could add this to typing.py as well so it's also available in PY3, in that case as a shorthand for Union[str, unicode]. FWIW We are having a long discussion about this topic in the mypy tracker: https://github.com/JukkaL/mypy/issues/1135 -- interested parties are invited to participate there! -- --Guido van Rossum (python.org/~guido)
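To make the basestring point concrete, here is a minimal PY2-only sketch (invented for illustration; it will not run under Python 3, where unicode is not defined) of treating it as Union[str, unicode] in a type comment:

    from typing import Union

    def normalize(s):
        # type: (Union[str, unicode]) -> unicode
        """Accept either string type -- what basestring covers in PY2."""
        if isinstance(s, str):        # in PY2, str is the 8-bit type
            return s.decode('utf-8')
        return s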

On Fri, Jan 22, 2016 at 11:19 AM, Random832 <random832@fastmail.com> wrote:
There are many differences between PY2 and PY3, not the least in the stubs for the stdlib. If you get an expression by calling a built-in function (or anything else that's not a literal) the type depends on what's in the stub. The architecture of mypy just isn't designed to take two different sets of stubs (and other differences in rules, e.g. whether something's an iterator because it defines '__next__' or 'next') into account at once. -- --Guido van Rossum (python.org/~guido)

On 22 January 2016 at 19:08, Guido van Rossum <guido@python.org> wrote:
This thread came to my attention just as I'd been thinking about a related point. For me, by far the worst Unicode-related porting issue I see is people with a confused view of what type of data reading a file will give. This is because open() returns a different type (byte stream or character stream) depending on its arguments (specifically 'b' in the mode) and it's frustratingly difficult to track this type across function calls - especially in code originally written in a Python 2 environment where people *expect* to confuse bytes and strings in this context. So, for example, I see a function read_one_byte which does f.read(1), and works fine in real use when a data file (opened with 'b') is processed, but fails when sys.stdin is used (on Python 3, once someone types a Unicode character). As far as I know, there's no way for type annotations to capture this distinction - either as they are at present in Python 3, or as being discussed here. But what I'm not sure of is whether it's something that *could* be tracked by a type checker. Of course I'm also not sure I'm right when I say you can't do it right now :-) Is this something worth including in the discussion, or is it a completely separate topic? Paul

Interesting. PEP 484 defines an IO generic class, so you can write IO[str] or IO[bytes]. Maybe introducing separate helper functions that open files in text or binary mode can complement this to get a solution? On Fri, Jan 22, 2016 at 12:58 PM, Paul Moore <p.f.moore@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)
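A minimal sketch of what such helpers might look like (the names open_text/open_binary are invented for the example; IO[str] and IO[bytes] are the PEP 484 generics mentioned above, and the cast is there because the builtin open's stub types are broader than the helpers promise):

    from typing import IO, cast

    def open_text(path: str, mode: str = 'r') -> IO[str]:
        """Text-mode open; callers now know read() returns str."""
        assert 'b' not in mode
        return cast(IO[str], open(path, mode))

    def open_binary(path: str, mode: str = 'rb') -> IO[bytes]:
        """Binary-mode open; callers now know read() returns bytes."""
        assert 'b' in mode
        return cast(IO[bytes], open(path, mode))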

On Jan 22, 2016, at 13:11, Guido van Rossum <guido@python.org> wrote:
Interesting. PEP 484 defines an IO generic class, so you can write IO[str] or IO[bytes]. Maybe introducing separate helper functions that open files in text or binary mode can complement this to get a solution?
The runtime types are a little weird here as well. In 3.x, open returns different types depending on the value, rather than the type, of its inputs. Also, TextIOBase is a subclass of IOBase, even though it isn't a subtype in the LSP sense, so you have to test isinstance(f, IOBase) and not isinstance(f, TextIOBase) to know that read() is going to return bytes. That's all a little wonky, but not impossible to deal with. In 2.x, most file-like objects--including file itself, which open returns--don't satisfy either ABC, and most of them can return either type from read. Having a different function for open-binary instead of a mode flag would solve this, but it seems a little late to be adding that now. You'd have to go through all your 2.x code and change every open to one of the two new functions just to statically type your code, and then change it again for 3.x. Plus, you'd need to do the same thing not just for the builtin open, but for every library that provides an open-like method. Maybe this special case is special enough that static type checkers just have to deal with it specially? When the mode flag is a literal, process it; when it's forwarded from another function, it may be possible to get the type from there; otherwise, everything is just unicode|bytes and the type checker can't know any more unless you explicitly tell it (by annotating the variable the result of open is stored in).
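In Python 3 terms, that isinstance dance looks like this (a small sketch using in-memory streams instead of real files):

    import io

    f = io.BytesIO(b'abc')        # stands in for open(path, 'rb')
    isinstance(f, io.IOBase)      # True
    isinstance(f, io.TextIOBase)  # False -> f.read() returns bytes

    g = io.StringIO(u'abc')       # stands in for open(path, 'r')
    isinstance(g, io.TextIOBase)  # True  -> g.read() returns str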

On Fri, Jan 22, 2016 at 1:40 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Agreed. At this level it's really hard to fix. :-(
Well, the type returned by the builtin open() never returns Unicode. For duck types (and even StringIO) it's indeed a crapshoot. :-(
Yeah, painful. Though in most cases you can also patch up individual calls using cast(IO[str], open(...)) etc.
That would be a lot of work too. We have so many other important-but-not-urgent things already that I would really like to push back on this until someone has actually tried the alternative and tells us how bad it is (like Ben Darnell did for @overload). -- --Guido van Rossum (python.org/~guido)

Instead of special-casing open() specifically, adding a 'Literal' class would solve this issue (although only in a stub file):

    @overload
    def open(mode: Literal['rb', 'wb', 'ab']) -> BufferedIOBase: ...
    @overload
    def open(mode: Literal['rt', 'wt', 'at']) -> TextIOBase: ...

Literal[a, b, c] == Union[Literal[a], Literal[b], Literal[c]] for convenience purposes. To avoid repetition, func(arg: Literal='value') could be made equivalent to func(arg: Literal['value']='value'). Typecheckers should just treat this the same as the type of the value, but for cases where they know the value (literals or aliases) check the value too. (Either by comparison for core types, or just by identity. That allows use of object() sentinel values or Enum members.)
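A hypothetical illustration of how call sites would then be inferred (no current checker supports this; 'Literal' here is the proposed class above, and the file names are made up):

    f = open('data.bin', 'rb')    # mode matches Literal['rb', 'wb', 'ab'] -> BufferedIOBase
    chunk = f.read(4)             # would be inferred as bytes

    g = open('notes.txt', 'rt')   # mode matches Literal['rt', 'wt', 'at'] -> TextIOBase
    line = g.readline()           # would be inferred as str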

On Jan 22, 2016, at 10:37, Brett Cannon <brett@python.org> wrote:
The only problem is that, while bytes is a builtin type in both 2.7 and 3.x, with similar behavior (especially in 3.5, where simple %-formatting code works the same as in 2.7), unicode exists in 2.x but not 3.x, so that would require people writing something like "try: unicode except: unicode=str" at the top of every file (or monkeypatching builtins somewhere) for the annotations to actually be valid 3.x code. And, if you're going to do that, using something that's already wide-spread and as close to a de facto standard as possible, like the six type suggested by Guido, seems less disruptive than inventing a new standard (even if "text" or "unicode" is a little nicer than "six.text_type"). (Or, of course, Guido could just get in his time machine and, along with restoring the u string literal prefix in 3.3, also restore the builtin name unicode as a synonym for str, and then this whole mail thread would fade out like Marty McFly.) Also, don't forget "basestring", which some 2.x code uses. A lot of such code just drops bytes support when modernizing, but if not, it has to change to something that means basestring or str|unicode in 2.x and bytes|str in 3.x. Again, six has a solution for that, string_types, and mypy could standardize on that solution too.
And does `str` represent the type for the specific version of Python mypy is running under, or is it pegged to a specific representation across Python 2 and 3? If it's the former then fine,
In six-based code, it means native string, and there are tools designed to help you go over all your str uses and decide which ones should be changed to something else (usually text_type or binary_type), but no special name to use when you decide "I really do want native str here". So, I think it makes sense for mypy to assume the same, rather than to encourage people to shadow or rebind str to make mypy happy in 2.x. Speaking of native strings: six code often doesn't use native strings for __str__, instead using explicit text, and the @python_2_unicode_compatible class decorator. Will mypy need special support for that decorator to handle those types? If so, it's probably worth adding; otherwise, it would be encouraging people to stick with native strings instead of switching to text.

Looks like our messages crossed. On Fri, Jan 22, 2016 at 11:35 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
That decorator is in the typeshed stubs and appears to work -- although it looks like it's just a noop even in PY2. If that requires tweaks please submit a bug to the typeshed project tracker ( https://github.com/python/typeshed/issues). -- --Guido van Rossum (python.org/~guido)

On Fri, 22 Jan 2016 at 11:35 Andrew Barnert <abarnert@yahoo.com> wrote:
But why do they have to be valid code? This is for Python 2/3 code which means any typing information is going to be in a comment and so it isn't important that it be valid code as-is as long as the tools involved realize what `unicode` represents. IOW if mypy knows what the `unicode` type represents in PY3 mode then what does it matter if `unicode` is not a built-in type of Python 3?
I long thought about that option, but I don't think it buys us enough to bother to add the alias for `str` in Python 3. Considering all of the other built-in tweaks you typically end up making, I don't think this one change is worth it. -Brett

On Sat, 23 Jan 2016 at 11:18 Brett Cannon <brett@python.org> wrote:
I should also mention that Guido is suggesting typing.unicode come into existence, so there is no special import guard necessary. And since you will be importing `typing` anyway for type details then having typing.unicode in both Python 2 and Python 3 is a very minor overhead.

On Jan 20, 2016, at 06:27, Agustín Herranz Cecilia <agustin.herranz@gmail.com> wrote:
- GVR proposal includes some kind of syntactic sugar for function type comments (" # type: (t_arg1, t_arg2) -> t_ret "). I think it's good but this must be an alternative over typing module syntax (PEP484), not the preferred way (for people get used to typehints). Is this syntactic sugar compatible with generators? The type analyzers could be differentiate between a Callable and a Generator?
I'm pretty sure Generator is not the type of a generator function, bit of a generator object. So to type a generator function, you just write `(int, int) -> Generator[int]`. Or, the long way, `Function[[int, int], Generator[int]]`. (Of course you can use Callable instead of the more specific Function, or Iterator (or even Iterable) instead of the more specific Generator, if you want to be free to change the implementation to use an iterator class or something later, but normally you'd want the most specific type, I think.)
That sounds like a bad idea. If the typing module shadows some global, you won't get any errors, but your code will be misleading to a reader (and even worse if you from package.module import t). If the cost of the import is too high for Python 2, surely it's also too high for Python 3. And what other reason do you have for skipping it?
- Also there must be addressed how it work on a python2 to python3 environment as there are types with the same name, str for example, that works differently on each python version. If the code is for only one version uses the type names of that version.
That's the same problem that exists at runtime, and people (and tools) already know how to deal with it: use bytes when you mean bytes, unicode when you mean unicode, and str when you mean whatever is "native" to the version you're running under and are willing to deal with it. So now you just have to do the same thing in type hints that you're already doing in constructors, isinstance checks, etc. Of course many people use libraries like six to help them deal with this, which means that those libraries have to be type-hinted appropriately for both versions (maybe using different stubs for py2 and py3, with the right one selected at pip install time?), but if that's taken care of, user code should just work.

On Wed, Jan 20, 2016 at 9:42 AM, Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
There is no 'Function' -- it existed in mypy before PEP 484 but was replaced by 'Callable'. And you don't annotate a function def with '-> Callable' (unless it returns another function). The Callable type is only needed in the signature of higher-order functions, i.e. functions that take functions for arguments or return a function. For example, a simple map function would be written like this: def map(f: Callable[[T], S], a: List[T]) -> List[S]: ... As to generators, we just improved how mypy treats generators ( https://github.com/JukkaL/mypy/commit/d8f72279344f032e993a3518c667bba813ae04...). The Generator type has *three* parameters: the "yield" type (what's yielded), the "send" type (what you send() into the generator, and what's returned by yield), and the "return" type (what a return statement in the generator returns, i.e. the value for the StopIteration exception). You can also use Iterator if your generator doesn't expect its send() or throw() messages to be called and it isn't returning a value for the benefit of `yield from'. For example, here's a simple generator that iterates over a list of strings, skipping alternating values: def skipper(a: List[str]) -> Iterator[str]: for i, s in enumerate(a): if i%2 == 0: yield s and here's a coroutine returning a string (I know, it's pathetic, but it's an example :-): @asyncio.coroutine def readchar() -> Generator[Any, None, str]: # Implementation not shown @asyncio.coroutine def readline() -> Generator[Any, None, str]: buf = '' while True: c = yield from readchar() if not c: break buf += c if c == '\n': break return buf Here, in Generator[Any, None, str], the first parameter ('Any') refers to the type yielded -- it actually yields Futures, but we don't care about that (it's an asyncio implementation detail). The second parameter ('None') is the type returned by yield -- again, it's an implementation detail and we might just as well say 'Any' here. The third parameter (here 'str') is the type actually returned by the 'return' statement. It's illustrative to observe that the signature of readchar() is exactly the same (since it also returns a string). OTOH the return type of e.g. asyncio.sleep() is Generator[Any, None, None], because it doesn't return a value. This business is clearly still suboptimal -- we would like to introduce a new type, perhaps named Coroutine, so that you can write Coroutine[T] instead of Generator[Any, None, T]. But that would just be a shorthand. The actual type of a generator object is always some parametrization of Generator. In any case, whatever we write after the -> (i.e., the return type) is still the type of the value you get when you call the function. If the function is a generator function, the value you get is a generator object, and that's what the return type designates.
I don't know where you read about Callable vs. Function. Regarding using Iterator[T] instead of Generator[..., ..., T], you are correct. Note that you *cannot* define a generator function as returning a *subclass* of Iterator/Generator; there is no way to have a generator function instantiate some other class as its return value. Consider (ignoring generic types): class MyIterator: def __next__(self): ... def __iter__(self): ... def bar(self): ... def foo() -> MyIterator: yield x = foo() x.bar() # Boom! The type checker would assume that x has a method bar() based on the declared return type for foo(), but it doesn't. (There are a few other special cases, in addition to Generator and Iterator; declaring the return type to be Any or object is allowed.)
Exactly. Even though (when using Python 2) all type annotations are in comments, you still must write real imports. (This causes minor annoyances with linters that warn about unused imports, but there are ways to teach them.)
This is actually still a real problem. But it has no bearing on the choice of syntax for annotations in Python 2 or straddling code.
Yeah, we could use help. There are some very rudimentary stubs for a few things defined by six ( https://github.com/python/typeshed/tree/master/third_party/3/six, https://github.com/python/typeshed/tree/master/third_party/2.7/six) but we need more. There's a PR but it's of bewildering size ( https://github.com/python/typeshed/pull/21). PS. I have a hard time following the rest of Agustin's comments. The comment-based syntax I proposed for Python 2.7 does support exactly the same functionality as the official PEP 484 syntax; the only thing it doesn't allow is selectively leaving out types for some arguments -- you must use 'Any' to fill those positions. It's not a problem in practice, and it doesn't reduce functionality (omitted argument types are assumed to be Any in PEP 484 too). I should also remark that mypy supports the comment-based syntax in Python 2 mode as well as in Python 3 mode; but when writing Python 3 only code, the non-comment version is strongly preferred. (We plan to eventually produce a tool that converts the comments to standard PEP 484 syntax). -- --Guido van Rossum (python.org/~guido)

On Wednesday, January 20, 2016 4:11 PM, Guido van Rossum <guido@python.org> wrote:
There is no 'Function' -- it existed in mypy before PEP 484 but was replaced by 'Callable'. And you don't annotate a function def with '-> Callable' (unless it returns another function).
Sorry about getting the `Function` from the initial proposal instead of the current PEP. Anyway, I don't think the OP was suggesting that. If I interpreted his question right: He was expecting that the comment `(int, int) -> int` was a way to annotate a function so it comes out as type `Callable[[int, int], int]`, which is correct. And he wanted to know how to instead write a comment for a generator function of type `GeneratorFunction[[int, int], int]`, and the answer is that you don't. There is no type needed for generator functions; they're just functions that return generators. You're right that he doesn't need to know the actual type; you're never going to write that, you're just going to annotate the arguments and return value, or use the 2.x comment style: def f(arg1: int, arg2: int) -> Iterator[int] def f(arg1, arg2): # type: (int, int) -> Iterator[int] Either way, the type checker will determine that type of the function is `Callable[[int, int], Iterator[int]]`, and the only reason you'll ever care is if that type shows up in an error message.
Regarding using Iterator[T] instead of Generator[..., ..., T], you are correct.
Note that you *cannot* define a generator function as returning a *subclass* of Iterator/Generator;
But you could define it as returning the superclass `Iterable`, right? As I understand it, it's normal type variance, so any superclass will work; the only reason `Iterator` is special is that it happens to be simpler to specify than Generator and it's plausible that it isn't going to matter whether you've written a generator function or, say, a function that returns a list iterator.
there is no way to have a generator function instantiate some other class as its return value.
If you really want that, you could always write a wrapper that forwards __next__, and a decorator that applies the wrapper. Can MyPy infer the type of the decorated function from the wrapped function and the decorator? # Can I leave this annotation off? And get one specialized to the actual # argument types of the wrapped function? That would be cool. def my_iterating(func: Callable[Any, Iterator]) -> Callable[Any, MyIterator] @wraps(func) def wrapper(*args, **kw): return MyIterator(func(*args, **kw)) return wrapper @my_iterating def foo() -> Iterator[int]: yield x = foo() x.bar()

On Wed, Jan 20, 2016 at 4:54 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Not really. I understand that you're saying that after: def foo(a, b): # type: (int, int) -> str return str(a+b) the type of 'foo' is 'Callable[[int, int], int]'. But it really isn't. The type checker (e.g. mypy) knows more at this point: it knows that foo has arguments named 'a' and 'b' and that e.g. calls like 'foo(1, b=2)' are valid. There's no way to express that using Callable. Also Callable doesn't support argument defaults.
Aha. No wonder I didn't get the question. :-(
I don't think you can the word 'Callable' to show up in an error message unless it's part of the type as written somewhere. A name defined with 'def' is special and it shows up differently. (And so is a lambda.)
Yes.
Right.
I think that's an open question. Your example below is complicated because of the **args, *kw pattern.
You can't -- mypy never infers a function's type from its inner workings. However, some Googlers are working on a tool that does infer types: https://github.com/google/pytype It's early days though (relatively speaking), and I don't think it handles this case yet.
def my_iterating(func: Callable[Any, Iterator]) -> Callable[Any, MyIterator]
Alas, PEP 484 is not powerful enough to describe the relationship between the input and output functions. You'd want to do something that uses a type variable to capture all arguments together, so you could write something like T = TypeVar('T') S = TypeVar('S') def my_iterating(func: Callable[T, Iterator[S]]) -> Callable[T, MyIterator[S]]:
The only reasonable way to do something like this without adding more sophistication to PEP 484 would be to give up on the decorator and just hardcode it using a pair of functions: # API def foo() -> MyIterator[int]: return MyIterator(_foo()) # Implementation def _foo() -> Iterator[int]: yield 0 -- --Guido van Rossum (python.org/~guido)

El 2016/01/21 a las 1:11, Guido van Rossum escribió:
This type comment 'imports' are not intended to shadow the current namespace, are intended to tell the analyzer where it can find those types present in the type comments that are not in the current namespace without import in it. This surely complicates the analyzer task but helps avoid namespace pollution and also saves memory on runtime. The typical case I've found is when using a third party library (that don't have type information) and you creates objects with a factory. The class of the objects is no needed anywhere so it's not imported in the current namespace, but it's needed only for type analysis and autocomplete.
Yes, this is no related with the choice of syntax for annotations directly. This is intended to help in the process of porting python2 code to python3, and it's outside of the PEP scope but related to the original problem. What I have in mind is some type aliases so you could annotate a version specific type to avoid ambiguousness on code that it's used on different versions. At the end what I originally try to said is that it's good to have a convention way to name this type aliases. This are intended to use during the process of porting, to help some automated tools, in a period of transition between versions. It's a way to tell the analyzer that a type have a behavior, perhaps different, than the same type on the running python version. For example. You start with some working python2 code that you want to still be working. A code analysis tool can infer the types and annotate the code. Also can check which parts are py2/py3 compatible and which not, and mark those types with the mentioned type aliases. With this, and test suites, it could be calculated how much code is needed to be ported. Refactor to adapt the code to python3 maintaining code to still run on python2 (it could be marked for automate deletion), and when it's done, drop all the python2 code..
I think the process of porting it's different from the process of adapting code to work on python 2/3. Code with bytes, unicode, & str(don't mind) are not python2 code nor python3. Lot's of libraries that are 2/3 compatibles are just python2 code minimally adapted to run on python3 with six, and still be developed with a python2 style. When the time of drop python2 arrives the refactor needed will be huge. There is also an article that recently claims "Stop writing code that break on Python 4" and show code that treats python3 as the special case..
My original point is that if comment-based function annotations are going to be added, add it to python 3 too, no only for the special case of "Python 2.7 and straddling code", even though, on python 3, type annotations are preferred. I think that have the alternative to define types of a function as a type comment is a good thing because annotations could become a mesh, specially with complex types and default parameters, and I don't fell that the optional part of gradual typing must include readability. Some examples of my own code: class Field: def __init__(self, name: str, extract: Callable[[str], str], validate: Callable[[str], bool]=bool_test, transform: Callable[[str], Any]=identity) -> 'Field': class RepeatableField: def __init__(self, extract: Callable[[str], str], size: int, fields: List[Field], index_label: str, index_transform: Callable[[int], str]=lambda x: str(x)) -> 'RepeatableField': def filter_by(field_gen: Iterable[Dict[str, Any]], **kwargs) -> Generator[Dict[str, Any], Any, Any]: So, for define a comment-based function annotation it should be accepted two kind of syntax: - one 'explicit' marking the type of the function according to the PEP484 syntax: def embezzle(self, account, funds=1000000, *fake_receipts): # type: Callable[[str, int, *str], None] """Embezzle funds from account using fake receipts.""" <code goes here> like if was a normal type comment: embezzle = get_embezzle_function() # type: Callable[[str, int, *str], None] - and another one that 'implicitly' define the type of the function as Callable: def embezzle(self, account, funds=1000000, *fake_receipts): # type: (str, int, *str) -> None """Embezzle funds from account using fake receipts.""" <code goes here> Both ways are easily translated back and forth into python3 annotations. Also, comment-based function annotations easily goes over one line's characters, so it should be define which syntax is used to break the line. As it said on https://github.com/JukkaL/mypy/issues/1102 Those things should be on a PEP as a standard way to implement this, not only for mypy, also for other tools. Accept comment-based function annotations in python3 is good for migration python 2/3 code as it helps on refactor and use (better autocomplete), but makes it a python2 feature and not python3 increase the gap between versions. Hope I expressed better, if not, sorry about that. Agustín Herranz

On Thu, Jan 21, 2016 at 10:14 AM, Agustín Herranz Cecilia < agustin.herranz@gmail.com> wrote:
You're describing a case I have also encountered: we have a module with a function foo # foo_mod.py def foo(a): return a.bar() and the intention is that a is an instance of a class A defined in another module, which is not imported. If we add annotations we have to add an import from a_mod import A def foo(a: A) -> str: return a.bar() But the code that calls foo() is already importing A from a_mod somewhere, so there's not really any time wasted -- the import is just done at a different time. At least, that's the theory. In practice, indeed there are some unpleasant cases. For example, adding the explicit import might create an import cycle, and A may not yet be defined when foo_mod is loaded. We can't use the usual subterfuge, since we really need the definition of A: import a_mod def foo(a: a_mod.A) -> str: return a.bar() This will still fail if a_mod hasn't defined A yet because we reference a_mod.A at load time (annotations are evaluated when the function definition is executed). So we end up with this: import a_mod def foo(a: 'a_mod.A') -> str: return a.bar() This is both hard to read and probably wastes a lot of developer time figuring out they have to do this. And there are other issues, e.g. some folks have tricks to work around their start-up time by importing modules late (e.g. do the import inside the function that needs that module). In mypy there's another hack possible: it doesn't care if an import is inside "if False". So you can write: if False: from a_mod import A def foo(a: 'A') -> str: return a.bar() You still have to quote 'A' because A isn't actually defined at run time, but it's the best we can do. When using type comments you can skip the quotes: if False: from a_mod import A def foo(a): # type: (A) -> str return a.bar() All of this is unpleasant but not unbearable -- the big constraint here is that we don't want to add extra syntax (beyond PEP 3107, i.e. function annotations), so that we can use mypy for Python 3.2 and up. And with the type comments we even support 2.7.
Yes, this is a useful thing to discuss. Maybe we can standardize on the types defined by the 'six' package, which is commonly used for 2-3 straddling code: six.text_type (unicode in PY2, str in PY3) six.binary_type (str in PY2, bytes in PY3) Actually for the latter we might as well use bytes.
Yes, that's the kind of process we're trying to develop. It's still early days though -- people have gotten different workflows already using six and tests and the notion of straddling code, __future__ imports, and PyPI backports of some PY3 stdlib packages (e.g. contextlib2). There's also a healthy set of tools that converts PY2 code to straddling code, approximately (e.g. futurize and modernize). What's missing (as you point out) is tools that help automating a larger part of the conversion once PY2 code has been annotated. But first we need to agree on how to annotate PY2 code.
The text I added to the end of PEP 484 already says so: """ - Tools that support this syntax should support it regardless of the Python version being checked. This is necessary in order to support code that straddles Python 2 and Python 3. """
I don't see what adding support for

    # type: Callable[[str, int, *str], None]

adds. It's more verbose, and when the 'implicit' notation is used, the type checker already knows that embezzle is a function with that signature. You can already do this (except for the *str part):

    from typing import Callable

    def embezzle(account, funds=1000000):
        # type: (str, int) -> None
        """Embezzle funds from account using fake receipts."""
        pass

    f = None  # type: Callable[[str, int], None]
    f = embezzle
    f('a', 42)

However, note that no matter which notation you use, there's no way in PEP 484 to write the type of the original embezzle() function object using Callable -- Callable does not have support for varargs like *fake_receipts. If you want that the best place to bring it up is the typehinting tracker ( https://github.com/ambv/typehinting/issues ). But it's going to be a tough nut to crack, and the majority of places where Callable is needed (mostly higher-order functions like filter/map) don't need it -- their function arguments have purely positional arguments.
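(For completeness, PEP 484's ellipsis form is the usual escape hatch when the exact argument list, including *args, can't be spelled with Callable -- a minimal sketch, with made-up values:)

    from typing import Callable

    def embezzle(account, funds=1000000, *fake_receipts):
        # type: (str, int, *str) -> None
        pass

    # The *fake_receipts part can't be expressed with Callable, but the
    # argument list can be left unspecified with an ellipsis:
    f = None  # type: Callable[..., None]
    f = embezzle
    f('a', 42, 'receipt-1', 'receipt-2')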
Consider it done. The time machine strikes again. :-)
Hope I expressed it better this time; if not, sorry about that.
It's perfectly fine this time!
Agustín Herranz
-- --Guido van Rossum (python.org/~guido)

On 1/21/2016 1:44 PM, Guido van Rossum wrote: [Snip discussion of nitty-gritty issue of annotating code, especially 2.7 code.]

I suspect that at this point making migration from 2.7 to 3.x *easier*, with annotations, will do more to encourage migration, overall, than yet another new module. So I support this work even if I will not directly use it. If you are looking for a PyCon talk topic, I think this, with your experiences up to that time, would be a good one.

Only slightly off topic, I also think it worthwhile to reiterate that pydev support for 2.7 really really will end in 2020, possibly on the first day, as now documented in the nice new front-page devguide chart: https://docs.python.org/devguide/#status-of-python-branches I have read people saying (SO comments, I think) that there might or will be a security-patch-only phase of some number of years *after* that.
PEP 484 gives the motivation for 2.7-compatible type comments as "Some tools may want to support type annotations in code that must be compatible with Python 2.7." To me, this just implies running a static analyzer over *existing* code. Using type hint comments to help automate conversion, if indeed possible, would be worth adding to the motivation.
But first we need to agree on how to annotate PY2 code.
Given the current addition to an accepted PEP, I thought we more or less had, at least provisionally. -- Terry Jan Reedy

On Thu, 21 Jan 2016 at 10:45 Guido van Rossum <guido@python.org> wrote:
I agree that `bytes` should cover str/bytes in Python 2 and `bytes` in Python 3. As for the textual type, I say either `text` or `unicode` since they are both unambiguous between Python 2 and 3 and get the point across. And does `str` represent the type for the specific version of Python mypy is running under, or is it pegged to a specific representation across Python 2 and 3? If it's the former then fine, else those people who use the "native string" concept might want a way to say "I want the `str` type as defined on the version of Python I'm running under" (personally I don't promote the "native string" concept, but I know it has been brought up in the past). -Brett

On Fri, Jan 22, 2016 at 10:37 AM, Brett Cannon <brett@python.org> wrote:
OK, that's settled.
As for the textual type, I say either `text` or `unicode` since they are both unambiguous between Python 2 and 3 and get the point across.
Then let's call it unicode. I suppose we can add this to typing.py. In PY2, typing.unicode is just the built-in unicode. In PY3, it's the built-in str.
In mypy (and in typeshed and in typing.py), 'str' refers to the type named str in the Python version for which you are checking -- i.e. by default mypy checks in PY3 mode and str will be the unicode type; but "mypy --py2" checks in PY2 mode and str will be the Python 2 8-bit string type. (This is actually the only thing that makes sense IMO.) There's one more thing that I wonder might be useful. In PY2 we have basestring as the supertype of str and unicode. As far as mypy is concerned it's almost the same as Union[str, unicode]. Maybe we could add this to typing.py as well so it's also available in PY3, in that case as a shorthand for Union[str, unicode]. FWIW We are having a long discussion about this topic in the mypy tracker: https://github.com/JukkaL/mypy/issues/1135 -- interested parties are invited to participate there! -- --Guido van Rossum (python.org/~guido)
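(For concreteness, a rough sketch of what these proposed aliases would amount to; typing.unicode and a typing-level basestring did not exist at the time, and the module and names below are purely illustrative:)

    # compat_types.py -- illustrative sketch only
    import sys
    from typing import Union

    if sys.version_info[0] == 2:
        Text = unicode                    # the built-in unicode type on PY2
        BaseString = Union[str, unicode]  # roughly what PY2's basestring means
    else:
        Text = str                        # PY3's text type
        BaseString = str                  # the union collapses to str on PY3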

On Fri, Jan 22, 2016 at 11:19 AM, Random832 <random832@fastmail.com> wrote:
There are many differences between PY2 and PY3, not the least in the stubs for the stdlib. If you get an expression by calling a built-in function (or anything else that's not a literal) the type depends on what's in the stub. The architecture of mypy just isn't designed to take two different sets of stubs (and other differences in rules, e.g. whether something's an iterator because it defines '__next__' or 'next') into account at once. -- --Guido van Rossum (python.org/~guido)

On 22 January 2016 at 19:08, Guido van Rossum <guido@python.org> wrote:
This thread came to my attention just as I'd been thinking about a related point. For me, by far the worst Unicode-related porting issue I see is people with a confused view of what type of data reading a file will give. This is because open() returns a different type (byte stream or character stream) depending on its arguments (specifically 'b' in the mode) and it's frustratingly difficult to track this type across function calls - especially in code originally written in a Python 2 environment where people *expect* to confuse bytes and strings in this context. So, for example, I see a function read_one_byte which does f.read(1), and works fine in real use when a data file (opened with 'b') is processed, but fails when sys.stdin is used (on Python 3, once someone types a Unicode character). As far as I know, there's no way for type annotations to capture this distinction - either as they are at present in Python 3, or as being discussed here. But what I'm not sure of is whether it's something that *could* be tracked by a type checker. Of course I'm also not sure I'm right when I say you can't do it right now :-) Is this something worth including in the discussion, or is it a completely separate topic? Paul
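(A minimal sketch of the kind of function Paul describes -- the names are illustrative; the point is that the same code yields bytes or str depending only on how the stream was opened:)

    import io

    def read_one_byte(f):
        # Fine when f is a binary stream: f.read(1) yields bytes.
        # With a text stream (e.g. sys.stdin on Python 3), f.read(1)
        # yields str instead, and byte-oriented callers break at run time.
        return f.read(1)

    read_one_byte(io.BytesIO(b'\x00\x01'))  # returns bytes
    read_one_byte(io.StringIO(u'abc'))      # returns str -- same code, different type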

Interesting. PEP 484 defines an IO generic class, so you can write IO[str] or IO[bytes]. Maybe introducing separate helper functions that open files in text or binary mode can complement this to get a solution? On Fri, Jan 22, 2016 at 12:58 PM, Paul Moore <p.f.moore@gmail.com> wrote:
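(A sketch of what such helper functions might look like -- open_text and open_binary are invented names here, not an existing API:)

    import io
    from typing import IO

    def open_text(path, mode='r', encoding='utf-8'):
        # type: (str, str, str) -> IO[str]
        """Open a file whose read() is statically known to return text."""
        return io.open(path, mode, encoding=encoding)

    def open_binary(path, mode='rb'):
        # type: (str, str) -> IO[bytes]
        """Open a file whose read() is statically known to return bytes."""
        return io.open(path, mode)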
-- --Guido van Rossum (python.org/~guido)

On Jan 22, 2016, at 13:11, Guido van Rossum <guido@python.org> wrote:
Interesting. PEP 484 defines an IO generic class, so you can write IO[str] or IO[bytes]. Maybe introducing separate helper functions that open files in text or binary mode can complement this to get a solution?
The runtime types are a little weird here as well. In 3.x, open returns different types depending on the value, rather than the type, of its inputs. Also, TextIOBase is a subclass of IOBase, even though it isn't a subtype in the LSP sense, so you have to test isinstance(f, IOBase) and not isinstance(f, TextIOBase) to know that read() is going to return bytes. That's all a little wonky, but not impossible to deal with.

In 2.x, most file-like objects--including file itself, which open returns--don't satisfy either ABC, and most of them can return either type from read. Having a different function for open-binary instead of a mode flag would solve this, but it seems a little late to be adding that now. You'd have to go through all your 2.x code and change every open to one of the two new functions just to statically type your code, and then change it again for 3.x. Plus, you'd need to do the same thing not just for the builtin open, but for every library that provides an open-like method.

Maybe this special case is special enough that static type checkers just have to deal with it specially? When the mode flag is a literal, process it; when it's forwarded from another function, it may be possible to get the type from there; otherwise, everything is just unicode|bytes and the type checker can't know any more unless you explicitly tell it (by annotating the variable the result of open is stored in).

On Fri, Jan 22, 2016 at 1:40 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Agreed. At this level it's really hard to fix. :-(
Well, the file type returned by the builtin open() never returns Unicode from read(). For duck types (and even StringIO) it's indeed a crapshoot. :-(
Yeah, painful. Though in most cases you can also patch up individual calls using cast(IO[str], open(...)) etc.
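(A small illustration of that cast() workaround -- the file name is made up, and cast() has no effect at run time:)

    from typing import IO, cast

    f = cast(IO[str], open('notes.txt'))  # tell the checker this handle yields text
    first_line = f.readline()             # now known to be str
    f.close()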
That would be a lot of work too. We have so many other important-but-not-urgent things already that I would really like to push back on this until someone has actually tried the alternative and tells us how bad it is (like Ben Darnell did for @overload). -- --Guido van Rossum (python.org/~guido)

Instead of special-casing open() specifically, adding a 'Literal' class would solve this issue (although only in a stub file):

    @overload
    def open(mode: Literal['rb', 'wb', 'ab']) -> BufferedIOBase: ...
    @overload
    def open(mode: Literal['rt', 'wt', 'at']) -> TextIOBase: ...

Literal[a, b, c] == Union[Literal[a], Literal[b], Literal[c]] for convenience purposes. To avoid repetition, func(arg: Literal='value') could be made equivalent to func(arg: Literal['value']='value'). Typecheckers should just treat this the same as the type of the value, but for cases where it knows the value (literals or aliases) check the value too. (Either by comparison for core types, or just by identity. That allows use of object() sentinel values or Enum members.)
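(Purely illustrative, assuming the hypothetical Literal overloads above -- Literal is not part of typing at this point:)

    f = open('dump.bin', 'rb')   # literal 'rb' matches the binary overload: read() -> bytes
    g = open('notes.txt', 'rt')  # literal 'rt' matches the text overload: read() -> str

    mode = ''.join(['r', 'b'])   # not a single literal any more, so the checker
    h = open('dump.bin', mode)   # could only infer the union of the two overload results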

On Jan 22, 2016, at 10:37, Brett Cannon <brett@python.org> wrote:
The only problem is that, while bytes is a builtin type in both 2.7 and 3.x, with similar behavior (especially in 3.5, where simple %-formatting code works the same as in 2.7), unicode exists in 2.x but not 3.x, so that would require people to write something like "try: unicode except NameError: unicode = str" at the top of every file (or monkeypatch builtins somewhere) for the annotations to actually be valid 3.x code. And, if you're going to do that, using something that's already widespread and as close to a de facto standard as possible, like the six type suggested by Guido, seems less disruptive than inventing a new standard (even if "text" or "unicode" is a little nicer than "six.text_type"). (Or, of course, Guido could just get in his time machine and, along with restoring the u string literal prefix in 3.3, also restore the builtin name unicode as a synonym for str, and then this whole mail thread would fade out like Marty McFly.)

Also, don't forget "basestring", which some 2.x code uses. A lot of such code just drops bytes support when modernizing, but if not, it has to change to something that means basestring or str|unicode in 2.x and bytes|str in 3.x. Again, six has a solution for that, string_types, and mypy could standardize on that solution too.
And does `str` represent the type for the specific version of Python mypy is running under, or is it pegged to a specific representation across Python 2 and 3? If it's the former then fine,
In six-based code, it means native string, and there are tools designed to help you go over all your str uses and decide which ones should be changed to something else (usually text_type or binary_type), but no special name to use when you decide "I really do want native str here". So, I think it makes sense for mypy to assume the same, rather than to encourage people to shadow or rebind str to make mypy happy in 2.x. Speaking of native strings: six code often doesn't use native strings for __str__, instead using explicit text, and the @python_2_unicode_compatible class decorator. Will mypy need special support for that decorator to handle those types? If so, it's probably worth adding; otherwise, it would be encouraging people to stick with native strings instead of switching to text.
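(For reference, the six pattern Andrew is describing looks roughly like this -- the class is made up:)

    import six

    @six.python_2_unicode_compatible
    class Invoice(object):
        def __init__(self, total):
            self.total = total

        def __str__(self):
            # Written to return text; on PY2 the decorator renames this to
            # __unicode__ and installs a __str__ that encodes to UTF-8.
            return u'Invoice totalling {0}'.format(self.total)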

Looks like our messages crossed. On Fri, Jan 22, 2016 at 11:35 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
That decorator is in the typeshed stubs and appears to work -- although it looks like it's just a noop even in PY2. If that requires tweaks please submit a bug to the typeshed project tracker ( https://github.com/python/typeshed/issues). -- --Guido van Rossum (python.org/~guido)

On Fri, 22 Jan 2016 at 11:35 Andrew Barnert <abarnert@yahoo.com> wrote:
But why do they have to be valid code? This is for Python 2/3 code which means any typing information is going to be in a comment and so it isn't important that it be valid code as-is as long as the tools involved realize what `unicode` represents. IOW if mypy knows what the `unicode` type represents in PY3 mode then what does it matter if `unicode` is not a built-in type of Python 3?
I long thought about that option, but I don't think it buys us enough to bother to add the alias for `str` in Python 3. Considering all of the other built-in tweaks you typically end up making, I don't think this one change is worth it. -Brett

On Sat, 23 Jan 2016 at 11:18 Brett Cannon <brett@python.org> wrote:
I should also mention that Guido is suggesting typing.unicode come into existence, so there is no special import guard necessary. And since you will be importing `typing` anyway for type details then having typing.unicode in both Python 2 and Python 3 is a very minor overhead.
participants (8)
- Agustín Herranz Cecilia
- Andrew Barnert
- Brett Cannon
- Guido van Rossum
- Paul Moore
- Random832
- Spencer Brown
- Terry Reedy