Structural type checking for PEP 484

Jukka wrote up a proposal for structural subtyping. It's pretty good. Please discuss. https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 -- Guido van Rossum (python.org/~guido)

Thanks for sharing, Guido. Some random thoughts:

- "classes should need to be explicitly marked as protocols": If so, why are they classes in the first place? Other languages have dedicated keywords like "interface".
- "recursive types"? Yes, please. I am very curious about how to do this, as I am working on a similar problem. It would basically require defining the protocol first and then populating its members, as they might use the protocol's name. pyfu is supposed to do exactly this. But it's not going to work 100% when metaclasses come into the game.

Best, Sven

On 09.09.2015 22:17, Guido van Rossum wrote:

On Wed, Sep 9, 2015 at 2:16 PM, Sven R. Kunze <srkunze@mail.de> wrote:
I want to preserve compatibility with earlier Python versions (down to 3.2), and this makes it impossible to add any new syntax. Also, there is no need to add a keyword as there are other existing mechanisms which are good enough, including base classes (as in the proposal) and class decorators. I don't think that this will become a very commonly used language feature, and thus adding special syntax for this doesn't seem very important. My expectation is that structural subtyping would be primarily useful for libraries and frameworks. Jukka

Not specifically about this proposal but about the effort put into Python typehinting in general currently: What are the supposed benefits? I somewhere read that right now tools are able to infer 60% of the types. That seems pretty good to me, and it would be a lot of effort on your side to gain some additional 20?/30? %. Don't get me wrong, I like the theoretical and abstract discussions around this topic, but I feel this type of feature is way out of the practical realm. I don't see the effort of adding type hints AND the effort of further parsing (by human eyes) justified by partially better IDE support and a single additional test within test suites of tens of thousands of tests. Especially when considering that correct types don't prove functionality in any case, whereas tested functionality in some way proves correct typing. Just my two cents since I felt I had to say this and maybe I am missing something. :) Best, Sven On 09.09.2015 22:17, Guido van Rossum wrote:

On Wed, Sep 9, 2015 at 3:02 PM, Sven R. Kunze <srkunze@mail.de> wrote:
This has been discussed almost to the death before, but here are some of the main benefits as I see them:

- Code becomes more readable. This is especially true for code that doesn't have very detailed docstrings. This may go against the intuition of some people, but my experience strongly suggests this, and many others who've used optional typing have shared the sentiment. It probably takes a couple of days before you get used to the type annotations, after which they likely won't distract you any more but will actually improve code understanding by providing important contextual information that is often difficult to infer otherwise.
- Tools can automatically find most (simple) bugs of certain common kinds in statically typed code. A lot of production code has way below 100% test coverage, so this can save many manual testing iterations and help avoid breaking stuff in production due to stupid mistakes (that humans are bad at spotting).
- Refactoring becomes way less scary, especially if you don't have close to 100% test coverage. A type checker can find many mistakes that are commonly introduced when refactoring code.

You'll get the biggest benefits if you are working on a large code base mostly written by other people with limited test coverage and little comments or documentation. You get extra credit if your tests are slow to run and flaky, as this slows down your iteration speed, whereas type checking can be quick (with the right tools, which might not exist as of now ;-). If you have a small (say, less than 10k lines) code base you've mostly written yourself and have meticulously documented everything and have 95% test coverage and your full test suite runs in 10 seconds, you'll probably get less out of it. Context matters.
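To make the second point concrete, a minimal made-up example of the kind of mistake a checker flags without running any code:

    def format_price(amount: float, currency: str) -> str:
        return "{} {:.2f}".format(currency, amount)

    # The arguments below are swapped; a type checker reports this as an
    # error, while a test would only catch it if this call path is run.
    label = format_price("EUR", 19.99)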
Such a tool can't infer 40% of the types. This probably includes most of the tricky parts of the program that I'd actually like to statically check. A type checker that uses annotations might understand 95% of the types, i.e. it would miss 5% of the types. This seems like a reasonable figure for code that has been written with some thought about type checkability. I consider that difference pretty significant. I wouldn't want to increase the fraction of unchecked parts of my annotated code by a factor of 8, and I want to have control over which parts can be type checked. Jukka

Jukka, thank you very much for working on such a hard topic and being patient enough to respond to issues that I am sure were exhaustively discussed before (but I was not following the discussions then since I was in the final sprint for my book, Fluent Python, at the time). I have two questions which were probably already asked before, so feel free to point me to relevant past messages:

1) Why is a whole new hierarchy of types being created in the typing module, instead of continuing the hierarchy in the collections module while enhancing the ABCs already there? For example, why weren't the List and Dict types created under the existing MutableSequence and MutableMapping types in collections.abc?

2) Similarly, I note that PEP-484 shuns existing ABCs like those in the numbers module, and the ByteString ABC. The reasons given are pragmatic, so that users don't need to import the numbers module, and would not "have to write typing.ByteString everywhere." as the PEP says... I do not understand these arguments because: a) as you just wrote in another message, the users will be primarily the authors of libraries and frameworks, who will always be forced to import typing anyhow, so it does not seem such a burden to have them import other modules to get the benefits of type hinting; b) alternatively, there could be aliases of the relevant ABCs in the typing module for convenience.

So the second question is: what's wrong with points (a) and (b), and why did PEP-484 keep such a distance from existing ABCs in general? I understand pragmatic choices, but as a teacher and writer I know such choices are often obstacles to learning because they seem arbitrary to anyone who is not privy to the reasons behind them. So I'd like to better understand the reasoning, and I think PEP-484 is not very persuasive when it comes to the issues I mentioned. Thanks! Best, Luciano -- Luciano Ramalho | Author of Fluent Python (O'Reilly, 2015) | http://shop.oreilly.com/product/0636920032519.do | Professor em: http://python.pro.br | Twitter: @ramalhoorg

On Thu, Sep 10, 2015 at 3:01 AM, Luciano Ramalho <luciano@ramalho.org> wrote:
There are two main reasons. First, we wanted typing to be backward compatible down to Python 3.2, and so all the new features had to work without any changes to other standard library modules. Second, the module is provisional and it would be awkward to have non-provisional standard library modules depend on or closely interact with a provisional module. Also, List and Dict are actually type aliases for regular classes (list and dict, respectively) and so they actually represent subclasses of MutableSequence and MutableMapping as defined in collections.abc. They aren't proper classes so they don't directly play a role at runtime outside annotations.
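For instance, a small sketch of what "type aliases for regular classes" means in practice (the tally function here is made up):

    from typing import List, Dict

    def tally(words: List[str]) -> Dict[str, int]:
        # List and Dict appear in the annotations only; at runtime the
        # values are plain list and dict instances.
        counts = {}  # type: Dict[str, int]
        for w in words:
            counts[w] = counts.get(w, 0) + 1
        return counts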
I meant that protocols will likely be often *defined* in libraries or frameworks (or their stubs). Almost any code can *use* protocols in annotations, but user code might be less likely to define additional protocols. That's just a guess and I could be easily proven wrong, though.

b) alternatively, there could be aliases of the relevant ABCs in the typing module for convenience
There are other reasons for not using ABCs for things like numbers. For example, a lot of standard library functions expect concrete numeric types and won't accept arbitrary subclasses of the ABCs. For example, you couldn't pass a value with the numbers.Integral type to math.sin, because it expects an int or a float. Using ABCs instead of int, float or str wouldn't really work well (or at all) for type checking.
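For illustration, a small made-up sketch of the mismatch:

    import math
    from numbers import Integral

    def f(x: Integral) -> float:
        # From a type checker's point of view math.sin takes an int or a
        # float, not an arbitrary numbers.Integral, so this call would be
        # flagged even though many Integral implementations happen to
        # work at runtime.
        return math.sin(x)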
So the second question is: what's wrong with points (a) and (b), and why did PEP-484 keep such a distance from existing ABCs in general?
See above. There are more reasons but those that I mentioned are some of the more important ones. If you are still unconvinced, ask for more details and maybe I'll dig through the archives. :-)
Yeah, PEP 484 doesn't go through the rationale and subtleties in much detail. Maybe there should be a separate rationale PEP and we could just link to it when we get asked some of these (quite reasonable, mind you!) questions again. ;-) Jukka

On 10.09.2015 06:12, Jukka Lehtosalo wrote:
This has been discussed almost to the death before,
I am sorry. :)
If I have code without docstrings, I'd better write docstrings then. ;) I mean, when I am really going to touch that file to improve documentation (which annotations are a part of), I am going to add more information for the reader of my API, and that will mostly be describing the behavior of the API. If my variables have such crappy names that I need to add type hints to them, well, then I would rather fix the names first.
If I had a large untested and undocumented code base (well, I actually have), then static type checking would be ONE tool to find issues. Once they are found, I write tests like hell. Tests, tests, tests. I would not add type annotations. I need tested functionality, not proper typing.
You get extra credit if your tests are slow to run and flaky,
We are problem solvers. So, I would tell my team: "make them faster and more reliable".
Granted. But you still don't know if your code runs correctly. You are better off with tests. And I agree type checking is 1 test to perform (out of 10K). But:
I didn't see you respond to that. But you probably know that. :) Thanks for responding anyway. It is helpful to see your intentions, though I don't agree with it 100%. Moreover, I think it is about time to talk about this. If it were not you, somebody else would finally have added type hints to Python. Keep up the good work. +1 Best, Sven

On Sep 10, 2015, at 09:42, Sven R. Kunze <srkunze@mail.de> wrote:
I mean when I am really going to touch that file to improve documentation (which annotations are a piece of), I am going to add more information for the reader of my API and that mostly will be describing the behavior of the API.
As a bit of useless anecdotal evidence: After starting to play with MyPy when Guido first announced the idea, I haven't actually started using static type checking seriously, but I have started writing annotations for some of my functions. It feels like a concise and natural way to say "this function wants two integers", and it reads as well as it writes. Of course there's no reason I couldn't have been doing this since 3.0, but I wasn't, and now I am. Try playing around with it and see if you get the same feeling. Since everyone is thinking about the random module right now, and it makes a great example of what I'm talking about, try specifying which functions take/return int vs. float, which need a real int vs. anything Integral, etc., and see how much more easily you absorb the information than if it's in the middle of a sentence in the docstring. Anyway, I don't actually annotate every function (or every function except the ones that are so simple that any checker or reader that couldn't infer the types is useless, the way I would in Haskell), just the ones where the types seem like an important part of the semantics. So I haven't missed the more complex features the way I expected to. But I've still got no problem with them being added as we go along, of course. :)
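For a flavour of what I mean, some illustrative signatures (these are not the actual stubs for the random module):

    from typing import Sequence, TypeVar

    T = TypeVar('T')

    def randrange(stop: int) -> int: ...            # wants a real int
    def uniform(a: float, b: float) -> float: ...   # floats in, float out
    def choice(seq: Sequence[T]) -> T: ...          # element type preserved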

On 11.09.2015 00:22, Andrew Barnert wrote:
Thanks for the anecdote. It's good to hear you don't do it for every function and I am glad it helps you a lot. :) Do you know what makes me sad? If you do that for this function but don't do it for another, what is the guideline then? The Zen of Python tells us to have one obvious way to do something. At least for me, it's not obvious anymore when to annotate and when not to annotate. Just a random guess depending on the moon phase? :( Sometimes this and sometimes that. That can't be right for something so basic as types. Couldn't these problems be solved by further research on typecheckers? Btw. I can tell the same anecdote when switching from C/C++/C#/Java to Python. It was like a liberation---no explicit type declarations anymore. I was baffled and frightened the first week using it. But I love it now and I don't want to give that freedom up. Maybe, that's why I am reluctant to use it in production. But as said, I like the theoretical discussion around it. :) Best, Sven

Sven R. Kunze writes:
No. There's a simple rule: if it's obvious to you that type annotation is useful, do it. If it's not obvious that you want it, you don't want it, and you don't do it. You obviously are unlikely to do it for some time, if ever. Me too. But some shops want to use automated tools to analyze these things, and I don't see why there's a problem in providing a feature that makes it easier for them to do that.
So don't; nothing else in the language depends on type annotations, or on running a type checker for that matter. What's your point? That you'll have to read them in the stdlib? Nope; the stdlib will use stub files where it uses type annotations at all for the foreseeable future. That your employer might make you use them? That's the nature of employment. And if you can't convince your boss that annotations have no useful role in a program written in good style, why would you expect to convince us?

On Wed, Sep 16, 2015 at 10:57:29PM +0200, Sven R. Kunze wrote:
This is no different from deciding when to document and when to write tests. In a perfect world, every function is fully documented and fully tested. But in reality we have only a limited amount of time to spend writing code, and only a portion of that is spent writing documentation and tests, so we have to prioritise. Some functions are less than fully documented and less than fully tested. How do you decide which ones get your attention? People will use the same sort of heuristic for deciding which functions get annotated:

- does the function need annotations/documentation/tests?
- do I have time to write annotations/documentation/tests?
- is my manager telling me to add annotations/documentation/tests?
- if I don't, will bad things happen?
- is it easy or interesting to add them?
- or difficult and boring?

Don't expect to hold annotations up to a higher standard than we already hold other aspects of programming. -- Steve

On 17.09.2015 05:59, Steven D'Aprano wrote:
I fear I am not convinced by that analogy. Tests and documentation are all or nothing. Either you have them or you don't, and one is not worthier than the other. Type annotations (as far as I understand them) are basically completing a picture of 40%-of-already-inferred types. So, I have difficulty inferring which parameters actually would benefit from annotating. I am either doing redundant work (because the typechecker is already very well aware of the type) or I actually insert explicit knowledge (which might become redundant in case typecheckers actually become better).

On Thu, Sep 17, 2015 at 11:24:53PM +0200, Sven R. Kunze wrote:
I don't think they are all or nothing. I think it is possible to have incomplete documentation and partial test coverage -- it isn't like you go from "no documentation at all and zero tests" to "fully documented and 100% test coverage" in a single step. Unless you are religiously following something like Test Driven Development, where code is always written to follow a failed test, there will be times where you have to decide between writing new code or improving test coverage. Other choices may include:

- improve documentation;
- fix bugs;
- run a linter and fix the warnings it generates.

Adding "fix type errors found by the type checker" doesn't fundamentally change the nature of the work. You are still deciding what your priorities are, according to the needs of the project, your own personal preferences, and the instructions of your project manager (if you have one).
Type annotations (as far as I understand them) are basically completing a picture of 40%-of-already-inferred types.
That's one use-case for them. Another use-case is as documentation:

    def agm(x: float, y: float) -> float:
        """Return the arithmetic-geometric mean of x and y."""

versus

    def agm(x, y):
        """Return the arithmetic-geometric mean of x and y.

        Args:
            x (float): A number.
            y (float): A number.

        Returns:
            float: The agm of the two numbers.
        """
So, I have difficulty inferring which parameters actually would benefit from annotating.
The simplest process may be something like this:

- run the type-checker in a mode where it warns about variables with unknown types;
- add just enough annotations so that the warnings go away.

This is, in part, a matter of the quality of your tools. A good type checker should be able to tell you where it can, or can't, infer a type.
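A tiny made-up before/after of that process (the names here are invented for illustration):

    from typing import Iterable

    class LineItem:
        def __init__(self, price: float) -> None:
            self.price = price

    # Before: a checker that warns about unknown types would flag 'rows'
    # and 'tax', since nothing constrains what they are.
    def total_price_untyped(rows, tax):
        return sum(r.price for r in rows) * (1 + tax)

    # After: just enough annotations to make the warnings go away.
    def total_price(rows: Iterable[LineItem], tax: float) -> float:
        return sum(r.price for r in rows) * (1 + tax)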
You make it sound like, alone out of everything else in Python programming, once a type annotation is added to a function it is carved in stone forever, never to be removed or changed :-) If you add redundant type annotations, no harm is done. For example:

    def spam(n=3):
        return "spam" * n

A decent type-checker should be able to infer that n is an int. What if you add a type annotation?

    def spam(n: int = 3):
        return "spam" * n

Is that really such a big problem that you need to worry about this? I don't think so. The choice whether to rigorously stamp out all redundant type annotations, or leave them in, is a decision for your project. There is no universal right or wrong answer. -- Steve

On 18.09.2015 05:00, Steven D'Aprano wrote:
This was a misunderstanding. The "all or nothing" wasn't about "test everything or don't do it at all". It was about the robustness of future benefits you gain from it. Either you have a test or you don't. With type annotations you have 40% or 60% *depending* on the quality of the tool you use. It's fuzzy. I don't like to build stuff on jello. Just my personal feeling here.
The type annotation explains nothing. The short doc-string "arithmetic-geometric mean" explains everything (or prepares you to google it). So, I would prefer this one:

    def agm(x, y):
        """Return the arithmetic-geometric mean of x and y."""
You see? Depending on who runs which tools, type annotations need to be added which are redundant for one tool and not for another and vice versa. (Yes, we allow that because we grant the liberty to our devs to use the tools they perform best with.) Coverage, on the other hand, is strict. Either you traverse that line of code or you don't (assuming no bugs in the coverage tools).
Let me reformulate my point: it's not about setting things in stone. It's about having more to read and process mentally. You might think, 'nah, he's exaggerating; it's just one tiny little ": int" more here and there', but these things build up slowly over time, due to missing clear guidelines (see the fuzziness I described above). Devs will simply add them everywhere just to make sure OR ignore the whole concept completely. It's simply not good enough. :( Nevertheless, I like the protocol idea more as it introduces actual names to be exposed by IDEs without any work from the devs. That's great! You might further think, 'you're so lazy, Sven. First, you don't want to help the type checker but you still want to use it?' Yes, I am lazy! And I already benefit from it when using PyCharm. It might not be perfect but it still amazes me again and again what it can infer without any type annotations present.
There's nothing seriously wrong with it (except what I described above). However, these examples (this one in particular) are not, and should not be, real-world code. The function name is not helpful, the parameter name is not helpful, the functionality is a toy. My observation so far:

1) Type checking illustrates its point well when using academic examples, such as the tuples-of-tuples-of-tuples-of-ints I described somewhere else on this thread, or unreasonably short toy examples. (This might be domain specific; I can witness it for business applications and web applications, none of which actually need to solve hard problems, admittedly.)

2) Just using constant and sane types like a class, lists of single-class instances and dicts of single-class instances for a single variable enables you to assign a proper name to it and forces you to design a reasonable architecture for your functionality by keeping the level of nesting at 0 or 1 and splitting out pieces into separate code blocks.

Best, Sven

On Sep 18, 2015, at 10:35, Sven R. Kunze <srkunze@mail.de> wrote:
Surely gaining 40% or gaining 60% is better than gaining 0%? At any rate, if you're really concerned with this, there is research you might be interested in. The first static typer that I'm aware of that used a "fallback to any" rule like MyPy was for an ML language, and it used unsafety marking: any time it falls back to any, it marks the code unsafe, and that propagates in the obvious way. At the end of the typer run, it can tell you which parts of your program are type safe and which aren't. (It can also refactor the type safe parts into separate modules, which are then reusable in other programs, with well-defined type-safe APIs.) This sounds really nifty, and is fun to play with, but I don't think people found it useful in practice. (This is not the same as the explicit Unsafe type found in most SML descendants, where it's used explicitly to mark FFIs and access to internal structures, which definitely is useful--although of course it's not completely unrelated.) I think someone could pretty easily write something similar around PEP 484, and then display the results in a way similar to a code coverage map. If people found it useful, that would become a quality of implementation issue for static typers, IDEs, etc. to compete on, and might be worth adding as a required feature to some future update to the standard; if not, it would just be a checklist item on some typer's feature list that would eventually stop being worth maintaining. Would that solve your "40% problem" to your satisfaction?
I know that Steven wasn't expecting any of those, and will probably do the wrong thing (including silently doing something bad like silently throwing away Decimal precision or improperly extending to the complex plane). With yours, I don't know that. I may not even notice that there's a problem and just call it and get a bug months later. Even if I do notice the question, I have to read through your implementation and/or your test suite to find out if you'd considered the case, or write my own tests to find out empirically. And that's exactly what I meant earlier by annotations sometimes being useful for human readers whether or not they're useful to the checker.
The only way to avoid that is to define the type system completely and then define the inference engine as part of the language spec. The static type system is inherently an approximation of the much more powerful partly-implicit dynamic type system; not allowing it to act as an approximation would mean severely weakening Python's dynamic type system, which would mean severely weakening what you can write in Python. That's a terrible idea. Something like PEP 484 and an ecosystem of competing checkers is the only possibly useful thing that could be added to Python. If you disagree, nothing that could be feasibly added to Python will ever be useful to you, so you should resign yourself to never using static type checking (which you're allowed to do, of course).
What you're essentially arguing is that if nobody ever used dynamic types (e.g., types with __getattr__, types constructed at runtime by PyObjC or similar bridges, etc.), or dynamically-typed values (like the result of json.loads), or static types that are hard to express manually (like ADTs or dependent types), we could easily build a static type checker that worked near-perfectly, and then we could define exactly where you do and don't need to annotate types. That's true, but it effectively means restricting yourself to the Java type system. Which sucks. There are many things that are easy to write readably in Python (or in Haskell) that require ugliness in Java simply because its type system is too weak. Restricting Python (or even idiomatic Python) to the things that could be Java-typed would seriously weaken the language, to the point where I'd rather go find a language that got duck typing right than stick with it. You could argue that Swift actually does a pretty good job of making 90% of your code just work and making it as non-ugly as possible to force the rest of the 10% through escapes in the type system (at least for many kinds of programs). But this actually required a more complicated type system than the one you're suggesting--and, more importantly, it involved explicitly designing the language and the stdlib around that goal. Even the first few public betas didn't work for real programs without a lot of ugliness, requiring drastic changes to the language and stdlib to make it usable. Imagine how much would have to change about a language that was designed for duck typing and grew organically over two and a half decades. Also, there are many corners of Swift that have inconsistently ad-hoc rules that make it much harder to fit the entire language into your brain than Python, despite the language being about the same size. A language that you developed by performing a similar process on Python might be a good language, maybe even better than Swift, but it would not be Python, and would not be useful for the same kinds of projects where a language-agnostic programmer would choose Python over other alternatives.

On Sep 16, 2015, at 13:57, Sven R. Kunze <srkunze@mail.de> wrote:
Sometimes this and sometimes that. That can't be right for something so basic as types.
Types aren't as basic as you think, and assuming they are leads you to design languages like Java, that restrict you to working within unnecessary constraints. For an obvious example, what's the return type of json.loads (or, worse, eval)? Haskell, Dependent ML, and other languages have made great strides in working out how to get most of the power of a language like Python (and some things Python can't do, too) in a type-driven paradigm, but there's still plenty of research to go. And, even if that were a solved problem, nobody wants to rewrite Python as an ML dialect, and nobody would use it if you did. Python solves the json.loads problem by saying its runtime type is defined lazily and implicitly by the data. And there's no way any static type checker can possibly infer that type. A good statically-typed language can make it a lot easier to handle than a bad one like Java, but it will be very different from Python.
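To illustrate the json.loads point with a small sketch:

    import json

    data = json.loads('{"name": "spam", "sizes": [1, 2.5]}')
    # The static return type can only be something like Any (or a recursive
    # union of dict/list/str/float/bool/None); the concrete shape -- here a
    # dict holding a str and a list of numbers -- is determined by the data
    # at runtime, so no checker can infer it from the call alone.
    print(data["sizes"][1])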
Couldn't these problems be solved by further research on typecheckers?
I'm not sure which problems you want solved. If you want every type to be inferable, for a language with a sufficiently powerful type system, that's provably equivalent to the halting problem, so it's not going to happen. More importantly, we already have languages with a powerful static type system and a great inference engine, and experience with those languages shows that it's often useful to annotate some types for readability that the inference engine could have figured out. If a particular function is more understandable to the reader when it declares its parameter types, I can't imagine what research anyone would do that would cause me to stop wanting to declare those types. Also, even when you want to rely on inference, you still want the types to have meaningful names that you can read, and could have figured out how to construct on your own, for things like error messages, debuggers, and reflective code. So, the work that Jukka is proposing would still be worth doing even if we had perfect inference.
Btw. I can tell the same anecdote when switching from C/C++/C#/Java to Python. It was like a liberation---no explicit type declarations anymore. I was baffled and frightened the first week using it. But I love it now and I don't want to give that freedom up. Maybe, that's why I am reluctant to use it in production.
The problem here is that you're coming from C++/C#/Java, which are terrible examples of static typing. Disliking static typing because of Java is like disliking dynamic typing because of Tcl. I won't get into details of why they're so bad, but: if you don't have the time to learn you a Haskell for great good, you can probably at least pick up Boo in an hour or so, to at least see what static typing is like with inference by default and annotations only when needed in a very pythonesque language, and that will give you half the answer.

On 17.09.2015 07:56, Andrew Barnert wrote:
I'm not sure which problems you want solved.
If you want every type to be inferable, for a language with a sufficiently powerful type system, that's provably equivalent to the halting problem, so it's not going to happen.
Nobody said it must be perfect. It just needs to be good enough.
More importantly, we already have languages with a powerful static type system and a great inference engine, and experience with those languages shows that it's often useful to annotate some types for readability that the inference engine could have figured out. If a particular function is more understandable to the reader when it declares its parameter types, I can't imagine what research anyone would do that would cause me to stop wanting to declare those types.
Because it's more code, it's redundant, it needs to be maintained, and so on and so forth.
Also, even when you want to rely on inference, you still want the types to have meaningful names that you can read, and could have figured out how to construct on your own, for things like error messages, debuggers, and reflective code. So, the work that Jukka is proposing would still be worth doing even if we had perfect inference.
I totally agree (and I said this before). Speaking of meaningful names, which name(s) are debuggers supposed to show when there is a multitude of protocols that would fit?
Btw. I can tell the same anecdote when switching from C/C++/C#/Java to Python. It was like a liberation---no explicit type declarations anymore. I was baffled and frightened the first week using it. But I love it now and I don't want to give that freedom up. Maybe, that's why I am reluctant to use it in production. The problem here is that you're coming from C++/C#/Java, which are terrible examples of static typing. Disliking static typing because of Java is like disliking dynamic typing because of Tcl. I won't get into details of why they're so bad, but: if you don't have the time to learn you a Haskell for great good, you can probably at least pick up Boo in an hour or so, to at least see what static typing is like with inference by default and annotations only when needed in a very pythonesque language, and that will give you half the answer.
I came across Haskell quite some time ago and I have to admit it feels not natural but for other reasons than its typing system and inference.

On Thu, Sep 10, 2015 at 9:42 AM, Sven R. Kunze <srkunze@mail.de> wrote:
Even good variable names can leave the type ambiguous. And besides, if you assume that all code is perfect or can be made perfect, I think that you've already lost the discussion. Reality disagrees with you. ;-) You can't just wave a magic wand and get every programmer to document their code and write unit tests. However, we know quite well that programmers are perfectly capable of writing type annotations, and tools can even enforce that they are present (witness all the Java code in existence). Tools can't verify that you have good variable names or useful docstrings, and people are too inconsistent or lazy to be relied on.
Sure, it doesn't solve everything.
Once found out, I write tests as hell. Tests, tests, tests. I would not add type annotations. I need tested functionality not proper typing.
Most programmers only have limited time for improving existing code. Adding type annotations is usually easier than writing tests. In a cost/benefit analysis it may be optimal to spend half the available time on annotating parts of the code base to get some (but necessarily limited) static checking coverage and spend the remaining half on writing tests for selected parts of the code base, for example. It's not all or nothing.
But you'd probably also ask them to implement new features (or *your* manager might be unhappy), and they have to find the right balance, as they only have 40 hours a week (or maybe 80 hours if you work at an early-stage startup :-). Having more tools gives you more options for spending your time efficiently.
Actually a type checker can verify multiple properties of a typical line of code. So for 10k lines of code, complete type checking coverage would give you the equivalent of maybe 30,000 (simple) tests. :-P And I'm sure it would take much less time to annotate your code than to manually write the 30,000 test cases.
This is a variation of an old argument, which goes along the lines of "if you have tests and comments (and everybody should, of course!) type checking doesn't buy you anything". But if the premise can't be met, the argument doesn't actually say anything about the usefulness of type checking. :-) It's often not cost effective to have good test coverage (and even 100% line coverage doesn't give you full coverage of all interactions). Testing can't prove that your code doesn't have defects -- it just proves that for a tiny subset of possible inputs your code works as expected. A type checker may be able to prove that for *all* possible inputs your code doesn't do certain bad things, but it can't prove that it does the good things. Neither subsumes the other, and both of these approaches are useful and complementary (but incomplete). I think that there was a good talk basically about this at PyCon this year, by the way, but I can't remember the title. Jukka

On 11.09.2015 08:24, Jukka Lehtosalo wrote:
Try harder then.
Not sure where I said this.
You can't just wave a magic wand and get every programmer to add type annotations to their code. However, we know quite well that programmers are perfectly capable of writing unit tests, and tools can even enforce that they are present (witness coverage tools and hooks in SCM systems preventing it from dropping). [ Interesting, that it was that easy to exchange the parts you've given me ;) ] Btw. have you heard of code review?
Tools can't verify that you have good variable names or useful docstrings, and people are too inconsistent or lazy to be relied on.
Same can be said for type annotations.
I would like to peer-review that cost/benefit analysis you've made to see whether your numbers are sane.
Yes, I am going to tell him: "Hey, it doesn't work but we got all/most of the types right."
I think you should be more specific on this. Using hypothesis, e.g., you can easily increase the number of simple tests as well. What I can tell is that most of the time, a variable carries the same type. It is really convenient that it doesn't have to, but most of the time it does. Thus, one test run can probably reveal a dangerous type mistake. I've seen code where that is not the case indeed and one variable is either re-used or accidentally has different types. But, well, you better stay away from it anyway because most of the time it's very old code. Moreover, in order to add *reasonable* type annotations you would probably invest an equal amount of time to what you would invest to write some tests for it. The majority of the time is about *understanding* the code. And there, better variable names help a lot.
I fully agree on this. Yet I don't need type annotations. ;) A simple test running a typechecker working at 40%-60% (depending on whom you ask) efficiency suffices, at least for me. I would love to see better typecheckers rather than cluttering our code with questionable annotations which, by the way, I don't know are necessary at all. Don't be fooled by the possibility of dynamic typing in Python. Just because it's possible doesn't necessarily mean it's the usual thing.
I think that there was a good talk basically about this at PyCon this year, by the way, but I can't remember the title.
It'll be great to have it. :) Best, Sven

On September 16, 2015 3:42:20 PM CDT, "Sven R. Kunze" <srkunze@mail.de> wrote:
    def process_integer_coordinate_tuples(integer_tuple_1, integer_tuple_2, is_fast):
        ...

vs

    def process_coords(t1: Tuple[int, int], t2: Tuple[int, int], fast: bool):
        ...

Java's fatal mistake.

Embedding type names in arguments and method names. On Thu, Sep 17, 2015 at 4:45 PM, Sven R. Kunze <srkunze@mail.de> wrote:
You said:
Even good variable names can leave the type ambiguous.
These are names that don't leave anything ambiguous! :D Really, though: relying on naming to make types explicit fails badly whenever you start refactoring and makes hell for the users of the API you made. -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/

On 17.09.2015 23:56, Ryan Gonzalez wrote:
I was actually confused by 'Java' in your reply.
They just do. Because they don't tell me why I would want to call that function and with what. If any of these versions is supposed to represent good style, you still need to learn a lot.
Professional refactoring would not change venerable APIs. It would provide another version of them and slowly deprecate the old one. Not sure where you're heading here, but do you say t1 and t2 are good names? Not sure how big the applications you work with are, but those I know of are very large. So, I am glad when, 2000 lines and 10 files later, a variable somehow tells me something about itself. And no, "*Tuple[int, int]*" doesn't tell me anything (even when an IDE could tell me that). Most of the time when discussing typecheckers and so forth, I get the feeling people think most applications are using data structures like *tuples of tuples of tuples of ints*. That is definitely not the case (anymore). Most of the time the data types are instances, lists of instances and dicts of instances. That's one reason I somehow like Jukka's structural proposal, because I actually can see some real-world benefit which goes beyond the tuples of tuples, and that is: *inferring proper names*. Best, Sven

On Thu, Sep 17, 2015 at 04:56:33PM -0500, Ryan Gonzalez wrote:
Embedding type names in arguments and method names.
supposedly being "Java's fatal mistake". I'm not sure that Java developers commonly make a practice of doing that. It would be strange, since Java requires type declarations. I'm not really a Java guy, but I think this would be more like what you would expect:

    public class Example {
        public void processCoords(Point t1, Point t2, boolean fast) {
            ...
        }
    }

where Point is equivalent to an (int, int) tuple. You seem to be describing a verbose version of "Apps Hungarian Notation". I don't think Hungarian Notation was ever standard practice in the Java world, although I did find at least one tutorial (from 1999) recommending it: http://www.developer.com/java/ent/article.php/615891/Applying-Hungarian-Nota... In any case, I *think* that your intended lesson is that type annotations can increase the quality of code even without a type checker, as they act as type documentation to the reader. I agree with that. -- Steve

On Sep 9, 2015, at 13:17, Guido van Rossum <guido@python.org> wrote:
Jukka wrote up a proposal for structural subtyping. It's pretty good. Please discuss.
https://github.com/ambv/typehinting/issues/11#issuecomment-138133867
Are we going to continue to have (both implicit and explicit) ABCs in collections.abc, numbers, etc., and also have protocols that are also ABCs and are largely parallel to them (and implicit at static checking time whether they're implicit or explicit at runtime) in typing? If so, I think we've reached the point where the two parallel hierarchies are a problem. Also, why are both the terminology and implementation so different from what we already have for ABCs? Why not just have a decorator or metaclass that can be added to ABCs that makes them implicit (rather than writing a manual __subclasshook__ for each one), which also makes them implicit at static type checking time, which means there's no need for a whole separate but similar notion? I'm not sure why it's important to also have some types that are implicit at static type checking time but not at runtime, but if there is a good reason, that just means two different decorators/metaclasses/whatever (or a flag passed to the decorator, etc.). Compare: Hashable is an implicit ABC, Sequence is an explicit ABC, Reversible is an implicit-static/explicit-runtime ABC. Versus: Hashable is an implicit ABC and also a Protocol that's an explicit ABC, Sequence is an explicit ABC and not a Protocol, Reversible is a Protocol that's an explicit ABC. The first one is clearly simpler; is there some compelling reason that makes the second one better anyway?

On Wed, Sep 9, 2015 at 3:08 PM, Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
I'm not proposing creating protocols for numbers or most collection types. I'd change some of the existing ABCs (mentioned in the proposal, including things like Sized) in typing into equivalent protocols, but they'd still support isinstance as before and would be functionally almost identical to the existing ABCs. I clarified the latter fact in the github issue.
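So, for instance, something like this keeps working (using the collections.abc version of Sized, which the typing one mirrors; the Bag class is made up):

    from collections.abc import Sized

    class Bag:
        def __init__(self):
            self.items = []

        def __len__(self):
            return len(self.items)

    # True today via the ABC/__subclasshook__ machinery; under the proposal
    # Sized would additionally be a protocol for static checking, with the
    # runtime behaviour staying as it is.
    assert isinstance(Bag(), Sized)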
Protocol would use a metaclass that is derived from the ABC metaclass, and it would be similar to the Generic class that we already have. The reason why the proposal doesn't use an explicit metaclass or a class decorator is consistency. It's possible to define generic protocols by having Protocol[t, ...] as a base class, which is consistent with how Generic[...] works. The latter is already part of typing, and introducing a similar concept with a different syntax seems inelegant to me. Consider a generic class:

    class Bucket(Generic[T]):
        ...

Now we can have a generic protocol using a very similar syntax:

    class BucketProtocol(Protocol[T]):
        ...

I wonder how we'd use a metaclass or a class decorator to represent generic protocols. Maybe something like this:

    @protocol[T]
    class BucketProtocol:
        ...

However, this looks quite different from the Generic[...] case and thus I'd rather not use it. I guess if we'd picked this syntax for generic classes it would make more sense:

    @generic[T]
    class Bucket:
        ...
I'm not sure if I fully understand what you mean by implicit vs. explicit ABCs (and the static/runtime distinction). Could you define these terms and maybe give some examples of each? Note that in my proposal a protocol is just a kind of ABC, as GenericMeta is a subclass of ABCMeta and protocol would have a similar metaclass (or maybe even the same one), even though I'm not sure if I explicitly mentioned that. Every protocol is also an ABC. Jukka

On 2015-09-09 13:17, Guido van Rossum wrote:
I'm not totally hip to all the latest typing developments, but I'm not sure I fully understand the benefit of this protocol concept. At the beginning it says that classes have to be explicitly marked to support these protocols. But why is that? Doesn't the existing __subclasshook__ already allow an ABC to use any criteria it likes to determine if a given class is considered a subclass? So couldn't ABCs like the ones we already have inspect the type annotations and decide a class "counts" as an iterable (or whatever) if it defines the right methods with the right type hints? -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On 10.09.2015 03:50, Brendan Barnwell wrote:
You bet what I am.
The benefit from what I understand is actually really, really nice. It's basically adding the ability to shorten the following 'capability' check:

    if hasattr(obj, 'important') and hasattr(obj, 'relevant') and hasattr(obj, 'necessary'):
        # do

to

    if implements(obj, protocol):
        # do

As usual with type hints, functionality is not guaranteed. But it simplifies sanity checks OR decision making:

    if implements(obj, protocol1):
        # do this
    elif implements(obj, (protocol2, protocol3)):
        # do that

The ability to extract all protocols of a type would provide a more flexible way of decision making and processing, such as:

    if my_protocol in obj.__protocols__:
        # iterate over the protocols and do something

@Jukka I haven't found the abilities described above. Would it make sense to add them (unless they are already there)? Best, Sven

On 2015-09-10 10:01, Sven R. Kunze wrote:
Right, but can't you already do that with ABCs, as in the example in the docs (https://docs.python.org/2/library/abc.html)? You can write an ABC whose __subclasshook__ does whatever hasattr checks you want (and, if you want, checks the type annotations too), and then you can use isinstance/issubclass to check if a given instance/class "provides the protocol" described by that ABC. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
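A sketch of the pattern Brendan describes, modeled on the example in the abc docs (the class names here are made up):

    from abc import ABCMeta, abstractmethod

    class SupportsClose(metaclass=ABCMeta):
        @abstractmethod
        def close(self):
            ...

        @classmethod
        def __subclasshook__(cls, C):
            if cls is SupportsClose:
                # "Provides the protocol" simply means: close() is defined
                # somewhere in the MRO.
                if any("close" in B.__dict__ for B in C.__mro__):
                    return True
            return NotImplemented

    class Resource:
        def close(self):
            pass

    assert issubclass(Resource, SupportsClose)    # no registration needed
    assert isinstance(Resource(), SupportsClose)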

On 10.09.2015 20:24, Brendan Barnwell wrote:
You might probably be right. Maybe it's that this kind of "does whatever hasattr checks you want" gets standardized via the protocol base class. Pondering about this idea further, current Python actually gives enough means to do that at runtime. If I rely on method A being present on object b, Python will simply give me an AttributeError and that'll suffice. So, it's only for the static typechecker again. Best, Sven

On 9 September 2015 at 21:17, Guido van Rossum <guido@python.org> wrote:
Jukka wrote up a proposal for structural subtyping. It's pretty good. Please discuss.
Some good feedback has been provided in this thread already, but I want to provide an enthusiastic +1 for this change. I'm one of the people who has been extremely lukewarm towards the Python type hints proposal, but I believe this addresses one of my major areas of concern. Overall the proposal seems like a graceful solution to many of the duck typing problems. It does not address all of them, particularly around classes that may dynamically (but deterministically) modify themselves to satisfy the constraints of the Protocol (e.g. by generating methods for themselves at instantiation-time), but that's a pretty hairy use-case and there's not much that a static type checker could do about it anyway. Altogether this looks great (modulo a couple of small concerns raised by others), and it's enough for me to consider using static type hints on basically all my projects with the ongoing exception of Requests (which has duck typing problems that this cannot solve, I think). Great work Jukka!

On 09.09.2015 22:17, Guido van Rossum wrote:
15) How would Protocol be implemented? "Implement metaclass functionality to detect whether a class is a protocol or not. Maybe add a class attribute such as __protocol__ = True if that's the case" If you consider the __protocols__ attribute I mentioned in an earlier post, I would like to see __protocol__ renamed to __is_protocol__. I think that would make it more readable in the long run. Best, Sven

I like this proposal; given Python's flat nominal type hierarchy, it will be useful to have a parallel subtyping mechanism to give things finer granularity without having to resort to ABCs. Are the return types of methods invariant or covariant under this proposal? I.e. if I have

    class A(Protocol):
        def f(self) -> int: ...

does

    class B:
        def f(self) -> bool:
            return True

implicitly implement the protocol? Also, marking Protocols using subclassing seems confusing and error-prone. In your examples above, one would think that you could define a new protocol using

    class SizedAndClosable(Sized):
        pass

instead of

    class SizedAndClosable(Sized, Protocol):
        pass

because Sized is already a protocol. Maybe the below would be a more intuitive syntax:

    @protocol
    class SizedAndClosable(Sized):
        pass

Furthermore, I strongly agree with #7. Typed, but optional, attributes are a bad idea.

On Sep 10, 2015, at 11:57, Matthias Kramm via Python-ideas <python-ideas@python.org> wrote:
I don't understand this, given that resorting to protocols is basically the same thing as resorting to ABCs. Clearly there's some perceived difficulty or complexity of ABCs within the Python community that makes people not realize how simple and useful they are. But I don't see how adding something that's nearly equivalent but different and maintaining the two in parallel is a good solution to that problem. There are some cases where the fact that ABCs rely on a metaclass makes them problematic where Protocols aren't (basically, where you need another metaclass), but I doubt that's the case you're worried about.

On Thu, Sep 10, 2015 at 11:57 AM, Matthias Kramm via Python-ideas < python-ideas@python.org> wrote:
The proposal doesn't spell out the rules for subtyping, but we should follow the ordinary rules for subtyping for functions, and return types would behave covariantly. So the answer is yes.
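Restating the example from above with that covariant reading (Protocol here is the proposed base class, so this is a sketch of the intended semantics rather than code that type-checks today):

    from typing import Protocol   # as proposed; not part of typing at the time

    class A(Protocol):
        def f(self) -> int: ...

    class B:                      # no explicit relationship to A
        def f(self) -> bool:      # bool is a subtype of int, so the return
            return True           # type is covariantly compatible with A.f

    # Under the proposal, B is a structural subtype of A, and a B instance
    # is accepted wherever an A is expected.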
The proposal also lets you define the protocols implemented by your class explicitly, and without having the explicit Protocol base class or some other marker these would be impossible to distinguish in general. Example:

    class MyList(Sized):
        # I want this to be a normal class, not a protocol.
        def __len__(self) -> int:
            return self.num_items

    class DerivedProtocol(Sized):
        # This should actually be a protocol.
        def foo(self) -> int: ...
We could use that. The tradeoff is that then we'd have some inconsistency depending on whether a protocol is generic or not:

    @protocol
    class A(metaclass=ProtocolMeta):  # Non-generic protocol
        ...

    @protocol
    class B(Generic[T]):  # Generic protocol. But this has a different metaclass than the above?
        ...

I'm not sure if we can use ABCMeta for protocols as protocols may need some additional metaclass functionality. Anyway, any proposal should consider all these possible ways of defining protocols:

1. Basic protocol, no protocol inheritance
2. Generic protocol, no protocol inheritance
3. Basic protocol that inherits one or more protocols
4. Generic protocol that inherits one or more protocols

My approach seems to deal with all of these reasonably well in my opinion (but I haven't implemented it yet!), but the tradeoff is that the Protocol base class needs to be present for all protocols. Jukka

On Thursday, September 10, 2015 at 11:38:48 PM UTC-7, Jukka Lehtosalo wrote:
Ok. Note that this introduces some weird corner cases when trying to decide whether a class implements a protocol. Consider

    class P(Protocol):
        def f(self) -> P: ...

    class A:
        def f(self) -> A: ...

It would be equally valid to say that A *does not* implement P (because the return value of f is incompatible with P) as it would be to say that A *does* implement it (because once it does, the return value of f becomes compatible with P). For a more quirky example, consider

    class A(Protocol):
        def f(self) -> B: ...
        def g(self) -> str: ...

    class B(Protocol):
        def f(self) -> A: ...
        def g(self) -> float: ...

    class C:
        def f(self) -> D:
            return self.x
        def g(self):
            return self.y

    class D:
        def f(self) -> C:
            return self.x
        def g(self):
            return self.y

Short of introducing intersection types, the protocols A and B are incompatible (because the return types of g() are mutually exclusive). Hence, C and D can, respectively, conform to either A or B, but not both. So the possible assignments are:

    C -> A
    D -> B

*or*

    C -> B
    D -> A

It seems undecidable which of the two is the right one. (The structural type converter in pytype solves this by dropping the "mutually exclusive" constraint to the floor and making A and B both a C *and* a D, which you can do if all you want is a name for an anonymous structural type. But here you're using your structural types in type declarations, so that solution doesn't apply.) Matthias

Thanks for sharing, Guido. Some random thoughts: - "classes should need to be explicitly marked as protocols" If so, why are they classes in the first place? Other languages has dedicated keywords like "interface". - "recursive types"? Yes, please. I am very curious about how to as I am working a similar problem. It would basically require defining of the protocol first and then populating its member as they might use the protocol's name. pyfu is supposed to do exactly this. But it's not going to work 100% when metaclasses come into the game. Best, Sven On 09.09.2015 22:17, Guido van Rossum wrote:

On Wed, Sep 9, 2015 at 2:16 PM, Sven R. Kunze <srkunze@mail.de> wrote:
I want to preserve compatibility with earlier Python versions (down to 3.2), and this makes it impossible to add any new syntax. Also, there is no need to add a keyword as there are other existing mechanisms which are good enough, including base classes (as in the proposal) and class decorators. I don't think that this will become a very commonly used language feature, and thus adding special syntax for this doesn't seem very important. My expectation is that structural subtyping would be primarily useful for libraries and frameworks. Jukka

Not specifically about this proposal but about the effort put into Python typehinting in general currently: What are the supposed benefits? I somewhere read that right now tools are able to infer 60% of the types. That seems pretty good to me and a lot of effort on your side to make some additional 20?/30? %. Don't get me wrong, I like the theoretical and abstract discussions around this topic but I feel this type of feature way out of the practical realm. I don't see the effort for adding type hints AND the effort for further parsing (by human eyes) justified by partially better IDE support and 1 single additional test within test suites of about 10,000s of tests. Especially, when considering that correct types don't prove functionality in any case. But tested functionality in some way proves correct typing. Just my two cents since I felt I had to say this and maybe I am missing something. :) Best, Sven On 09.09.2015 22:17, Guido van Rossum wrote:

On Wed, Sep 9, 2015 at 3:02 PM, Sven R. Kunze <srkunze@mail.de> wrote:
This has been discussed almost to the death before, but there are some of main the benefits as I see them: - Code becomes more readable. This is especially true for code that doesn't have very detailed docstrings. This may go against the intuition of some people, but my experience strongly suggests this, and many others who've used optional typing have shared the sentiment. It probably takes a couple of days before you get used to the type annotations, after which they likely won't distract you any more but will actually improve code understanding by providing important contextual information that is often difficult to infer otherwise. - Tools can automatically find most (simple) bugs of certain common kinds in statically typed code. A lot of production code has way below 100% test coverage, so this can save many manual testing iterations and help avoid breaking stuff in production due to stupid mistakes (that humans are bad at spotting). - Refactoring becomes way less scary, especially if you don't have close to 100% test coverage. A type checker can find many mistakes that are commonly introduced when refactoring code. You'll get the biggest benefits if you are working on a large code base mostly written by other people with limited test coverage and little comments or documentation. You get extra credit if your tests are slow to run and flaky, as this slows down your iteration speed, whereas type checking can be quick (with the right tools, which might not exist as of now ;-). If you have a small (say, less than 10k lines) code base you've mostly written yourself and have meticuously documented everything and have 95% test coverage and your full test suite runs in 10 seconds, you'll probably get less out of it. Context matters.
Such a tool can't infer 40% of the types. This probably includes most of the tricky parts of the program that I'd actually like to statically check. A type checker that uses annotations might understand 95% of the types, i.e. it would miss 5% of the types. This seems like a reasonable figure for code that has been written with some thought about type checkability. I consider that difference pretty significant. I wouldn't want to increase the fraction of unchecked parts of my annotated code by a factor of 8, and I want to have control over which parts can be type checked. Jukka

Jukka, thank you very much for working on such a hard topic and being patient enough to respond to issues that I am sure were exhaustively discussed before (but I was not following the discussions then since I was in the final sprint for my book, Fluent Python, at the time). I have two questions which were probably already asked before, so feel free to point me to relevant past messages:
1) Why is a whole new hierarchy of types being created in the typing module, instead of continuing the hierarchy in the collections module while enhancing the ABCs already there? For example, why weren't the List and Dict types created under the existing MutableSequence and MutableMapping types in collections.abc?
2) Similarly, I note that PEP-484 shuns existing ABCs like those in the numbers module, and the ByteString ABC. The reasons given are pragmatic, so that users don't need to import the numbers module and would not "have to write typing.ByteString everywhere", as the PEP says... I do not understand these arguments because: a) as you just wrote in another message, the users will primarily be the authors of libraries and frameworks, who will always be forced to import typing anyhow, so it does not seem such a burden to have them import other modules to get the benefits of type hinting; b) alternatively, there could be aliases of the relevant ABCs in the typing module for convenience.
So the second question is: what's wrong with points (a) and (b), and why did PEP-484 keep such a distance from existing ABCs in general? I understand pragmatic choices, but as a teacher and writer I know such choices are often obstacles to learning because they seem arbitrary to anyone who is not privy to the reasons behind them. So I'd like to better understand the reasoning, and I think PEP-484 is not very persuasive when it comes to the issues I mentioned. Thanks! Best, Luciano -- Luciano Ramalho | Author of Fluent Python (O'Reilly, 2015) | http://shop.oreilly.com/product/0636920032519.do | Professor em: http://python.pro.br | Twitter: @ramalhoorg

On Thu, Sep 10, 2015 at 3:01 AM, Luciano Ramalho <luciano@ramalho.org> wrote:
There are two main reasons. First, we wanted typing to be backward compatible down to Python 3.2, and so all the new features had to work without any changes to other standard library modules. Second, the module is provisional and it would be awkward to have non-provisional standard library modules depend on or closely interact with a provisional module. Also, List and Dict are actually type aliases for regular classes (list and dict, respectively) and so they actually represent subclasses of MutableSequence and MutableMapping as defined in collections.abc. They aren't proper classes so they don't directly play a role at runtime outside annotations.
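A minimal sketch of the alias point (typing.List is used purely as an annotation name here; the runtime class is still the plain builtin list):

from typing import List

def tail(items: List[int]) -> List[int]:
    return items[1:]

# At runtime the annotation changes nothing: the value is an ordinary list,
# and list still counts as a collections.abc.MutableSequence via registration.
assert isinstance(tail([1, 2, 3]), list)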
I meant that protocols will likely often be *defined* in libraries or frameworks (or their stubs). Almost any code can *use* protocols in annotations, but user code might be less likely to define additional protocols. That's just a guess and I could easily be proven wrong, though.
b) alternatively, there could be aliases of the relevant ABCs in the typing module for convenience
There are other reasons for not using ABCs for things like numbers. For example, a lot of standard library functions expect concrete numeric types and won't accept arbitrary subclasses of the ABCs: you couldn't pass a value declared as numbers.Integral to math.sin, because it expects an int or a float. Using ABCs instead of int, float or str wouldn't really work well (or at all) for type checking.
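A sketch of the kind of mismatch described here, assuming (as above) that the stub for math.sin declares a float parameter:

import math
from numbers import Integral

def half_sine(n: Integral) -> float:
    # A checker working from a stub like "def sin(x: float) -> float" would
    # reject this call, because Integral is neither int nor float -- even
    # though the call may succeed at runtime for values that define __float__.
    return math.sin(n) / 2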
So the second question is: what's wrong with points (a) and (b), and why did PEP-484 keep such a distance from existing ABCs in general?
See above. There are more reasons but those that I mentioned are some of the more important ones. If you are still unconvinced, ask for more details and maybe I'll dig through the archives. :-)
Yeah, PEP 484 doesn't go through the rationale and subtleties in much detail. Maybe there should be a separate rationale PEP and we could just link to it when we get asked some of these (quite reasonable, mind you!) questions again. ;-) Jukka

On 10.09.2015 06:12, Jukka Lehtosalo wrote:
This has been discussed almost to the death before,
I am sorry. :)
If I have code without docstrings, I had better write docstrings then. ;) I mean, when I am really going to touch that file to improve documentation (of which annotations are a part), I am going to add more information for the reader of my API, and that will mostly be describing the behavior of the API. If my variables have crappy names, so that I need type hints to explain them, well, then I would rather fix the names first.
If I had a large untested and undocumented code base (well, I actually have one), then static type checking would be ONE tool to find issues. Once found, I write tests like hell. Tests, tests, tests. I would not add type annotations. I need tested functionality, not proper typing.
You get extra credit if your tests are slow to run and flaky,
We are problem solvers. So, I would tell my team: "make them faster and more reliable".
Granted. But you still don't know if your code runs correctly. You are better off with tests. And I agree type checking is 1 test to perform (out of 10K). But:
I didn't see you respond to that. But you probably know that. :) Thanks for responding anyway. It is helpful to see your intentions, though I don't agree with them 100%. Moreover, I think it is about time to talk about this. If it were not you, somebody else would finally have added type hints to Python. Keep up the good work. +1 Best, Sven

On Sep 10, 2015, at 09:42, Sven R. Kunze <srkunze@mail.de> wrote:
I mean, when I am really going to touch that file to improve documentation (of which annotations are a part), I am going to add more information for the reader of my API, and that will mostly be describing the behavior of the API.
As a bit of useless anecdotal evidence: after starting to play with MyPy when Guido first announced the idea, I haven't actually started using static type checking seriously, but I have started writing annotations for some of my functions. It feels like a concise and natural way to say "this function wants two integers", and it reads as well as it writes. Of course there's no reason I couldn't have been doing this since 3.0, but I wasn't, and now I am. Try playing around with it and see if you get the same feeling. Since everyone is thinking about the random module right now, it makes a great example of what I'm talking about: specify which functions take/return int vs. float, which need a real int vs. anything Integral, etc., and notice how much more easily you absorb the information than if it's buried in the middle of a sentence in the docstring. Anyway, I don't annotate every function (or every function except the ones so simple that any checker or reader that couldn't infer the types is useless, the way I would in Haskell), just the ones where the types seem like an important part of the semantics. So I haven't missed the more complex features the way I expected to. But I've still got no problem with them being added as we go along, of course. :)
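For instance (made-up signatures in the spirit of the above, not the actual random module API):

from numbers import Integral

def roll(sides: int = 6) -> int:
    """Return a pseudo-random integer between 1 and sides."""
    ...

def jitter(value: float, spread: float) -> float:
    """Return value perturbed by at most +/- spread."""
    ...

def low_bits(n: Integral, k: int) -> int:
    """Return the k low bits of n; works for any Integral, not just int."""
    ...

The signatures alone answer the int-vs-float questions before you read a word of prose.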

On 11.09.2015 00:22, Andrew Barnert wrote:
Thanks for the anecdote. It's good to hear you don't do it for every function, and I am glad it helps you a lot. :) Do you know what makes me sad? If you do that for this function but don't do it for another, what is the guideline then? The Zen of Python tells us to have one obvious way to do something. At least for me, it's not obvious anymore when to annotate and when not to annotate. Just a random guess depending on the moon phase? :( Sometimes this, sometimes that. That can't be right for something as basic as types. Couldn't these problems be solved by further research on type checkers? Btw, I can tell the same anecdote about switching from C/C++/C#/Java to Python. It was like a liberation---no explicit type declarations anymore. I was baffled and frightened the first week using it. But I love it now and I don't want to give that freedom up. Maybe that's why I am reluctant to use it in production. But as said, I like the theoretical discussion around it. :) Best, Sven

Sven R. Kunze writes:
No. There's a simple rule: if it's obvious to you that type annotation is useful, do it. If it's not obvious that you want it, don't do it. You obviously are unlikely to do it for some time, if ever. Me too. But some shops want to use automated tools to analyze these things, and I don't see why there's a problem in providing a feature that makes it easier for them to do that.
So don't, nothing else in the language depends on type annotation or on running a type checker for that matter. What's your point? That you'll have to read them in the stdlib? Nope; the stdlib will use stubfiles where it uses type annotations at all for the foreseeable future. That your employer might make you use them? That's the nature of employment. And if you can't convince your boss that annotations have no useful role in a program written in good style, why would you expect to convince us?

On Wed, Sep 16, 2015 at 10:57:29PM +0200, Sven R. Kunze wrote:
This is no different from deciding when to document and when to write tests. In a perfect world, every function is fully documented and fully tested. But in reality we have only a limited amount of time to spend writing code, and only a portion of that is spent writing documentation and tests, so we have to prioritise. Some functions are less than fully documented and less than fully tested. How do you decide which ones get your attention? People will use the same sort of heuristic for deciding which functions get annotated:
- does the function need annotations/documentation/tests?
- do I have time to write annotations/documentation/tests?
- is my manager telling me to add annotations/documentation/tests?
- if I don't, will bad things happen?
- is it easy or interesting to add them, or difficult and boring?
Don't expect to hold annotations up to a higher standard than we already hold other aspects of programming. -- Steve

On 17.09.2015 05:59, Steven D'Aprano wrote:
I fear I am not convinced by that analogy. Tests and documentation are all or nothing. Either you have them or you don't, and one is not worthier than the other. Type annotations (as far as I understand them) are basically completing a picture of 40%-of-already-inferred types. So, I have difficulty inferring which parameters would actually benefit from annotating. I am either doing redundant work (because the typechecker is already well aware of the type) or I actually insert explicit knowledge (which might become redundant once typecheckers get better).

On Thu, Sep 17, 2015 at 11:24:53PM +0200, Sven R. Kunze wrote:
I don't think they are all or nothing. I think it is possible to have incomplete documentation and partial test coverage -- it isn't like you go from "no documentation at all and zero tests" to "fully documented and 100% test coverage" in a single step. Unless you are religiously following something like Test Driven Development, where code is always written to follow a failed test, there will be times where you have to decide between writing new code or improving test coverage. Other choices may include:
- improve documentation;
- fix bugs;
- run a linter and fix the warnings it generates.
Adding "fix type errors found by the type checker" doesn't fundamentally change the nature of the work. You are still deciding what your priorities are, according to the needs of the project, your own personal preferences, and the instructions of your project manager (if you have one).
Type annotations (as far as I understand them) are basically completing a picture of 40%-of-already-inferred types.
That's one use-case for them. Another use-case is as documentation:

def agm(x: float, y: float) -> float:
    """Return the arithmetic-geometric mean of x and y."""

versus

def agm(x, y):
    """Return the arithmetic-geometric mean of x and y.

    Args:
        x (float): A number.
        y (float): A number.

    Returns:
        float: The agm of the two numbers.
    """
So, I have difficulty inferring which parameters would actually benefit from annotating.
The simplest process may be something like this:
- run the type-checker in a mode where it warns about variables with unknown types;
- add just enough annotations so that the warnings go away.
This is, in part, a matter of the quality of your tools. A good type checker should be able to tell you where it can, or can't, infer a type.
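A minimal before/after sketch of that workflow (the function and the names in it are made up for illustration):

from typing import Any, Dict

# Before: the checker cannot infer a type for 'payload' and warns about it.
def handle(payload):
    return payload["user"]["id"]

# After: one parameter annotation is enough; the rest follows from it.
def handle(payload: Dict[str, Any]) -> Any:
    return payload["user"]["id"]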
You make it sound like, alone out of everything else in Python programming, once a type annotation is added to a function it is carved in stone forever, never to be removed or changed :-) If you add redundant type annotations, no harm is done. For example:

def spam(n=3):
    return "spam" * n

A decent type-checker should be able to infer that n is an int. What if you add a type annotation?

def spam(n: int = 3):
    return "spam" * n

Is that really such a big problem that you need to worry about this? I don't think so. The choice whether to rigorously stamp out all redundant type annotations, or leave them in, is a decision for your project. There is no universal right or wrong answer. -- Steve

On 18.09.2015 05:00, Steven D'Aprano wrote:
This was a misunderstanding. The "all or nothing" wasn't about "test everything or don't do it at all". It was about the robustness of future benefits you gain from it. Either you have a test or you don't. With type annotations you have 40% or 60% *depending* on the quality of the tool you use. It's fuzzy. I don't like to build stuff on jello. Just my personal feeling here.
The type annotation explains nothing. The short docstring "arithmetic-geometric mean" explains everything (or prepares you to google it). So, I would prefer this one:

def agm(x, y):
    """Return the arithmetic-geometric mean of x and y."""
You see? Depending on who runs which tools, type annotations need to be added which are redundant for one tool and not for another and vice versa. (Yes, we allow that because we grant the liberty to our devs to use the tools they perform best with.) Coverage, on the other hand, is strict. Either you traverse that line of code or you don't (assuming no bugs in the coverage tools).
Let me reformulate my point: it's not about setting things in stone. It's about having more to read and process mentally. You might think, 'nah, he's exaggerating; it's just one tiny little ": int" more here and there', but these things build up slowly over time, due to missing clear guidelines (see the fuzziness I described above). Devs will simply add them everywhere just to make sure, OR ignore the whole concept completely. It's simply not good enough. :( Nevertheless, I like the protocol idea more, as it introduces actual names to be exposed by IDEs without any work from the devs. That's great! You might further think, 'you're so lazy, Sven. First you don't want to help the type checker, but you still want to use it?' Yes, I am lazy! And I already benefit from it when using PyCharm. It might not be perfect, but it still amazes me again and again what it can infer without any type annotations present.
There's nothing seriously wrong with it (except what I described above). However, these examples (this one in particular) are not, and should not be, real-world code. The function name is not helpful, the parameter name is not helpful, the functionality is a toy. My observations so far:
1) Type checking illustrates its point well when using academic examples, such as the tuples-of-tuples-of-tuples-of-ints I described somewhere else on this thread, or unreasonably short toy examples. (This might be domain specific; I can witness it for business applications and web applications, admittedly none of which actually need to solve hard problems.)
2) Just using constant and sane types like a class, lists of single-class instances and dicts of single-class instances for a single variable lets you assign a proper name to it and forces you to design a reasonable architecture for your functionality, by keeping the level of nesting at 0 or 1 and splitting pieces out into separate code blocks.
Best, Sven

On Sep 18, 2015, at 10:35, Sven R. Kunze <srkunze@mail.de> wrote:
Surely gaining 40% or gaining 60% is better than gaining 0%? At any rate, if you're really concerned with this, there is research you might be interested in. The first static typer that I'm aware of that used a "fallback to any" rule like MyPy was for an ML language, and it used unsafety marking: any time it falls back to any, it marks the code unsafe, and that propagates in the obvious way. At the end of the typer run, it can tell you which parts of your program are type safe and which aren't. (It can also refactor the type-safe parts into separate modules, which are then reusable in other programs, with well-defined type-safe APIs.) This sounds really nifty, and is fun to play with, but I don't think people found it useful in practice. (This is not the same as the explicit Unsafe type found in most SML descendants, where it's used explicitly to mark FFIs and access to internal structures, which definitely is useful--although of course it's not completely unrelated.) I think someone could pretty easily write something similar around PEP 484, and then display the results in a way similar to a code coverage map. If people found it useful, that would become a quality-of-implementation issue for static typers, IDEs, etc. to compete on, and might be worth adding as a required feature to some future update to the standard; if not, it would just be a checklist item on some typer's feature list that would eventually stop being worth maintaining. Would that solve your "40% problem" to your satisfaction?
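As a rough, purely illustrative sketch of that "coverage map" idea (this only checks whether signatures are annotated at all, which is much weaker than tracking where a checker falls back to Any, and it is not an existing tool):

import inspect

def annotation_coverage(module):
    """Report which module-level functions have fully annotated signatures."""
    report = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        sig = inspect.signature(func)
        params_done = all(
            p.annotation is not inspect.Parameter.empty
            for p in sig.parameters.values()
        )
        returns_done = sig.return_annotation is not inspect.Signature.empty
        report[name] = params_done and returns_done
    return report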
I know that Steven wasn't expecting any of those, and that it will probably do the wrong thing (including silently doing something bad, like throwing away Decimal precision or improperly extending to the complex plane). With yours, I don't know that. I may not even notice that there's a problem and just call it and get a bug months later. Even if I do notice the question, I have to read through your implementation and/or your test suite to find out if you'd considered the case, or write my own tests to find out empirically. And that's exactly what I meant earlier by annotations sometimes being useful for human readers whether or not they're useful to the checker.
The only way to avoid that is to define the type system completely and then define the inference engine as part of the language spec. The static type system is inherently an approximation of the much more powerful partly-implicit dynamic type system; not allowing it to act as an approximation would mean severely weakening Python's dynamic type system, which would mean severely weakening what you can write in Python. That's a terrible idea. Something like PEP 484 and an ecosystem of competing checkers is the only possibly useful thing that could be added to Python. If you disagree, nothing that could be feasibly added to Python will ever be useful to you, so you should resign yourself to never using static type checking (which you're allowed to do, of course).
What you're essentially arguing is that if nobody ever used dynamic types (e.g., types with __getattr__, types constructed at runtime by PyObjC or similar bridges, etc.), or dynamically-typed values (like the result of json.loads), or static types that are hard to express manually (like ADTs or dependent types), we could easily build a static type checker that worked near-perfectly, and then we could define exactly where you do and don't need to annotate types. That's true, but it effectively means restricting yourself to the Java type system. Which sucks. There are many things that are easy to write readably in Python (or in Haskell) that require ugliness in Java simply because its type system is too weak. Restricting Python (or even idiomatic Python) to the things that could be Java-typed would seriously weaken the language, to the point where I'd rather go find a language that got duck typing right than stick with it. You could argue that Swift actually does a pretty good job of making 90% of your code just work and making it as non-ugly as possible to force the rest of the 10% through escapes in the type system (at least for many kinds of programs). But this actually required a more complicated type system than the one you're suggesting--and, more importantly, it involved explicitly designing the language and the stdlib around that goal. Even the first few public betas didn't work for real programs without a lot of ugliness, requiring drastic changes to the language and stdlib to make it usable. Imagine how much would have to change about a language that was designed for duck typing and grew organically over two and a half decades. Also, there are many corners of Swift that have inconsistently ad-hoc rules that make it much harder to fit the entire language into your brain than Python, despite the language being about the same size. A language that you developed out of performing a similar process on Python might be a good language, maybe even better than Swift, but it would not be Python, and would not be useful for the same kinds of projects where a language-agnostic programmer would choose Python over other alternatives.

On Sep 16, 2015, at 13:57, Sven R. Kunze <srkunze@mail.de> wrote:
Sometimes this, sometimes that. That can't be right for something as basic as types.
Types aren't as basic as you think, and assuming they are leads you to design languages like Java, that restrict you to working within unnecessary constraints. For an obvious example, what's the return type of json.loads (or, worse, eval)? Haskell, Dependent ML, and other languages have made great strides in working out how to get most of the power of a language like Python (and some things Python can't do, too) in a type-driven paradigm, but there's still plenty of research to go. And, even if that were a solved problem, nobody wants to rewrite Python as an ML dialect, and nobody would use it if you did. Python solves the json.loads problem by saying its runtime type is defined lazily and implicitly by the data. And there's no way any static type checker can possibly infer that type. A good statically-typed language can make it a lot easier to handle than a bad one like Java, but it will be very different from Python.
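A concrete illustration of the json.loads point (standard library only):

import json

# The best a static checker can say about the result is something like Any:
# the shape of the value is determined by the input data at runtime.
config = json.loads('{"retries": 3, "hosts": ["a", "b"]}')
print(config["retries"] + 1)          # a dict this time
print(json.loads("[1, 2, 3]")[0])     # the same function returns a list here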
Couldn't these problems be solved by further research on type checkers?
I'm not sure which problems you want solved. If you want every type to be inferable, for a language with a sufficiently powerful type system, that's provably equivalent to the halting problem, so it's not going to happen. More importantly, we already have languages with a powerful static type system and a great inference engine, and experience with those languages shows that it's often useful to annotate some types for readability that the inference engine could have figured out. If a particular function is more understandable to the reader when it declares its parameter types, I can't imagine what research anyone would do that would cause me to stop wanting to declare those types. Also, even when you want to rely on inference, you still want the types to have meaningful names that you can read, and could have figured out how to construct on your own, for things like error messages, debuggers, and reflective code. So, the work that Jukka is proposing would still be worth doing even if we had perfect inference.
Btw. I can tell the same anecdote when switching from C/C++/C#/Java to Python. It was like a liberation---no explicit type declarations anymore. I was baffled and frightened the first week using it. But I love it now and I don't want to give that freedom up. Maybe, that's why I am reluctant to use it in production.
The problem here is that you're coming from C++/C#/Java, which are terrible examples of static typing. Disliking static typing because of Java is like disliking dynamic typing because of Tcl. I won't get into details of why they're so bad, but: if you don't have the time to learn you a Haskell for great good, you can probably at least pick up Boo in an hour or so, to see what static typing is like with inference by default and annotations only when needed in a very pythonesque language, and that will give you half the answer.

On 17.09.2015 07:56, Andrew Barnert wrote:
I'm not sure which problems you want solved.
If you want every type to be inferable, for a language with a sufficiently powerful type system, that's provably equivalent to the halting problem, so it's not going to happen.
Nobody said it must be perfect. It just needs to be good enough.
More importantly, we already have languages with a powerful static type system and a great inference engine, and experience with those languages shows that it's often useful to annotate some types for readability that the inference engine could have figured out. If a particular function is more understandable to the reader when it declares its parameter types, I can't imagine what research anyone would do that would cause me to stop wanting to declare those types.
Because it's more code, it's redundant, it needs to be maintained, and so on and so forth.
Also, even when you want to rely on inference, you still want the types to have meaningful names that you can read, and could have figured out how to construct on your own, for things like error messages, debuggers, and reflective code. So, the work that Jukka is proposing would still be worth doing even if we had perfect inference.
I totally agree (and I said this before). Speaking of meaningful names, which name(s) are debuggers supposed to show when there is a multitude of protocols that would fit?
Btw. I can tell the same anecdote when switching from C/C++/C#/Java to Python. It was like a liberation---no explicit type declarations anymore. I was baffled and frightened the first week using it. But I love it now and I don't want to give that freedom up. Maybe, that's why I am reluctant to use it in production. The problem here is that you're coming from C++/C#/Java, which are terrible examples of static typing. Disliking static typing because of Java is like disliking dynamic typing because of Tcl. I won't get into details of why they're so bad, but: if you don't have the time to learn you a Haskell for great good, you can probably at least pick up Boo in an hour or so, to at least see what static typing is like with inference by default and annotations only when needed in a very pythonesque languages, and that will give you half the answer.
I came across Haskell quite some time ago and I have to admit it feels not natural but for other reasons than its typing system and inference.

On Thu, Sep 10, 2015 at 9:42 AM, Sven R. Kunze <srkunze@mail.de> wrote:
Even good variable names can leave the type ambiguous. And besides, if you assume that all code is perfect or can be made perfect, I think you've already lost the discussion. Reality disagrees with you. ;-) You can't just wave a magic wand and get every programmer to document their code and write unit tests. However, we know quite well that programmers are perfectly capable of writing type annotations, and tools can even enforce that they are present (witness all the Java code in existence). Tools can't verify that you have good variable names or useful docstrings, and people are too inconsistent or lazy to be relied on.
Sure, it doesn't solve everything.
Once found, I write tests like hell. Tests, tests, tests. I would not add type annotations. I need tested functionality, not proper typing.
Most programmers only have limited time for improving existing code. Adding type annotations is usually easier than writing tests. In a cost/benefit analysis it may be optimal to spend half the available time on annotating parts of the code base to get some (but necessarily limited) static checking coverage, and spend the remaining half on writing tests for selected parts of the code base, for example. It's not all or nothing.
But you'd probably also ask them to implement new features (or *your* manager might be unhappy), and they have to find the right balance, as they only have 40 hours a week (or maybe 80 hours if you work at an early-stage startup :-). Having more tools gives you more options for spending your time efficiently.
Actually a type checker can verify multiple properties of a typical line of code. So for 10k lines of code, complete type checking coverage would give you the equivalent of maybe 30,000 (simple) tests. :-P And I'm sure it would take much less time to annotate your code than to manually write the 30,000 test cases.
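A small made-up example of that "several properties per line" point:

class Order:
    def __init__(self, unit_price: float, quantity: int) -> None:
        self.unit_price = unit_price
        self.quantity = quantity

def total_price(order: Order) -> float:
    # For this one line a checker verifies that Order has 'unit_price' and
    # 'quantity' attributes, that multiplying a float by an int is valid,
    # and that the result matches the declared return type.
    return order.unit_price * order.quantity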
This is a variation of an old argument, which goes along the lines of "if you have tests and comments (and everybody should, of course!) type checking doesn't buy you anything". But if the premise can't be met, the argument doesn't actually say anything about the usefulness of type checking. :-) It's often not cost effective to have good test coverage (and even 100% line coverage doesn't give you full coverage of all interactions). Testing can't prove that your code doesn't have defects -- it just proves that for a tiny subset of possible inputs your code works as expected. A type checker may be able to prove that for *all* possible inputs your code doesn't do certain bad things, but it can't prove that it does the good things. Neither subsumes the other, and both of these approaches are useful and complementary (but incomplete). I think that there was a good talk basically about this at PyCon this year, by the way, but I can't remember the title. Jukka

On 11.09.2015 08:24, Jukka Lehtosalo wrote:
Try harder then.
Not sure where I said this.
You can't just wave a magic wand and get every programmer to add type annotations to their code. However, we know quite well that programmers are perfectly capable of writing unit tests, and tools can even enforce that they are present (witness coverage tools and hooks in SCM systems preventing coverage from dropping). [ Interesting that it was that easy to exchange the parts you've given me ;) ] Btw, have you heard of code review?
Tools can't verify that you have good variable names or useful docstrings, and people are too inconsistent or lazy to be relied on.
Same can be said for type annotations.
I would like to peer-review that cost/benefit analysis you've made to see whether your numbers are sane.
Yes, I am going to tell him: "Hey, it doesn't work but we got all/most of the types right."
I think you should be more specific on this. Using hypothesis, e.g., you can easily increase the number of simple tests as well. What I can tell is that most of the time a variable carries the same type. It is really convenient that it doesn't have to, but most of the time it does. Thus, one test run can probably reveal a dangerous type mistake. I've seen code where that is indeed not the case, and one variable is either re-used or accidentally has different types. But, well, you better stay away from it anyway because most of the time it's very old code. Moreover, in order to add *reasonable* type annotations you would probably invest an amount of time equal to what you would invest in writing some tests for it. The majority of the time is about *understanding* the code. And there, better variable names help a lot.
I fully agree on this. Yet I don't need type annotations. ;) A simple test running a typechecker working at 40%-60% efficiency (depending on whom you ask) suffices, at least for me. I would love to see better typecheckers rather than cluttering our code with some questionable annotations, which, by the way, I am not even sure are necessary at all. Don't be fooled by the possibility of dynamic typing in Python. Just because it's possible doesn't necessarily mean it's the usual thing.
I think that there was a good talk basically about this at PyCon this year, by the way, but I can't remember the title.
It'll be great to have it. :) Best, Sven

On September 16, 2015 3:42:20 PM CDT, "Sven R. Kunze" <srkunze@mail.de> wrote:
def process_integer_coordinate_tuples(integer_tuple_1, integer_tuple_2, is_fast): ...

vs

def process_coords(t1: Tuple[int, int], t2: Tuple[int, int], fast: bool): ...

Java's fatal mistake.
-- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity.

Embedding type names in arguments and method names. On Thu, Sep 17, 2015 at 4:45 PM, Sven R. Kunze <srkunze@mail.de> wrote:
You said:
Even good variable names can leave the type ambiguous.
These are names that don't leave anything ambiguous! :D Really, though: relying on naming to make types explicit fails badly whenever you start refactoring and makes hell for the users of the API you made. -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/

On 17.09.2015 23:56, Ryan Gonzalez wrote:
I was actually confused by 'Java' in your reply.
They just do. Because they don't tell me why I would want to call that function and with what. If any of these versions is supposed to represent good style, you still need to learn a lot.
Professional refactoring would not change venerable APIs. It would provide another version of them and slowly deprecate the old one. Not sure where you're heading here, but are you saying t1 and t2 are good names? Not sure how big the applications you work with are, but those I know of are very large. So, I am glad when, 2000 lines and 10 files later, a variable name still tells me something about what it holds. And no, "*Tuple[int, int]*" doesn't tell me anything (even when an IDE could show it). Most of the time when discussing typecheckers and so forth, I get the feeling people think most applications are using data structures like *tuples of tuples of tuples of ints*. That is definitely not the case (anymore). Most of the time the data types are instances, lists of instances and dicts of instances. That's one reason I somehow like Jukka's structural proposal: I actually can see some real-world benefit which goes beyond the tuples of tuples, and that is *inferring proper names*. Best, Sven

On Thu, Sep 17, 2015 at 04:56:33PM -0500, Ryan Gonzalez wrote:
Embedding type names in arguments and method names.
supposedly being "Java's fatal mistake". I'm not sure that Java developers commonly make a practice of doing that. It would be strange, since Java requires type declarations. I'm not really a Java guy, but I think this would be more like what you would expect:

public class Example {
    public void processCoords(Point t1, Point t2, boolean fast) {
        ...
    }
}

where Point is equivalent to an (int, int) tuple. You seem to be describing a verbose version of "Apps Hungarian Notation". I don't think Hungarian Notation was ever standard practice in the Java world, although I did find at least one tutorial (from 1999) recommending it: http://www.developer.com/java/ent/article.php/615891/Applying-Hungarian-Nota... In any case, I *think* that your intended lesson is that type annotations can increase the quality of code even without a type checker, as they act as type documentation for the reader. I agree with that. -- Steve

On Sep 9, 2015, at 13:17, Guido van Rossum <guido@python.org> wrote:
Jukka wrote up a proposal for structural subtyping. It's pretty good. Please discuss.
https://github.com/ambv/typehinting/issues/11#issuecomment-138133867
Are we going to continue to have (both implicit and explicit) ABCs in collections.abc, numbers, etc., and also have protocols in typing that are also ABCs and are largely parallel to them (and implicit at static checking time whether they're implicit or explicit at runtime)? If so, I think we've reached the point where the two parallel hierarchies are a problem. Also, why are both the terminology and implementation so different from what we already have for ABCs? Why not just have a decorator or metaclass that can be added to ABCs that makes them implicit (rather than writing a manual __subclasshook__ for each one), which also makes them implicit at static type checking time, which means there's no need for a whole separate but similar notion? I'm not sure why it's important to also have some types that are implicit at static type checking time but not at runtime, but if there is a good reason, that just means two different decorators/metaclasses/whatever (or a flag passed to the decorator, etc.). Compare: Hashable is an implicit ABC, Sequence is an explicit ABC, Reversible is an implicit-static/explicit-runtime ABC -- versus: Hashable is an implicit ABC and also a Protocol that's an explicit ABC, Sequence is an explicit ABC and not a Protocol, Reversible is a Protocol that's an explicit ABC. The first one is clearly simpler; is there some compelling reason that makes the second one better anyway?

On Wed, Sep 9, 2015 at 3:08 PM, Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
I'm not proposing creating protocols for numbers or most collection types. I'd change some of the existing ABCs (mentioned in the proposal, including things like Sized) in typing into equivalent protocols, but they'd still support isinstance as before and would be functionally almost identical to the existing ABCs. I clarified the latter fact in the github issue.
Protocol would use a metaclass that is derived from the ABC metaclass, and it would be similar to the Generic class that we already have. The reason why the proposal doesn't use an explicit metaclass or a class decorator is consistency. It's possible to define generic protocols by having Protocol[t, ...] as a base class, which is consistent with how Generic[...] works. The latter is already part of typing, and introducing a similar concept with a different syntax seems inelegant to me. Consider a generic class:

class Bucket(Generic[T]):
    ...

Now we can have a generic protocol using a very similar syntax:

class BucketProtocol(Protocol[T]):
    ...

I wonder how we'd use a metaclass or a class decorator to represent generic protocols. Maybe something like this:

@protocol[T]
class BucketProtocol:
    ...

However, this looks quite different from the Generic[...] case and thus I'd rather not use it. I guess if we'd have picked this syntax for generic classes it would make more sense:

@generic[T]
class Bucket:
    ...
I'm not sure if I fully understand what you mean by implicit vs. explicit ABCs (and the static/runtime distinction). Could you define these terms and maybe give some examples of each? Note that in my proposal a protocol is just a kind of ABC, as GenericMeta is a subclass of ABCMeta and protocol would have a similar metaclass (or maybe even the same one), even though I'm not sure if I explicitly mentioned that. Every protocol is also an ABC. Jukka

On 2015-09-09 13:17, Guido van Rossum wrote:
I'm not totally hip to all the latest typing developments, but I'm not sure I fully understand the benefit of this protocol concept. At the beginning it says that classes have to be explicitly marked to support these protocols. But why is that? Doesn't the existing __subclasshook__ already allow an ABC to use any criteria it likes to determine if a given class is considered a subclass? So couldn't ABCs like the ones we already have inspect the type annotations and decide a class "counts" as an iterable (or whatever) if it defines the right methods with the right type hints? -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On 10.09.2015 03:50, Brendan Barnwell wrote:
You bet; neither am I.
The benefit, from what I understand, is actually really, really nice. It's basically adding the ability to shorten the following 'capability' check:

if hasattr(obj, 'important') and hasattr(obj, 'relevant') and hasattr(obj, 'necessary'):
    # do

to

if implements(obj, protocol):
    # do

As usual with type hints, functionality is not guaranteed. But it simplifies sanity checks OR decision making:

if implements(obj, protocol1):
    # do this
elif implements(obj, (protocol2, protocol3)):
    # do that

The ability to extract all protocols of a type would provide a more flexible way of decision making and processing, such as:

if my_protocol in obj.__protocols__:
    # iterate over the protocols and do something

@Jukka I haven't found the abilities described above. Would it make sense to add them (unless they're already there)? Best, Sven

On 2015-09-10 10:01, Sven R. Kunze wrote:
Right, but can't you already do that with ABCs, as in the example in the docs (https://docs.python.org/2/library/abc.html)? You can write an ABC whose __subclasshook__ does whatever hasattr checks you want (and, if you want, checks the type annotations too), and then you can use isinstance/issubclass to check if a given instance/class "provides the protocol" described by that ABC. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
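For reference, the __subclasshook__ pattern referred to here looks roughly like this -- a sketch adapted from the abc documentation, with a made-up protocol name:

from abc import ABCMeta

class SupportsClose(metaclass=ABCMeta):
    """Anything with a close() method 'counts' as a SupportsClose."""

    @classmethod
    def __subclasshook__(cls, C):
        if cls is SupportsClose:
            if any("close" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented

class Resource:
    def close(self):
        pass

assert issubclass(Resource, SupportsClose)   # True, with no registration
assert isinstance(Resource(), SupportsClose)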

On 10.09.2015 20:24, Brendan Barnwell wrote:
You may well be right. Maybe it's that this kind of "does whatever hasattr checks you want" gets standardized via the protocol base class. Pondering about this idea further, current Python actually gives enough means to do that at runtime. If I rely on method A being present on object b, Python will simply give me an AttributeError, and that'll suffice. So, it's only for the static typechecker again. Best, Sven

On 9 September 2015 at 21:17, Guido van Rossum <guido@python.org> wrote:
Jukka wrote up a proposal for structural subtyping. It's pretty good. Please discuss.
Some good feedback has been provided in this thread already, but I want to provide an enthusiastic +1 for this change. I'm one of the people who has been extremely lukewarm towards the Python type hints proposal, but I believe this addresses one of my major areas of concern. Overall the proposal seems like a graceful solution to many of the duck typing problems. It does not address all of them, particularly around classes that may dynamically (but deterministically) modify themselves to satisfy the constraints of the Protocol (e.g. by generating methods for themselves at instantiation-time), but that's a pretty hairy use-case and there's not much that a static type checker could do about it anyway. Altogether this looks great (modulo a couple of small concerns raised by others), and it's enough for me to consider using static type hints on basically all my projects with the ongoing exception of Requests (which has duck typing problems that this cannot solve, I think). Great work Jukka!

On 09.09.2015 22:17, Guido van Rossum wrote:
15) How would Protocol be implemented? "Implement metaclass functionality to detect whether a class is a protocol or not. Maybe add a class attribute such as __protocol__ = True if that's the case" If you consider the __protocols__ attribute I mentioned in an earlier post, I would like to see __protocol__ renamed to __is_protocol__. I think that would make it more readable in the long run. Best, Sven

On Wednesday, September 9, 2015 at 1:19:12 PM UTC-7, Guido van Rossum wrote:
I like this proposal; given Python's flat nominal type hierarchy, it will be useful to have a parallel subtyping mechanism to give things finer granularity without having to resort to ABCs. Are the return types of methods invariant or covariant under this proposal? I.e., if I have

class A(Protocol):
    def f() -> int: ...

does

class B:
    def f() -> bool:
        return True

implicitly implement the protocol A? Also, marking Protocols using subclassing seems confusing and error-prone. In your examples above, one would think that you could define a new protocol using

class SizedAndClosable(Sized):
    pass

instead of

class SizedAndClosable(Sized, Protocol):
    pass

because Sized is already a protocol. Maybe the below would be a more intuitive syntax:

@protocol
class SizedAndClosable(Sized):
    pass

Furthermore, I strongly agree with #7. Typed, but optional, attributes are a bad idea.

On Sep 10, 2015, at 11:57, Matthias Kramm via Python-ideas <python-ideas@python.org> wrote:
I don't understand this, given that resorting to protocols is basically the same thing as resorting to ABCs. Clearly there's some perceived difficulty or complexity of ABCs within the Python community that makes people not realize how simple and useful they are. But I don't see how adding something that's nearly equivalent but different, and maintaining the two in parallel, is a good solution to that problem. There are some cases where the fact that ABCs rely on a metaclass makes them problematic where Protocols aren't (basically, where you need another metaclass), but I doubt that's the case you're worried about.

On Thu, Sep 10, 2015 at 11:57 AM, Matthias Kramm via Python-ideas < python-ideas@python.org> wrote:
The proposal doesn't spell out the rules for subtyping, but we should follow the ordinary rules for subtyping for functions, and return types would behave covariantly. So the answer is yes.
The proposal also lets you define the protocols implemented by your class explicitly, and without having the explicit Protocol base class or some other marker these would be impossible to distinguish in general. Example:

class MyList(Sized):
    # I want this to be a normal class, not a protocol.
    def __len__(self) -> int:
        return self.num_items

class DerivedProtocol(Sized):
    # This should actually be a protocol.
    def foo(self) -> int: ...
We could use that. The tradeoff is that then we'd have some inconsistency depending on whether a protocol is generic or not:

@protocol
class A(metaclass=ProtocolMeta):  # Non-generic protocol
    ...

@protocol
class B(Generic[T]):  # Generic protocol. But this has a different metaclass than the above?
    ...

I'm not sure if we can use ABCMeta for protocols, as protocols may need some additional metaclass functionality. Anyway, any proposal should consider all these possible ways of defining protocols:
1. Basic protocol, no protocol inheritance
2. Generic protocol, no protocol inheritance
3. Basic protocol that inherits one or more protocols
4. Generic protocol that inherits one or more protocols
My approach seems to deal with all of these reasonably well in my opinion (but I haven't implemented it yet!), but the tradeoff is that the Protocol base class needs to be present for all protocols. Jukka

On Thursday, September 10, 2015 at 11:38:48 PM UTC-7, Jukka Lehtosalo wrote:
Ok. Note that this introduces some weird corner cases when trying to decide whether a class implements a protocol. Consider

class P(Protocol):
    def f() -> P: ...

class A:
    def f() -> A: ...

It would be as valid to say that A *does not* implement P (because the return value of f is incompatible with P) as it would be to say that A *does* implement it (because once it does, the return value of f becomes compatible with P). For a more quirky example, consider

class A(Protocol):
    def f(self) -> B: ...
    def g(self) -> str: ...

class B(Protocol):
    def f(self) -> A: ...
    def g(self) -> float: ...

class C:
    def f(self) -> D:
        return self.x
    def g(self):
        return self.y

class D:
    def f(self) -> C:
        return self.x
    def g(self):
        return self.y

Short of introducing intersection types, the protocols A and B are incompatible (because the return types of g() are mutually exclusive). Hence, C and D can, respectively, conform to either A or B, but not both. So the possible assignments are:

C -> A, D -> B
or
C -> B, D -> A

It seems undecidable which of the two is the right one. (The structural type converter in pytype solves this by dropping the "mutually exclusive" constraint to the floor and making A and B both a C *and* a D, which you can do if all you want is a name for an anonymous structural type. But here you're using your structural types in type declarations, so that solution doesn't apply.) Matthias
participants (12)
- Andrew Barnert
- Brendan Barnwell
- Brett Cannon
- Cory Benfield
- Guido van Rossum
- Jukka Lehtosalo
- Luciano Ramalho
- Matthias Kramm
- Ryan Gonzalez
- Stephen J. Turnbull
- Steven D'Aprano
- Sven R. Kunze