Delay evaluation of annotations

Hi all,

Annotations of function parameters and variables are evaluated when encountered. This makes it necessary to use string representation for names that are not yet bound, which affects almost every class definition. It is also easy to forget, and the result might be a (very uninteresting) exception in certain untested paths, e.g. inside functions. Editors and IDEs also don't handle it well; for example, PyDev does not consider string annotations as an occurrence of the name, and warns about unused imports.

I propose delaying evaluation of annotation expressions by either keeping the AST of the annotation, or turning it implicitly from EXP into "lambda: EXP". Inspection code that is interested in this information can access it by calling (or evaluating) it. It certainly isn't a backward-compatible change, but I think it won't affect too much code. On the positive side, it will make annotated code much more pleasing to read, will be less surprising for beginners, and will help editors in syntax highlighting and name lookup. In short, it will shift the burden of handling type annotations from standard code to inspection code, which is where I believe it should rest.

While this idea is quite obvious, I've found no clue in PEP 3107 as to why it would be a bad one (there is no such "rejected proposal" there), so I raise the question here. Elazar
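To make the proposal concrete, here is a rough sketch of the intended semantics, emulated with today's syntax by wrapping annotations in explicit lambdas (under the proposal the compiler would do the wrapping implicitly; the class names Node and Tree are just for illustration):

    # Today: a bare reference to the enclosing class fails at definition time,
    # so PEP 484 requires quoting it as a string forward reference.
    class Node:
        def insert(self, child: 'Node') -> 'Node':
            ...

    # Proposed behaviour, emulated by hand: the annotation stays unevaluated
    # until inspection code explicitly forces it.
    class Tree:
        def insert(self, child: (lambda: Tree)) -> (lambda: Tree):
            ...

    # Inspection code forces evaluation when (and if) it needs the values:
    hints = {name: thunk() for name, thunk in Tree.insert.__annotations__.items()}
    print(hints)   # {'child': <class '__main__.Tree'>, 'return': <class '__main__.Tree'>}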

On Thu, Sep 22, 2016 at 05:19:12PM +0000, אלעזר wrote:
Right, like all other Python expressions in general, and specifically like function parameter default arguments.
This makes it necessary to use string representation for names that are not yet bound, which affects almost every class definition.
Almost every class? Really? I find that implausible. Still, I can see it affecting *many* class definitions, so let's not quibble.
Unlikely, unless you're talking about functions nested inside other functions, or unusual (but legal and sometimes useful) conditional definitions:

    if condition:
        # forward reference to MyClass
        def f(arg: 'MyClass'): ...
    else:
        # oops, untested path
        def f(arg: MyClass): ...

    class MyClass: ...

But generally speaking, that sort of code is unusual, and besides, if you're doing this, either the static type checker won't be able to cope with it at all (in which case there's little point in annotating the function), or it will cope, and detect the invalid annotation.
I would call that a bug in PyDev.
-1 on complicating the simple Python model that expressions are evaluated when they are reached. You would also complicate the introspection of annotations. With your proposal, *every* annotation would be a function, and every(?) inspection would require calling the function to find out what the real annotation is. And what would that do to modules which use annotations for some other purpose? I know Guido is keen to discourage such alternative uses, but they're still legal, and a change like this would outright break them. Personally, I'm not convinced that it is a burden to expect people to remember to quote forward references. If they forget, they will nearly always get a NameError at runtime or a warning/error when they run the type checker.
A bit more pleasant. It's not unpleasant to read 'MyClass' instead of MyClass.
will be less surprising for beginners,
Only because said beginners aren't familiar enough to be surprised by how surprising this is.
and will help editors in syntax highlighting and name lookup.
But will harm runtime introspection. -- Steve

On Thu, Sep 22, 2016 at 11:42 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I would say this affects a "rare class here and there." Almost all typing will be with things defined in the `typing` module (or built-ins). I guess once in a while we'll see e.g. `Sequence[CustomThing]`, but it will be uncommon for that typing involving `CustomThing` to be within CustomThing itself (well, unless you use much more recursion than Python encourages).
-1 on complicating the simple Python model that expressions are evaluated when they are reached.
I think there is a decent argument for a more general concept of macros, or symbols, or simpler delayed evaluation than lambda for Python in general. I see places where this would be very nice for Pandas, for example, and for Dask (I work with the developers of both of those projects). In such a hypothetical future world we might come to allow, e.g. `Sequence[#CustomThing]` where some general lazy facility or indirection is indicated by the '#' (just a placeholder for this comment, not a proposal). But if that comes about, it should be available everywhere, not only in annotations. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Thu, Sep 22, 2016 at 10:29 PM David Mertz <mertz@gnosis.cx> wrote:
I think we're talking about different things here. I just referred to the common need to use the name of the current class in type annotations:

    class A:
        def add(self, other: A) -> A: ...
I generally agree, but this future world must be very far away, and it has many consequences, whereas the story of annotations is special in that, to the reader, an annotation is not actually an expression. Elazar

On Thu, Sep 22, 2016 at 12:35 PM, אלעזר <elazarg@gmail.com> wrote:
The CPython developers (of whom I'm not one, but I've followed them closely for 18 years) place a high value on simplicity in the parser and interpreter. Adding a new custom type of thing that is an "annotation object" would be a special case with a high burden to show its utility. My feeling is that this burden is actually lower for a new "delayed eval object" that might conceivably be added at a syntax level. In some sense, this would add just as much complexity as a new annotation object, but it would be something that applies many places and hence perhaps be worth the added complexity. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Thu, Sep 22, 2016 at 10:45 PM David Mertz <mertz@gnosis.cx> wrote: then so be it; I can't claim I know better. I only speculate that it does not necessarily require a new custom type. A delayed eval object would be very useful for initializers, for the very reason that the current behavior is surprising. -- This made me think about Steven's argument above: it is not true that expressions are evaluated when they are encountered, since `x = lambda: print(1)` prints nothing. So a colon before an expression hints at delayed evaluation. This includes annotations and lambda. Elazar

On Thu, Sep 22, 2016 at 12:59 PM, אלעזר <elazarg@gmail.com> wrote:
I don't mean a runtime type here (necessarily), but rather a new type for the parser. I.e. transform this actual expression into some sort of delayed expression when parsing. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Thu, Sep 22, 2016 at 11:02 PM David Mertz <mertz@gnosis.cx> wrote:
Just as a demonstration, the parser can transform `EXP` into `lambda: EXP` - and that's it. It will not solve everything (e.g. error messages and .__annotations__ access, as Alexander says), but it demonstrates the fact that the change need not be so deep at all. Elazar

On Thu, Sep 22, 2016 at 4:29 PM, אלעזר <elazarg@gmail.com> wrote:
On second thought, why can't the parser simply replace A with 'A' in annotations that appear in the body of class A? This will only break somewhat pathological code that defines A before it is (re)defined by the class statement.

On Fri, Sep 23, 2016 at 6:48 AM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On the third thought, this entire feature can be implemented in the metaclass by injecting A = 'A' in the dict in __prepare__.
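A rough sketch of that idea (the metaclass name here is made up for illustration; the real proposal would presumably bake this into the default machinery):

    class SelfNamingMeta(type):
        @classmethod
        def __prepare__(mcls, name, bases, **kwds):
            # Pre-bind the class's own name to its string form, so annotations
            # written inside the body become PEP 484 string forward references.
            return {name: name}

    class A(metaclass=SelfNamingMeta):
        def add(self, other: A) -> A:   # "A" resolves to the string 'A' here
            return other

    print(A.add.__annotations__)   # {'other': 'A', 'return': 'A'}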
That would be the easiest, and least magical, solution. It simply means that the name of the current class is available as a pseudo-reference to itself, for typing purposes only. It parallels function recursion, which is done using the function's name:

    # Recursion in functions
    def spam():
        return spam()

    # Recursion in type annotations
    class Spam:
        def make_spam(self) -> Spam:
            return self

Clean and simple. And there's less magic here than super() - a lot less. It does mean that Spam.Spam == "Spam" forever afterwards, but I doubt that's going to break anything. It'd be just like __name__, except that currently, Spam.__name__ is set afterwards, so it's not available during class definition (and the module's name will be used instead). ChrisA

On Fri, Sep 23, 2016 at 12:18 AM Chris Angelico <rosuav@gmail.com> wrote:
I just note that it *is* surprising, for most users, that you can't be sure that this is a recursion, yet. So if you want a recursion you can rely upon, you should write:

    # spam:
    def spam():
        def spam():
            return spam()
        return spam()

Elazar

On Fri, Sep 23, 2016 at 7:33 AM, אלעזר <elazarg@gmail.com> wrote:
Only surprising for people who want it _guaranteed_. It's the exact same problem as this:

    def helper(x):
        ...

    def spaminate(x, y):
        helper(x)
        helper(y)

How do you know that a replacement helper hasn't been injected? You don't... but you trust that people aren't normally going to do that, and if they do, they're taking responsibility (maybe they're mocking helper for testing). ChrisA

On Thu, Sep 22, 2016 at 09:33:58PM +0000, אלעזר wrote:
Who are these "most users" of which you speak? Fortran programmers? C programmers? *Beginner* Python programmers? You should specify whom you are referring to, rather than claiming "most" without evidence. Experienced Python programmers should realise that recursion in Python is implemented by name lookup, like all other function calls, so if you rebind the name "spam" to something else, the function will call something else. This is no different from any other form of function call, including calls to built-ins. If you rebind or shadow a name, you will change which object is called. That shouldn't be a surprise, whether it involves recursion or not.
*shrug* But if I do that, then I make it difficult or impossible to monkey-patch spam on the fly, for instance in the interactive interpreter. I wouldn't do it in production, but for interactive exploratory work, it is astonishing how often monkey-patching comes in handy. Just yesterday I played around with some code where I monkey-patched the built-in iter() so I could get a better idea of how the code worked. The straightforward and simple way of writing a recursive spam() function surprises beginners, but they might go years or their entire career without running into a situation where they are caught by surprise. After all, it is rare for production code to rename functions, and rarer still to do it to recursive functions:

    func = spam
    spam = something_else()
    func()  # why does the recursion not work???

In production code, that sort of thing almost never happens. On the other hand, your clever trick for preventing that surprise will surprise *me* and other experienced Pythonistas who know how recursion and function calls work in Python and expect to be able to take advantage of that when and if needed. In other words, in order to protect beginners from accidents which are extremely rare, you will take away power from experienced programmers who are more likely to want to make use of that power. I don't think that's a good tradeoff.

For the avoidance of doubt: we're all adults here. If you personally want to write your recursive functions the "trusted" way, go right ahead. It will make them just a little bit less useful to experts, add an insignificant amount of safety, and require a bit more work on your part. But it's your code, and I don't intend to tell you not to do this. In the meantime, I'll usually just write my recursive functions the old-fashioned normal way. -- Steve

On Fri, Sep 23, 2016 at 12:35 PM, Steven D'Aprano <steve@pearwood.info> wrote:
There's actually one very common technique involving rebinding functions.

    @count_calls
    def mergesort(lst):
        mid = len(lst) // 2
        if not mid:
            return lst
        return merge(mergesort(lst[:mid]), mergesort(lst[mid:]))

*Obviously* this is recursive. But if you used some magic that said "call the function that's currently being called", you'd be bypassing the count_calls decoration (which would presumably work by creating a wrapper function). Yes, it may defeat some potential optimizations (eg tail recursion optimization), but it enables all this flexibility. So we _need_ to have this kind of rebind available, and not just for experts.
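For reference, a minimal count_calls along those lines might look like this (a sketch, not taken from the thread itself); the point is that the name `mergesort` ends up bound to the wrapper, so the recursive calls inside the body go through the wrapper and are counted too:

    import functools

    def count_calls(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wrapper.calls += 1
            return func(*args, **kwargs)
        wrapper.calls = 0
        return wrapper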
In the meantime, I'll usually just write my recursive functions the old-fashioned normal way.
As will I. Of course, people are welcome to work differently, just as long as I never have to write tests for their code, or refactor anything into a decorator, or anything like that. I want the POWAH!!!!! :) ChrisA

On Fri, Sep 23, 2016 at 5:54 AM Chris Angelico <rosuav@gmail.com> wrote:
I think you are mixing levels of abstraction because you know how this is implemented. The user only sees "A function named mergesort decorated by count_calls". She does not see "A function named mergesort passed to a higher order function named count_calls whose result is bound into the variable mergesort". Even if the latter is exactly what happens, declaratively the former is more accurate by intention. Ideally, the calls to mergesort will rebind to this _decorated_ function, not to the mutable global variable. Again, the argument that it will be very hard to implement it in a different way, or that it will break things, is a very strong argument, and I am not confronting it.
As will I, simply because the old-fashioned way is more readable. And I will sadly accept the fact that I can't be 100% sure which function is called at runtime. But _some_ people (medium-level programmers, less experienced than Steven, perhaps with a main language that is not Python) will not even know this is the case. Tests are important and could be reworked into the system (through inspect, or by using a special import which allows monkey patching). I can't see why the ability to test must remain in production. Elazar

אלעזר writes:
And the dinosaurs will have returned by independent evolution by the time it matters to them (unless it's a deliberate attack, in which case people at that level would be toast anyway). But I think you're completely missing what people are trying to tell you. You shouldn't be so concerned with refuting their arguments because it doesn't matter. No matter how many points you amass for technique, you're going to get creamed on style points anyway. It's like this: (1) Python is a "consenting adults" language, and that is presumed by its development culture. The goal is not to stop people from creating "functions that look like recursions but aren't" on purpose; it's to make it easy for them to write recursive functions if they want to. From your example, that goal is obviously satisfied. Nobody who matters wants to go farther than that in Python. The reason one can create "functions that look like recursions but aren't" is because another of Python's goals is to ensure that all things -- specifically including functions -- are objects that can be manipulated "the same way" where appropriate -- in this case, saving off the original function object somewhere then rebinding the original name to something else.[1] Granted, we don't go so far as Lisp where expressions are lists that you can manipulate like any other list, but aside from the fact that the code itself is an opaque object, functions are no different from other objects. Even builtins: Python 3.6.0a4 (default, Sep 3 2016, 19:21:32) >>> def help(*ignored, **paid_no_attention): ... print("Ouch, you just shot off your foot!") ... >>> help(help) Ouch, you just shot off your foot! >>> Shooting off your opposite extremity by redefining builtin classes is left as an exercise for the reader. All of this is a matter of the general attitude of pragmatism and bias toward simplicity of implementation (both enshrined in the Zen of Python). (2) You keep talking about others being lost in terminology, but in the context of Python discussions, you have a really big problem yourself. You use the phrase "just an annotation" as though that means something, but there is nothing like a "just an <anything>" in Python discourse, not in the sense that "once we introduce <anythings>s, they can be anything we want". The Language Reference defines what things are possible, and truly new ones are rarely added. This is deliberate. Another design principle is Occam's Razor, here applied as "new kinds of thing shall not spring up like whiskers on Barry's chin." Yes, function annotations need new syntax and so are a new kind of thing to that extent. *Their values don't need to be,* and even the annotations themselves are implemented in the preferred way for "new things" (a dunder on an existing type). Since it's new syntax, it's language-level, and so the values are going to be something already defined in the language reference. "Expression resolving to object to be saved in an attribute on the function" seems to be as close to "anything you want" as you're gonna get without a new kind of thing. (3) Python has a very simple model of expressions. The compiler turns them into code. The interpreter executes that code, except in the case where it is "quoted" by the "def" or "lambda" keywords, in which case it's stored in an object (and in the case of "def", registered in a namespace). 
As Nick admits, you could indeed argue that initializations and annotation values *could* consistently be turned into "thunks" (stored code objects, we already have those) in attributes on the function object. But (1) that's an extension of the model (admittedly slight since functions, which already do that for their bodies, are involved -- but see Nick's reply for the hidden difficulties due to normal handling of namespaces in Python), and (2) it's a clear pessimization in the many cases where those values are immutable or very rarely mutated, and the use case (occasional) of keeping state in mutable values. The thunk approach is more complex, for rather small benefit. Re "small benefit", IMHO YMMV, but at least with initialization Guido is on record saying it's the RightThang[tm] (despite a propensity of new users to write buggy initializations). (4) Chris argues that "compile to thunk" is incoherent, that expressions in function bodies are no different than anywhere else -- they're evaluated when flow of control reaches them. AFAICS that *still* doesn't rule out having the compiler recognize the syntax and produce code that returns thunks instead of ordinary values, but Chris's point makes that seem way too magical to me. (5) This points up the fact that Python is thoroughly dynamic. It's not just that types adhere to objects rather than variables, but the whole attitude toward language design and implementation is. A variable not defined because it's on the path not taken, or even a function: they just don't exist as far as the interpreter is concerned -- there's no way to find them from Python. That's not true in say C: if you have a powerful enough debugger, you can even call a function defined, but never referenced, in the code. So while we'd be happy for people familiar with "statically-typed languages" to enjoy the benefits of using Python for some of their work, we can't help them if they can't shake off that attitude when using Python. Making things seem intuitive (which here translates to "familiar", as usual) to them is just misleading. Python doesn't work that way, and often enough, that matters. (6) As you point out: of course, thunks are more general than values (in fact, without instructions that move them around, in computers a value is just a tree falling in a forest with noone to hear). But maximum generality is not necessarily an important goal, even if it makes some things prettier. Allow me to quote the late Saunders Mac Lane: "[G]ood general theory does not search for the maximum generality, but for the right generality." (7) Re Nick's comment about backward compatibility on the high bar having a G degree of difficulty, I'm sure you don't disagree with the principle of avoiding compatibility breaks. But while that's probably the argument that defeats this proposal here, I think that even looking forward from the time before releasing Python 3.0, the decision would be the same. That is, I think the decision to go with the simpler "evaluate to object" model was one of the right decisions at that time, for the reasons above. Your proposal of "evaluate to thunk" (possibly incorporating the property-based magic Alexander proposed) might be right *too*, but it's far from obviously better to me. I see nothing there that would be likely to have dissuaded the authors of PEP 3107, or Guido when he designed default initialization, from "evaluate to object". If you don't like that philosophy, or somehow don't think it applies here, keep trying, you may have a point. 
But in this thread, IMO you're trying to ski up a slope that's barely able to hold snow, and wasting everybody's time. And even if I'm wrong about wasting time with the feature, you'll be way more persuasive if you argue in terms of Python as it is designed (mostly deliberately so), which is the way most of us mostly like it. Although we do change our mind every 18 months. :-) Footnotes: [1] For maximum humor, rebind it to a different recursive function!

Thank you all for your feedback. I will try to respond to it in a way that will not waste your time, but to do that I still need an example for the strongest issue raised - backwards compatibility. It is not just the mere change that is incompatible, since _any_ visible change is incompatible in some way, or otherwise it wasn't visible. Again, I assume ".__annotations__" access evaluates them in the original context. I couldn't find any useful example yet. Elazar

On Sat, Sep 24, 2016 at 22:07, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

אלעזר writes:
But to do that I still need an example for the strongest issue raised - backwards compatibility.
Given that Nick has misgivings about the ease of actually implementing this, I think you need to present an implementation, and then we talk about how closely it approximates backward compatibility.
Again, I assume ".__annotations__" access evaluates them in the original context.
You don't get to assume that, without an implementation that shows how you work around the "def can't see names of lambda arguments" issue. At the very least you need to define "evaluate in original context" operationally -- the "original context" as I understand it is what is visible the current implementation, but that is clearly not what you mean. Of course an implementation would serve to define that.

On Thu, Sep 22, 2016 at 12:35 PM, אלעזר <elazarg@gmail.com> wrote:
Yeah, I find the need for using the string "A" here a wart. Rather than change the entire semantics of annotations, it feels like a placeholder for this meaning would be better. E.g.:

    class A:
        def __add__(self, other: CLS) -> CLS: ...

A static checker could do the magic of recognizing that special name easily enough (no harder than recognizing the quoted string). At runtime 'CLS' could either just be a singleton with no other behavior... or perhaps it could be some sort of magic introspection object. It's more verbose, but you can also spell it now as:

    class A:
        def __add__(self, other: type(self)) -> type(self): ...

That's a little ugly, but it expresses the semantics we want. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
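A sketch of how the placeholder spelling might work at runtime (CLS here is a hypothetical sentinel, not an existing typing feature):

    class _ClsSentinel:
        def __repr__(self):
            return 'CLS'

    CLS = _ClsSentinel()

    class A:
        def __add__(self, other: CLS) -> CLS:
            return other

    # A checker (or a class decorator) could later rewrite the sentinel to the real class:
    for name, ann in A.__add__.__annotations__.items():
        if ann is CLS:
            A.__add__.__annotations__[name] = A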

On 22 September 2016 at 22:02, אלעזר <elazarg@gmail.com> wrote:
Concerning the __add__ method, I think a more typical type for it is:

    T = TypeVar('T', bound='A')

    class A:
        def __add__(self: T, other: T) -> T: ...

There is a plan to support this in one of the next releases of mypy. In general I think it is a bit early to judge what would be the best solution for forward references (there are related issues like performance drop etc). More evidence is needed to decide a way forward. -- Ivan
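The practical payoff of the bound TypeVar spelling shows up with subclasses - a checker can thread the subtype through the operation (a sketch, assuming a PEP 484 checker that supports self-types):

    from typing import TypeVar

    T = TypeVar('T', bound='A')

    class A:
        def __add__(self: T, other: T) -> T:
            return self

    class B(A):
        pass

    result = B() + B()   # a checker can infer `result` as B rather than just A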

On Fri, Sep 23, 2016 at 12:05 AM Ivan Levkivskyi <levkivskyi@gmail.com> wrote: the string there. Not that I'm saying it's a bad solution, but it fits pyi files more than average-programmer-code.
The problem with waiting for more evidence is that more code will break if the change requires such breakage. At least side effects in annotation expressions should be "deprecated", with no guarantee of when they happen, how many times, etc. Elazar

On 23 September 2016 at 15:50, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Same answer as with any other circular dependency: the code smell is the circular dependency itself, not the awkwardness of the syntax for spelling it. If the string based "circular reference here!" spelling really bothers you, refactor to eliminate the circularity (e.g. by extracting a base class or an implementation independent interface definition), rather than advocating to make the spelling less obnoxious. The difference between that and the "methods referring to the class they're defined in" case is that it's likely to be pretty normal to want to do the latter, so it may prove worthwhile to provide a cleaner standard spelling for it. The counter-argument is the general circularity one above: do you *really* need instances of the particular class being defined? Or is there a more permissive interface based type signature you could specify instead? Or perhaps no type signature at all, and let ducktyping sort it out implicitly at runtime? Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Friday, September 23, 2016 at 2:23:58 AM UTC-4, Nick Coghlan wrote:
I agree that circularity should in general be avoided, but it's not always possible or elegant to do that. Sometimes you really need two classes to refer to each other. In that case, why not expose your placeholder idea to the user via a library? You have one function that generates placeholder singletons (generate_placeholder()), and another function that walks a class object and replaces a placeholder with a given value (replace_placeholder(placeholder, cls, value)). Best, Neil
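A rough sketch of the library being described (generate_placeholder and replace_placeholder are Neil's hypothetical names; nothing like this exists in the stdlib today):

    def generate_placeholder():
        """Return a unique sentinel standing in for a not-yet-defined class."""
        return object()

    def replace_placeholder(placeholder, cls, value):
        """Walk cls and swap `placeholder` for `value` in method annotations."""
        for attr in vars(cls).values():
            anns = getattr(attr, '__annotations__', None)
            if not anns:
                continue
            for name, ann in anns.items():
                if ann is placeholder:
                    anns[name] = value

    # Hypothetical usage with two mutually-referring classes:
    SecondPlaceholder = generate_placeholder()

    class First:
        def partner(self) -> SecondPlaceholder: ...

    class Second:
        def partner(self) -> First: ...

    replace_placeholder(SecondPlaceholder, First, Second)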

On 27 September 2016 at 17:29, Neil Girdhar <mistersheik@gmail.com> wrote:
Because the general case is already covered by using a quoted string instead of a name reference. "I don't like using strings to denote delayed evaluation" isn't a compelling argument, which is why alternative ideas have to offer some other significant benefit, or else be incredibly simple both to implement and to explain. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Sep 27, 2016 at 5:01 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
My motivation for something other than quoted strings is that there are other instances of circular dependencies. Currently, when I am forced into a circular dependency, I import the later class in the member functions of the first:

    # module x
    class X:
        def f(self):
            from y import Y
            # do something with Y

    # module y
    class Y:
        pass

That's not ideal and I don't see how to extend this solution to use of "y" in class level definitions. Best, Neil
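For the annotation side of that problem specifically, PEP 484 tooling already has an idiom that avoids the runtime cycle: guard the import with typing.TYPE_CHECKING and use a quoted forward reference (a sketch):

    # module x
    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        from y import Y   # seen by the type checker, never executed at runtime

    class X:
        def f(self) -> 'Y':   # quoted, so nothing is evaluated at runtime
            ...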

Neil Girdhar writes:
Why not just expose it through a simple assignment? https://mail.python.org/pipermail/python-ideas/2016-September/042563.html Note that this also works for typechecking in PEP 484 checkers that allow forward reference via the stringified class name. See also https://mail.python.org/pipermail/python-ideas/2016-September/042544.html, which would allow eliding the assignment, but pollutes the class namespace. "Simple is better than complex." This feature is still looking for a persuasive use that needs it, and not something simpler. Steve

I don't understand why that would work and this clearly doesn't?

    Mutual2 = "Mutual2"  # Pre-declare Mutual2

    class Mutual1:
        def spam(self, x=Mutual2):
            print(type(x))

    class Mutual2:
        def spam(self):
            pass

    Mutual1().spam()

prints class "str" rather than "type". On Tue, Sep 27, 2016 at 6:20 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

On Tue, Sep 27, 2016 at 11:54:40AM +0000, Neil Girdhar <mistersheik@gmail.com> wrote:
Try this:

    class Mutual1:
        def spam(self, x=None):
            if x is None:
                x = Mutual2
            print(type(x))

    class Mutual2:
        def spam(self):
            pass

    Mutual1().spam()

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On 27 September 2016 at 13:46, Neil Girdhar <mistersheik@gmail.com> wrote:
Yes, I understand that, but I don't see how that would help at all with annotations. Aren't annotations also evaluated at "compile time"?
Yes, but a string whose value is a class name is treated as being the same annotation (i.e., meaning the same) as the class itself. Paul

On 27 September 2016 at 22:46, Neil Girdhar <mistersheik@gmail.com> wrote:
Yes, I understand that, but I don't see how that would help at all with annotations. Aren't annotations also evaluated at "compile time"?
This thread isn't about circular references in general, just circular references in the context of type hinting. For type hinting purposes, it already doesn't matter whether you use a variable name to refer to a type or a quoted string literal, as the typechecker ignores the quotation marks (and this is mandated by PEP 484). For runtime annotation use, the difference is visible, but the only required runtime behaviours for the typechecking use case are "doesn't throw an exception" and "doesn't take a prohibitive amount of time to evaluate when the function is defined". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Doh! Yes, of course 'self' is only a scoped name within the body of the method, not in the signature. On Thu, Sep 22, 2016 at 1:02 PM, Alexander Belopolsky < alexander.belopolsky@gmail.com> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On 23 September 2016 at 05:58, David Mertz <mertz@gnosis.cx> wrote:
That doesn't work, as "self" hasn't been bound yet when the annotations are evaluated, just like A hasn't been bound yet (since it doesn't exist until *after* the class body finishes executing). As others have noted, the general idea of allowing either a placeholder name or the class name to refer to a suitable type annotation is fine, though - that would be a matter of implicitly injecting that name into the class namespace after calling __prepare__, and ensuring the compiler is aware of that behaviour, just as we inject __class__ as a nonlocal reference into method bodies that reference "super" or "__class__". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 23 September 2016 at 12:05, אלעזר <elazarg@gmail.com> wrote:
Right now? No - you'll get a name error on the "A", just as you would if you tried to reference it as a default argument:
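For instance, a session along these lines (representative, not the original transcript):

    >>> class A:
    ...     def add(self, other: A) -> A:
    ...         ...
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in A
    NameError: name 'A' is not defined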
And that's the problem with using the class name in method annotations in the class body: they're evaluated eagerly, so they'd fail at runtime, even if the typecheckers were updated to understand them. Rather than switching annotations to being evaluated lazily in the general case, one of the solutions being suggested is that *by default*, the class name could implicitly be bound in the body of the class definition to some useful placeholder, which can already be done explicitly today:

    class A:
        A = "A"  # explicit placeholder binding for the class's own name
        def method(self, other: A) -> A:
            ...

Since method bodies don't see class level name bindings (by design), such an approach would have the effect of "A" referring to the placeholder in the class body (including for annotations and default arguments), but to the class itself in method bodies. I don't think this is an urgent problem (since the "A"-as-a-string spelling works today without any runtime changes), but it's worth keeping an eye on as folks gain more experience with annotations and the factors affecting their readability. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

David Mertz wrote:
I think that depends on what kind of software you're writing. Anything involving any kind of trees or graphs will have classes that refer to themselves or each other.
(well, unless you use much more recursion than Python encourages).
Recursive data structures don't necessarily imply recursive code to process them, although recursion is often the most natural way to write that code. -- Greg

On Thu, Sep 22, 2016 at 9:43 PM Steven D'Aprano <steve@pearwood.info> wrote:
Just because you call it "expression", when for most purposes it isn't - it is an annotation. An "expression" is something whose value you need right now, and an "annotation" is something that, well, annotates the code you see right now. Terminology is not the important thing, but that seems to be the basis of your argument here.
I was thinking about the former, but yeah, uncovered code will fail at runtime, possibly in production, for *no* real reason. I do not claim that this is common, but it is definitely unnecessary - unlike initialization expressions. (which I would have liked to see delayed too, but I can understand the reasons why this is strongly opposed; it *is* an expression. Sadly bugs are *much* more common there).
And of course the fact that I use annotated code does not necessarily mean I also use type checkers.
You would also complicate the introspection of annotations. With
This argument was partially answered by Alexander before. Generally, introspection *libraries* will be a tiny bit more complicated. Introspection user code will not. And people who write introspection code *must* understand the nitty-gritty details of the language, whereas people who read and write regular code need not.
I don't know any concrete examples. Can you give any? Do these examples use side effects during annotation evaluation? Lastly, do *you* consider it a good idea, one that should be accounted for?
It's just another irritating inconvenience making the write-test cycle longer for no obvious reason (at least from the perspective of the user).
It is to me, but that's only personal taste.
I don't understand this answer at all. I am pretty familiar with Python - not as most of the people on this list, but possibly not less than anyone I know in person (sadly so). And this behavior still surprises me. It definitely surprises people coming from a statically-typed background.
Very little. And to quote Frank Miller, “An old man dies, a little girl lives. Fair trade.” Elazar

On Thu, Sep 22, 2016 at 07:21:18PM +0000, אלעזר wrote:
It is *both*. It's an expression, because it's not a statement or a block. You cannot write: def func(arg1: while flag: sleep(1), arg2: raise ValueError): ... because the annotation must be a legal Python expression, not a code block or a statement. It's an annotation because that's the specific *purpose* of the expression in that context. As an analogy: would you argue that it is wrong to call the for-loop iterable an expression? for <target-list> in <expression>: block I trust that you understand that the loop iterable can be any expression that evaluates to an iterable. Well, annotations can be any expression that evaluates to anything at all, but for the purposes of type checking, are expected to evaluate to a string or a type object. In the case of function annotations, remember that they can be any legal Python expression. They're not even guaranteed to be type annotations. Guido has expressed a strong preference that they are only used as type annotations, but he hasn't yet banned other uses (and I hope he doesn't), so any "solution" for a type annotation problem must not break other uses.
Right. In the case of Python, function annotations **do** have a runtime effect: the expressions are evaluated, and the evaluated results are assigned in function.__annotations__ and made available for runtime introspection. Don't think that function annotations are **only** for the static type checker. Python is a much richer language than that!
Unnecessary?

    class MyClass:
        pass

    def function(arg: MyCalss):
        ...

I want to see an immediate NameError here, thank you very much, even if I'm not running a static checker. I don't want to have to manually call:

    function.__annotations__['arg']()

to see whether or not the annotation is valid. I accept that using strings as forward annotations is not a foolproof solution either:

    def function(arg: 'MyCalss'):
        ...

but let's not jump into a "fix" that actually makes things worse.
MyClass doesn't exist at that point, so it is an invalid annotation.
Not to the old man, and especially not if the little girl is a psychopath who grows up to become a mass murdering totalitarian dictator. -- Steve

On 23 September 2016 at 13:06, Steven D'Aprano <steve@pearwood.info> wrote:
If folks are after a simple non-type-checking related example of annotation usage, the "begins" CLI library is a decent one: https://pypi.python.org/pypi/begins That lets you supply command line help for parameters as annotations: ============ In Python3, any function annotations for a parameter become the command line option help. For example:
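The snippet itself isn't reproduced in this digest; judging from the generated help shown below, it would look roughly like this (begins exposes a @begin.start decorator, and the parameter annotations become the option help text):

    import begin

    @begin.start
    def main(name: "What, is your name?",
             quest: "What, is your quest?",
             colour: "What, is your favourite colour?"):
        pass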
Will generate command help like:

    usage: holygrail_py3.py [-h] -n NAME -q QUEST -c COLOUR

    optional arguments:
      -h, --help            show this help message and exit
      -n NAME, --name NAME  What, is your name?
      -q QUEST, --quest QUEST
                            What, is your quest?
      -c COLOUR, --colour COLOUR
                            What, is your favourite colour?

============ It's not a substitute for something like click or argparse when it comes to more complex argument parsing, but it's a good example of the kind of simple pseudo-DSL folks have long been able to create with annotations independently of the type hinting use case. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Sep 23, 2016 at 6:24 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
That's a very nice use, and I was wrong - I did know it; I found it not long ago when I wanted to implement it myself... And guess what? It does not require eager evaluation _at all_. No decorator-helped-annotation mechanism requires eager evaluation built into the language. Lazy evaluation is more general than eager, in that it can always be forced (and not the other way around).

    def eager_annotation(f):
        f.__annotations__ = {k: v() for k, v in f.__annotations__.items()}
        return f

Use @eager_annotation wherever you like, or collapse it into other decorators. You don't need @eager_annotation for type annotations, or any other form of annotation without runtime semantics. On the other hand - if you do want side effects in this function's annotations, well, there had better be some nice big @EAGER! decorator above it. Elazar
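For instance, under such a convention, code that wants the evaluated objects could opt in explicitly (a sketch that emulates the delayed annotations with parenthesized lambdas, using the eager_annotation decorator above):

    @eager_annotation
    def scale(x: (lambda: float)) -> (lambda: float):
        return x * 2.0

    print(scale.__annotations__)   # {'x': <class 'float'>, 'return': <class 'float'>}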

On 23 September 2016 at 20:31, אלעזר <elazarg@gmail.com> wrote:
The problem it poses for your proposal isn't that a library like begins couldn't be updated to work with lazy annotations (as you say, it clearly could be), it's that it demonstrates the idea of switching to lazy annotations involves a language level *compatibility break* for a feature that has been around and in use for almost 8 years now, and those need incredibly strong justifications. While I personally have some sympathy for the perspective that using strings for forward references in type hints feels a bit clunky, it still doesn't come close to reaching that deliberately high bar. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Sep 23, 2016 at 6:06 AM Steven D'Aprano <steve@pearwood.info> wrote:
Did you just use a false-trichotomy argument? :)
because the annotation must be a legal Python expression, not a code
block or a statement.
This is the situation I'm asking to change
It's an annotation because that's the specific *purpose* of the expression in that context.
Exactly! Ergo, this is an annotation.
for-loop iterable is an expression, evaluated at runtime, _for_ the resulting value to be used in computation. A perfectly standard expression. Nothing fancy.
Must *allow* other use cases. My proposal allows that: just evaluate them at the time of their use, instead of at definition time.
function.__annotations__ can have the delayed value, be it a lambda, ast or string. It can also be computed at the time of access as suggested earlier.
A few things to note here:

A. IDEs will point at this NameError.
B. Type checkers catch this NameError.
C. Even the compiler can be made to catch this name error, since the name MyCalss is bound to builtins where it does not exist - you see, name lookup does happen at compile time anyway. I'm not really suggesting the compiler should make it an error, though.
D. Really, where's the error here? If no tool looks at this signature, there's nothing wrong with it - as a human I understand perfectly. If a tool does look at it, it will warn or fail, exactly as I would have liked it to.

    function.__annotations__['arg']()
but let's not jump into a "fix" that actually makes things worse.
That's not a "fix". I suggest always using the last form - which is already in common use - with a nicer syntax and semantics, since there's nothing wrong about it. It is there for a very natural reason.
:) Elazar

On Fri, Sep 23, 2016 at 10:17:15AM +0000, אלעזר wrote:
No. You are the one trying to deny that annotations are expressions -- I'm saying that they are both annotations and expressions at the same time. There's no dichotomy here, since the two are not mutually exclusive. (The word here is dichotomy, not trichotomy, since there's only two things under discussion, not three.)
That's a much bigger change than what you suggested earlier, changing function annotations to lazy evaluation instead of eager. Supporting non-expressions as annotations -- what's your use-case? Under what circumstances would you want to annotate an function parameter with a code block instead of an expression?
I've never denied that annotations are annotations, or that annotations are used to annotate function parameters. I'm not sure why you are giving a triumphant cry of "Exactly!" here -- it's not under dispute that annotations are annotations. And it shouldn't be under dispute that annotations are expressions. They're not code blocks. They're not statements. What else could they be apart from expressions? The PEP that introduced them describes them as expressions: Function annotations are nothing more than a way of associating arbitrary Python EXPRESSIONS with various parts of a function at compile-time. [Emphasis added.] https://www.python.org/dev/peps/pep-3107/ and they are documented as an expression: parameter ::= identifier [":" expression] Parameters may have annotations of the form “: expression” following the parameter name. ... These annotations can be any valid Python expression https://docs.python.org/3/reference/compound_stmts.html#function-definitions I think its time to give up arguing that annotations aren't expressions.
Right. And so are annotations. You want to make them fancy, give them super-powers, in order to solve the forward reference problem. I don't think that the problem is serious enough to justify changing the semantics of annotation evaluation and make them non-standard, fancy, lazy-evaluated expressions.
I meant what I said. Changing the evaluation model for annotations is a big semantic change, a backwards-incompatible change. It's not just adding new syntax for something that was a syntax error before, it would be changing the meaning of existing Python code. The transition from 3.6 to 3.7 is not like that from 2.x to 3.0 -- backwards compatibility is a hard requirement. Code that works a certain way in 3.6 is expected to work the same way in 3.7 onwards, unless we go through a deprecation period of at least one full release, and probably with a `from __future__ import ...` directive required. There may be a little bit of wiggle-room available for small changes in behaviour, under some circumstances -- but changing the evaluation model is unlikely to be judged to be a "small" change. In any case, before such a backwards-incompatible change would be allowed, you would have to prove that it was needed. [...]
Some or them might. Not everyone uses an IDE, it is not a requirement for Python programmers. Runtime exceptions are still, and always will be, the primary way of detecting such errors.
B. Type checkers catch this NameError
Likewise for type checkers.
C. Even the compiler can be made to catch this name error, since the name MyCalss is bound to builtins where it does not exist
How do you know it doesn't exist? Any module, any function, any class, any attribute access, might have added something called MyCalss to this module's namespace, or to the built-ins. It's okay for a non-compulsory type-checker, linter or editor to make common-sense assumptions about built-ins. But the compiler cannot: it has no way of knowing *for sure* whether or not MyCalss exists until runtime. It has to actually do the name lookup, and see what happens.
- you see, name lookup does happen at compile time anyway.
It really doesn't. You might be confusing function-definition time (which occurs at runtime) with compile time. When the function is defined, which occurs at runtime, the name MyCalss must exist or a NameError will occur. But that's not at compile time.
D. Really, where's the error here? if no tool looks at this signature, there's nothing wrong with it - As a human I understand perfectly.
    class CircuitDC:
        ...

    class CircuitAC:
        ...

    def func(arg: CircuitSC):
        ...

Do you still understand perfectly what I mean? -- Steve

On Fri, Sep 23, 2016 at 3:11 PM Steven D'Aprano <steve@pearwood.info> wrote:
The argument "It's an expression, because it's not a statement or a block" assumes that things must an expression, a statement or a block. Hence "trichotomy". And it is false. But I think we are getting lost in the terminology. Since I propose no change in what is considered valid syntax,
It indeed came out different than I meant. I don't suggest allowing anything that is not already allowed, syntactically. I only propose giving the current syntax a slightly different meaning, in a way that I'm sure matches how Python coders already understand the code.
:( this kind of fighting over terminology takes us nowhere indeed. In what other context do you see the result of an expression that is not intended to be used at all? Well, there are expression statements, which are evaluated for side effect. There are docstrings, which are a kind of annotation. What else? The only other that comes to mind is reveal_type(exp)... surely I don't need evaluation there.

And it shouldn't be under dispute that annotations are expressions.

Expressions are evaluated mainly for the resulting value (hence "expression"), and annotations are there mainly for being there. In the code.
Syntactically, yes. Just like X in "a = lambda: X" is an expression, but you don't see it evaluated, do you? And this is an _actual_ expression, undeniably so, that is intended to be evaluated and used at runtime.
I don't care if you call them expressions, delayed-expressions, or flying monkeys. The allowed syntax is exactly that of an expression (like inside a lambda). The time of binding of names to scope is the same (again like a lambda) but the evaluation time is unknown to the non-reflecting developer. Decorators may promise a time of evaluation, if they want to. "Unknown evaluation time" is scary _for expressions_, which might have side effects (one of which is running time). But annotations must be pure by convention (and tools are welcome to warn about it). I admit that I propose breaking the following code:

    def foo(x: print("defining foo!")):
        pass

Do you know anyone who would dream about writing such code?
My proposal solves the forward reference problem, but I believe in it because I believe it is aligned with what the programmer see.
I would like to see an example for a code that breaks under the Alexander's suggestion of forcing evaluation at `.__annotations__` access time.
How useful is the detection of this error at production?
Yeah it was just a thought. I wouldn't really want the compiler to do that.
Can you repeat that? NameError indeed happens at runtime, but the scope in which MyCalss is looked up is determined at compile time - as far as I know. The bytecode-based typechecker I wrote relies on this information being accessible statically in the bytecode.

    def foo():
        locals()['MyType'] = str
        def bar(a: MyType):
            pass
What do I miss?
No.

    def func(arg: CircuitAC):
        ...

Do you understand what I mean? Code with a small distance (hamming distance / edit distance) between related-but-different entities is prone to such errors, and NameError gives you very little assurance - if you erred this way, you get it; if you err that way, you don't. --- One way or the other, the very least that I hope for is explicitly forbidding reliance on side effects or any other way to distinguish evaluation time of annotation expressions. Annotations must be pure, and the current promise of evaluation time should be deprecated. Additionally, before making it impossible to go back, we should make the new variable annotation syntax add its annotations to a special object __reflect__, so that __reflect__.annotations__ will allow forcing evaluation (since there is no mechanism to do this in a variable). Elazar

On Fri, Sep 23, 2016 at 11:58 PM, אלעזר <elazarg@gmail.com> wrote:
Function annotations ARE used. They're stored as function attributes, just as default argument values and docstrings are. (It's not the language's problem if you never use them.)
And the X in "if False: X" is a statement, but you don't see it evaluated either. This is an actual expression that has to be evaluated and used just like any other does.
Thing is, literally every other expression in Python is evaluated at the point where it's hit. You can guard an expression with control flow statements or operators, but other than that, it will be hit when execution reaches its line:

    def func(x):
        expr                      # evaluated when function called

    if cond:
        expr                      # evaluated if cond is true

    [expr for x in range(n)]      # evaluated if n > 0
    (expr for x in [1])           # evaluated when genexp nexted
    expr if cond else "spam"      # evaluated if cond is true
    lambda: expr                  # evaluated when function called
    def func(x=expr): pass        # evaluated when function defined
    def func(x: expr): pass       # evaluated when function defined

Default arguments trip some people up because they expect them to be evaluated when the function's called, but it can easily be explained. Function annotations are exactly the same. Making them magically late-evaluate would have consequences for the grokkability of the language - they would be special. Now, that can be done, but as Rumplestiltskin keeps reminding us, all magic comes with a price, so it has to be strongly justified. (For instance, the no-arg form of super() is most definitely magical, but its justification is obvious when you compare Py2 inheritance with Py3.)
Yes, side effects make evaluation time scary. But so do rebindings, and any other influences on expression evaluation. Good, readable code generally follows the rule that the first instance of a name is its definition. That's why we put imports up the top of the script, and so on. Making annotations not work that way isn't going to improve readability; you'd have to search the entire project for the class being referenced. And since you can't probe them at definition time, you have to wait until, uhh, SOME time, to do that search - you never know where the actual name binding will come from. (It might even get injected from another file, so you can't statically search the one file.)
This is on par with a proposal to make default argument values late-bind, which comes up every now and then. It's just not worth making these expressions magical.
The sooner you catch an error, the better. Always.
That locals() is not editable (or rather, that mutations to it don't necessarily change the actual locals). This is equivalent to:

    def foo():
        locals()['MyType'] = str
        print(MyType)
In each case, you have to *call* foo() to see the NameError. It's happening at run time.
Define "pure". Function decorator syntax goes to some lengths to ensure that this is legal: @deco(arg) def f(): pass PEP 484 annotations include subscripting, even nested: def inproduct(v: Iterable[Tuple[T, T]]) -> T: so you'd have to accept some measure of run-time evaluation. It's worth reiterating, too, that function annotations have had the exact same semantics since Python 3.0, in 2008. Changing that now would potentially break up to eight years' worth of code, not all of which follows PEP 484. When Steve mentioned 'not breaking other uses of annotations', he's including this large body of code that might well not even be visible to us, much less under python.org control. Changing how annotations get evaluated is a *major, breaking change*, so all you can really do is make a style guide recommendation that "annotations should be able to be understood with minimal external information" or something.
Wow, lots of magic needed to make this work. Here's my counter-proposal. In C++, you can pre-declare a class like this:

    class Mutual2;  // Pre-declare Mutual2

    class Mutual1 {
        Mutual2 *ptr;
    };

    class Mutual2 {
        Mutual1 *ptr;
    };

Here's how you could do it in Python:

    Mutual2 = "Mutual2"  # Pre-declare Mutual2

    class Mutual1:
        def spam() -> Mutual2:
            pass

    class Mutual2:
        def spam() -> Mutual1:
            pass

Problem solved, no magic needed. ChrisA

On 24 September 2016 at 01:58, Chris Angelico <rosuav@gmail.com> wrote:
Folks have been assuming that's straightforward, but the way class namespaces work actually makes it significantly harder than it first appears. Using lambda and default arguments to illustrate the problem:

    >>> class Example:
    ...     attr = 10
    ...     @staticmethod
    ...     def good_method(eager=attr):
    ...         return eager
    ...     @staticmethod
    ...     def bad_method(lazy=(lambda: attr)):
    ...         return lazy()
    ...
    >>> Example().good_method()
    10
    >>> Example().bad_method()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 8, in bad_method
      File "<stdin>", line 7, in <lambda>
    NameError: name 'attr' is not defined

By design, function scopes can't see attributes defined in containing class scopes, and we don't currently have any other kind of scope that supports delayed evaluation (unlike function bodies, class bodies are evaluated eagerly at class definition time, and all the other delayed evaluation constructs are syntactic sugar for some particular flavour of function scope definition - even generators and coroutines use the same basic name resolution scheme as regular functions, they just use different execution models).

If it was still 2006 or 2007 and Python 3.0 hadn't been released yet, lazy annotations could seriously be considered as an option. It's 2016 though, eager annotations have been out in the wild since December 2008, and the existing "string literals are Python's de facto lazy evaluation syntax" approach works well enough for the purpose, since type checkers can say they're actually going to parse those string literals when they appear to be type hints. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I promised not to bother you, but I really can't. So here's what I felt I have to say. This email is quite long. Please do not feel obliged to read it. You might find some things you'll want to bash at the end though :) Short-ish version: 1. Please consider disallowing the use of side effects of any kind in annotations, in that it is not promised when it will happen, if at all. So that a change 3 years from now will be somewhat less likely to break things. Please consider doing this for version 3.6; it is feature-frozen, but this is not (yet) a feature, and I got the feeling it is hardly controversial. I really have no interest in wasting the time of anybody here. If this request is not something you would ever consider, please ignore the rest of this email. 2. A refined proposal for future versions of the language: the ASTs of the annotation-expressions will be bound to __raw_annotations__. * This is actually more in line to what PEP-3107 was about ("no assigned semantics"; except for a single sentence, it is only about expressions. Not objects). * This is helpful even if the expression is evaluated at definition time, and can help in smoothing the transformation. 3. The main benefit from my proposal is that contracts (examples, explanations, assertions, and types) are naturally expressible as (almost) arbitrary Python expressions, but not if they are evaluated or evaluatable, at definition time, by the interpreter. Why: because it is really written in a different language - *always*. This is the real reason behind the existence, and the current solutions, of the forward reference problem. In general it is much more flexible than current situation. 4. For compatibility, a new raw_annotations() function will be added, and a new annotations() function will be used to get the eval()ed version of them. Similarly to dir(), locals() and globals(). * Accessing __annotations__ should work like calling annotations(), but frowned upon, as it might disappear in the far future. * Of course other `inspect` functions should give the same results as today. * Calling annotations()['a'] is like a eval(raw_annotations()['a']) which resembles eval(raw_input()). I believe the last point has a very good reason, as explained later: it is an interpretation of a different language, foreign to the interpreter, although sometimes close enough to be useful. It is of course well formed, so the considerations are not really security-related. I am willing to do any hard work that will make this proposal happen (evaluating existing libraries, implementing changes to CPython, etc) given a reasonable chance for acceptance. Thank you, Elazar --- Long version: Stephen - I read your last email only after writing this one; I think I have partially addressed the lookup issue (with ASTs and scopes), and partially agree: if there's a problem implementing this feature, I should look deeper into it. But I want to know that it _might_ be considered seriously, _if_ it is implementable. I also think that Nick refuted the claim that evaluation time and lookup *today* are so simple to explain. I know I have hard time explaining them to people. Nick, I have read your blog post about the high bar required for compatibility break, and I follow this mailing list for a while. So I agree with the reasoning (from my very, very little experience); I only want to understand where is this break of compatibility happen, because I can't see it. Chris: On Fri, Sep 23, 2016 at 6:59 PM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 23, 2016 at 11:58 PM, אלעזר <elazarg@gmail.com> wrote:
No, it isn't. I guess that even the code you write or consider to be excellent and readable still contains functions that use entities defined only later in the code. It is only when you follow execution path that you should be already familiar with the names. I think rebinding is only scary when it is combined with side effect or when the name lookup is not clear. And why do you call it _re_binding? <snip>
No. No. No. If code in production fails at my client's site because of a misspelled annotation (unused by runtime tools), I will be mad. *At the language*. It is just as reasonable as failing because of misspelled documentation. (My suggestion does not prevent it completely of course. Nothing will. I only say this is unhelpful). <snip> It's worth reiterating, too, that function annotations have had the
exact same semantics since Python 3.0, in 2008.
When was this semantics decided and for what purposes, if I may ask? Because the PEP (2006) explicitly states that "this PEP makes no attempt to introduce any kind of standard semantics". The main relevant paragraph reads (I quote the PEP, my own emphasis): "2. Function annotations are nothing more than a way of associating arbitrary Python EXPRESSIONS with various parts of a function at compile-time. By itself, Python does not attach ANY PARTICULAR MEANING or significance to annotations. Left to its own, Python simply makes these EXPRESSIONS available as described in Accessing Function Annotations below. The only way that annotations take on meaning is when they are interpreted by third-party libraries. These annotation consumers can do anything they want with a function's annotations." Amen to that! Word for word as my suggestion. Why aren't these _expressions_ available to me, as promised? <baby crying> Sadly, a few paragraphs later, the PEP adds that "All annotation expressions are evaluated when the function definition is executed, just like default values." - Now please explain to me how that is attaching "no particular meaning or significance to annotations". You practically evaluate them, for heaven's sake! This is plain and simple "attached meaning" and "standard semantics". Unusefully so. I put there an expression, and all I got is a lousy object.
As I said, it is a strong argument - given an example of such a potential break for non-convoluted code. I want to see such an example. But why is "deprecating side effects in annotation's definition-time-execution" considered a breaking change? It is just documentation. Everything will work as it always has. Even edge cases. I would think this is possible even for the feature-frozen 3.6. Like saying "We've found a loophole in the language; it might get fixed in the future. Don't count on it."
Problem not solved. Your counter proposal solves only certain forward references, and requires keeping one more thing in sync, in particular adapting the scope of the "forward declaration" to the scope of the later definition, which may change over time and is in violation of DRY. Oh and lastly, type checkers will scream or will work very hard to allow this idiom. My proposal asks for no magic at all. Unless you consider dir() and locals() magical (they are, a bit). Stephen: Regarding the terminology, I said "But I think we are getting lost in the terminology." including myself. On Sat, Sep 24, 2016 at 10:07 PM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Here's the version for annotations: The compiler turns them into an AST. The interpreter does nothing with them, except attaching them to the __annotations__ object. Are you the average programmer? Then just read them, they should be helpful. How simple is that?

Another design principle is Occam's Razor, here applied as "new kinds of thing shall not spring up like whiskers on Barry's chin." Yes, function annotations need new syntax and so are a new kind of thing to that extent. *Their values don't need to be,*

Their values don't need to be there at all. All that is needed is their structure. The AST. Their value is of very little use, actually. And I'm familiar with the typing module; it is bent here and there to match this need to have a "value" or an object when you actually need only the structure. I don't invent a "new thing" any more than is already there. I have a strong belief that _already there_ is a new thing. In Python it is called "types", "annotations" and some forms of "documentation". In other languages it is called by other names. I don't mind the merging of this concept with the concept of Expression - I actually think it is brilliant. Sadly, the _interpreter_ does not understand it. It is (brilliantly again) admitted in the PEP, but sadly the interpreter does not seem to get it, and it tries to execute it anyway. So people shush it with a quote. Well done. Now where is that Expression thing? And why can't my editor highlight it?
Keeping an AST without evaluation at all is still a clear pessimization?
I argue that flow of control should not reach annotations at all. Not the control of the interpreter. It does not understand what's written there, and should not learn.
See my previous comment. The interpreter should not look for them. You know what? It can try to give a hint in name resolution. As a favor. If the tools can't find it, it's their business to report.
"[G]ood general theory does not search for the maximum generality, but for the right generality."
I believe this is the right generality "because maths" and my own intuition. This should not convince you at all.
As you've noticed, I refined my proposal to "don't evaluate". And here's my attempt at presenting the "because maths" argument you probably don't want to hear: it will allow a natural and well-founded way to express contracts and dependent types, which is a much more natural and flexible way to type dynamically-typed languages such as Python. It is nothing new really; it is based on a 40-year-old understanding that types are propositions *and propositions are types*. And I want to use it. From simple to complex:

@typecheck
def to_float(x: int or str) -> float:
    ...

(Oh, @typecheck can't access this, so they invented Union. Oh well. TODO: explain to junior what a Union[int, str] is)

@typecheck
def __add__(self, x: int and float) -> float:
    ...

This should help resolve a real problem in type checkers regarding overloads and overlapping types. Except @typecheck can only see the object "float". And they did not invent Intersection[] yet. Bummer, but fixable.

@dependent
def call_foo_twice(x: x.foo()) -> None:
    x.foo()
    x.foo()

Uh, I don't know. Perhaps x.foo() has a side effect and I'm not sure how "dependent" works, and perhaps it is not so clear. Try again:

@dependent
def call_foo_twice(x: hasattr(x, "foo") and is_callable(x.foo)) -> None:
    x.foo()
    x.foo()

But why NameError? :(

Let's define contracts:

@contract
def divmod(x: int, y: int and y != 0) -> (x//y, x % y):
    return  # an optimized version

NameError again? :( Not every function _should_ be annotated this way, but why can't you _allow_ this kind of annotation? These raise a needless error for an obviously meaningful expression. What if I want to specify "a class that subclasses Abstract but can be instantiated"? I need it because otherwise mypy resorts to allowing unsafe code:

def create(cls: typing.Type[Abstract] and cls(...)) -> Base:
    return cls()

NameError again. Why? Not because _you_ (people) don't understand it. No. It is because the _interpreter_ claims to understand it, but it doesn't. It cannot, because Python is not intended to be a specification language, and probably should not be. Even with simple types it doesn't, really. It just happens to look up the right _class_, which is not a type but is useful nonetheless; when I write "x: int" I actually mean "x: isinstance(x, int)", but the interpreter doesn't get it and should not get it. And what if I want "x: type(x)==int"? It has its uses.

Regarding my "not an expression" claim: this is a proposition, an assumption, a precondition, or an explanation - different incarnations of the same idea - as is any annotation system I have seen (except possibly injection, which will not break). Including Nick's "begin". Now notice this nice command-line-parser - the annotations there can still be strings, combined easily with type checkers. Why? Because it targets human users, that's why. And the interpreter does not try to understand them because it does not understand English. Well, it does not understand Type-ish or Contract-ish, or any other Spec-ish. There are external tools for that, thank you very much. Just give them those things that you (rightly) consider to be expressions. I like the raw.

Elazar

On Sun, Sep 25, 2016 at 11:55 AM, אלעזר <elazarg@gmail.com> wrote:
I don't think Python has any concept of *disallowing* side effects. As soon as arbitrary objects can be called and/or subscripted, arbitrary code can be executed. However, a style guide may *discourage* extensive side effects, and this I would agree with - not for reasons of future change, but for reasons of simplicity and readability.
So, basically, you want annotations to be able to make use of names defined in the object they're annotating. That's a reasonable summary of the idea, if I have this correct. I'll trim out a ton of quoted material that digs into details. Ultimately, though, you're asking to change something that has been this way since *Python 3.0*. You're not asking for a tiny tweak to a feature that's new in 3.6. If you were, perhaps this could be done, despite feature freeze; but you're breaking compat with eight years of Pythons, and that's almost certainly not going to happen.
Actually, no, I do generally stick to this pattern, builtins aside. Obviously there are times when you can't (mutually exclusive functions, for instance), but those are pretty rare. Here's an example program of mine: https://github.com/Rosuav/LetMeKnow/blob/master/letmeknow.py There is one star-import, which breaks this pattern (the global name CLIENT_SECRET comes from keys.py), and which I consider to be a failing under this principle; but it's better than most of the alternatives, and like all style recommendations, "define before use" is a rule that can be broken.
Then I strongly disagree. If it's going to fail at the client's site, I want it to first fail on my computer.
Decorators also have clearly defined local semantics and completely undefined overall semantics. If you see this in a .py file:

@spaminate(ham=1)
def frobber():
    pass

you know exactly what's happening, on a mechanical level: first spaminate(ham=1) will be called, and then the result will be called with frobber as an argument, and the result of that bound to the name frobber. But exactly what the spaminate decorator does is not Python's business. It might make frobber externally callable (cf routing decorators in Flask), or it might modify frobber's behaviour (eg require that ham be 1 before it'll be called), or it might trigger some sort of run-time optimization (memoization being an easy one, and actual code modifications being also possible).

Annotations are the same. There's a clearly defined local syntactic handling:

@multicall
def frobber(ham: [10,20,30]):
    pass

but nothing in the language that says what this truly means. In this case, I'm envisioning a kind of special default argument handling that says "if you don't provide a ham argument, call frobber three times with the successive values from ham's annotation". But you can be absolutely certain that, on the mechanical level, what happens is that the expression "[10,20,30]" gets evaluated, and the result gets stashed into the function's __annotations__.

In contrast, function default arguments have *both* forms of semantics clearly defined. The expression is evaluated and the result stashed away; and then, when the function is called, if there's no argument, the default is used.
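(For illustration, a minimal runnable version of the mechanical semantics described above - multicall itself is hypothetical, so only the plain definition is shown:)

def frobber(ham: [10, 20, 30] = 10):
    return ham

print(frobber.__annotations__)  # {'ham': [10, 20, 30]} - the expression was evaluated and stashed
print(frobber())                # 10 - the default was evaluated at definition time and reused
print(frobber(99))              # 99 - an explicit argument bypasses the default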
Deprecating in the sense of "style guides recommend against this" is fine. PEP 8 has been updated periodically, and it doesn't break anyone's code (except MAYBE linters, and even then they're not broken, just not up-to-date). But an actual code change that means that Python 3.7 will reject code that Python 3.5 accepted? That's a breaking change. And the purpose of your documentation-only deprecation is exactly that, or possibly Python 3.8 or 3.9, but timeframe doesn't change the fact that it will break code.
Type checkers that comply with PEP 484 are already going to support this notation, because "Mutual2" is a valid annotation. All I've done differently is make a simple assignment, in the same way that typevars get assigned.
Keeping an AST without evaluation at all is still a clear pessimization?
The AST for an expression usually takes up more memory than the result of it, yeah.
Please let's not go down this path. Already I have to explain to my students that this won't work: if response == "yes" or "y": If it *does* work in annotations but doesn't work everywhere else, that would be extremely confusing.
I'm not sure what the intersection of int and float would be, but perhaps you mean this more like Java's interfaces - something that "implements X" and "implements Y" is the intersection of the types X and Y.
Now, this is where stuff starts to get interesting. You want to be able to define an assertion in terms of the variables you're creating here. In effect, you have something like this:

def divmod(x, y):
    assert isinstance(x, int)
    assert isinstance(y, int) and y != 0
    ...  # optimized calculation
    assert ret == (x // y, x % y)
    return ret

As ideas go, not a bad one. Not really compatible with annotations, though, and very difficult to adequately parse. If you want to flesh this out as your proposal, I would suggest setting this thread aside and starting over, explaining (a) why actual assertions aren't good enough, and (b) how annotations could be used without breaking compatibility.
Actually, I don't understand exactly what this should do. Does it assert that cls can be instantiated with some unknown args? Because you then instantiate it with no args. What does cls(...) require?
And what if I want "x: type(x)==int"? It has its uses.
Then explicitly assert that. I don't see why you should type-declare that something is "this and not a subclass" - the whole point of subclassing is that it still is an instance of the superclass. Maybe what Python needs is a simple syntax for "AST-for-this-expression". We have lambda, which means "function which evaluates this expression"; this would operate similarly.
It'd be exactly the same as ast.parse("x + y"), but might be able to make use of the existing parsing operation, and would be more easily syntax highlighted. (Or maybe it'd start at the Expr node instead - so it'd be equiv to ast.parse("x + y").body[0].) Open to suggestions as to an actual name. With that change, your proposals could all be added in a 100% backward compatible way. Annotations, as a feature, wouldn't change; you'd just kappafy your contracts:

@contract
def divmod(x: int, y: kappa: int and y != 0) -> kappa: (x//y, x % y):
    ...

And then you could define contract() as either the identity function (optimized mode, no checks done), or a wrapper function that does run-time checks. Maybe that, rather than making annotations magical, would solve the problem?

ChrisA
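(For reference, a small sketch of the ast.parse() equivalence mentioned above, in today's spelling only - the proposed syntax itself does not exist:)

import ast

tree = ast.parse("x + y", mode="eval")   # an ast.Expression whose .body is the BinOp node
stmt = ast.parse("x + y").body[0]        # default 'exec' mode: a Module, take the first Expr statement
print(ast.dump(tree.body))
print(ast.dump(stmt.value))              # both print the same BinOp(...) for "x + y"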

On 25 September 2016 at 11:55, אלעזר <elazarg@gmail.com> wrote:
This may be part of the confusion, as Python is a language with a *reference implementation*, rather than relying solely on a documented language specification. Unless we specifically call something out in the language reference and/or the test suite as a CPython implementation detail, then "what CPython does" should be taken as the specification. While we're fairly permissive in allowing alternative implementations to deviate a bit and still call themselves Python, and sometimes alternate implementation authors point out quirky behaviours and we declare them to be bugs in CPython, "CPython correctly implements the Python language specification" is still the baseline assumption. So the order of evaluation for annotations with side effects has been defined since 3.0 came out:
That is, at function definition time:
- default values are evaluated from left to right
- annotations are evaluated from left to right
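(A small way to observe that definition-time ordering, using a hypothetical tag() helper that prints and returns its argument:)

def tag(label):
    print("evaluating", label)
    return label

def f(a: tag("a annotation") = tag("a default"),
      b: tag("b annotation") = tag("b default")) -> tag("return annotation"):
    pass
# On CPython, defining f prints the two defaults (left to right) and then the
# annotations (left to right, return last); calling f() prints nothing further.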
I don't think you're wasting anyone's time - this is a genuinely complex topic, and some of it relates to design instinct about what keeps a language relatively easy to learn. However, I do think we're talking past each other a bit. I suspect the above point regarding the differences between languages that are formally defined by a written specification and those like Python that let a particular implementation (in our case, CPython) fill in the details not otherwise written down may be a contributing factor to that. Another may be that there are some things (like advanced metaprogramming techniques) where making them easy isn't actually a goal we necessarily pursue: we want to ensure they're *possible*, as in some situations they really are the best available answer, but we also want to guide folks towards simpler alternatives when those simpler alternatives are sufficient.

PEP 487 is an interesting example of that, as that has the express goal of taking two broad categories of use cases that currently require a custom metaclass (implicitly affecting the definition of subclasses and letting descriptors know the attribute name they're bound to), and making them standard parts of the default class definition protocol. Ideally, this will lead to *fewer* custom metaclasses being defined in the future, with folks being able to instead rely on normal class definitions and those simpler extracted patterns.
PEP 3107 came with a reference implementation, it wasn't just the written PEP content: https://www.python.org/dev/peps/pep-3107/#implementation
* This is helpful even if the expression is evaluated at definition time, and can help in smoothing the transformation.
We talk about the idea of expression quoting and AST preservation fairly often, but it's not easy to extract from the archives unless you already know roughly what you're looking for - it tends to come up as a possible solution to *other* problems, and each time we either decide to leave the problem unsolved, or find a simpler alternative to letting the "syntactic support for AST metaprogramming" genie out of the bottle. Currently, the only supported interfaces for this are using the ast.parse() helper, or passing the ast.PyCF_ONLY_AST flag to the compile() builtin. This approach gives alternative implementations a fair bit of flexibility to *not* use that AST internally if it doesn't help their particular implementation. Once you start tying it in directly to language level features, though, it starts to remove a lot of that implementation flexibility.
"More flexible" is only a virtue if you have concrete use cases in mind that can't otherwise be addressed today. Since you mention design-by-contract, you may want to take a look at https://www.python.org/dev/peps/pep-0316/ which is an old deferred proposal to support DBC by way of a particular formatting convention in docstrings, especially as special formatting in docstrings was one of the main ways folks did type annotations before PEP 3107 added dedicated syntax for them.
4. For compatibility, a new raw_annotations() function will be added, and a new annotations() function will be used to get the eval()ed version of them.
Nothing *new* can ever be added for compatibility reasons: by definition, preserving backwards compatibility means old code continuing to run *without modification*. New interfaces can be added to simplify migration of old code, but it's not the same thing as actually preserving backwards compatibility.
Here you're getting into the question of expression quoting, and for a statement level version of that, you may want to explore the thread at https://mail.python.org/pipermail/python-ideas/2011-April/009765.html (I started that thread because I'd had an idea I needed to share so I could stop thinking about it, but I also think more syntactic sugar for metaprogramming isn't really something the vast majority of Python developers actually need) Mython, which was built as a variant of Python 2 with more metaprogramming features is also worth a look: http://mython.org/
I think the two basic road blocks you're running into are: - the order of evaluation for annotations with side effects is already well defined and has been since Python 3.0. It's just defined by the way CPython works as the reference implementation, rather than in English prose anywhere. - delayed evaluation already has two forms in Python (function scopes and quoted strings) and adding a third is a *really* controversial prospect, but if you don't add a third, you run into the fact that all function scopes inside a class scope are treated as methods by the compiler Stephen's post went into more detail on *why* that second point is so controversial: because it's a relatively major increase in the underlying complexity of the runtime execution model. The most recent run at it that I recall was my suggestion to extend f-strings (which are eagerly evaluated) to a more general purpose namespace capturing capability in https://www.python.org/dev/peps/pep-0501/ That's deferred pending more experience with f-strings between now and the 3.7 beta, but at this point I'll honestly be surprised if the simple expedient of "lambda: <f-string>" doesn't turn out to be sufficient to cover any delayed evaluation needs that arise in practice (folks tend not to put complex logic in their class bodies).
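(For illustration, the "lambda: <f-string>" expedient looks roughly like this - a sketch assuming Python 3.6+ f-strings, with user and count as stand-in names:)

user, count = "elazar", 0
message = lambda: f"{user} has {count} unread messages"  # nothing interpolated yet
count = 3
print(message())   # evaluated at call time, so it reports 3, not 0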
Most folks coming from pre-compiled languages like C++, C# & Java struggle with the fact that Python doesn't have separate compile time constructs (which deal with function, class and method declarations) and runtime constructs (which are your traditional control flow statements). Instead, Python just has runtime statements, and function and class definition are executed when encountered, just like any other statement. This fundamentally changes the relationship between compile time, definition time, and call time, most significantly by having "definition time" be something that happens during the operation of the program itself.
This code works as a doctest today:

>>> def func(a: "Expected output"):
...     pass
...
>>> print(func.__annotations__["a"])
Expected output

Any change that breaks that currently valid doctest is necessarily a compatibility break for the way annotations are handled at runtime. It doesn't matter for that determination how small the change to fix the second command is, it only matters that it *would* have to change in some way. In particular, switching to delayed evaluation would break all the introspection tools that currently read annotations at runtime, both those in the standard library (like inspect.signature() and pydoc), and those in third party tools (like IDEs). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Thanks for the references. I will read them. In general, I am against magic in code. I am for magic in specification, with appropriate hints (e.g. the explicit name in the decorator, as pointed out to me by Chris) and with taste. The most important part about specification is being naturally understood by humans. The second most important is being understood by tools. What's not important: being understood by the interpreter. CPython as a reference implementation has a very, very specific behavior, changing at every minor release. Of course not every tiny detail of this behavior is promised. It is understood by users that e.g. they cannot rely on their code taking 0.6ms to execute in such and such settings, since real-time constraints are not promised even if some part of some version of CPython happens to run this fast deterministically. The implementation specifies the behavior, except when common sense or documentation says otherwise. Am I wrong? On Sun, Sep 25, 2016 at 7:07 PM Nick Coghlan <ncoghlan@gmail.com> wrote:
But my intention is that this code will work just fine. As will any other access using __annotations__ or any existing API. The only visible change should be that of expressions with visible side effects, so this is the kind of break I am looking for. The following will break:

def foo(a: print(1)):
    pass

But nobody (yet) claimed it to be a reasonable example of code we don't want to break. There can hardly be any, since any side effect can be placed right before the definition. So just like star-imports that are broken every time a function is added to some library, IIUC, you don't care about breaking them, because they are strongly and explicitly discouraged.

Elazar

אלעזר writes:
But nobody (yet) claimed it to be a reasonable example of code we don't want to break.
"Reasonable example" is not the standard. The ideal is that *nobody*'s code breaks unless it's necessary to to fix a bug. The current implementation conforms to the specification[1], and therefore the proposed change is not a bugfix. The Yale Book of Quotations quotes English judge Robert Megarry as follows: "Whereas in England all is permitted that is not expressly prohibited, it has been said that in Germany all is prohibited unless expressly permitted and in France all is permitted that is expressly prohibited. In the European Common Market no-one knows what is permitted and it all costs more." http://freakonomics.com/2009/10/29/quotes-uncovered-death-and-statistics/ Python, of course, follows the principle of English law. That's what we mean by "consenting adults". The rules about change are more flexible in the stdlib, but even there we get reports every release about breakage due to improvements in various modules. This is the language definition, so "if you can do it in vX.Y, it should do the same in vX.(Y+1)" is a strict rule.[2] Footnotes: [1] Assuming, as I do, that in PEP 3107 "expression" refers only to the syntax specification and does not at all imply adding a new expression type to the language. What is stored in __annotations__ is thus implied to be the object that is the value of the expression, following the precedent of initialization, and the general Pythonic approach of evaluating expressions when encountered. And that semantics is stated explicitly in PEP 3107. [2] The definition of "do the same" does not necessarily mean "produce identical output", eg, in the case of "dir()" in the bare interpreter with no imports.

Nick Coghlan writes:
This is a bit unfair to אלעזר, although it's been a long thread so I can understand why some of his ideas have gone missing. His proposals have gotten a bit incoherent because he has been answering all the different objections one by one rather than organizing things into a single design, but I think eventually he would organize it as follows: (1) Add __raw_annotations__ and save the thunked expressions there, whether as code objects or AST. (2) Turn __annotations__ into a property which evaluates (and memoizes?) the thunks and returns them. (First explicitly suggested by Alexander Belopol, I think.) He claims that (2) solves the backward compatibility problem, I don't have the knowledge to figure out whether it is that simple or not. It seems plausible to me, so I'd love to hear an explanation. New ideas like DBC would of course be supported by the new __raw_annotations__ since there's no backward compatibility issue there. I'm still -1 on the whole thing on the principle "although sometimes never is better than *right* now". I think the aClass = "aClass" trick described by Chris is perfectly serviceable to deal with the PEP 484 forward type reference issue. The "let's turn all the annotations into expressions" idea can be practically exploited with ast.parse(). I'm guessing a decorator could be used to provide __raw_annotations__ and __annotations__ per (1) and (2) above (although I'm not sure how to do it myself: copying __annotations__ to __raw_annotations__ and then propertizing __annotations__ could be a bit tricky, I guess).
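(A rough sketch of the decorator-based emulation gestured at above - this assumes annotations are written as strings, and exposes an explicit annotations() helper rather than truly propertizing __annotations__, which isn't possible on a plain function object:)

import ast

def lazy_annotations(func):
    # Stash the unevaluated form: parse string annotations into ASTs.
    func.__raw_annotations__ = {
        name: ast.parse(value, mode="eval") if isinstance(value, str) else value
        for name, value in func.__annotations__.items()
    }
    def annotations():
        # Evaluate on demand, in the function's global namespace.
        evaluated = {}
        for name, value in func.__raw_annotations__.items():
            if isinstance(value, ast.Expression):
                value = eval(compile(value, "<annotation>", "eval"), func.__globals__)
            evaluated[name] = value
        return evaluated
    func.annotations = annotations
    return func

@lazy_annotations
def fib(n: "int") -> "int":
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib.__raw_annotations__)   # ASTs, nothing evaluated yet
print(fib.annotations())         # {'n': <class 'int'>, 'return': <class 'int'>}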

Thank you Stephen. You have phrased my proposal better than I did. As for using quoted strings, the problems are:
1. The well-formedness of the expression is not checked by the compiler.
2. It is not naturally supported by syntax highlighters and IDEs. They can be made to support it, but most will not. Partly because
3. There is no natural way to distinguish quoted expressions from actual human-readable text (as in the begins library).
4. (My own taste): this is ugly and inconsistent, and there are 2 meaningless characters there :) (6 if multiline)

On Sun, Sep 25, 2016 at 8:42 PM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

On Sep 25, 2016 10:59 AM, "אלעזר" <elazarg@gmail.com> wrote:
2. It is not naturally supported by syntax highlighters and IDEs. They can be made to support it, but most will not.
This is a complete red herring. Having a highlight rule of "apply highlights in string annotations" is straightforward in modern editors. This is like arguing Python should do <whatever> because Notepad.exe doesn't do something smart with it.

On Sun, Sep 25, 2016 at 9:28 PM David Mertz <mertz@gnosis.cx> wrote:
Not that I think it's a killer argument, but why a red herring? Quick search does not find such an explicit option in Gedit, PyDev and yes, Notepad++.exe. It is not a common or default option. Having such a rule by default amounts to admitting that these are not essentially strings, and the quotes there are overloaded. It also means that actual strings are not understood as such, and are incorrectly highlighted. But please let's not delve into this: it is of some importance, but should not affect an actual decision. IDEs are more important. Renaming facilities do over-renaming or under-renaming because of this need to rename inside some strings, but not inside others. Similarly code search facilities, and warnings from IDEs about inlining variables. I have encountered real bugs caused by such an imperfect renaming (and I hope your answer is not "don't do renaming"). A prefix like code"foo()" might help of course, but it is not really used as a string. Elazar

On 26 September 2016 at 03:42, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
OK, that does indeed make more sense, and significantly reduces the scope for potential runtime compatibility breaks related to __annotations__ access. Instead, it changes the discussion to focus on the following main challenges: - the inconsistency introduced between annotations (lazily evaluated) and default arguments (eagerly evaluated) - the remaining compatibility breaks (depending on implementation details) - the runtime overhead of lazy evaluation - the debugging challenges of lazy evaluation The inconsistency argument is simply that people will be even more confused than they are today if default arguments are evaluated at definition time while annotations aren't. There is a lot of code out there that actively relies on eager evaluation of default arguments, so changing that is out of the question, which then provides a strong consistency argument in favour of keeping annotations eagerly evaluated as well. There would likely still be some compatibility breaks around name access in method annotation definitions, and compatibility would also break for any code that actually did expect to trigger a side-effect at definition time. This is a much smaller scope for breakage than breaking __annotations__ access, but we can't assume it won't affect anyone as there's a lot of code out there that we'd judge to be questionable from the point of view of maintainability and good design aesthetics that nevertheless still solves the problem the author was aiming to solve. The runtime memory overhead of lazy evaluation isn't trivial. Using a naive function based approach:
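(The measurement referred to here is elided; for illustration, it would be along these lines - exact numbers vary by CPython version and platform:)

import sys

lazy = lambda: int   # a per-annotation function object wrapping the expression
print(sys.getsizeof(lazy))   # the function object alone, typically well over 100 bytes
# versus today's behaviour, where __annotations__ just holds another
# reference to the already-existing int class.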
And that's only the function object itself - it's not counting all the other objects hanging off the function object like the attribute dictionary. A more limited thunk type could reduce that overhead, but it's still going to be larger in most cases than just storing the evaluation result. The impact on runtime speed overhead is less certain, but also likely to be a net negative - defining functions isn't particularly cheap (especially compared to literal references or a simple name lookup), and calling them if you actually access __annotations__ isn't going to be particularly cheap either. The debugging challenge is the same one that arises with any form of delayed evaluation: by default, the traceback you get will point you to the location where the delayed evaluation took place *not* the location where the flawed expression was found. That problem can be mitigated through an exception chaining design that references the likely location of the actual error, but it's never going to be as easy to figure out as cases where the traceback points directly at the code responsible for the problem. So I'm still -1 on the idea, but it's not as straightforward as the argument against the naive version of the proposal that also broke __annotations__ lookup. Cheers, Nick. P.S. As an illustration of that last point, the PEP 487 implementation currently makes problems with __set_name__ attribute definitions quite hard to figure out since the traceback points at the class definition header, rather than the offending descriptor assignment: http://bugs.python.org/issue28214 -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Thank you all. I think this thread is pretty much closed by now. I understand at least most of your concerns and I will take time to shape my idea. I wanted to note one last thing, though, regarding my claim that annotations are not actually standard expressions: Guido had once expressed his concerns regarding the performance hit of using cast(), since it is not easily (or at all) optimized away. This performance hit should not be there in the first place, if the distinction between annotations and evaluatable expressions were kept - i.e. by allowing the attachment of annotations to expressions (as I believe was proposed several times). Now, I understand that there are very good reasons not to allow it; keeping the language simple and familiar would be my first guess - but note how the semantics of the "main" language is hindered by the complexities of its specification-related syntactic subset, which is not warranted, in my opinion. If you want to specify things, the syntactic hit is unavoidable, but the semantic hit is not. (BTW why isn't it written cast[T](exp) ?) Thank you again for this discussion Elazar

Let's talk about lazy evaluation in a broader sense than just function annotations. If we had syntax for lazy evaluation -- let's call them thunks, after Algol's thunks -- then we could use them in annotations as well as elsewhere. But if we special case annotations only, the Zen has something to say about special cases. On Mon, Sep 26, 2016 at 02:57:36PM +1000, Nick Coghlan wrote: [...]
Default arguments are a good use-case for thunks. One of the most common gotchas in Python is early binding of function defaults:

def func(arg=[]):
    ...

Nine times out of ten, that's probably not what you want. Now, to avoid all doubt, I do not want to change function defaults to late binding. I've argued repeatedly on comp.lang.python and elsewhere that if a language only offers one of early binding or late binding, it should offer early binding as Python does. The reason is, given early binding, it is trivial to simulate something like late binding:

def func(arg=None):
    if arg is None:
        arg = []
    ...

but given late binding, it is ugly and inconvenient to get a poor substitute for early binding when that's what you want. So, please, let's not have a debate over the behaviour of function defaults. But what if we could have both? Suppose we use backticks `...` to make a thunk, then we could write:

def func(arg=`[]`):
    ...

to get the late binding result wanted. Are there other uses for thunks? Potentially, they could be used for Ruby-like code blocks:

result = function(arg1, arg2, block=```# triple backticks
    do_this()
    do_that()
    while condition:
        do_something_else()
    print('Done')
    ```, another_arg=1)

but then I'm not really sure what advantage code blocks have over functions.
Indeed. There are only (to my knowledge) only two places where Python delays evaluation of code: - functions (def statements and lambda expressions); - generator expressions; where the second can be considered to be syntactic sugar for a generator function (def with yield). Have I missed anything? In the same way that Haskell is fundamentally built on lazy evaluation, Python is fundamentally built on eager evaluation, and I don't think we should change that. Until now, the only way to delay the evaluation of code (other than the body of a function, of course) is to write it as a string, then pass it to eval/exec. Thunks offer an alternative for delayed evaluation that makes it easier for editors to apply syntax highlighting: don't apply it to ordinary strings, but do apply it to thunks. I must admit that I've loved the concept of thunks for years now, but I'm still looking for the killer use-case for them, the one clear justification for why Python should include them. - Late-bound function default arguments? Nice to have, but we already have a perfectly serviceable way to get the equivalent behaviour. - Code blocks? Maybe a Ruby programmer can explain why they're so important, but we have functions, including lambda. - Function annotations? I'm not convinced thunks are needed or desirable for annotations. - A better way to write code intended for delayed execution? Sounds interesting, but not critical. Maybe somebody else can think of the elusive killer use-case for thunks, because I've been pondering this question for many years now and I'm no closer to an answer.
It's not just published code. It's also one-off throw-away code, including code executed in the interactive interpreter then thrown away. It is really helpful to be able to monkey-patch or shadow builtins, insert some logging code or even a few print statements, or perhaps something that modifies a global variable, for debugging or to learn how something works. Could it be equally useful inside annotations? I expect so... complicated only by the fact that one needs to monkey-patch the *metaclass*, not the type itself. It may be that I'm completely off-base here and this is a stupid thing to do. But I say that until the community has more experience with annotations, we shouldn't rule it out. (Just to be clear: I'm mostly talking about interactive exploration of code, not production code. Python is not Ruby and we don't encourage heavy use of monkey-patching in production code. But it has its uses.)
This is better:

py> sys.getsizeof((lambda: "").__code__)
80
Nevertheless, an explicit thunk syntax will make this a matter of consenting adults: if you choose to shoot your foot off with a hard-to-debug thunk, you have nobody to blame but yourself. Or whoever wrote the library that you're using. *wink* -- Steve

On Mon, Sep 26, 2016 at 10:46:57PM +1000, Steven D'Aprano wrote:
Well, there's a use-case I have been pondering for a long while now which could be satisfied by this: enumerated generator displays. So suppose you have a composite boolean value, composed by the 'and' of many conditions (which all take long to compute), and you want to short-circuit. Let's take the following example.

valid = True
valid &= looks_like_emailaddress(username)
valid &= more_than_8_characters(password)
valid &= does_not_exist_in_database(username)
valid &= domain_name_of_emailaddress_has_mx_record(username)
... some more options ...

(I forgot the exact use-case, but I still remember the functionality I wanted, so bear with me). Of course, the above is not short-circuiting, so it would be replaced by

def check_valid(username, password):
    if not looks_like_emailaddress(username):
        return False
    if not more_than_8_characters(password):
        return False
    if not does_not_exist_in_database(username):
        return False
    if not domain_name_of_emailaddress_has_mx_record(username):
        return False
    ...
    return True

valid = check_valid()

or

valid = True\
    and looks_like_emailaddress(username)\
    and more_than_8_characters(password)\
    and does_not_exist_in_database(username)\
    and domain_name_of_emailaddress_has_mx_record(username)

But in all reality, I want to write something like:

valid = all(@@@
    looks_like_emailaddress(username),
    more_than_8_characters(password),
    does_not_exist_in_database(username),
    domain_name_of_emailaddress_has_mx_record(username),
@@@)

With `@@@` designating the beginning/ending of the enumerated generator display. Now, this is currently not possible, but if we had some kind of thunk syntax that would become possible, without needing an enumerated generator display.

However the problem I see with the concept of `thunk` is: When does it get un-thunked? In which of the following cases?
1. When getting an attribute on it?
2. When calling it? --> See 1. with `__call__`.
3. When subindexing it? --> See 1. with `__getitem__`.
4. When assigning it to a name? It shouldn't have to be un-thunked, I think.
5. When adding it to a list? No un-thunking should be necessary, I think.

However, the problem with thunks is (I think) that to make that happen either
- *all* objects need to include yet another level of redirection, or
- a thunk needs to get allocated the maximum size of the value it could possibly store. (But a `unicode` object could have an arbitrary size) or
- there needs to be some way to 'notify' objects holding the thunk that its value got updated. For a dict/list/tuple this could readily grow into O(n) behaviour when un-thunking a thunk. or
- any C-level functionality needs to learn how to deal with thunks. For instance, `Py_TYPE` would have to *resolve* the thunk, and then return the type of the value. or
- I'm running out of ideas here, but maybe creating a custom type object for each thunk that does pass-through to a wrapped item? Thunked objects would work *exactly* the same as normal objects, but at a (small) indirection for any action taken. Still, somehow `Py_TYPE` and `Py_SIZE` and any other macros would still have to force evaluation.

Kind regards, Sjoerd Job
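(For what it's worth, a sketch of how the short-circuiting version above can be spelled today, wrapping each check in a lambda so all() stops at the first failure; the check functions are the hypothetical ones from the example:)

checks = (
    lambda: looks_like_emailaddress(username),
    lambda: more_than_8_characters(password),
    lambda: does_not_exist_in_database(username),
    lambda: domain_name_of_emailaddress_has_mx_record(username),
)
valid = all(check() for check in checks)   # later checks never run once one returns False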

Hello everyone, this idea looks like something I have tried building already: https://github.com/llllllllll/lazy_python. This project implements a `thunk` class which builds up a deferred computation which is evaluated only when needed. One use case I have had for this project is building up a larger expression so that it may be simplified and then computed concurrently with dask: http://daisy-python.readthedocs.io/en/latest/. By building up a larger expression (and making the tree accessible) users have the ability to remove common subexpressions or remove intermediate objects. In numpy chained expressions often make lots of allocations which are quickly thrown away which is why projects like numexpr ( https://github.com/pydata/numexpr) can be such a serious speed up. These intermediates are required because the whole expression isn't known at the start so it must be evaluated as written. Things to consider about when to evaluate: 1. Functions which branch on their input need to know which branch to select. 2. Iteration is really hard to defer in a way that is efficient. lazy_python just eagerly evaluates at iteration time but builds thunks in the body. 3. Stateful operations like IO which normally have an implied order of operation now need some explicit ordering. Regarding the `Py_TYPE` change: I don't think that is correct unless we made a thunk have the same binary representation as the underlying object. A lot of code does a type check and then calls macros that act on the actual type like `PyTuple_GET_ITEM` so we cannot fool C functions very easily. On Mon, Sep 26, 2016 at 9:27 AM, Sjoerd Job Postmus <sjoerdjob@sjoerdjob.com
wrote:

You already know I want this for contracts etc. Here are some things that I consider important:

1. There should be some way to bind the names to function parameters, as in

@contract
def invert(x: `x != 0`) -> float:
    return 1 / x

@contract
def invertdiff(x: int, y: `x != y`) -> float:
    return 1 / (x-y)

2. For this and other reasons, the AST should be available. I think it can be a single AST per place in code, but it should be immutable.

3. Backticks are problematic because they cannot be nested. I suggest (name: <expression>) or ('name': expression). This name can be googled.

def compose(f: `such_that: pure(f)`, g: `such_that: pure(g)`):
    return lambda x: f(g(x))

4. I think it's a bad idea to use thunks as a DSL (different semantics than standard expressions), except in annotations and for specification purposes.

In short, I want this thing. But only for annotations, assertions, and possibly default arguments as an ad-hoc fix.

Elazar

On Mon, Sep 26, 2016 at 5:05 PM Joseph Jevnik <joejev@gmail.com> wrote:

On Sun, Sep 25, 2016 at 01:55:09AM +0000, אלעזר wrote:
1. Please consider disallowing the use of side effects of any kind in annotations,
That is *simply not possible* in Python. Actually, no, that's not quite correct. One way to prohibit side-effects would be to make all annotations string literals, and ONLY string literals. Or possibly bare names (assuming current semantics for local variable name lookup): def func(arg:'no possible side effects here') -> OrHere: ... But as soon as allow such things as union types and lists, then all bets are off: def func(arg:Sequence[list]): ... There is no way of prohibiting side effects in type(Sequence).__getitem__ once it is called. Nor would we want to. The ability to shadow or monkey-patch types for mocking, testing, debugging etc, including the ability to have them call print, or perform logging, is a feature beyond price. We don't need it often, but when we do, the ability to replace Sequence with a mock that may have side-effects is really useful.
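(To make that concrete, a small sketch - Logged is a hypothetical stand-in for a shadowed or mocked Sequence, showing that subscripting a type inside an annotation can run arbitrary code at definition time:)

class LoggedMeta(type):
    def __getitem__(cls, item):
        print("annotation subscript evaluated with", item)   # any side effect goes here
        return cls

class Logged(metaclass=LoggedMeta):
    pass

def func(arg: Logged[list]):
    pass
# The print fires while func is being defined - nothing can stop the
# annotation expression from doing whatever __getitem__ decides to do.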
It has been a feature since Python 3.0 that annotations are evaluated at runtime. And that means the possibility of side-effects. So, yes, it is already a feature. Even if you get the behaviour that you want, the absolute earliest it could happen would be after a deprecation period of at least one point release. That means: * 3.7 introduces a DeprecationWarning whenever you use annotations which aren't simple names or strings; * and possibly a __future__ import to give the new behaviour; * and 3.8 would be the earliest it could be mandatory. Forget about 3.6 -- that's already frozen apart from bug fixes, and this is not a bug.
and I got the feeling it is hardly controversial.
It is extremely controversial. The fact that you can say that it isn't suggests that you're not really paying attention to what we're saying. Even if what you ask for is easy (it isn't), or even possible, it still goes completely and utterly against the normal semantics of Python and the philosophy of the language. No, under normal circumstances nobody is going to write: def func(arg: mylist.append(value) or int): ... in production code. That's simply bad style. But we don't ban things just because they are bad style. Circumstances are not always normal, sometimes it is useful to use dirty hacks (but hopefully not in production code), and Python is not a B&D language where everything is prohibited unless explicitly allowed.
I really have no interest in wasting the time of anybody here.
And yet, despite receiving virtually no interest from any other person, you continue to loudly and frequently argue for this proposal. [...]
All expressions evaluate to a value. And all values in Python are objects. I don't understand what distinction you think you are making here. Are you suggesting that Python should gain some sort of values which aren't objects?
Wrong. Not always. The proof is that Python exists. Contracts, types, assertions etc in Python *are* written in Python. That's the end of the story. You cannot argue that "contracts are written in a different language" because that is untrue. Contracts are written in Python, and we wouldn't have it any other way.
The forward reference problem still exists in languages where type declarations are a separate language, e.g. Pascal, C++, Java, etc. http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4754974 http://stackoverflow.com/questions/951234/forward-declaration-of-nested-type... etc. There are many ways around it. One way is to make the language so simple that forward declarations aren't relevant. Another is to make multiple passes over the source code. Another is to introduce an explicit "forward" declaration, as in some dialects of Pascal. Python uses strings.
I believe the last point has a very good reason, as explained later: it is an interpretation of a different language,
But it *isn't* such a thing, nor should it be.
foreign to the interpreter, although sometimes close enough to be useful.
Sometimes close enough to be useful. Does that mean it is usually useless? *wink*
It is of course well formed, so the considerations are not really security-related.
You've talked about eval'ing the contents of __raw_annotations__. That means if somebody can fool you into storing arbitrary values into __raw_annotations__, then get you to call annotations() or use inspect, they can execute arbitrary code. How is this not a security concern? It might be hard to exploit, since it requires the victim to do something like:

myfunc.__raw_annotations__['arg'] = something_untrusted

but if exploited, the consequences are major: full eval of arbitrary code. In comparison, the only similar threat with annotations today is if the victim is fooled into building a string containing a def with annotations, then passing it to exec:

annot = something_untrusted
code = """def func(arg: %s):
    ...
    """ % annot
exec(code)

but if you're using exec on an untrusted string you have already lost. So annotations as they exist now aren't adding any new vulnerabilities. Still, the important thing here is not the (hard to exploit) potential vulnerability, but the fact that your proposal would lead to a massive increase in the complexity of the language (a whole new compiler/interpreter for the second, types-only, mini-language) and an equally major *decrease* in useful functionality.

Have I mentioned that I'm against this? If not, I'm against it.

-- Steve

On Thu, Sep 22, 2016 at 11:42 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I would say this affects a "rare class here and there." Almost all typing will be with things defined in the `typing` module (or built-ins). I guess once in a while we'll see e.g. `Sequence[CustomThing]`, but it will be uncommon for that typing involving `CustomThing` to be within CustomThing itself (well, unless you use much more recursion than Python encourages).
-1 on complicating the simple Python model that expressions are evaluated when they are reached.
I think there is a decent argument for a more general concept of macros, or symbols, or simpler delayed evaluation than lambda for Python in general. I see places where this would be very nice for Pandas, for example, and for Dask (I work with the developers of both of those projects). In such a hypothetical future world we might come to allow, e.g. `Sequence[#CustomThing]` where some general lazy facility or indirection is indicated by the '#' (just a placeholder for this comment, not a proposal). But if that comes about, it should be available everywhere, not only in annotations. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Thu, Sep 22, 2016 at 10:29 PM David Mertz <mertz@gnosis.cx> wrote:
I think we're talking about different things here. I just referred to the common need to use the name of the current class in type annotations:

class A:
    def add(self, other: A) -> A:
        ...
I generally agree, but this future world must be very far and has many consequences, whereas the story of annotations is special in that it's not actually an expression, to the reader. Elazar

On Thu, Sep 22, 2016 at 12:35 PM, אלעזר <elazarg@gmail.com> wrote:
The CPython developers (of whom I'm not one, but I've followed them closely for 18 years) place a high value on simplicity in the parser and interpreter. Adding a new custom type of thing that is an "annotation object" would be a special case with a high burden to show its utility. My feeling is that this burden is actually lower for a new "delayed eval object" that might conceivably be added at a syntax level. In some sense, this would add just as much complexity as a new annotation object, but it would be something that applies many places and hence perhaps be worth the added complexity. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Thu, Sep 22, 2016 at 10:45 PM David Mertz <mertz@gnosis.cx> wrote: then so be it; I can't claim I know better. I only speculate that it does not necessarily require a new custom type. A delayed eval object will be very useful for initializers, for the very reason that the current behavior is surprising. -- This made me think about Steven's argument above: it is not true that expressions are evaluated when they are encountered, since

x = lambda: print(1)

prints nothing. So a colon before an expression hints at delayed evaluation. This includes annotations and lambda.

Elazar

On Thu, Sep 22, 2016 at 12:59 PM, אלעזר <elazarg@gmail.com> wrote:
I don't mean a runtime type here (necessarily), but rather a new type for the parser. I.e. transform this actual expression into some sort of delayed expression when parsing. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Thu, Sep 22, 2016 at 11:02 PM David Mertz <mertz@gnosis.cx> wrote:
Just as a demonstration, the parser can transform `EXP` into `lambda: EXP` - and that's it. It will not solve everything (e.g. error messages and .__annotations__ access as Alexander says), but it demonstrates the fact that the change need not be so deep at all. Elazar
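(For illustration, what that transformation amounts to if written out by hand today - MyClass stands in for a not-yet-defined name:)

def f(arg: (lambda: MyClass)):
    ...

class MyClass:
    ...

annotation = f.__annotations__['arg']   # a zero-argument function, not a class
print(annotation())                     # evaluating it now resolves MyClass just fine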

On Thu, Sep 22, 2016 at 4:29 PM, אלעזר <elazarg@gmail.com> wrote:
On second thought, why can't the parser simply replace A with 'A' in annotations that appear in the body of class A? This will only break somewhat pathological code that defines A before it is (re)defined by the class statement.

On Fri, Sep 23, 2016 at 6:48 AM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On third thought, this entire feature can be implemented in the metaclass by injecting A = 'A' into the dict in __prepare__.
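A rough sketch of that __prepare__ idea, relying on the PEP 484 rule that the string form of a class name is an acceptable forward reference (the AutoName metaclass name is made up for illustration):

    class AutoName(type):
        @classmethod
        def __prepare__(mcls, name, bases, **kwds):
            # Seed the class body namespace so the class's own name resolves
            # to its string form while the body is executing.
            return {name: name}

    class A(metaclass=AutoName):
        def add(self, other: A) -> A:   # "A" here is the injected string 'A'
            ...

    print(A.add.__annotations__)   # {'other': 'A', 'return': 'A'}
    print(A.A)                     # the injected name lingers as a class attribute: 'A'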
That would be the easiest, and least magical, solution. It simply means that the name of the current class is available as a pseudo-reference to itself, for typing purposes only. It parallels function recursion, which is done using the function's name:

    # Recursion in functions
    def spam():
        return spam()

    # Recursion in type annotations
    class Spam:
        def make_spam(self) -> Spam:
            return self

Clean and simple. And there's less magic here than super() - a lot less. It does mean that Spam.Spam == "Spam" forever afterwards, but I doubt that's going to break anything. It'd be just like __name__, except that currently, Spam.__name__ is set afterwards, so it's not available during class definition (and the module's name will be used instead). ChrisA

On Fri, Sep 23, 2016 at 12:18 AM Chris Angelico <rosuav@gmail.com> wrote:
I just note that it *is* surprising, for most users, that you can't yet be sure this really is a recursion. So if you want a recursion you can trust, you should write:

    # spam:
    def spam():
        def spam():
            return spam()
        return spam()

Elazar

On Fri, Sep 23, 2016 at 7:33 AM, אלעזר <elazarg@gmail.com> wrote:
Only surprising for people who want it _guaranteed_. It's the exact same problem as this:

    def helper(x):
        ...

    def spaminate(x, y):
        helper(x)
        helper(y)

How do you know that a replacement helper hasn't been injected? You don't... but you trust that people aren't normally going to do that, and if they do, they're taking responsibility (maybe they're mocking helper for testing). ChrisA

On Thu, Sep 22, 2016 at 09:33:58PM +0000, אלעזר wrote:
Who are these "most users" of which you speak? Fortran programmers? C programmers? *Beginner* Python programmers? You should specify who you are referring about, rather than claim "most" without evidence. Experienced Python programmers should realise that recursion in Python is implemented by name lookup, like all other function calls, so if you rebind the name "spam" to something else, the function will call something else. This is no different from any other form of function call, including calls to built-ins. If you rebind or shadow a name, you will change which object is called. That shouldn't be a surprise, whether it involves recursion or not.
*shrug* But if I do that, then I make it difficult or impossible to monkey-patch spam on the fly, for instance in the interactive interpreter. I wouldn't do it in production, but for interactive exploratory work, it is astonishing how often monkey-patching comes in handy. Just yesterday I played around with some code where I monkey-patched the built-in iter() so I could get a better idea of how the code worked. The straightforward and simple way of writing a recursive spam() function surprises beginners, but they might go years or their entire career without running into a situation where they are caught by surprise. After all, it is rare for production code to rename functions, and rarer still to do it to recursive functions:

    func = spam
    spam = something_else()
    func()  # why does the recursion not work???

In production code, that sort of thing almost never happens. On the other hand, your clever trick for preventing that surprise will surprise *me* and other experienced Pythonistas who know how recursion and function calls work in Python and expect to be able to take advantage of that when and if needed. In other words, in order to protect beginners from accidents which are extremely rare, you will take away power from experienced programmers who are more likely to want to make use of that power. I don't think that's a good tradeoff. For the avoidance of doubt: we're all adults here. If you personally want to write your recursive functions the "trusted" way, go right ahead. It will make them just a little bit less useful to experts, add an insignificant amount of safety, and require a bit more work on your part. But it's your code, and I don't intend to tell you not to do this. In the meantime, I'll usually just write my recursive functions the old-fashioned normal way. -- Steve

On Fri, Sep 23, 2016 at 12:35 PM, Steven D'Aprano <steve@pearwood.info> wrote:
There's actually one very common technique involving rebinding functions.

    @count_calls
    def mergesort(lst):
        mid = len(lst) // 2
        if not mid:
            return lst
        return merge(mergesort(lst[:mid]), mergesort(lst[mid:]))

*Obviously* this is recursive. But if you used some magic that said "call the function that's currently being called", you'd be bypassing the count_calls decoration (which would presumably work by creating a wrapper function). Yes, it may defeat some potential optimizations (eg tail recursion optimization), but it enables all this flexibility. So we _need_ to have this kind of rebind available, and not just for experts.
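A minimal count_calls along those lines (illustrative only, not a standard decorator, and using a short factorial instead of mergesort), showing that the recursive calls really do go through the rebound name:

    import functools

    def count_calls(func):
        # Rebinds the function's name to this wrapper; the recursive calls in
        # the body look the name up at call time, so they are counted too.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wrapper.calls += 1
            return func(*args, **kwargs)
        wrapper.calls = 0
        return wrapper

    @count_calls
    def fact(n):
        return 1 if n < 2 else n * fact(n - 1)

    fact(5)
    print(fact.calls)   # 5 -- every recursive call went through the wrapper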
In the meantime, I'll usually just write my recursive functions the old-fashioned normal way.
As will I. Of course, people are welcome to work differently, just as long as I never have to write tests for their code, or refactor anything into a decorator, or anything like that. I want the POWAH!!!!! :) ChrisA

On Fri, Sep 23, 2016 at 5:54 AM Chris Angelico <rosuav@gmail.com> wrote:
I think you are mixing levels of abstraction because you know how this is implemented. The user only sees "A function named mergesort decorated by count_calls". She does not see "A function named mergesort passed to a higher order function named count_calls whose result is bound into the variable mergesort". Even if the latter is exactly what happens, the former is, declaratively, the more accurate description of the intent. Ideally, the recursive calls to mergesort would bind to this _decorated_ function, not to the mutable global variable. Again, the argument that it would be very hard to implement it in a different way, or that it would break things, is a very strong argument, and I am not contesting it.
As will I, simply because the old-fashioned way is more readable. And I will sadly accept the fact that I can't be 100% sure which function is called at runtime. But _some_ people (medium-level programmers, Steven, whose main language is probably not Python) will not even know this is the case. Tests are important and could have been worked into the system differently (through inspect, or by using a special import which allows monkey-patching). I can't see why the ability to test must remain in production. Elazar

אלעזר writes:
And the dinosaurs will have returned by independent evolution by the time it matters to them (unless it's a deliberate attack, in which case people at that level would be toast anyway). But I think you're completely missing what people are trying to tell you. You shouldn't be so concerned with refuting their arguments because it doesn't matter. No matter how many points you amass for technique, you're going to get creamed on style points anyway. It's like this:

(1) Python is a "consenting adults" language, and that is presumed by its development culture. The goal is not to stop people from creating "functions that look like recursions but aren't" on purpose; it's to make it easy for them to write recursive functions if they want to. From your example, that goal is obviously satisfied. Nobody who matters wants to go farther than that in Python. The reason one can create "functions that look like recursions but aren't" is because another of Python's goals is to ensure that all things -- specifically including functions -- are objects that can be manipulated "the same way" where appropriate -- in this case, saving off the original function object somewhere then rebinding the original name to something else.[1] Granted, we don't go so far as Lisp where expressions are lists that you can manipulate like any other list, but aside from the fact that the code itself is an opaque object, functions are no different from other objects. Even builtins:

    Python 3.6.0a4 (default, Sep 3 2016, 19:21:32)
    >>> def help(*ignored, **paid_no_attention):
    ...     print("Ouch, you just shot off your foot!")
    ...
    >>> help(help)
    Ouch, you just shot off your foot!
    >>>

Shooting off your opposite extremity by redefining builtin classes is left as an exercise for the reader. All of this is a matter of the general attitude of pragmatism and bias toward simplicity of implementation (both enshrined in the Zen of Python).

(2) You keep talking about others being lost in terminology, but in the context of Python discussions, you have a really big problem yourself. You use the phrase "just an annotation" as though that means something, but there is nothing like a "just an <anything>" in Python discourse, not in the sense that "once we introduce <anything>s, they can be anything we want". The Language Reference defines what things are possible, and truly new ones are rarely added. This is deliberate. Another design principle is Occam's Razor, here applied as "new kinds of thing shall not spring up like whiskers on Barry's chin." Yes, function annotations need new syntax and so are a new kind of thing to that extent. *Their values don't need to be,* and even the annotations themselves are implemented in the preferred way for "new things" (a dunder on an existing type). Since it's new syntax, it's language-level, and so the values are going to be something already defined in the language reference. "Expression resolving to object to be saved in an attribute on the function" seems to be as close to "anything you want" as you're gonna get without a new kind of thing.

(3) Python has a very simple model of expressions. The compiler turns them into code. The interpreter executes that code, except in the case where it is "quoted" by the "def" or "lambda" keywords, in which case it's stored in an object (and in the case of "def", registered in a namespace).
As Nick admits, you could indeed argue that initializations and annotation values *could* consistently be turned into "thunks" (stored code objects, we already have those) in attributes on the function object. But (1) that's an extension of the model (admittedly slight since functions, which already do that for their bodies, are involved -- but see Nick's reply for the hidden difficulties due to normal handling of namespaces in Python), and (2) it's a clear pessimization in the many cases where those values are immutable or very rarely mutated, and the use case (occasional) of keeping state in mutable values. The thunk approach is more complex, for rather small benefit. Re "small benefit", IMHO YMMV, but at least with initialization Guido is on record saying it's the RightThang[tm] (despite a propensity of new users to write buggy initializations). (4) Chris argues that "compile to thunk" is incoherent, that expressions in function bodies are no different than anywhere else -- they're evaluated when flow of control reaches them. AFAICS that *still* doesn't rule out having the compiler recognize the syntax and produce code that returns thunks instead of ordinary values, but Chris's point makes that seem way too magical to me. (5) This points up the fact that Python is thoroughly dynamic. It's not just that types adhere to objects rather than variables, but the whole attitude toward language design and implementation is. A variable not defined because it's on the path not taken, or even a function: they just don't exist as far as the interpreter is concerned -- there's no way to find them from Python. That's not true in say C: if you have a powerful enough debugger, you can even call a function defined, but never referenced, in the code. So while we'd be happy for people familiar with "statically-typed languages" to enjoy the benefits of using Python for some of their work, we can't help them if they can't shake off that attitude when using Python. Making things seem intuitive (which here translates to "familiar", as usual) to them is just misleading. Python doesn't work that way, and often enough, that matters. (6) As you point out: of course, thunks are more general than values (in fact, without instructions that move them around, in computers a value is just a tree falling in a forest with noone to hear). But maximum generality is not necessarily an important goal, even if it makes some things prettier. Allow me to quote the late Saunders Mac Lane: "[G]ood general theory does not search for the maximum generality, but for the right generality." (7) Re Nick's comment about backward compatibility on the high bar having a G degree of difficulty, I'm sure you don't disagree with the principle of avoiding compatibility breaks. But while that's probably the argument that defeats this proposal here, I think that even looking forward from the time before releasing Python 3.0, the decision would be the same. That is, I think the decision to go with the simpler "evaluate to object" model was one of the right decisions at that time, for the reasons above. Your proposal of "evaluate to thunk" (possibly incorporating the property-based magic Alexander proposed) might be right *too*, but it's far from obviously better to me. I see nothing there that would be likely to have dissuaded the authors of PEP 3107, or Guido when he designed default initialization, from "evaluate to object". If you don't like that philosophy, or somehow don't think it applies here, keep trying, you may have a point. 
But in this thread, IMO you're trying to ski up a slope that's barely able to hold snow, and wasting everybody's time. And even if I'm wrong about wasting time with the feature, you'll be way more persuasive if you argue in terms of Python as it is designed (mostly deliberately so), which is the way most of us mostly like it. Although we do change our mind every 18 months. :-) Footnotes: [1] For maximum humor, rebind it to a different recursive function!

Thank you all for your feedback. I will try to respond to it in a way that will not waste your time, but to do that I still need an example for the strongest issue raised - backwards compatibility. It is not just the mere change that is incompatible, since _any_ visible change is incompatible in some way, or otherwise it wasn't visible. Again, I assume ".__annotations__" access evaluates them in the original context. I couldn't find any useful example yet. Elazar On Sat, Sep 24, 2016 at 22:07, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

אלעזר writes:
But to do that I still need an example for the strongest issue raised - backwards compatibility.
Given that Nick has misgivings about the ease of actually implementing this, I think you need to present an implementation, and then we talk about how closely it approximates backward compatibility.
Again, I assume ".__annotations__" access evaluates them in the original context.
You don't get to assume that, without an implementation that shows how you work around the "def can't see names of lambda arguments" issue. At the very least you need to define "evaluate in original context" operationally -- the "original context" as I understand it is what is visible in the current implementation, but that is clearly not what you mean. Of course an implementation would serve to define that.

On Thu, Sep 22, 2016 at 12:35 PM, אלעזר <elazarg@gmail.com> wrote:
Yeah, I find the need for using the string "A" here a wart. Rather than change the entire semantics of annotations, it feels like a placeholder for this meaning would be better. E.g.:

    class A:
        def __add__(self, other: CLS) -> CLS: ...

A static checker could do the magic of recognizing that special name easily enough (no harder than recognizing the quoted string). At runtime 'CLS' could either just be a singleton with no other behavior... or perhaps it could be some sort of magic introspection object. It's more verbose, but you can also spell it now as:

    class A:
        def __add__(self, other: type(self)) -> type(self): ...

That's a little ugly, but it expresses the semantics we want.

On 22 September 2016 at 22:02, אלעזר <elazarg@gmail.com> wrote:
Concerning the __add__ method, I think a more typical type for it is:

    T = TypeVar('T', bound='A')

    class A:
        def __add__(self: T, other: T) -> T: ...

There is a plan to support this in one of next releases of mypy. In general I think it is a bit early to judge what would be the best solution for forward references (there are related issues, like performance drop, etc.). More evidence is needed to decide a way forward. -- Ivan
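A small self-contained sketch of why the bound TypeVar is more precise than annotating with A directly - the difference is in what a checker such as mypy infers; nothing changes at runtime:

    from typing import TypeVar

    T = TypeVar('T', bound='A')

    class A:
        def __add__(self: T, other: T) -> T:
            return self

    class B(A):
        pass

    b = B() + B()   # a checker infers B here; with "-> A" the result would be widened to A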

On Fri, Sep 23, 2016 at 12:05 AM Ivan Levkivskyi <levkivskyi@gmail.com> wrote: the string there. Not that I'm saying it's a bad solution, but it fits pyi files more than average-programmer-code.
The problem with waiting for more evidence is that more code will break if the change requires such breakage. At least side effects in annotation expressions should be "deprecated", with no guarantee of when they happen, how many times, etc. Elazar

On 23 September 2016 at 15:50, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Same answer as with any other circular dependency: the code smell is the circular dependency itself, not the awkwardness of the syntax for spelling it. If the string based "circular reference here!" spelling really bothers you, refactor to eliminate the circularity (e.g. by extracting a base class or an implementation independent interface definition), rather than advocating to make the spelling less obnoxious. The difference between that and the "methods referring to the class they're defined in" case is that it's likely to be pretty normal to want to do the latter, so it may prove worthwhile to provide a cleaner standard spelling for it. The counter-argument is the general circularity one above: do you *really* need instances of the particular class being defined? Or is there a more permissive interface based type signature you could specify instead? Or perhaps no type signature at all, and let ducktyping sort it out implicitly at runtime? Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
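As one concrete reading of the "extract a base class" suggestion, a mutually-referential pair can often be annotated against a shared base instead (all names here are purely illustrative):

    class Linkable:
        pass

    class Left(Linkable):
        def attach(self, other: Linkable) -> None:
            self.peer = other

    class Right(Linkable):
        def attach(self, other: Linkable) -> None:
            self.peer = other

Neither class body needs to name the other, so the forward-reference problem never arises.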

On Friday, September 23, 2016 at 2:23:58 AM UTC-4, Nick Coghlan wrote:
I agree that circularity should in general be avoided, but it's not always possible or elegant to do that. Sometimes you really need two classes to refer to each other. In that case, why not expose your placeholder idea to the user via a library? You have one function that generates placeholder singletons (generate_placeholder()), and another function that walks a class object and replaces a placeholder with a given value (replace_placeholder(placeholder, cls, value)). Best, Neil

On 27 September 2016 at 17:29, Neil Girdhar <mistersheik@gmail.com> wrote:
Because the general case is already covered by using a quoted string instead of a name reference. "I don't like using strings to denote delayed evaluation" isn't a compelling argument, which is why alternative ideas have to offer some other significant benefit, or else be incredibly simple both to implement and to explain. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Sep 27, 2016 at 5:01 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
My motivation for something other than quoted strings is that there are other instances of circular dependencies. Currently, when I am forced into a circular dependency, I import the later class in the member functions of the first:

    # module x
    class X:
        def f(self):
            from y import Y
            # do something with Y

    # module y
    class Y:
        pass

That's not ideal and I don't see how to extend this solution to use of "y" in class level definitions. Best, Neil

Neil Girdhar writes:
Why not just expose it through a simple assignment? https://mail.python.org/pipermail/python-ideas/2016-September/042563.html Note that this also works for typechecking in PEP 484 checkers that allow forward reference via the stringified class name. See also https://mail.python.org/pipermail/python-ideas/2016-September/042544.html, which would allow eliding the assignment, but pollutes the class namespace. "Simple is better than complex." This feature is still looking for a persuasive use that needs it, and not something simpler. Steve

I don't understand why that would work when this clearly doesn't:

    Mutual2 = "Mutual2"  # Pre-declare Mutual2

    class Mutual1:
        def spam(self, x=Mutual2):
            print(type(x))

    class Mutual2:
        def spam(self):
            pass

    Mutual1().spam()

prints <class 'str'> rather than <class 'type'>. On Tue, Sep 27, 2016 at 6:20 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

On Tue, Sep 27, 2016 at 11:54:40AM +0000, Neil Girdhar <mistersheik@gmail.com> wrote:
Try this:

    class Mutual1:
        def spam(self, x=None):
            if x is None:
                x = Mutual2
            print(type(x))

    class Mutual2:
        def spam(self):
            pass

    Mutual1().spam()

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On 27 September 2016 at 13:46, Neil Girdhar <mistersheik@gmail.com> wrote:
Yes, I understand that, but I don't see how that would help at all with annotations. Aren't annotations also evaluated at "compile time"?
Yes, but a string whose value is a class name is treated as being the same annotation (i.e., meaning the same) as the class itself. Paul

On 27 September 2016 at 22:46, Neil Girdhar <mistersheik@gmail.com> wrote:
Yes, I understand that, but I don't see how that would help at all with annotations. Aren't annotations also evaluated at "compile time"?
This thread isn't about circular references in general, just circular references in the context of type hinting. For type hinting purposes, it already doesn't matter whether you use a variable name to refer to a type or a quoted string literal, as the typechecker ignores the quotation marks (and this is mandated by PEP 484). For runtime annotation use, the difference is visible, but the only required runtime behaviours for the typechecking use case are "doesn't throw an exception" and "doesn't take a prohibitive amount of time to evaluate when the function is defined". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
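For the runtime-annotation side, typing.get_type_hints() is the standard library way to resolve those string forward references back into objects when they are actually needed:

    import typing

    class Node:
        def add_child(self, child: 'Node') -> None:
            self.child = child

    # The raw annotation is the string 'Node'; get_type_hints() evaluates it
    # in the defining module's namespace and returns the class itself.
    print(typing.get_type_hints(Node.add_child))
    # {'child': <class '__main__.Node'>, 'return': <class 'NoneType'>}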

Doh! Yes, of course 'self' is only a scoped name within the body of the method, not in the signature. On Thu, Sep 22, 2016 at 1:02 PM, Alexander Belopolsky < alexander.belopolsky@gmail.com> wrote:

On 23 September 2016 at 05:58, David Mertz <mertz@gnosis.cx> wrote:
That doesn't work, as "self" hasn't been bound yet when the annotations are evaluated, just like A hasn't been bound yet (since it doesn't exist until *after* the class body finishes executing). As others have noted, the general idea of allowing either a placeholder name or the class name to refer to a suitable type annotation is fine, though - that would be a matter of implicitly injecting that name into the class namespace after calling __prepare__, and ensuring the compiler is aware of that behaviour, just as we inject __class__ as a nonlocal reference into method bodies that reference "super" or "__class__". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 23 September 2016 at 12:05, אלעזר <elazarg@gmail.com> wrote:
Right now? No - you'll get a name error on the "A", just as you would if you tried to reference it as a default argument:
And that's the problem with using the class name in method annotations in the class body: they're evaluated eagerly, so they'd fail at runtime, even if the typecheckers were updated to understand them. Rather than switching annotations to being evaluated lazily in the general case, one of the solutions being suggested is that *by default*, the class name could implicitly be bound in the body of the class definition to some useful placeholder, which can already be done explicitly today (see the sketch below). Since method bodies don't see class level name bindings (by design), such an approach would have the effect of "A" referring to the placeholder in the class body (including for annotations and default arguments), but to the class itself in method bodies. I don't think this is an urgent problem (since the "A"-as-a-string spelling works today without any runtime changes), but it's worth keeping an eye on as folks gain more experience with annotations and the factors affecting their readability. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
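Spelled out, the explicit version of that placeholder binding looks something like this (the particular placeholder value is illustrative, not a recommendation):

    class A:
        A = 'A'   # explicit placeholder for the not-yet-created class

        def add(self, other: A) -> A:   # the annotations see the placeholder string 'A'
            return A()                  # the method body, run later, sees the real class

    print(A.add.__annotations__)   # {'other': 'A', 'return': 'A'}
    print(A.A)                     # the placeholder lingers as a class attribute: 'A'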

David Mertz wrote:
I think that depends on what kind of software you're writing. Anything involving any kind of trees or graphs will have classes that refer to themselves or each other.
(well, unless you use much more recursion than Python encourages).
Recursive data structures don't necessarily imply recursive code to process them, although recursion is often the most natural way to write that code. -- Greg

On Thu, Sep 22, 2016 at 9:43 PM Steven D'Aprano <steve@pearwood.info> wrote:
Just because you call it "expression", when for most purposes it isn't - it is an annotation. "Expression" is something that you need its value right now, and "annotation" is something that, well, annotates the code you see right now. Terminology is not the important thing, but that seems to be the basis your argument here.
I was thinking about the former, but yeah, uncovered code will fail at runtime, possibly in production, for *no* real reason. I do not claim that this is common, but it is definitely unnecessary - unlike initialization expressions. (which I would have liked to see delayed too, but I can understand the reasons why this is strongly opposed; it *is* an expression. Sadly bugs are *much* more common there).
And of course the fact that I use annotated code does not necessarily mean I also use type checkers.
You would also complicate the introspection of annotations. With
This argument was partially answered by Alexander before. Generally, introspection *libraries* will be a tiny bit more complicated. Introspection user code will not. And people that write introspection code *must* understand the nitty-gritty details of the language, whereas people that read and write regular code need not.
I don't know any concrete examples. Can you give any? Do these examples use side effects in annotation evaluation? Lastly, do *you* consider it a good idea, one that should be accounted for?
It's just another irritating inconvenience making the write-test cycle longer for no obvious reason (at least from the perspective of the user).
It is to me, but that's only personal taste.
I don't understand this answer at all. I am pretty familiar with Python - not as familiar as most of the people on this list, but probably no less than anyone I know in person (sadly so). And this behavior still surprises me. It definitely surprises people coming from a statically-typed background.
Very little. And to quote Frank Miller, “An old man dies, a little girl lives. Fair trade.” Elazar

On Thu, Sep 22, 2016 at 07:21:18PM +0000, אלעזר wrote:
It is *both*. It's an expression, because it's not a statement or a block. You cannot write: def func(arg1: while flag: sleep(1), arg2: raise ValueError): ... because the annotation must be a legal Python expression, not a code block or a statement. It's an annotation because that's the specific *purpose* of the expression in that context. As an analogy: would you argue that it is wrong to call the for-loop iterable an expression? for <target-list> in <expression>: block I trust that you understand that the loop iterable can be any expression that evaluates to an iterable. Well, annotations can be any expression that evaluates to anything at all, but for the purposes of type checking, are expected to evaluate to a string or a type object. In the case of function annotations, remember that they can be any legal Python expression. They're not even guaranteed to be type annotations. Guido has expressed a strong preference that they are only used as type annotations, but he hasn't yet banned other uses (and I hope he doesn't), so any "solution" for a type annotation problem must not break other uses.
Right. In the case of Python, function annotations **do** have a runtime effect: the expressions are evaluated, and the evaluated results are assigned in function.__annotations__ and made available for runtime introspection. Don't think that function annotations are **only** for the static type checker. Python is a much richer language than that!
Unnecessary?

    class MyClass:
        pass

    def function(arg: MyCalss):
        ...

I want to see an immediate NameError here, thank you very much, even if I'm not running a static checker. I don't want to have to manually call:

    function.__annotations__['arg']()

to see whether or not the annotation is valid. I accept that using strings as forward annotations is not a foolproof solution either:

    def function(arg: 'MyCalss'):
        ...

but let's not jump into a "fix" that actually makes things worse.
MyClass doesn't exist at that point, so it is an invalid annotation.
Not to the old man, and especially not if the little girl is a psychopath who grows up to become a mass murdering totalitarian dictator. -- Steve

On 23 September 2016 at 13:06, Steven D'Aprano <steve@pearwood.info> wrote:
If folks are after a simple non-type-checking related example of annotation usage, the "begins" CLI library is a decent one: https://pypi.python.org/pypi/begins That lets you supply command line help for parameters as annotations: ============ In Python3, any function annotations for a parameter become the command line option help. For example:
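A function roughly like the following (a sketch adapted from the begins README; the @begin.start entry point is the library's documented hook, but treat the exact spelling as a best-effort reconstruction rather than a verbatim quote) produces the help shown below:

    import begin

    @begin.start
    def run(name: "What, is your name?",
            quest: "What, is your quest?",
            colour: "What, is your favourite colour?"):
        # The parameter annotations above become the --help text for each option.
        pass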
Will generate command help like:

    usage: holygrail_py3.py [-h] -n NAME -q QUEST -c COLOUR

    optional arguments:
      -h, --help            show this help message and exit
      -n NAME, --name NAME  What, is your name?
      -q QUEST, --quest QUEST
                            What, is your quest?
      -c COLOUR, --colour COLOUR
                            What, is your favourite colour?

============ It's not a substitute for something like click or argparse when it comes to more complex argument parsing, but it's a good example of the kind of simple pseudo-DSL folks have long been able to create with annotations independently of the type hinting use case. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Sep 23, 2016 at 6:24 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
That's a very nice use, and I was wrong - I did know it; I've found it not long ago when I wanted to implement it myself... And guess what? It does not require eager evaluation _at all_. No decorator-helped-annotation mechanism requires eager evaluation built into the language. Lazy evaluation is more general than eager, in that it can always be forced (and not the other way around).

    def eager_annotation(f):
        f.__annotations__ = {k: v() for k, v in f.__annotations__.items()}
        return f

Use @eager_annotation wherever you like, or collapse it into other decorators. You don't need @eager_annotation for type annotations, or any other form of annotation without runtime semantics. On the other hand - if you do want side effects in this function's annotations, well, there had better be some nice big @EAGER! decorator above it. Elazar
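For concreteness, here is how a decorator like eager_annotation above could be used, assuming the (hypothetical) convention that annotations are written as thunks:

    @eager_annotation
    def double(x: (lambda: int)) -> (lambda: int):
        return 2 * x

    # The decorator forced the thunks at definition time:
    print(double.__annotations__)   # {'x': <class 'int'>, 'return': <class 'int'>}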

On 23 September 2016 at 20:31, אלעזר <elazarg@gmail.com> wrote:
The problem it poses for your proposal isn't that a library like begins couldn't be updated to work with lazy annotations (as you say, it clearly could be), it's that it demonstrates the idea of switching to lazy annotations involves a language level *compatibility break* for a feature that has been around and in use for almost 8 years now, and those need incredibly strong justifications. While I personally have some sympathy for the perspective that using strings for forward references in type hints feels a bit clunky, it still doesn't come close to reaching that deliberately high bar. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Sep 23, 2016 at 6:06 AM Steven D'Aprano <steve@pearwood.info> wrote:
Did you just use a false-trichotomy argument? :)
because the annotation must be a legal Python expression, not a code
block or a statement.
This is the situation I'm asking to change
It's an annotation because that's the specific *purpose* of the expression in that context.
Exactly! Ergo, this is an annotation.
for-loop iterable is an expression, evaluated at runtime, _for_ the resulting value to be used in computation. A perfectly standard expression. Nothing fancy.
Must *allow* other use cases. My proposal allows them: just evaluate the annotations at the time of their use, instead of at definition time.
function.__annotations__ can have the delayed value, be it a lambda, ast or string. It can also be computed at the time of access as suggested earlier.
A few things to note here:

A. IDEs will point at this NameError.

B. Type checkers will catch this NameError.

C. Even the compiler can be made to catch this name error, since the lookup of the name MyCalss falls back to builtins, where it does not exist - you see, name lookup does happen at compile time anyway. I'm not really suggesting the compiler should make it an error, though.

D. Really, where's the error here? If no tool looks at this signature, there's nothing wrong with it - as a human I understand it perfectly. If a tool does look at it, it will warn or fail, exactly as I would have liked it to.

function.__annotations__['arg']()
but let's not jump into a "fix" that actually makes things worse.
That's not a "fix". I suggest always using the last form - which is already in common use - with a nicer syntax and semantics, since there's nothing wrong about it. It is there for a very natural reason.
:) Elazar

On Fri, Sep 23, 2016 at 10:17:15AM +0000, אלעזר wrote:
No. You are the one trying to deny that annotations are expressions -- I'm saying that they are both annotations and expressions at the same time. There's no dichotomy here, since the two are not mutually exclusive. (The word here is dichotomy, not trichotomy, since there's only two things under discussion, not three.)
That's a much bigger change than what you suggested earlier, changing function annotations to lazy evaluation instead of eager. Supporting non-expressions as annotations -- what's your use-case? Under what circumstances would you want to annotate an function parameter with a code block instead of an expression?
I've never denied that annotations are annotations, or that annotations are used to annotate function parameters. I'm not sure why you are giving a triumphant cry of "Exactly!" here -- it's not under dispute that annotations are annotations. And it shouldn't be under dispute that annotations are expressions. They're not code blocks. They're not statements. What else could they be apart from expressions? The PEP that introduced them describes them as expressions:

    Function annotations are nothing more than a way of associating
    arbitrary Python EXPRESSIONS with various parts of a function at
    compile-time. [Emphasis added.]

https://www.python.org/dev/peps/pep-3107/

and they are documented as an expression:

    parameter ::= identifier [":" expression]

    Parameters may have annotations of the form ": expression" following
    the parameter name. ... These annotations can be any valid Python
    expression

https://docs.python.org/3/reference/compound_stmts.html#function-definitions

I think it's time to give up arguing that annotations aren't expressions.
Right. And so are annotations. You want to make them fancy, give them super-powers, in order to solve the forward reference problem. I don't think that the problem is serious enough to justify changing the semantics of annotation evaluation and make them non-standard, fancy, lazy-evaluated expressions.
I meant what I said. Changing the evaluation model for annotations is a big semantic change, a backwards-incompatible change. It's not just adding new syntax for something that was a syntax error before, it would be changing the meaning of existing Python code. The transition from 3.6 to 3.7 is not like that from 2.x to 3.0 -- backwards compatibility is a hard requirement. Code that works a certain way in 3.6 is expected to work the same way in 3.7 onwards, unless we go through a deprecation period of at least one full release, and probably with a `from __future__ import ...` directive required. There may be a little bit of wiggle-room available for small changes in behaviour, under some circumstances -- but changing the evaluation model is unlikely to be judged to be a "small" change. In any case, before such a backwards-incompatible change would be allowed, you would have to prove that it was needed. [...]
Some or them might. Not everyone uses an IDE, it is not a requirement for Python programmers. Runtime exceptions are still, and always will be, the primary way of detecting such errors.
B. Type checkers catch this NameError
Likewise for type checkers.
C. Even the compiler can be made to catch this name error, since the name MyCalss is bound to builtins where it does not exist
How do you know it doesn't exist? Any module, any function, any class, any attribute access, might have added something called MyCalss to this module's namespace, or to the built-ins. It's okay for a non-compulsory type-checker, linter or editor to make common-sense assumptions about built-ins. But the compiler cannot: it has no way of knowing *for sure* whether or not MyCalss exists until runtime. It has to actually do the name lookup, and see what happens.
- you see, name lookup does happen at compile time anyway.
It really doesn't. You might be confusing function-definition time (which occurs at runtime) with compile time. When the function is defined, which occurs at runtime, the name MyCalss must exist or a NameError will occur. But that's not at compile time.
D. Really, where's the error here? if no tool looks at this signature, there's nothing wrong with it - As a human I understand perfectly.
    class CircuitDC:
        ...

    class CircuitAC:
        ...

    def func(arg: CircuitSC):
        ...

Do you still understand perfectly what I mean? -- Steve

On Fri, Sep 23, 2016 at 3:11 PM Steven D'Aprano <steve@pearwood.info> wrote:
The argument "It's an expression, because it's not a statement or a block" assumes that things must an expression, a statement or a block. Hence "trichotomy". And it is false. But I think we are getting lost in the terminology. Since I propose no change in what is considered valid syntax,
It indeed came out different than I meant. I don't suggest allowing anything that is not already allowed, syntactically. I only propose giving the current syntax a slightly different meaning, in a way that I'm sure matches how Python coders already understand the code.
:( this kind of fighting over terminology takes us nowhere indeed. In what other context do you see the result of an expression not intended to be used at all? Well, there are expression statements, which are evaluated for side effect. There are docstrings, which are a kind of annotation. What else? The only other that comes to mind is reveal_type(exp)... surely I don't need evaluation there.

And it shouldn't be under dispute that annotations are expressions.

Expressions are there mainly for the resulting value (hence "expression"), and annotations are there mainly for being there. In the code.
Syntactically, yes. Just like X in "a = lambda: X" is an expression, but you don't see it evaluated, do you? And this is an _actual_ expression, undeniably so, that is intended to be evaluated and used at runtime.
I don't care if you call them expressions, delayed-expressions, or flying monkeys. The allowed syntax is exactly that of an expression (like inside a lambda). The time of binding of names to scope is the same (again like a lambda) but the evaluation time is unknown to the non-reflecting developer. Decorators may promise a time of evaluation, if they want to. "Unknown evaluation time" is scary _for expressions_, which might have side effects (one of which is running time). But annotations must be pure by convention (and tools are welcome to warn about it). I admit that I propose breaking the following code:

    def foo(x: print("defining foo!")):
        pass

Do you know anyone who would dream about writing such code?
My proposal solves the forward reference problem, but I believe in it because I believe it is aligned with what the programmer see.
I would like to see an example for a code that breaks under the Alexander's suggestion of forcing evaluation at `.__annotations__` access time.
How useful is the detection of this error in production?
Yeah it was just a thought. I wouldn't really want the compiler to do that.
Can you repeat that? NameError indeed happens at runtime, but the scope in which MyCalss is looked up is determined at compile time - as far as I know. The bytecode-based typechecker I wrote relies on this information being accessible statically in the bytecode.

    def foo():
        locals()['MyType'] = str

        def bar(a: MyType):
            pass
What am I missing?
No.

    def func(arg: CircuitAC):
        ...

Do you understand what I mean? Code with a small distance (Hamming distance / edit distance) between related-but-different entities is prone to such errors, and NameError gives you very little assurance - if you err this way, you get it; if you err that way, you don't. --- One way or the other, the very least that I hope for is explicitly forbidding reliance on side effects or any other way to distinguish the evaluation time of annotation expressions. Annotations must be pure, and the current promise of evaluation time should be deprecated. Additionally, before making it impossible to go back, we should make the new variable annotation syntax add its annotations to a special object __reflect__, so that __reflect__.annotations__ will allow forcing evaluation (since there is no mechanism to do this for a variable). Elazar

On Fri, Sep 23, 2016 at 11:58 PM, אלעזר <elazarg@gmail.com> wrote:
Function annotations ARE used. They're stored as function attributes, just as default argument values and docstrings are. (It's not the language's problem if you never use them.)
And the X in "if False: X" is a statement, but you don't see it evaluated either. This is an actual expression that has to be evaluated and used just like any other does.
Thing is, literally every other expression in Python is evaluated at the point where it's hit. You can guard an expression with control flow statements or operators, but other than that, it will be hit when execution reaches its line:

    def func(x):
        expr                      # evaluated when function called

    if cond:
        expr                      # evaluated if cond is true

    [expr for x in range(n)]      # evaluated if n > 0
    (expr for x in [1])           # evaluated when genexp nexted
    expr if cond else "spam"      # evaluated if cond is true
    lambda: expr                  # evaluated when function called
    def func(x=expr): pass        # evaluated when function defined
    def func(x: expr): pass       # evaluated when function defined

Default arguments trip some people up because they expect them to be evaluated when the function's called, but it can easily be explained. Function annotations are exactly the same. Making them magically late-evaluate would have consequences for the grokkability of the language - they would be special. Now, that can be done, but as Rumplestiltskin keeps reminding us, all magic comes with a price, so it has to be strongly justified. (For instance, the no-arg form of super() is most definitely magical, but its justification is obvious when you compare Py2 inheritance with Py3.)
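The default-argument half of that table, as a two-line demonstration (the default is evaluated once, when the def statement runs, not on each call):

    import time

    def stamp(t=time.time()):   # the default is computed here, when "def" runs
        return t

    a = stamp()
    time.sleep(0.01)
    b = stamp()
    assert a == b               # both calls return the same, def-time value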
Yes, side effects make evaluation time scary. But so do rebindings, and any other influences on expression evaluation. Good, readable code generally follows the rule that the first instance of a name is its definition. That's why we put imports up the top of the script, and so on. Making annotations not work that way isn't going to improve readability; you'd have to search the entire project for the class being referenced. And since you can't probe them at definition time, you have to wait until, uhh, SOME time, to do that search - you never know where the actual name binding will come from. (It might even get injected from another file, so you can't statically search the one file.)
This is on par with a proposal to make default argument values late-bind, which comes up every now and then. It's just not worth making these expressions magical.
The sooner you catch an error, the better. Always.
That locals() is not editable (or rather, that mutations to it don't necessarily change the actual locals). This is equivalent to:

    def foo():
        locals()['MyType'] = str
        print(MyType)
In each case, you have to *call* foo() to see the NameError. It's happening at run time.
Define "pure". Function decorator syntax goes to some lengths to ensure that this is legal: @deco(arg) def f(): pass PEP 484 annotations include subscripting, even nested: def inproduct(v: Iterable[Tuple[T, T]]) -> T: so you'd have to accept some measure of run-time evaluation. It's worth reiterating, too, that function annotations have had the exact same semantics since Python 3.0, in 2008. Changing that now would potentially break up to eight years' worth of code, not all of which follows PEP 484. When Steve mentioned 'not breaking other uses of annotations', he's including this large body of code that might well not even be visible to us, much less under python.org control. Changing how annotations get evaluated is a *major, breaking change*, so all you can really do is make a style guide recommendation that "annotations should be able to be understood with minimal external information" or something.
Wow, lots of magic needed to make this work. Here's my counter-proposal. In C++, you can pre-declare a class like this:

    class Mutual2;  // Pre-declare Mutual2

    class Mutual1 { Mutual2 *ptr; };
    class Mutual2 { Mutual1 *ptr; };

Here's how you could do it in Python:

    Mutual2 = "Mutual2"  # Pre-declare Mutual2

    class Mutual1:
        def spam(self) -> Mutual2: pass

    class Mutual2:
        def spam(self) -> Mutual1: pass

Problem solved, no magic needed. ChrisA

On 24 September 2016 at 01:58, Chris Angelico <rosuav@gmail.com> wrote:
Folks have been assuming that's straightforward, but the way class namespaces work actually makes it significantly harder than it first appears. Using lambda and default arguments to illustrate the problem:

    >>> class Example:
    ...     attr = 10
    ...     @staticmethod
    ...     def good_method(eager=attr):
    ...         return eager
    ...     @staticmethod
    ...     def bad_method(lazy=(lambda: attr)):
    ...         return lazy()
    ...
    >>> Example().good_method()
    10
    >>> Example().bad_method()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 8, in bad_method
      File "<stdin>", line 7, in <lambda>
    NameError: name 'attr' is not defined

By design, function scopes can't see attributes defined in containing class scopes, and we don't currently have any other kind of scope that supports delayed evaluation (unlike function bodies, class bodies are evaluated eagerly at class definition time, and all the other delayed evaluation constructs are syntactic sugar for some particular flavour of function scope definition - even generators and coroutines use the same basic name resolution scheme as regular functions, they just use different execution models). If it was still 2006 or 2007 and Python 3.0 hadn't been released yet, lazy annotations could seriously be considered as an option. It's 2016 though, eager annotations have been out in the wild since December 2008, and the existing "string literals are Python's de facto lazy evaluation syntax" approach works well enough for the purpose, since type checkers can say they're actually going to parse those string literals when they appear to be type hints. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I promised not to bother you, but I really can't. So here's what I felt I have to say. This email is quite long. Please do not feel obliged to read it. You might find some things you'll want to bash at the end though :) Short-ish version: 1. Please consider disallowing the use of side effects of any kind in annotations, in that it is not promised when it will happen, if at all. So that a change 3 years from now will be somewhat less likely to break things. Please consider doing this for version 3.6; it is feature-frozen, but this is not (yet) a feature, and I got the feeling it is hardly controversial. I really have no interest in wasting the time of anybody here. If this request is not something you would ever consider, please ignore the rest of this email. 2. A refined proposal for future versions of the language: the ASTs of the annotation-expressions will be bound to __raw_annotations__. * This is actually more in line to what PEP-3107 was about ("no assigned semantics"; except for a single sentence, it is only about expressions. Not objects). * This is helpful even if the expression is evaluated at definition time, and can help in smoothing the transformation. 3. The main benefit from my proposal is that contracts (examples, explanations, assertions, and types) are naturally expressible as (almost) arbitrary Python expressions, but not if they are evaluated or evaluatable, at definition time, by the interpreter. Why: because it is really written in a different language - *always*. This is the real reason behind the existence, and the current solutions, of the forward reference problem. In general it is much more flexible than current situation. 4. For compatibility, a new raw_annotations() function will be added, and a new annotations() function will be used to get the eval()ed version of them. Similarly to dir(), locals() and globals(). * Accessing __annotations__ should work like calling annotations(), but frowned upon, as it might disappear in the far future. * Of course other `inspect` functions should give the same results as today. * Calling annotations()['a'] is like a eval(raw_annotations()['a']) which resembles eval(raw_input()). I believe the last point has a very good reason, as explained later: it is an interpretation of a different language, foreign to the interpreter, although sometimes close enough to be useful. It is of course well formed, so the considerations are not really security-related. I am willing to do any hard work that will make this proposal happen (evaluating existing libraries, implementing changes to CPython, etc) given a reasonable chance for acceptance. Thank you, Elazar --- Long version: Stephen - I read your last email only after writing this one; I think I have partially addressed the lookup issue (with ASTs and scopes), and partially agree: if there's a problem implementing this feature, I should look deeper into it. But I want to know that it _might_ be considered seriously, _if_ it is implementable. I also think that Nick refuted the claim that evaluation time and lookup *today* are so simple to explain. I know I have hard time explaining them to people. Nick, I have read your blog post about the high bar required for compatibility break, and I follow this mailing list for a while. So I agree with the reasoning (from my very, very little experience); I only want to understand where is this break of compatibility happen, because I can't see it. Chris: On Fri, Sep 23, 2016 at 6:59 PM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 23, 2016 at 11:58 PM, אלעזר <elazarg@gmail.com> wrote:
No, it isn't. I guess that even the code you write or consider to be excellent and readable still contains functions that use entities defined only later in the code. It is only when you follow the execution path that you should already be familiar with the names. I think rebinding is only scary when it is combined with side effects or when the name lookup is not clear. And why do you call it _re_binding? <snip>
No. No. No. If code in production fails at my client's site because of a misspelled annotation (unused by runtime tools), I will be mad. *At the language*. It is just as reasonable as failing because of misspelled documentation. (My suggestion does not prevent it completely of course. Nothing will. I only say this is unhelpful). <snip> It's worth reiterating, too, that function annotations have had the
exact same semantics since Python 3.0, in 2008.
When was this semantics decided and for what purposes, if I may ask? because the PEP (2006) explicitly states that "this PEP makes no attempt to introduce any kind of standard semantics". The main relevant paragraph reads (I quote the PEP, my own emphasis): "2. Function annotations are nothing more than a way of associating arbitrary Python EXPRESSIONS with various parts of a function at compile-time. By itself, Python does not attach ANY PARTICULAR MEANING or significance to annotations. Left to its own, Python simply makes these EXPRESSIONS available as described in Accessing Function Annotations below. The only way that annotations take on meaning is when they are interpreted by third-party libraries. These annotation consumers can do anything they want with a function's annotations." Amen to that! Word by word as my suggestion. Why aren't these _expressions_ available to me, as promised? <baby crying> Sadly, a few paragraphs later, the PEP adds that "All annotation expressions are evaluated when the function definition is executed, just like default values." - Now please explain to me how is that attaching "no particular meaning or significance to annotations". You practically evaluate them, for heavens sake! this is plain and simple "attached meaning" and "standard semantics". Unusefully so. I put there an expression, and all I got is a lousy object.
As I said, it is a strong argument - given an example of such a potential break in non-convoluted code. I want to see such an example. But why is "deprecating side effects in annotations' definition-time execution" considered a breaking change? It is just documentation. Everything will work as it always has. Even edge cases. I would think this is possible even for the feature-frozen 3.6. Like saying "We've found a loophole in the language; it might get fixed in the future. Don't count on it."
Problem not solved. Your counter-proposal solves only certain forward references, and requires keeping one more thing in sync; in particular, adapting the scope of the "forward declaration" to the scope of the later definition, which may change over time and is in violation of DRY. Oh, and lastly, type checkers will scream or will have to work very hard to allow this idiom. My proposal asks for no magic at all. Unless you consider dir() and locals() magical (they are, a bit). Stephen: Regarding the terminology, I said "But I think we are getting lost in the terminology." including myself. On Sat, Sep 24, 2016 at 10:07 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Another design principle is Occam's Razor, here applied as "new kinds of thing shall not spring up like whiskers on Barry's chin." Yes, function annotations need new syntax and so are a new kind of thing to that extent. *Their values don't need to be,*

Here's the version for annotations: the compiler turns them into an AST. The interpreter does nothing with them, except attach them to the __annotations__ object. Are you the average programmer? Then just read them; they should be helpful. How simple is that?

Their values don't need to be there at all. All that is needed is their structure. The AST. Their value is of very little use, actually. And I'm familiar with the typing module; it is bent here and there to match this need to have a "value" or an object when you actually need only the structure. I don't invent a "new thing" any more than is already there. I have a strong belief that the new thing is _already there_. In Python it is called "types", "annotations" and some forms of "documentation". In other languages it is called by other names. I don't mind the merging of this concept with the concept of Expression - I actually think it is brilliant. Sadly, the _interpreter_ does not understand it. It is (brilliantly again) admitted in the PEP, but sadly the interpreter does not seem to get it, and it tries to execute it anyway. So people shush it with a quote. Well done. Now where is that Expression thing? And why can't my editor highlight it?
Keeping an AST without evaluation at all is still a clear pessimization?
I argue that flow of control should not reach annotations at all. Not the control of the interpreter. It does not understand what's written there, and should not learn.
See my previous comment. The interpreter should not look for them. You know what? It can try to give a hint in name resolution. As a favor. If the tools can't find it, it's their business to report.
"[G]ood general theory does not search for the maximum generality, but for the right generality."
I believe this is the right generality "because maths" and my own intuition. This should not convince you at all.
As you've noticed, I refined my proposal to "don't evaluate". And here's my attempt at presenting the "because maths" argument you probably don't want to hear: it will allow a natural and well-founded way to express contracts and dependent types, which is a much more natural and flexible way to type dynamically-typed languages such as Python. It is nothing new really; it is based on a 40-year-old understanding that types are propositions *and propositions are types*. And I want to use it. From simple to complex:

@typecheck
def to_float(x: int or str) -> float: ...

(Oh, @typecheck can't access this, so they invented Union. Oh well. TODO: explain to a junior what a Union[int, str] is.)

@typecheck
def __add__(self, x: int and float) -> float: ...

This should help resolve a real problem in type checkers regarding overloads and overlapping types. Except @typecheck can only see the object "float". And they did not invent Intersection[] yet. Bummer, but fixable.

@dependent
def call_foo_twice(x: x.foo()) -> None:
    x.foo()
    x.foo()

Uh, I don't know. Perhaps x.foo() has side effects and I'm not sure how "dependent" works, and perhaps it is not so clear. Try again:

@dependent
def call_foo_twice(x: hasattr(x, "foo") and is_callable(x.foo)) -> None:
    x.foo()
    x.foo()

But why NameError? :(

Let's define contracts:

@contract
def divmod(x: int, y: int and y != 0) -> (x//y, x % y):
    return  # an optimized version

NameError again? :(

Not every function _should_ be annotated this way, but why can't you _allow_ this kind of annotation? These raise a needless error for an obviously meaningful expression. What if I want to specify "a class that subclasses Abstract but can be instantiated"? I need it because otherwise mypy resorts to allowing unsafe code:

def create(cls: typing.Type[Abstract] and cls(...)) -> Base:
    return cls()

NameError again. Why? Not because _you_ (people) don't understand it. No. It is because the _interpreter_ claims to understand it, but it doesn't. It cannot, because Python is not intended to be a specification language, and probably should not be. Even with simple types it doesn't, really. It just happens to look up the right _class_, which is not a type but is useful nonetheless; when I write "x: int" I actually mean "x: isinstance(x, int)", but the interpreter doesn't get it and should not get it. And what if I want "x: type(x) == int"? It has its uses.

Regarding my "not an expression" claim: this is a proposition, an assumption, a precondition, or an explanation - different incarnations of the same idea - as is any annotation system I have seen (except possibly injection, which will not break). Including Nick's "begin". Now notice this nice command-line parser - the annotations there can still be strings, combined easily with type checkers. Why? Because it targets human users, that's why. And the interpreter does not try to understand them because it does not understand English. Well, it does not understand Type-ish or Contract-ish, or any other Spec-ish. There are external tools for that, thank you very much. Just give them those things that you (rightly) consider to be expressions. I like them raw.

Elazar

On Sun, Sep 25, 2016 at 11:55 AM, אלעזר <elazarg@gmail.com> wrote:
I don't think Python has any concept of *disallowing* side effects. As soon as arbitrary objects can be called and/or subscripted, arbitrary code can be executed. However, a style guide may *discourage* extensive side effects, and this I would agree with - not for reasons of future change, but for reasons of simplicity and readability.
So, basically, you want annotations to be able to make use of names defined in the object they're annotating. That's a reasonable summary of the idea, if I have this correct. I'll trim out a ton of quoted material that digs into details. Ultimately, though, you're asking to change something that has been this way since *Python 3.0*. You're not asking for a tiny tweak to a feature that's new in 3.6. If you were, perhaps this could be done, despite feature freeze; but you're breaking compat with eight years of Pythons, and that's almost certainly not going to happen.
Actually, no, I do generally stick to this pattern, builtins aside. Obviously there are times when you can't (mutually exclusive functions, for instance), but those are pretty rare. Here's an example program of mine: https://github.com/Rosuav/LetMeKnow/blob/master/letmeknow.py There is one star-import, which breaks this pattern (the global name CLIENT_SECRET comes from keys.py), and which I consider to be a failing under this principle; but it's better than most of the alternatives, and like all style recommendations, "define before use" is a rule that can be broken.
Then I strongly disagree. If it's going to fail at the client's site, I want it to first fail on my computer.
Decorators also have clearly defined local semantics and completely undefined overall semantics. If you see this in a .py file:

@spaminate(ham=1)
def frobber():
    pass

you know exactly what's happening, on a mechanical level: first spaminate(ham=1) will be called, and then the result will be called with frobber as an argument, and the result of that bound to the name frobber. But exactly what the spaminate decorator does is not Python's business. It might make frobber externally callable (cf routing decorators in Flask), or it might modify frobber's behaviour (eg require that ham be 1 before it'll be called), or it might trigger some sort of run-time optimization (memoization being an easy one, and actual code modifications being also possible).

Annotations are the same. There's a clearly defined local syntactic handling:

@multicall
def frobber(ham: [10,20,30]):
    pass

but nothing in the language that says what this truly means. In this case, I'm envisioning a kind of special default argument handling that says "if you don't provide a ham argument, call frobber three times with the successive values from ham's annotation". But you can be absolutely certain that, on the mechanical level, what happens is that the expression "[10,20,30]" gets evaluated, and the result gets stashed into the function's __annotations__.

In contrast, function default arguments have *both* forms of semantics clearly defined. The expression is evaluated and the result stashed away; and then, when the function is called, if there's no argument, the default is used.
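Returning to the spaminate example above, a minimal runnable sketch of that mechanical level (spaminate here is made up and merely tags the function):

def spaminate(ham=0):
    def decorate(func):
        func.ham = ham        # whatever meaning the decorator chooses to attach
        return func
    return decorate

@spaminate(ham=1)
def frobber():
    pass

# The decorator syntax above is mechanically equivalent to:
#   frobber = spaminate(ham=1)(frobber)
print(frobber.ham)   # -> 1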
Deprecating in the sense of "style guides recommend against this" is fine. PEP 8 has been updated periodically, and it doesn't break anyone's code (except MAYBE linters, and even then they're not broken, just not up-to-date). But an actual code change that means that Python 3.7 will reject code that Python 3.5 accepted? That's a breaking change. And the purpose of your documentation-only deprecation is exactly that - perhaps with Python 3.8 or 3.9 instead of 3.7, but the timeframe doesn't change the fact that it will break code.
Type checkers that comply with PEP 484 are already going to support this notation, because "Mutual2" is a valid annotation. All I've done differently is make a simple assignment, in the same way that typevars get assigned.
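A sketch of the assignment trick being described, with illustrative class names:

Mutual2 = "Mutual2"          # bind the name to a string before the class exists

class Mutual1:
    def spam(self, other: Mutual2): ...   # evaluates to the string "Mutual2"

class Mutual2:                            # rebinds the name to the real class
    def spam(self, other: Mutual1): ...   # Mutual1 already exists here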
Keeping an AST without evaluation at all is still a clear pessimization?
The AST for an expression usually takes up more memory than the result of it, yeah.
Please let's not go down this path. Already I have to explain to my students that this won't work:

if response == "yes" or "y":

If it *does* work in annotations but doesn't work everywhere else, that would be extremely confusing.
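For the record, the reason it doesn't work: the comparison binds tighter than "or", so the second operand is just the truthy string "y".

response = "no"
print(response == "yes" or "y")    # -> y      (the expression is always truthy)
print(response in ("yes", "y"))    # -> False  (what was actually intended)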
I'm not sure what the intersection of int and float would be, but perhaps you mean this more like Java's interfaces - something that "implements X" and "implements Y" is the intersection of the types X and Y.
Now, this is where stuff starts to get interesting. You want to be able to define an assertion in terms of the variables you're creating here. In effect, you have something like this:

def divmod(x, y):
    assert isinstance(x, int)
    assert isinstance(y, int) and y != 0
    ...  # optimized calculation
    assert ret == (x // y, x % y)
    return ret

As ideas go, not a bad one. Not really compatible with annotations, though, and very difficult to adequately parse. If you want to flesh this out as your proposal, I would suggest setting this thread aside and starting over, explaining (a) why actual assertions aren't good enough, and (b) how annotations could be used without breaking compatibility.
Actually, I don't understand exactly what this should do. Does it assert that cls can be instantiated with some unknown args? Because you then instantiate it with no args. What does cls(...) require?
And what if I want "x: type(x)==int"? It has its uses.
Then explicitly assert that. I don't see why you should type-declare that something is "this and not a subclass" - the whole point of subclassing is that it still is an instance of the superclass. Maybe what Python needs is a simple syntax for "AST-for-this-expression". We have lambda, which means "function which evaluates this expression"; this would operate similarly.
It'd be exactly the same as ast.parse("x + y"), but might be able to make use of the existing parsing operation, and would be more easily syntax highlighted. (Or maybe it'd start at the Expr node instead - so it'd be equiv to ast.parse("x + y").body[0].) Open to suggestions as to an actual name. With that change, your proposals could all be added in a 100% backward compatible way. Annotations, as a feature, wouldn't change; you'd just kappafy your contracts:

@contract
def divmod(x: int, y: kappa: int and y != 0) -> kappa: (x//y, x % y):
    ...

And then you could define contract() as either the identity function (optimized mode, no checks done), or a wrapper function that does run-time checks. Maybe that, rather than making annotations magical, would solve the problem?

ChrisA
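For reference, the ast.parse equivalence mentioned above, in runnable form with today's ast module:

import ast

tree = ast.parse("int and y != 0", mode="eval")
print(ast.dump(tree.body))
# roughly: BoolOp(op=And(), values=[Name(id='int', ...), Compare(...)])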

On 25 September 2016 at 11:55, אלעזר <elazarg@gmail.com> wrote:
This may be part of the confusion, as Python is a language with a *reference implementation*, rather than relying solely on a documented language specification. Unless we specifically call something out in the language reference and/or the test suite as a CPython implementation detail, "what CPython does" should be taken as the specification. While we're fairly permissive in allowing alternative implementations to deviate a bit and still call themselves Python, and sometimes alternate implementation authors point out quirky behaviours and we declare them to be bugs in CPython, "CPython correctly implements the Python language specification" is still the baseline assumption. So the order of evaluation for annotations with side effects has been defined since 3.0 came out:
That is, at function definition time:

- default values are evaluated from left to right
- annotations are evaluated from left to right
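A small demonstration of that definition-time behaviour on CPython 3.x as it stood at the time (the traced helper is made up):

def traced(tag):
    print("evaluating:", tag)
    return tag

def f(x: traced("x"), y: traced("y")) -> traced("return"):
    pass

# Executing the def statement immediately prints:
#   evaluating: x
#   evaluating: y
#   evaluating: return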
I don't think you're wasting anyone's time - this is a genuinely complex topic, and some of it relates to design instinct about what keeps a language relatively easy to learn. However, I do think we're talking past each other a bit. I suspect the above point regarding the differences between languages that are formally defined by a written specification and those like Python that let a particular implementation (in our case, CPython) fill in the details not otherwise written down may be a contributing factor to that.

Another may be that there are some things (like advanced metaprogramming techniques) where making them easy isn't actually a goal we necessarily pursue: we want to ensure they're *possible*, as in some situations they really are the best available answer, but we also want to guide folks towards simpler alternatives when those simpler alternatives are sufficient. PEP 487 is an interesting example of that, as that has the express goal of taking two broad categories of use cases that currently require a custom metaclass (implicitly affecting the definition of subclasses and letting descriptors know the attribute name they're bound to), and making them standard parts of the default class definition protocol. Ideally, this will lead to *fewer* custom metaclasses being defined in the future, with folks being able to instead rely on normal class definitions and those simpler extracted patterns.
PEP 3107 came with a reference implementation, it wasn't just the written PEP content: https://www.python.org/dev/peps/pep-3107/#implementation
* This is helpful even if the expression is evaluated at definition time, and can help in smoothing the transformation.
We talk about the idea of expression quoting and AST preservation fairly often, but it's not easy to extract from the archives unless you already know roughly what you're looking for - it tends to come up as a possible solution to *other* problems, and each time we either decide to leave the problem unsolved, or find a simpler alternative to letting the "syntactic support for AST metaprogramming" genie out of the bottle. Currently, the only supported interfaces for this are using the ast.parse() helper, or passing the ast.PyCF_ONLY_AST flag to the compile() builtin. This approach gives alternative implementations a fair bit of flexibility to *not* use that AST internally if it doesn't help their particular implementation. Once you start tying it in directly to language level features, though, it starts to remove a lot of that implementation flexibility.
"More flexible" is only a virtue if you have concrete use cases in mind that can't otherwise be addressed today. Since you mention design-by-contract, you may want to take a look at https://www.python.org/dev/peps/pep-0316/ which is an old deferred proposal to support DBC by way of a particular formatting convention in docstrings, especially as special formatting in docstrings was one of the main ways folks did type annotations before PEP 3107 added dedicated syntax for them.
4. For compatibility, a new raw_annotations() function will be added, and a new annotations() function will be used to get the eval()ed version of them.
Nothing *new* can ever be added for compatibility reasons: by definition, preserving backwards compatibility means old code continuing to run *without modification*. New interfaces can be added to simplify migration of old code, but it's not the same thing as actually preserving backwards compatibility.
Here you're getting into the question of expression quoting, and for a statement level version of that, you may want to explore the thread at https://mail.python.org/pipermail/python-ideas/2011-April/009765.html (I started that thread because I'd had an idea I needed to share so I could stop thinking about it, but I also think more syntactic sugar for metaprogramming isn't really something the vast majority of Python developers actually need) Mython, which was built as a variant of Python 2 with more metaprogramming features is also worth a look: http://mython.org/
I think the two basic road blocks you're running into are: - the order of evaluation for annotations with side effects is already well defined and has been since Python 3.0. It's just defined by the way CPython works as the reference implementation, rather than in English prose anywhere. - delayed evaluation already has two forms in Python (function scopes and quoted strings) and adding a third is a *really* controversial prospect, but if you don't add a third, you run into the fact that all function scopes inside a class scope are treated as methods by the compiler Stephen's post went into more detail on *why* that second point is so controversial: because it's a relatively major increase in the underlying complexity of the runtime execution model. The most recent run at it that I recall was my suggestion to extend f-strings (which are eagerly evaluated) to a more general purpose namespace capturing capability in https://www.python.org/dev/peps/pep-0501/ That's deferred pending more experience with f-strings between now and the 3.7 beta, but at this point I'll honestly be surprised if the simple expedient of "lambda: <f-string>" doesn't turn out to be sufficient to cover any delayed evaluation needs that arise in practice (folks tend not to put complex logic in their class bodies).
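A tiny sketch of that "lambda: <f-string>" expedient: the f-string is evaluated only when the lambda is called, not where it is written (names are illustrative):

greeting = lambda: f"Hello, {name}!"

name = "world"        # bound after the lambda was defined
print(greeting())     # -> Hello, world!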
Most folks coming from pre-compiled languages like C++, C# & Java struggle with the fact that Python doesn't have separate compile time constructs (which deal with function, class and method declarations) and runtime constructs (which are your traditional control flow statements). Instead, Python just has runtime statements, and function and class definition are executed when encountered, just like any other statement. This fundamentally changes the relationship between compile time, definition time, and call time, most significantly by having "definition time" be something that happens during the operation of the program itself.
This code works as a doctest today:

>>> def func(a: "Expected output"):
...     pass
...
>>> print(func.__annotations__["a"])
Expected output

Any change that breaks that currently valid doctest is necessarily a compatibility break for the way annotations are handled at runtime. It doesn't matter for that determination how small the change to fix the second command is, it only matters that it *would* have to change in some way. In particular, switching to delayed evaluation would break all the introspection tools that currently read annotations at runtime, both those in the standard library (like inspect.signature() and pydoc), and those in third party tools (like IDEs).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Thanks for the references. I will read them. In general, I am against magic in code. I am for magic in specification, with appropriate hints (e.g. the explicit name in the decorator, as pointed out to me by Chris) and with taste. The most important part of a specification is being naturally understood by humans. The second most important is being understood by tools. What's not important: being understood by the interpreter. CPython as a reference implementation has a very, very specific behavior, changing at every minor release. Of course not every tiny detail of this behavior is promised. It is understood by users that e.g. they cannot rely on their code taking 0.6ms to execute in such and such settings, since real-time constraints are not promised, even if some part of some version of CPython happens to run this fast deterministically. The implementation specifies the behavior, except when common sense or documentation says otherwise. Am I wrong?

On Sun, Sep 25, 2016 at 7:07 PM Nick Coghlan <ncoghlan@gmail.com> wrote:
But my intention is that this code will work just fine. As will any other access using __annotations__ or any existing API. The only visible change should be for expressions with visible side effects, so this is the kind of break I am looking for. The following will break:

def foo(a: print(1)):
    pass

But nobody (yet) claimed it to be a reasonable example of code we don't want to break. There can hardly be any, since any side effect can be placed right before the definition. So just like the star-imports that are broken every time a function is added to some library, IIUC, you don't care about breaking them, because they are strongly and explicitly discouraged.

Elazar

אלעזר writes:
But nobody (yet) claimed it to be a reasonable example of code we don't want to break.
"Reasonable example" is not the standard. The ideal is that *nobody*'s code breaks unless it's necessary to to fix a bug. The current implementation conforms to the specification[1], and therefore the proposed change is not a bugfix. The Yale Book of Quotations quotes English judge Robert Megarry as follows: "Whereas in England all is permitted that is not expressly prohibited, it has been said that in Germany all is prohibited unless expressly permitted and in France all is permitted that is expressly prohibited. In the European Common Market no-one knows what is permitted and it all costs more." http://freakonomics.com/2009/10/29/quotes-uncovered-death-and-statistics/ Python, of course, follows the principle of English law. That's what we mean by "consenting adults". The rules about change are more flexible in the stdlib, but even there we get reports every release about breakage due to improvements in various modules. This is the language definition, so "if you can do it in vX.Y, it should do the same in vX.(Y+1)" is a strict rule.[2] Footnotes: [1] Assuming, as I do, that in PEP 3107 "expression" refers only to the syntax specification and does not at all imply adding a new expression type to the language. What is stored in __annotations__ is thus implied to be the object that is the value of the expression, following the precedent of initialization, and the general Pythonic approach of evaluating expressions when encountered. And that semantics is stated explicitly in PEP 3107. [2] The definition of "do the same" does not necessarily mean "produce identical output", eg, in the case of "dir()" in the bare interpreter with no imports.

Nick Coghlan writes:
This is a bit unfair to אלעזר, although it's been a long thread so I can understand why some of his ideas have gone missing. His proposals have gotten a bit incoherent because he has been answering all the different objections one by one rather than organizing things into a single design, but I think eventually he would organize it as follows: (1) Add __raw_annotations__ and save the thunked expressions there, whether as code objects or AST. (2) Turn __annotations__ into a property which evaluates (and memoizes?) the thunks and returns them. (First explicitly suggested by Alexander Belopol, I think.) He claims that (2) solves the backward compatibility problem, I don't have the knowledge to figure out whether it is that simple or not. It seems plausible to me, so I'd love to hear an explanation. New ideas like DBC would of course be supported by the new __raw_annotations__ since there's no backward compatibility issue there. I'm still -1 on the whole thing on the principle "although sometimes never is better than *right* now". I think the aClass = "aClass" trick described by Chris is perfectly serviceable to deal with the PEP 484 forward type reference issue. The "let's turn all the annotations into expressions" idea can be practically exploited with ast.parse(). I'm guessing a decorator could be used to provide __raw_annotations__ and __annotations__ per (1) and (2) above (although I'm not sure how to do it myself: copying __annotations__ to __raw_annotations__ and then propertizing __annotations__ could be a bit tricky, I guess).
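A rough sketch of such a decorator, using today's string annotations in place of real thunks, and an accessor function rather than a true property (a property cannot easily be attached to a single function object, which is probably the tricky part alluded to above):

def lazy_annotations(func):
    # (1) keep the raw form; with today's syntax that is whatever strings the author wrote
    func.__raw_annotations__ = dict(func.__annotations__)
    cache = {}

    # (2) evaluate on demand and memoize
    def evaluated_annotations():
        if not cache:
            for name, raw in func.__raw_annotations__.items():
                cache[name] = eval(raw, func.__globals__) if isinstance(raw, str) else raw
        return cache

    func.evaluated_annotations = evaluated_annotations
    return func

@lazy_annotations
def f(x: "SomeClass") -> "SomeClass": ...

class SomeClass: ...

print(f.evaluated_annotations())   # both entries now map to the SomeClass class object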

Thank you Stephen. You have phrased my proposal better than I did. As for using quoted strings, the problems are:

1. The well-formedness of the expression is not checked by the compiler.
2. It is not naturally supported by syntax highlighters and IDEs. They can be made to support it, but most will not. Partly because:
3. There is no natural way to distinguish quoted expressions from actual human-readable text (as in the begins library).
4. (My own taste): this is ugly and inconsistent, and there are 2 meaningless characters there :) (6 if multiline)

On Sun, Sep 25, 2016 at 8:42 PM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

On Sep 25, 2016 10:59 AM, "אלעזר" <elazarg@gmail.com> wrote:
2. It is not naturally supported by syntax highlighters and IDEs. They can be made to support it, but most will not.
This is a complete red herring. Having a highlight rule of "apply highlights in string annotations" is straightforward in modern editors. This is like arguing Python should do <whatever> because Notepad.exe doesn't do something smart with it.

On Sun, Sep 25, 2016 at 9:28 PM David Mertz <mertz@gnosis.cx> wrote:
Not that I think it's a killer argument, but why a red herring? Quick search does not find such an explicit option in Gedit, PyDev and yes, Notepad++.exe. It is not a common or default option. Having such a rule by default amounts to admitting that these are not essentially strings, and the quotes there are overloaded. It also means that actual strings are not understood as such, and are incorrectly highlighted. But please let's not delve into this: it is of some importance, but should not affect an actual decision. IDEs are more important. Renaming facilities do over-renaming or under-renaming because of this need to rename inside some strings, but not inside others. Similarly code search facilities, and warnings from IDEs about inlining variables. I have encountered real bugs caused by such an imperfect renaming (and I hope your answer is not "don't do renaming"). A prefix like code"foo()" might help of course, but it is not really used as a string. Elazar

On 26 September 2016 at 03:42, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
OK, that does indeed make more sense, and significantly reduces the scope for potential runtime compatibility breaks related to __annotations__ access. Instead, it changes the discussion to focus on the following main challenges:

- the inconsistency introduced between annotations (lazily evaluated) and default arguments (eagerly evaluated)
- the remaining compatibility breaks (depending on implementation details)
- the runtime overhead of lazy evaluation
- the debugging challenges of lazy evaluation

The inconsistency argument is simply that people will be even more confused than they are today if default arguments are evaluated at definition time while annotations aren't. There is a lot of code out there that actively relies on eager evaluation of default arguments, so changing that is out of the question, which then provides a strong consistency argument in favour of keeping annotations eagerly evaluated as well.

There would likely still be some compatibility breaks around name access in method annotation definitions, and compatibility would also break for any code that actually did expect to trigger a side-effect at definition time. This is a much smaller scope for breakage than breaking __annotations__ access, but we can't assume it won't affect anyone as there's a lot of code out there that we'd judge to be questionable from the point of view of maintainability and good design aesthetics that nevertheless still solves the problem the author was aiming to solve.

The runtime memory overhead of lazy evaluation isn't trivial. Using a naive function based approach:
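The kind of measurement this refers to, with exact numbers varying across CPython versions and platforms:

import sys

# Eager evaluation of "x: int" stores a reference to the already-existing int object,
# so nothing new is allocated per annotation. A naive thunk allocates a fresh
# function object for every single annotation:
thunk = lambda: int
print(sys.getsizeof(thunk))   # size of one such function object (varies by build)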
And that's only the function object itself - it's not counting all the other objects hanging off the function object like the attribute dictionary. A more limited thunk type could reduce that overhead, but it's still going to be larger in most cases than just storing the evaluation result. The impact on runtime speed overhead is less certain, but also likely to be a net negative - defining functions isn't particularly cheap (especially compared to literal references or a simple name lookup), and calling them if you actually access __annotations__ isn't going to be particularly cheap either. The debugging challenge is the same one that arises with any form of delayed evaluation: by default, the traceback you get will point you to the location where the delayed evaluation took place *not* the location where the flawed expression was found. That problem can be mitigated through an exception chaining design that references the likely location of the actual error, but it's never going to be as easy to figure out as cases where the traceback points directly at the code responsible for the problem. So I'm still -1 on the idea, but it's not as straightforward as the argument against the naive version of the proposal that also broke __annotations__ lookup. Cheers, Nick. P.S. As an illustration of that last point, the PEP 487 implementation currently makes problems with __set_name__ attribute definitions quite hard to figure out since the traceback points at the class definition header, rather than the offending descriptor assignment: http://bugs.python.org/issue28214 -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Thank you all. I think this thread is pretty much closed by now. I understand at least most of your concerns and I will take time to shape my idea. I wanted to note one last thing, though, regarding my claim that annotations are not actually standard expressions: Guido had once expressed his concerns regarding the performance hit of using cast(), since it is not easily (or at all) optimized away. This performance hit should not be there in the first place, if the distinction between annotations and evaluatable expressions were kept - i.e. by allowing the attachment of annotations to expressions (as I believe has been proposed several times). Now, I understand that there are very good reasons not to allow it; keeping the language simple and familiar would be my first guess - but note how the semantics of the "main" language is hindered by the complexities of its specification-related syntactic subset, which is not warranted, in my opinion. If you want to specify things, the syntactic hit is unavoidable, but the semantic hit is not. (BTW why isn't it written cast[T](exp) ?)

Thank you again for this discussion
Elazar

Let's talk about lazy evaluation in a broader sense than just function annotations. If we had syntax for lazy evaluation -- let's call the resulting objects thunks, after Algol's thunks -- then we could use them in annotations as well as elsewhere. But if we special-case annotations only, the Zen has something to say about special cases.

On Mon, Sep 26, 2016 at 02:57:36PM +1000, Nick Coghlan wrote: [...]
Default arguments are a good use-case for thunks. One of the most common gotchas in Python is early binding of function defaults:

def func(arg=[]):
    ...

Nine times out of ten, that's probably not what you want. Now, to avoid all doubt, I do not want to change function defaults to late binding. I've argued repeatedly on comp.lang.python and elsewhere that if a language only offers one of early binding or late binding, it should offer early binding as Python does. The reason is, given early binding, it is trivial to simulate something like late binding:

def func(arg=None):
    if arg is None:
        arg = []
    ...

but given late binding, it is ugly and inconvenient to get a poor substitute for early binding when that's what you want. So, please, let's not have a debate over the behaviour of function defaults. But what if we could have both? Suppose we use backticks `...` to make a thunk, then we could write:

def func(arg=`[]`):
    ...

to get the late binding result wanted. Are there other uses for thunks? Potentially, they could be used for Ruby-like code blocks:

result = function(arg1, arg2, block=```# triple backticks
    do_this()
    do_that()
    while condition:
        do_something_else()
    print('Done')
    ```, another_arg=1)

but then I'm not really sure what advantage code blocks have over functions.
Indeed. There are (to my knowledge) only two places where Python delays evaluation of code:

- functions (def statements and lambda expressions);
- generator expressions;

where the second can be considered to be syntactic sugar for a generator function (def with yield). Have I missed anything? In the same way that Haskell is fundamentally built on lazy evaluation, Python is fundamentally built on eager evaluation, and I don't think we should change that. Until now, the only way to delay the evaluation of code (other than the body of a function, of course) is to write it as a string, then pass it to eval/exec. Thunks offer an alternative for delayed evaluation that makes it easier for editors to apply syntax highlighting: don't apply it to ordinary strings, but do apply it to thunks. I must admit that I've loved the concept of thunks for years now, but I'm still looking for the killer use-case for them, the one clear justification for why Python should include them.

- Late-bound function default arguments? Nice to have, but we already have a perfectly serviceable way to get the equivalent behaviour.
- Code blocks? Maybe a Ruby programmer can explain why they're so important, but we have functions, including lambda.
- Function annotations? I'm not convinced thunks are needed or desirable for annotations.
- A better way to write code intended for delayed execution? Sounds interesting, but not critical.

Maybe somebody else can think of the elusive killer use-case for thunks, because I've been pondering this question for many years now and I'm no closer to an answer.
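In miniature, the two existing delayed-evaluation forms listed above:

f = lambda: 1 / 0                  # function body: runs only when f() is called
g = (1 / 0 for _ in range(1))      # generator expression: body runs only when iterated

# Neither line above raises; f() or next(g) would.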
It's not just published code. It's also one-off throwaway code, including code executed in the interactive interpreter and then thrown away. It is really helpful to be able to monkey-patch or shadow builtins, insert some logging code or even a few print statements, or perhaps something that modifies a global variable, for debugging or to learn how something works. Could it be equally useful inside annotations? I expect so... complicated only by the fact that one needs to monkey-patch the *metaclass*, not the type itself. It may be that I'm completely off-base here and this is a stupid thing to do. But I say that until the community has more experience with annotations, we shouldn't rule it out. (Just to be clear: I'm mostly talking about interactive exploration of code, not production code. Python is not Ruby and we don't encourage heavy use of monkey-patching in production code. But it has its uses.)
This is better:

py> sys.getsizeof((lambda: "").__code__)
80
Nevertheless, an explicit thunk syntax will make this a matter of consenting adults: if you choose to shoot your foot off with a hard-to-debug thunk, you have nobody to blame but yourself. Or whoever wrote the library that you're using. *wink* -- Steve

On Mon, Sep 26, 2016 at 10:46:57PM +1000, Steven D'Aprano wrote:
Well, there's a use-case I have been pondering for a long while now which could be satisfied by this: enumerated generator displays. So suppose you have a composite boolean value, composed by the 'and' of many conditions (which all take long to compute), and you want to short-circuit. Let's take the following example.

valid = True
valid &= looks_like_emailaddress(username)
valid &= more_than_8_characters(password)
valid &= does_not_exist_in_database(username)
valid &= domain_name_of_emailaddress_has_mx_record(username)
... some more options ...

(I forgot the exact use-case, but I still remember the functionality I wanted, so bear with me). Of course, the above is not short-circuiting, so it would be replaced by

def check_valid(username, password):
    if not looks_like_emailaddress(username):
        return False
    if not more_than_8_characters(password):
        return False
    if not does_not_exist_in_database(username):
        return False
    if not domain_name_of_emailaddress_has_mx_record(username):
        return False
    ...
    return True

valid = check_valid()

or

valid = True\
    and looks_like_emailaddress(username)\
    and more_than_8_characters(password)\
    and does_not_exist_in_database(username)\
    and domain_name_of_emailaddress_has_mx_record(username)

But in all reality, I want to write something like:

valid = all(@@@
    looks_like_emailaddress(username),
    more_than_8_characters(password),
    does_not_exist_in_database(username),
    domain_name_of_emailaddress_has_mx_record(username),
@@@)

With `@@@` designating the beginning/ending of the enumerated generator display. Now, this is currently not possible, but if we had some kind of thunk syntax that would become possible, without needing an enumerated generator display. However the problem I see with the concept of `thunk` is: when does it get un-thunked? In which of the following cases?

1. When getting an attribute on it?
2. When calling it? --> See 1. with `__call__`.
3. When subindexing it? --> See 1. with `__getitem__`.
4. When assigning it to a name? It shouldn't have to be un-thunked, I think.
5. When adding it to a list? No un-thunking should be necessary, I think.

However, the problem with thunks is (I think) that to make that happen either

- *all* objects need to include yet another level of redirection, or
- a thunk needs to get allocated the maximum size of the value it could possibly store (but a `unicode` object could have an arbitrary size), or
- there needs to be some way to 'notify' objects holding the thunk that its value got updated. For a dict/list/tuple this could readily grow into O(n) behaviour when un-thunking a thunk, or
- any C-level functionality needs to learn how to deal with thunks. For instance, `Py_TYPE` would have to *resolve* the thunk, and then return the type of the value, or
- I'm running out of ideas here, but maybe creating a custom type object for each thunk that does pass-through to a wrapped item? Thunked objects would work *exactly* the same as normal objects, but at a (small) indirection for any action taken. Still, somehow `Py_TYPE` and `Py_SIZE` and any other macros would still have to force evaluation.

Kind regards,
Sjoerd Job
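For what it's worth, the short-circuiting version is already expressible by passing thunks explicitly, using the predicate names from the example above (assumed to be defined):

checks = (
    lambda: looks_like_emailaddress(username),
    lambda: more_than_8_characters(password),
    lambda: does_not_exist_in_database(username),
    lambda: domain_name_of_emailaddress_has_mx_record(username),
)
valid = all(check() for check in checks)   # all() stops at the first falsy result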

Hello everyone, this idea looks like something I have tried building already: https://github.com/llllllllll/lazy_python. This project implements a `thunk` class which builds up a deferred computation which is evaluated only when needed. One use case I have had for this project is building up a larger expression so that it may be simplified and then computed concurrently with dask: http://daisy-python.readthedocs.io/en/latest/. By building up a larger expression (and making the tree accessible) users have the ability to remove common subexpressions or remove intermediate objects. In numpy chained expressions often make lots of allocations which are quickly thrown away which is why projects like numexpr (https://github.com/pydata/numexpr) can be such a serious speed up. These intermediates are required because the whole expression isn't known at the start so it must be evaluated as written.

Things to consider about when to evaluate:

1. Functions which branch on their input need to know which branch to select.
2. Iteration is really hard to defer in a way that is efficient. lazy_python just eagerly evaluates at iteration time but builds thunks in the body.
3. Stateful operations like IO which normally have an implied order of operation now need some explicit ordering.

Regarding the `Py_TYPE` change: I don't think that is correct unless we made a thunk have the same binary representation as the underlying object. A lot of code does a type check and then calls macros that act on the actual type like `PyTuple_GET_ITEM` so we cannot fool C functions very easily.

On Mon, Sep 26, 2016 at 9:27 AM, Sjoerd Job Postmus <sjoerdjob@sjoerdjob.com
wrote:

You already know I want this for contracts etc. Here are some things that I consider important:

1. There should be some way to bind the names to function parameters, as in

@contract
def invert(x: `x != 0`) -> float:
    return 1 / x

@contract
def invertdiff(x: int, y: `x != y`) -> float:
    return 1 / (x-y)

2. For this and other reasons, the AST should be available. I think it can be a single AST per place in code, but it should be immutable.

3. Backticks are problematic because they cannot be nested. I suggest (name: <expression>) or ('name': expression). This name can be googled.

def compose(f: `such_that: pure(f)`, g: `such_that: pure(g)`):
    return lambda x: f(g(x))

4. I think it's a bad idea to use thunks as a DSL (different semantics than standard expressions), except in annotations and for specification purposes.

In short, I want this thing. But only for annotations, assertions, and possibly default arguments as an ad-hoc fix.

Elazar

On Mon, Sep 26, 2016 at 5:05 PM Joseph Jevnik <joejev@gmail.com> wrote:

On Sun, Sep 25, 2016 at 01:55:09AM +0000, אלעזר wrote:
1. Please consider disallowing the use of side effects of any kind in annotations,
That is *simply not possible* in Python. Actually, no, that's not quite correct. One way to prohibit side-effects would be to make all annotations string literals, and ONLY string literals. Or possibly bare names (assuming current semantics for local variable name lookup):

def func(arg: 'no possible side effects here') -> OrHere:
    ...

But as soon as we allow such things as union types and lists, then all bets are off:

def func(arg: Sequence[list]):
    ...

There is no way of prohibiting side effects in type(Sequence).__getitem__ once it is called. Nor would we want to. The ability to shadow or monkey-patch types for mocking, testing, debugging etc, including the ability to have them call print, or perform logging, is a feature beyond price. We don't need it often, but when we do, the ability to replace Sequence with a mock that may have side-effects is really useful.
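A small sketch of why that is, with made-up names: any metaclass __getitem__ can do whatever it likes while the annotation is being evaluated.

class LoggingMeta(type):
    def __getitem__(cls, item):
        print("annotation subscript evaluated with", item)   # an arbitrary side effect
        return cls

class Sequence(metaclass=LoggingMeta):
    pass

def func(arg: Sequence[list]):   # prints when the def statement is executed
    ...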
It has been a feature since Python 3.0 that annotations are evaluated at runtime. And that means the possibility of side-effects. So, yes, it is already a feature. Even if you get the behaviour that you want, the absolute earliest it could happen would be after a deprecation period of at least one point release. That means:

* 3.7 introduces a DeprecationWarning whenever you use annotations which aren't simple names or strings;
* and possibly a __future__ import to give the new behaviour;
* and 3.8 would be the earliest it could be mandatory.

Forget about 3.6 -- that's already frozen apart from bug fixes, and this is not a bug.
and I got the feeling it is hardly controversial.
It is extremely controversial. The fact that you can say that it isn't suggests that you're not really paying attention to what we're saying. Even if what you ask for is easy (it isn't), or even possible, it still goes completely and utterly against the normal semantics of Python and the philosophy of the language. No, under normal circumstances nobody is going to write:

def func(arg: mylist.append(value) or int):
    ...

in production code. That's simply bad style. But we don't ban things just because they are bad style. Circumstances are not always normal, sometimes it is useful to use dirty hacks (but hopefully not in production code), and Python is not a B&D language where everything is prohibited unless explicitly allowed.
I really have no interest in wasting the time of anybody here.
And yet, despite receiving virtually no interest from any other person, you continue to loudly and frequently argue for this proposal. [...]
All expressions evaluate to a value. And all values in Python are objects. I don't understand what distinction you think you are making here. Are you suggesting that Python should gain some sort of values which aren't objects?
Wrong. Not always. The proof is that Python exists. Contracts, types, assertions etc in Python *are* written in Python. That's the end of the story. You cannot argue that "contracts are written in a different language" because that is untrue. Contracts are written in Python, and we wouldn't have it any other way.
The forward reference problem still exists in languages where type declarations are a separate language, e.g. Pascal, C++, Java, etc. http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4754974 http://stackoverflow.com/questions/951234/forward-declaration-of-nested-type... etc. There are many ways around it. One way is to make the language so simple that forward declarations aren't relevant. Another is to make multiple passes over the source code. Another is to introduce an explicit "forward" declaration, as in some dialects of Pascal. Python uses strings.
I believe the last point has a very good reason, as explained later: it is an interpretation of a different language,
But it *isn't* such a thing, nor should it be.
foreign to the interpreter, although sometimes close enough to be useful.
Sometimes close enough to be useful. Does that mean it is usually useless? *wink*
It is of course well formed, so the considerations are not really security-related.
You've talked about eval'ing the contents of __raw_annotations__. That means if somebody can fool you into storing arbitrary values into __raw_annotations__, then get you to call annotations() or use inspect, they can execute arbitrary code. How is this not a security concern? It might be hard to exploit, since it requires the victim to do something like:

myfunc.__raw_annotations__['arg'] = something_untrusted

but if exploited, the consequences are major: full eval of arbitrary code. In comparison, the only similar threat with annotations today is if the victim is fooled into building a string containing a def with annotations, then passing it to exec:

annot = something_untrusted
code = """def func(arg: %s):
    ...
    """ % annot
exec(code)

but if you're using exec on an untrusted string you have already lost. So annotations as they exist now aren't adding any new vulnerabilities. Still, the important thing here is not the (hard to exploit) potential vulnerability, but the fact that your proposal would lead to a massive increase in the complexity of the language (a whole new compiler/interpreter for the second, types-only, mini-language) and an equally major *decrease* in useful functionality. Have I mentioned that I'm against this? If not, I'm against it.

-- Steve
participants (15)
- Alexander Belopolsky
- Chris Angelico
- David Mertz
- Greg Ewing
- Ivan Levkivskyi
- Joseph Jevnik
- Neil Girdhar
- Nick Coghlan
- Oleg Broytman
- Paul Moore
- Sjoerd Job Postmus
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy
- אלעזר