pep-0484 - Forward references and Didactics - be orthogonal

Hi,

when looking at pep-0484 I find a *non-orthogonal* construction, which may lead to a misconception.

What students should be able to code:

1. variant

    #-------------wishful----------------------------------\
    class Tree:
        def __init__(self, left: Tree, right: Tree):
            self.left = left
            self.right = right

What students have to write instead:

    #-------------bad workaround----------------------------\
    class Tree:
        def __init__(self, left: 'Tree', right: 'Tree'):
            self.left = left
            self.right = right

Please enable

    from __future__ import annotations

so that the *first* variant becomes possible.

At this very moment (Python 3.5rc1) it is not possible, but we need it so that the construction is orthogonal from the students' point of view(!): _one_ concept should work in different circumstances.

TNX
Ludger Humbert
--
https://twitter.com/n770
http://ddi.uni-wuppertal.de/

Joseph Jevnik writes:
I would expect "class" to somehow redeclare the class name as it does when executed:
Regardless of the plausibility of the proposed behavior, the reason for the behavior defined by PEP 484 is explained there. Annotations (including type hints), like default values, are evaluated at the time of method *definition* per PEP 3107, and that value is bound to the default slot. OTOH, the class *declaration* is implemented by binding the name to the class, which occurs once the class object is available, i.e., *after* the methods are defined and added to the object.

Guido and Mark (the BDFL-Delegate) clearly considered the current situation to be acceptable if not elegant, and left elegance for a future PEP (which somebody else will have to write, I guess). There's no guarantee of acceptance, since it seems to involve a backward-incompatible change.

N.B. I don't think other users of annotations would be happy with "from __future__ import annotations", which could easily be taken as a deprecation of their use cases. The name of the __future__ import will probably have to be bikeshedded a bit. <wink/>
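A minimal sketch of the evaluation-order point, using default values: they are evaluated when the `def` statement runs (as annotations also were in the Python 3.5 discussed here), which is before the class statement has bound the name `Tree`. The helper `try_define` and the two source strings are hypothetical; each class statement is run with `exec` in a fresh namespace so that no outer `Tree` exists.

```python
def try_define(source):
    """Hypothetical helper: run a class statement in a fresh namespace."""
    namespace = {}
    try:
        exec(source, namespace)
    except NameError as exc:
        return "NameError: " + str(exc)
    return "defined"

# The default value is evaluated while the def statement inside the class
# body runs, i.e. before the class statement has bound the name Tree.
WITH_DEFAULT = """
class Tree:
    def __init__(self, left=Tree, right=Tree):
        self.left = left
        self.right = right
"""

# A string literal is just a value; it needs no binding for Tree yet.
WITH_STRING = """
class Tree:
    def __init__(self, left: 'Tree', right: 'Tree'):
        self.left = left
        self.right = right
"""

print(try_define(WITH_DEFAULT))  # NameError: name 'Tree' is not defined
print(try_define(WITH_STRING))   # defined
```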

On 24.08.2015 06:19, Prof. Dr. L. Humbert wrote:
How about not using type hints when teaching the basics of trees? Type hints do not replace good variable names. What about using left_tree instead of left and right_tree instead of right? That should simplify the example for the students, remove advanced concepts and teach them something about proper naming conventions, which is desirable these days when looking at production code (readability, maintainability and so forth).

Regards,
Sven R. Kunze

Sven R. Kunze writes:
That's somewhat unfair. I'll let Prof. Humbert explain his own thinking, but I can imagine a number of pedagogical contexts where I would use Python because it doesn't get in the programmer's way very often, but require students to provide type hints as a compact (and machine-checkable!) way of documenting that aspect of their design. I found his pedagogical approach perfectly plausible when he posted originally.

On Mon, Aug 24, 2015 at 06:19:47AM +0200, Prof. Dr. L. Humbert wrote: [...]
That would be very nice, but I don't see how it would be possible in a dynamic language with Python's rules. Can you explain how you expect this to work? In particular, what happens if Tree already has a value?

    Tree = "This is my Tree!"
    class Tree:
        def __init__(self, left: Tree, right: Tree):
            ...

What happens if you use Tree outside of an annotation?

    # somehow, we enable forward declarations
    class Tree:
        x = Tree  # What is the value of x here?

The difficulty is that annotations are not merely declarations to the compiler, they have runtime effects as well. If they were pure declarations, we could invent some ad hoc rule like "if an annotation is an unbound name, inside a class, and that name is the same as the class, then treat it as a forward declaration". But we can't, because the __init__ function object needs to set the annotations before the Tree class exists. So the annotation needs to be something that actually exists. In order for the annotation to use Tree (without quotation marks) the name Tree needs to be bound to some existing value, and that value is used as the annotation, not the Tree class.

If you can think of some way around this restriction, preferably one which is backwards-compatible (although that is not absolutely required) then please suggest it.
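Under the current rules the `x = Tree` question has a definite answer, which a short sketch can check: the class body runs before the class statement binds `Tree`, so any lookup inside the body sees whatever `Tree` meant beforehand.

```python
Tree = "This is my Tree!"

class Tree:
    # Evaluated while the class body runs: the name Tree still refers to
    # the string bound above, because the class object does not exist yet.
    x = Tree

# Only once the class statement finishes is Tree rebound to the class.
print(Tree.x)                  # This is my Tree!
print(isinstance(Tree, type))  # True
```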
I don't think this is a "bad" work-around. I think it is quite a good one. It is sad that we need a work-around, but given that we do, this is simple to use and learn: If the type already exists, you can annotate variables with the type itself. But if the type doesn't yet exist (say you are still constructing it), you will get a NameError, so you can use the name of the class as a string as a forward declaration:

    from typing import List

    class Tree:
        # At this point, Tree is still being constructed, and the class
        # doesn't yet exist, so we need to use a forward reference.
        def __eq__(self, other: 'Tree') -> bool:
            ...

    # At this point, the Tree class exists and no forward reference
    # is needed.
    def builder(data: List) -> Tree:
        ...
I agree that is desirable, but surely many languages have some sort of forward declaration syntax? I know that both the Pascal and C families of languages do. -- Steve

On Aug 24, 2015, at 19:52, Steven D'Aprano <steve@pearwood.info> wrote:
What would a forward declaration mean in Python? In C, a forward declaration for a struct tag specifies that it is a struct tag. You can reference "struct spam *" as a type after that, but you can't reference "struct spam", because you need the size for that, which doesn't exist yet. You can't dereference a spam or access a member of a spam. The only thing you know is that a thing called struct spam exists, and is a struct type rather than a function type or native value typedef. That wouldn't do any good in Python. To be useful, it would have to mean something very different. For example, it could bind the name to some magic marker that means "after something else is bound to this name, go back and fix up everything that made a reference to this magic marker to refer to the bound value instead". (Presumably any method on the marker value just raises a NoValueYetException or something.)

On Tue, Aug 25, 2015 at 01:35:24AM -0700, Andrew Barnert wrote:
I thought it was obvious from context, not to mention from the example given by the OP. It's a reference to something that doesn't exist yet, namely the class still in the process of being created. E.g.:

    class Tree:
        def merge(self, other: 'Tree') -> 'Tree':
            ...

The string 'Tree' is a forward reference to the Tree class, as far as either the type-checker or a human reader is concerned. The annotations will, of course, be strings. But they will be understood as a reference to the Tree class. I mean reference in the sense of "to refer to", not in the technical sense of "pointer".

Aside: we could use a decorator which replaces all annotations of the form 'Tree' with the actual Tree class itself. For example:

    import inspect

    def decorate(cls):
        # Replace string annotations equal to the class name with the class.
        for method in vars(cls).values():
            if inspect.isfunction(method):
                for key, val in method.__annotations__.items():
                    if val == cls.__name__:
                        method.__annotations__[key] = cls
        return cls

    @decorate
    class Tree:
        ...

This may be useful for runtime introspection, but it comes too late to be of any use to any type-checker that runs at compile-time or earlier.
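For runtime introspection the standard library already provides a non-mutating version of this lookup: `typing.get_type_hints` evaluates string annotations in the function's module globals. A minimal sketch, assuming the class is defined at module level so that the name `Tree` is resolvable there:

```python
from typing import get_type_hints

class Tree:
    def merge(self, other: 'Tree') -> 'Tree':
        ...

# The raw annotations are still the strings written in the source ...
print(Tree.merge.__annotations__)  # {'other': 'Tree', 'return': 'Tree'}

# ... but get_type_hints resolves them against the module's globals,
# where Tree is bound by the time this call runs.
hints = get_type_hints(Tree.merge)
print(hints["other"] is Tree, hints["return"] is Tree)  # True True
```

Unlike the decorator, this leaves `__annotations__` untouched; like the decorator, it only helps at runtime, not a static checker.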
You're overcomplicating this. (Snarky comments regarding "a-strings" for annotations can go straight to /dev/null :-)

Both PEP 484 and mypy call "use the class name as a string as a stand in for the actual class" a "forward reference":

https://www.python.org/dev/peps/pep-0484/#forward-references
http://mypy.readthedocs.org/en/latest/kinds_of_types.html#class-name-forward...

and the OP's example of annotations in the Tree class comes straight out of the PEP. I am sorry if I misled you by being sloppy and calling them "forward declaration" sometimes.

-- 
Steve

On Aug 25, 2015, at 09:56, Steven D'Aprano <steve@pearwood.info> wrote:
I thought it was obvious, until you brought up C and Pascal, whose forward references are a pretty different thing from what PEP 484 and the OP's example imply, and whose compilation process is radically different from Python's. If you meant the same thing as the PEP, then the shorter answer is: I don't think there's anything useful to learn from C here. I think people have a sense of what it would mean to do what the OP wants, or at least more so than what it would mean to port the vaguely similar idea from C.

On Tue, Aug 25, 2015 at 02:24:45PM -0700, Andrew Barnert wrote:
In context, I was explicitly replying to the OP's comment about "needing" to annotate methods with the class object itself, rather than using a string, because "_one_ concept should work in different circumstances". I was pointing out that other languages make do with two concepts, and have their own ways of dealing with the problem of referring to something which doesn't exist yet. I wasn't suggesting that we copy what C, or any other language, does.

To be honest, I thought that my post was pretty clear that, far from thinking there is a problem to be solved, I consider the use of string literals like 'Tree' not just an acceptable solution to the problem, but an elegant one. As I see it:

- adding some sort of complicated, ad hoc special case to allow forward references would be a nasty hack and should be rejected;

- large changes to the language (e.g. swapping to a two-pass compile process, to allow function and class hoisting) would eliminate the problem but break backwards compatibility, and are a huge change for such a minor issue.

I don't see this as needing anything more than teaching the students how Python's execution model actually works, plus a simple work-around for annotations within a class (use the class name as a string).

-- 
Steve

On 8/24/2015 12:19 AM, Prof. Dr. L. Humbert wrote:
As you should know, at least after reading previous responses, making this work would require one of two major changes to Python class statements.

1. The class name has special (context-sensitive) meaning in enclosed def statements. The compiler would have to compile def statements differently than it would the same def statements not in a Tree class. It would then have to patch all methods after the class is created. See the annofix function below. A proposal to make the definition name of a function special within its definition has already been rejected.

2. Class statements would initially create an empty class bound to the class name. This could break backward compatibility, and would require cleanup in case of a syntax error in the body. This would be similar to import statements initially putting an empty module in sys.modules to support circular imports. This is messy and still bug-prone in use.
You did not say why you think this is bad. Is it a) students have to type "'"s?, or b) the resulting annotations are strings instead of the class? The latter can easily be fixed.

---
    from types import FunctionType

    def annofix(klass):
        classname = klass.__name__
        for ob in klass.__dict__.values():
            if type(ob) is FunctionType:
                annotations = ob.__annotations__
                for arg, anno in annotations.items():
                    if anno == classname:
                        annotations[arg] = klass
        return klass

    @annofix
    class Tree:
        def __init__(self, left: 'Tree', right: 'Tree'):
            self.left = left
            self.right = right

    print(Tree.__init__.__annotations__)
    # {'left': <class '__main__.Tree'>, 'right': <class '__main__.Tree'>}
---

An alternative is to use a placeholder object instead of the class name. This is less direct, but not repeating the name of the class throughout the definition makes it easier to rename the class or copy methods to another class.

---
    class Klass: pass  # An annotation object meaning 'the class this method is defined in'

    def annofix2(klass):
        for ob in klass.__dict__.values():
            if type(ob) is FunctionType:
                annotations = ob.__annotations__
                for arg, anno in annotations.items():
                    if anno == Klass:
                        annotations[arg] = klass
        return klass

    @annofix2
    class Tree2:
        def __init__(self, left: Klass, right: Klass):
            self.left = left
            self.right = right

    print(Tree2.__init__.__annotations__)
    # {'right': <class '__main__.Tree2'>, 'left': <class '__main__.Tree2'>}
---

-- 
Terry Jan Reedy

On 25.08.2015 17:15, Terry Reedy wrote:
OK, the short answer: a). Please let me explain a bit:
First: pedagogical/didactic thinking … Consider: there are recursively defined ADTs, and we want to enable students to understand the concepts and produce Python code that realizes what they have understood. The main point: once students have understood that it is possible to place type hints on the arguments and results of functions/methods, they should be able to reuse the notation in an orthogonal manner. For example:
When it comes to recursive ADTs they should be able to write
TNX
Ludger
--
https://twitter.com/n770
http://ddi.uni-wuppertal.de/

On 25.08.2015 18:01, Prof. Dr. L. Humbert wrote:
I am sorry about going back to this, but why not teach this in a different lesson?
I still think, only looking at recursive ADTs, it is enough for them to write

    class Tree:
        def __init__(self, left_tree, right_tree):
            self.left_tree = left_tree
            self.right_tree = right_tree

This way, you can teach them about code style, proper variable names and so forth.

Regards,
Sven R. Kunze

Sven R. Kunze writes:
Personally, I think that's ugly and unnecessarily verbose. YMMV, but I don't see why anybody else's sense of style should bow to yours.

Also, if you're using Python, your abstraction is broken. I suspect students will write

    from tree import Tree
    my_tree = Tree(1, Tree(2, 3))

*which should fail* because Tree is not a union type, but your "ADTs by naming convention" approach can't catch that. Of course, in a real program "Although practicality beats purity" would argue that's a feature, but if you're teaching ADTs it's a bug.

Also of course, under strict typing Prof. Humbert's example cannot be instantiated (since there's no default for the subtrees, you need to create an infinitely deep Tree of Trees of Trees ...). I doubt he intended that, but when students graduate to languages like Haskell, they're going to need to understand that kind of thing.
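One way to make such a Tree instantiable, sketched here as an illustration rather than as Prof. Humbert's code: make the subtrees optional, with `None` standing for "no subtree", so the recursion can bottom out. The `size` method is a hypothetical addition to show that a finite tree exists. The hints are not enforced at runtime; a static checker such as mypy would be expected to flag a call like `Tree(1, Tree(2, 3))`.

```python
from typing import Optional

class Tree:
    def __init__(self, left: Optional['Tree'] = None,
                 right: Optional['Tree'] = None):
        # None marks a missing subtree, so a finite tree can be built.
        self.left = left
        self.right = right

    def size(self) -> int:
        """Hypothetical helper: count the nodes in this tree."""
        return 1 + sum(child.size() for child in (self.left, self.right)
                       if child is not None)

leaf = Tree()
my_tree = Tree(leaf, Tree(Tree(), None))
print(my_tree.size())  # 4
```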

Prof. Dr. L. Humbert writes:
I think we all understand your point. The problem is that Python as designed is simply not capable of that. You have a choice: python-code with its "class statements bind names to (fully-constructed) class objects" semantics, or pseudo-python with "class statements are declarations" semantics. Python simply isn't a declarative language in that sense. I don't have any objection to a language with different semantics, but I find Python's semantics very consistent (except for module import -- which is improved a lot thanks to Brett -- and the occasional class which creates attributes in __getattr__ -- which I dislike for this reason). I doubt Python-Dev will want to give up that consistency; I know I don't want to.

Although the PEP doesn't explicitly say it's a good practice, AFAICS using the name of a type (ie, a string) is supported everywhere a type identifier is. (Explicitly permitted are *forward* references to *undefined* types. However, in the "Django" example where "A.a" is the name of a type defined in module A, "B.b" is defined in B, and each uses the other, in

    import A
    import B   # refers to A.a using the string "A.a"

A.a is an existing type. I conclude that unless the implementation is excessively complicated, it's permitted to refer to already defined types using the string name.) In other words, although

    class Leaf():
        def __init__(self, value: int):
            self.value = value

    class Tree:  # with leaves
        def __init__(self, left: Union['Tree', Leaf], right: Union['Tree', Leaf]):
            self.left = left
            self.right = right

is indeed non-orthogonal and ugly, you could declare the constructors

    def __init__(self, value: 'int'):
    def __init__(self, left: 'Union[Tree, Leaf]', right: 'Union[Tree, Leaf]'):

using actual names of types (strings like "'Tree'") instead of bound names (identifiers like "Tree") everywhere. That may not be quite as clean as you'd like, but it seems orthogonal enough to me: you have a consistent syntax for all type annotations.
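A quick way to check that the all-string spelling denotes the same types is to let `typing.get_type_hints` evaluate the strings. A sketch, assuming the classes live at module level so that `Tree`, `Leaf`, and `Union` are all bound in the module globals when the hints are evaluated:

```python
from typing import Union, get_type_hints

class Leaf:
    def __init__(self, value: 'int'):
        self.value = value

class Tree:  # with leaves; every annotation spelled as a string
    def __init__(self, left: 'Union[Tree, Leaf]', right: 'Union[Tree, Leaf]'):
        self.left = left
        self.right = right

# The strings are evaluated in the module's globals, where Tree, Leaf,
# and Union are all bound once the class statements have run.
print(get_type_hints(Leaf.__init__))        # {'value': <class 'int'>}
hints = get_type_hints(Tree.__init__)
print(hints["left"] == Union[Tree, Leaf])   # True
```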
I'd also point out that your own notation isn't quite orthogonal: self isn't annotated in your method definitions. If students can handle the special syntax for "self", I suppose that they can handle a special syntax for recursively defined types (or you could use Steven d'A's approach of a placeholder class "RecursivelyDefined", which would require augmenting the typechecker to recognize it). Steve

On Tue, Aug 25, 2015 at 11:15 AM, Terry Reedy <tjreedy@udel.edu> wrote:
I have been thinking about this lately in a different context, and I would very much favor this approach. I think in large part because it works this way for modules it would make sense for it to work for classes as well. The fact that ClassName is bound to an object that will *eventually* become the class as soon as the parser has read in: class ClassName: represents, to me (and I would suspect to many students as well), the least astonishment. I realize it would be a very non-trivial change, however.
What about:
A little ugly, and potentially error-prone (but only, I think, in exceptional cases). It's also a decent opportunity to teach something about forward-declaration, which I think is worth knowing about. And I think this makes what's going on clearer than the string-based workaround. I didn't follow every single thread about PEP-484 though and I don't know if, or why this approach to forward-declaration was rejected. Erik

On 8/25/2015 12:19 PM, Erik Bray wrote:
'in use'.
It might be more useful to have def statements work that way (bind name to blank function object). Then

    def fac(n, _fac=fac):  # less confusing than 'fac=fac'
        return _fac(n-1)*n if n > 1 else 1

would actually be recursive regardless of external name bindings. But as is the case with modules, exposing incomplete objects easily leads to buggy code, such as

    def f(n, code=f.__code__): pass
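For contrast, a sketch of today's behavior: an ordinary recursive function looks up its own name in the module globals on every call, so rebinding that name redirects the recursion, which is the dependence Terry's hypothetical `_fac=fac` would remove. A closure keeps its own binding instead. The names `make_fac`, `original_fac`, and `stable_fac` are illustrative only.

```python
def fac(n):
    # The name fac is looked up in the module globals on every call.
    return fac(n - 1) * n if n > 1 else 1

def make_fac():
    # Illustrative helper: _fac is a closure variable of make_fac, not a
    # global, so rebinding the module-level name fac cannot redirect it.
    def _fac(n):
        return _fac(n - 1) * n if n > 1 else 1
    return _fac

original_fac = fac
stable_fac = make_fac()

def fac(n):  # rebind the external name
    return 0

print(original_fac(5))  # 0: its inner call now reaches the new fac
print(stable_fac(5))    # 120
```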
I like this better than my decorator version. Notice that if Python were changed so that 'Tree' were bound to a blank class first thing, then Tree(Tree) would be subclassing itself, breaking code like the above unless a special rule was added to remove a class from its list of subclasses. -- Terry Jan Reedy

On Aug 25, 2015, at 09:19, Erik Bray <erik.m.bray@gmail.com> wrote:
The problem here is, what if someone writes this:

    def __init__(self, left: Tree, right: Tree):
        # something with left.left

Or:

    @classmethod
    def maketree(cls):
        return Tree(None, None)

Here, Tree is "defined", but the type checker can't actually infer the type of left.left or the arguments of Tree's constructor (even if __init__ was defined before maketree).

There are various ways you could special-case things to deal with this problem. The simplest would be that a forward-declared class just has no methods or other attributes, or maybe that it has only the ones inherited from superclasses or metaclasses, until the definition is completed, but my naive intuition says that it's obvious what both of the above mean, and the only reason I'd expect it to be an error is by understanding how it has to work under the covers.

On 25.08.2015 17:15, Terry Reedy wrote:
Although I do not agree with the intentions of the OP, I would love to have "more forward references" in Python. I think the main issue here is the gap between intuition and what the compiler actually does. The following line:

    class MyClass:  # first appearance of MyClass

basically creates MyClass in the mind of the developer reading this piece of code. Thus, he expects to be able to use it after this line. However, Python assigns the class to the name MyClass only at the end of the class definition. Thus, it is usable only after that. People get around this (especially since one doesn't need it that often), but it still feels... different.

Best,
Sven

On 8/25/2015 1:48 PM, Sven R. Kunze wrote:
I think the gap is less than you think ;-). Or maybe we think differently when reading code. Both human and compiler create the concept 'MyClass' (properly quoted) as an instance of the concept 'class'. In a static language like C, types are only concepts in the minds of programmers and compilers. There are no runtime char, int, float, or struct xyz objects, only the names or concepts. When the compiler is done, there are only bytes in a sense not true of Python.
Thus, he expects to be able to use it after this line.
One can use the string 'MyClass' in an annotation, for instance, and eventually dereference it to the object after the object is created. A smart type checker could understand that 'MyClass' in annotations within the class MyClass statement means instances of the future MyClass object. A developer should not expect to use not-yet-existent attributes and methods of the object. -- Terry Jan Reedy

On 26.08.2015 07:50, Terry Reedy wrote:
A developer should not expect to use not-yet-existent attributes and methods of the object.
Unfortunately, that is where I disagree. The definition of "not-yet-existent attribute" can vary from developer to developer.

On Aug 26, 2015, at 13:22, Sven R. Kunze <srkunze@mail.de> wrote:
But it has to mean something, and what it means dramatically affects what the code does. That's why Python has a simple rule: a straightforward imperative execution order that means you can easily tell whether the attribute was created before use.

Also note that in Python, attributes can be added, replaced, and removed later, and their values can be mutated later. So the notion of "later" has to be simple to understand. Just saying "evaluate these statements in some order that's legal" doesn't work when some of those statements can be mutating state. In Python, a statement can create, replace, or destroy attributes of the module or any other object, and even an expression can mutate the values of those attributes. And in fact that's what everything in Python is doing, even declarative-looking statements like class, so you can't just block mutation, you have to deal with it as a fundamental thing.

Besides fully compiler-driven evaluation order, there are two obvious alternatives that let you keep linear order, but make sense of using values before they're created: lazy evaluation, as in Haskell, and dataflow evaluation, as in Oz. Maybe one of those is what you want here. But trying to fit either of those together with mutable objects sensibly is not trivial, nor is it trivial to fit them together with the kind of dynamic OO that Python provides, much less both in one. I'd love to see what someone could come up with by pursuing either of those, but I suspect it wouldn't feel much like Python.

On Tue, Aug 25, 2015 at 07:48:39PM +0200, Sven R. Kunze wrote:
Intuition according to whom? Some people expect that. Others do not. People hold all sorts of miscomprehensions and misunderstandings about the languages they use, and Python is no different.
To me, it feels intuitive and natural. Of course you can't use the class until after you have finished creating it. To me, alternatives like Javascript's function hoisting feel weird. This looks like time travel:

    // print is provided by the Rhino JS interpreter
    var x = f();
    print(x);
    // multiple pages later
    function f() {return "Hello World!";};

How can you call a function that doesn't exist yet? There are even stranger examples, but for the sake of brevity let's just say that what seems "intuitive" to one person may be "weird" to another.

With one or two minor exceptions, the Python interactive interpreter behaves identically to the non-interactive interpreter. If you have valid Python code, you can run it interactively. The same can't be said for Javascript. You can't run the above example interactively without *actual* time travel; if you try, it fails:

    [steve@ando ~]$ rhino
    Rhino 1.7 release 0.7.r2.3.el5_6 2011 05 04
    js> var x = f();
    js: "<stdin>", line 2: uncaught JavaScript runtime exception: ReferenceError: "f" is not defined.
        at <stdin>:2

A nice, clean, easy to understand execution model is easy to reason about. Predictability is much more important than convenience: I much prefer code which does what I expect over code that saves me a few characters, or lines, of typing, but surprises me by acting in a way I didn't expect. The fewer special cases I have to learn, the more predictable the language and the less often I am surprised.

Python treats functions and classes as ordinary values bound to ordinary names in the ordinary way: the binding doesn't occur until the statement is executed. I like it that way.

-- 
Steve
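The ordinary binding rule can be observed directly in a short sketch, assuming the code runs at module level: inside the class body the name is not yet bound, but method bodies run only when called, after the class statement has finished. The attribute name `seen_during_body` is illustrative only.

```python
class Tree:
    # Runs while the class statement executes: Tree is not bound yet.
    # (seen_during_body is an illustrative attribute name.)
    seen_during_body = "Tree" in globals()

    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

    @classmethod
    def maketree(cls):
        # A method *body* runs only when called, after the class statement
        # has finished, so the global name Tree resolves to the class here.
        return Tree(None, None)

print(Tree.seen_during_body)          # False
print(type(Tree.maketree()) is Tree)  # True
```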

On 8/26/2015 10:43 PM, Steven D'Aprano wrote:
So do I. The same is true of import statements -- the binding of the name to the module does not happen until the module is built. It happens that the import machinery has a cache where it sticks an initially empty module in case of circular imports. But that is normally invisible to the code with the import statement.

-- 
Terry Jan Reedy

Dear colleagues,

for me the arguments are quite clear, and I don't want to change much of the underlying work to get my preferred notation ;-) So I thought of an alternative approach for learning while staying orthogonal. But I found another showstopper for being orthogonal. It is indeed possible to run the following code:

    from typing import List

    class Tree:
        def __init__(self, left: 'Tree', right: 'Tree'):
            self.left = left
            self.right = right

        def leaves(self) -> List['Tree']:
            return []

    def greeting(name: 'str') -> 'str':
        return 'Hello ' + name

but not

    def leaves(self) -> 'List'['Tree']:
        …

which would be orthogonal when deciding to put all used types in '…'. Perhaps there will be a chance to make this a valid construction?

be orthogonal ;-)
Ludger
--
https://twitter.com/n770
http://ddi.uni-wuppertal.de/

Prof. Dr. L. Humbert <humbert@...> writes:
The issue is that Python does not have separate type/value universes. 'Tree' is just a type hint, not a type in the conventional sense. I would very much like it if Python *did* have separate types/values, so that one could write (OCaml):

    class tree (left : tree) (right : tree) =
      object
        val left = left
        val right = right
      end ;;

Which is an uninhabited type, since you need a tree to construct a tree! :) Thus:

    class tree (left : tree option) (right : tree option) =
      object
        val left = left
        val right = right
      end ;;

Stefan Krah

On 28.08.2015 15:58, Petr Viktorin wrote:
…
pv> You can put the entire hint in a string:
pv>
pv>     def leaves(self) -> 'List[Tree]':

TNX … this solves the problem in this example and makes it orthogonal.

The next showstopper will come when working on/with data structures that contain entangled class structures, perhaps the instantiation of a class, where we have to use

    self.node = Node(…)

but not

    self.node = 'Node'(…)

So I think we as educators have to live with this pedagogically »suboptimal« solution(s), communicate the non-orthogonal notation, and make clear what the reason is.

TNX
Ludger
--
https://twitter.com/n770
http://ddi.uni-wuppertal.de/
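The difference between the two spellings can be checked in a small sketch: `'List'['Tree']` is an ordinary expression that indexes one `str` with another, which raises `TypeError` whenever it is evaluated, whereas `'List[Tree]'` is a single string that `typing.get_type_hints` can evaluate once `Tree` exists. The variable names `container`, `item`, and `outcome` are illustrative; the two strings are held in variables only to reproduce that expression without a literal subscript.

```python
from typing import List, get_type_hints

class Tree:
    def leaves(self) -> 'List[Tree]':  # the whole hint is one string
        return []

# Illustrative names: this reproduces what 'List'['Tree'] evaluates to,
# i.e. indexing a str with a str, which raises TypeError.
container, item = 'List', 'Tree'
try:
    container[item]
    outcome = "no error"
except TypeError as exc:
    outcome = "TypeError: " + str(exc)
print(outcome)

# The single-string hint resolves to typing's List[Tree].
print(get_type_hints(Tree.leaves))
```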

Joseph Jevnik writes:
I would expect "class" to somehow redeclare the class name as it does when executed:
Regardless of the plausibility of the proposed behavior, the reason for the behavior defined by PEP 484 is explained there. Annotations (including type hints), like default values, are evaluated at the time of method *definition* per PEP 3107, and that value is bound to the default slot. OTOH, the class *declaration* is implemented by binding the name to the class, which occurs once the class object is available, ie, *after* the methods are defined and added to the object. Guido and Mark (the BDFL-Delegate) clearly considered the current situation to be acceptable if not elegant, and left elegance for a future PEP (which somebody else will have to write, I guess). There's no guarantee of acceptance, since it seems to involve a backward- incompatible change. N.B. I don't think other users of annotations would be happy with "from __future__ import annotations", which could easily be taken as a deprecation of their use cases. The name of the __future__ import will probably have to be bikeshedded a bit. <wink/>

On 24.08.2015 06:19, Prof. Dr. L. Humbert wrote:
How about not using type hints when teaching the basics of trees? Type hints do not replace good variables names. What about using left_tree instead of left and right_tree instead of right? That should simplify the example for the students, remove advanced concepts and teach them something about right naming (conventions) which is desirable these days when looking at production code (readability, maintainability and so forth). Regards, Sven R. Kunze

Sven R. Kunze writes:
That's somewhat unfair. I'll let Prof. Humbert explain his own thinking, but I can imagine a number of pedagogical contexts where I would use Python because it doesn't get in the programmer's way very often, but require students to provide type hints as a compact (and machine-checkable!) way of documenting that aspect of their design. I found his pedagogical approoach perfectly plausible when he posted originally.

On Mon, Aug 24, 2015 at 06:19:47AM +0200, Prof. Dr. L. Humbert wrote: [...]
That would be very nice, but I don't see how it would be possible in a dynamic language with Python's rules. Can you explain how you expect this to work? In particular, what happens if Tree already has a value? Tree = "This is my Tree!" class Tree: def __init__(self, left: Tree, right: Tree): ... What happens if you use Tree outside of an annotation? # somehow, we enable forward declarations class Tree: x = Tree # What is the value of x here? The difficulty is that annotations are not merely declarations to the compiler, they have runtime effects as well. If they were pure declarations, we could invent some ad hoc rule like "if an annotation is an unbound name, inside a class, and that name is the same as the class, then treat it as a forward declaration". But we can't, because the __init__ function object needs to set the annotations before the Tree class exists. So the annotation needs to be something that actually exists. In order for the annotation to use Tree (without quotation marks) the name Tree needs to be bound to some existing value, and that value is used as the annotation, not the Tree class. If you can think of some way around this restriction, preferably one which is backwards-compatible (although that is not absolutely required) then please suggest it.
I don't think this is a "bad" work-around. I think it is quite a good one. It is sad that we need a work-around, but given that we do, this is simple to use and learn: If the type already exists, you can annotate variables with the type itself. But if the type doesn't yet exist (say you are still constructing it), you will get a NameError, so you can use the name of the class as a string as a forward declaration: class Tree: # At this point, Tree is still being constructed, and the class # doesn't yet exist, so we need to use a forward reference. def __eq__(self, other: 'Tree') -> Bool: ... # At this point, the Tree class exists and no forward reference # is needed. def builder(data: List) -> Tree: ...
I agree that is desirable, but surely many languages have some sort of forward declaration syntax? I know that both the Pascal and C families of languages do. -- Steve

On Aug 24, 2015, at 19:52, Steven D'Aprano <steve@pearwood.info> wrote:
What would a forward declaration mean in Python? In C, a forward declaration for a struct tag specifies that it is a struct tag. You can reference "struct spam *" as a type after that, but you can't reference "struct spam", because you need the size for that, which doesn't exist yet. You can't dereference a spam or access a member of a spam. The only thing you know is that a thing called struct spam exists, and is a struct type rather than a function type or native value typedef. That wouldn't do any good in Python. To be useful, it would have to mean something very different. For example, it could bind the name to some magic marker that means "after something else is bound to this name, go back and fix up everything that made a reference to this magic marker to refer to the bound value instead". (Presumably any method on the marker value just raises a NoValueYetException or something.)

On Tue, Aug 25, 2015 at 01:35:24AM -0700, Andrew Barnert wrote:
I thought it was obvious from context, not to mention from the example given by the OP. It's a reference to something that doesn't exist yet, namely the class still in the process of being created. E.g.:

    class Tree:
        def merge(self, other: 'Tree') -> 'Tree':
            ...

The string 'Tree' is a forward reference to the Tree class, as far as either the type-checker or a human reader is concerned. The annotations will, of course, be strings. But they will be understood as a reference to the Tree class. I mean reference in the sense of "to refer to", not in the technical sense of "pointer".

Aside: we could use a decorator which replaces all annotations of the form 'Tree' with the actual Tree class itself. In pseudo-code:

    def decorate(cls):
        for each method in cls:
            for key, val in method.__annotations__.items():
                if val == cls.__name__:
                    method.__annotations__[key] = cls
        return cls

    @decorate
    class Tree:
        ...

This may be useful for runtime introspection, but it comes too late to be of any use to any type-checker that runs at compile-time or earlier.
You're overcomplicating this. (Snarky comments regarding "a-strings" for annotations can go straight to /dev/null :-) Both PEP 484 and mypy call "use the class name as a string as a stand-in for the actual class" a "forward reference": https://www.python.org/dev/peps/pep-0484/#forward-references http://mypy.readthedocs.org/en/latest/kinds_of_types.html#class-name-forward... and the OP's example of annotations in the Tree class comes straight out of the PEP. I am sorry if I misled you by being sloppy and calling them "forward declarations" sometimes. -- Steve

On Aug 25, 2015, at 09:56, Steven D'Aprano <steve@pearwood.info> wrote:
I thought it was obvious, until you brought up C and Pascal, whose forward references are a pretty different thing from what PEP 484 and the OP's example imply, and whose compilation process is radically different from Python's. If you meant the same thing as the PEP, then the shorter answer is: I don't think there's anything useful to learn from C here. I think people have a sense of what it would mean to do what the OP wants, or at least more so than what it would mean to port the vaguely similar idea from C.

On Tue, Aug 25, 2015 at 02:24:45PM -0700, Andrew Barnert wrote:
In context, I was explicitly replying to the OP's comment about "needing" to annotate methods with the class object itself, rather than using a string, because "_one_ concept should work in different circumstances". I was pointing out that other languages make do with two concepts, and have their own ways of dealing with the problem of referring to something which doesn't exist yet. I wasn't suggesting that we copy what C, or any other language, does. To be honest, I thought that my post made it pretty clear that, far from thinking there is a problem to be solved, I consider the use of string literals like 'Tree' not just an acceptable solution to the problem, but an elegant one. As I see it:

- adding some sort of complicated, ad hoc special case to allow forward references would be a nasty hack and should be rejected;
- large changes to the language (e.g. swapping to a two-pass compile process, to allow function and class hoisting) would eliminate the problem, but would break backwards compatibility and be a huge change for such a minor issue.

I don't see this as needing anything more than teaching the students how Python's execution model actually works, plus a simple work-around for annotations within a class (use the class name as a string). -- Steve

On 8/24/2015 12:19 AM, Prof. Dr. L. Humbert wrote:
As you should know, at least after reading previous responses, making this work would require one of two major changes to Python class statements.

1. The class name has special (context-sensitive) meaning in enclosed def statements. The compiler would have to compile def statements differently than it would the same def statements not in a Tree class. It would then have to patch all methods after the class is created. See the annofix function below. A proposal to make the definition name of a function special within its definition has already been rejected.

2. Class statements would initially create an empty class bound to the class name. This could break backward compatibility, and would require cleanup in case an exception is raised while executing the body. This would be similar to import statements initially putting an empty module in sys.modules to support circular imports. This is messy and still bug-prone in use.
You did not say why you think this is bad. Is it a) that students have to type "'"s, or b) that the resulting annotations are strings instead of the class? The latter can easily be fixed.
---
    from types import FunctionType

    def annofix(klass):
        classname = klass.__name__
        for ob in klass.__dict__.values():
            if type(ob) is FunctionType:
                annotations = ob.__annotations__
                for arg, anno in annotations.items():
                    if anno == classname:
                        annotations[arg] = klass
        return klass

    @annofix
    class Tree:
        def __init__(self, left: 'Tree', right: 'Tree'):
            self.left = left
            self.right = right

    print(Tree.__init__.__annotations__)
    # {'left': <class '__main__.Tree'>, 'right': <class '__main__.Tree'>}
---
An alternative is to use a placeholder object instead of the class name. This is less direct, but not repeating the name of the class throughout the definition makes it easier to rename the class or copy methods to another class.
---
    class Klass: pass  # An annotation object meaning 'the class this method is defined in'

    def annofix2(klass):
        for ob in klass.__dict__.values():
            if type(ob) is FunctionType:
                annotations = ob.__annotations__
                for arg, anno in annotations.items():
                    if anno == Klass:
                        annotations[arg] = klass
        return klass

    @annofix2
    class Tree2:
        def __init__(self, left: Klass, right: Klass):
            self.left = left
            self.right = right

    print(Tree2.__init__.__annotations__)
    # {'right': <class '__main__.Tree2'>, 'left': <class '__main__.Tree2'>}
--
Terry Jan Reedy

On 25.08.2015 17:15, Terry Reedy wrote:
OK, the short answer: a). Please let me explain it a bit.
First and foremost, pedagogical/didactic thinking … Consider: there are recursively defined ADTs, and we want to enable students to understand the concepts and to write Python code that realizes what they have understood. The main point: once students have understood that it is possible to place type hints on the arguments and results of functions/methods, they should be able to reuse the notation in an orthogonal manner. For example:
When it comes to recursive ADTs, they should be able to write
TNX Ludger -- https://twitter.com/n770 http://ddi.uni-wuppertal.de/

On 25.08.2015 18:01, Prof. Dr. L. Humbert wrote:
Sorry for going back to this, but why not teach this in a different lesson?
I still think, looking only at recursive ADTs, that it is enough for them to write

    class Tree:
        def __init__(self, left_tree, right_tree):
            self.left_tree = left_tree
            self.right_tree = right_tree

This way, you can teach them about code style, proper variable names and so forth. Regards, Sven R. Kunze

Sven R. Kunze writes:
Personally, I think that's ugly and unnecessarily verbose. YMMV, but I don't see why anybody else's sense of style should bow to yours. Also, if you're using Python, your abstraction is broken. I suspect students will write

    from tree import Tree
    my_tree = Tree(1, Tree(2, 3))

*which should fail* because Tree is not a union type, but your "ADTs by naming convention" approach can't catch that. Of course, in a real program "Although practicality beats purity" would argue that's a feature, but if you're teaching ADTs it's a bug. Also of course, under strict typing Prof. Humbert's example cannot be instantiated (since there's no default for the subtrees, you need to create an infinitely deep Tree of Trees of Trees ...). I doubt he intended that, but when students graduate to languages like Haskell, they're going to need to understand that kind of thing.
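Stephen's objection can be made concrete with a hypothetical sketch (not from the thread): a runtime `isinstance` check rejects `Tree(1, Tree(2, 3))`, and `None` defaults, annotated as `Optional['Tree']`, make finite trees constructible.

```python
from typing import Optional

class Tree:
    """Binary tree whose subtrees are Tree instances or None (sketch)."""

    def __init__(self, left: Optional['Tree'] = None,
                 right: Optional['Tree'] = None):
        # The global name Tree is looked up when __init__ runs, long after
        # the class statement bound it, so isinstance can use it here.
        for child in (left, right):
            if child is not None and not isinstance(child, Tree):
                raise TypeError("subtree must be a Tree or None, not "
                                + type(child).__name__)
        self.left = left
        self.right = right

# A finite tree: None marks a missing subtree.
ok = Tree(Tree(), Tree(Tree(), None))

caught = None
try:
    my_tree = Tree(1, Tree(2, 3))  # the student's call from above
except TypeError as exc:
    caught = exc

print(caught)  # subtree must be a Tree or None, not int
```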

Prof. Dr. L. Humbert writes:
I think we all understand your point. The problem is that Python as designed is simply not capable of that. You have a choice: python-code with its "class statements bind names to (fully-constructed) class objects" semantics, or pseudo-python with "class statements are declarations" semantics. Python simply isn't a declarative language in that sense. I don't have any objection to a language with different semantics, but I find Python's semantics very consistent (except for module import -- which has improved a lot thanks to Brett -- and the occasional class which creates attributes in __getattr__ -- which I dislike for this reason). I doubt Python-Dev will want to give up that consistency; I know I don't want to.

Although the PEP doesn't explicitly say it's a good practice, AFAICS using the name of a type (ie, a string) is supported everywhere a type identifier is. (Explicitly permitted are *forward* references to *undefined* types. However, in the "Django" example where "A.a" is the name of a type defined in module A, "B.b" is defined in B, and each uses the other, in

    import A
    import B  # refers to A.a using the string "A.a"

A.a is an existing type. I conclude that unless the implementation is excessively complicated, it's permitted to refer to already defined types using the string name.) In other words, although

    class Leaf():
        def __init__(self, value: int):
            self.value = value

    class Tree:  # with leaves
        def __init__(self, left: Union['Tree', Leaf], right: Union['Tree', Leaf]):
            self.left = left
            self.right = right

is indeed non-orthogonal and ugly, you could declare the constructors

    def __init__(self, value: 'int'):
    def __init__(self, left: 'Union[Tree, Leaf]', right: 'Union[Tree, Leaf]'):

using actual names of types (strings like "'Tree'") instead of bound names (identifiers like "Tree") everywhere. That may not be quite as clean as you'd like, but it seems orthogonal enough to me: you have a consistent syntax for all type annotations.
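A sketch of the all-strings style, assuming Python 3.7 or later: every hint is stored as a string, and `typing.get_type_hints` evaluates them in the module's namespace, so existing types (`int`, `Leaf`) and the not-yet-finished `Tree` are treated alike.

```python
from typing import Union, get_type_hints

class Leaf:
    def __init__(self, value: 'int'):
        self.value = value

class Tree:  # with leaves
    def __init__(self, left: 'Union[Tree, Leaf]', right: 'Union[Tree, Leaf]'):
        self.left = left
        self.right = right

# Every hint is stored as a string, whether or not the type existed yet ...
print(Tree.__init__.__annotations__)
# {'left': 'Union[Tree, Leaf]', 'right': 'Union[Tree, Leaf]'}

# ... and get_type_hints evaluates them all the same way.
hints = get_type_hints(Tree.__init__)
print(hints['left'] == Union[Tree, Leaf])  # True
```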
I'd also point out that your own notation isn't quite orthogonal: self isn't annotated in your method definitions. If students can handle the special syntax for "self", I suppose that they can handle a special syntax for recursively defined types (or you could use Steven d'A's approach of a placeholder class "RecursivelyDefined", which would require augmenting the typechecker to recognize it). Steve

On Tue, Aug 25, 2015 at 11:15 AM, Terry Reedy <tjreedy@udel.edu> wrote:
I have been thinking about this lately in a different context, and I would very much favor this approach, in large part because it works this way for modules, so it would make sense for it to work for classes as well. Having ClassName bound, as soon as the parser has read in

    class ClassName:

to an object that will *eventually* become the class would represent, to me (and, I suspect, to many students as well), the least astonishment. I realize it would be a very non-trivial change, however.
What about:
A little ugly, and potentially error-prone (but only, I think, in exceptional cases). It's also a decent opportunity to teach something about forward declaration, which I think is worth knowing about. And I think this makes what's going on clearer than the string-based workaround. I didn't follow every single thread about PEP 484, though, so I don't know whether, or why, this approach to forward declaration was rejected. Erik

On 8/25/2015 12:19 PM, Erik Bray wrote:
(Correcting a typo in my earlier message: that should read 'bug-prone in use'.)
It might be more useful to have def statements work that way (bind name to blank function object). Then

    def fac(n, _fac=fac):  # less confusing than 'fac=fac'
        return _fac(n-1)*n if n > 1 else 1

would actually be recursive regardless of external name bindings. But as is the case with modules, exposing incomplete objects easily leads to buggy code, such as

    def f(n, code=f.__code__):
        pass
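The difference Terry relies on can be checked in today's Python, where def statements do not pre-bind the name: default values are evaluated once, when the def statement runs, while names in the body are looked up at each call. The name `original` below is introduced only for illustration.

```python
# Terry's line, run on today's Python: the default value is evaluated
# when the def statement executes, and fac is not bound yet.
default_error = None
try:
    def fac(n, _fac=fac):
        return _fac(n - 1) * n if n > 1 else 1
except NameError as exc:
    default_error = exc

# An ordinary recursive definition looks up the name fac on each call.
def fac(n):
    return fac(n - 1) * n if n > 1 else 1

original = fac      # keep a reference to the recursive function

def fac(n):         # rebind the global name to something else
    return 0

print(original(5))  # 0, not 120: the inner call found the new binding
```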
I like this better than my decorator version. Notice that if Python were changed so that 'Tree' were bound to a blank class first thing, then Tree(Tree) would be subclassing itself, breaking code like the above unless a special rule was added to remove a class from its own list of base classes. -- Terry Jan Reedy
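Erik's code did not survive in this archive. Going only by Terry's description of `Tree(Tree)`, here is a hypothetical sketch of the stub-then-subclass pattern as it behaves today: the base list is evaluated before the new class is bound, so the second `Tree` subclasses the stub rather than itself. In Python versions that evaluate annotations eagerly (through 3.13, without `from __future__ import annotations`), an annotation such as `left: Tree` inside the second class would therefore name the stub, not the final class.

```python
class Tree:          # stub: the name Tree is bound to it immediately
    pass

stub = Tree          # keep a handle on the stub for comparison

class Tree(Tree):    # base list evaluated first, so this subclasses the stub
    def __init__(self, left, right):
        self.left = left
        self.right = right

print(Tree is stub)                  # False: two distinct class objects
print(Tree.__bases__ == (stub,))     # True
print(Tree.__name__, stub.__name__)  # Tree Tree
```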

On Aug 25, 2015, at 09:19, Erik Bray <erik.m.bray@gmail.com> wrote:
The problem here is, what if someone writes this:

    def __init__(self, left: Tree, right: Tree):
        # something with left.left

Or:

    @classmethod
    def maketree(cls):
        return Tree(None, None)

Here, Tree is "defined", but the type checker can't actually infer the type of left.left or the arguments of Tree's constructor (even if __init__ was defined before maketree). There are various ways you could special-case things to deal with this problem. The simplest would be that a forward-declared class just has no methods or other attributes, or maybe that it has only the ones inherited from superclasses or metaclasses, until the definition is completed, but my naive intuition says that it's obvious what both of the above mean, and the only reason I'd expect it to be an error is by understanding how it has to work under the covers.

On 25.08.2015 17:15, Terry Reedy wrote:
Although I do not agree with the intentions of the OP, I would love to have "more forward references" in Python. I think the main issue here is the gap between intuition and what the compiler actually does. The following line:

    class MyClass:  # first appearance of MyClass

basically creates MyClass in the mind of the developer reading this piece of code. Thus, he expects to be able to use it after this line. However, Python only assigns the class to the name MyClass at the end of the class definition. Thus, it is usable only after that. People get around this (especially since one doesn't need it that often), but it still feels... different. Best, Sven
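Sven's description of when the name is bound can be checked directly. This minimal sketch records, from inside the class body, whether `MyClass` is bound yet.

```python
class MyClass:  # first appearance of MyClass
    # The body runs as ordinary code, before the name MyClass is bound.
    try:
        MyClass
        bound_during_body = True
    except NameError:
        bound_during_body = False

print(MyClass.bound_during_body)  # False
print(MyClass.__name__)           # MyClass: bound once the statement ends
```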

On 8/25/2015 1:48 PM, Sven R. Kunze wrote:
I think the gap is less than you think ;-). Or maybe we think differently when reading code. Both human and compiler create the concept 'MyClass' (properly quoted) as an instance of the concept 'class'. In a static language like C, types are only concepts in the minds of programmers and compilers. There are no runtime char, int, float, or struct xyz objects, only the names or concepts. When the compiler is done, there are only bytes in a sense not true of Python.
Thus, he expects to be able to use it after this line.
One can use the string 'MyClass' in an annotation, for instance, and eventually dereference it to the object after the object is created. A smart type checker could understand that 'MyClass' in annotations within the class MyClass statement means instances of the future MyClass object. A developer should not expect to use not-yet-existent attributes and methods of the object. -- Terry Jan Reedy

On 26.08.2015 07:50, Terry Reedy wrote:
A developer should not expect to use not-yet-existent attributes and methods of the object.
Unfortunately, that is where I disagree. The definition of "not-yet-existent attribute" can vary from developer to developer.

On Aug 26, 2015, at 13:22, Sven R. Kunze <srkunze@mail.de> wrote:
But it has to mean something, and what it means dramatically affects what the code does. That's why Python has a simple rule: a straightforward imperative execution order that means you can easily tell whether the attribute was created before use. Also note that in Python, attributes can be added, replaced, and removed later, and their values can be mutated later. So the notion of "later" has to be simple to understand. Just saying "evaluate these statements in some order that's legal" doesn't work when some of those statements can be mutating state. In Python, a statement can create, replace, or destroy attributes of the module or any other object, and even an expression can mutate the values of those attributes. And in fact that's what everything in Python is doing, even declarative-looking statements like class, so you can't just block mutation, you have to deal with it as a fundamental thing. Besides fully compiler-driven evaluation order, there are two obvious alternatives that let you keep linear order, but make sense of using values before they're created: lazy evaluation, as in Haskell, and dataflow evaluation, as in Oz. Maybe one of those is what you want here. But trying to fit either of those together with mutable objects sensibly is not trivial, nor is it trivial to fit them together with the kind of dynamic OO that Python provides, much less both in one. I'd love to see what someone could come up with by pursuing either of those, but I suspect it wouldn't feel much like Python.
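A minimal sketch of why "later" has to be simple (the `Node` class and `label` attribute are hypothetical): an attribute lookup happens when the code runs, so its result depends only on which statements have already executed, including mutations made after the class was created.

```python
class Node:
    def describe(self):
        # self.label is looked up each time describe() runs.
        return self.label

n = Node()

before = None
try:
    n.describe()                # nothing has created label yet
except AttributeError as exc:
    before = exc

Node.label = "added later"      # mutate the class after it was created
after = n.describe()

n.label = "instance override"   # and the instance, later still
latest = n.describe()

print(after, "/", latest)       # added later / instance override
```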

On Tue, Aug 25, 2015 at 07:48:39PM +0200, Sven R. Kunze wrote:
Intuition according to whom? Some people expect that. Others do not. People hold all sorts of miscomprehensions and misunderstandings about the languages they use, and Python is no different.
To me, it feels intuitive and natural. Of course you can't use the class until after you have finished creating it. To me, alternatives like Javascript's function hoisting feel weird. This looks like time travel:

    // print is provided by the Rhino JS interpreter
    var x = f();
    print(x);
    // multiple pages later
    function f() {return "Hello World!";};

How can you call a function that doesn't exist yet? There are even stranger examples, but for the sake of brevity let's just say that what seems "intuitive" to one person may be "weird" to another.

With one or two minor exceptions, the Python interactive interpreter behaves identically to the non-interactive interpreter. If you have valid Python code, you can run it interactively. The same can't be said for Javascript. You can't run the above example interactively without *actual* time travel; if you try, it fails:

    [steve@ando ~]$ rhino
    Rhino 1.7 release 0.7.r2.3.el5_6 2011 05 04
    js> var x = f();
    js: "<stdin>", line 2: uncaught JavaScript runtime exception: ReferenceError: "f" is not defined.
            at <stdin>:2

A nice, clean, easy to understand execution model is easy to reason about. Predictability is much more important than convenience: I much prefer code which does what I expect over code that saves me a few characters, or lines, of typing, but surprises me by acting in a way I didn't expect. The fewer special cases I have to learn, the more predictable the language and the less often I am surprised. Python treats functions and classes as ordinary values bound to ordinary names in the ordinary way: the binding doesn't occur until the statement is executed. I like it that way. -- Steve
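For comparison, a sketch of the Python counterpart of the Rhino example: without hoisting, calling `f` before its def statement has run raises NameError, in a script exactly as at the interactive prompt.

```python
early = None
try:
    x = f()          # f is not bound yet: Python does not hoist defs
except NameError as exc:
    early = exc

def f():
    return "Hello World!"

x = f()              # the def statement has now run, so this works
print(x)             # Hello World!
```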

On 8/26/2015 10:43 PM, Steven D'Aprano wrote:
So do I. The same is true of import statements -- the binding of the name to the module does not happen until the module is built. It happens that the import machinery has a cache where it sticks an initially empty module in case of circular imports. But that is normally invisible to the code with the import statement. -- Terry Jan Reedy
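The import cache Terry mentions can be observed with two throwaway modules that import each other. This sketch writes them to a temporary directory (the module names are made up); while `circ_demo_a` is still executing its first line, `circ_demo_b` already receives it from `sys.modules`, empty apart from its dunder attributes.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two throwaway modules that import each other (names are made up).
SOURCES = {
    "circ_demo_a.py": "import circ_demo_b\nVALUE = 'a'\n",
    "circ_demo_b.py": (
        "import circ_demo_a\n"
        "# circ_demo_a comes from sys.modules, still half-built:\n"
        "SEEN = sorted(k for k in vars(circ_demo_a)"
        " if not k.startswith('__'))\n"
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    for filename, source in SOURCES.items():
        Path(tmp, filename).write_text(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        a = importlib.import_module("circ_demo_a")
        b = sys.modules["circ_demo_b"]
    finally:
        sys.path.remove(tmp)

print(b.SEEN)   # []: b saw module a before any of a's own names were bound
print(a.VALUE)  # a: the finished module, once its code has run
```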

Dear colleagues, for me the arguments are quite clear, and I don't want to change much of the underlying machinery just to get my preferred notation ;-) So I thought of an alternative approach for learning that would be orthogonal. But I found another showstopper for being orthogonal: it is indeed possible to run the following code

    from typing import List

    class Tree:
        def __init__(self, left: 'Tree', right: 'Tree'):
            self.left = left
            self.right = right

        def leaves(self) -> List['Tree']:
            return []

    def greeting(name: 'str') -> 'str':
        return 'Hello ' + name

but not

    def leaves(self) -> 'List'['Tree']:
        …

which would be orthogonal when deciding to put all used types in '…'. Perhaps there will be a chance to make this a valid construction? be orthogonal ;-) Ludger -- https://twitter.com/n770 http://ddi.uni-wuppertal.de/

Prof. Dr. L. Humbert <humbert@...> writes:
The issue is that Python does not have separate type/value universes. 'Tree' is just a type hint, not a type in the conventional sense. I would very much like it if Python *did* have separate types/values, so that one could write (OCaml):

    class tree (left : tree) (right : tree) =
      object
        val left = left
        val right = right
      end ;;

Which is an uninhabited type, since you need a tree to construct a tree! :) Thus:

    class tree (left : tree option) (right : tree option) =
      object
        val left = left
        val right = right
      end ;;

Stefan Krah

On 28.08.2015 15:58, Petr Viktorin wrote: …
pv> You can put the entire hint in a string:
pv>     def leaves(self) -> 'List[Tree]':

TNX … that solves the problem in this example and makes it orthogonal. The next showstopper will come when working on/with data structures that contain entangled class structures, e.g. the instantiation of a class, where we have to use

    self.node = Node(…)

but not

    self.node = 'Node'(…)

So I think we as educators have to live with this pedagogically »suboptimal« solution(s), communicate the non-orthogonal notation, and make clear what the reason for it is. TNX Ludger -- https://twitter.com/n770 http://ddi.uni-wuppertal.de/
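Both of Ludger's observations can be checked in a few lines, assuming Python 3.7 or later. `'List'['Tree']` subscripts the string 'List' with another string, which raises TypeError before typing is ever involved (the variable `hint_name` is introduced only for illustration), whereas Petr's fully quoted hint is a single string that `typing.get_type_hints` later evaluates to `List[Tree]`.

```python
from typing import List, get_type_hints

# The same operation as 'List'['Tree']: a str indexed by another str,
# which str does not support, so Python raises TypeError.
hint_name = 'List'
subscript_error = None
try:
    hint_name['Tree']
except TypeError as exc:
    subscript_error = exc

# Petr's version: the whole hint is one string, evaluated later.
class Tree:
    def leaves(self) -> 'List[Tree]':
        return []

print(type(subscript_error).__name__)                       # TypeError
print(get_type_hints(Tree.leaves)['return'] == List[Tree])  # True
```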
participants (10)
- Andrew Barnert
- Erik Bray
- Joseph Jevnik
- Petr Viktorin
- Prof. Dr. L. Humbert
- Stefan Krah
- Stephen J. Turnbull
- Steven D'Aprano
- Sven R. Kunze
- Terry Reedy