Method signature syntactic sugar (especially for dunder methods)

Python has very intuitive and clear syntax, except when it comes to method definitions, particularly dunder methods.

    class Vec(object):
        def __init__(self, x, y):
            self.x, self.y = x, y

        def __add__(self, other):
            return Vec(self.x + other.x, self.y + other.y)

        def __getitem__(self, key):
            return self.x if key == 'x' else self.y if key == 'y' else None

        def __contains__(self, item):
            return self.x == item or self.y == item

        def __bool__(self):
            # __bool__ must return a real bool, not just a truthy value
            return bool(self.x or self.y)

        def display(self):
            print('x:', self.x, 'y:', self.y)

Having to declare a self parameter is confusing since you don't pass anything in when you call the method on an instance (I am aware of bound vs. unbound methods, etc., but a beginner would not be). The double underscores are also confusing.

I propose syntactic sugar to make these method signatures more intuitive and clean.

    class Vec(object):
        def class(x, y):
            self.x, self.y = x, y

        def self + other:
            return Vec(self.x + other.x, self.y + other.y)

        def self[key]:
            return self.x if key == 'x' else self.y if key == 'y' else None

        def item in self:
            return self.x == item or self.y == item

        def bool(self):
            return bool(self.x or self.y)

        def self.display():
            print('x:', self.x, 'y:', self.y)

There are some immediate problems with this, such as `bool(self)` being indistinguishable from a regular method signature and `class(x, y)` not declaring the `self` identifier. These and other problems can be solved to some extent, but I thought I would see if there is any interest in this before going into too much depth.

On 6 November 2016 at 16:28, Nathan Dunn <nathanfdunn@gmail.com> wrote:
The syntax is the least confusing part of special method overrides, so if folks are still struggling with that aspect of defining them, there are plenty of other things that are going to trip them up.
From your examples:
* __add__ is only part of the addition protocol; there are also __radd__ and __iadd__
* likewise, there is not a one-to-one correspondence between the bool() builtin and the __bool__() special method (there are other ways to support bool(), like defining __len__() on a container)
* the mapping protocol covers more than just __getitem__ (and you also need to decide if you're implementing a mapping, sequence, or multi-dimensional array)

If the current syntax makes people think "This looks tricky and complicated and harder than defining normal methods", that's a good thing, as magic methods *are* a step up in complexity from normal method definitions, since you need to learn more about how and when they get called and the arguments they receive, while normal methods are accessed via plain function calls.

My concern with the last suggestion (permitting the first parameter to be specified on the left of the method name) is different: it would break the current symmetry between name binding in def statements and target binding in assignment statements. Currently, all permitted binding targets in def and class statements behave the same way as they do in normal assignment statements, and throw SyntaxError otherwise.

With the proposed change, we'd face the problem that the following would both be legal, but mean very different things:

    cls.mymethod = lambda self: print(self)

    def cls.mymethod(self):
        print(self)

The former is already legal and assigns the given lambda function as a method on the existing class, `cls`. The latter currently throws SyntaxError.

With the proposed change, rather than throwing SyntaxError as it does now, the latter would instead be equivalent to:

    def mymethod(cls, self):
        print(self)

which would be a very surprising difference in behaviour.

Regards,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
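Nick's points can be checked with a short sketch, assuming Python 3. The `Tally` class, its `log` attribute, and the `describe` method are hypothetical names invented for this illustration; only the special method names and `compile()` come from Python itself.

```python
class Tally:
    """Hypothetical container that records which hook Python calls."""

    def __init__(self, items=()):
        self.items = list(items)
        self.log = []

    def __add__(self, other):       # tally + other
        self.log.append("__add__")
        return Tally(self.items + list(other))

    def __radd__(self, other):      # other + tally, when the left operand cannot handle it
        self.log.append("__radd__")
        return Tally(list(other) + self.items)

    def __iadd__(self, other):      # tally += other
        self.log.append("__iadd__")
        self.items.extend(other)
        return self

    def __len__(self):              # no __bool__ defined: bool() falls back to __len__
        return len(self.items)


t = Tally([1])
u = t + [2]     # dispatches to __add__
w = [0] + t     # dispatches to __radd__
t += [3]        # dispatches to __iadd__

empty_is_truthy = bool(Tally())
filled_is_truthy = bool(t)

# Assigning a function to a class attribute is legal today and creates a method.
Tally.describe = lambda self: f"Tally of {len(self)} items"
described = t.describe()

# A dotted name as a def target is rejected by the current grammar.
try:
    compile("def Tally.describe(self):\n    pass\n", "<example>", "exec")
    dotted_def_ok = True
except SyntaxError:
    dotted_def_ok = False
```

One operator symbol maps onto three separate hooks here, and `bool()` works without any `__bool__` at all, which is the kind of detail a `def self + other:` spelling would not make any easier to learn.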

On Sun, Nov 06, 2016 at 01:28:34AM -0500, Nathan Dunn wrote:
> Python has very intuitive and clear syntax, except when it comes to method definitions, particularly dunder methods.
I disagree with your premise here. Python's method definitions are just as intuitive and clear as the rest of Python's syntax: methods are just functions, indented in the body of the class where they belong, with an explicit "self" parameter.

And dunder methods are just a naming convention. They're not the most visually attractive methods, due to the underscores, but it's just a naming convention. Otherwise they are declared in exactly the same way as any other method: using normal function syntax, indented inside the body of the class, with an explicit "self" the same as other methods.

So there's no magic to learn. Once you know how to declare a function, it is a tiny step to learn to declare a method: put it inside a class, indent it, and add "self", and now you have a method. And once you know how to declare a method, there's nothing more to learn to handle dunder methods. All you need to know is the name of the method or methods you need, including the underscores.

[...]
You are mistaking "mysterious" for "confusing". "Why do I have to explicitly declare a self parameter?" is a mystery, and the answer can be given as:

- you just do
- because internally methods are just functions
- because it is actually useful (e.g. for unbound methods)

depending on the experience of the person asking. But it's not *confusing*. "Sometimes I have to explicitly declare self, and sometimes I don't, and there doesn't seem to be any pattern to which it is" would be confusing. "Always explicitly declare self" is not.
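The "methods are just functions" answer can be made concrete with a small sketch, assuming Python 3. The `norm1` helper is a hypothetical name added for illustration, and `display` returns a string here rather than printing so the results can be compared.

```python
class Vec:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def display(self):
        return f"x: {self.x} y: {self.y}"


def norm1(vec):
    # An ordinary module-level function; nothing marks it as a method yet.
    return abs(vec.x) + abs(vec.y)


v = Vec(3, -4)

# Calling through the instance passes `v` as the first argument for us...
via_instance = v.display()
# ...which is the same as calling the function on the class explicitly.
via_class = Vec.display(v)

# Attaching the plain function to the class turns it into a method.
Vec.norm1 = norm1
attached = v.norm1()
```

The explicit first parameter is what lets the same function object serve both call styles.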
> The double underscores are also confusing.
In over a decade of dealing with beginners' questions on comp.lang.python and the tutor mailing list, I've certainly seen a few cases of people who misread __init__ as _init_ and were surprised by their code not working. So it is an easy mistake to make, but apparently a *rare* mistake to make, and very easy to correct. So I disagree that double underscores are "confusing". What is confusing about the instructions "press underscore twice at the beginning and end of the method name"?
I don't think that there is anything intuitive about changing the name of the method from __init__ to "class". What makes you think that people will intuit the word "class" to create an instance? That seems like a dubious idea to me.

And it certainly isn't *clean*. At the moment, Python's rules are nicely clean: keywords can never be used as identifiers. You would either break that rule, or have some sort of magic where *some* keywords can *sometimes* be used as identifiers, but not always. That's the very opposite of clean -- it is a nasty, yucky design, and it doesn't scale to other protocols:

    def with:  # is this __enter__ or __exit__?

It doesn't even work for instance construction! Is class(...) the __new__ or __init__ method?

Not all beginners to Python are beginners to programming. Other languages typically use one of three naming conventions for the constructor:

- a method with the same name as the class itself, e.g. Java, C#, PHP 4, C++, ActionScript.
- special predefined method names, e.g. "New" in Visual Basic, "alloc" and "init" in Objective-C, "initialize" in Ruby, "__construct" in PHP 5.
- a keyword used before an otherwise normal method definition, e.g. "constructor" in Object Pascal, "initializer" in OCaml, "create" in Eiffel, "new" in F#.

So there's lots of variation in how constructors are written, and what seems "intuitive" will probably depend on the reader's background. Total beginners to OOP don't have any pre-conceived expectations, because the very concept of initialising an instance is new to them. Whether it is spelled "New" or "__init__" or "mzygplwts" is just a matter of how hard it is to spell correctly and memorise.
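Both objections can be checked directly. This is a minimal sketch assuming Python 3; the `compiles` helper and the `calls` list are hypothetical names used only for the demonstration.

```python
import keyword

# "class" and "with" are reserved words; "self" is an ordinary identifier.
reserved = [keyword.iskeyword(word) for word in ("class", "with", "self")]


def compiles(source):
    """Return True if `source` parses as a Python module."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


proposed_ok = compiles("def class(x, y):\n    pass\n")
current_ok = compiles("def __init__(self, x, y):\n    pass\n")

calls = []


class Vec:
    def __new__(cls, x, y):
        calls.append("__new__")     # creates the instance
        return super().__new__(cls)

    def __init__(self, x, y):
        calls.append("__init__")    # initialises the instance just created
        self.x, self.y = x, y


v = Vec(1, 2)
```

After `Vec(1, 2)` runs, `calls` holds both hook names in order, so a single `def class(...)` spelling would have to pick one of two distinct steps.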
> def self + other:
>     return Vec(self.x + other.x, self.y + other.y)
My guess is that this is impossible in an LL(1) parser, but even if possible, how do you write the reversed __radd__ method? My guess is that you would need:

    def other + self:

but for that to work, "self" now needs to be a keyword rather than just a regular identifier which is special only because it is the first in the parameter list. And that's a problem because there are cases (rare, but they do happen) where we don't want to use "self" for the instance parameter.

A very common pattern in writing classes is:

    def __add__(self, other):
        # implementation goes here
        ...

    __radd__ = __add__

since addition is usually commutative. How would your new syntax handle that?

-- 
Steve
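The aliasing pattern Steve describes, sketched as runnable code under Python 3. Adding a plain number to every component is an assumption made here so that the reflected case has something to do, and `this` is used as the first parameter name only to show that "self" is a convention rather than a keyword.

```python
import numbers


class Vec:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(this, other):
        # `this` instead of `self`: the name of the first parameter is free.
        if isinstance(other, Vec):
            return Vec(this.x + other.x, this.y + other.y)
        if isinstance(other, numbers.Number):
            return Vec(this.x + other, this.y + other)
        return NotImplemented

    # Addition is commutative here, so the reflected hook reuses __add__.
    __radd__ = __add__

    def __eq__(self, other):
        return isinstance(other, Vec) and (self.x, self.y) == (other.x, other.y)


left = Vec(1, 2) + 10     # calls Vec.__add__
right = 10 + Vec(1, 2)    # int cannot handle a Vec, so Vec.__radd__ runs
```

Because `__add__` is an ordinary name bound in the class body, reusing it is a one-line assignment; an operator-shaped signature would leave no name to assign from.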

participants (3)
- Nathan Dunn
- Nick Coghlan
- Steven D'Aprano