Auto assignment of attributes

The problem of assigning init arguments as attributes has appeared several times in the past ( https://mail.python.org/archives/list/python-ideas@python.org/message/VLI3DO... was the most recent we could find) and is already handled in dataclasses.
Lately, discussing this topic with a friend, we thought that using a specific token could be a possible approach, so you could do:
    class MyClass:
        def __init__(self, @a, @b, c):
            pass
and it would be analogous to doing:
    class MyClass:
        def __init__(self, a, b, c):
            self.a = a
            self.b = b
Then, you would instantiate the class as usual, and the variables tagged with `@` would be bound to the object:
    >>> objekt = MyClass(2, 3, 4)
    >>> print(objekt.b)
    3
    >>> print(objekt.c)
    AttributeError: 'MyClass' object has no attribute 'c'
We have a working implementation here if anyone wants to take a look: https://github.com/pabloalcain/cpython/tree/feature/auto_attribute. Keep in mind that we have limited knowledge about how to modify cpython itself, and which would be the best places to do the modifications, so it's more than likely that some design decisions aren't very sound ( https://devguide.python.org/grammar/ and https://devguide.python.org/parser/ were incredibly helpful).
Besides the implementation, we would like to know what the community thinks on whether this might have any value. While developing this, we realized that Crystal already has this feature (eg https://github.com/askn/crystal-by-example/blob/master/struct/struct.cr) with the same syntax, which is kind of expected, considering its syntax is based on Ruby.
Random collection of thoughts:
1. If auto-assignment made sense in general, one of the reasons we went for this rather than the decorator approach is that we wouldn't like to have a list of strings that can vary decoupled from the actual argument name.
2. The current implementation of `@` works for any function, not only init. We don't know if this would actually be a desirable feature.
3. It also works with any function in the wild. This mostly allows for monkey-patching to work out of the box:
    >>> class Klass:
    ...     def __init__(self):
    ...         pass
    ...
    >>> def add_parameter(k, @p):
    ...     pass
    ...
    >>> Klass.add_parameter = add_parameter
    >>> objekt = Klass()
    >>> print(objekt.p)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'Klass' object has no attribute 'p'
    >>> objekt.add_parameter(11)
    >>> print(objekt.p)
    11
Again, we are not sure if this is desirable, but it's what made most sense for us at the moment.
4. Adding the `@` token to the argument doesn’t remove the variable from the function/method scope, so this would be perfectly valid:
    >>> def my_function(k, @parameter):
    ...     print(parameter)
    ...
    >>> my_function(objekt, 4)
    4
    >>> objekt.parameter
    4
5. We didn’t implement it for lambda functions.
Cheers, Pablo and Quimey

There is no need for a whole new syntax for what can trivially be accomplished by a decorator, and a simple one, in this case. I for one am all for the inclusion of a decorator targeting either the __init__ method or the class itself to perform this binding of known arguments to instance attributes prior to entering __init__. It could live either in functools or dataclasses itself. On Sat, Apr 16, 2022 at 5:49 PM Pablo Alcain <pabloalcain@gmail.com> wrote:

On Mon, Apr 18, 2022 at 4:24 PM Joao S. O. Bueno <jsbueno@python.org.br> wrote:
Isn’t this what dataclasses already accomplish? I understand that it’s the reverse -- with a dataclass, you specify the fields, and the __init__ is generated, whereas this proposal is that you’d write an __init__, and the attributes would be set -- but other than taste, is there a practical difference? -CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Regarding the usage of a decorator to do the auto-assignment, I think that it has an issue regarding how to select a subset of the variables that you would be setting. In the general case, you can probably get away with calling `autoassign`. But, for example, if you want to set a but not b, you'd probably have to use a string as the identifier of the parameters that you want to assign:
```
class MyKlass:
    @autoassign('a')
    def __init__(self, a, b):
        print(b)
```
This, in my perspective, brings two things. The first one is that you'd be repeating the list of names everywhere, so refactors like changing a variable name would be a bit error-prone: if you change the variable from `a` to `my_var_name`, you'd also have to change the list in the autoassign. It's not a lot, but it can induce some errors because of the repetition. On the other hand, I guess it would be a bit hard for IDEs and static checkers to follow this execution path. I know that I'm only one data point, but for what it's worth, I was very excited about the idea, yet this prevented me from actually using this solution on a day-to-day basis: it felt a bit fragile and led me into some errors.
About dataclasses, the point that Chris mentions, I think that they are in a different scope from this, since they do much more stuff. But, beyond this, a dataclass-style solution would face a similar scenario: since the `__init__` is autogenerated, you would also be in a tight spot in the situation of "how would I bind only one of the items?". Again, now I'm talking about my experience, but I find it very hard to imagine that we could replace "classes" with "dataclasses" altogether. Here's an example of one of the (unexpected for me) things that happen when you try to do inheritance on dataclasses: https://peps.python.org/pep-0557/#inheritance.
Overall, I think that it's hard to come up with a solution to this problem that is clean and robust without adding new syntax.
I would like to hear your thoughts on this (and everyone else's of course!) Cheers, Pablo On Mon, Apr 18, 2022 at 9:55 PM Christopher Barker <pythonchb@gmail.com> wrote:

On Wed, Apr 20, 2022 at 12:30 PM Pablo Alcain <pabloalcain@gmail.com> wrote:
IMO, that is trivially resolvable by having the decorator, by default, assign all parameters. If it takes a string or sequence with parameter names, then it will just bind those (still shorter than one line of `self.attr = attr` for each attribute). And for the fragility: that is the advantage of having a robust implementation of something like this in the stdlib: it is not something most people will go out of their way to write on their own, since the tradeoff is just copying and pasting a bunch of plain assignments. But having it right and known could chop off tens of lines of useless code in, probably, the majority of Python projects. Also answering Christopher Barker: This has a subtle, but different use than dataclasses. It might be grouped in the dataclasses module, in the stdlib.
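
To make the decorator idea concrete, here is a minimal sketch of what such an `autoassign` could look like. It is not an existing stdlib function: the name, the call-with-parentheses convention and the handling of defaults are all assumptions made for illustration. With no names it binds every parameter after `self`; with names it binds only those.

```python
import functools
import inspect


def autoassign(*names):
    """Hypothetical decorator: bind __init__ arguments to same-named attributes.

    With no names, every parameter after the first (self) is bound;
    with names, only the listed parameters are bound.
    """
    def decorator(init):
        signature = inspect.signature(init)
        params = list(signature.parameters)
        unknown = set(names) - set(params[1:])
        if unknown:
            raise TypeError(f"unknown parameter names: {sorted(unknown)}")

        @functools.wraps(init)
        def wrapper(self, *args, **kwargs):
            # Resolve positional/keyword arguments and defaults the same
            # way a normal call would.
            bound = signature.bind(self, *args, **kwargs)
            bound.apply_defaults()
            # Skip the first entry (self); set the rest before __init__ runs.
            for name, value in list(bound.arguments.items())[1:]:
                if not names or name in names:
                    setattr(self, name, value)
            return init(self, *args, **kwargs)

        return wrapper

    return decorator


class MyKlass:
    @autoassign('a')          # bind only `a`
    def __init__(self, a, b=0):
        pass


class Point:
    @autoassign()             # bind everything
    def __init__(self, x, y=0):
        pass
```

Note that this sketch must be written as `@autoassign()` even when no names are given, and that a `*args` or `**kwargs` parameter would be stored as a tuple or dict under its own name. The string in `@autoassign('a')` is exactly the repetition of names that the previous message objects to.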

I take the freedom to interpret 'no news == good news' on this thread - namely that there are no major disagreements that a decorator to auto-commit `__init__` attributes to the instance could be a nice addition to the stdlib. I also assume it is uncontroversial enough for not needing a PEP. I remember a similar thread from a couple years ago where the thread ended up agreeing on such a decorator as well.
I would then like to take the opportunity to bike-shed a bit on this, and maybe we can get out of here with a BPO and an actual implementation. So,
1) Would it live better in "dataclasses" or "functools"? Or some other package?
2) What about: the usage of the decorator without arguments would imply committing all of the `__init__` arguments as instance attributes, and two mutually exclusive, optional kwonly parameters, "parameters" and "except", could be used with a list of arguments to set?
2.1) The usage of a "parameters" argument to the decorator would even pick names from an eventual "**kwargs" "__init__" parameter, which would otherwise be left alone.
2.2) Would it make sense to have another argument to specify how "**kwargs" should be treated? I see three options: (i) ignore it altogether, (ii) commit it as a dictionary, (iii) commit all keys in kwargs as instance attributes, (iii.a) "safe commit" keys in kwargs, avoiding overriding methods and class attributes. (Whatever the option here, while following the principle of "fewer features are safer" in a first release, we should consider whether it would be possible to include the new features later in a backwards compatible way.)
3) While the Python implementation for such a decorator is somewhat straightforward, are there any chances of making it static-annotation friendly? AFAIK dataclasses just work with static type checking because the @dataclass decorator is special-cased in the checker tools themselves. Would that be the same case here?
4) Naming. What about "@commitargs"?
5) Should it emit a warning (or TypeError) when decorating anything but a function named `__init__`? On Wed, Apr 20, 2022 at 3:14 PM Joao S. O. Bueno <jsbueno@python.org.br> wrote:

Hey Joao! For what it's worth, I'm not a big fan of the proposal to be honest, for the reasons I have already mentioned. I'm not heavily against it, but I would most likely not use it. Nevertheless, I believe it would need a PEP since it probably can change substantially the way Python code is being written. I think that discussion can probably belong to a specific thread with the proposal with your questions summary there so everyone can contribute to the implementation that, clearly, has some interesting points that it would be better if we could discuss in detail. I would very much like for us to evaluate, in this thread, the original proposal we sent, regarding if anyone else thinks it would make sense to add a new syntax for the binding of attributes. Pablo

On 4/21/22 10:29, Joao S. O. Bueno wrote:
I am strongly against using a decorator for this purpose. It would only be useful when *all* the arguments are saved as-is; in those cases where only *some* of them are, it either wouldn't work at all or would need to retype the names which would eliminate all benefits of using a decorator. -- ~Ethan~

On Wed, Apr 20, 2022 at 3:31 PM Pablo Alcain <pabloalcain@gmail.com> wrote:
in the class:
1. Generate a __init__
2. Generate a reasonable __repr__
3. Generate a reasonable __eq__
4. Automatically support destructuring with match statements
And you can independently disable any/all of them with arguments to the decorator. They *can* do much more, but I find it pretty unusual to *ever* write a class that I wouldn't want most of those for. The __init__ it generates is essentially automatically writing the boilerplate you're trying to avoid, so it seems entirely reasonable to consider this the same scope.
As for "how would I bind only one/some of the items?", dataclasses already support this with dataclasses.InitVar and a custom __post_init__ method; so:
    class MyClass:
        def __init__(self, @a, @b, c):
            ... do something with c that doesn't just assign it as self.c...
where you directly move values from the a and b arguments to self.a and self.b, but use c for some other purpose, is spelled (using typing.Any as a placeholder annotation when there's no better annotation to use):
    @dataclass
    class MyClass:
        a: Any
        b: Any
        c: InitVar[Any]
        def __post_init__(self, c):
            ... do something with c that doesn't just assign it as self.c; self.a and self.b already exist ...
The only name repeated is c (because you're not doing trivial assignment with it), and it's perfectly readable. I'm really not seeing how this is such an unwieldy solution that it's worth adding dedicated syntax to avoid a pretty trivial level of boilerplate that is already avoidable with dataclasses anyway.
-Josh
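
For readers who want to run the `InitVar` pattern described above, here is a small self-contained version. The `c_doubled` attribute and the doubling are made up purely to give `__post_init__` something to do with `c`; they are not part of the original message.

```python
from dataclasses import InitVar, dataclass, field
from typing import Any


@dataclass
class MyClass:
    a: Any
    b: Any
    c: InitVar[Any]                     # accepted by __init__, not stored as a field
    c_doubled: Any = field(init=False)  # hypothetical derived attribute

    def __post_init__(self, c):
        # a and b are already set on self at this point; c arrives as an argument.
        self.c_doubled = c * 2


obj = MyClass(2, 3, 4)
```

Here `obj.a` and `obj.b` hold the values passed in, `obj.c_doubled` is computed from `c`, and `c` itself is never stored on the instance.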

On Thu, Apr 21, 2022 at 7:33 PM Josh Rosenberg < shadowranger+pythonideas@gmail.com> wrote:
This is very interesting. I use dataclasses quite a lot tbh, but I do think that the purpose is different: the `__repr__` and `__eq__` they generate, for example, are reasonable when you consider that the classes are "mutable namedtuples with defaults". The effective use of dataclasses indeed goes beyond the "mutable namedtuples" thing, but as this happens the "reasonability" of the default `__repr__` and `__eq__` also starts to fade. As a small token of experience: at one point in time, I ended up having a `dataclass` that reimplemented both `__init__` (through `__post_init__`) and `__repr__`. Then I realized that I probably didn't want a dataclass to begin with.
I think that in some scenarios it can be done, but it doesn't look very clean: namely with the explicit declaration of `InitVar` and the usage of `__post_init__`. It looks like a case in which you don't like the autogenerated solution and patch it to reflect your actual goal. It's ok, but it's also going to lead towards code that is harder to maintain and interpret.
Overall, I think that not all Classes can be thought of as Dataclasses and, even though dataclasses solutions have their merits, they probably cannot be extended to most of the other classes. Pablo

On Sat, Apr 23, 2022 at 10:53 AM Pablo Alcain <pabloalcain@gmail.com> wrote:
Absolutely. However, this is not an "all Classes" question. I don't think of dataclasses as "mutable namedtuples with defaults" at all. But I do think they are for classes that are primarily about storing a defined set of data. I make heavy use of them for this, even when I am adding quite a bit of functionality, but their core function is still to store a collection of data.
To put it less abstractly: Dataclasses are good for classes in which the collection of fields is a primary focus -- so the auto-generated __init__, __eq__ etc are appropriate. It's kind of a recursive definition: dataclasses work well for those things that data classes' auto generated methods work well for :-) If, indeed, you need a lot of custom behavior for the __init__, and __eq__, and ... then dataclasses are not for you.
And the current Python class system is great for fully customized behaviour. It's quite purposeful that parameters of the __init__ have no special behavior, and that "self" is explicit -- it gives you full flexibility, and everything is explicit. That's a good thing. But, of course, the reason this proposal is on the table (and it's not the first time by any means) is that it's a common pattern to assign (at least some of) the __init__ parameters to instance attributes as is.
So we have two extremes -- on one hand:
A) Most __init__ params are assigned as instance attributes as is, and these are primarily needed for __eq__ and __repr__
and on the other extreme:
B) Most __init__ params need specialized behavior, and are quite distinct from what's needed by __eq__ and __repr__
(A) is, of course, the entire point of dataclasses, so that's covered. (B) is well covered by the current, you-need-to-specify-everything approach.
So the question is -- how common is it that you have code that's far enough toward the (A) extreme, as far as __init__ params being instance attributes, that we want special syntax, when we don't want most of the __eq__ and __repr__ behaviour? In my experience, not all that much -- my code tends to be on one extreme or the other. But I think that's the case that needs to be made -- that there are a lot of use cases for auto-assigning instance attributes that also need highly customized behaviour for other attributes and __eq__ and __repr__.
NOTE: another key question for this proposal is how you would handle mutable defaults -- anything special, or "don't do that"?
-CHB

On 4/23/22 12:11, Christopher Barker wrote:
NOTE: another key question for this proposal is how you would handle mutable defaults -- anything special, or "don't do that"?
Why should they be handled at all? If the programmer writes
    def __init__(self, a, @b, @c=[]):
        pass
then all instances that didn't have `c` given will share the same list -- just like if the code was:
    def __init__(self, a, b, c=[]):
        self.b = b
        self.c = c
The purpose of the syntax is to automatically save arguments to same-named attributes, not to perform any other magic.
-- ~Ethan~
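
The sharing described above is ordinary early binding of default values and can be reproduced today without any new syntax. A short sketch with a made-up class name:

```python
class Shared:
    def __init__(self, b, c=[]):   # the default list is created once, at def time
        self.b = b
        self.c = c


first = Shared(1)
second = Shared(2)
first.c.append("x")                # mutates the one default list

third = Shared(3, c=[])            # an explicitly passed list is not shared
```

After this runs, `first.c` and `second.c` are the same list object, while `third.c` is independent.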

On Thu, Apr 28, 2022 at 4:15 PM Ethan Furman <ethan@stoneleaf.us> wrote:
so the answer is "don't do that" (unless, in the rare case, that's what you actually want).
The purpose of the syntax is to automatically save arguments to same-named attributes, not to perform any other magic.
If the programmer writes def __init__(self, a, @b, @c=[]): pass
sure, but that's the corner case -- more common would be:
    def __init__(self, a, @b, c=None):
        handle_a
        if c is None:
            c = []
or some such. So without "any other magic", we have a less useful proposal. One thing you can say about dataclasses is that they provide a way to handle all parameters, mutable and immutable. Anyway, I just thought it should be clearly said.
-CHB

On 4/28/22 21:46, Christopher Barker wrote:
One thing you can say about dataclasses is that they provide a way to handle all parameters, mutable and immutable.
Really? I did not know that. Interesting. Definitely an issue of dataclasses being special, though, and not attributable to the syntax used to make a dataclass. -- ~Ethan~
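
For context on how dataclasses deal with mutable defaults: a plain `list` default is rejected when the class is created, and `field(default_factory=...)` is the supported way to get a fresh object per instance. A minimal sketch, with class names invented for the example:

```python
from dataclasses import dataclass, field


@dataclass
class Basket:
    owner: str
    items: list = field(default_factory=list)   # new list for every instance


error = None
try:
    @dataclass
    class Broken:
        items: list = []       # rejected by the dataclass decorator
except ValueError as exc:
    error = exc


first = Basket("a")
second = Basket("b")
first.items.append("apple")
```

`second.items` stays empty here, and defining `Broken` raises `ValueError` rather than silently sharing one list.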

On Thu, Apr 28, 2022 at 10:26 PM Ethan Furman <ethan@stoneleaf.us> wrote:
Absolutely -- my point is that if you want to auto-assign all parameters, then a dataclass is a good way to do it. If you need to write handling code for most parameters, then the current do-it-by-hand approach is fine. A new syntax would help most when you need to write custom code for a few parameters, but auto-assign the rest. I brought up mutable defaults, as they would require custom code, making the auto-assignment a tad less useful. But perhaps if the recent ideas for late-bound parameters ever bear fruit, then combining that with auto-assigning would increase the usefulness of both features. -CHB

On Sat, Apr 23, 2022, 1:11 PM Christopher Barker <pythonchb@gmail.com> wrote:
Although I agree that dataclasses have definitely grown beyond this scope, the definition of “mutable namedtuples with defaults” comes from the original PEP (https://peps.python.org/pep-0557/#abstract). The main point here is that there are several use cases for classes that do not conceptually fit the “dataclass” goal.
I agree 100%. This proposal, at its core, is not related to dataclasses. There are some cases in which dataclasses are the solution, but there are many, many times in which you will want to use just classes.
I don’t see B as an “extreme approach”. I think that comparing python classes with the specific dataclass is not helpful. The B scenario is simply the general case for class usage. Scenario A, I agree, is a very common one and fortunately we have dataclasses for it.
I agree that this is the main question. For what it’s worth, a quick grep on the stdlib (it’s an overestimation) provides:
    $ grep -Ie "self\.\(\w\+\) = \1" -r cpython/Lib | wc -l
    2095
I did the same in two libraries that I use regularly, pandas and scikit-learn:
    $ grep -Ie "self\.\(\w\+\) = \1" -r sklearn | wc -l
    1786
    $ grep -Ie "self\.\(\w\+\) = \1" -r pandas | wc -l
    650
That’s a total of ~4.5k lines of code (again, this is an overestimation, but it can give us an idea of the ballpark).
For a better and more fine-grained analysis, Quimey wrote this small library (https://github.com/quimeyps/analize_autoassign) that uses the Abstract Syntax Tree to analyze a bunch of libraries and identify when the “autoassign” could work. It shows that out of 20k analyzed classes in the selected libraries (including black, pandas, numpy, etc), ~17% of them could benefit from the usage of auto-assign syntax.
So it looks like the isolated pattern of `self.<something> = <something>` is used a lot. I don’t think that moving all of these cases to dataclasses can provide a meaningful solution. When I take a look at these numbers (and reflect on my own experience and that of my colleagues) it looks like there is a use case for this feature. And this syntax modification looks small and kind of clean, not adding any boilerplate. But, obviously, it entails further discussion of whether it makes sense to add new syntax for this, considering the maintenance that it implies.
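
To illustrate the difference between the grep count and an AST-based count, here is a rough sketch of the latter. It is not the code of the library linked above; the function name and the matching rules (top-level statements of `__init__` only, plain `self.x = x` with `x` a parameter) are assumptions chosen for the example.

```python
import ast


def count_autoassign_candidates(source: str) -> tuple[int, int]:
    """Return (number of __init__ parameters, number assigned as-is)."""
    total_params = 0
    assigned_as_is = 0
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.FunctionDef) and node.name == "__init__"):
            continue
        positional = node.args.posonlyargs + node.args.args
        if not positional:
            continue
        self_name = positional[0].arg
        params = {a.arg for a in positional[1:] + node.args.kwonlyargs}
        total_params += len(params)
        matched = set()
        for stmt in node.body:  # only top-level statements of __init__
            if (isinstance(stmt, ast.Assign)
                    and len(stmt.targets) == 1
                    and isinstance(stmt.targets[0], ast.Attribute)
                    and isinstance(stmt.targets[0].value, ast.Name)
                    and stmt.targets[0].value.id == self_name
                    and isinstance(stmt.value, ast.Name)
                    and stmt.value.id == stmt.targets[0].attr
                    and stmt.value.id in params):
                matched.add(stmt.value.id)
        assigned_as_is += len(matched)
    return total_params, assigned_as_is


SAMPLE = '''
class A:
    def __init__(self, a, b, c=None):
        self.a = a
        self.b = b
        self.c = c or []

    def other(self, d):
        self.d = d
'''

counts = count_autoassign_candidates(SAMPLE)
```

On `SAMPLE` this reports three `__init__` parameters, two of them assigned as-is. The grep pattern, by contrast, would also match `self.c = c or []` and the `self.d = d` line outside `__init__`, which is why the grep figures are an overestimation.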
As Ethan wrote on this thread, there is nothing “special” happening with mutable defaults: the early binding will work the same way, and doing `def __init__(self, @a=[]): pass` would be the same as doing `def __init__(self, a=[]): self.a = a`. So I believe that, except in some very specific cases, it would be as discouraged as setting mutable variables as defaults in general. Late binding is probably a whole other can of worms. Pablo

On Sat, Apr 30, 2022 at 2:17 PM Pablo Alcain <pabloalcain@gmail.com> wrote:
It's not an "extreme" approach -- it's one end of a continuum.
I think that comparing python classes with the specific dataclass is not helpful. The B scenario is simply the
That, well, is pretty much useless, if I understand the regex correctly -- the fact that a class is assigning to self doesn't mean it's directly assigning the parameters with no other logic. And any number of those self assignments could be in non-__init__ methods. All that shows is that instance attributes are used. I don't think anyone is questioning that.
For a better and more fine-grained analysis, Quimey wrote this small
It shows that out of 20k analyzed classes in the selected libraries (including black, pandas, numpy, etc), ~17% of them could benefit from the usage of auto-assign syntax.
I only read English, and haven't studied the code, so I don't know how that works, but assuming it's accurately testing for the simple cases that auto-assigning could work for: that's not that much actually -- for approx every six-parameter function, one of them could be auto-assigned. Or for every six functions, one could make good use of auto-assignment (and maybe be a dataclass?)
So it looks like the isolated pattern of `self.<something> = <something>` is used a lot.
I don't think that's ever been in question. The question, as I see it, is what fraction of parameters could get auto-assigned in general -- for classes where dataclasses wouldn't make sense. And I'm not trying to be a Negative Nelly here -- I honestly don't know, I actually expected it to be higher than 17% -- but in any case, I think it should be higher than 17% to make it worth a syntax addition. But pandas and numpy may not be the least bit representative -- maybe run the most popular packages on PyPI?
I thought I made this point, but it seems to have gotten lost: What I'm saying is that, for example, if a class __init__ has 6 parameters, and one of them could be auto-assigned, then yes, auto-assigning could be used, but you really haven't gained much from that -- it would not be worth the syntax change. And any class with an __init__ in which most or all parameters could be auto-assigned -- then those might be a good candidate for a dataclass. So how many are there where, say, more than half of the __init__ parameters could be auto-assigned, where dataclasses wouldn't be helpful? A lot? Then, yes, new syntax may be warranted.
But, obviously, it entails a further of discussion whether it makes sense to add new syntax for this, considering the maintenance that it implies.
It's not so much the maintenance -- it's the transition burden, and the burden of yet more complexity in the language, particularly parameters/arguments. Try to teach a newbie about arguments/parameters in Python: there is a remarkable complexity there already:
- positional vs keyword
- *args, **kwargs
- keyword-only
(and all of these from both the caller and callee perspective). That's a lot of possible combinations -- believe me, it's pretty darn complex and hard to explain!
-CHB

On 5/1/22 00:21, Christopher Barker wrote:
On Sat, Apr 30, 2022 at 2:17 PM Pablo Alcain wrote:
I think you place too much emphasis on dataclasses -- none of my projects use them, nor could they. Going through one of my smaller projects, this is what I found:
- number of `__init__`s: 11
- number of total params (not counting self): 25
- number of those params assigned as-is: 19
- number of `__init__`s where all are assigned as-is: 6
- number of non-`__init__`s where this would be useful: 0
17% is a massive amount of code.
But pandas and numpy may not be the least bit representative [...]?
This would not be the first time Python was improved to help the scientific community. My own thoughts about the proposal: It seems interesting, and assigning as-is arguments is a chore -- but I'm not sure using up a token to help only one method per class is a good trade. -- ~Ethan~

On Sun, May 1, 2022 at 9:35 AM Ethan Furman <ethan@stoneleaf.us> wrote:
Is it unreasonable to instead suggest generalizing the assignment target for parameters? For example, if parameter assignment happened left to right, and allowed more than just variables, then one could do: def __init__(self, self.x, self.y): pass Python 2 had non-variable parameters (but not attributes, just unpacking: def foo((x, y)): pass), but it was removed in Python 3, because of issues with introspection (among other, perhaps less significant, issues): https://peps.python.org/pep-3113/ Perhaps now that positional-only parameters exist, and the introspection APIs have evolved over time, there is a way to work this into the introspection APIs sensibly. (a "@foo" parameter does not have this problem, because it's just a parameter named `foo` with some extra stuff.) -- Devin

On Mon, May 2, 2022 at 7:21 AM Steven D'Aprano <steve@pearwood.info> wrote:
Yes, I agree. I don't think that the syntax is unreasonable, but it looks like it would be putting `self` at the same "level" as all the other possible parameters and could lead to this kind of confusion. What _might_ be a possibility (I'm not advocating in favor of it) is, like Ruby does, to also add `@x` as syntactic sugar for `self.x` in the body of the methods. This way the `@x` in the signature would be consistent, but I believe it can conflict conceptually with the "explicit self" philosophy.

Steven D'Aprano writes:
IMO, both of those should be errors. This syntax only makes much sense for the first formal argument of a method definition, because it's the only formal argument which has a fixed definition. The form "def foo(self, x, x.y)" has an interpretation, I guess, but
    def foo(self, x, y):
        x.y = y
is not a pattern I can recall ever seeing, and it could be relatively easily relaxed if it were requested enough.
On the other hand, folks do frequently request a way to DRY out long suites of "self.x = x" assignments. This could, of course, be done with a symbol such as '@' or even '.', but '@' could also be used for other purposes (late binding, for example), and "def foo(self, .x, .y):" looks like both grit on Tim's screen and a typo. On the other hand, I can't imagine what else might be meant by "def foo(self, self.x):".
All that said, I'm not a fan of this feature as such. But giving this semantics to formal arguments of the form "self.x" is the most intuitive (in the sense of "hard to give another interpretation") of the proposals I've seen.

On Sat, 7 May 2022 at 23:15, Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
I'd define it very simply. For positional args, these should be exactly equivalent:
    def func(self, x, x.y):
        ...
    def func(*args):
        self, x, x.y = args
        ...
The logical extension to kwargs would work in natural parallel.
ChrisA
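
The second spelling above is already valid Python, so the proposed semantics can be tried out today. A small sketch (the names `holder` and `Point` are invented for the example); it relies only on the fact that unpacking assigns targets left to right:

```python
from types import SimpleNamespace


def func(*args):
    # Targets are assigned left to right, so `x` is bound before `x.y`.
    self, x, x.y = args
    return x


holder = SimpleNamespace()
result = func("instance", holder, 42)


class Point:
    def __init__(*args):
        # Same idea applied to the self.x = x case from the thread.
        self, self.x, self.y = args


point = Point(1, 2)
```

After the call, `holder.y` is 42, and `point` has `x` and `y` attributes. The `*args` form gives up keyword arguments and defaults, which the parameter syntax would presumably keep.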

On Sat, May 7, 2022 at 6:28 AM Chris Angelico <rosuav@gmail.com> wrote:
I really don't like this -- hard to put my finger on it exactly, but I think it's because Python doesn't use any magic in method definitions. There is a touch of magic in binding methods to classes, but that comes later. So while:
    class Foo:
        def method(self, ...)
looks special, that's only because:
- it's defined in the class definition directly
- it's using the self convention
But
    class Foo:
        pass
    def method(fred, ...):
        pass
    Foo.method = method
means exactly the same thing. So then we have:
    def fun(foo, foo.bar):
        ...
is legal, but:
    def fun(this, foo, foo.bar):
        ...
is not.
it's the only formal argument which has a fixed definition.
yes, but only in one specific context -- so I don't like it leaking out of that context.
I like that -- it's simple to understand, clear, it doesn't only make sense for methods, and it might even be useful in other contexts [*]. I think:
    def fun(x, y.z):
        ...
would work fine, too. e.g. you wouldn't be restricted to using other parameters. That being said, I'm still -1 on the idea.
[*] -- the "other contexts" is key for me -- if someone can show that this is a useful pattern in other contexts, I think it would be a stronger proposal.
-CHB

On Sat, May 07, 2022 at 11:38:19AM -0700, Ethan Furman wrote:
Indeed. Just because we can imagine semantics for some syntax, doesn't make it useful. Aside from the very special case of attribute binding in initialisation methods (usually `__init__`), and not even all, or even a majority, of those, this is a great example of YAGNI. Outside of that narrow example of auto-assignment of attributes, can anyone think of a use-case for this?
And as far as auto-assignment of attributes goes, I want to remind everyone that we have that already, in a simple two-liner:
    vars(self).update(locals())
    del self.self
which will work for most practical cases where auto-assignment would be useful. (It doesn't work with slots though.) This is a useful technique that isn't widely known enough. I believe that if it were more widely known, we wouldn't be having this discussion at all.
-- Steve
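
A runnable version of the two-liner, together with the slots limitation mentioned above. The class names are invented for the example.

```python
class Point:
    def __init__(self, x, y, label="origin"):
        # locals() here holds self, x, y and label; copy them all onto the
        # instance, then drop the self-reference that came along.
        vars(self).update(locals())
        del self.self


class Slotted:
    __slots__ = ("x",)

    def __init__(self, x):
        vars(self).update(locals())   # fails: slotted instances have no __dict__


point = Point(1, 2)

slots_error = None
try:
    Slotted(1)
except TypeError as exc:
    slots_error = exc
```

The two lines need to come first in `__init__`: any local variable created before them would be copied onto the instance as well.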

On Sun, 8 May 2022 at 10:23, Steven D'Aprano <steve@pearwood.info> wrote:
Honestly, I don't know of any. But in response to the objection that it makes no sense, I offer the perfectly reasonable suggestion that it could behave identically to other multiple assignment in Python. There's not a lot of places where people use "for x, x.y in iterable", but it's perfectly legal. Do we need a use-case for that one to justify having it, or is it justified by the simple logic that assignment targets are populated from left to right? I'm not advocating for this, but it shouldn't be pooh-poohed just because it has more power than you personally can think of uses for. ChrisA
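
The `for x, x.y in iterable` form mentioned above can be checked directly. A tiny sketch using `types.SimpleNamespace` as a stand-in for any object that accepts attribute assignment:

```python
from types import SimpleNamespace

pairs = [(SimpleNamespace(), 1), (SimpleNamespace(), 2)]

# Each iteration binds x to the namespace first, then stores the number on it.
for x, x.y in pairs:
    pass

values = [namespace.y for namespace, _ in pairs]
```

Each namespace ends up with a `y` attribute holding its paired number, because the targets are populated from left to right.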

On Sun, May 08, 2022 at 11:02:22AM +1000, Chris Angelico wrote:
On Sun, 8 May 2022 at 10:23, Steven D'Aprano <steve@pearwood.info> wrote:
Nobody says that it makes "no sense". Stephen Turnbull suggested it doesn't make "much sense", but in context I think it is clear that he meant there are no good uses for generalising this dotted parameter name idea, not that we can't invent a meaning for the syntax.
The analogy breaks down because we aren't talking about assignment targets, but function parameters. Function parameters are only *kinda sorta* like assignment targets, and the process of binding function arguments passed by the caller to those parameters is not as simple as
    self, x, x.y = args
The interpreter also does a second pass using keyword arguments, and a third pass assigning defaults if needed. Or something like that -- I don't think the precise implementation matters. Of course we could make it work by giving *some* set of defined semantics, but unless it is actually useful, why should we bother? Hence my comment YAGNI.
I'm not advocating for this, but it shouldn't be pooh-poohed just because it has more power than you personally can think of uses for.
Power to do *what*? If nobody can think of any uses for this (beyond the auto-assignment of attributes), then what power does it really have? I don't think "power" of a programming language feature has a purely objective, precise definition. But if it did, it would surely have something to do with the ability to solve actual problems. -- Steve

On Sun, May 1, 2022 at 10:35 AM Ethan Furman <ethan@stoneleaf.us> wrote:
Yes, I agree that the cost/benefit should be analyzed. For what it's worth, the choice of the `@` was because of two different reasons: first, because we were inspired by Ruby's syntax (later on we learned that CoffeeScript and Crystal had already taken the approach we are proposing) and second, because the `@` token is already used as an infix for `__matmul__` ( https://peps.python.org/pep-0465/). I believe it's the only usage that it has, so it probably won't be that confusing to give it this new semantic as well. All of this, I believe, mitigates the "using up a token" concern, but it's not enough to make it a clear decision, so I 100% agree with your concern.

On Mon, May 02, 2022 at 10:34:56AM -0600, Pablo Alcain wrote:
Did you forget decorators? What other languages support this feature, and what syntax do they use? Personally, I don't like the idea of introducing syntax which looks legal in any function call at all, but is only semantically meaningful in methods, and not all methods. Mostly only `__init__`. How would this feature work with immutable classes where you want to assign attributes to the instance in the `__new__` method? I fear that this is too magical, too cryptic, for something that people only use in a tiny fraction of methods. 17% of `__init__` methods is probably less than 1% of methods, which means that it is going to be a rare and unusual piece of syntax. Beginners and casual coders (students, scientists, sys admins, etc, anyone who dabbles in Python without being immersed in the language) are surely going to struggle to recognise where `instance.spam` gets assigned, when there is no `self.spam = spam` anywhere in the class or its superclasses. There is nothing about "@" that hints that it is an assignment. (Well, I suppose there is that assignment and at-sign both start with A.) I realise that this will not satisfy those who want to minimize the amount of keystrokes, but remembering that code is read perhaps 20-100 times more than it is written, perhaps we should consider a keyword:

    def __init__(self, auto spam:int, eggs:str = ''):
        # spam is automatically bound to self.spam
        self.eggs = eggs.lower()

I dunno... I guess because of that "code is read more than it is written" thing, I've never felt that this was a major problem needing solving. Sure, every time I've written an __init__ with a bunch of `self.spam = spam` bindings, I've felt a tiny pang of "There has to be a better way!!!". But **not once** when I have read that same method later on have I regretted that those assignments are explicitly written out, or wished that they were implicit and invisible.
Oh, by the way, if *all* of the parameters are to be bound:

    def __init__(self, spam, eggs, cheese, aardvark):
        vars(self).update(locals())
        del self.self

Still less magical and more explicit than this auto-assignment proposal. -- Steve

On Mon, 2 May 2022 at 18:46, Steven D'Aprano <steve@pearwood.info> wrote:
I have classes with 20+ parameters (packaging metadata). You can argue that a dataclass would be better, or some other form of refactoring, and you may actually be right. But it is a legitimate design for that use case. In that sort of case, 20+ lines of assignments in the constructor *are* actually rather unreadable, not just a pain to write. Of course the real problem is that you often don't want to *quite* assign the argument unchanged - `self.provides_extras = set(provides_extras or [])` or `self.requires_python = requires_python or specifiers.SpecifierSet()` are variations that break the whole "just assign the argument unchanged" pattern. As a variation on the issue, which the @ syntax *wouldn't* solve, in classmethods for classes like this, I often find myself constructing dictionaries of arguments, copying multiple values from one dict to another, sometimes with the same sort of subtle variation as above:

    @classmethod
    def from_other_args(cls, a, b, c, d):
        kw = {}
        kw["a"] = a
        kw["b"] = b
        kw["c"] = c
        kw["d"] = d
        return cls(**kw)

Again, in "real code", not all of these would be copied, or some would have defaults, etc. The pattern's the same, though - enough args are copied to make the idea of marking them with an @ seem attractive. Overall, as described I don't think the @arg proposal provides enough benefit to justify new syntax (and I think trying to extend it would end badly...). On the other hand, if someone were to come up with a useful, general way of bulk-copying named values from one "place"[1] to another, possibly with minor modifications, I think I'd find that very useful. Call it a DSL for bulk data initialisation, if you like. I think such a thing could pretty easily be designed as a library. But I doubt anyone will bother, as adhoc "on the fly" solutions tend to be sufficient in practice. Paul

[1] A "place" might be a dictionary - dict["name"] or an object - getattr(self, "name").

On Mon, May 02, 2022 at 07:44:14PM +0100, Paul Moore wrote:
Indeed. 20+ parameters is only a code smell, it's not *necessarily* wrong. Sometimes you just need lots of parameters, even if it is ugly. For reference, open() only takes 8, so 20 is a pretty wiffy code smell, but it is what it is.
I don't know. It's pretty easy to skim lines when reading, especially when they follow a pattern:

    self.spam = spam
    self.eggs = eggs
    self.cheese = cheese
    self.aardvark = aardvark
    self.hovercraft = hovercraft
    self.grumpy = grumpy
    self.dopey = dopey
    self.doc = doc
    self.happy = happy
    self.bashful = bashful
    self.sneezy = sneezy
    self.sleepy = sleepy
    self.foo = foo
    self.bar = bar
    self.baz = baz
    self.major = major
    self.minor = minor
    self.minimus = minimus
    self.quantum = quantum
    self.aether = aether
    self.phlogiston = phlogiston

Oh that was painful to write! But I only needed to write it once, and I bet that 99% of people reading it will just skim down the list rather than read each line in full. To be fair, having written it once, manual refactoring may require me to rewrite it again, or at least edit it. In early development, sometimes the parameters are in rapid flux, and that's really annoying. But that's just a minor period of experimental coding, not an on-going maintenance issue.
Indeed. Once we move out of that unchanged assignment pattern, we need to read more carefully rather than skim

    self._spam = (spam or '').lower().strip()

but you can't replace that with auto assignment.
You may find it easier to make a copy of locals() and delete the parameters you don't want, rather than retype them all like that:

    params = locals().copy()
    for name in ['cls', 'e', 'g']:
        del params[name]
    return cls(**params)
But the @ proposal here won't help. If you mark them with @, won't they be auto-assigned onto cls? -- Steve
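The `locals()`-copy suggestion can be sketched as a complete classmethod. The class `Package` and its parameter names are hypothetical; note that the copy is taken on the first line of the method, before any other local names exist:

```python
class Package:
    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d

    @classmethod
    def from_other_args(cls, a, b, c, d, e=None, g=None):
        # Snapshot the arguments first, so later locals are not included.
        params = locals().copy()
        # Drop the names that should not be forwarded to __init__.
        for name in ['cls', 'e', 'g']:
            del params[name]
        return cls(**params)


pkg = Package.from_other_args(1, 2, 3, 4, e="ignored")
print(vars(pkg))
```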

On Tue, 3 May 2022 at 03:04, Steven D'Aprano <steve@pearwood.info> wrote:
It's worth noting that dataclasses with lots of attributes by default generate constructors that require all of those as parameters. So it's a code smell yes, but by that logic so are dataclasses with many attributes (unless you write a bunch of custom code). Genuine question - what *is* a non-smelly way of writing a dataclass with 24 attributes? I've written about 20 variations on this particular class so far, and none of them feel "right" to me :-(
Precisely.
Again, precisely. My point here is that the @ proposal is, in my experience, useful in far fewer situations than people are claiming. What *is* common (again in my experience) is variations on a pattern that can be described as "lots of repetitive copying of values from one location to another, possibly with minor modifications". Having a way of addressing the broader problem *might* be of sufficient use to be worth pursuing, and it might even be possible to do something useful in a library, not needing new syntax. On the other hand, the @ syntax as proposed *doesn't* address enough use cases (for me!) to be worthwhile, especially not if new syntax is needed rather than just something like a decorator. Paul

On Tue, May 3, 2022 at 6:36 AM Paul Moore <p.f.moore@gmail.com> wrote:
It's a good point. We have been thinking a bit about how to measure the "usefulness" of the proposal. What we have so far is mainly an intuition driven by code that we and colleagues developed and a couple of statistics ( https://github.com/quimeyps/analize_autoassign in case anyone reading this doesn't have the link nearby). Although I think that code analysis can be a way to find out the usefulness of the proposal, the statistics that we have so far feel a bit coarse-grained to be honest. So any idea on what would be good metrics to answer the question of "how often the syntax would be useful" will be more than welcome!

On Mon, May 2, 2022 at 11:48 AM Steven D'Aprano <steve@pearwood.info> wrote:
totally forgot decorators, my bad!
What other languages support this feature, and what syntax do they use?
you mean languages other than those two? I haven't found any. In case you mean the syntax for those two, I know a tiny bit of Crystal's. It leverages the fact that they use `@` for referring to `self`, as in Ruby. So you would be able to write something like this:
```
class Person
  def initialize(@name : String)
  end

  def greet
    print("Hello, ", @name)
  end
end

p = Person.new "Terry"
p.greet
```
Yes, it's a good point. Allowing functions in the wild to use this syntax would simplify the usage for monkeypatching... but how often would you want to monkeypatch an `__init__`? And how often would you want to use the auto-assign outside of the `__init__`? I believe that it's too rare to matter. So in that case, maybe we can make it legal only in methods. I agree, if we forego monkeypatching, that semantically it wouldn't be meaningful in functions; but, in methods, I think that semantically it would make sense apart from the `__init__`; the thing is that probably it wouldn't be that useful.

On Sat, Apr 23, 2022 at 12:11:07PM -0700, Christopher Barker wrote:
Isn't it? I thought this was a proposal to allow any class to partake in the dataclass autoassignment feature. (Not necessarily the implementation.)
I don't think of dataclasses as "mutable namedtuples with defaults" at all.
What do you think of them as?
But do think they are for classes that are primarily about storing a defined set of data.
Ah, mutable named tuples, with or without defaults? :-) Or possibly records/structs. -- Steve

On Sat, Apr 30, 2022 at 6:40 PM Steven D'Aprano <steve@pearwood.info> wrote:
no -- it's about only a small part of that.
I answered that in the next line, that you quote.
well, no. - the key is that you can add other methods to them, and produce all sort of varyingly complex functionality. I have done that myself.
Or possibly records/structs.
nope, nope, and nope. But anyway, the rest of my post was the real point, and we're busy arguing semantics here. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sat, Apr 30, 2022 at 11:54:47PM -0700, Christopher Barker wrote:
How so? Dataclasses support autoassignment. This proposes to allow **all classes** (including non-dataclasses) to also support autoassignment. So can you please clarify your meaning? To me, this does look like an "all Classes" question. What am I missing?
Perhaps your answer isn't as clear as you think it is. See below.
Named tuples support all of that too. One of the reasons I have not glommed onto dataclasses is that for my purposes, they don't seem to add much that named tuples didn't already give us.
* Record- or struct-like named fields? Check.
* Automatic equality? Check.
* Nice repr? Check.
* Can add arbitrary methods and override existing methods? Check.
Perhaps named tuples offer *too much*:
* Instances of tuple;
* Equality with other tuples;
and maybe dataclasses offer some features I haven't needed yet, but it seems to me that named tuples and dataclasses are two solutions to the same problem: how to create a record with named fields.
Or possibly records/structs.
nope, nope, and nope.
Okay, I really have no idea what you think dataclasses are, if you don't think of them as something like an object-oriented kind of record or struct (a class with named data fields). You even define them in terms of storing a defined set of data, except you clearly don't mean a set in the mathematical meaning of an unordered collection (i.e. set()). A set of data is another term for a record. So I don't understand what you think dataclasses are, if you vehemently deny that they are records (not just one nope, but three). And since I don't understand your concept of dataclasses, I don't know how to treat your position in this discussion. Should I treat it as mainstream, or idiosyncratic? Right now, it seems pretty idiosyncratic. Maybe that's because I don't understand you. See below.
But anyway, the rest of my post was the real point, and we're busy arguing semantics here.
Well yes, because if we don't agree on semantics, we cannot possibly communicate. Semantics is the **meaning of our words and concepts**. If we don't agree on what those words mean, then how do we understand each other? I've never understood people who seem to prefer to talk past one another with misunderstanding after misunderstanding rather than "argue semantics" and clarify precisely what they mean. -- Steve

On Sun, May 1, 2022 at 1:16 AM Steven D'Aprano <steve@pearwood.info> wrote:
Yes, any class could use this feature (though it's more limited than what dataclasses do) -- what I was getting at is that it would not be (particularly) useful for all classes -- only classes where there are a lot of __init__ parameters that can be auto-assigned. And that use case overlaps to some extent with dataclasses.
Perhaps your answer isn't as clear as you think it is.
apparently not.
"primarily" -- but the key difference is that dataclasses are far more customisable and flexible. They are more like "classes with boiler plate dunders auto-generated". That is, a lot more like "regular" classes than they are like tuples. Whereas namedtuples are, well, tuples where the items have names. That's kinda it.
well, no. - the key is that you can add other methods to them, and produce all sort of varyingly complex functionality.
Named tuples support all of that too.
No, they don't -- you can add methods, though with a klunky interface, and they ARE tuples under the hood which does come with restrictions. And the immutability means that added methods can't actually do very much.
One of the reasons I have not glommed onto dataclasses is that for my purposes, they don't seem to add much that named tuples didn't already give us.
ahh -- that may be because you think of them as "mutable named tuples" -- that is, the only reason you'd want to use them is if you want your "record" to be mutable. But I think you miss the larger picture.
that's a little klunky though, isn't it? Have you seen much use of named tuples like that? For that matter do folks do that with tuples much either?
Perhaps named tuples offer *too much*:
* Instances of tuple;
* Equality with other tuples;
Yes, that can be a downside, indeed.
and maybe dataclasses offer some features I haven't needed yet, but it seems to me that named tuples and dataclasses are two solutions to the same problem: how to create a record with named fields.
I suspect you may have missed the power of dataclasses because you started with this assumption. Maybe it's because I'm not much of a database guy, but I don't think in terms of records. For me, dataclasses are a way to make a general purpose class that holds a bunch of data, and have the boilerplate written for me. And what dataclasses add that makes them so flexible is that they:
- allow for various custom fields:
  - notably default factories to handle mutable defaults
- provide a way to customise the initialization
- and critically, provide a collection of field objects that can be used to customize behavior.
All this makes them very useful for more general purpose classes than a simple record. I guess a way to think of them is this: if you are writing a class in which the __init__ assigns most of the parameters to the instance, then a dataclass could be helpful, which is why I think they solve *part* of the problem that special auto assigning syntax would solve. Not all of the problem, which is why I'm suggesting that folks find evidence for how often auto-assigned parameters would be very useful when dataclasses would not.
So I don't understand what you think dataclasses are, if you vehemently deny that they are records (not just one nope, but three).
It's not that they can't be used as records, it's that they can be so much more. After all what is any class but a collection of attributes (some of which may be methods) ?
perhaps it is -- the mainstream may not have noticed how much one can do with dataclasses.
sure -- I should have been more explicit -- arguing about what "mutable named tuple" means didn't seem useful. But in retrospect I was wrong -- it may matter to this discussion, and that's why I originally pointed out that I don't think of dataclasses as "mutable named tuples" -- perhaps I should have said not *only* mutable named tuples. The relevant point here is that I'm suggesting there are uses for dataclasses that go beyond a simple record that happens to be mutable -- i.e. classes that do (potentially much) more than simply get and set attributes. Which means that someone that needs to assign a lot of parameters to self, but doesn't think they are writing a simple record may well be able to use dataclasses, and thus may not need syntax for auto-assigning parameters. But if one thinks that dataclasses are "mutable named tuples", then they won't consider them for more complex needs, and thus may find they really want that auto-assigning syntax.
What I meant by arguing semantics is that we don't need to agree on a brief way to describe dataclasses -- dataclasses are what they are, and are defined by what features they have. That's it. If you want to call them 'mutable namedtuples', fine, just be careful that that doesn't limit what you think they can be used for. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
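The dataclass features listed earlier in this message (default factories, customised initialisation, and field objects) can be sketched as follows. The class `Metadata` and its fields are invented for illustration; only `dataclass`, `field` and `fields` come from the standard library:

```python
from dataclasses import dataclass, field, fields


@dataclass
class Metadata:
    name: str
    version: str = "0.0"
    # default_factory gives each instance its own list (a mutable default).
    extras: list = field(default_factory=list)

    def __post_init__(self):
        # Hook that runs at the end of the generated __init__.
        self.name = self.name.lower()

    def describe(self):
        # The Field objects can drive generic behaviour.
        return {f.name: getattr(self, f.name) for f in fields(self)}


m = Metadata("Spam")
m.extras.append("docs")
print(m.describe())
print(Metadata("Eggs").extras)
```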

On Sun, May 01, 2022 at 10:40:49PM -0700, Christopher Barker wrote:
Ah, the penny drops! That makes sense.
Named tuples support all of that too.
No, they don't -- you can add methods, though with a klunky interface,
It's the same class+def interface used for adding methods to any class, just with a call to namedtuple as the base class.

    class Thingy(namedtuple("Thingy", "spam eggs cheese")):
        def method(self, arg):
            pass

I think it is a beautifully elegant interface.
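A runnable version of that pattern, as a small sketch: the method names `total` and `with_more_cheese` are hypothetical and only illustrate what methods on an immutable record can still do.

```python
from collections import namedtuple


class Thingy(namedtuple("Thingy", "spam eggs cheese")):
    def total(self):
        # Methods can read the named fields like ordinary attributes.
        return self.spam + self.eggs + self.cheese

    def with_more_cheese(self, extra):
        # Instances are immutable, so "changes" return a new instance.
        return self._replace(cheese=self.cheese + extra)


t = Thingy(1, 2, 3)
print(t.total())
print(t.with_more_cheese(4))
print(t == (1, 2, 3))
```

The last comparison is true, which is the "equality with other tuples" behaviour discussed in this thread.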
and they ARE tuples under the hood which does come with restrictions.
That is a very good point.
And the immutability means that added methods can't actually do very much.
TIL that string and Decimal methods don't do much. *wink*
I'm not a database guy either. When I say record, I mean in the sense of Pascal records, or what C calls structs. A collection of named fields holding data. Objects fundamentally have three properties: identity, state, and behaviour. The behaviour comes from methods operating on the object's state. And that state is normally a collection of named fields holding data. That is, a record. If your class is written in C, like the builtins, you can avoid exposing the names of your data fields, thus giving the illusion from Python that they don't have a name. But at the C level, they have a name, otherwise you can't refer to them from your C code.
For me, dataclasses are a way to make a general purpose class that holds a bunch of data,
I.e. a bunch of named fields, or a record :-)
and have the boilerplate written for me.
Yes, I get that part. I just find the boilerplate to be less of a cognitive burden than learning the details of dataclasses. Perhaps that's because I've been fortunate enough to not have to deal with classes with vast amounts of boilerplate. Or I'm just slow to recognise Blub features :-)
That sounds like a class builder mini-framework. What you describe as "flexible" I describe as "overcomplex". All that extra complexity to just avoid writing a class and methods. Anyway, I'm not trying to discourage you from using dataclasses, or persuade you that they are "bad". I'm sure you know your use-cases, and I have not yet sat down and given dataclasses a real solid workout. Maybe I will come around to them once I do.
All this makes them very useful for more general purpose classes than a simple record.
I'm not saying that all classes *are* a simple record, heavens not! I'm saying that all classes contain, at their core, a record of named fields containing data. Of course classes extend that with all sorts of goodies, like inheritance, object identity, methods to operate on that data in all sorts of ways, a nice OOP interface, and more. Anyway, I think I now understand where you are coming from, thank you for taking the time to elaborate.
+1 -- Steve

On Mon, May 2, 2022 at 7:42 PM Steven D'Aprano <steve@pearwood.info> wrote:
Now you get it :-)
What you describe as "flexible" I describe as "overcomplex". All that extra complexity to just avoid writing a class and methods.
I don’t think much of that complexity is exposed if you don’t need it.
I'm saying that all classes contain, at their core, a record of named fields containing data.
Exactly — which is why data classes can be a base for a lot of use cases. Anyway, I think thee we conversation has moved beyond this. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Anyway, there is something dataclasses do today that prevents you from just adding a @dataclass for binding __init__ attributes to an otherwise "complete class that does things": it overwrites __init__ itself - one has to resort to writing "__post_init__" instead. That means that if you have some class where you would like all the explicitly named parameters to be bound as attributes, and the other parameters to be passed on and taken care of by the super-class, it simply does not work:
```
In [17]: from dataclasses import dataclass

In [18]: @dataclass
    ...: class A:
    ...:     a: int
    ...:     def __post_init__(self, **kwargs):
    ...:         print(kwargs)
    ...:

In [19]: A(1, b=3)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [19], in <cell line: 1>()
----> 1 A(1, b=3)

TypeError: A.__init__() got an unexpected keyword argument 'b'
```
There is no way to have "b" reach "__post_init__", and even less of a way to have it passed on to any super().__init__ of "A", unless I manually write A.__init__. A "language dedicated method" to do just the part of auto assigning the attributes is much more straightforward than converting the whole thing into a dataclass. I re-iterate that while I'd find this a useful addition, I think new syntax for this would be overkill.
On Tue, May 3, 2022 at 12:34 AM Christopher Barker <pythonchb@gmail.com> wrote:

On Mon, May 2, 2022 at 9:30 PM Joao S. O. Bueno <jsbueno@python.org.br> wrote:
Neither I nor anyone else ever claimed dataclasses could be used for everything. You are quite right that you can’t make a dataclass that subclasses a non-dataclass -- particularly when the subclass takes fewer parameters in its __init__. But that’s a bit of an anti-pattern anyway.
In [17]: from dataclasses import dataclass
Why would you not write that as:

    @dataclass
    class A:
        a: int
        b: int
        def __post_init__(self, **kwargs):
            print(kwargs)

If you really don't want b, you could remove it in the __post_init__ -- but as stated earlier in this thread, dataclasses are not a good idea if you have to do a lot of hand-manipulation -- you might as well write a standard class. And if you are subclassing, and need to have a parameter passed on to the superclass, you need to deal with that by hand anyway:

    class A(B):
        def __init__(self, a):
            super().__init__(a, b)

is clearly not going to work, so you do:

    class A(B):
        def __init__(self, a, b):
            super().__init__(a, b)

unless you do:

    class A(B):
        def __init__(self, a, *args, **kwargs):
            super().__init__(a, *args, **kwargs)

which is indeed a common pattern, and not supported by dataclasses *yet* -- there's talk of adding a way to support **kwargs. The problem with toy examples is that we can have no idea how best to solve a non-problem -- so we have no idea if the point being made is valid -- it's a challenge. But yes, there are many use cases not suited to dataclasses. The question is how many of these would rap parge benefit from auto-assigning syntax?
I re-iterate that while I'd find this a useful addition,
I agree.
I think new syntax for this would be overkill.
And I agree there, too. -CHB

On 5/2/22 23:21, Christopher Barker wrote:
But yes, there are many use cases not suited to dataclasses. The question is how many of these would rap parge benefit from auto-assigning syntax?
How did you get to "rap parge" from "reap the" ? On 5/2/22 20:32, Christopher Barker wrote:
Anyway, I think thee we conversation has moved beyond this.
And "thee we" from "that the" ? Do we get bonus points if we break the code? ;-) -- ~Ethan~

Sorry - auto-correct is not my friend :-( -CHB On Tue, May 3, 2022 at 12:07 PM Ethan Furman <ethan@stoneleaf.us> wrote:
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

There is no need for a whole new syntax for what can trivially be accomplished by a decorator, and a simple one, in this case. I for one am all for the inclusion of a decorator targeting either the __init__ method or the class itself to perform this binding of known arguments to instance attributes prior to entering __init__. It could live either in functools or dataclasses itself.
On Sat, Apr 16, 2022 at 5:49 PM Pablo Alcain <pabloalcain@gmail.com> wrote:

On Mon, Apr 18, 2022 at 4:24 PM Joao S. O. Bueno <jsbueno@python.org.br> wrote:
Isn’t this what dataclasses already accomplish? I understand that it’s the reverse -- with a dataclass, you specify the fields, and the __init__ is generated, whereas this proposal is that you’d write an __init__, and the attributes would be set -- but other than taste, is there a practical difference? -CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Regarding the usage of a decorator to do the auto-assignment, I think that it has an issue regarding how to select a subset of the variables that you would be setting. In the general case, you can probably get away with calling `autoassign`. But, for example, if you want to set a but not b, you'd probably have to use a string as the identifier of the parameters that you want to assign:
```
class MyKlass:
    @autoassign('a')
    def __init__(self, a, b):
        print(b)
```
This, in my perspective, brings two problems. The first one is that you'd be repeating the list of names everywhere, so refactors like changing a variable name would be a bit error-prone: if you change the variable from `a` to `my_var_name`, you'd also have to change the list in the autoassign. It's not a lot, but the repetition can induce some errors. The second is that I guess it would be a bit hard for IDEs and static checkers to follow this execution path. I know that I'm only one data point, but for what it's worth, I was very excited about the idea, but this prevented me from actually using this solution on a day-to-day basis: it felt a bit fragile and led me into some errors. About dataclasses, the point that Chris mentions, I think that they are in a different scope from this, since they do much more stuff. But, beyond this, a solution in the dataclass style would face a similar scenario: since the `__init__` is autogenerated, you would also be in a tight spot in the situation of "how would I bind only one of the items?". Again, now I'm talking about my experience, but I think that it's very hard to think that we could replace "classes" with "dataclasses" altogether. Here's an example of one of the (unexpected for me) things that happen when you try to do inheritance on dataclasses: https://peps.python.org/pep-0557/#inheritance. Overall, I think that it's hard to think about a solution to this problem that is clean and robust without adding new syntax with it.
I would like to hear your thoughts on this (and everyone else's of course!) Cheers, Pablo On Mon, Apr 18, 2022 at 9:55 PM Christopher Barker <pythonchb@gmail.com> wrote:

On Wed, Apr 20, 2022 at 12:30 PM Pablo Alcain <pabloalcain@gmail.com> wrote:
IMO, that is trivially resolvable by having the decorator, by default, assign all parameters. If it takes a string or sequence with parameter names, then it will just bind those (still shorter than one line `self.attr = attr` for each attribute). And for the fragility: that is the advantage of having a robust implementation of something like this in the stdlib: it is not something most people will go out of their way to write on their own, since the tradeoff is just to copy and paste a bunch of plain assignments. But having it right and known could chop off tens of lines of useless code in probably the majority of Python projects. Also answering Christopher Barker: this has a subtle, but different use than dataclasses. It might be grouped in the dataclasses module, in the stdlib.
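A minimal sketch of the behaviour described here, assuming a decorator named `autoassign` as in the earlier example. This is not an existing stdlib function, and choices such as skipping `*args`/`**kwargs` are assumptions made only for the sketch:

```python
import functools
import inspect


def autoassign(*names):
    """Hypothetical decorator: bind __init__ arguments as attributes.

    Used bare (@autoassign) it binds every named parameter; given
    parameter names (@autoassign('a')) it binds only those.
    """
    def decorator(init):
        sig = inspect.signature(init)

        @functools.wraps(init)
        def wrapper(self, *args, **kwargs):
            bound = sig.bind(self, *args, **kwargs)
            bound.apply_defaults()
            # Skip the first entry, which is the instance itself.
            for name, value in list(bound.arguments.items())[1:]:
                kind = sig.parameters[name].kind
                if kind in (inspect.Parameter.VAR_POSITIONAL,
                            inspect.Parameter.VAR_KEYWORD):
                    continue  # assumption: leave *args/**kwargs alone
                if not names or name in names:
                    setattr(self, name, value)
            return init(self, *args, **kwargs)
        return wrapper

    # Support the bare form, where the function itself is passed in.
    if len(names) == 1 and callable(names[0]):
        func, names = names[0], ()
        return decorator(func)
    return decorator


class MyKlass:
    @autoassign('a')
    def __init__(self, a, b):
        self.seen_b = b


class Everything:
    @autoassign
    def __init__(self, x, y=5):
        pass


k = MyKlass(1, 2)
e = Everything(1)
print(vars(k), vars(e))
```

The string `'a'` still has to be kept in sync with the parameter name by hand, which is the refactoring concern raised earlier in the thread.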

I take the freedom to interpret 'no news == good news' on this thread - namely that there are no major disagreements that a decorator to auto-commit `__init__` attributes to the instance could be a nice addition to the stdlib. I also assume it is uncontroversial enough for not needing a PEP. I remember a similar thread from a couple years ago where the thread ended up agreeing on such a decorator as well. I would then like to take the opportunity to bike-shed a bit on this, and maybe we can get out of here with a BPO and an actual implementation. So,
1) Would it live better in "dataclasses" or "functools"? Or some other package?
2) What about: the usage of the decorator without arguments would imply committing all of the `__init__` arguments as instance attributes, and two mutually exclusive, optional kwonly parameters, "parameters" and "except", could be used with a list of arguments to set?
2.1) The usage of a "parameters" argument to the decorator would even pick names from an eventual "**kwargs" "__init__" parameter, which otherwise would be left alone.
2.2) Would it make sense to have another argument to specify how "**kwargs" should be treated? I see three options: (i) ignore it altogether, (ii) commit it as a dictionary, (iii) commit all keys in kwargs as instance attributes, (iii.a) "safe commit" keys in kwargs, avoiding overriding methods and class attributes. (Whatever the option here, while following the principle of "fewer features are safer" in a first release, we should consider whether it would be possible to include the new features later in a backwards compatible way.)
3) While the Python implementation for such a decorator is somewhat straightforward, are there any chances of making it static-annotation friendly? AFAIK dataclasses just work with static type checking because the @dataclass decorator is special-cased in the checker tools themselves. Would that be the same case here?
4) Naming. What about "@commitargs"?
5) Should it emit a warning (or TypeError) when decorating anything but a function named `__init__`?
On Wed, Apr 20, 2022 at 3:14 PM Joao S. O. Bueno <jsbueno@python.org.br> wrote:

Hey Joao! For what it's worth, I'm not a big fan of the proposal to be honest, for the reasons I have already mentioned. I'm not heavily against it, but I would most likely not use it. Nevertheless, I believe it would need a PEP since it probably can change substantially the way Python code is being written. I think that discussion can probably belong to a specific thread with the proposal with your questions summary there so everyone can contribute to the implementation that, clearly, has some interesting points that it would be better if we could discuss in detail. I would very much like for us to evaluate, in this thread, the original proposal we sent, regarding if anyone else thinks it would make sense to add a new syntax for the binding of attributes. Pablo

On 4/21/22 10:29, Joao S. O. Bueno wrote:
I am strongly against using a decorator for this purpose. It would only be useful when *all* the arguments are saved as-is; in those cases where only *some* of them are, it either wouldn't work at all or would need to retype the names which would eliminate all benefits of using a decorator. -- ~Ethan~

On Wed, Apr 20, 2022 at 3:31 PM Pablo Alcain <pabloalcain@gmail.com> wrote:
in the class:
1. Generate a __init__
2. Generate a reasonable __repr__
3. Generate a reasonable __eq__
4. Automatically support destructuring with match statements
And you can independently disable any/all of them with arguments to the decorator. They *can* do much more, but I find it pretty unusual to *ever* write a class that I wouldn't want most of those for. The __init__ it generates is essentially automatically writing the boilerplate you're trying to avoid, so it seems entirely reasonable to consider this the same scope. As for "how would I bind only one/some of the items?", dataclasses already support this with dataclasses.InitVar and a custom __post_init__ method; so:
    class MyClass:
        def __init__(self, @a, @b, c):
            ... do something with c that doesn't just assign it as self.c...
where you directly move values from the a and b arguments to self.a and self.b, but use c for some other purpose, is spelled (using typing.Any as a placeholder annotation when there's no better annotation to use):
    @dataclass
    class MyClass:
        a: Any
        b: Any
        c: InitVar[Any]
        def __post_init__(self, c):
            ... do something with c that doesn't just assign it as self.c; self.a and self.b already exist ...
The only name repeated is c (because you're not doing trivial assignment with it), and it's perfectly readable. I'm really not seeing how this is such an unwieldy solution that it's worth adding dedicated syntax to avoid a pretty trivial level of boilerplate that is already avoidable with dataclasses anyway. -Josh
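A runnable version of the dataclass spelling above, as a sketch. The `doubled` field is an invented stand-in for "do something with c" and is not part of the original message.

```python
from dataclasses import InitVar, dataclass, field
from typing import Any


@dataclass
class MyClass:
    a: Any
    b: Any
    c: InitVar[Any]                   # accepted by __init__, not stored as a field
    doubled: Any = field(init=False)  # hypothetical attribute derived from c

    def __post_init__(self, c):
        # self.a and self.b already exist at this point.
        self.doubled = c * 2


obj = MyClass(2, 3, 4)
print(obj)
```

The generated `__repr__` and `__eq__` cover `a`, `b` and `doubled`, while `c` never becomes an attribute.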

On Thu, Apr 21, 2022 at 7:33 PM Josh Rosenberg < shadowranger+pythonideas@gmail.com> wrote:
This is very interesting. I use dataclasses quite a lot tbh, but I do think that the purpose is different: the `__repr__` and `__eq__` they generate, for example, are reasonable when you consider that the classes are "mutable namedtuples with defaults". The effective use of dataclasses indeed goes beyond the "mutable namedtuples" thing, but as this happens the "reasonability" of the default `__repr__` and `__eq__` also starts to fade. As a small anecdote: at one point in time, I ended up having a `dataclass` that reimplemented both `__init__` (through `__post_init__`) and `__repr__`. Then I realized that I probably didn't want a dataclass to begin with.
I think that in some scenarios it can be done, but it doesn't look very clean: namely with the explicit declaration of `InitVar` and the usage of `__post_init__`. It looks like a case in which you don't like the autogenerated solution and patch it to reflect your actual goal. It's ok, but it's also going to lead towards code that is harder to maintain and interpret.
Overall, I think that not all Classes can be thought of as Dataclasses and, even though dataclasses solutions have their merits, they probably cannot be extended to most of the other classes. Pablo

On Sat, Apr 23, 2022 at 10:53 AM Pablo Alcain <pabloalcain@gmail.com> wrote:
Absolutely. However, this is not an "all Classes" question. I don't think of dataclasses as "mutable namedtuples with defaults" at all. But I do think they are for classes that are primarily about storing a defined set of data. I make heavy use of them for this, when I am adding quite a bit of functionality, but their core function is still to store a collection of data. To put it less abstractly: Dataclasses are good for classes in which the collection of fields is a primary focus -- so the auto-generated __init__, __eq__ etc are appropriate. It's kind of a recursive definition: dataclasses work well for those things that data classes' auto generated methods work well for :-) If, indeed, you need a lot of custom behavior for the __init__, and __eq__, and ... then dataclasses are not for you. And the current Python class system is great for fully customized behaviour. It's quite purposeful that parameters of the __init__ have no special behavior, and that "self" is explicit -- it gives you full flexibility, and everything is explicit. That's a good thing. But, of course, the reason this proposal is on the table (and it's not the first time by any means) is that it's a common pattern to assign (at least some of) the __init__ parameters to instance attributes as is. So we have two extremes -- on one hand:
A) Most __init__ params are assigned as instance attributes as is, and these are primarily needed for __eq__ and __repr__
and on the other extreme:
B) Most __init__ params need specialized behavior, and are quite distinct from what's needed by __eq__ and __repr__
(A) is, of course, the entire point of dataclasses, so that's covered. (B) is well covered by the current, you-need-to-specify-everything approach. So the question is -- how common is it that you have code that's far enough toward the (A) extreme as far as __init__ params being instance attributes that we want special syntax, when we don't want most of the __eq__ and __repr__ behaviour?
In my experience, not all that much -- my code tends to be on one extreme or the other. But I think that's the case that needs to be made -- that there's a lot of use cases for auto-assigning instance attributes, that also need highly customized behaviour for other attributes and __eq__ and __repr__. NOTE: another key question for this proposal is how you would handle mutable defaults -- anything special, or "don't do that"? -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On 4/23/22 12:11, Christopher Barker wrote:
NOTE: another key question for this proposal is how you would handle mutable defaults -- anything special, or "don't do that"?
Why should they be handled at all? If the programmer writes
    def __init__(a, @b, @c=[]): pass
then all instances that didn't have `c` given will share the same list -- just like if the code was:
    def __init__(a, b, c=[]):
        self.b = b
        self.c = c
The purpose of the syntax is to automatically save arguments to same-named attributes, not to perform any other magic. -- ~Ethan~
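The sharing behaviour described above can be checked with today's syntax. A small sketch follows; the class name `Shared` is made up, and `self` is added as the first parameter so the code runs as a method.

```python
class Shared:
    def __init__(self, b, c=[]):  # the default list is created once, at definition time
        self.b = b
        self.c = c


first = Shared(1)
second = Shared(2)
first.c.append("x")

# Both instances see the change, because they hold the same list object.
print(second.c, first.c is second.c)
```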

On Thu, Apr 28, 2022 at 4:15 PM Ethan Furman <ethan@stoneleaf.us> wrote:
so the answer is "don't do that" (unless, in the rare case, that's what you actually want). The purpose of the syntax is to automatically save arguments to same-named
attributes, not to perform any other magic.
If the programmer writes
    def __init__(a, @b, @c=[]): pass
sure, but that's the uncommon case -- more common would be:
    def __init__(a, @b, c=None):
        handle_a
        if c is None:
            c = []
or some such. So without "any other magic", we have a less useful proposal. One thing you can say about dataclasses is that they provide a way to handle all parameters, mutable and immutable. Anyway, I just thought it should be clearly said. -CHB

On 4/28/22 21:46, Christopher Barker wrote:
One thing you can say about dataclasses is that they provide a way to handle all parameters, mutable and immutable.
Really? I did not know that. Interesting. Definitely an issue of dataclasses being special, though, and not attributable to the syntax used to make a dataclass. -- ~Ethan~
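For reference, a short sketch of how dataclasses deal with mutable defaults: a bare `list`, `dict` or `set` default is rejected when the class is created, and `field(default_factory=...)` builds a fresh object per instance. The class names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Basket:
    owner: str
    items: list = field(default_factory=list)  # new list for every instance


try:
    @dataclass
    class Broken:
        items: list = []  # rejected while the class is being built
except ValueError as exc:
    error = exc
else:
    error = None

print(type(error).__name__)
```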

On Thu, Apr 28, 2022 at 10:26 PM Ethan Furman <ethan@stoneleaf.us> wrote:
Absolutely -- my point is that if you want to auto-assign all parameters, then a dataclass is a good way to do it. If you need to write handling code for most parameters, then the current do-it-by-hand approach is fine. A new syntax would help most when you need to write custom code for a few parameters, but auto-assign the rest. I brought up mutable defaults, as they would require custom code, making the auto-assignment a tad less useful. But perhaps if the recent ideas for late-bound parameters ever bear fruit, then combining that with auto-assigning would increase the usefulness of both features. -CHB

On Sat, Apr 23, 2022, 1:11 PM Christopher Barker <pythonchb@gmail.com> wrote:
Although I agree that dataclasses have definitely grown beyond this scope, the definition of “mutable namedtuples with defaults” comes from the original PEP (https://peps.python.org/pep-0557/#abstract). The main point here is that there are several use cases for classes that do not conceptually fit the “dataclass” goal.
I agree 100%. This proposal, at its core, is not related to dataclasses. There are some cases in which dataclasses are the solution, but there are many, many times in which you will want to use just classes.
I don’t see B as an “extreme approach”. I think that comparing Python classes with the specific dataclass is not helpful. The B scenario is simply the general case for class usage. Scenario A, I agree, is a very common one and fortunately we have dataclasses for it.
I agree that this is the main question. For what it’s worth, a quick grep on the stdlib (it’s an overestimation) provides:
    $ grep -Ie "self\.\(\w\+\) = \1" -r cpython/Lib | wc
    2095
I did the same in two libraries that I use regularly, pandas and scikit-learn:
    $ grep -Ie "self\.\(\w\+\) = \1" -r sklearn | wc -l
    1786
    $ grep -Ie "self\.\(\w\+\) = \1" -r pandas | wc -l
    650
That’s a total of ~4.5k lines of code (again, this is an overestimation, but it gives us a ballpark estimate). For a better and more fine-grained analysis, Quimey wrote this small library (https://github.com/quimeyps/analize_autoassign) that uses the Abstract Syntax Tree to analyze a bunch of libraries and identify when the “autoassign” could work. It shows that out of 20k analyzed classes in the selected libraries (including black, pandas, numpy, etc), ~17% of them could benefit from the usage of auto-assign syntax. So it looks like the isolated pattern of `self.<something> = <something>` is used a lot. I don’t think that moving all of these cases to dataclasses can provide a meaningful solution. When I take a look at these numbers (and reflect on my own experience and that of my colleagues) it looks like there is a use case for this feature. And this syntax modification looks small and kind of clean, not adding any boilerplate. But, obviously, it entails further discussion of whether it makes sense to add new syntax for this, considering the maintenance that it implies.
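To make the difference between the grep and an AST-based count concrete, here is a minimal sketch of the latter. It is not the code from the linked library; the function names and the sample source are invented, and it only looks at top-level `self.<p> = <p>` statements inside `__init__`.

```python
import ast

SAMPLE = '''
class A:
    def __init__(self, x, y, z=None):
        self.x = x
        self.y = y
        self.z = z or []

    def update(self, x):
        self.x = x
'''


def is_plain_self_assignment(stmt, self_name, params):
    """True for statements of the exact form `self.<p> = <p>`, <p> a parameter."""
    if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1):
        return False
    target, value = stmt.targets[0], stmt.value
    return (
        isinstance(target, ast.Attribute)
        and isinstance(target.value, ast.Name)
        and target.value.id == self_name
        and isinstance(value, ast.Name)
        and value.id == target.attr
        and value.id in params
    )


def autoassign_candidates(source):
    """Map each class name to the __init__ parameters it stores unchanged."""
    result = {}
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        for node in cls.body:
            if isinstance(node, ast.FunctionDef) and node.name == "__init__":
                a = node.args
                names = [arg.arg for arg in a.posonlyargs + a.args + a.kwonlyargs]
                if not names:
                    continue
                self_name, params = names[0], set(names[1:])
                result[cls.name] = [
                    stmt.value.id
                    for stmt in node.body
                    if is_plain_self_assignment(stmt, self_name, params)
                ]
    return result


report = autoassign_candidates(SAMPLE)
print(report)
```

On the sample, the regex above would match three lines (including the one in `update`), while this count reports only `x` and `y` for `A.__init__`.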
As Ethan wrote on this thread, there is nothing “special” happening with mutable defaults: the early binding will work the same way and doing `def __init__(self, @a=[]): pass` would be the same as doing `def __init__(self, a=[]): self.a = a`. So I believe that, apart from some very specific cases, it would be as discouraged as setting mutable values as defaults in general. Late binding is probably a whole other can of worms. Pablo

On Sat, Apr 30, 2022 at 2:17 PM Pablo Alcain <pabloalcain@gmail.com> wrote:
It's not an "extreme" approach -- it's one end of a continuum.
I think that comparing python classes with the specific dataclass is not helpful. The B scenario is simply the
That, well, is pretty much useless, if I understand the re correctly -- the fact that a class is assigning to self doesn't mean it's directly assigning the parameters with no other logic. And any number of those self assignments could be in non-__init__ methods. All that shows is that instance attributes are used. I don't think anyone is questioning that. For a better and more fine-grained analysis, Quimey wrote this small
It shows that out of 20k analyzed classes in the selected libraries
(including black, pandas, numpy, etc), ~17% of them could benefit from the usage of auto-assign syntax.
I only read English, and haven't studied the code, so I don't know how that works, but assuming it's accurately testing for the simple cases that auto-assigning could work for: that's not that much, actually -- for approx every six-parameter function, one of them could be auto-assigned. Or for every six functions, one could make good use of auto-assignment (and maybe be a dataclass?) So it looks like the isolated pattern of `self.<something> = <something>`
is used a lot.
I don't think that's ever been in question. The question, as I see it, is what fraction of parameters could get auto-assigned in general -- for classes where dataclasses wouldn't make sense. And I'm not trying to be a Negative Nelly here -- I honestly don't know, I actually expected it to be higher than 17% -- but in any case, I think it should be higher than 17% to make it worth a syntax addition. But pandas and numpy may not be the least bit representative -- maybe run the most popular packages on PyPi? I thought I made this point, but it seems to have gotten lost: What I'm saying is that, for example, if a class __init__ has 6 parameters, and one of them could be auto-assigned, then yes, auto-assigning could be used, but you really haven't gained much from that -- it would not be worth the syntax change. And any class with an __init__ in which most or all parameters could be auto-assigned -- then those might be a good candidate for a dataclass. So how many are there where say, more than half of __init__ parameters could be auto-assigned, where dataclasses wouldn't be helpful? A lot? then, yes, new syntax may be warranted. But, obviously, it entails a further of discussion whether it makes sense
to add new syntax for this, considering the maintenance that it implies.
It's not so much the maintenance -- it's the transition burden, and the burden of yet more complexity in the language, particularly parameters/arguments: Try to teach a newbie about arguments/parameters in Python, there is a remarkable complexity there already: positional vs keyword *args, **kwargs keyword-only. (and all of these from both the caller and callee perspective) That's a lot of possible combinations -- believe me, it's pretty darn complex and hard to explain! -CHB

On 5/1/22 00:21, Christopher Barker wrote:
On Sat, Apr 30, 2022 at 2:17 PM Pablo Alcain wrote:
I think you place too much emphasis on dataclasses -- none of my projects use them, nor could they. Going through one of my smaller projects, this is what I found:
- number of `__init__`s: 11
- number of total params (not counting self): 25
- number of those params assigned as-is: 19
- number of `__init__`s where all are assigned as-is: 6
- number of non-`__init__`s where this would be useful: 0
17% is a massive amount of code.
But pandas and numpy may not be the least bit representative [...]?
This would not be the first time Python was improved to help the scientific community. My own thoughts about the proposal: It seems interesting, and assigning as-is arguments is a chore -- but I'm not sure using up a token to help only one method per class is a good trade. -- ~Ethan~

On Sun, May 1, 2022 at 9:35 AM Ethan Furman <ethan@stoneleaf.us> wrote:
Is it unreasonable to instead suggest generalizing the assignment target for parameters? For example, if parameter assignment happened left to right, and allowed more than just variables, then one could do: def __init__(self, self.x, self.y): pass Python 2 had non-variable parameters (but not attributes, just unpacking: def foo((x, y)): pass), but it was removed in Python 3, because of issues with introspection (among other, perhaps less significant, issues): https://peps.python.org/pep-3113/ Perhaps now that positional-only parameters exist, and the introspection APIs have evolved over time, there is a way to work this into the introspection APIs sensibly. (a "@foo" parameter does not have this problem, because it's just a parameter named `foo` with some extra stuff.) -- Devin

On Mon, May 2, 2022 at 7:21 AM Steven D'Aprano <steve@pearwood.info> wrote:
Yes, I agree. I don't think that the syntax is unreasonable, but it looks like it would be putting `self` at the same "level" of all the other possible parameters and could lead to this kind of confusion. What _might_ be a possibility (I'm not advocating in favor of it) is, like ruby does, to also add the `@x` as syntactic sugar for `self.x` in the body of the methods. This way the `@x` in the signature would be consistent, but I believe it can conflict conceptually with the "explicit self" philosophy.

Steven D'Aprano writes:
IMO, both of those should be errors. This syntax only makes much sense for the first formal argument of a method definition, because it's the only formal argument which has a fixed definition. The form "def foo(self, x, x.y)" has an interpretation, I guess, but def foo(self, x, y): x.y = y is not a pattern I can recall ever seeing, and it could be relatively easily relaxed if it were requested enough. On the other hand, folks do frequently request a way to DRY out long suites of "self.x = x" assignments. This could, of course, be done with a symbol such as '@' or even '.', but '@' could also be used for other purposes (late binding, for example), and "def foo(self, .x, .y):" looks like both grit on Tim's screen and a typo. On the other hand, I can't imagine what else might be meant by "def foo(self, self.x):". All that said, I'm not a fan of this feature as such. But giving this semantics to formal arguments of the form "self.x" is the most intuitive (in the sense of "hard to give another interpretation") of the proposals I've seen.

On Sat, 7 May 2022 at 23:15, Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
I'd define it very simply. For positional args, these should be exactly equivalent:
    def func(self, x, x.y): ...
    def func(*args):
        self, x, x.y = args
        ...
The logical extension to kwargs would work in natural parallel. ChrisA
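The second form is already valid Python, so the left-to-right semantics can be tried directly. A small sketch, using `types.SimpleNamespace` as a stand-in for any object that accepts attributes:

```python
from types import SimpleNamespace


def func(*args):
    # Targets are bound left to right: `x` is assigned before `x.y` is evaluated.
    self, x, x.y = args
    return self, x


holder = SimpleNamespace()
func("instance", holder, 42)
print(holder.y)
```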

On Sat, May 7, 2022 at 6:28 AM Chris Angelico <rosuav@gmail.com> wrote:
I really don't like this -- hard to put my finger on it exactly, but I think it's because Python doesn't use any magic in method definitions. There is a touch of magic in binding methods to classes, but that comes later. So while:
    class Foo:
        def method(self, ...)
looks special, that's only because:
- it's defined in the class definition directly
- it's using the self convention
But:
    class Foo:
        pass
    def method(fred, ...):
        pass
    Foo.method = method
means exactly the same thing. So then we have:
    def fun(foo, foo.bar): ...
is legal, but:
    def fun(this, foo, foo.bar): ...
is not.
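The point that a method is just a function attached to a class can be shown in a few lines. A sketch, with the `value` attribute and the `extra` parameter invented for the example:

```python
class Foo:
    def __init__(self, value):
        self.value = value


def method(fred, extra):
    # `fred` plays the role of `self`; the name carries no special meaning.
    return fred.value + extra


# Attaching the plain function to the class turns it into a method.
Foo.method = method
print(Foo(1).method(2))
```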
it's the only formal argument which has a fixed definition.
yes, but only in one specific context -- so I don't like it leaking out of that context.
I like that -- it's simple to understand, clear, it doesn't only make sense for methods, and it might even be useful in other contexts [*]. I think: def fun(x, y.z): ... would work fine, too. e.g. you wouldn't be restricted to using other parameters. That being said, I'm still -1 on the idea. [*] -- the "other contexts" is key for me -- if someone can show that this is a useful pattern in other contexts, I think it would be a stronger proposal. -CHB

On Sat, May 07, 2022 at 11:38:19AM -0700, Ethan Furman wrote:
Indeed. Just because we can imagine semantics for some syntax, doesn't make it useful. Aside from the very special case of attribute binding in initialisation methods (usually `__init__`), and not even all, or even a majority, of those, this is a great example of YAGNI. Outside of that narrow example of auto-assignment of attributes, can anyone think of a use-case for this? And as far as auto-assignment of attributes goes, I want to remind everyone that we have that already, in a simple two-liner:
    vars(self).update(locals())
    del self.self
which will work for most practical cases where auto-assignment would be useful. (It doesn't work with slots though.) This is a useful technique that isn't widely known enough. I believe that if it were more widely known, we wouldn't be having this discussion at all. -- Steve
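A quick sketch of the two-liner in use, together with the slots limitation mentioned above. The class names are made up for the example.

```python
class Auto:
    def __init__(self, spam, eggs, cheese):
        vars(self).update(locals())  # copies self, spam, eggs, cheese into __dict__
        del self.self                # drop the unwanted self-reference


class Slotted:
    __slots__ = ("spam",)

    def __init__(self, spam):
        vars(self).update(locals())  # TypeError: instances have no __dict__
        del self.self


print(vars(Auto(1, 2, 3)))
```

Note that it stores every local name that exists at that point, so it has to be the first statement of the method.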

On Sun, 8 May 2022 at 10:23, Steven D'Aprano <steve@pearwood.info> wrote:
Honestly, I don't know of any. But in response to the objection that it makes no sense, I offer the perfectly reasonable suggestion that it could behave identically to other multiple assignment in Python. There's not a lot of places where people use "for x, x.y in iterable", but it's perfectly legal. Do we need a use-case for that one to justify having it, or is it justified by the simple logic that assignment targets are populated from left to right? I'm not advocating for this, but it shouldn't be pooh-poohed just because it has more power than you personally can think of uses for. ChrisA

On Sun, May 08, 2022 at 11:02:22AM +1000, Chris Angelico wrote:
On Sun, 8 May 2022 at 10:23, Steven D'Aprano <steve@pearwood.info> wrote:
Nobody says that it makes "no sense". Stephen Turnbull suggested it doesn't make "much sense", but in context I think it is clear that he meant there are no good uses for generalising this dotted parameter name idea, not that we can't invent a meaning for the syntax.
The analogy breaks down because we aren't talking about assignment targets, but function parameters. Function parameters are only *kinda sorta* like assignment targets, and the process of binding function arguments passed by the caller to those parameters is not as simple as self, x, x.y = args The interpreter also does a second pass using keyword arguments, and a third pass assigning defaults if needed. Or something like that -- I don't think the precise implementation matters. Of course we could make it work by giving *some* set of defined semantics, but unless it is actually useful, why should we bother? Hence my comment YAGNI.
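The extra steps can be observed through `inspect.signature`, which models argument binding in pure Python. This is only a sketch of the observable behaviour, not of how the interpreter implements it; the `greet` function is invented for the example.

```python
import inspect


def greet(name, greeting="Hello", *, punctuation="!"):
    return f"{greeting}, {name}{punctuation}"


sig = inspect.signature(greet)

# Positional and keyword arguments are matched to parameters first...
bound = sig.bind("Terry", punctuation="?")
before = dict(bound.arguments)

# ...and defaults fill in whatever the caller left out.
bound.apply_defaults()
after = dict(bound.arguments)

print(before)
print(after)
```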
I'm not advocating for this, but it shouldn't be pooh-poohed just because it has more power than you personally can think of uses for.
Power to do *what*? If nobody can think of any uses for this (beyond the auto-assignment of attributes), then what power does it really have? I don't think "power" of a programming language feature has a purely objective, precise definition. But if it did, it would surely have something to do with the ability to solve actual problems. -- Steve

On Sun, May 1, 2022 at 10:35 AM Ethan Furman <ethan@stoneleaf.us> wrote:
Yes, I agree that the cost/benefit should be analyzed. For what it's worth, the choice of the `@` was because of two different reasons: first, because we were inspired by Ruby's syntax (later on we learned that CoffeeScript and Crystal had already taken the approach we are proposing) and second, because the `@` token is already used as an infix operator for `__matmul__` ( https://peps.python.org/pep-0465/). I believe it's the only usage that it has, so it probably won't be that confusing to give it this new semantic as well. All of this, I believe, mitigates the "using up a token" concern, but it's not enough to make it a clear decision, so I 100% agree with your concern.

On Mon, May 02, 2022 at 10:34:56AM -0600, Pablo Alcain wrote:
Did you forget decorators? What other languages support this feature, and what syntax do they use? Personally, I don't like the idea of introducing syntax which looks legal in any function call at all, but is only semantically meaningful in methods, and not all methods. Mostly only `__init__`. How would this feature work with immutable classes where you want to assign attributes to the instance in the `__new__` method? I fear that this is too magical, too cryptic, for something that people only use in a tiny fraction of methods. 17% of `__init__` methods is probably less than 1% of methods, which means that it is going to be a rare and unusual piece of syntax. Beginners and casual coders (students, scientists, sys admins, etc, anyone who dabbles in Python without being immersed in the language) are surely going to struggle to recognise where `instance.spam` gets assigned, when there is no `self.spam = spam` anywhere in the class or its superclasses. There is nothing about "@" that hints that it is an assignment. (Well, I suppose there is that assignment and at-sign both start with A.) I realise that this will not satisfy those who want to minimize the amount of keystrokes, but remembering that code is read perhaps 20-100 times more than it is written, perhaps we should consider a keyword:
    def __init__(self, auto spam:int, eggs:str = ''):
        # spam is automatically bound to self.spam
        self.eggs = eggs.lower()
I dunno... I guess because of that "code is read more than it is written" thing, I've never felt that this was a major problem needing solving. Sure, every time I've written an __init__ with a bunch of `self.spam = spam` bindings, I've felt a tiny pang of "There has to be a better way!!!". But **not once** when I have read that same method later on have I regretted that those assignments are explicitly written out, or wished that they were implicit and invisible.
Oh, by the way, if *all* of the parameters are to be bound:
    def __init__(self, spam, eggs, cheese, aardvark):
        vars(self).update(locals())
        del self.self
Still less magical and more explicit than this auto-assignment proposal. -- Steve

On Mon, 2 May 2022 at 18:46, Steven D'Aprano <steve@pearwood.info> wrote:
I have classes with 20+ parameters (packaging metadata). You can argue that a dataclass would be better, or some other form of refactoring, and you may actually be right. But it is a legitimate design for that use case. In that sort of case, 20+ lines of assignments in the constructor *are* actually rather unreadable, not just a pain to write. Of course the real problem is that you often don't want to *quite* assign the argument unchanged - `self.provides_extras = set(provides_extras or [])` or `self.requires_python = requires_python or specifiers.SpecifierSet()` are variations that break the whole "just assign the argument unchanged" pattern. As a variation on the issue, which the @ syntax *wouldn't* solve, in classmethods for classes like this, I often find myself constructing dictionaries of arguments, copying multiple values from one dict to another, sometimes with the same sort of subtle variation as above:
    @classmethod
    def from_other_args(cls, a, b, c, d):
        kw = {}
        kw["a"] = a
        kw["b"] = b
        kw["c"] = c
        kw["d"] = d
        return cls(**kw)
Again, in "real code", not all of these would be copied, or some would have defaults, etc. The pattern's the same, though - enough args are copied to make the idea of marking them with an @ seem attractive. Overall, as described I don't think the @arg proposal provides enough benefit to justify new syntax (and I think trying to extend it would end badly...). On the other hand, if someone were to come up with a useful, general way of bulk-copying named values from one "place"[1] to another, possibly with minor modifications, I think I'd find that very useful. Call it a DSL for bulk data initialisation, if you like. I think such a thing could pretty easily be designed as a library. But I doubt anyone will bother, as ad hoc "on the fly" solutions tend to be sufficient in practice. Paul
[1] A "place" might be a dictionary - dict["name"] or an object - getattr(self, "name").

On Mon, May 02, 2022 at 07:44:14PM +0100, Paul Moore wrote:
Indeed. 20+ parameters is only a code smell, it's not *necessarily* wrong. Sometimes you just need lots of parameters, even if it is ugly. For reference, open() only takes 8, so 20 is a pretty wiffy code smell, but it is what it is.
I don't know. It's pretty easy to skim lines when reading, especially when they follow a pattern:
    self.spam = spam
    self.eggs = eggs
    self.cheese = cheese
    self.aardvark = aardvark
    self.hovercraft = hovercraft
    self.grumpy = grumpy
    self.dopey = dopey
    self.doc = doc
    self.happy = happy
    self.bashful = bashful
    self.sneezy = sneezy
    self.sleepy = sleepy
    self.foo = foo
    self.bar = bar
    self.baz = baz
    self.major = major
    self.minor = minor
    self.minimus = minimus
    self.quantum = quantum
    self.aether = aether
    self.phlogiston = phlogiston
Oh that was painful to write! But I only needed to write it once, and I bet that 99% of people reading it will just skim down the list rather than read each line in full. To be fair, having written it once, manual refactoring may require me to rewrite it again, or at least edit it. In early development, sometimes the parameters are in rapid flux, and that's really annoying. But that's just a minor period of experimental coding, not an on-going maintenance issue.
Indeed. Once we move out of that unchanged assignment pattern, we need to read more carefully rather than skim self._spam = (spam or '').lower().strip() but you can't replace that with auto assignment.
You may find it easier to make a copy of locals() and delete the parameters you don't want, rather than retype them all like that:
    params = locals().copy()
    for name in ['cls', 'e', 'g']:
        del params[name]
    return cls(**params)
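A self-contained sketch of that suggestion. The `Package` class and the extra `e` and `g` parameters are hypothetical, chosen only to match the names deleted in the snippet above.

```python
class Package:
    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d

    @classmethod
    def from_other_args(cls, a, b, c, d, e=None, g=None):
        # Snapshot the parameters before any other local name is bound.
        params = locals().copy()
        for name in ["cls", "e", "g"]:
            del params[name]
        return cls(**params)


pkg = Package.from_other_args(1, 2, 3, 4, e="ignored")
print(vars(pkg))
```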
But the @ proposal here won't help. If you mark them with @, won't they be auto-assigned onto cls? -- Steve

On Tue, 3 May 2022 at 03:04, Steven D'Aprano <steve@pearwood.info> wrote:
It's worth noting that dataclasses with lots of attributes by default generate constructors that require all of those as parameters. So it's a code smell yes, but by that logic so are dataclasses with many attributes (unless you write a bunch of custom code). Genuine question - what *is* a non-smelly way of writing a dataclass with 24 attributes? I've written about 20 variations on this particular class so far, and none of them feel "right" to me :-(
Precisely.
Again, precisely. My point here is that the @ proposal is, in my experience, useful in far fewer situations than people are claiming. What *is* common (again in my experience) is variations on a pattern that can be described as "lots of repetitive copying of values from one location to another, possibly with minor modifications". Having a way of addressing the broader problem *might* be of sufficient use to be worth pursuing, and it might even be possible to do something useful in a library, not needing new syntax. On the other hand, the @ syntax as proposed *doesn't* address enough use cases (for me!) to be worthwhile, especially not if new syntax is needed rather than just something like a decorator. Paul

On Tue, May 3, 2022 at 6:36 AM Paul Moore <p.f.moore@gmail.com> wrote:
It's a good point. We have been thinking a bit about how to measure the "usefulness" of the proposal. What we have so far is mainly an intuition driven by code that we and colleagues developed and a couple of statistics ( https://github.com/quimeyps/analize_autoassign in case anyone reading this doesn't have the link nearby). Although I think that code analysis can be a way to find out the usefulness of the proposal, the statistics that we have so far feel a bit coarse-grained to be honest. So any idea on what would be good metrics to answer the question of "how often the syntax would be useful" will be more than welcome!

On Mon, May 2, 2022 at 11:48 AM Steven D'Aprano <steve@pearwood.info> wrote:
totally forgot decorators, my bad!
What other languages support this feature, and what syntax do they use?
you mean languages other than those two? I haven't found any. In case you mean the syntax for those two, I know a tiny bit of Crystal's. It leverages the fact that they use `@` for referring to `self`, as in Ruby, so you would be able to write something like this:
```
class Person
  def initialize(@name : String)
  end

  def greet
    print("Hello, ", @name)
  end
end

p = Person.new "Terry"
p.greet
```
Yes, it's a good point. Allowing functions in the wild to use this syntax would simplify the usage for monkeypatching... but how often would you want to monkeypatch an `__init__`? And how often would you want to use the auto-assign outside of the `__init__`? I believe not often enough. So in that case, maybe we can make it legal only in methods. I agree that, if we forego monkeypatching, it wouldn't be semantically meaningful in plain functions; in methods other than `__init__` it would still make sense semantically, it just probably wouldn't be that useful.
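To make the three cases discussed here concrete, this is how the proposed syntax would desugar, written in today's Python. It follows the equivalence given in the original post and assumes that each `@name` parameter is assigned onto the function's first parameter; the class and function names are made up for illustration.

```python
class Point:
    # Proposed spelling: def __init__(self, @x, @y): pass
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Proposed spelling, in a method other than __init__:
    # def move_to(self, @x, @y): pass
    def move_to(self, x, y):
        self.x = x
        self.y = y


# Proposed spelling, as a function "in the wild":
# def init_3d(self, @x, @y, @z): pass
def init_3d(self, x, y, z):
    self.x = x
    self.y = y
    self.z = z


# Monkeypatching: the plain function becomes the new __init__
Point.__init__ = init_3d

p = Point(1, 2, 3)
p.move_to(5, 6)
print(p.x, p.y, p.z)
```

Restricting the syntax to methods would keep the first two spellings legal and rule out the third, at the cost of the monkeypatching use case.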

On Sat, Apr 23, 2022 at 12:11:07PM -0700, Christopher Barker wrote:
Isn't it? I thought this was a proposal to allow any class to partake in the dataclass autoassignment feature. (Not necessarily the implementation.)
I don't think of dataclasses as "mutable namedtuples with defaults" at all.
What do you think of them as?
But do think they are for classes that are primarily about storing a defined set of data.
Ah, mutable named tuples, with or without defaults? :-) Or possibly records/structs. -- Steve

On Sat, Apr 30, 2022 at 6:40 PM Steven D'Aprano <steve@pearwood.info> wrote:
no -- it's about only a small part of that.
I answered that in the next line, that you quote.
well, no. - the key is that you can add other methods to them, and produce all sort of varyingly complex functionality. I have done that myself.
Or possibly records/structs.
nope, nope, and nope. But anyway, the rest of my post was the real point, and we're busy arguing semantics here. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sat, Apr 30, 2022 at 11:54:47PM -0700, Christopher Barker wrote:
How so? Dataclasses support autoassignment. This proposes to allow **all classes** (including non-dataclasses) to also support autoassignment. So can you please clarify your meaning? To me, this does look like an "all Classes" question. What am I missing?
Perhaps your answer isn't as clear as you think it is. See below.
Named tuples support all of that too. One of the reasons I have not glommed onto dataclasses is that for my purposes, they don't seem to add much that named tuples didn't already give us.
* Record- or struct-like named fields? Check.
* Automatic equality? Check.
* Nice repr? Check.
* Can add arbitrary methods and override existing methods? Check.
Perhaps named tuples offer *too much*:
* Instances of tuple;
* Equality with other tuples;
and maybe dataclasses offer some features I haven't needed yet, but it seems to me that named tuples and dataclasses are two solutions to the same problem: how to create a record with named fields.
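The checklist above can be verified with a short sketch. The class names `PointNT` and `PointDC` are illustrative only; the last few lines also show two standard behaviours that matter for the comparison with dataclasses (tuple equality and read-only fields).

```python
from dataclasses import dataclass
from typing import NamedTuple


class PointNT(NamedTuple):
    x: int
    y: int = 0  # defaults are supported

    def norm1(self):  # arbitrary methods can be added
        return abs(self.x) + abs(self.y)


@dataclass
class PointDC:
    x: int
    y: int = 0


p = PointNT(3, -4)
print(p)                           # nice repr with field names
print(p.norm1())                   # the added method works
print(p == PointNT(3, -4))         # automatic equality
print(isinstance(p, tuple))        # it is an instance of tuple
print(p == (3, -4))                # and compares equal to a plain tuple
print(PointDC(3, -4) == (3, -4))   # a dataclass does not

try:
    p.x = 10
except AttributeError:
    print("named tuple fields are read-only")
```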
Or possibly records/structs.
nope, nope, and nope.
Okay, I really have no idea what you think dataclasses are, if you don't think of them as something like an object-oriented kind of record or struct (a class with named data fields). You even define them in terms of storing a defined set of data, except you clearly don't mean a set in the mathematical meaning of an unordered collection (i.e. set()). A set of data is another term for a record. So I don't understand what you think dataclasses are, if you vehemently deny that they are records (not just one nope, but three). And since I don't understand your concept of dataclasses, I don't know how to treat your position in this discussion. Should I treat it as mainstream, or idiosyncratic? Right now, it seems pretty idiosyncratic. Maybe that's because I don't understand you. See below.
But anyway, the rest of my post was the real point, and we're busy arguing semantics here.
Well yes, because if we don't agree on semantics, we cannot possibly communicate. Semantics is the **meaning of our words and concepts**. If we don't agree on what those words mean, then how do we understand each other? I've never understood people who seem to prefer to talk past one another with misunderstanding after misunderstanding rather than "argue semantics" and clarify precisely what they mean. -- Steve

On Sun, May 1, 2022 at 1:16 AM Steven D'Aprano <steve@pearwood.info> wrote:
Yes, any class could use this feature (though it's more limited than what dataclasses do) -- what I was getting at is that it would not be (particularly) useful for all classes -- only classes where there are a lot of __init__ parameters that can be auto-assigned. And that use case overlaps to some extent with dataclasses.
Perhaps your answer isn't as clear as you think it is.
apparently not.
"primarily" -- but the key difference is that dataclasses are far more customisable and flexible. They are more like "classes with boilerplate dunders auto-generated". That is, a lot more like "regular" classes than they are like tuples. Whereas namedtuples are, well, tuples where the items have names. That's kinda it.
well, no. - the key is that you can add other methods to them, and produce all sort of varyingly complex functionality.
Named tuples support all of that too.
No, they don't -- you can add methods, though with a klunky interface, and they ARE tuples under the hood which does come with restrictions. And the immutability means that added methods can't actually do very much.
One of the reasons I have not glommed onto dataclasses is that for my purposes, they don't seem to add much that named tuples didn't already give us.
ahh -- that may be because you think of them as "mutable named tuples" -- that is, the only reason you'd want to use them is if you want your "record" to be mutable. But I think you miss the larger picture.
that's a little klunky though, isn't it? Have you seen much use of named tuples like that? For that matter do folks do that with tuples much either?
Perhaps named tuples offer *too much*:
* Instances of tuple;
* Equality with other tuples;
Yes, that can be a downside, indeed.
and maybe dataclasses offer some features I haven't needed yet, but it seems to me that named tuples and dataclasses are two solutions to the same problem: how to create a record with named fields.
I suspect you may have missed the power of dataclasses because you started with this assumption. Maybe it's because I'm not much of a database guy, but I don't think in terms of records. For me, dataclasses are a way to make a general purpose class that holds a bunch of data, and have the boilerplate written for me. And what dataclasses add that makes them so flexible is that they:
- allow for various custom fields:
  - notably default factories to handle mutable defaults
- provide a way to customise the initialization
- and critically, provide a collection of field objects that can be used to customize behavior.
All this makes them very useful for more general purpose classes than a simple record. I guess a way to think of them is this: if you are writing a class in which the __init__ assigns most of the parameters to the instance, then a dataclass could be helpful, which is why I think they solve *part* of the problem that special auto-assigning syntax would solve. Not all of the problem, which is why I'm suggesting that folks find evidence for how often auto-assigned parameters would be very useful when dataclasses would not.
So I don't understand what you think dataclasses are, if you vehemently deny that they are records (not just one nope, but three).
It's not that they can't be used as records, it's that they can be so much more. After all what is any class but a collection of attributes (some of which may be methods) ?
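The dataclass features listed earlier in this message (default factories, customised initialization, field objects) can be sketched with the standard `dataclasses` module. The `Inventory` class is a made-up example of a class that does more than get and set attributes.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Inventory:
    owner: str
    items: list = field(default_factory=list)  # safe mutable default
    capacity: int = 10
    free: int = field(init=False)              # computed, not an __init__ parameter

    def __post_init__(self):
        # Hook that runs after the generated __init__ has assigned the fields
        self.free = self.capacity - len(self.items)

    def add(self, item):
        if self.free == 0:
            raise ValueError("inventory full")
        self.items.append(item)
        self.free -= 1


a = Inventory("ann")
b = Inventory("bob", capacity=1)
a.add("rope")
print(a)
print(b.items)                              # each instance gets its own list
print([f.name for f in fields(Inventory)])  # field objects are introspectable
```

The generated `__init__` still does the "assign most parameters to the instance" work, while `__post_init__` and ordinary methods carry the extra behaviour.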
perhaps it is -- the mainstream may not have noticed how much one can do with dataclasses.
sure -- I should have been more explicit -- arguing about what "mutable named tuple" means didn't seem useful. But in retrospect I was wrong -- it may matter to this discussion, and that's why I originally pointed out that I don't think of dataclasses as "mutable named tuples" -- perhaps I should have said not *only* mutable named tuples. The relevant point here is that I'm suggesting there are uses for dataclasses that go beyond a simple record that happens to be mutable -- i.e. classes that do (potentially much) more than simply get and set attributes. Which means that someone who needs to assign a lot of parameters to self, but doesn't think they are writing a simple record, may well be able to use dataclasses, and thus may not need syntax for auto-assigning parameters. But if one thinks that dataclasses are "mutable named tuples", then they won't consider them for more complex needs, and thus may find they really want that auto-assigning syntax.
What I meant by arguing semantics is that we don't need to agree on a brief way to describe dataclasses -- dataclasses are what they are, and are defined by what features they have. That's it. If you want to call them 'mutable namedtuples', fine, just be careful that that doesn't limit what you think they can be used for. -CHB
participants (12): Chris Angelico, Christopher Barker, Devin Jeanpierre, Eric V. Smith, Ethan Furman, Joao S. O. Bueno, Josh Rosenberg, Pablo Alcain, Paul Moore, Stephen J. Turnbull, Steven D'Aprano, Yves Duprat