
On Sat, Apr 23, 2022, 1:11 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Sat, Apr 23, 2022 at 10:53 AM Pablo Alcain <pabloalcain@gmail.com> wrote:
Overall, I think that not all Classes can be thought of as Dataclasses and, even though dataclasses solutions have their merits, they probably cannot be extended to most of the other classes.
Absolutely. However, this is not an "all Classes" question.
I don't think of dataclasses as "mutable namedtuples with defaults" at all.
Although I agree that dataclasses have definitely grown beyond this scope, the definition of “mutable namedtuples with defaults” comes from the original PEP (https://peps.python.org/pep-0557/#abstract). The main point here is that there are several use cases for classes that do not conceptually fit the “dataclass” goal.
But I do think they are for classes that are primarily about storing a defined set of data.
I make heavy use of them for this, when I am adding quite a bit of functionality, but their core function is still to store a collection of data. To put it less abstractly:
Dataclasses are good for classes in which the collection of fields is a primary focus -- so the auto-generated __init__, __eq__ etc are appropriate.
It's kind of a recursive definition: dataclasses work well for those things that data classes' auto generated methods work well for :-)
If, indeed, you need a lot of custom behavior for the __init__, and __eq__, and ... then dataclasses are not for you.
I agree 100%. This proposal, at its core, is not related to dataclasses. There are some cases in which dataclasses are the solution, but there are many, many times in which you will want to use just classes.
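To make the "collection of fields is the primary focus" case concrete, here is a minimal sketch. The class `Point` and its fields are hypothetical, chosen only for illustration; the point is that __init__, __repr__ and __eq__ are all generated from the field list.

```python
from dataclasses import dataclass


@dataclass
class Point:
    # Hypothetical example class: the fields are the whole story,
    # so the generated methods are exactly what is wanted.
    x: float
    y: float = 0.0


p = Point(1.0, 2.0)        # generated __init__
q = Point(1.0, 2.0)
point_repr = repr(p)       # generated __repr__ lists the fields
points_equal = (p == q)    # generated __eq__ compares field by field
```

Once the __init__ or __eq__ needs to do something the field list cannot express, most of that generated code has to be overridden by hand anyway.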
And the current Python class system is great for fully customized behaviour. It's quite purposeful that parameters of the __init__ have no special behavior, and that "self" is explicit -- it gives you full flexibility, and everything is explicit. That's a good thing.
But, of course, the reason this proposal is on the table (and it's not the first time by any means) is that it's a common pattern to assign (at least some of) the __init__ parameters to instance attributes as is.
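As a small illustration of that pattern, here is a sketch of an ordinary class. `Connection` and its parameters are hypothetical names; two parameters are stored as is, while the third needs custom handling, so the class does not map cleanly onto a dataclass.

```python
class Connection:
    # Hypothetical class, for illustration only.
    def __init__(self, host, port, timeout=None):
        self.host = host    # assigned as is
        self.port = port    # assigned as is
        # Custom handling: normalise the value and apply a fallback.
        self._timeout = 30.0 if timeout is None else float(timeout)


conn = Connection("localhost", 8080, timeout="5")
```

The first two assignments are the boilerplate the proposal targets; the third is the kind of logic that would still be written out explicitly.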
So we have two extremes -- on one hand:
A) Most __init__ params are assigned as instance attributes as is, and these are primarily needed for __eq__ and __repr__
and on the other extreme:
B) Most __init__ params need specialized behavior, and are quite distinct from what's needed by __eq__ and __repr__
(A) is, of course, the entire point of dataclasses, so that's covered.
(B) is well covered by the current, you-need-to-specify-everything approach.
I don’t see B as an “extreme approach”. I think that comparing python classes with the specific dataclass is not helpful. The B scenario is simply the general case for class usage. Scenario A, I agree, is a very common one and fortunately we have dataclasses for them.
So the question is -- how common is it that you have code that's far enough toward the (A) extreme as far as __init__ params being instance attributes that we want special syntax, when we don't want most of the __eq__ and __repr__ behaviour?
I agree that this is the main question. For what it’s worth, a quick grep on the stdlib (it’s an overestimation) gives:

    $ grep -Ie "self\.\(\w\+\) = \1" -r cpython/Lib | wc
    2095

I did the same in two libraries that I use regularly, pandas and scikit-learn:

    $ grep -Ie "self\.\(\w\+\) = \1" -r sklearn | wc -l
    1786
    $ grep -Ie "self\.\(\w\+\) = \1" -r pandas | wc -l
    650

That’s a total of ~4.5k lines of code (again, this is an overestimation, but it gives a ballpark figure).

For a better and more fine-grained analysis, Quimey wrote this small library (https://github.com/quimeyps/analize_autoassign) that uses the Abstract Syntax Tree to analyze a bunch of libraries and identify when the “autoassign” could work. It shows that out of 20k analyzed classes in the selected libraries (including black, pandas, numpy, etc), ~17% of them could benefit from the usage of auto-assign syntax.

So it looks like the isolated pattern of `self.<something> = <something>` is used a lot. I don’t think that moving all of these cases to dataclasses can provide a meaningful solution. When I take a look at these numbers (and reflect on my own experience and that of my colleagues) it looks like there is a use case for this feature. And this syntax modification looks small and kind of clean, not adding any boilerplate. But, obviously, it entails further discussion of whether it makes sense to add new syntax for this, considering the maintenance that it implies.
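For readers who want to try an AST-based count themselves, here is a rough sketch of the idea. It is not the implementation of the library linked above; the function name `count_autoassign` and the sample source are made up for illustration, and it assumes the first parameter is literally named `self`.

```python
import ast


def count_autoassign(source):
    """Count `self.<name> = <name>` statements inside __init__ methods,
    where <name> is one of that __init__'s own parameters."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.FunctionDef) and node.name == "__init__"):
            continue
        args = node.args
        params = {a.arg for a in args.posonlyargs + args.args + args.kwonlyargs}
        for stmt in ast.walk(node):
            # Only simple single-target assignments are considered.
            if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1):
                continue
            target, value = stmt.targets[0], stmt.value
            if (
                isinstance(target, ast.Attribute)
                and isinstance(target.value, ast.Name)
                and target.value.id == "self"   # assumption: receiver is "self"
                and isinstance(value, ast.Name)
                and value.id == target.attr
                and value.id in params
            ):
                count += 1
    return count


# Hypothetical sample: only the first two assignments in __init__ qualify.
sample = '''
class A:
    def __init__(self, x, y, z=None):
        self.x = x
        self.y = y
        self.z = z or []
        self.total = x + y

    def update(self, x):
        self.x = x
'''
n = count_autoassign(sample)
```

Unlike the grep, this ignores `self.x = x` lines outside __init__ and assignments whose right-hand side is not a bare parameter name, which is one reason the grep figures are an overestimate.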
In my experience, not all that much -- my code tends to be on one extreme or the other.
But I think that's the case that needs to be made -- that there are a lot of use cases for auto-assigning instance attributes, that also need highly customized behaviour for other attributes and __eq__ and __repr__.
NOTE: another key question for this proposal is how you would handle mutable defaults -- anything special, or "don't do that"?
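The behaviour at stake here is Python's existing early binding of default values, which can be seen without any new syntax. A minimal sketch, with hypothetical class names:

```python
class Basket:
    # The default list is created once, when the function is defined,
    # and is then shared by every call that omits the argument.
    def __init__(self, items=[]):
        self.items = items


a = Basket()
b = Basket()
a.items.append("apple")
shared = b.items    # the same list object as a.items


class SafeBasket:
    # The usual workaround: use None as a sentinel and build a fresh list.
    def __init__(self, items=None):
        self.items = [] if items is None else items
```

Any auto-assign syntax that keeps early binding would inherit the first behaviour unchanged.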
As Ethan wrote on this thread, there is nothing “special” happening with mutable defaults: the early binding will work the same way and doing `def __init__(self, @a=[]): pass` would be the same as doing `def __init__(self, a=[]): self.a = a`. So I believe that, apart from some very specific cases, it would be as discouraged as setting mutable variables as default in general. Late binding is probably a whole other can of worms.

Pablo
-CHB
-- Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython