[Python-ideas] Re: Auto assignment of attributes

April 30, 2022

On Sat, Apr 23, 2022, 1:11 PM Christopher Barker <pythonchb@gmail.com>
wrote:
...
On Sat, Apr 23, 2022 at 10:53 AM Pablo Alcain <pabloalcain@gmail.com>
wrote:
...
Overall, I think that not all Classes can be thought of as Dataclasses
and, even though dataclasses solutions have their merits, they probably
cannot be extended to most of the other classes.
Absolutely. However, this is not an "all Classes" question.
I don't think of dataclasses as "mutable namedtuples with defaults" at all.
Although I agree that dataclasses have definitely grown beyond this scope,
the definition of “mutable namedtuples with defaults” comes from the
original PEP (https://peps.python.org/pep-0557/#abstract). The main point
here is that there are several use cases for classes that do not
conceptually fit the “dataclass” goal.
...
But I do think they are for classes that are primarily about storing a
defined set of data.
I make heavy use of them for this, even when I am adding quite a bit of
functionality, but their core function is still to store a collection of
data. To put it less abstractly:
Dataclasses are good for classes in which the collection of fields is a
primary focus -- so the auto-generated __init__, __eq__ etc are appropriate.
It's kind of a recursive definition: dataclasses work well for those
things that data classes' auto generated methods work well for :-)
If, indeed, you need a lot of custom behavior for the __init__, and
__eq__, and ... then dataclasses are not for you.
I agree 100%. This proposal, at its core, is not related to dataclasses.
There are some cases in which dataclasses are the solution, but there are
many many times in which you will want to use just classes.
...
And the current Python class system is great for fully customized
behaviour. It's quite purposeful that parameters of the __init__ have no
special behavior, and that "self" is explicit -- it gives you full
flexibility, and everything is explicit. That's a good thing.
But, of course, the reason this proposal is on the table (and it's not the
first time by any means) is that it's a common pattern to assign (at least
some of) the __init__ parameters to instance attributes as is.
So we have two extremes -- on one hand:
A) Most __init__ params are assigned as instance attributes as is, and
these are primarily needed for __eq__ and __repr__
and on the other extreme:
B) Most __init__ params need specialized behavior, and are quite distinct
from what's needed by __eq__ and __repr__
(A) is, of course, the entire point of dataclasses, so that's covered.
(B) is well covered by the current, you-need-to-specify-everything
approach.
I don’t see B as an “extreme approach”. I think that comparing Python
classes with the specific dataclass is not helpful. The B scenario is
simply the general case for class usage. Scenario A, I agree, is a very
common one and fortunately we have dataclasses for them.
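To make the two scenarios concrete, here is a minimal sketch. The class names `Point` and `Connection` are hypothetical and chosen only for illustration; they do not come from the thread.

```python
from dataclasses import dataclass


# Scenario A: the fields are the point of the class, so the generated
# __init__, __eq__ and __repr__ are exactly what we want.
@dataclass
class Point:
    x: float
    y: float


# Scenario B: a plain class. Some parameters are stored as is, while
# others need custom handling, so __init__ is written by hand.
class Connection:
    def __init__(self, host, port, timeout=10.0, retries=3):
        self.host = host        # stored as is
        self.port = port        # stored as is
        self.timeout = timeout  # stored as is
        self._attempts_left = int(retries)  # derived, not a plain copy
```

The three "stored as is" lines in `Connection` are the pattern that the proposed syntax would shorten; the last line would still be written by hand.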
...
So the question is -- how common is it that you have code that's far
enough toward the (A) extreme as far as __init__ params being instance
attributes that we want special syntax, when we don't want most of the
__eq__ and __repr__ behaviour.
I agree that this is the main question. For what it’s worth, a quick grep
on the stdlib (it’s an overestimation) provides:

$ grep -Ie "self\.\(\w\+\) = \1" -r cpython/Lib | wc -l
2095

I did the same in two libraries that I use regularly: pandas and
scikit-learn:

$ grep -Ie "self\.\(\w\+\) = \1" -r sklearn | wc -l
1786

$ grep -Ie "self\.\(\w\+\) = \1" -r pandas | wc -l
650

That’s a total of ~4.5k lines of code (again, this is an overestimation,
but it gives us a ballpark figure).

For a better and more fine-grained analysis, Quimey wrote this small
library (https://github.com/quimeyps/analize_autoassign) that uses the
Abstract Syntax Tree to analyze a bunch of libraries and identify when the
“autoassign” could work. It shows that out of 20k analyzed classes in the
selected libraries (including black, pandas, numpy, etc), ~17% of them
could benefit from the usage of auto-assign syntax.
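For readers who want to see roughly what an AST-based check looks like, here is a small sketch. It is not the code from the library linked above; the function name `count_autoassign` and the exact matching rule (a plain `self.<name> = <name>` inside `__init__`, where `<name>` is one of its parameters) are assumptions made for illustration.

```python
import ast


def count_autoassign(source: str) -> int:
    """Count `self.<name> = <name>` statements inside __init__ methods,
    where <name> is one of that __init__'s parameters."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.FunctionDef) and node.name == "__init__"):
            continue
        all_args = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        params = {arg.arg for arg in all_args}
        for stmt in ast.walk(node):
            # Only simple single-target assignments are considered.
            if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1):
                continue
            target, value = stmt.targets[0], stmt.value
            if (
                isinstance(target, ast.Attribute)
                and isinstance(target.value, ast.Name)
                and target.value.id == "self"
                and isinstance(value, ast.Name)
                and value.id == target.attr
                and value.id in params
            ):
                count += 1
    return count


example = '''
class Connection:
    def __init__(self, host, port, retries=3):
        self.host = host
        self.port = port
        self.attempts_left = int(retries)
'''
print(count_autoassign(example))  # 2
```

Unlike the grep above, this only counts assignments inside `__init__` whose right-hand side is a parameter of that method, so it should overestimate less. It still ignores annotated assignments and tuple unpacking.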

So it looks like the isolated pattern of `self.<something> = <something>`
is used a lot. I don’t think that moving all of these cases to dataclasses
can provide a meaningful solution. When I take a look at these numbers (and
reflect on my own experience and that of my colleagues) it looks like there
is a use case for this feature. And this syntax modification looks small and
kind of clean, not adding any boilerplate. But, obviously, it requires
further discussion of whether it makes sense to add new syntax for this,
considering the maintenance that it implies.
...
In my experience, not all that much -- my code tends to be on one extreme
or the other.
But I think that's the case that needs to be made -- that there's a lot of
use cases for auto-assigning instance attributes, that also need
highly customized behaviour for other attributes and __eq__ and __repr__.
NOTE: another key question for this proposal is how you would handle
mutable defaults -- anything special, or "don't do that"?
As Ethan wrote on this thread, there is nothing “special” happening with
mutable defaults: the early binding will work the same way and doing `def
__init__(self, @a=[]): pass` would be the same as doing `def
__init__(self, a=[]): self.a = a`. So I believe that, except in some very
specific cases, it would be as discouraged as using mutable values as
defaults in general. Late binding is probably a whole other can of worms.
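As a reminder of what early binding means here, the sketch below uses today's syntax only, since the proposed `@a` form is not valid Python. The class names `Bag` and `SafeBag` are hypothetical.

```python
class Bag:
    # Today's spelling of what `def __init__(self, @items=[])` would mean
    # under the proposal: the default list is created once, when the
    # function is defined.
    def __init__(self, items=[]):
        self.items = items


first = Bag()
second = Bag()
first.items.append("spam")
print(second.items)                 # ['spam']
print(first.items is second.items)  # True


# The usual workaround: use None as a sentinel and build the list
# inside __init__, so each instance gets its own.
class SafeBag:
    def __init__(self, items=None):
        self.items = [] if items is None else items
```

Both instances of `Bag` share one list, which is the usual reason mutable defaults are discouraged. Note that the `SafeBag` workaround needs logic in the body, so it is no longer a plain "assign as is" parameter.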

Pablo
...
-CHB
--
Christopher Barker, PhD (Chris)
Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython