
On Sat, Apr 30, 2022 at 2:17 PM Pablo Alcain <pabloalcain@gmail.com> wrote:
B) Most __init__ params need specialized behavior, and are quite distinct from what's needed by __eq__ and __repr__
(B) is well covered by the current, you-need-to-specify-everything approach.
I don’t see B as an “extreme approach”.
It's not an "extreme" approach -- it's one end of a continuum.
I think that comparing python classes with the specific dataclass is not helpful. The B scenario is simply the
$ grep -Ie "self\.\(\w\+\) = \1" -r cpython/Lib | wc -l
2095
I did the same in two libraries that I use regularly: pandas and scikit-learn:
$ grep -Ie "self\.\(\w\+\) = \1" -r sklearn | wc -l
1786
$ grep -Ie "self\.\(\w\+\) = \1" -r pandas | wc -l
650
That’s a total of ~4.5k lines of code (again, this is an overestimation, but it can give us an idea of the ballpark estimate)
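To see why a count like this overestimates, here is a minimal sketch that applies the same pattern through Python's `re` module. The sample lines are hypothetical, made up for illustration, and are not taken from any of the libraries above.

```python
import re

# Python equivalent of the grep pattern "self\.\(\w\+\) = \1"
PATTERN = re.compile(r"self\.(\w+) = \1")

# Hypothetical source lines, chosen to show both hits and misses
samples = [
    "self.name = name",             # plain assignment of a same-named variable
    "self.name = name.strip()",     # extra logic on the right, still matches
    "self.x = x_scale * 2",         # different variable sharing a prefix, still matches
    "self.total = compute(total)",  # right-hand side starts differently, no match
]

# grep reports a line if the pattern occurs anywhere in it, so use search()
matches = [line for line in samples if PATTERN.search(line)]

for line in samples:
    verdict = "match" if line in matches else "no match"
    print(f"{line!r:32} -> {verdict}")
```

The pattern also has no notion of which method a line belongs to, so a match inside any other method is counted the same as one inside `__init__`.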
That, well, is pretty much useless, if I understand the regex correctly -- the fact that a class is assigning to self doesn't mean it's directly assigning the parameters with no other logic. And any number of those self assignments could be in non-__init__ methods. All that shows is that instance attributes are used. I don't think anyone is questioning that.
For a better and more fine-grained analysis, Quimey wrote this small library (https://github.com/quimeyps/analize_autoassign) that uses the Abstract Syntax Tree to analyze a bunch of libraries and identify when the “autoassign” could work.
It shows that out of 20k analyzed classes in the selected libraries
(including black, pandas, numpy, etc), ~17% of them could benefit from the usage of auto-assign syntax.
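As a rough illustration of what an AST-based check can look like, here is a minimal sketch using the standard `ast` module. It is not the code from the linked library: the function names and the sample classes are hypothetical, and it assumes that "auto-assignable" means a top-level `self.<name> = <name>` statement in `__init__` where `<name>` is one of the parameters.

```python
import ast


def autoassign_counts(source):
    """Map class name -> (auto-assignable params, total params) for its __init__."""
    results = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for item in node.body:
            if isinstance(item, ast.FunctionDef) and item.name == "__init__":
                results[node.name] = _count(item)
    return results


def _count(init):
    args = init.args
    params = [a.arg for a in args.posonlyargs + args.args + args.kwonlyargs]
    if not params:
        return (0, 0)
    # Assumption: the first parameter is the instance ("self" by convention)
    self_name, params = params[0], params[1:]
    assigned = set()
    for stmt in init.body:  # top-level statements only, no nested blocks
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1):
            continue
        target, value = stmt.targets[0], stmt.value
        if (isinstance(target, ast.Attribute)
                and isinstance(target.value, ast.Name)
                and target.value.id == self_name
                and isinstance(value, ast.Name)   # bare name, no call or expression
                and value.id == target.attr
                and value.id in params):
            assigned.add(value.id)
    return (len(assigned), len(params))


# Hypothetical classes used only to exercise the sketch
SAMPLE = '''
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Parser:
    def __init__(self, text, strict=False, *, encoding="utf-8"):
        self.text = text.strip()
        self.strict = strict
        self._codec = encoding.lower()

    def reset(self, text):
        self.text = text
'''

counts = autoassign_counts(SAMPLE)
print(counts)
```

Unlike the grep count, this only looks inside `__init__` and only accepts a bare parameter name on the right-hand side, so `self.text = text.strip()` and the assignment in `reset` are not counted.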
I only read English, and haven't studied the code, so I don't know how that works, but assuming it's accurately testing for the simple cases that auto-assigning could work for, that's not that much, actually -- for approximately every six-parameter function, one of the parameters could be auto-assigned. Or, for every six functions, one could make good use of auto-assignment (and maybe be a dataclass?).
So it looks like the isolated pattern of `self.<something> = <something>` is used a lot.
I don't think that's ever been in question. The question, as I see it, is what fraction of parameters could get auto-assigned in general -- for classes where dataclasses wouldn't make sense. And I'm not trying to be a Negative Nelly here -- I honestly don't know, and I actually expected it to be higher than 17% -- but in any case, I think it should be higher than 17% to make it worth a syntax addition. But pandas and numpy may not be the least bit representative -- maybe run it on the most popular packages on PyPI?
I thought I made this point, but it seems to have gotten lost: what I'm saying is that if, for example, a class __init__ has 6 parameters, and one of them could be auto-assigned, then yes, auto-assigning could be used, but you really haven't gained much from that -- it would not be worth the syntax change. And any class with an __init__ in which most or all parameters could be auto-assigned might be a good candidate for a dataclass. So how many are there where, say, more than half of the __init__ parameters could be auto-assigned, and where dataclasses wouldn't be helpful? A lot? Then, yes, new syntax may be warranted.
But, obviously, it entails further discussion whether it makes sense to add new syntax for this, considering the maintenance that it implies.
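To make the two ends of that range concrete, here is a small sketch. Both classes are hypothetical and invented for illustration: `Connection` has six `__init__` parameters of which only one is stored unchanged, while `Point` stores every parameter unchanged and so can already be written as a dataclass.

```python
from dataclasses import dataclass


class Connection:
    """Hypothetical class: only one of six parameters is stored unchanged."""

    def __init__(self, host, port, timeout, retries, user, verbose):
        self.address = (host.lower(), int(port))  # two parameters combined
        self.timeout = float(timeout)             # converted
        self.retries = max(0, retries)            # clamped
        self.user = user or "anonymous"           # defaulted
        self.verbose = verbose                    # the only plain self.x = x


@dataclass
class Point:
    """Hypothetical class: every parameter would be a plain self.x = x."""

    x: float
    y: float
    label: str = ""


conn = Connection("EXAMPLE.org", "80", 5, -1, None, True)
p = Point(1.0, 2.0)
print(conn.address, conn.user, conn.retries)
print(p)
```

In the first case an auto-assign syntax would save a single line; in the second, `@dataclass` already generates `__init__`, `__repr__` and `__eq__` without any new syntax.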
It's not so much the maintenance -- it's the transition burden, and the burden of yet more complexity in the language, particularly around parameters/arguments. Try to teach a newbie about arguments/parameters in Python; there is remarkable complexity there already: positional vs. keyword, *args, **kwargs, keyword-only (and all of these from both the caller and callee perspective). That's a lot of possible combinations -- believe me, it's pretty darn complex and hard to explain!
-CHB
--
Christopher Barker, PhD (Chris)
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython