Re: Should dataclass init call super?

On Mon, Apr 13, 2020 at 9:22 PM Neil Girdhar <mistersheik@gmail.com> wrote:
Cool, thanks for doing the relevant research.
For my part, I'd like to see an effort to move dataclasses forward. Now that they are in the standard library, they do need to remain pretty stable, but there's still room for extending them. But it's a bit hard when ideas and PRs are mingled in with everything else Python. Maybe a GitHub repo just for dataclasses? @Eric V. Smith <eric@trueblade.com>: what do you think? Is there a way to keep them moving forward?
I'm just going to swap dataclasses for actual classes whenever I need inheritance. It seems like a pity though.
For my part, I've gotten around it (for a different reason...) with an extra inheritance dance:

```
@dataclass
class MyClassBase:
    ...

class MyRealClass(MyClassBase, Some_other_baseclass):
    def __init__(self, *args, **kwargs):
        dc_args, dc_kwargs = some_magic_with(self.__dataclass_fields__)  # pseudocode
        MyClassBase.__init__(self, *dc_args, **dc_kwargs)
        super().__init__(*args, **kwargs)
```

and you could put that __init__ in a mixin to re-use it. Or, frankly, just give your dataclass some extra fields that are needed by the superclass you want to use. -CHB Best,
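Filled out into something runnable, that dance might look like this (a sketch only: `Other`, the field names, and the kwargs-splitting "magic" are my own stand-ins for the pseudocode above):

```python
from dataclasses import dataclass, fields

class Other:
    def __init__(self, z):
        self.z = z

@dataclass
class MyClassBase:
    a: int
    b: int

class MyRealClass(MyClassBase, Other):
    def __init__(self, **kwargs):
        # the "magic": split kwargs into dataclass fields vs. everything else
        names = {f.name for f in fields(MyClassBase)}
        dc_kwargs = {k: kwargs.pop(k) for k in list(kwargs) if k in names}
        MyClassBase.__init__(self, **dc_kwargs)
        Other.__init__(self, **kwargs)

obj = MyRealClass(a=1, b=2, z=3)
print(obj.a, obj.b, obj.z)
```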
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

For simple situations you can call super in the __post_init__ method and things will work fine:

```
class BaseClass:
    def __init__(self):
        print("class BaseClass")

@dataclass
class DataClass(BaseClass):
    def __post_init__(self):
        super().__init__()
        print("class DataClass")

class ChildClass(DataClass):
    def __init__(self):
        super().__init__()
        print("class ChildClass")
```
Note that this will break if you try to add a second dataclass to the inheritance hierarchy using the same method:

```
@dataclass
class BrokenClass(ChildClass):
    def __post_init__(self):
        super().__init__()
```
Maybe some work could be done to allow dataclasses to be smarter about calling super().__init__() inside of the __post_init__ method (so that recursion is avoided), I do not know. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler On Tue, Apr 14, 2020 at 7:37 PM Christopher Barker <pythonchb@gmail.com> wrote:
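The recursion can be demonstrated directly (a self-contained sketch reusing the class names above, with the prints trimmed): BrokenClass's generated __init__ calls its __post_init__, whose super().__init__() walks up through ChildClass into DataClass's generated __init__, which calls __post_init__ again.

```python
from dataclasses import dataclass

class BaseClass:
    def __init__(self):
        pass

@dataclass
class DataClass(BaseClass):
    def __post_init__(self):
        super().__init__()

class ChildClass(DataClass):
    def __init__(self):
        super().__init__()

@dataclass
class BrokenClass(ChildClass):
    def __post_init__(self):
        # -> ChildClass.__init__ -> DataClass.__init__ -> __post_init__ again
        super().__init__()

try:
    BrokenClass()
except RecursionError:
    print("infinite recursion, as described")
```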

On Tue, Apr 14, 2020 at 7:46 PM Ricky Teachey <ricky@teachey.org> wrote:
For simple situations you can call super in the __post_init__ method and things will work fine:
But not for the OP's case: he wanted to pass extra parameters in -- and the dataclass' __init__ won't accept extra arguments. -CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

InitVar fields for all the desired parent class init parameters can often solve the problem. But it can be painful to have to manually provide every parameter explicitly when normally (when not using a dataclass) you'd just add *args and **kwargs to the init signature and call super().__init__(*args, **kwargs). Which is what the OP is after. It becomes more painful the more parameters the parent has- parameters which the dataclass may not even care about. It not only makes the class definition long, it adds all these additional parameters to the init signature, which is icky for introspection and discoverability. Lots of "What the heck is this parameter doing here?" head scratching for future me (because I forget everything). There's currently not a very compelling solution, AFAIK, to be able to use dataclasses in these kinds of situations ("these kinds" = any situation other than the most simple) other than the solution Christopher Barker suggested: using a mixin approach that treats the dataclass parameters specially. So I just haven't. I did write a decorator of my own that replaces the dataclass init with one that calls super().__init__(*args, **kwargs) first before proceeding with the one written by dataclasses... I can't find it at the moment. But that has its own problems; one being the IDE doesn't know the init has been rewritten this way and so will complain about parameters sent to the dataclass that it doesn't know about.
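For reference, the InitVar approach looks something like this (a sketch; `Base` and the field names are invented):

```python
from dataclasses import dataclass, InitVar

class Base:
    def __init__(self, x, y):
        self.x = x
        self.y = y

@dataclass
class Child(Base):
    a: int
    # every parent parameter repeated by hand -- the painful part
    x: InitVar[int] = 0
    y: InitVar[int] = 0

    def __post_init__(self, x, y):
        # InitVar values are passed to __post_init__ but not stored as fields
        super().__init__(x, y)

c = Child(1, x=10, y=20)
print(c.a, c.x, c.y)
```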

you'd just add *args and **kwargs to the init signature and call super().__init__(*args, **kwargs).
Which is what the OP is after.
Hmm, makes me wonder if there should be an option to define a __pre_init__ method. Then you could customize the signature, but still use data classes' nifty features for the primary __init__. And no, I haven't thought this out; it would be tricky, and maybe impossible. Which brings me back to the suggestion in a PR: optionally have the __init__ accept *args, **kwargs, and then store them in self. Then users could do whatever they like with them in __post_init__. -Chris
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Wed, Apr 15, 2020 at 8:45 AM Christopher Barker <pythonchb@gmail.com> wrote:
Also note that the 'attrs' package on PyPI is still available and provides features that dataclasses do not. Generalizing something in the stdlib is not always the best/necessary solution, especially if there's a battle-tested alternative available on PyPI. -Brett

On Apr 15, 2020, at 10:16, Brett Cannon <brett@python.org> wrote:
Wasn’t dataclass designed with customization/extension hooks for apps or libraries to use, like the field metadata? Are any libs on PyPI taking advantage of that? If not, maybe this would be a good test case for that functionality. If it turns out to be easy and obvious, then as soon as someone’s got something stable and popular, it could be proposed for a merge into the stdlib—but if it turns out that there are multiple good ways to handle it they could stay as competitors on PyPI forever, while if it turns out that the extension hooks aren’t sufficient, someone could propose exactly what needs to be changed to make the extension writable.

There is a way to have dataclasses as they are now behave collaboratively with a further decorator. For the Python 3.8 lifecycle such a decorator could live in an external package - if it's good, it could go into the stdlib, or maybe another "dataclasses.collaborative_dataclass" would already create a collaborative dataclass without breaking backwards compatibility. Here is some sample code I've tested locally:

```
from functools import wraps

def colaborative(cls):
    if not hasattr(cls, "__dataclass_fields__") or "__init__" not in cls.__dict__:
        return cls
    cls._dataclass_init = cls.__init__
    super_ = cls.__mro__[1]

    @wraps(cls.__init__)
    def __init__(self, *args, **kw):
        # use inspect.signature stuff to properly sort 'args' into dataclass
        # kwargs if needed; otherwise:
        dataclass_kw = {}
        for key in list(kw):
            if key in cls.__annotations__:
                dataclass_kw[key] = kw.pop(key)
        self._dataclass_init(**dataclass_kw)
        super_.__init__(self, *args, **kw)

    cls.__init__ = __init__
    return cls
```

https://gist.github.com/jsbueno/5a207e6a2c6c433a7549c78ba2edab7d On Wed, 15 Apr 2020 at 16:40, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:

Scanning the messages, I met this from Ricky Teachey: I did write a decorator of my own that replaces the dataclass init with one
yep. My approach is the same and will fail the same way - but there is only so far IDEs can go if they keep trying to statically guess parameters to a class' __init__. Anyway, I think it is a way to go if one needs the collaborative dataclasses now. On Wed, 15 Apr 2020 at 17:45, Joao S. O. Bueno <jsbueno@python.org.br> wrote:

To handle that case, couldn’t we just add InitVarStar and InitVarStarStar fields? If present, any extra positional and/or keyword args get captured for the __post_init__ to pass along just like normal InitVars. I think that could definitely be useful, but not as useful as you seem to think it would be.
It becomes more painful the more parameters the parent has- parameters which the dataclass may not even care about. It not only makes the class definition long, it adds so these additional parameters to the init signature, which is icky for introspection and discoverability. Lots of "What the heck is this parameter doing here?" head scratching for future me (because I forget everything).
I think that’s backward. The signature is there for the user of the dataclass, not the implementer. And the user had better care about that x argument, because it’s a mandatory parameter of the X class, so if they don’t pass one, they’re going to get an exception from inside some class they never heard of. So having x show up in the signature would be helpful for introspection and discovery, not harmful. It makes your users ask “What the heck is the x parameter doing here?” but that’s a question that they’d better have an answer to or they can’t construct a Y instance. (And notice that the X doesn’t take or pass along *args, so if the Y claims to take *args as well as **kwargs, that’s even more misleading, because passing any extra positional args to the constructor will also raise.) And that’s as true for tools as for human readers—an IDE auto-completing the parameters of Y(…) should be prompting you for an x; a static analyzer should be catching that you forgot to pass an x; etc.

There are cases where you need *args and/or **kwargs in a constructor. But I don’t think they make sense for a dataclass. For example, you’re not going to write a functools.partial replacement or a generic RPC proxy object as a dataclass. But there are cases where you _want_ them, just out of… call it enlightened laziness. And those cases do seem to apply to dataclass at least as much as normal classes. And that’s why I think it could be worth having these new fields.

The OP’s toy example looks like part of a design where the X is one of a bag of mixins from some library that you compose up however you want in your app. It would be more discoverable and readable if the final composed Y class knew which parameters its mixins demanded—but it may not be worth the effort to put that together (either manually, or programmatically at class def time).
If your Y class is being exposed as part of a library (or RPC or bridge or whatever), or it forms part of the connection between two key components in an air traffic control system, then you probably do want to put in that effort. If it’s an internal class that only gets constructed in one place, in a tool for trolling each other with bad music in the office stereo that only you and three colleagues will ever run, why do the extra work to get earlier checking that you don’t need? The fact that Python leaves that kind of choice up to you to decide (because you’re the only one who knows), and so do most pythonic libraries like the one you got that X mixin out of… that’s a big part of why you wrote that script in Python in the first place. And if dataclasses get in the way of that, it’s a problem, and probably worth fixing.

I'm just curious: what is the downside of calling super with kwargs? Usually when I define a class, the first thing I write is

```
def __init__(self, **kwargs):
    super().__init__(**kwargs)
```

just in case I want to use the class in cooperative inheritance. I always thought it couldn't hurt? I might be alone in this interpretation, but I imagine that there are three fundamental kinds of inheritance patterns for methods, which I defined in my ipromise package (https://pypi.org/project/ipromise/): implementing an abstract method, overriding, and augmenting. If I had to choose, I would say that __init__ should be an "augmenting" pattern. Therefore, it seems weird for __init__ not to call super. Even if you want Y to override some behavior in X, what happens if Z inherits from Y and W? Now, Y.__init__'s decision not to call super would mean that W.__init__ would not be called. That seems like a bug. Instead, I would rather put the behavior that Y wants to override in a separate method, say X.f, which is called in X.__init__. Now, if Y.f overrides X.f, everything is okay. Even if Z inherits from Y and W, the override still works, and W.__init__ still gets called, angels sing, etc. Long story short, am I wrong to interpret __init__ as a "must augment" method and always call super().__init__(**kwargs)? On Wed, Apr 15, 2020 at 1:32 PM Andrew Barnert <abarnert@yahoo.com> wrote:
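The diamond scenario described above (Z inheriting from Y and W) can be made concrete with a small made-up hierarchy; because every __init__ augments via super(), W.__init__ still runs even though Y knows nothing about W:

```python
class X:
    def __init__(self, *, x=0, **kwargs):
        super().__init__(**kwargs)
        self.x = x

class W:
    def __init__(self, *, w=0, **kwargs):
        super().__init__(**kwargs)
        self.w = w

class Y(X):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

class Z(Y, W):
    pass

# MRO is Z -> Y -> X -> W -> object, so X's super() call reaches W
z = Z(x=1, w=2)
print(z.x, z.w)
```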
You got it.

Therefore, it seems weird for __init__ not to call super.
I was not part of it, but I know this was heavily discussed during the development of dataclasses prior to 3.7, and it was decided not to do this, at least at THAT time (not saying that can't change, but there were reasons). If Eric V Smith is about I'm sure he could provide insight, or links to the relevant conversations.

On 4/15/2020 1:52 PM, Ricky Teachey wrote:
Sorry, I've been tied up on other things. In general, it's not possible to know how to call super().__init__() if you don't a priori know the arguments it takes. That's why dataclasses doesn't guess. The only general-purpose way to do it is to use *args and **kwargs. Since a goal of dataclasses is to use good typing information that works with type checkers, that seems to defeat part of the purpose.

One thing you could do in this scenario is to use a classmethod as an "alternate constructor". This basically gives you any post- and pre-init functionality you want, using any regular parameters or *args and **kwargs as fits your case. This example shows how to do it with *args and **kwargs, but you could just as easily do it with known param names:

```
from dataclasses import dataclass

class BaseClass:
    def __init__(self, x, y):
        print('initializing base class', x, y)
        self.x = x
        self.y = y

@dataclass
class MyClass(BaseClass):
    a: int
    b: int
    c: int

    @classmethod
    def new(cls, a, b, c, *args, **kwargs):
        self = MyClass(a, b, c)
        super(cls, self).__init__(*args, **kwargs)
        return self

c = MyClass.new(1, 2, 3, 10, 20)
print(c)
print(c.x, c.y)
```

If you got really excited about it, you could probably write a decorator to do some of this boilerplate for you (not that there's a lot of it in this example). But I don't think this would get added to dataclasses itself. To Brett's point: has anyone looked to see what attrs does here? When last I looked, they didn't call any super __init__, but maybe they've added something to address this. Eric
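A small variation on the alternate-constructor idea is a classmethod that filters arbitrary input down to only the fields the dataclass declares (a sketch; `Point` and `from_mapping` are made-up names, not a dataclasses API):

```python
from dataclasses import dataclass, fields

@dataclass
class Point:
    x: int
    y: int

    @classmethod
    def from_mapping(cls, raw):
        # keep only the keys this dataclass declares; silently drop the rest
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in names})

p = Point.from_mapping({"x": 1, "y": 2, "junk": 3})
print(p)
```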

On Wed, Apr 15, 2020 at 3:36 PM Eric V. Smith <eric@trueblade.com> wrote:
I was about to make this same point, thanks. super().__init__() absolutely should NOT be called by default in an auto-generated __init__. However, dataclasses could, optionally, take *args and **kwargs, and store them in an instance attribute. Then you could call super(), or anything else, in __post_init__. And there are other reasons to want them to take arbitrary other parameters (like to ignore them, see the PR).
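That store-the-extras behavior can be prototyped today with a wrapper decorator (a sketch only; `stash_extras` and the attribute name `extra_init_kwargs` are made up for illustration, not an existing or proposed API):

```python
from dataclasses import dataclass, fields
from functools import wraps

def stash_extras(cls):
    """Stash unknown keyword args on the instance before the dataclass __init__ runs."""
    names = {f.name for f in fields(cls)}
    original = cls.__init__

    @wraps(original)
    def __init__(self, *args, **kwargs):
        extras = {k: kwargs.pop(k) for k in list(kwargs) if k not in names}
        self.extra_init_kwargs = extras  # set before __post_init__ runs
        original(self, *args, **kwargs)

    cls.__init__ = __init__
    return cls

class Base:
    def __init__(self, x):
        self.x = x

@stash_extras
@dataclass
class C(Base):
    a: int

    def __post_init__(self):
        super().__init__(**self.extra_init_kwargs)

c = C(1, x=10)
print(c.a, c.x)
```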
Since a goal of dataclasses is to use good typing information that works with type checkers, that seems to defeat part of the purpose.
I have never used a type checker :-) -- but I still really dislike the passing around of *args, **kwargs -- it makes your API completely non-self-documenting. So I'll agree here.
That's a nice way to go -- I suggested a similar way by adding another layer of subclassing. Using a classmethod is cleaner, but using another layer of subclassing allows you to keep the regular __init__ syntax -- and if you want to make these classes for others to use, the familiar API is a good thing. I do wonder if we could have a __pre_init__ -- as a way to add something like this classmethod, but as a regular __init__. But now that I think about it, here's another option: if the user passes in init=False (or maybe some other value), then the __init__ could still be generated, and accessible to the user's __init__. It might look something like this (to follow your example):

```
from dataclasses import dataclass

class BaseClass:
    def __init__(self, x, y):
        print('initializing base class', x, y)
        self.x = x
        self.y = y

@dataclass(init=False)
class MyClass(BaseClass):
    a: int
    b: int
    c: int

    def __init__(self, a, b, c, x, y):
        super().__init__(x, y)
        # _ds__init__ would be the auto-generated __init__
        self._ds__init__(a, b, c)

c = MyClass(1, 2, 3, 10, 20)
print(c)
print(c.x, c.y)
```
"Please note that attrs does not call super() *ever*." (emphasis theirs) So that's that. :-) I also did see this (from https://www.attrs.org/en/stable/init.html): "Embrace classmethods as a filter between reality and what’s best for you to work with" and: "Generally speaking, the moment you think that you need finer control over how your class is instantiated than what attrs offers, it’s usually best to use a classmethod factory..." Which seems to support your classmethod idea :-) But I still think that it would be good to have some kind of way to customise the __init__ without rewriting the whole thing, and/or have a way to keep *args, **kwargs around to use in __post_init__. Maybe a GitHub repo just for dataclasses?
@Eric V. Smith <eric@trueblade.com>: what do you think? Is there a way to
keep them moving forward?
I think it's fine to make suggestions and have discussions here.
Dataclasses aren't sufficiently large that they need their own repo (like tulip did, for example).
Sure -- and good to know you're monitoring this list (and no, I don't expect instant responses). But it's just that I've seen at least one idea (the **kwargs one) kind of wither and die, rather than be rejected (at least not obviously) -- which maybe would have happened anyway.
But I also don't think we're going to just add lots of features to dataclasses. They're meant to be lean.
As they should be.
I realize drawing the line is difficult. For example, I think asdict and astuple were mistakes, and should have been done outside of the stdlib.
I agree there -- I've found I needed to re-implement asdict anyway, as I needed something a little special. But that's only possible because dataclasses already have the infrastructure in place to do that. So I think any enhancements should be to allow third-party extensions, rather than actual new functionality. In this case, yes, folks can use an alternate constructor, but there is no way to get other arguments passed through an auto-generated __init__ -- so there are various ways that one cannot extend dataclasses. I'd like to see that additional feature, to enable various third-party extensions. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

This is a good critique of what I said! Just want to clarify, though, that I was thinking the entire time of OPTIONAL arguments -- like, for example, of the kind heavily used in pandas. There are tons of optional arguments in that library I do not care about, ever. But you are correct about non-optional arguments.

Well, the OP’s example doesn’t have an optional argument, only a non-optional one. But for optional parameters that you really do not care about, ever, you’re never going to pass them to Y, so why do you care that Y won’t pass them along to X? And for optional parameters that are meaningful and useful to Y’s users, surely having them visible in the help, etc. would make things more discoverable, not less? I think ultimately the argument you want to make really is the “enlightened laziness” one: there are lots of optional Pandas-y parameters in your superclass(es), and most of them you will definitely never care about, but a few of them you actually might occasionally care about. What then? Well, if your Y class is part of mission-critical interface code that lives depend on, you probably do need to work out which those are and get them nicely documented and statically checkable, but in a lot of cases it isn’t nearly worth that much effort—just make Y pass everything through, and the couple places you end up needing to pass down one of those Pandas-y arguments you just do so, and it works, and that’s fine.

On 4/14/2020 7:34 PM, Christopher Barker wrote:
I think it's fine to make suggestions and have discussions here. Dataclasses aren't sufficiently large that they need their own repo (like tulip did, for example). But I also don't think we're going to just add lots of features to dataclasses. They're meant to be lean. I realize drawing the line is difficult. For example, I think asdict and astuple were mistakes, and should have been done outside of the stdlib.
I think the suggestion elsewhere of InitVarArgs and InitVarKwargs might have some merit, although I think getting input from mypy or some other type checkers first would be a good idea. Eric

For simple situations you can call super in the __post_init__ method and things will work fine: class BaseClass: def __init__(self): print("class BaseClass") @dataclass class DataClass(BaseClass): def __post_init__(self): super().__init__() print("class DataClass") class ChildClass(DataClass): def __init__(self): super().__init__() print("class ChildClass")
Note that this will break if you try to add a second dataclass to the inheritance hierarchy using the same method: @dataclass class BrokenClass(ChildClass): def __post_init__(self): super().__init__()
Maybe some work could be done to allow dataclasses to be smarter about calling super().__init__() inside of the __post_init__ method (so that recursion is avoided), I do not know. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler On Tue, Apr 14, 2020 at 7:37 PM Christopher Barker <pythonchb@gmail.com> wrote:

On Tue, Apr 14, 2020 at 7:46 PM Ricky Teachey <ricky@teachey.org> wrote:
For simple situations you can call super in the __post_init__ method and things will work fine:
But not for the OP's case: he wanted to pass extra parameters in -- and the dataclass' __init__ won't accept extra arguments. -CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

InitVar fields for all the desired parent class init parameters can often solve the problem. But it can be painful to have to manually provide every parameter explicitly when normally (when not using a dataclass) you'd just add *args and **kwargs to the init signature and call super().__init__(*args, **kwargs). Which is what the OP is after. It becomes more painful the more parameters the parent has- parameters which the dataclass may not even care about. It not only makes the class definition long, it adds so these additional parameters to the init signature, which is icky for introspection and discoverability. Lots of "What the heck is this parameter doing here?" head scratching for future me (because I forget everything). There's currently not a very compelling solution, AFAIK, to be able to use dataclasses in these kinds of situations ("these kinds" = any situation other than the most simple) other than the solution Christopher Barker suggested: using a mixin approach that treats the dataclass parameters specially. So I just haven't. I did write a decorator of my own that replaces the dataclass init with one that calls super().__init__(*args, **kwargs) first before proceeding with the one written by dataclasses... I can't find it at the moment. But that has its own problems; one being the IDE doesn't know the init has been rewritten this way and so will complain about parameters sent to the dataclass that it doesn't know about.

you'd just add *args and **kwargs to the init signature and call super().__init__(*args, **kwargs).
Which is what the OP is after.
Hmm, makes me wonder if there should be an option to define a __pre_init__ method. Then you could customize the signature, but still use data classes nifty features for the primary __init__ And no, I haven’t thought this out, it would be tricky, and maybe impossible. Which brings me back to the suggestion in a PR: Optional have the __init__ accept *args, *kwargs, and then store them in self. Then users could do whatever they like with them in __post_init -Chris It becomes more painful the more parameters the parent has- parameters
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Wed, Apr 15, 2020 at 8:45 AM Christopher Barker <pythonchb@gmail.com> wrote:
Also note that the 'attr' package on PyPI is still available and provides features that dataclasses do not. Generalizing something in the stdlib is not always the best/necessary solution, especially if there's a battle-tested alternative available on PyPI. -Brett

On Apr 15, 2020, at 10:16, Brett Cannon <brett@python.org> wrote:
Wasn’t dataclass designed with customization/extension hooks for apps or libraries to use, like the field metadata? Are any libs on PyPI taking advantage of that? If not, maybe this would be a good test case for that functionality. If it turns out to be easy and obvious, then as soon as someone’s got something stable and popular, it could be proposed for a merge into the stdlib—but if it turns out that there are multiple good ways to handle it they could stay as competitors on PyPI forever, while if it turns out that the extension hooks aren’t sufficient, someone could propose exactly what needs to be changed to make the extension writable.

There is a way to have dataclasses as they are now behave collaboratively with a further decorator. For Python 3.8 livefcycle such decorator could live in an external package - if its good, it could go into the stdlib, or maybe, another "dataclasses.collaborative_dataclass" would already create a collaborative dataclass without breaking backwards compatibility. here is some sample code, I've tested locally: ``` def colaborative(cls): if not hasattr(cls, "__dataclass_fields__") or not "__init__" in cls.__dict__: return cls cls._dataclass_init = cls.__init__ super_ = cls.__mro__[1] @wraps(cls.__init__) def __init__(self, *args, **kw): # use inspect.signature stuff to proper retrieve 'args' into dataclass kw if needed, else: dataclass_kw = {} for key, value in list(kw.items()): if key in cls.__annotations__: dataclass_kw[key] = kw.pop(key) self._dataclass_init(**dataclass_kw) super_.__init__(self, *args, **kw) cls.__init__ = __init__ return cls ``` https://gist.github.com/jsbueno/5a207e6a2c6c433a7549c78ba2edab7d On Wed, 15 Apr 2020 at 16:40, Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:

Scanning the messages, I met this from Ricky Teachey: I did write a decorator of my own that replaces the dataclass init with one
yep. My approach is the same and will fail the same way - but that is only so much IDEs can go if they keep tring to statically guess parameters to a class' __init__ . Anyway, I think it is a way to go if one needs the collaborative dataclasses now. On Wed, 15 Apr 2020 at 17:45, Joao S. O. Bueno <jsbueno@python.org.br> wrote:

To handle that case, couldn’t we just add InitVarStar and InitVarStarStar fields? If present, any extra positional and/or keyword args get captured for the __postinit__ to pass along just like normal InitVars. I think that could definitely be useful, but not as useful as you seem to think it would be.
It becomes more painful the more parameters the parent has- parameters which the dataclass may not even care about. It not only makes the class definition long, it adds so these additional parameters to the init signature, which is icky for introspection and discoverability. Lots of "What the heck is this parameter doing here?" head scratching for future me (because I forget everything).
I think that’s backward. The signature is there for the user of the dataclass, not the implementer. And the user had better care about that x argument, because it’s a mandatory parameter of the X class, so if they don’t pass one, they’re going to get an exception from inside some class they never heard of. So having x show up in the signature would be helpful for introspection and discovery, not harmful. It makes your users ask “What the heck is the x parameter doing here?” but that’s a question that they’d better have an answer to or they can’t construct a Y instance. (And notice that the X doesn’t take or pass along *args, so if the Y claims to take *args as well as **kwargs, that’s even more misleading, because passing any extra positional args to the constructor will also raise.) And that’s as true for tools as for human readers—an IDE auto-completing the parameters of Y(…) should be prompting you for an x; a static analyzer should be catching that you forgot to pass as x; etc. There are cases where you need *args and/or **kwargs in a constructor. But I don’t think they make sense for a dataclsss. For example, you’re not going to write a functools.partial replacement or a generic RPC proxy object as a dataclass. But there are cases where you _want_ them, just out of… call it enlightened laziness. And those cases do seem to apply to dataclass at least as much as normal classes. And that’s why I think it could be worth having these new fields. The OP’s toy example looks like part if a design where the X is one of a bag of mixins from some library that you compose up however you want in your app. It would be more discoverable and readable if the final composed Y class knew which parameters its mixins demanded—but it may not be worth the effort to put that together (either manually, or programmatically at class def time). 
If your Y class is being exposed as part of a library (or RPC or bridge or whatever), or it forms part of the connection between two key components in an air traffic control system, then you probably do want to put in that effort. If it’s an internal class that only gets constructed in one place, in a tool for trolling each other with bad music in the office stereo that only you and three colleagues will ever run, why do the extra work to get earlier checking that you don’t need? The fact that Python leaves that kind of choice up to you to decide (because you’re the only one who knows), and so do most pythonic libraries like the one you got that X mixin out of… that’s a big part of why you wrote that script in Python in the first place. And if dataclasses get in the way of that, it’s a problem, and probably worth fixing.

I'm just curious: what is the downside of calling super with kwargs? Usually when I define a class, the first I write is def __init__(self, **kwargs): super().__init__(**kwargs) just in case I want to use the class in cooperative inheritance. I always thought it couldn't hurt? I might be alone in this interpretation, but I imagine that there are three fundamental kinds of inheritance patterns for methods, which I defined in my ipromise package (https://pypi.org/project/ipromise/): implementing an abstract method, overriding, and augmenting. If I had to choose, I would say that __init__ should be an "augmenting" pattern. Therefore, it seems weird for __init__ not to call super. Even if you want Y t to override some behavior in X, what happens if Z inherits from Y and W? Now, Y.__init__'s decision not to call super would mean that W.__init__ would not be called. That seems like a bug. Instead, I would rather put the behavior that Y wants to override in a separate method, say X.f, which is called in X.__init__. Now, if Y.f overrides X.f, everything is okay. Even if Z inherits from Y and W, the override still works, and W.__init__ still gets called, angels sing, etc. Long story short, am I wrong to interpret __init__ as a "must augment" method and always call super().__init__(**kwargs)? On Wed, Apr 15, 2020 at 1:32 PM Andrew Barnert <abarnert@yahoo.com> wrote:
You got it.

Therefore, it seems weird for __init__ not to call super.
I was not part of it, but I know this was heavily discussed during the development of dataclasses prior to 3.7, and it was decided not to do this, at least at THAT time (not saying that can't change, but there were reasons). If Eric V Smith is about I'm sure he could provide insight, or links to the relevant conversations.

On 4/15/2020 1:52 PM, Ricky Teachey wrote:
Sorry, I've been tied up on other things. In general, it's not possible to know how to call super().__init__() if you don't a priori know the arguments it takes. That's why dataclasses doesn't guess. The only general-purpose way to do it is to use *args and **kwargs. Since a goal of dataclasses is to use good typing information that works with type checkers, that seems to defeat part of the purpose.

One thing you could do in this scenario is to use a classmethod as an "alternate constructor". This basically gives you any post- and pre-init functionality you want, using any regular parameters or *args and **kwargs as fits your case. This example shows how to do it with *args and **kwargs, but you could just as easily do it with known param names:

    from dataclasses import dataclass

    class BaseClass:
        def __init__(self, x, y):
            print('initializing base class', x, y)
            self.x = x
            self.y = y

    @dataclass
    class MyClass(BaseClass):
        a: int
        b: int
        c: int

        @classmethod
        def new(cls, a, b, c, *args, **kwargs):
            self = MyClass(a, b, c)
            super(cls, self).__init__(*args, **kwargs)
            return self

    c = MyClass.new(1, 2, 3, 10, 20)
    print(c)
    print(c.x, c.y)

If you got really excited about it, you could probably write a decorator to do some of this boilerplate for you (not that there's a lot of it in this example). But I don't think this would get added to dataclasses itself.

To Brett's point: has anyone looked to see what attrs does here? When last I looked, they didn't call any super __init__, but maybe they've added something to address this.

Eric

On Wed, Apr 15, 2020 at 3:36 PM Eric V. Smith <eric@trueblade.com> wrote:
I was about to make this same point, thanks. super().__init__() absolutely should NOT be called by default in an auto-generated __init__. However, dataclasses could, optionally, take *args and **kwargs, and store them in an instance attribute. Then you could call super(), or anything else, in __post_init__. And there are other reasons to want them to take arbitrary other parameters (like to ignore them, see the PR).
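[Storing *args/**kwargs isn't supported today, but InitVar already gets part of the way there: the generated __init__ accepts the extra parameters and hands them to __post_init__ without making them fields, so they can be forwarded to the base class. A sketch of that workaround (not the proposed feature), with made-up names:]

```python
from dataclasses import dataclass, InitVar

class Base:
    def __init__(self, x, y):
        self.x, self.y = x, y

@dataclass
class Data(Base):
    a: int
    b: int
    # InitVar parameters appear in the generated __init__ but are
    # passed to __post_init__ instead of being stored as fields.
    x: InitVar[int] = 0
    y: InitVar[int] = 0

    def __post_init__(self, x, y):
        super().__init__(x, y)

d = Data(1, 2, x=10, y=20)
print(d.a, d.b, d.x, d.y)  # → 1 2 10 20
```

The limitation is that you must spell out the base class's parameters by hand; there is still no way to pass through arbitrary unknown arguments.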
Since a goal of dataclasses is to use good typing information that works with type checkers, that seems to defeat part of the purpose.
I have never used a type checker :-) -- but I still really dislike the passing around of *args, **kwargs -- it makes your API completely non-self-documenting. So I'll agree here.
That's a nice way to go -- I suggested a similar way by adding another layer of subclassing. Using a classmethod is cleaner, but using another layer of subclassing allows you to keep the regular __init__ syntax -- and if you want to make these classes for others to use, the familiar API is a good thing. I do wonder if we could have a __pre_init__ -- as a way to add something like this classmethod, but as a regular __init__.

But now that I think about it, here's another option: if the user passes in init=False (or maybe some other value), then the __init__ could still be generated, and accessible to the user's __init__. It might look something like this (to follow your example):

    from dataclasses import dataclass

    class BaseClass:
        def __init__(self, x, y):
            print('initializing base class', x, y)
            self.x = x
            self.y = y

    @dataclass(init=False)
    class MyClass(BaseClass):
        a: int
        b: int
        c: int

        def __init__(self, a, b, c, x, y):
            super().__init__(x, y)
            # _ds__init__ would be the auto-generated __init__
            self._ds__init__(a, b, c)

    c = MyClass(1, 2, 3, 10, 20)
    print(c)
    print(c.x, c.y)
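[The _ds__init__ hook in that example is hypothetical, but the same shape can be approximated today: with init=False you write your own __init__, and dataclasses.fields() lets you assign the declared fields by hand. A sketch under that assumption, not an official API:]

```python
from dataclasses import dataclass, fields

class BaseClass:
    def __init__(self, x, y):
        self.x = x
        self.y = y

@dataclass(init=False)
class MyClass(BaseClass):
    a: int
    b: int
    c: int

    def __init__(self, a, b, c, x, y):
        super().__init__(x, y)
        # Hand-rolled stand-in for the hypothetical auto-generated
        # _ds__init__: assign each declared field in order.
        for f, value in zip(fields(self), (a, b, c)):
            setattr(self, f.name, value)

obj = MyClass(1, 2, 3, 10, 20)
print(obj)           # repr is still generated: MyClass(a=1, b=2, c=3)
print(obj.x, obj.y)  # → 10 20
```

This keeps the familiar positional __init__ API, at the cost of re-doing the field assignments that the generated __init__ would otherwise have handled.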
"Please note that attrs does not call super() *ever*." (emphasis theirs) So that's that. :-)

I also did see this (from https://www.attrs.org/en/stable/init.html): "Embrace classmethods as a filter between reality and what’s best for you to work with" and: "Generally speaking, the moment you think that you need finer control over how your class is instantiated than what attrs offers, it’s usually best to use a classmethod factory..." Which seems to support your classmethod idea :-)

But I still think that it would be good to have some kind of way to customise the __init__ without rewriting the whole thing, and/or have a way to keep *args, **kwargs around to use in __post_init__

Maybe a gitHub repo just for dataclasses?
@Eric V. Smith <eric@trueblade.com>: what do you think? Is there a way to
keep them moving forward?
I think it's fine to make suggestions and have discussions here.
Dataclasses aren't sufficiently large that they need their own repo (like tulip did, for example).
Sure -- and good to know you're monitoring this list (and no, I don't expect instant responses). But it's just that I've seen at least one idea (the **kwargs one) kind of wither and die, rather than be rejected (at least not obviously) -- which maybe would have happened anyway.
But I also don't think we're going to just add lots of features to dataclasses. They're meant to be lean.
As they should be.
I realize drawing the line is difficult. For example, I think asdict and astuple were mistakes, and should have been done outside of the stdlib.
I agree there -- I've found I needed to re-implement asdict anyway, as I needed something a little special. But that's only possible because dataclasses already have the infrastructure in place to do that. So I think any enhancements should be to allow third-party extensions, rather than actual new functionality. In this case, yes, folks can use an alternate constructor, but there is no way to get other arguments passed through an auto-generated __init__ -- so there are various ways that one cannot extend dataclasses -- I'd like to see that additional feature, to enable various third-party extensions.

-CHB

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
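[Re-implementing asdict for a special case is exactly the kind of third-party extension that dataclasses' introspection already supports. A minimal sketch built on dataclasses.fields() -- the function name and the "skip private fields" policy are made up for illustration:]

```python
from dataclasses import dataclass, fields

@dataclass
class Point:
    x: int
    y: int
    _cache: int = 0  # pretend this is internal state we don't want exported

def asdict_public(obj):
    # A custom, non-recursive asdict that skips "private" fields --
    # the sort of special-casing the stdlib version doesn't offer.
    return {f.name: getattr(obj, f.name)
            for f in fields(obj) if not f.name.startswith('_')}

print(asdict_public(Point(1, 2)))  # → {'x': 1, 'y': 2}
```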

This is a good critique of what I said! Just want to clarify, though, that I was thinking the entire time of OPTIONAL arguments -- like, for example, of the kind heavily used in pandas. There are tons of optional arguments in that library I do not care about, ever. But you are correct about non-optional arguments.

Well, the OP’s example doesn’t have an optional argument, only a non-optional one. But for optional parameters that you really do not care about, ever, you’re never going to pass them to Y, so why do you care that Y won’t pass them along to X? And for optional parameters that are meaningful and useful to Y’s users, surely having them visible in the help, etc. would make things more discoverable, not less?

I think ultimately the argument you want to make really is the “enlightened laziness” one: there are lots of optional Pandas-y parameters in your superclass(es), and most of them you will definitely never care about, but a few of them you actually might occasionally care about. What then? Well, if your Y class is part of mission-critical interface code that lives depend on, you probably do need to work out which those are and get them nicely documented and statically checkable, but in a lot of cases it isn’t nearly worth that much effort—just make Y pass everything through, and the couple places you end up needing to pass down one of those Pandas-y arguments you just do so, and it works, and that’s fine.

On 4/14/2020 7:34 PM, Christopher Barker wrote:
I think it's fine to make suggestions and have discussions here. Dataclasses aren't sufficiently large that they need their own repo (like tulip did, for example). But I also don't think we're going to just add lots of features to dataclasses. They're meant to be lean. I realize drawing the line is difficult. For example, I think asdict and astuple were mistakes, and should have been done outside of the stdlib.
I think the suggestion elsewhere of InitVarArgs and InitVarKwargs might have some merit, although I think getting input from mypy or some other type checkers first would be a good idea. Eric
participants (8)
- Andrew Barnert
- Brett Cannon
- Christopher Barker
- Eric V. Smith
- Joao S. O. Bueno
- Neil Girdhar
- Ricky Teachey
- Wes Turner