Second post: PEP 557, Data Classes
The updated version should show up at https://www.python.org/dev/peps/pep-0557/ shortly. The major changes from the previous version are: - Add InitVar to specify initialize-only fields. - Renamed __dataclass_post_init__() to __post_init(). - Rename cmp to compare. - Added eq, separate from compare, so you can test unorderable items for equality. - Flushed out asdict() and astuple(). - Changed replace() to just call __init__(), and dropped the complex post-create logic. The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51 - Should the replace() function be renamed to something else? https://github.com/ericvsmith/dataclasses/issues/77 Most of the items that were previously discussed on python-dev were discussed in detail at https://github.com/ericvsmith/dataclasses. Before rehashing an old discussion, please check there first. Also at https://github.com/ericvsmith/dataclasses is an implementation, with tests, that should work with 3.6 and 3.7. The only action item for the code is to clean up the implementation of InitVar, but that's waiting for PEP 560. Oh, and if PEP 563 is accepted I'll also need to do some work. Feedback is welcomed! Eric.
One more change: - Per-field metadata, for use by third parties. Also, thanks to Guido and Ivan for all of their feedback on the various issues that got the PEP to this point. Eric. On 11/25/2017 4:06 PM, Eric V. Smith wrote:
The updated version should show up at https://www.python.org/dev/peps/pep-0557/ shortly.
The major changes from the previous version are:
- Add InitVar to specify initialize-only fields. - Renamed __dataclass_post_init__() to __post_init(). - Rename cmp to compare. - Added eq, separate from compare, so you can test unorderable items for equality. - Flushed out asdict() and astuple(). - Changed replace() to just call __init__(), and dropped the complex post-create logic.
The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51 - Should the replace() function be renamed to something else? https://github.com/ericvsmith/dataclasses/issues/77
Most of the items that were previously discussed on python-dev were discussed in detail at https://github.com/ericvsmith/dataclasses. Before rehashing an old discussion, please check there first.
Also at https://github.com/ericvsmith/dataclasses is an implementation, with tests, that should work with 3.6 and 3.7. The only action item for the code is to clean up the implementation of InitVar, but that's waiting for PEP 560. Oh, and if PEP 563 is accepted I'll also need to do some work.
Feedback is welcomed!
Eric. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40tru...
Hi Eric, Really excited about this PEP, thanks for working on it. A couple minor questions:
If compare is True, then eq is ignored, and __eq__ and __ne__ will be automatically generated.
IMO it's generally preferable to make nonsensical parameter combinations an immediate error, rather than silently ignore one of them. Is there a strong reason for letting nonsense pass silently here? (I reviewed the previous thread; there was a lot of discussion about enums/flags vs two boolean params, but I didn't see explicit discussion of this issue; the only passing references I noticed said the invalid combo should be "disallowed", e.g. Guido in [1], which to me implies "an error.")
isdataclass(instance): Returns True if instance is an instance of a Data Class, otherwise returns False.
Something smells wrong with the naming here. If I have @dataclass class Person: name: str I think it would be considered obvious and undeniable (in English prose, anyway) that Person is a dataclass. So it seems wrong to have `isdataclass(Person)` return `False`. Is there a reason not to let it handle either a class or an instance (looks like it would actually simplify the implementation)? Carl [1] https://mail.python.org/pipermail/python-dev/2017-September/149505.html On 11/25/2017 01:06 PM, Eric V. Smith wrote:
The updated version should show up at https://www.python.org/dev/peps/pep-0557/ shortly.
The major changes from the previous version are:
- Add InitVar to specify initialize-only fields. - Renamed __dataclass_post_init__() to __post_init(). - Rename cmp to compare. - Added eq, separate from compare, so you can test unorderable items for equality. - Flushed out asdict() and astuple(). - Changed replace() to just call __init__(), and dropped the complex post-create logic.
The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51 - Should the replace() function be renamed to something else? https://github.com/ericvsmith/dataclasses/issues/77
Most of the items that were previously discussed on python-dev were discussed in detail at https://github.com/ericvsmith/dataclasses. Before rehashing an old discussion, please check there first.
Also at https://github.com/ericvsmith/dataclasses is an implementation, with tests, that should work with 3.6 and 3.7. The only action item for the code is to clean up the implementation of InitVar, but that's waiting for PEP 560. Oh, and if PEP 563 is accepted I'll also need to do some work.
Feedback is welcomed!
Eric. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/carl%40oddbird.net
On 11/26/2017 12:23 AM, Carl Meyer wrote:
Hi Eric,
Really excited about this PEP, thanks for working on it.
Thanks, Carl.
A couple minor questions:
If compare is True, then eq is ignored, and __eq__ and __ne__ will be automatically generated.
IMO it's generally preferable to make nonsensical parameter combinations an immediate error, rather than silently ignore one of them. Is there a strong reason for letting nonsense pass silently here?
(I reviewed the previous thread; there was a lot of discussion about enums/flags vs two boolean params, but I didn't see explicit discussion of this issue; the only passing references I noticed said the invalid combo should be "disallowed", e.g. Guido in [1], which to me implies "an error.")
I think you're right here. I'll change it to a ValueError.
isdataclass(instance): Returns True if instance is an instance of a Data Class, otherwise returns False.
Something smells wrong with the naming here. If I have
@dataclass class Person: name: str
I think it would be considered obvious and undeniable (in English prose, anyway) that Person is a dataclass. So it seems wrong to have `isdataclass(Person)` return `False`. Is there a reason not to let it handle either a class or an instance (looks like it would actually simplify the implementation)?
I think of this as really "isdataclassinstance". Let's see what others think. There are places in dataclasses.py where I need to know both ways: for example fields() works with either a class or instance, but asdict() just an instance. For what it's worth, the equivalent attrs API, attr.has(), returns True for both an instance and a class. And the recommended solution for a namedtuple (check for existence of _fields) would also work for an instance and a class. And I suppose it's easy enough for the caller to further disallow a class, if that's what they want. Eric.
Carl
[1] https://mail.python.org/pipermail/python-dev/2017-September/149505.html
On 11/25/2017 01:06 PM, Eric V. Smith wrote:
The updated version should show up at https://www.python.org/dev/peps/pep-0557/ shortly.
The major changes from the previous version are:
- Add InitVar to specify initialize-only fields. - Renamed __dataclass_post_init__() to __post_init(). - Rename cmp to compare. - Added eq, separate from compare, so you can test unorderable items for equality. - Flushed out asdict() and astuple(). - Changed replace() to just call __init__(), and dropped the complex post-create logic.
The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51 - Should the replace() function be renamed to something else? https://github.com/ericvsmith/dataclasses/issues/77
Most of the items that were previously discussed on python-dev were discussed in detail at https://github.com/ericvsmith/dataclasses. Before rehashing an old discussion, please check there first.
Also at https://github.com/ericvsmith/dataclasses is an implementation, with tests, that should work with 3.6 and 3.7. The only action item for the code is to clean up the implementation of InitVar, but that's waiting for PEP 560. Oh, and if PEP 563 is accepted I'll also need to do some work.
Feedback is welcomed!
Eric. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/carl%40oddbird.net
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40tru...
On 11/26/2017 3:35 AM, Eric V. Smith wrote:
On 11/26/2017 12:23 AM, Carl Meyer wrote:
A couple minor questions:
If compare is True, then eq is ignored, and __eq__ and __ne__ will be automatically generated.
IMO it's generally preferable to make nonsensical parameter combinations an immediate error, rather than silently ignore one of them. Is there a strong reason for letting nonsense pass silently here?
(I reviewed the previous thread; there was a lot of discussion about enums/flags vs two boolean params, but I didn't see explicit discussion of this issue; the only passing references I noticed said the invalid combo should be "disallowed", e.g. Guido in [1], which to me implies "an error.")
I think you're right here. I'll change it to a ValueError.
While creating an issue for this (https://github.com/ericvsmith/dataclasses/issues/88), it occurs to me that the class-level parameter really should be "order" or "orderable", not "compare". It made more sense when it was called "cmp", but "compare" now seems wrong. Because "eq" says "can I compare two instances", and what's currently called "compare" is "can I order two instances". Nick had a similar suggestion before the PEP was written (https://mail.python.org/pipermail/python-ideas/2017-May/045740.html). The field-level parameter should stay "compare", because it's used for both __gt__ and friends, as well as __eq__ and __ne__. It's saying "is this field used in all of the comparison methods". Eric.
On 11/26/2017 3:48 AM, Eric V. Smith wrote:
While creating an issue for this (https://github.com/ericvsmith/dataclasses/issues/88), it occurs to me that the class-level parameter really should be "order" or "orderable", not "compare". It made more sense when it was called "cmp", but "compare" now seems wrong.
Because "eq" says "can I compare two instances", and what's currently called "compare" is "can I order two instances". Nick had a similar suggestion before the PEP was written (https://mail.python.org/pipermail/python-ideas/2017-May/045740.html).
The field-level parameter should stay "compare", because it's used for both __gt__ and friends, as well as __eq__ and __ne__. It's saying "is this field used in all of the comparison methods".
I created https://github.com/ericvsmith/dataclasses/issues/90 for this. I think I'll leave 'eq' alone, and change 'compare' to 'order', for the class-level parameter name. Eric.
On Sat, Nov 25, 2017, 14:00 Eric V. Smith,
The updated version should show up at https://www.python.org/dev/peps/pep-0557/ shortly.
The major changes from the previous version are:
- Add InitVar to specify initialize-only fields. - Renamed __dataclass_post_init__() to __post_init(). - Rename cmp to compare. - Added eq, separate from compare, so you can test unorderable items for equality. - Flushed out asdict() and astuple(). - Changed replace() to just call __init__(), and dropped the complex post-create logic.
It looks great and I'm excited to get to start using this PEP!
The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51
I say don't require the type comparison for duck typing purposes. -Brett
- Should the replace() function be renamed to something else? https://github.com/ericvsmith/dataclasses/issues/77
Most of the items that were previously discussed on python-dev were discussed in detail at https://github.com/ericvsmith/dataclasses. Before rehashing an old discussion, please check there first.
Also at https://github.com/ericvsmith/dataclasses is an implementation, with tests, that should work with 3.6 and 3.7. The only action item for the code is to clean up the implementation of InitVar, but that's waiting for PEP 560. Oh, and if PEP 563 is accepted I'll also need to do some work.
Feedback is welcomed!
Eric. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/brett%40python.org
On 11/26/2017 1:04 PM, Brett Cannon wrote:
The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51
I say don't require the type comparison for duck typing purposes.
The problem with that is that you end up with cases like this, which I don't think we want: @dataclass class Point: x: int y: int @dataclass class Point3d: x: int y: int z: int assert Point(1, 2) == Point3d(1, 2, 3) Eric.
On 27 November 2017 at 06:22, Eric V. Smith
On 11/26/2017 1:04 PM, Brett Cannon wrote:
The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51
I say don't require the type comparison for duck typing purposes.
The problem with that is that you end up with cases like this, which I don't think we want:
@dataclass class Point: x: int y: int
@dataclass class Point3d: x: int y: int z: int
assert Point(1, 2) == Point3d(1, 2, 3)
Perhaps the check could be: (type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match) That way the exact type check would be an optimisation to speed up the common case, while the formal semantic constraint would be that the field definitions have to match (including their names and order). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan wrote:
Perhaps the check could be:
(type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match)
I think the types should *always* have to match, or at least one should be a subclass of the other. Consider: @dataclass class Point3d: x: float y: float z: float @dataclass class Vector3d: x: float y: float z: float Points and vectors are different things, and they should never compare equal, even if they have the same field names and values. -- Greg
On 27 November 2017 at 15:04, Greg Ewing
Nick Coghlan wrote:
Perhaps the check could be:
(type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match)
I think the types should *always* have to match, or at least one should be a subclass of the other. Consider:
@dataclass class Point3d: x: float y: float z: float
@dataclass class Vector3d: x: float y: float z: float
Points and vectors are different things, and they should never compare equal, even if they have the same field names and values.
And I guess if folks actually want more permissive structure-based matching, that's one of the features that collections.namedtuple offers that data classes don't. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 11/27/2017 1:04 AM, Nick Coghlan wrote:
On 27 November 2017 at 15:04, Greg Ewing
wrote: Nick Coghlan wrote:
Perhaps the check could be:
(type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match)
I think the types should *always* have to match, or at least one should be a subclass of the other. Consider:
@dataclass class Point3d: x: float y: float z: float
@dataclass class Vector3d: x: float y: float z: float
Points and vectors are different things, and they should never compare equal, even if they have the same field names and values.
And I guess if folks actually want more permissive structure-based matching, that's one of the features that collections.namedtuple offers that data classes don't.
And in this case you could also do: astuple(point) == astuple(vector) Eric.
Following up on this subthread (inline below).
On Mon, Nov 27, 2017 at 2:56 AM, Eric V. Smith
On 11/27/2017 1:04 AM, Nick Coghlan wrote:
On 27 November 2017 at 15:04, Greg Ewing
wrote: Nick Coghlan wrote:
Perhaps the check could be:
(type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match)
I think the types should *always* have to match, or at least one should be a subclass of the other. Consider:
@dataclass class Point3d: x: float y: float z: float
@dataclass class Vector3d: x: float y: float z: float
Points and vectors are different things, and they should never compare equal, even if they have the same field names and values.
And I guess if folks actually want more permissive structure-based matching, that's one of the features that collections.namedtuple offers that data classes don't.
And in this case you could also do: astuple(point) == astuple(vector)
Didn't we at one point have something like isinstance(other, self.__class__) and fields(other) == fields(self) and <all individual fields match> (plus some optimization if the types are identical)? That feels ideal, because it means you can subclass Point just to add some methods and it will stay comparable, but if you add fields it will always be unequal. -- --Guido van Rossum (python.org/~guido)
On 11/27/17 10:51 AM, Guido van Rossum wrote:
Following up on this subthread (inline below).
On Mon, Nov 27, 2017 at 2:56 AM, Eric V. Smith
mailto:eric@trueblade.com> wrote: On 11/27/2017 1:04 AM, Nick Coghlan wrote:
On 27 November 2017 at 15:04, Greg Ewing
mailto:greg.ewing@canterbury.ac.nz> wrote: Nick Coghlan wrote:
Perhaps the check could be:
(type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match)
I think the types should *always* have to match, or at least one should be a subclass of the other. Consider:
@dataclass class Point3d: x: float y: float z: float
@dataclass class Vector3d: x: float y: float z: float
Points and vectors are different things, and they should never compare equal, even if they have the same field names and values.
And I guess if folks actually want more permissive structure-based matching, that's one of the features that collections.namedtuple offers that data classes don't.
And in this case you could also do: astuple(point) == astuple(vector)
Didn't we at one point have something like
isinstance(other, self.__class__) and fields(other) == fields(self) and <all individual fields match>
(plus some optimization if the types are identical)?
That feels ideal, because it means you can subclass Point just to add some methods and it will stay comparable, but if you add fields it will always be unequal.
I don't think we had that before, but it sounds right to me. I think it could be: isinstance(other, self.__class__) and len(fields(other)) == len(fields(self)) and <all individual fields match> Since by definition if you're a subclass you'll start with all of the same fields. So if the len's match, you won't have added any new fields. That should be sufficiently cheap. Then the optimized version would be: (self.__class__ is other.__class__) or (isinstance(other, self.__class__) and len(fields(other)) == len(fields(self))) and <all individual fields match> I'd probably further optimize len(fields(obj)), but that's the general idea. Eric.
Sounds good.
On Nov 27, 2017 8:00 AM, "Eric V. Smith"
On 11/27/17 10:51 AM, Guido van Rossum wrote:
Following up on this subthread (inline below).
On Mon, Nov 27, 2017 at 2:56 AM, Eric V. Smith
mailto:eric@trueblade.com> wrote: On 11/27/2017 1:04 AM, Nick Coghlan wrote:
On 27 November 2017 at 15:04, Greg Ewing
mailto:greg.ewing@canterbury.ac.nz> wrote: Nick Coghlan wrote:
Perhaps the check could be:
(type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match)
I think the types should *always* have to match, or at least one should be a subclass of the other. Consider:
@dataclass class Point3d: x: float y: float z: float
@dataclass class Vector3d: x: float y: float z: float
Points and vectors are different things, and they should never compare equal, even if they have the same field names and values.
And I guess if folks actually want more permissive structure-based matching, that's one of the features that collections.namedtuple offers that data classes don't.
And in this case you could also do: astuple(point) == astuple(vector)
Didn't we at one point have something like
isinstance(other, self.__class__) and fields(other) == fields(self) and <all individual fields match>
(plus some optimization if the types are identical)?
That feels ideal, because it means you can subclass Point just to add some methods and it will stay comparable, but if you add fields it will always be unequal.
I don't think we had that before, but it sounds right to me. I think it could be:
isinstance(other, self.__class__) and len(fields(other)) == len(fields(self)) and <all individual fields match>
Since by definition if you're a subclass you'll start with all of the same fields. So if the len's match, you won't have added any new fields. That should be sufficiently cheap.
Then the optimized version would be:
(self.__class__ is other.__class__) or (isinstance(other, self.__class__) and len(fields(other)) == len(fields(self))) and <all individual fields match>
I'd probably further optimize len(fields(obj)), but that's the general idea.
Eric.
On 11/27/2017 10:51 AM, Guido van Rossum wrote:
Following up on this subthread (inline below).
Didn't we at one point have something like
isinstance(other, self.__class__) and fields(other) == fields(self) and <all individual fields match>
(plus some optimization if the types are identical)?
That feels ideal, because it means you can subclass Point just to add some methods and it will stay comparable, but if you add fields it will always be unequal.
One thing this doesn't let you do is compare instances of two different subclasses of a base type: @dataclass class B: i: int @dataclass class C1(B): pass @dataclass class C2(B): pass You can't compare C1(0) and C2(0), because neither one is an instance of the other's type. The test to get this case to work would be expensive: find the common ancestor, and then make sure no fields have been added since then. And I haven't thought through multiple inheritance. I suggest we don't try to support this case. Eric.
On 28 November 2017 at 17:41, Eric V. Smith
One thing this doesn't let you do is compare instances of two different subclasses of a base type:
@dataclass class B: i: int
@dataclass class C1(B): pass
@dataclass class C2(B): pass
You can't compare C1(0) and C2(0), because neither one is an instance of the other's type. The test to get this case to work would be expensive: find the common ancestor, and then make sure no fields have been added since then. And I haven't thought through multiple inheritance.
I suggest we don't try to support this case.
That gets you onto problematic ground as far as transitivity is concerned, since you'd end up with the following: >>> b = B(0); c1 = C1(0); c2 = C2(0) >>> c1 == b True >>> b == c2 True >>> c1 == c2 False However, I think you can fix this by injecting the first base in the MRO that defines a data field as a "__field_layout__" class attribute, and then have the comparison methods check for "other.__field_layout__ is self.__field_layout__", rather than checking the runtime class directly. So in the above example, you would have:
B.__field_layout__ is B True C1.__field_layout__ is B True C2.__field_layout__ is B True
It would then be up to the dataclass decorator to set `__field_layout__` correctly, using the follow rules: 1. Use the just-defined class if the class defines any fields 2. Use the just-defined class if it inherits from multiple base classes that define fields and don't already share an MRO 3. Use a base class if that's either the only base class that defines fields, or if all other base classes that define fields are already in the MRO of that base class Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 11/28/17 7:02 AM, Nick Coghlan wrote:
On 28 November 2017 at 17:41, Eric V. Smith
wrote: One thing this doesn't let you do is compare instances of two different subclasses of a base type:
@dataclass class B: i: int
@dataclass class C1(B): pass
@dataclass class C2(B): pass
You can't compare C1(0) and C2(0), because neither one is an instance of the other's type. The test to get this case to work would be expensive: find the common ancestor, and then make sure no fields have been added since then. And I haven't thought through multiple inheritance.
I suggest we don't try to support this case.
That gets you onto problematic ground as far as transitivity is concerned, since you'd end up with the following:
Excellent point, thanks for raising it.
>>> b = B(0); c1 = C1(0); c2 = C2(0) >>> c1 == b True >>> b == c2 True >>> c1 == c2 False
However, I think you can fix this by injecting the first base in the MRO that defines a data field as a "__field_layout__" class attribute, and then have the comparison methods check for "other.__field_layout__ is self.__field_layout__", rather than checking the runtime class directly.
So in the above example, you would have:
B.__field_layout__ is B True C1.__field_layout__ is B True C2.__field_layout__ is B True
It would then be up to the dataclass decorator to set `__field_layout__` correctly, using the follow rules:
1. Use the just-defined class if the class defines any fields 2. Use the just-defined class if it inherits from multiple base classes that define fields and don't already share an MRO 3. Use a base class if that's either the only base class that defines fields, or if all other base classes that define fields are already in the MRO of that base class
That seems like a lot of complication for a feature that will be rarely used. I'll give it some thought, especially the MI logic. I think what you're laying out is an optimization for "do the classes have identical fields, inherited through a common base class or classes", right? Eric.
I would also be happy with a retreat, where we define `__eq__` to insist that the classes are the same, and people can override this to their hearts' content.
On 11/28/2017 1:57 PM, Guido van Rossum wrote:
I would also be happy with a retreat, where we define `__eq__` to insist that the classes are the same, and people can override this to their hearts' content.
I agree. And I guess we could always add it later, if there's a huge demand and someone writes a decent implementation. There would be a slight backwards incompatibility, though. Frankly, I think it would never be needed. One question remains: do we do what attrs does: for `__eq__` and `__ne__` use an exact type match, and for the 4 ordered comparison operators use an isinstance check? On the comparison operators, they also ignore attributes defined on any derived class [0]. As I said, I couldn't find an attrs issue that discusses their choice. I'll ask Hynek over on the dataclasses github issue. Currently the dataclasses code on master uses an exact type match for all 6 methods. Eric. [0] That is, they do the following (using dataclasses terms): Given: @dataclass class B: i: int j: int @dataclass class C: k: int Then B.__eq__ is: def __eq__(self, other): if other.__class__ is self.__class__: return (other.i, other.j) == (self.i, self.j) return NotImplemented And B.__lt__ is: def __lt__(self, other): if isinstance(other, self.__class__): return (other.i, other.j) < (self.i, self.j) return NotImplemented So if you do: b = B(1, 2) c = C(1, 2, 3) Then `B(1, 2) < C(1, 2, 3)` ignores `c.k`.
On Tue, Nov 28, 2017 at 12:56 PM, Eric V. Smith
On 11/28/2017 1:57 PM, Guido van Rossum wrote:
I would also be happy with a retreat, where we define `__eq__` to insist that the classes are the same, and people can override this to their hearts' content.
I agree. And I guess we could always add it later, if there's a huge demand and someone writes a decent implementation. There would be a slight backwards incompatibility, though. Frankly, I think it would never be needed.
One question remains: do we do what attrs does: for `__eq__` and `__ne__` use an exact type match, and for the 4 ordered comparison operators use an isinstance check? On the comparison operators, they also ignore attributes defined on any derived class [0]. As I said, I couldn't find an attrs issue that discusses their choice. I'll ask Hynek over on the dataclasses github issue.
Currently the dataclasses code on master uses an exact type match for all 6 methods.
Eric.
[0] That is, they do the following (using dataclasses terms):
Given:
@dataclass class B: i: int j: int
@dataclass class C: k: int
Then B.__eq__ is:
def __eq__(self, other): if other.__class__ is self.__class__: return (other.i, other.j) == (self.i, self.j) return NotImplemented
And B.__lt__ is:
def __lt__(self, other): if isinstance(other, self.__class__): return (other.i, other.j) < (self.i, self.j) return NotImplemented
So if you do: b = B(1, 2) c = C(1, 2, 3)
Then `B(1, 2) < C(1, 2, 3)` ignores `c.k`.
Hm. Maybe for the ordering comparisons we could defer to the class with the longest list of fields, as long as there's a subtype relationship? That way b<c and c>b would be equivalent, and both would use C.__gt__. Which had better not reject this on the basis that other is not an instance of a subclass of C. IIRC there's already something in the interpreter that tries the most derived class first for binary operators -- that may force our hand here. -- --Guido van Rossum (python.org/~guido)
On 11/28/2017 4:14 PM, Guido van Rossum wrote:
Hm. Maybe for the ordering comparisons we could defer to the class with the longest list of fields, as long as there's a subtype relationship? That way b<c and c>b would be equivalent, and both would use C.__gt__. Which had better not reject this on the basis that other is not an instance of a subclass of C.
IIRC there's already something in the interpreter that tries the most derived class first for binary operators -- that may force our hand here.
I'm leaning toward doing the same thing attrs does. They have much more experience with this. This is my last open specification issue. I think I'm ready for a pronouncement on the PEP once I do one final editing pass, in hopes of getting this in 3.7.0a3 by next weekend. If the comparisons need to change, I'm okay with doing that in the beta. Eric.
On 11/28/2017 8:31 PM, Eric V. Smith wrote:
On 11/28/2017 4:14 PM, Guido van Rossum wrote:
Hm. Maybe for the ordering comparisons we could defer to the class with the longest list of fields, as long as there's a subtype relationship? That way b<c and c>b would be equivalent, and both would use C.__gt__. Which had better not reject this on the basis that other is not an instance of a subclass of C.
IIRC there's already something in the interpreter that tries the most derived class first for binary operators -- that may force our hand here.
I'm leaning toward doing the same thing attrs does. They have much more experience with this.
Except that given Hynek's response in https://github.com/ericvsmith/dataclasses/issues/51#issuecomment-347769322, I'm just going to leave it as-is, with a strict type requirement for all 6 methods. Eric.
On 29 November 2017 at 04:31, Eric V. Smith
On 11/28/17 7:02 AM, Nick Coghlan wrote:
So in the above example, you would have:
B.__field_layout__ is B True C1.__field_layout__ is B True C2.__field_layout__ is B True
It would then be up to the dataclass decorator to set `__field_layout__` correctly, using the follow rules:
1. Use the just-defined class if the class defines any fields 2. Use the just-defined class if it inherits from multiple base classes that define fields and don't already share an MRO 3. Use a base class if that's either the only base class that defines fields, or if all other base classes that define fields are already in the MRO of that base class
That seems like a lot of complication for a feature that will be rarely used. I'll give it some thought, especially the MI logic.
I think what you're laying out is an optimization for "do the classes have identical fields, inherited through a common base class or classes", right?
It's a combination of that and "How do I get my own class to compare equal with a dataclass instance?". However, having the dataclass methods return NotImplemented for mismatched types should be enough to enable interoperability, since it will leave the question up to the other type. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 25.11.2017 22:06, Eric V. Smith wrote:
The updated version should show up at https://www.python.org/dev/peps/pep-0557/ shortly.
This PEP looks very promising and will make my life quite a bit easier, since we are using a pattern involving data classes. Currently, we write the constructor by hand.
The major changes from the previous version are:
- Add InitVar to specify initialize-only fields.
This is the only feature that does not sit right with me. It looks very obscure and "hacky". From what I understand, we are supposed to use the field syntax to define constructor arguments. I'd argue that the name "initialize-only fields" is a misnomer, which only hides the fact that this has nothing to do with fields at all. Couldn't dataclassses just pass *args and **kwargs to __post_init__()? Type checkers need to be special-cases for InitVar anyway, couldn't they instead be special cased to look at __post_init__ argument types? - Sebastian
On 11/27/2017 6:01 AM, Sebastian Rittau wrote:
On 25.11.2017 22:06, Eric V. Smith wrote:
The major changes from the previous version are:
- Add InitVar to specify initialize-only fields.
This is the only feature that does not sit right with me. It looks very obscure and "hacky". From what I understand, we are supposed to use the field syntax to define constructor arguments. I'd argue that the name "initialize-only fields" is a misnomer, which only hides the fact that this has nothing to do with fields at all. Couldn't dataclassses just pass *args and **kwargs to __post_init__()? Type checkers need to be special-cases for InitVar anyway, couldn't they instead be special cased to look at __post_init__ argument types?
First off, I expect this feature to be used extremely rarely. I'm tempted to remove it, since it's infrequently needed and it could be added later. And as the PEP points out, you can get most of the way with an alternate classmethod constructor. I had something like your suggestion half coded up, except I inspected the args to __post_init__() and added them to __init__, avoiding the API-unfriendly *args and **kwargs. So in: @dataclass class C: x: int y: int def __post_init__(self, database: DatabaseType): pass Then the __init__ signature became: def __init__(self, x:int, y:int, database:DatabaseType): In the end, that seems like a lot of magic (but what about this isn't?), it required the inspect module to be imported, and I thought it made more sense for all of the init params to be near each other: @dataclass class C: x: int y: int database: InitVar[DatabaseType] def __post_init__(self, database): pass No matter what we do here, static type checkers are going to have to be aware of either the InitVars or the hoisting of params from __post_init__ to __init__. One other thing about InitVar: it lets you control where the init-only parameter goes in the __init__ call. This is especially important with default values: @dataclass class C: x: int database: InitVar[DatabaseType] y: int = 0 def __post_init__(self, database): pass In this case, if I were hoisting params from __post_init__ to __init__, the __init__ call would be: def __init__(self, x, y=0, database) Which is an error. I guess you could say the init-only parameters would go first in the __init__ definition, but then you have the same problem if any of them have default values. Eric.
On 27.11.2017 13:23, Eric V. Smith wrote:
I had something like your suggestion half coded up, except I inspected the args to __post_init__() and added them to __init__, avoiding the API-unfriendly *args and **kwargs. I understand your concerns with *args and **kwargs. I think we need to find a solution for that eventually.
One other thing about InitVar: it lets you control where the init-only parameter goes in the __init__ call. This is especially important with default values:
This is indeed a nice property. I was thinking about that myself and how to best handle it. One use case that could occur in out codebase is passing in a "context" argument. By convention, this is always the first argument to the constructor, so it would be nice if this would also work for dataclasses. - Sebastian
On 11/27/2017 7:31 AM, Sebastian Rittau wrote:
On 27.11.2017 13:23, Eric V. Smith wrote:
I had something like your suggestion half coded up, except I inspected the args to __post_init__() and added them to __init__, avoiding the API-unfriendly *args and **kwargs. I understand your concerns with *args and **kwargs. I think we need to find a solution for that eventually.
One other thing about InitVar: it lets you control where the init-only parameter goes in the __init__ call. This is especially important with default values:
This is indeed a nice property. I was thinking about that myself and how to best handle it. One use case that could occur in out codebase is passing in a "context" argument. By convention, this is always the first argument to the constructor, so it would be nice if this would also work for dataclasses.
And that's the one thing that you can't do with an alternate classmethod constructor, and is the reason I added InitVar: you can't force a non-field parameter such as a context (or in my example, a database) to be always present when instances are constructed. And also consider the "replace()" module method. InitVars must also be supplied there, whereas with a classmethod constructor, they wouldn't be. This is for the case where a context or database is needed to construct the instance, but isn't stored as a field on the instance. Again, not super-common, but it does happen. My point here is not that InitVar is better than __post_init__ parameter hoisting for this specific need, but that both of them provide something that classmethod constructors do not. I'll add some wording on this to the PEP. Eric.
On 27.11.2017 12:01, Sebastian Rittau wrote:
The major changes from the previous version are:
- Add InitVar to specify initialize-only fields.
This is the only feature that does not sit right with me. It looks very obscure and "hacky". From what I understand, we are supposed to use the field syntax to define constructor arguments. I'd argue that the name "initialize-only fields" is a misnomer, which only hides the fact that this has nothing to do with fields at all. Couldn't dataclassses just pass *args and **kwargs to __post_init__()? Type checkers need to be special-cases for InitVar anyway, couldn't they instead be special cased to look at __post_init__ argument types?
I am sorry for the double post, but I thought a bit more about why this does not right with me: * As written above, InitVars look like fields, but aren't. * InitVar goes against the established way to pass through arguments, *args and **kwargs. While type checking those is an unsolved problem, from what I understand, I don't think we should introduce a second way just for dataclasses. * InitVars look like a way to satisfy the type checker without providing any benefit to the programmer. Even when I'm not interested in type checking, I have to declare init vars. * InitVars force me to repeat myself. I have the InitVar declaration and then I have the repeat myself in the signature of __post_init__(). This has all the usual problems of repeated code. I hope I did not misunderstood the purpose of InitVar. - Sebastian
On 11/27/2017 7:26 AM, Sebastian Rittau wrote:
On 27.11.2017 12:01, Sebastian Rittau wrote:
The major changes from the previous version are:
- Add InitVar to specify initialize-only fields.
This is the only feature that does not sit right with me. It looks very obscure and "hacky". From what I understand, we are supposed to use the field syntax to define constructor arguments. I'd argue that the name "initialize-only fields" is a misnomer, which only hides the fact that this has nothing to do with fields at all. Couldn't dataclassses just pass *args and **kwargs to __post_init__()? Type checkers need to be special-cases for InitVar anyway, couldn't they instead be special cased to look at __post_init__ argument types?
I am sorry for the double post, but I thought a bit more about why this does not right with me:
* As written above, InitVars look like fields, but aren't.
Same as with ClassVars, which is where the inspiration came from.
* InitVar goes against the established way to pass through arguments, *args and **kwargs. While type checking those is an unsolved problem, from what I understand, I don't think we should introduce a second way just for dataclasses. * InitVars look like a way to satisfy the type checker without providing any benefit to the programmer. Even when I'm not interested in type checking, I have to declare init vars.
Same as with ClassVars, if you're using them. And that's not just a dataclasses thing, although dataclasses is the first place I know of where it would change the code semantics.
* InitVars force me to repeat myself. I have the InitVar declaration and then I have the repeat myself in the signature of __post_init__(). This has all the usual problems of repeated code.
There was some discussion about this starting at https://github.com/ericvsmith/dataclasses/issues/17#issuecomment-345529717, in particular a few messages down where we discussed what would be repeated, and what mypy would be able to deduce. You won't need to repeat the type declaration.
I hope I did not misunderstood the purpose of InitVar.
I think you understand it perfectly well, especially with the "context" discussion. Thanks for bringing it up. Eric.
participants (7)
-
Brett Cannon
-
Carl Meyer
-
Eric V. Smith
-
Greg Ewing
-
Guido van Rossum
-
Nick Coghlan
-
Sebastian Rittau