Mailman 3 Second post: PEP 557, Data Classes - Python-Dev

newer
Removing files from the repository

Second post: PEP 557, Data Classes

older
Can Python guarantee the order of...

Eric V. Smith

25 Nov 2017 25 Nov '17

9:06 p.m.

The updated version should show up at https://www.python.org/dev/peps/pep-0557/ shortly. The major changes from the previous version are: - Add InitVar to specify initialize-only fields. - Renamed __dataclass_post_init__() to __post_init(). - Rename cmp to compare. - Added eq, separate from compare, so you can test unorderable items for equality. - Flushed out asdict() and astuple(). - Changed replace() to just call __init__(), and dropped the complex post-create logic. The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51 - Should the replace() function be renamed to something else? https://github.com/ericvsmith/dataclasses/issues/77 Most of the items that were previously discussed on python-dev were discussed in detail at https://github.com/ericvsmith/dataclasses. Before rehashing an old discussion, please check there first. Also at https://github.com/ericvsmith/dataclasses is an implementation, with tests, that should work with 3.6 and 3.7. The only action item for the code is to clean up the implementation of InitVar, but that's waiting for PEP 560. Oh, and if PEP 563 is accepted I'll also need to do some work. Feedback is welcomed! Eric.

Show replies by date

Eric V. Smith

26 Nov 26 Nov

12:44 a.m.

One more change: - Per-field metadata, for use by third parties. Also, thanks to Guido and Ivan for all of their feedback on the various issues that got the PEP to this point. Eric. On 11/25/2017 4:06 PM, Eric V. Smith wrote:

...

The updated version should show up at https://www.python.org/dev/peps/pep-0557/ shortly.

The major changes from the previous version are:

- Add InitVar to specify initialize-only fields. - Renamed __dataclass_post_init__() to __post_init(). - Rename cmp to compare. - Added eq, separate from compare, so you can test unorderable items for equality. - Flushed out asdict() and astuple(). - Changed replace() to just call __init__(), and dropped the complex post-create logic.

The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51 - Should the replace() function be renamed to something else? https://github.com/ericvsmith/dataclasses/issues/77

Most of the items that were previously discussed on python-dev were discussed in detail at https://github.com/ericvsmith/dataclasses. Before rehashing an old discussion, please check there first.

Also at https://github.com/ericvsmith/dataclasses is an implementation, with tests, that should work with 3.6 and 3.7. The only action item for the code is to clean up the implementation of InitVar, but that's waiting for PEP 560. Oh, and if PEP 563 is accepted I'll also need to do some work.

Feedback is welcomed!

Eric. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40tru...

Carl Meyer

5:23 a.m.

Hi Eric, Really excited about this PEP, thanks for working on it. A couple minor questions:

...

If compare is True, then eq is ignored, and __eq__ and __ne__ will be automatically generated.

IMO it's generally preferable to make nonsensical parameter combinations an immediate error, rather than silently ignore one of them. Is there a strong reason for letting nonsense pass silently here? (I reviewed the previous thread; there was a lot of discussion about enums/flags vs two boolean params, but I didn't see explicit discussion of this issue; the only passing references I noticed said the invalid combo should be "disallowed", e.g. Guido in [1], which to me implies "an error.")

...

isdataclass(instance): Returns True if instance is an instance of a Data Class, otherwise returns False.

Something smells wrong with the naming here. If I have @dataclass class Person: name: str I think it would be considered obvious and undeniable (in English prose, anyway) that Person is a dataclass. So it seems wrong to have `isdataclass(Person)` return `False`. Is there a reason not to let it handle either a class or an instance (looks like it would actually simplify the implementation)? Carl [1] https://mail.python.org/pipermail/python-dev/2017-September/149505.html On 11/25/2017 01:06 PM, Eric V. Smith wrote:

...

The updated version should show up at https://www.python.org/dev/peps/pep-0557/ shortly.

The major changes from the previous version are:

- Add InitVar to specify initialize-only fields. - Renamed __dataclass_post_init__() to __post_init(). - Rename cmp to compare. - Added eq, separate from compare, so you can test unorderable items for equality. - Flushed out asdict() and astuple(). - Changed replace() to just call __init__(), and dropped the complex post-create logic.

The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51 - Should the replace() function be renamed to something else? https://github.com/ericvsmith/dataclasses/issues/77

Most of the items that were previously discussed on python-dev were discussed in detail at https://github.com/ericvsmith/dataclasses. Before rehashing an old discussion, please check there first.

Also at https://github.com/ericvsmith/dataclasses is an implementation, with tests, that should work with 3.6 and 3.7. The only action item for the code is to clean up the implementation of InitVar, but that's waiting for PEP 560. Oh, and if PEP 563 is accepted I'll also need to do some work.

Feedback is welcomed!

Eric. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/carl%40oddbird.net

Eric V. Smith

8:35 a.m.

On 11/26/2017 12:23 AM, Carl Meyer wrote:

...

Hi Eric,

Really excited about this PEP, thanks for working on it.

Thanks, Carl.

...

A couple minor questions:

...
If compare is True, then eq is ignored, and __eq__ and __ne__ will be automatically generated.

IMO it's generally preferable to make nonsensical parameter combinations an immediate error, rather than silently ignore one of them. Is there a strong reason for letting nonsense pass silently here?

(I reviewed the previous thread; there was a lot of discussion about enums/flags vs two boolean params, but I didn't see explicit discussion of this issue; the only passing references I noticed said the invalid combo should be "disallowed", e.g. Guido in [1], which to me implies "an error.")

I think you're right here. I'll change it to a ValueError.

...

...
isdataclass(instance): Returns True if instance is an instance of a Data Class, otherwise returns False.

Something smells wrong with the naming here. If I have

@dataclass class Person: name: str

I think it would be considered obvious and undeniable (in English prose, anyway) that Person is a dataclass. So it seems wrong to have `isdataclass(Person)` return `False`. Is there a reason not to let it handle either a class or an instance (looks like it would actually simplify the implementation)?

I think of this as really "isdataclassinstance". Let's see what others think. There are places in dataclasses.py where I need to know both ways: for example fields() works with either a class or instance, but asdict() just an instance. For what it's worth, the equivalent attrs API, attr.has(), returns True for both an instance and a class. And the recommended solution for a namedtuple (check for existence of _fields) would also work for an instance and a class. And I suppose it's easy enough for the caller to further disallow a class, if that's what they want. Eric.

...

Carl

[1] https://mail.python.org/pipermail/python-dev/2017-September/149505.html

On 11/25/2017 01:06 PM, Eric V. Smith wrote:

...
The updated version should show up at https://www.python.org/dev/peps/pep-0557/ shortly.

The major changes from the previous version are:

- Add InitVar to specify initialize-only fields. - Renamed __dataclass_post_init__() to __post_init(). - Rename cmp to compare. - Added eq, separate from compare, so you can test unorderable items for equality. - Flushed out asdict() and astuple(). - Changed replace() to just call __init__(), and dropped the complex post-create logic.

The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51 - Should the replace() function be renamed to something else? https://github.com/ericvsmith/dataclasses/issues/77

Most of the items that were previously discussed on python-dev were discussed in detail at https://github.com/ericvsmith/dataclasses. Before rehashing an old discussion, please check there first.

Also at https://github.com/ericvsmith/dataclasses is an implementation, with tests, that should work with 3.6 and 3.7. The only action item for the code is to clean up the implementation of InitVar, but that's waiting for PEP 560. Oh, and if PEP 563 is accepted I'll also need to do some work.

Feedback is welcomed!

Eric. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/carl%40oddbird.net

_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40tru...

Eric V. Smith

8:48 a.m.

On 11/26/2017 3:35 AM, Eric V. Smith wrote:

...

On 11/26/2017 12:23 AM, Carl Meyer wrote:

...
A couple minor questions:

...
If compare is True, then eq is ignored, and __eq__ and __ne__ will be automatically generated.

IMO it's generally preferable to make nonsensical parameter combinations an immediate error, rather than silently ignore one of them. Is there a strong reason for letting nonsense pass silently here?

(I reviewed the previous thread; there was a lot of discussion about enums/flags vs two boolean params, but I didn't see explicit discussion of this issue; the only passing references I noticed said the invalid combo should be "disallowed", e.g. Guido in [1], which to me implies "an error.")

I think you're right here. I'll change it to a ValueError.

While creating an issue for this (https://github.com/ericvsmith/dataclasses/issues/88), it occurs to me that the class-level parameter really should be "order" or "orderable", not "compare". It made more sense when it was called "cmp", but "compare" now seems wrong. Because "eq" says "can I compare two instances", and what's currently called "compare" is "can I order two instances". Nick had a similar suggestion before the PEP was written (https://mail.python.org/pipermail/python-ideas/2017-May/045740.html). The field-level parameter should stay "compare", because it's used for both __gt__ and friends, as well as __eq__ and __ne__. It's saying "is this field used in all of the comparison methods". Eric.

Eric V. Smith

3:14 p.m.

On 11/26/2017 3:48 AM, Eric V. Smith wrote:

...

While creating an issue for this (https://github.com/ericvsmith/dataclasses/issues/88), it occurs to me that the class-level parameter really should be "order" or "orderable", not "compare". It made more sense when it was called "cmp", but "compare" now seems wrong.

Because "eq" says "can I compare two instances", and what's currently called "compare" is "can I order two instances". Nick had a similar suggestion before the PEP was written (https://mail.python.org/pipermail/python-ideas/2017-May/045740.html).

The field-level parameter should stay "compare", because it's used for both __gt__ and friends, as well as __eq__ and __ne__. It's saying "is this field used in all of the comparison methods".

I created https://github.com/ericvsmith/dataclasses/issues/90 for this. I think I'll leave 'eq' alone, and change 'compare' to 'order', for the class-level parameter name. Eric.

Brett Cannon

6:04 p.m.

On Sat, Nov 25, 2017, 14:00 Eric V. Smith, wrote:

...

The updated version should show up at https://www.python.org/dev/peps/pep-0557/ shortly.

The major changes from the previous version are:

- Add InitVar to specify initialize-only fields. - Renamed __dataclass_post_init__() to __post_init(). - Rename cmp to compare. - Added eq, separate from compare, so you can test unorderable items for equality. - Flushed out asdict() and astuple(). - Changed replace() to just call __init__(), and dropped the complex post-create logic.

It looks great and I'm excited to get to start using this PEP!

...

The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51

I say don't require the type comparison for duck typing purposes. -Brett

...

- Should the replace() function be renamed to something else? https://github.com/ericvsmith/dataclasses/issues/77

Most of the items that were previously discussed on python-dev were discussed in detail at https://github.com/ericvsmith/dataclasses. Before rehashing an old discussion, please check there first.

Also at https://github.com/ericvsmith/dataclasses is an implementation, with tests, that should work with 3.6 and 3.7. The only action item for the code is to clean up the implementation of InitVar, but that's waiting for PEP 560. Oh, and if PEP 563 is accepted I'll also need to do some work.

Feedback is welcomed!

Eric. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/brett%40python.org

Eric V. Smith

8:22 p.m.

On 11/26/2017 1:04 PM, Brett Cannon wrote:

...

The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51

I say don't require the type comparison for duck typing purposes.

The problem with that is that you end up with cases like this, which I don't think we want: @dataclass class Point: x: int y: int @dataclass class Point3d: x: int y: int z: int assert Point(1, 2) == Point3d(1, 2, 3) Eric.

Nick Coghlan

27 Nov 27 Nov

4:07 a.m.

On 27 November 2017 at 06:22, Eric V. Smith wrote:

...

On 11/26/2017 1:04 PM, Brett Cannon wrote:

...
The only open issues I know of are: - Should object comparison require an exact match on the type? https://github.com/ericvsmith/dataclasses/issues/51

I say don't require the type comparison for duck typing purposes.

The problem with that is that you end up with cases like this, which I don't think we want:

@dataclass class Point: x: int y: int

@dataclass class Point3d: x: int y: int z: int

assert Point(1, 2) == Point3d(1, 2, 3)

Perhaps the check could be: (type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match) That way the exact type check would be an optimisation to speed up the common case, while the formal semantic constraint would be that the field definitions have to match (including their names and order). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Greg Ewing

5:04 a.m.

Nick Coghlan wrote:

...

Perhaps the check could be:

(type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match)

I think the types should *always* have to match, or at least one should be a subclass of the other. Consider: @dataclass class Point3d: x: float y: float z: float @dataclass class Vector3d: x: float y: float z: float Points and vectors are different things, and they should never compare equal, even if they have the same field names and values. -- Greg

Nick Coghlan

6:04 a.m.

On 27 November 2017 at 15:04, Greg Ewing wrote:

...

Nick Coghlan wrote:

...
Perhaps the check could be:

(type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match)

I think the types should *always* have to match, or at least one should be a subclass of the other. Consider:

@dataclass class Point3d: x: float y: float z: float

@dataclass class Vector3d: x: float y: float z: float

Points and vectors are different things, and they should never compare equal, even if they have the same field names and values.

And I guess if folks actually want more permissive structure-based matching, that's one of the features that collections.namedtuple offers that data classes don't. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Eric V. Smith

10:56 a.m.

On 11/27/2017 1:04 AM, Nick Coghlan wrote:

...

On 27 November 2017 at 15:04, Greg Ewing wrote:

...
Nick Coghlan wrote:

...
Perhaps the check could be:

(type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match)

I think the types should *always* have to match, or at least one should be a subclass of the other. Consider:

@dataclass class Point3d: x: float y: float z: float

@dataclass class Vector3d: x: float y: float z: float

Points and vectors are different things, and they should never compare equal, even if they have the same field names and values.

And I guess if folks actually want more permissive structure-based matching, that's one of the features that collections.namedtuple offers that data classes don't.

And in this case you could also do: astuple(point) == astuple(vector) Eric.

Guido van Rossum

3:51 p.m.

Following up on this subthread (inline below). On Mon, Nov 27, 2017 at 2:56 AM, Eric V. Smith wrote:

...

On 11/27/2017 1:04 AM, Nick Coghlan wrote:

...
On 27 November 2017 at 15:04, Greg Ewing wrote:

...
Nick Coghlan wrote:

...
Perhaps the check could be:

(type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match)

I think the types should *always* have to match, or at least one should be a subclass of the other. Consider:

@dataclass class Point3d: x: float y: float z: float

@dataclass class Vector3d: x: float y: float z: float

Points and vectors are different things, and they should never compare equal, even if they have the same field names and values.

And I guess if folks actually want more permissive structure-based matching, that's one of the features that collections.namedtuple offers that data classes don't.

And in this case you could also do: astuple(point) == astuple(vector)

Didn't we at one point have something like isinstance(other, self.__class__) and fields(other) == fields(self) and <all individual fields match> (plus some optimization if the types are identical)? That feels ideal, because it means you can subclass Point just to add some methods and it will stay comparable, but if you add fields it will always be unequal. -- --Guido van Rossum (python.org/~guido)

Eric V. Smith

4 p.m.

On 11/27/17 10:51 AM, Guido van Rossum wrote:

...

Following up on this subthread (inline below).

On Mon, Nov 27, 2017 at 2:56 AM, Eric V. Smith mailto:eric@trueblade.com> wrote:

On 11/27/2017 1:04 AM, Nick Coghlan wrote:

On 27 November 2017 at 15:04, Greg Ewing mailto:greg.ewing@canterbury.ac.nz> wrote:

Nick Coghlan wrote:

Perhaps the check could be:

(type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match)

I think the types should *always* have to match, or at least one should be a subclass of the other. Consider:

@dataclass class Point3d: x: float y: float z: float

@dataclass class Vector3d: x: float y: float z: float

Points and vectors are different things, and they should never compare equal, even if they have the same field names and values.

And I guess if folks actually want more permissive structure-based matching, that's one of the features that collections.namedtuple offers that data classes don't.

And in this case you could also do: astuple(point) == astuple(vector)

Didn't we at one point have something like

isinstance(other, self.__class__) and fields(other) == fields(self) and <all individual fields match>

(plus some optimization if the types are identical)?

That feels ideal, because it means you can subclass Point just to add some methods and it will stay comparable, but if you add fields it will always be unequal.

I don't think we had that before, but it sounds right to me. I think it could be: isinstance(other, self.__class__) and len(fields(other)) == len(fields(self)) and <all individual fields match> Since by definition if you're a subclass you'll start with all of the same fields. So if the len's match, you won't have added any new fields. That should be sufficiently cheap. Then the optimized version would be: (self.__class__ is other.__class__) or (isinstance(other, self.__class__) and len(fields(other)) == len(fields(self))) and <all individual fields match> I'd probably further optimize len(fields(obj)), but that's the general idea. Eric.

Guido van Rossum

4:15 p.m.

Sounds good. On Nov 27, 2017 8:00 AM, "Eric V. Smith" wrote:

...

On 11/27/17 10:51 AM, Guido van Rossum wrote:

...
Following up on this subthread (inline below).

On Mon, Nov 27, 2017 at 2:56 AM, Eric V. Smith mailto:eric@trueblade.com> wrote:

On 11/27/2017 1:04 AM, Nick Coghlan wrote:

On 27 November 2017 at 15:04, Greg Ewing mailto:greg.ewing@canterbury.ac.nz> wrote:

Nick Coghlan wrote:

Perhaps the check could be:

(type(lhs) == type(rhs) or fields(lhs) == fields(rhs)) and all (individual fields match)

I think the types should *always* have to match, or at least one should be a subclass of the other. Consider:

@dataclass class Point3d: x: float y: float z: float

@dataclass class Vector3d: x: float y: float z: float

Points and vectors are different things, and they should never compare equal, even if they have the same field names and values.

And I guess if folks actually want more permissive structure-based matching, that's one of the features that collections.namedtuple offers that data classes don't.

And in this case you could also do: astuple(point) == astuple(vector)

Didn't we at one point have something like

isinstance(other, self.__class__) and fields(other) == fields(self) and <all individual fields match>

(plus some optimization if the types are identical)?

That feels ideal, because it means you can subclass Point just to add some methods and it will stay comparable, but if you add fields it will always be unequal.

I don't think we had that before, but it sounds right to me. I think it could be:

isinstance(other, self.__class__) and len(fields(other)) == len(fields(self)) and <all individual fields match>

Since by definition if you're a subclass you'll start with all of the same fields. So if the len's match, you won't have added any new fields. That should be sufficiently cheap.

Then the optimized version would be:

(self.__class__ is other.__class__) or (isinstance(other, self.__class__) and len(fields(other)) == len(fields(self))) and <all individual fields match>

I'd probably further optimize len(fields(obj)), but that's the general idea.

Eric.

Eric V. Smith

28 Nov 28 Nov

7:41 a.m.

On 11/27/2017 10:51 AM, Guido van Rossum wrote:

...

Following up on this subthread (inline below).

...

Didn't we at one point have something like

isinstance(other, self.__class__) and fields(other) == fields(self) and <all individual fields match>

(plus some optimization if the types are identical)?

That feels ideal, because it means you can subclass Point just to add some methods and it will stay comparable, but if you add fields it will always be unequal.

One thing this doesn't let you do is compare instances of two different subclasses of a base type: @dataclass class B: i: int @dataclass class C1(B): pass @dataclass class C2(B): pass You can't compare C1(0) and C2(0), because neither one is an instance of the other's type. The test to get this case to work would be expensive: find the common ancestor, and then make sure no fields have been added since then. And I haven't thought through multiple inheritance. I suggest we don't try to support this case. Eric.

Nick Coghlan

12:02 p.m.

On 28 November 2017 at 17:41, Eric V. Smith wrote:

...

One thing this doesn't let you do is compare instances of two different subclasses of a base type:

@dataclass class B: i: int

@dataclass class C1(B): pass

@dataclass class C2(B): pass

You can't compare C1(0) and C2(0), because neither one is an instance of the other's type. The test to get this case to work would be expensive: find the common ancestor, and then make sure no fields have been added since then. And I haven't thought through multiple inheritance.

I suggest we don't try to support this case.

That gets you onto problematic ground as far as transitivity is concerned, since you'd end up with the following: >>> b = B(0); c1 = C1(0); c2 = C2(0) >>> c1 == b True >>> b == c2 True >>> c1 == c2 False However, I think you can fix this by injecting the first base in the MRO that defines a data field as a "__field_layout__" class attribute, and then have the comparison methods check for "other.__field_layout__ is self.__field_layout__", rather than checking the runtime class directly. So in the above example, you would have:

...

...
...
B.__field_layout__ is B True C1.__field_layout__ is B True C2.__field_layout__ is B True

It would then be up to the dataclass decorator to set `__field_layout__` correctly, using the follow rules: 1. Use the just-defined class if the class defines any fields 2. Use the just-defined class if it inherits from multiple base classes that define fields and don't already share an MRO 3. Use a base class if that's either the only base class that defines fields, or if all other base classes that define fields are already in the MRO of that base class Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Eric V. Smith

6:31 p.m.

On 11/28/17 7:02 AM, Nick Coghlan wrote:

...

On 28 November 2017 at 17:41, Eric V. Smith wrote:

...
One thing this doesn't let you do is compare instances of two different subclasses of a base type:

@dataclass class B: i: int

@dataclass class C1(B): pass

@dataclass class C2(B): pass

You can't compare C1(0) and C2(0), because neither one is an instance of the other's type. The test to get this case to work would be expensive: find the common ancestor, and then make sure no fields have been added since then. And I haven't thought through multiple inheritance.

I suggest we don't try to support this case.

That gets you onto problematic ground as far as transitivity is concerned, since you'd end up with the following:

Excellent point, thanks for raising it.

...

>>> b = B(0); c1 = C1(0); c2 = C2(0) >>> c1 == b True >>> b == c2 True >>> c1 == c2 False

However, I think you can fix this by injecting the first base in the MRO that defines a data field as a "__field_layout__" class attribute, and then have the comparison methods check for "other.__field_layout__ is self.__field_layout__", rather than checking the runtime class directly.

So in the above example, you would have:

...
...
...
B.__field_layout__ is B True C1.__field_layout__ is B True C2.__field_layout__ is B True

It would then be up to the dataclass decorator to set `__field_layout__` correctly, using the follow rules:

1. Use the just-defined class if the class defines any fields 2. Use the just-defined class if it inherits from multiple base classes that define fields and don't already share an MRO 3. Use a base class if that's either the only base class that defines fields, or if all other base classes that define fields are already in the MRO of that base class

That seems like a lot of complication for a feature that will be rarely used. I'll give it some thought, especially the MI logic. I think what you're laying out is an optimization for "do the classes have identical fields, inherited through a common base class or classes", right? Eric.

Guido van Rossum

6:57 p.m.

I would also be happy with a retreat, where we define `__eq__` to insist that the classes are the same, and people can override this to their hearts' content.

Eric V. Smith

8:56 p.m.

On 11/28/2017 1:57 PM, Guido van Rossum wrote:

...

I would also be happy with a retreat, where we define `__eq__` to insist that the classes are the same, and people can override this to their hearts' content.

I agree. And I guess we could always add it later, if there's a huge demand and someone writes a decent implementation. There would be a slight backwards incompatibility, though. Frankly, I think it would never be needed. One question remains: do we do what attrs does: for `__eq__` and `__ne__` use an exact type match, and for the 4 ordered comparison operators use an isinstance check? On the comparison operators, they also ignore attributes defined on any derived class [0]. As I said, I couldn't find an attrs issue that discusses their choice. I'll ask Hynek over on the dataclasses github issue. Currently the dataclasses code on master uses an exact type match for all 6 methods. Eric. [0] That is, they do the following (using dataclasses terms): Given: @dataclass class B: i: int j: int @dataclass class C: k: int Then B.__eq__ is: def __eq__(self, other): if other.__class__ is self.__class__: return (other.i, other.j) == (self.i, self.j) return NotImplemented And B.__lt__ is: def __lt__(self, other): if isinstance(other, self.__class__): return (other.i, other.j) < (self.i, self.j) return NotImplemented So if you do: b = B(1, 2) c = C(1, 2, 3) Then `B(1, 2) < C(1, 2, 3)` ignores `c.k`.

Guido van Rossum

9:14 p.m.

On Tue, Nov 28, 2017 at 12:56 PM, Eric V. Smith wrote:

...

On 11/28/2017 1:57 PM, Guido van Rossum wrote:

...
I would also be happy with a retreat, where we define `__eq__` to insist that the classes are the same, and people can override this to their hearts' content.

I agree. And I guess we could always add it later, if there's a huge demand and someone writes a decent implementation. There would be a slight backwards incompatibility, though. Frankly, I think it would never be needed.

One question remains: do we do what attrs does: for `__eq__` and `__ne__` use an exact type match, and for the 4 ordered comparison operators use an isinstance check? On the comparison operators, they also ignore attributes defined on any derived class [0]. As I said, I couldn't find an attrs issue that discusses their choice. I'll ask Hynek over on the dataclasses github issue.

Currently the dataclasses code on master uses an exact type match for all 6 methods.

Eric.

[0] That is, they do the following (using dataclasses terms):

Given:

@dataclass class B: i: int j: int

@dataclass class C: k: int

Then B.__eq__ is:

def __eq__(self, other): if other.__class__ is self.__class__: return (other.i, other.j) == (self.i, self.j) return NotImplemented

And B.__lt__ is:

def __lt__(self, other): if isinstance(other, self.__class__): return (other.i, other.j) < (self.i, self.j) return NotImplemented

So if you do: b = B(1, 2) c = C(1, 2, 3)

Then `B(1, 2) < C(1, 2, 3)` ignores `c.k`.

Hm. Maybe for the ordering comparisons we could defer to the class with the longest list of fields, as long as there's a subtype relationship? That way b<c and c>b would be equivalent, and both would use C.__gt__. Which had better not reject this on the basis that other is not an instance of a subclass of C. IIRC there's already something in the interpreter that tries the most derived class first for binary operators -- that may force our hand here. -- --Guido van Rossum (python.org/~guido)

Eric V. Smith

29 Nov 29 Nov

1:31 a.m.

On 11/28/2017 4:14 PM, Guido van Rossum wrote:

...

Hm. Maybe for the ordering comparisons we could defer to the class with the longest list of fields, as long as there's a subtype relationship? That way b<c and c>b would be equivalent, and both would use C.__gt__. Which had better not reject this on the basis that other is not an instance of a subclass of C.

IIRC there's already something in the interpreter that tries the most derived class first for binary operators -- that may force our hand here.

I'm leaning toward doing the same thing attrs does. They have much more experience with this. This is my last open specification issue. I think I'm ready for a pronouncement on the PEP once I do one final editing pass, in hopes of getting this in 3.7.0a3 by next weekend. If the comparisons need to change, I'm okay with doing that in the beta. Eric.

Eric V. Smith

8:35 a.m.

On 11/28/2017 8:31 PM, Eric V. Smith wrote:

...

On 11/28/2017 4:14 PM, Guido van Rossum wrote:

...
Hm. Maybe for the ordering comparisons we could defer to the class with the longest list of fields, as long as there's a subtype relationship? That way b<c and c>b would be equivalent, and both would use C.__gt__. Which had better not reject this on the basis that other is not an instance of a subclass of C.

IIRC there's already something in the interpreter that tries the most derived class first for binary operators -- that may force our hand here.

I'm leaning toward doing the same thing attrs does. They have much more experience with this.

Except that given Hynek's response in https://github.com/ericvsmith/dataclasses/issues/51#issuecomment-347769322, I'm just going to leave it as-is, with a strict type requirement for all 6 methods. Eric.

Nick Coghlan

6:19 a.m.

On 29 November 2017 at 04:31, Eric V. Smith wrote:

...

On 11/28/17 7:02 AM, Nick Coghlan wrote:

...
So in the above example, you would have:

...
...
...
B.__field_layout__ is B True C1.__field_layout__ is B True C2.__field_layout__ is B True

It would then be up to the dataclass decorator to set `__field_layout__` correctly, using the follow rules:

1. Use the just-defined class if the class defines any fields 2. Use the just-defined class if it inherits from multiple base classes that define fields and don't already share an MRO 3. Use a base class if that's either the only base class that defines fields, or if all other base classes that define fields are already in the MRO of that base class

That seems like a lot of complication for a feature that will be rarely used. I'll give it some thought, especially the MI logic.

I think what you're laying out is an optimization for "do the classes have identical fields, inherited through a common base class or classes", right?

It's a combination of that and "How do I get my own class to compare equal with a dataclass instance?". However, having the dataclass methods return NotImplemented for mismatched types should be enough to enable interoperability, since it will leave the question up to the other type. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Sebastian Rittau

27 Nov 27 Nov

11:01 a.m.

On 25.11.2017 22:06, Eric V. Smith wrote:

...

The updated version should show up at https://www.python.org/dev/peps/pep-0557/ shortly.

This PEP looks very promising and will make my life quite a bit easier, since we are using a pattern involving data classes. Currently, we write the constructor by hand.

...

The major changes from the previous version are:

- Add InitVar to specify initialize-only fields.

This is the only feature that does not sit right with me. It looks very obscure and "hacky". From what I understand, we are supposed to use the field syntax to define constructor arguments. I'd argue that the name "initialize-only fields" is a misnomer, which only hides the fact that this has nothing to do with fields at all. Couldn't dataclassses just pass *args and **kwargs to __post_init__()? Type checkers need to be special-cases for InitVar anyway, couldn't they instead be special cased to look at __post_init__ argument types? - Sebastian

Eric V. Smith

12:23 p.m.

On 11/27/2017 6:01 AM, Sebastian Rittau wrote:

...

On 25.11.2017 22:06, Eric V. Smith wrote:

...

...
The major changes from the previous version are:

- Add InitVar to specify initialize-only fields.

This is the only feature that does not sit right with me. It looks very obscure and "hacky". From what I understand, we are supposed to use the field syntax to define constructor arguments. I'd argue that the name "initialize-only fields" is a misnomer, which only hides the fact that this has nothing to do with fields at all. Couldn't dataclassses just pass *args and **kwargs to __post_init__()? Type checkers need to be special-cases for InitVar anyway, couldn't they instead be special cased to look at __post_init__ argument types?

First off, I expect this feature to be used extremely rarely. I'm tempted to remove it, since it's infrequently needed and it could be added later. And as the PEP points out, you can get most of the way with an alternate classmethod constructor. I had something like your suggestion half coded up, except I inspected the args to __post_init__() and added them to __init__, avoiding the API-unfriendly *args and **kwargs. So in: @dataclass class C: x: int y: int def __post_init__(self, database: DatabaseType): pass Then the __init__ signature became: def __init__(self, x:int, y:int, database:DatabaseType): In the end, that seems like a lot of magic (but what about this isn't?), it required the inspect module to be imported, and I thought it made more sense for all of the init params to be near each other: @dataclass class C: x: int y: int database: InitVar[DatabaseType] def __post_init__(self, database): pass No matter what we do here, static type checkers are going to have to be aware of either the InitVars or the hoisting of params from __post_init__ to __init__. One other thing about InitVar: it lets you control where the init-only parameter goes in the __init__ call. This is especially important with default values: @dataclass class C: x: int database: InitVar[DatabaseType] y: int = 0 def __post_init__(self, database): pass In this case, if I were hoisting params from __post_init__ to __init__, the __init__ call would be: def __init__(self, x, y=0, database) Which is an error. I guess you could say the init-only parameters would go first in the __init__ definition, but then you have the same problem if any of them have default values. Eric.

Sebastian Rittau

12:31 p.m.

On 27.11.2017 13:23, Eric V. Smith wrote:

...

I had something like your suggestion half coded up, except I inspected the args to __post_init__() and added them to __init__, avoiding the API-unfriendly *args and **kwargs. I understand your concerns with *args and **kwargs. I think we need to find a solution for that eventually.

...

One other thing about InitVar: it lets you control where the init-only parameter goes in the __init__ call. This is especially important with default values:

This is indeed a nice property. I was thinking about that myself and how to best handle it. One use case that could occur in out codebase is passing in a "context" argument. By convention, this is always the first argument to the constructor, so it would be nice if this would also work for dataclasses. - Sebastian

Eric V. Smith

12:42 p.m.

On 11/27/2017 7:31 AM, Sebastian Rittau wrote:

...

On 27.11.2017 13:23, Eric V. Smith wrote:

...
I had something like your suggestion half coded up, except I inspected the args to __post_init__() and added them to __init__, avoiding the API-unfriendly *args and **kwargs. I understand your concerns with *args and **kwargs. I think we need to find a solution for that eventually.

...
One other thing about InitVar: it lets you control where the init-only parameter goes in the __init__ call. This is especially important with default values:

This is indeed a nice property. I was thinking about that myself and how to best handle it. One use case that could occur in out codebase is passing in a "context" argument. By convention, this is always the first argument to the constructor, so it would be nice if this would also work for dataclasses.

And that's the one thing that you can't do with an alternate classmethod constructor, and is the reason I added InitVar: you can't force a non-field parameter such as a context (or in my example, a database) to be always present when instances are constructed. And also consider the "replace()" module method. InitVars must also be supplied there, whereas with a classmethod constructor, they wouldn't be. This is for the case where a context or database is needed to construct the instance, but isn't stored as a field on the instance. Again, not super-common, but it does happen. My point here is not that InitVar is better than __post_init__ parameter hoisting for this specific need, but that both of them provide something that classmethod constructors do not. I'll add some wording on this to the PEP. Eric.

Sebastian Rittau

12:26 p.m.

On 27.11.2017 12:01, Sebastian Rittau wrote:

...

...
The major changes from the previous version are:

- Add InitVar to specify initialize-only fields.

This is the only feature that does not sit right with me. It looks very obscure and "hacky". From what I understand, we are supposed to use the field syntax to define constructor arguments. I'd argue that the name "initialize-only fields" is a misnomer, which only hides the fact that this has nothing to do with fields at all. Couldn't dataclassses just pass *args and **kwargs to __post_init__()? Type checkers need to be special-cases for InitVar anyway, couldn't they instead be special cased to look at __post_init__ argument types?

I am sorry for the double post, but I thought a bit more about why this does not right with me: * As written above, InitVars look like fields, but aren't. * InitVar goes against the established way to pass through arguments, *args and **kwargs. While type checking those is an unsolved problem, from what I understand, I don't think we should introduce a second way just for dataclasses. * InitVars look like a way to satisfy the type checker without providing any benefit to the programmer. Even when I'm not interested in type checking, I have to declare init vars. * InitVars force me to repeat myself. I have the InitVar declaration and then I have the repeat myself in the signature of __post_init__(). This has all the usual problems of repeated code. I hope I did not misunderstood the purpose of InitVar. - Sebastian

Eric V. Smith

1:02 p.m.

On 11/27/2017 7:26 AM, Sebastian Rittau wrote:

...

On 27.11.2017 12:01, Sebastian Rittau wrote:

...
...
The major changes from the previous version are:

- Add InitVar to specify initialize-only fields.

This is the only feature that does not sit right with me. It looks very obscure and "hacky". From what I understand, we are supposed to use the field syntax to define constructor arguments. I'd argue that the name "initialize-only fields" is a misnomer, which only hides the fact that this has nothing to do with fields at all. Couldn't dataclassses just pass *args and **kwargs to __post_init__()? Type checkers need to be special-cases for InitVar anyway, couldn't they instead be special cased to look at __post_init__ argument types?

I am sorry for the double post, but I thought a bit more about why this does not right with me:

* As written above, InitVars look like fields, but aren't.

Same as with ClassVars, which is where the inspiration came from.

...

* InitVar goes against the established way to pass through arguments, *args and **kwargs. While type checking those is an unsolved problem, from what I understand, I don't think we should introduce a second way just for dataclasses. * InitVars look like a way to satisfy the type checker without providing any benefit to the programmer. Even when I'm not interested in type checking, I have to declare init vars.

Same as with ClassVars, if you're using them. And that's not just a dataclasses thing, although dataclasses is the first place I know of where it would change the code semantics.

...

* InitVars force me to repeat myself. I have the InitVar declaration and then I have the repeat myself in the signature of __post_init__(). This has all the usual problems of repeated code.

There was some discussion about this starting at https://github.com/ericvsmith/dataclasses/issues/17#issuecomment-345529717, in particular a few messages down where we discussed what would be repeated, and what mypy would be able to deduce. You won't need to repeat the type declaration.

...

I hope I did not misunderstood the purpose of InitVar.

I think you understand it perfectly well, especially with the "context" discussion. Thanks for bringing it up. Eric.

2339

Age (days ago)

2343

Last active (days ago)

List overview

Download

29 comments

7 participants

participants (7)

Brett Cannon
Carl Meyer
Eric V. Smith
Greg Ewing
Guido van Rossum
Nick Coghlan
Sebastian Rittau

Second post: PEP 557, Data Classes

tags

participants (7)