Consider adding an iterable option to dataclass

It would be nice if dataclasses (https://docs.python.org/3/library/dataclasses.html#dataclasses.dataclass) had an option to make them a sequence. This would make dataclass(frozen=True, order=True, sequence=True) an optionally-typed version of namedtuple. It would almost totally supplant namedtuple, except that namedtuples have a smaller memory footprint. sequence would simply inherit from collections.abc.Sequence and implement the two methods __len__ and __getitem__. Best, Neil
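For illustration, here is a minimal hand-written sketch of what the proposed sequence=True option might generate. The option itself does not exist in dataclasses; the Point class and the way __getitem__ builds a tuple of field values are assumptions made for this example only.

```python
from collections.abc import Sequence
from dataclasses import dataclass, fields


@dataclass(frozen=True, order=True)
class Point(Sequence):
    # Illustrative class; inheriting from Sequence by hand stands in for
    # the proposed (non-existent) sequence=True option.
    x: int
    y: int

    def __len__(self):
        # One item per dataclass field.
        return len(fields(self))

    def __getitem__(self, index):
        # Collect field values in declaration order, then index or slice.
        values = tuple(getattr(self, f.name) for f in fields(self))
        return values[index]


p = Point(1, 2)
x, y = p  # unpacking works through the Sequence mixin's __iter__
```

Sequence supplies __iter__, __contains__, __reversed__, index and count as mixin methods on top of the two methods written here, which is why only __len__ and __getitem__ are needed.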

On 8/10/2018 7:01 PM, Neil Girdhar wrote:
Note that typing.NamedTuple already gives you typed namedtuples. Admittedly the feature set is different from dataclasses, though.
sequence would simply inherit from collections.abc.Sequence and implement the two methods __len__ and __getitem__.
Unless I'm misunderstanding you, this runs into the same problem as setting __slots__: you need to return a new class, in this case because you can't add inheritance after the fact. I don't think __instancecheck__ helps you here, but maybe I'm missing something (I'm not a big user of inheritance or ABCs). Not that returning a new class is impossible, it's just that I didn't want to do it in the first go-round with dataclasses. For slots, I have a sample @add_slots() at https://github.com/ericvsmith/dataclasses/blob/master/dataclass_tools.py. Maybe we could do something similar with @add_sequence() and test it out? It would have to be a little more sophisticated than @add_slots(), since it would need to iterate over __dataclass_fields__, etc. I'm on vacation next week, maybe I'll play around with this. Eric
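A rough sketch of what such an @add_sequence() decorator could look like follows. It is hypothetical: no such decorator ships with dataclasses or dataclass_tools.py, and the implementation below is only one possible approach, assuming the decorator returns a new class that derives from both the original class and Sequence.

```python
import types
from collections.abc import Sequence
from dataclasses import dataclass, fields, is_dataclass


def add_sequence(cls):
    """Hypothetical decorator: return a NEW class deriving from cls and Sequence."""
    if not is_dataclass(cls):
        raise TypeError(f"{cls.__name__} is not a dataclass")

    # Field names in declaration order, taken from __dataclass_fields__
    # through the public fields() helper.
    names = tuple(f.name for f in fields(cls))

    def __len__(self):
        return len(names)

    def __getitem__(self, index):
        return tuple(getattr(self, n) for n in names)[index]

    namespace = {
        "__len__": __len__,
        "__getitem__": __getitem__,
        "__module__": cls.__module__,
        "__qualname__": cls.__qualname__,
        "__doc__": cls.__doc__,
    }
    # types.new_class resolves the metaclass (ABCMeta, because of Sequence).
    return types.new_class(
        cls.__name__, (cls, Sequence), exec_body=lambda ns: ns.update(namespace)
    )


@add_sequence
@dataclass
class Point:
    x: int
    y: int


p = Point(3, 4)
```

As with @add_slots(), the name Point ends up bound to the new class rather than the one @dataclass processed, so any reference to the original class captured before decoration would not see the Sequence behaviour.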

My only motivation for this idea is so that I can forget about namedtuple. Thinking about it again today, I withdraw my suggestion until I one day see a need for it. On Fri, Aug 10, 2018 at 10:14 PM Eric V. Smith <eric@trueblade.com> wrote:
That's a fair point. I'm sure you know that your decorator could always return a new class that inherits from both Sequence and the original class. As a user of dataclass, I never assumed that it wouldn't do this.
Cool, have a great vacation.

On 11 August 2018 at 01:29, Eric V. Smith <eric@trueblade.com> wrote:
Here are three points to add:

1. collections.abc.Sequence doesn't have a __subclasshook__, i.e. it doesn't support structural behaviour. There was an idea as a part of PEP 544 to make Sequence and Mapping structural, but it was rejected after all.

2. Mutating __bases__ doesn't require creating a new class. So one can just add Sequence after creation. That said, I don't like this idea: `typing` used to do some manipulations with bases, and it caused several confusions and subtle bugs, until it was "standardised" in PEP 560.

3. In my experience with some real life code the most used tuple API in named tuples is unpacking, for example:

    class Row(NamedTuple):
        id: int
        name: str

    rows: List[Row]
    for id, name in rows:
        ...

I proposed to add it some time ago in https://github.com/ericvsmith/dataclasses/issues/21. It will be enough to just generate an __iter__ (btw such classes will be automatically subclasses of collections.abc.Iterable, which is structural):

    @data(iterable=True)
    class Point:
        x: int
        y: int

    origin = Point(0, 0)
    x, y = origin

But this idea was postponed/deferred. Maybe we can reconsider it?

-- Ivan
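The iterable idea can be approximated today by writing __iter__ by hand. The sketch below assumes that a generated __iter__ would yield field values in declaration order, which is a guess at the semantics rather than anything specified in the thread. It also shows the structural difference between Iterable and Sequence mentioned in point 1.

```python
from collections.abc import Iterable, Sequence
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: int
    y: int

    def __iter__(self):
        # Assumed semantics: yield field values in declaration order.
        return (getattr(self, f.name) for f in fields(self))


origin = Point(0, 0)
x, y = origin  # unpacking only needs __iter__

# Iterable defines __subclasshook__, so defining __iter__ is enough.
is_iterable = isinstance(origin, Iterable)
# Sequence has no such hook, so the instance is not recognised as one.
is_sequence = isinstance(origin, Sequence)
```

Here is_iterable is True and is_sequence is False, even though no class in the example inherits from either ABC.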

On Fri, Aug 10, 2018 at 04:01:59PM -0700, Neil Girdhar wrote:
Do you have a use-case or reason for this other than "it would be nice"? Nice in what way? We already have namedtuple, and for backwards compatibility if no other reason it won't be going away. What benefit do we get from allowing dataclasses to do what namedtuple already does? You already mentioned one disadvantage: namedtuple is much more memory efficient. What corresponding benefit do you see? Dataclass already supports explicit conversion to tuples and dicts. What use-cases for sequence-ness don't they support? Conceptually, I think of dataclasses as a record or a struct, not as a sequence. (I'll admit that I think of namedtuples the same way, and almost never make use of their tuple-ness.) I would find it strange for dataclass to support a sequence API out of the box. -- Steve
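For reference, the explicit conversions mentioned above are dataclasses.astuple() and dataclasses.asdict(). A short sketch, with Point as an illustrative class:

```python
from dataclasses import asdict, astuple, dataclass


@dataclass
class Point:
    x: int
    y: int


p = Point(1, 2)

# Explicit conversion, then ordinary tuple unpacking.
x, y = astuple(p)

# The same data keyed by field name.
as_mapping = asdict(p)
```

Both helpers recurse into nested dataclasses and copy the values they find, so they do more work than a plain __iter__ would.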

Participants (4): Eric V. Smith, Ivan Levkivskyi, Neil Girdhar, Steven D'Aprano