Add collections.abc.Struct as virtual base class of namedtuple classes and dataclass-decorated classes.

I feel like this proposal is not quite right, but maybe the idea will provoke some thoughts about something similar that -would- be right. The idea first came to me upon realizing that since `namedtuple` classes have no special base class beyond `tuple`, there should be some way of identifying them as being primarily structural, even though they are instances of `tuple` which, in other cases, usually means something that is primarily sequential. It is easily possible to identify a namedtuple by checking to see whether it has an `_asdict` method, but it might be nice to formalize the way that struct-like objects are identified in a virtual base class and be something that had application beyond just checking whether a `tuple` is a `namedtuple`.

This is the kernel of a really useful idea, IMO. In fact, I already proposed something related to this-- unification of dict-making APIs-- back in May: https://mail.python.org/pipermail/python-ideas/2019-May/056491.html But I like the Struct ABC approach you've suggested, too. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler On Wed, Oct 16, 2019 at 1:30 PM Steve Jorgensen <stevej@stevej.name> wrote:

On Wed, Oct 16, 2019 at 05:27:31PM -0000, Steve Jorgensen wrote:
I feel like this proposal is not quite right, but maybe the idea will provoke some thoughts about something similar that -would- be right.
I don't think that "structural" is the word you are looking for. Lists and tuples and sets are all structural. That's why they're called "data structures". Perhaps "collection of named fields"? Sorry, I can't think of a shorter name. I'm going to put aside the question of whether named tuples are "primarily" tuples or named attributes. You are begging the question: why "should" there be some means of identifying named tuples specifically? I'm not saying that there shouldn't be, but you haven't given any reasons why this is useful. What calling code will want to treat these named tuples the same way and so need to distinguish them as "Structs"? namedtuple("Point", ("x", "y")) namedtuple("Employee", ("name", "position", "id")) There are also built-in named-tuples that don't include the _asdict method, such as sys.hash_info. Other "Structs" (collections of named fields) might include SimpleNamespace and Dataclasses, not to mention any object with named attributes holding data. So it isn't really clear to me that this is a useful, or even meaningful, category. At the very least, we need to establish what the category means other than "created by namedtuple() factory function". -- Steven

On Oct 16, 2019, at 13:48, Steven D'Aprano <steve@pearwood.info> wrote:
I think what he’s looking for is the adjective that goes with the quasi-word “struct” as used in C, and in Python’s ctypes and C API. The type name “Struct” isn’t bad here, but the adjective “structural” has to many other connotations. Maybe borrowing a word from a different language—record or datatype or something—would avoid that problem? But I think the big problem here is the one you identified. We can make a list of things that are “like C structs” (which should probably include structseq builtin/extension types, ctypes.Struct, third-party stuff like attrs and maybe numpy structured dtypes, in addition to namedtuple and dataclass). Maybe we could even come up with a way to either programmatically detect them or get them all registered. But what would you do with it? It might be useful for reflection if there were a way to enumerate all of the “struct fields” that worked for all “struct types”, but there isn’t. Also, while all of these things _allow_ you to define a plain-old-data struct, none of them _require_ it. You can have a dataclass, namedtuple subclass, structseq, ctypes.Struct, etc. that has all kinds of fancy behavior, including not just methods, but @property and weird __new__ overloads and even custom __getattribute__. And of course you can just as easily define a regular old class and then use it as a plain-old-data struct. So, would this even be useful as a hint, much less a guarantee, of anything?

Maybe borrowing a word from a different language—record or datatype or something—would avoid that problem?
Hmm, I don't think "record" would work. The terms "record", "struct", and "tuple" are far too synonymous with one another ( https://en.wikipedia.org/wiki/Record_(computer_science)). This term would have to be something that would apply to namedtuples (or similarly structured datatype) without applying to tuples as well. Perhaps something like "associative container" might be more viable? This would apply to any type that could be directly converted to or represented as a dictionary. On Wed, Oct 16, 2019 at 8:18 PM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:

On Oct 16, 2019, at 22:13, Kyle Stanley <aeros167@gmail.com> wrote:
Maybe borrowing a word from a different language—record or datatype or something—would avoid that problem?
Hmm, I don't think "record" would work. The terms "record", "struct", and "tuple" are far too synonymous with one another (https://en.wikipedia.org/wiki/Record_(computer_science)).
Typically there’s a distinction between a record and a tuple—in fact, two distinctions. A tuple or product type is just the product of the types of its fields. All tuples of a str and an int are the same type, (str, int). Even if you give that type multiple names. For example, if you define file=(str, int) to represent a path and an fd and later define person=(str, int) to represent a name and age, file and person are the same type. A record type or datatype is a distinct type from all other types, even if they happen to have the same field types. A file(fd: int, path: str) is not the same type as a person(age: int, name: str). Sometimes field names are optional (and a record type with field names is a “proper” record type). From a quick scan of the Wikipedia article you linked, it appears to agree with this:
A record type is a data type that describes such values and variables. Most modern computer languages allow the programmer to define new record types. The definition includes specifying the data type of each field and an identifier (name or label) by which it can be accessed. In type theory, product types (with no field names) are generally preferred due to their simplicity, but proper record types are studied in languages such as System F-sub.
I’m sure there’s at least one exception that uses “record” to mean a plain product type, but I don’t think any major languages do. So I don’t think there’s any danger of people expecting Record to include tuples based on analogy with other languages, or type-theory definition, or anything else.
Perhaps something like "associative container" might be more viable? This would apply to any type that could be directly converted to or represented as a dictionary.
Surely it’s dict itself (along with third-party things like sorteddict types) that’s the associative container. That’s how C++ and other languages use the term. And if you look up “associative container” on Wikipedia, it redirects to “associative array”, which is the abstract data type of things like dict. It has nothing to do with records, except in a few languages like JavaScript that merge OO objects and dictionaries into the same thing. Also, what exactly does “could be directly converted to or represented as a dictionary” mean? A namedtuple isn’t stored or represented as a dictionary, and the only way to convert it to a dictionary is by building a dictionary on the fly out of zipping the field names with the values. That’s if anything less direct than a normal class, where you just have to access the instance’s __dict__. Finally, why are we even bikeshedding this? As Steven already explained, there’s no point in trying to nail down details for something when it’s not even remotely clear what it’s meant to be, or even could be, used for.

Steve Jorgensen wrote:
Along with this idea, maybe `namedtuple` should became a type (not just a callable as it is now) for classes produced by calling `namedtuple()`. The `namedtuple` type could in turn inherit from `tuple` and from `Record` (or `Struct` or whatever name is decided for that).

See Raymond Hettinger's comment about doing something like that: https://bugs.python.org/msg74136 It is a key feature for named tuples that they are exactly equivalent to a
hand-written class.
So unless the core devs are given a very good reason to do that, it is definitely not going to happen.

On Oct 17, 2019, at 01:35, Steve Jorgensen <stevej@stevej.name> wrote:
Along with this idea, maybe `namedtuple` should became a type (not just a callable as it is now) for classes produced by calling `namedtuple()`. The `namedtuple` type could in turn inherit from `tuple` and from `Record` (or `Struct` or whatever name is decided for that).
If it’s the type of the classes, then it’s a metaclass. And it definitely shouldn’t inherit from tuple or Record; that’s mixing up the levels (like making the list type be a Sequence instead of making list objects be Sequences). If it’s the type of the objects, then each namedtuple is no longer its own type, which defeats the purpose of many uses of namedtuple. Also, it would be a very weird class where instantiating the class doesn’t give you an instance, but instead something that has to be called again to get an instance. If there were an exceptionally good reason for that, “very weird” might be acceptable, but I don’t think “making all namedtuple instances detectable as instances of Record” qualifies. Also, you could get what you want without changing namedtuple from a function: just register the class before returning it (or add it as a base up top) and you’re done. In fact. If there were a Record ABC, you could trivially do this yourself: def recordtuple(*args, **kw): cls = collections.namedtuple(*args, **kw) collections.abc.Record.register(cls) return cls But meanwhile, you still haven’t given any reason why Record would be useful. What exactly is it supposed to mean? When would you want to check it? What would you do with an object only if you know it’s a Record? This isn’t like checking for a POD type in C++ or a struct type in Swift, because none of the important things that tells you even make sense in Python.

Andrew Barnert wrote: [snip]
The reason I am interested in having it is to disambiguate collections that are primarily to be treated as collections of values from those that are primarily to be accessed as attributes with values. As an example, when converting to JSON, it would make more sense to represent a regular tuple as a JSON array, but it would make more sense to represent a namedtuple as a JSON object.

On 10/17/2019 2:36 PM, Steve Jorgensen wrote:
This seems specific to your application. I don't think the language can disambiguate this for you. I think the best you can do is use the known way of looking for namedtuples and dataclasses. Although I think that in dataclasses case, is_dataclass is an anti-pattern and I should not have added it. For your use case, what's the difference between these two classes: @dataclass class A: x: int y: float class B: def __init__(self, x, y): self.x = x self.y = y Do you really want code that treats these differently, just because one used some shortcut to write the __init__ function? Eric

Eric V. Smith wrote:
My thinking is that checking for type of `Struct` (or `Record` or whatever name it has) is most useful in cases when the object is also identifiable as a collection and is to disambiguate those that are primarily collections from those that are not. A miscellaneous class like… def __init__(self, x, y): self.x = x self.y = y …would not be a collection, so there would be no need to disambiguate it.

On Oct 17, 2019, at 16:03, Steve Jorgensen <stevej@stevej.name> wrote:
My thinking is that checking for type of `Struct` (or `Record` or whatever name it has) is most useful in cases when the object is also identifiable as a collection and is to disambiguate those that are primarily collections from those that are not.
Can you give us an actual use case—some code that you have, or at least something you could reasonably want to write—where you need to distinguish something’s structy-ness, but have to work around it, so this feature would make your code better?

This is the kernel of a really useful idea, IMO. In fact, I already proposed something related to this-- unification of dict-making APIs-- back in May: https://mail.python.org/pipermail/python-ideas/2019-May/056491.html But I like the Struct ABC approach you've suggested, too. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler On Wed, Oct 16, 2019 at 1:30 PM Steve Jorgensen <stevej@stevej.name> wrote:

On Wed, Oct 16, 2019 at 05:27:31PM -0000, Steve Jorgensen wrote:
I feel like this proposal is not quite right, but maybe the idea will provoke some thoughts about something similar that -would- be right.
I don't think that "structural" is the word you are looking for. Lists and tuples and sets are all structural. That's why they're called "data structures". Perhaps "collection of named fields"? Sorry, I can't think of a shorter name. I'm going to put aside the question of whether named tuples are "primarily" tuples or named attributes. You are begging the question: why "should" there be some means of identifying named tuples specifically? I'm not saying that there shouldn't be, but you haven't given any reasons why this is useful. What calling code will want to treat these named tuples the same way and so need to distinguish them as "Structs"? namedtuple("Point", ("x", "y")) namedtuple("Employee", ("name", "position", "id")) There are also built-in named-tuples that don't include the _asdict method, such as sys.hash_info. Other "Structs" (collections of named fields) might include SimpleNamespace and Dataclasses, not to mention any object with named attributes holding data. So it isn't really clear to me that this is a useful, or even meaningful, category. At the very least, we need to establish what the category means other than "created by namedtuple() factory function". -- Steven

On Oct 16, 2019, at 13:48, Steven D'Aprano <steve@pearwood.info> wrote:
I think what he’s looking for is the adjective that goes with the quasi-word “struct” as used in C, and in Python’s ctypes and C API. The type name “Struct” isn’t bad here, but the adjective “structural” has to many other connotations. Maybe borrowing a word from a different language—record or datatype or something—would avoid that problem? But I think the big problem here is the one you identified. We can make a list of things that are “like C structs” (which should probably include structseq builtin/extension types, ctypes.Struct, third-party stuff like attrs and maybe numpy structured dtypes, in addition to namedtuple and dataclass). Maybe we could even come up with a way to either programmatically detect them or get them all registered. But what would you do with it? It might be useful for reflection if there were a way to enumerate all of the “struct fields” that worked for all “struct types”, but there isn’t. Also, while all of these things _allow_ you to define a plain-old-data struct, none of them _require_ it. You can have a dataclass, namedtuple subclass, structseq, ctypes.Struct, etc. that has all kinds of fancy behavior, including not just methods, but @property and weird __new__ overloads and even custom __getattribute__. And of course you can just as easily define a regular old class and then use it as a plain-old-data struct. So, would this even be useful as a hint, much less a guarantee, of anything?

Maybe borrowing a word from a different language—record or datatype or something—would avoid that problem?
Hmm, I don't think "record" would work. The terms "record", "struct", and "tuple" are far too synonymous with one another ( https://en.wikipedia.org/wiki/Record_(computer_science)). This term would have to be something that would apply to namedtuples (or similarly structured datatype) without applying to tuples as well. Perhaps something like "associative container" might be more viable? This would apply to any type that could be directly converted to or represented as a dictionary. On Wed, Oct 16, 2019 at 8:18 PM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:

On Oct 16, 2019, at 22:13, Kyle Stanley <aeros167@gmail.com> wrote:
Maybe borrowing a word from a different language—record or datatype or something—would avoid that problem?
Hmm, I don't think "record" would work. The terms "record", "struct", and "tuple" are far too synonymous with one another (https://en.wikipedia.org/wiki/Record_(computer_science)).
Typically there’s a distinction between a record and a tuple—in fact, two distinctions. A tuple or product type is just the product of the types of its fields. All tuples of a str and an int are the same type, (str, int). Even if you give that type multiple names. For example, if you define file=(str, int) to represent a path and an fd and later define person=(str, int) to represent a name and age, file and person are the same type. A record type or datatype is a distinct type from all other types, even if they happen to have the same field types. A file(fd: int, path: str) is not the same type as a person(age: int, name: str). Sometimes field names are optional (and a record type with field names is a “proper” record type). From a quick scan of the Wikipedia article you linked, it appears to agree with this:
A record type is a data type that describes such values and variables. Most modern computer languages allow the programmer to define new record types. The definition includes specifying the data type of each field and an identifier (name or label) by which it can be accessed. In type theory, product types (with no field names) are generally preferred due to their simplicity, but proper record types are studied in languages such as System F-sub.
I’m sure there’s at least one exception that uses “record” to mean a plain product type, but I don’t think any major languages do. So I don’t think there’s any danger of people expecting Record to include tuples based on analogy with other languages, or type-theory definition, or anything else.
Perhaps something like "associative container" might be more viable? This would apply to any type that could be directly converted to or represented as a dictionary.
Surely it’s dict itself (along with third-party things like sorteddict types) that’s the associative container. That’s how C++ and other languages use the term. And if you look up “associative container” on Wikipedia, it redirects to “associative array”, which is the abstract data type of things like dict. It has nothing to do with records, except in a few languages like JavaScript that merge OO objects and dictionaries into the same thing. Also, what exactly does “could be directly converted to or represented as a dictionary” mean? A namedtuple isn’t stored or represented as a dictionary, and the only way to convert it to a dictionary is by building a dictionary on the fly out of zipping the field names with the values. That’s if anything less direct than a normal class, where you just have to access the instance’s __dict__. Finally, why are we even bikeshedding this? As Steven already explained, there’s no point in trying to nail down details for something when it’s not even remotely clear what it’s meant to be, or even could be, used for.

Steve Jorgensen wrote:
Along with this idea, maybe `namedtuple` should became a type (not just a callable as it is now) for classes produced by calling `namedtuple()`. The `namedtuple` type could in turn inherit from `tuple` and from `Record` (or `Struct` or whatever name is decided for that).

See Raymond Hettinger's comment about doing something like that: https://bugs.python.org/msg74136 It is a key feature for named tuples that they are exactly equivalent to a
hand-written class.
So unless the core devs are given a very good reason to do that, it is definitely not going to happen.

On Oct 17, 2019, at 01:35, Steve Jorgensen <stevej@stevej.name> wrote:
Along with this idea, maybe `namedtuple` should became a type (not just a callable as it is now) for classes produced by calling `namedtuple()`. The `namedtuple` type could in turn inherit from `tuple` and from `Record` (or `Struct` or whatever name is decided for that).
If it’s the type of the classes, then it’s a metaclass. And it definitely shouldn’t inherit from tuple or Record; that’s mixing up the levels (like making the list type be a Sequence instead of making list objects be Sequences). If it’s the type of the objects, then each namedtuple is no longer its own type, which defeats the purpose of many uses of namedtuple. Also, it would be a very weird class where instantiating the class doesn’t give you an instance, but instead something that has to be called again to get an instance. If there were an exceptionally good reason for that, “very weird” might be acceptable, but I don’t think “making all namedtuple instances detectable as instances of Record” qualifies. Also, you could get what you want without changing namedtuple from a function: just register the class before returning it (or add it as a base up top) and you’re done. In fact. If there were a Record ABC, you could trivially do this yourself: def recordtuple(*args, **kw): cls = collections.namedtuple(*args, **kw) collections.abc.Record.register(cls) return cls But meanwhile, you still haven’t given any reason why Record would be useful. What exactly is it supposed to mean? When would you want to check it? What would you do with an object only if you know it’s a Record? This isn’t like checking for a POD type in C++ or a struct type in Swift, because none of the important things that tells you even make sense in Python.

Andrew Barnert wrote: [snip]
The reason I am interested in having it is to disambiguate collections that are primarily to be treated as collections of values from those that are primarily to be accessed as attributes with values. As an example, when converting to JSON, it would make more sense to represent a regular tuple as a JSON array, but it would make more sense to represent a namedtuple as a JSON object.

On 10/17/2019 2:36 PM, Steve Jorgensen wrote:
This seems specific to your application. I don't think the language can disambiguate this for you. I think the best you can do is use the known way of looking for namedtuples and dataclasses. Although I think that in dataclasses case, is_dataclass is an anti-pattern and I should not have added it. For your use case, what's the difference between these two classes: @dataclass class A: x: int y: float class B: def __init__(self, x, y): self.x = x self.y = y Do you really want code that treats these differently, just because one used some shortcut to write the __init__ function? Eric

Eric V. Smith wrote:
My thinking is that checking for type of `Struct` (or `Record` or whatever name it has) is most useful in cases when the object is also identifiable as a collection and is to disambiguate those that are primarily collections from those that are not. A miscellaneous class like… def __init__(self, x, y): self.x = x self.y = y …would not be a collection, so there would be no need to disambiguate it.

On Oct 17, 2019, at 16:03, Steve Jorgensen <stevej@stevej.name> wrote:
My thinking is that checking for type of `Struct` (or `Record` or whatever name it has) is most useful in cases when the object is also identifiable as a collection and is to disambiguate those that are primarily collections from those that are not.
Can you give us an actual use case—some code that you have, or at least something you could reasonably want to write—where you need to distinguish something’s structy-ness, but have to work around it, so this feature would make your code better?
participants (8)
-
Andrew Barnert
-
Eric V. Smith
-
Greg Ewing
-
Kyle Stanley
-
MRAB
-
Ricky Teachey
-
Steve Jorgensen
-
Steven D'Aprano