Why do we have two obvious ways to create a simple data structure? Let's deprecate one.

[Migrating the discussion from https://bugs.python.org/issue44768.] PEP 20 says:
There should be one-- and preferably only one --obvious way to do it.
There are two ways to create a simple named type to store data: collections.namedtuple and dataclasses.dataclass. I propose deprecating namedtuple. As far as the interface is concerned, a namedtuple is almost completely equivalent to a frozen dataclass, with some iterability syntactic sugar thrown in. I don't think there are use cases for namedtuple that would not be trivial to rewrite with dataclasses. As far as the implementation is concerned, namedtuple is faster. But if efficiency is the concern, why make our users decide? The library could choose the most efficient implementation on its end. C++ does something similar with std::vector<bool>: the standard library special-cases it to use a different, more compact data structure underneath. TL;DR: If dataclass is so good, why keep namedtuple around?
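A minimal sketch of the comparison above, assuming a simple two-field point type (PointNT and PointDC are hypothetical names for illustration). It shows what a namedtuple and a frozen dataclass share, and where the namedtuple's tuple behaviour (the "iterability syntactic sugar") sets it apart:

```python
from collections import namedtuple
from dataclasses import dataclass

# A namedtuple: a tuple subclass with named fields.
PointNT = namedtuple("PointNT", ["x", "y"])

# A frozen dataclass: an ordinary class with generated methods.
@dataclass(frozen=True)
class PointDC:
    x: int
    y: int

nt = PointNT(1, 2)
dc = PointDC(1, 2)

# Shared: attribute access and value equality between instances of a type.
print(nt.x, dc.x)                                # 1 1
print(nt == PointNT(1, 2), dc == PointDC(1, 2))  # True True

# Only the namedtuple is a sequence: indexing, len() and unpacking work.
a, b = nt
print(nt[0], len(nt))                                # 1 2
print(isinstance(nt, tuple), isinstance(dc, tuple))  # True False

# A namedtuple even compares equal to a plain tuple with the same items.
print(nt == (1, 2), dc == (1, 2))  # True False
```

Going the other way, `a, b = dc` raises TypeError, since a dataclass defines no `__iter__` unless you write one yourself.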

I was writing some code the other day, and it needed a quick-and-dirty data structure definition for a set of related variables. I looked back at other code to try to be consistent, and found that I had used dataclasses in some parts and namedtuples in others. Both seemed like the right thing to do at the time, almost to the extent that I could swap one for the other and it would still be the same code. You can easily get a dataclass represented as a tuple and vice versa. The way they work under the hood may differ, but the interfaces are very close. Two different modules are doing practically the same thing!
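A sketch of the round trip mentioned above, assuming a simple two-field type (PointNT and PointDC are hypothetical names). The standard helpers dataclasses.astuple(), dataclasses.asdict() and the namedtuple's own _asdict() method do the conversions:

```python
from collections import namedtuple
from dataclasses import asdict, astuple, dataclass

PointNT = namedtuple("PointNT", ["x", "y"])

@dataclass
class PointDC:
    x: int
    y: int

dc = PointDC(3, 4)

# dataclass -> plain tuple / dict, via helpers in the dataclasses module
as_tuple = astuple(dc)   # (3, 4)
as_dict = asdict(dc)     # {'x': 3, 'y': 4}

# ... -> namedtuple, positionally or by keyword
nt = PointNT(*as_tuple)  # PointNT(x=3, y=4)
nt_kw = PointNT(**as_dict)

# namedtuple -> dataclass, via the namedtuple's _asdict() method
back = PointDC(**nt._asdict())
print(back == dc)        # True
```

Note that astuple() and asdict() recurse into nested dataclasses and copy other values, so they produce independent copies rather than views.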

On Thu, Jul 29, 2021 at 9:37 AM <pavel@lexyr.com> wrote:
I was writing some code the other day, and it needed a quick-and-dirty data structure definition for a set of related variables. I looked back at other code to try to be consistent, and found that I had used dataclasses in some parts and namedtuples in others. Both seemed like the right thing to do at the time, almost to the extent that I could swap one for the other and it would still be the same code.
You can easily get a dataclass represented as a tuple and vice versa. The way they work under the hood may differ, but the interfaces are very close. Two different modules are doing practically the same thing!
Interfaces ARE frequently very similar, and that's deliberate. We have tuples, lists, dicts, all using subscript notation. That's a good thing! And if your code sometimes uses one and sometimes uses the other, that seems like fairly good evidence that you find both of them useful. I think you just answered your own question :)
ChrisA

It's not actually that, although that's a good point you are making. I didn't end up using both because each is more useful in certain small, niche cases; both times, I simply used whichever one came to mind first. The fact that two different classes in two different modules have APIs similar enough that one can be swapped for the other without a second thought and with very little breakage, depending (to oversimplify) on which StackOverflow answer ranks higher in the search, doesn't sit well with me at all.

I frequently use both namedtuples and data classes in contexts where the other one would not be appropriate. Yes, I also sometimes use an object where either would serve... In those cases, SimpleNamespace would mostly be fine too. So would a one-line class definition.
On Wed, Jul 28, 2021, 4:54 PM <pavel@lexyr.com> wrote:

Hi, I ran across this nice article a few days ago - https://death.andgravity.com/namedtuples which provides some answers as to why you might consider using named tuples. best, —titus

I'm with you; since dataclasses were introduced, namedtuple has not seen any use from me, though none of my uses have demanded ultra-high efficiency either. I wonder how many users are currently relying on namedtuple's __getitem__ semantics, though. That's functionality dataclasses do not (currently) have. Random thought I don't know the answer to: is there any reason __slots__ can't be used on a dataclass to improve efficiency?
On Wed, 2021-07-28 at 22:22 +0000, pavel@lexyr.com wrote:

On Thu, Jul 29, 2021 at 9:28 AM Paul Bryan <pbryan@anode.ca> wrote:
It can. It still doesn't get efficiency down to where a namedtuple is, and it also doesn't make it into a tuple. Allow me to simplify matters. Instead of using a regular tuple, you could just use a dict whose keys are consecutive integers. This would behave in a very similar way, and you could then subclass the dict to change the iteration behaviour (or even rely on old-style fallback iteration). Does that mean that tuples are nothing but special-purpose dicts for efficiency? Not even slightly. They are different data structures for different purposes. It's the same with namedtuple, dataclass, and all the other tools we have: they serve very different purposes. And even if one of them could truly be described as a higher-performance, less flexible version of another, there'd still be very little benefit to deprecating it. Remember that breaking people's code is a VERY serious concern.
ChrisA
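On the __slots__ question and the reply above, a sketch (hypothetical names) of a dataclass with __slots__ next to a namedtuple. It shows only the structural difference; it makes no claim about relative speed or memory use, which would need measuring:

```python
from collections import namedtuple
from dataclasses import dataclass

@dataclass
class SlottedPoint:
    # Hand-written __slots__ works with @dataclass as long as no field has a
    # default value; Python 3.10+ can generate it via @dataclass(slots=True).
    __slots__ = ("x", "y")
    x: int
    y: int

NTPoint = namedtuple("NTPoint", ["x", "y"])

sp = SlottedPoint(1, 2)
nt = NTPoint(1, 2)

# Neither instance carries a per-instance __dict__ ...
print(hasattr(sp, "__dict__"), hasattr(nt, "__dict__"))  # False False

# ... but only the namedtuple is actually a tuple, with tuple indexing.
print(isinstance(sp, tuple), isinstance(nt, tuple))      # False True
print(nt[0])                                             # 1
try:
    sp[0]
except TypeError:
    print("SlottedPoint is not subscriptable")
```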

One thing I still use NamedTuple for is type-safe heterogeneous iterable unpacking, which is only possible for tuples (and NamedTuple). E.g., I'd like to be able to express both:
    tx, rx = trio.MemoryChannel[int]()
and:
    with trio.MemoryChannel[int]() as channel:
        n.start_soon(worker, channel.receive_channel.clone())
On Thu, 29 Jul 2021, 00:36 Chris Angelico, <rosuav@gmail.com> wrote:
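A sketch of the kind of API described above, built on typing.NamedTuple. It does not use trio; SendChannel, ReceiveChannel, ChannelPair and open_channel are hypothetical stand-ins chosen to mirror the two styles in the message:

```python
from typing import NamedTuple

class SendChannel:
    """Hypothetical stand-in for the sending end of a channel."""

class ReceiveChannel:
    """Hypothetical stand-in for the receiving end of a channel."""

    def clone(self) -> "ReceiveChannel":
        return ReceiveChannel()

class ChannelPair(NamedTuple):
    # Each position has its own type, so a type checker knows the first
    # unpacked name is a SendChannel and the second a ReceiveChannel.
    send_channel: SendChannel
    receive_channel: ReceiveChannel

    # NamedTuple subclasses may define extra methods, e.g. to act as a
    # context manager (a real one would close both ends in __exit__).
    def __enter__(self) -> "ChannelPair":
        return self

    def __exit__(self, *exc_info: object) -> None:
        return None

def open_channel() -> ChannelPair:
    """Hypothetical factory returning both ends of a channel."""
    return ChannelPair(SendChannel(), ReceiveChannel())

# Style 1: positional unpacking, exactly as with a plain 2-tuple.
tx, rx = open_channel()

# Style 2: keep the pair together and use the named fields.
with open_channel() as channel:
    worker_rx = channel.receive_channel.clone()
```

A dataclass could offer the second style, but type checkers would not understand positional unpacking of it, which is the gap the message points at.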

From https://death.andgravity.com/namedtuples linked to above by Titus:
To me, this is a perfect case of behavior that namedtuples are suited to. It shows that namedtuples are still necessary despite the presence of dataclasses. The maintainability problems of namedtuples came from people using them where the order actually wasn't important. I wouldn't be opposed to adding this to the documentation for namedtuple::

    .. note::

       Also consider whether frozen :mod:`dataclasses` suits your use case.
       It may be a better tool to use if:

       * the members of your collection are not inherently ordered
       * you would like to apply type annotations

Feel free to chime in if you can think of other reasons to use a dataclass instead! I definitely feel that the backwards-compatibility implications of deprecating it are huge. Meanwhile, the presence of dataclasses means that there are far fewer use cases where namedtuples are the right tool. Let's document that!
On Wed, Jul 28, 2021 at 7:41 PM Thomas Grainger <tagrain@gmail.com> wrote:

I like this idea! It's true that by deprecating namedtuple we would lose the important semantics of order. However, if its use as a weaker dataclass is explicitly discouraged in the documentation, then much of the problem (i.e., people who want a simple data object using two incompatible things depending on which parts of the Python docs they read first) simply fades away.

On Thu, Jul 29, 2021 at 01:10:39AM -0000, pavel@lexyr.com wrote:
Why is that a problem? I can write notes with a pen or a pencil. Sometimes I *need* a pen, or I *need* a pencil. But most of the time I will choose whichever happens to come to mind at the time. No big deal. As far as I am concerned, the only problem here is that you are taking the "There should be one..." koan too seriously, and misinterpreting what it means. It does *not* mean "there shouldn't be more than one way to do two slightly different things". Dataclasses are not a special kind of object; they are a framework for automating a number of common methods used in class definitions. Dataclasses are just classes with a fancy meta-API to automate common methods. As such, you are right: with sufficient cleverness, we could probably develop a dataclass API to generate named tuples. But we already have an API to generate named tuples. In the history of Python, named tuples came first. But even if dataclasses had come first, we would still want named tuples. We would need an API to create them. Why do you care whether that API is spelled "@dataclass(...)" or "collections.namedtuple(...)"?
-- Steve
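To make the "just classes" point concrete, a simplified sketch comparing a @dataclass with a hand-written class defining roughly the methods it generates in this case (the real generated code also handles defaults, ordering, hashing options and more). Generated and HandWritten are hypothetical names:

```python
from dataclasses import dataclass, fields

@dataclass
class Generated:
    x: int
    y: int

# Roughly the __init__, __repr__ and __eq__ that @dataclass generates above.
class HandWritten:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        return f"{type(self).__qualname__}(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other: object) -> bool:
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

print(Generated(1, 2))    # Generated(x=1, y=2)
print(HandWritten(1, 2))  # HandWritten(x=1, y=2)

# Defining __eq__ without __hash__ leaves instances unhashable in both cases.
print(Generated.__hash__ is None, HandWritten.__hash__ is None)  # True True

# No special base class or metaclass is involved: a dataclass is a plain class.
print(type(Generated) is type, Generated.__mro__ == (Generated, object))  # True True
print([f.name for f in fields(Generated)])  # ['x', 'y']
```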

I'm with you on the backwards-compatibility front. Changing Python fast and for no particular reason incurs a big cost. Is the reason good enough to justify removing a chunk of the interface? Good question. To your dict argument: if there were a native Pythonic way to make a frozen list, what would a tuple's purpose be then, if not just another name for that? That also seems like an appropriate analogy.

On Thu, Jul 29, 2021 at 10:00 AM <pavel@lexyr.com> wrote:
I'm with you on the backwards-compatibility front. Changing Python fast and for no particular reason incurs a big cost. Is the reason good enough to justify removing a chunk of the interface? Good question.
Answer: Almost never. Case in point:
Python 3.10.0b2+ (heads/3.10:33a7a24288, Jun 9 2021, 20:47:39)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
The threading module *STILL* has these methods, despite them having been deprecated since Python 2.6. The benefit of removing them is almost zero, and the cost of removing them is breakage, so they are still there. (In this case, since there are perfect equivalents, threading.current_thread() and Thread.name, the deprecated ones are no longer listed in the docs, but the code still works.) The bar for removing functionality and breaking code is high.
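The "deprecated but the code still works" state described above usually means the old name survives as a thin alias that emits a DeprecationWarning. A sketch of that general pattern, with hypothetical function names (this is not the threading module's actual code):

```python
import warnings

def current_worker_name() -> str:
    """The new, preferred spelling (hypothetical API)."""
    return "worker-1"

def currentWorkerName() -> str:
    """Deprecated alias, kept so that existing callers keep working."""
    warnings.warn(
        "currentWorkerName() is deprecated; use current_worker_name()",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this line
    )
    return current_worker_name()

# Old code still runs and gets the same answer, plus a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    name = currentWorkerName()

print(name)                         # worker-1
print(caught[0].category.__name__)  # DeprecationWarning
```

Keeping the alias costs almost nothing, while removing it breaks every caller that still uses the old spelling.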
To your dict argument: if there were a native Pythonic way to make a frozen list, what would a tuple's purpose be then, if not just another name for that? That also seems like an appropriate analogy.
The question would first be: what purpose would a frozen list serve that a tuple does not? Once you answer that, you can then answer the question of whether a tuple is still necessary. In the case of namedtuple and dataclass, this has already been answered.
ChrisA

On Wed, Jul 28, 2021 at 11:58:37PM -0000, pavel@lexyr.com wrote:
We do have a Pythonic way to make a frozen list. It is spelled "tuple". I know that people often say that lists are intended for homogeneous data like [1, 2, 3] and tuples are intended for heterogeneous data like (1, 2.0, "three") but that's not mandatory.
-- Steve
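A small illustration of the "tuple is the frozen list" point: converting a list to a tuple keeps the same items in an immutable, hashable container, and nothing requires those items to be of different types:

```python
# A list "frozen" into a tuple: same items, but immutable.
items = [1, 2.0, "three"]
frozen = tuple(items)

try:
    frozen[0] = 99
    mutated = True
except TypeError:          # tuples do not support item assignment
    mutated = False
print(mutated)             # False

# Immutability (with hashable items) also makes a tuple usable as a dict key
# or set member, which a list is not.
lookup = {frozen: "ok"}
print(lookup[(1, 2.0, "three")])  # ok

# Nothing forces a tuple to be heterogeneous, either.
homogeneous = (1, 2, 3)
print(homogeneous.index(2))       # 1
```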

mypy has plans to support heterogeneous tuple representation, it turns out. This is tracked here: https://github.com/python/mypy/issues/5152, albeit not very actively.

What about creating a dataclasses.datatuple decorator that replaces typing.NamedTuple and offers the same (or a restricted) interface as regular dataclasses? This would make explicit the distinction between a mutable, object-like dataclass and an immutable, tuple-like named-/datatuple. With this, we could also get rid of the fake-immutable frozen dataclasses. In the same way, a dataclasses.datadict could replace typing.TypedDict, which is very limited at this point anyway. So instead of trying to merge different concepts, let's create one interface to rule them all. Because, after all: "There should be one-- and preferably only one --obvious way to do it."
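One way to read the "fake-immutable" remark above, sketched with hypothetical names: a frozen dataclass enforces immutability through its generated __setattr__, which object.__setattr__ can sidestep, whereas a namedtuple's values live in the underlying tuple and its fields are read-only:

```python
from collections import namedtuple
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int

NTPoint = namedtuple("NTPoint", ["x", "y"])

fp = FrozenPoint(1, 2)
nt = NTPoint(1, 2)

# Normal attribute assignment is rejected by both.
try:
    fp.x = 10
except FrozenInstanceError:  # subclass of AttributeError
    print("frozen dataclass rejected fp.x = 10")
try:
    nt.x = 10
except AttributeError:
    print("namedtuple rejected nt.x = 10")

# The frozen dataclass is only guarded by its generated __setattr__;
# calling object.__setattr__ directly bypasses it (the generated __init__
# of a frozen dataclass uses the same route to set fields).
object.__setattr__(fp, "x", 10)
print(fp.x)  # 10

# A namedtuple's fields are read-only descriptors over tuple storage,
# so the same bypass fails.
try:
    object.__setattr__(nt, "x", 10)
except AttributeError:
    print("namedtuple still holds", nt.x)  # namedtuple still holds 1
```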

On Thu, Jul 29, 2021 at 9:19 PM Joren Hammudoglu <jhammudoglu@gmail.com> wrote:
https://xkcd.com/927/
ChrisA

participants (10)
- Chris Angelico
- David Mertz
- Eric V. Smith
- Jack DeVries
- Joren Hammudoglu
- Paul Bryan
- pavel@lexyr.com
- Steven D'Aprano
- Thomas Grainger
- Titus Brown