A subclassing API for named tuples?

An exchange with Antoine in one of the enum threads sparked a thought. A recurring suggestion for collections.namedtuple is that it would be nice to be able to define them like this (as it not only avoids having to repeat the class name, but also allows them to play nicely with pickle and other name-based reference mechanisms):

    class MyTuple(collections.NamedTuple):
        __fields__ = "a b c d e".split()

However, one of Raymond's long-standing objections to such a design for namedtuple is the ugliness of people having to remember to include the right __slots__ definition to ensure it doesn't add any storage overhead above and beyond that of the underlying tuple. For the intended use case as a replacement for short tuples, an unused dict per instance is a *big* wasted overhead, so that concern can't be dismissed as premature optimisation.
The thought that occurred to me, though, is that the right metaclass definition allows the default behaviour of __slots__ to be flipped, so that you get "__slots__ = ()" defined in your class namespace automatically, and you have to write "del __slots__" to get normal class behaviour back.
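A minimal sketch of the flipped default, assuming it is done through the metaclass `__prepare__` hook: the namespace handed to the class body already contains `__slots__ = ()`, so a `del __slots__` in the body removes it again. The names `AutoSlotsMeta`, `SlimTuple`, `Point` and `FatPoint` are hypothetical, and the sketch covers only the slots behaviour, not any `__fields__` handling a real `collections.NamedTuple` would need.

```python
class AutoSlotsMeta(type):
    """Hypothetical metaclass: every class body starts with __slots__ = ()."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # The class body executes in this pre-seeded namespace, so instances
        # get no per-instance __dict__ unless the body runs "del __slots__".
        return {"__slots__": ()}


class SlimTuple(tuple, metaclass=AutoSlotsMeta):
    """Tuple base whose subclasses are slot-only by default."""


class Point(SlimTuple):
    """Gets __slots__ = () without writing it (metaclass is inherited)."""


class FatPoint(SlimTuple):
    del __slots__  # opt back in to ordinary class behaviour


p = Point((1, 2))
fp = FatPoint((1, 2))
fp.label = "has a __dict__"  # allowed: FatPoint instances have a __dict__
```

Because the metaclass is inherited, only the base class has to name it; subclasses pick up the default silently, which is exactly the "magic" being asked about.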
So, what do people think? Too much magic? Or just the right amount to allow a cleaner syntax for named tuple definitions, without inadvertently encouraging people to do bad things to their memory usage? (Note: for backwards compatibility reasons, we couldn't use a custom metaclass for the classes returned by the existing collections.namedtuple API. However, we could certainly provide a distinct collections.NamedTuple type which used a custom metaclass to behave this way.) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, 14 Feb 2013 23:19:51 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
You don't *have* to remember to include it. You just have to include it if you really care about the memory footprint. Which is something that is easy to mention in the documentation. (also, an empty dict that never gets accessed is not much of a problem performance-wise) Speaking for myself, most uses of namedtuple are not performance-critical. They are for convenience: I want an immutable, hashable, comparable record-like class (with a useful definition of equality, which is also very convenient for unit tests :-)) and I don't want to write that behaviour by hand every time. Regards Antoine.
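To make the trade-off concrete, here is a small sketch (hypothetical names `_Point`, `Point`, `LoosePoint`) of the only difference at stake: whether the subclass body contains `__slots__ = ()`. Without it, every instance silently gains a `__dict__`, which also means arbitrary new attributes are accepted.

```python
from collections import namedtuple

_Point = namedtuple("_Point", "x y")


class Point(_Point):
    __slots__ = ()  # no per-instance __dict__: same footprint as the base tuple


class LoosePoint(_Point):
    pass  # forgetting __slots__ gives each instance a __dict__


p = Point(1, 2)
lp = LoosePoint(1, 2)

lp.extra = "accepted"  # works, because LoosePoint instances have a __dict__

try:
    p.extra = "rejected"
except AttributeError:
    slotted_rejects_new_attrs = True
else:
    slotted_rejects_new_attrs = False
```

Both classes still compare equal to the plain tuple `(1, 2)`; the difference is purely per-instance storage and attribute behaviour.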

On Feb 14, 2013 6:20 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
Something similar that I've been using:

    @as_namedtuple("a b c d e")
    class MyTuple:
        """My namedtuple with extra stuff."""
        @property
        def something_special(self):
            """..."""
            return ...

Support for default values is also something I've added, but that's relatively orthogonal to this discussion. -eric
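Eric's implementation is not shown in the thread, so the following is only a hypothetical sketch of how such an `as_namedtuple` decorator could work: it builds a namedtuple base from the field names, copies the decorated class body onto a new subclass, and adds `__slots__ = ()` automatically. Default values are not handled, and the `total` property stands in for `something_special`.

```python
from collections import namedtuple


def as_namedtuple(field_names):
    """Hypothetical decorator: rebuild the decorated class on a namedtuple base."""

    def decorator(cls):
        base = namedtuple(cls.__name__, field_names)
        # Copy what the class body defined (methods, properties, docstring),
        # skipping the descriptors that type() adds to every ordinary class.
        namespace = {
            key: value
            for key, value in vars(cls).items()
            if key not in ("__dict__", "__weakref__")
        }
        namespace["__slots__"] = ()  # keep instances as small as the tuple
        return type(cls.__name__, (base,), namespace)

    return decorator


@as_namedtuple("a b c d e")
class MyTuple:
    """My namedtuple with extra stuff."""

    @property
    def total(self):  # hypothetical stand-in for something_special
        return sum(self)


t = MyTuple(1, 2, 3, 4, 5)
```

Since the class statement binds the rebuilt class under its own name, the class name is written only once.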

On 15/02/13 00:19, Nick Coghlan wrote:
How would that differ from this?

    class MyTuple(collections.namedtuple("MyTupleParent", "a b c d e")):
        pass

Apart from the DRY violation in the class name, I find that perfectly acceptable, and it seems to work fine with pickling:

    py> t = MyTuple(2, 4, 8, 16, 32)
    py> t
    MyTuple(a=2, b=4, c=8, d=16, e=32)
    py> s = pickle.dumps(t)
    py> u = pickle.loads(s)
    py> u == t
    True

-- Steven
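A runnable version of Steven's pattern, with one assumption added: `__slots__ = ()` in the subclass body, which his snippet omits. It shows where the DRY violation lives. The repr uses the subclass name, while the intermediate base keeps the second name passed to the factory.

```python
import collections


class MyTuple(collections.namedtuple("MyTupleParent", "a b c d e")):
    __slots__ = ()  # not in the original snippet; avoids a per-instance __dict__


t = MyTuple(2, 4, 8, 16, 32)
print(t)                            # MyTuple(a=2, b=4, c=8, d=16, e=32)
print(MyTuple.__bases__[0].__name__)  # MyTupleParent
```

Pickling works as in Steven's session as long as `MyTuple` is defined at module top level, since pickle looks the class up by module and name.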

On 15/02/13 09:40, Antoine Pitrou wrote:
Exactly the same thing can be said about the __fields__ line. The only difference is that in one case you reach "unwieldy" a little sooner than in the other. There are well-known ways to deal with excessively long lines of code which don't require a new collection type.

    class MyClassWithAnExtremelyLongName(
            collections.namedtuple("Cheese",
                ("cheddar swiss ricotta camembert gouda parmesan brie limburger havarti"
                 " danish_blue roquefort greek_feta provolone mozzarella edam maasdam"
                 " stilton wensleydale red_leicester american colby monterey_jack kapiti"
                 " casu_marzu")
                )
            ):
        pass

Style arguments about the placement of closing brackets to /dev/null :-)

I'm simply not seeing enough benefit to NamedTuple to make up for the invariable confusion between NamedTuple and namedtuple. -- Steven

On Fri, 15 Feb 2013 10:17:32 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
I don't find that readable at all. Having many indentations in a single declaration line makes things quite messy in my opinion, and your important class declaration is now drowning in a sea of literals. The equivalent __fields__ *would* be readable, if properly formatted. Regards Antoine.

On 15/02/13 18:03, Antoine Pitrou wrote:
You don't have to stick the literals in the class definition. This is an easy problem to solve with existing techniques:

    from collections import namedtuple
    FIELDNAMES = """..."""  # Format it however you like.
    class MyClassWithAnExtremelyLongName(namedtuple("Cheese", FIELDNAMES)):
        pass

There is no need to introduce confusion and uncertainty ("Should I use namedtuple or NamedTuple?"), to say nothing of the invariable cases where people use the wrong one and have to deal with the cryptic error:

    py> class C(collections.namedtuple):
    ...     pass
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: function() argument 1 must be code, not str

Moving the field name definitions outside of the namedtuple into a magic dunder attribute adds complexity. To use namedtuple, I just need to remember the calling signature, which is trivial. To use the proposed NamedTuple, I have to remember a magic dunder attribute __fields__ that is not used anywhere else. I don't see this as a win for simplicity. -- Steven
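A sketch of both halves of Steven's point, using a shortened, made-up field list. namedtuple turns commas into spaces and then splits on any whitespace, so a triple-quoted constant can be laid out across lines freely. Mistakenly subclassing the factory function itself fails with a `TypeError`; the exact message differs between Python versions, so only the exception type is relied on here.

```python
from collections import namedtuple

# Field names kept in a module-level constant; newlines and indentation
# inside the string are just whitespace separators.
FIELDNAMES = """
    cheddar swiss ricotta camembert
    gouda parmesan brie limburger
"""


class MyClassWithAnExtremelyLongName(namedtuple("Cheese", FIELDNAMES)):
    __slots__ = ()


# Subclassing the factory function instead of a class it returns.
subclass_error = None
try:
    class C(namedtuple):
        pass
except TypeError as exc:
    subclass_error = exc
```

Antoine's objection in the next message still applies: `FIELDNAMES` is an extra module-level name.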

On Sat, 16 Feb 2013 15:51:28 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
Still not very elegant IMO (and it introduces spurious globals in the module). Parameters for class construction shouldn't be outside of the class declaration. Really, the class declaration syntax has been *designed* to deal with all of this. It's counter-productive to try not to use it. (But, yes, now that the official namedtuple API is the one we know, I agree that having two ways to do it may be one too many.) Regards Antoine.

16.02.2013 11:18, Antoine Pitrou wrote:
I agree. I see some nicer (IMHO) alternatives... Apart from the recipe I mentioned in the recent post (although I am *not* convinced it should be added to the stdlib), some decorator-based way may be nice, e.g.:

    @namedtuple(fields='length weight is_poisonous')
    class Snake:
        def hiss(self):
            return 'hiss' + self.length * 's'

Cheers. *j

17.02.2013 22:38, Steven D'Aprano wrote:
No, not at all. It only means that signature specification of the namedtuple factory function would need to be extended a bit (or a separate decorator, such as namedtuple.subclass, would need to be added as an attribute) to support returning a decorator instead of a ready named tuple type.
I don't understand what you mean. It's obvious that a named tuple *type* must be callable (like any other instantiable type) and that in 99% of cases named tuple *instances* should not be... Cheers. *j

Please remember that the Py3k docs include a link to my namedtuple.abc recipe, which covers most of the issues discussed in this thread (+ ABC registration/isinstance/issubclass checking). Cheers. *j PS. If we wanted to have the "automatic-always-__slots__" feature (discussed in this thread), we would probably need to dedent line #57 of the recipe by one indentation level... PPS. As of Py3.3, the recipe is a bit outdated, as it uses the deprecated @abstractproperty, but that can easily be fixed.

On 02/14/2013 05:19 AM, Nick Coghlan wrote:
A recurring suggestion for collections.namedtuple is that it would be nice to be able to define them like this [...]
FWIW, I think namedtuple is overused. Not that there's anything innately wrong with namedtuple; it's just that I think too many types are iterable which shouldn't be. (Nobody unpacks the stat_result anymore, that's like 1995 man.) I suggest people use types.SimpleNamespace unless iterability is specifically required. //arry/
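A small comparison behind Larry's suggestion, assuming a hypothetical two-field `Stat` record. Both types give attribute access, but only the named tuple is iterable, which also makes it unpackable and equal to any plain tuple with the same items. `types.SimpleNamespace` is not iterable and is mutable.

```python
from collections import namedtuple
from types import SimpleNamespace

Stat = namedtuple("Stat", "mode size")  # hypothetical two-field record
as_tuple = Stat(mode=0o644, size=1024)
as_ns = SimpleNamespace(mode=0o644, size=1024)

# Only the tuple can be unpacked positionally ...
mode, size = as_tuple
# ... and it compares equal to a plain tuple with the same items.
equal_to_plain_tuple = as_tuple == (0o644, 1024)

# The namespace rejects iteration but allows attributes to be rebound.
try:
    iter(as_ns)
    ns_iterable = True
except TypeError:
    ns_iterable = False
as_ns.size = 2048
```

Whether that iterability is a feature or an accident of the type is exactly the judgement call Larry and Nick discuss.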

On Fri, Feb 15, 2013 at 5:07 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Indeed. In particular, consuming namespaces from an iterator is substantially more annoying than consuming a tuple. I'd put the size cutoff for switching over to a namespace somewhere around the 5 item mark. At 2-3, the tuple is often clearly superior, at 4 it's arguable, at 5, unpacking starts to get a bit hard to read. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Feb 14, 2013, at 5:19 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
To me, this smells of over-engineering. For most uses, the current form of named tuple is simple and clean:

    CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])

The current form of namedtuple is also very flexible. The docs for it show how it can easily be subclassed to add computed fields, new reprs, etc. It also works great for implementing prototype instances, enums, etc. And it has a verbose option that makes it self-documenting. The namedtuple API was born out of mixing the best parts of many different implementations found in the field. It went through extensive evaluation, review and refinement as a recipe on ASPN. IMO, there is zero need to wreck its simplicity by throwing metaclass firepower into the mix. We really don't need a second way to do it. For people who care about the memory used by subclasses, it takes less effort to learn how to use slots than it does to learn and remember a second API for namedtuples. After all, the sole justification for __slots__ is to prevent the creation of an instance dictionary. That is what it's for. I think your real issue doesn't have anything to do with named tuples in particular. Instead, your issue is with subclassing *any* class that uses __slots__. Raymond
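A sketch of the subclassing style Raymond refers to, applied to his `CacheInfo` example (the same four fields that `functools.lru_cache`'s `cache_info()` reports). The `hit_rate` property is a hypothetical computed field. `__slots__ = ()` is the one extra line a memory-conscious subclass needs, and `_replace` shows the prototype-instance idiom.

```python
from collections import namedtuple


class CacheInfo(namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])):
    __slots__ = ()  # keeps instances free of a per-instance __dict__

    @property
    def hit_rate(self):
        """Hypothetical computed field: fraction of lookups that hit."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


info = CacheInfo(hits=3, misses=1, maxsize=128, currsize=4)

# Prototype instance: derive variants with _replace() instead of retyping.
default = CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)
updated = default._replace(hits=10)
```

`_replace` returns a new instance and leaves the prototype untouched, since tuples are immutable.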

On Mon, Feb 18, 2013 at 7:46 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
My real issue is with interminable enum discussions, this thread was merely a side effect of enum-induced frustration :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, 18 Feb 2013 21:48:48 -0800, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
The problem with enum is that since it doesn't unlock any particularly interesting feature, it must at least cater to all existing use cases to be desirable for stdlib inclusion. Hence the long discussions. Regards Antoine.

Participants (9):
- Antoine Pitrou
- Eric Snow
- Greg Ewing
- Gregory P. Smith
- Jan Kaliszewski
- Larry Hastings
- Nick Coghlan
- Raymond Hettinger
- Steven D'Aprano