namedtuple baseclass
Hello all,

I propose to add a base class for all namedtuples. Right now the 'namedtuple' function dynamically creates a class derived from 'tuple', which complicates things like dynamic dispatch. Basically, the only way of checking if an object is an instance of 'namedtuple' is to do "isinstance(o, tuple) and hasattr(o, '_fields')".

One possible approach would be to:

1. Rename the 'namedtuple' function to '_namedtuple'
2. Add a class 'namedtuple(tuple)', with its '__new__' method proxying the '_namedtuple' function
3. Modify the class template to derive namedtuples from the 'namedtuple' class, instead of 'tuple'

This way, it's possible to simply write 'isinstance(o, namedtuple)'.

I have a working patch that implements the above logic (all Python unit tests pass), so if you find this useful I can start an issue on bugs.python.org.

Thank you,
Yury
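The three steps could be approximated in pure Python roughly as follows. This is only a sketch and not the patch mentioned in the post: rather than editing the class template, it wraps the existing factory and mixes the new base class into the generated class. Importing the stdlib factory under the private name '_namedtuple' is an assumption made for illustration.

```python
from collections import namedtuple as _namedtuple  # stand-in for step 1 (the renamed factory)


class namedtuple(tuple):
    """Hypothetical common base class for all generated named tuples (step 2)."""

    __slots__ = ()

    def __new__(cls, typename, field_names):
        # Calling namedtuple(...) proxies to the original factory function.
        generated = _namedtuple(typename, field_names)
        # Approximation of step 3: the proposal edits the class template;
        # here the base class is mixed in after the fact instead.
        return type(typename, (generated, cls), {"__slots__": ()})


Point = namedtuple("Point", "x y")
p = Point(1, 2)
print(isinstance(p, namedtuple))       # True
print(isinstance((1, 2), namedtuple))  # False
```

Because the call returns a class rather than an instance of the base, existing code such as Point(1, 2) or subclassing Point keeps working unchanged in this sketch.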
I never liked this implementation of namedtuple with "exec". I remember some proposals (and even a working implementation) of namedtuple done with metaclasses. I don't remember why they were rejected. I think at least having a base class other than tuple is something useful.

+1

João Bernardo
Yeah, while I was working on the patch, I thought about rewriting it all
without the use of "exec". But that would be too much of a change 10 days
before RC1. Therefore, the proposed change is minimal, aimed to only
slightly improve the current design.
Yury
On Sat, Jan 11, 2014 at 06:04:06PM -0500, Yury Selivanov wrote:
Hello all,
I propose to add a baseclass for all namedtuples. Right now 'namedtuple' function dynamically creates a class derived from 'tuple', which complicates things like dynamic dispatch. Basically, the only way of checking if an object is an instance of 'namedtuple' is to do "isinstance(o, tuple) and hasattr(o, '_fields')".
Let me see if I understand your use-case. You want to dynamically dispatch on various objects. Given two objects:

    p1 = (23, 42)
    p2 = namedtuple("pair", "a b")(23, 42)
    assert p1 == p2

you want to dispatch p1 and p2 differently. Is that correct?

Then, given a third object:

    class Person(namedtuple("Person", "name sex age occupation id")):
        def say_hello(self):
            print("Hello %s" % self.name)

    p3 = Person("Fred Smith", "M", 35, "nurse", 927056)

you want to dispatch p2 and p3 the same. Is that correct?

If I am correct, I wonder what sort of code you are writing that wants to treat p1 and p2 differently, and p2 and p3 the same. To me, this seems ill-advised. Apart from tuple (and object), p2 and p3 should not share a common base class, because they have nothing in common.

[...]
This way, it's possible to simply write 'isinstance(o, namedtuple)'.

I am having difficulty thinking of circumstances where I would want to do that.

-1 on the idea.

-- Steven
Hi Steven,
On Sat, Jan 11, 2014 at 8:05 PM, Steven D'Aprano
On Sat, Jan 11, 2014 at 06:04:06PM -0500, Yury Selivanov wrote:
Hello all,
I propose to add a baseclass for all namedtuples. Right now 'namedtuple' function dynamically creates a class derived from 'tuple', which complicates things like dynamic dispatch. Basically, the only way of checking if an object is an instance of 'namedtuple' is to do "isinstance(o, tuple) and hasattr(o, '_fields')".
Let me see if I understand your use-case. You want to dynamically dispatch on various objects. Given two objects:
    p1 = (23, 42)
    p2 = namedtuple("pair", "a b")(23, 42)
    assert p1 == p2
you want to dispatch p1 and p2 differently. Is that correct?
Then, given a third object:
    class Person(namedtuple("Person", "name sex age occupation id")):
        def say_hello(self):
            print("Hello %s" % self.name)

    p3 = Person("Fred Smith", "M", 35, "nurse", 927056)
you want to dispatch p2 and p3 the same. Is that correct?
Well, it all depends on a use case ;) In my concrete use case - yes, more to that below.
If I am correct, I wonder what sort of code you are writing that wants to treat p1 and p2 differently, and p2 and p3 the same. To me, this seems ill-advised. Apart from tuple (and object), p2 and p3 should not share a common base class, because they have nothing in common.
Well, everything in python is a subclass/instance of object, so what? Yes, I think that different namedtuples should be an instance of some remote common parent, derived from tuple, because they are different, they *are* namedtuples after all. They have field names for the data stored in them, and that is what distinguishes them from plain tuples.
[...]
This way, it's possible to simply write 'isinstance(o, namedtuple)'.
I am having difficulty thinking of circumstances where I would want to do that.
My use case: I have a system that dumps Python objects to some intermediate format, which is later converted to HTML, or dumped in a terminal (for debugging, reporting, and other purposes). And I want to dump namedtuples with their field names/values (not as simple tuples).

I'm sure there are many more use cases than my current itch. Python has the richest and most beautiful OO facilities; we have lots of ABCs and an elegant exception tree, and everything is well structured. To me, it's logical that one of the most commonly used classes should have a proper base class.

- Yury
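A dumper of the kind described here might look roughly like the following. It is a minimal sketch using the duck-typing check available today; the function name 'dump' and the choice of OrderedDict as the intermediate format are assumptions for illustration, not the poster's actual system.

```python
from collections import OrderedDict, namedtuple


def dump(obj):
    """Convert obj into plain containers, keeping namedtuple field names."""
    if isinstance(obj, tuple) and hasattr(obj, "_fields"):
        # Duck-typed namedtuple: pair each field name with its dumped value.
        return OrderedDict((name, dump(value)) for name, value in zip(obj._fields, obj))
    if isinstance(obj, (tuple, list)):
        # Plain sequences are dumped positionally.
        return [dump(item) for item in obj]
    return obj


Pair = namedtuple("Pair", "a b")
print(dump(Pair(23, (1, 2))))
print(dump((23, 42)))
```

With a common base class, only the first condition would change to a single isinstance() call; the rest of the function would stay the same.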
See also http://bugs.python.org/issue7796 for a discussion of this issue. -- Eric.
Hi Eric, Thank you very much for bringing this up. I couldn't find that issue (perhaps, because I was looking for an open ticket).
From the discussion there, it seems that Raymond and Guido agreed to have a common base class for namedtuple for py3.3; however, that was in 2010/2011.
Perhaps, any doubts that existed at that time are not the case now?
Thanks,
Yury
On Sun, Jan 12, 2014 at 4:44 AM, Yury Selivanov
Perhaps, any doubts that existed at that time are not the case now?
Sometimes I feel that the various questions about the namedtuple class, records and similar proposals need a separate FAQ, but like everybody else I am too lazy to create one, so it never happens.
It sounds like the consensus there wasn't to have a base class for namedtuple, but instead to have an abc that all namedtuples, and C namedtuple-like types, would be registered with, and that would have no API beyond that of Sequence.
If I understand the original request in this thread, I'm not sure this would satisfy the use case.
He's looking to detect namedtuples so he can extract their names along with their values. Which is a perfectly reasonable thing to do for the kind of reflective code he wants to write. It would presumably use code like this:
    if isinstance(x, NamedTuple):
        d = OrderedDict(zip(x._fields, x))
        do_stuff(d)
But that won't work with any abstract NamedTuple, only one that has a _fields member that lists the field names. So you'd need to write this:
    if isinstance(x, NamedTuple):
        try:
            d = OrderedDict(zip(x._fields, x))
        except AttributeError:
            pass  # whoops, it's an os.stat_result or something
        else:
            do_stuff(d)
And at that point, the isinstance check isn't helping anything over the duck typing on _fields, which you can already do today.
So to satisfy this use case, you'd either need an actual namedtuple base class instead of an abc, or an abc that adds some API for getting the field names (or name-value pairs). Either of which seems reasonable--except for the odd quirk of having a public API in a class that's prefixed with an underscore. (If it's not prefixed with an underscore, it can conflict with a field name, which defeats the whole purpose of namedtuple.)
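The problem with a marker-only ABC can be reproduced in a few lines. In this sketch, 'NamedTuple' is a hypothetical ABC with no API beyond Sequence, and 'HandWrittenPair' is a made-up stand-in for a named tuple type that offers attribute access but no _fields.

```python
from collections import namedtuple
from collections.abc import Sequence


class NamedTuple(Sequence):
    """Hypothetical ABC: a pure marker with no API beyond Sequence."""

    __slots__ = ()


class HandWrittenPair(tuple):
    """Stand-in for a C-style named tuple: named access, but no _fields."""

    __slots__ = ()
    first = property(lambda self: self[0])
    second = property(lambda self: self[1])


Pair = namedtuple("Pair", "first second")
NamedTuple.register(Pair)
NamedTuple.register(HandWrittenPair)


def field_names(obj):
    """Return the field names if they can be discovered, else None."""
    if isinstance(obj, NamedTuple):
        # The isinstance check passed, yet _fields may still be missing.
        return getattr(obj, "_fields", None)
    return None


print(field_names(Pair(1, 2)))
print(field_names(HandWrittenPair((1, 2))))
```

Both objects pass the isinstance() check, but only one of them can report its field names, so the caller still has to probe for _fields.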
On Sun, Jan 12, 2014 at 6:53 PM, Andrew Barnert
So to satisfy this use case, you'd either need an actual namedtuple base class instead of an abc, or an abc that adds some API for getting the field names (or name-value pairs). Either of which seems reasonable--except for the odd quirk of having a public API in a class that's prefixed with an underscore. (If it's not prefixed with an underscore, it can conflict with a field name, which defeats the whole purpose of namedtuple.)
Is compatibility with the current namedtuple important, or can this be done another way? For instance, the fields could be retrieved with __getitem__ instead:

    # Hacking it in with a subclass. Gives no benefit
    # but is a proof of concept.
    class Point(namedtuple('Point', ['x', 'y'])):
        def __getitem__(self, which):
            if which=="fields": return self._fields
            return super().__getitem__(which)

    >>> a=Point(1,2)
    >>> a.x
    1
    >>> a.y
    2
    >>> a.fields
    Traceback (most recent call last):
      File "", line 1, in <module>
        a.fields
    AttributeError: 'Point' object has no attribute 'fields'
    >>> a["fields"]
    ('x', 'y')
    >>> a[0]
    1
    >>> a[1]
    2

Normally, __getitem__ will be used with integers (since this is basically a sequence, not a mapping). Would it break things to use a string in this way? It's guaranteed not to collide with either form of access (as a tuple, or as fields).

ChrisA
On Sun, Jan 12, 2014 at 07:17:56PM +1100, Chris Angelico wrote:
On Sun, Jan 12, 2014 at 6:53 PM, Andrew Barnert
wrote: So to satisfy this use case, you'd either need an actual namedtuple base class instead of an abc, or an abc that adds some API for getting the field names (or name-value pairs). Either of which seems reasonable--except for the odd quirk of having a public API in a class that's prefixed with an underscore. (If it's not prefixed with an underscore, it can conflict with a field name, which defeats the whole purpose of namedtuple.)
Is compatibility with the current namedtuple important, or can this be done another way? For instance, the fields could be retrieved with __getitem__ instead:
It's a tuple. It already uses __getitem__ to return items indexed by position. Adding magic so that obj["fields"] is an alias for obj._fields is, well, horrible.
    # Hacking it in with a subclass. Gives no benefit
    # but is a proof of concept.
    class Point(namedtuple('Point', ['x', 'y'])):
        def __getitem__(self, which):
            if which=="fields": return self._fields
            return super().__getitem__(which)

I think you missed that namedtuple-like objects written in C don't have a _fields attribute, e.g. os.stat_result. If you're going to insist that they add special handling in __getitem__, wouldn't it just be cleaner and simpler to get them to add a _fields attribute?

So...

* An ABC for namedtuple as agreed by Raymond and Guido wouldn't include any extra functionality beyond Sequence, so it doesn't guarantee the existence of _fields; that doesn't satisfy the use-case.

* An actual namedtuple superclass only works for the namedtuple factory function, not for C namedtuple-like types.

Both could be fixed -- Python could define a namedtuple superclass, and all relevant C types like os.stat_result could be changed to inherit from it. (But what of those which don't?) Or the ABC could be extended to include a promise of _fields, but that would exclude C types.

Either way, in order to satisfy this use-case, there would be a whole lot of changes needed. Or, you can duck-type:

    if isinstance(o, tuple):
        try:
            fields = o._fields
        except AttributeError:
            fields = ...  # fall back

Have I missed something?

-- Steven
On Sun, Jan 12, 2014 at 10:43 PM, Steven D'Aprano
It's a tuple. It already uses __getitem__ to return items indexed by position. Adding magic so that obj["fields"] is an alias for obj._fields is, well, horrible.
It's only an alias in the simple version that I did there. If it were to be used as a means of avoiding the _fields reserved name, it wouldn't be an alias. But yes, it is somewhat magical. I was hunting for an out-of-band way to get that sort of information. ChrisA
On Sun, Jan 12, 2014 at 10:46:51PM +1100, Chris Angelico wrote:
On Sun, Jan 12, 2014 at 10:43 PM, Steven D'Aprano
wrote: It's a tuple. It already uses __getitem__ to return items indexed by position. Adding magic so that obj["fields"] is an alias for obj._fields is, well, horrible.
It's only an alias in the simple version that I did there. If it were to be used as a means of avoiding the _fields reserved name, it wouldn't be an alias. But yes, it is somewhat magical. I was hunting for an out-of-band way to get that sort of information.
I still don't get how you think this solves the problem that the OP's use-case is to use isinstance() to identify namedtuples, then read _fields. But with the (proposed, not implemented) namedtuple ABC, isinstance(o, NamedTuple) could be true and o._fields fail. Breaking backwards compatibility to write that as o["fields"] instead won't help, because it will still fail:

    py> t = os.stat_result([1]*10)
    py> t["fields"]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: tuple indices must be integers, not str

Changing namedtuple is not enough.

Oh, and this is a backwards-compatibility breaking change, because _fields is part of the *public* API for namedtuple, despite the leading underscore.

So I fail to see how anything short of a massive re-engineering of not just namedtuple but also any C namedtuple-like types will satisfy the OP's use-case. Have I missed something?

-- Steven
On Sun, Jan 12, 2014 at 10:55 PM, Steven D'Aprano
I still don't get how you think this solves the problem that the OP's use-case is to use isinstance() to identify namedtuples, then read _fields.
That was a slightly tangential comment stemming from Andrew Barnert's remark that using _fields for a public API is quirky. (Which is why I quoted him in my post.) This would no longer use an underscore name for something public. That's all. ChrisA
Steven,
On Sun, Jan 12, 2014 at 6:55 AM, Steven D'Aprano
I still don't get how you think this solves the problem that the OP's use-case is to use isinstance() to identify namedtuples, then read _fields. But with the (proposed, not implemented) namedtuple ABC, isinstance(o, NamedTuple) could be true and o._fields fail.
If we decide to implement an ABC, then any class that satisfies it should implement '_fields' (and _make, and other namedtuple public methods) properly (this can be enforced in the ABC's '__subclasshook__')
Breaking backwards compatibility to write that as o["fields"] instead won't help, because it will still fail:
    py> t = os.stat_result([1]*10)
    py> t["fields"]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: tuple indices must be integers, not str
Changing namedtuple is not enough.
Oh, and this is a backwards-compatibility breaking change, because _fields is part of the *public* API for namedtuple, despite the leading underscore.
So I fail to see how anything short of a massive re-engineering of not just namedtuple but also any C namedtuple-like types will satisfy the OP's use-case. Have I missed something?
If we go with the ABC route, then we can simply implement '_fields' and the other namedtuple methods later for the low-level C structure that os.stat_result uses. But for now, stat_result is not a namedtuple (it lacks all of the namedtuple API). So I'm not sure that C namedtuple-like types should hold us back on this proposal.

BTW, ABC proposal aside: the current namedtuple implementation creates the class from a template with an "exec" call. For every namedtuple, its entire set of methods is created over and over again. Even for the sake of memory efficiency, having a base class with *some* of the common methods (which are currently in the template) is better.

- Yury
On Jan 12, 2014 5:52 AM, "Yury Selivanov"
BTW, ABC proposal aside: the current namedtuple implementation creates the class from a template with an "exec" call. For every namedtuple, its entire set of methods is created over and over again. Even for the sake of memory efficiency, having a base class with *some* of the common methods (which are currently in the template) is better.
It's a trade-off. We increase the definition-time cost by using exec, but minimize the cost of traversing the attribute lookup chain when using instances. The purely ABC approach in the referenced issue preserves this instance-favoring-optimization design. -eric
Eric,
On Sun, Jan 12, 2014 at 11:33 AM, Eric Snow
It's a trade-off. We increase the definition-time cost by using exec, but minimize the cost of traversing the attribute lookup chain when using instances. The purely ABC approach in the referenced issue preserves this instance-favoring-optimization design.
-eric
Correct me if I'm wrong, but what's the point of speeding up (2%?) attribute lookup on "_make", "__repr__", and other namedtuple methods? What matters is the performance of "__getitem__" and field property access, but that would be the same if a metaclass (or a simple "type" call) is used to construct namedtuples.

Anyways, I'm not proposing to touch the main bulk of the current implementation (and perhaps there are other reasons why it is as it is). The only thing I think would be nice to have (for now) is a base class for namedtuples other than tuple.

Thank you,
Yury
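To make the "base class plus a simple type() call" idea concrete, here is a minimal sketch. It is not the stdlib implementation: 'NamedTupleBase' and 'make_namedtuple' are hypothetical names, and keyword arguments and field-name validation are left out for brevity. The shared methods live once on the base class, while field access remains a direct property on each generated class.

```python
from collections import OrderedDict
from operator import itemgetter


class NamedTupleBase(tuple):
    """Hypothetical shared base holding the methods common to all named tuples."""

    __slots__ = ()
    _fields = ()

    def __new__(cls, *values):
        if len(values) != len(cls._fields):
            raise TypeError("expected %d arguments, got %d"
                            % (len(cls._fields), len(values)))
        return tuple.__new__(cls, values)

    @classmethod
    def _make(cls, iterable):
        return cls(*iterable)

    def _asdict(self):
        return OrderedDict(zip(self._fields, self))

    def __repr__(self):
        pairs = ", ".join("%s=%r" % pair for pair in zip(self._fields, self))
        return "%s(%s)" % (type(self).__name__, pairs)


def make_namedtuple(typename, field_names):
    """Build a named tuple class with a plain type() call instead of exec."""
    fields = tuple(field_names.split())
    namespace = {"__slots__": (), "_fields": fields}
    for index, name in enumerate(fields):
        # Field access stays a direct property on the generated class.
        namespace[name] = property(itemgetter(index))
    return type(typename, (NamedTupleBase,), namespace)


Point = make_namedtuple("Point", "x y")
p = Point(1, 2)
print(p, p.x, isinstance(p, NamedTupleBase))
```

In this layout only "_make", "_asdict" and "__repr__" pay for one extra step in the lookup chain, which is the cost Eric's reply refers to.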
On Jan 11, 2014, at 11:04 PM, Yury Selivanov
I propose to add a baseclass for all namedtuples. Right now 'namedtuple' function dynamically creates a class derived from 'tuple', which complicates things like dynamic dispatch.
A named tuple is a protocol, not a class. Here's the glossary entry:

'''
named tuple
    Any tuple-like class whose indexable elements are also accessible using named attributes (for example, time.localtime() returns a tuple-like object where the year is accessible either with an index such as t[0] or with a named attribute like t.tm_year).

    A named tuple can be a built-in type such as time.struct_time, or it can be created with a regular class definition. A full featured named tuple can also be created with the factory function collections.namedtuple(). The latter approach automatically provides extra features such as a self-documenting representation like Employee(name='jones', title='programmer').
'''
Basically, the only way of checking if an object is an instance of 'namedtuple' is to do "isinstance(o, tuple) and hasattr(o, '_fields')".
Yes, that is the correct way of doing it. ABCs weren't meant to replace all instances of duck typing.

Raymond

P.S. Here's a link to previous discussion on the subject: http://bugs.python.org/issue7796
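The glossary's notion of a protocol can be seen in a short session. The helper name below is made up for illustration; it only wraps the duck-typing check quoted in this thread.

```python
import time
from collections import namedtuple


def looks_like_collections_namedtuple(obj):
    """The duck-typing check from this thread, wrapped in a helper."""
    return isinstance(obj, tuple) and hasattr(obj, "_fields")


# A built-in named tuple: the same element by index and by attribute name.
now = time.localtime()
print(now[0] == now.tm_year)

# A factory-made named tuple adds extras such as the self-documenting repr.
Employee = namedtuple("Employee", "name title")
jones = Employee(name="jones", title="programmer")
print(jones)
print(looks_like_collections_namedtuple(jones))
print(looks_like_collections_namedtuple((1, 2)))
```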
On 1/12/2014 3:01 PM, Raymond Hettinger wrote:
On Jan 11, 2014, at 11:04 PM, Yury Selivanov wrote:
I propose to add a baseclass for all namedtuples. Right now 'namedtuple' function dynamically creates a class derived from 'tuple', which complicates things like dynamic dispatch.
A named tuple is a protocol, not a class. Here's the glossary entry: [...]
That is a really nice glossary entry. I had not seen it before.
Basically, the only way of checking if an object is an instance of 'namedtuple' is to do "isinstance(o, tuple) and hasattr(o, '_fields')".
Yes, that is the correct way of doing it.
That looks fine to me also, so I agree that nothing new is needed. -- Terry Jan Reedy
From: Steven D'Aprano
Changing namedtuple is not enough.
In fact, it's almost completely orthogonal to adding a NamedTuple ABC. Changing namedtuple shouldn't be necessary, and definitely won't be sufficient.
So I fail to see how anything short of a massive re-engineering of not just namedtuple but also any C namedtuple-like types will satisfy the OP's use-case. Have I missed something?
I said pretty much the same thing yesterday… but on further reflection, I think it's a lot simpler than it looks.

First, let's write collections.abc.NamedTuple:

    class NamedTuple(Sequence):
        @classmethod
        def __subclasshook__(cls, sub):
            if not issubclass(sub, collections.abc.Sequence):
                return False
            try:
                sub._fields
                return True
            except AttributeError:
                return NotImplemented

That's easy, and it works with namedtuple types with no change, and it should work with any Python wrapper type that's designed to emulate namedtuple without using it (e.g., if someone decides to write a custom implementation with a shared base class, so he can make all of his types share implementations for _make and friends, as has been suggested on this thread).

So, what about C types? Obviously they don't generally supply _fields -- or anything else useful. But most (all?) of the namedtuple-like types in builtins/stdlib are built with PyStructSequence, and adding _fields to them requires just a few lines at the end of PyStructSequence_InitType2:

    /* Build a tuple of the visible field names and publish it as _fields. */
    PyObject *_fields = PyTuple_New(desc->n_in_sequence);
    for (i = 0; i != desc->n_in_sequence; ++i) {
        PyObject *field = PyUnicode_FromString(desc->fields[i].name);
        PyTuple_SET_ITEM(_fields, i, field);
    }
    PyDict_SetItemString(dict, "_fields", _fields);

In fact, that might be worth doing even without the NamedTuple ABC proposal.

But StructSequence has only been an exposed, documented protocol since 3.3, so surely there are extension modules out there that do their namedtuple-like types manually. (In a quick look around, I couldn't find any examples -- although I did find a couple with Python wrappers that create a namedtuple around the result returned by a C implementation function -- but I'm sure they exist.)
Obviously you need to be able to get the field names from somewhere -- whether that's an attribute or method on the type, copy-pasting from documentation or source, or even parsing the repr of an instance or something -- but then you can just generate a wrapper from the type and its field names.

And we could just leave it at that: "Sorry, those aren't NamedTuple classes, but you can always implement a wrapper in Python yourself."

Or we could add a wrapper-generator to the collections module. Something like this:

    def namedtuplize(cls, fields):
        if isinstance(fields, str):
            fields = fields.split()
        class Sub:
            _fields = fields
            def __init__(self, *args, **kwargs):
                self.values = cls(*args, **kwargs)
            def __repr__(self):
                return repr(self.values)
            # a handful of other special methods that can't be getattrified
            def __getattr__(self, attr):
                return getattr(self.values, attr)
        return Sub

    statfields = 'st_mode st_ino st_dev st_nlink st_uid st_gid st_size st_atime st_mtime st_ctime'
    Stat = namedtuplize(os.stat_result, statfields)
    stats = (Stat(os.stat(f)) for f in os.listdir('.'))

(I'm using os.stat_result as an example, even though it's already a PyStructSequence so you wouldn't need it here, only for lack of a real-life example.)

And then you can write a wrapper around os.stat that returns a Stat instead of an os.stat_result.

Or, going the other way, in a quick&dirty script that just wraps a handful of these, you can even just wrap each object:

    def namedtuplify(obj, fields):
        return namedtuplize(type(obj), fields)(obj)

While the namedtuplize function could be useful in the stdlib, the namedtuplify function is less useful, there are many cases where it's a bad idea, and it's trivial to write yourself if you need it, so I wouldn't add that to collections, except maybe as a recipe in the docs.
One last thing: Either the ABC or the wrapper could also add _asdict and the other methods that can be easily derived from _fields, because they're useful, and I frequently see people doing _asdict by calling getattr(self, field) on each field.
Raymond, On January 12, 2014 at 3:01:42 PM, Raymond Hettinger (raymond.hettinger@gmail.com) wrote:
On Jan 11, 2014, at 11:04 PM, Yury Selivanov wrote:
I propose to add a baseclass for all namedtuples. Right now 'namedtuple' function dynamically creates a class derived from 'tuple', which complicates things like dynamic dispatch.
A named tuple is a protocol, not a class.
This line actually makes a lot of sense, thank you for the explanation. Since it’s a protocol, and a widely used one, then how about reopening a discussion (started in #7796) on adding an ABC ‘collections.abc.NamedTuple’? I understand the issue with structseq, but we can have the ABC now for regular named tuples. If/Once the named tuple API is implemented for structseqs, it will automatically conform to the proposed ABC. Thank you, Yury
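An ABC along these lines could be sketched as follows. This is an illustration only: no such class exists in collections.abc, and the set of required attribute names is an assumption drawn from the public API of collections.namedtuple.

```python
from abc import ABCMeta
from collections import namedtuple


class NamedTuple(metaclass=ABCMeta):
    """Hypothetical ABC matching tuple subclasses that offer the namedtuple API."""

    __slots__ = ()
    _required = ("_fields", "_make", "_asdict", "_replace")

    @classmethod
    def __subclasshook__(cls, sub):
        if cls is not NamedTuple or not issubclass(sub, tuple):
            return NotImplemented
        # Every required name must be defined somewhere in the MRO.
        if all(any(name in vars(base) for base in sub.__mro__)
               for name in cls._required):
            return True
        return NotImplemented


Point = namedtuple("Point", "x y")
print(isinstance(Point(1, 2), NamedTuple))
print(isinstance((1, 2), NamedTuple))
```

Any type that later grows the same attributes would be recognised without registration, which is the "automatically conform" behaviour described above.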
I don't think the proposed NamedTuple ABC adds anything on top of duck typing on _fields (or on whichever other method you need, and possibly checking for Sequence). As Raymond Hettinger summarized it nicely, namedtuple is a protocol, not a type.

But I think one of the ideas that came out of that discussion is worth pursuing on its own: giving a _fields member to every structseq type.

Most of the namedtuple-like classes in the builtins/stdlib, like os.stat_result, are implemented with PyStructSequence. Since 3.3, that's been a public, documented protocol. A structseq type is already a tuple. And it stores all the information needed to expose the fields to Python; it just doesn't expose them in any way. And making it do so is easy. (Either add it to the type __dict__ at type creation, or add a getter that generates it on the fly from tp_members.)

Of course a structseq can do more than a namedtuple. In particular, using a structseq via its _fields would mean that you miss its "non-sequence" fields, like st_mtime_ns. But then that's already true for using a structseq as a sequence, or just looking at its repr, so I don't think that's a problem. (The "visible fields" are visible for a reason…)

And this still wouldn't mean that _fields is part of the "named tuple protocol" described in the glossary, just that it's part of structseq types as well as collections.namedtuple types.

And this wouldn't give structseq an on-demand __dict__ so you can just call vars(s) instead of OrderedDict(zip(s._fields, s)).

Still, it seems like a clear win. A small patch, a bit of extra storage on each structseq type object (not on the instances), and now you can reflect on the most common kind of C named tuple types the same way you do on the most common kind of Python named tuple types.
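To make the "visible" versus "non-sequence" distinction concrete, here is a small sketch using attributes that structseq types already expose (n_sequence_fields and n_fields). It does not use _fields, since that is exactly what the proposal would add.

```python
import os

st = os.stat('.')

# Number of fields reachable by indexing/iteration (the "visible" ones)
visible = os.stat_result.n_sequence_fields
# Total number of fields, including attribute-only ones such as st_mtime_ns
total = os.stat_result.n_fields

print(len(st) == visible)          # the tuple view covers only visible fields
print(total >= visible)            # extra fields exist only as attributes
print(hasattr(st, 'st_mtime_ns'))  # reachable by name, not by index
```

The exact value of n_fields varies by platform, so the sketch compares it rather than printing a fixed number.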
Here's a quick patch:

diff -r bc5f257f5cc1 Lib/test/test_structseq.py
--- a/Lib/test/test_structseq.py  Sun Jan 12 14:12:59 2014 -0800
+++ b/Lib/test/test_structseq.py  Sun Jan 12 16:31:15 2014 -0800
@@ -28,6 +28,16 @@
         for i in range(-len(t), len(t)-1):
             self.assertEqual(t[i], astuple[i])
 
+    def test_fields(self):
+        t = time.gmtime()
+        self.assertEqual(t._fields,
+                         ('tm_year', 'tm_mon', 'tm_mday', 'tm_hour', 'tm_min',
+                          'tm_sec', 'tm_wday', 'tm_yday', 'tm_isdst'))
+        st = os.stat(__file__)
+        self.assertIn("st_mode", st._fields)
+        self.assertIn("st_ino", st._fields)
+        self.assertIn("st_dev", st._fields)
+
     def test_repr(self):
         t = time.gmtime()
         self.assertTrue(repr(t))
diff -r bc5f257f5cc1 Objects/structseq.c
--- a/Objects/structseq.c  Sun Jan 12 14:12:59 2014 -0800
+++ b/Objects/structseq.c  Sun Jan 12 16:31:15 2014 -0800
@@ -7,6 +7,7 @@
 static char visible_length_key[] = "n_sequence_fields";
 static char real_length_key[] = "n_fields";
 static char unnamed_fields_key[] = "n_unnamed_fields";
+static char _fields_key[] = "_fields";
 
 /* Fields with this name have only a field index, not a field name.
    They are only allowed for indices < n_visible_fields. */
@@ -14,6 +15,7 @@
 _Py_IDENTIFIER(n_sequence_fields);
 _Py_IDENTIFIER(n_fields);
 _Py_IDENTIFIER(n_unnamed_fields);
+_Py_IDENTIFIER(_fields);
 
 #define VISIBLE_SIZE(op) Py_SIZE(op)
 #define VISIBLE_SIZE_TP(tp) PyLong_AsLong( \
@@ -327,6 +329,7 @@
     PyMemberDef* members;
     int n_members, n_unnamed_members, i, k;
     PyObject *v;
+    PyObject *_fields;
 
 #ifdef Py_TRACE_REFS
     /* if the type object was chained, unchain it first
@@ -389,6 +392,19 @@
     SET_DICT_FROM_INT(real_length_key, n_members);
     SET_DICT_FROM_INT(unnamed_fields_key, n_unnamed_members);
 
+    _fields = PyTuple_New(desc->n_in_sequence);
+    if (!_fields)
+        return -1;
+    for (i = 0; i != desc->n_in_sequence; ++i) {
+        PyObject *field = PyUnicode_FromString(members[i].name);
+        PyTuple_SET_ITEM(_fields, i, field);
+    }
+    if (PyDict_SetItemString(dict, _fields_key, _fields) < 0) {
+        Py_DECREF(_fields);
+        return -1;
+    }
+    Py_DECREF(_fields);
+
     return 0;
 }
 
@@ -417,7 +433,8 @@
 {
     if (_PyUnicode_FromId(&PyId_n_sequence_fields) == NULL
         || _PyUnicode_FromId(&PyId_n_fields) == NULL
-        || _PyUnicode_FromId(&PyId_n_unnamed_fields) == NULL)
+        || _PyUnicode_FromId(&PyId_n_unnamed_fields) == NULL
+        || _PyUnicode_FromId(&PyId__fields) == NULL)
         return -1;
 
     return 0;
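For readers who don't follow the C, the core of the structseq.c change can be paraphrased in Python roughly as below. This is only a sketch of the loop's logic: build_fields and the member list are hypothetical stand-ins for the C member table, and it assumes every visible field has a name.

```python
def build_fields(member_names, n_in_sequence):
    """Paraphrase of the C loop: take the names of the first
    n_in_sequence members and freeze them into a tuple."""
    return tuple(member_names[i] for i in range(n_in_sequence))

# Hypothetical member table resembling time.struct_time:
# nine visible fields followed by two attribute-only ones.
members = ['tm_year', 'tm_mon', 'tm_mday', 'tm_hour', 'tm_min',
           'tm_sec', 'tm_wday', 'tm_yday', 'tm_isdst',
           'tm_zone', 'tm_gmtoff']

fields = build_fields(members, 9)
print(fields)
```

The tuple is built once per type and stored in the type's dict, which is why the cost falls on the type object rather than on each instance.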
On 01/12/2014 04:32 PM, Andrew Barnert wrote:
Here's a quick patch:
Please put the patch on the issue tracker[1]. Create a new issue if an appropriate one does not already exist. Thanks.
--
~Ethan~
[1] http://bugs.python.org
See http://bugs.python.org/issue20230 for the issue and patch. Thanks to Ethan Furman for telling me to post it there instead of here.
On 13 Jan 2014 11:19, "Andrew Barnert" wrote:
See http://bugs.python.org/issue20230 for the issue and patch. Thanks to Ethan Furman for telling me to post it there instead of here.
This approach sounds good to me for 3.5. The ABC recipe might make a good addition to the ActiveState cookbook.
Cheers,
Nick.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
participants (12)
- anatoly techtonik
- Andrew Barnert
- Chris Angelico
- Eric Snow
- Eric V. Smith
- Ethan Furman
- João Bernardo
- Nick Coghlan
- Raymond Hettinger
- Steven D'Aprano
- Terry Reedy
- Yury Selivanov