Differences creating tuples and collections.namedtuples
Terry Reedy
tjreedy at udel.edu
Mon Feb 18 16:28:43 EST 2013
On 2/18/2013 6:47 AM, John Reid wrote:
> I was hoping namedtuples could be used as replacements for tuples
> in all instances.
This is a mistake in the following two senses. First, tuple is a class
with instances while namedtuple is a class factory that produces
classes. (One could think of namedtuple as a metaclass, but it was not
implemented that way.) Second, a tuple instance can have any length and
different instances can have different lengths. On the other hand, all
instances of a particular namedtuple class have a fixed length. This
affects their initialization. So does the fact that Oscar mentioned,
that fields can be initialized by name.
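Both points can be seen in a short sketch. Point is a hypothetical class made up for illustration; it does not appear in the original question.

```python
from collections import namedtuple

# namedtuple is a factory: calling it returns a new subclass of tuple.
Point = namedtuple('Point', 'x y')   # hypothetical example class
assert isinstance(Point, type) and issubclass(Point, tuple)

p = Point(1, 2)        # fields given by position
q = Point(y=2, x=1)    # fields initialized by name
assert p == q == (1, 2)

# Every Point has exactly two fields, so a third argument is rejected,
# whereas tuple instances may have any length.
error = None
try:
    Point(1, 2, 3)
except TypeError as exc:
    error = exc
```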
> There seem to be some differences between how tuples and namedtuples
> are created. For example with a tuple I can do:
>
> a=tuple([1,2,3])
But no sensible person would ever do that, since it creates an
unnecessary list and is equivalent to
    a = 1, 2, 3
The actual api is tuple(iterable). I presume you know that, but it gets
to the question you ask about 'why the difference?'. The only reason to
use an explicit tuple() call is when you already have an iterable,
possibly of unknown length, rather than the individual field objects. In
the latter case, one should use a display.
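As a small illustration of the two spellings (the names a, b, and squares are only for this sketch):

```python
a = 1, 2, 3             # tuple display: no intermediate list is built
b = tuple([1, 2, 3])    # builds a list first, then copies it into a tuple
assert a == b

# tuple() is the right tool when the input is already an iterable,
# possibly of unknown length.
squares = tuple(n * n for n in range(4))
```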
> with namedtuples I get a TypeError:
>
> from collections import namedtuple
> B=namedtuple('B', 'x y z')
> b=B([1,2,3])
There is no namedtuple B display, so one *must* use an explicit call
with the proper number of args. The simplest possibility is B(val0,
val1, val2). Initializing a namedtuple from an iterable is unusual, and
hence gets the longer syntax, B(*iterable) or B._make(iterable). In other
words, the typical use case for a namedtuple class is to change a
statement that uses a tuple display, such as
    return a, b, c
to
    return B(a, b, c)
or
    x = a, b, c
to
    x = B(a, b, c)
It is much less common to change tuple(iterable) to B(iterable).
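For the less common case, a sketch of the ways to build a B from an existing iterable; the list named values is made up for the example.

```python
from collections import namedtuple

B = namedtuple('B', 'x y z')
values = [1, 2, 3]       # hypothetical data that already exists as an iterable

b1 = B(1, 2, 3)          # the usual case: individual field objects
b2 = B(*values)          # unpack the iterable into positional arguments
b3 = B._make(values)     # documented classmethod that accepts an iterable
assert b1 == b2 == b3
```

Either of the last two forms works where B(values) raises TypeError.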
> def canSequence(obj):
>     if isinstance(obj, (list, tuple)):
>         t = type(obj)
>         return t([can(i) for i in obj])
>     else:
>         return obj
The first return could also be written t(map(can, obj)) or, in Python 3,
t(can(i) for i in obj).
> where obj is a namedtuple and t([can(i) for i in obj]) fails with the TypeError. See http://article.gmane.org/gmane.comp.python.ipython.user/10270 for more info.
>
> Is this a problem with namedtuples, ipython or just a feature?
With canSequence. If the above was written back when isinstance was
available but list and tuple could not yet be subclassed, canSequence was
sensible when written. But as Oscar said, it is now a mistake for
canSequence to assume that all subclasses of list and tuple have the same
initialization api.
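One possible way to avoid that assumption is sketched below. This is not IPython's code: can() here is a stand-in that returns its argument, can_sequence is a hypothetical name, and leaving other subclasses untouched is a choice made only for this sketch.

```python
from collections import namedtuple

def can(obj):
    # Stand-in for IPython's can(); it simply returns its argument.
    return obj

def can_sequence(obj):
    """Rebuild lists and tuples without assuming that every subclass
    accepts a single iterable argument."""
    if isinstance(obj, tuple) and hasattr(obj, '_make'):
        # namedtuple classes provide the classmethod _make(iterable).
        return type(obj)._make(can(i) for i in obj)
    if type(obj) in (list, tuple):
        # Exact builtin types: the (iterable) constructor is known.
        return type(obj)(can(i) for i in obj)
    # Other subclasses and non-sequences are returned unchanged.
    return obj

B = namedtuple('B', 'x y z')
result = can_sequence(B(1, 2, 3))
```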
In fact, one reason to subclass a class is to change the initialization
api. For instance, before Python 3, range was a function that returned a
list. If it had always been possible to subclass list, range might instead
have been written as a list subclass that attached the start, stop, and
step values, like so:
# python 3
class rangelist(list):
    def __init__(self, *args):
        r = range(*args)
        self.extend(r)
        self.start = r.start
        self.stop = r.stop
        self.step = r.step

r10 = rangelist(10)
print(r10, r10.start, r10.stop, r10.step)
>>>
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 0 10 1
However, once can() is defined, canSequence(r10) raises a TypeError, just
as it does with an instance of the namedtuple class B:
TypeError: 'list' object cannot be interpreted as an integer
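The failure can be reproduced with the self-contained sketch below. It assumes a trivial can() that returns its argument, standing in for the real one in IPython.

```python
class rangelist(list):
    def __init__(self, *args):
        r = range(*args)
        self.extend(r)
        self.start, self.stop, self.step = r.start, r.stop, r.step

def can(obj):
    # Stand-in for IPython's can(); it simply returns its argument.
    return obj

def canSequence(obj):
    # Same logic as the quoted function.
    if isinstance(obj, (list, tuple)):
        t = type(obj)
        return t([can(i) for i in obj])
    else:
        return obj

r10 = rangelist(10)
message = None
try:
    canSequence(r10)    # calls rangelist([0, ..., 9]), hence range([0, ..., 9])
except TypeError as exc:
    message = str(exc)
print(message)
```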
So, while your question about the namedtuple api is a legitimate one,
your problem with canSequence is not really about namedtuples, but about
canSequence making a bad assumption.
--
Terry Jan Reedy