[Python-ideas] A subclassing API for named tuples?

Nick Coghlan ncoghlan at gmail.com
Thu Feb 14 14:19:51 CET 2013


An exchange with Antoine in one of the enum threads sparked a thought.

A recurring suggestion for collections.namedtuple is that it would be
nice to be able to define them like this (as it not only avoids having
to repeat the class name, but also allows them to play nicely with
pickle and other name-based reference mechanisms):

    class MyTuple(collections.NamedTuple):
        __fields__ = "a b c d e".split()

However, one of Raymond's long-standing objections to such a design
for namedtuple is the ugliness of people having to remember to include
the right __slots__ definition to ensure the subclass doesn't add any
storage overhead above and beyond that of the underlying tuple.
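
For comparison, here is a small sketch of how things look with the
existing factory function. The Point and ForgetfulPoint classes and the
norm_squared method are made up for illustration; only
collections.namedtuple itself is the real API.

```python
import collections

# Factory API: the class name appears twice, once as the assignment
# target and once as the string handed to namedtuple().
MyTuple = collections.namedtuple("MyTuple", "a b c d e")


# Subclassing to add behaviour: "__slots__ = ()" has to be remembered,
# otherwise every instance quietly gains its own __dict__.
class Point(collections.namedtuple("Point", "x y")):
    __slots__ = ()

    def norm_squared(self):
        return self.x ** 2 + self.y ** 2


class ForgetfulPoint(collections.namedtuple("ForgetfulPoint", "x y")):
    pass  # no __slots__, so instances carry a __dict__


print(MyTuple(1, 2, 3, 4, 5).c)
print(hasattr(Point(3, 4), "__dict__"))
print(hasattr(ForgetfulPoint(3, 4), "__dict__"))
```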

For the intended use case as a replacement for short tuples, an unused
dict per instance is a *big* wasted overhead, so that concern can't be
dismissed as premature optimisation:

>>> import sys
>>> class Slots(tuple): __slots__ = ()
...
>>> class InstanceDict(tuple): pass
...
>>> sys.getsizeof(tuple([1, 2, 3]))
72
>>> x = Slots([1, 2, 3])
>>> sys.getsizeof(x)
72
>>> y = InstanceDict([1, 2, 3])
>>> sys.getsizeof(y)  # All good, right?
72
>>> sys.getsizeof(y.__dict__) # Yeah, not so much...
96

However, the thought that occurred to me is that the right metaclass
definition allows the default behaviour of __slots__ to be flipped, so
that you get "__slots__ = ()" defined in your class namespace
automatically, and you have to write "del __slots__" to get normal
class behaviour back:

>>> class SlotsMeta(type):
...     @classmethod
...     def __prepare__(mcls, name, bases, **kwds):
...         return dict(__slots__=())
...
>>> class SlotsByDefault(metaclass=SlotsMeta): pass
...
>>> class Slots(tuple, SlotsByDefault): pass
...
>>> class InstanceDict(tuple, SlotsByDefault): del __slots__
...
>>> sys.getsizeof(Slots([1, 2, 3]))
72
>>> Slots().__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Slots' object has no attribute '__dict__'
>>> sys.getsizeof(InstanceDict([1, 2, 3]))
72
>>> sys.getsizeof(InstanceDict([1, 2, 3]).__dict__)
96
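
The same idea can be written as a standalone script that checks behaviour
rather than byte counts, since the sizes reported by sys.getsizeof vary
between Python versions and platforms. This is only a sketch: the class
names follow the session above, and can_set_attribute is a helper
invented here for the demonstration.

```python
class SlotsMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # The returned mapping becomes the initial class namespace, so
        # every class body starts as if it had written "__slots__ = ()".
        return {"__slots__": ()}


class SlotsByDefault(metaclass=SlotsMeta):
    pass


class Slots(tuple, SlotsByDefault):
    pass


class InstanceDict(tuple, SlotsByDefault):
    del __slots__  # opt back in to a per-instance __dict__


def can_set_attribute(obj):
    """Return True if an arbitrary new attribute can be stored on obj."""
    try:
        obj.extra = 1
    except AttributeError:
        return False
    return True


print("__slots__" in vars(Slots))         # injected by __prepare__
print("__slots__" in vars(InstanceDict))  # removed by the del statement
print(can_set_attribute(Slots([1, 2, 3])))
print(can_set_attribute(InstanceDict([1, 2, 3])))
```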

So, what do people think? Too much magic? Or just the right amount to
allow a cleaner syntax for named tuple definitions, without
inadvertently encouraging people to do bad things to their memory
usage? (Note: for backwards compatibility reasons, we couldn't use a
custom metaclass for the classes returned by the existing
collections.namedtuple API. However, we could certainly provide a
distinct collections.NamedTuple type which used a custom metaclass to
behave this way.)
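
To make the proposal concrete, here is one way such a type might be
sketched. Nothing here is an existing collections API: NamedTupleMeta,
NamedTuple and the handling of __fields__ are assumptions made for
illustration, and the sketch omits most of the conveniences of the real
namedtuple (_replace, _asdict, field name validation and so on).

```python
from operator import itemgetter


class NamedTupleMeta(type):
    """Hypothetical metaclass: slots by default, fields from __fields__."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # Pre-seed the namespace so "__slots__ = ()" is the default.
        return {"__slots__": ()}

    def __new__(mcls, name, bases, namespace, **kwds):
        if "__fields__" in namespace:
            fields = tuple(namespace["__fields__"])
            namespace["__fields__"] = fields
            # One read-only property per field, indexing into the tuple.
            for index, field in enumerate(fields):
                namespace[field] = property(itemgetter(index))
        return super().__new__(mcls, name, bases, namespace, **kwds)


class NamedTuple(tuple, metaclass=NamedTupleMeta):
    """Hypothetical base class standing in for collections.NamedTuple."""

    __fields__ = ()

    def __new__(cls, *values):
        if len(values) != len(cls.__fields__):
            raise TypeError("expected %d values, got %d"
                            % (len(cls.__fields__), len(values)))
        return super().__new__(cls, values)

    def __repr__(self):
        pairs = ", ".join("%s=%r" % pair
                          for pair in zip(self.__fields__, self))
        return "%s(%s)" % (type(self).__name__, pairs)


class MyTuple(NamedTuple):
    __fields__ = "a b c d e".split()


t = MyTuple(1, 2, 3, 4, 5)
print(t)
print(t.c)
print(hasattr(t, "__dict__"))
```

The class name is written only once, and because __slots__ is injected by
__prepare__, the subclass body does not need to mention it.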

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


