[Python-ideas] namedtuple is not as good as it should be

Christian Tismer tismer at stackless.com
Mon Jun 10 14:37:48 CEST 2013


Hey Raymond,

On 10.06.13 06:06, Raymond Hettinger wrote:
>
> On Jun 9, 2013, at 5:55 PM, Christian Tismer <tismer at stackless.com> wrote:
>
>> If it is necessary to have class instances like today, ok. But there is no
>> need to search that class in a pickle! Instead, the defining set of
>> attributes could be pickled (uniquely stored by tuple comparison), and the
>> class could be re-created on-the-fly at unpickling time.
>
> That is a reasonable wish :-)
> But, I'm not sure how you propose for it to work.

I'm not sure about this, yet.
It just hit me because I'm fiddling with DB issues, where I use identity
dicts to fold billions of data records, so I'm thinking a bit like:

There could be a dict somewhere which holds all the seen namedtuple names
in a mapping that memorizes them all.
This is a bit similar to interning.
You could have named tuple fields without actually naming the tuple.

But let us for now assume namedtuple has a name, still. ;-)

> What would you expect from:
>
> >>> Soldier = namedtuple('Soldier', ['name', 'rank', 'serial_number'])
> >>> johnny = Soldier('John Doe', 'Private', 12345)
> >>> roger = Soldier('Roger', 'Corporal', 67890)
> >>> fireteam = [johnny, roger]
> >>> pickletools.dis(pickle.dumps(fireteam))
>
> Would it re-specify the class for every instance (at a cost of both
> speed and space)?
>
>    [(namedtuple, 'Soldier', ['name', 'rank', 'serial_number'],
>      ('John Doe', 'Private', 12345)),
>     (namedtuple, 'Soldier', ['name', 'rank', 'serial_number'],
>      ('Roger', 'Corporal', 67890))]
>

No, of course not!

> Or would you have a mechanism to specify the names just once?
>
>    [(namedtuple, 'Soldier', ['name', 'rank', 'serial_number']),
>     (Soldier, ('John Doe', 'Private', 12345)),
>     (Soldier, ('Roger', 'Corporal', 67890))]
>

Rough idea:
The names would be specified once, like now.
The defining string tuple goes somewhere in a central store that keeps the
named tuples, identified by exactly the tuple of names.
The name of a namedtuple is just a handle.

So the tuple of names can be used to find its definition in a global store
for namedtuple prototypes, and the associated class can be looked up there.
I think also the generated class can be cached there.
But this can all be done on demand, because there is only one implementation
for one tuple of names.

By having such a registry thing under the hood, the user sees just tuples
of names anonymously and does not need to think of a class at all.
This makes the namedtuple very much like a tuple, again.

> What would you do with a customized named tuples?
>
> >>> class PrettySoldier(Soldier):
>         def __repr__(self):
>               return 'N:{0} R:{1} S:{2}'.format(*self)
>

If you want to build a custom named tuple, you would get the basic class
out of the prototype store and then derive your own class with its methods.
Sure, we are then back to square one. ;-)

---
But: maybe the idea really can be extended, because we are still defining
something decorative for constants, just with some modifications.
Maybe namedtuple should have its own metaclass that copes with registering
the templates of some derived PrettyTuple.

> How about enumerations?  To pickle the value, Color.red, would you propose
> for that instance to store the definition of the enumeration as well as
> its value?
>

I'm not sure about enums, yet.
Is an enum as dynamic as the field names of a generic database table?

Enum is different in that you want to have certain constants being of
a certain type, and two similar definitions are intentionally not compatible
by default. So I think the name of an enum is important.

Named tuples are still tuples and compatible. They are just a bit more
specific since they have optional names.
They suffer right now from the fact that their class-ness suddenly
jumps into your face when you need to store them.


Some implementation
-------------------

The function

namedtuple(name, fields)

right now produces a namedtuple class and returns that.

Instead of creating the class immediately, the class is looked up
in a cache that stores the created namedtuple classes once.
Creation only happens if a tuple of names is not found in the cache.

Pickling a namedtuple does not store a reference to the class. Instead the
pickle records a call to namedtuple with its arguments, as you wrote above:

    [(namedtuple, 'Soldier', ['name', 'rank', 'serial_number']),
     (Soldier, ('John Doe', 'Private', 12345)),
     (Soldier, ('Roger', 'Corporal', 67890))]

We could maybe get further and have a "namelesstuple" which is
simply identified by tuple identity.
The shape
('name', 'rank', 'serial_number')

is made unique by a dict that works like the interpreter's dict of interned
strings. We have one such dict that holds our tuples, like the one I use:

class _UniqueDict(dict):
    '''
    A dict that stores its keys and returns a unique version of each key.
    Usage: str_val = unique[str_val]
    '''
    def __missing__(self, key):
        # First time this value is seen: it becomes the canonical object.
        self[key] = key
        return key

unique = _UniqueDict()

So namelesstuple(*seq) does the following:

- It checks whether seq is already an interned tuple. If so, it can
  continue without any analysis; otherwise it does all the magic to
  create a namelesstuple class.

- A second dict maps {tuple: namelessclass}, where the keys are the
  interned tuples from unique.

Now we have tuples that can be identified just by their structure
and don't need a name.

Soldier = namelesstuple('name', 'rank', 'serial_number')

You can now use Soldier to create many such instances.

s1 = Soldier('John Doe', 'Private', 12345)
s2 = Soldier('Roger', 'Corporal', 67890)

Then the repr:

 >>> s1
(name='John Doe', rank='Private', serial_number=12345)

 >>> Soldier
namelesstuple('name', 'rank', 'serial_number')

I think that approach also matches Ryan's idea.

I have just written a very rough implementation as a proof of concept and
will send it later.

cheers - chris

-- 
Christian Tismer             :^)   <mailto:tismer at stackless.com>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship* http://starship.python.net/
14482 Potsdam                :     PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?   http://www.stackless.com/
