[Python-ideas] namedtuple is not as good as it should be
Christian Tismer
tismer at stackless.com
Mon Jun 10 00:55:13 CEST 2013
Sorry Raymond if this offends you.
But after some extensive use of namedtuple I think it needs
a re-design.
The pros:
_________
- a namedtuple can be easily used in place of a tuple.
- it provides names for its fields and behaves like a tuple, otherwise.
Furthermore:
- a named tuple is the natural choice for modelling the records of a
database.
The cons:
- All of the above becomes wrong if you use namedtuple as a real
replacement for tuple.
- especially for databases it makes little sense as it is.
Reason:
Pickling
________
- pickling of tuples:
always possible if it contains built-in types.
tuples are simply tuples. There is just _one_ type.
- pickling of namedtuple:
sometimes possible, if your definition is static enough.
namedtuple has a subtype per tuple layout, and you need to cope with
that.
Just to be clear about that: Sure, it is possible to pickle named tuples,
but you have to think about it! And having to think about it trashes
a lot of the fun of having those names for free.
And typically this happens after you did your analysis of 20 GB of data:
You cannot pickle your nicely formatted namedtuple instances after the fact.
In practice, to save all that computation, you resort to a hack that turns
all your live namedtuple instances back into ordinary tuples.
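A minimal sketch of that situation, assuming a row type that is built at
run time. The helper make_row_type and the type name "AnalysisRow" are
made up for illustration and are not part of namedtuple:

```python
import pickle
from collections import namedtuple


def make_row_type(field_names):
    # Hypothetical helper: the class is created at run time and is never
    # bound to a module-level name called "AnalysisRow".
    return namedtuple("AnalysisRow", field_names)


row_type = make_row_type(["name", "age"])
rows = [row_type("alice", 30), row_type("bob", 25)]

# pickle stores instances by class reference (module name + class name),
# so it must be able to look the class up again. Here it cannot.
try:
    pickle.dumps(rows)
    pickled_directly = True
except (pickle.PicklingError, AttributeError):
    pickled_directly = False

# The hack: throw the field names away and keep ordinary tuples.
plain_rows = [tuple(r) for r in rows]
restored = pickle.loads(pickle.dumps(plain_rows))
```

The round trip works only after the names are gone, which is exactly the
loss complained about here.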
Silent implications introduced by namedtuple:
_____________________________________________
Without being very explicit, namedtuple makes you use it happily instead
of tuples.
But instead of using a native, simple type, you now use a not-so-simple,
user-defined type.
This type
- is not built in
- has a class definition
- needs a global, constant definition _somewhere_
Typically, you run a script interactively, and typically you need to pickle
some __main__.somename namedtuple class instances.
This is exactly what you don't want! You want to have some anonymous
data in the pickle and don't want to make anything fixed in stone.
namedtuple() Factory Function for Tuples with Named Fields
__________________________________________________________
It would be great if namedtuple were just this, as the documentation says.
But instead, it
- creates a named class, i.e. forces me to name it
- makes me create instances of that specific class and not just tuple.
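A short illustration of the point, using only collections.namedtuple; the
names PointA and PointB are chosen for this example:

```python
from collections import namedtuple

# Two factory calls with the same type name and the same fields.
PointA = namedtuple("Point", "x y")
PointB = namedtuple("Point", "x y")

p = PointA(1, 2)

is_plain_tuple = type(p) is tuple            # the instance is not a plain tuple
is_tuple_subclass = isinstance(p, tuple)     # but it is a tuple subclass
same_class = PointA is PointB                # each call builds a new class
equal_as_tuples = (p == PointB(1, 2)) and (p == (1, 2))
```

The values still compare equal as tuples, yet every layout (and every
factory call) produces its own class.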
Usability for databases
______________________
For simple databases which enumerate (employee, salary, ...) or
(shoesize, height, married) as example "database"s, namedtuple is ok.
As soon as you write a real database implementation with no fixed
layout, you get into trouble.
Easy database approach:
You define a dbtable as a collection of tuples, maybe as a dict with fast
index keys. Not a problem with tuples, which are of type tuple.
With namedtuple, you suddenly find yourself creating namedtuple instances
instead. But those namedtuple records cannot be pickled when used as a
replacement for regular tuples, because they now have a dynamically created
type, and extra actions are necessary to make it possible to pickle them.
From the documentation:
EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')
This implicitly suggests that namedtuple is the tool of choice to support
databases in Python.
Wrong!
It supports simple, fixed data structures in Python.
For instance, you can use it to define a fixed structural record that
gives the general description of a database field and its attributes.
For a database itself instead, this is very wrong.
Nobody uses a fixed class definition to define the structure of some
database tables. Instead, you use a generic approach with a row class that
describes what a row is, but dynamically.
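One possible reading of such a generic approach, as a rough sketch: the
column names are stored once per table and the rows stay plain tuples, so
the whole table pickles without any class definition. The functions
make_table, insert and field are hypothetical names for this example:

```python
import pickle


def make_table(columns):
    # Hypothetical layout: column names are stored once, rows are plain
    # tuples kept in a dict under their index key.
    return {"columns": tuple(columns), "rows": {}}


def insert(table, key, values):
    values = tuple(values)
    if len(values) != len(table["columns"]):
        raise ValueError("row length does not match column count")
    table["rows"][key] = values


def field(table, key, column):
    # Name-based access, resolved through the table's column list.
    return table["rows"][key][table["columns"].index(column)]


table = make_table(("name", "salary"))
insert(table, 1, ("alice", 5000))
insert(table, 2, ("bob", 4200))

# Only built-in types are involved, so pickling needs no preparation.
restored_table = pickle.loads(pickle.dumps(table))
```

The layout lives in the data, not in a class, so it can differ from table
to table at run time.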
So why all this rant?
___________________
What I'm trying to explain is that namedtuple should be closer to tuple!
A namedtuple should not be an explicit class with instances, but a generic
subclass of tuple, for all namedtuples.
Then, if the user decides to use a namedtuple to build his own class
upon that, fine. He then might want to do everything needed to support
and pickle his class.
But the namedtuple should be a single (maybe builtin) class that is just
a tuple with field names, nothing more.
Implementation idea (roughly)
_____________________________
Whatever a namedtuple does, it should behave as closely as possible
to a tuple, just providing attribute names.
Pickling support should be so that the user does not need to know that
a namedtuple has a special class. Actually, there should be only a generic
class, and the namedtuple "class" is a template instance that just holds
the names. Those names could go into some registry or whatever.
The only interesting thing about a namedtuple is the set of names used.
This set of names does not justify dragging in the whole import machinery,
the associated problems etc. The set of attribute names defines the
namedtuple, and that's it.
If it is necessary to have class instances like today, ok. But there is no
need to look up that class in a pickle! Instead, the defining set of
attributes could be pickled (uniquely stored by tuple comparison), and the
class could be re-created on-the-fly at unpickling time.
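A rough sketch of how this could look on top of today's namedtuple. This
is not the prototype mentioned in this mail; the registry _row_classes and
the functions get_row_class and _rebuild are assumptions made up for
illustration:

```python
import pickle
from collections import namedtuple

# Hypothetical registry: one generated class per tuple of field names.
_row_classes = {}


def _rebuild(field_names, values):
    # Called at unpickling time: re-create (or reuse) the class, then the row.
    return get_row_class(field_names)(*values)


def get_row_class(field_names):
    field_names = tuple(field_names)
    cls = _row_classes.get(field_names)
    if cls is None:
        base = namedtuple("Row", field_names)

        class Row(base):
            __slots__ = ()

            def __reduce__(self):
                # Pickle the field names and the values, not a class reference.
                return (_rebuild, (self._fields, tuple(self)))

        cls = _row_classes[field_names] = Row
    return cls


Employee = get_row_class(("name", "salary"))
original = Employee("alice", 5000)
restored = pickle.loads(pickle.dumps(original))
```

In this sketch only the function _rebuild has to be findable at unpickling
time. The per-layout class is looked up by its field names and created on
demand, so no layout needs a global definition.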
Conclusion
___________
I love namedtuple, and I hate it. I want to get rid of the second half
of this sentence.
Let us invent one that does not enforce class behavior.
I am thinking of a prototype...
cheers - chris
p.s.: there is a lot about database design not mentioned here.
--
Christian Tismer :^) <mailto:tismer at stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/