
* Teach vars() to work with classes defining __slots__. Essentially, __slots__ are just an implementation detail designed to make instances a bit more compact.

* Make the docstring writable for staticmethods, classmethods, and properties. We did this for function objects and it worked-out well.

* Have staticmethods and classmethods automatically fill-in a docstring from the wrapped function. An editor's tooltips would benefit nicely.

* Add a pure python named_tuple class to the collections module. I've been using the class for about a year and found that it greatly improves the usability of tuples as records. http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261

* Give builtin types a __name__ attribute so that we have a uniform way of accessing type names.

* Enhance doctest with a method that computes updated doctest results. If the only change I make to a matrix suite is to add commas to long numbers in the matrix repr(), then it would be nice to have an automated way to update the matrix output on all the other test cases in the module.

* Add an optional position argument to heapq.heapreplace to allow an arbitrary element to be updated in-place and then have the heap condition restored. I've now encountered three separate parties trying to figure-out how to do this.
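The last item can be illustrated with a small sketch. This is not the proposed heapq API; `heap_update` is a hypothetical helper name, written for Python 3, that replaces the element at a given position in a min-heap list and then sifts it up or down until the heap condition holds again.

```python
def heap_update(heap, pos, item):
    """Replace heap[pos] with item and restore the min-heap invariant.

    Hypothetical helper, not part of heapq. `heap` must already be a
    valid heap as produced by heapq.heapify().
    """
    heap[pos] = item
    # Sift up: swap with the parent while the new item is smaller.
    while pos > 0:
        parent = (pos - 1) >> 1
        if heap[pos] < heap[parent]:
            heap[pos], heap[parent] = heap[parent], heap[pos]
            pos = parent
        else:
            break
    # Sift down: swap with the smaller child while that child is smaller.
    n = len(heap)
    while True:
        child = 2 * pos + 1
        if child >= n:
            break
        if child + 1 < n and heap[child + 1] < heap[child]:
            child += 1
        if heap[child] < heap[pos]:
            heap[pos], heap[child] = heap[child], heap[pos]
            pos = child
        else:
            break


heap = [1, 3, 5, 7, 9, 11]   # already a valid min-heap
heap_update(heap, 3, 0)      # the 7 becomes 0 and rises to the root
```

Only one of the two loops ever moves the element, so the cost is O(log n), compared with O(n) for modifying the list and calling heapq.heapify() again.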

On 2/15/07, Raymond Hettinger <raymond.hettinger@verizon.net> wrote:
Hm, but why would they still have to be tuples? Why not just have a generic 'record' class?
* Give builtin types a __name__ attribute so that we have a uniform way of accessing type names.
This already exists, unless I misunderstand you:

    Python 2.2.3+ (#94, Jun 4 2003, 08:24:18)
    [GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-113)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> int.__name__
    'int'
What I could have used for Py3k (specifically for the 2to3 transformer): mods to the DocTestParser class that allow you to exactly reproduce the input string from the return value of parse(). Unrelated, I'd like the tokenize module to be more easily extensible. E.g. I'd like to add new tokens, and I'd like to be able to change its whitespace handling. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

On 15/02/2007 20.59, Raymond Hettinger wrote:
+1 from me too, I've been using a class with the same name and semantics (though an inferior implementation) for almost two years now, with great benefits. As suggested in the cookbook comment, please consider changing the semantics of the generated constructor so that it accepts a single iterable positional argument (or keyword arguments). This matches tuple() (and other containers) in behaviour, and makes it easier to substitute existing uses with named tuples.

--
Giovanni Bajo

Raymond Hettinger <raymond.hettinger <at> verizon.net> writes:
The implementation of this recipe is really clean and I like it a lot (I even think of including it in our codebase), but there are a few issues that I would like to point out.

1. I find the camelcase confusing and I think that NamedTuple should be spelled namedtuple, since it is a function, not a class. The fact that it returns classes does not count ;)

2. I agree with Giovanni Bajo, the constructor signature should be consistent with regular tuples. For instance I want to be able to map a named tuple over a record set as returned by fetchall.

3. I would like to pass the list of the fields as a sequence, not as a string. It would be more consistent and it would make the programmatic creation of NamedTuple classes at runtime easier.

4. I want help(MyNamedTuple) to work well; in particular it should display the right module name. That means that in the m dictionary you should add a __module__ attribute:

       __module__ = sys._getframe(1).f_globals['__name__']

5. The major issue is that pickle does not work with named tuples since the __module__ attribute is wrong. The suggestion in #4 would solve even this issue for free.

6. The ability to pass a show function to the __repr__ seems like over-engineering to me.

In short, here is how I would change the recipe:

    import sys
    from operator import itemgetter

    def namedtuple(f):
        """Returns a new subclass of tuple with named fields.

        >>> Point = namedtuple('Point x y'.split())
        >>> Point.__doc__   # docstring for the new class
        'Point(x, y)'
        >>> p = Point((11,), y=22)   # instantiate with positional args or keywords
        >>> p[0] + p[1]   # works just like the tuple (11, 22)
        33
        >>> x, y = p   # unpacks just like a tuple
        >>> x, y
        (11, 22)
        >>> p.x + p.y   # fields also accessible by name
        33
        >>> p   # readable __repr__ with name=value style
        Point(x=11, y=22)
        """
        typename, field_names = f[0], f[1:]
        nargs = len(field_names)

        def __new__(cls, args=(), **kwds):
            if kwds:
                try:
                    args += tuple(kwds[name] for name in field_names[len(args):])
                except KeyError, name:
                    raise TypeError(
                        '%s missing required argument: %s' % (typename, name))
            if len(args) != nargs:
                raise TypeError(
                    '%s takes exactly %d arguments (%d given)' %
                    (typename, nargs, len(args)))
            return tuple.__new__(cls, args)

        template = '%s(%s)' % (
            typename, ', '.join('%s=%%r' % name for name in field_names))

        def __repr__(self):
            return template % self

        m = dict(vars(tuple))  # pre-lookup superclass methods (for faster lookup)
        m.update(
            __doc__ = '%s(%s)' % (typename, ', '.join(field_names)),
            __slots__ = (),  # no per-instance dict
            __new__ = __new__,
            __repr__ = __repr__,
            __module__ = sys._getframe(1).f_globals['__name__'],
        )
        m.update((name, property(itemgetter(index)))
                 for index, name in enumerate(field_names))
        return type(typename, (tuple,), m)

    if __name__ == '__main__':
        import doctest
        TestResults = namedtuple(['TestResults', 'failed', 'attempted'])
        print TestResults(doctest.testmod())

Thanks for the comments. Will experiment with them and get back to you.

The alternate naming is fine. At work, we call it ntuple().

The constructor signature has been experimented with several times and had best results in its current form, which allows the *args for casting a record set returned by SQL or by the CSV module as in Point(*fetchall(s)), and it allows for direct construction with Point(2,3) without the slower and weirder form: Point((2,3)). Also, the current signature works better with keyword arguments: Point(x=2, y=3) or Point(2, y=3), which wouldn't be common but would be consistent with the relationship between keyword arguments and positional arguments in other parts of the language.

The string form for the named tuple factory was arrived at because it was easier to write, read, and alter than its original form with a list of strings:

    Contract = namedtuple('Contract stock strike volatility expiration rate iscall')

vs.

    Contract = namedtuple('Contract', 'stock', 'strike', 'volatility', 'expiration', 'rate', 'iscall')

The former is easier to edit and to re-arrange. Either form is trivial to convert programmatically to the other, and the definition step only occurs once while the use of the new type can appear many times throughout the code. Having experimented with both forms, I've found the string form to be best, though it seems a bit odd. Yet, the decision isn't central to the proposal and is still an open question.

The __module__ idea is nice. Will add it. Likewise with pickling.

The 'show' part of __repr__ was done for speed (localized access versus accessing the enclosing scope). Will rename it to _show to make it clear that it is not part of the API.

Thanks for the accolades and the suggestions.

Raymond

----- Original Message -----
From: "Michele Simionato" <michele.simionato@gmail.com>
To: <python-dev@python.org>
Sent: Monday, February 19, 2007 9:25 PM
Subject: Re: [Python-Dev] Py2.6 ideas

More thoughts on named tuples after trying-out all of Michele's suggestions:

* The lowercase 'namedtuple' seemed right only because it's a function, but as a factory function, it is somewhat class-like. In use, 'NamedTuple' more closely matches my mental picture of what is happening and distinguishes what it does from the other two entries in collections, 'deque' and 'defaultdict', which are used to create instances instead of new types.

* I remembered why the __repr__ function had a 'show' argument. I've changed the name now to make it more clear and added a docstring. The idea was that some use cases require that the repr exactly match the default style for tuples, and the optional argument allowed for that possibility with almost no performance hit.

The updated recipe is at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261

Raymond

Raymond Hettinger <python <at> rcn.com> writes:
This is debatable. I remember Guido using lowercase for metaclasses in the famous descrintro essay. I still like the lowercase more for class factories. But I will not fight on this ;)

But what about simply changing the __repr__?

    In [2]: Point = NamedTuple('Point','x','y')
    In [3]: Point(1,2)
    Out[3]: Point(x=1, y=2)
    In [4]: Point.__repr__ = tuple.__repr__
    In [5]: Point(1,2)
    Out[5]: (1, 2)

It feels clearer to me.

Michele Simionato

Raymond Hettinger wrote:
I think you mean something like [Point(*tup) for tup in fetchall(s)], which I don't like for the reasons explained later.
I don't buy this argument. Yes, Point(2,3) is nicer than Point((2,3)) in the interactive interpreter and in the doctests, but in real life one always has tuples coming as return values from functions. Consider your own example, TestResults(*doctest.testmod()). I will argue that the * does not feel particularly good and that it would be better to just write TestResults(doctest.testmod()). Moreover I believe that having a subclass constructor incompatible with the base class constructor is very evil. First of all, you must be consistent with the tuple constructor, not with "other parts of the language". Finally I did some timing of code like this::

    from itertools import imap
    Point = namedtuple('Point x y'.split())
    lst = [(i, i*i) for i in range(500)]

    def with_imap():
        for _ in imap(Point, lst):
            pass

    def with_star():
        for _ in (Point(*t) for t in lst):
            pass

and as expected the performance is worse with the * notation. In short, I don't feel any substantial benefit coming from the *args constructor.
``Contract = namedtuple('Contract stock strike volatility expiration rate iscall'.split())`` is not that bad either, but I agree that this is a second order issue. Michele Simionato
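For readers comparing the two constructor styles debated above, here is a minimal sketch in Python 3. It assumes the collections.namedtuple that later shipped in the standard library rather than the recipe under discussion: that version keeps the positional constructor and provides a `_make` classmethod for building an instance from a single iterable. The `rows` data is made up for illustration.

```python
from collections import namedtuple
from itertools import starmap

Point = namedtuple('Point', ['x', 'y'])

# Stand-in for a record set such as the rows returned by fetchall().
rows = [(i, i * i) for i in range(5)]

# Positional constructor: each row has to be unpacked with *.
via_star = [Point(*row) for row in rows]

# starmap does the unpacking itself, without a generator expression.
via_starmap = list(starmap(Point, rows))

# _make takes one iterable, like tuple() does, so plain map() works.
via_make = list(map(Point._make, rows))
```

All three spellings build equal objects; which one reads best is the style question argued in this thread.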

Michele Simionato wrote:
That also nicely makes another point: this form accepts not just a list of strings but any iterable. That sounds nice.

I'd vote against the one-string approach; it seems wholly un-Pythonic to me. Python already has a parser, so don't invent your own.

Speaking of un-Pythonic-ness, the whole concept of a "named tuple" strikes me as un-Pythonic. It's the only data structure I can think of where every item inside has two names. (Does TOOWTDI apply to member lookup?) I'd prefer a lightweight frozen dict, let's call it a "record" after one of the suggestions in this thread. That achieves symmetry:

                 mutable &      immutable &
                 heavyweight    lightweight
               +--------------------------
    positional | list           tuple
    keyword    | dict           record

For new code, I can't think of a single place I'd want to use a "NamedTuple" where a "record" wouldn't be arguably more suitable. The "NamedTuple" makes sense only for "fixing" old APIs like os.stat.

My final bit of feedback: why is it important that a NamedTuple have a class name? In the current implementation, the first identifier split from the argument string is used as the "name" of the NamedTuple, and the following arguments are names of fields. Dicts and sets are anonymous, not to mention lists and tuples; why do NamedTuples need names? My feeling is: if you want a class name, use a class.

/larry/

Larry Hastings wrote:
I knocked together a quick prototype of a "record" to show what I had in mind. Here goes:

------
    import exceptions

    class record(dict):
        __classname__ = "record"

        def __repr__(self):
            s = [self.__classname__, "("]
            comma = ""
            for name in self.__names__:
                s += (comma, name, "=", repr(self[name]))
                comma = ", "
            s += ")"
            return "".join(s)

        def __delitem__(self, name):
            raise exceptions.TypeError("object is read-only")

        def __setitem__(self, name, value):
            self.__delitem__(name)

        def __getattr__(self, name):
            if name in self:
                return self[name]
            # object has no __getattr__ to delegate to; fail explicitly
            raise AttributeError(name)

        # note: Python never calls __hasattr__; hasattr() goes through
        # __getattr__ above
        def __hasattr__(self, name):
            if name in self.__names__:
                return self[name]
            return object.__hasattr__(self, name)

        def __setattr__(self, name, value):
            # hack: allow setting __classname__ and __names__
            if name in ("__classname__", "__names__"):
                super(record, self).__setattr__(name, value)
            else:
                # a shortcut to throw our exception
                self.__delitem__(name)

        def __init__(self, **args):
            names = []
            for name, value in args.iteritems():
                # append, not +=, which would add the name one character
                # at a time
                names.append(name)
                super(record, self).__setitem__(name, value)
            self.__names__ = tuple(names)

    if __name__ == "__main__":
        r = record(a=1, b=2, c="3")
        print r
        print r.a
        print r.b
        print r.c

        # this throws a TypeError
        # r.c = 123
        # r.d = 456

        # the easy way to define a "subclass" of record
        def Point(x, y):
            return record(x = x, y = y)

        # you can use hack-y features to make your "subclasses" more swell
        def Point(x, y):
            x = record(x = x, y = y)
            # a hack to print the name "Point" instead of "record"
            x.__classname__ = "Point"
            # a hack to impose an ordering on the repr() display
            x.__names__ = ("x", "y")
            return x

        p = Point(3, 5)
        q = Point(2, y=5)
        r = Point(y=2, x=4)
        print p, q, r

        # test pickling
        import pickle
        pikl = pickle.dumps(p)
        pp = pickle.loads(pikl)
        print pp
        print pp == p

        # test that the output repr works to construct
        s = repr(p)
        print repr(s)
        peval = eval(s)
        print peval
        print p == peval
------

Yeah, I considered using __slots__, but that was gonna take too long.

Cheers,

/larry/

On 2/20/07, Larry Hastings <larry@hastings.org> wrote:
Here's a simple implementation using __slots__:

    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502237

And you don't have to hack anything to get good help() and a nice repr(). Declare a simple class for your type and you're ready to go::

    >>> class Point(Record):
    ...     __slots__ = 'x', 'y'
    ...
    >>> Point(3, 4)
    Point(x=3, y=4)

STeVe
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

Steven Bethard <steven.bethard <at> gmail.com> writes:
Here's a simple implementation using __slots__:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502237
That's pretty cool! Two suggestions:

1. rename the _items method to __iter__, so that you have easy casting to tuples and lists;

2. put a check in the metaclass such as ``assert '__init__' not in bodydict`` to make clear to the users that they cannot override the __init__ method; that is the metaclass's job.

Great hack!

Michele Simionato
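To make the two suggestions concrete, here is a minimal Python 3 sketch of a __slots__-based record. It is not the code of recipe 502237; `RecordMeta` and this `Record` are hypothetical stand-ins, with __init__ kept in the base class for brevity, that only illustrate the `__iter__` method and the metaclass check.

```python
class RecordMeta(type):
    """Hypothetical metaclass that rejects subclasses defining __init__."""

    def __new__(mcls, name, bases, bodydict):
        if bases:  # the Record base class itself (no explicit bases) is exempt
            assert '__init__' not in bodydict, (
                '%s must not override __init__' % name)
        # Default to empty slots so instances never grow a __dict__.
        bodydict.setdefault('__slots__', ())
        return super().__new__(mcls, name, bases, bodydict)


class Record(metaclass=RecordMeta):
    __slots__ = ()

    def __init__(self, *values):
        names = type(self).__slots__
        if len(values) != len(names):
            raise TypeError('%s takes exactly %d arguments (%d given)'
                            % (type(self).__name__, len(names), len(values)))
        for name, value in zip(names, values):
            setattr(self, name, value)

    def __iter__(self):
        # Suggestion 1: iteration gives easy casting to tuple and list.
        return (getattr(self, name) for name in type(self).__slots__)

    def __repr__(self):
        fields = ', '.join('%s=%r' % (name, getattr(self, name))
                           for name in type(self).__slots__)
        return '%s(%s)' % (type(self).__name__, fields)


class Point(Record):
    __slots__ = 'x', 'y'


p = Point(3, 4)
as_tuple = tuple(p)   # works because of __iter__
```

The sketch assumes `__slots__` is always given as a tuple of names; a bare string such as `__slots__ = 'x'` would need extra handling.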

Steven Bethard wrote:
Thanks for steering me to it. However, your implementation and Mr. Hettinger's original NamedTuple both require you to establish a type at the outset; with my prototype, you can create records ad-hoc as you can with dicts and tuples. I haven't found a lot of documentation on how to use __slots__, but I'm betting it wouldn't mesh well with my breezy ad-hoc records type.

Cheers,

/larry/

Larry Hastings <larry@hastings.org> wrote:
If it helps, you can think of Steven's and Raymond's types as variations of a C struct. They are fixed at type definition time, but that's more or less the point.

Also, you can find more than you ever wanted to know about __slots__ by searching on google for 'python __slots__' (without quotes), but it would work *just fine* with your ad-hoc method, though one thing to note with your method - you can't guarantee the order of the attributes as they are being displayed.

Your example would have the same issues as dict does here:

    >>> dict(b=1, a=2)
    {'a': 2, 'b': 1}

Adding more attributes could further arbitrarily rearrange them.

 - Josiah

Josiah Carlson wrote:
one thing to note with your method - you can't guarantee the order of the attributes as they are being displayed.
Actually, my record type *can*; see the hack using the __names__ field. It won't preserve that order during iteration--but it's only a prototype so far, and it could be fixed if there was interest. /larry/

Larry Hastings <larry@hastings.org> wrote:
Actually, it *can't*. The ordering of the dict produced by the **kwargs arguments is exactly the same as a regular dictionary.

    >>> def foo(**k):
    ...     return k
    ...
    >>> foo(b=1, a=2)
    {'a': 2, 'b': 1}
    >>> foo(hello=1, world=2)
    {'world': 2, 'hello': 1}

After construction you can preserve the order it has, but you've already lost the order provided in the call, so the order you get is arbitrarily defined by implementation details of dictionaries and hash(str).

 - Josiah

Josiah Carlson wrote:
Just to set the record.py straight, more for posterity than anything else: Actually, it *does*, because my prototype has explicit support for imposing such an ordering. That's what the "__names__" field is used for--it's an optional array of field names, in the order you want them displayed from __repr__(). Mr. Carlson's posting, while correct on general principles, was just plain wrong about my code; I suspect he hadn't bothered to read it, and instead based his reply on speculation about how it "probably" worked. My "record" prototype has no end of failings, but an inability to "guarantee the order of the attributes as they are being displayed" is simply not one of them. /larry/

Larry Hastings <larry@hastings.org> wrote:
No, I just didn't notice the portion of your code (in the 'if __name__ == "__main__"' block) that modified the __names__ field after record construction. In the other record types that were offered by Steven and Raymond, the point is to have an ordering that is fixed. I didn't notice it because I expected that it was like Raymond and Steven's; static layout after construction - but that is not the case.

 - Josiah

On 2/20/07, Steven Bethard <steven.bethard@gmail.com> wrote:
Here's a brief comparison between Raymond's NamedTuple factory (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261) and my Record class (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502237). Given a ``records`` module containing the recipes and the following code::

    class RecordPoint(Record):
        __slots__ = 'x', 'y'

    NamedTuplePoint = NamedTuple('NamedTuplePoint x y')

I got the following times from timeit::

    $ python -m timeit -s "import records; Point = records.RecordPoint" "Point(1, 2)"
    1000000 loops, best of 3: 0.856 usec per loop
    $ python -m timeit -s "import records; Point = records.NamedTuplePoint" "Point(1, 2)"
    1000000 loops, best of 3: 1.59 usec per loop

    $ python -m timeit -s "import records; point = records.RecordPoint(1, 2)" "point.x; point.y"
    10000000 loops, best of 3: 0.209 usec per loop
    $ python -m timeit -s "import records; point = records.NamedTuplePoint(1, 2)" "point.x; point.y"
    1000000 loops, best of 3: 0.551 usec per loop

    $ python -m timeit -s "import records; point = records.RecordPoint(1, 2)" "x, y = point"
    1000000 loops, best of 3: 1.47 usec per loop
    $ python -m timeit -s "import records; point = records.NamedTuplePoint(1, 2)" "x, y = point"
    10000000 loops, best of 3: 0.155 usec per loop

In short, for object construction and attribute access, the Record class is faster (because __init__ is a simple list of assignments and attribute access is through C-level slots). For unpacking and iteration, the NamedTuple factory is faster (because it's using the C-level tuple iteration instead of a Python-level generator).

STeVe
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

"Steven Bethard" <steven.bethard@GMAIL.COM> wrote: [snip]
Somewhere in the back of my mind something is telling me - if you can get a tuple with attributes based on __slots__, you just duplicate the references as both an attribute AND in the standard tuple PyObject* array, which could offer good speed in both cases, at the cost of twice as many pointers as the other two options.

But alas, it isn't possible right now...

    >>> class foo(tuple):
    ...     __slots__ = 'a'
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Error when calling the metaclass bases
        nonempty __slots__ not supported for subtype of 'tuple'

Ah well. Anyone have any intuition as to what they will be doing with named tuples/record objects enough to know which case is better?

Also, it doesn't seem like it would be terribly difficult to use metaclasses to get the ease of creation from the Record type for the NamedTuple variant, if desired.

 - Josiah

Josiah Carlson wrote:
It should be possible to start with a tuple subclass and add descriptors to provide named access. If the descriptor is implemented in C it should be just as efficient as using __slots__, but with no duplication of the references. Given a suitable descriptor type, it shouldn't be hard to write a factory function to assemble such classes. -- Greg
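A rough Python 3 sketch of the factory Greg describes follows. The name `make_named_tuple` is hypothetical, and the descriptor used here is `property(operator.itemgetter(i))`, the same pairing that appears in Michele's version of the recipe earlier in the thread, rather than a purpose-built C descriptor type.

```python
from operator import itemgetter


def make_named_tuple(typename, field_names):
    """Build a tuple subclass whose fields are also readable by name.

    Hypothetical factory for illustration only. Each field is a property
    wrapping itemgetter(index), so the values are stored once, in the
    tuple itself, with no second set of references.
    """
    namespace = {'__slots__': ()}  # empty slots: no per-instance __dict__
    for index, name in enumerate(field_names):
        namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), namespace)


Point = make_named_tuple('Point', ['x', 'y'])
p = Point((3, 4))     # constructed like a plain tuple, from one iterable
x, y = p              # unpacking uses ordinary tuple iteration
total = p.x + p.y     # named access goes through the descriptors
```

An empty `__slots__` is accepted for tuple subclasses; it is only a nonempty one that triggers the TypeError shown in Josiah's message.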

Larry Hastings <larry@hastings.org> wrote:
I disagree.

    def add(v1, v2):
        return Vector(i+j for i,j in izip(v1,v2))

    x,y,z = point

These, and more, are examples where not having a defined ordering (as would be the case with a 'static dict') would reduce its usefulness. Having a defined ordering (as is the case of lists and tuples) implies indexability (I want the ith item in this sequence!). It also allows one to use a NamedTuple without change otherwise anywhere you would have previously used a tuple (except for doctests, which would need to be changed). This was all discussed earlier.

 - Josiah

Josiah Carlson wrote:
I realize I'm in the minority here, but I prefer to use tuples only as lightweight frozen lists of homogeneous objects. I understand the concept that a tuple is a frozen container of heterogeneous objects referenced by position, it's considered a perfectly Pythonic idiom, it has a long-standing history in mathematical notation... I *get* it, I just don't *agree* with it. Better to make explicit the association of data with its name. x.dingus is simply better than remembering "the ith element of x is the 'dingus'". The examples you cite don't sway me; I'd be perfectly happy to use a positionless "record" that lacked the syntactic shortcuts you suggest. I stand by my assertion above. At least now you know "where I'm coming from", /larry/

Michele Simionato <michele.simionato <at> gmail.com> writes:
BTW, I take back this point. It is true that the generator expression is slower, but the direct equivalent of imap is starmap and using that I don't see a significant difference in execution times. So it seems that starmap is smart enough to avoid unnecessary tuple unpacking (as you probably know ;-). I still don't like for a subclass to have an incompatible signature with the parent class, but that's just me.

Michele Simionato

Michele Simionato wrote:
Hello all,

If this is being considered for inclusion in the standard library, using '_getframe' hackery will guarantee that it doesn't work with alternative implementations of Python (like IronPython at least, which doesn't have Python stack frames).

At least wrapping it in a try/except would be helpful.

All the best,

Michael Foord
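A minimal sketch of the try/except wrapping suggested here, in Python 3. The helper names `caller_module` and `make_type` are made up for illustration. The assumption is that an implementation without frame support either lacks `sys._getframe` (AttributeError) or cannot reach the requested depth (ValueError).

```python
import sys


def caller_module(default=None):
    """Return the __name__ of the module that called our caller.

    Falls back to `default` when stack frames are unavailable.
    """
    try:
        # Frame 0 is this function, 1 is our caller, 2 is its caller.
        return sys._getframe(2).f_globals.get('__name__', default)
    except (AttributeError, ValueError):
        # AttributeError: this implementation has no sys._getframe.
        # ValueError: the call stack is not that deep.
        return default


def make_type(typename):
    """Hypothetical factory that sets __module__ only when it is known."""
    namespace = {'__slots__': ()}
    module = caller_module()
    if module is not None:
        namespace['__module__'] = module
    return type(typename, (tuple,), namespace)


Pair = make_type('Pair')
```

When the module name cannot be determined, the class is still created; only the help() and pickling benefits of a correct `__module__` are lost.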

At 10:17 AM 2/20/2007 +0000, Fuzzyman wrote:
Here's a way that doesn't need it:

    @namedtuple
    def Point(x,y):
        """The body of this function is ignored -- but this
        docstring will be used for the Point class"""

This approach also gets rid of the messy string stuff, and it also allows one to specify default values.

The created type can be given the function's __name__, __module__, and __doc__.
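One way the decorator idea could be sketched in Python 3.7 or later is shown below. This is an illustration under assumptions, not the implementation proposed in the message: `namedtuple_from_signature` is a hypothetical name, and it delegates to the standard library's collections.namedtuple (using its `defaults` and `module` keyword arguments) instead of building the class by hand.

```python
import inspect
from collections import namedtuple


def namedtuple_from_signature(func):
    """Turn `def Point(x, y=0): ...` into a named tuple class.

    Hypothetical decorator. The function body is ignored; only its name,
    module, docstring, parameter names and default values are used.
    """
    params = list(inspect.signature(func).parameters.values())
    names = [p.name for p in params]
    # Defaults are right-aligned in both function signatures and namedtuple.
    defaults = [p.default for p in params
                if p.default is not inspect.Parameter.empty]
    cls = namedtuple(func.__name__, names,
                     defaults=defaults, module=func.__module__)
    if func.__doc__:
        cls.__doc__ = func.__doc__
    return cls


@namedtuple_from_signature
def Point(x, y=0):
    """A 2-D point; y defaults to 0."""


origin_on_x = Point(5)   # y takes its default value
```

Because the module name comes from `func.__module__`, no stack frame inspection is needed, which is the point made in the message above.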

On Feb 15, 2007, at 2:59 PM, Raymond Hettinger wrote:
A while back I wrote up a command-line doctest driver that does this and more [1]. The output of "doctest_driver.py --help" is below; note in particular the --update and --debug options. I also wrote an emacs mode [2] that has support for running doctest & putting the output (^c^c) or diffs (^c^d) in an output buffer; and for updating the output of selected examples (^c^r). I never really got around to doing official releases for either (although doctest-mode.el is part of the python-mode.el package).

I could move some of this code into the doctest mode, if it seems appropriate; or I could just go ahead and release the driver and publicize it.

-Edward

[1] http://nltk.svn.sourceforge.net/viewvc/nltk/trunk/nltk/nltk_lite/test/doctest_driver.py?view=markup

[2] http://python-mode.svn.sourceforge.net/viewvc/*checkout*/python-mode/trunk/python-mode/doctest-mode.el?revision=424

Sorry, forgot to include the output of doctest_driver.py --help. Here it is:

-Edward

Usage: doctest_driver.py [options] NAME ...

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit

  Actions (default=check):
    --check             Verify the output of the doctest examples in the
                        given files.
    -u, --update        Update the expected output for new or out-of-date
                        doctest examples in the given files.  In
                        particular, find every example whose actual output
                        does not match its expected output; and replace its
                        expected output with its actual output.  You will
                        be asked to verify the changes before they are
                        written back to the file; be sure to check them over
                        carefully, to ensure that you don't accidentally
                        create broken test cases.
    --debug             Verify the output of the doctest examples in the
                        given files.  If any example fails, then enter the
                        python debugger.

  Reporting:
    -v, --verbose       Increase verbosity.
    -q, --quiet         Decrease verbosity.
    -d, --udiff         Display test failures using unified diffs.
    --cdiff             Display test failures using context diffs.
    --ndiff             Display test failures using ndiffs.

  Output Comparison:
    --ellipsis          Allow "..." to be used for ellipsis in the expected
                        output.
    --normalize_whitespace
                        Ignore whitespace differences between the expected
                        output and the actual output.

On 2/15/07, Raymond Hettinger <raymond.hettinger@verizon.net> wrote:
Hm, but why would they still have to be tuples? Why not just have a generic 'record' class?
* Give builtin types a __name__ attribute so that we have a uniform way of accessing type names.
This already exists, unless I misunderstand you: Python 2.2.3+ (#94, Jun 4 2003, 08:24:18) [GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-113)] on linux2 Type "help", "copyright", "credits" or "license" for more information.
int.__name__ 'int'
What I could have used for Py3k (specifically for the 2to3 transformer): mods to the DocTestParser class that allow you to exactly reproduce the input string from the return value of parse(). Unrelated, I'd like the tokenize module to be more easily extensible. E.g. I'd like to add new tokens, and I'd like to be able to change its whitespace handling. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

On 15/02/2007 20.59, Raymond Hettinger wrote:
+1 from me too, I've been using a class with the same name and semantic (though an inferior implementation) for almost two years now, with great benefits. As suggested in the cookbook comment, please consider changing the semantic of the generated constructor so that it accepts a single iterable positional arguments (or keyword arguments). This matches tuple() (and other containers) in behaviour, and makes it easier to substitute existing uses with named tuples. -- Giovanni Bajo

Raymond Hettinger <raymond.hettinger <at> verizon.net> writes:
The implementation of this recipe is really clean and I like it a lot (I even think of including it in our codebase), but there are a few issues that I would like to point out. 1. I find the camelcase confusing and I think that NamedTuple should be spelled namedtuple, since is a function, not a class. The fact that it returns classes does not count ;) 2. I agree with Giovanni Bajo, the constructor signature should be consistent with regular tuples. For instance I want to be able to map a named tuple over a record set as returned by fetchall. 3. I would like to pass the list of the fields as a sequence, not as a string. It would be more consistent and it would make easier the programmatic creation of NamedTuple classes at runtime. 4. I want help(MyNamedTuple) to work well; in particular it should display the right module name. That means that in the m dictionary you should add a __module__ attribute: __module__ = sys._getframe(1).f_globals['__name__'] 5. The major issue is that pickle does work with named tuples since the __module__ attribute is wrong. The suggestion in #4 would solve even this issue for free. 6. The ability to pass a show function to the __repr__ feems over-engineering to me. In short, here is how I would change the recipe: import sys from operator import itemgetter def namedtuple(f): """Returns a new subclass of tuple with named fields. 
>>> Point = namedtuple('Point x y'.split()) >>> Point.__doc__ # docstring for the new class 'Point(x, y)' >>> p = Point((11,), y=22) # instantiate with positional args or keywords >>> p[0] + p[1] # works just like the tuple (11, 22) 33 >>> x, y = p # unpacks just like a tuple >>> x, y (11, 22) >>> p.x + p.y # fields also accessable by name 33 >>> p # readable __repr__ with name=value style Point(x=11, y=22) """ typename, field_names = f[0], f[1:] nargs = len(field_names) def __new__(cls, args=(), **kwds): if kwds: try: args += tuple(kwds[name] for name in field_names[len(args):]) except KeyError, name: raise TypeError( '%s missing required argument: %s' % (typename, name)) if len(args) != nargs: raise TypeError( '%s takes exactly %d arguments (%d given)' % (typename, nargs, len(args))) return tuple.__new__(cls, args) template = '%s(%s)' % ( typename, ', '.join('%s=%%r' % name for name in field_names)) def __repr__(self): return template % self m = dict(vars(tuple)) # pre-lookup superclass methods (for faster lookup) m.update(__doc__= '%s(%s)' % (typename, ', '.join(field_names)), __slots__ = (), # no per-instance dict __new__ = __new__, __repr__ = __repr__, __module__ = sys._getframe(1).f_globals['__name__'], ) m.update((name, property(itemgetter(index))) for index, name in enumerate(field_names)) return type(typename, (tuple,), m) if __name__ == '__main__': import doctest TestResults = namedtuple(['TestResults', 'failed', 'attempted']) print TestResults(doctest.testmod())

Thanks for the comments. Will experiment with them and get back to you.. The alternate naming is fine. At work, we call it ntuple(). The constructor signature has been experimented with several time and had best results in its current form which allows the *args for casting a record set returned by SQL or by the CSV module as in Point(*fetchall(s)), and it allows for direct construction with Point(2,3) without the slower and weirder form: Point((2,3)). Also, the current signature works better with keyword arguments: Point(x=2, y=3) or Point(2, y=3) which wouldn't be common but would be consistent with the relationship between keyword arguments and positional arguments in other parts of the language. The string form for the named tuple factory was arrived at because it was easier to write, read, and alter than its original form with a list of strings: Contract = namedtuple('Contract stock strike volatility expiration rate iscall') vs. Contract = namedtuple('Contract', 'stock', 'strike', 'volatility', 'expiration', 'rate', 'iscall') That former is easier to edit and to re-arrange. Either form is trivial to convert programmatically to the other and the definition step only occurs once while the use of the new type can appear many times throughout the code. Having experimented with both forms, I've found the string form to be best thought it seems a bit odd. Yet, the decision isn't central to the proposal and is still an open question. The __module__ idea is nice. Will add it. Likewise with pickling. The 'show' part of __repr__ was done for speed (localized access versus accessing the enclosing scope. Will rename it to _show to make it clear that it is not part of the API. Thanks for the accolades and the suggestions. Raymond ----- Original Message ----- From: "Michele Simionato" <michele.simionato@gmail.com> To: <python-dev@python.org> Sent: Monday, February 19, 2007 9:25 PM Subject: Re: [Python-Dev] Py2.6 ideas

More thoughts on named tuples after trying out all of Michele's suggestions:

* The lowercase 'namedtuple' seemed right only because it's a function, but as a factory function, it is somewhat class-like. In use, 'NamedTuple' more closely matches my mental picture of what is happening and distinguishes what it does from the other two entries in collections, 'deque' and 'defaultdict', which are used to create instances instead of new types.

* I remembered why the __repr__ function had a 'show' argument. I've changed the name now to make it clearer and added a docstring. The idea was that some use cases require that the repr exactly match the default style for tuples, and the optional argument allowed for that possibility with almost no performance hit.

The updated recipe is at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261

Raymond

Raymond Hettinger <python <at> rcn.com> writes:
This is debatable. I remember Guido using lowercase for metaclasses in the famous descrintro essay. I still prefer lowercase for class factories. But I will not fight over this ;)
But what about simply changing the __repr__?

    In [2]: Point = NamedTuple('Point','x','y')

    In [3]: Point(1,2)
    Out[3]: Point(x=1, y=2)

    In [4]: Point.__repr__ = tuple.__repr__

    In [5]: Point(1,2)
    Out[5]: (1, 2)

It feels clearer to me.

Michele Simionato
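The same idea can be checked with the standard library's collections.namedtuple (an assumption here, since the thread's NamedTuple recipe is not reproduced). The sketch uses a subclass, so the original class keeps its name=value repr.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')


class PlainPoint(Point):
    """Same fields, but displayed like an ordinary tuple."""
    __slots__ = ()
    __repr__ = tuple.__repr__


print(repr(Point(1, 2)))       # name=value style
print(repr(PlainPoint(1, 2)))  # plain tuple style
```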

Raymond Hettinger wrote:
I think you mean something like [Point(*tup) for tup in fetchall(s)], which I don't like for the reasons explained later.
I don't buy this argument. Yes, Point(2,3) is nicer than Point((2,3)) in the interactive interpreter and in the doctests, but in real life one always has tuples coming as return values from functions. Consider your own example, TestResults(*doctest.testmod()). I will argue that the * does not feel particularly good and that it would be better to just write TestResults(doctest.testmod()).

Moreover I believe that having a subclass constructor incompatible with the base class constructor is very evil. First of all, you must be consistent with the tuple constructor, not with "other parts of the language".

Finally I did some timing of code like this::

    from itertools import imap
    Point = namedtuple('Point x y'.split())
    lst = [(i, i*i) for i in range(500)]

    def with_imap():
        for _ in imap(Point, lst):
            pass

    def with_star():
        for _ in (Point(*t) for t in lst):
            pass

and as expected the performance is worse with the * notation. In short, I don't feel any substantial benefit coming from the *args constructor.
``Contract = namedtuple('Contract stock strike volatility expiration rate iscall'.split())`` is not that bad either, but I agree that this is a second order issue. Michele Simionato

Michele Simionato wrote:
That also nicely makes another point: this form accepts not just a list of strings but any iterable. That sounds nice. I'd vote against the one-string approach; it seems wholly un-Pythonic to me. Python already has a parser, so don't invent your own.

Speaking of un-Pythonic-ness, the whole concept of a "named tuple" strikes me as un-Pythonic. It's the only data structure I can think of where every item inside has two names. (Does TOOWTDI apply to member lookup?) I'd prefer a lightweight frozen dict, let's call it a "record" after one of the suggestions in this thread. That achieves symmetry:

                   mutable &     immutable &
                   heavyweight   lightweight
                 +--------------------------
      positional | list          tuple
      keyword    | dict          record

For new code, I can't think of a single place I'd want to use a "NamedTuple" where a "record" wouldn't be arguably more suitable. The "NamedTuple" makes sense only for "fixing" old APIs like os.stat.

My final bit of feedback: why is it important that a NamedTuple have a class name? In the current implementation, the first identifier split from the argument string is used as the "name" of the NamedTuple, and the following arguments are names of fields. Dicts and sets are anonymous, not to mention lists and tuples; why do NamedTuples need names? My feeling is: if you want a class name, use a class.

/larry/

Larry Hastings wrote:
I knocked together a quick prototype of a "record" to show what I had in mind. Here goes:

------
import exceptions

class record(dict):
    __classname__ = "record"

    def __repr__(self):
        s = [self.__classname__, "("]
        comma = ""
        for name in self.__names__:
            s += (comma, name, "=", repr(self[name]))
            comma = ", "
        s += ")"
        return "".join(s)

    def __delitem__(self, name):
        raise exceptions.TypeError("object is read-only")

    def __setitem__(self, name, value):
        self.__delitem__(name)

    def __getattr__(self, name):
        if name in self:
            return self[name]
        return object.__getattr__(self, name)

    def __hasattr__(self, name):
        if name in self.__names__:
            return self[name]
        return object.__hasattr__(self, name)

    def __setattr__(self, name, value):
        # hack: allow setting __classname__ and __names__
        if name in ("__classname__", "__names__"):
            super(record, self).__setattr__(name, value)
        else:
            # a shortcut to throw our exception
            self.__delitem__(name)

    def __init__(self, **args):
        names = []
        for name, value in args.iteritems():
            # append the whole name; "names += name" would add it
            # one character at a time
            names.append(name)
            super(record, self).__setitem__(name, value)
        self.__names__ = tuple(names)

if __name__ == "__main__":
    r = record(a=1, b=2, c="3")
    print r
    print r.a
    print r.b
    print r.c

    # this throws a TypeError
    # r.c = 123
    # r.d = 456

    # the easy way to define a "subclass" of record
    def Point(x, y):
        return record(x = x, y = y)

    # you can use hack-y features to make your "subclasses" more swell
    def Point(x, y):
        x = record(x = x, y = y)
        # a hack to print the name "Point" instead of "record"
        x.__classname__ = "Point"
        # a hack to impose an ordering on the repr() display
        x.__names__ = ("x", "y")
        return x

    p = Point(3, 5)
    q = Point(2, y=5)
    r = Point(y=2, x=4)
    print p, q, r

    # test pickling
    import pickle
    pikl = pickle.dumps(p)
    pp = pickle.loads(pikl)
    print pp
    print pp == p

    # test that the output repr works to construct
    s = repr(p)
    print repr(s)
    peval = eval(s)
    print peval
    print p == peval
------

Yeah, I considered using __slots__, but that was gonna take too long.

Cheers,

/larry/
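The prototype above targets Python 2. A much smaller Python 3 sketch of the same "read-only dict with attribute access" idea might look like the following. It is an illustration only: it blocks the common mutating methods rather than every possible one, and it relies on dicts and keyword arguments keeping insertion order (Python 3.7+).

```python
class record(dict):
    """Sketch: a dict whose keys are readable as attributes and whose
    common mutators raise TypeError."""

    def __init__(self, **fields):
        # dict.__init__ fills the mapping without calling our __setitem__
        super().__init__(fields)

    def __getattr__(self, name):
        # only reached when normal attribute lookup fails
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def _readonly(self, *args, **kwargs):
        raise TypeError("object is read-only")

    __setitem__ = __delitem__ = __setattr__ = __delattr__ = _readonly
    clear = pop = popitem = setdefault = update = _readonly

    def __repr__(self):
        body = ", ".join("%s=%r" % item for item in self.items())
        return "%s(%s)" % (type(self).__name__, body)


r = record(a=1, b=2, c="3")
print(r, r.a, r.b, r.c)
```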

On 2/20/07, Larry Hastings <larry@hastings.org> wrote:
Here's a simple implementation using __slots__: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502237

And you don't have to hack anything to get good help() and a nice repr(). Declare a simple class for your type and you're ready to go::

    >>> class Point(Record):
    ...     __slots__ = 'x', 'y'
    ...
    >>> Point(3, 4)
    Point(x=3, y=4)

STeVe
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

Steven Bethard <steven.bethard <at> gmail.com> writes:
Here's a simple implementation using __slots__:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502237
That's pretty cool! Two suggestions:

1. rename the _items method to __iter__, so that you have easy casting to tuples and lists;

2. put a check in the metaclass such as ``assert '__init__' not in bodydict`` to make clear to the users that they cannot override the __init__ method; that's the metaclass's job.

Great hack!

Michele Simionato
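Since the recipe's code is not quoted in this thread, here is a hypothetical metaclass-based Record showing how the two suggestions could fit together. RecordMeta and its generated methods are invented for illustration and are not the recipe's actual implementation.

```python
class RecordMeta(type):
    """Hypothetical metaclass: builds __init__ and __iter__ from __slots__."""

    def __new__(mcls, name, bases, bodydict):
        # suggestion 2: the metaclass owns __init__, so reject overrides
        assert '__init__' not in bodydict, \
            '__init__ is generated by the metaclass; do not override it'
        slots = bodydict.setdefault('__slots__', ())
        if isinstance(slots, str):
            slots = (slots,)
        fields = tuple(slots)

        def __init__(self, *args):
            if len(args) != len(fields):
                raise TypeError('%s takes exactly %d arguments (%d given)'
                                % (name, len(fields), len(args)))
            for field, value in zip(fields, args):
                setattr(self, field, value)

        def __iter__(self):
            # suggestion 1: iteration makes tuple(rec) and list(rec) work
            return (getattr(self, field) for field in fields)

        bodydict['__init__'] = __init__
        bodydict['__iter__'] = __iter__
        return super().__new__(mcls, name, bases, bodydict)


class Record(metaclass=RecordMeta):
    __slots__ = ()


class Point(Record):
    __slots__ = 'x', 'y'


p = Point(3, 4)
print(tuple(p), list(p))
```

Note that an assert disappears under ``python -O``, so a real implementation would probably raise TypeError instead.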

Steven Bethard wrote:
Thanks for steering me to it. However, your implementation and Mr. Hettinger's original NamedTuple both require you to establish a type at the outset; with my prototype, you can create records ad-hoc as you can with dicts and tuples. I haven't found a lot of documentation on how to use __slots__, but I'm betting it wouldn't mesh well with my breezy ad-hoc records type.

Cheers,

/larry/

Larry Hastings <larry@hastings.org> wrote:
If it helps, you can think of Steven's and Raymond's types as variations of a C struct. They are fixed at type definition time, but that's more or less the point. Also, you can find more than you ever wanted to know about __slots__ by searching on google for 'python __slots__' (without quotes), but it would work *just fine* with your ad-hoc method, though one thing to note with your method - you can't guarantee the order of the attributes as they are being displayed. Your example would have the same issues as dict does here:

    >>> dict(b=1, a=2)
    {'a': 2, 'b': 1}

Adding more attributes could further arbitrarily rearrange them.

- Josiah

Josiah Carlson wrote:
one thing to note with your method - you can't guarantee the order of the attributes as they are being displayed.
Actually, my record type *can*; see the hack using the __names__ field. It won't preserve that order during iteration--but it's only a prototype so far, and it could be fixed if there was interest. /larry/

Larry Hastings <larry@hastings.org> wrote:
Actually, it *can't*. The ordering of the dict produced by the **kwargs arguments is exactly the same as that of a regular dictionary.

    >>> def foo(**k):
    ...     return k
    ...
    >>> foo(b=1, a=2)
    {'a': 2, 'b': 1}
    >>> foo(hello=1, world=2)
    {'world': 2, 'hello': 1}

After construction you can preserve the order it has, but you've already lost the order provided in the call, so the order you get is arbitrarily defined by implementation details of dictionaries and hash(str).

- Josiah
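The session above reflects the Python 2 interpreters of the time. For anyone re-running it today: since Python 3.6 (PEP 468) the **kwargs mapping keeps the order used in the call, which a short check shows.

```python
def foo(**k):
    # return just the keyword names, in the order the mapping holds them
    return list(k)


print(foo(b=1, a=2))
print(foo(hello=1, world=2))
```

On Python 3.6 and later both calls list the names in call order; on older interpreters the order depends on the dict implementation, as described.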

Josiah Carlson wrote:
Just to set the record.py straight, more for posterity than anything else: Actually, it *does*, because my prototype has explicit support for imposing such an ordering. That's what the "__names__" field is used for--it's an optional array of field names, in the order you want them displayed from __repr__(). Mr. Carlson's posting, while correct on general principles, was just plain wrong about my code; I suspect he hadn't bothered to read it, and instead based his reply on speculation about how it "probably" worked. My "record" prototype has no end of failings, but an inability to "guarantee the order of the attributes as they are being displayed" is simply not one of them. /larry/

Larry Hastings <larry@hastings.org> wrote:
No, I just didn't notice the portion of your code (in the 'if __name__ == "__main__"' block) that modified the __names__ field after record construction. In the other record types that were offered by Steven and Raymond, the point is to have an ordering that is fixed. I didn't notice it because I expected that it was like Raymond's and Steven's: static layout after construction - but that is not the case.

- Josiah

On 2/20/07, Steven Bethard <steven.bethard@gmail.com> wrote:
Here's a brief comparison between Raymond's NamedTuple factory (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261) and my Record class (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502237). Given a ``records`` module containing the recipes and the following code::

    class RecordPoint(Record):
        __slots__ = 'x', 'y'

    NamedTuplePoint = NamedTuple('NamedTuplePoint x y')

I got the following times from timeit::

    $ python -m timeit -s "import records; Point = records.RecordPoint" "Point(1, 2)"
    1000000 loops, best of 3: 0.856 usec per loop
    $ python -m timeit -s "import records; Point = records.NamedTuplePoint" "Point(1, 2)"
    1000000 loops, best of 3: 1.59 usec per loop

    $ python -m timeit -s "import records; point = records.RecordPoint(1, 2)" "point.x; point.y"
    10000000 loops, best of 3: 0.209 usec per loop
    $ python -m timeit -s "import records; point = records.NamedTuplePoint(1, 2)" "point.x; point.y"
    1000000 loops, best of 3: 0.551 usec per loop

    $ python -m timeit -s "import records; point = records.RecordPoint(1, 2)" "x, y = point"
    1000000 loops, best of 3: 1.47 usec per loop
    $ python -m timeit -s "import records; point = records.NamedTuplePoint(1, 2)" "x, y = point"
    10000000 loops, best of 3: 0.155 usec per loop

In short, for object construction and attribute access, the Record class is faster (because __init__ is a simple list of assignments and attribute access is through C-level slots). For unpacking and iteration, the NamedTuple factory is faster (because it's using the C-level tuple iteration instead of a Python-level generator).

STeVe
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

"Steven Bethard" <steven.bethard@GMAIL.COM> wrote: [snip]
Somewhere in the back of my mind something is telling me - if you can get a tuple with attributes based on __slots__, you just duplicate the references as both an attribute AND in the standard tuple PyObject* array, which could offer good speed in both cases, at the cost of twice as many pointers as the other two options. But alas, it isn't possible right now...

    >>> class foo(tuple):
    ...     __slots__ = 'a'
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Error when calling the metaclass bases
        nonempty __slots__ not supported for subtype of 'tuple'

Ah well. Anyone have any intuition as to what they will be doing with named tuples/record objects enough to know which case is better?

Also, it doesn't seem like it would be terribly difficult to use metaclasses to get the ease of creation from the Record type for the NamedTuple variant, if desired.

- Josiah
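The restriction shown in that traceback can be reproduced without aborting a script by catching the TypeError. The snippet also shows that an empty __slots__ is accepted on a tuple subclass, which is what the factory recipes in this thread rely on.

```python
# A tuple subclass may not declare nonempty __slots__ ...
try:
    class Foo(tuple):
        __slots__ = ('a',)
except TypeError as exc:
    message = str(exc)
else:
    message = None

# ... but an empty __slots__ is fine and suppresses the instance __dict__.
class Bar(tuple):
    __slots__ = ()

print(message)
print(hasattr(Bar((1, 2)), '__dict__'))
```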

Josiah Carlson wrote:
It should be possible to start with a tuple subclass and add descriptors to provide named access. If the descriptor is implemented in C it should be just as efficient as using __slots__, but with no duplication of the references. Given a suitable descriptor type, it shouldn't be hard to write a factory function to assemble such classes. -- Greg
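A pure-Python sketch of that idea follows. The Field descriptor and make_record_type factory are hypothetical names; Greg's point about efficiency assumes a descriptor written in C, which this illustration does not attempt.

```python
class Field:
    """Descriptor that exposes one position of a tuple under a name."""

    def __init__(self, index):
        self.index = index

    def __get__(self, obj, objtype=None):
        if obj is None:          # accessed on the class itself
            return self
        return tuple.__getitem__(obj, self.index)


def make_record_type(typename, field_names):
    """Assemble a tuple subclass with one Field per name (no extra storage)."""
    namespace = {'__slots__': ()}
    for index, name in enumerate(field_names):
        namespace[name] = Field(index)
    return type(typename, (tuple,), namespace)


Point = make_record_type('Point', ['x', 'y'])
p = Point((3, 4))     # constructed like a plain tuple, from one iterable
x, y = p
print(p.x, p.y, x, y)
```

Each value is stored once, in the tuple's own item array; the descriptors live on the class and only translate names into positions.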

Larry Hastings <larry@hastings.org> wrote:
I disagree.

    def add(v1, v2):
        return Vector(i+j for i,j in izip(v1,v2))

    x,y,z = point

These and more are examples where not having a defined ordering (as would be the case with a 'static dict') would reduce its usefulness. Having a defined ordering (as is the case for lists and tuples) implies indexability (I want the ith item in this sequence!). It also allows one to use a NamedTuple, without any other change, anywhere you would have previously used a tuple (except for doctests, which would need to be changed).

This was all discussed earlier.

- Josiah
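Those two uses can be run on Python 3 with a minimal stand-in for Vector. The class below is an assumption (a bare tuple subclass built from one iterable), and zip replaces Python 2's izip.

```python
class Vector(tuple):
    """Stand-in for the Vector type in the message: an ordered, immutable sequence."""
    __slots__ = ()


def add(v1, v2):
    # element-wise addition depends on both operands having a defined order
    return Vector(i + j for i, j in zip(v1, v2))


point = add(Vector((1, 2, 3)), Vector((10, 20, 30)))
x, y, z = point     # positional unpacking also depends on ordering
print(point, x, y, z)
```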

Josiah Carlson wrote:
I realize I'm in the minority here, but I prefer to use tuples only as lightweight frozen lists of homogeneous objects. I understand the concept that a tuple is a frozen container of heterogeneous objects referenced by position, it's considered a perfectly Pythonic idiom, it has a long-standing history in mathematical notation... I *get* it, I just don't *agree* with it. Better to make explicit the association of data with its name. x.dingus is simply better than remembering "the ith element of x is the 'dingus'". The examples you cite don't sway me; I'd be perfectly happy to use a positionless "record" that lacked the syntactic shortcuts you suggest. I stand by my assertion above. At least now you know "where I'm coming from", /larry/

Michele Simionato <michele.simionato <at> gmail.com> writes:
BTW, I take back this point. It is true that the generator expression is slower, but the direct equivalent of imap is starmap, and using that I don't see a significant difference in execution times. So it seems that starmap is smart enough to avoid unnecessary tuple unpacking (as you probably know ;-). I still don't like for a subclass to have a signature incompatible with the parent class, but that's just me.

Michele Simionato
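For anyone who wants to repeat the measurement on a current interpreter, here is a sketch using the standard library's collections.namedtuple (an assumption, since the recipes under discussion are not reproduced here). map with Point._make stands in for the single-iterable constructor, and the other two variants use the *args constructor. No timings are quoted because they vary by machine and version.

```python
import timeit
from collections import namedtuple
from itertools import starmap

Point = namedtuple('Point', 'x y')
lst = [(i, i * i) for i in range(500)]


def with_map():
    # constructor that takes one iterable per record
    return list(map(Point._make, lst))


def with_star():
    # explicit * unpacking in a comprehension
    return [Point(*t) for t in lst]


def with_starmap():
    # starmap unpacks each tuple into positional arguments
    return list(starmap(Point, lst))


for func in (with_map, with_star, with_starmap):
    print(func.__name__, timeit.timeit(func, number=200))
```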

Michele Simionato wrote:
Hello all,

If this is being considered for inclusion in the standard library, using _getframe hackery will guarantee that it doesn't work with alternative implementations of Python (like IronPython at least, which doesn't have Python stack frames). At least wrapping it in a try/except would be helpful.

All the best,

Michael Foord
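One way to write the suggested guard is sketched below. The helper name caller_module_name and its default value are made up for this example; the exceptions caught are AttributeError (no sys._getframe on the implementation) and ValueError (the stack is not deep enough).

```python
import sys


def caller_module_name(depth=1, default='__main__'):
    """Best-effort module name of a calling frame, with a fallback.

    depth=1 means the caller of this helper; a factory that wants the
    module of *its* caller would pass depth=2.
    """
    try:
        frame = sys._getframe(depth)
    except (AttributeError, ValueError):
        # no frame support, or the requested frame does not exist
        return default
    return frame.f_globals.get('__name__', default)


print(caller_module_name())
```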

At 10:17 AM 2/20/2007 +0000, Fuzzyman wrote:
Here's a way that doesn't need it:

    @namedtuple
    def Point(x,y):
        """The body of this function is ignored -- but this docstring will be
        used for the Point class"""

This approach also gets rid of the messy string stuff, and it also allows one to specify default values. The created type can be given the function's __name__, __module__, and __doc__.
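A sketch of such a decorator on a modern interpreter is shown below. It leans on inspect.signature and on the standard library's collections.namedtuple (with its defaults= and module= parameters, Python 3.7+), neither of which existed when this was proposed, so treat it as one possible reading of the idea rather than the proposal itself.

```python
import inspect
from collections import namedtuple as _stdlib_namedtuple


def namedtuple(func):
    """Build a named tuple type from a function's signature and docstring."""
    params = list(inspect.signature(func).parameters.values())
    fields = [p.name for p in params]
    # defaults apply to the rightmost fields, as in a function signature
    defaults = [p.default for p in params
                if p.default is not inspect.Parameter.empty]
    cls = _stdlib_namedtuple(func.__name__, fields,
                             defaults=defaults, module=func.__module__)
    if func.__doc__:
        cls.__doc__ = func.__doc__
    return cls


@namedtuple
def Point(x, y=0):
    """A point in the plane."""


print(Point(1), Point(y=2, x=1), Point.__doc__)
```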

On Feb 15, 2007, at 2:59 PM, Raymond Hettinger wrote:
A while back I wrote up a command-line doctest driver that does this and more [1]. The output of "doctest_driver.py --help" is below; note in particular the --update and --debug options. I also wrote an emacs mode [2] that has support for running doctest & putting the output (^c^c) or diffs (^c^d) in an output buffer; and for updating the output of selected examples (^c^r). I never really got around to doing official releases for either (although doctest-mode.el is part of the python-mode.el package).

I could move some of this code into the doctest mode, if it seems appropriate; or I could just go ahead and release the driver and publicize it.

-Edward

[1] http://nltk.svn.sourceforge.net/viewvc/nltk/trunk/nltk/nltk_lite/test/doctest_driver.py?view=markup
[2] http://python-mode.svn.sourceforge.net/viewvc/*checkout*/python-mode/trunk/python-mode/doctest-mode.el?revision=424

Sorry, forgot to include the output of doctest_driver.py --help. Here it is:

-Edward

Usage: doctest_driver.py [options] NAME ...

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit

  Actions (default=check):
    --check             Verify the output of the doctest examples in the
                        given files.
    -u, --update        Update the expected output for new or out-of-date
                        doctest examples in the given files.  In
                        particular, find every example whose actual output
                        does not match its expected output; and replace its
                        expected output with its actual output.  You will
                        be asked to verify the changes before they are
                        written back to the file; be sure to check them over
                        carefully, to ensure that you don't accidentally
                        create broken test cases.
    --debug             Verify the output of the doctest examples in the
                        given files.  If any example fails, then enter the
                        python debugger.

  Reporting:
    -v, --verbose       Increase verbosity.
    -q, --quiet         Decrease verbosity.
    -d, --udiff         Display test failures using unified diffs.
    --cdiff             Display test failures using context diffs.
    --ndiff             Display test failures using ndiffs.

  Output Comparison:
    --ellipsis          Allow "..." to be used for ellipsis in the expected
                        output.
    --normalize_whitespace
                        Ignore whitespace differences between the expected
                        output and the actual output.
participants (12)
- Edward Loper
- Fuzzyman
- Giovanni Bajo
- Greg Ewing
- Guido van Rossum
- Josiah Carlson
- Larry Hastings
- Michele Simionato
- Phillip J. Eby
- Raymond Hettinger
- Raymond Hettinger
- Steven Bethard