Guido van Rossum wrote:
On 2/15/07, Raymond Hettinger <raymond.hettinger@verizon.net> wrote:
* Add a pure python named_tuple class to the collections module. I've been using the class for about a year and found that it greatly improves the usability of tuples as records. http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261
Hm, but why would they still have to be tuples? Why not just have a generic 'record' class?
Hmm - possibilities. "record" definitely has greater connotations of heterogeneous elements than "tuple", which would put paid to the constant arguments that "a tuple is really just an immutable list".

list - primarily intended for homogeneous elements
record - primarily intended for heterogeneous elements, elements are (optionally?) named

and have mutable and immutable versions of each. Maybe the current list syntax would then continue to create a mutable list, and the current tuple syntax would create an immutable record (with no element names), i.e. the current tuple.

Tim Delaney
>> Hm, but why would they still have to be tuples? Why not just have a
>> generic 'record' class?

Tim> Hmm - possibilities. "record" definitely has greater connotations
Tim> of heterogeneous elements than "tuple", which would put paid to the
Tim> constant arguments that "a tuple is really just an immutable list".

(What do you mean by "... put paid ..."? It doesn't parse for me.) Based on posts in the current thread on c.l.py with the improbable subject "f---ing typechecking", lots of people refuse to believe tuples are anything other than immutable lists.

Skip
On Thu, Feb 15, 2007 at 05:41:51PM -0600, skip@pobox.com wrote:
Tim> Hmm - possibilities. "record" definitely has greater connotations
Tim> of heterogeneous elements than "tuple", which would put paid to the
Tim> constant arguments that "a tuple is really just an immutable list".
(What do you mean by "... put paid ..."? It doesn't parse for me.)
"Put paid" usually means "to finish off"; Tim is saying this would finish the constant arguments that &c &c... --amk
[Raymond Hettinger]
* Add a pure python named_tuple class to the collections module. I've been using the class for about a year and found that it greatly improves the usability of tuples as records. http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261
[Delaney, Timothy]
Hmm - possibilities. "record" definitely has greater connotations of heterogeneous elements than "tuple", which would put paid to the constant arguments that "a tuple is really just an immutable list".
No need to go so far off-track. The idea is to have an efficient type that is directly substitutable for tuples but is a bit more self-descriptive. I like to have the doctest result cast as NamedTuple('TestResults failed attempted'). The repr of that result looks like TestResults(failed=0, attempted=15) but is still accessible as a tuple and passes easily into other functions that expect a tuple. This sort of thing would be handy for things like os.stat(). http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261 Raymond
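For reference, the proposal later landed in the standard library as collections.namedtuple (Python 2.6 and later). A minimal sketch of the doctest example using that function; it illustrates the behaviour described above, not the exact code of the 2007 recipe:

```python
from collections import namedtuple

# Declare a tuple subclass with two named fields.
TestResults = namedtuple('TestResults', 'failed attempted')

results = TestResults(failed=0, attempted=15)

print(repr(results))                    # TestResults(failed=0, attempted=15)
print(results[1], results.attempted)    # 15 15: positional and named access agree
failed, attempted = results             # unpacks like any 2-tuple
print('Missed %d out of %d tests' % results)  # Missed 0 out of 15 tests
print(isinstance(results, tuple))       # True: passes wherever a tuple is expected
```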
Raymond Hettinger wrote:
No need to go so far off-track. The idea is to have an efficient type that is directly substitutable for tuples but is a bit more self-descriptive. I like to have the doctest result cast as NamedTuple('TestResults failed attempted'). The repr of that result looks like TestResults(failed=0, attempted=15) but is still accessible as a tuple and passes easily into other functions that expect a tuple. This sort of thing would be handy for things like os.stat(). http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261
I'd like to repeat Guido's question: Why does this still need to support the tuple interface (i.e. indexed access)? I'm no longer sure whether you are aware that the os.stat result *already* has named fields, in addition to the indexed access. However, the indexed access is deprecated, and only preserved for backwards compatibility. So why would a new type be handy for os.stat? And, if it's not for os.stat, what other uses does it have? Regards, Martin
Martin v. Löwis wrote:
I'd like to repeat Guido's question: Why does this still need to support the tuple interface (i.e. indexed access)?
So that it remains interoperable with existing libraries that expect a tuple? Otherwise you'd be casting (and copying) every time you needed to pass it to something that used indexed access. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
Nick Coghlan wrote:
I'd like to repeat Guido's question: Why does this still need to support the tuple interface (i.e. indexed access)?
So that it remains interoperable with existing libraries that expect a tuple? Otherwise you'd be casting (and copying) every time you needed to pass it to something that used indexed access.
Can you give a few examples of libraries where this isn't already done? Regards, Martin
Martin v. Löwis wrote:
Nick Coghlan wrote:
I'd like to repeat Guido's question: Why does this still need to support the tuple interface (i.e. indexed access)?
So that it remains interoperable with existing libraries that expect a tuple? Otherwise you'd be casting (and copying) every time you needed to pass it to something that used indexed access.
Can you give a few examples of libraries where this isn't already done?
I don't have any specific examples of that, no - that's why I phrased it as a question. However, another aspect that occurred to me is that inheriting from tuple has significant practical benefits in terms of speed and memory consumption, at which point it doesn't seem worthwhile to *remove* the indexing capability. I suppose you *could* write a completely new C-level record class, but given that Raymond's NamedTuple class gets good performance from a Python implementation, rewriting it in C seems like wasted effort. Regards, Nick.
Nick Coghlan wrote:
However, another aspect that occurred to me is that inheriting from tuple has significant practical benefits in terms of speed and memory consumption, at which point it doesn't seem worthwhile to *remove* the indexing capability.
I'm not so sure that inheriting from tuple, and giving it named fields, has significant speed and memory benefits. In particular for the memory benefits, you can use __slots__ to achieve the same effects, and more efficiently so (because you don't store the tuple length). As for speed, I would have to see measurements to be convinced it is faster.
I suppose you *could* write a completely new C-level record class, but given that Raymond's NamedTuple class gets good performance from a Python implementation, rewriting it in C seems like wasted effort.
It wouldn't necessarily be rewriting: in the C API, you already have the PyStructSequence machinery (see posixmodule.c:stat_result_fields for an example). It's just that this machinery isn't available to Python code yet, and no alternative convenience library is, either (other than using __slots__, which won't directly give indexed access). Regards, Martin
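Martin's memory argument (__slots__ versus a tuple subclass) can be checked directly on a present-day interpreter. A minimal sketch, assuming CPython; collections.namedtuple stands in for the tuple-based recipe, and PointSlots is a hypothetical slots-based equivalent. The byte counts vary with version and platform, so none are claimed here:

```python
import sys
from collections import namedtuple

# Tuple-based record: a variable-size object that stores its length.
PointTuple = namedtuple('PointTuple', 'x y')

# Slots-based record (hypothetical): fixed slots, no length field, no __dict__.
class PointSlots:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

t = PointTuple(1, 2)
s = PointSlots(1, 2)

# sys.getsizeof reports the shallow size of each instance in bytes.
print('tuple-based:', sys.getsizeof(t))
print('slots-based:', sys.getsizeof(s))
print('slots instance has __dict__:', hasattr(s, '__dict__'))
```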
At 01:38 PM 2/16/2007 +0100, Martin v. Löwis wrote:
Nick Coghlan wrote:
However, another aspect that occurred to me is that inheriting from tuple has significant practical benefits in terms of speed and memory consumption, at which point it doesn't seem worthwhile to *remove* the indexing capability.
I'm not so sure that inheriting from tuple, and giving it named fields, has significant speed and memory benefits. In particular for the memory benefits, you can use __slots__ to achieve the same effects, and more efficiently so (because you don't store the tuple length). As for speed, I would have to see measurements to be convinced it is faster.
For an otherwise-pure Python implementation, the performance benefit of inheriting from a tuple is in having ready-made C implementations of hashing and comparison.
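Phillip's point, illustrated: a tuple-based record inherits hashing, equality and ordering from tuple's C implementation, while a plain __slots__ class falls back to identity-based equality unless those methods are written by hand. The names Pair and SlotPair are hypothetical, and collections.namedtuple stands in for the recipe under discussion:

```python
from collections import namedtuple

# Tuple-based record: hash, ==, and < all come from tuple.
Pair = namedtuple('Pair', 'key value')

# Slots-based record: only object's identity-based defaults are inherited.
class SlotPair:
    __slots__ = ('key', 'value')

    def __init__(self, key, value):
        self.key = key
        self.value = value

a, b = Pair('x', 1), Pair('x', 1)
print(a == b, hash(a) == hash(b))               # True True: value semantics for free
print(sorted([Pair('b', 2), Pair('a', 1)])[0])  # Pair(key='a', value=1)

c, d = SlotPair('x', 1), SlotPair('x', 1)
print(c == d)                                   # False: default equality is identity
```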
On 2/16/07, Nick Coghlan <ncoghlan@gmail.com> wrote:
Martin v. Löwis wrote:
I'd like to repeat Guido's question: Why does this still need to support the tuple interface (i.e. indexed access)?
So that it remains interoperable with existing libraries that expect a tuple? Otherwise you'd be casting (and copying) every time you needed to pass it to something that used indexed access.
In the case of os.stat and friends I propose that in Py3k we drop the tuple-ness completely; it's been dual-op since 2.2. Maybe Raymond's proposed record type should have two versions: one that's also a tuple, for compatibility, and one that's just a record. The compatibility version should also support having named fields that don't show up in the tuple view -- this has proved very useful for the os.stat() result. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
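The "named fields that don't show up in the tuple view" Guido mentions can be inspected on CPython's os.stat result, which in current Python 3 is still a tuple subclass built on the PyStructSequence machinery. A small sketch; the exact field counts depend on platform and version, so only their relationships are checked:

```python
import os
import stat

st = os.stat('.')

# Indexed access and attribute access name the same field.
print(st[stat.ST_SIZE] == st.st_size)   # True

# The result really is a tuple subclass.
print(isinstance(st, tuple))            # True

# Only the first n_sequence_fields fields appear in the tuple view;
# the rest (e.g. st_mtime_ns) are reachable by name only.
print(len(st), os.stat_result.n_sequence_fields, os.stat_result.n_fields)
print(hasattr(st, 'st_mtime_ns'))
```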
Maybe Raymond's proposed record type should have two versions: one that's also a tuple, for compatibility, and one that's just a record.
FWIW, ML unifies tuples and records by defining a tuple to be a record whose component names are all consecutive integers starting with 1.

For example, in ML, the literal

    { name = "ark", state = "NJ" }

represents a record with type

    { name: string, state: string }

The identifiers "name" and "state" are bound during compilation, ML being a statically typed language.

In ML, one extracts a component named foo by applying a function named #foo. So, for example, the value of

    #state { name = "ark", state = "NJ" }

is "NJ", and trying to evaluate

    #foo { name = "ark", state = "NJ" }

results in a compilation error because of a type-checking failure.

Component names can be either identifiers or integers. So, for example,

    { name = "spells", 1 = "xyzzy", 2 = "plugh" }

is a record of type

    { 1: string, 2: string, name: string }

So here is the point. If the component names of a record are all positive integers with no gaps, the record is *also* a tuple. So, for example,

    { 2 = "plugh", 1 = "xyzzy" }

has exactly the same meaning (including the same type) as

    ("xyzzy", "plugh")

In both cases, the compiler normalizes the display: it prints the value as ("xyzzy", "plugh") rather than { 2 = "plugh", 1 = "xyzzy" }, and it prints the type as string * string rather than the equivalent { 1: string, 2: string }.

So in ML, tuple types aren't really anything special: they're just abbreviations for a particular subset of record types.
[Martin v. Löwis]
Why does this still need to support the tuple interface (i.e. indexed access)?
I use named tuples wherever I need a tuple but the number and meaning of the fields start to tax my memory. For doctests, I return a named tuple like TestResults(failed=0, attempted=15). That named tuple can still be unpacked after a function call: f, a = results. And it can be unpacked in a function call: f(*results). It can be handed to functions that expect a tuple: 'Missed %d out of %d tests' % results. Also, the named tuple used with indexed access has the same high performance as a regular tuple; however, if an error occurs, its repr is shown in a more readable form. Likewise, when constructing the NamedTuple, an editor's tooltip reminds you of what goes in each field.

Those properties have proved useful to me when modeling option contracts, where each contract has to track the remaining time, interest rate, option type, underlying security, and strike price. The same applies to model results: delta, gamma, vega, theta, rho. This could also be done with attribute access, but it would be much slower and much more verbose when unpacking the model's results:

    d, g, v, t, r = model(somecontract)

vs.

    m = model(somecontract)
    d, g, v, t, r = m.delta, m.gamma, m.vega, m.theta, m.rho
I'm no longer sure whether you are aware that the os.stat result *already* has named fields, in addition to the indexed access.
Of course, that specific example was solved long ago. We did not however expose a general purpose mechanism applicable where similar issues arise for other tuples. Raymond
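Raymond's two unpacking styles, sketched with collections.namedtuple. The ModelResults type, the model() function and its numbers are hypothetical stand-ins for the proprietary code he mentions:

```python
from collections import namedtuple

# Hypothetical result type for the option-model example.
ModelResults = namedtuple('ModelResults', 'delta gamma vega theta rho')

def model(contract):
    """Hypothetical stand-in for a pricing model; returns fixed numbers."""
    return ModelResults(delta=0.5, gamma=0.1, vega=0.2, theta=-0.05, rho=0.03)

# Style 1: tuple unpacking, exactly as with a plain tuple.
d, g, v, t, r = model('somecontract')

# Style 2: attribute access when only one or two fields are needed.
m = model('somecontract')
print(m.vega)   # 0.2

# The repr names every field, which helps when debugging.
print(m)        # ModelResults(delta=0.5, gamma=0.1, vega=0.2, theta=-0.05, rho=0.03)
```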
Raymond Hettinger wrote:
d, g, v, t, r = model(somecontract)
I find that line quite unreadable, and find it likely that I would not be able to remember the standard order of the fields. You almost "had me" with the two-field example, but this makes me think "-1" again.

Is it really the case that you need all these values in the following computation? For stat, this was never the case: you would normally need only some of the fields (especially once the more esoteric, platform-dependent fields got added).

If you absolutely want tuple unpacking on a record-like object, I'd suggest supporting explicit tuple conversion, like

    d, g, v, t, r = model(somecontract).to_tuple()

Or, if you absolutely cannot stand the explicit tuple creation, add

    def __getitem__(self, index):
        return getattr(self, self.__slots__[index])
        # or is it self.[self.__slots__[index]] :-?

No need to inherit from tuple for that.
Of course, that specific example was solved long ago. We did not however expose a general purpose mechanism applicable where similar issues arise for other tuples.
As you've explained now, your use case is not similar. For os.stat, it's a means of transition and backwards compatibility. In your code, it seems you want it as a permanent feature. Regards, Martin
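Martin's alternative can be made runnable roughly as follows. The class name, field values and to_tuple method are hypothetical, following his sketch rather than any library. Defining __getitem__ alone is enough for tuple unpacking, because Python falls back to calling __getitem__ with 0, 1, 2, ... until it raises IndexError:

```python
class ModelResults:
    """Hypothetical record along the lines Martin suggests:
    named slots, no tuple base class, optional tuple conversion."""

    __slots__ = ('delta', 'gamma', 'vega', 'theta', 'rho')

    def __init__(self, delta, gamma, vega, theta, rho):
        self.delta, self.gamma, self.vega = delta, gamma, vega
        self.theta, self.rho = theta, rho

    def to_tuple(self):
        # Explicit conversion: build a real tuple in field order.
        return tuple(getattr(self, name) for name in self.__slots__)

    def __getitem__(self, index):
        # Map a position to its slot name; an out-of-range index raises
        # IndexError, which also ends the legacy __getitem__ iteration.
        return getattr(self, self.__slots__[index])


res = ModelResults(0.5, 0.1, 0.2, -0.05, 0.03)
d, g, v, t, r = res.to_tuple()   # explicit conversion
d2, g2, v2, t2, r2 = res         # unpacking via __getitem__ alone
```

The trade-off is that res is not a tuple, so code that checks isinstance(x, tuple) or relies on tuple hashing and comparison will not accept it as-is.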
Raymond Hettinger wrote:
d, g, v, t, r = model(somecontract)
[MvL]
I find that line quite unreadable
Of course, I can't give you the fully spelled-out line from proprietary code. But at this point we're just talking about the use cases for tuples with or without named attributes. Some functions return multiple values and some calls to those functions do tuple unpacking. That is ubiquitous throughout Python. If the tuple also happens to be a NamedTuple, you get tooltips for it (reminding you which fields are which) and any error messages will show the full repr with both the names and values.

If not unpacked, then the attribute access is helpful. Something like contract.vega or testresult.failures or somesuch.

Essentially, I'm proposing a variant of tuple that has self-documenting extra features:

- traditional positional-argument construction or optional keyword-argument construction
- annotated repr: Contract(type='put', strike=45, security='IBM', expirymonth=4) instead of: ('put', 45, 'IBM', 4)
- optional attribute access: contract.strike
- nice docstring for tooltips: 'Contract(type, strike, security, expirymonth)'

The use cases are the same as the ones for tuples. The new type is just more self-documenting. That's all there is to it.

FWIW, I've been using NamedTuples for at least six months and have found them to be a nice improvement over straight tuples in situations where I can't easily remember what each tuple position represents. If added to the collections module, I think NamedTuples will become quite popular.
If you absolutely want tuple unpacking on a record-like object, I'd suggest to support explicit tuple conversion, like
d, g, v, t, r = model(somecontract).to_tuple()
Entirely unnecessary. The goal is to have better tuples with low overhead and a near zero learning curve. Raymond
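The features Raymond lists map directly onto what collections.namedtuple later provided. A minimal sketch with his Contract example; the docstring format shown is what current CPython generates, not necessarily what the 2007 recipe produced:

```python
from collections import namedtuple

Contract = namedtuple('Contract', 'type strike security expirymonth')

by_position = Contract('put', 45, 'IBM', 4)
by_keyword = Contract(type='put', strike=45, security='IBM', expirymonth=4)

print(by_position == by_keyword)  # True: both construction styles agree
print(repr(by_position))          # Contract(type='put', strike=45, security='IBM', expirymonth=4)
print(by_position.strike)         # 45
print(Contract.__doc__)           # Contract(type, strike, security, expirymonth)
```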
"Raymond Hettinger" <python@rcn.com> wrote:
Raymond Hettinger wrote:
d, g, v, t, r = model(somecontract)
[MvL]
I find that line quite unreadable
Of course, I can't give you the fully spelled-out line from proprietary code. But at this point we're just talking about the use cases for tuples with or without named attributes. Some functions return multiple values and some calls to those functions do tuple unpacking. That is ubiquitous throughout Python. If the tuple also happens to be a NamedTuple, you get tooltips for it (reminding you which fields are which) and any error messages will show the full repr with both the names and values.
If not unpacked, then the attribute access is helpful. Something like contract.vega or testresult.failures or somesuch.
For what it's worth, I've actually been using a similar approach with lists and global names of list indices because I needed a mutable structure, the list instance was significantly smaller than an object with __slots__ (by a factor of 3), and because using global constants was actually competitive with a __slots__ name lookup. After having seen your tuple recipe, I've been planning on converting it to a list-based recipe for the same benefits (except for unpacking) in my own code. Then again, I'm also looking forward to adding the tuple-based recipe to my own library for all of the reasons you outlined. - Josiah
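Josiah's list-plus-constants approach might look roughly like this. He doesn't show code, so the field names, index constants, make_entry helper and the timestamp value below are all hypothetical:

```python
# Hypothetical module-level constants naming positions in a list record.
NAME = 0
HITS = 1
LAST_SEEN = 2

def make_entry(name, hits=0, last_seen=None):
    """Build a mutable record as a plain list, in index order."""
    return [name, hits, last_seen]

entry = make_entry('spam')
entry[HITS] += 1                 # mutate in place, which a tuple cannot do
entry[LAST_SEEN] = 1171584000    # arbitrary example timestamp
print(entry[NAME], entry[HITS])  # spam 1
```

As Josiah notes, this keeps mutability and a small footprint but gives up unpacking-friendly names and a self-describing repr.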
participants (10)
- Martin v. Löwis
- A.M. Kuchling
- Andrew Koenig
- Delaney, Timothy (Tim)
- Guido van Rossum
- Josiah Carlson
- Nick Coghlan
- Phillip J. Eby
- Raymond Hettinger
- skip@pobox.com