Optional kwarg making attrgetter & itemgetter always return a tuple

attrgetter and itemgetter are both very useful functions, but both have a significant pitfall when the arguments passed in are validated but not controlled: if the list of attributes, keys or indexes comes from an external source and is *-applied, and that source passes a sequence of one element, both functions return a bare element rather than a singleton (1-element tuple).

This means that code such as a routine "slicing" a matrix of some sort to extract only some columns, with the slicing information supplied by its caller (in a situation where extracting a single column may be perfectly sensible), has to dispatch manually between a "manual" getitem (or getattr) and an itemgetter (resp. attrgetter) call, e.g.

    slicer = (operator.itemgetter(*indices)
              if len(indices) > 1
              else lambda ar: [ar[indices[0]]])

This makes for more verbose and less straightforward code. I think it would be useful in such situations if attrgetter and itemgetter could be forced to always return a tuple by way of an optional argument:

    # works the same no matter what len(indices) is
    slicer = operator.itemgetter(*indices, force_tuple=True)

In the equivalent Python code given in the documentation [0], this would be an override (to False) of the `len` check: `len(items) == 1` would become `len(items) == 1 and not force_tuple`.

The argument is backward-compatible, as neither function currently accepts any keyword argument.

Uncertainty note: whether force_tuple (or whatever it ends up being named) should silence the error raised when len(indices) == 0 and return an empty tuple rather than raising a TypeError.

[0] http://docs.python.org/dev/library/operator.html#operator.attrgetter
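(For illustration only: a minimal pure-Python sketch of the proposed behaviour. Neither the force_tuple keyword nor this implementation exists in the operator module; the name is just a placeholder for the idea.)

    def itemgetter_sketch(*items, force_tuple=False):
        # Hypothetical variant: with force_tuple=True the result is
        # always a tuple, even when a single index is given.
        if len(items) == 1 and not force_tuple:
            item = items[0]
            return lambda obj: obj[item]
        return lambda obj: tuple(obj[i] for i in items)

    # The caller no longer needs to special-case len(indices) == 1:
    # slicer = itemgetter_sketch(*indices, force_tuple=True)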

On 9/13/2012 9:15 AM, Masklinn wrote:
This seems like a plausible idea. The actual C version requires one argument. The Python equivalent in the doc does not (hence the different signature), as it would return an empty tuple for empty *items. -- Terry Jan Reedy

On 13/09/12 23:15, Masklinn wrote:
For those who, like me, had to read this three or four times to work out what Masklinn is talking about, I think he is referring to the fact that attrgetter and itemgetter both return a single element if passed a single index, otherwise they return a tuple of results. If a call itemgetter(*args)(some_list) returns a tuple, was that tuple a single element (and args contained a single index) or was the tuple a collection of individual elements (and args contained multiple indexes)?

    py> itemgetter(*[1])(['a', ('b', 'c'), 'd'])
    ('b', 'c')
    py> itemgetter(*[1, 2])(['a', 'b', 'c', 'd'])
    ('b', 'c')
Why is this a problem? If you don't like writing this out in place, write it once in a helper function. Not every short code snippet needs to be in the standard library.
-1

There is no need to add extra complexity to itemgetter and attrgetter for something best solved in your code. Write a helper:

    def slicer(*indexes):
        getter = itemgetter(*indexes)
        if len(indexes) == 1:
            return lambda seq: (getter(seq), )  # Wrap in a tuple.
        return getter

-- Steven

On 2012-09-14, at 03:20 , Steven D'Aprano wrote:
Because it adds significant complexity to the code, and that's just for the trivial itemgetter version; attrgetter also does keypath resolution, so the code is nowhere near this simple. It's also anything but obvious what this snippet does on its own.
It's not really "every short code snippet" in this case, it's a way to avoid a sometimes deleterious special case and irregularity of the stdlib.
I don't agree with this statement: the stdlib flag adds very little extra complexity, far less than the original irregularity/special case and far less than what is needed to do it outside the stdlib. Furthermore, it makes the solution (a regular output behaviour for (attr|item)getter) far more obvious and makes the code itself much simpler to read.

On 14/09/12 17:43, Masklinn wrote:
I don't consider that to be *significant* complexity.
I don't understand what you mean by "keypath resolution". attrgetter simply looks up the attribute(s) by name, just like obj.name would do. It has the same API as itemgetter, except with attribute names instead of item indexes.
It's also anything but obvious what this snippet does on its own.
Once you get past the ternary if operator, the complexity is pretty much entirely in the call to itemgetter. You don't even use itemgetter in the else clause! Beyond the call to itemgetter, it's trivially simple Python code.

    slicer = operator.itemgetter(*indices, force_tuple=flag)

is equally mysterious to anyone who doesn't know what itemgetter does.
I disagree that this is a "sometimes deleterious special case". itemgetter and attrgetter have two APIs:

    itemgetter(index)(L)             => element
    itemgetter(index, index, ...)(L) => tuple of elements

and likewise for attrgetter:

    attrgetter(name)(L)            => attribute
    attrgetter(name, name, ...)(L) => tuple of attributes

Perhaps it would have been better if there were four functions rather than two. Or if the second API were:

    itemgetter(sequence_of_indexes)(L) => tuple of elements
    attrgetter(sequence_of_names)(L)   => tuple of attributes

so that the two getters always took a single argument, and dispatched on whether that argument is an atomic value or a sequence. But either way, it is not what I consider a "special case" so much as two related non-special cases.

But let's not argue about definitions. Special case or not, can you demonstrate that the situation is not only deleterious, but cannot be reasonably fixed with a helper function? Whenever you call itemgetter, there is no ambiguity because you always know whether you are calling it with a single index or multiple indexes.
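(A rough sketch of that hypothetical single-argument form, purely to illustrate the dispatch described above; no such function exists in the operator module:)

    def itemgetter_seq(arg):
        # Dispatch on whether the argument is a sequence of indexes
        # or a single atomic index.
        if isinstance(arg, (list, tuple)):
            indexes = tuple(arg)
            return lambda obj: tuple(obj[i] for i in indexes)
        return lambda obj: obj[arg]

    itemgetter_seq(1)(['a', 'b', 'c'])       # -> 'b'
    itemgetter_seq([1, 2])(['a', 'b', 'c'])  # -> ('b', 'c')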
Whether or not it is empirically less than the complexity already there in itemgetter, it would still be adding extra complexity. It simply isn't possible to end up with *less* complexity by *adding* features. (Complexity is not always a bad thing. If we wanted to program in something simple, we would program using a Turing machine.)

The reader now has to consider "what does the force_tuple argument do?", which is not necessarily trivial nor obvious. I expect a certain number of beginners who don't read documentation will assume that you have to do this:

    slicer = itemgetter(1, 2, 3, force_tuple=False)

if they want to pass something other than a tuple to slicer. Don't imagine that adding an additional argument will make itemgetter and attrgetter *simpler* to understand.

To me, a major red flag for your suggested API can be seen here:

    itemgetter(1, 2, 3, 4, force_tuple=False)

What should this do? I consider all the alternatives to be less than ideal:

- ignore the explicit keyword argument and return a tuple anyway
- raise an exception

To say nothing of more... imaginative... semantics:

- return a list, or a set, anything but a tuple
- return a single element instead of four (but which one?)

The suggested API is not as straight-forward as you seem to think it is.
The only thing I will grant is that it aids in discoverability of a solution: you don't have to think of the (trivial) solution yourself, you just need to read the documentation. But I don't see either the problem or the solution to be great enough to justify adding an argument, writing new documentation, and doubling the number of tests for both itemgetter and attrgetter. -- Steven

On 2012-09-14, at 11:02 , Steven D'Aprano wrote
It takes dotted paths, not just attribute names.
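(For readers unfamiliar with that feature: attrgetter resolves dotted names as chained attribute lookups. The classes below are made up purely for illustration.)

    from operator import attrgetter

    class Address:
        def __init__(self, city):
            self.city = city

    class User:
        def __init__(self, city):
            self.address = Address(city)

    attrgetter('address.city')(User('Brisbane'))  # -> 'Brisbane'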
I would expect either foreknowledge or reading up on it to be obvious in the context of its usage.
Which conflict for a sequence of length 1, which is the very reason why I started this thread.
Which, as usual, hinges on the definition of "reasonably"; of course the situation can be "fixed" ("reasonably" being a wholly personal value judgement) with a helper function or a reimplementation of an (attr|item)getter-like function from scratch, as it pretty much always can be. I don't see that as a very useful benchmark.
Whenever you call itemgetter, there is no ambiguity because you always know whether you are calling it with a single index or multiple indexes.
That is not quite correct: even ignoring that you have to call `len` to do so when the indices are provided by a third party, the correct code gets yet more complex, as the third party could provide an iterator which would have to be reified before being passed to len(), increasing the complexity of the "helper" yet again.
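(A sketch of what the helper grows into once iterator inputs are accounted for; the name tuple_slicer and the exact shape are illustrative only:)

    from operator import itemgetter

    def tuple_slicer(indices):
        indices = tuple(indices)  # reify: the caller may pass an iterator
        if len(indices) == 1:
            getter = itemgetter(indices[0])
            return lambda seq: (getter(seq),)
        return itemgetter(*indices)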
At no point did I deny that, as far as I know or can see.
The exact same as `itemgetter(1, 2, 3, 4)`, since `force_tuple` defaults to False.
I have trouble seeing how such interpretations can be drawn up from explicitly providing the default value for the argument. Does anyone really expect dict.get(key, None) to always return None?
The suggested API is not as straight-forward as you seem to think it is.
It's simply a proposal to fix what I see as an issue (as befits python-ideas); you're getting way too hung up on something which can quite trivially be discussed and changed.
It also aids in the discoverability of the problem in the first place, and in limiting the surprise when unexpectedly encountering it for the first time.

On Sep 13, 11:15 pm, Masklinn <maskl...@masklinn.net> wrote:
    # works the same no matter what len(indices) is
    slicer = operator.itemgetter(*indices, force_tuple=True)
I'd be inclined to write that as:

    slicer = force_tuple(operator.itemgetter(*indices))

With force_tuple then just being another decorator.
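(One possible reading of that suggestion, with force_tuple as a plain wrapper; this is hypothetical, and note that it cannot tell a single tuple-valued item apart from a genuine multi-item result, which is the very ambiguity under discussion:)

    from operator import itemgetter

    def force_tuple(getter):
        # Wrap a getter so that a non-tuple result is wrapped in a tuple.
        def wrapped(obj):
            result = getter(obj)
            return result if isinstance(result, tuple) else (result,)
        return wrapped

    indices = [1]  # sample input
    slicer = force_tuple(itemgetter(*indices))
    slicer(['a', 'b', 'c'])  # -> ('b',)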

On Thu, Sep 13, 2012 at 11:15 PM, Masklinn <masklinn@masklinn.net> wrote:
Both attrgetter and itemgetter are really designed to be called with *literal* arguments, not via *args. In particular, they are designed to be useful as arguments bound to a "key" parameter, where the object vs singleton tuple distinction doesn't matter.

If that behaviour is not desirable, *write a different function* that does what you want, and don't use itemgetter or attrgetter at all. These tools are designed as convenience functions for a particular use case (specifically sorting, and similar ordering operations). Outside those use cases, you will need to drop back down to the underlying building blocks and produce your *own* tool from the same raw materials. For example:

    def my_itemgetter(*subscripts):
        def f(obj):
            return tuple(obj[x] for x in subscripts)
        return f

I agree attrgetter is slightly more complex due to the fact that it *also* handles chained lookups, where getattr does not, but that's a matter of making the case for providing chained lookup (or even str.format style field value lookup) as a more readily accessible building block, not for making the attrgetter API more complicated.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
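(A sketch of what such a chained-lookup building block might look like; the names resolve_attr and my_attrgetter are made up for illustration:)

    from functools import reduce

    def resolve_attr(obj, dotted_name):
        # Chained attribute lookup: resolve_attr(x, 'a.b.c') == x.a.b.c
        return reduce(getattr, dotted_name.split('.'), obj)

    def my_attrgetter(*names):
        def f(obj):
            return tuple(resolve_attr(obj, name) for name in names)
        return f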

On 2012-09-14, at 13:01 , Nick Coghlan wrote:
It was my understanding that they are also designed to be useful for mapping (such a usage is shown in itemgetter's examples), which is a superset of the use case outlined here.
And save for one stumbling block, they are utilities I love for their convenience and their plain clarity of purpose.

On 14 September 2012 12:36, Masklinn <masklinn@masklinn.net> wrote:
I can see why you would expect different behaviour here, though. I tend not to think of the functions in the operator module as convenience functions but as *efficient* nameable functions referring to operations that are normally invoked with a non-function syntax. Which is more convenient out of the following:

1) using operator

    import operator
    result = sorted(values, key=operator.attrgetter('name'))

2) using lambda

    result = sorted(values, key=lambda v: v.name)

I don't think that the operator module is convenient and I think that it damages readability in many cases. My primary reason for choosing it in some cases is that it is more efficient than the lambda expression.

There is no special syntax for 'get several items as a tuple'. I didn't know about this extended use for attrgetter, itemgetter. I can't see any other functions in the operator module (abs, add, and_, ...) that extend the semantics of the operation they are supposed to represent in this way. In general it is bad to conflate scalar/sequence semantics so that a caller should get a different type of object depending on the length of a sequence.

I can see how practicality beats purity in adding this feature for people who want to use these functions for sorting by a couple of elements/attributes. I think it would have been better though to add these as separate functions itemsgetter and attrsgetter that always return tuples.

Oscar
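(Roughly what that itemsgetter/attrsgetter spelling could look like; neither exists in the operator module, and the code is only a sketch:)

    from operator import attrgetter

    def itemsgetter(*items):
        # Always returns a tuple, even for a single item.
        return lambda obj: tuple(obj[i] for i in items)

    def attrsgetter(*names):
        getters = [attrgetter(name) for name in names]  # keeps dotted-path support
        return lambda obj: tuple(g(obj) for g in getters)

    itemsgetter(1)(['a', 'b', 'c'])  # -> ('b',)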

On 9/14/12, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
I would normally write that as

    from operator import attrgetter as attr
    ...  # may use it several times
    result = sorted(values, key=attr('name'))

which is about the best I could hope for, without being able to use the dot itself.
2) using lambda result = sorted(values, key=lambda v: v.name)
And I honestly think that would be worse, even if lambda didn't have a code smell. It focuses attention on the fact that you're creating a callable, instead of on the fact that you're grabbing the name attribute.
Yeah, but that can't really be solved well in Python, except maybe by never extending an API to handle sequences, and I would personally not consider that an improvement. Part of the problem is that the cleanest way to take a variable number of arguments is to turn them into a sequence under the covers (*args), even if they weren't passed that way. -jJ

On Fri, Sep 14, 2012 at 9:36 PM, Masklinn <masklinn@masklinn.net> wrote:
The "key" style usage was definitely the primary motivator, which is why the ambiguity in the *args case wasn't noticed. If it *had* been noticed, the multiple argument support likely never would have been added. As it is, the *only* case where the ambiguity causes problems is when you want to use *args with these functions. Since they weren't built with that style of usage in mind, they don't handle it well. Making them even *more* complicated to work around an earlier design mistake doesn't seem like a good idea. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

participants (7):
- alex23
- Jim Jewett
- Masklinn
- Nick Coghlan
- Oscar Benjamin
- Steven D'Aprano
- Terry Reedy