[Python-ideas] Optional kwarg making attrgetter & itemgetter always return a tuple

Fri Sep 14 11:02:54 CEST 2012

On 14/09/12 17:43, Masklinn wrote:
> On 2012-09-14, at 03:20 , Steven D'Aprano wrote:
>>> This means such code, for instance code "slicing" a matrix of some sort
>>> to get only some columns and getting the slicing information from its
>>> caller (in situation where extracting a single column may be perfectly
>>> sensible) will have to implement a manual dispatch between a "manual"
>>> getitem (or getattr) and an itemgetter (resp. attrgetter) call, e.g.
>>>
>>>      slicer = (operator.itemgetter(*indices) if len(indices) > 1
>>>                else lambda ar: [ar[indices[0]]])
>>
>>
>> Why is this a problem?
>
> Because it adds significant complexity to the code,

I don't consider that to be *significant* complexity.

> and that's for the
> trivial version of itemgetter, attrgetter also does keypath resolution
> so the code is nowhere near this simple.

I don't understand what you mean by "keypath resolution". attrgetter
simply looks up the attribute(s) by name, just like obj.name would do. It
has the same API as itemgetter, except with attribute names instead of
item indexes.

> It's also anything but obvious what this snippet does on its own.

Once you get past the ternary if operator, the complexity is pretty much
entirely in the call to itemgetter. You don't even use itemgetter in the
else clause! Beyond the call to itemgetter, it's trivially simple Python
code.

slicer = operator.itemgetter(*indices, force_tuple=flag)

is equally mysterious to anyone who doesn't know what itemgetter does.

>> If you don't like writing this out in place, write
>> it once in a helper function. Not every short code snippet needs to be in
>> the standard library.
>
> It's not really "every short code snippet" in this case, it's a way to
> avoid a sometimes deleterious special case and irregularity of the stdlib.

I disagree that this is a "sometimes deleterious special case". itemgetter
and attrgetter have two APIs:

itemgetter(index)(L) => element
itemgetter(index, index, ...)(L) => tuple of elements

and likewise for attrgetter:

attrgetter(name)(L) => attribute
attrgetter(name, name, ...)(L) => tuple of attributes
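
As a quick illustration of the two calling conventions listed above, here is a small sketch using only the standard operator module. The sample data (row, Point, p) is made up for the example.

```python
from collections import namedtuple
from operator import attrgetter, itemgetter

# Illustrative data only; any indexable object or object with attributes works.
row = ["a", "b", "c", "d"]
Point = namedtuple("Point", ["x", "y"])
p = Point(x=10, y=20)

# One index or name: the getter returns the bare element or attribute.
single_item = itemgetter(1)(row)      # 'b'
single_attr = attrgetter("x")(p)      # 10

# Several indexes or names: the getter returns a tuple.
several_items = itemgetter(1, 3)(row)     # ('b', 'd')
several_attrs = attrgetter("x", "y")(p)   # (10, 20)
```

The return shape depends only on how many arguments the getter was built with, not on the object it is later applied to.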

Perhaps it would have been better if there were four functions rather than
two. Or if the second API were:

itemgetter(sequence_of_indexes)(L) => tuple of elements
attrgetter(sequence_of_names)(L) => tuple of attributes

so that the two getters always took a single argument, and dispatched on
whether that argument is an atomic value or a sequence. But either way,
it is not what I consider a "special case" so much as two related
non-special cases.
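
For concreteness, the single-argument alternative described above might look roughly like this. It is only a sketch: seq_itemgetter is a hypothetical name, not anything in the operator module, and treating only lists and tuples as "sequences" is an assumption made for the example.

```python
from operator import itemgetter

def seq_itemgetter(spec):
    """Hypothetical single-argument getter (not part of the stdlib).

    A list or tuple of indexes gives a getter that returns a tuple;
    anything else is treated as one atomic index.
    """
    if isinstance(spec, (list, tuple)):
        indexes = tuple(spec)
        return lambda obj: tuple(obj[i] for i in indexes)
    return itemgetter(spec)

first = seq_itemgetter(1)("abcd")          # bare element
only = seq_itemgetter([1])("abcd")         # one-element tuple
pair = seq_itemgetter([1, 3])("abcd")      # two-element tuple
```

One cost of this design is that a tuple can no longer be used as an atomic key, for example when indexing a dict keyed by tuples.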

But let's not argue about definitions. Special case or not, can you
demonstrate that the situation is not only deleterious, but cannot be
reasonably fixed with a helper function?

Whenever you call itemgetter, there is no ambiguity because you always know
whether you are calling it with a single index or multiple indexes.
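
For reference, a helper of the kind mentioned earlier can be quite small. The following is a minimal sketch, not a proposal: always_tuple is a hypothetical name, and it simply wraps the standard getter when exactly one key is given.

```python
from operator import attrgetter, itemgetter

def always_tuple(getter_factory, *keys):
    """Hypothetical helper: build a getter whose result is always a tuple.

    getter_factory is itemgetter or attrgetter; the lookup itself is
    delegated to it, so its behaviour is otherwise unchanged.
    """
    getter = getter_factory(*keys)
    if len(keys) == 1:
        # The single-key getter returns a bare value, so wrap it.
        return lambda obj: (getter(obj),)
    # With several keys the standard getter already returns a tuple.
    return getter

one_column = always_tuple(itemgetter, 2)("abcd")
two_columns = always_tuple(itemgetter, 0, 2)("abcd")
one_attribute = always_tuple(attrgetter, "real")(3 + 4j)
```

A caller that receives its indexes from elsewhere can then write always_tuple(itemgetter, *indices) without dispatching on len(indices). As with the standard getters, calling it with no keys raises TypeError.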

>>> This makes for more verbose and less straightforward code, I think it
>>> would be useful to such situations if attrgetter and itemgetter could be
>>> forced into always returning a tuple by way of an optional argument:
>>
>> -1
>>
>> There is no need to add extra complexity to itemgetter and attrgetter for
>> something best solved in your code.
>
> I don't agree with this statement, the stdlib flag adds very little
> extra complexity, way less than the original irregularity/special case

Whether or not it is empirically less than the complexity already there in
itemgetter, it would still be adding extra complexity. It simply isn't
possible to end up with *less* complexity by *adding* features.

(Complexity is not always a bad thing. If we wanted to program in something
simple, we would program using a Turing machine.)

The reader now has to consider "what does the force_tuple argument do?"
which is not necessarily trivial nor obvious. I expect a certain number of
beginners who don't read documentation will assume that you have to do this:

slicer = itemgetter(1, 2, 3, force_tuple=False)

if they want to pass something other than a tuple to slicer. Don't imagine
that adding an additional argument will make itemgetter and attrgetter
*simpler* to understand.

To me, a major red-flag for your suggested API can be seen here:

itemgetter(1, 2, 3, 4, force_tuple=False)

What should this do? I consider all the alternatives to be less than
ideal:

- ignore the explicit keyword argument and return a tuple anyway
- raise an exception

To say nothing of more... imaginative... semantics:

- return a list, or a set, anything but a tuple
- return a single element instead of four (but which one?)

The suggested API is not as straightforward as you seem to think it is.
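
To make the first two alternatives concrete, here is a rough sketch of a wrapper that accepts the proposed flag. Everything in it is an assumption for illustration: itemgetter_ft, force_tuple and strict are hypothetical names, and operator.itemgetter accepts no such keywords.

```python
from operator import itemgetter

def itemgetter_ft(*items, force_tuple=None, strict=False):
    """Hypothetical wrapper illustrating the proposed flag (not stdlib API).

    force_tuple=None means "not specified", so an explicit False can be
    told apart from the default.
    """
    getter = itemgetter(*items)
    if len(items) > 1:
        if strict and force_tuple is False:
            # Alternative 2: reject the contradictory combination.
            raise ValueError("force_tuple=False conflicts with multiple items")
        # Alternative 1: ignore the flag and return a tuple anyway.
        return getter
    if force_tuple:
        return lambda obj: (getter(obj),)
    return getter

ignored = itemgetter_ft(1, 2, 3, force_tuple=False)("abcde")
forced = itemgetter_ft(1, force_tuple=True)("abcde")
```

Note that raising an exception is only workable if an explicit False can be distinguished from the default, which is why the sketch uses None as the default value.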

> and way less than necessary to do it outside the stdlib. Furthermore, it
> makes the solution (to having a regular output behavior for
> (attr|item)getter) far more obvious and makes the code itself much simpler
> to read.

The only thing I will grant is that it aids in discoverability of a
solution: you don't have to think of the (trivial) solution yourself, you
just need to read the documentation. But I don't see either the problem
or the solution to be great enough to justify adding an argument, writing
new documentation, and doubling the number of tests for both itemgetter and
attrgetter.

-- 
Steven