[Python-ideas] Optional kwarg making attrgetter & itemgetter always return a tuple

Masklinn masklinn at masklinn.net
Fri Sep 14 11:29:47 CEST 2012


On 2012-09-14, at 11:02, Steven D'Aprano wrote:
>> and that's for the
>> trivial version of itemgetter, attrgetter also does keypath resolution
>> so the code is nowhere near this simple.
> 
> I don't understand what you mean by "keypath resolution". attrgetter
> simply looks up the attribute(s) by name, just like obj.name would do. It
> has the same API as itemgetter, except with attribute names instead of
> item indexes.

It takes dotted paths, not just attribute names: attrgetter('a.b')(obj)
returns obj.a.b.
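
A minimal sketch of the dotted-path behaviour. The Point and Segment
classes are made up for illustration and only give attrgetter something
to traverse:

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical classes, used only for illustration.
Point = namedtuple("Point", "x y")
Segment = namedtuple("Segment", "start end")

seg = Segment(start=Point(1, 2), end=Point(3, 4))

# A dotted name is resolved one attribute at a time: seg.end.x
end_x = attrgetter("end.x")(seg)

# Several dotted paths give a tuple, exactly as several plain names do.
corners = attrgetter("start.x", "end.y")(seg)
```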

>> It's also anything but obvious what this snippet does on its own.
> 
> Once you get past the ternary if operator, the complexity is pretty much
> entirely in the call to itemgetter. You don't even use itemgetter in the
> else clause! Beyond the call to itemgetter, it's trivially simple Python
> code.
> 
> slicer = operator.itemgetter(*indices, force_tuple=flag)
> 
> is equally mysterious to anyone who doesn't know what itemgetter does.

I would expect a reader either to know itemgetter already or to look it
up once they see it used.

>>> If you don't like writing this out in place, write
>>> it once in a helper function. Not every short code snippet needs to be in
>>> the standard library.
>> 
>> It's not really "every short code snippet" in this case, it's a way to
>> avoid a sometimes deleterious special case and irregularity of the stdlib.
> 
> 
> I disagree that this is a "sometimes deleterious special case". itemgetter
> and attrgetter have two APIs:
> 
> itemgetter(index)(L) => element
> itemgetter(index, index, ...)(L) => tuple of elements
> 
> and likewise for attrgetter:
> 
> attrgetter(name)(L) => attribute
> attrgetter(name, name, ...)(L) => tuple of attributes
> 
> Perhaps it would have been better if there were four functions rather than
> two. Or if the second API were:
> 
> itemgetter(sequence_of_indexes)(L) => tuple of elements
> attrgetter(sequence_of_names)(L) => tuple of attributes
> 
> so that the two getters always took a single argument, and dispatched on
> whether that argument is an atomic value or a sequence. But either way,
> it is not what I consider a "special case" so much as two related non-
> special cases.

The two conflict for a sequence of length 1, which is the very reason
I started this thread.
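
A short sketch of that conflict, assuming the indices arrive from
elsewhere so their number is not known at the call site (`row` and `pick`
are illustrative names, not from the thread):

```python
from operator import itemgetter

row = ["a", "b", "c", "d"]

def pick(indices):
    # The caller decides how many indices there are; we just unpack them.
    return itemgetter(*indices)(row)

two = pick([0, 2])  # two indices: a tuple of elements
one = pick([1])     # one index: the bare element, not a 1-tuple
```

Code that consumes the result of `pick` therefore has to handle two
different shapes depending on the length of its input.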

> But let's not argue about definitions. Special case or not, can you
> demonstrate that the situation is not only deleterious, but cannot be
> reasonably fixed with a helper function?

That hinges, as usual, on the definition of "reasonably". Of course the
situation can be "fixed" (with "reasonably" being a wholly personal
value judgement) with a helper function or by reimplementing an
(attr|item)getter-like function from scratch, as pretty much anything
can be. I don't see that as a very useful benchmark.

> Whenever you call itemgetter, there is no ambiguity because you always know
> whether you are calling it with a single index or multiple indexes.

That is not quite correct. Even ignoring that you have to call `len` to
find out when the indices are provided by a third party, the correct code
gets more complex still: the third party could provide an iterator, which
would have to be reified before being passed to len(), increasing the
complexity of the "helper" yet again.
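
A sketch of what such a helper ends up looking like. The name
`tuple_itemgetter` is hypothetical, and returning an empty tuple for no
indices is an assumption of this sketch (itemgetter itself rejects being
called with no arguments):

```python
from operator import itemgetter

def tuple_itemgetter(indices):
    """Hypothetical helper: like itemgetter(*indices), but the getter
    always returns a tuple."""
    # Reify first: an iterator has no len() and can only be consumed once.
    indices = tuple(indices)
    if not indices:
        # Assumption of this sketch: no indices means an empty tuple.
        return lambda obj: ()
    if len(indices) == 1:
        getter = itemgetter(indices[0])
        # Wrap the single element so the result is still a tuple.
        return lambda obj: (getter(obj),)
    return itemgetter(*indices)
```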

>>>> This makes for more verbose and less straightforward code, I think it
>>>> would be useful to such situations if attrgetter and itemgetter could be
>>>> forced into always returning a tuple by way of an optional argument:
>>> 
>>> -1
>>> 
>>> There is no need to add extra complexity to itemgetter and attrgetter for
>>> something best solved in your code.
>> 
>> I don't agree with this statement, the stdlib flag adds very little
>> extra complexity, way less than the original irregularity/special case
> 
> Whether or not it is empirically less than the complexity already there in
> itemgetter, it would still be adding extra complexity. It simply isn't
> possible to end up with *less* complexity by *adding* features.

At no point did I deny that, as far as I know or can see.

> (Complexity is not always a bad thing. If we wanted to program in something
> simple, we would program using a Turing machine.)
> 
> The reader now has to consider "what does the force_tuple argument do?"
> which is not necessarily trivial nor obvious. I expect a certain number of
> beginners who don't read documentation will assume that you have to do this:
> 
> slicer = itemgetter(1, 2, 3, force_tuple=False)
> 
> if they want to pass something other than a tuple to slicer. Don't imagine
> that adding an additional argument will make itemgetter and attrgetter
> *simpler* to understand.
> 
> 
> To me, a major red-flag for your suggested API can be seen here:
> 
> itemgetter(1, 2, 3, 4, force_tuple=False)
> 
> What should this do?

The exact same as `itemgetter(1, 2, 3, 4)`, since `force_tuple` defaults
to False.
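
To make the intended semantics concrete, here is a sketch written as a
wrapper. `force_tuple` is only the proposed argument and does not exist
in the operator module; `itemgetter_ft` is a made-up name for this
illustration:

```python
from operator import itemgetter

def itemgetter_ft(*items, force_tuple=False):
    """Sketch of the proposed behaviour; NOT part of the operator module."""
    getter = itemgetter(*items)
    if force_tuple and len(items) == 1:
        # The only case the flag changes: wrap the lone element in a tuple.
        return lambda obj: (getter(obj),)
    # Default, or several items: identical to plain itemgetter.
    return getter

letters = "abcdef"
same = itemgetter_ft(1, 2, 3, 4, force_tuple=False)(letters)
bare = itemgetter_ft(1)(letters)
wrapped = itemgetter_ft(1, force_tuple=True)(letters)
```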

> I consider all the alternatives to be less than
> ideal:
> 
> - ignore the explicit keyword argument and return a tuple anyway
> - raise an exception
> 
> To say nothing of more... imaginative... semantics:
> 
> - return a list, or a set, anything but a tuple
> - return a single element instead of four (but which one?)

I have trouble seeing how such interpretations can be drawn up from
explicitly providing the default value for the argument. Does anyone
really expect dict.get(key, None) to always return None?

> The suggested API is not as straight-forward as you seem to think it is.

It's simply a proposal to fix what I see as an issue (as befits
python-ideas); you're getting way too hung up on a detail which can
quite trivially be discussed and changed.

>> and way less than necessary to do it outside the stdlib. Furthermore, it
>> makes the solution (to having a regular output behavior for
>> (attr|item)getter) far more obvious and makes the code itself much simpler
>> to read.
> 
> The only thing I will grant is that it aids in discoverability of a
> solution

It also aids in the discoverability of the problem in the first place, and
in limiting the surprise when unexpectedly encountering it for the first
time.



