Python's simplicity philosophy

Mon Nov 17 05:45:46 EST 2003

Bengt Richter wrote:
   ...
>>Nope -- itertools is not about CONSUMERS of iterators, which this one
>>would be.  All itertools entries RETURN iterators.

> Ok, but what about returning an iterator -- e.g., funumerate(f, seq) --
> that supplies f(x),x pairs like enumerate(seq) supplies i,x?

That doesn't seem to conflict with Raymond Hettinger's vision (he's the
itertools guru), but you might check with him directly.

> [I'd suggest extending enumerate, but I already want to pass optional
> range parameters there, so one could control the numeric values
> returned, e.g., enumerate(seq,<params>) corresponding to
> zip(xrange(<params>),seq)].

I know Raymond wants to be able to control the starting point, if Guido
will allow that extension (RH is also enumerate's author), but I don't
know whether he's interested in the stride and stop values, too.
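For illustration only: in later Python versions enumerate did gain an optional start argument (Python 2.6), but never stride or stop. The helper `enumerate_range` below is hypothetical, not a builtin; it is a minimal sketch of the zip(xrange(<params>), seq) correspondence described above, written for modern Python where zip and range are already lazy.

```python
from itertools import count

def enumerate_range(seq, start=0, stop=None, step=1):
    # Hypothetical helper (not a builtin): number the items of seq with
    # start, start+step, ...; if stop is given, numbering (and pairing)
    # ends before reaching it, as with zip(range(start, stop, step), seq).
    if stop is None:
        numbers = count(start, step)
    else:
        numbers = range(start, stop, step)
    return zip(numbers, seq)

print(list(enumerate("abc", 1)))                 # [(1, 'a'), (2, 'b'), (3, 'c')]
print(list(enumerate_range("abc", 10, step=5)))  # [(10, 'a'), (15, 'b'), (20, 'c')]
print(list(enumerate_range("abcd", 10, 16, 3)))  # [(10, 'a'), (13, 'b')]
```

Using count() when no stop is given keeps the numbering unbounded, so the pairing simply ends when seq is exhausted.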

> [BTW, should xrange() default to xrange(0,sys.maxint,1)?]

I doubt xrange has enough of a future to be worth extending it.  I'd
rather have an irange returning an iterator (and given that itertools.count
already does basically the job you mention, treating irange without args
as an error may be more useful).  No special reason to single out sys.maxint
when the int/long distinction is rapidly withering, btw.
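A minimal sketch of the irange idea, assuming modern Python: `irange` is a hypothetical name, here built on the lazy range() so there is no sys.maxint limit, and it rejects a call with no arguments as suggested above; the unbounded case is left to itertools.count.

```python
from itertools import count, islice

def irange(*args):
    # Hypothetical irange: accepts range()'s arguments, always returns an
    # iterator, and treats a call without arguments as an error.
    if not args:
        raise TypeError("irange expected at least 1 argument, got 0")
    return iter(range(*args))

# For an unbounded counter, itertools.count already does the job.
first_three = list(islice(count(), 3))
print(list(irange(3)), first_three)   # [0, 1, 2] [0, 1, 2]
```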

>>Given that the sort method of lists now has an optional key= argument, I

Sorry, I try to remember to always say something like "now (in the 2.4
pre-alpha on CVS)" but sometimes I do forget to repeat this every time I
mention 2.4 novelties.

> This is a new one on me:
>  >>> seq.sort(key=lambda x:x)
>  Traceback (most recent call last):
>    File "<stdin>", line 1, in ?
>  TypeError: sort() takes no keyword arguments

In 2.3, yes.  Get a 2.4 CVS snapshot and you'll see it work.

> Do you mean the comparison function? Or is there something else now too?
> I'm beginning to infer that key= is actually a keyword arg for a
> _function_ to get a "key" value from a composite object (in which case

Right.

> ISTM "getkeyvalue" or "valuefunc" would be a better name). But IMO "key"
> suggests it will be used on elements x like x[key], not passing a
> definition key=whatever and then using key(x) to get values.

The concept and terminology of a "sort key" or "sorting key" is very
popular, so the concise parameter name 'key' was chosen in preference
to your longer suggestions.  E.g., to sort a list of strings in order
of increasing string length (and otherwise stably),

  thelist.sort(key=len)

was deemed preferable to

  thelist.sort(getkeyvalue=len)
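A small example of the key=len sort mentioned above, as it runs in current Python (key= shipped with 2.4); the word list is made up for illustration.

```python
words = ["ccc", "a", "bb", "d", "ee"]
words.sort(key=len)
# Ordered by length; strings of equal length keep their original
# relative order because the sort is stable.
print(words)  # ['a', 'd', 'bb', 'ee', 'ccc']
```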

Considering that the key= parameter is meant to replace most uses of
the decorate-sort-undecorate (DSU) idiom, and is thus likely to become
extremely popular, I concur with the current choice for the name.
However, it's not etched in stone yet:
2.4 is at the pre-alpha stage.  You can check the python-dev archives
for past discussions of the issue, then join python-dev to offer your
contributions, if you think they're helpful; this is a course of action
that is always open to anybody, of course.
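For readers unfamiliar with DSU, here is a minimal sketch of the idiom that key= is meant to replace; `sort_dsu` is a hypothetical name. Including the original index in each decorated tuple keeps the result stable and ensures the items themselves are never compared, which matches what sort(key=...) guarantees.

```python
def sort_dsu(items, keyfunc):
    # Decorate: pair each item with its key and its original index.
    decorated = [(keyfunc(x), i, x) for i, x in enumerate(items)]
    # Sort: tuples compare by key first, then by index (indices are unique).
    decorated.sort()
    # Undecorate: keep only the original items, now in key order.
    return [x for _, _, x in decorated]

data = ["pear", "fig", "banana", "kiwi"]
print(sort_dsu(data, len))                       # ['fig', 'pear', 'kiwi', 'banana']
print(sort_dsu(data, len) == sorted(data, key=len))  # True
```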

>>think the obvious approach would be to add the same optional argument to
>>min
>>and max with exactly the same semantics.  I.e., just as today:
>>
>>x = max(somelist)
>>somelist.sort()
>>assert x == somelist[-1]
>>
>>we'd also have
>>
>>x = max(somelist, key=whatever)
>>somelist.sort(key=whatever)
>>assert x == somelist[-1]
> I think I like it, other than the name. Maybe s/key/valuefunc/ ;-)

If max and min do acquire such an optional argument, it will of course
have the same name and semantics as it does for sort.
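As it turned out, max and min did acquire key= (in Python 2.5). One detail worth checking against the invariant above: current Python documents that max returns the first maximal item it encounters, whereas a stable sort leaves the last of several tied items at the end. So with ties, max and sort agree on the key value but may pick different items, while min and somelist[0] do agree. The list below is only an illustration.

```python
somelist = ["kiwi", "fig", "banana", "cherry"]
x = max(somelist, key=len)
y = min(somelist, key=len)
somelist.sort(key=len)
assert y == somelist[0]             # min and a stable sort agree on the first item
assert len(x) == len(somelist[-1])  # max and sort agree on the largest key...
print(x, somelist[-1])              # banana cherry  (...but pick different tied items)
```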

>>That would be max(seq, key=len) in my proposal.
> 
> That's a nice option for max (and min, and ??), but ISTM that it would

max and min are it (unless heapq or bisect grow new _classes_ where such
comparison-flexibility would also be natural to have; in the current,
function-centered state of heapq, such an optional argument would not
fit well, and the situation for bisect is not crystal-clear either way).

> also be nice to have a factory for efficient iterators of this kind.
> It would probably be pretty efficient then to write
> 
>     maxlen, maxitem = max(funumerate(len,seq))

Yes, roughly as efficient as it is today (2.4, again) to write the
less concise
    max(izip(imap(len, seq), seq))
[for a RE-iterable seq] using more general itertools entries.  [If
seq is not necessarily reiterable, you'd need to add a tee to this;
again, see the itertools in the current 2.4 snapshot].
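A sketch of the same composition in modern Python, where the builtins zip and map play the roles of izip and imap. `funumerate` is Bengt's hypothetical name, not a real function; this version uses itertools.tee so it also works on a one-shot iterator, as described above.

```python
from itertools import tee

def funumerate(f, iterable):
    # Hypothetical helper: yield (f(x), x) pairs.  tee() splits the input
    # into two independent iterators, so one-shot iterators work too; for a
    # re-iterable seq, zip(map(f, seq), seq) is enough.
    a, b = tee(iterable)
    return zip(map(f, a), b)

seq = ["ab", "abcd", "abc"]
print(max(zip(map(len, seq), seq)))     # (4, 'abcd')  re-iterable seq
print(max(funumerate(len, iter(seq))))  # (4, 'abcd')  one-shot iterator
```

Note that, as the next paragraph points out, ties on f(x) would fall back to comparing the items themselves.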

However, this does not deal with the issue of seq items needing
to be comparable (and at reasonable cost, too) when they have the
same value for len(item).  max and min with an optional argument
would know to ONLY compare the "sort key", just like sort does,
NEVER the items themselves.  So, to fully emulate the proposed
max and min, you need to throw in an enumerate as well.
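A minimal sketch of that enumerate trick in modern Python, mirroring the izip(imap(len, seq), enumerate(seq)) shape; `max_by_key` is a hypothetical name. Because indices are unique, a tie on the key is settled by the index and the items are never compared, which matters for items such as dicts that cannot be ordered. Ties go to the last item here, like somelist[-1] after a stable sort, unlike the built-in max(key=...), which returns the first.

```python
def max_by_key(iterable, keyfunc):
    # Decorate as (key, (index, item)); comparisons stop at the index.
    best_key, (index, item) = max(
        (keyfunc(x), (i, x)) for i, x in enumerate(iterable)
    )
    return item

dicts = [{"a": 1}, {"b": 2, "c": 3}, {"d": 4, "e": 5}]
print(max_by_key(dicts, len))  # {'d': 4, 'e': 5}

# Without the index, the tie on len == 2 makes Python compare two dicts.
try:
    max(zip(map(len, dicts), dicts))
except TypeError as exc:
    print("TypeError:", exc)
```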

> or
> 
>     def longest(seq):
>         return max(funumerate(len,seq))[-1]
> 
> and it would be available as infrastructure for other efficient loops in
> addition to being tied in to specific sequence processors like max and
> min.

The only issue here is whether izip(imap(len, seq), seq) is frequent
enough to warrant a new itertools entry.  As it cannot take the place
of the proposed optional argument to min and max, it doesn't really
affect them directly.  The general interest of
    izip(imap(len, seq), enumerate(seq))
is, I think, too low to warrant a special entry just to express it.

Alex