[Python-3000] Adding sorting to generator comprehension

Talin talin at acm.org
Wed Apr 26 09:47:00 CEST 2006


Ian Bicking <ianb <at> colorstudy.com> writes:

> 
> Josiah Carlson wrote:
> > One of the features of generator expressions which makes it desireable
> > instead of list comprehensions is that generator expressions may use
> > less memory *now*, and may be able to start returning results *now*.
> > 
> > Using (<genexp> orderby ...) as a replacement for sorted((genexp), key=...)
> > is a bit misleading because while the original generator expression
> > could have been space and time efficient, the orderby version certainly may
> > not be.
> 
> Certainly it changes the performance substantially (of course if the 
> expression is translated and executed elsewhere the performance can 
> actually be improved, so it can go both ways).  Since list 
> comprehensions are planned to just be syntactic sugar for generator 
> comprehension, generator comprehensions are now the more fundamental 
> construct.
> 
> But yeah, it is a little awkward, since something that is sorted can be 
> returned as a list anyway, except for the fact that the expression 
> itself could be ported off elsewhere which isn't normal Python.  (But 
> should be normal Python!)
> 

I think that I am (mostly) in agreement with Ian on this one, but perhaps with
different reasoning.

There seems to be a common feeling on py-dev that "list comprehensions are just
a special case of generators expressions, so really, we don't need them".

But then we turn around and say "Well, this feature might be handy for list
comprehensions, but since list comprehensions are based on generator
expressions, and since this feature would make generator expressions
inefficient, we won't do it."

In other words, we dismiss any evidence that disproves our initial assumption.
This is sort of like saying that we don't need the 'def' statement, since you
can define functions with "func = lambda"; and also we shouldn't add multi-line
statements, since lambda doesn't support them.

(Of course, I am overstating the argument so that I can make the point.)

There are a number of tendancies which I would caution programmers to be wary of:

'Overfactoring' - also known as 'excessive reductionism'. It is the tendancy to
try and boil down everything into one mathematically perfect form, of which all
real programs are just special cases. Often you run into two problems which are
superficially similar, but have a subtle distinction; overfactoring tends to
blur and hide that distinguishing line. The result is an algorithm that solves
neither problem well.

'Simplifying the implementation design instead of the usage design' - The subtle
distinction between list comprehensions and generator expressions has a
not-so-subtle effect on how they are used. Our primary goal should be to
optimize those usage patterns; Minimizing the number of special cases in the
interpreter should take second or third place behind that goal, IMHO.

The fact is that sorting happens a *lot*. Sorting with key= happens a lot. A
large proportion of my list comprehensions involve sorting or ordering of one
form or another, and the number of ugly sorted + key + lambda expressions is
enough to convince me that an "orderby" or "ascending" or whatever clause would
be a welcome addition.

-- Talin




More information about the Python-3000 mailing list