generator expressions: performance anomaly?

Sun Jan 16 07:18:23 EST 2005

"John Machin" <sjmachin at lexicon.net> wrote in message
news:1105854381.117059.59940 at c13g2000cwb.googlegroups.com...
> Please consider the timings below, where a generator expression starts
> out slower than the equivalent list comprehension, and gets worse:
>
> >python -m timeit -s "orig=range(100000)" "lst=orig[:];lst[:]=(x for x
> in orig)"
 . . .
> >python -m timeit -s "orig=range(200000)" "lst=orig[:];lst[:]=(x for x
> in orig)"

This has nothing to do with genexps and everything to do with list slice
assignment.

List slice assignment is an example of a tool with a special case optimization
for inputs that know their own length -- that enables the tool to pre-allocate
its result rather than growing and resizing in spurts.  Other such tools include
tuple(), map() and zip().

The incredulous tone of the posting suggests a presumption that genexps always
outperform list comps.  That is not the case. Sometimes you want a list instead
of a generator because you want more than just iteration; perhaps the consumer
function may be able to take advantage of length reporting, indexing, slicing,
re-iterability or other useful list behaviors.

Also, the timing jig bites.  Use xrange() instead of range().  Otherwise,
there's an immediate forfeit of the memory savings normally being sought by the
use of generators and iterators.

> Background: There was/is a very recent thread about ways of removing
> all instances of x from a list. /F proposed a list comprehension to
> build the result list.

FWIW, deques have a natural idiom for this kind of situation:

    for i in xrange(len(mydeque)):
        x = mydeque.popleft()
        if some_condition(x):
            mydeque.append(x)

> Given a requirement to mutate the original list,
> this necessitates the assignment to lst[:].

Sometimes people introduce arbitrary requirements and twist themselves in knots
when real applications tend to allow a wider range of alternative solutions.
There is a funky code smell emanating from any code resembling
a[:]=some_iterator.  It would make me examine how 'a' is being used and check
whether the surrounding code can use the iterator or an new object.

> Comments, clues, ... please.

As a point of interest, note that Py2.4 improved some of its built-in iterators
to report their length if requested.  Now you now why.

>>> d = dict(a=1, b=2, c=3, d=4)
>>> len(d.iteritems())
4

Raymond Hettinger