generator expressions: performance anomaly?

John Machin sjmachin at lexicon.net
Sun Jan 16 16:51:49 EST 2005


On Sun, 16 Jan 2005 12:18:23 GMT, "Raymond Hettinger"
<vze4rx4y at verizon.net> wrote:

>"John Machin" <sjmachin at lexicon.net> wrote in message
>news:1105854381.117059.59940 at c13g2000cwb.googlegroups.com...
>> Please consider the timings below, where a generator expression starts
>> out slower than the equivalent list comprehension, and gets worse:
>>
>> >python -m timeit -s "orig=range(100000)" "lst=orig[:];lst[:]=(x for x in orig)"
> . . .
>> >python -m timeit -s "orig=range(200000)" "lst=orig[:];lst[:]=(x for x in orig)"
>
>This has nothing to do with genexps and everything to do with list slice
>assignment.
>
>List slice assignment is an example of a tool with a special case optimization
>for inputs that know their own length -- that enables the tool to pre-allocate
>its result rather than growing and resizing in spurts.  Other such tools include
>tuple(), map() and zip().
>
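Raymond's point about inputs that "know their own length" can be observed directly in current Python (3.4+) via operator.length_hint, which reports the size hint the consuming tool can use for pre-allocation -- this is a hedged illustration, not the 2005-era internals:

```python
# Illustration: list slice assignment can pre-allocate only when the
# right-hand side reports its length. A list does; a generator
# expression does not, so the target has to grow as items arrive.
from operator import length_hint

orig = list(range(10))

lst = orig[:]            # a list knows its own length
print(length_hint(lst))  # exact length (10) available up front

gen = (x for x in orig)  # a generator expression has no size hint
print(length_hint(gen))  # falls back to the default of 0
```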

My reading of the source: if the input is not a list or tuple, a
(temporary) tuple is built from it, using PySequence_Tuple() in
abstract.c. If the input cannot report its own length, that
function resorts to "growing and resizing in spurts", using the
following code:

		if (j >= n) {
			if (n < 500)
				n += 10;
			else
				n += 100;
			if (_PyTuple_Resize(&result, n) != 0) {

Perhaps it could be changed to use a proportional (geometric) increase,
like list_resize() in listobject.c, which advertises (amortised) linear
time. An alternative: build a temporary list instead?
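The cost difference can be sketched in Python by counting how many element copies each resize policy performs -- this is a rough model of the two strategies, not the CPython code itself, and the proportional formula below merely imitates the spirit of list_resize()'s over-allocation:

```python
# Model: every time the buffer is full, it is reallocated and all
# existing elements are copied. Count total copies for n appends.

def copies_additive(n):
    """Additive growth as in the quoted snippet: +10 below 500, then +100."""
    cap = copied = size = 0
    while size < n:
        if size >= cap:
            cap += 10 if cap < 500 else 100
            copied += size          # resize copies the current contents
        size += 1
    return copied

def copies_proportional(n):
    """Proportional growth (~12.5% over-allocation, list_resize()-like)."""
    cap = copied = size = 0
    while size < n:
        if size >= cap:
            cap = size + (size >> 3) + 8
            copied += size
        size += 1
    return copied

for n in (100_000, 200_000):
    print(n, copies_additive(n), copies_proportional(n))
```

Doubling n roughly quadruples the additive copy count (quadratic behaviour), while the proportional count stays a small multiple of n.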





More information about the Python-list mailing list