# Optimizing list processing

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Dec 12 13:08:58 CET 2013

```
On Wed, 11 Dec 2013 21:26:51 -0500, Terry Reedy wrote:

> On 12/11/2013 6:54 PM, Steven D'Aprano wrote:
>> I have some code which produces a list from an iterable using at least
>> one temporary list, using a Decorate-Sort-Undecorate idiom.
>
> It is a non-standard version thereof, as DSU usually means to decorate
> with a key that gets discarded.

I do decorate with a key that gets discarded. The actual values that are
being sorted are the enumerated values 0, 1, 2, ... and the temporary key
values come from the iterable.

Confused yet? :-)
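
# To make that concrete, here is a toy decorate-sort-undecorate that
# builds an index table (an illustration of the idiom, not my actual code):
values = ['banana', 'apple', 'cherry']
decorated = [(x, i) for i, x in enumerate(values)]  # decorate: the value is the key
decorated.sort()
index_table = [i for x, i in decorated]  # undecorate: the keys are discarded
# index_table == [1, 0, 2]: the positions of values in sorted order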

> A couple of examples of input and expected output would have been good
> ;-).

Only if I wanted you to understand the algorithm, which isn't really
important. My question is about swapping between in-place modifications
and external copying. If I wanted to be really mysterious (annoying), I
could have just given pseudo-code :-)

Please don't focus on the algorithm I gave. Focus on the fact that I
could write it like this:

if some condition to do with the computer's available memory:
    make modifications in place
else:
    make a copy of the data containing the modifications
if only I knew what that condition was and how to test for it.
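
# The sort of test I have in mind might look like this. Hypothetical
# sketch only: the os.sysconf names are POSIX-specific, and the factor
# of 2 is a guess, not a measured threshold.
import os
import sys

def doubled(data):
    # Estimate free physical memory; fall back to None where the
    # sysconf names are unavailable (e.g. on Windows).
    try:
        free = os.sysconf('SC_AVPHYS_PAGES') * os.sysconf('SC_PAGE_SIZE')
    except (AttributeError, ValueError, OSError):
        free = None
    if free is None or sys.getsizeof(data) * 2 < free:
        # Enough memory: make a modified copy.
        return [2 * x for x in data]
    # Tight on memory: modify in place.
    for i, x in enumerate(data):
        data[i] = 2 * x
    return data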

>> The algorithm looks something like this (simplified):
>>
>> table = sorted([(x, i) for i,x in enumerate(iterable)])
>
> This makes two temporaries when only one is needed, and shows why we

You may be right in this particular case; I don't know, I haven't tried
it. But there are cases where list comps are significantly faster than
generator expressions. For example, when passed to str.join, list comps
are (on my computer) consistently 15% faster:

[steve at ando ~]$ python3.3 -m timeit "''.join([chr(n) for n in range(1, 201)])"
10000 loops, best of 3: 100 usec per loop
[steve at ando ~]$ python3.3 -m timeit "''.join([chr(n) for n in range(1, 201)])"
10000 loops, best of 3: 94.1 usec per loop
[steve at ando ~]$ python3.3 -m timeit "''.join(chr(n) for n in range(1, 201))"
10000 loops, best of 3: 116 usec per loop
[steve at ando ~]$ python3.3 -m timeit "''.join(chr(n) for n in range(1, 201))"
10000 loops, best of 3: 109 usec per loop
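
# The same measurement from inside Python, if anyone wants to reproduce
# it (the absolute numbers will of course differ between machines):
import timeit
listcomp = timeit.timeit("''.join([chr(n) for n in range(1, 201)])", number=1000)
genexp = timeit.timeit("''.join(chr(n) for n in range(1, 201))", number=1000)
# str.join needs two passes over its argument, so CPython materializes a
# generator into a list anyway; that is the usual explanation for the gap.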

> table = sorted((x, i) for i,x in enumerate(iterable)) is equivalent to
> the table; table.sort lines below.
>
> The following (untested) should be faster as it avoids tuple unpacking
> and repacking.

I'm not terribly fussed about micro-optimizations here. I'm concerned
about *macro-optimizing* the case where creating two (or more) lists
forces the computer to page memory in and out of virtual memory. Saving a
few milliseconds? I don't care. Saving 5 or 6 seconds? That I care about.

[...]
> I expect the list comp be faster than in-place as long as the result
> list can be allocated and held in memory without paging. (This of course
> depends on system memory and other memory uses.)

Exactly!

> List iterators have a
> __length_hint__ method giving the length of the underlying list, so the
> list comp result list can potentially be allocated once and then filled
> in by enumeration and replacement, but in C rather than Python code.
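
# As an aside, that hint is easy to observe (a CPython implementation
# detail rather than a language guarantee):
it = iter([10, 20, 30])
assert it.__length_hint__() == 3  # three items remain
next(it)
assert it.__length_hint__() == 2  # the hint tracks consumption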

I don't want to pre-allocate the list comp. I want to avoid having
two LARGE lists in memory at the same time, one containing the
decorated values and one not. I know that in-place modification is a
good optimization for large enough lists. On my computer, ten million
items is sufficient to demonstrate the optimization, and with
sufficient testing I could determine a rough cut-off value, below
which list comps are more efficient and above which in-place
modifications are better.

But I don't know how to generalize that cut-off value. If I buy extra
RAM, the cut-off value will shift. If I run it on another computer, it
will shift.
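
# One crude way to locate the crossover on a given machine (the sizes
# and the toy operation here are arbitrary choices, not my real code):
import timeit
for n in (10**4, 10**5):
    setup = 'data = list(range(%d))' % n
    copy_t = timeit.timeit('[x + 1 for x in data]', setup=setup, number=20)
    inplace_t = timeit.timeit(
        'for i in range(len(data)): data[i] += 1', setup=setup, number=20)
    print(n, copy_t, inplace_t)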

P.S. The algorithm I'm working on is a way of generating index and rank
tables. Not that it really matters -- what matters is determining whether
or not to shift from "make a copy of the list" to "modify the list in
place".
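
# For concreteness, index and rank tables for a small list, built the
# standard way (not necessarily the algorithm I'm optimizing):
data = [30, 10, 20]
index = sorted(range(len(data)), key=data.__getitem__)  # index == [1, 2, 0]
rank = [0] * len(data)
for j, i in enumerate(index):
    rank[i] = j  # rank[i] is where data[i] falls in sorted order
# rank == [2, 0, 1]: data[0] == 30 is the largest, so it has rank 2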

--
Steven

```