Performance of lists vs. list comprehensions

Tue Jan 19 16:20:42 EST 2010

That's surprising. I wouldn't implement it that way at all.  I'd use a
dynamically-expanding buffer as I suggested.  That way you get a
single pass and don't have to calculate anything before you begin.  In
the best case, you'd use half the memory (the result just fits in the
buffer after its last expansion and no saved tuple).  In the worst
case, the memory use is about the same (you just expanded the buffer
using a 2x expansion rule then you hit the last item).

Still I suppose the author thought of that approach and rejected it
for reasons I can't yet see.

On Tue, Jan 19, 2010 at 4:01 PM, Arnaud Delobelle
<arnodel at googlemail.com> wrote:
> Gerald Britton <gerald.britton at gmail.com> writes:
>
>> [snip]
>>
>>>
>>> Yes, list building from a generator expression *is* expensive. And
>>> join has to do it, because it has to iterate twice over the iterable
>>> passed in: once for calculating the memory needed for the joined
>>> string, and once more to actually do the join (this is implementation
>>> dependent, of course). If the iterable is a list already, the list
>>> building is not needed.
>>
>> if join has to iterate twice over the iterable, how does this work?
>>
>> $ python3.1
>> Python 3.1.1+ (r311:74480, Nov  2 2009, 14:49:22)
>> [GCC 4.4.1] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> l = map(str, (x for x in range(10) if int(x)%2))
>>>>> '.'.join(l)
>> '1.3.5.7.9'
>>>>>
>>
>> If join had to iterate twice over l, it would be consumed on the first
>> pass.  If it works as you say then join would have to copy the
>> iterable on the first pass, effectively turning it into a list.
>> Though I didn't read through it, I would suppose that join could use a
>> dynamic-table approach to hold the result, starting with some
>> guesstimate then expanding the result buffer if and when needed.
>
> Looking at the source (py3k):
>
> PyObject *
> PyUnicode_Join(PyObject *separator, PyObject *seq)
> {
>    [skip declarations]
>
>    fseq = PySequence_Fast(seq, "");
>    if (fseq == NULL) {
>        return NULL;
>    }
>
>    [code that works out the length of the joined string then allocates
>     memory, then fills it]
> }
>
> Where PySequence_Fast(seq, "") returns seq if seq is already a tuple or
> a list and otherwise returns a new tuple built from the elements of seq.
>
> So no, it doesn't guess the size of the joined string and yes, it
> iterates twice over the "sequence" (I would have thought it should be
> called an iterable) by copying it into a tuple.
>
> --
> Arnaud
> --
> http://mail.python.org/mailman/listinfo/python-list
>

-- 
Gerald Britton