Why doesn't join() call str() on its arguments?

Dima Dorfman dima at trit.invalid
Fri Feb 18 01:54:21 CET 2005

On 2005-02-18, Andy Dustman <farcepest at gmail.com> wrote:
> The reason it does this is exactly why you said: It iterates over the
> sequence and gets the sum of the lengths, adds the length of n-1
> separators, and then allocates a string this size. Then it iterates
> over the list again to build up the string.

The other (and, I suspect, the real) reason for materializing the
argument is to be able to call unicode.join if it finds Unicode
elements in the sequence. If it finds such an element, unicode.join
has to be called on the entire sequence; the part already accumulated
can't be used because unicode.join wants to call PyUnicode_FromObject
on all the elements. Since join can't know whether the original argument
can be iterated a second time, it has to keep the materialized sequence
around.
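
A rough model of that behaviour, written as a sketch rather than as the
actual C code: it uses Python 3, with bytes standing in for the old str
type and str standing in for unicode. The name model_join and the ASCII
decoding are assumptions made for illustration only.

```python
def model_join(sep, iterable):
    """Toy model of a two-pass join with a restart on "Unicode" elements.

    Assumption: bytes plays the role of the byte-string type and str
    plays the role of the Unicode type. Not the real implementation.
    """
    # Materialize first: a generator can only be consumed once, and the
    # fallback below needs to see every element again from the start.
    items = list(iterable)

    # Pass 1: add up the lengths, bailing out if a text element appears.
    total = 0
    for item in items:
        if isinstance(item, str):
            # Restart on the whole saved sequence, converting each element.
            return sep.decode("ascii").join(
                x if isinstance(x, str) else x.decode("ascii") for x in items
            )
        total += len(item)
    total += len(sep) * max(len(items) - 1, 0)

    # Pass 2: allocate the result once, then copy each piece into place.
    out = bytearray(total)
    pos = 0
    for i, item in enumerate(items):
        if i:
            out[pos:pos + len(sep)] = sep
            pos += len(sep)
        out[pos:pos + len(item)] = item
        pos += len(item)
    return bytes(out)
```

Calling model_join(b",", iter([b"a", "b", b"c"])) takes the fallback
path even though the first element was already scanned, which is only
possible because the list was saved up front.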

> For generators, you'd have to make a trial allocation and start
> appending stuff as you go, periodically resizing. This *might* end up
> being more efficient in the case of generators, but the only way to
> know for sure is to write the code and benchmark it.

Even if it's not faster, the incremental approach should use about half
as much memory for non-sequence arguments, because the elements no
longer have to be kept around alongside the result. That can be a big
win if elements are being generated on the fly (e.g., it's a generator
that does something other than just iterate over an existing sequence).
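
As a sketch of the incremental alternative (again in Python 3 with
bytes, and with the hypothetical name incremental_join), the result
buffer can simply grow as elements arrive. This assumes every element
is already a byte string; it does not attempt the Unicode restart
discussed above, which is exactly the part that needs the saved
sequence.

```python
def incremental_join(sep, iterable):
    """Join byte strings in one pass without saving the elements.

    Assumption: all elements are bytes-like. Each element can be
    released as soon as it has been copied into the growing buffer.
    """
    out = bytearray()  # grows as needed when data is appended
    first = True
    for item in iterable:
        if not first:
            out += sep
        out += item
        first = False
    return bytes(out)


# Elements are produced on the fly and never held in a list.
numbers = incremental_join(b", ", (str(n).encode("ascii") for n in range(4)))
```

Whether this beats the two-pass version on speed would still have to be
measured; the sketch only shows where the memory saving comes from.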
