[Python-3000] characters data type

Josiah Carlson jcarlson at uci.edu
Thu May 4 03:43:46 CEST 2006


"Guido van Rossum" <guido at python.org> wrote:
> OK, point taken, for this particular set of parameters (building a 16
> MB string from 1K identical blocks).
> 
> But how much slower will the list.append version be if the blocks are
> 10 bytes instead of 1024? That could make a huge difference. (In fact,
> I timed something similar to what you posted, and the doubling
> approach is actually faster when the buffer is 256 bytes or less.)
> 
> My conclusion: we need to agree on a real benchmark before giving up.

In my programming efforts, I've found two cases which use
''.join(strings) quite often:

1. socket reads
2. content generation


In the socket reading case, I tend to use s.recv(4096) or so, though I
have seen calls in the 512-65536 range.  Generally it depends on how
much data the particular application expects to read at any one
time.
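As a minimal sketch of the socket case, assuming a blocking socket whose
peer closes the connection when it is done sending: the helper name
read_all and the 4096 default are illustrative, not taken from any
particular application.

```python
import socket

def read_all(sock, bufsize=4096):
    """Collect everything the peer sends until it closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:           # empty result: the peer closed its end
            break
        chunks.append(chunk)    # keep each read; no copying of earlier data
    return b"".join(chunks)     # a single join at the end
```

Each recv() result is stored as-is, and the data is only copied into
one contiguous object when the final join runs.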

Also in my experience, content generation tends to involve a bunch of
relatively small strings (maybe 10-100 bytes), which tends to kill the
performance of repeated string += operations.
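To illustrate the content generation case, here is a small sketch that
collects many short fragments in a list and joins them once.  The
function render_rows and the HTML it emits are hypothetical, chosen only
to produce fragments in roughly the 10-100 byte range.

```python
def render_rows(rows):
    """Build an HTML table from (name, value) pairs using list + join."""
    parts = ["<table>\n"]
    for name, value in rows:
        # Each append adds a short fragment; nothing is concatenated yet.
        parts.append("<tr><td>")
        parts.append(str(name))
        parts.append("</td><td>")
        parts.append(str(value))
        parts.append("</td></tr>\n")
    parts.append("</table>\n")
    return "".join(parts)   # one pass over the fragments at the end
```

For example, render_rows([("a", 1)]) returns the string
"<table>\n<tr><td>a</td><td>1</td></tr>\n</table>\n".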


Regardless of what ends up being faster in a microbenchmark (which I
agree we should have, to compare approaches from a performance
perspective), I think that from a memory allocator perspective, the
fewer reallocs needed to produce a single string-like representation of
the data, the better.  Reallocs do tend to fragment address space (an
issue I've had to deal with recently).


> > Note that removing the string[:] copy in the list.append
> > version only reduces the running time by about .07 seconds.
> 
> That's because a string slice that returns the whole string is
> optimized to an INCREF operation. So you were really copying the same
> buffer over and over, which adds to locality and makes a huge
> difference in memory performance.

Good point.  Making the input 1025 bytes, and performing block[:-1]
resulted in a running time of 13.94 seconds.
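The difference between the two slices can be checked directly.  This is
a sketch that relies on a CPython implementation detail (a full slice of
an immutable bytes or str object returns the same object); other
implementations may behave differently.

```python
block = b"a" * 1025

whole = block[:]       # full slice: CPython hands back the same object
partial = block[:-1]   # drops the last byte, so a new 1024-byte object is built

same_object = whole is block        # True on CPython (implementation detail)
copied = partial is not block       # a distinct object with its own buffer
```

That is why the 1025-byte input with block[:-1] forces a real copy on
every iteration while still yielding 1024-byte pieces.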

Doing a similar thing for the x += x case, making it x += x[:-1], pushed
that one to 11.69 seconds.

And finally, doing the same thing with the array version, x.extend(x[:-1])
gets 11.68 seconds.
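The benchmark code itself is not reproduced in this message, so the
following is only a sketch of how the three approaches could be compared
under the same "force a real copy" condition.  The names, the block
count, and the use of bytearray in place of the array module are all
assumptions, and each variant appends a fixed block rather than doubling
the buffer as x += x[:-1] does.

```python
import timeit

BLOCK = b"x" * 1025   # one byte too long, so BLOCK[:-1] makes a real copy
COUNT = 256           # assumed; far smaller than the 16 MB case, to run quickly

def build_with_join(block=BLOCK, count=COUNT):
    parts = []
    for _ in range(count):
        parts.append(block[:-1])
    return b"".join(parts)

def build_with_concat(block=BLOCK, count=COUNT):
    result = b""
    for _ in range(count):
        result += block[:-1]        # may reallocate or copy on each step
    return result

def build_with_bytearray(block=BLOCK, count=COUNT):
    buf = bytearray()               # mutable buffer, grown in place
    for _ in range(count):
        buf.extend(block[:-1])
    return bytes(buf)

def run_benchmark(repeat=3):
    """Return the best wall-clock time in seconds for each variant."""
    results = {}
    for fn in (build_with_join, build_with_concat, build_with_bytearray):
        results[fn.__name__] = min(timeit.repeat(fn, number=1, repeat=repeat))
    return results
```

Calling run_benchmark() returns a dict mapping each function name to its
best time; the absolute numbers will depend on the machine, the Python
version, and the block size chosen.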


 - Josiah


