[Python-3000] characters data type
Josiah Carlson
jcarlson at uci.edu
Thu May 4 03:43:46 CEST 2006
"Guido van Rossum" <guido at python.org> wrote:
> OK, point taken, for this particular set of parameters (building a 16
> MB string from 1K identical blocks).
>
> But how much slower will the list.append version be if the blocks are
> 10 bytes instead of 1024? That could make a huge difference. (In fact,
> I timed something similar to what you posted, and the doubling
> approach is actually faster when the buffer is 256 bytes or less.)
>
> My conclusion: we need to agree on a real benchmark before giving up.
In my programming efforts, I've found two cases which use
''.join(strings) quite often:
1. socket reads
2. content generation
In the socket reading case, I tend to use s.read(4096) or so, though I
have seen calls in the 512-65536 range. Generally it all depends on how
much data the particular application expects to be reading at any one
time.
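For illustration, here is a minimal sketch of the socket-reading pattern described above: collect fixed-size reads in a list and join once at the end. It is written for current Python 3, so bytes stand in for 8-bit strings and the read call is socket.recv. The helper name read_all and its bufsize default are hypothetical, not taken from any particular application.

```python
import socket


def read_all(sock, bufsize=4096):
    """Read until the peer closes; collect chunks in a list, join once.

    read_all and bufsize are illustrative names, not an existing API.
    """
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # empty bytes object: the peer closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # socketpair() returns two connected sockets, so no network is needed.
    sender, receiver = socket.socketpair()
    payload = b"x" * 10000
    sender.sendall(payload)
    sender.close()
    assert read_all(receiver) == payload
    receiver.close()
```

The list holds references to the chunks as they arrive, and the only allocation of the full result happens inside the final join.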
Also in my experience, content generation tends to produce a bunch of
relatively small strings (maybe 10-100 bytes each), which also tends to
kill the performance of string +=.
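A rough sketch of the content-generation case, assuming output built from many fragments in the 10-100 byte range. The function names, the row data, and the HTML fragment are all made up for illustration. No timings are claimed here: the relative cost of the two variants depends on the interpreter version, and some CPython versions can sometimes extend the string in place for +=.

```python
import timeit

# Hypothetical input: 10000 small (key, value) rows.
ROWS = [("key%d" % i, i) for i in range(10000)]


def render_join(rows):
    """Append each small fragment to a list, then join once."""
    parts = []
    for key, value in rows:
        parts.append("<tr><td>%s</td><td>%s</td></tr>\n" % (key, value))
    return "".join(parts)


def render_concat(rows):
    """Build the same output with repeated string +=."""
    out = ""
    for key, value in rows:
        out += "<tr><td>%s</td><td>%s</td></tr>\n" % (key, value)
    return out


if __name__ == "__main__":
    for fn in (render_join, render_concat):
        secs = timeit.timeit(lambda: fn(ROWS), number=20)
        print("%-14s %.3f s" % (fn.__name__, secs))
```

Both functions produce identical output, so any timing difference comes only from how the result string is assembled.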
Regardless of which approach ends up faster in a microbenchmark (and I
agree we should have one to compare approaches from a performance
perspective), from a memory allocator's perspective I think the fewer
reallocs needed to produce a single string-like representation of the
data, the better, since reallocs tend to fragment address space (an
issue I've had to deal with recently).
> > Note that removing the string[:] copy in the list.append
> > version only reduces the running time by about .07 seconds.
>
> That's because a string slice that returns the whole string is
> optimized to an INCREF operation. So you were really copying the same
> buffer over and over, which adds to locality and makes a huge
> difference in memory performance.
Good point. Making the input 1025 bytes, and performing block[:-1]
resulted in a running time of 13.94 seconds.
Doing a similar thing for the x += x case, making it x += x[:-1], pushed
that one to 11.69 seconds.
And finally, doing the same thing with the array version, x.extend(x[:-1])
gets 11.68 seconds.
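The benchmark script itself is not included in this message, so the following is only a reconstruction under assumptions: a 1025-byte block, a 16 MB target, and bytes/array objects in current Python 3 standing in for the original 8-bit strings. The function names are hypothetical. Slicing off the last byte forces a real copy each time instead of the whole-string slice that is optimized to an INCREF.

```python
import array
import time

# One byte longer than 1K, so block[:-1] cannot return the same object.
BLOCK = b"x" * 1025


def build_join(target, block=BLOCK):
    """list.append version: copy the block each time, join once."""
    parts = []
    size = 0
    while size < target:
        piece = block[:-1]
        parts.append(piece)
        size += len(piece)
    return b"".join(parts)


def build_doubling(target, block=BLOCK):
    """x += x[:-1] version: the buffer roughly doubles on each step."""
    x = block
    while len(x) < target:
        x += x[:-1]
    return x


def build_array(target, block=BLOCK):
    """array version: x.extend(x[:-1]) on an array of unsigned bytes."""
    x = array.array("B", block)
    while len(x) < target:
        x.extend(x[:-1])
    return x.tobytes()


def time_all(target=16 * 1024 * 1024):
    """Return wall-clock seconds per variant, keyed by function name."""
    results = {}
    for fn in (build_join, build_doubling, build_array):
        start = time.perf_counter()
        fn(target)
        results[fn.__name__] = time.perf_counter() - start
    return results


if __name__ == "__main__":
    for name, secs in time_all().items():
        print("%-16s %.3f s" % (name, secs))
```

Absolute numbers from this sketch will not match the figures quoted above, since hardware, interpreter, and loop structure all differ. It is meant only to show the shape of the three variants being compared.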
- Josiah