StringChain -- a data structure for managing large sequences of chunks of bytes
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Fri Mar 12 03:52:30 EST 2010
On Fri, 12 Mar 2010 00:11:37 -0700, Zooko O'Whielacronx wrote:
> Folks:
>
> Every couple of years I run into a problem where some Python code that
> worked well at small scales starts burning up my CPU at larger scales,
> and the underlying issue turns out to be the idiom of accumulating data
> by string concatenation.
I don't mean to discourage you, but the simple way to avoid that is not
to accumulate data by string concatenation.
The usual Python idiom is to append substrings to a list, then once, at
the very end, combine into a single string:

    accumulator = []
    for item in sequence:
        accumulator.append(process(item))
    string = ''.join(accumulator)

> It just happened again
> (http://foolscap.lothar.com/trac/ticket/149 ), and as usual it is hard
> to make the data accumulator efficient without introducing a bunch of
> bugs into the surrounding code.
I'm sorry, I don't agree about that at all. I've never come across a
situation where I wanted to use string concatenation and couldn't easily
modify it to use the list idiom above.
[...]
> Here are some benchmarks generated by running python -OOu -c 'from
> stringchain.bench import bench; bench.quick_bench()' as instructed by
> the README.txt file.
To be taken seriously, I think you need to compare stringchain to the
list idiom. If stringchain compares favourably with that in your
benchmarks, then it might be worthwhile.
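
As a rough illustration of the kind of baseline comparison meant here, the
sketch below times naive concatenation against the list idiom using the
standard timeit module. The function names (concat_build, join_build,
compare) and the chunk sizes are made up for this example. It does not touch
stringchain, whose API is not shown in this thread; stringchain would be
added as a third candidate alongside these two. Note that CPython can
sometimes perform += on strings in place, so the gap between the two may be
smaller or larger depending on the interpreter and the workload.

```python
import timeit


def concat_build(chunks):
    # Naive accumulation: each += may copy the string built so far.
    result = ''
    for chunk in chunks:
        result += chunk
    return result


def join_build(chunks):
    # List idiom: collect the pieces, then join once at the very end.
    accumulator = []
    for chunk in chunks:
        accumulator.append(chunk)
    return ''.join(accumulator)


def compare(num_chunks=10000, chunk_size=100, repeat=5):
    # Hypothetical workload: many equal-sized chunks of filler text.
    chunks = ['x' * chunk_size] * num_chunks
    # Both strategies must produce the same string before timing means anything.
    assert concat_build(chunks) == join_build(chunks)
    timings = {}
    for func in (concat_build, join_build):
        # Take the best of several runs to reduce noise from other processes.
        runs = timeit.repeat(lambda: func(chunks), number=1, repeat=repeat)
        timings[func.__name__] = min(runs)
    return timings


if __name__ == '__main__':
    for name, seconds in sorted(compare().items()):
        print('%-12s %.6f s' % (name, seconds))
```

Running the file as a script prints one best-of-N timing per strategy. The
absolute numbers depend on the machine and Python version, so only the
relative ordering at a given scale is worth reading into.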
--
Steven
More information about the Python-list mailing list