[Python-Dev] io.BytesIO slower than monkey-patching io.RawIOBase

Nick Coghlan ncoghlan at gmail.com
Tue Jul 17 07:48:44 CEST 2012


On Tue, Jul 17, 2012 at 2:57 PM, John O'Connor <jxo6948 at rit.edu> wrote:
>>
>> The second approach is consistently 10-20% faster than the first one
>> (depending on input) for trunk Python 3.3
>>
>
> I think the difference is that StringIO spends extra time reallocating
> memory during the write loop as it grows, whereas bytes.join computes
> the allocation size first since it already knows the final length.

BytesIO is actually missing an optimisation that is already used in
StringIO: the StringIO C implementation uses a fragment accumulator
internally, and collapses that into a single string object when
getvalue() is called. BytesIO is still using the old
"resize-the-buffer-as-you-go" strategy, and thus ends up repeatedly
reallocating the buffer as the data sequence grows incrementally.
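
As a rough sketch of the fragment accumulator idea in pure Python (the
FragmentWriter class below is a made-up name for illustration only; the
real StringIO accumulator is written in C and differs in detail):

```python
import io


class FragmentWriter:
    """Hypothetical write-only buffer; not the actual C implementation."""

    def __init__(self):
        # Written chunks are kept as separate fragments, not copied
        # into one growing buffer.
        self._fragments = []

    def write(self, data):
        fragment = bytes(data)
        self._fragments.append(fragment)
        return len(fragment)

    def getvalue(self):
        # bytes.join can compute the total length before allocating,
        # so the result is built with a single allocation.
        value = b"".join(self._fragments)
        # Keep the collapsed result so repeated getvalue() calls are cheap.
        self._fragments = [value]
        return value


chunks = [b"spam", b"eggs", b"ham"] * 1000

acc = FragmentWriter()
for chunk in chunks:
    acc.write(chunk)

buf = io.BytesIO()
for chunk in chunks:
    buf.write(chunk)

# Both strategies produce the same bytes; only the allocation pattern differs.
assert acc.getvalue() == buf.getvalue()
```

This only covers sequential writes; a real replacement would also need to
handle seek(), read() and truncate(), which is where most of the
complexity would be.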

BytesIO should be optimised to work the same way StringIO does (which is
effectively the same way that the monkeypatched version works).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

